I'm not sure where the Pigouvian tax comes in, but Dan Kahan:
I often get asked to review papers that use M Turk samples.
This is a problem because I think M Turk samples, while not invalid for all forms of study, are invalid for studies of how individual differences in political predispositions and cognitive-reasoning proficiencies influence the processing of empirical information relevant to risk and other policy issues.
I've discussed this point at length.
And lots of serious scholars have now engaged this issue seriously.
"Seriously" not in the sense of merely collecting some data on the demographics of M Turk samples at one point in time and declaring them "okay" for all manner of studies once & for all. Anyone who produces a study like that, or relies on it to assure readers his or her own use of an M Turk sample is "okay," either doesn't get the underlying problem or doesn't care about it.
I mean really seriously in the sense of trying to carefully document the features of the M Turk work force that bear on its validity as a sample for various sorts of research, and in the sense of engaging in meaningful discussion of the technical and craft issues involved.
I myself think the work and reflections of these serious scholars reinforce the conclusion that it is highly problematic to rely on M Turk samples for the study of information processing relating to risk and other facts relevant to public policy.
The usual reply is, "but M Turk samples are inexpensive! They make it possible for lots & lots of scholars to do and publish empirical research!"
Well, thought experiments are even cheaper. But they are not valid.
If M Turk samples are not valid, it doesn't matter that they are cheap. Validity is a non-negotiable threshold requirement for use of a particular sampling method. It's not an asset or currency that can be spent down to buy "more" research-- for the research that such a "trade off" subsidizes in fact has no value.
Another argument is, "But they are better than university student samples!" If student samples are not valid for a particular kind of research, then journals shouldn't accept studies that use them either. But in any case, it's now clear that M Turk workers don't behave the way U.S. university students do when responding to survey items that test whether subjects display the reactions one would expect of members of the U.S. public who hold the political outlooks the subjects claim to hold (Krupnikov & Levine 2014).
I think serious journals should adopt policies announcing that they won't accept studies that use M Turk samples for types of studies they are not suited for.
But in any case, they ought at least to adopt policies one way or the other--rather than put authors in the position of not knowing before they collect the data whether journals will accept their studies, and authors and reviewers in the position of having a debate about the appropriateness of using such a sample over & over. Case-by-case assessment is not a fair way to handle this issue, nor one that will generate a satisfactory overall outcome.
So ... here is my proposal:
Pending a journal's adoption of a uniform policy on M Turk samples, the journal should oblige authors who do use M Turk samples to give a full account--in their paper--of why the authors believe it is appropriate to use M Turk workers to model the reasoning process of ordinary members of the U.S. public. The explanation should consist of a full accounting of the authors’ own assessment of why they are not themselves troubled by the objections that have been raised to the use of such samples; they shouldn't be allowed to dodge the issue by boilerplate citations to studies that purport to “validate” such samples for all purposes, forever & ever.

Such an account helps readers to adjust the weight that they afford study findings that use M Turk samples in two distinct ways: by flagging the relevant issues for their own critical attention; and by furnishing them with information about the depth and genuineness of the authors’ own commitment to reporting research findings worthy of being credited by people eager to figure out the truth about complex matters.
There are a variety of key points that authors should be obliged to address. ...
I feel pretty confident M Turk samples are not long for this world for studies that examine individual differences in reasoning relating to politically contested risks and other policy-relevant facts (again, there are no doubt other research questions for which M Turk samples are not nearly so problematic).
Researchers in this area will not give much weight to studies that rely on M Turk samples as scholarly discussion progresses.
In addition, there is a very good likelihood that an on-line sampling resource that is comparably inexpensive but informed by genuine attention to validity issues will emerge in the not too distant future.
E.g., Google Consumer Surveys now enables researchers to field a limited number of questions for between $1.10 & $3.50 per complete-- a fraction of the cost charged by on-line firms that use valid & validated recruitment and stratification methods.
Google Consumer Surveys has proven its validity in the only way that a survey mode--random-digit dial, face-to-face, on-line--can: by predicting how individuals will actually evince their opinions or attitudes in real-world settings of consequence, such as elections. Moreover, if Google Consumer Surveys goes into the business of supplying high-quality scholarly samples, it will be obliged to be transparent about its sampling and stratification methods and to maintain them (or update them for the purposes of making them even more suited for research) over time.
As I said, Amazon couldn't care less whether the recruitment methods it uses for M Turk workers now or in the future make them suited for scholarly research.
The problem right now w/ Google Consumer Surveys is that the number of questions is limited and so, as far as I can tell, is the complexity of the instrument that one is able to use to collect the data, making experiments infeasible.
But I predict that will change. ...
I agree that you should make the case that the benefits of an MTurk sample exceed its costs. There are also other relatively cheap survey samples: SurveyMonkey and Survey Sampling, Inc. charge as little as $3 and $6 per complete, respectively (as of the last time I checked).
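To put those per-complete prices in perspective, here's a back-of-envelope sketch in Python. The prices are the ones quoted above; the target sample size is a made-up illustration (real quotes vary with targeting, quotas, and questionnaire length):

```python
# Back-of-envelope fielding-cost comparison using the per-complete prices
# quoted in this post. Illustrative assumptions only; actual pricing varies.
prices_per_complete = {
    "Google Consumer Surveys (low end)":  1.10,
    "Google Consumer Surveys (high end)": 3.50,
    "SurveyMonkey":                       3.00,
    "Survey Sampling, Inc.":              6.00,
}
target_completes = 1000  # hypothetical study size

for panel, price in prices_per_complete.items():
    total = price * target_completes
    print(f"{panel}: ${total:,.2f} for {target_completes} completes")
```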
An interesting study would be to compare, say, a willingness-to-pay function estimated from an MTurk sample with one estimated from these other samples.
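Here is a minimal sketch of one way such a comparison might be run--purely illustrative, with fabricated data and hypothetical variable names, assuming Python with pandas and statsmodels: pool the two samples, let the WTP function vary with a sample indicator, and F-test the restriction that both samples share one function (a Chow-style test):

```python
# Minimal sketch of a Chow-style test for whether a willingness-to-pay (WTP)
# function differs between an MTurk sample and another survey panel.
# All data below are fabricated stand-ins; real work would use responses
# collected from each panel.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500  # hypothetical respondents per panel

df = pd.DataFrame({
    "wtp":    rng.gamma(2.0, 10.0, 2 * n),  # stated WTP in dollars
    "income": rng.normal(50, 15, 2 * n),    # income in $1,000s
    "mturk":  np.repeat([0, 1], n),         # 1 = MTurk respondent
})

# Unrestricted model: intercept and income slope may differ by panel.
unrestricted = smf.ols("wtp ~ income * mturk", data=df).fit()
# Restricted model: one common WTP function across panels.
restricted = smf.ols("wtp ~ income", data=df).fit()

# F-test of the restriction. (With this fabricated data the panels are
# drawn identically, so the test should not reject.)
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

Rejecting the restriction would be evidence that the MTurk sample traces out a different WTP function than the comparison panel.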
Note: Ash Morgan and I use an MTurk sample in this paper.
Hat tip: Andrew Gelman