Following Tim's post, here is an excerpt from the Q&A with Ron Wasserstein (the ASA's executive director):
Retraction Watch: You note in a press release accompanying the ASA statement that you’re hoping research moves into a “post p<0.05” era – what do you mean by that? And if we don’t use p values, what do we use instead?
Ron Wasserstein: In the post p<0.05 era, scientific argumentation is not based on whether a p-value is small enough or not. Attention is paid to effect sizes and confidence intervals. Evidence is thought of as being continuous rather than some sort of dichotomy. (As a start to that thinking, if p-values are reported, we would see their numeric value rather than an inequality (p=.0168 rather than p<0.05)). All of the assumptions made that contribute information to inference should be examined, including the choices made regarding which data is analyzed and how. In the post p<0.05 era, sound statistical analysis will still be important, but no single numerical value, and certainly not the p-value, will substitute for thoughtful statistical and scientific reasoning.
In business stats I teach that the p-value is a continuous measure (i.e., p=0.0168), not a discrete one (p<0.05), and the quickest way to get an idea about statistical significance from computer output. I say look for the p-values below 0.10 and then consider the effect sizes. But ignoring effect sizes is not such a problem in applied disciplines. For example, policy-relevant environmental and resource economics is focused on estimation: benefits, costs, and other policy impacts are often the research goal.
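Here is a minimal sketch (in Python, with simulated data, so the numbers and variable names are mine and not from any paper) of what reporting an exact p-value next to the effect size and a confidence interval might look like:

```python
# A minimal sketch of reporting an exact p-value alongside the effect size and a
# 90% confidence interval, using simulated data and statsmodels (not any real study).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)                 # hypothetical regressor (e.g., a policy variable)
y = 0.25 * x + rng.normal(size=n)      # true effect of 0.25 plus noise

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

coef = fit.params[1]                   # effect size (slope estimate)
pval = fit.pvalues[1]                  # exact p-value, not an inequality
lo, hi = fit.conf_int(alpha=0.10)[1]   # 90% confidence interval for the slope

print(f"effect = {coef:.3f}, p = {pval:.4f}, 90% CI = [{lo:.3f}, {hi:.3f}]")
```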
Also, we should be doing more one-tailed tests when theory suggests them. For example, the price (i.e., tax, bid) and scope tests in contingent valuation and choice experiments should be one-sided (I think this is what we did in SEJ 1998 for the scope test but not the price test, ugh). Many scope tests classified as "fail" by Desvousges, Mathews and Train (2012) would be reclassified as "mixed" or "pass" if one-tailed tests were the norm.
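To see how much the choice of tails can matter, here is a small sketch with made-up numbers (the t-statistic and degrees of freedom are hypothetical, not taken from Desvousges, Mathews and Train): when theory predicts the sign and the estimate has that sign, the one-tailed p-value is half the two-tailed one.

```python
# Illustrative only: how a scope-test verdict can flip when theory justifies a
# one-sided alternative. Numbers are hypothetical.
from scipy import stats

t_stat = 1.50      # hypothetical t-statistic on the scope (or bid/tax) coefficient
df = 120           # hypothetical residual degrees of freedom

p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # conventional two-tailed p-value
p_one_sided = stats.t.sf(t_stat, df)            # one-tailed, theory predicts a positive sign

print(f"two-tailed p = {p_two_sided:.3f}")      # about 0.136: a "fail" at the 10% level
print(f"one-tailed p = {p_one_sided:.3f}")      # about 0.068: a "pass" at the 10% level
```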
Cho and Abe ("Is two-tailed testing for directional research hypotheses tests legitimate?") begin like this:
Standard textbooks on statistics clearly state that non-directional research hypotheses should be tested using two-tailed testing while one-tailed testing is appropriate for testing directional research hypotheses (e.g., Churchill and Iacobucci, 2002 and Pfaffenberger and Patterson, 1987). However, during the actual conduct of statistical testing, this advice is not often heeded. According to our observation of 492 recent empirical articles that have used structural equation modeling (SEM), regression analysis, and analysis of variance (ANOVA) in five selected marketing research-related journals, the Journal of Marketing, Journal of Marketing Research, Marketing Science, Journal of Consumer Research, and Advances in Consumer Research (2001–2005), there were 2703 (N = 2703) research hypotheses in total. Overall, 90.9% (n = 2458) of them are expressed in directional form, but only 9.1% (n = 245) of them are described in non-directional form.
How did research evolve to this point? Is it just laziness? Or did someone prominent criticize a paper for having weak statistics because the author used theory to inform inference?
I don't have any problem if researchers report that a one-tailed test is statistically significant at the 90% confidence level (i.e., a t-statistic of 1.37 is enough to justify discussion of an interesting result; economics is a social science, after all). I think I'll put this into practice and see what happens (CNREP is just around the corner).
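For reference, here is a quick check (assuming a one-sided alternative at the 10% level) of the critical values that a t-statistic of 1.37 is up against:

```python
# Critical t values at the 10% significance level, one-tailed vs. two-tailed,
# for a few degrees of freedom.
from scipy import stats

for df in (10, 30, 120):
    one_tail_10 = stats.t.ppf(0.90, df)    # one-tailed cutoff, alpha = 0.10
    two_tail_10 = stats.t.ppf(0.95, df)    # two-tailed cutoff, alpha = 0.10
    print(f"df={df:>3}: one-tailed 10% cutoff = {one_tail_10:.2f}, "
          f"two-tailed 10% cutoff = {two_tail_10:.2f}")

# With large samples the one-tailed 10% cutoff is about 1.28 (and roughly 1.37 at df = 10),
# so a t-statistic of 1.37 clears the one-sided threshold even though it misses the
# two-sided 10% cutoff of about 1.65.
```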
All that said, I've been guilty as a referee of scolding authors for discussing coefficients where p>0.10.