How big is your P? I'm sure Donald Trump could answer that, but I'm talking about the big news today: statisticians think the general science/social science/pseudo-science community has been using P-values wrong, and that this has affected the state of scientific research and publication in general. I'm not a statistician, but I play one in my economics classes. I used to think I knew what a P-value was, but after reading a number of convoluted (and some not-so-convoluted) attempts at defining it, I'm no longer sure I know what I thought I knew. So here's my understanding of a P-value in overly simplistic form (typical for Env-Econ). Let me know where I've gone wrong.
Suppose I have a hunch that male students at Ohio State are taller than female students on average. But I can't measure every student at Ohio State, so instead I am going to use the heights of the male students and the female students in my class of 20 students to see if I can learn anything about all students at Ohio State. First I am going to make an ass out of you and me and assume that the 20 students in my class are drawn completely at random with respect to their height from the general population of students at Ohio State. That is, there's no correlation between the students' heights and the likelihood they will end up in my class. Everyone in the full population of students has an equal probability of ending up in my class.
Next, I am going to play dumb and assume that male and female students at Ohio State are the same height on average. Under that assumption (called the null hypothesis), I would expect that the male and female students in my class are going to be roughly the same height, on average. But because I am only drawing a small subsample from the population, I am not going to expect the null hypothesis to hold exactly with my subsample of 20 students. Or put another way, if we were to repeatedly redraw samples of 20 students from the full population of students, we would expect to get different average heights for each subsample, but we would expect the average of the averages to be the same as the population average if we take a large enough number of subsamples. This is called sampling variation, and it is this sampling variation that creates the need for a P-value.
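Here's a rough sketch of that sampling variation in Python. Everything in it is made up for illustration: the population of 50,000 students, the average height of 170 cm, and the 10 cm standard deviation are my assumptions, not Ohio State data. It draws a lot of classes of 20 and shows that the class averages bounce around, while the average of the averages lands close to the population average.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 50,000 student heights in cm (made-up numbers).
population = [random.gauss(170, 10) for _ in range(50_000)]
population_mean = statistics.fmean(population)

# Repeatedly draw classes of 20 students and record each class's average height.
sample_means = [
    statistics.fmean(random.sample(population, 20))
    for _ in range(5_000)
]

spread = statistics.stdev(sample_means)          # how much class averages vary
mean_of_means = statistics.fmean(sample_means)   # should sit close to population_mean

print(f"population mean:     {population_mean:.2f}")
print(f"average of averages: {mean_of_means:.2f}")
print(f"spread of averages:  {spread:.2f}")
```

The spread of the class averages is the sampling variation: no single class of 20 should be expected to match the population exactly.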
Now to test the null hypothesis that male and female students are the same height on average, I calculate a t-statistic for differences in means using the data I have from my sample of 20 students. I then calculate the corresponding P-value under the assumption that the null hypothesis is true.
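For the curious, here's a minimal sketch of that calculation. I'm assuming the Welch (unequal variances) version of the t-statistic, which is only one of several ways to do a difference-in-means test. The heights are invented, and so is the 10/10 split of the class. Turning the t-statistic into a P-value needs the t distribution, which a stats package would handle, so the sketch stops at the statistic.

```python
import math
import statistics

# Made-up heights in cm for a hypothetical class of 20 (10 men, 10 women).
men = [178, 182, 171, 175, 169, 185, 174, 180, 177, 172]
women = [165, 170, 162, 174, 168, 160, 171, 166, 175, 163]

def welch_t(a, b):
    """Welch's t-statistic: difference in means over its estimated standard error."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff / se

t_stat = welch_t(men, women)
print(f"difference in means: {statistics.fmean(men) - statistics.fmean(women):.1f} cm")
print(f"t-statistic: {t_stat:.2f}")
```

The t-statistic just rescales the gap in average heights by how noisy that gap is expected to be in a sample this small.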
So suppose I get a P-value of 0.04. What does that mean? The P=0.04 means that if the null hypothesis were true and I were to redraw 100 new random samples of 20 students from the full population of Ohio State students, I would expect only about 4 of those new samples to generate differences in means equal to or larger than the difference between male and female heights I observe in my class*.
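The redrawing story can be acted out directly. The sketch below is a simplification under loudly stated assumptions: a null world where men and women share one made-up height distribution (170 cm average, 10 cm standard deviation), a hypothetical observed gap of 8 cm, and a class that splits into 10 men and 10 women. It compares raw differences in means rather than t-statistics, and it counts only redraws where the gap is at least as large as the observed one, in one direction.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed null world: men and women share ONE height distribution (made-up numbers).
NULL_MEAN_CM, NULL_SD_CM = 170, 10
observed_diff = 8.0   # hypothetical gap (men minus women) seen in my class, in cm
n_per_group = 10      # assumes the class of 20 splits into 10 men and 10 women
n_redraws = 20_000

def simulated_diff():
    """Difference in average height for one class drawn from the null world."""
    men = [random.gauss(NULL_MEAN_CM, NULL_SD_CM) for _ in range(n_per_group)]
    women = [random.gauss(NULL_MEAN_CM, NULL_SD_CM) for _ in range(n_per_group)]
    return statistics.fmean(men) - statistics.fmean(women)

# Count redraws whose gap is at least as large as the one observed (one-tailed).
as_extreme = sum(simulated_diff() >= observed_diff for _ in range(n_redraws))
p_value = as_extreme / n_redraws

print(f"simulated one-tailed P-value: {p_value:.3f}")
```

With these invented numbers the share of redraws with a gap that big should come out somewhere in the neighborhood of 4 in 100, which is all a P-value of 0.04 is claiming.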
So a P-value just tells us how likely it is that we would see data at least as extreme as ours if the data generating process were consistent with the null hypothesis. It is not the probability that the null hypothesis itself is true.
Alright, now tell me what I screwed up.
*Yes, John, I'm ignoring the one-tail/two-tail issues.