Reports of statistical tests usually include a measure called the p-value. If a report does not, the analysis is probably not very reliable. P-values are commonly used when comparing two or more groups. They are needed because analysts rarely have access to the whole population and work with samples instead.
Sample and Population
A population includes all the members of one particular group. For example, the world's female and male populations are about 4 billion each. Imagine you wanted to see whether females differ in height from males. For obvious reasons, it is practically impossible to measure every female and male in the world. Does this mean that you cannot try to assess whether there is a difference in height between these groups? Statisticians do not think so! That is where samples come into play.
A sample is simply a subset of the population. If you are still interested in the height difference between males and females, you can select a random sample, that is, a group of people chosen at random from the population, and analyze it. Unfortunately, your sample might not be representative. Perhaps, by chance, the males in your sample are unusually tall and the females are unusually short. That is where the p-value comes into play.
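To see how much chance alone can distort a small sample, here is a minimal sketch in Python. It assumes a made-up population in which males and females have exactly the same average height (169 cm, with a standard deviation of 7 cm; both numbers are illustrative assumptions, not real data) and repeatedly draws samples of 10 from each group.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: both groups share the same true mean height.
# The mean (169 cm) and standard deviation (7 cm) are illustrative values.
def draw_sample(n, mean=169.0, sd=7.0):
    return [random.gauss(mean, sd) for _ in range(n)]

def sample_difference(n=10):
    """Difference in average height between one male and one female sample."""
    males = draw_sample(n)
    females = draw_sample(n)
    return statistics.mean(males) - statistics.mean(females)

# Repeat the sampling many times and look at how the difference varies.
differences = [sample_difference() for _ in range(1000)]
print(f"Smallest difference: {min(differences):.1f} cm")
print(f"Largest difference:  {max(differences):.1f} cm")
```

Even though the two groups in this sketch are identical on average, individual pairs of samples can differ by several centimetres in either direction.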
P-value
Let’s say that you have selected a sample of 10 males and 10 females. You measured them and discovered a 2 cm difference between your samples: the males are on average 170 cm tall and the females 168 cm. If you had measured the whole population and observed the same difference, you could confidently say that males are taller than females. However, you used a sample, which means that you need to take some additional steps before drawing any conclusions. You need to use a test to see whether the 2 cm difference is statistically significant. In other words, you need to see whether your p-value is lower than 0.05 (5%). The p-value is the probability of obtaining results at least as extreme as the ones from your two samples if there were actually no difference between males and females in the population. In the context of our question, the p-value shows us the probability of observing a difference of at least 2 cm between the males and females in our sample when the average heights of males and females in the population are actually equal.
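One way to make this definition concrete is a permutation test, sketched below in Python. This is only one of several tests that could be used here, and the individual heights are invented for illustration so that the averages match the 170 cm and 168 cm in the example. The idea: if there were no real difference, the labels "male" and "female" would not matter, so we shuffle them many times and count how often a difference of at least 2 cm appears by chance.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Made-up heights in cm, chosen so the sample means are 170 and 168.
males = [162, 175, 168, 171, 180, 165, 173, 169, 166, 171]
females = [160, 172, 166, 170, 175, 163, 169, 171, 164, 170]

def mean(values):
    return sum(values) / len(values)

observed = mean(males) - mean(females)  # the 2 cm difference

# Under "no difference", group labels are interchangeable, so shuffle them.
pooled = males + females
n_males = len(males)
n_shuffles = 10_000
at_least_as_extreme = 0

for _ in range(n_shuffles):
    random.shuffle(pooled)
    diff = mean(pooled[:n_males]) - mean(pooled[n_males:])
    # Two-sided: count differences at least as large in either direction.
    # The small tolerance guards against floating-point rounding.
    if abs(diff) >= abs(observed) - 1e-9:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_shuffles
print(f"Observed difference: {observed:.1f} cm")
print(f"Estimated p-value:   {p_value:.3f}")
```

With these invented numbers the estimated p-value comes out well above 0.05, so a 2 cm difference in samples this small would not count as statistically significant.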
If you are familiar with alternative and null hypotheses, then another way to describe the p-value is as the probability of finding a difference between two groups at least as large as the one observed when the null hypothesis is in fact true.
The most common p-value threshold is 5% or 0.05. That is, a result is considered statistically significant if p < 0.05. But this threshold is arbitrary, and different researchers use different thresholds to assess significance depending on how stringent they want their criteria to be. For instance, 1% (0.01) and 10% (0.1) are other common significance levels.
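The short Python sketch below shows how the same result can be significant under one threshold and not under another. The p-value of 0.03 and the helper name is_significant are hypothetical, chosen only for illustration.

```python
def is_significant(p_value, alpha=0.05):
    """Return True if the p-value is below the chosen threshold (alpha)."""
    return p_value < alpha

p_value = 0.03  # hypothetical result from some test

# Check the same p-value against three common thresholds.
verdicts = {alpha: is_significant(p_value, alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, significant in verdicts.items():
    label = "significant" if significant else "not significant"
    print(f"threshold {alpha:.2f}: {label}")
```

Here the result passes the 5% and 10% thresholds but fails the stricter 1% threshold, which is why the threshold should be chosen before looking at the data.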
