What is P-Value? – Understanding the meaning, math and methods (2024)

P Value is a probability score that is used in statistical tests to establish the statistical significance of an observed effect. Though p-values are commonly used, their definition and meaning are often not very clear even to experienced Statisticians and Data Scientists. In this post I will attempt to explain the intuition behind the p-value as clearly as possible.

P Value Intuition and Simplest Explanation. Photo by Clarissa Pambianco.

Introduction

In Data Science interviews, one of the frequently asked questions is ‘What is P-Value?’.

Believe it or not, even experienced Data Scientists often fail to answer this question. This is partly because of the way statistics is taught and the definitions available in textbooks and online sources.

According to the American Statistical Association,
“a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”

That’s hard to grasp, yes?

Alright, let’s understand what the p-value really is in small, meaningful pieces so that ultimately it all makes sense.

When and how is p-value used?

To understand p-value, you need to understand some background and context behind it. So, let’s start with the basics.

p-values are reported whenever you perform a statistical significance test (like a t-test, chi-square test, etc.). These tests typically return a computed test statistic and the associated p-value. This reported value is used to establish the statistical significance of the relationships being tested.

So, whenever you see a p-value, there is an associated statistical test.

That means a hypothesis test is being conducted, with a defined Null Hypothesis (H0) and a corresponding Alternate Hypothesis (HA).

The p-value reported is used to make a decision on whether the null hypothesis being tested can be rejected or not.

Let’s understand a little bit more about the null and alternate hypotheses.

Now, how to frame a Null hypothesis in general?

While the null hypothesis itself changes with every statistical test, there is a general principle to frame it:

The null hypothesis assumes there is ‘no effect’ or ‘no relationship’ by default.

For example: if you are testing whether a drug treatment is effective or not, then the null hypothesis will assume there is no difference in outcome between the treated and untreated groups. Likewise, if you are testing whether one variable influences another (say, car weight influences the mileage), then the null hypothesis will postulate there is no relationship between the two.

It simply implies the absence of an effect.

Examples of Statistical Tests reporting out p-value

Here are some examples of Null hypothesis (H0) for popular statistical tests:

Welch Two Sample t-Test: The true difference in means of two samples is equal to 0
Linear Regression: The beta coefficient(slope) of the X variable is zero
Chi Square test: There is no difference between expected frequencies and observed frequencies.

Get the feel?

But what would the alternate hypothesis look like?

The alternate hypothesis (HA) is always framed to negate the null hypothesis. The corresponding HA for the above tests are as follows:

Welch Two Sample t-Test: The true difference in means of two samples is NOT equal to 0
Linear Regression: The beta coefficient(slope) of the X variable is NOT zero
Chi Square test: The difference between expected frequencies and observed frequencies is NOT zero.

What p-value really is

Now, back to the discussion on p-value.

Along with every statistical test, you will get a corresponding p-value in the results output.

What is this meant for?

It is used to determine if the data is statistically incompatible with the null hypothesis.

Not clear eh?

Let me put it in another way.

The P Value basically helps to answer the question: ‘Does the data really represent the observed effect?’.

This leads us to a more mathematical definition of P-Value.

The P Value is the probability of seeing the effect (E) when the null hypothesis is true.

If you think about it, we want this probability to be very low.

Having said that, it is important to remember that the p-value refers not only to what we observed but also to observations more extreme than what was observed. That is why the formal definition of the p-value contains the statement ‘would be equal to or more extreme than its observed value.’
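
In symbols, the definition can be sketched as follows. The notation is mine, not the ASA’s: T stands for the test statistic, t_obs for its observed value, and H_0 for the null hypothesis. The first line is the upper-tailed case; the second is the two-sided case, assuming a null distribution that is symmetric around zero.

```latex
p = P\left(T \ge t_{\mathrm{obs}} \mid H_0\right)
\qquad\text{(upper-tailed)}

p = P\left(|T| \ge |t_{\mathrm{obs}}| \mid H_0\right)
\qquad\text{(two-sided)}
```

The "greater than or equal to" sign is what carries the ‘equal to or more extreme’ part of the definition.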

How is p-value used to establish statistical significance

You now know that the p-value measures the probability of seeing the effect (or a more extreme one) when the null hypothesis is true.

A sufficiently low value is required to reject the null hypothesis.

Notice how I have used the term ‘Reject the Null Hypothesis’ instead of stating the ‘Alternate Hypothesis is True’.

That’s because we have tested the effect against the null hypothesis only.

So, when the p-value is low enough, we reject the null hypothesis and conclude the observed effect holds.

But how low is ‘low enough’ for rejecting the null hypothesis?

This level of ‘low enough’ cutoff is called the alpha level, and you need to decide it before conducting a statistical test.

Practical Guidelines to set the cutoff of Statistical Significance (alpha level)

Let’s first understand what the alpha level is.

It is the cutoff probability for p-value to establish statistical significance for a given hypothesis test.
For an observed effect to be considered as statistically significant, the p-value of the test should be lower than the pre-decided alpha value.

Typically for most statistical tests (but not always), alpha is set as 0.05.

In that case, the p-value has to be less than 0.05 to be considered statistically significant.

What happens if it is say, 0.051?

It is still considered not significant. We do NOT call it weakly significant.
It is either black or white. There is no gray with respect to statistical significance.
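
The black-or-white rule can be written as a one-line check. This is only a minimal sketch: the function name is_significant is made up for illustration, and the default alpha of 0.05 is just the usual convention discussed here.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True only when the p-value is strictly below the chosen alpha."""
    return p_value < alpha


# Hypothetical p-values on either side of the cutoff.
for p in (0.049, 0.05, 0.051):
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"p = {p}: {verdict}")
```

Note that the alpha is an argument you fix before running the test, not something you tune after looking at the p-value.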

Now, how to set the alpha level?

Well, the usual practice is to set it to 0.05.

But when the occurrence of the event is rare, you may want to set a very low alpha. The rarer it is, the lower the alpha.

For example, in CERN’s Large Hadron Collider experiment to detect the Higgs boson (a very rare event), the alpha level was set as low as the 5 sigma level, which means a p-value of less than about 3 * 10^-7 is required to reject the null hypothesis.
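
To see where a number like 3 * 10^-7 comes from, here is a small sketch that converts a sigma level into a one-sided tail probability of the standard normal distribution. The helper name one_sided_p_from_sigma is my own, and treating 5 sigma as a one-sided normal tail is an assumption on my part.

```python
import math


def one_sided_p_from_sigma(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond n_sigma standard deviations."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


p_five_sigma = one_sided_p_from_sigma(5.0)
print(f"{p_five_sigma:.3e}")  # roughly 2.87e-07
```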

Whereas for a more likely event, it can go up to 0.1.

Secondly, the more samples (number of observations) you have, the lower the alpha level should be, because even a small effect can be made to produce a lower p-value just by increasing the number of observations.
The opposite is also true: a large effect can be made to produce a high p-value by reducing the sample size.
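
Here is a rough sketch of that sample-size effect. It keeps the effect fixed (a difference in means of 0.1 with a standard deviation of 1, both made-up numbers) and only changes the number of observations per group. For simplicity it uses a two-sample z-test with a known standard deviation, which is my simplification and not the t-test you would normally run.

```python
import math


def two_sample_z_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means, assuming a known common sd."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail area.
    return math.erfc(abs(z) / math.sqrt(2))


sample_sizes = [20, 200, 2000, 20000]
p_values = [
    two_sample_z_p_value(mean_diff=0.1, sd=1.0, n_per_group=n) for n in sample_sizes
]
for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:>6}: p = {p:.3g}")
```

The effect size never changes, yet the p-value shrinks steadily as n grows.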

In case you don’t know how likely the event is to occur, it’s a common practice to set alpha to 0.05. But, as a rule of thumb, never set the alpha greater than 0.1.

Having said that, alpha=0.05 is mostly an arbitrary choice. Then why do most people still use alpha=0.05?
That’s because it is what is taught in college courses and has traditionally been used by the scientific community and publishers.

What P Value is Not

Given the uncertainty around the meaning of p-value, it is very common to misinterpret and use it incorrectly.

Some of the common misconceptions are as follows:

P-Value is the probability of making a mistake. Wrong!
P-Value measures the importance of a variable. Wrong!
P-Value measures the strength of an effect. Wrong!

A smaller p-value does not signify the variable is more important or even a stronger effect.

Why?

Because, like I mentioned earlier, any effect, no matter how small, can be made to produce a smaller p-value simply by increasing the number of observations (sample size).

Likewise, a larger p-value does not imply a variable is not important.

For sound communication, it is necessary to report not just the p-value but also the sample size along with it. This is especially necessary if the experiments involve different sample sizes.

Secondly, making inferences and business decisions should not be based only on the p-value being lower than the alpha level.

Analysts should understand the business sense, understand the larger picture and bring out the reasoning before making an inference and not just rely on the p-value to make the inference for you.

Does this mean the p-value is not useful anymore?

Not really. It is a useful tool because it provides an objective standard for everyone to assess. It’s just that you need to use it the right way.

Example: How to find p-value for linear regression

Linear regression is a traditional statistical modeling algorithm that is used to predict a continuous variable (a.k.a dependent variable) using one or more explanatory variables.

Let’s see an example of extracting the p-value with linear regression using the mtcars dataset. In this dataset the specifications of the vehicles and their mileage performance are recorded.

We want to use linear regression to test if one of the specs, the ‘weight’ (wt) of the vehicle, has a significant (linear) relationship with the ‘mileage’ (mpg).

This can be conveniently done using Python’s statsmodels library. But first, let’s load the data.

With statsmodels library

# Load Packages
import pandas as pd
import statsmodels.formula.api as smf

# Import
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/mtcars.csv', usecols=['mpg', 'wt'])
df.head()

	mpg	wt
0	4.582576	2.620
1	4.582576	2.875
2	4.774935	2.320
3	4.626013	3.215
4	4.324350	3.440

The X (wt) and Y (mpg) variables are ready.

Null Hypothesis (H0): The slope of the line of best fit (a.k.a beta coefficient) is zero
Alternate Hypothesis (H1): The beta coefficient is not zero.

To implement the test, use the smf.ols() function available in the formula.api of statsmodels. You can pass in the formula itself as the first argument and call fit() to train the linear model.

# Train model
model = smf.ols('mpg ~ wt', data=df).fit()

Once the model is trained, call model.summary() to get a comprehensive view of the statistics.

# Results
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.780
Model:                            OLS   Adj. R-squared:                  0.773
Method:                 Least Squares   F-statistic:                     106.7
Date:                Mon, 30 Sep 2019   Prob (F-statistic):           2.15e-11
Time:                        14:04:29   Log-Likelihood:                -7.3950
No. Observations:                  32   AIC:                             18.79
Df Residuals:                      30   BIC:                             21.72
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      6.3553      0.194     32.741      0.000       5.959       6.752
wt            -0.5969      0.058    -10.328      0.000      -0.715      -0.479
==============================================================================
Omnibus:                        1.783   Durbin-Watson:                   1.363
Prob(Omnibus):                  0.410   Jarque-Bera (JB):                1.523
Skew:                           0.514   Prob(JB):                        0.467
Kurtosis:                       2.707   Cond. No.                         12.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The p-value is located under the P>|t| column in the wt row. If you want to extract that value into a variable, use model.pvalues.

# P-Values
print(model.pvalues)

Intercept    4.891527e-25
wt           2.146343e-11
dtype: float64

Since the p-value is much lower than the significance level (0.01), we reject the null hypothesis that the slope is zero and conclude that the data really represents the effect.
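
If you are curious how the P>|t| value relates to the t column, the sketch below recomputes a two-sided p-value from the reported t statistic for wt (-10.328) and the 30 residual degrees of freedom. It uses scipy instead of statsmodels, which is my choice for a quick cross-check and not part of the workflow above. Because the t statistic is rounded in the summary, expect the result to land close to, not exactly on, the value printed by model.pvalues.

```python
from scipy import stats

# Values read off the regression summary for the 'wt' row.
t_statistic = -10.328
df_residuals = 30  # 32 observations minus 2 estimated parameters

# Two-sided p-value: probability in both tails beyond |t| under the null.
p_value = 2 * stats.t.sf(abs(t_statistic), df=df_residuals)
print(f"{p_value:.3e}")
```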

Well, that was just one example of computing p-value.

P-values can be associated with numerous other statistical tests. If you are interested in finding out more about how they are used, see more examples of statistical tests with p-values.

Conclusion

In this post we covered what exactly a p-value is and how to use it (and how not to). We also saw a Python example of computing the p-value associated with linear regression.

Now with this understanding, let’s conclude: what is the difference between a statistical model and a machine learning model?

Well, while both statistical and machine learning models are associated with making predictions, there can be many differences between the two. But most simply put, any predictive model that has p-values associated with it is considered a statistical model.

Happy learning!

What Next?

To understand how exactly the P-value is computed, check out the example using the T-Test.

FAQs

What does the p-value represent in math? ›

The P value is defined as the probability, under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability and measures how likely a difference at least as large as the observed one would be if chance alone were at work.

What is your understanding of the p-value? ›

The P value is the probability, for a given statistical model, that, when the null hypothesis is true, the statistical summary would be equal to or more extreme than the actual observed results [2].

How do you explain p-value to a child? ›

If your car wins the race, you can say that it's faster than the slow car. The p-value is like a score that tells you how likely it is that your car would win by that much just because of luck, if the two cars were really equally fast. The lower the p-value, the harder it is to blame the win on luck alone.

How do you calculate the p-value in simple terms? ›

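Let ts be the value of the test statistic and cdf the cumulative distribution function of that statistic under the null hypothesis, so that cdf(ts) is the probability of a value at or below ts.
For a lower-tailed test, the p-value is equal to this probability; p-value = cdf(ts).
For an upper-tailed test, the p-value is equal to one minus this probability; p-value = 1 - cdf(ts).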
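
A minimal sketch of these two formulas, assuming the test statistic follows a standard normal distribution under the null hypothesis (my assumption, since the answer does not name a distribution). The function name and the example statistic of -1.5 are made up for illustration.

```python
from statistics import NormalDist


def one_tailed_p_values(test_statistic: float) -> tuple[float, float]:
    """Return (lower-tailed, upper-tailed) p-values under a standard normal null."""
    cdf_value = NormalDist().cdf(test_statistic)
    return cdf_value, 1.0 - cdf_value


lower_p, upper_p = one_tailed_p_values(-1.5)
print(f"lower-tailed p = {lower_p:.4f}, upper-tailed p = {upper_p:.4f}")
```

The two values always add up to 1, so a statistic that looks extreme for one tail looks unremarkable for the other.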

What is the p-value and significance level for dummies? ›

The p-value is compared to the pre-defined alpha value. The result of the experiment is significant when the p-value is less than or equal to the alpha value, signifying that a change was detected and the null hypothesis is rejected.

How do you interpret the meaning of the p-value quizlet? ›

The P value definition: the probability, under the assumptions of the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed. Simple definition of P value: P value is the probability of getting the observed result or more extreme under the null model.

How do you interpret p-value in confidence interval? ›

In accordance with the conventional acceptance of statistical significance at a P-value of 0.05 or 5%, CIs are frequently calculated at a confidence level of 95%. In general, if an observed result is statistically significant at a P-value of 0.05, then the value stated by the null hypothesis should not fall within the 95% CI.
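
This link between the 95% CI and the 0.05 cutoff can be checked numerically. The sketch below assumes a normally distributed estimate with a known standard error (a z-test), and the two estimates it tries (1.0 and 2.5, each with a standard error of 1.0) are made-up numbers.

```python
from statistics import NormalDist


def z_test_summary(estimate: float, standard_error: float, null_value: float = 0.0):
    """Two-sided p-value and 95% confidence interval for a normally distributed estimate."""
    std_normal = NormalDist()
    z = (estimate - null_value) / standard_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    critical = std_normal.inv_cdf(0.975)  # about 1.96
    interval = (estimate - critical * standard_error,
                estimate + critical * standard_error)
    return p_value, interval


for estimate in (1.0, 2.5):
    p_value, (low, high) = z_test_summary(estimate, standard_error=1.0)
    print(f"estimate = {estimate}: p = {p_value:.3f}, "
          f"95% CI = ({low:.2f}, {high:.2f}), contains 0: {low <= 0.0 <= high}")
```

In this setup the interval excludes the null value exactly when the p-value drops below 0.05.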

What is an example of a P value? ›

P-values are expressed as decimals and can be converted into percentages. For example, a p-value of 0.0237 is 2.37%, which means that if the null hypothesis were true, there would be a 2.37% chance of seeing results at least as extreme as yours. The smaller the P-value, the stronger the evidence against the null hypothesis.

How do you describe p-value in research? ›

P-values are a continuum (between 0 and 1) that provide a measure of the strength of evidence against H0. For example, a p-value of 0.066 indicates that, if H0 were true, we would observe a test statistic as large as or larger than the one we obtained 6.6% of the time.

What is p-value and confidence interval for dummies? ›

p-values simply provide a cut-off beyond which we assert that the findings are 'statistically significant' (by convention, this is p<0.05). A confidence interval that embraces the value of no difference between treatments indicates that the treatment under investigation is not significantly different from the control.

How do you write the p-value? ›

If p values are reported, follow standard conventions for decimal places: for p values less than 0.001, report as 'p<0.001'; for p values between 0.001 and 0.01, report the value to the nearest thousandth; for p values greater than or equal to 0.01, report the value to the nearest hundredth; and for p values greater ...
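
The three rules that are fully stated here can be sketched as a small formatter. The function name format_p_value is my own, and the last rule, which is cut off in the text, is deliberately not implemented.

```python
def format_p_value(p: float) -> str:
    """Format a p-value using the three reporting rules that are fully stated."""
    if p < 0.001:
        return "p<0.001"
    if p < 0.01:
        return f"p={p:.3f}"  # nearest thousandth
    return f"p={p:.2f}"      # nearest hundredth


for p in (0.0004, 0.0042, 0.0237, 0.2):
    print(format_p_value(p))
```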

How do you find p-value by hand? ›

To compute a p-value by hand all you do is find the area “outside” of the test ratio value from step 6 in 'normal curve' – that is your p-value. There are two areas “outside” of your test ratio from step 6 – one on each side of the normal curve. The p-value is the area to the “outside” of the z-scores of -2.0 and 2.0.
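
The area ‘outside’ z-scores of -2.0 and 2.0 can also be checked in a couple of lines. This sketch assumes the standard normal curve, as the answer does, and uses Python’s built-in statistics module.

```python
from statistics import NormalDist

z_score = 2.0
std_normal = NormalDist()

left_tail = std_normal.cdf(-z_score)      # area below -2.0
right_tail = 1 - std_normal.cdf(z_score)  # area above +2.0
p_value = left_tail + right_tail
print(round(p_value, 4))  # 0.0455
```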

How do you calculate the p-value of a function? ›

How do I find p-value from t?

Left-tailed t-test: p-value = cdf_{t,d}(t_score)
Right-tailed t-test: p-value = 1 - cdf_{t,d}(t_score)
Two-tailed t-test: p-value = 2 × cdf_{t,d}(−|t_score|) or p-value = 2 - 2 × cdf_{t,d}(|t_score|)
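
These formulas translate almost directly into code. In the sketch below I read the cdf with subscripts t and d as the cumulative distribution function of the t-distribution with d degrees of freedom, which is my interpretation of the notation. It uses scipy, and the t score of 2.0 with 10 degrees of freedom is a made-up example.

```python
from scipy import stats


def t_test_p_values(t_score: float, d: int) -> dict:
    """Left-, right- and two-tailed p-values for a t score with d degrees of freedom."""
    cdf = stats.t(df=d).cdf
    return {
        "left": cdf(t_score),
        "right": 1 - cdf(t_score),
        "two_sided": 2 * cdf(-abs(t_score)),
        "two_sided_alt": 2 - 2 * cdf(abs(t_score)),
    }


result = t_test_p_values(t_score=2.0, d=10)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Both two-tailed expressions give the same number, because the t-distribution is symmetric around zero.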

What does p-value of 0.05 mean? ›

Several ideas about P values are commonly misunderstood. The misconceptions are as follows: that if the P value is 0.05, the null hypothesis has a 5% chance of being true; that a nonsignificant P value means that (for example) there is no difference between groups; that a statistically significant finding (P is below a predetermined threshold) is clinically important; that studies that yield P values on ...

What does a P value of .05 represent? ›

Again: A p-value of less than .05 means that there is less than a 5 percent chance of seeing these results (or more extreme results), in the world where the null hypothesis is true. This sounds nitpicky, but it's critical.

What does a P value of 0.9 mean? ›

If the null hypothesis holds, then the p-values of your statistic have a uniform distribution. A p-value of 0.2 just means that you'd get a statistic greater than that 20% of the time under the null hypothesis; a p-value of 0.9 means you'd see a greater value 90% of the time.
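
The uniform-distribution claim is easy to check with a small simulation. This is only a sketch under my own setup: each simulated experiment draws 30 values from a normal distribution with mean 0 and standard deviation 1, so the null hypothesis (mean = 0) is true by construction, and a two-sided z-test with known standard deviation is applied to each sample.

```python
import math
import random


def simulate_null_p_values(n_experiments: int, n_samples: int, seed: int = 0) -> list:
    """Two-sided z-test p-values for samples drawn from N(0, 1), where H0 is true."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        z = sample_mean * math.sqrt(n_samples)  # standard error is 1 / sqrt(n)
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))
    return p_values


p_values = simulate_null_p_values(n_experiments=5000, n_samples=30)
for cutoff in (0.05, 0.2, 0.5, 0.9):
    share = sum(p <= cutoff for p in p_values) / len(p_values)
    print(f"share of p-values <= {cutoff}: {share:.3f}")
```

If the p-values are uniform, each share should come out close to its cutoff, which also shows why an alpha of 0.05 produces about 5% false alarms when the null hypothesis is true.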

What does a high p-value indicate? ›

When the p-value is high, it means that results like the ones we are seeing would be quite likely even if only random chance were at work. A low p-value, however, means that such results would be unlikely to come from random chance alone.