Chi Square Test Of Independence – Guide & Examples

02.04.23 Types of chi-square Time to read: 9min

The chi-square test of independence is a statistical test used to determine whether two categorical variables are associated or independent. A way to assess the independence or dependence of variables is to use a contingency table, allowing you to compare the expected frequencies with the observed ones. In the realm of statistics, the chi-square test serves as a valuable tool across fields, such as marketing, social science, and medical research.

Chi-Square Test of Independence – In a Nutshell

  • The chi-square test of independence determines if two categorical variables are related.
  • With a contingency table, the expected and observed frequencies can be compared.
  • The null hypothesis assumes no relationship between the variables, while the alternative hypothesis assumes that a relationship exists.
  • The chi-square test of independence calculates the chi-square statistic.
  • The p-value is then calculated and used to decide whether to reject the null hypothesis.
  • The chi-square test of independence is one of several related tests, which are outlined in this article.

Definition: Chi-square test of independence

The chi-square test of independence is a statistical test used to determine the association between two categorical variables. The chi-square test of independence, also known as Pearson’s chi-square test, is a widely used nonparametric test because it does not rely on the assumptions of parametric tests, particularly the assumption of a normal distribution.

The chi-square test of independence is calculated by comparing the observed frequencies of categories in a contingency table with the frequencies that would be expected if the variables were independent. The components needed for the test are the observed frequencies, expected frequencies, and degrees of freedom.

Contingency tables

Contingency tables summarize and display the relationship between two categorical variables in the chi square test of independence. They are also known as cross-tabulation tables, two-way frequency tables, or crosstabs.

They are useful for analyzing the relationship between two categorical variables, and they can be used as the basis for statistical tests such as the chi square test of independence.

Example

A contingency table could show the number of males and females who study psychology and those who study history.

The rows would represent gender (male and female), and the columns would represent the subject studied (psychology and history).

Gender    Psychology    History
Male      67            130
Female    124           50
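
To make the layout concrete, the table above can be held in a small nested structure and its row, column, and grand totals computed, since those totals feed into every later step. The following is a minimal Python sketch; the variable names (`observed`, `row_totals`, and so on) are illustrative choices, not part of any library.

```python
# Observed counts from the gender-by-subject table above.
observed = {
    "Male": {"Psychology": 67, "History": 130},
    "Female": {"Psychology": 124, "History": 50},
}

# Row totals: number of students of each gender.
row_totals = {gender: sum(counts.values()) for gender, counts in observed.items()}

# Column totals: number of students in each subject.
subjects = list(next(iter(observed.values())))
col_totals = {s: sum(observed[g][s] for g in observed) for s in subjects}

# Grand total: the sample size.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Male': 197, 'Female': 174}
print(col_totals)   # {'Psychology': 191, 'History': 180}
print(grand_total)  # 371
```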

The chi-square test of independence hypotheses

The chi-square test of independence is used to test whether the observed frequencies of the categories in a contingency table differ significantly from those expected if the variables were independent.

Example

Using the gender and subject table above, the hypotheses could be:

  • Null hypothesis (H0): Gender and subject studied are independent; the proportion of students choosing psychology is the same for males and females.
  • Alternative hypothesis (Ha): Gender and subject studied are associated; the proportion of students choosing psychology differs between males and females.

These hypotheses differ from those of the chi-square goodness of fit test, which involves only one categorical variable. For example, you can collect data on blood types from a sample of 500 individuals and compare the observed frequencies with the frequencies expected from the ABO blood type distribution in the general population. The hypotheses for the chi-square goodness of fit test could then be:

  • Null hypothesis (H0): The distribution of blood types in the population is consistent with the expected distribution.
  • Alternative hypothesis (Ha): The distribution of blood types in the population significantly differs from the expected distribution.

Expected values

Expected values in the context of the chi square test of independence refer to the frequencies that would be expected if the two categorical variables were independent.

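The formula for calculating the expected frequency for each cell of a contingency table is:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

where the row total and column total belong to the row and column that contain the cell, and the grand total is the sample size.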

Example

Consider a study on the relationship between education level and voting behavior. A researcher collects data from a sample of 500 individuals and records their education level (high school, college, graduate school) and voting behavior (voted, did not vote).
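
The example above gives no counts, so as an illustration the expected-frequency calculation (row total times column total, divided by the grand total) can be applied to the gender and subject table from the contingency table section instead. This is a minimal Python sketch; `expected_frequencies` is a hypothetical helper name, and the table is passed as a plain list of rows.

```python
def expected_frequencies(observed):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total, cell by cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Psychology, History.
observed = [[67, 130], [124, 50]]
expected = expected_frequencies(observed)

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(value, 2) for value in row])
```

The expected counts always add up to the same grand total as the observed counts (371 here), which is a quick sanity check on a hand calculation.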

When is the chi square test of independence used?

The chi square test of independence can be used when certain criteria and circumstances are met:

  • The variables under investigation are categorical (nominal or ordinal)
  • The observations are independent of each other, so each individual is counted in exactly one cell
  • The expected frequency count for each cell in a contingency table is at least 5

If these criteria are met, the chi square test of independence can be used to test whether there is a significant association between the two categorical variables.
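
The expected-count criterion can be checked before running the test. The Python sketch below computes the expected counts under independence and reports whether every one of them is at least 5. `meets_expected_count_rule` is a hypothetical helper, the threshold of 5 is the one listed above, and the first table reuses the gender and subject counts.

```python
def meets_expected_count_rule(observed, minimum=5):
    """Check that every expected cell count under independence is at least `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return all(
        r * c / grand_total >= minimum
        for r in row_totals
        for c in col_totals
    )


print(meets_expected_count_rule([[67, 130], [124, 50]]))  # True
# A small illustrative table (made-up counts) that fails the rule.
print(meets_expected_count_rule([[3, 2], [4, 1]]))        # False
```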

Example

You can use the chi square test of independence to investigate the relationship between gender and religion.

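Calculating the test statistic of the chi-square test of independence

The formula for calculating the test statistic of the chi square test of independence is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

  • O is the observed frequency of a cell
  • E is the expected frequency of that cell
  • the sum runs over all cells of the contingency table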

The chi-square test statistic measures the difference between the observed and expected frequencies in a contingency table.

To calculate the test statistic for the chi square test of independence, follow these five steps:

  1. Create a contingency table with the observed frequencies for the two categorical variables.
  2. Calculate the expected frequencies for each cell in the contingency table.
  3. Calculate the difference between each cell’s observed and expected frequencies, and square the difference.
  4. Divide the squared difference by the expected frequency for each cell.
  5. Sum the values obtained in step 4 to get the chi-square test statistic.
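
The five steps can be sketched as a short Python function. This is an illustration rather than a reference implementation: `chi_square_statistic` is a hypothetical name, the table is a plain list of rows, and no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Compute the Pearson chi-square statistic for a table given as a list of rows."""
    # Step 1: the observed contingency table is the input.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Step 2: expected frequency under independence.
            e = row_totals[i] * col_totals[j] / grand_total
            # Steps 3 and 4: squared difference, scaled by the expected frequency.
            # Step 5: accumulate the sum over all cells.
            statistic += (o - e) ** 2 / e
    return statistic


# Gender-by-subject counts from the contingency table section.
print(round(chi_square_statistic([[67, 130], [124, 50]]), 2))
```

For the gender and subject table this gives a statistic of roughly 51.3. Statistical software may report a slightly different value for 2×2 tables if it applies a continuity correction by default.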

1. Table of frequencies

To conduct the chi square test of independence, the first step is to establish a contingency table containing the counts or frequencies of each category of one variable for each category of the other variable.

Example

We want to investigate the relationship between a new medical intervention and patient outcome. We collect data from 140 patients and record whether they received the intervention (yes or no) and whether they had a positive outcome (yes or no). We create a contingency table for the chi square test of independence with the observed frequencies:

Intervention    Outcome    Observed Frequencies
Yes             No         60
No              No         40
No              Yes        30
Yes             Yes        10

2. Calculating O – E

This step of chi square test of independence helps to quantify the extent to which the observed frequencies differ from what would be expected under the assumption of independence between the two variables.

To calculate O – E, an additional column is added to the contingency table to represent the difference between the observed and expected frequencies for each cell.

Using the previous example of the medical intervention and patient outcome, the expected frequencies follow from the row and column totals (for example, 70 × 100 / 140 = 50 for patients who received the intervention and did not have a positive outcome). The contingency table with added columns would be:

Example

Intervention    Outcome    Observed Frequencies    Expected Frequencies    O - E
Yes             No         60                      50                      10
No              No         40                      50                      -10
No              Yes        30                      20                      10
Yes             Yes        10                      20                      -10

3. Calculating (O – E)²

To calculate (O – E)², another column is added to the contingency table. This third step of calculating the chi square test of independence squares the difference between the observed and expected frequency of each cell, so that positive and negative differences do not cancel out.

Using the same example of the medical intervention and patient outcome, the contingency table with the additional columns would be:

Example

Intervention    Outcome    Observed Frequencies    Expected Frequencies    O - E    (O - E)²
Yes             No         60                      50                      10       100
No              No         40                      50                      -10      100
No              Yes        30                      20                      10       100
Yes             Yes        10                      20                      -10      100

4. Calculating (O – E)² / E

To calculate (O – E)² / E, an additional column is added to the contingency table to represent the result of dividing the squared difference between the observed frequency and the expected frequency by the expected frequency for each cell.

Example

Intervention    Outcome    Observed Frequencies    Expected Frequencies    O - E    (O - E)²    (O - E)² / E
Yes             No         60                      50                      10       100         2
No              No         40                      50                      -10      100         2
No              Yes        30                      20                      10       100         5
Yes             Yes        10                      20                      -10      100         5

This step scales the contribution of each cell to the overall chi-square test statistic.

5. Calculating χ²

The last step in the chi square test of independence is to sum the values in the (O – E)² / E column to obtain the overall chi-square test statistic. This test statistic summarizes how strongly the observed frequencies deviate from those expected under independence.

Continuing with the same example of the medical intervention and patient outcome in our chi square test of independence, we can sum the values in the column as follows:

Example

χ² = 2 + 2 + 5 + 5
χ² = 14
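
The columns built up in steps 2 to 5 can also be generated programmatically, which is a useful check on hand calculations. The Python sketch below starts from the four observed counts 60, 40, 30, and 10, derives each expected frequency from the row and column totals of those counts, and prints one line per cell followed by the sum. The data layout and variable names are illustrative assumptions.

```python
# Observed counts keyed by (intervention, positive outcome).
observed = {
    ("Yes", "No"): 60,
    ("No", "No"): 40,
    ("No", "Yes"): 30,
    ("Yes", "Yes"): 10,
}

grand_total = sum(observed.values())

# Marginal totals for each intervention group and each outcome.
intervention_totals = {}
outcome_totals = {}
for (intervention, outcome), count in observed.items():
    intervention_totals[intervention] = intervention_totals.get(intervention, 0) + count
    outcome_totals[outcome] = outcome_totals.get(outcome, 0) + count

chi_square = 0.0
for (intervention, outcome), o in observed.items():
    # Expected frequency under independence for this cell.
    e = intervention_totals[intervention] * outcome_totals[outcome] / grand_total
    contribution = (o - e) ** 2 / e
    chi_square += contribution
    # Columns: intervention, outcome, O, E, O - E, (O - E)^2, (O - E)^2 / E
    print(intervention, outcome, o, e, o - e, (o - e) ** 2, contribution)

print("chi-square:", chi_square)
```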

Performing the chi square test of independence

When performing the chi square test of independence, a large value of the chi-square test statistic suggests that the observed frequencies in the contingency table differ from the expected frequencies under the assumption of independence between the two categorical variables. Whether the difference is statistically significant is decided by comparing the statistic with a critical value.

The six steps to perform the chi square test of independence are:

  1. State the null and alternative hypotheses
  2. Create a contingency table
  3. Calculate the expected frequencies
  4. Calculate the chi-square statistic
  5. Determine the degrees of freedom and the critical value or p-value
  6. Interpret the results and decide whether there is a significant association

1. Calculating the expected frequencies

The first step in using the chi square test of independence is to calculate the expected frequencies for each cell in the contingency table. The formula for calculating the expected frequency for a cell is:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

2. Calculating the chi-square

The second step of the chi square test of independence is to calculate the test statistic (χ²) using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency of each cell.

3. The critical chi-square value

The critical chi-square value can be found in a chi-square distribution table or software, based on the chosen level of significance and the degrees of freedom (df). The formula for degrees of freedom for the chi square test of independence is:

$$df = (r - 1)(c - 1)$$

where r is the number of rows and c is the number of columns in the contingency table.

The significance level is typically set at 0.05 or 0.01.

Example

In a 2×2 contingency table, the critical chi-square value with df=1 and α=0.05 is 3.84.
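
Degrees of freedom depend only on the dimensions of the table: (rows − 1) × (columns − 1). The Python sketch below pairs that calculation with a few standard critical values at α = 0.05, as found in a chi-square distribution table. `degrees_of_freedom` and `CRITICAL_VALUES_05` are hypothetical names, and only df 1 to 4 are included.

```python
# Critical chi-square values at alpha = 0.05, as listed in standard tables.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


df = degrees_of_freedom(2, 2)
print(df, CRITICAL_VALUES_05[df])  # 1 3.841

# A 3x2 table, such as three education levels by two voting outcomes.
df = degrees_of_freedom(3, 2)
print(df, CRITICAL_VALUES_05[df])  # 2 5.991
```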

4. Comparing the chi-square value to the critical value

The next step in the chi square test of independence is to compare the calculated chi-square test statistic to the critical value obtained from the chi-square distribution table or software. If the calculated chi-square test statistic is greater than the critical value, the null hypothesis is rejected and it is concluded that there is a significant association between the two categorical variables.

5. Should the null hypothesis be rejected?

If the calculated chi-square test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant association between the two categorical variables. If the calculated chi-square test statistic is less than or equal to the critical value, the null hypothesis is not rejected, meaning the data do not provide sufficient evidence of an association between the two categorical variables.

Example

If the calculated chi-square test statistic is 10.26 and the critical chi-square value is 3.84, we would reject the null hypothesis and conclude that there is a significant association between the two variables.
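
The decision rule from this example can be written out in a few lines of Python. For a 2×2 table (df = 1) the p-value can be obtained from the standard library alone, because a chi-square variable with one degree of freedom is the square of a standard normal variable; tables with more degrees of freedom would need a statistics library. The function names are hypothetical.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


def decide(chi_square, critical_value):
    """Apply the decision rule: reject H0 only if the statistic exceeds the critical value."""
    if chi_square > critical_value:
        return "reject H0"
    return "fail to reject H0"


statistic = 10.26   # value from the example above
critical = 3.84     # df = 1, alpha = 0.05

print(decide(statistic, critical))     # reject H0
print(p_value_df1(statistic) < 0.05)   # True
```

Comparing the statistic with the critical value and comparing the p-value with α are two views of the same decision, so they lead to the same conclusion.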

Chi-square test of independence vs. other tests

Apart from the chi-square test of independence, other tests are better suited to other scenarios:

Test                          When to use it
Chi-square goodness of fit    When there is only one categorical variable and we want to test whether the observed frequencies fit a known or expected distribution.
Fisher’s exact test           When the sample size is small (typically less than 20) and the expected frequency for one or more cells is less than 5.
McNemar’s test                When the data are paired or matched, such as in a before-and-after study or a matched case-control study.
G test                        As a likelihood-ratio alternative to Pearson’s chi-square test. Like the chi-square test, it relies on a large-sample approximation, so it is not a remedy for small expected frequencies.

FAQs

To perform a chi square test of independence in R, you can use the chisq.test() function, specifying the two categorical variables you want to test for independence. The function returns the test statistic, degrees of freedom, and p-value for the test.

A chi square test of independence is a statistical method used to determine if there is a significant association between two categorical variables.

To perform a chi square test of independence, the researcher creates a contingency table and calculates the chi-square statistic by comparing observed and expected frequencies.

The p-value is then calculated and compared with the significance level to decide whether the null hypothesis is rejected in the chi square test of independence.

If the p-value is less than the significance level (typically 0.05), the association between the two variables is statistically significant. If the p-value exceeds it, there is not enough evidence of an association. In addition, an effect size can be calculated to assess the strength of the association.