Chi Square (Contingency Tables)

We have looked at hypothesis tests that compare the proportion of one population with a specified value, and tests that compare the proportions of two populations. What do we do if we want to compare more than two populations? A chi-square test is a hypothesis test in which the sampling distribution of the test statistic follows a chi-square distribution when the null hypothesis is true. Several chi-square tests exist; this module covers Pearson’s chi-square test as used in contingency analysis.

  • Null Hypothesis (H0): p1 = p2 = … = pk
  • Alternative Hypothesis (Ha): At least one of the proportions is different from the others.

The symbol k is the number of populations of interest; k ≥ 2.

What is the Chi Square Test?

The chi-square test can also be used to test whether two factors are independent of each other. In other words, it can be used to test whether there is any statistically significant relationship between two discrete factors.

  • Null Hypothesis (H0): Factor 1 is independent of factor 2.
  • Alternative Hypothesis (Ha): Factor 1 is not independent of factor 2.

Chi Square Test Assumptions

  • The sample data drawn from the populations of interest are unbiased and representative.
  • There are only two possible outcomes in each trial for an individual population: success/failure, yes/no, and defective/non-defective etc.
  • The underlying distribution of each population is binomial distribution.
  • When np ≥ 5 and n(1 – p) ≥ 5, the binomial distribution can be approximated by the normal distribution.

How Chi Square Test Works

Test Statistic

$$\chi^2_{calc} = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}$$

Where:

  • Oi is an observed frequency
  • Ei is an expected frequency
  • N is the number of cells in the contingency table

The test statistic is calculated from the observed and expected frequencies. If the calculated chi-square statistic is smaller than the critical value, we fail to reject the null hypothesis; otherwise, we reject it.
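The calculation and decision rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the Minitab workflow. The counts are made up. The critical value of 5.991 is an assumption: it is the chi-square critical value at alpha = 0.05 with 2 degrees of freedom, taking the degrees of freedom of a contingency table to be (rows – 1) × (columns – 1).

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all N cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2 x 3 table (rows: pass, fail; columns: three suppliers),
# flattened row by row. These counts are made up for illustration.
observed = [90, 80, 70, 10, 20, 30]
expected = [80, 80, 80, 20, 20, 20]

# Assumption: alpha = 0.05 and (2 - 1) * (3 - 1) = 2 degrees of freedom,
# for which the chi-square critical value is about 5.991.
CRITICAL_VALUE = 5.991

statistic = chi_square_statistic(observed, expected)
reject_null = statistic > CRITICAL_VALUE

print(f"chi-square statistic = {statistic:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

With these made-up counts the statistic exceeds the critical value, so the null hypothesis would be rejected.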

Use Minitab to Run a Chi-Square Test

Case study 1: We are interested in comparing the product quality exam pass rates of three suppliers A, B, and C using a nonparametric (i.e., distribution-free) hypothesis test: the chi-square test.
Data File: “Chi-Square Test1” tab in “Sample Data.xlsx”

  • Null Hypothesis (H0): pA = pB = pC
  • Alternative Hypothesis (Ha): At least one of the suppliers has different pass rates from the others

Steps to run a chi-square test in Minitab:

  1. Click Stat → Tables → Cross Tabulation and Chi-Square.
  2. A new window named “Cross Tabulation and Chi-Square” pops up.
  3. Select “Results” as “For rows.”
  4. Select “Supplier” as “For columns.”
  5. Select “Count” as “Frequencies are in.”
  6. Click the “Chi-Square” button.
  7. A new window named “Cross Tabulation – Chi-Square” pops up.
  8. Check the boxes of “Chi-square analysis”, “Expected cell counts,” and “Each cell’s contribution to the Chi-Square statistic.”
  9. Click “OK” in the window named “Cross Tabulation – Chi-Square.”
  10. Click “OK” in the window named “Cross Tabulation and Chi-Square.”
  11. The Chi-square test results appear in the session window.

Model summary: The counts are the frequencies observed in the sample. The expected counts are calculated under the assumption that the null hypothesis is true. Since the p-value is smaller than the alpha level (0.05), we reject the null hypothesis and conclude that at least one supplier has a different pass rate from the others.
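For readers who want to see how expected counts and a p-value of this kind arise, here is a rough Python sketch of the same type of comparison. The counts are hypothetical and are not the values in “Sample Data.xlsx”. The function name `equal_proportions_test` is made up for this example. The p-value formula exp(–χ²/2) is exact only for 2 degrees of freedom, which is the case for a table with two outcomes and three suppliers.

```python
import math

# Hypothetical counts per supplier (NOT the values in "Sample Data.xlsx").
counts = {
    "A": {"Pass": 92, "Fail": 8},
    "B": {"Pass": 88, "Fail": 12},
    "C": {"Pass": 75, "Fail": 25},
}
OUTCOMES = ["Pass", "Fail"]


def equal_proportions_test(counts):
    """Chi-square test of equal pass rates for exactly three suppliers."""
    suppliers = list(counts)
    if len(suppliers) != 3:
        raise ValueError("the p-value shortcut below assumes three suppliers")

    col_totals = {s: sum(counts[s][o] for o in OUTCOMES) for s in suppliers}
    row_totals = {o: sum(counts[s][o] for s in suppliers) for o in OUTCOMES}
    grand_total = sum(col_totals.values())

    statistic = 0.0
    for s in suppliers:
        for o in OUTCOMES:
            # Expected count if every supplier shared the overall rate (H0).
            expected = row_totals[o] * col_totals[s] / grand_total
            statistic += (counts[s][o] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 2 degrees of
    # freedom: exp(-x / 2). Valid only for df = (2 - 1) * (3 - 1) = 2.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


statistic, p_value = equal_proportions_test(counts)
for supplier, c in counts.items():
    print(supplier, "pass rate:", c["Pass"] / (c["Pass"] + c["Fail"]))
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.4f}")
```

With these made-up counts the p-value falls below 0.05, which mirrors the conclusion above without reproducing the actual Minitab output.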

Case study 2: We want to check whether there is a relationship between the suppliers and the results of the product quality exam using a nonparametric (i.e., distribution-free) hypothesis test: the chi-square test.
Data File: “Chi-Square Test2” tab in “Sample Data.xlsx”

  • Null Hypothesis (H0): Product quality exam results are independent of the suppliers.
  • Alternative Hypothesis (Ha): Product quality exam results depend on the suppliers.

Steps to run a chi-square test in Minitab:

  1. Click Stat → Tables → Cross Tabulation and Chi-Square.
  2. A new window named “Cross Tabulation and Chi-Square” pops up.
  3. Select “Results” as “For rows.”
  4. Select “Supplier” as “For columns.”
  5. Select “Count” as “Frequencies are in.”
  6. Click the “Chi-Square” button.
  7. Check the boxes of “Chi-square analysis”, “Expected cell counts,” and “Each cell’s contribution to the Chi-Square statistic.”
  8. Click “OK” in the window named “Cross Tabulation – Chi-Square.”
  9. Click “OK” in the window named “Cross Tabulation and Chi-Square.”
  10. The Chi-square test results appear in the session window.

Model summary: The p-value is smaller than the alpha level (0.05), so we reject the null hypothesis and conclude that the product quality exam results are not independent of the suppliers. These results also show a danger of working with discrete data: not everything is as simple as yes/no or pass/fail. Even though supplier C has a lower fail rate of 10, its number of marginal results is higher.
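The “each cell’s contribution” output requested in the steps above can be illustrated with a small Python sketch. The 3 × 3 table below is hypothetical and is not the data in “Sample Data.xlsx”. The p-value formula exp(–x/2)(1 + x/2) is exact only for 4 degrees of freedom, which is what a table with three result categories and three suppliers has, assuming degrees of freedom of (rows – 1) × (columns – 1).

```python
import math

# Hypothetical counts (rows: exam result, columns: supplier).
rows = ["Pass", "Marginal", "Fail"]
cols = ["A", "B", "C"]
observed = [
    [70, 60, 50],
    [10, 20, 40],
    [20, 20, 10],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected, where the
# expected count assumes results are independent of supplier (H0).
contributions = {}
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        expected = row_totals[i] * col_totals[j] / grand_total
        contributions[(row_name, col_name)] = (
            (observed[i][j] - expected) ** 2 / expected
        )

statistic = sum(contributions.values())

# Upper-tail probability of a chi-square variable with 4 degrees of
# freedom: exp(-x / 2) * (1 + x / 2). Valid only for df = 4.
half = statistic / 2
p_value = math.exp(-half) * (1 + half)

largest_cell = max(contributions, key=contributions.get)
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.6f}")
print("largest contribution:", largest_cell)
```

In this made-up table the marginal results for supplier C contribute most to the statistic, which is the kind of pattern a pass/fail-only view would hide.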
