Two-Way ANOVA – Definition & When To Use It

23.03.23 Statistics Time to read: 6min


The two-way ANOVA (Analysis of Variance) is a widely used statistical technique. It analyzes the effects of two independent categorical variables on one continuous dependent variable. The one-way ANOVA, by contrast, evaluates the effect of only one independent variable. Because the two-way ANOVA also tests whether the two independent variables interact, it lets researchers examine the relationships between variables in more depth than two separate one-way analyses would.

Two-Way ANOVA – In a Nutshell

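  • ANOVA is a statistical method that compares the means of two or more groups.
  • A two-way ANOVA tests how the means of a continuous dependent variable differ across the levels of two categorical independent variables, and whether the two variables interact.
  • It is a parametric method: the dependent variable must be numerical and the independent variables must be categorical.
  • It assumes homogeneity of variance, independence of observations and a normally distributed dependent variable within each group.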
  • Learn how to conduct a two-way ANOVA correctly.
  • See how to interpret and present the results of a two-way ANOVA.

Definition: Two-way ANOVA

ANOVA stands for Analysis of Variance, a statistical method used to determine whether the means of two or more groups differ significantly from each other. A two-way ANOVA extends this to two categorical independent variables (factors): it tests the main effect of each factor and the interaction between them on a single continuous dependent variable.

Example

You are conducting a study to investigate the effects of two factors, gender and age, on income. You collect data on employees' salaries in a company and categorize them by gender (male or female) and age group (1 = 18-30, 2 = 31-50, or 3 = 51 and above).

In this case, a two-way ANOVA is used to determine whether gender, age group and the interaction between them significantly affect the average salary of employees.
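For readers who prefer a formula, the two-way ANOVA with interaction is commonly written as the model below. The notation is our own assumption and is not used elsewhere in this article: y_ijk is the salary of employee k with gender level i (i = 1, 2) and age group j (j = 1, 2, 3), mu is the overall mean, alpha_i and beta_j are the main effects, (alpha beta)_ij is the interaction effect and epsilon_ijk is random error, with the usual sum-to-zero constraints on the effects.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)

H_0^{\text{gender}}: \alpha_1 = \alpha_2 = 0 \qquad
H_0^{\text{age}}: \beta_1 = \beta_2 = \beta_3 = 0 \qquad
H_0^{\text{interaction}}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
```

The single error variance sigma squared in this model is where the homogeneity-of-variance assumption comes from, and the normal error term is the normality assumption.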

When is a two-way ANOVA used?

A two-way ANOVA is appropriate when you have collected data on a continuous dependent variable measured at different levels of two categorical independent variables. The dependent variable is a numerical measure of a characteristic or behavior that can be averaged within each group to calculate the group mean.

Salary is a quantitative variable because it is measured on a numerical scale (an amount of money). It can therefore be averaged to find the mean salary of each group.
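Averaging the dependent variable within each group is easy to sketch in R. The example below uses simulated, hypothetical salaries (not real data) and assumes the columns are called gender, age.group and salary in a data frame called worker.data. It computes the mean salary for every combination of gender and age group, which gives the 2 x 3 table of cell means that a two-way ANOVA compares.

```r
# Simulated, hypothetical salaries in thousands: not real data
set.seed(42)
worker.data <- expand.grid(gender    = c("male", "female"),
                           age.group = c("18-30", "31-50", "51+"),
                           rep       = 1:8)
worker.data$salary <- round(rnorm(nrow(worker.data), mean = 50, sd = 8), 1)

# Mean salary in each gender x age-group cell (a 2 x 3 matrix)
cell.means <- tapply(worker.data$salary,
                     list(worker.data$gender, worker.data$age.group),
                     mean)
print(round(cell.means, 1))
```

expand.grid() converts the character columns to factors by default, so tapply() can use them directly as grouping variables.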

A categorical variable represents a set of categories or groups. It can take only one of a limited number of values, or levels, which are usually represented by labels or names. Male and female are the levels of the categorical variable gender, and 1, 2 and 3 are the levels of the categorical variable age group.
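Because the age groups are stored as the codes 1, 2 and 3, they must be declared categorical before analysis. Otherwise R treats them as numbers. The sketch below uses hypothetical simulated data and a hypothetical column name, age.group. It shows the difference: as a number, age group is a single slope with 1 degree of freedom, while as a factor each level gets its own mean, giving 3 - 1 = 2 degrees of freedom.

```r
# Hypothetical data: age group stored as the codes 1, 2, 3
set.seed(1)
d <- data.frame(age.group = rep(1:3, each = 10))
d$salary <- 45 + 4 * d$age.group + rnorm(30, sd = 5)

# Treated as a number, age group is one slope: 1 degree of freedom
numeric.fit <- aov(salary ~ age.group, data = d)

# Treated as a factor, each level has its own mean: 3 - 1 = 2 degrees of freedom
d$age.group.f <- factor(d$age.group, levels = 1:3,
                        labels = c("18-30", "31-50", "51+"))
factor.fit <- aov(salary ~ age.group.f, data = d)

df.numeric <- summary(numeric.fit)[[1]][1, "Df"]
df.factor  <- summary(factor.fit)[[1]][1, "Df"]
cat("numeric coding df:", df.numeric, " factor coding df:", df.factor, "\n")
```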


The Function of the two-way ANOVA

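The two-way ANOVA uses the F test to determine whether the differences between groups are statistically significant. For each effect, the F test compares the variation between the group means for that effect (its mean square) with the variation within groups that the model does not explain (the residual mean square). The larger this ratio, the less likely it is that the differences between the group means arose by chance alone.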
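The ratio described above can be checked directly in R. This minimal sketch uses simulated, hypothetical salaries (not real data), with gender and age.group as assumed column names. It fits an additive two-way ANOVA with aov() and divides each effect's mean square by the residual mean square. The result should match the F values that summary() reports.

```r
# Simulated, hypothetical data: 10 employees per gender x age-group cell
set.seed(7)
worker.data <- expand.grid(gender    = c("male", "female"),
                           age.group = c("18-30", "31-50", "51+"),
                           rep       = 1:10)
worker.data$salary <- 40 + 6 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5)

fit <- aov(salary ~ gender + age.group, data = worker.data)
tab <- summary(fit)[[1]]                # ANOVA table as a data frame
rownames(tab) <- trimws(rownames(tab))  # remove any padding from row names

# F for each effect = its mean square / residual (within-group) mean square
ms.resid <- tab["Residuals", "Mean Sq"]
f.manual <- tab[c("gender", "age.group"), "Mean Sq"] / ms.resid
f.aov    <- tab[c("gender", "age.group"), "F value"]
print(cbind(f.manual, f.aov))
```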

In a two-way ANOVA with interaction, three null hypotheses can be tested:

  1. There is no difference between the means of the groups formed by the levels of factor 1.
  2. There is no difference between the means of the groups formed by the levels of factor 2.
  3. There is no interaction between factor 1 and factor 2: the effect of one factor on the dependent variable does not depend on the level of the other.

In contrast, a two-way ANOVA without interaction tests only whether each factor has a main effect on the dependent variable. It does not test whether the factors interact. In our salary study, a two-way ANOVA with interaction tests these three hypotheses:

Null hypothesis (H0) | Alternative hypothesis (Ha)
There is no difference in average salary between gender types. | There is a difference in average salary between gender types.
There is no difference in average salary between age groups. | There is a difference in average salary between at least two age groups.
The effect of one independent variable on average salary does not depend on the level of the other independent variable (no interaction effect). | There is an interaction effect between age group and gender type on average salary.
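Each of the three hypotheses above corresponds to one row of the ANOVA table produced by the interaction model. The sketch below uses simulated, hypothetical data (the column names gender and age.group are assumptions). It fits salary ~ gender * age.group and extracts one p-value per effect. The Residuals row has no test of its own.

```r
# Simulated, hypothetical data: not real salaries
set.seed(3)
worker.data <- expand.grid(gender    = c("male", "female"),
                           age.group = c("18-30", "31-50", "51+"),
                           rep       = 1:10)
worker.data$salary <- 42 + 5 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5)

# '*' fits both main effects plus the gender:age.group interaction
fit <- aov(salary ~ gender * age.group, data = worker.data)
tab <- summary(fit)[[1]]
effects <- trimws(rownames(tab))

# One p-value per null hypothesis: gender, age group, interaction
p.values <- setNames(tab[["Pr(>F)"]], effects)
print(p.values)
```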

Two-way ANOVA assumptions

A two-way ANOVA makes several assumptions about the data and the statistical model that must be met for the results to be reliable and valid. These are:

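  • Homogeneity of variance: The variance of the dependent variable should be equal across all groups, that is, across every combination of the levels of the two factors. If your data clearly fail to meet this assumption, a non-parametric test such as the Kruskal-Wallis test can be used instead; note, however, that the Kruskal-Wallis test compares groups defined by a single factor, so it cannot test an interaction effect.
  • Independence of observations: Each observation should be independent of the others. The value of the dependent variable for one subject should not be related to the value for any other subject, whether in the same group or in a different one.
  • Normally distributed dependent variable: The values of the dependent variable within each group (equivalently, the model residuals) should follow a normal distribution. This can be checked with normal probability (Q-Q) plots or with formal tests of normality such as the Shapiro-Wilk test.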

Conducting a two-way ANOVA

The dataset from our income experiment includes observations of:

  • Salary (income of each employee)
  • Gender type (male, female)
  • Age group (1 = 18-30, 2 = 31-50, or 3 = 51 and above)
  • Industry (1, 2, 3, 4)
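A dataset with this layout might look like the sketch below. The values are simulated and hypothetical (not the study's data), and the column names salary, gender, age.group and industry are assumptions. The important step is converting the coded columns into factors, so that aov() treats age group and industry as categories rather than numbers.

```r
# Hypothetical, simulated data in the layout described above: not real data
set.seed(2023)
n <- 96
worker.data <- data.frame(
  gender    = sample(c("male", "female"), n, replace = TRUE),
  age.group = sample(1:3, n, replace = TRUE),  # 1 = 18-30, 2 = 31-50, 3 = 51+
  industry  = sample(1:4, n, replace = TRUE)
)
worker.data$salary <- round(rnorm(n, mean = 50, sd = 10), 1)

# The coded columns are categories, so store them as factors
worker.data$gender    <- factor(worker.data$gender, levels = c("male", "female"))
worker.data$age.group <- factor(worker.data$age.group, levels = 1:3,
                                labels = c("18-30", "31-50", "51+"))
worker.data$industry  <- factor(worker.data$industry, levels = 1:4)

str(worker.data)
```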

Two-way ANOVA in R

The two-way ANOVA will test whether the independent variables (gender type and age group) affect the dependent variable (salary). However, other sources of variation, such as the industry each employee works in, may also need to be taken into account.

After loading the data into R, we create each of the three models with the aov() function and then compare them with the aictab() function from the AICcmodavg package.

Two-way ANOVA R code

# gender and age.group must be stored as factors, not numeric codes
two.way <- aov(salary ~ gender + age.group, data = worker.data)

In the second model, to test whether the interaction between gender and age group influences salary, replace + with *. The * operator fits both main effects and their interaction.

Two-way ANOVA with interaction R code

interaction <- aov(salary ~ gender * age.group, data = worker.data)

Because our workers were randomized within industries, we add this variable as a blocking factor in the third model. We can then compare our two-way ANOVAs with and without the blocking variable to see whether the industry matters.

Two-way ANOVA with blocking R code

blocking <- aov(salary ~ gender * age.group + industry, data = worker.data)

Model comparison

We can use the Akaike information criterion (AIC) to find the best-fitting model. AIC rewards a model for explaining more of the variation but penalizes every additional parameter, so the model with the lowest AIC offers the best balance between fit and simplicity. The aictab() function performs this model comparison.
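The criterion can be written as follows, where k is the number of estimated parameters (including the residual variance), L-hat is the maximized likelihood of the model and n is the number of observations. As far as we know, aictab() reports the small-sample version, AICc, by default (its second.ord argument defaults to TRUE).

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
```

Adding a parameter raises 2k by 2, so a more complex model only wins if it increases the log-likelihood by more than 1 per extra parameter.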

AIC R Sample code

library(AICcmodavg)

model.set <- list(two.way, interaction, blocking)

model.names <- c("two.way", "interaction", "blocking")

aictab(model.set, modnames = model.names)
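If the AICcmodavg package is not installed, base R's AIC() function gives a comparable ranking (plain AIC rather than AICc). The sketch below is self-contained: it simulates hypothetical data with an industry blocking factor (not real data; all column names are assumptions), fits the three models from this section and sorts them by AIC, lowest first.

```r
# Simulated, hypothetical data with an industry blocking factor
set.seed(5)
worker.data <- expand.grid(gender    = c("male", "female"),
                           age.group = c("18-30", "31-50", "51+"),
                           industry  = c("1", "2", "3", "4"),
                           rep       = 1:4)
worker.data$salary <- 40 + 5 * as.integer(worker.data$age.group) +
  3 * as.integer(worker.data$industry) + rnorm(nrow(worker.data), sd = 5)

two.way     <- aov(salary ~ gender + age.group, data = worker.data)
interaction <- aov(salary ~ gender * age.group, data = worker.data)
blocking    <- aov(salary ~ gender * age.group + industry, data = worker.data)

# AIC() accepts several fitted models and returns their df and AIC
aic.table <- AIC(two.way, interaction, blocking)
print(aic.table[order(aic.table$AIC), ])
```

The df column counts the estimated parameters k, including the residual variance. This makes it visible how much extra complexity each model adds.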

Two-Way ANOVA – Result interpretation

The output of summary(two.way) looks like this:

            Df Sum Sq Mean Sq F value   Pr(>F)
gender       1  5.122   5.122  15.316 0.000174 ***
age.group    2  6.068   3.034   9.073 0.000253 ***
Residuals   92 30.765   0.334
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model can be interpreted using the following columns:

  1. Df displays the degrees of freedom for each variable, which equals the number of levels of the variable minus 1. The residual degrees of freedom equal the number of observations minus the number of estimated parameters (including the intercept).
  2. Sum Sq is the sum of squares: the variation between the group means for the levels of the variable and the overall mean. For the Residuals row, it is the variation of the observations around their own group means.
  3. Mean Sq is the mean square, calculated by dividing the sum of squares by the degrees of freedom.
  4. F value is the test statistic of the F test: the mean square of the variable divided by the mean square of the residuals.
  5. Pr(>F) is the p-value of the F statistic: the probability of obtaining an F value at least this large if the null hypothesis of no difference were true.
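The arithmetic behind these columns can be reproduced from the table itself. This sketch copies the Sum Sq and Df values of the two effect rows (6.068 on 2 df and 5.122 on 1 df) and the Residuals row. It computes Mean Sq, F and the upper-tail p-value from the F distribution with pf(). The row labels in the code are our own and are used only for illustration.

```r
# Values copied from the ANOVA table above: the two effect rows
# (Sum Sq 6.068 on 2 df, Sum Sq 5.122 on 1 df) and the Residuals row
ss       <- c(row.2df = 6.068, row.1df = 5.122)
dof      <- c(row.2df = 2,     row.1df = 1)
ss.resid <- 30.765
df.resid <- 92

ms       <- ss / dof                  # Mean Sq = Sum Sq / Df
ms.resid <- ss.resid / df.resid       # residual Mean Sq (about 0.334)
f        <- ms / ms.resid             # F value = effect Mean Sq / residual Mean Sq
p        <- pf(f, dof, df.resid, lower.tail = FALSE)  # Pr(>F), upper tail

print(round(cbind(Df = dof, MeanSq = ms, F = f, p = p), 6))
```

The recomputed F values agree with the printed ones up to the rounding of the residual mean square.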

Post-hoc test

ANOVA only tells you whether a factor has a significant overall effect, not which of its levels differ from each other. A post-hoc test is therefore used to find out which group means actually differ. We use Tukey's Honestly Significant Difference (TukeyHSD) test as shown below:

Tukey R code

TukeyHSD(two.way)

The output looks like this:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = salary ~ gender + age.group, data = worker.data)
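To restrict the post-hoc comparisons to one factor, TukeyHSD() accepts a which argument. This minimal sketch uses simulated, hypothetical data with assumed column names. It compares only the three age groups, which gives three pairwise comparisons. Each row reports the difference in means (diff), the family-wise confidence interval (lwr, upr) and the adjusted p-value (p adj).

```r
# Simulated, hypothetical data: not real salaries
set.seed(9)
worker.data <- expand.grid(gender    = c("male", "female"),
                           age.group = c("18-30", "31-50", "51+"),
                           rep       = 1:10)
worker.data$salary <- 40 + 5 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5)

two.way <- aov(salary ~ gender + age.group, data = worker.data)

# Pairwise comparisons of the three age groups only
tukey <- TukeyHSD(two.way, which = "age.group", conf.level = 0.95)
print(tukey)
```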

Two-way ANOVA – Result presentation

The following shows an example of a potential discussion of the results.

Example

The higher average salary in age group 3 (51 and above) suggests that, in companies similar to the one studied, older employees tend to earn more. The lack of a significant interaction between gender and age group suggests that the effect of age on salary is similar for male and female employees, although this may not hold in other companies or age ranges.
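When the results are written up, each F test is usually reported with its degrees of freedom and p-value. The helper below is a hypothetical sketch (report_f_test is our own name, not a library function) that follows one common convention, APA style: F rounded to two decimals, and p shown without a leading zero or as "p < .001".

```r
# Hypothetical helper: format one F test as "F(df1, df2) = F, p = ..."
report_f_test <- function(f, df1, df2, p) {
  p.text <- if (p < 0.001) {
    "p < .001"
  } else {
    # three decimals, leading zero dropped (e.g. 0.042 -> .042)
    paste("p =", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("F(%d, %d) = %.2f, %s", as.integer(df1), as.integer(df2), f, p.text)
}

# Using the F value 9.073 on 2 and 92 df from the ANOVA table above
print(report_f_test(9.073, 2, 92, 0.000253))  # "F(2, 92) = 9.07, p < .001"
```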

FAQs

What is the difference between a one-way and a two-way ANOVA?
A one-way ANOVA tests for differences between the means of two or more groups defined by a single independent variable. A two-way ANOVA tests for differences between groups defined by two independent variables, as well as their interaction effect, on a single dependent variable.

When is ANOVA used?
ANOVA is typically used when you want to determine whether there is a significant difference between the means of two or more groups.

What are the assumptions of ANOVA?
The assumptions of ANOVA are normality, homogeneity of variances and independence of observations.

How are the results of a two-way ANOVA reported?
The results are typically reported as F statistics with their degrees of freedom and a corresponding p-value, which indicates whether the differences between the group means are statistically significant.
