Advanced Statistics and Experimental Design: Day 2 R Workshop – From Data Management to ANOVA and Post‑hoc Tests


YouTube video ID: 6f4_o1G49pE

Source: YouTube video by RUFORUMNetwork


Introduction

The session was the second (actually third) day of the Advanced Statistics and Experimental Designs training organized by the Forum Secretariat. Participants were welcomed, reminded of meeting etiquette, and thanked for their punctuality and engagement.

Session Overview

  • Facilitators: Prof. Balava, Dr. Thomas, and technical specialist Salman.
  • Funding acknowledgment: World Bank and the Institute of Food and Nutrition.
  • Options for in‑person training were offered for those who prefer physical workshops.

Preparing the R Environment

  1. Create a project folder for each analysis (data, scripts, README, etc.).
  2. Set the working directory in RStudio via Session → Set Working Directory → Choose Directory.
  3. Install required packages (e.g., agricolae, doBy, lattice, effects).
  4. Load packages with library() calls before running any analysis.
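Steps 3 and 4 can be sketched as follows. This is a minimal example, not the workshop's own script: the helper name `find_missing` is hypothetical, and installation is left as a commented line so that nothing is downloaded without intent.

```r
# Packages named in the session.
pkgs <- c("agricolae", "doBy", "lattice", "effects")

# Hypothetical helper: return the names of packages that are not installed.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing(pkgs)

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# Load every package that is available.
for (p in setdiff(pkgs, missing_pkgs)) {
  library(p, character.only = TRUE)
}
```

Running this at the top of a script makes it obvious which packages still need installing before any analysis line is reached.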

Data Management

  • The example data set “bond_time” was built from vectors representing replicates for four treatments (A, B, C, D).
  • Participants were instructed to run a series of lines (43‑82) that create the data frame, compute summary statistics, and generate the vector of observation numbers (1‑12).
  • Emphasis was placed on not re‑typing code; instead, use the provided script files (*.R) and run them directly.
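As an illustration of the data-management step, the sketch below builds a data frame from one vector per treatment. The numeric values are invented for this example, and three replicates per treatment is an assumption inferred from the 12 observation numbers; the column names `obs`, `treatment`, and `bond_time` are also assumptions rather than the names used in the workshop script.

```r
# Invented replicate values, one vector per treatment.
A <- c(10.2, 11.1, 9.8)
B <- c(12.5, 13.0, 12.1)
C <- c(15.3, 14.7, 16.0)
D <- c(18.9, 22.4, 16.2)

# Stack the vectors into a long-format data frame.
bond_time <- data.frame(
  obs       = 1:12,                                     # observation numbers
  treatment = factor(rep(c("A", "B", "C", "D"), each = 3)),
  bond_time = c(A, B, C, D)
)

# Summary statistics per treatment.
trt_means <- tapply(bond_time$bond_time, bond_time$treatment, mean)
trt_sds   <- tapply(bond_time$bond_time, bond_time$treatment, sd)
print(round(cbind(mean = trt_means, sd = trt_sds), 2))
```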

Running One‑Way ANOVA

model1 <- lm(bond_time ~ treatment, data = bond_time)
anova(model1)
  • The ANOVA table reports degrees of freedom, sum of squares, mean squares, F‑value, and p‑value.
  • In the example, the p‑value was 1.084e‑5, far below the conventional α = 0.05, indicating that at least one treatment mean differs from the others.
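To show where the entries of the ANOVA table come from, the sketch below recomputes the F‑value and p‑value by hand and compares them with `anova()`. The data are invented for illustration and will not reproduce the workshop's p‑value.

```r
# Invented data: 4 treatments x 3 replicates.
y <- c(10.2, 11.1, 9.8, 12.5, 13.0, 12.1, 15.3, 14.7, 16.0, 18.9, 22.4, 16.2)
g <- factor(rep(c("A", "B", "C", "D"), each = 3))

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Sums of squares: between treatments and within treatments (residual).
ss_treat <- sum(group_n * (group_means - grand_mean)^2)
ss_resid <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

df_treat <- nlevels(g) - 1
df_resid <- length(y) - nlevels(g)

# F is the ratio of the two mean squares.
f_value <- (ss_treat / df_treat) / (ss_resid / df_resid)
p_value <- pf(f_value, df_treat, df_resid, lower.tail = FALSE)

# The same numbers from R's ANOVA table.
tab <- anova(lm(y ~ g))
print(tab)
```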

Interpreting the ANOVA Output

  • P‑value: compared to α to decide significance.
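  • Significance stars (***, **, *) are shorthand for p‑value thresholds: *** for p < 0.001, ** for p < 0.01, and * for p < 0.05.
  • Coefficients: with R's default treatment contrasts, the intercept is the mean of treatment A (the reference level); each other coefficient is that treatment's mean minus the mean of A.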
  • R‑squared and adjusted R‑squared give a sense of model fit (discussed later for regression contexts).
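The relationship between the coefficients and the treatment means can be checked directly. This is a small sketch on invented data; the data frame and column names are assumptions chosen to match the model call shown earlier.

```r
# Invented data: 4 treatments x 3 replicates.
bond_time <- data.frame(
  treatment = factor(rep(c("A", "B", "C", "D"), each = 3)),
  bond_time = c(10.2, 11.1, 9.8, 12.5, 13.0, 12.1,
                15.3, 14.7, 16.0, 18.9, 22.4, 16.2)
)

model1 <- lm(bond_time ~ treatment, data = bond_time)
coefs  <- coef(model1)
means  <- tapply(bond_time$bond_time, bond_time$treatment, mean)

# Intercept = mean of the reference level A.
intercept_is_mean_A <- isTRUE(all.equal(unname(coefs["(Intercept)"]),
                                        unname(means["A"])))

# treatmentB = mean(B) - mean(A), and likewise for C and D.
b_is_difference <- isTRUE(all.equal(unname(coefs["treatmentB"]),
                                    unname(means["B"] - means["A"])))

# Model-fit measures reported by summary().
fit <- summary(model1)
print(c(r.squared = fit$r.squared, adj.r.squared = fit$adj.r.squared))
```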

Checking Model Assumptions

  1. Residuals vs. Fitted Plot – checks for constant variance; in the example, the spread increased with the fitted values, suggesting heteroscedasticity.
  2. QQ Plot – assesses normality of residuals; most points followed the reference line, with a few outliers.
  3. Residuals by Treatment – boxplots revealed that treatment D had markedly higher variability than the others.
  4. Statistical Tests
     • Levene’s Test (via leveneTest) for homogeneity of variance.
     • Shapiro‑Wilk or Shapiro‑Francia (sf.test) for normality of residuals.
     • Results: p > 0.05 for normality (fail to reject H₀, so there is no evidence against normality); p > 0.05 for Levene’s (no evidence of unequal variances), but the rule‑of‑thumb ratio (max variance / min variance) was > 5, flagging potential heterogeneity.
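The checks above can be sketched as follows on invented data. `shapiro.test()` is in base R; `leveneTest()` is assumed here to come from the car package and is called only if that package is installed. The plots are drawn only in an interactive session.

```r
# Invented data in which treatment D is deliberately more variable.
bond_time <- data.frame(
  treatment = factor(rep(c("A", "B", "C", "D"), each = 3)),
  bond_time = c(10.2, 11.1, 9.8, 12.5, 13.0, 12.1,
                15.3, 14.7, 16.0, 18.9, 22.4, 16.2)
)
model1 <- lm(bond_time ~ treatment, data = bond_time)
res    <- residuals(model1)

if (interactive()) {
  plot(model1, which = 1)                  # residuals vs. fitted
  plot(model1, which = 2)                  # normal QQ plot
  boxplot(res ~ bond_time$treatment)       # residuals by treatment
}

# Normality of residuals (H0: residuals are normally distributed).
normality_p <- shapiro.test(res)$p.value

# Homogeneity of variance, if car is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(bond_time ~ treatment, data = bond_time))
}

# Rule of thumb: ratio of largest to smallest group variance.
group_vars <- tapply(bond_time$bond_time, bond_time$treatment, var)
var_ratio  <- max(group_vars) / min(group_vars)
print(var_ratio)
```

With only a few replicates per group, formal tests have little power, which is one reason a test can pass while the variance ratio still looks worrying.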

Multiple Comparison Procedures

  • LSD (Least Significant Difference): LSD.test(model1, "treatment", p.adj = "none")
     • Assigned the treatments different grouping letters, indicating that all pairwise differences were significant in the unadjusted test.
  • Bonferroni Adjustment: LSD.test(model1, "treatment", p.adj = "bonferroni")
     • Made the test more stringent; some previously significant pairs became non‑significant, illustrating the importance of controlling Type I error when many comparisons are made.
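The effect of the adjustment can be seen with base R alone. The sketch below uses `pairwise.t.test()` with a pooled standard deviation, which is swapped in here as a base R counterpart to the unadjusted LSD comparison; it is not the workshop's code. The data are invented, and the `LSD.test()` call runs only if agricolae is installed.

```r
# Invented data: 4 treatments x 3 replicates.
bond_time <- data.frame(
  treatment = factor(rep(c("A", "B", "C", "D"), each = 3)),
  bond_time = c(10.2, 11.1, 9.8, 12.5, 13.0, 12.1,
                15.3, 14.7, 16.0, 18.9, 22.4, 16.2)
)

# Unadjusted pairwise t-tests with pooled SD.
p_none <- pairwise.t.test(bond_time$bond_time, bond_time$treatment,
                          p.adjust.method = "none")$p.value

# Bonferroni: each p-value is multiplied by the number of comparisons (capped at 1).
p_bonf <- pairwise.t.test(bond_time$bond_time, bond_time$treatment,
                          p.adjust.method = "bonferroni")$p.value

print(round(p_none, 4))
print(round(p_bonf, 4))

# Letter groupings from agricolae, if available.
if (requireNamespace("agricolae", quietly = TRUE)) {
  model1 <- lm(bond_time ~ treatment, data = bond_time)
  out <- agricolae::LSD.test(model1, "treatment", p.adj = "bonferroni")
  print(out$groups)
}
```

With four treatments there are six pairwise comparisons, so every Bonferroni p‑value is at least as large as its unadjusted counterpart.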

Practical Examples

  1. Dog‑Bone Crushing Experiment – examined how different dog sizes and bone types affect crushing time. Two models were fitted:
     • Model 3: only bone type as predictor (non‑significant).
     • Model 4: bone type + dog type (both predictors significant; the residuals no longer showed a pattern).
     • Demonstrated how omitting a relevant factor can inflate residual variance and mask true effects.
  2. Irrigation Treatments on Yield – a second data set with five irrigation levels (including “no irrigation”).
     • Boxplots highlighted that method 2 gave the highest yield but also the greatest variability.
     • ANOVA indicated significant differences; post‑hoc tests (LSD, Bonferroni) clarified which methods differed.
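The omitted-factor point from the dog‑bone example can be reproduced on simulated data. Everything in this sketch is hypothetical: the factor levels, effect sizes, and noise are invented, and the object names `model3` and `model4` simply mirror the labels used in the session.

```r
set.seed(42)

# Simulated design: 3 bone types x 3 dog types x 2 replicates.
dogs <- expand.grid(bone = c("beef", "lamb", "pork"),
                    dog  = c("large", "medium", "small"),
                    rep  = 1:2)

# Invented effects: dog type matters a lot, bone type only a little.
bone_eff <- c(beef = 0, lamb = 1.5, pork = 3)[as.character(dogs$bone)]
dog_eff  <- c(large = 0, medium = 10, small = 20)[as.character(dogs$dog)]
dogs$time <- 30 + bone_eff + dog_eff + rnorm(nrow(dogs), sd = 1)

model3 <- lm(time ~ bone, data = dogs)        # dog type omitted
model4 <- lm(time ~ bone + dog, data = dogs)  # both factors included

# Residual standard error: the dog effect is absorbed into the
# residuals of model3, which inflates its error term.
print(c(model3 = sigma(model3), model4 = sigma(model4)))

print(anova(model3))
print(anova(model4))
```

Because the F‑value for bone type is computed against the residual mean square, the inflated error in the smaller model makes the bone effect much harder to detect.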

Common Troubleshooting Tips

  • Always run the library calls before any analysis; a package that is not installed or not loaded triggers an error, shown in red in the console.
  • Use the provided script files instead of typing commands manually to avoid syntax errors.
  • When sharing screens, ensure the correct RStudio window is selected and that the microphone is muted when not speaking.
  • If a script fails, check that the data objects have been created (e.g., bond_time must exist before fitting the model).
  • For persistent errors, copy the exact error message, Google it, and apply the suggested fix.
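Two of these tips can be sketched in code: checking that an object exists before using it, and capturing the exact error text for searching. The object and package names below are deliberately made up so that both checks fail.

```r
# Check that a data object exists before fitting a model with it.
# "made_up_object_xyz" is a hypothetical name that should not exist.
object_ready <- exists("made_up_object_xyz")
if (!object_ready) {
  message("Object not found: run the data-creation lines of the script first.")
}

# Capture the exact error message instead of letting the script stop.
# "notARealPackage123" is a hypothetical package name.
err_text <- tryCatch(
  {
    library("notARealPackage123", character.only = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# This is the text to copy into a search engine.
print(err_text)
```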

Closing Remarks

  • Participants were encouraged to review the recorded sessions on YouTube and revisit the scripts.
  • The next day will cover two‑way ANOVA, split‑plot designs, regression, and correlation analysis.
  • A reminder that R is a general‑purpose statistical language, not limited to agriculture.
  • The facilitators thanked everyone for their patience and participation.

Effective experimental analysis hinges on disciplined data organization, proper setup of the R environment, rigorous checking of ANOVA assumptions, and the judicious use of post‑hoc tests (LSD, Bonferroni) to draw reliable conclusions about treatment effects.
