Comprehensive Guide to Day 4 of the R Training: From Working Directory to Hypothesis Testing

YouTube video ID: OFfVqlLphYU

Source: YouTube video by RUFORUMNetwork

Introduction

The fourth session of the R training builds on Day 2 material and moves participants toward independent data analysis. The instructor emphasizes setting the working directory correctly, loading data files, managing packages, and writing reproducible scripts.

1. Setting Up the Working Environment

  • Working directory: Use Session → Set Working Directory → Choose Directory to point RStudio to the folder that contains all data files (CSV, PDF, AR, etc.).
  • Script management: Keep all commands in a single script and annotate it with # comments. Commenting out one‑off lines such as install.packages() lets the script be re‑run without re‑installing packages.
  • Package handling: Install once with install.packages() and load with library(). Comment out the install line after the first run.
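
The menu steps above have script equivalents. The following is a minimal sketch, not the instructor's script: tempdir() stands in for a real project folder so the snippet runs anywhere, and "ggplot2" is only an example package name.

```r
# tempdir() stands in for your own project folder (an assumption for this sketch).
project_dir <- tempdir()

# setwd() is the script equivalent of Session -> Set Working Directory;
# it invisibly returns the previous directory.
old_wd <- setwd(project_dir)

getwd()        # confirm where R is now looking
list.files()   # the files R can see in that folder

# Install once, then comment the line out so re-running the script skips it.
# "ggplot2" is only an example package name, not one named in the session.
# install.packages("ggplot2")
# library(ggplot2)
```

Keeping the setwd() call at the top of the script makes the working directory explicit each time the script is run.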

2. Importing Data

  • File formats: CSV, Excel, text, SPSS, SAS, Stata are all supported. CSV is recommended for its simplicity.
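  • Reading a CSV: AlgeriaWeight <- read.csv("AlgeriaWeight.csv") reads the file and stores it as a data frame named AlgeriaWeight. Without the assignment, the data are only printed to the console.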
  • Troubleshooting: Errors usually stem from an incorrect working directory; verify the file list in the lower‑right pane of RStudio.
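
A self-contained sketch of the import step follows. The real AlgeriaWeight.csv is not available here, so the snippet first writes a few made-up rows to a temporary file; the column names and values are assumptions for illustration only.

```r
# Made-up rows standing in for the real AlgeriaWeight.csv (illustration only).
toy <- data.frame(
  Site     = c("S1", "S1", "S2", "S2"),
  Region   = c("A", "B", "A", "B"),
  Genotype = c("G1", "G2", "G1", "G2"),
  Yield    = c(610, 585, 640, 602)
)
csv_path <- file.path(tempdir(), "AlgeriaWeight.csv")
write.csv(toy, csv_path, row.names = FALSE)

# The assignment is what keeps the data frame in the workspace.
AlgeriaWeight <- read.csv(csv_path)

str(AlgeriaWeight)    # column names and types
head(AlgeriaWeight)   # first rows
```

With a real file in the working directory, only the read.csv() line is needed, with the file name in place of csv_path.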

3. Exploring the Built‑in iris Dataset

  • data(iris) loads the classic flower data set.
  • summary(iris) provides the five‑number summary and mean for each numeric variable.
  • names(iris) lists the variables: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species.
  • head(iris) and tail(iris) show the first and last six rows.
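
The inspection commands above can be run together as a short script. This sketch uses only the built-in iris data; the n argument to tail() is added here to show how the number of rows can be changed.

```r
data(iris)            # load the built-in data set into the workspace

names(iris)           # variable names
dim(iris)             # rows and columns
summary(iris)         # numeric summaries, plus counts per level for Species
head(iris)            # first six rows
tail(iris, n = 3)     # last three rows; n changes how many are shown
```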

4. Basic Plotting

  • Scatter plot: plot(iris$Sepal.Length, iris$Sepal.Width).
  • Color by species: plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species) adds a categorical colour.
  • Adding a regression line: abline(lm(Sepal.Width ~ Sepal.Length, data=iris)) draws the line of best fit.
  • Customising symbols: pch, lty, and col arguments control point shape, line type, and colour.
  • Multi‑panel layout: par(mfrow=c(2,2)) creates a 2×2 grid for four related plots.
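
The plotting options above can be combined into one four-panel figure. This is a sketch under assumptions: the panel titles, colours, and symbol choices are illustrative and not taken from the session.

```r
data(iris)

# Fit once so the same model can be reused for the line of best fit.
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# 2 x 2 grid; par() returns the old settings so they can be restored.
old_par <- par(mfrow = c(2, 2))

plot(iris$Sepal.Length, iris$Sepal.Width, main = "Plain scatter")

plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species,
     main = "Coloured by species")

plot(iris$Sepal.Length, iris$Sepal.Width, main = "With fitted line")
abline(fit, lty = 2, col = "red")     # dashed red regression line

plot(iris$Sepal.Length, iris$Sepal.Width, pch = 17, col = "blue",
     main = "Custom symbol")          # pch = 17 is a filled triangle

par(old_par)  # back to a single-panel layout
```

Restoring the saved par() settings at the end keeps later plots from being squeezed into the 2 x 2 grid.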

5. Using attach() and detach()

  • attach(iris) lets you refer to variables without the $ operator.
  • Always detach(iris) when you finish, to remove the data frame from the search path and avoid name clashes with other objects.
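
A short sketch of the attach/detach cycle. The with() call at the end is an alternative suggested here, not something shown in the session; it gives the same convenience for a single expression without changing the search path.

```r
data(iris)

attach(iris)                          # columns become visible by bare name
mean_attached <- mean(Sepal.Length)
detach(iris)                          # remove iris from the search path again

# with() evaluates one expression inside the data frame and needs no clean-up.
mean_with <- with(iris, mean(Sepal.Length))

c(attached = mean_attached, with = mean_with)
```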

6. Data Transformation

  • Converting to factors: iris$Species <- as.factor(iris$Species) enables ANOVA and other categorical analyses.
  • Creating new variables: AlgeriaWeight$logYield <- log10(AlgeriaWeight$Yield) adds a log‑transformed column.
  • Renaming columns: names(AlgeriaWeight)[1] <- "Site" changes the first column name.
  • Subsetting: AlgeriaB <- subset(AlgeriaWeight, Region == "B") extracts only region B observations.
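
The four transformations can be tried on a small stand-in data frame. The values below are made up, and "Location" is a hypothetical original name for the first column; the droplevels() step is an optional extra, not part of the session.

```r
# Made-up stand-in for AlgeriaWeight; "Location" is a hypothetical original
# name for the first column.
AlgeriaWeight <- data.frame(
  Location = c("S1", "S1", "S2", "S2"),
  Region   = c("A", "B", "A", "B"),
  Yield    = c(610, 585, 640, 602),
  stringsAsFactors = FALSE
)

AlgeriaWeight$Region   <- as.factor(AlgeriaWeight$Region)  # character -> factor
AlgeriaWeight$logYield <- log10(AlgeriaWeight$Yield)       # new column
names(AlgeriaWeight)[1] <- "Site"                          # rename column 1

AlgeriaB <- subset(AlgeriaWeight, Region == "B")           # keep region B rows
AlgeriaB <- droplevels(AlgeriaB)  # drop the now-empty level "A" (optional)

str(AlgeriaB)
```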

7. Summarising Data

  • summary(AlgeriaWeight$Yield) gives min, 1st quartile, median, mean, 3rd quartile, max.
  • dim(AlgeriaWeight) returns rows and columns (e.g., 1344 × 5).
  • table(AlgeriaWeight$Site, AlgeriaWeight$Genotype) produces a cross‑tabulation.
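
A sketch of the three summary commands on simulated data. The design (2 sites, 3 genotypes, 2 replicates) and the yields are invented for illustration, so the dimensions differ from the 1344 × 5 of the real file.

```r
# Made-up data: 2 sites x 3 genotypes x 2 replicates (illustration only).
AlgeriaWeight <- expand.grid(
  Site     = c("S1", "S2"),
  Genotype = c("G1", "G2", "G3"),
  Rep      = 1:2
)
set.seed(1)  # reproducible fake yields
AlgeriaWeight$Yield <- round(rnorm(nrow(AlgeriaWeight), mean = 600, sd = 20))

summary(AlgeriaWeight$Yield)   # min, quartiles, median, mean, max
dim(AlgeriaWeight)             # number of rows, number of columns

# Both arguments must come from the same data frame.
counts <- table(AlgeriaWeight$Site, AlgeriaWeight$Genotype)
counts
```

In a balanced design such as this one, every cell of the cross-tabulation holds the same count, which makes missing combinations easy to spot.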

8. Hypothesis Testing with t‑tests

  • Paired t‑test (self‑ vs. cross‑pollinated seeds): t.test(cross, self, paired=TRUE, alternative="two.sided")
  • Result includes t‑statistic, degrees of freedom, and p‑value.
  • Compare p‑value to a chosen significance level (α = 0.05, 0.01, or 0.10) to decide whether to reject the null hypothesis.
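  • Independent two‑sample t‑test (male vs. female length): t.test(length ~ Sex, data=nutes, var.equal=TRUE). The formula interface compares independent groups by default, and recent versions of R (4.4.0 onward) raise an error if paired= is passed to it.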
  • var.equal toggles the assumption of equal variances; setting it to FALSE invokes Welch’s correction.
  • One‑sample t‑test: Tests whether a sample mean equals a known value supplied through the mu argument (e.g., mu=15 for a seed count of 15).
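
The three t‑test variants can be sketched as follows. All numbers are made up for illustration and are not the data used in the session; the seeds vector is a hypothetical name for the one-sample case.

```r
# Made-up measurements for illustration; not the data used in the session.
cross <- c(21.2, 19.8, 22.5, 20.1, 23.0, 18.9, 21.7, 20.6)
self  <- c(18.4, 19.1, 20.3, 18.2, 21.4, 18.6, 19.2, 18.9)

nutes <- data.frame(
  Sex    = rep(c("F", "M"), each = 6),
  length = c(9.1, 8.7, 9.4, 8.9, 9.6, 9.0, 8.2, 8.5, 7.9, 8.4, 8.1, 8.6)
)

seeds <- c(14, 16, 15, 13, 17, 15, 12, 16, 14, 18)  # hypothetical seed counts

# Paired: each cross value is matched with the self value in the same position.
paired_res <- t.test(cross, self, paired = TRUE, alternative = "two.sided")

# Independent groups, pooled variance (classical Student t-test).
pooled_res <- t.test(length ~ Sex, data = nutes, var.equal = TRUE)

# Independent groups, Welch correction (the default, var.equal = FALSE).
welch_res <- t.test(length ~ Sex, data = nutes)

# One-sample: is the mean seed count different from 15?
one_res <- t.test(seeds, mu = 15)

paired_res$statistic   # t value
paired_res$parameter   # degrees of freedom
paired_res$p.value     # p-value to compare against alpha
```

Each result is a list, so the statistic, degrees of freedom, and p‑value can be extracted by name rather than read off the printed output.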

9. Non‑parametric Alternatives (Wilcoxon Tests)

  • Use when normality or equal‑variance assumptions are violated.
  • Wilcoxon signed‑rank test for paired data: wilcox.test(cross, self, paired=TRUE)
  • Wilcoxon rank‑sum test for independent groups: wilcox.test(length ~ Sex, data=nutes)
  • Non‑parametric tests are usually less powerful than their parametric counterparts when the parametric assumptions hold, so their p‑values tend to be larger, but they remain valid without assuming normality.
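
A sketch of both Wilcoxon tests next to their t‑test counterparts. The data are made up and chosen to contain no ties, so R can compute exact p‑values without warnings.

```r
# Made-up, tie-free measurements for illustration only.
cross <- c(21.2, 19.8, 22.5, 20.1, 23.0, 18.9, 21.7, 20.6)
self  <- c(18.4, 19.1, 20.3, 18.2, 21.4, 18.6, 19.2, 18.9)

nutes <- data.frame(
  Sex    = rep(c("F", "M"), each = 6),
  length = c(9.1, 8.7, 9.4, 8.9, 9.6, 9.0, 8.2, 8.5, 7.9, 8.4, 8.1, 8.6)
)

# Signed-rank test: the paired, non-parametric counterpart of the paired t-test.
signed_res <- wilcox.test(cross, self, paired = TRUE)

# Rank-sum test: the counterpart of the independent two-sample t-test.
ranksum_res <- wilcox.test(length ~ Sex, data = nutes)

# Side-by-side p-values for the paired comparison.
c(t_test   = t.test(cross, self, paired = TRUE)$p.value,
  wilcoxon = signed_res$p.value)
```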

10. Interpreting Results

  • Statistical significance (p < α) indicates evidence against the null hypothesis.
  • Practical significance: Even a statistically significant difference may be trivial in real‑world terms (e.g., a 5 kg yield increase on a 600 kg baseline).
  • Degrees of freedom reflect the amount of independent information. With n pairs, a paired test has n − 1 df, half of the 2n − 2 df obtained by treating the same data as two independent samples with equal variances.
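
The points above can be checked numerically. This sketch reuses made-up paired data, an assumed α of 0.05, and the 5 kg on 600 kg example from the text.

```r
# Made-up paired measurements (illustration only).
cross <- c(21.2, 19.8, 22.5, 20.1, 23.0, 18.9, 21.7, 20.6)
self  <- c(18.4, 19.1, 20.3, 18.2, 21.4, 18.6, 19.2, 18.9)
n     <- length(cross)
alpha <- 0.05   # assumed significance level

paired_res   <- t.test(cross, self, paired = TRUE)
unpaired_res <- t.test(cross, self, var.equal = TRUE)

# Degrees of freedom: n - 1 for the paired test, 2n - 2 for the pooled test.
df_paired   <- unname(paired_res$parameter)
df_unpaired <- unname(unpaired_res$parameter)
c(paired = df_paired, unpaired = df_unpaired)

# Statistical significance: compare the p-value with alpha.
reject_null <- paired_res$p.value < alpha
reject_null

# Practical significance: express a difference relative to its baseline.
# The 5 kg gain on a 600 kg baseline from the text is under 1 percent.
pct_gain <- 100 * 5 / 600
pct_gain
```

Reporting the relative size of an effect alongside the p‑value keeps a small but statistically significant difference in perspective.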

11. Common Pitfalls & Troubleshooting

  • Forgetting to set the working directory leads to “file not found” errors.
  • Not attaching a data frame before using $‑free variable names causes “object not found” messages.
  • Mis‑specifying paired= or var.equal= changes the test’s assumptions and can dramatically alter the outcome.
  • Adjust the plot pane size in RStudio to view multi‑panel graphics correctly.
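
Two of these pitfalls can be guarded against in the script itself. The sketch below assumes a hypothetical file name and uses tryCatch() to capture the error message instead of stopping the script.

```r
# Hypothetical file name; replace with your own.
csv_name <- "AlgeriaWeight.csv"

# Pitfall 1: check that the file is visible from the working directory.
if (file.exists(csv_name)) {
  AlgeriaWeight <- read.csv(csv_name)
} else {
  message("Cannot see ", csv_name, " in ", getwd(),
          "; check the working directory.")
}

# Pitfall 2: a bare column name fails unless the data frame is attached.
data(iris)
bare_name_result <- tryCatch(
  mean(Sepal.Length),                      # errors when iris is not attached
  error = function(e) conditionMessage(e)  # keep the message, do not stop
)
bare_name_result

# The $ form works whether or not the data frame is attached.
col_mean <- mean(iris$Sepal.Length)
col_mean
```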

12. Next Steps

Tomorrow the instructor will introduce regression analysis, building on the scatter‑plot and line‑of‑best‑fit concepts demonstrated today. Participants are encouraged to experiment with the scripts, modify plot parameters, and practice the t‑test and Wilcoxon workflows on their own data sets.


By the end of this session, learners should be comfortable with:

  1. Setting and verifying the working directory.
  2. Importing and inspecting data frames.
  3. Creating and customizing basic plots.
  4. Transforming variables and converting data types.
  5. Performing both parametric and non‑parametric hypothesis tests and interpreting p‑values.

Mastering the workflow—from setting the working directory, loading and reshaping data, to visualising relationships and conducting appropriate hypothesis tests—empowers you to perform rigorous statistical analyses in R without needing to watch the video again.
