From Data Visualization to Experimental Design: A Comprehensive Workshop Overview

YouTube video ID: xWGdzqYfW6k

Source: YouTube video by RUFORUMNetwork

Introduction

The session began with a brief check‑in and a reminder of the assignment given the previous day. Participants were invited to present their work on plotting different types of variables using R.

Assignment Presentation – Wycliffe

  • Objective: Produce three plots, one for each pairing of variable types: two quantitative variables, two qualitative variables, and one quantitative with one qualitative variable.
  • Tools Used:
    • tidyverse for data manipulation and plotting.
    • patchwork to combine multiple ggplot objects.
  • Workflow:
    • Load libraries (library(tidyverse), library(patchwork)).
    • Read the CSV file with read_csv() and convert gender and job_category to factors.
    • Plot 1 – Two quantitative variables (salary vs salary_begin):
      • ggplot(data, aes(x = salary, y = salary_begin)) + geom_point() + geom_smooth()
      • Added labels and increased text size for readability.
    • Plot 2 – Two qualitative variables (job_category filled by gender):
      • Bar chart showing counts per job category, colored by gender.
    • Plot 3 – Quantitative vs qualitative (salary_begin vs gender):
      • Reordered gender levels, log‑transformed salary, and used a boxplot.
    • Combine plots with patchwork (p1 / p2 / p3) and annotate with figure labels (a, b, c).
    • Save the final figure using ggsave() at 300 dpi.
  • Outcome: All three plots were generated, combined, and saved successfully.
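
The workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not Wycliffe's actual script: the data frame `employees` is simulated because the workshop CSV is not reproduced here, the column names follow the summary (`salary`, `salary_begin`, `gender`, `job_category`), and the axis labels, figure size, and output file name are made up. The smoother is written as `method = "loess"`, which is what ggplot2 picks by default for small data sets, only to make the choice explicit.

```r
library(ggplot2)
library(patchwork)

# Simulated stand-in for the workshop CSV (hypothetical values).
set.seed(42)
n <- 80
employees <- data.frame(
  gender       = sample(c("Female", "Male"), n, replace = TRUE),
  job_category = sample(c("Clerical", "Custodial", "Manager"), n, replace = TRUE),
  salary_begin = round(runif(n, min = 9000, max = 30000))
)
employees$salary <- round(employees$salary_begin * runif(n, min = 1.2, max = 2.5))

# Categorical columns are converted to factors before plotting.
employees$gender       <- factor(employees$gender)
employees$job_category <- factor(employees$job_category)

# Plot 1: two quantitative variables.
p1 <- ggplot(employees, aes(x = salary, y = salary_begin)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(x = "Current salary", y = "Starting salary") +
  theme(text = element_text(size = 14))

# Plot 2: two qualitative variables (counts per job category, filled by gender).
p2 <- ggplot(employees, aes(x = job_category, fill = gender)) +
  geom_bar() +
  labs(x = "Job category", y = "Count", fill = "Gender")

# Plot 3: quantitative vs qualitative, with reordered levels and a log scale.
employees$gender <- factor(employees$gender, levels = c("Male", "Female"))
p3 <- ggplot(employees, aes(x = gender, y = salary_begin)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Gender", y = "Starting salary (log scale)")

# Stack the three plots vertically and tag them a, b, c.
combined <- (p1 / p2 / p3) + plot_annotation(tag_levels = "a")

# Save at 300 dpi; the temporary path is a placeholder.
out_file <- file.path(tempdir(), "assignment_plots.png")
ggsave(out_file, plot = combined, width = 6, height = 10, dpi = 300)
```

In patchwork, `/` stacks plots vertically and `|` places them side by side, so the same three objects can be rearranged without rebuilding them.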

Feedback and Additional Presentations

  • The facilitator praised Wycliffe’s work, especially the use of patchwork.
  • Subsequent participants (Mona, Ayanna, etc.) attempted similar tasks, receiving guidance on appropriate plot types (e.g., boxplot for salary vs gender) and the importance of correct factor conversion.

Data Export and Reporting

  • Demonstrated how to export plots from RStudio’s Plots pane:
    • Use the Export button → Save as Image, Save as PDF, or Copy to Clipboard.
  • Insert the exported graphics into a Word document and add figure captions.
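
The session showed the point-and-click route. For a report that has to be regenerated, the same export can also be scripted; the sketch below uses base R's `pdf()` graphics device, which was not part of the demonstration. The built-in `mtcars` data set and the file name are placeholders.

```r
# Placeholder output path; in practice this would be a folder next to the report.
out_pdf <- file.path(tempdir(), "figure1.pdf")

pdf(out_pdf, width = 6, height = 4)   # open a PDF graphics device (size in inches)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")
invisible(dev.off())                  # close the device so the file is written
```

The file is only complete once `dev.off()` has run, which is the most common reason a scripted export appears empty.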

Research Process Overview

  • Research Cycle: Problem identification → Objectives → Hypotheses → Data collection → Analysis → Interpretation → Dissemination.
  • Emphasized that methodology must suit the discipline (social sciences, agriculture, etc.) and that research is iterative, not linear.

Variable Types and Appropriate Visualisations

Variable Pair                Recommended Plot
Quantitative‑Quantitative    Scatter plot (with smoothing line)
Qualitative‑Qualitative      Bar chart or cross‑tabulation
Quantitative‑Qualitative     Box plot (or bar chart of means)
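
The pairing rule can be written down as a small lookup. The helper below, `recommend_plot()`, is hypothetical and only restates the table; it treats numeric columns as quantitative and factor, character, or logical columns as qualitative.

```r
# Hypothetical helper: suggest a plot type from the classes of two variables.
recommend_plot <- function(x, y) {
  is_quant <- function(v) is.numeric(v)
  is_qual  <- function(v) is.factor(v) || is.character(v) || is.logical(v)

  if (is_quant(x) && is_quant(y)) {
    "scatter plot"
  } else if (is_qual(x) && is_qual(y)) {
    "bar chart or cross-tabulation"
  } else if ((is_quant(x) && is_qual(y)) || (is_qual(x) && is_quant(y))) {
    "box plot"
  } else {
    stop("unsupported variable types")
  }
}

recommend_plot(mtcars$mpg, mtcars$wt)                  # two numeric columns
recommend_plot(factor(mtcars$cyl), factor(mtcars$am))  # two factors
recommend_plot(mtcars$mpg, factor(mtcars$cyl))         # numeric with factor
```

A category stored as numbers (such as `cyl` in `mtcars`) is classified as quantitative until it is converted with `factor()`, which is why the factor conversion step matters before choosing a plot.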

Cross‑Tabulation

  • Used table() to create frequency tables for categorical variables (e.g., gender vs job category).
  • Converted frequencies to percentages with prop.table() and rounded for reporting.
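
A minimal sketch of these steps in base R, using a small made-up `staff` data frame in place of the workshop data:

```r
# Hypothetical data: eight employees with gender and job category.
staff <- data.frame(
  gender = factor(c("Female", "Female", "Female",
                    "Male", "Male", "Male", "Male", "Male")),
  job_category = factor(c("Clerical", "Clerical", "Manager",
                          "Clerical", "Custodial", "Manager", "Manager", "Manager"))
)

# Frequency table: rows are gender, columns are job category.
counts <- table(gender = staff$gender, job_category = staff$job_category)

# Row percentages (each gender sums to 100) and overall percentages, rounded.
row_pct     <- round(prop.table(counts, margin = 1) * 100, 1)
overall_pct <- round(prop.table(counts) * 100, 1)

print(counts)
print(row_pct)
```

The `margin` argument decides what the percentages add up to: `margin = 1` gives percentages within each row, `margin = 2` within each column, and omitting it gives percentages of the grand total.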

Fundamentals of Experimental Design

  1. Sources of Variation
    • Controlled variation (manipulated by the researcher – e.g., different breeds, fertilizers).
    • Uncontrolled variation (environmental factors like temperature, soil texture).
  2. Key Principles
    • Replication: Repeating a treatment on independent experimental units to estimate experimental error and increase precision.
    • Blocking: Grouping similar experimental units to reduce background noise.
    • Randomization: Assigning treatments to units by chance to avoid bias and satisfy statistical test assumptions.
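
The three principles can be seen together in a short randomization sketch in base R. The treatment labels, the number of replicates, and the seed are assumptions chosen for illustration. The first layout is a completely randomized design (no blocking); the second is a randomized complete block design, where each block contains every treatment once.

```r
set.seed(2024)                              # fixed seed so the layout is reproducible
treatments <- c("T1", "T2", "T3", "T4")     # hypothetical treatment labels
n_reps <- 3                                 # replication: each treatment appears 3 times

# Randomization without blocking: shuffle all treatment copies over all units.
crd <- data.frame(
  unit      = seq_len(length(treatments) * n_reps),
  treatment = sample(rep(treatments, times = n_reps))
)

# Randomization with blocking: shuffle the treatments separately inside each block.
rcbd <- do.call(rbind, lapply(seq_len(n_reps), function(b) {
  data.frame(block = b,
             plot = seq_along(treatments),
             treatment = sample(treatments))
}))

table(crd$treatment)                # each treatment replicated n_reps times
table(rcbd$block, rcbd$treatment)   # each treatment exactly once per block
```

Recording the seed alongside the layout lets the randomization be reproduced later, for example when the field plan has to be audited.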

Detailed Design Types

  • Completely Randomized Design (CRD): Assumes homogeneous experimental units; treatments are assigned randomly without blocking.
  • Randomized Complete Block Design (RCBD): Blocks are formed based on a known source of variation (e.g., soil fertility). Each block receives all treatments.
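  • Latin Square Design: Controls two sources of variation (rows and columns). Requires the number of rows, the number of columns, and the number of treatments to be equal, so t treatments need a t × t layout; each treatment appears exactly once in every row and every column.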
  • Split‑Plot Design: Used when one factor requires larger experimental units (main plot) and another factor can be applied to sub‑plots. Different error terms are estimated for main‑plot and subplot factors.
  • Incomplete Block Designs: When a block cannot accommodate all treatments. Includes:
    • Balanced Incomplete Block (BIBD): Every pair of treatments occurs together the same number of times.
    • Partially Balanced Incomplete Block (PBIBD): Pairs occur with varying frequencies, allowing fewer replicates.
    • Resolvable (Alpha) Lattice Designs: Groups smaller incomplete blocks into “super‑blocks” that together form a complete replicate. Useful when statutory constraints dictate a fixed number of treatments and replicates.
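
Of the designs listed above, the Latin square is simple enough to build by hand, which makes its defining property easy to check. The sketch below is one common construction (a cyclic square with rows and columns shuffled), not necessarily the method used in the workshop; the four treatment labels and the seed are assumptions.

```r
set.seed(7)
treatments <- c("A", "B", "C", "D")   # hypothetical treatment labels
n_trt <- length(treatments)           # rows = columns = treatments

# Cyclic square: cell (i, j) holds treatment index ((i + j - 2) mod n_trt) + 1.
idx <- outer(seq_len(n_trt), seq_len(n_trt),
             function(i, j) ((i + j - 2) %% n_trt) + 1)

# Shuffling whole rows and whole columns keeps the Latin property intact.
idx <- idx[sample(n_trt), sample(n_trt)]

latin <- matrix(treatments[as.vector(idx)], nrow = n_trt,
                dimnames = list(paste0("row", seq_len(n_trt)),
                                paste0("col", seq_len(n_trt))))
latin
```

Shuffling rows and columns of a single cyclic square does not reach every possible Latin square, so dedicated design software may randomize more thoroughly.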

Practical Examples Discussed

  • Agricultural field trial: Blocking by soil fertility and shading; randomization within each block.
  • Animal study: When a treatment is applied to a whole cage, the cage (not the individual animal) is the experimental unit; counting each cage as one replicate avoids pseudo‑replication.
  • Split‑Plot with Tomato Varieties and Pesticide Levels: Main plot = pesticide (applied to whole rows), subplot = tomato variety (applied within rows).
  • Alpha Lattice for 78 rice lines: Two replicates of five‑unit blocks to meet legal trial requirements.
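
The tomato split-plot example above involves two separate randomizations, which a short base R sketch makes concrete. The pesticide levels, variety names, number of blocks, and seed are all hypothetical; only the roles of the two factors (pesticide on main plots, variety on subplots) come from the session.

```r
set.seed(11)
pesticide <- c("none", "low", "high")     # main-plot factor (hypothetical levels)
variety   <- c("V1", "V2", "V3", "V4")    # subplot factor (hypothetical names)
n_blocks  <- 3                            # assumed number of replicate blocks

layout <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  main_order <- sample(pesticide)         # step 1: randomize pesticide over main plots
  do.call(rbind, lapply(seq_along(main_order), function(m) {
    data.frame(block     = b,
               main_plot = m,
               pesticide = main_order[m],
               subplot   = seq_along(variety),
               variety   = sample(variety))  # step 2: randomize varieties within the main plot
  }))
}))

head(layout)
```

Because pesticide is randomized over main plots and variety over subplots, the two factors are tested against different error terms. With a response column such as a hypothetical `yield` and `block` stored as a factor, one common way to request that in base R is `aov(yield ~ pesticide * variety + Error(block / pesticide))`.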

Q&A Highlights

  • Clarified the difference between experimental units (the smallest units to which a treatment is independently applied) and sampling units (the units on which the response is actually measured, which may be smaller than the experimental unit).
  • Discussed how to handle unbalanced replicates and the role of covariates in analysis.
  • Explained why null hypotheses are used for statistical testing, while research hypotheses guide experimental design.
  • Addressed how to decide the number of replicates based on variability, effect size, desired confidence, and resource constraints.
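
For the simplest case of comparing two treatments, the trade-off between variability, effect size, and replication can be explored with `power.t.test()` from base R's stats package. The session summary does not say which tool was used, so this is only one possible sketch; the planning values (a difference of 1 unit, standard deviations of 1 and 2, 5% significance, 80% power) are hypothetical.

```r
# Replicates per treatment needed to detect a difference of 1 unit
# when the standard deviation between experimental units is 1.
plan <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)
reps_needed <- ceiling(plan$n)

# Same effect size, but twice the variability between units.
plan_noisy <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80)
reps_noisy <- ceiling(plan_noisy$n)

c(baseline = reps_needed, noisy = reps_noisy)
```

The calculation assumes a two-sample t-test; designs with more treatments or with blocking need a power calculation matched to their own analysis, and the result still has to be weighed against available land, animals, or budget.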

Closing Remarks

  • Participants were reminded to run the provided R scripts, install the agricolae package, and prepare for the next morning’s practical session on experimental design implementation.
  • The facilitator thanked everyone and wished them a productive remainder of the day.

Effective research hinges on two pillars: clear, reproducible data visualisation and rigorously planned experimental designs. Mastering tools like ggplot2 and patchwork enables insightful graphics, while understanding replication, blocking, and randomisation ensures that experiments yield unbiased, precise, and interpretable results.
