Mastering Data Analysis with R: A Comprehensive Recap of the CEAFSN Short Training Session
Introduction
The virtual workshop opened with a warm welcome from the Center of Excellence in Agricultural Systems and Nutrition (CEAFSN) and its partner, the World Bank‑funded REFORM program. Participants were reminded to stay connected via the Zoom link and the live YouTube stream, and to complete the attendance sheet with a WhatsApp‑compatible contact number for future communications.
Background of the Training Program
- Host: CEAFSN, University of ...
- Funding: World Bank (REFORM)
- Purpose: Build capacity for agri‑food system transformation in Africa through a series of short, non‑academic courses.
- Course Schedule (first four weeks):
  - Data Analysis with R (this week)
  - Proposal Writing (next week)
  - Advanced Statistical Experimental Design (following week)
  - Academic Writing Skills (two weeks later)
- Future Offerings: Additional courses will be added after the initial four.
Objectives and Expected Outcomes of the R Training
- Equip participants with the ability to download, install, and configure R and RStudio.
- Teach data import, manipulation, and export across common formats.
- Introduce R Markdown for reproducible reporting (PDF, HTML, PowerPoint).
- Demonstrate data‑visualisation techniques for high‑quality graphics.
- Provide hands‑on experience with inferential statistics (correlation, regression, categorical analysis, GLMs).
- By the end of the session, participants should be able to:
  - Explain experimental design and survey concepts.
  - Perform correlation, regression, categorical analysis, and GLMs in R.
  - Apply R to quantitative and qualitative data from agriculture, health, economics, etc.
Participant Demographics & Pre‑Training Assessment
- Gender: Majority male, fewer female participants.
- Academic Level: 45% PhD, 35% MSc, 15% BSc, remainder researchers/lecturers.
- Geographic Reach: Kenya (782), Mozambique (531), Uganda (473), Nigeria (184), Malawi (127), Benin (107), Tanzania (103), Ethiopia (97), Zimbabwe (91), South Africa (79) plus participants from the USA, UK, Turkey, etc.
- Research Stage: 60% were at the data‑analysis phase, followed by concept/proposal development.
- Self‑Rated R Knowledge: 45% reported no knowledge, 42% some knowledge, 5% neutral, <10% good, <1% excellent – confirming the need for this introductory course.
- Key Expectations (from questionnaire): Data manipulation, visualization, inferential statistics, experimental design, multivariate analysis, and domain‑specific topics such as breeding, molecular and geospatial analysis.
Overview of R and RStudio
- R: Open‑source statistical programming language; free to download and use.
- RStudio: Integrated Development Environment (IDE) that works with R (R can run without RStudio, but RStudio cannot run without R). It provides:
  - Console, script editor, environment pane, plots pane, packages pane, help pane, and viewer.
  - Syntax highlighting, code history, and project management.
Step‑by‑Step Installation Guide
- Download R
  - Go to https://www.r-project.org (or type "download R" in a search engine).
  - Click Download R → choose the appropriate OS (Windows, macOS, Linux).
  - For Windows, select R‑4.3.1‑win.exe (latest version at the time of training).
  - Run the installer and follow the default prompts.
- Download RStudio
  - Visit https://www.rstudio.com/products/rstudio/download/.
  - Click the Download button under the RStudio Desktop Open‑Source License for your OS.
  - Run the resulting .exe (or .dmg on macOS) and complete the installation.
- Verify Installation
  - Open RStudio; the console should display the R version (e.g., R version 4.3.1).
  - If an older R version is detected, update via the same download steps or, on Windows only (the installr package does not support other systems), run in R:

        install.packages("installr")
        library(installr)
        updateR()

- Update RStudio (if needed)
  - In RStudio, go to Help → Check for Updates and follow the prompts.
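The version check in the Verify Installation step can also be done from the console rather than by reading the startup banner. The following is a minimal sketch, assuming 4.3.1 (the version used in the training) as the minimum; the variable names `required` and `is_current` are introduced here for illustration.

```r
# Print the full version string of the running R installation
print(R.version.string)

# Compare the running version against the one used in the training.
# `required` is an illustrative name, not something from the workshop.
required <- "4.3.1"
is_current <- getRversion() >= required

if (is_current) {
  message("R ", getRversion(), " meets the training requirement.")
} else {
  message("R ", getRversion(), " is older than ", required, "; consider updating.")
}
```

`getRversion()` returns a version object that compares correctly component by component, which avoids the pitfalls of comparing version numbers as plain text.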
Getting Familiar with the RStudio Interface
- Four main panes:
  - Source editor (top‑left) – write scripts (.R files).
  - Console (bottom‑left) – where commands are executed.
  - Environment/History (top‑right) – view loaded objects and command history.
  - Files/Plots/Packages/Help/Viewer (bottom‑right) – manage files, view plots, install/load packages, access help, and view web content.
- Creating a new script: File → New File → R Script. Save with a meaningful name (e.g., day1_analysis.R).
- Running code: Place the cursor on a line and click Run (or press Ctrl+Enter).
- Saving work: Click the disk icon or press Ctrl+S. A script with unsaved changes shows its tab name in red with an asterisk.
Installing and Loading Packages
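- Common packages introduced:
  - tidyverse – data manipulation and visualization.
  - readxl – import Excel files.
  - dplyr, ggplot2, tidyr – core tidyverse packages, attached by library(tidyverse).
  - epiDisplay – a separate CRAN package for displaying epidemiological data; not part of the tidyverse. Note the capital D: package names are case‑sensitive.
- Installation methods:
  - Command line: install.packages("tidyverse").
  - RStudio GUI: Tools → Install Packages….
  - Batch install: install.packages(c("tidyverse", "readxl", "ggplot2")).
- Loading a package: library(tidyverse); repeat for each required package.
- Verification: If library() finishes without an error message, the package loaded successfully. Startup messages, such as the tidyverse list of attached packages and conflicts, are informational and are not errors.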
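Installing a package that is already present wastes time and bandwidth, so a common pattern is to install only what is missing and then load everything. The sketch below is one way to do that; `missing_packages` and the `wanted` vector are names made up for this example and were not part of the session.

```r
# Packages named in the session; edit this vector to suit your own work.
wanted <- c("tidyverse", "readxl")

# Return the subset of `pkgs` that is not yet installed.
# `missing_packages` is a helper defined here, not a base R function.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)

# Install only what is missing, and only in an interactive session,
# so that sourcing the script never triggers an unexpected download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Load each package that is installed; `character.only = TRUE` lets
# library() accept the package name as a string.
for (pkg in setdiff(wanted, missing_packages(wanted))) {
  library(pkg, character.only = TRUE)
}
```

Keeping the package list in one vector at the top of a script makes it easy to see what an analysis depends on.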
Basic R Commands Demonstrated
- Using R as a calculator: 2 + 2, 6 * 9 - 4.
- Assigning variables: a <- 10 or a = 10.
- Simple arithmetic with variables: a + b.
- Comments: Prefix with # – ignored by R, useful for notes.
- Help system: help("t.test") or ?t.test opens the Help pane with usage, arguments, and examples.
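The commands above can be collected into a short script and run line by line with Ctrl+Enter. This is a sketch only: the variable `b`, its value, and the name `total` are chosen here for illustration.

```r
# R as a calculator
2 + 2        # 4
6 * 9 - 4    # 50 (multiplication is evaluated before subtraction)

# Assignment: `<-` is the conventional operator; `=` also works at top level
a <- 10
b = 5        # `b` and its value are chosen here for illustration

# Arithmetic with variables
total <- a + b
print(total) # 15

# Open the help page for t.test (only meaningful in an interactive session)
if (interactive()) {
  help("t.test")
}
```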
Importing Data from Excel
- Prepare the file (e.g., lab2_data.xlsx) in the working folder.
- In RStudio: File → Import Dataset → From Excel….
- Browse to the file, preview the data, and click Import.
- RStudio generates the code, typically:

      library(readxl)
      lab2_data <- read_excel("path/to/lab2_data.xlsx")
      View(lab2_data)

- Verify the import by checking the Environment pane (object name and dimensions).
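A frequent stumbling block with the generated import code is a wrong file path. The following sketch wraps the import in a small check that reports the working directory when the file is not found. The helper name `import_excel` and the default `sheet = 1` are assumptions made for this example; only `readxl::read_excel()` itself comes from the readxl package.

```r
# `import_excel` is a helper written for this example, not a readxl function.
import_excel <- function(path, sheet = 1) {
  # Fail early with a readable message if the path is wrong
  if (!file.exists(path)) {
    stop("File not found: ", path,
         " (current working directory: ", getwd(), ")")
  }
  # readxl::read_excel() returns a tibble; `sheet` may be a position or a name
  readxl::read_excel(path, sheet = sheet)
}

# Usage, assuming lab2_data.xlsx sits in the working folder:
# lab2_data <- import_excel("lab2_data.xlsx")
# dim(lab2_data)   # number of rows and columns
# head(lab2_data)  # first six rows
```

Calling `readxl::read_excel()` with the package prefix means the function works even if `library(readxl)` has not been run, as long as the package is installed.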
Homework and Next Steps
- By tomorrow:
  - Ensure R 4.3.1 and the latest RStudio are installed.
  - Install and load the packages listed in the PDF (at least tidyverse and readxl).
  - Save a script named day1.R with the installation and library commands.
- Before the next session:
  - Review the PDF that lists additional useful packages (dplyr, ggplot2, epiDisplay, etc.).
  - Practice importing an Excel file and exploring it with View() and basic summary() commands.
- Support channels:
  - Post technical questions in the Q&A chat during live sessions.
  - Use the recorded YouTube stream and the shared Google Drive folder for reference materials.
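For the exploration practice in the homework, the same commands work on any data frame. The sketch below uses R's built-in mtcars dataset as a stand-in for an imported Excel sheet, so it runs without any file; the object name `dat` is arbitrary.

```r
# Stand-in for an imported dataset; replace with your own object
dat <- mtcars

dim(dat)      # rows and columns: 32 11
str(dat)      # column names, types, and first values
head(dat, 5)  # first five rows
summary(dat)  # min, quartiles, mean, and max for each numeric column

# Summary of a single column, selected with `$`
summary(dat$mpg)

# View() opens a spreadsheet-style viewer; it needs an interactive session
if (interactive()) {
  View(dat)
}
```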
Conclusion
The first day of the CEAFSN short‑course series covered the foundations: installing R and RStudio, navigating the IDE, managing packages, and importing data from Excel. With most attendees reporting little or no prior R experience, the hands‑on, step‑by‑step format was intended to leave participants ready for the statistical analyses in the upcoming modules.
With installation, basic commands, and package management in place, participants can move on to the data analyses that will support their research and grant proposals, in line with the broader goal of transforming Africa's agri‑food systems.