Comprehensive Guide to Importing, Cleaning, and Analyzing Data in R

YouTube video ID: Ic9PU70slN4

Source: YouTube video by RUFORUMNetwork

Recap of the First Session

  • Installed R (v4.4.0) and RStudio.
  • Updated both the R version and RStudio interface.
  • Loaded essential packages (e.g., tidyverse, readxl).
  • Learned basic syntax: variable assignment (<- or =), case‑sensitivity, using R as a calculator, and the importance of specifying measurement levels (categorical vs. numeric).

Setting Up Your Workspace

  1. Folder Structure – Keep a dedicated folder (e.g., R_Training/Day1) on Desktop or Documents containing all scripts and data files.
  2. Working Directory – In RStudio: Session → Set Working Directory → Choose Directory. This tells R where to look for files and avoids path errors.
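  3. Clearing the Environment – Click the broom icon in the Environment pane or run rm(list = ls()) to remove all objects and start with a clean environment. (The broom icon in the Console pane, or Ctrl+L, clears only the console text.)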
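
The folder and working-directory steps above can also be scripted. This is a minimal sketch, assuming a hypothetical project folder created under tempdir() so that it runs on any machine; in practice you would point setwd() at your own R_Training/Day1 folder.

```r
# Hypothetical project folder; tempdir() stands in for Desktop or Documents
project_dir <- file.path(tempdir(), "R_Training", "Day1")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# setwd() changes the working directory and returns the previous one
old_wd <- setwd(project_dir)

current_wd <- getwd()   # confirm where R will now look for files
print(current_wd)
print(list.files())     # files R can see without a full path

# Restore the previous working directory (only needed in this demo)
setwd(old_wd)
```

Running getwd() right after setwd() is a quick way to catch a mistyped path before any import fails.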

Importing Data

1. Excel Files (.xlsx)

library(readxl)
my_data <- read_excel("file.xlsx")
  • Use the Import Dataset menu → From Excel for a GUI shortcut.
  • After import, RStudio shows the number of observations and variables in the Environment pane.

2. CSV Files (.csv)

my_data <- read.csv("file.csv", stringsAsFactors = FALSE)
  • Ensure the working directory points to the folder containing the CSV.
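  • stringsAsFactors = FALSE keeps character columns as characters until you explicitly convert them. This has been the default since R 4.0.0, so in v4.4.0 the argument is optional but harmless to state.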
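
To see the CSV import without needing a file on disk, the sketch below writes a small made-up CSV to a temporary file and reads it back. The column names and values are illustrative assumptions, not the training dataset.

```r
# Write a small, made-up CSV to a temporary file (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("student,year,Q_score",
             "Amina,First,14",
             "Brian,Second,9",
             "Chen,First,17"),
           csv_path)

# Import it exactly as in the step above
my_data <- read.csv(csv_path, stringsAsFactors = FALSE)

print(class(my_data$student))   # text column stays character
print(class(my_data$Q_score))   # whole numbers are read as integer
print(dim(my_data))             # number of rows, then number of columns
```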

Inspecting the Imported Data

  • head(my_data) – first six rows (or specify a number: head(my_data, 3)).
  • str(my_data) – structure, data types, and factor levels.
  • summary(my_data) – quick descriptive statistics for all variables.
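
The three inspection functions can be tried on a small data frame. This sketch uses made-up values (one of them deliberately missing) as a stand-in for the imported data.

```r
# Made-up stand-in for an imported dataset; one quiz score is missing
my_data <- data.frame(
  student    = c("Amina", "Brian", "Chen", "Dede", "Efua", "Farid", "Grace"),
  year       = c("First", "Second", "First", "Third", "Second", "First", "Third"),
  quiz_score = c(14, 9, 17, 11, NA, 15, 12),
  stringsAsFactors = FALSE
)

first_rows <- head(my_data, 3)   # first three rows only
print(first_rows)

str(my_data)                     # one line per column: name, type, first values

score_summary <- summary(my_data$quiz_score)
print(score_summary)             # includes a count of missing values (NA's)
```

summary() reports missing values separately, which makes it a convenient first check for incomplete columns.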

Converting Characters to Categorical Variables (Factors)

my_data$parent_school <- as.factor(my_data$parent_school)
my_data$rank         <- as.factor(my_data$rank)
  • For many columns at once:
library(dplyr)
my_data <- my_data %>% mutate(across(where(is.character), as.factor))
  • After conversion, str(my_data) will show Factor with the appropriate number of levels.
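
For readers who want to see what the multi-column conversion does without loading dplyr, here is a base R sketch of the same idea (not the call used in the session). The data are made up, and the explicit level order for rank is an assumption added for illustration.

```r
# Made-up data with two character columns and one numeric column
my_data <- data.frame(
  parent_school = c("Public", "Private", "Public", "Private"),
  rank          = c("Low", "High", "Medium", "High"),
  quiz_score    = c(12, 18, 15, 17),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors
char_cols <- sapply(my_data, is.character)
my_data[char_cols] <- lapply(my_data[char_cols], as.factor)

print(levels(my_data$rank))   # default order is alphabetical: High, Low, Medium

# When the categories have a natural order, state the levels explicitly
my_data$rank <- factor(my_data$rank, levels = c("Low", "Medium", "High"))
print(levels(my_data$rank))
```

Setting the level order matters later for tables and plots, which list categories in level order.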

Basic Descriptive Statistics

  • Continuous variables: mean(), sd(), summary().
  • Categorical variables: table(my_data$parent_school) for frequencies; prop.table(table(my_data$parent_school)) for proportions (multiply by 100 for percentages).
  • The summarytools package offers a one‑step freq() function that returns counts, percentages, and cumulative percentages, and reports missing values.
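
A short base R sketch of these descriptive statistics, using made-up vectors so the results are easy to check by hand:

```r
# Made-up values for illustration
quiz_score    <- c(10, 12, 14, 16, 18)
parent_school <- factor(c("Public", "Private", "Public", "Public", "Private"))

# Continuous variable
score_mean <- mean(quiz_score)
score_sd   <- sd(quiz_score)     # sample standard deviation (divides by n - 1)
print(c(mean = score_mean, sd = score_sd))

# Categorical variable
counts      <- table(parent_school)    # frequencies per category
proportions <- prop.table(counts)      # values between 0 and 1
percentages <- round(100 * proportions, 1)
print(counts)
print(percentages)
```

prop.table() needs a table as input, so it is always applied to the result of table() rather than to the raw column.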

Creating New Variables with Conditional Logic

my_data$financial_literacy <- ifelse(my_data$quiz_score > mean(my_data$quiz_score),
                                      "Literate", "Illiterate")
  • ifelse() tests a logical condition and assigns values accordingly.
  • The new variable is added as a column of my_data (the Environment pane shows one more variable) and can be examined with table(my_data$financial_literacy).
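
One caveat worth sketching: if quiz_score contains a missing value, mean() returns NA and every comparison becomes NA. The example below, on made-up scores, assumes you want to ignore missing values when computing the cutoff; na.rm = TRUE and useNA = "ifany" are additions for illustration, not part of the session's code.

```r
# Made-up scores with one missing value
my_data <- data.frame(quiz_score = c(8, 12, 15, NA, 10))

# Ignore the NA when computing the mean; without na.rm the cutoff would be NA
cutoff <- mean(my_data$quiz_score, na.rm = TRUE)
print(cutoff)

my_data$financial_literacy <- ifelse(my_data$quiz_score > cutoff,
                                     "Literate", "Illiterate")

# The row with a missing score stays NA; useNA makes it visible in the table
print(table(my_data$financial_literacy, useNA = "ifany"))
```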

Renaming, Subsetting, and Dropping Columns

  • Rename (using dplyr):
my_data <- rename(my_data, quiz_score = Q_score)
  • Subset rows (e.g., only first‑year students):
first_year <- filter(my_data, year == "First")
  • Select / drop columns:
my_data_reduced <- select(my_data, -parent_school, -rank)
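
If dplyr is not installed, the same three operations can be approximated in base R. This is a sketch of base R equivalents on made-up data, not the dplyr calls shown above.

```r
# Made-up data using the column names from this section
my_data <- data.frame(
  Q_score       = c(14, 9, 17),
  year          = c("First", "Second", "First"),
  parent_school = c("Public", "Private", "Public"),
  rank          = c("Low", "High", "Medium"),
  stringsAsFactors = FALSE
)

# Rename: replace the old column name with the new one
names(my_data)[names(my_data) == "Q_score"] <- "quiz_score"

# Subset rows: keep only first-year students
first_year <- my_data[my_data$year == "First", ]

# Drop columns: keep every column whose name is not in the list
drop_cols <- c("parent_school", "rank")
my_data_reduced <- my_data[, !(names(my_data) %in% drop_cols)]

print(first_year)
print(names(my_data_reduced))
```

Assigning the result to a new object, as with my_data_reduced, leaves the original data frame intact in case a dropped column is needed later.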

Workflow Tips

  • Reproducibility: Save the full script (.R file) and always start with setwd() and library() calls.
  • Error handling: Read R’s error messages; they often point to missing packages or incorrect file paths.
  • Practice: Use the provided Google Drive folder, the YouTube recordings, and the Day1 script to repeat each step until it feels automatic.
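
The two most common errors mentioned above, a missing package and a wrong file path, can be checked at the top of a script before anything else runs. This is a minimal sketch; the package list and the file name "file.csv" are placeholders.

```r
# Placeholder list of packages the script depends on
required_pkgs <- c("readxl", "dplyr")

# requireNamespace() returns FALSE instead of stopping when a package is absent
installed    <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these packages first: ", paste(missing_pkgs, collapse = ", "))
}

# Placeholder file name; file.exists() looks in the working directory
data_file  <- "file.csv"
file_found <- file.exists(data_file)

if (!file_found) {
  message("Cannot find ", data_file, " in ", getwd(),
          ". Check the working directory.")
}
```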

What Comes Next?

The next session will cover exploratory data analysis, visualisation (histograms, bar charts, pie charts), and an introduction to inferential statistics (Chi‑square, correlation, ANOVA). You will also learn how to fit simple regression models using the tidyverse workflow.


Key take‑away: Mastering data import, proper variable typing, and basic manipulation in R creates a solid foundation for any statistical analysis you will perform later.

By following the step‑by‑step procedures for setting the working directory, importing Excel or CSV files, converting character columns to factors, and creating new variables, you can confidently prepare any dataset for analysis in R and move swiftly into more advanced statistical techniques.
