Understanding Principal Component Analysis, Multidimensional Scaling, Correspondence and Factor Analysis – A Practical Guide


YouTube video ID: rCg8-KR6PPU

Source: YouTube video by RUFORUMNetwork


Introduction

The session resumed after a technical glitch and covered the most frequently used multivariate techniques for data reduction and interpretation: Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CA) and Factor Analysis (FA). The lecturer emphasized when each method is appropriate, how they are computed, and how to implement them in R.

Principal Component Analysis (PCA)

  • Purpose: Reduce a high‑dimensional data set (e.g., 50 variables) to a few uncorrelated components that retain most of the variance.
  • How it works: New variables (PC1, PC2, …) are linear combinations of the original variables with coefficients analogous to regression weights. PC1 explains the largest amount of variability, PC2 the next largest, and so on. The components are orthogonal (uncorrelated).
  • Visualization: Biplots (score‑plot + loading vectors) reveal patterns or clusters in the original high‑dimensional space.
  • When to use: When variables are quantitative and either the data are high‑dimensional (possibly more variables than observations) or multicollinearity hampers regression.
  • Limitations: PCA does not handle categorical data directly and can be mis‑used if the underlying assumptions (linearity, large sample size) are ignored.
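
The mechanics described above can be sketched numerically. The following is an illustrative Python/NumPy computation on simulated data (the lecture itself used R; variable counts and data here are made up):

```python
import numpy as np

# Simulated data: 100 observations on 5 quantitative variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # induce correlation between two variables

# Standardize so that no single variable dominates through its scale
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized matrix yields the principal components
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                     # component scores (PC1, PC2, ...)
explained = s**2 / (s**2).sum()       # proportion of variance per component
```

Here `explained` is sorted in decreasing order (PC1 first) and the columns of `scores` are mutually uncorrelated, matching the orthogonality property described above.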

Multidimensional Scaling (MDS)

  • Concept: Instead of operating on the raw data matrix, MDS first computes a pair‑wise distance (or dissimilarity) matrix and then finds a low‑dimensional configuration that preserves those distances.
  • Types:
  • Classical (Metric) MDS: Assumes the distance matrix satisfies metric properties; essentially equivalent to Principal Coordinate Analysis.
  • Non‑metric MDS: Works with ordinal or non‑metric distances (e.g., expert judgments).
  • Use case: Helpful when the original data contain missing values or when a distance‑based view is more natural than a variable‑based view.
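
The distance-first construction behind classical MDS can be sketched with double centering. This is an illustrative Python computation on simulated points (SciPy is assumed only for building the distance matrix):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Simulated points in 3-D; classical MDS works only from their pairwise distances
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
D = squareform(pdist(X))               # pairwise Euclidean distance matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
B = -0.5 * J @ (D**2) @ J              # double-centered squared distances

# Eigendecomposition of B gives the principal coordinates
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
coords = vecs[:, :2] * np.sqrt(np.maximum(vals[:2], 0))   # 2-D configuration
```

With as many coordinates as the true dimensionality, the configuration reproduces the original distances exactly (up to rotation and translation), which is why classical MDS is essentially Principal Coordinate Analysis.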

Correspondence Analysis (CA)

  • Target data: Categorical variables summarized in a contingency table.
  • Procedure: Performs a chi‑square test, converts observed/expected frequencies into a matrix of relative frequencies, then applies a decomposition similar to PCA to obtain principal coordinates for rows and columns.
  • Interpretation: Plots display associations between categories (e.g., smoking level vs. staff position) in a low‑dimensional space.

Factor Analysis (FA)

  • Goal: Identify latent (unobservable) constructs—factors—that explain the common variance among observed variables.
  • Two branches:
  • Exploratory FA (EFA): No a‑priori constraints; used to discover the number and composition of factors.
  • Confirmatory FA (CFA): Theory‑driven; specifies which variables load on which factors and tests model fit (often via structural equation modelling).
  • Key concepts:
  • Communality: Portion of each variable’s variance shared with other variables (the part explained by the factors).
  • Uniqueness: Variable‑specific variance plus random error.
  • Factor loadings: Correlations between variables and factors; high absolute loadings (>0.3) indicate strong association.
  • Rotation: Orthogonal (e.g., varimax) yields independent factors; oblique (e.g., promax) allows correlated factors, improving interpretability.
  • Practical steps in R: Install psych and GPArotation, read the data (read.csv), test the correlation matrix (Bartlett’s test), extract factors (fa()), examine loadings, compute factor scores (regression or weighted‑average), and optionally rotate.
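
As an illustrative counterpart to the R workflow, here is a minimal sketch using scikit-learn's FactorAnalysis on simulated test scores (the loading pattern and data are made up; psych's fa() in R offers more extraction and rotation options):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate 8 test scores driven by 2 latent factors, echoing the lecture's example
rng = np.random.default_rng(2)
F = rng.normal(size=(300, 2))                      # latent factor scores
L = np.array([[0.80, 0.00], [0.70, 0.10], [0.10, 0.80], [0.75, 0.05],
              [0.00, 0.70], [0.80, 0.10], [0.10, 0.75], [0.05, 0.80]])
X = F @ L.T + 0.4 * rng.normal(size=(300, 8))      # observed scores + unique noise

fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(X)                       # factor scores per observation

loadings = fa.components_.T                        # variable-by-factor loadings
communality = (loadings**2).sum(axis=1)            # variance shared with the factors
uniqueness = fa.noise_variance_                    # variable-specific variance + error
```

Loadings above 0.3 in absolute value flag which tests belong to which factor, and communality plus uniqueness decompose each variable's variance as described above.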

Practical R Demonstration

  1. Setup – Install and load psych and GPArotation.
  2. Working directory – Set to the folder containing test_score.csv and the R script.
  3. Read data – data <- read.csv("test_score.csv").
  4. Check suitability – Bartlett’s test of sphericity; a significant p‑value indicates that variables are correlated enough for FA/PCA.
  5. Correlation matrix – cor(data) reveals two clusters of tests (1,2,4,6) and (3,5,7,8).
  6. Run FA – fa(data, nfactors = 2, rotate = "varimax").
  7. Interpretation – Loadings show which tests belong to each factor; communalities indicate how much variance each test shares with the factors.
  8. Variance explained – The first two factors together account for ~78 % of total variance, confirming that two factors are sufficient.
  9. Factor scores – Obtain individual‑level scores for further analysis (e.g., regression, clustering).

Common Pitfalls & Tips

  • Never run PCA/FA when the correlation matrix is close to the identity – uncorrelated variables share no variance, so no reduction is possible.
  • Standardize variables when they are on different scales; otherwise the component with the largest variance dominates.
  • Avoid over‑extraction – Adding more factors yields diminishing returns; use eigenvalues >1, scree plot, or chi‑square tests to decide.
  • Interpretation matters – Statistical output is only useful when you can map factors to meaningful constructs (e.g., “sociability” vs. “consideration”).
  • Software nuances – In R, the fa() function can accept either a raw data matrix, a correlation matrix, or a covariance matrix; results should be consistent across these inputs.
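
The eigenvalue-greater-than-one rule mentioned above can be checked directly on the correlation matrix. A minimal NumPy sketch on simulated data (one shared factor among four variables, four pure-noise variables):

```python
import numpy as np

# Simulated example: 4 correlated variables (one shared factor) + 4 noise variables
rng = np.random.default_rng(3)
f = rng.normal(size=(200, 1))
X = np.hstack([f + 0.5 * rng.normal(size=(200, 4)),   # correlated block
               rng.normal(size=(200, 4))])            # pure noise

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

n_keep = int((eigvals > 1).sum())    # Kaiser criterion: retain eigenvalues > 1
```

Plotting `eigvals` against their rank gives the scree plot; the "elbow" and the eigenvalue-1 cutoff usually agree on clear-cut structures like this one.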

Choosing the Right Method

Situation → Recommended technique

  • Quantitative variables, many correlated predictors → PCA (data reduction), then regression on component scores
  • Need to visualise similarity of observations (including missing data) → Classical or non‑metric MDS
  • Categorical data in a contingency table → Correspondence Analysis
  • Underlying latent constructs (e.g., burnout, socioeconomic status) → Exploratory FA, then Confirmatory FA if theory exists

Conclusion

The session equipped participants with a clear roadmap: start by assessing the correlation structure, select the appropriate multivariate technique (PCA, MDS, CA, or FA), execute the analysis in R, and interpret the results through loadings, communalities, and rotations. Understanding the assumptions and strengths of each method prevents misuse and enables robust, interpretable dimensionality reduction.

Effective dimensionality reduction hinges on matching the data type and research goal to the right multivariate tool—PCA for quantitative reduction, MDS for distance‑based visualisation, CA for categorical associations, and FA for uncovering latent constructs—while carefully checking assumptions and interpreting loadings.
