Understanding Principal Component Analysis, Multidimensional Scaling, Correspondence and Factor Analysis – A Practical Guide


YouTube video ID: rCg8-KR6PPU

Source: YouTube video by RUFORUMNetwork


Introduction

The session resumed after a technical glitch and covered the most frequently used multivariate techniques for data reduction and interpretation: Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CA) and Factor Analysis (FA). The lecturer emphasized when each method is appropriate, how they are computed, and how to implement them in R.

Principal Component Analysis (PCA)

  • Purpose: Reduce a high‑dimensional data set (e.g., 50 variables) to a few uncorrelated components that retain most of the variance.
  • How it works: New variables (PC1, PC2, …) are linear combinations of the original variables with coefficients analogous to regression weights. PC1 explains the largest amount of variability, PC2 the next largest, and so on. The components are orthogonal (uncorrelated).
  • Visualization: Biplots (score‑plot + loading vectors) reveal patterns or clusters in the original high‑dimensional space.
  • When to use: When variables are quantitative and either the data are high‑dimensional (possibly more variables than observations) or multicollinearity hampers regression.
  • Limitations: PCA does not handle categorical data directly and can be mis‑used if the underlying assumptions (linearity, large sample size) are ignored.
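
The mechanics described above can be sketched numerically. The following is an illustrative Python/NumPy computation on simulated data (the lecture itself used R; variable counts and data here are made up):

```python
import numpy as np

# Simulated data: 100 observations on 5 quantitative variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # induce correlation between two variables

# Standardize so that no single variable dominates through its scale
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized matrix yields the principal components
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                     # component scores (PC1, PC2, ...)
explained = s**2 / (s**2).sum()       # proportion of variance per component
```

Here `explained` is sorted in decreasing order (PC1 first) and the columns of `scores` are mutually uncorrelated, matching the orthogonality property described above.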

Multidimensional Scaling (MDS)

  • Concept: Instead of operating on the raw data matrix, MDS first computes a pair‑wise distance (or dissimilarity) matrix and then finds a low‑dimensional configuration that preserves those distances.
  • Types:
  • Classical (Metric) MDS: Assumes the distance matrix satisfies metric properties; essentially equivalent to Principal Coordinate Analysis.
  • Non‑metric MDS: Works with ordinal or non‑metric distances (e.g., expert judgments).
  • Use case: Helpful when the original data contain missing values or when a distance‑based view is more natural than a variable‑based view.
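
The distance-first construction behind classical MDS can be sketched with double centering. This is an illustrative Python computation on simulated points (SciPy is assumed only for building the distance matrix):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Simulated points in 3-D; classical MDS works only from their pairwise distances
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
D = squareform(pdist(X))               # pairwise Euclidean distance matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
B = -0.5 * J @ (D**2) @ J              # double-centered squared distances

# Eigendecomposition of B gives the principal coordinates
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
coords = vecs[:, :2] * np.sqrt(np.maximum(vals[:2], 0))   # 2-D configuration
```

With as many coordinates as the true dimensionality, the configuration reproduces the original distances exactly (up to rotation and translation), which is why classical MDS is essentially Principal Coordinate Analysis.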

Correspondence Analysis (CA)

  • Target data: Categorical variables summarized in a contingency table.
  • Procedure: Performs a chi‑square test, converts observed/expected frequencies into a matrix of relative frequencies, then applies a decomposition similar to PCA to obtain principal coordinates for rows and columns.
  • Interpretation: Plots display associations between categories (e.g., smoking level vs. staff position) in a low‑dimensional space.

Factor Analysis (FA)

  • Goal: Identify latent (unobservable) constructs—factors—that explain the common variance among observed variables.
  • Two branches:
  • Exploratory FA (EFA): No a‑priori constraints; used to discover the number and composition of factors.
  • Confirmatory FA (CFA): Theory‑driven; specifies which variables load on which factors and tests model fit (often via structural equation modelling).
  • Key concepts:
  • Communality: Portion of each variable’s variance shared with other variables (the part explained by the factors).
  • Uniqueness: Variable‑specific variance plus random error.
  • Factor loadings: Correlations between variables and factors; high absolute loadings (>0.3) indicate strong association.
  • Rotation: Orthogonal (e.g., varimax) yields independent factors; oblique (e.g., promax) allows correlated factors, improving interpretability.
  • Practical steps in R: Install psych and GPArotation, read the data (read.csv), test the correlation matrix (Bartlett’s test), extract factors (fa()), examine loadings, compute factor scores (regression or weighted‑average), and optionally rotate.
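
As an illustrative counterpart to the R workflow, here is a minimal sketch using scikit-learn's FactorAnalysis on simulated test scores (the loading pattern and data are made up; psych's fa() in R offers more extraction and rotation options):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate 8 test scores driven by 2 latent factors, echoing the lecture's example
rng = np.random.default_rng(2)
F = rng.normal(size=(300, 2))                      # latent factor scores
L = np.array([[0.80, 0.00], [0.70, 0.10], [0.10, 0.80], [0.75, 0.05],
              [0.00, 0.70], [0.80, 0.10], [0.10, 0.75], [0.05, 0.80]])
X = F @ L.T + 0.4 * rng.normal(size=(300, 8))      # observed scores + unique noise

fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(X)                       # factor scores per observation

loadings = fa.components_.T                        # variable-by-factor loadings
communality = (loadings**2).sum(axis=1)            # variance shared with the factors
uniqueness = fa.noise_variance_                    # variable-specific variance + error
```

Loadings above 0.3 in absolute value flag which tests belong to which factor, and communality plus uniqueness decompose each variable's variance as described above.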

Practical R Demonstration

  1. Setup – Install and load psych and GPArotation.
  2. Working directory – Set to the folder containing test_score.csv and the R script.
  3. Read data – data <- read.csv("test_score.csv").
  4. Check suitability – Bartlett’s test of sphericity; a significant p‑value indicates that variables are correlated enough for FA/PCA.
  5. Correlation matrix – cor(data) reveals two clusters of tests (1,2,4,6) and (3,5,7,8).
  6. Run FA – fa(data, nfactors = 2, rotate = "varimax").
  7. Interpretation – Loadings show which tests belong to each factor; communalities indicate how much variance each test shares with the factors.
  8. Variance explained – The first two factors together account for ~78 % of total variance, confirming that two factors are sufficient.
  9. Factor scores – Obtain individual‑level scores for further analysis (e.g., regression, clustering).

Common Pitfalls & Tips

  • Never run PCA/FA when the correlation matrix is close to the identity – uncorrelated variables share no variance, so no reduction is possible.
  • Standardize variables when they are on different scales; otherwise the component with the largest variance dominates.
  • Avoid over‑extraction – Adding more factors yields diminishing returns; use eigenvalues >1, scree plot, or chi‑square tests to decide.
  • Interpretation matters – Statistical output is only useful when you can map factors to meaningful constructs (e.g., “sociability” vs. “consideration”).
  • Software nuances – In R, the fa() function can accept either a raw data matrix, a correlation matrix, or a covariance matrix; results should be consistent across these inputs.
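
The eigenvalue-greater-than-one rule mentioned above can be checked directly on the correlation matrix. A minimal NumPy sketch on simulated data (one shared factor among four variables, four pure-noise variables):

```python
import numpy as np

# Simulated example: 4 correlated variables (one shared factor) + 4 noise variables
rng = np.random.default_rng(3)
f = rng.normal(size=(200, 1))
X = np.hstack([f + 0.5 * rng.normal(size=(200, 4)),   # correlated block
               rng.normal(size=(200, 4))])            # pure noise

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

n_keep = int((eigvals > 1).sum())    # Kaiser criterion: retain eigenvalues > 1
```

Plotting `eigvals` against their rank gives the scree plot; the "elbow" and the eigenvalue-1 cutoff usually agree on clear-cut structures like this one.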

Choosing the Right Method

Situation → Recommended technique

  • Quantitative variables, many correlated predictors → PCA (data reduction), then regression on component scores
  • Need to visualise similarity of observations (including missing data) → Classical or non‑metric MDS
  • Categorical data in a contingency table → Correspondence Analysis
  • Underlying latent constructs (e.g., burnout, socioeconomic status) → Exploratory FA, then Confirmatory FA if theory exists

Conclusion

The session equipped participants with a clear roadmap: start by assessing the correlation structure, select the appropriate multivariate technique (PCA, MDS, CA, or FA), execute the analysis in R, and interpret the results through loadings, communalities, and rotations. Understanding the assumptions and strengths of each method prevents misuse and enables robust, interpretable dimensionality reduction.

Effective dimensionality reduction hinges on matching the data type and research goal to the right multivariate tool—PCA for quantitative reduction, MDS for distance‑based visualisation, CA for categorical associations, and FA for uncovering latent constructs—while carefully checking assumptions and interpreting loadings.
