Understanding Calibration Weights: When and How to Make Non‑Probability Data Generalizable

Chisquares
Feb 22, 2026
YouTube video ID: 4RNzVdJdAYk
Source: YouTube video by Chisquares — Watch original video
Introduction

The lecture focused on a recurring problem in epidemiology and public‑health research: you have collected data from a convenient or non‑probability source (e.g., web surveys, volunteer panels) and you want to draw conclusions that apply to a larger target population. The solution is calibration weighting – a statistical technique that forces a biased sample to resemble a reference population.
Why Weighting Is Needed

Non‑probability samples do not reflect the true distribution of key demographic characteristics (age, sex, race, education, etc.).
Without adjustment, any estimate (e.g., prevalence of smoking) can be wildly inaccurate and may lead to false policy recommendations.
Weighting creates a pseudo‑population that mimics the target population, allowing more credible inference.
Standardization vs. Weighting

Concept	Goal	How It Works
Standardization	Compare two different populations (A and B) on the same outcome.	Choose an external reference population, transform both A and B to have the same demographic structure as the reference, then compare the resulting pseudo‑populations.
Weighting	Make a single sample look like its parent population.	Assign each respondent a weight that reflects how under‑ or over‑represented their demographic group is relative to the target population.
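
To make the contrast concrete, here is a minimal Python sketch of direct standardization. Every number in it is invented for illustration, and the group labels and the helper name `overall_rate` are assumptions rather than anything taken from the lecture. Both populations are given identical age-specific death rates, so any difference in their crude rates comes purely from age structure.

```python
# Direct standardization with made-up numbers (deaths per 1,000 people).
# Populations A and B share the same age-specific death rates, but A is
# mostly young and B is mostly old.
age_specific_rate = {"young": 2.0, "old": 40.0}   # hypothetical rates
age_share_a = {"young": 0.9, "old": 0.1}          # hypothetical structure of A
age_share_b = {"young": 0.3, "old": 0.7}          # hypothetical structure of B
age_share_ref = {"young": 0.6, "old": 0.4}        # hypothetical reference population


def overall_rate(rates, age_shares):
    """Average the age-specific rates using the given age structure."""
    return sum(rates[group] * age_shares[group] for group in rates)


# Crude rates use each population's own age structure.
crude_a = overall_rate(age_specific_rate, age_share_a)
crude_b = overall_rate(age_specific_rate, age_share_b)

# Standardized rates force both populations onto the reference structure,
# creating two comparable pseudo-populations.
standardized_a = overall_rate(age_specific_rate, age_share_ref)
standardized_b = overall_rate(age_specific_rate, age_share_ref)

print(f"crude:        A={crude_a:.1f}  B={crude_b:.1f}")
print(f"standardized: A={standardized_a:.1f}  B={standardized_b:.1f}")
```

With these toy inputs the crude rates are 5.8 and 28.6 per 1,000, while both standardized rates are 17.2: once the age structure is equalized, the apparent difference disappears.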
Types of Weights

Design Weight – arises when the sampling design gives unequal selection probabilities (e.g., intentional oversampling of a subgroup).
Non‑Response Weight – corrects for systematic non‑participation (e.g., older adults less likely to answer an online survey).
Final Weight – product of design and non‑response weights; this is what is used in analysis.
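
As a rough sketch of how the components combine, the snippet below multiplies a design weight by a non-response weight. Treating each component as an inverse probability (1 / selection probability, 1 / response rate) is a common convention that is assumed here; the lecture only states that the final weight is the product. The function name and the numbers are hypothetical.

```python
def final_weight(selection_prob, response_rate):
    """Combine a design weight and a non-response weight by multiplication.

    Assumption: each component is an inverse probability, a common
    convention that the lecture does not spell out.
    """
    if not (0 < selection_prob <= 1 and 0 < response_rate <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    design_weight = 1.0 / selection_prob        # corrects unequal selection chances
    nonresponse_weight = 1.0 / response_rate    # corrects systematic non-participation
    return design_weight * nonresponse_weight


# Hypothetical respondent: sampled with probability 1 in 50, from a group in
# which half of the sampled people answered.
w = final_weight(selection_prob=0.02, response_rate=0.5)
print(round(w, 6))
```

Under these assumptions the respondent stands in for roughly 100 people in the target population.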
Creating Calibration Weights

Identify the reference population (usually a recent census or national survey).
Select weighting variables – demographic variables that are available both in your data and in the reference (age, sex, race, education, marital status, employment, etc.).
Obtain population marginals – percentages for each category of the selected variables.
Handle mismatched categories:
If your data contain a “missing” category, re‑assign those cases randomly to existing categories.
If you have an “other” category not present in the census, either collapse it into a broader group or seek a reference source that uses the same coding.
Apply a weighting algorithm (raking, also known as iterative proportional fitting, or the platform’s built‑in calibration). The algorithm adjusts the weights until the weighted sample distribution matches the population marginals.
Validate – compare weighted vs. unweighted distributions; ensure the weighted totals sum to the sample size and that extreme weights are not inflating variance.
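
The steps above can be sketched in plain Python. This is an illustrative implementation of raking under stated assumptions, not the platform's actual code: the helper names (`harmonize`, `weighted_share`, `rake`), the sample records, and the population marginals are all hypothetical, and assigning missing values uniformly at random is one simple reading of "re-assign those cases randomly".

```python
import random


def harmonize(records, var, valid, collapse=None, seed=0):
    """Return copies of `records` whose `var` uses only reference categories.

    `collapse` maps sample-only codes onto reference codes (for example
    "transgender" -> "other"). Any value still outside `valid`, such as a
    missing value, is assigned to a reference category uniformly at random
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    collapse = collapse or {}
    cleaned = []
    for rec in records:
        rec = dict(rec)                          # copy so the input is untouched
        value = collapse.get(rec[var], rec[var])
        if value not in valid:
            value = rng.choice(valid)
        rec[var] = value
        cleaned.append(rec)
    return cleaned


def weighted_share(records, weights, var):
    """Weighted proportion of each category of `var`."""
    total = sum(weights)
    shares = {}
    for rec, w in zip(records, weights):
        shares[rec[var]] = shares.get(rec[var], 0.0) + w / total
    return shares


def rake(records, margins, max_iter=100, tol=1e-8):
    """Adjust weights, one variable at a time, until shares match `margins`."""
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        worst = 0.0
        for var, targets in margins.items():
            shares = weighted_share(records, weights, var)
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / shares[rec[var]]
                worst = max(worst, abs(factor - 1.0))
                weights[i] *= factor
        if worst < tol:                          # every margin already matched
            break
    scale = len(records) / sum(weights)          # weights sum to the sample size
    return [w * scale for w in weights]


# Hypothetical sample: mostly young men, two respondents with missing sex.
sample = (
    [{"sex": "male", "age": "18-34"}] * 6
    + [{"sex": "male", "age": "35+"}] * 2
    + [{"sex": "female", "age": "18-34"}] * 1
    + [{"sex": "female", "age": "35+"}] * 1
    + [{"sex": None, "age": "35+"}] * 2
)
# Hypothetical population marginals (proportions summing to 1 per variable).
margins = {
    "sex": {"male": 0.5, "female": 0.5},
    "age": {"18-34": 0.6, "35+": 0.4},
}

clean = harmonize(sample, "sex", valid=["male", "female"])
cal_weight = rake(clean, margins)
print(weighted_share(clean, cal_weight, "sex"))
print(weighted_share(clean, cal_weight, "age"))
```

A category that appears in the sample but not in `margins` raises a `KeyError`, which mirrors the rule that sample and reference categories must match before weighting. Production work would normally rely on an established survey package rather than a hand-rolled loop like this one.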
Practical Example: South African Smoking Survey

Problem – An online survey (Health 24) reported a 50 % smoking prevalence, far above the known national rate (~20 %). The sample was 50 % White, while the national population is ~10 % White.
Solution – Researchers created calibration weights using the 2020 South African census as the reference. Variables used: age, sex, race, employment, education, marital status.
Result – After weighting, the estimated prevalence dropped to a more plausible figure, and the share of White respondents moved from 50 % to 19.8 %, much closer to the census distribution.
Lesson – Ignoring weights would have produced misleading, non‑generalizable results; weighting restored credibility for policy‑relevant conclusions.
Using the K‑Quest Platform

Close the survey – weights can only be generated after data collection ends.
Select variables – choose from a dropdown list; the platform automatically handles missing categories.
Enter population percentages – the interface forces the totals to 100 % for each variable.
Generate the weight – a new column (e.g., cal_weight) appears in the exported dataset.
Apply in analysis – set the survey design to use cal_weight and run weighted means, regressions, or prevalence ratios.
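
For readers analysing the export outside the platform, a weighted prevalence can be sketched as follows. The inline CSV stands in for an exported file; the `smoker` column and all values are hypothetical, and only the `cal_weight` column name comes from the example above. This gives a point estimate only; standard errors would need survey-aware software.

```python
import csv
import io

# Stand-in for an exported dataset. The `smoker` column and every value are
# made up; `cal_weight` mirrors the example column name in the steps above.
EXPORT = """respondent_id,smoker,cal_weight
1,1,0.5
2,1,0.5
3,0,1.5
4,0,1.5
"""


def prevalence(rows, outcome, weight=None):
    """Share of rows with outcome == 1, optionally weighted by column `weight`."""
    weights = [float(r[weight]) if weight else 1.0 for r in rows]
    cases = sum(w * int(r[outcome]) for r, w in zip(rows, weights))
    return cases / sum(weights)


rows = list(csv.DictReader(io.StringIO(EXPORT)))
unweighted = prevalence(rows, "smoker")
weighted = prevalence(rows, "smoker", weight="cal_weight")
print(f"unweighted={unweighted:.2f}  weighted={weighted:.2f}")
# prints: unweighted=0.50  weighted=0.25
```

In this toy file the smokers belong to an over-represented group, so their weights are below 1 and the weighted prevalence is lower than the raw proportion.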
Common Pitfalls

Mismatched categories – weights cannot be computed if the sample and reference use different coding schemes.
Using only one variable – weighting on a single demographic rarely removes bias; the lecturer reports using up to 15 to 20 variables in large surveys.
Ignoring the final weight – treating all cases as weight = 1 re‑introduces the original bias and yields invalid estimates.
Over‑reliance on weights – weighting reduces but does not eliminate bias; report limitations and compare weighted results with external benchmarks.
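
The effect of ignoring weights can be sketched with the same kind of toy data the lecture uses: five observations with values 1 to 5. The transcript reports 4.7 (weighted) against 3.0 (unweighted) but does not give the slide's weights, so the weights below are hypothetical and produce a different weighted figure. The effective-sample-size helper uses Kish's standard approximation, which is not covered in the lecture and is included only as one way to spot extreme weights.

```python
values = [1, 2, 3, 4, 5]
weights = [1.0, 1.0, 1.0, 2.0, 5.0]   # hypothetical; not the lecture's weights


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


with_weights = weighted_mean(values, weights)
# Ignoring the weights is the same as giving everyone a weight of 1.
ignoring_weights = weighted_mean(values, [1.0] * len(values))
n_eff = kish_effective_n(weights)

print(with_weights, ignoring_weights, n_eff)
# prints: 3.9 3.0 3.125
```

Here the unweighted mean (3.0) differs from the weighted mean (3.9), and the five respondents carry about as much information as 3.1 equally weighted ones, which illustrates how a few large weights inflate variance.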
Recommendations for Researchers

Clarify intent early: if you aim for generalizable knowledge, plan for weighting before data collection.
Collect high‑quality demographic information that matches the reference source.
Use design weights when the sampling plan gives unequal selection probabilities (e.g., deliberate oversampling of a subgroup); add non‑response weights if certain groups systematically fail to respond.
Document the entire weighting process (variables, source of marginals, algorithm) for transparency and reproducibility.
When publishing, present both unweighted and weighted results, and discuss how weighting changed the estimates.
Conclusion

Calibration weighting is the bridge that turns a convenient, non‑probability sample into a dataset that can speak to a broader population. By carefully selecting demographic variables, aligning categories with a reliable reference, and applying the appropriate weighting algorithm, researchers can dramatically improve the validity of their findings and avoid the pitfalls of misleading, non‑generalizable conclusions.
Key takeaway: calibration weighting makes a biased convenience sample resemble its target population on the chosen weighting variables, which supports more credible, policy‑relevant conclusions when generalizability is the goal.
Full Transcript YouTube

...for which, sorry, which we wanted to use for research. But as you can imagine, since the data came from this source, you could be pretty sure that it was not representative. So, in other words, we're back to that scenario here, the one we described earlier, where our data came from a convenience sample but we wanted to make results that were generalizable. I need you to understand that point. That's why I keep repeating myself like a broken record, because that is the crux of the whole lecture. The whole lecture is: when do we use weights, specifically calibration weights? We use calibration weights when your data are non-probability but you want to make the results somewhat generalizable, at least to some extent. Okay, I promise that's the last time I'm going to repeat that.

So, to put what weighting does in context, I want to compare two constructs in epidemiology that are very similar, so you can... okay, that's a completely different, tangential issue.

So the first concept is standardization. I want to talk about two concepts here, standardization and weighting. They are similar but a bit different.

In standardization I have two populations, population one and population two, and I want to compare those two populations. But I can't compare them, at least not directly, because they are so different. Look at these populations. Look at the shape; we're just looking at it from the perspective of an analogy. The shape of this guy and the shape of this guy are very different. This guy is a square block; this guy is a circular shape, or I don't even know what that is, an amorphous shape. There's no shape to it. So it's hard for me to compare the two directly because they are so different.

Now think of how that applies to a population. You could have two populations that are very different and you want to compare them. Let's say population A is a population that has a lot of young people and population B is a population that has a lot of old people. Obviously they're not similar, so you cannot directly compare them. If you were to ask those two populations, "On a scale of 1 to 10, how much do you love playing soccer?", of course the results are going to be different, because they are of very different ages. So the question then becomes: how do we compare outcomes between population one and population two, given they are so different?

In standardization, what we do is take a reference population, another population, and we force those two populations to be like the reference population, and then we can compare populations one and two. This is called standardization. I explain standardization because a lot of people are asking, "So what's the difference between standardization and weighting?" All right, so we know what standardization is and have an idea of what it's supposed to do. Now let's talk about weighting.

With weighting it's a little bit different. This is our sample. Think back to the cycle of inference: we have a population and then we have a sample. So this is our sample, and this is what it looks like. This is our population, and this is what it looks like. You see we have a problem here. For that cycle of inference to work, the sample and the population must look alike. They don't have to be identical, but they have to look similar. These two things don't look similar at all. This one is shaped like this; this one is shaped like this. So the question is, how then can we in good faith make that cycle of inference, where inferences from the sample are generalizable to the population?

That's where weights come in. Weighting is a way of forcing our sample to look like the target population. So we take this clay and we somehow force it to look like this guy. You can now see that these two guys don't look identical, but at least they look similar, so now we can go ahead and compare these two things. That's what weights do. Weighting is a statistical process where we take a sample that doesn't look anything like the parent population and force it to look like the parent population, so that we can make inferences from the sample to the parent population.

I'm going to stop there and see if you guys have any questions, to make sure you're not lost. Any questions? Multistage sampling is a probability sample. Any other questions? All right, none. A little recap on standardization.

Okay, so let's go back to standardization. Let's say we're trying to ask how the death rate compares between population A and population B. Population A is very young people; population B is very old people. Now the obvious answer is, why are you trying to compare those two groups? Of course you expect that the older population will have a higher death rate, so that is not even an interesting question. The two are not directly comparable. I'm trying to look at death from all causes, not just death from aging-related factors. My question is: what is all-cause mortality, and how does that compare between population A and population B? So I'm not necessarily interested in deaths that occur when you grow old, develop some condition that is related to aging, and die; of course we'd expect that to be higher in the second population, which is older. I'm interested in all-cause mortality. But I have a problem: I cannot compare those two groups directly because they are very different.

So what do I do? I can say, all right, I'm going to force these two populations, both of them, to look like this third population. We're going to say: if population one were like this reference population, then its death rate would be so-and-so. And we're also going to say: if population two were like this reference population, then its death rate would be so-and-so. Because I have now forced both of them to adopt the characteristics of this external population, that allows me to make direct comparisons. Population one is no longer population one; it's population one as if it were the reference population. We call that a pseudo-population. It's called pseudo because it doesn't exist; we just made it up.

I'm not going into the nuances of it. I just want you to understand, at least on an intellectual level, what we're doing behind the scenes when we write that code. When we talk about the pseudo-population, I need you to understand it in very simple terms, no complications, nothing fancy. With standardization we're just taking these two populations and forcing them to look like a reference population, and then comparing them, because now they both look like the reference population.

"What conditions are to be met before using non-probability data for generalization?" You have to weight the data. You have to first of all check how this population looks relative to... The question, by the way, was what conditions are to be met before using non-probability data for generalization. So the answer is: if you're trying to generalize, you should try to sample the right way. We don't want medicine after death, and calibration weighting is just medicine after death. You should design your study appropriately so that you don't even have to worry about these kinds of issues. We are talking about the scenario where you have no choice; it is what it is, that's how you met the data. So this is not to encourage bad study designs. You should design your study the right way and sample the right way.

All right, I'm just going to go through this, and then we'll look at some of the practical applications, and if you still have questions we can take them then.

All right, so again, with weights. This is our population here. This is to explain the same thing we explained in the previous slides. Here, from this parent population, we've taken a sample.

Now you can see that the distribution of these colors, blue, gray, and orange, is exactly the same here as it is here. Of course this is a bigger circle because it's the parent, and this is smaller because it is a sample, but apart from the size you can see that these two are exactly the same. This is what we want, ideally. And this happened because we drew a random sample, and since the sample was random, it allowed the characteristics of the population to be retained in the sample. In this kind of scenario we don't need weights.

Now, here is where we have a biased sample. You can see that blue is more than half here but less than half here, so this sample does not look like the parent population. So we cannot make that cycle of inference; we cannot make inferences from the sample back to the parent population. Why? Because the child does not look like the parent. So the parent will say, "No, I can't adopt you. You cannot make inferences from you to me, because we don't look alike." So what do we have to do? We have to use weights. Weights will force this sample to look like the parent population so that we can make inferences from this back to the parent. That is all we're trying to achieve with weights. I hope that is clear. So again, don't worry about the technical nuances or the scientific math behind it. Just understand it from a simple pictorial form: why weights are needed and what they achieve. It's all about forcing this sample to look like the parent population so that we can go ahead and make some kind of inferences. That is what we're trying to achieve.

So why do we weight? Your sampling technique was not based on a specific sampling frame. You got data from the internet or a web survey, like the Health 24 survey we used in our study, where individuals self-select, like a convenience sample, a volunteer sample. You have to weight. Now, there's another reason for weighting, which we're not going to focus on so much today, which is when you oversample. For today we're just going to focus on this first part, because in the last lecture we focused more on the design weight.

So now, what happens if your data has weights but you ignore them? Let's discuss that, and then we'll go on to a practical application of what we're talking about so that you have some context. Let's look at two scenarios, the green scenario and the red scenario. The green scenario is: there were weights and we accounted for them in our analysis. Here we're trying to find the average. When there are weights and you account for them, the average is simply 1 times its weight, plus 2 times its weight, plus 3 times its weight, plus 4 times its weight, plus 5 times its weight, all divided by the sum of the weights. Here we see that our average is 4.7.

Now what happens when there are weights and we ignore them? When you ignore weights, you are saying that everybody in the dataset has a weight of one. That is what happens when you ignore weights: whether you realize it or not, you are automatically giving everybody a weight of one. So when analyses are not weighted, that's an indirect way of saying that everybody has a weight of one; at least that's what the analysis assumes. So here you are saying 1 × 1 + 2 × 1 + 3 × 1 + 4 × 1 + 5 × 1, all divided by the sum of the weights, and that gives you 3.0. So when the analysis was weighted, the answer was 4.7; when we ignored the weights, the result was 3.0. Now 4.7, the last time I checked, is not the same thing as 3.0.

So that means that your results are not valid. This is what happens when there are weights but you ignore them, so please don't ignore the weights.

Like I said, there are two types of weights, design weights and non-response weights. We talked about them extensively in the last lecture on weights, so I'm not going to dwell on them so much, for the sake of time.

Now, when you are creating weights, you need to select the variables you are trying to weight on. The idea is that your dataset has certain characteristics in it that differ from the population. So the first step is to take a tabulation of some key variables, like demographics: age, sex, race, education. You want to see how those variables in your dataset compare to your overall population. So if my dataset is, for example, of the South African population, I want to see what percentage of people in my dataset are male and female, and how that compares to the South African population overall, based on, for example, census data. If those two are different, then that means I have to weight the data. That's how you select the variables that you are going to weight on. They are like demographic variables, and you have to make sure that you have information for those variables in the census too.

Now, there are some things that might happen when you are trying to weight your data. The ideal scenario is that the categories for your variable are the exact same categories as in the census data, or wherever you are using as a reference population. So here, for example, the ideal scenario is that in my dataset I'm trying to weight on gender and I have male and female as the categories, and in the census data I also have male and female. The categories you have that you're trying to weight on must be the exact categories that you have in the reference population, in this case the census. So this is an ideal scenario.

Now, you don't always have an ideal scenario, so you have to know how to manage it. You don't have to do this manually; on the K platform all of this is completely automated for you. This lecture is just to give you an idea of what is going on behind the platform, so that you know what's going on. This is the ideal scenario: we're trying to weight on, let's say, gender, and the gender categories in the dataset are exactly what you have in the census.

Now let's look at other scenarios that may not be ideal. Here we have a second scenario. In our dataset we have three categories: we have male and female, but we also have missing values. Of course the census doesn't have missing values; the census has male and female, but our dataset has male, female, and missing. Remember that the rule is that the categories you have in your dataset must be the exact same categories as you have in the census, or whatever you're using as your reference population. Again, you cannot weight your data if you don't have external information. There's some information you are bringing in externally to force the sample to look like the parent. That means you have information about the parent; you're trying to force the sample to adopt the characteristics of the parent population. So the moral of the story is: to weight data, you must have information on each of those variables you want to weight on. If I'm trying to weight on gender, that means I have data on gender and I know the distribution of gender in my target population. If I don't know that, then I can't weight the data.

So here we have a little problem, because in our dataset we have three categories, male, female, and missing, and those categories must be exactly what you have in your reference population, in this case the census. So what do we do here? We can absorb the missing values and assign them at random to the male and female categories, so that we end up with something like this. Instead of having male, female, and missing, we take the three here and assign them at random to male and female. There were 23 males initially, but now, at random, we added two to it, so it's now 25, and female is now 33 because we added one more to it at random. It has to be at random. So now we can see that the categories in our dataset are exactly what we have in the census, and we can proceed to weight on gender. That is what must happen: the two sets of categories must be...

No, you don't remove it. You don't have to do anything; you just upload the data. Somebody was asking whether they have to do this manually. No, you don't. The platform automatically does all of these things. You just specify the variables you want to weight the data on, and every single thing is done for you automatically. But you also need to understand what is going on, so that's why we're trying to explain to you what happens in the background.

All right, so that is scenario two, and that's how we can address it. Moving on to the next scenario. In scenario three we have something similar. We have male, female, and other in our dataset, but the census only has male and female. So we're in a fix here, because the categories must be identical for you to weight on them. So what are the options? Option one is to reassign "other" at random into male and female. Option two: maybe the "other" category is very similar to one of those groups, so we can just combine "other" with one of those categories. Or option three is that we can go and look for another source of information that categorizes gender as male, female, or other. One way or the other, something has to give. The two sets of categories must be identical before you can weight.

So here is scenario number four. Here we have male, female, transgender, and agender as our categories in the dataset, but in the census we have only male, female, and other. So what do we do here? We can collapse transgender and agender into one group and call it "other", so that now the two sets of categories are identical and we can proceed with our weighting. All right.

I'm going to stop here and answer any questions you have before we go on to a more applied form of this class, so that you can understand what in the world we're talking about. I can understand if you're lost, but we're going to look at a paper that did all of these things, and I will walk you through the process so you can see how that works out. So let me take questions.

Okay, so I'm just trying to check the last one. "Is it possible to even get a sample that will look exactly like the parent population?" No, it doesn't have to look exactly like it. Remember, it just has to have similar characteristics.

"Can I standardize four populations that were studied for different durations?" That is a question about standardization rather than weighting. Standardization is for when you have two different populations you want to compare. If I want to compare the United States and Nigeria, two different populations, on an outcome, then I standardize them first so I can make direct comparisons. But if you're just analyzing one population, standardization does not apply, because to standardize means you have separate populations you're trying to compare, and you need to standardize both of them so that they look alike before you can compare. So it does not apply in that context at all. For four populations? Sorry, four populations. Yeah, you can standardize any number of populations, I mean two or more.

Okay. "In a population like Nigeria, how do you weight it so that it can reflect the real population of Nigeria, so that you can carry on your research?" All right, to be continued. That's what the second part of the lecture is supposed to address, so hang in there.

"Is pseudo-population the same as assumption population?" Assumption population? Yeah, well, they're both saying the same thing. Pseudo-population is the more technical term. We call it a pseudo-population because it does not exist. We just created it from statistics; it doesn't exist in the real world. The only reason we created it is so that we can compare something.

Okay. "A little info on capture-recapture, please." Since that's not the direct focus of this class, we'll answer it at the end if we have enough time, but let's just focus on weighting issues. But I'll answer that. Yeah, that was actually the last one.

Okay, some questions now. "Can we weight variables before sampling?" No, you can't. Remember, you have to sample, collect data from the survey, and then you weight. If you don't have information in the dataset, you can't possibly weight.

"Does that imply that weighting occurs due to poor study design?" That's a good question. Yes and no. Sometimes it could be like in this case of the Health 24 survey. We didn't collect the data. This company had the data, and they reached out and said, "Oh, by the way, we have data for South Africa overall." And we were like, "Oh, really? How many people are in your dataset?" "Like 18,000 people." "Oh wow, that's a lot of people. How was it sampled?" "Oh, people just came to our website and answered." "We have a problem. Your sample is not representative. But don't worry, we will still take the data and use it; we will weight the data." That had nothing to do with us; that's how we met the data.

Now, sometimes, despite the best design, participants may still not respond. Let's say you did a random sample and you administered your survey, and all the people of a particular demographic, some groups, just refuse to answer your questions for whatever reason. You will still end up with a biased sample. So sometimes it's not about you; it's about the participants. And sometimes it's just because of bad design, obviously. But you have very little control over how people are going to respond to you. In fact, you have no control. Okay, maybe you have some control; you can give them incentives and stuff. But essentially my point is that people can choose whether or not to participate in your survey, and if there's a systematic difference between certain groups, then you end up with a biased sample, and then you have to worry about weights. So it's not always a function of bad design. It could just be the mode of administration. Let's say you did it on the internet. Not everybody has access to the internet. The people who have access to the internet may be people who are rich or people who are young. That means your survey has systematically excluded people who are older, or people who don't have phones or the internet. Now your sample no longer looks like the parent population because it's selective; it's just a special group of people who are in your survey. So whenever you have any scenario where the people who are in your survey are a special group, not everybody, then you have to worry about weighting. I hope that makes sense. Thank you.

"If one has to analyze data across all nine provinces in South Africa, which are not equal in terms of size of population and their skin color..." Okay, that's a good question, but the question is incomplete. How then do you assess whether the sample is representative? "How do you weight it?" Okay, he completed it later: how do you weight it? We'll give an example, so don't worry. That's the second part of the lecture.

"With this assignment at random of male and female to fill missing values to mimic the reference population, does the outcome or result look exactly like what other programming languages do, for example Python or R? Thank you."

so it if you if you wait something twice
they are not going to look Alik because
assignment to groups is at at random and
if it's at random it means that if you
repeat it again it's not going to be the
same even if you use the same software
because you are signing at random now
what what we use on our platform is we
use programming languages too right so
it's so what in principle is the same
thing that's been done it's just that
you don't have to do it
manually I hope that's clear I'm reading
a question that says, "Can you repeat what attributes can drive us to weight our sample?" That's a good question. So you start with demographic variables like age, sex, race: sociodemographic variables. That's what we typically use. In some of our large surveys, we use up to 15 to 20 variables. Think of things like income, education, race/ethnicity, where you live, whether it's a rural or urban area, your marital status, whether you have children, your religion, things like that. Those are the kinds of variables we look at. We are looking at two things: what is the distribution in our sample, and what is the distribution in the target population nationally? So in the case of South Africa, we're asking, in our sample, what percentage is male and female, and when we look at the national sample, or the national population, what percentage is male and female, respectively? If those two things are not similar, then we know that we had some selection bias here, so we have to account for gender in our weighting. Then we move to the next variable and say, okay, race/ethnicity: what is the percentage of whites, blacks, Hispanics in our sample, and what is the percentage in the national sample? If those two things are not similar (they don't have to be identical, but they should be similar), we say, hmm, we have selection bias on this. And that's how you go through all the variables, right? Then you can say, okay, these are the variables that we need to account for in our weights, and then you weight on those variables. So that's a simplified road map of what goes
on. All right, let's go on and see. "Can the outcome of weighting lead to manuscript rejection?" Well, rejection or non-rejection depends on what your intent is. Remember, it all starts with your intent, and that's a very, very important point I really want to emphasize: it is about your intent. If your intent is not to create generalizable knowledge, then you don't have to worry about weighting. So let me go back to that original slide, because I can't emphasize this enough: it is your intent. Do you have an intention to create generalizable knowledge? Now, unfortunately, the only person who can know your intent is you. Nobody can know your intent except you, so you are the one who has to decide whether you're trying to create knowledge that is generalizable or non-generalizable. But this is the litmus test: is there an intent to create generalizable knowledge? That is what determines whether you weight the data or you don't weight the data. You can say, "This study was not about generalizability. We're just interested in what the participants had to say, and not about generalizing what they said to the whole country." And that is perfectly fine too. Scientific publishing is all about how you argue your case. There's no right or wrong answer. We only have a problem when you tell me that the results are generalizable and I look back and see that they're not. That's when I have a problem with you. But if you already came out clean and told me, "No, I have zero interest in generalizing," then that's fine. Nobody's going to force you to say that your intent is this way when it was that way. But you have to figure out what your intent is, and your methodology has to align with that intention. The problem is when your intention is going one way and your methodology is going the other way. That's when we have a problem.
Okay, so... sorry, go ahead. Sorry. No, sorry, continue. No, you had a question for us, please go ahead. "Can you repeat what attributes can drive us to weight our sample?" Sociodemographic variables, and you have to start comparing your sample versus the target population, whether it's a national sample or the country's census data, for example. So sociodemographic variables like age, sex, race, income, number of children, religion: those are all important variables. Essentially we're trying to ask, was there bias in how you selected your sample? And that bias will be reflected in various demographic variables.
variables can we say that waiting
requires that a secondary data published
in Brackets must have existed and
related to your intended study
area exactly you cannot wait if you
don't have external data right so if you
said if you told me you wanted to weigh
the data and I asked you so where is the
where are you getting your population
marginals from where you getting the
source data from and you are staring at
me as if you've never heard of that ever
in your life I'm like so how do you want
to wa wait the data like is it with a
magic wand
you have to if you want to weigh the
data you must have data external data
that aligns with the variables in your
data set right so that is the only it's
like exact same way when you want to
weigh yourself you you you have to bring
a way a weighing scales and stand on
that scale it's the same thing you want
to weigh data you must bring in that
scale which is the population
characteristics from the sensors that is
what we're going to use to weigh the
data so those two things must go hand in
hand
I see a question here: "So we usually weight based on one variable?" Well, no. You can use one variable, but it's very rare that we use only one variable. Typically we use several variables. Like I said earlier, in some of our large national surveys, we use between 13 and 20 variables when we're weighting the data, because we want our sample to be like the parent population on all those characteristics.
"Should you do your weighting for sample size when using data from DHS, or do you do your weighting when you have your results?" When you use something like DHS, somebody has already done those weights for you, right? So when you use publicly available data like DHS or GYTS or GATS, there's a variable called final weight. A statistician sat down and did all of these things we're talking about for you. But the point of this lecture is to say, when you are collecting your own data and you have to create the weights yourself, how do you go about this? So if you're using a data set like DHS, somebody has already suffered and done all that work for you, but you also have to know how to do these things yourself. So the focus of this lecture is not on weights that have been created already; it's about how you create weights
yourself. "You collected data online (social media). They self-administered; they made their choices on who to vote for. You know their age group, and that is similar to those on the real voters' register. Can you weight by age group? Note: you don't have real election data by age group." No, it's not really election data you're looking for; you are looking for data from the census. You're not weighting on the outcomes; you're weighting on the demographic characteristics, right? Let's say I did a telephone survey or a web survey, and I find that most of them say A is the answer. And then I ask, to what extent is the answer truly A in the overall population, when I know that this sample is mostly a young population? So the question I'm asking is, if this sample were like the general population, would the answer of A still be 40%? That's the question I have in mind. Then I ask, how do I force this sample to be like the overall population, so I can know the true percentage of A? Well, the answer is, I weight the data on age. So I go to my census and ask, what's the distribution of age in the general population? Then I can use that to force this sample to look like the parent population.
population how does the waiting of
social demographic variables you have
described differ from waiting when equal
number of respondents are selected using
cluster sampling so there are two types
of
Weights as mentioned two types of weight
one is design weight
so how do you understand this think of
weight as very simple right you can need
weights because you cause the problem or
you can need weights because the
respondent caused the problem in a
survey there are only two sides the
person giving the survey and the person
answering the survey these are the only
two sides that exist in the survey if I
cause the problem in quote in how I
design the survey that is called design
weight because it has to do with the
design of the study right so let's say I
oversample from one population well
oversampling means that people who are
oversampled have the higher likelihood
of being selected so suddenly not
everybody has the same probability of
selection there's a bias there already
but that bias was injected intentionally
by me because I knew I was going to
correct it later using weight to this
Focus we are not talking so much about
design weights just because we've talked
about them extensively last time now
that's a different issue. A nonresponse weight is when the problem is not from me but from my respondent: my respondent chose not to answer, right? And because some groups responded differently, the response rates were different, and I now have differences in nonresponse patterns, let's say between older people and younger people. Let's say I give a web survey, and it's mostly young people that are online taking my survey. If I now look at the characteristics of my sample, I will see that most of them are young. That had nothing to do with me. The older people, for whatever reason, maybe they were not available or they didn't have access to it, did not participate. They did not respond, and so we have to create a nonresponse weight. So those are the two types of weight: either the problem came from me, which is the design weight, or the problem came from the participant, which is the nonresponse weight. And that's why, when you use data like DHS, you will have a variable called final weight. The final weight is simply the design weight times the nonresponse weight. We combine those two weights together to give us the final weight, which you then use. So when you take a data set that is publicly available and you just analyze it, these are the decisions and all the work that has been done in the background that you're not even aware of. The whole point of this is so that you can be at least familiar with these things, so that if you have to do a national survey yourself, at least you know: okay, I know there's something called design weights I have to worry about, and I also know that there's something called nonresponse weights. So it's just some basic education on this topic, so that you are at the very least aware of
them. Question: "Can inclusion and exclusion criteria solve this weight issue?" No. With inclusion criteria, you're just saying, let's say, my survey is for people who are aged 18 years or older. Sure, I've defined the inclusion criteria already, and I now launch this survey to people 18 years or older. After I launch it, I still have problems, because some people refuse to answer or the response rates are different. I still have the same problem. So inclusion criteria do not solve the issue. There's almost no study that doesn't have inclusion criteria, but we still have to weight the data. Of course it's helpful to have inclusion criteria, but the issue of weights is almost a separate issue
entirely. "Can one weight data with reference to an initially collected and published sample data?" I'm trying to understand that question. Can you repeat the question again? I just want to make sure I'm answering the person. Valid question. "Can one weight data, I think, with reference to an initially collected and published sample data?" Yeah, you can weight data that has already been collected before, which is the second part of the lecture that we're supposed to move into, so you can see how all of this applies in a real context. "How does the weighting to account for sociodemographic characteristics you just described differ from the weighting on the Chisquares platform when an equal number of respondents are selected using cluster sampling?" We've answered that question before.
"How..." No, it's not exactly the same, Draku. Okay. "So basically, do we weight based on one variable? If yes, how do we select this one variable?" We weight on several variables; we answered that before. It's very rare that you use only one variable. If you are weighting on only one variable, you're otherwise saying that your sample differs from the parent population only on that one variable. That's what you're saying, and that is highly unlikely. If the sample differs by gender, it will most likely differ by age and race and income level too, right? Because if most of your participants are older, and older people have more money than youth, that means that they will most likely also differ by income and by other characteristics. So don't just look for one variable. Look at many variables and test to compare between the sample and the target population. Okay, that was the last question. Thank you. All right,
great, so let's jump in now and look at the case study we've been talking about. So this is our case study. No, that's not it. Let me see, where is this? Like I said, we got our data from here. Let me pull it up. I'm trying to look for the results from the unweighted data. Let me just pull it up directly. Oops. Can you see my PDF, 24.com? "Yes, yes, we can see it." All right, great. So remember the story; let's tie the story together. We got this data from News24, right? And like I said, this is an internet survey. They're not a research company; they're just a media company, but they put out surveys once in a while and say, hey, do you want to answer this question on smoking? That's how we got the data. So as you can imagine, the data were not representative. When they gave us the data, they also gave us this report, and to them this was their published findings, which they thought were very interesting. For us, the appeal was that, wow, this is a large number of people. They had 18,957 people, so we were quite interested in that. Let
me see other questions
here oh okay sorry you guys have so many
air STS
All right. Now, when we start looking at the population here, you start seeing some hints. For context, this is South Africa, right? When you look at race, the percentage of whites, this in green: 50% of the people in this sample are white. Who sees the problem with that? L, can you see a problem with that, since you're the resident African expert? "Sorry, I can't see clearly." Okay, can you see this? We're looking at this population group here, and this is by race. In this sample, 50% are white. So I was asking you, as a resident South African, whether you saw something wrong with that. "Oh yes, definitely. We only have, maybe what, 70% of the people are black, so it can't be 50%." Yeah, so the share of whites in South Africa is maybe around 10%, I believe. "Yeah." But see, here in this sample it is 50%. 50% and 10% are very different, so this is the first sign that you have a problem, right? And then when you look at other demographics, income and all of that, you might also see problems. And that also translates into the outcomes of the study. Let's scroll down a bit. So the first thing is, don't just pick up a data set and say, oh, I want to dive into it and start analyzing and publishing. No. The moral of the story is that when you get a data set, take your time to really explore the data set and understand it first. If you did not get anything else from this class, it should be this: don't be in a rush to analyze data. Understand the data set first, and understand whether the data is fit for use, right? So
here we're looking at the data and asking, okay, based on this data set, what is the prevalence of smoking? You can see that according to this data set, almost half, 50%, of South Africa smokes currently, either rarely or regularly. That doesn't make any sense. A prevalence of 50%? That's Greece. The only country in the world where you have such a high prevalence rate is a country like Greece, right? So you can see the danger of just grabbing this data and going to publish with it and saying, oh, the prevalence of smoking in South Africa is 50%. Everybody's going to look at you like you're crazy. There is no way in the world that smoking prevalence in South Africa is 50%. It is 50% here because of the sample: the sample is not representative.
Now, let's go back to that original point we made, that your intent should determine the analysis and your interpretation of the data. If you said 50% of people in this sample were smokers, nobody has a problem with that. We only have a problem when you start saying 50% of South Africans are smokers. Then we start scratching our heads and saying, that makes no sense. That is actually categorically false. That's fake news, you know? So that's the point I'm trying to reinforce: it is all about your intent, but the methodology and how you frame results must also follow along with your intent. You have to know when to say 50% of the sample versus 50% of South Africans. Those are two very different statements. Now, in this case, we were not necessarily interested in this sample, because we do research from a regulatory perspective, from a policy perspective, right? We're always interested in results that are generalizable. Our intent as researchers in this case was to create generalizable knowledge. So the question was, hmm, we came into this data late. They've already collected the data, but we need the data because it has a lot of information we need. We want to make this data representative, at least to some extent.
So the only decision was to weight the data. I'm just giving you some historical perspective so you can see how these decisions are made and how they work. For context, this is the study we ended up publishing from the data. Let me show you that, and then I'll walk you through the process of the decision. So look at the title here: "Associations between electronic cigarette use and quitting behavior among South African adult smokers." See how we framed this: "among South African adult smokers." The framing suggests that we are now talking not just about the sample but about smokers in South Africa in general, right? This study was published in the BMJ's Tobacco Control. Now let's take a little bit of time to look at the methods, so you can see how we reported the methods and how we reported the limitations as well, and then we can go back to the methodology. Again, the goal is to guide you through this historical perspective, so that when you're in that kind of context, you know what you should or should not be doing. So let's take a quick look at the analysis here. Let's just focus on this part. Let me highlight it. Oh man. Okay, one more try. Aha. All right, so that's what we're interested in here. It says,
"Calibration weights were developed using raking," which is iterative proportional fitting, "with the South African census estimates serving as a reference population." What does that mean? It means that we took the South African population and used the percentages from the South African census as the weighting factor. Remember, we have established that you cannot weight if you don't have external data you're weighting on, right? You must have data. In this case, we used data from the census as our guide. Let me pull that data up so we can see in real time what we're talking about. Give me a second if I can find this.
Okay, so this is it. L is the one that put this data together. All right, so this column here is the variables we had from Health24. This is the Health24 data set. You can see the variables: age, employment, race, gender, education, marital status, and a whole bunch of others, right? These are the percentages from the data set. But more importantly, this is the column we're interested in: this came from the census, Statistics South Africa. So Statistics South Africa is saying, these are the percentages of the different groups, right? If we're interested in unemployed, this is it. You must have these numbers, and they must add up to 100% in total for each of the characteristics. You're not making these numbers up. So this is a very important column: the column from the census, or from wherever. Sometimes it's hard to get data from one source; you might have to go and look for estimates from different sources. I just weighted a national survey recently where I had to get estimates from at least six or seven different sources, because I couldn't find everything in one place. So it's not as if you have everything sitting nicely for you in one place. You might have to go and look for it, but you have to make sure that it's representative data at the national level. Let me see, there are some comments. I want to make sure. All right.
So this is the first thing you need to know, and we call this the population marginals. It's just a fancy term for the percentage for each of those groups in the population you are referring to, that is, the reference population. So you get those numbers, and they must add up to 100%. You get all of them, and that's what we're saying here when we go back: "Calibration weights were developed using raking, with the South African census estimates serving as a reference population. Descriptive analyses were performed using weighted percentages." So the whole point of weighting is about the results we get. Remember I showed you the difference between when you weight the data and when you ignore weighting. In this case, we weighted the data with the weights we just created, so we calculated weighted percentages and bootstrapped 95% confidence intervals. Prevalence ratios were calculated with Poisson regression models. Okay, that's not really important for now. I think that's as far as our weights are concerned. Let's stop there for now and go back to the issue of
weights. The question says, "In the case of Nigeria, where we have the census of 2006 and the projected population, do I use the 2006 census data or the current estimated data?" The population doesn't change that dramatically between 2006 and 2009, so use whichever one is closest to your data. You might be analyzing the data today, but the data may have been collected in 2006. So again, there's no rule of thumb: just use whichever data is closest to the data you are using. The sample data and the parent population must be similar, remember? So if the data were collected in 2001, it would be helpful to go back to 2001 and look for data from that time, so that the two are like apples and apples, not apples and oranges. I hope that answers your question.
question now I want to now go back to um
the reviewers because we had a lot of um
we had just to see again how we
communicate this issues with reviewers
as part of a scientific process so that
you have a full spectrum of uh what goes
on so this is this is response to
reviewers in that paper so I want to
give you a highlight of some of the
issues that were raised during um the
peer riew process and how we responded
to them you can use projected data
projected data is based on actual data
just Project based on what we have we
say based on what we have this what we
expect tomorrow so if you have projector
data you can use that as well if you
have actual data even
better um so so let's look at some of
the reviewer comments Ms I'm not going
to go through all of the comments but by
the way this is always how you should
structure your response to reviewers um
you should you should group them this is
completely off topic by the way in case
you did not notice um group them by who
who whose comments this are so editors
comments reviewer one and then you say
reviewer number one comment Number One
reviewer number one comment number two
and for each comment you should provide
your response you should very clear so
that when they reading it is obvious to
them um as a joural editor this is how I
like when authors submit their responses
back to me it's it's easier for me to
read it then and and make a decision but
anyway that's what not this is not a
class on peer review let's go back to
the
topic. You're not weighting the data yourself; the Chisquares platform does everything for you. The goal of this class is not to show you how to do the mechanics of weighting yourself manually. It is just so you understand what's going on in the background and can be intelligent about the issue as well. So don't worry about weighting this thing yourself. I'm going to show you how you can do it on the platform, but you've got to understand the theory first. All right.
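For readers who want to see what goes on in the background, the raking (iterative proportional fitting) idea named in the paper's methods can be sketched as below. This is a minimal illustration, not the code used in the study or on any platform: the four-cell toy sample, the target shares, and the function names `rake` and `weighted_margin` are all assumptions made up for this example.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Iteratively adjust weights so weighted margins match the target shares.

    rows: list of dicts, one per respondent.
    targets: {variable: {category: share}}, shares summing to 1 per variable.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale every respondent so this variable's margin hits its target.
            for i, row in enumerate(rows):
                factor = shares[row[var]] * total / current[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # no margin needed a meaningful adjustment
            break
    return weights

def weighted_margin(rows, weights, var):
    """Weighted share of each category of one variable."""
    total = sum(weights)
    out = {}
    for row, w in zip(rows, weights):
        out[row[var]] = out.get(row[var], 0.0) + w / total
    return out

# Toy sample (assumed), heavily over-representing one race group.
rows = (
    [{"sex": "male", "race": "white"}] * 30
    + [{"sex": "male", "race": "black"}] * 20
    + [{"sex": "female", "race": "white"}] * 20
    + [{"sex": "female", "race": "black"}] * 30
)
# Hypothetical population marginals, not real census figures.
targets = {
    "sex": {"male": 0.49, "female": 0.51},
    "race": {"black": 0.80, "white": 0.20},
}

weights = rake(rows, targets)
print(weighted_margin(rows, weights, "sex"))
print(weighted_margin(rows, weights, "race"))
```

Note that only the marginal distribution of each variable is needed, not individual selection probabilities, and every category in the targets has to appear in the sample for the adjustment to be possible.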
So, aha, this is reviewer number one, comment number two. L, can you confirm you can see this Word document? "Yes, sir." Okay, great. So reviewer number one, comment number two says: "Why did you weight the data? I can only accept your decision to weight the data once I have been convinced that you are justified in doing so. To do this, submit a comparison of your sample," which is your data, "and the population," which is the census data, "for each of the variables that you use for your weights. This should be in table form. This will show whether you were justified in weighting the data or not. In terms of race, gender, income, etc., what groups did you undersample? What groups did you oversample?" And he goes on: "Because the study was based on an online survey, I presume that you did not have an experimental design that allows you to reweight the data in a way that makes it nationally representative. I know you acknowledge that the data may still not be fully representative of the South African adult population in your limitations, but then why did you weight? It is okay if the data are not nationally representative. They don't need to be." This goes back to the issue of your intent, but again, our intent was to create generalizable knowledge. "If it turns out that your sample should not have been weighted, then redo the analysis on the unweighted data and frame your conclusions in terms of the sample only." This is a very good point, right? You can't use an unweighted analysis and still be talking about the nation overall. Your framing has to follow the kind of analysis you did. All right, so this is our response.
"We have provided the requested data points below. In order of appearance of the columns, we have compared results for (1) the unweighted data set, (2) the weighted data set, (3) weighted data from the 2017 SASAS, which is a nationally representative household survey of South African adults, and (4) census projections from Statistics South Africa." So this is what we've done, right? The first column is the data that were not weighted. The second is the data after we weighted it and calculated the same thing. Column three is from another national survey that is well respected as nationally representative, and column four is from the census. So when you look at gender, for example, the unweighted data shows that 50.2% are male. Our weighted sample shows that to be 47.7%, which is very similar to what you get from national estimates. Let's look at race/ethnicity, since it's more dramatic. For race/ethnicity, the unweighted sample shows that only 27.3% of the sample were black. When we weighted the data, that was 68.83%. It is still lower than what you have in the national estimate, but at least it's much, much closer. Again, remember that weighting will not automatically solve all the problems, but it at least reduces the problem.
problem for wh for example they sample
on weighted samples showed that 50.4%
were white but when after we waited the
data that percentage dropped to
19.8 again it's still higher than what
we see nationally but at least greatly
improves the the
distribution so now to respond to
reviewer's question on why we
waited one is to to increase the
relevance of the research for policy at
a time when South Africa is poised to
pass several Tobacco Control regulations
including EET taxation by the treasury
and the law to deem ecigarettes as
tobacco products data are surely needed
waiting the data so it resembles the
general population better benefits
Public Health as opposed to unweighted
analysis whose distribution looks
nothing in like South African population
especially the racial
distribution it is evident that waiting
not fully correct the sampling error but
at the very least it minimizes this
bias and then we're saying waiting
including of web based service is a
standard procedure and we just this is
more of just saying this is standard
procedure um and then we also provide
more context. The reviewer had earlier said, "Because the study was based on an online survey, I presume that you did not have an experimental design that allows you to reweight the data in a way that makes it nationally representative." Our response to that is: yes, the reviewer is correct that most probability-based samples require a different approach to weighting. In such cases, base weights are simply calculated as the inverse of the selection probabilities: the conditional selection probabilities at all stages are multiplied together, and the inverse of that product is the weight. However, there are several other weighting protocols besides base weights or design weights. For example, in the world of journalism, where we weight web surveys, we use other things like raking. So I'm explaining here that in our study we used a procedure known as raking to generate the weights. This accounts for non-coverage or nonresponse bias but does not require individual selection probabilities. All that is required is the distribution of the sample as well as that of the target population. And so we've said, all right,
here is the code science is all about
reproducibility so we say okay here is
the source of population marginals this
is where we got the data for the
National Distribution and here is our
code on how we implement the W in
algorithm I provide a code here and this
is how the code was
calculated and okay if you want to
recreate the weight yourself well knock
yourself out and so that is how of
course that's how you want to respond to
reviewers obviously the paper was published
because our responses were quite
comprehensive and robust um
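The raking idea in that response (only the sample's categories and the target population marginals are needed, not individual selection probabilities) can be sketched roughly as below. This is a minimal illustration of iterative proportional fitting, not the code that was shared with the reviewers; the function names, the six made-up respondents, and the population proportions are all assumptions for demonstration.

```python
def rake(rows, targets, max_iter=500, tol=1e-8):
    """Iterative proportional fitting over categorical margins.

    rows    : list of dicts, one per respondent, e.g. {"sex": "F", "race": "A"}
    targets : {variable: {category: population proportion}}
    Returns one weight per respondent; the weights sum to len(rows).
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            # Rescale so the weighted share of each category hits its target.
            for i, row in enumerate(rows):
                factor = margin[row[var]] * n / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match: stop cycling
            break
    return weights


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents falling in one category."""
    in_category = sum(w for r, w in zip(rows, weights) if r[var] == category)
    return in_category / sum(weights)


# Made-up sample, skewed toward sex "M" and race group "A".
sample = [
    {"sex": "M", "race": "A"}, {"sex": "M", "race": "A"},
    {"sex": "M", "race": "A"}, {"sex": "F", "race": "A"},
    {"sex": "M", "race": "B"}, {"sex": "F", "race": "B"},
]
# Hypothetical population marginals (not real census figures).
population = {"sex": {"M": 0.5, "F": 0.5}, "race": {"A": 0.2, "B": 0.8}}

raked = rake(sample, population)
print(round(weighted_share(sample, raked, "race", "B"), 3))
```

After raking, the weighted sample matches each marginal separately; it says nothing about variables that were not raked on, which is the limitation discussed next.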
any questions on that on that comment
and response I just want to give you
context as to the kind of negotiations
that you also have to you have to engage
with with journals before you publish
and again take note that even after we
weighted the data we still had it wasn't
as if we solved the problem
completely because the
proportion of whites was still much larger
than what we had in the nation and
that's what we also included in our
limitations here we said
that um some limitations exist to this
study let's see ha the final one here we
said finally despite weighting to
reduce non- coverage and non-response
biases this data may still not be fully
representative of a South African adult
population because adjustments were made
for only a few variables for which
information was available in the data
set remember we we didn't adjust for a
whole lot of variables we adjusted only
for age employment race gender education
marital status yeah six we only adjusted
for six variables and so that is not
like a whole lot of variables like like
I said earlier remember in most large
surveys we typically will account for up
to 20 variables but so for those of you
who are asking is it enough to weight on
only one variable that's your answer
right it's no it's not enough in general
you you typically want to adjust for as
many demographic variables as as
possible
um because that just improves the
representativeness of the study and
again our intent was to create
generalizable knowledge because we were
we mean we are policy researchers like
um we're policy think tank so we it is
of zero interest to us to analyze data
among a a sample we don't even know
about no we care about National
estimates or national data and that's
why we had to weight the data so that we
could make inferences that were at least
close to National
representativeness um I see some hands
raised up yes let's let's see your
questions yeah any
questions okay now if there are no
questions I would like to show you
quickly how this is done on the Chisquares
platform we we already did look like
your hand is
up no sorry that was
okay all righty so let let me quickly we
we did this the last time I'm not sure
how beneficial this will be again but
for those of you who were not there the
last time I will just go ahead and and
show you how this is done on the Chisquares
platform
so um your survey when you collect your
data your survey must first of all be
closed before you can weight the data
because weighting means that you are done
with the data your data collection is
closed so it's closed so that means you
have to finish your survey before you
weight the data remember that nobody in
a data set can have a weight of zero and
nobody in a data set can have a missing
weight
either if I if the survey were not ended
and I did weighting today and then some
people came tomorrow and answered the
questions here because I already weighted
the data that means that for that weight variable
they are going to have missing values so
the platform will not allow you to
generate weight for a data set that is
still ongoing as far as data collection
so that is a validation that is built
into the platform so
you've done your survey oops
sorry let
mean so you've done your survey you've
closed the data now you want to weight the
data so let's let's look at how that
works so you go to the
survey you're trying to weight
let's do a
quick all right this is it here it's
closed so I have this data set this is
just a mock example um but if you if
you've collected data this is the exact
same process you would go through uh so
like I was saying earlier there are
validations built into the system so
that the system the platform will
prevent you from trying to weigh data if
data collection is still ongoing
going sorry what's happening to my
network
here um all right
so so you come here under data access
and insights that button
there and you see there's a button that
says weight
data can we import data into the software
yeah we're going to work we have not
released that yet but we can we we
that's one of the things we're trying to
operate um but for now you can only weight
the data that you've collected on the
platform but we're going to fix that
very soon so again let me repeat the
journey you've already collected data on
personel design that's where most of you
are familiar but to where you come to
manage collected
data you come to data access and
insights this is where you go to
download your data set and everything
but now we're not interested in
downloading the data we're interested in
weighting the data so you click on weight
data
and here if you have done weights before
you have a history of this weights right
you can create new weights now remember
I said that um if you if you weight data
and you let you know and you you come
back tomorrow and you wanted to for
example edit those weights with a
different parameter you can always come
back here and use these buttons to do a whole
bunch of things but anyway that's not
the focus of today let's create let's go
about how do you create weight you come
here if you've never done any weights
this will obviously be empty there will
be no history so you click on this
button that says new calibration
weight and next you select the variables
that you wish to use in creating the
weight right and so drop down here to
display all the variables in your data
set you now select the ones that you
wish to
use um let's say you select that one and
I select this
one and I select I'm just this is just a
a mock demo so I'm just creating just
random things here so I've selected four
variables that I wish to weigh on here
and you can see they're all displayed at
the bottom here all right now the next
step is for me
to
okay I was thinking that data sets are
weighted after analysis no you don't weight
the data set after analysis the whole point
is to use the weight in the analysis
so you get the weights and then you
incorporate them in your analysis so
that's the whole point so weighting
must occur before analysis if you weight the data after
your analysis the question is what's the
point of creating weights the point of
creating weights is to incorporate them
into your
analysis all right so we've selected the
variables here um and and and now we
have to we have to provide the
percentages right so you see that each
here you have for example this variable
whatever variable it is we have to
provide the value these are the
categories in the data set yes and no
so if the if this variable had missing
values the platform will automatically
reassign those missing values to yes or
no so all of that process has been
completely done for you you don't have
to worry about that you now you now have
to now put the percentages in the Target
population like let's say in what
whatever variable this is let's say the
population is
40% and for the second category it is
50% now this has to add up to 100
otherwise you get an error the
percentages must add up to 100% right so
the system will not allow you to
progress without that so let's say I
change this to 60 now I can say I'm done
with this variable then next variable
this is marital status I I have to
provide the percentages not just from my
memory I have to look at the census
what the census says let's say 30% are
separated and let's say 10% are widowed
you can't have a variable that is zero
the platform how can it be zero doesn't
make any sense so it has to be a value
that is above zero um let's say um 23%
have never been married and 12% are this
and the current percentage is
displayed at the bottom for you and 10%
are this and let's say 15% are this now
that's up to 100 I can say I'm done with
this variable if it's a continuous
variable you can only weight on
categorical variables so if a
continuous variable is selected the
platform will say okay how many
categories do you want to split that
variable into so like I can say I want
three categories right and then it's
going to say okay from 2010 to where do
you want to define the end the end of
the first category let's say 2010 to
2015 and then from the last cut off
which is from here to like what's the
next value next value I might say is
2020 uh and then from the last cut off
which is this to this and then I will
provide the different percentages in the
population right but let's say for sake
of Simplicity let's say I just want to
weight on these um two variables let me
delete this for simplicity so that we
can just use these
two um confirm
removal so we have two variables here
that we want to weight on so we've
selected them we've assigned the
population distributions and now we just
create generate
weight so the platform has now created a
weight for us
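The two checks walked through in the demo (target percentages must each be above zero and add up to 100, and a continuous variable has to be cut into categories before it can be weighted on) could look something like the sketch below. The function names are hypothetical, the unnamed marital-status categories are placeholders, and whether a cutoff value belongs to the lower or the upper category on the platform is an assumption here.

```python
import bisect


def check_targets(percentages):
    """Raise ValueError unless every category is above 0 and the total is 100."""
    if any(p <= 0 for p in percentages.values()):
        raise ValueError("every category needs a percentage above zero")
    total = sum(percentages.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"percentages add up to {total}, not 100")
    return True


def to_category(value, upper_cutoffs):
    """Index of the category a continuous value falls into.

    upper_cutoffs are ascending; a value equal to a cutoff is placed in the
    lower category (an assumption made for this sketch).
    """
    return bisect.bisect_left(upper_cutoffs, value)


# 40 + 50 = 90 is rejected; changing the second entry to 60 passes.
try:
    check_targets({"yes": 40, "no": 50})
except ValueError as err:
    print("rejected:", err)
check_targets({"yes": 40, "no": 60})

# The marital-status percentages used in the demo add up to 100.
check_targets({"separated": 30, "widowed": 10, "never married": 23,
               "category_4": 12, "category_5": 10, "category_6": 15})

# Three categories from two cutoffs: up to 2015, up to 2020, and above 2020.
cutoffs = [2015, 2020]
print([to_category(year, cutoffs) for year in (2010, 2015, 2016, 2020, 2021)])
```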
and you look at the history this is is
531 so this is the weight variable
here up top here these are the two
variables we use for the weighting this
is the name of a weight variable so that
when you download your data set it will
have the weight variable in it
automatically you can also go ahead and
do a whole bunch of other things like
you can delete the weight you can
compute
Benchmark look at the summary of the
weights so this is telling me that the
weights range from this is the new
weight we created this is the minimum
value this is the maximum value this is
the median so this is the five number
summary for the weight so this like you
will not understand how critical this
feature on this platform is I I you know
it's one of those things that you really
must have gone through some serious
statistical suffering to understand how
much of a just a lifesaver this this
feature is you know um but it really
really saves your life when you're doing
that kind of national surveys and you
have weight um because it really really
simplifies the process in a way that
anybody really can generate weights for
themselves and use them in their study
let me see there are a whole bunch of
comments so let me stop now and take
those
questions
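The weight summary shown on screen, together with the earlier rule that nobody can have a zero or missing weight, can be reproduced outside the platform along these lines. This is only a sketch with made-up weights; the function names are hypothetical and the quartile method is one of several conventions, so the platform's numbers may differ slightly.

```python
import math
import statistics


def check_weights(weights):
    """Every respondent needs a weight that is present and above zero."""
    for i, w in enumerate(weights):
        if w is None or math.isnan(w):
            raise ValueError(f"respondent {i} has a missing weight")
        if w <= 0:
            raise ValueError(f"respondent {i} has a non-positive weight")
    return True


def five_number_summary(weights):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return {"min": min(weights), "q1": q1, "median": median,
            "q3": q3, "max": max(weights)}


example_weights = [0.4, 0.8, 1.0, 1.2, 1.6]  # made-up values
check_weights(example_weights)
summary = five_number_summary(example_weights)
print(summary)
```

A very large maximum relative to the median is usually the first thing to look for, since a handful of respondents would then dominate every weighted estimate.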
uh so so now you created a weight right
now when you now want to analyze your
data look at the manual we said
you have to set the data to survey
weight right then the exact same
weights are now used in your
analysis so there are two sides of a
coin creating the weight and then using
the weight so we've created a weight so
if you download this data set you'll
have a weight a new variable called V2
you can change the name of a variable if
you want you can say okay I don't want
to call it that V 12 I want to call it
something that can remember like my new
weight so I don't forget and you can
rename the weight right and so that when
you when you are now analyzing the data
in your data set you go to my new weight
and you can use that variable but the
the use of a weight will be similar to
what we discussed earlier I think we
showed you how to do let me see if I can
um the the manual I can't figure out
where the manual is now but in the
manual you have how to account for
weight but it's exactly the same process
for incorporating those weight as you
have in the manual so yes at that point
it doesn't differ it's the exact same thing any
other questions let's
see no
questions I was thinking that sorry go
ahead I was thinking that the data sets
are weighted after data analysis am I
wrong you are thinking that what is weighted
after the analysis the data
the data set is only weighted after
analysis no you don't the whole point of
weighting is to incorporate the weight
in your
analysis if that makes
sense sorry go ahead
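To make "incorporate the weight in your analysis" concrete, here is a small sketch comparing an unweighted and a weighted prevalence. The outcome values and the weight variable `my_new_weight` are made up; a real analysis would also need survey-aware variance estimation, which this does not attempt.

```python
def prevalence(outcomes, weights=None):
    """Proportion with outcome == 1; each person counts as their weight."""
    if weights is None:
        weights = [1.0] * len(outcomes)
    return sum(w * y for y, w in zip(outcomes, weights)) / sum(weights)


smoker = [1, 1, 0, 0, 0]                    # 1 = current smoker (made-up data)
my_new_weight = [0.5, 0.5, 1.0, 1.5, 1.5]   # smokers over-represented, so down-weighted

unweighted = prevalence(smoker)
weighted = prevalence(smoker, my_new_weight)
print(unweighted, weighted)
```

The weights change the estimate itself, which is why they have to exist before the analysis is run rather than after it.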
yes can we import data into the software
for
weighting yeah you will shortly be able to
do that but for now no but we're working
on
that yeah that's that's it for now
question wise I can't see Hands today
let me try to
check I think I who I think we should
have another session that just focuses
on using the weights on the platform how
to use it I think today's class was
focused a little bit more on the theory I
I think that's important too right I
think you have to know the theory at
least a little bit of it to understand
what's going on um and so we could have
a different session that just focuses on
okay now this is just Hands-On demo how
do you actually use weights on the
platform I think that would be helpful I
see some people agree too
so so we could choose to have next week
we're planning on
having um
next week we're planning to have a
session on causal
inference do you want us to
go ahead with that or should we just do
the weights
instead are we then going to do the
causal inference the week
after um yeah no well oh we can just
ignore causal inference um altogether
okay those who have their hands up I
think kin
okay keep next week there and then we
we'll see if we
have
oops okay but by the way causal inference is
just going to be a whole bunch of theory
there's no hands-on or demo next
week we're just talking about what in
the world is causal
inference and so
um can we combine the two yeah I will
try but I can I can't promise it's
better to have one session where we
understand it completely than to try to
deal with two topics where we end up
being extremely
confused we'll try to do both but we'll just play it by
ear and I understand that this weight
issue the weighting is sometimes a bit
abstract and I try to make it as simple
as possible so but I I really hope that
you can at least understand and have an
intuition for what weighting is and why we
sometimes have to deal with weight um
it's all about intent like what is your
intent do you intend to create
generalizable knowledge so you are the
one who is in the driver's seat uh
but should you choose to go down this
route of generalizability then you have
to make sure you do it well and that's
where weights come
in and I think that what we've also
addressed today is one half of the I
assume that most people were in this last
session where we focused on design
weights right but that may not be a
true assumption so one of the
other things we need to do is we need to
bring those two things together you have
design weights and you have nonresponse
weights how do we then marry them
together to create a you know a final
weight so that you don't go away with
half knowledge so maybe we can also have
um you know the next session or
whichever session we are focusing on
handson we could use that to build on
the issue of design weights as well and
and have so that we can make your
knowledge of weight more
holistic because half knowledge is very
dangerous
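One common way the two halves are brought together is by multiplication: the design weight is the inverse of the overall selection probability (the product of the conditional probabilities at each sampling stage), and it is then multiplied by a non-response or calibration adjustment. The sketch below assumes that approach; the function names and the numbers are hypothetical, and the promised session may present it differently.

```python
import math


def design_weight(stage_probabilities):
    """Inverse of the overall selection probability across all stages."""
    return 1.0 / math.prod(stage_probabilities)


def final_weight(stage_probabilities, adjustment_factor):
    """Design weight multiplied by a non-response/calibration adjustment."""
    return design_weight(stage_probabilities) * adjustment_factor


# Made-up two-stage design: a district is picked with probability 0.1,
# then a household within that district with probability 0.05.
stages = [0.1, 0.05]
base = design_weight(stages)          # about 200 people represented
adjusted = final_weight(stages, 1.25)  # about 250 after the adjustment
print(round(base, 6), round(adjusted, 6))
```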
Nal Hassan you have a
question you may
unmute no I don't
have oh okay your hand is off okay um
who else
sorry
Ty Ty
hello good evening
hello okay okay just a quick one um you
you said something about uh confounding
the other time and um just a little bit
of throwing of light into it from your
side my own uh mind before uh was um
there are two main instances that can
cause a variable to
be okay I mean to use for confounding
without variable number one if maybe it
is significant in your analysis then you
can use it to test for confounding or
even though it's not significant if it's
been reported as uh as an I mean
independent you know variable that can
influence the outcome then you can also
use it for confounding
uh so I mean but from the way you uh
said it the other time it looks like you
said you can use any variable to test
for confounding just a little bit of
clarification on that thank you you're
welcome confounding has to be based on a
solid theoretical knowledge so what does
that mean it means that you should have
you should have good knowledge so it's
not just any variable that you can use
right before you so one of the things I
really discourage um researchers from
doing is these things like forward
selection or backward selection where
you just it's like you're just throwing
variables into a computer and saying oh
whatever the computer brings out I'm
going to use right that is not
appropriate because
research should be
theoretical or at least theory driven
so there are three requirements that
have to be met for a variable to be
called a confounder number one it has to
be independently associated with the
outcome number two has to be
independently associated with the
predictor and three it must not lie on
the causal pathway so you should have the
habit of drawing DAGs DAGs are directed
acyclic graphs and I think we should have
a session on DAGs uh because it is
extremely important in the practice of
epidemiology DAGs will help you to
conceptualize um whether or not a
variable is a is a confounder now I must
also say that the fact that a variable a
was a confounder in Mr. A's study doesn't
mean it's going to be confounder in my
study because no two studies are are the
same that is again why it has it is
extremely important for selection of
variables based on solid knowledge of
theory and your understanding of the
disease context or whatever the outcome
is I hope I hope that makes
sense yes yes it does it does thank you
it does okay so when we in our session
on DAGs or directed acyclic graphs
we'll focus more on how does that work
in practice like beyond the theory right
in practice how does this work right so
that we're not just doing things like
robots or machines like there has to be
some theory and so we're going to focus
on that when we talk about
DAGs let's
say yeah um thank you very much for the
session so far um the question I have is
not on um what we talked about today but
it's just something that I have had a
problem with while trying to use like um
things like P tricks and things like
that when you're dealing with
qualitative analysis right and you have
to I'm sure you've used like C tricks
before and you have to um basically
categorize your qualitative data into
like um keywords and things like that
does Chisquares have like an AI
integration in which um you just need to
like put in your I mean it has your data
and everything so it generates that
keywords that can group um group your
responses into like the required format
or does it have to be manual I don't
know if you get my question I do yeah no
everything is done for you automatically
on the Chisquares platform so if your data is
is so for each for all qualitative data
the platform does two kinds of analysis
for you which is automatically reported
in the analysis report number one is
does the thematic analysis thematic
analysis is a qualitative approach where
we identify themes themes are derived
from codes so the platform does all of
that and so you have you know thematic
groupings of the responses second the
platform also does a word cloud
analysis word cloud analysis is a bit
different from thematic analysis because
it's it's looking at the preponderance
of words that were used and using that
to create a a cloud so to speak so you
can look at what words were the most
frequently used and to what extent so um
so depending on so you have a choice to
go with the Thematic analysis but both
are presented for you automatically so
you don't have to manually do it I mean
we are very big on automation so
anything that is that can be automated
we do automate it and that is something
that can easily be automated so we do
automate it all right thank you you're
welcome Abdul uh thank you very much uh
professor Isaac thank you very much now
my question is
uh uh where you did your reporting on
that e-cigarette issue responding to uh the
reviewers is that from the code book or
it is independent of the it's after the
code book then uh secondly again uh I
like the cigarette uh stuff but the
issue is that for example we have have
different surveys that is done in
Nigeria by the Bureau of
Statistics and other uh surveys for
example when they talk about data on out
of school girls I constantly find it
difficult to uh accept whether those
data is correct can it be standardized
or can it be weighted so that we can be
able to uh see how it is uh done based
on the Chisquares uh platform thank you
very much thanks so the reports to
reviewers that is just um it's it's it's
not based on the codebook that is just
um that was just we and we and other
coauthors just responding to the to the
editor and the reviewer so it's just
based on yeah we're just responding to
their questions and the questions may be
different from One reviewer to the other
right so it's just a direct response to
to them as an editor-in-chief myself I
I really appreciate responses that are
very thoughtful and that are very
in-depth for me I also think of
responding to
journal reviewers as also an opportunity
to also clarify but also to
educate because sometimes the concept of
peer review is a misnomer because the
people reviewing sometimes are not your
peers sometimes are people who are very
early in their career
so and of course sometimes there are
people who are quite experienced too
right much more experienced than you but
my point is that sometimes that is not
necessarily your peers who are reviewing
so for me as an editor somebody who
loves mentoring I always
provide responses in such depth that it
clarifies but also educates so that's
why my responses tend to be more
nuanced than might be necessary
but it's a good thing because once
you respond the first time you
may not have to respond again the paper is
almost guaranteed to be accepted after
that round of revisions um the second
question about weighting um and again this
is this is
why we need to have sessions like this
because you need to understand how these
things work and challenge sometimes
challenge the process and say I don't
think this was because the people who
are doing the weighting are people just
like you for all you know they might
know less about weighting than you do now
if you have no knowledge of a topic
whatsoever you will just accept whatever
is given to you as well the word of God
if you understand these processes however
you can look at it and say no I don't
think this was done correctly and there
are many instances where I have
used data and I'm like no
there's a mistake with this weighting and
and I reach out to the survey you know
administrators you know and say
this doesn't look right you have a
missing value for weight it's impossible to
have weights that are missing but you can
only make those kinds of feedback if you
are well informed so I think a
well-informed academic Community is a
great thing because it it it improves
the process for everybody you know you
challenge things and if you don't like
it you can reweight the data yourself um
although the problem is that if you
reweight the data yourself it makes it
impossible to compare with other studies
because if other people use the data set
with the weights that were supplied and
then I reweight the data
myself that that creates a bit of a
difference so and that's why in
your methodology you have to be very
clear on what was done now there are
some instances where I have used data
and recalibrated the weight myself even though
the weight came with that
survey because there are certain
statistical procedures that you might be
doing that also generate weights
themselves so take for example I'm
trying to do inverse probability
weighting which is a statistical process
to adjust for missing values now IPW
will generate a weight of itself
separate from the weight that came in
the data set so in that case the final
weight will be the original final weight
times the IPW weight right so now I have
a new weight altogether so that is why you
also have to explain your methodology in
in great detail but I think overall it's
it's just making sure we're very well
informed about the process of weighting and
and also just challenging right
you can write today you can write to the
survey board and say I can you provide
more details about the methodology like
how did you weight this thing what
variables were used and I think that
that benefits everybody because it
improves our surveillance systems and
our surveys and data collection systems
and just makes us better
overall so um I hope that answered your
question um
okay great any other
questions okay thank you very much for
this uh class I'm so impressed and
really quite impressed and just like
Abdul I was also here wondering about
how International communities come you
are sounding sorry um Jo you're sounding
very faint I'm not sure whether you're
wearing
headphones okay yeah I'm wearing a
headphon okay okay you make you might
want to remove them or maybe you why the
other way the wrong way okay can you
hear me now ah yes what did you do did
you remove them yes I had to waiting
for okay so I was also wondering like
Abdul how we have International
Community been coming across several
information they say 40% of Nigerians
live in abject poverty and there are no
variables or I don't know how they are
measured or the weight assigned to but
like you said it's always proper to
question but now the challenge I have is
that how do you question an authority
that the country itself has you know
assigned itself to like everything they
say right and you know when you're
talking and when for international uh
Publications and you don't have mention
of say un Center for housing and suties
you don't have the World Bank mentioned
it's like you haven't really covered all
information so that I find quite
challenging but my second question is
like is
can we have this how to use Chisquares
for
quantitative just tape it and have that
information I think it will also help us
in also learning beyond the class so
that a lot of people can also be aware
of Chisquares so for me it's just
a simple request if it's possible to
have how to use Chisquares online on
YouTube and then we can learn faster
also how to use the platform from there
thank you definitely thank you very much
so your your first point about like
estimates that come out and you're not
sure I think you can you can ask we we
need to do a better job of also asking
and this is what you call you know more
of a democratic you know
scientific community right we need to
start holding our scientific leaders' feet
to the fire
if you put out an estimate or report you
have to provide an appendix that
discusses the methodology you can't just
put out results without methodology for
how those results were were were derived
so that's something that we need to be
doing and if we see if we see that is
not done we can write to them right say
hey um with reference to your the
reports this official government report
um I see that this was not this was not
addressed
right there are many ways you can
address that you can write to them and
you can write an editorial right
this is that medium an
editorial is
addressed to you know issues like that
you can say okay scientific
communication are we capturing things
right and that could also be a
gap in knowledge which you can
explore in your research like is
the prevalence of abject poverty truly
40% in Nigeria what is the what is the
what could be wrong with the methodology
right and so you can also request for
that same data they used and reanalyze
it and and in fact that could even be
your own specialty like become a
methodologist and so I think there many
ways you could go around that to to
addressing it but I think um keeping
silent collectively is not the solution
we have to find a way of saying we need
methodology we need to be more
transparent Society in terms of how
numbers are calculated or generated for
for our
consumption and yes on YouTube we can
definitely put up you know more videos
on how to use a platform um on YouTube
and other
platforms thank you any other
questions are we trying to say that
reweighting of data will always lead to
loss of some data no no no no I never
said that I said it will lead to loss of
comparability the data will not be lost
but if I weight data with one particular
variable and you weight data with
another variable remember that what
weighting is doing is creating a pseudo-population
a pseudo-population simply means a
fake population that does not exist in
real life but if all of us are using the
same weight at least we are all
creating the same fake population
right if I use my weight I create my own
fake population you use your weight you
create your own fake population oh then
how are we comparing these two
populations right so what I was saying
is that when you use different weights
you create different pseudo-populations
or fake populations and so it makes it
hard to compare apples with
apples because you're looking at
completely different pseudo-populations so
the moral of the story is that when you
weight what you're doing in effect is that
you're creating a fake
population well we don't call it fake we
call it a pseudo-population but you know
literally it's a fake population you are
creating um and so that is why if you
create your own weight you have to
provide a methodology so that when
people are comparing the studies they
won't be shocked that how come the
prevalence change so much in the two
studies well it's because there are two
different populations
altogether thanks for the lecture
how is this related to propensity
matching propensity matching is a
way of adjusting for
confounding um it has confounding is a
completely different issue from what
we're talking about here there are three
types of biases in epidemiology we talk
about confounding which is you know the
most common kind kind of bias talk about
selection bias right I talk about
measurement bias um measurement bias has
to do with how things were captured and
so confounding is a bit different it's
an entirely different ball of fish um
entirely and propensity score matching
is just one way of controlling for
confounding bias there are other ways too of
course um but in the context of what
we're talking this is this is an
entirely different issue we're talking
in this case we're talking about issues
of generalizability which is a form of
you know external validity to what
extent do results from our study um
carry over to a broader population and
that is the issue we're talking about
here so confounding is a a different
issue which you can address with um
propensity
scores sometimes a survey may run along
with data that is obtained from
Empirical research in the laboratory and
the hope that information from
demographic data can be related to
results from the lab how can this be
handled using the Chisquares
platform so responses from if you if you
collect data right you collect data from
an individual let's say you and this
something we do routinely this is
nothing new like we might do um for
example a survey where we have different
components so we have a selfreported
module which is like okay what's your
smoking and where do you live and how
many people live in your household those
are self-reported
items but then we also have other um
other indicators like for example um you
know your blood sugar and your you know
your HbA1c level and so many other
indicators those are captured in the lab
right you can you can also have a
scenario where um it is it is you know
um the each individual has a unique code
that is then transported or or matched
whether they are in the self-reported
mode or in the laboratory mode right and
so that makes it easier for you to
upload both responses and match them to
the the same individual um so there are
many ways you can do that so one is for
you to assign a key to so you can think
of it as two different surveys right
where you have okay survey number one is
self-reported responses so you identify
each individual with a unique ID so you
can ask them what's your phone number or
what you ask them something that's very
unique to them and you ask that you have
that same information in the other
survey that so you have the same unique
key and then you can then merge the two
data sets together with that unique key
so that that way the self-reported
information and the laboratory
information are together merged for the
same individual so there the platform
allows us to do that in many many ways
but um for for me if I were do doing it
it's just easier to use a common keyy in
two different surveys so that as I'm
collecting a lab data and I'm collecting
AED data I have a common key that is
unique to individual which I can then
use later to merge I hope that that's
the platform also allows you to merge
your data so that's um that's not
complicated at
all
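The common-key approach described above can be sketched in a few lines of plain Python. This is only an illustration and not how the platform does it internally: the `participant_id` field, the records, and the `merge_on_key` helper are all hypothetical.

```python
# Hypothetical self-reported survey records, keyed by a unique participant ID.
self_reported = [
    {"participant_id": "P001", "smoker": "yes", "household_size": 4},
    {"participant_id": "P002", "smoker": "no", "household_size": 2},
    {"participant_id": "P003", "smoker": "no", "household_size": 5},
]

# Hypothetical laboratory records that reuse the same key.
lab_results = [
    {"participant_id": "P001", "hba1c": 6.1},
    {"participant_id": "P003", "hba1c": 5.4},
    {"participant_id": "P004", "hba1c": 7.2},
]


def merge_on_key(left, right, key):
    """Inner join: keep only individuals who appear in both data sets."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


merged = merge_on_key(self_reported, lab_results, "participant_id")

# IDs present in only one source are worth reviewing before analysis.
survey_ids = {row["participant_id"] for row in self_reported}
lab_ids = {row["participant_id"] for row in lab_results}
unmatched = sorted(survey_ids ^ lab_ids)

for row in merged:
    print(row)
print("unmatched IDs:", unmatched)
```

An inner join is used here so that every merged row has both the self-reported and the laboratory part. The sketch assumes each key appears once per data set, which is why the key has to be unique to the individual.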
Okay, working on a comparative... Please go ahead.

Working on a comparative study among adolescents living with HIV and non-infected groups in a state. Question one: do you recommend weighting for both groups using state census data as the reference population? Question two: if yes to question one, which is the most appropriate to use for weighting, national census or state census data? Thank you.

If you're going to weight, then the sample population and the weighting population must be the same. I can't be collecting data in the state and then use national data to weight. If it's a state, then I have to use data for that state. The whole idea is to make the sample look like the target population, and in this case the target population is the state, not the country. So the reference population must be what you want the sample to look like. If you're trying to use national parameters, that means you're saying this sample should look like the nation, but what you want is for the sample to look like the state. So you use the state population data as the population marginals.

And I don't recommend weighting. Weighting should be something you do rarely. Sometimes simple is good. So no, I won't recommend it. You can just use data from that population for your inferences, so the inferences are not extended beyond the studied population, which is completely fine too. Remember, it's a question of what your intent is. If the intent is just to refer to that population, then please just don't worry about weighting.

Okay, does it now imply that the
variables for each data set in the pseudo-population after weighting, of all the variables that will be used for data analysis?

That question is a bit convoluted, but I'm going to answer it how I understand it. Remember, I said that weighting is all about forcing the sample to look like the parent population. The way we force the sample is by taking those weighting variables and making the sample look like the parent population on those variables. Now, outside those variables, for any other variable that we did not use, the sample is not guaranteed to look like the parent population. It is only forced to look like the parent population on the variables that were included.

That is my understanding of that question. If I missed any part of it, I'm happy to clarify.
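To make this concrete, here is a minimal sketch that rakes a tiny sample to assumed population marginals on age and sex, and then checks a variable that was left out of the weighting. Every record and every proportion below is invented for illustration (none of it is census data), and `rake` and `weighted_share` are hypothetical helper names. Raking (iterative proportional fitting) is one common way to calibrate to marginals; it is not necessarily what the platform does.

```python
# Invented sample of eight respondents.
sample = [
    {"age": "young", "sex": "female", "education": "tertiary"},
    {"age": "young", "sex": "female", "education": "tertiary"},
    {"age": "young", "sex": "female", "education": "tertiary"},
    {"age": "young", "sex": "male", "education": "tertiary"},
    {"age": "young", "sex": "male", "education": "secondary"},
    {"age": "old", "sex": "female", "education": "secondary"},
    {"age": "old", "sex": "male", "education": "secondary"},
    {"age": "old", "sex": "male", "education": "tertiary"},
]

# Assumed population marginals for the weighting variables only.
population_margins = {
    "age": {"young": 0.4, "old": 0.6},
    "sex": {"female": 0.5, "male": 0.5},
}
# Assumed population share for a variable NOT used in the weighting.
population_tertiary = 0.25


def weighted_share(rows, weights, var):
    """Weighted proportion of each category of `var`."""
    total = sum(weights)
    shares = {}
    for row, w in zip(rows, weights):
        shares[row[var]] = shares.get(row[var], 0.0) + w / total
    return shares


def rake(rows, margins, iterations=50):
    """Adjust weights one variable at a time until all margins match."""
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, target in margins.items():
            current = weighted_share(rows, weights, var)
            weights = [
                w * target[row[var]] / current[row[var]]
                for row, w in zip(rows, weights)
            ]
    return weights


weights = rake(sample, population_margins)

for var in ("age", "sex", "education"):
    shares = weighted_share(sample, weights, var)
    print(var, {k: round(v, 3) for k, v in shares.items()})
print("assumed population share with tertiary education:", population_tertiary)
```

In this toy example the weighted age and sex shares match the assumed margins, while the weighted share with tertiary education stays well above the assumed 0.25, because education was never one of the weighting variables.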
Can the platform not generate the values automatically when we choose from a census conducted in countries of choice?

No, it cannot, because that would be bad practice. Even if you said your sample is from Nigeria, how do we know who your sample is of? It could be a sample of young adults, it could be a sample of older adults only, it could be a sample of anything. There's no way we would know that, because it is your intent. Since we cannot read your intent and who is in your study population, you are the only one who can supply those parameters.

One of the easiest things that engineers or data scientists fall prey to is trying to optimize that which should never be optimized. It's great to be all about efficiency and convenience, but as a good engineer you have to know when optimization must end and people have to do the manual work. That ability to say, no, this is where you draw the line, this is the parameter that must be supplied by a human being, is extremely important. In this context, this is one of those things that must be supplied by a human being, because we're talking about your intent, and nobody can know that except you. You're the only one who knows where your study population is, or the specific ages, so you're the one who has to supply that value yourself.

What happens with populations
that might be illegal in a certain area?

A population that might be illegal, did I hear you correctly?

Yes, in a certain area.

Legality, criminality, and study population have nothing to do with each other. I might be doing a study of undocumented individuals in the country. If your study is of, let's say, undocumented sex workers, well, then why are you trying to weight the data? What is the population you are trying to weight the data to? In that case weighting does not even make sense, because your interest is in that population alone. So you should not be worrying about weighting, because the inferences for your study do not go beyond that special population. Anytime you have a special population in your study, that already tells you that you don't need to weight the data. You just have to worry about that sample, because your inferences are going to be limited to that sample alone. That is why it's called a special population, as opposed to a general population. You only worry about weighting when you're talking about the general population. The opposite of general is special.

When do we use gender
and sex, or can the two be used interchangeably?

The two cannot. They are very different things and they're not exchangeable at all. Sex is a biological construct. Sex only has two categories, male and female. Then gender is more of a bio-behavioral construct. You could have boy, girl, man, woman, transgender. Those are completely different things entirely, so they're not exchangeable at all.

Now, if you have data on both, great. You can ask, but you also have to be aware that it is very sensitive information. When collecting information on gender, in some cases you also get a lot of wrong information. In certain neighborhoods, if I'm collecting data in the northern part of Nigeria, for example, I would not necessarily feel confident about collecting data on gender, because even if I did, I would expect a lot of misclassification. So in that case I would just collect data on sex alone. You have to worry about how much of an issue it is, how acceptable it is, and what quality of data you think you might get if you ask the question. Before you ask the question, you really want to think about it very carefully and think about the quality of the data you might end up with. Sometimes it's like, okay, I'm just going to ask about sex only: what sex were you at birth, male or female? That becomes simpler to collect.

It appears that it is
not the data set that is weighted but rather the results obtained from the data set, considering the example you used from South Africa, where we see figures in percentages. Is that correct?

So, with the data set we generate, what the Chisquares platform does when we create that weight is that it takes that data set and gives everybody a weight, so we have a new column called the weight column. Then when we're doing our analysis, we incorporate that weight in our analysis, so the analysis also becomes weighted.

In a nutshell, we are creating a weight variable in the data set so that we can use it to weight our analysis. If there's no variable called weight, then we can't weight the analysis; we can only use variables that are in our data set. So before we can weight data, we have to first of all create a weight variable, which is what the platform is doing, and then you can use that weight variable to do your own analysis.
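Here is a small sketch of what "a weight column that you then use in the analysis" means in practice. The records, the column names, and the weight values are hypothetical; in a real study the weights would come from the calibration step.

```python
# Hypothetical data set after weighting: each row carries a weight column.
dataset = [
    {"id": 1, "smoker": 1, "weight": 0.5},
    {"id": 2, "smoker": 1, "weight": 0.5},
    {"id": 3, "smoker": 0, "weight": 1.5},
    {"id": 4, "smoker": 0, "weight": 1.5},
]


def unweighted_prevalence(rows, outcome):
    """Simple proportion: every respondent counts once."""
    return sum(row[outcome] for row in rows) / len(rows)


def weighted_prevalence(rows, outcome, weight="weight"):
    """Weighted proportion: each respondent counts in proportion to their weight."""
    total_weight = sum(row[weight] for row in rows)
    return sum(row[outcome] * row[weight] for row in rows) / total_weight


print("unweighted:", unweighted_prevalence(dataset, "smoker"))
print("weighted:  ", weighted_prevalence(dataset, "smoker"))
```

With these made-up numbers the unweighted prevalence is 0.5 and the weighted prevalence is 0.25, because the smokers carry smaller weights. The sketch covers the point estimate only.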
No further question. Any other questions?

Okay. In case I'm asked to provide ethical clearance in a study, should I change my study to longitudinal because of the protocol involved to get ethical clearance?

No. Okay, I see. On the platform you can get ethical clearance for both cross-sectional and longitudinal studies, so you don't need to change anything.
If I'm to ask, what is the difference between gender and sex?

Gender is a behavioral construct; sex is a biological construct. Biological means that it's defined by your chromosomes. Sex is defined by the XX and XY chromosomes: if you have XY chromosomes you are a male, and if you have XX you are a female. That's what we mean by a biological construct.

Gender, on the other hand, is a social construct. Social construct means that I get to define my own gender. I can come tomorrow and tell you, hey guys, I am a lady now. That's what we mean by it's a social construct: you get to define it yourself. And if I say I am a lady, I'm sure you've heard people say, oh, you misgendered me, meaning that you called me a gender separate from what I identify as. Well, I hope you get the point now,
right? So gender is a social construct. There's no limit to the number of genders, because they keep growing every day.

Male and female are strictly for when you're talking about sex. So don't confuse male and female with men and women. Male and female are categories of sex. Don't put sex and then put man and woman; it is very confusing, because you are assigning categories that are meant for gender but putting them under the word sex. If you're going to say sex, the only categories I expect to see are male and female. If you talk about gender, there are many categories beyond two. Under gender you can have man, woman, boy, girl, transgender, you can have agender. There is no limit to the number of genders.

Another thing you have to remember is that gender is also not the same thing as one's sexual orientation. Sexual orientation is quite different from your gender. Somebody may identify as female and then say that their sexual orientation is straight, or gay, or bisexual, and so on and so forth. That is an entirely different thing. But the bottom line is that sex is a biological construct that is defined by your chromosomes, whereas gender is a social construct that is defined by how you feel that morning, and that's why we call it a social construct.

How do you discuss weighted
variables, and how does it affect statements on the generalizability of the findings?

Okay, so you define the variables. Essentially, you could list the variables that you used for the weighting and why you selected them. It could be that you compared the distribution of those variables in your data set versus those in the general population and you found a difference. These are typically demographic variables. In general, for you to know what the impact of the weighting is, you must have a benchmark. If you look at what we did in our... oh, I'm no longer sharing my screen. Let me go back to my screen quickly. So if you want to talk about the impact on generalizability, you should be able to have a table like this, that shows the data when they are weighted and when they are unweighted, and the census estimates. This is helpful for knowing whether your weighting did anything useful or not. That's how you address that.

Okay, that was the last question.
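The benchmark table described in that last answer can be sketched as follows. The respondents, their weights, and the census proportions are made-up numbers for illustration only, and `benchmark_table` is a hypothetical helper.

```python
# Hypothetical respondents as (sex, weight): six males and four females.
respondents = [("male", 0.8)] * 6 + [("female", 1.3)] * 4

# Assumed census proportions used as the benchmark.
census = {"male": 0.48, "female": 0.52}


def benchmark_table(rows, benchmark):
    """Return (category, unweighted %, weighted %, census %) per category."""
    n = len(rows)
    total_weight = sum(w for _, w in rows)
    table = []
    for category, target in benchmark.items():
        count = sum(1 for c, _ in rows if c == category)
        weight_sum = sum(w for c, w in rows if c == category)
        table.append((
            category,
            100 * count / n,
            100 * weight_sum / total_weight,
            100 * target,
        ))
    return table


table = benchmark_table(respondents, census)

print(f"{'category':<10}{'unweighted %':>14}{'weighted %':>12}{'census %':>10}")
for category, unweighted, weighted, target in table:
    print(f"{category:<10}{unweighted:>14.1f}{weighted:>12.1f}{target:>10.1f}")
```

If the weighted column is no closer to the census column than the unweighted one, the weighting did not do anything useful for that variable.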
Okay, this is one of those topics that I feel is a bit unfamiliar to many people, but I hope you guys learned one or two things.

Good, we are learning.

All right, that's great. Some of these things take a lot of repetition. You have to hear them over and over and also do them. It's not just hearing; you have to try your hands on doing them too. That's the way it becomes solidified.