Sample‑size considerations for different study designs

Source: YouTube video by Chisquares

Studies that do not require a sample‑size calculation
  • Single‑case investigations (e.g., the first Ebola or COVID‑19 patient) do not require a sample‑size calculation because only one individual is examined.
  • Case series (multiple patients with the same disease) also do not need a formal sample‑size calculation; they are used to describe percentages, means, and other descriptive statistics.

Descriptive vs. Analytic studies

| Study type | Need for sample‑size calculation | Typical purpose |
| --- | --- | --- |
| Descriptive (ecological, cross‑sectional) | Yes – to achieve a desired precision for prevalence or proportion estimates | Estimate population parameters |
| Analytic (case‑control, cohort, randomized trial) | Yes – to detect a prespecified difference (effect size) between two or more groups | Test hypotheses about associations or treatment effects |

Effect size, power, and required sample size

  • Effect size = the smallest clinically relevant difference the investigator wishes to detect between groups (e.g., risk difference, odds ratio).
  • The smaller the effect size, the larger the required sample size.
  • Analogy: detecting a tiny object under a microscope requires a more powerful (larger) microscope; similarly, detecting a tiny effect requires a larger study.
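The inverse relationship between effect size and sample size can be made concrete with the standard normal‑approximation formula for comparing two proportions. This is a minimal sketch with hardcoded z‑values for a two‑sided α = 0.05 and 80 % power, not the video's exact calculator:

```python
import math

def n_per_group(p1, p2):
    """Approximate per-group sample size for detecting a difference between
    two proportions (normal-approximation formula, alpha = 0.05, power = 0.80)."""
    z_a = 1.959964  # z for two-sided alpha = 0.05
    z_b = 0.841621  # z for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Halving the detectable difference roughly quadruples the required size:
print(n_per_group(0.60, 0.40))  # 20-point difference -> 97 per group
print(n_per_group(0.60, 0.50))  # 10-point difference -> 388 per group
```

The quadrupling is exactly the "larger microscope" effect: the required n scales with the inverse square of the difference to be detected.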

Type I and Type II errors

| Error type | Description | Common cause |
| --- | --- | --- |
| Type I (false positive) | Concluding a positive association when none exists | Multiple comparisons / “p‑hacking” (searching for a p‑value < 0.05) |
| Type II (false negative) | Failing to detect a real association | Sample size that is too small |
  • Both errors can coexist in a single study.
  • Which error is more dangerous depends on context (e.g., in drug development, a Type I error may lead to an ineffective drug being marketed, whereas a Type II error may withhold a beneficial drug). The decision must be made case‑by‑case.

Sample size, validity, and precision

  • Validity (internal and external) is determined by sampling procedures, measurement bias, and study design, not by sample size.
  • Precision refers to the width of confidence intervals; larger samples produce narrower intervals (greater precision).
  • A study can be precise but not valid (narrow confidence interval around a biased estimate) or valid but not precise (wide confidence interval around an unbiased estimate).
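The precision half of this distinction is easy to see numerically. A minimal sketch using the normal‑approximation confidence interval for a proportion (the function name is illustrative):

```python
import math

def ci_halfwidth(p, n, z=1.959964):
    """Half-width of a 95% confidence interval for a proportion
    (normal approximation): z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

# Quadrupling n halves the interval width (precision improves),
# but a biased sampling scheme stays biased at any n (validity does not).
for n in (100, 400, 1600):
    print(n, round(ci_halfwidth(0.30, n), 3))
```

Note that the interval narrows around whatever the estimate is; if the estimate is biased, a large sample just gives a precise wrong answer.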

Finite‑population correction (FPC)

  • Standard sample‑size formulas for surveys assume the sample is a negligibly small fraction of the population (an effectively infinite population).
  • When the sample exceeds about 5 % of the population, the FPC must be applied to adjust the required size.
  • For most large populations (city, state, country, world) the correction is negligible; it matters only for relatively small populations.
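The correction itself is a one‑line adjustment. A sketch, using n₀ ≈ 384 (the familiar infinite‑population size for a ±5 % margin at 95 % confidence with 50 % prevalence):

```python
import math

def fpc_adjust(n0, N):
    """Apply the finite-population correction to an infinite-population
    sample size n0 for a population of size N: n = n0 / (1 + (n0 - 1)/N)."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

n0 = 384
print(fpc_adjust(n0, 1_000_000))  # huge population: essentially unchanged (384)
print(fpc_adjust(n0, 2_000))      # small population: noticeably reduced (323)
```

For a city or country the denominator is ≈ 1 and the correction vanishes, matching the rule of thumb that the FPC only matters when the sample exceeds roughly 5 % of the population.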

Multi‑arm trials and multiplicity

  • With k groups, the number of pairwise comparisons is k!/(2!(k − 2)!) = k(k − 1)/2 (e.g., 3 groups → 3 comparisons; 4 groups → 6 comparisons).
  • To control the overall Type I error rate, the Bonferroni adjustment divides the nominal α (e.g., 0.05) by the number of comparisons, yielding a more stringent significance threshold and a larger required sample size.
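Both steps fit in a few lines (a sketch; the helper name is illustrative):

```python
from math import comb

def bonferroni(k, alpha=0.05):
    """Return the number of pairwise comparisons among k arms
    and the Bonferroni-adjusted significance threshold."""
    m = comb(k, 2)        # k! / (2!(k-2)!) = k(k-1)/2
    return m, alpha / m

for k in (3, 4):
    m, a = bonferroni(k)
    print(f"{k} arms: {m} comparisons, adjusted alpha = {a:.4f}")
```

The smaller adjusted α feeds back into the sample‑size formula as a larger critical z‑value, which is why multi‑arm trials need more participants per arm.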

Sample‑size inputs for comparative trials

| Parameter | Typical input | Notes |
| --- | --- | --- |
| Control‑group prevalence | e.g., 60 % | Used as baseline. |
| Effect size | Absolute prevalence difference (e.g., 20 %) or odds ratio (e.g., 1.5) | Choose the metric that matches the planned analysis. |
| Number of arms | 2, 3, … | Determines multiplicity adjustments. |
| Power | 80–90 % (commonly 90 % for drug trials) | Higher power → larger sample. |
| Response/compliance rate | e.g., 60 % | Inflates the sample to offset anticipated non‑response or loss to follow‑up. |
| Multiple‑comparison adjustment | Yes/No | Enables Bonferroni correction. |
  • Example: control prevalence = 60 %, desired reduction to 40 % (20 % absolute difference), 4 arms, 90 % power, 60 % response rate → ≈ 583 participants per arm (total ≈ 2 332).
  • Using an odds‑ratio of 1.5 for the same scenario reduces the required per‑arm size to ≈ 37 (total ≈ 148).
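A rough sketch of how these inputs combine, using the normal approximation with a Bonferroni‑adjusted α and response‑rate inflation. Dedicated calculators (such as the one used in the video) apply their own conventions, so this will not reproduce the worked numbers above exactly:

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.90, n_arms=2, response_rate=1.0):
    """Per-arm sample size for comparing two proportions, with Bonferroni
    adjustment for multiple arms and inflation for expected non-response.
    Normal-approximation sketch only; software may differ."""
    m = math.comb(n_arms, 2)                       # pairwise comparisons
    z_a = NormalDist().inv_cdf(1 - alpha / m / 2)  # adjusted two-sided alpha
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_a * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return math.ceil(n / response_rate)            # inflate for non-response

# 60% vs 40%, 90% power, 4 arms, 60% response rate:
print(n_per_arm(0.60, 0.40, n_arms=4, response_rate=0.60))
```

Each refinement (more arms, higher power, lower response rate) pushes the per‑arm size upward, which is the qualitative pattern the worked example illustrates.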

Choosing the appropriate effect metric

  • Prevalence (or risk) difference is preferred when the absolute rates in each arm are known, because it removes ambiguity about direction.
  • Odds ratio is appropriate for case‑control studies where logistic regression will be used.
  • Different metrics yield different sample‑size estimates; the choice must align with the planned statistical model.

Block randomization and arm balance

  • Simple random allocation can produce unequal arm sizes, reducing power because power is driven by the smallest arm.
  • Block randomization forces roughly equal numbers in each arm, preserving power.
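Permuted‑block randomization is simple to sketch (arm labels and block size here are illustrative):

```python
import random

def block_randomize(n_blocks, arms=("A", "B"), block_size=4, seed=42):
    """Permuted-block randomization: every block contains equal numbers of
    each arm, so arm sizes can never drift more than half a block apart."""
    per_arm = block_size // len(arms)
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # e.g., ["A", "B", "A", "B"]
        rng.shuffle(block)            # random order within the block
        schedule.extend(block)
    return schedule

seq = block_randomize(5)
print(seq.count("A"), seq.count("B"))  # always equal: 10 and 10
```

With simple (unblocked) randomization of 20 participants, a 13–7 split is entirely possible, and power would be driven by the 7‑person arm.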

Cluster‑randomized trials and design effect

  • In cluster designs, participants within the same cluster are more alike (intra‑cluster correlation, ρ).
  • The design effect (DE) inflates the required sample size:

    DE = 1 + ρ(m − 1)

    where m = the average cluster size.
  • A DE of 1.5 → increase the total sample size by 50 %; DE = 2 → double it.
  • When the exact ρ is unknown, analysts often explore a range of plausible values (e.g., DE ≈ 1.5–4) in a sensitivity table.
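The inflation is mechanical once ρ and m are chosen; a small sensitivity sketch (the n = 400 baseline and ρ values are illustrative):

```python
import math

def design_effect(rho, m):
    """Design effect for a cluster-randomized design:
    DE = 1 + rho * (m - 1), where m is the average cluster size."""
    return 1 + rho * (m - 1)

n_individual = 400  # size from an individually randomized calculation
for rho in (0.01, 0.05, 0.10):
    de = design_effect(rho, m=21)
    print(f"rho={rho}: DE={de:.2f}, inflated n={math.ceil(n_individual * de)}")
```

Even a modest ρ of 0.05 with 21 participants per cluster gives DE = 2, doubling the required total, which is why ignoring clustering badly underpowers a trial.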

Non‑inferiority, superiority, and equivalence trials

| Trial type | Null hypothesis (H₀) | Alternative hypothesis (H₁) | Directionality |
| --- | --- | --- | --- |
| Superiority | No difference between treatments | New treatment better than control | One‑sided (often) |
| Non‑inferiority | New treatment worse than control by the margin Δ or more | New treatment worse by less than Δ (i.e., not unacceptably worse) | One‑sided (in the non‑inferior direction) |
| Equivalence | True difference exceeds ±Δ | True difference lies within ±Δ | Two‑sided |
  • In non‑inferiority trials, setting Δ (the non‑inferiority margin) is critical; it may be based on regulatory guidance, expert consensus, or the lower bound of a confidence interval from prior studies.
  • Sample‑size calculations for non‑inferiority trials use the same machinery as superiority trials but treat the test as one‑sided in the non‑inferior direction; an underpowered trial risks a Type II error, i.e., failing to demonstrate non‑inferiority of a treatment that truly is non‑inferior.
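Under the common simplifying assumption that both treatments truly have the same success proportion, the per‑arm size for a non‑inferiority comparison of proportions can be sketched as follows (one‑sided α = 0.025; the function name and defaults are illustrative, not from the video):

```python
import math
from statistics import NormalDist

def n_noninferiority(p, margin, alpha=0.025, power=0.90):
    """Per-arm sample size to show a new treatment's success proportion
    is not worse than control's (both assumed equal to p) by more than
    `margin`, using a one-sided normal-approximation test."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_b = NormalDist().inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / margin ** 2)

# 80% success in both arms, 10-point non-inferiority margin:
print(n_noninferiority(p=0.80, margin=0.10))
```

Shrinking the margin Δ has the same effect as shrinking the effect size in a superiority trial: the required n grows with 1/Δ².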

Identifying study designs and common mislabelings

  • Descriptive studies report characteristics without group comparisons.
  • Analytic studies compare groups (case‑control, cohort, randomized trial).
  • Mislabeling (e.g., “interventional case‑control” or “prospective case‑control”) is common; case‑control studies are always observational because exposure is not assigned.
  • Correct identification requires understanding the temporal relationship between exposure and outcome and whether the investigator manipulates exposure.

Sample‑size calculation for diagnostic‑test accuracy

  • Need to estimate sensitivity and specificity separately.
  • Inputs: disease prevalence, expected sensitivity, expected specificity, desired margin of error (e.g., ±5 %), confidence level (e.g., 95 %).
  • Total sample size = n₁ (for sensitivity) + n₂ (for specificity).
  • Example: prevalence = 10 %, sensitivity = 90 %, specificity = 85 %, margin = 5 % → total ≈ 161 participants.
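A widely used version of this calculation is Buderer's method, which sizes the study so that sensitivity and specificity are each estimated within the chosen margin. This sketch illustrates the logic only; tools differ in conventions (e.g., whether they report totals or only the diseased/non‑diseased subgroup sizes), so it will not necessarily reproduce the worked total above:

```python
import math

def buderer_n(prevalence, sens, spec, margin=0.05, z=1.96):
    """Buderer-style sample sizes for diagnostic accuracy: total n needed so
    sensitivity (estimated among the diseased) and specificity (among the
    non-diseased) are each within +/- margin at ~95% confidence."""
    n_for_sens = z**2 * sens * (1 - sens) / margin**2 / prevalence
    n_for_spec = z**2 * spec * (1 - spec) / margin**2 / (1 - prevalence)
    return math.ceil(n_for_sens), math.ceil(n_for_spec)

n1, n2 = buderer_n(prevalence=0.10, sens=0.90, spec=0.85)
print(n1, n2, max(n1, n2))  # recruit enough to satisfy both targets
```

Note how a low prevalence dominates: very few recruits are diseased, so the sensitivity target, not the specificity target, usually drives the total.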

Planning studies: protocol and statistical analysis plan (SAP)

  • A protocol describes the study objectives, design, population, and data‑collection methods.
  • A statistical analysis plan details how data will be analyzed (e.g., which tests, handling of missing data, subgroup analyses) and must be finalized before data collection begins to prevent “moving the goalposts.”
  • Together, the protocol and SAP ensure that the sample‑size calculation aligns with the intended analyses and that the study remains methodologically sound.

Key take‑aways

  1. Sample‑size calculations are essential for any study that aims to estimate parameters with a given precision or to detect a prespecified effect.
  2. The required size grows as the effect size shrinks, as the number of comparisons increases, and as intra‑cluster correlation inflates variance.
  3. Type I errors are driven mainly by multiple testing; Type II errors stem from insufficient power. Context determines which error is more consequential.
  4. Larger samples improve precision but do not guarantee validity; proper design, sampling, and measurement are equally important.
  5. For multi‑arm, cluster, and non‑inferiority trials, specialized adjustments (Bonferroni correction, design effect, one‑sided testing against a non‑inferiority margin) must be incorporated into the sample‑size formula.

These principles provide a systematic framework for determining how many participants are needed across the wide variety of epidemiologic and clinical‑trial designs discussed.
