Understanding the Gamma Distribution: Definition, Applications, Intuition, and Mean Derivation

Name: 39 - The gamma distribution - an introduction
Uploaded: 2026-02-14T13:10:53.568670+00:00
Channel: Ox educ
Description: Summary and key takeaways on Understanding the Gamma Distribution: Definition, Applications, Intuition, and Mean Derivation, covering What Is the Gamma

YouTube video ID: J0Yzmb_PY3Y

Source: YouTube video by Ox educ

What Is the Gamma Distribution?

A continuous probability distribution defined for (Y \ge 0).
Parameters: (\alpha>0) (shape) and (\beta>0) (rate).
Probability density function (PDF): [ f_{Y}(y\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,y^{\alpha-1}e^{-\beta y},\qquad y\ge0 ]
(\Gamma(\alpha)) is the gamma function, the continuous analogue of the factorial: (\Gamma(n)=(n-1)!) for positive integers (n).
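
As a quick sanity check on the density above, here is a minimal Python sketch that uses only the standard library. The names `gamma_pdf` and `integrate` are illustrative choices rather than anything from the video, and the midpoint rule with a cut-off at y = 40 is an assumption that is adequate for this one example. The check confirms that the formula integrates to roughly one.

```python
import math


def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density in the shape/rate form used above."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if y < 0:
        return 0.0
    if y == 0:
        # y**(alpha - 1) at zero: 0 for alpha > 1, 1 for alpha == 1,
        # and unbounded for alpha < 1.
        if alpha > 1:
            return 0.0
        return beta if alpha == 1 else math.inf
    # Work on the log scale so beta**alpha and Gamma(alpha) cannot overflow.
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1.0) * math.log(y) - beta * y)
    return math.exp(log_pdf)


def integrate(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Illustrative parameters (alpha = 3, beta = 2); the tail beyond 40 is negligible.
total = integrate(lambda y: gamma_pdf(y, 3.0, 2.0), 0.0, 40.0)
print(total)  # should be very close to 1
```

Working with `math.lgamma` instead of `math.gamma` is a design choice that keeps the function usable for large shape values, where the gamma function itself would overflow a float.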

Why Use the Gamma Distribution in Bayesian Inference?

Modeling non‑negative quantities: It is ideal for variables that represent rates, waiting times, or any quantity that cannot be negative.
Prior for Poisson rate ((\lambda)): When the data follow a Poisson distribution, a Gamma prior on (\lambda) is conjugate, leading to analytically tractable posteriors.
Prior for precision ((\tau = 1/\sigma^2)): In normal‑likelihood models, a Gamma prior on the precision is conjugate, which keeps the posterior in closed form, and its support matches because precision cannot be negative.
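
The two conjugate updates mentioned above can be sketched in a few lines of Python. The update rules used here are the standard textbook results and are not derived in the video: for Poisson counts the posterior is Gamma(α + Σx, β + n), and for normal data with a known mean μ the posterior for the precision is Gamma(α + n/2, β + ½Σ(x − μ)²). The function names, prior values, and data are all made up for illustration.

```python
def poisson_rate_update(alpha: float, beta: float, counts: list[int]) -> tuple[float, float]:
    """Posterior (shape, rate) for a Poisson rate under a Gamma(alpha, beta) prior."""
    return alpha + sum(counts), beta + len(counts)


def normal_precision_update(alpha: float, beta: float,
                            xs: list[float], mu: float) -> tuple[float, float]:
    """Posterior (shape, rate) for a normal precision, assuming the mean mu is known."""
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return alpha + len(xs) / 2.0, beta + sum_sq / 2.0


# Hypothetical prior and made-up observations, for illustration only.
lam_post = poisson_rate_update(2.0, 1.0, [3, 5, 4, 6, 2])
tau_post = normal_precision_update(2.0, 1.0, [1.0, -1.0, 2.0, 0.0], mu=0.0)

print(lam_post, lam_post[0] / lam_post[1])  # posterior parameters and posterior mean
print(tau_post, tau_post[0] / tau_post[1])
```

In both cases the posterior stays in the Gamma family, so the posterior mean is again shape divided by rate.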

Intuition Behind the Two Parameters

Shape Parameter (\alpha)

(\alpha = 1): The PDF reduces to the exponential density (\beta e^{-\beta y}); it peaks at zero and decays at rate (\beta).
Increasing (\alpha) adds a polynomial factor (y^{\alpha-1}) that initially pushes the density away from zero, creating a hump. As (\alpha) grows, the peak moves rightward and the distribution becomes more symmetric.
Examples:
(\alpha=2): PDF proportional to (y e^{-\beta y}) – starts at zero, rises to a peak at (y=1/\beta), then decays.
(\alpha=3): PDF proportional to (y^{2} e^{-\beta y}) – the peak comes later, at (y=2/\beta), and is lower and broader.
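
The location of the hump can be made concrete. Setting the derivative of the log density to zero, (\alpha-1)/y - \beta = 0, gives a peak at (y=(\alpha-1)/\beta) whenever (\alpha \ge 1). The video does not state this formula, so the sketch below cross-checks it against a brute-force grid search; the helper names `log_kernel`, `gamma_mode`, and `grid_argmax` are hypothetical.

```python
import math


def log_kernel(y: float, alpha: float, beta: float) -> float:
    """Log of y**(alpha - 1) * exp(-beta * y), the y-dependent part of the PDF."""
    return (alpha - 1.0) * math.log(y) - beta * y


def gamma_mode(alpha: float, beta: float) -> float:
    """Closed-form peak location, valid for alpha >= 1."""
    if alpha < 1.0:
        raise ValueError("for alpha < 1 the density is unbounded at zero")
    return (alpha - 1.0) / beta


def grid_argmax(alpha: float, beta: float, upper: float = 20.0, steps: int = 20_000) -> float:
    """Grid point in (0, upper] where the kernel is largest."""
    grid = [upper * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda y: log_kernel(y, alpha, beta))


for alpha in (1.0, 2.0, 3.0):
    # For alpha = 1 the grid search returns the smallest grid point,
    # which is as close to the true peak at zero as the grid allows.
    print(alpha, gamma_mode(alpha, 1.0), grid_argmax(alpha, 1.0))
```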

Rate Parameter (\beta)

Controls the scale: (1/\beta) is the scale parameter, so the mean (\alpha/\beta) shrinks as (\beta) grows. Larger (\beta) makes the distribution taller and narrower (sharper peak) because:
The factor (\beta^{\alpha}) raises the overall height.
The exponential term (e^{-\beta y}) forces a faster decay.
Visual effect: With (\alpha) fixed, raising (\beta) squeezes the distribution toward zero while increasing its maximum.
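
One way to see the "taller and narrower" effect numerically: for fixed (\alpha), the density satisfies f(y | α, β) = β · f(βy | α, 1), so doubling (\beta) halves the horizontal scale and doubles the height. The short sketch below checks this at the peak for (\alpha = 3). It assumes the peak sits at (α − 1)/β, a standard result not stated in the video, and `gamma_pdf` is an illustrative helper name.

```python
import math


def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density for y > 0 (shape/rate form)."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(y) - beta * y)


alpha = 3.0
peaks = {}
for beta in (1.0, 2.0, 4.0):
    mode = (alpha - 1.0) / beta          # assumed peak location for alpha >= 1
    peaks[beta] = gamma_pdf(mode, alpha, beta)
    print(f"beta={beta}: peak at y={mode:.2f}, height={peaks[beta]:.4f}")
```

The printed heights grow in proportion to (\beta) while the peak location moves toward zero, matching the squeeze described above.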

Visual Exploration (Conceptual)

Alpha = 1, Beta = 1 → Simple exponential curve.
Alpha = 2, Beta = 1 → Hump appears; peak moves right of zero.
Alpha = 3, Beta = 1 → Hump shifts further right and becomes lower and broader.
Alpha = 3, Beta = 2 → Same shape but taller and sharper; the tail drops off more quickly.
Computational tools (e.g., MATLAB) can plot these families to see how the PDF morphs with parameter changes.
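
The video uses MATLAB for these plots; its code is not shown, so the following is only a rough Python stand-in that prints the four parameter combinations as a table of density values instead of drawing them. The grid of y values and the helper name `gamma_pdf` are arbitrary choices for illustration.

```python
import math


def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density for y > 0 (shape/rate form)."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(y) - beta * y)


# The four (alpha, beta) pairs listed above.
cases = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (3.0, 2.0)]
ys = [0.5 * k for k in range(1, 13)]     # 0.5, 1.0, ..., 6.0

table = {case: [gamma_pdf(y, *case) for y in ys] for case in cases}

print("y     " + "  ".join(f"a={a:g},b={b:g}" for a, b in cases))
for i, y in enumerate(ys):
    print(f"{y:<5.1f} " + "  ".join(f"{table[c][i]:7.4f}" for c in cases))
```

Reading down each column shows the same pattern as the plots: pure decay for Alpha = 1, a hump that moves right as Alpha grows, and a taller hump with a faster-dropping tail when Beta is raised to 2.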

Deriving the Mean of a Gamma Distribution

Start with the expectation definition: [ \mathbb{E}[Y]=\int_{0}^{\infty} y\,f_{Y}(y\mid\alpha,\beta)\,dy ]
Insert the PDF and pull out constants: [ \mathbb{E}[Y]=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\int_{0}^{\infty} y^{\alpha}e^{-\beta y}\,dy ]
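Recognize the integrand as an unnormalized Gamma density with shape (\alpha+1) and rate (\beta). Because that density integrates to one, the integral equals the reciprocal of its normalizing constant: [ \int_{0}^{\infty} y^{\alpha}e^{-\beta y}\,dy = \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}} ]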
Combine constants: [ \mathbb{E}[Y]=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}} = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\cdot\frac{1}{\beta} ]
Use the property (\Gamma(\alpha+1)=\alpha\Gamma(\alpha)) to simplify: [ \mathbb{E}[Y]=\frac{\alpha}{\beta} ]
The result holds for any positive (\alpha), integer or not.
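
The result (\alpha/\beta) can be checked numerically, including for a non-integer shape. The sketch below approximates the expectation integral with a midpoint rule; the cut-off at (α + 50)/β is a crude assumption that is adequate for the small shape values used here, and the function names and example parameters are illustrative only.

```python
import math


def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density for y > 0 (shape/rate form)."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(y) - beta * y)


def numerical_mean(alpha: float, beta: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of y * f(y) over (0, upper)."""
    upper = (alpha + 50.0) / beta        # assumed cut-off; tail beyond is negligible here
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y * gamma_pdf(y, alpha, beta)
    return total * h


# Two integer shapes and one non-integer shape, chosen arbitrarily.
examples = [(1.0, 1.0), (3.0, 2.0), (2.5, 0.5)]
for alpha, beta in examples:
    print(alpha, beta, numerical_mean(alpha, beta), alpha / beta)
```

The last two columns should agree to several decimal places in each row.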

Key Takeaways

The Gamma distribution is a flexible tool for modeling positive continuous data and serves as a conjugate prior for Poisson rates and precision parameters.
Shape (\alpha) determines the location and existence of a hump; rate (\beta) controls scale, making the distribution taller and narrower as it increases.
The mean of a Gamma((\alpha,\beta)) distribution is simply (\alpha/\beta), derived via a neat trick: the integrand is an unnormalized Gamma((\alpha+1,\beta)) density, so the integral follows from its normalizing constant.

The Gamma distribution’s simple yet powerful form—characterized by shape and rate—makes it indispensable for Bayesian modeling of non‑negative quantities, and its mean is elegantly given by the ratio of its parameters, (\alpha/\beta).

Full Transcript

in this video I want to provide an
introduction to the gamma distribution
so what we're going to talk about here
is we're first of all going to Define
what is exactly meant by the gamma
distribution then importantly what we're
going to do is we're going to talk about
why might we use the gamma distribution
and we're going to explain some sort of
the
examples which sort of might be
appropriate to model using a gamma
distribution particularly looking at
sort of Bayesian inference here then what
we're going to do is we're going to go
ahead and we're going to look at some
sort of
intuition as to how the gamma
distribution Works in terms of its two
parameters so we're actually going to
look at graphs of the PDF in this
particular example and we're going to
see how those graphs or those PDFs vary
as we vary the
parameters finally we're going to go
ahead and we're going to derive the mean
of a gamma distribution which is
actually fairly simple to do if you use
one small trick so starting off with
defining the gamma distribution we say
that Y is distributed as a gamma random
variable with parameters Alpha and beta
if it has a corresponding PDF which is
the probability of Y given Alpha and
beta as being equal to beta to the power
alpha divided through by gamma of alpha
so this is now a sort of gamma
function here which is the continuous
analog of the factorial function times
y to the power alpha minus 1 times e to the power minus beta times y and
firstly we should say that this
distribution quite obviously is a
continuous
distribution and secondly we should say
that it's only defined for y being
greater than or equal to zero also we
assume that the parameters Alpha and
beta are both bigger than zero okay so
that's the mathematical definition of a
gamma distribution let's now talk about
why we might use a gamma distribution
particularly in reference to Bayesian
inference so let's think about some
examples of sort of parameters or
situations which we might model using a
gamma distribution well seeing as Y is
always greater than or equal to zero we
might model y as some sort of mean
measure of a count
variable and we're saying here a sort of
a mean measure because remember that
because it's a mean it can actually take
on a non- integer value so particularly
I'm thinking here of when you might use
a gamma prior as a prior for the
parameter lambda of a Poisson distribution
so here we would say as our
prior Lambda is gamma distributed with
parameters Alpha and beta and just to be
absolutely clear it's appropriate to
model Lambda as a gamma distribution
because the gamma distribution is only
defined in this circumstance for Lambda
being greater than or equal to zero so
that's okay that's exactly what we would
expect a sort of mean count variable to
have in terms of its characteristics
another type of situation whereby we
might use a gamma distribution in Bayesian
inference is if we are specifying a
prior for a Precision
parameter so Precision is equal to one
over the
variance and often we think in terms of
precision rather than in variances and it
turns out that in this circumstance the
gamma distribution is a relatively good
distribution to use because of its
conjugate properties for modeling the
Precision
and the Precision again is a parameter
which is always greater than or equal to
zero so the fact that our distribution
is only defined for positive values
makes sense in terms of a prior for
precision okay so let's now talk about
some of the intuition behind the graphs
of the PDF as we vary the parameters
Alpha and beta and let's start off by
considering the example where Alpha is
actually equal to one in that
circumstance we can write down the
formula for the gamma distribution so
now we've got the probability of Y given
that Alpha is equal to one and given
beta is just in this circumstance what
I'm going to do is I'm going to forget
about these sort of normalizing
constants here because these constants
here aren't a function of the variable
they're not a function of Y so I'm just
going to forget about them at first and
when Alpha is equal to one this first
part here y to the power alpha minus one is
actually y ^ 0 so that's just one and
that disappears and our gamma
distribution just sort of goes as e to
the power minus beta * y so if we were
to draw this distribution now we could
imagine that it's just basically going
to be if we were to sort of draw the PDF
it's just going to be an exponential
decay so it's just going to sort of Peak
at zero and it's just going to Decay
exponentially with a rate of sort of
beta here so this is the circumstance
where Alpha is equal to 1 let's now
think about what would happen if we
increased Alpha to being equal to two
now we've got that the probability
density the probability of Y given Alpha
is equal to two and given beta now goes
as well now we've just got y to the
power Alpha is 2 so Alpha minus 1 is 1
so it's just y time e to the power minus
beta Y and now we can see that basically
as y increases there is a differential
effect of both of these terms as y
increases this first term acts to
increase so that acts to increase
the PDF whereas as y increases this
second term here actually
decreases and because generally
exponentials beat sort of powers we know
that this sort of second term is going
to dominate when Y is quite big and
perhaps this first term will dominate
when Y is small and what that actually
means is that our PDF for the case of
alpha being equal to 2 is going to look
something like this and initially what
happens is y is dominating and then the
beta Y starts to take over and then
there is this sort of exponential decay
afterwards which corresponds to the
second term really dominating and taking
it towards zero as we go to Infinity so
this is the circumstance when Alpha is
equal to two and we can imagine if we
were to change this to Alpha being equal
to 3 what we would then have is we would
have a y^2 here and you can imagine that
y^2 is going to increase faster than y
and eventually it's going to lose out to
the second term but at least in the
short run it's going to do better than
the alpha equals 2 case so what we can do
is we can draw the PDF for this sort of
third example here and it might look
something like this blue line which I'm
drawing here perhaps a bit smoother than
that which I've drawn here but the point
is that it sort of Peaks uh at some
point which is after the alpha equals 2
Mark now what we can do is we can think
about the effect of changing beta in our
distribution and for the effect of
changing beta what I'm going to do is
I'm going to think about the probability
of Y given sort of Alpha and beta as
being given by something which is beta
to the power alpha times y to the power alpha minus 1 times e to the power minus
beta Y and actually what I'm going to do
is I'm going to forget about this sort
of first or this sort of middle term
here because I'm keeping Alpha the same
so let's just forget about that term and
all we've got here is a sort of beta to
the power alpha times e to the power minus beta times
y so here you can see the effect of
increasing beta as you increase beta
there is sort of two effects one of the
effects is via this first term and this
sort of governs the height which our
distribution sort of reaches and as you
increase beta that's obviously going to
increase the height of our PDF but the
second term here means that as I
increase beta the sort of rate at which
we Decay towards zero is that much
faster so this actually acts to sort of
decrease the distribution particularly
after we have reached its maximum and so
what it's actually going to do is it's
going to make the distribution that much
sharper so as I increase beta it's going
to become taller and it's going to
become sharper so you can imagine for
the case of when sort of these were all
sort of drawn for the case of when beta
was equal to one say as I increase beta
you can imagine that the alpha equals
three case will increase upwards and
perhaps what will happen is it will look
something more like this line which I'm
drawing here so this might be the alpha
sorry the beta equals 2 case when Alpha
is equal to three and as I said beta is
now equal to two so now the
distributions become higher but the rate
of it sort of declining towards zero is
that much faster it doesn't take long
now for the distribution to get very
very close to zero as I increase y so
the distribution's actually become
taller and it's become sharper but you
don't need to take my word for this so
what I've actually gone ahead and done
is I've coded this up in MATLAB and
so now what we're doing is we're
starting off with the circumstance of
when Alpha and beta are both equal to
one and we know from our sort of
preceding analysis that the gamma
distribution should just look like an
exponential decline for that
circumstance so if I run this we see
exactly that we see that the gamma
distribution in this circumstance is
just an exponential decline as I
increase Alpha to two we should imagine
that we're going to start to see this
sort of Hump shape because now we've got
these sort of two contrasting effects
that are sort of fighting against
each other as y increases so if
I increase that to two we see that the
distribution now has this sort of Hump
structure and as I increase Alpha more
towards three we're going to see a
shifting out of this hump more towards
sort of center of our graph as we see it
here so if we run this it's moving out
that bit more if I increase Alpha that
much more up to let's say six it's going
to increase a lot so it's going to sort
of the point at which it reaches a
maximum is going to be that much further
to the right
and it's going to take that much longer
for the exponential component of it to
take the distribution towards zero now
if we start off with a sort of case of
alpha being equal to three and beta
being equal to one and then we start to
vary beta we start off with this
distribution when beta is equal to one
as I increase beta to two remember what
we expect to see we expect that the
distribution is going to become taller
but it's going to become that much
sharper as well so the distribution is
going to go to zero after the maximum
that much quicker
so if we run this we see exactly that
the distribution is both taller and it
is taking that much less time to go
towards zero as I increase y as I
increase beta that much more we can see
here that we're going to sort of get a
little bit higher and the distribution
is going to become sharper still so you
can see here quite easily the effect of
changing Alpha and beta now what I want
to do is I want to work out the mean of
a gamma distribution and to do so I'm
going to now move to a new clean
canvas so I'm going to start out by
writing out the PDF for this example so
just so that we have it at the top the
probability of Y given Alpha and beta is
just equal to beta to the power alpha all
divided through by the gamma function of
alpha times y to the power alpha minus 1 times e to the power minus beta
times y and remember that this distribution
is only defined for y being greater than
or equal to zero so what we're trying to
do here is we're trying to work out the
mean of a gamma distribution so
essentially what we're trying to work
out here is we're trying to work out the
expectation of Y and we know that this
in the continuous PDF example is just
found by integrating over all values of
y that are allowed so here from nought to
infinity of y times this PDF so that's
just y times beta to the power alpha divided through by
gamma of alpha times y to the power alpha minus 1 times e to the power minus
beta times y integrated over y then what I'm
going to do is I'm going to take out
actually this sort of constant here and
that's just going to have a factor of
beta to the power alpha over gamma of alpha
outside the front of this integral
because it doesn't contain any y terms
so now I'm integrating from nought to
infinity well I've got y times y to the power alpha
minus 1 so y to the 1 times y to the alpha minus 1
which is just going to give me a y to the power
alpha times the exponential of minus beta y
integrated over choice of y and what I
could do is I could go ahead and I could
do some sort of integration by parts
iteratively and that would allow me to
work out this rather difficult integral
but it turns out you don't actually need
to do this because there is a bit of a
trick that we can use here
essentially apart from the normalizing
constant this term inside the sort of
integral is a gamma sort of distribution
with a parameter Alpha + one and its
second parameter still equal to Beta the
only difference as I say is the fact
that we haven't got this sort of
normalizing sort of constant out in
front of it but what we can actually do
is we can sort of write that normalizing
constant in front of it within the
integral so then what we have is beta
to the power alpha divided through by gamma of
alpha then I'm going to leave a bit of
space times the integral from 0 to
infinity of now what I'm
going to put here is beta to the power alpha
plus 1 divided through by the gamma
function of alpha plus 1 times y to the power alpha times
e to the power minus beta y integrated over choice
of Y but obviously I've just introduced
this constant term here so I have to
divide through that which is equivalent
to multiplying through by one over that
so that's just multiplied through by
gamma of alpha plus 1 divided through by
beta to the power alpha plus 1 and then we've
got this equality now holding so why
have I actually done that well the
reason I've done that is because now
this term that I'm sort of underlining
this integral now is just an integral
over a gamma density and we know that a
gamma density is a probability
distribution so this integral has to
equal one so we can just forget about
this whole sort of integral term here
and we're just left with this sort of
term that we have on the left here just
involving these constants and we can
simplify these readily enough remember
that the gamma function of a sort of
parameter n is actually the continuous
analog of the factorial function and
actually if we just confine ourselves to
the circumstance of when we're talking
about sort of integers here the gamma
function of n is equal to n minus 1 factorial
so we can actually use the
simplification when we're just talking
about Alpha being an integer so when we
sort of consider Alpha to be an integer
we can rewrite this whole thing as beta
to the power alpha divided through by beta to the power
alpha plus 1 on the top it just becomes
Alpha factorial and on the bottom we
have Alpha minus one factorial
then what we can do is we can simplify
the first part which is the part with
the beta in it and we've just got a beta
to the power Alpha divided through by
the beta to the power alpha times beta so this
whole sort of first term here just
becomes one over beta and this second
term is actually quite easy to simplify
as well when we consider the fact that
Alpha factorial is just Alpha times you
know alpha minus one etc and alpha minus 1
factorial is just equal to alpha minus 1
times you know alpha minus 2 etc so
they're both exactly the same after this
first alpha so the whole of this sort of
second part is going to cancel between
the two of these things here and we're
just going to be left with Alpha
factorial over Alpha minus one factorial
just yielding Alpha and hence we get
that the mean of this distribution is
just given by Alpha over Beta And as it
turns out this actually holds for non-
integer Alpha as well
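
Written out, the trick described in the transcript (multiplying and dividing by the Gamma(α + 1, β) normalizing constant so that the remaining integral is over a full density) looks like the following. This is a reconstruction from the spoken steps rather than a copy of the on-screen working, and it uses the identity Γ(α + 1) = αΓ(α) so that no integer assumption on α is needed.

```latex
\begin{align*}
\mathbb{E}[Y]
  &= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} y^{\alpha} e^{-\beta y}\,dy \\
  &= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}
     \underbrace{\int_{0}^{\infty} \frac{\beta^{\alpha+1}}{\Gamma(\alpha+1)}\,
       y^{(\alpha+1)-1} e^{-\beta y}\,dy}_{=\,1\ \text{(a Gamma}(\alpha+1,\,\beta)\text{ density)}} \\
  &= \frac{1}{\beta} \cdot \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}
   = \frac{1}{\beta} \cdot \frac{\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)}
   = \frac{\alpha}{\beta}.
\end{align*}
```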
