90% of the T Distribution

Recorded: May 30, 2026, 11:01 p.m.

Entropic Thoughts

by kqr, scheduled 2026-05-26
Tags: forecasting, statistics

William Sealy Gosset was great. He improved beer at Guinness by using the
statistics that existed at the time. Not happy with that, he invented new
statistics to brew even better beer. The things he invented are used all over
the place now, but Guinness wanted to keep him a secret weapon, so they made him
publish his results under the fake name Student.

One thing Gosset realised is that it is wrong to compute 90 % confidence
intervals for the mean by taking the standard deviation of the sample and
assuming a normal distribution, like-a-so:

\[\hat{\mu} \pm 1.645 \hat{\sigma}\]

When we do this we get too narrow a range, because while we recognise
\(\hat{\mu}\) is just an approximation, we are assuming we know \(\sigma =
\hat{\sigma}\) with certainty!
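
To see the narrowness in numbers, here is a small simulation sketch. It is not from Gosset or the original post. It assumes that \(\hat{\sigma}\) stands for the standard error of the mean, i.e. the sample standard deviation divided by \(\sqrt{n}\), and that the data really are normal. The function name `naive_coverage` and its parameters are made up for illustration.

```python
import math
import random

Z90 = 1.645  # normal-distribution multiplier for a two-sided 90 % interval


def naive_coverage(n, trials=20_000, seed=1):
    """Fraction of naive 90 % intervals that contain the true mean.

    Assumption: sigma-hat is the standard error of the mean, s / sqrt(n),
    computed from n draws of a standard normal (true mean 0).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        # Sample standard deviation with the usual n - 1 denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        half_width = Z90 * s / math.sqrt(n)
        if abs(mean) <= half_width:  # did the interval catch the true mean?
            hits += 1
    return hits / trials


for n in (3, 4, 7, 20):
    print(n, round(naive_coverage(n), 3))
```

Under these assumptions the printed coverage should sit noticeably below 0.90 for the smallest sample sizes and creep toward it as the number of samples grows.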

Gosset came up with correction tables based on the number of samples used in the
estimation of the confidence interval, to account for our uncertainty in the
estimation of \(\hat{\sigma}\). Here are some useful values, rounded to be easier
to memorise:

Number of samples    Correction factor for 90 % interval
2
3
4                    1.5×
5                    1.3×
6–8                  1.2×
9–20                 1.1×
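
For anyone who wants the unrounded numbers, here is a sketch that recomputes the factors using only the Python standard library. It assumes each factor is the 95th percentile of the t distribution with n − 1 degrees of freedom divided by the matching normal percentile, which is consistent with the rounded rows above but is my reading rather than something the table states. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `correction_factor`) are hypothetical, and the numerical integration is a convenience, not a recommendation over a proper statistics library.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains p
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def correction_factor(n, level=0.90):
    """Ratio of the t multiplier (n - 1 degrees of freedom) to the normal one."""
    p = 0.5 + level / 2
    return t_quantile(p, n - 1) / NormalDist().inv_cdf(p)


for n in range(2, 21):
    print(f"{n:2d}  {correction_factor(n):.2f}")
```

Under that assumption the factor for 7 samples comes out near 1.18, and the two smallest sample sizes get roughly 3.8 and 1.8. Those last two are computed values, not entries recovered from the table.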

To use this table, count how many samples the estimation of the standard
deviation is based on, multiply the estimation of the standard deviation
\(\hat{\sigma}\) by the correction factor, and then multiply again by 1.645 to
get a 90 % interval. If the number of samples is greater than 20, the naïve
estimation of the standard deviation is good enough for a 90 % interval.

Thus, if we have 7 samples and these have led us to estimate a mean of 32
minutes with a standard deviation of 8 minutes, we should not think of the 90 %
confidence interval as

\[32 \pm 8 \times 1.645\]

but rather as

\[32 \pm 8 \times 1.2 \times 1.645\]

Already with 7 samples, the actual 90 % confidence interval is fairly close to
the naïve one, being only a factor of 1.2 too narrow. With fewer samples, the
uncertainty in the standard deviation is larger, so we should estimate a
similarly wider confidence interval.¹
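
The lookup-and-multiply recipe can be sketched in a few lines of Python. The function names `rounded_factor` and `interval_90` are made up for this example. The rounded factors are the ones from the table, and the function refuses sample counts below 4 because no factor is available for them here.

```python
Z90 = 1.645  # normal multiplier for a two-sided 90 % interval


def rounded_factor(n):
    """Rounded correction factor for a 90 % interval, by number of samples."""
    if n < 4:
        # No factor is listed for 2 or 3 samples, so do not guess one.
        raise ValueError("no table entry for fewer than 4 samples")
    if n == 4:
        return 1.5
    if n == 5:
        return 1.3
    if n <= 8:
        return 1.2
    if n <= 20:
        return 1.1
    return 1.0  # more than 20 samples: the naive estimate is good enough


def interval_90(centre, sd, n):
    """Return (low, high) of the corrected 90 % interval."""
    half_width = sd * rounded_factor(n) * Z90
    return centre - half_width, centre + half_width


low, high = interval_90(32, 8, 7)
print(f"{low:.1f} to {high:.1f} minutes")
```

For the 7-sample example this gives roughly 16.2 to 47.8 minutes, compared to about 18.8 to 45.2 minutes without the correction.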

This is the table for 90 % intervals because that’s what I need most often.
Gosset didn’t actually come up with any specific approximation table; he came up
with the entire Student’s t distribution which lets us create any table of
correction factors we need.

Variation from just two values

Although the above table is what you need for getting a 90 % confidence interval, we
can also use a similar technique to get a sloppy estimation of the standard
deviation based on just two samples. The sample standard deviation of two values
is given by

\[\frac{\left(\mathrm{high} - \mathrm{low}\right)}{\sqrt{2}}\]

This massively underestimates the actual standard deviation, because it is based
on just two values. But the range of ±1 standard deviation under a normal
distribution, which covers about 68 % of outcomes, corresponds to a t score of
about 1.84 with one degree of freedom, so we can multiply the above by that, and
we get a better approximation of the standard deviation.

If we round the constant factors for convenience, we’ll find that the
appropriate estimation of the standard deviation (corrected through the t
distribution) is 1.3 times the distance between the two numbers we have. That’s
incredibly useful in practice!
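
Here is one way to reconstruct where the 1.3 comes from, as a sketch rather than the author's own derivation. It assumes the t score is the point on the t distribution with one degree of freedom that matches the normal distribution's cumulative probability at +1 standard deviation. With one degree of freedom the t distribution is the Cauchy distribution, so the quantile has a closed form and no statistics library is needed.

```python
import math
from statistics import NormalDist

# Share of a normal distribution lying below +1 standard deviation (~0.8413).
p = NormalDist().cdf(1.0)

# Two values leave one degree of freedom. The t distribution with one degree
# of freedom is the Cauchy distribution, whose quantile function is
# tan(pi * (p - 1/2)).
t_score = math.tan(math.pi * (p - 0.5))

# The sample standard deviation of two values is distance / sqrt(2), so the
# multiplier applied directly to the distance is t_score / sqrt(2).
factor = t_score / math.sqrt(2)

print(f"t score: {t_score:.3f}, factor on distance: {factor:.2f}")
```

Under that reading the t score lands close to the figure quoted above, and dividing by \(\sqrt{2}\) rounds to the same 1.3.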

Example of how to use it

I’m sure you’ve been in a situation where someone has asked something like “Is
49 litres a good result?”

You don’t know, of course, so you ask “Compared to what?”

Maybe they respond “Compared to 43 litres!”

That sounds impressive, but you don’t want me to chastise you, so you say, “That
still tells me nothing because I don’t know the variation inherent in the
process. Give me another typical result!”

They might then say “Uhh, 47 litres.”

Now you let your guard down and think, “Oh, 49 is above both the typical
results. Very good!”

And then I chastise you!

So you turn on your brain instead.

You have received two typical numbers: 43 and 47. These don’t tell you much
about the inherent variation, but they do tell you a little. The distance
between them is four. If we multiply that by 1.3, we get our estimation of the
standard deviation, which is something like 5 litres. That means 49 litres is
less than one standard deviation away from the midpoint of 45 litres. That’s a
normal result, not unusually good or bad.
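
The same back-of-the-envelope check, written out as a tiny Python sketch. The function names `rough_sd` and `sds_from_midpoint` are invented for the example, and the 1.3 multiplier is the rounded constant from the previous section.

```python
def rough_sd(a, b):
    """Sloppy standard deviation estimate from two typical values."""
    return 1.3 * abs(a - b)


def sds_from_midpoint(result, a, b):
    """How many estimated standard deviations result lies from the midpoint."""
    midpoint = (a + b) / 2
    return (result - midpoint) / rough_sd(a, b)


sd = rough_sd(43, 47)
score = sds_from_midpoint(49, 43, 47)
print(f"sd ~ {sd:.1f} litres, 49 litres is {score:.2f} sd above the midpoint")
```

This puts 49 litres about 0.77 estimated standard deviations above the midpoint, which is well within ordinary variation.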

Sidenotes
1 A stronger confidence interval, like the 95 % or even 99 % interval, will be
correspondingly much wider after the Student t correction.


William Sealy Gosset developed statistical methods to address uncertainties in estimation, particularly concerning confidence intervals. He recognized that computing a 90 percent confidence interval for the mean from the sample standard deviation and a normal distribution treats the estimated standard deviation as if it were the true one, which makes the interval too narrow. Gosset addressed this with the Student's t distribution, from which correction tables based on the number of samples can be derived to account for the uncertainty in the estimate of the standard deviation.

To adjust the confidence interval calculation, the article gives rounded correction factors for 90 percent intervals that depend on the number of samples: 1.5 for four samples, 1.3 for five, 1.2 for six to eight, and 1.1 for nine to twenty. The method is to multiply the estimated standard deviation by the appropriate correction factor, and then by the normal-distribution multiplier 1.645, to get the final interval. If the number of samples exceeds twenty, the naive estimation of the standard deviation is deemed sufficient for a 90 percent interval. The correction accounts for the increased uncertainty in estimating the standard deviation when the sample size is small, leading to wider and more realistic confidence intervals. The widening is much larger for stronger confidence levels like 95 percent or 99 percent.

Gosset's work established the theory underlying the Student's t distribution, which allows for the creation of such correction factors necessary for accurate interval estimation.

Furthermore, the article describes a method for estimating the standard deviation when only two values are available. The sample standard deviation for two values is calculated as the difference between the high and low values divided by the square root of two, which significantly underestimates the actual standard deviation because it relies on minimal data. To obtain a better approximation in this scenario, the text suggests multiplying by a factor derived from the t distribution, resulting in an estimation that is approximately 1.3 times the distance between the two observed numbers. This technique is useful for estimating the inherent variation when only a couple of data points are known, allowing a rough assessment of whether an observed result is unusual or typical relative to the process variation.