90% of the T Distribution
Recorded: May 30, 2026, 11:01 p.m.
| Original | Summarized |
90 % of the t distribution, by kqr (Entropic Thoughts). William Sealy Gosset was great. He improved beer at Guinness by using the One thing Gosset realised is that it is wrong to compute 90 % confidence \[\hat{\mu} \pm 1.645 \hat{\sigma}\] When we do this we get too narrow a range, because while we recognise Gosset came up with correction tables based on the number of samples used in the Number of samples 2 3 4 5 6–8 9–20 To use this table, count how many samples the estimation of the standard Thus, if we have 7 samples and these have led us to estimate a mean of 32 \[32 \pm 8 \times 1.645\] but rather as \[32 \pm 8 \times 1.2 \times 1.645\] Already with 7 samples, the actual 90 % confidence interval is fairly close to This is the table for 90 % intervals because that’s what I need most often. Variation from just two values Although the above table is what you need for getting a 90 % confidence interval, we \[\frac{\left(\mathrm{high} - \mathrm{low}\right)}{\sqrt{2}}\] This massively underestimates the actual standard deviation, because it is based If we round the constant factors for convenience, we’ll find that the Example of how to use it I’m sure you’ve been in a situation where someone has asked something like “Is You don’t know, of course, so you ask “Compared to what?” Maybe they respond “Compared to 43 litres!” That sounds impressive, but you don’t want me to chastise you, so you say, “That They might then say “Uhh, 47 litres.” Now you let your guard down and think, “Oh, 49 is above both the typical And then I chastise you! So you turn on your brain instead. You have received two typical numbers: 43 and 47. These don’t tell you much |
William Sealy Gosset developed statistical methods for handling uncertainty in estimates, particularly in confidence intervals. He recognized that computing a 90 percent confidence interval as the estimated mean plus or minus 1.645 estimated standard deviations gives too narrow a range, because it treats the sample standard deviation as if it were the true population value. Gosset addressed this by creating correction tables based on the number of samples used in the estimation, thereby accounting for the uncertainty in the estimated standard deviation itself. The table for 90 percent intervals has columns for 2, 3, 4, 5, 6–8, and 9–20 samples. To use it, multiply the estimated standard deviation by the correction factor for the sample count, and then by 1.645 (the standard normal score for a 90 percent interval, not a t-score) to get the half-width of the interval. For example, with 7 samples, an estimated mean of 32, and an estimated standard deviation of 8, the interval is 32 ± 8 × 1.2 × 1.645 rather than 32 ± 8 × 1.645. If the number of samples exceeds twenty, the naive estimate of the standard deviation is deemed sufficient for a 90 percent interval. The correction matters most when the sample size is small, where it gives wider and more realistic intervals, and it grows larger at stronger confidence levels such as 95 percent or 99 percent. Gosset's work established the theory underlying the Student's t distribution, from which such correction factors are derived. Furthermore, Gosset explored methods for estimating the standard deviation when only two values are available. 
The sample standard deviation for two values is calculated as the difference between the high and low values divided by the square root of two, which significantly underestimates the actual standard deviation because it relies only on minimal data. To obtain a better approximation of the standard deviation in this scenario, the text suggests multiplying this distance by a factor derived from the t distribution, resulting in an estimation that is approximately 1.3 times the distance between the two observed numbers. This technique is useful for estimating the inherent variation when only limited data points are known, allowing for a more robust assessment of whether an observed result is statistically unusual or typical relative to the process variation. |
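
The correction-factor idea can be sketched in Python. This is a minimal illustration, not the article's own method: it assumes the correction factor for n samples is the 95th percentile of the t distribution with n − 1 degrees of freedom divided by the normal score 1.645, an assumption that reproduces the factor of about 1.2 quoted above for 7 samples. The function names (`t_quantile`, `correction_factor`, `half_width`) are hypothetical, and the t quantile is found by plain numerical integration and bisection so that only the standard library is needed.

```python
import math
from statistics import NormalDist, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0  # assumed bracket; wide enough for 90 % intervals
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def correction_factor(n, level=0.90):
    """Assumed definition: t quantile over the matching normal quantile."""
    p = 0.5 + level / 2
    return t_quantile(p, n - 1) / NormalDist().inv_cdf(p)


def half_width(sd, n, level=0.90):
    """Half-width of the interval: sd x correction factor x normal score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return sd * correction_factor(n, level) * z


if __name__ == "__main__":
    # Factors for a few sample counts, rounded to one decimal.
    for n in (2, 3, 4, 5, 7, 20):
        print(n, round(correction_factor(n), 1))
    # The 7-sample example: mean 32, estimated standard deviation 8.
    print(32 - half_width(8, 7), 32 + half_width(8, 7))
    # Sample standard deviation of two values equals (high - low) / sqrt(2).
    print(stdev([43, 47]), (47 - 43) / math.sqrt(2))
```

The last line uses the two values from the article's example (43 and 47) only to show that the usual sample standard deviation of two numbers reduces to their distance divided by the square root of two. Where SciPy is available, `scipy.stats.t.ppf` could replace the hand-rolled `t_quantile`.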