Degrees of Freedom

Degrees of freedom is denoted by the abbreviation 'df'. It is a parameter of probability distributions such as the $\chi ^{2}$, t and F distributions.

The concept of degrees of freedom can seem a bit confusing. When sample statistics are used to estimate population parameters, the degrees of freedom is obtained by subtracting the number of estimated values used in the calculation from the number of data values in the sample.

Definition: degrees of freedom (df) is the number of data values that are free to vary after a statistic has been computed from the data set.

Degrees of freedom is not only a factor in computing population estimates; it is also a parameter that determines the shape of distributions such as Student's t-distribution.

Degrees of Freedom Definition

Degrees of freedom is an important concept used in estimating population parameters.

Degrees of Freedom n-1

The formula used for finding the population variance is

$\sigma ^{2}=\frac{\sum (x-\mu )^{2}}{N}$

where x is a data value, μ is the mean of the population data and N is the number of data values in the population.

When we estimate the population variance from sample data, we use the sample mean $\overline{x}$ as an estimate of the population mean μ. That means one estimated value is used in the process of estimating the variance. If there are n data values in the sample and the mean is fixed, the n values must add up to $n\overline{x}$. So if we know n - 1 of the values, the nth value is determined: it equals $n\overline{x}$ minus the sum of the other n - 1 values.
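To make the "free to vary" idea concrete, here is a minimal Python sketch: once the mean of n values is fixed, choosing any n - 1 of them forces the last one. The helper name `last_value` and the numbers are hypothetical, used only for illustration.

```python
def last_value(mean, known_values, n):
    """Hypothetical helper: return the one value that is not free to vary.

    With the mean of n values fixed, the values must sum to n * mean,
    so the last value is that total minus the other n - 1 values.
    """
    if len(known_values) != n - 1:
        raise ValueError("expected exactly n - 1 known values")
    return n * mean - sum(known_values)

# Mean of 5 values fixed at 6; four values are chosen freely.
print(last_value(6, [4, 8, 5, 7], 5))  # 6
```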

This means one value depends on the n - 1 chosen values, which are independent and are therefore called free variables. Hence the degrees of freedom used to estimate the population variance is n - 1, which replaces the number of values N in the formula. The estimated population variance (the sample variance) is denoted by $s^{2}$ and is computed using the formula

$s^{2}=\frac{\sum (x-\overline{x})^{2}}{n-1}$
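A small Python sketch contrasting the two formulas. `population_variance` and `sample_variance` are hypothetical helper names; the standard library's `statistics.pvariance` and `statistics.variance` (which divide by N and n - 1 respectively) are used only as a cross-check.

```python
import statistics

def population_variance(data):
    """Hypothetical helper: variance of a complete population, dividing by N."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N

def sample_variance(data):
    """Hypothetical helper: estimate of the population variance, dividing by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), statistics.pvariance(data))  # both 4.0
print(sample_variance(data), statistics.variance(data))       # both 32/7, about 4.571
```

Dividing by n - 1 instead of n makes the estimate slightly larger, compensating for the fact that deviations are measured from $\overline{x}$ rather than from the unknown μ.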

T Test Degrees of Freedom

To estimate a population mean from a sample mean, the normal distribution is used when either the population standard deviation is known or the sample size is at least 30. In the latter case, the sample standard deviation is used as an estimate of the population standard deviation. When the sample size is small (less than 30) and the population standard deviation is unknown, the normal distribution should not be used for interval estimation or hypothesis testing. Instead, the t-distribution is used, provided the population is approximately normal. Its shape is determined by the degrees of freedom, whereas the shape of a normal distribution is determined by its standard deviation.

Properties of the t-Distribution

The t-distribution shares some properties with the normal distribution and differs from it in others.

Characteristics similar to the standard normal distribution
• It is bell shaped.
• It is symmetric about the mean.
• The mean, median and mode are all equal to zero and are located at the center of the distribution.
• It never touches the horizontal axis.

Characteristics different from the standard normal distribution
• The variance is greater than 1.
• The shape of the curve depends on the degrees of freedom (df) and hence on the sample size.
• As the sample size increases, the t-distribution comes closer to the normal distribution.
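The claim that the t-distribution's variance exceeds 1 and shrinks toward the standard normal's variance of 1 can be checked with the standard formula df/(df - 2), valid for df > 2. A minimal Python sketch; the function name `t_variance` is hypothetical.

```python
def t_variance(df):
    """Hypothetical helper: variance of Student's t-distribution.

    The variance is df / (df - 2) for df > 2. For df = 2 it is infinite and
    for df = 1 it is undefined; this sketch returns inf for both cases.
    """
    if df <= 2:
        return float("inf")
    return df / (df - 2)

# The variance stays above 1 and approaches 1 as df grows.
for df in (3, 5, 10, 30, 100):
    print(df, round(t_variance(df), 4))
```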

Let us look at an example of how the t-distribution and degrees of freedom are used to determine the maximum error of an estimate.

The formula for the maximum error of the estimate of the population mean is $E = t_{\frac{\alpha }{2}}(\frac{s}{\sqrt{n}})$, where s is the sample standard deviation, n is the sample size and $t_{\frac{\alpha }{2}}$ is the critical value from the t-distribution table that cuts off an area of $\frac{\alpha }{2}$ in the upper tail, for significance level α and degrees of freedom df = n - 1.

Example:

A sample of 10 data values has a mean of 0.32 and a standard deviation of 0.08. Find the 95% confidence interval for the true mean.

First we need to find the critical value $t_{\frac{\alpha }{2}}$. The significance level is α = 1 - confidence level = 1 - 0.95 = 0.05, hence $\frac{\alpha }{2}=\frac{0.05}{2}= 0.025$.

n = 10, as there are 10 values in the sample. Hence the degrees of freedom to be used is n - 1 = 9.

From a t-distribution table, the critical value for df = 9 is $t_{0.025} = 2.26216$.

Hence the maximum error of the estimate is
$E = t_{\frac{\alpha }{2}}(\frac{s}{\sqrt{n}}) = 2.26216(\frac{0.08}{\sqrt{10}}) \approx 0.06$ (rounded to the nearest hundredth).
Now the confidence interval for the true mean is given by
$\overline{x} - E < \mu < \overline{x} + E$
$0.32 - 0.06 < \mu < 0.32 + 0.06$
$0.26 < \mu < 0.38$
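The same calculation can be sketched in Python. Instead of a table lookup, this version assumes we approximate the critical value numerically: it integrates the t density with Simpson's rule and searches for $t_{\frac{\alpha }{2}}$ by bisection. The helper names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def area_from_zero(x, df, steps=1000):
    """Hypothetical helper: area under the t density from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(upper_tail, df):
    """Hypothetical helper: find t with P(T > t) = upper_tail by bisection."""
    target = 0.5 - upper_tail  # by symmetry, the area between 0 and t
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_from_zero(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Worked example from the text: n = 10, mean 0.32, s = 0.08, 95% confidence.
n, xbar, s = 10, 0.32, 0.08
alpha = 1 - 0.95
t = t_critical(alpha / 2, n - 1)   # about 2.26216 for df = 9
E = t * s / math.sqrt(n)           # maximum error, about 0.057
print(round(t, 5), round(E, 2))
print(round(xbar - E, 2), round(xbar + E, 2))  # about 0.26 and 0.38
```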

Degrees of Freedom Chi Square and F Distributions

Like the t-distribution, the family of chi-square distributions depends on the degrees of freedom. The statistic is denoted by $\chi ^{2}$, using the Greek letter χ (chi).

Properties of the $\chi ^{2}$ Distribution
• It is a continuous distribution.
• It is not symmetric but skewed to the right.
• The value of $\chi ^{2}$ is never negative.
• Each curve in the family of chi-square distributions is uniquely defined by its degrees of freedom; when the distribution is used for a single sample variance, df = n - 1, where n is the sample size.

F Statistic

The f statistic compares the variances of two independent random samples drawn from normal populations.
$f=\frac{\frac{s_{1}^{2}}{\sigma _{1}^{2}}}{\frac{s_{2}^{2}}{\sigma _{2}^{2}}}$
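Since the f statistic depends on two variances, the F-distribution is described by two parameters: the degrees of freedom of the numerator variance ($n_{1}-1$) and of the denominator variance ($n_{2}-1$), where $n_{1}$ and $n_{2}$ are the sample sizes.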
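A minimal Python sketch of the f statistic and the degrees-of-freedom pair that goes with it. The function name `f_statistic` and the sample data are hypothetical, and by default it assumes equal population variances, so that f reduces to $\frac{s_{1}^{2}}{s_{2}^{2}}$.

```python
from statistics import variance

def f_statistic(sample1, sample2, sigma1_sq=1.0, sigma2_sq=1.0):
    """Hypothetical helper: f = (s1^2 / sigma1^2) / (s2^2 / sigma2^2) and its df pair.

    With the default sigma values the population variances are treated as
    equal, so f reduces to s1^2 / s2^2.
    """
    s1_sq = variance(sample1)  # sample variance, n1 - 1 in the denominator
    s2_sq = variance(sample2)  # sample variance, n2 - 1 in the denominator
    f = (s1_sq / sigma1_sq) / (s2_sq / sigma2_sq)
    return f, (len(sample1) - 1, len(sample2) - 1)

f, dfs = f_statistic([2, 4, 4, 4, 5, 5, 7, 9], [1, 2, 3, 4, 5])
print(f, dfs)  # dfs is (7, 4): numerator df, denominator df
```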

Properties of F-distribution

• The value of F is never negative, because variances are never negative.
• The distribution is positively skewed.
• The mean value of F is approximately 1.
• The F-distribution curve is determined by the degrees of freedom of the numerator variance and the degrees of freedom of the denominator variance.
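The statement that the mean of F is approximately 1 can be made precise: for denominator degrees of freedom $d_{2} > 2$ the mean is $\frac{d_{2}}{d_{2}-2}$, which does not depend on the numerator df and approaches 1 as $d_{2}$ grows. A short Python sketch (the helper name `f_mean` is hypothetical):

```python
def f_mean(df_denominator):
    """Hypothetical helper: mean of the F-distribution, defined only for df_denominator > 2."""
    if df_denominator <= 2:
        raise ValueError("the mean is undefined for denominator df <= 2")
    return df_denominator / (df_denominator - 2)

for d2 in (5, 10, 30, 120):
    print(d2, round(f_mean(d2), 3))  # values move toward 1 as d2 grows
```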

Degrees of Freedom Example

1.  Sample variance

The degrees of freedom used in calculating the sample variance is n - 1, where n is the sample size.

If the sample size is 25, then the degrees of freedom for the sample variance is 25 - 1 = 24.

2.  Critical values in t-distributions
a.  t-test for a mean with a small sample

To find the critical value for a given significance level, the degrees of freedom is calculated as df = n - 1, where n is the number of values in the sample.
If the sample consists of 20 values, then the df used to find the critical value is 20 - 1 = 19.

b. Difference between two means: independent small samples
If the population variances are assumed to be unequal, the degrees of freedom used is the smaller of $n_{1}-1$ and $n_{2}-1$, where $n_{1}$ and $n_{2}$ are the sample sizes.

If the population variances are assumed to be equal, the degrees of freedom is $n_{1}+n_{2}-2$.
If the two sample sizes are 14 and 12, then df = 12 - 1 = 11 if the population variances are not equal, and df = 14 + 12 - 2 = 24 if they are equal.
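A Python sketch of the two rules for independent samples; `two_sample_df` is a hypothetical helper. Many statistical packages use the Welch-Satterthwaite approximation for the unequal-variance case instead of the conservative minimum; that approximation is not implemented here.

```python
def two_sample_df(n1, n2, equal_variances):
    """Hypothetical helper: df for a t-test on the difference of two independent means.

    Equal variances (pooled estimate): n1 + n2 - 2.
    Unequal variances: the conservative choice min(n1 - 1, n2 - 1).
    """
    if equal_variances:
        return n1 + n2 - 2
    return min(n1 - 1, n2 - 1)

print(two_sample_df(14, 12, equal_variances=False))  # 11
print(two_sample_df(14, 12, equal_variances=True))   # 24
```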

c. Difference between two means: dependent small samples

Dependent (paired) samples have equal sizes, so df = n - 1, where n is the number of pairs.
If two dependent samples of size 18 each are tested for a difference of means, then df = 18 - 1 = 17.
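For dependent samples the test works on the pairwise differences, so once the mean difference is fixed only n - 1 of the n differences are free. A minimal Python sketch; `paired_t` and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Hypothetical helper: t statistic and df for two dependent (paired) samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev uses n - 1
    return t, n - 1

t, df = paired_t([10, 12, 9, 11], [12, 13, 11, 12])
print(round(t, 3), df)  # df is 3 for 4 pairs
```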

3.  Chi-square test for goodness of fit
The degrees of freedom for finding the critical value in a chi-square goodness-of-fit test is n - 1, where n is the number of categories in the observed frequency table. If 4 categories are considered, then the degrees of freedom is 4 - 1 = 3.
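A minimal Python sketch of the goodness-of-fit statistic and its df. The helper name and the counts are hypothetical, and it assumes the expected counts are specified in advance rather than derived from parameters estimated from the data (each estimated parameter would remove one more degree of freedom).

```python
def chi_square_gof(observed, expected):
    """Hypothetical helper: chi-square goodness-of-fit statistic and df = categories - 1."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count per category")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

observed = [30, 20, 25, 25]  # hypothetical counts in 4 categories
expected = [25, 25, 25, 25]  # equal frequencies under the null hypothesis
print(chi_square_gof(observed, expected))  # (2.0, 3)
```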

4.  Chi-square test for independence
The degrees of freedom for a contingency table is given by df = (number of rows - 1)(number of columns - 1).
If the contingency table has 3 rows and 4 columns, then df = (3 - 1)(4 - 1) = 2 x 3 = 6.
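A Python sketch of the test for independence on a 3 x 4 table, where each expected count is row total x column total / grand total. The helper name `chi_square_independence` and the counts are hypothetical.

```python
def chi_square_independence(table):
    """Hypothetical helper: chi-square statistic and df = (rows - 1)(columns - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

table = [
    [10, 20, 30, 40],
    [20, 20, 20, 20],
    [30, 20, 10, 20],
]  # hypothetical 3 x 4 table of counts
stat, df = chi_square_independence(table)
print(round(stat, 3), df)  # df is 6
```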

5.  One-way ANOVA (F-distribution)
The F-distribution is used in one-way ANOVA. The two degrees of freedom used to find the F critical value are
df N = the degrees of freedom for the between-group variance = k - 1, where k is the number of sample groups.
df D = the degrees of freedom for the within-group variance = N - k, where N is the total number of values.
If there are 4 groups in the analysis, each consisting of 5 values, then k = 4 and N = 5 x 4 = 20.
Hence df N = 4 - 1 = 3 and df D = N - k = 20 - 4 = 16.
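A Python sketch of a one-way ANOVA on 4 hypothetical groups of 5 values, showing where df N = k - 1 and df D = N - k enter the F statistic. The function name `one_way_anova` and the data are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Hypothetical helper: F statistic and (df N, df D) for a one-way ANOVA."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_n, df_d = k - 1, N - k
    f = (ss_between / df_n) / (ss_within / df_d)  # ratio of the two mean squares
    return f, (df_n, df_d)

groups = [[4, 5, 6, 5, 5], [7, 8, 6, 7, 7], [5, 6, 5, 4, 5], [8, 9, 7, 8, 8]]
f, dfs = one_way_anova(groups)
print(f, dfs)  # dfs is (3, 16)
```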

6. n - k - 1 degrees of freedom (multiple regression)
A polynomial of degree n - 1 in one variable can be fitted exactly through n data points, which leaves no degrees of freedom for estimating error. In a multiple regression with k predictor variables and an intercept, the degrees of freedom is
df = n - k - 1
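A Python sketch for the simplest case, one predictor (k = 1), where the error variance is estimated with n - 2 degrees of freedom. The helpers `regression_df` and `simple_regression_residual_variance` and the data are hypothetical.

```python
def regression_df(n, k):
    """Hypothetical helper: residual df for a regression with k predictors plus an intercept."""
    df = n - k - 1
    if df < 1:
        # With df = 0 the fit passes through every point, so no error variance can be estimated.
        raise ValueError("need more observations than estimated coefficients")
    return df

def simple_regression_residual_variance(xs, ys):
    """Hypothetical helper: fit y = a + b*x by least squares, estimate error variance with n - 2 df."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse / regression_df(n, 1)

print(regression_df(20, 3))  # 16
print(simple_regression_residual_variance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.8
```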