Central Tendency

In statistics, central tendency refers to the average, the balancing point, or the most common value of a data set. Each of these characteristics is useful in analyzing the data and drawing inferences from it. Descriptive statistics defines measures of central tendency and provides methods to evaluate them for a data set. Inferential statistics uses the measures of central tendency calculated for a sample to draw conclusions about the characteristics of the population.

The term central tendency indicates the middle value and is measured using the mean, median and mode. Each of these measures is calculated differently, and each suits different situations depending on the nature of the data. Central tendency describes the degree to which the values of a distribution cluster around a central value and is quantified using these measures.

Central Tendency Definition

Central tendency is the middlemost value computed from the available data. The central tendency of a data set can be defined as the centralness of the data distribution, which can refer to the average, the middle, or the most commonly occurring value of the data. The general measures used to represent these characteristics are the mean, median and mode of the distribution.

Central Tendency Examples

The examples below illustrate the measures of central tendency.

Example 1: Find the mean, median and mode of the set 82, 89, 83, 81, 82, 10.

The items of the data when arranged in order (say from least to greatest) are,

10, 81, 82, 82, 83, 89

The mean of the data = $\frac{10 + 81 + 82 + 82 + 83 + 89}{6}$ = $\frac{427}{6}$ ≈ 71.17

There are two middle terms, 82 and 82, and hence the median is $\frac{82 + 82}{2}$ = 82

The number 82 occurs most and hence the mode is 82.

The number 10 is far different from the other numbers in the set. Such an item is called an outlier. If we ignore the outlier and recalculate the mean, it becomes $\frac{81 + 82 + 82 + 83 + 89}{5}$ = $\frac{417}{5}$ = 83.4. The outlier therefore has a great influence on the mean but little influence on the median or mode. Hence the median gives a more realistic picture of the central tendency here.
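The effect of the outlier can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are chosen here for illustration and are not part of the original example.

```python
from statistics import mean, median, mode

# Data from Example 1, including the outlier 10.
scores = [82, 89, 83, 81, 82, 10]

# The same data with the outlier removed.
without_outlier = [s for s in scores if s != 10]

mean_all = mean(scores)                    # sum 427 divided by 6
mean_trimmed = mean(without_outlier)       # sum 417 divided by 5
median_all = median(scores)                # average of the two middle values
median_trimmed = median(without_outlier)   # single middle value
mode_all = mode(scores)                    # most frequent value

print(f"mean:   {mean_all:.2f} with outlier, {mean_trimmed:.2f} without")
print(f"median: {median_all} with outlier, {median_trimmed} without")
print(f"mode:   {mode_all}")
```

Removing the single value 10 moves the mean by more than 12 points, while the median and the mode stay where they were.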

Suppose the given data is the set of scores of a student. The median gives a better report about the student than the mean. The score of 10 may have been awarded incorrectly, or the student might have taken that particular test under difficult circumstances. But that score drastically reduces the mean. Normally a good judge will go by the median in such cases.

Suppose the numbers are the code numbers of different commodities sold by a shop. The mode 82 tells us that the commodity referred to by the code 82 is more popular than the rest of the commodities.

Example 2: The quiz scores of students are given below. Find the mean, median and mode of the data set.

3, 6, 7, 4, 9, 5, 8, 10, 4, 5, 6, 6.

Mean of the data $\overline{x}$ = $\frac{3+6+7+4+9+5+8+10+4+5+6+6}{12}$ = $\frac{73}{12}$ ≈ 6.08

To find the median, the data values are arranged in ascending order:
3, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10

As there is an even number of values (12), the median is the average of the 6th and the 7th values in the ordered set. Both of them are equal to 6. Hence the median = 6.

The value 6 occurs most (three times) in the data set. Hence the mode of the data values = 6.
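The same steps can be followed by hand in Python, without relying on ready-made statistics functions. This is a minimal sketch; the variable names are illustrative assumptions, and the median step assumes an even number of values as in this example.

```python
from collections import Counter

# Quiz scores from Example 2.
quiz = [3, 6, 7, 4, 9, 5, 8, 10, 4, 5, 6, 6]
n = len(quiz)

# Mean: sum of the values divided by the number of values.
mean_quiz = sum(quiz) / n

# Median: sort, then average the 6th and 7th values (0-based indices 5 and 6).
ordered = sorted(quiz)
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
median_quiz = sum(middle_pair) / 2

# Mode: the value with the highest frequency, together with its count.
mode_value, mode_count = Counter(quiz).most_common(1)[0]

print(round(mean_quiz, 2), median_quiz, mode_value, mode_count)
```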

Central Tendency Bias

Central tendency bias results when the respondents of a survey shy away from giving extreme opinions, or when the surveyor wants to project common or general views. A respondent may avoid responses like "Highly dissatisfied" or "Extremely satisfied" and instead choose options like "Dissatisfied" or "Satisfied". A surveyor may likewise avoid including extreme response options in order to project the middle view.

Measures of Central Tendency

The three common measures of central tendency are the mean, median and mode of a data set.

Mean: The mean is the average, or arithmetic mean, of the data values in the distribution. It is calculated by dividing the sum of the data values by the number of values in the data set. The symbol μ is used to represent the population mean, while the mean of a sample is indicated by $\overline{x}$.
The mean $\overline{x}$ of the data set $x_{1}, x_{2}, x_{3}, \ldots, x_{n}$ is given by the formula
$\overline{x}$ = $\frac{x_{1}+x_{2}+x_{3}+\cdots+x_{n}}{n}$

Median: The median is the middle value of an ordered data set. To find the median, the values of the data set are first ordered from the lowest to the highest. If there is an odd number of terms, the median is the $\left(\frac{n+1}{2}\right)$th term in the order. If there is an even number of terms, the median is the average of the two middle values, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th values in the ordered set.

Mode: The mode is the most frequently occurring value in the data set. A data set can have more than one mode.
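The positional rule for the median translates directly into code. The function below is a sketch written for illustration (its name `median_by_position` is an assumption, not a library function); note that Python lists are indexed from 0, so the k-th term sits at index k - 1.

```python
def median_by_position(values):
    """Return the median using the (n + 1)/2 and n/2, n/2 + 1 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th term, which is 0-based index (n - 1) // 2.
        return ordered[(n - 1) // 2]
    # Even n: average of the n/2-th and (n/2 + 1)-th terms,
    # which are 0-based indices n // 2 - 1 and n // 2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_by_position([2, 8, 6, 12, 10]))          # five values, odd case
print(median_by_position([10, 81, 82, 82, 83, 89]))   # six values, even case
```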

Central Tendency Mean

The most commonly used measure of central tendency is the mean, also called the average. The mean is defined as the sum of the items of a data set divided by the number of items in the data set.

For example, consider the set of numbers 2, 5, 6, 10, 12. The sum of the items is 2 + 5 + 6 + 10 + 12 = 35 and the number of items is 5. Hence the mean of this data is $\frac{35}{5}$ = 7.

Central Tendency Median

The median of a data set is the middle item when all the items are arranged in order.

For example, consider the set of numbers 2, 8, 6, 12, 10.

The items of the data when arranged in order (say from least to greatest) are,

2, 6, 8, 10, 12

The middle item is 8 and hence 8 is the median of this data.

In the case of a data set with an even number of items, the mean of the middle two items is the median.

The median is a better measure of central tendency than the mean when the data is skewed, because it is not influenced by the presence of outliers. Example 1 above illustrates this.

Central Tendency Mode

The mode of a data set is the item or items that occur most often. Some data sets have no mode, and some have more than one mode.

For example, consider the set of numbers 2, 8, 6, 12, 10.

The items of the data when arranged in order (say from least to greatest) are,

2, 6, 8, 10, 12

All the items in the data occur only once. Hence there is no mode for this data.

But consider the set of numbers 2, 8, 6, 2, 6, 12, 10, 6, 8, 3, 8

The items of the data when arranged in order (say from least to greatest) are,

2, 2, 3, 6, 6, 6, 8, 8, 8, 10, 12

In this data, the number 2 occurs twice, the number 6 occurs thrice and also the number 8 occurs thrice. Hence there are two modes for this data which are 6 and 8.
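Both cases can be handled by counting frequencies. The helper below is a sketch (the name `modes` is an illustrative assumption); it follows the convention used here that a data set in which every value occurs only once has no mode. Python's built-in `statistics.multimode` uses a different convention and returns every value in that situation.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or an empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: treat the data as having no mode.
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([2, 8, 6, 12, 10]))                    # no repeated value
print(modes([2, 8, 6, 2, 6, 12, 10, 6, 8, 3, 8]))  # two values tie at three occurrences
```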

When the data represent categories, the mode tells us the most popular item.

Central Tendency and Dispersion

A measure of central tendency is not sufficient to describe the data fully. When we say the average test score of a class is 62, it does not mean that all or even most of the students scored 62 in the test. To know about the entire class's performance, one must also know how the marks are clustered around 62. A measure that tells how the data set is dispersed (spread), used along with the measure of central tendency, is called a measure of dispersion or variability.

The common measures of dispersion which are used in statistical analysis are the range, variance, standard deviation and interquartile range.

Range is the simplest measure of variation and it is the difference between the highest and the lowest values in the data set.  Still it is not very useful in describing the data as it does not convey any information about how the data values are related to a measure of central tendency like mean or median.

Variance is the measure of dispersion used along with the mean in describing the data. Indeed, the mean is used in the computation of the variance. Standard deviation is the positive square root of the variance and is often reported with the mean. While the population standard deviation is denoted by σ, the sample standard deviation is represented by s.

The formulas for finding the variance and standard deviation of a population of N values with mean μ are as follows:
Variance = $\sigma^{2}$ = $\sum_{i=1}^{N}\frac{(x_{i}-\mu)^{2}}{N}$

Standard deviation σ = $\sqrt{\sum_{i=1}^{N}\frac{(x_{i}-\mu)^{2}}{N}}$

For a sample of n values with mean $\overline{x}$, the sample variance $s^{2}$ is computed from the squared deviations from $\overline{x}$ and divides their sum by n - 1 instead of n.

Interquartile range (IQR) is the measure of variability used with the median to describe the data set. IQR is the difference between the third and first quartiles, $Q_{3} - Q_{1}$, and gives the range of the central 50% of the data set.
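As an illustration, the measures of dispersion can be computed for the small data set 2, 5, 6, 10, 12 with Python's standard `statistics` module. This is a sketch under one assumption worth stating: quartiles can be defined in several slightly different ways, and `statistics.quantiles` is used here with its default method, so a textbook using another quartile convention may report a different IQR for the same data.

```python
from statistics import pstdev, pvariance, quantiles, stdev, variance

data = [2, 5, 6, 10, 12]   # mean is 7

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population formulas: squared deviations from the mean divided by N.
pop_var = pvariance(data)
pop_sd = pstdev(data)

# Sample formulas: squared deviations from the mean divided by n - 1.
samp_var = variance(data)
samp_sd = stdev(data)

# Quartiles (default method of statistics.quantiles) and interquartile range.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(data_range, pop_var, round(pop_sd, 3), samp_var, samp_sd, iqr)
```

The squared deviations from the mean are 25, 4, 1, 9 and 25, which sum to 64, so the population variance is 64/5 and the sample variance is 64/4.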

Central Limit Theorem

When data are normally distributed, all three measures of central tendency coincide. For normally or approximately normally distributed data, the probability that a value falls in an interval can be found using the standard normal curve and a z-table.

According to the central limit theorem, the distribution of sample means approaches a normal distribution as the sample size increases.

The central limit theorem can be stated as follows:

As the sample size n increases, the distribution of the means of samples of size n taken with replacement from a population with mean μ and standard deviation σ approaches a normal distribution. The mean of the distribution of sample means is μ, and its standard deviation, called the standard error of the mean, is $\frac{\sigma }{\sqrt{n}}$.

The distribution of sample means can thus be assumed to be approximately normal for large sample sizes, even if the population is not known to be normally distributed. The central limit theorem thus provides a basis for hypothesis testing using a test statistic and a critical value.
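A small simulation can make the theorem concrete. The sketch below rests on assumptions chosen purely for illustration: the population is modelled as an exponential distribution with μ = 1 and σ = 1 (a clearly non-normal, skewed shape), each sample has n = 50 values, and 2000 sample means are collected with a fixed random seed.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable

SAMPLE_SIZE = 50         # n, the number of values in each sample
NUM_SAMPLES = 2000       # how many sample means are collected
MU = 1.0                 # mean of the Exponential(1) population
SIGMA = 1.0              # standard deviation of the Exponential(1) population

# Draw NUM_SAMPLES independent samples and record the mean of each one.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = fmean(sample_means)
spread_of_means = pstdev(sample_means)
standard_error = SIGMA / sqrt(SAMPLE_SIZE)

# For a normal distribution about 68% of values lie within one standard
# deviation of the mean; check how close the sample means come to that.
within_one_se = sum(
    abs(m - MU) <= standard_error for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means:   {mean_of_means:.3f} (population mean {MU})")
print(f"spread of sample means: {spread_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"fraction within one standard error: {within_one_se:.2f}")
```

The mean of the sample means should land close to μ, and their spread close to σ/√n, even though the individual values come from a skewed population.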
