Quartiles are a special type of percentile. The three quartiles Q1, Q2 and Q3 divide the distribution into four equal parts, each containing 25% of the data, as shown in the diagram given below.

From the diagram it is clear that 50% of the data lie between the first and the third quartiles. The range of this middle 50% of the distribution is called the interquartile range, abbreviated as IQR.

Unlike other measures of spread such as the range, variance and mean absolute deviation, the interquartile range is largely unaffected by extreme values or outliers in the distribution. Because the median and the interquartile range are relatively little affected by outliers in the data, these two summary statistics are called resistant statistics. In exploratory data analysis the interquartile range is used as a measure of spread more often than the standard deviation, which is favoured in the traditional approach.
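The resistance of the interquartile range can be checked on a small example. The sketch below is only an illustration: the two data sets are made up, and the helper names `iqr` and `value_range` are hypothetical. The quartiles are taken as the medians of the lower and upper halves of the ordered data, and the population standard deviation stands in for the non-resistant measures.

```python
from statistics import median, pstdev

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(upper) - median(lower)

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

# Made-up data: the second list replaces the largest value with an extreme one.
clean = [10, 12, 13, 14, 15, 16, 17, 18]
with_outlier = [10, 12, 13, 14, 15, 16, 17, 180]

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "| range:", value_range(data),
          "| std dev:", round(pstdev(data), 2),
          "| IQR:", iqr(data))
```

In this example the range and the standard deviation grow sharply once the extreme value is present, while the interquartile range stays at 4.0 for both lists.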

IQR = Q3 – Q1

The interquartile range of a data set can be found in the following steps.

1. Order the data set from the lowest to the highest value.

2. Find the median of the data set. The median divides the distribution into two equal halves.

3. Find the median of the lower half, which contains the values less than the median. This value is the first quartile, denoted by Q1.

4. Find the median of the upper half of the distribution, which contains the values larger than the median. This value is the third quartile, denoted by Q3.

5. Use the formula IQR = Q3 – Q1 to calculate the interquartile range.
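The five steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `quartiles` and `iqr` and the sample data are assumptions made for illustration. For an odd number of values the median itself is left out of both halves, which is one common convention; statistical libraries often interpolate between values instead and may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two values, so that neither half is empty.
    """
    data = sorted(values)                  # step 1: order the data
    n = len(data)
    half = n // 2
    q2 = median(data)                      # step 2: median of the whole set
    lower = data[:half]                    # values below the median position
    # For an odd count, skip the middle value so it is in neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = median(lower)                     # step 3: median of the lower half
    q3 = median(upper)                     # step 4: median of the upper half
    return q1, q2, q3

def iqr(values):
    """Step 5: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Made-up sample, deliberately unordered.
sample = [7, 1, 3, 9, 5, 11, 13, 15]
print(quartiles(sample))  # (4.0, 8.0, 12.0)
print(iqr(sample))        # 8.0
```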

The interquartile range is also used to identify outliers in a data set.

1. Order the data set and find the values of Q1 and Q3.

2. Calculate the interquartile range using the formula IQR = Q3 – Q1.

3. Compute the lower and upper tolerance limits using the formulas

Lower limit = Q1 – 1.5(IQR) and Upper limit = Q3 + 1.5(IQR)

4. A value x in the data set is an outlier if x < lower limit or x > upper limit.
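The outlier test above can be sketched as a small function. The name `find_outliers`, the parameter `k` (the 1.5 multiplier) and the sample values are assumptions made for this illustration; Q1 and Q3 are computed as the medians of the lower and upper halves of the ordered data.

```python
from statistics import median

def find_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the k * IQR rule."""
    data = sorted(values)                      # step 1: order the data
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # For an odd count, the middle value is excluded from the upper half.
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    spread = q3 - q1                           # step 2: IQR = Q3 - Q1
    lower_limit = q1 - k * spread              # step 3: tolerance limits
    upper_limit = q3 + k * spread
    # Step 4: a value is an outlier if it falls outside the limits.
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers

# Made-up sample with one suspiciously large value.
sample = [10, 12, 13, 14, 15, 16, 17, 40]
print(find_outliers(sample))  # (6.5, 22.5, [40])
```

Here Q1 = 12.5 and Q3 = 16.5, so IQR = 4 and the limits are 12.5 – 6 = 6.5 and 16.5 + 6 = 22.5; only the value 40 lies outside them.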

The average monthly incomes, in dollars, of 16 households chosen in each of two localities are given below. Compare the average earnings of a family in the two localities.

Area | Monthly incomes of the families in dollars
L 1 | 2200 | 2500 | 1800 | 3000 | 2000 | 2200 | 2500 | 2700 | 1500 | 2400 | 2000 | 2100 | 1800 | 2600 | 2800 | 2000 |
L 2 | 1500 | 1600 | 1700 | 1750 | 2000 | 2200 | 2400 | 2200 | 2000 | 1800 | 2100 | 1800 | 2500 | 2300 | 2100 | 1700 |

Ordering the two data sets from the lowest to the highest value gives

Area | Monthly incomes of the families in dollars
L 1 | 1500 | 1800 | 1800 | 2000 | 2000 | 2000 | 2100 | 2200 | 2200 | 2400 | 2500 | 2500 | 2600 | 2700 | 2800 | 3000 |
L 2 | 1500 | 1600 | 1700 | 1700 | 1750 | 1800 | 1800 | 2000 | 2000 | 2100 | 2100 | 2200 | 2200 | 2300 | 2400 | 2500 |

The five-number summaries and the interquartile ranges for the two data sets are tabulated as follows

Area | Minimum | Q1 | Median | Q3 | Maximum | IQR |
L 1 | 1500 | 2000 | 2200 | 2550 | 3000 | 550 |
L 2 | 1500 | 1725 | 2000 | 2200 | 2500 | 475 |
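The tabulated values can be reproduced with a short script. This is a sketch under the assumption that the quartiles are the medians of the lower and upper halves of the ordered data, as in the steps given earlier; the function name `summary` and the variable names `l1` and `l2` are chosen here for illustration, and the lists are the ordered incomes of the two localities.

```python
from statistics import median

def summary(values):
    """Five-number summary plus IQR, using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # For an odd count, the middle value is excluded from the upper half.
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    return {
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }

# Ordered monthly incomes (dollars) of the 16 households in each locality.
l1 = [1500, 1800, 1800, 2000, 2000, 2000, 2100, 2200,
      2200, 2400, 2500, 2500, 2600, 2700, 2800, 3000]
l2 = [1500, 1600, 1700, 1700, 1750, 1800, 1800, 2000,
      2000, 2100, 2100, 2200, 2200, 2300, 2400, 2500]

for name, data in (("L 1", l1), ("L 2", l2)):
    print(name, summary(data))
```

With 16 values in each set, each half holds 8 values, so every quartile is the mean of two neighbouring values; for example, Q3 for L 1 is (2500 + 2600) / 2 = 2550.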

The box plots of the two data sets, drawn on the same axis, look like this

Comparing the averages, the median for L 1 is greater than the median for L 2. This means the average income of a family in locality 1 is higher. At the same time, the interquartile range for locality 1 is also greater than that for locality 2, which means the incomes in locality 1 are more spread out around the central value. The whiskers of the first plot (L 1) are also longer, indicating that the overall spread of the income data is larger for locality 1.