# Interquartile Range

Quartiles are a special type of percentile. The three quartiles Q1, Q2 and Q3 divide the distribution into four equal parts, each containing 25% of the data, as shown in the diagram given below.

As the diagram shows, 50% of the data lie between the first and the third quartiles. The range of this middle 50% of the distribution is called the interquartile range, abbreviated as IQR.

## Interquartile Range Definition

The interquartile range is the difference between the first and third quartile values. It is the range of the central 50% of an ordered data distribution.

Unlike other measures of spread such as the range, variance and mean absolute deviation, the interquartile range is not influenced by extreme values or outliers in the distribution. Because the median and the interquartile range are relatively unaffected by outliers in the data, they are called resistant statistics. In exploratory data analysis, the interquartile range is used as a measure of spread more often than the standard deviation, which is preferred in the traditional approach.
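
To make the idea of a resistant statistic concrete, the sketch below compares the range, the standard deviation and the interquartile range of a small data set before and after its largest value is replaced by an extreme one. The data and the helper name `iqr` are made up for illustration, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is only one of several conventions in use.

```python
from statistics import median, pstdev

def iqr(data):
    """Interquartile range using medians of the lower and upper halves (assumed convention)."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                  # values below the middle position
    upper = values[len(values) - half:]    # values above the middle position
    return median(upper) - median(lower)

# Hypothetical data, not taken from the article.
clean = [10, 12, 13, 14, 15, 16, 18, 20]
with_outlier = clean[:-1] + [200]          # largest value replaced by an extreme one

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(name,
          "| range:", max(data) - min(data),
          "| std dev:", round(pstdev(data), 1),
          "| IQR:", iqr(data))
```

In this made-up case the range jumps from 10 to 190 and the standard deviation grows with it, while the interquartile range stays at 4.5 in both runs.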

### Interquartile Range Formula

IQR = Q3 – Q1

### Steps involved in finding the interquartile range

1. Order the data set from the lowest to the highest value.
2. Find the median of the data set. The median divides the distribution into two equal halves.
3. Find the median of the lower half, which contains the values below the median. This value is the first quartile, denoted by Q1.
4. Find the median of the upper half, which contains the values above the median. This value is the third quartile, denoted by Q3.
5. Use the formula IQR = Q3 – Q1 to calculate the interquartile range.
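
The steps above can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function names `quartiles` and `interquartile_range` and the sample data are made up for illustration, and the halves are split by position, so the middle value is left out of both halves when the number of values is odd. Other quartile conventions, such as the interpolating ones used by many statistics libraries, can give slightly different results.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps (needs at least 2 values)."""
    values = sorted(data)                  # step 1: order the data
    n = len(values)
    half = n // 2
    q2 = median(values)                    # step 2: median of the whole set
    lower = values[:half]                  # step 3: lower half
    upper = values[n - half:]              # step 4: upper half
    return median(lower), q2, median(upper)

def interquartile_range(data):
    q1, _, q3 = quartiles(data)
    return q3 - q1                         # step 5: IQR = Q3 - Q1

# Hypothetical data, not taken from the article.
sample = [7, 1, 3, 9, 5, 11, 13]
print(quartiles(sample))                   # (3, 7, 11)
print(interquartile_range(sample))         # 8
```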

## Interquartile Range and Outliers

The interquartile range is useful in identifying the outliers in a distribution. Outliers are the extreme low and high values in the distribution, which distort the mean and the spread of the distribution.
The steps to identify the outliers are given below.

1. Order the data set and find the values of Q1 and Q3.
2. Calculate the interquartile range using the formula IQR = Q3 – Q1.
3. Compute the lower and upper tolerance limits using the formulas
   Lower limit = Q1 – 1.5(IQR) and Upper limit = Q3 + 1.5(IQR).
4. A value x in the data set is an outlier if x < lower limit or x > upper limit.

### Example for finding outliers using the interquartile range
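
The outlier steps listed earlier can be sketched in Python as shown here. The data set `scores` and the function name `iqr_outliers` are hypothetical and chosen only for illustration. The quartiles are taken as the medians of the lower and upper halves of the ordered data, which is an assumption about the convention.

```python
from statistics import median

def iqr_outliers(data, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the 1.5 * IQR rule."""
    values = sorted(data)                          # step 1: order the data
    half = len(values) // 2
    q1 = median(values[:half])                     # median of the lower half
    q3 = median(values[len(values) - half:])       # median of the upper half
    iqr = q3 - q1                                  # step 2: IQR = Q3 - Q1
    lower_limit = q1 - k * iqr                     # step 3: tolerance limits
    upper_limit = q3 + k * iqr
    # step 4: flag values outside the limits
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers

# Hypothetical data, not taken from the article.
scores = [2, 51, 53, 55, 56, 58, 60, 62, 64, 98]
print(iqr_outliers(scores))                        # (39.5, 75.5, [2, 98])
```

For these made-up scores Q1 = 53 and Q3 = 62, so IQR = 9 and the limits are 39.5 and 75.5. The values 2 and 98 fall outside the limits and are flagged as outliers.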

## Interquartile Range and Box Whisker Plot

The box whisker plot is made using the five point summary: the minimum and maximum values of the distribution and the three quartiles. The width of the box represents the interquartile range of the plotted data set. To compare two or more data sets, box whisker plots of the distributions are drawn on the same axis. The positions of the medians help in comparing the central values, while the interquartile ranges are used for comparing the spread of the data sets. An example is given here to illustrate this.

The average monthly incomes in dollars of 16 households chosen in each of two localities are given below. Compare the average earnings of a family in the two localities.

| Area | Monthly incomes of the families in dollars |
|------|--------------------------------------------|
| L 1 | 2200 2500 1800 3000 2000 2200 2500 2700 1500 2400 2000 2100 1800 2600 2800 2000 |
| L 2 | 1500 1600 1700 1750 2000 2200 2400 2200 2000 1800 2100 1800 2500 2300 2100 1700 |

Ordering the two data sets from the lowest to the highest value gives:

| Area | Monthly incomes of the families in dollars |
|------|--------------------------------------------|
| L 1 | 1500 1800 1800 2000 2000 2000 2100 2200 2200 2400 2500 2500 2600 2700 2800 3000 |
| L 2 | 1500 1600 1700 1700 1750 1800 1800 2000 2000 2100 2100 2200 2200 2300 2400 2500 |

The five point summaries and interquartile ranges of the two data sets are tabulated as follows.

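| Area | Minimum | Q1 | Median | Q3 | Maximum | IQR |
|------|---------|------|--------|------|---------|-----|
| L 1 | 1500 | 2000 | 2200 | 2550 | 3000 | 550 |
| L 2 | 1500 | 1725 | 2000 | 2200 | 2500 | 475 |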
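
The tabulated values can be checked with a short Python sketch. The function name `five_number_summary` is made up for illustration, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data, following the steps given earlier in this article. The two lists are the ordered income data for the two localities.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                     # median of the lower half
    q3 = median(values[len(values) - half:])       # median of the upper half
    return values[0], q1, median(values), q3, values[-1]

# Ordered monthly incomes in dollars for the two localities.
l1 = [1500, 1800, 1800, 2000, 2000, 2000, 2100, 2200,
      2200, 2400, 2500, 2500, 2600, 2700, 2800, 3000]
l2 = [1500, 1600, 1700, 1700, 1750, 1800, 1800, 2000,
      2000, 2100, 2100, 2200, 2200, 2300, 2400, 2500]

for name, incomes in [("L 1", l1), ("L 2", l2)]:
    minimum, q1, med, q3, maximum = five_number_summary(incomes)
    print(name, minimum, q1, med, q3, maximum, "IQR:", q3 - q1)
```

Under this convention the sketch gives the same quartiles and interquartile ranges as the table: 550 for L 1 and 475 for L 2.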

The box plots for the two data sets, drawn on the same axis, look like this.

Comparing the averages, the median for L 1 (2200) is greater than the median for L 2 (2000). This means the average income of a family in locality 1 is higher. At the same time, the interquartile range for locality 1 (550) is also greater than that for locality 2 (475). This means the incomes in locality 1 are more spread out around the central value. The whiskers of the first plot (L 1) are also longer, indicating that the overall spread of the income data is larger for locality 1.