Interquartile Range

Quartiles are a special type of percentile. The three quartiles Q1, Q2 and Q3 divide the distribution into four equal parts, each containing 25% of the data, as shown in the diagram given below.

[Diagram: Interquartile Range]
From the diagram it is clear that 50% of the data lie between the first and the third quartiles. The range of this middle 50% of the distribution is called the interquartile range, abbreviated as IQR.

Interquartile Range Definition

Interquartile range is the difference between the third and first quartile values. It is the range of the central 50% of an ordered data distribution.
Unlike other measures of spread such as the range, variance and mean absolute deviation, the interquartile range is largely unaffected by extreme values or outliers in the distribution. Because the median and the interquartile range are relatively little affected by outliers in the data, they are called resistant statistics. In exploratory data analysis, the interquartile range is used as a measure of spread more often than the standard deviation, which is preferred in the traditional approach.
Interquartile Range Formula

IQR = Q3 – Q1
Steps involved in finding the interquartile range

1.    Order the data set from the lowest to the highest value.
2.    Find the median of the data set. The median divides the distribution into two equal halves.
3.    Find the median of the lower half, which contains the values below the median. This value is the first quartile, denoted by Q1.
4.    Find the median of the upper half, which contains the values above the median. This value is the third quartile, denoted by Q3.
5.    Use the formula IQR = Q3 – Q1 to calculate the interquartile range.

Interquartile Range Example
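The five steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names `quartiles` and `iqr` and the `sample` data are hypothetical, and the halves are split by position in the ordered list, so the median itself is left out of both halves when the number of values is odd. Other quartile conventions exist and can give slightly different results. The sketch assumes at least two values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    data = sorted(values)            # step 1: order the data
    n = len(data)
    q2 = median(data)                # step 2: median of the whole set
    lower = data[: n // 2]           # lower half (median excluded if n is odd)
    upper = data[(n + 1) // 2:]      # upper half (median excluded if n is odd)
    q1 = median(lower)               # step 3: first quartile
    q3 = median(upper)               # step 4: third quartile
    return q1, q2, q3


def iqr(values):
    """Return the interquartile range Q3 - Q1 (step 5)."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


# Hypothetical data, chosen only to illustrate the steps.
sample = [7, 3, 12, 5, 9, 15, 4, 10, 8]
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3, iqr(sample))  # prints: 4.5 8 11.0 6.5
```

Here the ordered data are 3, 4, 5, 7, 8, 9, 10, 12, 15, so the median is 8, the lower half is 3, 4, 5, 7 and the upper half is 9, 10, 12, 15.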


Interquartile Range and Outliers

Interquartile range is useful in identifying the outliers in a distribution. Outliers are the extreme low and high values in the distribution, which can distort the mean and the spread of the distribution.
The steps to identify the outliers are given below.
1.    Order the data set and find the values of Q1 and Q3.
2.    Calculate the interquartile range using the formula IQR = Q3 – Q1.
3.    Compute the lower and upper tolerance limits using the formulas
       Lower limit = Q1 – 1.5(IQR) and Upper limit = Q3 + 1.5(IQR)
4.    A value x in the data set is an outlier if x < lower limit or x > upper limit.
Example for finding outliers using Interquartile range
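As an illustrative sketch, the outlier steps can be written in Python as below. The function name `iqr_outliers`, the parameter `k` (the 1.5 multiplier) and the `scores` data are hypothetical and are not taken from the original text. The quartiles are found as medians of the lower and upper halves of the ordered data, which is one common convention among several.

```python
from statistics import median


def iqr_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the IQR rule."""
    data = sorted(values)                      # step 1: order the data
    n = len(data)
    q1 = median(data[: n // 2])                # median of the lower half
    q3 = median(data[(n + 1) // 2:])           # median of the upper half
    iqr = q3 - q1                              # step 2: IQR = Q3 - Q1
    lower_limit = q1 - k * iqr                 # step 3: tolerance limits
    upper_limit = q3 + k * iqr
    # step 4: values outside the limits are flagged as outliers
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers


# Hypothetical data with one unusually high value.
scores = [52, 55, 57, 58, 60, 61, 63, 64, 66, 98]
print(iqr_outliers(scores))  # prints: (46.5, 74.5, [98])
```

For this data Q1 = 57 and Q3 = 64, so IQR = 7, the limits are 46.5 and 74.5, and only the value 98 falls outside them.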

Interquartile Range and Box Whisker Plot

The box whisker plot is made using the five point summary: the minimum and maximum values of the distribution and the three quartiles. The width of the box represents the interquartile range of the plotted data set. To compare two or more data sets, box whisker plots of the distributions are drawn on the same axis. The positions of the medians help in comparing the central values, while the interquartile ranges are used for comparing the spread of the data sets. An example is given here to illustrate this.
The average monthly incomes in dollars of 16 households chosen in each of two localities are given below. Compare the average earnings of a family in the two localities.

Area    Monthly incomes of the families in dollars
L 1     2200  2500  1800  3000  2000  2200  2500  2700  1500  2400  2000  2100  1800  2600  2800  2000
L 2     1500  1600  1700  1750  2000  2200  2400  2200  2000  1800  2100  1800  2500  2300  2100  1700

Ordering the two data sets from the lowest to the highest,

Area    Monthly incomes of the families in dollars
L 1     1500  1800  1800  2000  2000  2000  2100  2200  2200  2400  2500  2500  2600  2700  2800  3000
L 2     1500  1600  1700  1700  1750  1800  1800  2000  2000  2100  2100  2200  2200  2300  2400  2500

The five point summaries for the two data sets are tabulated as follows

Area    Minimum    Q1      Median    Q3      Maximum    IQR
L 1     1500       2000    2200      2550    3000       550
L 2     1500       1725    2000      2200    2500       475

The box plots drawn on the same axis for the two data sets look like this
[Figure: Interquartile Range and Box Whisker Plot]
Comparing the averages, the median for L 1 is greater than the median for L 2. This means the average income of a family in locality 1 is higher. At the same time, the interquartile range of locality 1 is also greater than that of locality 2. This means the incomes in locality 1 are more spread out around the central value. The whiskers of the first plot (L 1) are also longer, indicating that the overall spread is larger for the income data of locality 1.
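The comparison can be put into numbers with a small Python sketch based on the five point summaries. The names `summaries` and `spread_report` are hypothetical. The sketch assumes that each whisker runs from the box to the minimum or maximum value, which holds only when no value lies beyond the 1.5(IQR) limits, so it checks that condition as well.

```python
# Five point summaries for the two localities.
summaries = {
    "L 1": {"min": 1500, "q1": 2000, "median": 2200, "q3": 2550, "max": 3000},
    "L 2": {"min": 1500, "q1": 1725, "median": 2000, "q3": 2200, "max": 2500},
}


def spread_report(s):
    """Return the IQR, whisker lengths and an outlier flag for one summary."""
    iqr = s["q3"] - s["q1"]
    lower_limit = s["q1"] - 1.5 * iqr
    upper_limit = s["q3"] + 1.5 * iqr
    return {
        "iqr": iqr,
        "lower_whisker": s["q1"] - s["min"],   # box edge down to the minimum
        "upper_whisker": s["max"] - s["q3"],   # box edge up to the maximum
        "has_outliers": s["min"] < lower_limit or s["max"] > upper_limit,
    }


for area, s in summaries.items():
    print(area, s["median"], spread_report(s))
# L 1: IQR 550, whiskers 500 and 450, no outliers
# L 2: IQR 475, whiskers 225 and 300, no outliers
```

Both whiskers for L 1 are longer than the corresponding whiskers for L 2, and neither data set has a value outside its limits, so the whiskers do end at the minimum and maximum.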