# Skewness

Any frequency distribution can be graphed, so it can be described in terms of the shape of its graph. Symmetry is one such description. A distribution is said to be symmetric if it looks the same on the left and right sides of a center point. The vertical line through that center point is called the axis of symmetry.
For a symmetric distribution with a single peak, the measures of central tendency (mean, median and mode) are equal to each other, and the axis of symmetry, which is the ordinate at the mean, divides the distribution into two equal parts such that one side is a mirror image of the other.
[Figure: an example of a symmetric distribution]

## Skewness Definition

Skewness in statistics is defined with respect to symmetry; in fact, it is the opposite of symmetry. Skewness is a measure of the asymmetry of a distribution, that is, it measures how far the distribution departs from being symmetric.
It also describes which side of the distribution has the longer tail.

Skewness is often discussed together with kurtosis, which is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean.

Both of these statistics are used as shape statistics. Hence skewness and kurtosis help in describing the shape of the distribution of the data.

On the basis of the shape of the distribution, skewness can be interpreted in three ways:

Positive skewness: If the right tail is longer than the left tail in the graph of the distribution, the distribution is said to have positive skewness. The presence of extreme observations on the right hand side of a distribution makes it positively skewed.
So, if mean > median > mode in a distribution, it can usually be said to have positive skewness.
Negative skewness: If the left tail is longer than the right tail in the graph of the distribution, the distribution has negative skewness. The presence of extreme observations on the left hand side of a distribution makes it negatively skewed.
So, if mean < median < mode in a distribution, it can usually be said to have negative skewness.
Zero skewness: If the two tails are of the same length and shape, the distribution has zero skewness. The distribution is then symmetric, as the normal distribution is.
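
The ordering rule for mean, median and mode can be tried out with a short Python sketch. The function name `skew_direction` is hypothetical, and the ordering is only a rule of thumb, so the sketch reports "inconclusive" whenever the three values do not line up in one of the stated orders.

```python
from statistics import mean, median, mode

def skew_direction(data):
    """Guess the skew direction from the ordering of mean, median and mode.

    Illustrative rule of thumb only; it does not compute a skewness value.
    """
    m, md, mo = mean(data), median(data), mode(data)
    if m > md > mo:
        return "positive"
    if m < md < mo:
        return "negative"
    if m == md == mo:
        return "zero"
    return "inconclusive"

print(skew_direction([3, 5, 7, 5, 6, 10, 12]))   # long right tail
print(skew_direction([12, 10, 8, 10, 9, 5, 3]))  # long left tail
print(skew_direction([1, 2, 2, 2, 3]))           # symmetric
```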

## Skewness of Data

Data are said to be skewed when their distribution has a tail that is longer on one side than the other. If the long tail is on the left hand side, the data are negatively skewed. If the long tail is on the right hand side, the data are positively skewed. The normal distribution is not skewed, as it is perfectly symmetric.

## Measure of Skewness

The extent of skewness can be measured using different methods. To calculate skewness, the following methods can be used.

1. Measure of skewness- Karl Pearson
This measure of skewness is based on the divergence of the mean from the mode in a skewed distribution.
Karl Pearson’s coefficient of skewness SK is defined as
SK = $\frac{Mean-Mode}{Standard\ deviation}$
It is independent of the unit of measurement.
Using this coefficient, interpreting skewness is easy:

If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.

By the approximate empirical relation for moderately skewed distributions, Mean – Mode = 3(Mean – Median).
Hence Karl Pearson’s coefficient of skewness SK can also be written as
SK = $\frac{3(Mean-Median)}{Standard\ deviation}$

2. Measure of skewness- Bowley
This measure is based on the quartiles of the skewed distribution.
Bowley’s coefficient of skewness SQ is defined as
SQ = $\frac{Q_3-2M+Q_1}{Q_3-Q_1}$
where $Q_1$ is the first quartile, M is the median and $Q_3$ is the third quartile of the skewed distribution.

3. Measure of skewness- Kelly
This measure is based on percentiles, namely the 10th percentile $P_{10}$ and the 90th percentile $P_{90}$, so that the 10% of observations at each extreme are ignored.
Kelly’s coefficient of skewness SP is defined as
SP = $\frac{P_{90}-2P_{50}+P_{10}}{P_{90}-P_{10}}$
where $P_{50}$ is the 50th percentile, that is, the median.

Any of the above three formulas can be used to find the direction and extent of skewness. Here SK, SQ and SP are the skewness coefficients.
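
The quartile-based and percentile-based coefficients can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names `bowley_skewness` and `kelly_skewness` are hypothetical, and using `statistics.quantiles` with `method="inclusive"` is an assumption, since the text does not say how quartiles and percentiles should be interpolated. Other conventions give slightly different values on small data sets.

```python
from statistics import quantiles

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 - 2*M + Q1) / (Q3 - Q1)."""
    # Assumption: quartiles are interpolated with the "inclusive" method.
    q1, m, q3 = quantiles(data, n=4, method="inclusive")
    return (q3 - 2 * m + q1) / (q3 - q1)

def kelly_skewness(data):
    """Kelly's coefficient: (P90 - 2*P50 + P10) / (P90 - P10)."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99,
    # so the k-th percentile is stored at index k - 1.
    p = quantiles(data, n=100, method="inclusive")
    p10, p50, p90 = p[9], p[49], p[89]
    return (p90 - 2 * p50 + p10) / (p90 - p10)

data = [3, 5, 7, 5, 6, 10, 12]
sq = bowley_skewness(data)
sp = kelly_skewness(data)
print(f"SQ = {sq:.3f}, SP = {sp:.3f}")
```

Both coefficients come out positive for this data set, which points to a longer right tail.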

## Skewness Example

Let's consider the following data and find its skewness:
3, 5, 7, 5, 6, 10, 12
We can use any of the formulas, so let's use Karl Pearson’s coefficient of skewness.
By the formula,
SK = $\frac{Mean-Mode}{Standard\ deviation}$
So we need the mean, mode and standard deviation.

Here mean = $\frac{(3+5+7+5+6+10+12)}{7}$ = 6.86

As 5 is repeating two times, the mode = 5

Now standard deviation = $\sqrt{\frac{\sum(X_i-\bar{X})^2}{n-1}}$

= $\sqrt{\frac{(3-6.86)^2+(5-6.86)^2+(7-6.86)^2+(5-6.86)^2+(6-6.86)^2+(10-6.86)^2+(12-6.86)^2}{7-1}}$
= $\sqrt{\frac{58.86}{6}}$
= 3.13
SK = $\frac{(6.86 - 5)}{3.13}$
= 0.59
As SK > 0, the distribution is positively skewed.
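
The hand calculation can be checked with a few lines of Python using the standard `statistics` module. This is only a sketch for this data set; the variable names are illustrative, and `stdev` is used because the example divides by n - 1.

```python
from statistics import mean, median, mode, stdev

data = [3, 5, 7, 5, 6, 10, 12]

x_bar = mean(data)    # arithmetic mean
mo = mode(data)       # most frequent value
md = median(data)     # middle value of the sorted data
s = stdev(data)       # sample standard deviation (divides by n - 1)

sk_mode = (x_bar - mo) / s          # (Mean - Mode) / SD
sk_median = 3 * (x_bar - md) / s    # 3(Mean - Median) / SD

print(f"mean = {x_bar:.2f}, mode = {mo}, sd = {s:.2f}")
print(f"SK (mode form)   = {sk_mode:.2f}")
print(f"SK (median form) = {sk_median:.2f}")
```

The mode form reproduces the value 0.59 found by hand. The median form gives a somewhat larger value for this data (about 0.82), because the relation Mean – Mode = 3(Mean – Median) holds only approximately.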

## Population Skewness

Other than the methods mentioned above, there is a direct method to calculate the skewness of a distribution. It is based on the third moment about the mean and the standard deviation.
The direct formula is
S = $\frac{1}{n}$ $\frac{\sum_{i=1}^{n}(X_i-X_{avg})^3}{\sigma^3}$

where n = number of observations, $X_{avg}$ = mean, $\sigma$ = population standard deviation.
This can be used to find the population skewness.
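
A minimal Python sketch of this moment-based formula is given below. The function name `population_skewness` is hypothetical, and the sketch assumes the data are treated as the whole population, so `pstdev` (which divides by n) is used for $\sigma$.

```python
from statistics import fmean, pstdev

def population_skewness(data):
    """(1/n) * sum((x - mean)^3) / sigma^3, with sigma the population SD."""
    n = len(data)
    x_avg = fmean(data)
    sigma = pstdev(data)  # population standard deviation (divides by n)
    third_moment = sum((x - x_avg) ** 3 for x in data) / n
    return third_moment / sigma ** 3

print(round(population_skewness([3, 5, 7, 5, 6, 10, 12]), 3))
print(population_skewness([1, 2, 3, 4, 5]))  # symmetric data
```

For the data set 3, 5, 7, 5, 6, 10, 12 the result is positive, and for the symmetric data set it is zero.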

## Sample Skewness

When we are dealing with sample data, the formula for finding the skewness is modified to
S = $\frac{n}{(n-1)(n-2)}$ $\frac{\sum_{i=1}^{n}(X_i-X_{avg})^3}{s^3}$

where n = sample size, $X_{avg}$ = sample mean, s = sample standard deviation.
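
The sample version can be sketched in the same way. The function name `sample_skewness` is hypothetical; `stdev` is used for s because it divides by n - 1, and the guard for fewer than three observations is an added assumption that follows from the (n - 2) factor in the denominator.

```python
from statistics import fmean, stdev

def sample_skewness(data):
    """n / ((n-1)(n-2)) * sum((x - mean)^3) / s^3, with s the sample SD."""
    n = len(data)
    if n < 3:
        # The (n - 2) factor makes the formula undefined for n <= 2.
        raise ValueError("sample skewness needs at least 3 observations")
    x_avg = fmean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    cubed_deviations = sum((x - x_avg) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed_deviations / s ** 3

print(round(sample_skewness([3, 5, 7, 5, 6, 10, 12]), 3))
```

On a small data set the sample value differs noticeably from the population value, since both the leading factor and the standard deviation change.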

## Histogram Skewness

A histogram is similar to a bar chart, except that it represents continuous data. The data are grouped into class intervals, and the frequency of each interval is plotted as a bar. If the data are symmetrically distributed, the two halves of the histogram appear as mirror images of each other.

When the distribution is skewed, the histogram does not have such a mirror image.
In the histogram,

$\rightarrow$    If the bars trail off further on the left hand side (a longer left tail), skewness is negative.

$\rightarrow$    If the bars trail off further on the right hand side (a longer right tail), skewness is positive.
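
To see this on a small data set, the sketch below groups observations into equal-width class intervals and prints a text histogram. The function name `histogram_counts` and the choice of three intervals are assumptions made for illustration; a plotting library would normally be used instead.

```python
def histogram_counts(data, num_bins):
    """Count observations in equal-width class intervals.

    Assumes the data are not all equal, so the interval width is non-zero.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last interval.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, width, counts

data = [3, 5, 7, 5, 6, 10, 12]
lo, width, counts = histogram_counts(data, num_bins=3)
for i, c in enumerate(counts):
    left = lo + i * width
    # One '#' per observation in the interval starting at `left`.
    print(f"from {left:4.1f} to {left + width:4.1f}: {'#' * c}")
```

For this data the tallest bar is the leftmost interval and the remaining bars stretch out to the right, which matches the positive skewness found in the worked example.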