powered by Tutorvista.com

- What is Descriptive Statistics?
- Descriptive Statistics Vs Inferential Statistics
- Types of Descriptive Statistics
- Descriptive Statistics Analysis
- Simple Descriptive Statistics
- Univariate Descriptive Statistics
- Descriptive Statistics Sample
- Descriptive Statistics Graphs
- Bivariate Descriptive Statistics

Statistics provides methods for collecting, organizing, summarizing, presenting, analyzing and interpreting data. For this purpose the study is broadly divided into two branches.

- Descriptive statistics
- Inferential statistics

Descriptive statistics provides methods to process raw data and present it for analysis. Inferential statistics deals with methods that generalize population traits from the characteristics of a sample.

This page gives an overview of descriptive statistics.

Descriptive Statistics Definition

Descriptive statistics is the branch of statistics that provides methods and tools for the collection, organization, summarization and presentation of data.

Descriptive statistics is closely associated with exploratory data analysis (EDA). Data are explored both analytically and graphically: analytically using measures of central tendency, variability and position, and graphically using different graphing techniques. The purpose of applying descriptive methods is only to display and summarize the data, not to generalize the results.

The US Government takes a national census once every 10 years to gather information about the US population, such as average age, income, housing and educational details. The Census Bureau employs various means to collect, organize and summarize the data. Finally it publishes the information collected in the form of charts, graphs and tables.

Inferential Statistics complements descriptive statistics by providing techniques like estimation and hypothesis tests to generalize population characteristics using sample data. The methods applied in inferential statistics make use of probability as a measure.

Suppose we collect the GPA scores of 50 university students. We may find the mean GPA and the standard deviation, calculate the five-number summary and present it as a box plot. We may also draw comparative graphs such as bar diagrams and pie charts for the grades obtained in different subjects. These activities belong to descriptive statistics.

If we then use the calculated averages to estimate a national average and draw conclusions from it, stating the margin of error and level of significance, these activities fall within the boundaries of inferential statistics.

Suppose we collect data on the heights and weights of children in preschools. The variables are then height and weight, and the data are the recorded heights and weights of the children chosen.

Variables are classified according to the type of data they take.

- Data
  - Qualitative data
  - Quantitative data
    - Discrete variables
    - Continuous variables

Data are also classified according to the number of variables involved.

Univariate Data

When data are collected on a single variable, the resulting distribution is called univariate. Organizing, summarizing and presenting univariate data only describes the data collected; it does not explore causes.

For example, the frequency distribution of the GPA scores of university students can give the average score and the standard deviation, and can be graphed to relate the score ranges to their frequencies. But it cannot hint at the cause of such a distribution.

Bivariate Data

Bivariate data contain two variables whose values vary together. Examining such data can reveal the relationship between the two variables and show how one variable changes as the other changes.

If the GPA data in the above example also include each student's graduation score at school level, the data can be explored to find the relationship between the two scores.

Data are commonly collected through surveys such as:

- **Telephone surveys**
- **Mailed questionnaire surveys**
- **Personal interview surveys**
- **Online surveys via internet**

Samples are often used to collect data for studying the characteristics of a population. The sample formed for this purpose needs to be unbiased, meaning that all members of the population should have the same chance of being included in it. The four basic methods of sampling applied for this purpose are **random, systematic, stratified and cluster sampling.** All four use random selection to bring population elements into the sample.
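As a minimal sketch of the four sampling methods named above, the Python below draws each kind of sample from a made-up population of 100 student IDs. The function names, the two strata and the five clusters are illustrative assumptions and do not come from the original text.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th member."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw a random sample of the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: student IDs 1..100 (assumed for illustration).
rng = random.Random(42)
population = list(range(1, 101))
strata = {"juniors": population[:40], "seniors": population[40:]}
clusters = {f"class_{c}": population[c * 20:(c + 1) * 20] for c in range(5)}

random_ids = simple_random_sample(population, 10, rng)
systematic_ids = systematic_sample(population, 10, rng)
stratified_ids = stratified_sample(strata, 0.1, rng)
cluster_ids = cluster_sample(clusters, 2, rng)
```

Each function takes the random number generator as an argument, so a fixed seed makes the samples reproducible. Stratified sampling here draws 4 juniors and 6 seniors, keeping the proportions of the two strata.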

The Math test scores of 100 students can be tabulated as follows:

| Class (mark range) | Frequency f |
|---|---|
| 0 - 20 | 4 |
| 21 - 40 | 25 |
| 41 - 60 | 28 |
| 61 - 80 | 26 |
| 81 - 100 | 17 |
| Total | 100 |

Columns are appended to this table depending on the measures calculated to summarize the data or the graph used to display it. To calculate the mean of the distribution, for example, one column is added for the class midpoint x and another for the product fx.
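The appended columns can be sketched in Python as follows. The class limits and frequencies are taken from the table above; taking the midpoint as the average of the stated class limits (for example 30.5 for the class 21 - 40) is an assumption of this sketch.

```python
# (lower limit, upper limit, frequency f) for each class in the table above.
classes = [(0, 20, 4), (21, 40, 25), (41, 60, 28), (61, 80, 26), (81, 100, 17)]

rows = []
for low, high, f in classes:
    x = (low + high) / 2                   # class midpoint x
    rows.append((low, high, f, x, f * x))  # appended columns: x and fx

total_f = sum(f for _, _, f, _, _ in rows)
total_fx = sum(fx for _, _, _, _, fx in rows)
grouped_mean = total_fx / total_f          # mean = (sum of fx) / (sum of f)

print(f"{'Class':>9} {'f':>4} {'x':>6} {'fx':>8}")
for low, high, f, x, fx in rows:
    print(f"{low:>3} - {high:<3} {f:>4} {x:>6.1f} {fx:>8.1f}")
print(f"Total f = {total_f}, total fx = {total_fx}, mean = {grouped_mean:.2f}")
```

With these midpoints the sum of fx is 5588, so the estimated mean score is 55.88. Because every score in a class is represented by its midpoint, this is an approximation of the mean of the raw scores.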

The bivariate data is also arranged in tabular form for analyzing data using scatter plots or finding the measures of covariance or correlation.

| Student | Days absent x | Grade in finals y |
|---|---|---|
| Maria | 3 | 87 |
| Philip | 15 | 42 |
| Danny | 6 | 82 |
| Betty | 10 | 75 |
| Sally | 11 | 60 |
| Christopher | 5 | 92 |
| John | 8 | 78 |

As for the univariate data, columns are suitably added for further manipulation of data.

- **Measures of central tendency**
- **Measures of dispersion**
- **Measures of position**

A measure of central tendency conveys the idea of centralness or the average of the data set. The three most commonly used measures for this purpose are

- **Mean**
- **Median**
- **Mode**
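A minimal sketch of the three measures, using Python's standard `statistics` module. The scores are made up for illustration and are not data from this page.

```python
import statistics

# Hypothetical test scores, made up for illustration.
scores = [72, 85, 85, 90, 64, 78, 85, 90]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

For these scores the mean is 81.125, while the median and the mode are both 85. The single low score of 64 pulls the mean below the other two measures.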

A measure of dispersion tells about the spread of the data. The common measures of variability or dispersion are

- **Range**
- **Standard deviation / Variance**
- **Interquartile range**
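The measures of dispersion can be sketched for the final grades in the attendance table above. The quartiles come from `statistics.quantiles` with its default method; other quartile conventions exist and can give slightly different values, so treat the interquartile range here as one possible answer.

```python
import statistics

# Grades in finals from the attendance table above.
grades = [87, 42, 82, 75, 60, 92, 78]

data_range = max(grades) - min(grades)          # largest minus smallest value
variance = statistics.pvariance(grades)         # mean squared deviation (divide by n)
std_dev = statistics.pstdev(grades)             # square root of the variance
q1, q2, q3 = statistics.quantiles(grades, n=4)  # quartile cut points
iqr = q3 - q1                                   # spread of the middle half

print(data_range, round(variance, 2), round(std_dev, 2), iqr)
```

The range uses only the two extreme grades, so Philip's grade of 42 alone determines its lower end. The interquartile range ignores the extremes and is less affected by such a value.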

A measure of position tells about the relative position of a data value in the data set. Median is a measure of position as well as a measure of central tendency. The important measures of position used in the exploration of univariate data are,

- **Percentiles**
- **Quartiles**
- **Z-scores**
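A short sketch of the measures of position, again using the final grades from the attendance table. The helper names `z_score` and `percentile_rank` are hypothetical, and the percentile rank uses the "percentage of values strictly below" convention, which is only one of several in use.

```python
import statistics

# Grades in finals from the attendance table above.
grades = [87, 42, 82, 75, 60, 92, 78]
mean = statistics.mean(grades)
sigma = statistics.pstdev(grades)

def z_score(value):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / sigma

def percentile_rank(value):
    """Percentage of data values strictly below value (one common convention)."""
    below = sum(1 for g in grades if g < value)
    return 100 * below / len(grades)

quartiles = statistics.quantiles(grades, n=4)   # Q1, Q2 (median), Q3

z_scores = {g: round(z_score(g), 2) for g in grades}
print(z_scores)
print(quartiles, percentile_rank(78))
```

A positive z-score marks a grade above the mean and a negative one a grade below it. The second quartile returned here equals the median, which is why the median counts as a measure of position as well as of central tendency.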

The sample mean is an unbiased estimate of the population mean. Hence the same methods and formulas used in descriptive statistics are used for computing the sample mean.

The formula used in descriptive statistics to compute the standard deviation is $\sigma =\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\overline{x})^{2}}{n}}$

If the population standard deviation is not known, the standard error of the mean is estimated from the sample standard deviation $s$ as $\frac{s}{\sqrt{n}}$.

The sample standard deviation is calculated using degrees of freedom so that the sample variance $s^{2}$ is an unbiased estimate of the population variance. The degrees of freedom are the number of values that remain free to vary after an estimate has been used in computing another measure. The formula for the standard deviation makes use of the mean, so the mean is calculated first. Once the mean is known, only n - 1 of the data values are needed to determine all n of them. Hence n - 1 is the degrees of freedom used in calculating the sample standard deviation.

Sample standard deviation $s=\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\overline{x})^{2}}{n-1}}$.
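The difference between dividing by n and by n - 1 can be sketched as follows. Treating the seven final grades from the attendance table as a sample is an assumption made only for this illustration.

```python
import math
import statistics

# Grades in finals from the attendance table, treated here as a sample.
grades = [87, 42, 82, 75, 60, 92, 78]
n = len(grades)
x_bar = sum(grades) / n
squared_deviations = sum((x - x_bar) ** 2 for x in grades)

sigma = math.sqrt(squared_deviations / n)    # divide by n
s = math.sqrt(squared_deviations / (n - 1))  # divide by n - 1 degrees of freedom
standard_error = s / math.sqrt(n)            # standard error of the mean

# The standard library offers both versions directly.
print(sigma, statistics.pstdev(grades))
print(s, statistics.stdev(grades))
print(standard_error)
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as n grows.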

- **Dot plots**
- **Bar graphs**
- **Pie charts**
- **Histograms**
- **Stem plots**
- **Box plots**
- **Cumulative frequency charts and ogives**

Scatter plots are the graphs used for bivariate data. Regression lines and curves can be fitted to a scatter plot to build an estimating model based on the data.
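As a sketch of fitting a regression line, the code below applies ordinary least squares to the days-absent and grade data from the table above. The text does not name a fitting method, so least squares is an assumption here, and `predict` is a hypothetical helper.

```python
# Days absent (x) and grade in finals (y) from the table above.
days_absent = [3, 15, 6, 10, 11, 5, 8]
grades = [87, 42, 82, 75, 60, 92, 78]

n = len(days_absent)
x_bar = sum(days_absent) / n
y_bar = sum(grades) / n

# Least-squares estimates for the line y = intercept + slope * x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days_absent, grades))
s_xx = sum((x - x_bar) ** 2 for x in days_absent)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

def predict(x):
    """Grade estimated by the fitted line for x days absent."""
    return intercept + slope * x

print(round(slope, 2), round(intercept, 2), round(predict(7), 1))
```

The slope is negative, roughly 4 grade points lost per additional day absent in this small data set. The line only describes these seven students; using it to predict for other students is a matter for inferential statistics.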

The covariance of x and y measures how the two variables vary together:

$\sigma _{xy}= \sum_{i=1}^{n}\frac{(x_{i}-\overline{x})(y_{i}-\overline{y})}{n}$

As the covariance is expressed in terms of the units of x and y, another measure is used, which is called the correlation coefficient.

The formula used to find the correlation coefficient is

$\rho =\frac{\sigma _{xy}}{\sigma _{x}\sigma _{y}}$

That is, the correlation coefficient $\rho $ is obtained by dividing the covariance by the product of the standard deviations of the two variables.

Interestingly, the covariance of a variable with itself is its variance:

$\sigma _{xx}=\sigma ^{2}_{x}$
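The covariance and correlation formulas can be checked on the days-absent and grade data from the table above. This is a sketch using the population (divide by n) form; the helper names are hypothetical.

```python
import math

# Days absent (x) and grade in finals (y) from the table above.
days_absent = [3, 15, 6, 10, 11, 5, 8]
grades = [87, 42, 82, 75, 60, 92, 78]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average product of paired deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))   # sigma_xx is the variance of x
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

sigma_xy = covariance(days_absent, grades)
rho = correlation(days_absent, grades)
print(round(sigma_xy, 2), round(rho, 3))
```

The covariance is negative and carries the mixed unit "days times grade points", while the correlation coefficient of about -0.94 is unit-free and indicates a strong negative linear relationship. Calling `covariance(grades, grades)` returns the variance of the grades, which illustrates the identity above.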