Recall that **data** is information in the form of numerical figures. To analyse data or compare one set of data with another, we need a single representative value for the data. We have earlier discussed three such simple measures, namely mean, median and mode. Geometric mean and harmonic mean are two other measures of central tendency. These give a rough picture of where the data points are centred. But in some cases, the variation in a set of data cannot be satisfactorily described by a single representative value or measure. A combination of measures is required to analyse data and draw meaningful conclusions.

Let us take up a practical example from the game of cricket, which may interest most of you. Assume that the scores of Virender Sehwag (Mr Dashing) and Rahul Dravid (Mr Dependable) in ten innings are as follows:

Player/Innings | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Sehwag | 26 | 68 | 28 | 0 | 40 | 66 | 82 | 95 | 7 | 118 |
Dravid | 42 | 36 | 58 | 43 | 50 | 56 | 51 | 56 | 78 | 60 |

Now, how do you compare the performance of these two batsmen?

The (arithmetic) mean, or more simply the average, of $n$ observations $x_1, x_2, \dots, x_n$ is given by $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

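So the mean of Sehwag $= \dfrac{26 + 68 + 28 + 0 + 40 + 66 + 82 + 95 + 7 + 118}{10} = \dfrac{530}{10} = 53$

The mean of Dravid $= \dfrac{42 + 36 + 58 + 43 + 50 + 56 + 51 + 56 + 78 + 60}{10} = \dfrac{530}{10} = 53$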

Coincidentally, the means of both players are the same!
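The arithmetic above can be checked with a short Python sketch. The list names `sehwag` and `dravid` and the function `arithmetic_mean` are chosen here for illustration only; the scores are copied from the table.

```python
# Scores in ten innings, copied from the table above
sehwag = [26, 68, 28, 0, 40, 66, 82, 95, 7, 118]
dravid = [42, 36, 58, 43, 50, 56, 51, 56, 78, 60]


def arithmetic_mean(scores):
    """Sum of the observations divided by their number n."""
    return sum(scores) / len(scores)


print(arithmetic_mean(sehwag))  # 53.0
print(arithmetic_mean(dravid))  # 53.0
```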

Let us consider another measure – the median.

Arranging the same scores in increasing order, we get:

Player/Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Sehwag | 0 | 7 | 26 | 28 | 40 | 66 | 68 | 82 | 95 | 118 |
Dravid | 36 | 42 | 43 | 50 | 51 | 56 | 56 | 58 | 60 | 78 |

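Median is defined as

$$\text{Median} = \begin{cases} \left(\dfrac{n+1}{2}\right)^{\text{th}} \text{ value}, & \text{if } n \text{ is odd} \\[2ex] \text{average of the } \left(\dfrac{n}{2}\right)^{\text{th}} \text{ and } \left(\dfrac{n}{2}+1\right)^{\text{th}} \text{ values}, & \text{if } n \text{ is even} \end{cases}$$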

n = 10 is even in our case.

∴ The median is the average of the 5th and 6th values in the table.

So the median of Sehwag $= \dfrac{40 + 66}{2} = 53$

and that of Dravid $= \dfrac{51 + 56}{2} = 53.5$

Interestingly, the medians of both the players are nearly the same.
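The odd/even rule for the median can be expressed as a small Python function. This is a sketch; the name `median` is a hypothetical helper defined here, and the conversion from "kth value" to Python's 0-based list index is the only extra step.

```python
# Scores in ten innings, copied from the table above
sehwag = [26, 68, 28, 0, 40, 66, 82, 95, 7, 118]
dravid = [42, 36, 58, 43, 50, 56, 51, 56, 78, 60]


def median(scores):
    """((n+1)/2)th value if n is odd; average of the (n/2)th and
    (n/2 + 1)th values if n is even."""
    ordered = sorted(scores)  # arrange in increasing order
    n = len(ordered)
    if n % 2 == 1:
        # the ((n+1)/2)th value, shifted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # the (n/2)th and (n/2 + 1)th values sit at 0-based indices n/2 - 1 and n/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median(sehwag))  # 53.0
print(median(dravid))  # 53.5
```

Python's standard `statistics.median` follows the same rule (it averages the two middle values when n is even), so it can be used to cross-check the hand computation.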

Can we, therefore, say that the performance of both of them is the same?

We might be tempted to say so. But even a cursory look at their scores tells a different story.

Firstly, Sehwag has two single-digit scores (0 and 7) and one century (118). Dravid has neither!

Secondly, Sehwag has minimum and maximum scores of 0 and 118. Their difference is 118 – 0 = 118.

Dravid has 36 and 78 as his minimum and maximum scores. Their difference is 78 – 36 = 42.

This single figure (118 or 42) is a measure of spread, or dispersion, and is called the **range**.
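The range for each player can be computed directly from the scores. In this sketch the helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
# Scores in ten innings, copied from the table above
sehwag = [26, 68, 28, 0, 40, 66, 82, 95, 7, 118]
dravid = [42, 36, 58, 43, 50, 56, 51, 56, 78, 60]


def data_range(scores):
    """Largest value minus smallest value."""
    return max(scores) - min(scores)


print(data_range(sehwag))  # 118
print(data_range(dravid))  # 42
```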

A lower range indicates that the data are less spread out, i.e., more consistent.

By this measure, we can conclude that Dravid's performance is more consistent than Sehwag's.

For **ungrouped data**, the range is simply the difference between the maximum (largest, say $L$) value and the minimum (smallest, say $S$) value in the set of observations,

i.e., $\text{Range} = L - S$

Coefficient of range $= \dfrac{L - S}{L + S}$
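The coefficient of range divides the range by $L + S$, giving a relative (unit-free) figure. A minimal sketch applied to the two players follows; `coefficient_of_range` is a hypothetical helper name, and the sketch assumes non-negative data with $L + S \neq 0$, as with cricket scores.

```python
# Scores in ten innings, copied from the table above
sehwag = [26, 68, 28, 0, 40, 66, 82, 95, 7, 118]
dravid = [42, 36, 58, 43, 50, 56, 51, 56, 78, 60]


def coefficient_of_range(scores):
    """(L - S) / (L + S), where L is the largest and S the smallest value.

    Assumes L + S is not zero (e.g. non-negative data that are not all 0).
    """
    largest, smallest = max(scores), min(scores)
    return (largest - smallest) / (largest + smallest)


print(coefficient_of_range(sehwag))            # 1.0
print(round(coefficient_of_range(dravid), 4))  # 0.3684
```

Sehwag's coefficient is exactly 1 because his smallest score is 0, so $L - S = L + S = 118$.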

For **grouped data** (i.e., data given in the form of a frequency table), the range is the difference between the upper limit of the highest class and the lower limit of the lowest class.
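For grouped data, only the class limits matter, not the frequencies. The sketch below uses a purely hypothetical frequency table (invented for illustration, not taken from the text), stored as `((lower limit, upper limit), frequency)` pairs; `grouped_range` is likewise a hypothetical helper name.

```python
# Hypothetical frequency table, for illustration only:
# ((lower limit, upper limit), frequency)
classes = [((0, 20), 3), ((20, 40), 5), ((40, 60), 8), ((60, 80), 4), ((80, 100), 2)]


def grouped_range(table):
    """Upper limit of the highest class minus lower limit of the lowest class."""
    lowest_lower = min(lower for (lower, _upper), _freq in table)
    highest_upper = max(upper for (_lower, upper), _freq in table)
    return highest_upper - lowest_lower


print(grouped_range(classes))  # 100
```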

But 'range' alone may not be an adequate measure to compare different sets of data as it has the following disadvantages:

i) It uses only two values (the maximum and minimum) in the entire set of data.

ii) It does not tell us how the data are spread with respect to a measure of central tendency, say the arithmetic mean.
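To illustrate disadvantage (i), the sketch below compares Sehwag's scores with those of a purely hypothetical batsman (invented for illustration, not from the text) whose lowest and highest scores match Sehwag's but whose other eight scores are all 59. The two ranges coincide even though the scores in between are completely different.

```python
# Sehwag's scores, copied from the table above
sehwag = [26, 68, 28, 0, 40, 66, 82, 95, 7, 118]
# Hypothetical batsman: same minimum (0) and maximum (118), every other score 59
hypothetical = [0, 118] + [59] * 8


def data_range(scores):
    """Largest value minus smallest value."""
    return max(scores) - min(scores)


print(data_range(sehwag), data_range(hypothetical))  # 118 118
# The eight middle values, which the range ignores entirely
print(sorted(sehwag)[1:-1])        # [7, 26, 28, 40, 66, 68, 82, 95]
print(sorted(hypothetical)[1:-1])  # [59, 59, 59, 59, 59, 59, 59, 59]
```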

Range is therefore a crude measure of dispersion.

We shall discuss two more measures of dispersion, **mean deviation** and **standard deviation**, which give a more realistic picture.

We shall define them for both ungrouped data and grouped data subsequently.

When the range, mean deviation or standard deviation is calculated for a **distribution**, the resulting value is called a **statistic**.