The normal distribution curve is a symmetrical bell-shaped curve that extends infinitely in both directions. Its long, flat-looking tails cover many values but enclose only a small proportion of the total area.

The normal distribution curve depends on the **mean**, which identifies the position of the center of the curve, and on the **standard deviation**, which determines the height and width of the curve. If the standard deviation of a distribution is small, the curve is tall and narrow (Fig 1); if the standard deviation of a distribution is large, the curve is short and wide (Fig 2).

There is a point on each side of the normal distribution curve where the slope is steepest. These two points, called **points of inflection**, are located one standard deviation (1σ) from the mean (μ), i.e., at x = μ − σ and x = μ + σ.

The **empirical rule** (also called the 68–95–99.7 rule) states that

(i) approximately 68% of the terms in a normal distribution lie within one standard deviation (±1σ) of the mean,

(ii) approximately 95% lie within two standard deviations (±2σ) of the mean, and

(iii) approximately 99.7% lie within three standard deviations (±3σ) of the mean.

Therefore, given a normal distribution, almost all terms will be within three standard deviations (±3σ) of the mean.
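The three percentages follow directly from the standard normal CDF; a quick check using Python's `math.erf`, since the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2):

```python
import math

# Fraction of a normal distribution within k standard deviations of the mean:
# P(|X - mu| <= k*sigma) = erf(k / sqrt(2)), independent of mu and sigma.
def within_k_sigma(k: float) -> float:
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {within_k_sigma(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```

Note that the rule's round numbers are approximations of these exact values.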

If we know that a given distribution is normal, then we can calculate its mean (μ) and standard deviation (σ) using percentage information from the population.
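As an illustrative sketch of this idea (the cutoffs 60 and 85 and the percentages below are hypothetical, not from the text), two percentile statements give two linear equations in μ and σ, which `statistics.NormalDist.inv_cdf` lets us solve:

```python
from statistics import NormalDist

# Hypothetical percentage information: 10% of values fall below 60,
# and 5% fall above 85 (i.e. 95% fall below 85).
std = NormalDist()            # standard normal: mean 0, standard deviation 1
z_low = std.inv_cdf(0.10)     # z-score with 10% of the area to its left
z_high = std.inv_cdf(0.95)    # z-score with 95% of the area to its left

# mu + z_low * sigma = 60 and mu + z_high * sigma = 85; solve the 2x2 system.
sigma = (85 - 60) / (z_high - z_low)
mu = 60 - z_low * sigma

print(f"mu ≈ {mu:.2f}, sigma ≈ {sigma:.2f}")  # mu ≈ 70.95, sigma ≈ 8.54
```

Any two distinct percentile statements determine μ and σ the same way.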

## Checking normality

The first step in checking whether a given distribution is normal is to draw a picture of it. Dotplots, stemplots, boxplots, and histograms are useful graphical displays for showing that the data are unimodal and roughly symmetric.

A more specialized graphical display for checking normality is the normal probability plot. If the normal probability plot is roughly a diagonal straight line, then the given distribution is roughly normal. While this plot shows deviations from normality more clearly, it is not as easy to interpret as a histogram, and it is difficult to construct by hand.

**Example:** The following are the runs scored by the Australian cricket team in its last 13 one-day international matches: 191, 210, 198, 175, 204, 186, 223, 195, 182, 200, 217, 192, 208. Can we conclude that the distribution is roughly normal?

**Sol:** A more specialized graphical display for checking normality is the normal probability plot, so we construct one for the given data:

The plot is close to a diagonal straight line, which indicates that the data have a distribution very close to normal. Note that, alternatively, one could have plotted the data (in this case, match scores) on the X-axis and the normal scores on the Y-axis.

## Binomial probability

The binomial probability refers to the probability that an n-trial binomial experiment results in exactly 'x' successes, when the probability of success on each trial is 'p'. This probability is denoted by B(n, p, x). The formula for calculating the binomial probability is given as:

**B(n, p, x) = P(X = x) = C(n, x) p^{x} (1 − p)^{n − x}**,

where C(n, x) is equal to n!/(x!(n − x)!).
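The formula translates directly into code; a minimal sketch using `math.comb` for C(n, x):

```python
from math import comb

def binomial_prob(n: int, p: float, x: int) -> float:
    """B(n, p, x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 heads in 5 fair-coin tosses: C(5,3)/2**5 = 10/32.
print(binomial_prob(5, 0.5, 3))  # 0.3125
```

Summing over all x from 0 to n gives 1, as it must for a probability distribution.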

## Cumulative binomial probability

A cumulative binomial probability refers to the probability that the binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper limit).

For example, we might be interested in the cumulative binomial probability of obtaining 4 or fewer heads in 10 tosses of a coin, that is, P(X ≤ 4). This is the sum of the individual binomial probabilities for 0 through 4 successes,

i.e., P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4).
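The sum above can be sketched as a short loop over the individual binomial probabilities:

```python
from math import comb

def binomial_cdf(n: int, p: float, x_max: int) -> float:
    """Cumulative probability P(X <= x_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(x_max + 1))

# 4 or fewer heads in 10 tosses of a fair coin:
print(binomial_cdf(10, 0.5, 4))  # 0.376953125 (= 386/1024)
```

A range such as P(a ≤ X ≤ b) follows as `binomial_cdf(n, p, b) - binomial_cdf(n, p, a - 1)`.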

**Note:** (i) If X is a binomial random variable, then X can take on the values 0, 1, 2, . . ., n.

(ii) If X is a geometric random variable, then it takes on the values 1, 2, 3, ...

(iii) There can be zero successes in a binomial, but the earliest a first success can come in a geometric setting is on the first trial.

**Normal approximation to the binomial distribution:** Many practical applications of the binomial distribution involve examples in which 'n' [i.e., the sample size] is large. However, for large 'n', the binomial probabilities can be tedious to calculate. Since the normal distribution can be viewed as a limiting case of the binomial distribution, it is natural to use the normal to approximate the binomial in appropriate situations.

The binomial distribution is discrete, taking values only at integers, whereas the normal distribution is continuous, with probabilities corresponding to areas over intervals. Therefore, we need a technique for converting from one distribution to the other (the **continuity correction**: each binomial probability corresponds to the normal probability over a unit interval centered at the desired value).

A normal distribution is a good approximation to the binomial distribution whenever both 'np' [or mean of the binomial distribution] and n(1 – p) are greater than or equal to 10

i.e., np ≥ 10 and n(1 – p) ≥ 10, where p = probability of success.
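A sketch comparing the exact binomial probability with its normal approximation, using the unit-interval continuity correction described above (the values n = 100 and p = 0.5 are hypothetical, chosen so that np = n(1 − p) = 50 ≥ 10):

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5  # np = n(1-p) = 50, so the normal approximation applies

# Exact binomial probability P(X <= 45).
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(46))

# Normal approximation with continuity correction: the binomial value 45
# corresponds to the normal area up to 45.5 (the edge of its unit interval).
mu = n * p                        # mean np = 50
sigma = (n * p * (1 - p)) ** 0.5  # standard deviation sqrt(np(1-p)) = 5
approx = NormalDist(mu, sigma).cdf(45.5)

print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```

Both values come out near 0.184, showing how close the approximation is when the ≥ 10 conditions hold.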

## Central limit theorem

The central limit theorem states that when the size of the sample (n) is large, the sampling distribution of any statistic will be approximately normal. The larger n is, the more normal the shape of the sampling distribution will be.

A rough rule of thumb for using the central limit theorem is that the sample size (n) should be at least 30 [i.e., n ≥ 30], although the sampling distribution may be approximately normal for much smaller values of n if the population doesn't depart much from normality. The central limit theorem allows us to use normal calculations in problems involving sampling distributions without knowing the original population.
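The theorem can be illustrated by simulation; a sketch using a clearly non-normal (right-skewed exponential) population, with all numbers hypothetical:

```python
import random
import statistics

random.seed(1)

# A clearly non-normal population: exponential with mean 2 (right-skewed).
population_mean = 2.0

# Draw many samples of size n >= 30 and record each sample mean.
n = 30
sample_means = [
    statistics.mean(random.expovariate(1 / population_mean) for _ in range(n))
    for _ in range(2000)
]

# Despite the skewed population, the sample means cluster around the
# population mean, with spread close to sigma / sqrt(n) = 2 / sqrt(30).
print(statistics.mean(sample_means))   # close to 2.0
print(statistics.stdev(sample_means))  # close to 2 / sqrt(30) ≈ 0.365
```

A histogram of `sample_means` would look roughly bell-shaped, even though the population itself is strongly skewed.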

## Sampling distribution of a sample proportion

If a random variable 'X' is the count of successes in a sample of n trials of a binomial experiment, then the **proportion of successes** (p̂) is given as the 'ratio of the count of successes to the number of trials', i.e., p̂ = X/n.

p̂ is the symbol we use for the sample proportion (a statistic). The true population proportion is then denoted by p.

We discussed earlier that, if 'X' is a binomial random variable, then the mean and standard deviation of sampling distribution of X are given by:

μ_{X} = np, σ_{X} = √(np(1 − p)).

We know that if we divide each term in a data set by the same value n, then the mean and standard deviation of the transformed data set will be those of the original data set, each divided by n.

Therefore, the mean of the proportion of successes is given by:

μ_{p̂} = μ_{X}/n = np/n = p,

and the standard deviation of the proportion of successes is given by:

σ_{p̂} = σ_{X}/n = √(np(1 − p))/n = √(p(1 − p)/n).

**Normal approximation to the sampling distribution of p̂:** Like the binomial distribution, the sampling distribution of p̂ will be approximately normal if n and p are large enough. The test is exactly the same as for the binomial: if X has the B(n, p) distribution and p̂ = X/n, then p̂ has approximately the N(p, √(p(1 − p)/n)) distribution, provided that np ≥ 10 and n(1 − p) ≥ 10.
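A short sketch applying these formulas (the values p = 0.4 and n = 100 are hypothetical, chosen so both conditions hold):

```python
from statistics import NormalDist

# Hypothetical example: true proportion p = 0.4, sample size n = 100.
n, p = 100, 0.4
assert n * p >= 10 and n * (1 - p) >= 10  # normal approximation applies

mean_phat = p                       # mu_phat = p
sd_phat = (p * (1 - p) / n) ** 0.5  # sigma_phat = sqrt(p(1-p)/n)

# Probability that the sample proportion exceeds 0.5, using the
# approximate N(p, sqrt(p(1-p)/n)) sampling distribution of p-hat.
prob = 1 - NormalDist(mean_phat, sd_phat).cdf(0.5)
print(f"sd of p-hat: {sd_phat:.4f}, P(p-hat > 0.5) ≈ {prob:.4f}")
```

So even when 40% of the population are successes, roughly 2% of samples of size 100 would show a majority of successes, purely by sampling variability.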