drgwen.org-Statistics Tutorial

NORMAL DISTRIBUTION

I. Distributions

a. Normal Curve

1) The normal curve is a theoretically perfect frequency polygon that takes the form of a symmetrical, bell-shaped curve, with the mean, median, and mode all coinciding at its center. The curve was developed by De Moivre (a French mathematician), with the notion that many human traits, such as intelligence, attitudes, and personality, are distributed among the population in a fairly "normal" way. For example, most IQ scores fall around the mean (100), and there are relatively few extreme scores, such as those below 55 or above 145.

2) In hypothesis testing, we consider the probability that a given difference or relationship could occur by chance alone. Understanding the normal curve therefore helps in understanding the concepts that underlie hypothesis testing.

3) The baseline of the normal curve is measured in standard deviation units, symbolized by the lowercase letter z. A score one standard deviation above the mean is written +1z, and a score one standard deviation below the mean is written -1z.

4) In a normal distribution, approximately 34% of the scores fall between the mean and one standard deviation above the mean.

68% of the scores fall between -1z and +1z.
34% of scores fall between the mean and +1z
14% of scores fall between +1z and +2z
2% of scores fall between +2z and +3z
34% of scores fall between the mean and -1z
14% of scores fall between -1z and -2z
2% of scores fall between -2z and -3z
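The rounded percentages above can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later for the standard library's `statistics.NormalDist`; the helper name `area_between` is hypothetical, chosen for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (z units).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_between(z_low: float, z_high: float) -> float:
    """Proportion of a normal distribution lying between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

# Bands on one side of the mean (the other side is a mirror image).
for low, high in [(0, 1), (1, 2), (2, 3)]:
    print(f"z = {low} to {high}: {100 * area_between(low, high):.2f}%")

# Symmetric ranges around the mean.
for k in (1, 2, 3):
    print(f"z = -{k} to +{k}: {100 * area_between(-k, k):.2f}%")
```

Run as written, it should print 34.13%, 13.59%, and 2.14% for the one-sided bands and 68.27%, 95.45%, and 99.73% for the symmetric ranges, which is why the list above rounds to 34%, 14%, and 2%.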

Example: The GRE Examination is scaled to have a mean of 500 and a standard deviation of 100. Since 68% of the scores fall between -1z and +1z, 68% of the scores fall between 400 and 600 (-1z = 500 - 100 = 400 and +1z = 500 + 100 = 600). About 95% of the scores (96% using the rounded percentages above) fall between 300 and 700 (-2z = 500 - (2)(100) = 300 and +2z = 500 + (2)(100) = 700).
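The GRE arithmetic is the z-score formula run in reverse (x = M + z × s). A small sketch, where `z_to_raw` is a hypothetical helper name:

```python
def z_to_raw(z: float, mean: float, sd: float) -> float:
    """Convert a z-score back to a raw score: x = M + z * s."""
    return mean + z * sd

GRE_MEAN, GRE_SD = 500, 100  # scaling used in the GRE example

for k in (1, 2):
    low = z_to_raw(-k, GRE_MEAN, GRE_SD)
    high = z_to_raw(k, GRE_MEAN, GRE_SD)
    print(f"-{k}z to +{k}z: {low} to {high}")
```

This prints 400 to 600 for ±1z and 300 to 700 for ±2z, matching the hand calculation.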

b. Percentiles

1) A percentile describes a given score in relation to the other scores in a distribution, and it allows us to compare scores on tests that have different means and standard deviations. A percentile is calculated as:

$$\text{Percentile} = \frac{\text{number of scores less than a given score}}{\text{total number of scores}} \times 100$$

Example: n = 50, your score = 90, and 40 people scored less than 90.

$$\frac{40}{50} \times 100 = 80$$

You achieved a higher score than 80% of those who took the test.
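The percentile formula translates directly into code. This sketch uses an invented class of 50 scores built to match the example (40 scores below 90); the data and the name `percentile_rank` are assumptions for illustration only.

```python
from typing import Sequence

def percentile_rank(score: float, scores: Sequence[float]) -> float:
    """Percent of `scores` that are strictly less than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical class of 50: 40 scores below 90, your 90, and 9 scores above it.
scores = [75] * 40 + [90] + [95] * 9
print(percentile_rank(90, scores))  # 80.0
```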

2) Calculating a percentile when you have a standard score: First look up the z-score in a table of areas under the normal curve to find what percent of the curve falls between the mean and that score. Then, if the sign is positive, add the percentage to 50; if the sign is negative, subtract the percentage from 50.

Example:

An IQ of 115 is +1z (IQ scores have a mean of 100 and a standard deviation of 15), so the percentile is 34.13 + 50 = 84.13.
An IQ of 85 is -1z, so the percentile is 50 - 34.13 = 15.87.
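Instead of a printed table, the area below a z-score can be computed from the normal cumulative distribution function, which performs the "add to or subtract from 50" step automatically. A minimal sketch, assuming Python 3.8+ (`statistics.NormalDist`) and the usual IQ scaling of mean 100 and standard deviation 15; `percentile_from_z` is a hypothetical name. Like the table method, it only applies when the scores are normally distributed.

```python
from statistics import NormalDist

def percentile_from_z(z: float) -> float:
    """Percent of a normal distribution lying below the given z-score."""
    return 100 * NormalDist().cdf(z)

IQ_MEAN, IQ_SD = 100, 15  # assumed IQ scaling, consistent with 115 being +1z

for iq in (115, 85):
    z = (iq - IQ_MEAN) / IQ_SD
    print(f"IQ {iq}: z = {z:+.2f}, percentile = {percentile_from_z(z):.2f}")
```

This prints a percentile of 84.13 for an IQ of 115 and 15.87 for an IQ of 85.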

c.  Standard Scores

1) Standard scores express a score in terms of its distance from the mean, measured in standard deviation units. A z-score is a standard score. In research, standard scores are used more often than percentiles.

2) Formula for a standard score:

$$z = \frac{x - M}{s}$$

where x is the raw score, M is the mean, and s is the standard deviation.

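The standard-score formula as a function, a sketch with the hypothetical name `z_score`, checked against the GRE and IQ figures used earlier in this tutorial:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: distance of x from the mean in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(600, 500, 100))  # GRE example: 1.0
print(z_score(85, 100, 15))    # IQ example: -1.0
```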
d. Transformed Standard Score

1) Because calculating z-scores results in decimals and negative numbers, some prefer to transform them into other distributions. One distribution that has been widely used has a mean of 50 and a standard deviation of 10.

2) Scores transformed to this distribution are referred to as T-scores.

3) To convert a z-score to a T-score, use the following formula:

T = 10z + 50

Example:  With a z-score of 2.5, the T-score would be

T = (10)(2.5) + 50

T = 25 + 50

T = 75
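The T-score conversion is a one-line linear transformation; `t_score` is a hypothetical helper name:

```python
def t_score(z: float) -> float:
    """Transformed standard score with mean 50 and SD 10: T = 10z + 50."""
    return 10 * z + 50

print(t_score(2.5))   # 75.0, as in the worked example
print(t_score(-1.5))  # 35.0: a below-average score without a negative sign
```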

e.  Non-Normal Distributions

1) When scores are not spread roughly equally on both sides of a distribution but are concentrated on one side, the distribution is referred to as skewed. This disproportionate hump of scores causes a "tail" to form at the opposite end of the distribution.

2) A positively skewed distribution has a tail extending on the right or positive side of the distribution.

3) A negatively skewed distribution has a tail extending on the left or negative side of the distribution.

4) Even a bell-shaped curve need not be normal. The measure of the relative peakedness or flatness of a curve is called kurtosis. A narrow, peaked curve is leptokurtic, a flatter curve is platykurtic, and the normal curve itself is described as mesokurtic.
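Skewness and kurtosis can be estimated from the data's central moments. This sketch uses one common moment-based definition; statistical packages often apply small-sample corrections, so their values may differ slightly. The function names and the two small data sets are hypothetical, chosen for illustration.

```python
from statistics import fmean
from typing import Sequence

def central_moment(data: Sequence[float], k: int) -> float:
    """k-th central moment: the average of (x - mean) ** k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data: Sequence[float]) -> float:
    """Moment-based skewness: > 0 means a right (positive) tail, < 0 a left tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data: Sequence[float]) -> float:
    """Kurtosis minus 3, so a normal curve scores 0.
    > 0 suggests a leptokurtic shape, < 0 a platykurtic one."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

right_tailed = [1, 1, 1, 2, 2, 3, 10]
flat = [1, 2, 3, 4, 5]
print(f"{skewness(right_tailed):.2f}")  # positive: tail on the right
print(f"{skewness(flat):.2f}")          # 0.00: symmetric
print(f"{excess_kurtosis(flat):.2f}")   # -1.30: flatter than normal
```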

f.  Central Limit Theorem

1) It has been shown that when many random samples of the same size are drawn from a population, the means of these samples tend to be normally distributed, even when the population itself is not. The larger the size of each sample (n), the more closely the distribution of sample means approaches the normal curve.

2) To calculate the standard scores needed to locate a sample mean under the normal curve, we need to know the standard deviation of the distribution (z = (x - M)/s). The standard deviation of a distribution of sample means is called the standard error of the mean ($s_{\bar{x}}$). The term error reflects the fact that, because of sampling error, each sample mean is likely to deviate somewhat from the true population mean.

3) It has also been shown that there is a constant relationship between the standard deviation of a distribution of sample means (the standard error of the mean), the standard deviation of the population from which the samples were drawn, and the size of the samples. The formula for the standard error of the mean is:

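$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

where s is the standard deviation of the population and n is the size of each sample.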
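The standard error formula in code, a minimal sketch (`standard_error` is a hypothetical name; `sd` is whatever standard deviation you supply, in practice often the sample standard deviation used as an estimate of the population's):

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)

for n in (9, 36, 81):
    print(n, standard_error(27, n))
```

Because n sits under a square root, quadrupling the sample size from 9 to 36 halves the standard error (9.0 to 4.5).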

4) To summarize the central limit theorem:

As n increases:

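1. The sampling distribution of the sample means approaches a normal distribution.

2. The sample means cluster more closely around the population mean, because the standard error (s/√n) gets smaller. (The mean of the sample means equals the population mean for any n.)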
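A quick simulation can make this summary concrete. This sketch assumes an exponential population (mean 1, standard deviation 1), chosen only because it is clearly skewed; the sample sizes, the 2,000 repetitions, the seed, and the name `sample_means` are arbitrary choices for illustration.

```python
import math
import random
import statistics
from typing import List

def sample_means(n: int, num_samples: int, seed: int = 1) -> List[float]:
    """Means of `num_samples` random samples of size n drawn from an
    exponential population with mean 1 and standard deviation 1 (right-skewed)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

POPULATION_MEAN, POPULATION_SD = 1.0, 1.0

for n in (2, 10, 100):
    means = sample_means(n, num_samples=2000)
    print(
        f"n={n:>3}: mean of means = {statistics.fmean(means):.3f} "
        f"(population mean {POPULATION_MEAN}), "
        f"SD of means = {statistics.stdev(means):.3f} "
        f"(s/sqrt(n) = {POPULATION_SD / math.sqrt(n):.3f})"
    )
```

Exact numbers depend on the seed, but the mean of the means should stay near 1 and the SD of the means should track s/√n as n grows; a histogram of the means for n = 100 would look far more bell-shaped than the skewed population itself.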

g.  Confidence Intervals

1) Since sample means are approximately normally distributed, we can use the standard deviation of the distribution of means (the standard error of the mean) to determine areas under the normal curve, and from these determine how confident we can be that the population mean falls within a certain interval.

2) In a normal distribution, 95% of the cases fall within ±1.96 standard deviations of the mean and 99% fall within ±2.58 standard deviations. Given these characteristics of the normal curve, there is a 95% probability that a given sample mean will fall within ±1.96 standard errors of the actual population mean, and a 99% probability that it will fall within ±2.58 standard errors.
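The values 1.96 and 2.58 are the z-scores that leave 2.5% and 0.5% of the curve in each tail. A sketch recovering them with the inverse normal CDF (`statistics.NormalDist.inv_cdf`, Python 3.8+); `critical_z` is a hypothetical name:

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical value: the z leaving (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.95, 0.99):
    z = critical_z(level)
    covered = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"{level:.0%}: z = ±{z:.2f}, area between = {covered:.4f}")
```

This prints ±1.96 and ±2.58; the second value is a rounding of about 2.576.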

3) To set confidence intervals, use the following formulas:

$$95\%\ \text{CI} = M \pm 1.96\, s_m$$

$$99\%\ \text{CI} = M \pm 2.58\, s_m$$

where M is the sample mean and $s_m$ is the standard error of the mean.

Example: Sample M = 100, SD = 27, n = 81

First, calculate the standard error of the mean (the standard deviation divided by the square root of n):

$$s_m = \frac{27}{\sqrt{81}} = \frac{27}{9} = 3$$

Next, calculate the 95% CI:

$$M \pm 1.96\, s_m = 100 \pm (1.96)(3) = 100 \pm 5.88$$

The 95% confidence interval therefore runs from 94.12 to 105.88.
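The worked example can be reproduced in a few lines. This sketch assumes the example's inputs (a sample mean of 100, as used in the calculation, SD = 27, n = 81) and the rounded critical values 1.96 and 2.58 from the text; `confidence_interval` is a hypothetical name.

```python
import math
from typing import Tuple

def confidence_interval(mean: float, sd: float, n: int,
                        z: float = 1.96) -> Tuple[float, float]:
    """Return (lower, upper) for M ± z * s_m, where s_m = sd / sqrt(n)."""
    s_m = sd / math.sqrt(n)  # standard error of the mean
    margin = z * s_m
    return mean - margin, mean + margin

low95, high95 = confidence_interval(100, 27, 81)          # z = 1.96
low99, high99 = confidence_interval(100, 27, 81, z=2.58)
print(f"95% CI: {low95:.2f} to {high95:.2f}")  # 94.12 to 105.88
print(f"99% CI: {low99:.2f} to {high99:.2f}")  # 92.26 to 107.74
```

The 99% interval is wider because demanding more confidence requires a larger critical value.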
