drgwen.org-Statistics Tutorial

MEASURES OF CENTRAL TENDENCY

I. Levels of Inquiry

a. Research studies are conducted to answer questions and to test hypotheses. Research questions take on different forms depending on the level of inquiry.

1) What is this?
2) What's happening here?
3) What will happen if?
4) How can I make .... happen?

b. Type of data collected and methods of statistical analysis must be considered during the planning of the project.

II. Levels of Measurement

a. Measurement: assignment of numbers to characteristics according to some rule. Types:

1) Nominal Scale: the first and lowest scale in the hierarchy. Includes labeling and categories.

2) Ordinal Scale: Members of a set (e.g., objects, people) are ordered from most to least with respect to some characteristic. Examples-measures of performance, attitude, personality.

3) Interval Scale: In addition to ranking or ordering individuals or objects on some characteristic, the distance between the points on the scale is known. The distance between any two adjacent points on the scale is the same as the distance between any other two adjacent points.

4) Ratio Scale: Most precise measurement scale-includes all the characteristics of the lower three scales, and in addition, has a true zero.

III. Presenting Descriptive Data

a. Purpose of statistics is to reduce data to a manageable and understandable form. Descriptive statistics used to describe characteristics of the sample under study.

b. Frequency Distributions

1) List of scores from highest to lowest and tally how many people got each score.

2) Class intervals: If more than 10 to 20 scores, combine scores into groups for presentation in tables.

1. class intervals should be mutually exclusive and exhaustive
2. avoid open-ended intervals

3) Absolute Frequency (f): is the number of subjects who received a certain score or whose score fell within a particular class interval. (Example: n=50, 20 people get scores between 70-79)

4) Relative Frequency: calculated by converting actual numbers to percents. (Example from above: n=50, 20/50 x 100 = 40%)
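
The tallying steps above can be sketched in Python. This is a minimal illustration, not part of the original tutorial: the function name frequency_table, the interval width of 10, and the list of scores are all assumptions made for the example, and only intervals that contain at least one score are listed.

```python
from collections import Counter

def frequency_table(scores, width=10):
    """Tally integer scores into class intervals of the given width.

    Returns rows of (low, high, f, percent), highest interval first.
    """
    n = len(scores)
    # Map each score to the lower bound of its interval, e.g. 74 -> 70.
    counts = Counter((s // width) * width for s in scores)
    rows = []
    for low in sorted(counts, reverse=True):
        f = counts[low]                    # absolute frequency
        percent = 100 * f / n              # relative frequency
        rows.append((low, low + width - 1, f, percent))
    return rows

# Hypothetical scores, for illustration only.
scores = [72, 75, 78, 81, 85, 88, 90, 93, 67, 79]
for low, high, f, percent in frequency_table(scores):
    print(f"{low}-{high}: f = {f}, relative frequency = {percent:.0f}%")

# The example from the text: 20 of 50 scores fall between 70 and 79.
example_percent = 100 * 20 / 50
print(example_percent)
```

Because each score is mapped to exactly one lower bound, the intervals are mutually exclusive and exhaustive by construction.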

c. Rounding Off

1) If the digit to be dropped is less than 5, round down; if it is greater than 5, round up.

Example: 4.423=4.42
4.426=4.43

2) If the digit to be dropped is exactly 5, round so that the preceding digit is even. This avoids systematic bias up or down.
Example: 4.425=4.42
4.435=4.44

3) Whole numbers, same rules prevail.
Example: 8.2=8 8.8=9
8.5=8 9.5=10
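
These rounding rules correspond to what Python's standard decimal module calls ROUND_HALF_EVEN. The sketch below reproduces the examples above. The helper name round_half_even is an assumption made for illustration, and values are passed as strings so that they are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, quantum):
    """Round a decimal string to the precision of `quantum` ("0.01", "1", ...).

    Ties (a dropped digit of exactly 5) go to the nearest even digit.
    """
    return Decimal(value).quantize(Decimal(quantum), rounding=ROUND_HALF_EVEN)

# Examples from the text, rounded to two decimal places.
for value in ["4.423", "4.426", "4.425", "4.435"]:
    print(value, "->", round_half_even(value, "0.01"))

# Rounding to whole numbers follows the same rules.
for value in ["8.2", "8.8", "8.5", "9.5"]:
    print(value, "->", round_half_even(value, "1"))
```

Python's built-in round() uses the same tie rule, but a binary float such as 4.425 is not stored exactly, so Decimal is the safer way to check examples like these.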

d. Graphic Methods

1) Histogram: most commonly used graphical method. Type of bar graph and used with continuous rather than categorical data (interval and ratio data). Factors to keep in mind:

1. Use "true" class intervals on the horizontal line.
2. If possible, have the class intervals equal.
3. With equal class intervals, all columns have the same width.
4. The vertical and horizontal axes should be about equal in length.
5. The graph should be clearly labeled with a title and labels for both axes.
6. The graph should be understandable without reference to the text.

2) Frequency Polygon: also commonly used graphic method, used separate from histogram. Useful if you want to place two or more distributions on the same pair of axes for sake of making comparisons.

3) Cumulative Percentage Polygon: used less often than the frequency polygon, but preferred when trying to indicate position of a given score in relation to the distribution rather than the overall form of the distribution. Vertical axis is composed of cumulative percentages rather than frequencies. Horizontal axis is the same as in the histogram.

4) Bar Graph: Used with categorical data arranged from lowest to highest. Bars may be either horizontal or vertical.

5) Pie Charts: Not used very often in research reports, generally used by newspapers.

IV. Descriptive Statistics

a. Samples and Populations

1) Population: includes all members of a defined group.
2) Sample: subset of the target population.
3) Parameters: characteristics of populations (denoted by Greek letters).
4) Statistics: characteristics of samples (denoted by Roman letters).

b. Symbols

1) Greek letter sigma (∑) means "the sum of." Example: ∑X means the sum of X when all scores in the X distribution are added up.

X Scores
X1 = 4
X2 = 4
X3 = 10
X4 = 5
X5 = 7
_______
∑X = 30

Distinguish between

∑X
∑X²
(∑X)²

To calculate ∑X², you must first square each of the numbers in the X distribution. This results in 16, 16, 100, and so on. The sum of all these squared numbers is ∑X² (here 16 + 16 + 100 + 25 + 49 = 206).

To calculate (∑X)², simply square ∑X: (∑X)² = (30)² = 900.
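
The difference between the three quantities is easy to check in code. A minimal Python sketch using the five X scores above (the variable names are chosen here for illustration):

```python
X = [4, 4, 10, 5, 7]

sum_x = sum(X)                            # sum of X: add the raw scores
sum_of_squares = sum(x ** 2 for x in X)   # sum of X squared: square each score, then add
square_of_sum = sum(X) ** 2               # (sum of X) squared: add first, then square

print(sum_x, sum_of_squares, square_of_sum)
```

The order of operations is the whole point: squaring before summing gives 206, while summing before squaring gives 900.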

c. Measures of Central Tendency: Measures of central tendency are single points on the measurement scale for a given variable. Many of the variables used in behavioral sciences are distributed so that most scores fall in the middle, with fewer scores falling on either side, in the "tails" of the distribution. There are distributions, however, that do not assume such a "normal" shape. The shape of the distribution and the dispersion of the scores must be known in order to interpret the data correctly. Most common measures are mean, median, and mode. These measures describe the "middle" of a group of scores.

1) Mean: Mean is the arithmetic average. Add up the scores and divide by the number of scores in the distribution. Symbol is x bar which stands for the mean of the sample, and, µ which stands for the mean of the population. The formula for the sample mean is x bar = ∑X/n, where n = the number in the sample. Characteristics of the mean include:

1. Extreme values can distort the mean.
2. The sum of the deviations of the scores in the distribution from the mean always equals zero

X	x = (X - x bar)
4	4 - 6 = -2
4	4 - 6 = -2
10	10 - 6 = 4
5	5 - 6 = -1
7	7 - 6 = 1
∑X = 30	∑x = 0

Mean = 6

3. The sum of the squares of the deviations around the mean is smaller than the sum of squares around any other value, such as the median or the mode.

X	x = (X - x bar)	x²	X - median	(X - median)²	X - mode	(X - mode)²
4	4 - 6 = -2	4	4 - 5 = -1	1	4 - 4 = 0	0
4	4 - 6 = -2	4	4 - 5 = -1	1	4 - 4 = 0	0
10	10 - 6 = 4	16	10 - 5 = 5	25	10 - 4 = 6	36
5	5 - 6 = -1	1	5 - 5 = 0	0	5 - 4 = 1	1
7	7 - 6 = 1	1	7 - 5 = 2	4	7 - 4 = 3	9
∑ = 30	0	26	5	31	10	46

Mean = 6
Median = 5
Mode = 4
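
The two properties illustrated in the tables above (deviations from the mean sum to zero, and the sum of squared deviations is smallest around the mean) can be verified with a short Python sketch. The helper names sum_of_deviations and sum_of_squares are assumptions made for this illustration.

```python
X = [4, 4, 10, 5, 7]
mean = sum(X) / len(X)   # arithmetic average of the scores

def sum_of_deviations(scores, center):
    """Sum of (score - center) over all scores."""
    return sum(s - center for s in scores)

def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores around `center`."""
    return sum((s - center) ** 2 for s in scores)

print("mean:", mean)
print("sum of deviations around the mean:", sum_of_deviations(X, mean))

# Compare the sum of squares around the mean (6), median (5), and mode (4).
for label, center in [("mean", mean), ("median", 5), ("mode", 4)]:
    print(label, sum_of_squares(X, center))
```

Trying any other center in place of 5 or 4 should also give a sum of squares larger than the one around the mean.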

2) Median: this is the midpoint in a set of ranked scores. In other words, the median is the point below which one half of the scores lie. It is the 50th percentile. To calculate the median, you must first put the numbers in order from lowest to highest. If the number of scores is odd, the median is the middle score. If the number of scores is even, the median is halfway between the two middle scores.

Scores	Scores in Rank Order	Median
2 7 6 3	2 3 6 7	(3 + 6)/2 = 4.5
4 1 3 5 7	1 3 4 5 7	4
8 7 9 3	3 7 8 9	(7 + 8)/2 = 7.5
6 2 5 3 1	1 2 3 5 6	3
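
The procedure for finding the median can be written as a short Python function. This is a sketch of the steps described above (rank the scores, then take the middle score or the average of the two middle scores), checked against the four rows of the table. The function name median is chosen here for illustration.

```python
def median(scores):
    """Return the midpoint of a set of scores."""
    ordered = sorted(scores)      # rank from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the middle score
    # even count: halfway between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

for scores in ([2, 7, 6, 3], [4, 1, 3, 5, 7], [8, 7, 9, 3], [6, 2, 5, 3, 1]):
    print(scores, "->", median(scores))
```

Python's standard library offers statistics.median, which follows the same rule.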

3) Mode: least frequently used, but it is the only measure applicable to categorical data. It is the most frequently occurring score in a distribution. It is usually located near the center of the distribution, but this is not always the case. If there are two modes, the distribution is called bimodal.

Scores	Mode
5 1 7 9 3 5	5
1 3 8 7 7 3 8 7	7
1 4 5 4 6 5 8 7	4 and 5
1 3 8 2 9 5	No mode
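
Finding the mode amounts to counting how often each score occurs. The Python sketch below handles the bimodal and "no mode" cases from the table. The function name modes, and the choice to return an empty list when every score occurs only once, are assumptions made for this illustration.

```python
from collections import Counter

def modes(scores):
    """Return the most frequently occurring score(s), or [] if there is no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []                 # every score occurs once: no mode
    return sorted(s for s, c in counts.items() if c == highest)

for scores in ([5, 1, 7, 9, 3, 5],
               [1, 3, 8, 7, 7, 3, 8, 7],
               [1, 4, 5, 4, 6, 5, 8, 7],
               [1, 3, 8, 2, 9, 5]):
    print(scores, "->", modes(scores) or "No mode")
```

Because the function only counts occurrences, it works equally well on categorical labels such as strings.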

4) Comparison of Mean, Median, and Mode

1. Selection of a method for describing the central tendency of the data depends in part on the scale of measurement of the variable. If data are nominal, only the mode is used. With ordinal data, the mode or median is used; if ordinal data are treated at the interval level, the mean may be used to describe the center of the data. When data are at the interval or ratio level of measurement, any of these measures may be used.

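2. Statistically, the mean is more stable from sample to sample than the median or mode. It is also considered the most sensitive measure, because every score in the distribution enters into its calculation.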

3. Median is helpful with extreme scores or truncated data.

4. Mode used mostly for qualitative data.

V. Measures of Dispersion

a. Range

1) Simplest to calculate, used for range of scores.

2) To calculate, simply subtract lowest score from the highest. Example: If we subtract Joan's score from Sally's (98-65=33), we find that the range for the examination is 33.

3) In reporting results, might say that the scores ranged from 65 to 98, more meaningful than saying range was 33.

4) Range can be used to compare variability among distributions. For example, the means for two exams were both 80; however, scores on one exam went from 50 to 100 and on the other from 70 to 90. Although the means were identical, the range of one was 50 and the range of the other was 20.

b. Interquartile Range (IR)

1) Scores from standardized examinations often reported in percentiles. The percentile rank for a score received on a particular test indicates what percent of the scores fall below that score. For example, a score of 78 may represent a percentile of 94, and you would know that you had done better than 94 percent of the individuals on whom the test was standardized. Percentiles let you know how you stand among your peers.

2) 50th percentile is at the midpoint, also the median. Percentiles are reported in terms of quartiles. The 100 percentile points are divided into four "quarters." The 25th percentile is called the first quartile (Q1), the 50th percentile is the second quartile (Q2), the 75th percentile is the third quartile (Q3), and the 100th percentile is the fourth quartile (Q4).

3) Simple range may be unstable because of extreme values. Interquartile Range (IR) can be used to deal with this difficulty. The IR is defined as Q3 - Q1. This gives the range of scores from the 25th to the 75th percentile, the middle 50% of the data.

c. Semi-interquartile range (Q)

1) Defined as (Q3 - Q1)/2, or the average amount by which these two quartiles vary from the median (50th percentile).

2) To locate the 25th percentile, take the number of cases in a distribution (n) and divide by 4 (or multiply by .25), and to locate the 75th percentile, take 3/4 of n (or multiply n times .75).

3) IR and Q reported when range is not representative of the distribution because of some extreme values.
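
The range, IR, and Q can be computed with Python's standard statistics module, as in the hedged sketch below. It uses the eight scores from the standard deviation example in this tutorial. Note the assumption: statistics.quantiles interpolates between neighboring scores (its default "exclusive" method), which is one common convention and may give slightly different cut points than the simple n/4 counting rule described above.

```python
import statistics

scores = [96, 81, 97, 97, 87, 70, 83, 77]

# Simple range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartiles Q1, Q2 (the median), and Q3. The data are sorted internally.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1          # interquartile range (IR): spread of the middle 50%
semi_iqr = iqr / 2     # semi-interquartile range (Q)

print("range:", score_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IR:", iqr, "Q:", semi_iqr)
```

Unlike the simple range, IR and Q do not change if the lowest or highest score is replaced by a more extreme value.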

d. Standard Deviation

1) Most commonly reported measure of variability. Usually, if mean is reported as the measure of central tendency, the standard deviation is reported as the measure of variability. Means and standard deviations are generally reported together, whether in texts or tables. Standard deviation represents the average amount by which the scores vary from the central score, the mean.

2) Reminder: lowercase x stands for the deviation of a given score from the mean and is calculated as x = X - x bar. ∑x² stands for the sum of the squared deviations. Small letter n stands for the number of subjects in a sample, and uppercase N stands for the number of subjects in a population. Greek letters are used for measures of the population (parameters), and Roman letters for measures of the sample (statistics). Lowercase sigma, σ, is used for the parameter, and s is used for the statistic.

σ (standard deviation for population) = √(∑x²/N)

s (standard deviation for sample) = √(∑x²/(n - 1))

1. Subtracting 1 from the number of subjects in the sample gives an “unbiased” estimate of the variance of the population, and therefore a better estimate of its standard deviation.

2. Standard deviation is a measure of squared deviations around the mean. It is a measure of the “least squares” around the mean.

3) Variance

1. Variance is the average of the squared deviations.

2. Formula is the one for standard deviation without the square root sign. Population variance is denoted σ² and sample variance is denoted s². This is because the variance is the standard deviation squared. The formulas are:

σ² = ∑x²/N = [∑X² - (∑X)²/N] / N

s² = ∑x²/(n - 1) = [∑X² - (∑X)²/n] / (n - 1)

Standard Deviation Example

X	X²
96	9216
81	6561
97	9409
97	9409
87	7569
70	4900
83	6889
77	5929
∑X = 688	∑X² = 59882

n = 8
Mean = 86
Mode = 97
Median = 85


SD for Sample

s = √{[∑X² - (∑X)²/n] / (n - 1)}
  = √{[59882 - (688)²/8] / (8 - 1)}
  = √{[59882 - 473344/8] / 7}
  = √[(59882 - 59168) / 7]
  = √102
  = 10.10

SD for Population

σ = √{[∑X² - (∑X)²/N] / N}
  = √{[59882 - (688)²/8] / 8}
  = √{[59882 - 473344/8] / 8}
  = √[(59882 - 59168) / 8]
  = √89.25
  = 9.45
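
The worked example above can be checked in Python. This is a minimal sketch that computes the sum of squared deviations two ways (from the raw-score formula and directly from the deviations) and compares the results with the standard library's statistics.stdev (sample) and statistics.pstdev (population). The variable names are chosen here for illustration.

```python
import math
import statistics

X = [96, 81, 97, 97, 87, 70, 83, 77]
n = len(X)

sum_x = sum(X)                      # sum of the raw scores
sum_x2 = sum(x * x for x in X)      # sum of the squared raw scores

# Raw-score (computational) formula for the sum of squared deviations.
ss_raw = sum_x2 - sum_x ** 2 / n

# Definitional formula: square each deviation from the mean, then add.
mean = sum_x / n
ss_dev = sum((x - mean) ** 2 for x in X)

sample_sd = math.sqrt(ss_raw / (n - 1))     # divide by n - 1 for a sample
population_sd = math.sqrt(ss_raw / n)       # divide by N for a population

print("sum of X:", sum_x, "sum of X squared:", sum_x2)
print("sum of squared deviations:", ss_raw, ss_dev)
print("sample SD:", round(sample_sd, 2))
print("population SD:", round(population_sd, 2))
print("library check:", statistics.stdev(X), statistics.pstdev(X))
```

Both routes to the sum of squared deviations should agree, which is why the raw-score formula can be used without first computing each deviation.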

