drgwen.org-Statistics Tutorial

MULTIPLE CORRELATION & REGRESSION

I. Multiple Correlations (Overview)

a. The relationship is measured between one variable and a combination of other variables. In simple correlation (r), there is one independent variable (X) and one dependent variable (Y). In multiple correlation (R), there is more than one independent variable (X1, X2, X3, and so on) and one dependent variable (Y).

II. Regression

a. Introduction

1) Regression is a technique that makes use of the correlation between variables and the notion of a straight line to develop a prediction equation. Once a relationship has been established between two variables, it is possible to develop an equation that allows us to predict the score of one of the variables, given the score of the other.

2) In multiple correlation, regression is used to establish a prediction equation (independent variables are each assigned a weight based on their relationship to the dependent variable).

3) Regression may be used in relation-searching and association-testing.

III. Simple Linear Regression

a. Simple Regression: A correlation between two variables is used to develop a prediction equation. The equation is based on a linear relationship.

1) The higher the correlation (in absolute value, whether positive or negative), the more accurate the prediction.

2) To be able to make predictions, the relationship between two variables, the independent (X) and the dependent (Y), must be measured. If there is a correlation, a regression equation can be developed that will allow prediction of Y, given X.

3) "Regression" means literally a falling back toward the mean. Each prediction "regresses" back toward the mean of Y by an amount that depends on the strength of the correlation: the weaker the correlation, the closer the predicted score falls to the mean.

4) Prediction Equation

Y' = a + bX

1. Y' is the predicted score. Given data on X and Y from a sample of subjects called the regression sample, a and b can be calculated. With those two measures, Y can be predicted given X.

2. The letter a is called the intercept constant and is the value of Y when X=0. It is the point at which the regression line intercepts the Y axis.

3. The letter b is called the regression coefficient and is the rate of change in Y with a unit change in X. It is a measure of the slope of the regression line.

4. The regression line is the "line of best fit" and is formed by a technique called the method of least squares. Because the mean is the center of the data, the sum of the deviations of the scores around the mean, ∑(X - M), adds up to 0. Also, if you square those deviations and add them, that number will be smaller than the sum of the squared deviations around any other value. In the same way, the regression line passes through the center of the scatter diagram, the point defined by the mean of X and the mean of Y; thus, it is the "line of best fit". The regression line represents the predicted scores (Y'), but since prediction is not perfect, actual scores (Y) deviate somewhat from predicted scores. Because the regression line passes through the center of the pairs of scores, the deviations from the regression line, ∑(Y - Y'), add up to 0, and the sum of the squared deviations, ∑(Y - Y')², is smaller than it would be around any other straight line.
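The calculation of a and b described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from this tutorial: the data values and the function name simple_regression are made up for the example. It uses the standard least-squares formulas (b is the sum of cross-products of deviations divided by the sum of squared X deviations, and a is the mean of Y minus b times the mean of X).

```python
# Illustrative data (invented for this sketch, not from the tutorial).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def simple_regression(xs, ys):
    """Return (a, b) for Y' = a + bX using the method of least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-products of deviations over sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: chosen so the line passes through (mean of X, mean of Y).
    a = mean_y - b * mean_x
    return a, b

a, b = simple_regression(X, Y)

# Predicted scores (Y') and deviations of actual from predicted scores (Y - Y').
predicted = [a + b * x for x in X]
residuals = [y - yp for y, yp in zip(Y, predicted)]

print("a =", round(a, 4), "b =", round(b, 4))
print("sum of (Y - Y') =", round(sum(residuals), 10))
```

For these five pairs of scores the slope works out to 0.6 and the intercept to 2.2, and the deviations from the line add up to 0 (apart from floating-point rounding), as item 4 states.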

IV. Multiple Regression: This is possible when there is a measurable multiple correlation between a group of predictor variables and one dependent variable. The prediction equation is:

Y' = a + b1X1 + b2X2 + b3X3 + ... + bkXk
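As a rough sketch of how the weights in this equation can be found, the example below solves the least-squares normal equations in plain Python and then computes the multiple correlation R as the correlation between the actual scores (Y) and the predicted scores (Y'). The data and all function names (solve_linear_system, fit_multiple_regression, predict, multiple_r) are hypothetical and chosen only for illustration; statistical packages normally do this work.

```python
def solve_linear_system(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented copy, so the caller's lists are not modified.
    M = [list(A[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

def fit_multiple_regression(predictors, ys):
    """Return [a, b1, ..., bk] for Y' = a + b1X1 + ... + bkXk."""
    # A leading 1 in each row makes the first weight the intercept a.
    rows = [[1.0] + list(p) for p in predictors]
    m = len(rows[0])
    # Normal equations: (X'X) w = X'Y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve_linear_system(xtx, xty)

def predict(weights, p):
    """Predicted score Y' for one subject's predictor values."""
    return weights[0] + sum(b * x for b, x in zip(weights[1:], p))

def multiple_r(predictors, ys, weights):
    """Multiple R: Pearson correlation between actual Y and predicted Y'."""
    preds = [predict(weights, p) for p in predictors]
    n = len(ys)
    mean_y = sum(ys) / n
    mean_p = sum(preds) / n
    cross = sum((y - mean_y) * (p - mean_p) for y, p in zip(ys, preds))
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    ss_p = sum((p - mean_p) ** 2 for p in preds)
    return cross / (ss_y * ss_p) ** 0.5

# Illustrative data: six subjects, two predictors (X1, X2), one outcome (Y).
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [3, 4, 8, 8, 13, 12]

weights = fit_multiple_regression(predictors, ys)
R = multiple_r(predictors, ys, weights)
print("a, b1, b2 =", [round(w, 4) for w in weights])
print("R =", round(R, 4), "R squared =", round(R * R, 4))
```

The sketch assumes the predictors are not perfectly correlated with one another; if they were, the normal equations would have no unique solution.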

a. Significance Testing

1) In simple linear regression, the correlation between the two variables is tested for significance, and r² (the proportion of variance in Y accounted for by X) represents meaningfulness.

2) With multiple correlation, we are interested not only in the significance of the overall R, and thus the amount of variance accounted for (R²), but also in the significance of each of the independent variables.

3) In multiple regression, the multiple correlation is tested for significance and each of the b-weights is also tested for significance. Testing the b-weight tells whether or not the independent variable associated with it is contributing significantly to the variance accounted for in the dependent variable.

4) The F distribution is used for testing the significance of R², and either the F or the t distribution is used to test the significance of each b.

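5) When testing the significance of R², the degrees of freedom (df) are k and n - k - 1, often written k/(n - k - 1): k for the numerator and n - k - 1 for the denominator of the F ratio (this notation names a pair of values, not a division). The k stands for the number of independent variables, and n stands for the number of subjects. When testing the significance of a b-weight with F, the df are 1 and n - k - 1; with t, the df is n - k - 1.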
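To make the degrees of freedom concrete, here is a small sketch of the F ratio for R². The formula F = (R²/k) / ((1 - R²)/(n - k - 1)) is the standard one but is not written out in this tutorial, and the function name f_for_r_squared and the input values are hypothetical. The sketch stops at the F value and its df; judging significance still requires comparing F with a critical value from an F table or statistical software.

```python
def f_for_r_squared(r_squared, n, k):
    """Return (F, df_numerator, df_denominator) for testing R squared.

    n is the number of subjects and k is the number of independent variables.
    """
    df_num = k
    df_den = n - k - 1
    # Variance accounted for per predictor, over unexplained variance per df.
    f_value = (r_squared / df_num) / ((1.0 - r_squared) / df_den)
    return f_value, df_num, df_den

# Hypothetical result: R squared = 0.50 with 3 predictors and 24 subjects.
f_value, df_num, df_den = f_for_r_squared(0.50, n=24, k=3)
print("F =", round(f_value, 2), "with df =", df_num, "and", df_den)
```

With these made-up numbers the df are 3 and 20, and F is about 6.67.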
