REGRESS [NOCONST] [NOPRINT] dependentVector independentVector {independentVector} resultVariable

Replaces the contents of the result variable with the coefficients of the linear regression equation determined by the dependent vector and the independent vector(s).

The first vector argument is taken to be the dependent variable, the last vector is taken to be the result variable, and those in between are taken to be the independent vectors.

The REGRESS command solves for the coefficients of the equation of the form:

Y = a_n X_n + . . . + a_2 X_2 + a_1 X_1 + a_0

where the X_i are the independent variables and the a_i are the coefficients. The coefficients in resultVariable are in the same order as their respective independent vectors (i.e., the X_i), followed by the constant term a_0 as the last element.
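
To make the form of the equation concrete, here is a minimal Python sketch of an ordinary least-squares fit. It is not Statistics101 code and not how REGRESS is implemented internally; the function name `regress` and the sample data are hypothetical. It assumes the result holds one coefficient per independent vector, in argument order, with the constant term last, which is the layout the example output further down suggests.

```python
def regress(y, *xs):
    """Least-squares fit of y on the independent vectors xs.

    Returns [a1, ..., an, a0]: one coefficient per independent vector,
    in argument order, followed by the constant term (assumed layout).
    """
    n_obs = len(y)
    # Design matrix rows: [x1, x2, ..., xn, 1]; the trailing 1 carries a0.
    rows = [[x[i] for x in xs] + [1.0] for i in range(n_obs)]
    k = len(xs) + 1

    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[p] * r[q] for r in rows) for q in range(k)]
        + [sum(r[p] * y[i] for i, r in enumerate(rows))]
        for p in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[p][k] / aug[p][p] for p in range(k)]


# Made-up data that lies exactly on Y = 2*X1 + 3*X2 + 5.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b + 5 for a, b in zip(x1, x2)]

coefficients = regress(y, x1, x2)
print(coefficients)  # close to [2.0, 3.0, 5.0], up to rounding error
```

Because the data is generated without noise, the fit recovers the slopes 2 and 3 and the constant 5, in that order.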

The keyword noprint is ignored because Statistics101 does not print anything when REGRESS executes. Allowing the noprint keyword is for backward compatibility with Resampling Stats.

The keyword noconst is also ignored.

REGRESS does not automatically do a CLEAN. This differs from what the original Simon and Bruce Resampling Stats Manual says. The CLEAN command has been added to make that capability available at any time.

Example from "Statistics the Easy Way" by Douglas Downing and Jeffrey Clark, p. 286: "Find multiple regression coefficients for private construction activity between 1976 and 1994." Note that each COPY command (like all commands) must be on one line, even though some are shown here on several lines to fit the page or screen. The "\" is a marker that Statistics101 interprets to mean that the line is continued on the next line.

COPY 1,19 trend
COPY 1976,1994 year
COPY (5.04 5.54 7.93 11.19 13.36 16.38 12.26   \
 9.09 10.23 8.10 6.81 6.66 7.57 9.21 8.10 5.69 \
3.52 3.02 4.21) interestRate
COPY (7.7 7.1 6.1 5.8 7.1 7.6 9.7 9.6 7.5 7.2 \
7.0 6.2 5.5 5.3 5.5 6.7 7.4 6.8 6.1) \
unemploymentRate
COPY (165.4 193.1 230.2 259.8 259.7 272.0 \
260.6 294.9 348.8 377.4 407.7 419.4 432.3 \
443.7 442.2 403.4 435.0 464.5 506.9) \
newConstruction
REGRESS newConstruction trend interestRate \
unemploymentRate result
PRINT result

The above program produces the following output:

result: (17.214658335347053 2.1913474611043284,
   -13.193145703307891 249.9537129900964)

Once you have the coefficients, you can use them with the PREDICT subroutine to compute the Y values for any appropriate values of the independent variables. The subroutine is included in the "/lib" directory in the file "predictCommand.txt".
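
As a rough illustration of what such a prediction amounts to, the Python sketch below applies the coefficients printed above to the 1994 row of the example data (trend 19, interest rate 4.21, unemployment rate 6.1). The function `predict` is a hypothetical stand-in written for this illustration; it is not the PREDICT subroutine from "predictCommand.txt", whose source is not shown here. It assumes the constant term is the last element of the result, as the printed output suggests.

```python
def predict(coefficients, *xs):
    """Evaluate Y = a1*X1 + ... + an*Xn + a0 for each observation.

    coefficients is assumed to be [a1, ..., an, a0] (constant last);
    xs are the independent vectors, in the same order as a1..an.
    """
    *slopes, constant = coefficients
    return [
        sum(a * x[i] for a, x in zip(slopes, xs)) + constant
        for i in range(len(xs[0]))
    ]


# Coefficients copied from the REGRESS output shown above.
coefficients = [17.214658335347053, 2.1913474611043284,
                -13.193145703307891, 249.9537129900964]

# The 1994 observation: trend, interestRate, unemploymentRate.
fitted = predict(coefficients, [19], [4.21], [6.1])
print(round(fitted[0], 1))  # 505.8
```

The fitted value of about 505.8 is close to the observed 1994 figure of 506.9 in newConstruction.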

Here's an example using REGRESS and PREDICT to evaluate visually the quality of a one-variable least-squares regression fit.

INCLUDE "lib/predictCommand.txt"

'Let's see how good the least-square regression method is at
'estimating a line from measured data that has random errors.
'First we create a line which represents the phenomenon we
'are to measure. We later pretend we don't know what the line
'is and try to estimate the line using "measured data". (You
'can change the slope and intercept to get different lines.)
x = 1,100
slope = 1
intercept = 20

'This is the ideal line without any measurement errors:
y = x * slope + intercept

'Now, we choose some x values at which to "measure" the
'phenomenon. We choose the x's at random from the original
'range of x then sort them for best graphing presentation.
'You can vary the number of samples to see the effect on
'the accuracy of the predicted line:
numberOfSamples = 10
UNIFORM numberOfSamples 1 100 xSamples
SORT xSamples xSamples

'Next, we use the ideal line to compute the matching Y
'values and add a random error to each one. This simulates a
'measurement that has a random error. You can experiment by
'changing the mean and/or standard deviation of the errors.
'If the mean is not zero, your measurement is biased. If you
'were using some instrument to do the measurements, bias
'means that the instrument is not properly calibrated.
errorMean = 0
errorStandardDev = 10
NORMAL numberOfSamples errorMean errorStandardDev errors
fakeMeasuredData = xSamples * slope + intercept + errors

'Now, we regress our "measured data" to generate the
'coefficients of a line that is the "best fit" to the data.
REGRESS fakeMeasuredData xSamples coefficients

'Next, we use the coefficients to generate the y values that
'would be predicted by the best fit line.
PREDICT coefficients yPredicted xSamples

'Now, draw the graph to show the original line, the "measured
'data" and the estimated line.
XYGRAPH pairs scatter 1 "Regression Evaluation" xSamples \
fakeMeasuredData x y xSamples yPredicted

Here is the result of one run of the above program: