CLEAN variable {variable}

The CLEAN command removes "missing data" (NaN, or "Not a Number") from its given vector or vectors. It treats multiple vectors as having aligned data elements, i.e., the nth element in one vector is related to the nth element of all the others. If any vector has a NaN at a given position, the element at that position is removed from that vector and from all the other vectors, shortening every vector by one for each such position.

Normally, the vectors should all be the same length. If they are not, the shorter ones are first extended to the length of the longest by repeating their last element, and the NaN positions are then removed.
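The aligned behavior described above can be modeled outside Statistics101. The following is a minimal Python sketch, not the Statistics101 implementation: the function name clean_aligned is hypothetical, and it assumes that short vectors are extended before any positions are removed. Unlike CLEAN, which changes its input vectors, the sketch returns cleaned copies.

```python
import math

def clean_aligned(*vectors):
    """Return cleaned copies of the vectors, treating them as aligned.

    Hypothetical model of the aligned cleaning described above.
    """
    if any(len(v) == 0 for v in vectors):
        raise ValueError("each vector needs at least one element to extend")
    longest = max(len(v) for v in vectors)
    # Extend each short vector by repeating its last element.
    padded = [list(v) + [v[-1]] * (longest - len(v)) for v in vectors]
    # Keep only the positions where no vector holds a NaN.
    keep = [i for i in range(longest)
            if not any(math.isnan(v[i]) for v in padded)]
    return [[v[i] for i in keep] for v in padded]

nan = float("nan")
y, x = clean_aligned([1.0, nan, 3.0, 4.0], [10.0, 20.0])
print(y)  # [1.0, 3.0, 4.0]
print(x)  # [10.0, 20.0, 20.0]
```

Here x is first extended to (10 20 20 20), and the second position is then dropped from both vectors because y holds a NaN there.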

NOTE: The CLEAN command permanently changes the contents and possibly the lengths of its input vectors.

During execution of the REGRESS command, the original Resampling Stats program automatically performed a "clean" before its computation, but it did not do so before other commands such as CORR, SUMABSDEV and SUMSQRDEV. Cleaning your data for those commands required an error-prone series of Resampling Stats instructions. Statistics101 provides the CLEAN functionality as a separate command so that you can do it with a single statement whenever you need to. Consequently, the REGRESS command in Statistics101 does not automatically do a CLEAN.


If your data vectors contain "missing data", use this command before calling any of the REGRESS, CORR, SUMABSDEV or SUMSQRDEV commands. For example:

 CLEAN y x1 x2 x3
 REGRESS y x1 x2 x3 result

If you have several vectors whose data elements are not related element by element, but which need to be cleansed of "missing data", treat each one with a separate CLEAN command. For example, to clean vectors a, b, and c, which do not have related elements, do it this way:

  CLEAN a
  CLEAN b
  CLEAN c

The following program

COPY (  1   2  NaN  4  5  6   7) A
COPY ( 11  22  33  44 55 66 NaN) B
COPY (111 NaN 333) C
CLEAN A B C
PRINT A B C

produces this result:

A: (1.0 4.0 5.0 6.0)
B: (11.0 44.0 55.0 66.0)
C: (111.0 333.0 333.0 333.0)

Here C is extended to seven elements (111 NaN 333 333 333 333 333) by repeating its last element. Positions 2, 3 and 7, each of which holds a NaN in at least one vector, are then removed from all three vectors.

By contrast, the following program, in which the vectors are cleaned separately,

COPY (  1   2 NaN  4  5  6   7) A
COPY ( 11  22  33 44 55 66 NaN) B
COPY (111 NaN 333) C
CLEAN A
CLEAN B
CLEAN C
PRINT A B C

produces this result:

A: (1.0 2.0 4.0 5.0 6.0 7.0)
B: (11.0 22.0 33.0 44.0 55.0 66.0)
C: (111.0 333.0)
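
Cleaning each vector separately amounts to filtering the NaN elements out of that vector alone, so the results can differ in length. The following is a minimal Python sketch of that idea, not Statistics101 code. The function name clean_single is hypothetical, and it returns a copy rather than changing its input as CLEAN does.

```python
import math

def clean_single(vector):
    """Return a copy of one vector with its NaN elements removed."""
    return [x for x in vector if not math.isnan(x)]

nan = float("nan")
a = clean_single([1.0, 2.0, nan, 4.0, 5.0, 6.0, 7.0])
b = clean_single([11.0, 22.0, 33.0, 44.0, 55.0, 66.0, nan])
c = clean_single([111.0, nan, 333.0])
print(len(a), len(b), len(c))  # 6 6 2
```

Because no alignment is assumed, each vector loses only its own NaN elements and nothing is extended.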