Statistics101 Resampling Stats Commands

A simulation of a statistical or probability problem in the Resampling Stats language consists of a sequence of Resampling Stats commands. A Resampling Stats command usually consists of one text line that starts with the command name, followed by a list of arguments. The argument list consists of literals, variables, and keywords. The complete list of commands organized into categories can be found here.

Continuation Lines

Normally you will type each command and all its arguments on a single line. A line in the editor can be as long as needed. But sometimes you might want to divide a command over more than one line. You can do this by ending each line of the command except its last with the continuation character, which is the backslash ("\"). For example, the command

HISTOGRAM percent binsize 0.1 "Probability Density" "X value" distribution

could, in the extreme, be written over several lines like this:

    HISTOGRAM percent \
    binsize 0.1 \
    "Probability Density" \  'Y axis label
    "X value" \              'X axis label
    distribution             'Last line has no continuation marker

Note that the last line must not have a continuation character. If you have a comment on a continuation line, the comment must follow the continuation marker as in the first two comments in the above example. If you put the continuation marker after the comment, it will be considered part of the comment and ignored, resulting in a syntax error message. Also, do not break a line in the middle of a variable name, keyword, or quoted string.
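The continuation rules above can be modelled outside Statistics101. The following Python sketch is not part of the program. It assumes, based on the example, that a comment starts at an apostrophe outside a quoted string; the helper names `strip_comment` and `join_continuations` are hypothetical. The comment is removed before the backslash is looked for, which is why a backslash placed after a comment would never be seen as a continuation marker.

```python
def strip_comment(line):
    """Drop everything from the first apostrophe found outside double quotes.

    Assumption: the apostrophe is the comment character, as in the example.
    """
    in_quotes = False
    for index, char in enumerate(line):
        if char == '"':
            in_quotes = not in_quotes
        elif char == "'" and not in_quotes:
            return line[:index]
    return line


def join_continuations(lines):
    """Merge physical lines that end in a backslash into logical commands."""
    commands = []
    parts = []
    for raw in lines:
        text = strip_comment(raw).strip()
        if text.endswith("\\"):
            # Continuation marker: keep collecting pieces of this command.
            parts.append(text[:-1].strip())
        else:
            # No marker: this physical line ends the command.
            parts.append(text)
            joined = " ".join(piece for piece in parts if piece)
            if joined:
                commands.append(joined)
            parts = []
    return commands


source = [
    'HISTOGRAM percent \\',
    'binsize 0.1 \\',
    '"Probability Density" \\  \'Y axis label',
    '"X value" \\              \'X axis label',
    "distribution             'Last line has no continuation marker",
]
print(join_continuations(source))
```

Run on the five-line example, the sketch yields a single logical command equal to the one-line HISTOGRAM command shown earlier.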

Definitions

The basic data structure in Resampling Stats is the list, or vector, which is an ordered set of numbers. Each number in a vector is called an element. A vector consisting of just one number is also called a number. A sequence is a list of numbers, each of which differs by the same number, called the "stepsize", in the same direction from its predecessor.

A literal is a value in and of itself that has no name. There are several kinds of literals:

• A number literal is just an unnamed number. 12.34 is a literal number. So is 4.567E03, which is the scientific notation for the number 4567.0. Further comments on literal numbers:

• The simplest rule to remember is to avoid spaces and plus signs within numbers. A plus sign is never allowed in front of a number, but it is optional in the exponent of a number in scientific notation. Examples of valid numbers: 1.234E12, -3.456E-5, 12.345e+3. Examples of invalid numbers: +1.234E12 (has a leading plus sign), 1.234 E12 (has a space before the "E").

• A fractional number, such as 0.02 or 0.123, must have the zero before the decimal point. In other words, .02 and .123 are incorrect. In fact, the program will interpret them as containing missing data. Thus, the command

COPY (1 .02 .123) myVector

will produce the result

myVector: (1.0 NaN 2.0 NaN 123.0)

• A list literal is represented in Resampling Stats as a set of numbers separated by one or more spaces (not commas) and enclosed within parentheses, such as (5 3 9 8 0 21 45).

• A sequence literal is specified by two or three numbers separated by commas. The first number becomes the first value in the sequence, the second number becomes the final value in the sequence, and the optional third number, called the "step" or "stepsize" specifies the difference between adjacent values. The step must be greater than zero. The comma is called the sequence operator. For example, the sequence of integers from 1 to 100 is represented by the literal 1,100. The sequence of integers from 20 down to 1 is represented by the literal 20,1. Note that the list literal (above) must be enclosed in parentheses, but the sequence literal must not be. The numbers may be integers, floating point numbers, or even named constants (see below). If only two numbers are used, then the difference between each adjacent pair of numbers in the sequence is one. If you want a different step size, you will add that as the third number. For example, 0,10,2 produces the sequence of even numbers between 0 and 10, inclusive. Another example, 2.5,7.5,0.5 produces the sequence (2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5).

• A multiple literal is specified by two numbers separated by a number sign, "#", in the format quantity#value. The number sign is called the multiple operator. For example, in its simplest form, 5#2 is interpreted as a list of 5 twos, i.e., it is equivalent to (2 2 2 2 2). See the COPY command for more information on the use of multiple specifications.

• Non-numerical literals are special characters which have significance within certain commands. Examples are # and %. Keywords, which are optional words that identify certain values to a command, are also non-numerical literals. Examples are: percent, used by the HISTOGRAM command, and divn used by the STDEV command.

• A missing data element literal is either a period (".") or "NaN". NaN stands for "Not a Number". These can be used in a data file to be read using the READ command or in a literal list in a program. For example, in the command COPY (1 2 3 NaN 5 . 7) data, the list has two missing values. The case (upper or lower) of NaN is ignored; therefore, "NAN", "nan", "nAn", etc., are all interpreted as "NaN". Missing data behaves as follows with regard to test operators (see the Test Operators table): NaNs are equal to each other. For all other tests, they fail. For example, NaN is not equal to, greater than, or less than any actual number. NaN is not positive or negative infinity. This differs from the original Resampling Stats, where missing values were equivalent to negative infinity.
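To make the literal forms above concrete, here is a minimal Python sketch of how sequence, multiple, and missing-data literals expand. It is an illustration only, not the Statistics101 implementation; the function names are hypothetical. One assumption goes beyond the text: when the step does not land exactly on the final value, the sketch stops at the last value that does not pass it. The sketch also does not enforce the number-format rules (leading zero, no leading plus sign).

```python
import math


def expand_sequence(first, last, step=1):
    """Expand a sequence literal such as 1,100 or 2.5,7.5,0.5 into a list."""
    if step <= 0:
        raise ValueError("the step must be greater than zero")
    direction = 1 if last >= first else -1
    # The small tolerance guards against floating-point round-off in the count.
    count = math.floor(abs(last - first) / step + 1e-9) + 1
    return [first + direction * i * step for i in range(count)]


def expand_multiple(quantity, value):
    """Expand a multiple literal quantity#value, e.g. 5#2 -> (2 2 2 2 2)."""
    return [value] * int(quantity)


def parse_element(token):
    """Read one list element; '.' and any spelling of NaN mean missing data."""
    if token == "." or token.lower() == "nan":
        return math.nan
    return float(token)


print(expand_sequence(20, 1))       # counts down from 20 to 1
print(expand_sequence(0, 10, 2))    # [0, 2, 4, 6, 8, 10]
print(expand_multiple(5, 2))        # [2, 2, 2, 2, 2]
data = [parse_element(t) for t in "1 2 3 NaN 5 . 7".split()]
print(sum(math.isnan(x) for x in data))  # 2
```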

A keyword is a word, recognized by Statistics101, which selects a feature of a command or alters the behavior of a command. For example, for the READ command, the keyword file is used to identify where the command should look for its data.

A variable in the Resampling Stats language is a named entity whose value is a list. A name begins with a letter and may be followed by any combination of letters, numbers, underscore ("_"), hyphen ("-") and dollar sign ("$"). Upper and lower case are not distinguished, so the names "name", "NAME", and "NaMe" are treated as identical. A feature added by Statistics101 to the Resampling Stats language is that variables can be used in many places where before only literal constants were allowed. For example, values are allowed on either or both sides of the "#" in multiple specifications and on either or both sides of the "," in sequence specifications. For example, var1#var2 is valid as long as var1 and var2 are defined earlier in the program. Variables are not allowed in lists, however. For example, (1 2 var1 3) is illegal because it contains the variable name var1.
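The naming rule can be expressed as a regular expression. The Python sketch below is an illustration under one assumption: "letters" and "numbers" are taken to mean ASCII letters and digits, which the text does not state. The helper names are hypothetical.

```python
import re

# A letter first, then any mix of letters, digits, "_", "-" and "$".
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_$-]*")


def is_valid_name(name):
    """Return True if the whole string satisfies the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def same_name(a, b):
    """Names that differ only in case refer to the same variable."""
    return a.lower() == b.lower()


print(is_valid_name("my-var_$2"))   # True
print(is_valid_name("1var"))        # False: must begin with a letter
print(same_name("name", "NaMe"))    # True
```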

A named constant is a number that has been given a name by the NAME command. The rules for valid names of named constants are the same as those for variables. Named constants can be used wherever literal numbers can be used. Thus if ace and king are named constants (say with values 1 and 13, respectively), they can be used in lists, e.g., (ace 3 4 5 king), in sequences, e.g., ace,king, and in multiple specifications, e.g., 4#ace. The named constant is a new feature added to the Resampling Stats language by Statistics101. There are two categories of named constants: Enums and Named Values.

• Enums are named constants to which the user (you) has not assigned a value. Statistics101 assigns an arbitrary undisclosed value to the name. You use enums when the value of the element is not important. For example, it is not important what numbers are assigned to the names heads and tails; only that their values be different.

• Named Values are named constants to which you have assigned specific values. You use these when the number associated with the name is important. Examples of this are ace and king above, which were assigned the values 1 and 13.

As stated earlier, named constants can be used anywhere that normal numbers can be used. When they are used in sequences, such as ace,king , the sequence is determined by the order of the names between ace and king, inclusive, when they were defined in their NAME command, not by their assigned values, if any.
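The definition-order rule can be pictured with a short Python sketch. It is a model of the behavior described above, not the program's code. The list of card names is hypothetical, the function name is invented for illustration, and the handling of a reversed pair (later name first) is an assumption borrowed from numeric sequences such as 20,1.

```python
def name_sequence(defined_order, first, last):
    """Return the names from first to last, following definition order.

    defined_order lists the names in the order they were defined,
    regardless of any values assigned to them.
    """
    lowered = [name.lower() for name in defined_order]
    start = lowered.index(first.lower())
    stop = lowered.index(last.lower())
    step = 1 if stop >= start else -1  # assumption: reversed pairs count down
    return [defined_order[k] for k in range(start, stop + step, step)]


# Hypothetical definition order for a few card names.
cards = ["ace", "two", "three", "jack", "queen", "king"]
print(name_sequence(cards, "two", "jack"))   # ['two', 'three', 'jack']
print(name_sequence(cards, "ACE", "King"))   # all six names, in definition order
```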

Throughout the Resampling Stats language, typographical case is not distinguished. Commands, keywords, and variable names may be upper case, lower case, or any combination. Thus, two variables whose names vary only in their case are the same variable. For example, myVar and MYVAR and MyVar are all considered the same.

The values of named constants and literals are established during "parse time" as Statistics101 analyzes and builds your program. The values of variables are determined at execution, or "run time", as your program is actually being executed. Therefore, the value of a sequence or multiple specification that uses variables will be undefined until run time whereas those using only literals or named constants will be defined at parse time. This is a technical point that may be of interest only to a very few users.

Command Description Syntax

All the commands of the Resampling Stats language that are implemented by Statistics101 are listed and described in the Command Description Table. Each command description has a one-line entry that describes its argument list. For example, the command syntax of the ADD command is the following:

The following table explains how to interpret each part of the command syntax descriptions. Based on the table, the above command means that the ADD command takes at least two input vectors and one result variable. If you find it difficult to understand or apply the syntax description for a command, try using the Wizard for that command, which you can access via the F2 key or the Help>Wizards menu item.

Item

Description

Example

Command Name

The name of the command is always the first non-blank item on each line. The name may be entered in upper case, lower case, or mixed case. Statistics101 is insensitive to the case of the command.

Note that command names are not "reserved", i.e., they can also be used as variable names if that makes sense. Statistics101 distinguishes between a command and a variable that have the same name by position: if it is first on the line, it is a command; otherwise it is a variable name. This usage is not necessarily recommended, but it is allowed.

The following are considered identical by Statistics101:

`ABS vector1 resultVector`

`Abs Vector1 ResultVector`

Here's an example of re-use of a command name as a variable name:

`ABS vector1 abs`

`PRINT abs`

Any UPPER CASE character string

Any word in upper case in the syntax description represents itself. You may type the word into your Resampling Stats program command in either upper or lower case since case is ignored by Statistics101.

STDEV [DIVN | POP] inputVector resultVariable

In the above command syntax description, the keywords "DIVN" and "POP" represent themselves. The keywords are optional as indicated by the square brackets, but only one may be selected as indicated by the bar, "|". An example satisfying the above syntax specification, for the STDEV command, would be:

`STDEV divn mySampleData myResult`

...Number

...Vector

...Variable

...String

...StringVariable

Words with these endings represent user-supplied items, such as a number, a vector, a variable name. In the command syntax tables, the ellipsis (...) is replaced by a descriptive prefix, producing names such as: sizeNumber, inputVector, and resultVariable. See the examples at the right. The terms used in the command descriptions are defined as follows.

Variable: a named vector whose contents can change during the program's run.

LiteralNumber: an actual number, such as 123, 1.23, or -1.23e4.

NamedConstant: a single number that has been given a name via the NAME command.

LiteralList: any one of literalNumber, a series of literalNumbers and/or namedConstants enclosed in parentheses, a multiple specification, or a sequence specification. Examples, respectively: 123 (1 2 3 4) 3#5 1,10.

Number: any one of literalNumber, namedConstant, or variable. If it is a variable that has more than one element, only the first element is used.

Vector: any one of literalNumber, literalList, namedConstant, vector variable, or array variable.

String: either a literal string or a string variable. A literal string is any set of characters enclosed within double quotes, such as "this is a literal string".

StringVariable: a named variable whose contents are a text string.

In the syntax description of the NORMAL command,

NORMAL sizeNumber meanNumber standardDeviationNumber resultVariable

sizeNumber means that the user must supply a number representing the size of sample to be taken.

meanNumber means that the user must supply a number representing the mean of the desired distribution.

standardDeviationNumber means that the user must supply a number representing the standard deviation of the desired distribution.

resultVariable means that the user must supply the name of a variable to accept the result of a command.

Where the terms number and variable are defined in the column to the left.

(...)

Parentheses are used to group choices of required items.

READ [FILE ("fileName" | fileNameStringVariable) | ARGFILE] ...

Here, the FILE keyword is optional, but if it is present, then it must be followed by either a quoted literal file name or by a string variable containing the file name.

[...]

Anything between square brackets is optional.

RANKS [DESCENDING] inputVector resultVariable

Here, the keyword "descending" represents itself, but its presence is optional. Here is an example using the command:

`RANKS descending inVector outVector`

{...}

Anything between curly brackets can be present zero or more times.

BOXPLOT inputVector {inputVector}

This command description means that the BOXPLOT command takes at least one vector argument (i.e., "one inputVector plus zero or more inputVectors").

|

The vertical bar represents "or". It separates alternative terms. Only one of the alternative terms can be chosen for a single command.

SEED [JAVA | MERSENNE] [literalNumber]

This means that for the SEED command, you have the option (expressed by the square brackets) of choosing either the word "java" or the word "mersenne", but not both, as the first argument of the SEED command. Again, case doesn't matter.

test

A comparison of each element of a vector with one or two numbers that returns a true or false result. For more information on tests see the section on tests.

Note that the results of all tests are affected by the FUZZ command.

Also note that for all tests the one or two numbers to the right of the test operator may be variables and/or named constants. For example,

COUNT Z between A B result

The test comparison operators are:

• > Greater than

• < Less than

• = Equal

• <> Not Equal

• >= or => Greater than or equal

• <= or =< Less than or equal

• memberof

• notMemberOf

• between

• notBetween

Test logical operators are (these are only allowed with IF, ELSEIF, and WHILE):

• AND

• OR

• XOR

• NOT

See the "Test Operators" table below for further explanation.

Tests

There are eight commands that require tests. These are COUNT, IF, MULTIPLES, RECODE, RUNS, TAGS, WEED, and WHILE. A test limits a command so that it operates only on input vector elements that pass the test. Most of the tests compare elements of a vector on the left of the test operator to a number on the right. Two of the tests (between and notBetween) compare elements of a vector on their left to two numbers on the right of the test operator. See the examples in the Test Operators table immediately below. The right-hand arguments of a test need not be literal numbers. They may be vectors, but if they are, only the first element of each right-hand argument is used by the test. Thus, if you have the command COUNT vec1 = vec2 result, only the first element in vec2 is used for the test against all the elements of vec1. The arguments of a test may also be named constants. As usual, the typographical case of the keywords is ignored, so, for example, notBetween is the same as notbetween and NOTBetween.

Test Operators

Operator

Description

>

Greater than. The element on the left is greater than the number on the right. E.g., the following command counts the number of elements of Z that are greater than 5 and puts the result in result.

`COUNT Z > 5 result`

<

Less than. The element on the left is less than the number on the right. E.g.,

`COUNT Z < 5 result`

=

Equal. The element on the left is equal to the number on the right. E.g.,

COUNT Z = 5 result

<>

Not equal. The element on the left is not equal to the number on the right. E.g.,

`COUNT Z <> 4 result`

>=

or

=>

Greater than or equal. The element on the left is greater than or equal to the number on the right. E.g.,

`COUNT Z >= 5 result`

<=

or

=<

Less than or equal. The element on the left is less than or equal to the number on the right. E.g.,

`COUNT Z <= 5 result`

memberof

The element on the left is a member of the list on the right. E.g., the following command counts how many elements of Z are members of aList and puts the answer in result.

`COUNT Z memberof aList result`

notmemberof

The element on the left is not a member of the list on the right. E.g., the following command counts how many elements of Z are not members of aList and puts the answer in result.

`COUNT Z notMemberOf aList result`

between

The element on the left is between (inclusive) the two numbers on the right.

E.g., the following command counts the number of elements of Z that are between 1 and 10 and puts the answer in result. Note: there is no comma between the 1 and the 10.

COUNT Z between 1 10 result

Also, the limits may be in reverse order and it still works:

COUNT Z between 10 1 result

notbetween

The element on the left is not between (inclusive) the two numbers on the right.

E.g., the following command counts the number of elements of Z that are not between 1 and 10 and puts the answer in result. Note: there is no comma between the 1 and the 10.

COUNT Z notBetween 1 10 result

Also, the limits may be in reverse order and it still works:

COUNT Z notBetween 10 1 result
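The test operators in the table can be modelled in a few lines of Python. This is a sketch of the documented behavior for a COUNT-style command, not the Statistics101 implementation: the names `passes`, `count`, and `first_element` are hypothetical, and the effects of the FUZZ command and of missing data are deliberately left out.

```python
def first_element(operand):
    """A vector on the right contributes only its first element."""
    return operand[0] if isinstance(operand, (list, tuple)) else operand


def passes(element, operator, *operands):
    """Apply one test operator to a single left-hand element."""
    op = operator.lower()  # operator keywords are case-insensitive
    if op in ("memberof", "notmemberof"):
        members = operands[0] if isinstance(operands[0], (list, tuple)) else [operands[0]]
        found = element in members
        return found if op == "memberof" else not found
    if op in ("between", "notbetween"):
        # Limits are inclusive and may be given in either order.
        low, high = sorted(first_element(x) for x in operands[:2])
        inside = low <= element <= high
        return inside if op == "between" else not inside
    right = first_element(operands[0])
    comparisons = {
        ">": element > right,
        "<": element < right,
        "=": element == right,
        "<>": element != right,
        ">=": element >= right, "=>": element >= right,
        "<=": element <= right, "=<": element <= right,
    }
    return comparisons[op]


def count(vector, operator, *operands):
    """Count the elements of vector that pass the test."""
    return sum(1 for element in vector if passes(element, operator, *operands))


Z = [1, 5, 7, 10, 12]
print(count(Z, ">", 5))               # 3
print(count(Z, "between", 10, 1))     # 4, same as between 1 10
print(count(Z, "notBetween", 1, 10))  # 1
print(count(Z, "memberof", [5, 12, 99]))  # 2
print(count(Z, "=", [5, 7]))          # 1, only the 5 is used
```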

Logical Expressions

The term "Logical Expressions" includes simple tests such as those above, and compound tests. "Compound tests" consist of several tests that are linked by the logical operators, AND, OR, XOR, and NOT. Logical expressions are used with the IF, ELSEIF, and WHILE commands. All the logical operators are described in the Logical Operators table, below. Here is an example of a snippet of code that does not use logical expressions:

```
IF player1 = twin
   IF player2 = twin
   END
END
```

Writing an equivalent snippet using a logical expression simplifies the code as follows:

```
IF player1 = twin AND player2 = twin
END
```

The operators AND, OR, and XOR are called "binary" operators because they take two operands, or arguments, one on each side. The NOT operator is called a "unary" operator because it takes only one operand, that on its right side. A logical expression may be of any complexity and length, but it must be entirely on one line, as must all Resampling Stats commands. As usual, the operators may be in upper case, lower case, or mixed case.

In the absence of parentheses, the order in which the logical operators are executed is determined by their default "precedence". Higher precedence operators are executed before lower precedence operators. The operators OR and XOR are of equal, and lowest, precedence. AND is of higher precedence, and NOT is of highest precedence. The following examples will help to clarify the precedence rules. In the examples, the terms "test1", "test2", etc. are used as a "shorthand" to represent tests of the form "a = b" or "a < b". Thus, a complete logical expression such as

`a > b AND c = d OR e between 1 2 AND f < g`

can be written like this in that shorthand:

`test1 AND test2 OR test3 AND test4`

This shorthand is just for the purposes of visually simplifying the following examples. It is not valid Resampling Stats syntax and cannot be used in the Statistics101 program. Using the shorthand, the logical expression represented by,

`test1 AND test2 OR test3 AND test4`

is equivalent to

`(test1 AND test2) OR (test3 AND test4)`

because of the default precedence rules. The two AND operators will be executed prior to the OR operator. Another example, this time adding a NOT:

`test1 AND not test2 OR test3 AND test4`

The above logical expression is equivalent to:

`(test1 AND (not test2)) OR (test3 AND test4)`

A good way to understand how the precedence rules apply in the absence of parentheses that override the defaults is this: First all NOTs bind to the test that follows them, second, all ANDs bind to their arguments, and third, all ORs bind to their arguments that are the results of the first two steps. In the second step, if there are several ANDs in sequence they are all grouped together as in this next comparison of equivalent expressions:

`test1 AND test2 AND test3 OR test4 AND test5 AND test6`

The above logical expression is equivalent to:

`(test1 AND test2 AND test3) OR (test4 AND test5 AND test6)`

Applying the rules to this next expression which has several ORs in sequence with an AND in the middle:

`test1 OR test2 OR test3 AND test4 OR test5 OR test6`

results in this equivalent expressed using parentheses:

`test1 OR test2 OR (test3 AND test4) OR test5 OR test6`

If you have any uncertainty about how the defaults apply in a complex logical expression that you write, you should use parentheses to force the expression to have the meaning you want.
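The precedence rules can be checked mechanically. The Python sketch below evaluates expressions written in the test1/test2 shorthand, given a truth value for each test. It is a small recursive-descent evaluator written for illustration, not the parser used by Statistics101; the function name `evaluate` is hypothetical, and it assumes that operators of equal precedence (OR and XOR) are applied left to right, which the text does not specify.

```python
import re
from itertools import product


def evaluate(expression, values):
    """Evaluate a shorthand logical expression; values maps test names to booleans."""
    tokens = re.findall(r"\(|\)|\w+", expression)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence: OR and XOR
        result = parse_and()
        while peek() in ("OR", "XOR"):
            op = take().upper()
            right = parse_and()
            result = (result or right) if op == "OR" else (result != right)
        return result

    def parse_and():  # middle precedence: AND
        result = parse_not()
        while peek() == "AND":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():  # highest precedence: NOT, then parentheses and tests
        if peek() == "NOT":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            result = parse_or()
            take()  # consume the closing parenthesis
            return result
        return values[take()]

    return parse_or()


# Compare a bare expression with its parenthesized form for every input.
names = ["test1", "test2", "test3", "test4"]
for combo in product([False, True], repeat=4):
    v = dict(zip(names, combo))
    assert evaluate("test1 AND test2 OR test3 AND test4", v) == \
        evaluate("(test1 AND test2) OR (test3 AND test4)", v)
```

The loop at the end confirms, for all sixteen combinations of truth values, that the unparenthesized expression and the parenthesized one from the text agree.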

Logical Operators

Operator

Precedence

Description

NOT

1
(highest)

Logical NOT. Results in true if the test on its right is false. Results in false if the test on its right is true. Highest precedence of all the logical operators. Example:

`IF NOT A = 4`

which happens to be equivalent to

`IF A <> 4`

AND

2

(middle)

Logical AND. Results in true if both the test on its left and the test on its right are true. Results in false otherwise. Precedence is below that of NOT and above that of OR and XOR. Example:

`IF A > 5 AND B between 1 2`

OR

3

(lowest)

Logical OR. Results in true if either the test on its left is true, or the test on its right is true, or both are true. Results in false otherwise. XOR and OR are of equal precedence. Theirs is the lowest precedence. Example:

`IF A < 5 OR B <> C`

XOR

3

(lowest)

Logical EXCLUSIVE OR. Results in true if either the test on its left is true or the test on its right is true, but not both. Results in false otherwise. XOR and OR are of equal precedence. Theirs is the lowest precedence. Example:

`IF A = 5 XOR B = 15`