Statistics101 Resampling Stats Commands

A simulation of a statistical or probability problem in the Resampling Stats language consists of a sequence of Resampling Stats commands. A Resampling Stats command usually consists of one text line that starts with the command name, followed by a list of arguments. The argument list consists of literals, variables, and keywords. The complete list of commands organized into categories can be found here.

Continuation Lines

Normally you will type each command and all its arguments on a single line. A line in the editor can be as long as needed. But sometimes you might want to divide a command over more than one line. You can do this by ending each line of the command except its last with the continuation character, which is the backslash ("\"). For example, the command

HISTOGRAM percent binsize 0.1 "Probability Density" "X value" distribution

could, in the extreme, be written over several lines like this:

HISTOGRAM percent \
binsize 0.1 \
"Probability Density" \ 'Y axis label
"X value" \ 'X axis label
distribution 'Last line has no continuation marker

Note that the last line must not have a continuation character. If you have a comment on a continuation line, the comment must follow the continuation marker as in the first two comments in the above example. If you put the continuation marker after the comment, it will be considered part of the comment and ignored, resulting in a syntax error message. Also, do not break a line in the middle of a variable name, keyword, or quoted string.
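
The continuation rules above can be modeled with a short Python sketch. This is only an illustration of the rules as described here, not the way Statistics101 itself reads programs. It assumes that a comment starts at the first apostrophe (') outside a quoted string, as in the example above; the function names strip_comment and join_continuations are hypothetical.

```python
def strip_comment(line):
    """Drop everything from the first apostrophe that is outside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "'" and not in_quotes:
            return line[:i]
    return line


def join_continuations(lines):
    """Merge physical lines into logical commands (hypothetical helper)."""
    commands = []
    pending = []
    for raw in lines:
        # The comment is removed first, so a backslash inside a comment
        # is never seen as a continuation marker.
        code = strip_comment(raw).rstrip()
        if code.endswith("\\"):
            pending.append(code[:-1].strip())
        else:
            pending.append(code.strip())
            commands.append(" ".join(part for part in pending if part))
            pending = []
    if pending:
        raise ValueError("last line ends with a continuation character")
    return commands


example = [
    'HISTOGRAM percent \\',
    'binsize 0.1 \\',
    '"Probability Density" \\ \'Y axis label',
    '"X value" \\ \'X axis label',
    'distribution \'Last line has no continuation marker',
]
print(join_continuations(example))
```

Because the comment is stripped before the backslash is looked for, a marker placed after a comment is lost along with the comment, which matches the behavior described above.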

Definitions

The basic data structure in Resampling Stats is the list, or vector, which is an ordered set of numbers. Each number in a vector is called an element. A vector consisting of just one number is also called a number. A sequence is a list of numbers in which each number differs from its predecessor by the same amount, called the "stepsize", and in the same direction.
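
As a rough illustration of the definition of a sequence, the following Python sketch builds one from its first number, last number, and stepsize. The function name sequence, the default stepsize of 1, and the inclusion of the end value are assumptions made for this sketch; they are not taken from the Statistics101 implementation.

```python
def sequence(first, last, stepsize=1):
    """Build a list that moves from first toward last in equal steps."""
    if stepsize <= 0:
        raise ValueError("stepsize must be positive")
    # The direction is fixed by the end points: up if last >= first, else down.
    step = stepsize if last >= first else -stepsize
    # Small tolerance so that fractional stepsizes do not lose the last element.
    count = int(abs(last - first) / stepsize + 1e-9) + 1
    return [first + i * step for i in range(count)]


print(sequence(1, 10))
print(sequence(10, 1, 3))
```

Every element differs from its predecessor by the same stepsize, and all steps go in the same direction.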

A literal is a value in and of itself that has no name. There are several kinds of literals:

COPY (1 .02 .123) myVector

will produce the result

myVector: (1.0 NaN 2.0 NaN 123.0)

A keyword is a word, recognized by Statistics101, which selects a feature of a command or alters the behavior of a command. For example, for the READ command, the keyword file is used to identify where the command should look for its data.

A variable in the Resampling Stats language is a named entity whose value is a list. A name begins with a letter and may be followed by any combination of letters, numbers, underscore ("_"), hyphen ("-") and dollar sign ("$"). Upper and lower case are not distinguished, so the names "name", "NAME", and "NaMe" are treated as identical. A feature added by Statistics101 to the Resampling Stats language is that variables can be used in many places where before only literal constants were allowed. In particular, variables are allowed on either or both sides of the "#" in multiple specifications and on either or both sides of the "," in sequence specifications. For example, var1#var2 is valid as long as var1 and var2 are defined earlier in the program. Variables are not allowed in lists, however: (1 2 var1 3) is illegal because it contains the variable name var1.
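
The naming rules can be summarized in a small Python sketch. It is a model of the rules as stated above, not Statistics101 code. It assumes that "letters" means the ASCII letters A to Z; the names NAME_PATTERN, is_valid_name, and VariableTable are hypothetical.

```python
import re

# A letter first, then any mix of letters, digits, underscore, hyphen, dollar sign.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_$-]*")


def is_valid_name(name):
    """Return True if name follows the naming rule described in the text."""
    return NAME_PATTERN.fullmatch(name) is not None


class VariableTable:
    """Stores vectors under case-insensitive names."""

    def __init__(self):
        self._values = {}

    def set(self, name, values):
        if not is_valid_name(name):
            raise ValueError("invalid variable name: " + name)
        # Folding to lower case makes myVar, MYVAR, and MyVar the same key.
        self._values[name.lower()] = list(values)

    def get(self, name):
        return self._values[name.lower()]


table = VariableTable()
table.set("myVar", [1, 2, 3])
print(table.get("MYVAR"))
print(is_valid_name("roll-1$"), is_valid_name("1roll"))
```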

A named constant is a number that has been given a name by the NAME command. The rules for valid names of named constants are the same as those for variables. Named constants can be used wherever literal numbers can be used. Thus if ace and king are named constants (say with values 1 and 13, respectively), they can be used in lists, e.g., (ace 3 4 5 king), in sequences, e.g., ace,king, and in multiple specifications, e.g., 4#ace. The named constant is a new feature added to the Resampling Stats language by Statistics101. There are two categories of named constants: Names and Named Values.

As stated earlier, named constants can be used anywhere that normal numbers can be used. When they are used in sequences, such as ace,king, the sequence is determined by the order of the names between ace and king, inclusive, when they were defined in their NAME command, not by their assigned values, if any.
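
The following Python sketch illustrates the rule that a sequence of named constants follows definition order rather than assigned values. The list of names, their values, and the function name_sequence are hypothetical, and the sketch only covers the case where the first name was defined before the last one.

```python
def name_sequence(defined_order, first, last):
    """Return the names from first to last, inclusive, in definition order."""
    order = [name.lower() for name in defined_order]
    start = order.index(first.lower())
    stop = order.index(last.lower())
    if start > stop:
        raise ValueError("this sketch only handles names given in definition order")
    return order[start:stop + 1]


# Hypothetical constants: only four names are defined, in this order.
defined_order = ["ace", "jack", "queen", "king"]
assigned_values = {"ace": 1, "jack": 11, "queen": 12, "king": 13}

names = name_sequence(defined_order, "ace", "king")
print(names)
print([assigned_values[name] for name in names])
```

In this model the sequence ace,king has four elements, one for each defined name, rather than the thirteen elements that a numeric sequence from 1 to 13 would have.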

Throughout the Resampling Stats language, typographical case is not distinguished. Commands, keywords, and variable names may be upper case, lower case, or any combination. Thus, two variables whose names vary only in their case are the same variable. For example, myVar and MYVAR and MyVar are all considered the same.

The values of named constants and literals are established during "parse time" as Statistics101 analyzes and builds your program. The values of variables are determined at execution, or "run time", as your program is actually being executed. Therefore, the value of a sequence or multiple specification that uses variables will be undefined until run time whereas those using only literals or named constants will be defined at parse time. This is a technical point that may be of interest only to a very few users.

Command Description Syntax

All the commands of the Resampling Stats language that are implemented by Statistics101 are listed and described in the Command Description Table. Each command description has a one-line entry that describes its argument list. For example, the command syntax of the ADD command is the following:

ADD inputVector inputVector {inputVector} resultVariable

The following table explains how to interpret each part of the command syntax descriptions. Based on the table, the above command means that the ADD command takes at least two input vectors and one result variable. If you find it difficult to understand or apply the syntax description for a command, try using the Wizard for that command, which you can access via the F2 key or the Help>Wizards menu item.

Item | Description | Example

Command Name

The name of the command is always the first non-blank item on each line. The name may be entered in upper case, lower case, or mixed case. Statistics101 is insensitive to the case of the command.

Note that command names are not "reserved", i.e., they can also be used as variable names if that makes sense. Statistics101 distinguishes between a command and a variable that have the same name by position: if it is first on the line, it is a command; otherwise it is a variable name. This usage is not necessarily recommended, but it is allowed.

The following are considered identical by Statistics101:

ABS vector1 resultVector
Abs Vector1 ResultVector

Here's an example of re-use of a command name as a variable name:

ABS vector1 abs
PRINT abs


Any UPPER CASE character string

Any word in upper case in the syntax description represents itself. You may type the word into your Resampling Stats program command in either upper or lower case since case is ignored by Statistics101.

STDEV [DIVN | POP] inputVector resultVariable

In the above command syntax description, the keywords "DIVN" and "POP" represent themselves. The keywords are optional as indicated by the square brackets, but only one may be selected as indicated by the bar, "|". An example satisfying the above syntax specification, for the STDEV command, would be:

STDEV divn mySampleData myResult


...Number

...Vector

...Variable

...String

...StringVariable


Words with these endings represent user-supplied items, such as a number, a vector, or a variable name. In the command syntax tables, the ellipsis (...) is replaced by a descriptive prefix, producing names such as: sizeNumber, inputVector, and resultVariable. See the examples in the Example column. The terms used in the command descriptions are defined as follows.

Variable: a named vector whose contents can change during the program's run.

LiteralNumber: an actual number, such as 123, 1.23, or -1.23e4.

NamedConstant: a single number that has been given a name via the NAME command.

LiteralList: any one of literalNumber, a series of literalNumbers and/or namedConstants enclosed in parentheses, a multiple specification, or a sequence specification. Examples, respectively: 123 (1 2 3 4) 3#5 1,10.

Number: any one of literalNumber, namedConstant, or variable. If it is a variable that has more than one element, only the first element is used.

Vector: any one of literalNumber, literalList, namedConstant, vector variable, or array variable.

String: either a literal string or a string variable. A literal string is any set of characters enclosed within double quotes, such as "this is a literal string".

StringVariable: a named variable whose content is a text string.

In the syntax description of the NORMAL command,

NORMAL sizeNumber meanNumber standardDeviationNumber resultVariable

sizeNumber means that the user must supply a number representing the size of sample to be taken.

meanNumber means that the user must supply a number representing the mean of the desired distribution.

standardDeviationNumber means that the user must supply a number representing the standard deviation of the desired distribution.

resultVariable means that the user must supply the name of a variable to accept the result of a command.

The terms number and variable are defined in the Description column.


(...)

Parentheses are used to group choices of required items.

READ [FILE ("fileName" | fileNameStringVariable) | ARGFILE] ...

Here, the FILE keyword is optional, but if it is present, then it must be followed by either a quoted literal file name or by a string variable containing the file name.

[...]

Anything between square brackets is optional.

RANKS [DESCENDING] inputVector resultVariable

Here, the keyword "descending" represents itself, but its presence is optional. Here is an example using the command:

RANKS descending inVector outVector

{...}

Anything between curly brackets can be present zero or more times.

BOXPLOT inputVector {inputVector}

This command description means that the BOXPLOT command takes at least one vector argument (i.e., "one inputVector plus zero or more inputVectors").

|

The vertical bar represents "or". It separates alternative terms. Only one of the alternative terms can be chosen for a single command.

SEED [JAVA | MERSENNE] [literalNumber]

This means that for the SEED command, you have the option (expressed by the square brackets) of choosing either the word "java" or the word "mersenne", but not both, as the first argument of the SEED command. Again, case doesn't matter.

test

A comparison of each element of a vector with one or two numbers that returns a true or false result. For more information on tests see the section on tests.

Note that the results of all tests are affected by the FUZZ command.

Also note that for all tests the one or two numbers to the right of the test operator may be variables and/or named constants. For example,

COUNT Z between A B result

The test comparison operators are:

  • > Greater than

  • < Less than

  • = Equal

  • <> Not Equal

  • >= or => Greater than or equal

  • <= or =< Less than or equal

  • memberof

  • notMemberOf

  • between

  • notBetween


The logical operators for tests, which are allowed only with IF, ELSEIF, and WHILE, are:

  • AND

  • OR

  • XOR

  • NOT


See the "Test Operators" table below for further explanation.

Tests

There are eight commands that require tests: COUNT, IF, MULTIPLES, RECODE, RUNS, TAGS, WEED, and WHILE. A test limits a command so that it operates only on input vector elements that pass the test. Most of the tests compare elements of a vector on the left of the test operator to a number on the right. Two of the tests (between and notBetween) compare elements of a vector on their left to two numbers on the right of the test operator. See the examples in the Test Operators table immediately below. The right-hand arguments of a test need not be literal numbers. They may be vectors, but if they are, only the first element of each right-hand argument is used by the test. Thus, if you have the command COUNT vec1 = vec2 result, only the first element in vec2 is used for the test against all the elements of vec1. The arguments of a test may also be named constants. As usual, the typographical case of the keywords is ignored, so, for example, notBetween is the same as notbetween and NOTBetween.
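
The first-element rule for the simple comparison operators can be sketched in Python as follows. This is a model of the behavior described above, not the Statistics101 implementation. It uses exact comparisons and ignores any effect of the FUZZ command; the names COMPARISONS and count are hypothetical.

```python
import operator

# The comparison operators and their alternate spellings.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "<>": operator.ne,
    ">=": operator.ge,
    "=>": operator.ge,
    "<=": operator.le,
    "=<": operator.le,
}


def count(vector, op, right):
    """Count the elements of vector that pass the test against right[0]."""
    threshold = right[0]  # only the first element of the right-hand vector is used
    compare = COMPARISONS[op]
    return sum(1 for element in vector if compare(element, threshold))


vec1 = [1, 5, 5, 8]
vec2 = [5, 99, 100]
print(count(vec1, "=", vec2))
```

Here the 99 and 100 in vec2 play no part in the test: every element of vec1 is compared with 5 only.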


Test Operators

Operator | Description

>

Greater than. The element on the left is greater than the number on the right. E.g., the following command counts the number of elements of Z that are greater than 5 and puts the result in result.

COUNT Z > 5 result

<

Less than. The element on the left is less than the number on the right. E.g.,

COUNT Z < 5 result

=

Equal. The element on the left is equal to the number on the right. E.g.,

COUNT Z = 5 result

<>

Not equal. The element on the left is not equal to the number on the right. E.g.,

COUNT Z <> 4 result

>= or =>

Greater than or equal. The element on the left is greater than or equal to the number on the right. E.g.,

COUNT Z >= 5 result

<= or =<

Less than or equal. The element on the left is less than or equal to the number on the right. E.g.,

COUNT Z <= 5 result

memberof

The element on the left is a member of the list on the right. E.g., the following command counts how many elements of Z are members of aList and puts the answer in result.

COUNT Z memberof aList result

notmemberof

The element on the left is not a member of the list on the right. E.g., the following command counts how many elements of Z are not members of aList and puts the answer in result.

COUNT Z notMemberOf aList result

between

The element on the left is between (inclusive) the two numbers on the right.

E.g., the following command counts the number of elements of Z that are between 1 and 10 and puts the answer in result. Note: there is no comma between the 1 and the 10.

COUNT Z between 1 10 result

Also, the limits may be in reverse order and it still works:

COUNT Z between 10 1 result

notbetween

The element on the left is not between (inclusive) the two numbers on the right.

E.g., the following command counts the number of elements of Z that are not between 1 and 10 and puts the answer in result. Note: there is no comma between the 1 and the 10.

COUNT Z notBetween 1 10 result

Also, the limits may be in reverse order and it still works:

COUNT Z notBetween 10 1 result
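
The between, notBetween, memberof, and notMemberOf tests in the table above can be modeled with the Python sketch below. It is an illustration only: it treats notBetween as the exact opposite of the inclusive between test, uses exact comparisons, and ignores the FUZZ command. The function names are hypothetical.

```python
def is_between(element, limit_a, limit_b):
    """Inclusive range test; the two limits may be given in either order."""
    low, high = min(limit_a, limit_b), max(limit_a, limit_b)
    return low <= element <= high


def count_between(vector, limit_a, limit_b):
    return sum(1 for element in vector if is_between(element, limit_a, limit_b))


def count_not_between(vector, limit_a, limit_b):
    return sum(1 for element in vector if not is_between(element, limit_a, limit_b))


def count_member_of(vector, a_list):
    return sum(1 for element in vector if element in a_list)


def count_not_member_of(vector, a_list):
    return sum(1 for element in vector if element not in a_list)


Z = [0, 1, 5, 10, 11]
print(count_between(Z, 1, 10), count_between(Z, 10, 1))
print(count_not_between(Z, 1, 10))
print(count_member_of(Z, [5, 11, 42]), count_not_member_of(Z, [5, 11, 42]))
```

The elements 1 and 10 sit exactly on the limits and are counted as between, so only 0 and 11 are counted by the notBetween model.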


Logical Expressions

The term "Logical Expressions" includes simple tests, such as those above, and compound tests. "Compound tests" consist of several tests that are linked by the logical operators AND, OR, XOR, and NOT. Logical expressions are used with the IF, ELSEIF, and WHILE commands. All the logical operators are described in the Logical Operators table, below. Here is an example of a snippet of code that does not use logical expressions:

   IF player1 = twin
      IF player2 = twin
         ADD 1 twinsMatchedCount twinsMatchedCount
      END
   END

Writing an equivalent snippet using a logical expression simplifies the code as follows:

   IF player1 = twin AND player2 = twin
      ADD 1 twinsMatchedCount twinsMatchedCount
   END

The operators AND, OR, and XOR are called "binary" operators because they take two operands, or arguments, one on each side. The NOT operator is called a "unary" operator because it takes only one operand, that on its right side. A logical expression may be of any complexity and length but it must be entirely on one line as must all Resampling Stats commands. As usual, the operators may be in upper case, lower case, or mixed case.

In the absence of parentheses, the order in which the logical operators are executed is determined by their default "precedence". Higher precedence operators are executed before lower precedence operators. The operators OR and XOR are of equal, and lowest, precedence. AND is of higher precedence, and NOT is of highest precedence. The following examples will help to clarify the precedence rules. In the examples, the terms "test1" and "test2" etc. are used as a "shorthand" to represent tests of the form "a = b" or "a < b". Thus, a complete logical expression such as

a > b AND c = d OR e between 1 2 AND f < g

can be written like this in that shorthand:

test1 AND test2 OR test3 AND test4

This shorthand is just for the purposes of visually simplifying the following examples. It is not valid Resampling Stats syntax and cannot be used in the Statistics101 program. Using the shorthand, the logical expression

test1 AND test2 OR test3 AND test4

is equivalent to

(test1 AND test2) OR (test3 AND test4)

because of the default precedence rules. The two AND operators will be executed prior to the OR operator. Another example, this time adding a NOT:

test1 AND not test2 OR test3 AND test4

The above logical expression is equivalent to:

(test1 AND (not test2)) OR (test3 AND test4)

A good way to understand how the precedence rules apply in the absence of parentheses that override the defaults is this: first, all NOTs bind to the test that follows them; second, all ANDs bind to their arguments; and third, all ORs and XORs bind to their arguments, which are the results of the first two steps. In the second step, if there are several ANDs in sequence, they are all grouped together, as in this next comparison of equivalent expressions:

test1 AND test2 AND test3 OR test4 AND test5 AND test6

The above logical expression is equivalent to:

(test1 AND test2 AND test3) OR (test4 AND test5 AND test6)

Applying the rules to this next expression which has several ORs in sequence with an AND in the middle:

test1 OR test2 OR test3 AND test4 OR test5 OR test6

results in this equivalent expression, written with parentheses:

test1 OR test2 OR (test3 AND test4) OR test5 OR test6

If you have any uncertainty about how the defaults apply in a complex logical expression that you write, you should use parentheses to force the expression to have the meaning you want.
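
One way to check your understanding of the precedence rules is to model them. The Python sketch below evaluates a shorthand expression (test1, test2, and so on) with NOT binding first, AND second, and OR and XOR last. It is not the Statistics101 parser; it assumes that OR and XOR, being of equal precedence, are applied left to right, and the function names evaluate and equivalent are hypothetical.

```python
from itertools import product


def evaluate(expression, values):
    """Evaluate a shorthand logical expression given a truth value per test name."""
    tokens = expression.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence: OR and XOR, applied left to right
        result = parse_and()
        while peek() in ("OR", "XOR"):
            op = advance().upper()
            right = parse_and()
            result = (result or right) if op == "OR" else (result != right)
        return result

    def parse_and():  # middle precedence
        result = parse_not()
        while peek() == "AND":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():  # highest precedence; also handles parentheses and names
        if peek() is None:
            raise SyntaxError("expression ended unexpectedly")
        if peek() == "NOT":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            result = parse_or()
            if peek() != ")":
                raise SyntaxError("missing closing parenthesis")
            advance()
            return result
        return bool(values[advance()])

    result = parse_or()
    if pos != len(tokens):
        raise SyntaxError("unexpected token: " + tokens[pos])
    return result


def equivalent(expr_a, expr_b, names):
    """True if both expressions agree for every combination of truth values."""
    for combo in product([False, True], repeat=len(names)):
        values = dict(zip(names, combo))
        if evaluate(expr_a, values) != evaluate(expr_b, values):
            return False
    return True


names = ["test1", "test2", "test3", "test4"]
print(equivalent("test1 AND test2 OR test3 AND test4",
                 "(test1 AND test2) OR (test3 AND test4)", names))
print(equivalent("test1 AND test2 OR test3 AND test4",
                 "test1 AND (test2 OR test3) AND test4", names))
```

The first comparison agrees for all sixteen combinations of truth values, while the second does not, which shows that moving the parentheses changes the meaning of the expression.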

Logical Operators

Operator | Precedence | Description

NOT

1 (highest)

Logical NOT. Results in true if the test on its right is false. Results in false if the test on its right is true. Highest precedence of all the logical operators. Example:

IF NOT A = 4

which happens to be equivalent to

IF A <> 4

AND

2 (middle)

Logical AND. Results in true if both the test on its left and the test on its right are true. Results in false otherwise. Precedence is below that of NOT and above that of OR and XOR. Example:

IF A > 5 AND B between 1 2

OR

3 (lowest)

Logical OR. Results in true if either the test on its left is true, or the test on its right is true, or both are true. Results in false otherwise. XOR and OR are of equal precedence. Theirs is the lowest precedence. Example:

IF A < 5 OR B <> C

XOR

3 (lowest)

Logical EXCLUSIVE OR. Results in true if either the test on its left is true or the test on its right is true, but not both. Results in false otherwise. XOR and OR are of equal precedence. Theirs is the lowest precedence. Example:

IF A = 5 XOR B = 15