STRING_REPLACE [FIRST] [MATCHCASE] regexString replacementString inputString resultString

The STRING_REPLACE command replaces one or all of the substrings in inputString that match the regular expression in regexString with replacementString, and places the result in resultString. By default, matching ignores letter case; to make the match case-sensitive, add the matchCase keyword. Also by default, every match of regexString is replaced; to replace only the first match, add the first keyword.
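
Because the command accepts Java regular expressions, the effect of the two keywords can be approximated in plain Java. The following is only a sketch: the helper name stringReplace and its boolean parameters are hypothetical, and it assumes the command behaves like Java's Matcher.replaceAll and Matcher.replaceFirst, with case-insensitive matching as the default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringReplaceSketch {
    // Hypothetical stand-in for the command; not its actual implementation.
    // first     -> replace only the first match (the FIRST keyword)
    // matchCase -> match case-sensitively (the MATCHCASE keyword)
    static String stringReplace(boolean first, boolean matchCase,
                                String regex, String replacement, String input) {
        // Assumption: without MATCHCASE the command ignores case.
        int flags = matchCase ? 0 : Pattern.CASE_INSENSITIVE;
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return first ? m.replaceFirst(replacement) : m.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "Cat cat CAT";
        System.out.println(stringReplace(false, false, "cat", "dog", input)); // dog dog dog
        System.out.println(stringReplace(true, false, "cat", "dog", input));  // dog cat CAT
        System.out.println(stringReplace(false, true, "cat", "dog", input));  // Cat dog CAT
        System.out.println(stringReplace(true, true, "cat", "dog", input));   // Cat dog CAT
    }
}
```

The four calls correspond to no keywords, first only, matchCase only, and both keywords together.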

A "regular expression", or "regex", is a string that describes a pattern of characters. The simplest regular expression is just a string of normal characters, such as "this is". When used as the regexString in the STRING_REPLACE command, such a string matches itself literally. For example:

STRING_REPLACE "this is" "that is" "this is a string" result
PRINT result

produces:

that is a string

A regex can also contain "metacharacters", which are characters that have a special meaning. The metacharacters that can be used in a regex are any of these: (.[{\^-$|]})?*+. To use a metacharacter as an ordinary character in your regex string, precede it with a backslash. Note, however, that because backslash is an escape character in Java strings, you need to write two backslashes to represent a single backslash in a regex. For example, to replace the extension (including the dot) in a file name, you must put a backslash ahead of the dot to force the dot to be a normal character:

STRING_REPLACE "\\..+$" ".doc" "myFile.ext" resultString
PRINT resultString

The result is:

myFile.doc

The regex above reads as follows. The two backslashes and the first dot represent a literal dot. The next dot is a metacharacter that means "any character", and the "+" extends it to mean "one or more of any character". The dollar sign signifies the end of the string. So the regex matches "a literal dot followed by one or more characters followed by the end of the string."

This replacement won't work the way you want if there is more than one dot in the file name, because the regex matches from the first dot all the way to the end of the string. For an extension, you want to change only the substring that starts at the last dot. A more general regex, which replaces only the last dot in a string and anything after it, would be:

STRING_REPLACE "\\.[^\\.]+$" ".doc" "myFile.john.html" resultString
PRINT resultString

for which the result is:

myFile.john.doc

The regex in the above command can be read as matching "a dot followed by one or more characters, none of which are dots, up to the end of the string."
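
The difference between the two extension patterns can be checked directly in Java, since the command uses Java regular expressions. This is a sketch only: the class and method names are hypothetical, and String.replaceAll stands in for STRING_REPLACE.

```java
class ExtensionRegexSketch {
    // Replaces whatever the regex matches with ".doc" (hypothetical helper).
    static String replaceExtension(String regex, String fileName) {
        return fileName.replaceAll(regex, ".doc");
    }

    public static void main(String[] args) {
        String fileName = "myFile.john.html";
        // Matches from the FIRST dot to the end of the string.
        System.out.println(replaceExtension("\\..+$", fileName));      // myFile.doc
        // Matches from the LAST dot to the end of the string.
        System.out.println(replaceExtension("\\.[^\\.]+$", fileName)); // myFile.john.doc
    }
}
```

The first pattern swallows ".john.html" because the "any character" dot also matches dots; the character class [^\\.] in the second pattern excludes them.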

Regular expressions are a powerful feature, but they are also a complicated topic that takes a good deal of study. You can find tutorials on Java's regular expressions (which are what the STRING_REPLACE command accepts) here:

Quickstart guide: http://www.regular-expressions.info/quickstart.html

Other tutorials:

http://java.sun.com/docs/books/tutorial/essential/regex/intro.html

http://www.javaregex.com/tutorial.html

Here's a "Cheat sheet" that summarizes the metacharacters and their meanings (only the left-hand side of the sheet is applicable): http://www.omicentral.com/cheatsheets/JavaRegularExpressionsCheatSheet.pdf



See also: STRING, STRING_COMPARE


Here is a short program that you can use to test your regular expressions:

PRINT "========== Test Regex String =========="
INPUT STRING "Enter input string: " inputStr
INPUT STRING "Enter replacement string: " replacementStr
PRINT "----------------------"
quit = 1
UNTIL quit = 0
   STRING "<html>Target String: " inputStr "<br>Repl. String: " replacementStr "<br>" \
     "Enter regular expression to match (or \"quit\" to quit): " prompt
   INPUT STRING prompt regex
   STRING_COMPARE "quit" regex quit
   STRING_REPLACE matchCase regex replacementStr inputStr outputStr
   OUTPUT "--->"
   PRINT outputStr
   PRINT "----------------------"
END
EXIT "Quitting ..."



You can also test Java regular expressions online at http://www.fileformat.info/tool/regex.htm.

The following table shows some typical examples of regex strings and the results they produce in the above program. Assume these inputs:

InputStr: It was the best of times, it was the worst of times.

ReplacementStr: XX

Also notice that MATCHCASE is chosen for the STRING_REPLACE command in the program.

| Regular Expression | Result | Comment |
|--------------------|--------|---------|
| it | It was the best of times, XX was the worst of times. | matchCase causes the first "It" to be ignored because of its capital "I". |
| (?i)it | XX was the best of times, XX was the worst of times. | The (?i) forces the matching engine to ignore case (even though matchCase is present), so both "it"s are replaced. |
| [bw].{1,2}st | It was the XX of times, it was the XX of times. | Matches a "b" or "w", followed by 1 to 2 characters of any kind, followed by "st". Here that matches "best" and "worst". |
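
The table entries above can also be cross-checked outside the test program. The sketch below assumes that STRING_REPLACE with matchCase behaves like Java's String.replaceAll, which is case-sensitive by default; the class and method names are hypothetical.

```java
class RegexTableSketch {
    static final String INPUT = "It was the best of times, it was the worst of times.";

    // Case-sensitive replace-all, standing in for STRING_REPLACE matchCase (an assumption).
    static String replaceMatchCase(String regex) {
        return INPUT.replaceAll(regex, "XX");
    }

    public static void main(String[] args) {
        // Prints one line per table row: the regex, then the resulting string.
        for (String regex : new String[] {"it", "(?i)it", "[bw].{1,2}st"}) {
            System.out.println(regex + " -> " + replaceMatchCase(regex));
        }
    }
}
```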



If you want to delete the parts of a string that match the regex, use "" as the replacement string.
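
As a rough illustration of deleting matches, the following Java sketch removes every occurrence of " of times" from the sample sentence. It assumes an empty replacement string behaves in STRING_REPLACE as it does in Java's String.replaceAll; the class and method names are hypothetical.

```java
class DeleteMatchesSketch {
    // Deletes every match of the regex by replacing it with the empty string.
    static String deleteMatches(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "It was the best of times, it was the worst of times.";
        System.out.println(deleteMatches(" of times", input)); // It was the best, it was the worst.
    }
}
```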