The STRING_REPLACE command replaces one or all substrings of inputString that match the regular expression in regexString with replacementString. The result is placed in resultString. By default, the match ignores case; to make the match case-sensitive, add the matchCase keyword. Also by default, every match of regexString is replaced; to replace only the first match, add the keyword first.

A "regular expression", or "regex", is a string that describes a pattern of characters. The simplest regular expression is just a string of normal characters, such as "this is". When such a string is used as the regexString in the STRING_REPLACE command, it represents itself. For example:

    STRING_REPLACE "this is" "that is" "this is a string" result
    PRINT result

produces:

    that is a string
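Because STRING_REPLACE accepts Java regular expressions, the defaults described above can be pictured with plain java.util.regex calls. The following is only a sketch under that assumption: the helper name stringReplace and its boolean parameters are hypothetical stand-ins for the matchCase and first keywords, not the command's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringReplaceSketch {
    // Hypothetical helper (not part of the scripting language) that mirrors
    // the documented options: matchCase = case-sensitive, firstOnly = "first".
    public static String stringReplace(String regex, String replacement, String input,
                                       boolean matchCase, boolean firstOnly) {
        // Default behavior ignores case, so the flag is set unless matchCase is requested.
        int flags = matchCase ? 0 : Pattern.CASE_INSENSITIVE;
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        // Default behavior replaces every match; "first" replaces only the first one.
        return firstOnly ? matcher.replaceFirst(replacement) : matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "This is what this is";
        // Defaults: ignore case, replace all -> that is what that is
        System.out.println(stringReplace("this is", "that is", input, false, false));
        // matchCase: only the lowercase occurrence matches -> This is what that is
        System.out.println(stringReplace("this is", "that is", input, true, false));
        // first: only the first (case-insensitive) match -> that is what this is
        System.out.println(stringReplace("this is", "that is", input, false, true));
    }
}
```

In Java, the characters "$" and "\" have special meaning inside the replacement string; whether STRING_REPLACE treats its replacementString the same way is not stated here.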
A regex can also contain "metacharacters".
Metacharacters are characters that have special meaning. The
metacharacters that can be used in a regex are any of these:
    STRING_REPLACE "\\..+$" ".doc" "myFile.ext" resultString
    PRINT resultString

The result is:

    myFile.doc

The regex above reads as follows: the two backslashes and the first dot represent a literal dot. The next dot is a metacharacter that represents "any character". The "+" extends the "any character" dot to mean one or more characters, and the dollar sign signifies the end of the string. So you can read the regex as matching "a literal dot followed by one or more characters followed by the end of the string."

The above replacement won't work the way you want if there is more than one dot in the filename, because the regex matches from the first dot all the way to the end of the string. For an extension, you want to change only the substring after the last dot. A more general regex, which replaces only the last dot in a string and anything after it, would be:

    STRING_REPLACE "\\.[^\\.]+$" ".doc" "myFile.john.html" resultString
    PRINT resultString

for which the result is:

    myFile.john.doc

The regex in the above command can be read as matching "a dot followed by one or more characters, none of which are dots, up to the end of the string."

Regular expressions are a powerful feature, but a complicated topic that takes a good deal of study. You can find tutorials on Java's regular expressions (which are what the STRING_REPLACE command accepts) here:

Quickstart guide:
    http://www.regular-expressions.info/quickstart.html

Other tutorials:
    http://java.sun.com/docs/books/tutorial/essential/regex/intro.html
    http://www.javaregex.com/tutorial.html

Here's a "cheat sheet" that summarizes the metacharacters and their meanings (only the left-hand side of the sheet is applicable):
    http://www.omicentral.com/cheatsheets/JavaRegularExpressionsCheatSheet.pdf
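The difference between the two extension-replacing patterns discussed above can be checked directly in Java, since the same regex syntax applies. This is a minimal sketch; the class and method names are made up for illustration, and String.replaceAll is case-sensitive, which makes no difference for these particular patterns.

```java
public class ExtensionRegexDemo {
    // First pattern: a literal dot, then one or more of any character, to the end.
    // The match starts at the FIRST dot in the name.
    public static String replaceFromFirstDot(String fileName, String newExt) {
        return fileName.replaceAll("\\..+$", newExt);
    }

    // Second pattern: a literal dot, then one or more non-dot characters, to the end.
    // Only the LAST dot and what follows it can satisfy this.
    public static String replaceLastExtension(String fileName, String newExt) {
        return fileName.replaceAll("\\.[^\\.]+$", newExt);
    }

    public static void main(String[] args) {
        System.out.println(replaceFromFirstDot("myFile.ext", ".doc"));        // myFile.doc
        System.out.println(replaceFromFirstDot("myFile.john.html", ".doc"));  // myFile.doc
        System.out.println(replaceLastExtension("myFile.john.html", ".doc")); // myFile.john.doc
    }
}
```

The second call shows the problem described above: with more than one dot, the first pattern discards ".john" along with the extension.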
See also: STRING, STRING_COMPARE
Here is a short program that you can use to test your regular expressions:

    PRINT "========== Test Regex String =========="
    INPUT STRING "Enter input string: " inputStr
    INPUT STRING "Enter replacement string: " replacementStr
    PRINT "----------------------"
    quit = 1
    UNTIL quit = 0
        STRING "<html>Target String: " inputStr "<br>Repl. String: " replacementStr "<br>" \
            "Enter regular expression to match (or \"quit\" to quit): " prompt
        INPUT STRING prompt regex
        STRING_COMPARE "quit" regex quit
        STRING_REPLACE matchCase regex replacementStr inputStr outputStr
        OUTPUT "--->"
        PRINT outputStr
        PRINT "----------------------"
    END
    EXIT "Quitting ..."
You can also test Java regular expressions online at http://www.fileformat.info/tool/regex.htm.

The following table shows some typical examples of regex strings and their results using the above program. Assume these inputs:

    InputStr: It was the best of times, it was the worst of times.
    ReplacementStr: XX

Note also that the program uses the matchCase keyword in its STRING_REPLACE command.
If you want to delete the parts of a string that match the regex, use "" as the replacement string.
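Deleting matches with an empty replacement behaves the same way in Java's own regex engine, so the idea can be tried there. The sketch below uses the sample sentence from this page and matches case-sensitively, as the test program does with matchCase; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DeleteMatchesDemo {
    // Replacing every match with the empty string removes the matched text.
    public static String deleteMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "It was the best of times, it was the worst of times.";
        // Removes both occurrences -> It was the best, it was the worst.
        System.out.println(deleteMatches(" of times", input));
        // Case-sensitive: only the lowercase "it was " is removed
        // -> It was the best of times, the worst of times.
        System.out.println(deleteMatches("it was ", input));
    }
}
```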