
6.11 Regular expression

Builtin Class: <regexp>
Regular expression object. You can construct a regexp object from a string at run time with string->regexp. Gauche also has a special syntax for regexp literals, which constructs the regexp object at load time.

Gauche's regexp engine is fully aware of multibyte characters.

Builtin Class: <regmatch>
Regexp match object. The regexp matcher rxmatch returns this object when the match succeeds. The object contains all the information about the match, including submatches.

The advantage of using a match object, rather than substrings or a list of indices, is efficiency. The regmatch object keeps the internal state of the match, and computes indices and/or substrings only when requested. This is particularly effective for multibyte strings, on which index access is slow.

Reader Syntax: #/regexp-spec/
Reader Syntax: #/regexp-spec/i
Denotes a literal regular expression object. When read, it becomes an instance of <regexp>.

If the letter 'i' is given at the end, the created regexp becomes a case-folding regexp, i.e. it matches case-insensitively. (The current version folds case only for ASCII characters; characters beyond ASCII are matched in the same way as in a normal match.)
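
As a small illustration of the trailing 'i' flag, the following sketch matches the same ASCII text with and without case folding. The sample strings are made up for this example; rxmatch and rxmatch-substring are the functions described later in this section.

```scheme
;; Case-folding literal: "hello" matches "HELLO".
(rxmatch-substring (rxmatch #/hello/i "Say HELLO"))  ; => "HELLO"

;; Without the flag the match is case-sensitive and fails.
(rxmatch #/hello/ "Say HELLO")                       ; => #f
```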

The advantage of using this syntax over string->regexp is that the regexp is compiled only once. You can use a literal regexp inside a loop without worrying about regexp compilation overhead. If you want to construct a regexp on the fly, however, use string->regexp.
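
The trade-off between the two forms might look like the sketch below. The helper names count-numeric and make-prefix-matcher are hypothetical, and the second helper assumes that prefix contains no regexp special characters.

```scheme
;; Literal regexp: constructed once when the code is read, so using it
;; inside the loop adds no per-iteration compilation cost.
(define (count-numeric strings)           ; hypothetical helper
  (let loop ((rest strings) (n 0))
    (cond ((null? rest) n)
          ((rxmatch #/^\d+$/ (car rest)) (loop (cdr rest) (+ n 1)))
          (else (loop (cdr rest) n)))))

;; Run-time construction: needed when the pattern is only known at run
;; time. The regexp is built once and then reused by the closure.
(define (make-prefix-matcher prefix)      ; hypothetical helper
  (let ((rx (string->regexp (string-append "^" prefix))))
    (lambda (s) (rxmatch rx s))))

(count-numeric '("123" "abc" "45"))       ; => 2
```

Building the regexp outside the returned lambda keeps the run-time case from recompiling the pattern on every call.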

The recognized syntax is a subset of POSIX extended regular expressions, with a few extensions taken from Perl. Specifically, Gauche supports:

Gauche does not support:

Among those unsupported features, the first three will eventually be supported. Back references, however, are unlikely to be supported. If you use back references, you are no longer dealing with a regular grammar. And if you are dealing with a higher class of grammar, there are more appropriate tools than regular expressions.

Function: string->regexp string &keyword case-fold
Takes string as a regexp specification and constructs an instance of <regexp>.

If a true value is given to the keyword argument case-fold, the created regexp object becomes a case-folding regexp. (See the explanation of case-folding regexps above.)
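
A minimal sketch of the case-fold keyword argument follows. The pattern and input strings are illustrative only.

```scheme
;; Build a case-folding regexp at run time.
(define rx (string->regexp "gauche" :case-fold #t))

(rxmatch-substring (rxmatch rx "I use GAUCHE"))   ; => "GAUCHE"

;; The same specification without case-fold is case-sensitive.
(rxmatch (string->regexp "gauche") "I use GAUCHE") ; => #f
```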

Function: regexp? obj
Returns true iff obj is a regexp object.

Function: regexp->string regexp
Returns a source string describing the regexp regexp. The returned string is immutable.
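
The predicate and the conversion back to a source string can be tried as in this short sketch, which uses made-up patterns.

```scheme
(regexp? #/a+b/)                          ; => #t
(regexp? "a+b")                           ; => #f  (a string, not a regexp)

;; The source string is recovered for literal and run-time regexps alike.
(regexp->string #/a+b/)                   ; => "a+b"
(regexp->string (string->regexp "x|y"))   ; => "x|y"
```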

Function: rxmatch regexp string
Regexp is a regular expression object. The function matches string against regexp. If the match succeeds, it returns a <regmatch> object; otherwise it returns #f.

This is called match, regexp-search or string-match in some other Scheme implementations.
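
Because rxmatch returns #f on failure, its result can be tested directly in a conditional. The sketch below assumes a hypothetical helper find-number and uses rxmatch-substring, which is described later in this section.

```scheme
;; Return the first run of digits in s, or the symbol none.
(define (find-number s)                   ; hypothetical helper
  (let ((m (rxmatch #/\d+/ s)))
    (if m
        (rxmatch-substring m)             ; m is a <regmatch> object
        'none)))                          ; m is #f: no match

(find-number "room 42")                   ; => "42"
(find-number "no digits")                 ; => none
```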

Generic application: regexp string
A regular expression object can be applied directly to a string. This works the same as (rxmatch regexp string), but allows a shorter notation. See section 6.15.2 Applicable objects, for the generic mechanism used to implement this.
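
The following sketch shows the short notation next to the rxmatch call it stands for, and one way it might be combined with a higher-order function. It assumes that filter is available from srfi-1; the sample data is made up.

```scheme
(use srfi-1)   ; assumed source of filter

(define digits #/\d+/)

;; Applying the regexp is equivalent to calling rxmatch.
(rxmatch-substring (digits "room 42"))           ; => "42"
(rxmatch-substring (rxmatch digits "room 42"))   ; => "42"

;; Because a regexp is applicable, it can stand in for a predicate:
;; non-matching strings yield #f and are dropped.
(filter #/^a/ '("apple" "banana" "avocado"))     ; => ("apple" "avocado")
```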

Function: rxmatch-start match &optional (i 0)
Function: rxmatch-end match &optional (i 0)
Function: rxmatch-substring match &optional (i 0)
Match is a match object returned by rxmatch. If i is zero, the functions return the start index, the end index, or the substring of the entire match, respectively. With a positive integer i, they return those of the i-th submatch. It is an error to pass any other value as i.

For convenience, #f may also be passed as match; the functions return #f in that case.

These functions correspond to scsh's match:start, match:end and match:substring.
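
As an illustration, the three accessors can be applied to a match of a two-group pattern against the string "pi=3.14...". Indices are zero-based character positions, and the end index is exclusive.

```scheme
(define m (rxmatch #/(\d+)\.(\d+)/ "pi=3.14..."))

;; Entire match (i = 0, the default).
(rxmatch-start m)         ; => 3
(rxmatch-end m)           ; => 7
(rxmatch-substring m)     ; => "3.14"

;; Submatches.
(rxmatch-substring m 1)   ; => "3"
(rxmatch-start m 2)       ; => 5
(rxmatch-end m 2)         ; => 7
(rxmatch-substring m 2)   ; => "14"

;; #f passes through, so a failed match can be handed over unchecked.
(rxmatch-substring (rxmatch #/\d+/ "no digits"))   ; => #f
```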

Function: rxmatch-after match &optional (i 0)
Function: rxmatch-before match &optional (i 0)
Returns the substring of the input string after or before the match. If the optional argument is given, the i-th submatch is used (the 0-th submatch is the entire match).

(define match (rxmatch #/(\d+)\.(\d+)/ "pi=3.14..."))

(rxmatch-after match) => "..."
(rxmatch-after match 1) => ".14..."

(rxmatch-before match) => "pi="
(rxmatch-before match 2) => "pi=3."

Generic application: regmatch &optional index
Generic application: regmatch 'before &optional index
Generic application: regmatch 'after &optional index
A regmatch object can be applied directly to an integer index, or to the symbol before or after optionally followed by an index. These forms work the same as (rxmatch-substring regmatch index), (rxmatch-before regmatch index), and (rxmatch-after regmatch index), respectively. This allows a shorter notation. See section 6.15.2 Applicable objects, for the generic mechanism used to implement this.

(define match (#/(\d+)\.(\d+)/ "pi=3.14..."))

(match)           => "3.14"
(match 1)         => "3"
(match 2)         => "14"

(match 'after)    => "..."
(match 'after 1)  => ".14..."

(match 'before)   => "pi="
(match 'before 2) => "pi=3."

See also 9.15 gauche.regexp - Regular expression utilities, which defines useful macros and functions to deal with regular expression matching.


This document was generated by Ken Dickey on November, 28 2002 using texi2html