A regexp object can be constructed from a string with string->regexp at
run time. Gauche also has a special syntax for regexp literals, which
constructs the regexp object at load time.
Gauche's regexp engine is fully aware of multibyte characters.
rxmatch returns a <regmatch> object when the match succeeds.
This object contains all the information
about the match, including submatches.
The advantage of using a match object, rather than substrings or a list of indices, is efficiency. The regmatch object keeps the internal state of the match, and computes indices and/or substrings only when requested. This is particularly effective for multibyte strings, because index access is slow on them.
#/regexp-spec/
#/regexp-spec/i
Denotes a literal regexp object. When read, it becomes an instance of <regexp>.
If the letter 'i' is given at the end, the created regexp
becomes a case-folding regexp, i.e. it matches case-insensitively.
(The current version applies case folding only to ASCII characters;
characters beyond ASCII are matched in the same way as in a normal match.)
The advantage of using this syntax over string->regexp is
that the regexp is compiled only once. You can use a literal regexp
inside a loop without worrying about regexp compilation overhead.
If you want to construct a regexp on the fly, however, use string->regexp.
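As a rough illustration of the point about loops, the sketch below uses a literal regexp inside a named-let loop. The procedure name count-numeric and the sample data are made up for this example; only rxmatch and the literal syntax come from the text above.

```scheme
;; Hypothetical helper: count the strings that contain a run of digits.
;; The literal #/\d+/ is compiled once when this code is read, not on
;; every iteration of the loop.
(define (count-numeric strs)
  (let loop ((rest strs) (n 0))
    (cond ((null? rest) n)
          ((rxmatch #/\d+/ (car rest)) (loop (cdr rest) (+ n 1)))
          (else (loop (cdr rest) n)))))

(count-numeric '("abc" "a1" "22" ""))   ; => 2
```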
The recognized syntax is a subset of POSIX extended regular expressions, with a few extensions taken from Perl. Specifically, Gauche supports:
Repetition with *, + and ?.
Grouping with (). No backslash escape is needed.
Alternation with |.
Character classes with []. The syntax recognized by character set
literals (see section 6.9 Character Set) is valid.
Beginning of the string ^ and end of the string $ assertions.
\d for digits, \D for non-digits, \w for alphanumeric
characters, \W for non-alphanumeric characters, \s for
whitespace characters, and \S for non-whitespace characters.
These can be used both inside and outside of [].
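The following sketch exercises each supported construct once. The helper matched-text is a name invented for this example; it simply returns the text of the whole match, or #f when there is none.

```scheme
;; Hypothetical helper: text of the entire match, or #f if no match.
;; rxmatch-substring accepts #f and returns #f in that case.
(define (matched-text rx str)
  (rxmatch-substring (rxmatch rx str)))

(matched-text #/ab*c/ "xxacxx")               ; => "ac"         repetition *
(matched-text #/colou?r/ "my color")          ; => "color"      repetition ?
(matched-text #/(foo|bar)+/ "--foobarfoo--")  ; => "foobarfoo"  grouping, alternation
(matched-text #/[a-c]+/ "xxabcabxx")          ; => "abcab"      character class
(matched-text #/^\d+/ "123abc")               ; => "123"        ^ assertion
(matched-text #/\w+$/ "foo bar")              ; => "bar"        $ assertion
(matched-text #/\s\S+/ "ab cd")               ; => " cd"        \s and \S
(matched-text #/[\d.]+/ "pi=3.14")            ; => "3.14"       \d inside []
(matched-text #/^\d+/ "abc")                  ; => #f           no match
```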
Gauche does not support:
Character equivalence classes and collating elements, such as [=e=] and [.ll.].
Bounded repetition {m,n}.
Back references.
Among these unsupported features, all except back references will eventually be supported. Back references, however, are unlikely to be supported. If you use back references, you are no longer dealing with a regular grammar. And if you are dealing with a higher class of grammar, there are more appropriate tools than regular expressions.
string->regexp takes a string as a regexp specification and constructs a <regexp> object.
If a true value is given to the keyword argument case-fold, the created regexp object becomes a case-folding regexp. (See the explanation of case-folding regexps above.)
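A minimal sketch of run-time construction with the case-fold keyword follows. The procedure make-word-matcher is hypothetical, and it assumes the given word contains no regexp metacharacters, since it does no escaping.

```scheme
;; Hypothetical helper: build a regexp that matches exactly WORD.
;; Assumes WORD has no regexp metacharacters (no escaping is done).
(define (make-word-matcher word fold?)
  (string->regexp (string-append "^" word "$") :case-fold fold?))

(define exact  (make-word-matcher "gauche" #f))
(define folded (make-word-matcher "gauche" #t))

(if (rxmatch exact  "gauche") #t #f)   ; => #t
(if (rxmatch exact  "GAUCHE") #t #f)   ; => #f
(if (rxmatch folded "GAUCHE") #t #f)   ; => #t
```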
rxmatch matches string against regexp. If it matches, the function returns a <regmatch>
object. Otherwise it returns #f.
This is called match, regexp-search or string-match
in some other Scheme implementations.
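Because rxmatch returns either a match object or #f, it combines naturally with the => clause of cond. The sketch below shows one way to use that; parse-version is a name invented for this example.

```scheme
;; Hypothetical helper: extract "major.minor" numbers from a string.
;; Returns a list of two numbers, or #f when the pattern is not found.
(define (parse-version str)
  (cond ((rxmatch #/(\d+)\.(\d+)/ str)
         => (lambda (m)
              (list (string->number (rxmatch-substring m 1))
                    (string->number (rxmatch-substring m 2)))))
        (else #f)))

(parse-version "gauche 0.5 release")   ; => (0 5)
(parse-version "no version here")      ; => #f
```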
A regexp object can be applied directly to a string.
This works the same as (rxmatch regexp string),
but allows a shorter notation. See section 6.15.2 Applicable objects, for
the generic mechanism used to implement this.
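One consequence of regexps being applicable is that a regexp can be passed wherever a one-argument predicate on strings is expected. The sketch below assumes filter from SRFI-1; the variable names are made up for illustration.

```scheme
(use srfi-1)   ; for filter

;; Applying the regexp directly is equivalent to calling rxmatch.
(define direct (#/\d+/ "abc123"))
(rxmatch-substring direct)             ; => "123"

;; The regexp itself serves as the predicate: it returns a match
;; object (a true value) or #f for each element.
(define a-words (filter #/^a/ '("apple" "banana" "avocado")))
a-words                                ; => ("apple" "avocado")
```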
rxmatch-start, rxmatch-end and rxmatch-substring take a match object match
returned by rxmatch, and an optional index i.
If i equals zero, the functions return the
start, the end or the substring of the entire match, respectively.
With a positive integer i, they return those of the i-th
submatch. It is an error to pass other values to i.
It is allowed to pass #f to match for convenience.
The functions return #f in that case.
These functions correspond to scsh's match:start, match:end
and match:substring.
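The index argument can be illustrated as follows. Positions are character indices into the original string, counted from zero; the variable name m is arbitrary.

```scheme
;; "pi=3.14..." : the whole match "3.14" spans indices 3 to 7.
(define m (rxmatch #/(\d+)\.(\d+)/ "pi=3.14..."))

(rxmatch-start m)          ; => 3
(rxmatch-end m)            ; => 7
(rxmatch-substring m)      ; => "3.14"

(rxmatch-start m 1)        ; => 3    first submatch "3"
(rxmatch-end m 1)          ; => 4
(rxmatch-substring m 2)    ; => "14" second submatch

(rxmatch-substring #f)     ; => #f   #f is accepted for convenience
```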
(define match (rxmatch #/(\d+)\.(\d+)/ "pi=3.14..."))
(rxmatch-after match) => "..."
(rxmatch-after match 1) => ".14..."
(rxmatch-before match) => "pi="
(rxmatch-before match 2) => "pi=3."
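regmatch &optional index
regmatch 'before &optional index
regmatch 'after &optional index
A regmatch object can be applied directly to an integer index, or to the symbol
before or after.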
These work the same as (rxmatch-substring regmatch index),
(rxmatch-before regmatch), and
(rxmatch-after regmatch), respectively.
This allows a shorter notation. See section 6.15.2 Applicable objects, for
the generic mechanism used to implement this.
(define match (#/(\d+)\.(\d+)/ "pi=3.14..."))
(match) => "3.14"
(match 1) => "3"
(match 2) => "14"
(match 'after) => "..."
(match 'after 1) => ".14..."
(match 'before) => "pi="
(match 'before 2) => "pi=3."
See also 9.15 gauche.regexp - Regular expression utilities, which defines
useful macros and functions to deal with regular expression matching.