
9.15 gauche.regexp - Regular expression utilities

Module: gauche.regexp
This module defines macros and utilities useful for regexp matching. See 6.11 Regular expression for the builtin regexp features.

As of release 0.4.11, this module is set to be autoloaded in gosh, so you don't usually need to say (use gauche.regexp).

The interface of some of the macros (if-match, let-match and match-cond) is borrowed from scsh, but I changed the names, since names such as match-cond can be confusing: Bigloo, for example, has match-lambda and match-case in its pattern matching library, and those names sound too similar.

In the following macros, match-expr is an expression that produces a match object or #f. Typically it is a call to rxmatch, but it can be any expression.

Macro: rxmatch-let match-expr (var ...) form ...

Evaluates match-expr, and if it matched, binds var ... to the matched strings, then evaluates forms. The first var receives the entire match, and subsequent variables receive submatches. If the number of submatches is smaller than the number of variables to receive them, the remaining variables get #f.

It is possible to put #f in a variable position, which means you don't care about that match.

 
(rxmatch-let (rxmatch #/(\d+):(\d+):(\d+)/
                      "Jan  1 23:59:58, 2001")
   (time hh mm ss)
  (list time hh mm ss))
 => ("23:59:58" "23" "59" "58")

(rxmatch-let (rxmatch #/(\d+):(\d+):(\d+)/
                      "Jan  1 23:59:58, 2001")
   (#f hh mm)
  (list hh mm))
 => ("23" "59")

This macro corresponds to scsh's let-match.
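
A related case is a variable bound to an optional group that did not take part in the match. The following is a minimal sketch, assuming such a group yields #f (as rxmatch-substring does) and relying on the autoload described above; split-decimal is a hypothetical helper, not part of this module.

```scheme
;; Hypothetical helper (not part of gauche.regexp): splits a decimal
;; literal into the whole match, the integer part, and the optional
;; fraction.  Signals an error if str contains no digits, because
;; rxmatch-let requires a successful match.
(define (split-decimal str)
  (rxmatch-let (rxmatch #/(\d+)(\.\d+)?/ str)
      (all int frac)
    (list all int frac)))

(split-decimal "3.14")  ; => ("3.14" "3" ".14")
(split-decimal "42")    ; => ("42" "42" #f)
```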

Macro: rxmatch-if match-expr (var ...) then-form else-form
Evaluates match-expr, and if it matched, binds var ... to the matched strings and evaluates then-form. Otherwise evaluates else-form. The rule for binding vars is the same as in rxmatch-let.

 
(rxmatch-if (rxmatch #/(\d+:\d+)/ "Jan 1 11:22:33")
    (time)
  (format #f "time is ~a" time)
  "unknown time")
 => "time is 11:22"

(rxmatch-if (rxmatch #/(\d+:\d+)/ "Jan 1 11-22-33")
    (time)
  (format #f "time is ~a" time)
  "unknown time")
 => "unknown time"

This macro corresponds to scsh's if-match.
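
Since the else-form covers the failed match, rxmatch-if fits naturally into small parsing procedures. The sketch below is only an illustration; parse-pair is a hypothetical name, and the first variable is #f because the entire match is not needed.

```scheme
;; Hypothetical helper: parses "key=value" into a pair, or returns #f
;; when str does not have that shape.
(define (parse-pair str)
  (rxmatch-if (rxmatch #/^(\w+)=(.*)$/ str)
      (#f key val)
    (cons key val)
    #f))

(parse-pair "lang=scheme")   ; => ("lang" . "scheme")
(parse-pair "no separator")  ; => #f
```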

Macro: rxmatch-cond clause ...
Evaluates the condition of each clause in order. If the condition of a clause is satisfied, the rest of that clause is evaluated and its value becomes the result of rxmatch-cond. A clause may be one of the following patterns.

(match-expr (var ...) form ...)
Evaluate match-expr, which may return a regexp match object or #f. If it returns a match object, the matches are bound to vars, like rxmatch-let, and forms are evaluated.

(test expr form ...)
Evaluates expr. If it yields true, evaluates forms.

(test expr => proc)
Evaluates expr and if it is true, calls proc with the result of expr as the only argument.

(else form ...)
If this clause exists, it must be the last clause. If other clauses fail, forms are evaluated.

If no else clause exists and no other clause succeeds, an undefined value is returned.

 
;; parses several possible date formats
(define (parse-date str)
  (rxmatch-cond
    ((rxmatch #/^(\d\d?)\/(\d\d?)\/(\d\d\d\d)$/ str)
        (#f mm dd yyyy)
      (map string->number (list yyyy mm dd)))
    ((rxmatch #/^(\d\d\d\d)\/(\d\d?)\/(\d\d?)$/ str)
        (#f yyyy mm dd)
      (map string->number (list yyyy mm dd)))
    ((rxmatch #/^\d+\/\d+\/\d+$/ str)
        (#f)
     (error "ambiguous: ~s" str))
    (else (error "bogus: ~s" str))))

(parse-date "2001/2/3") => (2001 2 3)
(parse-date "12/25/1999") => (1999 12 25)

This macro corresponds to scsh's match-cond.
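
The parse-date example does not use the test clauses. The following sketch mixes a match clause with a (test expr => proc) clause; month-table and parse-month are hypothetical names introduced only for illustration.

```scheme
;; Hypothetical lookup table of month abbreviations.
(define month-table '(("jan" . 1) ("feb" . 2) ("mar" . 3)))

;; Accepts either a one- or two-digit number or a known abbreviation.
(define (parse-month str)
  (rxmatch-cond
    ((rxmatch #/^\d\d?$/ str)
        (num)
      (string->number num))
    ;; assoc returns the matching entry or #f; on success the entry
    ;; is passed to cdr.
    (test (assoc str month-table) => cdr)
    (else #f)))

(parse-month "07")   ; => 7
(parse-month "feb")  ; => 2
(parse-month "xyz")  ; => #f
```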

Macro: rxmatch-case string-expr clause ...
String-expr is evaluated, and clauses are interpreted one by one. A clause may be one of the following patterns.

(re (var ...) form ...)
Re must be either a literal string describing a regexp, or a regexp object. If it matches with the result of string-expr, the match result is bound to vars and forms are evaluated, and rxmatch-case returns the result of the last form.

If re doesn't match the result of string-expr, or string-expr yields a non-string value, the interpretation proceeds to the next clause.

(test proc form ...)
A procedure proc is applied to the result of string-expr. If it yields a true value, forms are evaluated, and rxmatch-case returns the result of the last form.

If proc yields #f, the interpretation proceeds to the next clause.

(test proc => proc2)
A procedure proc is applied to the result of string-expr. If it yields a true value, proc2 is applied to that value, and its result is returned as the result of rxmatch-case.

If proc yields #f, the interpretation proceeds to the next clause.

(else form ...)
This form must appear at the end of clauses, if any. If other clauses fail, forms are evaluated, and the result of the last form becomes the result of rxmatch-case.

If no else clause exists, and no other clause matched the string-expr, an undefined value is returned.

The parse-date example above becomes simpler if you use rxmatch-case:
 
(define (parse-date2 str)
  (rxmatch-case str
    (test (lambda (s) (not (string? s))) #f)
    (#/^(\d\d?)\/(\d\d?)\/(\d\d\d\d)$/ (#f mm dd yyyy)
     (map string->number (list yyyy mm dd)))
    (#/^(\d\d\d\d)\/(\d\d?)\/(\d\d?)$/ (#f yyyy mm dd)
     (map string->number (list yyyy mm dd)))
    (#/^\d+\/\d+\/\d+$/                (#f)
     (error "ambiguous: ~s" str))
    (else (error "bogus: ~s" str))))
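
The (test proc => proc2) clause is not exercised by parse-date2. A minimal sketch of it follows; to-seconds is a hypothetical procedure, and the first clause returns the number itself so that proc2 receives a useful value rather than #t.

```scheme
;; Hypothetical helper: converts a number, "MM:SS", or "SS" to seconds.
(define (to-seconds x)
  (rxmatch-case x
    ;; proc returns x itself when x is a number; proc2 gets that value.
    (test (lambda (v) (and (number? v) v)) => (lambda (n) n))
    (#/^(\d+):(\d+)$/ (#f mm ss)
     (+ (* 60 (string->number mm)) (string->number ss)))
    (#/^\d+$/ (s)
     (string->number s))
    (else #f)))

(to-seconds 90)      ; => 90
(to-seconds "1:30")  ; => 90
(to-seconds "45")    ; => 45
(to-seconds "abc")   ; => #f
```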

Function: regexp-replace regexp string substitution
Function: regexp-replace-all regexp string substitution
Replaces the part of string that matches regexp with substitution. regexp-replace replaces only the first match of regexp, while regexp-replace-all repeats the replacement throughout the entire string.

substitution may be a string or a procedure. If it is a string, it can contain a digit sequence preceded by a backslash (e.g. \2) that refers to a submatch; \0 refers to the entire match. Note that you need two backslashes in a string literal to produce one backslash character, so such a reference is written as "\\2" in source code. If you want to include a backslash character itself in the substitution, you need four backslashes.

 
(regexp-replace #/def|DEF/ "abcdefghi" "...")
  => "abc...ghi"
(regexp-replace #/def|DEF/ "abcdefghi" "|\\0|")
  => "abc|def|ghi"
(regexp-replace #/def|DEF/ "abcdefghi" "|\\\\0|")
  => "abc|\\0|ghi"
(regexp-replace #/c(.*)g/ "abcdefghi" "|\\1|")
  => "ab|def|hi"
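
The examples above all use regexp-replace. The following sketch contrasts it with regexp-replace-all on the same input; the variable names are hypothetical and exist only to hold the results.

```scheme
;; Only the first "a" is replaced.
(define first-only (regexp-replace #/a/ "banana" "o"))
;; Every "a" is replaced.
(define every-one (regexp-replace-all #/a/ "banana" "o"))
;; Submatch references work for each match; note the doubled backslashes.
(define swapped
  (regexp-replace-all #/(\d+)-(\d+)/ "1-2 and 30-40" "\\2-\\1"))

first-only  ; => "bonana"
every-one   ; => "bonono"
swapped     ; => "2-1 and 40-30"
```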

If substitution is a procedure, it is called for every match in string with one argument, a regexp match object. The value returned from the procedure is inserted into the output string using display.

 
(regexp-replace #/c(.*)g/ "abcdefghi" 
                (lambda (m)
                  (list->string
                   (reverse
                    (string->list (rxmatch-substring m 1))))))
 => "abfedhi"
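
Because the returned value is written with display, the procedure does not have to return a string. A minimal sketch with regexp-replace-all, where double-numbers is a hypothetical helper:

```scheme
;; Hypothetical helper: doubles every run of digits in str.
;; The lambda returns a number, which is displayed into the output.
(define (double-numbers str)
  (regexp-replace-all #/\d+/ str
    (lambda (m)
      (* 2 (string->number (rxmatch-substring m 0))))))

(double-numbers "a1b22c333")  ; => "a2b44c666"
```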



This document was generated by Ken Dickey on November, 28 2002 using texi2html