
6.8 Characters

Builtin Class: <char>

Reader Syntax: #\charname
[R5RS] Denotes a literal character.

When the reader reads #\, it fetches the subsequent character. If that character is one of ()[]{}" \|;#, the literal denotes that character itself. Otherwise, the reader keeps reading characters until it sees a non-word-constituent character. If only one character was read, the literal denotes that character. Otherwise, the reader matches the characters read against the predefined character names; if none of them matches, an error is signalled.

The following character names are recognized. These character names are case insensitive.

space
Whitespace (ASCII #x20)
newline, nl, lf
Newline (ASCII #x0a)
return, cr
Carriage return (ASCII #x0d)
tab, ht
Horizontal tab (ASCII #x09)
page
Form feed (ASCII #x0c)
escape, esc
Escape (ASCII #x1b)
delete, del
Delete (ASCII #x7f)
null
NUL character (ASCII #x00)
xN
A character whose internal encoding is the integer N, where N is a hexadecimal integer. Note that this notation is not portable across different internal encoding schemes, except within the ASCII character range.
uN
A character whose UCS codepoint is the integer N, where N is a 4-digit or 8-digit hexadecimal number. If Gauche is compiled with an internal encoding other than UTF-8, the reader uses the gauche.charconv module to convert the Unicode codepoint to the internal character code. Note that the specified character may not be defined in the internal encoding; in that case, either a substitution character is used or an error is signalled.

 
#\newline => #\newline ; newline character
#\x0a     => #\newline ; ditto
#\x41     => #\A       ; ASCII letter 'A'
#\u0041   => #\A       ; ASCII letter 'A', specified by UCS
#\u3042   => #\あ       ; Hiragana letter A, specified by UCS
#\u0002a6b2 => ; JISX0213 Kanji 2-94-86, specified by UCS4

You can also write a multibyte character directly after #\ (for example, #\あ) if the program text is written in the same encoding as the internal character encoding.
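The named, hexadecimal (#\xN) and UCS (#\uN) notations are only different spellings of the same character object, which can be checked with char=?. The following is a minimal sketch; the names reader-spellings and all-spellings-agree? are hypothetical, and only ASCII-range characters are used so that #\xN means the same thing under every internal encoding.

```scheme
;; Each pair holds two different reader spellings of one character.
;; reader-spellings and all-spellings-agree? are illustrative names only.
(define reader-spellings
  (list (cons #\newline #\x0a)     ; character name vs. internal code
        (cons #\space   #\x20)
        (cons #\tab     #\x09)
        (cons #\A       #\x41)     ; single character vs. internal code
        (cons #\A       #\u0041))) ; single character vs. UCS codepoint

;; Returns #t when both spellings in every pair denote the same character.
(define (all-spellings-agree? pairs)
  (let loop ((ps pairs))
    (cond ((null? ps) #t)
          ((char=? (car (car ps)) (cdr (car ps))) (loop (cdr ps)))
          (else #f))))

(all-spellings-agree? reader-spellings)  ; => #t
```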

Function: char? obj
[R5RS] Returns #t if obj is a character, #f otherwise.

Function: char=? char1 char2
Function: char<? char1 char2
Function: char<=? char1 char2
Function: char>? char1 char2
Function: char>=? char1 char2
[R5RS] Compares characters. The comparison is done on the characters' internal encodings, i.e. the values returned by char->integer.
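Because the ordering follows the internal codes, upper-case ASCII letters sort before lower-case ones. A minimal sketch, using a hypothetical helper char-list-sorted? and only ASCII characters (whose codes are the same in all the encodings listed under gauche-character-encoding):

```scheme
;; char-list-sorted? is a hypothetical helper: #t if each character is
;; char<=? its successor, i.e. the list is in internal-code order.
(define (char-list-sorted? chars)
  (or (null? chars)
      (null? (cdr chars))
      (and (char<=? (car chars) (cadr chars))
           (char-list-sorted? (cdr chars)))))

(char-list-sorted? (string->list "ABCabc"))  ; => #t
(char-list-sorted? (string->list "abcABC"))  ; => #f
(char<? #\Z #\a)                             ; => #t  (#x5a < #x61)
```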

Function: char-ci=? char1 char2
Function: char-ci<? char1 char2
Function: char-ci<=? char1 char2
Function: char-ci>? char1 char2
Function: char-ci>=? char1 char2
[R5RS] Compares characters in a case-insensitive way. In the current version, character case is not well defined outside the ASCII character range.
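A typical use is a case-insensitive search. The sketch below uses a hypothetical helper count-char-ci and keeps its input within ASCII, since case is not well defined outside that range in this version. The last two lines contrast char-ci<? with the plain, code-based char<?.

```scheme
;; count-char-ci is a hypothetical helper: counts the characters in str
;; that are char-ci=? to ch.  Inputs stay within ASCII on purpose.
(define (count-char-ci ch str)
  (let loop ((cs (string->list str)) (n 0))
    (cond ((null? cs) n)
          ((char-ci=? ch (car cs)) (loop (cdr cs) (+ n 1)))
          (else (loop (cdr cs) n)))))

(count-char-ci #\a "Banana AND apple")  ; => 5
(char-ci<? #\a #\B)                     ; => #t  (compared as a < b)
(char<? #\a #\B)                        ; => #f  (#x61 > #x42)
```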

Function: char-alphabetic? char
Function: char-numeric? char
Function: char-whitespace? char
Function: char-upper-case? char
Function: char-lower-case? char
[R5RS] Returns true if the character char is, respectively, an alphabetic character ([A-Za-z]), a numeric character ([0-9]), a whitespace character, an upper-case character, or a lower-case character. Currently, these procedures work only for ASCII characters; they return #f for all other characters.
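These predicates combine naturally into a small classifier. A minimal sketch with a hypothetical helper classify-char; the input is ASCII only, because the procedures return #f for every non-ASCII character in this version.

```scheme
;; classify-char is a hypothetical helper mapping a character to a
;; symbol via the R5RS predicates.  Order matters: upper/lower case are
;; checked first, so alphabetic characters never reach 'other.
(define (classify-char c)
  (cond ((char-upper-case? c) 'upper)
        ((char-lower-case? c) 'lower)
        ((char-numeric? c)    'digit)
        ((char-whitespace? c) 'space)
        (else                 'other)))

(map classify-char (string->list "A1 b!"))
;; => (upper digit space lower other)
(char-alphabetic? #\7)  ; => #f
```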

Function: char->integer char
Function: integer->char n
[R5RS] char->integer returns an exact integer that represents the internal encoding of the character char. integer->char returns the character whose internal encoding is the exact integer n. The following expression is always true for a valid character char:
 
(eq? char (integer->char (char->integer char)))

The result is undefined if you pass integer->char an n that has no corresponding character.
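Arithmetic on the integers returned by char->integer is a common way to transform characters. The sketch below implements ROT13 with a hypothetical helper rot13-char; it assumes that the ASCII letters have their ASCII codes in the internal encoding (true of the encodings listed under gauche-character-encoding), and it only ever passes integer->char codes of ASCII letters, so the undefined case above never arises.

```scheme
;; rot13-char is a hypothetical helper: rotates ASCII letters by 13
;; positions using char->integer / integer->char arithmetic and leaves
;; every other character unchanged.
(define (rot13-char c)
  (define (rotate base)
    (integer->char (+ base (modulo (+ (- (char->integer c) base) 13) 26))))
  (cond ((and (char<=? #\a c) (char<=? c #\z)) (rotate (char->integer #\a)))
        ((and (char<=? #\A c) (char<=? c #\Z)) (rotate (char->integer #\A)))
        (else c)))

(list->string (map rot13-char (string->list "Hello, Gauche!")))
;; => "Uryyb, Tnhpur!"
```

Applying rot13-char twice gives back the original character, which is a quick sanity check for the arithmetic.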

Function: char->ucs char
Function: ucs->char n
Converts a character char to an integer UCS codepoint, and an integer UCS codepoint n to a character, respectively.

If Gauche is compiled with UTF-8 encoding, these procedures are the same as char->integer and integer->char.

When Gauche's internal encoding differs from UTF-8, these procedures implicitly load the gauche.charconv module to convert the internal character code to UCS or vice versa (see section 9.2 gauche.charconv - Character Code Conversion). If char has no corresponding UCS codepoint, char->ucs returns #f. If the UCS codepoint n can't be represented in the internal character encoding, ucs->char returns #f, unless the conversion routine provides a substitution character.
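Since both procedures may return #f under a non-UTF-8 internal encoding, callers should check their results. A minimal sketch with a hypothetical helper ucs-notation that renders a character in the conventional "U+XXXX" form:

```scheme
;; ucs-notation is a hypothetical helper that renders a character as
;; "U+XXXX" (at least four upper-case hex digits), or returns #f when
;; char->ucs reports that the character has no UCS codepoint.
(define (ucs-notation c)
  (let ((code (char->ucs c)))
    (and code
         (let* ((hex (list->string
                      (map char-upcase
                           (string->list (number->string code 16)))))
                (pad (max 0 (- 4 (string-length hex)))))
           (string-append "U+" (make-string pad #\0) hex)))))

(ucs-notation #\A)        ; => "U+0041"
(ucs-notation #\newline)  ; => "U+000A"

;; ucs->char may return #f, so check before passing the result on.
(let ((c (ucs->char #x3042)))  ; Hiragana letter A
  (and c (ucs-notation c)))    ; "U+3042" if the encoding has this character
```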

Function: char-upcase char
Function: char-downcase char
[R5RS] Returns the upper-case and lower-case counterpart of char, respectively. If char has no upper/lower case distinction, char itself is returned.

In the current version, character cases are not well defined outside the ASCII character range.
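A small example of both procedures together: capitalizing a word. The helper capitalize-ascii is hypothetical and expects ASCII text; digits and punctuation pass through unchanged because they have no case distinction.

```scheme
;; capitalize-ascii is a hypothetical helper: upcases the first
;; character and downcases the rest of an ASCII string.
(define (capitalize-ascii str)
  (let ((cs (string->list str)))
    (if (null? cs)
        str
        (list->string (cons (char-upcase (car cs))
                            (map char-downcase (cdr cs)))))))

(capitalize-ascii "gAUCHE-0.6")  ; => "Gauche-0.6"
```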

Function: digit->integer char &optional (radix 10)
If the character char is a valid digit in radix radix, the corresponding integer is returned. Otherwise #f is returned.
 
(digit->integer #\4) => 4
(digit->integer #\e 16) => 14
(digit->integer #\9 8) => #f

Note: Common Lisp has a similar function with a rather confusing name, digit-char-p.
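Folding digit->integer over a string gives a simple radix parser, and its #f result makes invalid input easy to detect. A minimal sketch with a hypothetical helper parse-digits; in practice string->number covers this, so the point here is only the per-character behavior.

```scheme
;; parse-digits is a hypothetical helper: accumulates digit values from
;; left to right and returns #f as soon as a character is not a valid
;; digit in the given radix.  The empty string yields 0.
(define (parse-digits str radix)
  (let loop ((cs (string->list str)) (acc 0))
    (if (null? cs)
        acc
        (let ((d (digit->integer (car cs) radix)))
          (and d (loop (cdr cs) (+ (* acc radix) d)))))))

(parse-digits "ff" 16)  ; => 255
(parse-digits "777" 8)  ; => 511
(parse-digits "19" 8)   ; => #f, since #\9 is not an octal digit
```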

Function: integer->digit integer &optional (radix 10)
The reverse operation of digit->integer. Returns the character that represents the number integer in radix radix. If integer is out of the valid range, #f is returned.
 
(integer->digit 13 16) => #\d
(integer->digit 10) => #f

Note: this corresponds to Common Lisp's digit-char.
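Conversely, integer->digit can render a non-negative integer in any radix by repeated division. A minimal sketch with a hypothetical helper format-radix; number->string with a radix argument does the same job in practice.

```scheme
;; format-radix is a hypothetical helper: converts a non-negative exact
;; integer to a string by collecting remainders (least significant
;; digit first, consed onto the front) and mapping each through
;; integer->digit.
(define (format-radix n radix)
  (if (zero? n)
      "0"
      (let loop ((n n) (digits '()))
        (if (zero? n)
            (list->string digits)
            (loop (quotient n radix)
                  (cons (integer->digit (remainder n radix) radix) digits))))))

(format-radix 255 16)  ; => "ff"
(format-radix 10 2)    ; => "1010"
```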

Function: gauche-character-encoding
Returns a symbol that designates the native character encoding, selected at compile time. The possible return values are:
euc-jp
EUC-JP
utf-8
UTF-8
sjis
Shift JIS
none
No multibyte character support (8-bit fixed-length character).
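Code that must behave differently per build can dispatch on the returned symbol. A minimal sketch with a hypothetical helper describe-native-encoding; the string it prints depends on how Gauche was compiled, so no particular output is assumed here.

```scheme
;; describe-native-encoding is a hypothetical helper mapping the symbol
;; returned by gauche-character-encoding to a human-readable string.
(define (describe-native-encoding)
  (case (gauche-character-encoding)
    ((euc-jp) "EUC-JP")
    ((utf-8)  "UTF-8")
    ((sjis)   "Shift JIS")
    ((none)   "8-bit fixed-length characters, no multibyte support")
    (else     "unknown encoding")))

(display (describe-native-encoding))
(newline)
```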

Function: supported-character-encodings
Returns a list of string names of character encoding schemes that are supported in the native multibyte encoding scheme.
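A membership test over this list is a simple way to check a name before relying on it. A minimal sketch with a hypothetical helper encoding-supported?; it compares case-insensitively because the exact spelling of the returned names is not specified here, and it assumes nothing about which names a given build reports.

```scheme
;; encoding-supported? is a hypothetical helper: #t if name appears in
;; (supported-character-encodings), compared with string-ci=?.
(define (encoding-supported? name)
  (let loop ((names (supported-character-encodings)))
    (cond ((null? names) #f)
          ((string-ci=? name (car names)) #t)
          (else (loop (cdr names))))))

(encoding-supported? "UTF-8")  ; result depends on the build
```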



This document was generated by Ken Dickey on November, 28 2002 using texi2html