Traditionally, a string is considered a simple array of bytes. Programmers tend to think of a string as a simple array of characters (even though a character may occupy more than one byte). That is not the case in Gauche.
Gauche supports multibyte strings natively, which means characters are represented by a variable number of bytes in a string. Gauche retains the semantic compatibility of Scheme strings, so such details can be hidden, but it will be helpful to know a few points.
A string object keeps a type tag and a pointer to the storage
of the string body. The storage of the body is managed in a sort of
"copy-on-write" way: if you take a substring, either directly with
substring or through the regular expression matcher, or even if you
copy a string with string-copy, the underlying storage is shared
(the "anchor" of the string is different, so the copy
is not eq? to the original string).
The string body is actually copied only when you destructively modify it.
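The sharing itself is internal and cannot be observed from Scheme code. The sketch below only shows the visible semantics described above: a copy is a distinct object, and a destructive update to it leaves the original and any substrings untouched. It uses string-copy and substring to create the shared strings.

```scheme
;; Start from a fresh, mutable string (string literals may be immutable).
(define original (string-copy "hello, world"))

;; Per the description above, both of these share original's body internally.
(define copy (string-copy original))
(define head (substring original 0 5))

(eq? original copy)        ; => #f, distinct string objects
(string=? original copy)   ; => #t, same content

;; A destructive update: this is where the body actually gets copied.
(string-set! copy 0 #\H)
original                   ; => "hello, world"
copy                       ; => "Hello, world"
head                       ; => "hello"
```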
Consequently, an algorithm that pre-allocates a string with
make-string and then fills it with string-set!
becomes extremely inefficient in Gauche. Don't do it.
(It doesn't work well with multibyte strings anyway.)
Sequential access to a string is much more efficient
using string ports (See section 6.18.4 String ports).
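As a sketch of the string-port alternative, the first procedure below builds a string by writing characters to an output string port instead of filling a make-string buffer with string-set!. The second scans a string by reading characters from an input string port instead of indexing it. The procedure names digits and count-char are made up for this illustration.

```scheme
;; Build a string sequentially through an output string port,
;; instead of make-string + string-set!.
(define (digits n)
  (let ((out (open-output-string)))
    (do ((i 0 (+ i 1)))
        ((= i n) (get-output-string out))
      (write-char (integer->char (+ (char->integer #\0) (modulo i 10))) out))))

;; Scan a string sequentially through an input string port,
;; instead of indexing it with string-ref.
(define (count-char ch str)
  (let ((in (open-input-string str)))
    (let loop ((c (read-char in)) (n 0))
      (cond ((eof-object? c) n)
            ((char=? c ch) (loop (read-char in) (+ n 1)))
            (else (loop (read-char in) n))))))

(digits 12)               ; => "012345678901"
(count-char #\a "banana") ; => 3
```

Neither procedure ever asks for a character by position, so neither depends on how many bytes each character occupies.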
String search primitives such as string-scan (See section 6.10.7 String utilities)
and the regular expression matcher (See section 6.11 Regular expression)
can return the matched string directly, without
using index access at all.
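A short sketch of getting substrings without computing indices. string-scan's optional third argument selects whether it returns an index or the text before or after the match. For the regular expression matcher, rxmatch-substring extracts the whole match or a numbered group from a match object. The input strings are arbitrary examples.

```scheme
;; string-scan: the optional third argument selects what is returned.
(string-scan "key=value" #\=)          ; => 3, the index (default)
(string-scan "key=value" #\= 'before)  ; => "key"
(string-scan "key=value" #\= 'after)   ; => "value"

;; Regular expression matcher: take substrings from the match object.
(define m (rxmatch #/(\d+)-(\d+)/ "tel: 123-4567"))
(rxmatch-substring m)     ; => "123-4567", the whole match
(rxmatch-substring m 2)   ; => "4567", the second group
```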
You can choose the internal encoding scheme when you compile
Gauche. At runtime, the procedure gauche-character-encoding
can be used to query the internal encoding. Currently, the following
internal encodings are supported.
euc-jp
sjis
utf-8
none
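A minimal sketch of querying the encoding at runtime. gauche-character-encoding returns one of the symbols listed above. The describe-encoding procedure and its description strings are illustrative only. The description for none reflects the assumption that a none build treats each byte as one character.

```scheme
;; Map the build-time internal encoding to a human-readable description.
(define (describe-encoding)
  (case (gauche-character-encoding)
    ((utf-8)  "UTF-8")
    ((euc-jp) "EUC-JP")
    ((sjis)   "Shift_JIS")
    ((none)   "no multibyte support (assumed: one byte per character)")
    (else     "unknown")))

(display (gauche-character-encoding))   ; prints e.g. utf-8
(newline)
(display (describe-encoding))
(newline)
```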
Conversion from other encoding schemes is provided through
special ports. See section 9.2 gauche.charconv - Character Code Conversion, for details.
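As a sketch of such a special port, the hypothetical procedure decode-port below wraps a raw input port in a conversion port from gauche.charconv. It then reads everything back as a string in the internal encoding, copying character by character through an output string port. The file name in the usage comment is a placeholder. The test feeds plain ASCII, which is also valid EUC-JP, so the content comes back unchanged.

```scheme
(use gauche.charconv)

;; Hypothetical helper: decode bytes in FROM-CODE arriving on RAW into
;; a string in Gauche's internal encoding.
(define (decode-port raw from-code)
  (let ((in  (open-input-conversion-port raw from-code))
        (out (open-output-string)))
    (let loop ((c (read-char in)))
      (if (eof-object? c)
          (get-output-string out)
          (begin (write-char c out)
                 (loop (read-char in)))))))

;; Usage sketch; "legacy.txt" is a placeholder for a file stored in EUC-JP:
;; (call-with-input-file "legacy.txt"
;;   (lambda (p) (decode-port p "EUC-JP")))
```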
Gauche assumes that Scheme programs are written in its internal encoding scheme.