[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.2.3 Conversion ports

Function: open-input-conversion-port source from-code &keyword to-code buffer-size owner?
Takes an input port source, which feeds characters encoded in from-code, and returns another input port, from which you can read characters encoded in to-code.

If to-code is omitted, the native CES is assumed.

buffer-size is used to allocate internal buffer size for conversion. The default size is about 1 kilobytes and it's suitable for typical cases.

If you don't know the source's CES, you can specify CES guessing scheme, such as "*JP", in place of from-code. The conversion port tries to guess the encoding, by prefetching the data from source up to the buffer size. It signals an error if the code guessing routine finds no appropriate CES. If the guessing routine finds ambiguous input, however, it silently assume one of possible CES's, in favor of the native CES. Hence it is possible that the guessing is wrong if the buffer size is too small. The default size is usually enough for most text documents, but it may fail if the large text contains mostly ASCII characters and multibyte characters appear only at the very end of the document. To be sure for the worst case, you have to specify the buffer size large enough to hold entire text.

By default, open-input-conversion-port leaves source open. If you specify true value to owner?, the function closes source after it reads EOF from the port.

For example, the following code copies a file `unknown.txt' to a file `eucjp.txt', converting unknown japanese CES to EUC-JP.
 
(call-with-output-file "eucjp.txt"
  (lambda (out)
    (copy-port (open-input-conversion-port
                 (open-input-file "unknown.txt")
                 "*jp"             ;guess code
                 :to-code "eucjp"
                 :owner? #t)       ;close unknown.txt afterwards
               out)))

Function: open-output-conversion-port sink to-code &keyword from-code buffer-size owner?
Creates and returns an output port that converts given characters from from-code to to-code and feed to an output port sink. If from-code is omitted, the native CES is assumed. You can't specify a character guessing scheme (such as "*JP") to neither from-code nor to-code.

buffer-size specifies the size of internal conversion buffer. The characters put to the returned port may stay in the buffer, until the port is explicity flushed (by flush) or the port is closed.

By default, the returned port doesn't closes sink when itself is closed. If a keyword argument owner? is provided and true, however, it closes sink when it is closed.

Function: ces-convert string from-code &optional to-code
Convert string's character encoding from from-code to to-code, and returns the converted string. The returned string may be a byte-string if to-code is different from the native CES.

from-code can be a name of character guessing scheme (e.g. "*JP"). when to-code is omitted, the native CES is assumed.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

This document was generated by Ken Dickey on November, 28 2002 using texi2html