Re: [xml] Encoding and Win32


From: Dave Madole (dmadole@cddb.com)
Date: Thu Feb 01 2001 - 14:07:34 EST


I use the ICU (International Components for Unicode) libraries from IBM

http://oss.software.ibm.com/icu/

to encode element content. The element content is converted from UTF-8
to UTF-16 (native byte order), then from UTF-16 to whatever character
encoding (Shift-JIS, Big5) one wants, and the result is base64 encoded.
This works and keeps the XML itself in UTF-8 or UTF-16, the two encodings
that the W3C XML recommendation requires every processor to support. I
found that ICU addressed my needs more completely than iconv. The only
downside is that you must convert whatever your source encoding is to
native byte order UTF-16 before converting it to your destination
encoding. The API seems to be modeled on iconv and is very easy to use.
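
As a rough sketch of the two ends of that pipeline (this is not the ICU
API), the C below decodes UTF-8 into native byte order UTF-16 code units
and base64 encodes a byte buffer. The names utf8_to_utf16 and
base64_encode are hypothetical helpers made up for illustration. The
middle step, UTF-16 to a legacy encoding such as Shift-JIS, needs
conversion tables and is left to ICU or a similar library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of ICU or libxml): decode a NUL-terminated
 * UTF-8 string into native-byte-order UTF-16 code units.
 * Returns the number of code units written, or -1 if the input is malformed
 * or `out` (capacity `cap` code units) is too small.
 * Simplification: overlong UTF-8 forms are not rejected. */
long utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p) {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;            /* stray continuation or invalid lead byte */
        p++;
        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;         /* truncated or broken sequence */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;             /* outside Unicode or an encoded surrogate */
        if (cp < 0x10000) {
            if (n + 1 > cap) return -1;
            out[n++] = (uint16_t)cp;
        } else {                   /* needs a surrogate pair */
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)n;
}

/* Hypothetical helper: base64-encode `len` bytes into `out` as a
 * NUL-terminated string. `out` must hold at least 4 * ((len + 2) / 3) + 1
 * bytes. Returns the number of characters written, excluding the NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        uint32_t v = (uint32_t)in[i] << 16;
        if (i + 1 < len) v |= (uint32_t)in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];

        out[n++] = tbl[(v >> 18) & 0x3F];
        out[n++] = tbl[(v >> 12) & 0x3F];
        out[n++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        out[n++] = (i + 2 < len) ? tbl[v & 0x3F] : '=';
    }
    out[n] = '\0';
    return n;
}
```

In use, one would call utf8_to_utf16 on the element content, hand the
resulting UTF-16 buffer to the converter for the target encoding, and pass
the converted bytes to base64_encode before storing them in the document.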

This avoids modifying libxml, although keeping track of the intermediate
conversion buffers becomes a bit of a memory management nightmare.

Dave

Igor Zlatkovic wrote:

> Hi there.
>
> You have a problem. So do I. What you are trying to do has been the cause
> of my pains for the past few days and it has proven to be fairly difficult
> to handle. Due to numerous issues, there is only one way I can go: use wide
> characters (type wchar_t) internally and UTF-8 externally. When someone
> committed to this approach needs to use libxml in its present form, there
> are two possibilities:
>
> 1. Write a surrogate for each and every libxml function which operates
> on strings and make the surrogate convert all involved strings before
> and/or after the call to the real libxml function. The same applies when
> accessing any data member of any structure in libxml which contains a
> string. Well, if you are using just a few libxml functions, this is not
> a big problem to do. If, however, you use a lot of them, it is pure
> terror. Depending on how often your program calls into libxml, it can
> also carry a performance penalty. The good thing about it is that it need
> be done only once, provided the libxml interface doesn't change.
>
> 2. Modify libxml and make the xmlChar type resemble wchar_t, instead of
> unsigned char. In this case, libxml would have to convert from/to UTF-8
> whenever it reads from, or writes to, any external storage (file,
> http...). This is worth considering if, and only if, you use a whole lot
> of libxml functions and call them often. However, it means producing a
> new, private version of libxml and keeping it in sync with the public one
> for all time. No, even if that would be my (and your) salvation, the
> official libxml cannot be modified in this manner. That would
> mercilessly break each and every libxml-based program out there, in
> addition to losing support for every platform which does not have an
> ISO-10646 implementation. (Is there such a platform?)
>
> To convert strings between wide character and UTF-8 representations
> under Win32, I would use the WideCharToMultiByte and MultiByteToWideChar
> functions, which are a part of the operating system NLS interface.
>
> I wish you a lot of luck.
> Igor
>
> Philipp Kursawe wrote:
>
> >
> >
> > Hello,
> >
> > I'm new to libxml and must say its wonderfully clean, fast, simple
> > C API is just what I was looking for.
> >
> > But now I'm running into serious problems when I want to use Russian,
> > German or Chinese text in my XML (not for the tags).
> >
> > I don't understand how to convert from the internally used UTF-8 to a
> > Win32 character set so I can use the texts in native Win32 functions.
> > I've tried to use iconv and enabled iconv support in libxml but the
> > resulting string still doesn't look right to me. Is it possible to
> > convert from UTF-8 to 2-byte Windows Unicode characters, or is there
> > something I've missed?
> >
> > Thanks in advance for your help,
> > Philipp
>
> ----
> Message from the list xml@rpmfind.net
> Archived at : http://xmlsoft.org/messages/
> to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net




This archive was generated by hypermail 2b29 : Thu Feb 01 2001 - 14:43:38 EST