From: Igor Zlatkovic (igor@stud.fh-frankfurt.de)
Date: Wed Jan 31 2001 - 17:36:01 EST
Hi there.
You have a problem. So do I. What you are trying to do has been giving
me pain for a few days now, and it has proven fairly difficult to
handle. Because of numerous issues, there is only one way I can go: use
wide characters (type wchar_t) internally and UTF-8 externally. Anyone
who takes that route and needs to use libxml in its present form has two
possibilities:
1. Write a surrogate for each and every libxml function which operates
on strings, and make the surrogate convert all involved strings before
and/or after the call to the real libxml function. The same applies when
accessing any data member of any libxml structure which contains a
string. If you are using just a few libxml functions, this is not a big
problem. If, however, you use a lot of them, it is pure terror.
Depending on how often your program calls into libxml, the conversions
can also carry a performance penalty. The good thing is that it needs to
be done only once, provided the libxml interface doesn't change.
2. Modify libxml and make the xmlChar type a wchar_t instead of an
unsigned char. In this case, libxml would have to convert from/to UTF-8
whenever it reads from, or writes to, any external storage (file,
http...). This is worth considering if, and only if, you use a whole lot
of libxml functions and call them often. However, it means producing a
new, private version of libxml and keeping it in sync with the public
one forever. And no, even if that would be my (and your) salvation, the
official libxml cannot be modified in this manner. That would
mercilessly break each and every libxml-based program out there, in
addition to losing support for every platform which does not have an
ISO-10646 implementation. (Is there such a platform?)
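To make the first possibility more concrete, here is a minimal sketch of
what one such surrogate could look like. It is only an illustration
under stated assumptions: demo_get_content_utf8 is a hypothetical
stand-in for a libxml function that returns UTF-8 text (it is not part
of libxml), and utf8_to_wide is a hand-written helper that handles only
well-formed sequences of up to three bytes, so that every result fits
into a 16-bit wchar_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a libxml call that returns UTF-8 text.
   It is NOT a libxml function; it only lets the sketch run on its own. */
const unsigned char *demo_get_content_utf8(void)
{
    /* "Gruesse" written with u-umlaut and sharp s, encoded as UTF-8. */
    static const unsigned char text[] = "Gr\xc3\xbc\xc3\x9f" "e";
    return text;
}

/* Decode a UTF-8 string into a newly allocated wide string.
   Handles 1- to 3-byte sequences only (code points up to U+FFFF).
   Returns NULL on allocation failure or on input it does not handle.
   The caller frees the result. */
wchar_t *utf8_to_wide(const unsigned char *src)
{
    size_t n = 0;
    wchar_t *dst;

    assert(src != NULL);
    /* A wide string never has more characters than the UTF-8 has bytes. */
    dst = malloc((strlen((const char *)src) + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    while (*src) {
        unsigned long cp;
        int extra;

        if (*src < 0x80)                { cp = *src;        extra = 0; }
        else if ((*src & 0xE0) == 0xC0) { cp = *src & 0x1F; extra = 1; }
        else if ((*src & 0xF0) == 0xE0) { cp = *src & 0x0F; extra = 2; }
        else { free(dst); return NULL; }  /* 4-byte or invalid lead byte */
        src++;

        while (extra-- > 0) {
            if ((*src & 0xC0) != 0x80) {  /* missing continuation byte */
                free(dst);
                return NULL;
            }
            cp = (cp << 6) | (unsigned long)(*src++ & 0x3F);
        }
        dst[n++] = (wchar_t)cp;
    }
    dst[n] = L'\0';
    return dst;
}

/* The surrogate: does the same job as the wrapped function, but the
   caller only ever sees wchar_t. */
wchar_t *demo_get_content_wide(void)
{
    return utf8_to_wide(demo_get_content_utf8());
}
```

A surrogate for a function which takes strings would do the opposite
conversion before the call. Multiply this by the number of libxml
functions you use and you see where the terror comes from.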
To convert strings between wide character and UTF-8 representations
under Win32, I would use the WideCharToMultiByte and MultiByteToWideChar
functions with the CP_UTF8 code page. Both are part of the operating
system NLS interface.
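A rough sketch of one direction of that conversion follows. The function
name wide_to_utf8 is my own invention for this example. The Win32 branch
uses the usual two-call pattern: ask for the required buffer size first,
then convert. The non-Windows branch is only an assumed fallback so that
the sketch compiles elsewhere; it treats each wchar_t as one code point
up to U+FFFF and rejects anything else.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a NUL-terminated wide string to a newly allocated UTF-8
   string. Returns NULL on failure. The caller frees the result. */
char *wide_to_utf8(const wchar_t *src)
{
    char *dst;

    assert(src != NULL);
#ifdef _WIN32
    {
        /* First call: how many bytes are needed, terminator included? */
        int bytes = WideCharToMultiByte(CP_UTF8, 0, src, -1,
                                        NULL, 0, NULL, NULL);
        if (bytes <= 0)
            return NULL;
        dst = malloc((size_t)bytes);
        if (dst == NULL)
            return NULL;
        /* Second call: perform the conversion into the buffer. */
        if (WideCharToMultiByte(CP_UTF8, 0, src, -1,
                                dst, bytes, NULL, NULL) == 0) {
            free(dst);
            return NULL;
        }
        return dst;
    }
#else
    {
        /* Assumed fallback for other platforms: one wchar_t is taken
           as one code point, at most three UTF-8 bytes each. */
        size_t n = 0;

        dst = malloc(wcslen(src) * 3 + 1);
        if (dst == NULL)
            return NULL;
        for (; *src; src++) {
            unsigned long cp = (unsigned long)*src;

            if (cp < 0x80) {
                dst[n++] = (char)cp;
            } else if (cp < 0x800) {
                dst[n++] = (char)(0xC0 | (cp >> 6));
                dst[n++] = (char)(0x80 | (cp & 0x3F));
            } else if (cp <= 0xFFFF && (cp < 0xD800 || cp > 0xDFFF)) {
                dst[n++] = (char)(0xE0 | (cp >> 12));
                dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
                dst[n++] = (char)(0x80 | (cp & 0x3F));
            } else {
                free(dst);  /* surrogate or beyond U+FFFF: not handled */
                return NULL;
            }
        }
        dst[n] = '\0';
        return dst;
    }
#endif
}
```

The other direction works the same way with MultiByteToWideChar and
CP_UTF8, except that the size it reports is counted in wide characters,
not bytes.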
I wish you a lot of luck.
Igor
Philipp Kursawe wrote:
>
>
> Hello,
>
> I'm new to libxml and must say its wonderfully clean, fast, simple
> C API is just what I was looking for.
>
> But now I'm running into serious problems when I want to use Russian,
> German or Chinese text in my XML (not for the tags).
>
> I don't understand how to convert from the internally used UTF-8 to a
> Win32 character set so I can use the texts in native Win32 functions.
> I've tried to use iconv and enabled iconv support in libxml, but the
> resulting string still doesn't look right to me. Is it possible to
> convert from UTF-8 to 2-byte Windows UNICODE characters, or is there
> something I've missed?
>
> Thanks in advance for your help,
> Philipp
----
Message from the list xml@rpmfind.net
Archived at: http://xmlsoft.org/messages/
To unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net