Re: [xml] Encoding and Win32


From: Dave Madole (dmadole@cddb.com)
Date: Fri Feb 02 2001 - 14:06:18 EST


Hi,

I feel that my original comment could use a little clarification.

First, I tried using "iconv", but it didn't really cut it for a couple of
reasons, not the least of which was that its use was completely opaque on my
Red Hat Linux box: man iconv, man -k iconv, man unicode, etc. produce nothing
appropriate. I searched around for the data files, etc., to no avail.
There are man pages on my Solaris box, but they don't really help, as the
character encodings actually SEEM to have different names there (cross-platform
transparency is critical to my app).

Our friends at Red Hat could use a little prodding as far as the documentation
goes. Doing a search for "iconv" on the Red Hat sites turns up nothing useful.

May I suggest adding a link to the online Linux iconv documentation to the
xmlsoft.org web site?

Secondly, the data from which I am building my document is coming from various
sources, some in utf-8, some in "wrong endian" utf-16 (Oracle), some in "weird"
and basically unpredictable multi-byte Asian character encodings. In my case
it actually makes sense to convert to a "neutral" utf-16 intermediate
encoding. Also I am dealing with a situation where I need to be told on the
fly what destination encoding to use from a very large range of Asian character
encodings - ICU makes this much easier because it accepts just about anything
the user might enter as the name of the encoding. It also provides support for
collating and sorting, etc.
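
A minimal sketch of the pivot approach described above, written in Python
rather than with ICU or iconv, so it only illustrates the idea and is not the
code from the app. The helper names to_pivot and from_pivot, the choice of
big-endian utf-16 as the "neutral" form, and the sample encodings (EUC-JP,
Shift_JIS) are all assumptions made for the example. Python's codecs.lookup()
stands in for ICU's tolerant handling of encoding names.

```python
import codecs

PIVOT = "utf-16-be"  # assumed "neutral" intermediate encoding


def to_pivot(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from any known source encoding into the pivot encoding."""
    return raw.decode(source_encoding).encode(PIVOT)


def from_pivot(pivot: bytes, destination_encoding: str) -> bytes:
    """Encode pivot bytes into a destination encoding named at run time."""
    # codecs.lookup() resolves aliases ("SJIS", "Shift-JIS", ...) to one
    # canonical codec, or raises LookupError for an unknown name.
    codec = codecs.lookup(destination_encoding)
    return pivot.decode(PIVOT).encode(codec.name)


# The same three characters (U+65E5 U+672C U+8A9E) arriving from three
# hypothetical sources in three different encodings.
text = "\u65e5\u672c\u8a9e"
sources = [
    (text.encode("utf-8"), "utf-8"),
    (text.encode("utf-16-le"), "utf-16-le"),  # the "wrong endian" case
    (text.encode("euc_jp"), "euc_jp"),
]

pivots = [to_pivot(raw, name) for raw, name in sources]
same_pivot = len(set(pivots)) == 1  # every source collapses to one form

# Destination encoding chosen on the fly, using a loosely spelled name.
out = from_pivot(pivots[0], "SJIS")
```

With n source encodings and m destination encodings, this needs only n
decoders and m encoders instead of n*m direct converters.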

Finally, I had meant to add that I certainly appreciate the work that went into
putting iconv support into libxml. I understand that it IS the standard and
that it should be used when possible, and I didn't mean to imply that it was
the wrong choice in any way. No doubt in most cases it is more than adequate.
ICU is probably more than most people need and is a bit fat, but it works
very well and is copiously documented.

Dave

Peter Jacobi wrote:

> Hi Igor, All,
>
> I agree that it is easier to factor an n-to-m conversion into n-to-1 and
> 1-to-m, but it isn't required or useful to store the complete string in the
> intermediate encoding. Only one internal w_char needs to be used.
>
> Regards,
> Peter
>
> ----
> Message from the list xml@rpmfind.net
> Archived at : http://xmlsoft.org/messages/
> to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net
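
Peter's point about not materializing the whole string in the intermediate
encoding can be sketched with Python's incremental codecs. This is an
illustration under assumptions, not libxml or ICU code: the function name
transcode_stream is made up, and the intermediate here is a short Python
string per input chunk rather than literally a single wide character.

```python
import codecs
from typing import Iterable, Iterator


def transcode_stream(chunks: Iterable[bytes], source: str,
                     destination: str) -> Iterator[bytes]:
    """Convert a byte stream from one encoding to another chunk by chunk."""
    decoder = codecs.getincrementaldecoder(source)()
    encoder = codecs.getincrementalencoder(destination)()
    for chunk in chunks:
        # The decoder buffers a trailing partial character until the
        # remaining bytes arrive, so chunk boundaries may fall anywhere.
        piece = decoder.decode(chunk)
        if piece:
            yield encoder.encode(piece)
    # Flush both codecs; an incomplete character at the very end raises
    # UnicodeDecodeError here.
    tail = decoder.decode(b"", final=True)
    yield encoder.encode(tail, final=True)


# Feed EUC-JP input one byte at a time and collect Shift_JIS output.
data = "\u65e5\u672c\u8a9e".encode("euc_jp")
one_byte_chunks = [data[i:i + 1] for i in range(len(data))]
result = b"".join(transcode_stream(one_byte_chunks, "euc_jp", "shift_jis"))
```

Memory use is bounded by the chunk size rather than by the document size,
which is the practical benefit of the scheme Peter describes.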




This archive was generated by hypermail 2b29 : Fri Feb 02 2001 - 14:43:44 EST