RE: [xml] encoding, SAX callbacks


From: James McCann (james@votehere.net)
Date: Mon Feb 26 2001 - 13:16:18 EST


>
>As libxml uses UTF-8 internally, it seems to me that the only proper
>solution to your suggestion would be to insert a wrapper layer for the
>SAX functions. The purpose of this layer would be to convert the data
>into the application-requested encoding and then pass the data on
>to the application through the "real" SAX interface.

This is exactly what I intended to suggest. Permit the user to control
the encoding used in callbacks without putting the burden on the user to
convert from an undesired encoding to the desired encoding.
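
A minimal sketch of such a wrapper layer in C, assuming POSIX iconv is available. The callback type only mimics the general shape of a SAX characters handler (context pointer, byte pointer, length); `characters_fn`, `encoding_wrapper`, `wrapper_init`, `wrapped_characters` and `capture_characters` are hypothetical names chosen for illustration, not part of libxml's API. It also assumes each chunk handed to the wrapper contains only whole UTF-8 characters.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback shape: receives bytes (not NUL-terminated) and a length. */
typedef void (*characters_fn)(void *user_data, const unsigned char *ch, int len);

/* Hypothetical wrapper state: the application's real handler plus an iconv
   descriptor that converts UTF-8 into the encoding the application asked for. */
typedef struct {
    characters_fn real_characters;
    void *real_user_data;
    iconv_t cd;
} encoding_wrapper;

int wrapper_init(encoding_wrapper *w, const char *target_encoding,
                 characters_fn real, void *user_data) {
    w->cd = iconv_open(target_encoding, "UTF-8");
    if (w->cd == (iconv_t)-1)
        return -1; /* target encoding not supported by this iconv */
    w->real_characters = real;
    w->real_user_data = user_data;
    return 0;
}

void wrapper_close(encoding_wrapper *w) { iconv_close(w->cd); }

/* Registered with the parser in place of the application's own handler:
   convert the UTF-8 chunk, then forward the result to the real handler. */
void wrapped_characters(void *ctx, const unsigned char *ch, int len) {
    encoding_wrapper *w = ctx;
    /* Rough upper bound (4 output bytes per input byte plus room for a BOM or
       shift sequence); a production wrapper would loop on E2BIG instead. */
    size_t out_cap = (size_t)len * 4 + 8;
    char *out = malloc(out_cap);
    if (!out)
        return;
    char *in = (char *)ch;
    size_t in_left = (size_t)len;
    char *out_ptr = out;
    size_t out_left = out_cap;

    iconv(w->cd, NULL, NULL, NULL, NULL); /* reset conversion state */
    if (iconv(w->cd, &in, &in_left, &out_ptr, &out_left) == (size_t)-1 ||
        iconv(w->cd, NULL, NULL, &out_ptr, &out_left) == (size_t)-1) {
        /* Unconvertible or incomplete character: a real wrapper would report it. */
        free(out);
        return;
    }
    w->real_characters(w->real_user_data, (const unsigned char *)out,
                       (int)(out_ptr - out));
    free(out);
}

/* Example application handler: copies the converted bytes into a buffer. */
typedef struct { unsigned char buf[256]; int len; } capture_buf;

void capture_characters(void *user_data, const unsigned char *ch, int len) {
    capture_buf *c = user_data;
    if (len > (int)sizeof c->buf)
        len = (int)sizeof c->buf;
    memcpy(c->buf, ch, (size_t)len);
    c->len = len;
}
```

With ISO-8859-1 as the target, the UTF-8 input `caf\xc3\xa9` ("café") should reach the application handler as the four bytes `caf\xe9`. Any other callback that carries text (element names, attribute values) would need the same treatment.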

>I would like to support Craig's idea of putting this wrapper in your
>application, since it is the one that is dependent on a specific
>encoding. You can use the functions from libxml to do this conversion
>for you (this is fairly easy; I have done something similar for UTF-8
>to HTML).

Yes, it is easy to copy and paste, but that is error-prone and bad design.
This is what Larry Wall calls "false laziness". If a large fraction of
people who use SAX end up copying and pasting the same chunk of code then
libxml has missed a chance to increase software reliability. You have
copied and pasted chunks of the library source, as have I. This is not a
good solution. libxml is quite good at handling various encodings via
iconv, potentially including ones that do not exist yet. Why not make this
power available to the users of the library in a form convenient for applications?

Some of the arguments against this that I have read in this thread seem to
assume that every project which uses XML will encounter documents in many
encodings, and that it is therefore better to deliver everything as UTF-8 so
that the app doesn't encounter "a myriad" of undesired encodings. In the
project I am working on, we are using XML to send data over the network from
one app to another.

So we control the encoding of every document. If our app(s) were processing
documents in numerous encodings, then getting all of the characters in UTF-8
would be better than a "myriad" of encodings. However, since we control both
ends of our transactions and use only a single encoding, UTF-8 becomes a
burden, not a convenience. Oddly enough, for systems like ours a naive XML
parser turns out to be preferable to libxml, because we do not have to deal
with an additional encoding.

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Mon Feb 26 2001 - 14:43:49 EST