Re: [xml] encoding, SAX callbacks


From: Craig Wright (spiral@cs.unm.edu)
Date: Fri Feb 23 2001 - 15:16:57 EST


>From: James McCann <james@votehere.net>
>Cc: Andrew Berg <andrewb@votehere.net>
>Date: Fri, 23 Feb 2001 11:31:29 -0800
>
>
>>
>>IMHO, getting the characters in the SAX callbacks in the encoding of the
>>document is the worst of all choices. Now the application has to know
>>about myriads of encodings, instead of the parser.
>
>I don't understand this thinking. I am working on a large project w/
>other people, and our apps use ISO-8859-1. We want to understand and
>use precisely one encoding, ISO-8859-1. This is the encoding used in
>our XML documents. We do not want to handle UTF8 or any other encoding.
>libxml's lack of flexibility in this regard means that we have to add
>code (which I of course simply copied from libxml) to convert from
>an encoding which we have no desire to use to our desired encoding. One
>of the features of libxml which initially attracted me was its ability
>to handle multiple encodings. Now I find that I must use my preferred
>encoding, and in addition translate the encoding used internally by
>libxml.
>
>>There would be some value in the possibility to specify the desired
>>encoding for the callbacks, if you have a UTF16 or 8859-x centric
>>application. But this can easily be implemented on top of the existing SAX
>>callback.
>
>I agree that there is "value in the possibility to specify the desired
>encoding for the callbacks" but I would drop the qualification. I also
>agree that it can easily be implemented, and I propose to do this w/in the
>library. libxml already does UTF8 -> various other encodings and vice
>versa, so why burden applications w/ the need to write additional code?
>I have no problem with using UTF8 as the default encoding for the SAX
>callbacks as long as a mechanism exists to determine the encoding used
>in the callbacks. I realize there will be memory use and performance
>drawbacks, but if a user requests a certain encoding, and the appropriate
>transcoders exist, that encoding should be used in the callbacks.
>
>>In fact I assume some users already have done so and can share their
>>code?!
>
>The appropriate means for sharing code is through a library. Having
>someone email some code, or copying and pasting code from the library
>into a project is not. libxml already knows how to translate from one
>encoding to another; what I am suggesting is to make this ability
>available to the user, who should not be forced to use or deal with UTF8.
>There is already a flexible means for transcoding in the library; why
>not extend the API to permit this in SAX callbacks? It would not break
>existing code, and would benefit projects like mine which do not use
>UTF8 (which I suspect is a large group of applications).
>
>What I propose is something along these lines:
>
> pCtxt = xmlCreatePushParserCtxt(&SAXHandler, NULL, "", 0, 0);
> xmlSetSAXCallbacksEncoding(pCtxt, "ISO-8859-1");
>
>Now my application only ever needs to understand ISO-8859-1.
>

    Instead of bloating libxml, why not create a little library that
    specifically does translations?
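
    As a rough illustration of how small such a translation helper could
    be, here is a minimal sketch of a standalone UTF-8 to ISO-8859-1
    converter. The names utf8_to_latin1 and utf8_str_to_latin1 are
    hypothetical and are not part of libxml; the sketch assumes the caller
    wants a hard failure on any character that does not fit in ISO-8859-1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml function): convert inlen bytes of
 * UTF-8 into ISO-8859-1. Returns the number of bytes written to out, or
 * -1 if the input is malformed, contains a code point above U+00FF, or
 * does not fit in outcap bytes.
 */
long utf8_to_latin1(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outcap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = in[i];
        unsigned int cp;

        if (c < 0x80) {
            /* Plain ASCII maps to itself. */
            cp = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < inlen &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF. */
            cp = ((unsigned int)(c & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;
        } else {
            /* Overlong, truncated, or outside ISO-8859-1. */
            return -1;
        }
        if (o >= outcap)
            return -1;
        out[o++] = (unsigned char)cp;
    }
    return (long)o;
}

/*
 * Hypothetical convenience wrapper for NUL-terminated strings.
 * Returns 0 on success and NUL-terminates out, or -1 on failure.
 */
int utf8_str_to_latin1(const char *in, char *out, size_t outcap)
{
    long n;

    assert(in != NULL && out != NULL);
    if (outcap == 0)
        return -1;
    n = utf8_to_latin1((const unsigned char *)in, strlen(in),
                       (unsigned char *)out, outcap - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

    An application's SAX characters callback could hand its (pointer,
    length) pair to utf8_to_latin1 before using the text. Rejecting
    unrepresentable characters outright is only one possible policy;
    substituting '?' would be another.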

-- 
Craig

------------------------------------------------------------------------------
| Craig W. Wright                | "The hard and brittle will surely fall;   |
| spiral@cs.unm.edu              |  the soft and supple will overcome."      |
| http://www.cs.unm.edu/~spiral/ |                                 -Lao Tzu |
------------------------------------------------------------------------------

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Fri Feb 23 2001 - 16:44:45 EST