From: Igor Zlatkovic (igorz@dialup.nacamar.de)
Date: Wed Jan 24 2001 - 11:22:14 EST
Hello.
Alejo, I see your point. Indeed, your argument contains a fair amount of
reason and there are a lot of points on which I would agree with you, including
the statement that I abused the strlen example, for it is most true :-). I
can only hope that you see my point as well, for it is not that much
different from yours.
> However, you see, correctness and sanity of parameters are
> completely relative to the function's precondition.
I can but agree with that. Still, you must agree with me that the calling
program is the one which comes up with those parameters.
I think the point where Alejandro and myself disagree is not checking the
parameters in its own right, but the definition of The Check That Need Be
(TCTNB, in the following) and The Check That Overbloats Code (TCTOC in the
following). I need not mention that the two are mutually exclusive.
It looks to me that Alejo and myself have stated our opinions the best way we
could. Each has presented reasons for why things should be the way he
considers appropriate. Nothing more either of us could say would change the
opinion of the other, and I am not quite sure if the issue is significant
enough to justify pumping huge amounts of text, such as my last reply, into
the mailing list :-).
I would say, now that we have made our points, let the others decide. If a
patch should be made, then we shall make one.
Alejo, it is a fact that this must be done slightly differently under Win32
than under UNIX, courtesy of the rather unreasonable naming conventions used
in Microsoft's C runtime. If people want this patch, I would be glad to
help if you cannot test it under both platforms.
Cheers
Igor
---- Message from the list xml@rpmfind.net Archived at : http://xmlsoft.org/messages/ to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net
Alejandro pointed out the amount of URL processing which apparently takes place somewhere within xmlParseFile. According to that, libxml checks the resource locators passed to it and, after determining the transmission channel which can be used for the particular URL, it proceeds. Is this check TCTNB or TCTOC?
We are happy to have a libxml which handles a lot of resource locators, gods bless Daniel Veillard for making it. What sorts of URLs can be handled here? Well, there is http://, ftp://, file://, perhaps a few others. libxml checks if a URL is http://, ftp://, file:// or whatever before trying to access the data the URL points to. This happens because that very check determines the transmission channel and the subset of the underlying operating system functions which must be used in order to access the data. This is a part of very normal URL processing and must be done. This is therefore TCTNB.
Now, Alejandro claims that if the URL points to a file in the local filesystem, an additional check should be made to ensure that it indeed specifies a file, not a directory. Would this check be TCTNB, or TCTOC?
One can specify a directory through a http:// URL as well, not just through file://. I mean, I can say http://xmlfiles.xml.org/archive/1998 ... but on the other hand, I can also say: http://xmlfiles.xml.org/archive/1998/acct.xml. Should checking be done here as well? If file:// URLs are disallowed to point to a directory, why allow http:// and ftp:// URLs to do so? Clearly, if I use a http:// URL and specify a directory, there is a slight possibility for XML still to come, in contrast to the filesystem. Nevertheless, some consistency would not hurt a library which claims to handle something transparently to the programmer.
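To make the scheme check concrete, here is a minimal sketch in C of how a locator could be mapped to a transmission channel. It is not libxml's code: the type channel_t, its enumerators and the function classify_locator are hypothetical names chosen for illustration, and a real implementation would probably compare the scheme case-insensitively.

```c
#include <string.h>

/* Hypothetical channel identifiers; these names are illustrative only. */
typedef enum {
    CHANNEL_HTTP,
    CHANNEL_FTP,
    CHANNEL_FILE,   /* explicit file:// URL */
    CHANNEL_LOCAL   /* no recognised scheme: treat as a plain path */
} channel_t;

/* Pick the channel implied by the scheme prefix of the locator.
 * Assumption: schemes are matched case-sensitively to keep this short. */
static channel_t classify_locator(const char *url)
{
    if (strncmp(url, "http://", 7) == 0)
        return CHANNEL_HTTP;
    if (strncmp(url, "ftp://", 6) == 0)
        return CHANNEL_FTP;
    if (strncmp(url, "file://", 7) == 0)
        return CHANNEL_FILE;
    return CHANNEL_LOCAL;
}
```

Note that nothing in this step can tell whether http://xmlfiles.xml.org/archive/1998 names a directory or a document. The check only selects the channel.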
There is more. The reason why a directory check should be implemented is to prevent the programmer from choosing a source which can never contain XML data, or so I see it. In fact, there are many different things one can find on the filesystem. There are device descriptors, named pipes, sockets, all of which are not known for being a good source of XML data, unless designed specifically for such a purpose. In the realm of regular files, it is a fact that executable files contain no XML data, at least nothing parsable by libxml. It is rumored that files which contain media, like JPEG images, also lack XML structure.
If libxml calls stat(2) in order to find out if the URL points to a regular file, then I see it responsible for ensuring that the URL points to something that really contains XML as well. Why is a gzipped tarball a better source of XML data than a directory? Just as directories cannot contain any XML, neither can tarballs. If you look at these two as a possible source of XML data, they are completely equal. That I can use fopen(3) on one of them and not on the other is of no consequence. I cannot think of a good reason why directories are so much worse than any other point in the filesystem without XML that they would deserve an extra check. My opinion remains: this sort of check is TCTOC.
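For reference, the kind of check under discussion might look roughly like the following. This is a minimal POSIX sketch, not code taken from libxml, and is_regular_file is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path names a regular file,
 * 0 if it names something else (directory, device, pipe, socket),
 * and -1 if stat(2) itself fails. */
static int is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Such a helper returns 1 for a gzipped tarball or a JPEG image just as readily as for an XML document. It rules out directories, but says nothing about whether the source contains XML.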
> Following your logic, why bother checking the return
> value for the open(2) call in libxml at all? After all, the
> programmer passed a filename that can't be opened
> so it's his fault for not establishing the correctness
> and sanity of parameters.
This can only mean you misunderstood my logic, most certainly because I haven't defined it well enough. If you have read this far, then you have read the things regarding TCTNB and TCTOC, and if I succeeded in making my point, then you know that checking the return value of open(2) is, in my eyes, what I came to call TCTNB.
> Currently, the (implicit) precondition is "pass a
> filename that points to a file (not directory; if it points
> to directory, errors will occur)". So yes, for that
> precondition, you must make sure your filename
> does point to a file, not a directory. I am just proposing
> changing that (again, implicit) precondition to "pass a
> filename that can point to both a file or a directory and
> let libxml signal an error if it is a directory".
But... that is exactly what happens. You pass a filename that can point to anything and libxml signals an error if it does not contain valid XML, be it a directory, or be it /dev/random. What you pass to xmlParseFile should be viewed not as a file-or-directory item, but much more abstractly, as a resource locator which points to the XML source. This abstraction allows using http://, ftp:// and friends. libxml does and should continue to distinguish between valid and invalid XML sources, not between files and directories.
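The point that an unusable source is reported anyway, without a separate directory test, can be illustrated with plain POSIX calls. The sketch below is not libxml code and read_head is a hypothetical name. It assumes the behaviour of Linux, where open(2) on a directory succeeds but read(2) then fails with EISDIR; other systems may differ.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from path into buf.
 * Returns the number of bytes read, or -1 if open(2) or read(2) fails. */
static long read_head(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;      /* the check on open(2) that must be there */

    n = read(fd, buf, len);
    close(fd);

    /* Assumption: on Linux a directory fails here with EISDIR,
     * so no dedicated directory check was needed to notice it. */
    return n < 0 ? -1 : (long)n;
}
```

A caller sees the same failure value for a missing file and for a directory, and in both cases the error arises from checks which have to be made regardless.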
I would like to add my opinion about usability and user-friendliness by referring to the example Alejandro gave: The user sits there, chooses something in a file selection dialog and presses OK. What happens? Alejandro gave three possible options.
The option I would use is to present a file-selection dialog which does justice to its name. If things which are not considered good simply don't appear in the selection dialog, there is nothing the user can do wrong. This can be as simple as showing only files which end with ".xml", as the vast majority of XML files do. Every file-selection dialog from any widget set known to me supports a thing called a filter, which can be preset programmatically before the dialog is displayed. In addition to that, all those dialogs present directories and files so that they are visually distinguishable from each other. The user can alter the filter after the dialog has been presented and thus open a file which does not end with ".xml", should she see it appropriate. Should something go wrong, no user I have dealt with would misunderstand the message if it says that something other than a file which contains XML data cannot be a valid source of XML data.
To address the problem with directories, which are displayed in the above option, let me state this: Basic knowledge about filesystem organisation can be expressed in a few sentences:
- Files contain data.
- Directories are not files. They contain files and/or other directories.
If there are users who are puzzled about this, they must be educated. They would not understand a "cannot read XML from a directory" any better than a "selected file contains no valid XML", since they seem not to see the difference between a file and a directory anyway. Keeping them from learning the above basics does them no good, since they most certainly have huge problems and zero fun with every program they use. If they happen to be one's customers, then one must either educate or lose them, since the effort and cost of supporting them would never be compensated by any gain they can ever deliver. This is a sad but true fact of life.
Thanks for listening. I mean it. :-)
Igor
---- Message from the list xml@rpmfind.net Archived at : http://xmlsoft.org/messages/ to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net
This archive was generated by hypermail 2b29 : Wed Jan 24 2001 - 12:43:51 EST