<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: How not to do XML</title>
	<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/</link>
	<description>Armin Ronacher thinking</description>
	<pubDate>Mon, 07 Jul 2008 10:30:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2</generator>

	<item>
		<title>By: schwuk.com</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2383</link>
		<author>schwuk.com</author>
		<pubDate>Thu, 21 Feb 2008 23:02:18 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2383</guid>
		<description>&lt;strong&gt;Migrated to WordPress...&lt;/strong&gt;

First of all, apologies for any &#8216;planet spam&#8217; caused by the change to my feeds.
After what seems like an eternity (but is actually just over a year) I&#8217;ve switched the backend of this site from Mephisto to WordPress. The main reason for t...</description>
		<content:encoded><![CDATA[<p><strong>Migrated to WordPress&#8230;</strong></p>
<p>First of all, apologies for any &#8216;planet spam&#8217; caused by the change to my feeds.<br />
After what seems like an eternity (but is actually just over a year) I&#8217;ve switched the backend of this site from Mephisto to WordPress. The main reason for t&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: author</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2321</link>
		<author>author</author>
		<pubDate>Tue, 19 Feb 2008 14:41:42 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2321</guid>
		<description>&#62; An ini file is a standard. A .odt file is a standard. But XML is not a standard at all. It needs
&#62; interpretation; a mapping to something concrete. You can pinpoint almost all XML abuses back to this misconception that XML truly defines anything except a BNF grammar. It doesn’t.

XML: Extensible Markup Language
It's all very simple; you just don't get it. XML, in the end, is nothing more than
a form of metadata. It just describes the data, so I'm sorry, it cannot be tied down
to any one concrete thing or application, because data comes in all shapes and sizes, and one man's data is another man's rubbish.</description>
		<content:encoded><![CDATA[<p>&gt; An ini file is a standard. A .odt file is a standard. But XML is not a standard at all. It needs<br />
&gt; interpretation; a mapping to something concrete. You can pinpoint almost all XML abuses back to this misconception that XML truly defines anything except a BNF grammar. It doesn’t.</p>
<p>XML: Extensible Markup Language<br />
It&#8217;s all very simple; you just don&#8217;t get it. XML, in the end, is nothing more than<br />
a form of metadata. It just describes the data, so I&#8217;m sorry, it cannot be tied down<br />
to any one concrete thing or application, because data comes in all shapes and sizes, and one man&#8217;s data is another man&#8217;s rubbish.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: amix</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2319</link>
		<author>amix</author>
		<pubDate>Tue, 19 Feb 2008 12:22:19 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2319</guid>
		<description>The general problem with the "web community" is that no one follows standards, and this has been going on since the beginning of the WWW. Everybody is looking for an easy way out (even those who implement browsers). The fact that WordPress does not use "XML" isn't really a shock; it's to be expected.

I strongly doubt that this mentality will change. Even those who scream "web standards" and build their pages without tables use shitloads of hacks in order to render their pages properly in different browsers.</description>
		<content:encoded><![CDATA[<p>The general problem with the &#8220;web community&#8221; is that no one follows standards, and this has been going on since the beginning of the WWW. Everybody is looking for an easy way out (even those who implement browsers). The fact that WordPress does not use &#8220;XML&#8221; isn&#8217;t really a shock; it&#8217;s to be expected.</p>
<p>I strongly doubt that this mentality will change. Even those who scream &#8220;web standards&#8221; and build their pages without tables use shitloads of hacks in order to render their pages properly in different browsers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Herman Bos</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2316</link>
		<author>Herman Bos</author>
		<pubDate>Tue, 19 Feb 2008 07:57:23 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2316</guid>
		<description>I suspect you didn't lose your faith in open standards, but in PHP projects being of decent quality. I do not want to say it's because of the language (...), but it's definitely because of the type of programmers who use it. We have seen such terrible PHP things (ranging from custom-made stuff to out-of-the-box open source packages), to such an extent that it builds the belief in your mind that PHP software equals crap software. I hope to be proved wrong in the future!</description>
		<content:encoded><![CDATA[<p>I suspect you didn&#8217;t lose your faith in open standards, but in PHP projects being of decent quality. I do not want to say it&#8217;s because of the language (&#8230;), but it&#8217;s definitely because of the type of programmers who use it. We have seen such terrible PHP things (ranging from custom-made stuff to out-of-the-box open source packages), to such an extent that it builds the belief in your mind that PHP software equals crap software. I hope to be proved wrong in the future!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Meneer R</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2311</link>
		<author>Meneer R</author>
		<pubDate>Tue, 19 Feb 2008 06:26:51 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2311</guid>
		<description>@11) mridkash

"As I know XML is generally used for storage of data. And w3c website says,
“XML was designed to transport and store data.”"

Which seems an extremely broad, non-specific definition. What is not data? How does that definition differ from the definition of a file? Files are also designed to transport and store data. Yet nobody ever confused a database with a file system (well, they did when they planned Vista, but, not surprisingly, they never got around to implementing that).

The question remains: what kind of data, for what purposes? How would you store audio in XML? Audio is data, right? Is XML designed to store audio? The fact that you came up with this definition only illustrates the confusion.

Eric Larson @15 might provide an answer. He made the specific claim that it is well suited for document-based data. There is definitely something to be said for that. It seems well suited for a specific domain of data, especially where we need a tree-like structure. Such structures are much less verbose in XML than, say, in a relational mapping.

As to why using it as a database is a mistake, I'm referring to these obvious facts:
  a) it's space-inefficient
  b) it has no defined query language and no indexes
  c) it does not scale to large amounts of data

Perhaps point c is less obvious. It does not scale because you can't query or search an XML file without having to parse and deal with the whole thing. Consider the complexity class of the search algorithm. Yeah, it's linear. That won't scale. Do not use it as a database.

You either have to go through the whole file over and over again, for each query, _or_ you load the file into memory completely and create indexes (read: your own crappy DBMS). Of course, your custom hand-crafted in-memory database engine needs to map your XML _correctly_ onto the ADT you use to store it in memory, or else a bunch of hard-to-trace bugs are going to show up.

The thing is, either the performance is repulsive (Rhythmbox?) or you are dealing with two different ADTs: one used to store the data, the other to manage the data when you use it. In those situations it's just another layer of abstraction and conversion bugs.

But to respond some more to Eric Larson @15: as a standard (not as a technique), it is even badly suited for document-based data. The thing is, the standard doesn't define any document. Rather, it provides a template to define a document standard. But without any default interpretation, XML is not a standard at all, any more than an ordinary file is. Files all have names, for example. They all have MIME types too (say, the doctype of a file). If you want to organize them into a hierarchy, you can use a directory. When you look at it like that, a normal Unix filesystem or a .tar archive provides the exact same abstraction. Except those formats are actually performant, unlike XML.

Now, that's the weird thing. I'm not saying a file-system or .tar file is a preferred abstraction for documents. I am saying that files are not a standard, nor is XML. 

An ini file is a standard. A .odt file is a standard. But XML is not a standard at all. It needs interpretation; a mapping to something concrete. You can pinpoint almost all XML abuses back to this misconception that XML truly defines anything except a BNF grammar. It doesn't.

To Biff, @13, "XML is not, and was never meant to be, a programming language."

I wasn't claiming it was. But without default, useful datatypes that actually map to the datatypes we find in 99% of the programming languages out there, the conversion into those types is going to introduce more bugs and problems than XML could possibly solve.

Perhaps we need a better definition of the problem we are trying to solve here.
Here's the problem I think you would want XML to solve:

"The exchange of typed data (types suggesting not only the syntax, but also the semantics).
In such a way that you can easily create, move, import and work with that data from the ecosystem of programming languages."

Given that definition, the perfect data-exchange language I would come up with would also contain some sort of Turing machine, because exchanging behavior and logic in a modular, interchangeable way between programs is something we definitely need. But that is just dreaming out loud. At this point, if the world settled on a data-exchange language that didn't do this much harm, I would be a happier man.

To Ronny Pfannschmidt, @15. Although Lisp has some of the most intelligent primitives (that is, it is quite powerful using only a minimal set of operations), I doubt the syntax would make the format very human-friendly. I would rather go for a Haskell-style syntax, perhaps even without the default Polish notation. The perfect candidate, however, needs to be very easy to support from within another language (especially generating it) as well as being human-readable and usable. It would also be the default query language.

If anything, I can understand the feeling some of you have to step up and defend XML. Its foundation and the reasons people use it have so much idealism and so many good intentions behind them that it's hard to attack. I am, however, also afraid that that is the reason why it wasn't shot down as much as it should have been.

I dare to claim it is no accident that most XML parsers are actually not valid at all. The majority of people who could produce a valid XML parser with their hands tied behind their back would think twice before even considering using XML in the first place. There might be a negative correlation between skill and the probability of using XML, at least when you limit skill to those with an academic background.</description>
		<content:encoded><![CDATA[<p>@11) mridkash</p>
<p>&#8220;As I know XML is generally used for storage of data. And w3c website says,<br />
“XML was designed to transport and store data.”&#8221;</p>
<p>Which seems an extremely broad, non-specific definition. What is not data? How does that definition differ from the definition of a file? Files are also designed to transport and store data. Yet nobody ever confused a database with a file system (well, they did when they planned Vista, but, not surprisingly, they never got around to implementing that).</p>
<p>The question remains: what kind of data, for what purposes? How would you store audio in XML? Audio is data, right? Is XML designed to store audio? The fact that you came up with this definition only illustrates the confusion.</p>
<p>Eric Larson @15 might provide an answer. He made the specific claim that it is well suited for document-based data. There is definitely something to be said for that. It seems well suited for a specific domain of data, especially where we need a tree-like structure. Such structures are much less verbose in XML than, say, in a relational mapping.</p>
<p>As to why using it as a database is a mistake, I&#8217;m referring to these obvious facts:<br />
  a) it&#8217;s space-inefficient<br />
  b) it has no defined query language and no indexes<br />
  c) it does not scale to large amounts of data</p>
<p>Perhaps point c is less obvious. It does not scale because you can&#8217;t query or search an XML file without having to parse and deal with the whole thing. Consider the complexity class of the search algorithm. Yeah, it&#8217;s linear. That won&#8217;t scale. Do not use it as a database.</p>
<p>You either have to go through the whole file over and over again, for each query, _or_ you load the file into memory completely and create indexes (read: your own crappy DBMS). Of course, your custom hand-crafted in-memory database engine needs to map your XML _correctly_ onto the ADT you use to store it in memory, or else a bunch of hard-to-trace bugs are going to show up.</p>
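<p>A minimal sketch of the two options described above, in Python with the standard-library <code>xml.etree.ElementTree</code> module. The <code>comments</code>/<code>comment</code> element names, the ids, and the helpers <code>find_linear</code> and <code>build_index</code> are hypothetical. Searching the parsed tree costs a full walk per query; the dict index answers lookups directly, but it is a second representation of the same data that has to be kept in sync with the tree.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical comment list; element and attribute names are made up.
DOC = """<comments>
  <comment id="2295" author="Eric Larson"/>
  <comment id="2311" author="Meneer R"/>
  <comment id="2383" author="schwuk.com"/>
</comments>"""

root = ET.fromstring(DOC)

def find_linear(root, comment_id):
    # Option 1: every query walks the whole tree, O(n) per lookup.
    for el in root.iter("comment"):
        if el.get("id") == comment_id:
            return el
    return None

def build_index(root):
    # Option 2: one O(n) pass builds a dict (your own tiny "DBMS");
    # later lookups are O(1) on average, but the data now lives twice.
    return {el.get("id"): el for el in root.iter("comment")}

index = build_index(root)
print(find_linear(root, "2311").get("author"))  # Meneer R
print(index["2311"].get("author"))              # Meneer R
```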
<p>The thing is, either the performance is repulsive (Rhythmbox?) or you are dealing with two different ADTs: one used to store the data, the other to manage the data when you use it. In those situations it&#8217;s just another layer of abstraction and conversion bugs.</p>
<p>But to respond some more to Eric Larson @15: as a standard (not as a technique), it is even badly suited for document-based data. The thing is, the standard doesn&#8217;t define any document. Rather, it provides a template to define a document standard. But without any default interpretation, XML is not a standard at all, any more than an ordinary file is. Files all have names, for example. They all have MIME types too (say, the doctype of a file). If you want to organize them into a hierarchy, you can use a directory. When you look at it like that, a normal Unix filesystem or a .tar archive provides the exact same abstraction. Except those formats are actually performant, unlike XML.</p>
<p>Now, that&#8217;s the weird thing. I&#8217;m not saying a file-system or .tar file is a preferred abstraction for documents. I am saying that files are not a standard, nor is XML. </p>
<p>An ini file is a standard. A .odt file is a standard. But XML is not a standard at all. It needs interpretation; a mapping to something concrete. You can pinpoint almost all XML abuses back to this misconception that XML truly defines anything except a BNF grammar. It doesn&#8217;t.</p>
<p>To Biff, @13, &#8220;XML is not, and was never meant to be, a programming language.&#8221;</p>
<p>I wasn&#8217;t claiming it was. But without default, useful datatypes that actually map to the datatypes we find in 99% of the programming languages out there, the conversion into those types is going to introduce more bugs and problems than XML could possibly solve.</p>
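<p>A minimal illustration of that conversion step, in Python with the standard-library <code>xml.etree.ElementTree</code>; the <code>entry</code> record and its element names are hypothetical. The parser hands back only strings, so every mapping to <code>int</code>, <code>float</code> or <code>bool</code> is code the consumer has to write, and can get wrong.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are made up for illustration.
record = ET.fromstring(
    "<entry><count>42</count><score>3.5</score><draft>false</draft></entry>"
)

# ElementTree returns every text node as str; mapping each one to a
# native type is a decision the consuming program has to make by hand.
raw_count = record.findtext("count")
count = int(raw_count)
score = float(record.findtext("score"))
draft = record.findtext("draft") == "true"  # note: bool("false") would be True

print(type(raw_count).__name__)  # str
print(count + 1, score * 2, draft)  # 43 7.0 False
```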
<p>Perhaps we need a better definition of the problem we are trying to solve here.<br />
Here&#8217;s the problem I think you would want XML to solve:</p>
<p>&#8220;The exchange of typed data (types suggesting not only the syntax, but also the semantics).<br />
In such a way that you can easily create, move, import and work with that data from the ecosystem of programming languages.&#8221;</p>
<p>Given that definition, the perfect data-exchange language I would come up with would also contain some sort of Turing machine, because exchanging behavior and logic in a modular, interchangeable way between programs is something we definitely need. But that is just dreaming out loud. At this point, if the world settled on a data-exchange language that didn&#8217;t do this much harm, I would be a happier man.</p>
<p>To Ronny Pfannschmidt, @15. Although Lisp has some of the most intelligent primitives (that is, it is quite powerful using only a minimal set of operations), I doubt the syntax would make the format very human-friendly. I would rather go for a Haskell-style syntax, perhaps even without the default Polish notation. The perfect candidate, however, needs to be very easy to support from within another language (especially generating it) as well as being human-readable and usable. It would also be the default query language.</p>
<p>If anything, I can understand the feeling some of you have to step up and defend XML. Its foundation and the reasons people use it have so much idealism and so many good intentions behind them that it&#8217;s hard to attack. I am, however, also afraid that that is the reason why it wasn&#8217;t shot down as much as it should have been.</p>
<p>I dare to claim it is no accident that most XML parsers are actually not valid at all. The majority of people who could produce a valid XML parser with their hands tied behind their back would think twice before even considering using XML in the first place. There might be a negative correlation between skill and the probability of using XML, at least when you limit skill to those with an academic background.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grammar Nazi</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2310</link>
		<author>Grammar Nazi</author>
		<pubDate>Tue, 19 Feb 2008 05:25:46 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2310</guid>
		<description>Apostrophe's and they're use's.</description>
		<content:encoded><![CDATA[<p>Apostrophe&#8217;s and they&#8217;re use&#8217;s.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Gauthier</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2306</link>
		<author>Michael Gauthier</author>
		<pubDate>Tue, 19 Feb 2008 02:48:25 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2306</guid>
		<description>Wordpress is not an example of good PHP programming. For the record, it is not difficult to generate valid XML in either PHP4 (which Wordpress is still using) or PHP5. PHP is not the problem; lazy developers are.</description>
		<content:encoded><![CDATA[<p>Wordpress is not an example of good PHP programming. For the record, it is not difficult to generate valid XML in either PHP4 (which Wordpress is still using) or PHP5. PHP is not the problem; lazy developers are.</p>
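<p>A sketch of the same point in Python rather than PHP, since the idea is language-independent: build the tree and let a serializer escape the text instead of concatenating strings. It assumes the standard-library <code>xml.etree.ElementTree</code>; the <code>build_item</code> helper and the element names are hypothetical, chosen to mirror an RSS item.</p>

```python
import xml.etree.ElementTree as ET

def build_item(author, text):
    # Build the tree and let the serializer escape the text,
    # instead of concatenating strings by hand.
    item = ET.Element("item")
    ET.SubElement(item, "author").text = author
    ET.SubElement(item, "description").text = text
    return ET.tostring(item, encoding="unicode")

# A comment body that naive string concatenation would turn into broken XML.
xml_text = build_item("eddie", "Tom & Jerry use <b> tags")
print(xml_text)
# <item><author>eddie</author><description>Tom &amp; Jerry use &lt;b&gt; tags</description></item>
```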
]]></content:encoded>
	</item>
	<item>
		<title>By: eddie</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2303</link>
		<author>eddie</author>
		<pubDate>Tue, 19 Feb 2008 00:17:43 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2303</guid>
		<description>In the long run (i.e. if you decide to scrape your own site and let Wordpress handle its own export format) you end up using regexps anyway, but yeah, it's a shame people can't have fun parsing WXR in 'strict standards compliance mode' and have to work around Wordpress bugs. It is interesting to know that if I write a blog comment that is not XML-compliant, then the resulting export file won't be either, so I may be planting the seed for future problems for the blog author and/or the sysadmin.

However, I am very happy with my sqladmin. I cannot do stuff like DTD or XSL (which could be a cool thing to do over a WXR file), but it works flawlessly.

Somebody should refactor the whole WXR thing. From my point of view it could be done in a compatible way and make everybody happy, but I am not a PHP hacker and I'm afraid I won't ever see that happen.</description>
		<content:encoded><![CDATA[<p>In the long run (i.e. if you decide to scrape your own site and let Wordpress handle its own export format) you end up using regexps anyway, but yeah, it&#8217;s a shame people can&#8217;t have fun parsing WXR in &#8216;strict standards compliance mode&#8217; and have to work around Wordpress bugs. It is interesting to know that if I write a blog comment that is not XML-compliant, then the resulting export file won&#8217;t be either, so I may be planting the seed for future problems for the blog author and/or the sysadmin.</p>
<p>However, I am very happy with my sqladmin. I cannot do stuff like DTD or XSL (which could be a cool thing to do over a WXR file), but it works flawlessly.</p>
<p>Somebody should refactor the whole WXR thing. From my point of view it could be done in a compatible way and make everybody happy, but I am not a PHP hacker and I&#8217;m afraid I won&#8217;t ever see that happen.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John L. Clark</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2301</link>
		<author>John L. Clark</author>
		<pubDate>Mon, 18 Feb 2008 22:11:46 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2301</guid>
		<description>`nickname_deleted`'s advice is certainly the quick-and-dirty naive approach, but it makes one's toolchain tightly coupled to the almost-XML Wordpress format.  If this format were produced as well-formed XML and processed using a conformant parser, then other tools could consume and produce this XML in a loosely coupled network, which would lead to a rich environment for manipulating your underlying blog data.  Way to call the Wordpress folks on this, Armin!</description>
		<content:encoded><![CDATA[<p>`nickname_deleted`&#8217;s advice is certainly the quick-and-dirty naive approach, but it makes one&#8217;s toolchain tightly coupled to the almost-XML Wordpress format.  If this format were produced as well-formed XML and processed using a conformant parser, then other tools could consume and produce this XML in a loosely coupled network, which would lead to a rich environment for manipulating your underlying blog data.  Way to call the Wordpress folks on this, Armin!</p>
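<p>A minimal Python sketch of what a conformant parser does with almost-XML, assuming the standard-library <code>xml.etree.ElementTree</code>, whose <code>fromstring</code> raises <code>ParseError</code> on input that is not well-formed; <code>is_well_formed</code> is a hypothetical helper. A single unescaped ampersand is enough to make the parse fail.</p>

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    # A conforming parser must reject input that is not well-formed,
    # rather than guessing what the producer meant.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<item><title>Q&amp;A</title></item>"))  # True
print(is_well_formed("<item><title>Q&A</title></item>"))      # False (bare "&")
```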
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Larson</title>
		<link>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2295</link>
		<author>Eric Larson</author>
		<pubDate>Mon, 18 Feb 2008 18:29:00 +0000</pubDate>
		<guid>http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/#comment-2295</guid>
		<description>@Meneer R

XML was born out of SGML, which was created within the context of publishing. As a result, XML is well suited for document-oriented processes. A feed is an excellent example of this, in that it is Unicode-aware and can handle mixed content. The problem mentioned above is that the WordPress WXR export is not valid XML, which means that the wealth of tools available for working with (valid) XML is useless.

Your point regarding conversion between XML and native language types is somewhat valid, considering how most people use XML today. I would argue, though, that using XML as some sort of data conversion or serialization platform is not the intended use and, as such, is problematic. Really, the problem of transforming data is a difficult one even within a single language. Take Lisp, for example: its name is derived from list processing! It is not surprising, then, that folks would have a wealth of problems working with XML as a serialization tool. Even in a rather extensively spec'd technology like SOAP web services, translation of simple types like strings is non-trivial between related languages like Java and C#.

I say all this because you are right in saying XML is bad for something like storing ints and strings for use by a programming language. It is good for exporting a series of entries as a single document. Having been bitten by the same issue in Wordpress, I don't put the blame on XML, but rather on Wordpress for not creating valid XML.</description>
		<content:encoded><![CDATA[<p>@Meneer R</p>
<p>XML was born out of SGML, which was created within the context of publishing. As a result, XML is well suited for document-oriented processes. A feed is an excellent example of this, in that it is Unicode-aware and can handle mixed content. The problem mentioned above is that the WordPress WXR export is not valid XML, which means that the wealth of tools available for working with (valid) XML is useless.</p>
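<p>A small Python sketch of what mixed content means, using the standard-library <code>xml.etree.ElementTree</code> on a made-up paragraph: running text and a child element are interleaved, and ElementTree keeps the text before the first child in <code>text</code> and the text after a child in that child&#8217;s <code>tail</code>.</p>

```python
import xml.etree.ElementTree as ET

# Mixed content: text and child elements interleaved inside one element.
p = ET.fromstring("<p>XML is <em>well suited</em> to documents.</p>")

print(repr(p.text))     # 'XML is '
print(repr(p[0].text))  # 'well suited'
print(repr(p[0].tail))  # ' to documents.'
print("".join(p.itertext()))  # XML is well suited to documents.
```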
<p>Your point regarding conversion between XML and native language types is somewhat valid, considering how most people use XML today. I would argue, though, that using XML as some sort of data conversion or serialization platform is not the intended use and, as such, is problematic. Really, the problem of transforming data is a difficult one even within a single language. Take Lisp, for example: its name is derived from list processing! It is not surprising, then, that folks would have a wealth of problems working with XML as a serialization tool. Even in a rather extensively spec&#8217;d technology like SOAP web services, translation of simple types like strings is non-trivial between related languages like Java and C#.</p>
<p>I say all this because you are right in saying XML is bad for something like storing ints and strings for use by a programming language. It is good for exporting a series of entries as a single document. Having been bitten by the same issue in Wordpress, I don&#8217;t put the blame on XML, but rather on Wordpress for not creating valid XML.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
