Pages tagged as ‘html’

Draconian Error Handling in XML

Mark found a broken blog, I have a nice broken XHTML page directly from the W3C: screenshot after the jump.

We all love XHTML do we? And yes I do know that this blog’s XML is equally broken but seriously, blame WordPress not me.

No…

I will not blog about it, because it just sucks.

Abusing XHTML

As small resumption to my previous post about XHTML/HTML here a small list of websites using XHTML that break when rendered on a browser in XHTML mode:

Not that all my XHTML pages are valid, but if they fail… How should browser vendors implement XHTML if that would break the internets?

Doctype Woes (back to HTML4)

October 3rd, 2007

At the moment I’m working together with the rest of the webteam of the ubuntuusers Team on the new portal of ubuntuusers.de based on django. One of the things we will do is consolidating all templates. And while doing so we have to decide to use an HTML/XHTML standard which we will use including the correct mimetype and doctype.

And selecting that is the hardest part because once you’ve decided on something you have to live with the consequences and cannot really change. For example HTML and XHTML have a slightly different DOM or different rules for CSS (CSS for example has an exception that allows background colors on the body-tag to affect the whole page, this exception does not exist for XHTML). Without a doubt many people use XHTML in a wrong way. Just have a look how many people serve their webpages as text/html and only use HTML semantics. They break if you serve them as application/xml+xhtml or render in a wrong way.

But why does XML and SGML have different semantics? SGML itself was created long ago (I assume IBM has something to do with it, at least it’s predecessor was created there) and is an insane specification. At least that’s what the web told me. I cannot tell you if that’s true or not because the standard itself is not available without paying for it :-/

From what sources tell me XML is an subset of SGML. I wonder how that’s possible tough, because there are syntactic elements that in my opinion are not compatible. For example clash XML’s self closing tags with null end tags in SGML:

XML <br /><br />
SGML: <p/This is some text in a paragraph/

Because the slash has a special meaning in tags in SGML it clashes with the closing slash of XML tags. Also SGML is apparently case insensitive where XML is not. Maybe I’m also wrong there and that part is up to the DTD, but quite frankly. I don’t care. I don’t even are about clashing slashes in tags because no browser implements the correct SGML behavior. And if they would do, we would all see invalid output because the web is not valid. It’s not and it will never be.

But what’s indeed ridiculous is that it’s incredible hard to write pages that are semantical and syntactical correct to both HTML4 and XHTML. However you have to make your documents compatible to both if you want to your page to be valid XHTML and render correctly. The reason is that no browser today selects the render mode by Doctype, and even if they would do, other browsers would break then on the huge number of pages that incorrectly use XHTML.

XML is strict, very strict. Syntactical errors appear as big red error messages. I for myself have to work on the wiki markup for the new portal and one of the things I have to deal with is balancing elements. That is possible and simple, but what’s harder is adding paragraphs without breaking things. And that’s not that easy because not every element is allowed in a paragraph and a paragraph cannot be mixed with every element thanks to inline versus block elements.

Even HTML5 disallows that mixing of different element types but at least it doesn’t complain. Sure, I could send the output through a validator and tell the user that his markup is bullshit and he should correct it. But I won’t do that. Users give a fuck about their markup. And I cannot bloat the parser more than it is now. Server resources are limited and additional validation for such a high traffic site is nearly impossible.

Fortunately browsers will never show you those errors because they parse XHTML with their tagsoup parser they use for HTML too. Even tough, if we cannot ensure that all of our pages are valid XML and XHTML we are not allowed to use the doctype because it would break browsers that support XHTML.

While this is hard for webdesigners and especially for programmers that want to create parsers that generate XHTML it’s an even harder job for the developers of browsers. In the end they have to have two independent parsers for HTML and XHTML. This makes it hard enough for the big browser vendors Microsoft, Mozilla, Opera and Apple, but even harder if you are new to the market and want to ship your own one. Because you not only have to be compatible with the new XHTML standard, but also the old HTML one. Nobody will translate all the old documents to XHTML I’m sure ;-)

Details about the issues are summarized here:

Without a doubt we will have fun with XHTML in the future. Probably the web stays like it’s today, we will still use the tag soup parsers, people will write XHTML that is HTML in fact and browsers will interpret it like that.

For me the decision is HTML4 at the moment, with the subset that is valid for both HTML4 and HTML5. That could make it easier for transition once the standard is ready (and I hope it’s earlier than 2022) and it’s good idea now too. Who needs an u-Tag anyway?

HTML5 Ahead

changes from HTML4 — awesome. Let’s forget about XHTML and friends and go with HTML5. The specs rock, rock, rock!

cogitations driven by wordpress