HTML, CSS, JS and other unsorted stuff

Error correction ...


The web is not a nice place for browsers: they have to correct b0rked code in many ways, and they usually do it quite well, but there is always room for improvement. Take the following piece of invalid HTML, the kind you can find on many websites. The numbers in front of the lines mark which opening and closing tags belong together:
1    <div id="mytest">
2        <p class="theClass">blah 
3            <strong id="theID">"foo" 
4                <em arbitrary="someValue">bar 
3            </strong>
2        </p>
5        <script>
5        </script>
6        <p>baz 
4            </em>blubb 
6        </p>
7        <p>test 
7        </p>
1    </div>
Only the </em> is in the wrong place, which makes the snippet invalid. How do you think different browsers handle it?
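Before looking at the browsers: the place where a snippet like this goes wrong can be found mechanically. You keep a stack of open elements and check that every end tag closes the innermost one. Below is a minimal sketch in Python that uses only the standard library's html.parser. The class name, the messages, and the copy of the snippet without pair numbers are illustrative. The checker is stricter than HTML itself, because it also reports elements with an optional end tag, such as p, when that end tag is missing.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Report end tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the open elements, innermost last
        self.problems = []  # human-readable descriptions of the nesting errors

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.stack:
            self.problems.append(f"</{tag}> has no matching open tag")
            return
        # Anything opened after `tag` and still open is mis-nested.
        while self.stack[-1] != tag:
            inner = self.stack.pop()
            self.problems.append(f"</{tag}> closes while <{inner}> is still open")
        self.stack.pop()

    def close(self):
        super().close()
        for tag in reversed(self.stack):
            self.problems.append(f"<{tag}> is never closed")
        self.stack.clear()


# The snippet from above, without the pair numbers.
SNIPPET = """
<div id="mytest">
  <p class="theClass">blah
    <strong id="theID">"foo"
      <em arbitrary="someValue">bar
    </strong>
  </p>
  <script>
  </script>
  <p>baz
      </em>blubb
  </p>
  <p>test
  </p>
</div>
"""

checker = NestingChecker()
checker.feed(SNIPPET)
checker.close()
for problem in checker.problems:
    print(problem)
```

For the snippet above it prints two problems: `</strong> closes while <em> is still open` and `</em> has no matching open tag`. These are the em left open in the first P and the stray </em> in the second one. The checker only locates the error. How to repair it is exactly where the browsers differ.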

In Opera 11.01 and 11.10 the code above becomes this after error correction:
1    <div id="mytest">
2        <p class="theClass">blah 
3            <strong id="theID">"foo" 
4                <em arbitrary="someValue">bar 
3            </strong>
2
5            <script>
5            </script>
6            <p>baz 
4                </em>blubb 
6            </p>
7            <p>test 
7            </p>
1    </div>
The error correction completely destroys the structure of the block-level elements. Worse, it even violates the HTML specification, because it nests a P inside a P, which is forbidden. This makes it nearly impossible to address the block-level elements with JS DOM scripting. Even IE8 seems to do better; at least it doesn't tinker with the source that JS "sees":
1    <div id="mytest">
2        <P class=theClass>blah 
3            <STRONG id=theID>"foo" 
4                <EM arbitrary="someValue">bar 
3            </STRONG>
2        </P>
5        <SCRIPT>
5        </SCRIPT>

6        <P>baz 
4            </EM>blubb 
6        </P>
7        <P>test 
7        </P>
1    </div>
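In IE8's result the P elements stay side by side, and the HTML rules never allow anything else. The end tag of a P is optional, and a start tag such as <p> or <div> implicitly closes a P that is still open. A conforming parser therefore never puts a P inside a P the way Opera's result does. The following deliberately tiny Python sketch implements just that one rule. It uses only the standard library, and its class and function names are made up for illustration. The set of tags that close a P is only a subset of the list in the spec. Attributes, the spec's "button scope" check and every other error-handling rule are left out.

```python
from html import escape
from html.parser import HTMLParser

# Start tags that first close an open <p>: a subset of the list in the HTML
# parsing algorithm ("in body" insertion mode).
CLOSES_P = {"blockquote", "div", "ol", "p", "section", "ul"}


class Element:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Element instances or text strings

    def inner_html(self):
        return "".join(escape(c, quote=False) if isinstance(c, str) else c.serialize()
                       for c in self.children)

    def serialize(self):
        return f"<{self.tag}>{self.inner_html()}</{self.tag}>"


class PClosingBuilder(HTMLParser):
    """Tiny tree builder that knows one rule: a <p> can never contain a <p>."""

    def __init__(self):
        super().__init__()
        self.root = Element("#root")
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):  # attributes are ignored in this sketch
        if tag in CLOSES_P and any(el.tag == "p" for el in self.stack):
            # Implicitly close the open <p> and anything still open inside it.
            while self.stack.pop().tag != "p":
                pass
        el = Element(tag)
        self.stack[-1].children.append(el)
        self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(markup):
    builder = PClosingBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root.inner_html()


print(parse("<p>baz <p>test</p>"))  # <p>baz </p><p>test</p>
```

The second <p> start tag closes the first P before it is inserted, so the result is two sibling paragraphs and never a nested pair.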
IE8's approach makes it a whole lot easier to correct errors on a page with JS than Opera's unpredictable one. Firefox renders the snippet exactly like Opera 11.10, but corrects the code that JS "sees" in a far better way:
1    <div id="mytest">
2        <p class="theClass">blah 
3            <strong id="theID">"foo" 
4                <em arbitrary="someValue">bar 
4                </em>
3            </strong>
2        </p>
5        <script>
5        </script>
6        <em arbitrary="someValue">
6        </em>
7        <p>
8            <em arbitrary="someValue">baz 
8            </em>blubb 
7        </p>
9        <p>test 
9        </p>
1    </div>
That is by far the best error correction, and I wish we could get it in Opera too. Looking at this, I can understand why Opera sometimes fails on JS-heavy pages whose HTML isn't valid, despite its extremely good JS support. Opera's internal HTML DOM error correction routine must become much better than that: block-level elements like div, p, section etc. must take priority over phrasing elements like strong, em, b, i etc.
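Firefox's result appears consistent with what the HTML5 parsing algorithm prescribes, and that algorithm is one way to give block-level elements priority over phrasing ones. The parser keeps a list of "active formatting elements" (em, strong, b, i, ...). An end tag like </p> closes its block even if an em inside it is still open. The em stays on the list, and a fresh copy with the same attributes is reopened before the next text. The Python sketch below implements a simplified version of that bookkeeping. It uses only the standard library, and its names are hypothetical. It leaves out the "furthest block" branch of the spec's adoption agency algorithm, scope checks, markers, void elements such as <br>, and reconstruction before non-formatting inline start tags. Treat it as an illustration, not a conforming parser.

```python
from html import escape
from html.parser import HTMLParser

# Elements this sketch treats as "formatting" elements. The HTML parsing
# algorithm's list is longer (a, b, big, code, em, font, i, s, ...).
FORMATTING = {"b", "em", "i", "strong"}


class Element:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs   # (name, value) pairs as delivered by HTMLParser
        self.children = []   # Element instances or text strings

    def inner_html(self):
        return "".join(escape(c, quote=False) if isinstance(c, str) else c.serialize()
                       for c in self.children)

    def serialize(self):
        attrs = "".join(f" {k}" if v is None else f' {k}="{escape(v)}"'
                        for k, v in self.attrs)
        return f"<{self.tag}{attrs}>{self.inner_html()}</{self.tag}>"


class ReopeningBuilder(HTMLParser):
    """Simplified tree builder: block end tags win over open formatting
    elements, which are then reopened inside the following content."""

    def __init__(self):
        super().__init__()
        self.root = Element("#root", [])
        self.stack = [self.root]  # open elements, innermost last
        self.formatting = []      # formatting elements not yet closed by their own end tag

    def insert(self, el):
        self.stack[-1].children.append(el)
        self.stack.append(el)

    def pop_until(self, el):
        # Pop open elements up to and including `el`.
        while self.stack.pop() is not el:
            pass

    def reconstruct(self):
        # Reopen every formatting element after the last one that is still open,
        # as a fresh copy with the same tag and attributes.
        start = len(self.formatting)
        while start > 0 and self.formatting[start - 1] not in self.stack:
            start -= 1
        for i in range(start, len(self.formatting)):
            clone = Element(self.formatting[i].tag, self.formatting[i].attrs)
            self.insert(clone)
            self.formatting[i] = clone

    def handle_starttag(self, tag, attrs):
        if tag in FORMATTING:
            self.reconstruct()
            el = Element(tag, attrs)
            self.insert(el)
            self.formatting.append(el)
        else:
            self.insert(Element(tag, attrs))

    def handle_endtag(self, tag):
        if tag in FORMATTING:
            # Most recent formatting element with this name, if any.
            entry = next((e for e in reversed(self.formatting) if e.tag == tag), None)
            if entry is None:
                return                 # stray end tag: ignore it
            self.formatting.remove(entry)
            if entry in self.stack:
                self.pop_until(entry)  # also closes anything opened inside it
            return
        # Any other end tag closes the innermost open element with that name,
        # even if formatting elements are still open inside it. Those stay in
        # self.formatting and are reopened before the next text.
        for el in reversed(self.stack[1:]):
            if el.tag == tag:
                self.pop_until(el)
                return

    def handle_data(self, data):
        self.reconstruct()
        self.stack[-1].children.append(data)


def parse(markup):
    builder = ReopeningBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root.inner_html()


print(parse('<div id="mytest"><p class="theClass">blah <strong id="theID">"foo" '
            '<em arbitrary="someValue">bar </strong></p><p>baz </em>blubb </p>'
            '<p>test </p></div>'))
```

It prints `<div id="mytest"><p class="theClass">blah <strong id="theID">"foo" <em arbitrary="someValue">bar </em></strong></p><p><em arbitrary="someValue">baz </em>blubb </p><p>test </p></div>`. The em is closed together with the strong, and a copy carrying arbitrary="someValue" is reopened inside the second P, much like in Firefox's result. The input is the snippet compacted onto one line and without the script element.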


Comments

lucideer Saturday, February 19, 2011 7:32:15 PM

Any browser that has signed up to support HTML5 is already working toward parity in HTML error correction, thanks to HTML5's new parsing algorithm, so Opera will get this too.

http://www.w3.org/TR/html5/parsing.html

QuHno Saturday, February 19, 2011 9:45:39 PM

A bird whispered to me that the HTML5 parser bell is ringing for Opera, and that the HTML5 parser will be used for all other (X)HTML dialects too :)

I just hope it will come soon.

I think I must re-read the spec to find out how the error handling works in detail.
