# [12:24] <StephaneD> I hope I didn't sound like too much of a troll on the ML, but I still have to understand things and *will* ask candid questions time and again
# [12:26] <zcorpan> iirc it was dropped because it triggered quirks mode in firefox and safari
# [12:27] <StephaneD> because we could explicitly say: this is HTML5, this is HTML5 as XML, etc
# [12:28] <StephaneD> of course this does not help us with the revisions though
# [12:28] <zcorpan> UAs don't need that information
# [12:28] <StephaneD> for instance, implicit closing tags aren't good for xml-like syntax, so how is one to explicitly explain to the browser that it's one and not the other?
# [12:29] <StephaneD> the spec says that either I code sloppily as permissive HTML, or as strict XML-based HTML, right?
# [12:30] <StephaneD> how is the browser to know that I'm not asking for a quirks-like rendering, because I know what I'm doing and I want to have a strict rendering
# [12:31] <zcorpan> there are two authoring formats: the custom text/html and XML
# [12:42] <StephaneD> ok, let's say frames are illegal for the sake of the argument. if I insert frames in html5 it's going to break the UA if it thinks I'm doing HTML5 and tries to render them but has no engine to do so, am I right?
# [12:49] <zcorpan> implementing HTML in a UA, yes
# [12:49] <StephaneD> is the UA to first parse the code and then decide: "ok, there do not seem to be any frames, this must be a very recent html, thus it's 6", etc ?
# [12:49] <StephaneD> I'm not comfortable with that idea
# [12:51] <StephaneD> let's say I've got a <yeepee> tag
# [12:51] <StephaneD> how is the browser to *not* use it as defined by html6 because it thinks it's presented with html5
# [12:51] <MikeSmith> StephaneD - coding in conformant HTML5 is not coding "sloppily" in "permissive HTML"
# [12:52] <MikeSmith> if you don't want to be thought of as a troll, you might want to not write ... stuff .. like that
# [12:52] <StephaneD> yeah, sorry, the word sloppy was the closest I could find to what I had in mind
# [12:53] <StephaneD> (not being a native speaker is sometimes a drawback)
# [12:53] <zcorpan> StephaneD: the browser never thinks it is presented with html5 if it supports html6
# [12:53] <StephaneD> (maybe it's not visible but I do spend a long time weighing my words before posting to the list)
# [12:54] <StephaneD> so it would think: since I find a <!doctype html> and html6 would be out, then it would automatically assume it's 6?
# [12:54] <zcorpan> StephaneD: so it would support the <yeepee> tag as defined in html6 even if you use the html5 doctype (or the html3.2 doctype, or no doctype)
# [12:54] <StephaneD> I'll have to think all this through - that's a very new way of thinking about html compatibility
# [12:54] <zcorpan> not really, it has been this way all along ;)
# [12:54] <zcorpan> but authors don't know about it
# [19:38] <hsivonen> Philip`: was that with a warm JVM?
# [19:39] <hsivonen> anyway, pretty cool to beat C++ at something :-)
# [19:40] <hsivonen> does the Python impl do all the encoding error stuff in input stream decoding appropriately pedantically?
# [19:43] <hsivonen> Philip`: were they all reading a local file without explicit buffering to memory first?
# [19:46] <Philip`> It was non-warm, only running the tokeniser once (though not measuring the JVM startup time), partly since I can't remember enough Java to make it read from the input stream more than once :-)
# [19:47] <Philip`> (I'd assume the 93MB-one gives the JVM plenty of time to warm up, but it would be nice to repeat the tests multiple times)
# [19:48] <Philip`> Java/C++ were reading from stdin, Python was buffering a file into a string first
# [19:48] <Philip`> (I'd like to do these a bit more accurately, though I don't think anything is going to save Python...)
# [19:49] <hsivonen> btw, the way I do buffering and blocking is (so I think :-) optimized for InputStreams that return largish chunks on their own (like files). I have no idea how System.in behaves.
# [19:50] <Philip`> I tried it with a BufferedInputStream around System.in but that didn't make any difference
# [19:50] * Philip` tries it reading a file from disk instead
# [20:58] <hsivonen> Philip`: you could give the JVM so much memory that it doesn't run out of it before your test run finishes :-)
# [20:59] <Philip`> I don't quite understand why the Java version has non-linear behaviour (it takes ~200 times as long for ~50 times as much input), since it shouldn't be having any more memory usage or more garbage when it's just a longer input/output stream
# [20:59] <hsivonen> sure it has more garbage: more CharBuffers and more Strings
# [21:00] <Philip`> More garbage than when doing a smaller file lots of times?
# [21:02] <Zeros> While the JVM hasn't released a charbuffer that space isn't going to be reused and it'll have to alloc more space, which can be really slow. Play with the jvm settings.
# [21:04] <hsivonen> fwiw, a charbuffer wrapper object for a fixed char array is allocated every 2048 UTF-16 code units or more often (could be tweaked away by holding onto it). new strings are created for each tag and attribute name as well as attribute values
# [21:04] <hsivonen> again, tag and attribute names provide an opportunity for optimization when I get around to adding a custom interning function (not gonna happen soon)
# [22:02] <hsivonen> Philip`: fix checked in for the first bug
# [22:04] <Philip`> From the code, it looks like if c=='\r' then you set c='\n' and later set prev=c, and then later test prev=='\r' except it's not '\r' any more
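The bug Philip` describes comes from comparing against a character that has already been rewritten. A minimal Python sketch of CR/CRLF normalisation that avoids it by remembering the raw previous character is shown here. This is an illustration only, not hsivonen's Java code; the names `normalize_newlines` and `prev_raw` are hypothetical.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR and CRLF to LF (hypothetical helper, for illustration)."""
    out = []
    prev_raw = ""  # previous character exactly as read, before any rewriting
    for raw in text:
        if raw == "\r":
            # A CR always produces one LF in the output.
            out.append("\n")
        elif raw == "\n" and prev_raw == "\r":
            # LF directly after CR: the newline was already emitted, so skip it.
            pass
        else:
            out.append(raw)
        # Store the raw character, not the rewritten one. Storing the
        # rewritten "\n" here is what makes the later prev == "\r" test fail.
        prev_raw = raw
    return "".join(out)
```

Keeping `prev_raw` separate from the emitted character is the whole point of the sketch: the CRLF check must see what was actually in the input.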
# [22:17] <Philip`> The input-stream-preprocessing bit doesn't say when parse errors on \0 occur, which is incompatible with the html5lib test format putting ParseError in specific locations
# [22:18] <hsivonen> I want it to occur in the tokenizer :-)
# [22:18] <hsivonen> simply because I don't want the stream to do additional checking beyond character decoding in the stream
# [22:19] <hsivonen> and the tokenizer has to look at each char anyway
# [22:19] <Philip`> <!doc>xx\0 in html5lib gives the parse error before the comment token, whereas <!doc>xxx\0 gives it after the comment
# [22:25] <hsivonen> my point is that I don't want to change what I am doing :-)
# [22:28] <Philip`> Might it work if the spec said that "if the next n characters are ..." must always stop after reading the first character which does not match? (so it would read the "<!doc>" then stop, and the 0 wouldn't be read from the stream until later, though "<!doc\0" would still have the \0 parse error before the comment)
# [22:29] <hsivonen> or perhaps this is something where we shouldn't care about error order between impls
# [22:30] <hsivonen> I'm not going to report encoding errors in sync, either
# [22:31] <hsivonen> notes me and Anne being marked as WHATWG reps
# [22:33] <mjs> "Of course, arguments, particularly regarding <canvas> and accessibility, remain at the heart of the debate with no clear solutions in sight."
# [22:39] <DanC> I sent some comments on the W3C mobile best practices, and they do emphasize "one web" more as a result.
# [22:41] <DanC> I'm not really an expert on deployment of mobile handset technology, so I don't have good arguments against "120 pixels, minimum." but I find it hard to believe that's really going to be a relevant target for very long.
# [22:41] <mjs> well it mentions some nonstandard iphone-specific stuff, and the advice about media queries should be tweaked
# [22:42] <mjs> but yes, it's mostly one-web focused
# [22:42] <DanC> having some nonstandard stuff is no crime, as long as you're up front about the costs and benefits of using it, and as long as you don't put nonstandard stuff where standard stuff would obviously do the job
# [22:43] * DanC wonders if we can just get rid of application/xhtml+xml
# [22:46] <hsivonen> DanC: as in use application/xml or as in use text/html?
# [22:46] <DanC> say... here's one that this HTML WG has discussed a bit recently... access keys... a good thing or bad thing? "Assign access keys to links in navigational menus and frequently accessed functionality."
# [22:47] <mjs> without open-ended support for embedding other vocabularies, the XML serialization can always potentially do stuff that the HTML one can't
# [22:48] <DanC> i'm happy using XML serialization in text/html
# [22:48] <hsivonen> DanC: I'd rather we didn't open *that* can of worms
# [22:48] <mjs> is it supposed to get parsed as HTML or as XML?
# [22:49] <DanC> it's supposed to get parsed using HTML 5 rules which sorta erase the difference.
# [22:52] <Philip`> hsivonen: It seems it'd be a shame to not care about parse error order at all, since usually it's well-defined and easy to implement and helps ensure stuff is being done right. Maybe the tests could have an optional flag that indicates when error order doesn't matter (just for cases when the errors come asynchronously from the input-stream)?
# [23:23] <DanC> I'm not sure I follow, but I think what you have in mind sounds like an interesting test case. one that involves scripting and interactivity.
# [23:24] <DanC> are we gonna have to specify concurrency in scripting events, I wonder?
# [23:28] <zcorpan> safari and firefox both put the P outside the table
# [23:28] <zcorpan> safari uses the table, firefox uses the p
# [23:28] <gsnedders> DanC: as part of my review of the spec, I'm doing a 1:1 implementation of the algorithms that I'm reviewing, complete with test cases
# [23:39] <gsnedders> hsivonen: no, I've not had the time. I don't know if he ignored what I and another person had worked on previously or didn't know about it, but it is now outdated and needs in large parts to be redone
# [23:39] <gsnedders> hsivonen: what I'm doing as part of the review will be far too slow to really be relevant though (though would be useful as a starting point)
# [23:40] <hsivonen> gsnedders: how do you deal with Unicode in PHP?
# [23:40] <gsnedders> hsivonen: horribly. just have to use UTF-8 strings.
# [23:41] <gsnedders> hsivonen: and there's no easy way without relying on PHP extensions to do things at a character level, so it all has to be done at a byte level
# [23:42] <zcorpan> so likely to not work with broken utf-8 sequences?
# [23:47] <zcorpan> is that U+FFFD or U+FFFD followed by U+003C ?
# [23:47] <hsivonen> gsnedders: I ported the Mozilla UTF-8 converter to PHP. there's a pure-PHP4 library on sf.net that uses it and provides other UTF-8 tools
# [23:47] <gsnedders> zcorpan: the latter I expect?
# [23:48] <zcorpan> gsnedders: it is in browsers, yeah.
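The exchange above appears to concern a broken UTF-8 sequence immediately followed by `<`; that reading is an assumption, since the triggering input is not shown in the log. Under that assumption, a small Python sketch of the "latter" behaviour (U+FFFD followed by U+003C) using the standard `replace` error handler might look like this. The helper name `decode_lenient` is hypothetical.

```python
def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, replacing malformed sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# 0xC3 starts a two-byte sequence, but 0x3C ("<") is not a continuation
# byte, so only the 0xC3 is replaced and the "<" survives as its own character.
decoded = decode_lenient(b"\xc3<")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
```

Here `code_points` lists the replacement character and then the less-than sign, so the `<` is not swallowed by the malformed sequence.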
# [23:48] <gsnedders> hsivonen: But isn't it released under the same tri-license as Mozilla? And the pure PHP library is GPL, IIRC.
# [23:48] <hsivonen> gsnedders: yeah, the port is under the tri-license. IIRC the lib is LGPL
# [23:48] <gsnedders> zcorpan: I haven't touched that code in a while