- # Session Start: Tue Jul 10 00:00:00 2007
- # Session Ident: #html-wg
- # [00:06] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
- # [00:08] * Joins: mjs (mjs@17.255.105.59)
- # [00:08] * Parts: hasather (hasather@80.203.71.22)
- # [00:12] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
- # [00:13] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [00:18] * Joins: gavin (gavin@74.103.208.221)
- # [00:31] * Quits: tH (Rob@87.102.18.111) (Quit: ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508])
- # [00:39] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
- # [00:51] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Quit: Leaving)
- # [00:59] * Joins: mjs (mjs@17.255.105.59)
- # [01:09] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
- # [01:14] * Joins: mjs (mjs@17.255.105.59)
- # [01:17] * Parts: billmason (billmason@69.30.57.156)
- # [01:25] * Quits: Philip` (philip@80.177.163.133) (Ping timeout)
- # [01:31] * Joins: mjs_ (mjs@17.255.105.59)
- # [01:31] * Quits: mjs (mjs@17.255.105.59) (Connection reset by peer)
- # [01:32] * Joins: karl (karlcow@128.30.52.30)
- # [01:33] * Joins: Philip` (philip@80.177.163.133)
- # [01:42] * Quits: mjs_ (mjs@17.255.105.59) (Quit: mjs_)
- # [02:19] * Joins: mjs (mjs@17.255.105.59)
- # [02:20] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
- # [02:21] * Joins: mjs (mjs@17.255.105.59)
- # [03:13] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [03:14] * Quits: kingryan (rking3@208.66.64.47) (Quit: kingryan)
- # [03:17] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [03:54] * Joins: olivier (ot@128.30.52.30)
- # [04:13] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [04:18] * Joins: gavin (gavin@74.103.208.221)
- # [05:56] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [06:09] * Quits: olivier (ot@128.30.52.30) (Quit: Leaving)
- # [06:19] * RRSAgent excuses himself; his presence no longer seems to be needed
- # [06:19] * Parts: RRSAgent (rrs-loggee@128.30.52.30)
- # [06:41] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [06:46] * Joins: gavin (gavin@74.103.208.221)
- # [07:38] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
- # [08:02] * Quits: sbuluf (jgnacpt@200.49.140.148) (Ping timeout)
- # [08:34] * Joins: zcorpan (zcorpan@88.131.66.80)
- # [08:45] * Joins: mjs (mjs@64.81.48.145)
- # [08:49] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [08:54] * Joins: gavin (gavin@74.103.208.221)
- # [09:11] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [09:13] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
- # [09:18] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [09:19] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Client exited)
- # [09:19] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [09:22] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [09:29] * Joins: edas (edaspet@88.191.34.123)
- # [09:33] * billyjack is now known as MikeSmith
- # [09:37] * Joins: Dashimon (noone@80.202.223.17)
- # [09:38] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
- # [09:38] * Dashimon is now known as Dashiva
- # [09:55] * Joins: ROBOd (robod@86.34.246.154)
- # [09:57] * Joins: Dashimon (noone@80.202.223.17)
- # [09:58] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
- # [09:58] * Dashimon is now known as Dashiva
- # [10:02] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
- # [10:43] * Joins: Dashiva (noone@80.202.223.17)
- # [10:56] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [11:01] * Joins: gavin (gavin@74.103.208.221)
- # [11:39] * Joins: karl (karlcow@128.30.52.30)
- # [11:45] <karl> hsivonen: you do not reply to my questions. You repeat in different words what I said.
- # [11:45] <karl> when you say "it is obvious", you forget that I have written this email, because it is not obvious.
- # [11:45] <karl> You know too much the specification ;)
- # [11:45] <karl> that's normal.
- # [11:46] <karl> about HTML document, I just followed the links.
- # [11:46] <karl> Which is what you seem to have missed.
- # [11:57] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [12:23] * Joins: StephaneD (c1317c6b@128.30.52.23)
- # [12:23] <StephaneD> hi all
- # [12:23] <zcorpan> hi StephaneD
- # [12:24] <StephaneD> I hope I didn't sound like too much of a troll on the ML, but I still have to understand things and *will* ask candid questions time and again
- # [12:26] <zcorpan> iirc it was dropped because it triggered quirks mode in firefox and safari
- # [12:26] <StephaneD> yuck
- # [12:26] <zcorpan> and because we would need a new doctype for every revision of the language, which sucks
- # [12:26] <StephaneD> maybe a proper DTD, html4-like (even if I understand that we will not use a SGML-conformant syntax) would clear things up
- # [12:27] <zcorpan> why?
- # [12:27] <StephaneD> because we could explicitly say: this is HTML5, this is HTML5 as XML, etc
- # [12:28] <StephaneD> of course this does not help us with the revisions though
- # [12:28] <zcorpan> UAs don't need that information
- # [12:28] <StephaneD> for instance, implicit closing tags aren't good for xml-like syntax, so how is one to explicitly explain to the browser that it's one and not the other?
- # [12:29] <zcorpan> ?? i don't follow
- # [12:29] <StephaneD> the spec says that either I code sloppily as permissive HTML, or as strict XML-based HTML,right?
- # [12:30] <StephaneD> how is the browser to know that I'm not asking for a quirks-like rendering, because I know what I'm doing and I want to have a strict rendering
- # [12:31] <zcorpan> there are two authoring formats: the custom text/html and XML
- # [12:31] <StephaneD> yes
- # [12:31] <zcorpan> you tell which you use with http content-type
- # [12:31] <StephaneD> assuming the browser can understand that (re: IE and application/xhtml+xml)
- # [12:32] <zcorpan> well, what the client understands or not is orthogonal
- # [12:32] <StephaneD> yah
- # [12:32] <zcorpan> if you use MS Word, the way you label it as being a word document is by http content-type
- # [12:33] <zcorpan> html5 vs. xhtml5 is no different
- # [12:33] <StephaneD> ok, point taken
- # [12:33] <zcorpan> ok. then you asked about rendering modes
- # [12:33] <StephaneD> yup
- # [12:34] <zcorpan> xml is always in the "no quirks" rendering mode
- # [12:34] <zcorpan> text/html can be in one of "no quirks" or "limited quirks" or "quirks" modes
- # [12:35] <zcorpan> if you use <!doctype html> you will get "no quirks"
- # [12:35] <zcorpan> and that is the only thing that is conforming per html5
- # [12:35] <zcorpan> if you don't use a doctype or use some other doctype then you might end up in another mode (which is required for compat)
- # [12:36] <zcorpan> does that answer the question?
- # [12:37] <StephaneD> yes and no
- # [12:37] <StephaneD> for the rendering choice, I'd say yes
- # [12:37] <StephaneD> but that leaves us with the idea that html5 id definitive
- # [12:38] <StephaneD> and history teaches us that nothing is final (re: html4)
- # [12:38] <StephaneD> s/id/is/
- # [12:38] <zcorpan> does having "5" in the doctype change that?
- # [12:39] <StephaneD> yup
- # [12:39] <zcorpan> how?
- # [12:39] <StephaneD> because I'm thinking html6
- # [12:39] <zcorpan> html6 can use the same doctype
- # [12:39] <StephaneD> not sure: imagine html6 drops a few tags and attributes and makes them 'illegal'
- # [12:40] <zcorpan> then it better have a good reason to do so
- # [12:40] <StephaneD> hehe
- # [12:40] <StephaneD> we *did* drop things
- # [12:40] <StephaneD> and have good reasons, as per *today's* state of the art
- # [12:40] <StephaneD> re:frames
- # [12:41] <zcorpan> making things illegal for authoring doesn't break compat
- # [12:41] <StephaneD> they were a very good idea when the bandwidth was poor
- # [12:41] <zcorpan> thus doesn't affect UAs
- # [12:41] <zcorpan> aiui, frames will be specced
- # [12:41] <zcorpan> (but still be "illegal")
- # [12:42] <StephaneD> ok, let's say frames are illegal for the sake of the argument. if I insert frames in html5 it's going to break the UA if it thinks I'm doing HTML5 and tries to render them but has no engine to do so, am I right?
- # [12:42] <zcorpan> no
- # [12:42] <zcorpan> HTML5 UAs will support frames
- # [12:42] <zcorpan> regardless of what doctype you declare
- # [12:42] <StephaneD> I *said* let's :)
- # [12:42] <zcorpan> yes
- # [12:43] <zcorpan> if we spec that frames must not be supported, then they will not be supported regardless of doctype
- # [12:43] <zcorpan> but frames have to be supported for compat with the web
- # [12:43] <StephaneD> yeah
- # [12:43] <zcorpan> so frames will be specced
- # [12:44] <zcorpan> there is no "html5 mode" in browsers where some things stop working
- # [12:44] <StephaneD> <zcorpan> making things illegal for authoring doesn't break compat <-- ok, I'm kind of beginning to see the light
- # [12:44] <zcorpan> :)
- # [12:44] <StephaneD> there could be, though
- # [12:44] <zcorpan> sure
- # [12:44] <StephaneD> to push things to their limits: after all my UA has nothing to do with frames because it's doing HTML5
- # [12:45] <StephaneD> yet the author did insert [illegal tag]
- # [12:45] <StephaneD> although
- # [12:45] <StephaneD> come to think of it
- # [12:45] <StephaneD> html specs have always explicitly said: if you don't know a tag, render its content as plain
- # [12:45] <zcorpan> yeah
- # [12:45] * StephaneD brain grinding
- # [12:46] <zcorpan> if you don't support something that html5 requires you to support, then you're not conforming
- # [12:46] <zcorpan> (even if the construct in question is illegal for authors to use)
- # [12:46] <StephaneD> ok, back to my example html6 with frames illegal
- # [12:47] <StephaneD> let's say html6 doesn't spec frames, how am I going to understand <!doctype html> is 6 and not 5 ?
- # [12:47] <StephaneD> (feel free to tell me when I'm thick, eh?) ;)
- # [12:47] <zcorpan> you mean that html6 would say "UAs must not support frames"?
- # [12:47] <StephaneD> yup
- # [12:48] <zcorpan> then, if you don't support frames, you conform to html6 but not to html5
- # [12:48] <StephaneD> yeah, and how am I to understand, seen from the UA, that it's html6 and not html5 or vice-versa?
- # [12:48] <zcorpan> you know which spec you're reading when you're implementing, right? :)
- # [12:49] <StephaneD> yeah, but I'm on the UA side this time :)
- # [12:49] <StephaneD> what? a spec? where? ;)
- # [12:49] <zcorpan> implementing HTML in a UA, yes
- # [12:49] <StephaneD> is the UA to first parse the code and then decide: "ok, there does not seem to be frames, this must be a very recent html, thus it's 6", etc ?
- # [12:49] <StephaneD> I'm not comfortable with that idea
- # [12:49] <zcorpan> no
- # [12:50] <zcorpan> you don't dispatch different modes depending on what you find in the document
- # [12:50] <zcorpan> (except for the quirks thing)
- # [12:50] <zcorpan> you either support frames or you don't support frames
- # [12:50] <zcorpan> *regardless* of what you find in the document
- # [12:51] <StephaneD> yeah, but how is the browser to know which grammar to load?
- # [12:51] <zcorpan> there is only one for html
- # [12:51] <StephaneD> let's say I've got a <yeepee> tag
- # [12:51] <StephaneD> how is the browser to *not* use it as specified by html6 because it thinks it's presented with html5
- # [12:51] <MikeSmith> StephaneD - coding in conformant HTML5 is not coding "sloppily" in "permissive HTML"
- # [12:52] <MikeSmith> if you don't want to be thought of as a troll, you might want to not write ... stuff .. like that
- # [12:52] <StephaneD> yeah, sorry, the word sloppy was the closest I could find from what I had in mind
- # [12:53] <StephaneD> (not being a native is sometimes a drawback)
- # [12:53] <zcorpan> StephaneD: the browser never thinks it is presented with html5 if it supports html6
- # [12:53] <StephaneD> (maybe it's not visible but I do spend a long time weighing my words before posting to the list)
- # [12:54] <StephaneD> so it would think: since I find a <!doctype html> and html6 would be out, then it would automatically assume it's 6?
- # [12:54] <zcorpan> StephaneD: so it would support the <yeepee> tag as defined in html6 even if you use the html5 doctype (or the html3.2 doctype, or no doctype)
- # [12:54] <StephaneD> I'll have to think all this through - that's a very new way of thinking about html compatibility
- # [12:54] <zcorpan> not really, it has been this way all along ;)
- # [12:54] <zcorpan> but authors don't know about it
- # [12:54] <StephaneD> ahhh
- # [12:55] <MikeSmith> StephaneD - I was about to say the same thing that zcorpan just said ...
- # [12:55] <MikeSmith> this isn't new to browsers
- # [12:55] <MikeSmith> it's the way browsers have been doing it all along
- # [12:55] <zcorpan> yeah
- # [12:55] <StephaneD> I must be thinking as if browsers have each version of HTML in a hermetic block, and obviously it's not the case
- # [12:56] <MikeSmith> nope
- # [12:56] <StephaneD> ok, thanks for clearing this up
- # [12:57] <zcorpan> np
- # [12:57] <StephaneD> I'll back-read the whole thread if I find the time
- # [12:57] <StephaneD> boy is this list active!
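The mode selection zcorpan outlines above can be sketched roughly as follows. This is a simplified illustration, not the spec's algorithm: `rendering_mode` and `XML_MEDIA_TYPES` are hypothetical names, and the legacy doctype tables that decide between "limited quirks" and "quirks" are deliberately not modelled (the function returns `None` for them). Note that nothing in it takes an HTML version as input, which is the point made in the discussion.

```python
# Hypothetical sketch: which rendering mode a document ends up in.
# Assumption: only the cases stated in the conversation are modelled.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def rendering_mode(content_type, doctype):
    """Return "no quirks", "quirks", or None when legacy rules would decide.

    content_type: the HTTP Content-Type value, e.g. "text/html; charset=utf-8".
    doctype: the doctype declaration as written, or None if there is none.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_MEDIA_TYPES:
        # XML is always rendered in "no quirks" mode.
        return "no quirks"
    if doctype is None:
        # text/html without any doctype falls back to quirks for compat.
        return "quirks"
    if " ".join(doctype.split()).lower() == "<!doctype html>":
        # The only conforming doctype per the discussion above.
        return "no quirks"
    # Other doctypes: depends on legacy doctype sniffing, not modelled here.
    return None
```

For example, `rendering_mode("text/html", "<!DOCTYPE html>")` and `rendering_mode("application/xhtml+xml", None)` both give `"no quirks"`, whatever version of HTML the author had in mind.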
- # [13:02] <hsivonen> hmm. looks like karl left already...
- # [13:26] <StephaneD> FWIW (irc log mainly) I've found this as a summary: http://esw.w3.org/topic/HTML/DocTypes02
- # [13:27] <hsivonen> StephaneD: Hixie wrote a summary message about this a while back and cced to www-archive
- # [13:27] * hsivonen tries to find it
- # [13:28] <StephaneD> thx
- # [13:29] <hsivonen> StephaneD: http://www.w3.org/mid/Pine.LNX.4.64.0706192049040.10651@dhalsim.dreamhost.com
- # [13:29] <hsivonen> StephaneD: in particular, please see the "see also" links
- # [13:29] <StephaneD> ok, thanks
- # [13:30] <StephaneD> (and here's another afternoon of work ruined trying to understand how to build a perfect world) ;)
- # [13:35] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [13:40] * Joins: gavin (gavin@74.103.208.221)
- # [13:43] * Joins: myakura (myakura@58.88.37.26)
- # [13:44] <StephaneD> hsivonen: very educational read, thank you
- # [13:45] <hsivonen> StephaneD: np
- # [13:49] <StephaneD> additionally Karl did a good job of summarizing here: http://www.w3.org/QA/2007/05/html_and_version_mechanisms.html
- # [13:56] * Joins: Sander (svl@80.60.87.115)
- # [14:04] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [14:05] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Client exited)
- # [14:05] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [14:06] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [14:07] * billyjack is now known as MikeSmith
- # [14:34] * Quits: StephaneD (c1317c6b@128.30.52.23) (Quit: see you soon)
- # [15:08] * Joins: jdandrea (jdandrea@24.228.42.231)
- # [15:10] * Quits: jdandrea (jdandrea@24.228.42.231) (Quit: ciao)
- # [15:10] * Joins: jdandrea (jdandrea@24.228.42.231)
- # [15:17] * Quits: ROBOd (robod@86.34.246.154) (Client exited)
- # [15:41] * Joins: tH (Rob@87.102.18.111)
- # [15:43] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [15:44] * Quits: jdandrea (jdandrea@24.228.42.231) (Quit: ciao)
- # [15:46] * Joins: ROBOd (robod@86.34.246.154)
- # [15:48] * Joins: gavin (gavin@74.103.208.221)
- # [16:09] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
- # [16:26] * Quits: Sander (svl@80.60.87.115) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
- # [16:26] <zcorpan> seems like my detailed review of http://simon.html5.org/test/html/dom/interfaces/HTMLDocument/title/ will have to wait until tomorrow... (i haven't figured out how i want it to work yet)
- # [16:29] * Quits: tH (Rob@87.102.18.111) (Connection reset by peer)
- # [16:30] * Joins: tH (Rob@87.102.67.108)
- # [16:38] * Joins: billmason (billmason@69.30.57.156)
- # [17:31] * Quits: edas (edaspet@88.191.34.123) (Quit: http://eric.daspet.name/ et l'édition 2007 de http://www.paris-web.fr/ )
- # [17:42] * Parts: zcorpan (zcorpan@88.131.66.80)
- # [17:50] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [17:55] * Joins: gavin (gavin@74.103.208.221)
- # [18:14] * Joins: spleen_blender (notgonnage@72.16.243.238)
- # [18:57] * Joins: hasather (hasather@80.203.71.22)
- # [19:11] * Quits: mjs (mjs@64.81.48.145) (Client exited)
- # [19:13] * Joins: mjs (mjs@64.81.48.145)
- # [19:13] <Philip`> Tokenising the HTML5 spec (1.8MB): Python: 43 seconds Python + Psyco: 20 seconds Java: 0.25 seconds C++: 0.35 seconds
- # [19:13] <Philip`> Wait, where did my newlines go?
- # [19:13] <Philip`> Tokenising the HTML5 spec (1.8MB):
- # [19:13] <Philip`> Python: 43 seconds
- # [19:13] <Philip`> Python + Psyco: 20 seconds
- # [19:13] <Philip`> Java: 0.25 seconds
- # [19:13] <Philip`> C++: 0.35 seconds
- # [19:13] <Philip`> Tokenising ~2500 web pages stuck together (93MB):
- # [19:13] <Philip`> Java: 28 seconds
- # [19:14] <Philip`> C++: 19 seconds
- # [19:14] <Philip`> Python: I'm not even going to try
- # [19:14] <Philip`> (All were hooked up to just count the number of occurrences of tag names)
- # [19:15] <Philip`> (The C++ one is still a bit buggy since it doesn't do the input-stream stuff and doesn't handle non-numeric entities)
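A rough sketch of the kind of harness described above: tokenise some markup, count tag-name occurrences, and time the run. The implementations actually being compared are not shown in the log, so this uses Python's standard-library `html.parser` as a stand-in for an HTML5 tokeniser (an assumption), and `TagCounter` and `count_tags` are hypothetical names.

```python
import time
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start-tag names; a stand-in for a real HTML5 tokeniser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def count_tags(markup):
    """Return (Counter of start-tag names, elapsed seconds)."""
    parser = TagCounter()
    start = time.perf_counter()
    parser.feed(markup)
    parser.close()
    elapsed = time.perf_counter() - start
    return parser.counts, elapsed


if __name__ == "__main__":
    counts, seconds = count_tags("<p>one<p>two<br><B>bold</B>")
    print(counts.most_common(), f"{seconds:.6f}s")
```

Timing a single pass like this measures a cold run; repeating the call in a loop and discarding the first few results would be closer to the repeated measurements discussed later in the log.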
- # [19:30] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [19:31] * Joins: Sander (svl@80.60.87.115)
- # [19:38] <hsivonen> Philip`: was that with a warm JVM?
- # [19:39] <hsivonen> anyway, pretty cool to beat C++ at something :-)
- # [19:40] <hsivonen> does the Python impl do all the encoding error stuff in input stream decoding appropriately pedantically?
- # [19:43] <hsivonen> Philip`: were they all reading a local file without explicit buffering to memory first?
- # [19:46] <Philip`> It was non-warm, only running the tokeniser once (though not measuring the JVM startup time), partly since I can't remember enough Java to make it read from the input stream more than once :-)
- # [19:47] <Philip`> (I'd assume the 93MB-one gives the JVM plenty of time to warm up, but it would be nice to repeat the tests multiple times)
- # [19:48] <Philip`> Java/C++ were reading from stdin, Python was buffering a file into a string first
- # [19:48] <Philip`> (I'd like to do these a bit more accurately, though I don't think anything is going to save Python...)
- # [19:49] * Joins: Zeros (Zeros-Elip@67.154.87.254)
- # [19:49] <hsivonen> btw, the way I do buffering and blocking is (so I think :-) optimized for InputStreams that return largish chunks on their own (like files). I have no idea how System.in behaves.
- # [19:50] <Philip`> I tried it with a BufferedInputStream around System.in but that didn't make any difference
- # [19:50] * Philip` tries it reading a file from disk instead
- # [19:50] <hsivonen> ok cool.
- # [19:52] * hsivonen has uncharitable thoughts about garbage markup inside tables that the tree builder has to deal with
- # [19:58] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [20:03] * Joins: gavin (gavin@74.103.208.221)
- # [20:05] <Philip`> HotSpot has quite visible effects
- # [20:07] <Philip`> If I do the ~2MB file lots of times, the server VM settles at around 0.12 seconds, and the client at about 0.17s
- # [20:09] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [20:18] <hsivonen> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Ctable%3E%0A%0A%3C/
- # [20:18] <hsivonen> weird in Firefox
- # [20:22] * Quits: tH (Rob@87.102.67.108) (Ping timeout)
- # [20:24] <hsivonen> hmm. Opera doesn't do foster parenting in the DOM but renders content as if it did. foster parenting in the CSS box tree?
- # [20:28] * Joins: mjs (mjs@17.255.104.239)
- # [20:31] <Philip`> If I do the ~90MB file lots of times, the server VM actually gets slower - it's 27s for five or ten minutes, then 34s
- # [20:31] <Philip`> Maybe that's just because my CPU temperature gets up to 80'C after that much time...
- # [20:34] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cxmp%3E vs data:text/html,<xmp> - I guess that's just an artifact of document.write()
- # [20:42] <Philip`> Oh, C++ likes reading files instead of stdin
- # [20:43] <Philip`> After I repeat it long enough, reading from files into C++: 0.18 seconds for the small file, 11.5 seconds for the large file
- # [20:45] <Zeros> Philip`, sounds like the GC might be getting in your way
- # [20:49] * Joins: tH (Rob@87.102.67.108)
- # [20:55] <Philip`> Zeros: Is there a way to keep it out of my way?
- # [20:57] <Zeros> Might try changing which collector you're using, http://www.petefreitag.com/articles/gctuning/ talks about all the options you can give the jvm
- # [20:58] <hsivonen> Philip`: you could give the JVM so much memory that it doesn't run out of it before your test run finishes :-)
- # [20:59] <Philip`> I don't quite understand why the Java version has non-linear behaviour (it takes ~200 times as long for ~50 times as much input), since it shouldn't be having any more memory usage or more garbage when it's just a longer input/output stream
- # [20:59] <hsivonen> sure it has more garbage: more CharBuffers and more Strings
- # [21:00] <Philip`> More garbage than when doing a smaller file lots of times?
- # [21:01] <hsivonen> no
- # [21:01] <Zeros> More fragmentation I'd imagine
- # [21:02] <Zeros> While the JVM hasn't released a charbuffer that space isn't going to be reused and it'll have to alloc more space, which can be really slow. Play with the jvm settings.
- # [21:02] * Joins: dbaron (dbaron@63.245.220.242)
- # [21:04] <hsivonen> fwiw, a charbuffer wrapper object for a fixed char array is allocated every 2048 UTF-16 code units or more often (could be tweaked away by holding onto it). new strings are created for each tag and attribute name as well as attribute values
- # [21:04] <hsivonen> again, tag and attribute names provide an opportunity for optimization when I get around to adding a custom interning function (not gonna happen soon)
- # [21:06] <Zeros> Couldn't just use an enum?
- # [21:06] <Zeros> I guess that'd give you optimization in the valid case, and unknown attributes and tags would be slower
- # [21:07] <hsivonen> "the last table element in the stack of open elements has no parent, or its parent node is not an element"
- # [21:08] <hsivonen> how could it not be an element?
- # [21:08] * Joins: zcorpan (zcorpan@84.216.43.88)
- # [21:08] <hsivonen> the fragment case has an "html" sentinel anyway
- # [21:08] <zcorpan> DanC: good reply on the charset thing
- # [21:08] <hsivonen> Zeros: can't use enum for unknowns
- # [21:09] <hsivonen> Zeros: interned String is the best of both worlds
- # [21:09] <Zeros> yeah I suppose you're right
- # [21:09] <hsivonen> Zeros: however, I might add a magic bitfield anyway later on to make group checks fast
- # [21:09] <Zeros> nice
- # [21:10] <hsivonen> I consider a bitfield a premature optimization at this stage
- # [21:10] <DanC> tx, zcorpan
- # [21:14] * Quits: mjs (mjs@17.255.104.239) (Connection reset by peer)
- # [21:18] * Joins: mjs (mjs@17.255.104.239)
- # [21:21] * Philip` tries to make his code slower by actually implementing all the bits properly
- # [21:26] * zcorpan notes that DOMTokenList.add &c raise an exception if the argument contains spaces
- # [21:27] <zcorpan> any language that wants to have classes and work nicely with the DOM APIs just cannot have spaces in the classes
- # [21:27] <zcorpan> so it seems pointless to support it with getElementsByClassName
- # [21:28] <Philip`> Can you put &nbsp;s in class names?
- # [21:29] <zcorpan> sure, but that's not a space character
- # [21:29] <zcorpan> http://www.whatwg.org/specs/web-apps/current-work/#space
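A small sketch of the behaviour zcorpan notes: a token list whose `add` rejects any token containing a space character, while a no-break space is accepted because it is not in that set. `TokenList`, `split_on_spaces`, and `SPACE_CHARACTERS` are hypothetical names, and the exact set of space characters used here (tab, LF, FF, CR, space) is an assumption; the linked spec section is the authority.

```python
# Assumed set of "space characters"; U+00A0 (no-break space) is not one.
SPACE_CHARACTERS = frozenset("\t\n\f\r ")


def split_on_spaces(value):
    """Split on the space characters only (str.split() would also split on U+00A0)."""
    tokens, current = [], []
    for ch in value:
        if ch in SPACE_CHARACTERS:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


class TokenList:
    """Toy model of a space-separated token list such as a class attribute."""

    def __init__(self, value=""):
        self._tokens = split_on_spaces(value)

    def add(self, token):
        if any(ch in SPACE_CHARACTERS for ch in token):
            raise ValueError("token must not contain space characters")
        if token not in self._tokens:
            self._tokens.append(token)

    def __contains__(self, token):
        return token in self._tokens

    def __str__(self):
        return " ".join(self._tokens)
```

With this model, `TokenList("a b").add("x y")` raises, whereas adding `"x\u00a0y"` succeeds and yields a single class name.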
- # [21:39] * Joins: hyatt (hyatt@17.203.14.191)
- # [21:48] * Quits: Sander (svl@80.60.87.115) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
- # [21:58] <Philip`> hsivonen: It looks like you don't emit a parse error on </br/>
- # [21:59] <hsivonen> Philip`: hmm.
- # [22:00] <hsivonen> Philip`: forgot to check for end tagness
- # [22:00] <Philip`> Also it looks like you convert \r\n into \n\n
- # [22:00] <hsivonen> Philip`: thanks
- # [22:01] <hsivonen> I do?
- # [22:01] <hsivonen> that's bad
- # [22:02] <hsivonen> Philip`: fix checked in for the first bug
- # [22:04] <Philip`> From the code, it looks like if c=='\r' then you set c='\n' and later set prev=c, and then later test prev=='\r' except it's not '\r' any more
- # [22:04] <Philip`> unless I'm mistaking something
- # [22:05] <hsivonen> Philip`: fix checked in for the second bug, I think
- # [22:05] <hsivonen> yes, that was the bug
- # [22:05] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [22:06] * Philip` steals the eat-the-following-\n-instead-of-the-preceding-\r idea
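The `\r\n` bug Philip` describes can be reconstructed roughly like this. Neither function is hsivonen's actual code (which is not shown in the log); both are hypothetical Python sketches written from the description. The first rewrites `\r` to `\n` before remembering it, so the later check for a preceding `\r` never fires; the second emits `\n` for the `\r` and then eats the following `\n`.

```python
def normalize_newlines_buggy(text):
    """Reproduces the described bug: "\r\n" comes out as "\n\n"."""
    out = []
    prev = ""
    for c in text:
        if c == "\r":
            c = "\n"  # CR is rewritten before it is remembered...
        elif c == "\n" and prev == "\r":
            prev = c
            continue
        out.append(c)
        prev = c  # ...so prev is never "\r" and the branch above is dead
    return "".join(out)


def normalize_newlines(text):
    """Emit "\n" for each CR, then skip an LF that directly follows it."""
    out = []
    prev_was_cr = False
    for c in text:
        if c == "\r":
            out.append("\n")
            prev_was_cr = True
            continue
        if c == "\n" and prev_was_cr:
            # Second half of a CR LF pair: already emitted as one "\n".
            prev_was_cr = False
            continue
        out.append(c)
        prev_was_cr = False
    return "".join(out)
```

Tracking "the previous raw character was a CR" separately from the rewritten character is what keeps `\r\r` as two newlines while collapsing `\r\n` to one.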
- # [22:09] * Quits: mjs (mjs@17.255.104.239) (Ping timeout)
- # [22:10] * Joins: gavin (gavin@74.103.208.221)
- # [22:11] * Quits: zcorpan (zcorpan@84.216.43.88) (Ping timeout)
- # [22:15] <Philip`> html5lib doesn't work very well on "\r\r" or "\r\0"
- # [22:15] * Joins: mjs (mjs@17.255.104.239)
- # [22:17] <Philip`> The input-stream-preprocessing bit doesn't say when parse errors on \0 occur, which is incompatible with the html5lib test format putting ParseError in specific locations
- # [22:18] <hsivonen> I want it to occur in the tokenizer :-)
- # [22:18] <hsivonen> simply because I don't want the stream to do additional checking beyond character decoding in the stream
- # [22:19] <hsivonen> and the tokenizer has to look at each char anyway
- # [22:19] <Philip`> <!doc>xx\0 in html5lib gives the parse error before the comment token, whereas <!doc>xxx\0 gives it after the comment
- # [22:19] <hsivonen> not cool
- # [22:20] <Philip`> I can't see anything in spec saying what should happen in that case
- # [22:21] <hsivonen> I think the spec is being bad when it puts \0 in the stream instead of the tokenizer
- # [22:22] <hsivonen> or, it should put the check conceptually at the point when a character is read from the stream
- # [22:24] <Philip`> Conceptually you can read the next six characters from the stream after seeing a <!
- # [22:24] <Philip`> or you can not do so - it doesn't seem to indicate that one way is correct
- # [22:24] <Philip`> so "when a character is read from the stream" still seems insufficiently defined for this
- # [22:25] <hsivonen> right
- # [22:25] <hsivonen> my point is that I don't want to change what I am doing :-)
- # [22:28] <Philip`> Might it work if the spec said that "if the next n characters are ..." must always stop after reading the first character which does not match? (so it would read the "<!doc>" then stop, and the 0 wouldn't be read from the stream until later, though "<!doc\0" would still have the \0 parse error before the comment)
- # [22:29] <hsivonen> or perhaps this is something where we shouldn't care about error order between impls
- # [22:30] <hsivonen> I'm not going to report encoding errors in sync, either
- # [22:30] <hsivonen> => spec not being bad
- # [22:31] <hsivonen> http://2007.xtech.org/public/content/2007/06/12-summit-wrapup
- # [22:31] <hsivonen> notes me and Anne being marked as WHATWG reps
- # [22:33] <mjs> "Of course, arguments, particularly regarding <canvas> and accessibility, remain at the heart of the debate with no clear solutions in sight."
- # [22:33] <mjs> have we had that argument?
- # [22:33] <mjs> like, at all, let alone as the "heart of the debate"?
- # [22:33] <hsivonen> mjs: "canvas isn't accessible"
- # [22:34] <hsivonen> mjs: it lets you do visual things in a completely screen reader-unfriendly way
- # [22:34] <mjs> seriously though, I don't recall this being ever raised as a major objection to HTML5, let alone the top one
- # [22:34] <DanC> no, we have not had that argument.
- # [22:35] <DanC> I have tried to get people to make that argument in substance. (not very hard, but I've done a little prompting)
- # [22:35] <mjs> I do agree that <canvas> could be used to do something screen reader unfriendly, but I think that's true of any form of graphics
- # [22:36] <DanC> by the way, mjs, http://developer.apple.com/iphone/designingcontent.html rocks. it's great to see "just Do The Right Thing and it should mostly work" from a vendor.
- # [22:36] <mjs> DanC: well that's not exactly all it says, but thanks
- # [22:37] <DanC> "The first design rules for web applications on iPhone are to stick with web standards and follow established web design practices"
- # [22:37] <DanC> that's pretty much "Do The Right Thing". the rest is details ;-)
- # [22:38] <DanC> meanwhile, what W3C is putting out as mobile best practices seems like "please design for 1985 technology"
- # [22:38] <hsivonen> DanC++
- # [22:39] <DanC> I sent some comments on the W3C mobile best practices, and they do emphasize "one web" more as a result.
- # [22:41] <DanC> I'm not really an expert on deployment of mobile handset technology, so I don't have good arguments against "120 pixels, minimum." but I find it hard to believe that's really going to be a relevant target for very long.
- # [22:41] <mjs> well it mentions some nonstandard iphone-specific stuff, and the advice about media queries should be tweaked
- # [22:42] <mjs> but yes, it's mostly one-web focused
- # [22:42] <DanC> having some nonstandard stuff is no crime, as long as you're up front about the costs and benefits of using it, and as long as you don't put nonstandard stuff where standard stuff would obviously do the job
- # [22:43] * DanC wonders if we can just get rid of application/xhtml+xml
- # [22:46] <hsivonen> DanC: as in use application/xml or as in use text/html?
- # [22:46] <DanC> say... here's one that this HTML WG has discussed a bit recently... access keys... a good thing or bad thing? "Assign access keys to links in navigational menus and frequently accessed functionality."
- # [22:46] <DanC> use text/html
- # [22:46] <hsivonen> DanC: gotta have MathML and SVG there first :-)
- # [22:47] <DanC> sure. why not
- # [22:47] <mjs> without open-ended support for embedding other vocabularies, the XML serialization can always potentially do stuff that the HTML one can't
- # [22:48] <DanC> i'm happy using XML serialization in text/html
- # [22:48] <hsivonen> DanC: I'd rather we didn't open *that* can of worms
- # [22:48] <mjs> is it supposed to get parsed as HTML or as XML?
- # [22:49] <DanC> it's supposed to get parsed using HTML 5 rules which sorta erase the difference.
- # [22:49] <hsivonen> DanC: kinda big sorta
- # [22:50] <DanC> seems to be getting smaller all the time... with <br /> allowed and such
- # [22:50] <DanC> it could be that I'm just missing some critical clues.
- # [22:51] * Quits: ROBOd (robod@86.34.246.154) (Quit: http://www.robodesign.ro )
- # [22:51] <mjs> well, <div /> will do something different
- # [22:51] <hsivonen> PIs, CDATA sections, real />, namespaces, case folding
- # [22:52] <hsivonen> tag inference
- # [22:52] <Philip`> hsivonen: It seems it'd be a shame to not care about parse error order at all, since usually it's well-defined and easy to implement and helps ensure stuff is being done right. Maybe the tests could have an optional flag that indicates when error order doesn't matter (just for cases when the errors come asynchronously from the input-stream)?
- # [22:52] <hsivonen> Philip`: makes sense
- # [22:53] <DanC> yes, I'm prepared to live without <div />. prolly PIs too. an update of "appendix C" is fine.
- # [22:54] <hsivonen> DanC: are you prepared to live with no <ul> as child of <p> and no <tr> as child of <table>?
- # [22:54] <DanC> I won't miss CDATA sections, except maybe as a kludge to find an intersection between XML and <script> parsing.
- # [22:54] <hsivonen> anyway, there's legacy application/xhtml+xml content
- # [22:55] <DanC> I have lived this far with no ul as child of p. I don't see an issue with tr and table; why would I not get that?
- # [22:55] <hsivonen> if we don't define XHTML, it will happen ad hoc
- # [22:55] <hsivonen> DanC: the parsing algorithm doesn't allow either
- # [22:56] <DanC> the parsing algorithm allows willy-nilly testing of <b> and<i>, but not <tr> inside <table>? huh?
- # [22:56] <DanC> nesting
- # [22:56] <hsivonen> DanC: yes
- # [22:56] <hsivonen> DanC: backwards compat :-)
- # [22:56] <DanC> so if I write <table><tr><td>abc</td></tr></table> like I have for years, html5lib will crap out?
- # [22:57] <hsivonen> DanC: no. it'll do the same as HTML 4: treat it as <table><tbody><tr><td>abc</td></tr></tbody></table>
- # [22:57] <jgraham> DanC: http://james.html5.org/cgi-bin/parsetree/parsetree.py?source=<table><tr><td>abc<%2Ftd><%2Ftr><%2Ftable>
- # [22:58] <hsivonen> DanC: can't express trees without the tbody in text/html in standards mode
- # [22:58] <DanC> oh. that hasn't bothered me so far.
- # [22:58] <DanC> I guess I'd have to be careful with my XPaths
- # [22:58] <DanC> does CSS magically not notice?
- # [22:58] <hsivonen> afk
- # [22:59] <hsivonen> DanC: CSS notices. it is a gotcha for CSS authors
- # [22:59] <hsivonen> really afk
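DanC's remark about being careful with XPaths can be illustrated with a small sketch. It assumes the tree hsivonen describes, with the inferred tbody, and uses Python's standard `xml.etree.ElementTree` only as a convenient way to run path queries; it is not the HTML parser itself.

```python
import xml.etree.ElementTree as ET

# The tree as hsivonen says text/html parsing produces it: tbody is inferred.
parsed = ET.fromstring("<table><tbody><tr><td>abc</td></tr></tbody></table>")

# A path written against the markup as authored (tr directly under table).
rows_direct = parsed.findall("./tr")

# Paths that account for the inferred tbody, or that ignore depth entirely.
rows_via_tbody = parsed.findall("./tbody/tr")
rows_any_depth = parsed.findall(".//tr")

print(len(rows_direct), len(rows_via_tbody), len(rows_any_depth))
```

A CSS child selector such as `table > tr` is affected in the same way as the direct path here, which is presumably the gotcha for CSS authors that hsivonen mentions.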
- # [23:01] * Joins: myakura (myakura@58.88.37.26)
- # [23:01] * jgraham notices Molly didn't record his attendance at the XTech browser summit thing
- # [23:16] * Joins: zcorpan (zcorpan@84.216.41.183)
- # [23:17] <zcorpan> "browser varations in treatment of XML namespaces with DOM-based work arounds using scripts" -- http://esw.w3.org/topic/HtmlTestMaterials
- # [23:17] * zcorpan doesn't get that
- # [23:17] <zcorpan> is that namespaces in text/html?
- # [23:18] <DanC> oh... wow... merry christmas to me... the other items in that list sprouted test cases
- # [23:18] <zcorpan> they are pretty trivial... :)
- # [23:19] <DanC> well, I think it'll be non-trivial to get the HTML WG to agree what the right answer is in those cases.
- # [23:19] <zcorpan> couldn't get anything interesting out of "behavior for multiple definitions of same ID value" though :(
- # [23:19] <zcorpan> but aiui browser vendors want that case undefined anyway
- # [23:19] <DanC> i.e. it'll be non-trivial for W3C to say something other than "that's out of scope of the standards"
- # [23:20] * DanC struggles to decode aiui ... "as I understand it"?
- # [23:20] <zcorpan> yeah
- # [23:20] <DanC> ok, ~= AFAIK
- # [23:21] <DanC> mjs seems to argue hard against any cases being undefined.
- # [23:21] * DanC gets kinda excited at the possibility that the HTML WG might make real tangible progress on a test suite before we get old
- # [23:21] <zcorpan> sure, but if handling of duplicate ids is defined then browsers can't do lazy evaluation, which is cheaper
- # [23:22] <DanC> if it's defined as "pick the 1st one" then they can do lazy eval, no?
- # [23:22] <zcorpan> not if the dom is changed
- # [23:23] <DanC> I'm not sure I follow, but I think what you have in mind sounds like an interesting test case. one that involves scripting and interactivity.
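One possible reading of zcorpan's objection is that a lazily cached "first element with this id" goes stale when a script later inserts another element with the same id earlier in the tree. The sketch below models that with a hypothetical minimal `Node` class and a hypothetical `CachedLookup`; it is not how any browser is known to implement `getElementById`.

```python
class Node:
    """Hypothetical minimal DOM node: a tag name, an optional id, children."""
    def __init__(self, tag, id=None):
        self.tag = tag
        self.id = id
        self.children = []

def first_by_id(root, wanted):
    """Return the first node in tree order (preorder) whose id matches."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.id == wanted:
            return node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return None

class CachedLookup:
    """Lazy lookup: walk the tree once per id, then reuse the answer."""
    def __init__(self, root):
        self.root = root
        self.cache = {}

    def get(self, wanted):
        if wanted not in self.cache:
            self.cache[wanted] = first_by_id(self.root, wanted)
        return self.cache[wanted]

body = Node("body")
body.children.append(Node("p", id="x"))

lookup = CachedLookup(body)
before = lookup.get("x")        # the <p>, the only element with id "x"

# A script now inserts a second element with the same id ahead of the <p>.
body.children.insert(0, Node("table", id="x"))

fresh = first_by_id(body, "x")  # the <table>: first in tree order now
stale = lookup.get("x")         # still the <p>: cache was never invalidated

print(before.tag, fresh.tag, stale.tag)
```

Under a "pick the 1st one" rule the cached answer is now wrong, so an implementation would have to invalidate or update the cache on every relevant DOM mutation, which is where the extra cost would come from.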
- # [23:24] <DanC> are we gonna have to specify concurrency in scripting events, I wonder?
- # [23:24] * DanC shudders
- # [23:24] * DanC is reminded of POSIX file system semantic standards horrors
- # [23:27] <zcorpan> dup ids: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Ctable%20id%3Dx%3E%3Ctr%3E%3Ctd%3E%3C/td%3E%3C/tr%3E%3Cp%20id%3Dx%3Ep%3C/table%3E%3Cscript%3Ew%28document.getElementById%28%22x%22%29.innerHTML%29%3C/script%3E
- # [23:28] <zcorpan> safari and firefox both put the P outside the table
- # [23:28] <zcorpan> safari uses the table, firefox uses the p
- # [23:28] <gsnedders> DanC: as part of my review of the spec, I'm doing a 1:1 implementation of the algorithms that I'm reviewing, complete with test cases
- # [23:29] <zcorpan> gsnedders: which algorithms?
- # [23:29] <gsnedders> zcorpan: off the top of my head, common microsyntaxes and the parser
- # [23:29] <zcorpan> ok
- # [23:30] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
- # [23:31] <DanC> gsnedders, where do you keep your code? is it published? bzr/hg/svn/cvs repo?
- # [23:31] <gsnedders> DanC: http://geoffers.no-ip.com/svn/php-html-5-lib
- # [23:31] * DanC wants to play with decentralized version control in building the HTML WG test suite
- # [23:31] <gsnedders> I think that is the correct URI
- # [23:31] <gsnedders> if not, just hit the authority and find the link
- # [23:32] * DanC gets a password prompt; wonders if that's by design
- # [23:32] <gsnedders> DanC: no
- # [23:32] <DanC> perhaps http://geoffers.no-ip.com/svn/php-html-5-direct/ ?
- # [23:33] <gsnedders> DanC: yes
- # [23:33] <DanC> ok, thanks
- # [23:34] <zcorpan> where is annevk btw? vacation?
- # [23:34] <gsnedders> it's too late to try and remember URLs off the top of my head
- # [23:35] <gsnedders> the number tests are all arranged so each test is run against each number algorithm
- # [23:35] <hasather> zcorpan: yea, I think so
- # [23:36] * zcorpan notes that http://html5.org/parsing-tests/testrunner.htm isn't the latest revision ( http://html5.googlecode.com/svn/trunk/parser-tests/testrunner.htm )
- # [23:37] <hasather> zcorpan: in Greece I think
- # [23:37] <zcorpan> hasather: ok
- # [23:37] <hsivonen> gsnedders: are you coordinating with Jero?
- # [23:37] <hsivonen> on the PHP impl?
- # [23:38] <Philip`> I like automatic test generation now - almost all the ones in http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/test3.test were constructed automatically from the tokeniser algorithm, and they cover every step in the algorithm
- # [23:38] <hsivonen> Philip`: cool
- # [23:39] <gsnedders> hsivonen: no, I've not had the time. I don't know if he ignored what I and another person had worked on previously or didn't know about it, but it is now outdated and needs in large parts to be redone
- # [23:39] <gsnedders> hsivonen: what I'm doing as part of the review will be far too slow to really be relevant though (though would be useful as a starting point)
- # [23:40] <hsivonen> gsnedders: how do you deal with Unicode in PHP?
- # [23:40] <gsnedders> hsivonen: horribly. just have to use UTF-8 strings.
- # [23:41] <gsnedders> hsivonen: and there's no easy way without relying on PHP extensions to do things at a character level, so it all has to be done at a byte level
- # [23:42] <zcorpan> so likely to not work with broken utf-8 sequences?
- # [23:42] <zcorpan> byte sequences
- # [23:43] <gsnedders> zcorpan: what I normally do just replaces any invalid sequences with a single U+FFFD character
- # [23:46] <zcorpan> 0xE5 0x3C
- # [23:47] <zcorpan> is that U+FFFD or U+FFFD followed by U+003C ?
- # [23:47] <hsivonen> gsnedders: I ported the Mozilla UTF-8 converter to PHP. there's a pure-PHP4 library on sf.net that uses it and provides other UTF-8 tools
- # [23:47] <gsnedders> zcorpan: the latter I expect?
- # [23:48] <zcorpan> gsnedders: it is in browsers, yeah.
- # [23:48] <gsnedders> hsivonen: But isn't it released under the same tri-license as Mozilla? And the pure PHP library is GPL, IIRC.
- # [23:48] <hsivonen> gsnedders: yeah, the port is under the tri-license. IIRC the lib is LGPL
- # [23:48] <gsnedders> zcorpan: I haven't touched that code in a while
- # [23:49] <gsnedders> zcorpan: it's U+FFD U+003C
- # [23:49] <gsnedders> *FFFD
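The 0xE5 0x3C case zcorpan raises can be checked with Python's built-in UTF-8 decoder, used here only as one example of a decoder that substitutes U+FFFD. This is not gsnedders's PHP code.

```python
# 0xE5 opens a three-byte UTF-8 sequence, but 0x3C ("<") is not a
# continuation byte, so the sequence is broken after its first byte.
decoded = b"\xe5\x3c".decode("utf-8", errors="replace")

# List the resulting code points in U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)  # ['U+FFFD', 'U+003C']
```

The "<" survives as its own character, which matters for an HTML parser because it can still open a tag.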
- # [23:49] <gsnedders> hsivonen: I'm planning on merging an HTML5 parser into a BSD licensed project next year, so I can't really use such a thing
- # [23:50] <hsivonen> who is responsible for the html5lib non-JSON test case format?
- # [23:50] <hsivonen> gsnedders: ok
- # [23:50] <hsivonen> can we make things easier and say that the substream is terminated by LF followed by #?
- # [23:51] <hsivonen> instead of LF followed by #errors
- # [23:52] <zcorpan> hsivonen: i think Hixie designed it
- # [23:53] <gsnedders> anyhow, see y'all tomorrow
- # [23:53] <hsivonen> let's see if I can get away with not looking beyond #
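The simplification hsivonen proposes, ending a section at any LF followed by #, could look roughly like this. The function name `split_sections` and the sample test text are made up for illustration; the section names follow the #data / #errors / #document layout of the html5lib tree-construction tests.

```python
def split_sections(test_text):
    """Split one test case into named sections.

    Any line starting with '#' opens a new section, so the parser never
    has to look beyond the '#' to decide where a section ends.
    """
    sections = {}
    current = None
    for line in test_text.split("\n"):
        if line.startswith("#"):
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}

# Made-up sample, not taken from the real test data.
sample = "#data\n<b>one\ntwo\n#errors\nexample error\n#document\n| <b>"
parsed_test = split_sections(sample)
print(parsed_test["data"])
```

The trade-off is that input data containing a line that itself begins with # would be cut short, which is the case the stricter "LF followed by #errors" rule protects against.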
- # [23:53] <zcorpan> gsnedders: cya
- # [23:57] <jgraham> hsivonen: Hixie sort of. Although I implemented the parser we use so I guess I can bear some of the responsibility...
- # [23:58] * zcorpan implemented a "parser" in JS too
- # [23:59] <zcorpan> which is really just naïve split()s :)
- # [23:59] <Philip`> Ooh, I wonder if I could port my tokeniser to JS...
- # Session Close: Wed Jul 11 00:00:00 2007