  1. # Session Start: Tue Jul 10 00:00:00 2007
  2. # Session Ident: #html-wg
  3. # [00:06] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
  4. # [00:08] * Joins: mjs (mjs@17.255.105.59)
  5. # [00:08] * Parts: hasather (hasather@80.203.71.22)
  6. # [00:12] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  7. # [00:13] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  8. # [00:18] * Joins: gavin (gavin@74.103.208.221)
  9. # [00:31] * Quits: tH (Rob@87.102.18.111) (Quit: ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508])
  10. # [00:39] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
  11. # [00:51] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Quit: Leaving)
  12. # [00:59] * Joins: mjs (mjs@17.255.105.59)
  13. # [01:09] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
  14. # [01:14] * Joins: mjs (mjs@17.255.105.59)
  15. # [01:17] * Parts: billmason (billmason@69.30.57.156)
  16. # [01:25] * Quits: Philip` (philip@80.177.163.133) (Ping timeout)
  17. # [01:31] * Joins: mjs_ (mjs@17.255.105.59)
  18. # [01:31] * Quits: mjs (mjs@17.255.105.59) (Connection reset by peer)
  19. # [01:32] * Joins: karl (karlcow@128.30.52.30)
  20. # [01:33] * Joins: Philip` (philip@80.177.163.133)
  21. # [01:42] * Quits: mjs_ (mjs@17.255.105.59) (Quit: mjs_)
  22. # [02:19] * Joins: mjs (mjs@17.255.105.59)
  23. # [02:20] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
  24. # [02:21] * Joins: mjs (mjs@17.255.105.59)
  25. # [03:13] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
  26. # [03:14] * Quits: kingryan (rking3@208.66.64.47) (Quit: kingryan)
  27. # [03:17] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  28. # [03:54] * Joins: olivier (ot@128.30.52.30)
  29. # [04:13] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  30. # [04:18] * Joins: gavin (gavin@74.103.208.221)
  31. # [05:56] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
  32. # [06:09] * Quits: olivier (ot@128.30.52.30) (Quit: Leaving)
  33. # [06:19] * RRSAgent excuses himself; his presence no longer seems to be needed
  34. # [06:19] * Parts: RRSAgent (rrs-loggee@128.30.52.30)
  35. # [06:41] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  36. # [06:46] * Joins: gavin (gavin@74.103.208.221)
  37. # [07:38] * Quits: mjs (mjs@17.255.105.59) (Quit: mjs)
  38. # [08:02] * Quits: sbuluf (jgnacpt@200.49.140.148) (Ping timeout)
  39. # [08:34] * Joins: zcorpan (zcorpan@88.131.66.80)
  40. # [08:45] * Joins: mjs (mjs@64.81.48.145)
  41. # [08:49] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  42. # [08:54] * Joins: gavin (gavin@74.103.208.221)
  43. # [09:11] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  44. # [09:13] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
  45. # [09:18] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  46. # [09:19] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Client exited)
  47. # [09:19] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  48. # [09:22] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
  49. # [09:29] * Joins: edas (edaspet@88.191.34.123)
  50. # [09:33] * billyjack is now known as MikeSmith
  51. # [09:37] * Joins: Dashimon (noone@80.202.223.17)
  52. # [09:38] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
  53. # [09:38] * Dashimon is now known as Dashiva
  54. # [09:55] * Joins: ROBOd (robod@86.34.246.154)
  55. # [09:57] * Joins: Dashimon (noone@80.202.223.17)
  56. # [09:58] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
  57. # [09:58] * Dashimon is now known as Dashiva
  58. # [10:02] * Quits: Dashiva (noone@80.202.223.17) (Ping timeout)
  59. # [10:43] * Joins: Dashiva (noone@80.202.223.17)
  60. # [10:56] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  61. # [11:01] * Joins: gavin (gavin@74.103.208.221)
  62. # [11:39] * Joins: karl (karlcow@128.30.52.30)
  63. # [11:45] <karl> hsivonen: you do not reply to my questions. You repeat in different words what I said.
  64. # [11:45] <karl> when you say "it is obvious", you forget that I have written this email, because it is not obvious.
  65. # [11:45] <karl> You know too much the specification ;)
  66. # [11:45] <karl> that's normal.
  67. # [11:46] <karl> about HTML document, I just followed the links.
  68. # [11:46] <karl> Which is what you seem to have missed.
  69. # [11:57] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
  70. # [12:23] * Joins: StephaneD (c1317c6b@128.30.52.23)
  71. # [12:23] <StephaneD> hi all
  72. # [12:23] <zcorpan> hi StephaneD
  73. # [12:24] <StephaneD> I hope I didn't sound like too much of a troll on the ML, but I still have to understand things and *will* ask candid questions time and again
  74. # [12:26] <zcorpan> iirc it was dropped because it triggered quirks mode in firefox and safari
  75. # [12:26] <StephaneD> yuck
  76. # [12:26] <zcorpan> and because we would need a new doctype for every revision of the language, which sucks
  77. # [12:26] <StephaneD> maybe a proper DTD, html4-like (even if I understand that we will not use a SGML-conformant syntax) would clear things up
  78. # [12:27] <zcorpan> why?
  79. # [12:27] <StephaneD> because we could explicitly say: this is HTML5, this is HTML5 as XML, etc
  80. # [12:28] <StephaneD> of course this does not help us with the revisions though
  81. # [12:28] <zcorpan> UAs don't need that information
  82. # [12:28] <StephaneD> for instance, implicit closing tags aren't good for xml-like syntax, so how is one to explicitly explain to the browser that it's one and not the other?
  83. # [12:29] <zcorpan> ?? i don't follow
  84. # [12:29] <StephaneD> the spec says that either I code sloppily as permissive HTML, or as strict XML-based HTML, right?
  85. # [12:30] <StephaneD> how is the browser to know that I'm not asking for a quirks-like rendering, because I know what I'm doing and I want to have a strict rendering
  86. # [12:31] <zcorpan> there are two authoring formats: the custom text/html and XML
  87. # [12:31] <StephaneD> yes
  88. # [12:31] <zcorpan> you tell which you use with http content-type
  89. # [12:31] <StephaneD> assuming the browser can understand that (re: IE and application/xhtml+xml)
  90. # [12:32] <zcorpan> well, what the client understands or not is orthogonal
  91. # [12:32] <StephaneD> yah
  92. # [12:32] <zcorpan> if you use MS Word, the way you label it as being a word document is by http content-type
  93. # [12:33] <zcorpan> html5 vs. xhtml5 is no different
  94. # [12:33] <StephaneD> ok, point taken
  95. # [12:33] <zcorpan> ok. then you asked about rendering modes
  96. # [12:33] <StephaneD> yup
  97. # [12:34] <zcorpan> xml is always in the "no quirks" rendering mode
  98. # [12:34] <zcorpan> text/html can be in one of "no quirks" or "limited quirks" or "quirks" modes
  99. # [12:35] <zcorpan> if you use <!doctype html> you will get "no quirks"
  100. # [12:35] <zcorpan> and that is the only thing that is conforming per html5
  101. # [12:35] <zcorpan> if you don't use a doctype or use some other doctype then you might end up in another mode (which is required for compat)
  102. # [12:36] <zcorpan> does that answer the question?
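
The dispatch zcorpan describes above can be sketched roughly as follows. This is only an illustration: the helper name rendering_mode, the small set of XML MIME types, and the doctype regex are assumptions made for the example, and the legacy rules that pick "quirks" or "limited quirks" for other doctypes are deliberately not modeled (the function returns None for those cases).

```python
import re

# Assumption: a small illustrative set of XML MIME types, not an exhaustive list.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# Matches the doctype named in the log, case-insensitively.
HTML5_DOCTYPE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)


def rendering_mode(content_type, doctype):
    """Return "no quirks" when the log says so, or None when legacy
    compatibility rules (not modeled here) would decide the mode."""
    # The format is chosen by the HTTP Content-Type, not by the document body.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in XML_TYPES:
        # XML is always rendered in "no quirks" mode.
        return "no quirks"
    if mime != "text/html":
        raise ValueError(f"not an HTML or XML type: {mime}")
    if doctype is not None and HTML5_DOCTYPE.fullmatch(doctype.strip()):
        return "no quirks"
    # No doctype, or some other doctype: may be any of the three modes.
    return None
```

Note that nothing in this sketch selects a language version: the doctype only affects the rendering mode, which matches the point made later in the conversation that there is no "html5 mode" in browsers.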
  103. # [12:37] <StephaneD> yes and no
  104. # [12:37] <StephaneD> for the rendering choice, I'd say yes
  105. # [12:37] <StephaneD> but that leaves us with the idea that html5 id definitive
  106. # [12:38] <StephaneD> and history teaches us that nothing is final (re: html4)
  107. # [12:38] <StephaneD> s/id/is/
  108. # [12:38] <zcorpan> does having "5" in the doctype change that?
  109. # [12:39] <StephaneD> yup
  110. # [12:39] <zcorpan> how?
  111. # [12:39] <StephaneD> because I'm thinking html6
  112. # [12:39] <zcorpan> html6 can use the same doctype
  113. # [12:39] <StephaneD> not sure: imagine html6 drops a few tags and attributes and makes them 'illegal'
  114. # [12:40] <zcorpan> then it better have a good reason to do so
  115. # [12:40] <StephaneD> hehe
  116. # [12:40] <StephaneD> we *did* drop things
  117. # [12:40] <StephaneD> and have good reasons, as per *today's* state of the art
  118. # [12:40] <StephaneD> re:frames
  119. # [12:41] <zcorpan> making things illegal for authoring doesn't break compat
  120. # [12:41] <StephaneD> they were a very good idea when the bandwidth was poor
  121. # [12:41] <zcorpan> thus doesn't affect UAs
  122. # [12:41] <zcorpan> aiui, frames will be specced
  123. # [12:41] <zcorpan> (but still be "illegal")
  124. # [12:42] <StephaneD> ok, let's say frames are illegal for the sake of the argument. if I insert frames in html5 it's going to break the UA if it thinks I'm doing HTML5 and tries to render them but has no engine to do so, am I right?
  125. # [12:42] <zcorpan> no
  126. # [12:42] <zcorpan> HTML5 UAs will support frames
  127. # [12:42] <zcorpan> regardless of what doctype you declare
  128. # [12:42] <StephaneD> I *said* let's :)
  129. # [12:42] <zcorpan> yes
  130. # [12:43] <zcorpan> if we spec that frames must not be supported, then they will not be supported regardless of doctype
  131. # [12:43] <zcorpan> but frames have to be supported for compat with the web
  132. # [12:43] <StephaneD> yeah
  133. # [12:43] <zcorpan> so frames will be specced
  134. # [12:44] <zcorpan> there is no "html5 mode" in browsers where some things stop working
  135. # [12:44] <StephaneD> <zcorpan> making things illegal for authoring doesn't break compat <-- ok, I'm kind of beginning to see the light
  136. # [12:44] <zcorpan> :)
  137. # [12:44] <StephaneD> there could be, though
  138. # [12:44] <zcorpan> sure
  139. # [12:44] <StephaneD> to push things to their limits: after all my UA has nothing to do with frames because it's doing HTML5
  140. # [12:45] <StephaneD> yet the author did insert [illegal tag]
  141. # [12:45] <StephaneD> although
  142. # [12:45] <StephaneD> come to think of it
  143. # [12:45] <StephaneD> html specs have always explicitly said: if you don't know a tag, render its content as plain
  144. # [12:45] <zcorpan> yeah
  145. # [12:45] * StephaneD brain grinding
  146. # [12:46] <zcorpan> if you don't support something that html5 requires you to support, then you're not conforming
  147. # [12:46] <zcorpan> (even if the construct in question is illegal for authors to use)
  148. # [12:46] <StephaneD> ok, back to my example html6 with frames illegal
  149. # [12:47] <StephaneD> let's say html6 doesn't spec frames, how am I going to understand <!doctype html> is 6 and not 5 ?
  150. # [12:47] <StephaneD> (feel free to tell me when I'm thick, eh?) ;)
  151. # [12:47] <zcorpan> you mean that html6 would say "UAs must not support frames"?
  152. # [12:47] <StephaneD> yup
  153. # [12:48] <zcorpan> then, if you don't support frames, you conform to html6 but not to html5
  154. # [12:48] <StephaneD> yeah, and how am I to understand, seen from the UA, that it's html6 and not html5 or vice-versa?
  155. # [12:48] <zcorpan> you know which spec you're reading when you're implementing, right? :)
  156. # [12:49] <StephaneD> yeah, but I'm on the UA side this time :)
  157. # [12:49] <StephaneD> what? a spec? where? ;)
  158. # [12:49] <zcorpan> implementing HTML in a UA, yes
  159. # [12:49] <StephaneD> is the UA to first parse the code and then decide: "ok, there does not seem to be frames, this must be a very recent html, thus it's 6", etc ?
  160. # [12:49] <StephaneD> I'm not comfortable with that idea
  161. # [12:49] <zcorpan> no
  162. # [12:50] <zcorpan> you don't dispatch different modes depending on what you find in the document
  163. # [12:50] <zcorpan> (except for the quirks thing)
  164. # [12:50] <zcorpan> you either support frames or you don't support frames
  165. # [12:50] <zcorpan> *regardless* of what you find in the document
  166. # [12:51] <StephaneD> yeah, but how is the browser to know which grammar to load?
  167. # [12:51] <zcorpan> there is only one for html
  168. # [12:51] <StephaneD> let's say I've got a <yeepee> tag
  169. # [12:51] <StephaneD> how is the browser to *not* use it as specified by html6 because it thinks it's presented with html5
  170. # [12:51] <MikeSmith> StephaneD - coding in conformant HTML5 is not coding "sloppily" in "permissive HTML"
  171. # [12:52] <MikeSmith> if you don't want to be thought of as a troll, you might want to not write ... stuff .. like that
  172. # [12:52] <StephaneD> yeah, sorry, the word sloppy was the closest I could find from what I had in mind
  173. # [12:53] <StephaneD> (not being a native is sometimes a drawback)
  174. # [12:53] <zcorpan> StephaneD: the browser never thinks it is presented with html5 if it supports html6
  175. # [12:53] <StephaneD> (maybe it's not visible but I do spend a long time weighing my words before posting to the list)
  176. # [12:54] <StephaneD> so it would think: since I find a <!doctype html> and html6 would be out, then it would automatically assume it's 6?
  177. # [12:54] <zcorpan> StephaneD: so it would support the <yeepee> tag as defined in html6 even if you use the html5 doctype (or the html3.2 doctype, or no doctype)
  178. # [12:54] <StephaneD> I'll have to think all this through - that's a very new way of thinking about html compatibility
  179. # [12:54] <zcorpan> not really, it has been this way all along ;)
  180. # [12:54] <zcorpan> but authors don't know about it
  181. # [12:54] <StephaneD> ahhh
  182. # [12:55] <MikeSmith> StephaneD - I was about to say the same thing that zcorpan just said ...
  183. # [12:55] <MikeSmith> this isn't new to browsers
  184. # [12:55] <MikeSmith> it's the way browsers have been doing it all along
  185. # [12:55] <zcorpan> yeah
  186. # [12:55] <StephaneD> I must be thinking as if browsers have each version of HTML in a hermetic block, and obviously it's not the case
  187. # [12:56] <MikeSmith> nope
  188. # [12:56] <StephaneD> ok, thanks for clearing this up
  189. # [12:57] <zcorpan> np
  190. # [12:57] <StephaneD> I'll back-read the whole thread if I find the time
  191. # [12:57] <StephaneD> boy is this list active!
  192. # [13:02] <hsivonen> hmm. looks like karl left already...
  193. # [13:26] <StephaneD> FWIW (irc log mainly) I've found this as a summary: http://esw.w3.org/topic/HTML/DocTypes02
  194. # [13:27] <hsivonen> StephaneD: Hixie wrote a summary message about this a while back and cced to www-archive
  195. # [13:27] * hsivonen tries to find it
  196. # [13:28] <StephaneD> thx
  197. # [13:29] <hsivonen> StephaneD: http://www.w3.org/mid/Pine.LNX.4.64.0706192049040.10651@dhalsim.dreamhost.com
  198. # [13:29] <hsivonen> StephaneD: in particular, please see the "see also" links
  199. # [13:29] <StephaneD> ok, thanks
  200. # [13:30] <StephaneD> (and here's another afternoon of work ruined trying to understand how to build a perfect world) ;)
  201. # [13:35] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  202. # [13:40] * Joins: gavin (gavin@74.103.208.221)
  203. # [13:43] * Joins: myakura (myakura@58.88.37.26)
  204. # [13:44] <StephaneD> hsivonen: very educational read, thank you
  205. # [13:45] <hsivonen> StephaneD: np
  206. # [13:49] <StephaneD> additionally Karl did a good job of summarizing here: http://www.w3.org/QA/2007/05/html_and_version_mechanisms.html
  207. # [13:56] * Joins: Sander (svl@80.60.87.115)
  208. # [14:04] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  209. # [14:05] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Client exited)
  210. # [14:05] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  211. # [14:06] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
  212. # [14:07] * billyjack is now known as MikeSmith
  213. # [14:34] * Quits: StephaneD (c1317c6b@128.30.52.23) (Quit: see you soon)
  214. # [15:08] * Joins: jdandrea (jdandrea@24.228.42.231)
  215. # [15:10] * Quits: jdandrea (jdandrea@24.228.42.231) (Quit: ciao)
  216. # [15:10] * Joins: jdandrea (jdandrea@24.228.42.231)
  217. # [15:17] * Quits: ROBOd (robod@86.34.246.154) (Client exited)
  218. # [15:41] * Joins: tH (Rob@87.102.18.111)
  219. # [15:43] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  220. # [15:44] * Quits: jdandrea (jdandrea@24.228.42.231) (Quit: ciao)
  221. # [15:46] * Joins: ROBOd (robod@86.34.246.154)
  222. # [15:48] * Joins: gavin (gavin@74.103.208.221)
  223. # [16:09] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  224. # [16:26] * Quits: Sander (svl@80.60.87.115) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  225. # [16:26] <zcorpan> seems like my detailed review of http://simon.html5.org/test/html/dom/interfaces/HTMLDocument/title/ will have to wait until tomorrow... (i haven't figured out how i want it to work yet)
  226. # [16:29] * Quits: tH (Rob@87.102.18.111) (Connection reset by peer)
  227. # [16:30] * Joins: tH (Rob@87.102.67.108)
  228. # [16:38] * Joins: billmason (billmason@69.30.57.156)
  229. # [17:31] * Quits: edas (edaspet@88.191.34.123) (Quit: http://eric.daspet.name/ et l'édition 2007 de http://www.paris-web.fr/ )
  230. # [17:42] * Parts: zcorpan (zcorpan@88.131.66.80)
  231. # [17:50] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  232. # [17:55] * Joins: gavin (gavin@74.103.208.221)
  233. # [18:14] * Joins: spleen_blender (notgonnage@72.16.243.238)
  234. # [18:57] * Joins: hasather (hasather@80.203.71.22)
  235. # [19:11] * Quits: mjs (mjs@64.81.48.145) (Client exited)
  236. # [19:13] * Joins: mjs (mjs@64.81.48.145)
  237. # [19:13] <Philip`> Tokenising the HTML5 spec (1.8MB): Python: 43 seconds Python + Psyco: 20 seconds Java: 0.25 seconds C++: 0.35 seconds
  238. # [19:13] <Philip`> Wait, where did my newlines go?
  239. # [19:13] <Philip`> Tokenising the HTML5 spec (1.8MB):
  240. # [19:13] <Philip`> Python: 43 seconds
  241. # [19:13] <Philip`> Python + Psyco: 20 seconds
  242. # [19:13] <Philip`> Java: 0.25 seconds
  243. # [19:13] <Philip`> C++: 0.35 seconds
  244. # [19:13] <Philip`> Tokenising ~2500 web pages stuck together (93MB):
  245. # [19:13] <Philip`> Java: 28 seconds
  246. # [19:14] <Philip`> C++: 19 seconds
  247. # [19:14] <Philip`> Python: I'm not even going to try
  248. # [19:14] <Philip`> (All were hooked up to just count the number of occurrences of tag names)
  249. # [19:15] <Philip`> (The C++ one is still a bit buggy since it doesn't do the input-stream stuff and doesn't handle non-numeric entities)
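
For readers who want to try a similar measurement, here is a rough sketch of a harness that counts tag-name occurrences and times the run. It is not any of the tokenisers being compared: it uses a crude regular expression instead of the HTML5 tokeniser states (so it will also count things that look like tags inside comments or scripts), and the function names are made up for the example.

```python
import re
import time
from collections import Counter

# Crude approximation of a start tag: "<" followed by a letter, then
# everything up to whitespace, "/" or ">". Not a real HTML5 tokeniser.
START_TAG = re.compile(r"<([A-Za-z][^\s/>]*)")


def count_tag_names(markup):
    """Count occurrences of (lowercased) start-tag names."""
    return Counter(m.group(1).lower() for m in START_TAG.finditer(markup))


def timed_count(markup):
    """Return (counts, elapsed_seconds) for a single pass over the input."""
    start = time.perf_counter()
    counts = count_tag_names(markup)
    return counts, time.perf_counter() - start
```

Calling timed_count several times on the same input and discarding the first runs would give a fairer comparison against a JIT-compiled implementation.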
  250. # [19:30] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  251. # [19:31] * Joins: Sander (svl@80.60.87.115)
  252. # [19:38] <hsivonen> Philip`: was that with a warm JVM?
  253. # [19:39] <hsivonen> anyway, pretty cool to beat C++ at something :-)
  254. # [19:40] <hsivonen> does the Python impl do all the encoding error stuff in input stream decoding appropriately pedantically?
  255. # [19:43] <hsivonen> Philip`: were they all reading a local file without explicit buffering to memory first?
  256. # [19:46] <Philip`> It was non-warm, only running the tokeniser once (though not measuring the JVM startup time), partly since I can't remember enough Java to make it read from the input stream more than once :-)
  257. # [19:47] <Philip`> (I'd assume the 93MB-one gives the JVM plenty of time to warm up, but it would be nice to repeat the tests multiple times)
  258. # [19:48] <Philip`> Java/C++ were reading from stdin, Python was buffering a file into a string first
  259. # [19:48] <Philip`> (I'd like to do these a bit more accurately, though I don't think anything is going to save Python...)
  260. # [19:49] * Joins: Zeros (Zeros-Elip@67.154.87.254)
  261. # [19:49] <hsivonen> btw, the way I do buffering and blocking is (so I think :-) optimized for InputStreams that return largish chunks on their own (like files). I have no idea how System.in behaves.
  262. # [19:50] <Philip`> I tried it with a BufferedInputStream around System.in but that didn't make any difference
  263. # [19:50] * Philip` tries it reading a file from disk instead
  264. # [19:50] <hsivonen> ok cool.
  265. # [19:52] * hsivonen has uncharitable thoughts about garbage markup inside tables that the tree builder has to deal with
  266. # [19:58] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  267. # [20:03] * Joins: gavin (gavin@74.103.208.221)
  268. # [20:05] <Philip`> HotSpot has quite visible effects
  269. # [20:07] <Philip`> If I do the ~2MB file lots of times, the server VM settles at around 0.12 seconds, and the client at about 0.17s
  270. # [20:09] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
  271. # [20:18] <hsivonen> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Ctable%3E%0A%0A%3C/
  272. # [20:18] <hsivonen> weird in Firefox
  273. # [20:22] * Quits: tH (Rob@87.102.67.108) (Ping timeout)
  274. # [20:24] <hsivonen> hmm. Opera doesn't do foster parenting in the DOM but renders content as if it did. foster parenting in the CSS box tree?
  275. # [20:28] * Joins: mjs (mjs@17.255.104.239)
  276. # [20:31] <Philip`> If I do the ~90MB file lots of times, the server VM actually gets slower - it's 27s for five or ten minutes, then 34s
  277. # [20:31] <Philip`> Maybe that's just because my CPU temperature gets up to 80'C after that much time...
  278. # [20:34] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cxmp%3E vs data:text/html,<xmp> - I guess that's just an artifact of document.write()
  279. # [20:42] <Philip`> Oh, C++ likes reading files instead of stdin
  280. # [20:43] <Philip`> After I repeat it long enough, reading from files into C++: 0.18 seconds for the small file, 11.5 seconds for the large file
  281. # [20:45] <Zeros> Philip`, sounds like the GC might be getting in your way
  282. # [20:49] * Joins: tH (Rob@87.102.67.108)
  283. # [20:55] <Philip`> Zeros: Is there a way to keep it out of my way?
  284. # [20:57] <Zeros> Might try changing which collector you're using, http://www.petefreitag.com/articles/gctuning/ talks about all the options you can give the jvm
  285. # [20:58] <hsivonen> Philip`: you could give the JVM so much memory that it doesn't run out of it before your test run finishes :-)
  286. # [20:59] <Philip`> I don't quite understand why the Java version has non-linear behaviour (it takes ~200 times as long for ~50 times as much input), since it shouldn't be having any more memory usage or more garbage when it's just a longer input/output stream
  287. # [20:59] <hsivonen> sure it has more garbage: more CharBuffers and more Strings
  288. # [21:00] <Philip`> More garbage than when doing a smaller file lots of times?
  289. # [21:01] <hsivonen> no
  290. # [21:01] <Zeros> More fragmentation I'd imagine
  291. # [21:02] <Zeros> While the JVM hasn't released a charbuffer that space isn't going to be reused and it'll have to alloc more space, which can be really slow. Play with the jvm settings.
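
The "~200 times as long for ~50 times as much input" estimate can be checked against the figures quoted earlier in the log. The sketch below assumes the 1.8MB and 93MB sizes, the repeated-run Java server VM timings (0.12s and 27s) and the C++ file-reading timings (0.18s and 11.5s); these were not all measured under identical conditions, so the ratios are only indicative.

```python
# Input sizes quoted in the log, in megabytes.
small_mb, large_mb = 1.8, 93.0

# Timings quoted in the log, in seconds (small file, large file).
java_small_s, java_large_s = 0.12, 27.0   # server VM, repeated runs
cpp_small_s, cpp_large_s = 0.18, 11.5     # C++ reading from files


def time_ratio(small_s, large_s):
    """How many times longer the large input took than the small one."""
    return large_s / small_s


size_ratio = large_mb / small_mb
java_ratio = time_ratio(java_small_s, java_large_s)
cpp_ratio = time_ratio(cpp_small_s, cpp_large_s)

# Linear behaviour would give a time ratio close to the size ratio.
print(f"size x{size_ratio:.0f}, Java x{java_ratio:.0f}, C++ x{cpp_ratio:.0f}")
```

On these numbers the C++ time ratio stays fairly close to the size ratio, while the Java ratio is several times larger, which is the non-linearity being discussed.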
  292. # [21:02] * Joins: dbaron (dbaron@63.245.220.242)
  293. # [21:04] <hsivonen> fwiw, a charbuffer wrapper object for a fixed char array is allocated every 2048 UTF-16 code units or more often (could be tweaked away by holding onto it). new strings are created for each tag and attribute name as well as attribute values
  294. # [21:04] <hsivonen> again, tag and attribute names provide an opportunity for optimization when I get around to adding a custom interning function (not gonna happen soon)
  295. # [21:06] <Zeros> Couldn't just use an enum?
  296. # [21:06] <Zeros> I guess that'd give you optimization in the valid case, and unknown attributes and tags would be slower
  297. # [21:07] <hsivonen> "the last table element in the stack of open elements has no parent, or its parent node is not an element"
  298. # [21:08] <hsivonen> how could it not be an element?
  299. # [21:08] * Joins: zcorpan (zcorpan@84.216.43.88)
  300. # [21:08] <hsivonen> the fragment case has an "html" sentinel anyway
  301. # [21:08] <zcorpan> DanC: good reply on the charset thing
  302. # [21:08] <hsivonen> Zeros: can't use enum for unknowns
  303. # [21:09] <hsivonen> Zeros: interned String is the best of both worlds
  304. # [21:09] <Zeros> yeah I suppose you're right
  305. # [21:09] <hsivonen> Zeros: however, I might add a magic bitfield anyway later on to make group checks fast
  306. # [21:09] <Zeros> nice
  307. # [21:10] <hsivonen> I consider a bitfield a premature optimization at this stage
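
A minimal sketch of the trade-off discussed above, written in Python rather than Java and using made-up names: a custom interning table gives one canonical string object per tag or attribute name (so unknown names still work, unlike an enum), and a separate bitfield lookup makes "is this name in group X" checks cheap. The group constants and the names assigned to them are purely hypothetical.

```python
# Canonical string objects, keyed by value. A custom table is used here to
# mirror the "custom interning function" idea; Python also has sys.intern.
_interned = {}


def intern_name(name):
    """Return the canonical string object for this name (known or unknown)."""
    return _interned.setdefault(name, name)


# Hypothetical group bits for fast membership checks.
GROUP_TABLE_PART = 0b01
GROUP_FORMATTING = 0b10

_GROUPS = {
    "tr": GROUP_TABLE_PART,
    "td": GROUP_TABLE_PART,
    "b": GROUP_FORMATTING,
    "i": GROUP_FORMATTING,
}


def in_group(name, mask):
    """True if the name belongs to any group in mask; unknown names have no bits."""
    return bool(_GROUPS.get(name, 0) & mask)
```

After interning, two occurrences of the same name can be compared by identity, and an unknown name such as "yeepee" simply gets its own entry with no group bits set.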
  308. # [21:10] <DanC> tx, zcorpan
  309. # [21:14] * Quits: mjs (mjs@17.255.104.239) (Connection reset by peer)
  310. # [21:18] * Joins: mjs (mjs@17.255.104.239)
  311. # [21:21] * Philip` tries to make his code slower by actually implementing all the bits properly
  312. # [21:26] * zcorpan notes that DOMTokenList.add &c raise an exception if the argument contains spaces
  313. # [21:27] <zcorpan> any language that wants to have classes and work nicely with the DOM APIs just cannot have spaces in the classes
  314. # [21:27] <zcorpan> so it seems pointless to support it with getElementsByClassName
  315. # [21:28] <Philip`> Can you put &nbsp;s in class names?
  316. # [21:29] <zcorpan> sure, but that's not a space character
  317. # [21:29] <zcorpan> http://www.whatwg.org/specs/web-apps/current-work/#space
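
The behaviour zcorpan notes can be illustrated with a toy token list. This is not the DOM interface itself: the class name, the choice of ValueError, and the exact set of space characters (space, tab, LF, FF, CR) are assumptions for the example. The one point taken from the log is that a no-break space (U+00A0) is not a space character, so it is accepted.

```python
# Assumption: the space characters are space, tab, LF, FF and CR.
# U+00A0 (no-break space) is deliberately not in this set.
SPACE_CHARACTERS = frozenset(" \t\n\f\r")


class TokenList:
    """Toy model of a space-separated token list such as a class attribute."""

    def __init__(self):
        self._tokens = []

    def add(self, token):
        # A token containing a space character could never round-trip
        # through a space-separated attribute value, so reject it.
        if any(c in SPACE_CHARACTERS for c in token):
            raise ValueError("token contains a space character")
        if token not in self._tokens:
            self._tokens.append(token)

    def __contains__(self, token):
        return token in self._tokens

    def __str__(self):
        return " ".join(self._tokens)
```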
  318. # [21:39] * Joins: hyatt (hyatt@17.203.14.191)
  319. # [21:48] * Quits: Sander (svl@80.60.87.115) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  320. # [21:58] <Philip`> hsivonen: It looks like you don't emit a parse error on </br/>
  321. # [21:59] <hsivonen> Philip`: hmm.
  322. # [22:00] <hsivonen> Philip`: forgot to check for end tagness
  323. # [22:00] <Philip`> Also it looks like you convert \r\n into \n\n
  324. # [22:00] <hsivonen> Philip`: thanks
  325. # [22:01] <hsivonen> I do?
  326. # [22:01] <hsivonen> that's bad
  327. # [22:02] <hsivonen> Philip`: fix checked in for the first bug
  328. # [22:04] <Philip`> From the code, it looks like if c=='\r' then you set c='\n' and later set prev=c, and then later test prev=='\r' except it's not '\r' any more
  329. # [22:04] <Philip`> unless I'm mistaking something
  330. # [22:05] <hsivonen> Philip`: fix checked in for the second bug, I think
  331. # [22:05] <hsivonen> yes, that was the bug
  332. # [22:05] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  333. # [22:06] * Philip` steals the eat-the-following-\n-instead-of-the-preceding-\r idea
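
The bug Philip` describes and the "eat the following \n" idea can be sketched as below. This is an illustrative reimplementation in Python, not the code from either parser. The key point is that the "previous character was CR" fact is kept in its own flag, so it survives the CR being rewritten to LF; the bug described above came from rewriting c to '\n' first and then testing prev == '\r'.

```python
def normalize_newlines(text):
    """Turn CR and CR LF into a single LF, leaving lone LFs untouched."""
    out = []
    prev_was_cr = False  # tracked separately from the (rewritten) character
    for c in text:
        if c == "\r":
            # Emit the newline now; a directly following LF will be eaten.
            out.append("\n")
            prev_was_cr = True
            continue
        if c == "\n" and prev_was_cr:
            # Second half of a CR LF pair: already emitted, so skip it.
            prev_was_cr = False
            continue
        prev_was_cr = False
        out.append(c)
    return "".join(out)
```

With this approach the newline is emitted as soon as the CR is seen, so the tokenizer never has to hold a character back while waiting to see what follows.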
  334. # [22:09] * Quits: mjs (mjs@17.255.104.239) (Ping timeout)
  335. # [22:10] * Joins: gavin (gavin@74.103.208.221)
  336. # [22:11] * Quits: zcorpan (zcorpan@84.216.43.88) (Ping timeout)
  337. # [22:15] <Philip`> html5lib doesn't work very well on "\r\r" or "\r\0"
  338. # [22:15] * Joins: mjs (mjs@17.255.104.239)
  339. # [22:17] <Philip`> The input-stream-preprocessing bit doesn't say when parse errors on \0 occur, which is incompatible with the html5lib test format putting ParseError in specific locations
  340. # [22:18] <hsivonen> I want it to occur in the tokenizer :-)
  341. # [22:18] <hsivonen> simply because I don't want the stream to do additional checking beyond character decoding in the stream
  342. # [22:19] <hsivonen> and the tokenizer has to look at each char anyway
  343. # [22:19] <Philip`> <!doc>xx\0 in html5lib gives the parse error before the comment token, whereas <!doc>xxx\0 gives it after the comment
  344. # [22:19] <hsivonen> not cool
  345. # [22:20] <Philip`> I can't see anything in spec saying what should happen in that case
  346. # [22:21] <hsivonen> I think the spec is being bad when it puts \0 in the stream instead of the tokenizer
  347. # [22:22] <hsivonen> or, it should put the check conceptually at the point when a character is read from the stream
  348. # [22:24] <Philip`> Conceptually you can read the next six characters from the stream after seeing a <!
  349. # [22:24] <Philip`> or you can not do so - it doesn't seem to indicate that one way is correct
  350. # [22:24] <Philip`> so "when a character is read from the stream" still seems insufficiently defined for this
  351. # [22:25] <hsivonen> right
  352. # [22:25] <hsivonen> my point is that I don't want to change what I am doing :-)
  353. # [22:28] <Philip`> Might it work if the spec said that "if the next n characters are ..." must always stop after reading the first character which does not match? (so it would read the "<!doc>" then stop, and the 0 wouldn't be read from the stream until later, though "<!doc\0" would still have the \0 parse error before the comment)
  354. # [22:29] <hsivonen> or perhaps this is something where we shouldn't care about error order between impls
  355. # [22:30] <hsivonen> I'm not going to report encoding errors in sync, either
  356. # [22:30] <hsivonen> => spec not being bad
  357. # [22:31] <hsivonen> http://2007.xtech.org/public/content/2007/06/12-summit-wrapup
  358. # [22:31] <hsivonen> notes me and Anne being marked as WHATWG reps
  359. # [22:33] <mjs> "Of course, arguments, particularly regarding <canvas> and accessibility, remain at the heart of the debate with no clear solutions in sight."
  360. # [22:33] <mjs> have we had that argument?
  361. # [22:33] <mjs> like, at all, let alone as the "heart of the debate"?
  362. # [22:33] <hsivonen> mjs: "canvas isn't accessible"
  363. # [22:34] <hsivonen> mjs: it lets you do visual things in a completely screen reader-unfriendly way
  364. # [22:34] <mjs> seriously though, I don't recall this being ever raised as a major objection to HTML5, let alone the top one
  365. # [22:34] <DanC> no, we have not had that argument.
  366. # [22:35] <DanC> I have tried to get people to make that argument in substance. (not very hard, but I've done a little prompting)
  367. # [22:35] <mjs> I do agree that <canvas> could be used to do something screen reader unfriendly, but I think that's true of any form of graphics
  368. # [22:36] <DanC> by the way, mjs, http://developer.apple.com/iphone/designingcontent.html rocks. it's great to see "just Do The Right Thing and it should mostly work" from a vendor.
  369. # [22:36] <mjs> DanC: well that's not exactly all it says, but thanks
  370. # [22:37] <DanC> "The first design rules for web applications on iPhone are to stick with web standards and follow established web design practices"
  371. # [22:37] <DanC> that's pretty much "Do The Right Thing". the rest is details ;-)
  372. # [22:38] <DanC> meanwhile, what W3C is putting out as mobile best practices seems like "please design for 1985 technology"
  373. # [22:38] <hsivonen> DanC++
  374. # [22:39] <DanC> I sent some comments on the W3C mobile best practices, and they do emphasize "one web" more as a result.
  375. # [22:41] <DanC> I'm not really an expert on deployment of mobile handset technology, so I don't have good arguments against "120 pixels, minimum." but I find it hard to believe that's really going to be a relevant target for very long.
  376. # [22:41] <mjs> well it mentions some nonstandard iphone-specific stuff, and the advice about media queries should be tweaked
  377. # [22:42] <mjs> but yes, it's mostly one-web focused
  378. # [22:42] <DanC> having some nonstandard stuff is no crime, as long as you're up front about the costs and benefits of using it, and as long as you don't put nonstandard stuff where standard stuff would obviously do the job
  379. # [22:43] * DanC wonders if we can just get rid of application/xhtml+xml
  380. # [22:46] <hsivonen> DanC: as in use application/xml or as in use text/html?
  381. # [22:46] <DanC> say... here's one that this HTML WG has discussed a bit recently... access keys... a good thing or bad thing? "Assign access keys to links in navigational menus and frequently accessed functionality."
  382. # [22:46] <DanC> use text/html
  383. # [22:46] <hsivonen> DanC: gotta have MathML and SVG there first :-)
  384. # [22:47] <DanC> sure. why not
  385. # [22:47] <mjs> without open-ended support for embedding other vocabularies, the XML serialization can always potentially do stuff that the HTML one can't
  386. # [22:48] <DanC> i'm happy using XML serialization in text/html
  387. # [22:48] <hsivonen> DanC: I'd rather we didn't open *that* can of worms
  388. # [22:48] <mjs> is it supposed to get parsed as HTML or as XML?
  389. # [22:49] <DanC> it's supposed to get parsed using HTML 5 rules which sorta erase the difference.
  390. # [22:49] <hsivonen> DanC: kinda big sorta
  391. # [22:50] <DanC> seems to be getting smaller all the time... with <br /> allowed and such
  392. # [22:50] <DanC> it could be that I'm just missing some critical clues.
  393. # [22:51] * Quits: ROBOd (robod@86.34.246.154) (Quit: http://www.robodesign.ro )
  394. # [22:51] <mjs> well, <div /> will do something different
  395. # [22:51] <hsivonen> PIs, CDATA sections, real />, namespaces, case folding
  396. # [22:52] <hsivonen> tag inference
  397. # [22:52] <Philip`> hsivonen: It seems it'd be a shame to not care about parse error order at all, since usually it's well-defined and easy to implement and helps ensure stuff is being done right. Maybe the tests could have an optional flag that indicates when error order doesn't matter (just for cases when the errors come asynchronously from the input-stream)?
  398. # [22:52] <hsivonen> Philip`: makes sense
  399. # [22:53] <DanC> yes, I'm prepared to live without <div />. prolly PIs too. an update of "appendix C" is fine.
  400. # [22:54] <hsivonen> DanC: are you prepared to live with no <ul> as child of <p> and no <tr> as child of <table>?
  401. # [22:54] <DanC> I won't miss CDATA sections, except maybe as a kludge to find an intersection between XML and <script> parsing.
  402. # [22:54] <hsivonen> anyway, there's legacy application/xhtml+xml content
  403. # [22:55] <DanC> I have lived this far with no ul as child of p. I don't see an issue with tr and table; why would I not get that?
  404. # [22:55] <hsivonen> if we don't define XHTML, it will happen ad hoc
  405. # [22:55] <hsivonen> DanC: the parsing algorithm doesn't allow either
  406. # [22:56] <DanC> the parsing algorithm allows willy-nilly testing of <b> and<i>, but not <tr> inside <table>? huh?
  407. # [22:56] <DanC> nesting
  408. # [22:56] <hsivonen> DanC: yes
  409. # [22:56] <hsivonen> DanC: backwards compat :-)
  410. # [22:56] <DanC> so if I write <table><tr><td>abc</td></tr></table> like I have for years, html5lib will crap out?
  411. # [22:57] <hsivonen> DanC: no. it'll do the same as HTML 4: treat it as <table><tbody><tr><td>abc</td></tr></tbody></table>
  412. # [22:57] <jgraham> DanC: http://james.html5.org/cgi-bin/parsetree/parsetree.py?source=<table><tr><td>abc<%2Ftd><%2Ftr><%2Ftable>
  413. # [22:58] <hsivonen> DanC: can't express trees without the tbody in text/html in standards mode
  414. # [22:58] <DanC> oh. that hasn't bothered me so far.
  415. # [22:58] <DanC> I guess I'd have to be careful with my XPaths
  416. # [22:58] <DanC> does CSS magically not notice?
  417. # [22:58] <hsivonen> afk
  418. # [22:59] <hsivonen> DanC: CSS notices. it is a gotcha for CSS authors
  419. # [22:59] <hsivonen> really afk
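
The XPath point above can be sketched in a few lines. This is a minimal illustration, assuming the text/html tree is the one hsivonen describes (with the implied tbody); the tbody is typed by hand here because Python's standard library has no HTML5 tree builder, and the names `as_xml`, `as_html` and `rows` are made up for the example.

```python
import xml.etree.ElementTree as ET

# The markup as typed, parsed as XML: no tbody is added.
as_xml = ET.fromstring("<table><tr><td>abc</td></tr></table>")

# The tree described for text/html, written out explicitly.
# (Assumption: the tbody is typed by hand; this is not an HTML parser.)
as_html = ET.fromstring(
    "<table><tbody><tr><td>abc</td></tr></tbody></table>"
)


def rows(table, path):
    """Return the tr elements matched by a path relative to the table."""
    return table.findall(path)


direct_xml = len(rows(as_xml, "tr"))         # child path on the XML tree
direct_html = len(rows(as_html, "tr"))       # same path on the text/html tree
via_tbody = len(rows(as_html, "tbody/tr"))   # path that names the tbody
any_depth = len(rows(as_html, ".//tr"))      # descendant path, works on both

print(direct_xml, direct_html, via_tbody, any_depth)
```

A direct-child path that works on the XML tree matches nothing on the text/html tree, while a descendant path matches in both. A CSS child selector such as `table > tr` would presumably run into the same gotcha.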
  420. # [23:01] * Joins: myakura (myakura@58.88.37.26)
  421. # [23:01] * jgraham notices Molly didn't record his attendance at the XTech browser summit thing
  422. # [23:16] * Joins: zcorpan (zcorpan@84.216.41.183)
  423. # [23:17] <zcorpan> "browser varations in treatment of XML namespaces with DOM-based work arounds using scripts" -- http://esw.w3.org/topic/HtmlTestMaterials
  424. # [23:17] * zcorpan doesn't get that
  425. # [23:17] <zcorpan> is that namespaces in text/html?
  426. # [23:18] <DanC> oh... wow... merry christmas to me... the other items in that list sprouted test cases
  427. # [23:18] <zcorpan> they are pretty trivial... :)
  428. # [23:19] <DanC> well, I think it'll be non-trivial to get the HTML WG to agree what the right answer is in those cases.
  429. # [23:19] <zcorpan> couldn't get anything interesting out of "behavior for multiple definitions of same ID value" though :(
  430. # [23:19] <zcorpan> but aiui browser vendors want that case undefined anyway
  431. # [23:19] <DanC> i.e. it'll be non-trivial for W3C to say something other than "that's out of scope of the standards"
  432. # [23:20] * DanC struggles to decode aiui ... "as I understand it"?
  433. # [23:20] <zcorpan> yeah
  434. # [23:20] <DanC> ok, ~= AFAIK
  435. # [23:21] <DanC> mjs seems to argue hard against any cases being undefined.
  436. # [23:21] * DanC gets kinda excited at the possibility that the HTML WG might make real tangible progress on a test suite before we get old
  437. # [23:21] <zcorpan> sure, but if handling of duplicate ids is defined then browsers can't do lazy evaluation, which is cheaper
  438. # [23:22] <DanC> if it's defined as "pick the 1st one" then they can do lazy eval, no?
  439. # [23:22] <zcorpan> not if the dom is changed
  440. # [23:23] <DanC> I'm not sure I follow, but I think what you have in mind sounds like an interesting test case. one that involves scripting and interactivity.
  441. # [23:24] <DanC> are we gonna have to specify concurrency in scripting events, I wonder?
  442. # [23:24] * DanC shudders
  443. # [23:24] * DanC is reminded of POSIX file system semantic standards horrors
  444. # [23:27] <zcorpan> dup ids: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Ctable%20id%3Dx%3E%3Ctr%3E%3Ctd%3E%3C/td%3E%3C/tr%3E%3Cp%20id%3Dx%3Ep%3C/table%3E%3Cscript%3Ew%28document.getElementById%28%22x%22%29.innerHTML%29%3C/script%3E
  445. # [23:28] <zcorpan> safari and firefox both put the P outside the table
  446. # [23:28] <zcorpan> safari uses the table, firefox uses the p
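
One way to see why "pick the 1st one" is not yet a full definition: in zcorpan's test case the P is moved out of the table, so "first" could mean first in the markup or first in the resulting tree. The toy model below assumes the P ends up in front of the table (that placement is an assumption, not something stated in the log), and it makes no claim about how Safari or Firefox actually implement the lookup.

```python
# Toy model only: each entry is (tag, id).
# Order of the start tags as they appear in the markup.
source_order = [("table", "x"), ("p", "x")]

# Assumed order in the DOM after the P is moved out in front of the table.
tree_order = [("p", "x"), ("table", "x")]


def first_with_id(elements, wanted):
    """Return the tag of the first element whose id equals wanted."""
    for tag, ident in elements:
        if ident == wanted:
            return tag
    return None


by_source = first_with_id(source_order, "x")
by_tree = first_with_id(tree_order, "x")
print(by_source, by_tree)
```

Under these assumptions the two readings of "first" select different elements, so a definition would have to say which order it means.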
  447. # [23:28] <gsnedders> DanC: as part of my review of the spec, I'm doing a 1:1 implementation of the algorithms that I'm reviewing, complete with test cases
  448. # [23:29] <zcorpan> gsnedders: which algorithms?
  449. # [23:29] <gsnedders> zcorpan: off the top of my head, common microsyntaxes and the parser
  450. # [23:29] <zcorpan> ok
  451. # [23:30] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  452. # [23:31] <DanC> gsnedders, where do you keep your code? is it published? bzr/hg/svn/cvs repo?
  453. # [23:31] <gsnedders> DanC: http://geoffers.no-ip.com/svn/php-html-5-lib
  454. # [23:31] * DanC wants to play with decentralized version control in building the HTML WG test suite
  455. # [23:31] <gsnedders> I think that is the correct URI
  456. # [23:31] <gsnedders> if not, just hit the authority and find the link
  457. # [23:32] * DanC gets a password prompt; wonders if that's by design
  458. # [23:32] <gsnedders> DanC: no
  459. # [23:32] <DanC> perhaps http://geoffers.no-ip.com/svn/php-html-5-direct/ ?
  460. # [23:33] <gsnedders> DanC: yes
  461. # [23:33] <DanC> ok, thanks
  462. # [23:34] <zcorpan> where is annevk btw? vacation?
  463. # [23:34] <gsnedders> it's too late to try and remember URLs off the top of my head
  464. # [23:35] <gsnedders> the number tests are all arranged so each test is run against each number algorithm
  465. # [23:35] <hasather> zcorpan: yea, I think so
  466. # [23:36] * zcorpan notes that http://html5.org/parsing-tests/testrunner.htm isn't the latest revision ( http://html5.googlecode.com/svn/trunk/parser-tests/testrunner.htm )
  467. # [23:37] <hasather> zcorpan: in Greece I think
  468. # [23:37] <zcorpan> hasather: ok
  469. # [23:37] <hsivonen> gsnedders: are you coordinating with Jero?
  470. # [23:37] <hsivonen> on the PHP impl?
  471. # [23:38] <Philip`> I like automatic test generation now - almost all the ones in http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/test3.test were constructed automatically from the tokeniser algorithm, and they cover every step in the algorithm
  472. # [23:38] <hsivonen> Philip`: cool
  473. # [23:39] <gsnedders> hsivonen: no, I've not had the time. I don't know if he ignored what I and another person had worked on previously or didn't know about it, but it is now outdated and needs in large parts to be redone
  474. # [23:39] <gsnedders> hsivonen: what I'm doing as part of the review will be far too slow to really be relevant though (though would be useful as a starting point)
  475. # [23:40] <hsivonen> gsnedders: how do you deal with Unicode in PHP?
  476. # [23:40] <gsnedders> hsivonen: horribly. just have to use UTF-8 strings.
  477. # [23:41] <gsnedders> hsivonen: and there's no easy way without relying on PHP extensions to do things at a character level, so it all has to be done at a byte level
  478. # [23:42] <zcorpan> so likely to not work with broken utf-8 sequences?
  479. # [23:42] <zcorpan> byte sequences
  480. # [23:43] <gsnedders> zcorpan: what I normally do just replaces any invalid sequences with a single U+FFFD character
  481. # [23:46] <zcorpan> 0xE5 0x3C
  482. # [23:47] <zcorpan> is that U+FFFD or U+FFFD followed by U+003C ?
  483. # [23:47] <hsivonen> gsnedders: I ported the Mozilla UTF-8 converter to PHP. there's a pure-PHP4 library on sf.net that uses it and provides other UTF-8 tools
  484. # [23:47] <gsnedders> zcorpan: the latter I expect?
  485. # [23:48] <zcorpan> gsnedders: it is in browsers, yeah.
  486. # [23:48] <gsnedders> hsivonen: But isn't it released under the same tri-license as Mozilla? And the pure PHP library is GPL, IIRC.
  487. # [23:48] <hsivonen> gsnedders: yeah, the port is under the tri-license. IIRC the lib is LGPL
  488. # [23:48] <gsnedders> zcorpan: I haven't touched that code in a while
  489. # [23:49] <gsnedders> zcorpan: it's U+FFD U+003C
  490. # [23:49] <gsnedders> *FFFD
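
The 0xE5 0x3C case can be checked with any decoder that substitutes U+FFFD for invalid input. The sketch below uses Python's built-in UTF-8 decoder with the `replace` error handler; it is not gsnedders's PHP code, only an illustration of the behaviour discussed.

```python
# 0xE5 starts a three-byte sequence, but 0x3C ("<") is not a continuation byte.
raw = bytes([0xE5, 0x3C])

# errors="replace" substitutes U+FFFD for the invalid part and resumes decoding.
decoded = raw.decode("utf-8", errors="replace")

code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)
```

The "<" survives as its own character instead of being swallowed by the broken lead byte.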
  491. # [23:49] <gsnedders> hsivonen: I'm planning on merging an HTML5 parser into a BSD licensed project next year, so I can't really use such a thing
  492. # [23:50] <hsivonen> who is responsible for the html5lib non-JSON test case format?
  493. # [23:50] <hsivonen> gsnedders: ok
  494. # [23:50] <hsivonen> can we make things easier and say that the substream is terminated by LF followed by #?
  495. # [23:51] <hsivonen> instead of LF followed by #errors
  496. # [23:52] <zcorpan> hsivonen: i think Hixie designed it
  497. # [23:53] <gsnedders> anyhow, see y'all tomorrow
  498. # [23:53] <hsivonen> let's see if I can get away with not looking beyond #
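
The difference between the two termination rules shows up only when the test input itself contains a line starting with "#". The sketch below assumes a test layout with a `#data` header followed by `#errors` and `#document` sections; `SAMPLE` and `data_until` are made up for the example and are not taken from html5lib.

```python
# Hypothetical test case whose input has a line that begins with "#".
SAMPLE = "#data\n<p>one\n#two</p>\n#errors\n#document\n| <p>\n"


def data_until(text, terminator):
    """Return the input after the '#data' header, up to the terminator."""
    header = "#data\n"
    if not text.startswith(header):
        raise ValueError("test case must start with #data")
    body = text[len(header):]
    end = body.find(terminator)
    return body if end == -1 else body[:end]


strict = data_until(SAMPLE, "\n#errors")  # current rule: LF followed by #errors
loose = data_until(SAMPLE, "\n#")         # proposed rule: LF followed by #

print(repr(strict))
print(repr(loose))
```

With the looser rule the input is cut at the first "#" line, so the simplification is safe only if no test input contains such a line.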
  499. # [23:53] <zcorpan> gsnedders: cya
  500. # [23:57] <jgraham> hsivonen: Hixie sort of. Although I implemented the parser we use so I guess I can bear some of the responsibility...
  501. # [23:58] * zcorpan implemented a "parser" in JS too
  502. # [23:59] <zcorpan> which is really just naïve split()s :)
  503. # [23:59] <Philip`> Ooh, I wonder if I could port my tokeniser to JS...
  504. # Session Close: Wed Jul 11 00:00:00 2007
