  1. # Session Start: Thu Jul 05 00:00:00 2007
  2. # Session Ident: #whatwg
  3. # [00:10] * Quits: tndH (i=Rob@83.100.252.160) ("ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508]")
  4. # [00:29] * Joins: MikeSmith (n=MikeSmit@eM60-254-213-126.pool.emobile.ad.jp)
  5. # [00:46] * Quits: hendry (n=hendry@91.84.62.62) ("sleep")
  6. # [00:56] * Joins: tantek (n=tantek@m810f36d0.tmodns.net)
  7. # [01:10] * Quits: tantek (n=tantek@m810f36d0.tmodns.net)
  8. # [01:18] * Quits: duryodhan (n=chatzill@221-128-173-162.static.exatt.net) (Read error: 110 (Connection timed out))
  9. # [01:32] * moeffju is now known as moeffju[ZzZz]
  10. # [01:45] * Joins: tantek (n=tantek@c-24-6-138-86.hsd1.ca.comcast.net)
  11. # [02:02] * Joins: karlUshi (n=karl@dhcp-247-173.mag.keio.ac.jp)
  12. # [02:02] * Quits: tantek (n=tantek@c-24-6-138-86.hsd1.ca.comcast.net)
  13. # [02:02] * Quits: bzed (n=bzed@dslb-084-059-118-233.pools.arcor-ip.net) ("Leaving")
  14. # [02:47] * Parts: zcorpan_ (n=zcorpan@84-216-43-119.sprayadsl.telenor.se)
  15. # [02:47] * Quits: the_mart (n=Martin@host86-135-9-158.range86-135.btcentralplus.com) ("Leaving")
  16. # [02:56] * Joins: kfish (n=conrad@61.194.21.25)
  17. # [02:59] <Philip`> Does http://canvex.lazyilluminati.com/misc/imagedata.html crash Opera 9.5? (I can only test via Opera Mini, which just says "Internal server error", which sounds potentially worrying but not very informative)
  18. # [02:59] * Quits: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca) (Read error: 110 (Connection timed out))
  19. # [03:01] * Quits: MikeSmith (n=MikeSmit@eM60-254-213-126.pool.emobile.ad.jp) (Read error: 104 (Connection reset by peer))
  20. # [03:02] <othermaciej> does Opera Mini handle events?
  21. # [03:03] <othermaciej> and scripting?
  22. # [03:04] <Philip`> It seems to, as long as you don't use setInterval and don't expect it to wait for distant timeouts
  23. # [03:05] <Philip`> (i.e. it can handle scripting and events and stuff while the page is loading, for some definition of 'loading' that I haven't quite worked out, though then it justs sends a static copy to your phone)
  24. # [03:05] <Philip`> *just
  25. # [03:06] * Quits: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au) ("ChatZilla 0.9.78.1 [Firefox 2.0.0.4/2007051502]")
  26. # [03:06] * Joins: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au)
  27. # [03:07] <othermaciej> so script runs at load time but not afterwards?
  28. # [03:08] <Philip`> Yes (as far as I can tell)
  29. # [03:09] <othermaciej> (I'm playing with the Opera Mini simulator)
  30. # [03:09] <Philip`> (since it basically opens the page in Opera on their servers, then at some point it decides it's got enough and transmits a non-interactive compressed snapshot, I think)
  31. # [03:09] <Philip`> (Me too, since my real phone is far too rubbish :-) )
  32. # [03:11] <Philip`> I got it to run ~100 canvas tests in iframes on a single page, and that (eventually) worked correctly with all the scripting and loading and stuff, but it wouldn't let me correctly press the buttons to submit the test results, so I had to do that via a hard-coded timer :-(
  33. # [03:23] * Joins: MikeSmith (n=MikeSmit@eM60-254-197-237.pool.emobile.ad.jp)
  34. # [03:25] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
  35. # [03:34] * Joins: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
  36. # [03:40] * Joins: yod (n=ot@dhcp-247-181.mag.keio.ac.jp)
  37. # [03:50] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  38. # [04:04] * othermaciej is now known as om_out
  39. # [04:09] * Quits: kfish (n=conrad@61.194.21.25) ("同志社")
  40. # [04:15] * Quits: MikeSmith (n=MikeSmit@eM60-254-197-237.pool.emobile.ad.jp) (Read error: 110 (Connection timed out))
  41. # [04:21] * Joins: MikeSmith (n=MikeSmit@eM60-254-214-154.pool.emobile.ad.jp)
  42. # [04:22] * Quits: MikeSmith (n=MikeSmit@eM60-254-214-154.pool.emobile.ad.jp) (Read error: 104 (Connection reset by peer))
  43. # [04:26] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  44. # [04:26] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Remote closed the connection)
  45. # [04:26] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  46. # [04:26] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Remote closed the connection)
  47. # [04:27] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  48. # [04:27] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Remote closed the connection)
  49. # [04:35] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  50. # [04:35] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Remote closed the connection)
  51. # [04:46] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  52. # [04:48] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Client Quit)
  53. # [04:48] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  54. # [05:01] * Quits: mpt (n=mpt@121-72-128-43.dsl.telstraclear.net) ("Leaving")
  55. # [05:28] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  56. # [05:30] * aroben is now known as aroben|food
  57. # [05:31] * Quits: aroben|food (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  58. # [05:53] * Joins: MikeSmith (n=MikeSmit@eM60-254-215-244.pool.emobile.ad.jp)
  59. # [05:53] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  60. # [05:58] * Quits: weinig (i=weinig@nat/apple/x-88c022b759e253c0)
  61. # [06:14] * Joins: mpt (n=mpt@121-72-128-43.dsl.telstraclear.net)
  62. # [06:19] <mpt> "For example, don’t put a 100 x 100 image in a 10 x 10 <image> element." -- unintentionally hilarious iPhone developer docs
  63. # [06:20] * Joins: wild_cfo (n=wild_c_f@ool-44c1bb48.dyn.optonline.net)
  64. # [06:27] * Joins: weinig (n=weinig@c-67-188-89-242.hsd1.ca.comcast.net)
  65. # [06:29] <mpt> Ah, interesting: "ensure that width * height * 4 < 8 MB" ... so apparently this <image> element is for some new kind of file that has widths and heights measured in MBm⁻².
  66. # [06:37] <mpt> But hooray for this: "Don’t use JavaScript movie controls to play video on iPhone. iPhone supplies its own controls."
  67. # [06:54] * Quits: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au) (kubrick.freenode.net irc.freenode.net)
  68. # [06:54] * Quits: annevk (n=annevk@pat-tdc.opera.com) (kubrick.freenode.net irc.freenode.net)
  69. # [06:54] * Quits: Philip` (n=philip@zaynar.demon.co.uk) (kubrick.freenode.net irc.freenode.net)
  70. # [07:03] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  71. # [07:08] * Joins: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au)
  72. # [07:08] * Joins: annevk (n=annevk@pat-tdc.opera.com)
  73. # [07:08] * Joins: Philip` (n=philip@zaynar.demon.co.uk)
  74. # [07:29] * Joins: duryodhan (n=chatzill@221.128.138.137)
  75. # [07:36] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Remote closed the connection)
  76. # [07:36] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  77. # [08:08] * Joins: hendry (n=hendry@91.84.62.62)
  78. # [08:15] * Quits: hendry (n=hendry@91.84.62.62) ("wrongkernel")
  79. # [08:32] * Joins: hendry (n=hendry@91.84.62.62)
  80. # [08:39] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  81. # [08:56] * Quits: weinig (n=weinig@c-67-188-89-242.hsd1.ca.comcast.net)
  82. # [08:59] <om_out> mpt: width * height * 4 bytes
  83. # [08:59] * om_out is now known as othermaciej
  84. # [09:03] * Joins: Ducki (n=Alex@dialin-145-254-189-142.pools.arcor-ip.net)
  85. # [09:07] <hsivonen> Hixie: http://www.w3.org/mid/A0F10D3A-A679-4BB1-8844-684FBFDB94F6@iki.fi is there a way for the stack to have td or th in such a position that generating implied end tags could close the scope (except for the EOF case)?
  86. # [09:19] * Joins: tndH (i=Rob@83.100.252.160)
  87. # [09:20] * Joins: webben (n=benh@dip5-fw.corp.ukl.yahoo.com)
  88. # [09:24] <annevk> hehe, iPhone docs promote <image> :)
  89. # [09:24] <hsivonen> annevk: URL?
  90. # [09:24] <annevk> http://developer.apple.com/iphone/designingcontent.html
  91. # [09:25] <annevk> click on "Use Standards and Tried-and-True Design Practices" and then search
  92. # [09:27] <othermaciej> I'll report a bug
  93. # [09:31] <hsivonen> annevk: did you try to optimize redundant steps in tree building at all or did you just follow the spec to letter even if it asked you to traverse the stack more than absolutely necessary?
  94. # [09:32] <annevk> there are some small optimizations
  95. # [09:32] <annevk> but not much
  96. # [09:32] <annevk> doesn't really matter a lot in Python I've the feeling
  97. # [09:33] * Quits: karlUshi (n=karl@dhcp-247-173.mag.keio.ac.jp) ("Where dwelt Ymir, or wherein did he find sustenance?")
  98. # [09:33] <annevk> well, in the beginning we tried to reduce function calls by using dictionaries instead of token objects and such and that worked pretty well
  99. # [09:33] <hsivonen> annevk: what's your take on the ability of "generate end tags" to close the scope?
  100. # [09:33] <annevk> but now with the treebuilder abstraction we gained a lot of function calls again :(
  101. # [09:33] * Quits: yod (n=ot@dhcp-247-181.mag.keio.ac.jp) ("This computer has gone to sleep")
  102. # [09:34] <annevk> http://html5lib.googlecode.com/svn/trunk/python/src/html5lib/treebuilders/_base.py search for "generateImpliedEndTags"
  103. # [09:35] <annevk> although I now see it has some XXX comment that we never hit apparently...
  104. # [09:35] <hsivonen> annevk: I was thinking of doing the exact same thing: just popping
  105. # [09:35] <hsivonen> I guess I have to send another email
  106. # [09:36] <annevk> Hixie recently added a bunch of table elements there
  107. # [09:36] <annevk> I'm not sure what that was about
  108. # [09:37] <hsivonen> annevk: I think that was about EOF
  109. # [09:37] <hsivonen> I am not sure that it is a good idea to put them in that part of the spec
  110. # [09:37] <hsivonen> annevk: does Python turn tail recursion into looping?
  111. # [09:38] <annevk> dunno
  112. # [09:38] * Quits: webben (n=benh@dip5-fw.corp.ukl.yahoo.com)
  113. # [09:38] <annevk> http://html5.org/tools/web-apps-tracker?from=964&to=965
  114. # [09:39] <annevk> is that for <table><tbody><tr><td><p><tbody> or something?
  115. # [09:40] <annevk> doesn't seem like it, that already works
  116. # [09:41] <hsivonen> the only case where I see those mattering is the EOF case
  117. # [09:41] <annevk> example markup?
  118. # [09:42] * annevk reads http://en.wikipedia.org/wiki/Tail_recursion and understands we might be able to optimize stuff a bit
  119. # [09:44] <annevk> hmm, seems only to matter if it calls itself a lot
  120. # [09:46] <annevk> hsivonen, I don't see how it matters for EOF either
  121. # [09:46] <annevk> hsivonen, you always get a single error and that can't be avoided, because </table> is never implied
  122. # [09:47] <hsivonen> annevk: good point. will you send email or shall I?
  123. # [09:48] <annevk> you're already going pretty good with your review, you do it ;)
  124. # [09:49] <hsivonen> annevk: ok
  125. # [09:52] * Joins: met_ (n=Hassman@r5bx220.net.upc.cz)
  126. # [09:53] <met_> http://www.bluishcoder.co.nz/2007/07/patch-for-video-element-support-in.html
  127. # [09:55] <Hixie> hsivonen: i don't know (re <td>s)
  128. # [09:56] <hsivonen> Hixie: that doesn't sound good ;-)
  129. # [09:57] <Hixie> the table elements were added because it seemed wrong that they not be on the list
  130. # [09:57] <Hixie> i honestly don't know if they'll ever get hit
  131. # [09:57] <Hixie> i want to say no
  132. # [09:57] <Hixie> but i'm not sure how to prove it
  133. # [09:58] <Hixie> i'll be back in about 12 hours
  134. # [09:58] <Hixie> (and possibly briefly in a few minutes)
  135. # [09:58] <hsivonen> Hixie: I'd prefer to pretend that we proved that they never get hit
  136. # [10:00] <annevk> <tbody> gets ignored outside <table>, inside <table> it is handled explicitly in each table phase
  137. # [10:00] <annevk> I wonder if the same goes for <td> and <tr>
  138. # [10:01] <annevk> I'm pretty sure they never get hit either
  139. # [10:01] <annevk> let's test that with the tests we got...
  140. # [10:02] <hsivonen> annevk: tr, td and th start tags are ignored "in body"
  141. # [10:02] <annevk> indeed
  142. # [10:02] <annevk> if I remove "td", "th", "tr" from our generate implied end tags algorithm nothing goes wrong
  143. # [10:03] <annevk> because the table phases already deal with them
  144. # [10:03] <hsivonen> annevk: the end tags seem to fall under "An end tag token not covered by the previous entries", but that seems wrong
  145. # [10:03] <annevk> only "dd", "dt", "li", "p" are important
  146. # [10:03] <annevk> actually, if I remove "p" nothing fails either...
  147. # [10:03] * annevk ponders
  148. # [10:04] <hsivonen> annevk: removing p seems wrong
  149. # [10:04] <hsivonen> hmm. perhaps the An end tag token not covered by the previous entries
  150. # [10:04] <hsivonen> still does the right thing "in body" for cell ends
  151. # [10:04] <annevk> ah, the problem is that we don't count errors I suppose
  152. # [10:05] <annevk> as removing <li> also "works"
  153. # [10:05] <annevk> they are caught by the alternative algorithm that generates parse errors and therefore still generate the same tree...
  154. # [10:06] <hsivonen> IIRC, in fragment cases some "act as if" consistently produce 0 or 2 errors. I think I may have changed some of those to emit 0 or 1 errors
  155. # [10:17] * Joins: Charl (n=charlvn@c1-228-9.wblv.isadsl.co.za)
  156. # [10:28] <annevk> how does "If the stack of open elements has a p element in scope, then generate implied end tags, except for p elements." even make sense?
  157. # [10:28] <annevk> it says that when you encounter </p>
  158. # [10:29] <annevk> however, you will never generate an implied end tag for <dd>, <dt> or <li> or any of the table cells as they can never be between the <p> that is in scope and the current node
  159. # [10:37] <annevk> innerHTML wouldn't change anything for that either
  160. # [10:48] <hsivonen> annevk: excellent point
  161. # [10:49] <hsivonen> annevk: I'll email again.
  162. # [10:59] * Joins: Ducki_ (n=Alex@dialin-212-144-055-153.pools.arcor-ip.net)
  163. # [11:04] * Joins: BenWard (i=BenWard@nat/yahoo/x-4b53abbbd5c94177)
  164. # [11:08] <hsivonen> should the list of active formatting elements be implemented as an array or as a linked list?
  165. # [11:09] <hsivonen> is it searched much more often than a node is removed from the middle?
  166. # [11:13] <hsivonen> Hixie: was your stat for "invocations of the AAA" exactly this? (that is, is the answer array?)
  167. # [11:14] <hsivonen> oh that counted cloning nodes
  168. # [11:14] <hsivonen> Hixie: did you count changing the size of the list by deleting stuff in the middle?
  169. # [11:17] * Joins: zcorpan_ (n=zcorpan@84-216-41-39.sprayadsl.telenor.se)
  170. # [11:18] * Quits: Ducki (n=Alex@dialin-145-254-189-142.pools.arcor-ip.net) (Read error: 113 (No route to host))
  171. # [11:20] <hsivonen> annevk: does the algorithm for "in body" "An end tag token not covered by the previous entries" make sense to you?
  172. # [11:20] <hsivonen> step 2.3. makes no sense to me
  173. # [11:22] <annevk> what's 2.3?
  174. # [11:22] <hsivonen> Pop all the nodes from the current node up to node, including node, then stop this algorithm.
  175. # [11:23] <hsivonen> First: Initialise node to be the current node (the bottommost node of the stack).
  176. # [11:23] <hsivonen> ok makes sense
  177. # [11:23] <hsivonen> #
  178. # [11:23] <hsivonen> If node has the same tag name as the end tag token, then:
  179. # [11:23] <hsivonen> #
  180. # [11:23] <hsivonen> Generate implied end tags.
  181. # [11:23] <hsivonen> ok, makes sense
  182. # [11:23] <hsivonen> now Pop all the nodes from the current node up to node, including node, then stop this algorithm.
  183. # [11:23] <annevk> oh, I was looking at the wrong algorithm duh
  184. # [11:24] <hsivonen> how could /node/ not already be popped or be the current node?
  185. # [11:24] <hsivonen> shouldn't that be a simple unconditional pop
  186. # [11:25] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
  187. # [11:25] <hsivonen> umm. not unconditional but pop if the current node is /node/
  188. # [11:25] <annevk> <foo><bar><baz></foo>
  189. # [11:26] <annevk> would pop <baz> and <bar> and <foo>
  190. # [11:26] * Joins: maikmerten (n=maikmert@T63c3.t.pppool.de)
  191. # [11:26] <hsivonen> annevk: sorry for being dense, but I don't understand what step 2.3. has to do with it
  192. # [11:27] <hsivonen> annevk: isn't step 4. what causes that?
  193. # [11:27] <hsivonen> actually, step 2.1. makes no sense to me, either
  194. # [11:27] <annevk> indeed
  195. # [11:28] <annevk> I wonder how we managed to implement it :)
  196. # [11:29] <hsivonen> time to send mail again
  197. # [11:29] <annevk> we implemented what was mentioned
  198. # [11:30] <annevk> which doesn't make much sense :(
  199. # [11:30] <zcorpan_> can you provide a markup snippet that highlights the difference?
  200. # [11:31] <hsivonen> zcorpan_: the difference?
  201. # [11:31] <annevk> <foo>...</foo> is the only case that 2.1 covers
  202. # [11:31] <annevk> in which case you don't need to generate implied end tags etc.
  203. # [11:31] <annevk> you just need to pop
  204. # [11:31] <zcorpan_> ah
  205. # [11:31] <zcorpan_> indeed
  206. # [11:31] <hsivonen> lunch
  207. # [11:31] <hsivonen> then email
  208. # [11:58] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
  209. # [12:00] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  210. # [12:01] <annevk> I think I'm done with public-html for the day
  211. # [12:05] * Joins: ROBOd (n=robod@86.34.246.154)
  212. # [12:06] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
  213. # [12:07] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  214. # [12:10] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) (Client Quit)
  215. # [12:11] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  216. # [12:11] <hsivonen> annevk: did you see my email about the catch-all end tag case, though? did it make sense?
  217. # [12:13] * Quits: MikeSmith (n=MikeSmit@eM60-254-215-244.pool.emobile.ad.jp) (Read error: 110 (Connection timed out))
  218. # [12:20] <annevk> yes
  219. # [12:23] <hsivonen> ok. thanks.
  220. # [12:43] <annevk> having said that, I'm not sure the algorithm is correct
  221. # [12:43] <annevk> oh wait
  222. # [12:44] <annevk> hsivonen, it does make sense
  223. # [12:44] * annevk just realized
  224. # [12:44] <annevk> hsivonen, because of step 5
  225. # [12:44] <annevk> hsivonen, and step 4
  226. # [12:44] <annevk> hsivonen, they change "node"
  227. # [12:45] <annevk> so say you have <dialog><dd></dialog>
  228. # [12:45] <annevk> you get to 4
  229. # [12:46] <annevk> node becomes <dialog>
  230. # [12:46] <annevk> </dd> is implied
  231. # [12:46] <annevk> done
  232. # [12:46] <annevk> however, it's questionable whether this is correct given that current UAs don't generate implied end tags in those cases...
  233. # [12:50] <hsivonen> annevk: well, this certainly looks like something that needs another look by Hixie
  234. # [12:53] <annevk> it seems that for <foo> </foo> it doesn't make much sense
  235. # [12:54] <annevk> well, it seems that you can optimize for <foo> </foo>
  236. # [12:54] <annevk> it does make sense in a twisted way
  237. # [12:56] <hsivonen> annevk: looks like you aren't done for the day after all :-/
  238. # [12:58] * Quits: hendry (n=hendry@91.84.62.62) ("leaving")
  239. # [12:59] * Quits: Ducki_ (n=Alex@dialin-212-144-055-153.pools.arcor-ip.net) (Read error: 104 (Connection reset by peer))
  240. # [12:59] * Joins: Ducki_ (n=Alex@dialin-145-254-186-023.pools.arcor-ip.net)
  241. # [13:12] <hsivonen> I'd like to try to avoid ad hominems, but I'm intrigued that the insistence on a small improvement with great cost comes from an economist
  242. # [13:21] * Joins: MikeSmith (n=MikeSmit@eM60-254-212-208.pool.emobile.ad.jp)
  243. # [13:25] <annevk> that discussion is just painful
  244. # [13:35] <zcorpan_> authors provide fallback to <object>?
  245. # [13:36] * zcorpan_ won't join that discussion
  246. # [13:38] * moeffju[ZzZz] is now known as moeffju
  247. # [13:41] <annevk> hsivonen, yeah :-/
  248. # [13:41] <annevk> these people should join some browser development project and learn about the web a little bit
  249. # [13:54] <zcorpan_> annevk: did you check in the parser-tests thing somewhere?
  250. # [13:54] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) (Read error: 104 (Connection reset by peer))
  251. # [13:54] * Joins: yod (n=ot@softbank221018155222.bbtec.net)
  252. # [13:55] * Quits: yod (n=ot@softbank221018155222.bbtec.net) (Remote closed the connection)
  253. # [13:55] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  254. # [13:55] * Joins: yod (n=ot@softbank221018155222.bbtec.net)
  255. # [13:55] * Quits: yod (n=ot@softbank221018155222.bbtec.net) (Remote closed the connection)
  256. # [13:56] <annevk> not yet
  257. # [13:56] * Joins: yod (n=ot@softbank221018155222.bbtec.net)
  258. # [13:56] * annevk was fixing html5lib
  259. # [13:56] <zcorpan_> ok
  260. # [13:56] <annevk> you want it checked in somewhere?
  261. # [13:57] <zcorpan_> would be nice, in case i feel like improving it
  262. # [13:58] <zcorpan_> no rush though
  263. # [14:01] * Joins: karlUshi (n=karl@124-144-94-188.rev.home.ne.jp)
  264. # [14:02] <annevk> it's in the html5 project now
  265. # [14:02] <annevk> including a README that says to modify the tests from html5lib, not the ones included
  266. # [14:02] <annevk> karlUshi, seen http://html5.org/parsing-tests/testrunner.htm already?
  267. # [14:02] <annevk> karlUshi, you might like it
  268. # [14:03] * Philip` wonders if anyone really cares what input like &#4294967366; gets parsed into
  269. # [14:04] <annevk> FFFD
  270. # [14:04] <annevk> U+FFFD
  271. # [14:04] <Philip`> Is it worth having tests for that kind of thing? (Or are there ones already?)
  272. # [14:04] <Philip`> (Firefox gets it wrong and says "F")
  273. # [14:05] <Lachy> I wonder why it does that
  274. # [14:05] <annevk> maybe a limit
  275. # [14:05] <Philip`> (and so does my non-serious not-really-implemented tokeniser)
  276. # [14:05] <annevk> we have tokenizer tests
  277. # [14:06] <Philip`> Probably by doing "int n; ... n = n*10 + (next_char - '0')" or something and not caring about overflow
  278. # [14:06] <Lachy> looks like it's a limit of 1 0000 0000 base 16
  279. # [14:06] <annevk> Opera and IE get it right
  280. # [14:07] <Philip`> FF also parses &#4294967295; into #4294967295;
  281. # [14:08] <annevk> oops
  282. # [14:08] * Philip` doesn't expect this is a likely place for real-world interoperability concerns
  283. # [14:08] <annevk> I suppose that explains how much time reverse engineering costs and that it isn't really worth checking what other browsers do all the time
  284. # [14:09] <hsivonen> if there's anything long about longdesc, it is the email threads
  285. # [14:09] <annevk> :p
  286. # [14:10] <hsivonen> Philip`: that's why you should have an integer overflow guard in your loop that consumes NCRs
  287. # [14:10] * hsivonen has one
  288. # [14:10] <Philip`> I just have a TODO comment stuck in there :-)
  289. # [14:10] <Philip`> and I have another similar comment telling me to implement the non-numeric entity things too
  290. # [14:10] <hsivonen> Philip`: which programming language?
  291. # [14:11] <hsivonen> Philip`: Ocaml?
  292. # [14:11] <Philip`> but I'm not particularly interested in making things actually work at the moment
  293. # [14:11] <Philip`> OCaml generating C++
  294. # [14:11] <hsivonen> cool
  295. # [14:11] <Philip`> (Also OCaml generating .dot files so I can make nice graphs of the tokeniser state transitions)
  296. # [14:11] <annevk> we solved it by having a try statement around the string to int conversion
  297. # [14:12] <hsivonen> if (value < 0) {
  298. # [14:12] <hsivonen> value = 0x110000; // Value above Unicode range but within int
  299. # [14:12] <hsivonen> // range
  300. # [14:12] <hsivonen> }
  301. # [14:13] * Philip` just wants to see what's possible when you have the tokeniser algorithm as a data structure that you can process, instead of being English text or unprocessable program code
  302. # [14:13] * Quits: MikeSmith (n=MikeSmit@eM60-254-212-208.pool.emobile.ad.jp) (Read error: 104 (Connection reset by peer))
  303. # [14:13] <hsivonen> (value is signed)
  304. # [14:18] <annevk> Philip`, will you consider implementing all the other fancy stuff as well?
  305. # [14:18] <annevk> or just tokenizing?
  306. # [14:22] <Philip`> That depends on how impossible the rest of it looks :-)
  307. # [14:23] <annevk> by the time Hixie addresses hsivonen's comments nobody will have to think about it anymore :p
  308. # [14:23] <Philip`> The tokeniser is fairly straightforward, since you can just represent the whole thing as a dozen state variables and some functions that match certain states and have transitions into new states
  309. # [14:23] <annevk> now I think of it, that might make it too boring for some!
  310. # [14:24] <Philip`> (The tree construction looks more complex than that, though I haven't looked at it in any detail)
  311. # [14:24] <annevk> tree construction is actually similar
  312. # [14:24] <annevk> although currently it has this concept called insertion mode which makes it look more complicated
  313. # [14:24] <annevk> you can actually implement it as a bunch of states as well
  314. # [14:25] <annevk> the difference being that you have some other set of variables and pass tokens around instead of characters
  315. # [14:26] <Philip`> Would I be right in thinking the only way the content model flag can change outside the tokeniser is when explicitly emitting a start tag?
  316. # [14:27] <annevk> yeah
  317. # [14:27] <annevk> hsivonen, removing "td", "th" and "tr" from generate implied end tags does indeed not give any parse error differences
  318. # [14:28] <annevk> hsivonen, removing "p", however, gives 45
  319. # [14:29] <hsivonen> Philip`: it's just that start tags "in body" have a lot of stuff to type
  320. # [14:30] * annevk is amazed at Robert's ability to not understand
  321. # [14:36] * Joins: MikeSmith (n=MikeSmit@eM60-254-202-189.pool.emobile.ad.jp)
  322. # [14:36] * Philip` reaches the bogus comment state, and finds that it totally doesn't match his way of writing the algorithm
  323. # [14:37] <annevk> markup open declaration did?
  324. # [14:38] <annevk> you should be able to implement those as functions I guess; separate from the states
  325. # [14:38] <Philip`> The problem is that it sounds like it needs to look backwards and know what happened before that state was reached
  326. # [14:39] <Philip`> The markup declaration open state is just after the bogus comment state, so I haven't got that far yet :-)
  327. # [14:41] <annevk> don't you have a character queue or something?
  328. # [14:42] <annevk> then you just make sure the right chars are on the stack before switching to the state
  329. # [14:44] <hsivonen> Philip`: you may find my impl useful to look at
  330. # [14:49] <annevk> zcorpan_, in case you missed it: http://html5.googlecode.com/svn/trunk/parser-tests/
  331. # [14:51] * Quits: maikmerten (n=maikmert@T63c3.t.pppool.de) (Read error: 110 (Connection timed out))
  332. # [14:51] * Joins: maikmerten (n=maikmert@T72ea.t.pppool.de)
  333. # [14:52] <zcorpan_> annevk: saw it, cheers
  334. # [14:53] <Philip`> Oh, I think my confusion comes from e.g. "<?" transitioning to the bogus comment state after consuming the '?', whereas "<!x" transitions before consuming the 'x', and the BCS can't tell the difference
  335. # [14:54] <annevk> doesn't it say "unconsume" somewhere?
  336. # [14:56] <Philip`> Not that I can see
  337. # [14:56] <Philip`> but I can work around it by just moving the consumption around to the right places
  338. # [14:58] <hsivonen> Philip`: I think Hixie cut corners when writing the spec. I had a bug there that the unit tests revealed
  339. # [14:58] <hsivonen> Philip`: basically, you need to start filling the bogus comment buffer before you make the actual state transition
  340. # [14:59] * Joins: Ducki__ (n=Alex@dialin-145-254-180-253.pools.arcor-ip.net)
  341. # [14:59] * Quits: karlUshi (n=karl@124-144-94-188.rev.home.ne.jp) ("Where dwelt Ymir, or wherein did he find sustenance?")
  342. # [14:59] * Joins: Codler (n=Codler@84-218-7-44.eurobelladsl.telenor.se)
  343. # [15:01] <Philip`> "(If the comment was started by the end of the file (EOF), the token is empty.)" - isn't it also empty if the comment was started by a > character?
  344. # [15:02] <Philip`> Hmm, I'll wait until later to sort out the details and make it actually work properly and pass the tests :-)
  345. # [15:02] <Philip`> (since the current implementation is totally not executable, which makes it hard to test)
  346. # [15:03] * Quits: BenWard (i=BenWard@nat/yahoo/x-4b53abbbd5c94177) (Read error: 104 (Connection reset by peer))
  347. # [15:03] * Quits: yod (n=ot@softbank221018155222.bbtec.net) ("Leaving")
  348. # [15:04] * Joins: BenWard (i=BenWard@nat/yahoo/x-424721520e41d982)
  349. # [15:05] * Quits: BenWard (i=BenWard@nat/yahoo/x-424721520e41d982) (Read error: 104 (Connection reset by peer))
  350. # [15:06] * Joins: BenWard (i=BenWard@nat/yahoo/x-851c38bdf86ef319)
  351. # [15:07] * Quits: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au) (Read error: 110 (Connection timed out))
  352. # [15:12] <annevk> Philip`, yeah, then it's also empty
  353. # [15:16] * Quits: Ducki_ (n=Alex@dialin-145-254-186-023.pools.arcor-ip.net) (Read error: 113 (No route to host))
  354. # [15:21] * Joins: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au)
  355. # [15:29] <Philip`> http://canvex.lazyilluminati.com/misc/states.png - incomplete and quite possibly with bugs, but it looks kind of interesting
  356. # [15:34] * Philip` should probably skip all the EOF bits since they're not very interesting and they make the diagram too complex
  357. # [15:36] <Lachy> in the whole fallback content thread, has anyone actually given a use case for needing fallback beyond plain text? All I've seen are unsupported claims that it's needed.
  358. # [15:37] <hsivonen> Philip`: cool. the diagram makes the transitions look more complex than they actually are
  359. # [15:38] <hsivonen> Philip`: in fact there are only two transitions that break a stack assumption
  360. # [15:39] <Philip`> hsivonen: Is that two when not counting all the reconsume-EOF-in-the-data-state ones?
  361. # [15:39] <hsivonen> Lachy: if you want to get rid of longdesc and move the essay about the Union Jack or the dress of Lord Cornwallis inline
  362. # [15:40] * Quits: Toolskyn (i=toolskyn@amy.bdick.de) (Remote closed the connection)
  363. # [15:40] <hsivonen> Philip`: reconsume whatever in data state works as a stack transition
  364. # [15:40] * Joins: Toolskyn (i=toolskyn@amy.bdick.de)
  365. # [15:40] <hsivonen> (see my code :-)
  366. # [15:40] <hsivonen> Philip`: just rewind the stack to the data state
  367. # [15:40] * Philip` will try to finish these bits while still untainted, and then look at the code ;-)
  368. # [15:41] <Lachy> hsivonen: that union jack example isn't particularly significant, since that description is completely inappropriate for how the flag was used.
  369. # [15:41] <Philip`> (I'm not trying to do a practical implementation - mostly I just want pretty pictures and things)
  370. # [15:41] <hsivonen> html5lib and my code are under the MIT license, it's not like looking at AT&T code :-)
  371. # [15:42] <Philip`> I currently just want to represent the algorithm as described in the spec, disregarding the implementation details that everyone else worries about :-)
  372. # [15:47] <MikeSmith> No commit-watchers mail since 28 June ... have there really been no changes, or is the list broken?
  373. # [15:47] <hsivonen> MikeSmith: Hixie is doing research. no changes
  374. # [15:47] <MikeSmith> OK
  375. # [15:47] <MikeSmith> thanks
  376. # [15:50] * Joins: rubys (n=rubys@cpe-075-182-064-252.nc.res.rr.com)
  377. # [15:50] <rubys> annevk: you there?
  378. # [15:52] <rubys> if you get a chance, can you look into removing from tests/test_parser.py the following line "if testName == "tests5": continue # TODO"?
  379. # [15:53] * Quits: BenWard (i=BenWard@nat/yahoo/x-851c38bdf86ef319)
  380. # [15:53] <hsivonen> ouch. the catch all end tag case "in body" has a set of 69 strings to test against...
  381. # [15:55] <hsivonen> perhaps the tokens should come with a clever bitfield after all... instead of just interning
  382. # [15:55] * Joins: BenWard (i=BenWard@nat/yahoo/x-86d43e2f7c62c229)
  383. # [15:57] <hsivonen> or a lex sorted array with binary search. or something...
  384. # [16:02] <Philip`> Does Java let you do binary searches for (interned) strings based on something like a pointer, rather than slowly comparing characters?
  385. # [16:03] <Philip`> (I guess that might not be possible since the GC can move things around arbitrarily and won't maintain a consistent ordering, perhaps)
  386. # [16:05] <hsivonen> Philip`: no, you only get to compare memory addresses for equality
  387. # [16:06] <hsivonen> Philip`: however, I could have a hashtable that knew that all values are interned
  388. # [16:07] <hsivonen> for the time being, I'm treating anything that goes beyond interning name and doing "foo" == name || "bar" == name || ... as a premature optimization
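The three lookup strategies weighed above (a chain of equality tests, a lexically sorted array with binary search, and a hash table) can be sketched roughly as follows. This is a Python illustration only, not hsivonen's Java code: the tag names are a made-up subset of the 69 strings mentioned, and all function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical subset of the end-tag names; the real list has 69 entries.
SPECIAL_END_TAGS = sorted(["address", "blockquote", "div", "li", "p", "table", "ul"])
SPECIAL_END_TAG_SET = frozenset(SPECIAL_END_TAGS)


def in_chain(name):
    """Chain of equality tests, in the spirit of "foo" == name || "bar" == name."""
    return (name == "address" or name == "blockquote" or name == "div"
            or name == "li" or name == "p" or name == "table" or name == "ul")


def in_sorted_array(name):
    """Binary search over a lexically sorted array."""
    i = bisect_left(SPECIAL_END_TAGS, name)
    return i < len(SPECIAL_END_TAGS) and SPECIAL_END_TAGS[i] == name


def in_hash_set(name):
    """Hash-based membership test."""
    return name in SPECIAL_END_TAG_SET
```

All three give the same answer; which one is faster for 69 interned strings would need measuring, which fits the remark about premature optimization.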
  389. # [16:08] * Quits: Lfe (n=lfe@bergstroem.nu) ("leaving")
  390. # [16:10] * Philip` wishes OCaml had better error reports than simply "Syntax error"
  391. # [16:20] <Philip`> Oh, assuming there's never an EOF doesn't make the state transitions much simpler - there's only about three cases I can see where it makes a difference
  392. # [16:31] * Joins: billmason (n=billmaso@ip156.unival.com)
  393. # [16:31] <MikeSmith> hsivonen - is it true that currently with html5lib, given an arbitrary HTML document as source that it can construct a DOM from successfully, that DOM can't necessarily be re-serialized as well-formed XML?
  394. # [16:31] <MikeSmith> Or anybody?
  395. # [16:32] <rubys> it is rare, but true
  396. # [16:32] <MikeSmith> (I realize html5lib is not hsivonen's implementation...)
  397. # [16:32] <MikeSmith> rubys - OK
  398. # [16:33] <rubys> it is possible to have entity or attribute names that aren't simple names, it is possible for comments to have two consecutive dashes in them, it is possible for strings to contain form feeds or other values that are illegal in XML.
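A minimal sketch of checks for the hazards rubys lists (names that are not simple names, "--" inside comments, and characters XML forbids). This is not html5lib code: the helper names are hypothetical, and the name pattern is a deliberately simplified ASCII-only approximation of the XML Name production.

```python
import re

# Simplified, ASCII-only approximation of an XML Name (an assumption;
# the real production also allows many non-ASCII characters).
SIMPLE_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# Control characters that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def is_simple_name(name):
    """True if the element or attribute name looks serializable as XML."""
    return SIMPLE_NAME.fullmatch(name) is not None


def is_safe_comment(data):
    """XML comments must not contain "--" and must not end with "-"."""
    return "--" not in data and not data.endswith("-")


def has_illegal_chars(text):
    """True if the text contains a control character such as form feed."""
    return ILLEGAL_CHARS.search(text) is not None
```

A serializer could use checks like these to decide when to drop, rename, or escape a node, at the cost of a lossy conversion.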
  399. # [16:33] <MikeSmith> ah
  400. # [16:34] <Philip`> When I tried serialising a random collection of web pages as XML, a significant number (uh, I can't remember how much, but maybe 20% or so) became ill-formed XML
  401. # [16:34] <rubys> other things (like matching up open and close tags) are taken care of by html5lib, and so are the overwhelming majority of common errors.
  402. # [16:34] * Joins: tndH_ (i=Rob@83.100.252.160)
  403. # [16:34] <rubys> 20% surprises me.
  404. # [16:34] <rubys> are these public pages? Can you share an example?
  405. # [16:36] <MikeSmith> but hsivonen's implementation (backend of his conformance checker), by its nature, is inherently capable of producing well-formed XML?
  406. # [16:36] <MikeSmith> is that true?
  407. # [16:36] <MikeSmith> I would think it'd need to be since he has XML tools in the toolchain for it
  408. # [16:37] <MikeSmith> or maybe not
  409. # [16:37] <Philip`> I never looked at the examples in any detail, so I'm not sure what the issues were, though I remember a few were just because of <!---------->
  410. # [16:37] <Philip`> http://www.toyota.com/ is an interesting one
  411. # [16:37] <Philip`> since it has <spacer type"block" width="1" height="1"></spacer> which gets parsed as an attribute with a " in its name
  412. # [16:38] <Philip`> http://krijnhoetmer.nl/irc-logs/whatwg/20070507#l-581 - hmm, apparently it was 25%
  413. # [16:38] <Philip`> (just using the top thousand Yahoo search results for some boring word, if I remember correctly)
  414. # [16:39] <rubys> html5lib has a sanitizer that removes unsafe or unknown markup. Our goal is to make that bullet proof.
  415. # [16:39] <Philip`> I don't know how many of those issues were just caused by the html5lib toxml() being not very good
  416. # [16:40] <Philip`> (Also I think some of the issues might have been that I didn't handle character encoding properly)
  417. # [16:42] <rubys> If you are interested in producing XML, I would recommend the dom treebuilder
  418. # [16:46] <Philip`> When I was looking at those things before, I was mostly interested in analysing real HTML documents and just avoiding the slowness of repeatedly parsing with html5lib by caching them in a nicer serialised format, but it seems XML isn't very suitable for that :-(
  419. # [16:46] * Quits: wild_cfo (n=wild_c_f@ool-44c1bb48.dyn.optonline.net) ("This computer has gone to sleep")
  420. # [16:46] * Quits: tndH (i=Rob@83.100.252.160) (Read error: 110 (Connection timed out))
  421. # [16:46] <rubys> what type of analysis?
  422. # [16:48] <Philip`> Mainly looking for common usage of certain elements/attributes, like in http://canvex.lazyilluminati.com/misc/copyright.html and http://canvex.lazyilluminati.com/misc/summary.html
  423. # [16:48] <rubys> your requirements are terribly unique, and I would like to work towards making a bullet proof conversion (possibly lossy in cases like spaces in attribute names) possible, and would appreciate test cases towards that end.
  424. # [16:48] <Philip`> (and theoretically any other statistics on HTML documents, except I got distracted before getting around to scaling the system up to work on a reasonable sample)
  425. # [16:49] <Philip`> ((for quite small values of 'reasonable'))
  426. # [16:51] * Joins: hendry (n=hendry@kitten-x.com)
  427. # [16:53] <annevk> his requirements are very relevant for the work the HTML WG and WHATWG are doing (fwiw)
  428. # [16:53] <annevk> although they should be met by having a fast html5lib
  429. # [16:53] <Philip`> I expect I'll get back to this analysis thing at some point, and I'll see if I can extract the cases that cause problems (since I expect it would be nice to be able to use standard XML tools on random documents safely, without having to stick an HTML frontend onto them)
  430. # [16:54] <rubys> a fast html5lib ... which ultimately means a port to C
  431. # [16:54] <rubys> annevk: can you scroll back and see my question about tests5?
  432. # [16:54] <annevk> yeah, saw that
  433. # [16:55] <annevk> thought they already worked
  434. # [16:55] * annevk poners
  435. # [16:55] * annevk ponders*
  436. # [16:55] <rubys> that test passes, except for error checks, which you just enabled.
  437. # [16:55] <rubys> no error is produced on EOF
  438. # [16:55] <Philip`> I'm trying to write the easy part of the parsing algorithm in a language-agnostic manner, so it'll be nice if that works out :-)
  439. # [16:57] <annevk> there should be no error either
  440. # [16:57] <annevk> seems like a simple mistake in the test
  441. # [16:59] * Joins: Ducki (i=Alex@dialin-145-254-186-124.pools.arcor-ip.net)
  442. # [16:59] <rubys> if the tests were changed, then 'next if test_name == "tests5" # TODO' can be removed from ruby/tests/test_parser.rb too
  443. # [17:00] <annevk> yeah, did all that a few minutes ago
  444. # [17:01] <rubys> 'all that'? You changed the ruby test?
  445. # [17:01] <annevk> oh, ruby
  446. # [17:01] * Quits: Ducki__ (n=Alex@dialin-145-254-180-253.pools.arcor-ip.net) (Read error: 113 (No route to host))
  447. # [17:01] <annevk> sorry
  448. # [17:01] <annevk> I haven't played with ruby at all
  449. # [17:03] <rubys> I'd work on a C port, but only if we had more people who were interested in maintaining the code. This business of multiple people making changes to the Python code and Sam porting the changes won't scale much further.
  450. # [17:05] <annevk> if we have a C version we can just make Python and Ruby bindings, no?
  451. # [17:06] <rubys> that could certainly be done
  452. # [17:06] <Philip`> It's nice to have pure Python/Ruby/etc versions when people are unable/unwilling to compile and install C modules
  453. # [17:07] <annevk> can't you make some .pyc version people can just use?
  454. # [17:07] * annevk isn't really up to speed with C > Python mappings and how to work with them
  455. # [17:08] <Philip`> (hence things like XML::Sax::PurePerl)
  456. # [17:10] <Philip`> I think you probably need a .dll (or .so or whatever) if you want to use a C library in Python, and that will be specific to a certain processor architecture and OS and maybe other system libraries, which is a pain when people can't compile easily
  457. # [17:10] <annevk> hmm, fair enough
  458. # [17:11] <rubys> on the other hand, 99.99% of the people would choose to use a C binding to their favorite language over a native binding.
  459. # [17:12] <annevk> http://lists.w3.org/Archives/Public/www-archive/2007Jul/0010.html ...
  460. # [17:13] * tndH_ is now known as tndH
  461. # [17:13] <annevk> rubys, people who care one bit about performance, indeed
  462. # [17:14] <annevk> also, C bindings to an HTML5 parser should just be included by default in Python, Ruby, Java, etc.
  463. # [17:14] <annevk> well, maybe not Java
  464. # [17:15] <Philip`> Perl too :-)
  465. # [17:15] <rubys> I'd also love to see the C parser actually used by products like Opera and/or Firefox.
  466. # [17:15] <rubys> they could have their own treebuilders, of course; but the parser could be the same.
  467. # [17:17] * Philip` wishes he could remember how to compute transitive closures (in a functional language)
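One way to compute a transitive closure, sketched here in Python in a roughly functional style rather than in OCaml. It is a naive fixed-point iteration, not necessarily what Philip` ended up using, and `transitive_closure` is a hypothetical name.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (src, dst) pairs.

    Repeatedly joins the relation with itself until no new pairs appear.
    """
    closure = frozenset(edges)
    while True:
        extended = closure | frozenset(
            (a, d) for (a, b) in closure for (c, d) in closure if b == c
        )
        if extended == closure:
            return closure
        closure = extended
```

For a state graph, the closure answers "can state X ever be reached from state Y", which is the kind of question the transition diagrams are about.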
  468. # [17:18] <annevk> from what I heard from WebKit and Firefox architecture that might be quite tricky
  469. # [17:19] <rubys> I'm not familiar with WebKit, but I have taken a peek at Firefox. Don't see why it would be tricky (I know, I know, famous last words...)
  470. # [17:20] * annevk needs /ignore for e-mail clients
  471. # [17:21] <annevk> rubys, maybe it's possible, they have done it for the XML parser after all...
  472. # [17:23] <rubys> exactly... there is a part in the logic where you take in an input stream and produce a custom DOM implementation. Obviously, the input stream and DOM may vary from product to product, as would the tokenizer/parser error handling, but the logic could be pluggable.
  473. # [17:24] <rubys> Imagine how nice it would be if Safari, Firefox, and Opera used the SAME tokenizer/parser?
  474. # [17:24] <annevk> hmm, no parsing bugs to exploit!
  475. # [17:24] <Philip`> They'd probably all use slightly different versions with different bug fixes, so it wouldn't be entirely perfect
  476. # [17:25] <rubys> perfect? No. But a dramatic improvement over today.
  477. # [17:26] <rubys> And each vendor is going to have to invest some work effort towards html5 compliance. This should reduce the work for everybody.
  478. # [17:33] <Philip`> Are vendors planning to replace their existing HTML parser with a shiny new HTML5 one, or are they planning to just receive lots of bug reports and make lots of small fixes until they pass most of the tests, or are they not planning anything yet?
  479. # [17:39] <annevk> I think WebKit is planning on fixing bugs
  480. # [17:39] <annevk> they're pretty close for most cases anyway
  481. # [17:39] <annevk> dunno about other browsers
  482. # [17:45] <Philip`> Hmm, the state transition graph gets a bit big when I split out all the different content models
  483. # [17:48] * Quits: jgraham (n=jgraham@81-86-222-233.dsl.pipex.com) (Read error: 110 (Connection timed out))
  484. # [17:54] * Joins: tndH_ (i=Rob@83.100.252.160)
  485. # [17:54] * Quits: tndH (i=Rob@83.100.252.160) (Read error: 110 (Connection timed out))
  486. # [17:54] * tndH_ is now known as tndH
  487. # [18:00] * Joins: weinig (i=weinig@nat/apple/x-a6309fb9aa376651)
  488. # [18:01] <Philip`> http://canvex.lazyilluminati.com/misc/states2.png
  489. # [18:02] <annevk> ouch
  490. # [18:02] <annevk> "HTML tokenizing. More trivial than it looks."
  491. # [18:05] <Philip`> I think that's overestimating the possible transitions a little, since it assumes that whenever a tag token (either start or end) is emitted it could end up in any of the four content models
  492. # [18:06] <Philip`> At least there's the nice DataState PLAINTEXT black hole at the bottom
  493. # [18:06] <annevk> :)
  494. # [18:17] <annevk> In the Live DOM Viewer in Internet Explorer the <!> sequence causes the DOM view to turn almost blank...
  495. # [18:29] * Joins: tndH_ (i=Rob@83.100.252.160)
  496. # [18:32] * Joins: h3h (n=w3rd@66-162-32-234.static.twtelecom.net)
  497. # [18:33] * Quits: weinig (i=weinig@nat/apple/x-a6309fb9aa376651)
  498. # [18:36] * Joins: weinig (i=weinig@nat/apple/x-204ff4e81de6ca4d)
  499. # [18:36] * Quits: KevinMarks (n=KevinMar@c-76-102-254-252.hsd1.ca.comcast.net) ("The computer fell asleep")
  500. # [18:37] * Joins: hasather (n=hasather@22.80-203-71.nextgentel.com)
  501. # [18:38] <Philip`> It looks like my state transition thing agrees with the spec's comments about "This can only happen if the content model flag is set to the PCDATA state" etc, except for the bogus comment state where you have to do lots of slightly convoluted thinking to work out that it's correct
  502. # [18:38] <Philip`> though, should the (non-bogus) comment states state that they can only happen when PCDATA, or is that obvious when left unstated?
  503. # [18:47] * Quits: tndH (i=Rob@83.100.252.160) (Read error: 110 (Connection timed out))
  504. # [18:47] <Philip`> (I suppose it should also be obvious that the only state you can be in with PLAINTEXT is the data state)
  505. # [18:48] <annevk> I'm not sure why the other cases actually state it, to be honest
  506. # [18:48] <annevk> It makes it just more confusing for the cases where it's not
  507. # [18:50] * Quits: tndH_ (i=Rob@83.100.252.160) (Read error: 110 (Connection timed out))
  508. # [18:51] * Quits: Lachy (n=Lachy@124-168-24-114.dyn.iinet.net.au) (Read error: 110 (Connection timed out))
  509. # [18:54] <zcorpan_> annevk: it's because comments where the leading "!--" and trailing "--" don't fit, you can't read .nodeValue in ie
  510. # [18:54] <zcorpan_> annevk: i solved that by using a try/catch in dom2string
  511. # [18:55] <zcorpan_> annevk: and emitting "<!-- -->" if reading .nodeValue fails
  512. # [18:55] <annevk> k
  513. # [18:56] <Philip`> Ooh, neat, the W3C validator says <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><title></title><table datapagesize=cheese><tr><td></table> is valid
  514. # [18:56] <annevk> hehe
  515. # [18:56] <zcorpan_> would be cool if the live dom viewer had an option to show the dom using dom2string_recursive
  516. # [18:58] <zcorpan_> Hixie: yt?
  517. # [18:59] * Joins: Ducki_ (n=Alex@dialin-212-144-055-172.pools.arcor-ip.net)
  518. # [19:02] <annevk> zcorpan_, the real feature would be to make a mashup of http://james.html5.org/parsetree.html and your script
  519. # [19:02] <annevk> zcorpan_, maybe just for the text input box
  520. # [19:08] * Joins: aroben (n=adamrobe@17.203.15.248)
  521. # [19:09] * Joins: Lachy (n=Lachy@203-158-59-119.dyn.iinet.net.au)
  522. # [19:18] * Quits: Ducki (i=Alex@dialin-145-254-186-124.pools.arcor-ip.net) (Read error: 110 (Connection timed out))
  523. # [19:29] * Joins: tndH (i=Rob@83.100.252.160)
  524. # [19:30] * Quits: met_ (n=Hassman@r5bx220.net.upc.cz) ("Chemists never die, they just stop reacting.")
  525. # [19:30] * Quits: BenWard (i=BenWard@nat/yahoo/x-86d43e2f7c62c229) ("Fades out again…")
  526. # [19:47] * Joins: KevinMarks (i=KevinMar@nat/google/x-3d39f747c7a64a31)
  527. # [19:51] * Joins: webben (i=benh@nat/yahoo/x-298224fddc481c77)
  528. # [20:00] * Quits: hendry (n=hendry@kitten-x.com) (Read error: 113 (No route to host))
  529. # [20:11] * Joins: hendry (n=hendry@kitten-x.com)
  530. # [20:22] <Philip`> The tokeniser is much easier when I don't worry about actually implementing it, since I can just add a command like AppendHyphenToCommentToken and use it without caring about what it does
  531. # [20:23] <Philip`> but I guess it'll all catch up with me when I do get around to the implementation bit :-(
  532. # [20:27] <zcorpan_> Philip`: you're writing pseudo-code? :)
  533. # [20:28] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  534. # [20:30] <Philip`> Yes :-)
  535. # [20:31] <Philip`> (in a form that can be transformed into real code)
  536. # [20:31] <Philip`> (but that just moves some of the work into the code that does the transformation)
  537. # [20:32] <Philip`> (but it's a good excuse to learn OCaml anyway)
  538. # [20:32] * Quits: Ducki_ (n=Alex@dialin-212-144-055-172.pools.arcor-ip.net) (Client Quit)
  539. # [20:32] * Quits: MikeSmith (n=MikeSmit@eM60-254-202-189.pool.emobile.ad.jp) (Read error: 110 (Connection timed out))
  540. # [20:33] * Joins: Ducki (n=Alex@dialin-212-144-055-172.pools.arcor-ip.net)
  541. # [20:35] * Joins: MikeSmith (n=MikeSmit@eM60-254-197-94.pool.emobile.ad.jp)
  542. # [20:37] * Joins: dbaron (n=dbaron@corp-242.mountainview.mozilla.com)
  543. # [20:50] <Philip`> http://canvex.lazyilluminati.com/misc/states3.png - now with added doctype states, so I think it's got everything (and probably more bugs than before)
  544. # [20:51] <Philip`> Oops, that's still got the EOF transitions...
  545. # [20:52] <Philip`> Now it doesn't, so it's a bit prettier
  546. # [20:57] <Philip`> Actually, I should probably tell it about parse errors too, so I can see if it's much simpler for conforming content
  547. # [20:58] <zcorpan_> seems the algorithm in https://bugzilla.mozilla.org/attachment.cgi?id=188040 only has one flaw, which is before step 1: match the value against the list of color keywords
  548. # [20:58] * Quits: weinig (i=weinig@nat/apple/x-204ff4e81de6ca4d) (Read error: 110 (Connection timed out))
  549. # [21:00] * Joins: Ducki_ (i=Alex@dialin-145-254-186-098.pools.arcor-ip.net)
  550. # [21:00] <annevk> zcorpan_, nice interop mess
  551. # [21:01] * Quits: Ducki (n=Alex@dialin-212-144-055-172.pools.arcor-ip.net) (Read error: 104 (Connection reset by peer))
  552. # [21:01] <zcorpan_> now i'll just see which keywords are supported, and if that differs from the keywords supported in css
  553. # [21:01] * Quits: Codler (n=Codler@84-218-7-44.eurobelladsl.telenor.se) ("- nbs-irc 2.21 - www.nbs-irc.net -")
  554. # [21:03] * Quits: Charl (n=charlvn@c1-228-9.wblv.isadsl.co.za) ("Leaving")
  555. # [21:07] <Philip`> http://canvex.lazyilluminati.com/misc/states4.png - hmm, it does look much cleaner when you don't allow parse errors
  556. # [21:11] <zcorpan_> wow. ie supports lightgrey but not lightgray. quite the opposite to all other gr(a|e)ys
  557. # [21:12] <zcorpan_> Philip`: you can't get into the bogus states if you don't allow parse errors, right?
  558. # [21:12] <Philip`> http://en.wikipedia.org/wiki/HTML_colors says lightgrey too
  559. # [21:15] <zcorpan_> could there be other keywords supported that aren't listed in css3-color ?
  560. # [21:15] <Philip`> zcorpan_: Yep - there's nothing leading into those states in the diagram, but I didn't bother stripping them out
  561. # [21:15] * Joins: jcgregorio (n=chatzill@209.79.152.140)
  562. # [21:15] <zcorpan_> Philip`: ok
  563. # [21:16] <Philip`> zcorpan_: I believe I looked in IE's .exe for colour names, and it didn't have any that weren't the standard set which CSS3 and every other browser includes
  564. # [21:16] <zcorpan_> Philip`: ok. thanks
  565. # [21:17] <Philip`> Oh, that was IE3
  566. # [21:18] <Philip`> but I don't think they've changed it since then
  567. # [21:18] <Philip`> since they just copied it from NN2
  568. # [21:19] <Dashiva> Philip`: What if you colored the transition arrows depending on whether the transition requires a parse error or not?
  569. # [21:19] <annevk> might be interesting to test DarkSeaGreen
  570. # [21:19] <annevk> whether IE has the X11 or .Net impl
  571. # [21:19] * Quits: jcgregorio (n=chatzill@209.79.152.140) (Client Quit)
  572. # [21:19] * annevk got that from the wikipedia page
  573. # [21:20] <zcorpan_> annevk: darkseagreen is in css3-color
  574. # [21:20] <Philip`> Dashiva: That sounds worth doing
  575. # [21:20] <zcorpan_> ah
  576. # [21:20] <Philip`> though what about transitions that can be both parse errors and not?
  577. # [21:21] <Dashiva> a third color, or both?
  578. # [21:22] <Philip`> Hmm, I'll just draw two arrows, because then I won't have to change my code :-)
  579. # [21:22] <annevk> some more arrows wouldn't hurt
  580. # [21:22] <annevk> it's not always clear what the direction is :)
  581. # [21:23] <Dashiva> Maybe put an arrowhead on the middle of the arrow too
  582. # [21:24] <Philip`> Hmph, colour PNGs are huge
  583. # [21:25] <zcorpan_> annevk: ie uses x11
  584. # [21:25] <Philip`> http://canvex.lazyilluminati.com/misc/states5.png
  585. # [21:25] * Joins: weinig (i=weinig@nat/apple/x-6e3b9ac0c16bd8f1)
  586. # [21:28] <Philip`> Hmm, I don't think I can make Graphviz draw arrow heads except at the end
  587. # [21:30] <annevk> zcorpan_, so how do you test which color is used? some color picker?
  588. # [21:31] <zcorpan_> annevk: .bgcolor returns the rgb color
  589. # [21:32] <zcorpan_> er, .bgColor
  590. # [21:32] <annevk> cool, automated testing
  591. # [21:34] <zcorpan_> http://simon.html5.org/test/html/parsing/color-attributes/keywords/
  592. # [21:36] <zcorpan_> i haven't sent anything to the list about color attributes yet, have i
  593. # [21:38] <annevk> prolly not: http://www.google.com/search?q=inurl:whatwg-whatwg+color
  594. # [21:39] * Philip` wonders if he could automatically generate tests to cover all the possible state transitions
  595. # [21:39] <annevk> in http://simon.html5.org/test/html/parsing/color-attributes/ you can change Opera to none too
  596. # [21:39] <annevk> Philip`, that'd be most useful
  597. # [21:40] <annevk> Philip`, format: http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/ pretty please :)
  598. # [21:40] <zcorpan_> annevk: ah. cool.
  599. # [21:41] <annevk> Philip`, or maybe in the tree construction format...
  600. # [21:41] <annevk> Philip`, that would prolly be useful too especially for testing browsers
  601. # [21:42] <Philip`> The tree construction format probably wouldn't work too well when I don't have a tree constructor, unless I'm missing some point...
  602. # [21:43] <annevk> ah, if you want to debug your own code, then no
  603. # [21:44] <Philip`> Ah, okay - I think it would be nice to have something I could use for just tokeniser tests
  604. # [21:44] <annevk> then use the funky json format :)
  605. # [21:45] <annevk> I wonder if that can be used in some meaningful way on browsers too... prolly not
  606. # [21:45] * Joins: webben_ (i=benh@nat/yahoo/x-33bf928752899e80)
  607. # [21:45] <Philip`> though I don't know how to cope with the issue that the tree construction stage can affect the tokeniser's content model, when there's no tree construction stage
  608. # [21:45] * Quits: webben (i=benh@nat/yahoo/x-298224fddc481c77) (Read error: 104 (Connection reset by peer))
  609. # [21:45] <annevk> see escapeFlag.test and contentModelFlags.test
  610. # [21:45] <Philip`> Incidentally, "content model flag" is a confusing name since most flags don't have four states...
  611. # [21:46] <Philip`> Oh, right - that looks useful :-)
  612. # [21:48] * Quits: webben_ (i=benh@nat/yahoo/x-33bf928752899e80) (Client Quit)
  613. # [21:49] <Philip`> Shouldn't the test format include attributes on end tags, since the tokeniser is meant to emit them?
  614. # [21:50] * Joins: bzed (n=bzed@dslb-084-059-100-221.pools.arcor-ip.net)
  615. # [21:50] <annevk> the tokeniser doesn't emit them
  616. # [21:51] <annevk> Hixie, those stats on AAA are useful! thanks
  617. # [21:51] <Philip`> "Start and end tag tokens have a tag name and a list of attributes, each of which has a name and a value." "When an end tag token is emitted with attributes, that is a parse error." - it sounds like they are emitted
  618. # [21:52] <annevk> oh, ok
  619. # [21:52] <Hixie> annevk: which ones?
  620. # [21:52] <annevk> Hixie, the ones you pasted in IRC earlier; how many times duplication is hit etc.
  621. # [21:53] <annevk> although I'd love to see more detail :)
  622. # [21:53] <Hixie> ah yes
  623. # [21:53] <Hixie> i'll be posting more in due course
  624. # [21:54] * Joins: jgraham (n=jgraham@81-86-213-61.dsl.pipex.com)
  625. # [22:12] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net) (Read error: 104 (Connection reset by peer))
  626. # [22:15] * Quits: maikmerten (n=maikmert@T72ea.t.pppool.de) ("Leaving")
  627. # [22:15] * Joins: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
  628. # [22:20] <annevk> jgraham, I've been thinking about removing all the classes in html5parser.py
  629. # [22:20] <annevk> having said that, it hasn't been more than thinking
  630. # [22:21] <annevk> I'm not sure if we would actually gain anything from removing them and moving to a bunch of if/else statements as opposed to dictionary based method invocations
  631. # [22:21] <annevk> what we have now might actually be faster
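The two styles annevk is comparing can be sketched as follows. This is a toy illustration, not the actual html5parser.py code: the class, tag names, and returned action strings are all hypothetical.

```python
class InBodyPhase:
    """Toy phase object; tag names and handlers are illustrative only."""

    def __init__(self):
        # Dictionary-based dispatch: tag name -> bound handler method.
        self.start_tag_handlers = {
            "p": self.start_tag_p,
            "table": self.start_tag_table,
        }

    def process_start_tag(self, name):
        handler = self.start_tag_handlers.get(name, self.start_tag_other)
        return handler(name)

    def start_tag_p(self, name):
        return "close-p-then-insert"

    def start_tag_table(self, name):
        return "insert-and-switch-to-in-table"

    def start_tag_other(self, name):
        return "insert"


def process_start_tag_if_else(name):
    """The same decisions written as an if/else chain."""
    if name == "p":
        return "close-p-then-insert"
    elif name == "table":
        return "insert-and-switch-to-in-table"
    else:
        return "insert"
```

The dictionary lookup costs about the same however many tag names are handled, while the if/else chain grows with the number of branches tested before a match, so which is faster in practice would need profiling.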
  632. # [22:21] <rubys> why remove them then?
  633. # [22:21] <jgraham> annevk: I would imagine what we have now is faster
  634. # [22:22] <jgraham> (although I would need metrics to be sure, of course)
  635. # [22:22] <jgraham> I think the time would be better spent on Chtml5lib
  636. # [22:22] <annevk> prolly
  637. # [22:23] <rubys> If I did the port, who would contribute to it?
  638. # [22:23] <jgraham> rubys: I guess it would be one way for me to finally learn C :)
  639. # [22:24] <rubys> I took a look at it, and porting it to C++ would probably take about a week. To C would be another week.
  640. # [22:24] <annevk> if I learn how to work with C on Ubuntu (besides learning to work with C in general) I would probably contribute
  641. # [22:24] <jgraham> (which is a way of saying I would love to contribute fixes but I don't feel confident in designing it)
  642. # [22:24] <annevk> not sure how much time I would invest on the python version afterwards
  643. # [22:24] <rubys> I would simply port the existing design. After it is working, it could be optimized, refactored, etc.
  644. # [22:25] <bewest> in that case why not profile the python version and move slow parts to C?
  645. # [22:25] <annevk> hmm, how are we going to handle <noscript>?
  646. # [22:26] <annevk> bewest, how is that better?
  647. # [22:26] <jgraham> That sounds great to me; I simply don't have enough C experience to know how best to implement things that are currently e.g. lists in python in C
  648. # [22:26] <annevk> we can prolly steal some ideas from Hixie's and hsivonen's impl
  649. # [22:26] <bewest> annevk: maybe it's not :/
  650. # [22:26] <rubys> C++ has a standard library. Going to C next would mean reimplementing those concepts.
  651. # [22:26] <jgraham> bewest: It's not like there's one slow bit, it's the overhead of doing things many times
  652. # [22:26] <bewest> yeah
  653. # [22:26] <jgraham> e.g. many function calls
  654. # [22:27] <Philip`> I'd be interested to see if my C++ tokeniser implementation could actually work in practice
  655. # [22:27] <jgraham> Philip`: the O'Caml one?
  656. # [22:27] <annevk> rubys, if we're going to do it C might be better if we get more detailed control over things like the inputstream
  657. # [22:27] <Philip`> jgraham: Yes
  658. # [22:27] <annevk> Question: scripting is enabled or disabled?
  659. # [22:27] <annevk> we don't have any tests for <noscript> atm...
  660. # [22:28] <Philip`> (The C++-generating part is totally broken now, but http://canvex.lazyilluminati.com/misc/states5.png is generated from exactly the same data as the C++ tokeniser would be)
  661. # [22:31] <annevk> I'll assume that scripting is enabled for now
  662. # [22:31] <annevk> I suppose at some point we can provide a switch and enable/disable tests conditionally
  663. # [22:33] <Philip`> Could the test format be made to handle scripts modifying the input stream?
  664. # [22:34] * Joins: wild_cfo (n=wild_c_f@ool-44c1bb48.dyn.optonline.net)
  665. # [22:34] * Quits: wild_cfo (n=wild_c_f@ool-44c1bb48.dyn.optonline.net) (Client Quit)
  666. # [22:35] <Philip`> You couldn't really expect parsers to all have script interpreters, but you could define that the tests can have <script>document.write("<p>")</script> (for some arbitrary JSON-encoded string) and the test harness can push those strings back into the input stream, to make sure the parser copes properly
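A minimal sketch of the harness idea described above, assuming a test supplies the source text plus an ordered list of strings that scripts would have written. `PushbackStream` and `consume` are hypothetical names, and the "parser" here is just a loop that records characters; it is not part of any existing test format.

```python
class PushbackStream:
    """Character stream that lets a harness insert text at the read position."""

    def __init__(self, text):
        self.pending = list(text)

    def read_char(self):
        # Return the next character, or None at end of input.
        return self.pending.pop(0) if self.pending else None

    def push_back(self, text):
        # Written text is consumed before the rest of the original input.
        self.pending[0:0] = list(text)


def consume(source, writes):
    """Read `source`, injecting the next queued write after each </script>."""
    stream = PushbackStream(source)
    queued = list(writes)
    seen = []
    while True:
        ch = stream.read_char()
        if ch is None:
            break
        seen.append(ch)
        if queued and "".join(seen).endswith("</script>"):
            stream.push_back(queued.pop(0))
    return "".join(seen)
```

A real harness would hand the stream to the tokeniser under test instead of the recording loop, but the push-back step would work the same way.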
  667. # [22:40] <annevk> at least for tree construction that's feasible
  668. # [22:40] <annevk> I was thinking of maybe offering #document-scripting-disabled at some point which provides an alternate tree and prolly also #errors-scripting-disabled
  669. # [22:41] <Hixie> just so everyone is aware and doesn't wonder if i died or something, i'm going to be on vacation for 3 weeks starting sunday
  670. # [22:41] <gsnedders> I'll make sure to ask if you've died.
  671. # [22:42] <hasather> Hixie: have fun :)
  672. # [22:42] <Hixie> i'll try! :-)
  673. # [22:42] <gsnedders> more seriously, where are you going?
  674. # [22:42] <Hixie> europe, east coast, various places around there
  675. # [22:43] <Hixie> apparently spending a lot of time in layovers at Schiphol
  676. # [22:43] <Hixie> which doesn't bode well for my luggage
  677. # [22:43] <annevk> yeah, it does that to you
  678. # [22:44] <gsnedders> I'm probably not getting out of the UK this summer
  679. # [22:44] <jgraham> gsnedders: Me neither (although I have been to various conferences abroad)
  680. # [22:45] <gsnedders> I'm going off down to Cambridge, but that's it. Probably going to Paris with my sister + her husband over the October holidays, though
  681. # [22:46] <jgraham> I assure you that Cambridge is lovely in every way. As long as you don't like hills. Or even slight rises.
  682. # [22:46] <jgraham> And, preferably, have a thing for tourists and punt touts
  683. # [22:46] <gsnedders> my grandmother lives in Cambridge, I've been plenty of times. Doesn't seem that hilly to someone from Scotland, though.
  684. # [22:47] <gsnedders> I should try actually punting again…
  685. # [22:47] <jgraham> It's really not that hilly. That's why you can't like hills if you want to like Cambridge
  686. # [22:47] * jgraham wants to move away just to get some hills
  687. # [22:47] <gsnedders> jgraham: come here!
  688. # [22:48] <gsnedders> [Fife]
  689. # [22:49] <jgraham> Fife would be nice. How are the employment opportunities though?...
  690. # [22:50] * Joins: hober (n=ted@unaffiliated/hober)
  691. # [22:50] <gsnedders> No idea. I'm too young to know such things :)
  692. # [22:50] <jgraham> And I, sadly, am almost old enough to have to care :(
  693. # [22:51] * gsnedders goes back to showing how young he is by looking up university entrance requirements
  694. # [22:51] <Dashiva> I feel old now
  695. # [22:52] * Quits: ROBOd (n=robod@86.34.246.154) ("http://www.robodesign.ro")
  696. # [22:58] <Philip`> You have to put up with all the students in Cambridge too :-p
  697. # [22:58] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
  698. # [22:59] <gsnedders> hmmm… AAAAB at the min. for Higher entrance into Oxford
  699. # [22:59] * gsnedders marks English as the B
  700. # [22:59] <Philip`> though I suppose they're usually outnumbered by tourists
  701. # [22:59] <gsnedders> Philip`: the terms aren't overly long at Cam/Oxf
  702. # [23:00] * Quits: Ducki_ (i=Alex@dialin-145-254-186-098.pools.arcor-ip.net) (Read error: 113 (No route to host))
  703. # [23:00] <Philip`> 3 * 8 weeks, with three months off for the summer vacation :-)
  704. # [23:00] * Joins: Ducki_ (n=Alex@dialin-145-254-186-098.pools.arcor-ip.net)
  705. # [23:00] <gsnedders> Philip`: which gives plenty of time for tourists to rule supreme :)
  706. # [23:00] <gsnedders> (I couldn't myself remember whether it was 8v10 or 10v12)
  707. # [23:01] <Philip`> It's nice during the exam term when they stop all the tourists coming into the colleges
  708. # [23:02] <gsnedders> I don't think I've ever been there at the time, due to school
  709. # [23:02] <Philip`> (Er, but I have no idea how many colleges do that)
  710. # [23:02] <gsnedders> (and nowadays I have exams at the same time)
  711. # [23:02] <gsnedders> Philip`: all do, IIRC
  712. # [23:05] <jgraham> Philip`: the quantity tourists+students is roughly conserved over the whole year
  713. # [23:06] <Hixie> cute, this http://triin.net/2006/06/12/Coding_practices_of_web_pages page refers to my 2005-12 study
  714. # [23:07] <Hixie> wow, the numbers he gets are very similar to the numbers i got in that study
  715. # [23:07] <Hixie> ncie
  716. # [23:07] <Hixie> nice
  717. # [23:07] <Hixie> (comparing http://code.google.com/webstats/2005-12/pages.html to http://triin.net/2006/06/12/HTML)
  718. # [23:08] <Hixie> even the oddities are present in both studies
  719. # [23:08] <Hixie> that's awesome
  720. # [23:09] * Joins: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca)
  721. # [23:18] * Quits: annevk (n=annevk@pat-tdc.opera.com) (Read error: 110 (Connection timed out))
  722. # [23:18] * Joins: webben (n=benh@91.84.193.157)
  723. # [23:19] <hsivonen> MikeSmith: my Java impl has configurable XML 1.0 compat
  724. # [23:21] <hsivonen> MikeSmith: for various features you can choose to be conforming to HTML5 (and potentially violate XML 1.0), to not violate XML 1.0 by treating violations as fatal errors, or to not violate XML 1.0 by being non-conforming to HTML5 and making infoset-altering coercions
  725. # [23:22] * Quits: weinig (i=weinig@nat/apple/x-6e3b9ac0c16bd8f1) (Read error: 110 (Connection timed out))
  726. # [23:23] <hsivonen> rubys: it might be a good idea to do an independent implementation in C. I believe Mike Day has already started one. I chose to do an independent implementation in Java using only test cases from html5lib in order to make a library that makes the most of Java instead of trying to map Pythonic stuff to Java
  727. # [23:24] * Quits: Ducki_ (n=Alex@dialin-145-254-186-098.pools.arcor-ip.net) (Read error: 110 (Connection timed out))
  728. # [23:31] <MikeSmith> hsivonen - thanks for the info
  729. # [23:37] <hsivonen> MikeSmith: to elaborate a bit: the SAX interface makes it possible for me to violate the interface contract in a way that exposes all of HTML5 in a way that may violate XML 1.0. The XOM interface, by design, won't allow it. When using a DOM impl meant for XML, some of the violations may not pass, either.
  730. # [23:38] <hsivonen> MikeSmith: so the non-XML stuff will be available through SAX (which I'm treating as the native interface) and custom DOM impls if someone cares to make one
  731. # [23:41] * Joins: weinig (i=weinig@nat/apple/x-980a2e775f61ddd9)
  732. # [23:42] <rubys> hsivonen: the Ruby implementation is meant to make the most of Ruby, and diverges in a number of significant ways.
  733. # [23:42] <rubys> I did use the Python implementation as a starting point, but only as that, and only because it saved me some time.
  734. # [23:43] <hsivonen> rubys: ok. anyway, I suggest pinging Mike Day to avoid duplicating what he has already been doing
  735. # [23:44] <rubys> that's why I've been advocating putting implementations into one place (html5lib)... so as to minimize the "search time" it takes to find out the actual current state of an implementation.
  736. # [23:45] <rubys> what is the license, for example, of Mike's work?
  737. # [23:46] <hsivonen> rubys: the reason why I put the Java impl in a different repo is to keep it together with the rest of the conformance checker which in turn is there in order to keep it together with the schema project
  738. # [23:46] <hsivonen> rubys: MIT/expat, IIRC
  739. # [23:47] <hsivonen> rubys: MIT/expat seems to be the convention for HTML5 parsers :-)
  740. # [23:47] <rubys> ... eventually it will likely no longer be "the" (as in "the only") Java implementation. :-)
  741. # [23:48] <hsivonen> rubys: do you mean because of the repo choice or in general?
  742. # [23:50] <rubys> the two parsers that are in html5 have essentially zero required dependencies, and very few optional dependencies. I'd like to see a similar effort in PHP, Java, C#, and C.
  743. # [23:51] <hsivonen> rubys: my Java impl depends on a couple of my utility classes and ICU4J
  744. # [23:51] <hsivonen> rubys: putting the utility classes in one jar with the parser is not a big deal
  745. # [23:51] <rubys> i tried downloading it once. that was not the impression I got. But perhaps I was wrong.
  746. # [23:52] <hsivonen> rubys: making ICU4J optional for reduced correctness is not a big deal, either
  747. # [23:52] <hsivonen> rubys: do you mean you downloaded the parser that I'm currently working on or the conformance checker way back when you mentioned it in your blog comments?
  748. # [23:53] <rubys> way back when
  749. # [23:53] <hsivonen> rubys: when my parser implementation is in a state where it can actually be used, I intend to offer a binary jar that doesn't require you to run the whole conformance checker build
  750. # [23:53] <hsivonen> (and the conformance checker build is now much easier, too)
  751. # [23:54] <hsivonen> rubys: the parser I'm now writing is not the prototype parser you saw way back when
  752. # [23:54] <rubys> Cool. Is there a single place where implementations can be found?
  753. # [23:55] * Joins: othermaciej (n=mjs@17.255.106.198)
  754. # [23:55] <rubys> If not, can we make such a list on http://wiki.whatwg.org/wiki/ ?
  755. # [23:55] <hsivonen> rubys: dunno if the WHATWG wiki is up to date
  756. # [23:55] <hsivonen> rubys: in any case, I suggest that we link to each other whenever someone makes something runnable in a new language
  757. # [23:56] <hsivonen> (my tree builder is not runnable just yet)
  758. # [23:56] <rubys> How about this: I'll update html5lib to point to http://wiki.whatwg.org/wiki/Implementations
  759. # [23:56] <hsivonen> makes sense
  760. # [23:56] <hsivonen> svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
  761. # [23:56] <hsivonen> in case you are interested
  762. # [23:57] <hsivonen> depends on the util module in the same repo, ICU4J and Java5
  763. # [23:57] * Quits: weinig (i=weinig@nat/apple/x-980a2e775f61ddd9)
  764. # [23:58] * Quits: MikeSmith (n=MikeSmit@eM60-254-197-94.pool.emobile.ad.jp) ("Less talk, more pimp walk.")
  765. # [23:58] * Parts: hasather (n=hasather@22.80-203-71.nextgentel.com)
  766. # [23:58] <rubys> are there any tests?
  767. # [23:59] <hsivonen> rubys: you need to check out html5lib separately to get test data
  768. # [23:59] <hsivonen> rubys: there are test harnesses for running html5lib encoding tests and tokenization tests
  769. # [23:59] <hsivonen> (tree builder harness will follow in due course)
  770. # [23:59] * Joins: MikeSmith (n=MikeSmit@eM60-254-197-94.pool.emobile.ad.jp)
  771. # Session Close: Fri Jul 06 00:00:00 2007
