/irc-logs / freenode / #whatwg / 2008-12-22 / end

  1. # Session Start: Mon Dec 22 00:00:00 2008
  2. # Session Ident: #whatwg
  3. # [00:00] * Joins: shepazu (n=schepers@cpe-65-29-70-220.indy.res.rr.com)
  4. # [00:02] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  5. # [00:04] * Joins: karlushi (n=karl@74.58.58.53)
  6. # [00:10] * Parts: erlehmann (n=erlehman@86.59.25.121)
  7. # [00:11] * Quits: Maurice (n=copyman@5ED548D4.cable.ziggo.nl) ("Disconnected...")
  8. # [00:11] * Joins: erlehmann (n=erlehman@86.59.25.121)
  9. # [00:13] * Quits: karlcow (n=karl@216.144.126.222) (Read error: 113 (No route to host))
  10. # [00:17] * Joins: hdh (n=hdh@58.187.23.189)
  11. # [00:18] * Quits: wakaba (n=wakaba@220.210.164.189) (Read error: 54 (Connection reset by peer))
  12. # [00:18] * Joins: wakaba_ (n=wakaba@189.164.210.220.dy.bbexcite.jp)
  13. # [00:46] * Joins: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
  14. # [00:49] * Quits: shepazu (n=schepers@cpe-65-29-70-220.indy.res.rr.com)
  15. # [00:51] * Quits: karlushi (n=karl@74.58.58.53) (Read error: 60 (Operation timed out))
  16. # [00:56] * Joins: karlushi (n=karl@216.144.126.222)
  17. # [00:59] * Joins: jruderman__ (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
  18. # [01:00] * Quits: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 60 (Operation timed out))
  19. # [01:02] * Quits: jruderman_ (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 110 (Connection timed out))
  20. # [01:03] <Philip`> [ ["Character", "txet>x lmth EPYTCOD!"], "ParseError", ["Character", "<"] ]
  21. # [01:03] <Philip`> Hmm, that doesn't quite look right
  22. # [01:04] * Joins: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
  23. # [01:09] * Joins: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
  24. # [01:18] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
  25. # [01:19] * Joins: shepazu (n=schepers@adsl-76-252-31-89.dsl.ipltin.sbcglobal.net)
  26. # [01:20] * Quits: shepazu (n=schepers@adsl-76-252-31-89.dsl.ipltin.sbcglobal.net) (Remote closed the connection)
  27. # [01:21] * Quits: jruderman__ (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 110 (Connection timed out))
  28. # [01:33] * Joins: jruderman_ (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
  29. # [01:36] * Quits: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 60 (Operation timed out))
  30. # [01:50] <Philip`> Hooray, now my OCaml code passes all of the tokeniser tests (excluding the content model / escape flag ones)
  31. # [01:50] <Philip`> It's sometimes horrendously inefficient, e.g. every time it consumes an entity it sorts the whole entity list by length and then iterates through to find the first match, but that's okay because efficient is a non-goal
  32. # [01:51] <Philip`> s/efficient/efficiency/
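The entity lookup Philip` describes (sort the whole entity list by length on every call, then take the first match) can be sketched roughly as follows. This is only an illustration in Python, not his OCaml code; the `ENTITIES` table is a tiny hypothetical sample and `consume_entity` is an assumed name.

```python
# Hypothetical, tiny entity table; the real HTML5 table has far more entries.
ENTITIES = {
    "amp;": "&",
    "amp": "&",
    "lt;": "<",
    "lt": "<",
    "not": "\u00ac",
    "notin;": "\u2209",
}


def consume_entity(text, pos):
    """Return (replacement, new_pos) for the longest entity name at text[pos:].

    Deliberately naive: every call sorts all names by length (longest
    first) and scans for the first one that matches at `pos`.
    Returns (None, pos) when nothing matches.
    """
    for name in sorted(ENTITIES, key=len, reverse=True):
        if text.startswith(name, pos):
            return ENTITIES[name], pos + len(name)
    return None, pos
```

Sorting longest-first is what makes "notin;" win over its prefix "not"; a faster implementation would sort once or use a trie, which is exactly the kind of optimisation that is a non-goal here.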
  33. # [01:51] <famicom> eh
  34. # [01:52] <famicom> simplicity>consistency>efficiency
  35. # [01:52] <takkaria> except, say, when writing parsers that need to be time-efficient, when efficiency is a pretty important thing
  36. # [01:52] <famicom> takkaria
  37. # [01:52] <famicom> repeat after me: "Premature optimization is the root of all evil"
  38. # [01:53] <takkaria> I'm not talking about premature optimisation :)
  39. # [01:53] <Philip`> Overly late optimisation is a problem too - you have to be careful to get it just right :-)
  40. # [01:54] <famicom> philip: You mean like mozilla firefox?
  41. # [01:54] <famicom> which is a piece of bloat
  42. # [01:54] <famicom> it crashed when i tried to open 109 bookmarks at the same time
  43. # [01:55] <Philip`> (My OCaml thing is meant to act as a flexible reference implementation rather than as a usable parser, but the idea is to be able to compile that implementation into efficient code in other languages)
  44. # [02:00] <Philip`> http://philip.html5.org/misc/tokeniser_states.png
  45. # [02:45] * Quits: tndH (n=Rob@adsl-83-100-138-116.karoo.KCOM.COM) ("ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406]")
  46. # [02:51] * Quits: Amorphous (i=jan@unaffiliated/amorphous) (Read error: 110 (Connection timed out))
  47. # [02:53] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
  48. # [02:57] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
  49. # [04:07] * Parts: erlehmann (n=erlehman@86.59.25.121)
  50. # [04:07] * Joins: erlehmann (n=erlehman@86.59.25.121)
  51. # [04:07] * Parts: erlehmann (n=erlehman@86.59.25.121)
  52. # [04:08] * Joins: erlehmann (n=erlehman@86.59.25.121)
  53. # [04:17] * Joins: MikeSmith (n=MikeSmit@58.157.21.205)
  54. # [04:28] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) ("This computer has gone to sleep")
  55. # [04:46] <jwalden> Philip`: that's the entire state-transition diagram for HTML5, I take it? beats out ECMA-262 for simplicity as I recall
  56. # [05:05] * Joins: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
  57. # [05:21] * Quits: doublec (n=chris@202.0.36.64) ("Leaving")
  58. # [05:43] * Joins: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz)
  59. # [05:48] * Quits: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz) (Read error: 104 (Connection reset by peer))
  60. # [05:48] * Joins: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz)
  61. # [05:58] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 54 (Connection reset by peer))
  62. # [05:59] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
  63. # [06:06] * Quits: heycam (n=cam@clm-laptop.infotech.monash.edu.au) ("bye")
  64. # [06:25] * Quits: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz) ("ChatZilla 0.9.79-rdmsoft [XULRunner 1.8.0.9/2006120508]")
  65. # [06:38] * Quits: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
  66. # [06:47] * Quits: karlushi (n=karl@216.144.126.222) (Read error: 113 (No route to host))
  67. # [06:50] * Joins: ap (n=ap@195.239.126.12)
  68. # [06:57] * Quits: Sephr (n=Sephr@c-68-38-250-93.hsd1.pa.comcast.net) ("Sephr.net")
  69. # [07:03] * Joins: harig (n=harig_in@122.160.12.230)
  70. # [07:16] * Joins: aboodman2 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
  71. # [07:41] * Joins: maikmerten (n=merten@ls5dhcp195.cs.uni-dortmund.de)
  72. # [07:44] * Joins: heycam (n=cam@210-84-45-25.dyn.iinet.net.au)
  73. # [07:54] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
  74. # [07:54] * Joins: jacobolus_ (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
  75. # [07:56] * Quits: aboodman2 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
  76. # [08:01] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  77. # [08:09] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  78. # [08:09] * Joins: weinig (n=weinig@c-69-181-81-233.hsd1.ca.comcast.net)
  79. # [08:17] * Quits: jacobolus_ (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
  80. # [08:18] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
  81. # [08:27] * Joins: pesla (n=retep@procurios.xs4all.nl)
  82. # [08:34] * Joins: pergj (n=pergj@195.159.61.155)
  83. # [08:38] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 131 (Connection reset by peer))
  84. # [08:38] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
  85. # [08:53] * Quits: pergj (n=pergj@195.159.61.155) ("Ex-Chat")
  86. # [08:55] * Joins: pergj (n=pergj@195.159.61.155)
  87. # [08:55] * Quits: pergj (n=pergj@195.159.61.155) (Remote closed the connection)
  88. # [08:56] * Joins: pergj (n=pergj@195.159.61.155)
  89. # [08:58] * Quits: pergj (n=pergj@195.159.61.155) (Client Quit)
  90. # [09:00] * Joins: pergj (n=pergj@195.159.61.155)
  91. # [09:01] * Quits: pergj (n=pergj@195.159.61.155) (Client Quit)
  92. # [09:02] * Joins: pergj (n=pergj@195.159.61.155)
  93. # [09:38] * Quits: harig (n=harig_in@122.160.12.230) (Read error: 110 (Connection timed out))
  94. # [09:48] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  95. # [09:49] * Joins: yecril71 (n=giecrilj@piekna-gts.2a.pl)
  96. # [09:50] <yecril71> The advantage of having window for global scope is that otherwise you would not be able to differentiate between local and global.
  97. # [09:52] <yecril71> It does not cover all identifiers, e.g. it does not apply to constants and class names, but it is useful nevertheless.
  98. # [09:56] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  99. # [10:00] <yecril71> Modern blogs and wikis allow users to embed images in editable content.
  100. # [10:00] * Joins: Maurice (n=copyman@5ED548D4.cable.ziggo.nl)
  101. # [10:09] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
  102. # [10:16] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
  103. # [10:21] * Parts: erlehmann (n=erlehman@86.59.25.121)
  104. # [10:24] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
  105. # [10:26] * Joins: danbri (n=danbri@ip565f6edb.direct-adsl.nl)
  106. # [10:38] <Philip`> jwalden: That's just for the tokeniser
  107. # [10:38] <jwalden> okay, I *think* that's analogous
  108. # [10:40] * jgraham tries to check in changes to html5lib, gets caught by merge errors, cries
  109. # [10:40] <Philip`> jwalden: (The tree constructor algorithm is more like http://philip.html5.org/misc/insertion-modes-4.svg but that's about nine months out of date)
  110. # [10:41] <jwalden> tables
  111. # [10:42] <jwalden> bleh, let's just get rid of 'em
  112. # [10:42] <jwalden> :-)
  113. # [11:02] * Joins: ROBOd (n=robod@89.122.216.38)
  114. # [11:04] * Philip` remembers he used to have something that split out the content model flags like http://canvex.lazyilluminati.com/misc/states10.png but can't find the code anywhere :-(
  115. # [11:09] * Joins: erlehmann (n=erlehman@86.59.25.121)
  116. # [11:26] * Quits: Lachy (n=Lachlan@85.196.122.246) ("This computer has gone to sleep")
  117. # [11:33] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
  118. # [11:40] * Joins: Lachy (n=Lachlan@pat-tdc.opera.com)
  119. # [11:44] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  120. # [12:13] * Quits: hdh (n=hdh@58.187.23.189) ("Leaving.")
  121. # [12:22] * Quits: jwalden (n=waldo@c-67-180-39-55.hsd1.ca.comcast.net) (Connection reset by peer)
  122. # [12:24] <gsnedders> Philip`: Have you looked in /dev/null?
  123. # [12:44] * Quits: MikeSmith (n=MikeSmit@58.157.21.205) ("sex break")
  124. # [12:49] * Joins: mookid (i=mookid@ROFL.name)
  125. # [13:02] * Quits: Kuruma (n=Kuruman@h116-000-163-146.catv01.catv-yokohama.ne.jp) (Remote closed the connection)
  126. # [13:05] * Joins: Kuruma (n=Kuruman@h116-000-163-146.catv01.catv-yokohama.ne.jp)
  127. # [13:18] * Quits: ap (n=ap@195.239.126.12)
  128. # [13:21] <Philip`> gsnedders: Yes, but I couldn't find anything in there
  129. # [13:25] <Philip`> http://canvas.quaese.de/ looks like a handy canvas tutorial, if you speak German
  130. # [13:27] <hsivonen> it should be in /dev/random along with the works of Shakespeare
  131. # [13:37] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  132. # [13:38] * Joins: karlcow (n=karl@216.144.126.222)
  133. # [13:38] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  134. # [13:39] <jgraham> Philip`: Any good ideas about how to implement the character encoding reparsing stuff in html5lib?
  135. # [13:39] <jgraham> s/encoding/encoding switching/
  136. # [13:42] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  137. # [13:42] <Philip`> jgraham: I know almost entirely nothing about how character encoding works in HTML5 or html5lib or Python, so I have no ideas :-(
  138. # [14:00] <jgraham> Philip`: What I know: If we hit a meta element we need to either be sure that all the characters consumed so far have the same encoding as the previous characters or restart the parsing. The underlying file-like object may not natively support reseeking to the beginning so we either have to reread it or buffer the whole thing ourselves.
  139. # [14:01] <Philip`> We already try to buffer the first 10KB of the stream as soon as you start parsing it
  140. # [14:02] <gsnedders> jgraham: We don't want to re-read it if it's a urllib object of a POST request, for example
  141. # [14:02] <gsnedders> jgraham: So we probably need to buffer it
  142. # [14:02] <jgraham> I _think_ we need to buffer the raw character data before replacement characters are inserted and line breaks are normalised
  143. # [14:03] <Philip`> jgraham: Oh, that sounds true, and we only buffer the post-preprocessed input stream
  144. # [14:03] <Philip`> Is there some fixed limit on how much would need to be buffered?
  145. # [14:04] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  146. # [14:04] <jgraham> Philip`: AFAIK, no
  147. # [14:04] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  148. # [14:04] <jgraham> So maybe we should make a BufferedStream type that adds a .tell() and .seek() method to non-buffered streams
  149. # [14:04] <jgraham> By storing all the read data in a buffer
  150. # [14:04] <jgraham> (there is something like this already but it is not quite what we want)
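A minimal sketch of the kind of wrapper jgraham is proposing, assuming only that the underlying object has a `read(size)` method returning bytes. The class below is illustrative and is not html5lib's actual implementation; it keeps every byte it has read so that `.tell()` and `.seek()` work on streams that cannot rewind.

```python
class BufferedStream:
    """Wrap a read-only byte stream, remembering all data read so far."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = bytearray()  # every byte read from the source so far
        self._pos = 0               # current read position within the buffer

    def tell(self):
        return self._pos

    def seek(self, pos):
        # Only positions inside the already-buffered data are reachable.
        if not 0 <= pos <= len(self._buffer):
            raise ValueError("can only seek within data already read")
        self._pos = pos

    def read(self, size):
        """Read up to `size` bytes (size >= 0), replaying the buffer first."""
        missing = self._pos + size - len(self._buffer)
        if missing > 0:
            # Pull just enough fresh data from the underlying stream.
            self._buffer.extend(self._stream.read(missing))
        chunk = bytes(self._buffer[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk
```

Because the wrapper stores raw bytes, it sits before any decoding or preprocessing, which matches the point above about buffering the data before replacement characters are inserted. The cost is that the buffer grows without bound unless it is discarded once the encoding is certain.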
  151. # [14:05] <Philip`> So for a document that never confidently declares a character encoding, the entire thing will be buffered in memory?
  152. # [14:05] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  153. # [14:05] <Philip`> In that case, we could just slurp the entire stream into a single string at the start, and then parse that
  154. # [14:05] <jgraham> Oh but then there is another problem because when we hit the <meta> element we don't know where in the unprocessed stream we are
  155. # [14:05] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  156. # [14:06] <jgraham> (assuming we want to read that in chunks for the sake of efficiency)
  157. # [14:06] <Philip`> Why does it matter where we are in the unprocessed stream?
  158. # [14:07] <Philip`> If the encoding changes incompatibly, it would just have to seek to 0 and start again, and it wouldn't matter where it had changed
  159. # [14:07] <Philip`> Oh
  160. # [14:08] <jgraham> It matters if we want to continue without reparsing if the encoding is compatible
  161. # [14:08] <Philip`> but it needs to work out whether anything has changed incompatibly, up to the end of the meta charset element
  162. # [14:08] <Philip`> which means it needs to know where it's read up to
  163. # [14:08] <Philip`> Oh, and that too
  164. # [14:09] <Philip`> It'd be easier if html5lib decided not to be a "user agent [that] supports changing the converter on the fly"
  165. # [14:09] <jgraham> Yeah, we could ignore that for now
  166. # [14:09] <jgraham> (but it would be a perf. in if we supported it)
  167. # [14:09] <jgraham> /in/win/
  168. # [14:10] <jgraham> (assuming supporting it didn't place an undue burden on the implementation)
  169. # [14:10] <Philip`> jgraham: (Probably not much of one, since meta charset will typically be near the start of the document and it wouldn't have to reparse much at all)
  170. # [14:10] <takkaria> Hubbub doesn't allow changing the convertor on the fly, it just reparses
  171. # [14:10] <Philip`> takkaria: Is its input a stream or a string or something?
  172. # [14:11] <takkaria> yes, a string, so not a particularly useful comment from me there. :)
  173. # [14:12] <Philip`> jgraham: Is there a reason why html5lib should use streams rather than slurping everything into a string?
  174. # [14:12] <Philip`> Memory is cheap, after all :-)
  175. # [14:13] <jgraham> Philip`: It seems nicer? (especially for long strings). Also, we could, in principle, throw the buffer away once the encoding confidence was certain
  176. # [14:14] <takkaria> more properly, what hubbub actually does is call a "character encoding change" hook, which then can set a flag on the tokeniser so that it stops parsing and returns the new character encoding
  177. # [14:14] <takkaria> and then the app that's using hubbub has to send the data in again
  178. # [14:17] * Philip` mostly just wants to reduce the overhead of calling char(), to make things much faster, but that seems independent of the encoding-related buffering/reparsing issue since it's on the opposite side of the decoder
  179. # [14:18] * Quits: karlcow (n=karl@216.144.126.222) ("This computer has gone to sleep")
  180. # [14:20] <Philip`> BufferedStream with .seek_to_zero() (and reparse when the encoding changes, don't do the complex changing-on-the-fly thing) sounds like the sanest approach, I guess
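The "reparse from zero, never switch converters on the fly" strategy might look something like the sketch below. Everything here is an assumption made for illustration: `toy_parse` stands in for a real parser, `EncodingChanged` is a made-up exception, the regular expression only recognises the simplest `<meta charset=...>` form, and `windows-1252` is just a placeholder tentative encoding.

```python
import re

# Simplified detector; a real parser finds the declaration while tokenising.
META_CHARSET = re.compile(r'<meta\s+charset=["\']?([\w-]+)', re.IGNORECASE)


class EncodingChanged(Exception):
    """Raised by the toy parser when <meta> contradicts the decoder in use."""

    def __init__(self, encoding):
        super().__init__(encoding)
        self.encoding = encoding


def toy_parse(raw, encoding):
    """Stand-in for a real parser: decode, and bail out if <meta> disagrees."""
    text = raw.decode(encoding, errors="replace")
    match = META_CHARSET.search(text)
    if match and match.group(1).lower() != encoding.lower():
        raise EncodingChanged(match.group(1).lower())
    return text


def parse_with_restart(raw, tentative="windows-1252"):
    """Parse from offset zero; on an encoding change, start again once."""
    try:
        return toy_parse(raw, tentative), tentative
    except EncodingChanged as change:
        # No converter switching on the fly: discard the work and reparse.
        return toy_parse(raw, change.encoding), change.encoding
```

The driver never needs to know how far into the raw data the declaration was found, which is the simplification being argued for; the price is redoing the parse of whatever preceded the declaration.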
  181. # [14:21] * jgraham wonders if hsivonen solved this issue
  182. # [14:21] <jgraham> Philip`: OK, I will look at that at some point soon
  183. # [14:22] <jgraham> (like maybe this evening)
  184. # [14:24] <hsivonen> for Java, I figured that it happens too often that the buffering in the character decoder causes non-ASCII to be buffered by the time of changing encodings
  185. # [14:24] <hsivonen> so I decided to remove support for changing decoders in place
  186. # [14:25] <hsivonen> instead, the java.io-based driver restarts the parse unconditionally when changing encodings
  187. # [14:26] <hsivonen> I intend to implement the same strategy for Gecko, but the current Gecko behavior is different, so I'm not sure if the spec as currently written is completely Web-compatible here
  188. # [14:26] <hsivonen> We'll see
  189. # [14:26] <jgraham> hsivonen: What does Gecko do?
  190. # [14:27] <hsivonen> jgraham: I don't understand what it does.
  191. # [14:27] <hsivonen> jgraham: my hypothesis is that it reparses if scripts haven't run and changes decoders in place if scripts have run
  192. # [14:38] * Quits: danbri (n=danbri@unaffiliated/danbri)
  193. # [14:43] * Joins: tndH (n=Rob@adsl-83-100-138-116.karoo.KCOM.COM)
  194. # [14:45] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  195. # [14:51] * Quits: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net) (Remote closed the connection)
  196. # [14:51] * Joins: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
  197. # [14:55] * Joins: karlcow (n=karl@modemcable168.84-81-70.mc.videotron.ca)
  198. # [15:05] * Joins: aroben (i=aroben@unaffiliated/aroben)
  199. # [15:12] * Joins: aroben_ (i=aroben@unaffiliated/aroben)
  200. # [15:22] * Quits: Hish (n=chatzill@mail2.n-e-s.de) (Remote closed the connection)
  201. # [15:24] <jgraham> Philip` or someone - let me know if I just horribly broke html5lib in some way and I'll back out the change (I checked in more than I intended to anyway)
  202. # [15:25] <Philip`> jgraham: It already fails enough test cases that I probably wouldn't notice if all the rest started breaking too :-)
  203. # [15:27] * Quits: aroben (i=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  204. # [15:29] * aroben_ is now known as aroben
  205. # [15:45] <jgraham> Philip`: BTW, I think it should be a little faster now
  206. # [15:46] * Philip` sees ihatexml.py
  207. # [15:51] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
  208. # [15:53] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  209. # [15:53] * Joins: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net)
  210. # [15:56] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  211. # [16:01] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  212. # [16:03] * Quits: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
  213. # [16:10] * Joins: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
  214. # [16:21] * Joins: ap (n=ap@195.239.126.11)
  215. # [16:30] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  216. # [16:32] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  217. # [16:32] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  218. # [16:33] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  219. # [16:37] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  220. # [16:38] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  221. # [16:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  222. # [16:38] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
  223. # [16:40] * Quits: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
  224. # [16:43] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  225. # [16:43] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  226. # [16:51] * Joins: myakura (n=myakura@p3156-ipbf1910marunouchi.tokyo.ocn.ne.jp)
  227. # [16:55] * Quits: pergj (n=pergj@195.159.61.155) (Read error: 110 (Connection timed out))
  228. # [16:55] * Quits: maikmerten (n=merten@ls5dhcp195.cs.uni-dortmund.de) (Remote closed the connection)
  229. # [17:09] * Joins: dglazkov (n=dglazkov@nat/google/x-099d3ce636b3234b)
  230. # [17:21] * Quits: pesla (n=retep@procurios.xs4all.nl) ("( www.nnscript.com :: NoNameScript 4.21 :: www.esnation.com )")
  231. # [17:29] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Read error: 60 (Operation timed out))
  232. # [17:42] * Joins: mlpug (n=user@a88-115-168-225.elisa-laajakaista.fi)
  233. # [17:46] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
  234. # [17:46] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) (Remote closed the connection)
  235. # [17:51] * Quits: Lachy (n=Lachlan@pat-tdc.opera.com) ("This computer has gone to sleep")
  236. # [17:51] * Philip` generates twelve thousand tokeniser test cases, and finds one bug in html5lib
  237. # [17:54] <gsnedders> Philip`: Then you don't have enough test cases
  238. # [17:56] <Philip`> gsnedders: I can't think of any more test cases to add, since I have one case for each interesting character that can occur from every tokeniser state
  239. # [18:00] <gsnedders> Philip`: Do you test every possible unicode character in every state?
  240. # [18:01] <gsnedders> No, you don't.
  241. # [18:01] <Philip`> gsnedders: No, because those aren't interesting characters
  242. # [18:01] <gsnedders> Philip`: That doesn't mean there aren't interesting bugs
  243. # [18:02] <Philip`> gsnedders: It means it's very unlikely that there will be bugs, because I test all the characters that a sane tokeniser would depend on, and every other character is equivalent and has no special processing
  244. # [18:03] <gsnedders> Philip`: You are assuming tokenizers are sane, which is very naïve
  245. # [18:19] * Quits: weinig (n=weinig@c-69-181-81-233.hsd1.ca.comcast.net)
  246. # [18:27] * Quits: myakura (n=myakura@p3156-ipbf1910marunouchi.tokyo.ocn.ne.jp) ("Leaving...")
  247. # [18:27] <takkaria> Philip`: please do make those testcases public. :)
  248. # [18:33] <jruderman_> Philip`: i bet you'd find more bugs by fuzzing than by trying to be exhaustive wrt one aspect of parsing
  249. # [18:45] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
  250. # [18:51] * Joins: weinig (n=weinig@17.203.15.158)
  251. # [19:01] * Joins: weinig_ (n=weinig@nat/apple/x-d841b18ac91e3904)
  252. # [19:01] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  253. # [19:01] * dave_levin is now known as dave_levin|AWAY
  254. # [19:04] * Joins: shepazu (n=schepers@mo-76-0-60-125.dhcp.embarqhsd.net)
  255. # [19:16] * Quits: weinig (n=weinig@17.203.15.158) (Read error: 110 (Connection timed out))
  256. # [19:18] * Quits: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net) (Read error: 60 (Operation timed out))
  257. # [19:20] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  258. # [19:22] * weinig_ is now known as weinig
  259. # [19:22] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  260. # [19:22] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  261. # [19:29] * Quits: drry (n=drry@it17.opt2.point.ne.jp)
  262. # [19:32] * Joins: drry (n=drry@it17.opt2.point.ne.jp)
  263. # [19:32] <Philip`> takkaria: I think it'd be a bad idea to add them all into html5lib, but I could just upload them to the web somewhere
  264. # [19:32] <gsnedders> Philip`: Add them all into html5lib, please.
  265. # [19:32] * Joins: jwalden_ (n=waldo@corp-241.mountainview.mozilla.com)
  266. # [19:32] * jwalden_ is now known as jwalden
  267. # [19:34] <Philip`> jruderman_: This seems like a case where exhaustiveness is relatively feasible, since there's an algorithm with a well-defined series of states and state transitions, and most implementations are pretty close to that definition, so it works at providing decent coverage of the implementations
  268. # [19:34] <Philip`> gsnedders: Why?
  269. # [19:34] <Philip`> gsnedders: Also: No
  270. # [19:42] <gsnedders> Philip`: Because then we have test cases located in one place
  271. # [19:43] * Joins: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net)
  272. # [19:44] <Philip`> gsnedders: But if there's twelve thousand tokeniser tests, and it takes ages to run them all, people will run the tests less often, which is detrimental
  273. # [19:45] <gsnedders> Philip`: But if they aren't there then they won't be wrong, which is detrimental
  274. # [19:45] <gsnedders> s/wrong/run/
  275. # [19:45] <gsnedders> Interesting typo.
  276. # [19:46] <Philip`> gsnedders: It's only detrimental if they would have caught a bug that the remaining tests would have missed
  277. # [19:47] <Philip`> (and most of these tests are very redundant)
  278. # [19:47] * Quits: aroben (i=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
  279. # [19:47] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) ("This computer has gone to sleep")
  280. # [19:48] <Philip`> (e.g. there are tests for "<!DOCTYPEa", "<!DOCTYPEb", "<!DOCTYPEy", "<!DOCTYPEz", "<!DOCTYPEA", ...)
  281. # [19:48] <Philip`> Also, if I did check in all these tests, and then the spec changed, someone would find hundreds of errors and get really annoyed trying to manually fix all the test cases
  282. # [19:50] <Philip`> Oops, there's only actually about one thousand tests, since I didn't sufficiently uniquify them
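A rough sketch of the generation scheme under discussion: one input per interesting character from each tokeniser state, with duplicates removed and the result sorted on the input strings. The prefix and character lists are hypothetical samples, not Philip`'s actual data, and `generate_inputs` is an assumed name.

```python
# Hypothetical sample: input prefixes that drive a tokeniser into some state.
STATE_PREFIXES = ["", "<", "</", "<!", "<!DOCTYPE", "<a ", "<a b="]

# Characters a tokeniser is likely to special-case, plus ordinary letters.
INTERESTING_CHARS = ["<", ">", "/", "!", "=", '"', "'", "&", " ", "a", "A"]


def generate_inputs(prefixes, chars):
    """Return sorted, duplicate-free inputs: every prefix + every character."""
    seen = set()
    for prefix in prefixes:
        for char in chars:
            # The set silently drops inputs reachable by more than one route.
            seen.add(prefix + char)
    return sorted(seen)
```

Collecting into a set before sorting is what "uniquifying" amounts to here: if two states share a prefix, or a character appears twice in the list, the same input is only emitted once.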
  283. # [20:03] <gsnedders> That certainly isn't too many.
  284. # [20:05] * Joins: annevk (n=annevk@53530B04.cable.casema.nl)
  285. # [20:14] <gsnedders> ergh. This is going to be horrible. Having the same @cite over and over again.
  286. # [20:14] <gsnedders> Meh.
  287. # [20:17] * Quits: weinig (n=weinig@nat/apple/x-d841b18ac91e3904) (Remote closed the connection)
  288. # [20:17] * Joins: weinig (n=weinig@17.203.15.158)
  289. # [20:22] <Philip`> http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/test3.test
  290. # [20:22] <Philip`> Happy now? :-p
  291. # [20:23] <Philip`> (That's about 1500, after I stopped stupidly failing to remove duplicates)
  292. # [20:23] <Philip`> takkaria: There's some new tests for you to run if you fancy it :-)
  293. # [20:25] <gsnedders> Philip`: :)
  294. # [20:54] <Dashiva> Are the tests sorted in order of relevance? :)
  295. # [20:58] <Philip`> I don't have a way to quantitatively determine relevance, so they're just sorted on the input strings :-p
  296. # [20:58] * Joins: aroben (n=adamrobe@c-69-142-103-232.hsd1.pa.comcast.net)
  297. # [21:00] * Quits: ROBOd (n=robod@89.122.216.38) (Excess Flood)
  298. # [21:01] * Joins: ROBOd (n=robod@89.122.216.38)
  299. # [21:04] * Joins: aaronlev (n=chatzill@e176230253.adsl.alicedsl.de)
  300. # [21:19] * Parts: annevk (n=annevk@53530B04.cable.casema.nl)
  301. # [21:21] * Quits: dolske (n=dolske@firefox/developer/dolske)
  302. # [21:27] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  303. # [21:27] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  304. # [21:34] * Quits: yecril71 (n=giecrilj@piekna-gts.2a.pl)
  305. # [21:34] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
  306. # [21:35] * Quits: mlpug (n=user@a88-115-168-225.elisa-laajakaista.fi) (Remote closed the connection)
  307. # [21:35] <gsnedders> Is it reasonable to write notes for English in HTML?
  308. # [21:37] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  309. # [21:38] <jruderman_> "for English"? as in classroom lecture notes?
  310. # [21:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  311. # [21:38] <gsnedders> jruderman_: Well, not lecture notes, but for my final year of school (in the en-gb meaning of school)
  312. # [21:38] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  313. # [21:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
  314. # [21:39] <jruderman_> i used HTML for a few papers in college
  315. # [21:39] <jruderman_> and TeX for others
  316. # [21:39] <gsnedders> These are notes for my dissertation: I intend on doing the dissertation itself using XeTeX
  317. # [21:40] <jruderman_> i liked using HTML because i could easily tweak styles across an entire document. wysiwyg word processors usually don't do that well.
  318. # [21:41] <jruderman_> for example, if i needed to pad my paper a little, it was a simple matter of p { line-height: 1.05em; }
  319. # [21:41] <jruderman_> slightly less obvious than changing the font size ;)
  320. # [21:41] <gsnedders> For notes that isn't so needed :)
  321. # [21:41] <jruderman_> hehe
  322. # [21:42] <jruderman_> still useful to be able to change the styles of all the headings at once, though
  323. # [21:42] <jruderman_> another advantage of HTML is that you can put the notes on your web site and not worry about what software viewers have ;)
  324. # [21:45] <Philip`> Is text/plain inadequate for notes?
  325. # [21:45] <gsnedders> Philip`: Yes
  326. # [21:45] <Philip`> Why?
  327. # [21:45] <gsnedders> Philip`: Can't so easily build TOCs for text/plain :)
  328. # [21:46] <Philip`> Why do notes need a TOC?
  329. # [21:46] <Philip`> Just use your editor's 'find' feature if you want to go to a certain section :-)
  330. # [21:46] * gsnedders now has a header element
  331. # [21:49] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
  332. # [21:50] <gsnedders> I need automatic indexing in anolis
  333. # [21:50] * Joins: annevk (n=annevk@77.163.243.203)
  334. # [21:50] <gsnedders> I do like how I mention that then someone who asked for it comes along
  335. # [22:01] <gsnedders> Anyone have views on how to mark up a bibliography?
  336. # [22:05] <Philip`> I suggest putting it in <cite>
  337. # [22:06] <gsnedders> Philip`: "The cite element represents the title of a work"
  338. # [22:06] <Philip`> Who cares what specs say?
  339. # [22:06] <Philip`> You're citing stuff, so use <cite> - it makes perfect sense
  340. # [22:06] <gsnedders> It does, but Hixie's stupid.
  341. # [22:09] <hsivonen> what classes of products is http://www.w3.org/TR/2008/WD-XForms-for-HTML-20081219/ supposed to be normative on?
  342. # [22:10] <hsivonen> gsnedders: my view about marking up a bibliography: http://hsivonen.iki.fi/thesis/html5-conformance-checker#references
  343. # [22:11] <gsnedders> hsivonen: That doesn't conform to ISO 690, though
  344. # [22:11] <gsnedders> I mean, sure, I can use classes, but what do I gain?
  345. # [22:12] <hsivonen> gsnedders: you probably don't gain anything
  346. # [22:12] * gsnedders links urn:isbn:0-330-29666-3
  347. # [22:13] <hsivonen> gsnedders: bibliography formats that don't show the first name of the authors in full suck
  348. # [22:13] <hsivonen> gsnedders: they are bad for googling and disapproved by feminists
  349. # [22:14] <hsivonen> gsnedders: also, emphasizing author names over titles of works sucks when you are mostly referencing specs and technical documents some of which conceal their authors/editors
  350. # [22:15] <gsnedders> hsivonen: I'm referencing a book for English work, so that isn't relevant :)
  351. # [22:16] <hsivonen> gsnedders: you could still make the argument that in cultural contexts where the surname of the author is the surname of the spouse, abbreviating the first name of the author diminishes the personal identifier of the author to one letter, which is uncool
  352. # [22:17] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
  353. # [22:19] <hsivonen> gsnedders: besides, I suggest making references in a way that you can GET without paying CHF 72
  354. # [22:19] <gsnedders> :)
  355. # [22:20] * Philip` realises that efficiently cutting wrapping paper for varyingly-sized presents is probably a bin packing problem and therefore NP-hard, which is totally unfair
  356. # [22:21] * Quits: jwalden (n=waldo@corp-241.mountainview.mozilla.com) ("ChatZilla 0.9.82.1-rdmsoft [XULRunner 1.8.0.9/2006120508]")
  357. # [22:29] * Joins: Lachy (n=Lachlan@85.196.122.246)
  358. # [22:31] * Joins: jwalden_ (n=waldo@corp-241.mountainview.mozilla.com)
  359. # [22:31] * jwalden_ is now known as jwalden
  360. # [22:31] * gsnedders tries to follow the Oxford Guide to Style
  361. # [22:32] * gsnedders comes up with the probably stupid, "Vladimir Nabokov, The Enchanter [En. trans. of Volshebnik] (trans. Dmitri Nabokov) (London: Pan Books Ltd, 1987) (ISBN 0-330-29666-3)."
  362. # [22:33] * Quits: dolske (n=dolske@firefox/developer/dolske)
  363. # [22:35] * Philip` suggests focussing on the parts of the dissertation that are likely to result in marks :-)
  364. # [22:36] <hsivonen> gsnedders: at least it's positive that they approve of listing the ISBN
  365. # [22:36] <gsnedders> hsivonen: They don't, I ignored that part.
  366. # [22:36] <gsnedders> :)
  367. # [22:36] <hsivonen> oh well
  368. # [22:36] <gsnedders> hsivonen: They do however, as with most of the style guide, give a lot more flexibility than almost anything else
  369. # [22:37] <gsnedders> Philip`: Yeah, I should :)
  370. # [22:38] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
  371. # [22:39] <gsnedders> Hixie: "A person's name is not the title of a work "
  372. # [22:40] <gsnedders> Hixie: Lolita's name is the title of the book about her!
  373. # [22:44] <Philip`> gsnedders: Only if you do a plain string comparison and ignore the context and semantics
  374. # [22:44] <gsnedders> Philip`: Oh, sure. :P
  375. # [22:44] <gsnedders> (and yes, I am doing my dissertation on such books)
  376. # [22:46] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
  377. # [22:47] * Joins: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl)
  378. # [22:47] * Quits: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl) (Read error: 104 (Connection reset by peer))
  379. # [22:50] * Joins: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl)
  380. # [22:52] * Quits: ap (n=ap@195.239.126.11)
  381. # [22:54] * Quits: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
  382. # [22:54] * Joins: virtuelv (n=virtuelv@74.80-202-66.nextgentel.com)
  383. # [22:55] <virtuelv> JohnResig: you around? You have a few broken links on http://docs.jquery.com/UI
  384. # [22:55] <virtuelv> (namely, all linked examples)
  385. # [23:14] * Joins: Lachy_ (n=Lachlan@rpl-ipsec-053.tip.csiro.au)
  386. # [23:26] * Quits: karlcow (n=karl@modemcable168.84-81-70.mc.videotron.ca) ("This computer has gone to sleep")
  387. # [23:28] * Joins: olliej (n=oliver@nat/apple/x-0d3562fa96f745ff)
  388. # [23:31] * Quits: Lachy (n=Lachlan@85.196.122.246) (Read error: 110 (Connection timed out))
  389. # [23:55] * Joins: Lachy__ (n=Lachlan@85.196.122.246)
  390. # Session Close: Tue Dec 23 00:00:00 2008
