  1. # Session Start: Sun Nov 22 00:00:00 2009
  2. # Session Ident: #whatwg
  3. # [00:06] * Joins: jonpierce (n=jonpierc@64.119.130.114)
  4. # [00:20] * Parts: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net)
  5. # [00:47] * Quits: gavin_ (n=gavin@firefox/developer/gavin) (Read error: 145 (Connection timed out))
  6. # [00:48] * Joins: gavin_ (n=gavin@firefox/developer/gavin)
  7. # [00:57] * Joins: nessy (n=Adium@203-214-159-50.dyn.iinet.net.au)
  8. # [01:04] * Joins: erlehmann (n=erlehman@1.106.113.82.net.de.o2.com)
  9. # [01:11] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  10. # [01:13] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net) (Client Quit)
  11. # [01:22] * Quits: jonpierce (n=jonpierc@64.119.130.114)
  12. # [01:23] * Joins: MikeSmith (n=MikeSmit@EM114-48-25-149.pool.e-mobile.ne.jp)
  13. # [01:23] * Quits: MikeSmith (n=MikeSmit@EM114-48-25-149.pool.e-mobile.ne.jp) (Client Quit)
  14. # [01:25] * Joins: Jeromche (n=cellshoc@201.141.210.83)
  15. # [01:29] * Quits: Jeromche (n=cellshoc@201.141.210.83)
  16. # [01:43] * Joins: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net)
  17. # [01:56] * Quits: tndH (n=Rob@cpc2-leed18-0-0-cust427.leed.cable.ntl.com) ("ChatZilla 0.9.85-rdmsoft [XULRunner 1.9.0.1/2008072406]")
  18. # [02:03] * Quits: archtech (i=stanv@83.228.56.37) (Client Quit)
  19. # [02:15] * Joins: Huvet (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se)
  20. # [02:16] <Huvet> hi everyone! I'm playing around with the html5lib 0.11 python implementation, and am wondering if I might have hit a bug: http://dpaste.com/hold/123513/
  21. # [02:16] <Huvet> I'm parsing the HTML of Swedish newspapers, which seems to be one of the worst messes in the world :(
  22. # [02:17] <Huvet> or, I could be doing something wrong, it would not be the first time :)
  23. # [02:27] * Quits: gavin_ (n=gavin@firefox/developer/gavin) (Read error: 110 (Connection timed out))
  24. # [02:28] * Joins: gavin_ (n=gavin@firefox/developer/gavin)
  25. # [02:30] * Quits: ttepasse (n=ttepas--@p5B014E4B.dip.t-dialin.net) ("?Q")
  26. # [02:42] <Huvet> the same error occurs on www.unt.se, and www.uhp.se too
  27. # [02:49] * Joins: gunderwonder (n=gunderwo@89.80-202-84.nextgentel.com)
  28. # [02:50] * Quits: paul_irish (n=paul_iri@64.119.130.114) (Remote closed the connection)
  29. # [02:53] * Joins: Arron (n=arronei@nat/microsoft/x-glkjiykrceibixcx)
  30. # [02:55] * Quits: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net) (sendak.freenode.net irc.freenode.net)
  31. # [02:55] * Quits: Huvet (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se) (sendak.freenode.net irc.freenode.net)
  32. # [02:55] * Quits: arronei (n=arronei@nat/microsoft/x-pcqorwlngqmvpyfw) (sendak.freenode.net irc.freenode.net)
  33. # [02:57] * Joins: Huvet (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se)
  34. # [02:57] * Joins: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net)
  35. # [02:59] * Joins: Huvet1 (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se)
  36. # [03:04] * Quits: Huvet1 (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se) ("Leaving.")
  37. # [03:10] * Quits: Huvet (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se) (Read error: 110 (Connection timed out))
  38. # [03:15] * Joins: jonpierce (n=jonpierc@209-6-91-231.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com)
  39. # [03:24] * Quits: jonpierce (n=jonpierc@209-6-91-231.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com)
  40. # [03:33] * Quits: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net) (sendak.freenode.net irc.freenode.net)
  41. # [03:33] * Joins: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net)
  42. # [03:47] * Joins: hobertoAtWork2 (n=hobertoa@gw1.mcgraw-hill.com)
  43. # [03:48] * Quits: hobertoAtWork (n=hobertoa@198.45.18.20) (Read error: 131 (Connection reset by peer))
  44. # [03:48] * Quits: TabAtkins (n=chatzill@70-139-15-246.lightspeed.rsbgtx.sbcglobal.net) (sendak.freenode.net irc.freenode.net)
  45. # [03:48] * Quits: ivan` (n=ivan@unaffiliated/ivan/x-000001) (sendak.freenode.net irc.freenode.net)
  46. # [03:48] * Quits: AryehGregor (n=Simetric@mediawiki/simetrical) (sendak.freenode.net irc.freenode.net)
  47. # [03:48] * Quits: jarib (i=jarib@li34-70.members.linode.com) (sendak.freenode.net irc.freenode.net)
  48. # [03:48] * Quits: vvv (n=vvv@mediawiki/VasilievVV) (sendak.freenode.net irc.freenode.net)
  49. # [03:48] * Quits: jgraham (n=jgraham@web22.webfaction.com) (sendak.freenode.net irc.freenode.net)
  50. # [03:49] * Joins: TabAtkins (n=chatzill@70-139-15-246.lightspeed.rsbgtx.sbcglobal.net)
  51. # [03:49] * Joins: ivan` (n=ivan@unaffiliated/ivan/x-000001)
  52. # [03:49] * Joins: AryehGregor (n=Simetric@mediawiki/simetrical)
  53. # [03:49] * Joins: jarib (i=jarib@li34-70.members.linode.com)
  54. # [03:49] * Joins: vvv (n=vvv@mediawiki/VasilievVV)
  55. # [03:49] * Joins: jgraham (n=jgraham@web22.webfaction.com)
  56. # [03:54] * Quits: Midler1 (n=midler@212.37.124.243) ("Leaving.")
  57. # [03:55] * Quits: ivan` (n=ivan@unaffiliated/ivan/x-000001) ("jumpin' jumpin'")
  58. # [03:55] * Joins: ivan` (n=ivan@unaffiliated/ivan/x-000001)
  59. # [03:55] * Joins: TabAtkins_ (n=chatzill@70-139-15-246.lightspeed.rsbgtx.sbcglobal.net)
  60. # [03:56] * Quits: jgraham (n=jgraham@web22.webfaction.com) (Read error: 131 (Connection reset by peer))
  61. # [03:56] * Joins: jgraham (n=jgraham@web22.webfaction.com)
  62. # [03:56] * Quits: TabAtkins (n=chatzill@70-139-15-246.lightspeed.rsbgtx.sbcglobal.net) (Read error: 131 (Connection reset by peer))
  63. # [03:56] * TabAtkins_ is now known as TabAtkins
  64. # [03:56] * Joins: jarib_ (i=jarib@li34-70.members.linode.com)
  65. # [03:56] * Quits: jarib (i=jarib@li34-70.members.linode.com) (Read error: 131 (Connection reset by peer))
  66. # [04:00] * Joins: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net)
  67. # [04:02] * Quits: vvv (n=vvv@mediawiki/VasilievVV) (sendak.freenode.net irc.freenode.net)
  68. # [04:02] * Quits: AryehGregor (n=Simetric@mediawiki/simetrical) (sendak.freenode.net irc.freenode.net)
  69. # [04:04] * Joins: AryehGregor (n=Simetric@mediawiki/simetrical)
  70. # [04:04] * Joins: vvv (n=vvv@mediawiki/VasilievVV)
  71. # [04:08] * Quits: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net) (sendak.freenode.net irc.freenode.net)
  72. # [04:09] * Joins: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net)
  73. # [04:11] * Quits: othermaciej (n=mjs@c-69-181-42-237.hsd1.ca.comcast.net)
  74. # [04:12] * Quits: wm3|bed (n=davidwor@cpc3-bagu10-0-0-cust651.1-3.cable.virginmedia.com)
  75. # [04:14] * Quits: gunderwonder (n=gunderwo@89.80-202-84.nextgentel.com) (Read error: 110 (Connection timed out))
  76. # [04:15] * Joins: miketaylr (n=miketayl@24.42.95.234)
  77. # [04:15] * Quits: miketaylr (n=miketayl@24.42.95.234) (Remote closed the connection)
  78. # [04:15] * Joins: miketaylr (n=miketayl@24.42.95.234)
  79. # [04:26] * Joins: Lachy (n=Lachlan@85.196.122.246)
  80. # [04:40] * Parts: bentomas (n=bentomas@c-24-9-8-90.hsd1.co.comcast.net)
  81. # [04:43] * Joins: workmad3 (n=davidwor@cpc3-bagu10-0-0-cust651.1-3.cable.virginmedia.com)
  82. # [04:47] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  83. # [04:55] * Quits: wakaba_0 (n=wakaba_@206.63.138.58.dy.bbexcite.jp) (Read error: 110 (Connection timed out))
  84. # [04:57] * Joins: wakaba_ (n=wakaba_@122x221x184x68.ap122.ftth.ucom.ne.jp)
  85. # [05:09] * Joins: riven` (n=colin@53518387.cable.casema.nl)
  86. # [05:12] * Quits: riven (n=colin@53518387.cable.casema.nl) (Connection reset by peer)
  87. # [05:12] * Joins: arronei (n=arronei@nat/microsoft/x-klwxenpiknmjwrct)
  88. # [05:19] * Joins: abii (n=macbook@rescomp-09-148450.Stanford.EDU)
  89. # [05:20] * Quits: Arron (n=arronei@nat/microsoft/x-glkjiykrceibixcx) (Read error: 110 (Connection timed out))
  90. # [05:47] * Quits: miketaylr (n=miketayl@24.42.95.234) ("Leaving...")
  91. # [06:21] * Joins: Dashimon (i=Dashiva@m223j.studby.ntnu.no)
  92. # [06:24] * Joins: miketaylr (n=miketayl@24.42.95.234)
  93. # [06:24] * Quits: miketaylr (n=miketayl@24.42.95.234) (Remote closed the connection)
  94. # [06:34] * Quits: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net) ("Leaving.")
  95. # [06:38] * Quits: Dashiva (i=Dashiva@wikia/Dashiva) (Read error: 110 (Connection timed out))
  96. # [06:38] * Dashimon is now known as Dashiva
  97. # [06:52] * Joins: paul_irish (n=paul_iri@c-71-192-163-128.hsd1.nh.comcast.net)
  98. # [07:00] * Quits: gavin_ (n=gavin@firefox/developer/gavin) (Read error: 110 (Connection timed out))
  99. # [07:00] * Joins: gavin_ (n=gavin@firefox/developer/gavin)
  100. # [07:18] * Joins: MikeSmith (n=MikeSmit@EM114-48-9-94.pool.e-mobile.ne.jp)
  101. # [07:19] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  102. # [07:19] * Quits: GPH-Laptop (n=GPHemsle@pdpc/supporter/student/GPHemsley) (Read error: 104 (Connection reset by peer))
  103. # [07:25] * Joins: harig (i=harig@121.245.103.44)
  104. # [07:42] * Joins: archtech (i=stanv@83.228.56.37)
  105. # [07:44] * Quits: harig (i=harig@121.245.103.44) (sendak.freenode.net irc.freenode.net)
  106. # [07:44] * Quits: Lachy (n=Lachlan@85.196.122.246) (sendak.freenode.net irc.freenode.net)
  107. # [07:44] * Quits: vvv (n=vvv@mediawiki/VasilievVV) (sendak.freenode.net irc.freenode.net)
  108. # [07:44] * Quits: AryehGregor (n=Simetric@mediawiki/simetrical) (sendak.freenode.net irc.freenode.net)
  109. # [07:45] * Joins: harig (i=harig@121.245.103.44)
  110. # [07:45] * Joins: Lachy (n=Lachlan@85.196.122.246)
  111. # [07:45] * Joins: AryehGregor (n=Simetric@mediawiki/simetrical)
  112. # [07:45] * Joins: vvv (n=vvv@mediawiki/VasilievVV)
  113. # [07:51] * Joins: jonpierce (n=jonpierc@209-6-91-231.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com)
  114. # [07:54] * Quits: gavin_ (n=gavin@firefox/developer/gavin) (Read error: 110 (Connection timed out))
  115. # [07:55] * Joins: gavin_ (n=gavin@firefox/developer/gavin)
  116. # [08:06] * Quits: archtech (i=stanv@83.228.56.37) (Client Quit)
  117. # [08:15] * Quits: vvv (n=vvv@mediawiki/VasilievVV) ("KVIrc Insomnia 4.0.0, revision: 3410, sources date: 20090703, built on: 2009/08/12 22:29:13 UTC http://www.kvirc.net/")
  118. # [08:23] * Quits: jonpierce (n=jonpierc@209-6-91-231.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com)
  119. # [08:42] * Quits: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net) ("8403864 bytes have been tenured, next gc will be global.")
  120. # [09:14] * Joins: GPHemsley (n=GPHemsle@69.113.158.192)
  121. # [09:16] * Quits: harig (i=harig@121.245.103.44) (Read error: 145 (Connection timed out))
  122. # [09:40] * Joins: zcorpan (n=zcorpan@c83-252-193-59.bredband.comhem.se)
  123. # [09:47] * Joins: archtech (i=stanv@83.228.56.37)
  124. # [09:57] * Quits: zcorpan (n=zcorpan@c83-252-193-59.bredband.comhem.se) (Read error: 110 (Connection timed out))
  125. # [09:58] * Joins: zcorpan (n=zcorpan@c83-252-193-59.bredband.comhem.se)
  126. # [10:14] * Joins: Maurice (i=copyman@94.213.72.212)
  127. # [10:16] * Quits: zcorpan (n=zcorpan@c83-252-193-59.bredband.comhem.se) (Read error: 110 (Connection timed out))
  128. # [10:43] * Joins: ROBOd (n=robod@89.122.216.38)
  129. # [11:19] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
  130. # [11:26] * Joins: wakaba_0 (n=wakaba_@122x221x184x68.ap122.ftth.ucom.ne.jp)
  131. # [11:38] * Joins: Huvet (n=Emil@c-2fc1e555.07-131-73746f39.cust.bredbandsbolaget.se)
  132. # [11:38] * Quits: wakaba_ (n=wakaba_@122x221x184x68.ap122.ftth.ucom.ne.jp) (Read error: 110 (Connection timed out))
  133. # [11:40] * Joins: maikmerten (n=maikmert@77.132.12.215)
  134. # [12:05] * Quits: ciaran_lee (i=leecn@spoon.netsoc.tcd.ie) (Remote closed the connection)
  135. # [12:05] * Joins: ciaran_lee (i=leecn@134.226.83.42)
  136. # [12:09] * Quits: Rik|work (n=Rik|work@fw01d.skyrock.net) (Connection reset by peer)
  137. # [12:31] * Quits: nessy (n=Adium@203-214-159-50.dyn.iinet.net.au) ("Leaving.")
  138. # [12:46] * Joins: Rik` (n=Rik`@81.57.187.57)
  139. # [12:48] <Philip`> Huvet: 0.11 is very old - you should try it with the latest source version
  140. # [12:55] <Huvet> thanks, I will
  141. # [13:02] * Joins: Michelangelo (n=Michelan@93-42-96-106.ip86.fastwebnet.it)
  142. # [13:10] * Joins: mlpug (n=mlpug@a88-115-164-40.elisa-laajakaista.fi)
  143. # [13:20] * Joins: jonpierce (n=jonpierc@209.6.91.231)
  144. # [13:22] <Huvet> gah, "hg" needed to download the latest source version? what happened to the good old svn days :(
  145. # [13:25] <Philip`> The good old svn days turned into the better new hg days
  146. # [13:26] <Philip`> It's basically the same as SVN except you use the command "hg" instead of "svn" :-)
  147. # [13:26] * Quits: jonpierce (n=jonpierc@209.6.91.231)
  148. # [13:26] <Philip`> ...although I suppose it might be a bit more painful on Windows
  149. # [13:35] <Huvet> well, not really, seems to work exactly like it should
  150. # [13:37] <Huvet> hmm... strange, it checked out the whole tree, even though I requested a subdirectory
  151. # [13:38] * Quits: archtech (i=stanv@83.228.56.37) (No route to host)
  152. # [13:41] <Huvet> hmm... "... you cannot check out only one directory of a repository"
  153. # [13:41] * Joins: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net)
  154. # [13:43] * Quits: MikeSmith (n=MikeSmit@EM114-48-9-94.pool.e-mobile.ne.jp) (Read error: 110 (Connection timed out))
  155. # [13:47] <Huvet> hmm... I guess I can't clone the default repository and use that? seems that is 0.11 still. Maybe the 0.2 branch? *figures out how to clone a branch*
  156. # [13:51] <Huvet> is that the latest version? or should I look into some other branch?
  157. # [13:54] <Huvet> ah, fuck it, beautifulsoup seems deprecated anyways
  158. # [13:55] <Philip`> Huvet: Yeah, Hg doesn't support partial checkouts - you just clone the entire repository
  159. # [13:55] <Huvet> yeah, I figured that out
  160. # [13:55] <Philip`> which includes all the branches and everything
  161. # [13:55] <Huvet> ah
  162. # [13:56] <Huvet> how do I know which the latest branch is?
  163. # [13:56] <Philip`> You should just use the default branch
  164. # [13:56] <Huvet> ok
  165. # [13:56] <Philip`> since the others were for temporary experiments
  166. # [13:57] <Philip`> I think the BS code is still included and should work better than the 0.11 release, though I could be wrong about that
  167. # [13:58] <Huvet> seems I still get the same error there
  168. # [13:58] <Philip`> but there are fundamental problems in BS that mean it can't work properly in html5lib, and nobody has been interested in spending a great deal of effort on it
  169. # [13:58] <Huvet> but with an extra DataLossWarning
  170. # [13:58] <Huvet> I'll just use something else then I guess
  171. # [13:59] <Philip`> Okay, so maybe it doesn't work much better than the 0.11 release :-(
  172. # [14:00] <Philip`> lxml is usually the recommended treebuilder
  173. # [14:01] <Huvet> ok, i saw the remark in the docs about lxml being an "excellent library" :)
  174. # [14:01] <Huvet> or something in those terms
  175. # [14:08] <Huvet> oh great, the lxml parser crashes on those sites too :(
  176. # [14:08] <Philip`> Hmm, seems to work okay for me with lxml
  177. # [14:09] <Philip`> (I can't test BS yet since I don't have it installed)
  178. # [14:09] <Huvet> are you parsing http://www.allehanda.se ?
  179. # [14:10] <Huvet> http://dpaste.com/123628/
  180. # [14:10] <Philip`> No, because that timed out when I first tried downloading it
  181. # [14:10] <Philip`> but now I see the problem :-/
  182. # [14:12] <Philip`> ihatexml.py lives up to its name
  183. # [14:12] <Huvet> heh, great name for a file, what does it do?
  184. # [14:12] * Joins: gratz|home (n=gratz@81.106.148.238)
  185. # [14:15] <Philip`> http://code.google.com/p/html5lib/issues/detail?id=125
  186. # [14:15] <Huvet> ah, that seems it
  187. # [14:15] <Philip`> It tries to modify the names returned by the HTML parser so they're compatible with APIs that enforce XML's name requirements
  188. # [14:16] <Philip`> (and similar things)
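A minimal sketch of the kind of name coercion Philip` describes: rewriting names produced by an HTML parser so that they satisfy XML's naming rules. This is not the code in ihatexml.py. The function name `coerce_xml_name`, the ASCII-only character classes, and the `U` plus hex-code-point escape are all assumptions made for illustration; the real XML Name production also permits many non-ASCII characters and the colon, which this sketch deliberately escapes.

```python
import re

# Simplified, ASCII-only approximation of XML name rules (assumption:
# the full XML Name production allows many more characters than this).
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def coerce_xml_name(name):
    """Return `name` with characters outside the simplified rules escaped.

    Hypothetical helper: each disallowed character is replaced by "U"
    followed by its code point as five upper-case hex digits.
    """
    out = []
    for index, char in enumerate(name):
        # The first character must satisfy the stricter "start" rule.
        pattern = _NAME_START if index == 0 else _NAME_CHAR
        if pattern.match(char):
            out.append(char)
        else:
            out.append("U%05X" % ord(char))
    return "".join(out)
```

For example, `coerce_xml_name("div")` is left unchanged, while a name containing a double quote or starting with a digit has that character escaped. The escape is lossy in the sense that a name which already contains such a sequence cannot be told apart afterwards.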
  189. # [14:20] <Philip`> Huvet: <a><div><div><a> seems to be the pattern the BS treebuilder dislikes
  190. # [14:21] <Huvet> heh, I can understand that
  191. # [14:23] <Philip`> Huvet: It's the same as http://code.google.com/p/html5lib/issues/detail?id=80
  192. # [14:23] <Huvet> ah, good detective work
  193. # [14:24] * Philip` should have remembered it sooner because he looked into that bug when it was new
  194. # [14:24] * Joins: harig (i=HariG@121.245.108.149)
  195. # [14:25] <Philip`> (At least that's the problem on www.unt.se, I assume the others are the same)
  196. # [14:26] * Joins: jonpierce (n=jonpierc@64.119.130.114)
  197. # [14:36] * Quits: hobertoAtWork2 (n=hobertoa@gw1.mcgraw-hill.com) (Read error: 104 (Connection reset by peer))
  198. # [14:36] * Joins: hobertoAtWork (n=hobertoa@gw1.mcgraw-hill.com)
  199. # [14:48] * Joins: MikeSmith (n=MikeSmit@114.49.0.152)
  200. # [15:00] * Quits: jonpierce (n=jonpierc@64.119.130.114)
  201. # [15:12] * Quits: JoePeck (n=JoePeck@cpe-74-69-85-249.rochester.res.rr.com)
  202. # [15:27] * Quits: harig (i=HariG@121.245.108.149) (Read error: 104 (Connection reset by peer))
  203. # [15:27] * Quits: danbri (n=danbri@unaffiliated/danbri) (Read error: 113 (No route to host))
  204. # [15:32] * Quits: gavin_ (n=gavin@firefox/developer/gavin) (Remote closed the connection)
  205. # [15:33] * Joins: gavin_ (n=gavin@firefox/developer/gavin)
  206. # [15:38] * Joins: jonpierce (n=jonpierc@64.119.130.114)
  207. # [15:38] * Joins: openstandards (n=openstan@78.143.215.162)
  208. # [15:50] * Joins: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net)
  209. # [15:59] * Joins: hobertoAtWork2 (n=hobertoa@gw2.mcgraw-hill.com)
  210. # [16:02] * Joins: danbri (n=danbri@unaffiliated/danbri)
  211. # [16:03] * Quits: erlehmann (n=erlehman@1.106.113.82.net.de.o2.com) ("Ex-Chat")
  212. # [16:05] * Joins: vvv (n=vvv@213.181.10.212)
  213. # [16:09] * Quits: sebmarkbage (n=miranda@213.80.108.29) (Remote closed the connection)
  214. # [16:10] * Joins: hobertoAtWork3 (n=hobertoa@gw1.mcgraw-hill.com)
  215. # [16:14] * Quits: hobertoAtWork (n=hobertoa@gw1.mcgraw-hill.com) (Read error: 110 (Connection timed out))
  216. # [16:15] * Joins: Phae (n=phaeness@cpc2-acto9-0-0-cust364.brnt.cable.ntl.com)
  217. # [16:21] * Joins: myakura (n=myakura@p2197-ipbf7505marunouchi.tokyo.ocn.ne.jp)
  218. # [16:26] * Quits: hobertoAtWork2 (n=hobertoa@gw2.mcgraw-hill.com) (Read error: 110 (Connection timed out))
  219. # [16:41] * Joins: boogyman (n=chatzill@unaffiliated/boogyman)
  220. # [16:51] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  221. # [16:53] * Joins: sebmarkbage (n=miranda@213.80.108.29)
  222. # [16:54] * Quits: wakaba_0 (n=wakaba_@122x221x184x68.ap122.ftth.ucom.ne.jp) (Read error: 110 (Connection timed out))
  223. # [16:57] * Joins: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  224. # [17:02] * Quits: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
  225. # [17:13] * Joins: taf2 (n=taf2@98.117.216.229)
  226. # [17:42] * Joins: JoePeck (n=JoePeck@cpe-74-65-7-212.rochester.res.rr.com)
  227. # [17:52] * Quits: paul_irish (n=paul_iri@c-71-192-163-128.hsd1.nh.comcast.net) (Remote closed the connection)
  228. # [17:55] * Quits: Phae (n=phaeness@cpc2-acto9-0-0-cust364.brnt.cable.ntl.com)
  229. # [17:57] * Quits: Michelangelo (n=Michelan@93-42-96-106.ip86.fastwebnet.it) (Remote closed the connection)
  230. # [18:06] * Quits: taf2 (n=taf2@98.117.216.229)
  231. # [18:23] * Joins: paul_irish (n=paul_iri@64.119.130.114)
  232. # [18:26] * jarib_ is now known as jarib
  233. # [18:27] * Joins: taf2 (n=taf2@151.196.60.88)
  234. # [18:31] * Quits: myakura (n=myakura@p2197-ipbf7505marunouchi.tokyo.ocn.ne.jp) ("Leaving...")
  235. # [18:45] * Joins: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net)
  236. # [18:46] * Quits: jonpierce (n=jonpierc@64.119.130.114)
  237. # [19:03] * Joins: erlehmann (n=erlehman@1.106.113.82.net.de.o2.com)
  238. # [19:18] * Quits: starjive (i=beos@81-233-16-19-no30.tbcn.telia.com) (Read error: 110 (Connection timed out))
  239. # [19:21] * Joins: starjive (i=beos@81-233-16-19-no30.tbcn.telia.com)
  240. # [19:32] * Quits: Amorphous (i=jan@unaffiliated/amorphous) (Read error: 104 (Connection reset by peer))
  241. # [19:32] * Quits: MikeSmith (n=MikeSmit@114.49.0.152) (Read error: 145 (Connection timed out))
  242. # [19:34] * Quits: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  243. # [19:37] * Joins: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  244. # [19:39] * Joins: rauchg (n=rauchg@32.177.130.23)
  245. # [19:50] * Joins: cohitre (n=cohitre@64-40-56-46-dsl.itltd.net)
  246. # [19:50] * Parts: cohitre (n=cohitre@64-40-56-46-dsl.itltd.net)
  247. # [19:51] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
  248. # [19:54] * Quits: starjive (i=beos@81-233-16-19-no30.tbcn.telia.com) (Read error: 110 (Connection timed out))
  249. # [19:57] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
  250. # [20:14] * boogyman is now known as boog|afk
  251. # [20:19] * Joins: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net)
  252. # [20:27] * Quits: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  253. # [20:27] * Joins: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  254. # [20:34] * Joins: zalan (n=zalan@catv-89-135-144-122.catv.broadband.hu)
  255. # [20:46] * Quits: maikmerten (n=maikmert@77.132.12.215) (Remote closed the connection)
  256. # [20:54] * Quits: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net) (Read error: 145 (Connection timed out))
  257. # [21:07] * Joins: jonpierce (n=jonpierc@64.119.130.114)
  258. # [21:23] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
  259. # [21:23] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
  260. # [21:32] * Quits: taf2 (n=taf2@151.196.60.88) (Read error: 131 (Connection reset by peer))
  261. # [21:33] * Joins: taf2 (n=taf2@static-151-196-60-88.balt.east.verizon.net)
  262. # [21:34] * Joins: gunderwonder (n=gunderwo@89.80-202-84.nextgentel.com)
  263. # [21:35] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  264. # [21:43] * Quits: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  265. # [21:44] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  266. # [21:47] * Joins: nessy (n=Adium@203-214-159-50.dyn.iinet.net.au)
  267. # [21:55] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  268. # [21:58] * Quits: jonpierce (n=jonpierc@64.119.130.114)
  269. # [21:58] <Huvet> heh, next horrendous HTML that crashes the html5 parser: http://7-harad.nu/
  270. # [21:59] <Philip`> What error message do you get?
  271. # [21:59] <Huvet> http://dpaste.com/123783/
  272. # [21:59] * Joins: taf2_ (n=taf2@static-151-196-60-88.balt.east.verizon.net)
  273. # [22:00] * Quits: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net) ("Leaving.")
  274. # [22:02] <Philip`> Hmm
  275. # [22:02] <Philip`> What treebuilder are you using?
  276. # [22:02] <Huvet> dom
  277. # [22:03] <Huvet> beautifulsoup crashed on some sites, lxml on some other ones, so I'm on dom now :)
  278. # [22:04] <Huvet> I guess it's all the advertising code on these sites that make them so badly formatted
  279. # [22:04] <Philip`> http://code.google.com/p/html5lib/issues/detail?id=123 sounds like it could be relevant
  280. # [22:05] <Philip`> but I'm not really sure
  281. # [22:05] <Philip`> It'd be good if you could produce a minimal testcase
  282. # [22:05] <Huvet> yeah, I'm not sure how to go about that... save the source code locally and start stripping stuff out?
  283. # [22:05] <Philip`> by starting with the markup from the site that causes problems, then deleting half of it and seeing if the problem is still there, else delete the other half instead, and repeat until there's not much left
  284. # [22:06] <Philip`> Yeah, basically what you said :-)
  285. # [22:06] <Huvet> ok, I'll get to work right away
  286. # [22:06] <Huvet> :)
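The halving procedure Philip` outlines can be sketched roughly as follows. This is an illustrative reducer, not a tool from html5lib: `reduce_testcase` and the `still_fails` callback are hypothetical names, and the callback is assumed to return True whenever the given markup still triggers the parser error being chased. It deletes chunks of the input, starting with half the document and halving the chunk size on each pass, and keeps any deletion after which the failure is still present.

```python
def reduce_testcase(markup, still_fails):
    """Shrink `markup` while `still_fails(markup)` keeps returning True.

    `still_fails` is a caller-supplied predicate (hypothetical), for
    example one that runs the parser and reports whether it raised.
    """
    if not still_fails(markup):
        raise ValueError("the starting markup does not trigger the failure")

    chunk = max(len(markup) // 2, 1)
    while True:
        start = 0
        while start < len(markup):
            # Try the input with one chunk cut out.
            candidate = markup[:start] + markup[start + chunk:]
            if still_fails(candidate):
                markup = candidate   # deletion is safe, retry at same offset
            else:
                start += chunk       # this chunk is needed, move past it
        if chunk == 1:
            break
        chunk //= 2                  # next pass removes smaller pieces
    return markup
```

In practice `still_fails` might wrap the parse call in a try/except and compare the exception type, so that the reducer does not drift from one bug to a different one.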
  287. # [22:07] * Joins: jonpierce (n=jonpierc@64.119.130.114)
  288. # [22:08] * Quits: taf2 (n=taf2@static-151-196-60-88.balt.east.verizon.net) (Read error: 110 (Connection timed out))
  289. # [22:21] * Quits: taf2_ (n=taf2@static-151-196-60-88.balt.east.verizon.net) (Read error: 110 (Connection timed out))
  290. # [22:26] * Joins: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  291. # [22:27] <Huvet> oh, there's a new error
  292. # [22:27] <Huvet> http://dpaste.com/123797/
  293. # [22:28] <Huvet> but one thing at a time
  294. # [22:30] <Philip`> Testing on real content is a good way to find bugs :-)
  295. # [22:31] * Philip` wonders how many pages Huvet is running through it
  296. # [22:31] <Huvet> 351 :)
  297. # [22:31] <Huvet> I'm scraping Swedish news sites for RSS urls
  298. # [22:32] <Huvet> seems that's a bit harder than I first thought :P
  299. # [22:32] <Huvet> seems that's a bit harder than I first thought :
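For readers curious what scraping a page for RSS urls can look like, here is a rough sketch. It does not use html5lib: it swaps in the standard library's `html.parser`, which is more lenient than strict but does not implement the HTML5 parsing algorithm. The names `FeedLinkFinder` and `find_feeds` are hypothetical, and the choice to look only at `<link rel="alternate">` elements with an RSS or Atom media type is an assumption about how the feeds are advertised, not something stated in the log.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types treated as feeds (assumption: Atom is included alongside RSS).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None when written without a value.
        attr = {key: (value or "") for key, value in attrs}
        rels = attr.get("rel", "").lower().split()
        media_type = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if "alternate" in rels and media_type in FEED_TYPES and href:
            # Resolve relative feed links against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the list of feed URLs found in `html`."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

Sites that only link to their feed from the page body, rather than from a `<link>` element, would need a different heuristic.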
  300. # [22:33] <Huvet> this is the smallest I can get it: <table><td><span><font></span><span>
  301. # [22:33] <Huvet> first one
  302. # [22:33] * Quits: workmad3 (n=davidwor@cpc3-bagu10-0-0-cust651.1-3.cable.virginmedia.com)
  303. # [22:35] <Huvet> ehm... strange... the other error is if I have a file with just <table> in it :)
  304. # [22:36] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net) (Read error: 145 (Connection timed out))
  305. # [22:39] <Philip`> That's quite minimal :-)
  306. # [22:39] * Joins: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  307. # [22:39] * Quits: mlpug (n=mlpug@a88-115-164-40.elisa-laajakaista.fi) (Remote closed the connection)
  308. # [22:43] <Philip`> Huvet: I think you could fix the processEOF easily by removing the 'token' in html5parser.py lines 1689, 1692 (the processEOF declaration/call)
  309. # [22:44] <Philip`> but it'd be good to post a new issue on the Google Code site, so someone can add a test case and fix the code and make sure it works
  310. # [22:45] <Philip`> and also for the other bug (which looks like a scary adoption agency thing)
  311. # [22:46] <Huvet> I will
  312. # [22:48] * Joins: tndH (n=Rob@cpc2-leed18-0-0-cust427.leed.cable.ntl.com)
  313. # [23:02] * riven` is now known as riven
  314. # [23:04] <Huvet> here's the first bug: http://code.google.com/p/html5lib/issues/detail?id=126
  315. # [23:05] * Quits: KrocCamen (n=kroc@cpc3-lanc2-0-0-cust544.brig.cable.ntl.com)
  316. # [23:09] * Joins: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net)
  317. # [23:09] <Huvet> and here's the other one: http://code.google.com/p/html5lib/issues/detail?id=127
  318. # [23:11] * Joins: ttepasse (n=ttepas--@dslb-084-060-060-034.pools.arcor-ip.net)
  319. # [23:11] * Parts: cpharmston (n=cpharmst@pool-173-66-156-203.washdc.fios.verizon.net)
  320. # [23:12] <AryehGregor> "Such a subset does not, in general, include inline script elements."
  321. # [23:12] * Quits: rauchg (n=rauchg@32.177.130.23) (Read error: 110 (Connection timed out))
  322. # [23:12] <AryehGregor> Why can't you include inline script in polyglots? Can't you fudge things using <![CDATA[ or whatnot?
  323. # [23:15] * Quits: dglazkov (n=dglazkov@c-67-188-0-62.hsd1.ca.comcast.net)
  324. # [23:22] * Quits: Maurice (i=copyman@94.213.72.212)
  325. # [23:31] * Joins: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net)
  326. # [23:45] * Quits: fishd_ (n=darin@c-98-207-16-168.hsd1.ca.comcast.net) (Read error: 145 (Connection timed out))
  327. # [23:52] * Quits: zalan (n=zalan@catv-89-135-144-122.catv.broadband.hu) (Read error: 110 (Connection timed out))
  328. # [23:57] * Quits: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net) ("8403864 bytes have been tenured, next gc will be global.")
  329. # Session Close: Mon Nov 23 00:00:00 2009