  1. # Session Start: Thu Apr 09 00:00:00 2009
  2. # Session Ident: #whatwg
  3. # [00:04] * Quits: MikeSmith (n=MikeSmit@EM114-48-167-26.pool.e-mobile.ne.jp) (Read error: 110 (Connection timed out))
  4. # [00:05] * Quits: franksalim (n=frank@adsl-76-236-71-187.dsl.pltn13.sbcglobal.net) ("Leaving")
  5. # [00:07] <annevk5> it's funny that when ubiquity-xforms is put to test it fails
  6. # [00:07] <annevk5> the disturbing thing is of course that some day they get all the details right (if they at all care) and claim we should support it because it's simple
  7. # [00:10] <Hixie> the day they get all the details right will be the day it looks like wf2 :-)
  8. # [00:10] * Hixie ducks
  9. # [00:11] <Hixie> (wf2 is what resulted from our attempt to adapt xforms for html, so i'm not really joking)
  10. # [00:15] <gsnedders> Hixie: What you need to fight spam is escape @name and @id on the comment form using entities. Spam bots are dumb.
  11. # [00:15] <gsnedders> (That's basically all the spam protection on my blog. Works awesomely.)
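The trick gsnedders describes can be sketched roughly as follows. This is only an illustration under assumptions: the helper names and the `comment` field name are hypothetical, and nothing here is taken from his blog's actual code. The idea is that a browser decodes the character references back to the plain attribute value, while a naive bot that pattern-matches the raw markup does not see the expected string.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_input(name: str, element_id: str) -> str:
    """Build an <input> whose name and id are written only as entities.

    Hypothetical helper: a real comment form would have more attributes.
    """
    return (
        f'<input name="{to_numeric_entities(name)}" '
        f'id="{to_numeric_entities(element_id)}">'
    )


if __name__ == "__main__":
    markup = obfuscated_input("comment", "comment")
    print(markup)
    # An HTML parser decodes the references, so the form still submits
    # a field called "comment".
    print(html.unescape(to_numeric_entities("comment")))
```

Whether this stops any particular bot is an empirical question; the sketch only shows the encoding step.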
  12. # [00:16] <Hixie> i don't particularly care about fighting spam, i was just enabling the plugins dreamhost asked me to enable
  13. # [00:17] <gsnedders> Hixie: What? Is spam like CSRF and not a problem for the web?
  14. # [00:20] * Joins: Lachy (n=Lachlan@85-189-168-181.glemnet.managedbroadband.co.uk)
  15. # [00:26] * Joins: mpilgrim (n=mpilgrim@rrcs-96-10-240-189.midsouth.biz.rr.com)
  16. # [00:31] * Joins: roc (n=roc@130.216.53.168)
  17. # [00:36] * Quits: mgrdcm (n=mgrdcm@65.111.247.194)
  18. # [00:43] * Quits: heycam (n=cam@210-84-43-129.dyn.iinet.net.au) ("bye")
  19. # [00:51] * Quits: roc (n=roc@130.216.53.168)
  20. # [00:58] * Joins: doublec (n=doublec@202.0.36.64)
  21. # [01:04] <takkaria> neither of Philip's demos worked for me
  22. # [01:05] <takkaria> (the xforms ones)
  23. # [01:08] <takkaria> oh, the first one seems to do something now
  24. # [01:08] * Joins: aroben_ (n=aroben@unaffiliated/aroben)
  25. # [01:13] * Quits: Lachy (n=Lachlan@85-189-168-181.glemnet.managedbroadband.co.uk) ("This computer has gone to sleep")
  26. # [01:20] * Quits: tndH (n=Rob@james-baillie-pc083-014.student-halls.leeds.ac.uk) ("ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406]")
  27. # [01:24] * Quits: aroben (n=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  28. # [01:25] * Joins: roc (n=roc@130.216.53.168)
  29. # [01:26] <Philip`> takkaria: You have to wait for it to load a zillion script files
  30. # [01:26] <Philip`> (Presumably it'd be possible to pack them all into a single .js file if you cared about performance)
  31. # [01:27] * Philip` hopes he is actually testing it correctly, and didn't forget to change some of the prefixes or something
  32. # [01:29] * Joins: MikeSmith (n=MikeSmit@EM114-48-135-234.pool.e-mobile.ne.jp)
  33. # [01:30] * Parts: annevk5 (n=annevk@53568A94.cable.casema.nl)
  34. # [01:35] * Joins: aroben__ (n=aroben@unaffiliated/aroben)
  35. # [01:36] * Quits: MikeSmith (n=MikeSmit@EM114-48-135-234.pool.e-mobile.ne.jp) ("Tomorrow to fresh woods, and pastures new.")
  36. # [01:38] * Joins: MikeSmith (n=MikeSmit@EM114-48-135-234.pool.e-mobile.ne.jp)
  37. # [01:43] * Joins: LeifHS_ (n=chatzill@cm-84.208.110.159.getinternet.no)
  38. # [01:43] * Parts: LeifHS_ (n=chatzill@cm-84.208.110.159.getinternet.no)
  39. # [01:44] * Joins: LeifHS (n=chatzill@cm-84.208.110.159.getinternet.no)
  40. # [01:48] <LeifHS> annevk5: Yes, <embed> is like <source> - kind of. However, <source> is part of the <video> element - it is an extension of <video>. What I meant was that WebKit is treating <embed> as if it is part of <object> (the same way that <source> is part of <video>).
  41. # [01:48] * karlcow always wonders why talented people become suddenly blind when they are passionate about their tech
  42. # [01:50] * Quits: dglazkov (n=dglazkov@nat/google/x-6318337d13626020)
  43. # [01:53] * Quits: aroben_ (n=aroben@unaffiliated/aroben) (Connection timed out)
  44. # [01:58] <LeifHS> zcorpan: Is it possible for you to create an archetypal example of what you mean?
  45. # [01:58] * Joins: heycam (n=cam@zot.infotech.monash.edu.au)
  46. # [01:59] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  47. # [01:59] * Joins: webben (n=benh@dip5-fw.corp.ukl.yahoo.com)
  48. # [02:03] * Quits: onar (n=onar@17.226.23.135) (Read error: 60 (Operation timed out))
  49. # [02:03] * Quits: slightlyoff (n=slightly@nat/google/x-1c2086cb2e73169f) (Read error: 54 (Connection reset by peer))
  50. # [02:03] * Joins: slightlyoff (n=slightly@nat/google/x-bd1a81792fa8d08a)
  51. # [02:06] * Quits: MikeSmith (n=MikeSmit@EM114-48-135-234.pool.e-mobile.ne.jp) (Read error: 110 (Connection timed out))
  52. # [02:07] * Quits: jwalden (n=waldo@corp-241.mountainview.mozilla.com) ("Reconnecting…")
  53. # [02:08] * Quits: dbaron (n=dbaron@corp-241.mountainview.mozilla.com) ("8403864 bytes have been tenured, next gc will be global.")
  54. # [02:08] * Joins: dbaron (n=dbaron@corp-241.mountainview.mozilla.com)
  55. # [02:09] * Quits: dolske (n=dolske@firefox/developer/dolske) (Remote closed the connection)
  56. # [02:09] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
  57. # [02:11] * Quits: arun_ (n=arun@corp-243.mountainview.mozilla.com) (Read error: 113 (No route to host))
  58. # [02:13] * Joins: jwalden (n=waldo@corp-241.mountainview.mozilla.com)
  59. # [02:13] * Quits: grimboy (n=grimboy@78-86-152-156.zone2.bethere.co.uk) ("Lost terminal")
  60. # [02:14] * Joins: arun_ (n=arun@corp-243.mountainview.mozilla.com)
  61. # [02:19] * Quits: jorlow (n=jorlow@nat/google/x-79940e481de8bf6e) (Read error: 110 (Connection timed out))
  62. # [02:20] * Joins: MikeSmith (n=MikeSmit@EM114-48-137-135.pool.e-mobile.ne.jp)
  63. # [02:20] * Quits: arun_ (n=arun@corp-243.mountainview.mozilla.com)
  64. # [02:20] * Quits: fishd (n=darin@nat/google/x-6ead278d6495a049) (Read error: 110 (Connection timed out))
  65. # [02:20] * Quits: jorlow_ (n=jorlow@nat/google/x-843d38b4f07f4091) (Read error: 110 (Connection timed out))
  66. # [02:21] * Quits: sicking (n=chatzill@corp-241.mountainview.mozilla.com) (Read error: 113 (No route to host))
  67. # [02:31] * Quits: roc (n=roc@130.216.53.168)
  68. # [02:33] * Quits: weinig (n=weinig@17.246.17.225)
  69. # [02:37] * Quits: slightlyoff (n=slightly@nat/google/x-bd1a81792fa8d08a)
  70. # [02:40] * Quits: bgalbraith (n=bgalbrai@c-71-202-109-116.hsd1.ca.comcast.net)
  71. # [02:54] * Quits: aroben__ (n=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  72. # [02:58] * Parts: LeifHS (n=chatzill@cm-84.208.110.159.getinternet.no)
  73. # [02:59] * Joins: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net)
  74. # [03:11] * Quits: jcranmer (n=jcranmer@ltsp2.csl.tjhsst.edu) (Read error: 104 (Connection reset by peer))
  75. # [03:12] * Quits: olliej (n=oliver@17.203.15.161) (Read error: 104 (Connection reset by peer))
  76. # [03:13] * Joins: jcranmer (n=jcranmer@ltsp2.csl.tjhsst.edu)
  77. # [03:14] * Joins: olliej (n=oliver@17.203.15.161)
  78. # [03:29] * Quits: MikeSmith (n=MikeSmit@EM114-48-137-135.pool.e-mobile.ne.jp) ("Tomorrow to fresh woods, and pastures new.")
  79. # [03:32] * Joins: MikeSmith (n=MikeSmit@tea12.w3.mag.keio.ac.jp)
  80. # [03:46] * Quits: smedero (n=smedero@pia145-154.pioneernet.net)
  81. # [03:52] * Quits: dave_levin (n=dave_lev@72.14.227.1)
  82. # [03:55] * Joins: doublec_ (n=doublec@202.0.36.64)
  83. # [03:57] * Quits: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net)
  84. # [03:58] * Quits: doublec (n=doublec@202.0.36.64) (Read error: 110 (Connection timed out))
  85. # [04:05] * Quits: mpilgrim (n=mpilgrim@rrcs-96-10-240-189.midsouth.biz.rr.com) (Read error: 104 (Connection reset by peer))
  86. # [04:07] * Joins: Niictar_ (n=ritz@S010600183f550ae0.cg.shawcable.net)
  87. # [04:10] * Quits: Niictar24 (n=ritz@S010600183f550ae0.cg.shawcable.net) (Read error: 60 (Operation timed out))
  88. # [04:13] * Joins: roc (n=roc@202.0.36.64)
  89. # [04:39] * Joins: olliej_ (n=oliver@17.203.15.161)
  90. # [04:39] * Quits: olliej (n=oliver@17.203.15.161) (Read error: 104 (Connection reset by peer))
  91. # [04:40] * Joins: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  92. # [04:47] * Joins: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net)
  93. # [05:04] * Quits: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  94. # [05:15] * Joins: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  95. # [05:16] * Quits: dbaron (n=dbaron@corp-241.mountainview.mozilla.com) ("8403864 bytes have been tenured, next gc will be global.")
  96. # [05:30] * Quits: dolske (n=dolske@firefox/developer/dolske) (Read error: 110 (Connection timed out))
  97. # [05:30] * Joins: weinig (n=weinig@c-67-180-35-124.hsd1.ca.comcast.net)
  98. # [05:32] * doublec_ is now known as doublec
  99. # [05:33] * Joins: aroben__ (n=aroben@unaffiliated/aroben)
  100. # [05:35] * Joins: dave_levin (n=dave_lev@72.14.224.1)
  101. # [05:40] * Quits: jwalden (n=waldo@corp-241.mountainview.mozilla.com) ("->home")
  102. # [05:45] * Joins: hdh (n=hdh@58.187.19.53)
  103. # [05:50] * Quits: weinig (n=weinig@c-67-180-35-124.hsd1.ca.comcast.net)
  104. # [05:52] * Joins: bgalbraith (n=bgalbrai@c-71-202-109-116.hsd1.ca.comcast.net)
  105. # [05:54] * Joins: dolske (n=dolske@c-76-103-40-203.hsd1.ca.comcast.net)
  106. # [05:57] * Joins: annevk5 (n=annevk@aas.attingo.nl)
  107. # [05:57] * Joins: annevk2 (n=opera@77.241.230.242)
  108. # [06:00] * olliej_ is now known as olliej
  109. # [06:06] * Quits: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  110. # [06:07] * Quits: MikeSmith (n=MikeSmit@tea12.w3.mag.keio.ac.jp) ("Tomorrow to fresh woods, and pastures new.")
  111. # [06:10] * Joins: MikeSmith (n=MikeSmit@EM114-48-42-229.pool.e-mobile.ne.jp)
  112. # [06:12] * Joins: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  113. # [06:16] * Joins: smedero (n=smedero@pia145-154.pioneernet.net)
  114. # [06:26] * Quits: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  115. # [06:28] * Joins: billyjackass (n=MikeSmit@dhcp-246-223.mag.keio.ac.jp)
  116. # [06:34] * Quits: annevk2 (n=opera@77.241.230.242)
  117. # [06:34] * Quits: annevk5 (n=annevk@aas.attingo.nl)
  118. # [06:48] * Quits: MikeSmith (n=MikeSmit@EM114-48-42-229.pool.e-mobile.ne.jp) (Read error: 110 (Connection timed out))
  119. # [06:49] * billyjackass is now known as MikeSmith
  120. # [07:17] * Quits: aroben__ (n=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
  121. # [07:17] * Joins: aroben__ (n=aroben@unaffiliated/aroben)
  122. # [07:20] * Joins: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net)
  123. # [07:31] * Quits: doublec (n=doublec@202.0.36.64) ("Leaving")
  124. # [07:51] * Quits: bgalbraith (n=bgalbrai@c-71-202-109-116.hsd1.ca.comcast.net)
  125. # [07:52] * Quits: Niictar_ (n=ritz@S010600183f550ae0.cg.shawcable.net) (Read error: 60 (Operation timed out))
  126. # [08:06] * Joins: harig (n=opera@59.90.71.35)
  127. # [08:11] * Joins: zalan (n=kvirc@catv-80-99-193-98.catv.broadband.hu)
  128. # [08:19] * Quits: olliej (n=oliver@17.203.15.161)
  129. # [08:28] * Quits: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net)
  130. # [08:34] * Quits: roc (n=roc@202.0.36.64)
  131. # [08:40] * Joins: roc (n=roc@202.0.36.64)
  132. # [08:42] * Joins: Maurice (n=ano@a80-101-46-164.adsl.xs4all.nl)
  133. # [08:47] * Joins: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net)
  134. # [08:49] * Quits: onar (n=onar@c-98-234-65-251.hsd1.ca.comcast.net) (Client Quit)
  135. # [08:56] * Quits: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net) ("8403864 bytes have been tenured, next gc will be global.")
  136. # [08:59] * Joins: olliej (n=oliver@17.203.15.161)
  137. # [09:03] * Joins: aroben (n=aroben@unaffiliated/aroben)
  138. # [09:05] * Joins: pesla (n=retep@procurios.xs4all.nl)
  139. # [09:14] * Joins: Lachy (n=Lachlan@85-189-168-181.glemnet.managedbroadband.co.uk)
  140. # [09:15] * Joins: ap (n=ap@194.154.88.36)
  141. # [09:16] * Quits: drry (n=drry@dd25.opt2.point.ne.jp) ("Server Configuration changed; reconnect")
  142. # [09:16] * Joins: drry (n=drry@dd25.opt2.point.ne.jp)
  143. # [09:18] * Quits: aroben__ (n=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  144. # [09:20] * Quits: Lachy (n=Lachlan@85-189-168-181.glemnet.managedbroadband.co.uk) ("This computer has gone to sleep")
  145. # [09:24] * Quits: aroben (n=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
  146. # [09:39] * Hixie now has a basic proof of concept of his new <datagrid> API design
  147. # [09:40] <Hixie> now i just have to check how plausible it is
  148. # [09:40] <Hixie> from the authoring side...
  149. # [09:45] <Hixie> this is one of the first times that i've designed something for which i think a structure a bit like a B-tree would actually be a pretty good fit
  150. # [09:48] * Quits: smedero (n=smedero@pia145-154.pioneernet.net)
  151. # [09:53] <MikeSmith> wow
  152. # [09:53] <MikeSmith> B-tree
  153. # [09:53] <MikeSmith> that's a blast from the past
  154. # [09:54] <MikeSmith> Hixie: seems like datagrid is ultimately going to sink or swim based on how much implementor commitment there is
  155. # [09:55] <MikeSmith> and given that there's not been much implementor commitment forthcoming so far, I wonder where that's going to leave it
  156. # [09:56] <MikeSmith> if we really want to get to LC by this Fall
  157. # [09:56] * Joins: Mau`werk (n=ano@a80-101-46-164.adsl.xs4all.nl)
  158. # [09:56] * Quits: Maurice (n=ano@a80-101-46-164.adsl.xs4all.nl) (Read error: 54 (Connection reset by peer))
  159. # [09:56] * Quits: olliej (n=oliver@17.203.15.161) (Read error: 104 (Connection reset by peer))
  160. # [09:56] <Hixie> my impression is that browser vendors seem in agreement that it would be a useful feature, but that it isn't a priority
  161. # [09:56] * Joins: olliej (n=oliver@17.203.15.161)
  162. # [09:59] <MikeSmith> Hixie: I think it's something more than just a useful feature
  163. # [10:00] <Hixie> oh?
  164. # [10:00] <MikeSmith> to me, it's something that would appeal quite a bit to Web developers
  165. # [10:00] <Hixie> that's what i mean by "useful feature" :-)
  166. # [10:00] * Joins: zcorpan (n=zcorpan@c83-252-196-43.bredband.comhem.se)
  167. # [10:01] <Hixie> note that by LC we don't have to have commitements to implement, only agreement that the features should be in the language
  168. # [10:01] <MikeSmith> Hixie: it's the "isn't a priority" to browser developers that's the big stumbling block
  169. # [10:01] <Hixie> in fact it's only when entering CR that we have to list features that might be at risk
  170. # [10:01] <Hixie> and even then we have until REC to see them implemented
  171. # [10:02] <Hixie> so lack of implementation commitements is not a big deal so long as implementors don't disagree that it would be good to implement eventually
  172. # [10:02] <Hixie> commitments
  173. # [10:02] <MikeSmith> I guess
  174. # [10:03] <MikeSmith> I would personally rather not see us take a particular feature into CR without a clear commitment from multiple browser vendors to implement it
  175. # [10:03] <Hixie> sure
  176. # [10:04] <MikeSmith> maybe we need to light a fire under some asses as far as datagrid
  177. # [10:04] <hsivonen> myvidoop.com no longer shows an EV cert. Am I being MITMed?
  178. # [10:04] <MikeSmith> e.g., threat "this is going to be removed from HTML5 unless we get a clear indication of vendor support"
  179. # [10:04] <MikeSmith> hsivonen: MITMed
  180. # [10:05] <MikeSmith> ?
  181. # [10:05] <hsivonen> MikeSmith: Man In The Middle
  182. # [10:05] * Hixie watches the EV cert security model fall apart
  183. # [10:06] <zcorpan> hsivonen: do what normal people do and discard any dialogs and don't notice the lack of dialogs
  184. # [10:06] <MikeSmith> EV cert, despite whatever faults it has, is approximately one gazillion times better than non-EV certs
  185. # [10:07] <Hixie> for cert vendors, sure
  186. # [10:07] <hsivonen> MikeSmith: how so. It's causing me distress now. Probably for no good reason.
  187. # [10:07] <hsivonen> If I'm being MITMed, how do I email vidoop, when I should assume that by now the MX record in my DNS cache is poisoned as well?
  188. # [10:08] * Quits: heycam (n=cam@zot.infotech.monash.edu.au) ("bye")
  189. # [10:08] <Hixie> certs in general can answer the question "is this who i think it is", which is never a question users ask. They ask (implicitly) the question "am I being attacked", which is not possible to answer using any kind of SSL cert that I know of, EV or otherwise.
  190. # [10:09] <Hixie> (and EV certs can answer the first question with $450 more confidence than non-EV certs, that's about it.)
  191. # [10:09] <MikeSmith> Hixie: EV certs have documented rules for identity
  192. # [10:09] <Hixie> right. "$450 more confidence".
  193. # [10:09] <MikeSmith> when you pay (whatever dollar amount) for a non-EV cert, you are a sucker
  194. # [10:10] <MikeSmith> because you are paying for nothing
  195. # [10:10] <Hixie> you are paying for someone to sign it
  196. # [10:10] <MikeSmith> CAs are bound to do zero vetting otherwise
  197. # [10:10] <Philip`> You're paying for something that prevents passive network attackers from reading all the data send to/from your web site, which seems quite useful
  198. # [10:11] * Quits: roc (n=roc@202.0.36.64)
  199. # [10:11] <Philip`> *sent
  200. # [10:13] <Hixie> Philip`: you can get that with a self-signed cert
  201. # [10:13] <Philip`> Hixie: You can't since it won't work in Firefox 3
  202. # [10:13] <MikeSmith> the EV-cert work is similar to a lot of things in that it's really easy to make jackass comments about it from the outside after the fact, throwing rocks, without any understanding of the difficulties involved in ever having tried to do something like it at all
  203. # [10:13] <Hixie> Philip`: that's a bug (it should encrypt but not show any ui about it)
  204. # [10:13] <Hixie> imho, anyway
  205. # [10:14] * Quits: webben (n=benh@dip5-fw.corp.ukl.yahoo.com) (Read error: 104 (Connection reset by peer))
  206. # [10:14] <Hixie> MikeSmith: as someone who is involved in the security work of three browsers, i feel pretty qualified in the subject. I'm fully aware that there has not been anything better proposed. That doesn't mean it's good.
  207. # [10:14] <hsivonen> I emailed vidoop. I wonder if they'll answer and with what answer.
  208. # [10:15] <hsivonen> Hixie: hey, that's the argument against the Public Suffix List
  209. # [10:15] <Hixie> i'm not saying we shouldn't have EVs
  210. # [10:15] <Hixie> and the public suffix list sucks too, yes
  211. # [10:15] <Hixie> :-)
  212. # [10:15] * zcorpan thinks they'll answer "What is MITM?"
  213. # [10:17] <Hixie> (and before someone thinks i'm only picking on technologies i'm not working on, html sucks in many ways too. It, like the other two, just happens to be the best we have in the real world.)
  214. # [10:17] <hsivonen> zcorpan: their core business is making people feel more secure, so one would hope they'd know what MITM is
  215. # [10:18] <MikeSmith> Hixie: yeah, the point is that there was not anything better proposed, and that coming up with anything at all involved burning up a lot of time negotiating things with the CAs, and not all browser projects cared to actually show up to bother to take the time to actually involve themselves actively in the discussions
  216. # [10:18] <Hixie> oh?
  217. # [10:18] <Hixie> who wasn't involved?
  218. # [10:19] <MikeSmith> Opera was involved, Microsoft was involved, George Staikos was involved
  219. # [10:20] * Quits: Mau`werk (n=ano@a80-101-46-164.adsl.xs4all.nl) ("Disconnected...")
  220. # [10:22] <Hixie> interesting
  221. # [10:22] <Hixie> anyway, bed time
  222. # [10:22] * Quits: ap (n=ap@194.154.88.36)
  223. # [10:23] <Hixie> nn
  224. # [10:24] * Quits: MikeSmith (n=MikeSmit@dhcp-246-223.mag.keio.ac.jp) ("Tomorrow to fresh woods, and pastures new.")
  225. # [10:29] * Joins: MikeSmith (n=MikeSmit@dhcp-246-223.mag.keio.ac.jp)
  226. # [10:31] <MikeSmith> hsivonen: about http://bugzilla.validator.nu/show_bug.cgi?id=437
  227. # [10:31] <MikeSmith> I'm trying to figure out which part of the code it is that adds the hyperlinking back to the spec
  228. # [10:32] <hsivonen> MikeSmith: nu.validator.spec.html5.Html5SpecBuilder
  229. # [10:32] * MikeSmith looks now
  230. # [10:33] * Joins: mat_t (n=mattomas@nat/canonical/x-3e68bedca63a67f8)
  231. # [10:36] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  232. # [10:36] <MikeSmith> hsivonen: also, seems to be some brokenness in currently linking
  233. # [10:37] <MikeSmith> test, with, e.g., <!DOCTYPE html><title>foo</title><img>
  234. # [10:37] <MikeSmith> links to <img> doc end up as:
  235. # [10:37] <MikeSmith> http://www.whatwg.org/specs/web-apps/current-work/#null
  236. # [10:37] <hsivonen> MikeSmith: that code is *very* brittle :-(
  237. # [10:38] <MikeSmith> hsivonen: i see
  238. # [10:39] <hsivonen> I really should get to fixing outstanding V.nu bugs, but I'm still blocked on eliminating a key memory management issue from the C++ version of the parser.
  239. # [10:39] * Joins: svl (n=chatzill@a194-109-2-36.dmn.xs4all.nl)
  240. # [10:40] <MikeSmith> hsivonen: yeah, understood
  241. # [10:41] <MikeSmith> I'm happy to help with, at the very least, dealing with some of the low-hanging fruit
  242. # [10:41] <MikeSmith> as far as v.nu bugs
  243. # [10:41] * zcorpan wonders how to ensure that text doesn't use non-XML 1.0 Char characters in a text-based CMS
  244. # [10:41] <hsivonen> MikeSmith: I very much appreciate your help. The recent fixes have been great.
  245. # [10:42] <hsivonen> zcorpan: run everything through a magic regexp?
  246. # [10:43] <zcorpan> i guess
  247. # [10:44] * Joins: ROBOd (n=robod@89.122.216.38)
  248. # [10:45] <Philip`> zcorpan: Remove all non-ASCII characters, and all below 0x20
  249. # [10:46] <Philip`> Alternatively: Don't try to stop people inputting non-XML 1.0 Char characters
  250. # [10:46] <jgraham> zcorpan: html5lib has a big regexp somewhere
  251. # [10:46] <Philip`> and just make sure you deal with the issue when outputting to XML
  252. # [10:46] <zcorpan> Philip`: i need non-ascii characters
  253. # [10:46] <jgraham> zcorpan: Why are you trying to produce AML with a text-based CMS?
  254. # [10:46] <jgraham> *XML
  255. # [10:46] <Philip`> (because otherwise you'll forget to validate some of the input, and get invalid text in your database, and then it'll be a pain to get rid of)
  256. # [10:47] <zcorpan> jgraham: i want to have an Atom feed and the CMS i have is text-based
  257. # [10:47] <jgraham> s/somewhere/in ihatexml.py/
  258. # [10:47] <zcorpan> lol
  259. # [10:47] <Philip`> Escape on output, it's the only way to be sure :-)
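The "magic regexp" plus "escape on output" advice above might look something like this in Python. It is a minimal sketch, not the regexp from html5lib: the character class follows the XML 1.0 `Char` production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF), and the function name and the choice of U+FFFD as the replacement are assumptions made for illustration.

```python
import re
from xml.sax.saxutils import escape

# Matches any character that is NOT allowed by the XML 1.0 Char production.
_NON_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def xml_text(value: str, replacement: str = "\uFFFD") -> str:
    """Prepare stored text for output as XML character data.

    Disallowed characters are replaced (not rejected), then the markup
    characters &, < and > are escaped. Non-ASCII text is left intact.
    """
    cleaned = _NON_XML_CHAR.sub(replacement, value)
    return escape(cleaned)


if __name__ == "__main__":
    print(xml_text("caf\u00e9 <b> \x00 & more"))
```

Running the filter only at output time, as Philip` suggests, means the stored text never has to be trusted: whatever reached the database, the Atom feed still gets well-formed character data.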
  260. # [10:48] <jgraham> zcorpan: I feel there must be a "now I have two problems" type comment in here somewhere
  261. # [10:49] <MikeSmith> heh
  262. # [10:52] * Joins: webben (n=benh@nat/yahoo/x-4ee0609040efaccc)
  263. # [11:00] * Quits: dave_levin (n=dave_lev@72.14.224.1)
  264. # [11:09] * Joins: heycam (n=cam@210-84-43-129.dyn.iinet.net.au)
  265. # [11:19] * Quits: mpt (n=mpt@canonical/launchpad/mpt) (Read error: 113 (No route to host))
  266. # [11:20] * Quits: zcorpan (n=zcorpan@c83-252-196-43.bredband.comhem.se)
  267. # [11:23] * Joins: mpt (n=mpt@canonical/launchpad/mpt)
  268. # [11:27] <MikeSmith> http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#spelling-and-grammar-checking
  269. # [11:28] <MikeSmith> "The spellcheck attribute is an enumerated attribute whose keywords are the empty string, true and false."
  270. # [11:29] <MikeSmith> so <p spellcheck>foo</p> should be invalid, right?
  271. # [11:29] * Joins: jfkthame (n=jonathan@user-5447749d.wfd89a.dsl.pol.co.uk)
  272. # [11:29] <Philip`> That looks like an empty string to me
  273. # [11:29] <MikeSmith> but <p spellcheck="">foo</p> is valid
  274. # [11:29] <MikeSmith> Philip`: <p spellcheck> is the same as empty string?
  275. # [11:29] <Philip`> The tokeniser doesn't make any distinction between those two cases, unless I'm horribly mistaken
  276. # [11:29] <MikeSmith> OK
  277. # [11:29] <Philip`> and I'm pretty sure I'm not horribly mistaken
  278. # [11:29] <Philip`> though I could be wrong about that
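Philip`'s point can be sketched as a small validity check. This is only an illustration: the function name is hypothetical, and representing a bare attribute as `None` is a convention of this sketch (some parser APIs report it that way), not something the tokeniser itself distinguishes. The keyword set is the one quoted from the spec above, matched ASCII case-insensitively as for other enumerated attributes.

```python
# Keywords quoted from the spec text above: the empty string, true, false.
SPELLCHECK_KEYWORDS = {"", "true", "false"}


def is_valid_spellcheck(value):
    """Return True if value is an allowed spellcheck attribute value.

    value is None for a bare attribute (<p spellcheck>), which is
    treated exactly like an explicit empty string (<p spellcheck="">).
    """
    if value is None:
        value = ""
    return value.lower() in SPELLCHECK_KEYWORDS


if __name__ == "__main__":
    for candidate in (None, "", "true", "FALSE", "yes"):
        print(repr(candidate), is_valid_spellcheck(candidate))
```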
  279. # [11:29] <jfkthame> Philip`: are you the one behind http://fonts.philip.html5.org/ by any chance?
  280. # [11:30] <Philip`> jfkthame: I am
  281. # [11:30] <MikeSmith> hsivonen: <p spellcheck> vs. <p spellcheck=""> ?
  282. # [11:30] <jfkthame> Philip`: cool page! however, there's a problem with at least one of the fonts
  283. # [11:30] <jfkthame> please see https://bugzilla.mozilla.org/show_bug.cgi?id=487549
  284. # [11:31] <Philip`> jfkthame: "Not authorized"
  285. # [11:31] <jfkthame> aaarrrgghhh, sorry....
  286. # [11:31] <Philip`> jfkthame: You could CC me on the bug if you want :-)
  287. # [11:32] <jfkthame> bugzilla id?
  288. # [11:32] <Philip`> philip.taylor@cl.cam.ac.uk
  289. # [11:32] <jgraham> Has Philip` inadvertently found yet another browser security bug
  290. # [11:32] <Philip`> If I had to guess wildly, it would be that it crashes on OS X
  291. # [11:32] <jfkthame> right - due to an invalid kern table in one of the subsetted fonts
  292. # [11:33] <jfkthame> you're cc'd
  293. # [11:35] <jfkthame> we could potentially do some more validation, but it's not feasible to exhaustively test fonts before handing them over to the OS or font rendering library, so if that isn't completely robust, bad fonts will always be a risk
  294. # [11:37] * Philip` wonders why his copy of the font doesn't look anything like a font, and then realises after several minutes that it's been gzipped
  295. # [11:39] <jfkthame> looking at the original PixAntiqua.ttf, i see that it has the apple-format kern table, and offhand it looks valid
  296. # [11:39] <jfkthame> i guess the subsetting code is at fault, then, when it strips out all the irrelevant kern pairs from the subset font
  297. # [11:44] <jfkthame> ah, looks like the problem is Kern.pm (from Font::TTF).... it always packs the kern table header in the old format, without checking the version number to see whether that's correct
  298. # [11:45] * Philip` was coming to the same conclusion :-)
  299. # [11:45] <Philip`> Is kern table version 1 documented somewhere?
  300. # [11:46] <jfkthame> http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6kern.html
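The version check that jfkthame says is missing could be sketched like this. It is a rough illustration in Python rather than the Perl of Font::TTF, and the function name and return labels are hypothetical. It assumes the two header layouts as commonly documented: the older format starts with a uint16 version of 0 followed by a uint16 table count, while the Apple format starts with a 32-bit fixed version of 0x00010000 followed by a uint32 table count.

```python
import struct


def kern_header_format(table: bytes) -> str:
    """Classify a 'kern' table header as "v0", "v1" or "unknown".

    "v0": uint16 version == 0, then uint16 nTables.
    "v1": fixed32 version == 0x00010000, then uint32 nTables (Apple).
    """
    if len(table) < 4:
        raise ValueError("kern table too short to contain a header")
    (first_word,) = struct.unpack_from(">H", table, 0)
    if first_word == 0:
        return "v0"
    (fixed_version,) = struct.unpack_from(">I", table, 0)
    if fixed_version == 0x00010000:
        return "v1"
    return "unknown"


if __name__ == "__main__":
    old_style = struct.pack(">HH", 0, 1)
    apple_style = struct.pack(">II", 0x00010000, 1)
    print(kern_header_format(old_style), kern_header_format(apple_style))
```

A subsetter that cannot rewrite a "v1" table correctly could use such a check to drop the table instead of re-packing it in the wrong layout.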
  301. # [11:46] * jgraham notes there seems very little point in the bug being marked security sensitive now since most of the information needed to reproduce it seems to be documented here :)
  302. # [11:46] <Philip`> I suppose I'd really have to support it properly, to fix up all the glyph IDs, and can't just pass it through :-(
  303. # [11:47] <Philip`> but I'm too lazy to do that
  304. # [11:47] <Philip`> but I could make it just drop the entire kern table
  305. # [11:48] <Philip`> jgraham: It's not a clearly exploitable bug, and there's probably hundreds of other ways you could make font renderers crash because they weren't designed for untrusted input, so I suppose the relative risk is quite low :-)
  306. # [11:48] <jfkthame> jgraham: yup - 'fraid so - it's basically just another case of "feed random data to the font system and you'll probably crash something"
  307. # [11:49] <Philip`> Doesn't fill me with great confidence in the security of browsers if they're dependent on such things :-/
  308. # [11:50] <takkaria> font renderers obviously need to stop trusting input
  309. # [11:50] <jfkthame> you mean "the security of operating systems", don't you? the browser is just a way to get a font onto the user's system
  310. # [11:51] <Philip`> I mean the security of the process of using my web browser to view a (untrusted) page
  311. # [11:51] <jfkthame> takkaria: right - it's no different from jpeg or png decoders or whatever - just that we've been attacking those for longer and so they've had a lot more hardening
  312. # [11:52] <Philip`> Were JPEG and PNG decoders as crash-prone as TTF decoders are, when they were first used in browsers?
  313. # [11:52] * Philip` supposes it's quite possible they were
  314. # [11:52] <Philip`> (Has anyone tried making Theora files that crash Firefox?)
  315. # [11:52] <jfkthame> ISTR plenty of security alerts in that area over the years
  316. # [11:55] * jfkthame needs to go for now - Philip`, have fun with the Theora idea!
  317. # [11:56] <Philip`> jfkthame: Thanks for pointing me at the bug :-)
  318. # [11:58] <gsnedders> Oh, yeah, it's Thursday today.
  319. # [11:58] * gsnedders forgot
  320. # [12:02] * Joins: doublec (n=doublec@118-93-172-205.dsl.dyn.ihug.co.nz)
  321. # [12:02] * Quits: MikeSmith (n=MikeSmit@dhcp-246-223.mag.keio.ac.jp) ("Tomorrow to fresh woods, and pastures new.")
  322. # [12:03] * Philip` fixes his code to drop the kern table
  323. # [12:03] <Philip`> I wonder how many OS X users' browsers my page crashed before that fix
  324. # [12:06] <jgraham> gsnedders: You never could get the hang of Thursdays?
  325. # [12:07] <gsnedders> jgraham: Nah, it's just it's school holidays so I have no concept of time anymore.
  326. # [12:07] * Philip` is not on holiday but has no concept of time anyway
  327. # [12:22] * Joins: mpilgrim (n=mark@rrcs-96-10-240-189.midsouth.biz.rr.com)
  328. # [12:22] * Quits: Guest91020 (n=mark@rrcs-96-10-240-189.midsouth.biz.rr.com) (Read error: 104 (Connection reset by peer))
  329. # [12:22] * Joins: myakura (n=myakura@p1063-ipbf3305marunouchi.tokyo.ocn.ne.jp)
  330. # [12:36] * Joins: nessy (n=nessy@124-168-176-139.dyn.iinet.net.au)
  331. # [12:43] * Joins: rubys (n=rubys@cpe-075-182-092-038.nc.res.rr.com)
  332. # [12:49] * Quits: webben (n=benh@nat/yahoo/x-4ee0609040efaccc) (Read error: 60 (Operation timed out))
  333. # [12:52] <eighty4> gsnedders: so no concept of time == holiday?
  334. # [12:59] * Joins: zcorpan (n=zcorpan@pat.se.opera.com)
  335. # [13:01] * Joins: webben (n=benh@nat/yahoo/x-af5e1af47e9e04d2)
  336. # [13:10] <gsnedders> eighty4: No, if holiday then no concept of time. The inverse does not hold true.
  337. # [13:11] <eighty4> :)
  338. # [13:11] <eighty4> I'm having 2 days completely alone tomorrow... during that time I'll probably have no concept of time
  339. # [13:18] * Joins: Maurice (n=ano@a80-101-46-164.adsl.xs4all.nl)
  340. # [13:22] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  341. # [13:23] * Joins: remysharp (n=remyshar@remysharp.plus.com)
  342. # [13:23] <remysharp> Is this a sensible place to ask html5 questions - in particular about using it for new sites/pages?
  343. # [13:24] <gsnedders> remysharp: Yes
  344. # [13:24] <gsnedders> remysharp: But I would say that, wouldn't I? :)
  345. # [13:24] <jgraham> gsnedders: The right answer was no. We fail on the first clause (Is this a sensible place)
  346. # [13:24] <gsnedders> jgraham: True.
  347. # [13:25] * Parts: zcorpan (n=zcorpan@pat.se.opera.com)
  348. # [13:25] <remysharp> Hmm, okay - how about senseless questions?
  349. # [13:25] <jgraham> remysharp: However it is still a good place to ask HTML 5 questions
  350. # [13:25] <gsnedders> jgraham: But giving the right answer would be sensible.
  351. # [13:25] <jgraham> Perhaps the best
  352. # [13:25] <jgraham> Just note the /topic
  353. # [13:26] * Joins: trovster (n=trovster@iweb-adsl.demon.co.uk)
  354. # [13:26] <remysharp> So, in the <footer> element, if I have a list of elements -
  355. # [13:26] * Joins: mikos1 (n=mikos@87.84.153.18)
  356. # [13:26] <remysharp> As per this example: http://dev.w3.org/html5/spec/Overview.html#the-nav-element
  357. # [13:26] <remysharp> shouldn't the list be in a nav element?
  358. # [13:26] * Joins: matthewknight (n=mattknig@88-97-40-88.dsl.zen.co.uk)
  359. # [13:26] * Quits: webben (n=benh@nat/yahoo/x-af5e1af47e9e04d2) (Read error: 110 (Connection timed out))
  360. # [13:26] * remysharp hmm - was that the right link...
  361. # [13:27] <gsnedders> remysharp: "Not all groups of links on a page need to be in a nav element — only sections that consist of primary navigation blocks are appropriate for the nav element."
  362. # [13:27] <gsnedders> remysharp: They aren't primary navigation links
  363. # [13:27] <remysharp> gsnedders: ah, right, so only "primary" - great - thanks.
  364. # [13:28] <remysharp> So I've been using a few live examples around the web to help me code my html5 page -
  365. # [13:28] <remysharp> and I've been looking at the uxlondon.com site -
  366. # [13:28] <remysharp> which uses the <div class="section"> method to get around having to use JS to trigger IE to see html
  367. # [13:28] <remysharp> so - my question is -
  368. # [13:28] <jgraham> (On the other hand it does make some sense that the primary navigation might be partially in the footer so the content model restriction maybe doesn't make sense)
  369. # [13:29] <remysharp> is there a real reason why all their 'section's contain 'article's?
  370. # [13:29] <remysharp> I wouldn't have always nested an article in a section
  371. # [13:29] * Quits: harig (n=opera@59.90.71.35) (Connection timed out)
  372. # [13:29] <remysharp> and the spec description of an article is user-generated content or an article, blog entry, etc.
  373. # [13:30] <remysharp> example: http://uxlondon.com/speakers/
  374. # [13:30] <jgraham> remysharp: At first glance, no
  375. # [13:30] <remysharp> any thoughts on that at all? or perhaps just a design choice they went with
  376. # [13:31] <gsnedders> At a further glance, no.
  377. # [13:31] <jgraham> remysharp: I think their markup is unnecessarily redundant
  378. # [13:31] <remysharp> okay, good - that's what I was thinking too - so at least I'm partially following it.
  379. # [13:31] <remysharp> Using the same page as an example (the output)
  380. # [13:32] <remysharp> I am marking up a speakers page for my own project:
  381. # [13:32] <remysharp> and I've got a list of speakers - which normally I'd put in a <ul>
  382. # [13:32] <remysharp> however - I kind of want to put each one in a <section> element with the whole thing nested inside an <article> element
  383. # [13:33] * Quits: matthewknight (n=mattknig@88-97-40-88.dsl.zen.co.uk)
  384. # [13:33] <remysharp> would that make sense? (or would you like a quick example of what I mean?)
  385. # [13:33] <gsnedders> You're asking about sense again…
  386. # [13:33] <remysharp> !! :-D
  387. # [13:33] <remysharp> yeah, sorry!
  388. # [13:33] * gsnedders reads the actual question
  389. # [13:33] * Joins: harig (n=opera@59.90.71.35)
  390. # [13:34] <jgraham> remysharp: Something like <article><h2>Speakers</h2><section><h3>J. Smith</h3>[...]
  391. # [13:34] <remysharp> I suspect a quick mock of the markup I'm suggesting might help
  392. # [13:34] <remysharp> yeah - jgraham that looks like what I was thinking
  393. # [13:34] <remysharp> so no use of <ul> at all
  394. # [13:34] <jgraham> It's not really clear why the outside bit would be <article> rather than <section> but I guess it barely makes any difference anyway
  395. # [13:34] <remysharp> but my normal html4 approach would be to use a list element, but with html5, I kinda don't want to
  396. # [13:34] <gsnedders> remysharp: I'd go for <article><h2>Speakers</h2><dl><dt>Mr John Smith<dd><p>Mr John Smith is awesome.<p>He even wrote <cite>My Magical Wonderland</cite></dl></article>
  397. # [13:35] <remysharp> but isn't Mr John Smith a header within a section?
  398. # [13:35] <jgraham> gsnedders: Doesn't work so well if you ant to generate a toc that has the speakers listed
  399. # [13:35] <jgraham> *want
  400. # [13:35] * gsnedders shrugs
  401. # [13:35] <beowulf> i'd go with gsnedders
  402. # [13:36] <beowulf> in terms of that question :)
  403. # [13:36] <jgraham> remysharp: I think, assuming each speaker will have a little description, that using section+headers is fine
  404. # [13:36] <beowulf> though i'd probably s/article/section
  405. # [13:36] <gsnedders> jgraham: I do have a vague clue about what the outlining algorithm says :P
  406. # [13:37] * Joins: matthewknight (n=matthewk@88-97-40-88.dsl.zen.co.uk)
  407. # [13:37] <remysharp> damn - is there a preferred pastebin?
  408. # [13:38] <gsnedders> The one that Google takes you to when you click on "I'm feeling lucky!" because it takes the least amount of effort to find.
  409. # [13:38] <jgraham> remysharp: Try your markup in http://gsnedders.html5.org/outliner/
  410. # [13:38] <remysharp> jgraham: ta
  411. # [13:38] <gsnedders> (OMG! I WROTE THAT1)
  412. # [13:38] <jgraham> remysharp: (That is not a pastebin)
  413. # [13:38] <remysharp> yeah, sure - oh
  414. # [13:38] <remysharp> there's not going to be url is there?
  415. # [13:38] <remysharp> hixie has one I believe - that saves the url...trying that
  416. # [13:38] <jgraham> remysharp: Ask gsnedders :)
  417. # [13:39] <jgraham> Oh, yeah you could use the LDV
  418. # [13:39] * Parts: trovster (n=trovster@iweb-adsl.demon.co.uk)
  419. # [13:39] <gsnedders> remysharp: Just allowing a textarea?
  420. # [13:39] <gsnedders> Yeah, that's on my to-do list.
  421. # [13:39] * gsnedders has quite a lot on his to-do list, though
  422. # [13:40] * Joins: markj (n=MJ@93-97-166-243.zone5.bethere.co.uk)
  423. # [13:40] <remysharp> right - there you go:
  424. # [13:40] <remysharp> http://tr.im/iv56
  425. # [13:42] <beowulf> i'd say the first header is redundant unless it wraps the <p> and that the outer article is a section
  426. # [13:42] <jgraham> remysharp: No need for the <header> element unless you plan to make a subheading
  427. # [13:42] <gsnedders> The header elements are both needless
  428. # [13:42] <beowulf> but i'm easily the dumbest person in the room, just to clarify
  429. # [13:43] <remysharp> There would obviously be a header element at the top of the page
  430. # [13:43] <remysharp> but you're saying the name shouldn't be in a header
  431. # [13:43] <jgraham> remysharp: No, no no :)
  432. # [13:44] <gsnedders> I'm saying you gain nothing by having it in a header element
  433. # [13:44] <jgraham> <header><h2>Foo</h2></header> === <h2>Foo</h2>
  434. # [13:44] <gsnedders> A header element only is of use when you have a heading and subheading and you only want the heading to appear in the TOC
  435. # [13:44] <remysharp> sorry, yeah, doesn't gain anything
  436. # [13:44] <jgraham> The extra <header> is redundant
  437. # [13:44] <gsnedders> <header><h2>Foo</h2><h3>Bar</h3></header> === <h2>Foo</h2> in terms of TOC too
  438. # [13:45] <remysharp> gsnedders: why does the h3 text get lost then?
  439. # [13:45] <remysharp> (in the TOC)
  440. # [13:45] <remysharp> or does it read the highest level heading and use that?
  441. # [13:45] <gsnedders> jgraham: Explain!
  442. # [13:45] <gsnedders> :P
  443. # [13:45] <gsnedders> remysharp: The highest level heading is all that's used
  444. # [13:45] <remysharp> cool - that makes sense.
  445. # [13:46] <remysharp> so if a TOC was generated from that page - if I omitted the <header> on the names of speakers, their names wouldn't appear in the TOC - is that right?
  446. # [13:46] <beowulf> gsnedders: what does <header><h1>Foo</h1><h1>Bar</h1></header> come out as in the TOC?
  447. # [13:46] <gsnedders> beowulf: Foo
  448. # [13:47] <gsnedders> remysharp: They would. The h3 element would make them.
  449. # [13:47] <remysharp> right - got you.
  450. # [13:47] <remysharp> so header is only if there's mixed content and you want a specific (or the highest) to be used.
  451. # [13:47] <remysharp> that makes sense as to why it's utterly redundant in my example.
  452. # [13:47] <gsnedders> beowulf: (It's the first highest order header)
  453. # [13:48] <gsnedders> (There are, however, bugs in what the spec currently says.)
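The rule described above (only the first heading of the highest rank inside a <header> feeds the TOC) can be sketched roughly as follows. This is an illustration only, not the spec's outlining algorithm; the function name and the (tag, text) input shape are made up for the example.

```python
def header_toc_entry(headings):
    """Return the text of the first highest-ranked heading, or None.

    `headings` is a hypothetical list of (tag, text) pairs in document
    order, e.g. [("h2", "Foo"), ("h3", "Bar")].
    """
    best_rank = None
    best_text = None
    for tag, text in headings:
        rank = int(tag[1])  # "h1" -> 1; a lower number means a higher rank
        # Strict "<" keeps the first heading when two share the same rank.
        if best_rank is None or rank < best_rank:
            best_rank, best_text = rank, text
    return best_text


print(header_toc_entry([("h2", "Foo"), ("h3", "Bar")]))  # Foo
print(header_toc_entry([("h1", "Foo"), ("h1", "Bar")]))  # Foo
```

Both calls mirror the two cases asked about in the channel: h2 plus h3, and two h1 elements.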
  454. # [13:48] <remysharp> So, on that same topic, would it be fair to say that in this example, the <header> is redundant:
  455. # [13:48] <remysharp> <header><h1>My site</h1><p>Tag line</p></header>
  456. # [13:49] <gsnedders> remysharp: No
  457. # [13:49] <jgraham> remysharp: Technically, no
  458. # [13:49] <gsnedders> (the p element will not be associated with any section, IIRC)
  459. # [13:50] <jgraham> It won't have any observable effect on the outline, but semantically it is right
  460. # [13:50] <gsnedders> http://www.w3.org/Bugs/Public/show_bug.cgi?id=6750
  461. # [13:50] <jgraham> Oh, maybe it does have some observable effect
  462. # [13:50] <jgraham> then
  463. # [13:50] <gsnedders> jgraham: Not really
  464. # [13:50] <gsnedders> jgraham: Unless you have a UI which shows what element is linked to what section, it doesn't.
  465. # [13:50] * Quits: matthewknight (n=matthewk@88-97-40-88.dsl.zen.co.uk) ("Get Colloquy for iPhone! http://mobile.colloquy.info/")
  466. # [13:50] <jgraham> gsnedders: That is an observable effect
  467. # [13:51] <gsnedders> Well, not in any current implementation of the algorithm, seeing as both of our implementations just build a TOC.
  468. # [13:51] * Joins: matthewknight (n=matthewk@82.132.136.216)
  469. # [13:51] <jgraham> gsnedders: In principle though
  470. # [13:51] <remysharp> just backpedalling to the question about using sections within an article instead of a <ul><li> collection - does that look right in essence?
  471. # [13:52] <jgraham> remysharp: Yes
  472. # [13:52] * Joins: ap (n=ap@194.154.88.36)
  473. # [13:52] <jgraham> It is better than <ul> for sure
  474. # [13:52] <remysharp> awesome. total head f**k based on getting used to using lists for everything, but feel right.
  475. # [13:52] <remysharp> *feels right
  476. # [13:53] <beowulf> jgraham: why is it better than a list?
  477. # [13:54] <jgraham> beowulf: Even extant AT will allow you to navigate easily by header elements, for example
  478. # [13:54] * Quits: matthewknight (n=matthewk@82.132.136.216) (Client Quit)
  479. # [13:55] * Joins: MikeSmith (n=MikeSmit@EM114-48-72-144.pool.e-mobile.ne.jp)
  480. # [13:56] * Joins: matthewknight (n=mattknig@88-97-40-88.dsl.zen.co.uk)
  481. # [14:06] <Philip`> If you want to generate a TOC of all the speakers, use <span class="author"> and then write a script that extracts all the author names and sticks them into a TOC list
  482. # [14:06] <Philip`> which is, like, two lines of code
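A minimal sketch of the idea, in Python rather than a page script. It assumes well-formed markup so the standard library's ElementTree can parse it; the sample page and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; a real page would need an HTML parser.
PAGE = """<article>
  <h2>Speakers</h2>
  <p><span class="author">J. Smith</span> talks about forms.</p>
  <p><span class="author">A. Jones</span> talks about <span class="topic">canvas</span>.</p>
</article>"""


def author_names(markup):
    """Collect the text of every <span> whose class list contains "author"."""
    root = ET.fromstring(markup)
    return ["".join(span.itertext()).strip()
            for span in root.iter("span")
            if "author" in span.get("class", "").split()]


print(author_names(PAGE))  # ['J. Smith', 'A. Jones']
```

The resulting list can then be written out as whatever TOC markup is wanted.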
  483. # [14:07] <hsivonen> Hmm. Is keygen really meant to differ from input 'in select'?
  484. # [14:14] * Quits: bzed (n=bzed@devel.recluse.de) (kubrick.freenode.net irc.freenode.net)
  485. # [14:14] * Quits: heycam (n=cam@210-84-43-129.dyn.iinet.net.au) (kubrick.freenode.net irc.freenode.net)
  486. # [14:14] * Quits: atwilson (n=atwilson@74.125.59.1) (kubrick.freenode.net irc.freenode.net)
  487. # [14:14] * Quits: campd (n=dave@li5-166.members.linode.com) (kubrick.freenode.net irc.freenode.net)
  488. # [14:14] * Joins: heycam (n=cam@210-84-43-129.dyn.iinet.net.au)
  489. # [14:14] * Joins: atwilson (n=atwilson@74.125.59.1)
  490. # [14:14] * Joins: bzed (n=bzed@devel.recluse.de)
  491. # [14:14] * Joins: campd (n=dave@li5-166.members.linode.com)
  492. # [14:15] * Quits: harig (n=opera@59.90.71.35) (Read error: 110 (Connection timed out))
  493. # [14:20] * Joins: webben (n=benh@nat/yahoo/x-b29ba6c8ab8e0bfd)
  494. # [14:24] <MikeSmith> hsivonen: I wondered about that too, when looking at your treebuilder code
  495. # [14:24] <MikeSmith> maybe worth a bugzilla to clarify
  496. # [14:28] <hsivonen> http://software.hixie.ch/utilities/js/live-dom-viewer/saved/73
  497. # [14:29] * Joins: taf2 (n=taf2@65.210.82.235)
  498. # [14:29] * Quits: matthewknight (n=mattknig@88-97-40-88.dsl.zen.co.uk)
  499. # [14:29] <hsivonen> like <input> in Gecko. Not like <input> in Opera and Safari.
  500. # [14:29] * Joins: matthewknight (n=matthewk@82.132.136.217)
  501. # [14:29] * Quits: matthewknight (n=matthewk@82.132.136.217) (Client Quit)
  502. # [14:32] <hsivonen> MikeSmith: bug filed just in case
  503. # [14:35] * Joins: matthewknight (n=matthewk@82.132.136.216)
  504. # [14:35] * Joins: pauld (n=pauld@194.102.13.6)
  505. # [14:35] * Joins: harig (n=opera@59.90.71.35)
  506. # [14:38] <gsnedders> MikeSmith: Which Irish author is it you want me to read, again?
  507. # [14:42] * Quits: matthewknight (n=matthewk@82.132.136.216)
  508. # [14:48] * Joins: matthewknight (n=matthewk@82.132.136.216)
  509. # [14:48] * Parts: rubys (n=rubys@cpe-075-182-092-038.nc.res.rr.com)
  510. # [14:51] <hsivonen> could someone please point out to me why the following XSD regexp: "\s*(none|xMinYMin|xMidYMin|xMaxYMin|xMinYMid|xMidYMid|xMaxYMid|xMinYMax|xMidYMax|xMaxYMax)\s+(meet|slice)?\s*"
  511. # [14:51] <hsivonen> does not match the word 'none'?
  512. # [14:51] * Quits: jfkthame (n=jonathan@user-5447749d.wfd89a.dsl.pol.co.uk)
  513. # [14:52] <hsivonen> ooh
  514. # [14:52] <hsivonen> now I see it
  515. # [14:52] <hsivonen> \s+
  516. # [14:53] <hsivonen> that's a classing mis-use of XSD \s, BTW
  517. # [14:53] <hsivonen> XSD \s does *not* equal XML whitespace
  518. # [14:54] <gsnedders> WHAT!?
  519. # [14:54] <hsivonen> gsnedders: welcome to the world of i18n political correctness. Zs FTW!
  520. # [14:54] <gsnedders> Not White_Space and not Zs?
  521. # [14:54] <hsivonen> gsnedders: IIRC, Zs
  522. # [14:55] <gsnedders> That doesn't even include U+000A IIRC!
  523. # [14:55] <hsivonen> s/classing/classical/
  524. # [14:56] <hsivonen> s/al// I suppose
  525. # [14:56] <gsnedders> Yeah
  526. # [14:56] <gsnedders> It's now right :)
  527. # [14:56] * Philip` would be unable to resist the temptation to write that regexp with (none|x(Min|Mid|Max)Y(Min|Mid|Max))
  528. # [14:57] <Philip`> (Fortunately I would be able to resist (none|xM(in|id|ax)YM(in|id|ax)) because that's just crazy)
  529. # [14:59] * Joins: bgalbraith (n=bgalbrai@c-71-202-109-116.hsd1.ca.comcast.net)
  530. # [14:59] <Philip`> (Shouldn't the regexp start with "(defer\s+)?"?)
  531. # [15:01] <hsivonen> defer?
  532. # [15:02] <Philip`> That's what http://www.w3.org/TR/SVG/coords.html#PreserveAspectRatioAttribute says
  533. # [15:03] <hsivonen> Philip`: good point.
  534. # [15:04] <hsivonen> is defer only allowed on <image>?
  535. # [15:04] <Philip`> Sounds like it's allowed everywhere, but ignored except on <image>
  536. # [15:05] <hsivonen> heycam: ^
  537. # [15:05] <hsivonen> heycam: the delta of the V.nu copy of the SVG 1.1 schema and the W3C copy is growing
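The `\s+` problem can be reproduced outside XSD. The sketch below uses Python's `re` with `fullmatch` to stand in for XSD's implicit anchoring, and `re.ASCII` to keep `\s` narrow; both are assumptions of this example, since Python and XSD regex dialects differ. `REPAIRED` is one possible fix that also allows the optional `defer` keyword, not the pattern any schema actually adopted.

```python
import re

ALIGN = ("none|xMinYMin|xMidYMin|xMaxYMin|xMinYMid|xMidYMid|xMaxYMid"
         "|xMinYMax|xMidYMax|xMaxYMax")

# The pattern as quoted in the log: \s+ makes whitespace after the align
# keyword mandatory even when meet/slice is omitted.
QUOTED = re.compile(r"\s*(" + ALIGN + r")\s+(meet|slice)?\s*", re.ASCII)

# One possible repair (an assumption of this sketch): require the
# whitespace only together with meet/slice, and allow a leading "defer".
REPAIRED = re.compile(
    r"\s*(defer\s+)?(" + ALIGN + r")(\s+(meet|slice))?\s*", re.ASCII)

for value in ["none", "none ", "xMidYMid meet", "defer xMidYMid slice", "nonemeet"]:
    print(repr(value),
          bool(QUOTED.fullmatch(value)),
          bool(REPAIRED.fullmatch(value)))
```

With the quoted pattern, `"none"` fails while `"none "` (with a trailing space) passes, which is the behaviour that prompted the question.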
  538. # [15:05] * Quits: nessy (n=nessy@124-168-176-139.dyn.iinet.net.au) ("This computer has gone to sleep")
  539. # [15:06] * Joins: zdobersek (n=zan@cpe-92-37-73-187.dynamic.amis.net)
  540. # [15:07] <hsivonen> MikeSmith: I deployed your recent checkins.
  541. # [15:07] * Joins: mgrdcm (n=mgrdcm@65.111.247.194)
  542. # [15:13] <MikeSmith> hsivonen: thanks
  543. # [15:14] <MikeSmith> gsnedders: Flann O'Brien
  544. # [15:18] * Quits: doublec (n=doublec@118-93-172-205.dsl.dyn.ihug.co.nz) ("Leaving")
  545. # [15:18] <heycam> hsivonen, spec problem?
  546. # [15:19] <heycam> or just a problem with the SVG 1.1 DTD not being as restrictive as it could be?
  547. # [15:20] * Quits: webben (n=benh@nat/yahoo/x-b29ba6c8ab8e0bfd) (Read error: 110 (Connection timed out))
  548. # [15:31] * Joins: pesla\work (n=retep@procurios.xs4all.nl)
  549. # [15:31] * Quits: pesla (n=retep@procurios.xs4all.nl) (Read error: 60 (Operation timed out))
  550. # [15:33] * Joins: campd_ (n=dave@li5-166.members.linode.com)
  551. # [15:35] * Quits: campd (n=dave@li5-166.members.linode.com) (Read error: 104 (Connection reset by peer))
  552. # [15:40] * Joins: smedero (n=smedero@pia145-154.pioneernet.net)
  553. # [15:43] * Quits: mikos1 (n=mikos@87.84.153.18)
  554. # [15:45] <hsivonen> heycam: schema bug
  555. # [15:45] <hsivonen> heycam: schema bug in the RELAX NG schema
  556. # [16:03] <heycam> hsivonen, ok
  557. # [16:03] <heycam> so that rng isn't really official
  558. # [16:03] * Quits: matthewknight (n=matthewk@82.132.136.216)
  559. # [16:04] <heycam> we're going to be making a new one soon, for 1.1, but starting from the 1.2T rng
  560. # [16:06] <heycam> ah i see that regex you quote is from that unofficial rng
  561. # [16:06] <gsnedders> MikeSmith: Well, it's my birthday in just over a week ;P
  562. # [16:07] <jgraham> gsnedders: How old are you going to be? 5? 6? I lose track
  563. # [16:07] <gsnedders> jgraham: 7
  564. # [16:07] <gsnedders> Sorry, I lie. 10, in an as of yet undecided base.
  565. # [16:08] <heycam> hsivonen, erk, seems like the regex in the 1.2T relaxng is horribly wrong!
  566. # [16:08] <heycam> \s*(none|xMidYMid)\s*(meet)?\s*
  567. # [16:08] * heycam raises an issue
  568. # [16:09] * Joins: krikey (n=test@93-97-166-243.zone5.bethere.co.uk)
  569. # [16:10] <krikey> I was sat like billy no mates in #html5
  570. # [16:10] <krikey> someone could have coma and got me :)
  571. # [16:10] <heycam> http://www.w3.org/Graphics/SVG/WG/track/issues/2257
  572. # [16:10] <jgraham> Hard to get someone if you're in a coma
  573. # [16:11] <krikey> ye
  574. # [16:11] * heycam goes to watch some more daily show
  575. # [16:24] * Quits: mgrdcm (n=mgrdcm@65.111.247.194)
  576. # [16:28] * Joins: mgrdcm (n=mgrdcm@65.111.247.194)
  577. # [16:39] * Quits: bgalbraith (n=bgalbrai@c-71-202-109-116.hsd1.ca.comcast.net)
  578. # [16:42] * Quits: myakura (n=myakura@p1063-ipbf3305marunouchi.tokyo.ocn.ne.jp) ("Leaving...")
  579. # [16:45] * Joins: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net)
  580. # [16:46] * Quits: mpt (n=mpt@canonical/launchpad/mpt) (Remote closed the connection)
  581. # [16:56] * Joins: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  582. # [17:03] * Joins: arun_ (n=arun@adsl-76-220-108-134.dsl.pltn13.sbcglobal.net)
  583. # [17:03] * Quits: Maurice (n=ano@a80-101-46-164.adsl.xs4all.nl) ("Disconnected...")
  584. # [17:05] * Joins: tndH (n=Rob@77.86.124.251)
  585. # [17:07] * Joins: billmason1 (n=billmaso@ip98.unival.com)
  586. # [17:09] * Quits: pauld (n=pauld@194.102.13.6)
  587. # [17:09] * Joins: bgalbraith (n=bgalbrai@corp-241.mountainview.mozilla.com)
  588. # [17:14] * Quits: svl (n=chatzill@a194-109-2-36.dmn.xs4all.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
  589. # [17:15] * Quits: tndH (n=Rob@77.86.124.251) ("ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406]")
  590. # [17:15] * Joins: dave_levin (n=dave_lev@72.14.224.1)
  591. # [17:17] * Quits: dglazkov (n=dglazkov@c-98-207-88-44.hsd1.ca.comcast.net)
  592. # [17:28] * Quits: zdobersek (n=zan@cpe-92-37-73-187.dynamic.amis.net) ("Leaving.")
  593. # [17:31] * Quits: adambeynon (n=adambeyn@94-194-177-54.zone8.bethere.co.uk) (Success)
  594. # [17:31] * Joins: zdobersek (n=zan@cpe-92-37-73-187.dynamic.amis.net)
  595. # [17:33] * Joins: adambeynon (n=adambeyn@94-194-177-54.zone8.bethere.co.uk)
  596. # [17:34] <Philip`> heycam: That issue fails to mention that it accepts strings like "nonemeet" too
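A quick check of that observation, using Python's `re.fullmatch` as a stand-in for XSD's anchored matching (an assumption of this sketch; the two regex dialects are not identical):

```python
import re

# The 1.2T pattern as quoted in the log.
T12 = re.compile(r"\s*(none|xMidYMid)\s*(meet)?\s*", re.ASCII)

for value in ["nonemeet", "xMidYMid meet", "xMinYMin", "none slice"]:
    print(repr(value), bool(T12.fullmatch(value)))
```

Because the separator is `\s*` rather than `\s+`, `"nonemeet"` is accepted, while valid values such as `"xMinYMin"` and `"none slice"` are rejected.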
  597. # [17:34] <jgraham> http://www.ecma-international.org/news/PressReleases/PR_Ecma_finalises_major_revision_of_ECMAScript.htm
  598. # [17:35] * Joins: cryzed (n=cryzed@i538731F9.versanet.de)
  599. # [17:35] <cryzed> Hey :)
  600. # [17:35] <cryzed> Is someone from the html5lib for Python here?
  601. # [17:35] <jgraham> cryzed: Yes
  602. # [17:36] <cryzed> great :)
  603. # [17:36] <jgraham> "TC39 members will create and test implementations of the candidate specification to verify its correctness and the feasibility of creating interoperable implementations". I wonder if they mean interoperable implementations that can ship on the web
  604. # [17:36] <cryzed> So basically I want to know
  605. # [17:36] * Quits: markj (n=MJ@93-97-166-243.zone5.bethere.co.uk)
  606. # [17:37] <cryzed> Do I get the lxml.html parser with this : html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml"))?
  607. # [17:37] <Philip`> jgraham: Try reading the next sentence
  608. # [17:37] <Philip`> "The test implementations will also be used for web compatibility testing to ensure that the revised specification remains compatible with existing web applications."
  609. # [17:37] <jgraham> Philip`: Oh.
  610. # [17:38] <Philip`> cryzed: No - that uses html5lib's HTML5 parser (and constructs an lxml document from it), not lxml's non-standard HTML parser
  611. # [17:38] <jgraham> cryzed: Yes, unless you use the latest svn in which case the best option is to use html5lib.parse(input, tree="lxml")
  612. # [17:38] * Joins: dglazkov (n=dglazkov@nat/google/x-c1832c16b3d1781a)
  613. # [17:39] <cryzed> Well
  614. # [17:39] <cryzed> what now?
  615. # [17:39] <jgraham> Oh, sorry, I misunderstood the question
  616. # [17:39] <jgraham> cryzed: What do you actually want to do
  617. # [17:39] <cryzed> wait
  618. # [17:39] <cryzed> http://paste.pocoo.org/show/ad9DwVRGDKIhXNgqSk8j/
  619. # [17:39] * jgraham is not doing very well at reading at the moment
  620. # [17:39] <cryzed> I want to parse my blog with the html5lib
  621. # [17:40] <cryzed> and then scrape it with the resulting elementtree
  622. # [17:40] <cryzed> Unfortunately I don't really find any documentation for the ElementTree except http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class
  623. # [17:40] <jgraham> cryzed: That looks vaguely sensible. What is the problem?
  624. # [17:41] <cryzed> http://paste.pocoo.org/show/KHOsRJHJNnHNvS9ec0Bg/ that should work
  625. # [17:41] <cryzed> doesn't though
  626. # [17:41] <jgraham> http://codespeak.net/lxml/tutorial.html
  627. # [17:41] <cryzed> http://paste.pocoo.org/show/ld7f4BUjcUZbnyFG4vGd/
  628. # [17:41] * Quits: Amorphous (i=jan@unaffiliated/amorphous) (Read error: 110 (Connection timed out))
  629. # [17:41] <cryzed> ah...
  630. # [17:41] <cryzed> the case..
  631. # [17:42] <jgraham> findall()?
  632. # [17:42] <cryzed> yes
  633. # [17:42] <cryzed> and I DO need to supply
  634. # [17:42] <cryzed> an argument
  635. # [17:42] <cryzed> which argument do I need to supply to find all tags?
  636. # [17:42] <cryzed> .
  637. # [17:42] <cryzed> ?
  638. # [17:43] <jgraham> if you just want all the child nodes you can just do "for item in element:"
  639. # [17:43] <cryzed> etree.findall(".//*"): that works as well
  640. # [17:43] <cryzed> thanks though jgraham :)
  641. # [17:43] <cryzed> Is the argument which I pass to findall called a "xpath"?
  642. # [17:43] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
  643. # [17:43] <cryzed> ElementPath
  644. # [17:44] <cryzed> found it..
  645. # [17:44] <cryzed> sorry I seem to ask only stupid questions
  646. # [17:44] * Quits: pesla\work (n=retep@procurios.xs4all.nl) ("( www.nnscript.com :: NoNameScript 4.21 :: www.esnation.com )")
  647. # [17:44] <jgraham> cryzed: If you know xpath and are using lxml you can do element.xpath(xpath_expression)
  648. # [17:45] <cryzed> Or I can just use the ElementPath?
  649. # [17:45] <cryzed> http://effbot.org/zone/element-xpath.htm
  650. # [17:45] <jgraham> e.g. element.xpath(".//a") finds all a descendants
  651. # [17:45] <jgraham> cryzed: ElementPaths are like a subset of XPath 1.0
  652. # [17:46] <cryzed> Hrmm, I don't think I do need full xpath support, thanks for the tip though
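The three ways of getting at elements mentioned above (iterating an element, `findall(".//*")`, and a tag path) look roughly like this. The sketch uses the standard library's `xml.etree.ElementTree` on a made-up, well-formed fragment so it runs without lxml; lxml's etree follows the same ElementTree API for these calls and adds `.xpath()` for full XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for a parsed page.
root = ET.fromstring("<div><p>one <a href='#x'>link</a></p><p>two</p></div>")

# Iterating an element yields its direct children only.
children = [child.tag for child in root]

# ".//*" is an ElementPath expression selecting every descendant element.
descendants = [el.tag for el in root.findall(".//*")]

# ".//a" selects all <a> descendants, at any depth.
links = [a.get("href") for a in root.findall(".//a")]

print(children)     # ['p', 'p']
print(descendants)  # ['p', 'a', 'p']
print(links)        # ['#x']
```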
  653. # [17:46] <cryzed> jgraham, I read on the lxml.html documentation
  654. # [17:46] <cryzed> about the following function:
  655. # [17:46] <cryzed> *method
  656. # [17:46] <cryzed> .text_content():
  657. # [17:46] <jgraham> Also if you use a really up to date lxml you can probably get CSS Selectors
  658. # [17:47] <cryzed> this isn't available in the lxml etree, right?
  659. # [17:47] <jgraham> cryzed: html5lib just generates an lxml tree. It has all the features of whichever lxml you have installed
  660. # [17:47] <cryzed> well, yes
  661. # [17:47] <cryzed> the problem is
  662. # [17:48] <cryzed> lxml.html
  663. # [17:48] <cryzed> the html Etree is a special tree
  664. # [17:48] <cryzed> How do I tell html5lib
  665. # [17:48] <cryzed> to use the lxml.html tree?
  666. # [17:48] <jgraham> Oh, yeah
  667. # [17:48] <smedero> jgraham: I don't think the .text_content() method exists in lxml.etree
  668. # [17:48] <cryzed> Is there any way to use the lxml.html tree?
  669. # [17:48] <jgraham> So, I don't think you can at the moment because of some weirdness in the way that lxml is set up
  670. # [17:48] <cryzed> I think this would be really comfortable for webscraping
  671. # [17:49] <jgraham> At least that is my recollection from when I implemented this stuff a while ago
  672. # [17:49] <Philip`> etree.tostring(node, method='text')
  673. # [17:49] <Philip`> might be similar to node.text_content()
  674. # [17:49] <smedero> yeah, that should be in the ballpark
  675. # [17:49] <jgraham> It's something like you can't create comments in lxml.html or...
  676. # [17:49] <jgraham> .xpath(".//text()) works
  677. # [17:50] <jgraham> .xpath(".//text()")
  678. # [17:50] <cryzed> thanks
  679. # [17:50] <cryzed> I found in the lxml.html implementation the following
  680. # [17:50] <cryzed> _collect_string_content(self)
  681. # [17:50] <cryzed> should work if there is no other way
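For a plain ElementTree-style tree, one way to approximate `.text_content()` without reaching into private helpers is to join `itertext()`. The sketch below uses the standard library's ElementTree and a made-up fragment; the helper name is hypothetical, and lxml's etree elements offer `itertext()` too.

```python
import xml.etree.ElementTree as ET


def text_content(element):
    """Concatenate all text inside `element`, in document order, without markup."""
    return "".join(element.itertext())


node = ET.fromstring(
    "<blockquote>He even wrote <cite>My Magical Wonderland</cite>.</blockquote>")
print(text_content(node))  # He even wrote My Magical Wonderland.
```

Unlike some serializer-based approaches, `itertext()` never includes the element's own tail text.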
  682. # [18:00] * Joins: davidb (n=davidb@bas4-toronto06-1279310294.dsl.bell.ca)
  683. # [18:07] <cryzed> the .text attribute
  684. # [18:07] <cryzed> works beautifully
  685. # [18:09] <cryzed> ..not
  686. # [18:10] * Joins: aroben_ (n=aroben@unaffiliated/aroben)
  687. # [18:10] <jgraham> cryzed: for <a><b>foo</b>bar</a> a.text == None
  688. # [18:10] <jgraham> b.text == "foo"
  689. # [18:10] <jgraham> b.tail =="bar"
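The same example, runnable. It uses the standard library's ElementTree, which shares the `.text` / `.tail` model with lxml:

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b>foo</b>bar</a>")
b = a[0]

print(a.text)  # None: there is no text between <a> and its first child
print(b.text)  # foo
print(b.tail)  # bar: text after </b> is stored on b, not on a
```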
  690. # [18:10] * aroben_ is now known as aroben
  691. # [18:10] * Joins: onar (n=onar@17.244.68.238)
  692. # [18:11] <cryzed> is there any way to get the whole text and protect the formatting?
  693. # [18:11] <cryzed> for example replace <br>
  694. # [18:11] <cryzed> with \n
  695. # [18:11] <jgraham> cryzed: Not an easy way that I know of
  696. # [18:11] <cryzed> hrm
  697. # [18:11] <jgraham> You would need to walk the tree, normalize whitespace and make whatever replacements you want
  698. # [18:13] <cryzed> http://paste.pocoo.org/show/5gbDet52tiXSMRWkLZ4y/ ?
  699. # [18:13] * Quits: harig (n=opera@59.90.71.35) (Read error: 110 (Connection timed out))
  700. # [18:13] * Joins: Maurice (i=copyman@5ED548D4.cable.ziggo.nl)
  701. # [18:14] <jgraham> That will only do children of the td
  702. # [18:15] <cryzed> yes
  703. # [18:15] <cryzed> that's what I want actually
  704. # [18:16] <cryzed> sorry if I start to get annoying
  705. # [18:16] <cryzed> but why doesn't that work: http://paste.pocoo.org/show/cfmGTUkDaWSCr1ehDH3X/ ?
  706. # [18:21] <jgraham> Trying to get the "blockquote" attribute of a "blockquote" element?
  707. # [18:22] <cryzed> well.. kinda
  708. # [18:22] <cryzed> ^^
  709. # [18:22] <cryzed> It works in BeautifulSoup :D...
  710. # [18:22] <jgraham> Er, what does it do?
  711. # [18:22] <jgraham> I mean if you really have <blockquote blockquote=something> I guess it should work
  712. # [18:23] <cryzed> it should get me the text WITHOUT markup WITH formatting out of the blockquote tags
  713. # [18:23] <cryzed> >IN BETWEEN HERE<
  714. # [18:24] <gsnedders> with formatting without markup? how?
  715. # [18:24] <cryzed> well
  716. # [18:24] <cryzed> for example
  717. # [18:24] <jgraham> cryzed: If you just want the text you can do element.xpath(".//text()")
  718. # [18:24] <gsnedders> jgraham: You don't need XPath for that! Peh!
  719. # [18:25] <cryzed> <pre>That's some fancy text <br>comment'</pre>
  720. # [18:25] <cryzed> Should result to
  721. # [18:25] <jgraham> If you want to do some formatting on the text you need to decide what formatting you want
  722. # [18:25] <cryzed> That's some fancy text
  723. # [18:25] <cryzed> comment
  724. # [18:25] <cryzed> gsnedders, what should I use?
  725. # [18:25] <gsnedders> jgraham: return etree.tostring(element, encoding=unicode, method='text', with_tail=False) is better than that
  726. # [18:26] <cryzed> I can't acces this function
  727. # [18:26] <jgraham> And implement that by e.g. walking the tree replacing <br> with "\n" and adding the .tail of the br to the right place
  728. # [18:26] <jgraham> gsnedders: Define "better"
  729. # [18:26] <gsnedders> jgraham: Quicker
  730. # [18:26] <gsnedders> jgraham: The result is identical :P
  731. # [18:27] <jgraham> gsnedders: Seems unlikely to be a problem in this case
  732. # [18:27] <jgraham> It is much longer to type and easier to get wrong (maybe)
  733. # [18:27] * gsnedders just wraps it in a function :P
  734. # [18:27] <cryzed> etree doesn't have the attribute .tostring
  735. # [18:27] <gsnedders> cryzed: from lxml import etree
  736. # [18:28] <cryzed> I originally only wanted to import html5lib :|..
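The tree walk suggested above (emit "\n" for each <br>, and put each child's `.tail` back in the right place) might look like the sketch below. It is written against the standard library's ElementTree so it runs without lxml; the function names are hypothetical, whitespace normalization is left out, and the namespace stripping is an assumption meant to cope with trees whose tags are namespaced.

```python
import xml.etree.ElementTree as ET


def local_name(node):
    """Return the tag without any "{namespace}" prefix ("" for non-elements)."""
    tag = node.tag
    if not isinstance(tag, str):  # e.g. comment nodes in some tree builders
        return ""
    return tag.rsplit("}", 1)[-1]


def text_with_breaks(element):
    """Collect the text under `element`, turning each <br> into a newline."""
    parts = []

    def walk(node):
        name = local_name(node)
        if not name:
            return  # skip comments and the like; the parent handles their tail
        if name == "br":
            parts.append("\n")
        elif node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:  # text that follows the child's end tag
                parts.append(child.tail)

    walk(element)
    return "".join(parts)


quote = ET.fromstring(
    "<blockquote>That's some fancy text <br/>comment</blockquote>")
print(text_with_breaks(quote))
```

The element's own tail is deliberately ignored, so the result covers only what sits between its start and end tags.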
  737. # [18:29] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
  738. # [18:38] * Joins: matthewknight (n=matthewk@82.132.136.216)
  739. # [18:47] * Quits: hdh (n=hdh@58.187.19.53) (Read error: 104 (Connection reset by peer))
  740. # [18:48] * Joins: dimich (n=dimich@72.14.227.1)
  741. # [18:50] * Joins: hdh (n=hdh@58.187.19.53)
  742. # [18:52] * Joins: krikey72 (n=test@93-97-166-243.zone5.bethere.co.uk)
  743. # [18:54] * Joins: adambeynon_ (n=adambeyn@94-194-177-54.zone8.bethere.co.uk)
  744. # [18:54] * Quits: remysharp (n=remyshar@remysharp.plus.com) ("Gotta shoot - "peeyaow"")
  745. # [18:59] * Joins: matthewknight_ (n=matthewk@82.132.136.217)
  746. # [18:59] * Quits: matthewknight (n=matthewk@82.132.136.216) (Remote closed the connection)
  747. # [18:59] * matthewknight_ is now known as matthewknight
  748. # [19:02] * Joins: matthewknight_ (n=matthewk@82.132.136.216)
  749. # [19:02] * Quits: matthewknight (n=matthewk@82.132.136.217) (Remote closed the connection)
  750. # [19:02] * matthewknight_ is now known as matthewknight
  751. # [19:02] * Quits: krikey (n=test@93-97-166-243.zone5.bethere.co.uk) (Read error: 110 (Connection timed out))
  752. # [19:06] * Joins: matthewknight_ (n=matthewk@82.132.136.216)
  753. # [19:06] * Quits: matthewknight (n=matthewk@82.132.136.216) (Remote closed the connection)
  754. # [19:06] * matthewknight_ is now known as matthewknight
  755. # [19:09] <cryzed> Is it a good idea to use the latest svn build?
  756. # [19:09] <gsnedders> cryzed: Absolutely.
  757. # [19:10] <gsnedders> cryzed: It's a better idea than using the latest release
  758. # [19:10] <cryzed> Okay
  759. # [19:10] <cryzed> btw
  760. # [19:10] <cryzed> gsnedders, if I had an custom BeautifulSoup.py
  761. # [19:10] <cryzed> *a
  762. # [19:10] <cryzed> Could I somehow tell the treebuilder
  763. # [19:10] <cryzed> to use this BeautifulSoup?
  764. # [19:10] * Quits: adambeynon (n=adambeyn@94-194-177-54.zone8.bethere.co.uk) (Read error: 110 (Connection timed out))
  765. # [19:11] <cryzed> http://furyu-tei.sakura.ne.jp/archives/BSXPath.zip I found this
  766. # [19:11] <cryzed> it has xpath support as it seems
  767. # [19:11] <Philip`> html5lib's BeautifulSoup support is not particularly reliable
  768. # [19:11] <cryzed> Yeah, but it works
  769. # [19:11] <cryzed> I did some things with it
  770. # [19:11] <cryzed> And I somehow think that using BeautifulSoup is easier than lxml
  771. # [19:12] <Philip`> It works as long as you don't do one of the things that doesn't work :-)
  772. # [19:12] <cryzed> e.g?
  773. # [19:13] <Philip`> e.g. http://code.google.com/p/html5lib/issues/detail?id=80
  774. # [19:13] <cryzed> oh..
  775. # [19:13] * Joins: rubys (n=rubys@cpe-075-182-092-038.nc.res.rr.com)
  776. # [19:14] <cryzed> lxml
  777. # [19:14] <cryzed> doesn't work any better
  778. # [19:14] <cryzed> lol
  779. # [19:14] <cryzed> html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml")).parse("<a><div><div><a>")
  780. # [19:14] <cryzed> >>> etree.tostring(e)
  781. # [19:14] <cryzed> '<html><head/><body><a/><div><a></a><div><a></a><a/></div></div></body></html>'
  782. # [19:14] <Philip`> That works much better, since it gives the right output and doesn't throw exceptions :-p
  783. # [19:14] <cryzed> right?
  784. # [19:14] <cryzed> oh.. you are right
  785. # [19:14] * gsnedders guesses you want the html5lib serializer
  786. # [19:15] <Philip`> cryzed: "Right" according to the HTML5 spec
  787. # [19:15] * Quits: davidb (n=davidb@bas4-toronto06-1279310294.dsl.bell.ca)
  788. # [19:15] <cryzed> Oke
  789. # [19:15] <cryzed> Guys let me tell what I want to do
  790. # [19:15] <Philip`> Bit peculiar how it mixes "<a></a>" and "<a/>"...
  791. # [19:15] <cryzed> Oke
  792. # [19:15] <cryzed> I'm sure some of you know 4chan
  793. # [19:16] <cryzed> I want to write a Python "API" for it
  794. # [19:16] <cryzed> take a look at this page for example:
  795. # [19:16] <cryzed> http://zip.4chan.org/x/res/1648607.html
  796. # [19:16] <cryzed> So
  797. # [19:16] <cryzed> each reply has got an id attribute and a class
  798. # [19:16] <cryzed> Well, basically I want to use lxml
  799. # [19:16] <cryzed> to scrape this site
  800. # [19:16] <cryzed> get the text between the blockquotes
  801. # [19:17] <cryzed> remove the tags
  802. # [19:17] <cryzed> and replace the <br> tags with \n
  803. # [19:17] <cryzed> and save it into a variable
  804. # [19:17] <cryzed> at the this is all getting wrapped in a class called Reply
  805. # [19:17] <cryzed> *at the end
  806. # [19:18] <cryzed> http://iohosaf.pastebin.com/d5b10f992
  807. # [19:18] * Quits: rubys (n=rubys@cpe-075-182-092-038.nc.res.rr.com) (Read error: 104 (Connection reset by peer))
  808. # [19:18] * Joins: rubys1 (n=rubys@cpe-075-182-092-038.nc.res.rr.com)
  809. # [19:18] <cryzed> This is what I've got so far
  810. # [19:18] <cryzed> in the implementation with BeautifulSoup
  811. # [19:18] <cryzed> works, but is ugly imho
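
The steps cryzed lists above (take the text inside each blockquote, drop the tags, turn each `<br>` into `\n`) can be sketched with nothing but Python's standard library. This is only an illustration, not the pastebin code and not an lxml or html5lib solution: `html.parser` is swapped in so the example is self-contained, and it does not apply the HTML5 error-recovery rules discussed in the channel. The names `BlockquoteText`, `blockquote_texts` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class BlockquoteText(HTMLParser):
    """Collect the plain text of every <blockquote>, mapping <br> to a newline."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # how many <blockquote> elements we are currently inside
        self.parts = []   # text fragments of the blockquote being read
        self.quotes = []  # finished blockquote texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1
        elif tag == "br" and self.depth:
            self.parts.append("\n")
        # Any other tag is dropped; only its text content is kept.

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.quotes.append("".join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def blockquote_texts(html):
    """Return one string per outermost <blockquote> found in html."""
    parser = BlockquoteText()
    parser.feed(html)
    parser.close()
    return parser.quotes


# Hypothetical markup loosely shaped like a reply cell; not copied from the real page.
sample = (
    '<td id="1" class="reply">'
    "<blockquote>first line<br>second <b>bold</b> line</blockquote>"
    "</td>"
)
print(blockquote_texts(sample))
```

Each returned string could then be stored on whatever `Reply` class wraps a post. For badly nested real-world markup, feeding the page through html5lib first (as in the session above) would be the safer route.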
  812. # [19:21] <Philip`> (On your original question about using a different BeautifulSoup.py: It may be sufficient to just put it in Python's search path, like by using PYTHONPATH=some-directory-which-contains-that-file)
  813. # [19:21] <cryzed> Philip`, okay
  814. # [19:21] <cryzed> I could just rename it to BeautifulSoup.py
  815. # [19:21] <cryzed> and place it locally
  816. # [19:21] <cryzed> next to my script
  817. # [19:21] <cryzed> probably
  818. # [19:21] <Philip`> I'm not certain but I think that ought to get picked up when html5lib tries loading it
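
A small sketch of why both suggestions are plausible: Python resolves `import` by walking `sys.path` in order, the `PYTHONPATH` environment variable adds entries to that list, and the directory of the script being run is placed first. Whichever matching file appears earliest wins. The example below uses a throwaway stand-in module (`custom_soup_demo`, a made-up name) rather than a real BeautifulSoup.py, and it says nothing about how html5lib itself performs the import.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(directory, module_name):
    """Import module_name with `directory` searched before every other path entry."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        sys.modules.pop(module_name, None)  # forget any copy imported earlier
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)  # removes the entry inserted above


# Demo: write a stand-in module into a temporary directory and import it from there.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "custom_soup_demo.py"), "w") as fh:
        fh.write("WHO = 'local copy'\n")
    mod = import_from_dir(tmp, "custom_soup_demo")
    print(mod.WHO)
```

Placing a file next to the script has the same effect as the `sys.path.insert(0, ...)` call here, which is why a local BeautifulSoup.py would normally shadow an installed one.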
  819. # [19:22] * Philip` has to go away
  820. # [19:22] * Joins: mlpug (n=mlpug@a91-156-60-13.elisa-laajakaista.fi)
  821. # [19:23] <cryzed> Well oke
  822. # [19:24] <cryzed> thanks Philip`
  823. # [19:24] <cryzed> I'll get myself the newest html5lib
  824. # [19:24] * Quits: mat_t (n=mattomas@nat/canonical/x-3e68bedca63a67f8) ("This computer has gone to sleep")
  825. # [19:24] <cryzed> and just tune my old lib
  826. # [19:24] <cryzed> with BeautifulSoup a bit
  827. # [19:24] <cryzed> and hope that it works
  828. # [19:26] * Joins: mpt (n=mpt@canonical/launchpad/mpt)
  829. # [19:28] * Quits: ray (i=ray@2001:41c8:1:54da:0:0:0:1337) (kubrick.freenode.net irc.freenode.net)
  830. # [19:28] * Quits: rubys1 (n=rubys@cpe-075-182-092-038.nc.res.rr.com) (kubrick.freenode.net irc.freenode.net)
  831. # [19:28] * Quits: broquaint (i=245f7a9c@spc1-brig11-0-0-cust544.asfd.broadband.ntl.com) (kubrick.freenode.net irc.freenode.net)
  832. # [19:28] * Quits: matthewknight (n=matthewk@82.132.136.216)
  833. # [19:30] * Joins: pauld (n=pauld@87.83.16.77)
  834. # [19:30] * Joins: jwalden (n=waldo@c-24-6-169-169.hsd1.ca.comcast.net)
  835. # [19:33] * Quits: onar (n=onar@17.244.68.238) (Read error: 110 (Connection timed out))
  836. # [19:34] * Quits: pauld (n=pauld@87.83.16.77) (Client Quit)
  837. # [19:40] * Quits: dbaron (n=dbaron@c-98-234-51-190.hsd1.ca.comcast.net) ("8403864 bytes have been tenured, next gc will be global.")
  838. # [19:40] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
  839. # [19:44] * Parts: cryzed (n=cryzed@i538731F9.versanet.de) ("Verlassend")
  840. # [19:51] * Quits: mpt (n=mpt@canonical/launchpad/mpt) (Read error: 113 (No route to host))
  841. # [19:53] * Joins: mpt (n=mpt@canonical/launchpad/mpt)
  842. # [19:56] * Quits: MikeSmith (n=MikeSmit@EM114-48-72-144.pool.e-mobile.ne.jp) (Read error: 110 (Connection timed out))
  843. # [19:58] * Joins: dbaron (n=dbaron@corp-241.mountainview.mozilla.com)
  844. # [20:08] * Joins: onar (n=onar@17.244.68.238)
  845. # [20:12] * Joins: broquaint (i=245f7a9c@spc1-brig11-0-0-cust544.asfd.broadband.ntl.com)
  846. # [20:12] * Joins: ray (i=ray@2001:41c8:1:54da:0:0:0:1337)
  847. # [20:15] * Joins: ojan (n=ojan@72.14.229.81)
  848. # [20:16] * Joins: pauld (n=pauld@87.83.16.77)
  849. # [20:24] * ap is now known as ap|away
  850. # [20:25] * Quits: pauld (n=pauld@87.83.16.77)
  851. # [20:29] * Joins: Niictar24 (n=ritz@S010600183f550ae0.cg.shawcable.net)
  852. # [20:31] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) (Read error: 104 (Connection reset by peer))
  853. # [20:32] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
  854. # [20:32] * Quits: dave_levin (n=dave_lev@72.14.224.1)
  855. # [20:36] * Joins: mgrdcm_ (n=mgrdcm@65.111.247.194)
  856. # [20:36] * Quits: mgrdcm (n=mgrdcm@65.111.247.194) (Read error: 104 (Connection reset by peer))
  857. # [20:37] * Joins: weinig (n=weinig@17.246.17.225)
  858. # [20:39] * Quits: hdh (n=hdh@58.187.19.53) ("Leaving.")
  859. # [20:48] * Quits: dolske (n=dolske@firefox/developer/dolske) (Read error: 110 (Connection timed out))
  860. # [20:49] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
  861. # [20:50] <hsivonen> "I don't think anyone wants to break plugins" - iPhone, anyone?
  862. # [20:50] * Quits: weinig (n=weinig@17.246.17.225)
  863. # [20:50] <hsivonen> (quote from HTML WG telecon minutes)
  864. # [20:51] <smedero> it was a surreal couple of moments... for sure.
  865. # [20:51] <gsnedders> hsivonen: Well, arguably it never broke
  866. # [20:51] * Quits: jwalden (n=waldo@c-24-6-169-169.hsd1.ca.comcast.net) ("->office")
  867. # [20:57] * Joins: davidb (n=davidb@bas4-toronto06-1279310294.dsl.bell.ca)
  868. # [21:05] * Quits: onar (n=onar@17.244.68.238)
  869. # [21:09] * Joins: dave_levin (n=dave_lev@72.14.227.1)
  870. # [21:11] * Quits: aroben (n=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  871. # [21:14] * Joins: Lachy (n=Lachlan@85-189-168-181.glemnet.managedbroadband.co.uk)
  872. # [21:19] * Joins: onar (n=onar@17.244.68.238)
  873. # [21:19] * Quits: onar (n=onar@17.244.68.238) (Remote closed the connection)
  874. # [21:20] * Joins: onar (n=onar@17.226.23.135)
  875. # [21:26] * Quits: smedero (n=smedero@pia145-154.pioneernet.net)
  876. # [21:29] * Joins: jwalden (n=waldo@corp-241.mountainview.mozilla.com)
  877. # [21:38] * Joins: smedero (n=smedero@pia145-154.pioneernet.net)
  878. # [21:39] * Quits: zalan (n=kvirc@catv-80-99-193-98.catv.broadband.hu) ("KVIrc 3.4.0 Virgo http://www.kvirc.net/")
  879. # [21:44] * Quits: mlpug (n=mlpug@a91-156-60-13.elisa-laajakaista.fi) (Read error: 54 (Connection reset by peer))
  880. # [21:44] * Joins: virtuelv (n=virtuelv@95.34.27.22.customer.cdi.no)
  881. # [21:53] <tantek> hsivonen - neither iPhone nor BlackBerry browsers support Flash / plugins
  882. # [22:09] * Joins: pauld (n=pauld@92.40.68.129.sub.mbb.three.co.uk)
  883. # [22:14] * Joins: slightlyoff (n=slightly@nat/google/x-70971672382656c6)
  884. # [22:15] * gsnedders feels bad…
  885. # [22:15] <gsnedders> Appealing to authority :(
  886. # [22:16] <takkaria> some authority is good
  887. # [22:16] <gsnedders> annevk?
  888. # [22:19] * Joins: aroben (n=aroben@unaffiliated/aroben)
  889. # [22:24] <gsnedders> ARGH!
  890. # [22:24] * gsnedders gets annoyed at Google again
  891. # [22:24] <gsnedders> I can't simply look for anything relating to Lolita without getting a ton of results of porn
  892. # [22:29] * Quits: ap|away (n=ap@194.154.88.36)
  893. # [22:30] * Joins: adambeynon (n=adambeyn@93-97-225-135.zone5.bethere.co.uk)
  894. # [22:34] <Philip`> gsnedders: Maybe you shouldn't be using the image search with SafeSearch off
  895. # [22:35] * Joins: pauld_ (n=pauld@92.40.99.242.sub.mbb.three.co.uk)
  896. # [22:35] <gsnedders> Philip`: I'm not using image search. That doesn't help me write an English dissertation.
  897. # [22:37] * Quits: adambeynon_ (n=adambeyn@94-194-177-54.zone8.bethere.co.uk) (Success)
  898. # [22:39] * Quits: pauld (n=pauld@92.40.68.129.sub.mbb.three.co.uk) (Read error: 60 (Operation timed out))
  899. # [22:47] * Quits: zdobersek (n=zan@cpe-92-37-73-187.dynamic.amis.net) ("Leaving.")
  900. # [22:47] * Quits: krikey72 (n=test@93-97-166-243.zone5.bethere.co.uk)
  901. # [22:48] <gsnedders> Hmm… Nabokov almost always uses to sob and rarely to cry…
  902. # [22:50] * Joins: olliej_ (n=oliver@17.203.15.161)
  903. # [22:51] * Quits: pauld_ (n=pauld@92.40.99.242.sub.mbb.three.co.uk)
  904. # [22:51] * Quits: olliej (n=oliver@17.203.15.161) (Read error: 110 (Connection timed out))
  905. # [22:53] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
  906. # [23:06] * Joins: onar__ (n=onar@17.244.68.238)
  907. # [23:15] * olliej_ is now known as olliej
  908. # [23:17] * Joins: pauld (n=pauld@host81-151-61-163.range81-151.btcentralplus.com)
  909. # [23:20] * Joins: DanMan (n=opera@HSI-KBW-091-089-162-015.hsi2.kabel-badenwuerttemberg.de)
  910. # [23:22] * Parts: DanMan (n=opera@HSI-KBW-091-089-162-015.hsi2.kabel-badenwuerttemberg.de)
  911. # [23:22] * Quits: pauld (n=pauld@host81-151-61-163.range81-151.btcentralplus.com) (Client Quit)
  912. # [23:23] * Quits: onar (n=onar@17.226.23.135) (Connection timed out)
  913. # [23:23] * Quits: Maurice (i=copyman@5ED548D4.cable.ziggo.nl) ("Disconnected...")
  914. # [23:41] * Joins: aroben_ (n=aroben@unaffiliated/aroben)
  915. # [23:43] * Quits: aroben_ (n=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
  916. # [23:44] * Quits: onar__ (n=onar@17.244.68.238)
  917. # [23:50] <Hixie> heycam: yt?
  918. # [23:51] * Quits: aroben (n=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
  919. # [23:51] <Hixie> i have a method that takes as its argument an array of values
  920. # [23:51] <Hixie> the values are typed
  921. # [23:51] * Quits: olliej (n=oliver@17.203.15.161) (Read error: 54 (Connection reset by peer))
  922. # [23:51] * Joins: pauld (n=pauld@host81-151-61-163.range81-151.btcentralplus.com)
  923. # [23:52] * Joins: olliej (n=oliver@17.203.15.161)
  924. # [23:52] <Hixie> er, i mean, an array of arrays of values, which are typed
  925. # [23:52] <Hixie> let's say, it's an array of arrays of DOMString, long pairs
  926. # [23:52] <Hixie> e.g.
  927. # [23:52] <Hixie> foo([['a', 1], ['b', 2], ['c', 3]]);
  928. # [23:52] <Hixie> is there a sane way to describe that in WebIDL?
  929. # [23:58] * Quits: taf2 (n=taf2@65.210.82.235)
  930. # [23:58] * Joins: onar (n=onar@17.226.23.135)
  931. # Session Close: Fri Apr 10 00:00:00 2009
