IRC log: freenode, #webplatform, 2014-05-21

  1. # Session Start: Wed May 21 00:00:00 2014
  2. # Session Ident: #webplatform
  3. # [00:00] * Quits: @julee (~Adium@192.150.10.210) (Quit: Leaving.)
  4. # [00:03] * Joins: julee (~Adium@192.150.10.210)
  5. # [00:03] * ChanServ sets mode: +o julee
  6. # [00:17] * Quits: drublic (~drublic@xdsl-87-78-102-91.netcologne.de) (Remote host closed the connection)
  7. # [00:27] * Quits: +eliezerb (uid25062@gateway/web/irccloud.com/x-mtgsqbxbuesaebcd) (Quit: Connection closed for inactivity)
  8. # [00:30] * Joins: kiy (~kiyoura@pool-173-79-97-128.washdc.fios.verizon.net)
  9. # [00:38] * Quits: AmeliaBR (3263c548@gateway/web/freenode/ip.50.99.197.72) (Quit: Page closed)
  10. # [00:55] * Quits: roven (~roven@78-20-24-80.access.telenet.be) (Remote host closed the connection)
  11. # [01:04] * Quits: David_Bradbury (~chatzilla@75-147-178-254-Washington.hfc.comcastbusiness.net) (Quit: ChatZilla 0.9.90.1 [Firefox 29.0.1/20140506152807])
  12. # [01:05] * Quits: @julee (~Adium@192.150.10.210) (Quit: Leaving.)
  13. # [01:42] * Joins: julee (~Adium@192.150.10.210)
  14. # [01:42] * ChanServ sets mode: +o julee
  15. # [01:46] * Quits: @julee (~Adium@192.150.10.210) (Ping timeout: 252 seconds)
  16. # [01:48] * Quits: @Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane) (Quit: Leaving.)
  17. # [01:52] * Joins: Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane)
  18. # [01:52] * ChanServ sets mode: +o Ryan_Lane
  19. # [01:55] * Joins: ryuan (~ryuan@210.94.41.89)
  20. # [01:59] * Quits: lmclister (~lmclister@192.150.10.210)
  21. # [01:59] * Quits: @Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane) (Quit: Leaving.)
  22. # [02:00] * Joins: Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane)
  23. # [02:00] * ChanServ sets mode: +o Ryan_Lane
  24. # [02:04] * DenSchub is now known as offSchub
  25. # [02:34] * Quits: jswisher (~jswisher@cpe-72-182-94-57.austin.res.rr.com) (Quit: jswisher)
  26. # [02:53] * Quits: @Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane) (Quit: Leaving.)
  27. # [02:56] * Joins: roven (~roven@78-20-24-80.access.telenet.be)
  28. # [02:57] * Joins: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  29. # [03:01] * Quits: roven (~roven@78-20-24-80.access.telenet.be) (Ping timeout: 258 seconds)
  30. # [03:14] * Joins: karlcow (~karl@nerval.la-grange.net)
  31. # [03:59] * Quits: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  32. # [04:04] * Joins: eliezerb (uid25062@gateway/web/irccloud.com/x-mkwuhgrzqoqbgdzr)
  33. # [04:04] * ChanServ sets mode: +v eliezerb
  34. # [04:05] <+eliezerb> renoirb: wow! No more crazy jobs! \o/
  35. # [04:06] * Quits: karlcow (~karl@nerval.la-grange.net) (Quit: This computer has gone to sleep)
  36. # [04:11] * Quits: vanessametonini (~vanessame@5.55.net.registro.br) (Remote host closed the connection)
  37. # [04:38] * Joins: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  38. # [04:57] * Joins: roven (~roven@78-20-24-80.access.telenet.be)
  39. # [05:01] * Quits: roven (~roven@78-20-24-80.access.telenet.be) (Ping timeout: 240 seconds)
  40. # [05:02] * Joins: karlcow (~karl@nerval.la-grange.net)
  41. # [05:03] * Quits: karlcow (~karl@nerval.la-grange.net) (Remote host closed the connection)
  42. # [05:03] * Joins: karlcow (~karl@nerval.la-grange.net)
  43. # [05:10] * Quits: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent) (Ping timeout: 256 seconds)
  44. # [05:24] * Joins: hyperair (~hyperair@ubuntu/member/hyperair)
  45. # [05:24] * Quits: ckwalsh (~ckwalsh@facebook/engineering/ckwalsh) (Remote host closed the connection)
  46. # [05:38] * Quits: hyperair (~hyperair@ubuntu/member/hyperair) (Ping timeout: 255 seconds)
  47. # [05:43] * Joins: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent)
  48. # [05:47] * Joins: hyperair (~hyperair@ubuntu/member/hyperair)
  49. # [05:52] * Joins: java_expert (ba52dc30@gateway/web/freenode/ip.186.82.220.48)
  50. # [05:52] <java_expert> hello
  51. # [05:53] <java_expert> hello
  52. # [05:53] <java_expert> hello
  53. # [05:53] <java_expert> hello
  54. # [05:53] <java_expert> nobody around
  55. # [05:58] * Quits: java_expert (ba52dc30@gateway/web/freenode/ip.186.82.220.48) (Ping timeout: 240 seconds)
  56. # [06:01] * Quits: jerryitt (uid17132@gateway/web/irccloud.com/x-iqrondqcnxdhkcfl) (Quit: Connection closed for inactivity)
  57. # [06:46] * Quits: Rastus_Vernon (uid15187@wikimedia/Rastus-Vernon) (Quit: Connection closed for inactivity)
  58. # [06:49] * Quits: benschwarz_ (sid2121@gateway/web/irccloud.com/x-hrrpuokenukxdomr) (Ping timeout: 276 seconds)
  59. # [06:50] * Joins: benschwarz_ (sid2121@gateway/web/irccloud.com/x-lwfgtktwfrrisvfb)
  60. # [07:47] * Quits: +eliezerb (uid25062@gateway/web/irccloud.com/x-mkwuhgrzqoqbgdzr) (Quit: Connection closed for inactivity)
  61. # [07:49] * Quits: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  62. # [08:09] * Joins: ptressel (~chatzilla@174-31-242-8.tukw.qwest.net)
  63. # [08:30] * Joins: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  64. # [08:40] * Quits: kiy (~kiyoura@pool-173-79-97-128.washdc.fios.verizon.net) (Read error: Connection reset by peer)
  65. # [08:40] * Quits: karlcow (~karl@nerval.la-grange.net) (Quit: :tiuQ tiuq sah woclrak)
  66. # [08:40] * Joins: karlcow (~karl@nerval.la-grange.net)
  67. # [08:55] * Quits: lmclister (~lmclister@c-98-210-38-110.hsd1.ca.comcast.net)
  68. # [08:58] * Joins: roven (~roven@78-20-24-80.access.telenet.be)
  69. # [09:04] * Quits: roven (~roven@78-20-24-80.access.telenet.be) (Ping timeout: 252 seconds)
  70. # [09:04] * Quits: @_cheney (~cheney@nat.sierrabravo.net) (Read error: Connection reset by peer)
  71. # [09:05] * Joins: _cheney (~cheney@nat.sierrabravo.net)
  72. # [09:05] * ChanServ sets mode: +o _cheney
  73. # [09:16] * Joins: mattweb_de (~mattweb_d@pd95699f8.dip0.t-ipconnect.de)
  74. # [09:19] * Joins: drublic (~drublic@213.15.0.85)
  75. # [09:24] * Joins: antdillon (~ant@nat/canonical/x-rporkjeklrourlrj)
  76. # [09:42] * Joins: roven (~roven@78-20-24-80.access.telenet.be)
  77. # [10:03] * Joins: mattweb_de_ (~mattweb_d@pd95699f8.dip0.t-ipconnect.de)
  78. # [10:05] * Quits: mattweb_de (~mattweb_d@pd95699f8.dip0.t-ipconnect.de) (Ping timeout: 276 seconds)
  79. # [10:05] * mattweb_de_ is now known as mattweb_de
  80. # [10:08] * Quits: ptressel (~chatzilla@174-31-242-8.tukw.qwest.net) (Read error: Connection reset by peer)
  81. # [10:16] * Joins: mstalfoort (~manuchill@83.232.96.217)
  82. # [10:22] * Quits: ryuan (~ryuan@210.94.41.89) (Remote host closed the connection)
  83. # [10:51] * Joins: auchenberg (~auchenber@94.18.214.22)
  84. # [11:15] * Joins: ink|off|ZNC (~inky@master.qs.biz)
  85. # [11:37] * Quits: tfnico (sid1523@gateway/web/irccloud.com/x-isyyvsgckilbufud) (Ping timeout: 245 seconds)
  86. # [11:37] * Joins: ptressel (~chatzilla@174-31-242-8.tukw.qwest.net)
  87. # [11:37] * Quits: Kenzi` (sid7017@gateway/web/irccloud.com/x-vdyzvhoiidhywuit) (Ping timeout: 245 seconds)
  88. # [11:37] * Quits: benschwarz_ (sid2121@gateway/web/irccloud.com/x-lwfgtktwfrrisvfb) (Read error: Connection reset by peer)
  89. # [11:37] * Joins: benschwarz_ (sid2121@gateway/web/irccloud.com/x-xhaeqxquxpiokswu)
  90. # [11:39] * Joins: Kenzi` (sid7017@gateway/web/irccloud.com/x-tssfcjiofrpvegrz)
  91. # [11:39] * Joins: tfnico (sid1523@gateway/web/irccloud.com/x-kroncydhqvfwbyve)
  92. # [11:44] * Quits: wpdbot (~wpdbot@ec2-50-19-180-183.compute-1.amazonaws.com) (Remote host closed the connection)
  93. # [11:45] * Joins: wpdbot (~wpdbot@ec2-23-22-142-26.compute-1.amazonaws.com)
  94. # [11:47] * Quits: tfnico (sid1523@gateway/web/irccloud.com/x-kroncydhqvfwbyve) (Ping timeout: 264 seconds)
  95. # [11:47] * Joins: tfnico (sid1523@gateway/web/irccloud.com/x-zpgtpiidwiydfmmc)
  96. # [11:49] * Quits: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent) (Ping timeout: 256 seconds)
  97. # [11:51] * Joins: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent)
  98. # [12:24] * Quits: auchenberg (~auchenber@94.18.214.22) (Remote host closed the connection)
  99. # [12:24] * Quits: ptressel (~chatzilla@174-31-242-8.tukw.qwest.net) (Quit: zzz)
  100. # [12:29] * Joins: chrismills (~chrismill@87.115.156.125)
  101. # [12:29] * ChanServ sets mode: +o chrismills
  102. # [12:35] * Joins: eliezerb (uid25062@gateway/web/irccloud.com/x-kswgdusrcmyfeomq)
  103. # [12:35] * ChanServ sets mode: +v eliezerb
  104. # [12:50] * Joins: auchenberg (~auchenber@94.18.214.22)
  105. # [13:50] * Joins: auchenbe_ (~auchenber@94.18.214.22)
  106. # [13:50] * Quits: auchenberg (~auchenber@94.18.214.22) (Read error: Connection reset by peer)
  107. # [13:54] * Quits: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent) (Ping timeout: 256 seconds)
  108. # [15:13] * Joins: jswisher (~jswisher@cpe-72-182-94-57.austin.res.rr.com)
  109. # [15:45] * Joins: jerryitt (uid17132@gateway/web/irccloud.com/x-jidigtgjolflkaps)
  110. # [15:47] * Quits: auchenbe_ (~auchenber@94.18.214.22) (Remote host closed the connection)
  111. # [15:52] * Joins: auchenberg (~auchenber@176.222.239.226)
  112. # [16:10] * Joins: Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane)
  113. # [16:10] * ChanServ sets mode: +o Ryan_Lane
  114. # [16:12] * Quits: @Ryan_Lane (~Ryan_Lane@wikimedia/Ryan-lane) (Client Quit)
  115. # [16:17] * Quits: +eliezerb (uid25062@gateway/web/irccloud.com/x-kswgdusrcmyfeomq) (Quit: Connection closed for inactivity)
  116. # [16:35] * Joins: dontcallmedom (~dom@216.239.55.62)
  117. # [16:41] * Joins: codylindley (~textual@184-155-250-216.cpe.cableone.net)
  118. # [17:07] * Quits: auchenberg (~auchenber@176.222.239.226) (Remote host closed the connection)
  119. # [17:08] * Quits: hyperair (~hyperair@ubuntu/member/hyperair) (Ping timeout: 240 seconds)
  120. # [17:10] * Joins: auchenberg (~auchenber@176.222.239.226)
  121. # [17:11] * Quits: jswisher (~jswisher@cpe-72-182-94-57.austin.res.rr.com) (Ping timeout: 264 seconds)
  122. # [17:12] * Joins: auchenbe_ (~auchenber@176.222.239.226)
  123. # [17:15] * Quits: auchenberg (~auchenber@176.222.239.226) (Ping timeout: 276 seconds)
  124. # [17:26] * Joins: jswisher (~jswisher@cpe-72-182-94-57.austin.res.rr.com)
  125. # [17:45] * Joins: eliezerb (uid25062@gateway/web/irccloud.com/x-wlnlnvjxzgefidtp)
  126. # [17:45] * ChanServ sets mode: +v eliezerb
  127. # [17:46] * Quits: auchenbe_ (~auchenber@176.222.239.226) (Remote host closed the connection)
  128. # [17:47] * Quits: mattweb_de (~mattweb_d@pd95699f8.dip0.t-ipconnect.de) (Quit: mattweb_de)
  129. # [17:52] * Quits: karlcow (~karl@nerval.la-grange.net) (Ping timeout: 258 seconds)
  130. # [17:54] * Joins: karlcow (~karl@nerval.la-grange.net)
  131. # [17:57] * Quits: jswisher (~jswisher@cpe-72-182-94-57.austin.res.rr.com) (Quit: jswisher)
  132. # [17:59] * Joins: hyperair (~hyperair@ubuntu/member/hyperair)
  133. # [18:02] * Joins: lmclister (~lmclister@192.150.10.210)
  134. # [18:07] * Quits: drublic (~drublic@213.15.0.85) (Remote host closed the connection)
  135. # [18:11] * Quits: mstalfoort (~manuchill@83.232.96.217) (Quit: kthxbai)
  136. # [18:19] * Joins: julee (~Adium@c-50-184-87-81.hsd1.ca.comcast.net)
  137. # [18:19] * ChanServ sets mode: +o julee
  138. # [18:19] * Quits: @julee (~Adium@c-50-184-87-81.hsd1.ca.comcast.net) (Client Quit)
  139. # [18:21] * Joins: julee (~Adium@192.150.10.203)
  140. # [18:21] * ChanServ sets mode: +o julee
  141. # [18:43] * Quits: @chrismills (~chrismill@87.115.156.125) (Quit: Off to find beer and rock and roll...)
  142. # [18:59] * Quits: antdillon (~ant@nat/canonical/x-rporkjeklrourlrj) (Quit: Leaving)
  143. # [19:16] * Quits: lmclister (~lmclister@192.150.10.210)
  144. # [19:21] * Joins: lmclister (~lmclister@192.150.10.210)
  145. # [19:27] * Joins: David_Bradbury (~chatzilla@75-147-178-254-Washington.hfc.comcastbusiness.net)
  146. # [19:33] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  147. # [19:47] * Joins: Bad_Advice_Cat (~Moai@unaffiliated/featheredserpent)
  148. # [19:55] * Joins: ckwalsh (~ckwalsh@facebook/engineering/ckwalsh)
  149. # [20:11] * offSchub is now known as DenSchub
  150. # [20:34] * Parts: ink|off|ZNC (~inky@master.qs.biz)
  151. # [20:57] * Quits: David_Bradbury (~chatzilla@75-147-178-254-Washington.hfc.comcastbusiness.net) (Quit: ChatZilla 0.9.90.1 [Firefox 29.0.1/20140506152807])
  152. # [21:03] * Quits: karlcow (~karl@nerval.la-grange.net) (Ping timeout: 240 seconds)
  153. # [21:05] * Joins: _cheney_ (~cheney@nat.sierrabravo.net)
  154. # [21:07] * Quits: dontcallmedom (~dom@216.239.55.62) (Ping timeout: 240 seconds)
  155. # [21:08] * Quits: @_cheney (~cheney@nat.sierrabravo.net) (Ping timeout: 240 seconds)
  156. # [21:09] * Joins: mattweb_de (~mattweb_d@cable-78-34-4-198.netcologne.de)
  157. # [21:56] * Quits: m4nu (~manu@216.252.204.51) (Ping timeout: 276 seconds)
  158. # [21:58] * Joins: ptressel (~chatzilla@174-31-242-8.tukw.qwest.net)
  159. # [22:01] * Joins: manu (~manu@216.252.204.51)
  160. # [22:01] * manu is now known as Guest41481
  161. # [22:02] * Guest41481 is now known as m4nu
  162. # [22:03] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Remote host closed the connection)
  163. # [22:04] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  164. # [22:09] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Ping timeout: 265 seconds)
  165. # [22:11] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  166. # [22:16] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Ping timeout: 265 seconds)
  167. # [22:17] * Quits: +eliezerb (uid25062@gateway/web/irccloud.com/x-wlnlnvjxzgefidtp) (Quit: Connection closed for inactivity)
  168. # [22:24] <@shepazu> frozenice, yt?
  169. # [22:25] <@frozenice> hi!
  170. # [22:25] <@shepazu> whoah!
  171. # [22:25] <@shepazu> fast response
  172. # [22:25] <@frozenice> working on the irc bot :)
  173. # [22:25] <@shepazu> now I don't recall what I wanted to say… jk
  174. # [22:25] <@frozenice> it's about MDN, I imagine
  175. # [22:25] <@shepazu> frozenice, any chance you could help with the MDN crawling/scraping?
  176. # [22:26] <@shepazu> Pat seems to be busy lately, or maybe I'm just confused
  177. # [22:26] <@shepazu> in any case, I'm stressing out about the compat-table stuff
  178. # [22:27] <@shepazu> and from what I understood, you had some of the scrape-bot stuff working already
  179. # [22:27] <@shepazu> in your NodeJS thingie
  180. # [22:28] <@frozenice> yeah that works, it fetches feeds from some tags (HTML, HTML5, CSS, etc.) but it can only get 500 pages or so from those feeds, that was the problem
  181. # [22:28] <@shepazu> frozenice, any way around that?
  182. # [22:29] <@frozenice> none that I saw, we somehow need to get us a list of all the pages, then the bot can run with that and pick out the compat tables from each page
  183. # [22:30] <@frozenice> I just did the feed stuff to get a pool of useful pages
  184. # [22:30] <@shepazu> frozenice, that's the crawler aspect, right?
  185. # [22:30] <@shepazu> surely there's a node crawler out there...
  186. # [22:30] <@frozenice> well, kinda
  187. # [22:31] <@frozenice> we just need a list of pages, the thing can do the rest :)
  188. # [22:31] <@frozenice> maybe MDN has one, in a sitemap or something
  189. # [22:31] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  190. # [22:31] <@shepazu> frozenice, I'm sure they do
  191. # [22:32] <@shepazu> frozenice, if I collect the list of pages, will your scraper do the rest?
  192. # [22:32] <@frozenice> yup
  193. # [22:32] <@frozenice> the "getting a list of pages" is just one step, we can change how it gets that list
  194. # [22:33] <@shepazu> frozenice, ok, I'll get on that
  195. # [22:33] <@frozenice> neat
  196. # [22:33] <@frozenice> btw, I started working on the irc bot again yesterday, making progress :)
  197. # [22:33] * Joins: drublic (~drublic@xdsl-87-78-27-195.netcologne.de)
  198. # [22:34] <@shepazu> frozenice, what will it do?
  199. # [22:34] <@frozenice> everything!
  200. # [22:34] <@shepazu> and what specific part are you working on?
  201. # [22:34] <@frozenice> everything!
  202. # [22:34] <@shepazu> everything? good, I could use someone to help clean up my house
  203. # [22:34] <@frozenice> your house is no part of the bot :P
  204. # [22:35] <@frozenice> unless someone would write a plugin...
  205. # [22:35] <@shepazu> frozenice, one thing I'd like it to do is make tidy transcripts from meeting minutes
  206. # [22:35] <@shepazu> and record actions
  207. # [22:36] <@frozenice> yeah, I got some ideas for plugins, too
  208. # [22:36] <@frozenice> the current bot (old code) has some of those
  209. # [22:38] <@frozenice> the core seems pretty finished for the time being, I'm putting together the actual bot we will use, adding some plugins (like watching for wiki changes), testing the whole plugin system etc.
  210. # [22:41] <@frozenice> I've been kinda absent, because a colleague / friend suddenly passed away two weeks ago, that sucked very much... but the irc bot is good for getting back to work
  211. # [22:45] <@shepazu> oh, wow, sorry to hear that!
  212. # [22:45] <@shepazu> that's terrible.
  213. # [22:48] <@frozenice> yeah, my job got a bit stressier, but we'll manage, the show must go on :)
  214. # [22:49] <@frozenice> stressier = more stressful
  215. # [22:49] <@frozenice> I claim that word
  216. # [22:51] <@shepazu> can I rent that word from you?
  217. # [22:51] <@frozenice> only if you don't pay me!
  218. # [22:56] <@frozenice> well, let's see how far I can get the bot this week.
  219. # [22:57] <@shepazu> frozenice, ok, I have a list of all the CSS property pages that we want… I can do the same for HTML, SVG, etc.
  220. # [22:57] <ptressel> Hi, shepazu
  221. # [22:57] <@shepazu> is that really all we need?
  222. # [22:57] <@shepazu> hi, ptressel!
  223. # [22:58] <ptressel> I had windows of time to work on scraping -- another will open up starting today.
  224. # [22:58] <@shepazu> ptressel, ok, great
  225. # [22:59] <@shepazu> ptressel, I'm not sure we actually need to scrape, based on what frozenice said… confirming now
  226. # [22:59] <@shepazu> or rather, we don't need to crawl, sorry
  227. # [22:59] <ptressel> Ok
  228. # [23:00] <@frozenice> shepazu: cool!
  229. # [23:00] <@frozenice> hi ptressel :)
  230. # [23:00] <@frozenice> yep, a list of pages should be enough
  231. # [23:01] <ptressel> Ok, I'm totally confused now.
  232. # [23:01] <@shepazu> well, shucks, I can do that tonight
  233. # [23:01] <ptressel> Just read the chat backlog.
  234. # [23:01] <ptressel> The issue was that the tag lists cut off at a fixed number.
  235. # [23:01] <@frozenice> correct
  236. # [23:01] <ptressel> So the point of crawling was to get around that.
  237. # [23:02] <ptressel> Yes, there are node.js crawler libraries.
  238. # [23:02] <ptressel> We got stuck on nutch for a while.
  239. # [23:02] <@shepazu> ptressel, yeah, but their topic index pages have the full list of pages we want :) https://developer.mozilla.org/en-US/docs/Web/CSS/Reference
  240. # [23:02] <ptressel> Yes, those are the seed pages.
  241. # [23:02] <ptressel> The point of the crawler is that it fetches them.
  242. # [23:02] <@shepazu> renoirb and I just "scraped" all the URLs for that
  243. # [23:03] <ptressel> I have the list of seed pages.
  244. # [23:03] <@shepazu> ptressel, yeah, isn't that what frozenice's script does?
  245. # [23:03] <@shepazu> maybe I'm confused
  246. # [23:03] <ptressel> If that's different from what's in the node.js work, then I don't know.
  247. # [23:03] <ptressel> The *tag* request is different.
  248. # [23:04] <ptressel> That is a specific MDN query that returns a fixed max number of pages having a particular tag.
  249. # [23:04] <@frozenice> yeah the getting a page list via the tag-feeds was my way to get us started on some useful pages
  250. # [23:05] <@frozenice> we can change the importer, so it pulls the page list from elsewhere
  251. # [23:05] <ptressel> Anyhow, I'm going to a meetup tonight where there is a node.js expert.
  252. # [23:05] <@frozenice> nice
  253. # [23:05] <ptressel> But if this is done, then I'll work on something else.
  254. # [23:06] <ptressel> So...done? or not done?
  255. # [23:06] <@shepazu> ptressel, frozenice, I want to make sure I'm not confused
  256. # [23:06] <ptressel> I'm totally confused at the moment.
  257. # [23:06] <@frozenice> :D
  258. # [23:06] <@shepazu> yay! me too!
  259. # [23:06] <ptressel> :P
  260. # [23:06] <@shepazu> ptressel, don't worry, there's plenty more you could do, if you want :)
  261. # [23:07] <@shepazu> frozenice, ok, sorry to be pedantic...
  262. # [23:07] <@shepazu> but just to confirm:
  263. # [23:07] <ptressel> shepazu has another confusion: We've met at the TTWF event in Seattle. I'm a "she" not a "he".
  264. # [23:07] <ptressel> :D
  265. # [23:07] <@frozenice> that has also confused me.
  266. # [23:07] <ptressel> The Pat is for Patricia
  267. # [23:07] <@shepazu> gah!!!!!
  268. # [23:07] <ptressel> :D
  269. # [23:07] <@frozenice> I always get bad luck on names which can be both :P
  270. # [23:08] <ptressel> :D
  271. # [23:08] <@shepazu> ptressel, I have a sister named Pat, that's why I was confused… I understand now that frozenice is a woman, despite the name "David"
  272. # [23:08] <@frozenice> wat
  273. # [23:08] <@shepazu> now we're all clear, sorry
  274. # [23:08] <@frozenice> that would be news to me
  275. # [23:09] <@frozenice> the confusion seems to be spreading
  276. # [23:09] <@shepazu> frozenice, ok...
  277. # [23:09] <@shepazu> 1) the reason we weren't getting all the pages was that we didn't have a complete list of pages to scrape
  278. # [23:10] <@shepazu> 2) if we have a full list of content pages, we can extract the compat tables from each of them with your script
  279. # [23:10] <@shepazu> 3) ptressel has the complete list of pages we want
  280. # [23:10] <ptressel> What's the script?
  281. # [23:10] <ptressel> No, getting the pages is what the crawl is for.
  282. # [23:11] <@shepazu> 4) frozenice is male, ptressel is female, shepazu is male and confused
  283. # [23:11] <@shepazu> the script is the nodejs thingie
  284. # [23:11] <ptressel> What I have are the seed pages -- those are the tables of contents you mentioned, plus a few obscure ones.
  285. # [23:11] <ptressel> Ok
  286. # [23:11] <@shepazu> ptressel, I think the seed pages contain all the URLs we want
  287. # [23:11] <ptressel> The nice thing about a real crawler is that it doesn't annoy their sysadmins.
  288. # [23:12] * Quits: mattweb_de (~mattweb_d@cable-78-34-4-198.netcologne.de) (Quit: mattweb_de)
  289. # [23:12] <@frozenice> I have proof for 4) https://www.flickr.com/photos/szene/8459312560/in/set-72157632724112919 directly under the 'W'
  290. # [23:12] <ptressel> It obeys robots.txt, doesn't fetch too rapidly, etc.
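
[Editor's sketch] The two behaviours ptressel names here, obeying robots.txt and not fetching too rapidly, could look roughly like the following. This is only an illustration under stated assumptions: it is not code from the mdn-compat-importer or from any crawler mentioned in the channel. The names `make_robots` and `PoliteGate`, the user-agent string, and the delay value are hypothetical. It uses Python's standard `urllib.robotparser` and takes the robots.txt text as a string, so it never touches the network on its own.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteGate:
    """Hypothetical helper: permits a fetch only if robots.txt allows it,
    and enforces a minimum delay between permitted fetches."""

    def __init__(self, robots, agent, min_delay,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = robots
        self.agent = agent
        self.min_delay = min_delay
        self._clock = clock    # injectable so the delay logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous permitted fetch

    def wait_for(self, url: str) -> bool:
        """Return False if the URL is disallowed; otherwise sleep as needed
        to respect min_delay and return True."""
        if not self.robots.can_fetch(self.agent, url):
            return False
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        return True
```

A caller would check `gate.wait_for(url)` before each request and skip the URL when it returns False. The delay is an assumed fixed value; a real crawler might also honour a Crawl-delay directive if the site provides one.
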
  291. # [23:12] <@shepazu> ptressel, you want to send me your list of seed pages?
  292. # [23:12] <ptressel> Let me dig them out...
  293. # [23:13] <@shepazu> ptressel, that just proves you have long hair, I've had long hair!
  294. # [23:13] <@shepazu> heck, look at the bearded weirdo next to you, he has longer hair than you!
  295. # [23:14] <@frozenice> wtf are you talking about shepazu :D
  296. # [23:14] <@shepazu> ptressel, sorry I didn't remember you, I'm bad with names
  297. # [23:14] <@shepazu> frozenice, I think I might not be sure anymore
  298. # [23:15] <@frozenice> you know that's jswisher in the center of that photo, right?
  299. # [23:15] <@shepazu> yes, and Chris Mills to the right
  300. # [23:16] <@frozenice> yes
  301. # [23:16] <@shepazu> and ptressel to the left, IIUI
  302. # [23:16] <@frozenice> no
  303. # [23:16] <@shepazu> with her back turned
  304. # [23:16] <@shepazu> oh… she said "under the W"
  305. # [23:16] <@frozenice> I SAID THIS :D
  306. # [23:16] <ptressel> :D
  307. # [23:16] <@shepazu> wtf????
  308. # [23:17] <@shepazu> I am going blind and insane
  309. # [23:17] <ptressel> "He said this"
  310. # [23:17] <@frozenice> on Janet's right is Flo (also from MDN, with orange lanyard) and to his right it's me, under the 'W'
  311. # [23:17] <@shepazu> I think I might need to stop drinking so much petroleum
  312. # [23:17] <@frozenice> or drink more
  313. # [23:17] <@shepazu> at least on work days
  314. # [23:18] <ptressel> Seed pages: http://pastebin.ubuntu.com/7499120/
  315. # [23:18] <@shepazu> frozenice, you are 2 down from Janet?
  316. # [23:18] <@frozenice> yeah, with orange lanyard
  317. # [23:18] <ptressel> The short list at the top is good enough for a depth 2 or 3 crawl.
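
[Editor's sketch] A "depth 2 or 3 crawl" from a short seed list can be pictured as a breadth-first walk that stops following links after a fixed number of hops. The sketch below is an assumption-laden illustration, not the crawler discussed in the channel: `crawl` and its `get_links` parameter are hypothetical names, and link extraction is passed in as a function so the example stays independent of any HTTP or HTML library.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> List[str]:
    """Return every URL reachable from the seeds within max_depth hops,
    in the order first discovered. Seeds themselves are at depth 0."""
    seen = set()
    order = []
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append((seed, 0))
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # page is recorded, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order
```

With the seed pages as `seeds` and a `get_links` that fetches a page and returns its outgoing links, `max_depth=2` would collect the seeds, the pages they link to, and the pages those link to.
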
  318. # [23:18] <@frozenice> in between me and Janet is Florian Scholz
  319. # [23:19] <@shepazu> you look male, true, and I'm willing to take your word for it… but I don't consider that photo proof, you're covering your face
  320. # [23:19] <@frozenice> it's a fact and that should clear up article 4) subsection 1. :)
  321. # [23:19] <@shepazu> but I'm not judgmental, you can be whatever sex you want
  322. # [23:19] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Remote host closed the connection)
  323. # [23:19] <ptressel> I'm not in that pic.
  324. # [23:20] <ptressel> :D
  325. # [23:20] <ptressel> So there's no evidence there re my gender. ;-)
  326. # [23:20] <@frozenice> the girl in purple is User:Vivienne, IIRC
  327. # [23:21] <@frozenice> uhm, what I was actually wanting to say
  328. # [23:21] <@frozenice> the CSS/Reference page is maybe a good start, but as ptressel said there are some weird pages, which could be discovered through crawling
  329. # [23:22] <@shepazu> frozenice, would this be a reasonable start?
  330. # [23:22] <@shepazu> http://pastebin.ubuntu.com/7499135/
  331. # [23:22] <@frozenice> spotted 2 external links
  332. # [23:23] <@shepazu> frozenice, yeah, minus those and a few others
  333. # [23:23] <@frozenice> it's good for a start, yeah
  334. # [23:23] <@frozenice> I wonder how many of those <500 we already have :)
  335. # [23:23] <@shepazu> frozenice, and who knows how to run your script?
  336. # [23:23] <ptressel> Some of those urls are dups with fragments
  337. # [23:23] <@frozenice> there was a thread in the ML, I believe
  338. # [23:25] <@frozenice> shepazu: http://lists.w3.org/Archives/Public/public-webplatform/2014Jan/0030.html
  339. # [23:26] <@frozenice> the README in https://github.com/webplatform/mdn-compat-importer has some more instructions
  340. # [23:27] <ptressel> There are changes since the last version I pulled, I think.
  341. # [23:28] <@shepazu> ok
  342. # [23:28] <@frozenice> renoirb has done some work on the conversion and some meta-stuff
  343. # [23:28] <@shepazu> yeah
  344. # [23:28] <@shepazu> ok, here's my plan
  345. # [23:28] <@shepazu> I'm going to find all the pages we want (or at least most of them)
  346. # [23:29] <@shepazu> using ptressel's seed pages to inform that list
  347. # [23:29] <@shepazu> I'll make a master list
  348. # [23:29] <ptressel> You're going to crawl by hand? :D
  349. # [23:29] <@shepazu> then compare those pages to the ones we already got results for
  350. # [23:29] * Quits: codylindley (~textual@184-155-250-216.cpe.cableone.net) (Quit: ["Textual IRC Client: www.textualapp.com"])
  351. # [23:29] <@shepazu> and remove the dupes
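
[Editor's sketch] The clean-up steps mentioned in this part of the log (dropping external links, collapsing URLs that differ only by fragment, and removing pages that already have results) are simple to express in code, even though shepazu plans to do them by hand. The following is a minimal sketch under assumptions: `build_master_list` is a hypothetical name, and the host filter assumes every wanted page lives on developer.mozilla.org.

```python
from urllib.parse import urldefrag, urlsplit

MDN_HOST = "developer.mozilla.org"  # assumption: all wanted pages are on this host


def build_master_list(candidates, already_done, host=MDN_HOST):
    """Return candidate URLs with fragments stripped, external links and
    duplicates dropped, and already-processed pages removed."""
    done = {urldefrag(u).url for u in already_done}
    seen = set()
    master = []
    for raw in candidates:
        url = urldefrag(raw.strip()).url       # drop "#fragment" suffixes
        if not url or urlsplit(url).hostname != host:
            continue                           # empty line or external link
        if url in seen or url in done:
            continue                           # duplicate or already scraped
        seen.add(url)
        master.append(url)
    return master
```

The order of the input list is preserved, so the result can be diffed against the original paste to see what was dropped.
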
  352. # [23:30] <@shepazu> ptressel, it will take me less time to do it by hand than to write a script for it and execute it
  353. # [23:31] <@shepazu> frozenice, once I have that list, we'll run it against MDN
  354. # [23:31] <@shepazu> and convert the results
  355. # [23:31] <@shepazu> then whammo, we're done
  356. # [23:31] <@frozenice> we will feed that list to the importer, yes
  357. # [23:31] <@shepazu> we only need to do this once
  358. # [23:31] <@shepazu> we don't need a repeatable process
  359. # [23:33] <ptressel> Don't we need to repeat this at wossname?
  360. # [23:33] <ptressel> Other site...
  361. # [23:33] <@shepazu> ptressel, quirksmode? caniuse?
  362. # [23:34] <@shepazu> caniuse.com already has a json feed of its results available
  363. # [23:34] <ptressel> Ah, right, caniuse
  364. # [23:35] <@shepazu> ptressel, so, we don't need to scrape it
  365. # [23:36] <@frozenice> I think we only need to get rid of https://github.com/webplatform/mdn-compat-importer/blob/master/index.js#L25 and put the master list into reader.links instead
  366. # [23:38] <@shepazu> OK
  367. # [23:38] <@frozenice> whoever coded that thing did a fine job of separating the tasks :D
  368. # [23:41] <ptressel> :D
  369. # [23:43] <ptressel> shepazu, That list is for CSS. What about others?
  370. # [23:44] <@shepazu> ptressel, I've gathered HTML attributes and elements so far, as well
  371. # [23:44] <@shepazu> working on the others
  372. # [23:44] <@shepazu> frozenice, if you do say so yourself?
  373. # [23:45] <@frozenice> well, I recognize good code when I see it!
  374. # [23:45] <ptressel> :D
  375. # [23:45] <@frozenice> (is "it" right here? sounds kinda wrong)
  376. # [23:46] <@shepazu> frozenice, yup, "it" is correct
  377. # [23:46] <ptressel> Heading toward gender confusion again? :D
  378. # [23:46] <@frozenice> :P
  379. # [23:46] <@shepazu> das ist in ordenung
  380. # [23:47] <@frozenice> hehe, almost perfect ("ordnung")
  381. # [23:47] <@shepazu> gah!
  382. # [23:48] <@shepazu> ich habe für drei jahre deutsch gelernt, aber ich has alles vergessen
  383. # [23:49] <@frozenice> not bad ("habe") :)
  384. # [23:49] <@shepazu> oh, yeah
  385. # [23:49] <@frozenice> denglish
  386. # [23:49] <@shepazu> doch
  387. # [23:49] <@shepazu> or should I say, do'ch!
  388. # [23:50] <@frozenice> :D
  389. # [23:50] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  390. # [23:52] <ptressel> Ok, so just to be clear... I don't need to do anything else? I should not add the crawler module to frozenice's code?
  391. # [23:52] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Read error: Connection reset by peer)
  392. # [23:52] * Joins: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  393. # [23:53] <@frozenice> I'd say if you want to do a crawler, do it as a separate project, so you are free in choice of modules etc., if it spits out a list of pages, we could use that
  394. # [23:55] <ptressel> Generally the crawler fetches the pages too.
  395. # [23:55] <@frozenice> hm indeed, maybe if it also parses out the compatibility HTML
  396. # [23:55] <ptressel> It's not a matter of "want"...I'm asking what needs to be done.
  397. # [23:56] * Joins: auchenbe_ (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk)
  398. # [23:56] <ptressel> Typical crawlers hand off to an indexer for parsing, except for extracting links to follow.
  399. # [23:56] <ptressel> E.g. nutch hands off to lucene and solr for indexing and serving
  400. # [23:57] * Quits: auchenberg (~auchenber@x1-6-00-8e-f2-36-28-8a.cpe.webspeed.dk) (Ping timeout: 255 seconds)
  401. # Session Close: Thu May 22 00:00:00 2014
