  1. # Session Start: Sun Jul 15 00:00:00 2007
  2. # Session Ident: #html-wg
  3. # [00:00] * Joins: hyatt (hyatt@24.6.91.161)
  4. # [00:02] <Philip`> I remember looking a long time ago at the pages I found using <footer>, and they just looked far more like old buggy HTML with random made-up tags than like early adopters of HTML5 :-)
  5. # [00:04] <zcorpan_> ok. was the usage of <footer> incompatible with html5?
  6. # [00:04] <Philip`> (It would be nice if all the collected statistics could be linked back to the pages they came from - I'll see if I can use that, if it's not going to take huge amounts of disk space...)
  7. # [00:05] <Philip`> http://www.classesusa.com/schools/campus/it.html
  8. # [00:06] <Philip`> (The other <footer> was on the same site as that one)
  9. # [00:07] <Philip`> Oops, the <header> was actually just a </header>, in http://home.comcast.net/~chris.s/myth.html
  10. # [00:08] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  11. # [00:11] * Joins: sbuluf (wzfdycu@200.49.140.174)
  12. # [00:14] * Joins: gavin (gavin@74.103.208.221)
  13. # [00:16] * Quits: hyatt (hyatt@24.6.91.161) (Quit: hyatt)
  14. # [00:16] * Quits: tinfish (tinfish@84.92.181.183) (Quit: tinfish)
  15. # [00:22] * Quits: zcorpan_ (zcorpan@90.229.146.10) (Ping timeout)
  16. # [00:31] * Quits: Lachy (chatzilla@203.214.140.60) (Quit: ChatZilla 0.9.78.1 [Firefox 2.0.0.4/2007051502])
  17. # [00:42] * Joins: mjs (mjs@64.81.48.145)
  18. # [01:41] * Joins: hyatt (hyatt@24.6.91.161)
  19. # [01:46] * Joins: myakura (myakura@58.88.37.26)
  20. # [01:51] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  21. # [01:55] * Quits: hyatt (hyatt@24.6.91.161) (Quit: hyatt)
  22. # [02:03] * Quits: tH (Rob@87.102.36.227) (Quit: ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508])
  23. # [02:04] <Philip`> http://canvex.lazyilluminati.com/misc/stats/analyse.cgi/index - I've replaced the dataset with the Alexa Top 500 pages
  24. # [02:05] <Philip`> It's interesting to see the prevalence of <script> (on about 93% of pages), compared to http://code.google.com/webstats/2005-12/scripting.html finding it on roughly half
  25. # [02:16] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  26. # [02:21] * Joins: gavin (gavin@74.103.208.221)
  27. # [02:32] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  28. # [03:45] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  29. # [04:06] * Joins: mjs (mjs@64.81.48.145)
  30. # [07:10] * Joins: Lachy (chatzilla@203.214.140.60)
  31. # [11:00] * Joins: Fred (fred@84.6.240.69)
  32. # [11:00] * Parts: Fred (fred@84.6.240.69)
  33. # [11:31] * Joins: tH_ (Rob@87.102.36.227)
  34. # [11:31] * tH_ is now known as tH
  35. # [12:35] * Joins: zcorpan_ (zcorpan@90.229.146.10)
  36. # [12:57] * Joins: Sander (svl@86.87.68.167)
  37. # [13:33] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  38. # [13:38] * Joins: gavin (gavin@74.103.208.221)
  39. # [14:23] * Quits: oedipus (oedipus@71.250.56.243) (Ping timeout)
  40. # [15:04] * Joins: ROBOd (robod@86.34.246.154)
  41. # [15:26] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  42. # [15:38] * Joins: kazuhito (kazuhito@222.151.186.76)
  43. # [15:58] * Joins: Sander (svl@86.87.68.167)
  44. # [15:59] * Joins: edas (edaspet@88.191.34.123)
  45. # [16:31] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  46. # [16:46] * Quits: tH (Rob@87.102.36.227) (Ping timeout)
  47. # [16:49] * Joins: tH (Rob@87.102.36.227)
  48. # [17:04] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  49. # [17:09] * Joins: gavin (gavin@74.103.208.221)
  50. # [18:05] * Quits: edas (edaspet@88.191.34.123) (Ping timeout)
  51. # [19:39] * Quits: kazuhito (kazuhito@222.151.186.76) (Quit: Quitting!)
  52. # [19:45] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  53. # [19:50] * Joins: gavin (gavin@74.103.208.221)
  54. # [19:54] * Joins: Sander (svl@86.87.68.167)
  55. # [21:02] * Quits: sbuluf (wzfdycu@200.49.140.174) (Ping timeout)
  56. # [21:07] * Joins: sbuluf (fgwg@200.49.140.174)
  57. # [21:27] * Quits: xover (xover@193.157.66.5) (Ping timeout)
  58. # [21:36] * Quits: Sander (svl@86.87.68.167) (Ping timeout)
  59. # [21:36] * Joins: xover (xover@193.157.66.5)
  60. # [21:45] * Joins: Lionheart (robin@66.57.69.65)
  61. # [21:46] <Philip`> Of the front pages of the top 500 sites, www.w3.org contains 79% of the <acronym>s and 97% of the <abbr>s
  62. # [21:51] <hsivonen> Philip`: do you have a survey framework that others can run?
  63. # [21:52] <Philip`> I'm trying to build one up at the moment
  64. # [21:53] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  65. # [21:53] <Philip`> though I've not tried to do anything good about downloading a good sample of pages yet, which is why I'm just testing with that list of 500 sites for now
  66. # [21:53] <hsivonen> I wonder if the dmoz data dump could be considered a representative sample of pages
  67. # [21:54] <hsivonen> would it be biased towards old pages and front pages?
  68. # [21:54] <Philip`> That's what http://triin.net/2006/06/12/Selection_of_pages used
  69. # [21:55] <Philip`> Would it be biased towards English too?
  70. # [21:55] <hsivonen> dunno. probably
  71. # [21:56] <hsivonen> although if one wants to analyze, for example, what fallback content tends to say, one would be better off scraping text that one can actually read and categorize
  72. # [21:57] * hsivonen notes that dmoz still carries a Netscape copyright notice
  73. # [21:57] <Philip`> Non-English sites seem to be quite different to English ones - e.g. http://www.xinhuanet.com/ has a thousand <td>s, which seems quite insane, but it's just as important that HTML5 isn't incompatible with those sites
  74. # [21:58] * Joins: gavin (gavin@74.103.208.221)
  75. # [21:59] <Philip`> (Of the top 12 <td> abusers in my collection of pages, ign.com is the only English one)
  76. # [22:00] * hsivonen passes tests4.dat
  77. # [22:00] <hsivonen> oops. didn't pass after all
  78. # [22:03] <hsivonen> passing it now
  79. # [22:46] * Quits: zcorpan_ (zcorpan@90.229.146.10) (Ping timeout)
  80. # [22:47] * Joins: Sander (svl@86.87.68.167)
  81. # [23:04] * Quits: ROBOd (robod@86.34.246.154) (Quit: http://www.robodesign.ro )
  82. # [23:24] <Philip`> hsivonen / html5lib people: http://canvex.lazyilluminati.com/misc/stats/tokeniser.html gives the frequency of each step of the tokeniser algorithm, in case that's interesting for knowing which bits to optimise
  83. # [23:25] <Philip`> (It records the current state and the C++ code for the first conditional which succeeded, with "true" being the "not yet handled" parts)
  84. # [23:25] <hsivonen> Philip`: thank you. on the face of it, the frequencies suggest that I should optimize away my additional buffers
  85. # [23:25] <hsivonen> as they are used in the most common steps
  86. # [23:40] <Philip`> This download-loads-of-HTML-pages idea would be much easier if I had more than 200MB of free disk space left
  87. # Session Close: Mon Jul 16 00:00:00 2007
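
The figures quoted in the log (the fraction of pages containing <script>, and one site's share of all <abbr> elements) come from a survey framework that is not shown here. The sketch below is only an illustration of how such numbers could be computed, not that framework: the names TagCounter, count_tags, page_prevalence and share_by_page are hypothetical, the sample pages are made up, and Python's html.parser is not an HTML5 tokeniser, so counts on badly broken markup may differ from what a spec-conforming parser would report.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lower-cased.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of start tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def page_prevalence(pages, tag):
    """Fraction of pages that contain at least one <tag> start tag."""
    if not pages:
        return 0.0
    hits = sum(1 for html in pages.values() if count_tags(html)[tag] > 0)
    return hits / len(pages)


def share_by_page(pages, tag):
    """Each page's share of all <tag> start tags in the sample."""
    per_page = {url: count_tags(html)[tag] for url, html in pages.items()}
    total = sum(per_page.values())
    if total == 0:
        return {}
    return {url: n / total for url, n in per_page.items() if n}


if __name__ == "__main__":
    # Made-up sample data, keyed by a placeholder URL.
    pages = {
        "a.example": "<p><abbr>W3C</abbr> <abbr>HTML</abbr> <abbr>CSS</abbr></p>"
                     "<script></script>",
        "b.example": "<p><abbr>DOM</abbr></p>",
    }
    print(page_prevalence(pages, "script"))
    print(share_by_page(pages, "abbr"))
```

Only start tags are counted, so a stray end tag such as the </header> mentioned at [00:07] would not register as a use of the element.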
