  1. # Session Start: Sat Jul 14 00:00:00 2007
  2. # Session Ident: #html-wg
  3. # [00:12] * Joins: edas (edaspet@88.191.34.123)
  4. # [00:30] * Parts: hasather (hasather@80.203.71.22)
  5. # [00:39] * Quits: edas (edaspet@88.191.34.123) (Quit: http://eric.daspet.name/ et l'édition 2007 de http://www.paris-web.fr/ )
  6. # [00:56] * Joins: mjs (mjs@17.255.98.236)
  7. # [00:57] * Quits: zcorpan_ (zcorpan@90.229.146.10) (Ping timeout)
  8. # [01:01] * Quits: tH (Rob@87.102.36.227) (Quit: ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508])
  9. # [01:09] * Joins: hyatt (hyatt@24.6.91.161)
  10. # [01:15] * Parts: billmason (billmason@69.30.57.156)
  11. # [01:26] * Joins: sbuluf (klum@200.49.140.231)
  12. # [01:27] * Quits: hyatt (hyatt@24.6.91.161) (Quit: hyatt)
  13. # [01:28] * Quits: mjs (mjs@17.255.98.236) (Quit: mjs)
  14. # [01:28] * Joins: mjs (mjs@17.255.98.236)
  15. # [01:34] * Quits: kingryan (rking3@208.66.64.47) (Quit: kingryan)
  16. # [01:42] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  17. # [01:48] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  18. # [01:51] * Quits: sbuluf (klum@200.49.140.231) (Ping timeout)
  19. # [01:53] * Joins: gavin (gavin@74.103.208.221)
  20. # [01:56] * Joins: sbuluf (cgsjqk@200.49.140.231)
  21. # [02:00] * Quits: mjs (mjs@17.255.98.236) (Ping timeout)
  22. # [02:03] * Joins: mjs (mjs@17.255.98.236)
  23. # [02:34] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Quit: Leaving)
  24. # [02:53] <Philip`> http://canvex.lazyilluminati.com/misc/stats/ - some totally rough data - it could be nice to collect more types of data and then do this on lots of pages
  25. # [03:55] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  26. # [04:00] * Joins: gavin (gavin@74.103.208.221)
  27. # [04:15] * Quits: mjs (mjs@17.255.98.236) (Quit: mjs)
  28. # [04:41] * Quits: gavin (gavin@74.103.208.221) (Quit: gavin)
  29. # [04:46] * Joins: gavin (gavin@74.103.208.221)
  30. # [04:56] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  31. # [04:58] * Joins: billyjack (MikeSmith@mcclure.w3.org)
  32. # [05:02] * Quits: billyjack (MikeSmith@mcclure.w3.org) (Client exited)
  33. # [05:10] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
  34. # [05:21] * Quits: dbaron (dbaron@63.245.220.241) (Quit: 8403864 bytes have been tenured, next gc will be global.)
  35. # [05:25] * Quits: deltab (deltab@82.36.30.34) (Client exited)
  36. # [05:31] * Joins: myakura (myakura@58.88.37.26)
  37. # [05:36] * Joins: mjs (mjs@64.81.48.145)
  38. # [06:32] * Joins: deltab (deltab@82.36.30.34)
  39. # [06:36] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  40. # [06:48] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  41. # [06:52] * Joins: gavin (gavin@74.103.208.221)
  42. # [08:49] * Joins: Lachy (chatzilla@203.214.140.60)
  43. # [08:54] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  44. # [08:59] * Joins: gavin (gavin@74.103.208.221)
  45. # [09:29] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  46. # [09:31] * Quits: sbuluf (cgsjqk@200.49.140.231) (Ping timeout)
  47. # [09:40] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
  48. # [09:55] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  49. # [09:56] * Joins: ROBOd (robod@86.34.246.154)
  50. # [10:00] <hsivonen> http://unicode2.chat.ru/
  51. # [10:20] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  52. # [10:29] <Lachy> oh wow, that's even funnier than Dmitry's HTML60 stuff
  53. # [10:29] <Lachy> should probably call it Unicode 60
  54. # [10:42] * Quits: Lachy (chatzilla@203.214.140.60) (Quit: ChatZilla 0.9.78.1 [Firefox 2.0.0.4/2007051502])
  55. # [11:02] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  56. # [11:04] * Joins: tH_ (Rob@87.102.36.227)
  57. # [11:04] * tH_ is now known as tH
  58. # [11:07] * Joins: gavin (gavin@74.103.208.221)
  59. # [11:11] * Joins: Lachy (chatzilla@203.214.140.60)
  60. # [11:14] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
  61. # [11:18] * Joins: hasather (hasather@80.203.71.22)
  62. # [11:27] * Joins: myakura (myakura@58.88.37.26)
  63. # [12:37] <Philip`> Far too many people have non-standard doctypes - I see "-//SQ//DTD HTML 2.0 + all extensions//EN", "-//SoftQuad//DTD HoTMetaL PRO 4.0::19970916::extensions to HTML 4.0//EN", "-//Stanford University Libraries//DTD HTML Experimental//EN", "-//GSI//DTD smPanel 1.0 //EN", and I've not even looked at many yet
  64. # [13:04] <hsivonen> Philip`: from what era? in terms of quirks mode needs, should those have made it onto dbaron's doctype list way back when? (too late now anyway)
  65. # [13:09] <Philip`> From the couple of thousand documents I got from Yahoo search results a few months ago
  66. # [13:09] <Philip`> (It would be much better to do this on a more proper sample, but currently I'm just trying to see if the system could generally work)
  67. # [13:11] <Philip`> IE just looks for the strings "DOCTYPE NETSC", " HTML plus", "DTD HTML EXP" and "DTD W3 HTML//" (anywhere inside the <!doctype...> string) to decide to use quirks mode, so that'll catch many of the unusual doctypes too
  68. # [13:13] <Philip`> (where "anywhere" means that <!DOCTYPE fooDOCTYPE NETSCfoo> will trigger quirks mode, while <!DOCTYPE fooDOCTYPE NETSfoo> will be standards mode)
  69. # [13:20] <hsivonen> eww
  70. # [13:25] <Philip`> Oh, plus there's some special cases depending on finding " Transitional//" and "http://" and some HTML indicators ("DTD HTML 4.", etc) inside the doctype string
  71. # [13:26] <Philip`> Maybe it'd be interesting to do a comparison of what doctypes are in use in the world, and how IE vs HTML5 treats them
  72. # [13:28] <hsivonen> http://esw.w3.org/topic/HTML/AuthorSyntax I guess I could point out what's wrong with it, but doing so would tarpit me for days. :-(
  73. # [13:29] <hsivonen> Philip`: it would be interesting, but I'm not convinced that changing the spec to match IE precisely would be worth all the trouble
  74. # [13:30] <hsivonen> Philip`: the spec is already pretty close to Gecko, WebKit and Presto and they seem to handle the real Web well enough on this point
  75. # [13:30] <hsivonen> "well enough" being subjective, of course
  76. # [13:31] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  77. # [13:36] * Joins: gavin (gavin@74.103.208.221)
  78. # [14:44] * Joins: zcorpan_ (zcorpan@90.229.146.10)
  79. # [15:00] * Joins: Jero (Jero@213.46.207.230)
  80. # [15:10] * Quits: Jero (Jero@213.46.207.230) (Quit: ChatZilla 0.9.78.1 [Firefox 2.0.0.4/2007051502])
  81. # [15:39] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  82. # [15:44] * Joins: gavin (gavin@74.103.208.221)
  83. # [16:22] * Parts: hasather (hasather@80.203.71.22)
  84. # [17:41] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  85. # [17:47] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  86. # [17:51] * Joins: gavin (gavin@74.103.208.221)
  87. # [18:17] * Joins: sbuluf (hsu@200.49.140.202)
  88. # [18:55] * Joins: edas (edaspet@88.191.34.123)
  89. # [19:03] * Joins: myakura (myakura@58.88.37.26)
  90. # [19:11] * Quits: sbuluf (hsu@200.49.140.202) (Ping timeout)
  91. # [19:12] * Joins: sbuluf (prw@200.49.140.153)
  92. # [19:20] * Quits: zcorpan_ (zcorpan@90.229.146.10) (Ping timeout)
  93. # [19:54] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  94. # [19:59] * Joins: gavin (gavin@74.103.208.221)
  95. # [20:09] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
  96. # [20:10] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  97. # [20:40] * Quits: edas (edaspet@88.191.34.123) (Ping timeout)
  98. # [21:16] * Joins: mjs (mjs@64.81.48.145)
  99. # [21:19] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  100. # [21:48] * Quits: heycam (cam@203.214.115.243) (Ping timeout)
  101. # [22:01] * Joins: zcorpan_ (zcorpan@90.229.146.10)
  102. # [22:01] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
  103. # [22:04] * Joins: heycam (cam@203.214.127.179)
  104. # [22:06] * Joins: gavin (gavin@74.103.208.221)
  105. # [22:09] * Joins: Sander (svl@86.87.68.167)
  106. # [22:38] * Quits: sbuluf (prw@200.49.140.153) (Ping timeout)
  107. # [22:38] * Joins: sbuluf (luv@200.49.140.184)
  108. # [23:05] * Quits: ROBOd (robod@86.34.246.154) (Quit: http://www.robodesign.ro )
  109. # [23:08] <Philip`> http://canvex.lazyilluminati.com/misc/stats/analyse.cgi/index - is this kind of thing vaguely useful?
  110. # [23:08] <Philip`> (Doesn't work well in FF2 - use a proper browser, like Opera or FF3 or IE6, or tell me how to fix my CSS :-p )
  111. # [23:11] <zcorpan_> Philip`: how many with no doctype?
  112. # [23:12] <Philip`> Not sure, since my current data collection is rubbish - it's treated like a single 90MB HTML page, rather than preserving the page boundaries
  113. # [23:13] <Philip`> but I'll try to fix that, so I can count pages without doctypes, and so I can count the numbers of pages using each tag
  114. # [23:13] <zcorpan_> for the attributes, could you also say what element the attribute was found?
  115. # [23:13] <Philip`> How is that different to what e.g. http://canvex.lazyilluminati.com/misc/stats/analyse.cgi/attribute/href shows?
  116. # [23:14] <zcorpan_> ah. so why is "class" there several times?
  117. # [23:15] <Philip`> Um...
  118. # [23:15] <Philip`> Good point
  119. # [23:15] <zcorpan_> perhaps you should combine them
  120. # [23:17] <Philip`> Done
  121. # [23:17] <Philip`> That was just a bug :-)
  122. # [23:21] <zcorpan_> nice :)
  123. # [23:24] <zcorpan_> when you know how many pages have no doctype, it would be nice to know how many pages are in quirks, almost standards, and standards mode
  124. # [23:25] <zcorpan_> (though you can't know that by just tokenizing)
  125. # [23:26] <zcorpan_> or perhaps you can... but not by just looking at any doctype tokens
  126. # [23:27] <Philip`> (As a scale for the current doctype numbers, there were originally 2522 pages in total, and I guess not many have >1 doctype, so it looks like 20% were using XHTML Transitional)
  127. # [23:28] <zcorpan_> how many doctypes did you find in total?
  128. # [23:29] <Philip`> It looks like it'd be fairly easy to implement the little bit at the start of the tree construction algorithm, just to determine the quirkiness of each document
  129. # [23:30] <zcorpan_> yeah
  130. # [23:30] <Philip`> I see 1618 doctypes in total
  131. # [23:30] <zcorpan_> ~ 64%
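The "~ 64%" figure follows directly from the two counts given in the log. A quick check, assuming (as Philip` guesses at 23:27) that few pages carry more than one doctype, so that doctypes can stand in for pages:

```python
total_pages = 2522     # pages in the original sample (from the log)
doctypes_found = 1618  # doctypes counted across the sample (from the log)

# Assumption: at most one doctype per page, so this approximates
# the share of pages that have a doctype at all.
share = doctypes_found / total_pages
print(f"{share:.1%}")  # 64.2%
```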
  132. # [23:31] <Philip`> That's a surprising number of people who actually read something about HTML at least once, instead of just hitting random characters into a blank page until the right output comes
  133. # [23:31] <Philip`> though I expect there'd be quite different results if looking at a different sampling of pages
  134. # [23:32] <zcorpan_> presence of doctype doesn't imply the author having read something about html
  135. # [23:32] <Philip`> They must have either read some tutorial or read the source of someone else's page
  136. # [23:32] <zcorpan_> or used a tool that emits it for them
  137. # [23:32] <Philip`> Oh, yes
  138. # [23:32] <Philip`> (How many people actually use tools?)
  139. # [23:33] <zcorpan_> where tool can be wysiwyg editor
  140. # [23:33] <Philip`> There's still some HoTMetaL Pro out there...
  141. # [23:33] <zcorpan_> or a text editor with default templates
  142. # [23:34] <zcorpan_> etc
  143. # [23:34] * Joins: tobywoby (tinfish@84.92.181.183)
  144. # [23:34] * tobywoby is now known as tinfish
  145. # [23:35] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
  146. # [23:36] <zcorpan_> on the web scale, my expectation is that 50% don't use a doctype at all, 90% are in quirks mode, 9% are in almost standards mode, and the remaining in standards mode
  147. # [23:39] <Sander> zcorpan_: agreed if you're counting by domain. But by page, the amount of pages in standards mode is much larger, due to a number of large weblog systems having valid doctypes (think wordpress, livejournal, etc)
  148. # [23:41] <zcorpan_> Sander: don't they generally use almost standards mode doctypes?
  149. # [23:41] <Sander> no
  150. # [23:41] * Philip` wonders if there's any useful information that should be gathered from a web survey, that would require more than just the tokeniser
  151. # [23:41] * zcorpan_ thought WP used xhtml transitional
  152. # [23:42] <zcorpan_> livejournal seems to be xhtml transitional too
  153. # [23:42] <zcorpan_> which is almost standards mode, not standards mode
  154. # [23:43] <Sander> hmm, right
  155. # [23:43] <Sander> weird - I could swear page info showed when in Almost Standards Mode
  156. # [23:44] <Sander> it says "Render Mode: Standards compliance mode" - but you're right about the transitional
  157. # [23:44] <Philip`> Other things I want to count: parse errors (split into the individual causes of errors); attribute quotedness (single/double/none) and tag/attribute uppercasesness, to see how many people already write stuff as if it's XML; each step in the tokeniser algorithm; can't think of anything else at the moment
  158. # [23:45] <Philip`> *uppercasedness
  159. # [23:45] <Philip`> ((Not that that's a legitimate word in any case))
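A rough sketch of how the quotedness and uppercasedness counts could be gathered. This is not Philip`'s tool and not the HTML5 tokeniser: it is a regex approximation that ignores comments, script content and malformed tags, and the names `survey`, `TAG_RE` and `ATTR_RE` are hypothetical.

```python
import re
from collections import Counter

# Approximate start-tag matcher: tag name, then everything up to the closing ">",
# skipping over quoted attribute values. Not a conforming HTML tokeniser.
TAG_RE = re.compile(r"""<([A-Za-z][^\s/>]*)((?:[^>"']|"[^"]*"|'[^']*')*)>""")

# Attribute with a value; the named groups record which quoting style was used.
ATTR_RE = re.compile(
    r"""([^\s=/>"']+)\s*=\s*(?:(?P<double>"[^"]*")|(?P<single>'[^']*')|(?P<none>[^\s"'>]+))"""
)


def survey(html: str) -> Counter:
    """Count tag/attribute name casing and attribute quoting styles."""
    counts = Counter()
    for tag in TAG_RE.finditer(html):
        name, rest = tag.group(1), tag.group(2)
        counts["tag:lower" if name == name.lower() else "tag:upper-or-mixed"] += 1
        for attr in ATTR_RE.finditer(rest):
            attr_name = attr.group(1)
            counts["attr:lower" if attr_name == attr_name.lower() else "attr:upper-or-mixed"] += 1
            # Exactly one of the three named groups matched; find which.
            style = next(k for k in ("double", "single", "none") if attr.group(k) is not None)
            counts["quote:" + style] += 1
    return counts


sample = """<A HREF="x" class='y' id=z><b>text</b>"""
result = survey(sample)
print(result["quote:double"], result["quote:single"], result["quote:none"])  # 1 1 1
print(result["tag:upper-or-mixed"], result["tag:lower"])                      # 1 1
```

Pages where lowercase names and double-quoted values dominate would be the ones written "as if it's XML"; a real survey would want the tokeniser itself rather than regexes, so that parse errors can be counted in the same pass.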
  160. # [23:49] <zcorpan_> did you filter out feeds?
  161. # [23:50] <Philip`> I didn't do any filtering (hence there being occasional PDF junk too)
  162. # [23:50] <Philip`> (For now, I'm just using the data that I collected months ago)
  163. # [23:50] <Philip`> (but I'd like to get better data)
  164. # [23:52] * zcorpan_ notes that 2 pages use <meta charset>
  165. # [23:53] <Philip`> Looks like none use <!DOCTYPE HTML>
  166. # [23:54] <Philip`> There's two <footer> and one <header>
  167. # [23:55] <Philip`> and two <section>
  168. # [23:55] <Philip`> so some people are using HTML5 features, even though they never actually meant to
  169. # [23:56] <zcorpan_> how do you know what they meant?
  170. # [23:57] * Quits: sbuluf (luv@200.49.140.184) (Ping timeout)
  171. # Session Close: Sun Jul 15 00:00:00 2007