  1. # Session Start: Mon Oct 29 00:00:00 2007
  2. # Session Ident: #html-wg
  3. # [01:18] * Disconnected
  4. # [01:18] * Attempting to rejoin channel #html-wg
  5. # [01:18] * Rejoined channel #html-wg
  6. # [01:18] * Topic is 'next HTML WG telcon 25 Oct 2300Z http://www.w3.org/html/wg/ (more logs: http://krijnhoetmer.nl/irc-logs/ )'
  7. # [01:18] * Set by DanC on Mon Oct 22 15:50:08
  8. # [01:20] * Quits: aroben (aroben@67.160.250.192) (Connection reset by peer)
  9. # [01:49] * Quits: marcos (chatzilla@131.181.148.226) (Connection reset by peer)
  10. # [02:36] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  11. # [02:41] * Joins: gavin (gavin@99.227.30.12)
  12. # [03:03] * Quits: deltab (deltab@82.36.30.34) (Client exited)
  13. # [03:03] * Joins: deltab (deltab@82.36.30.34)
  14. # [03:16] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  15. # [03:22] * Joins: mjs (mjs@64.81.48.145)
  16. # [04:03] * Joins: shepazu (schepers@128.30.52.30)
  17. # [04:25] * Quits: shepazu (schepers@128.30.52.30) (Client exited)
  18. # [04:43] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  19. # [04:48] * Joins: gavin (gavin@99.227.30.12)
  20. # [04:49] * Joins: aroben (adamroben@67.160.250.192)
  21. # [05:00] * Quits: aroben (adamroben@67.160.250.192) (Quit: aroben)
  22. # [05:03] * Joins: aroben (aroben@67.160.250.192)
  23. # [05:37] * Joins: aroben_ (aroben@67.160.250.192)
  24. # [05:38] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
  25. # [05:49] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  26. # [06:16] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  27. # [06:21] * Quits: aroben_ (aroben@67.160.250.192) (Quit: Leaving)
  28. # [06:50] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  29. # [06:55] * Joins: gavin (gavin@99.227.30.12)
  30. # [07:01] * Joins: mjs (mjs@64.81.48.145)
  31. # [07:19] * Joins: aroben (aroben@67.160.250.192)
  32. # [07:40] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  33. # [07:49] * Joins: mjs (mjs@64.81.48.145)
  34. # [07:54] * Quits: dbaron (dbaron@71.204.145.103) (Quit: 8403864 bytes have been tenured, next gc will be global.)
  35. # [08:00] * Joins: Sander (svl@86.87.68.167)
  36. # [08:49] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
  37. # [08:56] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Client exited)
  38. # [08:58] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  39. # [09:01] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  40. # [09:03] * Joins: gavin (gavin@99.227.30.12)
  41. # [09:03] * Quits: sbuluf (olgkp@200.49.140.188) (Ping timeout)
  42. # [09:08] * Joins: tH_ (Rob@87.102.47.210)
  43. # [09:08] * tH_ is now known as tH
  44. # [09:26] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
  45. # [10:07] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  46. # [10:08] * Quits: Lachy (Lachy@213.236.208.22) (Quit: Leaving)
  47. # [10:12] * Joins: Lachy (Lachy@213.236.208.22)
  48. # [10:16] * Joins: tH_ (Rob@87.102.45.182)
  49. # [10:17] * Quits: tH (Rob@87.102.47.210) (Ping timeout)
  50. # [10:17] * tH_ is now known as tH
  51. # [10:18] * Joins: mjs (mjs@64.81.48.145)
  52. # [10:28] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  53. # [10:36] * Joins: mjs (mjs@64.81.48.145)
  54. # [10:59] * Quits: xover (xover@193.157.66.5) (Quit: Leaving)
  55. # [11:05] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  56. # [11:10] * Joins: gavin (gavin@99.227.30.12)
  57. # [11:15] * Joins: olivier (ot@128.30.52.30)
  58. # [11:17] * Joins: myakura (myakura@210.227.200.92)
  59. # [11:17] * Joins: ROBOd (robod@89.122.216.38)
  60. # [11:44] * Quits: olivier (ot@128.30.52.30) (Ping timeout)
  61. # [12:20] * Quits: Lachy (Lachy@213.236.208.22) (Quit: Leaving)
  62. # [12:32] * Joins: Lachy (Lachy@213.236.208.22)
  63. # [12:58] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
  64. # [13:05] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
  65. # [13:12] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  66. # [13:17] * Joins: gavin (gavin@99.227.30.12)
  67. # [13:29] * Joins: olivier (ot@128.30.52.30)
  68. # [13:49] * Joins: matt (matt@128.30.52.30)
  69. # [14:07] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  70. # [14:10] <anne> hsivonen, "Required attributes missing on element img from namespace http://www.w3.org/1999/xhtml" is not that friendly
  71. # [14:10] <anne> maybe say upfront that you're validating "HTML" and leave all the namespace crap out of it?
  72. # [14:11] <anne> or call it the "HTML <img> element"
  73. # [14:11] <anne> the suggestions are nice btw
  74. # [14:13] <anne> It would be nice if the W3C coordinated with you as your validator seems to be improving more quickly than theirs
  75. # [14:17] * Joins: Lachy_ (Lachy@213.236.208.22)
  76. # [14:17] * Quits: Lachy (Lachy@213.236.208.22) (Connection reset by peer)
  77. # [14:25] * Joins: karl (karlcow@128.30.52.30)
  78. # [14:28] <hsivonen> anne: OK. I'll make the UI rendering of names from well-known namespaces nicer
  79. # [14:28] <hsivonen> anne: as for telling which attributes are missing, I'm waiting for upstream to fix that one
  80. # [14:30] <hsivonen> anne: leaving the namespace "crap" completely out would be problematic in cases of XML validation and bad ns declarations and with compound documents
  81. # [14:30] <hsivonen> anne: I intend to enable XHTML5+SVG 1.1 in due course
  82. # [14:31] <anne> well, my complete suggestion would be to leave in namespaces for XML, but special case all forms of HTML, XHTML, and probably SVG, MathML, XBL and combinations of those
  83. # [14:32] <hsivonen> anne: that's doable
  84. # [14:51] * Quits: myakura (myakura@210.227.200.92) (Ping timeout)
  85. # [14:59] * Joins: myakura (myakura@210.227.200.92)
  86. # [15:04] * Joins: ROBOd (robod@89.122.216.38)
  87. # [15:11] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Client exited)
  88. # [15:18] <hsivonen> anne: what's your take on xml:lang, xml:base, etc.: should I say attribute xml:lang or XML attribute lang?
  89. # [15:18] <hsivonen> I'd go with xml:lang.
  90. # [15:19] <Dashiva> I'd say they're more familiar as xml:lang etc
  91. # [15:19] <hsivonen> yeah.
  92. # [15:19] <hsivonen> special cases
  93. # [15:19] <hsivonen> yay
  94. # [15:19] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  95. # [15:24] * Joins: gavin (gavin@99.227.30.12)
  96. # [15:26] <DanC> hsivonen, do you want me to add you to the list of people with write access to tracker? http://www.w3.org/html/wg/tracker/ Are you interested to chat with James and Gregory and Julian and/or me and Chris every once in a while about it?
  97. # [15:27] <DanC> I think you're already chatting with us pretty regularly
  98. # [15:29] <hsivonen> DanC: yes, tracker access would be nice, but I cannot commit to doing regular issue triage work
  99. # [15:31] <DanC> I'm inclined to only give write access to people who are willing to commit, at least for a while.
  100. # [15:33] <hsivonen> DanC: ok
  101. # [15:34] <hsivonen> anne: I'm considering setting up a logger to catch namespace URIs for which I forgot to create human-readable UI strings
  102. # [15:37] * Joins: billmason (billmason@69.30.57.156)
  103. # [15:39] * Quits: aaronlev (chatzilla@66.31.86.217) (Ping timeout)
  104. # [15:45] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
  105. # [15:48] <anne> hsivonen, xml:lang, yeah
  106. # [15:48] <anne> hsivonen, if it's not too complicated...
  107. # [15:49] <anne> although I wouldn't bother for XSLT, Atom, etc. I think
  108. # [15:49] <anne> the XSLT audience prolly likes the namespace to be there and the Atom audience should really be visiting feedvalidator.org
  109. # [16:16] <anne> I wonder if "the handful" will now get a ton of requests from people to add their issue to the tracker page...
  110. # [16:17] <DanC> we'll see. either way seems OK
  111. # [16:20] <DanC> re "validator seems to be improving more quickly than theirs", any particular coordination you think would help, Anne? I have a TAG action to work with TimBL and Olivier on validation and extensibility
  112. # [16:20] <anne> coordination with hsivonen, I suppose
  113. # [16:20] <DanC> what sort of coordination?
  114. # [16:21] * olivier hopes to see henri at the TPAC
  115. # [16:21] <anne> maybe replacing the W3C validator with his over time, dunno
  116. # [16:21] <DanC> I get "Attribute profile not allowed on element head from namespace http://www.w3.org/1999/xhtml at this point.". :-/
  117. # [16:21] <DanC> not a feature, IMO
  118. # [16:22] <anne> that's not a problem with the validator DanC and I'm not sure why you bring that up in a discussion about it...
  119. # [16:22] * Quits: myakura (myakura@210.227.200.92) (Quit: Leaving...)
  120. # [16:22] <anne> unless of course you're validating XHTML 1.0 or HTML4
  121. # [16:22] <anne> in which case this may be a bug
  122. # [16:23] <DanC> I bring it up because I was spot-checking your claim that html5.validator.nu is improving faster than validator.w3.org ; in my opinion, that's not an improvement
  123. # [16:23] <anne> although otoh, it has been suggested that the validator shouldn't really validate against versioned formats either, and evolve with "browser engines" and "deployed content"
  124. # [16:24] <anne> DanC, well, it's correct per HTML 5, it seems wrong to attack the validator for that
  125. # [16:25] <DanC> I'm not attacking; just observing. I made a request to change the HTML 5 draft; it's perfectly within Henri's power to modify his code in advance of changes to the spec
  126. # [16:25] <anne> whatever
  127. # [16:26] <DanC> anne, that's out of line.
  128. # [16:28] <anne> well, you draw the discussion away from validators to <head profile>, you assume the spec will change
  129. # [16:28] <DanC> I don't assume; I just play my part in the WG
  130. # [16:29] <anne> s/will change/will change with respect to <head profile>/
  131. # [16:30] <DanC> no, I don't assume that either; but i have made a request, and it is within henri's control to accept that request
  132. # [16:30] <DanC> the validator is an important part of the feedback loop between authors and spec developers.
  133. # [16:31] <DanC> since I think head/@profile is worth keeping, I'm not inclined to take a validator that discourages it and deploy it more widely
  134. # [16:31] <anne> if Henri makes <head profile> conforming HTML5 that feedback will go away
  135. # [16:32] <DanC> right; in this case, the spec should change, not the document. (IMO)
  136. # [16:32] <anne> and my point wasn't so much about details like that, but more that the architecture of Henri's validator seems better as it isn't based on SGML anymore and also uses a proper XML parser for XHTML
  137. # [16:33] * olivier wishes there can be coordination other than "replace the W3C validator with (henri's)". It would be nicer to have people work together on some common tool. Having separate developers build separate tools is good, asking that one be trashed for another, not constructive IMHO
  138. # [16:33] <olivier> that's why I hope to see henri, would like to meet Jirka, etc
  139. # [16:33] <DanC> "trashed" is your word, not his, Olivier
  140. # [16:33] <olivier> indeed
  141. # [16:33] <olivier> I didn't claim it was
  142. # [16:34] <DanC> right; you're adding heat by that choice of words.
  143. # [16:34] * Joins: aroben (aroben@67.160.250.192)
  144. # [16:34] <DanC> if what you really want is more coordination, you do well to choose other words
  145. # [16:35] <DanC> using a conforming XML parser seems long overdue
  146. # [16:35] <anne> (I didn't really see that clearly, the HTML part is based on the HTML parser in the HTML5 spec with some provisions for HTML4 as HTML5 is not yet a standard.)
  147. # [16:35] <olivier> Dan, the truth is that it is very much the mindset. everyone thinks their own tool has something special, and would rather have the others replaced than work on getting tools working together
  148. # [16:36] <anne> I'm not involved in Henri's validator in any way olivier and I was the one suggesting it, not hsivonen
  149. # [16:36] <olivier> that's why I like the idea of working with Henri and Jirka on a common output
  150. # [16:36] <anne> to be clear
  151. # [16:36] <DanC> ok, but such generalities don't get us closer to the goal. What else can you tell us about Henri's architecture, anne? (I'm also looking for pointers)
  152. # [16:36] <olivier> but that has not seen much progress
  153. # [16:36] <anne> I do provide IRC-level feedback to hsivonen now and then
  154. # [16:37] <olivier> (although there were threads in that direction lately, I haven't managed to grab henri's attention on it yet)
  155. # [16:37] <anne> DanC, http://about.validator.nu/ maybe?
  156. # [16:37] <anne> DanC, I don't consider that a generality btw, it seems pretty fundamental to me if you want to provide feedback on syntax
  157. # [16:38] <olivier> danc, sure. Let's get out of generalities: how do we merge capabilities of different engines built in different languages
  158. # [16:38] <DanC> by "generalities" I meant "everyone thinks their own tool has something special"
  159. # [16:39] <DanC> which capabilities are you most interested in, olivier/
  160. # [16:39] <DanC> ?
  161. # [16:39] <DanC> the main thing I get from http://about.validator.nu/ is RELAX-NG. I find that fairly desirable, but only as a means to an end; I haven't seen a RELAX-NG service with lots of work on user-friendly diagnostics.
  162. # [16:40] <DanC> what about nuts-and-bolts software architecture? I thought validator.nu was java, but the build seems to be python.
  163. # [16:40] <olivier> I think for HTML <= 4.01 nothing beats the flexibility and usefulness of opensp (C, perl) but for XML based languages relax ng, schematron and nrl would be great. as for html5, if the group really goes forward without using schemas, then plugging a parser that groks the html5 parsing algo with something that has good UI and explanations
  164. # [16:41] <olivier> I thought validator.nu was java indeed, but perhaps it was switched to libhtml5 (which is python)?
  165. # [16:42] <DanC> I think it's highly unlikely that a schema will ever be an integral part of the HTML 5 spec. I think treating it as an implementation is a reasonable approach (though probably not what I'd do if I were editing the spec).
  166. # [16:42] <olivier> right
  167. # [16:42] <anne> it's Java
  168. # [16:43] <olivier> so that's at least 3 very different parsing/checking methods
  169. # [16:43] <olivier> not to mention xml schema support, for e.g voicexml
  170. # [16:43] <anne> other aspects is that it has a custom checker for lots of attribute values, a table checker
  171. # [16:43] <anne> so it checks for HTML tables if the table is marked up correctly
  172. # [16:43] <olivier> anne: do you know if it can check improper deep nesting?
  173. # [16:44] <olivier> e.g <form><p><form> ?
  174. # [16:44] <anne> yeah, it does that
  175. # [16:44] <olivier> very cool
  176. # [16:44] <olivier> that's hard to do with a schema
  177. # [16:44] <anne> it's based on a mix of RelaxNG, Schematron, and custom code
  178. # [16:44] * anne saw a presentation during XTech 2006 about it
  179. # [16:45] <DanC> I'm inclined to give HTML-specific stuff like table checking priority over general-purpose stuff like XML Schema and voiceXML. I think validator.w3.org should focus on the bulk of web content; if we need a separate special-purpose checker for VoiceXML, that seems OK to me
  180. # [16:45] <anne> it's really quite neat, it basically checks everything you can possibly check without requiring something that's turing complete
  181. # [16:45] <olivier> danc: yeah
  182. # [16:45] <DanC> schematron is turing complete
  183. # [16:45] <Philip> anne: By "Turing complete", do you mean "AI complete"?
  184. # [16:45] <olivier> but relaxng and nrl for svg seems important
  185. # [16:46] <anne> Philip, I suppose
  186. # [16:46] <DanC> nrl... is that different from NVDL?
  187. # [16:46] <olivier> old name of it, sorry
  188. # [16:46] <olivier> but same thing
  189. # [16:46] <DanC> ok
  190. # [16:46] <Philip> anne: There's quite a difference between them :-)
  191. # [16:46] * DanC hunts for the source for the table checker; seems to remember it's in Java
  192. # [16:47] <DanC> http://hsivonen.iki.fi/table-integrity-checker/ ...
  193. # [16:47] * Joins: hasather (hasather@90.231.107.133)
  194. # [16:48] <DanC> irony! from sgmllib import SGMLParser -- http://svn.versiondude.net/whattf/build/trunk/build.py
  195. # [16:48] <anne> Philip, oh, wait, I think I just meant turing complete, but you're right
  196. # [16:49] * anne wasn't really paying much attention to what Philip said, oops
  197. # [16:49] <DanC> wild... there's a pile of java stuff, and the python code is for fetching it from hither and yon
  198. # [16:50] <Philip> anne: Hmm, now I have no idea what you meant :-p
  199. # [16:50] <DanC> hmm... return code isn't checked; os.system(cmd)
  200. # [16:51] <DanC> that's in runCmd; in execCmd, the return code _is_ checked. odd.
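
Editor's sketch of the point DanC raises about unchecked return codes. This is not the code from build.py, which is not reproduced in this log; `run_cmd` and `try_cmd` are hypothetical names. It only illustrates, assuming a Python build script, how a command's exit status can be checked instead of being dropped by a bare `os.system(cmd)` call.

```python
import subprocess
import sys


def run_cmd(args):
    """Run a command and raise if it exits with a non-zero status."""
    # check=True makes subprocess.run raise CalledProcessError on failure,
    # so a failed step cannot go unnoticed the way an ignored
    # os.system(cmd) return value can.
    completed = subprocess.run(args, check=True)
    return completed.returncode


def try_cmd(args):
    """Run a command and report success as a boolean instead of raising."""
    try:
        run_cmd(args)
        return True
    except subprocess.CalledProcessError:
        return False


if __name__ == "__main__":
    # A child process that exits with status 3 is reported as a failure.
    print(try_cmd([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Raising by default and offering a boolean wrapper keeps the strict behaviour as the norm, so a caller has to opt out of error checking explicitly.
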
  201. # [16:53] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
  202. # [16:53] * Joins: ChrisWilson (cwilso@131.107.0.105)
  203. # [16:53] * Quits: ChrisWilson2 (cwilso@131.107.0.73) (Ping timeout)
  204. # [16:54] <anne> Philip, sorry, turing complete is needed for ECMAScript checking and such but AI complete is needed for semantics; I meant turing complete, but AI complete would indeed be better
  205. # [16:56] <Philip> anne: Ah, right
  206. # [16:57] <DanC> still looking for table checker source; doesn't seem to be in http://svn.versiondude.net/whattf/validator/trunk/src/nu/
  207. # [16:57] <Philip> but Turing completeness still doesn't let you check ECMAScript for some properties, because of the halting problem
  208. # [16:58] <DanC> wild... http://svn.versiondude.net/whattf/util/trunk/src/nu/validator/json/Serializer.java ... he wrote his own JSON serializer? or used one from MOZ and renamed it package nu.validator.json?
  209. # [16:59] <DanC> aha! http://svn.versiondude.net/whattf/syntax/trunk/non-schema/java/src/org/whattf/checker/table/
  210. # [17:00] <DanC> now... what's the interface between the table checker and the rest? (and how does it compare to the Unicorn interface?)
  211. # [17:01] <DanC> public final class TableChecker extends Checker {
  212. # [17:01] <DanC> import org.whattf.checker.Checker;
  213. # [17:01] <DanC> clearly a web browser is not the intended mechanism to browse this code ;-)
  214. # [17:02] <DanC> but it's not bad... http://svn.versiondude.net/whattf/syntax/trunk/non-schema/java/src/org/whattf/checker/Checker.java
  215. # [17:02] <DanC> * The abstract base class for SAX-based content checkers that listen to
  216. # [17:02] <DanC> * the <code>ContentHandler</code> events and emit errors and warnings to
  217. # [17:02] <DanC> * an <code>ErrorHandler</code>.
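
Editor's sketch of the pattern described in the Checker comment quoted above: a checker that listens to SAX content events and records errors. It uses Python's `xml.sax` rather than the Java classes under discussion, and `NestedFormChecker` and `check` are hypothetical names. As a simplification, errors are collected in a list instead of being sent to a SAX `ErrorHandler`. The rule it checks is the nested `<form><p><form>` case olivier asked about earlier.

```python
import xml.sax


class NestedFormChecker(xml.sax.ContentHandler):
    """Listens to SAX element events and records nested form elements."""

    def __init__(self):
        super().__init__()
        self.open_forms = 0  # number of form elements currently open
        self.errors = []     # collected error messages

    def startElement(self, name, attrs):
        if name == "form":
            if self.open_forms > 0:
                self.errors.append("form element nested inside another form")
            self.open_forms += 1

    def endElement(self, name):
        if name == "form":
            self.open_forms -= 1


def check(markup):
    """Parse well-formed XML markup and return the checker's error list."""
    checker = NestedFormChecker()
    xml.sax.parseString(markup.encode("utf-8"), checker)
    return checker.errors


if __name__ == "__main__":
    print(check("<form><p><form/></p></form>"))
```

Because the checker only sees a stream of events, it needs a counter rather than a tree walk, which is why this kind of rule is easier to express in hand-written code than in a grammar-based schema.
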
  218. # [17:03] <DanC> olivier, what's the closest thing in Unicorn? I suppose it's bytestream based; it's reasonably straightforward to stick an XML parser in there to turn a bytestream into a sequence of sax events
  219. # [17:05] <DanC> hmm... it's an abstract class, not a java interface. I gather people don't really use java interfaces as much as the Modula-3 crowd used interfaces.
  220. # [17:08] <hsivonen> anne: I think I'm going to do human-readable names for XSLT and Atom as well, although I agree that Feed Validator does Atom better so I'm not advertising Validator.nu for Atom purposes
  221. # [17:09] <hsivonen> DanC: back at XTech, there was some preliminary discussion about running a copy of the Validator.nu software (unnamed back then) in the w3.org space
  222. # [17:09] <hsivonen> DanC: I said that it would be good, but it should probably wait until I had a better parser
  223. # [17:09] <hsivonen> DanC: I now have a better parser
  224. # [17:10] <DanC> I ran across http://relaxed.sourceforge.net/ the other day; it's also pretty interesting.
  225. # [17:11] <hsivonen> DanC: Re: profile. if Validator.nu does something that you don't like and Validator.nu is merely following the HTML 5 draft, I think it isn't a Validator.nu bug per se
  226. # [17:11] <hsivonen> DanC: I do have some diffs from the draft, though, when it is too obvious that implementing the current spec language is not worthwhile
  227. # [17:11] <hsivonen> DanC: e.g. style='' and <font>
  228. # [17:12] <hsivonen> DanC: It is in my power to make the code disagree with the spec, but I'd rather minimize the gap between the code and the spec
  229. # [17:14] <hsivonen> olivier: Re: common tool: what's the situation with Unicorn these days?
  230. # [17:15] <DanC> hsivonen, do the relaxng and schematron bits extend Checker the way TableChecker does?
  231. # [17:15] <hsivonen> olivier: sorry about not following up on the output format in a timely manner. I'll follow up shortly.
  232. # [17:16] <hsivonen> DanC: the architecture is described in http://hsivonen.iki.fi/thesis/html5-conformance-checker
  233. # [17:17] <hsivonen> DanC: the short story is that the parsers are SAX and the higher layer is RELAX NG, custom RELAX NG datatype library, schematron and hand-rolled Java depending on the suitability of each for a given subproblem
  234. # [17:18] <hsivonen> DanC: only the build script is Python
  235. # [17:19] <hsivonen> olivier: Validator.nu uses a library that is supposed to do XSD as well, but I turned it off, because it crashes and because I haven't gotten around to reviewing it for security
  236. # [17:19] * DanC skips to chapter 5 ... http://hsivonen.iki.fi/thesis/html5-conformance-checker#implementation
  237. # [17:20] <hsivonen> DanC: I wasn't aware that Schematron was Turing complete. It seems that it isn't conveniently Turing-complete at least if you are sticking to XPath 1.0
  238. # [17:21] * Quits: matt (matt@128.30.52.30) (Quit: matt)
  239. # [17:21] <hsivonen> Validator.nu has latent (totally untested) NRL and NVDL capability
  240. # [17:22] <olivier> hsivonen: we're restarting development on it
  241. # [17:22] <hsivonen> DanC: I wrote my own JSON serializer. All the ones I found for Java were non-streaming and would have been harder to glue on.
  242. # [17:22] <olivier> (re: unicorn)
  243. # [17:22] <olivier> first implementation was interesting proof of concept but not flexible enough for real world usage
  244. # [17:23] <hsivonen> DanC: no, the RELAX NG and Schematron bits implement the Validator interface
  245. # [17:23] <DanC> "XHTML5 does not allow the character encoding to be declared using the meta element". wild. is that still the case?
  246. # [17:23] <hsivonen> DanC: the Checker stuff is also wrapped in an adapter that makes them look like Validator instances as well
  247. # [17:23] <olivier> hsivonen: no problem for the mails, I was just planning to chat with you at tpac if you didn't have time before that
  248. # [17:24] <hsivonen> DanC: yes, meta charset is bogus in application/xhtml+xml
  249. # [17:24] <DanC> "For example, HTML5 allows the form feed character." oh my. that's gonna cost us a round with the I18N WG. Is that really worthwhile?
  250. # [17:24] <gsnedders> DanC: if we allowed that, we'd need similar algorithms to sniff the meta element as we have in HTML. is the XML declaration not enough?
  251. # [17:24] <gsnedders> DanC: (meta element for charset)
  252. # [17:25] <DanC> I'm not really interested in having two separate HTML languages, gsnedders
  253. # [17:25] <hsivonen> DanC: I've argued that we shouldn't allow Form Feed, but what problem does banning it solve (except XML round trippability)
  254. # [17:26] <gsnedders> DanC: so you'd rather totally drop |meta| for charset?
  255. # [17:26] <hsivonen> DanC: and why should it be non-conforming to grab an RFC file and put it in <pre>?
  256. # [17:26] <DanC> I don't know what's wrong with form feed, hsivonen , but I know I have to ask the I18N WG.
  257. # [17:27] <DanC> I wish we could put _some_ bounds on the design space for HTML 5. but no, we seem to be opening every single can of worms, bar none.
  258. # [17:27] * DanC looks up the decision to bar ff from XML...
  259. # [17:28] <DanC> XML decision record: http://www.w3.org/XML/9712-reports.html
  260. # [17:28] <DanC> hmm... "form feed" doesn't occur.
  261. # [17:29] <DanC> I was hoping to never think about such things again: "Decision: When an XML processor encounters any of the character sequences CR (UTF-16 x000D), LF (UTF-16 x000A), or CR LF (UTF-16 x000D x000A), the processor must pass a single LF character to the downstream application."
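
Editor's sketch of the line-end rule quoted above, in which CR, LF, and CR LF are each passed on as a single LF. `normalize_line_ends` is a hypothetical helper name, not part of any XML library, and the sketch covers only the rule as quoted here.

```python
def normalize_line_ends(text):
    """Turn CR LF pairs and lone CR characters into a single LF."""
    # Replace the two-character CR LF sequence first; otherwise the CR
    # and the LF would each produce their own LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    print(repr(normalize_line_ends("a\r\nb\rc\nd")))
```

The form feed character (U+000C) is untouched by this rule, which matches DanC's observation that the decision record does not mention it.
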
  262. # [17:30] <DanC> x000C doesn't occur.
  263. # [17:30] <hsivonen> DanC: XML has an interesting loophole that allows escaped carriage returns to make their way into the infoset
  264. # [17:30] <DanC> it doesn't seem to be an explicit decision of the XML WG
  265. # [17:30] * Joins: kingryan (rking3@208.66.64.47)
  266. # [17:31] <DanC> XML doesn't allow newlines in attribute values, even escaped. I saw a very angry blog article about that, from a guy trying to use <input type="hidden" />.
  267. # [17:32] <hsivonen> DanC: yeah, that's one of the reasons why I'm not very fond of the XML spec writers guessing what the reasonable limits on the use of particular characters are
  268. # [17:32] <DanC> part of me would rather drop meta for charset, gsnedders ; it's an ugly hack. but it's now ubiquitously deployed and hence our responsibility to put it in the spec.
  269. # [17:32] <hsivonen> DanC: however, I thought escaped line feeds survived in attribute values
  270. # [17:32] <hsivonen> DanC: meta charset is *not* ubiquitous in application/xhtml+xml consumers
  271. # [17:32] <gsnedders> DanC: it needs to be in the parsing section, yes; its requirement in the conformance section is slightly more questionable
  272. # [17:33] <hsivonen> DanC: which is why we don't have a legacy pressure to allow it
  273. # [17:33] * DanC doesn't care too much about the spec for application/xhtml+xml until he sees a viable deployment path for it
  274. # [17:34] <gsnedders> also, to support it in XHTML you couldn't use a verbatim XML parser
  275. # [17:35] <DanC> uncle. I don't care where charset is allowed. Clearly I'm not going to like any of the deployable designs.
  276. # [17:37] <DanC> hmm... if the XML decision record doesn't have something from the I18N WG on FF, maybe I don't need to notify them. let's check http://www.w3.org/TR/charmod/ ...
  277. # [17:38] <DanC> 000c and "form feed" and "formfeed" don't occur there either. whew.
  278. # [17:38] * Lachy_ is now known as Lachy
  279. # [17:39] * DanC checks for "whitespace" and "control character"...
  280. # [17:39] <DanC> wow... no "whitespace"
  281. # [17:39] <hsivonen> DanC: fwiw, HTML 5 has to violate some of the requirements charmod places on specs in order to be compatible with the Web
  282. # [17:39] <DanC> I wish you hadn't said that; now I have to ask you which requirements and get I18N WG review
  283. # [17:40] <hsivonen> Validator.nu checks for most charmod requirements except the ones that would seriously devalue errors
  284. # [17:41] * DanC follows a pointer to http://www.w3.org/TR/unicode-xml/ ...
  285. # [17:41] <DanC> for example?
  286. # [17:41] <DanC> "[HTML4.01] adds to these the form feed character (U+000C), but that character cannot be used in any XHTML version."
  287. # [17:42] <hsivonen> I'm trying to find the charmod violations. just a moment
  288. # [17:42] <DanC> -- section 7. White Space http://www.w3.org/TR/unicode-xml/#White
  289. # [17:42] <hsivonen> haha. XML violates charmod C070
  290. # [17:43] <hsivonen> Validator.nu does not check for C049
  291. # [17:43] <DanC> are you sure? I think there's a recorded rationale for each excluded character.
  292. # [17:44] <gsnedders> including form-feed? :P
  293. # [17:44] <hsivonen> DanC: well, from the point of view of using XML, it sure feels rather arbitrary at times
  294. # [17:44] <DanC> yes, including form feed. I'll be surprised if I don't eventually find a reason why it was excluded from XML
  295. # [17:44] <gsnedders> I haven't found any looking around either, in any place I'd expect it.
  296. # [17:45] <gsnedders> Hopefully don't need to look deep into mailing lists
  297. # [17:45] <hsivonen> DanC: HTML5 violates charmod C027 (and has to do so in order to be useful)
  298. # [17:45] <DanC> "C027 [S] Specifications that require a default encoding MUST define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them."
  299. # [17:45] <DanC> what's the HTML5 default?
  300. # [17:45] <gsnedders> Windows-1252
  301. # [17:45] * DanC blinks, dumbfounded
  302. # [17:45] <gsnedders> needed for compat
  303. # [17:46] <hsivonen> C040 is not machine-testable
  304. # [17:46] <gsnedders> also need to treat any claim of ISO-8859-1 as Windows-1252 for compat
  305. # [17:47] <hsivonen> C045 SHOULD part is not honored (and rightly so)
  306. # [17:47] * Quits: billmason (billmason@69.30.57.156) (Quit: .)
  307. # [17:47] <gsnedders> hsivonen: the SHOULD?
  308. # [17:47] <hsivonen> gsnedders: hex over decimal
  309. # [17:47] <gsnedders> hsivonen: yeah
  310. # [17:47] <DanC> "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more." -- 8.2.2.2. Character encoding requirements http://www.w3.org/html/wg/html5/ . wow. Maybe I don't want to chair this WG after all. I don't think I can take that as seriously as it evidently merits.
  311. # [17:48] <DanC> what spec does HTML5 cite for the definition of Windows-1252?
  312. # [17:48] <hsivonen> DanC: the spec doesn't cite any normative references properly yet
  313. # [17:48] <hsivonen> DanC: and the req for Windows-1252 is very serious indeed
  314. # [17:49] <gsnedders> DanC: the vast majority of the web is UTF-8 or Windows-1252
  315. # [17:49] <DanC> is that the default in firefox/gecko and safari/webkit? opera?
  316. # [17:49] <hsivonen> Validator.nu does not check charmod C047 as it is not well-defined
  317. # [17:49] <hsivonen> DanC: yes
  318. # [17:49] * DanC whimpers
  319. # [17:49] <gsnedders> DanC: so much breaks if you don't
  320. # [17:49] <ChrisWilson> Oh really Dan, don't be so surprised
  321. # [17:49] <ChrisWilson> :)
  322. # [17:50] <hsivonen> Validator.nu does not check for C048 because doing so would seriously devalue errors
  323. # [17:50] <gsnedders> from the #whatwg /topic: "Please leave your sense of logic at the door, thanks!"
  324. # [17:50] <gavin_> I'm pretty sure the Firefox default is not Windows-1252
  325. # [17:50] <DanC> I can see that I shouldn't be surprised, but... well... I am.
  326. # [17:50] <ChrisWilson> Everyone supports overlapping bold and italic tags too.
  327. # [17:50] <ChrisWilson> why is Windows 1252 not logical?
  328. # [17:50] <hsivonen> s/doing so/supporting it/
  329. # [17:51] <hsivonen> gavin_: It has to be Windows-1252 to be Web-compatible.
  330. # [17:51] <gavin_> it varies per-locale, but the default for en-US is ISO-8859-1 cross-platform, I believe
  331. # [17:51] <hsivonen> gavin_: but the user can change it
  332. # [17:51] <DanC> I don't remember reading a spec for Windows-1252; I'm totally unfamiliar with it.
  333. # [17:51] <hsivonen> gavin_: ISO-8859-1 in Gecko means Windows-1252
  334. # [17:51] <hsivonen> gavin_: (that's in the spec, too)
  335. # [17:51] <gavin_> hsivonen: ah, ok
  336. # [17:51] <gsnedders> DanC: just reassigns the control characters within 0x80-0xFF to actual printable characters from ISO-8859-1
  337. # [17:52] <DanC> it's registered. http://www.iana.org/assignments/character-sets -> http://www.iana.org/assignments/charset-reg/windows-1252
  338. # [17:52] <hsivonen> DanC: oh yeah, in HTML, ISO-8859-1 has to be treated as an alias for Windows-1252. that's a Support Existing Content requirement
  339. # [17:52] <hsivonen> DanC: that probably violates the letter of charmod
  340. # [17:53] <DanC> "Support Existing Content" is a principle, not a stop-thinking requirement.
  341. # [17:53] <gsnedders> ChrisWilson: I'll probably finally write you an email about parsing of HTTP responses this week, BTW
  342. # [17:53] <gavin_> I didn't realize Windows-1252 was a superset of ISO 8859-1
  343. # [17:53] <DanC> but clearly we should have test cases for treating ISO-8859-1 as Windows-1252
  344. # [17:54] <hsivonen> DanC: well, thinking leads to treating ISO-8859-1 as an alias for Windows-1252 for the purpose of consuming text/html
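A minimal sketch of the aliasing described above, using Python's standard codecs. The helper name `effective_encoding`, the one-entry override table, and the sample bytes are assumptions made for illustration; they are not taken from html5lib or from any browser.

```python
# Labels that a text/html consumer decodes as Windows-1252, per the discussion.
# Illustrative subset only, not a complete alias list.
LEGACY_OVERRIDES = {"iso-8859-1": "windows-1252"}


def effective_encoding(label: str) -> str:
    """Return the encoding used to decode text/html for a declared label (hypothetical helper)."""
    normalized = label.strip().lower()
    return LEGACY_OVERRIDES.get(normalized, normalized)


# 0x80, 0x93 and 0x94 are C1 controls in ISO-8859-1 but printable in Windows-1252;
# 0xE9 (e with acute accent) is the same character in both encodings.
raw = bytes([0x80, 0x93, 0x94, 0xE9])

# Decoding by the letter of the declared label keeps the C1 controls.
strict_latin1 = raw.decode("iso-8859-1")

# Decoding the way the log says Web-compatible consumers must.
web_compatible = raw.decode(effective_encoding("ISO-8859-1"))
```

Here `strict_latin1` keeps U+0080, U+0093 and U+0094 as control characters, while `web_compatible` yields the euro sign and curly quotes.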
  345. # [17:54] <hsivonen> DanC: html5lib has tests
  346. # [17:54] <gsnedders> DanC: if HTML5 deviates from what is needed for the real world, implementers, myself included, will simply leave the WG.
  347. # [17:54] <DanC> wanna help me find which html5lib test, hsivonen ?
  348. # [17:54] <hsivonen> DanC: sure
  349. # [17:55] <DanC> yes, gsnedders , I'm after a spec to match real-world deployment too. sigh.
  350. # [17:55] <gsnedders> which means it really does need to be a stop-thinking-and-do-something-illogical requirement :(
  351. # [17:55] <hsivonen> DanC: testdata/encoding/tests1.dat second test
  352. # [17:55] <gsnedders> (though what is conforming in a document is far more open for discussion)
  353. # [17:56] <DanC> well, we only stop thinking after we've done some measurement. evidently the measurement here is done and I'm late to the party.
  354. # [17:57] <gsnedders> most of the measurement has been done in UA development over the years, in all seriousness
  355. # [17:57] <DanC> quite.
  356. # [17:57] <DanC> now we just need to collect that into a test suite
  357. # [17:58] <gsnedders> does anyone know if Apple ever shipped a Safari release with SGML comment parsing?
  358. # [17:58] <hsivonen> DanC: http://hsivonen.iki.fi/test/iso8859/ contains measurement demos
  359. # [17:59] * Joins: aroben (aroben@17.255.98.208)
  360. # [17:59] <DanC> hsivonen, which row in http://hsivonen.iki.fi/test/iso8859/ISO-8859-1.htm tells me my browser is treating the page as 1252 rather than 8859-1?
  361. # [18:00] <hsivonen> DanC: every row that has a printable character has a matching rendering in the Byte column and in the Windows-1252 NCR column
  362. # [18:01] * Joins: aroben_ (aroben@17.203.12.72)
  363. # [18:01] <Philip> All three columns look identical to me
  364. # [18:02] <hsivonen> DanC: the fact that the ISO-8859-1 NCR and the Windows-1252 NCR columns match shows that NCRs pointing to C1 controls have to be treated as Windows-1252 code point references
  365. # [18:02] <Philip> presumably since the Windows-1252 mapping is applied to NCRs too
  366. # [18:02] <Philip> (Oh, what you said)
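The NCR behaviour discussed above can be observed with Python's standard `html.unescape`, which is documented as following the HTML 5 rules for character references. This is a sketch of the mapping only; the sample strings are assumptions and it says nothing about how any particular browser is implemented.

```python
import html

# A numeric character reference to a C1 control code point is resolved
# to the character Windows-1252 assigns to that byte value.
euro = html.unescape("&#x80;")

# Decimal references behave the same way: 147 and 148 are 0x93 and 0x94.
quoted = html.unescape("&#147;hi&#148;")

# The resulting code point is U+20AC, not U+0080.
code_point = ord(euro)
```

Note that the source encoding plays no part here: the remapping applies to the reference itself.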
  367. # [18:03] <DanC> my tiny brain is not following. do all the rows support this line of reasoning, or just some of them? If just some, please nominate 1 for me to study.
  368. # [18:03] <hsivonen> DanC: IIRC, the Thai ISO encoding is also weird in the way that some C1 range points mean corresponding windows code points in deployed content
  369. # [18:04] * Quits: aroben (aroben@17.255.98.208) (Ping timeout)
  370. # [18:04] <hsivonen> DanC: rows 0x80 through 0xA0 are interesting
  371. # [18:04] <hsivonen> DanC: and of those, the ones that have printable characters (i.e. are assigned in Windows-1252) support the assertion
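A rough way to list which of those rows are assigned in Windows-1252 is to try decoding each byte with Python's `cp1252` codec. The helper name `assigned_in_cp1252` is hypothetical, and the sketch covers only 0x80 through 0x9F. It reflects Python's codec table, which leaves five positions undefined; other implementations may handle those positions differently.

```python
def assigned_in_cp1252(byte_value: int) -> bool:
    """Report whether Python's cp1252 codec maps this byte to a character."""
    try:
        bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


# The range that holds C1 controls in ISO-8859-1.
c1_range = range(0x80, 0xA0)

assigned = [b for b in c1_range if assigned_in_cp1252(b)]
unassigned = [b for b in c1_range if not assigned_in_cp1252(b)]

# Show each assigned row as "0xNN -> U+XXXX".
rows = [f"0x{b:02X} -> U+{ord(bytes([b]).decode('cp1252')):04X}" for b in assigned]
```

The first entry in `rows` is `0x80 -> U+20AC`, the euro sign row asked about in the log.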
  372. # [18:04] <DanC> ok, for 0x80, what would my browser do if it were treating the data as 8859-1 rather than 1252?
  373. # [18:05] <hsivonen> DanC: display the euro sign
  374. # [18:05] <hsivonen> oops
  375. # [18:05] <hsivonen> I misread the question
  376. # [18:05] <hsivonen> if the browser were treating the data as ISO-8859-1, it should *not* render a euro sign there
  377. # [18:05] <DanC> what should it do?
  378. # [18:06] <DanC> I guess show a little hex-in-box or something?
  379. # [18:06] <hsivonen> DanC: that's a good question. I don't know a definitive answer, but rendering a replacement character would be reasonable
  380. # [18:07] * Joins: Thezilch (fuz007@68.54.228.249)
  381. # [18:07] <hsivonen> DanC: as far as I can tell, the rendering of C1 controls is not well-defined in a CSS formatter
  382. # [18:07] <hsivonen> (or if it is, I missed the spec)
  383. # [18:07] <DanC> ok, so this is a case of browsers filling in where the specs said the author shouldn't do that.
  384. # [18:08] <hsivonen> DanC: this is a case of making use of the goodness of Windows-1252 where ISO was being unuseful
  385. # [18:08] * DanC sees 80 thru 9f are unused, per http://en.wikipedia.org/wiki/ISO/IEC_8859-1
  386. # [18:08] <hsivonen> DanC: now it is just a part of the legacy weirdness when UTF-8 could give us what ISO-8859-1 could not
  387. # [18:09] <DanC> wow... this is evidently common knowledge... "Many web browsers treat the MIME charset ISO-8859-1 as Windows-1252 " -- http://en.wikipedia.org/wiki/Windows-1252 . I've been under a rock for a long time.
  388. # [18:09] <hsivonen> :-)
  389. # [18:10] <DanC> is 1252 new-ish? the euro character isn't that old, is it?
  390. # [18:10] * DanC follows his nose to http://en.wikipedia.org/wiki/Euro_sign
  391. # [18:11] <DanC> 1996
  392. # [18:11] <hsivonen> DanC: the euro sign was retrofitted in MacRoman and Windows-1252
  393. # [18:11] <ChrisWilson> yup. Very quickly.
  394. # [18:11] <ChrisWilson> (Shipped as a patch)
  395. # [18:12] <hsivonen> Microsoft did a *much* better job there than Apple. the fallout from the Apple quick fix still continues to suck
  396. # [18:13] <hsivonen> (in font design that is)
  397. # [18:13] <Philip> data:text/html;charset=utf-8,%3Cbody%3E%26%23x80%3B%3Cscript%3Ealert(document.body.innerHTML.charCodeAt(0))%3C%2Fscript%3E is an example of an interesting result despite not using iso-8859-1
  398. # [18:14] <hsivonen> the pre-euro Windows encoding has a distinct IANA name in theory, but virtually no one uses the pre-euro name
  399. # [18:14] * Joins: Sander (svl@86.87.68.167)
  400. # [18:21] <DanC> # HTML 5 defaults to Windows-1252, where charmod requires UTF-8/UTF-16 Dan Connolly (Monday, 29 October) http://lists.w3.org/Archives/Public/www-archive/2007Oct/0059.html
  401. # [18:22] * DanC does his duty and invites I18N review. :-/
  402. # [18:25] <Philip> (http://lists.w3.org/Archives/Public/www-archive/2007Oct/0058.html looks fun)
  403. # [18:29] <hsivonen> Philip: I had hoped XML 1.1 would just go away and be forgotten :-(
  404. # [18:33] <Philip> Are they proposing something other than just renaming XML 1.1 to XML 1.0?
  405. # [18:35] <hsivonen> Philip: dunno exactly, but they seem to suggest breaking consistency between various parsers in order to make things politically correct so that people can make parochial markup languages
  406. # [18:36] <hsivonen> this won't help anyone, of course, because the legacy would still cast enough uncertainty upon e.g. Khmer element names that Cambodian markup language designers would still be better off not using Khmer element names
  407. # [18:40] <hsivonen> besides, if XML Core is now willing to change what version='1.0' means, we might as well go directly to XML5 parsing
  408. # [18:44] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  409. # [18:49] * Joins: gavin (gavin@99.227.30.12)
  410. # [18:49] <anne> DanC, that you're surprised is annoying, as it means the W3C is quite out of touch with reality
  411. # [18:50] <anne> hmm, maybe I shouldn't generalize so much, but I do get that feeling
  412. # [18:51] <ChrisWilson> That does seem like a large generalization based on a lack of knowledge on one specific item.
  413. # [18:51] <ChrisWilson> (I meant, Dan's lack of knowledge)
  414. # [18:51] <DanC> only one of the co-chairs was surprised; maybe that helps, anne?
  415. # [18:52] <DanC> And Richard Ishida, who is the W3C team member who is supposed to know about this stuff, doesn't seem to be surprised nor see a problem.
  416. # [18:53] <anne> DanC, the other co-chair works for a browser vendor
  417. # [18:53] <DanC> right; W3C has various checks and balances
  418. # [18:54] <anne> hsivonen, XML Core seems to be heading in the right direction anyway, seems like a good thing :)
  419. # [20:02] * Quits: ChrisWilson (cwilso@131.107.0.105) (Ping timeout)
  420. # [20:23] * Joins: ChrisWilson (cwilso@131.107.0.102)
  421. # [20:27] * Joins: mjs (mjs@64.81.48.145)
  422. # [22:28] * Disconnected
  423. # [22:28] * Attempting to rejoin channel #html-wg
  424. # [22:28] * Rejoined channel #html-wg
  425. # [22:28] * Topic is 'next HTML WG telcon 25 Oct 2300Z http://www.w3.org/html/wg/ (more logs: http://krijnhoetmer.nl/irc-logs/ )'
  426. # [22:28] * Set by DanC on Mon Oct 22 15:50:08
  427. # [22:47] * Joins: mjs (mjs@17.255.106.186)
  428. # [22:58] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
  429. # [23:03] * Joins: gavin (gavin@99.227.30.12)
  430. # [23:48] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  431. # Session Close: Tue Oct 30 00:00:00 2007
