- # Session Start: Mon Oct 29 00:00:00 2007
- # Session Ident: #html-wg
- # [01:18] * Disconnected
- # [01:18] * Attempting to rejoin channel #html-wg
- # [01:18] * Rejoined channel #html-wg
- # [01:18] * Topic is 'next HTML WG telcon 25 Oct 2300Z http://www.w3.org/html/wg/ (more logs: http://krijnhoetmer.nl/irc-logs/ )'
- # [01:18] * Set by DanC on Mon Oct 22 15:50:08
- # [01:20] * Quits: aroben (aroben@67.160.250.192) (Connection reset by peer)
- # [01:49] * Quits: marcos (chatzilla@131.181.148.226) (Connection reset by peer)
- # [02:36] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [02:41] * Joins: gavin (gavin@99.227.30.12)
- # [03:03] * Quits: deltab (deltab@82.36.30.34) (Client exited)
- # [03:03] * Joins: deltab (deltab@82.36.30.34)
- # [03:16] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [03:22] * Joins: mjs (mjs@64.81.48.145)
- # [04:03] * Joins: shepazu (schepers@128.30.52.30)
- # [04:25] * Quits: shepazu (schepers@128.30.52.30) (Client exited)
- # [04:43] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [04:48] * Joins: gavin (gavin@99.227.30.12)
- # [04:49] * Joins: aroben (adamroben@67.160.250.192)
- # [05:00] * Quits: aroben (adamroben@67.160.250.192) (Quit: aroben)
- # [05:03] * Joins: aroben (aroben@67.160.250.192)
- # [05:37] * Joins: aroben_ (aroben@67.160.250.192)
- # [05:38] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
- # [05:49] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [06:16] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [06:21] * Quits: aroben_ (aroben@67.160.250.192) (Quit: Leaving)
- # [06:50] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [06:55] * Joins: gavin (gavin@99.227.30.12)
- # [07:01] * Joins: mjs (mjs@64.81.48.145)
- # [07:19] * Joins: aroben (aroben@67.160.250.192)
- # [07:40] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [07:49] * Joins: mjs (mjs@64.81.48.145)
- # [07:54] * Quits: dbaron (dbaron@71.204.145.103) (Quit: 8403864 bytes have been tenured, next gc will be global.)
- # [08:00] * Joins: Sander (svl@86.87.68.167)
- # [08:49] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [08:56] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Client exited)
- # [08:58] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [09:01] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
- # [09:03] * Joins: gavin (gavin@99.227.30.12)
- # [09:03] * Quits: sbuluf (olgkp@200.49.140.188) (Ping timeout)
- # [09:08] * Joins: tH_ (Rob@87.102.47.210)
- # [09:08] * tH_ is now known as tH
- # [09:26] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
- # [10:07] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [10:08] * Quits: Lachy (Lachy@213.236.208.22) (Quit: Leaving)
- # [10:12] * Joins: Lachy (Lachy@213.236.208.22)
- # [10:16] * Joins: tH_ (Rob@87.102.45.182)
- # [10:17] * Quits: tH (Rob@87.102.47.210) (Ping timeout)
- # [10:17] * tH_ is now known as tH
- # [10:18] * Joins: mjs (mjs@64.81.48.145)
- # [10:28] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [10:36] * Joins: mjs (mjs@64.81.48.145)
- # [10:59] * Quits: xover (xover@193.157.66.5) (Quit: Leaving)
- # [11:05] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [11:10] * Joins: gavin (gavin@99.227.30.12)
- # [11:15] * Joins: olivier (ot@128.30.52.30)
- # [11:17] * Joins: myakura (myakura@210.227.200.92)
- # [11:17] * Joins: ROBOd (robod@89.122.216.38)
- # [11:44] * Quits: olivier (ot@128.30.52.30) (Ping timeout)
- # [12:20] * Quits: Lachy (Lachy@213.236.208.22) (Quit: Leaving)
- # [12:32] * Joins: Lachy (Lachy@213.236.208.22)
- # [12:58] * Quits: mjs (mjs@64.81.48.145) (Quit: mjs)
- # [13:05] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
- # [13:12] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [13:17] * Joins: gavin (gavin@99.227.30.12)
- # [13:29] * Joins: olivier (ot@128.30.52.30)
- # [13:49] * Joins: matt (matt@128.30.52.30)
- # [14:07] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [14:10] <anne> hsivonen, "Required attributes missing on element img from namespace http://www.w3.org/1999/xhtml" is not that friendly
- # [14:10] <anne> maybe say upfront that you're validating "HTML" and leave all the namespace crap out of it?
- # [14:11] <anne> or call it the "HTML <img> element"
- # [14:11] <anne> the suggestions are nice btw
- # [14:13] <anne> It would be nice if the W3C coordinated with you as your validator seems to be improving more quickly than theirs
- # [14:17] * Joins: Lachy_ (Lachy@213.236.208.22)
- # [14:17] * Quits: Lachy (Lachy@213.236.208.22) (Connection reset by peer)
- # [14:25] * Joins: karl (karlcow@128.30.52.30)
- # [14:28] <hsivonen> anne: OK. I'll make the UI rendering of names from well-known namespaces nicer
- # [14:28] <hsivonen> anne: as for telling which attributes are missing, I'm waiting for upstream to fix that one
- # [14:30] <hsivonen> anne: leaving the namespace "crap" completely out would be problematic in cases of XML validation and bad ns declarations and with compound documents
- # [14:30] <hsivonen> anne: I intend to enable XHTML5+SVG 1.1 in due course
- # [14:31] <anne> well, my complete suggestion would be to leave in namespaces for XML, but special case all forms of HTML, XHTML, and probably SVG, MathML, XBL and combinations of those
- # [14:32] <hsivonen> anne: that's doable
- # [14:51] * Quits: myakura (myakura@210.227.200.92) (Ping timeout)
- # [14:59] * Joins: myakura (myakura@210.227.200.92)
- # [15:04] * Joins: ROBOd (robod@89.122.216.38)
- # [15:11] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Client exited)
- # [15:18] <hsivonen> anne: what's your take on xml:lang, xml:base, etc.: should I say attribute xml:lang or XML attribute lang?
- # [15:18] <hsivonen> I'd go with xml:lang.
- # [15:19] <Dashiva> I'd say they're more familiar as xml:lang etc
- # [15:19] <hsivonen> yeah.
- # [15:19] <hsivonen> special cases
- # [15:19] <hsivonen> yay
- # [15:19] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [15:24] * Joins: gavin (gavin@99.227.30.12)
- # [15:26] <DanC> hsivonen, do you want me to add you to the list of people with write access to tracker? http://www.w3.org/html/wg/tracker/ Are you interested to chat with James and Gregory and Julian and/or me and Chris every once in a while about it?
- # [15:27] <DanC> I think you're already chatting with us pretty regularly
- # [15:29] <hsivonen> DanC: yes, tracker access would be nice, but I cannot commit to doing regular issue triage work
- # [15:31] <DanC> I'm inclined to only give write access to people who are willing to commit, at least for a while.
- # [15:33] <hsivonen> DanC: ok
- # [15:34] <hsivonen> anne: I'm considering setting up a logger to catch namespace URIs for which I forgot to create human-readable UI strings
- # [15:37] * Joins: billmason (billmason@69.30.57.156)
- # [15:39] * Quits: aaronlev (chatzilla@66.31.86.217) (Ping timeout)
- # [15:45] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [15:48] <anne> hsivonen, xml:lang, yeah
- # [15:48] <anne> hsivonen, if it's not too complicated...
- # [15:49] <anne> although I wouldn't bother for XSLT, Atom, etc. I think
- # [15:49] <anne> the XSLT audience prolly likes the namespace to be there and the Atom audience should really be visiting feedvalidator.org
- # [16:16] <anne> I wonder if "the handful" will now get a ton of requests from people to add their issue to the tracker page...
- # [16:17] <DanC> we'll see. either way seems OK
- # [16:20] <DanC> re "validator seems to be improving more quickly than theirs", any particular coordination you think would help, Anne? I have a TAG action to work with TimBL and Olivier on validation and extensibility
- # [16:20] <anne> coordination with hsivonen, I suppose
- # [16:20] <DanC> what sort of coordination?
- # [16:21] * olivier hopes to see henri at the TPAC
- # [16:21] <anne> maybe replacing the W3C validator with his over time, dunno
- # [16:21] <DanC> I get "Attribute profile not allowed on element head from namespace http://www.w3.org/1999/xhtml at this point.". :-/
- # [16:21] <DanC> not a feature, IMO
- # [16:22] <anne> that's not a problem with the validator DanC and I'm not sure why you bring that up in a discussion about it...
- # [16:22] * Quits: myakura (myakura@210.227.200.92) (Quit: Leaving...)
- # [16:22] <anne> unless of course you're validating XHTML 1.0 or HTML4
- # [16:22] <anne> in which case this may be a bug
- # [16:23] <DanC> I bring it up because I was spot-checking your claim that html5.validator.nu is improving faster than validator.w3.org ; in my opinion, that's not an improvement
- # [16:23] <anne> although otoh, it has been suggested that the validator shouldn't really validate against versioned formats either, and evolve with "browser engines" and "deployed content"
- # [16:24] <anne> DanC, well, it's correct per HTML 5, it seems wrong to attack the validator for that
- # [16:25] <DanC> I'm not attacking; just observing. I made a request to change the HTML 5 draft; it's perfectly within Henri's power to modify his code in advance of changes to the spec
- # [16:25] <anne> whatever
- # [16:26] <DanC> anne, that's out of line.
- # [16:28] <anne> well, you draw the discussion away from validators to <head profile>, you assume the spec will change
- # [16:28] <DanC> I don't assume; I just play my part in the WG
- # [16:29] <anne> s/will change/will change with respect to <head profile>/
- # [16:30] <DanC> no, I don't assume that either; but i have made a request, and it is within henri's control to accept that request
- # [16:30] <DanC> the validator is an important part of the feedback loop between authors and spec developers.
- # [16:31] <DanC> since I think head/@profile is worth keeping, I'm not inclined to take a validator that discourages it and deploy it more widely
- # [16:31] <anne> if Henri makes <head profile> conforming HTML5 that feedback will go away
- # [16:32] <DanC> right; in this case, the spec should change, not the document. (IMO)
- # [16:32] <anne> and my point wasn't so much about details like that, but more that the architecture of Henri's validator seems better as it isn't based on SGML anymore and also uses a proper XML parser for XHTML
- # [16:33] * olivier wishes there can be coordination other than "replace the W3C validator with (henri's)". It would be nicer to have people work together on some common tool. Having separate developers build separate tools is good, asking that one be trashed for another, not constructive IMHO
- # [16:33] <olivier> that's why I hope to see henri, would like to meet Jirka, etc
- # [16:33] <DanC> "trashed" is your word, not his, Olivier
- # [16:33] <olivier> indeed
- # [16:33] <olivier> I didn't claim it was
- # [16:34] <DanC> right; you're adding heat by that choice of words.
- # [16:34] * Joins: aroben (aroben@67.160.250.192)
- # [16:34] <DanC> if what you really want is more coordination, you do well to choose other words
- # [16:35] <DanC> using a conforming XML parser seems long overdue
- # [16:35] <anne> (I didn't really see that clearly, the HTML part is based on the HTML parser in the HTML5 spec with some provisions for HTML4 as HTML5 is not yet a standard.)
- # [16:35] <olivier> Dan, the truth is that it is very much the mindset. everyone thinks their own tool has something special, and would rather have the others replaced than work on getting tools working together
- # [16:36] <anne> I'm not involved in Henri's validator in any way olivier and I was the one suggesting it, not hsivonen
- # [16:36] <olivier> that's why I like the idea of working with Henri and Jirka on a common output
- # [16:36] <anne> to be clear
- # [16:36] <DanC> ok, but such generalities don't get us closer to the goal. What else can you tell us about Henri's architecture, anne? (I'm also looking for pointers)
- # [16:36] <olivier> but that has not seen much progress
- # [16:36] <anne> I do provide IRC-level feedback to hsivonen now and then
- # [16:37] <olivier> (although there were threads in that direction lately, I haven't managed to grab henri's attention on it yet)
- # [16:37] <anne> DanC, http://about.validator.nu/ maybe?
- # [16:37] <anne> DanC, I don't consider that a generality btw, it seems pretty fundamental to me if you want to provide feedback on syntax
- # [16:38] <olivier> danc, sure. Let's get out of generalities: how do we merge capabilities of different engines built in different languages
- # [16:38] <DanC> by "generalities" I meant "everyone thinks their own tool has something special"
- # [16:39] <DanC> which capabilities are you most interested in, olivier/
- # [16:39] <DanC> ?
- # [16:39] <DanC> the main thing I get from http://about.validator.nu/ is RELAX-NG. I find that fairly desirable, but only as a means to an end; I haven't seen a RELAX-NG service with lots of work on user-friendly diagnostics.
- # [16:40] <DanC> what about nuts-and-bolts software architecture? I thought validator.nu was java, but the build seems to be python.
- # [16:40] <olivier> I think for HTML <= 4.01 nothing beats the flexibility and usefulness of opensp (C, perl) but for XML based languages relax ng, schematron and nrl would be great. as for html5, if the group really goes forward without using schemas, then plugging a parser that groks the html5 parsing algo with something that has good UI and explanations
- # [16:41] <olivier> I thought validator.nu was java indeed, but perhaps it was switched to libhtml5 (which is python)?
- # [16:42] <DanC> I think it's highly unlikely that a schema will ever be an integral part of the HTML 5 spec. I think treating it as an implementation is a reasonable approach (though probably not what I'd do if I were editing the spec).
- # [16:42] <olivier> right
- # [16:42] <anne> it's Java
- # [16:43] <olivier> so that's at least 3 very different parsing/checking methods
- # [16:43] <olivier> not to mention xml schema support, for e.g voicexml
- # [16:43] <anne> another aspect is that it has a custom checker for lots of attribute values, a table checker
- # [16:43] <anne> so it checks for HTML tables if the table is marked up correctly
- # [16:43] <olivier> anne: do you know if it can check improper deep nesting?
- # [16:44] <olivier> e.g <form><p><form> ?
- # [16:44] <anne> yeah, it does that
- # [16:44] <olivier> very cool
- # [16:44] <olivier> that's hard to do with a schema
- # [16:44] <anne> it's based on a mix of RelaxNG, Schematron, and custom code
- # [16:44] * anne saw a presentation during XTech 2006 about it
- # [16:45] <DanC> I'm inclined to give HTML-specific stuff like table checking priority over general-purpose stuff like XML Schema and voiceXML. I think validator.w3.org should focus on the bulk of web content; if we need a separate special-purpose checker for VoiceXML, that seems OK to me
- # [16:45] <anne> it's really quite neat, it basically checks everything you can possibly check without requiring something that's turing complete
- # [16:45] <olivier> danc: yeah
- # [16:45] <DanC> schematron is turing complete
- # [16:45] <Philip> anne: By "Turing complete", do you mean "AI complete"?
- # [16:45] <olivier> but relaxng and nrl for svg seems important
- # [16:46] <anne> Philip, I suppose
- # [16:46] <DanC> nrl... is that different from NVDL?
- # [16:46] <olivier> old name of it, sorry
- # [16:46] <olivier> but same thing
- # [16:46] <DanC> ok
- # [16:46] <Philip> anne: There's quite a difference between them :-)
- # [16:46] * DanC hunts for the source for the table checker; seems to remember it's in Java
- # [16:47] <DanC> http://hsivonen.iki.fi/table-integrity-checker/ ...
- # [16:47] * Joins: hasather (hasather@90.231.107.133)
- # [16:48] <DanC> irony! from sgmllib import SGMLParser -- http://svn.versiondude.net/whattf/build/trunk/build.py
- # [16:48] <anne> Philip, oh, wait, I think I just meant turing complete, but you're right
- # [16:49] * anne wasn't really paying much attention to what Philip said, oops
- # [16:49] <DanC> wild... there's a pile of java stuff, and the python code is for fetching it from hither and yon
- # [16:50] <Philip> anne: Hmm, now I have no idea what you meant :-p
- # [16:50] <DanC> hmm... return code isn't checked; os.system(cmd)
- # [16:51] <DanC> that's in runCmd; in execCmd, the return code _is_ checked. odd.
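The difference DanC points out (a discarded `os.system(cmd)` result versus a checked one) can be sketched as follows. This is a minimal illustration, not the code from build.py; the helper name `run_checked` is hypothetical.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command; raise CalledProcessError if it exits non-zero.

    Hypothetical helper, not taken from build.py. A bare os.system(cmd)
    call whose return value is ignored would let a failed step pass
    silently; check=True makes the failure visible to the caller.
    """
    completed = subprocess.run(args, check=True)
    return completed.returncode


# A command that succeeds returns exit status 0.
status = run_checked([sys.executable, "-c", "pass"])
```

A build script that fetches many dependencies in sequence would presumably want this behaviour for every step, so that a failed download stops the build instead of surfacing later as a confusing compile error.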
- # [16:53] * Quits: aroben (aroben@67.160.250.192) (Ping timeout)
- # [16:53] * Joins: ChrisWilson (cwilso@131.107.0.105)
- # [16:53] * Quits: ChrisWilson2 (cwilso@131.107.0.73) (Ping timeout)
- # [16:54] <anne> Philip, sorry, turing complete is needed for ECMAScript checking and such but AI complete is needed for semantics; I meant turing complete, but AI complete would indeed be better
- # [16:56] <Philip> anne: Ah, right
- # [16:57] <DanC> still looking for table checker source; doesn't seem to be in http://svn.versiondude.net/whattf/validator/trunk/src/nu/
- # [16:57] <Philip> but Turing completeness still doesn't let you check ECMAScript for some properties, because of the halting problem
- # [16:58] <DanC> wild... http://svn.versiondude.net/whattf/util/trunk/src/nu/validator/json/Serializer.java ... he wrote his own JSON serializer? or used one from MOZ and renamed it package nu.validator.json?
- # [16:59] <DanC> aha! http://svn.versiondude.net/whattf/syntax/trunk/non-schema/java/src/org/whattf/checker/table/
- # [17:00] <DanC> now... what's the interface between the table checker and the rest? (and how does it compare to the Unicorn interface?)
- # [17:01] <DanC> public final class TableChecker extends Checker {
- # [17:01] <DanC> import org.whattf.checker.Checker;
- # [17:01] <DanC> clearly a web browser is not the intended mechanism to browse this code ;-)
- # [17:02] <DanC> but it's not bad... http://svn.versiondude.net/whattf/syntax/trunk/non-schema/java/src/org/whattf/checker/Checker.java
- # [17:02] <DanC> * The abstract base class for SAX-based content checkers that listen to
- # [17:02] <DanC> * the <code>ContentHandler</code> events and emit errors and warnings to
- # [17:02] <DanC> * an <code>ErrorHandler</code>.
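The pattern described in that class comment, a checker that listens to SAX content events and reports problems, can be sketched in Python with the standard `xml.sax` module. This is only an illustration under stated assumptions: the class name `NestedFormChecker` and the `check` function are hypothetical, a plain list stands in for an `ErrorHandler`, and the rule checked is the nested `<form>` case olivier asked about earlier, not the table checker itself.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NestedFormChecker(ContentHandler):
    """Hypothetical checker: flags a form element nested inside a form."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0   # number of currently open form elements
        self.errors = []      # collected messages; stands in for an ErrorHandler

    def startElement(self, name, attrs):
        if name == "form":
            if self.form_depth > 0:
                self.errors.append("form element nested inside another form")
            self.form_depth += 1

    def endElement(self, name):
        if name == "form":
            self.form_depth -= 1


def check(markup):
    """Parse well-formed XML markup and return the list of error messages."""
    checker = NestedFormChecker()
    xml.sax.parseString(markup.encode("utf-8"), checker)
    return checker.errors
```

Because the checker only keeps a depth counter, it works on a stream of events and never needs the whole tree in memory, which is one plausible reason to prefer this style over a schema for deep-nesting rules.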
- # [17:03] <DanC> olivier, what's the closest thing in Unicorn? I suppose it's bytestream based; it's reasonably straightforward to stick an XML parser in there to turn a bytestream into a sequence of sax events
- # [17:05] <DanC> hmm... it's an abstract class, not a java interface. I gather people don't really use java interfaces as much as the Modula-3 crowd used interfaces.
- # [17:08] <hsivonen> anne: I think I'm going to do human-readable names for XSLT and Atom as well, although I agree that Feed Validator does Atom better so I'm not advertising Validator.nu for Atom purposes
- # [17:09] <hsivonen> DanC: back at XTech, there was some preliminary discussion about running a copy of the Validator.nu software (unnamed back then) in the w3.org space
- # [17:09] <hsivonen> DanC: I said that it would be good, but it should probably wait until I had a better parser
- # [17:09] <hsivonen> DanC: I now have a better parser
- # [17:10] <DanC> I ran across http://relaxed.sourceforge.net/ the other day; it's also pretty interesting.
- # [17:11] <hsivonen> DanC: Re: profile. if Validator.nu does something that you don't like and Validator.nu is merely following the HTML 5 draft, I think it isn't a Validator.nu bug per se
- # [17:11] <hsivonen> DanC: I do have some diffs from the draft, though, when it is too obvious that implementing the current spec language is not worthwhile
- # [17:11] <hsivonen> DanC: e.g. style='' and <font>
- # [17:12] <hsivonen> DanC: It is in my power to make the code disagree with the spec, but I'd rather minimize the gap between the code and the spec
- # [17:14] <hsivonen> olivier: Re: common tool: what's the situation with Unicorn these days?
- # [17:15] <DanC> hsivonen, do the relaxng and schematron bits extend Checker the way TableChecker does?
- # [17:15] <hsivonen> olivier: sorry about not following up on the output format in a timely manner. I'll follow up shortly.
- # [17:16] <hsivonen> DanC: the architecture is described in http://hsivonen.iki.fi/thesis/html5-conformance-checker
- # [17:17] <hsivonen> DanC: the short story is that the parsers are SAX and the higher layer is RELAX NG, custom RELAX NG datatype library, schematron and hand-rolled Java depending on the suitability of each for a given subproblem
- # [17:18] <hsivonen> DanC: only the build script is Python
- # [17:19] <hsivonen> olivier: Validator.nu uses a library that is supposed to do XSD as well, but I turned it off, because it crashes and because I haven't gotten around to reviewing it for security
- # [17:19] * DanC skips to chapter 5 ... http://hsivonen.iki.fi/thesis/html5-conformance-checker#implementation
- # [17:20] <hsivonen> DanC: I wasn't aware that Schematron was Turing complete. It seems that it isn't conveniently Turing-complete at least if you are sticking to XPath 1.0
- # [17:21] * Quits: matt (matt@128.30.52.30) (Quit: matt)
- # [17:21] <hsivonen> Validator.nu has latent (totally untested) NRL and NVDL capability
- # [17:22] <olivier> hsivonen: we're restarting development on it
- # [17:22] <hsivonen> DanC: I wrote my own JSON serializer. All the ones I found for Java were non-streaming and would have been harder to glue on.
- # [17:22] <olivier> (re: unicorn)
- # [17:22] <olivier> first implementation was interesting proof of concept but not flexible enough for real world usage
- # [17:23] <hsivonen> DanC: no, the RELAX NG and Schematron bits implement the Validator interface
- # [17:23] <DanC> "XHTML5 does not allow the character encoding to be declared using the meta element". wild. is that still the case?
- # [17:23] <hsivonen> DanC: the Checker stuff is also wrapped in an adapter that makes them look like Validator instances as well
- # [17:23] <olivier> hsivonen: no problem for the mails, I was just planning to chat with you at tpac if you didn't have time before that
- # [17:24] <hsivonen> DanC: yes, meta charset is bogus in application/xhtml+xml
- # [17:24] <DanC> "For example, HTML5 allows the form feed character." oh my. that's gonna cost us a round with the I18N WG. Is that really worthwhile?
- # [17:24] <gsnedders> DanC: if we allowed that, we'd need similar algorithms to sniff the meta element as we have in HTML. is the XML declaration not enough?
- # [17:24] <gsnedders> DanC: (meta element for charset)
- # [17:25] <DanC> I'm not really interested in having two separate HTML languages, gsnedders
- # [17:25] <hsivonen> DanC: I've argued that we shouldn't allow Form Feed, but what problem does banning it solve (except XML round trippability)
- # [17:26] <gsnedders> DanC: so you'd rather totally drop |meta| for charset?
- # [17:26] <hsivonen> DanC: and why should it be non-conforming to grab an RFC file and put it in <pre>?
- # [17:26] <DanC> I don't know what's wrong with form feed, hsivonen , but I know I have to ask the I18N WG.
- # [17:27] <DanC> I wish we could put _some_ bounds on the design space for HTML 5. but no, we seem to be opening every single can of worms, bar none.
- # [17:27] * DanC looks up the decision to bar ff from XML...
- # [17:28] <DanC> XML decision record: http://www.w3.org/XML/9712-reports.html
- # [17:28] <DanC> hmm... "form feed" doesn't occur.
- # [17:29] <DanC> I was hoping to never think about such things again: "Decision: When an XML processor encounters any of the character sequences CR (UTF-16 x000D), LF (UTF-16 x000A), or CR LF (UTF-16 x000D x000A), the processor must pass a single LF character to the downstream application."
- # [17:30] <DanC> x000C doesn't occur.
- # [17:30] <hsivonen> DanC: XML has an interesting loophole that allows escaped carriage returns to make their way into the infoset
- # [17:30] <DanC> it doesn't seem to be an explicit decision of the XML WG
- # [17:30] * Joins: kingryan (rking3@208.66.64.47)
- # [17:31] <DanC> XML doesn't allow newlines in attribute values, even escaped. I saw a very angry blog article about that, from a guy trying to use <input type="hidden" />.
- # [17:32] <hsivonen> DanC: yeah, that's one of the reasons why I'm not very fond of the XML spec writers guessing what the reasonable limits on the use of particular characters are
- # [17:32] <DanC> part of me would rather drop meta for charset, gsnedders ; it's an ugly hack. but it's now ubiquitously deployed and hence our responsibility to put it in the spec.
- # [17:32] <hsivonen> DanC: however, I thought escaped line feeds survived in attribute values
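The character-handling points in this exchange can be checked against a conforming XML parser. The sketch below uses Python's `xml.etree.ElementTree` (backed by expat) as one such parser; the variable and function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Literal CR, LF and CR LF in element content all reach the application as LF.
content = ET.fromstring("<a>x\r\ny\rz\nw</a>").text

# A carriage return written as a character reference is not normalized,
# so it does make its way into the parsed document.
escaped_cr = ET.fromstring("<a>x&#13;y</a>").text

# In an attribute value a literal line feed is normalized to a space,
# while a line feed written as &#10; survives.
literal_lf_attr = ET.fromstring('<a b="x\ny"/>').get("b")
escaped_lf_attr = ET.fromstring('<a b="x&#10;y"/>').get("b")


def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Form feed (U+000C) is outside the XML 1.0 Char production.
form_feed_ok = is_well_formed("<a>\x0c</a>")
```

So escaped line feeds do survive in attribute values, as hsivonen suspected; it is only the literal ones that are turned into spaces.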
- # [17:32] <hsivonen> DanC: meta charset is *not* ubiquitous in application/xhtml+xml consumers
- # [17:32] <gsnedders> DanC: it needs to be in the parsing section, yes; its requirement in the conformance section is slightly more questionable
- # [17:33] <hsivonen> DanC: which is why we don't have a legacy pressure to allow it
- # [17:33] * DanC doesn't care too much about the spec for application/xhtml+xml until he sees a viable deployment path for it
- # [17:34] <gsnedders> also, to support it in XHTML you couldn't use a verbatim XML parser
- # [17:35] <DanC> uncle. I don't care where charset is allowed. Clearly I'm not going to like any of the deployable designs.
- # [17:37] <DanC> hmm... if the XML decision record doesn't have something from the I18N WG on FF, maybe I don't need to notify them. let's check http://www.w3.org/TR/charmod/ ...
- # [17:38] <DanC> 000c and "form feed" and "formfeed" don't occur there either. whew.
- # [17:38] * Lachy_ is now known as Lachy
- # [17:39] * DanC checks for "whitespace" and "control character"...
- # [17:39] <DanC> wow... no "whitespace"
- # [17:39] <hsivonen> DanC: fwiw, HTML 5 has to violate some of the requirements charmod places on specs in order to be compatible with the Web
- # [17:39] <DanC> I wish you hadn't said that; now I have to ask you which requirements and get I18N WG review
- # [17:40] <hsivonen> Validator.nu checks for most charmod requirements except the ones that would seriously devalue errors
- # [17:41] * DanC follows a pointer to http://www.w3.org/TR/unicode-xml/ ...
- # [17:41] <DanC> for example?
- # [17:41] <DanC> "[HTML4.01] adds to these the form feed character (U+000C), but that character cannot be used in any XHTML version."
- # [17:42] <hsivonen> I'm trying to find the charmod violations. just a moment
- # [17:42] <DanC> -- section 7. White Space http://www.w3.org/TR/unicode-xml/#White
- # [17:42] <hsivonen> haha. XML violates charmod C070
- # [17:43] <hsivonen> Validator.nu does not check for C049
- # [17:43] <DanC> are you sure? I think there's a recorded rationale for each excluded character.
- # [17:44] <gsnedders> including form-feed? :P
- # [17:44] <hsivonen> DanC: well, from the point of view of using XML, it sure feels rather arbitrary at times
- # [17:44] <DanC> yes, including form feed. I'll be surprised if I don't eventually find a reason why it was excluded from XML
- # [17:44] <gsnedders> I haven't found any looking around either, in any place I'd expect it.
- # [17:45] <gsnedders> Hopefully don't need to look deep into mailing lists
- # [17:45] <hsivonen> DanC: HTML5 violates charmod C027 (and has to do so in order to be useful)
- # [17:45] <DanC> "C027 [S] Specifications that require a default encoding MUST define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them."
- # [17:45] <DanC> what's the HTML5 default?
- # [17:45] <gsnedders> Windows-1252
- # [17:45] * DanC blinks, dumbfounded
- # [17:45] <gsnedders> needed for compat
- # [17:46] <hsivonen> C040 is not machine-testable
- # [17:46] <gsnedders> also need to treat any claim of ISO-8859-1 as Windows-1252 for compat
- # [17:47] <hsivonen> C045 SHOULD part is not honored (and rightly so)
- # [17:47] * Quits: billmason (billmason@69.30.57.156) (Quit: .)
- # [17:47] <gsnedders> hsivonen: the SHOULD?
- # [17:47] <hsivonen> gsnedders: hex over decimal
- # [17:47] <gsnedders> hsivonen: yeah
- # [17:47] <DanC> "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more." -- 8.2.2.2. Character encoding requirements http://www.w3.org/html/wg/html5/ . wow. Maybe I don't want to chair this WG after all. I don't think I can take that as seriously as it evidently merits.
- # [17:48] <DanC> what spec does HTML5 cite for the definition of Windows-1252?
- # [17:48] <hsivonen> DanC: the spec doesn't cite any normative references properly yet
- # [17:48] <hsivonen> DanC: and the req for Windows-1252 is very serious indeed
- # [17:49] <gsnedders> DanC: the vast majority of the web is UTF-8 or Windows-1252
- # [17:49] <DanC> is that the default in firefox/gecko and safari/webkit? opera?
- # [17:49] <hsivonen> Validator.nu does not check charmod C047 as it is not well-defined
- # [17:49] <hsivonen> DanC: yes
- # [17:49] * DanC whimpers
- # [17:49] <gsnedders> DanC: so much breaks if you don't
- # [17:49] <ChrisWilson> Oh really Dan, don't be so surprised
- # [17:49] <ChrisWilson> :)
- # [17:50] <hsivonen> Validator.nu does not check for C048 because doing so would seriously devalue errors
- # [17:50] <gsnedders> from the #whatwg /topic: "Please leave your sense of logic at the door, thanks!"
- # [17:50] <gavin_> I'm pretty sure the Firefox default is not Windows-1252
- # [17:50] <DanC> I can see that I shouldn't be surprised, but... well... I am.
- # [17:50] <ChrisWilson> Everyone supports overlapping bold and italic tags too.
- # [17:50] <ChrisWilson> why is Windows 1252 not logical?
- # [17:50] <hsivonen> s/doing so/supporting it/
- # [17:51] <hsivonen> gavin_: It has to be Windows-1252 to be Web-compatible.
- # [17:51] <gavin_> it varies per-locale, but the default for en-US is ISO-8859-1 cross-platform, I believe
- # [17:51] <hsivonen> gavin_: but the user can change it
- # [17:51] <DanC> I don't remember reading a spec for Windows-1252; I'm totally unfamiliar with it.
- # [17:51] <hsivonen> gavin_: ISO-8859-1 in Gecko means Windows-1252
- # [17:51] <hsivonen> gavin_: (that's in the spec, too)
- # [17:51] <gavin_> hsivonen: ah, ok
- # [17:51] <gsnedders> DanC: just reassigns the control characters within 0x80-0xFF to actual printable characters from ISO-8859-1
- # [17:52] <DanC> it's registered. http://www.iana.org/assignments/character-sets -> http://www.iana.org/assignments/charset-reg/windows-1252
- # [17:52] <hsivonen> DanC: oh yeah, in HTML, ISO-8859-1 has to be treated as an alias for Windows-1252. that's a Support Existing Content requirement
- # [17:52] <hsivonen> DanC: that probably violates the letter of charmod
- # [17:53] <DanC> "Support Existing Content" is a principle, not a stop-thinking requirement.
- # [17:53] <gsnedders> ChrisWilson: I'll probably finally write you an email about parsing of HTTP responses this week, BTW
- # [17:53] <gavin_> I didn't realize Windows-1252 was a superset of ISO 8859-1
- # [17:53] <DanC> but clearly we should have test cases for treating ISO-8859-1 as Windows-1252
- # [17:54] <hsivonen> DanC: well, thinking leads to treating ISO-8859-1 as an alias for Windows-1252 for the purpose of consuming text/html
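What treating ISO-8859-1 as an alias for Windows-1252 means in practice can be sketched with Python's codecs. The function name `decode_text_html` and the small label set are assumptions made for this example; they are not taken from the HTML 5 draft or from html5lib.

```python
# Labels that this sketch treats as meaning Windows-1252 (illustrative set only).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def decode_text_html(data, label):
    """Decode bytes the way the discussion says text/html consumers must.

    Hypothetical helper: a declared ISO-8859-1 is decoded as Windows-1252.
    Python's cp1252 codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    undefined, so errors="replace" is used here to avoid an exception.
    """
    if label.strip().lower() in LATIN1_LABELS:
        label = "cp1252"
    return data.decode(label, errors="replace")


# Byte 0x80 under the alias rule versus a strict ISO-8859-1 reading.
as_alias = decode_text_html(b"\x80", "ISO-8859-1")   # U+20AC EURO SIGN
as_strict = b"\x80".decode("iso-8859-1")             # U+0080, a C1 control
```

Bytes 0xA0 through 0xFF decode identically under both encodings, which is why only the 0x80 to 0x9F range shows a difference.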
- # [17:54] <hsivonen> DanC: html5lib has tests
- # [17:54] <gsnedders> DanC: if HTML5 deviates from what is needed for the real world, implementers, myself included, will simply leave the WG.
- # [17:54] <DanC> wanna help me find which html5lib test, hsivonen ?
- # [17:54] <hsivonen> DanC: sure
- # [17:55] <DanC> yes, gsnedders , I'm after a spec to match real-world deployment too. sigh.
- # [17:55] <gsnedders> which means it really does need to be a stop-thinking-and-do-something-illogical requirement :(
- # [17:55] <hsivonen> DanC: testdata/encoding/tests1.dat second test
- # [17:55] <gsnedders> (though what is conforming in a document is far more open for discussion)
- # [17:56] <DanC> well, we only stop thinking after we've done some measurement. evidently the measurement here is done and I'm late to the party.
- # [17:57] <gsnedders> most of the measurement has been done in UA development over the years, in all seriousness
- # [17:57] <DanC> quite.
- # [17:57] <DanC> now we just need to collect that into a test suite
- # [17:58] <gsnedders> does anyone know if Apple ever shipped a Safari release with SGML comment parsing?
- # [17:58] <hsivonen> DanC: http://hsivonen.iki.fi/test/iso8859/ contains measurement demos
- # [17:59] * Joins: aroben (aroben@17.255.98.208)
- # [17:59] <DanC> hsivonen, which row in http://hsivonen.iki.fi/test/iso8859/ISO-8859-1.htm tells me my browser is treating the page as 1252 rather than 8859-1?
- # [18:00] <hsivonen> DanC: every row that has a printable character has a matching rendering in the Byte column and in the Windows-1252 NCR column
- # [18:01] * Joins: aroben_ (aroben@17.203.12.72)
- # [18:01] <Philip> All three columns look identical to me
- # [18:02] <hsivonen> DanC: the fact that the ISO-8859-1 NCR and the Windows-1252 NCR columns match shows that NCRs pointing to C1 controls have to be treated as Windows-1252 code point references
- # [18:02] <Philip> presumably since the Windows-1252 mapping is applied to NCRs too
- # [18:02] <Philip> (Oh, what you said)
- # [18:03] <DanC> my tiny brain is not following. do all the rows support this line of reasoning, or just some of them? If just some, please nominate 1 for me to study.
- # [18:03] <hsivonen> DanC: IIRC, the Thai ISO encoding is also weird in the way that some C1 range points mean corresponding windows code points in deployed content
- # [18:04] * Quits: aroben (aroben@17.255.98.208) (Ping timeout)
- # [18:04] <hsivonen> DanC: rows 0x80 through 0xA0 are interesting
- # [18:04] <hsivonen> DanC: and of those, the ones that have printable characters (i.e. are assigned in Windows-1252) support the assertion
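
To see which bytes in the C1 range are "assigned in Windows-1252", one rough check is to ask a codec. This sketch assumes Python's `cp1252` codec, which treats a handful of bytes as undefined. Other implementations may map those bytes differently, so the result reflects that codec only. The helper name `assigned_in_cp1252` is hypothetical.

```python
def assigned_in_cp1252(byte_value: int) -> bool:
    """Return True if Python's cp1252 codec maps this byte to a character."""
    try:
        bytes([byte_value]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# 0x80-0x9F is the range ISO-8859-1 reserves for C1 control characters.
c1_range = range(0x80, 0xA0)
assigned = [b for b in c1_range if assigned_in_cp1252(b)]
unassigned = [b for b in c1_range if not assigned_in_cp1252(b)]

for b in assigned:
    ch = bytes([b]).decode("cp1252")
    print(f"0x{b:02X} -> U+{ord(ch):04X} {ch}")

print("undefined in this codec:", [f"0x{b:02X}" for b in unassigned])
```
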
- # [18:04] <DanC> ok, for 0x80, what would my browser do if it were treating the data as 8859-1 rather than 1252?
- # [18:05] <hsivonen> DanC: display the euro sign
- # [18:05] <hsivonen> oops
- # [18:05] <hsivonen> I misread the question
- # [18:05] <hsivonen> if the browser were treating the data as ISO-8859-1, it should *not* render a euro sign there
- # [18:05] <DanC> what should it do?
- # [18:06] <DanC> I guess show a little hex-in-box or something?
- # [18:06] <hsivonen> DanC: that's a good question. I don't know a definitive answer, but rendering a replacement character would be reasonable
- # [18:07] * Joins: Thezilch (fuz007@68.54.228.249)
- # [18:07] <hsivonen> DanC: as far as I can tell, the rendering of C1 controls is not well-defined in a CSS formatter
- # [18:07] <hsivonen> (or if it is, I missed the spec)
- # [18:07] <DanC> ok, so this is a case of browsers filling in where the specs said the author shouldn't do that.
- # [18:08] <hsivonen> DanC: this is a case of making use of the goodness of Windows-1252 where ISO was being unuseful
- # [18:08] * DanC sees 80 thru 9f are unused, per http://en.wikipedia.org/wiki/ISO/IEC_8859-1
- # [18:08] <hsivonen> DanC: now it is just a part of the legacy weirdness when UTF-8 could give us what ISO-8859-1 could not
- # [18:09] <DanC> wow... this is evidently common knowledge... "Many web browsers treat the MIME charset ISO-8859-1 as Windows-1252 " -- http://en.wikipedia.org/wiki/Windows-1252 . I've been under a rock for a long time.
- # [18:09] <hsivonen> :-)
- # [18:10] <DanC> is 1252 new-ish? the euro character isn't that old, is it?
- # [18:10] * DanC follows his nose to http://en.wikipedia.org/wiki/Euro_sign
- # [18:11] <DanC> 1996
- # [18:11] <hsivonen> DanC: the euro sign was retrofitted in MacRoman and Windows-1252
- # [18:11] <ChrisWilson> yup. Very quickly.
- # [18:11] <ChrisWilson> (Shipped as a patch)
- # [18:12] <hsivonen> Microsoft did a *much* better job there than Apple. the fallout from the Apple quick fix still continues to suck
- # [18:13] <hsivonen> (in font design that is)
- # [18:13] <Philip> data:text/html;charset=utf-8,%3Cbody%3E%26%23x80%3B%3Cscript%3Ealert(document.body.innerHTML.charCodeAt(0))%3C%2Fscript%3E is an example of an interesting result despite not using iso-8859-1
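
Philip's data: URL exercises the numeric character reference `&#x80;` in a UTF-8 document. The remapping discussed above can be sketched in Python as follows. The function `resolve_ncr` is a hypothetical illustration, not a parser's real code. The standard library's `html.unescape` is used only as a cross-check for the 0x80 case.

```python
import html


def resolve_ncr(code_point: int) -> str:
    """Resolve a numeric character reference, remapping C1 values (a sketch)."""
    if 0x80 <= code_point <= 0x9F:
        try:
            # Treat the number as a Windows-1252 byte, not a Unicode code point.
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Assumption: bytes undefined in Python's cp1252 fall through as-is.
            return chr(code_point)
    return chr(code_point)


euro = resolve_ncr(0x80)
print(hex(ord(euro)))  # 0x20ac, i.e. the euro sign rather than U+0080

# The standard library applies the same remapping for this reference.
print(html.unescape("&#x80;") == euro)
```
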
- # [18:14] <hsivonen> the pre-euro Windows encoding has a distinct IANA name in theory, but virtually no one uses the pre-euro name
- # [18:14] * Joins: Sander (svl@86.87.68.167)
- # [18:21] <DanC> # HTML 5 defaults to Windows-1252, where charmod requires UTF-8/UTF-16 Dan Connolly (Monday, 29 October) http://lists.w3.org/Archives/Public/www-archive/2007Oct/0059.html
- # [18:22] * DanC does his duty and invites I18N review. :-/
- # [18:25] <Philip> (http://lists.w3.org/Archives/Public/www-archive/2007Oct/0058.html looks fun)
- # [18:29] <hsivonen> Philip: I had hoped XML 1.1 would just go away and be forgotten :-(
- # [18:33] <Philip> Are they proposing something other than just renaming XML 1.1 to XML 1.0?
- # [18:35] <hsivonen> Philip: dunno exactly, but they seem to suggest breaking consistency between various parsers in order to make things politically correct so that people can make parochial markup languages
- # [18:36] <hsivonen> this won't help anyone, of course, because the legacy would still cast enough uncertainty upon e.g. Khmer element names that Cambodian markup language designers would still be better off not using Khmer element names
- # [18:40] <hsivonen> besides, if XML Core is now willing to change what version='1.0' means, we might as well go directly to XML5 parsing
- # [18:44] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [18:49] * Joins: gavin (gavin@99.227.30.12)
- # [18:49] <anne> DanC, that you're surprised is annoying, as it means the W3C is quite out of touch with reality
- # [18:50] <anne> hmm, maybe I shouldn't generalize so much, but I do get that feeling
- # [18:51] <ChrisWilson> That does seem like a large generalization based on a lack of knowledge on one specific item.
- # [18:51] <ChrisWilson> (I meant, Dan's lack of knowledge)
- # [18:51] <DanC> only one of the co-chairs was surprised; maybe that helps, anne?
- # [18:52] <DanC> And Richard Ishida, who is the W3C team member who is supposed to know about this stuff, doesn't seem to be surprised nor see a problem.
- # [18:53] <anne> DanC, the other co-chair works for a browser vendor
- # [18:53] <DanC> right; W3C has various checks and balances
- # [18:54] <anne> hsivonen, XML Core seems to be heading in the right direction anyway, seems like a good thing :)
- # [20:02] * Quits: ChrisWilson (cwilso@131.107.0.105) (Ping timeout)
- # [20:23] * Joins: ChrisWilson (cwilso@131.107.0.102)
- # [20:27] * Joins: mjs (mjs@64.81.48.145)
- # [22:28] * Disconnected
- # [22:28] * Attempting to rejoin channel #html-wg
- # [22:28] * Rejoined channel #html-wg
- # [22:47] * Joins: mjs (mjs@17.255.106.186)
- # [22:58] * Quits: gavin (gavin@99.227.30.12) (Ping timeout)
- # [23:03] * Joins: gavin (gavin@99.227.30.12)
- # [23:48] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
- # Session Close: Tue Oct 30 00:00:00 2007