- # Session Start: Tue Jul 29 00:00:00 2008
- # Session Ident: #html-wg
- # [00:01] * Quits: shepazu (schepers@128.30.52.30) (Quit: shepazu)
- # [00:13] * Joins: Zeros (Zeros-Elip@67.154.87.254)
- # [00:15] * Joins: shepazu (schepers@128.30.52.30)
- # [00:21] * Quits: heycam (cam@124.168.12.194) (Quit: bye)
- # [00:37] * Quits: shepazu (schepers@128.30.52.30) (Ping timeout)
- # [00:56] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Ping timeout)
- # [00:58] * Joins: mjs (mjs@17.203.14.227)
- # [01:12] * Joins: mjs_ (mjs@17.255.109.93)
- # [01:13] * Quits: mjs (mjs@17.203.14.227) (Ping timeout)
- # [01:15] * Joins: mjs (mjs@17.203.14.227)
- # [01:17] * Quits: mjs_ (mjs@17.255.109.93) (Ping timeout)
- # [01:22] * Joins: shepazu (schepers@128.30.52.30)
- # [01:29] * Quits: billmason (billmason@69.30.57.110) (Connection reset by peer)
- # [02:00] * Quits: aroben (aroben@71.58.56.76) (Quit: aroben)
- # [02:25] * Quits: tH (Rob@87.102.92.207) (Quit: ChatZilla 0.9.83-rdmsoft [XULRunner 1.9/2008061013])
- # [02:49] * Quits: adele (adele@17.203.14.218) (Ping timeout)
- # [03:03] * Joins: mjs_ (mjs@17.255.109.93)
- # [03:03] * Quits: mjs_ (mjs@17.255.109.93) (Connection reset by peer)
- # [03:04] * Quits: mjs (mjs@17.203.14.227) (Ping timeout)
- # [03:29] * Quits: ChrisWilson (cwilso@131.107.0.71) (Ping timeout)
- # [04:04] * Quits: hsivonen (hsivonen@130.233.41.50) (Ping timeout)
- # [04:05] * Joins: hsivonen (hsivonen@130.233.41.50)
- # [04:11] * Joins: mjs (mjs@17.203.14.227)
- # [05:28] * Joins: Zeros (Zeros-Elip@69.140.40.140)
- # [05:39] * Quits: mjs (mjs@17.203.14.227) (Quit: mjs)
- # [05:41] * Joins: mjs (mjs@17.255.109.93)
- # [05:42] * Joins: mjs_ (mjs@17.255.109.93)
- # [05:42] * Quits: mjs (mjs@17.255.109.93) (Connection reset by peer)
- # [05:42] * Quits: mjs_ (mjs@17.255.109.93) (Quit: mjs_)
- # [06:57] * Joins: mjs (mjs@24.5.43.151)
- # [07:17] * Joins: Thezilch (fuz007@76.171.111.7)
- # [07:33] * Joins: dbaron (dbaron@216.18.1.210)
- # [08:54] * Joins: heycam (cam@124.168.12.194)
- # [09:06] * Joins: zcorpan (zcorpan@88.131.66.80)
- # [09:16] * Quits: dbaron (dbaron@216.18.1.210) (Quit: g'night)
- # [09:19] * Joins: marcos (marcos@124.171.136.76)
- # [09:29] * Quits: marcos (marcos@124.171.136.76) (Quit: marcos)
- # [09:37] * Quits: mjs (mjs@24.5.43.151) (Quit: mjs)
- # [09:43] * Joins: mjs (mjs@24.5.43.151)
- # [10:51] * Joins: ROBOd (robod@89.122.216.38)
- # [11:01] * Quits: Lachy (Lachlan@85.196.122.246) (Quit: This computer has gone to sleep)
- # [11:17] * Joins: Lachy (Lachlan@213.236.208.247)
- # [11:19] * Quits: Thezilch (fuz007@76.171.111.7) (Connection reset by peer)
- # [11:20] * Quits: Lachy (Lachlan@213.236.208.247) (Ping timeout)
- # [11:21] * Joins: Lachy (Lachlan@213.236.208.22)
- # [11:37] * Joins: tH_ (Rob@87.102.92.207)
- # [11:38] * tH_ is now known as tH
- # [12:39] * Quits: Lachy (Lachlan@213.236.208.22) (Quit: Leaving)
- # [12:39] * Joins: Lachy (Lachlan@213.236.208.22)
- # [12:48] * Joins: myakura (myakura@118.8.102.216)
- # [13:14] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [15:28] * Quits: myakura (myakura@118.8.102.216) (Quit: Leaving...)
- # [15:41] * RRSAgent excuses himself; his presence no longer seems to be needed
- # [15:41] * Parts: RRSAgent (rrs-loggee@128.30.52.30)
- # [15:57] * Joins: aroben (aroben@71.58.56.76)
- # [16:09] * Quits: Lachy (Lachlan@213.236.208.22) (Quit: This computer has gone to sleep)
- # [16:19] * Joins: Lachy (Lachlan@85.196.122.246)
- # [16:23] * Quits: Lachy (Lachlan@85.196.122.246) (Ping timeout)
- # [16:24] * Joins: Lachy (Lachlan@85.196.122.246)
- # [16:28] * Joins: billmason (billmason@69.30.57.110)
- # [17:07] * Quits: zcorpan (zcorpan@88.131.66.80) (Quit: zcorpan)
- # [17:12] <DanC> hmm... I thought I understood this "Character encoding overrides" table, but I tried to explain it to somebody, and they noticed "Any bytes that are treated differently due to this encoding aliasing must be considered parse errors. " right above it.
- # [17:12] <DanC> byte 128 is different in ISO-8859-1 and Windows-1252, no?
- # [17:13] <hsivonen> DanC: I think the parse error part should be taken away. Implementing it for something like GBK has a very unfavorable cost/benefit ratio
- # [17:14] <hsivonen> (yes, 128 is different in ISO-8859-1 and Windows-1252)
- # [17:15] <DanC> I understood the whole point of mapping ISO-8859-1 to Windows-1252 was to map byte 128 to the euro character. no?
- # [17:16] <hsivonen> yeah (well, the rest of the C1 range, too)
- # [17:18] <DanC> hmm. I'm totally lost.
- # [17:18] <DanC> oh well.
- # [17:20] <hsivonen> apart from the parse error requirement (which I want to abolish) it's really just an alias table
- # [17:20] <Philip> DanC: Lost in the details of what an implementation should do, or lost in trying to understand the purpose of what the spec says?
- # [17:21] <DanC> both, philip.
- # [17:21] <DanC> why the table at all, or at least why the iso-8859-1 row, if not for the euro character?
- # [17:21] <DanC> and what is an implementation to do with byte 128 in a page labelled iso-8859-1?
- # [17:22] <hsivonen> DanC: Turn it into euro
- # [17:22] <DanC> hsivonen, that's your advice, or your reading of the spec?
- # [17:22] <hsivonen> DanC: my remark about the C1 range was just pointing out that it isn't just the euro
- # [17:22] <hsivonen> DanC: both
- # [17:22] <Philip> hsivonen: If the page was iso-8859-1, and there wasn't the mapping to windows-1252, what would happen?
- # [17:23] <DanC> didn't we establish that it's a parser error, since 128 is different in ISO-8859-1 and Windows-1252?
- # [17:23] <Philip> 0x80 seems to be undefined in ISO-8859-1, so would it just turn into U+FFFD or something?
- # [17:23] <hsivonen> DanC: yes, it's a parse error per spec. (not per Validator.nu, though)
- # [17:23] <hsivonen> Philip: no, ISO-8859-1 would map it to U+0080
- # [17:24] <hsivonen> Philip: that is, officially the C1 range mapping to Unicode is just zero-extension
- # [17:24] * DanC is getting conflicting data about whether iso-8859-1 maps 0x80 to a character
- # [17:25] <DanC> wikipedia says "Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1."
- # [17:25] <hsivonen> DanC: ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
- # [17:25] <DanC> ah... "In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the code values 00–1F, 7F, and 80–9F. It thus provides for 256 characters via every possible 8-bit value."
- # [17:25] <hsivonen> 0x80 0x0080 # <control>
- # [17:26] <Philip> hsivonen: Ah, right
- # [17:26] <Philip> ISO/IEC 8859-1:1997 says "The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429."
- # [17:27] <DanC> ok, so it's a parse error; does the spec require displaying a euro character in the case of a parse error?
- # [17:28] <hsivonen> DanC: yes
- # [17:29] <DanC> or abort, right?
- # [17:31] <hsivonen> DanC: oh, right, aborting is allowed too, but market forces take care of that for browsers
- # [17:32] <DanC> ok
- # [17:32] <DanC> then the parse error stuff seems to be a no-op; what cost did you mean when you said "Implementing it for something like GBK has a very unfavorable cost/benefit ratio"?
- # [17:33] <DanC> ah... perhaps you meant detecting this error
- # [17:33] <DanC> "Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document"
- # [17:33] <hsivonen> DanC: detecting it for GBK would be troublesome
- # [17:34] <hsivonen> DanC: bad for perf, more code, no practical benefit
- # [17:36] * DanC thinks he understands now... maybe...
- # [17:36] <Philip> hsivonen: The benefit is that it would stop someone from taking a conforming HTML5 page that declares itself to be GBK, passing it through "iconv -f GBK -t UTF-8", and unexpectedly getting errors
- # [17:36] <Philip> s/The/A/
- # [17:37] <hsivonen> Philip: using the Validator.nu parser connected to the bundled serializer solves the problem
- # [17:38] <hsivonen> (although in this case, GBK is the superset)
- # [17:38] <hsivonen> I keep forgetting the number of the GBxxxx subset
- # [17:39] <DanC> Philip, implementing a check for this error goes beyond detecting garbled GBK... it's a matter of finding all byte sequences that GBK maps to something different from what, for example, GB2312 maps it to
- # [17:39] <Philip> hsivonen: That requires hugely more effort to discover and install and learn how to use than existing tools that are well known and ought to work perfectly well
- # [17:39] <Philip> hsivonen: Oops, yes, I meant GB2312
- # [17:40] <Philip> DanC: I think GBK is meant to be an exact superset of GB2312, so any valid GB2312 bytestream will decode identically under GBK; I'm not positive about that but I really hope it's true :-)
- # [17:41] <hsivonen> Philip: the kind of people who use iconv in Europe and the Americas should know to use Windows-1252 when they see ISO-8859-1. Presumably, anyone who'd use iconv in China should know to specify GBK...
- # [17:42] <Philip> hsivonen: (Anyway, serialisers don't preserve human-significant aspects of the document, like attribute ordering and whitespace inside elements, so they're not at all equivalent to a charset-converting tool)
- # [17:42] <hsivonen> Philip: true.
- # [17:42] <hsivonen> Philip: but if you're working with someone else's "garbage out", you can't assume validity
- # [17:45] <Philip> hsivonen: You can pass it through a validator to see if it's valid, and if it's not then reject it, otherwise pass it through iconv to standardise the charset without disturbing the source document any more than is absolutely necessary
- # [17:47] <hsivonen> Philip: I think supporting that use case isn't worth the trouble of detecting the situation in an efficient manner.
- # [17:47] <Philip> kind of like how Youtube complains if your video is too long but otherwise standardises it to ugly FLV, except for HTML documents instead of video
- # [17:47] <Philip> or, alternatively, like a better analogy, that I can't think of
- # [17:47] <hsivonen> Philip: YouTube engineer have built in a lot of knowledge about video encoding craziness
- # [17:47] <Philip> or, even better, not like an analogy at all
- # [17:48] <hsivonen> Philip: anyone offering a similar service for HTML should at minimum look up the aliases in the spec
- # [17:48] <hsivonen> s/engineer/engineers/
- # [17:49] <hsivonen> afk
- # [17:54] <Philip> hsivonen: Hmm, good point :-(
- # [18:01] * Joins: ChrisWilson (cwilso@131.107.0.104)
- # [18:16] * Joins: aaronlev (chatzilla@216.18.1.210)
- # [18:55] * Quits: Hixie (ianh@129.241.93.37) (Ping timeout)
- # [18:55] * Joins: Hixie (ianh@129.241.93.37)
- # [18:55] * Quits: hsivonen (hsivonen@130.233.41.50) (Ping timeout)
- # [18:57] * Joins: hsivonen (hsivonen@130.233.41.50)
- # [18:58] * Quits: aaronlev (chatzilla@216.18.1.210) (Ping timeout)
- # [19:11] * Joins: marcos (marcos@124.171.136.76)
- # [19:44] * Joins: tlr (tlr@128.30.52.30)
- # [19:55] * Joins: adele (adele@17.203.14.218)
- # [20:07] * Quits: tlr (tlr@128.30.52.30) (Quit: tlr)
- # [20:09] * Quits: mjs (mjs@24.5.43.151) (Quit: mjs)
- # [20:21] * Joins: scotfl (scotfl@70.64.14.62)
- # [20:27] * Quits: marcos (marcos@124.171.136.76) (Quit: marcos)
- # [20:39] * Joins: plinss_ (peter.lins@15.243.169.70)
- # [20:57] * Joins: codedread (chatzilla@129.188.69.129)
- # [20:57] * Parts: codedread (chatzilla@129.188.69.129)
- # [21:38] * Quits: Zeros (Zeros-Elip@69.140.40.140) (Ping timeout)
- # [21:39] * Joins: Zeros (Zeros-Elip@67.154.87.254)
- # [21:45] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Quit: Leaving)
- # [22:17] * Joins: mjs (mjs@17.255.96.56)
- # [22:36] * Quits: ChrisWilson (cwilso@131.107.0.104) (Ping timeout)
- # [22:46] * Joins: ChrisWilson (cwilso@131.107.0.104)
- # [23:04] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
- # [23:05] * Quits: mjs (mjs@17.255.96.56) (Quit: mjs)
- # [23:05] * Quits: plinss_ (peter.lins@15.243.169.70) (Quit: plinss_)
- # [23:09] * Joins: mjs (mjs@17.255.96.56)
- # [23:32] * Joins: mjs_ (mjs@17.255.96.56)
- # [23:33] * Quits: mjs (mjs@17.255.96.56) (Connection reset by peer)
- # [23:53] * Quits: gsnedders (gsnedders@217.44.35.200) (Quit: Killin' teh intarwebs)
- # [23:53] * Joins: gsnedders (gsnedders@217.44.35.200)
- # [23:53] * Parts: gsnedders (gsnedders@217.44.35.200)
- # Session Close: Wed Jul 30 00:00:00 2008
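
The 17:12 to 17:39 exchange about byte 128 can be illustrated with a small sketch. It uses Python's built-in codecs, which are an assumption here and may differ in detail from the tables a browser or Validator.nu uses. Notably, Python's windows-1252 codec leaves five bytes in the C1 range undefined, and the sketch counts those as differing. The helper names `decode_byte`, `differing_bytes`, and `alias_parse_error_offsets` are hypothetical and were made up for this illustration. The sketch lists the bytes that decode differently under the declared label and its alias, then reports where they occur in a byte stream, which is roughly the check DanC describes for a conformance checker.

```python
def decode_byte(value, encoding):
    """Decode a single byte; return None if this codec leaves the byte undefined."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def differing_bytes(label, alias):
    """Byte values that decode differently under the declared label and its alias.

    Only meaningful for single-byte encodings; a multi-byte pair such as
    GB2312/GBK would need a comparison over byte sequences instead.
    """
    return [b for b in range(256) if decode_byte(b, label) != decode_byte(b, alias)]


def alias_parse_error_offsets(data, label="iso-8859-1", alias="windows-1252"):
    """Offsets in `data` holding a byte that the alias treats differently (hypothetical check)."""
    affected = set(differing_bytes(label, alias))
    return [i for i, b in enumerate(data) if b in affected]


if __name__ == "__main__":
    # Byte 0x80: a C1 control under ISO-8859-1, the euro sign under Windows-1252.
    print(repr(decode_byte(0x80, "iso-8859-1")))
    print(repr(decode_byte(0x80, "windows-1252")))

    # Every byte affected by the alias falls in the C1 range.
    diff = differing_bytes("iso-8859-1", "windows-1252")
    print([hex(b) for b in diff])

    # Offsets a checker would flag in a page labelled iso-8859-1.
    print(alias_parse_error_offsets(b"price: \x80 5"))
```

For a single-byte pair the comparison is a 256-entry scan, so the check is cheap. hsivonen's cost objection concerns multi-byte encodings such as GBK, where the set of differing sequences is much larger and has to be tested while decoding.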