  1. # Session Start: Tue Jul 29 00:00:00 2008
  2. # Session Ident: #html-wg
  3. # [00:01] * Quits: shepazu (schepers@128.30.52.30) (Quit: shepazu)
  4. # [00:13] * Joins: Zeros (Zeros-Elip@67.154.87.254)
  5. # [00:15] * Joins: shepazu (schepers@128.30.52.30)
  6. # [00:21] * Quits: heycam (cam@124.168.12.194) (Quit: bye)
  7. # [00:37] * Quits: shepazu (schepers@128.30.52.30) (Ping timeout)
  8. # [00:56] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Ping timeout)
  9. # [00:58] * Joins: mjs (mjs@17.203.14.227)
  10. # [01:12] * Joins: mjs_ (mjs@17.255.109.93)
  11. # [01:13] * Quits: mjs (mjs@17.203.14.227) (Ping timeout)
  12. # [01:15] * Joins: mjs (mjs@17.203.14.227)
  13. # [01:17] * Quits: mjs_ (mjs@17.255.109.93) (Ping timeout)
  14. # [01:22] * Joins: shepazu (schepers@128.30.52.30)
  15. # [01:29] * Quits: billmason (billmason@69.30.57.110) (Connection reset by peer)
  16. # [02:00] * Quits: aroben (aroben@71.58.56.76) (Quit: aroben)
  17. # [02:25] * Quits: tH (Rob@87.102.92.207) (Quit: ChatZilla 0.9.83-rdmsoft [XULRunner 1.9/2008061013])
  18. # [02:49] * Quits: adele (adele@17.203.14.218) (Ping timeout)
  19. # [03:03] * Joins: mjs_ (mjs@17.255.109.93)
  20. # [03:03] * Quits: mjs_ (mjs@17.255.109.93) (Connection reset by peer)
  21. # [03:04] * Quits: mjs (mjs@17.203.14.227) (Ping timeout)
  22. # [03:29] * Quits: ChrisWilson (cwilso@131.107.0.71) (Ping timeout)
  23. # [04:04] * Quits: hsivonen (hsivonen@130.233.41.50) (Ping timeout)
  24. # [04:05] * Joins: hsivonen (hsivonen@130.233.41.50)
  25. # [04:11] * Joins: mjs (mjs@17.203.14.227)
  26. # [05:28] * Joins: Zeros (Zeros-Elip@69.140.40.140)
  27. # [05:39] * Quits: mjs (mjs@17.203.14.227) (Quit: mjs)
  28. # [05:41] * Joins: mjs (mjs@17.255.109.93)
  29. # [05:42] * Joins: mjs_ (mjs@17.255.109.93)
  30. # [05:42] * Quits: mjs (mjs@17.255.109.93) (Connection reset by peer)
  31. # [05:42] * Quits: mjs_ (mjs@17.255.109.93) (Quit: mjs_)
  32. # [06:57] * Joins: mjs (mjs@24.5.43.151)
  33. # [07:17] * Joins: Thezilch (fuz007@76.171.111.7)
  34. # [07:33] * Joins: dbaron (dbaron@216.18.1.210)
  35. # [08:54] * Joins: heycam (cam@124.168.12.194)
  36. # [09:06] * Joins: zcorpan (zcorpan@88.131.66.80)
  37. # [09:16] * Quits: dbaron (dbaron@216.18.1.210) (Quit: g'night)
  38. # [09:19] * Joins: marcos (marcos@124.171.136.76)
  39. # [09:29] * Quits: marcos (marcos@124.171.136.76) (Quit: marcos)
  40. # [09:37] * Quits: mjs (mjs@24.5.43.151) (Quit: mjs)
  41. # [09:43] * Joins: mjs (mjs@24.5.43.151)
  42. # [10:51] * Joins: ROBOd (robod@89.122.216.38)
  43. # [11:01] * Quits: Lachy (Lachlan@85.196.122.246) (Quit: This computer has gone to sleep)
  44. # [11:17] * Joins: Lachy (Lachlan@213.236.208.247)
  45. # [11:19] * Quits: Thezilch (fuz007@76.171.111.7) (Connection reset by peer)
  46. # [11:20] * Quits: Lachy (Lachlan@213.236.208.247) (Ping timeout)
  47. # [11:21] * Joins: Lachy (Lachlan@213.236.208.22)
  48. # [11:37] * Joins: tH_ (Rob@87.102.92.207)
  49. # [11:38] * tH_ is now known as tH
  50. # [12:39] * Quits: Lachy (Lachlan@213.236.208.22) (Quit: Leaving)
  51. # [12:39] * Joins: Lachy (Lachlan@213.236.208.22)
  52. # [12:48] * Joins: myakura (myakura@118.8.102.216)
  53. # [13:14] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  54. # [15:28] * Quits: myakura (myakura@118.8.102.216) (Quit: Leaving...)
  55. # [15:41] * RRSAgent excuses himself; his presence no longer seems to be needed
  56. # [15:41] * Parts: RRSAgent (rrs-loggee@128.30.52.30)
  57. # [15:57] * Joins: aroben (aroben@71.58.56.76)
  58. # [16:09] * Quits: Lachy (Lachlan@213.236.208.22) (Quit: This computer has gone to sleep)
  59. # [16:19] * Joins: Lachy (Lachlan@85.196.122.246)
  60. # [16:23] * Quits: Lachy (Lachlan@85.196.122.246) (Ping timeout)
  61. # [16:24] * Joins: Lachy (Lachlan@85.196.122.246)
  62. # [16:28] * Joins: billmason (billmason@69.30.57.110)
  63. # [17:07] * Quits: zcorpan (zcorpan@88.131.66.80) (Quit: zcorpan)
  64. # [17:12] <DanC> hmm... I thought I understood this "Character encoding overrides" table, but I tried to explain it to somebody, and they noticed "Any bytes that are treated differently due to this encoding aliasing must be considered parse errors. " right above it.
  65. # [17:12] <DanC> byte 128 is different in ISO-8859-1 and Windows-1252, no?
  66. # [17:13] <hsivonen> DanC: I think the parse error part should be taken away. Implementing it for something like GBK has a very unfavorable cost/benefit ratio
  67. # [17:14] <hsivonen> (yes, 128 is different in ISO-8859-1 and Windows-1252)
  68. # [17:15] <DanC> I understood the whole point of mapping ISO-8859-1 to Windows-1252 was to map byte 128 to the euro character. no?
  69. # [17:16] <hsivonen> yeah (well, the rest of the C1 range, too)
  70. # [17:18] <DanC> hmm. I'm totally lost.
  71. # [17:18] <DanC> oh well.
  72. # [17:20] <hsivonen> apart from the parse error requirement (which I want to abolish) it's really just an alias table
  73. # [17:20] <Philip> DanC: Lost in the details of what an implementation should do, or lost in trying to understand the purpose of what the spec says?
  74. # [17:21] <DanC> both, philip.
  75. # [17:21] <DanC> why the table at all, or at least why the iso-8859-1 row, if not for the euro character?
  76. # [17:21] <DanC> and what is an implementation to do with byte 128 in a page labelled iso-8859-1?
  77. # [17:22] <hsivonen> DanC: Turn it into euro
  78. # [17:22] <DanC> hsivonen, that's your advice, or your reading of the spec?
  79. # [17:22] <hsivonen> DanC: my remark about the C1 range was just pointing out that it isn't just the euro
  80. # [17:22] <hsivonen> DanC: both
  81. # [17:22] <Philip> hsivonen: If the page was iso-8859-1, and there wasn't the mapping to windows-1252, what would happen?
  82. # [17:23] <DanC> didn't we establish that it's a parse error, since 128 is different in ISO-8859-1 and Windows-1252?
  83. # [17:23] <Philip> 0x80 seems to be undefined in ISO-8859-1, so would it just turn into U+FFFD or something?
  84. # [17:23] <hsivonen> DanC: yes, it's a parse error per spec. (not per Validator.nu, though)
  85. # [17:23] <hsivonen> Philip: no, ISO-8859-1 would map it to U+0080
  86. # [17:24] <hsivonen> Philip: that is, officially the C1 range mapping to Unicode is just zero-extension
  87. # [17:24] * DanC is getting conflicting data about whether iso-8859-1 maps 0x80 to a character
  88. # [17:25] <DanC> wikipedia says "Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1."
  89. # [17:25] <hsivonen> DanC: ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
  90. # [17:25] <DanC> ah... "In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the code values 00–1F, 7F, and 80–9F. It thus provides for 256 characters via every possible 8-bit value."
  91. # [17:25] <hsivonen> 0x80 0x0080 # <control>
  92. # [17:26] <Philip> hsivonen: Ah, right
  93. # [17:26] <Philip> ISO/IEC 8859-1:1997 says "The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429."
  94. # [17:27] <DanC> ok, so it's a parse error; does the spec require displaying a euro character in the case of a parse error?
  95. # [17:28] <hsivonen> DanC: yes
  96. # [17:29] <DanC> or abort, right?
  97. # [17:31] <hsivonen> DanC: oh, right, aborting is allowed too, but market forces take care of that for browsers
  98. # [17:32] <DanC> ok
  99. # [17:32] <DanC> then the parse error stuff seems to be a no-op; what cost did you mean when you said "Implementing it for something like GBK has a very unfavorable cost/benefit ratio"?
  100. # [17:33] <DanC> ah... perhaps you meant detecting this error
  101. # [17:33] <DanC> "Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document"
  102. # [17:33] <hsivonen> DanC: detecting it for GBK would be troublesome
  103. # [17:34] <hsivonen> DanC: bad for perf, more code, no practical benefit
  104. # [17:36] * DanC thinks he understands now... maybe...
  105. # [17:36] <Philip> hsivonen: The benefit is that it would stop someone from taking a conforming HTML5 page that declares itself to be GBK, passing it through "iconv -f GBK -t UTF-8", and unexpectedly getting errors
  106. # [17:36] <Philip> s/The/A/
  107. # [17:37] <hsivonen> Philip: using the Validator.nu parser connected to the bundled serializer solves the problem
  108. # [17:38] <hsivonen> (although in this case, GBK is the superset)
  109. # [17:38] <hsivonen> I keep forgetting the number of the GBxxxx subset
  110. # [17:39] <DanC> Philip, implementing a check for this error goes beyond detecting garbled GBK... it's a matter of finding all byte sequences that GBK maps to something different from what, for example, GB2312 maps it to
  111. # [17:39] <Philip> hsivonen: That requires hugely more effort to discover and install and learn how to use than existing tools that are well known and ought to work perfectly well
  112. # [17:39] <Philip> hsivonen: Oops, yes, I meant GB2312
  113. # [17:40] <Philip> DanC: I think GBK is meant to be an exact superset of GB2312, so any valid GB2312 bytestream will decode identically under GBK; I'm not positive about that but I really hope it's true :-)
  114. # [17:41] <hsivonen> Philip: the kind of people who use iconv in Europe and the Americas should know to use Windows-1252 when they see ISO-8859-1. Presumably, anyone who'd use iconv in China should know to specify GBK...
  115. # [17:42] <Philip> hsivonen: (Anyway, serialisers don't preserve human-significant aspects of the document, like attribute ordering and whitespace inside elements, so they're not at all equivalent to a charset-converting tool)
  116. # [17:42] <hsivonen> Philip: true.
  117. # [17:42] <hsivonen> Philip: but if you're working with someone else's "garbage out", you can't assume validity
  118. # [17:45] <Philip> hsivonen: You can pass it through a validator to see if it's valid, and if it's not then reject it, otherwise pass it through iconv to standardise the charset without disturbing the source document any more than is absolutely necessary
  119. # [17:47] <hsivonen> Philip: I think supporting that use case isn't worth the trouble of detecting the situation in an efficient manner.
  120. # [17:47] <Philip> kind of like how Youtube complains if your video is too long but otherwise standardises it to ugly FLV, except for HTML documents instead of video
  121. # [17:47] <Philip> or, alternatively, like a better analogy, that I can't think of
  122. # [17:47] <hsivonen> Philip: YouTube engineer have built in a lot of knowledge about video encoding craziness
  123. # [17:47] <Philip> or, even better, not like an analogy at all
  124. # [17:48] <hsivonen> Philip: anyone offering a similar service for HTML should at minimum look up the aliases in the spec
  125. # [17:48] <hsivonen> s/engineer/engineers/
  126. # [17:49] <hsivonen> afk
  127. # [17:54] <Philip> hsivonen: Hmm, good point :-(
  128. # [18:01] * Joins: ChrisWilson (cwilso@131.107.0.104)
  129. # [18:16] * Joins: aaronlev (chatzilla@216.18.1.210)
  130. # [18:55] * Quits: Hixie (ianh@129.241.93.37) (Ping timeout)
  131. # [18:55] * Joins: Hixie (ianh@129.241.93.37)
  132. # [18:55] * Quits: hsivonen (hsivonen@130.233.41.50) (Ping timeout)
  133. # [18:57] * Joins: hsivonen (hsivonen@130.233.41.50)
  134. # [18:58] * Quits: aaronlev (chatzilla@216.18.1.210) (Ping timeout)
  135. # [19:11] * Joins: marcos (marcos@124.171.136.76)
  136. # [19:44] * Joins: tlr (tlr@128.30.52.30)
  137. # [19:55] * Joins: adele (adele@17.203.14.218)
  138. # [20:07] * Quits: tlr (tlr@128.30.52.30) (Quit: tlr)
  139. # [20:09] * Quits: mjs (mjs@24.5.43.151) (Quit: mjs)
  140. # [20:21] * Joins: scotfl (scotfl@70.64.14.62)
  141. # [20:27] * Quits: marcos (marcos@124.171.136.76) (Quit: marcos)
  142. # [20:39] * Joins: plinss_ (peter.lins@15.243.169.70)
  143. # [20:57] * Joins: codedread (chatzilla@129.188.69.129)
  144. # [20:57] * Parts: codedread (chatzilla@129.188.69.129)
  145. # [21:38] * Quits: Zeros (Zeros-Elip@69.140.40.140) (Ping timeout)
  146. # [21:39] * Joins: Zeros (Zeros-Elip@67.154.87.254)
  147. # [21:45] * Quits: Zeros (Zeros-Elip@67.154.87.254) (Quit: Leaving)
  148. # [22:17] * Joins: mjs (mjs@17.255.96.56)
  149. # [22:36] * Quits: ChrisWilson (cwilso@131.107.0.104) (Ping timeout)
  150. # [22:46] * Joins: ChrisWilson (cwilso@131.107.0.104)
  151. # [23:04] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
  152. # [23:05] * Quits: mjs (mjs@17.255.96.56) (Quit: mjs)
  153. # [23:05] * Quits: plinss_ (peter.lins@15.243.169.70) (Quit: plinss_)
  154. # [23:09] * Joins: mjs (mjs@17.255.96.56)
  155. # [23:32] * Joins: mjs_ (mjs@17.255.96.56)
  156. # [23:33] * Quits: mjs (mjs@17.255.96.56) (Connection reset by peer)
  157. # [23:53] * Quits: gsnedders (gsnedders@217.44.35.200) (Quit: Killin' teh intarwebs)
  158. # [23:53] * Joins: gsnedders (gsnedders@217.44.35.200)
  159. # [23:53] * Parts: gsnedders (gsnedders@217.44.35.200)
  160. # Session Close: Wed Jul 30 00:00:00 2008
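
Editor's note on the 17:12 to 17:48 exchange: the byte-128 question can be checked directly. The sketch below is not from the log and is not how any browser or Validator.nu implements the override table. It assumes Python's built-in "latin-1" and "cp1252" codecs as stand-ins for ISO-8859-1 and Windows-1252, and the helper names differing_bytes and decode_with_override are hypothetical. It lists the single-byte values the two codecs decode differently, then decodes with the override encoding while recording where such bytes occur, which is roughly what a conformance checker would need in order to report the parse error.

```python
from typing import List, Tuple


def differing_bytes(label_a: str, label_b: str) -> List[int]:
    """Return the single-byte values that two single-byte codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" maps bytes a codec leaves undefined to U+FFFD,
        # so an undefined byte also counts as "treated differently".
        if raw.decode(label_a, errors="replace") != raw.decode(label_b, errors="replace"):
            diffs.append(value)
    return diffs


def decode_with_override(data: bytes, declared: str, actual: str) -> Tuple[str, List[int]]:
    """Decode with the override encoding; also return offsets of bytes
    that the declared encoding would have decoded differently."""
    suspicious = set(differing_bytes(declared, actual))
    offsets = [i for i, byte in enumerate(data) if byte in suspicious]
    return data.decode(actual, errors="replace"), offsets


if __name__ == "__main__":
    # Byte 0x80: a C1 control under ISO-8859-1, the euro sign under Windows-1252.
    print(repr(b"\x80".decode("latin-1")), repr(b"\x80".decode("cp1252")))
    print([hex(b) for b in differing_bytes("latin-1", "cp1252")])
    print(decode_with_override(b"price: \x80 5", "latin-1", "cp1252"))
```

A byte-by-byte comparison like this only works for single-byte encodings. For a multi-byte pair such as GB2312 and GBK the comparison would have to cover byte sequences rather than single bytes, which is the detection cost hsivonen objects to. Python's cp1252 table also leaves five bytes undefined, so results for those bytes may differ from what the spec's table or a browser does.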