  1. # Session Start: Sun Dec 07 00:00:00 2008
  2. # Session Ident: #html-wg
  3. # [01:06] * Quits: maddiin (mc@87.185.254.216) (Quit: maddiin)
  4. # [01:07] * Joins: Lionheart (robin@66.57.69.65)
  5. # [02:04] * Joins: dbaron (dbaron@71.204.152.23)
  6. # [03:46] * Quits: sryo (sryo@190.245.204.198) (Ping timeout)
  7. # [03:47] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  8. # [04:05] * Joins: Zeros (Zeros-Elip@68.50.195.181)
  9. # [04:21] * Quits: Zeros (Zeros-Elip@68.50.195.181) (Quit: Leaving)
  10. # [04:52] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
  11. # [04:57] * Joins: gavin_ (gavin@99.253.193.147)
  12. # [06:10] * Quits: tH (Rob@129.11.83.58) (Quit: ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406])
  13. # [07:48] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
  14. # [07:53] * Joins: gavin_ (gavin@99.253.193.147)
  15. # [08:34] * Quits: dbaron (dbaron@71.204.152.23) (Quit: 8403864 bytes have been tenured, next gc will be global.)
  16. # [09:10] * Joins: Zeros (Zeros-Elip@68.50.195.181)
  17. # [10:07] * Parts: deane (opera@121.72.203.100)
  18. # [10:17] * Quits: Zeros (Zeros-Elip@68.50.195.181) (Quit: Leaving)
  19. # [10:23] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
  20. # [10:28] * Joins: gavin_ (gavin@99.253.193.147)
  21. # [10:31] * Joins: ROBOd (robod@89.122.216.38)
  22. # [10:40] * Joins: hywan (hywan@212.62.167.226)
  23. # [10:40] * Quits: hywan (hywan@212.62.167.226) (Quit: Leaving)
  24. # [10:59] * Joins: deane (opera@121.72.174.207)
  25. # [11:00] * Parts: deane (opera@121.72.174.207)
  26. # [11:30] * Quits: xover (xover@193.157.66.22) (Ping timeout)
  27. # [11:31] * Joins: xover (xover@193.157.66.22)
  28. # [11:36] * Joins: sryo (sryo@190.245.204.198)
  29. # [13:03] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
  30. # [13:09] * Joins: gavin_ (gavin@99.253.193.147)
  31. # [13:47] * Joins: tH (Rob@129.11.83.58)
  32. # [13:57] * Quits: anne (annevk@213.236.208.22) (Ping timeout)
  33. # [14:19] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
  34. # [14:20] * Quits: sryo (sryo@190.245.204.198) (Ping timeout)
  35. # [14:20] <MikeSmith> hsivonen: you there?
  36. # [14:32] <MikeSmith> hsivonen: I notice that the whattf.org schema has both common.data.float and form.data.float, and that microsyntaxes for them differ
  37. # [14:32] <MikeSmith> but the spec has only one definition for "valid floating point number"
  38. # [14:33] <MikeSmith> which form.data.float matches, but common.data.float does not
  39. # [14:34] <MikeSmith> so it seems like there only needs to be just one *.data.float pattern defined, and its definition should match the current form.data.float one
  40. # [14:39] <MikeSmith> same for common.data.float.positive and form.data.float.positive
  41. # [14:40] <MikeSmith> and it seems that the definition for common.data.float.non-negative needs to be revised also
  42. # [14:45] <MikeSmith> so, specifically that there should just be one definition for each, and that their definitions should be w:float-exp, w:float-exp-positive, with the new addition of w:float-exp-non-negative
  43. # [14:46] <MikeSmith> anyway, I'll file a bug at bugzilla.validator.nu
  44. # [14:48] <MikeSmith> also, as far as the definition of float, the spec allows either "E" or "e", but the regexp in the schema comment restricts it to "e" only
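
The "E" versus "e" mismatch MikeSmith describes can be sketched in Python. The base grammar below (optional minus sign, digits, optional fraction, optional exponent) is an assumption made for illustration; the log quotes neither the spec's definition nor the schema's regexp, and the pattern names are hypothetical.

```python
import re

# Assumed base grammar, for illustration only: optional "-", digits,
# optional fractional part. Not quoted from the spec or the schema.
BASE = r"-?[0-9]+(\.[0-9]+)?"

# Hypothetical pattern that accepts a lowercase "e" exponent only.
FLOAT_LOWER_E_ONLY = re.compile(BASE + r"(e-?[0-9]+)?")

# Hypothetical pattern that accepts either "e" or "E".
FLOAT_EITHER_CASE = re.compile(BASE + r"([eE]-?[0-9]+)?")


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions, "1.5E3" is rejected by the lowercase-only pattern but accepted by the either-case one, which is the kind of divergence the bug report is about.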
  45. # [15:17] * Joins: Sander (svl@86.87.68.167)
  46. # [15:26] <MikeSmith> http://bugzilla.validator.nu/show_bug.cgi?id=345
  47. # [15:26] <pimpbot> 345: peterv@propagandism.org, P2, RESOLVED FIXED, Problem building dom1-html-gen-jsunit
  48. # [15:28] <MikeSmith> hmm, pimpbot seems to think any bugzilla URL is for the W3C bugzilla..
  49. # [15:28] <MikeSmith> hsivonen: anyway, bug filed and patch attached
  50. # [15:35] <MikeSmith> hsivonen: also noticed that there's some leftover cruft still from old repetition-template stuff
  51. # [15:35] * MikeSmith goes to raise another bug
  52. # [16:06] * marcos wonders if anyone on linux can help him out quickly... For the widget spec, I'm trying to work out how different Zip implementations encode characters across various OSs, but I don't have linux installed. It would be great if someone could create an empty ASCII text file and give it the file name "ñ" (no file extension) . Zip it up and email it to me at marcosscaceres@gmail.com. If anyone can help me, it would be greatly appreciated.
  53. # [16:07] <gsnedders> marcos: That'll be locale dependent, I expect
  54. # [16:08] <marcos> shouldn't be.
  55. # [16:08] <gsnedders> The encoding of the file name likely is
  56. # [16:08] <Philip> As far as I'm aware, Linux filesystems treat filenames as raw bytes, and the interpretation as characters is up to the user
  57. # [16:09] <marcos> gsnedders, the thing should go -> locale setting -> Zip encoder -> UTF-8 (zip only supports CP437 or UTF-8)
  58. # [16:09] <marcos> otherwise it will go locale setting -> Zip encoder -> CP437
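
One way to see the UTF-8 versus CP437 choice is to look at what a single Zip writer actually emits. The sketch below uses Python's standard zipfile module, which is only one implementation and not the `zip` tool or the OS archivers discussed in this log. It relies on general-purpose flag bit 11 (0x800), which marks an entry name as UTF-8; the helper name `zip_name_info` is hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def zip_name_info(name):
    """Zip an empty file called `name` in memory.

    Returns the raw archive bytes and whether the entry, when read
    back, carries the UTF-8 filename flag.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"")
    data = buf.getvalue()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]
    return data, bool(info.flag_bits & UTF8_FLAG)


data, is_utf8 = zip_name_info("\u00f1")
# is_utf8 tells whether this writer flagged the name as UTF-8;
# b"\xc3\xb1" in data tells whether those bytes appear in the archive.
```

Other Zip writers may leave the flag unset and store the name in a legacy code page, which is the variation marcos is trying to map.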
  59. # [16:11] <marcos> I'm trying to see if, like in MacOs, file names get normalized to decomposed form.
  60. # [16:11] <marcos> or whether they just get destroyed, as happens in Windows.
  61. # [16:13] <gsnedders> marcos: NFD is used at a file system level on OS X
  62. # [16:14] <marcos> yep, I got that so far :)
  63. # [16:14] <gsnedders> Nothing else does that :)
  64. # [16:16] <marcos> MacOs seems to be the only mainstream implementation of Zip that supports UTF-8 in NFD. So I'm wondering if anyone else has followed their implementation for interop. If they have, then I can mandate NFD in the widget spec.
  65. # [16:17] <marcos> the i18n guys said that, if anything, I should recommend NFC.
  66. # [16:17] <marcos> But I just want to make sure.
  67. # [16:20] * Joins: sryo (sryo@190.245.204.198)
  68. # [16:22] <marcos> Hmmm... gsnedders, it might be even worse. MacOS might actually be using FCD instead of NFD.
  69. # [16:23] <Philip> marcos: On my particular configuration of Linux and version of 'zip', ñ gets stored as 0xC3 0xB1
  70. # [16:23] <marcos> Philip, great! thanks for the info. I'll see what encoding that corresponds to.
  71. # [16:24] <Philip> (I have everything configured to use UTF-8 as far as possible)
  72. # [16:27] <marcos> Philip, just to be sure, when you extract the file again, the ñ gets retained?
  73. # [16:29] * Quits: sryo (sryo@190.245.204.198) (Connection reset by peer)
  74. # [16:29] * Joins: sryo (sryo@190.245.204.198)
  75. # [16:33] <marcos> gsnedders: see http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html. Interesting: "HFS+ will always force its own "canonicalization" (which does not
  76. # [16:33] <marcos> conform to any Unicode standard, except for a related technical note related
  77. # [16:33] <marcos> to "fast decomposition" technics used to perform some internal processing of
  78. # [16:33] <marcos> strings, in which FCC and FCD forms are discussed)."
  79. # [16:35] <Philip> marcos: If I extract it on the same computer, then it does
  80. # [16:35] <Philip> marcos: If I extract it on some other Linux machine, which doesn't seem to be set up to use UTF-8, I get
  81. # [16:35] <Philip> ... "extracting: ñ" from the unzip program
  82. # [16:35] <Philip> ... but "-rw-r--r-- 1 philip philip 0 Dec 7 15:22 ??" if I run 'ls -l'
  83. # [16:36] <marcos> nice
  84. # [16:36] <marcos> argh... I think all this Zip filename encoding stuff is all screwed.
  85. # [16:36] <Philip> The file isn't actually called "??"
  86. # [16:36] <marcos> Yeah, I know.
  87. # [16:36] <marcos> What are the bytes
  88. # [16:36] <marcos> ?
  89. # [16:36] <Philip> i.e. "cat \?\?" doesn't find it, and "touch \?\?" makes a second file
  90. # [16:37] <Philip> but it does have two characters in the filename, i.e. "ls ??" finds that file and "ls ?" doesn't
  91. # [16:37] <Philip> How can I find the bytes?
  92. # [16:38] <marcos> can you zip it up again?
  93. # [16:38] <marcos> or maybe copy/paste the file name into a text file and then check it with a text editor?
  94. # [16:39] <marcos> s/text/hex
  95. # [16:39] <marcos> editor
  96. # [16:39] <Philip> $ zip test2.zip ?? adding: ñ (stored 0%)
  97. # [16:39] <Philip> Argh, stupid newline mangling
  98. # [16:39] <Philip> $ zip test2.zip ??
  99. # [16:39] <Philip> adding: ñ (stored 0%)
  100. # [16:40] <Philip> and then the .zip file contains 0xC3 0xB1 again
  101. # [16:41] <Philip> which seems consistent with the view that filenames are raw bytes, and it's left to the user-visible tools to do whatever encoding/decoding they want
  102. # [16:42] <marcos> oh, this is fun :(
  103. # [16:43] <Philip> (http://docs.python.org/dev/3.0/whatsnew/3.0.html - "Filenames are passed to and returned from APIs as (Unicode) strings. This can present platform-specific problems because on some platforms filenames are arbitrary byte strings."
  104. # [16:43] <pimpbot> Title: Whats New In Python 3.0 - Python v3.1a0 documentation (at docs.python.org)
  105. # [16:43] <marcos> That would certainly explain why I can't match C3 B1 to any character encoding.
  106. # [16:44] <Philip> marcos: Isn't it just UTF-8?
  107. # [16:44] <marcos> no, UTF-8 should be 0x6E 0xCC 0x83 (decomposed, at least). Let me check NFC...
  108. # [16:45] <Philip> Why would it be decomposed?
  109. # [16:45] * Dashiva was wondering the same
  110. # [16:46] <marcos> sorry, still in MacOs world. You are right. It is UTF-8
  111. # [16:46] <marcos> U+00F1 ñ c3 b1 LATIN SMALL LETTER N WITH TILDE
  112. # [16:49] <marcos> Ok, so, in the spec it would be best to either say nothing about NFD or NFC... or recommend NFC.
  113. # [16:51] <Dashiva> Is NFC possible on mac?
  114. # [16:52] <gsnedders> At a FS level? No
  115. # [16:52] <marcos> Doubt it. Apple should fix their Zip implementation.
  116. # [16:52] <gsnedders> Why should they? Does the Zip spec say it should be NFC?
  117. # [16:53] <Philip> Why does it matter?
  118. # [16:53] <marcos> I wanna share my zip files between OSes.
  119. # [16:54] <marcos> the only way I can do that is to use ASCII character names.
  120. # [16:54] <gsnedders> marcos: The OSes should cope with any normalization form :\
  121. # [16:55] <marcos> gsnedders: should the widget engine cope with any normalization form?
  122. # [16:56] <gsnedders> marcos: Yeah
  123. # [16:56] <Dashiva> How can it cope?
  124. # [16:56] <Dashiva> If the filename is raw bytes on OS level
  125. # [16:56] <gsnedders> On OS X it isn't
  126. # [16:56] <gsnedders> On OS X it's a Unicode string
  127. # [16:57] * Philip wonders what it is considered to be on Windows
  128. # [16:57] <Dashiva> But on other OSes
  129. # [16:57] <Philip> (NTFS just stores arbitrary 2-byte characters, as far as I'm aware)
  130. # [16:57] <Philip> (which is quite similar to how it works on Linux, except Linux does 1-byte units instead)
  131. # [16:57] <gsnedders> Dashiva: Other OSes should do it too :P
  132. # [16:58] <Dashiva> Should is fine and well, but meanwhile we're concerned with what works in practice :P
  133. # [16:59] <gsnedders> Dashiva: :P
  134. # [16:59] <gsnedders> Dashiva: Theoretical bullshit ftw!
  135. # [17:00] <marcos> In windows, when I compress a file called "ñ" the byte sequence that represents the file name is 0xA4 0x0B
  136. # [17:01] <marcos> Actually, it's just 0xA4, which is correct for CP437
  137. # [17:02] <marcos> A4 164 ñ 241 LATIN SMALL LETTER N WITH TILDE
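
The byte sequences quoted so far (0xC3 0xB1, 0x6E 0xCC 0x83, and 0xA4) can be reproduced with Python's standard library. This is a minimal sketch for checking the numbers; the helper `hex_bytes` and the variable names are hypothetical.

```python
import unicodedata

N_TILDE = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE


def hex_bytes(text, encoding):
    """Encode text and return its bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


nfc = unicodedata.normalize("NFC", N_TILDE)  # single precomposed code point
nfd = unicodedata.normalize("NFD", N_TILDE)  # "n" plus COMBINING TILDE

utf8_nfc = hex_bytes(nfc, "utf-8")   # what Philip's Linux zip stored
utf8_nfd = hex_bytes(nfd, "utf-8")   # the decomposed form marcos expected
cp437 = hex_bytes(nfc, "cp437")      # what marcos saw on Windows

# Reading the UTF-8 bytes as if they were CP437 yields two unrelated
# characters, which is how a name gets mangled across systems.
mojibake = nfc.encode("utf-8").decode("cp437")
```

The three encodings give three different byte strings for the same visible character, so a byte-for-byte comparison of file names fails across these systems.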
  138. # [17:02] <Philip> gsnedders: So what should those OSes do when dealing with data on old filesystems which aren't using Unicode strings?
  139. # [17:02] <gsnedders> Philip: Deal with it at a FS driver level, hiding it to everything else
  140. # [17:03] <Philip> gsnedders: How can they deal with it when the filesystem driver has no idea what encoding is used for the bytes on disk?
  141. # [17:03] <MikeSmith> marcos, gsnedders, Philip - what are your timezone abbreviations?
  142. # [17:03] <gsnedders> MikeSmith: Z
  143. # [17:03] <Philip> MikeSmith: GMT
  144. # [17:04] <Philip> except when it's BST
  145. # [17:04] <gsnedders> What about UTC?
  146. # [17:04] <marcos> MikeSmith: GMT
  147. # [17:04] <Philip> I don't care about UTC :-p
  148. # [17:05] <Philip> GMT is close enough
  149. # [17:06] * Quits: phenny (phenny@80.68.92.65) (Client exited)
  150. # [17:06] * Joins: phenny (phenny@80.68.92.65)
  151. # [17:06] <MikeSmith> .t gsnedders
  152. # [17:06] <phenny> Sun, 07 Dec 2008 16:04:52 GMT
  153. # [17:06] <gsnedders> Awesome.
  154. # [17:06] <MikeSmith> .t Philip
  155. # [17:06] <phenny> Sun, 07 Dec 2008 16:05:01 GMT
  156. # [17:06] <MikeSmith> .t marcos
  157. # [17:06] <phenny> Sun, 07 Dec 2008 16:05:05 GMT
  158. # [17:06] <Philip> (As far as Linux is concerned, my timezone is actually GB)
  159. # [17:07] <Philip> (which I assume means it'll get the right DST changes)
  160. # [17:07] <Philip> .t phenny
  161. # [17:07] <phenny> Philip: Sorry, I don't know about the 'phenny' timezone.
  162. # [17:07] <Philip> .t GB
  163. # [17:07] <phenny> Sun Dec 7 16:06:05 GMT 2008
  164. # [17:08] <Philip> .t ../../../etc/passwd
  165. # [17:08] <phenny> Philip: Sorry, I don't know about the '../../../etc/passwd' timezone.
  166. # [17:08] <marcos> gsnedders: so, I guess the best thing would be to have a special Zip implementation just for widgets that always stores the file names in some normalized form. Then you do the reverse when unzipping.
  167. # [17:08] <Philip> Hmm, maybe it's not just reading files from /usr/share/zoneinfo :-(
  168. # [17:08] <Philip> marcos: If you need some special implementation, why use Zip at all?
  169. # [17:09] <marcos> Philip: that's kinda what I am trying to get at... trying to work out what the extremes of the problem are.
  170. # [17:10] <marcos> or ways in which the problem can be solved... I'm not seeing another way to solve this.
  171. # [17:11] <marcos> There is too much variation in what Zip implementations do.
  172. # [17:11] <marcos> They are completely incompatible with any character ranges outside ASCII
  173. # [17:12] <Philip> marcos: Is the situation something like: User creates ñ.jpg, and an .html that does <img src=ñ.jpg>, then zips all the files up and calls them a widget, and then tries to view it
  174. # [17:12] <marcos> ATM, in the widget spec, we discourage authors from using characters outside the ASCII range. However, naturally, that really sucks.
  175. # [17:12] <Philip> (so it doesn't matter much what happens when a user unzips the widget, because they're not really going to do that)
  176. # [17:14] <marcos> But what if ñ.jpg was created on your system, then I bring it to mine (Windows) and it becomes ├▒.jpg because your system used UTF-8 and mine is using CP437?
  177. # [17:15] <Philip> What do you mean by "bring"?
  178. # [17:16] <marcos> you email me your widget
  179. # [17:16] <marcos> or zip file.
  180. # [17:16] <Philip> Ah, so zipping+unzipping
  181. # [17:16] <Philip> Is anyone likely to do that with widgets?
  182. # [17:17] <Dashiva> The widget engine has to get to the files somehow
  183. # [17:17] <marcos> philip, that is the whole point of widgets :)
  184. # [17:18] <Philip> The widget engine can be designed specially to do some magic when trying to match filenames, so it doesn't have the same constraints as the OS's standard unzipping tools
  185. # [17:18] <marcos> Philip, that magic is what I'm trying to spec :)
  186. # [17:18] <Dashiva> Encoding sniffing, oh boy
  187. # [17:19] <Dashiva> marcos: You could require all widgets to contain a file named ñ and use that to determine the method :)
  188. # [17:19] <Philip> Do widgets only ever have to match filenames, and never simply decode them? (i.e. the operation is "match(unicode-string, list-of-all-byte-sequence-filenames)" and not "decode-to-unicode(byte-sequence-filename)")
  189. # [17:21] <Dashiva> The filename in the widget could be decomposed utf-8, while it also contains a non-decomposed utf-8 HTML file referencing an image
  190. # [17:21] <marcos> Philip: they need to match file names. They also need to encode the file names as URIs (using the tag:// or widget:// uri scheme) and then decode to whatever encoding the file system has to find the correct files, etc.
  191. # [17:22] <marcos> s/tag:///tag:
  192. # [17:24] <marcos> Hmmm.... maybe I should just leave this as an implementation detail.
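
The "match(unicode-string, list-of-all-byte-sequence-filenames)" operation Philip describes could look roughly like the sketch below. It assumes entry names are either UTF-8 or CP437 (the two encodings marcos mentions), guesses UTF-8 first, and compares in NFC. That guessing order and the function names are assumptions for illustration, not anything the widget spec defines.

```python
import unicodedata


def decode_name(raw):
    """Guess the encoding of a raw Zip entry name.

    Assumption: try UTF-8 first, because CP437 text with non-ASCII
    bytes is rarely valid UTF-8; fall back to CP437 otherwise.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp437")


def match(wanted, raw_names):
    """Return the raw names that denote `wanted`, ignoring whether the
    stored name is composed (NFC) or decomposed (NFD)."""
    key = unicodedata.normalize("NFC", wanted)
    return [
        raw
        for raw in raw_names
        if unicodedata.normalize("NFC", decode_name(raw)) == key
    ]


names = [b"\xc3\xb1.jpg", b"n\xcc\x83.jpg", b"\xa4.jpg", b"n.jpg"]
found = match("\u00f1.jpg", names)
```

Here the first three names all resolve to the same file reference, while the plain ASCII "n.jpg" does not. The heuristic can still misfire on a CP437 name whose bytes happen to form valid UTF-8, which is the sniffing risk Dashiva points at.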
  193. # [17:50] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
  194. # [18:32] * Joins: dbaron (dbaron@71.204.152.23)
  195. # [18:46] <shepazu> .t shepazu
  196. # [18:46] <phenny> Sun, 07 Dec 2008 12:45:21 EST
  197. # [18:46] <shepazu> yay
  198. # [18:55] <gsnedders> shepazu has discovered it's afternoon!
  199. # [18:55] <shepazu> since I just got up, I wasn't sure :)
  200. # [18:55] <shepazu> I've also discovered that MikeSmith got phenny working
  201. # [18:56] <shepazu> .t gsnedders
  202. # [18:56] <phenny> Sun, 07 Dec 2008 17:54:41 GMT
  203. # [19:08] * Joins: Sander (svl@86.87.68.167)
  204. # [19:49] <hsivonen> MikeSmith: thanks. WF2 and HTML5 used to have different floats
  205. # [19:50] * hsivonen is jetlagged in California
  206. # [19:57] * Joins: Zeros (Zeros-Elip@68.50.195.181)
  207. # [20:09] <marcos> .t marcos
  208. # [20:09] <phenny> Sun, 07 Dec 2008 19:08:25 GMT
  209. # [20:10] <marcos> holy crap!! that's amazing! :P
  210. # [20:20] <hsivonen> .t hsivonen
  211. # [20:20] <phenny> Sun, 07 Dec 2008 21:19:10 EET
  212. # [20:27] * Quits: shepazu (schepers@128.30.52.30) (Ping timeout)
  213. # [21:09] * Quits: dbaron (dbaron@71.204.152.23) (Quit: 8403864 bytes have been tenured, next gc will be global.)
  214. # [22:11] * Joins: maddiin (mc@87.185.188.143)
  215. # [22:17] * Joins: dbaron (dbaron@71.204.152.23)
  216. # [22:36] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
  217. # [23:09] * Joins: anne (annevk@213.236.208.22)
  218. # [23:09] * Quits: anne (annevk@213.236.208.22) (Client exited)
  219. # [23:09] * Joins: anne (annevk@213.236.208.22)
  220. # [23:11] * Parts: anne (annevk@213.236.208.22)
  221. # [23:20] * Quits: Lionheart (robin@66.57.69.65) (Quit: Leaving.)
  222. # [23:20] * Joins: Lionheart (robin@66.57.69.65)
  223. # [23:21] * Quits: Lionheart (robin@66.57.69.65) (Connection reset by peer)
  224. # [23:24] * Joins: anne (annevk@213.236.208.22)
  225. # [23:56] * Quits: hober (ted@206.212.254.2) (No route to host)
  226. # Session Close: Mon Dec 08 00:00:00 2008