# Session Start: Sun Dec 07 00:00:00 2008
# Session Ident: #html-wg
# [01:06] * Quits: maddiin (mc@87.185.254.216) (Quit: maddiin)
# [01:07] * Joins: Lionheart (robin@66.57.69.65)
# [02:04] * Joins: dbaron (dbaron@71.204.152.23)
# [03:46] * Quits: sryo (sryo@190.245.204.198) (Ping timeout)
# [03:47] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
# [04:05] * Joins: Zeros (Zeros-Elip@68.50.195.181)
# [04:21] * Quits: Zeros (Zeros-Elip@68.50.195.181) (Quit: Leaving)
# [04:52] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
# [04:57] * Joins: gavin_ (gavin@99.253.193.147)
# [06:10] * Quits: tH (Rob@129.11.83.58) (Quit: ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406])
# [07:48] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
# [07:53] * Joins: gavin_ (gavin@99.253.193.147)
# [08:34] * Quits: dbaron (dbaron@71.204.152.23) (Quit: 8403864 bytes have been tenured, next gc will be global.)
# [09:10] * Joins: Zeros (Zeros-Elip@68.50.195.181)
# [10:07] * Parts: deane (opera@121.72.203.100)
# [10:17] * Quits: Zeros (Zeros-Elip@68.50.195.181) (Quit: Leaving)
# [10:23] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
# [10:28] * Joins: gavin_ (gavin@99.253.193.147)
# [10:31] * Joins: ROBOd (robod@89.122.216.38)
# [10:40] * Joins: hywan (hywan@212.62.167.226)
# [10:40] * Quits: hywan (hywan@212.62.167.226) (Quit: Leaving)
# [10:59] * Joins: deane (opera@121.72.174.207)
# [11:00] * Parts: deane (opera@121.72.174.207)
# [11:30] * Quits: xover (xover@193.157.66.22) (Ping timeout)
# [11:31] * Joins: xover (xover@193.157.66.22)
# [11:36] * Joins: sryo (sryo@190.245.204.198)
# [13:03] * Quits: gavin_ (gavin@99.253.193.147) (Ping timeout)
# [13:09] * Joins: gavin_ (gavin@99.253.193.147)
# [13:47] * Joins: tH (Rob@129.11.83.58)
# [13:57] * Quits: anne (annevk@213.236.208.22) (Ping timeout)
# [14:19] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
# [14:20] * Quits: sryo (sryo@190.245.204.198) (Ping timeout)
# [14:20] <MikeSmith> hsivonen: you there?
# [14:32] <MikeSmith> hsivonen: I notice that the whattf.org schema has both common.data.float and form.data.float, and that microsyntaxes for them differ
# [14:32] <MikeSmith> but the spec has only one definition for "valid floating point number"
# [14:33] <MikeSmith> which form.data.float matches, but common.data.float does not
# [14:34] <MikeSmith> so it seems like there only needs to be just one *.data.float pattern defined, and its definition should match the current form.data.float one
# [14:39] <MikeSmith> same for common.data.float.positive and form.data.float.positive
# [14:40] <MikeSmith> and it seems that the definition for common.data.float.non-negative needs to be revised also
# [14:45] <MikeSmith> so, specifically that there should just be one definition for each, and that their definitions should be w:float-exp, w:float-exp-positive, with the new addition of w:float-exp-non-negative
# [14:46] <MikeSmith> anyway, I'll file a bug at bugzilla.validator.nu
# [14:48] <MikeSmith> also, as far as the definition of float, the spec allows either "E" or "e", but the regexp in the schema comment restricts it to "e" only
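
The "E" versus "e" mismatch described above can be illustrated with a small pattern check. This is a minimal sketch, not the schema's actual regexp: the pattern is an assumed reading of "valid floating point number" (optional sign, digits, optional fraction, optional exponent), and the names `FLOAT_EXP`, `FLOAT_LOWER_E_ONLY` and `is_float` are hypothetical.

```python
import re

# Assumed shape of the float microsyntax: optional sign, digits, optional
# fraction, optional exponent introduced by either "e" or "E".
FLOAT_EXP = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?")

# The same pattern, but accepting only a lowercase "e" in the exponent,
# which is the restriction reported for the regexp in the schema comment.
FLOAT_LOWER_E_ONLY = re.compile(r"-?[0-9]+(\.[0-9]+)?(e[-+]?[0-9]+)?")


def is_float(pattern, text):
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions, `is_float(FLOAT_EXP, "1.5E3")` accepts the value while `is_float(FLOAT_LOWER_E_ONLY, "1.5E3")` rejects it.
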
# [15:17] * Joins: Sander (svl@86.87.68.167)
# [15:26] <MikeSmith> http://bugzilla.validator.nu/show_bug.cgi?id=345
# [15:26] <pimpbot> 345: peterv@propagandism.org, P2, RESOLVED FIXED, Problem building dom1-html-gen-jsunit
# [15:28] <MikeSmith> hmm, pimpbot seems to think any bugzilla URL is a link to W3C bugzilla..
# [15:28] <MikeSmith> hsivonen: anyway, bug filed and patch attached
# [15:35] <MikeSmith> hsivonen: also noticed that there's some leftover cruft still from old repetition-template stuff
# [15:35] * MikeSmith goes to raise another bug
# [16:06] * marcos wonders if anyone on linux can help him out quickly... For the widget spec, I'm trying to work out how different Zip implementations encode characters across various OSs, but I don't have linux installed. It would be great if someone could create an empty ASCII text file and give it the file name "ñ" (no file extension) . Zip it up and email it to me at marcosscaceres@gmail.com. If anyone can help me, it would be greatly appreciated.
# [16:07] <gsnedders> marcos: That'll be locale dependent, I expect
# [16:08] <marcos> shouldn't be.
# [16:08] <gsnedders> The encoding of the file name likely is
# [16:08] <Philip> As far as I'm aware, Linux filesystems treat filenames as raw bytes, and the interpretation as characters is up to the user
# [16:09] <marcos> gsnedders, the thing should go -> locale setting -> Zip encoder -> UTF-8 (zip only supports CP437 or UTF-8)
# [16:09] <marcos> otherwise it will go locale setting -> Zip encoder -> CP437
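
The CP437-or-UTF-8 choice is signalled in the zip entry itself: bit 11 (0x800) of the general purpose flags marks a UTF-8 file name. The sketch below shows how Python's standard `zipfile` module behaves when asked to store a non-ASCII name. It says nothing about what Info-ZIP, Windows or Mac OS tools do, and the helper name `zip_name_info` is hypothetical.

```python
import io
import zipfile


def zip_name_info(filename):
    """Zip an empty member in memory and report how its name was stored."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"")
    data = buf.getvalue()
    # Re-open the archive and read the flags recorded for the member.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]
    # Bit 11 (0x800) of the general purpose flags marks a UTF-8 name.
    utf8_flag = bool(info.flag_bits & 0x800)
    return utf8_flag, data


utf8_flag, archive_bytes = zip_name_info("\u00f1")  # a file named "ñ"
print(utf8_flag, b"\xc3\xb1" in archive_bytes)
```
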
# [16:11] <marcos> I'm trying to see if, like in MacOs, file names get normalized to decomposed form.
# [16:11] <marcos> or whether they just get destroyed, as happens in Windows.
# [16:13] <gsnedders> marcos: NFD is used at a file system level on OS X
# [16:14] <marcos> yep, I got that so far :)
# [16:14] <gsnedders> Nothing else does that :)
# [16:16] <marcos> MacOs seems to be the only mainstream implementation of Zip that supports UTF-8 in NFD. So I'm wondering if anyone else has followed their implementation of interop. If they have, then I can mandate NFD in the widget spec.
# [16:17] <marcos> the i18n guys said that, if anything, I should recommend NFC.
# [16:17] <marcos> But I just want to make sure.
# [16:20] * Joins: sryo (sryo@190.245.204.198)
# [16:22] <marcos> Hmmm... gsnedders, it might be even worse. MacOS might actually be using FCD instead of NFD.
# [16:23] <Philip> marcos: On my particular configuration of Linux and version of 'zip', ñ gets stored as 0xC3 0xB1
# [16:23] <marcos> Philip, great! thanks for the info. I'll see what encoding that corresponds to.
# [16:24] <Philip> (I have everything configured to use UTF-8 as far as possible)
# [16:27] <marcos> Philip, just to be sure, when you extract the file again, the ñ gets retained?
# [16:29] * Quits: sryo (sryo@190.245.204.198) (Connection reset by peer)
# [16:29] * Joins: sryo (sryo@190.245.204.198)
# [16:33] <marcos> gsnedders: see http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html. Interesting: "HFS+ will always force its own "canonicalization" (which does not
# [16:33] <marcos> conform to any Unicode standard, except for a related technical note related
# [16:33] <marcos> to "fast decomposition" technics used to perform some internal processing of
# [16:33] <marcos> strings, in which FCC and FCD forms are discussed)."
# [16:35] <Philip> marcos: If I extract it on the same computer, then it does
# [16:35] <Philip> marcos: If I extract it on some other Linux machine, which doesn't seem to be set up to use UTF-8, I get
# [16:35] <Philip> ... "extracting: ñ" from the unzip program
# [16:35] <Philip> ... but "-rw-r--r-- 1 philip philip 0 Dec 7 15:22 ??" if I run 'ls -l'
# [16:36] <marcos> nice
# [16:36] <marcos> argh... I think all this Zip filename encoding stuff is all screwed.
# [16:36] <Philip> The file isn't actually called "??"
# [16:36] <marcos> Yeah, I know.
# [16:36] <marcos> What are the bytes
# [16:36] <marcos> ?
# [16:36] <Philip> i.e. "cat \?\?" doesn't find it, and "touch \?\?" makes a second file
# [16:37] <Philip> but it does have two characters in the filename, i.e. "ls ??" finds that file and "ls ?" doesn't
# [16:37] <Philip> How can I find the bytes?
# [16:38] <marcos> can you zip it up again?
# [16:38] <marcos> or maybe copy/paste the file name into a text file and then check it with a text editor?
# [16:39] <marcos> s/text/hex
# [16:39] <marcos> editor
# [16:39] <Philip> $ zip test2.zip ?? adding: ñ (stored 0%)
# [16:39] <Philip> Argh, stupid newline mangling
# [16:39] <Philip> $ zip test2.zip ??
# [16:39] <Philip> adding: ñ (stored 0%)
# [16:40] <Philip> and then the .zip file contains 0xC3 0xB1 again
# [16:41] <Philip> which seems consistent with the view that filenames are raw bytes, and it's left to the user-visible tools to do whatever encoding/decoding they want
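
The "raw bytes" view can be sketched in a few lines: the same two bytes yield one character or two depending on which decoder a tool applies. Latin-1 is an assumption here, chosen only as an example of a single-byte encoding; the log does not say how the second machine was configured.

```python
# The two bytes Philip found stored as the entry name.
raw_name = b"\xc3\xb1"

# Interpreted as UTF-8, the bytes form a single character.
as_utf8 = raw_name.decode("utf-8")

# Interpreted as a single-byte encoding (Latin-1 assumed for illustration),
# the same bytes form two characters, consistent with "ls ??" matching.
as_latin1 = raw_name.decode("latin-1")

print(len(as_utf8), len(as_latin1))
```
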
# [16:42] <marcos> oh, this is fun :(
# [16:43] <Philip> (http://docs.python.org/dev/3.0/whatsnew/3.0.html - "Filenames are passed to and returned from APIs as (Unicode) strings. This can present platform-specific problems because on some platforms filenames are arbitrary byte strings.")
# [16:43] <pimpbot> Title: Whats New In Python 3.0 — Python v3.1a0 documentation (at docs.python.org)
# [16:43] <marcos> That would certainly explain why I can't match C3 B1 to any character encoding.
# [16:44] <Philip> marcos: Isn't it just UTF-8?
# [16:44] <marcos> no, UTF-8 should be 0x6E 0xCC 0x83 (decomposed, at least). Let me check NFC...
# [16:45] <Philip> Why would it be decomposed?
# [16:45] * Dashiva was wondering the same
# [16:46] <marcos> sorry, still in MacOs world. You are right. It is UTF-8
# [16:46] <marcos> U+00F1 ñ c3 b1 LATIN SMALL LETTER N WITH TILDE
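
The two byte sequences discussed above are both UTF-8; they differ only in normalization form. A minimal sketch using Python's `unicodedata` module (variable names are illustrative):

```python
import unicodedata

n_tilde = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE

# Composed form (NFC): a single code point, U+00F1.
nfc = unicodedata.normalize("NFC", n_tilde).encode("utf-8")

# Decomposed form (NFD): "n" followed by U+0303 COMBINING TILDE.
nfd = unicodedata.normalize("NFD", n_tilde).encode("utf-8")

print(nfc.hex(" "))  # c3 b1
print(nfd.hex(" "))  # 6e cc 83
```
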
# [16:49] <marcos> Ok, so, in the spec it would be best to either say nothing about NFD or NFC... or recommend NFC.
# [16:51] <Dashiva> Is NFC possible on mac?
# [16:52] <gsnedders> At a FS level? No
# [16:52] <marcos> Doubt it. Apple should fix their Zip implementation.
# [16:52] <gsnedders> Why should they? Does the Zip spec say it should be NFC?
# [16:53] <Philip> Why does it matter?
# [16:53] <marcos> I wanna share my zip files between OSs.
# [16:54] <marcos> the only way I can do that is to use ASCII character names.
# [16:54] <gsnedders> marcos: The OSes should cope with any normalization form :\
# [16:55] <marcos> gsnedders: should the widget engine cope with any normalization form?
# [16:56] <gsnedders> marcos: Yeah
# [16:56] <Dashiva> How can it cope?
# [16:56] <Dashiva> If the filename is raw bytes on OS level
# [16:56] <gsnedders> On OS X it isn't
# [16:56] <gsnedders> On OS X it's a Unicode string
# [16:57] * Philip wonders what it is considered to be on Windows
# [16:57] <Dashiva> But on other OSes
# [16:57] <Philip> (NTFS just stores arbitrary 2-byte characters, as far as I'm aware)
# [16:57] <Philip> (which is quite similar to how it works on Linux, except Linux does 1-byte units instead)
# [16:57] <gsnedders> Dashiva: Other OSes should do it too :P
# [16:58] <Dashiva> Should is fine and well, but meanwhile we're concerned with what works in practice :P
# [16:59] <gsnedders> Dashiva: :P
# [16:59] <gsnedders> Dashiva: Theoretical bullshit ftw!
# [17:00] <marcos> In windows, when I compress a file called "ñ" the byte sequence that represents the file name is 0xA4 0x0B
# [17:01] <marcos> Actually, it's just 0xA4, which is correct for CP437
# [17:02] <marcos> A4 164 ñ 241 LATIN SMALL LETTER N WITH TILDE
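
The CP437 value can be checked the same way. This sketch relies on Python's built-in `cp437` codec and also shows what the UTF-8 bytes look like when a tool reads them as CP437; it is an illustration, not a description of what any particular unzip tool does.

```python
n_tilde = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE

# In CP437 the character is the single byte 0xA4.
cp437_bytes = n_tilde.encode("cp437")

# In UTF-8 it is the two bytes 0xC3 0xB1.
utf8_bytes = n_tilde.encode("utf-8")

# Reading the UTF-8 bytes as CP437 yields two unrelated characters.
misread = utf8_bytes.decode("cp437")

print(cp437_bytes.hex(), utf8_bytes.hex(" "), len(misread))
```
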
# [17:02] <Philip> gsnedders: So what should those OSes do when dealing with data on old filesystems which aren't using Unicode strings?
# [17:02] <gsnedders> Philip: Deal with it at a FS driver level, hiding it from everything else
# [17:03] <Philip> gsnedders: How can they deal with it when the filesystem driver has no idea what encoding is used for the bytes on disk?
# [17:03] <MikeSmith> marcos, gsnedders, Philip - what are your timezone abbreviations?
# [17:03] <gsnedders> MikeSmith: Z
# [17:03] <Philip> MikeSmith: GMT
# [17:04] <Philip> except when it's BST
# [17:04] <gsnedders> What about UTC?
# [17:04] <marcos> MikeSmith: GMT
# [17:04] <Philip> I don't care about UTC :-p
# [17:05] <Philip> GMT is close enough
# [17:06] * Quits: phenny (phenny@80.68.92.65) (Client exited)
# [17:06] * Joins: phenny (phenny@80.68.92.65)
# [17:06] <MikeSmith> .t gsnedders
# [17:06] <phenny> Sun, 07 Dec 2008 16:04:52 GMT
# [17:06] <gsnedders> Awesome.
# [17:06] <MikeSmith> .t Philip
# [17:06] <phenny> Sun, 07 Dec 2008 16:05:01 GMT
# [17:06] <MikeSmith> .t marcos
# [17:06] <phenny> Sun, 07 Dec 2008 16:05:05 GMT
# [17:06] <Philip> (As far as Linux is concerned, my timezone is actually GB)
# [17:07] <Philip> (which I assume means it'll get the right DST changes)
# [17:07] <Philip> .t phenny
# [17:07] <phenny> Philip: Sorry, I don't know about the 'phenny' timezone.
# [17:07] <Philip> .t GB
# [17:07] <phenny> Sun Dec 7 16:06:05 GMT 2008
# [17:08] <Philip> .t ../../../etc/passwd
# [17:08] <phenny> Philip: Sorry, I don't know about the '../../../etc/passwd' timezone.
# [17:08] <marcos> gsnedders: so, I guess the best thing would be to have a special Zip implementation just for widgets that always stores the file names in some normalized form. Then you do the reverse when unzipping.
# [17:08] <Philip> Hmm, maybe it's not just reading files from /usr/share/zoneinfo :-(
# [17:08] <Philip> marcos: If you need some special implementation, why use Zip at all?
# [17:09] <marcos> Philip: that's kinda what I am trying to get at... trying to work out what the extremes of the problem are.
# [17:10] <marcos> or ways in which the problem can be solved... I'm not seeing another way to solve this.
# [17:11] <marcos> There is too much variation in what Zip implementations do.
# [17:11] <marcos> They are completely incompatible with any character ranges outside ASCII
# [17:12] <Philip> marcos: Is the situation something like: User creates ñ.jpg, and an .html that does <img src=ñ.jpg>, then zips all the files up and calls them a widget, and then tries to view it
# [17:12] <marcos> ATM, in the widget spec, we discourage authors from using characters outside the ASCII range. However, naturally, that really sucks.
# [17:12] <Philip> (so it doesn't matter much what happens when a user unzips the widget, because they're not really going to do that)
# [17:14] <marcos> But what if ñ.jpg was created on your system, then I bring it to mine (Windows) and it becomes Ã±.jpg because your system used UTF-8 and mine is using CP437?
# [17:15] <Philip> What do you mean by "bring"?
# [17:16] <marcos> you email me your widget
# [17:16] <marcos> or zip file.
# [17:16] <Philip> Ah, so zipping+unzipping
# [17:16] <Philip> Is anyone likely to do that with widgets?
# [17:17] <Dashiva> The widget engine has to get to the files somehow
# [17:17] <marcos> philip, that is the whole point of widgets :)
# [17:18] <Philip> The widget engine can be designed specially to do some magic when trying to match filenames, so it doesn't have the same constraints as the OS's standard unzipping tools
# [17:18] <marcos> Philip, that magic is what I'm trying to spec :)
# [17:18] <Dashiva> Encoding sniffing, oh boy
# [17:19] <Dashiva> marcos: You could require all widgets to contain a file named ñ and use that to determine the method :)
# [17:19] <Philip> Do widgets only ever have to match filenames, and never simply decode them? (i.e. the operation is "match(unicode-string, list-of-all-byte-sequence-filenames)" and not "decode-to-unicode(byte-sequence-filename)")
# [17:21] <Dashiva> The filename in the widget could be decomposed utf-8, while it also contains a non-decomposed utf-8 HTML file referencing an image
# [17:21] <marcos> Philip: they need to match file names. They also need to encode the file names as URIs (using the tag:// or widget:// uri scheme) and then decode to whatever encoding the file system has to find the correct files, etc.
# [17:22] <marcos> s/tag:///tag:
# [17:24] <marcos> Hmmm.... maybe I should just leave this as an implementation detail.
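
One way the "match(unicode-string, list-of-all-byte-sequence-filenames)" operation could look is sketched below. It is only an illustration under stated assumptions, not anything the widget spec defines: entry names are assumed to be UTF-8 when the zip UTF-8 flag is set and CP437 otherwise, both sides are normalized to NFC before comparing, and the names `decode_entry_name` and `match` are hypothetical.

```python
import unicodedata


def decode_entry_name(raw, utf8_flag):
    """Decode a zip entry name: UTF-8 if flagged, otherwise CP437 (assumed)."""
    return raw.decode("utf-8" if utf8_flag else "cp437")


def match(reference, entries):
    """Return the raw entry name matching a Unicode reference, or None.

    `entries` is a list of (raw_bytes, utf8_flag) pairs. Both sides are
    normalized to NFC, so composed and decomposed spellings compare equal.
    """
    wanted = unicodedata.normalize("NFC", reference)
    for raw, utf8_flag in entries:
        try:
            name = decode_entry_name(raw, utf8_flag)
        except UnicodeDecodeError:
            continue  # undecodable name: cannot match any reference
        if unicodedata.normalize("NFC", name) == wanted:
            return raw
    return None


# A decomposed UTF-8 name and a CP437 name, both containing "ñ".
entries = [(b"n\xcc\x83.jpg", True), (b"\xa4.png", False)]
print(match("\u00f1.jpg", entries), match("\u00f1.png", entries))
```

Matching on normalized strings avoids having to pick NFC or NFD for stored names, at the cost of treating canonically equivalent names as the same file.
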
# [17:50] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
# [18:32] * Joins: dbaron (dbaron@71.204.152.23)
# [18:46] <shepazu> .t shepazu
# [18:46] <phenny> Sun, 07 Dec 2008 12:45:21 EST
# [18:46] <shepazu> yay
# [18:55] <gsnedders> shepazu has discovered it's afternoon!
# [18:55] <shepazu> since I just got up, I wasn't sure :)
# [18:55] <shepazu> I've also discovered that MikeSmith got phenny working
# [18:56] <shepazu> .t gsnedders
# [18:56] <phenny> Sun, 07 Dec 2008 17:54:41 GMT
# [19:08] * Joins: Sander (svl@86.87.68.167)
# [19:49] <hsivonen> MikeSmith: thanks. WF2 and HTML5 used to have different floats
# [19:50] * hsivonen is jetlagged in California
# [19:57] * Joins: Zeros (Zeros-Elip@68.50.195.181)
# [20:09] <marcos> .t marcos
# [20:09] <phenny> Sun, 07 Dec 2008 19:08:25 GMT
# [20:10] <marcos> holy crap!! that's amazing! :P
# [20:20] <hsivonen> .t hsivonen
# [20:20] <phenny> Sun, 07 Dec 2008 21:19:10 EET
# [20:27] * Quits: shepazu (schepers@128.30.52.30) (Ping timeout)
# [21:09] * Quits: dbaron (dbaron@71.204.152.23) (Quit: 8403864 bytes have been tenured, next gc will be global.)
# [22:11] * Joins: maddiin (mc@87.185.188.143)
# [22:17] * Joins: dbaron (dbaron@71.204.152.23)
# [22:36] * Quits: ROBOd (robod@89.122.216.38) (Quit: http://www.robodesign.ro )
# [23:09] * Joins: anne (annevk@213.236.208.22)
# [23:09] * Quits: anne (annevk@213.236.208.22) (Client exited)
# [23:09] * Joins: anne (annevk@213.236.208.22)
# [23:11] * Parts: anne (annevk@213.236.208.22)
# [23:20] * Quits: Lionheart (robin@66.57.69.65) (Quit: Leaving.)
# [23:20] * Joins: Lionheart (robin@66.57.69.65)
# [23:21] * Quits: Lionheart (robin@66.57.69.65) (Connection reset by peer)
# [23:24] * Joins: anne (annevk@213.236.208.22)
# [23:56] * Quits: hober (ted@206.212.254.2) (No route to host)
# Session Close: Mon Dec 08 00:00:00 2008