- # Session Start: Sat Aug 30 00:00:00 2008
- # Session Ident: #whatwg
- # [00:03] <Hixie> oh great, anne's told everyone that the real reason for websocket is my model railway :-P
- # [00:05] <annevk> anecdotical evidence to feed the flames :p
- # [00:06] <Lachy> BenMillard, ok. reading it now
- # [00:06] <Hixie> annevk: good interview
- # [00:07] <annevk> ta
- # [00:07] <annevk> gsnedders, so are you putting up a Web service tonight?
- # [00:07] <BenMillard> smedero has sent me a review of the message. it's enlightening but I think it doesn't change what my message says
- # [00:07] <smedero> no, it shouldn't
- # [00:08] <smedero> it is just backstory, since you probably haven't been following the telecons....
- # [00:08] <BenMillard> yes, thanks for providing me with that
- # [00:14] * Quits: Amorphous (i=jan@unaffiliated/amorphous) (Connection timed out)
- # [00:15] <jacobolus> are unescaped ampersands allowed in html?
- # [00:16] <annevk> sometimes
- # [00:17] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
- # [00:18] <jacobolus> the validator seems happy to accept “<!DOCTYPE html><title></title>&”
- # [00:19] <annevk> that's correct (not sure if the syntax section accurately reflects it currently)
- # [00:20] <jacobolus> I didn't look super carefully, but I only noticed a mention w.r.t. xml
- # [00:20] <annevk> jacobolus, it does not need to be escaped when followed by a space, EOF, another ampersand, start tag, end tag, comment
- # [00:20] <jacobolus> ah, okay
- # [00:21] <jacobolus> “<!DOCTYPE html><title></title>AT&T” properly fails then
- # [00:21] <annevk> yup
- # [00:22] <annevk> hsivonen, "Probable cause: & should have been escaped as &D." what is the "D" doing there?
- # [00:23] <BenMillard> annevk, nothing much. :P
- # [00:24] <jacobolus> ah, it is described in detail, but should maybe be more explicit what authors should do?
- # [00:24] <jacobolus> (i.e. what implementors should do is described in detail)
- # [00:25] <annevk> section 8.1 describes in detail what authors should do
- # [00:25] <jacobolus> oh, nevermind
- # [00:25] <annevk> though it seems it's not entirely clear on things
- # [00:25] <jacobolus> the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand.
- # [00:26] <jacobolus> okay, that's reasonable :)
- # [00:26] * Joins: othermaciej (n=mjs@17.203.15.200)
- # [00:26] <jacobolus> http://www.whatwg.org/specs/web-apps/current-work/#ambiguous doesn't mention EOF :)
- # [00:27] <BenMillard> Lachy, are you double-checking my numbers? if so, thanks!
- # [00:28] <annevk> jacobolus, yeah, whatwg@whatwg.org ;)
- # [00:29] <Lachy> BenMillard, I suppose I can go through it again and do that. But let me finish it at least once first... I'm in the middle of doing some other things too
- # [00:29] <BenMillard> Lachy, sure thing...I don't wish to monopolise your time
- # [00:30] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [00:31] * Joins: tndH (i=Rob@adsl-77-86-6-71.karoo.KCOM.COM)
- # [00:33] <Lachy> BenMillard, btw, I found a rather interesting table that might be worth studying. But it requires membership to see it on this site http://www.newzbin.com/
- # [00:34] <BenMillard> Lachy, would it be permitted for you to extract the table and send it to me for analysis and publishing in my collection? my work is non-commercial research so I imagine that would qualify as "fair use"
- # [00:34] * annevk summons zcorpan
- # [00:35] * BenMillard hopes zcorpan will arrive with a witty /me line...
- # [00:35] <Lachy> BenMillard, sure. I'll do it after I'm done with your email
- # [00:36] <BenMillard> Lachy, cool. I'll go start some dinner :)
- # [00:39] <gsnedders> annevk: As I said, if anyone writes it :P
- # [00:40] <gsnedders> BenMillard: Peh! Silly logn emails you want me to read!
- # [00:40] <gsnedders> *long
- # [00:41] * Quits: smedero (n=smedero@mdp-nat251.mdp.com)
- # [00:42] <Lachy> gsnedders, you could just pretend to read it, skim it and post random comments about things you see to make him think you're really reading it. ;-)
- # [00:42] <gsnedders> Lachy: True.
- # [00:42] * gsnedders hopes all this crazy insane ECMAScript optimization work shows through in other interpreted languages
- # [00:45] <annevk> gsnedders, oh I see
- # [00:45] <annevk> well, now it's too late
- # [00:45] <annevk> I guess tomorrow I can download the relevant packages and try making it work, shouldn't be too hard
- # [00:45] <gsnedders> annevk: It's probably worth working from what I'll put in hg tomorrow
- # [00:46] <gsnedders> Actually, if you want to work on it in the morning, I could do that now
- # [00:48] * jgraham wonders if there is any point in arguing process with CW
- # [00:48] <BenMillard> gsnedders, there's no obligation and no salesperson will call. :D
- # [00:49] <jgraham> I think I'll fix html5lib first
- # [00:49] <gsnedders> Hixie: PostScript is a turing-complete language
- # [00:49] <annevk> gsnedders, that doesn't mean it will cause a risk
- # [00:49] * Parts: michaeln (n=michaeln@nat/google/x-f4ccb7dae09b7c33)
- # [00:49] * gsnedders doesn't know enough to know that
- # [00:49] * jgraham saw a ~3 line postscript file that did a ray tracing image of a ball on a chessboard or something once
- # [00:49] <gsnedders> I'm just pointing out what I know :P
- # [00:49] <Philip`> Turing machines with no IO devices are a bit rubbish in practice
- # [00:50] * gsnedders passes Philip` an IO device
- # [00:50] <Philip`> I suppose you could try to make them jump backwards and forwards really fast so their tape catches fire
- # [00:52] <gsnedders> Philip`: Problem with that is next to no-one actually implements a turing-machine using tape
- # [00:53] <Philip`> Partly because it'd be physically impossible to implement one at all
- # [00:53] <Lachy> BenMillard, your byte counts for the sizes of the tables seem to be a little off
- # [00:53] <BenMillard> Lachy, perhaps there was a bug during the copying...
- # [00:54] <Lachy> I saved both files using wget, stripped out all markup before and after the <table>...</table> and resaved the files
- # [00:54] <BenMillard> Lachy, what numbers do you get and by which method? I expect the error is on my part
- # [00:54] <Lachy> For noscope.html, I get 1659, and for complexdatatable.html, I get 2591
- # [00:56] <BenMillard> hmm, what types of line endings do you see? I see 2 characters per newline, which might be my error
- # [00:56] <Lachy> what method did you use to get the sizes?
- # [00:57] <BenMillard> I viewed source in Firefox 2, copied and pasted into a plain text editor, then selected the text from the start of "<table" to the end of "</table>" and read off what the statusbar said
- # [00:57] <BenMillard> your method sounds better to me :)
- # [00:57] * gsnedders reads the start of BenMillard's email and turns off
- # [00:57] <Lachy> line endings are CRLF
- # [00:58] <BenMillard> Lachy, same here
- # [00:58] <Lachy> the problem with copying and pasting from firefox's view source is that it doesn't copy properly
- # [00:59] <Lachy> it frequently adds in extra lines in random places
- # [00:59] <BenMillard> Lachy, I now measure 2,625 for the complexdatatable.html
- # [00:59] <BenMillard> Lachy, yes I removed the empty lines
- # [00:59] <BenMillard> I'm happy to steal your numbers if that's OK by you :)
- # [01:00] <Lachy> no, those numbers are copyrighted to me! :-)
- # [01:01] <BenMillard> Lachy, perhaps you could e-mail me the files you downloaded and the cropped versions, then I can see where the difference is
- # [01:01] <BenMillard> thanks for going to the trouble to do this, btw
- # [01:03] <Lachy> http://lachy.id.au/temp/tables.zip
- # [01:05] <BenMillard> Lachy, both the text editor's selection and Windows Explorer agree with your numbers
- # [01:06] <Lachy> did you find the diffs with your files?
- # [01:06] * Quits: billmason (n=billmaso@ip75.unival.com) (Read error: 104 (Connection reset by peer))
- # [01:07] <BenMillard> Lachy, in noscope.html I see class="header" on the <td> for each of the 3 dates
- # [01:07] <jgraham> argh
- # [01:08] <gsnedders> I probably ought to do the work I'm meant to do for Computing
- # [01:08] <gsnedders> I'm really getting quite far behind
- # [01:08] <gsnedders> Though de-facto as long as I've done everything I'm meant to by December it doesn't really matter when I do it
- # [01:08] <BenMillard> gsnedders, do prioritise things above this e-mail. I might not send for a day or two yet
- # [01:09] <gsnedders> BenMillard: I've just been dealing with other low priority email
- # [01:09] <Lachy> why? I took my files directly from juicystudio, and didn't modify anything else
- # [01:09] <gsnedders> BenMillard: From several months ago :)
- # [01:09] <BenMillard> Lachy, you see the class="headers" here? http://juicystudio.com/wcag/tables/noscope.html
- # [01:09] <jgraham> Why did someone think it was a good idea for lxml to add a random doctype when parsing html documents?
- # [01:09] <BenMillard> sorry, class="header": "<td class="header">12/12/2005</td>"
- # [01:09] <gsnedders> If you send me something low priority and don't get a reply within an hour or two, it'll probably take a few weeks or months :)
- # [01:10] <gsnedders> jgraham: Because libxml2's HTML support is just a big hack
- # [01:10] <jgraham> gsnedders: The problem is that their XML support tries to enforce XML rules
- # [01:10] <BenMillard> Lachy, the 3 instances of class="header" are also missing from your complexdatatable.html
- # [01:10] <jgraham> Like no : in tag names
- # [01:11] <jgraham> s/tag/attribute/
- # [01:11] <Lachy> Looks like they've just been removed from those files
- # [01:11] <Lachy> press reload
- # [01:11] * Quits: Maurice (i=copyman@cc90688-a.emmen1.dr.home.nl) ("Disconnected...")
- # [01:11] <BenMillard> Lachy: yes, you're right
- # [01:11] <BenMillard> that's annoying
- # [01:11] <Lachy> I did at first, but that must have been cached
- # [01:11] <BenMillard> they're changing the record whilst I'm replying to it :(
- # [01:11] <Lachy> yeah, he's destroying the evidence
- # [01:12] * Joins: tantek (n=tantek@adsl-68-123-180-62.dsl.pltn13.pacbell.net)
- # [01:12] <BenMillard> Lachy, OK, so if you saw those attributes were present then those are the values I'll give since they are historically accurate
- # [01:13] <jgraham> Those attributes were present for sure. I think I mentioned it in an email
- # [01:13] <gsnedders> Why hasn't anyone created a decent Flickr downloader yet?
- # [01:13] <gsnedders> That is like, easy to use.
- # [01:13] <jgraham> gsnedders: What do you mean decent?
- # [01:13] <Philip`> gsnedders: Like, a web browser?
- # [01:13] <gsnedders> Philip`: But to download an entire set?
- # [01:13] <Philip`> gsnedders: Oh
- # [01:13] <BenMillard> Lachy, the absence of class="header" makes our numbers match, so at least I haven't forgotten how to count. :)
- # [01:13] <gsnedders> jgraham: Not having a crazily complex UI. Copy and pasting a URL from a browser would work fine.
- # [01:14] <Philip`> gsnedders: Write a few (dozen) lines of script to use their API?
- # [01:14] <Lachy> ok, so my values are wrong cause they're the new values without the classes
- # [01:14] <BenMillard> Lachy, correct
- # [01:14] <gsnedders> Philip`: Because I need something that works on my uncle's computer, so I could just write a web API
- # [01:14] <gsnedders> *web interface
- # [01:14] * gsnedders yawns
- # [01:15] <Lachy> BenMillard, there are only 18 header cells by my count, not 20
- # [01:15] <BenMillard> Lachy, 12 in the <thead>, agreed?
- # [01:15] <Lachy> yes
- # [01:15] <BenMillard> Lachy, 2 in the first column of the <tbody>?
- # [01:16] <Lachy> plus the 6 for budgeted, actual and forecasted in the column
- # [01:16] <BenMillard> Lachy, 6 in column 7 as you say
- # [01:16] <BenMillard> Lachy, yep, I've forgotten how to add them up :)
- # [01:16] <Lachy> ah, I didn't count those first 2 as headers
- # [01:16] <BenMillard> oh wait, 12 + 6 + 2 = 20
- # [01:16] <Lachy> the "Partner Portal" ones?
- # [01:16] <BenMillard> Lachy, yeah
- # [01:17] <BenMillard> they are associated as being row headers
- # [01:18] <Lachy> in complexdatatable.html, they are. But in noscope.html, there's nothing that indicates they are headers
- # [01:18] <BenMillard> "<td scope="row" id="row1" rowspan="3">Partner Portal</td>" in http://juicystudio.com/wcag/tables/complexdatatable.html
- # [01:18] <BenMillard> Lachy, yeah, later in my e-mail I mention that test 1 is unfair
- # [01:18] <Lachy> ok
- # [01:18] <BenMillard> and that scope="row" was used in test 3 instead of using headers+id for all the associations
- # [01:18] <Lachy> you mean test 2
- # [01:19] <BenMillard> Lachy, oh sorry you're right
- # [01:19] <BenMillard> scope="row" is used in addition to headers+id in test 3
- # [01:19] <Lachy> oh, what are the final byte counts you used? I should check the percentage given too
- # [01:20] <BenMillard> Lachy, 1,704 and 2,625. yes, I have made percentage errors before now :)
- # [01:22] <Lachy> I get 54.05%
- # [01:22] <BenMillard> Lachy, when talking about test file 3, I say "5 cells use <td scope> and participate in headers+id, duplicating the association." so my e-mail about that aspect is correct, I just got muddled during this review
- # [01:23] <BenMillard> Lachy, what is your calculation for that? Maybe I've forgotten percentage increase math...
- # [01:24] <Lachy> (2625 - 1704) / 1704 * 100 = 921 / 1704 * 100 = 54.05%
- # [01:24] <BenMillard> hmm, that's more complicated than what I did :)
- # [01:24] <Lachy> what did you do?
- # [01:25] <BenMillard> just now I tried 2,625 / 1,704 = 1.5404 so yours looks right
- # [01:25] <BenMillard> maybe I typod my sum first time round
- # [01:25] <Lachy> I assumed I needed to find the difference, and then find out what percentage that difference was with the lower value
- # [01:26] <BenMillard> Lachy, so me saying 36% more markup was understating the code bloat by quite a bit! thanks for spotting that
- # [01:26] <Lachy> our numbers are consistent. Yours (1.5404) means that 2625 is 154% the size of 1704
- # [01:27] <Lachy> whereas mine says it's 54% larger
- # [01:27] <BenMillard> Lachy, yeah that's how I interpret it
- # [01:27] <Philip`> (Markup-size doesn't seem a very interesting measure when these tables are probably generated by programs from databases, and no human ever needs to look at the markup, and simplicity of implementing the table-generating code seems much more relevant)
- # [01:28] <BenMillard> Philip`, I've seen auto-generated headers+id, for sure
- # [01:28] <Lachy> Philip`, throwing lots of data at people, regardless of how relevant it is, is a useful technique for winning an argument :-)
- # [01:28] <BenMillard> I've also seen typoed headers+id
- # [01:29] <Lachy> fyorfty percent of all people know that, Kent
- # [01:29] <Philip`> Lachy: Winning an argument is not the aim; the aim is to design the best possible system :-p
- # [01:29] <BenMillard> Philip`, it's also worth considering that if the generating code can be radically simpler (such as just using <th> for all headers) that reduces the likelihood of bugs in the table
- # [01:30] <Lachy> s/fyorfty/forfty/ (I messed the simpsons quote :-))
- # [01:31] <Philip`> BenMillard: It's good to encourage people to do the simplest thing, but sometimes they just have complex tables, so I thought the issue was how to support the most complex tables (e.g. whether to force them to use <th> instead of <td>)
- # [01:31] <BenMillard> Philip`, that's right. So if a table can be supported by plain <th> using a sane association algorithm, that's preferable over the complexity and bloat of headers+id, in my judgement.
- # [01:32] * Quits: tantek (n=tantek@adsl-68-123-180-62.dsl.pltn13.pacbell.net) (Connection reset by peer)
- # [01:32] <BenMillard> but I can well imagine irregular tables will sometimes be necessary and need headers+id, although even then all the headers could be done as <th>
- # [01:32] <jgraham> Hmm BenMillard keeps saying sensible things so I don't have to
- # [01:33] <Philip`> BenMillard: Would the headers attribute be supported only on <td>, not <th>?
- # [01:33] <BenMillard> Philip`, I haven't studied that in detail yet. would you like me to forward the message to you?
- # [01:33] * Joins: tantek (n=tantek@adsl-68-123-180-62.dsl.pltn13.pacbell.net)
- # [01:34] <Philip`> How would it handle something like http://factfinder.census.gov/servlet/QTTable?_bm=n&_lang=en&qr_name=DEC_2000_SF1_U_DP1&ds_name=DEC_2000_SF1_U&geo_id=05000US48487 where the numbers need to be associated with the label in the first column, but the labels in the first column also need to be associated with some random set of other label cells?
- # [01:34] <jgraham> Philip`: I think there is likely a use case for @headers on th although no one has actually brought forward a table that needs it (at least recently)
- # [01:35] <BenMillard> Philip & jgraham, I call those "hierarchical row headers" although nobody else does :)
- # [01:35] <Lachy> "Test file 1 erroneously uses <td> for 10 of the 20 header cells" Which headers make up the 10? I only count 9
- # [01:35] <BenMillard> Lachy, I'll recount
- # [01:35] <Lachy> actually, 11
- # [01:36] <Lachy> 3 dates, 2 x Partner Portal, 6 Budgeted/actual/forecast
- # [01:36] <Philip`> BenMillard: It might be best to not forward the email, since I have too many other things I ought to be working on instead :-)
- # [01:36] <BenMillard> Philip`, sure thing
- # [01:36] <BenMillard> Lachy, so we're talking about? http://juicystudio.com/wcag/tables/noscope.html
- # [01:36] <Lachy> yes
- # [01:36] <jgraham> Philip`: That table looks like it should actually be several smaller tables
- # [01:37] * Quits: dglazkov (n=dglazkov@nat/google/x-7446b87021ae62b6)
- # [01:37] <BenMillard> Lachy, I agree with 11. thanks!
- # [01:37] * Joins: shepazu (n=schepers@88.128.85.131)
- # [01:37] <Philip`> jgraham: I don't think splitting it into smaller tables would help with the "One race -> Asian -> Asian Indian" label hierarchy, which is the main problem
- # [01:37] <BenMillard> (so this is another case where I understated the error)
- # [01:38] <jgraham> Philip`: I think that layout would need @headers on <th>
- # [01:38] <Hixie> iirc you can actually do Philip`'s table with some careful use of rowspans, but i forget if i ended up making that work or not (and it's dubious whether that's desirable anyway)
- # [01:38] <jgraham> Philip`: Sure but it would have confused me a hell of a lot less
- # [01:38] <Philip`> (Also splitting it into smaller tables would make the layout go all ugly, because you want them to all be exactly the same column sizes, and there's no way to enforce that when they're multiple tables)
- # [01:38] <jgraham> and I can see it
- # [01:38] <BenMillard> Hixie, yes, rowspan works for "hierarchical row header" case...if you've got enough width to present it
- # [01:40] <Hixie> Philip`: i'm not sure what the best way to render that table is, but i'm pretty sure that "0. Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" is not the best way to read out that cell
- # [01:40] <BenMillard> Philip`, the table uses fixed-width, such as width="385", so you could split it and keep the fixed widths
- # [01:40] <Philip`> BenMillard: Then you're making assumptions about how many pixels the user's font uses
- # [01:40] <Hixie> Philip`: which is presumably what one would get if we encouraged people to chain headers
- # [01:41] <Hixie> Philip`: it should definitely be possible to link columns into having the same widths even in different tables, though css can't do that (and likely won't for some time) so i agree that in this case we shouldn't assume that it is possible
- # [01:41] <BenMillard> Hixie, when moving from cell to cell the more sophisticated ATs only announce the headers which have changed
- # [01:41] <Lachy> BenMillard, "3 of the cells using <td scope="row"> also use rowspan." - I only see 2 scope="row" in test 3
- # [01:42] <Hixie> BenMillard: well then it would sound exactly like if there weren't chained headers, assuming you're navigating the table linearly
- # [01:42] <Lachy> and this assertion of yours is debatable "For scope to work here under HTML4, scope="rowgroup" must be used with the appropriate use of <tbody> around the rows which are being spanned: "
- # [01:42] <BenMillard> Lachy, yep, well spotted
- # [01:42] <Lachy> the spec is ambiguous though
- # [01:42] <jgraham> Hixie: FWIW Al suggested that the common AT setup is to have headers red out on demand
- # [01:42] <Lachy> it says row, but technically it's still in 3 rows
- # [01:42] <BenMillard> Lachy, does scope="row" apply to multiple rows in HTML4?
- # [01:42] <jgraham> s/red/read/
- # [01:43] <Lachy> in fact, it doesn't say one way or the other
- # [01:43] <Lachy> it just says "row: The current cell provides header information for the rest of the row that contains it"
- # [01:43] <Hixie> jgraham: that would suggest it would render as: "Zero." zero what? crap, what are the headers? "Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" say what now?
- # [01:43] <Lachy> so does that mean the rest of the <tr> that contains it, or the rest of the row(s) that it's actually in?
- # [01:44] <jgraham> Hixie: I agree in this case it's pretty hard to understand. But I find that table pretty hard to understand so maybe it's just a badly designed table
- # [01:44] <BenMillard> Lachy, it seems to think a "row" is different from a "row group" so my reading is that scope="row" applies to exactly one line of cells across the table
- # [01:44] <Hixie> jgraham: quite possible
- # [01:45] <Lachy> You say "This further exemplifies how difficult the headers+id system is to get right", after you mention errors with scope=""
- # [01:45] <Hixie> jgraham: but i think "Zero." zero what? crap, what are the headers? "Other Pacific Islander 2; Number" would be easier to understand.
- # [01:45] <BenMillard> Lachy, can we nail down the scope="row" thing first? :)
- # [01:45] <jgraham> Lachy: Trying to understand the HTML4 headers spec algorithm is a lost cause
- # [01:45] <Hixie> there's an algorithm?
- # [01:45] <Lachy> BenMillard, HTML4 is not clear enough to be certain one way or another
- # [01:45] <Hixie> i thought there was just some vague handwaving
- # [01:46] <jgraham> Hixie: Algorithm is a bit of a strong term
- # [01:46] <jgraham> vague handwaving is indeed much closer
- # [01:46] <BenMillard> Lachy, it seems to make as much difference between a row and a row group as it does between a column and a column group, though...
- # [01:47] <BenMillard> Lachy, indeed, why have a "rowgroup" value if "row" was intended to cover that case?
- # [01:47] <Lachy> hmm, perhaps.
- # [01:47] <jgraham> Hixie: re: what AT should read out; as I've said before this seems like exactly the sort of question that user testing could help answer
- # [01:47] <jgraham> BenMillard: If you care the Table Inspector has a HTML4 mode
- # [01:47] <BenMillard> Lachy, I agree that it's debatable, so I guess either interpretation is right. :)
- # [01:47] <Lachy> but I don't think it's a particularly strong argument
- # [01:48] <jgraham> BenMillard: I wouldn't expect miracles from it though
- # [01:48] <Lachy> anyway, with regards to that assertion I quoted above, the evidence you presented immediately before it doesn't support it
- # [01:50] <BenMillard> Lachy, I see what you mean
- # [01:51] * Quits: tantek (n=tantek@adsl-68-123-180-62.dsl.pltn13.pacbell.net) (Read error: 110 (Connection timed out))
- # [01:51] <BenMillard> Lachy, my thinking was that headers+id "missed out" 8 associations in favour of using scope, while headers+id also duplicates the 6 associations which are made by scope
- # [01:52] <BenMillard> Lachy, I interpret the gaps and overlapping as authoring mistakes...
- # [01:52] <Lachy> if they're consistent, it's not really a mistake. Just redundant
- # [01:52] <BenMillard> Lachy, they are consistent, that's true
- # [01:53] * Quits: weinig (n=weinig@nat/apple/x-416c389b1b319027)
- # [01:53] <BenMillard> Lachy, what sentence would you suggest in place of that one?
- # [01:55] <Lachy> I don't know
- # [01:57] * Joins: aboodman3 (n=aboodman@nat/google/x-c9301e41a04d7076)
- # [01:57] * aboodman3 is now known as aboodman
- # [01:58] <BenMillard> Lachy, how about I strike that sentence and change the 1st one in that paragraph to "So, test file 3 uses a weird patchwork of techniques, with mistakes in the use of scope and colspan."
- # [01:59] <Lachy> yeah
- # [02:00] <BenMillard> Lachy, done. did you find anything else?
- # [02:01] <BenMillard> jgraham, thanks for your review, btw. Short but sweet. :)
- # [02:02] <Hixie> Lachy: yeah, but to do that we'd have to make a number of variants of that table, and then give each variant to three or four different users, and ask each user to answer questions about the table
- # [02:02] <Hixie> Lachy: so if we tried, say, three variants, and had three users, that's nine users to get under a usability study video camera
- # [02:03] * Joins: KevinMarks (n=KevinMar@nat/google/x-234621707c1d44fc)
- # [02:03] <BenMillard> Hixie, is that towards jgraham?
- # [02:03] <Hixie> um
- # [02:03] <Hixie> yes
- # [02:03] <Hixie> my bad
- # [02:05] <BenMillard> I'll leave sending the mail about tables until tomorrow. I got a snapshot of all 3 tests.
- # [02:05] <BenMillard> Philip`, that table is going into my collection under "To Do".
- # [02:05] <jgraham> Hixie: Well I'm not sure how many people 9 is compared to the number that, say, Josh works with in a day. Plus given those 9 people they could each look at several different tables so once you had enough people to get data on one type of table, you'd have enough to get data on several
- # [02:06] <Hixie> certainly would be great if we could do it
- # [02:07] <jgraham> Even without a full test like that one could try a single user with several similar tables and different amounts of verbosity, for example
- # [02:07] * Joins: weinig (n=weinig@nat/apple/x-a30f8313d04811a7)
- # [02:07] <jgraham> (one user obviously isn't a very good sample)
- # [02:10] * Joins: tantek (n=tantek@adsl-99-137-128-33.dsl.snfc21.sbcglobal.net)
- # [02:13] * Joins: othermaciej_ (n=mjs@17.244.17.18)
- # [02:18] <BenMillard> Philip`, I've actually put some notes with it, so it ended up as "USA FactFinder: Demographic Characteristics, 2000" here: http://projectcerbera.com/web/study/2008/collection#tables-government
- # [02:19] * Quits: aroben (n=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
- # [02:21] * Quits: tantek (n=tantek@adsl-99-137-128-33.dsl.snfc21.sbcglobal.net)
- # [02:21] * Joins: othermaciej__ (n=mjs@17.244.17.18)
- # [02:21] * Quits: othermaciej_ (n=mjs@17.244.17.18) (Read error: 104 (Connection reset by peer))
- # [02:25] * Quits: shepazu (n=schepers@88.128.85.131) (Read error: 110 (Connection timed out))
- # [02:29] * Quits: othermaciej (n=mjs@17.203.15.200) (Read error: 110 (Connection timed out))
- # [02:42] * Dashiva equips vast-browser-wing-conspiracy hat
- # [02:43] * Joins: tantek (n=tantek@66-117-137-125.dsl.lmi.net)
- # [02:52] * othermaciej__ is now known as othermaciej
- # [02:55] * Parts: BenMillard (i=cerbera@cpc1-flee1-0-0-cust285.glfd.cable.ntl.com)
- # [03:00] <takkaria> ah, it's nice when you can mark 81 messages as read safely
- # [03:01] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [03:08] * Quits: syp_ (n=syp@lasigpc9.epfl.ch) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: jacobolus (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: Philip` (n=philip@zaynar.demon.co.uk) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: hendry (n=hendry@nox.vm.bytemark.co.uk) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: bzed (n=bzed@devel.recluse.de) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: bdash (n=bdash@fire/developer/bdash) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: didymos (i=jho@rapwap.razor.dk) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: [YaaL] (i=yaal@hell.pl) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: uriel (n=uriel@h677044.serverkompetenz.net) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Quits: deltab (n=deltab@82-36-30-34.cable.ubr02.smal.blueyonder.co.uk) (simmons.freenode.net irc.freenode.net)
- # [03:08] * Joins: jacobolus (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net)
- # [03:08] * Joins: syp_ (n=syp@lasigpc9.epfl.ch)
- # [03:08] * Joins: Philip` (n=philip@zaynar.demon.co.uk)
- # [03:08] * Joins: hendry (n=hendry@nox.vm.bytemark.co.uk)
- # [03:08] * Joins: bzed (n=bzed@devel.recluse.de)
- # [03:08] * Joins: bdash (n=bdash@fire/developer/bdash)
- # [03:08] * Joins: [YaaL] (i=yaal@hell.pl)
- # [03:08] * Joins: uriel (n=uriel@h677044.serverkompetenz.net)
- # [03:08] * Joins: didymos (i=jho@rapwap.razor.dk)
- # [03:08] * Joins: deltab (n=deltab@82-36-30-34.cable.ubr02.smal.blueyonder.co.uk)
- # [03:09] * Joins: weinig_ (n=weinig@nat/apple/x-9aa5e2e8f5ca56f3)
- # [03:09] * Quits: othermaciej (n=mjs@17.244.17.18) (Read error: 104 (Connection reset by peer))
- # [03:09] * Joins: othermaciej (n=mjs@17.244.17.18)
- # [03:13] * Joins: tantek_ (n=tantek@66-117-137-125.dsl.lmi.net)
- # [03:13] * Quits: tantek (n=tantek@66-117-137-125.dsl.lmi.net) (Read error: 104 (Connection reset by peer))
- # [03:14] * Quits: KevinMarks (n=KevinMar@nat/google/x-234621707c1d44fc) (Connection timed out)
- # [03:17] * Joins: othermaciej_ (n=mjs@17.244.17.18)
- # [03:17] * Quits: othermaciej (n=mjs@17.244.17.18) (Read error: 104 (Connection reset by peer))
- # [03:21] * Joins: tantek (n=tantek@66-117-137-125.dsl.lmi.net)
- # [03:21] * Quits: tantek_ (n=tantek@66-117-137-125.dsl.lmi.net) (Read error: 104 (Connection reset by peer))
- # [03:24] * Quits: weinig (n=weinig@nat/apple/x-a30f8313d04811a7) (Read error: 110 (Connection timed out))
- # [03:27] * Quits: tantek (n=tantek@66-117-137-125.dsl.lmi.net)
- # [03:52] * Quits: bdash (n=bdash@fire/developer/bdash) (Read error: 110 (Connection timed out))
- # [03:56] <takkaria> http://www.squarefree.com/burningedge/2008/08/29/2008-08-29-trunk-builds/ -- looks like yesterday was a pretty productive day for gecko
- # [03:59] * Joins: alyosha (n=anime4ch@74.93.182.234)
- # [03:59] <jruderman> that covers changes in the last two weeks, not just yesterday
- # [03:59] <jruderman> we only land that much in one day on crazy code freeze days
- # [04:00] <takkaria> ah, I thought it had rather a lot on it for a day
- # [04:00] <takkaria> still, pretty good going. :)
- # [04:01] <alyosha> hi ppl
- # [04:01] <alyosha> what do u guys think of IE 8 beta 2's HTML 5 support?
- # [04:02] <alyosha> I noticed (and I'm 100% sure I'm not the only one) a regression with unrecognized elements (eg. html 5 sectioning elements and inline elements such as mark)
- # [04:03] <alyosha> hopefully they'll fix it b4 final release
- # [04:10] * Quits: eseidel (n=eseidel@nat/google/x-e99074dd86d18be5)
- # [04:13] <takkaria> have they removed the document.createElement() hack?
- # [04:13] <alyosha> yeah, pretty much
- # [04:14] <alyosha> but the elements to seem to show up correctly in the DOM tree in IE 8's developer tools
- # [04:15] <alyosha> *do
- # [04:16] * Joins: tantek (n=tantek@66-117-137-125.dsl.lmi.net)
- # [04:20] * Joins: eseidel (n=eseidel@nat/google/x-9aff75974286f61f)
- # [04:21] <alyosha> I think it's probably an unintentional bug and they should fix it before final release, but I don't know for sure
- # [04:24] * Quits: eseidel (n=eseidel@nat/google/x-9aff75974286f61f) (Client Quit)
- # [04:24] <alyosha> and the interesting thing is that the IE7 mode button is disabled on the html 5 doctype
- # [04:25] * Joins: hdh (n=hdh@118.71.121.76)
- # [04:25] <alyosha> even though IE 7 rendering mode can be hacked to display new elements
- # [04:26] * Quits: tndH (i=Rob@adsl-77-86-6-71.karoo.KCOM.COM) ("ChatZilla 0.9.83-rdmsoft [XULRunner 1.9/2008061013]")
- # [04:26] <alyosha> hmmm, what do u get with html 5 doctype and <meta http-equiv="X-UA-Compatible" content="IE=7">?
- # [04:28] * Joins: franksalim (n=frank@user-64-9-234-71.googlewifi.com)
- # [04:30] <alyosha> html 5 doctype overrides the meta thingy
- # [04:31] <alyosha> IE 7 mode not available for html 5
- # [04:32] * Quits: othermaciej_ (n=mjs@17.244.17.18)
- # [04:38] <alyosha> actually, they didn't remove the document.createElement() hack. It's just in the CSS, it makes unrecognized elements "UNKNOWN"
- # [04:38] <alyosha> just tried disabling script with the hack, and it still works
- # [04:38] <alyosha> but the styles just aren't applied
- # [04:42] <alyosha> style attributes are applied after the hack, but external stylesheets are not applied
- # [04:49] * Quits: weinig_ (n=weinig@nat/apple/x-9aa5e2e8f5ca56f3)
- # [04:59] * Quits: franksalim (n=frank@user-64-9-234-71.googlewifi.com) (Read error: 110 (Connection timed out))
- # [05:18] * Quits: tantek (n=tantek@66-117-137-125.dsl.lmi.net)
- # [05:25] <Hixie> you gotta wonder what a mess their codebase is to get this kind of behaviour
- # [05:26] <alyosha> yeah, guess so
- # [05:27] <alyosha> IE 7 mode renders fine and shows the stylesheets fine too, but it can only be activated through developer tools or by adding the website to compatibility mode
- # [05:27] <alyosha> the meta thing is overridden by the doctype and the button is gone too
- # [05:28] <alyosha> gotta love M$, they make sure web designers won't lose their jobs (constantly gotta fix all their problems)
- # [05:28] <alyosha> lol
- # [05:30] <alyosha> nvm, adding it to compatibility view doesn't work either
- # [05:31] <alyosha> does MS have a bug tracker somewhere?
- # [05:32] <Hixie> https://connect.microsoft.com/feedback/AdvancedSearch.aspx?SiteID=136&Status=1&FeedbackType=1 i think?
- # [05:33] <alyosha> ooh, cool, Microsoft isn't completely submerged in the last decade after all.
- # [05:34] <Hixie> if you can get it to work, let me know
- # [05:35] <alyosha> sure. but I think most likely we'll have to wait for a fix from MS or do something like <header id="header"> ... #header { /*style here*/ } if they don't fix it
- # [05:39] <alyosha> according to this report IE8b1 didn't have this problem, so it's most likely a regression in IE8b2: https://connect.microsoft.com/IE/feedback/ViewFeedback.aspx?FeedbackID=364356
- # [05:42] <alyosha> well, g2g, l8rz
- # [05:42] * Parts: alyosha (n=anime4ch@74.93.182.234)
- # [05:55] * Quits: jruderman (n=jruderma@c-67-180-39-55.hsd1.ca.comcast.net)
- # [05:57] * Joins: jruderman (n=jruderma@c-67-180-39-55.hsd1.ca.comcast.net)
- # [06:17] * Joins: aboodman2 (n=aboodman@nat/google/x-c029fadc328fea49)
- # [06:18] * Joins: eseidel (n=eseidel@c-24-130-13-197.hsd1.ca.comcast.net)
- # [06:20] * Joins: eseidel_ (n=eseidel@72.14.224.1)
- # [06:27] * Quits: aboodman (n=aboodman@nat/google/x-c9301e41a04d7076) (Read error: 110 (Connection timed out))
- # [06:29] * Joins: aboodman (n=aboodman@216.239.45.19)
- # [06:32] * Joins: aboodman3 (n=aboodman@69.36.227.135)
- # [06:36] * Quits: eseidel (n=eseidel@c-24-130-13-197.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
- # [06:42] * Quits: aboodman2 (n=aboodman@nat/google/x-c029fadc328fea49) (Read error: 110 (Connection timed out))
- # [06:44] * Joins: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [06:46] * Quits: aboodman (n=aboodman@216.239.45.19) (Read error: 110 (Connection timed out))
- # [06:54] * Joins: Kuruma (n=Kuruman@h123-176-107-050.catv01.catv-yokohama.ne.jp)
- # [07:09] * eseidel_ is now known as eseidel
- # [07:11] * Joins: eseidel_ (n=eseidel@c-24-130-13-197.hsd1.ca.comcast.net)
- # [07:28] <Hixie> "Ian's approach completely removes HTML conformance checking as a
- # [07:28] <Hixie> mechanism to introduce authors to accessibility issues."
- # [07:28] <Hixie> -- http://html4all.org/pipermail/list_html4all.org/2008-August/000977.html
- # [07:28] <Hixie> well at least they admit that they are trying to use conformance checking for their own purposes
- # [07:28] * Quits: eseidel (n=eseidel@72.14.224.1) (Read error: 110 (Connection timed out))
- # [07:29] <Hixie> and good to see others on that thread disagreeing with it :-)
- # [07:33] * Quits: aboodman3 (n=aboodman@69.36.227.135) (Read error: 110 (Connection timed out))
- # [07:36] * Quits: csarven (n=csarven@modemcable144.140-202-24.mc.videotron.ca) ("http://www.csarven.ca/")
- # [07:36] * Quits: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [08:05] * Quits: eseidel_ (n=eseidel@c-24-130-13-197.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
- # [08:18] * Joins: shepazu (n=schepers@88.128.85.131)
- # [08:21] * Quits: hdh (n=hdh@118.71.121.76) ("Konversation terminated!")
- # [08:21] <hsivonen> wow. when I fixed bugs in my validation harness, it ran in 4 hours and the output was only 83.4 MB.
- # [08:22] * Joins: hdh (n=hdh@118.71.121.171)
- # [08:24] <hsivonen> annevk: typo. thanks
- # [08:37] <Hixie> hsivonen: heh
- # [08:38] * Joins: aboodman3 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [08:39] * Joins: othermaciej (n=mjs@c-69-181-42-194.hsd1.ca.comcast.net)
- # [08:40] * Joins: aboodman4 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [08:54] <hsivonen> whoa! there are many more 0-error docs than I would have thought
- # [08:54] * Joins: KevinMarks (n=KevinMar@c-98-207-134-151.hsd1.ca.comcast.net)
- # [08:56] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [08:57] <Hixie> hsivonen: 2?
- # [08:58] * Quits: aboodman3 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net) (Read error: 110 (Connection timed out))
- # [08:58] <hsivonen> Hixie: 4514
- # [08:58] <Hixie> out of a million?
- # [08:58] <hsivonen> out of 516875
- # [08:58] <hsivonen> and manual verification shows that it's really so
- # [08:58] <hsivonen> however, this ignores the document mode
- # [08:58] <Hixie> 0.87%
- # [08:59] <hsivonen> so doctypeless files count
- # [08:59] <Hixie> does bgcolor in a transitional doc count as pass or fail?
- # [08:59] <hsivonen> fail
- # [08:59] <hsivonen> this is HTML5 rules
- # [08:59] <hsivonen> except for doctype
- # [08:59] <Hixie> wow that's not bad then
- # [09:00] <hsivonen> note that omitted alt doesn't count as an error
- # [09:00] <Hixie> what are we saying, that's horrific. but still. higher than i expected.
- # [09:00] <hsivonen> and IRIs on non-UTF-8 pages pass
- # [09:00] <hsivonen> no parse errors (doctype errors ignored) is 29%
- # [09:01] <hsivonen> which is rather high compared to your old numbers
- # [09:02] <hsivonen> but now the results look pretty consistent with what I've seen before in terms of the relative frequencies
- # [09:03] <Hixie> i had two numbers, one that counted /> and doctypes as errors and one that didn't
- # [09:03] <hsivonen> ah
- # [09:04] <Hixie> i forget what my exact numbers were
- # [09:04] <Hixie> but one was about 70% and one was about 90%
- # [09:04] * Quits: shepazu (n=schepers@88.128.85.131) (Read error: 110 (Connection timed out))
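The 0.87% figure quoted above follows directly from the raw counts hsivonen gave (4514 zero-error documents out of 516875). A minimal Python sketch of that arithmetic; the helper name `share` is made up for illustration and is not part of any validator tooling:

```python
def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# Counts as reported in the log.
zero_error_docs = 4514
sample_size = 516875

print(f"{share(zero_error_docs, sample_size):.2f}%")  # prints 0.87%
```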
- # [09:19] * Joins: bdash (n=bdash@fire/developer/bdash)
- # [09:33] <annevk> hsivonen, MB or GB?
- # [09:34] <annevk> gsnedders, hmm, you didn't do your checkin
- # [09:37] * Joins: aboodman5 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [09:41] * Joins: Maurice (i=copyman@cc356098-a.emmen1.dr.home.nl)
- # [09:42] <hsivonen> annevk: MB
- # [09:43] <hsivonen> annevk: the harness used to have a simple but serious bug
- # [09:43] <annevk> but you expected 80GB initially?!
- # [09:43] <hsivonen> annevk: 60 GB actually, but that expectation was based on the bug, too
- # [09:43] <annevk> ok
- # [09:44] * Joins: GregHouston (n=ghouston@adsl-75-6-6-153.dsl.spfdmo.sbcglobal.net)
- # [09:50] * Joins: myakura (n=myakura@p3216-ipbf5106marunouchi.tokyo.ocn.ne.jp)
- # [09:54] * Quits: aboodman4 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net) (Read error: 110 (Connection timed out))
- # [09:54] <Hixie> hsivonen: very interesting results
- # [09:55] <Hixie> hsivonen: these results really argue for consolidating all "attribute [known presentational attribute] not allowed" messages into a single message "This page contains presentational markup. More details... Help on removing presentational markup..."
- # [09:56] <Hixie> wow, 7% of pages had an </embed> ?
- # [09:57] <hsivonen> so it seems
- # [09:57] <hsivonen> crazy
- # [09:57] <annevk> lots of people think <embed> needs a closing tag, I once did so too
- # [09:57] <annevk> it's not like there was good documentation out there on how it works...
- # [09:59] <Hixie> wow, malformed byte sequences aren't that common either
- # [09:59] <annevk> "No “p” element in scope but a “p” end tag seen." 9%!
- # [10:00] <annevk> madness
- # [10:00] <Hixie> that's probably a lot of <p><table></table></p>-type stuff
- # [10:01] <annevk> and 5% had "Element “frameset” not allowed in this context. (The parent was element “html”.) Suppressing further errors from this subtree." so many frames still around?
- # [10:01] <Hixie> this sample didn't bias for date of creation
- # [10:01] <Hixie> so it includes stuff going back many years
- # [10:01] <Hixie> there's a lot of old content out there still
- # [10:03] <Hixie> sigh i really don't want to reintroduce <script language="">, people typo it so much
- # [10:03] <Hixie> and the & issue is a sad one
- # [10:03] <annevk> >2% uses <head profile>
- # [10:04] <Hixie> iirc there's a lot of pages that have <head profile=""> (blank)
- # [10:04] <hsivonen> annevk: wordpress.com gives distinct host names to users
- # [10:04] <hsivonen> annevk: livejournal, too
- # [10:04] <Hixie> like there are a lot of <a> elements with shape="rect"
- # [10:04] <hsivonen> annevk: I was too lazy to deal with those
- # [10:04] <hsivonen> annevk: although I did collapse MySpace profiles
- # [10:05] <Hixie> hsivonen: i'll give you a domain-separated set of urls next time instead of site-separated
- # [10:06] <Hixie> maybe we should make & followed by alphanumerics, followed by =, a non-ambiguous ampersand
- # [10:06] <Hixie> that might deal with a bunch of these & errors
- # [10:06] <annevk> "Bad value (consolidated) for attribute “lang” from namespace “http://www.w3.org/XML/1998/namespace” on element “html”: Bad language tag: Bad variant subtag." XML sites were included?
- # [10:07] <hsivonen> annevk: no
- # [10:07] <hsivonen> annevk: the validator sees HTML lang as XML lang internally
- # [10:07] <takkaria> Hixie: I think that could be a big win for authoring
- # [10:07] <hsivonen> annevk: and these messages weren't fully sanitized for UI consumption
- # [10:07] <annevk> Hixie, maybe also allow anything but [a-Z#]
- # [10:08] <annevk> to follow it
- # [10:08] <Hixie> annevk: ?
- # [10:08] <annevk> &" would be conforming
- # [10:08] <annevk> and so would 2&2
- # [10:09] <Hixie> the character encoding thing -- we could make <meta charset> allowed if not preceded by any non-ASCII
- # [10:09] <annevk> or (&)
- # [10:09] <Hixie> annevk: i posit that the problem is just urls in attributes
- # [10:10] * Quits: hdh (n=hdh@118.71.121.171) (Read error: 104 (Connection reset by peer))
- # [10:11] <takkaria> fwiw I'd prefer the "get a character reference" algorithm not to depend on whether you're in an attribute value state or not
- # [10:11] <annevk> I don't see what's wrong with loosening them both up, given that you keep several extension points
- # [10:11] <annevk> takkaria, it already does
- # [10:12] <annevk> takkaria, and if we are to keep compat with IE, it has to be that way
- # [10:13] <takkaria> I mean in this particular case. i.e. if you can paste an unescaped URL into an attribute value you should also be able to conformingly paste it outside an attribute value
- # [10:15] <annevk> that wouldn't work well
- # [10:16] <annevk> eg, it would go wrong with &= which does different things
- # [10:17] <takkaria> mm, that's a point
- # [10:18] <takkaria> ah well. it would be nice, though
- # [10:21] <annevk> at this point chaals would ask for a pony
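The loosened ampersand rules floated above can be sketched as a small checker. This is only an illustration of the two ideas from the chat (Hixie's "& followed by alphanumerics, followed by =" and annevk's "anything but a letter or #"), not the HTML5 parsing algorithm and not what the validator implements. The function name and the three regular expressions are assumptions made up for this sketch.

```python
import re

# Hixie's proposal: "&" + alphanumerics + "=" (typical of URL query strings).
PARAM_LIKE = re.compile(r"&[A-Za-z0-9]+=")
# annevk's suggestion: "&" followed by anything other than an ASCII letter or "#"
# (this also covers "&" at the very end of the input).
NOT_REFERENCE_LIKE = re.compile(r"&(?![A-Za-z#])")
# Something shaped like a complete character reference: &amp; &#38; &#x26;
REFERENCE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def ambiguous_ampersands(value):
    """Return the offsets of ampersands that none of the rules above accept."""
    flagged = []
    for match in re.finditer("&", value):
        i = match.start()
        if (PARAM_LIKE.match(value, i)
                or NOT_REFERENCE_LIKE.match(value, i)
                or REFERENCE.match(value, i)):
            continue
        flagged.append(i)
    return flagged
```

Under these assumed rules, `ambiguous_ampersands("page?a=1&b=2")` returns an empty list, while `ambiguous_ampersands("AT&T")` still flags the ampersand at offset 2. The sketch deliberately ignores whether it is looking at an attribute value or at text content, which is exactly the distinction annevk argues has to be kept.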
- # [10:59] * Joins: primal1 (n=primal1@pool-72-87-132-196.lsanca.dsl-w.verizon.net)
- # [10:59] <annevk> grmbl, how do you properly configure lxml?
- # [11:00] <annevk> unzipped it's 25MB
- # [11:05] * Joins: ROBOd (n=robod@89.122.216.38)
- # [11:42] <hsivonen> Unsupported character encoding name: “iso-utf-8”. Will continue sniffing.
- # [11:42] <hsivonen> Unsupported character encoding name: “44-iso-8859-1”. Will continue sniffing.
- # [11:43] <hsivonen> crazy ebcdic charset in HTTP: http://web-sniffer.net/?url=http%3A%2F%2Fwww.antalis.fr%2Fsitesweb%2FFO%2Fpages%2Finterne-2-66-2122-rich_text-73228.html&submit=Submit&http=1.1&type=GET&uak=0
- # [11:44] <hsivonen> Unsupported character encoding name: “gb2312,big5,euc-kr”. Will sniff.
- # [11:45] <hsivonen> Unsupported character encoding name: “zh-tw”. Will sniff.
- # [11:45] <hsivonen> you can't make this stuff up
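A quick way to see why labels like these get rejected is to look them up in a codec registry. The sketch below uses Python's own registry via `codecs.lookup`, which is an assumption chosen for illustration. The validator has its own list of supported encodings, so the two will not agree on every label.

```python
import codecs

def is_known_label(label):
    """Return True if Python's codec registry recognizes the encoding label."""
    try:
        codecs.lookup(label.strip())
        return True
    except LookupError:
        return False

# Labels taken from the messages in the log, plus two valid ones for contrast.
for label in ["utf-8", "big5", "iso-utf-8", "zh-tw", "big6"]:
    print(label, is_known_label(label))
```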
- # [12:02] <jgraham> hsivonen: btw, I'm not sure that such a thing as an unbiased sample of webpages exists
- # [12:02] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net) (Read error: 104 (Connection reset by peer))
- # [12:03] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [12:03] <hsivonen> jgraham: sure. I said it was biased. :-)
- # [12:03] <Philip`> You can't even know how it's biased, because you can't know what the population is
- # [12:04] <jgraham> hsivonen: I know. I just think it's a tautology
- # [12:04] <hsivonen> yeah
- # [12:04] <hsivonen> and, yet, with different page sets, the same common errors come to the top
- # [12:07] <jgraham> In some sense approximately all the pages on the web are autogenerated pages which use the url to determine the content e.g. calendar.example.com/year/month/day with only implementation limits on the value of year
- # [12:08] <jgraham> So an unbiased sample of the whole population of http URLs that return 200 would be very misleading
- # [12:09] <Philip`> "approximately all" is not a concept that makes sense, where there's an infinite number of pages
- # [12:11] <hsivonen> more to the point, the number of pages is countably infinite, which should make counting proportions a bit more tractable
- # [12:12] <hsivonen> Unsupported character encoding name: “big6”. Will sniff.
- # [12:12] <gsnedders> annevk: I asked if you wanted me to do it last night so you could work on it this morning. I got no answer :P
- # [12:12] <Philip`> Positive integers are countably infinite too, but it doesn't make sense to ask for an unbiased random sampling of positive integers
- # [12:13] <gsnedders> annevk: I took the default-lazy solution
- # [12:13] <hsivonen> Philip`: true, but you can say that half of the integers are positive
- # [12:13] <Philip`> hsivonen: No you can't :-p
- # [12:14] <annevk> gsnedders, I thought the default was yes!
- # [12:14] <Philip`> For every positive integer you give me, I'll give you back two negative integers, so there's twice as many :-)
- # [12:14] <hsivonen> Philip`: hmm. right.
- # [12:14] <hsivonen> now I appear silly and badly educated
- # [12:14] * gsnedders attempts to cd Documents/Stuff\ I\'m\ Working\ On/spec-gen
- # [12:14] <annevk> gsnedders, I would appreciate a bundle of lxml+anolis+html5lib so I can just write the frontend script and don't have to worry about the bundling as I'm really bad at that
- # [12:15] * annevk tried it this morning and couldn't get the lxml dependency to work
- # [12:15] <gsnedders> annevk: I've never tried bundling :)
- # [12:15] <gsnedders> annevk: lxml is written in C, which may make it harder
- # [12:15] * Quits: Amorphous (i=jan@unaffiliated/amorphous) ("shutdown")
- # [12:15] <Philip`> (It does make sense to ask for an unbiased random real number between 0 and 1, even though that's an uncountable set)
- # [12:15] <annevk> gsnedders, I think that's the problem, yes
- # [12:15] <Philip`> (or at least I think it makes sense)
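Philip`'s objection (for every positive integer, hand back two negative ones) can be made concrete. The Python sketch below enumerates the nonzero integers in two different orders and measures the running fraction of positives. Both orders list every nonzero integer exactly once, yet the fractions differ, which is why "half of the integers are positive" depends on the enumeration chosen. All names here are made up for illustration.

```python
from itertools import count, islice

def alternating():
    """Enumerate 1, -1, 2, -2, 3, -3, ..."""
    for n in count(1):
        yield n
        yield -n

def one_then_two():
    """Enumerate 1, -1, -2, 2, -3, -4, ...: one positive, then two negatives."""
    negatives = count(1)
    for n in count(1):
        yield n
        yield -next(negatives)
        yield -next(negatives)

def positive_fraction(enumeration, terms):
    """Fraction of positive values among the first `terms` items."""
    seen = list(islice(enumeration, terms))
    return sum(1 for x in seen if x > 0) / len(seen)

print(positive_fraction(alternating(), 6000))   # one half
print(positive_fraction(one_then_two(), 6000))  # one third
```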
- # [12:16] <gsnedders> But it really does need to be for the sake of being reasonably quick
- # [12:17] * Quits: primal1 (n=primal1@pool-72-87-132-196.lsanca.dsl-w.verizon.net)
- # [12:17] <annevk> what's the difference between a pleonasm and a tautology?
- # [12:17] <Hixie> hsivonen, Philip`: in this particular case the population was itself a (biased, non-random) subset of google's index
- # [12:18] <annevk> ah I see, tautology is also used in logic
- # [12:19] <Hixie> a tautology is specifically being overly specific in a redundant manner. a pleonasm is just using too many words. as i understand it.
- # [12:20] <Philip`> I think the logical meaning of tautology is a statement that's true regardless of the values of any variables in it
- # [12:20] <annevk> maybe the Dutch and English pleonasm are different then (in Dutch "round circle" is considered a "pleonasme")
- # [12:20] * Joins: virtuelv (n=virtuelv@163.80-202-65.nextgentel.com)
- # [12:20] <annevk> Philip`, yeah
- # [12:21] * Philip` guesses that must include all true statements that don't have any variables
- # [12:22] <annevk> "2. Logic. An empty or vacuous statement composed of simpler statements in a fashion that makes it logically true whether the simpler statements are factually true or false; for example, the statement Either it will rain tomorrow or it will not rain tomorrow."
- # [12:24] <GregHouston> Logical "proofs" of the existence of God generally fall into the category of a tautology.
- # [12:26] <annevk> gsnedders, anyway, for you the stuff is running right? can't you just zip that dir? :)
- # [12:27] <gsnedders> annevk: Only if you're running OS X/x86 :)
- # [12:27] <gsnedders> As of course the compiled C stuff…
- # [12:29] <annevk> grmbl
- # [12:29] * Joins: tndH (n=Rob@adsl-77-86-6-71.karoo.KCOM.COM)
- # [12:32] <annevk> so how do I install lxml?
- # [12:32] <annevk> running setup.py install fails
- # [12:33] <gsnedders> annevk: http://codespeak.net/lxml/installation.html :P
- # [12:34] <virtuelv> annevk: sudo apt-get install python-lxml :P
- # [12:35] <annevk> hmm
- # [12:35] * annevk wonders if dreamhost supports that
- # [12:35] <virtuelv> they don't
- # [12:36] <gsnedders> You need to install it in a custom path
- # [12:36] <virtuelv> on slicehost, that stuff is a bit easier, given that you have root
- # [12:36] <annevk> "annevk is not in the sudoers file. This incident will be reported."
- # [12:37] <annevk> gsnedders, DreamHost doesn't have easy_install
- # [12:37] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
- # [12:40] <Philip`> Do they have hard_install?
- # [12:41] <Hixie> there appears to be an inverse correlation between how much actual useful research someone has done, and how much they ask people who are doing research to do more
- # [12:41] <annevk> -_-
- # [12:42] * gsnedders is gonna have to install it on (mt)
- # [12:43] <Philip`> Hixie: That would be because the people who can do research themselves do it themselves instead of having to ask others :-)
- # [12:43] <annevk> grmbl, even if I do apt-get on my local machine it complains about lxml.html not being there :/
- # [12:43] <Hixie> that and they know how much work it is, i imagine
- # [12:43] <Philip`> It would be nicer if they said *why* they wanted that research, and what useful information it would be likely to reveal
- # [12:44] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [12:50] <annevk> gsnedders, I guess the lxml dependency is pretty big?
- # [12:50] <gsnedders> annevk: Yeah.
- # [12:50] <annevk> sigh
- # [12:51] <gsnedders> annevk: It's the structure used for the tree everywhere
- # [12:51] <jgraham> Philip`: It's not clear to me that there are an infinite number of web pages given likely limits on URL length supported by servers
- # [12:52] <jgraham> annevk: If you want python to work sensibly on Dreamhost you have to install it yourself under your home directory
- # [12:52] <jgraham> Then you install easy_install
- # [12:52] <jgraham> Then you do easy_install lxml
- # [12:53] <jgraham> Then you just have to remember to change anything like #!/usr/bin/env python to #!/home/annevk/bin/python
- # [12:54] <jgraham> Otherwise using any external dependencies seems to be really hard
- # [12:55] <gsnedders> Not really
- # [12:56] <jgraham> gsnedders: It's getting the paths right so you can import stuff that seemed to be hard
- # [12:56] <gsnedders> export PYTHONPATH=${HOME}/packages/lib/python
- # [12:56] <gsnedders> export PATH=${HOME}/packages/bin:$PATH
- # [12:56] <gsnedders> in .bash_profile
- # [12:56] <gsnedders> That's what's used on sp.org
- # [12:56] <jgraham> Hmm, I thought I tried that and it didn't work
- # [12:57] <jgraham> Anyway setting PYTHONPATH is a bad idea in general
- # [12:57] <gsnedders> That's true, but it works ;P
- # [12:58] <gsnedders> annevk: See what I just pushed
- # [12:58] <gsnedders> i.e., http://hg.gsnedders.com/hgwebdir.cgi/anolis/rev/cf4770338aa0
- # [13:01] <virtuelv> annevk: there is some tutorial for rolling your own python on DH
- # [13:01] <virtuelv> http://wiki.dreamhost.com/Python#Building_a_custom_version_of_Python
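For the import-path trouble discussed above, one alternative to exporting PYTHONPATH globally is to extend the search path from inside the frontend script and then check where a module would actually be loaded from. The following is a minimal sketch under that assumption. The directory `~/packages/lib/python` is taken from gsnedders' example, and the helper names are hypothetical.

```python
import importlib.util
import os
import sys

def add_private_packages(directory):
    """Prepend a private package directory to sys.path if it is not already there."""
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory

def locate(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

add_private_packages("~/packages/lib/python")
# Prints a path if lxml is importable from the current search path, else None.
print(locate("lxml"))
```

Calling `locate("lxml")` before importing anything gives a quick answer to "is it installed where this interpreter looks?", which is the question annevk kept running into. It does not help with the compiled C parts of lxml, which still have to be built for the host's platform.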
- # [13:23] * Joins: maikmerten (n=maikmert@Lbaac.l.pppool.de)
- # [13:25] * Quits: virtuelv (n=virtuelv@163.80-202-65.nextgentel.com) ("Leaving")
- # [13:28] * Joins: virtuelv (n=virtuelv@163.80-202-65.nextgentel.com)
- # [13:47] * Joins: jacobolus1 (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net)
- # [13:48] * Quits: jacobolus (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
- # [14:49] * Quits: othermaciej (n=mjs@c-69-181-42-194.hsd1.ca.comcast.net)
- # [15:49] * Quits: GregHouston (n=ghouston@adsl-75-6-6-153.dsl.spfdmo.sbcglobal.net) (Read error: 110 (Connection timed out))
- # [15:49] * Joins: GregHouston (n=ghouston@ppp-66-143-220-108.dsl.spfdmo.swbell.net)
- # [16:10] * Joins: csarven (n=csarven@modemcable144.140-202-24.mc.videotron.ca)
- # [16:14] * Joins: BenMillard (i=cerbera@cpc1-flee1-0-0-cust285.glfd.cable.ntl.com)
- # [16:17] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [16:29] * Quits: jacobolus1 (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net) (Read error: 110 (Connection timed out))
- # [16:35] <gsnedders> Time to go out into town to do something about the /topic
- # [16:36] <jcranmer> gsnedders: you're leaving your sense of logic behind?
- # [16:37] * Joins: hdh (n=hdh@58.187.60.134)
- # [16:40] * Joins: jacobolus (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net)
- # [17:06] * Joins: jacobolus1 (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net)
- # [17:09] * Quits: jacobolus (n=jacobolu@pool-71-119-188-52.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
- # [17:12] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [17:15] <virtuelv> gsnedders: I presume you'll put URL in /topic
- # [17:27] * gsnedders is too impatient to wait in a queue of the length there was
- # [17:28] <gsnedders> (i.e., my hair is still the same old colour)
- # [17:29] * Joins: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [17:33] * Joins: sverrej (n=sverrej@cBF13BF51.dhcp.bluecom.no)
- # [18:21] * Parts: BenMillard (i=cerbera@cpc1-flee1-0-0-cust285.glfd.cable.ntl.com)
- # [18:26] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [18:31] * Quits: aboodman5 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [18:50] <hsivonen> weird. my Mac had bluescreened (literally) while unattended
- # [18:53] * gsnedders still has never got a pinkscreen
- # [18:53] <Lachy> hsivonen, do you mean a kernel panic?
- # [18:54] <Lachy> AFAIK, macs can't get BSODs
- # [18:54] <gsnedders> Lachy: They can however get stuck on a blank blue screen
- # [18:54] <gsnedders> Lachy: For no apparent reason
- # [18:55] <Lachy> I've never seen that
- # [18:56] * gsnedders tries to decide in what order to post his blog posts
- # [18:56] <Lachy> I've had my machines have kernel panics a couple of times, and just freeze with the spinning beachball cursor.
- # [18:56] <Lachy> gsnedders, I'd recommend starting with number 1 followed by number 2
- # [18:57] <gsnedders> Lachy: It would make more sense to do them in chronological order, but the earlier one is far more time-consuming to write
- # [18:57] <Lachy> ok
- # [18:57] <Lachy> I have a number of blog posts I have to finish writing
- # [18:58] <Lachy> I suppose I should just post something about IE8 tonight, and then post my other, significantly longer, potentially 3-part series later
- # [19:00] <gsnedders> I have eight drafts currently
- # [19:01] <gsnedders> One gives a useful answer to <http://krijnhoetmer.nl/irc-logs/whatwg/20080605#l-450>
- # [19:02] <gsnedders> The other follows on from that
- # [19:02] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [19:03] <Lachy> I'd forgotten I'd even asked that question. I suppose it'll be good to get a better answer than "Stuff"
- # [19:03] <gsnedders> It was the first place I could think of that has a public record of me avoiding that question.
- # [19:07] <gsnedders> Writing about May last year is rather time-consuming.
- # [19:11] * gsnedders smacks his old writing
- # [19:12] <gsnedders> It uses -ise :(
- # [19:12] <Lachy> VMWare ThinApp is absolutely brilliant! Now I can seamlessly run IE6, IE7, and IE8b1 and IE8b2 all within the same copy of Windows XP, which is itself running in VMWare Fusion on OS X.
- # [19:13] <Lachy> it basically runs each version of IE, or any other application I like, within its own sandbox
- # [19:14] <gsnedders> As long as no sand falls over the edge, I guess that's all right
- # [19:15] <Lachy> gsnedders, what is wrong with using -ise?
- # [19:15] <gsnedders> Lachy: en-gb-oed prefers -ize :P
- # [19:15] <Lachy> what?!
- # [19:15] <Lachy> nooO!
- # [19:16] <Lachy> -ize is wrong. Stupid American misspelling
- # [19:16] <gsnedders> No, it isn't.
- # [19:16] <Lachy> yes, it is
- # [19:16] <Lachy> I thought en-GB used -ise, just like en-AU
- # [19:16] <gsnedders> -ize comes from Greek, and should be used on Greek-derived words
- # [19:17] <GregHouston> Am I looking at the right thing. It looks like Thin App starts around $6000. I have Workstation and it was a little under $200.
- # [19:18] <gsnedders> en-gb only uses -ise, en-gb-oed uses -ize for words of Greek origin and -ise for those of French, en-us uses -ize
- # [19:18] <gsnedders> "[T]he suffix…, whatever the element to which it is added, is in its origin the Gr[eek] -ιζειν, L[atin] -izāre; and, as the pronunciation is also with z, there is no reason why in English the special French spelling in -iser should be followed, in opposition to that which is at once etymological and phonetic." — the OED
- # [19:19] <gsnedders> en-us also overdoes the entire z thing. Analyze is wrong.
- # [19:19] <Lachy> hmm, interesting
- # [19:20] <gsnedders> en-gb uses -ise too much, en-us uses -ize too much
- # [19:20] <Lachy> I still think -ise should be used for *everything*
- # [19:21] <Lachy> except for words like prize which are supposed to end in -ize
- # [19:22] <Lachy> wiktionary says that it's supposed to be -ise for french-origin words and -ize for greek-origin words. But to do that, I would have to know the origin of each word before I tried to spell it
- # [19:22] <gsnedders> me wonders whether he really should add a certain girl on Facebook…
- # [19:25] <gsnedders> (She is already convinced that I'm secretly in love with her, which is totally untrue)
- # [19:32] <GregHouston> It appears Thin App really is $6k. Application virtualization must be pretty tricky to cost 20 times that of a virtual machine.
- # [19:32] <GregHouston> I can't multipy. Make that 30 times.
- # [19:33] <GregHouston> Or spell. * multiply
- # [19:34] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [19:36] * Quits: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [19:54] * Joins: eseidel (n=eseidel@c-24-130-13-197.hsd1.ca.comcast.net)
- # [20:16] <Philip`> jgraham: You only need a single custom HTTP server that supports arbitrary-length URLs, and then the web can have an infinite number of pages, and I would have thought at least one person would have made such a server
- # [20:16] <Philip`> If nobody has, I'll make one, just to prove my point :-p
- # [20:28] * Joins: weinig (n=weinig@nat/apple/x-ade8156ca560b392)
- # [20:45] * Quits: myakura (n=myakura@p3216-ipbf5106marunouchi.tokyo.ocn.ne.jp) ("Leaving...")
- # [21:18] <gsnedders> Philip`: You have calendars that can be navigated endlessly. There's no need for custom HTTP servers.
- # [21:25] <Philip`> gsnedders: But those calendars might have finite URL limitations
- # [21:31] <Philip`> (even if it's only limited by the amount of RAM available)
- # [21:41] * Quits: maikmerten (n=maikmert@Lbaac.l.pppool.de) ("Leaving")
- # [22:00] * Quits: KevinMarks (n=KevinMar@c-98-207-134-151.hsd1.ca.comcast.net) ("The computer fell asleep")
- # [22:47] * Joins: MacDome (n=eric@c-24-130-13-197.hsd1.ca.comcast.net)
- # [22:54] * Joins: othermaciej (n=mjs@c-69-181-42-194.hsd1.ca.comcast.net)
- # [23:06] <gsnedders> Philip`: Your webserver that supports arbitrary-length URLs will have the same RAM limitations
- # [23:08] <Philip`> gsnedders: No it won't - it won't store the URL in memory
- # [23:08] <gsnedders> Philip`: It just returns something for any request?
- # [23:11] <Philip`> gsnedders: It could ignore the URL entirely, or it could do some streaming processing of it to calculate a finite output
- # [23:11] <Philip`> (I assume HTTP doesn't particularly like you sending the response before you've received the request, so you can't do anything like echo the URL back to the client)
- # [23:12] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
- # [23:12] <gsnedders> I don't think RFC2616 actually forbids you from doing so…
- # [23:36] * Quits: hdh (n=hdh@58.187.60.134) (Read error: 104 (Connection reset by peer))
- # [23:52] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [23:56] <Philip`> gsnedders: Does it never require you receive the whole header so you can detect invalid requests and send an appropriate response?
- # [23:59] * Quits: sverrej (n=sverrej@cBF13BF51.dhcp.bluecom.no) (Connection timed out)
- # Session Close: Sun Aug 31 00:00:00 2008