- # Session Start: Sun May 13 00:00:00 2007
- # Session Ident: #whatwg
- # [00:03] * Joins: jruderman (n=jruderma@c-67-169-183-228.hsd1.ca.comcast.net)
- # [00:04] * om_out is now known as othermaciej
- # [00:50] * Quits: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com) (Remote closed the connection)
- # [00:50] * Joins: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com)
- # [00:50] * Joins: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com)
- # [00:51] * Quits: dbaron (n=dbaron@c-71-198-189-81.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
- # [00:52] * Joins: MikeSmith (n=MikeSmit@202.33.78.114)
- # [01:26] * Quits: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com) (Read error: 110 (Connection timed out))
- # [02:04] * Quits: gavin (n=gavin@firefox/developer/gavin) (Remote closed the connection)
- # [02:04] * Joins: gavin (n=gavin@people.mozilla.com)
- # [02:06] * Parts: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com)
- # [02:58] * Quits: MikeSmith (n=MikeSmit@202.33.78.114) ("Get thee behind me, satan.")
- # [03:18] * Joins: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [03:22] * Quits: bzed (n=bzed@dslb-084-059-126-057.pools.arcor-ip.net) (Remote closed the connection)
- # [03:34] * Joins: weinig (n=weinig@adsl-71-134-96-142.dsl.sntc01.pacbell.net)
- # [03:34] * Quits: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com) ("|")
- # [03:42] * Joins: tantek (n=tantek@dsl001-150-252.sfo1.dsl.speakeasy.net)
- # [03:56] * Joins: MikeSmith (n=MikeSmit@202.33.78.114)
- # [04:07] * Quits: MikeSmith (n=MikeSmit@202.33.78.114) ("Get thee behind me, satan.")
- # [04:43] * Quits: weinig (n=weinig@adsl-71-134-96-142.dsl.sntc01.pacbell.net)
- # [04:57] * Joins: jcgregorio (n=chatzill@adsl-072-148-043-048.sip.rmo.bellsouth.net)
- # [05:04] * Quits: jdandrea (n=jdandrea@ool-44c0a1fe.dyn.optonline.net)
- # [05:10] * Joins: mikeday (n=mikeday@CPE-60-224-50-129.vic.bigpond.net.au)
- # [05:10] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: mw22 (n=chatzill@h8441169151.dsl.speedlinq.nl) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: moeffju (i=moeffju@ubermutant.net) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: ianloic (n=ian@71.5.56.162.ptr.us.xo.net) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: jruderman (n=jruderma@c-67-169-183-228.hsd1.ca.comcast.net) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: Lachy (n=Lachlan@203-214-143-196.perm.iinet.net.au) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: tantek (n=tantek@dsl001-150-252.sfo1.dsl.speakeasy.net) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: annevk (n=annevk@pat-tdc.opera.com) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Quits: hays (n=hays@pool-138-88-199-16.res.east.verizon.net) (heinlein.freenode.net irc.freenode.net)
- # [05:10] * Joins: tantek (n=tantek@dsl001-150-252.sfo1.dsl.speakeasy.net)
- # [05:10] * Joins: jruderman (n=jruderma@c-67-169-183-228.hsd1.ca.comcast.net)
- # [05:10] * Joins: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca)
- # [05:10] * Joins: annevk (n=annevk@pat-tdc.opera.com)
- # [05:10] * Joins: hays (n=hays@pool-138-88-199-16.res.east.verizon.net)
- # [05:10] * Joins: Lachy (n=Lachlan@203-214-143-196.perm.iinet.net.au)
- # [05:10] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
- # [05:10] * Joins: mw22 (n=chatzill@h8441169151.dsl.speedlinq.nl)
- # [05:10] * Joins: moeffju (i=moeffju@ubermutant.net)
- # [05:10] * Joins: ianloic (n=ian@71.5.56.162.ptr.us.xo.net)
- # [05:15] * Quits: tantek (n=tantek@dsl001-150-252.sfo1.dsl.speakeasy.net)
- # [05:17] * Quits: wakaba_ (n=w@118.166.210.220.dy.bbexcite.jp) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: gsnedders (n=gsnedder@host86-139-123-225.range86-139.btcentralplus.com) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: syp (n=syp@photpc17.epfl.ch) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: clotman (n=louis@shell.icgroup.com) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: Philip` (n=philip@zaynar.demon.co.uk) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: wilhelm (n=wilhelm@trivini.no) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: theoros (n=theoros@ACC8D244.ipt.aol.com) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: laug (n=laug@poy.chewa.net) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: didymos (i=jho@rapwap.razor.dk) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: bewest (n=ben@httpcraft/bewest) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: deltab (n=deltab@82-46-154-93.cable.ubr02.smal.blueyonder.co.uk) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: madmoose (i=madmoose@gateway/web/cgi-irc/beitsahour.net/x-a6a69e0cd54b3b1a) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: hsivonen (n=hsivonen@kekkonen.cs.hut.fi) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: Hixie (n=ianh@trivini.no) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: Dashiva (i=Dashiva@v035b.studby.ntnu.no) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Quits: Yudai (n=Yudai@p931d95.tokyte00.ap.so-net.ne.jp) (heinlein.freenode.net irc.freenode.net)
- # [05:17] * Joins: wakaba_ (n=w@118.166.210.220.dy.bbexcite.jp)
- # [05:17] * Joins: gsnedders (n=gsnedder@host86-139-123-225.range86-139.btcentralplus.com)
- # [05:17] * Joins: syp (n=syp@photpc17.epfl.ch)
- # [05:17] * Joins: clotman (n=louis@shell.icgroup.com)
- # [05:17] * Joins: Philip` (n=philip@zaynar.demon.co.uk)
- # [05:17] * Joins: wilhelm (n=wilhelm@trivini.no)
- # [05:17] * Joins: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
- # [05:17] * Joins: Dashiva (i=Dashiva@v035b.studby.ntnu.no)
- # [05:17] * Joins: Yudai (n=Yudai@p931d95.tokyte00.ap.so-net.ne.jp)
- # [05:17] * Joins: madmoose (i=madmoose@gateway/web/cgi-irc/beitsahour.net/x-a6a69e0cd54b3b1a)
- # [05:17] * Joins: hsivonen (n=hsivonen@kekkonen.cs.hut.fi)
- # [05:17] * Joins: Hixie (n=ianh@trivini.no)
- # [05:18] * Joins: theoros (n=theoros@ACC8D244.ipt.aol.com)
- # [05:18] * Joins: bewest (n=ben@httpcraft/bewest)
- # [05:18] * Joins: didymos (i=jho@rapwap.razor.dk)
- # [05:18] * Joins: laug (n=laug@poy.chewa.net)
- # [05:18] * Joins: deltab (n=deltab@82-46-154-93.cable.ubr02.smal.blueyonder.co.uk)
- # [05:24] * Joins: theoros` (n=theoros@ACC8D244.ipt.aol.com)
- # [05:25] * Quits: theoros` (n=theoros@ACC8D244.ipt.aol.com) (Read error: 104 (Connection reset by peer))
- # [05:26] * Quits: mikeday (n=mikeday@CPE-60-224-50-129.vic.bigpond.net.au) ("-")
- # [05:31] * Joins: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [05:31] * Quits: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com) (Client Quit)
- # [06:12] * theoros is now known as theoros|asleep
- # [06:18] * Quits: theoros|asleep (n=theoros@ACC8D244.ipt.aol.com) (Excess Flood)
- # [06:19] * Joins: theoros|asleep (n=theoros@ACC8D244.ipt.aol.com)
- # [06:35] * Joins: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [06:44] * Joins: tantek (n=tantek@66.201.57.7)
- # [06:52] * Quits: tantek (n=tantek@66.201.57.7)
- # [07:13] * Quits: jcgregorio (n=chatzill@adsl-072-148-043-048.sip.rmo.bellsouth.net) ("ChatZilla 0.9.78.1 [Firefox 2.0.0.3/0000000000]")
- # [07:32] * Joins: weinig (n=weinig@c-24-7-121-96.hsd1.ca.comcast.net)
- # [07:34] * Joins: tantek (n=tantek@66.201.57.7)
- # [07:46] * Quits: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca)
- # [08:02] * Quits: tantek (n=tantek@66.201.57.7)
- # [08:32] * Joins: dbaron (n=dbaron@c-71-198-189-81.hsd1.ca.comcast.net)
- # [09:06] * Quits: dbaron (n=dbaron@c-71-198-189-81.hsd1.ca.comcast.net) ("8403864 bytes have been tenured, next gc will be global.")
- # [09:07] * Quits: Lachy (n=Lachlan@203-214-143-196.perm.iinet.net.au) (Read error: 104 (Connection reset by peer))
- # [09:08] * Joins: Lachy (n=Lachlan@203-214-143-196.perm.iinet.net.au)
- # [09:12] * Quits: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [09:26] * weinig is now known as weinig|zZz
- # [09:27] * Joins: mikeday (n=mikeday@CPE-60-224-50-129.vic.bigpond.net.au)
- # [09:28] <mikeday> is whatwg.org down or is it just me?
- # [09:34] <Lachy> it appears to be down
- # [09:35] <mikeday> is the HTML5 spec anywhere else, like w3.org?
- # [09:35] <Lachy> yes, in CVS
- # [09:35] <Lachy> dev.w3.org
- # [09:35] <Lachy> http://dev.w3.org/cvsweb/html5/
- # [09:35] * Quits: weinig|zZz (n=weinig@c-24-7-121-96.hsd1.ca.comcast.net) (Read error: 60 (Operation timed out))
- # [09:36] <mikeday> awesome :)
- # [09:36] * Quits: Lachy (n=Lachlan@203-214-143-196.perm.iinet.net.au) ("Leaving")
- # [09:37] * Joins: Lachy (n=Lachlan@203-217-95-91.dyn.iinet.net.au)
- # [09:42] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [10:10] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
- # [10:20] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [10:43] * Joins: ROBOd (n=robod@86.34.246.154)
- # [10:52] * Joins: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com)
- # [10:53] <mikeday> Hmm, the HTML5 spec seems to say that comments cannot occur before the root element
- # [10:54] <zcorpan_> mikeday: where do you read that?
- # [10:54] <mikeday> tree construction, 8.2.4.1. The initial phase
- # [10:55] <zcorpan_> that's before the doctype, no?
- # [10:55] <hsivonen> hmm. looks like the entire dreamhost is down
- # [10:55] <hsivonen> can't get to damowmow portal or the DOM viewer to check this
- # [10:56] <mikeday> ah, so only before the doctype
- # [10:56] <hsivonen> dreamhost has been down a bit too often lately
- # [10:56] <zcorpan_> mikeday: yeah... but then the #writing section goes ahead and says that comments are allowed before the doctype
- # [10:56] <mikeday> hrmph, that's helpful :)
- # [10:57] * zcorpan_ pointed that out before
- # [10:58] <mikeday> U+00 is converted to U+FFFD, but what about other weird characters like U+07?
- # [11:00] <hsivonen> mikeday: other weird stuff is preserved
- # [11:00] * hsivonen has complained about that before
- # [11:00] * mikeday is noticing a pattern here
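Editorial sketch of the behaviour described above, assuming only what the log says: U+0000 becomes U+FFFD and other unusual characters such as U+0007 pass through unchanged. The function name `preprocess` is hypothetical and is not taken from the spec or from html5lib.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def preprocess(text: str) -> str:
    """Replace U+0000 with U+FFFD and leave every other character untouched."""
    return text.replace("\x00", REPLACEMENT_CHARACTER)


if __name__ == "__main__":
    sample = "a\x00b\x07c"
    # U+0000 is replaced; U+0007 (BEL) is preserved as-is.
    print([hex(ord(ch)) for ch in preprocess(sample)])
```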
- # [11:01] <mikeday> okay, one more thing: what does RCDATA stand for?
- # [11:02] <zcorpan_> replaced character data
- # [11:03] <mikeday> what exactly is replaced about it?
- # [11:03] <zcorpan_> entities
- # [11:03] <mikeday> can have entities... ah.
- # [11:32] <annevk> doesn't really matter what it stands for...
- # [11:32] <annevk> just implement the steps
- # [11:32] * Joins: met_ (n=Hassman@r5bx220.net.upc.cz)
- # [11:33] * Joins: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net)
- # [11:37] * Joins: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [11:39] * Joins: peepo (n=Jay@host81-132-186-246.range81-132.btcentralplus.com)
- # [11:39] * Quits: peepo (n=Jay@host81-132-186-246.range81-132.btcentralplus.com) (Remote closed the connection)
- # [11:40] * Joins: peepo (n=Jay@host81-132-186-246.range81-132.btcentralplus.com)
- # [11:48] * Joins: maikmerten (n=maikmert@Lba02.l.pppool.de)
- # [11:49] * Joins: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com)
- # [11:51] * Quits: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com) (Read error: 110 (Connection timed out))
- # [11:57] <annevk> oh, whatwg is down?
- # [11:58] <annevk> is the mail server down too?
- # [11:58] * annevk wonders how that works
- # [12:01] <annevk> it seems that lists.whatwg.org is not down
- # [12:01] <annevk> on the other hand, my e-mail hasn't made it through to the archives yet...
- # [12:07] * Joins: jdandrea (n=jdandrea@ool-44c0a1fe.dyn.optonline.net)
- # [12:26] <mikeday> hi annevk
- # [12:27] <mikeday> took a look at the html5lib code, looks rather clean
- # [12:28] <mikeday> just toying with some C code
- # [12:28] <mikeday> it's a shame that you've got to do so much irrelevant stuff in C, though.
- # [12:31] <annevk> python is nice
- # [12:31] <annevk> especially to "quickly" prototype stuff like this
- # [12:31] <annevk> the problem is that it doesn't scale well for very large pages, such as the HTML5 spec
- # [12:32] <mikeday> you could probably speed it up, at the risk of making the code much uglier...
- # [12:34] <annevk> yeah... rather have a fast C implementation with Python wrappers I think
- # [12:35] * Quits: peepo (n=Jay@host81-132-186-246.range81-132.btcentralplus.com) (Read error: 104 (Connection reset by peer))
- # [12:36] <mikeday> that's the spirit, outsource the ugliness somewhere else :)
- # [12:38] * mikeday ponders
- # [12:38] <mikeday> the data state can have a very tight inner loop, just scanning for the next & or <
- # [12:39] <annevk> or EOF
- # [12:39] <annevk> charsUntil() handles EOF automatically
- # [12:39] <annevk> so you know
- # [12:40] <mikeday> I'm assuming you're working on a chunk of data, so you know there is no EOF in the middle of the chunk
- # [12:41] <annevk> if you do script execution document.close() might do that
- # [12:41] * annevk isn't sure
- # [12:41] <annevk> but it depends on how you implement stuff, I guess
- # [12:41] <mikeday> right
- # [12:42] <mikeday> I wonder which is faster: if '&' else if '<' else ..., or a table lookup
- # [12:42] <mikeday> eg. if charTable[currChar] == MARKUP_CHAR
- # [12:43] <annevk> from the little I know I believe table lookup is faster
- # [12:43] <annevk> however, how would you handle "any other character" in that case?
- # [12:43] <annevk> (I don't think I'm the right person to discuss this with though.)
- # [12:44] <mikeday> any other character would be the else case
- # [12:45] <annevk> that would work nicely then I suppose
- # [12:45] <mikeday> if (... == MARKUP_CHAR) { change state } else { keep accumulating character data }
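Editorial sketch of the two strategies being compared for the data-state inner loop, assuming the parser works on a chunk of bytes with no EOF inside it. The names `CHAR_TABLE`, `scan_if_else` and `scan_table` are hypothetical. Python is used only to show the logic; its timings would say little about the C code mikeday is experimenting with.

```python
MARKUP_CHAR = 1
OTHER_CHAR = 0

# 256-entry table indexed by byte value; only '&' and '<' are marked.
CHAR_TABLE = [OTHER_CHAR] * 256
CHAR_TABLE[ord("&")] = MARKUP_CHAR
CHAR_TABLE[ord("<")] = MARKUP_CHAR


def scan_if_else(chunk: bytes, pos: int = 0) -> int:
    """Return the index of the next '&' or '<', or len(chunk) if none."""
    while pos < len(chunk):
        c = chunk[pos]
        if c == 0x26:  # '&'
            break
        elif c == 0x3C:  # '<'
            break
        pos += 1  # any other character: keep accumulating character data
    return pos


def scan_table(chunk: bytes, pos: int = 0) -> int:
    """Same result as scan_if_else, using one table lookup per byte."""
    while pos < len(chunk) and CHAR_TABLE[chunk[pos]] != MARKUP_CHAR:
        pos += 1
    return pos


if __name__ == "__main__":
    data = b"hello &amp; <b>"
    print(scan_if_else(data), scan_table(data))
```

Both functions stop at the same position, so either can be swapped in without changing the surrounding state machine.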
- # [12:45] <mikeday> always frustrates me that efficient code looks less and less like the specification, though
- # [12:45] <mikeday> we still don't have a magical compiler that converts spec -> code
- # [12:46] <annevk> just use the tests from html5lib
- # [12:46] <annevk> and maybe contribute some more
- # [12:46] <annevk> and pay some attention to the spec too :)
- # [12:47] <mikeday> right :)
- # [12:50] <mikeday> hmm, using the HTML5 spec as a test document is rather meta
- # [12:50] <mikeday> especially considering it's not very well-formed :/
- # [12:51] <annevk> the multipage version of HTML5 is generated using html5lib
- # [12:51] <annevk> that's meta
- # [12:52] <mikeday> neat :)
- # [12:55] <hsivonen> mikeday: do you use a DFA for XML?
- # [12:56] <mikeday> hsivonen, not yet, but I'd like to
- # [12:56] <mikeday> I've generated one, but haven't got around to building a parser around it yet.
- # [12:56] <hsivonen> mikeday: surely a function call per tokenizer state is good enough considering that it is the de facto way to write XML parsers
- # [12:57] * mikeday shrugs
- # [12:57] <mikeday> for HTML5 you mean?
- # [12:57] <hsivonen> I intend to optimize away the explicit state variable but I hesitate going all the way to a hand-rolled DFA
- # [12:58] <hsivonen> mikeday: I meant a function call (possibly inlined by compiler) per state in the HTML5 tokenizer spec
- # [12:58] <hsivonen> mikeday: the XML parsers that I've looked at work roughly that way
- # [12:59] <mikeday> after looking at the spec, I've seen that the state machine is rather more complicated than the average DFA
- # [12:59] <mikeday> with XML it's easier, as you're going from grammar to DFA
- # [13:00] <annevk> there are some additional switches indeed based on tree construction feedback
- # [13:00] <annevk> although I think you should be able to integrate those too
- # [13:00] <mikeday> right, it would take a bit of messing around though
- # [13:00] <annevk> (it leads you further away from the spec though)
- # [13:00] <mikeday> that too.
- # [13:00] <annevk> shouldn't be much of an issue I think...
- # [13:01] <mikeday> by the way, a tiny test seems to show that the if/else is slightly faster than table
- # [13:01] <mikeday> if only two characters are being checked for
- # [13:01] <annevk> see, don't trust me :)
- # [13:01] <mikeday> but if three or more characters are being checked for, table wins by far
- # [13:01] <annevk> oh, ok :)
- # [13:01] <mikeday> eg. for whitespace characters it would be a win
- # [13:02] <mikeday> for the data state inner loop, not so much
- # [13:02] <hsivonen> I wonder if it is possible to construct a hash function that hashes all UTF-16 code units to a small range of integers so that markup-significant characters get unique scalars and neutral characters overlap
- # [13:03] * mikeday grins
- # [13:03] <hsivonen> (and effient one, that is)
- # [13:03] <hsivonen> efficient even
- # [13:03] <mikeday> let's see, markup significant characters are all < U+007F
- # [13:04] <mikeday> just make sure that everything above 127 is mapped to 127..255 range
- # [13:05] <mikeday> and ASCII stays as it is
- # [13:05] <mikeday> or do you want & and < to map to the same small integer?
- # [13:05] <hsivonen> didn't think that far
- # [13:05] <hsivonen> gotta go. later
- # [13:05] * mikeday waves
- # [13:06] <mikeday> hrm, jumping into the micro-optimisation, I forgot that no one uses UTF-16 anyway
- # [13:06] <mikeday> (for given values of no one)
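Editorial sketch of the mapping mikeday outlines: ASCII code units stay as they are and everything above 127 is folded into the upper half of a byte, so a 256-entry table can classify any UTF-16 code unit. The function name `fold_code_unit` and the exact folding (keeping the low seven bits) are assumptions made for illustration. This version folds into 128..255 rather than 127..255 so that U+007F keeps its own slot.

```python
def fold_code_unit(cu: int) -> int:
    """Map a UTF-16 code unit (0..0xFFFF) to 0..255.

    ASCII values are unchanged, so markup-significant characters keep
    unique values; all other code units share the range 128..255.
    """
    if cu < 0x80:
        return cu
    return 0x80 | (cu & 0x7F)


if __name__ == "__main__":
    for ch in "<&\u00e9\u4e2d":
        print(hex(ord(ch)), "->", hex(fold_code_unit(ord(ch))))
```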
- # [13:08] * Quits: aroben (n=adamrobe@c-67-160-250-192.hsd1.ca.comcast.net) (Read error: 110 (Connection timed out))
- # [13:10] <annevk> in some states unicode chars are important
- # [13:10] <mikeday> ?
- # [13:10] <annevk> tag name state
- # [13:10] <annevk> but I suppose that doesn't matter much
- # [13:11] <annevk> that's actually in the anything else case so...
- # [13:11] <annevk> nm me
- # [13:11] <mikeday> I noticed that the tag names all get lowercased
- # [13:11] <mikeday> that would mean that <camelCase> XML tags can't be embedded in HTML5, right?
- # [13:14] <annevk> ASCII lowercase, yes
- # [13:14] <annevk> XML can't be embedded in HTML5
- # [13:14] <mikeday> true, you could have camelCase tags as long as they use accented letters :)
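Editorial sketch of the "ASCII lowercase" point: only A-Z are folded, so accented capitals survive, which is what mikeday's joke relies on. The helper name `ascii_lowercase` is hypothetical. Note that Python's own `str.lower()` is Unicode-aware and would lowercase the accented letter too.

```python
import string

# Translation table that touches only the 26 ASCII uppercase letters.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(name: str) -> str:
    """Lowercase A-Z only; leave every non-ASCII character unchanged."""
    return name.translate(ASCII_LOWER)


if __name__ == "__main__":
    print(ascii_lowercase("camelCase"))
    print(ascii_lowercase("caf\u00e9\u00c9cole"))
```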
- # [13:15] <mikeday> are unknown tags still added to the DOM?
- # [13:16] <annevk> of course
- # [13:16] <annevk> there's in fact no difference between "unknown tags" and <span> for instance
- # [13:16] <annevk> (iirc)
- # [13:17] <mikeday> so arbitrary vocabularies can be included,
- # [13:17] <mikeday> as long as they don't require <camelCase>
- # [13:17] <mikeday> or plain uppercase, for that matter
- # [13:17] <mikeday> seems like MathML would work fine
- # [13:18] <annevk> there's no namespace support either
- # [13:19] <annevk> but in due course we would add limited support for that I suppose
- # [13:20] <Philip`> annevk: Doesn't http://dev.w3.org/cvsweb/~checkout~/html5/spec/Overview.html?rev=1.12&content-type=text/html;%20charset=iso-8859-1#pixel cover the points about how an arbitrary object is treated as ImageData?
- # [13:21] <annevk> oh, I think I've been looking at an old version of the spec
- # [13:22] <Philip`> mikeday: I'd expect table lookups to usually be much slower than if/elses in real programs because you won't be able to keep the lookup table in the cache for very long (if you're processing lots of other data at the same time) and it'll have to do really expensive memory reads
- # [13:23] <Philip`> People used to use lookup tables for fast sin/cos calculations, but now it's much quicker just to get the CPU to recalculate it every time because memory is slow
- # [13:23] <mikeday> Philip`, the table is pretty small, 256 bytes, but the processing other data at the same time constraint could be a problem
- # [13:24] <Philip`> Caches are pretty small too :-)
- # [13:24] <annevk> thanks Philip`
- # [13:24] <Philip`> (like, uh, 16KB or something?)
- # [13:24] <Philip`> (depending on what processor you have)
- # [13:25] <mikeday> the whitespace test requires five else if branches, though
- # [13:26] <mikeday> at least it wouldn't be hard to try both methods on real world data
- # [13:26] <mikeday> as it's not really fundamental to the structure of the code
- # [13:33] <met_> annevk why is on http://annevankesteren.nl/2006/08-paintr21 It works in Firefox (given a few hacks), with the notable exception of the "Save it!" button.? Save works for me in FF 2.0.0.3
- # [13:34] <met_> the only difference is nice Paintr logo in Opera vs. text logo in FF
- # [13:35] * Quits: mikeday (n=mikeday@CPE-60-224-50-129.vic.bigpond.net.au) ("-")
- # [13:36] <met_> ah see the logo is made by css content:url
- # [13:38] <annevk> that thing was made before FF2
- # [13:38] <met_> can you update the text? 8-)))
- # [13:43] * Joins: bzed (n=bzed@dslb-084-059-108-031.pools.arcor-ip.net)
- # [13:44] * Joins: dk (i=dk@gouax1-151.dialup.optusnet.com.au)
- # [14:19] * Philip` wonders if <div irrelevant><img ...><img ...></div> would be a sensible way of pre-loading images to be used in a canvas, so you can just wait for window.onload and then be sure all the images are loaded
- # [14:20] <annevk> I think if you do img.src in a script the load event is delayed as well
- # [14:22] <Philip`> Oh, that sounds better
- # [14:29] <Dashiva> What's the deal with r\^ole?
- # [14:30] <Philip`> It's the (La)TeX spelling, I believe
- # [14:31] <Dashiva> of rôle?
- # [14:32] <Philip`> Maybe, but my IRC client mangles that
- # [14:32] * Philip` looks in the log
- # [14:32] <Philip`> Ah, yes, that
- # [14:33] <Philip`> Same as rôle too, but not quite so ugly
- # [14:34] <Dashiva> But what's wrong with just role, was more my question
- # [14:34] <Lachy> aargh! I've asked 3 times for Patrick (or anyone else) to provide examples of tables that would benefit from the headers attribute, and each time he's bypassed the question entirely
- # [14:34] <annevk> lol, people are wasting their time on www-html? :)
- # [14:34] <Lachy> it's so annoying that they won't contribute when asked, and then bitch about being ignored
- # [14:35] <annevk> they are indeed
- # [14:35] <Philip`> Oh - just spelling it "role" seems far more sensible :-)
- # [14:35] <annevk> fun
- # [14:35] <Dashiva> Isn't that what the semantic web is all about?
- # [14:35] <Dashiva> Getting other people to do all the work, and then complaining about nothing happening
- # [14:40] <Philip`> That sounds like the approach of getting authors to mark up all their data correctly in a machine-processable form, so you can build advanced search engines on the semantic web that correctly understand the relationships between pieces of data
- # [14:41] <Philip`> compared to e.g. Google, which just puts up with whatever rubbish authors create
- # [14:41] <Philip`> but it's kind of obvious which one is doing better at the moment
- # [14:48] <maikmerten> wow, seems Opera's layout engine is 1345% more green than other competing engines... impressive http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(WHATWG)
- # [14:49] <maikmerten> one keeps wondering why such things make it into Wikipedia
- # [14:49] <Dashiva> Probably because all browsers have their share of fanatical fanboys
- # [14:50] <annevk> prolly also because it doesn't list all the WHATWG features
- # [14:53] <Philip`> You could replace the whole first table with "Web Forms 2.0: No ? Yes" and then Opera wouldn't be seen as having such an unfair lead
- # [14:54] <Dashiva> Thinking of it as a lead is a problem to begin with, IMO
- # [14:55] <annevk> Safari for instance does support type=range iirc
- # [14:55] <annevk> Firefox supports persistent storage
- # [14:55] <Philip`> Also one could change <video> to no in Opera, because it's not fair to count very experimental builds that don't even match the WA1 spec
- # [14:55] <annevk> Internet Explorer supports parts of drag & drop, draggable, contenteditable, etc.
- # [14:57] * Philip` wonders if anyone has made a <canvas> paint program that can save and load from globalStorage
- # [14:57] <Philip`> Oh, actually, that wouldn't work because you can't draw data: images then call toDataURL again :-(
- # [15:01] <annevk> Maybe the new definition of origin helps with that?
- # [15:02] <annevk> Cause in theory that would be a safe image, unless you got it after a redirect
- # [15:02] <Philip`> "The origin of a Document or image that was generated from a data: URI found in another Document or in a script is the origin of the that Document or script." - oh, sounds like that covers it
- # [15:03] * theoros|asleep is now known as theoros
- # [15:03] <annevk> Although if you store it in globalStorage and then retrieve it later...
- # [15:03] * annevk ponders
- # [15:04] <Philip`> You'd just get a string out of globalStorage, and I assume strings don't have complex security arrangements
- # [15:04] <Philip`> and then you'd create an image from that string, but that image would be created in your own document
- # [15:05] <annevk> sounds tricky
- # [15:06] <Philip`> (If you've got the data: string, you could rewrite libpng in JS and get the image data anyway, so the only problem is in whether you're allowed to get the string in the first place)
- # [15:06] <Philip`> (and you should be allowed to get strings from globalStorage, because otherwise it'd be a bit pointless...)
- # [15:06] <Philip`> but I don't know if that agrees with what the spec says
- # [15:07] <annevk> I suppose data: URLs not retrieved from <img> objects or non-same origin <canvas> objects are to be considered safe
- # [15:07] <annevk> and that therefore invoking toDataURL() should not fail and drawImage() should not mark the <canvas> object non-same origin
- # [15:11] <annevk> I suppose the problem is that painting a data URL might not always be safe
- # [15:53] * Quits: gsnedders (n=gsnedder@host86-139-123-225.range86-139.btcentralplus.com)
- # [16:00] * Joins: gsnedders (n=gsnedder@host86-139-123-225.range86-139.btcentralplus.com)
- # [16:34] * Quits: Lachy (n=Lachlan@203-217-95-91.dyn.iinet.net.au) (Read error: 104 (Connection reset by peer))
- # [16:54] * Joins: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com)
- # [17:10] * Joins: Lachy (n=Lachlan@203-217-95-91.dyn.iinet.net.au)
- # [17:45] * Joins: csarven (n=nevrasc@modemcable081.152-201-24.mc.videotron.ca)
- # [17:48] <annevk> http://weblog.200ok.com.au/2007/05/what-i-want-from-new-markup-spec.html
- # [17:52] <Lachy> hmm. Looks like we need some kind of tutorial to explain how the heading structure works
- # [17:53] <annevk> http://www.kavoir.com/2007/05/html5-adopted-by-w3c.html is someone who thinks Chris Wilson will be editor
- # [17:56] <Philip`> Also thinks Microsoft is one of the key contributing groups in the WHAT-WG
- # [17:56] <annevk> http://ma.gnolia.com/people/apartness/bookmarks/prejesh
- # [17:58] <annevk> http://www.designerstalk.com/forums/web-standards/26075-web-standards-danger.html
- # [17:59] <annevk> http://www.elementary-group-standards.com/web-standards/web-standards-html5-support-existing-content.html
- # [18:07] * met_ is glad he is writing in Czech only, so all his mistakes cannot be discussed here 8-)
- # [18:08] <annevk> I wonder why people on www-html think there was some arbitrary decision process going on... The sole reason <samp> and such are still here is because dropping them would cost more.
- # [18:09] <annevk> I think there have hardly been any arbitrary decisions with regards to HTML5
- # [18:14] <wilhelm> Why would one want to drop such elements?
- # [18:17] <csarven> annevk tsk tsk <m>
- # [18:19] <Lachy> annevk, I think he's just using code, samp, etc. to make a point about dropping things like headers="" and summary=""
- # [18:20] <Lachy> personally, I somewhat agree with keeping headers (I'm just trying to get them to help find evidence for it), though I'm undecided about summary
- # [18:22] <Philip`> http://canvex.lazyilluminati.com/misc/summary.html is how people seem to be using summary now
- # [18:23] <Philip`> ((Can't remember if I pointed that out here before))
- # [18:25] <Lachy> Philip`, what was the total sample size surveyed?
- # [18:27] <Lachy> wow, so many of them are used for presentational purposes
- # [18:30] <Philip`> That was 2523 pages, of which 105 had a summary attribute anywhere
- # [18:31] * Joins: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [18:31] <Lachy> I think we need a larger sample size
- # [18:31] <Philip`> The results are probably misleading because a few sites have a lot of distinct summaries
- # [18:32] <Lachy> the results should be grouped by domain name to deal with that
- # [18:33] <Philip`> It also seems quite hard to analyse the results automatically since pretty much everyone uses totally different strings (except for those that use "")
- # [18:33] <Philip`> But it would be useful to get much better data than this
- # [18:34] <Lachy> yeah, you could probably try to filter on things like the word "layout" and maybe the length (e.g. < 4 words is relatively useless)
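Editorial sketch of the two refinements suggested above: group the collected summary values by domain so a few sites do not dominate, and flag values that mention "layout" or have fewer than four words. The function names and the sample rows are hypothetical; they are not Philip`'s data or tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def looks_useless(summary: str) -> bool:
    """Heuristic from the discussion: mentions 'layout' or has < 4 words."""
    return len(summary.split()) < 4 or "layout" in summary.lower()


def group_by_domain(rows):
    """Collect the distinct summary strings seen on each host name."""
    grouped = defaultdict(set)
    for url, summary in rows:
        grouped[urlsplit(url).hostname].add(summary)
    return dict(grouped)


# Made-up rows of (page URL, summary attribute value), for illustration only.
SAMPLE = [
    ("http://example.com/a", "layout table"),
    ("http://example.com/b", "layout table"),
    ("http://example.org/", "Quarterly sales figures by region and product line"),
    ("http://example.net/", ""),
]

if __name__ == "__main__":
    for host, summaries in sorted(group_by_domain(SAMPLE).items()):
        for text in sorted(summaries):
            print(host, repr(text), "useless" if looks_useless(text) else "ok")
```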
- # [18:37] * Philip` would try to do something better if he didn't have far too much urgent work to do now instead :-)
- # [18:38] * Quits: dk (i=dk@gouax1-151.dialup.optusnet.com.au) (Read error: 60 (Operation timed out))
- # [18:38] <Lachy> are you going to release the code of the tool soon, so others can work with it?
- # [18:39] <Philip`> I'll attempt to do that once I have time
- # [18:39] <Philip`> It's not like it's particularly interesting or difficult code, though - it just downloads a load of pages into a database, then parses them all and walks through the tree trying to find things that match some condition, then sticks the results in a table
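Editorial sketch of the "parse, walk, collect" step described above. Philip`'s tool parses with html5lib (see his later message); to keep this example self-contained it swaps in the standard library's `html.parser`, which is a different and less forgiving parser. The class and function names are hypothetical, and the download and database steps are left out.

```python
from html.parser import HTMLParser


class SummaryCollector(HTMLParser):
    """Record the value of every summary attribute found on any element."""

    def __init__(self):
        super().__init__()
        self.summaries = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; a bare attribute has value None.
        for name, value in attrs:
            if name == "summary":
                self.summaries.append(value or "")


def collect_summaries(html: str) -> list:
    collector = SummaryCollector()
    collector.feed(html)
    collector.close()
    return collector.summaries


if __name__ == "__main__":
    page = '<TABLE summary="Layout"><tr><td>x</td></tr></table><table></table>'
    print(collect_summaries(page))
```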
- # [18:40] <Philip`> (Can you get something like an XML database that does really fast queries on tree-structured data? That'd be quite handy for this kind of thing, after working around the problem that lots of sites can't be serialised into well-formed XML)
- # [18:41] <zcorpan_> TagSoup?
- # [18:41] <met_> Philip` have you some experience with xml databases?
- # [18:42] <Philip`> met_: None at all
- # [18:42] <met_> my colleagues recommended http://exist.sourceforge.net/ to me but i never tried it
- # [18:43] <met_> also you can use xml in postgresql (with xpath etc.), not to mention Oracle and MS SQL
- # [18:45] <Philip`> Ah, looks like it could be useful
- # [18:46] <met_> and here is a link about postgresql and xpath http://www.throwingbeans.org/postgresql_and_xml.html
- # [18:47] <met_> ms sql2005 and oracle (not sure which version) have it natively as xml datatypes
- # [18:48] <Philip`> Hopefully the databases do some kind of indexing, because running unindexed queries over 100MB of XML doesn't sound like the absolute fastest thing ever
- # [18:49] <Philip`> or maybe I'm thinking from the wrong perspective for this kind of thing
- # [18:49] <met_> ms and oracle yes
- # [18:52] <Philip`> (For added fun, some of my downloaded documents are actually PDF files, parsed by html5lib into something that I expect is quite hideous. Maybe I should check the content-type on these things...)
- # [18:53] <met_> whow
- # [18:53] <met_> and what other types like *.doc etc
- # [18:54] <Philip`> I don't see any of those
- # [18:54] <Philip`> I just got the URLs from Yahoo search results (since they're nicer than Google and still provide search APIs), so it's limited to what files they think are worth putting in the results
- # [19:06] * Joins: weinig|zZz (n=weinig@m810f36d0.tmodns.net)
- # [19:07] * weinig|zZz is now known as weinig
- # [19:19] * Quits: weinig (n=weinig@m810f36d0.tmodns.net)
- # [19:50] * Joins: dbaron (n=dbaron@c-71-198-189-81.hsd1.ca.comcast.net)
- # [20:14] * Joins: zcorpan (n=zcorpan@217-211-77-236-no13.tbcn.telia.com)
- # [20:30] * Joins: kingryan (n=kingryan@dsl081-240-149.sfo1.dsl.speakeasy.net)
- # [20:32] <annevk> csarven, what about it?
- # [20:32] <csarven> i find <m> arbitrary but im sure <samp> has its own story
- # [20:33] * Quits: zcorpan_ (n=zcorpan@217-211-77-236-no13.tbcn.telia.com) (Read error: 110 (Connection timed out))
- # [20:33] <annevk> <samp> is just there because dropping it would have little value
- # [20:34] <annevk> <m> is there because lots of pages use it
- # [20:34] <annevk> aiui
- # [20:34] <Philip`> I thought HTML5 was starting from a clean slate and only adding features when there's good enough reasons to justify adding them...
- # [20:35] <csarven> lots of pages use lots of things =)
- # [20:35] <Philip`> (or at least I'm fairly sure I remember people using that as an argument)
- # [20:35] <csarven> Philip` that would be the ideal approach but it is not always the case
- # [20:36] <annevk> Philip`, in general, yes
- # [20:56] * Quits: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [21:03] * annevk tends to agree with David Baron that for implementors every HTML feature needs to be specified
- # [21:03] <annevk> (this includes <frameset>)
- # [21:10] * Joins: h3h (n=w3rd@cpe-66-75-149-197.san.res.rr.com)
- # [21:19] <hsivonen> annevk: yeah. If you build navigation systems, you need to know that the earth is round even if a flat earth would be nicer
- # [21:21] <Lachy> annevk, are you referring to David's latest on www-html? I didn't get the relevance, since the discussion was related to document conformance only.
- # [21:22] * Parts: zcorpan (n=zcorpan@217-211-77-236-no13.tbcn.telia.com)
- # [21:22] <annevk> the contents of his e-mail are relevant imo
- # [21:22] <annevk> although I agree it didn't make much sense in context
- # [21:23] <Lachy> sure, it's relevant to the spec in general
- # [21:23] * Parts: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com)
- # [21:23] <hsivonen> is there now relevant discussion on www-html? I unsubscribed to respect the HTML WG email recess.
- # [21:24] * Joins: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com)
- # [21:24] <Lachy> hsivonen, not really
- # [21:24] <Lachy> I'll let you know when something important is posted
- # [21:25] <hsivonen> Lachy: thanks
- # [21:26] <Lachy> nice! I can refer to this next time someone tries to shift the burden of proof on to me to disprove their claim http://en.wikipedia.org/wiki/Burden_of_proof#Science_and_other_uses
- # [21:37] * Joins: BenWard (n=BenWard@cpc3-cmbg2-0-0-cust58.cmbg.cable.ntl.com)
- # [21:50] * Joins: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
- # [22:12] * Quits: maikmerten (n=maikmert@Lba02.l.pppool.de) ("Leaving")
- # [22:27] <tantek> Lachy, nice reference, I hadn't seen that before and ended up writing up our own for microformats.org: http://microformats.org/wiki/brainstorming#Burden_of_Proof
- # [22:28] * Joins: zcorpan (n=zcorpan@84-216-40-20.sprayadsl.telenor.se)
- # [22:46] * Quits: ROBOd (n=robod@86.34.246.154) ("http://www.robodesign.ro")
- # [23:00] * Quits: tantek (n=tantek@adsl-63-195-114-133.dsl.snfc21.pacbell.net)
- # [23:03] * Joins: jdandrea_ (n=jdandrea@ool-44c0a58f.dyn.optonline.net)
- # [23:07] * Joins: JonT (n=opera@ti221110a080-11581.bb.online.no)
- # [23:11] * Parts: JonT (n=opera@ti221110a080-11581.bb.online.no)
- # [23:12] * Joins: JonT (n=opera@ti221110a080-11581.bb.online.no)
- # [23:12] * Parts: JonT (n=opera@ti221110a080-11581.bb.online.no)
- # [23:20] * Quits: jdandrea (n=jdandrea@ool-44c0a1fe.dyn.optonline.net) (Read error: 110 (Connection timed out))
- # [23:22] * Quits: hasather (n=hasather@81-235-209-174-no62.tbcn.telia.com) (Read error: 110 (Connection timed out))
- # [23:34] * Quits: met_ (n=Hassman@r5bx220.net.upc.cz) ("Chemists never die, they just stop reacting.")
- # [23:37] * Joins: Philip`_ (n=philip@zaynar.demon.co.uk)
- # [23:37] * Parts: BenWard (n=BenWard@cpc3-cmbg2-0-0-cust58.cmbg.cable.ntl.com)
- # [23:47] * Quits: Philip` (n=philip@zaynar.demon.co.uk) (Read error: 110 (Connection timed out))
- # [23:51] * Joins: mpt (n=mpt@canonical/launchpad/mpt)
- # Session Close: Mon May 14 00:00:00 2007
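Philip`'s aside at 18:52 about PDF files ending up among the downloaded documents suggests filtering by the Content-Type header before handing anything to an HTML parser. The following is only a minimal sketch in Python (the language html5lib is written in), assuming the header value was stored alongside each downloaded URL. The names `HTML_TYPES`, `media_type`, `looks_like_html`, and `filter_html` are hypothetical and do not come from the log.

```python
# Hypothetical helper names; the log does not say how the documents were stored.

# Media types treated as HTML for this sketch (an assumption, adjust as needed).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(header_value):
    """Return the lowercased media type, dropping parameters such as charset."""
    return header_value.split(";", 1)[0].strip().lower()


def looks_like_html(header_value):
    """True if the Content-Type header value names one of HTML_TYPES."""
    return media_type(header_value or "") in HTML_TYPES


def filter_html(downloads):
    """Keep only the (url, content_type) pairs whose type looks like HTML."""
    return [(url, ctype) for url, ctype in downloads if looks_like_html(ctype)]


if __name__ == "__main__":
    sample = [
        ("http://example.org/", "text/html; charset=utf-8"),
        ("http://example.org/paper.pdf", "application/pdf"),
        ("http://example.org/unknown", None),
    ]
    for url, ctype in filter_html(sample):
        print(url, ctype)
```

Servers sometimes send a wrong or missing Content-Type, so a check like this would only reduce the number of non-HTML documents rather than remove them all.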