- # Session Start: Mon Dec 22 00:00:00 2008
- # Session Ident: #whatwg
- # [00:00] * Joins: shepazu (n=schepers@cpe-65-29-70-220.indy.res.rr.com)
- # [00:02] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [00:04] * Joins: karlushi (n=karl@74.58.58.53)
- # [00:10] * Parts: erlehmann (n=erlehman@86.59.25.121)
- # [00:11] * Quits: Maurice (n=copyman@5ED548D4.cable.ziggo.nl) ("Disconnected...")
- # [00:11] * Joins: erlehmann (n=erlehman@86.59.25.121)
- # [00:13] * Quits: karlcow (n=karl@216.144.126.222) (Read error: 113 (No route to host))
- # [00:17] * Joins: hdh (n=hdh@58.187.23.189)
- # [00:18] * Quits: wakaba (n=wakaba@220.210.164.189) (Read error: 54 (Connection reset by peer))
- # [00:18] * Joins: wakaba_ (n=wakaba@189.164.210.220.dy.bbexcite.jp)
- # [00:46] * Joins: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
- # [00:49] * Quits: shepazu (n=schepers@cpe-65-29-70-220.indy.res.rr.com)
- # [00:51] * Quits: karlushi (n=karl@74.58.58.53) (Read error: 60 (Operation timed out))
- # [00:56] * Joins: karlushi (n=karl@216.144.126.222)
- # [00:59] * Joins: jruderman__ (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
- # [01:00] * Quits: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 60 (Operation timed out))
- # [01:02] * Quits: jruderman_ (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 110 (Connection timed out))
- # [01:03] <Philip`> [ ["Character", "txet>x lmth EPYTCOD!"], "ParseError", ["Character", "<"] ]
- # [01:03] <Philip`> Hmm, that doesn't quite look right
- # [01:04] * Joins: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
- # [01:09] * Joins: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
- # [01:18] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [01:19] * Joins: shepazu (n=schepers@adsl-76-252-31-89.dsl.ipltin.sbcglobal.net)
- # [01:20] * Quits: shepazu (n=schepers@adsl-76-252-31-89.dsl.ipltin.sbcglobal.net) (Remote closed the connection)
- # [01:21] * Quits: jruderman__ (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 110 (Connection timed out))
- # [01:33] * Joins: jruderman_ (n=jruderma@ip68-5-179-249.oc.oc.cox.net)
- # [01:36] * Quits: jruderman (n=jruderma@ip68-5-179-249.oc.oc.cox.net) (Read error: 60 (Operation timed out))
- # [01:50] <Philip`> Hooray, now my OCaml code passes all of the tokeniser tests (excluding the content model / escape flag ones)
- # [01:50] <Philip`> It's sometimes horrendously inefficient, e.g. every time it consumes an entity it sorts the whole entity list by length and then iterates through to find the first match, but that's okay because efficient is a non-goal
- # [01:51] <Philip`> s/efficient/efficiency/
- # [01:51] <famicom> eh
- # [01:52] <famicom> simplicity>consistency>efficiency
- # [01:52] <takkaria> except, say, when writing parsers that need to be time-efficient, when efficiency is a pretty important thing
- # [01:52] <famicom> takkaria
- # [01:52] <famicom> repeat after me: "Premature optimization is the root of all evil"
- # [01:53] <takkaria> I'm not talking about premature optimisation :)
- # [01:53] <Philip`> Overly late optimisation is a problem too - you have to be careful to get it just right :-)
- # [01:54] <famicom> philip: You mean like mozilla firefox?
- # [01:54] <famicom> which is a piece of bloat
- # [01:54] <famicom> it crashed when i tried to open 109 bookmarks at the same time
- # [01:55] <Philip`> (My OCaml thing is meant to act as a flexible reference implementation rather than as a usable parser, but the idea is to be able to compile that implementation into efficient code in other languages)
- # [02:00] <Philip`> http://philip.html5.org/misc/tokeniser_states.png
- # [02:45] * Quits: tndH (n=Rob@adsl-83-100-138-116.karoo.KCOM.COM) ("ChatZilla 0.9.84-rdmsoft [XULRunner 1.9.0.1/2008072406]")
- # [02:51] * Quits: Amorphous (i=jan@unaffiliated/amorphous) (Read error: 110 (Connection timed out))
- # [02:53] * Joins: Amorphous (i=jan@unaffiliated/amorphous)
- # [02:57] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
- # [04:07] * Parts: erlehmann (n=erlehman@86.59.25.121)
- # [04:07] * Joins: erlehmann (n=erlehman@86.59.25.121)
- # [04:07] * Parts: erlehmann (n=erlehman@86.59.25.121)
- # [04:08] * Joins: erlehmann (n=erlehman@86.59.25.121)
- # [04:17] * Joins: MikeSmith (n=MikeSmit@58.157.21.205)
- # [04:28] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) ("This computer has gone to sleep")
- # [04:46] <jwalden> Philip`: that's the entire state-transition diagram for HTML5, I take it? beats out ECMA-262 for simplicity as I recall
- # [05:05] * Joins: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
- # [05:21] * Quits: doublec (n=chris@202.0.36.64) ("Leaving")
- # [05:43] * Joins: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz)
- # [05:48] * Quits: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz) (Read error: 104 (Connection reset by peer))
- # [05:48] * Joins: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz)
- # [05:58] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 54 (Connection reset by peer))
- # [05:59] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
- # [06:06] * Quits: heycam (n=cam@clm-laptop.infotech.monash.edu.au) ("bye")
- # [06:25] * Quits: doublec (n=Chris_Do@118-92-151-230.dsl.dyn.ihug.co.nz) ("ChatZilla 0.9.79-rdmsoft [XULRunner 1.8.0.9/2006120508]")
- # [06:38] * Quits: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
- # [06:47] * Quits: karlushi (n=karl@216.144.126.222) (Read error: 113 (No route to host))
- # [06:50] * Joins: ap (n=ap@195.239.126.12)
- # [06:57] * Quits: Sephr (n=Sephr@c-68-38-250-93.hsd1.pa.comcast.net) ("Sephr.net")
- # [07:03] * Joins: harig (n=harig_in@122.160.12.230)
- # [07:16] * Joins: aboodman2 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [07:41] * Joins: maikmerten (n=merten@ls5dhcp195.cs.uni-dortmund.de)
- # [07:44] * Joins: heycam (n=cam@210-84-45-25.dyn.iinet.net.au)
- # [07:54] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
- # [07:54] * Joins: jacobolus_ (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
- # [07:56] * Quits: aboodman2 (n=aboodman@dsl081-073-212.sfo1.dsl.speakeasy.net)
- # [08:01] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [08:09] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [08:09] * Joins: weinig (n=weinig@c-69-181-81-233.hsd1.ca.comcast.net)
- # [08:17] * Quits: jacobolus_ (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 104 (Connection reset by peer))
- # [08:18] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
- # [08:27] * Joins: pesla (n=retep@procurios.xs4all.nl)
- # [08:34] * Joins: pergj (n=pergj@195.159.61.155)
- # [08:38] * Quits: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net) (Read error: 131 (Connection reset by peer))
- # [08:38] * Joins: jacobolus (n=jacobolu@pool-71-104-189-240.lsanca.dsl-w.verizon.net)
- # [08:53] * Quits: pergj (n=pergj@195.159.61.155) ("Ex-Chat")
- # [08:55] * Joins: pergj (n=pergj@195.159.61.155)
- # [08:55] * Quits: pergj (n=pergj@195.159.61.155) (Remote closed the connection)
- # [08:56] * Joins: pergj (n=pergj@195.159.61.155)
- # [08:58] * Quits: pergj (n=pergj@195.159.61.155) (Client Quit)
- # [09:00] * Joins: pergj (n=pergj@195.159.61.155)
- # [09:01] * Quits: pergj (n=pergj@195.159.61.155) (Client Quit)
- # [09:02] * Joins: pergj (n=pergj@195.159.61.155)
- # [09:38] * Quits: harig (n=harig_in@122.160.12.230) (Read error: 110 (Connection timed out))
- # [09:48] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [09:49] * Joins: yecril71 (n=giecrilj@piekna-gts.2a.pl)
- # [09:50] <yecril71> The advantage of having window for global scope is that otherwise you would not be able to differentiate between local and global.
- # [09:52] <yecril71> It does not cover all identifiers, e.g. it does not apply to constants and class names, but it is useful nevertheless.
- # [09:56] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
- # [10:00] <yecril71> Modern blogs and wikis allow users to embed images in editable content.
- # [10:00] * Joins: Maurice (n=copyman@5ED548D4.cable.ziggo.nl)
- # [10:09] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [10:16] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
- # [10:21] * Parts: erlehmann (n=erlehman@86.59.25.121)
- # [10:24] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
- # [10:26] * Joins: danbri (n=danbri@ip565f6edb.direct-adsl.nl)
- # [10:38] <Philip`> jwalden: That's just for the tokeniser
- # [10:38] <jwalden> okay, I *think* that's analogous
- # [10:40] * jgraham tries to check in changes to html5lib, gets caught by merge errors, cries
- # [10:40] <Philip`> jwalden: (The tree constructor algorithm is more like http://philip.html5.org/misc/insertion-modes-4.svg but that's about nine months out of date)
- # [10:41] <jwalden> tables
- # [10:42] <jwalden> bleh, let's just get rid of 'em
- # [10:42] <jwalden> :-)
- # [11:02] * Joins: ROBOd (n=robod@89.122.216.38)
- # [11:04] * Philip` remembers he used to have something that split out the content model flags like http://canvex.lazyilluminati.com/misc/states10.png but can't find the code anywhere :-(
- # [11:09] * Joins: erlehmann (n=erlehman@86.59.25.121)
- # [11:26] * Quits: Lachy (n=Lachlan@85.196.122.246) ("This computer has gone to sleep")
- # [11:33] * Quits: svl (n=me@ip565744a7.direct-adsl.nl) ("And back he spurred like a madman, shrieking a curse to the sky.")
- # [11:40] * Joins: Lachy (n=Lachlan@pat-tdc.opera.com)
- # [11:44] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [12:13] * Quits: hdh (n=hdh@58.187.23.189) ("Leaving.")
- # [12:22] * Quits: jwalden (n=waldo@c-67-180-39-55.hsd1.ca.comcast.net) (Connection reset by peer)
- # [12:24] <gsnedders> Philip`: Have you looked in /dev/null?
- # [12:44] * Quits: MikeSmith (n=MikeSmit@58.157.21.205) ("sex break")
- # [12:49] * Joins: mookid (i=mookid@ROFL.name)
- # [13:02] * Quits: Kuruma (n=Kuruman@h116-000-163-146.catv01.catv-yokohama.ne.jp) (Remote closed the connection)
- # [13:05] * Joins: Kuruma (n=Kuruman@h116-000-163-146.catv01.catv-yokohama.ne.jp)
- # [13:18] * Quits: ap (n=ap@195.239.126.12)
- # [13:21] <Philip`> gsnedders: Yes, but I couldn't find anything in there
- # [13:25] <Philip`> http://canvas.quaese.de/ looks like a handy canvas tutorial, if you speak German
- # [13:27] <hsivonen> it should be in /dev/random along with the works of Shakespeare
- # [13:37] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [13:38] * Joins: karlcow (n=karl@216.144.126.222)
- # [13:38] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [13:39] <jgraham> Philip`: Any good ideas about how to implement the character encoding reparsing stuff in html5lib?
- # [13:39] <jgraham> s/encoding/encoding switching/
- # [13:42] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [13:42] <Philip`> jgraham: I know almost entirely nothing about how character encoding works in HTML5 or html5lib or Python, so I have no ideas :-(
- # [14:00] <jgraham> Philip`: What I know: If we hit a meta element we need to either be sure that all the characters consumed so far have the same encoding as the previous characters or restart the parsing. The underlying file-like object may not natively support reseeking to the beginning so we either have to reread it or buffer the whole thing ourselves.
- # [14:01] <Philip`> We already try to buffer the first 10KB of the stream as soon as you start parsing it
- # [14:02] <gsnedders> jgraham: We don't want to re-read it if it's a urllib object of a POST request, for example
- # [14:02] <gsnedders> jgraham: So we probably need to buffer it
- # [14:02] <jgraham> I _think_ we need to buffer the raw character data before replacement characters are inserted and line breaks are normalised
- # [14:03] <Philip`> jgraham: Oh, that sounds true, and we only buffer the post-preprocessed input stream
- # [14:03] <Philip`> Is there some fixed limit on how much would need to be buffered?
- # [14:04] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [14:04] <jgraham> Philip`: AFAIK, no
- # [14:04] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [14:04] <jgraham> So maybe we should make a BufferedStream type that adds a .tell() and .seek() method to non-buffered streams
- # [14:04] <jgraham> By storing all the read data in a buffer
- # [14:04] <jgraham> (there is something like this already but it is not quite what we want)
- # [14:05] <Philip`> So for a document that never confidently declares a character encoding, the entire thing will be buffered in memory?
- # [14:05] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [14:05] <Philip`> In that case, we could just slurp the entire stream into a single string at the start, and then parse that
- # [14:05] <jgraham> Oh but then there is another problem because when we hit the <meta> element we don't know where in the unprocessed stream we are
- # [14:05] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [14:06] <jgraham> (assuming we want to read that in chunks for the sake of efficiency)
- # [14:06] <Philip`> Why does it matter where we are in the unprocessed stream?
- # [14:07] <Philip`> If the encoding changes incompatibly, it would just have to seek to 0 and start again, and it wouldn't matter where it had changed
- # [14:07] <Philip`> Oh
- # [14:08] <jgraham> It matters if we want to continue without reparsing if the encoding is compatible
- # [14:08] <Philip`> but it needs to work out whether anything has changed incompatibly, up to the end of the meta charset element
- # [14:08] <Philip`> which means it needs to know where it's read up to
- # [14:08] <Philip`> Oh, and that too
- # [14:09] <Philip`> It'd be easier if html5lib decided not to be a "user agent [that] supports changing the converter on the fly"
- # [14:09] <jgraham> Yeah, we could ignore that for now
- # [14:09] <jgraham> (but it would be a perf. in if we supported it)
- # [14:09] <jgraham> /in/win/
- # [14:10] <jgraham> (assuming supporting it didn't place an undue burden on the implementation)
- # [14:10] <Philip`> jgraham: (Probably not much of one, since meta charset will typically be near the start of the document and it wouldn't have to reparse much at all)
- # [14:10] <takkaria> Hubbub doesn't allow changing the convertor on the fly, it just reparses
- # [14:10] <Philip`> takkaria: Is its input a stream or a string or something?
- # [14:11] <takkaria> yes, a string, so not a particularly useful comment from me there. :)
- # [14:12] <Philip`> jgraham: Is there a reason why html5lib should use streams rather than slurping everything into a string?
- # [14:12] <Philip`> Memory is cheap, after all :-)
- # [14:13] <jgraham> Philip`: It seems nicer? (especially for long strings). Also, we could, in principle, throw the buffer away once the encoding confidence was certain
- # [14:14] <takkaria> more properly, what hubbub actually does is call a "character encoding change" hook, which then can set a flag on the tokeniser so that it stops parsing and returns the new character encoding
- # [14:14] <takkaria> and then the app that's using hubbub has to send the data in again
- # [14:17] * Philip` mostly just wants to reduce the overhead of calling char(), to make things much faster, but that seems independent of the encoding-related buffering/reparsing issue since it's on the opposite side of the decoder
- # [14:18] * Quits: karlcow (n=karl@216.144.126.222) ("This computer has gone to sleep")
- # [14:20] <Philip`> BufferedStream with .seek_to_zero() (and reparse when the encoding changes, don't do the complex changing-on-the-fly thing) sounds like the sanest approach, I guess
- # [14:21] * jgraham wonders if hsivonen solved this issue
- # [14:21] <jgraham> Philip`: OK, I will look at that at some point soon
- # [14:22] <jgraham> (like maybe this evening)
- # [14:24] <hsivonen> for Java, I figured that it happens too often that the buffering in the character decoder causes non-ASCII to be buffered by the time of changing encodings
- # [14:24] <hsivonen> so I decided to remove support for changing decoders in place
- # [14:25] <hsivonen> instead, the java.io-based driver restarts the parse unconditionally when changing encodings
- # [14:26] <hsivonen> I intend to implement the same strategy for Gecko, but the current Gecko behavior is different, so I'm not sure if the spec as currently written is completely Web-compatible here
- # [14:26] <hsivonen> We'll see
- # [14:26] <jgraham> hsivonen: What does Gecko do?
- # [14:27] <hsivonen> jgraham: I don't understand what it does.
- # [14:27] <hsivonen> jgraham: my hypothesis is that it reparses if scripts haven't run and changes decoders in place if scripts have run
- # [14:38] * Quits: danbri (n=danbri@unaffiliated/danbri)
- # [14:43] * Joins: tndH (n=Rob@adsl-83-100-138-116.karoo.KCOM.COM)
- # [14:45] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [14:51] * Quits: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net) (Remote closed the connection)
- # [14:51] * Joins: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
- # [14:55] * Joins: karlcow (n=karl@modemcable168.84-81-70.mc.videotron.ca)
- # [15:05] * Joins: aroben (i=aroben@unaffiliated/aroben)
- # [15:12] * Joins: aroben_ (i=aroben@unaffiliated/aroben)
- # [15:22] * Quits: Hish (n=chatzill@mail2.n-e-s.de) (Remote closed the connection)
- # [15:24] <jgraham> Philip` or someone - let me know if I just horribly broke html5lib in some way and I'll back out the change (I checked in more than I intended to anyway)
- # [15:25] <Philip`> jgraham: It already fails enough test cases that I probably wouldn't notice if all the rest started breaking too :-)
- # [15:27] * Quits: aroben (i=aroben@unaffiliated/aroben) (Read error: 110 (Connection timed out))
- # [15:29] * aroben_ is now known as aroben
- # [15:45] <jgraham> Philip`: BTW, I think it should be a little faster now
- # [15:46] * Philip` sees ihatexml.py
- # [15:51] * Joins: svl (n=me@ip565744a7.direct-adsl.nl)
- # [15:53] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [15:53] * Joins: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net)
- # [15:56] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:01] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [16:03] * Quits: olliej (n=oliver@c-67-164-125-23.hsd1.ca.comcast.net)
- # [16:10] * Joins: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
- # [16:21] * Joins: ap (n=ap@195.239.126.11)
- # [16:30] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:32] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [16:32] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:33] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [16:37] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:38] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [16:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:38] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
- # [16:40] * Quits: dglazkov (n=dglazkov@c-24-130-144-56.hsd1.ca.comcast.net)
- # [16:43] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [16:43] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [16:51] * Joins: myakura (n=myakura@p3156-ipbf1910marunouchi.tokyo.ocn.ne.jp)
- # [16:55] * Quits: pergj (n=pergj@195.159.61.155) (Read error: 110 (Connection timed out))
- # [16:55] * Quits: maikmerten (n=merten@ls5dhcp195.cs.uni-dortmund.de) (Remote closed the connection)
- # [17:09] * Joins: dglazkov (n=dglazkov@nat/google/x-099d3ce636b3234b)
- # [17:21] * Quits: pesla (n=retep@procurios.xs4all.nl) ("( www.nnscript.com :: NoNameScript 4.21 :: www.esnation.com )")
- # [17:29] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Read error: 60 (Operation timed out))
- # [17:42] * Joins: mlpug (n=user@a88-115-168-225.elisa-laajakaista.fi)
- # [17:46] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
- # [17:46] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) (Remote closed the connection)
- # [17:51] * Quits: Lachy (n=Lachlan@pat-tdc.opera.com) ("This computer has gone to sleep")
- # [17:51] * Philip` generates twelve thousand tokeniser test cases, and finds one bug in html5lib
- # [17:54] <gsnedders> Philip`: Then you don't have enough test cases
- # [17:56] <Philip`> gsnedders: I can't think of any more test cases to add, since I have one case for each interesting character that can occur from every tokeniser state
- # [18:00] <gsnedders> Philip`: Do you test every possible unicode character in every state?
- # [18:01] <gsnedders> No, you don't.
- # [18:01] <Philip`> gsnedders: No, because those aren't interesting characters
- # [18:01] <gsnedders> Philip`: That doesn't mean there aren't interesting bugs
- # [18:02] <Philip`> gsnedders: It means it's very unlikely that there will be bugs, because I test all the characters that a sane tokeniser would depend on, and every other character is equivalent and has no special processing
- # [18:03] <gsnedders> Philip`: You are assuming tokenizers are sane, which is very naïve
- # [18:19] * Quits: weinig (n=weinig@c-69-181-81-233.hsd1.ca.comcast.net)
- # [18:27] * Quits: myakura (n=myakura@p3156-ipbf1910marunouchi.tokyo.ocn.ne.jp) ("Leaving...")
- # [18:27] <takkaria> Philip`: please do make those testcases public. :)
- # [18:33] <jruderman_> Philip`: i bet you'd find more bugs by fuzzing than by trying to be exhaustive wrt one aspect of parsing
- # [18:45] * Joins: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au)
- # [18:51] * Joins: weinig (n=weinig@17.203.15.158)
- # [19:01] * Joins: weinig_ (n=weinig@nat/apple/x-d841b18ac91e3904)
- # [19:01] * Joins: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [19:01] * dave_levin is now known as dave_levin|AWAY
- # [19:04] * Joins: shepazu (n=schepers@mo-76-0-60-125.dhcp.embarqhsd.net)
- # [19:16] * Quits: weinig (n=weinig@17.203.15.158) (Read error: 110 (Connection timed out))
- # [19:18] * Quits: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net) (Read error: 60 (Operation timed out))
- # [19:20] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [19:22] * weinig_ is now known as weinig
- # [19:22] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [19:22] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [19:29] * Quits: drry (n=drry@it17.opt2.point.ne.jp)
- # [19:32] * Joins: drry (n=drry@it17.opt2.point.ne.jp)
- # [19:32] <Philip`> takkaria: I think it'd be a bad idea to add them all into html5lib, but I could just upload them to the web somewhere
- # [19:32] <gsnedders> Philip`: Add them all into html5lib, please.
- # [19:32] * Joins: jwalden_ (n=waldo@corp-241.mountainview.mozilla.com)
- # [19:32] * jwalden_ is now known as jwalden
- # [19:34] <Philip`> jruderman_: This seems like a case where exhaustiveness is relatively feasible, since there's an algorithm with a well-defined series of states and state transitions, and most implementations are pretty close to that definition, so it works at providing decent coverage of the implementations
- # [19:34] <Philip`> gsnedders: Why?
- # [19:34] <Philip`> gsnedders: Also: No
- # [19:42] <gsnedders> Philip`: Because then we have test cases located in one place
- # [19:43] * Joins: dbaron (n=dbaron@pool-173-49-118-225.phlapa.fios.verizon.net)
- # [19:44] <Philip`> gsnedders: But if there's twelve thousand tokeniser tests, and it takes ages to run them all, people will run the tests less often, which is detrimental
- # [19:45] <gsnedders> Philip`: But if they aren't there then they won't be wrong, which is detrimental
- # [19:45] <gsnedders> s/wrong/run/
- # [19:45] <gsnedders> Interesting typo.
- # [19:46] <Philip`> gsnedders: It's only detrimental if they would have caught a bug that the remaining tests would have missed
- # [19:47] <Philip`> (and most of these tests are very redundant)
- # [19:47] * Quits: aroben (i=aroben@unaffiliated/aroben) (Read error: 104 (Connection reset by peer))
- # [19:47] * Quits: nessy (n=nessy@124-171-30-131.dyn.iinet.net.au) ("This computer has gone to sleep")
- # [19:48] <Philip`> (e.g. there are tests for "<!DOCTYPEa", "<!DOCTYPEb", "<!DOCTYPEy", "<!DOCTYPEz", "<!DOCTYPEA", ...)
- # [19:48] <Philip`> Also, if I did check in all these tests, and then the spec changed, someone would find hundreds of errors and get really annoyed trying to manually fix all the test cases
- # [19:50] <Philip`> Oops, there's only actually about one thousand tests, since I didn't sufficiently uniquify them
- # [20:03] <gsnedders> That certainly isn't too many.
- # [20:05] * Joins: annevk (n=annevk@53530B04.cable.casema.nl)
- # [20:14] <gsnedders> ergh. This is going to be horrible. Having the same @cite over and over again.
- # [20:14] <gsnedders> Meh.
- # [20:17] * Quits: weinig (n=weinig@nat/apple/x-d841b18ac91e3904) (Remote closed the connection)
- # [20:17] * Joins: weinig (n=weinig@17.203.15.158)
- # [20:22] <Philip`> http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/test3.test
- # [20:22] <Philip`> Happy now? :-p
- # [20:23] <Philip`> (That's about 1500, after I stopped stupidly failing to remove duplicates)
- # [20:23] <Philip`> takkaria: There's some new tests for you to run if you fancy it :-)
- # [20:25] <gsnedders> Philip`: :)
- # [20:54] <Dashiva> Are the tests sorted in order of relevance? :)
- # [20:58] <Philip`> I don't have a way to quantitatively determine relevance, so they're just sorted on the input strings :-p
- # [20:58] * Joins: aroben (n=adamrobe@c-69-142-103-232.hsd1.pa.comcast.net)
- # [21:00] * Quits: ROBOd (n=robod@89.122.216.38) (Excess Flood)
- # [21:01] * Joins: ROBOd (n=robod@89.122.216.38)
- # [21:04] * Joins: aaronlev (n=chatzill@e176230253.adsl.alicedsl.de)
- # [21:19] * Parts: annevk (n=annevk@53530B04.cable.casema.nl)
- # [21:21] * Quits: dolske (n=dolske@firefox/developer/dolske)
- # [21:27] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [21:27] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [21:34] * Quits: yecril71 (n=giecrilj@piekna-gts.2a.pl)
- # [21:34] * Quits: kangax (n=kangax@ool-182f8118.dyn.optonline.net)
- # [21:35] * Quits: mlpug (n=user@a88-115-168-225.elisa-laajakaista.fi) (Remote closed the connection)
- # [21:35] <gsnedders> Is it reasonable to write notes for English in HTML?
- # [21:37] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [21:38] <jruderman_> "for English"? as in classroom lecture notes?
- # [21:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [21:38] <gsnedders> jruderman_: Well, not lecture notes, but for my final year of school (in the en-gb meaning of school)
- # [21:38] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [21:38] * Joins: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl)
- # [21:39] <jruderman_> i used HTML for a few papers in college
- # [21:39] <jruderman_> and TeX for others
- # [21:39] <gsnedders> These are notes for my dissertation: I intend on doing the dissertation itself using XeTeX
- # [21:40] <jruderman_> i liked using HTML because i could easily tweak styles across an entire document. wysiwyg word processors usually don't do that well.
- # [21:41] <jruderman_> for example, if i needed to pad my paper a little, it was a simple matter of p { line-height: 1.05em; }
- # [21:41] <jruderman_> slightly less obvious than changing the font size ;)
- # [21:41] <gsnedders> For notes that isn't so needed :)
- # [21:41] <jruderman_> hehe
- # [21:42] <jruderman_> still useful to be able to change the styles of all the headings at once, though
- # [21:42] <jruderman_> another advantage of HTML is that you can put the notes on your web site and not worry about what software viewers have ;)
- # [21:45] <Philip`> Is text/plain inadequate for notes?
- # [21:45] <gsnedders> Philip`: Yes
- # [21:45] <Philip`> Why?
- # [21:45] <gsnedders> Philip`: Can't so easily build TOCs for text/plain :)
- # [21:46] <Philip`> Why do notes need a TOC?
- # [21:46] <Philip`> Just use your editor's 'find' feature if you want to go to a certain section :-)
- # [21:46] * gsnedders now has a header element
- # [21:49] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
- # [21:50] <gsnedders> I need automatic indexing in anolis
- # [21:50] * Joins: annevk (n=annevk@77.163.243.203)
- # [21:50] <gsnedders> I do like how I mention that then someone who asked for it comes along
- # [22:01] <gsnedders> Anyone have views on how to mark up a bibliography?
- # [22:05] <Philip`> I suggest putting it in <cite>
- # [22:06] <gsnedders> Philip`: "The cite element represents the title of a work"
- # [22:06] <Philip`> Who cares what specs say?
- # [22:06] <Philip`> You're citing stuff, so use <cite> - it makes perfect sense
- # [22:06] <gsnedders> It does, but Hixie's stupid.
- # [22:09] <hsivonen> what classes of products is http://www.w3.org/TR/2008/WD-XForms-for-HTML-20081219/ supposed to be normative on?
- # [22:10] <hsivonen> gsnedders: my view about marking up a bibliography: http://hsivonen.iki.fi/thesis/html5-conformance-checker#references
- # [22:11] <gsnedders> hsivonen: That doesn't conform to ISO 690, though
- # [22:11] <gsnedders> I mean, sure, I can use classes, but what do I gain?
- # [22:12] <hsivonen> gsnedders: you probably don't gain anything
- # [22:12] * gsnedders links urn:isbn:0-330-29666-3
- # [22:13] <hsivonen> gsnedders: bibliography formats that don't show the first name of the authors in full suck
- # [22:13] <hsivonen> gsnedders: they are bad for googling and disapproved by feminists
- # [22:14] <hsivonen> gsnedders: also, emphasizing author names over titles of works sucks when you are mostly referencing specs and technical documents some of which conceal their authors/editors
- # [22:15] <gsnedders> hsivonen: I'm referencing a book for English work, so that isn't relevant :)
- # [22:16] <hsivonen> gsnedders: you could still make the argument that in cultural contexts where the surname of the author is the surname of the spouse, abbreviating the first name of the author diminishes the personal identifier of the author to one letter, which is uncool
- # [22:17] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
- # [22:19] <hsivonen> gsnedders: besides, I suggest making references in a way that you can GET without paying CHF 72
- # [22:19] <gsnedders> :)
- # [22:20] * Philip` realises that efficiently cutting wrapping paper for varyingly-sized presents is probably a bin packing problem and therefore NP-hard, which is totally unfair
- # [22:21] * Quits: jwalden (n=waldo@corp-241.mountainview.mozilla.com) ("ChatZilla 0.9.82.1-rdmsoft [XULRunner 1.8.0.9/2006120508]")
- # [22:29] * Joins: Lachy (n=Lachlan@85.196.122.246)
- # [22:31] * Joins: jwalden_ (n=waldo@corp-241.mountainview.mozilla.com)
- # [22:31] * jwalden_ is now known as jwalden
- # [22:31] * gsnedders tries to follow the Oxford Guide to Style
- # [22:32] * gsnedders comes up with the probably stupid, "Vladimir Nabokov, The Enchanter [En. trans. of Volshebnik] (trans. Dmitri Nabokov) (London: Pan Books Ltd, 1987) (ISBN 0-330-29666-3)."
- # [22:33] * Quits: dolske (n=dolske@firefox/developer/dolske)
- # [22:35] * Philip` suggests focussing on the parts of the dissertation that are likely to result in marks :-)
- # [22:36] <hsivonen> gsnedders: at least it's positive that they approve of listing the ISBN
- # [22:36] <gsnedders> hsivonen: They don't, I ignored that part.
- # [22:36] <gsnedders> :)
- # [22:36] <hsivonen> oh well
- # [22:36] <gsnedders> hsivonen: They do however, as with most of the style guide, give a lot more flexibility than almost anything else
- # [22:37] <gsnedders> Philip`: Yeah, I should :)
- # [22:38] * Joins: dolske (n=dolske@corp-241.mountainview.mozilla.com)
- # [22:39] <gsnedders> Hixie: "A person's name is not the title of a work "
- # [22:40] <gsnedders> Hixie: Lolita's name is the title of the book about her!
- # [22:44] <Philip`> gsnedders: Only if you do a plain string comparison and ignore the context and semantics
- # [22:44] <gsnedders> Philip`: Oh, sure. :P
- # [22:44] <gsnedders> (and yes, I am doing my dissertation on such books)
- # [22:46] * Quits: famicom (i=famicom@5ED2FF2D.cable.ziggo.nl) ("Leaving")
- # [22:47] * Joins: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl)
- # [22:47] * Quits: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl) (Read error: 104 (Connection reset by peer))
- # [22:50] * Joins: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl)
- # [22:52] * Quits: ap (n=ap@195.239.126.11)
- # [22:54] * Quits: famicom (n=famicom@5ED2FF2D.cable.ziggo.nl) (Client Quit)
- # [22:54] * Joins: virtuelv (n=virtuelv@74.80-202-66.nextgentel.com)
- # [22:55] <virtuelv> JohnResig: you around? You have a few broken links on http://docs.jquery.com/UI
- # [22:55] <virtuelv> (namely, all linked examples)
- # [23:14] * Joins: Lachy_ (n=Lachlan@rpl-ipsec-053.tip.csiro.au)
- # [23:26] * Quits: karlcow (n=karl@modemcable168.84-81-70.mc.videotron.ca) ("This computer has gone to sleep")
- # [23:28] * Joins: olliej (n=oliver@nat/apple/x-0d3562fa96f745ff)
- # [23:31] * Quits: Lachy (n=Lachlan@85.196.122.246) (Read error: 110 (Connection timed out))
- # [23:55] * Joins: Lachy__ (n=Lachlan@85.196.122.246)
- # Session Close: Tue Dec 23 00:00:00 2008
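Editor's sketch for the [01:50] remark about entity matching: Philip` describes sorting the whole entity list by length on every lookup and taking the first match. The Python below is only an illustration of that idea, not his OCaml code. The table `ENTITIES` is a tiny hand-picked subset of the named character references, and the function names are hypothetical.

```python
# Tiny illustrative subset of the named character references (assumption:
# a real tokeniser would load the full table from the spec).
ENTITIES = {
    "amp": "&",
    "amp;": "&",
    "lt": "<",
    "lt;": "<",
    "not": "\u00ac",
    "notin;": "\u2209",
}


def match_entity_naive(text):
    """Return (name, value) for the longest entity name that is a prefix
    of `text` (the input just after '&'), or None if nothing matches.

    Mirrors the deliberately inefficient approach from the log: the
    table is re-sorted by length on every single call.
    """
    for name in sorted(ENTITIES, key=len, reverse=True):
        if text.startswith(name):
            return name, ENTITIES[name]
    return None


# The obvious cheap improvement: sort once, reuse the order afterwards.
NAMES_LONGEST_FIRST = sorted(ENTITIES, key=len, reverse=True)


def match_entity_presorted(text):
    """Same result as match_entity_naive, without the per-call sort."""
    for name in NAMES_LONGEST_FIRST:
        if text.startswith(name):
            return name, ENTITIES[name]
    return None
```

Trying the longest names first is what makes `notin;` win over `not` when both are prefixes of the input. Sorting once up front keeps that behaviour and only removes the repeated work.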
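Editor's sketch for the [14:04] to [14:20] discussion of buffering and reparsing: the approach settled on there is a wrapper that keeps every raw byte read from a non-seekable stream, so the parser can seek back to zero and start over when a `<meta>` declares a different encoding. The Python below is a minimal illustration under that assumption. `BufferedStream` and `OneShot` are hypothetical names, not html5lib's actual classes, and the two-pass decode at the bottom stands in for a real parser restart.

```python
import io


class BufferedStream:
    """Hypothetical wrapper that remembers every raw byte read from a
    non-seekable file-like object so the input can be replayed."""

    def __init__(self, stream):
        self._stream = stream        # only needs a .read(size) method
        self._buffer = bytearray()   # raw bytes, before any decoding
        self._pos = 0                # current read position in the buffer

    def tell(self):
        return self._pos

    def seek(self, pos):
        # Only data that has already been read can be revisited.
        if not 0 <= pos <= len(self._buffer):
            raise ValueError("cannot seek outside the buffered data")
        self._pos = pos

    def read(self, size):
        # Top up the buffer from the underlying stream when the request
        # runs past what has been stored so far.
        missing = self._pos + size - len(self._buffer)
        if missing > 0:
            self._buffer.extend(self._stream.read(missing))
        chunk = bytes(self._buffer[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk


class OneShot:
    """Stand-in for a source that cannot be re-read, such as a network
    response: it exposes read() but no seek()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size):
        return self._inner.read(size)


raw = "<meta charset=utf-8>caf\u00e9".encode("utf-8")
stream = BufferedStream(OneShot(raw))

# First pass with a tentative encoding, then restart from byte 0 once
# the declared encoding is known.
first_pass = stream.read(64).decode("windows-1252")
stream.seek(0)
second_pass = stream.read(64).decode("utf-8")
```

Because the buffer holds undecoded bytes, it sidesteps the problem raised at [14:02] of buffering only the post-preprocessed input. The cost is the one Philip` points out at [14:05]: a document that never confidently declares an encoding ends up held in memory in full.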