- # Session Start: Mon Jul 23 00:00:00 2007
- # Session Ident: #html-wg
- # [00:03] * Joins: mjs (mjs@67.41.194.248)
- # [00:38] * Quits: mjs (mjs@67.41.194.248) (Ping timeout)
- # [00:46] * Joins: mjs (mjs@67.41.148.190)
- # [00:50] * Quits: tH (Rob@87.102.85.210) (Quit: ChatZilla 0.9.78.1-rdmsoft [XULRunner 1.8.0.9/2006120508])
- # [01:07] * Quits: gavin_ (gavin@63.245.208.169) (Ping timeout)
- # [01:09] * Quits: mjs (mjs@67.41.148.190) (Ping timeout)
- # [01:09] * Joins: gavin_ (gavin@63.245.208.169)
- # [01:12] * Quits: heycam (cam@203.214.127.179) (Ping timeout)
- # [01:17] * Joins: mjs (mjs@67.41.193.116)
- # [01:19] * Quits: zcorpan (zcorpan@84.216.41.90) (Ping timeout)
- # [01:43] * Joins: Zeros (Zeros-Elip@69.140.48.129)
- # [01:44] * Joins: heycam (cam@130.194.72.84)
- # [01:44] * Quits: mjs (mjs@67.41.193.116) (Ping timeout)
- # [01:50] * Joins: xover (xover@193.157.66.5)
- # [01:52] * Joins: mjs (mjs@70.56.48.154)
- # [01:56] * Quits: heycam (cam@130.194.72.84) (Quit: bye)
- # [01:56] * Joins: heycam (cam@130.194.72.84)
- # [02:08] * Joins: karl (karlcow@128.30.52.30)
- # [02:12] * Quits: Sander (svl@86.87.68.167) (Quit: And back he spurred like a madman, shrieking a curse to the sky.)
- # [02:16] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [02:21] * Joins: gavin (gavin@74.103.208.221)
- # [02:57] * Quits: mjs (mjs@70.56.48.154) (Ping timeout)
- # [03:05] * Joins: mjs (mjs@67.40.155.111)
- # [03:12] * Joins: olivier (ot@128.30.52.30)
- # [03:18] <karl> http://www.gizmosforgeeks.com/2007/07/20/new-html-spec-v5/
- # [03:40] * Quits: mjs (mjs@67.40.155.111) (Ping timeout)
- # [03:46] * Joins: mjs (mjs@67.41.192.213)
- # [04:23] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [04:28] * Joins: gavin (gavin@74.103.208.221)
- # [04:53] * Quits: mjs (mjs@67.41.192.213) (Quit: mjs)
- # [04:58] <karl> http://www.la-grange.net/2007/07/23-japanese-typography
- # [04:58] <karl> some example of Japanese conventions.
- # [04:58] <karl> I have tried equivalents of strong and em (or even bold) in a few texts around me, without success at all.
- # [04:59] <karl> italics were nonexistent and bold fonts were all used for titles.
- # [05:06] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [05:10] * Joins: schepers (schepers@128.30.52.30)
- # [06:30] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [06:33] <karl> just discovered that Robert Burns was the author of http://en.wikipedia.org/wiki/Auld_Lang_Syne
- # [06:34] <schepers> that may be a different Robbie Burns ;P
- # [06:35] * Joins: gavin (gavin@74.103.208.221)
- # [06:50] * Quits: schepers (schepers@128.30.52.30) (Client exited)
- # [06:57] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [06:57] * Joins: gavin (gavin@74.103.208.221)
- # [07:11] * Joins: schepers (schepers@128.30.52.30)
- # [07:24] * Joins: mjs (mjs@67.41.153.80)
- # [07:57] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [07:57] * Joins: gavin (gavin@74.103.208.221)
- # [08:01] * olivier is now known as dan
- # [08:10] * dan is now known as olivier
- # [08:10] * Quits: heycam (cam@130.194.72.84) (Quit: bye)
- # [08:22] * Quits: olivier (ot@128.30.52.30) (Quit: Leaving)
- # [08:22] * Joins: olivier (ot@128.30.52.30)
- # [08:28] * Joins: ROBOd (robod@86.34.246.154)
- # [08:47] <hsivonen> it is sad how many people think that HTML5 is wrong because they have an illusion of XML when they write XHTML as text/html
- # [08:54] <karl> hsivonen: then you must be crying when watching "you've got mail" - http://www.imdb.com/title/tt0128853/
- # [08:54] <olivier> :)
- # [09:05] <hsivonen> karl: I don't know what you mean, because I have not seen the movie.
- # [09:06] * Joins: heycam (cam@203.214.127.179)
- # [09:16] * Joins: mjs_ (mjs@67.41.147.12)
- # [09:17] * Quits: mjs (mjs@67.41.153.80) (Ping timeout)
- # [09:25] * Quits: olivier (ot@128.30.52.30) (Quit: Leaving)
- # [09:28] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [09:57] * Quits: mjs_ (mjs@67.41.147.12) (Ping timeout)
- # [09:59] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [10:00] * Joins: billyjack (MikeSmith@mcclure.w3.org)
- # [10:00] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [10:01] * billyjack is now known as MikeSmith
- # [10:06] * Joins: gavin (gavin@74.103.208.221)
- # [10:20] <beowulf> hsivonen: that is sad, but also an example of being careful what is taught to people
- # [10:25] * Joins: zcorpan (zcorpan@84.216.41.25)
- # [10:32] * Joins: mjs (mjs@67.41.136.143)
- # [10:33] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [10:34] * Joins: gavin (gavin@74.103.208.221)
- # [10:34] * Quits: Zeros (Zeros-Elip@69.140.48.129) (Quit: Leaving)
- # [10:39] * Quits: mjs (mjs@67.41.136.143) (Ping timeout)
- # [10:39] * Joins: mjs (mjs@67.41.136.143)
- # [10:41] <hsivonen> beowulf: which way do you mean? do you mean that the XHTML propaganda effort was a mistake? or that it should be continued to be taught?
- # [10:44] <beowulf> at this point I feel the xhtml propaganda effort was in a sense misleading
- # [10:44] <beowulf> i wouldn't say mistake though, it has plenty of positive outcomes
- # [10:44] <beowulf> s/outcomes/consequences
- # [10:45] <mjs> what were the positive consequences?
- # [10:45] <beowulf> as a movement it led people to think more about what they write
- # [10:46] <mjs> the Inquisition led people to think deeply about their religious convictions
- # [10:46] <beowulf> i can only speak for myself
- # [10:47] <beowulf> i wouldn't call it a mistake or compare it to the Inquisition
- # [10:48] <beowulf> given a choice between well written html and well written appendix c xhtml i wouldn't much care
- # [10:48] <beowulf> but i rarely see well written html
- # [10:49] <hsivonen> beowulf: is Anne's blog well-written HTML according to your definition of well-written?
- # [10:49] * beowulf looks
- # [10:50] <beowulf> at first glance, yes
- # [10:50] <hsivonen> beowulf: ok
- # [10:50] <mjs> I'm just saying, leading people to think about something by saying wrong things isn't what I would consider a positive consequence on the whole
- # [10:51] <beowulf> fair enough
- # [10:55] <beowulf> what you need then is a Zeldman for html
- # [10:57] <zcorpan> POSH
- # [10:58] <hsivonen> is Tantek's POSH HTML or XHTML-as-text/html?
- # [10:58] <beowulf> POSH has so far been a whisper in some quaint old corner of the web
- # [11:00] * Quits: mjs (mjs@67.41.136.143) (Ping timeout)
- # [11:02] <beowulf> plus POSH is hard to sell i'd imagine
- # [11:02] <beowulf> "you recall we rewrote the corporate site from html to xhtml to future proof and make all things wonderful? Well..."
- # [11:04] <hsivonen> the question we should be asking is why was XHTML-as-text/html easier to sell than HTML 4.01 Strict? The people going to XHTML Transitional as text/html felt it was forward-looking while HTML 4.01 Strict wasn't appealing
- # [11:05] <hsivonen> XHTML was all about the /> which has no technical effect in text/html. focusing on that felt like doing something, but no matter if you did it carefully or sloppily, it didn't really matter
- # [11:05] <hsivonen> Strict, OTOH, becomes an inconvenience that actually matters on terms of what works in browsers
- # [11:06] <hsivonen> s/on/in/
- # [11:08] * Joins: mjs (mjs@67.41.152.68)
- #
- # Session Start: Mon Jul 23 11:11:35 2007
- # Session Ident: #html-wg
- # [11:11] * Now talking in #html-wg
- # [11:11] * Topic is 'HTML WG http://www.w3.org/html/wg/ logged: http://krijnhoetmer.nl/irc-logs/'
- # [11:11] * Set by Zeros on Mon Apr 30 23:38:28
- # [11:13] <mjs> when I first heard about XML (this was before really knowing anything about technology) my first thought was, "but this doesn't actually *do* anything"
- # [11:14] <hsivonen> mjs: semantics, not behavior :-)
- # [11:15] <mjs> s/anything about technology/anything about web technology/
- # [11:15] <mjs> I also remember around this same time having an exchange about the <object> tag with an HTML4 enthusiast
- # [11:15] <mjs> him: there's this great new tag, it's called <object>
- # [11:15] <mjs> me: great! what does it do?
- # [11:16] <mjs> him: it can do anything
- # [11:16] <mjs> me: how do I tell it what to actually do in a specific case?
- # [11:16] <mjs> him: that's undefined
- # [11:16] <mjs> me: I thought you said it was great
- # [11:17] <beowulf> :)
- # [11:18] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [11:19] * Joins: gavin (gavin@74.103.208.221)
- # [11:26] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Ping timeout)
- # [11:31] * Joins: MikeSmith (MikeSmith@mcclure.w3.org)
- # [11:47] * Quits: mjs (mjs@67.41.152.68) (Ping timeout)
- # [11:52] * Joins: tH (Rob@87.102.85.210)
- # [11:59] * Joins: Lachy (chatzilla@203.214.140.60)
- # [12:07] * Joins: myakura (myakura@58.88.37.26)
- # [12:14] * Joins: mjs (mjs@67.41.152.68)
- # [13:21] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [13:23] * Quits: mjs (mjs@67.41.152.68) (Ping timeout)
- # [13:23] * Joins: olivier (ot@128.30.52.30)
- # [13:26] * Quits: zcorpan (zcorpan@84.216.41.25) (Ping timeout)
- # [13:26] * Joins: gavin (gavin@74.103.208.221)
- # [13:56] * Parts: Lionheart (robin@66.57.69.65)
- # [13:58] * Quits: MikeSmith (MikeSmith@mcclure.w3.org) (Quit: Less talk, more pimp walk.)
- # [14:01] * Quits: Lachy (chatzilla@203.214.140.60) (Connection reset by peer)
- # [14:14] * Joins: zcorpan (zcorpan@84.216.41.25)
- # [14:18] <zcorpan> hsivonen: in validator.nu, choosing the HTML5 (prerelease schema) and the HTML parser, it says "Schema Error: The chosen preset schema is not appropriate for HTML."
- # [14:18] <hsivonen> whoa
- # [14:18] <anne> I also can't validate annevankesteren.nl/contact using html5.validator.nu...
- # [14:19] <anne> It aborts on IO and mumbles something about XHTML mode
- # [14:19] <anne> "Validator.nu is validation 2.0" :)
- # [14:20] <hsivonen> these are the kind of reasons why I mentioned it on IRC before making other announcements
- # [14:21] <hsivonen> validation 2.0, like Web 2.0, is in perpetual beta
- # [14:28] <hsivonen> anne: apparently, the way ifs fall, it mumbles about the XHTML mode if it dies before it had a chance to choose the mode...
- # [14:30] <hsivonen> zcorpan: fixed. I think.
- # [14:32] <hsivonen> anne: I get a non-200 HTTP status
- # [14:33] <zcorpan> hsivonen: the javascript needs fixing too
- # [14:33] <zcorpan> hsivonen: or nm
- # [14:33] <hsivonen> anne: I suspect your server has the same problem as krijn's had a few days ago.
- # [14:33] <anne> hsivonen, oh?
- # [14:33] <hsivonen> zcorpan: did you reload?
- # [14:33] <zcorpan> hsivonen: had a cached version of the js file
- # [14:34] <hsivonen> anne: probably something to do with content negotiation as the generic facet doesn't fail
- # [14:34] <hsivonen> I'll improve diagnostics
- # [14:34] <anne> hmm, now it does work
- # [14:38] <hsivonen> anne: your server says 406
- # [14:41] <anne> oh
- # [14:41] <anne> I guess that has something to do with conneg, yes
- # [14:41] <hsivonen> anne: chances are you are relying on */*
- # [14:42] <hsivonen> anne: the html5 facet does not Accept */*
- # [14:42] <anne> prolly
- # [14:42] <hsivonen> anne: in krijn's case, it was about Apache 1.3 PHP mapping and negotiation not working together
- # [14:46] <krijnh> \o
- # [14:55] <zcorpan> o/
- # [14:56] <hsivonen> anne: now with a slightly better error message: http://html5.validator.nu/?doc=http%3A%2F%2Fannevankesteren.nl%2Fcontact
- # [14:59] <hsivonen> anne: you have the exact same problem that krijnh had: Available variants: application/x-httpd-php
- # [14:59] <anne> yeah, makes sense
- # [14:59] <krijnh> anne: also running Apache 1.3?
- # [14:59] <anne> could be
- # [15:00] <anne> anyway, got to go
- # [15:00] <hsivonen> I'm mildly amused about how conneg is supposed to be great and then something as common as Apache+PHP is b0rked
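
The 406 discussed above can be sketched roughly as follows. This is a minimal illustration of server-side content negotiation, not Apache's actual algorithm: the function names are hypothetical, q-values are ignored, and the Accept header used in the usage note is an assumed stand-in for whatever the html5 facet really sends.

```python
def parse_accept(header):
    """Return the media ranges from an Accept header (q-values are dropped)."""
    return [part.split(";")[0].strip().lower()
            for part in header.split(",") if part.strip()]


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers the media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def negotiate(accept_header, variants):
    """Return (status, variant): 200 with the first acceptable variant, else 406."""
    ranges = parse_accept(accept_header)
    for variant in variants:
        if any(matches(r, variant) for r in ranges):
            return 200, variant
    return 406, None
```

With the only available variant being `application/x-httpd-php`, `negotiate("text/html, application/xhtml+xml", ["application/x-httpd-php"])` yields a 406, while adding `*/*` to the header lets the same variant through.
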
- # [15:06] * Joins: Sander (svl@86.87.68.167)
- # [15:07] <hsivonen> hmm. looks like I had fallen for the classic way of making a page invalid
- # [15:07] <hsivonen> I had copied the CVSDude badge HTML boilerplate
- # [15:07] <krijnh> Even you? Wow :)
- # [15:08] <krijnh> Hey Sander
- # [15:08] <Sander> oi
- # [15:08] <krijnh> Are you coming to Delft Thursday?
- # [15:09] <Sander> I am
- # [15:09] <krijnh> (You're the zoid guy right?)
- # [15:09] <Sander> probably not
- # [15:09] <krijnh> Hmm
- # [15:09] * Sander tries to think what zoid would be
- # [15:09] <krijnh> Never mind then :)
- # [15:09] <krijnh> Ah, you're from haveskill
- # [15:10] <Sander> there's too many sander's in this country doing web standards stuff. :D
- # [15:10] <krijnh> Yeah ;)
- # [15:10] <Sander> I am indeed. :)
- # [15:10] <krijnh> Were you at Info.nl too last month?
- # [15:10] <Sander> No, I was in the USA at that point, alas
- # [15:26] * Joins: karl (karlcow@128.30.52.30)
- # [15:28] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [15:34] * Joins: gavin (gavin@74.103.208.221)
- # [15:37] <hsivonen> now that I made XHTML-1.0-as-text/html non-fatal, I don't know how to communicate to users the Jing-level errors about xml:lang.
- # [15:37] <hsivonen> that is, that the attribute that the schema does not allow is not lang in the XML namespace but xml:lang in no namespace
- # [15:50] * Quits: myakura (myakura@58.88.37.26) (Quit: Leaving...)
- # [15:54] * Quits: karl (karlcow@128.30.52.30) (Quit: Where dwelt Ymir, or wherein did he find sustenance?)
- # [15:55] * Joins: Lionheart (robin@198.86.248.1)
- # [16:26] * Joins: billmason (billmason@69.30.57.156)
- # [16:32] * Quits: Lionheart (robin@198.86.248.1) (Ping timeout)
- # [16:59] * Quits: olivier (ot@128.30.52.30) (Quit: Leaving)
- # [17:31] <zcorpan> hsivonen: "Namespaces do not work in text/html, and hence, xml:* attributes cannot be used in text/html." or some such
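
To illustrate the point about `xml:lang`, here is a small sketch using Python's standard `html.parser`. It is not an HTML5-conforming parser and has nothing to do with Validator.nu's implementation; it only shows that a text/html-style parser with no namespace support reports `xml:lang` as an ordinary attribute whose name happens to contain a colon. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records every start tag together with its (name, value) attribute pairs."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Parse markup as text/html and return the collected start tags."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

Calling `start_tags('<p xml:lang="en" lang="en">hi</p>')` gives the attribute names as plain strings, with no namespace information attached to either of them.
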
- # [17:37] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [17:39] <zcorpan> hsivonen: is /> the only HTML4-specific tokenization error?
- # [17:42] * Joins: gavin (gavin@74.103.208.221)
- # [18:41] <hsivonen> zcorpan: thanks. a new version of the XHTML-as-text/html info message should go live soonish as the service rebuilds itself
- # [18:42] <hsivonen> zcorpan: no, there are other HTML 4-specific errors
- # [18:42] <hsivonen> a list follows:
- # [18:42] <hsivonen> valueless boolean attributes
- # [18:43] <hsivonen> unquoted attributes with non-Name values
- # [18:43] <hsivonen> </ in CDATA or RCDATA
- # [18:43] <hsivonen> plus />
- # [18:43] <hsivonen> that's it for now
- # [18:44] <zcorpan> </ is allowed in cdata and rcdata unless it is followed by a name start character
- # [18:44] <zcorpan> iirc
- # [18:44] <hsivonen> moreover, it appears that some HTML case-insensitivity stuff regressed with the parser rewrite
- # [18:45] <hsivonen> zcorpan: interesting. I wasn't aware of that
- # [18:45] <hsivonen> hmm. thinking again, I was but not with the right terms
- # [18:53] <hsivonen> zcorpan: did HTML 4 have separate name characters and name start characters?
- # [18:54] <hsivonen> apparently yes
- # [18:57] <hsivonen> zcorpan: fix checked in. the service will take a while to rebuild
- # [18:58] <hsivonen> (I'm starting to think I might actually need two JVM instances to avoid these Service Temporarily Unavailable periods)
- # [19:01] <Philip`> "The measurable study would be the number of pages with XHTML DOCTYPEs, served as HTML, and containing markup that would have unintended consequences if served as XML"
- # [19:01] * Philip` wonders if anyone has data on how many XHTML-as-text/html pages are not even well-formed, and what are the most common causes of ill-formedness
- # [19:02] <Philip`> (All I can tell from my collected data is that 50% of XHTML-doctyped pages cause parse errors in the HTML5 tokeniser, which is marginally worse than the 45% of not-just-XHTML pages)
- # [19:03] <hsivonen> Philip`: do you keep a local copy of the files that you got when dereferencing dmoz URLs?
- # [19:04] <Philip`> I don't
- # [19:05] <Philip`> (since it'd probably be around half a gigabyte for 8K pages, which is not entirely negligible)
- # [19:07] <hsivonen> depends on your free disk space vs. your network downstream, I guess
- # [19:07] <hsivonen> I have a puny 1 Mbps downstream
- # [19:09] <zcorpan> ATTSPLEN 65536 -- These are the largest values --
- # [19:09] <zcorpan> LITLEN 65536 -- permitted in the declaration --
- # [19:09] <zcorpan> NAMELEN 65536 -- Avoid fixed limits in actual --
- # [19:09] <zcorpan> PILEN 65536 -- implementations of HTML UA's --
- # [19:09] <zcorpan> is in the sgml declaration for html4
- # [19:10] <zcorpan> even the sgml declaration for html4 admits that HTML UAs are not based on sgml
- # [19:13] <Philip`> hsivonen: I was using my computer, which has approximately no disk space except during the brief periods in which I've deleted some junk and not filled it up again, and a university one where I was just borrowing its /tmp and can't do permanent storage
- # [19:16] <Philip`> (I think the bottleneck ended up being in the way that I spawned two processes (curl and the tokeniser) for each downloaded page, which didn't work too badly but could probably be done much better)
- # [19:17] <zcorpan> hsivonen: you may want to warn about minimized href and src attributes since they get dropped in internet explorer
- # [19:18] <hsivonen> I have been hoping that someone on public-html would be curious enough about verifying Hixie's results to write a test harness in Java. Hasn't happened yet...
- # [19:18] <hsivonen> zcorpan: are those two the only ones?
- # [19:19] <Philip`> ((Still got about six pages per second (downloaded + tokenised), which seems much better than the ~0.2/sec from http://triin.net/2006/06/12/Running_the_program (though not collecting as much information about each page)))
- # [19:20] <zcorpan> hsivonen: yes
- # [19:23] <hsivonen> zcorpan: added. should be live in a few moments
- # [19:25] <hsivonen> Philip`: did the triin.net guy measure the total byte size of the stuff that was downloaded?
- # [19:26] <hsivonen> I wonder what would be an efficient way of storing the original content-type and URI along with the payload on disk...
- # [19:26] <Philip`> You could try to guess the numbers from http://triin.net/archive/kool/webstat/figure-8.png
- # [19:27] * Joins: kingryan (rking3@208.66.64.47)
- # [19:38] <Philip`> Of the 543 pages with XHTML doctypes, 329 had unrecognised entity names in attributes (which seems to always be <a href="a?b&c">), 159 had '?' in the tag open state (I guess <?xml...>), 94 had non-permitted slashes (<something/>), 38 had duplicate attributes
- # [19:39] <hsivonen> Philip`: based on the figure, there's only about 8 gigabytes to download
- # [19:42] <Philip`> Looks more like 20GB to me, since the mean is around 30KB and there's about 0.8 million in total
- # [19:42] <hsivonen> ok
- # [19:43] <Philip`> 20GB / 1Mbps = 1.9 days so it's not that bad :-)
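
The back-of-the-envelope numbers above can be checked with a few lines. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and a link that stays fully saturated; the function name is hypothetical.

```python
def download_days(total_bytes, bits_per_second):
    """Days needed to transfer total_bytes over a fully used link."""
    seconds = total_bytes * 8 / bits_per_second
    return seconds / 86400  # 86400 seconds per day


# Rough corpus size from the figure: about 30 KB mean times 0.8 million pages.
corpus_bytes = 30_000 * 800_000

# Philip`'s rounded estimate of 20 GB over a 1 Mbps downstream.
days_for_20gb = download_days(20e9, 1e6)
```

The product of the mean and the page count comes to 24 GB, which is in the same range as the rounded 20 GB, and 20 GB at 1 Mbps works out to a little under two days.
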
- # [19:43] <hsivonen> I'm going away for a couple of days. too bad I don't have slurper software standing by
- # [19:44] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [19:45] <hsivonen> I wonder how many files per zip file java.util.zip can handle
- # [19:45] <hsivonen> or how many files per directory HFS+ can handle before melting down
- # [19:49] * Joins: gavin (gavin@74.103.208.221)
- # [19:49] <Philip`> I don't think the zip format is especially perfect for adding lots of files one at a time, given how it has a file table at the end that it'd have to keep rewriting
- # [19:50] <Philip`> but maybe that's a negligible problem if it's only thousands per zip file
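
One possible answer to the question of keeping the original URI and content-type next to the payload, sketched in Python rather than `java.util.zip`: store each page as a zip member and put the metadata in that member's per-entry comment. The member naming scheme and the JSON layout are assumptions made for this example, not something proposed in the discussion.

```python
import json
import zipfile


def add_page(archive, index, uri, content_type, payload):
    """Store one downloaded page; metadata goes into the entry's comment field."""
    info = zipfile.ZipInfo(f"{index:08d}.bin")
    meta = {"uri": uri, "content_type": content_type}
    info.comment = json.dumps(meta).encode("utf-8")  # zip limits this to 65535 bytes
    archive.writestr(info, payload)


def read_page(archive, name):
    """Return (metadata dict, payload bytes) for a stored page."""
    info = archive.getinfo(name)
    return json.loads(info.comment.decode("utf-8")), archive.read(name)
```

The archive would be opened once with `zipfile.ZipFile(path, "w")` and closed after a batch of `add_page` calls, so the file table at the end is written once per batch rather than once per page.
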
- # [20:05] * Joins: hasather (hasather@80.203.71.22)
- # [20:19] * Quits: jgraham (jgraham@81.86.209.151) (Quit: Ex-Chat)
- # [20:32] <zcorpan> does anyone understand what robert burns is on to with consistency and createElement()?
- # [20:32] * Parts: hasather (hasather@80.203.71.22)
- # [20:32] * Joins: hasather (hasather@80.203.71.22)
- # [20:34] <hsivonen> zcorpan: he seems to believe that createElement() magically does the right thing if you only use elements from one namespace
- # [20:39] <Philip`> Is that true only if his understanding of "the right thing" doesn't include elements being treated like HTML elements (e.g. if you did createElement('b') it wouldn't be rendered as bold)?
- # [20:40] <hsivonen> I'm not going to guess his intent further.
- # [20:48] * Joins: jgraham (jgraham@81.86.208.107)
- # [21:08] <hsivonen> I'm getting more curious about what it is that Rob Burns is implementing
- # [21:35] * Parts: hasather (hasather@80.203.71.22)
- # [21:36] * Joins: edas (edaspet@88.191.34.123)
- # [21:36] * Joins: hasather (hasather@80.203.71.22)
- # [21:51] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # [21:56] * Joins: gavin (gavin@74.103.208.221)
- # [22:01] <jgraham> I'm getting more despondent about the #kB unread email I have on public-html
- # [22:02] <jgraham> and my lack of desire to read it
- # [22:18] * Joins: edaspet (edaspet@88.191.34.123)
- # [22:18] <jgraham> Only 8 emails to go, of which are from Rob Burns, and which weigh in at a total of 111kB
- # [22:19] <anne> nice
- # [22:19] <jgraham> s of which/7 of which/
- # [22:20] * Quits: edas (edaspet@88.191.34.123) (Ping timeout)
- # [22:25] * Quits: edaspet (edaspet@88.191.34.123) (Client exited)
- # [22:25] <anne> would be nice if RB provided some use cases and real problems all his new elements are solving
- # [22:25] * Joins: edas (edaspet@88.191.34.123)
- # [22:26] <hsivonen> anne: they are solving the problem of expressing precise semantics
- # [22:27] <anne> yeah, RDF does so too I'm told
- # [22:29] <jgraham> Expressing precise semantics is not, in itself, a use case
- # [22:29] <jgraham> Maybe I should post that
- # [22:29] <jgraham> But I feel bad sending people more email
- # [22:30] <anne> me too
- # [22:30] <anne> every time I open the reply window and type something I close it a few seconds later because it seems rather pointless
- # [22:31] <anne> (I now stopped doing that; just reading)
- # [22:43] * Quits: schepers (schepers@128.30.52.30) (Quit: Trillian (http://www.ceruleanstudios.com)
- # [22:45] * Joins: Zeros (Zeros-Elip@67.154.87.254)
- # [22:53] * Quits: ROBOd (robod@86.34.246.154) (Quit: http://www.robodesign.ro )
- # [23:00] * Joins: mjs (mjs@67.41.204.169)
- # [23:13] <anne> jgraham, when are we going to do another release of html5lib?
- # [23:14] <anne> actually, what I'm more interested in is the plans there were at some point for a C version... have those progressed?
- # [23:15] <Philip`> I've been working on a partial C++ (and JS and Perl) version, which could be useful for that
- # [23:15] <Philip`> (I guess it'd be reasonably easy to port to straight C if necessary)
- # [23:16] <anne> 3 parsers at once? fancy
- # [23:17] <Philip`> I've just written one in OCaml, and a compiler with C++/JS/Perl code-generation backends
- # [23:17] <Philip`> (Only done the tokeniser, though)
- # [23:18] <anne> doing that for tree construction might get tricky
- # [23:18] <anne> although I suppose there's some logic there as well :)
- # [23:21] <Philip`> http://canvex.lazyilluminati.com/svn/tokeniser/tokeniser_spec.ml is the meta-implementation of the algorithm - most of the words in there still have to be implemented manually in each language, but that can be fairly straightforward
- # [23:25] * Quits: mjs (mjs@67.41.204.169) (Ping timeout)
- # [23:33] * Joins: mjs (mjs@67.41.152.66)
- # [23:38] <anne> I suppose in theory you can map the tree construction stuff to something similar
- # [23:38] <jgraham> anne: We should do one soon. I think we should make a few improvements to charsUntill in the tokenizer and then put the release out
- # [23:39] <jgraham> We can do the new charset detection stuff for 0.11
- # [23:39] <anne> k
- # [23:39] <anne> I wonder how many further changes charset detection will get
- # [23:40] <jgraham> I'm also interested in taking Philip`'s C++ tokenizer and hooking it up to python via SWIG or similar. But I need to motivate myself to actually learn a little more C++ than I know to do that.
- # [23:46] <Philip`> What would be involved in the C++/Python interface? I guess it's just transferring characters and tokens, but I don't know which side should be pushing/pulling or what kind of data structures they should pass around
- # [23:47] * Philip` finishes creating the Perl port of his JS port of his C++ tokeniser, and tries to work out how to run tests and see how many it fails...
- # [23:47] <zcorpan> mjs: html5 already requires all Document objects to implement HTMLDocument and other supported interfaces (like SVGDocument)
- # [23:48] <anne> "(This is the case whether or not the document in question is an HTML document or indeed whether it contains any HTML elements at all.)"
- # [23:52] <jgraham> Philip`: I envision python pulling tokens from C++ (basically html5lib views the tokenizer as an iterator which produces a sequence of tokens)
- # [23:53] <jgraham> So I think you need something like an emitToken method on the C++ side which returns a pointer to the next token
- # [23:53] <jgraham> Then the interface code turns that into a python object
- # [23:54] <jgraham> Or something
- # [23:54] <Philip`> Where would the C++ side get characters from?
- # [23:56] <zcorpan> mjs: i have tests on that, btw: http://simon.html5.org/test/html/dom/interfaces/Document/
- # [23:57] <jgraham> I guess you have to pass it something it can interpret as file-like
- # [23:57] * Joins: myakura (myakura@58.88.37.26)
- # [23:57] <jgraham> and then the C++ side would read the file
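
The pull model jgraham describes (the consumer iterates, the tokenizer reads from something file-like) might look roughly like this on the Python side. This is a toy: it only splits tags from text, it is neither the HTML5 tokenization algorithm nor html5lib's real interface, and `ToyTokenizer` and its token tuples are invented for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

Token = Tuple[str, str]  # (token type, data)


class ToyTokenizer:
    """Pull-style tokenizer: iterating over it reads the stream and yields tokens."""

    def __init__(self, stream: TextIO):
        self.stream = stream  # anything file-like with read(n)

    def __iter__(self) -> Iterator[Token]:
        text = []
        while True:
            ch = self.stream.read(1)
            if ch == "" or ch == "<":
                if text:  # flush pending character data first
                    yield ("Characters", "".join(text))
                    text = []
                if ch == "":  # end of input
                    return
                tag = []
                while True:  # collect everything up to '>' or end of input
                    ch = self.stream.read(1)
                    if ch in ("", ">"):
                        break
                    tag.append(ch)
                yield ("Tag", "".join(tag))
            else:
                text.append(ch)
```

A consumer would simply write `for token in ToyTokenizer(io.StringIO("<p>hi</p>")): ...`; a C++ tokenizer wrapped via SWIG or similar would need to present the same iterator shape for the tree builder to stay unchanged.
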
- # [23:57] <mjs> zcorpan: what I meant was requiring createElement to create elements in the HTML namespace for all documents
- # [23:57] <anne> oooh
- # [23:57] <mjs> zcorpan: HTML5 only requires that for HTML documents
- # [23:58] <mjs> sorry for being unclear
- # [23:58] <anne> i suppose it makes some sense
- # [23:59] * Quits: gavin (gavin@74.103.208.221) (Ping timeout)
- # Session Close: Tue Jul 24 00:00:00 2007