- # Session Start: Wed Dec 12 00:00:00 2007
- # Session Ident: #whatwg
- # [00:01] <Hixie> it's hard to know how to react to people who say that apple and nokia are lying
- # [00:01] <Hixie> especially since google has the same concern and i know that google isn't lying...
- # [00:01] <hubick> Hixie: I think you should make more noise about free alternatives like h.261
- # [00:02] <Hixie> h.261 sucks
- # [00:02] <jgraham_> Hixie: Well of course you ould say Google aren't lying ;)
- # [00:02] <jgraham_> s/ould/would/
- # [00:02] <Hixie> jgraham_: well, right, that's why i haven't mentioned that
- # [00:02] <Hixie> jgraham_: people would just say that i was part of the conspiracy
- # [00:03] <Hixie> the problem is that even if you don't believe the submarine patent risk, you still have the problem that apple won't implement ogg
- # [00:03] <Hixie> so whether we are being played or not doesn't really matter, if what we want is interop
- # [00:03] <Hixie> sigh
- # [00:03] <Hixie> i'll reply when i get to work
- # [00:03] <Hixie> afk for now
- # [00:04] <jgraham_> Indeed. But I think it's not hard to understand the conspiracy theories. It's so common for companies, especially large companies, to lie
- # [00:05] <Dashiva> Even if they didn't lie, they would still be doing the same actions, though
- # [00:05] <jgraham_> so I think you'll just have to ignore the conspiracy theories and hope that people can find a technical solution
- # [00:05] <hubick> I have taken a survey of the room here at work, and Linux has more market share than Apple, and they will ship ogg, so you are set :)
- # [00:07] <jgraham_> Dashiva: I'm not saying they _are_ lying. I'm saying that they're suffering from "boy who cried wolf" syndrome (does that analogy translate?)
- # [00:07] <hubick> I'm not supposed to believe Apple is doing this just to protect their interest in pushing the whole world to using only Quick Time?
- # [00:09] <othermaciej> do you mean QuickTime the media container format, or QuickTime the media framework software?
- # [00:09] <othermaciej> the former is clearly false, Apple primarily advocates the MPEG-4 container and family of codecs
- # [00:09] <hubick> I mean the Whatever That Stuff To Do With Watching Video I Need To Get From Apple
- # [00:10] <jgraham_> hubick: Ultimately the question of why they are doing it can only be answered by seeing if they act in good faith to find a solution.
- # [00:10] <othermaciej> I don't think QuickTime downloads for Windows are a huge revenue driver for Apple, but I'm not privy to financial info on that
- # [00:10] <hubick> ask them to grant all patents on their format and donate it to the web then
- # [00:10] <Dashiva> jgraham: I know. I'm just saying, regardless of whether they lie or not, we have to consider their actions rather than their words
- # [00:11] * Joins: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net)
- # [00:11] * Parts: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net)
- # [00:11] <hubick> What about "Dirac" ?
- # [00:11] <jgraham_> hubick: AIUI Apple + Nokia don't actually hold all the necessary patents, just license them
- # [00:11] * Joins: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net)
- # [00:12] <jgraham_> hubick: I guess Dirac is a solution if it is agreed to be free of IPR issues
- # [00:12] * Quits: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net) (Client Quit)
- # [00:12] * Joins: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net)
- # [00:12] * Quits: DIrtyF (n=DirtyF@gar31-2-82-224-211-195.fbx.proxad.net) (Remote closed the connection)
- # [00:13] <hubick> I'm guessing the same "possibility of submarine patents" argument will be made against Dirac.
- # [00:13] <jgraham_> Dashiva: I agree entirely. At this juncture the actions we need to consider are their efforts to find an acceptable codec
- # [00:13] <othermaciej> Apple's actions so far have involved negotiating with codec licensing groups, asking the w3c to do patent searches (it's risky for large corporations to do a patent search), and providing a long list of possible codecs for the w3c to consider (w/ a brief summary of the tradeoffs)
- # [00:14] <hubick> is this list public?
- # [00:14] <othermaciej> sure, it's included in a summary of codec issues that Dave Singer sent to public-html a few months ago
- # [00:15] <anne-mac> public-html: http://lists.w3.org/Archives/Public/public-html/
- # [00:16] <hubick> Is public-html why the www-html I am subscribed to seems so dead?
- # [00:17] * Joins: grimeboy (n=grimboy@85-211-246-139.dsl.pipex.com)
- # [00:17] <mpt> http://lists.w3.org/Archives/Public/public-html/2007Nov/0153.html
- # [00:17] <othermaciej> I don't think Dirac is materially different from Theora, from an IP risk point of view
- # [00:18] <othermaciej> the thing that makes a big difference for MPEG is broad-based open process with disclosure requirements
- # [00:18] <othermaciej> that mitigates a lot (but not all) of the risk
- # [00:18] <othermaciej> something that *would* help a lot is a codec developed through an RF-license open standards process
- # [00:19] <othermaciej> (that could include taking Dirac or Theora through such a process)
- # [00:19] * Joins: ianloic (i=yakk@glub.dreamhostps.com)
- # [00:19] <anne-mac> hubick, this could be true, yes
- # [00:20] <anne-mac> joining the HTML WG (and public-html): http://blog.whatwg.org/w3c-restarts-html-effort
- # [00:21] * Quits: kingryan (n=kingryan@74.95.195.25)
- # [00:26] <hubick> I dunno if I should subscribe... all these issues just make me frustrated. I don't know how you people involved with creating these standards keep up morale for so long while creating standards that may never see the light of implementation day.
- # [00:27] * Joins: parcelbrat (n=parcelbr@96.239.197.10)
- # [00:27] <anne-mac> the company I work for implements stuff, so that helps :)
- # [00:27] <parcelbrat> can someone point me to the logs for any discussion on the removal of the ogg codec's from the spec?
- # [00:27] <hubick> parcelbrat: http://lists.w3.org/Archives/Public/public-html/2007Nov/0153.html (courtesy of mpt to me a moment ago)
- # [00:28] * jgraham_ finds the web standards stuff has more of a "people care about this" vibe than Astrophysics
- # [00:28] <parcelbrat> hubick: thanks
- # [00:28] <parcelbrat> has it been a common topic?
- # [00:29] <hubick> parcelbrat: I been bugging them in here for almost an hour now :)
- # [00:29] <anne-mac> it has been discussed when <video> got introduced and during the technical plenary a month ago
- # [00:29] <anne-mac> and now
- # [00:30] <anne-mac> same arguments each time iirc
- # [00:31] <parcelbrat> anne-mac: am i reading correctly that whatwg is still wanting support for the ogg formats? and it will be discussed more, or have they been tossed?
- # [00:32] <Philip`> jgraham_: There's as many HTML documents as there are stars in our galaxy (to within a couple of orders of magnitude), and the web is growing exponentially faster than the galaxy, so it's obviously the more interesting area :-)
- # [00:32] <hubick> parcelbrat: sounds like they are looking for alternatives, and a likely candidate may be h.261
- # [00:32] <anne-mac> parcelbrat, it will most certainly be discussed more
- # [00:33] <anne-mac> and W3C is looking into a patent search as I understand things
- # [00:33] <anne-mac> for Ogg stuff
- # [00:33] <jgraham_> parcelbrat: The WHATWG wants support for some freely-implementable formats
- # [00:36] <parcelbrat> so slashdot is over-reacting?
- # [00:36] <parcelbrat> http://yro.slashdot.org/yro/07/12/11/1339251.shtml
- # [00:36] * jgraham_ wonders how Philip` arrived at ~100 billion HTML documents
- # [00:36] <gavin> slashdot? overreacting? impossible!!
- # [00:36] <jgraham_> parcelbrat: If they're not then it would be the first time for this particular thing
- # [00:36] <parcelbrat> true
- # [00:37] <hubick> the bottom line is that it *was* there and now it's not though.
- # [00:37] <parcelbrat> i knew there was a reason i avoid slashdot usually, ars is usually less dramatic
- # [00:37] <parcelbrat> hubick: was and not, but isn't banned, right?
- # [00:38] * parcelbrat rhetorical
- # [00:38] <hubick> parcelbrat: I find Ars editors have just as biased takes on topics, if less obvious
- # [00:38] <parcelbrat> biased yes, just less dramatic
- # [00:38] <Philip`> jgraham_: Searching for e.g. "a" on Yahoo gives reportedly 20 billion matches, and Yahoo probably misses lots of pages so round it up to 10^11
- # [00:38] <anne-mac> /. is written by contributors
- # [00:39] <anne-mac> this post happens to be written by a guy promoting his own post and providing all the incorrect links
- # [00:39] <anne-mac> who also posts to the WHATWG list a lot
- # [00:39] <parcelbrat> is he dramatic on the list too?
- # [00:39] <hubick> parcelbrat: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-December/013152.html
- # [00:43] <parcelbrat> <sarcasm>wow very weasely</sarcasm>
- # [00:43] <jgraham_> Philip`: I guess 10^(11±2) search-engine cached HTML documents might not be such a bad estimate, although that's not quite "total number of HTML documents"
- # [00:43] <parcelbrat> hubick: that definitely clears stuff up for me, thanks
- # [00:43] <jgraham_> s/cached/indexed/ I guess
- # [00:44] <Philip`> (Hooray, my cached downloader actually works)
- # [00:44] <hubick> Philip`: what are you building?
- # [00:45] <Philip`> hubick: Something to download and analyse lots of web pages (for a small value of "lots", like tens of thousands)
- # [00:46] <anne-mac> Philip`, didn't you already have sniffing for lots of pages?
- # [00:46] <hubick> Philip`: If you need graphs and haven't tried it yet, I highly recommend jFreeChart
- # [00:47] * anne-mac thought Philip` had hsivonen's stuff up and running
- # [00:47] <Philip`> anne-mac: Yes, but I've rewritten stuff so it caches the pages and doesn't have to download hundreds of megabytes of HTML every time I run it
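Philip`'s fix is to cache each page on disk so later runs reuse it instead of re-downloading hundreds of megabytes. The log does not show his code; the following is only a minimal Python sketch of the idea, assuming a cache keyed on a SHA-1 hash of the URL. `fetch_cached`, `cache_path` and `urllib_fetch` are hypothetical names.

```python
import hashlib
import os
import tempfile
from typing import Callable


def cache_path(cache_dir: str, url: str) -> str:
    """Map a URL to a stable file name inside cache_dir (SHA-1 of the URL)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".html")


def fetch_cached(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the page body, calling fetch() only if it is not cached yet."""
    path = cache_path(cache_dir, url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    body = fetch(url)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file and rename it into place, so an
    # interrupted run never leaves a truncated page in the cache.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    os.replace(tmp, path)
    return body


def urllib_fetch(url: str) -> bytes:
    """A real network fetcher using only the standard library."""
    from urllib.request import urlopen
    with urlopen(url, timeout=30) as resp:
        return resp.read()
```

Passing the fetcher in as a parameter keeps the cache logic testable without network access; a real run would call `fetch_cached(url, "cache", urllib_fetch)`.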
- # [00:47] <hubick> hsivonen: which reminds me... I hope you received the patch I sent you for the htmlparser Maven metadata
- # [00:48] <Philip`> anne-mac: http://canvex.lazyilluminati.com/misc/media.xml is from 256 pages - is that about what you want?
- # [00:49] * anne-mac liked 10s of 1000s
- # [00:49] <Philip`> I'm going to move it onto a better computer to run on more pages :-)
- # [00:49] <Philip`> assuming it's going to be doing the right thing
- # [00:49] <anne-mac> Philip`, I guess that page only shows something on Firefox or something?
- # [00:50] * anne-mac gets a blank in Opera
- # [00:50] <Philip`> anne-mac: View source :-p
- # [00:50] <anne-mac> and Safari
- # [00:50] <anne-mac> hmm
- # [00:50] <Philip`> It's "XML" - I expect you've heard of that before
- # [00:50] <anne-mac> what's this thing you call view source? :p
- # [00:51] <mpt> View Source is for hackers
- # [00:51] * Joins: dglazkov (n=dglazkov@adsl-074-229-248-021.sip.bhm.bellsouth.net)
- # [00:51] <parcelbrat> what is this XML you speak of?
- # [00:51] * Philip` is just generating a big XML dump of lots of headers and attributes and stuff, then using xml_grep to extract the @media values
- # [00:52] <anne-mac> so far people seem to comply with the arbitrary media= standards
- # [00:52] <anne-mac> i'm amazed
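Philip` extracts the `media` values from his XML dump with `xml_grep`. A rough Python equivalent using `xml.etree.ElementTree` is sketched below. The log only shows the `<header .../>` records, so the `<attr uri="..." element="..." name="..." value="..."/>` layout assumed here is hypothetical, as is the `media_values` helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def media_values(xml_text: str) -> Counter:
    """Count the raw values of media="" attributes recorded in a dump."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Hypothetical dump layout: one <attr> element per attribute seen, e.g.
    # <attr uri="http://..." element="link" name="media" value="screen"/>
    for attr in root.iter("attr"):
        if attr.get("name") == "media":
            # Values are kept exactly as authored (no case folding), since
            # the point is to see what people actually write.
            counts[attr.get("value", "")] += 1
    return counts
```

`media_values(dump).most_common(20)` would then list the most frequent values, which is roughly what a `media.txt` summary needs.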
- # [00:53] <Philip`> parcelbrat: It's kind of like HTML, except the brackets are anglier
- # [00:53] * parcelbrat classic
- # [00:54] * Philip` has absolutely no idea how many concurrent downloading/processing threads to run
- # [00:55] * jgraham_ is sure the brackets are actually angrier; just look how upset they get when they don't get a partner...
- # [00:55] * Quits: hasather (n=hasather@90-231-107-133-no62.tbcn.telia.com) ("leaving")
- # [00:55] <hubick> At Linux world 2000 the Konqueror guys were telling me how awesome their browser was, so I loaded my home page which uses @media CSS tags in it, at which point it promptly disappeared *poof* from the screen, leaving them quite embarrassed and quiet :)
- # [00:56] <mpt> <xml> ≪xml2≫
- # [00:56] <parcelbrat> oooh, xml2 is even anglier than xml!
- # [00:57] <hubick> mpt: so, xml will eventually evolve into Lisp then?
- # [00:57] <mpt> ⋘xml3⋙
- # [00:57] <parcelbrat> (xmlisp)
- # [00:57] <hubick> isn't this JSON stuff basically that? :)
- # [00:58] <parcelbrat> pretty darn close
- # [00:58] <Philip`> http://canvex.lazyilluminati.com/misc/sexp.html
- # [00:58] * hubick *lolz*
- # [00:58] * parcelbrat um.....
- # [00:59] <hubick> what was the old sgml transform language?
- # [00:59] <anne-mac> dsssl?
- # [00:59] <hubick> yeah
- # [00:59] <hubick> wasn't it like that?
- # [01:00] <anne-mac> no idea
- # [01:00] <hubick> heh, I just started clicking links on http://www.jclark.com/dsssl/ and got like, four 404's in a row :(
- # [01:03] <anne-mac> Philip`, nice
- # [01:11] * Philip` wonders how long it'll take to run "sort -R" on 4.5M lines
- # [01:12] <parcelbrat> nice
- # [01:12] <anne-mac> we could have text/html+lisp
- # [01:13] <parcelbrat> yeah, but then you'd have to start catering to everyone
- # [01:13] <parcelbrat> text/html+ruby
- # [01:13] <parcelbrat> text/html+python
- # [01:13] <parcelbrat> text/html+vbs...
- # [01:13] * parcelbrat keyboard breaks
- # [01:13] <anne-mac> neh, only those we like
- # [01:14] <parcelbrat> shouldn't those be application/html+<<lang>>
- # [01:14] <Philip`> We could support them all at first, then remove them all from the spec, and see which ones get complained about the most, and then just put those ones back in
- # [01:14] <anne-mac> application/* is overrated I think
- # [01:15] <parcelbrat> philip`: works for media, and got me here
- # [01:15] <parcelbrat> anne-mac: newbie question: why?
- # [01:15] * Dashiva is now known as Dashiva2
- # [01:15] * Dashiva2 is now known as Dashiva
- # [01:16] <anne-mac> parcelbrat, take text/xml versus application/xml, the only reason to prefer the latter is theoretical concerns over character encoding
- # [01:16] <Philip`> Hmm, 6 minutes of CPU time to randomise the list
- # [01:16] <anne-mac> but all implementations treat them as being equivalent
- # [01:17] <parcelbrat> how would you handle text/xml with character encoding? let http handle it?
- # [01:17] <anne-mac> most text/* formats have their own specific rules for determining the character encoding actually, all against the various RFCs from the stone age
- # [01:18] <anne-mac> text/xml defaults to US-ASCII unless it has a charset parameter that says otherwise
- # [01:18] <anne-mac> application/xml defaults to whatever the XML file says unless it has a charset parameter specified
- # [01:18] <anne-mac> in practice, text/xml is like application/xml
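The defaulting rules anne-mac describes (from RFC 3023) can be written as a small decision function. This is a simplified sketch: `effective_charset` is a hypothetical helper, byte-order-mark sniffing is left out, and the `follow_rfc3023_text_default` flag models the "in practice" behaviour, where browsers treat `text/xml` like `application/xml`.

```python
from typing import Optional


def effective_charset(media_type: str,
                      charset_param: Optional[str],
                      xml_decl_encoding: Optional[str],
                      follow_rfc3023_text_default: bool = True) -> str:
    """Pick the character encoding for an XML response (no BOM sniffing).

    charset_param: the charset= parameter from Content-Type, if any.
    xml_decl_encoding: encoding="" from the <?xml ...?> declaration, if any.
    """
    # An explicit charset parameter wins for both media types.
    if charset_param:
        return charset_param.lower()
    if media_type == "text/xml" and follow_rfc3023_text_default:
        # RFC 3023: text/xml without charset defaults to US-ASCII,
        # whatever the XML declaration inside the document says.
        return "us-ascii"
    # application/xml (and text/xml as implementations actually treat it):
    # use the document's own declaration, else the XML default of UTF-8.
    return (xml_decl_encoding or "utf-8").lower()
```

For example, `effective_charset("text/xml", None, "UTF-8")` gives `"us-ascii"` under the letter of the RFC, but `"utf-8"` with `follow_rfc3023_text_default=False`.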
- # [01:20] <parcelbrat> makes sense
- # [01:20] <parcelbrat> and we know how well application/html+xml works ;)
- # [01:20] <anne-mac> xhtml+xml* ;)
- # [01:20] * parcelbrat smacl
- # [01:21] <parcelbrat> s/smacl/smack
- # [01:22] <hubick> which makes me wonder if Firefox is ever gonna support */*+xml: https://bugzilla.mozilla.org/show_bug.cgi?id=155730
- # [01:25] <Philip`> Seems I can download/process 4K pages per minute
- # [01:26] <parcelbrat> hubick: yeah, lately, we've had a problem with Ruby on Rails error messages because <%= isn't a valid xml tag... the page isn't supposed to be coming across as xml... oh well
- # [01:26] <Philip`> I get loads of cookie spec violation warnings :-/
- # [01:30] <Philip`> Argh, and I've got ill-formed XML output too
- # [01:31] <parcelbrat> later ya'll
- # [01:31] * Quits: parcelbrat (n=parcelbr@96.239.197.10)
- # [01:31] <Philip`> <header uri="http://www.ganymede.cz/" name="Server" value="Apache/2.2.6 (Unix) mod_ssl/2.2.6  DAV/2 PHP/5.2.5"/>
- # [01:31] * Philip` wishes XML was easy
- # [01:33] <Philip`> Hmm, there's five sites with  in their Server
- # [01:33] * Quits: tndH (i=Rob@adsl-77-86-6-102.karoo.KCOM.COM) ("ChatZilla 0.9.79-rdmsoft [XULRunner 1.8.0.9/2006120508]")
- # [01:34] * Quits: aroben (i=aroben@unaffiliated/aroben) ("Leaving")
- # [01:34] <Philip`> <header uri="http://www.doxamus.ro/" name="Server" value="Apache/2.2.6 (Unix) mod_ssl/2.2.6 �N	YNED�����N	SAHP�����N	TATS�����N	 mod_bwlimited/1.4 mod_auth_passthrough/2.1 FrontPage/5.0.2.2635 PHP/5.2.4"/>
- # [01:34] <Philip`> That's really not going to work
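The Server values above contain raw control bytes, and XML 1.0 does not allow those characters anywhere in a document, not even as character references. One way to keep the dump well-formed is to replace every character outside the XML 1.0 `Char` production before serialising. The sketch below assumes the header bytes have already been decoded to text (for example as Latin-1); `clean_for_xml` and `header_element` are hypothetical helpers, not the code Philip` was running.

```python
import re
from xml.sax.saxutils import quoteattr

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(value: str, replacement: str = "\uFFFD") -> str:
    """Replace characters that XML 1.0 forbids with U+FFFD."""
    return _INVALID_XML_CHARS.sub(replacement, value)


def header_element(uri: str, name: str, value: str) -> str:
    """Serialise one header in the <header uri= name= value=/> shape above."""
    # quoteattr escapes &, <, > and quotes, plus tab/CR/LF as &#9; etc.
    return "<header uri=%s name=%s value=%s/>" % (
        quoteattr(clean_for_xml(uri)),
        quoteattr(clean_for_xml(name)),
        quoteattr(clean_for_xml(value)),
    )
```

Replacing with U+FFFD rather than deleting keeps a visible marker of where the junk bytes were, which is useful when the junk is itself the interesting finding.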
- # [01:39] * Quits: jgraham_ (n=james@81-86-217-3.dsl.pipex.com) ("This computer has gone to sleep")
- # [01:39] * Quits: Dashiva (i=Dashiva@wikia/Dashiva)
- # [01:40] * Quits: mpt (n=mpt@ip-81-1-117-61.cust.homechoice.net) (Read error: 110 (Connection timed out))
- # [01:41] * Quits: hubick (n=hubick@cs14.pc.athabascau.ca)
- # [01:43] * Joins: Dashiva (i=Dashiva@wikia/Dashiva)
- # [01:43] <Philip`> anne-mac: http://www.cl.cam.ac.uk/~pjt47/misc/media.xml
- # [01:44] <Philip`> anne-mac: http://www.cl.cam.ac.uk/~pjt47/misc/media.txt too
- # [01:44] <Philip`> from 16 kilopages, minus about 500 with errors
- # [01:45] <Philip`> (This is only about 400MB of HTML, so I could do more fairly easily)
- # [01:45] <Philip`> (but probably not enough more to find really interesting things)
- # [01:50] <Philip`> s/about 500/940/
- # [01:50] <Philip`> (Also, binary kilo)
- # [01:51] * Joins: jgraham_ (n=james@81-86-217-3.dsl.pipex.com)
- # [01:54] * Quits: jgraham_ (n=james@81-86-217-3.dsl.pipex.com) (Client Quit)
- # [01:58] * Quits: anne-mac (n=annevk@88.80-202-68.nextgentel.com) (Read error: 110 (Connection timed out))
- # [02:00] <Hixie> "Re: [whatwg] HTML 5, OGG, competition, civil rights, and persons with disabilities"
- # [02:00] * Hixie fears looking at that e-mail
- # [02:00] <bradee-oh> lol
- # [02:01] <Dashiva> Hixie: Now you know how we feel about your mashup replies :P
- # [02:01] <Hixie> :-D
- # [02:04] <Philip`> hsivonen: I can process 16K pages in 20 seconds (wallclock time) - I think your parser is fast enough for me for now :-)
- # [02:05] <_Ivo> I can sadly say that that is a sad title.
- # [02:07] <Philip`> 441% CPU usage? I think 'top' is lying to me...
- # [02:26] <Hixie> http://www.bluishcoder.co.nz/2007/12/video-element-and-ogg-theora.html is a good summary
- # [02:34] * Joins: doublec_ (n=doublec@209.79.152.179)
- # [02:34] * Quits: doublec (n=doublec@209.79.152.179) (Read error: 104 (Connection reset by peer))
- # [02:35] <doublec_> hotel network connections, sigh
- # [02:39] <roc> hey
- # [02:39] <roc> yeah, that was good Chris
- # [02:39] <doublec_> thanks :)
- # [02:39] <doublec_> I had to type it in using w3m over an ssh connection using Blogger's interface.
- # [02:40] <doublec_> since the hotel network seems to kill any browser traffic over a certain size
- # [02:41] <roc> This is Avante?
- # [02:41] <roc> I don't remember that being a problem
- # [02:41] <doublec_> Yes, it's avante. And it is strange. I can receive fine. I can't even send via gmail.
- # [02:41] <doublec_> yet that's over ssl so I don't understand it
- # [02:42] * Quits: phsiao (i=shawn@nat/ibm/x-24dd963ac22371ac) (Read error: 110 (Connection timed out))
- # [02:44] <roc> complain to the management, that's pretty important right now
- # [02:45] <doublec_> will do
- # [02:46] <othermaciej> mmmm, ogg flamage
- # [02:46] <othermaciej> toasty
- # [02:47] <othermaciej> Hixie: it argued that not recommending support for the Ogg Theora video codec will be harmful to the blind
- # [02:47] <Hixie> i've been trying to keep the flames to a minimum by asking for politeness off-list, i hope it helps
- # [02:47] <Hixie> wait, what?
- # [02:47] <Hixie> wow
- # [02:47] <Hixie> can't wait to read that
- # [02:49] <Philip`> http://www.cl.cam.ac.uk/~pjt47/misc/attributes.html - some numbers about values of an arbitrarily-chosen set of elements/attributes
- # [02:50] <Hixie> off hand those numbers match what i remember seeing
- # [02:50] <Hixie> except for a rev=PARENT
- # [02:50] <Hixie> i just saw rev=made and rev=stylesheet
- # [02:50] * Quits: billmason (n=billmaso@ip156.unival.com) (".")
- # [02:51] <Philip`> They all come from http://www.dcs.gla.ac.uk/~simon/quantum/
- # [02:51] <Philip`> (since I counted number of occurrences in total, not number of pages)
- # [02:52] <Philip`> Maybe number of pages would be more useful...
- # [02:52] <Hixie> ah yeah i found that number of occurrences just never works
- # [02:52] <Hixie> there are too many gigantic pages that totally screw the count
- # [02:54] * Joins: hdh (n=hdh@58.187.109.98)
- # [02:57] <Philip`> http://www.cl.cam.ac.uk/~pjt47/misc/attributes.html - now with number of pages too
- # [02:58] <Philip`> Not sure why I'm bothering to keep the number of occurrences too, but I guess it doesn't hurt
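The difference between counting total occurrences and counting pages is easy to see in code: a single huge page that repeats one value dominates the first count but adds only 1 to the second. A minimal sketch follows; `count_values` is a hypothetical helper, and the 500 repetitions in the usage check are illustrative, not taken from Philip`'s data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_values(pages: Dict[str, Iterable[str]]) -> Tuple[Counter, Counter]:
    """Return (total occurrences, number of pages) for each value.

    pages maps a page URI to the attribute values found on that page.
    """
    occurrences = Counter()
    page_counts = Counter()
    for values in pages.values():
        values = list(values)
        occurrences.update(values)
        # set() so a page that repeats a value thousands of times
        # still contributes only 1 to the per-page count.
        page_counts.update(set(values))
    return occurrences, page_counts
```

With one page carrying `rev=PARENT` 500 times and two pages carrying `rev=made`, `PARENT` tops the occurrence count while `made` tops the page count.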
- # [02:58] <Philip`> Looks like cyan and magenta are the least favourite of the binary colours :-(
- # [02:59] <othermaciej> doublec_'s blog post is indeed a good summary
- # [02:59] * Hixie pokes Philip` to look at his /msgs (and maybe to have him respond on w3.net, since he's not registered on freenode)
- # [03:04] <othermaciej> I posted on the codec thread
- # [03:05] <othermaciej> may knuth have mercy on my soul
- # [03:05] <Philip`> "The" codec thread? I thought there was about two dozen of them
- # [03:05] <othermaciej> one of them
- # [03:13] * doublec_ is now known as doublec
- # [03:17] * Joins: Thezilch (n=fuz007@ip68-111-154-116.sd.sd.cox.net)
- # [03:21] <Philip`> othermaciej: About "I've heard game vendors cited, not sure which ones": See http://wiki.xiph.org/index.php/Games_that_use_Vorbis
- # [03:22] <othermaciej> thanks, that doesn't list the vendors in an easy-to-find way
- # [03:22] <Philip`> http://www.unrealtechnology.com/features.php?ref=audio mentions Vorbis support quite prominently
- # [03:22] <othermaciej> main thing I wondered about was whether any Microsoft-published games use Ogg Vorbis
- # [03:23] <othermaciej> I think the audio issue is somewhat less important since (a) Vorbis has good quality and a somewhat more solid IP footing and (b) MP3 patents will expire in a few years, at which point MP3 is a suitable audio baseline
- # [03:23] <Philip`> I don't see any on the list that I recognise as Microsoft
- # [03:25] <Philip`> Oh
- # [03:25] <Philip`> Halo
- # [03:26] <Philip`> which was while they were owned by Microsoft
- # [03:27] <Philip`> s/they/Bungie/
- # [03:27] <Hixie> halo is certainly high profile
- # [03:28] <Dashiva> Fable too
- # [03:29] <_Ivo> and Gears of War
- # [03:30] <Philip`> _Ivo: Uh, I don't think that used Vorbis
- # [03:30] <_Ivo> as far as I know, it did
- # [03:30] <_Ivo> may be worth confirming
- # [03:31] <Philip`> Oh, looks like you're right
- # [03:32] <Philip`> e.g. http://utforums.epicgames.com/showthread.php?t=583980&page=10 says it has a vorbis.dll
- # [03:34] <Hixie> btw just so everyone is up to date, i'm thinking we should just drop the whole cross-references nonsense and replace it with some recommendations about using <a href="">
- # [03:34] <Philip`> Actually, maybe that's just left over from it being an Unreal Engine game - I have no idea if they really use the Ogg support
- # [03:34] <othermaciej> Hixie: the autolinking cross-references?
- # [03:34] <Hixie> yeah
- # [03:35] <Hixie> too much complexity for not much gain
- # [03:35] <othermaciej> Hixie: they did seem cute, but admittedly not that compelling over <a>
- # [03:36] <Hixie> good lord we got a lot of feedback on <cite>
- # [03:40] <Hixie> hm, i'm thinking, <cite> maybe should just be for a title of a work. any work, and even if it's not technically really cited.
- # [03:40] <Hixie> it seems that the typographic convention angle is more useful than the "this is a citation" angle
- # [03:40] <Hixie> and the line of what a citation is is a bit vague anyway
- # [03:43] <roc> Hixie: FWIW Apple and Nokia support software patents in general.
- # [03:44] <othermaciej> http://www.macobserver.com/article/2007/08/31.1.shtml
- # [03:45] <othermaciej> (that's not a counter-argument to supporting them in general, I don't know if Apple has an official position on that)
- # [03:51] <roc> http://www.macobserver.com/article/2007/08/02.12.shtml
- # [03:51] <roc> "However, Apple's chief patent counsel, Chip Lutton, contradicted Ms. Lee, and doesn't think the patent system is broken. In fact, "it's the best system in the world," he said. "
- # [03:56] <othermaciej> that certainly seems like support for the patent system in general
- # [03:57] <roc> it seems to indicate Apple is pretty happy with software patents in general
- # [03:57] <roc> because many other systems in the world don't have them
- # [03:58] <othermaciej> well, again, I'm not privy to Apple's official view on the matter, but I know that Apple has actively supported patent reform, and that this is likely to improve the situation at least somewhat
- # [03:59] <othermaciej> that probably makes a bigger difference than sound bites
- # [04:00] <othermaciej> I personally think software patents are broken and should either not exist or have much shorter terms than current terms, but that is certainly not an official position
- # [04:01] <roc> I agree
- # [04:02] <roc> the "reforms" pushed by Apple and others are probably good things in themselves, but they're essentially self-interested attempts to tamp down the troll problem
- # [04:04] * roc hopes that the US Supreme Court will just rule software patents invalid and all this will just blow away in the breeze
- # [04:05] <othermaciej> unfortunately that seems unlikely
- # [04:05] <roc> I would have thought so, but they've been most ornery about patents lately
- # [04:08] <othermaciej> honestly I'm not sure I get Apple's stance given which side of patent lawsuits we're usually on
- # [04:09] <roc> I know the feeling. I used to work for IBM
- # [04:10] * Joins: csarven- (n=nevrasc@modemcable130.251-202-24.mc.videotron.ca)
- # [04:14] <MikeSmith> Hixie - I won't shed any tears when the dfn cross-referencing thing gets dropped, but I think, gee wouldn't it be nice if we had a general xref mechanism?
- # [04:14] * Quits: Thezilch (n=fuz007@ip68-111-154-116.sd.sd.cox.net) (Read error: 104 (Connection reset by peer))
- # [04:15] <MikeSmith> such that empty <xref href="#foo"> gets replaced with content of element at foo
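MikeSmith's suggested mechanism, where an empty `<xref href="#foo">` is filled with the content of the element whose id is `foo`, can be sketched over an ElementTree. `<xref>` is his hypothetical element, not part of any draft, and `expand_xrefs` is likewise a hypothetical name. This sketch copies the target's text and child elements but not its attributes, and does not expand xrefs nested inside the copied content.

```python
import copy
import xml.etree.ElementTree as ET


def expand_xrefs(root: ET.Element) -> ET.Element:
    """Fill each empty <xref href="#id"> with a copy of the target's content."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    # Materialise the list first: we append to the tree while looping.
    for xref in list(root.iter("xref")):
        href = xref.get("href", "")
        empty = len(xref) == 0 and not (xref.text or "").strip()
        if not (href.startswith("#") and empty):
            continue
        target = by_id.get(href[1:])
        if target is None:
            continue  # dangling reference: leave the xref untouched
        xref.text = target.text
        for child in target:
            # deepcopy keeps each child's tail text, so mixed content survives
            xref.append(copy.deepcopy(child))
    return root
```

For `<dfn id="foo">the <em>media</em> attribute</dfn>`, an empty `<xref href="#foo"/>` ends up containing `the <em>media</em> attribute`.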
- # [04:16] * Quits: csarven (n=nevrasc@modemcable130.251-202-24.mc.videotron.ca) (Read error: 110 (Connection timed out))
- # [04:30] * Joins: phsiao (n=shawn@c-24-61-15-24.hsd1.ma.comcast.net)
- # [04:31] * Quits: phsiao (n=shawn@c-24-61-15-24.hsd1.ma.comcast.net) (Client Quit)
- # [04:42] * Joins: G0k (n=hmason@cpe-24-58-3-19.twcny.res.rr.com)
- # [04:42] <G0k> uh
- # [04:42] * Parts: G0k (n=hmason@cpe-24-58-3-19.twcny.res.rr.com)
- # [04:42] * Joins: G0k (n=hmason@cpe-24-58-3-19.twcny.res.rr.com)
- # [04:42] <G0k> who the hell is this rudd-o clown?
- # [04:46] <othermaciej> G0k: he's clearly passionate about his beliefs
- # [04:46] <G0k> at this point i'm convinced he's an agent for MPEG LA
- # [04:46] <G0k> because he's doing more to discredit the Ogg crowd than anyone else I've seen
- # [04:51] <aphid> it's bulldada, by contrast he makes the rest of us look civilized and reasonable.
- # [04:51] <aphid> :D
- # [04:52] <G0k> uhg
- # [04:58] * Quits: G0k (n=hmason@cpe-24-58-3-19.twcny.res.rr.com)
- # [05:17] * Joins: kfish (n=conrad@61.194.21.25)
- # [05:20] <MikeSmith> kfish - hei
- # [05:33] <kfish> yo MikeSmith
- # [05:33] <kfish> cold in tokyo?
- # [05:37] * Quits: dglazkov (n=dglazkov@adsl-074-229-248-021.sip.bhm.bellsouth.net)
- # [05:43] * Quits: csarven- (n=nevrasc@modemcable130.251-202-24.mc.videotron.ca) ("http://www.csarven.ca")
- # [05:50] * Joins: Compaq_Propietar (n=chatzill@201.153.4.9)
- # [05:51] * Quits: Compaq_Propietar (n=chatzill@201.153.4.9) (Client Quit)
- # [05:52] <MikeSmith> kfish - yeah, too cold for me already
- # [05:52] <MikeSmith> both in Tokyo and out at Keio/SFC
- # [05:53] <MikeSmith> warmed up last night by eating shabu-shabu and drinking fugu hire-zake
- # [05:54] <kfish> nice :-)
- # [06:02] * Joins: parcelbrat (n=parcelbr@c-67-185-108-198.hsd1.wa.comcast.net)
- # [06:08] * Quits: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [06:14] <_Ivo> This won't be nice of me asking, but is it possible to block Manuel Amador (Rudd-O) from the lists at least temporarily til he cools off?
- # [06:15] * Quits: roc (n=roc@202.0.36.64)
- # [06:15] <bradee-oh> I see he promises an upcoming barrage of emails because of his handiwork at Digg.
- # [06:15] <bradee-oh> oh joy.
- # [06:16] <bradee-oh> sure has made it difficult to have actual discussions about actual standards issues today *sigh*
- # [06:20] <parcelbrat> I'm one of the people who came to get clarification (earlier) due to his /. post. I'd actually like to stay involved for the real reason.
- # [06:20] <parcelbrat> That aside, I'm wondering if there has been a barrage here from that too
- # [06:22] * Joins: jruderman (n=jruderma@c-67-180-15-227.hsd1.ca.comcast.net)
- # [06:29] <bradee-oh> The quantity today has been *much* higher than usual, and much less productive. yay!
- # [06:30] * inimino wonders how many productive man-hours were lost
- # [06:32] <Teratogen> BRING BACK OGG!
- # [06:36] <jruderman> Teratogen: http://yro.slashdot.org/comments.pl?sid=385689&cid=21655557
- # [06:36] <jruderman> (sorry, i guess that was Oog)
- # [06:38] <parcelbrat> does Teratogen == rudd-o?
- # [06:42] <Teratogen> BRING BACK OGG NOW!
- # [06:44] <_Ivo> what the hell is a Oog?
- # [06:48] <jruderman> Oog, the open-source caveman, a legendary Slashdot troll
- # [06:52] <aphid> ogg is also the name of the stalinesque leader from some Nat'l Petroleum Institute produced cartoon that's on archive.org
- # [06:53] <aphid> http://www.archive.org/details/Destinat1956
- # [07:00] <parcelbrat> lol
- # [07:04] * Parts: hdh (n=hdh@58.187.109.98)
- # [07:04] * Quits: jruderman (n=jruderma@c-67-180-15-227.hsd1.ca.comcast.net)
- # [07:07] * Joins: jruderman (n=jruderma@c-67-180-15-227.hsd1.ca.comcast.net)
- # [07:08] * Quits: gavin (n=gavin@firefox/developer/gavin) ("leaving")
- # [07:08] * Joins: gavin__ (n=gavin@people.mozilla.com)
- # [07:09] * Parts: gavin__ (n=gavin@people.mozilla.com)
- # [07:09] <Hixie> if anyone is on site5, feel free to point out on this thread that we didn't trade ogg for something proprietary: http://forums.site5.com/showthread.php?t=19941
- # [07:16] <parcelbrat> i noticed someone on site5 on #ror, but he already left
- # [07:17] * Joins: gavin__ (n=gavin@people.mozilla.com)
- # [07:22] * Joins: maikmerten (n=merten@ls5laptop14.cs.uni-dortmund.de)
- # [07:46] * Quits: parcelbrat (n=parcelbr@c-67-185-108-198.hsd1.wa.comcast.net)
- # [07:46] * Quits: _Ivo (n=ivo@89.180.105.255) (Remote closed the connection)
- # [07:51] <Hixie> I've added documentation to the annotation system
- # [08:03] * Joins: jgraham_ (n=james@81-86-217-3.dsl.pipex.com)
- # [08:05] * Quits: jgraham_ (n=james@81-86-217-3.dsl.pipex.com) (Client Quit)
- # [08:33] * Joins: tndH_ (i=Rob@adsl-77-86-6-102.karoo.KCOM.COM)
- # [08:33] * tndH_ is now known as tndH
- # [08:34] * Joins: roc (n=roc@121-72-24-31.dsl.telstraclear.net)
- # [08:45] <hsivonen> lots and lots of ogg email :-(
- # [08:48] * Joins: roc_ (n=roc@121-72-24-31.dsl.telstraclear.net)
- # [08:51] <doublec> yes, Ogg is the favourite topic of the day
- # [08:52] <Hixie> more mail?
- # [08:53] <doublec> only a couple :)
- # [08:54] * Joins: anne-mac (n=annevk@88.80-202-68.nextgentel.com)
- # [08:55] <Hixie> we've hit 855 members
- # [08:55] <Hixie> that's 50 more than this morning
- # [08:55] * Quits: roc (n=roc@121-72-24-31.dsl.telstraclear.net) (Read error: 110 (Connection timed out))
- # [08:57] <othermaciej> Hixie: you need to make more controversial changes so we pass 1000
- # [08:59] <Hixie> seriously
- # [08:59] <doublec> and find a way to get money from each new member
- # [08:59] <doublec> since you aren't getting bribed to remove ogg :)
- # [09:00] <doublec> I think your reddit replies should get some sort of award
- # [09:01] <othermaciej> for "most reddit replies on a single thread"?
- # [09:01] <roc_> when Hixie does start taking bribes we're all going to look stupid
- # [09:24] * Quits: hober (n=ted@unaffiliated/hober) (Read error: 104 (Connection reset by peer))
- # [09:34] * Joins: madness (n=mng@client-82-2-93-126.manc.adsl.virgin.net)
- # [09:35] * Quits: Lachy (n=Lachlan@cm-84.215.41.149.getinternet.no) ("This computer has gone to sleep")
- # [09:36] * Joins: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no)
- # [09:38] * Quits: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no) (Client Quit)
- # [09:46] * Joins: virtuelv (n=virtuelv@pat-tdc.opera.com)
- # [09:53] * othermaciej is now known as om_sleep
- # [09:54] * Quits: maikmerten (n=merten@ls5laptop14.cs.uni-dortmund.de) ("Verlassend")
- # [09:56] <Hixie> hsivonen: i don't really see anything in http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-December/008849.html that i can respond to (other than the <q> issue), except maybe the suggestion in the parenthetical in point 1
- # [09:56] <Hixie> but i don't understand what that's suggesting
- # [09:56] <Hixie> can you advise?
- # [09:57] <anne-mac> that e-mail is not from hsivonen...
- # [09:57] <hsivonen> Hixie: looking at it now
- # [09:57] <Hixie> oh you're right, it's not
- # [09:57] <Hixie> oops
- # [09:58] <Hixie> well i don't know what to do with it then
- # [09:59] <anne-mac> i think it's more of a rant than a comment
- # [10:04] <hsivonen> Hixie: The main point I'd make is that non-heuristic machine consumption (i.e. naïvely trusting marked-up semantics and only using marked-up semantics) for dialogs, names of vessels, quotations, etc. does not have a plausible market-demand story. However, the elevator pitch for hCard and hCalendar is at least plausible: UI for adding event info or contact info to iCal, Address Book or similar app
- # [10:05] <Hixie> agreed
- # [10:05] <Hixie> doesn't really affect the spec though
- # [10:05] <Hixie> so...
- # [10:05] <hsivonen> the volume of the ogg thing has caused me to get out of sync with IRC and list email. :-(
- # [10:06] <Hixie> it wasn't that bad
- # [10:06] <anne-mac> oh, Hixie replies to an annevk@opera.com e-mail!
- # [10:06] <Hixie> :-)
- # [10:07] <anne-mac> reached the two year old e-mail mark? :)
- # [10:09] <Hixie> nah, just dealing with mail from buckets that have newer mail
- # [10:09] <Hixie> thought that particular mail was actually from the time my spam filter hated you
- # [10:09] <Hixie> though, even
- # [10:09] <anne-mac> could be
- # [10:10] <Hixie> no it definitely was
- # [10:10] <Hixie> i had to go fish it out of my gmail pile to reply to it
- # [10:10] <anne-mac> ok :)
- # [10:11] * Quits: weinig (n=weinig@17.203.15.140)
- # [10:11] <virtuelv> Hixie: you're implying you never delete even spam?
- # [10:11] <Hixie> no, i saved a bunch of mail from anne back when my filter hated him
- # [10:11] <Hixie> but e-mails saved from gmail's spam folder don't get forwarded to my main imap server
- # [10:12] <Hixie> they just sit in my gmail inbox
- # [10:12] * Hixie gets to an e-mail from Tina Holmboe
- # [10:12] * Hixie decides to deal with that one later
- # [10:13] <Hixie> btw, what's the status with the role="" stuff? and what's the status with the forms stuff?
- # [10:13] <anne-mac> forms: nobody replied to my request for input
- # [10:14] <anne-mac> role: zcorpan knows it better than me at this point, being on the group and all
- # [10:14] <anne-mac> I think they like to call it role=, want to use namespaces in some places? and the rest is aria-xxx
- # [10:15] <Hixie> so i have a request here for an <attn> element, whose primary use case would be to do what waiaria does with update regions
- # [10:16] <Hixie> will i eventually be mentioning waiaria somewhere in html5?
- # [10:16] <anne-mac> i think so, yes
- # [10:16] <zcorpan> i think the idea is to define both role= and aria-xxx=, and the old namespaced attributes, even with the knowledge that browsers don't want to support the namespaced attributes
- # [10:16] <zcorpan> i really don't know why
- # [10:17] * Joins: jgraham_ (n=james@81-86-217-3.dsl.pipex.com)
- # [10:17] * Joins: ROBOd (n=robod@89.122.216.38)
- # [10:17] <anne-mac> Hixie, I guess HTML5 would defer to the aria draft for role= and attributes prefixed with aria-, would make it clear there are no DOM interfaces, and that's it
- # [10:18] * anne-mac should probably go to work at some point
- # [10:18] <Hixie> no DOM interfaces? huh, that sucks. since it's primarily intended to be for scripts...
- # [10:18] <Hixie> zcorpan: can we fight for more sanity?
- # [10:19] <Hixie> zcorpan: i'm willing to help if needed. in particular, can we decouple from xhtml2's role stuff?
- # [10:19] <zcorpan> Hixie: not sure
- # [10:20] <zcorpan> but i think so
- # [10:22] <zcorpan> Hixie: the idea is that the attributes should be correct even in legacy browsers with no knowledge of aria, so that screen readers can pick it up from the dom
- # [10:22] <zcorpan> although i guess they could read js properties as well
- # [10:23] <Hixie> ah interesting
- # [10:26] <zcorpan> perhaps we can introduce a dom interface for aria 2, when we know what aria 2 will require
- # [10:34] * zcorpan considers marking all ogg emails as read
- # [10:34] * krijnh did
- # [10:35] <krijnh> Apart from the big Hixie replies
- # [10:39] <hsivonen> Philip`: re: 20 seconds: nice
- # [10:41] <zcorpan> hsivonen: btw, does your schema support role="fancy-checkbox checkbox"?
- # [10:42] * zcorpan can see a disadvantage of allowing arbitrary roles; conformance checkers won't catch typos
- # [10:43] <zcorpan> role="chekcbox"
- # [10:43] * jgraham_ seems to be getting new ogg emails at roughly his time average rate of reading WHATWG emails so the unread number is almost constant
- # [10:43] <hsivonen> Hixie: I think <cite> should be defined to capture the Chicago Manual of Style title-of-work concept while at the same time saying that an author is not an evil person for using <i> instead and saying that you don't need to change legacy content that uses <cite> for names of people
- # [10:44] * Joins: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [10:44] * Quits: anne-mac (n=annevk@88.80-202-68.nextgentel.com) (Read error: 110 (Connection timed out))
- # [10:46] <hsivonen> hubick: I'm near email bankruptcy but I have received your patch. Sorry about the delay.
- # [10:53] <hsivonen> zcorpan: no, the proof-of-concept schema approach does not address extensibility
- # [10:53] <zcorpan> hsivonen: ok
- # [10:53] <hsivonen> zcorpan: extensibility is not on the agenda for 1.0
- # [10:54] <hsivonen> zcorpan: moreover, extensibility and white-list-like validation are conflicting things
- # [10:54] * Joins: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com)
- # [10:54] <zcorpan> perhaps i should update the authoring conformance reqs accordingly
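The trade-off discussed above (a white-list check catches typos such as `role="chekcbox"`, but then also rejects author-defined tokens such as `fancy-checkbox`) can be sketched in a few lines of Python. This is a minimal illustration, not the logic of hsivonen's actual schema: `KNOWN_ROLES` is a hypothetical subset of role tokens picked for the example, and `unknown_role_tokens` is a hypothetical helper.

```python
# Hypothetical subset of known role tokens; a real checker would use the full ARIA role list.
KNOWN_ROLES = {"button", "checkbox", "link", "menu", "menuitem", "radio", "slider", "tab"}


def unknown_role_tokens(role_value):
    """Return the tokens of a space-separated role="" value that are not on the white list."""
    return [token for token in role_value.split() if token not in KNOWN_ROLES]


print(unknown_role_tokens("chekcbox"))                 # the typo is reported
print(unknown_role_tokens("fancy-checkbox checkbox"))  # so is the author's extension token
```

Both the typo and the extension token come back as unknown, which is the conflict between extensibility and white-list validation raised in the log.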
- # [11:00] * Parts: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com)
- # [11:00] * Joins: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com)
- # [11:02] * Joins: peepo (n=Jay@host86-129-168-72.range86-129.btcentralplus.com)
- # [11:04] <Philip`> Does the Ogg discussion indicate that HTML5 is actually relevant and people care about it, or would they complain as much about any other specification that did the same thing?
- # [11:04] <Hixie> hsivonen: well you never need to change legacy content
- # [11:04] <Hixie> hsivonen: it just might not be compliant to html5 :-)
- # [11:04] <Hixie> (i don't see any point in explicitly grandfathering in legacy content in that way)
- # [11:05] <hsivonen> Hixie: it appears that *some* authors don't think that way
- # [11:05] <hsivonen> Hixie: I'm inclined to think they are misguided, but still
- # [11:05] <Hixie> well if they want to be compliant, so much the better
- # [11:05] <Hixie> could you send a mail elaborating on your idea for <cite>?
- # [11:05] <Hixie> i really want to address this inline vs block issue
- # [11:06] <Hixie> i don't really know how to do so
- # [11:06] <Hixie> maybe i really should do this matrix idea i mentioned
- # [11:06] <hsivonen> http://diveintomark.org/archives/2003/01/13/semantic_obsolescence
- # [11:06] * om_sleep is now known as othermaciej
- # [11:07] <hsivonen> Hixie: OK. I'll send email on both issue
- # [11:07] <hsivonen> s
- # [11:07] * Quits: jgraham_ (n=james@81-86-217-3.dsl.pipex.com) ("This computer has gone to sleep")
- # [11:07] <Hixie> well the inline thing i'm actually thinking of resolving right now
- # [11:07] <Hixie> so feel free to discuss that here
- # [11:07] <othermaciej> Hixie: I think the matrix would be a useful exercise to clarify what's important to capture in the content model requirements
- # [11:07] <othermaciej> (there might be a bunch of "don't care" boxes)
- # [11:07] <othermaciej> Philip`: way to see a silver lining :-)
- # [11:08] <zcorpan> Hixie: did you see my thoughts about inline yesterday?
- # [11:09] <zcorpan> http://krijnhoetmer.nl/irc-logs/whatwg/20071211#l-361
- # [11:09] <Hixie> hsivonen: (and regarding mark's post, note that i work indirectly with him and he is up to date with the spec :-) )
- # [11:09] <Hixie> zcorpan: looking
- # [11:09] <hsivonen> Hixie: the following are random thoughts with no conclusion that I'd be comfortable with yet:
- # [11:10] <hsivonen> * so far it seems that authors don't like bimorphic
- # [11:10] <Hixie> zcorpan: i think there are pretty solid reasons for wanting to allow <p><ol/></p>, but i agree that it's conceptually a pain, especially given the html serialisation issue
- # [11:10] <zcorpan> indeed
- # [11:10] <hsivonen> * RELAX NG can validate bimorphic but the user experience sucks by default without schema-specific UI-level papering over
- # [11:10] <Hixie> hsivonen: (by which you mean they like mixing inline and block content?)
- # [11:11] <hsivonen> Hixie: yes
- # [11:11] <hsivonen> Hixie: cf. Sean Fraser, Sam Ruby and Dan Connolly
- # [11:11] <Hixie> yeah i agree that people want that
- # [11:11] <Hixie> i think we should allow that
- # [11:11] <Hixie> let's do this matrix thing
- # [11:11] * Hixie finds a tool that can do easy editing of grids
- # [11:11] <hsivonen> * having general content models like "block" or "inline" makes schemas easier to write
- # [11:12] <hsivonen> * also makes conformance easier to teach
- # [11:12] <zcorpan> Hixie: (what's the matrix thing?)
- # [11:12] <othermaciej> google spreadsheets?
- # [11:12] <othermaciej> makes a grid, easy to share on the web
- # [11:12] <hsivonen> * having content model differences in XHTML5 is inconvenient
- # [11:12] <hsivonen> - Makes schema ugly
- # [11:12] <Hixie> othermaciej: way ahead of you :-)
- # [11:12] <othermaciej> zcorpan: which block/inline elements should be allowed in what others, assuming the rules were being designed from scratch
- # [11:13] <hsivonen> - Requires me to maintain a separate "HTML5-compatible subset of XHTML5" schema/mode
- # [11:13] <hsivonen> - Does not make sense with the "people should just use text/html" party line
- # [11:14] <hsivonen> - Does not make sense with the "apps should use XHTML internally but serialize to HTML5 for IE" party line
- # [11:14] <Hixie> yeah i agree that we should strive for no differences
- # [11:14] <othermaciej> Hixie: would you consider also multi-level nesting? (I guess that's most likely to affect <p>)
- # [11:14] <Hixie> which basically means the html serialisation wins
- # [11:14] <Hixie> or rather, can veto
- # [11:14] <Hixie> othermaciej: can you give an example where three-or-more-way nesting would have a different answer than two-way?
- # [11:15] <hsivonen> * I'm not convinced that achieving semantic purity with intra-paragraph lists is worth all the trouble
- # [11:15] <Hixie> othermaciej: i was just gonna do it on the basis of indirect nesting
- # [11:15] <Hixie> do we have a list of elements anywhere yet? i could use docs' magic list making feature but that seems unlikely to work perfectly here :-)
- # [11:16] <othermaciej> Hixie: I don't think there is such a case, but expressing the rules in terms of indirect nesting may be hard to understand
- # [11:16] <othermaciej> there are lists of elements, yes
- # [11:16] <othermaciej> I think zcorpan has one
- # [11:16] <othermaciej> http://simon.html5.org/html5-elements
- # [11:16] <Hixie> cool thanks
- # [11:16] <Hixie> othermaciej: well we'll worry about exactly how to express the rules later
- # [11:16] <Hixie> i just want an idea of what the ideal would be first
- # [11:17] <hsivonen> * OpenOffice.org Writer/Web makes an interesting case study of an editor with very strict block/inline boundaries
- # [11:17] <othermaciej> Hixie: if paragraphs could contain tables, then it might make sense to let a table in a paragraph contain a paragraph in a cell
- # [11:18] <othermaciej> I guess that is one plausible exception I can think of
- # [11:18] <hsivonen> * I don't know how to reconcile hand-authoring flexibility and importability in an app like OO.o Writer/Web except defining that editing apps MAY introduce a lot of layout-wrecking <p>s
- # [11:18] <zcorpan> some elements only allow inlines. others allow both block and inline. inline elements never allow blocks. something in that direction seems to be what authors believe the rules are, i think
- # [11:19] <Hixie> othermaciej: i'm not really convinced it would, but interesting
- # [11:19] <hsivonen> * Having different content models for <i> and <em> sucks big time
- # [11:20] <hsivonen> * In general, this whole strict inline thing probably sucks
- # [11:20] <Hixie> if you have google accounts post your e-mail addresses here (or msg me) so i can add you to this thing
- # [11:20] <othermaciej> maciej@gmail.com
- # [11:20] <zcorpan> zcorpan@gmail.com
- # [11:20] <hsivonen> hsivonen@
- # [11:21] <krijnh> krijnhoetmer@xs4all.nl
- # [11:21] <krijnh> Is this about the mixing of inline/block content as well?
- # [11:22] <hsivonen> * I agree that the <div> content model in HTML 4 sucks semantically
- # [11:23] <Hixie> http://spreadsheets.google.com/ccc?key=pkNVM1HEQs-wsHB7s1M5Lbw
- # [11:23] <hsivonen> * But it seems the horse is out and the barn has burned
- # [11:23] <Hixie> feel free to fill in the cells you think look obvious
- # [11:23] <Hixie> if someone fills in something you disagree with, put a question mark after it
- # [11:24] <hsivonen> * We should probably allow Philip`'s Firefox <span> workaround
- # [11:25] <zcorpan> hmm, i'm a bit uncomfortable about that one. it breaks the rule "inlines never allow blocks"
- # [11:25] * Philip` isn't sure that's worthwhile since it doesn't work in IE and is therefore a bit useless for most authors
- # [11:25] <hsivonen> * I don't like the direction of these points, since the direction is that <p> should allow inline and almost every other block container should allow %Flow
- # [11:26] <zcorpan> Philip`: indeed
- # [11:27] <hsivonen> zcorpan: isn't block-in-span an IEism that others have to implement for compat with content out there?
- # [11:27] <Philip`> <section><div class=section> lets you style .section in all browsers, and HTML5 UAs will get the sectioning correct
- # [11:28] <zcorpan> hsivonen: also h1-h6 and address
- # [11:28] <Hixie> feel free to fill in this spreadsheet too, btw, i don't plan on doing all 10000 cells myself :-P
- # [11:28] <krijnh> Ow, you're not? ;)
- # [11:28] <zcorpan> hsivonen: yes, but authors think it's disallowed
- # [11:28] <Philip`> Is there a Google Spreadsheet API to fill these things in automatically? :-)
- # [11:29] <hsivonen> as a schema writer and a closet markup purist, I like bimorphic and stuff
- # [11:29] <hsivonen> but as a validator front end developer and realist I don't
- # [11:29] <hsivonen> I'm torn
- # [11:30] <Hixie> Philip`: it's not the api that's the hard part :-)
- # [11:30] <Hixie> i think it's clear that forcing a separation of inline and block isn't working in practice
- # [11:30] * jgraham wishes he could stay for this discussion
- # [11:30] <Hixie> i think we can just say that paragraphs are implied
- # [11:31] * Philip` wishes he could stay because it looks freezing outside
- # [11:31] <Hixie> done 50 of about 10000 so far
- # [11:31] <jgraham> Philip`: That too :)
- # [11:31] <zcorpan> (perhaps we should change address to allow %flow as well)
- # [11:31] <hsivonen> Hixie: yeah, but is there a credible story that'd allow the likes of OpenOffice.org Writer/Web to make all those implicit paragraphs explicit when it reserializes from its non-DOM internal datastructure?
- # [11:31] <Hixie> hsivonen: i don't think semantically we should disallow it
- # [11:32] <Hixie> hsivonen: of course as you say, it causes havoc with styling
- # [11:32] <Hixie> not sure what to do about that
- # [11:32] <jgraham> Hixie: Can you add my jgraham.html@gmail account
- # [11:32] <Hixie> done
- # [11:32] <jgraham> thx
- # [11:32] <Hixie> anyone can feel free to add other people btw
- # [11:33] <Hixie> in case y'all want to continue this when i go to bed :-)
- # [11:33] <colione_> olle.lundberg@gmail
- # [11:34] <Hixie> added
- # [11:34] <colione_> thnx
- # [11:36] <Hixie> woot, finished <abbr>
- # [11:37] <Hixie> hmm, should <caption> ... <address> </address> </caption> be legal
- # [11:39] <zcorpan> Hixie: no, caption should only allow inline (like h1-h6)
- # [11:39] <Hixie> makes sense
- # [11:40] <Hixie> hmm, should we allow a <dialog> to contain a <section> or <address> or <footer>...
- # [11:40] <Hixie> indirectly even
- # [11:41] <krijnh> What do you mean by indirectly contain?
- # [11:41] <Hixie> <dialog> <dt> krijnh <dd> <section> <p> What do you mean by indirectly contain? ...
- # [11:41] <Hixie> in fact, should we allow <ol>/<ul>/<dl> to contain any sectioning elements
- # [11:41] <Hixie> i'm thinking not really
- # [11:42] <Hixie> how about tables? should they be allowed to contain sectioning elements?
- # [11:42] <hsivonen> Indirectly does not sound good for my purposes...
- # [11:42] <Hixie> i'm sure we'll find a better way of phrasing this in due course
- # [11:42] <krijnh> Hixie: probably for table based layouts
- # [11:42] <hsivonen> Hixie: a <td> should have the some content model as <body> to allow real-world authoring patterns
- # [11:43] <Hixie> krijnh: well those aren't legal anyway
- # [11:43] <hsivonen> s/some/same/
- # [11:43] <Hixie> hsivonen: really? i'd have thought discouraging table-based layouts would be a plus here.
- # [11:43] <Hixie> they're already invalid
- # [11:43] * Joins: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no)
- # [11:43] <hsivonen> or more to the point, <td> should allow at least everything <body> allows
- # [11:43] <Hixie> we just can't catch them
- # [11:44] <Hixie> maybe <address> should only be allowed as a direct child of a sectioning element?
- # [11:44] <Hixie> hmm
- # [11:44] <hsivonen> Hixie: making table-based layout invalid is pointless until the CSS WG delivers a viable alternative *and* the top four browsers implement it
- # [11:44] <Hixie> hsivonen: table based layout has never been valid
- # [11:45] * Quits: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no) (Client Quit)
- # [11:45] * Joins: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no)
- # [11:45] <hsivonen> Hixie: then perhaps we should make them valid
- # [11:46] <hsivonen> Hixie: at least with appropriate role='' pixie dust
- # [11:46] <annevk> annevankesteren@gmail.com
- # [11:46] <Hixie> hsivonen: ew
- # [11:46] <Hixie> to the first part
- # [11:46] <Hixie> not so much the second
- # [11:46] <Hixie> annevk: done
- # [11:48] <hsivonen> Hixie: is the table intentionally missing <table> and table-internal stuff?
- # [11:48] <Hixie> yeah
- # [11:48] <Hixie> <td> and <th> cover those
- # [11:48] <Hixie> and <caption>
- # [11:48] <Hixie> they're the only "exit points" for tables
- # [11:48] <Hixie> i think i might drop <ol> <ul> <dl> <dialog> too for the same reason
- # [11:49] <annevk> Hixie, wasn't <p><strong>blah</strong></p> for <lede> or <lead>?
- # [11:49] <Hixie> annevk: it's not really marking importance, is it?
- # [11:49] <Hixie> actually <figure> should probably be taken out too
- # [11:50] <Hixie> it has a whole other set of issues
- # [11:50] <krijnh> Can an article contain an article?
- # [11:50] * krijnh is very efficient - 2 no's already
- # [11:51] <annevk> yeah, for comments on an article
- # [11:51] * hsivonen hopes the table will eventually generalize into a handful of content models to teach and to type into a schema
- # [11:51] <Hixie> krijnh: yes, blog comments are articles in articles
- # [11:51] <Hixie> hsivonen: i hope so too
- # [11:52] <annevk> Hixie, yeah, fair enough, just thought that was the idea earlier on
- # [11:53] <othermaciej> Hixie: does it make sense to have <td> on the horizontal axis?
- # [11:53] <othermaciej> lots of things can indirectly contain it but not directly
- # [11:53] <othermaciej> (similarly for other table structure)
- # [11:53] <Hixie> othermaciej: you mean the vertical axis? the question is "can elements in the left hand column contain elements on the top row"
- # [11:53] <Hixie> you mean on the top row?
- # [11:53] <othermaciej> yes
- # [11:54] <Hixie> i agree that we should remove all the inner table elements from the top row
- # [11:54] <Hixie> feel free to do so
- # [11:54] <othermaciej> ok
- # [11:55] * Joins: colione (n=colione@17.247.241.83.in-addr.dgcsystems.net)
- # [11:56] <othermaciej> I will remove <html> <head> and <body> from the top row for similar reasons
- # [11:56] <annevk> bah, the CSS WG discussions should really be public
- # [11:56] <Hixie> i'm nuking option and optgroup from the first column since they're special
- # [11:57] * Quits: colione_ (n=colione@17.247.241.83.in-addr.dgcsystems.net) (Read error: 104 (Connection reset by peer))
- # [11:57] <Hixie> select too, same reason
- # [11:57] <krijnh> Hixie: can't <details> contain additional contact information?
- # [11:57] <othermaciej> whoah, what's <nest>?
- # [11:57] <Hixie> krijnh: like <details> <legend> Contact information for this page </legend> <address> ... </address> </details> ?
- # [11:57] <krijnh> Yeah
- # [11:57] <Hixie> othermaciej: part of the data template feature. nuke it, it's seriously special.
- # [11:57] <Hixie> othermaciej: same with <datatemplate> and <rule>
- # [11:58] <Hixie> othermaciej: i'm probably gonna nuke that whole section anyway, in favour of some apis to make it easier to make your own templating language and graft it onto html
- # [11:58] <Hixie> othermaciej: there are too many different ways of doing templates, each with their own pros and cons, to really anoint any one model
- # [11:59] <Hixie> krijnh: sounds plausible...
- # [11:59] <othermaciej> removed
- # [11:59] <krijnh> So then an article can be put in a details as well
- # [11:59] <othermaciej> looking for other things too special to be worth gridding
- # [11:59] <krijnh> <details><legend>More articles on this subject</legend><article>foo</article><article>bar</article></details>
- # [12:00] <Hixie> seems reasonable
- # [12:00] <krijnh> Or would that be wrong use?
- # [12:00] <Hixie> it seems unexpected use, but i don't see that it would be wrong... dunno
- # [12:01] <othermaciej> base, link, meta, noscript, optgroup, option, param, source, style, script, title
- # [12:01] <othermaciej> from the first row
- # [12:01] <othermaciej> any objections?
- # [12:01] <krijnh> noscript ?
- # [12:02] <othermaciej> isn't noscript allowed anywhere and therefore not worth mentioning?
- # [12:02] <othermaciej> it does not seem affected by block/inline considerations
- # [12:02] <othermaciej> but I'll not remove it for now
- # [12:03] <Hixie> leave noscript for now, it's a weird case
- # [12:04] <Hixie> but the others can go for sure
- # [12:04] <Hixie> actually <style> might be worth leaving
- # [12:05] * Quits: doublec (n=doublec@209.79.152.179)
- # [12:05] <annevk> <area> can be nuked from the top row probably
- # [12:06] <Hixie> why?
- # [12:06] <Hixie> <area>'s a tough one
- # [12:06] <Hixie> when it's allowed is unclear to me
- # [12:06] <annevk> only as descendant of <map>
- # [12:06] <annevk> or maybe only as child of <map>
- # [12:07] <Hixie> we're allowing things like <map><area/><area/></map> as well as things like <map><p>...<a/><area/>...</map>
- # [12:07] <Hixie> so far
- # [12:07] <othermaciej> I nuked <style> already, I can put it back if you really want it
- # [12:08] <othermaciej> (where <style scoped> is allowed is kind of interesting, but not really related to the core block/inline type issue)
- # [12:08] <annevk> Hixie, I'm not sure what the use of the latter is
- # [12:09] <Hixie> othermaciej: nuking is fine
- # [12:09] <Hixie> annevk: makes it easier to make sure you've got all your links and areas done together
- # [12:09] <annevk> isn't <area> a link?!
- # [12:09] <annevk> hmm
- # [12:10] <annevk> then again, I thought <map> was display:none, it isn't
- # [12:10] <Hixie> annevk: <area> is a link with a shape
- # [12:10] <krijnh> Hmm, could video contain section elements?
- # [12:10] <annevk> Hixie, per HTML4 so is <a>
- # [12:11] <Hixie> yeah but we dropped that long ago
- # [12:11] <annevk> and I'm not sure how that's relevant
- # [12:11] <Hixie> krijnh: i'd say yes, if the fallback is very detailed :-)
- # [12:11] <Hixie> annevk: ?
- # [12:11] <krijnh> Hixie: yeah, or has contact information?
- # [12:11] <Hixie> krijnh: right
- # [12:12] <Hixie> i could see one doing <video> <address> For a transcript, contact ...</address> </video>
- # [12:12] <Hixie> i guess
- # [12:12] <krijnh> Cause you put a ? at video->address :)
- # [12:15] <Hixie> fixed :-)
- # [12:15] <annevk> Hixie, I'm not sure how <area> being a shaped link is relevant
- # [12:15] <Hixie> annevk: when you're doing an image map, you want to provide both the shape link <area> and the fallback link <a>
- # [12:15] <Hixie> for each link
- # [12:15] <Hixie> easiest to do if you have them all together
- # [12:16] <Hixie> rather than as two blocks
- # [12:16] <annevk> isn't the algorithm for fallback to use <area>?
- # [12:16] <annevk> or is this in the case <map> is not supported?
- # [12:17] <Hixie> if <a>s aren't provided the UA uses <area>, but that kinda sucks compared to providing a custom fallback
- # [12:17] <annevk> that's not at all how the image map algorithm for fallback works
- # [12:17] * Joins: hdh (n=hdh@58.187.109.128)
- # [12:18] <Teratogen> bring ogg back!
- # [12:26] <hsivonen> coming up with suggested definition of <cite> is hard
- # [12:26] <hsivonen> I'm bad at writing weasel words
- # [12:26] <Dashiva> hehe
- # [12:29] <Hixie> annevk: really?
- # [12:30] <Hixie> that was my intention...
- # [12:30] <Hixie> Teratogen: :-)
- # [12:32] <Hixie> hm, should <map> allow <section> in it then? or <article>?
- # [12:32] <Hixie> or <aside> or <address>? hmm
- # [12:33] * Quits: roc_ (n=roc@121-72-24-31.dsl.telstraclear.net)
- # [12:34] <annevk> just <area> imo
- # [12:35] <hsivonen> is the diveintomark.org favicon one of the allegedly indecent IceWeasel icon suggestions?
- # [12:37] <Hixie> krijnh: ogg :-P
- # [12:37] <krijnh> :p
- # [12:38] <krijnh> Bring it back! ;)
- # [12:40] <hsivonen> Hixie: so I didn't send email about block/inline, because I dumped my points here instead
- # [12:41] <Hixie> k
- # [12:41] <Hixie> i think i mostly agreed with your points anyway
- # [12:41] <Hixie> it's turning out that there are some edge cases that i hadn't really thought of
- # [12:41] <Hixie> like should <nav> be able to contain <article>
- # [12:42] <annevk> no
- # [12:43] * hdh imagines people use narration to guide users around the site
- # [12:44] <krijnh> <body><nav><section><article>An interesting article with lots and lots of interesting links</article></section></nav></body>
- # [12:44] * Quits: virtuelv (n=virtuelv@pat-tdc.opera.com) ("Leaving")
- # [12:45] <krijnh> Hixie: is that conditional formatting?
- # [12:45] <krijnh> Yes, cool
- # [12:46] <Hixie> yeah
- # [12:47] <Hixie> ok i'm gonna go sleep, i can't concentrate anymore
- # [12:47] <Hixie> feel free to continue editing :-)
- # [12:47] <Hixie> thanks for the help btw
- # [12:47] <Hixie> really helpful
- # [12:47] <Hixie> i'll try to continue this tomorrow
- # [12:48] <hsivonen> it is "interesting" how <article> and friends that would be "easy" to implement are less supported by browsers than "hard" stuff like <canvas> and <video>
- # [12:49] <annevk> it's not entirely clear what it would mean for rendering to support <section> and other sectioning elements
- # [12:50] <krijnh> It's also a dull feature to sell :)
- # [12:50] <Camaban> hsivonen: video and canvas are 'cool' and 'new', while article doesn't 'do' anything? :)
- # [12:51] <annevk> <article> together with <h1>-<h6> can in theory affect rendering, but it's unclear how
- # [12:51] <hsivonen> krijnh: yeah. it makes me wonder if using <article> instead of <div class='article'> will ever be compelling for authors
- # [12:52] <annevk> I think if CSS gave you something like :heading(2) to style all level two headers that might work
- # [12:54] <krijnh> hsivonen: do you think authors would use the new elements already, if IE/Fx didn't close unknown block level elements immediately?
- # [12:54] <krijnh> That's probably easy to fix behavior, but I don't think it would change anything
- # [12:55] <othermaciej> :heading(n) might be handy for implementing the default rendering of <h[1-6]> as well
- # [12:56] <othermaciej> hsivonen: we could easily support default rendering as a block for all the new semantic block elements
- # [12:56] <othermaciej> hsivonen: supporting headings styled in accordance with the outline algorithm would be hard and the spec doesn't say how to do that yet
- # [12:57] <othermaciej> (or whether)
- # [12:57] <othermaciej> I will tell you that we're interested in supporting any new elements and attributes that seem like low hanging fruit in WebKit in the fairly near future
- # [13:00] <othermaciej> (that would basically be irrelevant="", sectioning elements, dialog, m if someone decides it should have some special default style
- # [13:00] <othermaciej> )
- # [13:00] <othermaciej> figure would also be low-hanging fruit if not for the <legend> issue
- # [13:02] <hsivonen> krijnh: with the IE/Firefox situation, using the new elements is not worthwhile ATM from the author POV
- # [13:03] <hsivonen> othermaciej: Opera Mobile has a nice "scroll to content" feature; it would be cool to have that in WebKit, too, with both taking <article> into account
- # [13:04] <hsivonen> actually, that's the only UA-side semantic treatment of <article> that I can come up with at the moment
- # [13:04] <hsivonen> skipping to content whether on mobile or in an aural browsing setup
- # [13:06] <othermaciej> yeah, supporting the new block-level elements would not have much value besides patriotism just yet
- # [13:20] <annevk> maybe [#heading=n]
- # [13:20] <annevk> at some point there was this idea of separating intrinsic attributes from pseudo-classes, but maybe that point is moot
- # [13:24] <othermaciej> "intrinsic attributes"?
- # [13:25] <annevk> td[#col=2] was another one
- # [13:26] <hsivonen> anyway, it seems that the whole <section> thing hinges upon a selector with reasonable perf and implementation characteristics
- # [13:27] <krijnh> Something for CSS5?
- # [13:29] * Joins: hasather (n=hasather@90-231-107-133-no62.tbcn.telia.com)
- # [13:29] <othermaciej> I'm not sure why that's better than pseudo-classes
- # [13:30] <annevk> me neither, I guess the idea might have been dropped already
- # [13:31] <krijnh> Why was it an idea in the first place then?
- # [13:32] <hsivonen> Is it reasonable to expect the CSS WG to have cycles to look into an outline-dependent selector any time soon?
- # [13:33] <hsivonen> they seem to have a lot on their plate even without new HTML5 needs
- # [13:35] <othermaciej> this is the WG that's bringing us ascii art layout
- # [13:35] <othermaciej> there doesn't have to be a reason
- # [13:40] <annevk> it should be pretty easy to draft a specific selector proposal for heading:
- # [13:40] <hsivonen> hmm. the TAG is going the way of SGML: the more common concepts have the longer names: resource vs. resource representation
- # [13:40] <annevk> especially as CSS only needs to define the syntax and say that it's up to languages to define when it actually matches
- # [13:41] <hsivonen> annevk: you still need a kind of rare person who groks CSS formatter internals well enough to assess the computational feasibility with dynamic DOM changes
- # [13:42] <krijnh> no
- # [13:42] <krijnh> no
- # [13:42] <krijnh> Oops :)
- # [13:43] <annevk> I can ask the guy who implemented selectors in Opera
- # [13:43] <othermaciej> yeah, I'm not sure the current html5 outline algorithm is computationally feasible for incremental rendering and dynamic DOM updates
- # [13:43] <annevk> I suppose other implementors have ways to find out themselves
- # [13:45] <othermaciej> it's written in terms of generating a hypothetical tree and walking it
- # [13:45] <othermaciej> but for selector matching you need something that's evaluated from the element point of view
- # [13:47] <othermaciej> so it's hard to tell if it can be efficient without doing a conversion to that type of algorithm first (and making sure it is actually equivalent)
- # [13:48] <othermaciej> it's not clear to me if changes outside an <hn> element can change its heading level
- # [13:49] <othermaciej> actually it's not that clear to me how the section tree affects heading levels
- # [13:56] <annevk> <section><section><h1> would be level 3 I think
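annevk's guess can be modeled, under a deliberately simplified assumption, as "a heading's level is one plus the number of its `<section>` ancestors". The model also illustrates othermaciej's point: a rule like this is evaluated from the element's point of view by walking its ancestors, rather than by generating a separate outline tree. It is a sketch, not the HTML5 outline algorithm. It ignores h2 to h6 ranks, implied sections, `<article>`/`<aside>`/`<nav>`, and sectioning roots, and it assumes well-formed markup so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Simplified model (an assumption, not the HTML5 outline algorithm):
# a heading's level is 1 + the number of <section> ancestors it has.
SECTIONING = {"section"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def heading_levels(markup: str) -> list[tuple[str, int]]:
    """Return (text, level) for every heading in well-formed markup."""
    root = ET.fromstring(markup)
    results: list[tuple[str, int]] = []

    def walk(elem: ET.Element, depth: int) -> None:
        # `depth` = number of <section> ancestors that elem's children have
        for child in elem:
            if child.tag in HEADINGS:
                results.append(((child.text or "").strip(), depth + 1))
            child_depth = depth + 1 if child.tag in SECTIONING else depth
            walk(child, child_depth)

    if root.tag in HEADINGS:
        results.append(((root.text or "").strip(), 1))
    # The root itself may be a <section>, which counts for its children.
    walk(root, 1 if root.tag in SECTIONING else 0)
    return results


print(heading_levels("<section><section><h1>x</h1></section></section>"))
# [('x', 3)]
```

Under this model a change outside an `<hN>` element (wrapping it in another `<section>`) does change its level, which is the kind of dynamic dependency that makes selector matching on heading level hard to do incrementally.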
- # [14:03] * Quits: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no) (Read error: 110 (Connection timed out))
- # [14:11] <hasather> The Ogg debate is like a hydra. You read one thread, and two others pop up during the time
- # [14:19] <Dashiva> And over half the mails are by the same two people
- # [14:21] <Dashiva> I'm strongly tempted to send a mail saying "So-called non-commercial entities have the option to pay for licences, they just choose not to."
- # [14:24] <Philip`> Open source entities don't have that option, since they couldn't distribute their code in a way that other people could modify and use
- # [14:26] <Dashiva> But that's just an arbitrary restriction they place on themselves. We're here to get interoperability, not to run errands for other organizations
- # [14:26] <Philip`> (unless they get a licence which allows everybody royalty-free usage of the patents for any purpose)
- # [14:26] <Philip`> (which is what Theora got, so it's not totally impossible)
- # [14:36] <hsivonen> I wonder what MPEG-LA estimates as the expected value of the overall H.264 licensing income over the lifetime of the patents discounted to present value
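The quantity hsivonen is wondering about is a discounted sum: each future year's expected royalty income C_t is divided by (1 + r)^t for a discount rate r, and the results are summed over the remaining patent lifetime. The sketch below shows only the arithmetic. The cash flows and the rate are placeholders, not MPEG LA figures.

```python
def present_value(cash_flows: list[float], rate: float) -> float:
    """Discount cash_flows[t-1], received at the end of year t, back to today."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))


# Placeholder numbers only: three years of 100 units at a 10% discount rate.
print(round(present_value([100, 100, 100], 0.10), 2))  # 248.69
```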
- # [14:41] * Philip` wonders if it's worth trying some basic comparisons of non-state-of-the-art video codecs
- # [14:44] <othermaciej> if you know how to do such a thing then sure
- # [14:44] <Philip`> I don't know how to do it especially well, hence the "basic" :-)
- # [14:45] <othermaciej> might be interesting to try this shootout with H.261, MPEG-2, H.263, etc: http://osnews.com/story.php/19019/Theora-vs-h.264/
- # [14:48] * Quits: peepo (n=Jay@host86-129-168-72.range86-129.btcentralplus.com) ("later")
- # [14:49] <hsivonen> something that hasn't been explored: it is important to have RF decoding and *an* RF encoder, but it doesn't follow that there could not exist non-RF state-of-the-art encoders or hardware decoders
- # [14:49] <hsivonen> back when *compressed* GIF encoding was encumbered, there were RF decoders
- # [14:50] <Philip`> http://www.doom9.org/index.html?/codecs-quali-105-1.htm has Theora, but it's a few years old now and I don't know how much has changed
- # [14:50] <hsivonen> and RF encoders that sucked badly (i.e. produced a stream that decoded as lzw but was not compressed)
- # [14:51] <Philip`> (Also, DVDs are very different quality to what you'd publish on the web)
- # [14:52] <othermaciej> MPEG-LA could probably limit decoder revenue to hardware implementations, or mobile devices, or both, and not lose significant revenue
- # [14:53] <othermaciej> s/revenue/royalties/
- # [14:53] <hsivonen> mobile devices is still a field-of-use restriction that would not go well with Open Source
- # [14:54] <othermaciej> realistically nearly all handsets on which you could run interesting software have paid the license fee
- # [15:01] * Joins: maikmerten (n=merten@ls5laptop14.cs.uni-dortmund.de)
- # [15:02] <alp> othermaciej: mobile distributors of webkit/gtk+ and webkit/qt at least would probably be happy with a royalty-free codec from what i gather but asking for licensing fees is pushing it. maybe the situation is different for other free browsers though
- # [15:02] <othermaciej> I don't think you need to pay royalties for the browser component if the phone already has a hardware or software decoder
- # [15:03] <othermaciej> which many do
- # [15:03] <othermaciej> could be wrong though, it's hard to understand the MPEG-LA's documents
- # [15:04] * Quits: Oeighty (n=polx@ip-118-90-51-37.xdsl.xnet.co.nz) (Read error: 104 (Connection reset by peer))
- # [15:08] <alp> for various reasons it may be necessary to ship the codec with the browser (say if the hardware only allows video overlay and you need to support more complex rendering features, or if the hardware just doesn't support the codec at all in the first place)
- # [15:08] * Joins: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no)
- # [15:09] <Philip`> It's fun how FFmpeg is happy to be told to output an Ogg file, but actually it silently doesn't support it and outputs something totally different
- # [15:10] <othermaciej> fair enough, I don't really know how this works
- # [15:10] * Quits: Lachy (n=Lachlan@ti200710a340-2895.bb.online.no) (Client Quit)
- # [15:12] * Joins: Lachy_ (n=Lachlan@ti200710a340-2895.bb.online.no)
- # [15:14] <Philip`> Hmm, H.261 doesn't support 320x240 :-(
- # [15:14] <othermaciej> is that too small or too big for it?
- # [15:14] <othermaciej> (I guess that is a problem in itself)
- # [15:16] <Philip`> "Valid sizes are 176x144, 352x288"
- # [15:16] <Philip`> Great flexibility!
- # [15:17] <Philip`> Also, it looks horrible quality
- # [15:18] <Lachy_> wow, that's terrible
- # [15:18] <Lachy_> what about motion jpeg or any of the other older alternatives?
- # [15:19] <Lachy_> motion jpeg just sounds like it would have huge file sizes
- # [15:21] <annevk> motion jpeg is not a serious option
- # [15:21] <Philip`> MJPEG doesn't look that awful compared to other codecs
- # [15:21] <maikmerten> H.261 didn't even expire
- # [15:22] <maikmerten> it's a 1990 standard
- # [15:23] <Philip`> (With my current utterly rubbish test setup, only looking at 5 seconds of video, it's quite better quality than H.261 but twice the filesize and I guess I need to fiddle with the encoder settings to get a fair comparison)
- # [15:23] <maikmerten> MJPEG doesn't stand a chance against even H.261
- # [15:23] <maikmerten> no motion compensation, no inter-frame coding...
- # [15:23] <Philip`> It can do more than two different frame sizes, though :-)
- # [15:23] <maikmerten> I'll give it that ;)
- # [15:24] <maikmerten> plus there is no MJPEG standard IIRC
- # [15:24] <maikmerten> there are a lot of codecs claiming "MJPEG"
- # [15:24] <Philip`> and it can do variable bitrates, which I assume H.261 can't since it was designed for streaming over ISDN
- # [15:25] <maikmerten> H.261 should be able to do VBR
- # [15:25] <maikmerten> thanks to the very nature of video compression codecs are VBR
- # [15:26] <maikmerten> and it's actually a lot of work to get them to do CBR
- # [15:26] <maikmerten> (bitrate reservoirs etc. etc.)
- # [15:26] * MikeSmith finally gets to reading Hixie responses to messages he sent about <term> and <xref> stuff
- # [15:27] * MikeSmith goes to re-read spec for <i>
- # [15:27] <maikmerten> (well, okay, granted, it's very possible to develop codecs with a fixed bitrate and CBR)
- # [15:27] <Philip`> maikmerten: If you want to do e.g. videoconferencing over an ISDN channel, that'd have to be CBR since you can't do buffering, and I thought that was roughly what H.261 was designed for
- # [15:28] <maikmerten> (but it's difficult to do e.g. with DCT based codecs)
- # [15:28] <maikmerten> why wouldn't you be allowed to send less data than the line allows you to send?
- # [15:28] <Philip`> (but you know more about this than I do so I'm probably wrong :-) )
- # [15:28] <maikmerten> in worst case just pad with zeros ;)
- # [15:28] <Philip`> You could send less but that'd be a waste of resources
- # [15:29] <Philip`> since you're not going to be able to reuse the spare bandwidth for anything else
- # [15:29] <maikmerten> (which, actually, is one way to get CBR if you manage to ceil the bitrate: Just fill up "unused" bits with crap)
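maikmerten's "fill up unused bits" route to CBR can be sketched as follows. Each frame gets a fixed byte budget derived from the channel rate. Smaller frames are padded, and oversized frames are rejected, where a real encoder would re-quantize or borrow from a bitrate reservoir instead. The 64 kbit/s channel comes from the ISDN discussion above. The 10 frames/s rate, the frame contents, and the `pad_to_cbr` helper are hypothetical. Raw zero bytes stand in for padding here; an actual bitstream would use whatever stuffing mechanism the codec defines so that the decoder can skip it.

```python
def pad_to_cbr(frames: list[bytes], channel_bps: int, fps: int) -> list[bytes]:
    """Pad each encoded frame with zero bytes up to a fixed per-frame budget.

    Raises ValueError for a frame that exceeds the budget; a real encoder
    would have to re-encode it more coarsely or borrow from a reservoir.
    """
    budget_bytes = channel_bps // fps // 8
    out = []
    for i, frame in enumerate(frames):
        if len(frame) > budget_bytes:
            raise ValueError(f"frame {i} is {len(frame)} bytes, budget is {budget_bytes}")
        out.append(frame + b"\x00" * (budget_bytes - len(frame)))
    return out


# 64 kbit/s channel at a hypothetical 10 frames/s -> 800 bytes per frame.
padded = pad_to_cbr([b"\x01" * 120, b"\x02" * 800], 64_000, 10)
print([len(f) for f in padded])  # [800, 800]
```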
- # [15:29] <Philip`> so you might as well send higher quality images instead of lowering the bitrate
- # [15:29] <maikmerten> well, albeit it's good to use up all the bandwidth you may not be able to actually deliver
- # [15:30] <maikmerten> think compressing a perfect black frame
- # [15:30] <maikmerten> DCT will give you a nice zero run
- # [15:30] <maikmerten> near-perfect compression
- # [15:30] <maikmerten> (only protocol overhead)
- # [15:30] <Philip`> You could send 64Kb/s of a really really precise shade of black
- # [15:30] <maikmerten> so it's actually *hard* to guarantee you're using up all bandwidth ;)
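The black-frame point follows from the DCT itself. For any flat 8x8 block, every AC coefficient is zero and only the DC term survives, so the 63 AC coefficients collapse into a single zero run. This holds whatever constant value the block has after any level shift a real codec applies. The minimal sketch below uses a direct orthonormal 2D DCT-II and leaves out quantization, zig-zag ordering, and entropy coding.

```python
import math

N = 8


def dct2(block: list[list[float]]) -> list[list[float]]:
    """Direct orthonormal 2D DCT-II of an N x N block (O(N^4), for clarity)."""
    def alpha(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def count_zero_ac(coeffs: list[list[float]], eps: float = 1e-9) -> int:
    """Count AC coefficients (everything except [0][0]) that are ~zero."""
    return sum(1 for u in range(N) for v in range(N)
               if (u, v) != (0, 0) and abs(coeffs[u][v]) < eps)


flat = [[16.0] * N for _ in range(N)]   # any constant value behaves the same
coeffs = dct2(flat)
print(round(coeffs[0][0], 6), count_zero_ac(coeffs))  # 128.0 63
```

With orthonormal scaling the DC term of a flat block is N times the pixel value (8 × 16 = 128 here). For an all-zero block even that term vanishes, which is why "only protocol overhead" remains.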
- # [15:30] <Philip`> Fair enough :-)
- # [15:30] * Philip` has to go for some minutes
- # [15:31] <maikmerten> well, you mostly can't be any more precise than (0,0,0) with black ;)
- # [15:31] <Philip`> You can do (0.00000000, 0.00000000, 0.00000000) :-)
- # [15:31] <maikmerten> basically all video codecs are integer based
- # [15:31] * Philip` is gone
- # [15:32] <maikmerten> plus 0.0000000 would just be another case of "fill up with crap" ;)
- # [15:42] * Joins: dglazkov (n=dglazkov@adsl-065-081-081-030.sip.bhm.bellsouth.net)
- # [15:46] <Philip`> maikmerten: Those extra significant figures are important in physics - something that I measure as 0.0kg is probably much heavier than something I measure as 0.000000kg :-)
- # [15:47] <maikmerten> well, only if you try to give an idea with what sort of precision you were working along with the value
- # [15:47] <maikmerten> (which is often done)
- # [15:48] <maikmerten> (but for christ's (or any other religious figure's) sake - zero shall be zero ;-) )
- # [15:48] * Joins: doublec (n=doublec@209.79.152.130)
- # [15:48] <maikmerten> actually my physics teacher always got semi-angry if we didn't compact numerical values ;)
- # [15:49] <maikmerten> (0.0340000 would be 0.034)
- # [15:50] <Philip`> Eww - I've always been taught that significant figures are significant, and you can't just add them or drop them off whenever you fancy
- # [15:51] <maikmerten> well, you *have* to specify what precision is used
- # [15:51] <Philip`> and the worst thing you can possibly do is write 3½ instead of 3.5, because "½" implies some kind of mathematical precision that physics never has
- # [15:51] * Philip` prefers Computer Science where everything is integers ;-)
- # [15:51] <maikmerten> aye
- # [15:58] * Joins: csarven (n=nevrasc@81-5-133-33.static.nfwebsolutions.com)
- # [16:00] * Philip` wonders what typical internet video bitrate is
- # [16:01] <maikmerten> youtube used to serve 256 kbit/s h.263 with 64 kbit/s MP3
- # [16:01] <maikmerten> (the latter 22.05 kHz, mono - MP3 just is pretty far behind)
- # [16:02] <Philip`> Based on an extensive sample of two Youtube FLVs in my /tmp, they're 320kbit/s, so that sounds right
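The two figures are consistent: 256 kbit/s of video plus 64 kbit/s of audio is 320 kbit/s, which matches Philip`'s measurement. A file's average bitrate is its size in bits divided by its duration, and container overhead pushes a real measurement slightly above the sum of the streams. The helper names and the one-minute example below are illustrative.

```python
def average_kbps(size_bytes: int, duration_s: float) -> float:
    """Average bitrate in kbit/s (1 kbit = 1000 bits), container overhead included."""
    return size_bytes * 8 / duration_s / 1000


def size_for(kbps: float, duration_s: float) -> int:
    """Bytes needed to carry `kbps` for `duration_s` seconds."""
    return int(kbps * 1000 * duration_s / 8)


total = 256 + 64                      # video + audio, kbit/s
print(total)                          # 320
print(size_for(total, 60))            # 2400000 (bytes per minute)
print(average_kbps(2_400_000, 60))    # 320.0
```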
- # [16:03] <maikmerten> yup, two is basically a perfect statistical base ;)
- # [16:03] <maikmerten> but they're batch encoded with same settings anyway
- # [16:05] <hsivonen> what press release is David Gerard referring to on whatwg@?
- # [16:06] <annevk> the one where Chris Double is quoted
- # [16:06] <annevk> about Opera and Mozilla pushing <video>
- # [16:06] * hsivonen notes that dgerard talks about wikipedia and "we" in a way that assumes that everyone knows his wikipedia affiliation
- # [16:06] <hsivonen> annevk: ah the PC World article?
- # [16:07] <annevk> PC World just copied it
- # [16:07] <annevk> just like Washington Post and several others
- # [16:07] <hsivonen> hmm. something has gotten past my HTML5 radar
- # [16:07] <Philip`> Urgh, H.263 says "Valid sizes are 128x96, 176x144, 352x288, 704x576, and 1408x1152. Try H.263+."
- # [16:07] * Philip` tries H.263+, which works
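The size restrictions Philip` ran into are the Common Intermediate Format family. FFmpeg's H.261 encoder accepts only QCIF and CIF, and its H.263 encoder accepts only the five sizes quoted in its error message. H.263+ accepted 320x240 in Philip`'s test. The lookup below uses exactly those quoted sizes. The tables and `supported_codecs` are hypothetical helpers, not part of FFmpeg.

```python
# Frame sizes quoted from FFmpeg's error messages in the log above,
# labelled with their Common Intermediate Format names.
H261_SIZES = {(176, 144): "QCIF", (352, 288): "CIF"}
H263_SIZES = {
    (128, 96): "SQCIF",
    (176, 144): "QCIF",
    (352, 288): "CIF",
    (704, 576): "4CIF",
    (1408, 1152): "16CIF",
}


def supported_codecs(width: int, height: int) -> list[str]:
    """List which of the two fixed-size codecs accept a given frame size."""
    codecs = []
    if (width, height) in H261_SIZES:
        codecs.append("H.261")
    if (width, height) in H263_SIZES:
        codecs.append("H.263")
    return codecs


print(supported_codecs(320, 240))  # [] -> needs H.263+ or another codec
print(supported_codecs(352, 288))  # ['H.261', 'H.263']
```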
- # [16:09] <hsivonen> annevk: whose press release it was?
- # [16:09] <annevk> not sure what the original was
- # [16:09] <maikmerten> H.263+ is 1998
- # [16:09] <maikmerten> it's not close to expiring
- # [16:10] <maikmerten> H.263 itself is 1995/1996
- # [16:10] * Quits: doublec (n=doublec@209.79.152.130)
- # [16:11] <maikmerten> it's more or less a direct predecessor to MPEG4 Part 2, IIRC
- # [16:11] <Philip`> What happened to H.262? :-)
- # [16:12] <hsivonen> annevk: I don't see a <video> press release from any of WHATWG, W3C, Mozilla or Opera
- # [16:12] <maikmerten> when MPEG ran out of puppies they began consuming future standards
- # [16:12] <maikmerten> look MPEG-3 ;)
- # [16:12] <annevk> hsivonen, there was no press release
- # [16:12] <hsivonen> annevk: ok
- # [16:12] <annevk> someone made an article that was reused all over the place (even localized)
- # [16:12] <hsivonen> ok
- # [16:13] * Joins: grimboy_uk (n=grimboy@85.211.236.12)
- # [16:13] <maikmerten> press releases are boring anyway.... "CEO of ..... says.... '...glad to be here and drive innovation..... customer satisfaction.... revenue... world domination'" - not sure I ever read a really interesting press release
- # [16:14] <hdh> the opera's dork release?
- # [16:14] <hdh> bork, maybe, the spelling escaped me
- # [16:17] * Quits: grimeboy (n=grimboy@85-211-246-139.dsl.pipex.com) (Read error: 104 (Connection reset by peer))
- # [16:19] <hsivonen> the silly thing about press releases is that all the substance has to be cast into quotations so that journalists can print them as quotations and avoid stating anything controversial in text that isn't quoted
- # [16:19] <hsivonen> so to write a press release, one has to first come up with the points, then massage them into soundbites and then figure out who agrees to be attributed with which soundbite
- # [16:20] <Camaban> so journos can misinterpret them, mis-quote them, and be selective about what they quote to create a story ;)
- # [16:20] <Philip`> (Hmph, I tell these things to do 256Kb/s but they end up doing 430Kb/s instead :-( )
- # [16:21] <Philip`> (Maybe they're just not designed to scale so low?)
- # [16:24] <maikmerten> many codecs have limits on how low bitrate can be
- # [16:24] <hsivonen> comparing codecs properly is very hard, because the fixed parameter is what code you ship to the client and the tricks the encoder does plays such a big role
- # [16:24] <maikmerten> tried with lower resolution?
- # [16:24] <maikmerten> aye
- # [16:24] <hsivonen> so it is quite possible that you end up testing encoders instead of decoding specs
- # [16:25] <hsivonen> Philip`: are you testing ffmpeg encoders against each other?
- # [16:25] <Philip`> Testing just decoding specs isn't very useful in practice
- # [16:25] <maikmerten> well, at least for extremely old codecs "what we have now is as good as it'll ever get"
- # [16:25] <Philip`> since people will have to encode things, using what's available
- # [16:25] <hsivonen> Philip`: it isn't but testing a bad encoder or a good encoder with bad params isn't, either
- # [16:26] <hsivonen> I'd love to have a cheat sheet of QuickTime/H.264, x264, XiphQT and ffmpeg2theora tried and true magic params
- # [16:27] <Philip`> hsivonen: Yep, I'm just looking at FFmpeg for now, which is far from ideal and I won't claim this is an especially good comparison :-)
- # [16:27] <hsivonen> since the stuff other people encode tends to look better than what I get with naïve dabbling
- # [16:28] <maikmerten> most frontends e.g. don't expose all coding options
- # [16:28] <maikmerten> like in case of Theora I usually end up altering keyframe interval, the noise threshold or even the complete set of quantization tables
- # [16:28] <hsivonen> maikmerten: or worse, they do expose a zillion options that let you shoot yourself in the foot by overstepping your AVC profile bounds
- # [16:29] <maikmerten> well, that is a genuine opportunity for formats with profiles ;)
- # [16:29] * hsivonen thinks AVC profiles and levels are an awfully bad idea from the interop POV
- # [16:29] <maikmerten> well, the argument was that at least it could be made sure restricted devices could at least reliably support *something*
- # [16:30] <maikmerten> but I feel this has gone out of hand a bit
- # [16:31] <maikmerten> it's sometimes not quite easy to e.g. come up with a file that happens to play fine on both Playstation Portable and the iPod and some mobile phone etc. etc.
- # [16:32] <hsivonen> Google Video has the iPod/PSP magic figured out but afaik they aren't sharing it
- # [16:32] <maikmerten> especially for extremely sophisticated codecs it can make sense to have profiles that drop coding schemes that (wild example) increase CPU usage by 50% but only give 5% coding gain
- # [16:33] * Philip` can't get below 1300Kb/s with MJPEG
- # [16:33] * Joins: billmason (n=billmaso@ip156.unival.com)
- # [16:34] * Joins: jdandrea (n=jdandrea@ool-18e42ae7.dyn.optonline.net)
- # [16:34] * Quits: maikmerten (n=merten@ls5laptop14.cs.uni-dortmund.de) ("Verlassend")
- # [16:35] <MikeSmith> hsivonen, annevk - I believe the source for the "Mozilla, Opera Want to Make Video on the Web Easier" article was Jeremy Kirk of IDG
- # [16:35] <MikeSmith> pcworld article has the correct byline at least
- # [16:36] <MikeSmith> it was not a press release from Opera or Mozilla or whoever
- # [16:36] <annevk> yeah, I recall reading those names
- # [16:39] <hsivonen> MikeSmith: ok
- # [16:43] * Parts: hdh (n=hdh@58.187.109.128)
- # [17:01] <Philip`> hsivonen: Would there be any chance of your HTML Parser including a brief summary of the changes between releases?
- # [17:04] <hsivonen> Philip`: perhaps next time.
- # [17:04] <hsivonen> Philip`: this time the main difference was Mavenization
- # [17:04] * Joins: phsiao (i=shawn@nat/ibm/x-92d84a74862c8112)
- # [17:05] * Joins: grimeboy (n=grimboy@85.211.236.228)
- # [17:06] <Philip`> hsivonen: Okay - it's just useful to know if e.g. the main difference is something like Mavenization that I don't care about, or if it's important bug fixes and I should bother updating
- # [17:06] <Philip`> but since "updating" involves copying one file over another, it's not a significant issue at all :-)
- # [17:09] <hsivonen> Philip`: I can't remember if there was something else as well
- # [17:09] * hsivonen looks at logs
- # [17:11] <hsivonen> Philip`: there was also a bug fix in case you run SAX Tree without a Locator
- # [17:13] <Philip`> hsivonen: Okay, thanks
- # [17:14] <hsivonen> Philip`: also, I eliminated a bogus import that referenced a Sun-internal class and caused badness
- # [17:14] <hsivonen> Philip`: that's about it
- # [17:15] <annevk> hsivonen, see www-style for media queries
- # [17:15] * annevk tries to fix stuff
- # [17:16] * Philip` wonders what "pseudo-legal" really means
- # [17:18] <hsivonen> Philip`: my guess is that it means doing stuff that is not legal for a commercial entity to do in the United States but that is legal in e.g. Sweden or Hungary
- # [17:20] * Quits: grimboy_uk (n=grimboy@85.211.236.12) (Read error: 110 (Connection timed out))
- # [17:21] <hsivonen> annevk: aargh. I didn't realize there were comments, too
- # [17:23] <annevk> escapes, comments, error handling of syntax errors
- # [17:32] * Joins: maikmerten (n=maikmert@L932c.l.pppool.de)
- # [17:55] * Joins: doublec (n=doublec@li5-223.members.linode.com)
- # [18:24] * Joins: gsnedders (n=gsnedder@host86-135-224-200.range86-135.btcentralplus.com)
- # [18:24] * Joins: aroben (i=aroben@unaffiliated/aroben)
- # [18:25] * gsnedders probably shouldn't actually even look at the emails
- # [18:26] <gsnedders> woah.
- # [18:26] <gsnedders> 102 in whatwg alone
- # [18:27] <annevk> November 2007: 110 e-mails
- # [18:27] <annevk> December 2007: 208 e-mails so far
- # [18:30] * Quits: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com) (Read error: 104 (Connection reset by peer))
- # [18:30] * Joins: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com)
- # [18:38] * gsnedders starts writing a reply
- # [18:38] * gsnedders apologises and closes it
- # [18:39] <Philip`> gsnedders: If it's like your last two messages, it'll be sucked into my spam folder, so I won't even notice :-)
- # [18:39] <gsnedders> I just think there's no point.
- # [18:39] * Quits: jruderman (n=jruderma@c-67-180-15-227.hsd1.ca.comcast.net)
- # [18:39] <gsnedders> There are formats with no valid patents that cover them (guaranteed, as they are too old) — the same can't be said for Theora
- # [18:40] <gsnedders> My preference is currently H.261/Vorbis (in some container, dunno what)
- # [18:44] * Quits: jdandrea (n=jdandrea@ool-18e42ae7.dyn.optonline.net) (Read error: 110 (Connection timed out))
- # [18:44] <annevk> it's better to wait a week like I said, because then reports from the video workshop will be in to better inform us what's going on
- # [18:45] <doublec> anyone here at the workshop right now (apart from me?)
- # [18:45] * Joins: jgraham_ (n=james@81-86-217-3.dsl.pipex.com)
- # [18:46] <gsnedders> annevk: see what I wrote to you last night?
- # [18:46] * Joins: Lachy__ (n=Lachlan@cm-84.215.41.149.getinternet.no)
- # [18:50] <Philip`> gsnedders: H.261 is limited to 176x144 and 352x288, which seems a bit rubbish
- # [18:50] <annevk> gsnedders, I think so, looked like a start
- # [18:50] * gsnedders doesn't remember it being limited to specific resolutions
- # [18:50] <gsnedders> oh well.
- # [18:50] <gsnedders> MJPEG anyone? :P
- # [18:51] <Philip`> gsnedders: FFmpeg doesn't support other sizes
- # [18:51] <gsnedders> Philip`: wikipedia agrees with you
- # [18:51] <Philip`> H.263 seemingly supports the five sizes on http://en.wikipedia.org/wiki/Common_Intermediate_Format
- # [18:51] * Joins: kingryan (n=kingryan@dsl092-002-056.sfo1.dsl.speakeasy.net)
- # [18:51] <maikmerten> MJPEG is even worse than GIF ;)
- # [18:51] <Philip`> (which all have a stupid naming convention)
- # [18:51] <maikmerten> GIF can at least say "keep this part unchanged from the last frame" ;)
- # [18:52] <maikmerten> MJPEG just codes everything, every frame
- # [18:52] <Philip`> maikmerten: But keyframes in GIF are gigantic :-)
- # [18:52] <maikmerten> H.263 is a mid-nineties standard
- # [18:52] <maikmerten> won't expire any time soon
- # [18:52] <maikmerten> Philip`, sure, GIF is horrid
- # [18:52] <maikmerten> I wasn't completely serious about GIF
- # [18:52] <doublec> maikmerten, so you suggest animated GIFs as the baseline ;)
- # [18:53] <maikmerten> but GIF at least is able to exploit temporal redundancy ;)
- # [18:53] <gsnedders> I wasn't completely serious about MJPEG :P
- # [18:53] <maikmerten> doublec, well, would also save some implementation effort, right? ;)
- # [18:53] <doublec> absolutely :)
- # [18:53] <maikmerten> ah, good
- # [18:54] <maikmerten> having like 500 outdated and underperforming-till-it's-no-fun codecs as baseline for sure isn't desirable ;)
- # [18:54] <Philip`> Couldn't you extend JPEG to proper 3D (2D+time, making use of redundancy in all directions) by just using a 3D DCT or something? :-)
- # [18:54] <maikmerten> Philip`, doing anything clever would again make you target to submarine patents
- # [18:54] * Quits: Lachy_ (n=Lachlan@ti200710a340-2895.bb.online.no) (Read error: 110 (Connection timed out))
- # [18:55] <maikmerten> because you'd effectively develop a new codec
- # [18:55] * Joins: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [18:57] <maikmerten> the only old old old codec I know that is really expired would be H.120 - 2 MBit/s video conferencing
- # [18:57] <maikmerten> oh joy
- # [18:57] <maikmerten> (it's from 1982)
- # [18:57] <maikmerten> (and no, I don't know a nowadays implementation)
- # [18:57] <gsnedders> H.261 at least is commonly shipped
- # [18:58] <maikmerten> 1990
- # [18:58] <maikmerten> currently not old enough
- # [18:58] <gsnedders> H.261 is 1982
- # [18:58] <gsnedders> revised 1988
- # [18:58] <maikmerten> not to my knowledge
- # [18:58] * gsnedders looks up
- # [18:58] <Philip`> I wonder about Bink
- # [18:59] <gsnedders> H.260 is that
- # [18:59] <maikmerten> H.261 is a 1990 ITU-T video coding standard originally designed for transmission over ISDN lines on which data rates are multiples of 64 kbit/s. It
- # [18:59] <gsnedders> H.261 is 1990
- # [18:59] <maikmerten> While H.261 was preceded in 1982 by H.120 [1][2] (which also underwent a revision in 1988 of some historic importance) as a digital video coding standard, H.261 was the first truly practical digital video coding standard (in terms of product support in significant quantities).
- # [18:59] <maikmerten> Wikipedia
- # [18:59] <maikmerten> ^^ yeah, I know this is not the ultimate source
- # [18:59] <maikmerten> but H.261 in 1990 just makes sense from the history-point-of-view
- # [19:00] <maikmerten> it led to a direct line of successors to H.264
- # [19:00] * gsnedders can't remember it
- # [19:00] <gsnedders> but there again, I wasn't yet alive in 1990 :P
- # [19:01] * Quits: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [19:04] * Joins: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [19:05] * Joins: dbaron (n=dbaron@corp-241.mountainview.mozilla.com)
- # [19:05] * Parts: Camaban (n=adrianle@host217-41-27-233.in-addr.btopenworld.com)
- # [19:06] <Teratogen> bring back ogg!
- # [19:07] <gsnedders> Teratogen: _CAN YOU PLEASE MAKE A CONSTRUCTIVE COMMENT!?_
- # [19:07] <Teratogen> yes
- # [19:07] <Teratogen> BRING BACK OGG!
- # [19:07] <gsnedders> why?
- # [19:07] <Teratogen> because it's free!
- # [19:08] <gsnedders> So? What advantages does it have over, say, H.260 or Dirac?
- # [19:08] <Teratogen> freedom!
- # [19:08] <maikmerten> H.260 is oooooold beyond usefulness
- # [19:08] <gsnedders> All three are free (in terms of cost to license patents).
- # [19:08] <Teratogen> ogg totally rocks
- # [19:08] <maikmerten> Dirac is not finished yet and big players would still be scared about submarines
- # [19:09] <Philip`> If we just built web browsers on land instead of in the sea, submarines wouldn't be a problem at all
- # [19:09] * gsnedders hugs Philip`
- # [19:09] <maikmerten> I second that
- # [19:10] <gsnedders> Ogg is no help if it does not achieve interoperability between all browsers.
- # [19:10] <maikmerten> their choice.
- # [19:11] <maikmerten> there *is* no less-than-20-years old codec they'd accept
- # [19:11] <gsnedders> I don't particularly care whose choice it is. I want a video format I can use in every browser.
- # [19:11] <Philip`> gsnedders: You can use FLV
- # [19:12] <gsnedders> silly Philip`. that works in Flash, not any browser.
- # [19:12] <gsnedders> :P
- # [19:14] <Philip`> Flash works in any browser, and it works now, and in a few years it'll still work in more installed browsers than <video> even if IE8/FF3/Opera9.5/Safari4 add support
- # [19:14] <gsnedders> I know, that's true.
- # [19:14] <Philip`> and it'll support VP6, which is better than H.263
- # [19:14] <gsnedders> Philip`: it doesn't run on browsers on IA-64!
- # [19:16] <doublec> or any new hardware or devices that comes along
- # [19:16] <doublec> they'd have to rely on the flash vendor to port their software to it
- # [19:16] <Philip`> It doesn't run on Lynx either, and there's probably Lynx users than IA-64 users :-p
- # [19:16] <Philip`> s/probably/probably more/
- # [19:16] <gsnedders> Philip`: I dunno. probably more IA-64 users, but most won't run browsers on it :P
- # [19:17] * annevk wonders when the Ogg discussion stops
- # [19:18] <gsnedders> annevk: Christmas, because everyone is away :)
- # [19:18] <Philip`> I wonder how much it cost to get Flash on Opera Wii
- # [19:19] * gsnedders is glad we don't require 100% consensus on everything for REC
- # [19:20] <gsnedders> http://xkcd.com/356/ — you know the worst part? I actually am now stuck thinking about that.
- # [19:21] <Philip`> gsnedders: Just do a numerical simulation :-p
- # [19:21] <gsnedders> I've moved on.
- # [19:21] <gsnedders> Better things to waste my time with.
- # [19:21] <gsnedders> (where <video> is worse)
- # [19:22] * Quits: doublec (n=doublec@li5-223.members.linode.com) (Remote closed the connection)
- # [19:23] * Joins: doublec (n=doublec@li5-223.members.linode.com)
- # [19:23] * Parts: doublec (n=doublec@li5-223.members.linode.com) ("Leaving")
- # [19:24] * Joins: doublec (n=doublec@li5-223.members.linode.com)
- # [19:26] * Quits: doublec (n=doublec@li5-223.members.linode.com) (Client Quit)
- # [19:38] <gsnedders> hmm… vital maths test tomorrow. do I revise (i.e., learn stuff I missed when I was ill which will lead me to fail :P) or work on HTTP parsing, or write on my blog?
- # [19:44] * Joins: roc (n=roc@121-72-24-31.dsl.telstraclear.net)
- # [19:54] * Joins: doublec (n=doublec@li5-223.members.linode.com)
- # [20:05] * Quits: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [20:08] * Joins: jruderman (n=jruderma@corp-241.mountainview.mozilla.com)
- # [20:18] <Philip`> gsnedders: By the way, would you be interested in information about HTTP response headers in the wild? There's some data I've got already, and some other stuff would be easy to collect, but I have no idea if it'd be useful for anything at all
- # [20:22] <maikmerten> gargh, somehow my messages to whatwg go to the moderation queue first because I subscribed as <blabla>@gmail.com but apparently Google is "correcting" the sender address to <blabla>@googlemail.com
- # [20:22] * Quits: roc (n=roc@121-72-24-31.dsl.telstraclear.net)
- # [20:22] <maikmerten> is there a way to get the list to accept that gmail.com "==" googlemail.com ?
- # [20:23] <maikmerten> (I really have to thank that brilliant German guy registering "GMail" as trademark and suing Google so all German users are "googlemail")
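Treating the two domains as one address is a matter of normalizing before comparing. The sketch below is hypothetical. It is not how Mailman works (deltab points at Mailman's own alternate-address support later in the log), and it assumes, as the discussion implies, that `googlemail.com` and `gmail.com` reach the same mailbox.

```python
# Hypothetical normalization: treat googlemail.com as an alias of gmail.com.
DOMAIN_ALIASES = {"googlemail.com": "gmail.com"}


def canonical_address(address: str) -> str:
    """Lower-case the domain and map aliased domains to one canonical form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()
    return f"{local}@{DOMAIN_ALIASES.get(domain, domain)}"


def same_subscriber(a: str, b: str) -> bool:
    return canonical_address(a) == canonical_address(b)


print(same_subscriber("someone@gmail.com", "someone@GoogleMail.com"))  # True
```

The local part is left untouched because, in general, its case can be significant to the receiving server.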
- # [20:24] <gsnedders> Philip`: yeah, sure. could you possibly drop it in an email to me?
- # [20:24] <Philip`> maikmerten: In Settings / Accounts / 'Send mail as', is that where it claims it's @gmail.com when actually it's not?
- # [20:25] <maikmerten> Philip`, I'll check
- # [20:25] * Quits: doublec (n=doublec@li5-223.members.linode.com) ("Leaving")
- # [20:25] <maikmerten> Philip`, wasn't aware there was a way to specify these things
- # [20:25] <maikmerten> anyway, thanks for the tip
- # [20:25] <Philip`> (The WHATWG moderation queue never gets moderated, so anything sent there will be lost)
- # [20:25] <gsnedders> wow. the email on whatwg won't stop.
- # [20:26] <maikmerten> Philip`, same policy as here at xiph.org ;)
- # [20:26] <zcorpan> oook. i've now caught up with whatwg email (mostly by marking large chunks as read)
- # [20:26] <zcorpan> not much that was interesting, actually
- # [20:27] <maikmerten> "You cannot send e-mail from maikmerten@gmail.com"
- # [20:27] <maikmerten> I'll just unsubscribe and resubscribe with googlemail.com
- # [20:27] <Philip`> That sounds irritating
- # [20:29] * Joins: grimboy_uk (n=grimboy@85-211-244-14.dsl.pipex.com)
- # [20:30] <Philip`> I (in the UK) get a "Google Mail" logo rather than "Gmail", but it seems to be happy with my account staying as @gmail.com
- # [20:30] <Philip`> so I'm not quite sure how all this stuff works
- # [20:30] * Quits: kingryan (n=kingryan@dsl092-002-056.sfo1.dsl.speakeasy.net)
- # [20:31] <deltab> which MLM is it?
- # [20:31] <gsnedders> My older email is @gmail.com
- # [20:31] <gsnedders> my newer is @googlemail.com
- # [20:32] <maikmerten> Philip`, may indeed be the "GMail" is a registered trademark in Germany thingie
- # [20:32] <gsnedders> maikmerten: there was a dispute in the UK too
- # [20:32] <maikmerten> oh, wasn't aware of that
- # [20:32] <deltab> I think mailman supports alternate sending addresses
- # [20:33] <maikmerten> too bad I registered "AMail", "BMail" etc. but stopped at "FMail" ;)
- # [20:33] <Philip`> gsnedders: http://www.cl.cam.ac.uk/~pjt47/misc/headers.xml.bz2 is the headers from ~15K pages, as parsed by HttpClient
- # [20:34] <Philip`> (which is the only data I've got at the moment)
- # [20:35] <gsnedders> Philip`: it's served as application/xml!
- # [20:35] <Philip`> Uh, I think I don't care that it's served as ap...
- # [20:35] <Philip`> Yeah, that
- # [20:35] <Philip`> Not my web server :-p
- # [20:36] <deltab> maikmerten: instead of un/resubscribing you should be able to use this: http://list.org/mailman-member/node22.html
- # [20:36] * gsnedders remembers that SEE doesn't like large files
- # [20:36] <maikmerten> deltab, too late, I'm afraid.... :(
- # [20:36] <maikmerten> (and yeah, shame on me for not finding that myself)
- # [20:37] <gsnedders> Philip`: how's that done? just saving what it gives as headers?
- # [20:39] <Philip`> gsnedders: Yes, specifically via http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/HttpMethod.html#getResponseHeaders()
- # [20:39] <Philip`> (i.e. using whatever kind of parsing code is provided by that)
- # [20:40] <Philip`> excluding anything that doesn't respond with 200
- # [20:41] <Philip`> and replacing control characters (except 9/A/D) with spaces
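The two filtering rules above (drop non-200 responses, turn control characters other than TAB/LF/CR into spaces) can be sketched in Python roughly as follows. This is not Philip`'s HttpClient code: the `(status, uri, headers)` record shape and both function names are hypothetical, and "control characters" is taken to mean the C0 range, since DEL and C1 handling isn't mentioned.

```python
import re

# C0 control characters except TAB (0x09), LF (0x0A) and CR (0x0D).
# Assumption: DEL (0x7F) and C1 characters are left untouched.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize(text: str) -> str:
    """Replace each disallowed control character with a space."""
    return _CONTROL.sub(" ", text)


def keep_ok(responses):
    """Yield (uri, name, value) triples from 200 responses only.

    `responses` is assumed to be an iterable of (status, uri, headers),
    where headers is a list of (name, value) pairs.
    """
    for status, uri, headers in responses:
        if status != 200:
            continue  # non-200 responses are excluded entirely
        for name, value in headers:
            yield uri, sanitize(name), sanitize(value)
```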
- # [20:42] <Philip`> (The output isn't necessarily grouped by uri, since it's processed multithreadedly)
- # [20:46] * Joins: madness_ (n=mng@client-86-27-168-55.popl.adsl.virgin.net)
- # [20:46] <gsnedders> Philip`: you got any issues with me publishing anything based on it?
- # [20:46] * gsnedders assumes not, seeing as he's just linked to the data in a publicly logged IRC channel
- # [20:46] <Philip`> gsnedders: No - it's all from public web sites anyway, and I didn't ask them for permission ;-)
- # [20:51] * gsnedders goes back to fumbling around with Python
- # [20:51] * Quits: grimeboy (n=grimboy@85.211.236.228) (Read error: 110 (Connection timed out))
- # [20:52] * Philip` can't see an obvious way to get unparsed headers from HttpClient
- # [20:52] * Quits: madness (n=mng@client-82-2-93-126.manc.adsl.virgin.net) (Read error: 110 (Connection timed out))
- # [21:02] * Joins: doublec (n=doublec@li5-223.members.linode.com)
- # [21:04] * Joins: roc (n=roc@202.0.36.64)
- # [21:04] * Quits: dbaron (n=dbaron@corp-241.mountainview.mozilla.com) ("8403864 bytes have been tenured, next gc will be global.")
- # [21:06] * Joins: dbaron (n=dbaron@corp-241.mountainview.mozilla.com)
- # [21:14] * Quits: maikmerten (n=maikmert@L932c.l.pppool.de) ("Leaving")
- # [21:15] * Quits: weinig (n=weinig@c-71-198-176-23.hsd1.ca.comcast.net)
- # [21:18] <gsnedders> Philip`: xml.parsers.expat.ExpatError: not well-formed (invalid token): line 14534, column 56 :(
- # [21:19] <gsnedders> Philip`: ISO-8859-1 file, with encoding undefined it seems
- # [21:19] <Philip`> gsnedders: Argh
- # [21:20] <Philip`> It was correct in my initial XML file, but it looks like xml_grep messes it up
- # [21:21] <Philip`> which is odd since it claims to be doing UTF-8
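The expat error is what happens when undeclared ISO-8859-1 bytes are read as UTF-8, the XML default. A minimal Python sketch of the failure and two possible fixes, assuming the file has no XML declaration of its own (as gsnedders observed); the sample document and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "café" with the é as a single ISO-8859-1 byte (0xE9), which is not valid UTF-8.
LATIN1_BODY = b"<headers><header>caf\xe9</header></headers>"


def is_well_formed(body: bytes) -> bool:
    """True if expat accepts the bytes (no declaration means UTF-8 is assumed)."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def parse_declared(body: bytes, encoding: str) -> ET.Element:
    """Option 1: prepend an XML declaration naming the real encoding."""
    decl = f'<?xml version="1.0" encoding="{encoding}"?>'.encode("ascii")
    return ET.fromstring(decl + body)


def parse_transcoded(body: bytes, encoding: str) -> ET.Element:
    """Option 2: decode with the real encoding and re-encode as UTF-8."""
    return ET.fromstring(body.decode(encoding).encode("utf-8"))
```

Option 2 is closer to what the fixed file does: once the bytes really are UTF-8, no declaration is needed at all.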
- # [21:21] <hsivonen> any recommendable JavaScript plug-in for Eclipse?
- # [21:22] <roc> The Aptana stuff is supposed to be good
- # [21:22] <Philip`> and if I set "--encoding utf-8" then it removes all linebreaks
- # [21:23] * gsnedders is failing at Python
- # [21:25] <gsnedders> I'm getting a KeyError with self.headers[name] = [{"name": name, "uri": uri, "value": value}]
- # [21:27] <Philip`> gsnedders: Updated http://www.cl.cam.ac.uk/~pjt47/misc/headers.xml.bz2 so it should hopefully be utf-8 now
- # [21:27] <Philip`> Also I think I removed the <file> element
- # [21:28] * gsnedders is just using getElementsByTagName() anyway
- # [21:28] <Philip`> gsnedders: I thought it only gave KeyError when trying to read a non-present key
- # [21:29] <gsnedders> Philip`: so did I.
- # [21:29] <gsnedders> "Raised when a mapping (dictionary) key is not found in the set of existing keys." — TFM
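That reading of the docs is right: assigning to a missing key never raises KeyError in a plain dict, only reading one does, so the error most likely came from some other lookup (a guess, not a diagnosis of gsnedders' script). Separately, the quoted line replaces the list on every call, so only the last header per name survives. A sketch of the difference, with a hypothetical `HeaderIndex` class standing in for whatever object owns `self.headers`:

```python
class HeaderIndex:
    """Hypothetical container grouping header records by header name."""

    def __init__(self):
        self.headers = {}

    def add_overwriting(self, name, uri, value):
        # Plain assignment never raises KeyError, even for a new key,
        # but it replaces whatever list was stored under `name` before.
        self.headers[name] = [{"name": name, "uri": uri, "value": value}]

    def add(self, name, uri, value):
        # setdefault creates the list on first use, then appends to it,
        # so every occurrence of a header name is kept.
        self.headers.setdefault(name, []).append(
            {"name": name, "uri": uri, "value": value}
        )
```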
- # [21:29] <Philip`> About gEBTN: Ah, okay - I've been avoiding the DOM since my XML files are 200MB :-)
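For files of that size, a streaming parse avoids building the whole DOM. A minimal sketch with the standard library's `ElementTree.iterparse`, assuming each record is a `<header>` element carrying `uri` and `name` attributes with the value as text; only the `uri` attribute is confirmed in this log, so adjust the names to the real file.

```python
import xml.etree.ElementTree as ET


def iter_headers(source):
    """Yield (uri, name, value) for each <header> element as it is parsed.

    `source` is a filename or a binary file object. The `name` attribute
    and text-as-value layout are assumptions about the file format.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "header":
            yield elem.get("uri"), elem.get("name"), elem.text or ""
            # Free this element's text/children; the emptied element itself
            # stays attached to the root, which is usually small enough.
            elem.clear()
```

The `.bz2` download can be streamed directly, e.g. `with bz2.open("headers.xml.bz2") as f:` followed by `for uri, name, value in iter_headers(f): ...`.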
- # [21:30] * gsnedders doesn't particularly care how long the scripts take to run
- # [21:30] <gsnedders> I'd be doing it in C if I did :P
- # [21:30] * Philip` does care, because he's lazy and doesn't like waiting
- # [21:32] <inimino> gsnedders: self.headers['name'] perhaps?
- # [21:32] <gsnedders> inimino: no, name is a variable
- # [21:33] <inimino> oh, ok
- # [21:33] <gsnedders> (and set)
- # [21:33] <gsnedders> odd.
- # [21:33] <gsnedders> now with the new headers.xml it works
- # [21:33] <Philip`> That sounds impossible
- # [21:34] <gsnedders> totally real, though.
- # [21:34] <gsnedders> I love software :P
- # [21:35] <Philip`> I love only software that doesn't involve character encodings
- # [21:35] <gsnedders> Philip`: like what? :P
- # [21:35] <Philip`> Like anything that just uses integers :-)
- # [21:35] <inimino> Philip`: but not as character codes?
- # [21:36] <gsnedders> Philip`: but output? :P
- # [21:36] <gsnedders> Philip`: how do you encode those integers for display?
- # [21:36] <Philip`> inimino: Using integers for character codes is okay, as long as you never have to interpret those integers as characters :-)
- # [21:37] <Philip`> gsnedders: If you're only outputting and never inputting, then you don't have to care about encoding errors, because you just shove stuff through printf() and if the user gets garbage then it's their problem for not using ASCII
- # [21:37] <gsnedders> :D
- # [21:37] <inimino> heh
- # [21:40] * Philip` 's current work's only user interface is via Telnet outputting to a serial console in a virtual machine connected to the host through UDP and then passed through Perl and Python scripts
- # [21:40] <Philip`> so there's pretty much no chance of me getting character encodings straight, so I'm just sticking with ASCII there to save myself the pain
- # [21:41] <Philip`> (Actually, it's a Telnet server and a netcat client, so all the magic Telnet commands get printed out to the screen)
- # [21:43] * inimino guesses there is a story behind doing it that way
- # [21:45] <hsivonen> roc: thanks. installing now
- # [21:45] <Philip`> inimino: I'm testing some networking stuff, so it has to be done in VMs, and then that's the best way I've found to collect the output from them
- # [21:51] * Quits: csarven (n=nevrasc@81-5-133-33.static.nfwebsolutions.com) (Read error: 104 (Connection reset by peer))
- # [22:07] <hsivonen> annevk: where does HTML5 allow target='' on <form>?
- # [22:10] * Quits: dglazkov (n=dglazkov@adsl-065-081-081-030.sip.bhm.bellsouth.net)
- # [22:20] <gsnedders> Philip`: ValueError: too many values to unpack — your file is too big for what I want :P
- # [22:21] <Philip`> gsnedders: Uh, that sounds like an odd reason to get ValueError
- # [22:21] <gsnedders> Philip`: well, there are over ~15k items in the dictionary I'm trying to iterate over
- # [22:22] <Philip`> I don't see why that would be a problem
- # [22:23] <gsnedders> Philip`: and your figure of ~15k is wrong. ~115k would've been closer :) (116945 FYI)
- # [22:24] <Philip`> It's ~15K unique documents, mostly with >1 header each
- # [22:24] <gsnedders> ah. ~15k documents.
- # [22:24] <gsnedders> that is what you said, actually.
- # [22:27] <gsnedders> This is starting to get really annoying.
- # [22:27] <gsnedders> http://mail.python.org/pipermail/python-list/2006-June/387414.html
- # [22:27] <gsnedders> hmm
- # [22:28] <gsnedders> the value of the dictionary is a list
- # [22:28] * Quits: ROBOd (n=robod@89.122.216.38) ("http://www.robodesign.ro")
- # [22:28] <Philip`> What do you mean by "the value of"?
- # [22:29] <gsnedders> the value of every key is a list.
- # [22:30] <Philip`> The bit where you said [{"name": ...}] is a list because of the []
- # [22:30] <Philip`> or do you mean something else?
- # [22:31] <gsnedders> {"foo": "bar"} where "foo" is the key and "bar" the value
- # [22:32] * Philip` doesn't understand the problem
- # [22:33] * gsnedders now understanding the problem writes a tiny exemplar
- # [22:34] <gsnedders> Philip`: http://pastebin.ca/813896 — make that work.
- # [22:35] <hsivonen> Lachy__: http://html5.lachy.id.au/ could be better if the form was seeded with an HTML5 skeleton document
- # [22:35] <Philip`> gsnedders: foo.items()
- # [22:37] <Philip`> because "for k in foo" iterates over keys, whereas "for (k, v) in foo.items()" iterates over (key, value) pairs
- # [22:37] <gsnedders> and for k, v in foo?
- # [22:37] <Philip`> (http://docs.python.org/lib/typesmapping.html shows most of the useful functions)
- # [22:38] <hsivonen> Lachy__: the validate html5 button works only once for me on http://html5.lachy.id.au/ in Firefox 2
- # [22:38] <hsivonen> Lachy__: I have to reload to make the button work again
- # [22:38] <Philip`> gsnedders: That will iterate over keys, and try to unpack each key into a (k,v) tuple, which will raise ValueError because your keys are strings and can't be unpacked
- # [22:38] <gsnedders> ah
- # [22:39] <Philip`> (It's like "for x in foo: k, v = x")
- # [22:39] <Philip`> (in terms of iterating over keys rather than keys+values)
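A small Python demonstration of the difference; the dictionary contents are invented. One nuance: strings *can* be unpacked, character by character, so `for k, v in d` only fails for keys that aren't exactly two characters long. (In the Python 2 of the time, `iteritems()` was the lazy variant; in Python 3, `items()` already returns a view.)

```python
headers = {"Server": ["Apache"], "Content-Type": ["text/html"]}

# Iterating a dict yields its keys only.
keys = [k for k in headers]

# .items() yields (key, value) pairs, which unpack cleanly into k, v.
pairs = [(k, v) for k, v in headers.items()]


def unpack_each_key(d):
    """What `for k, v in d:` does: unpack each *key*, not each item.

    Raises ValueError for any key whose length isn't exactly 2.
    """
    return [(k, v) for k, v in d]
```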
- # [22:39] <gsnedders> ah
- # [22:39] * gsnedders doesn't pretend to be anything but a python n00b
- # [22:41] <Philip`> It mostly makes sense when you can see which rules apply - it's not like Perl where magical context-sensitive things happen and you'll never understand unless you read that particular detail in the documentation and newsgroups :-)
- # [22:44] <Philip`> Hmm, I want to watch some video on the web tomorrow, but it's streaming Windows Media and I'm not sure how to handle that
- # [22:44] <Lachy__> hsivonen, the validate button works for me all the time, without reloading
- # [22:46] <Lachy__> hsivonen, send me an email about adding a template document to it, and I'll see what I can do when I get back from Linkoping on Sunday
- # [22:47] <hsivonen> Lachy__: ok
- # [22:47] * Lachy__ is now known as Lachy
- # [22:48] * gsnedders has kinda given up hope of actually passing the maths test tomorrow, having missed a couple of weeks, and barely knowing one section
- # [22:49] <Lachy> gsnedders, maths isn't too hard
- # [22:49] * Joins: weinig (n=weinig@17.203.15.140)
- # [22:49] <Lachy> which particular maths topics are you covering this year?
- # [22:51] <gsnedders> Lachy: http://www.sqa.org.uk/files/nq/C10012.pdf (which I think gives enough detail :P)
- # [22:51] <Philip`> Aha, mplayer works
- # [22:52] * Philip` wonders if anyone happens to know how to record and watch a stream simultaneously
- # [22:52] <gsnedders> can anyone explain the reasoning behind there being both plus/minus and minus/plus signs?
- # [22:53] <gsnedders> Lachy: it's the stuff on page 17 of the PDF that I've missed
- # [22:53] <Philip`> gsnedders: They're useful for e.g. "+/- x = - (-/+ x)"
- # [22:54] <Philip`> i.e. representing two versions of the equation, where one has a mixture of + and -, and the other has the +s and -s flipped
- # [22:54] * gsnedders doesn't see how that helps
- # [22:54] <Lachy> gsnedders, on which page can I find one of these minus-plus signs?
- # [22:54] <Lachy> I've never seen one before
- # [22:55] <Philip`> You can't say "+/- x = - (+/- x)" because that would be interpreted as "+x = -(+x) and -x = -(-x)" which is untrue
- # [22:55] <gsnedders> Lachy: the same page (p17 of the PDF, labelled within itself as p16)
- # [22:55] <gsnedders> Philip`: ah. so then you have to take both from the same side!
- # [22:55] <Philip`> In that cos example, it means "cos(A+B) = blah - blah; and cos(A-B) = blah + blah"
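Written out, the rule is that the upper signs go together and the lower signs go together. Assuming the formula on that page is the standard compound-angle identity for cosine, both of Philip`'s examples read:

```latex
\begin{aligned}
\pm x = -(\mp x)
  \quad&\Longleftrightarrow\quad +x = -(-x) \ \text{and}\ -x = -(+x) \\
\cos(A \pm B) = \cos A \cos B \mp \sin A \sin B
  \quad&\Longleftrightarrow\quad
  \begin{cases}
    \cos(A + B) = \cos A \cos B - \sin A \sin B \\
    \cos(A - B) = \cos A \cos B + \sin A \sin B
  \end{cases}
\end{aligned}
```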
- # [22:56] <gsnedders> yeah, that makes sense now
- # [22:56] <Philip`> gsnedders: Yep
- # [22:56] <gsnedders> (I've just about done the basics of the top two formulae)
- # [22:56] <gsnedders> and obviously the third is just rearranged
- # [22:56] <Philip`> (though sometimes +/- and -/+ are used in a context-sensitive way and don't actually work like that :-) )
- # [22:56] <gsnedders> (or rather, a rearranged copy of the above)
- # [22:57] <Philip`> The third/fourth are just taking A=B
- # [22:57] <gsnedders> Philip`: 15332 pages of HTTP headers, BTW
- # [22:58] <gsnedders> Philip`: I understand the first equality on the fourth, but not the second/third equalities
- # [22:59] <gsnedders> Philip`: are all the headers from accessing the page once, or not?
- # [23:00] <Philip`> cos^2 x + sin^2 x = 1
- # [23:00] <Philip`> ...is the relevant fact that should be known
- # [23:00] <gsnedders> that more or less makes sense from a graph, yeah.
- # [23:00] <Philip`> so that gives the cos^2 A - sin^2 A = 2cos^2 A - 1 and suchlike
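The "taking A=B" step and the identity cos²x + sin²x = 1 combine as follows, assuming the fourth formula on the page is the standard double-angle formula for cosine:

```latex
\begin{aligned}
\cos 2A &= \cos(A + A) = \cos^2 A - \sin^2 A \\
        &= \cos^2 A - (1 - \cos^2 A) = 2\cos^2 A - 1 \\
        &= 2(1 - \sin^2 A) - 1 = 1 - 2\sin^2 A
\end{aligned}
```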
- # [23:00] * Quits: Teratogen (i=leontopo@unaffiliated/teratogen) (Read error: 110 (Connection timed out))
- # [23:02] <Lachy> there was a time about 10 years ago where I would have been able to do those trig equations. Now I just stare at them blankly
- # [23:02] <jgraham> IRC clearly needs better support for maths notation
- # [23:02] <gsnedders> Philip`: LaTeX!
- # [23:02] <roc> IRC should be XML
- # [23:02] <roc> then we could post our HTML examples directly
- # [23:02] <roc> and use MathML
- # [23:03] <gsnedders> peh. start with the basics. get a universally accepted character encoding on IRC :)
- # [23:03] <Philip`> gsnedders: Each page was GETed once, but the <header uri> in the output is the result of redirections, so it's possible that some pages redirected to the same location
- # [23:03] <gsnedders> kk
- # [23:03] <roc> and play XSS pranks on each other
- # [23:03] <Hixie> lordy what a lot of mail
- # [23:03] <Lachy> UTF-8 seems to be fairly widely accepted for IRC these days
- # [23:03] <gsnedders> Philip`: there's a page that claims to have 94 headers
- # [23:03] <Philip`> gsnedders: Maybe it'd be more helpful if I gave the original unique requested URI instead of the redirected result?
- # [23:03] <Hixie> half of this video mail has neither the word "video" nor the word "ogg" in it
- # [23:03] <Hixie> sheesh
- # [23:03] <roc> fear the wrath of Ogg!
- # [23:04] <gsnedders> Philip`: give the original URI for each request (i.e., a redirect has a different URI)
- # [23:04] <jgraham> Hixie: That's to make it hard to automatically redirect to /dev/null ;)
- # [23:04] <Lachy> I can't believe the whole ogg discussion is still going on, on far too many different lists
- # [23:05] <Hixie> what i'm amused by is that for every person sending 10 flames to one of the lists, i get a person e-mailing me privately telling me that they have my support and that they believe we're doing the right thing
- # [23:05] <gsnedders> I'm not totally sure whether it was the right thing to do.
- # [23:05] <gsnedders> Though I'm sure plenty of the people on the mailing list think I agree with you :)
- # [23:06] * Quits: billmason (n=billmaso@ip156.unival.com) (".")
- # [23:06] * hsivonen finally replies to an ogg email
- # [23:06] <gsnedders> Philip`: least headers is 3
- # [23:06] * Joins: heycam` (n=cam@clm-laptop.infotech.monash.edu.au)
- # [23:07] <hsivonen> evidently, rudd-o hadn't read the spec before he started his slashdot campaign
- # [23:07] <gsnedders> Philip`: median is 7, mean is 8
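Figures like these fall out of counting records per URI. A minimal Python sketch, assuming the headers have already been read into hypothetical `(uri, name, value)` tuples; it does not reproduce gsnedders' script, and any sample data used with it here is made up.

```python
from collections import Counter
from statistics import mean, median


def header_count_stats(records):
    """Return (min, median, mean) of the number of headers per URI.

    `records` is an iterable of (uri, name, value) tuples; it must
    contain at least one record.
    """
    per_uri = Counter(uri for uri, _name, _value in records)
    counts = list(per_uri.values())
    return min(counts), median(counts), mean(counts)
```

Which URI is used as the key matters: counting by the final, post-redirect URI merges pages that redirect to the same place, which can inflate the per-page count for shared redirect targets.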
- # [23:07] <Hixie> hah, the first comment on http://digg.com/tech_news/Nokia_and_Apple_seem_to_have_succeeded_in_suppressing_ogg is a complete non-sequitur
- # [23:07] <Hixie> hsivonen: shocking, that
- # [23:07] <gsnedders> hsivonen: why bother? it's only another damned technical document!
- # [23:07] <gsnedders> (on a totally unrelated note, I updated the to-do list on the tolerant http parsing spec)
- # [23:08] <Lachy> oh wow, I never expected accessibility to come up in the discussion: [whatwg] HTML 5, OGG, competition, civil rights, and persons with disabilities
- # [23:09] <hsivonen> Lachy: I must have skipped that message.
- # [23:09] <gsnedders> (that's <http://stuff.gsnedders.com/draft-sneddon-http-parsing-00.html> or .txt)
- # [23:09] <Philip`> gsnedders: If I give the original URI, it's still going to return the final after-redirection request's headers, so if several URIs redirect to the same place then it'll repeat the redirection target's headers
- # [23:09] <Philip`> which isn't necessarily bad, but it's something to be aware of
- # [23:09] <gsnedders> Philip`: ergh.
- # [23:10] * Quits: heycam` (n=cam@clm-laptop.infotech.monash.edu.au) (Client Quit)
- # [23:13] * Philip` wonders how to select elements of type A or type B using the subset of XPath supported by xml_grep
- # [23:14] <Hixie> i think i have found an easy way to achieve my goals of replying to hundreds of e-mails by month's end
- # [23:14] <Philip`> Oh, looks like I can't do that
- # [23:18] <Philip`> gsnedders: http://www.cl.cam.ac.uk/~pjt47/misc/headers2.xml.bz2 has the original request URI, and some <redirect>s to point out what got redirected
- # [23:18] * gsnedders just committed into hg the entire XML file
- # [23:19] <Philip`> The new XML file is indented differently, just to make fun diffs
- # [23:20] <gsnedders> what is the <redirect> element? just noting movement?
- # [23:20] <Philip`> Yes - it's added whenever the request URI and response URI differ
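The `<redirect>` rule and the earlier caveat about shared redirect targets can be sketched together. The `(request_uri, response_uri)` pairs are a hypothetical stand-in for what HttpClient reports after following redirects; this is not the code that produced headers2.xml.

```python
from collections import Counter


def redirect_records(fetches):
    """Attach a redirect flag to each (request_uri, response_uri) pair.

    A redirect is noted whenever the request and response URIs differ.
    """
    return [
        {"request": req, "response": resp, "redirect": req != resp}
        for req, resp in fetches
    ]


def shared_targets(fetches):
    """Final URIs reached from more than one request URI.

    Each request reports the final response's headers, so the headers
    of these targets appear more than once in the output.
    """
    targets = Counter(resp for _req, resp in fetches)
    return {uri for uri, n in targets.items() if n > 1}
```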
- # [23:20] <gsnedders> pay any attention to how many redirects it has?
- # [23:20] <Philip`> (i.e. when HttpClient did whatever magical redirection-handling it does)
- # [23:20] <Philip`> It has less than 100 redirects, but that's all I know
- # [23:21] <Philip`> (because otherwise it throws an exception and aborts)
- # [23:21] <hsivonen> Philip`: did you write your own spider based on HttpClient and the Validator.nu parser?
- # [23:21] <gsnedders> Philip`: different root element, too
- # [23:22] <Philip`> gsnedders: Yes, but you said you were using getElementsByWhatever so I assumed that wouldn't matter, and I used grep/echo/cat instead of xml_grep to extract the bits from my original XML file
- # [23:22] <gsnedders> Philip`: yeah, just an observation :)
- # [23:22] <Philip`> which is totally not the right way to do it :-)
- # [23:22] <gsnedders> http://hg.gsnedders.com/cgi-bin/hgwebdir.cgi/http-parsing/file/96df15d57efb/Philip%20Taylor%27s%20Header%20Data/README.txt — that all right?
- # [23:23] <hsivonen> Philip`: is the code that you are using to drive HttpClient in SVN somewhere?
- # [23:23] <Philip`> hsivonen: I don't think it's a spider since it doesn't follow links at all, but I did write my own thing to download/analyse a list of HTML files using HttpClient and the Validator.nu parser
- # [23:23] <hsivonen> Philip`: ok
- # [23:23] <Philip`> hsivonen: It isn't at the moment
- # [23:24] <hsivonen> Philip`: surely at that point you could make links from the parse tree feed back into the download list
- # [23:24] <hsivonen> although it probably isn't that simple
- # [23:25] <Philip`> hsivonen: I'm not sure exactly what you mean
- # [23:25] <Philip`> gsnedders: The last paragraph doesn't really make any sense :-)
- # [23:25] <hsivonen> If you analyse docs, presumably they contain links and those could be put on the list to download/analyse
- # [23:26] <hsivonen> but then there's robots.txt
- # [23:26] <gsnedders> Philip`: that's true, but I only did it quickly
- # [23:26] <Philip`> hsivonen: Ah, yes
- # [23:26] <hsivonen> and crawling in a reasonable breadth-first order etc, etc
- # [23:26] <Philip`> hsivonen: I'd prefer to use someone else's code rather than do all that work
- # [23:27] <Philip`> (I'm not even looking at robots.txt now, since that would double the number of requests I make)
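The doubling assumes one robots.txt request per page; caching the parsed file per host cuts it to one extra request per site. A minimal sketch using Python's `urllib.robotparser`; `fetch_text` is a hypothetical injected callable (so the sketch runs offline), not part of that module.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Fetch and parse robots.txt at most once per scheme + host.

    `fetch_text(url)` is a hypothetical callable returning the robots.txt
    body as a str ("" if the site has none).
    """

    def __init__(self, fetch_text, user_agent="*"):
        self.fetch_text = fetch_text
        self.user_agent = user_agent
        self._parsers = {}

    def allowed(self, url):
        parts = urlsplit(url)
        host = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(host)
        if parser is None:
            # First URL seen for this host: one robots.txt request, then cache.
            parser = RobotFileParser()
            parser.parse(self.fetch_text(host + "/robots.txt").splitlines())
            self._parsers[host] = parser
        return parser.can_fetch(self.user_agent, url)
```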
- # [23:27] <gsnedders> Philip`: "It may not be grouped by URI fully, as it is not processed by a single thread"?
- # [23:27] <Philip`> gsnedders: Is that sentence needed at all?
- # [23:28] <gsnedders> Philip`: not really, but I may as well put it there in case anyone ever cares.
- # [23:28] <Philip`> gsnedders: It just means it might have <header uri=a/><header uri=b/><header uri=a/>, which isn't an extremely interesting observation
- # [23:28] <hsivonen> Philip`: the Internet Archive spider has the kitchen sink in it but seems to be picky about its execution environment according to docs
- # [23:29] <hsivonen> Philip`: also, the code base isn't particularly approachable due to the kitchen sink nature
- # [23:29] <gsnedders> Philip`: it means you can't do anything that assumes it's in order, which you might sometimes want to do
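For consumers of the file, grouping into a dictionary sidesteps the ordering question entirely. A minimal Python sketch over hypothetical `(uri, name, value)` records:

```python
from collections import defaultdict


def group_by_uri(records):
    """Collect (uri, name, value) records into {uri: [(name, value), ...]}.

    Works whether or not records for the same URI are adjacent, so it is
    safe on output written by several threads in arbitrary order.
    """
    grouped = defaultdict(list)
    for uri, name, value in records:
        grouped[uri].append((name, value))
    return dict(grouped)
```

`itertools.groupby` would not work here without sorting first, since it only merges *adjacent* equal keys, which is exactly the assumption interleaved output breaks.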
- # [23:31] <Philip`> hsivonen: Hmm, it does sound not entirely trivial
- # [23:31] <Philip`> I'm not sure how worthwhile it would be to do actual spidering, rather than sticking with the dmoz.org list
- # [23:32] <hsivonen> Philip`: isn't dmoz biased towards front pages?
- # [23:32] <Philip`> (particularly since I can't do especially extensive spidering - I'd prefer not to be making a hundred thousand requests, because it's kind of expensive in bandwidth)
- # [23:33] <Philip`> hsivonen: Yes, and to CNN
- # [23:33] <hsivonen> also, how alive is dmoz these days? does it represent current authoring?
- # [23:33] * Philip` has no idea
- # [23:34] <Philip`> I can imagine getting much worse results from a spider that gets sucked into a single giant site, so I'm not sure how to make things definitely better
- # [23:34] * Quits: gsnedders (n=gsnedder@host86-135-224-200.range86-135.btcentralplus.com) ("404: Not Found")
- # [23:35] <Philip`> (I'm not even sure what "better" means)
- # [23:37] <hsivonen> Philip`: knowledge about web site structures would probably be needed to make reasonable guesses
- # [23:38] <hsivonen> Philip`: without data I might guess that a sensible strategy would be taking a list of site roots, analyzing the front page, picking two site-internal links at random, analyzing those pages too and following one site-internal link from each of those
- # [23:38] <hsivonen> that would give front page plus 4 non-front pages for each site
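hsivonen's sampling plan, sketched in Python. `links_of` is a hypothetical callable returning site-internal links for a page (a real crawler would fetch, parse, filter by host, and respect robots.txt); the randomness comes from an injectable `random.Random` so runs can be reproduced.

```python
import random


def sample_site(front_page, links_of, rng=None):
    """Return the front page, two random internal pages linked from it,
    and one further random internal page linked from each of those.
    """
    rng = rng if rng is not None else random.Random()
    chosen = [front_page]
    # De-duplicate links while keeping order, and skip self-links.
    front_links = [u for u in dict.fromkeys(links_of(front_page)) if u != front_page]
    first_hop = rng.sample(front_links, k=min(2, len(front_links)))
    chosen.extend(first_hop)
    for page in first_hop:
        onward = [u for u in dict.fromkeys(links_of(page)) if u not in chosen]
        if onward:
            chosen.append(rng.choice(onward))
    return chosen
```

For each site this returns the front page plus at most four others, matching the count above; fewer come back when a page has too few unvisited internal links.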
- # [23:38] * Quits: othermaciej (n=mjs@dsl081-048-145.sfo1.dsl.speakeasy.net)
- # [23:51] <Philip`> hsivonen: It would be good to have a way of evaluating the strategies, to see which ones actually work sensibly in practice, but I've got no idea how to do that either :-/
- # [23:54] * Joins: weinig_ (n=weinig@17.255.108.233)
- # [23:54] * Quits: weinig (n=weinig@17.203.15.140) (Read error: 104 (Connection reset by peer))
- # [23:55] * Joins: weinig (n=weinig@17.203.15.140)
- # [23:59] * Quits: gavin_ (n=gavin@firefox/developer/gavin)
- # [23:59] <tndH> ooh, acronym/initialism debate again
- # [23:59] <tndH> feels nice to read that after all the ogg stuff
- # Session Close: Thu Dec 13 00:00:01 2007