Does Google index XHTML files, served as application/xhtml+xml?

My blog entry about this.

This document is served as text/html so Google can at least index this one and follow the links.

Let's find out by sitting and waiting, and by keeping track of this search.

Document Type Definition?

Most XHTML files in /stuff/xhtml/ have an XHTML 1.1 DTD and aren't indexed by Google (File Format: Unrecognized).

I had noticed before that Google didn't index some XHTML files, but I never knew why. This comment made me think about it again. Does it indeed have to do with the DTD? Some test documents:

  1. XHTML 1.0 with an XML declaration → not indexed
  2. XHTML 1.1 with an XML declaration → not indexed

XML declaration?

Or does it have to do with the XML declaration? I made two other documents:

  1. XHTML 1.0 without an XML declaration → indexed
  2. XHTML 1.1 without an XML declaration → indexed
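To sort test documents into the "with" and "without" groups above, a small check like this could help. It is only a sketch: the function names are my own, and the pattern is a loose match for a leading `<?xml ... ?>` line, not a full XML parser.

```python
import re

# Loose match for an XML declaration at the very start of a document,
# e.g. <?xml version="1.0" encoding="utf-8"?>. An optional byte order
# mark before it is tolerated. This is an approximation, not a parser.
XML_DECL = re.compile(r"^\ufeff?<\?xml\s[^>]*\?>")


def has_xml_declaration(text: str) -> bool:
    """Return True if the document text begins with an XML declaration."""
    return XML_DECL.match(text) is not None


def strip_xml_declaration(text: str) -> str:
    """Return the text without its leading XML declaration, if there is one."""
    return XML_DECL.sub("", text, count=1).lstrip()
```

Running `strip_xml_declaration` over a copy of a document gives the "without an XML declaration" variant of the same file, which is all the two tests above differ in.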

The type attribute?

Perhaps Google sees this attribute and then decides it can't handle the linked document. Tests:

  1. XHTML 1.0, linked with type="application/xhtml+xml" → not indexed
  2. XHTML 1.0, linked without type="application/xhtml+xml" → not indexed
  3. XHTML 1.1, linked with type="application/xhtml+xml" → not indexed
  4. XHTML 1.1, linked without type="application/xhtml+xml" → not indexed

All these documents have an XML declaration.
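To double-check which links on a page actually carry the type attribute, something like the following could be used. It is a rough sketch built on Python's standard `html.parser`; the class and function names and the file names in the usage note are made up for illustration.

```python
from html.parser import HTMLParser


class LinkTypeCollector(HTMLParser):
    """Collect (href, type) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            # The type is None when the link has no type attribute.
            self.links.append((attributes["href"], attributes.get("type")))


def link_types(markup: str):
    """Return the list of (href, type) pairs found in the markup."""
    collector = LinkTypeCollector()
    collector.feed(markup)
    collector.close()
    return collector.links
```

For example, `link_types('<a href="a.xhtml" type="application/xhtml+xml">A</a> <a href="b.xhtml">B</a>')` returns one pair with the type set and one pair with `None`.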


A conclusion

(Made on June 20, 2005)

Hmm, so Google doesn't index XHTML files with an XML declaration. Interesting. Without that declaration they are indexed, even though Google still reports them as an unrecognized file format.

So what about XHTML, with an XML declaration, in text/html?

Makes sense.

Why the unrecognized file format?

That's probably the application/xhtml+xml MIME type. I guess :)
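To tell the two cases apart when looking at a server response, the Content-Type header value can be reduced to its bare media type. This is a minimal sketch with hypothetical function names; it only splits the header string and does not fetch anything.

```python
def media_type(content_type: str) -> str:
    """Return the bare media type from a Content-Type header value.

    Parameters such as "; charset=utf-8" are dropped and the result is
    lower-cased, since media types are case-insensitive.
    """
    return content_type.split(";", 1)[0].strip().lower()


def is_xhtml_mime(content_type: str) -> bool:
    """Return True if the header value names application/xhtml+xml."""
    return media_type(content_type) == "application/xhtml+xml"
```

A document served like this page would give `False` here, while the test documents served as application/xhtml+xml would give `True`.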