It’s all about the Content-Type

25 October 2003

I feel kinda guilty for saying this, but what the heck. Simon Jessey points to an article he wrote: Serving up XHTML with the correct MIME type. This has some problems.

Just look at the code:

<?php
$charset = "iso-8859-1";
$prolog_type = "";

if(
    stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml") ||
    stristr($_SERVER["HTTP_USER_AGENT"],"W3C_Validator")) {
    $mime = "application/xhtml+xml";
    $prolog_type = "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en-US\" lang=\"en-US\">\n";
}
else {
    $mime = "text/html";
    $prolog_type = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html lang=\"en-US\">\n";
}

header("content-type:$mime;charset=$charset");
print $prolog_type;
?>

Now, how many ways can this cause problems?

  1. It sends XML to the validator, but also to every other user-agent. Even a user-agent that doesn’t match the conditional is still sent content that uses the XML empty element syntax. Unless, of course, you don’t use any empty elements.
  2. There’s a variable $charset, but the XML prolog hard-codes its own encoding instead of using it, so the variable is basically pointless. If you change the value of $charset to, say, utf-8, you’ll send a Content-Type with a charset of utf-8 and an XML prolog declaring iso-8859-1.
  3. The language is fixed at en-US. Not only is this unnecessarily specific (English is reasonably malleable, so why not just en?), but I could write French, German, Swedish, etc. in the iso-8859-1 encoding.
  4. Not a problem per se, but why two transitional DOCTYPEs? Are there elements in XHTML 1.0 Transitional not in HTML 4 Strict?
  5. Oh, and I was always under the impression that RFC key words should be uppercased.
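
To make the second point concrete, here is a minimal sketch of one way to keep the two encodings in step: build both the XML declaration and the Content-Type header from the same $charset value. The helper names xml_declaration and content_type_header are made up for illustration and do not come from Simon's article.

```php
<?php
// Hypothetical helper (not from the original article): builds the XML
// declaration from the same charset that goes into the HTTP header.
function xml_declaration(string $charset): string
{
    return "<?xml version=\"1.0\" encoding=\"" . $charset . "\" ?>\n";
}

// Hypothetical helper: builds the Content-Type header line.
function content_type_header(string $mime, string $charset): string
{
    return "Content-Type: " . $mime . "; charset=" . $charset;
}

// Change the charset in one place and both outputs follow.
$charset = "utf-8";
$mime = "application/xhtml+xml";

$declaration = xml_declaration($charset);
$header = content_type_header($mime, $charset);

// In a real page you would then call header($header) and print
// $declaration ahead of the DOCTYPE.
```

With this arrangement, switching $charset to utf-8 changes the header and the declaration together, so they can no longer disagree.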

Comments

  1. Slightly altered

    I rather rushed that article out late last night, after being asked to try and get it done, so it isn't perfect.

    I am not sure what you mean by your first point. Is there a flaw in my conditional logic?

    I fixed the $charset issue. Stupid of me. I also switched the language to simply "en". Using en-US was a habit I picked up when I moved to the US from England. I was sort of making a point LOL.

    I used Transitional doctypes because the website is a business site, and I may get lazy and use the odd deprecated attribute from time to time. jessey.net is done in XHTML 1.1.

    Fair point about RFC keywords. I did say it was a "re-working" of the table, but I altered it to reflect the correct case. I lowercased it before because I HATE TO SEE LOTS OF UPPERCASE TEXT.

    Posted by Simon Jessey on 25 October 2003 at 14:23:10.

  2. Re: I, IV

    I mean that if you have, for example, a link element, it will use the XML empty element syntax (<link/>). That is not valid HTML--ask the WDG validator.

    My point about Transitional DOCTYPEs was that if the site is XHTML 1.0 Transitional, is it not also valid HTML 4.01 Strict?

    Thanks for fixing the charset/language thing.

    Posted by Sean on 25 October 2003 at 14:42:33.

  3. I, IV

    Hmmm. I understand now. Bill Mason at Accessible Internet also pointed it out to me. I am not skilled in the art of transformation, so it may be that switching XHTML to HTML (as well as switching MIME types) is a little ambitious. I'll have to give it some thought...

    By the way, I have credited you for your assistance at the end of the article.

    Posted by Simon Jessey on 25 October 2003 at 14:55:30.

  4. Empty elements

    Well, I think you could use PHP's output buffering to simply replace '/>' with '>'. I'm not so good with PHP, but ob_start looks like it can do what you would need.

    Posted by Sean on 25 October 2003 at 15:7:8.

  5. Fixed HTML

    Your suggestion worked out. I have altered the article to add a function that strips out the trailing slashes from the buffered page. Thank you for your help.

    Posted by Simon Jessey on 25 October 2003 at 15:44:52.

  6. Re: transitional issue

    XHTML1.0 Transitional is almost equal to HTML4.01 Transitional, so elements like 'font' are still allowed ;).

    Posted by Anne on 27 October 2003 at 12:29:55.
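
For reference, the output buffering idea from comment 4 could look something like the sketch below. This is not the function Simon added to his article; the names strip_empty_element_slashes and start_html_fallback are hypothetical, and the replacement is deliberately naive: it will also rewrite any "/>" that happens to appear in text content, scripts, or attribute values.

```php
<?php
// Hypothetical callback: turns XML empty-element endings such as
// "<br />" or "<link/>" into HTML-style ">" endings.
function strip_empty_element_slashes(string $buffer): string
{
    // Optional whitespace followed by "/>" becomes ">".
    // Fall back to the untouched buffer if preg_replace() reports an error.
    return preg_replace('#\s*/>#', '>', $buffer) ?? $buffer;
}

// Hypothetical wrapper: only buffer and rewrite the page when it is
// being served as text/html. Returns whether buffering was started.
function start_html_fallback(string $mime): bool
{
    if ($mime !== "text/html") {
        return false;
    }
    // ob_start() passes the buffered page to the callback when the
    // buffer is flushed, and sends the callback's return value instead.
    return ob_start('strip_empty_element_slashes');
}

// Direct use of the callback on a sample fragment:
$sample = '<link rel="stylesheet" href="style.css" /><br/>';
$html = strip_empty_element_slashes($sample);
```

Calling start_html_fallback($mime) before any output is printed would apply the rewrite to the whole page, but only for user-agents that were given text/html.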