Saturday, October 9, 2010

Charset in the Content-Type Header with the Tomcat/Apache Pair

When a resource is served by Apache httpd via Tomcat, the Content-Type header is filled with the value defined in the mime-mapping element. You can find this value in the default web.xml under TOMCAT_HOME/conf.
For a resource like an HTML page, the Content-Type header is text/html, regardless of the character encoding the page was written in.
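For reference, the html entry in Tomcat's default conf/web.xml looks roughly like this (the exact indentation and surrounding entries may differ between Tomcat versions). Note that the mime-type carries no charset parameter:

```xml
<!-- TOMCAT_HOME/conf/web.xml: default mapping for .html, without a charset parameter -->
<mime-mapping>
    <extension>html</extension>
    <mime-type>text/html</mime-type>
</mime-mapping>
```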
Your Apache, in the configuration shipped by many distributions, is set up with the AddDefaultCharset directive (on my Fedora 13 it is UTF-8, but this may vary with the version and distribution you use), which adds a charset declaration to the Content-Type HTTP header if the header doesn't already have one.

The problem:
If you write an HTML page in ISO-8859-1 and correctly set the meta tag in the page as:
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1">
you get two different behaviours depending on whether your browser gets the page directly from Tomcat or from Apache via mod_jk:
1) If your browser talks directly to Tomcat, it can detect the encoding from the meta tag in the HTML page, because the Content-Type header sent by Tomcat contains no charset.
2) If you get the page from Apache, Apache adds the charset declaration to the Content-Type HTTP header with its default value from httpd.conf. Since browsers give the HTTP header precedence over the meta tag, you end up serving an ISO-8859-1 page that your server declares as UTF-8.

There are two ways to fix this mismatch:
1) Modify Tomcat's mime-mapping declarations (in the default web.xml, or by overriding the default mapping in your webapp's web.xml), adding the charset your files are actually written in (UTF-8 in this example):
   <!-- the charset must match the encoding of the served files -->
   <mime-mapping>
     <extension>html</extension>
     <mime-type>text/html; charset=utf-8</mime-type>
   </mime-mapping>
   <mime-mapping>
     <extension>css</extension>
     <mime-type>text/css; charset=utf-8</mime-type>
   </mime-mapping>
   <mime-mapping>
     <extension>js</extension>
     <mime-type>text/javascript; charset=utf-8</mime-type>
   </mime-mapping>
2) Change the default charset in Apache httpd. Note that this works only for content of type text/plain or text/html, and the AddCharset and AddType directives, used to force the charset on other file types (say .js or .css), don't work for resources coming from Tomcat via mod_jk.

I prefer the first way for two reasons: the known limitation explained in the second point, and the greater flexibility of per-webapp web.xml configuration (for example, you can serve HTML pages from two different webapps with different charsets on the same virtual host).
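As a sketch of that last case, assume two hypothetical webapps named app-utf8 and app-latin1 deployed under the same virtual host. Each one's WEB-INF/web.xml can override the default html mapping with its own charset. Only the relevant element is shown; it goes inside the webapp's <web-app> element, whose namespace and version attributes depend on the Servlet version you target:

```xml
<!-- app-utf8/WEB-INF/web.xml (hypothetical webapp): pages written in UTF-8 -->
<mime-mapping>
    <extension>html</extension>
    <mime-type>text/html; charset=utf-8</mime-type>
</mime-mapping>

<!-- app-latin1/WEB-INF/web.xml (hypothetical webapp): pages written in ISO-8859-1 -->
<mime-mapping>
    <extension>html</extension>
    <mime-type>text/html; charset=iso-8859-1</mime-type>
</mime-mapping>
```

With these mappings the Content-Type header produced by Tomcat already contains a charset, so Apache's AddDefaultCharset leaves it untouched and each webapp keeps its own encoding.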

References:
AddDefaultCharset directive (Apache httpd core)
AddCharset directive (Apache httpd mod_mime)