Thursday, October 27, 2011

i18n : JSP and MySQL Internationalization


I) JSP Character Set Handling

Character handling can be split into two categories: displaying the characters and receiving the ones entered by the user.  For JSP pages, setting the encoding for each category is done separately by using different directives and/or functions.

Displaying International Characters

i. To set the character encoding for JSP page display, use the standard “page” directive with the “contentType” attribute (and “pageEncoding” for the encoding of the JSP source file itself) as follows:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>

The page directive controls the structure of the servlet generated from a JSP: it imports classes, customizes the superclass, sets the content type, and so on.

ii. The following HTML tag can be used for the web browser to load the correct character set:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Meta tags with an http-equiv attribute are the in-document equivalent of HTTP headers. Browsers use them to refine the information provided by the actual headers; when the server sends a charset in the real Content-Type header, that header takes precedence over the meta tag.

iii. Avoid using Java String functions (or constructors) such as the following to convert the character encoding of a string; doing so is both inefficient and unnecessary. Once you set all the options mentioned in this article correctly, you will not need such a conversion.

str = new String(request.getParameter("value").getBytes("ISO-8859-1"), "UTF-8");
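To see why this conversion should only ever be a stopgap, the sketch below simulates the problem with plain strings (the class name EncodingRoundTrip and its methods are illustrative, not part of the original setup). It mimics a container that decoded UTF-8 bytes as ISO-8859-1, applies the same repair, and then shows that the repair damages text that was already decoded correctly.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: simulates a mis-decoded request parameter with plain strings.
class EncodingRoundTrip {

    // Simulates a container that reads UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The repair from the snippet above, written with charset constants.
    static String repair(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";            // "café"
        String garbled = misdecode(original);      // the 2-byte 'é' turns into two characters

        System.out.println(garbled.length());                    // 5
        System.out.println(repair(garbled).equals(original));    // true
        // Applying the repair to correctly decoded text corrupts it.
        System.out.println(repair(original).equals(original));   // false
    }
}
```

In other words, the conversion only helps while the request encoding is misconfigured, and it has to be removed once the configuration is fixed.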

iv. The request character encoding has to be set on every request, before any parameter is read. The best way to do this is to create a filter that runs for every request and to register it in web.xml:

CharacterEncodingFilter.java

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.ServletException;
import javax.servlet.Filter;
import javax.servlet.FilterConfig;
import javax.servlet.FilterChain;

import java.io.IOException;

public class CharacterEncodingFilter implements Filter
{

    private FilterConfig fc;

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException
    {

        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Must be set before any parameter is read; it has no effect afterwards.
        request.setCharacterEncoding("UTF-8");
        // Set the response encoding before the JSP or servlet obtains its writer.
        response.setContentType("text/html; charset=UTF-8");

        chain.doFilter(request, response);

        // Setting the encoding again here would have no effect: the request
        // parameters have already been parsed and the response is usually committed.
    }

    public void init(FilterConfig filterConfig)
    {

        this.fc = filterConfig;
    }

    public void destroy()
    {

        this.fc = null;
    }
}
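The effect of request.setCharacterEncoding can be illustrated without a servlet container. The sketch below uses java.net.URLDecoder as a stand-in for the container's parameter parsing (an assumption for illustration; real containers use their own parsers, and the class name FormDecodingDemo is hypothetical). The same percent-encoded form value decodes correctly only when the charset matches the one the browser used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative stand-in for what a container does when it parses form parameters.
class FormDecodingDemo {

    // Decodes one percent-encoded form value with the given charset name.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser submitting "é" from a UTF-8 page sends the bytes C3 A9.
        String raw = "%C3%A9";

        String right = decode(raw, "UTF-8");        // one character: U+00E9
        String wrong = decode(raw, "ISO-8859-1");   // two characters: U+00C3 U+00A9

        System.out.println(right.length());   // 1
        System.out.println(wrong.length());   // 2
    }
}
```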


web.xml


<filter>
  <filter-name>CharacterEncodingFilter</filter-name>
  <filter-class>com.its.struts.action.CharacterEncodingFilter</filter-class>
</filter>

<filter-mapping>
  <filter-name>CharacterEncodingFilter</filter-name>
  <servlet-name>action</servlet-name>
</filter-mapping>

<filter-mapping>
  <filter-name>CharacterEncodingFilter</filter-name>
  <url-pattern>*.jsp</url-pattern>
</filter-mapping>

<filter-mapping>
  <filter-name>CharacterEncodingFilter</filter-name>
  <url-pattern>*.html</url-pattern>
</filter-mapping>

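The <servlet-name>action</servlet-name> mapping applies the filter to the Struts action servlet, so it is needed for a STRUTS application.
For applications that do not use STRUTS, map the filter to the URL pattern of the servlets instead, so that all requests pass through it: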

<filter-mapping>
  <filter-name>CharacterEncodingFilter</filter-name>
  <url-pattern>/servlets/*</url-pattern>
</filter-mapping>


v. Java Mail: Finally comes the easiest part: sending e-mails with the subject and body in UTF-8. The only thing you have to do here is use MimeMessage and pass the charset as an additional parameter when setting the subject and text of your message:

(…)
MimeMessage msg = new MimeMessage(session);
msg.setFrom(InternetAddress.parse(from, false)[0]);
msg.setSentDate(new Date());
msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse(to, false));
msg.setSubject(subject, "UTF-8");
msg.setText(body, "UTF-8");
transport.sendMessage(msg, msg.getAllRecipients());
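For intuition about why the charset argument matters for the subject: mail headers are limited to ASCII, so non-ASCII subjects travel as RFC 2047 "encoded-words". The sketch below builds and reads back one such word by hand with the JDK only. It is a simplified illustration, not what JavaMail does internally: JavaMail chooses the encoding variant and handles line folding itself, and the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Simplified illustration of an RFC 2047 "B" (Base64) encoded-word.
// Ignores the 75-character limit and header folding that real mailers apply.
class EncodedSubjectSketch {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the UTF-8 bytes of the text in a single encoded-word.
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encodeWord for words produced by this sketch.
    static String decodeWord(String word) {
        String base64 = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encodeWord("\u00e9");          // "é"
        System.out.println(encoded);                     // =?UTF-8?B?w6k=?=
        System.out.println(decodeWord(encoded).equals("\u00e9"));   // true
    }
}
```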




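II) MySQL Character Set Handling

For the data entered into and received from the MySQL database server, you can set the default character set and collation at five levels: (i) server, (ii) database, (iii) table, (iv) column, and (v) connection. Note that MySQL's utf8 character set stores at most three bytes per character; MySQL 5.5.3 and later also provide utf8mb4 for characters outside the Basic Multilingual Plane. More information and an example for each level follow: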

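i. When you start the database server: add the following lines to the my.ini file (my.cnf on Unix-like systems) to apply the settings at startup. The server options belong in the [mysqld] section; default-character-set is a client option and belongs in the [client] section.

[mysqld]
character_set_server=utf8
collation_server=utf8_unicode_ci

[client]
default-character-set=utf8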


ii. When you are creating the database (or with an ALTER DATABASE statement after creation):

CREATE DATABASE db_name
DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

iii. When you are creating the table (or with an ALTER TABLE statement after creation):

CREATE TABLE tbl_name (column_list)
DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

iv. When you are describing the columns during table creation:

CREATE TABLE tbl_name
(
clm_name VARCHAR(5) CHARACTER SET utf8 COLLATE utf8_unicode_ci
);

v. When you are creating the connection:

conn.createStatement().execute(" SET NAMES 'utf8' ");
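As an alternative to issuing SET NAMES by hand, MySQL Connector/J accepts the useUnicode and characterEncoding connection properties, and its documentation advises configuring the encoding this way so that the driver knows which character set is in use. The sketch below only assembles the properties; the helper name utf8Properties and the credentials are placeholders, not part of the original setup.

```java
import java.util.Properties;

// Builds JDBC connection properties that ask Connector/J to use UTF-8.
class Utf8ConnectionProps {

    // Hypothetical helper: user and password are supplied by the caller.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = utf8Properties("app_user", "secret");
        System.out.println(props.getProperty("characterEncoding"));   // UTF-8
    }
}
```

With the MySQL driver on the classpath, the properties would be passed as DriverManager.getConnection("jdbc:mysql://localhost:3306/db_name", props), where the host, port, and database name are again placeholders.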

