Making JSP pages that use a Unicode encoding is really easy. You only need to do a few things to be successful.
For starters, you need to understand what happens when you make a JSP page. JSPs are really Java programs: the JSP container reads the source file and generates a small Java class (a servlet) from it. Basically, all that generated program does is write a byte stream back to the browser using an OutputStreamWriter.
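To make the "characters in, bytes out" step concrete, here is a minimal sketch in plain Java. It is not the code a JSP container generates; the class name WriterEncodingDemo and the render helper are made up for illustration. It only shows that the same String becomes different bytes depending on the encoding handed to the OutputStreamWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any JSP container.
final class WriterEncodingDemo {

    // Turns characters into bytes using the given encoding,
    // which is the job the response writer does for a page.
    static byte[] render(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, the last one is e-acute
        System.out.println(render(text, StandardCharsets.UTF_8).length);      // 5
        System.out.println(render(text, StandardCharsets.ISO_8859_1).length); // 4
    }
}
```

The accented letter takes two bytes in UTF-8 and one in Latin-1, so the browser has to be told which of the two it is receiving.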
Internally, all Java String objects are Unicode UTF-16. So you are working with Unicode within the JSP page, regardless of the encoding. What you need to do to have a working page is:
- Convert parameters (data sent to your program) to String objects using the correct encoding.
- Read the JSP source file using the right encoding.
- Tell the JSP compiler what encoding you want to use when sending the file to the browser.
Here’s what you need to do for each of these.
First, we need to read any input using the right encoding. This requires that you tell the ServletRequest object what encoding to use, by calling its setCharacterEncoding method, before you read any data from it. Once you have read data from the request, the encoding can no longer be changed.
If all of your pages use UTF-8, then you can skip the above step. Be careful, too, of pages that link to yours but which are not part of your application. If an external page submits a form to you and it doesn’t use UTF-8, you’ll get null strings when you ask for parameters (since the UTF-8 conversion will silently fail).
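The mismatch described above can be reproduced outside a container. The sketch below uses java.net.URLDecoder to stand in for the container's parameter parsing; the class ParameterDecodingDemo and its decode helper are hypothetical. Plain Java substitutes replacement characters rather than returning null, so treat this as an illustration of the mismatch, not of what a particular container does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container here.
final class ParameterDecodingDemo {

    // Decodes a raw form value using the named encoding.
    static String decode(String rawValue, String encoding) {
        try {
            return URLDecoder.decode(rawValue, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String fromUtf8Form = "%C3%A9"; // e-acute as submitted by a UTF-8 page
        String fromLatin1Form = "%E9";  // e-acute as submitted by a Latin-1 page

        // Matching encoding: the original character comes back.
        System.out.println(decode(fromUtf8Form, "UTF-8").equals("\u00e9"));            // true
        // UTF-8 bytes read as Latin-1: two wrong characters instead of one.
        System.out.println(decode(fromUtf8Form, "ISO-8859-1").equals("\u00c3\u00a9")); // true
        // Latin-1 byte read as UTF-8: the character is lost.
        System.out.println(decode(fromLatin1Form, "UTF-8").equals("\u00e9"));          // false
    }
}
```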
Next, we need to tell the system what encoding the page is written in. This is not necessarily the same as the encoding the page will be in when you serve it: this directive only tells the JSP system how to read the bytes in the .jsp file. You may find that it is easier to use a legacy (non-Unicode) encoding to author your pages, since many IDEs and editors don’t support UTF-8 or make it difficult to work with.
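The pageEncoding attribute of the page directive controls this encoding, for example <%@ page pageEncoding="UTF-8" %> for a source file saved as UTF-8.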
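To see why this setting matters, here is a small sketch of what happens when source bytes are read with the right and the wrong encoding. It is plain Java, not container code; SourceEncodingDemo and readSource are names invented for the example, and the byte array stands in for a .jsp file saved as Latin-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte array plays the role of a file on disk.
final class SourceEncodingDemo {

    // Interprets the raw bytes of a source file using the declared encoding.
    static String readSource(byte[] fileBytes, Charset pageEncoding) {
        return new String(fileBytes, pageEncoding);
    }

    public static void main(String[] args) {
        String authored = "na\u00efve"; // text containing i-diaeresis
        byte[] onDisk = authored.getBytes(StandardCharsets.ISO_8859_1);

        // Declared encoding matches the file: the text survives.
        System.out.println(readSource(onDisk, StandardCharsets.ISO_8859_1).equals(authored)); // true
        // Declared encoding does not match: the accented letter is corrupted.
        System.out.println(readSource(onDisk, StandardCharsets.UTF_8).equals(authored));      // false
    }
}
```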
Finally, we need to control the encoding that the page uses when it is sent to the browser. This has several parts to it, since you need to set things in several places. First, you should include a META tag in your HTML markup so that end users can see it. The other things we are doing to the page are more than adequate to make the page work correctly, but end users can sometimes debug page display problems (such as when they manually override the encoding and get junk) by looking at the tag. Here’s what a META tag looks like:
<META http-equiv="Content-Type" content="text/html;charset=UTF-8">
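Then you need to tell the page compiler what encoding you want to use, which you do with the contentType attribute of the page directive, for example <%@ page contentType="text/html; charset=UTF-8" %>. This has two effects. First, it sets the actual encoding used to write the response. Second, it sets the charset in the HTTP Content-Type header.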
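As a rough illustration of how a charset is carried inside a Content-Type value, the sketch below pulls it out of the string and falls back to Latin-1 when none is given. This is a simplified, hypothetical helper (ContentTypeDemo.charsetOf), not the parsing any real container performs, and it ignores edge cases in the header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class; a simplified reading of a Content-Type value.
final class ContentTypeDemo {

    // Returns the charset named in the value, or ISO-8859-1 if there is none.
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop optional quotes around the charset name.
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                return Charset.forName(name);
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8").name()); // UTF-8
        System.out.println(charsetOf("text/html").name());                // ISO-8859-1
    }
}
```

The fallback mirrors the behaviour described for pages that specify no charset at all, which end up in Latin-1.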
Note that XML files are handled similarly. XML doesn’t use a META tag, but you can (and should) set the encoding attribute in the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
There are some caveats about setting the encoding explicitly to UTF-8 using the contentType page directive. The big one is: prior to J2EE 1.4 (Servlet 2.4), the JSTL and other taglib directives related to getting and setting the page Locale caused the page to change encoding to one inferred by the Locale. In J2EE 1.4, this is fixed (so that the page directive takes precedence over any implicit encoding), but for now you need to be careful of setting the page Locale. We’ll examine that on another page.
No Charset Specified: Demonstrates how not using the page directives results in a page that uses Latin-1.
Inferred Charset: Demonstrates how an inferred character encoding (from response.setLocale() in this case) overrides the contentType page directive on Servlet 2.3 and earlier. Note that using any of the fmt tags in JSTL in your page will give you this same result.
Includes: Demonstrates the <%@ include %> directive and the <jsp:include /> action.