Encoding Issues in Web Applications
</td>
</tr>
<tr>
<td height="35" valign="top" class="ArticleTeitle">
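Java was designed with internationalized applications in mind, so a Servlet should switch its character set configuration automatically according to the browser's language settings.
First, one concept: even in a Java-based web application, what travels between the server and the client is still a byte stream. Suppose I submit the four Chinese characters “世界你好” from a form in a browser on a Chinese-locale client. The browser first encodes them as GBK into the byte stream CA C0 BD E7 C4 E3 BA C3, and those 8 bytes are then converted according to the URL-encoding rules into %CA%C0%BD%E7%C4%E3%BA%C3. When the Servlet on the server receives the request, which charset should it use to decode it, and which charset should it use to encode the byte stream it sends back?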
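The two browser-side steps described above can be reproduced in a few lines of plain Java. This is only a minimal sketch: the class name `GbkFormEncodingDemo` and its helper methods are hypothetical, and it assumes the running JRE ships the GBK charset. The Chinese text is written with Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

class GbkFormEncodingDemo {
    /** Formats bytes as space-separated, upper-case, two-digit hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so that negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** URL-encodes the text after converting it to bytes in the named charset. */
    static String urlEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The four characters from the form example, as Unicode escapes.
        String text = "\u4E16\u754C\u4F60\u597D";
        // Step 1: characters to GBK bytes.
        System.out.println(toHex(text.getBytes(Charset.forName("GBK"))));
        // prints: CA C0 BD E7 C4 E3 BA C3
        // Step 2: bytes to URL-encoded form.
        System.out.println(urlEncode(text, "GBK"));
        // prints: %CA%C0%BD%E7%C4%E3%BA%C3
    }
}
```

Nothing in the encoded request itself says which charset produced these bytes, which is why the server has to decide how to decode them.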
<table width="665" border="0">
<tr>
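<td width="380"> Under the current Servlet specification, unless told otherwise, both the ServletRequest (input submitted from the web) and the ServletResponse (output) default to ISO-8859-1 for decoding and encoding. Note that this default has nothing to do with the locale of the operating system. So even when the server's operating system uses a Chinese locale, the request above is still decoded as ISO-8859-1 into 8 Unicode characters, and on output those characters are encoded as ISO-8859-1 back into the same 8 bytes. A browser set to Chinese will display the result correctly, but the Servlet is really just reading and writing "bytes", not characters. The correct approach is to configure the ServletRequest and ServletResponse according to the client browser, so that input is decoded and output is encoded with the charset of the corresponding language. HelloUnicodeServlet.java is an example that detects the client browser's language settings in this way: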
package examples;
/*
* Che, Dong Email: chedongATbigfoot.com/chedongATchedong.com
* $Id: HelloUnicodeServlet.java,v 1.4
*/
</td>
<td width="275">
</td>
</tr>
</table>
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
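/**
 * A simple Servlet experiment: shows how to detect the client's language and
 * character set settings automatically from the "Accept-Language" HTTP header,
 * decode the request content correctly, and send the content back to the
 * client in the correct encoding.
 *
 * @author Che, Dong
 */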
public class HelloUnicodeServlet extends HttpServlet {
    /**
     * Browser language detection demo
     * @param req HTTP Servlet Request
     * @param res HTTP Servlet Response
     * @throws ServletException servlet error
     * @throws IOException io error
     */
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        // The header can be missing (for example with non-browser clients),
        // and language tags are case-insensitive, so normalize before matching.
        String clientLanguage = req.getHeader("Accept-Language");
        String lang = (clientLanguage == null)
            ? ""
            : clientLanguage.toLowerCase(java.util.Locale.ROOT);
        /*
         * Comment out the following block to disable browser language detection.
         * setCharacterEncoding() must be called before any parameter is read.
         */
        if (lang.startsWith("zh-cn")) {
            // for Simplified Chinese
            req.setCharacterEncoding("GBK");
            res.setContentType("text/html; charset=GBK");
        } else if (lang.startsWith("zh-tw")) {
            // for Traditional Chinese
            req.setCharacterEncoding("BIG5");
            res.setContentType("text/html; charset=BIG5");
        } else {
            // default encoding
            req.setCharacterEncoding("ISO-8859-1");
            res.setContentType("text/html; charset=ISO-8859-1");
        }
        // default hello string
        String hello = "hello world";
        if (req.getParameter("hello") != null) {
            hello = req.getParameter("hello").trim();
        }
        PrintWriter pw = res.getWriter();
        pw.println(
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">");
        pw.println("<html>");
        pw.println("<head>");
        pw.println("<title>" + hello + "</title>");
        pw.println("</head>");
        pw.println();
        pw.println("<body>");
        pw.println();
        pw.println("<p>" + "clientLanguage=" + clientLanguage + "</p>");
        pw.println("<p>'" + hello + "' length=" + hello.length() + "</p>");
        // print current request and response charset encoding
        pw.println("ServletRequest's Charset Encoding = "
            + req.getCharacterEncoding());
        pw.println("<br>");
        pw.println("ServletResponse's Charset Encoding = "
            + res.getCharacterEncoding());
        pw.println("<br>");
        // print char array
        pw.println(getCharArray(hello));
        // show input form
        pw.println("<form action='' method='GET'>");
        pw.println("<input name='hello' value='");
        pw.print("'>");
        pw.println(" <input type='submit'>");
        pw.print("</form>");
        // print system properties
        pw.println("<pre>");
        System.getProperties().list(pw);
        pw.println("</pre>");
        pw.println("</body>");
        pw.println("</html>");
        pw.close();
    }
    /**
     * print char array
     * @param inStr input String
     * @return String output String
     */
    private static String getCharArray(String inStr) {
        char[] myBuffer = inStr.toCharArray();
        StringBuilder sb = new StringBuilder();
        // list each character as byte value, short value, and UnicodeBlock mapping
        for (int i = 0; i < myBuffer.length; i++) {
            // The narrowing casts keep only the low 8 and low 16 bits of the char.
            byte b = (byte) myBuffer[i];
            short s = (short) myBuffer[i];
            // Integer.toHexString sign-extends negative values, so a byte such
            // as -54 is printed as FFFFFFCA rather than CA.
            String hexB = Integer.toHexString(b).toUpperCase();
            String hexS = Integer.toHexString(s).toUpperCase();
            // print char
            sb.append("char[");
            sb.append(i);
            sb.append("]='");
            sb.append(myBuffer[i]);
            sb.append("'\t");
            // byte value
            sb.append("byte=");
            sb.append(b);
            sb.append(" \\u");
            sb.append(hexB);
            sb.append('\t');
            // short value
            sb.append("short=");
            sb.append(s);
            sb.append(" \\u");
            sb.append(hexS);
            sb.append('\t');
            // Unicode Block
            sb.append(Character.UnicodeBlock.of(myBuffer[i]));
            sb.append("<br>");
        }
        return sb.toString();
    }
}
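As an alternative to matching the raw header string, the Servlet API also exposes the client's preferred locale through `ServletRequest.getLocale()`, which the container derives from Accept-Language. The sketch below shows only the mapping step, as a plain function that can be tried without a servlet container. The class `LocaleCharsetMapper` and the method `charsetFor` are hypothetical names, and the mapping simply mirrors the three branches of the servlet above.

```java
import java.util.Locale;

class LocaleCharsetMapper {
    /**
     * Maps a client locale to the charset name used for decoding the request
     * and encoding the response. Falls back to ISO-8859-1, the Servlet default.
     */
    static String charsetFor(Locale locale) {
        if (locale != null && "zh".equals(locale.getLanguage())) {
            String country = locale.getCountry();
            if ("CN".equals(country)) {
                return "GBK";   // Simplified Chinese
            }
            if ("TW".equals(country)) {
                return "Big5";  // Traditional Chinese
            }
        }
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(charsetFor(Locale.SIMPLIFIED_CHINESE));  // prints: GBK
        System.out.println(charsetFor(Locale.TRADITIONAL_CHINESE)); // prints: Big5
        System.out.println(charsetFor(Locale.US));                  // prints: ISO-8859-1
    }
}
```

Inside `doGet`, such a function could be called as `charsetFor(req.getLocale())` before any parameter is read, with the result passed to `req.setCharacterEncoding` and used in the content type.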
When the browser's preferred language is set to Chinese (China) [zh-cn], the output is as follows:
clientLanguage=zh-cn,zh-tw;q=0.8,zh;q=0.7,en-us;q=0.5,en;q=0.3,tr;q=0.2
'hello 世界你好' length=10
ServletRequest's Charset Encoding = GBK
ServletResponse's Charset Encoding = GBK
char[0]='h' byte=104 \u68 short=104 \u68 BASIC_LATIN
char[1]='e' byte=101 \u65 short=101 \u65 BASIC_LATIN
char[2]='l' byte=108 \u6C short=108 \u6C BASIC_LATIN
char[3]='l' byte=108 \u6C short=108 \u6C BASIC_LATIN
char[4]='o' byte=111 \u6F short=111 \u6F BASIC_LATIN
char[5]=' ' byte=32 \u20 short=32 \u20 BASIC_LATIN
char[6]='世' byte=22 \u16 short=19990 \u4E16 CJK_UNIFIED_IDEOGRAPHS
char[7]='界' byte=76 \u4C short=30028 \u754C CJK_UNIFIED_IDEOGRAPHS
char[8]='你' byte=96 \u60 short=20320 \u4F60 CJK_UNIFIED_IDEOGRAPHS
char[9]='好' byte=125 \u7D short=22909 \u597D CJK_UNIFIED_IDEOGRAPHS
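Note that the byte column in this listing is not a GBK byte. Casting a `char` to `byte` keeps only the low 8 bits of the UTF-16 code unit, which is why '世' (U+4E16) shows byte=22, that is, hexadecimal 16. The following minimal sketch reproduces one row of the listing; `CharCastDemo` and `describe` are hypothetical names used only for illustration.

```java
class CharCastDemo {
    /** Rebuilds the byte and short columns of the listing for one character. */
    static String describe(char c) {
        byte b = (byte) c;   // keeps the low 8 bits only
        short s = (short) c; // keeps all 16 bits, reinterpreted as signed
        return "byte=" + b + " \\u" + Integer.toHexString(b).toUpperCase()
            + " short=" + s + " \\u" + Integer.toHexString(s).toUpperCase();
    }

    public static void main(String[] args) {
        // U+4E16 is the first character of the sample text.
        System.out.println(describe('\u4E16'));
    }
}
```

For ASCII characters the cast loses nothing, which is why the first six rows show identical byte and short values.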
When the browser's preferred language is set to English (United States) [en-us], the output is as follows:
clientLanguage=en-us,zh-cn;q=0.8,zh-tw;q=0.7,zh;q=0.5,en;q=0.3,tr;q=0.2
'hello ÊÀ½çÄãºÃ' length=14
ServletRequest's Charset Encoding = ISO-8859-1
ServletResponse's Charset Encoding = ISO-8859-1
char[0]='h' byte=104 \u68 short=104 \u68 BASIC_LATIN
char[1]='e' byte=101 \u65 short=101 \u65 BASIC_LATIN
char[2]='l' byte=108 \u6C short=108 \u6C BASIC_LATIN
char[3]='l' byte=108 \u6C short=108 \u6C BASIC_LATIN
char[4]='o' byte=111 \u6F short=111 \u6F BASIC_LATIN
char[5]=' ' byte=32 \u20 short=32 \u20 BASIC_LATIN
char[6]='Ê' byte=-54 \uFFFFFFCA short=202 \uCA LATIN_1_SUPPLEMENT
char[7]='À' byte=-64 \uFFFFFFC0 short=192 \uC0 LATIN_1_SUPPLEMENT
char[8]='½' byte=-67 \uFFFFFFBD short=189 \uBD LATIN_1_SUPPLEMENT
char[9]='ç' byte=-25 \uFFFFFFE7 short=231 \uE7 LATIN_1_SUPPLEMENT
char[10]='Ä' byte=-60 \uFFFFFFC4 short=196 \uC4 LATIN_1_SUPPLEMENT
char[11]='ã' byte=-29 \uFFFFFFE3 short=227 \uE3 LATIN_1_SUPPLEMENT
char[12]='º' byte=-70 \uFFFFFFBA short=186 \uBA LATIN_1_SUPPLEMENT
char[13]='Ã' byte=-61 \uFFFFFFC3 short=195 \uC3 LATIN_1_SUPPLEMENT
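In this second run, each GBK byte was decoded as one ISO-8859-1 character, so the four Chinese characters arrive as eight Latin-1 characters and the length is 14 instead of 10. ISO-8859-1 maps every byte value to exactly one character, so no information is lost and the original text can still be recovered. The sketch below illustrates this under the assumption that the JRE provides the GBK charset; the class and method names are hypothetical. The FFFFFFCA style values in the byte column come from `Integer.toHexString` sign-extending a negative byte, and masking with 0xFF, as in `unsignedHex`, gives the two-digit form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1RoundTripDemo {
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes the GBK bytes of the text as ISO-8859-1, like the default Servlet behavior. */
    static String misdecode(String original) {
        return new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    /** Turns the Latin-1 characters back into bytes and decodes them as GBK. */
    static String recover(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    /** Prints a byte as two upper-case hex digits, without sign extension. */
    static String unsignedHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        String original = "\u4E16\u754C\u4F60\u597D"; // the four-character sample text
        String garbled = misdecode(original);
        System.out.println(garbled.length());                  // prints: 8
        System.out.println(recover(garbled).equals(original)); // prints: true
        System.out.println(unsignedHex((byte) garbled.charAt(0))); // prints: CA
    }
}
```

This recovery step is a workaround rather than a fix: setting the request encoding correctly in the first place avoids the need for it.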
</td>
</tr>
<tr>