
Encoding problem when downloading a web page


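I use the function below to fetch a web page as a string, write that string to a JDataStore database, and then read it back from the database into a JTable. Pages encoded in GB2312 work fine, but a UTF-8 page (Yahoo's search page) shows a strange problem. When I open the page in IE, I can see a line like this: ; but when I fetch the page with the download function, that line becomes: ; and every place that should show Chinese text is replaced by blank characters.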
public static String getHtmlText(String strUrl) {
    if (strUrl == null || strUrl.length() == 0) {
        return null;
    }

    String strHtml = "";
    String strLine = "";
    try {

        // Connect to the network and fetch the page source
        URL url = new URL(strUrl);
        HttpURLConnection pconn = (HttpURLConnection) url.openConnection();
        pconn.addRequestProperty("User-Agent", "IcewolfHttp/1.0");
        pconn.addRequestProperty("Accept",
                "www/source; text/html; image/gif; */*");

        pconn.connect();
        //System.out.println("Connect status:"+pconn.getResponseCode());
        //if(HttpURLConnection.HTTP_ACCEPTED == pconn.getResponseCode())
        //InputStream in = url.openConnection();

        InputStream in = pconn.getInputStream();
        //System.out.println("Get status:"+pconn.getResponseCode());
        BufferedInputStream buff = new BufferedInputStream(in);
        // No charset is given here, so the platform default encoding is used
        Reader r = new InputStreamReader(buff);
        BufferedReader br = new BufferedReader(r);

        while ( (strLine = br.readLine()) != null) {
            strHtml += strLine;
        }
        //strHtml = UnToWC(strHtml);
        br.close();
        buff.close();
        in.close();
        pconn.disconnect();
    }
    catch (MalformedURLException mfe) {
        System.err.println("url is not a parsable URL");
    }
    catch (IOException ioe) {
        System.err.println(ioe);
    }

    return strHtml;
}

Try specifying UTF-8 as the encoding when you create the reader.
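A minimal sketch of what that suggestion means, using an in-memory byte array instead of a real connection so it runs without network access. The class name ExplicitCharsetSketch and the helper readAll are made up for illustration and are not part of the original function. It decodes the same UTF-8 bytes twice, once with UTF-8 and once with GB2312 (falling back to ISO-8859-1 if the JVM does not ship GB2312), to show that only the matching charset recovers the Chinese text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {

    // Hypothetical helper: reads the whole stream as text,
    // decoding the bytes with the charset passed in.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n'); // re-append the line break that readLine() strips
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "\u4e2d\u6587" is the two-character string for "Chinese", written with
        // escapes so the example does not depend on the source file encoding.
        String original = "\u4e2d\u6587 test";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: GB2312 stands in for the default charset of a Chinese Windows machine.
        Charset mismatched = Charset.isSupported("GB2312")
                ? Charset.forName("GB2312")
                : StandardCharsets.ISO_8859_1;

        String decodedRight = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String decodedWrong = readAll(new ByteArrayInputStream(utf8Bytes), mismatched);

        System.out.println(decodedRight.trim().equals(original)); // prints true
        System.out.println(decodedWrong.trim().equals(original)); // prints false
    }
}
```

In the function from the question, the equivalent change would be to pass a charset as the second argument of the InputStreamReader constructor.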

OK, I'll try the suggestion above first.

Reader r = new InputStreamReader(buff, "UTF-8");
Is this what you mean? I tried it, and it made no difference.

I found that only Yahoo has the problem. Other UTF-8 pages, for example AOL's, convert correctly with Reader r = new InputStreamReader(buff, "UTF-8"); and show no problem at all. Very strange.
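The thread never establishes why Yahoo alone fails. One thing that may be worth checking, offered as an assumption and not as a diagnosis, is which charset the server actually declares in its Content-Type response header, rather than hard-coding "UTF-8". The sketch below parses that header value; the class name ContentTypeCharsetSketch and the method charsetFromContentType are hypothetical names introduced for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharsetSketch {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value such as "text/html; charset=UTF-8". Returns the fallback
    // when the header is missing, has no charset, or names an unknown charset.
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check that this parameter starts with "charset="
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetFromContentType("text/html; charset=UTF-8", fallback)); // prints UTF-8
        System.out.println(charsetFromContentType("text/html", fallback));                // prints ISO-8859-1
    }
}
```

With the connection from the question, the header value would come from pconn.getContentType(), and the resulting Charset could be passed to new InputStreamReader(buff, charset). Pages that declare their encoding only in an HTML meta tag are not covered by this sketch.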
