Learning Regular Expressions: Extracting All Hyperlinks from a URL

The following example shows how to use a regular expression to find and print every hyperlink, that is, every opening anchor tag with an href attribute, in the page at a given URL.

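First, the URL is taken from the command line, an input stream is opened on it, and the content is read and stored as a single string in htmlString. A regular expression that matches an opening anchor tag is then compiled, for example "(<a\\s+href[^>]*>)". Finally, htmlString is searched for every substring that matches.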
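The matching step can be tried on its own, without a network connection. The following is a minimal sketch, assuming a pattern of the form "(<a\\s+href[^>]*>)". The class name FindTagsDemo, the method findAnchors, and the sample HTML string are made up for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTagsDemo {
    // Assumed pattern: an opening <a> tag whose first attribute is href
    static final Pattern ANCHOR = Pattern.compile("(<a\\s+href[^>]*>)");

    // Collect every substring of html that matches the pattern, in order
    static List<String> findAnchors(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {          // each call to find() moves to the next match
            tags.add(m.group(1));   // group 1 is the parenthesized tag
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for a downloaded page
        String html = "<p><a href=\"a.html\">A</a> and "
                    + "<a href=\"b.html\" target=\"_blank\">B</a></p>";
        for (String tag : findAnchors(html)) {
            System.out.println(tag);
        }
    }
}
```

Because [^>]* stops at the first closing angle bracket, each match covers exactly one opening tag, and closing tags such as </a> are skipped. The complete program, which also downloads the page, follows.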

import java.io.*;
import java.net.*;
import java.util.regex.*;

public class GetHref {
    public static void main(String[] args) {
        InputStream in = null;
        PrintWriter out = null;
        String htmlString = null;
        try {
            // Check the arguments: a URL, optionally followed by an output file name
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of args");

            // Set up the streams
            URL url = new URL(args[0]);   // Create the URL
            in = url.openStream();        // Open a stream to it
            if (args.length == 2)         // Optionally save a copy of the page
                out = new PrintWriter(new FileWriter(args[1]));
            BufferedReader bin = new BufferedReader(new InputStreamReader(in));
            String line;
            StringBuilder sb = new StringBuilder();
            while ((line = bin.readLine()) != null) {
                if (out != null) out.println(line);
                // Keep the line break so text on adjacent lines is not fused together
                sb.append(line).append('\n');
            }
            htmlString = sb.toString();
        }
        // On exceptions, print error message and usage message.
        catch (Exception e) {
            System.err.println(e);
            System.err.println("Usage: java GetHref <URL> [<filename>]");
        }
        finally { // Always close the streams, no matter what.
            try { if (in != null) in.close(); } catch (Exception e) {}
            if (out != null) out.close();
        }

        // If reading failed there is nothing to search
        if (htmlString == null) return;

        // Match each opening anchor tag that starts with an href attribute.
        // The pattern in the source listing was damaged; this form is a reconstruction.
        Pattern p = Pattern.compile("(<a\\s+href[^>]*>)");
        Matcher m = p.matcher(htmlString);
        while (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(m.group(i));
            }
        }
    }
}
Program output:
C:\java>java GetHref http://127.0.0.1:8080/zz3zcwbwebhome/index.jsp

w.zzedu.gov.cn','java学习室')">
page)'this.setHomePage('http://10.10.1.1/index.jsp');">
..........................
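
The program above prints whole tags. If only the link targets are wanted, a capture group around the attribute value is one option. The following is a rough sketch, assuming href values are enclosed in double or single quotes. The class HrefValues and the method extract are hypothetical names, and since a regular expression is not a full HTML parser, unusual markup may be missed or mismatched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefValues {
    // "<a", optional other attributes, then href = "value" or 'value'.
    // Group 1 is the opening quote, group 2 is the value up to the same quote.
    // CASE_INSENSITIVE also accepts tags written as <A HREF=...>.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Return the href value of every anchor tag found in html, in order
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String html = "<A HREF=\"http://example.com/\">x</A> "
                    + "<a class=\"c\" href='y.html'>y</a>";
        for (String link : extract(html)) {
            System.out.println(link);
        }
    }
}
```

For the sample string this prints http://example.com/ and y.html on separate lines. The backreference \1 makes the value end at the same kind of quote that opened it, so a single quote inside a double-quoted value does not cut the match short.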