HtmlUnit makes it easy to download a file from a URL:
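import com.gargoylesoftware.htmlunit.Page;      // HtmlUnit 2.x; HtmlUnit 3.x moved these classes to org.htmlunit
import com.gargoylesoftware.htmlunit.WebClient;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Downloads the resource at url and decodes it with the charset named by encode (e.g. "UTF-8").
// Returns null if the download or decoding fails.
public static String httpDownload(String url, String encode) {
    WebClient webClient = new WebClient();
    // A plain download needs no ActiveX, JavaScript or CSS processing
    webClient.getOptions().setActiveXNative(false);
    webClient.getOptions().setJavaScriptEnabled(false);
    webClient.getOptions().setCssEnabled(false);
    String temp = null;
    try {
        Page page = webClient.getPage(url);
        // Read all raw bytes first; decoding each 4096-byte chunk on its own
        // would break multi-byte characters that straddle a chunk boundary
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream is = page.getWebResponse().getContentAsStream()) {
            byte[] bytes = new byte[4096];
            int len;
            while ((len = is.read(bytes)) != -1) {
                out.write(bytes, 0, len);
            }
        }
        String content = new String(out.toByteArray(), encode);
        // Replace U+00A0 (bytes 0xC2 0xA0 in UTF-8, what &nbsp; decodes to) with an ordinary space
        temp = content.replace('\u00A0', ' ');
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        webClient.close();
    }
    return temp;
}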
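A note on decoding: when a response is read in fixed-size buffers and each buffer is turned into a String on its own, a multi-byte character (for example a 3-byte CJK character in UTF-8) that straddles a buffer boundary gets split, and the pieces decode to replacement characters (U+FFFD). The standalone sketch below simulates such a read loop on an in-memory byte array rather than a real download; `ChunkDecodeDemo`, `decodePerChunk` and `decodeOnce` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkDecodeDemo {

    // Hypothetical helper: simulates a read loop that decodes every buffer-sized chunk on its own
    static String decodePerChunk(byte[] data, int bufSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += bufSize) {
            int len = Math.min(bufSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Hypothetical helper: collects all chunks first, then decodes the complete byte sequence once
    static String decodeOnce(byte[] data, int bufSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += bufSize) {
            int len = Math.min(bufSize, data.length - off);
            out.write(data, off, len);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two CJK characters (U+4E0B, U+8F7D), 3 bytes each in UTF-8
        String text = "\u4E0B\u8F7D";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer splits the second character across two chunks
        System.out.println(decodePerChunk(data, 4).equals(text)); // false
        System.out.println(decodeOnce(data, 4).equals(text));     // true
    }
}
```

Collecting the bytes in a `ByteArrayOutputStream` and decoding once gives the correct text regardless of the buffer size.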
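Note: this method is tailored to HTML files. It replaces the byte pair 0xC2 0xA0 (the UTF-8 encoding of U+00A0, the non-breaking space that HTML's &nbsp; produces) with an ordinary space. For any other file type, skip this replacement, or the file may no longer open or may come out garbled. Binary files (images, archives and so on) should not be decoded into a String at all, because invalid byte sequences are replaced during decoding; read them as raw bytes instead.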
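The sketch below illustrates this note without any network access: the bytes 0xC2 0xA0 decode to U+00A0 in UTF-8, the replacement turns that into an ordinary space, and applying the same text treatment to non-text data changes its bytes. `NbspDemo`, `normalizeNbsp` and `roundTripAsText` are hypothetical names, and the byte arrays are made-up sample data (0x89 0x50 0x4E 0x47 is the start of the PNG file signature).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NbspDemo {

    // Hypothetical helper: replaces U+00A0 (0xC2 0xA0 in UTF-8) with an ordinary space
    static String normalizeNbsp(String html) {
        return html.replace('\u00A0', ' ');
    }

    // Hypothetical helper: decodes bytes as UTF-8 text and encodes them back,
    // which is what happens when a file is handled as a String
    static byte[] roundTripAsText(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] nbspBytes = { (byte) 0xC2, (byte) 0xA0 };
        String nbsp = new String(nbspBytes, StandardCharsets.UTF_8);
        System.out.println(nbsp.equals("\u00A0"));          // true
        System.out.println(normalizeNbsp("a" + nbsp + "b")); // a b

        // Binary data that happens to contain 0xC2 0xA0 loses a byte after the replacement
        byte[] data = { 0x01, (byte) 0xC2, (byte) 0xA0, 0x02 };
        byte[] replaced = normalizeNbsp(new String(data, StandardCharsets.UTF_8))
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(replaced.length); // 3

        // A lone 0x89 is not valid UTF-8, so decoding alone already alters the bytes
        byte[] png = { (byte) 0x89, 0x50, 0x4E, 0x47 };
        System.out.println(Arrays.equals(png, roundTripAsText(png))); // false
    }
}
```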