- 用 php 處理注意棚唆,explode()拆分字符串旱函,返回給一個變量便是將拆分出的東西依次存進這個變量资厉,返回給一個list(變量1王污,變量2罢吃,……)則是將拆分出的東西分別存到變量1,變量2昭齐,……尿招。
- 結(jié)合 array_fliter() 過濾的時候,如果不給回調(diào)函數(shù)過濾不干凈的時候阱驾,可以在增加過濾回調(diào)函數(shù)就谜,排除 "\r\n" 和 原本的 FALSE(不寫過濾回調(diào)函數(shù)的時候默認就是這個FALSE)。
- 調(diào)試可用file_put_contents("/home/search/result", var_export(['time'=> date('Y-m-d H:i:s'),'res'=>$result], true)."\n", FILE_APPEND);
<?php
$file=fopen("mission2.txt","r") or exit("Unable to open file!");
$fileCity=fopen("city2.txt","r") or exit("Unable to open file!");
$arr = []; //用不用都行
// 讀取文件每一行啊易,直到文件結(jié)尾
while(!feof($file))
{
//echo fgets($file). "<br>";
$arr[] = fgets($file);
}
while(!feof($fileCity))
{
$city[] = fgets($fileCity);
}
//print_r($arr);
foreach($arr as $a){
list($term[]) = explode(" ",$a);
}
echo '<br>';
print_r($term);
echo '<br>';
$city = array_filter($city);
print_r($city);
function gl($gg){
if($gg == "\r\n" or $gg == FALSE) //要用\r\n這個才能過濾掉某些windows里面的換行吁伺,還需要加上默認的FALSE饮睬,不然也可能濾不干凈
return 0;
else
return 1;
}
foreach($city as $c){
$cityAlone[] = array_filter(explode("\t",$c),"gl"); //array_filter不使用第二個參數(shù)回調(diào)函數(shù)的話默認是過濾FALSE
}
echo '<br>';
echo '<pre>';
print_r($cityAlone);
echo '</pre>';
echo '<br>';
print_r(array_filter($cityAlone[0]));
foreach($term as $key=>$te){
foreach(array_filter($cityAlone[$key]) as $ci){
echo $ci.$te.'<br>';
}
}
fclose($fileCity);
fclose($file);
?>
<?php
ini_set('memory_limit', '512M');
$fileLeimu = fopen("leimu.txt","r") or exit("Unable to open file!");
$fileCity = fopen("city.txt","r") or exit("Unable to open file!");
$fileZaci1 = fopen("zaci1.txt","r") or exit("Unable to open file!");
while(!feof($fileLeimu))
{
$leimu[] = fgets($fileLeimu);
}
while(!feof($fileCity))
{
$city[] = fgets($fileCity);
}
while(!feof($fileZaci1))
{
$zaci1[] = fgets($fileZaci1);
}
function gl($gg){
if($gg == "\r\n" or $gg == FALSE) //要用\r\n這個才能過濾掉某些windows里面的換行租谈,還需要加上默認的FALSE,不然也可能濾不干凈
return 0;
else
return 1;
}
$leimu = array_filter($leimu,"gl");
$city = array_filter($city,"gl");
$zaci1 = array_filter($zaci1,"gl");
$zaci2 = ['上門','找'];
$leimu = array_unique($leimu); //去重
$city = array_unique($city);
$leimu = str_replace(array("\r\n", "\r", "\n"), "", $leimu); //去掉每個字符串元素末尾的換行
$city = str_replace(array("\r\n", "\r", "\n"), "", $city);
$zaci1 = str_replace(array("\r\n", "\r", "\n"), "", $zaci1);
// echo '<pre>';
// print_r($zaci2);
// print_r($leimu);
// print_r($city);
// print_r($zaci1);
// echo '</pre>';
/*
foreach($city as $c){
foreach($leimu as $l){
foreach($zaci1 as $zOne){
$combRes[] = $c.$l.$zOne;
}
}
}
foreach($city as $c){
foreach($leimu as $l){
foreach($zaci2 as $zTwo){
$combRes[] = $c.$zTwo.$l;
}
}
}
foreach($city as $c){
foreach($leimu as $l){
$combRes[] = $c.$l;
}
}
*/
foreach($leimu as $l){
foreach($zaci1 as $zOne){
$combRes[] = $l.$zOne;
}
}
foreach($zaci2 as $zTwo){
foreach($leimu as $l){
$combRes[] = $zTwo.$l;
}
}
foreach($leimu as $l){
$combRes[] = $l;
}
$combRes = implode("\r\n", $combRes); //想要打印到txt中捆愁,能有換行的效果割去,需要添加這個
file_put_contents("456.txt",$combRes);
// echo gettype($combRes);
echo '<pre>';
print_r($combRes);
echo '</pre>';
fclose($fileLeimu);
fclose($fileCity);
fclose($fileZaci1);
獲取頁面某標簽中的內(nèi)容,若是能借助下載simplehtmldom類打開操作的話昼丑,dom方便呻逆。若打不開,則用file_get_contents或者curl(網(wǎng)上說效率更高)菩帝,讀取全文內(nèi)容咖城,然后用正則匹配來做。
<?php
require_once "../classes/simplehtmldom_1_5/simple_html_dom.php";
$mainHtml = 'http://***/index.xml';
//$mainHtml ='http://***/20161222-0.xml'; //為何沒法用file_get_html打開呼奢?
$html = file_get_html($mainHtml); //創(chuàng)建一個DOM
foreach($html->find('loc') as $loc){
$locTextRecord[] = $loc->plaintext;
}
// $htmltest = file_get_contents($locTextRecord[4]);
foreach($locTextRecord as $everyLoc){
$htmltest = file_get_contents($everyLoc);
$reg = '/\<catg\>(.*?)\<\/catg\>/is';
if(preg_match_all($reg, $htmltest, $arr)) {
foreach($arr[1] as $a){
$record[] = $a;
}
} else {
echo "匹配失敗!<br>";
}
}
$record = array_unique($record);
foreach($record as $r){
var_dump($r); //利用 var_dump 來查看變量類型宜雀,可以調(diào)試和直接在網(wǎng)頁上復制用。
}
$html->clear();
?>
一個主xml下含很多<loc>子標簽握础,在所有xml中辐董,查找某個字符串:
<?php
$mainHtml = file_get_contents('http://***/tp_index.xml');
$reg = '/\<loc\>(.*?)\<\/loc\>/is';
if(preg_match_all($reg, $mainHtml, $arr)) {
foreach($arr[1] as $a){
$locTextRecord[] = $a;
}
} else {
echo "loc匹配失敗!<br>";
}
//var_dump($locTextRecord);
foreach($locTextRecord as $everyLoc){
$htmlTest = file_get_contents($everyLoc);
//$aim = 'https://baidu.com?srcid%3D1000%26id%3D10034_5467_%E5%8C%97%E4%BA%AC';
$aim = '搬家';
if(strpos($htmlTest, $aim)) {
$record[] = $everyLoc;
} else {
echo "aim匹配失敗!<br>";
}
}
//$record = array_unique($record);
foreach($record as $r){
var_dump($r); //利用 var_dump 來查看變量類型,可以調(diào)試和直接在網(wǎng)頁上復制用禀综。
}
?>
注意:
preg_match()的第三個參數(shù):如果提供了參數(shù)matches简烘,它將被填充為搜索結(jié)果苔严。 $matches[0]將包含完整模式匹配到的文本, $matches[1] 將包含第一個捕獲子組匹配到的文本孤澎,以此類推届氢。
preg_match_all()的第三個參數(shù):多維數(shù)組,作為輸出參數(shù)輸出所有匹配結(jié)果, 數(shù)組排序通過flags指定亥至。