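Many beginners run into this: legacy data contains HTML fragments generated by a rich-text editor, and once it is imported into elasticsearch, search highlighting shows the raw HTML, which makes for a poor experience. When syncing with logstash, use a filter to strip the HTML first: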
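filter {
  # Remove <script>, <iframe> and <style> blocks together with their contents
  mutate {
    gsub => [ "content", "<script(.*?)</script>", "" ]
  }
  mutate {
    gsub => [ "content", "<iframe(.*?)</iframe>", "" ]
  }
  mutate {
    gsub => [ "content", "<style(.*?)</style>", "" ]
  }
  # Strip every remaining tag, keeping the text between tags
  mutate {
    gsub => [ "content", "<(.*?)>", "" ]
  }
  # Remove the space characters left behind
  mutate {
    gsub => [ "content", " ", "" ]
  }
}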
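To check what this chain of substitutions does to a given string before wiring it into a pipeline, the same rules can be tried in Python. This is only a rough sketch: the function name `strip_html` and the sample string are made up for illustration, and Python's `re` module is not the Ruby regex engine that logstash uses, although these simple patterns should behave the same way in both.

```python
import re

# Same patterns, in the same order, as the logstash filter:
# block elements first, then any leftover tags, then spaces.
PATTERNS = [
    (r"<script(.*?)</script>", ""),
    (r"<iframe(.*?)</iframe>", ""),
    (r"<style(.*?)</style>", ""),
    (r"<(.*?)>", ""),
    (r" ", ""),
]


def strip_html(content: str) -> str:
    """Apply each substitution in turn (hypothetical helper for illustration)."""
    for pattern, replacement in PATTERNS:
        content = re.sub(pattern, replacement, content)
    return content


sample = "<p>Hello <b>world</b></p><script>alert(1)</script>"
print(strip_html(sample))  # prints "Helloworld"
```

Two things stand out when experimenting with it. The last rule removes every space, which is harmless for Chinese text but glues English words together. Also, `.` does not match a newline by default, so a `<script>` or `<style>` block that spans several lines is only caught by the generic tag rule and its inner text stays in the field. In this Python sketch passing `flags=re.DOTALL` to `re.sub` would cover that case.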
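A lot of the cleanup has to be done in mysql first, especially for time-type fields. The date format also has to be specified when the index is created: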
"format"=>"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"
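As a rough illustration of where that format string might go, the sketch below builds an index mapping body as a plain Python dict. It makes several assumptions: the `mappings.properties` layout is the typeless one used by elasticsearch 7 and later, the field names `addtime` and `autotime` are borrowed from the query in this post and assumed to be the date fields, and `build_mapping` is a hypothetical helper, not part of any client library.

```python
import json

# Format string taken from the post: several accepted formats separated by "||".
DATE_FORMAT = "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"


def build_mapping(date_fields):
    """Return a mapping body that declares each given field as a date (assumed layout)."""
    return {
        "mappings": {
            "properties": {
                name: {"type": "date", "format": DATE_FORMAT}
                for name in date_fields
            }
        }
    }


# Field names are assumptions based on the SELECT statement in this post.
body = build_mapping(["addtime", "autotime"])
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the request body when creating the index. Older elasticsearch versions nest `properties` under a type name, so the layout would need adjusting there.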
SELECT a.id, a.title, b.content, b.content AS content_old,
       CONCAT(a.addtime) AS addtime, CONCAT(a.autotime) AS autotime,
       a.views, a.zans, a.type_a, a.type_b,
       CONCAT(a.isshow) AS isshow, CONCAT(a.isdelete) AS isdelete,
       IF(ISNULL(a.deletetime), 0, a.deletetime) AS deletetime
FROM web_information a