Python: How to Check Whether a URL Starts with http
For example, suppose a text file test.txt contains:
```
http://www.sogou.com
this is a url
this is http://www.sogou.com address
```
The first approach is to check whether the line contains the string:
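```python
# encoding: utf-8
with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        if "http" in line:             # true if "http" appears anywhere in the line
            print(line.rstrip("\n"))   # drop the trailing newline so print() adds no blank line
```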
Output:
```
http://www.sogou.com
this is http://www.sogou.com address
```
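To try the substring check without creating test.txt, the sketch below runs the same `in` test over an in-memory list of the three sample lines. The helper name `lines_containing` is hypothetical, introduced only for illustration.

```python
def lines_containing(lines, needle):
    """Return the lines that contain `needle` anywhere (plain `in` test)."""
    return [line for line in lines if needle in line]

# The three sample lines from test.txt, hard-coded for illustration
sample = [
    "http://www.sogou.com",
    "this is a url",
    "this is http://www.sogou.com address",
]

for line in lines_containing(sample, "http"):
    print(line)
```

Because `in` searches the whole string, the third line matches too, even though it does not start with http.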
If you only want the lines that start with http:
```python
# encoding: utf-8
import re

with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        r = re.match("http", line)     # re.match only matches at the start of the string
        if r is not None:
            print(line.rstrip("\n"))
```
Output:
```
http://www.sogou.com
```
`re.match` only tries to match at the beginning of the string: if the pattern matches there, it returns a match object; otherwise it returns `None`.
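The difference between `re.match` and `re.search` is easiest to see side by side. A minimal sketch using only the standard `re` module, with two of the sample lines hard-coded:

```python
import re

plain = "http://www.sogou.com"
embedded = "this is http://www.sogou.com address"

# re.match only tries position 0 of the string
print(re.match("http", plain).group())   # http
print(re.match("http", embedded))        # None

# re.search scans the whole string and returns the first match
m = re.search("http", embedded)
print(m.start())                         # 8, the index where "http" begins
```

This is why `re.match` behaves as a prefix check while `re.search` behaves like the `in` test.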
Is there a simpler way?
```python
# encoding: utf-8
with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        if line.startswith("http"):    # true only if the line begins with "http"
            print(line.rstrip("\n"))
```
The output is the same:
```
http://www.sogou.com
```
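A few quick checks make the behavior of `str.startswith` concrete: it is a plain, case-sensitive prefix comparison, equivalent to slicing off the first characters and comparing them. The variable names below are illustrative only.

```python
url = "http://www.sogou.com"
shouting = "HTTP://WWW.SOGOU.COM"

print(url.startswith("http"))               # True
print(url[:len("http")] == "http")          # True: the same test written as a slice
print(shouting.startswith("http"))          # False: the comparison is case-sensitive
print(shouting.lower().startswith("http"))  # True after normalizing the case
```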
Since there is `startswith`, is there a counterpart for checking how a string ends?
Of course there is: `endswith`.
```python
# encoding: utf-8
with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip("\n")       # remove the trailing newline before testing the end
        if line.endswith("com"):
            print(line)
```
Note that every line read from the file ends with a newline character, so it has to be removed before calling `endswith`.
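To see why the newline matters, compare `endswith` on a raw line with and without it. This sketch hard-codes a line as it would come out of the file; in text mode with the default `newline=None`, Python already converts Windows `\r\n` endings to `\n`, so stripping `"\n"` is enough.

```python
line = "http://www.sogou.com\n"   # a line as read from the file, newline included

print(line.endswith("com"))                    # False: the last character is "\n"
print(line.rstrip("\n").endswith("com"))       # True
print(line.replace("\n", "").endswith("com"))  # True
```

`rstrip("\n")` only touches the end of the string, while `replace` removes newlines anywhere; for a single line read from a file the two give the same result.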
The code is not much shorter, but the method names `startswith` and `endswith` state the intent more clearly than a regular expression does.
What if you need to match several different prefixes?
For example, suppose the file contains:
```
http://www.sogou.com
this is a url
this is http://www.sogou.com address
ftp://www.sogou.com
```
```python
# encoding: utf-8
with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        if line.startswith(("http", "ftp")):   # a tuple matches if any prefix matches
            print(line.rstrip("\n"))
```
Just pass a tuple containing the prefixes you want to match.
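A short sketch of the tuple form, using the four sample lines in memory. It also shows that the argument really must be a tuple: passing a list raises `TypeError`, so convert it with `tuple()` first. `endswith` accepts a tuple in the same way; the suffixes `".com"` and `".cn"` are just illustrative.

```python
lines = [
    "http://www.sogou.com",
    "this is a url",
    "this is http://www.sogou.com address",
    "ftp://www.sogou.com",
]

prefixes = ("http", "ftp")   # must be a tuple, not a list
matched = [line for line in lines if line.startswith(prefixes)]
for line in matched:
    print(line)

# A list of prefixes is rejected by startswith
prefix_list = ["http", "ftp"]
try:
    lines[3].startswith(prefix_list)
except TypeError as exc:
    print("TypeError:", exc)

# Converting the list to a tuple works
print(lines[3].startswith(tuple(prefix_list)))    # True

# endswith takes a tuple of suffixes the same way
print(lines[0].endswith((".com", ".cn")))          # True
```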