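Scraping the euro exchange rate for August 27 to September 2, 2018.
The conclusion first:
If a selling rate is acceptable, choose
2018-08-31 09:19:26, when the cash selling rate (現鈔賣出價) was 805.28.
I just asked someone who has filed a claim before. She said any of the quoted rates is fine; it does not have to be the BOC conversion rate (中行折算價).
I was on a business trip recently. The university reimburses travel and accommodation in RMB, and for the exchange rate you may pick the Bank of China rate at any moment of any day during the trip. The rates on the Bank of China website look like this:
If you want to make reasonable use of the rule and recoup a little more, you might as well pick the moment with the worst rate (a silent bow to the folks in the finance office: please don't come after me, I contribute at least 20 yuan a week on average to the BUPT gym).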
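To put a number on why the choice of rate matters, here is a small sketch. The two rates are the cash selling rates quoted in this post for 2018-08-31 09:19:26 and 2018-09-01 05:30:00, in RMB per 100 EUR. The 1000 EUR expense and the function name to_rmb are assumptions made purely for illustration.

```python
from decimal import Decimal

def to_rmb(amount_eur: str, rate_per_100: str) -> Decimal:
    """Convert a euro amount to RMB using a rate quoted per 100 EUR."""
    return Decimal(amount_eur) * Decimal(rate_per_100) / Decimal("100")

expense = "1000"                   # hypothetical expense in EUR
high = to_rmb(expense, "805.28")   # cash selling rate, 2018-08-31 09:19:26
low = to_rmb(expense, "797.82")    # cash selling rate, 2018-09-01 05:30:00
print(high, low, high - low)
```

For this made-up expense the two choices differ by 74.60 RMB, which is why it is worth looking at every quote in the week.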
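Am I supposed to copy 50-odd pages of rates into a little notebook by hand?
The clumsy approach
On the page you have to fill in this thing. Ugh, how do you even click a date?
A passable approach
The one point that takes a little thought: if you request only the URL, what comes back is an empty page saying "對不起,檢索詞不能為空" ("Sorry, the search term cannot be empty"). What now?
Go back to the original page, open the developer console, select the Network tab and refresh the page. Sure enough, there it is, the form data:
So all I need to do is add the headers and the form data to a POST request.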
import requests

url = 'http://srh.bankofchina.com/search/whpj/search.jsp'
# Request headers copied from the browser's Network tab
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': '58',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=0000eiLWbmpU1jmVd-YyiUf_XDM:-1',  # session cookie from my browser; yours will differ
    'Host': 'srh.bankofchina.com',
    'Origin': 'http://srh.bankofchina.com',
    'Referer': 'http://srh.bankofchina.com/search/whpj/search.jsp',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
}
form_data = {
    'erectDate': '2018-08-26',  # start date
    'nothing': '2018-09-02',    # end date
    'pjname': '1326',           # 1326 is the code for the euro
    'page': '1',                # open the first page
}
wb_data = requests.post(url, headers=headers, data=form_data)
print(wb_data.text)
Scraped result:
<html>
<head>
<META content="IE=7.0000" http-equiv="X-UA-Compatible">
<meta http-equiv="Content-Type" content="text/html; charset=GBK">
<title>中國銀行外匯牌價</title>
<SCRIPT LANGUAGE="JavaScript" src="../js/wcm_page_2013.js"></SCRIPT>
<script type="text/javascript" src="../js/My97DatePicker/WdatePicker.js"></script>
</head>
<body onload="init_list();" style='width:980px;margin:0 auto;'>
<!-- 頭部嵌套 -->
<link rel="stylesheet" type="text/css" ignoreapd="1">
<link rel="stylesheet" type="text/css" ignoreapd="1">
<link rel="stylesheet" type="text/css" ignoreapd="1">
<!--[if lte IE 8]>
<link type="text/css" rel="stylesheet" >
<![endif]-->
<script language="JavaScript" src="http://www.bankofchina.com/head.js" ignoreapd="1"></script>
<script type="text/javascript" src="http://www.bankofchina.com/images/boc2013_jquery-min.js" ignoreapd="1"></script>
<script type="text/javascript" src="http://www.bankofchina.com/images/boc2013_boc.js" ignoreapd="1"></script>
<style type="text/css">
body{
background:#FFF;
font: 12px/26px Verdana,Geneva,sans-serif,"宋體";
color: rgb(83, 83, 83);
margin:0 auto;
text-align:center;
}
.wrapper{text-align:left;}
.invest_t table td,.publish table th ,.publish table tr{
font-size:12px;
}
.invest_t table tr td select{
height:30px;
}
.invest_t table tr th select{
height:auto;
}
.invest_t select{
width:auto;
height:auto;
}
#calendarTable tr td{
height:20px;
}
</style>
<div class="wrapper">
<script language="JavaScript">
<!--
createTop();
//-->
</script>
<!-- 頁面導航 -->
<div class='cramb' id='PL_NAVIGATOR'><span>當前位置:</span><a >首頁</a>>外匯牌價</div>
<h2 class="title"> <br><br></h2>
<form method="post" name="historysearchform" id="historysearchform" action="search.jsp">
<div class="invest_t" style="float:left;width:980px;">
<SCRIPT language=javascript src="../js/WebCalendar.js"></SCRIPT>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td align="right" width="8%">起始時間:</td>
<td align="left" width="110px;">
<div class="search_bar" style="float:left;width:100px;margin-left:10px;">
<input class="search_ipt" style="width:100px;" type="text" name="erectDate" value="2018-08-26" onclick="new Calendar(null, null,0).show(this);" readonly>
</div>
</td>
<td align="right" width="8%">結束時間:</td>
<td align="left" width="110px;">
<div class="search_bar" style="float:left;width:100px;margin-left:10px;">
<input class="search_ipt" style="width:100px;" type="text" name="nothing" value="2018-09-02" onclick="new Calendar(null, null,0).show(this);" readonly>
</div>
</td>
<td align="right" width="10%">牌價選擇:</td>
<td align="left" width="110px;">
<select name="pjname" id="pjname">
<option value="0" >選擇貨幣</option>
<option value="1314" >英鎊</option>
<option value="1315" >港幣</option>
<option value="1316" >美元</option>
<option value="1317" >瑞士法郎</option>
<option value="1318" >德國馬克</option>
<option value="1319" >法國法郎</option>
<option value="1375" >新加坡元</option>
<option value="1320" >瑞典克朗</option>
<option value="1321" >丹麥克朗</option>
<option value="1322" >挪威克朗</option>
<option value="1323" >日元</option>
<option value="1324" >加拿大元</option>
<option value="1325" >澳大利亞元</option>
<option value="1326" selected>歐元</option>
<option value="1327" >澳門元</option>
<option value="1328" >菲律賓比索</option>
<option value="1329" >泰國銖</option>
<option value="1330" >新西蘭元</option>
<option value="1331" >韓元</option>
<option value="1843" >盧布</option>
<option value="2890" >林吉特</option>
<option value="2895" >新臺幣</option>
<option value="1370" >西班牙比塞塔</option>
<option value="1371" >意大利里拉</option>
<option value="1372" >荷蘭盾</option>
<option value="1373" >比利時法郎</option>
<option value="1374" >芬蘭馬克</option>
<option value="3030" >印尼盧比</option>
<option value="3253" >巴西里亞爾</option>
<option value="3899" >阿聯酋迪拉姆</option>
<option value="3900" >印度盧比</option>
<option value="3901" >南非蘭特</option>
<option value="4418" >沙特里亞爾</option>
<option value="4560" >土耳其里拉</option>
</select>
</td>
<td width="30px;" align="left">
<input class="search_btn" style="float:right;margin-righth:26px;" type="button" onclick="executeSearch()">
</td>
<td> </td>
</tr>
</table>
</div>
</form>
<div class="BOC_main publish">
<table cellpadding="0" cellspacing="0" width="100%" align="left">
<tr>
<th>貨幣名稱</th>
<th>現匯買入價</th>
<th>現鈔買入價</th>
<th>現匯賣出價</th>
<th>現鈔賣出價</th>
<th>中行折算價</th>
<th>發布時間</th>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.02 00:00:05</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 05:30:00</td>
</tr>
<tr>
<td>歐元</td>
<td>790.41</td>
<td>765.85</td>
<td>796.24</td>
<td>797.82</td>
<td>796.46</td>
<td>2018.09.01 04:52:28</td>
</tr>
<tr>
<td>歐元</td>
<td>789.45</td>
<td>764.92</td>
<td>795.27</td>
<td>796.86</td>
<td>796.46</td>
<td>2018.09.01 00:11:15</td>
</tr>
<tr>
<td>歐元</td>
<td>789.52</td>
<td>764.99</td>
<td>795.34</td>
<td>796.93</td>
<td>796.46</td>
<td>2018.09.01 00:00:13</td>
</tr>
<tr>
<td>歐元</td>
<td>790</td>
<td>765.45</td>
<td>795.83</td>
<td>797.41</td>
<td>796.46</td>
<td>2018.09.01 00:00:05</td>
</tr>
<tr>
<td>歐元</td>
<td>790</td>
<td>765.45</td>
<td>795.83</td>
<td>797.41</td>
<td>796.46</td>
<td>2018.08.31 23:52:31</td>
</tr>
<tr>
<td>歐元</td>
<td>790</td>
<td>765.45</td>
<td>795.83</td>
<td>797.41</td>
<td>796.46</td>
<td>2018.08.31 23:52:31</td>
</tr>
<tr>
<td>歐元</td>
<td>790</td>
<td>765.45</td>
<td>795.83</td>
<td>797.41</td>
<td>796.46</td>
<td>2018.08.31 23:52:31</td>
</tr>
<tr>
<td>歐元</td>
<td>790.64</td>
<td>766.07</td>
<td>796.47</td>
<td>798.05</td>
<td>796.46</td>
<td>2018.08.31 23:22:05</td>
</tr>
<tr>
<td>歐元</td>
<td>790.8</td>
<td>766.23</td>
<td>796.63</td>
<td>798.22</td>
<td>796.46</td>
<td>2018.08.31 23:18:43</td>
</tr>
<tr>
<td colspan="11" style="height:30px;"> </td>
</tr>
</table>
<div class="pb_ft clearfix" style="width:500px;clear:both;">
<div class="turn_page" id="list_navigator" style="margin-left:300px;">
</div><!--翻頁-->
</div><!--content--end-->
</div><!--發布-end-->
<script language="JavaScript">
function executeSearch()
{
document.historysearchform.method = 'post';
document.historysearchform.submit();
}
PageContext.PageNav.go = function(_iPage,_maxPage)
{
document.pageform.page.value = _iPage;
document.pageform.submit();
};
//畫分頁代碼以及自動調整窗口大小
var m_nRecordCount = 1014;
if(m_nRecordCount.length == 0){
m_nRecordCount = 0;
}
var m_nCurrPage = 1;
var m_nPageSize = 20;
function init_list(){
PageContext.params["RecordNum"] = m_nRecordCount;
PageContext.params["CurrPage"] = m_nCurrPage;
PageContext.params["PageSize"] = m_nPageSize;
PageContext.drawNavigator();
}
function gotoPage(npage)
{
document.pageform.page.value = npage;
document.pageform.submit();
}
function getPage()
{
var val = document.getElementById("currentPage").value;
return val ;
}
</script>
<script language="JavaScript" src="http://www.bankofchina.com/bottom.js"></script>
<script>
createBottom();
</script>
<form name="pageform" action="search.jsp" method=post >
<input type="hidden" name="erectDate" value="2018-08-26">
<input type="hidden" name="nothing" value="2018-09-02">
<INPUT type="hidden" name="pjname" value="1326">
<input type="hidden" name="page" value="1">
</form>
</div>
</body>
</html>
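Doing it this way seems like quite a hassle. Is there a simpler way? There is!
The form data originally looks like this:
Click "view source" and it switches to the URL-encoded key=value&key=value form, which gives you the query string with no effort at all.
Had I known, I would not have gone to all that trouble.
What if I simply open that URL directly? It works! No difference whatsoever.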
import requests

# The same four form fields, passed as a query string in a plain GET request
url = 'http://srh.bankofchina.com/search/whpj/search.jsp?erectDate=2018-08-26&nothing=2018-09-02&pjname=1326&page=1'
wb_data = requests.get(url)
print(wb_data.text)
The result is the same.
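As a hedged alternative to typing the query string by hand, the URL can be assembled from the same four form fields with the standard library. The helper name build_url and its default arguments are assumptions for illustration; the field names are the ones from the form above. Building the string programmatically also avoids a subtle trap: when a hand-written URL passes through an HTML-aware tool, the sequence &not in &nothing can be turned into the ¬ character.

```python
from urllib.parse import urlencode

BASE_URL = 'http://srh.bankofchina.com/search/whpj/search.jsp'

def build_url(start: str, end: str, currency: str = '1326', page: int = 1) -> str:
    """Build the search URL from the form fields seen in the Network tab."""
    params = {
        'erectDate': start,   # start date
        'nothing': end,       # end date (the field really is named "nothing")
        'pjname': currency,   # 1326 is the code for the euro
        'page': page,         # page number, starting at 1
    }
    return BASE_URL + '?' + urlencode(params)

print(build_url('2018-08-26', '2018-09-02'))
# http://srh.bankofchina.com/search/whpj/search.jsp?erectDate=2018-08-26&nothing=2018-09-02&pjname=1326&page=1
```

The resulting string can be passed straight to requests.get.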
Parsing the page
The selector copied from the browser contains tbody, but when you actually parse the page with BeautifulSoup you find that tbody does not exist, so body > div > div.BOC_main.publish > table > tbody > tr > td
selects an empty list. This is a known issue: when parsing pages you will run into tbody tags that sometimes can be matched and sometimes cannot. Whenever you meet a tbody tag, check the page source. If the source contains tbody, the selector must contain tbody as well.
If the source has no tbody tag, the browser added it while normalizing the HTML. A selector that contains tbody then matches nothing, and removing tbody from the selector fixes it.
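The rule above can be sketched as a tiny string helper. strip_tbody is a hypothetical name, not part of BeautifulSoup, and it only handles selectors made of child steps joined by ">", which is the form browsers produce with "Copy selector". It should be applied only after checking that the page source really has no tbody.

```python
def strip_tbody(selector: str) -> str:
    """Drop tbody steps from a child-combinator selector copied from the browser."""
    steps = [step.strip() for step in selector.split('>')]
    # Keep every step except a bare "tbody" or "tbody:pseudo-class(...)"
    kept = [s for s in steps if s != 'tbody' and not s.startswith('tbody:')]
    return ' > '.join(kept)

copied = 'body > div > div.BOC_main.publish > table > tbody > tr:nth-child(2) > td:nth-child(1)'
print(strip_tbody(copied))
# body > div > div.BOC_main.publish > table > tr:nth-child(2) > td:nth-child(1)
```

The cleaned selector is what would then be handed to soup.select.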
Python 3.6 code on macOS
from bs4 import BeautifulSoup
import pandas as pd
import requests
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# Fix for Chinese text in matplotlib; applies to Windows only
plt.rcParams['font.sans-serif'] = ['SimHei']

# Fix for Chinese text in matplotlib; applies to macOS only
def get_chinese_font():
    return FontProperties(fname='/System/Library/Fonts/PingFang.ttc')

url = 'http://srh.bankofchina.com/search/whpj/search.jsp?erectDate=2018-08-27&nothing=2018-09-02&pjname=1326&page=1'
wb_data = requests.get(url)
soup = BeautifulSoup(wb_data.text, 'lxml')
# No tbody in these selectors: the page source has none (see above)
raw_price_tag = soup.select('body > div > div.BOC_main.publish > table > tr > th')
# The last td is the empty spacer cell at the bottom of the table, so drop it
raw_price = soup.select('body > div > div.BOC_main.publish > table > tr > td')[:-1]

price_dict = {}
raw_price = [i.text for i in raw_price]
# Cells arrive row by row, so every len(raw_price_tag)-th cell belongs to the same column
for i in range(len(raw_price_tag)):
    price_dict[raw_price_tag[i].text] = raw_price[i::len(raw_price_tag)]

urls = [url[:-1] + str(i) for i in range(2, 52)]  # 51 pages in total
for each_url in urls:
    wb_data = requests.get(each_url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    raw_price = soup.select('body > div > div.BOC_main.publish > table > tr > td')[:-1]
    raw_price = [i.text for i in raw_price]
    for i in range(len(raw_price_tag)):
        price_dict[raw_price_tag[i].text] += raw_price[i::len(raw_price_tag)]

# The column names used below must match the header text of the scraped table exactly
df = pd.DataFrame(price_dict).drop_duplicates()  # build the frame and drop duplicate rows
df['發布時間'] = pd.to_datetime(df['發布時間'], format='%Y.%m.%d %H:%M:%S')
df['具體時間'] = df['發布時間'].dt.strftime('%H:%M:%S')  # extract the time of day
df.set_index('發布時間', inplace=True)
# print(df.sort_values(by=['發布時間', '中行折算價'], ascending=[True, False]))  # highest to lowest within each day
# print(df.loc['2018-09-02'])  # rows can be filtered by date

print('Worst rate of each day and when it occurred:')
# Spot buying, cash buying, spot selling, cash selling, BOC conversion rate
tag_list = ['現匯買入價', '現鈔買入價', '現匯賣出價', '現鈔賣出價', '中行折算價']
days = ['2018-08-27', '2018-08-28', '2018-08-29', '2018-08-30', '2018-08-31', '2018-09-01', '2018-09-02']
# Time-series-only operation: the maximum of every column for each day.
# The rates are still strings here, so the comparison is lexicographic; that agrees
# with the numeric order only because every rate has three integer digits.
rsp_df = df.resample('D').max()
df_extra = pd.DataFrame(index=days, columns=tag_list + ['具體時間'])
for i in range(rsp_df.shape[0]):
    this_day = days[i]
    max_price = max(rsp_df.iloc[i][tag_list].values)
    max__price_loc = tag_list[list(rsp_df.iloc[i][tag_list].values).index(max_price)]
    this_day_df = df.loc[this_day]
    high_frame = this_day_df[this_day_df[max__price_loc] == max_price].iloc[0]
    high_day = str(high_frame.name)[:10]
    high_time = high_frame['具體時間']
    df_extra.iloc[i] = high_frame[df_extra.columns].values
    print(f'{high_day} {high_time} {max__price_loc}: {max_price}')
    # Record the first day unconditionally so the best_* names always exist
    if i == 0 or max_price > best_price:
        best_day, best_time, best_tag, best_price = high_day, high_time, max__price_loc, max_price
print('-----------------------------------')
print('Full rows at each daily maximum:')
print(df_extra)
print('-----------------------------------')
print(f'In summary, the moment with the highest EUR to RMB rate:\n{best_day} {best_time} {best_tag}: {best_price}')
print('-----------------------------------')

# Cast to float before plotting so the data can be drawn
for each_tag in tag_list:
    rsp_df[each_tag] = rsp_df[each_tag].astype(float)
# Drop the two columns that cannot be plotted: currency name and time of day
df_for_plot = rsp_df.drop(['貨幣名稱', '具體時間'], axis=1)
df_for_plot.plot()
plt.title('Bank of China EUR to RMB rates, late August to early September 2018', fontproperties=get_chinese_font())
plt.legend(loc='best', prop=get_chinese_font())
# plt.interactive(False)
plt.ylabel('RMB per 100 EUR',                    # y label
           fontproperties=get_chinese_font(),    # font
           fontsize=14)                          # font size
plt.xlabel('Date',                               # x label
           fontproperties=get_chinese_font(),    # font
           fontsize=14)                          # font size
plt.tight_layout()
plt.savefig('./cur.png')
plt.show()
Worst rate of each day and when it occurred:
2018-08-27 23:03:27 現鈔賣出價: 800.9
2018-08-28 21:29:12 現鈔賣出價: 802.52
2018-08-29 22:17:07 現鈔賣出價: 803.62
2018-08-30 15:23:29 現鈔賣出價: 805.27
2018-08-31 09:19:26 現鈔賣出價: 805.28
2018-09-01 05:30:00 現鈔賣出價: 797.82
2018-09-02 05:30:00 現鈔賣出價: 797.82
-----------------------------------
Full rows at each daily maximum:
            現匯買入價  現鈔買入價  現匯賣出價  現鈔賣出價  中行折算價  具體時間
2018-08-27  793.46  768.8   799.31  800.9   797.77  23:03:27
2018-08-28  795.06  770.36  800.93  802.52  795.45  21:29:12
2018-08-29  796.15  771.41  802.02  803.62  795.9   22:17:07
2018-08-30  797.78  773     803.67  805.27  797.59  15:23:29
2018-08-31  797.79  773.01  803.68  805.28  797.59  09:19:26
2018-09-01  790.41  765.85  796.24  797.82  796.46  05:30:00
2018-09-02  790.41  765.85  796.24  797.82  796.46  05:30:00
-----------------------------------
In summary, the moment with the highest EUR to RMB rate:
2018-08-31 09:19:26 現鈔賣出價: 805.28
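The script above hard-codes the number of pages in range(2, 52). As a sketch of how that number could be derived instead, the inline script in the scraped page shown earlier sets m_nRecordCount = 1014 and m_nPageSize = 20, and 1014 records at 20 per page round up to 51 pages. The function name page_count is an assumption, and the regular expressions assume the two variables keep the exact form seen in that page.

```python
import math
import re

def page_count(html: str) -> int:
    """Read m_nRecordCount and m_nPageSize from the page script and return the page count."""
    records = int(re.search(r'm_nRecordCount\s*=\s*(\d+)', html).group(1))
    size = int(re.search(r'm_nPageSize\s*=\s*(\d+)', html).group(1))
    return math.ceil(records / size)

# Fragment of the inline script from the scraped page
sample = 'var m_nRecordCount = 1014;\nvar m_nCurrPage = 1;\nvar m_nPageSize = 20;'
print(page_count(sample))  # 51
```

Calling page_count on the text of the first page would give the upper bound for the page loop without counting pages by hand.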
Pitfalls
- Pitfall 1: the tbody issue.
- Pitfall 2: values of object dtype in a pandas frame can be compared but cannot be plotted; convert them with .astype(float).
- Pitfall 3: extracting the time of day from a datetime series:
df['具體時間'] = df['發布時間'].dt.strftime('%H:%M:%S')  # extract hours, minutes and seconds
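Pitfalls 2 and 3 can be illustrated without pandas. The sketch below uses plain Python strings and the standard datetime module; the value '99.5' is a made-up rate chosen only to show where string comparison goes wrong, and time_of_day is a hypothetical helper name. The timestamp format is the one that appears in the scraped table.

```python
from datetime import datetime

# Pitfall 2: scraped cells are strings, and strings compare character by character.
print('99.5' > '805.28')                  # True: '9' sorts after '8'
print(float('99.5') > float('805.28'))    # False: the numeric answer

def time_of_day(stamp: str) -> str:
    """Pitfall 3: parse a '2018.09.02 05:30:00' style timestamp and keep HH:MM:SS."""
    parsed = datetime.strptime(stamp, '%Y.%m.%d %H:%M:%S')
    return parsed.strftime('%H:%M:%S')

print(time_of_day('2018.08.31 23:52:31'))  # 23:52:31
```

String comparison happens to give the right order for the rates in this post because they all have three integer digits, but converting to float early removes that dependency.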
Afterword
Don't follow my example and hammer the Bank of China website. My IP has been blocked by the bank, and I scraped the results for this post over my phone's hotspot (facepalm).
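One common way to be gentler on a site is to pause between requests. The sketch below is an assumption-laden illustration, not something the original script did: fetch_politely and the two-second default are made up, the fetch function is passed in by the caller, and nothing here is known to stay under whatever limit the bank actually enforces.

```python
import time

def fetch_politely(urls, fetch, delay=2.0):
    """Call fetch(url) for each URL, sleeping `delay` seconds between requests."""
    pages = []
    for n, url in enumerate(urls):
        if n > 0:
            time.sleep(delay)  # wait before every request except the first
        pages.append(fetch(url))
    return pages

# Demonstration with a stand-in fetch function and no delay
print(fetch_politely(['page1', 'page2'], lambda u: u.upper(), delay=0))
# ['PAGE1', 'PAGE2']
```

With requests, the fetch argument could be something like lambda u: requests.get(u).text.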