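Assignment requirements:
Take the URL of the previous scraping-class assignment document in the Jianshu "Decoding Big Data" collection as the page to analyze, then analyze it and submit the page's structure analysis and the location of its element tags.
Link to the previous assignment: http://www.reibang.com/p/7e2fccb4fad9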
Basic HTML structure
Basic page structure diagram
head section
Page title
<title>Web Scraping Course Assignment 01 - Decoding Big Data Community - Jianshu</title>
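As a rough illustration of how the title's location could be used in a scraper, the sketch below reads the text of the `<title>` element with Python's standard-library `html.parser`. The names `TitleParser` and `extract_title` and the shortened sample string are assumptions made for this example and are not part of the analyzed page.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False  # True while the parser is inside <title>
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Hypothetical, shortened sample; a real run would feed the downloaded page.
sample = "<html><head><title>Assignment 01 - Jianshu</title></head><body></body></html>"
page_title = extract_title(sample)
print(page_title)
```

The same pattern (set a flag on the start tag, collect data, clear the flag on the end tag) carries over to other single elements such as the article's `h1`.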
Top navigation bar
Logo ("簡")
<a class="logo" href="/"> </a>
Write an article
<a class="btn write-btn" target="_blank" href="/writer#/">
<i class="iconfont ic-write"></i>Write an article
</a>
The four buttons: Discover, Follow, Notifications, and Search
<div class="collapse navbar-collapse" id="menu">
<ul class="nav navbar-nav">
<li class="">
<a href="/">
<span class="menu-text">Discover</span><i class="iconfont ic-navigation-discover menu-icon"></i>
</a> </li>
<li class="">
<a href="/subscriptions">
<span class="menu-text">Follow</span><i class="iconfont ic-navigation-follow menu-icon"></i>
</a> </li>
<li class="notification v-notification-dropdown-menu ">
<a class="notification-btn" href="/notifications" data-hover="dropdown">
<span class="menu-text">Notifications</span>
<i class="iconfont ic-navigation-notification menu-icon"></i>
<span class="badge"></span>
</a>
</li>
<li class="search">
<form target="_blank" action="/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="?" />
<input type="text" name="q" id="q" value="" placeholder="Search" class="search-input" />
<a class="search-btn" href="javascript:void(null)"><i class="iconfont ic-search"></i></a>
</form> </li>
</ul>
</div>
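The navigation markup above pairs each menu label (`span.menu-text`) with the `href` of the `<a>` that wraps it. A minimal sketch of reading those pairs with the standard-library `html.parser` might look like the following; `NavMenuParser`, `nav_items`, and the abbreviated sample markup are hypothetical names and data introduced only for this example.

```python
from html.parser import HTMLParser


class NavMenuParser(HTMLParser):
    """Pairs each span.menu-text label with the href of its enclosing <a>."""

    def __init__(self):
        super().__init__()
        self._href = None       # href of the <a> currently open, if any
        self._in_label = False  # True while inside <span class="menu-text">
        self.items = []         # collected (label, href) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "span" and "menu-text" in (attrs.get("class") or "").split():
            self._in_label = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_label = False
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        # Only text inside a menu-text span that sits inside a link is kept.
        if self._in_label and self._href is not None:
            self.items.append((data.strip(), self._href))


# Abbreviated, hypothetical copy of the navigation list shown above.
sample = """
<ul class="nav navbar-nav">
<li class=""><a href="/"><span class="menu-text">Discover</span><i class="iconfont"></i></a></li>
<li class=""><a href="/subscriptions"><span class="menu-text">Follow</span></a></li>
<li class="notification"><a class="notification-btn" href="/notifications">
<span class="menu-text">Notifications</span><span class="badge"></span></a></li>
</ul>
"""

parser = NavMenuParser()
parser.feed(sample)
nav_items = parser.items
for label, href in nav_items:
    print(label, href)
```

The search entry is a `<form>` rather than a labelled link, so this sketch deliberately leaves it out.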
Article title
<h1 class="title">Web Scraping Course Assignment 01 - Decoding Big Data Community</h1>
Author information
<div class="author">
<a class="avatar" href="/u/40cc6159e5ad">
</a> <div class="info">
<span class="tag">Author</span>
<span class="name"><a href="/u/40cc6159e5ad">在旅途的車</a></span>
Basic article information, including update time, word count, view count, comment count, like count, and so on
<div class="meta">
<span class="publish-time" data-toggle="tooltip" data-placement="bottom" title="" data-original-title="Last edited on 2017.07.04 00:29">2017.07.04 00:26*</span>
<span class="wordage">Words 387</span>
<span class="views-count">Views 33</span><span class="comments-count">Comments 2</span><span class="likes-count">Likes 2</span></div>
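Each counter in the `meta` block is a `<span>` whose class names the metric and whose text ends in a number. Assuming that layout holds, a small regular expression is enough to turn the block into a dictionary. The function name `parse_meta` and the sample string are assumptions for illustration, and a regex over HTML is only a reasonable shortcut for a flat snippet like this one.

```python
import re

# Matches <span class="METRIC">label 123</span> for the four counter classes.
# The label text is skipped, so the pattern does not depend on its language.
META_PATTERN = re.compile(
    r'<span class="(wordage|views-count|comments-count|likes-count)">'
    r'[^<\d]*(\d+)\s*</span>'
)


def parse_meta(html):
    """Return a {class name: integer value} mapping for the counters found."""
    return {cls: int(num) for cls, num in META_PATTERN.findall(html)}


# Hypothetical sample modelled on the meta block above.
sample = (
    '<div class="meta">'
    '<span class="wordage">Words 387</span>'
    '<span class="views-count">Views 33</span>'
    '<span class="comments-count">Comments 2</span>'
    '<span class="likes-count">Likes 2</span></div>'
)

meta = parse_meta(sample)
print(meta)
```

Keying the result by class name rather than by label keeps the parsing independent of the text shown to the reader.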
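Article body:
<div data-note-content="" class="show-content">
<div class="image-package">
<div class="image-caption">glenn-carstens-peters-203007.jpg</div>
</div>
<p>I have recently become interested in employment in the financial industry, and I plan to collect some data from LinkedIn and analyze it.</p>
<p>1. Categories of data to scrape</p>
<p>Job data for the financial industry on LinkedIn, including company name, job title, salary range, and job requirements</p>
<p>2. Data source website</p>
<p>LinkedIn URL: www.linkedin.com</p>
<p>3. URL to scrape</p>
<p><a target="_blank">https://www.linkedin.com/jobs/search/?keywords=audit&location=%E5%85%A8%E7%90%83&locationId=OTHERS.worldwide</a></p>
<p>4. Data filtering rules</p>
<p>Filter and analyze the scraped data by dimensions such as job category, hiring company, job location, required years of experience, posting date, job requirements, and salary range, hoping to reach the following conclusions:</p>
<p>The salary level and trend for a specific position, to judge how scarce the position is and how likely it is to be hired for it;</p>
<p>The geographic distribution of a specific position, as a reference for choosing a region for my own development;</p>
<p>The distribution of a specific position across industries and the corresponding salary levels. Taking audit as an example, the position shares common traits across industries, but the same position pays differently in different industries, which can serve as a reference for a career change;</p>
<p>The job requirements for a specific position, as guidance for my own career development and skills training.</p>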
</div>
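The article body lives in `div.show-content`, with nested `div`s for the image and plain `<p>` elements for the text. One way to collect just the paragraphs is to track how deep the parser is inside that container. The sketch below does this with the standard-library `html.parser`; `BodyParagraphParser` and the shortened sample are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class BodyParagraphParser(HTMLParser):
    """Collects the text of <p> elements located inside div.show-content."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # <div> nesting depth inside show-content; 0 means outside
        self._in_p = False   # True while inside a <p> within the container
        self._buf = []       # text pieces of the current paragraph
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._depth > 0:
                self._depth += 1          # nested div inside the container
            elif "show-content" in classes:
                self._depth = 1           # entering the container itself
        elif tag == "p" and self._depth > 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


# Shortened, hypothetical sample modelled on the body markup above.
sample = """
<div data-note-content="" class="show-content">
<div class="image-package">
<div class="image-caption">glenn-carstens-peters-203007.jpg</div>
</div>
<p>1. Categories of data to scrape</p>
<p>LinkedIn URL: www.linkedin.com</p>
</div>
<p>outside the body</p>
"""

parser = BodyParagraphParser()
parser.feed(sample)
paragraphs = parser.paragraphs
print(paragraphs)
```

The image caption is skipped because it sits in a `div`, not a `p`, and the trailing paragraph is skipped because the depth counter has already returned to zero.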
Side floating buttons, covering four functions: back to top, submit article, bookmark article, and share article:
<ul><li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="Back to top"><a class="function-button"><i class="iconfont ic-backtop"></i></a></li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="Submit article"><a class="js-submit-button"><i class="iconfont ic-note-requests"></i></a> </li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="Bookmark article"><a class="function-button"><i class="iconfont ic-mark"></i></a></li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="Share article"><a tabindex="0" role="button" data-toggle="popover" data-placement="left" data-html="true" data-trigger="focus" href="javascript:void(0);" data-content="<ul class='share-list'>
<li><a class="weixin-share"><i class="social-icon-sprite social-icon-weixin"></i><span>Share to WeChat</span></a></li>
<li><a href="javascript:void((function(s,d,e,r,l,p,t,z,c){var%20f='http://v.t.sina.com.cn/share/share.php?appkey=1881139527',u=z||d.location,p=['&url=',e(u),'&title=',e(t||d.title),'&source=',e(r),'&sourceUrl=',e(l),'&content=',c||'gb2312','&pic=',e(p||'')].join('');function%20a(){if(!window.open([f,p].join(''),'mb',['toolbar=0,status=0,resizable=1,width=440,height=430,left=',(s.width-440)/2,',top=',(s.height-430)/2].join('')))u.href=[f,p].join('');};if(/Firefox/.test(navigator.userAgent))setTimeout(a,0);else%20a();})(screen,document,encodeURIComponent,'','','', 'I wrote a new article 《Web Scraping Course Assignment 01 - Decoding Big Data Community》( shared from @簡書 )','http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=weibo','page encoding gb2312|utf-8, default gb2312'));"><i class='social-icon-sprite social-icon-weibo'></i><span>Share to Weibo</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='http://sns.qzone.qq.com/cgi-bin/qzshare/cgi_qzshare_onekey?url='+e('http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=qzone')+'&title='+e('I wrote a new article 《Web Scraping Course Assignment 01 - Decoding Big Data Community》'),x=function(){if(!window.open(r,'qzone','toolbar=0,resizable=1,scrollbars=yes,status=1,width=600,height=600'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-zone'></i><span>Share to Qzone</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://twitter.com/share?url='+e('http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=twitter')+'&text='+e('I wrote a new article 《Web Scraping Course Assignment 01 - Decoding Big Data Community》( shared from @jianshucom )')+'&related='+e('jianshucom'),x=function(){if(!window.open(r,'twitter','toolbar=0,resizable=1,scrollbars=yes,status=1,width=600,height=600'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-twitter'></i><span>Share to Twitter</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://www.facebook.com/dialog/share?app_id=483126645039390&display=popup&href=http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=facebook',x=function(){if(!window.open(r,'facebook','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-facebook'></i><span>Share to Facebook</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://plus.google.com/share?url='+e('http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=google_plus'),x=function(){if(!window.open(r,'google_plus','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-google'></i><span>Share to Google+</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,s1=window.getSelection,s2=d.getSelection,s3=d.selection,s=s1?s1():s2?s2():s3?s3.createRange().text:'',r='http://www.douban.com/recommend/?url='+e('http://www.reibang.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=douban')+'&title='+e('Web Scraping Course Assignment 01 - Decoding Big Data Community')+'&sel='+e(s)+'&v=1',x=function(){if(!window.open(r,'douban','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r+'&r=1'};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})()"><i class='social-icon-sprite social-icon-douban'></i><span>Share to Douban</span></a></li>
</ul>" data-original-title="" title="" class="function-button"><i class="iconfont ic-share"></i></a> <!----></li></ul>
Author information at the bottom:
<div class="follow-detail">
<div class="info">
<a class="avatar" href="/u/40cc6159e5ad">
</a> <div data-author-follow-button=""></div>
<a class="title" href="/u/40cc6159e5ad">在旅途的車</a>
<p>Wrote 39662 words, followed by 26 people, received 35 likes</p></div>
</div>
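The bottom author block packs three statistics into a single `<p>` sentence. Assuming the numbers always appear in the order words written, followers, likes (as they do in the snippet above), they can be pulled out by position. The helper `parse_author_stats` and the sample markup are assumptions made for this sketch.

```python
import re


def parse_author_stats(html):
    """Return words/followers/likes from the first <p> inside div.follow-detail.

    Returns None when the block is missing or does not hold exactly three numbers.
    """
    match = re.search(r'<div class="follow-detail">.*?<p>(.*?)</p>', html, re.S)
    if match is None:
        return None
    numbers = [int(n) for n in re.findall(r"\d+", match.group(1))]
    if len(numbers) != 3:
        return None
    # Assumed order, taken from the snippet: words, followers, likes.
    words, followers, likes = numbers
    return {"words": words, "followers": followers, "likes": likes}


# Hypothetical sample modelled on the follow-detail block above.
sample = """
<div class="follow-detail">
<div class="info">
<a class="title" href="/u/40cc6159e5ad">author</a>
<p>Wrote 39662 words, followed by 26 people, received 35 likes</p></div>
</div>
"""

stats = parse_author_stats(sample)
print(stats)
```

Only the text inside the `<p>` is searched, so digits in the profile link's `href` do not leak into the result.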