Explanation: Maths Skills, R, LendingClub, R, Python | R

Reposted from: http://ass.3daixie.com/2019030729243791.html

Maths Skills 2 (Statistics) assessment
Spring Term 2019

This assignment consists of two parts. In the first you are asked to revisit the data on UK fishing vessels under 10 metres long; the second concerns data on loans (granted as well as rejected) from LendingClub, one of the largest U.S. peer-to-peer financial companies. For each part, you will download data from a website, read them into R, perform some computations on them, and produce graphical displays.

The process and results of the analysis should be documented in an R Markdown file, producing an output document in PDF format. Code alone is not acceptable: at each step, you need to explain what you are doing and comment (briefly!) on the results. Your submission should consist of a single zipped folder containing only:

1. the R Markdown file (.Rmd file) and
2. the output PDF document.

Absence of one of the files will incur a very high penalty. Inclusion in the zipped folder of other unrequested files, including the data files, will also attract a penalty.

The maximum number of pages of the PDF document is 10, all inclusive. Because of this, it is best to do without a table of contents; however, sectioning of the document is strongly encouraged. In particular, separate sections should be devoted to each of the two parts of the assignment.

You are allowed to use and adapt for your purposes all the materials presented in the lectures and posted on Moodle, including the R Markdown input files, without any need of referencing them. However, you should provide, at the end of your report, references to the R packages you are using and to the data sources.

You may informally discuss with fellow students how to perform a task. However, you are not allowed to share your code, and should write the report on your own.

The following is a more detailed description of what the assignment entails.

Part A: UK Fishing vessels

The goal of this part is to provide a graphical representation of the geographical distribution of UK fishing vessels under 10 metres in length.

1. Use get_map() in package ggmap to obtain from Stamen Maps a map of the UK and save it into an object UKmap. Make sure that Northern Ireland and the Shetland Islands are included; this will require some experimentation to find an appropriate bounding box.

2. Read into R the vessel data using read_csv(), then change the variables' names as done in the lectures.

3. Produce a tibble AdminPortCount containing a tally of the numbers of vessels in each Administrative Port.

4. Use geocode() on the first column of AdminPortCount to obtain a tibble with the coordinates of each port. Then, with bind_cols(), bind together this tibble and AdminPortCount, and call the resulting tibble AdminPortCLL.
   Check that the coordinates of the ports in AdminPortCLL are all within the map's bounding box; filter() may be useful for this purpose.
   If any of the coordinates are outside the map, use geocode() again for the corresponding ports, adding ", UK" to the name of the port. Then replace the wrong coordinates in AdminPortCLL with the correct ones. Print the tibble AdminPortCLL.

5. Plot UKmap with ggmap() and use geom_point() to overlay on it red filled circles centered at the Administrative ports and with area proportional to the number of fishing vessels in each port. The size aesthetic is useful in this regard. Experiment with a few values of alpha to control the transparency of the circles. What is the effect of adding scale_size_area() to the specification of your plot?

Part B: LendingClub's data on Loans and Rejections

The general aim of this part is to produce some plots to compare the loans granted by LendingClub to the loan applications that were rejected. In order to do that, the data for Loans and Rejects, which are available in different formats, need to undergo preliminary transformations.

1. Pick a State in the U.S.; a list is available at https://en.wikipedia.org/wiki/U.S._state. Send me an email at agostino.nobile@york.ac.uk with subject "Maths Skills 2 - US State", to inform me of your choice: I may ask you to change it, if it is unsuitable or too popular. By sending the email you are not committing yourself to that choice of State (email me again to change it), or indeed to doing the Statistics assignment of Maths Skills 2.

2. Go to the LendingClub website https://www.lendingclub.com/info/download-data.action and download the Loans data and the Rejections data for the four quarters of 2018. Also download the Data Dictionary available at the bottom of the same page. It may be useful to explore the LendingClub site a bit, to get an idea of their business and of peer-to-peer lending more generally.
   The eight zipped csv files downloaded have a total size of about 170MB. Place them in a subfolder, say called DATA, of the folder where you keep the Rmd file for this project. Depending on how much disk space you have available, you may not wish to unzip all the files at once, as their inflated size is about 1.1GB.

3. For each of the zipped Rejects files:
   - Unzip the file from within R using something like system("unzip DATA/filename.csv.zip").
   - Read the data in the unzipped file into R, using read_csv().
   - Remove the unzipped file (to save disk space), with something like system("rm filename.csv").
   - Filter the data in the resulting tibble, to keep only the records where the variable State equals the two-letter code of your chosen state.
   Finally, use the function bind_rows() to collect all the data for your chosen state into a single tibble called Rejects.

4. Read the Loans data into R, following a procedure similar to the one for the Rejects. For the Loans data, you will need to deal with two additional complications:
   - the csv files contain two summary rows at the bottom which must not be read; the argument n_max in read_csv() is helpful in this respect;
   - the algorithm used by read_csv() to guess the type of each variable (chr, dbl, etc.) fails for these data if one uses the default value of the argument guess_max (more details can be found in § 11.4 of Wickham and Grolemund, 2017). To fix this, in the call to read_csv(), set guess_max to the same value as n_max.
   As for Rejects, collect all the Loans data for your chosen state in a single tibble called Loans.

5. The Loans tibble should contain 145 variables. Use select() to keep only the variables loan_amnt, title, dti, zip_code, addr_state and emp_length, renamed respectively as Amount, Title, Debt2IncomeRatio, ZipCode, State and EmploymentLength.
   Perform the same operation on the Rejects data, by selecting the same six variables (note the different names!) and renaming them as done for the Loans data.
   The variable Debt2IncomeRatio for the Rejects data is of type chr: remove the trailing % symbols and transform the variable to type dbl.
   Replace any instance of "n/a" in the character variables in the two tibbles with NA.

6. Use bind_rows() to collect together the Loans and Rejects data into a single tibble named LoanApps, with an additional character variable Granted that is equal to "Yes" for Loans, and equal to "No" for Rejects. The argument .id of bind_rows() is of help in this regard.
   Use mutate(), fct_recode() and fct_relevel() to replace, in the tibble LoanApps, the variable EmploymentLength with a factor EmploymentYears having levels 1, ..., 9, 10+.

7. Make a barplot to explore whether the proportion of granted loans is related to the loan title. Which types of loans are granted more (or less) often?
   Consider next how the proportion of granted loans varies across the levels of the factor EmploymentYears. Make a plot and comment on the results.

8. Use boxplots (and an appropriate scale) to compare the distributions of the variable Amount between granted and rejected loans. Are the variables Amount and Granted related, and if so how?
   Compare next the distributions of the variable Debt2IncomeRatio between granted and rejected loans.

9. Use geom_bar() to investigate whether the proportion of granted loans varies across ZipCodes. As some ZipCodes have small counts, use fct_lump() to restrict attention to the most common (30 or fewer) ZipCodes.
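
Steps 5 and 6 of Part B can be sketched on toy data as follows. This is a minimal sketch under stated assumptions, not a model answer: the two small tibbles and all their values are invented, they are assumed to have already been through select() and renaming, and the helper name na_clean is hypothetical. Only the employment lengths present in the toy data are recoded; with the real data every level would need to be listed in fct_recode() and fct_relevel().

```r
library(dplyr)
library(stringr)
library(forcats)

# Invented stand-ins for the two tibbles after select() and renaming (step 5).
Loans <- tibble(
  Amount = c(10000, 5000),
  Title = c("Debt consolidation", "Car financing"),
  Debt2IncomeRatio = c(18.2, 7.5),            # already dbl in the Loans data
  ZipCode = c("100xx", "112xx"),
  State = c("NY", "NY"),
  EmploymentLength = c("10+ years", "n/a")
)
Rejects <- tibble(
  Amount = c(2000, 30000),
  Title = c("n/a", "Debt consolidation"),
  Debt2IncomeRatio = c("45.1%", "120%"),      # chr with a trailing % sign
  ZipCode = c("104xx", "100xx"),
  State = c("NY", "NY"),
  EmploymentLength = c("1 year", "3 years")
)

# Strip the trailing "%" and convert the ratio to dbl.
Rejects <- Rejects %>%
  mutate(Debt2IncomeRatio = as.numeric(str_remove(Debt2IncomeRatio, "%$")))

# Hypothetical helper: turn "n/a" into NA in every character column.
na_clean <- function(df) {
  mutate(df, across(where(is.character), ~ na_if(.x, "n/a")))
}

# Named arguments supply the values of the .id column (step 6).
LoanApps <- bind_rows(
  Yes = na_clean(Loans),
  No = na_clean(Rejects),
  .id = "Granted"
)

# Replace EmploymentLength with an ordered-by-level factor EmploymentYears.
# Only the levels that occur in the toy data are listed here.
LoanApps <- LoanApps %>%
  mutate(
    EmploymentYears = EmploymentLength %>%
      fct_recode("1" = "1 year", "3" = "3 years", "10+" = "10+ years") %>%
      fct_relevel("1", "3", "10+")
  ) %>%
  select(-EmploymentLength)

print(LoanApps)
```

Passing the tibbles to bind_rows() as named arguments is what makes Granted take the values "Yes" and "No"; without names the .id column would hold "1" and "2" and would need recoding afterwards. The ratio has to be converted to dbl before binding, since bind_rows() cannot combine a chr column with a dbl column of the same name.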

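
For Part A, the tally, the bounding-box check and the coordinate replacement of steps 3 and 4, plus the point layer of step 5, can be sketched without any network access. Everything concrete here is an assumption made for illustration: the column name AdminPort, the toy vessel records, the bounding box, and the coordinates, which stand in for the output of geocode() (one port is deliberately misplaced). The plot uses a plain ggplot() base layer where the assignment would use ggmap(UKmap).

```r
library(dplyr)
library(ggplot2)

# Invented vessel records; AdminPort is a hypothetical column name.
Vessels <- tibble(
  AdminPort = c("Brixham", "Brixham", "Lerwick", "Newlyn", "Newlyn", "Newlyn")
)

# Step 3: one row per port with the number of vessels in column n.
AdminPortCount <- count(Vessels, AdminPort)

# Stand-in for geocode(): illustrative coordinates in the same row order
# as AdminPortCount, with Newlyn deliberately placed far outside the UK.
PortLL <- tibble(lon = c(-3.5, -1.1, -75.0), lat = c(50.4, 60.2, 40.0))

# Step 4: attach the coordinates to the counts.
AdminPortCLL <- bind_cols(AdminPortCount, PortLL)

# Hypothetical bounding box for the map.
bbox <- c(left = -11, bottom = 49, right = 3, top = 61.5)

# Ports whose coordinates fall outside the bounding box.
outside <- AdminPortCLL %>%
  filter(
    !between(lon, bbox[["left"]], bbox[["right"]]) |
      !between(lat, bbox[["bottom"]], bbox[["top"]])
  )
print(outside)

# Stand-in for the second geocode() call on "Newlyn, UK".
fixed <- tibble(AdminPort = "Newlyn", lon = -5.5, lat = 50.1)

# Overwrite the wrong coordinates, matching rows on the port name.
AdminPortCLL <- rows_update(AdminPortCLL, fixed, by = "AdminPort")
print(AdminPortCLL)

# Step 5, without the map: red circles sized by the vessel count.
# scale_size_area() maps a count of zero to a circle of zero size,
# so that circle area is proportional to the count.
p <- ggplot(AdminPortCLL, aes(x = lon, y = lat, size = n)) +
  geom_point(colour = "red", alpha = 0.5) +
  scale_size_area()
```

rows_update() is one way to replace the coordinates; it requires a dplyr version that provides it (1.0.0 or later) and matches rows by the key column, so the row order of the corrected tibble does not matter.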