[28] R for Data Science: Workflow: Projects

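This post is excerpted from R for Data Science. It mainly covers setting R's working directory in RStudio, which lets you designate a working folder and read files quickly. It also shows how to save a PDF with ggsave(). The original text follows: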

One day you will need to quit R, go do something else, and return to your analysis the next day. One day you will be working on multiple analyses simultaneously that all use R and you want to keep them separate. One day you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world. To handle these real-life situations, you need to make two decisions:

  1. What about your analysis is “real,” i.e., what will you save as your lasting record of what happened?
  2. Where does your analysis “live”?

What Is Real?

As a beginning R user, it’s OK to consider your environment (i.e., the objects listed in the environment pane) “real.” However, in the long run, you’ll be much better off if you consider your R scripts as “real.”
With your R scripts (and your data files), you can re-create the environment.
It’s much harder to re-create your R scripts from your environment! You’ll either have to retype a lot of code from memory (making mistakes all the way) or you’ll have to carefully mine your R history.
To foster this behavior, I highly recommend that you instruct RStudio not to preserve your workspace between sessions:


image.png

This will cause you some short-term pain, because now when you restart RStudio it will not remember the results of the code that you ran last time. But this short-term pain will save you long-term agony because it forces you to capture all important interactions in your code. There’s nothing worse than discovering three months after the fact that you’ve only stored the results of an important calculation in your workspace, not the calculation itself in your code.
There is a great pair of keyboard shortcuts that will work together to make sure you’ve captured the important parts of your code in the editor:
• Press Cmd/Ctrl-Shift-F10 to restart the R session in RStudio.
• Press Cmd/Ctrl-Shift-S to rerun the current script.
I use this pattern hundreds of times a week.
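
As a minimal sketch of why the script, rather than the environment, is the lasting record: the base R code below writes a tiny stand-in script to a temporary file and runs it in an empty environment, much as you would after a restart. The script contents and the names x, total, and fresh are made up for illustration.

```r
# Write a tiny script to a temporary file (a stand-in for a saved .R file).
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script)

# Run the script in a fresh, empty environment, as if after a restart.
fresh <- new.env()
source(script, local = fresh)

# Every object is back, rebuilt from code alone.
ls(fresh)
#> [1] "total" "x"
fresh$total
#> [1] 55
```

Going the other way, from the objects back to the code that made them, has no equivalent one-liner.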

Where Does Your Analysis Live?

R has a powerful notion of the working directory. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. RStudio shows your current working directory at the top of the console:


image.png

And you can print this out in R code by running getwd():

getwd()
#> [1] "/Users/hadley/Documents/r4ds/r4ds"
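
To illustrate what the working directory means for file access, here is a small sketch in base R. It assumes you can write to the current working directory, and the file name wd-demo.txt is hypothetical. A relative path and the absolute path built from getwd() reach the same file.

```r
# Hypothetical file name, given as a relative path.
rel_path <- "wd-demo.txt"
writeLines("hello from the working directory", rel_path)

# Build the equivalent absolute path from the working directory.
abs_path <- file.path(getwd(), rel_path)

# Both paths point at the same file, so they read back the same content.
same_content <- identical(readLines(rel_path), readLines(abs_path))
same_content
#> [1] TRUE

# Clean up the demo file.
unlink(rel_path)
```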

As a beginning R user, it’s OK to let your home directory, documents directory, or any other weird directory on your computer be R’s working directory. But you’re six chapters into this book, and you’re no longer a rank beginner. Very soon now you should evolve to organizing your analytical projects into directories and, when working on a project, setting R’s working directory to the associated directory.
I do not recommend it, but you can also set the working directory from within R:

setwd("/path/to/my/CoolProject")

But you should never do this because there’s a better way; a way that also puts you on the path to managing your R work like an expert.

Paths and Directories

Paths and directories are a little complicated because there are two basic styles of paths: Mac/Linux and Windows. There are three chief ways in which they differ:
• The most important difference is how you separate the components of the path. Mac and Linux use slashes (e.g., plots/diamonds.pdf) and Windows uses backslashes (e.g., plots\diamonds.pdf). R can work with either type (no matter what platform you’re currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes! That makes life frustrating, so I recommend always using the Mac/Linux style with forward slashes.
• Absolute paths (i.e., paths that point to the same place regardless of your working directory) look different. In Windows they start with a drive letter (e.g., C:) or two backslashes (e.g., \\servername) and in Mac/Linux they start with a slash “/” (e.g., /users/hadley). You should never use absolute paths in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
• The last minor difference is the place that ~ points to. ~ is a convenient shortcut to your home directory. Windows doesn’t really have the notion of a home directory, so it instead points to your documents directory.
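
The points above can be sketched in base R. The helper is_absolute_path() is hypothetical, not a base R function, and it only checks the three prefixes named in the list (drive letter, two backslashes, leading slash). The share name in the example is also made up.

```r
# Typing two backslashes in the source gives one backslash in the string.
win_path <- "plots\\diamonds.pdf"
nchar(win_path)
#> [1] 18

# file.path() joins components with forward slashes on every platform.
unix_path <- file.path("plots", "diamonds.pdf")
unix_path
#> [1] "plots/diamonds.pdf"

# Convert a Windows-style path to forward slashes.
converted <- gsub("\\", "/", win_path, fixed = TRUE)

# Hypothetical helper: TRUE if a path starts with a drive letter,
# two backslashes, or a slash.
is_absolute_path <- function(path) {
  grepl("^([A-Za-z]:|\\\\\\\\|/)", path)
}

examples <- c("C:/Users/hadley", "\\\\servername\\share",
              "/users/hadley", "plots/diamonds.pdf")
is_absolute_path(examples)
#> [1]  TRUE  TRUE  TRUE FALSE

# ~ expands to the home directory (the documents directory on Windows).
home <- path.expand("~")
```

A check like this could be run over the paths in a script before sharing it, to catch absolute paths that would break on someone else's machine.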

RStudio Projects

R experts keep all the files associated with a project together—input data, R scripts, analytical results, figures. This is such a wise and common practice that RStudio has built-in support for this via projects.
Let’s make a project for you to use while you’re working through the rest of this book. Click File → New Project, then:


image.png

image.png

image.png

Call your project r4ds and think carefully about which subdirectory you put the project in. If you don’t store it somewhere sensible, it will be hard to find it in the future!
Once this process is complete, you’ll get a new RStudio project just for this book. Check that the “home” directory of your project is the current working directory:

getwd()
#> [1] "/Users/hadley/Documents/r4ds/r4ds"

Whenever you refer to a file with a relative path it will look for it here.
Now enter the following commands in the script editor, and save the file, calling it diamonds.R. Next, run the complete script, which will save a PDF and CSV file into your project directory. Don’t worry about the details; you’ll learn them later in the book:

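library(tidyverse)
# Hexagonal-bin plot of price against carat (geom_hex() needs the hexbin package)
ggplot(diamonds, aes(carat, price)) +
  geom_hex()
# ggsave() saves the most recently displayed plot by default
ggsave("diamonds.pdf")
# Write the data to a CSV file in the working directory
write_csv(diamonds, "diamonds.csv")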

Quit RStudio. Inspect the folder associated with your project and notice the .Rproj file. Double-click that file to reopen the project. Notice you get back to where you left off: it’s the same working directory and command history, and all the files you were working on are still open. Because you followed my instructions above, you will, however, have a completely fresh environment, guaranteeing that you’re starting with a clean slate.
In your favorite OS-specific way, search your computer for diamonds.pdf and you will find the PDF (no surprise) but also the script that created it (diamonds.R). This is a huge win! One day you will want to remake a figure or just understand where it came from. If you rigorously save figures to files with R code and never with the mouse or the clipboard, you will be able to reproduce old work with ease!
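
The same habit works outside ggplot2. As a hedged sketch, the base R code below saves a figure and a data file purely from code, using pdf()/dev.off() and write.csv() in place of ggsave() and write_csv(). The built-in mtcars data and the file names mtcars-scatter.pdf and mtcars.csv are assumptions for illustration; both files land in the working directory because the paths are relative.

```r
# Open a PDF graphics device; the relative path resolves against getwd().
pdf("mtcars-scatter.pdf", width = 6, height = 4)

# Draw the figure: fuel economy against weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Close the device so the file is written to disk.
dev.off()

# Save the data behind the figure alongside it.
write.csv(mtcars, "mtcars.csv")

saved <- file.exists(c("mtcars-scatter.pdf", "mtcars.csv"))
saved
#> [1] TRUE TRUE
```

Because every step is in the script, rerunning it regenerates both files exactly.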

Summary

In summary, RStudio projects give you a solid workflow that will serve you well in the future:
• Create an RStudio project for each data analysis project.
• Keep data files there; we’ll talk about loading them into R in Chapter 8.
• Keep scripts there; edit them, and run them in bits or as a whole.
• Save your outputs (plots and cleaned data) there.
• Only ever use relative paths, not absolute paths.
Everything you need is in one place, and cleanly separated from all the other projects that you are working on.
