[Reading notes: r4ds] 19 Functions

III. Program

19 Functions

When should you write a function?

- When you find yourself using the same code several times, consider writing a function.
- Three key steps to writing a function:

  • Name: pick a name for the function.
  • Arguments: list the inputs, or arguments, to the function inside function().
  • Body: place the code you have developed in the body of the function, a { block that immediately follows function(...).

This is an important part of the “do not repeat yourself” (or DRY) principle.

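Writing a function instead of copying and pasting has three big advantages:

  • You can give the function an evocative name that makes your code easier to understand.
  • As requirements change, you only need to update code in one place instead of many.
  • You eliminate the chance of making incidental mistakes when you copy and paste.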
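The practice questions below refer to rescale01(), the chapter's running example of replacing copy-and-pasted code with a function. It is reproduced here as a minimal reference version. The test values are illustrative only:

```r
# Rescale a numeric vector to the range [0, 1].
# range() is computed once and reused, instead of calling min() and max() separately.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))
#> [1] 0.0 0.5 1.0
```

All the duplicated arithmetic now lives in one place. Changing the behaviour (for example, handling infinite values) means editing only this body.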

19.2.1 Practice

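  1. Why is TRUE not a parameter to rescale01()? What would happen if x contained a single missing value, and na.rm was FALSE?
    The value TRUE never needs to change, so it is hard-coded rather than exposed as a parameter. If x contained a single NA and na.rm were FALSE, range() would return c(NA, NA), so every element of the result would be NA.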
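  2. In the second variant of rescale01(), infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0, and Inf is mapped to 1.
rescale02 <- function(x) {
  # Compute the range from finite values only, so -Inf/Inf don't dominate it
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  # Infinite inputs are still infinite after rescaling; map them to the ends
  y[y == -Inf] <- 0
  y[y == Inf] <- 1
  y
}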
  3. Practice turning the following code snippets into functions. Think about what each function does. What would you call it? How many arguments does it need? Can you rewrite it to be more expressive or less duplicative?

    mean(is.na(x))
    
    x / sum(x, na.rm = TRUE)
    
    sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
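Each snippet can be wrapped as follows. The names prop_na(), prop_of_total() and coef_variation() are only suggestions. Making na.rm an argument, rather than hard-coding TRUE, is a design choice:

```r
# Proportion of missing values in a vector
prop_na <- function(x) {
  mean(is.na(x))
}

# Each value as a proportion of the (non-missing) total
prop_of_total <- function(x, na.rm = TRUE) {
  x / sum(x, na.rm = na.rm)
}

# Coefficient of variation: standard deviation relative to the mean
coef_variation <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

prop_na(c(1, NA, 3, NA))
#> [1] 0.5
```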
    
  4. Follow http://nicercode.github.io/intro/writing-functions.html to write your own functions to compute the variance and skew of a numeric vector.
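Here is one possible sketch. The variance is the usual sample variance with an n - 1 denominator. For skewness it uses the third central moment divided by the second central moment raised to 3/2. That is one of several common definitions, and others add small-sample corrections, so treat this as an assumption rather than the linked tutorial's exact formula:

```r
# Sample variance: sum of squared deviations divided by (n - 1)
variance <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)
}

# Skewness: third central moment over the second central moment^(3/2)
skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

variance(c(1, 2, 3, 4))   # should agree with var()
skewness(c(1, 1, 1, 10))  # positive: long right tail
```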

  5. Write both_na(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.

both_na <- function(x, y) {
  # Fail loudly instead of printing a message and returning something odd
  if (length(x) != length(y)) {
    stop("`x` and `y` must have the same length.", call. = FALSE)
  }
  # TRUE where both are missing; sum() counts the TRUEs.
  # (which(is.na(x) & is.na(y)) would return the positions instead.)
  sum(is.na(x) & is.na(y))
}
  6. What do the following functions do? Why are they useful even though they are so short?

    is_directory <- function(x) file.info(x)$isdir
    is_readable <- function(x) file.access(x, 4) == 0
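Both are thin wrappers. is_directory() pulls the isdir column out of file.info(). is_readable() checks whether file.access() with mode 4 (read permission) returns 0, which means success. They are useful because the name states the intent: is_readable(path) reads better than file.access(path, 4) == 0, which forces the reader to remember what mode 4 means. A quick check against a temporary directory and file (tmp_dir and tmp_file are illustrative names):

```r
is_directory <- function(x) file.info(x)$isdir
is_readable <- function(x) file.access(x, 4) == 0

# Create a temporary file to test against
tmp_dir <- tempdir()
tmp_file <- tempfile(fileext = ".txt")
writeLines("hello", tmp_file)

is_directory(tmp_dir)   # TRUE: tempdir() is a directory
is_directory(tmp_file)  # FALSE: a regular file
is_readable(tmp_file)   # TRUE for a file we just created (result is named by path)
```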
    
  7. Read the complete lyrics to “Little Bunny Foo Foo”. There’s a lot of duplication in this song. Extend the initial piping example to recreate the complete song, and use functions to reduce the duplication.

19.3 Functions are for humans and computers

Tips: naming functions and commenting code

  • Ideally, a function name is short but clearly evokes what the function does. However, it's better to be clear than short.
  • Generally, function names should be verbs, and arguments should be nouns.
  • For multi-word names, use snake_case or camelCase, and use it consistently.
  • Give a family of related functions a common prefix (e.g. stringr's str_ functions).
  • Avoid overriding existing functions and variables.
  • Use comments, lines starting with #, to explain the “why” of your code.
  • Use long lines of - and = to break your file into easily readable chunks.

19.3.1 Exercises

  1. Read the source code for each of the following three functions, puzzle out what they do, and then brainstorm better names.
### Check whether `string` starts with `prefix`
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}
### Drop the last element of a vector (NULL if it has 0 or 1 elements)
f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}
### Repeat y until it matches the length of x
f3 <- function(x, y) {
  rep(y, length.out = length(x))
}

f1: prefix_check
f2: vector_del
f3: rep_as_length
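Here is a second round of name ideas, written out so the new names can be tried directly. has_prefix(), drop_last() and recycle_to_length() are alternative suggestions, not established names. Note that base R (3.3.0 and later) already provides startsWith(), which does the same job as f1:

```r
# f1: does `string` start with `prefix`?
has_prefix <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}

# f2: drop the last element; NULL for vectors of length 0 or 1
drop_last <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}

# f3: recycle `y` to the length of `x`
recycle_to_length <- function(x, y) {
  rep(y, length.out = length(x))
}

has_prefix(c("apple", "banana"), "ap")  # TRUE FALSE
recycle_to_length(1:5, c("a", "b"))     # "a" "b" "a" "b" "a"
```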

  2. Take a function that you’ve written recently and spend 5 minutes brainstorming a better name for it and its arguments.

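  3. Compare and contrast rnorm() and MASS::mvrnorm(). How could you make them more consistent?
    Their arguments are named differently: mean and sd in rnorm(), but mu and Sigma (a covariance matrix) in MASS::mvrnorm(). Using the same name for the same concept (e.g. mean in both) and a common naming scheme such as norm_r() and norm_mvr() would make them more consistent.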

  4. Make a case for why norm_r(), norm_d() etc would be better than rnorm(), dnorm(). Make a case for the opposite.

19.4 Conditional execution

  • if
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
  • condition
    -- The condition must evaluate to either TRUE or FALSE.
    -- Use || (or) and && (and) to combine multiple logical expressions.
    -- Don't use | or & in an if statement: they are vectorised. If you do have a logical vector, you can use any() or all() to collapse it to a single value.
    -- Be careful with ==: it is vectorised, so it can return more than one logical value.
    -- identical() is strict and always returns a single TRUE or FALSE.
    -- dplyr::near() compares numbers approximately, within a tolerance.
  • Multiple conditions
if (this) {
  # do that
} else if (that) {
  # do something else
} else {
  # 
}
  • switch(): evaluates selected code based on position or name.
calc <- function(x, y, op) {
  switch(op,
    plus = x + y,
    minus = x - y,
    times = x * y,
    divide = x / y,
    stop("Unknown op!")
  )
}
  • cut(). It’s used to discretise continuous variables.
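The rules for conditions listed above can be checked in a few lines. This sketch uses base R's all.equal() for the tolerance comparison so that it runs without dplyr. dplyr::near() serves the same purpose in the tidyverse:

```r
x <- c(1, 2, 3)

# `==` is vectorised: one logical per element, so it is not safe inside if()
x == 2
#> [1] FALSE  TRUE FALSE

# any() / all() collapse a logical vector to a single TRUE/FALSE for if()
if (any(x == 2)) message("x contains a 2")

# identical() always returns a single TRUE or FALSE
identical(x, c(1, 2, 3))
#> [1] TRUE

# Floating-point arithmetic is inexact, so compare with a tolerance
sqrt(2)^2 == 2
#> [1] FALSE
isTRUE(all.equal(sqrt(2)^2, 2))
#> [1] TRUE

# && short-circuits: the right-hand side is only evaluated if needed,
# so y > 0 is never computed when y is NULL
y <- NULL
if (!is.null(y) && y > 0) "positive" else "not positive"
#> [1] "not positive"
```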

19.4.3 Code style

  • Both if and function should (almost) always be followed by squiggly brackets ({}), and the contents should be indented by two spaces.
  • An opening curly brace { should never go on its own line and should always be followed by a new line.
  • A closing curly brace } should always go on its own line, unless it’s followed by else.

19.4.4 Exercises

  1. What’s the difference between if and ifelse()? Carefully read the help and construct three examples that illustrate the key differences.
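    1) ifelse() is vectorised. It returns a vector the same length as its test, picking an element from yes or no for each position. if evaluates a single TRUE/FALSE condition and runs a whole block of code, which can return anything (a vector, a data frame) or nothing at all.
    2) if can be chained with else if to handle multiple conditions. ifelse() only chooses between two values per element, so multiple conditions require nested ifelse() calls.
    3) ifelse() returns NA where the test is NA, while if (NA) is an error.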
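Here are three small examples that illustrate those differences. The error-versus-warning detail depends on the R version: since R 4.2.0 a condition of length greater than one is an error, and older versions only warn. The sketch catches either:

```r
x <- c(-2, 0, 3)

# 1) ifelse() is vectorised: one result per element of the test
ifelse(x > 0, "positive", "not positive")
#> [1] "not positive" "not positive" "positive"

# 2) if() needs a single TRUE/FALSE; a longer condition is rejected
res <- tryCatch(
  if (x > 0) "positive" else "not positive",
  error = function(e) "condition error",
  warning = function(w) "condition warning"
)
res

# 3) Missing values: ifelse() propagates NA, if() stops with an error
ifelse(NA, "yes", "no")
#> [1] NA
na_res <- tryCatch(if (NA) "yes" else "no", error = function(e) "error")
na_res
#> [1] "error"
```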

  2. Write a greeting function that says “good morning”, “good afternoon”, or “good evening”, depending on the time of day. (Hint: use a time argument that defaults to lubridate::now(). That will make it easier to test your function.)

greeting <- function(time = lubridate::now()) {
  # Taking `time` as an argument (default: now) makes the function testable
  hr <- lubridate::hour(time)
  if (hr >= 5 && hr < 12) {
    "Good morning!"
  } else if (hr >= 12 && hr < 18) {
    "Good afternoon!"
  } else {
    "Good evening!"
  }
}
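If lubridate is not installed, the same logic can be written in base R. This variant assumes Sys.time() as the default and extracts the hour with format(time, "%H"). Passing a fixed time shows why the hint suggests a time argument: the function becomes easy to test:

```r
# Base-R variant: `time` defaults to the current time
greeting <- function(time = Sys.time()) {
  hr <- as.integer(format(time, "%H"))  # hour of day, 0-23
  if (hr >= 5 && hr < 12) {
    "Good morning!"
  } else if (hr >= 12 && hr < 18) {
    "Good afternoon!"
  } else {
    "Good evening!"
  }
}

greeting(as.POSIXct("2024-01-01 08:30:00"))
#> [1] "Good morning!"
```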
  3. Implement a fizzbuzz function. It takes a single number as input. If the number is divisible by three, it returns “fizz”. If it’s divisible by five it returns “buzz”. If it’s divisible by three and five, it returns “fizzbuzz”. Otherwise, it returns the number. Make sure you first write working code before you create the function.
fizzbuzz <- function(x){
  if (x%%3==0&& x%%5==0){
    "fizzbuzz"
  } else if(x%%3==0&& x%%5!=0){
    "fizz"
  } else if(x%%3!=0&& x%%5==0){
    "buzz"
  } else{x}
}
### Using switch()
fizzbuzz2 <- function(x) {
  # Build a key: "a", plus "b" if divisible by 3, plus "c" if divisible by 5
  key <- "a"
  if (x %% 3 == 0) key <- paste0(key, "b")
  if (x %% 5 == 0) key <- paste0(key, "c")
  switch(key,
         a   = x,
         ab  = "fizz",
         ac  = "buzz",
         abc = "fizzbuzz")
}
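Both versions above handle one number at a time. A vectorised variant can label a whole vector using logical indexing. The name fizzbuzz_vec() is hypothetical, and the result is a character vector because a single vector cannot hold both numbers and strings:

```r
# Vectorised fizzbuzz: works on a whole vector at once
fizzbuzz_vec <- function(x) {
  out <- as.character(x)
  fizz <- x %% 3 == 0
  buzz <- x %% 5 == 0
  out[fizz & !buzz] <- "fizz"
  out[buzz & !fizz] <- "buzz"
  out[fizz & buzz] <- "fizzbuzz"
  out
}

fizzbuzz_vec(1:15)
```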
  4. How could you use cut() to simplify this set of nested if-else statements?
if (temp <= 0) {
  "freezing"
} else if (temp <= 10) {
  "cold"
} else if (temp <= 20) {
  "cool"
} else if (temp <= 30) {
  "warm"
} else {
  "hot"
}
## Using cut(): each interval (a, b] replaces one branch of the if-else chain
temp_label <- function(temp) {
  cut(temp,
      breaks = c(-Inf, 0, 10, 20, 30, Inf),
      labels = c("freezing", "cold", "cool", "warm", "hot"))
}

How would you change the call to cut() if I’d used < instead of <=? What is the other chief advantage of cut() for this problem? (Hint: what happens if you have many values in temp?)

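## Use right = FALSE: intervals become [a, b), including the left boundary and
## excluding the right one, which matches `<` instead of `<=`.
## The other chief advantage: cut() is vectorised, so it labels many temps at once,
## whereas if only handles a single value.
temp_label_lt <- function(temp) {
  cut(temp,
      breaks = c(-Inf, 0, 10, 20, 30, Inf),
      labels = c("freezing", "cold", "cool", "warm", "hot"),
      right = FALSE)
}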
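The boundary behaviour and the vectorisation are easiest to see on a small vector. The temperatures below are made-up test values, and as.character() just turns the factor returned by cut() into plain strings:

```r
temps <- c(-5, 0, 10, 15, 25, 35)
breaks <- c(-Inf, 0, 10, 20, 30, Inf)
labels <- c("freezing", "cold", "cool", "warm", "hot")

# right = TRUE (default): intervals (a, b], matching the `<=` chain
as.character(cut(temps, breaks = breaks, labels = labels))
# -> freezing, freezing, cold, cool, warm, hot

# right = FALSE: intervals [a, b), matching a `<` chain
as.character(cut(temps, breaks = breaks, labels = labels, right = FALSE))
# -> freezing, cold, cool, cool, warm, hot
```

Only the values that fall exactly on a break (0 and 10 here) change label between the two calls.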
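  5. What happens if you use switch() with numeric values?
    You don't need `=` to name the alternatives: switch() picks the alternative by its numeric position, ignoring any names. If the number is out of range, it returns NULL invisibly.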
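Here is a short check of the numeric form. The alternatives are arbitrary strings:

```r
# With a number, switch() returns the n-th alternative
switch(2, "a", "b", "c")
#> [1] "b"

# Names are ignored when EXPR is numeric
switch(1, a = "apple", b = "banana")
#> [1] "apple"

# Out of range: no error, just an invisible NULL
res <- switch(4, "a", "b", "c")
is.null(res)
#> [1] TRUE
```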
  6. What does this switch() call do? What happens if x is “e”?
    Experiment, then carefully read the documentation.
switch(x, 
  a = ,
  b = "ab",
  c = ,
  d = "cd",
)

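With x = "a" or "b" the call returns "ab", and with "c" or "d" it returns "cd": an empty alternative falls through to the next non-empty one. With x = "e" nothing matches, so switch() returns NULL invisibly and nothing is printed.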
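Wrapping the call in a function makes the fall-through behaviour testable. The function name ab_or_cd() is hypothetical. The trailing comma from the exercise is left out here to keep the example minimal:

```r
# Empty alternatives "fall through" to the next non-empty one
ab_or_cd <- function(x) {
  switch(x,
    a = ,
    b = "ab",
    c = ,
    d = "cd"
  )
}

ab_or_cd("a")  # "ab": a falls through to b
ab_or_cd("c")  # "cd": c falls through to d
ab_or_cd("e")  # no match and no default: invisible NULL, so nothing prints
```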

19.5 Function arguments

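Arguments generally fall into two broad sets:

  • data: the data the function computes on.
  • details: arguments that control the details of the computation.

Data arguments should come first. Detail arguments go at the end and should usually have default values.

  • The default value should almost always be the most common value. The few exceptions are for safety. For example, na.rm defaults to FALSE because missing values deserve your attention, even though in everyday computation you often want na.rm = TRUE. Making na.rm = TRUE the default just for convenience is not a good idea.
  • When you call a function, you usually omit the names of the data arguments. Detail arguments can be left out when you want the default. If you override a default, use the full argument name.
  • R allows argument names to be partially matched by a unique prefix, but avoid this because it is easily confused.
  • When calling a function, put spaces around = and after commas. This makes the code easier to read.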
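The r4ds chapter illustrates the data-then-details order with a confidence-interval function along these lines (a normal-approximation sketch; the random input is only for demonstration):

```r
# `x` is the data argument (first, no default);
# `conf` is a detail argument (last, with a sensible default)
mean_ci <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - conf
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)
x <- runif(100)
mean_ci(x)               # data passed by position, default conf
mean_ci(x, conf = 0.99)  # overriding a detail: use the full name
```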

19.5.1 Choosing names

  • Good argument names make code easier to understand. Balance readability against length.
  • There are a handful of very common, very short names. It’s worth memorising these:
    • x, y, z: vectors.
    • w: a vector of weights.
    • df: a data frame.
    • i, j: numeric indices (typically rows and columns).
    • n: length, or number of rows.
    • p: number of columns.
  • Consider matching the argument names used by existing R functions, e.g. use na.rm to decide whether missing values should be removed.

19.5.2 Checking values

  • It’s good practice to check important preconditions, and throw an error (with stop()) if they are not true.
  • There’s a tradeoff between how much time you spend making your function robust, versus how long you spend writing it. Less important arguments may not need checking.
  • A compromise is the built-in stopifnot(): it checks that each argument is TRUE, and produces a generic error message if not.
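Here both styles of checking are applied to a weighted mean. The names wt_mean() and wt_mean2() are illustrative. The first gives a custom message, and the second trades message quality for brevity:

```r
# Explicit check with a custom error message
wt_mean <- function(x, w) {
  if (length(x) != length(w)) {
    stop("`x` and `w` must be the same length", call. = FALSE)
  }
  sum(w * x) / sum(w)
}

# stopifnot(): shorter, but with a generic error message
wt_mean2 <- function(x, w, na.rm = FALSE) {
  stopifnot(is.logical(na.rm), length(na.rm) == 1)
  stopifnot(length(x) == length(w))
  if (na.rm) {
    keep <- !is.na(x)
    x <- x[keep]
    w <- w[keep]
  }
  sum(w * x) / sum(w)
}

wt_mean(1:3, c(1, 1, 2))
#> [1] 2.25
```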

19.5.3 Dot-dot-dot (…)

  • Many functions in R take an arbitrary number of inputs. They rely on a special argument: ... (pronounced dot-dot-dot).
  • This is useful because you can send those ... on to another function, which lets that function handle the arguments you don't want to deal with yourself.
  • It’s a very convenient technique, but it comes at a price: any misspelled arguments will not raise an error. This makes it easy for typos to go unnoticed.
x <- c(1, 2)
sum(x, na.mr = TRUE)
#> [1] 4
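### Can you spot where the error comes from?
## na.rm is misspelled as na.mr, so sum() treats na.mr = TRUE as one more value in ...; TRUE counts as 1, giving 1 + 2 + 1 = 4.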
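One way to make such typos loud is to reject named arguments that land in `...`. The wrapper safe_sum() below is a hypothetical helper, not part of base R. It only makes sense for functions like sum() where every legitimate `...` input is unnamed:

```r
# Hypothetical helper: like sum(), but rejects named arguments in `...`
# so that a typo such as na.mr = TRUE raises an error instead of being summed.
safe_sum <- function(..., na.rm = FALSE) {
  dot_names <- names(list(...))
  if (!is.null(dot_names) && any(dot_names != "")) {
    stop("Unknown argument(s): ",
         paste(dot_names[dot_names != ""], collapse = ", "),
         call. = FALSE)
  }
  sum(..., na.rm = na.rm)
}

safe_sum(c(1, 2), na.rm = TRUE)
#> [1] 3
try(safe_sum(c(1, 2), na.mr = TRUE))  # error: Unknown argument(s): na.mr
```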

19.5.4 Lazy evaluation
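Arguments in R are lazily evaluated: they are not computed until they are needed, so an argument that is never used is never evaluated. Here is a minimal illustration, with hypothetical function names:

```r
# `y` is never used, so the stop() call passed to it is never run
first_only <- function(x, y) {
  x
}
first_only(1, stop("this is never evaluated"))
#> [1] 1

# force() evaluates an argument immediately, e.g. to surface errors early
first_checked <- function(x, y) {
  force(y)
  x
}
```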

19.5.5 Exercises

  1. What does commas(letters, collapse = "-") do? Why?
commas(letters, collapse = "-")
# Error in stringr::str_c(..., collapse = "- ") : 
##   formal argument "collapse" matched by multiple actual arguments

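This fails because commas() already passes collapse = ", " to str_c(). Supplying collapse = "-" as well sends a second collapse through ..., so str_c() receives two values for the same formal argument and raises an error.
One fix is to let ... carry collapse on its own: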

commas <- function(...) stringr::str_c(...)
commas(letters, collapse="-") 
[1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

Note: if the body of commas() hard-codes collapse = ", ", an extra collapse argument on commas() is captured by commas() itself and never reaches str_c(), so changing it has no effect:

commas <- function(..., collapse = ", ") stringr::str_c(..., collapse = ", ")
> commas(letters)
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
> commas(letters, collapse = "-")
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"

No intermediate variable is needed. Forward the argument explicitly by name instead:

commas <- function(..., collapse = ", ") {
  stringr::str_c(..., collapse = collapse)
}
> commas(letters, collapse = "-")
[1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
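The same pattern also works without stringr. This base-R sketch swaps in paste() for stringr::str_c(). One difference to keep in mind: paste() turns NA into the string "NA", while str_c() propagates missing values:

```r
# Base-R version: forward `collapse` explicitly to paste()
commas <- function(..., collapse = ", ") {
  paste(..., collapse = collapse)
}

commas(letters[1:5])
#> [1] "a, b, c, d, e"
commas(letters[1:5], collapse = "-")
#> [1] "a-b-c-d-e"
```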
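  2. It’d be nice if you could supply multiple characters to the pad argument, e.g. rule("Title", pad = "-+"). Why doesn’t this currently work? How could you fix it?
    It doesn't work because str_dup() repeats pad `width` times regardless of how many characters pad has, so a two-character pad produces a line twice as wide as intended. Dividing width by the length of pad fixes this:
rule <- function(..., pad = "-") {
  title <- paste0(...)
  width <- getOption("width") - nchar(title) - 5
  # Repeat pad only as many times as fits in the remaining width
  cat(title, " ", stringr::str_dup(pad, width %/% stringr::str_length(pad)), "\n", sep = "")
}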
rule("Important output",pad="+-")
Important output +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
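Integer division can leave the line a character or two short when the width is not a multiple of the pad length. The base-R variant below repeats the pad enough times and then trims it to the exact width. The extra width argument, which defaults to the console width, is a hypothetical addition that makes the function testable:

```r
rule <- function(..., pad = "-", width = getOption("width")) {
  title <- paste0(...)
  n <- max(width - nchar(title) - 5, 0)
  # strrep() repeats pad enough times; substr() trims it to exactly n characters
  line <- substr(strrep(pad, ceiling(n / nchar(pad))), 1, n)
  cat(title, " ", line, "\n", sep = "")
  invisible(line)
}

rule("Title", pad = "-+", width = 30)
```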
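  3. What does the trim argument to mean() do? When might you use it?
    trim is the fraction (0 to 0.5) of observations trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
    A mean computed this way is called a trimmed mean. A common choice in statistics is to drop the highest 5% and the lowest 5% of the data. The proportion can vary with the need, but the same proportion is removed from both ends.
    The purpose is to reduce the influence of extreme high and low values, so that the mean better represents the bulk of the data.
    A typical example is scoring in judged sports such as gymnastics: the highest and the lowest judge's score are dropped, and the remaining scores are averaged to give the final score.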
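Here is a made-up vector with a single outlier to show the effect. With 10 values and trim = 0.1, mean() drops floor(10 * 0.1) = 1 value from each end before averaging:

```r
x <- c(1:9, 100)   # one extreme value

mean(x)
#> [1] 14.5
# trim = 0.1 drops the smallest (1) and largest (100) values
mean(x, trim = 0.1)
#> [1] 5.5
# trim = 0.5 is the extreme case and returns the median
mean(x, trim = 0.5) == median(x)
#> [1] TRUE
```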
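  4. The default value for the method argument to cor() is c("pearson", "kendall", "spearman"). What does that mean? What value is used by default?
  • Pearson correlation coefficient: the simplest way to measure the relationship between a feature and a response. It measures the linear association between variables.
  • Spearman correlation coefficient, also called Spearman's rank correlation. A "rank" is simply an ordering, so it is computed from the positions of the values when sorted rather than from the raw values.
  • Kendall correlation coefficient, also called Kendall's rank correlation (tau). It is also a rank correlation, computed by counting concordant and discordant pairs of observations, which makes it suitable for ordinal data.
    The vector lists the allowed values. cor() resolves it with match.arg(), which takes the first element when nothing is supplied, so the default is "pearson".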
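The same choices-as-default idiom can be used in your own functions. pick_method() below is a hypothetical function. The second half compares Spearman and Pearson on a monotonic but non-linear relationship:

```r
# A character vector default lists the allowed values;
# match.arg() returns the first one when the caller supplies nothing.
pick_method <- function(method = c("pearson", "kendall", "spearman")) {
  match.arg(method)
}

pick_method()          # "pearson"
pick_method("spear")   # partial matching: "spearman"

# Spearman works on ranks, so any monotonic relationship gives 1
x <- 1:10
y <- x^3
cor(x, y, method = "spearman")
#> [1] 1
cor(x, y) < 1          # Pearson measures linear association only
#> [1] TRUE
```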

19.6 Return values

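What a function returns is the reason you created it in the first place. There are two things to consider when returning a value:

  1. Does returning early make your function easier to read?
  2. Can you make your function pipeable?

19.6.1 Explicit return statements

  • By default, a function returns the value of the last expression it evaluates.
  • You can use return() to return a value early.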
  • I think it’s best to save the use of return() to signal that you can return early with a simpler solution.
    • A common reason to do this is because the inputs are empty拍埠。
    • Another reason is because you have a if statement with one complex block and one simple block.
  • If the first block is very long, by the time you get to the else, you’ve forgotten the condition. One way to rewrite it is to use an early return for the simple case:
f <- function(x) {
  if (!x) {
    return(something_short)
  }

  # Do 
  # something
  # that
  # takes
  # many
  # lines
  # to
  # express
}
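Here is a concrete version of that shape. rms() (root mean square) is a hypothetical example, and returning NA for empty input is a design assumption:

```r
# Root mean square, with an early return for the simple (empty) case
rms <- function(x) {
  # Simple case first: nothing to summarise
  if (length(x) == 0) {
    return(NA_real_)
  }

  # Main case: no longer nested inside an else block
  sqrt(mean(x^2))
}

rms(c(3, 4))      # sqrt(12.5)
rms(numeric(0))   # NA
```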

19.6.2 Writing pipeable functions

  • If you understand the type of object your function returns, your pipeline will “just work”. For example, with dplyr and tidyr the object type is the data frame.
  • There are two basic types of pipeable functions: transformations and side-effects.
    • transformations: an object is passed to the function’s first argument and a modified object is returned.
    • side-effects: the passed object is not transformed. Instead, the function performs an action on the object, like drawing a plot or saving a file. Side-effect functions should “invisibly” return the first argument, so that while they’re not printed they can still be used in a pipeline.
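Here is a minimal side-effect function in that style, modelled on the chapter's show_missings() example. It prints a message, then returns its input invisibly. Because of that, a call such as df |> show_missings() |> nrow() would still work (the native pipe needs R 4.1 or later, or use magrittr's %>%):

```r
# Side-effect function: prints a summary, then invisibly returns its input
show_missings <- function(df) {
  n <- sum(is.na(df))
  cat("Missing values: ", n, "\n", sep = "")
  invisible(df)
}

df <- data.frame(a = c(1, NA, 3), b = c(NA, "y", "z"))
show_missings(df)          # prints "Missing values: 2"; df itself is not printed
out <- show_missings(df)   # ...but the input is still returned
identical(out, df)
#> [1] TRUE
```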

19.7 Environment

  • Environments are crucial to how functions work.
  • The environment of a function controls how R finds the value associated with a name.
  • R uses rules called lexical scoping to find the value associated with a name.
  • For example, in f <- function(x) x + y, y is not defined inside the function, so R looks in the environment where the function was defined.
  • R places few limits on your power. You can do many things that you can’t do in other programming languages.
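Here is a small demonstration of lexical scoping. make_adder() is a hypothetical name used to show that a function remembers the environment it was created in:

```r
# `y` is neither an argument nor defined inside f(),
# so R looks it up in the environment where f() was defined (here, global)
f <- function(x) {
  x + y
}
y <- 100
f(10)
#> [1] 110
y <- 1000
f(10)   # the lookup happens at call time, so the new value is used
#> [1] 1010

# A function created inside another function keeps that function's environment
make_adder <- function(n) {
  function(x) x + n
}
add2 <- make_adder(2)
add2(3)
#> [1] 5
```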
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.