5. Data transformation (Part 3)

5.6 Summarising data with `summarise()`

The last key verb is summarise(). It collapses a data frame to a single row:

summarise(flights, delay = mean(dep_delay, na.rm = TRUE))
#> # A tibble: 1 x 1
#>   delay
#>   <dbl>
#> 1  12.6

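(What does na.rm = TRUE mean? Section 5.6.2 comes back to it.)

summarise() is not very useful unless it is paired with group_by(). Grouping changes the unit of analysis from the complete dataset to individual groups. Then, when you use dplyr verbs on a grouped data frame, they are automatically applied "by group". For example, if we group by date, we get the average delay per date: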

by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day delay
#>   <int> <int> <int> <dbl>
#> 1  2013     1     1 11.5 
#> 2  2013     1     2 13.9 
#> 3  2013     1     3 11.0 
#> 4  2013     1     4  8.95
#> 5  2013     1     5  5.73
#> 6  2013     1     6  7.15
#> # … with 359 more rows
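To see the change in the unit of analysis on something small enough to check by hand, here is a minimal sketch. The `toy` table and its values are made up for illustration; only `group_by()`, `summarise()` and `tibble()` come from dplyr.

```r
library(dplyr)

# Hypothetical toy data: two days with a few departure delays (minutes) each.
toy <- tibble(
  day       = c(1, 1, 1, 2, 2),
  dep_delay = c(10, NA, 20, 0, 6)
)

# Ungrouped: summarise() collapses the whole table to a single row.
overall <- summarise(toy, delay = mean(dep_delay, na.rm = TRUE))

# Grouped: the same call now returns one row per day.
per_day <- summarise(group_by(toy, day), delay = mean(dep_delay, na.rm = TRUE))

overall   # one row: delay = 9
per_day   # two rows: delay = 15 for day 1, delay = 3 for day 2
```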

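Together, group_by() and summarise() provide one of the most commonly used tools: grouped summaries. But before we go any further with this, we need to introduce the pipe.

5.6.1 Combining multiple operations with the pipe

Imagine that we want to explore the relationship between the distance and the average delay for each destination. Using what you know about dplyr, you might write code like this: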

by_dest <- group_by(flights, dest)
delay <- summarise(by_dest,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)
#> `summarise()` ungrouping output (override with `.groups` argument)
delay <- filter(delay, count > 20, dest != "HNL")

# It looks like delays increase with distance up to ~750 miles 
# and then decrease. Maybe as flights get longer there's more 
# ability to make up delays in the air?
ggplot(data = delay, mapping = aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +
  geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

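[Figure: mean arrival delay (delay) against mean distance (dist) per destination, point size showing the flight count, with a smoothed trend line]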

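There are three steps to prepare this data:

1. Group flights by destination.
2. Summarise to compute the mean distance, the mean delay, and the number of flights.
3. Filter to remove noisy points and Honolulu airport, which is almost twice as far away as the next closest airport.

This code is a little tedious to write because we have to give each intermediate data frame a name. Having to name every intermediate result slows down the analysis.

There is another way to tackle the same problem, using the pipe, %>%: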

delays <- flights %>% 
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(count > 20, dest != "HNL")
#> `summarise()` ungrouping output (override with `.groups` argument)

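Behind the scenes, x %>% f(y) turns into f(x, y), x %>% f(y) %>% g(z) turns into g(f(x, y), z), and so on. You can use the pipe to rewrite multiple operations so that they read left to right, top to bottom. We will use the pipe frequently from now on because it considerably improves the readability of code, and we will come back to it in more detail in the Pipes chapter.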
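The rewriting rule can be checked directly on a small example. This is a sketch with a made-up `toy` table; it only shows that the nested call and the piped call give the same numbers.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(dest = c("A", "A", "B"), distance = c(100, 300, 500))

# Nested calls are read inside-out ...
nested <- summarise(group_by(toy, dest), dist = mean(distance))

# ... while the piped version is read top to bottom.
piped <- toy %>%
  group_by(dest) %>%
  summarise(dist = mean(distance))

identical(nested$dist, piped$dist)   # TRUE: both give 200 and 500
```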

5.6.2 Missing values

You may have wondered about the na.rm argument we used above. What happens if we don't set it?

flights %>% 
  group_by(year, month, day) %>% 
  summarise(mean = mean(dep_delay))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day  mean
#>   <int> <int> <int> <dbl>
#> 1  2013     1     1    NA
#> 2  2013     1     2    NA
#> 3  2013     1     3    NA
#> 4  2013     1     4    NA
#> 5  2013     1     5    NA
#> 6  2013     1     6    NA
#> # … with 359 more rows

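We get a lot of missing values! That's because aggregation functions obey the usual rule of missing values: if there is any missing value in the input, the output will be a missing value. Fortunately, all aggregation functions have an na.rm argument, which removes the missing values prior to computation: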

flights %>% 
  group_by(year, month, day) %>% 
  summarise(mean = mean(dep_delay, na.rm = TRUE))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day  mean
#>   <int> <int> <int> <dbl>
#> 1  2013     1     1 11.5 
#> 2  2013     1     2 13.9 
#> 3  2013     1     3 11.0 
#> 4  2013     1     4  8.95
#> 5  2013     1     5  5.73
#> 6  2013     1     6  7.15
#> # … with 359 more rows
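The rule is easiest to see on a plain vector. A minimal base R sketch, with made-up delay values:

```r
delays <- c(4, NA, 11, 0)    # hypothetical delays; NA stands for a cancelled flight

mean(delays)                 # NA: one missing input makes the result missing
mean(delays, na.rm = TRUE)   # 5: computed from the three observed values
sum(!is.na(delays))          # 3: how many values the mean is based on
```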

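In this case, where missing values represent cancelled flights, we could also tackle the problem by first removing the cancelled flights. We'll save this dataset so we can reuse it in the next few examples.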

not_cancelled <- flights %>% 
  filter(!is.na(dep_delay), !is.na(arr_delay))

not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(mean = mean(dep_delay))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day  mean
#>   <int> <int> <int> <dbl>
#> 1  2013     1     1 11.4 
#> 2  2013     1     2 13.7 
#> 3  2013     1     3 10.9 
#> 4  2013     1     4  8.97
#> 5  2013     1     5  5.73
#> 6  2013     1     6  7.15
#> # … with 359 more rows

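5.6.3 Counts

Whenever you do any aggregation, it's a good idea to include either a count (n()) or a count of non-missing values (sum(!is.na(x))). That way you can check that you're not drawing conclusions based on very small amounts of data. For example, let's look at the planes (identified by their tail number) that have the highest average delays: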

delays <- not_cancelled %>% 
  group_by(tailnum) %>% 
  summarise(
    delay = mean(arr_delay)
  )
#> `summarise()` ungrouping output (override with `.groups` argument)

ggplot(data = delays, mapping = aes(x = delay)) + 
  geom_freqpoly(binwidth = 10)

[Figure: frequency polygon of mean arrival delay (delay) per plane]

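As you can see, some planes have an average delay of 5 hours (300 minutes)!

We can get more insight if we draw a scatterplot of the number of flights against the average delay: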

delays <- not_cancelled %>% 
  group_by(tailnum) %>% 
  summarise(
    delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  )
#> `summarise()` ungrouping output (override with `.groups` argument)

ggplot(data = delays, mapping = aes(x = n, y = delay)) + 
  geom_point(alpha = 1/10)

[Figure: scatterplot of mean arrival delay (delay) against number of flights (n) per plane]

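Not surprisingly, there is much greater variation in the average delay when there are few flights. The shape of this plot is very characteristic: whenever you plot a mean (or other summary) against group size, you'll see that the variation decreases as the sample size increases.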
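The shrinking spread can be reproduced with simulated data. The sketch below is an illustration only: the helper `spread_of_means()`, the normal distribution, and its parameters are assumptions, not part of the flights data. Every simulated plane draws from the same distribution, so any difference in spread comes purely from the number of flights.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Standard deviation of the per-plane mean delay across `n_planes`
# hypothetical planes that each flew `n_flights` flights.
spread_of_means <- function(n_flights, n_planes = 1000) {
  plane_means <- replicate(n_planes, mean(rnorm(n_flights, mean = 10, sd = 40)))
  sd(plane_means)
}

small <- spread_of_means(5)     # planes with 5 flights each
large <- spread_of_means(100)   # planes with 100 flights each

# Theory predicts a spread of about 40 / sqrt(n_flights) in each case.
c(small = small, large = large)
```

The means of small groups scatter far more widely than the means of large groups, which is the funnel shape seen in the scatterplot.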

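When looking at this sort of plot, it's often useful to filter out the groups with the smallest numbers of observations, so you can see more of the pattern and less of the extreme variation in the smallest groups. This is what the following code does, and it also shows a handy pattern for integrating ggplot2 into dplyr flows. Note that you have to switch from %>% to + partway through.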

delays %>% 
  filter(n > 25) %>% 
  ggplot(mapping = aes(x = n, y = delay)) + 
    geom_point(alpha = 1/10)

[Figure: the same scatterplot of delay against n, restricted to planes with more than 25 flights]

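The same pattern appears in other data. The code below uses the Batting table from the Lahman package to compute each baseball player's batting average (hits divided by at-bats). When I plot the skill of the batter (measured by the batting average, ba) against the number of opportunities to hit the ball (measured by at-bats, ab), you see two patterns:

1. As above, the variation in our aggregate decreases as we get more data points.
2. There is a positive correlation between skill (ba) and opportunities to hit the ball (ab). This is because teams control who gets to play, and obviously they pick their best players.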

# Convert to a tibble so it prints nicely
batting <- as_tibble(Lahman::Batting)

batters <- batting %>% 
  group_by(playerID) %>% 
  summarise(
    ba = sum(H, na.rm = TRUE) / sum(AB, na.rm = TRUE),
    ab = sum(AB, na.rm = TRUE)
  )
#> `summarise()` ungrouping output (override with `.groups` argument)

batters %>% 
  filter(ab > 100) %>% 
  ggplot(mapping = aes(x = ab, y = ba)) +
    geom_point() + 
    geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

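[Figure: batting average (ba) against at-bats (ab) for players with more than 100 at-bats, with a smoothed trend line]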

This also has important implications for ranking. If you naively sort on desc(ba), the people with the best batting averages are clearly lucky, not skilled:

batters %>% 
  arrange(desc(ba))
#> # A tibble: 19,689 x 3
#>   playerID     ba    ab
#>   <chr>     <dbl> <int>
#> 1 abramge01     1     1
#> 2 alanirj01     1     1
#> 3 alberan01     1     1
#> 4 banisje01     1     1
#> 5 bartocl01     1     1
#> 6 bassdo01      1     1
#> # … with 19,683 more rows
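One possible remedy, sketched here on made-up players, is to require a minimum number of at-bats before ranking. The cutoff of 100 is an assumption that mirrors the filter used for the plot; the column `h` and the player IDs are hypothetical.

```r
library(dplyr)

# Hypothetical players: hits (h) and at-bats (ab).
batters_toy <- tibble(
  playerID = c("p1", "p2", "p3", "p4"),
  h        = c(1, 150, 90, 0),
  ab       = c(1, 500, 400, 2)
) %>%
  mutate(ba = h / ab)

# Naive ranking: the player who went 1-for-1 comes out on top.
naive <- batters_toy %>%
  arrange(desc(ba))

# Ranking restricted to players with enough at-bats (assumed cutoff: 100).
filtered <- batters_toy %>%
  filter(ab > 100) %>%
  arrange(desc(ba))

naive$playerID[1]      # "p1"
filtered$playerID[1]   # "p2"
```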

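5.6.4 Other useful summary functions

Just using means, counts, and sums can get you a long way, but R provides many other useful summary functions:

Measures of location: we've used mean(x), but median(x) is also useful. The mean is the sum divided by the length; the median is a value where 50% of x is above it and 50% is below it.

It's sometimes useful to combine aggregation with logical subsetting.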

not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(
    avg_delay1 = mean(arr_delay),
    avg_delay2 = mean(arr_delay[arr_delay > 0]) # the average positive delay
  )
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 5
#> # Groups:   year, month [12]
#>    year month   day avg_delay1 avg_delay2
#>   <int> <int> <int>      <dbl>      <dbl>
#> 1  2013     1     1      12.7        32.5
#> 2  2013     1     2      12.7        32.0
#> 3  2013     1     3       5.73       27.7
#> 4  2013     1     4      -1.93       28.3
#> 5  2013     1     5      -1.53       22.6
#> 6  2013     1     6       4.24       24.4
#> # … with 359 more rows
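The subsetting step works the same way outside summarise(). A minimal base R sketch with made-up arrival delays:

```r
arr_delay <- c(-10, 30, -5, 50)   # hypothetical arrival delays in minutes

mean(arr_delay)                   # 16.25: early arrivals pull the average down
mean(arr_delay[arr_delay > 0])    # 40: average over the delayed flights only
```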

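Measures of spread: sd(x), IQR(x), mad(x). The root mean squared deviation, or standard deviation sd(x), is the standard measure of spread. The interquartile range IQR(x) and the median absolute deviation mad(x) are robust alternatives that may be more useful if you have outliers.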

# Why is distance to some destinations more variable than to others?
not_cancelled %>% 
  group_by(dest) %>% 
  summarise(distance_sd = sd(distance)) %>% 
  arrange(desc(distance_sd))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 104 x 2
#>   dest  distance_sd
#>   <chr>       <dbl>
#> 1 EGE         10.5 
#> 2 SAN         10.4 
#> 3 SFO         10.2 
#> 4 HNL         10.0 
#> 5 SEA          9.98
#> 6 LAS          9.91
#> # … with 98 more rows
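To illustrate why the robust measures can be preferable, the following base R sketch compares the three on a made-up vector with and without a single extreme value:

```r
x         <- c(10, 12, 11, 13, 12)   # hypothetical, tightly clustered values
x_outlier <- c(x, 500)               # the same values plus one extreme point

# One row per vector, one column per measure of spread.
rbind(
  clean   = c(sd = sd(x),         IQR = IQR(x),         mad = mad(x)),
  outlier = c(sd = sd(x_outlier), IQR = IQR(x_outlier), mad = mad(x_outlier))
)
```

The standard deviation grows by more than a factor of 100 when the outlier is added, while IQR() moves only from 1 to 1.5 and mad() does not change at all.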

Measures of rank: min(x), quantile(x, 0.25), max(x). Quantiles are a generalisation of the median. For example, quantile(x, 0.25) will find a value of x that is greater than 25% of the values, and less than the remaining 75%.

# When do the first and last flights leave each day?
not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(
    first = min(dep_time),
    last = max(dep_time)
  )
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 5
#> # Groups:   year, month [12]
#>    year month   day first  last
#>   <int> <int> <int> <int> <int>
#> 1  2013     1     1   517  2356
#> 2  2013     1     2    42  2354
#> 3  2013     1     3    32  2349
#> 4  2013     1     4    25  2358
#> 5  2013     1     5    14  2357
#> 6  2013     1     6    16  2355
#> # … with 359 more rows
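A small base R sketch of the rank-based measures on a made-up vector (the quantile value shown assumes R's default interpolation method):

```r
x <- 1:9   # hypothetical values

min(x)               # 1
quantile(x, 0.25)    # 3 (returned as a vector named "25%")
median(x)            # 5, the same value as quantile(x, 0.5)
max(x)               # 9
```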

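Measures of position: first(x), nth(x, 2), last(x). These work similarly to x[1], x[2], and x[length(x)] but let you set a default value if that position does not exist (i.e. you're trying to get the 3rd element from a group that only has two elements). For example, we can find the first and last departure for each day: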

not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(
    first_dep = first(dep_time), 
    last_dep = last(dep_time)
  )
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 5
#> # Groups:   year, month [12]
#>    year month   day first_dep last_dep
#>   <int> <int> <int>     <int>    <int>
#> 1  2013     1     1       517     2356
#> 2  2013     1     2        42     2354
#> 3  2013     1     3        32     2349
#> 4  2013     1     4        25     2358
#> 5  2013     1     5        14     2357
#> 6  2013     1     6        16     2355
#> # … with 359 more rows
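The default-value behaviour is easier to see on a short vector. A minimal sketch with made-up departure times, using the `default` argument of dplyr's nth():

```r
library(dplyr)

x <- c(517, 533)        # hypothetical group with only two departure times

first(x)                # 517, like x[1]
last(x)                 # 533, like x[length(x)]
nth(x, 3)               # NA: there is no third element
nth(x, 3, default = 0)  # 0: the supplied default is returned instead
```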

These functions are complementary to filtering on ranks. Filtering gives you all variables, with each observation in a separate row:

not_cancelled %>% 
  group_by(year, month, day) %>% 
  mutate(r = min_rank(desc(dep_time))) %>% 
  filter(r %in% range(r))
#> # A tibble: 770 x 20
#> # Groups:   year, month, day [365]
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#> 1  2013     1     1      517            515         2      830            819
#> 2  2013     1     1     2356           2359        -3      425            437
#> 3  2013     1     2       42           2359        43      518            442
#> 4  2013     1     2     2354           2359        -5      413            437
#> 5  2013     1     3       32           2359        33      504            442
#> 6  2013     1     3     2349           2359       -10      434            445
#> # … with 764 more rows, and 12 more variables: arr_delay <dbl>, carrier <chr>,
#> #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#> #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>, r <int>

Counts: you've seen n(), which takes no arguments and returns the size of the current group. To count the number of non-missing values, use sum(!is.na(x)). To count the number of distinct (unique) values, use n_distinct(x).

# Which destinations have the most carriers?
not_cancelled %>% 
  group_by(dest) %>% 
  summarise(carriers = n_distinct(carrier)) %>% 
  arrange(desc(carriers))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 104 x 2
#>   dest  carriers
#>   <chr>    <int>
#> 1 ATL          7
#> 2 BOS          7
#> 3 CLT          7
#> 4 ORD          7
#> 5 TPA          7
#> 6 AUS          6
#> # … with 98 more rows
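The three kinds of count can differ within the same group. The sketch below uses a made-up table in which one carrier is missing; passing `na.rm = TRUE` to n_distinct() is a choice made here so that the missing value is not counted as a carrier.

```r
library(dplyr)

# Hypothetical flights: the carrier is unknown (NA) for one of them.
toy <- tibble(
  dest    = c("ATL", "ATL", "ATL", "BOS"),
  carrier = c("DL", "DL", NA, "UA")
)

counts <- toy %>%
  group_by(dest) %>%
  summarise(
    n          = n(),                               # rows in the group
    n_known    = sum(!is.na(carrier)),              # non-missing carriers
    n_carriers = n_distinct(carrier, na.rm = TRUE)  # distinct known carriers
  )

counts   # ATL: 3, 2, 1; BOS: 1, 1, 1
```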

Counts are so useful that dplyr provides a simple helper, count(), if all you want is a count:

not_cancelled %>% 
  count(dest)
#> # A tibble: 104 x 2
#>   dest      n
#>   <chr> <int>
#> 1 ABQ     254
#> 2 ACK     264
#> 3 ALB     418
#> 4 ANC       8
#> 5 ATL   16837
#> 6 AUS    2411
#> # … with 98 more rows

You can optionally provide a weight variable. For example, you could use this to "count" (sum) the total number of miles a plane flew:

not_cancelled %>% 
  count(tailnum, wt = distance)
#> # A tibble: 4,037 x 2
#>   tailnum      n
#>   <chr>    <dbl>
#> 1 D942DN    3418
#> 2 N0EGMQ  239143
#> 3 N10156  109664
#> 4 N102UW   25722
#> 5 N103US   24619
#> 6 N104UW   24616
#> # … with 4,031 more rows
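A weighted count is a grouped sum under another name. The following sketch, on a made-up table, compares count() with `wt` against the equivalent group_by() and summarise() pipeline:

```r
library(dplyr)

# Hypothetical flights for two planes.
toy <- tibble(
  tailnum  = c("N1", "N1", "N2"),
  distance = c(200, 300, 1000)
)

# Weighted count: sums `distance` within each tail number.
weighted <- toy %>%
  count(tailnum, wt = distance)

# The same result written out by hand.
by_hand <- toy %>%
  group_by(tailnum) %>%
  summarise(n = sum(distance))

weighted$n   # 500 1000
```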

Counts and proportions of logical values: sum(x > 10), mean(y == 0). When used with numeric functions, TRUE is converted to 1 and FALSE to 0. This makes sum() and mean() very useful: sum(x) gives the number of TRUEs in x, and mean(x) gives the proportion.

# How many flights left before 5am? (these usually indicate delayed
# flights from the previous day)
not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(n_early = sum(dep_time < 500))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day n_early
#>   <int> <int> <int>   <int>
#> 1  2013     1     1       0
#> 2  2013     1     2       3
#> 3  2013     1     3       4
#> 4  2013     1     4       3
#> 5  2013     1     5       3
#> 6  2013     1     6       2
#> # … with 359 more rows

# What proportion of flights are delayed by more than an hour?
not_cancelled %>% 
  group_by(year, month, day) %>% 
  summarise(hour_prop = mean(arr_delay > 60))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day hour_prop
#>   <int> <int> <int>     <dbl>
#> 1  2013     1     1    0.0722
#> 2  2013     1     2    0.0851
#> 3  2013     1     3    0.0567
#> 4  2013     1     4    0.0396
#> 5  2013     1     5    0.0349
#> 6  2013     1     6    0.0470
#> # … with 359 more rows
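The conversion of TRUE and FALSE to 1 and 0 can be followed step by step on a short vector. A base R sketch with made-up delays:

```r
arr_delay <- c(5, 75, 120, -3)   # hypothetical arrival delays in minutes

arr_delay > 60                   # FALSE TRUE TRUE FALSE
sum(arr_delay > 60)              # 2 flights were more than an hour late
mean(arr_delay > 60)             # 0.5, i.e. half of the flights
```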

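5.6.5 Grouping by multiple variables

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll up a dataset: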

daily <- group_by(flights, year, month, day)
(per_day   <- summarise(daily, flights = n()))
#> `summarise()` regrouping output by 'year', 'month' (override with `.groups` argument)
#> # A tibble: 365 x 4
#> # Groups:   year, month [12]
#>    year month   day flights
#>   <int> <int> <int>   <int>
#> 1  2013     1     1     842
#> 2  2013     1     2     943
#> 3  2013     1     3     914
#> 4  2013     1     4     915
#> 5  2013     1     5     720
#> 6  2013     1     6     832
#> # … with 359 more rows
(per_month <- summarise(per_day, flights = sum(flights)))
#> `summarise()` regrouping output by 'year' (override with `.groups` argument)
#> # A tibble: 12 x 3
#> # Groups:   year [1]
#>    year month flights
#>   <int> <int>   <int>
#> 1  2013     1   27004
#> 2  2013     2   24951
#> 3  2013     3   28834
#> 4  2013     4   28330
#> 5  2013     5   28796
#> 6  2013     6   28243
#> # … with 6 more rows
(per_year  <- summarise(per_month, flights = sum(flights)))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 1 x 2
#>    year flights
#>   <int>   <int>
#> 1  2013  336776

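Be careful when progressively rolling up summaries: this is fine for sums and counts, but means and variances need to be weighted, and it cannot be done exactly for rank-based statistics such as the median. In other words, the sum of groupwise sums is the overall sum, but the median of groupwise medians is not the overall median.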
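The difference between the three cases shows up on two made-up groups of unequal size. This is a base R sketch; the values are chosen only to make the arithmetic easy to follow.

```r
# Hypothetical values in two groups of unequal size.
g1 <- c(1, 2, 3)
g2 <- c(10, 20, 30, 40, 50, 60, 70)
all_values <- c(g1, g2)

# Sums roll up exactly.
sum(c(sum(g1), sum(g2))) == sum(all_values)        # TRUE

# Means roll up only when weighted by group size.
group_means <- c(mean(g1), mean(g2))               # 2 and 40
group_sizes <- c(length(g1), length(g2))           # 3 and 7
mean(group_means)                                  # 21: unweighted, wrong
weighted.mean(group_means, group_sizes)            # 28.6: matches the line below
mean(all_values)                                   # 28.6

# Medians do not roll up.
median(c(median(g1), median(g2)))                  # 21
median(all_values)                                 # 25
```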

5.6.6 Ungrouping

If you need to remove grouping and return to operations on ungrouped data, use ungroup().

daily %>% 
  ungroup() %>%             # no longer grouped by date
  summarise(flights = n())  # all flights
#> # A tibble: 1 x 1
#>   flights
#>     <int>
#> 1  336776
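Whether a table is still grouped can be inspected with group_vars(). The sketch below uses a made-up table; the `.groups = "drop"` variant assumes dplyr 1.0.0 or later, the versions that print the "override with `.groups` argument" message shown in the output above.

```r
library(dplyr)

# Hypothetical daily flight counts.
toy <- tibble(month = c(1, 1, 2), day = c(1, 2, 1), flights = c(5, 7, 9))

grouped <- toy %>%
  group_by(month, day)

group_vars(grouped)            # "month" "day"
group_vars(ungroup(grouped))   # character(0): no grouping left

# Alternative (assumes dplyr >= 1.0.0): drop all grouping in summarise() itself.
dropped <- grouped %>%
  summarise(flights = sum(flights), .groups = "drop")

group_vars(dropped)            # character(0)
```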

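5.7 Grouped mutates (and filters)

Grouping is most useful in conjunction with summarise(), but you can also do convenient operations with mutate() and filter():

Find the worst members of each group (flights_sml is not defined in this part; judging by the output, it holds the columns year, month, day, dep_delay, arr_delay, distance, and air_time):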

flights_sml %>% 
  group_by(year, month, day) %>%
  filter(rank(desc(arr_delay)) < 10)
#> # A tibble: 3,306 x 7
#> # Groups:   year, month, day [365]
#>    year month   day dep_delay arr_delay distance air_time
#>   <int> <int> <int>     <dbl>     <dbl>    <dbl>    <dbl>
#> 1  2013     1     1       853       851      184       41
#> 2  2013     1     1       290       338     1134      213
#> 3  2013     1     1       260       263      266       46
#> 4  2013     1     1       157       174      213       60
#> 5  2013     1     1       216       222      708      121
#> 6  2013     1     1       255       250      589      115
#> # … with 3,300 more rows

Find all groups bigger than a threshold:

popular_dests <- flights %>% 
  group_by(dest) %>% 
  filter(n() > 365)
popular_dests
#> # A tibble: 332,577 x 19
#> # Groups:   dest [77]
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#> 1  2013     1     1      517            515         2      830            819
#> 2  2013     1     1      533            529         4      850            830
#> 3  2013     1     1      542            540         2      923            850
#> 4  2013     1     1      544            545        -1     1004           1022
#> 5  2013     1     1      554            600        -6      812            837
#> 6  2013     1     1      554            558        -4      740            728
#> # … with 332,571 more rows, and 11 more variables: arr_delay <dbl>,
#> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Standardise to compute per-group metrics:

popular_dests %>% 
  filter(arr_delay > 0) %>% 
  mutate(prop_delay = arr_delay / sum(arr_delay)) %>% 
  select(year:day, dest, arr_delay, prop_delay)
#> # A tibble: 131,106 x 6
#> # Groups:   dest [77]
#>    year month   day dest  arr_delay prop_delay
#>   <int> <int> <int> <chr>     <dbl>      <dbl>
#> 1  2013     1     1 IAH          11  0.000111 
#> 2  2013     1     1 IAH          20  0.000201 
#> 3  2013     1     1 MIA          33  0.000235 
#> 4  2013     1     1 ORD          12  0.0000424
#> 5  2013     1     1 FLL          19  0.0000938
#> 6  2013     1     1 ORD           8  0.0000283
#> # … with 131,100 more rows
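The grouped filter and grouped mutate above can be traced on a table small enough to verify by hand. This is a sketch with made-up destinations and delays; the threshold of 2 rows is an arbitrary stand-in for the 365 used above.

```r
library(dplyr)

# Hypothetical delayed flights to two destinations.
toy <- tibble(
  dest      = c("IAH", "IAH", "MIA", "MIA", "MIA"),
  arr_delay = c(10, 30, 5, 5, 10)
)

# Grouped filter: n() is evaluated per destination, so only MIA (3 rows) stays.
big_groups <- toy %>%
  group_by(dest) %>%
  filter(n() > 2)

# Grouped mutate: sum(arr_delay) is the destination total, not the grand total.
props <- toy %>%
  group_by(dest) %>%
  mutate(prop_delay = arr_delay / sum(arr_delay)) %>%
  ungroup()

props$prop_delay   # 0.25 0.75 0.25 0.25 0.50
```

Within each destination the proportions add up to 1, which is a quick check that the computation really ran per group.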

Last edited: 2022.01.21 16:19:04
