R Shiny 基礎 3. 實戰(zhàn)演練 數據探索

1. 引入

融匯貫通之前介紹的內容,來實現一個簡單的數據可視化app厚满。首先會在R里進行一些數據探索碘箍,然后把這些功能轉換為shiny的可交互模式。

本章節(jié)會用到下面的包四濒。

library(shiny)
library(vroom)
library(tidyverse)

2. 數據

數據會用到National Electronic Injury Surveillance System (NEISS)的一份數據。記錄了長期事故入院的病例數據姆涩。

https://github.com/hadley/neiss

本章只會用到2017年的數據,數據大小在10M左右宏赘。

dir.create("neiss")
#> Warning in dir.create("neiss"): 'neiss' already exists
download <- function(name) {
  url <- "https://github.com/hadley/mastering-shiny/raw/master/neiss/"
  download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}
download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")

injuries 是長這樣的

injuries <- vroom::vroom("neiss/injuries.tsv.gz")
injuries
#> # A tibble: 255,064 × 10
#>   trmt_date    age sex   race  body_part   diag         location prod_code weight
#>   <date>     <dbl> <chr> <chr> <chr>       <chr>        <chr>        <dbl>  <dbl>
#> 1 2017-01-01    71 male  white Upper Trunk Contusion O… Other P…      1807   77.7
#> 2 2017-01-01    16 male  white Lower Arm   Burns, Ther… Home           676   77.7
#> 3 2017-01-01    58 male  white Upper Trunk Contusion O… Home           649   77.7
#> 4 2017-01-01    21 male  white Lower Trunk Strain, Spr… Home          4076   77.7
#> 5 2017-01-01    54 male  white Head        Inter Organ… Other P…      1807   77.7
#> 6 2017-01-01    21 male  white Hand        Fracture     Home          1884   77.7
#> # … with 255,058 more rows, and 1 more variable: narrative <chr>

每個變量的定義:

  • trmt_date is date the person was seen in the hospital (not when the accident occurred).
  • age, sex, and race give demographic information about the person who experienced the accident.
  • body_part is the location of the injury on the body (like ankle or ear); location is the place where the accident occurred (like home or school).
  • diag gives the basic diagnosis of the injury (like fracture or laceration).
  • prod_code is the primary product associated with the injury.
  • weight is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US.
  • narrative is a brief story about how the accident occurred.

然后把這個數據和其他兩個數據匹配起來

products <- vroom::vroom("neiss/products.tsv")
products
#> # A tibble: 38 × 2
#>   prod_code title                            
#>       <dbl> <chr>                            
#> 1       464 knives, not elsewhere classified 
#> 2       474 tableware and accessories        
#> 3       604 desks, chests, bureaus or buffets
#> 4       611 bathtubs or showers              
#> 5       649 toilets                          
#> 6       676 rugs or carpets, not specified   
#> # … with 32 more rows

population <- vroom::vroom("neiss/population.tsv")
population
#> # A tibble: 170 × 3
#>     age sex    population
#>   <dbl> <chr>       <dbl>
#> 1     0 female    1924145
#> 2     0 male      2015150
#> 3     1 female    1943534
#> 4     1 male      2031718
#> 5     2 female    1965150
#> 6     2 male      2056625
#> # … with 164 more rows

3. 數據探索

在做成app之前脐往,先把數據探索一下∶酚龋可以從比較好玩的地方開始入手,比方說prod_code 為649的數據缰揪,這個表示發(fā)生漏電事故的產品編號是廁所邀跃。

selected <- injuries %>% filter(prod_code == 649)
nrow(selected)
#> [1] 2993

接著往下看可以找到發(fā)生事故的場所最多的是Home家里,身體部位是Head

selected %>% count(location, wt = weight, sort = TRUE)
#> # A tibble: 6 × 2
#>   location                         n
#>   <chr>                        <dbl>
#> 1 Home                       99603. 
#> 2 Other Public Property      18663. 
#> 3 Unknown                    16267. 
#> 4 School                       659. 
#> 5 Street Or Highway             16.2
#> 6 Sports Or Recreation Place    14.8

selected %>% count(body_part, wt = weight, sort = TRUE)
#> # A tibble: 24 × 2
#>   body_part        n
#>   <chr>        <dbl>
#> 1 Head        31370.
#> 2 Lower Trunk 26855.
#> 3 Face        13016.
#> 4 Upper Trunk 12508.
#> 5 Knee         6968.
#> 6 N.S./Unk     6741.
#> # … with 18 more rows

selected %>% count(diag, wt = weight, sort = TRUE)
#> # A tibble: 20 × 2
#>   diag                       n
#>   <chr>                  <dbl>
#> 1 Other Or Not Stated   32897.
#> 2 Contusion Or Abrasion 22493.
#> 3 Inter Organ Injury    21525.
#> 4 Fracture              21497.
#> 5 Laceration            18734.
#> 6 Strain, Sprain         7609.
#> # … with 14 more rows

然后也可以用ggplot做簡單的可視化分析唁毒,比方說按照agesex count一下數量

summary <- selected %>% 
  count(age, sex, wt = weight)
summary
#> # A tibble: 208 × 3
#>     age sex         n
#>   <dbl> <chr>   <dbl>
#> 1     0 female   4.76
#> 2     0 male    14.3 
#> 3     1 female 253.  
#> 4     1 male   231.  
#> 5     2 female 438.  
#> 6     2 male   632.  
#> # … with 202 more rows

summary %>% 
  ggplot(aes(age, n, colour = sex)) + 
  geom_line() + 
  labs(y = "Estimated number of injuries")

上圖顯示的是實際數字,也可以改成按照比例顯示诺核。比方說10000個人里有多少個。

summary <- selected %>% 
  count(age, sex, wt = weight) %>% 
  left_join(population, by = c("age", "sex")) %>% 
  mutate(rate = n / population * 1e4)

summary
#> # A tibble: 208 × 5
#>     age sex         n population   rate
#>   <dbl> <chr>   <dbl>      <dbl>  <dbl>
#> 1     0 female   4.76    1924145 0.0247
#> 2     0 male    14.3     2015150 0.0708
#> 3     1 female 253.      1943534 1.30  
#> 4     1 male   231.      2031718 1.14  
#> 5     2 female 438.      1965150 2.23  
#> 6     2 male   632.      2056625 3.07  
#> # … with 202 more rows
summary %>% 
  ggplot(aes(age, rate, colour = sex)) + 
  geom_line(na.rm = TRUE) + 
  labs(y = "Injuries per 10,000 people")

最后還可以查看一下里面一些案件的文字描述。比方說隨機取樣10個樣本夭咬。

selected %>% 
  sample_n(10) %>% 
  pull(narrative)
#>  [1] "97 YOM FELL HITTING HEAD ON TOILET SEAT.DX:  NECK PX, BACK PX, FREQUENT FALLS."                                                   
#>  [2] "95 YOF - CONTUSION HEAD - PT WAS TRANSFERRING FROM W.C TO TOILETAND FELL HITTING HEAD ON FLOOR@ N.H"                              
#>  [3] "54YOM HAD A MECHANICAL FALL ATTEMPTING TO USE THE TOILET, C/O LT-SIDEDCHEST PAIN. DX - RIB FX, PNEUMOTHORAX, CHEST WALL CONTUSION"
#>  [4] "99YF ATTEMPTING TO GET OFF THE TOILET&FELL FWD STRIKING HEAD&CHEST AGAINST THE WALKER, -LOC>>CHI, RIB FX, FREQ FALLS"             
#>  [5] "79 YOF HAD SYNCOPAL EPISODE AND FELL FROM TOILET HITTING FACE ONFLOOR     DX  NASAL BONE FRACTURE"                                
#>  [6] "76YOM C/O GLF @HOME JUST PTA. ATTEMPTING TO SIT ON TOILET, MISSED TOILET AND FELL. NO HI, NO LOC DX=LEFT HIP FRACTURE="           
#>  [7] "6 YO M LAC HEAD-FELL,CLIMBING ON TOILET,STRUCK THE COUNTERTOP"                                                                    
#>  [8] "85YOM SITTING ON THE TOILET AT HOME AND LEANED FORWARD FELL ONTO HEAD SUSTAINED A SUBDURAL HEMATOMA"                              
#>  [9] "79YOF H'TMA HEAD- LOWERING ONTO TOILET, FELL ON FLOOR"                                                                            
#> [10] "81YOM WENT TO SIT ON A TOILET AND MISSED IT AND SUSTAINED A CLOSED HEADINJURY"

都是一些比較基礎的數據探索。接下來要做的就是把這些事情轉交給shiny,全都改寫成shiny code托修。

4. Shiny小試牛刀

首先是設置UI界面。
這里會用最簡單的方式演示三張表格涩拙,一張圖〈瓿梗可以提前用筆在紙上打一下草稿規(guī)劃一下界面的排版旭贬。這里打算做成一個2行3列的界面。第一行顯示三張表格奋刽,第二行顯示一張圖。
由于每一行的最大寬度是12列台谍,所以三張表格均勻分布的話是每一張占4列坞生。

prod_codes <- setNames(products$prod_code, products$title)

ui <- fluidPage(
  fluidRow(
    column(6,
      selectInput("code", "Product", choices = prod_codes)
    )
  ),
  fluidRow(
    column(4, tableOutput("diag")),
    column(4, tableOutput("body_part")),
    column(4, tableOutput("location"))
  ),
  fluidRow(
    column(12, plotOutput("age_sex"))
  )
)

雖然到目前為止還沒有對fluidRow()column()有過詳細講解,但是大概可以猜到是用來做什么的卒废。包括setNames()selectInput() 之后的章節(jié)里都會有說明。

然后是server端参袱。

server <- function(input, output, session) {
  selected <- reactive(injuries %>% filter(prod_code == input$code))

  output$diag <- renderTable(
    selected() %>% count(diag, wt = weight, sort = TRUE)
  )
  output$body_part <- renderTable(
    selected() %>% count(body_part, wt = weight, sort = TRUE)
  )
  output$location <- renderTable(
    selected() %>% count(location, wt = weight, sort = TRUE)
  )

  summary <- reactive({
    selected() %>%
      count(age, sex, wt = weight) %>%
      left_join(population, by = c("age", "sex")) %>%
      mutate(rate = n / population * 1e4)
  })

  output$age_sex <- renderPlot({

    summary() %>%
      ggplot(aes(age, n, colour = sex)) +
      geom_line() +
      labs(y = "Estimated number of injuries")
  }, res = 96)
}

5. 對齊表格樣式

剛才在剛才的結果里看出表格的行數層次不齊环壤,不是很美觀镐捧。如果可以像下面那樣指定排名前幾的就好了。

injuries %>%
  mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
  group_by(diag) %>%
  summarise(n = as.integer(sum(weight)))
#> # A tibble: 6 × 2
#>   diag                        n
#>   <fct>                   <int>
#> 1 Other Or Not Stated   1806436
#> 2 Fracture              1558961
#> 3 Laceration            1432407
#> 4 Strain, Sprain        1432556
#> 5 Contusion Or Abrasion 1451987
#> 6 Other                 1929147

可以寫一個function列牺,不會寫也沒關系,之后會有詳細的解說九默。

count_top <- function(df, var, n = 5) {
  df %>%
    mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
    group_by({{ var }}) %>%
    summarise(n = as.integer(sum(weight)))
}

然后在server端安排一下function。這里面有個細節(jié)乙各,width=100% 指定撐滿行距恩静,讓其看上去不會那么凌亂驶乾。

  output$diag <- renderTable(count_top(selected(), diag), width = "100%")
  output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
  output$location <- renderTable(count_top(selected(), location), width = "100%")

5. 添加選項 rate vs count

fluidRow(
    column(8,
      selectInput("code", "Product",
        choices = setNames(products$prod_code, products$title),
        width = "100%"
      )
    ),
    column(2, selectInput("y", "Y axis", c("rate", "count")))
  ),
output$age_sex <- renderPlot({
    if (input$y == "count") {
      summary() %>%
        ggplot(aes(age, n, colour = sex)) +
        geom_line() +
        labs(y = "Estimated number of injuries")
    } else {
      summary() %>%
        ggplot(aes(age, rate, colour = sex)) +
        geom_line(na.rm = TRUE) +
        labs(y = "Injuries per 10,000 people")
    }
  }, res = 96)

6. 添加文字敘述

首先在UI里添加新的元素罕扎。比如說actionButton

fluidRow(
    column(2, actionButton("story", "Tell me a story")),
    column(10, textOutput("narrative"))
  )

然后在sever里添加一個eventReactive 杆查,這個是表示只有被點擊才會被激活。

arrative_sample <- eventReactive(
    list(input$story, selected()),
    selected() %>% pull(narrative) %>% sample(1)
  )
  output$narrative <- renderText(narrative_sample())
最后編輯于
?著作權歸作者所有,轉載或內容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市舔琅,隨后出現的幾起案子,更是在濱河造成了極大的恐慌郊尝,老刑警劉巖流昏,帶你破解...
    沈念sama閱讀 206,839評論 6 482
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現場離奇詭異襟锐,居然都是意外死亡,警方通過查閱死者的電腦和手機莫杈,發(fā)現死者居然都...
    沈念sama閱讀 88,543評論 2 382
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來关顷,“玉大人,你說我怎么就攤上這事平痰∽诠停” “怎么了逾礁?”我有些...
    開封第一講書人閱讀 153,116評論 0 344
  • 文/不壞的土叔 我叫張陵债热,是天一觀的道長焕刮。 經常有香客問我配并,道長畸冲,這世上最難降的妖魔是什么邑闲? 我笑而不...
    開封第一講書人閱讀 55,371評論 1 279
  • 正文 為了忘掉前任儡陨,我火速辦了婚禮嫌褪,結果婚禮上,老公的妹妹穿的比我還像新娘信轿。我一直安慰自己,他們只是感情好即彪,可當我...
    茶點故事閱讀 64,384評論 5 374
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著深胳,像睡著了一般轻庆。 火紅的嫁衣襯著肌膚如雪余爆。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 49,111評論 1 285
  • 那天转捕,我揣著相機與錄音,去河邊找鬼枢步。 笑死醉途,一個胖子當著我的面吹牛,可吹牛的內容都是我干的货葬。 我是一名探鬼主播,決...
    沈念sama閱讀 38,416評論 3 400
  • 文/蒼蘭香墨 我猛地睜開眼蹲姐,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了拐邪?” 一聲冷哼從身側響起汹胃,我...
    開封第一講書人閱讀 37,053評論 0 259
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎宰掉,沒想到半個月后轨奄,有當地人在樹林里發(fā)現了一具尸體,經...
    沈念sama閱讀 43,558評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內容為張勛視角 年9月15日...
    茶點故事閱讀 36,007評論 2 325
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現自己被綠了石景。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片潮孽。...
    茶點故事閱讀 38,117評論 1 334
  • 序言:一個原本活蹦亂跳的男人離奇死亡必逆,死狀恐怖名眉,靈堂內的尸體忽然破棺而出,到底是詐尸還是另有隱情福压,我是刑警寧澤荆姆,帶...
    沈念sama閱讀 33,756評論 4 324
  • 正文 年R本政府宣布邮破,位于F島的核電站,受9級特大地震影響摧莽,放射性物質發(fā)生泄漏范嘱。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點故事閱讀 39,324評論 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望棉饶。 院中可真熱鬧照藻,春花似錦幸缕、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,315評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽斥赋。三九已至,卻和暖如春隘膘,著一層夾襖步出監(jiān)牢的瞬間弯菊,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 31,539評論 1 262
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留醇滥,地道東北人鸳玩。 一個月前我還...
    沈念sama閱讀 45,578評論 2 355
  • 正文 我出身青樓,卻偏偏與公主長得像窝革,于是被迫代替她去往敵國和親工猜。 傳聞我的和親對象是個殘疾皇子篷帅,可洞房花燭夜當晚...
    茶點故事閱讀 42,877評論 2 345

推薦閱讀更多精彩內容