1. 引入
融匯貫通之前介紹的內容,來實現一個簡單的數據可視化app厚满。首先會在R里進行一些數據探索碘箍,然后把這些功能轉換為shiny的可交互模式。
本章節(jié)會用到下面的包四濒。
library(shiny)
library(vroom)
library(tidyverse)
2. 數據
數據會用到National Electronic Injury Surveillance System (NEISS)的一份數據。記錄了長期事故入院的病例數據姆涩。
https://github.com/hadley/neiss
本章只會用到2017年的數據,數據大小在10M左右宏赘。
dir.create("neiss")
#> Warning in dir.create("neiss"): 'neiss' already exists
download <- function(name) {
url <- "https://github.com/hadley/mastering-shiny/raw/master/neiss/"
download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}
download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")
injuries
是長這樣的
injuries <- vroom::vroom("neiss/injuries.tsv.gz")
injuries
#> # A tibble: 255,064 × 10
#> trmt_date age sex race body_part diag location prod_code weight
#> <date> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2017-01-01 71 male white Upper Trunk Contusion O… Other P… 1807 77.7
#> 2 2017-01-01 16 male white Lower Arm Burns, Ther… Home 676 77.7
#> 3 2017-01-01 58 male white Upper Trunk Contusion O… Home 649 77.7
#> 4 2017-01-01 21 male white Lower Trunk Strain, Spr… Home 4076 77.7
#> 5 2017-01-01 54 male white Head Inter Organ… Other P… 1807 77.7
#> 6 2017-01-01 21 male white Hand Fracture Home 1884 77.7
#> # … with 255,058 more rows, and 1 more variable: narrative <chr>
每個變量的定義:
-
trmt_date
is date the person was seen in the hospital (not when the accident occurred). -
age
,sex
, andrace
give demographic information about the person who experienced the accident. -
body_part
is the location of the injury on the body (like ankle or ear);location
is the place where the accident occurred (like home or school). -
diag
gives the basic diagnosis of the injury (like fracture or laceration). -
prod_code
is the primary product associated with the injury. -
weight
is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US. -
narrative
is a brief story about how the accident occurred.
然后把這個數據和其他兩個數據匹配起來
products <- vroom::vroom("neiss/products.tsv")
products
#> # A tibble: 38 × 2
#> prod_code title
#> <dbl> <chr>
#> 1 464 knives, not elsewhere classified
#> 2 474 tableware and accessories
#> 3 604 desks, chests, bureaus or buffets
#> 4 611 bathtubs or showers
#> 5 649 toilets
#> 6 676 rugs or carpets, not specified
#> # … with 32 more rows
population <- vroom::vroom("neiss/population.tsv")
population
#> # A tibble: 170 × 3
#> age sex population
#> <dbl> <chr> <dbl>
#> 1 0 female 1924145
#> 2 0 male 2015150
#> 3 1 female 1943534
#> 4 1 male 2031718
#> 5 2 female 1965150
#> 6 2 male 2056625
#> # … with 164 more rows
3. 數據探索
在做成app之前脐往,先把數據探索一下∶酚龋可以從比較好玩的地方開始入手,比方說prod_code
為649的數據缰揪,這個表示發(fā)生漏電事故的產品編號是廁所邀跃。
selected <- injuries %>% filter(prod_code == 649)
nrow(selected)
#> [1] 2993
接著往下看可以找到發(fā)生事故的場所最多的是Home
家里,身體部位是Head
頭
selected %>% count(location, wt = weight, sort = TRUE)
#> # A tibble: 6 × 2
#> location n
#> <chr> <dbl>
#> 1 Home 99603.
#> 2 Other Public Property 18663.
#> 3 Unknown 16267.
#> 4 School 659.
#> 5 Street Or Highway 16.2
#> 6 Sports Or Recreation Place 14.8
selected %>% count(body_part, wt = weight, sort = TRUE)
#> # A tibble: 24 × 2
#> body_part n
#> <chr> <dbl>
#> 1 Head 31370.
#> 2 Lower Trunk 26855.
#> 3 Face 13016.
#> 4 Upper Trunk 12508.
#> 5 Knee 6968.
#> 6 N.S./Unk 6741.
#> # … with 18 more rows
selected %>% count(diag, wt = weight, sort = TRUE)
#> # A tibble: 20 × 2
#> diag n
#> <chr> <dbl>
#> 1 Other Or Not Stated 32897.
#> 2 Contusion Or Abrasion 22493.
#> 3 Inter Organ Injury 21525.
#> 4 Fracture 21497.
#> 5 Laceration 18734.
#> 6 Strain, Sprain 7609.
#> # … with 14 more rows
然后也可以用ggplot
做簡單的可視化分析唁毒,比方說按照age
和sex
count一下數量
summary <- selected %>%
count(age, sex, wt = weight)
summary
#> # A tibble: 208 × 3
#> age sex n
#> <dbl> <chr> <dbl>
#> 1 0 female 4.76
#> 2 0 male 14.3
#> 3 1 female 253.
#> 4 1 male 231.
#> 5 2 female 438.
#> 6 2 male 632.
#> # … with 202 more rows
summary %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
上圖顯示的是實際數字,也可以改成按照比例顯示诺核。比方說10000個人里有多少個。
summary <- selected %>%
count(age, sex, wt = weight) %>%
left_join(population, by = c("age", "sex")) %>%
mutate(rate = n / population * 1e4)
summary
#> # A tibble: 208 × 5
#> age sex n population rate
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 0 female 4.76 1924145 0.0247
#> 2 0 male 14.3 2015150 0.0708
#> 3 1 female 253. 1943534 1.30
#> 4 1 male 231. 2031718 1.14
#> 5 2 female 438. 1965150 2.23
#> 6 2 male 632. 2056625 3.07
#> # … with 202 more rows
summary %>%
ggplot(aes(age, rate, colour = sex)) +
geom_line(na.rm = TRUE) +
labs(y = "Injuries per 10,000 people")
最后還可以查看一下里面一些案件的文字描述。比方說隨機取樣10個樣本夭咬。
selected %>%
sample_n(10) %>%
pull(narrative)
#> [1] "97 YOM FELL HITTING HEAD ON TOILET SEAT.DX: NECK PX, BACK PX, FREQUENT FALLS."
#> [2] "95 YOF - CONTUSION HEAD - PT WAS TRANSFERRING FROM W.C TO TOILETAND FELL HITTING HEAD ON FLOOR@ N.H"
#> [3] "54YOM HAD A MECHANICAL FALL ATTEMPTING TO USE THE TOILET, C/O LT-SIDEDCHEST PAIN. DX - RIB FX, PNEUMOTHORAX, CHEST WALL CONTUSION"
#> [4] "99YF ATTEMPTING TO GET OFF THE TOILET&FELL FWD STRIKING HEAD&CHEST AGAINST THE WALKER, -LOC>>CHI, RIB FX, FREQ FALLS"
#> [5] "79 YOF HAD SYNCOPAL EPISODE AND FELL FROM TOILET HITTING FACE ONFLOOR DX NASAL BONE FRACTURE"
#> [6] "76YOM C/O GLF @HOME JUST PTA. ATTEMPTING TO SIT ON TOILET, MISSED TOILET AND FELL. NO HI, NO LOC DX=LEFT HIP FRACTURE="
#> [7] "6 YO M LAC HEAD-FELL,CLIMBING ON TOILET,STRUCK THE COUNTERTOP"
#> [8] "85YOM SITTING ON THE TOILET AT HOME AND LEANED FORWARD FELL ONTO HEAD SUSTAINED A SUBDURAL HEMATOMA"
#> [9] "79YOF H'TMA HEAD- LOWERING ONTO TOILET, FELL ON FLOOR"
#> [10] "81YOM WENT TO SIT ON A TOILET AND MISSED IT AND SUSTAINED A CLOSED HEADINJURY"
都是一些比較基礎的數據探索。接下來要做的就是把這些事情轉交給shiny,全都改寫成shiny code托修。
4. Shiny小試牛刀
首先是設置UI界面。
這里會用最簡單的方式演示三張表格涩拙,一張圖〈瓿梗可以提前用筆在紙上打一下草稿規(guī)劃一下界面的排版旭贬。這里打算做成一個2行3列的界面。第一行顯示三張表格奋刽,第二行顯示一張圖。
由于每一行的最大寬度是12列台谍,所以三張表格均勻分布的話是每一張占4列坞生。
prod_codes <- setNames(products$prod_code, products$title)
ui <- fluidPage(
fluidRow(
column(6,
selectInput("code", "Product", choices = prod_codes)
)
),
fluidRow(
column(4, tableOutput("diag")),
column(4, tableOutput("body_part")),
column(4, tableOutput("location"))
),
fluidRow(
column(12, plotOutput("age_sex"))
)
)
雖然到目前為止還沒有對fluidRow()
和column()
有過詳細講解,但是大概可以猜到是用來做什么的卒废。包括setNames()
和selectInput()
之后的章節(jié)里都會有說明。
然后是server端参袱。
server <- function(input, output, session) {
selected <- reactive(injuries %>% filter(prod_code == input$code))
output$diag <- renderTable(
selected() %>% count(diag, wt = weight, sort = TRUE)
)
output$body_part <- renderTable(
selected() %>% count(body_part, wt = weight, sort = TRUE)
)
output$location <- renderTable(
selected() %>% count(location, wt = weight, sort = TRUE)
)
summary <- reactive({
selected() %>%
count(age, sex, wt = weight) %>%
left_join(population, by = c("age", "sex")) %>%
mutate(rate = n / population * 1e4)
})
output$age_sex <- renderPlot({
summary() %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
}, res = 96)
}
5. 對齊表格樣式
剛才在剛才的結果里看出表格的行數層次不齊环壤,不是很美觀镐捧。如果可以像下面那樣指定排名前幾的就好了。
injuries %>%
mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
group_by(diag) %>%
summarise(n = as.integer(sum(weight)))
#> # A tibble: 6 × 2
#> diag n
#> <fct> <int>
#> 1 Other Or Not Stated 1806436
#> 2 Fracture 1558961
#> 3 Laceration 1432407
#> 4 Strain, Sprain 1432556
#> 5 Contusion Or Abrasion 1451987
#> 6 Other 1929147
可以寫一個function列牺,不會寫也沒關系,之后會有詳細的解說九默。
count_top <- function(df, var, n = 5) {
df %>%
mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
group_by({{ var }}) %>%
summarise(n = as.integer(sum(weight)))
}
然后在server端安排一下function。這里面有個細節(jié)乙各,width=100%
指定撐滿行距恩静,讓其看上去不會那么凌亂驶乾。
output$diag <- renderTable(count_top(selected(), diag), width = "100%")
output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
output$location <- renderTable(count_top(selected(), location), width = "100%")
5. 添加選項 rate vs count
fluidRow(
column(8,
selectInput("code", "Product",
choices = setNames(products$prod_code, products$title),
width = "100%"
)
),
column(2, selectInput("y", "Y axis", c("rate", "count")))
),
output$age_sex <- renderPlot({
if (input$y == "count") {
summary() %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
} else {
summary() %>%
ggplot(aes(age, rate, colour = sex)) +
geom_line(na.rm = TRUE) +
labs(y = "Injuries per 10,000 people")
}
}, res = 96)
6. 添加文字敘述
首先在UI里添加新的元素罕扎。比如說actionButton
fluidRow(
column(2, actionButton("story", "Tell me a story")),
column(10, textOutput("narrative"))
)
然后在sever里添加一個eventReactive
杆查,這個是表示只有被點擊才會被激活。
arrative_sample <- eventReactive(
list(input$story, selected()),
selected() %>% pull(narrative) %>% sample(1)
)
output$narrative <- renderText(narrative_sample())