R Visualization: Basic Plot Types (Part 1)

Basic Plot Types

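Plotting is an indispensable way to understand properties of data such as distribution, variability, and correlation. Different plot types bring out different properties, so the choice of plot usually depends on the specific question. R is particularly strong at visualization because it was designed by statisticians for statistical work, so it is highly recommended for visualizing data.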

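Plot types and what they are used for

Scatter plot

A scatter plot places points determined by their x and y values on a coordinate plane. It serves two purposes: it shows how the data are distributed and clustered, and it reveals the trend between x and y. It is used mostly in regression analysis, to see how the dependent variable changes with the independent variable and then to choose a suitable function to fit the points.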

library(ggplot2)
library(dplyr)

# wt, mpg, cyl and the row names used below match the built-in mtcars data set
dat <- mtcars %>% mutate(cyl = factor(cyl)) 
ggplot(dat, aes(x = wt, y = mpg, shape = cyl, color = cyl)) + 
    geom_point(size = 3, alpha = 0.4) + 
    geom_smooth(method = lm, linetype = "dashed", 
        color = "darkred", fill = "blue") + 
    geom_text(aes(label = rownames(dat)), size = 4) + 
    theme_bw(base_size = 12) + 
    theme(plot.title = element_text(size = 10, color = "black", face = "bold", hjust = 0.5), 
          axis.title = element_text(size = 10, color = "black", face = "bold"), 
          axis.text = element_text(size = 9, color = "black"), 
          axis.ticks.length = unit(-0.05, "in"), 
          axis.text.y = element_text(margin = unit(c(0.3, 0.3, 
            0.3, 0.3), "cm"), size = 9), 
          axis.text.x = element_blank(), 
          text = element_text(size = 8, color = "black"), 
          strip.text = element_text(size = 9, color = "black", face = "bold"), 
          panel.grid = element_blank())
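
The paragraph above mentions choosing a function to fit the points. As a minimal sketch, assuming a straight line is an adequate model and using R's built-in mtcars data (the same wt and mpg columns as the plot), base R's lm() could estimate that line:

```r
# Fit mpg as a linear function of wt on the built-in mtcars data.
# A straight-line model is an assumption made here for illustration.
fit <- lm(mpg ~ wt, data = mtcars)

slope <- coef(fit)[["wt"]]
intercept <- coef(fit)[["(Intercept)"]]
r_squared <- summary(fit)$r.squared

# Report the fitted line and how much of the variance it explains.
cat(sprintf("mpg = %.2f + (%.2f) * wt, R^2 = %.2f\n",
            intercept, slope, r_squared))
```

A negative slope would indicate that heavier cars tend to have lower fuel economy in this data set.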

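Histogram

A histogram visualizes how data are distributed. It is a two-dimensional statistical chart whose two axes are the sample values and some measure of them, such as frequency. Note that the two examples below draw bar charts of precomputed values (geom_bar(stat = "identity") and patternbar()); a histogram of raw data would use geom_histogram() instead.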

library(ggplot2)
library(ggthemes)  # provides scale_fill_wsj() and theme_wsj()

data <- data.frame(
  Company = c("Apple", "Google", "Facebook", "Amazon", "Tencent"), 
  Sale2013 = c(5000, 3500, 2300, 2100, 3100), 
  Sale2014 = c(5050, 3800, 2900, 2500, 3300), 
  Sale2015 = c(5050, 3800, 2900, 2500, 3300), 
  Sale2016 = c(5050, 3800, 2900, 2500, 3300))
# reshape from wide to long: one row per company and year
mydata <- tidyr::gather(data, Year, Sale, -Company)
ggplot(mydata, aes(Company, Sale, fill = Year)) + 
    geom_bar(stat = "identity", position = "dodge") +
    guides(fill = guide_legend(title = NULL)) + 
    ggtitle("The Financial Performance of Five Giants") + 
    scale_fill_wsj("rgby", "") + 
    theme_wsj() + 
    theme(
      axis.ticks.length = unit(0.5, "cm"), 
      axis.title = element_blank())

library(patternplot)

data <- read.csv(system.file("extdata", "monthlyexp.csv", 
        package = "patternplot"))
data <- data[which(data$City == "City 1"), ]
x <- factor(data$Type, c("Housing", "Food", "Childcare"))
y <- data$Monthly_Expenses
pattern.type <- c("hdashes", "blank", "crosshatch")
pattern.color <- c("black", "black", "black")
background.color <- c("white", "white", "white")
density <- c(20, 20, 10)

patternplot::patternbar(data, x, y, group = NULL, 
        ylab = "Monthly Expenses, Dollar", 
        pattern.type = pattern.type, 
        pattern.color = pattern.color,
        background.color = background.color, 
        pattern.line.size = 0.5, 
        frame.color = c("black", "black", "black"), density = density) + 
    ggtitle("(A) Black and White with Patterns")
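
To make the two axes of a histogram concrete, here is a minimal sketch of the binning step using base R's hist() with plot = FALSE. The mtcars data and the bin width of 5 are assumptions chosen for illustration, not part of the original examples:

```r
# Bin the mpg values of the built-in mtcars data into intervals of width 5.
x <- mtcars$mpg
h <- hist(x, breaks = seq(10, 35, by = 5), plot = FALSE)

# One row per bin: the interval (x axis) and its frequency (y axis).
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

In ggplot2, geom_histogram() performs this binning itself when given the raw values.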

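Box plot

A box plot is a statistical chart that shows how a set of data is distributed; it is named after its box-like shape. It summarizes the data with six elements, ordered from largest to smallest (upper limit to lower limit): the upper whisker, the upper quartile, the median, the lower quartile, the lower whisker, and any outliers. Together these reflect the distribution of the raw data. Its value lies in exposing key figures such as the median, any outliers, how tightly the data cluster, and skewness.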

library(ggplot2)
library(dplyr)
library(ggpubr)  # provides stat_compare_means()

# `dat` is not defined in the original post; it is assumed to be a data frame
# with a three-level Fruit column and a numeric Weight column.
pr <- as.character(unique(dat$Fruit))
grp.col <- c("#999999", "#E69F00", "#56B4E9")

dat %>% mutate(Fruit = factor(Fruit)) %>% 
    ggplot(aes(x = Fruit, y = Weight, color = Fruit)) + 
        stat_boxplot(geom = "errorbar", width = 0.15) + 
        geom_boxplot(aes(fill = Fruit), width = 0.4, outlier.colour = "black",
                     outlier.shape = 21, outlier.size = 1) + 
        # mark each group's mean with a black dot
        stat_summary(fun = mean, geom = "point", shape = 16,
                     size = 2, color = "black") +
        # show the number of observations in each group near the top
        stat_summary(fun.data = function(x) {
            return(data.frame(y = 0.98 * 120, label = length(x)))
            }, geom = "text", hjust = 0.5, color = "red", size = 6) + 
        stat_compare_means(comparisons = list(
            c(pr[1], pr[2]), c(pr[1], pr[3]), c(pr[2], pr[3])),
            label = "p.signif", method = "wilcox.test") + 
        labs(title = "Weight of Fruit", x = "Fruit", y = "Weight (kg)") +
        scale_color_manual(values = grp.col, labels = pr) +
        scale_fill_manual(values = grp.col, labels = pr) + 
        guides(color = "none", fill = "none") + 
        scale_y_continuous(sec.axis = dup_axis(
            labels = NULL, name = NULL),
            breaks = seq(90, 108, 2), limits = c(90, 120)) + 
        theme_bw(base_size = 12) + 
        theme(plot.title = element_text(size = 10, color = "black", 
                                        face = "bold", hjust = 0.5),
              axis.title = element_text(size = 10, 
                                        color = "black", face = "bold"), 
              axis.text = element_text(size = 9, color = "black"),
              axis.ticks.length = unit(-0.05, "in"), 
              axis.text.y = element_text(margin = unit(c(0.3, 0.3, 
                                          0.3, 0.3), "cm"), size = 9),
              axis.text.x = element_text(margin = unit(c(0.3, 
                                          0.3, 0.3, 0.3), "cm")),
              text = element_text(size = 8, color = "black"),
              strip.text = element_text(size = 9, color = "black", face = "bold"),
              panel.grid = element_blank())
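
The numbers behind a box plot can be inspected directly. The sketch below uses base R's boxplot.stats() on a small made-up vector (the values are hypothetical, chosen so that one point is an obvious outlier):

```r
# Hypothetical sample with one extreme value.
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x)

# stats holds, in order: lower whisker, lower hinge, median,
# upper hinge, upper whisker. Whiskers stop at the most extreme
# point within 1.5 times the hinge spread of the box.
box <- s$stats
outliers <- s$out

print(box)
print(outliers)
```

Here the point 30 lies more than 1.5 times the box height above the upper hinge, so it is reported as an outlier and the upper whisker ends at 9.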

Area chart

An area chart shows the relationship between the parts and the whole, and is used mostly to study how values change over a time series.

library(ggplot2)
library(dplyr)

# `dat` is assumed to also contain a Store column (the data set is not shown in the original post)
dat %>% group_by(Fruit, Store) %>% 
    summarize(mean_Weight = mean(Weight)) %>% 
        ggplot(aes(x = Store, group = Fruit)) + 
        geom_area(aes(y = mean_Weight, 
            fill = as.factor(Fruit)), position = "stack", linetype = "dashed") + 
        geom_hline(aes(yintercept = mean(mean_Weight)), color = "blue", 
            linetype = "dashed", size = 1) + 
        guides(fill = guide_legend(title = NULL)) + 
        theme_bw(base_size = 12) + 
        theme(plot.title = element_text(size = 10, 
                color = "black", face = "bold", hjust = 0.5), 
            axis.title = element_text(size = 10, 
                color = "black", face = "bold"), 
            axis.text = element_text(size = 9, color = "black"), 
            axis.ticks.length = unit(-0.05, "in"), 
            axis.text.y = element_text(margin = unit(c(0.3, 0.3, 
                0.3, 0.3), "cm"), size = 9), 
            axis.text.x = element_text(margin = unit(c(0.3, 
                0.3, 0.3, 0.3), "cm")), 
            text = element_text(size = 8, color = "black"), 
            strip.text = element_text(size = 9, 
                color = "black", face = "bold"), 
            panel.grid = element_blank())
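
Stacking is what lets an area chart show parts against the whole: each group's band sits on top of the running total of the groups below it. A minimal sketch of that arithmetic in base R, using a made-up matrix of mean weights (the fruit names, store names, and values are hypothetical):

```r
# Hypothetical mean weights: one row per fruit, one column per store.
weights <- rbind(apple = c(3, 4, 5),
                 pear  = c(2, 2, 3),
                 plum  = c(1, 3, 2))
colnames(weights) <- c("store_1", "store_2", "store_3")

# Upper edge of each stacked band: cumulative sum down each column.
tops <- apply(weights, 2, cumsum)

# Share of the whole contributed by each fruit in each store.
share <- prop.table(weights, margin = 2)

print(tops)
print(round(share, 2))
```

The top edge of the last band equals the column total, which is the "whole" that the individual bands add up to.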

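Heat map

A heat map is another statistical graphic for visualizing how data are distributed; the example below makes differences between values visible as color. It is commonly used to visualize sample clustering. In studies of differential gene expression or abundance, a heat map can both show differences in data quality and support clustering.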

library(ggplot2)

data <- as.data.frame(matrix(rnorm(9 * 10), 9, 10))
rownames(data) <- paste("Gene", 1:9, sep = "_")
colnames(data) <- paste("sample", 1:10, sep = "_")
data$ID <- rownames(data)
data_m <- tidyr::gather(data, sampleID, value, -ID)

ggplot(data_m, aes(x = sampleID, y = ID)) + 
    geom_tile(aes(fill = value)) + 
    scale_fill_gradient2("Expression", low = "green", high = "red", 
            mid = "black") + 
    xlab("samples") + 
    theme_classic() + 
    theme(axis.ticks = element_blank(), 
          axis.line = element_blank(), 
          panel.grid.major = element_blank(),
          legend.key = element_blank(), 
          axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
          legend.position = "top")
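
The clustering use mentioned above usually amounts to reordering the rows (or columns) so that similar profiles sit next to each other. A minimal sketch with base R's dist() and hclust(), assuming Euclidean distance and the default complete linkage, on random data shaped like the example:

```r
# Random matrix with the same shape and naming scheme as the example above.
set.seed(1)
m <- matrix(rnorm(9 * 10), 9, 10,
            dimnames = list(paste("Gene", 1:9, sep = "_"),
                            paste("sample", 1:10, sep = "_")))

# Hierarchical clustering of the rows (genes).
hc <- hclust(dist(m))

# Row names in dendrogram order; similar genes end up adjacent.
gene_order <- rownames(m)[hc$order]
print(gene_order)
```

To apply this order in a ggplot2 heat map, the ID column could be converted to a factor with levels = gene_order before plotting.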

Correlation plot

A correlation plot is a special form of heat map: it shows the size of the pairwise correlation coefficients between samples.

library(corrplot)

corrplot(corr = cor(dat[1:7]), order = "AOE", type = "upper", tl.pos = "d")
corrplot(corr = cor(dat[1:7]), add = TRUE, type = "lower", method = "number", 
    order = "AOE", diag = FALSE, tl.pos = "n", cl.pos = "n")
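
The matrix passed to corrplot() can be inspected on its own. Since `dat` is not defined in the post, the sketch below assumes the built-in mtcars data as a stand-in and computes Pearson correlations between its first seven columns:

```r
# Pearson correlation matrix of the first seven columns of mtcars
# (mtcars is a stand-in; the original data set is not shown).
cm <- cor(mtcars[1:7])

# The matrix is symmetric with ones on the diagonal, which is why
# the plot can show circles in one triangle and numbers in the other.
print(round(cm, 2))
print(isSymmetric(cm))
```

Each entry lies between -1 and 1; for instance, mpg and wt are negatively correlated in this data set.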

Line chart

A line chart shows the trend in data; in essence it is somewhat similar to a stacked chart, that is, an area chart.

library(ggplot2)
library(dplyr)

grp.col <- c("#999999", "#E69F00", "#56B4E9")
dat.cln <- sampling::strata(dat, stratanames = "Fruit", 
    size = rep(round(nrow(dat) * 0.1/3, -1), 3), method = "srswor")

dat %>% slice(dat.cln$ID_unit) %>% 
    mutate(Year = as.character(rep(1996:2015, times = 3))) %>% 
    mutate(Year = factor(as.character(Year))) %>% 
    ggplot(aes(x = Year, y = Weight, linetype = Fruit, colour = Fruit, 
            shape = Fruit, fill = Fruit)) + 
        geom_line(aes(group = Fruit)) + 
        geom_point() + 
        scale_linetype_manual(values = c(1:3)) + 
        scale_shape_manual(values = c(19, 21, 23)) +
        scale_color_manual(values = grp.col, 
            labels = pr) + 
        scale_fill_manual(values = grp.col, labels = pr) + 
        theme_bw() + 
        theme(plot.title = element_text(size = 10, 
                color = "black", face = "bold", hjust = 0.5),
              axis.title = element_text(size = 10, color = "black", face = "bold"), 
              axis.text = element_text(size = 9, color = "black"),
              axis.ticks.length = unit(-0.05, "in"), 
              axis.text.y = element_text(margin = unit(c(0.3, 0.3, 
                0.3, 0.3), "cm"), size = 9),
              axis.text.x = element_text(margin = unit(c(0.3, 
                0.3, 0.3, 0.3), "cm")),
              text = element_text(size = 8, color = "black"),
              strip.text = element_text(size = 9, color = "black", face = "bold"),
              panel.grid = element_blank())

Venn diagram

A Venn diagram visualizes the overlapping regions between the sets belonging to different groups.

library(VennDiagram)

A <- sample(LETTERS, 18, replace = FALSE)
B <- sample(LETTERS, 18, replace = FALSE)
C <- sample(LETTERS, 18, replace = FALSE)
D <- sample(LETTERS, 18, replace = FALSE)

venn.diagram(x = list(A = A, D = D, B = B, C = C),
     filename = "Group4.png", height = 450, width = 450, 
     resolution = 300, imagetype = "png", col = "transparent", 
     fill = c("cornflowerblue", "green", "yellow", "darkorchid1"),
     alpha = 0.5, cex = 0.45, cat.cex = 0.45)

library(ggplot2)
library(UpSetR)

movies <- read.csv(system.file("extdata", "movies.csv", 
                package = "UpSetR"), header = T, sep = ";")
mutations <- read.csv(system.file("extdata", "mutations.csv", 
                package = "UpSetR"), header = T, sep = ",")

another.plot <- function(data, x, y) {
  round_any_new <- function(x, accuracy, f = round) {
    f(x/accuracy) * accuracy
  }
  data$decades <- round_any_new(as.integer(unlist(data[y])), 10, ceiling)
  data <- data[which(data$decades >= 1970), ]
  myplot <- (ggplot(data, aes_string(x = x)) + 
               geom_density(aes(fill = factor(decades)), alpha = 0.4) + 
               theme_bw() + 
               theme(plot.margin = unit(c(0, 0, 0, 0), "cm"), 
               legend.key.size = unit(0.4, "cm")))
}

upset(movies, main.bar.color = "black", 
      mb.ratio = c(0.5, 0.5), 
      queries = list(list(query = intersects, params = list("Drama"),
        color = "red", active = F), 
                list(query = intersects, params = list("Action", "Drama"), active = T),
                list(query = intersects, params = list("Drama", "Comedy", "Action"),
                    color = "orange",active = T)), 
      attribute.plots = list(gridrows = 50, 
           plots = list(list(plot = histogram, x = "ReleaseDate", queries = F), 
                   list(plot = scatter_plot, x = "ReleaseDate", 
                        y = "AvgRating", queries = T), 
                   list(plot = another.plot,x = "AvgRating", y = "ReleaseDate",
                        queries = F)),
                    ncols = 3))
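
The counts shown in the regions of a Venn diagram are plain set operations. A minimal sketch for two sets drawn the same way as above (the seed is an arbitrary choice so the draw is repeatable):

```r
# Two random sets of 18 letters each.
set.seed(42)
A <- sample(LETTERS, 18, replace = FALSE)
B <- sample(LETTERS, 18, replace = FALSE)

both   <- intersect(A, B)  # overlap region
only_A <- setdiff(A, B)    # region belonging to A alone
only_B <- setdiff(B, A)    # region belonging to B alone

cat("A only:", length(only_A),
    " shared:", length(both),
    " B only:", length(only_B), "\n")
```

The three region sizes always add up to the size of the union, which is a quick sanity check on any Venn diagram.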

Volcano plot

A volcano plot shows the difference between two groups of data through two quantities: fold change and P value.

library(ggplot2)

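# file.choose() opens a file picker on every platform; choose.files() exists only on Windows
data <- read.table(file.choose(), header = TRUE)
# red: significantly up, blue: significantly down, gray: not significant
data$color <- ifelse(data$padj < 0.05 & abs(data$log2FoldChange) >= 1,
                     ifelse(data$log2FoldChange > 0, 'red', 'blue'), 'gray')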
color <- c(red = "red",gray = "gray",blue = "blue")

ggplot(data, aes(log2FoldChange, -log10(padj), col = color)) +
  geom_point() +
  theme_bw() +
  scale_color_manual(values = color) +
  labs(x="log2 (fold change)",y="-log10 (q-value)") +
  geom_hline(yintercept = -log10(0.05), lty=4,col="grey",lwd=0.6) +
  geom_vline(xintercept = c(-1, 1), lty=4,col="grey",lwd=0.6) +
  theme(legend.position = "none",
        panel.grid=element_blank(),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 14))
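
The coloring rule can be checked without an input file. The sketch below applies the same thresholds (adjusted P value below 0.05 and absolute log2 fold change of at least 1) to a small made-up table; the five rows are hypothetical values chosen to hit each case:

```r
# Hypothetical results table with the same column names as the example.
res <- data.frame(
  log2FoldChange = c(2.5, -1.8, 0.3, 1.0, -3.0),
  padj           = c(0.001, 0.01, 0.5, 0.04, 0.2)
)

# A point is significant only if it passes both thresholds.
significant <- res$padj < 0.05 & abs(res$log2FoldChange) >= 1

# Significant points are colored by direction, all others are gray.
res$color <- ifelse(significant,
                    ifelse(res$log2FoldChange > 0, "red", "blue"),
                    "gray")
print(res)
```

The last row has a large fold change but stays gray because its adjusted P value is above the cutoff, which is the case the horizontal dashed line separates in the plot.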

Pie chart

A pie chart depicts the relative proportions of a quantity, such as frequency, across groups.

library(patternplot)
library(ggplot2)  # needed for ggtitle()

data <- read.csv(system.file("extdata", "vegetables.csv", 
                             package = "patternplot"))
pattern.type <- c("hdashes", "vdashes", "bricks")
pattern.color <- c("red3", "green3", "white")
background.color <- c("dodgerblue", "lightpink", "orange")

patternpie(group = data$group, pct = data$pct, 
    label = data$label, pattern.type = pattern.type,
    pattern.color = pattern.color, 
    background.color = background.color, frame.color = "grey40", 
    pixel = 0.3, pattern.line.size = 0.3, frame.size = 1.5, 
    label.size = 5, label.distance = 1.35) + 
  ggtitle("(B) Colors with Patterns")

Density plot

A density plot shows how densely the data fall in different intervals, much like the curve of a probability density function (PDF).

library(ggplot2)
library(plyr)

set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5),
                 rnorm(200, mean=65, sd=5)))
)
mu <- ddply(df, "sex", summarise, grp.mean=mean(weight))

ggplot(df, aes(x=weight, fill=sex)) +
  geom_histogram(aes(y=..density..), alpha=0.5, 
                 position="identity") +
  geom_density(alpha=0.4) +
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed") + 
  scale_color_grey() + 
  theme_classic()+
  theme(legend.position="top")
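
The link to a probability density function can be checked numerically: the area under an estimated density curve should be close to 1. A minimal sketch using base R's density() with its default Gaussian kernel and bandwidth, on one simulated group like those above:

```r
# Simulated weights, similar to one group in the example above.
set.seed(1234)
weight <- round(rnorm(200, mean = 55, sd = 5))

# Kernel density estimate evaluated on a regular grid of x values.
d <- density(weight)

# Trapezoidal approximation of the area under the curve.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```

This is the same kind of estimate that geom_density() draws, and it is why the histogram in the example is scaled to density rather than counts before the two are overlaid.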
