Immune Repertoire Data Analysis || immunarch Tutorial: Diversity Analysis

immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R

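10× Genomics Single-cell Immune Repertoire VDJ Analysis: The Essentials
Immune Repertoire Data Analysis || immunarch Tutorial: Clonotype Analysis
Immune Repertoire Data Analysis || immunarch Tutorial: Exploratory Data Analysis
Immune Repertoire Data Analysis || immunarch Tutorial: Loading 10X Data
Immune Repertoire Data Analysis || immunarch Tutorial: Quick Start
Immune Repertoire Data Analysis || immunarch Tutorial: GeneUsage Analysis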

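Today we continue our immune repertoire data analysis demos, this time with diversity analysis. A newcomer to immune repertoires like me will first ask: what is diversity? If you do not come from an ecology background, you have probably heard more about heterogeneity; diversity is one expression of heterogeneity. Frankly, my first contact with diversity was through Numerical Ecology, a discipline that studies the distribution, abundance, and migration of species in a given habitat. Our data fits a simple analogy: community = tissue; species = VDJ clonotype. What numerical ecology studies is, after all, also an abundance table. So we could use vegan, the well-known R package from numerical ecology, to compute VDJ diversity. The immunarch package we introduce today uses its own code for this (how do I know? I read the source), while another VDJ analysis tool, scRepertoire, uses vegan directly.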

So, what is diversity?

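In biology, species richness (S; read "clonotype" for "species") is a relative term for the number of species in a community (tissue), and it bears directly on how the species diversity of an area is measured. A related term, evenness (E), is another aspect of diversity: it describes how evenly the individuals in the same area are distributed among the species. Together, these terms are used to describe patterns of species diversity on Earth.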
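To make richness and evenness concrete, here is a minimal base-R sketch on a made-up vector of clonotype counts. The text above does not say which evenness measure is meant, so using Pielou's J (Shannon entropy divided by its maximum, log S) is my assumption, and the numbers are purely illustrative.

```r
# Toy clonotype counts for one sample (made-up numbers)
clones <- c(50, 30, 10, 5, 3, 1, 1)

richness <- sum(clones > 0)           # S: number of distinct clonotypes
p        <- clones / sum(clones)      # relative abundance of each clonotype
shannon  <- -sum(p * log(p))          # Shannon entropy H'
evenness <- shannon / log(richness)   # Pielou's J (assumed measure), between 0 and 1

# A sample with the same richness but identical counts is perfectly even
even_clones   <- rep(10, richness)
p_even        <- even_clones / sum(even_clones)
evenness_even <- -sum(p_even * log(p_even)) / log(richness)

print(c(richness = richness, evenness = evenness, evenness_even = evenness_even))
```

Two samples can share the same richness and still differ sharply in evenness, which is why the two quantities are reported together.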

Species Diversity - QS Study

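In the analyses below we will meet many concepts and indices from ecology. The new edition of Numerical Ecology with R adds a Chapter 8 devoted to community diversity, which is worth consulting. Much of the analysis amounts to swapping in a different matrix; so that is why I had to study ecology back then, it was waiting for me here all along.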

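immunarch offers a richer set of diversity measures along with convenient statistics. The repDiversity function implements several methods for estimating repertoire diversity. As with the other immunarch functions, the .method argument sets the estimation method. You can choose one of the following: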

chao1 - The Chao1 estimator is a nonparametric asymptotic estimator of species richness (number of species in a population).
hill - Hill numbers are a mathematically unified family of diversity indices (differing only by an exponent q).
div - True diversity, or the effective number of types, refers to the number of equally-abundant types needed for the average proportional abundance of the types to equal that observed in the dataset of interest where all types may not be equally abundant.
gini.simp - The Gini-Simpson index is the probability of interspecific encounter, i.e., probability that two entities represent different types.
inv.simp - Inverse Simpson index is the effective number of types that is obtained when the weighted arithmetic mean is used to quantify average proportional abundance of types in the dataset of interest.
gini - The Gini coefficient measures the inequality among values of a frequency distribution (for example levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of one (or 100 percent) expresses maximal inequality among values (for example where only one person has all the income).
raref - Rarefaction is a technique to assess species richness from the results of sampling through extrapolation.
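Several of the indices in this list are tied together by the Hill number formula, D_q = (sum of p_i^q)^(1/(1-q)). The base-R sketch below works that out on made-up counts; hill_number is a hypothetical helper written for illustration and is not immunarch's implementation.

```r
# Hill number of order q for a vector of clonotype counts.
# hill_number() is an illustrative helper, not part of immunarch.
hill_number <- function(counts, q) {
  p <- counts[counts > 0] / sum(counts)
  if (q == 1) {
    exp(-sum(p * log(p)))        # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

clones <- c(50, 30, 10, 5, 3, 1, 1)   # toy counts, made up
p <- clones / sum(clones)

richness  <- hill_number(clones, 0)   # q = 0: number of clonotypes
inv_simp  <- hill_number(clones, 2)   # q = 2: inverse Simpson, 1 / sum(p^2)
gini_simp <- 1 - sum(p^2)             # Gini-Simpson: chance two random draws differ

print(c(richness = richness, inv_simp = inv_simp, gini_simp = gini_simp))
```

Larger q gives more weight to abundant clonotypes, so a profile of D_q across q shows how much a few expanded clones dominate the repertoire.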

As before, we load the package and the data:

library(immunarch); data(immdata)       # Load the package and the test dataset
?repDiversity

div_div <- repDiversity(immdata$data, "inv.simp")
div_div

    Sample     Value
1  A2-i129  795.1269
2  A2-i131 1271.0224
3  A2-i133  425.6711
4  A2-i132 3435.5682
5  A4-i191  191.2722
6  A4-i192  525.2406
7      MS1  140.5916
8      MS2 1816.6960
9      MS3  141.6550
10     MS4 4504.9258
11     MS5  135.1877
12     MS6 3809.8502

Now let's compute the same index with vegan for comparison:

library(vegan)
# Inverse Simpson index for a single sample, from its per-clonotype counts
ve_inv <- diversity(immdata$data$`A2-i129`$Clones, index = "invsimpson")
ve_inv
797.5846

names(immdata$data)
 [1] "A2-i129" "A2-i131" "A2-i133" "A2-i132" "A4-i191" "A4-i192" "MS1"     "MS2"     "MS3"     "MS4"     "MS5"     "MS6"    
# Inverse Simpson index for every sample, in the same order as names(immdata$data)
ve_inv <- sapply(immdata$data, function(x) diversity(x$Clones, index = "invsimpson"))

 cor(div_div$Value,ve_inv)
[1] 0.9986046

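The correlation between the inv.simp values computed by the two methods is 0.9986046. The values are close but not identical (795.1269 versus 797.5846 for A2-i129). One probable reason is that repDiversity aggregates clonotypes by a sequence column (its .col argument) before computing the index, whereas the vegan call uses the Clones count of every row as is.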
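One way such a gap can arise is sketched below on a toy table: if rows that share the same CDR3 amino-acid sequence are merged before the index is computed, the inverse Simpson value drops. Whether this is exactly what repDiversity does by default is an assumption on my part; the column names follow immunarch's format, but the sequences and counts are made up and inv_simpson is an illustrative helper.

```r
# Toy repertoire with immunarch-style column names (values are made up)
rep_df <- data.frame(
  CDR3.nt = c("TGTGCCAGC", "TGTGCTAGC", "TGTGCCAGG", "TGTGCCACC"),
  CDR3.aa = c("CAS", "CAS", "CAR", "CAT"),
  Clones  = c(40, 30, 20, 10)
)

# Illustrative helper: inverse Simpson index from raw counts
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Every row treated as its own clonotype (what the vegan call does)
by_row <- inv_simpson(rep_df$Clones)

# Rows with the same amino-acid sequence merged first (assumed aggregation)
aa_counts <- tapply(rep_df$Clones, rep_df$CDR3.aa, sum)
by_aa     <- inv_simpson(aa_counts)

print(c(by_row = by_row, by_aa = by_aa))
```

Merging can only concentrate abundance into fewer types, so the aggregated value is never larger than the per-row value, which matches the direction of the difference seen above.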

Let's keep playing to immunarch's strength of being short and fast: quick statistics and quick plots.

# Compute statistics and visualise them
# Chao1 diversity measure
div_chao <- repDiversity(immdata$data, "chao1")

# Hill numbers
div_hill <- repDiversity(immdata$data, "hill")

# D50
div_d50 <- repDiversity(immdata$data, "d50")

# Ecological diversity measure
div_div <- repDiversity(immdata$data, "div")


p1 <- vis(div_chao)
p2 <- vis(div_chao, .by = c("Status", "Sex"), .meta = immdata$meta)
p3 <- vis(div_hill, .by = c("Status", "Sex"), .meta = immdata$meta)

p4 <- vis(div_d50)
p5 <- vis(div_d50, .by = "Status", .meta = immdata$meta)
p6 <- vis(div_div)

p1 + p2

p3 + p6

p4 + p5
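To show what the Chao1 values plotted above are estimating, here is a minimal base-R sketch of the bias-corrected Chao1 formula, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of clonotypes seen exactly once and twice. The function chao1 is a hypothetical helper on made-up counts; immunarch may use a different variant of the estimator and also reports confidence bounds, which this sketch omits.

```r
# Bias-corrected Chao1 richness estimate from clonotype counts.
# chao1() is an illustrative helper, not immunarch's implementation.
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  s_obs  <- length(counts)       # observed number of clonotypes
  f1     <- sum(counts == 1)     # singletons
  f2     <- sum(counts == 2)     # doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

clones <- c(50, 30, 10, 5, 2, 2, 1, 1, 1)   # toy counts, made up
est    <- chao1(clones)
print(est)
```

Many singletons relative to doubletons suggest that plenty of rare clonotypes were missed, so the estimate rises above the observed richness; with no singletons it equals the observed richness.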

Anyone who has worked with amplicon data or ecology will be no stranger to rarefaction curves:

imm_raref <- repDiversity(immdata$data, "raref", .verbose = FALSE)

 imm_raref[1:5,]
  Size     Q0.025       Mean     Q0.975  Sample          Type
1 0.02 0.02485373 0.02387968 0.02582681 A2-i129 interpolation
2 0.04 0.04849410 0.04689468 0.05009084 A2-i129 interpolation
3 0.06 0.07154025 0.06938140 0.07369409 A2-i129 interpolation
4 0.08 0.09416917 0.09148383 0.09684658 A2-i129 interpolation
5 0.10 0.11647440 0.11328184 0.11965546 A2-i129 interpolation


p1 <- vis(imm_raref)
p2 <- vis(imm_raref, .by = "Status", .meta = immdata$meta)
p1 + p2

repDiversity(immdata$data, "raref", .verbose = FALSE) %>% vis(.log = TRUE)
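The idea behind a rarefaction curve can be sketched in base R by repeatedly subsampling a repertoire at increasing depths and counting the distinct clonotypes found. This covers only the interpolation part on made-up counts; the helper rarefy_once, the depths, and the 100 repeats are my own choices, and immunarch's raref method, which also extrapolates, is not reproduced here.

```r
set.seed(1)

clones <- c(50, 30, 10, 5, 3, 1, 1)   # toy counts, made up (100 cells in total)

# Expand the counts into one clonotype label per cell
pool <- rep(seq_along(clones), times = clones)

# Illustrative helper: distinct clonotypes in one random subsample of a given depth
rarefy_once <- function(pool, depth) {
  length(unique(sample(pool, depth)))   # sampling without replacement
}

depths <- seq(10, length(pool), by = 10)

# Mean observed richness over 100 subsamples at each depth
raref_curve <- sapply(depths, function(d) {
  mean(replicate(100, rarefy_once(pool, d)))
})

print(data.frame(Size = depths, Mean = raref_curve))
```

A curve that flattens well before the full depth suggests the sample has captured most clonotypes; one that is still climbing suggests deeper sequencing would find more.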

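VDJ diversity is at the core of immune repertoire analysis. In this section we borrowed diversity indices from ecology to characterise clonotypes, which gives us an overall view of the state of the VDJ repertoire. Why study diversity at all? To find heterogeneity, of course.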

The description below comes from a biology wiki and is quite instructive for our study of VDJ.

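There are three recognized hypotheses of species diversity: (1) the heterogeneity hypothesis, (2) the competition hypothesis, and (3) the predation hypothesis.
There are three main reasons for measuring diversity: (1) to measure stability, in order to determine whether an environment is degrading; (2) to compare two or more environments; and (3) to eliminate the need for extensive lists (that is, to give a summary of the data: a single diversity index conveys what would otherwise take a long enumeration to explain). Diversity indices provide important information about the composition of a community. They measure not only species richness but also take into account the relative abundance, or evenness, of species. When measuring species diversity, richness and evenness must be considered together. In addition, the indices provide important information about the rarity and commonness of species.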

https://immunarch.com/articles/web_only/v6_diversity.html
https://en.wikibooks.org/wiki/Ecology/Species_Richness_and_Diversity