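When integrating multiple samples with the Seurat package, we usually take one of two approaches:
(1) merging the objects with merge
(2) integrating them with FindIntegrationAnchors
Here we take a closer look at the parameters and usage of FindIntegrationAnchors.
For data that are to be integrated across samples, the usual workflow is: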
```r
library(Seurat)

# samples, path, min_cells and Nfeatures are assumed to be defined beforehand
ob.list <- list()
numsap <- 1  # numeric barcode suffix for the current sample (assumed starting value)
for (each in samples) {
  pbmc <- readRDS(paste0(path, '/', each, '_QC.rds'))
  counts <- pbmc@assays$RNA@counts  # Seurat v3/v4 object structure
  # Give each sample its own barcode suffix so cell names do not collide after integration
  if (grepl('-1$', colnames(counts)[1])) {
    colnames(counts) <- sub('-1$', paste0('-', numsap), colnames(counts))
  } else {
    colnames(counts) <- paste0(colnames(counts), '-', numsap)
  }
  ob <- CreateSeuratObject(counts = counts, project = each, min.cells = min_cells)
  ob$stim <- each  # record the sample of origin
  ob <- NormalizeData(ob)
  ob <- FindVariableFeatures(ob, selection.method = "vst", nfeatures = Nfeatures)
  numsap <- numsap + 1
  ob.list[[each]] <- ob
}
anchors <- FindIntegrationAnchors(object.list = ob.list, dims = 1:20)
combined <- IntegrateData(anchorset = anchors, dims = 1:20)
```
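In other words, each sample is normalized on its own first, and the samples are then integrated.
FindIntegrationAnchors is the function that finds the anchors. Anchors are pairs of cells across samples that guide the integration.
Let us look at its parameters: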
Description:
Find a set of anchors between a list of ‘Seurat’ objects. These
anchors can later be used to integrate the objects using the
‘IntegrateData’ function. (In short: it finds anchors between several Seurat objects.)
Main parameters:
assay: A vector of assay names specifying which assay to use when
constructing anchors. If NULL, the current default assay for
each object is used. (So we can choose, object by object, which assay is used to find the anchors, for example RNA or SCT.)
reference: A vector specifying the object/s to be used as a reference
during integration. If NULL (default), all pairwise anchors
are found (no reference/s). If not NULL, the corresponding
objects in ‘object.list’ will be used as references. When
using a set of specified references, anchors are first found
between each query and each reference. The references are
then integrated through pairwise integration. Each query is
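then mapped to the integrated reference. (So if the cells in some samples are particularly well characterized, those samples can serve as the reference. Anchors are then found between the samples of unknown cell type and the reference set, and the data are integrated. This resembles how RCA and scanpy integrate samples, except that here the cells do not need to be annotated beforehand.)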
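A quick count shows why the reference mode is cheaper. The sketch below assumes the procedure described in the help text. With reference = NULL, every pair of the n objects is compared. With r references, each of the n - r queries is compared with each reference, and the references are then integrated pairwise. The helper n_anchor_searches is hypothetical; it only does the arithmetic and does not call Seurat.

```r
# Hypothetical helper: how many dataset pairs need an anchor search.
# n = number of objects in object.list, r = number of references (0 means reference = NULL).
n_anchor_searches <- function(n, r = 0) {
  stopifnot(n >= 2, r >= 0, r <= n)
  if (r == 0) {
    return(choose(n, 2))        # all pairwise comparisons
  }
  queries <- n - r
  queries * r + choose(r, 2)    # each query vs each reference, plus reference-reference pairs
}

n_anchor_searches(10)      # 45 comparisons without a reference
n_anchor_searches(10, 1)   # 9 comparisons with one reference
n_anchor_searches(10, 2)   # 17 comparisons with two references
```

With ten samples, a single reference cuts the number of anchor searches from 45 to 9. The gap widens quickly as the number of samples grows.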
anchor.features: Can be either:
- A numeric value. This will call
‘SelectIntegrationFeatures’ to select the provided number
of features to be used in anchor finding
- A vector of features to be used as input to the anchor
finding process
(If we have strong prior knowledge, we can specify the genes used for anchor finding ourselves.)
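To make the numeric option concrete, here is a rough base-R sketch of the ranking that SelectIntegrationFeatures is documented to perform. Genes are ranked by the number of datasets in which they are variable. Ties are broken by the gene's median rank among each dataset's variable features. The function select_features_sketch and the toy gene lists are hypothetical. The real function also drops genes that are missing from any object, which this sketch skips.

```r
# Hypothetical sketch of the ranking behind SelectIntegrationFeatures():
# genes that are variable in more datasets come first; ties are broken by the
# median of the gene's rank within each dataset's variable-feature list.
select_features_sketch <- function(var.features.list, nfeatures = 2000) {
  # var.features.list: one character vector per dataset, most variable gene first
  all.genes <- unique(unlist(var.features.list, use.names = FALSE))
  # number of datasets in which each gene is a variable feature
  n.datasets <- vapply(all.genes, function(g) {
    sum(vapply(var.features.list, function(v) g %in% v, logical(1)))
  }, integer(1))
  # median rank over the datasets where the gene is variable (smaller = more variable)
  med.rank <- vapply(all.genes, function(g) {
    ranks <- unlist(lapply(var.features.list, function(v) match(g, v)))
    median(ranks, na.rm = TRUE)
  }, numeric(1))
  ord <- order(-n.datasets, med.rank)
  head(all.genes[ord], nfeatures)
}

vf <- list(
  s1 = c("A", "B", "C", "D"),
  s2 = c("B", "A", "E"),
  s3 = c("C", "B", "F")
)
select_features_sketch(vf, nfeatures = 3)  # "B" "A" "C"
```

In a real analysis you would pass a number to FindIntegrationAnchors, as in anchor.features = 2000. Alternatively, you can pass your own character vector of gene names.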
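normalization.method: Name of normalization method used: LogNormalize or SCT
(SCT refers to normalization with the SCTransform function, which is worth reading up on and which I recommend. In the Seurat v3/v4 workflow, objects normalized with SCTransform are passed through PrepSCTIntegration before FindIntegrationAnchors is called.)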
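The LogNormalize option is simple enough to write out. This sketch assumes the formula given in the NormalizeData documentation:

- each count is divided by the cell's total count;
- the result is multiplied by scale.factor (10,000 by default);
- the value is transformed with log1p.

The helper log_normalize_sketch is hypothetical. SCTransform instead fits a regularized negative binomial model and is not reproduced here.

```r
# Hypothetical sketch of LogNormalize: counts / cell total * scale.factor, then log1p.
log_normalize_sketch <- function(counts, scale.factor = 1e4) {
  # counts: genes x cells matrix of raw counts
  lib.size <- colSums(counts)                            # total counts per cell
  log1p(sweep(counts, 2, lib.size, "/") * scale.factor)  # divide each column by its total
}

counts <- matrix(c(1, 3, 0, 4,
                   2, 2, 6, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("cell1", "cell2")))
norm <- log_normalize_sketch(counts)
norm["gene1", "cell1"]   # log1p(1 / 8 * 10000), i.e. log1p(1250)
```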
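reduction: Dimensional reduction to perform when finding anchors. Can
be one of:
- cca: Canonical correlation analysis (CCA), an algorithm for uncovering the relationship between two datasets. CCA applies a linear transformation that projects each multidimensional dataset onto a single dimension. It then computes the correlation coefficient between the two projections; here, that is the correlation between the cells of the two samples. Each such pair of projections is one canonical component, and Seurat keeps several of them, as set by dims.
The projection criterion is therefore:
after projection, the correlation between the two datasets is as large as possible.
Note, however, that CCA forces this correlation to be as large as possible. This can over-correct the integration.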
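The CCA objective can be tried out with base R's stats::cancor on toy data. In the sketch below, two blocks of variables share one latent signal. The first pair of canonical coefficient vectors projects each block onto one dimension so that the two projections are as correlated as possible. This is generic CCA for illustration only. Seurat uses a simplified variant computed from the two scaled expression matrices over their shared genes, so this is not Seurat's code path.

```r
# Toy data: two "views" X and Y of the same 200 observations sharing one latent signal z
set.seed(1)
n <- 200
z <- rnorm(n)
X <- cbind(z + rnorm(n, sd = 0.3), rnorm(n), rnorm(n))  # only column 1 carries z
Y <- cbind(rnorm(n), -z + rnorm(n, sd = 0.3))            # only column 2 carries z

# stats::cancor finds coefficient vectors a, b that maximize cor(X %*% a, Y %*% b)
cc <- cancor(X, Y)

# First pair of canonical variates: one-dimensional projections of X and of Y
u <- drop(scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1])
v <- drop(scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1])

cc$cor[1]   # first canonical correlation (high, because z is shared)
cor(u, v)   # the two projections attain exactly that correlation
```

The second and later canonical pairs would capture any further shared structure, uncorrelated with the first. Here there is none, so their correlations stay small.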
- rpca: Reciprocal PCA
When determining anchors between any two datasets using reciprocal PCA, we project each dataset into the other's PCA space and constrain the anchors by the same mutual neighborhood requirement. All downstream integration steps remain the same and we are able to 'correct' (or harmonize) the datasets.
(In other words, anchors are searched for in each dataset's principal-component space instead of the shared CCA space.)
For large studies with many datasets, we recommend also combining reciprocal PCA with reference-based integration, or SCTransform normalization. Note, however, that this approach is mainly intended for large datasets. It is faster, but its anchor search is "coarser" (more conservative) than CCA's, so keep this in mind.
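The "reciprocal" projection can be sketched with prcomp. Each dataset gets its own PCA, and the other dataset is projected onto those axes with predict. This only illustrates the geometry under simplifying assumptions: the inputs are random toy matrices, and the per-object scaling and other normalization steps Seurat applies are skipped. reciprocal_pca_sketch is a hypothetical name. Roughly speaking, anchors would then be sought as mutual neighbors in two places:

- between A.in.A and B.in.A;
- between B.in.B and A.in.B.

```r
# Hypothetical sketch: each dataset is projected onto the other dataset's principal components.
reciprocal_pca_sketch <- function(A, B, npcs = 5) {
  # A, B: cells x genes matrices with the same gene columns (e.g. the anchor features)
  stopifnot(identical(colnames(A), colnames(B)))
  pca.A <- prcomp(A, center = TRUE, scale. = FALSE, rank. = npcs)
  pca.B <- prcomp(B, center = TRUE, scale. = FALSE, rank. = npcs)
  list(
    A.in.A = pca.A$x,            # A in its own PC space
    B.in.A = predict(pca.A, B),  # B projected onto A's PCs
    B.in.B = pca.B$x,            # B in its own PC space
    A.in.B = predict(pca.B, A)   # A projected onto B's PCs
  )
}

set.seed(7)
genes <- paste0("g", 1:20)
A <- matrix(rnorm(30 * 20), nrow = 30, dimnames = list(NULL, genes))  # 30 toy cells
B <- matrix(rnorm(25 * 20), nrow = 25, dimnames = list(NULL, genes))  # 25 toy cells
proj <- reciprocal_pca_sketch(A, B, npcs = 5)
sapply(proj, dim)   # every projection has 5 columns, one per PC
```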
k.anchor: How many neighbors (k) to use when picking anchors
k.filter: How many neighbors (k) to use when filtering anchors
k.score: How many neighbors (k) to use when scoring anchors
(These three parameters govern how anchors are picked, filtered and scored; the underlying rules deserve a closer look.)
nn.method: Method for nearest neighbor finding. Options include: rann,
annoy
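The anchor-picking step can be pictured as mutual nearest neighbors (MNN). A cell in sample 1 and a cell in sample 2 form a candidate anchor if each is among the other's k nearest neighbors in the shared space; this is the role of k.anchor. In Seurat the candidates are then processed in two further steps:

- filtering (k.filter): roughly, pairs that are not close neighbors in the original expression space are dropped;
- scoring (k.score): roughly, pairs are weighted by how much their neighborhoods overlap.

The sketch below covers only the MNN step. It uses brute-force Euclidean distances instead of the rann or annoy searches offered by nn.method, and the function name find_mnn_pairs is hypothetical.

```r
# Hypothetical helper: mutual nearest-neighbor pairs between two embeddings.
# emb1, emb2: cells x dimensions matrices in a shared space (e.g. CCA or PCA coordinates).
find_mnn_pairs <- function(emb1, emb2, k = 5) {
  n1 <- nrow(emb1)
  n2 <- nrow(emb2)
  # Brute-force Euclidean distances; rows = cells of emb1, columns = cells of emb2
  d <- as.matrix(dist(rbind(emb1, emb2)))[seq_len(n1), n1 + seq_len(n2), drop = FALSE]
  # k nearest emb2 cells for each emb1 cell, and vice versa
  nn12 <- lapply(seq_len(n1), function(i) order(d[i, ])[seq_len(min(k, n2))])
  nn21 <- lapply(seq_len(n2), function(j) order(d[, j])[seq_len(min(k, n1))])
  # keep (i, j) only if j is among i's neighbors AND i is among j's neighbors
  pairs <- lapply(seq_len(n1), function(i) {
    js <- nn12[[i]][vapply(nn12[[i]], function(j) i %in% nn21[[j]], logical(1))]
    if (length(js) == 0) NULL else cbind(cell1 = i, cell2 = js)
  })
  out <- do.call(rbind, pairs)
  if (is.null(out)) {
    out <- matrix(integer(0), ncol = 2, dimnames = list(NULL, c("cell1", "cell2")))
  }
  out
}

emb1 <- cbind(1:10, 0)   # 10 toy cells on a line
emb2 <- emb1 + 0.1       # the "same" cells, slightly shifted in the second sample
find_mnn_pairs(emb1, emb2, k = 1)   # each cell i pairs with cell i
```

A larger k yields more candidate pairs, including looser ones. This is why Seurat filters and scores anchors before using them.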
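Seurat contains many methods worth learning from, but every situation has to be analyzed on its own terms. Never apply a method just because it is there.
Stay angry, and let's make Wang Duoyu lose his whole fortune~~~~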