由于層次聚類(lèi)的問(wèn)題(數(shù)據(jù)量太大的時(shí)候燎潮,層次聚類(lèi)的速度很慢喻鳄,同時(shí)會(huì)報(bào)堆錯(cuò)誤,出現(xiàn)內(nèi)存溢出)确封,inferCNV的不能處理大批量的數(shù)據(jù)除呵。另外,對(duì)于來(lái)源于不同病人的樣本爪喘,inferCNV的issue的討論也是分開(kāi)跑颜曾。
首先是對(duì)于配對(duì)的病人和多樣本的的處理問(wèn)題
官網(wǎng)的解釋?zhuān)?https://github.com/broadinstitute/infercnv/issues/429
對(duì)于同時(shí)具癌旁和癌paired的病人,可以使用正常的epithelial cel作為reference秉剑,tumor的epi cell作為observation泛豪。但是對(duì)于同時(shí)提供多個(gè)Normal作為ref,inferCNV并不能把他們和tumor的epi cell做到一一對(duì)應(yīng)侦鹏。
另外诡曙,單獨(dú)對(duì)一名患者的數(shù)據(jù)運(yùn)行 infercnv 確實(shí)會(huì)更快,并且如果您的不同樣本具有相似的類(lèi)型略水,則結(jié)果應(yīng)該在很大程度上相似价卤,盡管它們可能不會(huì)完全相同,因?yàn)榈捅磉_(dá)/不表達(dá)的初始過(guò)濾基因會(huì)受到細(xì)胞總數(shù)以及不同細(xì)胞類(lèi)型之間混合比例的影響渊涝。由于你對(duì)亞克隆結(jié)構(gòu)感興趣慎璧,官網(wǎng)強(qiáng)烈建議使用HMM=T,analysis_mode="subclusters"否則預(yù)測(cè)可能不準(zhǔn)確。
如果您決定對(duì)多個(gè)患者的數(shù)據(jù)一起運(yùn)行 infercnv跨释,除了cluster_by_groups您提到的選項(xiàng)之外胸私,plot_per_group()函數(shù)還可以將每個(gè)組繪制在不同的圖形上,并且dynamic_resize繪圖方法的選項(xiàng)可以使熱圖的可讀性更好鳖谈,得到的heatmap對(duì)于大型數(shù)據(jù)集友好岁疼。
對(duì)于單樣本分析完成樣本,如果想做可比的heatmap圖蚯姆,有一些問(wèn)題:
- plot_cnv默認(rèn)的參數(shù)x軸的scale五续,是直接根據(jù)測(cè)到gene數(shù)目進(jìn)行scale,需要使用plot_chr_scale 參數(shù)來(lái)保證染色體的寬度是可比的
chr_df = read.chromInfo(species = 'hg38')$df
chr_df = chr_df[chr_df$chr %in% paste0("chr", 1:22), ]
chrLen <- chr_df$end
names(chrLen) <- chr_df$chr
cnvobj <- readRDS(file = paste0('./result/run.final.infercnv_obj'))
fileDir <- paste0('./result_ChromeScale/')
dir_mk(fileDir)
plot_cnv(infercnv_obj = cnvobj,
out_dir = fileDir,
title = sampleid,
plot_chr_scale = T,
chr_lengths = chrLen,
#old version.
x.range = c(0.9,1.1),
x.center=1,
png_res = 300,
output_format = 'png',
useRaster = T)
- 每個(gè)樣本測(cè)到的基因數(shù)目是不一樣的龄恋,為了不同樣本的CNV可比疙驾,需要使用complexHeatmap的提供的幫助函數(shù),從而獲得chromesome level的CNV結(jié)果郭毕。具體的操作如下:
library(circlize)
library(GenomicRanges)
chr_df = read.chromInfo()$df
chr_df = chr_df[chr_df$chr %in% paste0("chr", 1:22), ]
chr_gr = GRanges(seqnames = chr_df[, 1], ranges = IRanges(chr_df[, 2] + 1, chr_df[, 3]))
library(EnrichedHeatmap)
chr_window = makeWindows(chr_gr, w = 1e6)
chr_window
average_in_window = function(window, gr, v, method = "weighted", empty_v = NA) {
if(missing(v)) v = rep(1, length(gr))
if(is.null(v)) v = rep(1, length(gr))
if(is.atomic(v) && is.vector(v)) v = cbind(v)
v = as.matrix(v)
if(is.character(v) && ncol(v) > 1) {
stop("`v` can only be a character vector.")
}
if(length(empty_v) == 1) {
empty_v = rep(empty_v, ncol(v))
}
u = matrix(rep(empty_v, each = length(window)), nrow = length(window), ncol = ncol(v))
mtch = as.matrix(findOverlaps(window, gr))
intersect = pintersect(window[mtch[,1]], gr[mtch[,2]])
w = width(intersect)
v = v[mtch[,2], , drop = FALSE]
n = nrow(v)
ind_list = split(seq_len(n), mtch[, 1])
window_index = as.numeric(names(ind_list))
window_w = width(window)
if(is.character(v)) {
for(i in seq_along(ind_list)) {
ind = ind_list[[i]]
if(is.function(method)) {
u[window_index[i], ] = method(v[ind], w[ind], window_w[i])
} else {
tb = tapply(w[ind], v[ind], sum)
u[window_index[i], ] = names(tb[which.max(tb)])
}
}
} else {
if(method == "w0") {
gr2 = reduce(gr, min.gapwidth = 0)
mtch2 = as.matrix(findOverlaps(window, gr2))
intersect2 = pintersect(window[mtch2[, 1]], gr2[mtch2[, 2]])
width_intersect = tapply(width(intersect2), mtch2[, 1], sum)
ind = unique(mtch2[, 1])
width_setdiff = width(window[ind]) - width_intersect
w2 = width(window[ind])
for(i in seq_along(ind_list)) {
ind = ind_list[[i]]
x = colSums(v[ind, , drop = FALSE]*w[ind])/sum(w[ind])
u[window_index[i], ] = (x*width_intersect[i] + empty_v*width_setdiff[i])/w2[i]
}
} else if(method == "absolute") {
for(i in seq_along(ind_list)) {
u[window_index[i], ] = colMeans(v[ind_list[[i]], , drop = FALSE])
}
} else if(method == "weighted") {
for(i in seq_along(ind_list)) {
ind = ind_list[[i]]
u[window_index[i], ] = colSums(v[ind, , drop = FALSE]*w[ind])/sum(w[ind])
}
} else {
if(is.function(method)) {
for(i in seq_along(ind_list)) {
ind = ind_list[[i]]
u[window_index[i], ] = method(v[ind], w[ind], window_w[i])
}
} else {
stop("wrong method.")
}
}
}
return(u)
}
bed1 = generateRandomBed(nr = 1000, nc = 10) # generateRandomBed() is from circlize package
# convert to a GRanes object
gr1 = GRanges(seqnames = bed1[, 1], ranges = IRanges(bed1[, 2], bed1[, 3]))
#獲取統(tǒng)一的結(jié)果
num_mat = average_in_window(chr_window, gr1, bed1[, -(1:3)])
#行為genomic ranges它碎, 列為cell
dim(num_mat)
獲得num_mat后,自己根據(jù)cellid作圖即可
參考: https://jokergoo.github.io/ComplexHeatmap-reference/book/genome-level-heatmap.html
我的數(shù)據(jù)得到的圖: