CARD 空間轉(zhuǎn)錄組和單細(xì)胞轉(zhuǎn)錄組之間的反卷積

首先附上參考文獻(xiàn):《Spatially informed cell-type deconvolution for spatial transcriptomics》

原理分析

截至目前蚌成,大部分的反卷積求細(xì)胞組分的基本邏輯都是回歸,CARD的基本邏輯同樣采用的是回歸策略:

其中:

  1. B: as the G-by-K cell-type-specific expression matrix for the informative genes(單細(xì)胞表達(dá)矩陣)
  2. X: as the G-by-N gene expression matrix for the same set of informative genes measured on N spatial locations(空轉(zhuǎn)表達(dá)矩陣)
  3. V: as the N-by-K cell-type composition matrix(空間每個(gè)location的細(xì)胞組分矩陣)
  4. E: 服從
  5. 以上矩陣均為非負(fù)矩陣

同時(shí)作者也考慮到了相鄰位置的兩個(gè)location細(xì)胞成分可能會(huì)很相近音羞,因此考慮了矯正歧胁,這里的矯正作者利用了回歸到思想,建立了 VikVjk 之間的線性關(guān)系静秆,表明第 i 個(gè)位置的細(xì)胞組分首臨近位置細(xì)胞組分的影響(第 i 個(gè)位置的細(xì)胞組分為非自由項(xiàng)):

其中:

  1. Vik:represents the proportion of cell type k on the ith location娘侍,即第 i 個(gè)位置 cell type k 的比例
  2. bk: is the kth cell-type-specific intercept that represents the average cell-type composition across locations路媚,類似于回歸問題中的截距,表示 cell type k 在所有位置比例的均值扶认,為一列向量
  3. W: an N-by-N non-negative weight matrix with each element Wij specifying the weight used for inferring the cell-type composition on the ith location based on the cell-type composition information on the jth location请琳,類似于回歸問題中的回歸系數(shù)
  4. ?: is a spatial autocorrelation parameter that determines the strength of the spatial correlation in cell-type composition粱挡,表征為決定系數(shù)
  5. Vjk:kth cell-type compositions on all other locations

而我們的目的就是要推斷出 Vik 矩陣,即每個(gè)location的細(xì)胞組分

統(tǒng)計(jì)推斷部分:
根據(jù)引理:

引理

假設(shè)下面式子中的εik服從正態(tài)分布

εik 代入引理可得到公式1俄精,我們可以得到如下關(guān)系询筏,公式1主要描述的是尋找一個(gè)合適的 Vik 矩陣,使得誤差 εik 盡可能惺邸:
公式1

而整個(gè)統(tǒng)計(jì)推斷將轉(zhuǎn)變?yōu)橐粋€(gè)最優(yōu)化問題嫌套,即尋找一個(gè)合適的 Vik 矩陣,使得 Vik
之間的誤差 εik 盡可能小

將公式1化簡以后:


這里巧用矩陣乘法法則圾旨,將加和(∑)改變成為了兩個(gè)矩陣乘積形式踱讨,其中:



提出公因式后

接下來構(gòu)造似然函數(shù),利用極大似然法求解參數(shù):
首先砍的,作者先定義協(xié)方差矩陣如下:


第二步將所有函數(shù)的似然值相乘構(gòu)造似然函數(shù):( k 為 k 個(gè)cell type痹筛,i 為 i 個(gè)location)
這里的似然函數(shù)要估計(jì) 4 個(gè)參數(shù) V,λk廓鞠,σe2帚稠,bk,并且在這個(gè)似然函數(shù)中要優(yōu)化兩個(gè)回歸:


于是聯(lián)合起來得到的似然函數(shù)床佳,這個(gè)似然函數(shù)的作用是使得誤差 Egi 和 εik 以及σe2在 0 處的似然值最大滋早,λk 感覺像是一個(gè)正則項(xiàng)
公式2


這里求解的是似然函數(shù)的最小值,作者把求解最小值的問題轉(zhuǎn)換為求解最大值:
公式3-1

公式3-2

最后利用極大似然法求出最優(yōu) V矩陣 的參數(shù)解夕土,這里的G代表基因數(shù)目馆衔,N為 location 數(shù)目

代碼分析

首先下載相關(guān)數(shù)據(jù):

  1. sc_count.RData為單細(xì)胞表達(dá)矩陣
  2. sc_meta.RData為單細(xì)胞基本信息
  3. spatial_count.RData為空間轉(zhuǎn)錄組表達(dá)矩陣
    spatial_count
    行代表基因瘟判,列代表位置信息,矩陣元素代表基因在不同 location 的表達(dá)量
  4. spatial_location.RData為空間轉(zhuǎn)錄組位置信息
    spatial_location
    x角溃,y相當(dāng)于二維坐標(biāo)拷获,空間轉(zhuǎn)錄組相當(dāng)于在一個(gè)平面上按照矩形形式標(biāo)記不同的像素點(diǎn),10×10代表平面第10行第10列的像素點(diǎn)减细,每一個(gè)像素點(diǎn)相當(dāng)于一個(gè)小的 bulk-seq
    每一個(gè)小圓點(diǎn)相當(dāng)于一個(gè)像素點(diǎn)匆瓜,稱之為一個(gè) location

示例數(shù)據(jù)給的流程如下:

# 載入數(shù)據(jù)
load("C:/Users/lenovo/Downloads/spatial_count.RData")
load("C:/Users/lenovo/Downloads/spatial_location.RData")
load("C:/Users/lenovo/Downloads/sc_count.RData")
load("C:/Users/lenovo/Downloads/sc_meta.RData")

library(CARD)

CARD_obj = createCARDObject(
  sc_count = sc_count,
  sc_meta = sc_meta,
  spatial_count = spatial_count,
  spatial_location = spatial_location,
  ct.varname = "cellType",
  ct.select = unique(sc_meta$cellType),
  sample.varname = "sampleInfo",
  minCountGene = 100,
  minCountSpot = 5) 

CARD_obj = CARD_deconvolution(CARD_object = CARD_obj)
# 細(xì)胞組分矩陣
CARD_obj@Proportion_CARD

細(xì)胞組分矩陣:CARD_obj@Proportion_CARD

1.解析createCARDObject()函數(shù)

# 載入數(shù)據(jù)
sc_count = sc_count
sc_meta = sc_meta
spatial_count = spatial_count
spatial_location = spatial_location
ct.varname = "cellType"
ct.select = unique(sc_meta$cellType)
sample.varname = "sampleInfo"
minCountGene = 100
minCountSpot = 5

# step 1 對(duì)單細(xì)胞的數(shù)據(jù)進(jìn)行質(zhì)控
sc_countMat  <- sc_count
ct.select <- as.character(ct.select[!is.na(ct.select)])
sc_eset = sc_QC(sc_countMat,sc_meta,ct.varname,ct.select,sample.varname)
#### Check the spatial count dataset
#### QC on spatial dataset
spatial_countMat <- spatial_count
commonGene = intersect(rownames(spatial_countMat),rownames(assays(sc_eset)$counts))

# step2 對(duì)空轉(zhuǎn)的數(shù)據(jù)進(jìn)行過濾
#### QC on spatial dataset
spatial_countMat = spatial_countMat[rowSums(spatial_countMat > 0) > minCountSpot,]
spatial_countMat = spatial_countMat[,(colSums(spatial_countMat) >= minCountGene & colSums(spatial_countMat) <= 1e6)]
spatial_location = spatial_location[rownames(spatial_location) %in% colnames(spatial_countMat),]
spatial_location = spatial_location[match(colnames(spatial_countMat),rownames(spatial_location)),]

object <- new(
  Class = "CARD",
 # 質(zhì)控后的單細(xì)胞不同 cell type 的基因表達(dá)矩陣
  sc_eset = sc_eset,
  spatial_countMat = spatial_countMat,
  spatial_location = spatial_location,
  project = "Deconvolution",
  info_parameters = list(ct.varname = ct.varname,ct.select = ct.select,sample.varname = sample.varname)
)
return(object)


## sc_QC 函數(shù)的作用是過濾一些低表達(dá)的基因和低質(zhì)量的細(xì)胞
sc_QC <- function(counts_in,metaData,ct.varname,ct.select,sample.varname = NULL, min.cells = 0,min.genes = 0){
# Filter based on min.features
    coldf = metaData
    counts = counts_in
    if (min.genes >= 0) {
        nfeatures <- colSums(x = counts )
        counts <- counts[, which(x = nfeatures > min.genes)]
        coldf <- coldf[which(x = nfeatures > min.genes),]
    }
    # filter genes on the number of cells expressing
    if (min.cells >= 0) {
        num.cells <- rowSums(x = counts > 0)
        counts <- counts[which(x = num.cells > min.cells), ]
    }
    fdata = as.data.frame(rownames(counts))
    rownames(fdata) = rownames(counts)
    keepCell = as.character(coldf[,ct.varname]) %in% ct.select
    counts = counts[,keepCell]
    coldf = coldf[keepCell,]
    keepGene = rowSums(counts) > 0
    fdata = as.data.frame(fdata[keepGene,])
    counts = counts[keepGene,]
    sce <- SingleCellExperiment(list(counts=counts),
    colData=as.data.frame(coldf),
    rowData=as.data.frame(fdata))
    return(sce)
}

2.解析CARD_deconvolution()函數(shù)

# 讀取createCARDObject()的結(jié)果文件
CARD_object = CARD_obj

# 獲取不同的 cellType 名稱, ct.select 為不同 cellType 的名稱
ct.select = CARD_object@info_parameters$ct.select
# ct.varname 為字符串 "cellType"
ct.varname = CARD_object@info_parameters$ct.varname
sample.varname = CARD_object@info_parameters$sample.varname

# sc_eset 為單細(xì)胞表達(dá)矩陣, 利用 counts(sc_eset) 查看表達(dá)矩陣
sc_eset = CARD_object@sc_eset

# 對(duì)單細(xì)胞表達(dá)矩陣進(jìn)行標(biāo)準(zhǔn)化
Basis_ref = createscRef(sc_eset, ct.select, ct.varname, sample.varname)

Basis = Basis_ref$basis
Basis = Basis[,colnames(Basis) %in% ct.select]
Basis = Basis[,match(ct.select,colnames(Basis))]
# 獲得空間轉(zhuǎn)錄組表達(dá)矩陣 spatial_count 
spatial_count = CARD_object@spatial_countMat
commonGene = intersect(rownames(spatial_count),rownames(Basis))
#### remove mitochondrial and ribosomal genes
#### 去除 mt DNA
commonGene  = commonGene[!(commonGene %in% commonGene[grep("mt-",commonGene)])]

common = selectInfo(Basis,sc_eset,commonGene,ct.select,ct.varname)
# 空轉(zhuǎn)表達(dá)矩陣 Xinput 
Xinput = spatial_count
# 單細(xì)胞表達(dá)矩陣 B
B = Basis

##### match the common gene names
##### 對(duì)空間表達(dá)矩陣選擇 common 的 gene 進(jìn)行后續(xù)分析
Xinput = Xinput[order(rownames(Xinput)),]
B = B[order(rownames(B)),]
B = B[rownames(B) %in% common,]
Xinput = Xinput[rownames(Xinput) %in% common,]

##### filter out non expressed genes or cells again
##### 對(duì)空間表達(dá)矩陣過濾掉沒有表達(dá)的細(xì)胞和基因
Xinput = Xinput[rowSums(Xinput) > 0,]
Xinput = Xinput[,colSums(Xinput) > 0]

##### normalize count data
##### 對(duì)空轉(zhuǎn)表達(dá)矩陣進(jìn)行標(biāo)準(zhǔn)化
colsumvec = colSums(Xinput)
### 相當(dāng)于每個(gè)基因?qū)ο鄳?yīng)位置的總測序深度做標(biāo)準(zhǔn)化
Xinput_norm = sweep(Xinput,2,colsumvec,"/")
B = B[rownames(B) %in% rownames(Xinput_norm),]    
B = B[match(rownames(Xinput_norm),rownames(B)),]

#### spatial location
#### 獲取空轉(zhuǎn)的位置信息
spatial_location = CARD_object@spatial_location
spatial_location = spatial_location[rownames(spatial_location) %in% colnames(Xinput_norm),]
spatial_location = spatial_location[match(colnames(Xinput_norm),rownames(spatial_location)),]

##### normalize the coordinates without changing the shape and relative position
### 對(duì)空轉(zhuǎn)的位置進(jìn)行標(biāo)準(zhǔn)化,轉(zhuǎn)換為相對(duì)位置
norm_cords = spatial_location[ ,c("x","y")]
norm_cords$x = norm_cords$x - min(norm_cords$x)
norm_cords$y = norm_cords$y - min(norm_cords$y)
scaleFactor = max(norm_cords$x,norm_cords$y)
norm_cords$x = norm_cords$x / scaleFactor
norm_cords$y = norm_cords$y / scaleFactor

##### initialize the proportion matrix
### 計(jì)算空轉(zhuǎn)位置間的歐式距離
ED <- rdist::rdist(as.matrix(norm_cords))##Euclidean distance matrix

set.seed(20200107)
Vint1 = as.matrix(gtools::rdirichlet(ncol(Xinput_norm), rep(10,ncol(B))))
colnames(Vint1) = colnames(B)
rownames(Vint1) = colnames(Xinput_norm)
b = rep(0,length(ct.select))

###### parameters that need to be set
isigma = 0.1 ####construct Gaussian kernel with the default scale /length parameter to be 0.1
epsilon = 1e-04  #### convergence epsion 
phi = c(0.01,0.1,0.3,0.5,0.7,0.9,0.99) #### grided values for phi

## 隨機(jī)生成 W 矩陣
kernel_mat <- exp(-ED^2 / (2 * isigma^2))
diag(kernel_mat) <- 0

###### scale the Xinput_norm and B to speed up the convergence. 
mean_X = mean(Xinput_norm)
mean_B = mean(B)
Xinput_norm = Xinput_norm * 1e-01 / mean_X
B = B * 1e-01 / mean_B
ResList = list()
Obj = c()
## 利用不同的參數(shù) phi 來估計(jì)模型
for(iphi in 1:length(phi)){
  res = CARDref(
    XinputIn = as.matrix(Xinput_norm),
    UIn = as.matrix(B),
    WIn = kernel_mat, 
    phiIn = phi[iphi],
    max_iterIn =1000,
    epsilonIn = epsilon,
    initV = Vint1,
    initb = rep(0,ncol(B)),
    initSigma_e2 = 0.1, 
    initLambda = rep(10,length(ct.select)))
  rownames(res$V) = colnames(Xinput_norm)
  colnames(res$V) = colnames(B)
  ResList[[iphi]] = res
  Obj = c(Obj,res$Obj)
}

## 選擇最優(yōu)的參數(shù)下的模型
Optimal = which(Obj == max(Obj))
Optimal = Optimal[length(Optimal)] #### just in case if there are two equal objective function values
OptimalPhi = phi[Optimal]
OptimalRes = ResList[[Optimal]]
cat(paste0("## Deconvolution Finish! ...\n"))
CARD_object@info_parameters$phi = OptimalPhi

### 獲得細(xì)胞組分矩陣 Proportion_CARD
CARD_object@Proportion_CARD = sweep(OptimalRes$V,1,rowSums(OptimalRes$V),"/")
CARD_object@algorithm_matrix = list(B = B * mean_B / 1e-01, Xinput_norm = Xinput_norm * mean_X / 1e-01, Res = OptimalRes)
CARD_object@spatial_location = spatial_location


################################## 其中 createscRef() 函數(shù)
# 讀取數(shù)據(jù)
x = sc_eset # 單細(xì)胞表達(dá)矩陣
ct.select = ct.select # 獲取不同的 cellType 名稱, ct.select 為不同 cellType 的名稱
ct.varname = ct.varname # ct.varname 為字符串 "cellType"
sample.varname = sample.varname

# 其中 createscRef() 函數(shù)
createscRef <- function(x, ct.select = NULL, ct.varname, sample.varname = NULL){
  library(MuSiC)
  if (is.null(ct.select)) {
    ct.select <- unique(colData(x)[, ct.varname])
  }
  # 去除 cellType 為 NA 的 cell Type
  ct.select <- ct.select[!is.na(ct.select)]
  # countMat <- as.matrix(assays(x)$counts)
  # 將單細(xì)胞表達(dá)矩陣取出來賦予 countMat 
  countMat <- as(SummarizedExperiment::assays(x)$counts,"sparseMatrix")
  # ct.id 的作用相當(dāng)于將每個(gè) cell 賦予對(duì)應(yīng)的 cell type
  ct.id <- droplevels(as.factor(SummarizedExperiment::colData(x)[, ct.varname]))
  #if(length(unique(colData(x)[,sample.varname])) > 1){
  if(is.null(sample.varname)){
    SummarizedExperiment::colData(x)$sampleID = "Sample"
    sample.varname = "sampleID"
  }
  sample.id <- as.character(SummarizedExperiment::colData(x)[, sample.varname])
  ct_sample.id <- paste(ct.id, sample.id, sep = "$*$")
  colSums_countMat <- colSums(countMat)
  colSums_countMat_Ct = aggregate(colSums_countMat ~ ct.id + sample.id, FUN = 'sum')
  colSums_countMat_Ct_wide = reshape(colSums_countMat_Ct, idvar = "sample.id", timevar = "ct.id", direction = "wide")
  colnames(colSums_countMat_Ct_wide) = gsub("colSums_countMat.","",colnames(colSums_countMat_Ct_wide))
  rownames(colSums_countMat_Ct_wide) = colSums_countMat_Ct_wide$sample.id
  colSums_countMat_Ct_wide$sample.id <- NULL
  tbl <- table(sample.id,ct.id)
  colSums_countMat_Ct_wide = colSums_countMat_Ct_wide[,match(colnames(tbl),colnames(colSums_countMat_Ct_wide))]
  colSums_countMat_Ct_wide = colSums_countMat_Ct_wide[match(rownames(tbl),rownames(colSums_countMat_Ct_wide)),]
  S_JK <- colSums_countMat_Ct_wide / tbl
  S_JK <- as.matrix(S_JK)
  S_JK[S_JK == 0] = NA
  S_JK[!is.finite(S_JK)] = NA
  S = colMeans(S_JK, na.rm = TRUE)
  S = S[match(unique(ct.id),names(S))]
  library("wrMisc")
  if(nrow(countMat) > 10000 & ncol(countMat) > 50000){ ### to save memory 
    seqID = seq(1,nrow(countMat),by = 10000)
    Theta_S_rowMean = NULL
    for(igs in seqID){
      if(igs != seqID[length(seqID)]){
        Theta_S_rowMean_Tmp <- rowGrpMeans(as.matrix(countMat[(igs:(igs+9999)),]), grp = ct_sample.id, na.rm = TRUE)
      }else{
        Theta_S_rowMean_Tmp <- rowGrpMeans(as.matrix(countMat[igs:nrow(countMat),]), grp = ct_sample.id, na.rm = TRUE)
        
      }
      Theta_S_rowMean <- rbind(Theta_S_rowMean,Theta_S_rowMean_Tmp)
      
    }
  }else{
    Theta_S_rowMean <- rowGrpMeans(as.matrix(countMat), grp = ct_sample.id, na.rm = TRUE)
  }
  tbl_sample = table(ct_sample.id)
  tbl_sample = tbl_sample[match(colnames(Theta_S_rowMean),names(tbl_sample))]
  Theta_S_rowSums <- sweep(Theta_S_rowMean,2,tbl_sample,"*")
  Theta_S <- sweep(Theta_S_rowSums,2,colSums(Theta_S_rowSums),"/")
  grp <- sapply(strsplit(colnames(Theta_S),split="$*$",fixed = TRUE),"[",1)
  Theta = rowGrpMeans(Theta_S, grp = grp, na.rm = TRUE)
  Theta = Theta[,match(unique(ct.id),colnames(Theta))]
  S = S[match(colnames(Theta),names(S))]
  basis = sweep(Theta,2,S,"*")
  colnames(basis) = colnames(Theta)
  rownames(basis) = rownames(Theta)
  return(list(basis = basis))
}



################################## 其中 selectInfo() 函數(shù)
selectInfo <- function(Basis,sc_eset,commonGene,ct.select,ct.varname){
#### log2 mean fold change >0.5
gene1 = lapply(ct.select,function(ict){
rest = rowMeans(Basis[,colnames(Basis) != ict])
FC = log((Basis[,ict] + 1e-06)) - log((rest + 1e-06))
rownames(Basis)[FC > 1.25 & Basis[,ict] > 0]
})
gene1 = unique(unlist(gene1))
gene1 = intersect(gene1,commonGene)
counts = assays(sc_eset)$counts
counts = counts[rownames(counts) %in% gene1,]
##### only check the cell type that contains at least 2 cells
ct.select = names(table(colData(sc_eset)[,ct.varname]))[table(colData(sc_eset)[,ct.varname]) > 1]
sd_within = sapply(ct.select,function(ict){
  temp = counts[,colData(sc_eset)[,ct.varname] == ict]
  apply(temp,1,var) / apply(temp,1,mean)
  })
##### remove the outliers that have high dispersion across cell types
gene2 = rownames(sd_within)[apply(sd_within,1,mean,na.rm = T) < quantile(apply(sd_within,1,mean,na.rm = T),prob = 0.99,na.rm = T)]
return(gene2)
}

關(guān)于 CARD_deconvolution()中的變量

  1. 有關(guān) Basis_ref$basis
    Basis_ref$basis
  2. 有關(guān) spatial_count
    spatial_count
  3. 有關(guān)標(biāo)準(zhǔn)化后的位置信息 norm_cords:
    norm_cords

關(guān)于 createscRef()中的變量

  1. 有關(guān) ct.id:
    ct.id
  2. 有關(guān) ct.select:
    ct.select

3. Cpp 函數(shù) CARDref()

#include <iostream>
#include <fstream>
#define ARMA_64BIT_WORD 1
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

#include <R.h>
#include <Rmath.h>
#include <cmath>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
#include <ctime>
#include <Rcpp.h>

// Enable C++11 via this plugin (Rcpp 0.10.3 or later)
// [[Rcpp::plugins(cpp11)]]

using namespace std;
using namespace arma;
using namespace Rcpp;

#define ARMA_DONT_PRINT_ERRORS


//*******************************************************************//
//              spatially informed deconvolution:CARD                        //
//*******************************************************************//
//' SpatialDeconv function based on Conditional Autoregressive model
//' @param XinputIn The input of normalized spatial data
//' @param UIn The input of cell type specific basis matrix B
//' @param WIn The constructed W weight matrix from Gaussian kernel
//' @param phiIn The phi value
//' @param max_iterIn Maximum iterations
//' @param epsilonIn epsilon for convergence 
//' @param initV Initial matrix of cell type compositions V
//' @param initb Initial vector of cell type specific intercept
//' @param initSigma_e2 Initial value of residual variance
//' @param initLambda Initial vector of cell type sepcific scalar. 
//'
//' @return A list
//'
//' @export
// [[Rcpp::export]]
SEXP CARDref(SEXP XinputIn, SEXP UIn, SEXP WIn, SEXP phiIn, SEXP max_iterIn, SEXP epsilonIn, SEXP initV, SEXP initb, SEXP initSigma_e2, SEXP initLambda)
{    
    try {
        // read in the data
        arma::mat Xinput = as<mat>(XinputIn);
        arma::mat U = as<mat>(UIn);
        arma::mat W = as<mat>(WIn);
        double phi = as<double>(phiIn);
        int max_iter = Rcpp::as<int>(max_iterIn);
        double epsilon = as<double>(epsilonIn);
        arma::mat V = as<mat>(initV);
        arma::vec b = as<vec>(initb);
        double sigma_e2 = as<double>(initSigma_e2);
        arma::vec lambda = as<vec>(initLambda);
        // initialize some useful items
        int nSample = (int)Xinput.n_cols; // number of spatial sample points
        int mGene = (int)Xinput.n_rows; // number of genes in spatial deconvolution
        int k = (int)U.n_cols; // number of cell type
        arma::mat L = zeros<mat>(nSample,nSample);
        arma::mat D = zeros<mat>(nSample,nSample);
        arma::mat V_old = zeros<mat>(nSample,k);
        arma::mat UtU = zeros<mat>(k,k);
        arma::mat VtV = zeros<mat>(k,k);
        arma::vec colsum_W = zeros<vec>(nSample);
        arma::mat UtX = zeros<mat>(k,nSample);
        arma::mat XtU = zeros<mat>(nSample,k);
        arma::mat UtXV = zeros<mat>(k,k);
        arma::mat temp = zeros<mat>(k,k);
        arma::mat part1 = zeros<mat>(nSample,k);
        arma::mat part2 = zeros<mat>(nSample,k);
        arma::vec updateV_k = zeros<vec>(k);
        arma::vec updateV_den_k = zeros<vec>(k);
        arma::vec vecOne = ones<vec>( nSample);
        arma::vec diag_UtU = zeros<vec>(k);
        bool logicalLogL = FALSE;
        double obj = 0;
        double obj_old = 0;
        double normNMF = 0;
        double logX = 0;
        double logV = 0;
        double alpha = 1.0;
        double beta = nSample / 2.0;
        double logSigmaL2 = 0.0;
        double accu_L = 0.0;
        double trac_xxt = accu(Xinput % Xinput);
        
        // initialize values
        // constant matrix caculations for increasing speed 
        UtX = U.t() * Xinput;
        XtU = UtX.t();
        colsum_W = sum(W,1);
        D =  diagmat(colsum_W);// diagnol matrix whose entries are column
        L = D -  phi*W; // graph laplacian
        accu_L = accu(L);
        UtXV = UtX * V;
        VtV = V.t() * V;
        UtU = U.t() * U;
        diag_UtU = UtU.diag();
        // calculate initial objective function 
        normNMF = trac_xxt - 2.0 * trace(UtXV) + trace(UtU * VtV);
        logX = -(double)(mGene * nSample) * 0.5 * log(sigma_e2) - 0.5 * (double)(normNMF / sigma_e2);
        temp = (V.t() - b * vecOne.t()) * L * (V - vecOne * b.t());
        logV = - (double)(nSample) * 0.5 * sum(log(lambda )) - 0.5 * (sum(temp.diag() / lambda )); 
        logSigmaL2 = -(alpha + 1.0) * sum(log(lambda)) - sum(beta / lambda);
        obj_old = logX + logV + logSigmaL2;
        V_old = V;
        // iteration starts
        for(int i = 1; i <= max_iter; ++i) {
            logV = 0.0;  
            b = sum(V.t() * L, 1) / accu_L;
            lambda = (temp.diag() / 2.0 + beta ) / (double(nSample) / 2.0 + alpha + 1.0);  
            part1 = sigma_e2 * (D * V + phi * colsum_W * b.t());
            part2 = sigma_e2 * (phi * W * V + colsum_W * b.t());
            for(int nCT = 0; nCT < k; ++nCT){
                updateV_den_k = lambda(nCT) * (V.col(nCT) * diag_UtU(nCT) + (V * UtU.col(nCT) - V.col(nCT) * diag_UtU(nCT))) +  part1.col(nCT);
                updateV_k = (lambda(nCT) * XtU.col(nCT) + part2.col(nCT)) / updateV_den_k;
                V.col(nCT) %= updateV_k;
            }
            UtXV = UtX * V;
            VtV = V.t() * V;
            normNMF = trac_xxt - 2.0 * trace(UtXV) + trace(UtU * VtV);
            sigma_e2 = normNMF / (double)(mGene * nSample);
            temp = (V.t() - b * vecOne.t()) * L * (V - vecOne * b.t());
            logX = -(double)(nSample * mGene) * 0.5 * log(sigma_e2) - 0.5 * (double)(normNMF / sigma_e2);
            logV = -(double)(nSample) * 0.5 * sum(log(lambda))- 0.5 * (sum(temp.diag() / lambda )); 
            logSigmaL2 = -(alpha + 1.0) * sum(log(lambda)) - sum(beta / lambda);
            obj = logX + logV + logSigmaL2;
            logicalLogL = (obj > obj_old) && (abs(obj - obj_old) * 2.0 / abs(obj + obj_old) < epsilon);
            if(isnan(obj) || (sqrt(accu((V - V_old) % (V - V_old)) / double(nSample * k))  < epsilon) || logicalLogL){
               if(i > 5){ // run at least 5 iterations 
                   break;
               }
       }else{
            obj_old = obj;
            V_old = V;
         }
       }
       return List::create(Named("V") = V,
                           Named("sigma_e2") = sigma_e2,
                           Named("lambda") = lambda,
                           Named("b") = b,
                           Named("Obj") = obj);
        }//end try 
        catch (std::exception &ex)
        {
            forward_exception_to_r(ex);
        }
        catch (...)
        {
            ::Rf_error("C++ exception (unknown reason)...");
        }
        return R_NilValue;
} // end funcs

Cpp 中的變量解釋:

  1. trac_xxt = accu(Xinput % Xinput);,與公式3-1的:
  2. 2.0 * trace(UtXV)
# 根據(jù)矩陣乘法的性質(zhì), 體現(xiàn)出加和的形式
UtX = U.t() * Xinput;
UtXV = UtX * V;
trace(UtXV);

UtX = U.t() * Xinput; UtXV = UtX * V;,代表 BTXV

  1. trace(UtU * VtV)
# 根據(jù)矩陣乘法的性質(zhì), 體現(xiàn)出加和的形式
VtV = V.t() * V;
UtU = U.t() * U; 
trace(UtU * VtV);

UtU = U.t() * U;未蝌,代表 BTB驮吱;VtV = V.t() * V;,代表 VTV

其中:

  1. 初始化各項(xiàng)矩陣:
UtX = U.t() * Xinput;
XtU = UtX.t();
colsum_W = sum(W,1);
D =  diagmat(colsum_W);// diagnol matrix whose entries are column
L = D -  phi*W; // graph laplacian
accu_L = accu(L);
UtXV = UtX * V;
VtV = V.t() * V;
UtU = U.t() * U;
diag_UtU = UtU.diag();
  1. 構(gòu)造似然函數(shù):
normNMF = trac_xxt - 2.0 * trace(UtXV) + trace(UtU * VtV);
logX = -(double)(mGene * nSample) * 0.5 * log(sigma_e2) - 0.5 * (double)(normNMF / sigma_e2);
temp = (V.t() - b * vecOne.t()) * L * (V - vecOne * b.t());
logV = -(double)(nSample) * 0.5 * sum(log(lambda )) - 0.5 * (sum(temp.diag() / lambda )); 
logSigmaL2 = -(alpha + 1.0) * sum(log(lambda)) - sum(beta / lambda);
obj_old = logX + logV + logSigmaL2;
V_old = V;
  1. normNMF = trac_xxt - 2.0 * trace(UtXV) + trace(UtU * VtV);代表公式3-1中的:
  2. logX = -(double)(mGene * nSample) * 0.5 * log(sigma_e2) - 0.5 * (double)(normNMF / sigma_e2);代表公式3-1中的(不清楚為什么這里的符號(hào)是反的):
  3. temp = (V.t() - b * vecOne.t()) * L * (V - vecOne * b.t());代表公式3-1中的:
  4. logV = - (double)(nSample) * 0.5 * sum(log(lambda )) - 0.5 * (sum(temp.diag() / lambda ));代表公式3-1中的:
  5. logSigmaL2 = -(alpha + 1.0) * sum(log(lambda)) - sum(beta / lambda);代表公式3-1中的:
  6. obj_old = logX + logV + logSigmaL2;將他們加和
  7. 迭代終止條件:
if(isnan(obj) || (sqrt(accu((V - V_old) % (V - V_old)) / double(nSample * k))  < epsilon) || logicalLogL){
     if(i > 5){ // run at least 5 iterations 
     break;
}

滿足下式小于定義的epsilon即可
  1. 每次迭代更新 V 矩陣:
for(int nCT = 0; nCT < k; ++nCT){
     // nCT 相當(dāng)于cell type k萧吠,V.col(nCT) 代表提取 V 矩陣的第 nCT 列
     updateV_den_k = lambda(nCT) * (V.col(nCT) * diag_UtU(nCT) + (V * UtU.col(nCT) - V.col(nCT) * diag_UtU(nCT))) +  part1.col(nCT);
     // 計(jì)算每一列的 updateV_k 
     updateV_k = (lambda(nCT) * XtU.col(nCT) + part2.col(nCT)) / updateV_den_k;
     // V 矩陣的每一列除以 updateV_k左冬,從而更新 V 矩陣
     V.col(nCT) %= updateV_k;
}

然后基于更新后的 V 矩陣更新 lambda :

temp = (V.t() - b * vecOne.t()) * L * (V - vecOne * b.t());
lambda = (temp.diag() / 2.0 + beta ) / (double(nSample) / 2.0 + alpha + 1.0);  

然后基于更新后的 lambda 在更新 V 矩陣:

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市纸型,隨后出現(xiàn)的幾起案子拇砰,更是在濱河造成了極大的恐慌,老刑警劉巖狰腌,帶你破解...
    沈念sama閱讀 217,277評(píng)論 6 503
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件除破,死亡現(xiàn)場離奇詭異,居然都是意外死亡琼腔,警方通過查閱死者的電腦和手機(jī)瑰枫,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,689評(píng)論 3 393
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來丹莲,“玉大人光坝,你說我怎么就攤上這事∩模” “怎么了教馆?”我有些...
    開封第一講書人閱讀 163,624評(píng)論 0 353
  • 文/不壞的土叔 我叫張陵,是天一觀的道長擂达。 經(jīng)常有香客問我土铺,道長,這世上最難降的妖魔是什么板鬓? 我笑而不...
    開封第一講書人閱讀 58,356評(píng)論 1 293
  • 正文 為了忘掉前任悲敷,我火速辦了婚禮,結(jié)果婚禮上俭令,老公的妹妹穿的比我還像新娘后德。我一直安慰自己,他們只是感情好抄腔,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,402評(píng)論 6 392
  • 文/花漫 我一把揭開白布瓢湃。 她就那樣靜靜地躺著理张,像睡著了一般。 火紅的嫁衣襯著肌膚如雪绵患。 梳的紋絲不亂的頭發(fā)上雾叭,一...
    開封第一講書人閱讀 51,292評(píng)論 1 301
  • 那天,我揣著相機(jī)與錄音落蝙,去河邊找鬼织狐。 笑死,一個(gè)胖子當(dāng)著我的面吹牛筏勒,可吹牛的內(nèi)容都是我干的移迫。 我是一名探鬼主播,決...
    沈念sama閱讀 40,135評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼管行,長吁一口氣:“原來是場噩夢啊……” “哼厨埋!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起捐顷,我...
    開封第一講書人閱讀 38,992評(píng)論 0 275
  • 序言:老撾萬榮一對(duì)情侶失蹤揽咕,失蹤者是張志新(化名)和其女友劉穎,沒想到半個(gè)月后套菜,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,429評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡设易,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,636評(píng)論 3 334
  • 正文 我和宋清朗相戀三年逗柴,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片顿肺。...
    茶點(diǎn)故事閱讀 39,785評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡戏溺,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出屠尊,到底是詐尸還是另有隱情旷祸,我是刑警寧澤,帶...
    沈念sama閱讀 35,492評(píng)論 5 345
  • 正文 年R本政府宣布讼昆,位于F島的核電站托享,受9級(jí)特大地震影響,放射性物質(zhì)發(fā)生泄漏浸赫。R本人自食惡果不足惜闰围,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,092評(píng)論 3 328
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望既峡。 院中可真熱鬧羡榴,春花似錦、人聲如沸运敢。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,723評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至迄沫,卻和暖如春稻扬,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背邢滑。 一陣腳步聲響...
    開封第一講書人閱讀 32,858評(píng)論 1 269
  • 我被黑心中介騙來泰國打工腐螟, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人困后。 一個(gè)月前我還...
    沈念sama閱讀 47,891評(píng)論 2 370
  • 正文 我出身青樓乐纸,卻偏偏與公主長得像,于是被迫代替她去往敵國和親摇予。 傳聞我的和親對(duì)象是個(gè)殘疾皇子汽绢,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,713評(píng)論 2 354

推薦閱讀更多精彩內(nèi)容