Based on the 生信技能樹 tutorial: https://mp.weixin.qq.com/s/i6_x1yeMbXawfKm36ewnKQ
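This version improves on the linked tutorial, which mixes shell and Python scripts. Here everything runs in a single Python script, which avoids unnecessary errors and runs more efficiently.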
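Note: change the paths to match your setup. The input path is the directory holding cellranger's output for each sample; each sample's subdirectory must contain the `matrix.mtx.gz` and `barcodes.tsv.gz` files.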
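The script below treats every subdirectory of the result folder as a sample and expects `matrix.mtx.gz` and `barcodes.tsv.gz` to sit directly inside it, so a quick pre-check can catch path mistakes before Scrublet starts. This is a minimal sketch; `find_sample_dirs` and `REQUIRED_FILES` are hypothetical names. With cellranger 3 and later these files normally live under `outs/filtered_feature_bc_matrix/` inside each sample's output, so either point the root at folders with that content or adjust the paths accordingly.

```python
import os
from typing import List

# Files the main script reads from each sample folder (assumed to sit
# directly inside the folder, as the script expects)
REQUIRED_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz")


def find_sample_dirs(root: str) -> List[str]:
    """Return the names of sub-directories of `root` that contain all REQUIRED_FILES."""
    samples = []
    for name in sorted(os.listdir(root)):
        sample_dir = os.path.join(root, name)
        if not os.path.isdir(sample_dir):
            continue  # ignore plain files in the result folder
        missing = [f for f in REQUIRED_FILES
                   if not os.path.isfile(os.path.join(sample_dir, f))]
        if missing:
            print(f"skipping {name}: missing {', '.join(missing)}")
        else:
            samples.append(name)
    return samples
```

The returned list can replace `dirlist` in the script, so folders with missing inputs are reported instead of crashing the loop halfway through.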
```python
import os

import pandas as pd
import scipy.io
import scrublet as scr

# Folder holding one cellranger output subdirectory per sample (edit to suit)
path = "/biodata_01_4T/scRNA-seq_raw_data/Esophagus/PRJNA777911/result/"
# Folder where the per-sample Scrublet tables are written (edit to suit)
out_dir = "/biodata_01_4T/scRNA-seq_raw_data/Esophagus/PRJNA777911/scrublet_result/"
os.makedirs(out_dir, exist_ok=True)

# Every subdirectory of `path` is treated as one sample
dirlist = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
print(dirlist)

for i in dirlist:
    input_dir = os.path.join(path, i)
    # cellranger stores genes x cells; Scrublet expects cells x genes, hence .T
    counts_matrix = scipy.io.mmread(os.path.join(input_dir, "matrix.mtx.gz")).T.tocsc()
    out_df = pd.read_csv(os.path.join(input_dir, "barcodes.tsv.gz"),
                         header=None, index_col=None, names=["barcode"])
    scrub = scr.Scrublet(counts_matrix, expected_doublet_rate=0.06)
    doublet_scores, predicted_doublets = scrub.scrub_doublets(
        min_counts=2, min_cells=3, min_gene_variability_pctl=85, n_prin_comps=30)
    # Fraction of cells called as doublets in this sample
    print(i, scrub.detected_doublet_rate_)
    out_df["doublet_scores"] = doublet_scores
    out_df["predicted_doublets"] = predicted_doublets
    out_df.to_csv(os.path.join(out_dir, i + "doublet.txt"), index=False, header=True)
    print(out_df["predicted_doublets"].value_counts())
```
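After the loop finishes, each sample has a `<sample>doublet.txt` table with `barcode`, `doublet_scores` and `predicted_doublets` columns. A minimal sketch for collecting these tables, summarising the doublet rate per sample, and keeping only the predicted singlets follows. It assumes the files were written exactly as in the script (comma-separated, with a header). The helper names `load_scrublet_results`, `summarise` and `singlet_barcodes` are hypothetical.

```python
import glob
import os

import pandas as pd

SUFFIX = "doublet.txt"  # suffix the script appends to each sample name


def load_scrublet_results(result_dir: str) -> pd.DataFrame:
    """Concatenate every '<sample>doublet.txt' table, adding a 'sample' column."""
    frames = []
    for path in sorted(glob.glob(os.path.join(result_dir, "*" + SUFFIX))):
        df = pd.read_csv(path)  # "True"/"False" strings are parsed as booleans
        df["sample"] = os.path.basename(path)[: -len(SUFFIX)]
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no *{SUFFIX} files in {result_dir}")
    return pd.concat(frames, ignore_index=True)


def summarise(results: pd.DataFrame) -> pd.DataFrame:
    """Per-sample number of cells, predicted doublets and doublet fraction."""
    calls = results.groupby("sample")["predicted_doublets"]
    return pd.DataFrame({
        "n_cells": calls.size(),
        "n_doublets": calls.sum(),
        "doublet_rate": calls.mean(),
    })


def singlet_barcodes(results: pd.DataFrame) -> pd.DataFrame:
    """(sample, barcode) pairs that Scrublet did not call as doublets."""
    keep = ~results["predicted_doublets"].astype(bool)
    return results.loc[keep, ["sample", "barcode"]].reset_index(drop=True)
```

The singlet table can then be used to subset each sample's cells before merging, for example after reading it back into R.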
In my comparison, this method's results agreed closely with those of the DoubletFinder R package, and it ran much faster.
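One way to put a number on that agreement is to compare the two sets of calls cell by cell. The sketch below, with the hypothetical helper `call_agreement`, assumes both tools' calls are available as boolean pandas Series indexed by cell barcode (True = doublet). It reports the fraction of shared cells with the same call and the Jaccard index of the two doublet sets; it is not the comparison method the author used.

```python
import pandas as pd


def call_agreement(scrublet_calls: pd.Series, df_calls: pd.Series) -> dict:
    """Compare two boolean doublet calls indexed by cell barcode.

    Only barcodes present in both Series are compared.
    """
    shared = scrublet_calls.index.intersection(df_calls.index)
    a = scrublet_calls.loc[shared].astype(bool)
    b = df_calls.loc[shared].astype(bool)
    both = int((a & b).sum())    # called doublet by both tools
    either = int((a | b).sum())  # called doublet by at least one tool
    return {
        "n_shared_cells": len(shared),
        # fraction of shared cells where the two tools give the same call
        "agreement": float((a == b).mean()) if len(shared) else float("nan"),
        # overlap of the two doublet sets
        "doublet_jaccard": both / either if either else float("nan"),
    }
```

For example, if the DoubletFinder classifications were exported from R as a table with hypothetical columns `barcode` and `classification` (values `"Singlet"`/`"Doublet"`), `df_calls` could be built with `table.set_index("barcode")["classification"].eq("Doublet")` and `scrublet_calls` with `out_df.set_index("barcode")["predicted_doublets"]`. Check that the barcodes match first, because Seurat often adds sample prefixes to cell names when objects are merged.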