with pysam.AlignmentFile(bamfile_path, "rb",require_index=False) as bamfile:
# Iterate over each read
for read in bamfile:
idx += 1
gene = read.get_tag(gene_tag) if read.has_tag(gene_tag) else None
bc=read.get_tag(bc_tag) if read.has_tag(gene_tag) else None
UMI = read.get_tag(umi_tag) if read.has_tag(umi_tag) else None
for sub_prop in sub_props:
random_num = random.randint(0, 99)
if random_num < sub_prop * 100:
# Check if the tag is present in the read
if (gene is not None) and (bc is not None) and (UMI is not None):
# Add the tag value to the list
if gene not in gene_count_dict[sub_prop]:
gene_count_dict[sub_prop][gene] = {}
gene_count_dict[sub_prop][gene].setdefault(bc, set()).add(UMI)
res_cnt[sub_prop]["transcriptomic"] += 1
這里我們關(guān)注到:
if gene not in gene_count_dict[sub_prop]:
gene_count_dict[sub_prop][gene] = {}
gene_count_dict[sub_prop][gene].setdefault(bc, set()).add(UMI)
這段代碼的功能是:
檢查 gene 是否已經(jīng)在 gene_count_dict[sub_prop] 字典中础芍。如果不在酪惭,為這個(gè)基因創(chuàng)建一個(gè)新的空字典。
然后者甲,使用 .setdefault() 方法檢查 bc(條形碼)是否已經(jīng)作為鍵存在于 gene 的字典中春感。如果不存在,設(shè)置其值為一個(gè)新的空集合虏缸。
最后鲫懒,將 UMI 添加到與 bc 對(duì)應(yīng)的集合中。`