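Contents
- Get a dataset's meta information from barcode names, or batch-modify the var or obs objects
- Check for duplicated gene names, or whether a gene exists
- Export AnnData to csv
- Delete specific genes
- Get the raw adata whose matrix has not been scaled or normalized
- Get barcode indices from obs information (clusters, etc.)
- Subset adata by index
- Check whether certain genes are expressed in the same cell
- Delete a column from obs
- Combine multiple adata objects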
1. Get a dataset's meta information from barcode names
# Get the barcode names
obs_name_list = adata.obs_names.to_list()
obs_name_list
OUTS:
['AAATGCCCAATCTGCA_Ileum-1_Enterocyte',
'AACTCTTGTCTAGTCA_Ileum-1_Enterocyte',
'AAGACCTCACGGACAA_Ileum-1_Enterocyte',
'AAGCCGCGTCTTGCGG_Ileum-1_Enterocyte',
'AAGTCTGGTTGTCTTT_Ileum-1_Enterocyte',
......
]
# Choose the delimiter according to how the barcode names are built, then split each name
obs_name_list = [i.split("_") for i in obs_name_list]
obs_name_list
OUTS:
[['AAATGCCCAATCTGCA', 'Ileum-1', 'Enterocyte'],
['AACTCTTGTCTAGTCA', 'Ileum-1', 'Enterocyte'],
['AAGACCTCACGGACAA', 'Ileum-1', 'Enterocyte'],
['AAGCCGCGTCTTGCGG', 'Ileum-1', 'Enterocyte'],
['AAGTCTGGTTGTCTTT', 'Ileum-1', 'Enterocyte'],
['ACGAGGATCGGCCGAT', 'Ileum-1', 'Enterocyte'],
['ACGGCCAGTCTAAACC', 'Ileum-1', 'Enterocyte'],
['AGGCCGTTCGAGCCCA', 'Ileum-1', 'Enterocyte'],
['AGTGAGGGTCGGCTCA', 'Ileum-1', 'Enterocyte'],
['ATGAGGGAGGATATAC', 'Ileum-1', 'Enterocyte'],
['ATGAGGGCAAGGTTCT', 'Ileum-1', 'Enterocyte'],
['ATGAGGGTCTGCCAGG', 'Ileum-1', 'Enterocyte'],
['ATTGGACGTTGAGTTC', 'Ileum-1', 'Enterocyte'],
......
]
# Extract the batch information
batch_name_list = []
for i in obs_name_list:
    # The second element of each barcode (i[1]) holds the organ and its sample number, i.e. the batch
    j = i[1]
    batch_name_list.append(j)
# Store the batch list; it must have exactly one entry per barcode
adata.obs['batch'] = batch_name_list
adata.obs
OUTS:
batch
AAATGCCCAATCTGCA_Ileum-1_Enterocyte Ileum-1
AACTCTTGTCTAGTCA_Ileum-1_Enterocyte Ileum-1
AAGACCTCACGGACAA_Ileum-1_Enterocyte Ileum-1
AAGCCGCGTCTTGCGG_Ileum-1_Enterocyte Ileum-1
AAGTCTGGTTGTCTTT_Ileum-1_Enterocyte Ileum-1
... ...
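The loop above can also be written without an explicit loop. Since `adata.obs_names` is a pandas Index, pandas' `.str` accessor can split every name at once. A minimal sketch, assuming the same "barcode_batch_celltype" naming pattern; the barcodes below are a hypothetical toy list, not real data:

```python
import pandas as pd

# Hypothetical barcode names following the "barcode_batch_celltype" pattern shown above.
# In AnnData, adata.obs_names is a pandas Index, so the same .str calls apply to it directly.
obs_names = pd.Index([
    "AAATGCCCAATCTGCA_Ileum-1_Enterocyte",
    "AACTCTTGTCTAGTCA_Ileum-1_Enterocyte",
    "AAGACCTCACGGACAA_Ileum-2_Goblet",
])

# Split every name on "_" once, then pick fields by position
parts = obs_names.str.split("_")
batch = parts.str[1].to_list()      # second field: organ + sample number
cell_type = parts.str[2].to_list()  # third field: annotated cell type

print(batch)      # ['Ileum-1', 'Ileum-1', 'Ileum-2']
print(cell_type)  # ['Enterocyte', 'Enterocyte', 'Goblet']
```

On a real object this would read `adata.obs['batch'] = adata.obs_names.str.split('_').str[1].to_list()`; converting to a list keeps the assignment positional, so it matches the barcodes one to one.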
2. Check whether a gene exists in the dataset, or whether gene names are duplicated
(1) Check for duplicated gene names
This came up while integrating public datasets from several different papers: concatenation failed with the error "Reindexing only valid with uniquely valued Index objects". The error means that obs_names (barcodes) or var_names (genes) contain duplicated entries, so the objects cannot be aligned during integration.
adata = adata_1.concatenate([adata_2,adata_3,adata_4],join='outer')
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-3-17e328e1a874> in <module>
----> 1 adata = adata_1.concatenate([adata_2,adata_3,adata_4],join='outer')
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
1705 fill_value=fill_value,
1706 index_unique=index_unique,
-> 1707 pairwise=False,
1708 )
1709
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
800 )
801 reindexers = [
--> 802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
804
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/anndata/_core/merge.py in <listcomp>(.0)
800 )
801 reindexers = [
--> 802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
804
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
393 [1., 0., 0.]], dtype=float32)
394 """
--> 395 return Reindexer(cur_var, new_var)
396
397
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
265 self.no_change = new_idx.equals(old_idx)
266
--> 267 new_pos = new_idx.get_indexer(old_idx)
268 old_pos = np.arange(len(new_pos))
269
/mnt/f/Linux/anaconda/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
2985 if not self.is_unique:
2986 raise InvalidIndexError(
-> 2987 "Reindexing only valid with uniquely valued Index objects"
2988 )
2989
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
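The traceback bottoms out in pandas `Index.get_indexer`, which refuses to look up positions in an index that is not unique. The sketch below reproduces that behaviour with plain pandas Index objects (hypothetical gene lists standing in for `var_names`), which makes the cause easier to see than the full `concatenate` call:

```python
import pandas as pd

def reindex_error(index: pd.Index, target: pd.Index):
    """Return the message pandas raises when aligning `target` onto `index`, or None if it succeeds."""
    try:
        index.get_indexer(target)
    except pd.errors.InvalidIndexError as err:
        return str(err)
    return None

unique_genes = pd.Index(["TPH1", "NAA38", "CD3E"])
dup_genes = pd.Index(["TPH1", "NAA38", "NAA38"])  # NAA38 appears twice

print(reindex_error(unique_genes, dup_genes))  # None: the index being searched is unique
print(reindex_error(dup_genes, unique_genes))  # Reindexing only valid with uniquely valued Index objects
```

Only the index being searched has to be unique; duplicates in the target are fine. That is why a single dataset with a repeated gene is enough to break the whole concatenation.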
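- First, try concatenating different combinations of the adata objects to find the one with duplicates. Concatenation failed whenever adata_3 was included; all other adata objects combined without problems.
- Next, rule out duplicated barcodes by renaming the barcodes of adata_3. During renaming, AnnData warned that barcodes should be strings rather than numeric values. The renaming method is as follows: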
# Get the list of barcodes so we can loop over it
obs_name_list = adata_3.obs_names.to_list()
n = 'aaa'
m = 0
re_name_list = []
for i in obs_name_list:
    # Build a new string name: 'aaa0', 'aaa1', ...
    j = n + '%d' % m
    m = m + 1
    re_name_list.append(j)
adata_3.obs_names = re_name_list
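Replacing every barcode with `aaa0`, `aaa1`, ... works, but it throws away the original barcodes. A gentler option is to rename only the repeats. AnnData provides `obs_names_make_unique()` and `var_names_make_unique()` for this; the pandas sketch below only illustrates the idea (suffixing repeats with -1, -2, ...) and is not AnnData's own implementation. The helper name `make_unique` and the toy names are hypothetical:

```python
import pandas as pd

def make_unique(names):
    """Append -1, -2, ... to repeated names, keeping the first occurrence unchanged.

    A minimal sketch: it does not check whether a generated name such as
    "cellA-1" collides with a name that already exists in the list.
    """
    s = pd.Series(list(names), dtype=object)
    # cumcount numbers each occurrence within its group: 0 for the first, 1 for the second, ...
    occurrence = s.groupby(s).cumcount()
    renamed = s.where(occurrence == 0, s + "-" + occurrence.astype(str))
    return renamed.to_list()

print(make_unique(["cellA", "cellA", "cellB", "cellA"]))
# ['cellA', 'cellA-1', 'cellB', 'cellA-2']
```

On a real object, `adata_3.obs_names_make_unique()` renames in place and keeps the original barcodes recognisable.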
After renaming guaranteed unique barcodes, concatenation still failed with the same error, which means the problem lies in var_names. Next, look for duplicated genes.
# Find duplicated gene names: one duplicated name turns up
adata_3.var[adata_3.var.index.duplicated()]
OUTS:
n_cells
NAA38 1522
# Locate the duplicated gene: it appears twice
adata_3.var.loc['NAA38']
OUTS:
n_cells
NAA38 523
NAA38 1522
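Note that `duplicated()` as used above flags only the second and later copies, which is why the first lookup showed a single NAA38 row. Passing `keep=False` flags every copy at once. A minimal sketch on a toy var table; the NAA38 counts mirror the output above, while the TPH1 row is hypothetical:

```python
import pandas as pd

# Toy var table with a repeated gene name in the index (TPH1 row is hypothetical)
var = pd.DataFrame(
    {"n_cells": [523, 310, 1522]},
    index=["NAA38", "TPH1", "NAA38"],
)

# duplicated() marks only the 2nd, 3rd, ... occurrence, so the first copy is hidden
print(var[var.index.duplicated()])

# duplicated(keep=False) marks every copy of a repeated name
all_copies = var[var.index.duplicated(keep=False)]
print(all_copies)

# The set of names that occur more than once
dup_names = var.index[var.index.duplicated()].unique().to_list()
print(dup_names)  # ['NAA38']
```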
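# Delete the duplicated gene (only after confirming it matters little to the study and is not a marker of any particular cell population)
# Drop NAA38 with an exact-match boolean mask: it removes both copies, leaves genes whose names merely start with 'NAA38' alone,
# and does not require var_names to be unique
adata_3 = adata_3[:, adata_3.var_names != 'NAA38'].copy()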
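Testing the integration again, the datasets now concatenate successfully.
(2) Check whether gene 'obj' exists; this can also reveal whether the gene name is duplicated
The list.count() method counts how many times an element appears in a list. Its basic syntax is:
# listname is the list, obj is the element to count
# If count() returns 0, the element is not in the list, so count() can also be used to check whether an element exists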
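Rather than locating the problem by trial concatenation, each object's `obs_names` and `var_names` can be checked directly before merging. A minimal sketch; the helper `report_duplicates` and the gene lists are hypothetical stand-ins for `adata_1.var_names`, `adata_2.var_names`, and so on:

```python
import pandas as pd

def report_duplicates(indexes):
    """Map each dataset name to the labels repeated in its index; datasets without repeats are omitted."""
    report = {}
    for name, idx in indexes.items():
        idx = pd.Index(idx)
        repeated = idx[idx.duplicated()].unique().to_list()
        if repeated:
            report[name] = repeated
    return report

# Hypothetical gene lists standing in for the var_names of each AnnData object
var_names = {
    "adata_1": ["TPH1", "CD3E", "NAA38"],
    "adata_2": ["TPH1", "CD3E"],
    "adata_3": ["TPH1", "NAA38", "NAA38"],
}
print(report_duplicates(var_names))  # {'adata_3': ['NAA38']}
```

With real objects, pass `{"adata_3": adata_3.var_names, ...}` (and the same for `obs_names`); both are pandas Index objects, so they go through `pd.Index` unchanged.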
listname.count(obj)
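A quick illustration on a plain list (hypothetical gene names). When many genes need checking, calling `count()` once per gene rescans the list every time; `collections.Counter` tallies all names in a single pass, so it is shown alongside as an alternative:

```python
from collections import Counter

genes = ["TPH1", "NAA38", "CD3E", "NAA38"]  # hypothetical gene list

print(genes.count("NAA38"))  # 2 -> present and duplicated
print(genes.count("GCG"))    # 0 -> absent

# Counter tallies every name in one pass; missing keys count as 0
counts = Counter(genes)
print(counts["TPH1"], counts["GCG"])           # 1 0
print([g for g, c in counts.items() if c > 1])  # ['NAA38']
```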
Applied to adata:
# Test TPH1: a return value of 1 means the gene exists and exactly one gene has this name
# A return value greater than 1 means the gene name is duplicated
adata.var_names.to_list().count('TPH1')
OUTS:
1
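Converting `var_names` to a list first is not strictly necessary: `adata.var_names` is a pandas Index, which supports `in`, element-wise comparison and an `is_unique` flag directly. A minimal sketch on a toy Index with hypothetical gene names:

```python
import pandas as pd

# A toy Index (hypothetical genes) standing in for adata.var_names
var_names = pd.Index(["TPH1", "NAA38", "CD3E", "NAA38"])

# Membership test directly on the Index, no list conversion needed
print("TPH1" in var_names)  # True
print("GCG" in var_names)   # False

# Number of occurrences: compare element-wise and sum the boolean array
print(int((var_names == "NAA38").sum()))  # 2

# Whether any gene name is repeated at all
print(var_names.is_unique)  # False
```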