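UCSC Xena has already aggregated and organized the TCGA data well; even the expression matrices have been converted for you.
A careful look, however, shows that the RNAseq data on UCSC Xena comes with three download links. The notes below sort them out, using
cohort: TCGA Breast Cancer (BRCA) as the example: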
https://xenabrowser.net/datapages/?cohort=TCGA%20Breast%20Cancer%20(BRCA)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Under gene expression RNAseq:
1: IlluminaHiSeq* (n=1,218) TCGA hub: generated on the Illumina HiSeq 2000 RNA sequencing platform. All values in this dataset have been log2(x+1) transformed, where x is the RSEM value. In the underlying data, raw_count is the number of raw reads measured for a transcript or gene, and normalized_count is the normalized quantity; differential analysis uses the normalized_count values. In other words, expression is first estimated from the counts with the RSEM software, and differential expression analysis is then run on those expression values.
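The log2(x+1) transform mentioned above is easy to reverse when a downstream tool expects values on the original scale. The following is a minimal sketch using only the Python standard library; the function names and the RSEM values are made up for illustration and are not part of Xena.

```python
import math

def to_log2(x):
    """Forward transform: log2(x + 1), where x is an RSEM value."""
    return math.log2(x + 1)

def from_log2(y):
    """Inverse transform: recover x from y = log2(x + 1)."""
    return 2 ** y - 1

# Made-up RSEM values, chosen so the results are exact powers of two.
rsem_values = [0.0, 1.0, 3.0, 1023.0]
logged = [to_log2(x) for x in rsem_values]
restored = [from_log2(y) for y in logged]

print(logged)    # [0.0, 1.0, 2.0, 10.0]
print(restored)  # [0.0, 1.0, 3.0, 1023.0]
```

The +1 keeps genes with an RSEM value of 0 defined after the transform, since log2(0 + 1) = 0.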
2: IlluminaHiSeq pancan normalized (n=1,218) TCGA hub: recommended when the analysis also uses data from other tumor types, because the values have been processed across tumor types. TCGA provides RNAseq data for roughly 30 to 40 tumor types, so TCGA as a whole can serve as a broad background for studying any single tumor.
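To make the idea of pan-cancer normalization concrete, here is a small sketch of per-gene mean-centering across all samples, followed by extraction of one cohort. It assumes that "mean-normalized per gene" means subtracting each gene's mean over all TCGA samples; the exact Xena procedure may differ. The gene names, sample names, and values are hypothetical.

```python
from statistics import mean

# Hypothetical log2(x+1) values across two cohorts: gene -> {sample: value}.
all_tcga = {
    "GENE_A": {"BRCA_1": 10.0, "BRCA_2": 12.0, "KIRC_1": 6.0, "KIRC_2": 8.0},
    "GENE_B": {"BRCA_1": 2.0, "BRCA_2": 4.0, "KIRC_1": 3.0, "KIRC_2": 3.0},
}
brca_samples = ["BRCA_1", "BRCA_2"]

def pancan_center(matrix, cohort_samples):
    """Subtract each gene's mean over ALL samples, keep only one cohort."""
    centered = {}
    for gene, values in matrix.items():
        gene_mean = mean(values.values())  # mean across every cohort
        centered[gene] = {s: values[s] - gene_mean for s in cohort_samples}
    return centered

centered = pancan_center(all_tcga, brca_samples)
print(centered["GENE_A"])  # {'BRCA_1': 1.0, 'BRCA_2': 3.0}
print(centered["GENE_B"])  # {'BRCA_1': -1.0, 'BRCA_2': 1.0}
```

A positive value then reads as "higher than the pan-cancer average for this gene", which is what makes the values comparable across tumor types.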
3: IlluminaHiSeq percentile (n=1,218) TCGA hub: choose this dataset to compare against data from outside TCGA, provided the external data have also been converted to percentile ranks. The percentile ranks range from 0 to 100, and lower values represent lower expression. Alternatively, you can combine the TCGA RNAseq data with your own RNAseq data, normalize across the combined dataset using whatever method you choose, and then analyze the combined dataset further.

TCGA Pan-Cancer gene expression: for comparison across multiple or all TCGA cohorts. The dataset is generated at UCSC by combining the "gene expression RNAseq (IlluminaHiSeq)" data (see above) from all TCGA cohorts. No further normalization is performed. (Specific usage still to be checked.)
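If you want to put an external sample on the same footing as the percentile dataset, the per-sample ranking can be sketched as follows. This is one common percentile-rank definition (share of genes with a strictly lower value, scaled to 0 to 100), used here as an assumption; the formula Xena actually applies is not stated in the source. Gene names and values are made up.

```python
from bisect import bisect_left

def percentile_ranks(values):
    """Rank each value within one sample on a 0-100 scale.

    Assumed definition: 100 * (number of strictly lower values) / (n - 1).
    """
    n = len(values)
    if n < 2:
        return [0.0 for _ in values]
    ordered = sorted(values)
    return [100.0 * bisect_left(ordered, v) / (n - 1) for v in values]

# One hypothetical sample: expression value per gene.
sample = {"GENE_A": 5.0, "GENE_B": 0.0, "GENE_C": 9.0, "GENE_D": 2.0, "GENE_E": 7.0}
ranks = dict(zip(sample, percentile_ranks(list(sample.values()))))

print(ranks)
# {'GENE_A': 50.0, 'GENE_B': 0.0, 'GENE_C': 100.0, 'GENE_D': 25.0, 'GENE_E': 75.0}
```

Because the ranking is computed within each sample, it depends only on the order of genes in that sample, which is why it tolerates differences in platform or scaling between datasets.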
**What values do the TCGA download files contain?**
| Example filename | Values in file |
| --- | --- |
| TCGA_KIRC_exp_HiSeqV2 | Log2(x+1), x is the RSEM value |
| TCGA_KIRC_exp_HiSeqV2_PANCAN | Log2(x+1) value mean-normalized per-gene across all TCGA samples, extracted converted values only belong to this cohort. x is the RSEM value |
| TCGA_KIRC_exp_HiSeqV2_percentile | Percentile ranking of RSEM value per sample, values range from 0 to 100, lower values representing lower expression |
| TCGA_KIRC_gistic2 | Gistic2 value from Broad Firehose |
| TCGA_KIRC_gistic2thd | Gistic2 value discretized to -2,-1,0,1,2 by Broad Firehose |
| TCGA_KIRC_hMethyl27 | beta values |
| TCGA_KIRC_hMethyl450 | beta values |
| TCGA_KIRC_miRNA | Log2(x+1), x is RPKM value |
| TCGA_KIRC_mutation | PANCAN AWG somatic mutation calls |
| TCGA_KIRC_PDMRNAseq | Pathway inference score derived using RNAseq data alone (generated at Firehose) |
| TCGA_KIRC_PDMRNAseqCNV | Pathway inference score derived using RNAseq and copy number data (generated at Firehose) |
| TCGA_KIRC_RPPA | RPPA value |
| TCGA_KIRC_RPPA_RBN | RBN-normalized RPPA value |
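For the expression files in this table, a small reader is often all that is needed before further analysis. The sketch below assumes the file is a tab-separated matrix with sample IDs in the header row and one gene per following row; that layout, the `read_matrix` helper, and the in-memory example text are assumptions for illustration, so check a real download before relying on it.

```python
import csv
import io

def parse_value(text):
    """Convert a cell to float; treat empty or 'NA' cells as missing (None)."""
    return None if text in ("", "NA") else float(text)

def read_matrix(handle):
    """Read a tab-separated gene-by-sample matrix into {gene: {sample: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # first column holds the gene identifier
    matrix = {}
    for row in reader:
        if not row:
            continue              # skip blank lines
        values = [parse_value(cell) for cell in row[1:]]
        matrix[row[0]] = dict(zip(samples, values))
    return samples, matrix

# Made-up file content standing in for a downloaded matrix.
example = "sample\tS1\tS2\nGENE_A\t10.0\t0.0\nGENE_B\t3.0\tNA\n"
samples, matrix = read_matrix(io.StringIO(example))

print(samples)           # ['S1', 'S2']
print(matrix["GENE_B"])  # {'S1': 3.0, 'S2': None}
```

For a real file, replace `io.StringIO(example)` with an opened file handle, for example `open(path, newline="")`.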