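1. The CD-Search tool identifies conserved domains or functional units within a protein or nucleotide sequence. It is hosted at NCBI: open the NCBI site, select Conserved Domains, and then click Search.
The page below appears, and the part highlighted in yellow is the tool covered here. CD-Search accepts only a single sequence per submission, whereas Batch CD-Search accepts multiple sequences.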
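2. We start with CD-Search and predict the domains of a single sequence.
Click CD-Search in the figure above and enter the protein or nucleotide query, either as FASTA-formatted sequence data or as a GI or Accession number. In the OPTIONS panel on the right, choose the database to search, the Expect Value threshold, and so on, or keep the default settings, then click the Submit button.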
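3. After about a minute the results are ready (see the figure below). By default they are shown in the concise display mode, which shows only the highest-scoring hit regions of the query sequence; other modes can be chosen from the View drop-down at the top right. To see all matching regions, switch View to the full display.
The results contain four types of hits: specific hits, non-specific hits, the superfamilies these hits belong to, and multi-domains. Amino acids at conserved features/sites are marked with small triangles; these may be catalytic sites, binding sites, and the like. See the annotations in the figure above for details.
When CD-Search finds a specific hit, the association between the query sequence and the conserved domain it hits has high confidence, so the function inferred for the query sequence is also highly credible. Other hit types can also suggest a putative function for the query protein, and the E-value indicates how reliable that suggestion is.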
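4. Batch submission. Click Batch CD-Search to open the page shown below:
You can choose a file to upload; it may contain no more than 4000 sequences. Set the other options and enter an email address, and the results will be emailed to you when the run finishes.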
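If your FASTA file holds more sequences than the limit mentioned above, one option is to split it into several smaller files before uploading. The following is a minimal sketch, not part of the NCBI tooling: the function names `read_fasta` and `split_into_batches` are hypothetical, and it assumes a plain FASTA file in which every record starts with a `>` header line.

```python
from typing import Iterator, List, Tuple

# Upper bound on sequences per upload, taken from the limit stated in the text.
MAX_PER_BATCH = 4000


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_into_batches(text: str, max_per_batch: int = MAX_PER_BATCH) -> List[str]:
    """Return FASTA chunks, each holding at most max_per_batch records."""
    batches: List[str] = []
    current: List[str] = []
    for header, seq in read_fasta(text):
        current.append(f">{header}\n{seq}\n")
        if len(current) == max_per_batch:
            batches.append("".join(current))
            current = []
    if current:
        batches.append("".join(current))
    return batches
```

Each returned string can be written to its own file and uploaded as a separate Batch CD-Search job.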
The results page looks as follows; click Download to save the results:
The columns of the results are explained below:
| Column | Description |
| --- | --- |
| Query | The ID of the sequence you submitted |
| Hit type | CD-Search results can include hit types that represent various confidence levels (specific hits, non-specific hits) and domain model scope (superfamilies, multi-domains). They can be seen in both the Concise display and Full display, except for non-specific hits, which are shown only in the Full Display. |
| PSSM-ID | A PSSM ID is the unique identifier for a domain model's position-specific scoring matrix (PSSM). |
| From..To | The range of amino acids in the query protein sequence to which the domain model aligns. (Note: If the alignment found by RPS-BLAST omitted more than 20% of the CD's extent at either the N- or C-terminus or both, the partial nature of the hit is indicated in the "Incomplete" column of the hit table. Partial hits can also be spotted in the graphical display as domain model cartoons with jagged edges.) |
| E-value | The expect value, or E-value, indicates the statistical significance of the hit as the likelihood the hit was found by chance. |
| Bit Score | The alignment score of the hit |
| Accession | The accession number of the hit, which can either be a domain model or a superfamily cluster. (If the hit is a domain model, then the accession number (cl) of the superfamily cluster to which it belongs is listed in the "Superfamily" column of the output file.) |
| Short name | The short name of a conserved domain, which concisely defines the domain. For example, "Voltage gated ClC" is the short title of the NCBI-curated conserved domain model for the voltage gated chloride channel (cd00400). |
| Incomplete | If the hit to a conserved domain is partial (i.e., if the alignment found by RPS-BLAST omitted more than 20% of the CD's extent at either the N- or C-terminus or both), this column will be populated with one of the following values: N (incomplete at the N-terminus), C (incomplete at the C-terminus), or NC (incomplete at both the N-terminus and C-terminus). If the hit to a conserved domain is complete, then this column will be populated with a dash (-). (Note: Partial hits can also be spotted in the graphical display as domain model cartoons with jagged edges.) |
| Superfamily | This column is populated only for domain models that are specific or non-specific hits, and it lists the accession number of the superfamily to which the domain model belongs. (If the hit is to a superfamily itself, then this column is simply populated with a dash because the superfamily accession is already listed in the preceding "Accession" column.) |
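
Once downloaded, the hit table can be filtered in a script using the columns described above. The sketch below makes several assumptions that you should check against your own file: that the download is tab-delimited, that comment lines start with `#`, that the header line begins with `Query` and uses the column names `Hit type`, `E-Value`, and `Incomplete`, and that specific hits are labelled `specific`. The function names and the `1e-5` cutoff are illustrative choices, not NCBI recommendations.

```python
import csv
import io
from typing import Dict, List


def parse_hits(text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited hit table into a list of row dictionaries.

    Assumes comment lines start with '#' and the header line starts
    with 'Query' followed by a tab (assumed layout, verify on your file).
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    start = next((i for i, ln in enumerate(lines)
                  if ln.startswith("Query\t")), None)
    if start is None:
        return []  # no header found, nothing to parse
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def confident_hits(hits: List[Dict[str, str]],
                   max_evalue: float = 1e-5) -> List[Dict[str, str]]:
    """Keep complete, specific hits whose E-value is at or below max_evalue."""
    keep = []
    for hit in hits:
        if hit["Hit type"].strip().lower() != "specific":
            continue  # drop non-specific, superfamily and multi-domain rows
        if hit["Incomplete"].strip() != "-":
            continue  # drop hits truncated at the N- and/or C-terminus
        if float(hit["E-Value"]) > max_evalue:
            continue
        keep.append(hit)
    return keep
```

Filtering on specific, complete hits mirrors the point made earlier that specific hits carry the highest confidence; relax the conditions if you also want to inspect non-specific or partial hits.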