Ontology
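Concept: as I understand it, an ontology is a shared, standardized convention that the bioinformatics community has agreed on for important biological information, such as sequence and gene information. It gives concepts and pieces of information precise definitions instead of complicated, ambiguous explanations.
In the paper The Sequence Ontology: a tool for the unification of genome annotations, the authors stress the importance of consistency: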
Unfortunately, biological terminology is notoriously ambiguous; the same word is often used to describe more than one thing and there are many dialects. For example, does a coding sequence (CDS) contain the stop codon or is the stop codon part of the 3'-untranslated region (3' UTR)?
There really is no right or wrong answer to such questions, but consistency is crucial when attempting to compare annotations from different sources, or even when comparing annotations performed by the same group over an extended period of time.
- An ontology mainly covers two parts:
- what a piece of DNA is: annotation or classification.
- what a piece of DNA does: functional analyses.
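Sequence Ontology
Annotation and classification of a given stretch of sequence, i.e. its genetic features.
The Sequence Ontology Browser classifies and defines sequence features in detail.
For example, the precise definition of a CDS is: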
A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon.
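To make the definition above concrete, here is a rough sketch of a check that a DNA string could be a CDS in this sense. It assumes the standard genetic code with ATG as the start codon and TAA, TAG or TGA as the stop codon, and it does not look for in-frame stop codons inside the sequence. The function name `looks_like_cds` is made up for illustration.

```shell
# Return 0 if the DNA string starts with ATG, ends with a stop codon
# (TAA, TAG or TGA) and has a length divisible by three; return 1 otherwise.
# Assumption: standard genetic code; internal stop codons are not checked.
looks_like_cds() {
    seq=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
    # A CDS is a whole number of codons
    [ $(( ${#seq} % 3 )) -eq 0 ] || return 1
    case "$seq" in
        ATG*TAA|ATG*TAG|ATG*TGA) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under this definition the stop codon counts as part of the CDS, so `looks_like_cds ATGAAATAA` succeeds while the same sequence with the final TAA trimmed off does not.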
You can download the Sequence Ontology data and explore it:
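URL=https://raw.githubusercontent.com/The-Sequence-Ontology/SO-Ontologies/master/so-simple.obo
wget "$URL"
# Show the term named exactly "gene" (1 line before the match, 6 after)
grep -B 1 -A 6 'name: gene$' so-simple.obo
# Show every line mentioning PCR, with 2 lines of context on each side
grep -B 2 -A 2 'PCR' so-simple.obo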
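The -B/-A context windows used here only guess how long a term's entry is. As an alternative sketch, assuming the stanzas in the OBO file are separated by blank lines, awk's paragraph mode can print one whole stanza at a time. `obo_term` is a hypothetical helper name, not part of any SO tooling.

```shell
# Print the complete stanza whose "name:" line matches the given name exactly.
# Usage: obo_term <name> <obo-file>
# Assumption: stanzas are separated by blank lines.
obo_term() {
    awk -v name="$1" '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one stanza per record, one line per field
        {
            for (i = 1; i <= NF; i++)
                if ($i == "name: " name) { print; exit }
        }' "$2"
}
```

For example, `obo_term CDS so-simple.obo` should print the CDS stanza in full, however many lines it has.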
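Gene Ontology
Annotates and classifies gene function. It classifies gene products, and each gene may carry several functional annotations.
Two important websites: Gene Ontology and QuickGO.
GO comprises three sub-ontologies:
- Cellular component (CC): where the gene product is located, e.g. the nucleus or the mitochondrial matrix
- Molecular function (MF): the activity of the gene product, e.g. catalytic activity or binding activity
- Biological process (BP): a process followed from start to finish, e.g. pyrimidine metabolism or glycoside transport.
Some exploration of the GO data: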
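wget http://geneontology.org/gene-associations/goa_human.gaf.gz
gunzip goa_human.gaf.gz
# Number of annotations per gene product (column 2), most annotated first
grep -v '^!' goa_human.gaf | cut -f 2 | sort | uniq -c \
| sort -k1nr | less -S
# Number of annotations per year, taken from the date in column 14 (YYYYMMDD)
grep -v '^!' goa_human.gaf \
| cut -f 14 \
| perl -alne 'print substr($_,0,4)' \
| sort | uniq -c \
| sort -k2nr \
| perl -alne 'print "$F[1]\t$F[0]"'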
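The three GO sub-ontologies can be counted in the same style. In the GAF format the aspect sits in column 9, coded as C (cellular component), F (molecular function) or P (biological process). The sketch below assumes an uncompressed GAF 2.x file; `go_aspect_counts` is a hypothetical helper name.

```shell
# Count annotations per GO aspect (column 9 of a GAF file: C, F or P).
# Usage: go_aspect_counts <gaf-file>
go_aspect_counts() {
    grep -v '^!' "$1" \
        | cut -f 9 \
        | sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'   # print aspect first, then its count
}
```

Running `go_aspect_counts goa_human.gaf` prints one tab-separated line per aspect.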
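Functional analysis of bioinformatics data
When processing biological data, scientists want to be able to interpret the results in a biologically meaningful way.
Once you have a set of genes or proteins (genes/sequences), the next step is pathway analysis, also called functional analysis.
Functional pathway analysis comes in three main tiers: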
- Over-Representation Analysis (ORA)
  Checks whether a function turns up in the gene list more often than expected. ORA attempts to find representative functions of a list of genes by comparing the number of times a function is observed to a baseline.
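- Functional Class Scoring (FCS)
  The emphasis is not on single genes with a large effect: functionally related genes that each have only a small effect can, added together, have a significant effect on the pathway they represent.
  FCS methods use this information to detect coordinated changes in the expression of genes in the same pathway. Finally, by considering the coordinated changes in gene expression, FCS methods account for dependence between genes in a pathway, which ORA does not.
  The basic steps are:
  1. Compute a gene-level statistic for each individual gene.
  2. Aggregate the gene-level statistics of all genes in the same pathway into a single pathway-level statistic.
  3. Assess the statistical significance of the pathway-level statistic.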
- Pathway Topology (PT)
  Pathway-topology-based methods additionally need information about the interactions within a given pathway.
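To illustrate the ORA idea of comparing an observed count to a baseline, one common choice is a one-sided hypergeometric test. This is only a sketch under that assumption, and real tools also correct for testing many functions at once. The helper name `ora_pvalue` and its four arguments are made up for illustration.

```shell
# One-sided hypergeometric test for over-representation: P(X >= k).
# Usage: ora_pvalue N K n k
#   N = genes in the background
#   K = background genes annotated with the function
#   n = genes in the list of interest
#   k = genes in the list annotated with the function
ora_pvalue() {
    awk -v N="$1" -v K="$2" -v n="$3" -v k="$4" '
        # Natural log of the binomial coefficient C(a, b), for 0 <= b <= a
        function lchoose(a, b,    i, s) {
            s = 0
            for (i = 1; i <= b; i++) s += log((a - b + i) / i)
            return s
        }
        BEGIN {
            # Smallest and largest overlap that are actually possible
            lo = k
            if (n - (N - K) > lo) lo = n - (N - K)
            hi = (n < K) ? n : K
            denom = lchoose(N, n)
            p = 0
            for (i = lo; i <= hi; i++)
                p += exp(lchoose(K, i) + lchoose(N - K, n - i) - denom)
            printf "%.6g\n", p
        }'
}
```

A call such as `ora_pvalue 20000 200 100 5` asks how surprising it is to see at least 5 annotated genes in a list of 100 when 200 of 20000 background genes carry the annotation. A small value suggests the function is over-represented in the list.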