Some translation tools render "BED file" literally as a file about beds; the name actually stands for Browser Extensible Data.
The first three required BED fields are:
chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature, however, the number in position format will be represented. For example, the first 100 bases of chromosome 1 are defined as chrom=1, chromStart=0, chromEnd=100, and span the bases numbered 0-99 in our software (not 0-100), but will represent the position notation chr1:1-100.
For more on how chromosome start and end coordinates are counted, see "The UCSC Genome Browser Coordinate Counting Systems". Its diagrams are easy to follow; you can work through them by counting on your fingers.
If you submit data to the browser in position format (chr#:##-##), the browser assumes this information is 1-based. If you submit data in any other format (BED (chr# ## ##) or otherwise), the browser will assume it is 0-based. Similarly, any data returned by the browser in position format is 1-based, while data returned in BED format is 0-based. As for why both conventions exist, I do not know yet; that is left as an open question.
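The two conventions above can be sketched as a pair of small conversion helpers. This is a minimal illustration, not UCSC code; the function names bed_to_position and position_to_bed are made up for this example.

```python
def bed_to_position(chrom, chrom_start, chrom_end):
    """Convert a 0-based, half-open BED interval to 1-based position notation."""
    # Only the start shifts by one: the exclusive 0-based end is numerically
    # equal to the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def position_to_bed(position):
    """Convert 'chr1:1-100' (1-based, inclusive) to a (chrom, start, end) BED tuple."""
    chrom, span = position.rsplit(":", 1)
    start, end = span.replace(",", "").split("-")
    return chrom, int(start) - 1, int(end)


print(bed_to_position("chr1", 0, 100))  # chr1:1-100
print(position_to_bed("chr1:1-100"))    # ('chr1', 0, 100)
```

In BED coordinates the feature length is simply chromEnd - chromStart (here 100 - 0 = 100), which is one practical advantage of the half-open convention.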
The 9 additional optional BED fields are:
name - Defines the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode.
score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray).
strand - Defines the strand. Either "." (=no strand) or "+" or "-".
thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays). When there is no thick part, thickStart and thickEnd are usually set to the chromStart position.
thickEnd - The ending position at which the feature is drawn thickly (for example the stop codon in gene displays).
itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RGB value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.
blockCount - The number of blocks (exons) in the BED line.
blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
In short:
name - label for the BED line, shown on the left side of the Genome Browser display.
score - sets the gray level used in the Genome Browser; the value lies between 0 and 1000.
strand - strand marker: either "." (no strand), "+", or "-".
thickStart - start of the thickly drawn part of the feature (for example, the start codon in gene displays). When there is no thick part, thickStart and thickEnd are usually set to the chromStart position.
thickEnd - end of the thickly drawn part of the feature (for example, the stop codon in gene displays).
itemRgb - an R,G,B value (e.g. 255,0,0); when the track line itemRgb attribute is set to "On", the BED line is drawn in this color.
blockCount - number of blocks (exons).
blockSizes - comma-separated list of block (exon) sizes, with blockCount entries.
blockStarts - comma-separated list of block (exon) start positions, with blockCount entries; each start is relative to chromStart.
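Because blockStarts are relative to chromStart, turning the blocks of a 12-column BED line into genome coordinates takes one addition per block. The sketch below assumes tab-separated input; the function name bed12_blocks and the example line are hypothetical and made up for illustration.

```python
def bed12_blocks(line):
    """Return absolute, half-open (start, end) pairs for each block of a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # The lists are often written with a trailing comma, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockSizes/blockStarts do not match blockCount")
    # blockStarts are offsets from chromStart; the ends stay half-open.
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


# Hypothetical two-exon feature, invented for this example.
line = "chr1\t1000\t5000\tdemo\t0\t+\t1200\t4800\t0,0,0\t2\t500,1000,\t0,3000,"
print(bed12_blocks(line))  # [(1000, 1500), (4000, 5000)]
```

Note that the last block ends exactly at chromEnd (5000), which is what a well-formed BED12 line is expected to satisfy.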
2.BED detail format
This format uses the first 4 to 12 columns of the BED format, plus two additional fields: an ID and a description of the item.
track name=HbVar type=bedDetail description="HbVar custom track" db=hg19 visibility=3 url="$$"
chr11	5246919	5246920	Hb_North_York	2619	Hemoglobin variant
chr11	5255660	5255661	HBD c.1 G>A	2659	delta0 thalassemia
chr11	5247945	5247946	Hb Sheffield	2672	Hemoglobin variant
chr11	5255415	5255416	Hb A2-Lyon	2676	Hemoglobin variant
chr11	5248234	5248235	Hb Aix-les-Bains	2677	Hemoglobin variant
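Since the ID and the description are always the last two fields, a bedDetail data line can be split from the right. A rough sketch, assuming tab-separated fields (item names such as "Hb Sheffield" contain spaces); the function name parse_bed_detail and the dictionary keys are choices made for this example only.

```python
def parse_bed_detail(line):
    """Split one bedDetail line into its BED columns, ID, and description."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 4 BED columns plus ID and description")
    # The last two fields are the ID and description; everything before is BED.
    *bed, item_id, description = fields
    return {
        "chrom": bed[0],
        "chromStart": int(bed[1]),
        "chromEnd": int(bed[2]),
        "name": bed[3],
        "other_bed_columns": bed[4:],  # optional BED columns 5-12, if present
        "id": item_id,
        "description": description,
    }


row = parse_bed_detail("chr11\t5246919\t5246920\tHb_North_York\t2619\tHemoglobin variant")
print(row["name"], row["id"], row["description"])
```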
3.BedGraph Track Format
track line attribute=value pairs
Track lines define the display attributes for all lines in an annotation data set.
Following the track definition line are the track data in four-column BED format: chrom, chromStart, chromEnd, and dataValue.
The chromosome coordinates are zero-based, half-open.
BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and index-able representation of nucleotide sequence alignments. Many next-generation sequencing and analysis tools work with SAM/BAM.
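As an illustration of the four-column layout and the half-open coordinates, here is a small reader that skips the track line and yields typed records. It is only a sketch: the function name read_bedgraph and the example data are invented, and real files may contain additional header lines not handled here.

```python
def read_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser definition lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()
        yield chrom, int(start), int(end), float(value)


# Hypothetical data, made up for this example.
example = [
    'track type=bedGraph name="demo"',
    "chr1\t0\t100\t1.5",
    "chr1\t100\t150\t3.0",
]
records = list(read_bedgraph(example))
# With half-open intervals, each record covers end - start bases.
covered = sum(end - start for _, start, end, _ in records)
print(covered)  # 150
```

The two records share the boundary value 100 without overlapping, because the first interval ends before base 100 and the second starts at it.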
SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
(1) Convert SAM to BAM using the samtools program:
    samtools view -S -b -o my.bam my.sam
If converting a SAM file that does not have a proper header, the -t or -T option is necessary. For more information about the command, run samtools view with no other arguments.
(2) Sort and create an index for the BAM:
    samtools sort -o my.sorted.bam my.bam
    samtools index my.sorted.bam
The sort command writes my.sorted.bam, a BAM file of alignments ordered by leftmost position on the reference assembly. (Older samtools releases used the form samtools sort my.bam my.sorted, which appended .bam to the output prefix.)
The index command generates a new file, my.sorted.bam.bai, with which genomic coordinates can quickly be translated into file offsets in my.sorted.bam. In other words, the .bai index lets a tool jump straight to the alignments overlapping a given region instead of reading the whole BAM file.
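To make "translating genomic coordinates into file offsets" more concrete, here is a toy index over a plain-text buffer of position-sorted intervals. It is not the real BAI format, which also uses a binning scheme and virtual offsets into compressed blocks; it only mimics the idea of a per-window lookup table. All names (WINDOW, build_index, fetch) and the data are invented for this sketch.

```python
import io

WINDOW = 16384  # window size in bases, chosen to echo the 16 kb linear-index windows


def build_index(handle):
    """Map each window number to the offset of the first record overlapping it."""
    index = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        start, end = map(int, line.split()[:2])
        for window in range(start // WINDOW, (end - 1) // WINDOW + 1):
            # Records are sorted by start, so the first offset seen is the smallest.
            index.setdefault(window, offset)
    return index


def fetch(handle, index, q_start, q_end):
    """Return records overlapping the half-open query [q_start, q_end)."""
    windows = range(q_start // WINDOW, (q_end - 1) // WINDOW + 1)
    offsets = [index[w] for w in windows if w in index]
    if not offsets:
        return []
    handle.seek(min(offsets))  # jump directly instead of scanning from byte 0
    hits = []
    for line in handle:
        start, end = map(int, line.split()[:2])
        if start >= q_end:
            break  # sorted by start, so no later record can overlap
        if end > q_start:
            hits.append((start, end))
    return hits


# Hypothetical sorted intervals, one "start end" pair per line.
data = io.BytesIO(b"100 200\n20000 20100\n40000 40150\n40100 40300\n")
index = build_index(data)
print(index)                             # {0: 0, 1: 8, 2: 20}
print(fetch(data, index, 40120, 40200))  # [(40000, 40150), (40100, 40300)]
```

The query for positions 40120-40200 starts reading at byte 20 and never touches the first two records, which is the same benefit the .bai file gives for a sorted BAM.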