The pileup format was first used by Tony Cox and Zemin Ning at the Sanger Institute. It describes the base-pair information at each chromosomal position, showing how all reads covering that position align to the reference. This format facilitates SNP/indel calling and quick visual inspection of alignments.
The pileup format has several variants. The default output by SAMtools looks like this:
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
Each line consists of six columns: the chromosome (reference sequence) name, the 1-based coordinate, the reference base, the number of reads covering the site, the read bases, and the base qualities. In the read-bases column:
. stands for a match to the reference base on the forward strand
, stands for a match to the reference base on the reverse strand
ACGTN (uppercase) stands for a mismatch on the forward strand; the letter is the base actually observed in the read
acgtn (lowercase) stands for a mismatch on the reverse strand
A pattern `\+[0-9]+[ACGTNacgtn]+' indicates an insertion between this reference position and the next reference position. The length of the insertion is given by the integer in the pattern, followed by the inserted sequence. For example, in
seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<
+2AG occurs three times, so three reads carry a 2 bp insertion of AG.
Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. In
seq3 200 A 20 ,,,,,..,.-4CACC.-4CACC....,.,,.^~. ==<<<<<<<<<<<::<;2<<
-4CACC occurs twice, so two reads carry a 4 bp deletion of CACC.
^ marks the start of a read segment, which is a contiguous subsequence on the read separated by N/S/H CIGAR operations. The ASCII code of the character following ^, minus 33, gives the mapping quality.
$ marks the end of a read segment.
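The symbol rules above can be sketched as a small tokenizer for the read-bases column. This is a minimal illustration, not SAMtools code: the function names parse_read_bases and parse_pileup_line are hypothetical, columns are assumed to be whitespace-separated as in the examples shown here, and only the symbols described in this section are handled (anything else raises an error).

```python
import re
from collections import Counter

# Hypothetical helper names for illustration; they are not part of SAMtools.
INDEL = re.compile(r"[+-]([0-9]+)")


def parse_read_bases(bases):
    """Split the read-bases column into (kind, value) events."""
    events = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # The character after '^' encodes mapping quality as ASCII - 33.
            events.append(("start", ord(bases[i + 1]) - 33))
            i += 2
        elif c == "$":
            events.append(("end", None))
            i += 1
        elif c in "+-":
            # '+2AG' / '-4CACC': sign, length, then exactly that many bases.
            m = INDEL.match(bases, i)
            if m is None:
                raise ValueError(f"malformed indel at offset {i}")
            length = int(m.group(1))
            seq = bases[m.end():m.end() + length]
            events.append(("insertion" if c == "+" else "deletion", seq))
            i = m.end() + length
        elif c in ".,":
            # '.' is a forward-strand match, ',' a reverse-strand match.
            events.append(("match", "+" if c == "." else "-"))
            i += 1
        elif c in "ACGTNacgtn":
            # Uppercase: forward-strand mismatch; lowercase: reverse strand.
            events.append(("mismatch", c))
            i += 1
        else:
            raise ValueError(f"unhandled symbol {c!r} at offset {i}")
    return events


def parse_pileup_line(line):
    """Parse one six-column pileup line (assumes whitespace-separated columns)."""
    chrom, pos, ref, depth, bases, quals = line.split()
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "depth": int(depth),
        "events": parse_read_bases(bases),
        "quals": quals,
    }


if __name__ == "__main__":
    rec = parse_pileup_line("seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<")
    # Tally how many events of each kind occur at this position.
    print(Counter(kind for kind, _ in rec["events"]))
```

Because the character after ^ is consumed together with it, a mapping-quality character such as + (as in ^+.) is not mistaken for the start of an insertion. The number of match and mismatch events should equal the read count in the fourth column, which gives a quick sanity check on a parsed line.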