2-mers
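Here are the frequencies of all 16 2-mers (dinucleotides) in the human genome (hg19), obtained using the count-words program of RSAT.
One line stands out: cg is more than four times rarer than any other 2-mer. This is called CG suppression, and it results from a historical accumulation of mutations in which methylated cytosines followed by a guanine were converted to thymines.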
seq identifier observed_freq occ
aa aa 0.0977693510124 279490734
ac ac 0.0503391220503 143903156
ag ag 0.0699208325000 199880889
at at 0.0772705679279 220891389
ca ca 0.0725344058342 207352244
cc cc 0.0520831825569 148888857
cg cg 0.0098517609035 28162976
ct ct 0.0699588753085 199989641
ga ga 0.0593289285247 169602085
gc gc 0.0426523912572 121929296
gg gg 0.0521099551064 148965391
gt gt 0.0504530230976 144228762
ta ta 0.0656671783246 187721077
tc tc 0.0593535812105 169672559
tg tg 0.0726616984034 207716132
tt tt 0.0980451459821 280279142
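
To see how strongly cg is depleted, one option is to compare each observed frequency with what would be expected if neighbouring bases were independent. The sketch below is only an illustration, not part of the RSAT output: it copies the frequencies from the table, estimates the base frequencies from the table itself, and uses the hypothetical names `observed_freq`, `first_base`, `second_base` and `obs_exp_ratio`.

```python
# 2-mer frequencies copied from the table above (hg19, RSAT count-words).
observed_freq = {
    "aa": 0.0977693510124, "ac": 0.0503391220503,
    "ag": 0.0699208325000, "at": 0.0772705679279,
    "ca": 0.0725344058342, "cc": 0.0520831825569,
    "cg": 0.0098517609035, "ct": 0.0699588753085,
    "ga": 0.0593289285247, "gc": 0.0426523912572,
    "gg": 0.0521099551064, "gt": 0.0504530230976,
    "ta": 0.0656671783246, "tc": 0.0593535812105,
    "tg": 0.0726616984034, "tt": 0.0980451459821,
}

# Base frequencies estimated from the 2-mer table (an assumption of this
# sketch): first_base[x] sums all 2-mers starting with x, second_base[y]
# sums all 2-mers ending in y.
first_base = {base: 0.0 for base in "acgt"}
second_base = {base: 0.0 for base in "acgt"}
for kmer, freq in observed_freq.items():
    first_base[kmer[0]] += freq
    second_base[kmer[1]] += freq

# Observed/expected ratio, where "expected" assumes the two positions
# are independent: expected(xy) = first_base[x] * second_base[y].
obs_exp_ratio = {
    kmer: freq / (first_base[kmer[0]] * second_base[kmer[1]])
    for kmer, freq in observed_freq.items()
}

# Print the 2-mers from most depleted to most enriched.
for kmer, ratio in sorted(obs_exp_ratio.items(), key=lambda kv: kv[1]):
    print(f"{kmer}\t{ratio:.3f}")
```

A ratio close to 1 means the 2-mer occurs about as often as its base composition predicts. With the numbers above, cg is the only 2-mer whose ratio falls far below 1, at roughly a quarter of the expected frequency.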