2-mers

Here are the frequencies of 2-mers in the human genome (hg19).

(obtained using the count-words program of RSAT)

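The cg line in the table below stands out: at just under 1%, CG is several times rarer than any other 2-mer. This is due to a historic accumulation of certain mutations (C to T changes at methylated CG sites), an effect called CG suppression.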

seq identifier observed_freq occ
aa aa 0.0977693510124 279490734
ac ac 0.0503391220503 143903156
ag ag 0.0699208325000 199880889
at at 0.0772705679279 220891389
ca ca 0.0725344058342 207352244
cc cc 0.0520831825569 148888857
cg cg 0.0098517609035 28162976
ct ct 0.0699588753085 199989641
ga ga 0.0593289285247 169602085
gc gc 0.0426523912572 121929296
gg gg 0.0521099551064 148965391
gt gt 0.0504530230976 144228762
ta ta 0.0656671783246 187721077
tc tc 0.0593535812105 169672559
tg tg 0.0726616984034 207716132
tt tt 0.0980451459821 280279142
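
To see how strong the CG depletion is, one option is to compare each observed frequency with what would be expected if the two bases were independent. The sketch below is my own illustration, not part of RSAT: it takes the observed_freq column from the table above, derives the single-base frequencies from it, and prints the observed/expected ratio for every 2-mer. The names `FREQS` and `observed_over_expected` are made up for this example, and the independence model is just one simple choice of background.

```python
from collections import defaultdict

# observed_freq column copied from the table above (hg19, RSAT count-words).
FREQS = {
    "aa": 0.0977693510124, "ac": 0.0503391220503,
    "ag": 0.0699208325000, "at": 0.0772705679279,
    "ca": 0.0725344058342, "cc": 0.0520831825569,
    "cg": 0.0098517609035, "ct": 0.0699588753085,
    "ga": 0.0593289285247, "gc": 0.0426523912572,
    "gg": 0.0521099551064, "gt": 0.0504530230976,
    "ta": 0.0656671783246, "tc": 0.0593535812105,
    "tg": 0.0726616984034, "tt": 0.0980451459821,
}


def observed_over_expected(freqs):
    """Return observed/expected for each 2-mer.

    Expected frequency assumes independent bases: the frequency of the
    first base times the frequency of the second base, both derived
    from the 2-mer table itself.
    """
    first = defaultdict(float)   # frequency of each base in position 1
    second = defaultdict(float)  # frequency of each base in position 2
    for kmer, f in freqs.items():
        first[kmer[0]] += f
        second[kmer[1]] += f
    return {
        kmer: f / (first[kmer[0]] * second[kmer[1]])
        for kmer, f in freqs.items()
    }


if __name__ == "__main__":
    ratios = observed_over_expected(FREQS)
    # Print the most depleted 2-mers first.
    for kmer, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{kmer}\t{ratio:.3f}")
```

With these numbers, cg comes out as the most depleted 2-mer by a wide margin, at roughly a quarter of its expected frequency, while most other 2-mers sit much closer to 1.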
