Salmon vs kallisto on RNA-seq benchmarking datasets

As part of a presentation on bias modeling for RNA-seq data, I put together a Shiny app that allows one to visually explore the differences between Salmon and kallisto on all samples for the RNA-seq datasets GEUVADIS and SEQC, the former a large RNA-seq project with annotated batches and sequencing centers and the latter a large benchmarking RNA-seq project. Running the two programs on these real, large datasets reveals many differences in transcript abundance quantification, in particular stability and reliability of abundance estimates across sequencing center.

For all datasets, we see that Salmon has more consistent estimation across the labs performing the experiment, as described in detail in the Salmon paper. You can click on the overview plots to explore the estimated counts by different methods for individual genes, where often the inconsistencies are driven by mis-estimation of which isoform is expressed within a gene.

  • The Salmon kallisto diffs GitHub repository where you can find the Shiny app comparing Salmon and kallisto abundance estimates on these benchmarking datasets.

Example images (link to presentation slides):

DESeq2 or edgeR

I was asked recently on Twitter about my thoughts on DESeq2 and edgeR and which tool is better. My reply was:

But then I was asked to elaborate more so here goes:

TL/DR summary: I think DESeq2 and edgeR are both good methods for gene-level DE, as is limma-voom.

Aside: The bootstrapping idea within the sleuth method seems especially promising for transcript-level DE, but I have only kept up with the ideas and haven’t done any testing or benchmarking with sleuth yet, so I won’t include it in this post.

History: When we started working on DESeq2 in Fall 2012, one of the main differences we were focusing on was a methodological detail* of the dispersion shrinkage steps. Aside from this difference, we wanted to update DESeq to use the GLM framework and to shrink dispersion estimates toward a central value as in edgeR, as opposed to the maximum rule that was previously implemented in DESeq (which tends to overestimate dispersion).

I would say that the difference in dispersion shrinkage didn’t make a huge difference in performance compared to edgeR, as can be seen in real and simulated** data benchmarks in our DESeq2 paper published in 2014. From what I’ve seen in my own testing and numerous third-party benchmarks, the two methods typically report overlapping sets of genes, and have similar performance for gene-level DE testing.

Continue reading “DESeq2 or edgeR”

RNA-seq fragment sequence bias

Our paper was just published describing a new method for modeling and correcting fragment sequence bias for estimation of transcript abundances from RNA-seq:

Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation


Q: What’s the point of the paper?

GC content bias – affecting the amplification of fragments – is widespread in sequencing datasets and well-known, yet there were no existing transcript abundance estimation methods that properly corrected for this sample-specific bias. We identified hundreds of cases where top methods mis-identified the dominant isoform due to fragment-level GC bias being left uncorrected. While there are existing methods for post-hoc correction of gene abundance estimates using gene-level GC content, the transcript-level bias correction task is much more difficult. Sample-specific technical variation in coverage on small regions of transcripts leads to dramatic shifts in abundance estimates.

Q: What do you mean by “systematic errors”?

You can split any errors into systematic and stochastic components, with the former a fixed quantity and the latter varying but with zero mean. With the measurement of transcript expression, the stochastic part can be minimized by, for example, increasing sequencing depth and increasing the sample size: the number of biological units measured to infer a population mean expression value. We used the term “systematic” in the title to underscore that higher sequencing depth and more samples will not help to remove the bias we describe.

Q: Is there a video where you explain the gist of this?

Yes. Below is a 5 minute video where I cover the basic ideas presented in the paper.

Q: When should I correct for fragment GC bias? Are all my results wrong?

There are two situations where it’s critical to correct for GC bias for transcript-level analysis:

(1) The most problematic situation is when one is comparing abundances across groups of samples, and the groups have variable GC content dependence. This often happens when samples are processed in different labs or at different times (ideally experiments should not be designed this way, but it is nevertheless common in genomics research). We demonstrate that this can lead to thousands of false positives results for differential expression of transcripts, and these differences will often rise to the very top of a ranked list. The solutions are to use methods which produce GC-bias-corrected transcript abundance estimates, to use block experimental design, to include experimental batch as a covariate in statistical analysis, and to examine GC content dependence across samples using software like FASTQC and MultiQC.

(2) The second point of warning is when any of the samples in the dataset contain strong GC dependence, that is, the library preparation was not able to adequately amplify fragments with GC content < 35% or > 65%*. Such samples were present in a subset of the batches of all the datasets we examined in the paper. Even if all of the samples come from a single batch, or the experiment has a block design with multiple batches, simple descriptive analysis of transcript abundance (e.g. how many isoforms are expressed per gene, which isoforms are expressed) will be inaccurate for hundreds-to-thousands of genes when ignoring fragment GC bias. Furthermore, differential expression, if present, could be attributed to the wrong isoform or isoforms within genes.

* These guidelines come from the Discussion section of an excellent paper on technical reproducibility of RNA-seq by the GEUVADIS project: ‘t Hoen et al (2013).

Q: Where are the details of the alpine statistical method?

The statistical method is detailed in the Supplementary Note. This allowed us to have more fine grained control of the presentation of LaTeX equations. The methods in the Supplementary Note were originally in the main text and were refined during peer review.

Q: Is alpine the only method implementing fragment GC bias correction?

No, the latest version of Salmon (v0.7, with methods described in the bioRxiv preprint) also implements a fragment GC bias correction similar to alpine, which runs at a fraction of the time. Salmon with the gcBias flag similarly reduces the mis-estimation of isoforms caused by fragment GC bias (see examples in Supp. Figure 5 of the latest Salmon preprint).

Q: Is the Salmon implementation of GC bias correction better than alpine?

The GC bias correction model is probably about equal, but other aspects of the Salmon method are superior to alpine. The alpine method, when estimating transcript abundances, focuses on one gene or locus at a time, and does not account for multi-mapping fragments across genes. The Salmon implementation is a full, transcriptome-wide estimation method which can simultaneously estimate and correct for sample-specific fragment GC bias (and other biases as well).

Our focus in writing alpine was to make a super-extensible method for modeling RNA-seq biases, in order to make rigorous comparisons of various methods for bias correction, and then to show that fragment GC bias correction in particular leads to more accurate estimates of transcript abundance. By super-extensible, I mean that one can easily modify details of the bias correction specification: any combination of bias effects (fragment length distribution, positional, random hexamer sequence bias, GC bias), interactions between these, or modifications to the spline parameters used to fit positional and fragment GC bias. It is also possible to take empirical fragment GC bias curves from alpine and directly incorporate these into the Polyester RNA-seq simulator. This results in simulated RNA-seq data with variable coverage that looks more similar to real RNA-seq coverage.

Links to papers mentioned in video:

R gotchas

I put together a short list of potential R gotchas: unexpected results which might trip up new R users.

For example, if we have a matrix m,


returns a matrix, while


returns a vector, unless one writes


Setting aside what the default behavior of functions should be, my point is simply that these are important to be aware of.

Foreign Affairs collection

I’m visiting home for a week.  Here are some quotes I’ve been reading, mostly from this collection of essays from Foreign Affairs.

From a 1935 essay by H. G. Wells, “Civilization on Trail”:

I continually try to see whether there is not a way of dealing with the civilized man in Germany and getting past that extraordinarily ugly Nazi mark which he has to wear because the alternative to the wearing of it would have meant submission to some foreign influence as dishonoring and even more humiliating.

George F. Kennan in “The Sources of Soviet Conduct”, 1947:

It is an undeniable privilege of every man to prove himself right in the thesis that the world is his enemy; for if he reiterates it frequently enough and makes it the background of his conduct he is bound eventually to be right.

It’s funny…Kennan is talking about the Bolsheviks, but it reads like an attack on neoconservatives.

Edward Gibbon from The Decline and Fall of the Roman Empire, 1776:

From enthusiasm to imposture the step is perilous and slippery; the demon of Socrates affords a memorable instance how a wise man may deceive himself, how a good man may deceive others, how the conscience may slumber in a mixed and middle state between self-illusion and voluntary fraud.

Historian Arthur Schlesinger, Jr., 1978:

Little has done more harm to human affairs than illusions on the part of leaders and of nations of their infallibility. Reinhold Niebuhr has warned of “a deep layer of Messianic consciousness in the mind of America” and of “the depth of evil to which individuals and communities may sink particularly when they try to play the role of God in history.”

Kissinger in an interview with the Financial Times this past week:

It is imperative to realize that we cannot do in China in the 21st century what others thought to do in the 19th, prescribe their institutions for them and seek to organize Asia.  The Chinese peole have undergone huge changes since 1971.  The China of 2008 is totally different from the one I first visited. The Communist party is different and though we need not agree with every action taken by Chinese leaders, we cannot simply set ourselves up as their critics.

Weird barrel music device

I’m at IFTF’s Future of Making conference, and got to chat with Hideo Tamura, editor of the Japanese edition of Make Magazine.   Make: Japan is a selection from the US Make plus some original Japanese content.  He showed me a bunch of videos from the recent Maker Faire in Japan, including this weird barrel input device for making music.


Babelfish’s attempt at translating: “You can adjust also volume with the strength which pushes. It is the truly rational musical instrument.”