Introduction: Batch effect correction

Batch effects come from technical variation across samples. This can often be prevented with good experimental design. When it cannot, there are computational approaches that can help.

Background

Problem: Variation in single-cell and spatial RNA sequencing data is known to be influenced by technical factors. In some cases, these technical factors may confound our ability to measure true biological variation between samples, making it more challenging to address the research question at hand.

Cause: These confounding factors include experimental biases and batch effects. Unavoidable systematic technical biases can include unequal amplification during PCR, cell lysis, reverse transcriptase enzyme efficiency, and stochastic molecular sampling during sequencing. By contrast, batch effects are technical, non-biological sources of variation that affect whole groups of samples at once. A “batch” refers to a group of samples that are processed differently relative to other samples in the experiment.

Solution: Technical factors that potentially lead to batch effects may be avoided with mitigation strategies in the lab and during sequencing. Examples of lab strategies include sampling cells on the same day; using the same handling personnel, reagent lots, protocols, and equipment; and reducing PCR amplification bias. Sequencing strategies can include multiplexing libraries across flow cells. For example, if samples came from two patients, pooling the libraries and sequencing the pool across several flow cells can distribute flow cell-specific variation across both patients' samples, rather than tying it to one patient.
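The two-patient example can be checked with a small script. This is only a sketch: the design tables, the field names (library, patient, flow_cell), and the helper functions are hypothetical, and it assumes the experiment has at least two patients. It flags a design as confounded when any flow cell carries libraries from a single patient.

```python
from collections import defaultdict


def patients_per_flow_cell(design):
    """Map each flow cell to the set of patients whose libraries it carries."""
    groups = defaultdict(set)
    for sample in design:
        groups[sample["flow_cell"]].add(sample["patient"])
    return dict(groups)


def is_confounded(design):
    """Return True if any flow cell carries libraries from only one patient.

    Assumes the experiment includes at least two patients.
    """
    return any(len(p) < 2 for p in patients_per_flow_cell(design).values())


# Hypothetical design 1: each patient's library runs on its own flow cell.
separate = [
    {"library": "lib1", "patient": "P1", "flow_cell": "FC1"},
    {"library": "lib2", "patient": "P2", "flow_cell": "FC2"},
]

# Hypothetical design 2: both libraries are pooled and run on both flow cells.
pooled = [
    {"library": lib, "patient": pat, "flow_cell": fc}
    for lib, pat in [("lib1", "P1"), ("lib2", "P2")]
    for fc in ["FC1", "FC2"]
]

print(is_confounded(separate))  # True
print(is_confounded(pooled))    # False
```

In the first design, a flow cell effect cannot be separated from a patient effect. In the second, each flow cell sees both patients, so flow cell-specific variation is shared between them.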

Computational batch correction aims to remove technical variation from the data so that it does not confound downstream analysis. Several batch correction methods exist, and a number of tools implement them.
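To make the idea concrete, here is a deliberately naive sketch of one possible correction: shifting each batch so that its per-gene mean matches the overall per-gene mean. It is not the method used by any particular tool, the function name and toy data are hypothetical, and it assumes a cells-by-genes matrix of already normalized expression values.

```python
import numpy as np


def center_batches(expr, batches):
    """Shift each batch so its per-gene mean equals the overall per-gene mean.

    expr:    array of shape (n_cells, n_genes), assumed already normalized
    batches: array of shape (n_cells,) holding one batch label per cell
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = expr.copy()
    overall_mean = expr.mean(axis=0)
    for label in np.unique(batches):
        mask = batches == label
        # Remove this batch's offset from the overall mean, gene by gene.
        corrected[mask] += overall_mean - expr[mask].mean(axis=0)
    return corrected


# Toy data: 4 cells x 2 genes; batch "B" is batch "A" shifted up by 2.
expr = np.array([[1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [5.0, 6.0]])
batches = np.array(["A", "A", "B", "B"])
corrected = center_batches(expr, batches)
print(corrected)
```

A simple shift like this removes any difference in batch means, including real biological differences when batch and biology are confounded. That limitation is one reason dedicated methods exist.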

The list below is not comprehensive. New and exciting tools, algorithms, and other resources continue to be released. We compiled this list based on a combination of factors including citations, quality of documentation, functionality/ease of use, and active support.

Tools and Algorithms