[Experimental]

Predict doublets in a Seurat object by simulating doublets. Doublet detection can be used to identify cell-cell aggregates or technical doublet artefacts in single-cell omics data.

PCA is performed on the real and simulated cells together, and a nearest neighbor search then counts the simulated doublets in the neighborhood of each cell. The expected doublet rate is calculated based on the simulation rate and the number of nearest neighbors. A binomial test is performed to test whether the number of doublets in the neighborhood of each cell differs significantly from the expected doublet rate. The algorithm is inspired by the DoubletFinder algorithm by McGinnis et al. (2019).

This method of doublet detection assumes that a doublet attains a composite identity of the two cells that form it, i.e. that a doublet constitutes the sum of its parts. This may not always be the case, as cell-cell aggregates could have molecular profiles that differ from those of the two cells that form them. Furthermore, if the two cells that form a doublet are of the same or similar identity, the doublet may not be detected. These are limitations of the method and should be taken into account when interpreting the results.
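The steps described above can be illustrated with a minimal base R sketch on toy data. This is not the package's implementation: the toy count matrix, the log-normalization, the brute-force neighbor search, the one-sided test, and the null probability `simulation_rate / (1 + simulation_rate)` are all assumptions made for illustration.

```r
set.seed(37)

# Toy count matrix (hypothetical data): 50 genes x 60 cells, two populations
n_genes <- 50
n_cells <- 60
rates_a <- rgamma(n_genes, shape = 2)
rates_b <- rgamma(n_genes, shape = 2)
counts <- cbind(
  matrix(rpois(n_genes * 30, rates_a), nrow = n_genes),
  matrix(rpois(n_genes * 30, rates_b), nrow = n_genes)
)
colnames(counts) <- paste0("cell", seq_len(n_cells))

simulation_rate <- 3
n_neighbor <- 30
npcs <- 5

# Simulate doublets as the sum of two randomly chosen real cells
n_sim <- simulation_rate * n_cells
idx1 <- sample(n_cells, n_sim, replace = TRUE)
idx2 <- sample(n_cells, n_sim, replace = TRUE)
sim <- counts[, idx1] + counts[, idx2]
colnames(sim) <- paste0("sim", seq_len(n_sim))

# Joint PCA on log-normalized real and simulated profiles (assumed normalization)
combined <- cbind(counts, sim)
norm <- log1p(t(t(combined) / colSums(combined)) * 1e4)
pcs <- prcomp(t(norm), rank. = npcs)$x

# Brute-force nearest neighbor search; adequate for a toy example only
d <- as.matrix(dist(pcs))
diag(d) <- Inf
is_sim <- rep(c(FALSE, TRUE), c(n_cells, n_sim))
n_sim_nb <- apply(d[seq_len(n_cells), ], 1, function(x) {
  sum(is_sim[order(x)[seq_len(n_neighbor)]])
})

# Assumed null: each neighbor is a simulated doublet with probability r / (1 + r)
p_expected <- simulation_rate / (1 + simulation_rate)

# One-sided binomial test for an excess of simulated neighbors (an assumption)
p_val <- pbinom(n_sim_nb - 1, n_neighbor, p_expected, lower.tail = FALSE)
p_adj <- p.adjust(p_val, method = "BH")
is_doublet <- p_adj < 0.01
```

In this sketch a real cell is flagged when its neighborhood contains more simulated doublets than expected by chance. Because every toy cell is a singlet drawn from one of two populations, few or no cells should be flagged here.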

Usage

PredictDoublets(object, ...)

# S3 method for class 'Matrix'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 3,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.01,
  seed = 37,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 3,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.01,
  seed = 37,
  assay = NULL,
  layer = "counts",
  verbose = TRUE,
  ...
)

Arguments

object

A Seurat object or a count matrix.

...

Additional arguments. Currently not used.

ref_cells1, ref_cells2

A character vector of cell names, or a vector of cell indices, to use as the first and second reference populations. If NULL, all cells are used.

simulation_rate

The rate of doublets to simulate. E.g. 3 means that for each real cell, 3 doublets are simulated.

n_neighbor

The number of nearest neighbors to use for the doublet prediction. Default is 100.

npcs

The number of principal components to use for PCA. Default is 10.

p_adjust_method

The p-value adjustment method to use. Default is "BH".

p_threshold

The p-value threshold to use. Default is 0.01.

seed

The seed to use for reproducibility.

verbose

If TRUE, print messages.

assay

A character with the name of the assay to use.

layer

A character with the name of the layer to use. Default is "counts".
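To show how p_adjust_method and p_threshold could interact, the following sketch applies base R's `p.adjust` to a few hypothetical p-values. It assumes the threshold is compared against the adjusted p-values; the values themselves are made up for illustration.

```r
# Hypothetical raw p-values from per-cell binomial tests
p_values <- c(0.0005, 0.008, 0.02, 0.03, 0.6)

# Benjamini-Hochberg adjustment, matching the default p_adjust_method = "BH"
p_adj <- p.adjust(p_values, method = "BH")

# Calls at the default p_threshold = 0.01, before and after adjustment
p_threshold <- 0.01
called_raw <- p_values < p_threshold
called_adj <- p_adj < p_threshold
```

Here the second cell passes the threshold on its raw p-value but not after adjustment, which is the intended effect of correcting for multiple testing across many cells.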

References

McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst. 2019 Apr 24;8(4):329-337.e4. doi: 10.1016/j.cels.2019.03.003. Epub 2019 Apr 3. PMID: 30954475; PMCID: PMC6853612.