[Experimental]

Predict doublets in a Seurat object by simulating doublets. Doublet detection can be used to identify cell-cell aggregates or technical doublet artefacts in single-cell omics data.

PCA is performed on the real and simulated cells together, followed by a nearest neighbor search that counts the simulated doublets in the neighborhood of each cell. The expected doublet rate is calculated based on the simulation rate and the number of nearest neighbors. A binomial test is then used to test whether the number of doublets in the neighborhood of each cell differs significantly from the expected doublet rate. The algorithm is inspired by the DoubletFinder algorithm by McGinnis et al. (2019).
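The steps described above can be sketched in base R. This is only an illustration on toy data, not the implementation behind PredictDoublets: the toy count matrix, the log-normalization, the brute-force neighbor search, the formula for the expected rate (`n_sim / (n_cells + n_sim)`) and the one-sided test direction are all assumptions made for the example.

```r
# Illustrative sketch only; none of these names are package internals.
set.seed(37)

# Toy count matrix: 50 genes x 200 cells drawn from two artificial cell types
n_genes <- 50
n_cells <- 200
mu_a <- rgamma(n_genes, shape = 2, rate = 0.2)
mu_b <- rgamma(n_genes, shape = 2, rate = 0.2)
cell_type <- rep(c("a", "b"), each = n_cells / 2)
counts <- sapply(cell_type, function(ct) {
  rpois(n_genes, if (ct == "a") mu_a else mu_b)
})
colnames(counts) <- paste0("cell", seq_len(n_cells))

# Simulate doublets by summing the counts of two randomly drawn cells
simulation_rate <- 1
n_sim <- round(simulation_rate * n_cells)
i1 <- sample(n_cells, n_sim, replace = TRUE)
i2 <- sample(n_cells, n_sim, replace = TRUE)
sim <- counts[, i1] + counts[, i2]
colnames(sim) <- paste0("sim", seq_len(n_sim))

# Joint PCA on log-normalized real and simulated profiles (assumed preprocessing)
combined <- cbind(counts, sim)
lognorm <- log1p(t(t(combined) / colSums(combined)) * 1e4)
npcs <- 10
pcs <- prcomp(t(lognorm), center = TRUE, scale. = FALSE, rank. = npcs)$x

# Brute-force nearest neighbor search; each cell is excluded from its own neighborhood
n_neighbor <- 20
d <- as.matrix(dist(pcs))
diag(d) <- Inf
is_sim <- rep(c(FALSE, TRUE), c(n_cells, n_sim))
doublet_nns <- vapply(seq_len(n_cells), function(i) {
  nn <- order(d[i, ])[seq_len(n_neighbor)]
  sum(is_sim[nn])
}, integer(1))

# Expected share of simulated neighbors under random mixing (assumption)
p_expected <- n_sim / (n_cells + n_sim)

# One-sided binomial test per real cell (assumed direction), then BH correction
doublet_p <- vapply(doublet_nns, function(k) {
  binom.test(k, n_neighbor, p = p_expected, alternative = "greater")$p.value
}, numeric(1))
doublet_p_adj <- p.adjust(doublet_p, method = "BH")
doublet_prediction <- ifelse(doublet_p_adj < 0.05, "doublet", "singlet")
table(doublet_prediction)
```

The full distance matrix keeps the sketch dependency-free, but it scales quadratically with the number of cells, so a dedicated nearest neighbor library would be preferable on real data.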

This method of doublet detection assumes that a doublet attains a composite identity of the two cells that form it, i.e. that a doublet constitutes the sum of its parts. This may not always be the case, as cell-cell aggregates could have molecular profiles that differ from those of the two cells that form them. Furthermore, if the two cells that form a doublet are of the same or similar identity, the doublet may not be detected. These limitations should be taken into account when interpreting the results.
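A small numeric illustration of the same-identity limitation, using made-up expression profiles and simple library-size normalization (both are assumptions for the example): once total counts are normalized away, a doublet of two identical cells is indistinguishable from a singlet of that type, whereas a doublet of two different types is not.

```r
# Two hypothetical cell-type expression profiles (arbitrary toy values)
type_a <- c(g1 = 10, g2 = 0, g3 = 5, g4 = 1)
type_b <- c(g1 = 0, g2 = 8, g3 = 5, g4 = 6)

# Library-size normalization removes the difference in total counts
normalize <- function(x) x / sum(x)

homotypic <- type_a + type_a    # doublet of two type A cells
heterotypic <- type_a + type_b  # doublet of one type A and one type B cell

# The homotypic doublet has the same normalized profile as a type A singlet
same_as_singlet <- isTRUE(all.equal(normalize(homotypic), normalize(type_a)))

# The heterotypic doublet sits at a non-zero distance from both parent profiles
dist_a <- sqrt(sum((normalize(heterotypic) - normalize(type_a))^2))
dist_b <- sqrt(sum((normalize(heterotypic) - normalize(type_b))^2))
```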

Usage

PredictDoublets(object, ...)

# S3 method for class 'Matrix'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 1,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.05,
  seed = 37,
  iter = 1,
  return_trials = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 1,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.05,
  seed = 37,
  iter = 1,
  assay = NULL,
  layer = "counts",
  verbose = TRUE,
  ...
)

Arguments

object

A Seurat object or a count matrix.

...

Additional arguments. Currently not used.

ref_cells1, ref_cells2

Vectors of cell names or indices to use as the first and second reference populations, respectively. If NULL, all cells are used.

simulation_rate

The rate of doublets to simulate. E.g. 3 means that for each real cell, 3 doublets are simulated.

n_neighbor

The number of nearest neighbors to use for the doublet prediction. Default is 100.

npcs

The number of principal components to use for PCA. Default is 10.

p_adjust_method

The p-value adjustment method to use. Default is "BH".

p_threshold

The p-value threshold to use. Default is 0.05.

seed

The seed to use for reproducibility.

iter

The number of iterations to use for the doublet simulation. Increasing the number of iterations will increase the robustness of the doublet detection.

return_trials

Whether to return the result from each iteration (TRUE) or an aggregated summary of all iterations (FALSE). Default is FALSE.

verbose

Whether to print messages. Default is TRUE.

assay

A character with the name of the assay to use.

layer

A character with the name of the layer to use. Default is "counts".
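To illustrate how p_adjust_method and p_threshold interact, the following sketch applies base R's `p.adjust()` to a handful of made-up p-values. The strict `<` comparison against the threshold is an assumption; the function itself may treat the boundary differently.

```r
# Hypothetical raw p-values for five cells
doublet_p <- c(0.001, 0.008, 0.02, 0.045, 0.3)
p_threshold <- 0.05

# Benjamini-Hochberg adjustment, corresponding to p_adjust_method = "BH"
doublet_p_adj <- p.adjust(doublet_p, method = "BH")

# Calls before and after multiple-testing correction
n_raw <- sum(doublet_p < p_threshold)
n_adj <- sum(doublet_p_adj < p_threshold)
```

Here four cells pass the threshold on the raw p-values, but only three remain after adjustment, because the fourth adjusted value rises above 0.05.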

Value

A tibble with the following columns:

  • trial Integer or factor indicating the resampling trial. Only returned if return_trials = TRUE.

  • id Cell ID.

  • doublet_nns Number of nearest neighbors that are simulated doublets.

  • doublet_nn_rate Proportion of nearest neighbors that are simulated doublets. Only returned if return_trials = FALSE.

  • doublet_vote The fraction of iterations where the cell has been classified as a doublet. Only returned if return_trials = FALSE.

  • doublet_p Raw p-value for the doublet prediction.

  • doublet_p_adj Adjusted p-value (multiple testing correction) for the doublet prediction.

  • logratio Log2-ratio of observed simulated doublet neighbors compared to expectation.

  • doublet_prediction Predicted doublet status (doublet/singlet).
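As a rough illustration of how per-iteration results relate to the aggregated doublet_vote column, the sketch below computes the fraction of iterations in which each cell is called a doublet. The input is a made-up data frame that mimics the shape of the output with return_trials = TRUE; it is not output from the function.

```r
# Mock per-trial results for three cells across three iterations (toy values)
trials <- data.frame(
  trial = rep(1:3, each = 3),
  id = rep(c("cell1", "cell2", "cell3"), times = 3),
  doublet_prediction = c(
    "doublet", "singlet", "singlet",
    "doublet", "singlet", "doublet",
    "doublet", "singlet", "singlet"
  )
)

# Fraction of iterations in which each cell is classified as a doublet
votes <- tapply(trials$doublet_prediction == "doublet", trials$id, mean)
summary_tbl <- data.frame(
  id = names(votes),
  doublet_vote = as.numeric(votes)
)
summary_tbl
```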

References

McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst. 2019 Apr 24;8(4):329-337.e4. doi: 10.1016/j.cels.2019.03.003. Epub 2019 Apr 3. PMID: 30954475; PMCID: PMC6853612.