
Predict doublets in a Seurat object
Predict doublets in a Seurat object by simulating doublets. Doublet detection can be used to identify cell-cell aggregates or technical doublet artefacts in single-cell omics data.
Real and simulated cells are embedded together with PCA, and a nearest neighbor search counts the simulated doublets in the neighborhood of each cell. The expected doublet rate is calculated from the simulation rate and the number of nearest neighbors. A binomial test then assesses whether the number of doublets in the neighborhood of each cell differs significantly from the expected doublet rate. The algorithm is inspired by the DoubletFinder algorithm by McGinnis et al. (2019).
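The procedure described above can be sketched in base R. This is a minimal illustration on toy data, not the package's implementation: the variable names, the log-normalization step, the one-sided alternative, and the expected proportion `simulation_rate / (1 + simulation_rate)` are all assumptions made for the sketch.

```r
# Minimal sketch of simulation-based doublet scoring (base R only).
# All names and modelling choices below are illustrative assumptions.
set.seed(37)

# Toy count matrix: 40 features x 60 cells, two cell types with distinct markers
n_features <- 40
n_cells <- 60
type <- rep(c("A", "B"), each = n_cells / 2)
lambda_A <- c(rep(20, 20), rep(1, 20))
lambda_B <- c(rep(1, 20), rep(20, 20))
counts <- sapply(type, function(t) {
  rpois(n_features, if (t == "A") lambda_A else lambda_B)
})
colnames(counts) <- paste0("cell", seq_len(n_cells))

# Simulate doublets by summing the counts of two randomly drawn cells
simulation_rate <- 3
n_sim <- simulation_rate * n_cells
idx1 <- sample(n_cells, n_sim, replace = TRUE)
idx2 <- sample(n_cells, n_sim, replace = TRUE)
sim <- counts[, idx1] + counts[, idx2]
colnames(sim) <- paste0("sim", seq_len(n_sim))

# Joint PCA on log-normalized real and simulated profiles (assumed normalization)
combined <- cbind(counts, sim)
norm <- log1p(t(t(combined) / colSums(combined)) * 1e4)
pcs <- prcomp(t(norm), center = TRUE, scale. = FALSE)$x[, 1:10]

# For each real cell, count simulated doublets among its nearest neighbors
n_neighbor <- 20
d <- as.matrix(dist(pcs))
is_sim <- rep(c(FALSE, TRUE), c(n_cells, n_sim))
n_sim_nn <- sapply(seq_len(n_cells), function(i) {
  nn <- setdiff(order(d[i, ]), i)[seq_len(n_neighbor)]  # exclude the cell itself
  sum(is_sim[nn])
})

# Binomial test against the assumed expected proportion of simulated neighbors
expected_rate <- simulation_rate / (1 + simulation_rate)
p <- sapply(n_sim_nn, function(k) {
  binom.test(k, n_neighbor, p = expected_rate, alternative = "greater")$p.value
})
p_adj <- p.adjust(p, method = "BH")
predicted_doublet <- p_adj < 0.01
```

Because every real cell in this toy data is a singlet, the sketch is meant to show the data flow (simulate, embed, count neighbors, test, adjust) rather than to benchmark detection.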
This method of doublet detection assumes that a doublet attains a composite identity of the two cells that form it, i.e. that a doublet constitutes the sum of its parts. This may not always be the case, as cell-cell aggregates could have molecular profiles that differ from those of the two cells that form them. Furthermore, if the two cells that form a doublet are of the same or similar identity, the doublet may not be detected. These limitations should be taken into account when interpreting the results.
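The second limitation can be illustrated with a small base R example. It assumes two hypothetical cell types with Poisson-distributed counts and compares relative abundance profiles. A doublet formed from two cells of the same type keeps that type's profile, whereas a doublet formed from two different types does not.

```r
# Illustration only: the cell types and their expected counts are made up.
set.seed(37)

# Hypothetical expected counts for two cell types over 40 features
lambda_A <- c(rep(20, 20), rep(1, 20))
lambda_B <- c(rep(1, 20), rep(20, 20))

cell_A1 <- rpois(40, lambda_A)
cell_A2 <- rpois(40, lambda_A)
cell_B1 <- rpois(40, lambda_B)

# Doublets modelled as the sum of their two constituent cells
homotypic <- cell_A1 + cell_A2
heterotypic <- cell_A1 + cell_B1

# Compare relative abundance profiles with the type A reference profile
rel <- function(x) x / sum(x)
cor_homotypic <- cor(rel(homotypic), rel(lambda_A))
cor_heterotypic <- cor(rel(heterotypic), rel(lambda_A))

cat("homotypic vs type A:  ", round(cor_homotypic, 2), "\n")
cat("heterotypic vs type A:", round(cor_heterotypic, 2), "\n")
```

After normalization the homotypic doublet is nearly indistinguishable from a single type A cell, so it would sit among real singlets in PCA space rather than among simulated heterotypic doublets.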
Usage
PredictDoublets(object, ...)
# S3 method for class 'Matrix'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 3,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.01,
  seed = 37,
  verbose = TRUE,
  ...
)
# S3 method for class 'Seurat'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 3,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.01,
  seed = 37,
  assay = NULL,
  layer = "counts",
  verbose = TRUE,
  ...
)
Arguments
- object
A Seurat object or a count matrix.
- ...
Additional arguments. Currently not used.
- ref_cells1, ref_cells2
A character vector with cell names or indices to use as the first and second reference populations. If NULL, all cells are used.
- simulation_rate
The number of doublets to simulate per real cell. For example, a value of 3 means that 3 doublets are simulated for each real cell. Default is 3.
- n_neighbor
The number of nearest neighbors to use for the doublet prediction. Default is 100.
- npcs
The number of principal components to use for PCA. Default is 10.
- p_adjust_method
The p-value adjustment method to use. Default is "BH".
- p_threshold
The p-value threshold to use. Default is 0.01.
- seed
The seed to use for reproducibility.
- verbose
Whether to print messages. Default is TRUE.
- assay
A character with the name of the assay to use.
- layer
A character with the name of the layer to use. Default is "counts".
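To make the roles of `p_adjust_method` and `p_threshold` concrete, the following base R snippet adjusts a set of made-up p-values with the Benjamini-Hochberg method and applies a 0.01 cutoff. It assumes the threshold is applied to the adjusted p-values, which the text above does not state explicitly.

```r
# Made-up per-cell p-values, for illustration only
p_values <- c(0.0001, 0.002, 0.009, 0.03, 0.2, 0.8)

# Multiple-testing correction with the Benjamini-Hochberg ("BH") method
p_adj <- p.adjust(p_values, method = "BH")

# Assumed decision rule: call a doublet when the adjusted p-value is below the threshold
p_threshold <- 0.01
is_doublet <- p_adj < p_threshold

print(data.frame(p_values, p_adj, is_doublet))
```

The third value is below 0.01 before adjustment but not after, which shows why the choice of adjustment method changes how many cells are flagged.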