
Predict doublets in a Seurat object
Predict doublets in a Seurat object or a count matrix by simulating doublets. Doublet detection can be used to identify cell-cell aggregates or technical doublet artefacts in single-cell omics data.
Both real cells and simulated doublets are used to compute a PCA, and a nearest neighbor search in PCA space is used to count the simulated doublets in the neighborhood of each cell. The expected doublet rate is calculated from the simulation rate and the number of nearest neighbors. A binomial test is then used to test whether the number of simulated doublets in the neighborhood of each cell differs significantly from this expected rate. The algorithm is inspired by the DoubletFinder algorithm by McGinnis et al. (2019).
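The steps above can be sketched in base R on a toy count matrix. This is a minimal illustration under stated assumptions, not the package implementation: the log-normalisation, PCA via `prcomp()`, the brute-force Euclidean neighbor search via `dist()`, and the one-sided binomial test (`alternative = "greater"`) are all choices made for this sketch.

```r
# Minimal base-R sketch of simulation-based doublet scoring on toy data.
# Assumptions for illustration only (not the package implementation):
# log-normalisation, prcomp() PCA, dist()-based kNN, one-sided binom.test().
set.seed(37)

# Toy count matrix: 50 genes (rows) x 200 cells (columns)
counts <- matrix(rpois(50 * 200, lambda = 5), nrow = 50,
                 dimnames = list(paste0("g", 1:50), paste0("cell", 1:200)))
simulation_rate <- 1
n_neighbor <- 20
npcs <- 10

# 1. Simulate doublets by summing the counts of two randomly drawn cells
n_sim <- round(simulation_rate * ncol(counts))
parent1 <- sample.int(ncol(counts), n_sim, replace = TRUE)
parent2 <- sample.int(ncol(counts), n_sim, replace = TRUE)
sim <- counts[, parent1] + counts[, parent2]
colnames(sim) <- paste0("sim", seq_len(n_sim))

# 2. Normalise real and simulated cells together, then run PCA on cells
combined <- cbind(counts, sim)
norm <- log1p(t(t(combined) / colSums(combined)) * 1e4)
pcs <- prcomp(t(norm), rank. = npcs)$x

# 3. Count simulated doublets among each real cell's nearest neighbours
is_sim <- rep(c(FALSE, TRUE), c(ncol(counts), n_sim))
d <- as.matrix(dist(pcs))
diag(d) <- Inf  # a cell is not its own neighbour
doublet_nns <- vapply(seq_len(ncol(counts)), function(i) {
  nn <- order(d[i, ])[seq_len(n_neighbor)]
  sum(is_sim[nn])
}, integer(1))

# 4. Binomial test against the expected share of simulated cells
expected_rate <- n_sim / (ncol(counts) + n_sim)
doublet_p <- vapply(doublet_nns, function(k) {
  binom.test(k, n_neighbor, p = expected_rate, alternative = "greater")$p.value
}, numeric(1))
doublet_p_adj <- p.adjust(doublet_p, method = "BH")
doublet_prediction <- ifelse(doublet_p_adj < 0.05, "doublet", "singlet")
table(doublet_prediction)
```

Because the toy matrix has no cell-type structure, the resulting calls only demonstrate the mechanics; they say nothing about detection performance.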
This method of doublet detection assumes that a doublet attains a composite identity of the two cells that form it, i.e. that a doublet constitutes the sum of its parts. This may not always be the case, as cell-cell aggregates could have molecular profiles that differ from those of the two cells that form them. Furthermore, if the two cells that form a doublet are of the same or similar identity, the doublet may not be detected. This is a limitation of the method and should be taken into account when interpreting the results.
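The composite-identity assumption, and why doublets formed from two cells of the same type are hard to detect, can be illustrated with noise-free expression profiles. The two cell types and their counts below are invented for illustration, and library-size normalisation (dividing by total counts) is an assumed preprocessing step.

```r
# Two hypothetical cell types with distinct, noise-free expression profiles
type_a <- c(gene1 = 90, gene2 = 10, gene3 = 0)
type_b <- c(gene1 = 0,  gene2 = 10, gene3 = 90)

# Library-size normalisation: proportion of total counts per gene
prop <- function(x) x / sum(x)

# Heterotypic doublet: the composite profile lies between both parents,
# so it is far from either pure cell type
hetero <- prop(type_a + type_b)  # gene1 = 0.45, gene2 = 0.10, gene3 = 0.45

# Homotypic doublet: summing two cells of the same type only doubles the
# total counts; after normalisation it is identical to a single type_a cell
homo <- prop(type_a + type_a)
all.equal(homo, prop(type_a))    # TRUE
```

In real data, the higher total counts of a homotypic doublet might still leave a trace, but its normalised profile offers little to separate it from its parent type.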
Usage
PredictDoublets(object, ...)

# S3 method for class 'Matrix'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 1,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.05,
  seed = 37,
  iter = 1,
  return_trials = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
PredictDoublets(
  object,
  ref_cells1 = NULL,
  ref_cells2 = NULL,
  simulation_rate = 1,
  n_neighbor = 100,
  npcs = 10,
  p_adjust_method = "BH",
  p_threshold = 0.05,
  seed = 37,
  iter = 1,
  assay = NULL,
  layer = "counts",
  verbose = TRUE,
  ...
)
Arguments
- object
A Seurat object or a count matrix.
- ...
Additional arguments. Currently not used.
- ref_cells1, ref_cells2
A character vector with cell names or indices to use as the first and second reference populations. If NULL, all cells are used.
- simulation_rate
The number of doublets to simulate per real cell. For example, 3 means that 3 doublets are simulated for each real cell. Default is 1.
- n_neighbor
The number of nearest neighbors to use for the doublet prediction. Default is 100.
- npcs
The number of principal components to use for PCA. Default is 10.
- p_adjust_method
The p-value adjustment method to use. Default is "BH".
- p_threshold
The p-value threshold used to classify cells as doublets. Default is 0.05.
- seed
The seed to use for reproducibility.
- iter
The number of iterations to use for the doublet simulation. Increasing the number of iterations will increase the robustness of the doublet detection.
- return_trials
Whether to return the result from each iteration (TRUE) or an aggregated summary of all iterations (FALSE). Default is FALSE.
- verbose
Whether to print messages.
- assay
A character with the name of the assay to use.
- layer
A character with the name of the layer to use. Default is "counts".
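The sketch below illustrates how the simulation arguments could interact. The helpers `sample_parents()` and `expected_doublet_nns()` are hypothetical and not part of the package. Two further points are assumptions: that each simulated doublet pairs one cell from `ref_cells1` with one from `ref_cells2`, and that the expected count is `n_neighbor * simulation_rate / (1 + simulation_rate)`, i.e. the share of simulated doublets among all real and simulated cells.

```r
# Hypothetical helper (not part of the package): draw parent pairs for
# simulated doublets, one parent from each reference population (assumed).
sample_parents <- function(cells, ref_cells1 = NULL, ref_cells2 = NULL,
                           simulation_rate = 1) {
  # NULL reference populations default to all cells
  if (is.null(ref_cells1)) ref_cells1 <- cells
  if (is.null(ref_cells2)) ref_cells2 <- cells
  n_sim <- round(simulation_rate * length(cells))
  # Index via sample.int() so that length-1 inputs are handled safely
  data.frame(
    parent1 = ref_cells1[sample.int(length(ref_cells1), n_sim, replace = TRUE)],
    parent2 = ref_cells2[sample.int(length(ref_cells2), n_sim, replace = TRUE)]
  )
}

# Hypothetical helper: expected number of simulated doublets among
# n_neighbor neighbours if neighbourhoods were random draws (assumed model)
expected_doublet_nns <- function(n_neighbor, simulation_rate) {
  n_neighbor * simulation_rate / (1 + simulation_rate)
}

set.seed(37)
cells <- paste0("cell", 1:10)
pairs <- sample_parents(cells, ref_cells1 = cells[1:5],
                        ref_cells2 = cells[6:10], simulation_rate = 3)
nrow(pairs)                    # 30 simulated doublets for 10 real cells
expected_doublet_nns(100, 1)   # 50
expected_doublet_nns(100, 3)   # 75
```

Under this model, a higher `simulation_rate` raises the expected number of simulated neighbors, while a larger `n_neighbor` gives the binomial test more trials to work with.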
Value
A tibble with the following columns:
- trial
Integer or factor indicating the resampling trial. Only returned if return_trials = TRUE.
- id
Cell ID.
- doublet_nns
Number of nearest neighbors that are simulated doublets.
- doublet_nn_rate
Proportion of nearest neighbors that are simulated doublets. Only returned if return_trials = FALSE.
- doublet_vote
The fraction of iterations in which the cell was classified as a doublet. Only returned if return_trials = FALSE.
- doublet_p
Raw p-value for the doublet prediction.
- doublet_p_adj
Adjusted p-value (multiple testing correction) for the doublet prediction.
- logratio
Log2-ratio of the observed number of simulated doublet neighbors to the expected number.
- doublet_prediction
Predicted doublet status (doublet/singlet).
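The sketch below makes the relationship between the per-trial and summary columns concrete. It derives a neighbor rate, a log2-ratio and a doublet vote from a small, invented per-trial table. Three things are assumptions for illustration: averaging the neighbor rate across trials, the expected count `n_neighbor * simulation_rate / (1 + simulation_rate)`, and the column name `cell`.

```r
# Invented per-trial results for two cells over three trials
# (the kind of table returned with return_trials = TRUE); values are made up.
trials <- data.frame(
  trial = rep(1:3, each = 2),
  cell = rep(c("cellA", "cellB"), times = 3),
  doublet_nns = c(80, 45, 76, 52, 83, 48),
  doublet_prediction = c("doublet", "singlet", "doublet",
                         "singlet", "doublet", "singlet")
)
n_neighbor <- 100
simulation_rate <- 1
# Assumed expected number of simulated-doublet neighbours (50 here)
expected_nns <- n_neighbor * simulation_rate / (1 + simulation_rate)

# Per-trial log2-ratio of observed vs. expected simulated-doublet neighbours
trials$logratio <- log2(trials$doublet_nns / expected_nns)

# Summary per cell: mean neighbour rate and fraction of doublet calls
nn_rate <- tapply(trials$doublet_nns / n_neighbor, trials$cell, mean)
vote <- tapply(trials$doublet_prediction == "doublet", trials$cell, mean)
summary_tbl <- data.frame(cell = names(vote),
                          doublet_nn_rate = as.vector(nn_rate),
                          doublet_vote = as.vector(vote))
summary_tbl
```

A vote of 1 means the cell was called a doublet in every trial, so with `iter > 1` the vote offers a simple way to separate consistent calls from borderline ones.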