Summarize proximity scores — SummarizeProximityScores • pixelatorR

Computes the median or mean proximity score for each protein pair across all group combinations defined by group_vars. This is typically useful when profiling a population of interest, where the goal is to compute representative proximity scores for each protein pair within that population.

The proximity table typically contains a small fraction of the possible protein pairs for each cell. These missing pairs typically have UMI counts below the detection threshold and their proximity scores can therefore be imputed as 0 (0 observed join counts and 0 expected join counts => no deviation from the expected value). SummarizeProximityScores pads the proximity score vectors with 0s for the missing pairs to ensure that the summary statistics are computed for the entire population. This behavior can be turned off by setting include_missing_obs = FALSE.

Usage

SummarizeProximityScores(object, ...)

# S3 method for class 'tbl_lazy'
SummarizeProximityScores(
  object,
  proximity_metric = c("log2_ratio", "join_count_z"),
  group_vars = NULL,
  include_missing_obs = TRUE,
  summary_stat = c("mean", "median"),
  detailed = FALSE,
  ...
)

# S3 method for class 'data.frame'
SummarizeProximityScores(
  object,
  proximity_metric = "log2_ratio",
  group_vars = NULL,
  include_missing_obs = TRUE,
  summary_stat = c("mean", "median"),
  detailed = FALSE,
  ...
)

Arguments

object: A tbl_df or tbl_lazy object with proximity scores.
...: Additional arguments. Currently not used.
proximity_metric: The proximity metric to use. One of "log2_ratio" or "join_count_z".
group_vars: A character vector with the names of the variables to use for grouping This is typically used is you want to summarize the proximity scores for multiple cell types and/or conditions.
include_missing_obs: Logical indicating whether to include missing observations as 0 when computing the summary statistics.
summary_stat: One of "mean" or "median"
detailed: Logical indicating whether to return lists which can be used to compute custom summary statistics. See examples for details

Value

A tbl_df with summary statistics

Examples

library(pixelatorR)
library(dplyr)

pxl_file <- minimal_pna_pxl_file()
se <- ReadPNA_Seurat(pxl_file)
#> ✔ Created a <Seurat> object with 5 cells and 158 targeted surface proteins
proximity_table <- ProximityScores(se)

# Default method uses mean
SummarizeProximityScores(proximity_table) %>% head()
#> # A tibble: 6 × 7
#>   marker_1 marker_2 n_cells_detected n_cells n_cells_missing pct_detected
#>   <chr>    <chr>               <int>   <int>           <int>        <dbl>
#> 1 CD56     CD82                    4       5               1          0.8
#> 2 CD56     TCRab                   4       5               1          0.8
#> 3 CD366    CD4                     5       5               0          1  
#> 4 CD366    CD50                    5       5               0          1  
#> 5 CD366    CD47                    5       5               0          1  
#> 6 CD366    CD45RA                  5       5               0          1  
#> # ℹ 1 more variable: mean_log2_ratio <dbl>

# Switch to median
SummarizeProximityScores(proximity_table, summary_stat = "median") %>% head()
#> # A tibble: 6 × 7
#>   marker_1 marker_2 n_cells_detected n_cells n_cells_missing pct_detected
#>   <chr>    <chr>               <int>   <int>           <int>        <dbl>
#> 1 CD56     CD8                     4       5               1          0.8
#> 2 CD56     mIgG2a                  4       5               1          0.8
#> 3 CD56     CD94                    4       5               1          0.8
#> 4 CD56     VISTA                   4       5               1          0.8
#> 5 CD56     TCRVd2                  3       5               2          0.6
#> 6 CD366    CD93                    5       5               0          1  
#> # ℹ 1 more variable: median_log2_ratio <dbl>

# Ignore missing values
SummarizeProximityScores(proximity_table, include_missing_obs = FALSE) %>% head()
#> # A tibble: 6 × 7
#>   marker_1 marker_2 n_cells_detected n_cells n_cells_missing pct_detected
#>   <chr>    <chr>               <int>   <int>           <int>        <dbl>
#> 1 CD56     CD82                    4       5               1          0.8
#> 2 CD56     TCRab                   4       5               1          0.8
#> 3 CD366    CD4                     5       5               0          1  
#> 4 CD366    CD50                    5       5               0          1  
#> 5 CD366    CD47                    5       5               0          1  
#> 6 CD366    CD45RA                  5       5               0          1  
#> # ℹ 1 more variable: mean_log2_ratio <dbl>

# Return lists which can be used to compute custom summary statistics
SummarizeProximityScores(proximity_table, detailed = TRUE) %>%
  # It's important to do rowwise computations
  rowwise() %>%
  mutate(
    sd = sd(unlist(log2_ratio_list)),
    iqr = IQR(unlist(log2_ratio_list)),
    mad = mad(unlist(log2_ratio_list)),
    q90 = quantile(unlist(log2_ratio_list), 0.9)
  ) %>%
  select(marker_1, marker_2, sd, iqr, mad, q90) %>%
  ungroup()
#> # A tibble: 12,561 × 6
#>    marker_1 marker_2    sd   iqr   mad   q90
#>    <chr>    <chr>    <dbl> <dbl> <dbl> <dbl>
#>  1 CD56     NKp80    0     0     0     0    
#>  2 CD56     HLA-DQ   0     0     0     0    
#>  3 CD56     CD90     0     0     0     0    
#>  4 CD56     CD58     0     0     0     0    
#>  5 CD366    mIgG1    0     0     0     0    
#>  6 CD366    CD45RO   0     0     0     0    
#>  7 CD52     CD71     0.537 0.299 0.443 0.600
#>  8 CD52     CD59     0.458 0.608 0.684 0.887
#>  9 CD52     CX3CR1   1.35  0.669 0     0    
#> 10 CD37     CD37     0.917 0     0     1.23 
#> # ℹ 12,551 more rows