Calculate Local G

Local G is a local spatial statistic that measures the degree of clustering protein marker counts.

Usage

local_G(
  g,
  counts,
  k = 1,
  W = NULL,
  use_weights = TRUE,
  normalize_counts = FALSE,
  type = c("gi", "gstari"),
  return_p_vals = FALSE,
  p_adjust_method = "BH",
  alternative = c("two.sided", "less", "greater"),
  ...
)

Arguments

g: A tbl_graph object representing an MPX/PNA component graph
counts: A dgCMatrix (sparse matrix) with node marker counts. Raw counts are recommended.
k: An integer value specifying the maximum steps from each node to expand the neighborhood. E.g. with k = 2, each node neighborhood will include the direct neighbors and nodes two steps away. With k = 1 (default), only the direct neighbors will be used.
W: A dgCMatrix (sparse matrix) with edge weights. If not provided, the function will use the adjacency matrix of the graph to compute weights.
use_weights: Logical indicating whether to use edge weights. See section weights for details.
normalize_counts: Logical indicating whether to normalize the counts. If TRUE, the counts within each node are converted to proportions.
type: One of "gstari" or "gi". See details below for more information.
return_p_vals: Compute and return p-values for the local G scores. The p values are also adjusted for multiple testing per marker.
p_adjust_method: Method to use for multiple testing adjustment. See p.adjust for details.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
...: Additional arguments passed from other methods.

Value

A matrix with local G scores or a list with the following items:

gi_mat: A matrix with local G scores
gi_p_mat: A matrix with p-values (if return_p_vals = TRUE)
gi_p_adj_mat: A matrix with adjusted p-values (if return_p_vals = TRUE)

Details

Local G can for instance be used to detect hot spots for protein markers in a graph, where nodes that are close to each other have similar marker count values. The metric is a Z-score that measures the deviation of the observed local marker expression from the expected marker expression under the null hypothesis of no spatial association. The sign of the score indicates whether the observed marker counts are higher or lower than expected and can therefore be used to identify hot and/or cold spots.

The observed local marker expression for a node is computed by aggregating the weighted marker expression within its local neighborhood. The local G metric is largely influenced by the choice of edge weights (see section on weights below) and the size of the local neighborhood k. The method implemented here uses incoming transition probabilities for a k-step random walk as edge weights. By increasing k, the local neighborhood of each node is expanded, increasing the "smoothing effect" of the metric, which can be useful to increase the scale of spatial association.

Local G can also be useful for more interpretative visualization of polarized marker expression, as it enhances spatial trends across neighborhoods in the graph, even if the marker counts in individual nodes are sparse.

Definition of $G_{i}$ ("gi")

Definition of local G (Ord and Getis 1995, equation 6): $$ Z(G_i)=\dfrac{[\sum_{j=1}^{n}w_{i,j}x_j]-[(\sum_{j=1}^{n}w_{i,j})\bar x^*]} {s^*\{[(n-1)\sum_{j=1}^{n}w_{i,j}^2-(\sum_{j=1}^{n}w_{i,j})^2]/(n-2)\}^{1/2}},\forall j\neq i $$
where $$s_i = \sqrt{((\sum_{j=1}^{n}x_j^2)/(n-1))-[\bar x_i]^2},\forall j\neq i$$ and $$\bar x_i=(\sum_{j=1}^{n}x_j)/(n-1),\forall j\neq i$$

Definition of $G_{i}^{*}$ ("gstari")

In the equation for $G_{i}$, the condition that $i\neq j$ is central. $G_i^*$ relaxes this constraint, by including $i$ as a neighbor of itself. This statistic is expressed as (Ord and Getis 1995, equation 7): $$ Z(G_i^*)=\dfrac{[\sum_{j=1}^{n}w_{i,j}x_j]-[\sum_{j=1}^{n}w_{i,j}\bar x_i]} {s_i\{[n\sum_{j=1}^{n}w_{i,j}^2-(\sum_{j=1}^{n}w_{i,j})^2]/(n-1)\}^{1/2}} $$
where $$s^* = \sqrt{((\sum_{j=1}^{n}x_j^2)/n)-\bar x_i^{*2}}$$ and $$\bar x_i^*=(\sum_{j=1}^{n}x_j)/n$$ The left numerator corresponds to $G_i$, the right to $E(G_i)$, and the denominator to $Var(G_i)$.

Weights

Weights are calculated for node pairs if use_weights = TRUE. For a center node $i$, we calculate weights for each neighbor $j \in N(i)$ as the the transition probability for a k-step walk from $j$ to $i$. This helps normalizing the contribution by each neighbor in a manner that reduces the influence of high degree nodes. It also ensures that nodes that are further away from the central node in each neighborhood are given lower weights. The strategy rests on the assumption that node degree correlates strongly with the node marker count.

In practice, the weights are calculated by normalizing the adjacency matrix of the graph such that each row sums to 1 (also known as the stochastic matrix). By multiplying the transpose of the stochastic matrix with the count matrix, we obtain a "lag matrix" with the weighted marker expression for each node neighborhood which is leveraged in the computation of local G.

When using k > 1, the k'th power of the stochastic matrix is used instead. In this case, the transition probabilities are calculated for a random walk of length 2 or higher. Since the random walker can reach a larger neighborhood when k > 1, we will also include information about the marker expression for nodes that are more than 1 step away from the center node. Moreover, when k > 1, the weights for $i \sim i$ (self-loops) can become positive. These weights are not allowed in the calculation of $G_i$ and are therefore set to 0, and the remaining transition probabilities are normalized to 1 within each node neighborhood.

The weighting scheme is ignored if W is provided.

References

Ord, J. K., Getis. A. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application https://doi.org/10.1111/j.1538-4632.1995.tb00912.x

Examples

library(dplyr)
# Load example data as a Seurat object
pxl_file <- minimal_mpx_pxl_file()
seur_obj <- ReadMPX_Seurat(pxl_file) %>%
  LoadCellGraphs(cells = colnames(.)[1]) %>%
  ComputeLayout(layout_method = "pmds", dim = 3)
#> ✔ Created a 'Seurat' object with 5 cells and 80 targeted surface proteins
#> →    Loading CellGraphs for 1 cells from sample 1
#> ✔ Successfully loaded 1 CellGraph object(s).
#> ℹ Computing layouts for 1 graphs
cg <- CellGraphs(seur_obj)[[1]]
g <- CellGraphData(cg, "cellgraph")
counts <- CellGraphData(cg, "counts")
xyz <- CellGraphData(cg, "layout")[["pmds_3d"]]

# Compute local G scores
gi_mat <- local_G(g, counts = counts)

# Visualize
scores <- gi_mat[, "CD3E"]
max_abs_val <- max(abs(scores))
node_colors <-
  scales::col_quantile(
    domain = c(-max_abs_val, max_abs_val),
    palette = "RdBu",
    probs = seq(0, 0.99, length.out = 11),
    reverse = TRUE
  )(scores)
#> Warning: Some values were outside the color scale and will be treated as NA
plotly::plot_ly(
  xyz,
  x = ~x,
  y = ~y,
  z = ~z,
  type = "scatter3d",
  mode = "markers",
  marker = list(
    size = 5,
    color = node_colors
  )
)