Calculate Local G
local_G.Rd
Arguments
- g
A
tbl_graph
object representing an MPX component- counts
A
dgCMatrix
(sparse matrix) with node marker counts. Raw counts are recommended.- k
An integer value specifying the maximum steps from each node to expand the neighborhood. E.g. with
k = 2
, each node neighborhood will include the direct neighbors and nodes two steps away. Withk = 1
(default), only the direct neighbors will be used.- W
A
dgCMatrix
(sparse matrix) with edge weights. If not provided, the function will use the adjacency matrix of the graph to compute weights.- use_weights
Logical indicating whether to use edge weights. See section weights for details.
- normalize_counts
Logical indicating whether to normalize the counts. If
TRUE
, the counts within each node are converted to proportions.- type
One of "gstari" or "gi". See details below for more information.
- return_p_vals
Compute and return p-values for the local G scores. The p values are also adjusted for multiple testing per marker.
- p_adjust_method
Method to use for multiple testing adjustment. See
p.adjust
for details.- alternative
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
- ...
Additional arguments passed from other methods.
Value
A matrix with local G scores or a list with the following items:
gi_mat: A matrix with local G scores
gi_p_mat: A matrix with p-values (if
return_p_vals = TRUE
)gi_p_adj_mat: A matrix with adjusted p-values (if
return_p_vals = TRUE
)
Details
Local G can for instance be used to detect hot spots for protein markers in a graph, where nodes that are close to each other have similar marker count values. The metric is a Z-score that measures the deviation of the observed local marker expression from the expected marker expression under the null hypothesis of no spatial association. The sign of the score indicates whether the observed marker counts are higher or lower than expected and can therefore be used to identify hot and/or cold spots.
The observed local marker expression for a node is computed by aggregating the weighted marker
expression within its local neighborhood. The local G metric is largely influenced by the choice
of edge weights (see section on weights below) and the size of the local neighborhood k
.
The method implemented here uses incoming transition probabilities for a k-step random walk as
edge weights. By increasing k
, the local neighborhood of each node is expanded, increasing
the "smoothing effect" of the metric, which can be useful to increase the scale of spatial association.
Local G can also be useful for more interpretative visualization of polarized marker expression, as it enhances spatial trends across neighborhoods in the graph, even if the marker counts in individual nodes are sparse.
Definition of \(G_{i}\) ("gi")
Definition of local G (Ord and Getis 1995, equation 6):
$$
Z(G_i)=\dfrac{[\sum_{j=1}^{n}w_{i,j}x_j]-[(\sum_{j=1}^{n}w_{i,j})\bar x^*]}
{s^*\{[(n-1)\sum_{j=1}^{n}w_{i,j}^2-(\sum_{j=1}^{n}w_{i,j})^2]/(n-2)\}^{1/2}},\forall j\neq i
$$
where $$s_i = \sqrt{((\sum_{j=1}^{n}x_j^2)/(n-1))-[\bar x_i]^2},\forall j\neq i$$
and $$\bar x_i=(\sum_{j=1}^{n}x_j)/(n-1),\forall j\neq i$$
Definition of \(G_{i}^{*}\) ("gstari")
In the equation for \(G_{i}\), the condition that \(i\neq j\) is central. \(G_i^*\)
relaxes this constraint, by including \(i\) as a neighbor of itself. This statistic
is expressed as (Ord and Getis 1995, equation 7):
$$
Z(G_i^*)=\dfrac{[\sum_{j=1}^{n}w_{i,j}x_j]-[\sum_{j=1}^{n}w_{i,j}\bar x_i]}
{s_i\{[n\sum_{j=1}^{n}w_{i,j}^2-(\sum_{j=1}^{n}w_{i,j})^2]/(n-1)\}^{1/2}}
$$
where $$s^* = \sqrt{((\sum_{j=1}^{n}x_j^2)/n)-\bar x_i^{*2}}$$
and $$\bar x_i^*=(\sum_{j=1}^{n}x_j)/n$$
The left numerator corresponds to \(G_i\), the right to \(E(G_i)\),
and the denominator to \(Var(G_i)\).
Weights
Weights are calculated for node pairs if use_weights = TRUE
. For a center node \(i\),
we calculate weights for each neighbor \(j \in N(i)\) as the the transition probability
for a k-step walk from \(j\) to \(i\). This helps normalizing the contribution
by each neighbor in a manner that reduces the influence of high degree nodes. It also
ensures that nodes that are further away from the central node in each neighborhood
are given lower weights. The strategy rests on the assumption that node degree correlates
strongly with the node marker count.
In practice, the weights are calculated by normalizing the adjacency matrix of the graph such that each row sums to 1 (also known as the stochastic matrix). By multiplying the transpose of the stochastic matrix with the count matrix, we obtain a "lag matrix" with the weighted marker expression for each node neighborhood which is leveraged in the computation of local G.
When using k > 1
, the k
'th power of the stochastic matrix is used instead. In this
case, the transition probabilities are calculated for a random walk of length 2
or higher.
Since the random walker can reach a larger neighborhood when k > 1
, we will
also include information about the marker expression for nodes that are more than 1 step
away from the center node. Moreover, when k > 1
, the weights for \(i \sim i\)
(self-loops) can become positive. These weights are not allowed in the calculation of
\(G_i\) and are therefore set to 0, and the remaining transition probabilities are normalized
to 1 within each node neighborhood.
The weighting scheme is ignored if W
is provided.
References
Ord, J. K., Getis. A. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application https://doi.org/10.1111/j.1538-4632.1995.tb00912.x
Examples
library(dplyr)
# Load example data as a Seurat object
pxl_file <- system.file("extdata/five_cells",
"five_cells.pxl",
package = "pixelatorR"
)
seur_obj <- ReadMPX_Seurat(pxl_file) %>%
LoadCellGraphs(cells = colnames(.)[1]) %>%
ComputeLayout(layout_method = "pmds", dim = 3)
#> ✔ Created a 'Seurat' object with 5 cells and 80 targeted surface proteins
#> → Loading CellGraphs for 1 cells from sample 1
#> ✔ Successfully loaded 1 CellGraph object(s).
#> ℹ Computing layouts for 1 graphs
cg <- CellGraphs(seur_obj)[[1]]
g <- CellGraphData(cg, "cellgraph")
counts <- CellGraphData(cg, "counts")
xyz <- CellGraphData(cg, "layout")[["pmds_3d"]]
# Compute local G scores
gi_mat <- local_G(g, counts = counts)
# Visualize
scores <- gi_mat[, "CD3E"]
max_abs_val <- max(abs(scores))
node_colors <-
scales::col_quantile(
domain = c(-max_abs_val, max_abs_val),
palette = "RdBu",
probs = seq(0, 0.99, length.out = 11),
reverse = TRUE
)(scores)
#> Warning: Some values were outside the color scale and will be treated as NA
plotly::plot_ly(
xyz,
x = ~x,
y = ~y,
z = ~z,
type = "scatter3d",
mode = "markers",
marker = list(
size = 5,
color = node_colors
)
)