Identify post-hoc neighbourhood marker genes

This function will perform differential gene expression analysis on differentially abundant neighbourhoods, by first aggregating adjacent and concordantly DA neighbourhoods, then comparing cells between these aggregated groups. For differential gene experession based on an input design within DA neighbourhoods see testDiffExp.

Arguments

x	A `Milo` object containing single-cell gene expression and neighbourhoods.
da.res	A `data.frame` containing DA results, as expected from running `testNhoods`.
da.fdr	A numeric scalar that determines at what FDR neighbourhoods are declared DA for the purposes of aggregating across concorantly DA neighbourhoods.
assay	A character scalar determining which `assays` slot to extract from the `Milo` object to use for DGE testing.
aggregate.samples	logical indicating wheather the expression values for cells in the same sample and neighbourhood group should be merged for DGE testing. This allows to perform testing exploiting the replication structure in the experimental design, rather than treating single-cells as independent replicates. The function used for aggregation depends on the selected gene expression assay: if `assay="counts"` the expression values are summed, otherwise we take the mean.
sample_col	a character scalar indicating the column in the colData storing sample information (only relevant if `aggregate.samples==TRUE`)
overlap	A scalar integer that determines the number of cells that must overlap between adjacent neighbourhoods for merging.
lfc.threshold	A scalar that determines the absolute log fold change above which neighbourhoods should be considerd 'DA' for merging. Default=NULL
merge.discord	A logical scalar that overrides the default behaviour and allows adjacent neighbourhoods to be merged if they have discordant log fold change signs. Using this argument is generally discouraged, but may be useful for constructing an empirical null group of cells, regardless of DA sign.
subset.row	A logical, integer or character vector indicating the rows of `x` to use for sumamrizing over cells in neighbourhoods.
gene.offset	A logical scalar the determines whether a per-cell offset is provided in the DGE GLM to adjust for the number of detected genes with expression > 0.
return.groups	A logical scalar that returns a `data.frame` of the aggregated groups per single-cell. Cells that are members of non-DA neighbourhoods contain `NA` values.
subset.nhoods	A logical, integer or character vector indicating which neighbourhoods to subset before aggregation and DGE testing.
na.function	A valid NA action function to apply, should be one of `na.fail, na.omit, na.exclude, na.pass`.
compute.new	A logical scalar indicating whether to force computing a new neighbourhood adjacency matrix if already present.

Value

A data.frame of DGE results containing a log fold change and adjusted p-value for each aggregated group of neighbourhoods. If return.groups then the return value is a list with the slots groups and dge containing the aggregated neighbourhood groups per single-cell and marker gene results, respectively.

Warning: If all neighbourhoods are grouped together, then it is impossible to run findNhoodMarkers. In this (hopefully rare) instance, this function will spit out a warning and return NULL.

Details

Adjacent neighbourhoods are first merged based on two criteria: 1) they share at least overlap number of cells, and 2) the DA log fold change sign is concordant. This behaviour can be modulated by setting overlap to be more or less stringent. Additionally, a threshold on the log fold-changes can be set, such that lfc.threshold is required to merge adjacent neighbourhoods. Note: adjacent neighbourhoods will never be merged with opposite signs.

Using a one vs. all approach, each aggregated group of cells is compared to all others using the single-cell log normalized gene expression with a GLM (for details see limma-package), or the single-cell counts using a negative binomial GLM (for details see edgeR-package). When using the latter it is recommended to set gene.offset=TRUE as this behaviour adjusts the model offsets by the number of detected genes in each cell.

Examples

library(SingleCellExperiment)
ux.1 <- matrix(rpois(12000, 5), ncol=400)
ux.2 <- matrix(rpois(12000, 4), ncol=400)
ux <- rbind(ux.1, ux.2)
vx <- log2(ux + 1)
pca <- prcomp(t(vx))

sce <- SingleCellExperiment(assays=list(counts=ux, logcounts=vx),
                            reducedDims=SimpleList(PCA=pca$x))
colnames(sce) <- paste0("Cell", 1:ncol(sce))
milo <- Milo(sce)
milo <- buildGraph(milo, k=20, d=10, transposed=TRUE)
#> Constructing kNN graph with k:20
milo <- makeNhoods(milo, k=20, d=10, prop=0.3)
#> Checking valid object
milo <- calcNhoodDistance(milo, d=10)

cond <- rep("A", ncol(milo))
cond.a <- sample(1:ncol(milo), size=floor(ncol(milo)*0.25))
cond.b <- setdiff(1:ncol(milo), cond.a)
cond[cond.b] <- "B"
meta.df <- data.frame(Condition=cond, Replicate=c(rep("R1", 132), rep("R2", 132), rep("R3", 136)))
meta.df$SampID <- paste(meta.df$Condition, meta.df$Replicate, sep="_")
milo <- countCells(milo, meta.data=meta.df, samples="SampID")
#> Checking meta.data validity
#> Setting up matrix with 69 neighbourhoods
#> Counting cells in neighbourhoods

test.meta <- data.frame("Condition"=c(rep("A", 3), rep("B", 3)), "Replicate"=rep(c("R1", "R2", "R3"), 2))
test.meta$Sample <- paste(test.meta$Condition, test.meta$Replicate, sep="_")
rownames(test.meta) <- test.meta$Sample
da.res <- testNhoods(milo, design=~0 + Condition, design.df=test.meta[colnames(nhoodCounts(milo)), ])
#> Performing spatial FDR correction withk-distance weightingPerforming spatial FDR correction withneighbour-distance weightingPerforming spatial FDR correction withedge weightingPerforming spatial FDR correction withvertex weightingPerforming spatial FDR correction withnone weighting

nhood.dge <- findNhoodMarkers(milo, da.res, overlap=1, compute.new=TRUE)
#> Found 37 DA neighbourhoods at FDR 10%
#> Calculating nhood adjacency
#> Nhoods aggregated into 1 groups
#> Warning: All graph neighbourhoods are in the same group - cannot perform DGE testing. Returning NULL
nhood.dge
#> NULL