fastMNN algorithm
It is possible for the batch effect to vary in direction across subpopulations without violating the assumption of orthogonality to the biological subspace. In such cases, the orthogonalization step performed by fastMNN() is not effective at resolving the kissing problem for subpopulations whose batch vectors differ from the average batch vector. This results in incomplete mixing of batches within each cluster, which is usually harmless but not aesthetically pleasing. The solution is to increase k, ideally to the anticipated average size of each cluster.
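The point about subpopulation-specific batch vectors can be illustrated numerically. The following is a minimal sketch in plain Python, not the code used by fastMNN(); the two batch vectors and the helper names are hypothetical. It removes the component along the average batch vector and shows that each subpopulation keeps a nonzero residual.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(vec, direction):
    """Return vec with its component along `direction` projected out."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(vec, direction)]


# Hypothetical batch vectors for two subpopulations (illustrative values).
batch_a = [1.0, 0.5]
batch_b = [1.0, -0.5]

# Average batch vector across the two subpopulations: [1.0, 0.0].
average = [(a + b) / 2 for a, b in zip(batch_a, batch_b)]

# Orthogonalizing against the average removes the shared component only.
residual_a = remove_component(batch_a, average)      # [0.0, 0.5]
residual_b = remove_component(batch_b, average)      # [0.0, -0.5]
residual_avg = remove_component(average, average)    # [0.0, 0.0]

print(residual_a, residual_b, residual_avg)
```

The residuals for the two subpopulations are the parts of their batch vectors that orthogonalization along the average direction cannot remove, which is consistent with the incomplete mixing described above.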
An interesting consequence of the orthogonalization step is that fastMNN() may not work correctly in the absence of a batch effect. In such cases, the batch vector will be of near-zero length and will point in some random direction. If this direction is parallel to the biological subspace, orthogonalization may end up removing genuine biology. This is a natural side-effect of the orthogonality assumption, which obviously fails if there is no batch vector in the first place. In practice, this is unlikely to be a major problem as a random vector is still likely to be orthogonal to any one biological dimension. If this is not the case, we should be able to observe a large loss of variance that indicates that fastMNN() should not be run.
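To make the loss-of-variance argument concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions rather than the fastMNN() implementation: the toy cells vary only along the first axis (standing in for the biological subspace), and the helper names are hypothetical. It compares the fraction of variance lost when the removed direction is parallel versus orthogonal to that axis.

```python
def total_variance(points):
    """Sum of per-dimension sample variances."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dims)]
    squares = sum((p[j] - means[j]) ** 2 for p in points for j in range(dims))
    return squares / (n - 1)


def project_out(points, direction):
    """Remove the component along `direction` from every point."""
    norm_sq = sum(d * d for d in direction)
    result = []
    for p in points:
        scale = sum(a * d for a, d in zip(p, direction)) / norm_sq
        result.append([a - scale * d for a, d in zip(p, direction)])
    return result


def lost_variance(points, direction):
    """Fraction of total variance removed by orthogonalization."""
    before = total_variance(points)
    after = total_variance(project_out(points, direction))
    return 1 - after / before


# Hypothetical cells whose biology varies only along the first axis.
cells = [[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]

lost_parallel = lost_variance(cells, [1.0, 0.0])    # direction inside the biology
lost_orthogonal = lost_variance(cells, [0.0, 1.0])  # direction orthogonal to it

print(lost_parallel, lost_orthogonal)
```

In this toy case all of the variance is lost when the removed direction lies along the biological axis, and none is lost when it is orthogonal, which is why a large loss of variance is a useful warning sign.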
fastMNN() can also be instructed to skip the correction if the relative batch effect size is below some threshold. The relative size is defined as the ratio of the L2 norm of the average correction vector to the expected L2 norm of the per-pair vectors. This is small if there is no batch effect as the per-pair vectors will point in different directions. If large losses of variance at particular merge steps are suspected to be caused by the lack of a batch effect, we recommend examining the relative effect sizes and picking a threshold that allows one to skip those steps.
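The definition of the relative batch effect size can be sketched as follows in plain Python. This is a hedged illustration, not the package's code: the expected L2 norm is estimated here as the mean of the per-pair norms, which is an assumption, and the function names, toy vectors and threshold value are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def relative_batch_size(pair_vectors):
    """Norm of the average vector divided by the mean per-pair norm.

    Assumption: the "expected L2 norm" is estimated as the mean norm.
    """
    n = len(pair_vectors)
    dims = len(pair_vectors[0])
    average = [sum(v[j] for v in pair_vectors) / n for j in range(dims)]
    expected_norm = sum(l2_norm(v) for v in pair_vectors) / n
    if expected_norm == 0:
        return 0.0
    return l2_norm(average) / expected_norm


# Per-pair vectors that agree in direction: a clear batch effect.
consistent = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

# Per-pair vectors pointing in different directions: they cancel out.
scattered = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

size_consistent = relative_batch_size(consistent)
size_scattered = relative_batch_size(scattered)

threshold = 0.1  # hypothetical cutoff chosen for illustration only
skip_correction = size_scattered < threshold

print(size_consistent, size_scattered, skip_correction)
```

With consistent per-pair vectors the ratio is 1, while vectors that point in different directions average towards zero and give a ratio near 0, so a merge step like the second case would fall below the threshold and be skipped.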
Comments:
fastMNN() will not skip correction by default, even though the relative effect sizes are available. This is because the relative effect size can be small in situations where there is a genuine batch effect (e.g., due to a small proportion of per-pair vectors that are very large). More generally, the absence of a batch effect is an uncommon scenario that warrants some further manual investigation.