Run Canek on a toy example

library(Canek)

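# Functions
## Function to plot the first two PCA coordinates, coloured by a label
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  label <- as.factor(label)        # levels define the colour of each group
  cols <- as.integer(label)        # colour index = level index of each cell
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = cols, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Legend colours follow levels(label), so each entry matches its points
  legend(legPosition, pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}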
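In base R graphics an integer col value picks a colour from the current palette(), and as.integer() applied to a factor returns each element's level index. Colour i therefore belongs to levels(label)[i], so legend colours should be listed in level order, seq_along(levels(label)). The order of first appearance, unique(as.integer(label)), only matches when the data happen to list the levels in order. The toy factor below is made up for illustration.

```r
# Hypothetical labels whose first-appearance order differs from the level order
toyLabel <- factor(c("B", "A", "B", "C"), levels = c("A", "B", "C"))

# Colour index used for each point: the level index of its label
pointCols <- as.integer(toyLabel)
pointCols
#> [1] 2 1 2 3

# Legend colours aligned with levels(toyLabel): A -> 1, B -> 2, C -> 3
legendCols <- seq_along(levels(toyLabel))

# Order of first appearance: B, A, C -> 2, 1, 3 (would mislabel the legend)
firstSeen <- unique(as.integer(toyLabel))
```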

Load the data

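In this toy example we use the two simulated batches included in the SimBatches data from the Canek package. From SimBatches we use two elements: batches, a list with the two simulated expression matrices (genes in rows, cells in columns), and cell_types, the cell type of each cell.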

lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
                  rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2 
#>     631     948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4 
#>        1451          53          38          37
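The batch factor above is written out by hand for two batches. A sketch that builds the same kind of label for any number of batches, assuming only that each list element is a genes-by-cells matrix, can take the counts from ncol(). The matrices below are random placeholders, not SimBatches.

```r
# Toy stand-in for lsData: three hypothetical batches sharing 5 genes
set.seed(1)
toyList <- list(B1 = matrix(rnorm(5 * 3), nrow = 5),
                B2 = matrix(rnorm(5 * 4), nrow = 5),
                B3 = matrix(rnorm(5 * 2), nrow = 5))

# One label per cell (column): repeat each batch name ncol() times
toyBatch <- factor(rep(names(toyList), times = vapply(toyList, ncol, integer(1))),
                   levels = names(toyList))
table(toyBatch)   # B1: 3 cells, B2: 4 cells, B3: 2 cells
```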

PCA before correction

We perform principal component analysis (PCA) on the joined datasets and plot the first two principal components (PCs). Because of the batch effect, cells group by batch.

data <- Reduce(cbind, lsData)
pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "bottomleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "bottomleft")
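prcomp() expects observations in rows, hence t(data): cells become rows and genes become columns. With scale. = TRUE each gene is standardized, and prcomp() stops with an error if a gene is constant across cells, so a zero-variance filter may be needed on other data. The sketch below, on a random placeholder matrix, drops constant genes and checks that the $x scores used in the plots are the standardized data multiplied by the loadings in $rotation.

```r
# Hypothetical genes-by-cells matrix with one constant gene
set.seed(42)
toyData <- matrix(rnorm(6 * 20), nrow = 6)
toyData[6, ] <- 1  # constant gene: scale. = TRUE cannot rescale it

# Drop genes with zero variance before scaling (cells become rows after t())
keep <- apply(toyData, 1, var) > 0
toyPca <- prcomp(t(toyData[keep, ]), center = TRUE, scale. = TRUE)

# $x holds the PC scores: standardized data projected onto the loadings
scaled <- scale(t(toyData[keep, ]), center = TRUE, scale = TRUE)
scores <- scaled %*% toyPca$rotation
all.equal(unname(scores), unname(toyPca$x))
#> [1] TRUE
```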

Run Canek

We correct the toy batches using the function RunCanek. In this example we pass it the list of matrices created before (lsData), with one matrix per batch.

data <- RunCanek(lsData)
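Reduce(cbind, lsData) only works when every batch has the same number of rows, and cbind() does not check that the rows refer to the same genes. A small hypothetical helper, checkBatches(), can verify this before correction. That RunCanek expects matching genes across batches is our assumption here, not something stated in this example.

```r
# Hypothetical helper: check that every batch has the same genes in the same order
checkBatches <- function(lsData) {
  stopifnot(is.list(lsData), length(lsData) >= 2)
  nRows <- vapply(lsData, nrow, integer(1))
  if (length(unique(nRows)) != 1) {
    stop("batches have different numbers of genes: ",
         paste(nRows, collapse = ", "))
  }
  genes <- lapply(lsData, rownames)
  # Compare each batch's gene names with those of the first batch
  if (!all(vapply(genes, identical, logical(1), genes[[1]]))) {
    stop("gene names or gene order differ between batches")
  }
  invisible(TRUE)
}

# Usage on toy matrices
g <- paste0("gene", 1:4)
ok  <- list(B1 = matrix(0, 4, 3, dimnames = list(g, NULL)),
            B2 = matrix(0, 4, 5, dimnames = list(g, NULL)))
bad <- list(B1 = ok$B1,
            B2 = matrix(0, 4, 5, dimnames = list(rev(g), NULL)))
checkBatches(ok)  # returns TRUE invisibly
tryCatch(checkBatches(bad), error = conditionMessage)
#> [1] "gene names or gene order differ between batches"
```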

PCA after correction

We perform PCA on the corrected data and plot the first two PCs. After correction, cells no longer separate by batch and instead group by their cell type.

pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "topleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "topleft")
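The grouping described above is read off the plots. One way to put a number on it, which is our addition and not part of Canek, is the fraction of variance in each PC explained by a label (the R² of a one-way ANOVA). The helper labelR2() below is hypothetical; calling labelR2(pcaData, batch) and labelR2(pcaData, celltype) on the scores before and after correction would show whether batch explains less of PC1 and PC2 once the data are corrected. We have not run it on SimBatches, so no values are reported here.

```r
# Hypothetical helper: share of variance in each PC explained by a grouping label
labelR2 <- function(pcaData, label, pcs = c("PC1", "PC2")) {
  label <- as.factor(label)
  vapply(pcs, function(pc) {
    y <- pcaData[, pc]
    fitted <- ave(y, label)  # each cell's group mean
    1 - sum((y - fitted)^2) / sum((y - mean(y))^2)
  }, numeric(1))
}

# Toy scores: PC1 separates the two groups, PC2 does not
toyScores <- cbind(PC1 = c(1, 1, -1, -1), PC2 = c(0.5, -0.5, 0.5, -0.5))
toyGroup <- factor(c("a", "a", "b", "b"))
r2 <- labelR2(toyScores, toyGroup)  # PC1 = 1, PC2 = 0
```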

Session info

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#> 
#> Matrix products: default
#> BLAS/LAPACK: /Users/martin/miniconda3/envs/Canek/lib/libopenblasp-r0.3.18.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Canek_0.2.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3         highr_0.9            DEoptimR_1.0-11     
#>  [4] bslib_0.3.1          compiler_4.1.3       bluster_1.4.0       
#>  [7] jquerylib_0.1.4      class_7.3-20         prabclus_2.3-2      
#> [10] BiocNeighbors_1.12.0 numbers_0.8-2        tools_4.1.3         
#> [13] mclust_5.4.9         digest_0.6.29        jsonlite_1.8.0      
#> [16] evaluate_0.15        lattice_0.20-45      pkgconfig_2.0.3     
#> [19] rlang_1.0.2          Matrix_1.4-1         igraph_1.3.0        
#> [22] cli_3.2.0            rstudioapi_0.13      yaml_2.3.5          
#> [25] parallel_4.1.3       xfun_0.30            fastmap_1.1.0       
#> [28] stringr_1.4.0        knitr_1.38           cluster_2.1.3       
#> [31] sass_0.4.1           S4Vectors_0.32.4     fpc_2.2-9           
#> [34] diptest_0.76-0       nnet_7.3-17          stats4_4.1.3        
#> [37] grid_4.1.3           robustbase_0.95-0    R6_2.5.1            
#> [40] flexmix_2.3-17       BiocParallel_1.28.3  rmarkdown_2.13      
#> [43] irlba_2.3.5          kernlab_0.9-30       magrittr_2.0.3      
#> [46] matrixStats_0.61.0   modeltools_0.2-23    MASS_7.3-56         
#> [49] htmltools_0.5.2      BiocGenerics_0.40.0  stringi_1.7.6       
#> [52] FNN_1.1.3