This document was last compiled on 2021-12-15 10:27:48.
## Reading in metadata from file: results/BT642Year123/data/root_meta.txt
## Reading in data from file: results/BT642Year123/data/root_normCounts.txt
## Total number of root samples: 355
## Total number of genes in root samples: 24785
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13 samples identified as not part of the main experiment, from the root samples
## Variance filter for root samples: Remove genes in the bottom 0.2 quantile of variability
## Total number of genes after limiting to high variable genes : 19828
## After filtering root samples: 328 samples, 19828 genes
## Total number of root samples: 355
## Total number of genes in root samples: 24785
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13 samples identified as not part of the main experiment, from the root samples
## After filtering root samples: 328 samples, 24785 genes
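The variance filter reported in the output above (dropping genes in the bottom 0.2 quantile of variability) can be sketched in base R as follows. This is a minimal illustration, not the code used to produce this report: the matrix `norm_counts`, the helper `variance_filter`, and the use of the per-gene variance as the measure of variability are all assumptions.

```r
# Hypothetical stand-in for a normalized expression matrix (genes x samples)
set.seed(1)
n_genes <- 1000
n_samples <- 20
gene_sd <- runif(n_genes, 0.1, 2)
norm_counts <- matrix(rnorm(n_genes * n_samples, sd = rep(gene_sd, n_samples)),
                      nrow = n_genes,
                      dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Keep only the genes whose variance lies above the given quantile
# (assumption: variability is measured by the per-gene variance)
variance_filter <- function(mat, quantile_cutoff = 0.2) {
  gene_var <- apply(mat, 1, var)
  threshold <- quantile(gene_var, probs = quantile_cutoff)
  mat[gene_var > threshold, , drop = FALSE]
}

filtered <- variance_filter(norm_counts, quantile_cutoff = 0.2)
dim(filtered)
```

With a cutoff of 0.2, roughly 80% of the genes are retained, which is consistent with the drop from 24785 to 19828 genes reported above.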
In this document, we perform clustering using the built-in functions of the moanin package. We filter genes based on the combined p-values we generated previously (using Fisher's method) and on the combined log fold change (see mRNA_DE_combinePvalue_* for more details). The file containing this information is results/BT642Year123/DE_combine/root_Preflowering_combining_every.csv.
(Note: for years 2 and 3, we previously computed several sets of combined p-values from different combinations of timepoints and compared them to see whether the clustering results would be affected; we now use only the choice listed above.)
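For reference, Fisher's method combines k independent p-values through the statistic -2 * sum(log(p)), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a generic base R version; the function name `fisher_combine` and the example p-values are hypothetical and are not taken from the mRNA_DE_combinePvalue_* code.

```r
# Combine independent p-values with Fisher's method
fisher_combine <- function(pvals) {
  pvals <- pvals[!is.na(pvals)]           # ignore missing tests
  stat <- -2 * sum(log(pvals))            # Fisher's test statistic
  pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)
}

# Hypothetical per-year p-values for a single gene
p_by_year <- c(year1 = 0.04, year2 = 0.20, year3 = 0.01)
combined_p <- fisher_combine(p_by_year)
combined_p
```

The resulting combined p-values would still need to be adjusted for multiple testing across genes, for example with `p.adjust(..., method = "BH")`.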
## Dimensions of data matrix restricting to designated timepoints: 19828 x 245
## The unique time points are:
## 21, 28, 35, 42, 49, 105, 91, 56, 63, 70, 77, 84, 119, 58, 59, 61
## The conditions and number of samples are:
##
##      Year1.Control.BT642 Year1.Preflowering.BT642      Year2.Control.BT642
##                       35                       34                       45
## Year2.Preflowering.BT642      Year3.Control.BT642 Year3.Preflowering.BT642
##                       43                       43                       45
## Dimensions of data matrix restricting to designated timepoints: 24785 x 245
## The unique time points are:
## 21, 28, 35, 42, 49, 105, 91, 56, 63, 70, 77, 84, 119, 58, 59, 61
## The conditions and number of samples are:
##
##      Year1.Control.BT642 Year1.Preflowering.BT642      Year2.Control.BT642
##                       35                       34                       45
## Year2.Preflowering.BT642      Year3.Control.BT642 Year3.Preflowering.BT642
##                       43                       43                       45
We have already performed variance filtering (see the data-loading output above). In this section we perform further gene filtering. We keep the genes that have a combined p-value < 0.05 (after adjusting for multiple testing) and that we previously determined to have a large enough log fold change (logical variable lfc_keep, created in mRNA_DE_combinePvalue*.Rmd; see the documentation there. Different cutoffs were set for drought effects and genotype effects).
## Total number of genes: 24785
## Total number of genes after variance filter: 19828
## Total number of genes after filtering on p-value and log-fold change are: 6510
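The filtering step described above amounts to a logical subset on two criteria. A minimal sketch, assuming a table with one row per gene: the data frame `de_table` and the column name `combined_padj` are hypothetical, while `lfc_keep` is the logical variable mentioned in the text.

```r
# Hypothetical differential-expression summary, one row per gene
de_table <- data.frame(
  gene = paste0("gene", 1:6),
  combined_padj = c(0.001, 0.20, 0.03, 0.04, 0.60, 0.049),  # adjusted combined p-values
  lfc_keep = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),       # passes the log-fold-change cutoff
  stringsAsFactors = FALSE
)

# Keep genes that are significant AND pass the log-fold-change criterion
keep <- !is.na(de_table$combined_padj) &
  de_table$combined_padj < 0.05 &
  de_table$lfc_keep
genes_for_clustering <- de_table$gene[keep]
genes_for_clustering
```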
We will find 20 clusters using the splines_kmeans function in the moanin package.
## Distribution of genes in clusters:
##
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
## 359 355 174 375 279 338 384 219 355 303 380 272 581 287 500 253 246 278 304 268
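The general idea behind clustering time courses with splines and k-means can be sketched in base R: smooth each gene's profile with a spline fit, rescale the fitted curves so that genes are compared on shape rather than magnitude, and run k-means on the result. The code below is only an illustration under simplifying assumptions (simulated data, a single condition, a natural spline basis with 4 degrees of freedom, min-max rescaling). It is not moanin's implementation, and all object names are hypothetical.

```r
library(splines)

set.seed(42)
time <- rep(seq(21, 119, by = 7), each = 2)   # hypothetical sampling days, 2 replicates each
n_genes <- 60
pattern <- rep(c("up", "down"), each = n_genes / 2)

# Simulated expression: increasing or decreasing trend plus noise (genes x samples)
signal <- outer(ifelse(pattern == "up", 1, -1), time / 100)
expr <- signal + matrix(rnorm(n_genes * length(time), sd = 0.05), nrow = n_genes)
rownames(expr) <- paste0("gene", seq_len(n_genes))

# Fit one spline model per gene (multi-response lm: one column per gene)
fit <- lm(t(expr) ~ ns(time, df = 4))
fitted_curves <- t(fitted(fit))               # genes x samples

# Rescale each fitted curve to [0, 1] so clustering reflects shape, not scale
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- t(apply(fitted_curves, 1, rescale01))

# k-means on the smoothed, rescaled profiles
km <- kmeans(scaled, centers = 2, nstart = 10)
table(km$cluster)
```

In the analysis reported here, the number of clusters is 20 and the design has several conditions, so the spline basis would need to be fitted per condition rather than over time alone.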
Now we will score all the genes, even those not used in clustering.
## Total number of genes assigned to a cluster: 12392
## Distribution of all genes in best clusters (NA=no cluster):
##
##     1     2     3     4     5     6     7     8     9    10    11    12    13
##   496   868    89   626   219   358   789   235   869   714  2010   196   760
##    14    15    16    17    18    19    20  <NA>
##   291  1033   144   945   599   546   605 12393
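Scoring genes against existing cluster centroids, with some genes left unassigned (NA), can be illustrated as follows. This sketch uses a simple goodness-of-fit score, the fraction of a gene's variance left unexplained after regressing it on a centroid with a positive slope, and an arbitrary cutoff of 0.5. Both the score and the cutoff are assumptions made for illustration and are not the scoring used by moanin.

```r
n_time <- 12
# Two hypothetical cluster centroids: increasing and decreasing over time
centroids <- rbind(up = seq(0, 1, length.out = n_time),
                   down = seq(1, 0, length.out = n_time))

# Three hypothetical genes: shifted/scaled copies of each centroid, and an oscillating one
wiggle <- 0.05 * sin(seq_len(n_time))
genes <- rbind(
  g_up = 2 + 3 * centroids["up", ] + wiggle,
  g_down = 5 * centroids["down", ] + wiggle,
  g_other = rep(c(0, 1), n_time / 2)
)

# Score in [0, 1]: unexplained variance after fitting y = a + b * centroid, b > 0
score_gene <- function(y, centroid) {
  fit <- lm.fit(cbind(1, centroid), y)
  if (fit$coefficients[2] <= 0) return(1)   # opposite direction: worst score
  sum(fit$residuals^2) / sum((y - mean(y))^2)
}

# Genes x clusters matrix of scores
score_matrix <- t(apply(genes, 1, function(y) {
  apply(centroids, 1, function(ctr) score_gene(y, ctr))
}))

# Assign each gene to its best cluster, or NA if even the best fit is poor
best <- apply(score_matrix, 1, which.min)
best_score <- apply(score_matrix, 1, min)
label <- ifelse(best_score < 0.5, colnames(score_matrix)[best], NA)
label
```

Allowing an intercept and a positive scale factor means a gene can match a cluster by shape even when its expression level differs from the centroid's.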
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 splines stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] moanin_1.1.2 topGO_2.44.0
## [3] SparseM_1.81 GO.db_3.13.0
## [5] AnnotationDbi_1.56.1 graph_1.72.0
## [7] data.table_1.14.2 clusterExperiment_2.14.0
## [9] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [11] GenomicRanges_1.46.0 GenomeInfoDb_1.30.0
## [13] IRanges_2.28.0 S4Vectors_0.32.2
## [15] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [17] MASS_7.3-54 reshape2_1.4.4
## [19] forcats_0.5.1 stringr_1.4.0
## [21] purrr_0.3.4 readr_2.0.2
## [23] tidyr_1.1.4 tibble_3.1.6
## [25] tidyverse_1.3.1 dplyr_1.0.7
## [27] ggplot2_3.3.5 NMF_0.23.0
## [29] synchronicity_1.3.5 bigmemory_4.5.36
## [31] Biobase_2.54.0 BiocGenerics_0.40.0
## [33] cluster_2.1.2 rngtools_1.5.2
## [35] pkgmaker_0.32.2 registry_0.5-1
## [37] knitr_1.36 rmarkdown_2.11
## [39] SCF_4.1.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 uuid_1.0-3 backports_1.3.0
## [4] plyr_1.8.6 lazyeval_0.2.2 gmp_0.6-2.1
## [7] BiocParallel_1.28.0 rncl_0.8.4 gridBase_0.4-7
## [10] digest_0.6.28 foreach_1.5.1 htmltools_0.5.2
## [13] viridis_0.6.2 fansi_0.5.0 magrittr_2.0.1
## [16] memoise_2.0.0 ScaledMatrix_1.2.0 doParallel_1.0.16
## [19] tzdb_0.2.0 limma_3.50.0 Biostrings_2.60.2
## [22] annotate_1.72.0 modelr_0.1.8 prettyunits_1.1.1
## [25] colorspace_2.0-2 blob_1.2.2 rvest_1.0.2
## [28] haven_2.4.3 xfun_0.28 crayon_1.4.2
## [31] RCurl_1.98-1.5 jsonlite_1.7.2 bigmemory.sri_0.1.3
## [34] genefilter_1.76.0 phylobase_0.8.10 survival_3.2-13
## [37] iterators_1.0.13 ape_5.5 glue_1.5.0
## [40] gtable_0.3.0 zlibbioc_1.40.0 XVector_0.34.0
## [43] DelayedArray_0.20.0 BiocSingular_1.10.0 kernlab_0.9-29
## [46] Rhdf5lib_1.16.0 HDF5Array_1.22.0 scales_1.1.1
## [49] DBI_1.1.1 edgeR_3.36.0 Rcpp_1.0.7
## [52] viridisLite_0.4.0 xtable_1.8-4 progress_1.2.2
## [55] bit_4.0.4 rsvd_1.0.5 httr_1.4.2
## [58] RColorBrewer_1.1-2 ellipsis_0.3.2 ClusterR_1.2.5
## [61] pkgconfig_2.0.3 XML_3.99-0.8 sass_0.4.0
## [64] dbplyr_2.1.1 locfit_1.5-9.4 utf8_1.2.2
## [67] howmany_0.3-1 tidyselect_1.1.1 rlang_0.4.12
## [70] softImpute_1.4-1 munsell_0.5.0 cellranger_1.1.0
## [73] tools_4.1.2 cachem_1.0.6 cli_3.1.0
## [76] generics_0.1.1 RSQLite_2.2.8 ade4_1.7-18
## [79] broom_0.7.10 evaluate_0.14 fastmap_1.1.0
## [82] yaml_2.2.1 bit64_4.0.5 fs_1.5.0
## [85] KEGGREST_1.34.0 nlme_3.1-153 xml2_1.3.2
## [88] compiler_4.1.2 rstudioapi_0.13 png_0.1-7
## [91] reprex_2.0.1 bslib_0.3.1 RNeXML_2.4.5
## [94] stringi_1.7.5 highr_0.9 lattice_0.20-45
## [97] Matrix_1.3-4 vctrs_0.3.8 rhdf5filters_1.6.0
## [100] pillar_1.6.4 lifecycle_1.0.1 jquerylib_0.1.4
## [103] zinbwave_1.16.0 bitops_1.0-7 irlba_2.3.3
## [106] R6_2.5.1 gridExtra_2.3 codetools_0.2-18
## [109] gtools_3.9.2 assertthat_0.2.1 rhdf5_2.38.0
## [112] withr_2.4.2 GenomeInfoDbData_1.2.7 locfdr_1.1-8
## [115] parallel_4.1.2 hms_1.1.1 grid_4.1.2
## [118] beachmat_2.10.0 lubridate_1.8.0