This document was last compiled on 2021-12-15 10:27:48.

Data Entry

## Reading in metadata from file: results/BT642Year123/data/root_meta.txt
## Reading in data from file: results/BT642Year123/data/root_normCounts.txt
## Total number of root samples: 355 
## Total number of genes in root samples: 24785 
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13  samples identified as not part of the main experiment, from the root samples
## Variance filter for root samples:  Remove genes in the bottom 0.2 quantile of variability 
## Total number of genes after limiting to high variable genes : 19828 
## After filtering root samples: 328 samples, 19828 genes
## Total number of root samples: 355 
## Total number of genes in root samples: 24785 
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13  samples identified as not part of the main experiment, from the root samples
## After filtering root samples: 328 samples, 24785 genes
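
The variance filter reported above (drop genes in the bottom 0.2 quantile of variability) can be sketched in base R as follows. This is a minimal illustration on a made-up matrix, not the code that produced the log. It assumes that "variability" means the per-gene variance across samples, which the log does not state; the object names `counts`, `gene_var`, `cutoff`, and `filtered` are hypothetical.

```r
# Hypothetical toy matrix: 10 genes (rows) x 6 samples (columns).
# Row i is drawn with standard deviation i, so later genes vary more.
set.seed(1)
counts <- matrix(rnorm(60, sd = rep(1:10, times = 6)), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))

# Assumption: variability is measured by the per-gene variance.
gene_var <- apply(counts, 1, var)

# Genes at or below the 0.2 quantile of variability are removed.
cutoff <- quantile(gene_var, probs = 0.2)
filtered <- counts[gene_var > cutoff, , drop = FALSE]

dim(filtered)
```

On the real data the same proportion applies: 19828 is 80% of 24785, which matches the gene counts in the log.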

In this document, we perform clustering using the built-in functions of the moanin package. We filter genes based on the combined p-values we previously generated (using Fisher’s method) and on the combined log-fold change (see mRNA_DE_combinePvalue_* for more details). The file containing this information is results/BT642Year123/DE_combine/root_Preflowering_combining_every.csv.
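
For reference, Fisher's method combines k independent p-values through the statistic -2 * sum(log(p)), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a base R illustration only; the helper name `fisher_combine` and the example p-values are hypothetical, and the combined p-values used in this analysis were computed elsewhere (see mRNA_DE_combinePvalue_*).

```r
# Hypothetical helper: combine independent p-values with Fisher's method.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))                              # Fisher's statistic
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)  # upper-tail probability
}

# Made-up p-values for one gene, e.g. one per year.
p_example <- c(0.04, 0.20, 0.01)
combined <- fisher_combine(p_example)
combined
```

A quick sanity check on the helper: with a single p-value the method returns that p-value unchanged, since the upper tail of a chi-squared distribution with 2 degrees of freedom at -2 * log(p) is exactly p.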

(Note: for years 2 and 3, there were several sets of combined p-values, each based on a different combination of timepoints. We previously compared these to see whether the clustering results would be affected; we now use only the choice listed above.)

Create Moanin Object

## Dimensions of data matrix restricting to designated timepoints: 19828 x 245 
## The unique time points are:
##       21, 28, 35, 42, 49, 105, 91, 56, 63, 70, 77, 84, 119, 58, 59, 61 
## The conditions and number of samples are:
##      
##      Year1.Control.BT642 Year1.Preflowering.BT642      Year2.Control.BT642 
##                       35                       34                       45 
## Year2.Preflowering.BT642      Year3.Control.BT642 Year3.Preflowering.BT642 
##                       43                       43                       45
## Dimensions of data matrix restricting to designated timepoints: 24785 x 245 
## The unique time points are:
##       21, 28, 35, 42, 49, 105, 91, 56, 63, 70, 77, 84, 119, 58, 59, 61 
## The conditions and number of samples are:
##      
##      Year1.Control.BT642 Year1.Preflowering.BT642      Year2.Control.BT642 
##                       35                       34                       45 
## Year2.Preflowering.BT642      Year3.Control.BT642 Year3.Preflowering.BT642 
##                       43                       43                       45

Filtering genes

We have already performed variance filtering (see Data Entry above). In this section we perform further gene filtering. We keep the genes that have a combined p-value < 0.05 (after adjusting for multiple testing) and that we previously determined to have a high enough log-fold change (logical variable lfc_keep, created in mRNA_DE_combinePvalue*.Rmd; see the documentation there. Different cutoffs were set for drought effects and genotype effects).

## Total number of genes: 24785
## Total number of genes after variance filter: 19828
## Total number of genes after filtering on p-value and log-fold change are: 6510
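
The two-part filter described above can be sketched as follows. This is an illustration on a made-up table, not the code that produced the counts. Only `lfc_keep` is a name taken from this document; the column names `gene`, `combined_p`, and `p_adj` are hypothetical, and the Benjamini-Hochberg adjustment is an assumption, since the adjustment method is not stated here.

```r
# Hypothetical table of combined results, one row per gene.
de <- data.frame(
  gene       = paste0("gene", 1:6),
  combined_p = c(0.0001, 0.001, 0.02, 0.04, 0.3, 0.8),
  lfc_keep   = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Assumption: Benjamini-Hochberg adjustment for multiple testing.
de$p_adj <- p.adjust(de$combined_p, method = "BH")

# Keep genes that pass both the adjusted p-value and the log-fold-change criteria.
keep_genes <- de$gene[de$p_adj < 0.05 & de$lfc_keep]
keep_genes
```

In this toy table, gene2 passes the p-value cutoff but fails `lfc_keep`, and gene4 has a raw p-value below 0.05 that no longer passes after adjustment, so both criteria matter.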

Clustering

We will find 20 clusters using the splines_kmeans function in the moanin package.

## Distribution of genes in clusters:
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 359 355 174 375 279 338 384 219 355 303 380 272 581 287 500 253 246 278 304 268
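
As a rough picture of what spline-based k-means does, the sketch below smooths each gene's time course with a spline basis, rescales the fitted curves, and runs k-means on them. It is a base R stand-in written under those assumptions, not the moanin implementation of splines_kmeans, and all data and object names in it are made up (3 clusters on 30 simulated genes rather than 20 clusters on the filtered genes).

```r
library(splines)
set.seed(42)

# Simulated design: 10 weekly timepoints with 2 replicates each.
time <- rep(seq(21, 84, by = 7), each = 2)
tt <- (time - min(time)) / diff(range(time))

# Three made-up temporal patterns, 10 noisy genes per pattern.
patterns <- rbind(up = tt, down = 1 - tt, peak = 4 * tt * (1 - tt))
truth <- rep(1:3, each = 10)
expr <- patterns[truth, ] + matrix(rnorm(30 * length(time), sd = 0.05), nrow = 30)
rownames(expr) <- paste0("gene", 1:30)

# Fit one natural cubic spline per gene (intercept plus 4 basis functions)
# by least squares; lm.fit accepts a matrix response, one column per gene.
basis <- cbind(1, ns(time, df = 4))
fitted <- t(lm.fit(basis, t(expr))$fitted.values)

# Rescale each fitted curve to [0, 1] so clusters reflect shape, not magnitude.
rescaled <- t(apply(fitted, 1, function(x) (x - min(x)) / (max(x) - min(x))))

# Cluster the smoothed, rescaled curves.
km <- kmeans(rescaled, centers = 3, nstart = 10)
table(km$cluster)
```

Clustering the fitted curves rather than the raw values reduces the influence of replicate noise on the cluster assignment.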

Now we will score all the genes, even those not used in clustering.

## Total number of genes assigned to a cluster: 12392
## Distribution of all genes in best clusters (NA=no cluster):
## 
##     1     2     3     4     5     6     7     8     9    10    11    12    13 
##   496   868    89   626   219   358   789   235   869   714  2010   196   760 
##    14    15    16    17    18    19    20  <NA> 
##   291  1033   144   945   599   546   605 12393
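
The idea of scoring every gene against the cluster centroids and leaving poorly fitting genes unlabeled (NA) can be sketched as below. The scoring rule actually used by moanin is not shown in this document; the mean squared difference and the fixed cutoff used here are assumptions chosen for illustration, and all values and names are made up.

```r
# Hypothetical centroids (2 clusters, 3 timepoints) and gene profiles.
centroids <- rbind(c1 = c(0, 0.5, 1), c2 = c(1, 0.5, 0))
genes <- rbind(
  g1 = c(0.05, 0.50, 0.95),  # close to c1
  g2 = c(0.90, 0.55, 0.10),  # close to c2
  g3 = c(0.50, 0.50, 0.50),  # flat, fits neither centroid well
  g4 = c(0.40, 0.90, 0.20)   # peaked, fits neither centroid well
)

# Assumed score: mean squared difference between a gene and a centroid
# (lower is better). Result is a genes x clusters matrix.
scores <- sapply(seq_len(nrow(centroids)), function(k) {
  rowMeans(sweep(genes, 2, centroids[k, ])^2)
})

# Best cluster per gene and the score attained there.
best <- max.col(-scores, ties.method = "first")
best_score <- scores[cbind(seq_len(nrow(genes)), best)]

# Hypothetical cutoff: genes scoring worse than this stay unlabeled.
cutoff <- 0.05
label <- ifelse(best_score <= cutoff, best, NA_integer_)
table(label, useNA = "ifany")
```

Here g1 and g2 are assigned to clusters 1 and 2, while g3 and g4 remain NA, mirroring the way the table above reports genes with no cluster.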

Centroid Plotting

## png 
##   2

Session Info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    splines   stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] moanin_1.1.2                topGO_2.44.0               
##  [3] SparseM_1.81                GO.db_3.13.0               
##  [5] AnnotationDbi_1.56.1        graph_1.72.0               
##  [7] data.table_1.14.2           clusterExperiment_2.14.0   
##  [9] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [11] GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
## [13] IRanges_2.28.0              S4Vectors_0.32.2           
## [15] MatrixGenerics_1.6.0        matrixStats_0.61.0         
## [17] MASS_7.3-54                 reshape2_1.4.4             
## [19] forcats_0.5.1               stringr_1.4.0              
## [21] purrr_0.3.4                 readr_2.0.2                
## [23] tidyr_1.1.4                 tibble_3.1.6               
## [25] tidyverse_1.3.1             dplyr_1.0.7                
## [27] ggplot2_3.3.5               NMF_0.23.0                 
## [29] synchronicity_1.3.5         bigmemory_4.5.36           
## [31] Biobase_2.54.0              BiocGenerics_0.40.0        
## [33] cluster_2.1.2               rngtools_1.5.2             
## [35] pkgmaker_0.32.2             registry_0.5-1             
## [37] knitr_1.36                  rmarkdown_2.11             
## [39] SCF_4.1.0                  
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1           uuid_1.0-3             backports_1.3.0       
##   [4] plyr_1.8.6             lazyeval_0.2.2         gmp_0.6-2.1           
##   [7] BiocParallel_1.28.0    rncl_0.8.4             gridBase_0.4-7        
##  [10] digest_0.6.28          foreach_1.5.1          htmltools_0.5.2       
##  [13] viridis_0.6.2          fansi_0.5.0            magrittr_2.0.1        
##  [16] memoise_2.0.0          ScaledMatrix_1.2.0     doParallel_1.0.16     
##  [19] tzdb_0.2.0             limma_3.50.0           Biostrings_2.60.2     
##  [22] annotate_1.72.0        modelr_0.1.8           prettyunits_1.1.1     
##  [25] colorspace_2.0-2       blob_1.2.2             rvest_1.0.2           
##  [28] haven_2.4.3            xfun_0.28              crayon_1.4.2          
##  [31] RCurl_1.98-1.5         jsonlite_1.7.2         bigmemory.sri_0.1.3   
##  [34] genefilter_1.76.0      phylobase_0.8.10       survival_3.2-13       
##  [37] iterators_1.0.13       ape_5.5                glue_1.5.0            
##  [40] gtable_0.3.0           zlibbioc_1.40.0        XVector_0.34.0        
##  [43] DelayedArray_0.20.0    BiocSingular_1.10.0    kernlab_0.9-29        
##  [46] Rhdf5lib_1.16.0        HDF5Array_1.22.0       scales_1.1.1          
##  [49] DBI_1.1.1              edgeR_3.36.0           Rcpp_1.0.7            
##  [52] viridisLite_0.4.0      xtable_1.8-4           progress_1.2.2        
##  [55] bit_4.0.4              rsvd_1.0.5             httr_1.4.2            
##  [58] RColorBrewer_1.1-2     ellipsis_0.3.2         ClusterR_1.2.5        
##  [61] pkgconfig_2.0.3        XML_3.99-0.8           sass_0.4.0            
##  [64] dbplyr_2.1.1           locfit_1.5-9.4         utf8_1.2.2            
##  [67] howmany_0.3-1          tidyselect_1.1.1       rlang_0.4.12          
##  [70] softImpute_1.4-1       munsell_0.5.0          cellranger_1.1.0      
##  [73] tools_4.1.2            cachem_1.0.6           cli_3.1.0             
##  [76] generics_0.1.1         RSQLite_2.2.8          ade4_1.7-18           
##  [79] broom_0.7.10           evaluate_0.14          fastmap_1.1.0         
##  [82] yaml_2.2.1             bit64_4.0.5            fs_1.5.0              
##  [85] KEGGREST_1.34.0        nlme_3.1-153           xml2_1.3.2            
##  [88] compiler_4.1.2         rstudioapi_0.13        png_0.1-7             
##  [91] reprex_2.0.1           bslib_0.3.1            RNeXML_2.4.5          
##  [94] stringi_1.7.5          highr_0.9              lattice_0.20-45       
##  [97] Matrix_1.3-4           vctrs_0.3.8            rhdf5filters_1.6.0    
## [100] pillar_1.6.4           lifecycle_1.0.1        jquerylib_0.1.4       
## [103] zinbwave_1.16.0        bitops_1.0-7           irlba_2.3.3           
## [106] R6_2.5.1               gridExtra_2.3          codetools_0.2-18      
## [109] gtools_3.9.2           assertthat_0.2.1       rhdf5_2.38.0          
## [112] withr_2.4.2            GenomeInfoDbData_1.2.7 locfdr_1.1-8          
## [115] parallel_4.1.2         hms_1.1.1              grid_4.1.2            
## [118] beachmat_2.10.0        lubridate_1.8.0