This document was last compiled on 2021-12-14 23:14:01.
## Reading in metadata from file: results/BT642Year2/data/root_meta.txt
## Reading in data from file: results/BT642Year2/data/root_normCounts.txt
## Reading in metadata from file: results/BT642Year2/data/leaf_meta.txt
## Reading in data from file: results/BT642Year2/data/leaf_normCounts.txt
## Total number of leaf samples: 125
## Total number of genes in leaf samples: 22181
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13 samples identified as not part of the main experiment, from the leaf samples
## Variance filter for leaf samples: Remove genes in the bottom 0.2 quantile of variability
## Total number of genes after limiting to high variable genes : 17744
## After filtering leaf samples: 112 samples, 17744 genes
## Total number of root samples: 133
## Total number of genes in root samples: 23916
## No additional bad samples to remove (probably removed during normalization)
## Removing the 13 samples identified as not part of the main experiment, from the root samples
## Variance filter for root samples: Remove genes in the bottom 0.2 quantile of variability
## Total number of genes after limiting to high variable genes : 19132
## After filtering root samples: 120 samples, 19132 genes
There are 35115 genes in the *S. bicolor* genome (based on the annotation file gene_annotation_642.tsv). After low-expression filtering and normalization, 22181 genes remain in the leaf samples and 23916 in the root samples. Applying the variance filter on top of this (removing genes in the bottom 0.2 quantile of variability, i.e. keeping the top 80% most variable genes) leaves 17744 genes in the leaf and 19132 in the root. (Note: this variance filter is meant to be used only in the clustering; we need to double-check that this is still the case.) Unlike in our previous analysis, the differential expression analysis does not first apply a leaf-specific or root-specific expression filter after normalization to remove leaf- or root-specific genes.
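The variance filter reported in the log output (removing genes in the bottom 0.2 quantile of variability) can be sketched in base R. This is a minimal illustration, not the pipeline's actual code. It assumes a genes-by-samples matrix of normalized counts, uses per-gene variance as the measure of variability, and keeps genes strictly above R's default (type 7) 0.2 quantile. The helper names `filter_low_variance` and `n_kept` are hypothetical. Under these assumptions, and with no tied variances, the rule keeps 17744 of 22181 leaf genes and 19132 of 23916 root genes, which matches the logged counts.

```r
# Hypothetical helper: drop genes in the bottom `drop_quantile` of variability.
# Assumes `counts` is a numeric matrix (genes in rows, samples in columns) and
# uses per-gene variance as the variability measure (an assumption).
filter_low_variance <- function(counts, drop_quantile = 0.2) {
  gene_var <- apply(counts, 1, var)
  cutoff <- quantile(gene_var, probs = drop_quantile)  # default type 7
  counts[gene_var > cutoff, , drop = FALSE]
}

# Toy data: 1000 genes x 10 samples, each gene with its own spread
set.seed(1)
gene_sd <- runif(1000, 0.1, 2)
toy <- matrix(rnorm(1000 * 10, sd = rep(gene_sd, 10)), nrow = 1000)
filtered <- filter_low_variance(toy)
nrow(filtered)  # 800

# Hypothetical check: genes kept when all variances are distinct. The type 7
# quantile sits at sorted position h = (n - 1) * p + 1, so n - floor(h) remain.
n_kept <- function(n, p = 0.2) n - floor((n - 1) * p + 1)
n_kept(22181)  # 17744 (leaf)
n_kept(23916)  # 19132 (root)
```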
Below is some basic information on the annotations of all genes/orthogroups:
For leaf samples, out of the 17744 genes considered, the table below gives the number of DE genes for each genotype under the two drought conditions. DE is based on the global measure (i.e., the spline function fit), with the q-value required to be less than 0.05. Log-fold change (lfc) is based on the global log-fold-change measure, MaxOfIntervals, which is required to be greater than 2 in absolute value.
Genotype | Preflowering (DE) | Preflowering (DE+lfc) | Postflowering (DE) | Postflowering (DE+lfc) |
---|---|---|---|---|
RT430 | NA | 198 | NA | 34 |
BT642 | 7621 | 198 | 910 | 34 |
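The DE and DE+lfc columns combine two filters: a global q-value cutoff and a cutoff on the global log-fold change. The sketch below illustrates that logic, assuming per-gene q-values from the spline-based global test and a genes-by-intervals matrix of interval-level log-fold changes. How MaxOfIntervals is computed is not described here; the sketch assumes it is the interval lfc with the largest absolute value (keeping its sign). The function names are hypothetical.

```r
# ASSUMPTION: MaxOfIntervals is taken to be the interval-level lfc with the
# largest absolute value (sign kept). Rows are genes, columns are intervals.
max_of_intervals <- function(lfc_mat) {
  idx <- max.col(abs(lfc_mat), ties.method = "first")
  lfc_mat[cbind(seq_len(nrow(lfc_mat)), idx)]
}

# Hypothetical helper: DE = global q-value below q_cut; DE+lfc additionally
# requires the absolute global lfc to exceed lfc_cut.
call_de <- function(qvalue, lfc_mat, q_cut = 0.05, lfc_cut = 2) {
  global_lfc <- max_of_intervals(lfc_mat)
  de <- qvalue < q_cut
  data.frame(DE = de,
             DE_lfc = de & abs(global_lfc) > lfc_cut,
             global_lfc = global_lfc)
}

# Toy example: 4 genes x 3 intervals
lfc <- matrix(c( 0.5, -2.5,  1.0,
                 3.1,  0.2, -0.4,
                -1.0,  1.5,  1.9,
                 0.1,  0.0, -2.2), ncol = 3, byrow = TRUE)
q <- c(0.01, 0.20, 0.03, 0.04)
res <- call_de(q, lfc)
colSums(res[, c("DE", "DE_lfc")])  # DE = 3, DE_lfc = 2
```

By construction the DE+lfc set is a subset of the DE set, since it only adds a second condition.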
We intersect the above DE/lfc results with the information about each gene's ortholog group and its relationship to the 642/430 mappings.
First, we consider the relationship between 642 and 430 within the ortholog group:
Next, we consider the relationship of the orthogroup to 623:
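The intersection step amounts to a join on gene ID followed by a cross-tabulation. A minimal sketch in base R, where the column names (`gene`, `orthogroup`, `rel_642_430`) and the relationship labels are hypothetical placeholders rather than the actual annotation schema:

```r
# Hypothetical inputs: DE+lfc calls per gene, and an annotation table giving
# each gene's orthogroup and an illustrative 642/430 relationship label.
de_calls <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                       DE_lfc = c(TRUE, FALSE, TRUE, TRUE))
annot <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                    orthogroup = c("OG1", "OG1", "OG2", "OG3"),
                    rel_642_430 = c("one-to-one", "one-to-one",
                                    "many-to-one", "642-only"))

# Attach the annotation to each gene's DE call, then cross-tabulate
merged <- merge(de_calls, annot, by = "gene")
tab <- table(relationship = merged$rel_642_430, DE_lfc = merged$DE_lfc)
tab
```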
The following table gives the number of significant genes (q-value < 0.05) for different cutoffs on the absolute log-fold change:
Abs. lfc threshold | Preflowering (BT642) | Postflowering (BT642) |
---|---|---|
0.8 | 1619 | 269 |
1.0 | 1022 | 178 |
1.5 | 406 | 81 |
2.0 | 198 | 34 |
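Because every gene passing a higher cutoff also passes the lower ones, the counts can only decrease as the threshold rises, and the 2.0 row reproduces the DE+lfc column of the previous table. A minimal sketch of how such a table could be built, assuming hypothetical per-gene vectors `qvalue` and `global_lfc` and a hypothetical helper name:

```r
# Hypothetical helper: number of genes with q < q_cut and absolute global lfc
# above each threshold.
count_by_threshold <- function(qvalue, global_lfc,
                               thresholds = c(0.8, 1.0, 1.5, 2.0),
                               q_cut = 0.05) {
  sig <- qvalue < q_cut
  n_de <- vapply(thresholds,
                 function(t) sum(sig & abs(global_lfc) > t),
                 integer(1))
  data.frame(Threshold = thresholds, nDE = n_de)
}

# Toy example with 5 genes
q <- c(0.01, 0.02, 0.03, 0.50, 0.04)
lfc <- c(0.9, -1.2, 2.5, 3.0, -1.6)
tab <- count_by_threshold(q, lfc)
tab$nDE  # 4 3 2 1
```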
The following tables give the number of DE genes using log-fold-change cutoffs specific to the drought and recovery intervals.
Number of genes found DE with lfc > 2 (up) or lfc < -2 (down) during the drought intervals:
Interval and direction | leaf_Preflowering_BT642 | leaf_Postflowering_BT642 |
---|---|---|
DroughtTps_up | 52 | 4 |
DroughtTps_down | 28 | 3 |
EarlyDroughtTps_up | 7 | 0 |
EarlyDroughtTps_down | 0 | 0 |
LateDroughtTps_up | 167 | 34 |
LateDroughtTps_down | 117 | 40 |
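The interval-specific counts split genes by direction within each group of time points. A minimal sketch, assuming hypothetical genes-by-interval-group matrices of lfc values and q-values (with columns such as `DroughtTps`, `EarlyDroughtTps`, `LateDroughtTps`); as an assumption, a gene counts as "up" in a group when its q-value there is below 0.05 and its lfc exceeds 2, and as "down" when its lfc is below -2. The helper name is hypothetical.

```r
# ASSUMPTIONS: `interval_lfc` and `interval_q` are genes x interval-group
# matrices with the same column names, holding one lfc and one q-value per
# gene per group.
count_up_down <- function(interval_lfc, interval_q, q_cut = 0.05, lfc_cut = 2) {
  sig <- interval_q < q_cut
  up <- colSums(sig & interval_lfc > lfc_cut)
  down <- colSums(sig & interval_lfc < -lfc_cut)
  counts <- c(rbind(up, down))  # interleave: up, down, up, down, ...
  names(counts) <- paste0(rep(colnames(interval_lfc), each = 2),
                          c("_up", "_down"))
  counts
}

# Toy example: 4 genes x 3 interval groups
lfc <- cbind(DroughtTps      = c(2.5, -3.0,  0.5,  2.1),
             EarlyDroughtTps = c(0.1, -2.2,  2.4,  0.0),
             LateDroughtTps  = c(3.3, -1.0, -2.6,  2.8))
qv <- cbind(DroughtTps      = c(0.01, 0.02, 0.30, 0.04),
            EarlyDroughtTps = c(0.50, 0.01, 0.20, 0.90),
            LateDroughtTps  = c(0.01, 0.60, 0.03, 0.07))
counts <- count_up_down(lfc, qv)
counts
```

Using per-interval q-values (rather than the single global q-value) is a deliberate choice in this sketch: it allows a group's up and down counts to exceed the global DE+lfc count, as they do for LateDroughtTps above.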
Number of genes found DE with lfc > 2 (up) or lfc < -2 (down) during the recovery intervals (pre-flowering only):
Interval and direction | leaf_Preflowering_BT642 |
---|---|
RecoveryTps_up | 6 |
RecoveryTps_down | 0 |
EarlyRecoveryTps_up | 39 |
EarlyRecoveryTps_down | 11 |
LateRecoveryTps_up | 2 |
LateRecoveryTps_down | 1 |
For root samples, out of the 19132 genes considered, the table below gives the number of DE genes for each genotype under the two drought conditions. DE is based on the global measure (i.e., the spline function fit), with the q-value required to be less than 0.05. Log-fold change (lfc) is based on the global log-fold-change measure, MaxOfIntervals, which is required to be greater than 2 in absolute value.
Genotype | Preflowering (DE) | Preflowering (DE+lfc) | Postflowering (DE) | Postflowering (DE+lfc) |
---|---|---|---|---|
RT430 | NA | 657 | NA | 187 |
BT642 | 15882 | 657 | 9150 | 187 |
We intersect the above DE/lfc results with the information about each gene's ortholog group and its relationship to the 642/430 mappings.
First, we consider the relationship between 642 and 430 within the ortholog group:
Next, we consider the relationship of the orthogroup to 623:
The following table gives the number of significant genes (q-value < 0.05) for different cutoffs on the absolute log-fold change:
Abs. lfc threshold | Preflowering (BT642) | Postflowering (BT642) |
---|---|---|
0.8 | 4254 | 2101 |
1.0 | 2950 | 1302 |
1.5 | 1348 | 470 |
2.0 | 657 | 187 |
The following tables give the number of DE genes using log-fold-change cutoffs specific to the drought and recovery intervals.
Number of genes found DE with lfc > 2 (up) or lfc < -2 (down) during the drought intervals:
Interval and direction | root_Preflowering_BT642 | root_Postflowering_BT642 |
---|---|---|
DroughtTps_up | 229 | 65 |
DroughtTps_down | 303 | 138 |
EarlyDroughtTps_up | 104 | 36 |
EarlyDroughtTps_down | 121 | 98 |
LateDroughtTps_up | 386 | 107 |
LateDroughtTps_down | 593 | 269 |
Number of genes found DE with lfc > 2 (up) or lfc < -2 (down) during the recovery intervals (pre-flowering only):
Interval and direction | root_Preflowering_BT642 |
---|---|
RecoveryTps_up | 73 |
RecoveryTps_down | 15 |
EarlyRecoveryTps_up | 326 |
EarlyRecoveryTps_down | 177 |
LateRecoveryTps_up | 17 |
LateRecoveryTps_down | 0 |
## [1] "2021-12-14 23:14:57 PST"
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [5] tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
## [9] readr_2.0.2 rmarkdown_2.11 knitr_1.36 SCF_4.1.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 fs_1.5.0 lubridate_1.8.0
## [4] bit64_4.0.5 httr_1.4.2 GenomeInfoDb_1.30.0
## [7] tools_4.1.2 backports_1.3.0 bslib_0.3.1
## [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.1
## [13] BiocGenerics_0.40.0 colorspace_2.0-2 withr_2.4.2
## [16] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.2
## [19] cli_3.1.0 rvest_1.0.2 Biobase_2.54.0
## [22] xml2_1.3.2 sass_0.4.0 scales_1.1.1
## [25] genefilter_1.76.0 digest_0.6.28 XVector_0.34.0
## [28] pkgconfig_2.0.3 htmltools_0.5.2 highr_0.9
## [31] limma_3.50.0 dbplyr_2.1.1 fastmap_1.1.0
## [34] rlang_0.4.12 readxl_1.3.1 rstudioapi_0.13
## [37] RSQLite_2.2.8 jquerylib_0.1.4 generics_0.1.1
## [40] jsonlite_1.7.2 vroom_1.5.5 RCurl_1.98-1.5
## [43] magrittr_2.0.1 GenomeInfoDbData_1.2.7 Matrix_1.3-4
## [46] Rcpp_1.0.7 munsell_0.5.0 S4Vectors_0.32.2
## [49] fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.5
## [52] yaml_2.2.1 zlibbioc_1.40.0 grid_4.1.2
## [55] blob_1.2.2 parallel_4.1.2 crayon_1.4.2
## [58] lattice_0.20-45 splines_4.1.2 Biostrings_2.60.2
## [61] haven_2.4.3 annotate_1.72.0 hms_1.1.1
## [64] KEGGREST_1.34.0 pillar_1.6.4 stats4_4.1.2
## [67] reprex_2.0.1 XML_3.99-0.8 glue_1.5.0
## [70] evaluate_0.14 modelr_0.1.8 png_0.1-7
## [73] vctrs_0.3.8 tzdb_0.2.0 cellranger_1.1.0
## [76] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6
## [79] xfun_0.28 xtable_1.8-4 broom_0.7.10
## [82] survival_3.2-13 AnnotationDbi_1.56.1 memoise_2.0.0
## [85] IRanges_2.28.0 ellipsis_0.3.2