This document was last compiled on 2021-12-14 21:34:11.
## Reading in metadata from file: results/BT642Year1/data/root_meta.txt
## Reading in data from file: results/BT642Year1/data/root_normCounts.txt
## Reading in metadata from file: results/BT642Year1/data/leaf_meta.txt
## Reading in data from file: results/BT642Year1/data/leaf_normCounts.txt
## Total number of leaf samples: 100
## Total number of genes in leaf samples: 21183
## No additional bad samples to remove (probably removed during normalization)
## There are no samples to remove beyond those used in the main experiment
## Variance filter for leaf samples: Remove genes in the bottom 0.2 quantile of variability
## Total number of genes after limiting to high variable genes : 16946
## After filtering leaf samples: 100 samples, 16946 genes
## Total number of root samples: 96
## Total number of genes in root samples: 23595
## No additional bad samples to remove (probably removed during normalization)
## There are no samples to remove beyond those used in the main experiment
## Variance filter for root samples: Remove genes in the bottom 0.2 quantile of variability
## Total number of genes after limiting to high variable genes : 18876
## After filtering root samples: 96 samples, 18876 genes
There are 35115 genes in the S. bicolor genome (based on annotation: gene_annotation_642.tsv). After low-expression filtering and normalization, we obtain 21183 genes in the leaf samples and 23595 genes in the root samples. If we additionally remove the genes in the bottom 0.2 quantile of variability (keeping the top 80% most variable genes), we get 18876 genes in the root and 16946 in the leaf. (Note: we are not using this variance filter except in the clustering, and we need to double-check that it is still being used in this way.) Unlike in our previous analysis, the differential expression analysis does not first apply a leaf-specific or root-specific expression filter after normalization to remove leaf- or root-specific genes.
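The variance filter reported in the log can be sketched as follows. This is a minimal Python illustration, not the pipeline's own code: it assumes that "variability" is the per-gene sample variance of the normalized counts, and that the cutoff is a linearly interpolated quantile (the rule used by default in R's `quantile()`). The function names `variance_cutoff` and `filter_by_variance` and the toy data are hypothetical.

```python
import statistics


def variance_cutoff(variances, q=0.2):
    """Linearly interpolated q-quantile (same rule as R's default quantile type 7)."""
    ordered = sorted(variances)
    pos = (len(ordered) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def filter_by_variance(counts, q=0.2):
    """Keep genes whose variance across samples lies strictly above the q-quantile.

    counts: dict mapping gene id -> list of normalized expression values.
    """
    variances = {gene: statistics.variance(values) for gene, values in counts.items()}
    cutoff = variance_cutoff(variances.values(), q)
    return {gene: counts[gene] for gene, var in variances.items() if var > cutoff}


# Toy data (assumption): gene i has values [0, i], so its variance is i**2 / 2.
toy_counts = {f"gene{i}": [0.0, float(i)] for i in range(1, 11)}
kept = filter_by_variance(toy_counts, q=0.2)
print(sorted(kept, key=lambda g: int(g[4:])))
```

Under these assumptions, and when all variances are distinct, the rule keeps 16946 of 21183 genes and 18876 of 23595 genes, which agrees with the counts in the log above.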
We give some basic info on the annotations on all genes / orthogroups:
For leaf samples, out of the 16946 genes considered, here is the number of genes DE for each genotype under the two drought conditions. DE uses the global measure, i.e. the one based on the splines function fit, with the q-value required to be less than 0.05. Log-fold change (lfc) uses the global log-fold change measure, MaxOfIntervals, required to be greater than 2 in absolute value.
Genotype | Preflowering (DE) | Preflowering (DE+lfc) | Postflowering (DE) | Postflowering (DE+lfc) |
---|---|---|---|---|
RT430 | NA | 505 | NA | 160 |
BT642 | 9839 | 505 | 6429 | 160 |
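The "DE" and "DE+lfc" columns correspond to two nested criteria. The sketch below illustrates them in Python under a loud assumption: the report does not define MaxOfIntervals, so it is taken here to be the per-interval log-fold change with the largest absolute value. The function names, the field names `qvalue` and `interval_lfc`, and the toy values are hypothetical and are not taken from the results.

```python
def max_of_intervals(interval_lfc):
    """Return the interval log-fold change with the largest magnitude (sign kept).

    Assumption: this is one plausible reading of the MaxOfIntervals measure.
    """
    return max(interval_lfc, key=abs)


def classify_gene(qvalue, interval_lfc, q_cutoff=0.05, lfc_cutoff=2.0):
    """Return (is_de, is_de_and_lfc) for one gene."""
    is_de = qvalue < q_cutoff
    is_de_lfc = is_de and abs(max_of_intervals(interval_lfc)) > lfc_cutoff
    return is_de, is_de_lfc


# Toy genes (made-up values).
toy_genes = {
    "geneA": {"qvalue": 0.001, "interval_lfc": [0.4, -2.6, 1.1]},
    "geneB": {"qvalue": 0.030, "interval_lfc": [0.9, 1.2, -0.3]},
    "geneC": {"qvalue": 0.200, "interval_lfc": [3.1, 2.8, 2.2]},
}
calls = {g: classify_gene(v["qvalue"], v["interval_lfc"]) for g, v in toy_genes.items()}
n_de = sum(is_de for is_de, _ in calls.values())
n_de_lfc = sum(both for _, both in calls.values())
print(n_de, n_de_lfc)
```

In the toy data, geneC has large log-fold changes but fails the q-value cutoff, so it is counted in neither column; geneB is significant but falls below the lfc cutoff, so it counts only as DE.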
We intersect the above DE/lfc results with the information about the ortholog group / gene and its relationship to the 642/430 mappings.
First we consider the relationship between 642/430 within the ortholog group:
Now the relationship of the orthogroup to 623:
The following is the number of genes significant (q-value < 0.05) at different cutoffs of absolute log-fold change:
Threshold | Pre.642 | Post.642 |
---|---|---|
0.8 | 2736 | 1637 |
1.0 | 1950 | 1045 |
1.5 | 938 | 394 |
2.0 | 505 | 160 |
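A threshold table of this kind can be produced by counting, for each cutoff, the significant genes whose absolute log-fold change exceeds it; the counts can only stay equal or fall as the cutoff rises. A minimal Python sketch with made-up (q-value, lfc) pairs follows; `count_by_threshold` is a hypothetical helper, not part of the pipeline.

```python
def count_by_threshold(genes, thresholds, q_cutoff=0.05):
    """Count genes with q-value < q_cutoff and |lfc| > t for each threshold t.

    genes: iterable of (qvalue, lfc) pairs.
    """
    significant = [abs(lfc) for qvalue, lfc in genes if qvalue < q_cutoff]
    return {t: sum(1 for a in significant if a > t) for t in thresholds}


# Toy input (made-up values); the last gene fails the q-value cutoff.
toy = [(0.01, 2.4), (0.02, -1.7), (0.04, 1.2), (0.03, -0.9), (0.01, 0.5), (0.30, 3.0)]
counts = count_by_threshold(toy, thresholds=[0.8, 1.0, 1.5, 2.0])
print(counts)
```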
The following considers the number of genes DE with log-fold change cutoffs specific to drought/recovery intervals:
Summary of the number of genes found DE with absolute lfc > 2 in the indicated direction in drought intervals:
Interval and direction | leaf_Preflowering_BT642 | leaf_Postflowering_BT642 |
---|---|---|
DroughtTps_up | 131 | 73 |
DroughtTps_down | 59 | 71 |
EarlyDroughtTps_up | 79 | 114 |
EarlyDroughtTps_down | 9 | 48 |
LateDroughtTps_up | 254 | 62 |
LateDroughtTps_down | 223 | 92 |
Summary of the number of genes found DE with absolute lfc > 2 in the indicated direction in recovery intervals (pre-flowering only):
Interval and direction | leaf_Preflowering_BT642 |
---|---|
RecoveryTps_up | 6 |
RecoveryTps_down | 19 |
EarlyRecoveryTps_up | 265 |
EarlyRecoveryTps_down | 111 |
LateRecoveryTps_up | 6 |
LateRecoveryTps_down | 23 |
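The interval tables split genes by the sign of the log-fold change. The Python sketch below assumes that each gene carries one log-fold change per named interval, that "up" means lfc > 2 and "down" means lfc < -2, and it omits any significance requirement the report applies in addition. The data layout and the function name `directional_counts` are hypothetical.

```python
from collections import Counter


def directional_counts(interval_lfc_by_gene, lfc_cutoff=2.0):
    """Count genes per '<interval>_up' / '<interval>_down' category."""
    counts = Counter()
    for lfc_by_interval in interval_lfc_by_gene.values():
        for interval, lfc in lfc_by_interval.items():
            if lfc > lfc_cutoff:
                counts[f"{interval}_up"] += 1
            elif lfc < -lfc_cutoff:
                counts[f"{interval}_down"] += 1
    return counts


# Toy input (made-up values).
toy = {
    "geneA": {"EarlyDroughtTps": 2.5, "LateDroughtTps": 3.0},
    "geneB": {"EarlyDroughtTps": -0.4, "LateDroughtTps": -2.2},
    "geneC": {"EarlyDroughtTps": 1.9, "LateDroughtTps": 2.1},
}
interval_counts = directional_counts(toy)
print(dict(interval_counts))
```

A gene can be counted in several intervals at once, which is why the per-interval counts need not add up to the global DE+lfc totals.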
For root samples, out of the 18876 genes considered, here is the number of genes DE for each genotype under the two drought conditions. DE uses the global measure, i.e. the one based on the splines function fit, with the q-value required to be less than 0.05. Log-fold change (lfc) uses the global log-fold change measure, MaxOfIntervals, required to be greater than 2 in absolute value.
Genotype | Preflowering (DE) | Preflowering (DE+lfc) | Postflowering (DE) | Postflowering (DE+lfc) |
---|---|---|---|---|
RT430 | NA | 915 | NA | 449 |
BT642 | 17937 | 915 | 10358 | 449 |
We intersect the above DE/lfc results with the information about the ortholog group / gene and its relationship to the 642/430 mappings.
First we consider the relationship between 642/430 within the ortholog group:
Now the relationship of the orthogroup to 623:
The following is the number of genes significant (q-value < 0.05) at different cutoffs of absolute log-fold change:
Threshold | Pre.642 | Post.642 |
---|---|---|
0.8 | 5753 | 3088 |
1.0 | 4073 | 2147 |
1.5 | 1783 | 913 |
2.0 | 915 | 449 |
The following considers the number of genes DE with log-fold change cutoffs specific to drought/recovery intervals:
Summary of the number of genes found DE with absolute lfc > 2 in the indicated direction in drought intervals:
Interval and direction | root_Preflowering_BT642 | root_Postflowering_BT642 |
---|---|---|
DroughtTps_up | 331 | 189 |
DroughtTps_down | 721 | 318 |
EarlyDroughtTps_up | 112 | 187 |
EarlyDroughtTps_down | 264 | 398 |
LateDroughtTps_up | 705 | 198 |
LateDroughtTps_down | 1404 | 355 |
Summary of the number of genes found DE with absolute lfc > 2 in the indicated direction in recovery intervals (pre-flowering only):
Interval and direction | root_Preflowering_BT642 |
---|---|
RecoveryTps_up | 30 |
RecoveryTps_down | 6 |
EarlyRecoveryTps_up | 427 |
EarlyRecoveryTps_down | 170 |
LateRecoveryTps_up | 15 |
LateRecoveryTps_down | 4 |
## [1] "2021-12-14 21:34:53 PST"
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [5] tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
## [9] readr_2.0.2 rmarkdown_2.11 knitr_1.36 SCF_4.1.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 fs_1.5.0 lubridate_1.8.0
## [4] bit64_4.0.5 httr_1.4.2 GenomeInfoDb_1.30.0
## [7] tools_4.1.2 backports_1.3.0 bslib_0.3.1
## [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.1
## [13] BiocGenerics_0.40.0 colorspace_2.0-2 withr_2.4.2
## [16] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.2
## [19] cli_3.1.0 rvest_1.0.2 Biobase_2.54.0
## [22] xml2_1.3.2 sass_0.4.0 scales_1.1.1
## [25] genefilter_1.76.0 digest_0.6.28 XVector_0.34.0
## [28] pkgconfig_2.0.3 htmltools_0.5.2 highr_0.9
## [31] limma_3.50.0 dbplyr_2.1.1 fastmap_1.1.0
## [34] rlang_0.4.12 readxl_1.3.1 rstudioapi_0.13
## [37] RSQLite_2.2.8 jquerylib_0.1.4 generics_0.1.1
## [40] jsonlite_1.7.2 vroom_1.5.5 RCurl_1.98-1.5
## [43] magrittr_2.0.1 GenomeInfoDbData_1.2.7 Matrix_1.3-4
## [46] Rcpp_1.0.7 munsell_0.5.0 S4Vectors_0.32.2
## [49] fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.5
## [52] yaml_2.2.1 zlibbioc_1.40.0 grid_4.1.2
## [55] blob_1.2.2 parallel_4.1.2 crayon_1.4.2
## [58] lattice_0.20-45 splines_4.1.2 Biostrings_2.60.2
## [61] haven_2.4.3 annotate_1.72.0 hms_1.1.1
## [64] KEGGREST_1.34.0 pillar_1.6.4 stats4_4.1.2
## [67] reprex_2.0.1 XML_3.99-0.8 glue_1.5.0
## [70] evaluate_0.14 modelr_0.1.8 png_0.1-7
## [73] vctrs_0.3.8 tzdb_0.2.0 cellranger_1.1.0
## [76] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6
## [79] xfun_0.28 xtable_1.8-4 broom_0.7.10
## [82] survival_3.2-13 AnnotationDbi_1.56.1 memoise_2.0.0
## [85] IRanges_2.28.0 ellipsis_0.3.2