This document was last compiled on 2021-12-14 21:02:10.
This document creates the final, normalized dataset for downstream analysis. It writes a separate file for leaf and root samples (i.e., the two tissues are normalized separately). It is a subset of the original (now defunct) mRNA_normalize.Rmd, which explores different normalizations. This file performs only the final normalization, which is a simple quantile normalization.
We will normalize starting with the count data (fpkm_counts.txt).
Year1 There are questionable samples that we will remove. Specifically, we remove the mislabeled leaf sample from week 4 (0629164L11): even with the genotype corrected, it did not appear to match the other replicates well. We will also remove the bad samples from week 15 [total = 4 samples]. However, for now we will not remove the other samples of questionable quality, nor all of weeks 15 and 16.
Year2 There are bad samples we will remove and not normalize, namely the mislabeled leaf sample ()
In what follows, we normalize the leaf and root samples separately. For integrative analysis, we will go back and normalize them jointly as well; see mRNA_normalizeJoint.Rmd. We normalize all samples, not only samples from the main experiment (only applicable in years after year 1).
## Reading in metadata from file: results/BT642Year2/data/rnaMetaData.txt
## Reading in data from file: results/BT642Year2/data/tpm_counts.txt
Now we remove genes based on the getFilteredGenes results for leaf and root. The getFilteredGenes results are based on the count file and require at least 20 counts in 3 samples (since there are generally 3 replicates of each condition).
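The filtering rule described above can be illustrated with a minimal base-R sketch. This is not the actual getFilteredGenes implementation; `filter_genes`, the toy matrix, and the reading of the rule as "at least 3 samples with at least 20 counts each" are assumptions made for illustration.

```r
# Hypothetical toy count matrix: genes in rows, samples in columns.
counts <- matrix(
  c(25, 30, 40,  0,  5,
     0,  2, 21, 50, 19,
    20, 20, 20,  0,  0,
     1,  0,  3,  2,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5))
)

# Keep a gene if at least `min_samples` samples have at least `min_count` counts.
# (Assumed interpretation of the rule; the real getFilteredGenes may differ.)
filter_genes <- function(counts, min_count = 20, min_samples = 3) {
  keep <- rowSums(counts >= min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

filtered <- filter_genes(counts)
```

In this toy example only `gene1` and `gene3` reach the threshold in three samples, so `rownames(filtered)` contains those two genes.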
## Separating Leaves from full dataset for filtering.
## Total Leaves samples: 141
## Total number of leaf samples: 141
## Total number of genes in leaf samples: 35115
## Removing the 16 bad samples identified from the leaf samples
## After filtering leaf samples: 125 samples, 35115 genes
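The separation and sample-removal steps logged above could be sketched roughly as follows. The metadata column names (`sample_id`, `tissue`), the helper `subset_tissue`, and the toy data are hypothetical and are not taken from the actual rnaMetaData.txt.

```r
# Hypothetical metadata: one row per sample (column names are assumptions).
meta <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4", "s5", "s6"),
  tissue    = c("Leaf", "Leaf", "Leaf", "Root", "Root", "Root"),
  stringsAsFactors = FALSE
)

# Hypothetical expression matrix: genes in rows, samples in columns.
expr <- matrix(seq_len(18), nrow = 3,
               dimnames = list(paste0("gene", 1:3), meta$sample_id))

# Hypothetical list of samples flagged as bad.
bad_samples <- c("s2", "s6")

# Select the samples of one tissue and drop any flagged as bad.
subset_tissue <- function(expr, meta, tissue, bad_samples = character(0)) {
  ids <- meta$sample_id[meta$tissue == tissue]
  ids <- setdiff(ids, bad_samples)
  expr[, ids, drop = FALSE]
}

leaf <- subset_tissue(expr, meta, "Leaf", bad_samples)
root <- subset_tissue(expr, meta, "Root", bad_samples)
```

Each tissue keeps all genes, and only the sample columns change, which matches the unchanged gene count reported in the log.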
We can look at the RLE and PCA plots before any normalization:
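For reference, relative log expression (RLE) values are commonly computed by subtracting each gene's median log expression across samples from its log expression in every sample. The sketch below assumes a log2 transform with a pseudocount of 1; `compute_rle` and the toy matrix are illustrative and are not the plotting code used in this report.

```r
# Compute RLE: per-gene deviation from the gene's median log expression.
# `expr` has genes in rows and samples in columns.
compute_rle <- function(expr, pseudocount = 1) {
  log_expr <- log2(expr + pseudocount)
  gene_medians <- apply(log_expr, 1, median)
  sweep(log_expr, 1, gene_medians, FUN = "-")
}

# Hypothetical toy data: 2 genes, 3 samples.
toy <- matrix(c(1, 3, 7,
                3, 7, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

rle <- compute_rle(toy)
```

Passing `rle` to `boxplot()` gives one box per sample; well-normalized samples should have boxes centered near zero with similar spread.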
## Separating Roots from full dataset for filtering.
## Total Roots samples: 149
## Total number of root samples: 149
## Total number of genes in root samples: 35115
## Removing the 16 bad samples identified from the root samples
## After filtering root samples: 133 samples, 35115 genes
We can look at the RLE and PCA plots before any normalization:
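A PCA of the samples can be sketched with base R's `prcomp`, treating samples as observations and genes as variables. The log2 transform, the pseudocount, and the simulated data below are assumptions for illustration, not the settings used to produce the plots in this report.

```r
set.seed(1)

# Hypothetical expression matrix: 50 genes in rows, 6 samples in columns.
expr <- matrix(rexp(50 * 6, rate = 0.1), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# Log-transform, then transpose so that samples are rows (observations).
log_expr <- log2(expr + 1)
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components.
pc_scores <- pca$x[, 1:2]
```

Plotting `pc_scores` colored by condition or batch is a common way to spot outlying samples before normalization.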
Right now, nothing is beating upper-quartile normalization, so we will save those going forward.
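As a rough illustration of upper-quartile normalization, the sketch below rescales each sample so that its 75th percentile, computed over genes that are not zero in every sample, matches the average across samples. The function `upper_quartile_normalize` and the toy data are assumptions; the normalization actually run for this report may differ in detail.

```r
# Scale each sample (column) by its upper quartile relative to the mean upper quartile.
upper_quartile_normalize <- function(expr) {
  # Ignore genes that are zero in every sample when computing the quartiles.
  nonzero <- rowSums(expr) > 0
  uq <- apply(expr[nonzero, , drop = FALSE], 2, quantile, probs = 0.75)
  scale_factors <- uq / mean(uq)
  sweep(expr, 2, scale_factors, FUN = "/")
}

# Hypothetical toy data: sample s2 is sequenced at twice the depth of s1.
expr <- cbind(s1 = c(0, 2, 4, 6, 8),
              s2 = c(0, 4, 8, 12, 16))

normalized <- upper_quartile_normalize(expr)
```

In this toy case the two samples differ only by a depth factor, so the normalized columns become identical.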
We can look at the RLE plots after normalization: