BINF 6203: RNASeq (Part 2)
Once your read mapping completes, you are into the part of the assignment where the analyses run much more quickly.
CLC Genomics has, literally, seven tutorials that you could follow to deal with all of the pieces of RNASeq and gene expression analysis. The parts we’ve already done — loading your data and mapping reads — are covered in Tutorial 1. We are not going to do all of the possible remaining steps. Let’s focus on a few important things.
Set up an experiment with your mapped data
Go to: Toolbox > Transcriptomics Analysis > Set Up Experiment
First, select the mapped gene expression results you just generated. Select all four samples.
Then choose an unpaired two-group comparison.
Next, assign your samples to groups: *23 and *24 are wild type, and *26 and *27 are LuxS mutants.
Finally, click through a couple more self-explanatory steps to finish setting up the experiment. The result, a table with one row per gene, appears almost immediately.
In the table, the first column holds the gene names and the fifth column holds the fold change. You can sort by fold change and scan through your data to see what things look like.
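If you want to see what goes into a fold-change column, here is a minimal sketch in Python. The gene names and expression values are made up, and the signed convention used here (a plain ratio when the mutant mean is higher, a negative inverse ratio when it is lower) is an assumption to check against the CLC manual for your version.

```python
from statistics import mean

# Hypothetical expression values; gene names and numbers are invented.
# Each gene maps to (wild-type samples, LuxS-mutant samples).
expression = {
    "geneA": ([10.0, 14.0], [48.0, 48.0]),
    "geneB": ([38.0, 42.0], [8.0, 8.0]),
    "geneC": ([5.0, 5.0], [5.0, 5.0]),
}

def signed_fold_change(wild_type, mutant):
    """Ratio of group means, reported as a negative inverse when the mutant mean is lower."""
    wt_mean, mut_mean = mean(wild_type), mean(mutant)
    if wt_mean == 0 or mut_mean == 0:
        return float("nan")  # the ratio is undefined when either mean is zero
    if mut_mean >= wt_mean:
        return mut_mean / wt_mean
    return -(wt_mean / mut_mean)

fold_changes = {gene: signed_fold_change(wt, mut)
                for gene, (wt, mut) in expression.items()}

# Sort by magnitude so the largest changes in either direction come first.
ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
for gene in ranked:
    print(f"{gene}\t{fold_changes[gene]:+.2f}")
```

Sorting by absolute value mirrors what you do by eye in the table: strongly down-regulated genes are as interesting as strongly up-regulated ones.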
Statistics of differential expression
You can calculate all sorts of things to add to your basic table. For instance, in Toolbox > Transcriptomics Analysis > Transformations and Normalizations, you can set up different kinds of normalizations and log transformations yourself and apply them to your data.
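To get a feel for what a normalization followed by a log transformation does, here is a rough sketch. It assumes a simple scaling normalization (every sample scaled to the same total) and a log2 transform with a pseudocount of 1. The sample names, values, and function names are hypothetical, and CLC's own options may differ in detail.

```python
import math

# Hypothetical raw expression values per sample (same gene order in every list).
samples = {
    "wt_1": [100.0, 20.0, 0.0, 80.0],
    "mut_1": [200.0, 40.0, 0.0, 160.0],
}

def scale_to_common_total(samples):
    """Scale every sample so its total equals the average total across samples.

    Assumes no sample has a total of zero.
    """
    totals = {name: sum(values) for name, values in samples.items()}
    target = sum(totals.values()) / len(totals)
    return {name: [v * target / totals[name] for v in values]
            for name, values in samples.items()}

def log2_transform(values, pseudocount=1.0):
    """Return log2(x + pseudocount); the pseudocount keeps zero expression finite."""
    return [math.log2(v + pseudocount) for v in values]

normalized = scale_to_common_total(samples)
logged = {name: log2_transform(values) for name, values in normalized.items()}
```

In this toy data the mutant sample simply has twice the sequencing depth, so after scaling the two samples become identical. That is the point of normalizing before you compare groups.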
If you want to get the appropriate statistics to create a volcano plot, you can go to Toolbox > Transcriptomics Analysis > Statistical Analysis and choose Empirical Analysis of DGE. This will get you Bonferroni-corrected and FDR-corrected p-values for your data set. There are other ways to do this as well, for example via a t-test on Gaussian data. Once you’ve done that, you can make some of the key plots.
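The two corrections are easy to reproduce by hand on a short list of p-values. The sketch below assumes that "FDR-corrected" means the Benjamini-Hochberg procedure, which is the most common choice. The p-values are invented, and the code shows only the correction step, not the EDGE test itself.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)
```

Bonferroni is the stricter of the two: its corrected value is never smaller than the Benjamini-Hochberg one for the same gene, so it keeps fewer genes at a given cutoff.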
Plot your data in a volcano plot and heat map
Once you’ve calculated your DGE statistics, you can select the volcano plot icon at the bottom corner of the main window to see that view of your data.
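A volcano plot is just a scatter plot with log2 fold change on the x-axis and -log10 of the p-value on the y-axis. Here is a small sketch of how the coordinates are computed, using invented genes and values. The fold change here is a plain mutant/wild-type ratio, which is an assumption made for illustration.

```python
import math

# Hypothetical rows: (gene, mutant/wild-type ratio, p-value).
rows = [
    ("geneA", 4.0, 0.001),
    ("geneB", 0.25, 0.01),
    ("geneC", 1.0, 0.5),
]

def volcano_point(ratio, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one gene."""
    return math.log2(ratio), -math.log10(p_value)

points = {gene: volcano_point(ratio, p) for gene, ratio, p in rows}
for gene, (x, y) in points.items():
    print(f"{gene}\tlog2FC={x:+.2f}\t-log10(p)={y:.2f}")
```

Genes that are both strongly changed and statistically significant end up in the upper left and upper right corners of the plot. Unchanged genes such as geneC sit near the bottom center.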
To obtain a heat map view, you need to run “Hierarchical Clustering of Features”.
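The idea behind clustering features is to repeatedly merge the two genes (or groups of genes) whose expression profiles are most alike, and the heat map rows are then ordered by that tree. Below is a minimal sketch that assumes Euclidean distance and average linkage. CLC lets you choose among several distance measures and linkage methods, so treat these as illustrative choices. The profiles are invented.

```python
import math
from itertools import combinations

# Hypothetical log-transformed expression profiles (one value per sample).
profiles = {
    "geneA": [1.0, 1.1, 5.0, 5.2],
    "geneB": [1.2, 1.0, 5.1, 5.0],
    "geneC": [6.0, 6.1, 1.0, 0.9],
    "geneD": [5.9, 6.0, 1.2, 1.0],
}

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[g], profiles[h])
             for g in cluster_a for h in cluster_b]
    return sum(dists) / len(dists)

def cluster_features(genes):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

merges = cluster_features(sorted(profiles))
for left, right in merges:
    print(f"merge {left} + {right}")
```

With this toy data, the two genes that are high in the first two samples pair up, the two that are high in the last two samples pair up, and the final merge joins the two pairs.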
Once all of the views are available, you can filter the available data based on criteria such as the EDGE test p-value and fold change, and save a list of the most significantly differentially expressed genes.
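The filtering step amounts to keeping the rows that pass both a p-value cutoff and a fold-change cutoff. A minimal sketch follows. The rows are invented, the field names are hypothetical, and the cutoffs (0.05 and 2-fold) are common defaults, not values prescribed by the assignment.

```python
# Hypothetical result rows: gene, signed fold change, FDR-corrected p-value.
results = [
    {"gene": "geneA", "fold_change": 4.0, "fdr_p": 0.001},
    {"gene": "geneB", "fold_change": -5.0, "fdr_p": 0.03},
    {"gene": "geneC", "fold_change": 1.2, "fdr_p": 0.0005},
    {"gene": "geneD", "fold_change": -3.0, "fdr_p": 0.2},
]

def significant_genes(rows, max_p=0.05, min_abs_fold_change=2.0):
    """Keep rows passing both cutoffs, most significant first."""
    kept = [r for r in rows
            if r["fdr_p"] <= max_p
            and abs(r["fold_change"]) >= min_abs_fold_change]
    return sorted(kept, key=lambda r: r["fdr_p"])

top = significant_genes(results)
for row in top:
    print(f'{row["gene"]}\t{row["fold_change"]:+.1f}\t{row["fdr_p"]}')
```

Note that geneC is dropped even though it has the smallest p-value, because its fold change is too small to be interesting. geneD is dropped for the opposite reason.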
There’s one more short part coming.