GenoSets: Loading OrthoMCL clusters and GO terms
Once your BLAST and OrthoMCL runs have completed, you can load their output into GenoSets. First, select Add Data in the top menu. You’re going to add OrthoMCL data from the directories where you told GenoSets to create those files. In the Load Wizard, expand the OrthoMCL menu and then select Load OrthoMCL v2.
Next you’ll select the FASTA results and the OrthoMCL groups.txt file that describe your data. You’ll find these in the directory you created in the last step. In the example we’re looking in $HOME/Documents/DATA/ecoli_tutorial2, but if you’ve given your directory a different name, navigate to that directory instead. From that directory, choose the fasta subdirectory and the groups.txt file.
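Before loading, it can be useful to sanity-check groups.txt outside GenoSets. The sketch below is not part of GenoSets; it assumes the usual OrthoMCL v2 layout, in which each line has a group ID, a colon, and a space-separated list of taxon|gene members. The group IDs, taxon codes and gene names in the example are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_groups(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each group ID to its list of 'taxon|gene' members."""
    groups = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = members.split()
    return groups


def taxa_per_group(groups: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count, for each group, how many members come from each taxon code."""
    return {
        gid: Counter(member.split("|", 1)[0] for member in members)
        for gid, members in groups.items()
    }


# Hypothetical lines in groups.txt style (not taken from the tutorial data).
example = [
    "ecoli1000: K12|b0001 O157|Z0001 CFT|c0001",
    "ecoli1001: K12|b0002 K12|b0003",
]
groups = parse_groups(example)
counts = taxa_per_group(groups)
print(len(groups), "groups")
print(dict(counts["ecoli1001"]))
```

To run this on your own file, pass an open file handle instead of the example list, for instance `parse_groups(open(path))` with `path` pointing at the groups.txt in your tutorial directory.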
Click “Next”. You’ll be prompted to name your “method” and to enter a description of the parameters you used when you ran OrthoMCL. Naming your method allows you to run the analysis with different parameters (for instance, a different similarity cutoff in OrthoMCL) and compare the outcomes.
(Note for later: ideally we would simply capture this from the input files. Since you construct the command sequence, why not just store it and pass it through?)
Then click “Finish” to add the OrthoMCL data to GenoSets. Depending on the size of your data set, the data may take a while to load. For the tutorial example, however, loading will probably take less than 5 minutes.
Next, if you’re interested in the Gene Ontology classifications of the genes you find in your comparison, you should import GO terms for your genes. Gene Ontology is a structured vocabulary for describing the functional classification of a gene in terms of biological process, molecular function and cellular component. Using a structured vocabulary facilitates comparison across genomes where annotations may not be completely consistent. In a comparison among bacterial genomes, GO term enrichment analysis of subgroups of genes can provide insight into the functional outcome of gene gains and losses.
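To make the idea of GO term enrichment concrete, here is a minimal sketch of one common way to test it: a one-sided hypergeometric test. This is an illustration only, not a description of how GenoSets computes enrichment, and the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, genes_with_term: int,
                           subset_size: int, subset_with_term: int) -> float:
    """Probability of seeing at least `subset_with_term` annotated genes
    when `subset_size` genes are drawn at random from `total_genes`,
    of which `genes_with_term` carry the GO term."""
    upper = min(genes_with_term, subset_size)
    denominator = comb(total_genes, subset_size)
    numerator = sum(
        comb(genes_with_term, k)
        * comb(total_genes - genes_with_term, subset_size - k)
        for k in range(subset_with_term, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 4000 genes in total, 80 annotated with the term,
# and a subgroup of 50 genes of which 6 carry the term.
p_value = hypergeom_enrichment_p(4000, 80, 50, 6)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests the term is over-represented in the subgroup relative to the whole gene set. When many terms are tested at once, a multiple-testing correction is normally applied as well.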
Go to Add Data again and this time choose Gene Ontology. You’ll see that you can either retrieve GO annotations from EMBL or upload them from your local machine in GO Annotation File (GAF) format. The second option is useful if you’re working with newly sequenced genomes that don’t have GO terms or aren’t in the EMBL database yet. You can generate GO terms for unclassified genes with a program like Blast2GO. In this example, though, we can just retrieve the GO terms from EMBL.
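If you do go the local-upload route, it helps to know what a GAF file looks like. The sketch below assumes the standard GAF 2.x layout: tab-separated columns, comment lines starting with “!”, the gene product ID in column 2, the GO ID in column 5 and the aspect (P, F or C) in column 9. It is not GenoSets code, and the two example records (database name, gene IDs, symbols) are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def parse_gaf(lines: Iterable[str]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each gene product ID to a set of (GO ID, aspect) pairs."""
    annotations = defaultdict(set)
    for raw in lines:
        if raw.startswith("!") or not raw.strip():
            continue  # header/comment or blank line
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # too few columns to be a valid GAF record
        object_id, go_id, aspect = cols[1], cols[4], cols[8]
        annotations[object_id].add((go_id, aspect))
    return annotations


def gaf_row(object_id: str, symbol: str, go_id: str, aspect: str) -> str:
    """Build a made-up 17-column GAF record for the example below."""
    return "\t".join([
        "ExampleDB", object_id, symbol, "", go_id, "GO_REF:0000015", "ND",
        "", aspect, "", "", "protein", "taxon:562", "20200101",
        "ExampleDB", "", "",
    ])


# Hypothetical records; the gene IDs and symbols are not real annotations.
example = [
    "!gaf-version: 2.2",
    gaf_row("gene0001", "abcA", "GO:0008150", "P"),
    gaf_row("gene0001", "abcA", "GO:0003674", "F"),
    gaf_row("gene0002", "abcB", "GO:0005575", "C"),
]
annotations = parse_gaf(example)
for gene, terms in sorted(annotations.items()):
    print(gene, sorted(terms))
```

Running a check like this on Blast2GO output before uploading is a quick way to catch files with missing columns.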
Click “Next”. You’ll be prompted to choose an OBO file. The OBO file describes the hierarchy of ontology terms and their relationships. Most likely you’ll want to use the latest version of the Gene Ontology OBO file, but you can choose a different file or version if needed.
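To see what “hierarchy of ontology terms” means in practice, here is a minimal sketch that reads [Term] stanzas from OBO-formatted text and walks the is_a links upward. It handles only the id, name, namespace and is_a tags and ignores everything else in the format, so treat it as an illustration rather than a full OBO parser or the reader GenoSets uses. The example is a tiny hand-written fragment in OBO style.

```python
from typing import Dict, Iterable, Set


def parse_obo_terms(lines: Iterable[str]) -> Dict[str, dict]:
    """Collect id, name, namespace and is_a parents for each [Term] stanza."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # file header, blank line, or a stanza we skip
        tag, _, value = line.partition(":")
        value = value.strip()
        if not value:
            continue
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # Keep only the parent ID, dropping the "! name" comment.
            current["is_a"].append(value.split()[0])
    return terms


def ancestors(terms: Dict[str, dict], term_id: str) -> Set[str]:
    """Return every term reachable from term_id by following is_a links."""
    seen = set()
    stack = [term_id]
    while stack:
        for parent in terms.get(stack.pop(), {}).get("is_a", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


example = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0008152
name: metabolic process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0009058
name: biosynthetic process
namespace: biological_process
is_a: GO:0008152 ! metabolic process
""".splitlines()

terms = parse_obo_terms(example)
print(sorted(ancestors(terms, "GO:0009058")))
```

Because a gene annotated with a specific term is implicitly annotated with all of that term’s ancestors, the version of the OBO file you choose affects which broader categories your genes roll up into.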
Once your OBO file is imported, you can select the chromosomes for which you want to retrieve GO terms. In this case, just select them all and click “Add”. You should see the list of chromosomes move from the left box to the right.
Click “Next”, then enter a name and description for this method.
Click “Next” and then “Finish” to complete the upload process. Depending on the size of your data set, the GO terms may take a while to load. However, with the tutorial example, it should take less than 5 minutes to complete this step.