Whole genome shotgun metagenomics with MetaPhlAn

Like last week’s tutorial, this tutorial uses Urban Environmental Genomics Project data. The original version of the tutorial was developed by Anju Lulla for our student interns.

Preparation and software installation

You can use MetaPhlAn on the cluster, and that’s probably the best idea. If you have a reasonably powerful laptop, you can run it there on these small data sets. You can download MetaPhlAn by cloning it with Mercurial (hg):

    hg clone https://bitbucket.org/biobakery/metaphlan2

Note: this will put metaphlan2’s directory as a…

Gene expression analysis with DESeq

With many thanks to Anju Lulla: this is a modification of a protocol she used for the paper we are working on with our collaborators. To start off this lab, you should have an output file from featureCounts with five columns: the gene ID column, followed by counts from both samples in group A, then counts from both samples in group B. There are two ways to run this analysis. You can use R at the command line, or you…
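
As a quick sanity check on the five-column table described above, the sketch below reads it and computes a naive log2 fold change between the group means. It is not the DESeq analysis itself. It assumes a tab-delimited file with one header line, and the function names, sample labels, and pseudocount are all hypothetical choices made for illustration.

```python
import csv
import io
import math


def read_counts(handle):
    """Read a five-column, tab-delimited count table.

    Returns {gene_id: (group_a_counts, group_b_counts)}.
    Assumed layout: gene ID, two group A counts, two group B counts.
    """
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        try:
            counts = [int(value) for value in row[1:5]]
        except ValueError:
            continue  # skip the header row, whose count columns are not numeric
        table[row[0]] = (counts[:2], counts[2:])
    return table


def log2_fold_change(a_counts, b_counts, pseudocount=1.0):
    """Naive log2(mean B / mean A); the pseudocount avoids division by zero."""
    mean_a = sum(a_counts) / len(a_counts)
    mean_b = sum(b_counts) / len(b_counts)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Made-up example data standing in for a real count file.
example = "Geneid\tA1\tA2\tB1\tB2\ngene1\t10\t12\t40\t48\ngene2\t5\t5\t5\t5\n"
counts = read_counts(io.StringIO(example))
for gene, (group_a, group_b) in counts.items():
    print(gene, round(log2_fold_change(group_a, group_b), 3))
```

To use it on a real file, pass an open file handle to `read_counts` instead of the `io.StringIO` example. A proper differential expression test also models variance between replicates, which this check ignores.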

Troubleshooting the read mapping lab

Hey everyone! A few people are reporting some trouble in making this lab work so I updated all my software and went in. I am just going to walk through what worked with one sample.

I started with the BC26 chloroplast file from SRA. This is fine:

    fastq-dump SRR1763773

Moved it to a shorter filename:

    mv SRR1763773.fastq BC26.fastq

Built the bowtie2 index from the reference genome file. This is fine:

    bowtie2-build NC_007898.fasta NC_007898

This command should produce this output, including…

Expression quantitation with featureCounts

In this lab and the next, we are going to use two different methods to calculate differential expression for the same RNA-Seq dataset. Note: there is no standalone lab writeup due for the featureCounts part. Instead, you’ll do one writeup that covers both this part and the differential expression analysis with DESeq. In a nutshell, we have measured gene expression under two conditions (two replicates each) and we want to find out which genes are the most significantly differentially expressed…

Read mapping and variant calling

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example:

- a genomic consensus for an individual (against the reference genome for that species)
- location of SNPs and other variations in one genome relative to the other
- location of expressed transcripts (coding mRNAs, noncoding RNAs such…

Pangenome analysis with PanX

PanX is a program for pangenome analysis and production of core genome phylogenies. The PanX analysis tools are available as a GitHub package for custom genome analyses, and also as a web server. The PanX authors make a collection of precomputed comparisons available here. One of the demonstration datasets available at the PanX site is a collection of Prochlorococcus marinus genomes sequenced by Biller et al. (2014). In lecture last week, we also looked at a study of Prochlorococcus by…

Genome Comparison with Mauve

There are many ways to compare genomes, and these comparisons provide different kinds of information about evolutionary history and shared function. First, how do you decide which genes are “the same” across multiple genomes, i.e. orthologs: genes that have a common ancestor and are related by speciation? Second, which parts of the nucleotide sequence of multiple genomes align, irrespective of gene boundaries and orthology relationships? Finally, how do we decide what genes “are” and what they do to help interpret the meaning…

Access and use genome track data

Genome browsers are designed to bring together different types of genome track data using the common reference system of genomic coordinates. Often you’re more interested in manipulating whole sets of data than in looking at one region at a time manually, so there are tools available both online and at the command line that will help you with genome track math. The purpose of today’s lab is to get you familiar with genome browsers and track types, and experienced…
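
To make "genome track math" concrete, here is a minimal sketch of one such operation: intersecting two tracks. It assumes each track is a list of (chromosome, start, end) tuples with half-open, BED-style coordinates. The function name and the example intervals are hypothetical, and the simple pairwise loop is only practical for small tracks.

```python
def intersect(track_a, track_b):
    """Return the overlapping segments between two interval tracks.

    Each track is a list of (chrom, start, end) tuples with half-open
    coordinates (start included, end excluded), as assumed for this sketch.
    """
    hits = []
    for chrom_a, start_a, end_a in track_a:
        for chrom_b, start_b, end_b in track_b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # a non-empty overlap exists
                hits.append((chrom_a, start, end))
    return hits


# Made-up intervals for illustration only.
genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
peaks = [("chr1", 150, 350), ("chr2", 0, 40)]
overlaps = intersect(genes, peaks)
print(overlaps)
```

Dedicated command-line tools sort the intervals first so that they can handle whole-genome tracks efficiently. This version trades speed for readability.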

Sequence Read QC

DNA sequencing is a continually evolving technology, with new platforms becoming available each year, all designed to reduce the cost and increase the speed of large sequencing projects. As you can see from the figure below, certain types of sequencing dominate the current market. The top three instruments shown are established high and moderate capacity Illumina sequencing instruments, the next two are Ion Torrent instruments, and the two below that are newer Illumina instruments. Beyond that there is a…

Prepare Your Computer

Many of the assignments in BINF 6203 are scaled so that you can complete them on a reasonably powerful laptop without using the University Research Computing cluster. That said, to do the assignments you will need to prepare your computer by installing certain tools. I am going to assume that you’re working on an Apple computer running OS X, because that’s the type of computer I have available to set up and test, but the general requirements…
