Read mapping and simple variants

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from mapped reads, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); the location of SNPs and other variations in one genome relative to the other; the location of expressed transcripts (coding mRNAs, noncoding RNAs such…
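As a rough illustration of the first two results listed above (a consensus sequence and candidate SNP positions), here is a minimal Python sketch. It assumes you already have per-position counts of the read bases aligned to each reference position, which is the information a pileup of mapped reads summarizes, and calls a simple majority-vote consensus. The input format, the call_consensus name, and the min_depth cutoff are hypothetical choices for this example; real variant callers also weigh base and mapping qualities with a statistical model rather than taking a bare majority.

```python
from collections import Counter


def call_consensus(reference, pileup, min_depth=3):
    """Majority-vote consensus and candidate SNPs from per-position base counts.

    reference: reference sequence string.
    pileup: list of Counter objects, one per reference position, counting the
            read bases aligned there (hypothetical input format).
    Positions covered by fewer than min_depth bases are called 'N'.
    Returns (consensus_string, [(1-based position, ref_base, consensus_base), ...]).
    """
    consensus = []
    variants = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
            continue
        base, _ = counts.most_common(1)[0]  # most frequent aligned base
        consensus.append(base)
        if base != ref_base.upper():
            variants.append((pos + 1, ref_base, base))
    return "".join(consensus), variants


if __name__ == "__main__":
    ref = "ACGT"
    pileup = [Counter("AAAA"), Counter("TTTC"), Counter("GG"), Counter("TTT")]
    print(call_consensus(ref, pileup))  # ('ATNT', [(2, 'C', 'T')])
```

Position 3 has only two reads, so it is reported as N rather than guessed; raising or lowering min_depth trades completeness of the consensus against confidence in each call.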

Genome annotation with prokka

Why use Prokka? First, because in a benchmark test it has been shown to be as accurate as, or more accurate than, RAST or xBASE2 at reproducing known annotation in most annotation categories. Second, because it’s fast: you can run it on a standard laptop in a short time, whereas sending your genome to the RAST server for annotation can take a day or so. If you’re interested in learning to use the RAST server, search the site…

Sequence Assembly

For today’s class, I have prepared a shared Dropbox folder that contains the following items: sequence reads from one of the better chloroplast genome samples generated with our Ion Torrent instrument; ERR008613 (a set of paired-end Illumina sequence reads from the ends of 200 bp E. coli fragments); ERR022075 (a set of paired-end Illumina sequence reads from the ends of 600 bp E. coli fragments); and sets of PacBio CCS and CLR reads for E. coli. In the first part of the exercise,…

NGS QC

In this exercise we will focus primarily on quality analysis and quality control of Illumina sequencing data, since that is the type of NGS data you are currently most likely to encounter in new datasets. You can view older versions of this exercise for tips on handling Ion Torrent or 454 data if you encounter them in your work. What sequence data looks like: the FASTQ file. NGS read files tend to be distributed in *.fastq format. To…
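To make the FASTQ format concrete: each record is four lines, namely an @ header, the bases, a + separator line, and one quality character per base. Below is a minimal Python reader sketch. It assumes the Phred+33 encoding used by current Illumina pipelines (Illumina 1.8 and later), in which a base’s quality score is the character’s ASCII code minus 33, and it assumes each record occupies exactly four lines (no wrapped sequences). The read_fastq and mean_quality names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) for each four-line FASTQ record.

    Assumes Phred+33 quality encoding and single-line (unwrapped) sequences.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        read_id = header[1:].partition(" ")[0]  # drop the '@' and any description
        yield read_id, seq, [ord(ch) - 33 for ch in qual]


def mean_quality(scores):
    """Average Phred score of one read (0.0 for an empty read)."""
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    example = "@read1 extra\nACGT\n+\nII#5\n"
    for read_id, seq, scores in read_fastq(io.StringIO(example)):
        print(read_id, seq, scores, mean_quality(scores))  # read1 ACGT [40, 40, 2, 20] 25.5
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so the '#' base above (Q = 2) is close to a coin flip, while the 'I' bases (Q = 40) have about a 1 in 10,000 chance of being wrong.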

Got a Fresh Apple?

How do I get my computer ready for Bioinformatics 2111? I just got a new Apple laptop, so I get to make a fresh start. By the time I get through 4 years with a machine, I’ve generally installed and tweaked so much bioinformatics software and so many supporting libraries (and probably installed some of it with obsolete tools) that I literally cannot tell students how to replicate my working environment exactly. As always, xkcd knows how it is:…

Microbial community analysis with QIIME2

This tutorial makes use of the data from the NC Urban Microbiome Project, a collaboration seeded by the Department of Bioinformatics and Genomics and involving participants from our department as well as Civil Engineering, Biology, and Geography and Earth Science. Our goal in lab this week is to analyze 16S ribosomal RNA sequences from mixed microbial samples using QIIME2. The QIIME2 analysis will tell us what identifiable microbes are present in the samples (usually at the genus level rather than…

Whole genome shotgun metagenomics with MetaPhlAn

Like last week’s tutorial, this tutorial uses Urban Environmental Genomics Project data. The original version of the tutorial was developed by Anju Lulla for our student interns. Preparation and software installation: you can use MetaPhlAn on the cluster, and that’s probably the best idea. If you have a reasonably powerful laptop, you can run it there on these small data sets. You can download MetaPhlAn by cloning it with hg (Mercurial): hg clone https://bitbucket.org/biobakery/metaphlan2. Note: this will put metaphlan2’s directory as a…

Gene expression analysis with DESeq

With many thanks to Anju Lulla: this is a modification of a protocol she used for the paper we are working on with our collaborators. To start this lab, you should have an output file from featurecounts with five columns: the gene ID column, followed by counts from both samples in group A, then counts from both samples in group B. There are two ways to run this analysis. You can use R at the command line, or you…
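For checking the input before the analysis, here is a minimal Python sketch that reads a table with the five-column layout described above. It assumes the file is tab-separated with one header row, that any comment lines start with #, and that the columns appear in the order gene ID, A replicate 1, A replicate 2, B replicate 1, B replicate 2. Raw featurecounts output also carries annotation columns (Chr, Start, End, Strand, Length) that would need to be dropped first. The read_count_table name is hypothetical.

```python
import csv
import io


def read_count_table(handle):
    """Parse a tab-separated, five-column count table into
    {gene_id: ([a1, a2], [b1, b2])}.

    Assumed column order: gene ID, group A replicate 1, group A replicate 2,
    group B replicate 1, group B replicate 2. Lines starting with '#' are
    skipped, and the first remaining line is treated as the header.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    next(reader, None)  # skip the header row
    table = {}
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {fields!r}")
        gene_id, a1, a2, b1, b2 = fields
        table[gene_id] = ([int(a1), int(a2)], [int(b1), int(b2)])
    return table


if __name__ == "__main__":
    data = "Geneid\tA1\tA2\tB1\tB2\ngeneX\t10\t12\t40\t44\n"
    print(read_count_table(io.StringIO(data)))  # {'geneX': ([10, 12], [40, 44])}
```

Failing loudly on a row with the wrong number of columns catches the most common problem here: passing the untrimmed featurecounts file instead of the five-column table.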

Troubleshooting the read mapping lab

Hey everyone! A few people have reported trouble making this lab work, so I updated all my software and went through it again. I am just going to walk through what worked with one sample. I started with the BC26 chloroplast file from SRA. This is fine:

fastq-dump SRR1763773

Moved it to a shorter filename:

mv SRR1763773.fastq BC26.fastq

Built the bowtie2 index from the reference genome file. This is fine:

bowtie2-build NC_007898.fasta NC_007898

This command should produce this output, including…

Expression quantitation with featurecounts

In this lab and the next, we are going to use two different methods to calculate differential expression for the same RNA-Seq dataset. Note: there is no standalone lab writeup due for the featurecounts part. Instead, you’ll do one writeup that covers both this part and the differential expression analysis with DESeq. In a nutshell, we have measured gene expression under two conditions (two replicates each) and we want to find out which genes are the most significantly differentially expressed…
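With two conditions and two replicates each, raw counts first have to be put on a common scale, because libraries differ in sequencing depth. The Python sketch below computes per-sample size factors with the median-of-ratios approach that DESeq uses for normalization, then ranks genes by the log2 fold change between the normalized group means. It is only an effect-size ranking: it does not perform DESeq’s negative binomial significance test, so it cannot tell you which changes are statistically significant. The input layout ({gene: ([a1, a2], [b1, b2])}), the function names, and the pseudocount of 1 are assumptions for illustration.

```python
import math
import statistics


def size_factors(samples):
    """Median-of-ratios size factors (the normalization approach DESeq uses).

    samples: list of per-sample count lists, all in the same gene order.
    Genes with a zero count in any sample are skipped, since their
    geometric mean across samples is zero.
    """
    n_genes = len(samples[0])
    log_geo_means = []
    for g in range(n_genes):
        counts = [s[g] for s in samples]
        if min(counts) == 0:
            log_geo_means.append(None)
        else:
            log_geo_means.append(sum(math.log(c) for c in counts) / len(counts))
    factors = []
    for s in samples:
        # log(count / geometric mean) for every usable gene in this sample
        log_ratios = [math.log(s[g]) - lg
                      for g, lg in enumerate(log_geo_means) if lg is not None]
        if not log_ratios:
            raise ValueError("every gene has a zero count in some sample")
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def rank_by_fold_change(table, pseudocount=1.0):
    """Rank genes by |log2(mean normalized B / mean normalized A)|, largest first.

    table: {gene_id: ([a1, a2], [b1, b2])} (hypothetical layout).
    Returns a list of (gene_id, log2_fold_change) tuples.
    """
    genes = list(table)
    # One count list per sample, in the order A1, A2, B1, B2.
    samples = [[table[g][grp][rep] for g in genes]
               for grp in (0, 1) for rep in (0, 1)]
    factors = size_factors(samples)
    results = []
    for i, gene in enumerate(genes):
        norm = [samples[j][i] / factors[j] for j in range(4)]
        mean_a = (norm[0] + norm[1]) / 2
        mean_b = (norm[2] + norm[3]) / 2
        # The pseudocount avoids dividing by zero or taking log2(0).
        lfc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
        results.append((gene, lfc))
    results.sort(key=lambda r: abs(r[1]), reverse=True)
    return results
```

The median is used because it is robust: as long as most genes are not differentially expressed, a handful of strongly changing genes barely moves the size factors.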
