Category: genomics

Lab 2: Genome Annotation and Comparison

Why use Prokka? In a benchmark test, it was shown to be as accurate as or more accurate than RAST or xBASE2 at reproducing known annotation in most annotation categories. It can run on a normal laptop for a small genome and is easy and convenient to use. Setup: Prokka is not installed on the hpc-student cluster yet, so we’re going to learn how to install a program for your own use on the cluster. Jon Halter in University Research…

UPDATED: Sequence Assembly

I’ve asked Juan to upload data to the class directory on the Centaurus cluster (hpc-student.uncc.edu). So instead of transferring from Dropbox, you can go directly to the cluster and copy the data you need into your working directory. You will find: sequence reads from one of the better chloroplast genome samples generated with our Ion Torrent instrument; ERR008613 (a set of paired-end Illumina sequence reads from ends of 200 bp E. coli fragments); ERR022075 (a set of paired-end Illumina…

Updated: NGS QC

In this exercise we will focus primarily on quality analysis and quality control of Illumina sequencing data, since that is the type of NGS data you are currently most likely to encounter in new datasets. You can view older versions of this exercise for tips on how to handle Ion Torrent or 454 data if you encounter that in your work. What sequence data looks like: the FASTQ file. NGS read files tend to be distributed in *.fastq format. To…
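As a rough illustration of what a FASTQ file holds, here is a minimal Python sketch that reads four-line records (header, sequence, "+" separator, quality string) and converts the quality characters to numeric scores. It assumes plain, unwrapped four-line records and Phred+33 quality encoding; the function name read_fastq and the example read are made up for illustration and are not part of the lab.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


# Hypothetical one-read example, not real data
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```

For real QC work you would normally rely on a dedicated tool rather than a hand-written parser; the sketch is only meant to show how each record is laid out.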

New computer setup: Apple

Here’s what I do, in order, to set up my computer so that it can install the software we use in the class. Disclaimer: I just got a new work laptop, so it doesn’t have much of an environment on it yet. I can’t know what you’ve done to your computer in the past, and so some of these things may not work for you if your environment’s already like this: 1. Administrator privileges 2. xcode: Install the xcode libraries that…

Read mapping and simple variants

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); location of SNPs and other variations in one genome relative to the other; location of expressed transcripts (coding mRNAs, noncoding RNAs such…

Microbial community analysis with QIIME2

This tutorial makes use of the data from the NC Urban Microbiome Project, a collaboration seeded by the Department of Bioinformatics and Genomics and involving participants from our department as well as Civil Engineering, Biology, and Geography and Earth Science. Our goal in lab this week is to analyze 16S ribosomal RNA sequences from mixed microbial samples using QIIME2. The QIIME2 analysis will tell us what identifiable microbes are present in the samples (usually at the genus level rather than…

Whole genome shotgun metagenomics with MetaPhlAn

Like last week’s tutorial, this tutorial uses Urban Environmental Genomics Project data. The original version of the tutorial was developed by Anju Lulla for our student interns. Preparation and software installation: You can use MetaPhlAn on the cluster, and that’s probably the best idea. If you have a reasonably powerful laptop, you can run it there on these small data sets. You can download MetaPhlAn by cloning it with Mercurial (hg): hg clone https://bitbucket.org/biobakery/metaphlan2. Note: this will put metaphlan2’s directory as a…

Gene expression analysis with DESeq

With many thanks to Anju Lulla: this is a modification of a protocol she used for the paper we are working on with our collaborators. To start off this lab, you should have an output file from featurecounts with five columns: the gene ID column, followed by counts from both samples in group A, then counts from both samples in group B. There are two ways to run this analysis. You can use R at the command line, or you…
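To make the five-column layout concrete, here is a minimal Python sketch that reads such a table and sums the counts per group. It assumes a tab-delimited file with no header row; the function name read_counts, the gene names, and the numbers are hypothetical. It only checks the input layout and is not a differential expression analysis.

```python
import csv
import io


def read_counts(handle):
    """Parse rows of: gene ID, two group A counts, two group B counts."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        gene_id, a1, a2, b1, b2 = row
        table[gene_id] = {"A": (int(a1), int(a2)), "B": (int(b1), int(b2))}
    return table


# Hypothetical two-gene example, not real data
example = io.StringIO("geneX\t10\t12\t40\t44\ngeneY\t5\t7\t6\t6\n")
counts = read_counts(example)
totals = {gene: (sum(c["A"]), sum(c["B"])) for gene, c in counts.items()}
print(totals)
```

A quick check like this can catch a file with the wrong number of columns before it goes into the downstream analysis, since the unpacking step fails on any row that does not have exactly five fields.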

Expression quantitation with featurecounts

In this lab and the next, we are going to use two different methods to calculate differential expression for the same RNA-Seq dataset. Note: there is no standalone lab writeup due for the featurecounts part. Instead, you’ll do one writeup that covers both this part and the differential expression analysis with DESeq. In a nutshell, we have measured gene expression under two conditions (two replicates each) and we want to find out which genes are the most significantly differentially expressed…

Read mapping and variant calling

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); location of SNPs and other variations in one genome relative to the other; location of expressed transcripts (coding mRNAs, noncoding RNAs such…
