Category: genomics

Read mapping and simple variants

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); location of SNPs and other variations in one genome relative to the other; location of expressed transcripts (coding mRNAs, noncoding RNAs such…

Genome annotation with prokka

Why use Prokka? First, because a benchmark test showed it to be at least as accurate as RAST or xBASE2 at reproducing known annotation in most annotation categories. Second, because it’s fast: you can run it on a standard laptop in a short time, whereas a genome sent to the RAST server for annotation can take a day or so to come back. If you’re interested in learning to use the RAST server, search the site…

Sequence Assembly

For today’s class, I have prepared a shared Dropbox folder that contains the following items: sequence reads from one of the better chloroplast genome samples generated with our Ion Torrent instrument; ERR008613 (a set of paired-end Illumina sequence reads from the ends of 200 bp E. coli fragments); ERR022075 (a set of paired-end Illumina sequence reads from the ends of 600 bp E. coli fragments); and sets of PacBio CCS and CLR reads for E. coli. In the first part of the exercise,…

NGS QC

In this exercise we will focus primarily on quality analysis and quality control of Illumina sequencing data, since that is the type of NGS data you are currently most likely to encounter in new datasets. You can view older versions of this exercise for tips on how to handle Ion Torrent or 454 data if you encounter that in your work. What sequence data looks like: the FASTQ file. NGS read files tend to be distributed in *.fastq format. To…

Microbial community analysis with QIIME2

This tutorial makes use of the data from the NC Urban Microbiome Project, a collaboration seeded by the Department of Bioinformatics and Genomics and involving participants from our department as well as Civil Engineering, Biology, and Geography and Earth Science. Our goal in lab this week is to analyze 16S ribosomal RNA sequences from mixed microbial samples using QIIME2. The QIIME2 analysis will tell us what identifiable microbes are present in the samples (usually at the genus level rather than…

Whole genome shotgun metagenomics with MetaPhLan

Like last week’s tutorial, this tutorial uses Urban Environmental Genomics Project data. The original version of the tutorial was developed by Anju Lulla for our student interns. Preparation and software installation: you can use MetaPhlAn on the cluster, and that’s probably the best idea. If you have a reasonably powerful laptop, you can run it there on these small data sets. You can download MetaPhlAn by cloning it with Mercurial (hg): hg clone https://bitbucket.org/biobakery/metaphlan2. Note: this will put metaphlan2’s directory as a…

Gene expression analysis with DE-Seq

With many thanks to Anju Lulla: this is a modification of a protocol she used for the paper we are working on with our collaborators. To start off this lab, you should have an output file from featureCounts with five columns: the gene ID column, followed by counts from both samples in group A, then counts from both samples in group B. There are two ways to run this analysis. You can use R at the command line, or you…

Expression quantitation with featurecounts

In this lab and the next, we are going to use two different methods to calculate differential expression for the same RNA-Seq dataset. Note: there is no standalone lab writeup due for the featureCounts part. Instead, you’ll do one writeup that covers both this part and the differential expression analysis with DESeq. In a nutshell, we have measured gene expression under two conditions (two replicates each) and we want to find out which genes are the most significantly differentially expressed…

Read mapping and variant calling

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); location of SNPs and other variations in one genome relative to the other; location of expressed transcripts (coding mRNAs, noncoding RNAs such…

Pangenome analysis with PanX

PanX is a program for pangenome analysis and production of core genome phylogenies. The PanX analysis tools are available as a GitHub package for custom genome analyses, and also as a web server. The PanX authors make a collection of precomputed comparisons available here. One of the demonstration datasets available at the PanX site is a collection of Prochlorococcus marinus genomes sequenced by Biller et al. (2014). In lecture last week, we also looked at a study of Prochlorococcus by…
