Browsed by
Category: genomics

Pangenome analysis with PanX

PanX is a program for pangenome analysis and production of core genome phylogenies. The PanX analysis tools are available as a GitHub package for custom genome analyses, and also as a web server. The PanX authors make a collection of precomputed comparisons available here. One of the demonstration datasets available at the PanX site is a collection of Prochlorococcus marinus genomes sequenced by Biller et al. (2014). In lecture last week, we also looked at a study of Prochlorococcus by…

Genome Comparison with Mauve

There are many ways to compare genomes, and these comparisons provide different kinds of information about evolutionary history and shared function. First, how do you decide which genes are “the same” across multiple genomes, i.e. orthologs: genes that have a common ancestor and are related by speciation? Second, which parts of the nucleotide sequence of multiple genomes align, irrespective of gene boundaries and orthology relationships? Finally, how do we decide what genes “are” and what they do to help interpret the meaning…

Access and use genome track data

Genome browsers are designed to bring different types of genome track data together using the common reference system of genomic coordinates. Often you’re more interested in manipulating whole sets of data than in looking at one region at a time manually, so there are tools available both online and at the command line that will help you with genome track math. The purpose of today’s lab is to get you familiar with genome browsers and track types, and experienced…

Sequence Read QC

DNA sequencing is a continually evolving technology, with new platforms becoming available each year, all designed with the aim of reducing the cost and increasing the speed of large sequencing projects. As you can see from the figure below, certain types of sequencing dominate the current market. The top three instruments shown are established high- and moderate-capacity Illumina sequencing instruments, the next two are Ion Torrent instruments, and the two below that are newer Illumina instruments. Beyond that there is a…

Prepare Your Computer

Many of the assignments in BINF 6203 are scaled so that you can complete them on a reasonably powerful laptop without using the University Research Computing cluster. That said, in order to do the assignments you are going to need to prepare your computer by installing certain tools. I am going to assume that you’re working on an Apple computer running OS X, because that’s the type of computer I have available to set up and test, but the general requirements…

BINF 6203: 16S rRNA classification with QIIME

This tutorial makes use of the data from the NC Urban Microbiome Project, a collaboration seeded by the Department of Bioinformatics and Genomics and involving participants from our department as well as Civil Engineering, Biology, and Geography and Earth Science. Many thanks to Kevin Lambirth, who created the original version of this tutorial for our lab interns. This tutorial uses QIIME 1.9 — there is a newer QIIME and we are all in the process of learning it, but it…

BINF 6203: Read Mapping

Mapping short sequence reads to a reference sequence is a common task in genomics. Many different results can be extracted from a mapped sequence, depending on the original experimental design that produced the sequence reads and on the analysis that follows the mapping. For example: a genomic consensus for an individual (against the reference genome for that species); location of SNPs and other variations in one genome relative to the other; location of expressed transcripts (coding mRNAs, noncoding RNAs such…

BINF 6203: Bacterial genome annotation with prokka

In previous iterations of this course, we’ve used the RAST/SEED servers to perform annotation. RAST is a highly regarded genome annotation pipeline, and you will easily be able to justify using it to reviewers if you are ever writing a bacterial genome paper. However, it can be cumbersome to use the myRAST interface, the command line tools are not particularly well documented, and it can take a day or more to run. So this year we’re going to try a newer…

UEGP: Detecting antibiotic resistance determinants

One of the main sequence analysis tasks we want to perform on the UEGP dataset is the evaluation of antibiotic resistance potential in the wastewater microbial community. We have examples of analytical approaches to this problem in several prior studies. A very straightforward approach would be to use the CARD database and associated RGI tools to identify components of the “resistome”. However, RGI works on translated protein sequences, ignores very short sequences, is optimized for genomes and genomic contigs, and has a 20 MB…

Summer Research: Urban Environmental Genomics Project

This summer, my students and I are working through analysis of wastewater microbiome sequencing data. The analysis includes 3 timepoints, 11 sampling points, 2 wastewater treatment streams, and 3 replicates of each sample, for 198 samples in total. For each sample we have 16S ribosomal and whole-genome metagenome sequence data, and mass spec analysis for 10 different antibiotics of interest. Samples were collected by students in areas made accessible to us by Charlotte Water, and sequenced at DHMRI. We need to do several different types…
