Browsed by
Category: BINF 2111

Got a Fresh Apple?

How do I get my computer ready for Bioinformatics 2111? I just got a new Apple laptop, so I get to make a fresh start. By the time I get through four years with a machine, I’ve generally installed and tweaked so many bioinformatics programs and supporting libraries (and probably installed some of them via obsolete tools) that I literally cannot tell students how to replicate my working environment exactly. As always, xkcd knows how it is:…

BINF 2111: Genome Horoscope (eye color — part 2)

Your mission for today is to write a script that checks a 23andMe genotype file for several groups of SNPs that are associated with blue eyes. The SNPs in question are in the file “blueeyepanel.txt” in the Dropbox. Each line of the file has the format “genotype, SNP, haplotype block”. Genotype is the derived genotype common in people with the blue eye variant discussed in the last post. SNP is the SNP identifier. Haplotype block is the code name for…
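A minimal sketch of such a check, assuming the panel file is comma-separated in the order given above and that the genotype file uses the usual 23andMe raw-data layout (tab-separated rsid, chromosome, position, genotype, with “#” comment lines). The function names, SNP identifiers, and block names are invented for illustration and are not the real panel contents.

```python
import io
from collections import defaultdict


def read_panel(handle):
    """Return {rsid: (expected_genotype, block)} from 'genotype, SNP, block' lines."""
    panel = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        genotype, rsid, block = [field.strip() for field in line.split(",")]
        panel[rsid] = (genotype, block)
    return panel


def read_genotypes(handle, wanted):
    """Return {rsid: genotype} for the rsids in `wanted`, skipping '#' comments."""
    calls = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")[:4]
        if rsid in wanted:
            calls[rsid] = genotype
    return calls


def matches_by_block(panel, calls):
    """Count, per haplotype block, how many panel SNPs carry the expected genotype."""
    summary = defaultdict(lambda: [0, 0])  # block -> [matches, total]
    for rsid, (expected, block) in panel.items():
        summary[block][1] += 1
        # Compare sorted letters so that "AG" and "GA" count as the same genotype.
        if sorted(calls.get(rsid, "")) == sorted(expected):
            summary[block][0] += 1
    return {block: tuple(counts) for block, counts in summary.items()}


# Invented example data (hypothetical rsids and block names).
panel_text = "GG, rs0000001, blockA\nCC, rs0000002, blockA\nTT, rs0000003, blockB\n"
genome_text = (
    "# comment line\n"
    "rs0000001\t15\t100\tGG\n"
    "rs0000002\t15\t200\tCT\n"
    "rs0000003\t15\t300\tTT\n"
)

panel = read_panel(io.StringIO(panel_text))
calls = read_genotypes(io.StringIO(genome_text), panel)
result = matches_by_block(panel, calls)
print(result)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles; the comparison logic stays the same.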

BINF 2111: Genome Horoscope — eye color (Part 1)

In this week’s exercise, we’re going to practice loops and lists in Python by looping through a 23andMe file to pull out data that matches a list of SNPs of interest. Before we get into the coding, I think it’s worth exploring how we arrive at such a list. In order to write a useful program, you need to understand the problem it’s solving.

The raw information

As I mentioned previously, I have access to my own 23 and…
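The loop-and-list idea from the start of this excerpt can be sketched as follows, assuming the usual tab-separated 23andMe raw-data layout (rsid, chromosome, position, genotype). The SNP identifiers and genotype lines are placeholders, not the list derived in this post.

```python
# Placeholder identifiers, invented for this example.
snps_of_interest = ["rs0000002", "rs0000004"]

# Stand-in for the lines of a genotype file: rsid, chromosome, position, genotype.
lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\t2\t1500\tCC",
    "rs0000004\t2\t2500\tTT",
]

hits = []
for line in lines:
    if line.startswith("#"):
        continue  # skip header and comment lines
    fields = line.split("\t")
    if fields[0] in snps_of_interest:
        # Keep the identifier together with its genotype call.
        hits.append((fields[0], fields[3]))

print(hits)
```

For a long list of SNPs, turning `snps_of_interest` into a `set` makes each membership test faster, but a list keeps the example close to the exercise.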

BINF 2111: Do SNPs make a difference

In the next script, we’ll combine the information from a genomic FASTA file, a GFF file, and a VCF file to see if the variants change the protein sequence. The big challenge in this problem is integrating three sources of information that don’t all use the same unique identifiers. The FASTA file has the actual sequence (which can be stored as one big string). The GFF CDS records have the coordinates of the protein-coding regions. The VCF has the location and sequence of SNPs or…
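The coordinate bookkeeping at the heart of this problem might look like the sketch below. It makes several simplifying assumptions that are not from the post: a single forward-strand CDS with phase 0, a single-base substitution, and 1-based inclusive coordinates (as GFF and VCF both use). The function name and the toy sequence are invented.

```python
def codon_change(sequence, cds_start, cds_end, snp_pos, alt):
    """Return (ref_codon, alt_codon) for a SNP inside a forward-strand CDS, or None.

    All coordinates are 1-based and inclusive; `sequence` is the whole
    reference as a Python string (0-based), so indexes are shifted by one.
    """
    if not cds_start <= snp_pos <= cds_end:
        return None  # the variant lies outside this CDS
    offset = snp_pos - cds_start                 # 0-based offset into the CDS
    codon_start = cds_start + (offset // 3) * 3  # 1-based start of the affected codon
    ref_codon = sequence[codon_start - 1:codon_start + 2]
    within = offset % 3                          # position of the SNP inside the codon
    alt_codon = ref_codon[:within] + alt + ref_codon[within + 1:]
    return ref_codon, alt_codon


genome = "CCATGGCCTAAGG"  # invented sequence; positions 3..11 spell ATGGCCTAA
change = codon_change(genome, 3, 11, 7, "T")
print(change)
```

Translating the two codons and comparing the amino acids would then show whether the variant changes the protein.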

BINF 2111: Protein translation

Over the last few days, we’ve learned several pieces of Python that add up to a script that translates DNA into protein. Today’s lab is a bit of a test: can you take what you’ve learned piece by piece and combine it to solve a problem? Today we wrote a script that turns a genetic code table downloaded from NCBI:

    AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
    Starts = ---M---------------M------------MMMM---------------M------------
    Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
    Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
    Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

into…
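The excerpt cuts off before saying what the table is turned into. One common use of a table in this layout, offered here only as an assumption, is a codon-to-amino-acid dictionary built by reading the five rows column by column. The variable and function names are illustrative.

```python
# Rows of the NCBI-style genetic code table, copied as strings.
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
base1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
base2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
base3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# Column i of the three Base rows spells a codon; column i of AAs is its amino acid.
codon_table = {}
for b1, b2, b3, aa in zip(base1, base2, base3, aas):
    codon_table[b1 + b2 + b3] = aa


def translate(dna):
    """Translate dna codon by codon from position 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Codons with unexpected letters (such as N) become "X".
        protein.append(codon_table.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Here `translate("ATGGCCTAA")` gives `MA*`, where `*` marks a stop codon as in the AAs row.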

BINF 2111: Genome Horoscope, Inc.

Last week, you made a script that takes a “panel” of SNPs where a particular variant allele is associated with blue eyes in Europeans, parses an individual’s 23andMe results to check their SNP status for each site in the panel, and reports back to the user. This week, I want you to modify your “Genome Horoscope” project with a couple of goals in mind: Define functions that do distinct parts of the script’s work. Make the script a little…

BINF 2111: Course plan

Welcome to Bioinformatics 2111/2111L. In this course, you’ll learn how to write simple executable UNIX scripts that automate “pipelines” of bioinformatics software, and how to write scripts for genomic data analysis in Python. The course is structured so that you first learn to work in the UNIX environment by writing shell scripts. Shell scripting will familiarize you with variables and elements of control flow, and you’ll also learn something about how to assemble genome sequences. Then we’ll move into Python…

BINF 2111: Opening and writing files in python

Last week, we covered conditionals and comparators in Python, and this morning we used them to test for sequence length, GC content, and presence of an “adapter”:

    # Ask the user for a sequence and count its C bases
    DNASeq = raw_input("Enter a DNA Sequence: ")
    ccount = DNASeq.count("C")
    length = len(DNASeq)
    # float() forces real division rather than integer division
    cfrac = float(ccount) / length
    if cfrac < 0.25:
        print("C count is abnormally low!")

See the slides for the rest of the examples if you need them. Today, we saw how to find the location of a pattern in…
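For anyone working in Python 3, where `raw_input` was renamed `input` and `/` always performs real division, the C-count check from this excerpt might be wrapped in functions like this. The function names, the empty-sequence guard, and the “looks normal” message are additions for illustration, not part of the class example.

```python
def c_fraction(seq):
    """Return the fraction of bases in seq that are C; 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on empty input
    return seq.count("C") / len(seq)


def check_c_content(seq, threshold=0.25):
    """Return a message describing whether the C fraction falls below threshold."""
    if c_fraction(seq) < threshold:
        return "C count is abnormally low!"
    return "C count looks normal."


print(check_c_content("ATATATGC"))
print(check_c_content("CCGG"))
```

To read the sequence interactively, call `check_c_content(input("Enter a DNA Sequence: "))`.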

BINF 2111: Exploring SNPs with bedtools (Lab)

We’re not quite ready to launch into full-on Python scripting just yet, so here’s a little add-on to your final bash project to help you make use of the output files you generate.

Why is it important?

Information about genomes is organized in “tracks”. The genome itself is kind of like a ruler, a straight continuous piece of sequence with linear coordinates. The position of different kinds of features on that “ruler” is contained in files like GFF files (annotations of…

BINF 2111: Variant calling workflow (Homework)

Now that you’ve implemented one workflow in a bash script, the challenge is to take the skills you’ve learned and implement a different script, on your own. The workflow that we’ll use is variant calling from the chloroplast data. The end product will be a collection of variant call files (VCF format) which can be displayed in a genome browser to show you the position of your variants. You can definitely repurpose your assembly script to carry out this workflow…
