Category: BINF 2111

BINF 2111: Variant calling workflow (Homework)

Now that you’ve implemented one workflow in a bash script, the challenge is to take the skills you’ve learned and implement a different script, on your own. The workflow that we’ll use is variant calling from the chloroplast data. The end product will be a collection of variant call files (VCF format) which can be displayed in a genome browser to show you the position of your variants. You can definitely repurpose your assembly script to carry out this workflow…

BINF 2111: looping, user input, and complex conditionals

Today you are going to continue to work with your genome assembly script. By the end of the lab period, you should have implemented the following features: 1) the script loops through multiple *.fastq files, parsing name variables for use at subsequent steps out of the filename, as shown in class on Tuesday; 2) the script prompts the user for a reference genome basename; 3) the script then tests to see if both the needed files are present, with different outcomes depending on the result of the test. Since…
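
The three features listed above could be sketched roughly as follows. This is a minimal illustration and not the script from class: the function names (sample_name, have_reference, process_all) are hypothetical, and the .fasta and .gff extensions are placeholders, since the excerpt does not say which two reference files are needed.

```shell
#!/bin/bash

# Strip the directory and the .fastq extension to get a sample name.
sample_name() {
    basename "$1" .fastq
}

# Succeed only if both reference files exist.
# ASSUMPTION: the .fasta and .gff extensions are placeholders for illustration.
have_reference() {
    [ -f "$1.fasta" ] && [ -f "$1.gff" ]
}

# Loop over every FASTQ file in the current directory, but only
# if the reference files for the given basename are present.
process_all() {
    refbase=$1
    if have_reference "$refbase"; then
        for fq in *.fastq; do
            [ -e "$fq" ] || continue   # skip the literal pattern if nothing matched
            echo "Would process sample $(sample_name "$fq") against $refbase"
        done
    else
        echo "Missing reference files for '$refbase'" >&2
        return 1
    fi
}

# Interactive use (commented out so the sketch does not wait for input):
# read -r -p "Reference genome basename: " refbase
# process_all "$refbase"
```

Uncommenting the last two lines turns the sketch into an interactive script that asks for the basename and then either loops over the reads or reports the missing files.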

BINF 2111: Know your UNIX, part 2 (Homework)

For your homework (due next Thursday), I want you to go all the way to the end of the book to Chapter 16 and learn the following commands. I mentioned “cut” and “paste” in class and we will use them in our script tomorrow, but there are a few other useful ones in Ch. 16 as well: cut, paste, sort, and uniq. Also do the advanced grep exercises, and review head and tail. These exercises are on pages 299-308 in the book…
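
As a quick preview of what these commands do, here is a small sketch on made-up data. The file names and the sample table are hypothetical and are not taken from the book's exercises.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical tab-separated table: sample name, then organism.
printf 's1\tmaize\ns2\trice\ns3\tmaize\ns4\trice\ns5\tmaize\n' > "$workdir/samples.tsv"

# cut pulls out column 2, sort puts identical values next to each other,
# and uniq collapses each run of duplicates to a single line.
organisms=$(cut -f2 "$workdir/samples.tsv" | sort | uniq)
echo "$organisms"

# head and tail show the first and last lines of a file.
first=$(head -n 1 "$workdir/samples.tsv" | cut -f1)
last=$(tail -n 1 "$workdir/samples.tsv" | cut -f1)
echo "First sample: $first, last sample: $last"

# paste glues files together column by column; here the two columns are swapped.
cut -f1 "$workdir/samples.tsv" > "$workdir/names.txt"
cut -f2 "$workdir/samples.tsv" > "$workdir/organisms.txt"
swapped=$(paste "$workdir/organisms.txt" "$workdir/names.txt" | head -n 1)
echo "$swapped"
```

Note that uniq only removes adjacent duplicates, which is why sort comes first in the pipeline.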

BINF 2111: Variables and conditionals in bash

A good script should be reusable, efficient, clearly organized, correct in how it performs its task, and free of errors and bugs. Today you’re going to learn two key elements of bash scripting: use of variables, and use of a simple conditional test. We’ll also see how you can use the UNIX pipe operator to simplify your script and cut down on intermediate files, and how you can use the echo command to make your script inform you about its progress.
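
A minimal sketch of those four ideas together (a variable, a conditional test, a pipe, and echo) might look like this. The FASTA file and its contents are invented for the example and are not the course data.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# A variable holding a file name. The tiny FASTA file is hypothetical example data.
fasta="$workdir/example.fasta"
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$fasta"

# The pipe sends grep's output straight into wc, so no intermediate file is written.
# tr removes the padding that some versions of wc add around the number.
count=$(grep '^>' "$fasta" | wc -l | tr -d ' ')

# echo reports progress to the user.
echo "Found $count sequences in $fasta"

# A simple conditional test on the variable.
if [ "$count" -gt 0 ]; then
    status="ok"
    echo "File looks usable, continuing."
else
    status="empty"
    echo "No sequences found, stopping."
fi
```

Storing the file name in a variable is what makes the script reusable: pointing it at a different file means changing one line.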

BINF 2111: Assembly at the Command Line (Lab)

Today’s lab task is preparation for developing a script that will assemble a batch of small genome sequence data sets. In our script development process there are four main steps: 1) figure out the pipeline that we want to execute, and why (covered in lecture on Tuesday); 2) run the series of steps in the pipeline by hand to make sure that they work, and make sure we understand the outputs; 3) write a script that will do exactly what we just did…

BINF 2111: Know your UNIX, part 1

Before we move on to scripting, we’ve got to get a handle on the basic UNIX commands covered in Chapters 4 and 5 of the book. You can’t write a script until you understand the commands you want to give to the computer! What you already know from the first day of class: review! You know how to open a shell window on your computer. You know how to find out where you are in the filesystem using pwd. You…

BINF 2111: Know your UNIX (part 3)

The homework for next week (due next Tuesday) is an opportunity to practice writing a few small scripts. Work through the exercises on pp. 88-98 of your book. You will practice making scripts executable using chmod, generating scripts automatically by editing file lists, renaming collections of files, and automating the curl command to get a list of references from a public database called CrossRef. Remember that you can get the example files for the textbook here. What you will turn in:…
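
Two of these tasks, making a script executable and renaming a collection of files, can be previewed with a short sketch. The file names are hypothetical, and the renaming here uses a simple for loop, which is not necessarily the approach the book's exercises take.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files to rename.
touch sample1.txt sample2.txt

# Write a tiny script, then give its owner permission to execute it.
printf '#!/bin/sh\necho "hello from a script"\n' > hello.sh
chmod u+x hello.sh
[ -x hello.sh ] && echo "hello.sh is now executable"

# Rename every .txt file to .dat. ${f%.txt} is the name with the .txt suffix removed.
for f in *.txt; do
    mv "$f" "${f%.txt}.dat"
done
ls
```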

BINF 2111: Assembly at the command line

The most common things that you’ll want to do with shell scripts in bioinformatics are 1) data manipulation and 2) driving programs to run automatically, collecting their output, and feeding it into other programs. To successfully build a script that drives a bioinformatics pipeline, you need to understand the process you’re looking to execute; identify appropriate programs for the process and for the data you have; and understand the inputs, outputs, and tuneable parameters of each program. Then you need to…
