BINF 6215: Taming Illumina datasets

As you’re uploading files to Galaxy or waiting for jobs to finish, you may be wondering, “How can I make this go faster?” Scientists don’t like to throw out data, but next-generation sequencing generates enormous data sets, and you don’t necessarily want to use full-sized data just for learning a process. So here are a couple of tips for reducing the size and complexity of data sets, especially when you’re practicing on them. Subsampling is good for just randomly…
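
As a rough illustration of random subsampling, here is a minimal sketch that keeps each four-line FASTQ record with a fixed probability. It is an assumption-laden example rather than the method from the full post: the file names `reads.fastq` and `subsample.fastq`, the seed, and the fraction `p` are all hypothetical, and the input is a tiny stand-in file generated on the spot.

```shell
#!/bin/sh
# Build a tiny stand-in FASTQ file (100 reads, hypothetical name) in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
i=1
while [ "$i" -le 100 ]; do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
    i=$((i + 1))
done > reads.fastq

# Decide once per 4-line record whether to keep it (probability p),
# then print all four lines of the kept records.
# A fixed seed makes the draw repeatable with the same awk.
awk -v seed=42 -v p=0.25 '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < p) }
    keep
' reads.fastq > subsample.fastq

# Report how many reads survived.
echo $(( $(wc -l < subsample.fastq) / 4 ))
```

This sketch assumes single-end reads with exactly four lines per record. For paired-end data, the same reads have to be selected from both files, so the two files cannot be sampled independently.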

BINF 6215: RNA-Seq in CLC Genomics

OK, we’ve done a few exercises now where everything’s spelled out for you. This time you’re on your own. You have access to a variety of bacterial RNA-Seq data sets. Legitimate things to do with transcriptomes include: differential expression analysis, parts (1)(2)(3); de novo assembly and analysis; and discovery of novel transcripts.

BINF 6215: Datasets

For practicing bioinformatics techniques in class, you can use any data set that’s not beyond the capacity of your computer. In practice that means that you probably want to use a bacterial NGS data set.

BINF 6215: Get your own Galaxy

If you want some flexibility about which tools are available in your Galaxy instance, you can set up your own local instance. Remember that whatever you do with a local Galaxy instance will be limited by the available memory and processor speed of your computer, so a genome assembly that won’t run quickly at your command line isn’t going to be sped up just by virtue of putting it in Galaxy. But it’s a great way to…

BINF 6215: Know your UNIX

Before we move on to scripting, we’ve got to get a handle on the rest of the basic UNIX commands covered in Chapters 4 and 5 of the book.

What you already know

On the first day of class, we took a look at commands for figuring out where you are, setting up your shell environment, and moving files. This morning, we looked at the simplest way to “automate” a repetitive task in UNIX: by using wildcards in commands and…

BINF 6215

This is the game plan for BINF 6215 — Bioinformatics Pipeline Programming. Links to these posts will also be provided in Moodle, but this post will serve as a directory for the course. Links will go live as the tutorials are ready. The recommended textbook is Practical Computing for Biologists by Haddock and Dunn. We are not going to work through the entire book, but there are some chapters in there that will be a useful reference for you in your…

BINF 6215: ChIP-Seq workflow in CLC

On the first day of class, we downloaded a small ChIP-Seq data set and used it to practice using the SRA toolkit. Now we’re going to go back and analyze that data set. Your first step should be to go through the “Basic” and “Advanced” ChIP-Seq tutorials and see what kind of information you can extract from a single ChIP-Seq data set. In the ChIP-Seq analysis world, a “control” has a special meaning. It’s not just a comparison between conditions; the control…

BINF 6215: Intro to UNIX — automating a repetitive task

Let’s say you’ve got a whole bunch of fastq.gz files that you want to get md5 checksums for so that you can upload them into the Sequence Read Archive (SRA). You could, of course, type “md5 filename” and wait, type “md5 filename2” and wait, and so on until you were done. But in UNIX, you do not have to do that. This lesson introduces three concepts that are important for UNIX file manipulation: wildcards, standard output, and redirection of output to a file.
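
The three concepts fit together in a single command. The sketch below is a minimal illustration, not the lesson's own walkthrough: it creates three small stand-in files with hypothetical names (`sampleA.fastq.gz` and so on), and the output name `checksums.txt` is likewise an assumption. Because the checksum tool is called `md5` on macOS and `md5sum` on most Linux systems, the script uses whichever one it finds.

```shell
#!/bin/sh
# Work in a scratch directory with a few small stand-in files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for sample in sampleA sampleB sampleC; do
    printf '@read1\nACGT\n+\nIIII\n' | gzip > "${sample}.fastq.gz"
done

# Pick whichever checksum tool is installed: md5sum (Linux) or md5 (macOS/BSD).
if command -v md5sum >/dev/null 2>&1; then
    md5tool=md5sum
else
    md5tool=md5
fi

# The wildcard *.fastq.gz expands to every matching file name.
# The tool writes one checksum line per file to standard output,
# and > redirects that output into a file instead of the terminal.
$md5tool *.fastq.gz > checksums.txt

# Confirm there is one line per input file.
wc -l < checksums.txt
```

The shell expands the wildcard before the checksum tool ever runs, so the tool simply receives a list of file names. Using `>>` instead of `>` would append to `checksums.txt` rather than overwrite it.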

BINF 6215: Deposit your data in the SRA (Part 3: SRA Submission)

Step 3: Deposit data in the SRA. Now that you have your BioProject and BioSample identifiers, you are ready to deposit your data. You will need to create a username and password on the submission page. Once you are “in”, create a new submission. You’ll then add experiments to your submission. In my experiment, we have sixteen samples for four bacterial strains under two conditions of interest. The replicate samples here represent different biological preps, not just different sequencing runs…

BINF 6215: Deposit your data in the SRA (Part 2: BioProject)

Step 2: Set up your BioProject. Once you have your BioSamples set up, the next step is to create a BioProject. First you’ll add your contact information. For the class, we are going to look at this process but then create only one submission using my contact information.