Getting your 23 and Me data into Galaxy

I just happen to have a file with 960,628 lines of personal SNP data from 23 and Me burning a hole in my hard drive. I’m one of the lucky people who gets 23 and Me’s extended health information because I bought my genotype at the right time, before the FDA got all handsy. But if I wanted to do more with that information on my own, could I?

  • If I wanted to search for SNPs and traits one by one, I could use SNPedia.
  • I could do the same thing in the UCSC Genome Browser.
  • If I wanted to start from a disease perspective, I could use OMIM.
  • If I wanted to search for SNPs that have an association in a GWAS, I could use GOCI, and search either by broad category or by keyword.
  • If I wanted to really get into an interface that delivered me more detail than I could process with my eyeballs, I could use dbSNP.

Next time I’ll write about each of these sources and what you can learn there, but first, let’s do something a little more high-throughput. Because 960,628 is a lot of one-by-one.

23andme3

 

Convert your data to VCF format

The first thing that we need to do is get this special 23 and Me file into a form Galaxy can deal with. Surely someone has wanted to do this before, right? Well, some guy called arrogantrobot has created a Perl script that converts a 23 and Me file into a VCF file, which is a file format that Galaxy definitely traffics in. To get it, you just:

git clone https://github.com/arrogantrobot/23andme2vcf

in a directory where you’ve got write access. Then, run it on your data:

perl ../23andme2vcf/23andme2vcf.pl genome_Cynthia_Gibas_Full_20140719094157.txt genome_Cynthia_Gibas_Full_20140719094157.vcf

(Make sure that a copy of the reference genome file that’s distributed with 23andme2vcf is sitting in your working directory.)
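If you're curious what the script is chewing on, here is a minimal Python sketch for peeking at the raw file first. It assumes the usual 23 and Me raw-data layout (comment lines starting with `#`, then tab-separated rsid, chromosome, position, genotype, with `--` marking a no-call). The function name `summarize_raw` and the toy rows are made up for illustration, and this is not part of 23andme2vcf.

```python
import io
from collections import Counter


def summarize_raw(lines):
    """Tally SNPs per chromosome and count no-calls in 23 and Me raw data.

    Assumes the usual raw-data layout: '#' comment lines, then
    tab-separated rsid, chromosome, position, genotype.
    """
    per_chrom = Counter()
    no_calls = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and the comment header
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # not a data row we recognize
        _rsid, chrom, _pos, genotype = fields
        per_chrom[chrom] += 1
        if "-" in genotype:
            no_calls += 1  # '--' marks a SNP with no call
    return per_chrom, no_calls


# Invented toy rows, not real genotype data.
toy_raw = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\t2\t1500\t--\n"
    "rs0000004\tX\t3000\tC\n"
)
raw_per_chrom, raw_no_calls = summarize_raw(toy_raw)
print(dict(raw_per_chrom), raw_no_calls)
```

To run it on real data, pass an open file handle for your own raw download instead of `toy_raw`. If the per-chromosome totals look wildly off, that's worth sorting out before converting.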

The contents of your output VCF file should look like this:

23andme13
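A quick way to sanity-check the converted file before uploading it is to tally the genotype calls. The sketch below assumes a standard single-sample VCF: `#`-prefixed header lines, then tab-separated records with a FORMAT column containing `GT` and one sample column. I haven't confirmed every detail of what 23andme2vcf emits, so treat the column layout as an assumption. The function name `count_genotypes` and the toy records are invented for the example.

```python
import io
from collections import Counter


def count_genotypes(lines):
    """Count GT values (e.g. 0/0, 0/1, 1/1) in a single-sample VCF.

    Assumes column 9 is FORMAT and column 10 is the one sample.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # meta lines (##...) and the #CHROM header
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this row
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        counts[sample_values[format_keys.index("GT")]] += 1
    return counts


# Invented toy records, not real genotype data.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tGENOTYPE\n"
    "1\t1000\trs0000001\tA\t.\t.\t.\t.\tGT\t0/0\n"
    "1\t2000\trs0000002\tA\tG\t.\t.\t.\tGT\t0/1\n"
    "2\t1500\trs0000003\tC\tT\t.\t.\t.\tGT\t1/1\n"
)
vcf_gt_counts = count_genotypes(toy_vcf)
print(dict(vcf_gt_counts))
```

On a real file, pass an open handle for your own `.vcf` instead of `toy_vcf`. If the total comes out far below the number of lines in the raw file, some SNPs were probably skipped during conversion (no-calls are a likely suspect).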

Get it into Galaxy

Now you can upload the VCF file into Galaxy. Simply select the file to upload, allow Galaxy to auto-detect the file type, and associate the file with the GRCh37/hg19 genome build.

23andme14

Galaxy should find your newly-created file acceptable.

Visualize

This is not going to easily tell you much about what individual SNPs mean, but it’s a first thing to do with your data: visualize it in Trackster. All you need to do is select your data and select the same genome build you used as a reference when you imported it. Here’s a section of my data, visualized on Chromosome 1.

23andme15

Create a custom UCSC browser track with your data

The next thing you can do is convert your data into a genome browser track for the UCSC genome browser.  First, select VCF to MAF custom track from the Graph/Display Data section in the Galaxy Tool panel.

23andme16

Select your imported VCF data set to be converted:

23andme17

Once it’s done converting, open the data set’s detailed information in the history panel and choose display at UCSC Main:

23andme19

HEY LOOK MOM I’M IN THE UCSC GENOME BROWSER!!!

23andme20

1337 genome h4x0r!
