BINF 2111: Genome Horoscope — eye color (Part 1)
In this week’s exercise, we’re going to practice loops and lists in Python by looping through a 23 and Me raw data file to pull out the records that match a list of SNPs of interest. Before we get into the coding, I think it’s worth exploring how we arrive at such a list. In order to write a useful program, you need to understand the problem it’s solving.
The raw information
As I mentioned previously, I have access to my own 23 and Me raw data (as anyone does who gets their genome done, even now that the FDA has put some limits on their distribution of health data reports). That means I have 960,000 individual pieces of information about my genome (twice that if you consider that I know both alleles).
23 and Me currently provides me with information about a few dozen disease risks, traits, and drug responses, based on at most a few times that many SNPs.
That means that even with 23 and Me’s information there is still vast unexplored territory in my genomic data, and even if the FDA opened up 23 and Me health reports again tomorrow, there would still be many thousands of SNPs that I could research on my own. The Genome Horoscope project is a very basic example of how you could get started finding more information about your own SNPs and traits.
Eye color
Why did I choose to investigate eye color? Because my own results from 23 and Me are just a little bit confusing.
23 and Me categorizes my eye color as “likely blue”. However, witness my actual eyes:
They’re not really brown like the brown eyes that predominate in the world outside Europe — the ancestral eye color type of the human species — but they’re definitely not blue, and they’ve got enough brown in them that they’re hardly recognizable as green. I’ve always called them “hazel” on my driver’s license.
So what does SNPedia have to say about my eye color?
The topline results at SNPedia are even more adamant: my GG genotype at rs12913832 should add up to a blue-eye phenotype 99% of the time. I’m defying the odds here in a fairly significant way, although someone has to be that 1%.
What 23 and Me also tells me about their report, though, is that they are assaying one SNP out of many that may affect eye color. Many of those other SNPs are variations in the OCA2 gene, while the rs12913832 SNP is actually a variation in HERC2, which may have some higher-level control over OCA2.
So, the goal of this exercise is to expand the number of eye color SNPs we look at, and to create a little eye color report for the individual. We won’t look at every possible eye color SNP, but SNPedia can give us some ideas about other SNPs to look at.
1. The h-1 haplotype
A haplotype block is a run of SNPs in the genome that tend to be fellow-travelers. For various reasons, that stretch of SNP sites is rarely disrupted by recombination, so an individual having a particular genotype at one position in the block (like my G-G genotype at rs12913832) is likely to have all of the genotype variants that travel with the first.
The h-1 haplotype is a recognized group of SNPs that turn up as homozygous (the same DNA base in both alleles) in people who are homozygous for G at rs12913832, like me. Having this whole block of SNPs with the alleles specified below is common in blue-eyed individuals: over 97% of them will have the complete pattern. There’s a paper about this block here, but it’s unfortunately behind an Elsevier paywall.
C at rs4778241
A at rs1129038
A at rs12593929
G at rs12913832
C at rs7183877
G at rs3935591
A at rs7170852
T at rs2238289
C at rs3940272
T at rs8028689
A at rs2240203
G at rs11631797
G at rs916977
This opens up some questions that we can answer with a script. 1) Does 23 and Me assay for all of these SNPs? (Sometimes when SNPs are strongly linked to each other only one will be assayed, because knowing the allele at one means you know all the others.) 2) If they do, what is my pattern? Do I have the complete block? 3) What are their physical locations in the genome? (We’ll be able to figure this out in a future exercise by using reference data.)
Because I’m a Level 11 UNIX wizard, I already know how to do this with a grep command:
egrep "rs4778241|rs1129038|rs12593929|rs12913832|rs7183877|rs3935591|rs7170852|rs2238289|rs3940272|rs8028689|rs2240203|rs11631797|rs916977" genome_Cynthia_Gibas_Full_20140719094157.txt
We’ll see how to code it in Python later to make a more flexible script.
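As a preview, here is a minimal Python sketch of the same filter. It assumes the raw data file is tab-separated with the columns rsid, chromosome, position and genotype, and that header lines start with "#". The function name find_snps and the sample lines are made up for illustration and are not real results.

```python
import io

# The h-1 haplotype SNPs from the list above.
H1_SNPS = [
    "rs4778241", "rs1129038", "rs12593929", "rs12913832", "rs7183877",
    "rs3935591", "rs7170852", "rs2238289", "rs3940272", "rs8028689",
    "rs2240203", "rs11631797", "rs916977",
]


def find_snps(lines, snps_of_interest):
    """Return {rsid: (chromosome, position, genotype)} for the requested SNPs.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    wanted = set(snps_of_interest)
    found = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not have all four columns
        rsid, chrom, pos, genotype = fields[:4]
        if rsid in wanted:
            found[rsid] = (chrom, pos, genotype)
    return found


# Made-up sample lines standing in for a real raw data file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t100\tAA\n"
    "rs12913832\t15\t200\tGG\n"
)
hits = find_snps(sample, H1_SNPS)
for rsid in H1_SNPS:
    print(rsid, hits.get(rsid, "not assayed"))
```

With a real file you would pass an open file object instead of the sample, for example open("your_genome_file.txt"), where the file name is a placeholder. Comparing the whole rsid field, rather than searching the line for a pattern, means a SNP id can never match as part of a longer one.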
Anyway, here’s the first part of my “Genome Horoscope” results. Out of the 13 SNPs in that haplotype block, 23 and Me assays nine. For each of those markers I’m a homozygote, exactly as you’d expect from my G-G genotype at rs12913832. So the answer to my murky green/brown eye color does not lie in the state of the h-1 haplotype block.
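Questions 1 and 2 above (which SNPs are assayed, and do I have the complete block?) could be answered in Python along these lines. This is only a sketch: the expected alleles are copied from the h-1 list above, the function name check_block is my own, and the example calls are hypothetical, not my real results.

```python
# Expected h-1 alleles, copied from the list above.
H1_EXPECTED = {
    "rs4778241": "C", "rs1129038": "A", "rs12593929": "A",
    "rs12913832": "G", "rs7183877": "C", "rs3935591": "G",
    "rs7170852": "A", "rs2238289": "T", "rs3940272": "C",
    "rs8028689": "T", "rs2240203": "A", "rs11631797": "G",
    "rs916977": "G",
}


def check_block(genotypes, expected):
    """Classify each SNP in the block, given {rsid: genotype} calls."""
    report = {}
    for rsid, allele in expected.items():
        call = genotypes.get(rsid)
        if call is None:
            report[rsid] = "not assayed"
        elif call == allele * 2:
            report[rsid] = "homozygous for expected allele"
        else:
            report[rsid] = "differs: " + call
    return report


# Hypothetical calls for illustration only.
example_calls = {"rs12913832": "GG", "rs4778241": "AC"}
report = check_block(example_calls, H1_EXPECTED)
assayed = [rsid for rsid, status in report.items() if status != "not assayed"]
print(len(assayed), "of", len(H1_EXPECTED), "SNPs assayed")
for rsid, status in report.items():
    print(rsid, status)
```

One caveat: this compares the letters exactly as written. If a data file reports a SNP on the opposite DNA strand from the list, a matching genotype would show up here as "differs", so a mismatch is a prompt to look closer rather than a final answer.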
2. OCA2 SNPs
So, what are some other eye color genes we can investigate? Well, 23 and Me directed us to the OCA2 gene, so let’s take a look. In a global study of eye color associated SNPs in OCA2, Donnelly et al. described three blue eye associated haplotypes.
- BEH1 involves 3 SNPs, rs4778138, rs4778241, rs7495174. The haplotype associated with blue (or light) eye color is A-C-A at these three SNPs.
- BEH2 consists of 2 SNPs, rs1129038 and our old friend rs12913832, which are actually in the HERC2 region just like 23 and Me’s marker for eye color. These SNPs were found to be very strongly correlated to each other. In other words, if you have the “derived” or blue-eye genotype at one of them, you’ll have it at the other as well. The blue-eye type here is T-G (that’s me!). These SNPs already turned up in the h-1 haplotype, so we don’t have to search for them again.
- BEH3 consists of 2 SNPs, rs916977 and rs1667394, also in the HERC2 region, also strongly correlated. The blue eye genotype at these sites is C-A. The first of these SNPs is also in the h-1 haplotype list, but the second is not.
If I make a little “panel” out of these SNPs using egrep, you see that I have the blue-eye type in all but one case. Strangely, I have T at rs1667394, which is not even a known option (ancestral allele is G and derived is A), but there is a note in SNPedia that indicates that this may be due to an error in which strand of DNA is assumed to be the sense strand. If this were the case, then my “T” at this position would actually be a call reported on the opposite strand, and the complement of “T” is, of course, “A”.
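The strand question can be checked mechanically. Here is a small sketch, with function names of my own choosing, that complements a genotype and tests a call against an expected genotype on either strand. It assumes the genotype contains only the letters A, C, G and T; a no-call such as "--" would raise a KeyError.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_genotype(genotype):
    """Return the genotype as it would read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in genotype)


def matches_either_strand(genotype, expected):
    """True if the call equals the expected genotype on either strand."""
    return genotype == expected or complement_genotype(genotype) == expected


print(complement_genotype("TT"))          # AA
print(matches_either_strand("TT", "AA"))  # True
```

Note that this trick cannot resolve a SNP whose two alleles are complements of each other (A/T or C/G), because both strands then show the same pair of letters.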
So, broadening our SNP search to OCA2 does not really give any further insights into my murky hazel eyes.
3. Other SNPs that change the appearance of the iris
It’s obvious from my OCA2 and HERC2 genotypes that I have the full set of variants (homozygous, no less) for blue eye color. And it’s really no surprise that I would. Per my 23 and Me report, the blue eye color mutation is thought to have a single founder and to have arisen with the spread of agriculture into Europe from the Near East about 9000 years ago. My maternal mitochondrial genotype is T1a1, which is a group associated with that very migration.
If we look more closely at my eyes (don’t freak out, now):
It’s possible to see that the base color is really kind of green, but that the appearance of darkness and brown-ness is created by a couple of features — primarily a dark-ish ring around the outside edge of the iris, and a smear of amber color around the pupil.
On the SNPedia eye-color page, there’s a user-added (and unsatisfactorily sourced) list of iris characteristics and associated genotypes, supposedly to be taken in the order of priority with the last in the list superseding the first in the resulting phenotype.
AA at rs1533995 (equivalent to rs10235789) — More crypts in the iris
AA at rs3739070 — More pronounced furrows in the iris
GG at rs12896399 (equivalent to rs4900109) — pigmented rings around the iris
CC at rs3794604 — blocks some melanin/gives light colored eyes.
GG at rs7174027 — blocks some melanin/gives light colored eyes.
CC at rs4778241 — low melanin
CC at rs9782955 — blocks some melanin/gives light colored eyes.
CT at rs989869 — contrasting sphincter around pupil
AA at rs4778138 — weak amber gradient in the eye
TT at rs1129038 — increases blue penetrance
GG at rs12906280 — gray ring around the iris
GG at rs16891982 — starburst or collarette in the eye
TT at rs1667394 — starburst or collarette in the eye
CC at rs12203592 — no collarette
If I were writing a research paper or a review article, I’d definitely want to dig into the references for each one of these sites and get the details, but for now we’re just going to assume that the unsourced list of markers is valid and use these sites to make another egrep “SNP panel”.
The results below suggest that I should have a tendency to more pronounced furrows in my iris (AA at rs3739070 on Chr. 2), a tendency to a pigmented ring around the iris (GG at rs12896399 on Chr. 14 and GG at rs12906280), reduced melanin/lighter colored eyes (CC at rs3794604, GG at rs7174027, CC at rs4778241, CC at rs9782955 on Chr. 1), a weak amber gradient in the eye (AA at rs4778138), increased penetrance of the blue trait (TT at rs1129038), and probably no collarette in the eye, because CC at rs12203592 on Chr. 6 is likely to outweigh GG at rs16891982 on Chr. 5 and TT at rs1667394. Note: sites not labeled are all in familiar locations in the OCA2/HERC2 region on Chr. 15.
These results definitely explain a bit more of what there is to see in my eyes!
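A first pass at turning that panel into a report might look like the sketch below. The panel entries are copied from the SNPedia list above, in the same order, so a later match should be read as taking priority over an earlier one. The function names and the example calls are hypothetical; the calls are not my real results.

```python
# (rsid, genotype, trait), in the priority order of the list above:
# later entries supersede earlier ones.
IRIS_PANEL = [
    ("rs1533995", "AA", "more crypts in the iris"),
    ("rs3739070", "AA", "more pronounced furrows in the iris"),
    ("rs12896399", "GG", "pigmented rings around the iris"),
    ("rs3794604", "CC", "blocks some melanin/gives light colored eyes"),
    ("rs7174027", "GG", "blocks some melanin/gives light colored eyes"),
    ("rs4778241", "CC", "low melanin"),
    ("rs9782955", "CC", "blocks some melanin/gives light colored eyes"),
    ("rs989869", "CT", "contrasting sphincter around pupil"),
    ("rs4778138", "AA", "weak amber gradient in the eye"),
    ("rs1129038", "TT", "increases blue penetrance"),
    ("rs12906280", "GG", "gray ring around the iris"),
    ("rs16891982", "GG", "starburst or collarette in the eye"),
    ("rs1667394", "TT", "starburst or collarette in the eye"),
    ("rs12203592", "CC", "no collarette"),
]


def same_genotype(call, expected):
    """Compare genotypes ignoring allele order, so 'TC' matches 'CT'."""
    return sorted(call) == sorted(expected)


def iris_report(genotypes):
    """Return the panel entries whose genotype matches the given calls."""
    matches = []
    for rsid, expected, trait in IRIS_PANEL:
        call = genotypes.get(rsid)
        if call is not None and same_genotype(call, expected):
            matches.append((rsid, expected, trait))
    return matches


# Hypothetical calls for illustration only.
example_calls = {"rs12203592": "CC", "rs989869": "TC", "rs16891982": "CG"}
for rsid, genotype, trait in iris_report(example_calls):
    print(genotype, "at", rsid, "-", trait)
```

Because the matches come back in list order, the last line printed is the one that wins when two traits conflict, such as "no collarette" over "starburst or collarette".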
Next time we’ll use this whole list of SNPs to begin to construct a “Genome Horoscope” Python script.
Note: rs989869 is a substring of rs9898699. Caught an extra SNP by mistake. Something to watch out for when you’re pattern matching.
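The same trap exists in Python. The short sketch below contrasts a substring test with two stricter checks; the data line is made up for illustration, and it assumes tab-separated fields with the rsid first.

```python
import re

line = "rs9898699\t17\t12345\tAG"  # made-up line for illustration
target = "rs989869"

# Substring test: matches because rs989869 is the start of rs9898699.
print(target in line)                          # True (false positive)

# Exact comparison against the first tab-separated field.
print(line.split("\t")[0] == target)           # False

# Regular expression with word boundaries around the id.
print(bool(re.search(r"\brs989869\b", line)))  # False
```

On the command line, grep's -w option asks for whole-word matches and avoids the same false positive.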
Hat tip to my friend Dr. David Wilson for coining the name “Genome Horoscope”.