BINF 2111: Genome Horoscope (eye color — part 2)
Your mission for today is to write a script that checks a 23 and Me genotype file for several groups of SNPs associated with blue eyes. The SNPs in question are in the file “blueeyepanel.txt” in the Dropbox. Each line of the file has the format “genotype, SNP, haplotype block”. Genotype is the derived genotype common in people with the blue eye variant discussed in the last post. SNP is the SNP identifier. Haplotype block is the code name for a multi-SNP pattern associated with blue eyes. The file looks like this:
AA, rs1129038, h-1
AA, rs12593929, h-1
GG, rs12913832, h-1
CC, rs7183877, h-1
GG, rs12913832, BEH2
GG, rs916977, BEH3
TT, rs1667394, BEH3
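One way to read the panel, as a minimal sketch: group the lines by haplotype block so that each block's SNPs can be checked together. Grouping by block also keeps a SNP such as rs12913832, which appears in both h-1 and BEH2, listed under each block. The names `parse_panel` and `load_panel` are hypothetical, and the demo reads the excerpt above from memory instead of from “blueeyepanel.txt”.

```python
import io
from collections import defaultdict


def parse_panel(lines):
    """Group panel lines of the form "genotype, SNP, haplotype block" by block.

    Returns {block: [(snp, derived_genotype), ...]} in file order.
    """
    blocks = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        genotype, snp, block = [field.strip() for field in line.split(",")]
        blocks[block].append((snp, genotype))
    return dict(blocks)


def load_panel(path):
    # Hypothetical wrapper: the "with" syntax closes the file automatically.
    with open(path) as fh:
        return parse_panel(fh)


# Demo on the excerpt shown above, held in memory rather than on disk.
sample = io.StringIO(
    "AA, rs1129038, h-1\n"
    "AA, rs12593929, h-1\n"
    "GG, rs12913832, h-1\n"
    "CC, rs7183877, h-1\n"
    "GG, rs12913832, BEH2\n"
    "GG, rs916977, BEH3\n"
    "TT, rs1667394, BEH3\n"
)
panel = parse_panel(sample)
print(panel["BEH3"])  # [('rs916977', 'GG'), ('rs1667394', 'TT')]
```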
For simplicity: although the real blue-eye genotype in BEH3 is C-A, I have changed those values to their complement, G-T, so that they match the 23 and Me file directly, since that file most likely reports these SNPs on the opposite strand.
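The complement swap described here is a simple base-for-base translation. This sketch (the helper name `complement` is hypothetical) reproduces the BEH3 change from C-A to G-T:

```python
# Watson-Crick complement for single-letter allele calls.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(genotype):
    """Return the opposite-strand version of an allele string, e.g. "CA" -> "GT"."""
    return genotype.translate(COMPLEMENT)


# The BEH3 example from the text: the real C-A genotype becomes G-T.
print(complement("C") + "-" + complement("A"))  # G-T
```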
Your goal is to parse this file and the 23 and Me file, and to tell the user whether they have all, part, or none of the blue-eye genotype. Example output:
Your BEH1 genotype is:
A - C - A
A - C - A
Both alleles are positive for the complete BEH1 blue eye haplotype.
Your BEH2 genotype is:
G - T
G - G
One allele is positive for the complete BEH2 blue eye haplotype.
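A minimal sketch of the all/part/none decision for one block, assuming the panel has already been grouped by block and the 23 and Me calls are stored as two-letter strings in a dict keyed by rsID. `report_block` and `derived_copies` are hypothetical names, and the genotypes in the demo are made up. Because 23 and Me genotypes are unphased, the “one allele” verdict assumes that the derived alleles sit together on the same chromosome.

```python
def derived_copies(user_call, derived_genotype):
    """Count how many of the user's two alleles match the derived (blue-eye) allele.

    The panel lists homozygous derived genotypes such as "GG", so the derived
    allele is just the first character.
    """
    return user_call.count(derived_genotype[0])


def report_block(block, block_snps, user_calls):
    """Return a report in the style of the examples for one haplotype block.

    block_snps: [(snp, derived_genotype), ...] as read from the panel.
    user_calls: {snp: two-letter call} from the 23 and Me file.
    """
    # Keep only SNPs that were assayed and called ("--" marks a no-call).
    assayed = [(derived, user_calls[snp]) for snp, derived in block_snps
               if snp in user_calls and "-" not in user_calls[snp]]
    if not assayed:
        return f"None of the {block} SNPs were assayed in your file."

    # Calls are unphased, so "first" and "second" allele is only a display choice.
    lines = [f"Your {block} genotype is:",
             " - ".join(call[0] for _, call in assayed),
             " - ".join(call[1] for _, call in assayed)]

    copies = [derived_copies(call, derived) for derived, call in assayed]
    if min(copies) == 2:
        lines.append(f"Both alleles are positive for the complete {block} blue eye haplotype.")
    elif min(copies) == 1:
        # Assumes the derived alleles are on the same chromosome (in cis).
        lines.append(f"One allele is positive for the complete {block} blue eye haplotype.")
    elif max(copies) > 0:
        lines.append(f"You carry only part of the {block} blue eye haplotype.")
    else:
        lines.append(f"Neither allele carries the {block} blue eye haplotype.")
    return "\n".join(lines)


# Made-up calls for illustration only.
beh3 = [("rs916977", "GG"), ("rs1667394", "TT")]
print(report_block("BEH3", beh3, {"rs916977": "GG", "rs1667394": "TT"}))
```

This prints the genotype as “G - T” on both lines, followed by “Both alleles are positive for the complete BEH3 blue eye haplotype.” With the made-up calls GA and TC instead, the same function reports that one allele is positive.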
I have provided you with a few decoy 23 and Me files: one has all of the blue eye SNPs, one has none of them (homozygous for the non-blue alleles), and one has only partial coverage of the regions, as if the haplotype had been disrupted by recombination. Your script should handle at least these three cases correctly.
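Reading one of the decoy files could look like the sketch below. It assumes the usual 23 and Me raw-data layout (lines starting with “#” are comments; data rows hold rsID, chromosome, position and genotype separated by tabs) and keeps only the SNPs in the panel. `parse_23andme` and `load_23andme` are hypothetical names, and the demo rows, including their positions, are made up.

```python
import io


def parse_23andme(lines, wanted):
    """Return {rsid: genotype} for the rsIDs in `wanted`, skipping comment lines."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        fields = line.split()  # rsid, chromosome, position, genotype
        if len(fields) < 4:
            continue  # ignore malformed rows
        rsid, genotype = fields[0], fields[3]
        if rsid in wanted:
            calls[rsid] = genotype
    return calls


def load_23andme(path, wanted):
    # Hypothetical wrapper that opens a decoy file with the "with" syntax.
    with open(path) as fh:
        return parse_23andme(fh, wanted)


# Made-up excerpt for illustration; rs1667394 is deliberately absent.
demo = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs916977\t15\t1000\tGG\n"
    "rs12913832\t15\t2000\tAG\n"
)
print(parse_23andme(demo, {"rs916977", "rs1667394"}))  # {'rs916977': 'GG'}
```

A panel SNP that the file does not contain, such as rs1667394 here, is simply missing from the result, so later steps have to expect gaps.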
Techniques you should use:
- Open your files with the “with” syntax
- Iterate through lines of a file
- Read lines of a file (or split out elements of those lines) into a list
- Compare patterns in a list to file contents
- Conditionals where outcomes depend on the input genotype
There are examples of all of these in the last few class notes.
Some hints: remember that not all of the SNPs in the blue eye panel are assayed by 23 and Me. Haplotype H-1 overlaps somewhat with the BEH2 and BEH3 regions. In the BEH2 and BEH3 regions especially, the SNPs are so strongly correlated that you can infer the allele at one site if you know the others.
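The last hint could be turned into a rough inference step for SNPs that 23 and Me did not assay. This sketch makes a simplifying assumption: a missing SNP carries the same number of blue-eye alleles as the first assayed SNP in its block. `fill_missing` is a hypothetical helper, and “?” marks an allele that cannot be inferred because the non-derived base is unknown.

```python
def fill_missing(block_snps, user_calls):
    """Guess calls for unassayed SNPs in one block from an assayed SNP in that block.

    block_snps: [(snp, derived_genotype), ...] from the panel.
    user_calls: {snp: two-letter call}; "--" counts as not assayed.
    """
    def called(snp):
        return snp in user_calls and "-" not in user_calls[snp]

    observed = [(derived[0], user_calls[snp]) for snp, derived in block_snps if called(snp)]
    filled = dict(user_calls)
    if not observed:
        return filled  # nothing in this block to infer from

    # Assumption: tight linkage means every SNP in the block has the same
    # number of derived alleles as the first assayed one.
    derived_allele, call = observed[0]
    copies = call.count(derived_allele)
    for snp, derived in block_snps:
        if not called(snp):
            filled[snp] = derived[0] * copies + "?" * (2 - copies)
    return filled


beh3 = [("rs916977", "GG"), ("rs1667394", "TT")]
print(fill_missing(beh3, {"rs916977": "GA"}))  # {'rs916977': 'GA', 'rs1667394': 'T?'}
```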