GenoSets installation
I’m going to be adding some tutorials on how to use GenoSets, our business intelligence-inspired tool for mining multiple genome data sets.
The GenoSets system is platform independent, and installation and setup of the software is straightforward for an experienced programmer or computational scientist. For student use, however, you may wish to provide an environment in which the software is already installed on the student’s computer, so that students can concern themselves primarily with loading and analyzing data. These detailed set-up instructions should be sufficient to help you prepare the computers to be used in the tutorial.
GenoSets has some dependencies, which must be satisfied before you install it so that the application runs and all of its features are available.
1) GenoSets requires a Java Runtime Environment (JRE) 1.6 or later.
2) To load your own data into a new GenoSets database, you will need a working copy of MySQL, either on the user’s machine or on a server to which the user has access.
3) GenoSets requires you to load an OrthoMCL output file for your genomes, so you need a copy of OrthoMCL installed.
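Before going further, it can help to confirm which Java version is already on the machine. The following is a minimal sketch, not part of GenoSets itself; the helper name parse_java_version is made up for illustration, and it assumes that the first line of `java -version` output contains a quoted version string.

```shell
#!/bin/sh
# Hypothetical helper: pull the quoted version string out of a line
# such as:  java version "1.6.0_65"
parse_java_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

# `java -version` writes to stderr, so redirect it before reading the first line.
if command -v java >/dev/null 2>&1; then
    echo "Java version: $(parse_java_version "$(java -version 2>&1 | head -n 1)")"
else
    echo "java was not found on your PATH"
fi
```

If the reported version is 1.6 or later, the first requirement is met.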
In this document, we’ll give an example of how to prepare a computer running Mac OSX (v. 10.8.5) to use GenoSets. We walk through Mac OSX because installation there is a little less straightforward than on Windows or Linux. To complete this install, you’ll need to have the Apple XCode Developer Tools installed (go on, get it, it’s useful! Bioinformaticians can’t live without compilers). You can install XCode through the App Store.
GenoSets installation
To install and run GenoSets, simply download and extract the package archive, and place the genosets folder in your Applications folder. Executables are found in genosets/bin.
MySQL installation
To create your own databases, you will need to have a working MySQL installed on your local computer. For example, on a Mac running OSX 10.6.8, you should go to the website:
http://dev.mysql.com/downloads/mysql/5.6.html
Download the MySQL DMG archive for your OSX version and architecture. I picked the version for OSX 10.6, 64-bit. Then just open the archive and follow the instructions. MySQL will be installed in /usr/local/mysql, with its executables in /usr/local/mysql/bin.
Open a terminal window and type:
which mysql
You should get a message that says “/usr/local/mysql/bin/mysql”. If the system knows where to look for mysql, users do not need to type the entire directory path to access the program. If your system is not set up so that it correctly finds the path to mysql, you can give it this information in one of two ways.
Either you can create a file called .bash_profile in your home directory, if it’s not there already, and add a line like this one, which appends the MySQL directory to your path:
export PATH=${PATH}:/usr/local/mysql/bin
Or, if you have superuser access to the machine you’re working on, you can type:
sudo vi /etc/paths
You’ll enter a text editor window with the system paths file open and editable. Add /usr/local/mysql/bin as its own line in the file. Do not change anything else in the file. If you do not know how to edit a text file with a text editor on a UNIX system, ask someone who does to help you with this part of the setup. Once you have edited the system paths file, close your current shell window and open a new one. The new shell will be initialized with the updated path information.
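To confirm that a new shell really picked up the change, you can test whether the directory is an entry in PATH. This is a small sketch under the assumption that MySQL lives in /usr/local/mysql/bin as described above; the function name path_contains is hypothetical. It only reads PATH and does not modify any file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (exact match, not a substring match).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

MYSQL_BIN=/usr/local/mysql/bin
if path_contains "$MYSQL_BIN" "$PATH"; then
    echo "$MYSQL_BIN is on your PATH"
else
    echo "$MYSQL_BIN is not on your PATH yet"
fi
```

Wrapping the list in colons before matching keeps a directory such as /usr/local/mysql from being mistaken for /usr/local/mysql/bin.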
Starting MySQL
Now you need to get MySQL started. On some platforms this happens automatically upon installation, but on Mac OSX you’ll need to change a few things first. Start by typing:
sudo /usr/local/mysql/support-files/mysql.server start
Then open another window and type:
mysql
You should see a message that begins “Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection ID is (#)”. That means it’s running.
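If you would rather check the server without opening the MySQL monitor, the mysqladmin tool that ships with MySQL has a ping command. The sketch below wraps it in a function whose name, mysql_status, is made up for this example; it assumes mysqladmin is on your path and that the server uses the default local connection settings.

```shell
#!/bin/sh
# Hypothetical helper: report whether a local MySQL server answers a ping.
# `mysqladmin ping` exits with status 0 when the server is alive.
mysql_status() {
    if ! command -v mysqladmin >/dev/null 2>&1; then
        echo "mysqladmin was not found on your PATH"
    elif mysqladmin ping >/dev/null 2>&1; then
        echo "MySQL server is running"
    else
        echo "MySQL server is not responding"
    fi
}

mysql_status
```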
OrthoMCL and BLAST installation
GenoSets makes use of established software packages for functions like sequence alignment and ortholog clustering. You will need to install those packages on your computer in order to use GenoSets. Again, the installation process for these tools is platform dependent; we will give an example of installation on a computer running Mac OSX.
NCBI BLAST installation
For comparison of sequences, we use the version of BLAST available from the National Center for Biotechnology Information (NCBI). The current version can be downloaded at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST. You need to have version 2.2.23 or greater, due to a bug that affects sequence ID formatting in earlier BLAST versions.
Download the current *.dmg file, click on the *.pkg file, and follow the instructions to install.
The BLAST programs will be installed in /usr/local/ncbi/blast/bin/. Just as we did for MySQL, you will need to add this location to your system path. To do so, type:
sudo vi /etc/paths
and add the path /usr/local/ncbi/blast/bin to your path file.
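Once the path is set, you can check that the installed BLAST meets the 2.2.23 minimum. This is a rough sketch: the function name version_at_least is hypothetical, and it assumes that `blastp -version` prints a first line of the form "blastp: 2.2.28+", which may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Sorts the two versions numerically field by field and checks which is lowest.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v blastp >/dev/null 2>&1; then
    # Keep only the digits and dots that follow "blastp: " (assumed output format).
    installed=$(blastp -version 2>/dev/null | sed -n 's/^blastp: \([0-9.]*\).*/\1/p')
    if version_at_least "$installed" 2.2.23; then
        echo "BLAST $installed is new enough"
    else
        echo "BLAST version ${installed:-unknown} does not meet the 2.2.23 minimum"
    fi
else
    echo "blastp was not found on your PATH"
fi
```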
OrthoMCL installation
OrthoMCL is a package that performs the clustering of orthologs that GenoSets relies on to show you gene content comparisons. You need to have it installed in your Applications folder or elsewhere on your computer. OrthoMCL will also make use of your MySQL database, but you won’t have to issue any MySQL commands. The current version of GenoSets will create a master OrthoMCL input script for you that will take care of all that.
You can download OrthoMCL at http://orthomcl.org, or by typing:
curl -O orthomcl.org/common/downloads/software/v2.0/orthomclSoftware-v2.0.9.tar.gz
…or whatever the current version is. Move the software archive file into your system Applications folder (you may need to use the sudo command again). Extract the archive by typing:
gunzip orthomclSoftware-v2.0.9.tar.gz
tar xvf orthomclSoftware-v2.0.9.tar
You now have a new directory: /Applications/orthomclSoftware-v2.0.9. Executables are in the bin subdirectory. Edit the system path file to put that directory in your system path.
Perl Module Installation
OrthoMCL requires that you install two Perl modules. The easiest way to do this is to use the cpan (Comprehensive Perl Archive Network) program, which will automatically download and install your updates. Type:
sudo cpan
Then, at the cpan prompt, type:
cpan> install DBI
cpan> install DBD::mysql
MCL installation
MCL is the graph clustering algorithm used by OrthoMCL. You also need to install the MCL package. Go to the License and Software page at the MCL site, download the tarball mcl-latest.tar.gz, and execute the following commands:
tar xzf mcl-latest.tar.gz
cd mcl-12-068
./configure --prefix=$HOME/local
make install
Note: the “latest” version of MCL may change, so if the cd command doesn’t work, check to see what directory was created by unpacking the archive. If you want MCL to be in your default system path, you can move the executables that it creates in $HOME/local/bin to /usr/local/bin, or choose a different configuration prefix at the ./configure step. You are now ready to use GenoSets.
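As a last check, you may want to confirm that every tool installed above can be found from a fresh shell. The sketch below is only an illustration: check_cmd is a hypothetical helper, and the list of command names (including orthomclPairs as a representative OrthoMCL executable) is an assumption that you should adjust to match your own installation.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be found on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# Assumed representative executables for each prerequisite.
for cmd in java mysql blastp orthomclPairs mcl; do
    check_cmd "$cmd"
done
```

Anything reported as MISSING usually means the corresponding directory still needs to be added to the system path.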