BINF 6215: Deposit your data in the SRA (Part 3: SRA Submission)
Step 3: Deposit data in the SRA.
Now that you have your BioProject and BioSample identifiers, you are ready to deposit your data.
You will need to create a username and password on the submission page. Once you are logged in, create a new submission, and then start adding experiments to it. In my project, we have sixteen samples for four bacterial strains under two conditions of interest. The replicate samples here represent different biological preps, not just different sequencing runs on the same sample, so we are going to have one “experiment” for each sample.
Try to enter as much information as possible. I don’t have the values for the nominal insert size and nominal standard deviation right now (I’ve requested them from the biologists) so I will go back and add them when I receive the information.
Once you have the experiment created, you can create a new run within the experiment. If you only ran one lane per biosample, as we did, you will have one run per experiment. You might have more than one sequencing run per sample if you were interested in technical replication.
On the Run form, you are going to tell NCBI which file to look for, but you’re not actually going to upload the file there; you’ll do that later via FTP. Take note of the FTP login information for transmitting your files.
Make sure you verify the correct relationships among BioSampleID, sample file name(s), and the descriptive label that you put on each experiment/sample/run combo. In this case, I checked the message I got from the SRA to make sure I was using the BioSampleID that corresponded to CMCP6 artificial seawater replicate 1, and then I went back to my spreadsheet and double-checked that it corresponds to sample #9, which has two associated fastq files for its paired-end data. Other people can only use your data set properly if you describe it correctly.
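If your spreadsheet has more than a handful of rows, part of this cross-check can be automated. The following is only a rough sketch, not anything the SRA provides: the function name check_sample_sheet, the placeholder BioSample IDs, the file names, and the replicate 2 row are all made up for illustration. It flags BioSample IDs, labels, or files that appear more than once, and samples that do not have the two files expected for paired-end data.

```python
from collections import Counter


def check_sample_sheet(rows, files_per_sample=2):
    """Return a list of problems found in (biosample_id, label, filenames) rows.

    files_per_sample=2 assumes paired-end data with one file per read direction.
    """
    problems = []
    id_counts = Counter(biosample for biosample, _, _ in rows)
    label_counts = Counter(label for _, label, _ in rows)
    file_counts = Counter(name for _, _, names in rows for name in names)

    for biosample, count in id_counts.items():
        if count > 1:
            problems.append(f"BioSample {biosample} is used by {count} rows")
    for label, count in label_counts.items():
        if count > 1:
            problems.append(f"label '{label}' is used by {count} rows")
    for name, count in file_counts.items():
        if count > 1:
            problems.append(f"file {name} is listed {count} times")
    for biosample, label, names in rows:
        if len(names) != files_per_sample:
            problems.append(
                f"{biosample} ({label}) has {len(names)} file(s), "
                f"expected {files_per_sample}"
            )
    return problems


# Hypothetical rows: the IDs and file names below are placeholders, not real accessions.
rows = [
    ("SAMN_PLACEHOLDER_1", "CMCP6 artificial seawater replicate 1",
     ["sample9_R1.fastq", "sample9_R2.fastq"]),
    ("SAMN_PLACEHOLDER_2", "CMCP6 artificial seawater replicate 2",
     ["sample10_R1.fastq", "sample10_R2.fastq"]),
]

for problem in check_sample_sheet(rows):
    print(problem)
```

A check like this only catches mechanical slips such as a copied row or a missing mate file. It cannot tell you whether a BioSampleID really belongs to the sample you think it does, so the manual comparison against the SRA message is still needed.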
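On your Mac, you can use the command line program md5 to get the needed MD5 checksum for each of your files. An MD5 checksum is a short fingerprint computed from a file’s contents; if the file is altered or corrupted in transit, the checksum computed at the other end will no longer match, which is how NCBI can verify that your upload arrived intact. MD5 is no longer considered secure against deliberate tampering, so it’s not foolproof, but it’s certainly good enough for government work (as they say).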
Once you have matched each filename with its correct MD5 checksum, enter both into the Run form and save your run. (All of the checksums can be generated and saved into a file with one command.)
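If you would rather script the checksum step than run md5 by hand, here is a minimal Python sketch using only the standard library. The function names, the *.fastq pattern, and the manifest file name md5_checksums.txt are my own assumptions; adjust them to match your files (for example *.fastq.gz if yours are compressed).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_manifest(directory, pattern="*.fastq",
                       manifest_name="md5_checksums.txt"):
    """Write one 'checksum  filename' line per matching file; return the manifest path."""
    directory = Path(directory)
    lines = [
        f"{md5_of_file(path)}  {path.name}"
        for path in sorted(directory.glob(pattern))
    ]
    manifest = directory / manifest_name
    manifest.write_text("".join(line + "\n" for line in lines))
    return manifest
```

Calling write_md5_manifest(".") in the directory that holds your fastq files leaves a text file you can copy the checksums from when filling in the Run form. Reading in chunks keeps memory use low even for very large sequence files.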
Once you have your run saved, it may take a while (like half an hour) for it to move to “waiting for files” status. It may even show an error, but eventually the status of the entry will change on its own. You don’t have to wait for that, though; you can go ahead and upload your files in the meantime. Use the UNIX “ftp” or “sftp” command to connect to the NCBI upload server from your command line, logging in with the FTP login information you noted from the Run form.
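If you have many files, the upload can also be scripted. The sketch below uses Python’s standard ftplib module and assumes the server accepts plain FTP logins. The function names are my own, and the host, username, password, and remote directory are placeholders that you must fill in from the login information NCBI gave you; none of them come from this post.

```python
import ftplib
from pathlib import Path


def upload_files(ftp, paths):
    """Upload each file in binary mode over an already logged-in FTP connection."""
    uploaded = []
    for path in map(Path, paths):
        with open(path, "rb") as handle:
            # STOR saves the file on the server under the same base name.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_server(host, user, password, paths, remote_dir=None):
    """Connect, log in, optionally change directory, then upload the files.

    host, user, password, and remote_dir are placeholders for the values
    in your own NCBI upload instructions.
    """
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        if remote_dir:
            ftp.cwd(remote_dir)
        return upload_files(ftp, paths)
```

Binary mode matters here: transferring in text mode can alter the bytes of a file, and the MD5 checksum you entered on the Run form would then no longer match what NCBI receives.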
On the “Run” form, once you have saved, click “Back” a couple of times and you will get back to your personal SRA submission main page, where you can continue to upload additional experiments.
Don’t make your final submission until you have all the files uploaded and you have filled in any missing fields.