The 1000 Genomes Project is an international genomic research and data collection effort that has produced "a deep catalog of human genetic variation" for public research use. Now, thanks to Amazon Web Services (AWS) and the White House's recently announced Big Data Research and Development Initiative, the 1000 Genomes data is available free of charge on the AWS cloud. Despite the name, the demographically diverse study actually includes more than 1,700 genome profiles, and all that data takes up about 200 terabytes of storage, according to a New York Times article on the cloud bonanza. So although researchers could already download the data for free directly from 1000 Genomes, that's something you really don't want to do, even if you have 200TB of storage to spare. Instead, you'll likely be better off accessing the data through AWS and paying them to crunch the numbers for you, which probably explains why AWS has decided to engage in this bit of philanthropy: future profit, plus preeminence as a computational resource in the brave new world of Big Data.
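To get a feel for why downloading 200 terabytes is impractical, here is a rough back-of-the-envelope sketch in Python. The link speeds are illustrative assumptions, not figures from the article or from AWS, and the function name `transfer_days` is made up for this example. It also assumes decimal terabytes and a connection that stays fully saturated for the whole transfer, which real networks rarely manage.

```python
def transfer_days(size_terabytes: float, link_megabits_per_s: float) -> float:
    """Days needed to move size_terabytes over a link sustained at link_megabits_per_s."""
    size_bits = size_terabytes * 1e12 * 8               # decimal terabytes -> bits
    seconds = size_bits / (link_megabits_per_s * 1e6)   # megabits/s -> bits/s
    return seconds / 86_400                             # seconds -> days


if __name__ == "__main__":
    # Assumed link speeds, chosen only for illustration.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbit/s: {transfer_days(200, mbps):6.1f} days")
```

Under these assumptions, a sustained 1 Gbit/s connection would need roughly 18.5 days to pull down the full data set, and a 100 Mbit/s connection about half a year.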
According to NIH Director Francis S. Collins, in a recent press release:
"The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy."
Lisa D. Brooks, program director for the Genetic Variation Program of the NHGRI, a branch of the NIH, says of the significance of the 1000 Genomes data:
“It’s the only public data set like this... It is an almost complete set of human genetic variants.”
The Big Data Initiative will allocate some $200 million in federal funds over a number of years to ensure that the gargantuan amounts of scientific information being generated in areas like genomics can actually be accessed and used to further national research goals, including bringing disease therapies to individuals.
The Big Data Initiative is being carried out through various agencies besides the NIH, including the Department of Energy and the NSF, both of which have just awarded major computational research grants to programs at the University of California, Berkeley and its partner institute, Lawrence Berkeley National Lab (Berkeley Lab).
UC Berkeley will lead the $10M NSF-funded Algorithms, Machines and People (AMP) Expedition team as it tackles some of the super-sized computing challenges that come with all that new data availability. Michael Franklin, computer science professor and director of the AMP, explains:
“Buried within this flood of information are the keys to solving huge societal problems and answering the big questions of science. Our goal is to develop a new generation of data analysis tools that provide a quantum leap in our ability to make sense of the world around us.”
Up the hill at Berkeley Lab, the DOE has committed $25M over 5 years to the Scalable Data Management, Analysis, and Visualization Institute (SDAV), a collaboration that brings in partners from seven universities and five other national laboratories. Their common aim is to deliver end-to-end computing solutions, from managing large datasets as they are being generated to creating new algorithms for analyzing the data on emerging architectures. SDAV is being led by Berkeley Lab Computational Research Division scientist Arie Shoshani, who calls the project “the best of everything being done in DOE and the universities in these domains. This team is the cream of the crop.”
Biotechnology Calendar, Inc. will be holding its 15th Annual Berkeley BioResearch Product Faire event on the UCB campus on June 6, 2012. This networking opportunity brings life science researchers together with laboratory equipment suppliers to discuss the latest technologies in lab science.