CCMB moves 83 terabytes of genomics data to AWS cloud to boost research


Amazon Web Services (AWS) on Friday announced that the Centre for Cellular and Molecular Biology (CCMB) — India’s premier research organisation for modern molecular biology and genomics — has selected AWS as its preferred cloud provider to accelerate its genomics research projects.


The research institute has moved 83 terabytes (TB) of genomics data from its on-premises servers to AWS using an offline data transport service. It then migrated its genomic analysis toolkit and its bioinformatics data pipelines for secondary analysis to Amazon Genomics CLI, an open-source tool that enables genomics organisations to process raw genomics and biological data.


Based in Hyderabad, CCMB is an arm of the government-backed Council of Scientific and Industrial Research (CSIR). One of its research areas is the study of genetic material: how it varies among populations, and how that variation leads to disparities in human health and disease.


Using cloud computing services, CCMB has already cut the time taken for research analysis from 550 days to just nine days, an average reduction of about 98 per cent, AWS said.


Life sciences and genomics research organisations need to access, store, and analyse large volumes of data generated by next-generation high-throughput sequencers, and cloud services help with that work. Previously, these organisations relied on on-premises servers to meet their storage and computing needs.


“At a time when genetics research is becoming critical for life sciences advancement, disease diagnosis, and drug development, we must innovate using technologies like cloud computing to achieve outcomes faster and better,” said Dr Divya Tej Sowpati, genomics scientist at CSIR CCMB.


Globally, AWS serves organisations such as AstraZeneca, CSIRO, GRAIL, Illumina, Melbourne Genomics Health Alliance, National Institutes of Health, Regeneron, and Stanford University for their research needs.


In another project, CCMB has started analysing breast cancer samples to identify molecular signatures of triple-negative breast cancers in the Indian population. It has also used AWS graphics processing unit (GPU) instances to train and test machine learning (ML) neural network models on long-read data sequenced with Oxford Nanopore sequencers, in order to detect DNA modifications associated with various diseases, including cancer, neurodegenerative disorders, and cardiovascular diseases.


“Understanding the genomic variation in India’s population is a government priority towards developing precision healthcare and diagnostics, and delivering them at affordable costs. However, genomics research is data-intensive, and the increasing volume and velocity of genomics data are challenges for research institutions in managing both infrastructure and costs,” said Pankaj Gupta, Leader – Public Sector (Government, Education, Healthcare), AWS India Private Limited.


The cloud migration has also allowed CCMB to access multiple genomics databases from the Registry of Open Data on AWS (RODA) without first downloading them locally for processing. This saves months of data download time and gives researchers access to documented sources of truth.