Today, we can produce more biomedical data in about three months than we previously could, with a single hospital alone generating petabytes. There is a deluge of data from sources such as patients' medical records, medical instruments, lab work, and more. Likewise, in research and development, similar volumes of data are generated by clinical trials, e-notebooks, and other drug development groups.
No human can synthesize data at that scale, and this is where artificial intelligence becomes crucial. AI is, in essence, software that writes software no human could. Machine learning and deep learning can harness this exploding quantity of health data, identifying patterns in immense datasets and helping us glean insights faster and more holistically. This is especially important in areas where clinical staff and resources are scarce relative to the number of patients.
Deep learning can help identify genomic variants and detect anomalies in a radiology image. It can help us to generate new chemical structures during drug development, make sense of pathology reports and medical records, guide us through medical procedures via AI-assisted robotics and surgical devices, flag urgent cases or patient falls, and be the extra tools we need to bring care to patients faster.
AI is an invaluable tool across all aspects of biology, from discovering therapeutics for humans, to finding new plant molecules involved in photosynthesis, to agro-industrial applications. AI also helps us amass insights faster, and speed of discovery proved especially important this past year during the deadly public health crisis of COVID-19.
Image above: Using HPC and AI, researchers achieved the clearest view of the coronavirus thus far.
Screening More Compounds
The pharmaceutical industry invests billions of dollars to bring drugs to market, only to watch 90 percent of candidates fail before they even reach clinical trials. The rise of digital data in health care presents an opportunity to improve the understanding and holistic view of a patient through AI. Momentum here shifted with COVID-19, with start-ups in this space raising well over $5 billion in 2020.
Top research institutions, like the University of California at San Francisco, are using Nvidia GPUs to power their work in cryo-electron microscopy, a technique used to study the structure of molecules, such as the spike proteins on the COVID-19 virus, and accelerate vaccine and drug discovery.
Pharmaceutical companies, such as GlaxoSmithKline, and major health-care systems, like the U.K.'s National Health Service, will utilize the computing power of Cambridge-1, the U.K.'s fastest supercomputer, to identify and create novel therapeutics, with the goal of improving the diagnosis and delivery of critical medicines and vaccines.
Last fall, a team of 27 researchers, led by Rommie Amaro at the University of California at San Diego, combined high-performance computing (HPC) and AI to provide the clearest view of the coronavirus to date, winning the Gordon Bell Prize for fighting COVID-19 - the equivalent of the Nobel Prize in the supercomputing community.
The image was taken by Amaro's lab using what is called a computational microscope, a digital tool that links the power of HPC simulations with AI to reveal details beyond what conventional instruments can capture.
For more than a decade, every major pharmaceutical company has used Schrödinger's modeling software, which can perform physics-based simulations designed to model and compute properties of novel molecules.
Schrödinger has devoted decades to refining computational algorithms and uses Nvidia GPUs to generate and evaluate petabytes of data to accelerate drug discovery, which is a dramatic improvement over the traditional process of slow and expensive lab work.
With accelerated computing, millions of drug candidates can be screened at a time. Researchers at Oak Ridge National Laboratory (ORNL) and Scripps Research have shown that this process, which has traditionally taken years, can be completed in hours with accelerated computing. Using AutoDock on ORNL's supercomputer with 27,648 Nvidia GPUs, they were able to screen more than 25,000 molecules per second and dock one billion compounds in less than 12 hours. This represents a speedup of more than 50X, compared to running AutoDock with just CPUs.
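The throughput figures above can be sanity-checked with quick arithmetic. The numbers below come straight from the ORNL run described in the text; the variable names are just illustrative.

```python
# Back-of-envelope check of the reported screening throughput.
molecules_per_second = 25_000       # docking rate across 27,648 GPUs
compounds = 1_000_000_000           # one billion compounds

seconds = compounds / molecules_per_second
hours = seconds / 3600
print(f"{hours:.1f} hours")         # comes in under the 12-hour figure
```

At 25,000 molecules per second, one billion compounds take about 11.1 hours, consistent with the "less than 12 hours" result.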
Transformer-based neural network architectures, which have become available only in the last few years, let researchers use self-supervised training methods that avoid the need for large labeled datasets, a significant barrier in building any AI model. BioMegatron, a transformer-based AI natural language processing (NLP) model that understands biomedical language, shows great promise for bringing together all types of clinical and scientific data: R&D biology and chemistry e-notebooks, clinical trial data at pharmaceutical companies, and hospital patient reports and lab work.
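Self-supervised pretraining of this kind typically uses a masked-language-model objective, in which the model must predict tokens that have been hidden from it, so no human labels are needed. The sketch below is a minimal, hypothetical illustration of that masking step in plain Python; it is not BioMegatron's actual pipeline, and the sample sentence and function name are invented.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly hide tokens; return (inputs, labels).

    labels[i] holds the original token wherever a mask was applied,
    and None elsewhere, so training loss is computed only at the
    masked positions. This is the standard masked-LM setup.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)      # model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)     # no loss at unmasked positions
    return inputs, labels

# Illustrative biomedical-style sentence (invented for this sketch).
sentence = "the spike protein binds the ACE2 receptor".split()
inputs, labels = mask_tokens(sentence, mask_prob=0.3)
print(inputs)
```

Because the "labels" are just the original text, the model can be pretrained on raw biomedical corpora and later fine-tuned on much smaller labeled datasets.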
Much as a traveler immerses themselves in a new country to learn its language, transformers learn the language of many diverse data types through unsupervised learning and are then fine-tuned to a specific task with supervised learning. NLP models have demonstrated human-level capability in Q&A, summarization, and language generation, and most recently, transformers are being applied to image generation and analysis.
Nvidia is collaborating with AstraZeneca on a transformer-based generative AI NLP model that reads SMILES, the text notation for chemical compounds. It is based on Megatron, a giant transformer model that is fast, powerful, and utilizes multiple GPUs in parallel. Large transformer models can be trained in roughly the same amount of time as their smaller counterparts while delivering better performance.
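To give a feel for what "reading SMILES" involves, here is a minimal, hypothetical SMILES tokenizer. The regular expression and function name are assumptions for illustration, not the actual AstraZeneca/Megatron preprocessing; the key point is that multi-character atoms such as Cl and Br must be kept as single tokens rather than split per character.

```python
import re

# Simplified SMILES token pattern (illustrative, not exhaustive):
# two-letter atoms, bracketed atoms, common organic atoms, bonds,
# branches, ring-closure digits, and %NN ring labels.
SMILES_TOKEN = re.compile(
    r"Cl|Br|\[[^\]]+\]|[BCNOPSFI]|[bcnops]|[=#\-\+\(\)/\\@\.]|\d|%\d{2}"
)

def tokenize_smiles(smiles: str):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Tokenized this way, a molecule becomes a sequence the transformer can model exactly like a sentence, enabling it to generate candidate compounds token by token.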
Even with Megatron, a trillion-parameter model will take about three to four months to train on Nvidia's Selene supercomputer. With biomedical data at the scale of petabytes, and models learning at the scale of billions, and soon trillions, of parameters, transformer AI models are helping the life sciences discover far more than expected.
Road to Personalized Medicine
End-to-end genomic analysis, from a patient's blood sample, to sequencing, to analysis, to final clinical results, is a data-intensive process. The cost of genomic sequencers has dropped dramatically, putting DNA sequencing within reach of more institutions, and as more DNA is sequenced, more data is generated that must be interpreted accurately and efficiently.
Large population studies are in a race to analyze thousands of whole genomes to find genetic variants of diseases in specific populations; cancer centers are eager to identify the right variants for specific cancers; pediatric units turn to genomic analysis when a child is not thriving, to identify a rare disease; and researchers on the cutting edge of genomic discovery need fast tools to publish their findings.
Lightning-speed genomic analysis in minutes versus hours or days can have a significant impact on scientific discovery and therapeutic treatments for patients. A hospital can now quickly sequence a baby's DNA to figure out if the infant has a genetic variant that is causing symptoms or a disease, and offer the right treatments based on that variant.
Beyond expediting genomic analysis, GPUs are also being used to build AI models for genomic discovery. AI-based variant callers that learn statistical relationships directly from sequencing data are showing remarkable accuracy in secondary genomic analysis. There are also AI models that help denoise data, such as AtacWorks for ATAC-seq data, bringing down the cost and time needed for rare and single-cell experiments.
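AtacWorks itself uses a deep neural network trained on paired noisy and clean ATAC-seq tracks; as a simple stand-in for the idea of denoising a 1D coverage signal, the sketch below applies a moving average. The signal values and function name are invented for illustration, and a learned model would do far better than this fixed filter.

```python
def moving_average(signal, window=3):
    """Smooth a 1D coverage track; a crude stand-in for learned denoising."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        # Average over a window clipped to the signal boundaries.
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# Made-up noisy coverage values for illustration.
noisy = [0, 5, 0, 1, 9, 1, 0, 4, 0]
print(moving_average(noisy))
```

The denoised track has the same length as the input but with spikes damped, which is the same shape of transformation a trained denoiser performs, only with the smoothing learned from data rather than fixed.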
AI at the Hospital Edge
One place you might not expect AI to be is in the operating room. This new generation of medical devices is equipped with dozens of real-time AI applications that provide support at each step of the clinical experience, automating patient setup, improving image quality, and analyzing data streams to deliver critical insights to caregivers.
Medical instruments, like surgical robots and endoscopes, are mounted with cameras that send a live video feed to the clinicians operating the devices. Capturing these video streams and applying computer-vision AI to them can give medical professionals the tools to guide surgeons, detect and measure anomalies, or raise alerts in urgent cases, such as a stroke. AI models for streaming health-care data will greatly help the clinical community identify, measure, and report on surgical findings more quickly. These smart sensors are software-defined, which allows them to be regularly updated with AI algorithms as they continuously learn and improve - a capability that is essential to connecting research breakthroughs to the day-to-day practice of medicine.
Medical imaging from radiology and pathology was among the first areas to witness the benefits of AI models, which have been adopted for many tasks, from reporting, to detection, to measurement. AI is already helping radiologists quickly detect and classify anomalies, as well as prioritize work lists based on the urgency of cases. In areas where clinical teams are in short supply, AI is helping to triage patients based on a "first pass" of a DICOM image (DICOM is the standard for the communication and management of medical imaging information) to detect any emergency findings.
The health-care and life-sciences industry is embracing and adopting AI at every step. With the deluge of patient data, R&D drug development data, and clinical data, institutions are investing in AI to accelerate their workflows, bring disparate patient data together, accurately analyze genomes and visualize protein 3D structures, monitor patients, and optimize patient experiences and findings. Continued advances in AI models and uses of GPUs will continue to revolutionize life-science research and health care.
Vanessa Braunstein is the health-care AI product marketing lead at Nvidia.