Harvard University Press

Can AI Make Us Healthier?



In this excerpt from their new book, The Age of Scientific Wellness, biotechnologist Leroy Hood and researcher Nathan Price explain how artificial intelligence is essential to a wellness-centered future.

Author - Editorial Staff

Date - 31 December 2023

Time to read - 21 min

The future of healthcare will take us to a place where increasing numbers of routine medical decisions are made by AI alone. But far more decisions will come from a combined approach of powerful AI assessments augmented and amplified by highly trained human intelligence, a schema that has come to be known as “centaur AI.” Like the half-human, half-horse creature of Greek myth, this hybrid arrangement is part human, part computer, and it should offer us the best of both worlds. This is especially true in areas where extreme human complexities play major roles and brute computational power is likely to be less successful than it can be in a closed, fully specified system like a game.


Science’s 2021 Breakthrough of the Year

There is a long-standing “grand challenge” for computational biology: being able to predict the shape of a folded protein given just its gene (amino acid) sequence. This is an ideal problem for data-driven AI in that it is well defined, offers concrete ways of measuring “better” or “worse” predictions, and can draw on large data repositories, such as the Protein Data Bank, which holds the three-dimensional structures of thousands of proteins for training. It’s also a question of great importance, because the shape of a protein determines its function—what chemical reactions it may catalyze, what molecular machines it will help build as multi-protein complexes, and how it interacts with other molecules to assemble cells, tissues, and organs. So if we want to understand the chain of events that moves us from DNA to RNA to amino acids—the building blocks of proteins—and onward to the full complexity of human life, we have to know how proteins fold.

This challenge of predicting protein structures from their gene (amino acid) sequences—and trying to get these predictions as close as possible to actual experimental measurements—is so important to the scientific community that a competition is held every two years, pitting the best minds in the field against one another in an effort to stimulate a breakthrough in predicted folding accuracy. This competition, known as the Critical Assessment of Protein Structure Prediction, or CASP, has been running since 1994, seven years before the publication of the first draft of the human genome. Brilliant researchers such as David Baker at the University of Washington and Yang Zhang at the University of Michigan consistently won this contest, and yet for the eight years between 2008 and 2016, accuracy ceased to improve. During that time, the top performance at CASP changed little, with winning entries scoring around 40 percent on the Global Distance Test, or GDT, which measures how closely a predicted structure matches the experimental measurement of a test protein.
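To make the GDT score concrete, here is a minimal Python sketch of the GDT_TS idea: after the predicted and experimental structures have been superposed, count the fraction of residue positions lying within 1, 2, 4, and 8 angstroms of their experimental locations, and average the four fractions. This is a simplification that omits details of the official CASP scoring, and the three-point “protein” is invented purely for illustration.

```python
# Simplified sketch of the GDT_TS idea used at CASP (illustration only):
# average, over four distance cutoffs, the fraction of residue positions in
# the predicted structure that lie within that cutoff of the experimental one.
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """predicted / experimental: equal-length lists of (x, y, z) coordinates,
    assumed already superposed."""
    fractions = []
    for cutoff in cutoffs:
        within = sum(1 for p, e in zip(predicted, experimental)
                     if math.dist(p, e) <= cutoff)
        fractions.append(within / len(predicted))
    return 100.0 * sum(fractions) / len(fractions)

# Toy example: one residue predicted exactly, one 3 angstroms off, one 10 off.
pred = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
expt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(round(gdt_ts(pred, expt), 1))  # 50.0
```

A perfect prediction scores 100; the stuck pre-2016 entries mentioned above hovered around 40.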

That changed when the Google-owned AI company DeepMind entered the fray for the first time in 2018. DeepMind had become famous for its work on AlphaGo, a program that defeated world champion Lee Sedol in 2016 at the ancient game of Go, widely regarded as the most complex board game in the world even though its rules are quite straightforward. One player has a set of black pieces, the other white. The players take turns placing pieces on a grid, anywhere they choose. The goal is to surround the pieces of the other player. Hidden in this seeming simplicity is deep complexity: the number of possible game positions is greater than the number of atoms in the known universe. Strategy is devilishly complicated, such that large-scale number crunching alone is insufficient to win. To make AlphaGo a success, its programmers had to leverage the 3,000-year history of human experience playing Go. They then used reinforcement learning to iteratively improve the program through millions of games. Sedol, the eighteen-time world champion who fell to AlphaGo, would later say that because AI would rapidly improve—while humans are only capable of improving a little—it was unlikely that any human would ever again be as good at Go as a capable computer.

Following the success of AlphaGo, the team at DeepMind embarked on an even more ambitious project. They threw away everything humans had learned about Go, seeded the reinforcement learning algorithm with nothing but the rules of the game, and had the algorithm play against itself over and over. The modifications of the winning side were preferentially kept, while those of the losing side were eliminated. The algorithm grew stronger each time it played. The resulting program, AlphaGo Zero (“Zero” denoting no contamination by human knowledge), beat the original AlphaGo in a tournament one hundred games to zero. It was a stunning repudiation of what we thought we had learned playing Go as humans. Expert players are often baffled, unable to understand why AlphaGo makes the moves it does, but that those moves are superior is not in doubt.

Could such a technology help us solve the protein folding problem? When DeepMind’s AlphaFold entered the CASP contest in 2018, it beat the competition by nearly 50 percent, coming in at an accuracy of nearly 60 percent GDT. It was an astonishing leap forward. In 2020, AlphaFold 2.0 blew the original AlphaFold away, making an even bigger leap and jumping to nearly 85 percent GDT.

The conventional “gold standard” for determining a protein’s structure involves forming a regular array of individual protein molecules stabilized by crystal contacts, a process known as protein crystallization. This highly ordered form makes it possible to use techniques such as X-ray crystallography and nuclear magnetic resonance to define the protein structure. But this isn’t the exact shape the protein takes in the body as part of a living system, where function can influence form. Researchers estimate that they can experimentally measure these structures with about 90 percent accuracy. That’s just about where AlphaFold was in 2020, meaning that we are now living in a world in which our computational predictions can be just as good as our experimental measurements of protein folding. Importantly, computational predictions at this level of accuracy could better represent what the structures are in their natural state (rather than in a crystallized state), making them even better than experimentally determined structures. A remarkable advance!

Such computational predictive power provides tantalizing opportunities, allowing us to simulate conditions we can’t simply measure. But how do we know if the predictions are correct if we can’t check them against direct measurements? One approach is to simulate effects that can be seen, and then determine whether those ancillary downstream consequences match. Will simulations of cell function be more or less accurate when using protein structure inputs from AlphaFold predictions or from experimental data? At this point we can’t know—at least not completely. To some degree, as they say in Great Britain, “the proof is in the pudding.” If computational predictions ultimately prove better than direct measurement at identifying underlying states of wellness, transition, and disease—and do so again and again across various challenges—it will be hard not to trust those predictions. Will they be perfect? Probably not. Will they have a better degree of accuracy than most human doctors achieve? In time, undoubtedly yes.

Programs like AlphaFold are computationally expensive. It can take weeks on a present-day supercomputer to simulate a complex protein. But when a team led by David Baker at the University of Washington took what they had learned from AlphaFold and integrated it with insights from their own work, they developed a hybrid human-computer algorithm, RoseTTAFold, that came very close to the accuracy of AlphaFold 2.0 but completed the computations in a fraction of the time, taking only about ten minutes on a single high-end GPU machine.(1) The time will continue to decrease as computing power increases, so we will soon have quick, high-accuracy ways of replicating the results of protein crystallization experiments, combining purely computational approaches with domain expertise in protein folding. This new ability to predict protein structures, exemplified by AlphaFold and RoseTTAFold, was Science magazine’s Breakthrough of the Year for 2021—and rightfully so. These programs were revolutionary.

None of this suggests that the protein folding problem is completely solved. Far from it. Proteins interact dynamically and have multiple configurations as they carry out their functions in living systems. And while AI is now good at predicting the crystallized structure, none of the current approaches can predict proteins in all of their potential structures, and validating such predictions is a significant challenge. There is an entire field of molecular dynamics simulation devoted to this problem that uses the crystallized structures as a starting point, but these simulations are computationally demanding, even over very short periods of time.(2) It’s almost undebatable, however, that AI and big data have reached a tremendously important milestone, one that is vital to increasing our understanding of scientific wellness.

Knowledge Is Power

For many challenges, data-driven AI is king. In the long run, however, it will take the power of both data-driven and knowledge-driven AI to fundamentally change healthcare. Quite logically, this will begin with the data-driven systems that are most advanced. And among the modern marvels of data-driven AI are artificial neural networks, inspired by the wiring of the human brain.

For computers to become faster, more efficient, and better at problem-solving, it made sense to model their functioning after the circuitry of their creators. While the ideas behind neural networks have been around for a long time—Donald Hebb proposed the underlying learning principle in 1949, giving us the “cells that fire together wire together” conception of neural learning discussed earlier in this book—they weren’t very effective until recently because of limits on available data and computational power.(3)
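Hebb’s principle reduces to a one-line update: the strength of a connection grows in proportion to the simultaneous activity of the two units it links. A minimal sketch, with invented activity values and an arbitrary learning rate:

```python
# Hebb's rule in its simplest form: delta_w = learning_rate * pre * post,
# so a connection strengthens only when both units are active together.
# The activity values and learning rate are invented for illustration.

def hebbian_update(weight, pre, post, learning_rate=0.1):
    return weight + learning_rate * pre * post

w = 0.0
for _ in range(5):                               # units repeatedly co-active:
    w = hebbian_update(w, pre=1.0, post=1.0)     # they "wire together"
print(round(w, 2))                               # 0.5

print(hebbian_update(0.0, pre=1.0, post=0.0))    # 0.0 -- no co-activity, no change
```

Modern deep networks use far more elaborate training rules, but this co-activity idea is the historical seed.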

We have much more data at our disposal today. Enormous caches of electronic health records, or EHRs, have been amassed by following patient interactions with healthcare providers across essentially all major health organizations. A 2017 study found that an average of 80 megabytes of information were generated by each patient per year.(4) These records include imaging data, basic testing results, information on patient outcomes, and more.

One of the tasks doctors understandably seem to hate more than just about anything is inputting patient data into EHRs at the end of a long day. This has given rise to a new, dark joke among physicians, who often lament that they end each day by making a sacrifice to their robot overlords. All of this comes before we add in the larger data sets that will become part of each patient’s data cloud when we start amassing and integrating genomics, longitudinal phenotyping, gut microbiome analyses, and data from wearable devices. But these will automatically be fed into the medical record and won’t have to be manually and laboriously logged. (Lee went to his fiftieth medical school reunion recently and found that many of his colleagues had retired; the strongest motivation for retirement, for most, was detailing their patients’ EHRs and dealing with billing.)

Another major domain of medical AI is interpreting imaging data. Images make up a large fraction of medical data and generally take significant amounts of time from trained experts to interpret.(5) AI technologies are now providing help in extracting, visualizing, and interpreting imaging data, in some cases generating insights that extend beyond what humans are able to do. Deep learning is at the heart of many of these algorithms.(6) Companies such as IBM’s Watson Health, Google’s DeepMind, Microsoft’s Open Mind, and others are building capabilities for many important applications, including detecting anemia and identifying various cancers.(7)

A team at Google was able to use deep learning to identify blood vessel damage in the eye with very high sensitivity and specificity after training on 130,000 retinal images.(8) Importantly, the diagnostic performance of this algorithm was essentially the same as the results achieved by US board-certified ophthalmologists. In a study conducted at Stanford University, AI was able to detect arrhythmias on an electrocardiogram with higher accuracy than the average cardiologist.(9) For now, we should think of these systems as aids to clinicians, but it is not hard to imagine that we will eventually enter a world in which the analysis of medical imaging data is mostly handled by computers, making it possible to incorporate much larger imaging data sets and process them more quickly, readily providing information for decision making. Because AI is generally inexpensive to run once it has been developed, the potential for optimizing care and making it radically cheaper is striking.
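Results like these are reported as sensitivity and specificity, which come straight from a classifier’s confusion matrix. A small sketch with made-up counts (not data from the actual studies):

```python
# Sensitivity and specificity from a confusion matrix.
# The counts below are invented for illustration.

def sensitivity_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # of all diseased cases, fraction flagged
    specificity = tn / (tn + fp)   # of all healthy cases, fraction cleared
    return sensitivity, specificity

# Suppose a model screened 100 diseased and 100 healthy retinas:
sens, spec = sensitivity_specificity(tp=95, fn=5, tn=90, fp=10)
print(sens, spec)  # 0.95 0.9
```

High sensitivity means few diseased eyes are missed; high specificity means few healthy eyes are falsely flagged. Matching board-certified ophthalmologists requires both at once.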

The Use of Knowledge-Based Systems

We are discovering new ways to code collective human knowledge into computers, a class of AI approaches built on what have long been called “expert systems.” In a perfect world, these systems would be able to execute decisions on a set of facts and come to the same conclusions as human experts—or even improve on human performances with lightning-fast processing speed, perfect memory recall, and the ability to see the nearly limitless permutations that could arise from any given data combination. At their best, they are akin to having not one expert but thousands upon thousands, all working together at top speed. That is the goal, and we are well on the road to achieving it.

Traditional expert systems have been hard to scale because they tend to get convoluted as the rules pile up, leading to incredibly complex decision trees. Also, human thinking is not purely rule based. Humans are quite good at recognizing when rules shouldn’t be applied to a particular case or where the logic breaks down. While the more common cases can effectively be captured by AI, enumerating every possible permutation for a computer is an impossible task. One of the marvels of the healthy human brain is that it does not get stuck in endless loops and is broadly able to deal with the unexpected. A breakthrough similar to the one that propelled data-driven AI systems is sorely needed for knowledge-based AI systems. This would lead to a world in which “deep learning” could be joined by “deep reasoning,” such that AI can understand implicit relationships, not just ones that have been specifically programmed into its code. What makes this challenge so difficult is that unlike deep learning, where adding massive amounts of computing power and data fueled the leap forward, we need conceptual advances to make deep reasoning achievable.
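A classic expert system of this kind can be sketched as if-then rules applied by forward chaining: keep firing rules until no new conclusion appears. The rules below are invented toy examples, not clinical guidance; real systems hold thousands of such rules, which is exactly where the convolution described above sets in.

```python
# Toy rule-based expert system using forward chaining: each rule maps a set
# of required facts to a conclusion, and rules fire repeatedly until no new
# fact can be derived. Rules and facts are invented for illustration.

RULES = [
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu", "shortness_of_breath"}, "consider_chest_xray"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                  # stop once a full pass adds nothing new
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"fever", "cough", "shortness_of_breath"}, RULES)))
```

Note that the second rule fires only because the first one added a fact; chains of such dependencies are what make large rule bases so hard to maintain.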

Before we can hope to get there, we need to understand what reasoning is. One way to approach that question is through a seemingly unrelated one: Do platypuses drink water?

Well, you might think, of course. You don’t need to be informed by massive amounts of data on platypuses and water to come to this conclusion, as machine learning would require. After all, you likely know that a platypus is a mammal—albeit a very strange sort of mammal—and you probably think it’s fair to assume that all mammals need water. So the chances are good that platypuses drink water. But unless you happen to be a zookeeper or a platypus expert, how would you know for sure? Have you ever seen a platypus drinking water? Is the fact that platypuses drink water written down in anything you’ve ever read from a credible source?

Just to be sure, you’d probably do a quick Google search of the question “Do platypuses drink water?” And do you know what you would find? No specific answer. What’s more, platypuses spend a lot of their lives in the water, so if your idea was to jump onto a zoo’s webcam and look for a water bowl, you’d be out of luck. If they are drinking water, they’re likely doing it while swimming.

But if you had to hazard a guess, you’d probably go with your intuition, relying on implicit logic to guide your thinking, because that’s how humans make decisions when they don’t have perfect information. Data-driven AI, though, has a difficult time with such questions because it has no intuitive understanding of mechanism or causality. It doesn’t do well at “guessing.” It lives in a world of correlation and prediction.
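The platypus question illustrates the kind of inference a knowledge-based system would need: store “is-a” relationships and let properties be inherited down the hierarchy. A minimal sketch, with deliberately simplified facts chosen for the example:

```python
# Tiny knowledge-based inference: properties attach to categories, and an
# entity inherits everything along its "is-a" chain. Facts are simplified
# for illustration.

IS_A = {"platypus": "mammal", "mammal": "animal"}
PROPERTIES = {"mammal": {"drinks_water"}, "animal": {"is_alive"}}

def infer_properties(entity):
    props = set()
    while entity is not None:
        props |= PROPERTIES.get(entity, set())
        entity = IS_A.get(entity)    # climb to the parent category
    return props

print(sorted(infer_properties("platypus")))  # ['drinks_water', 'is_alive']
```

No one ever asserted “platypuses drink water” directly; the conclusion follows from the hierarchy, which is exactly the implicit reasoning data-driven AI struggles with.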

A form of AI that brings together today’s incredibly powerful data-driven advances strengthened by a breakthrough in causal knowledge models will be much more comfortable with unknowns, implicit relationships, and implied probabilities—the sorts of things experts rely on to make decisions every day when they apply conceptual knowledge (domain expertise) to questions that are far less random and far more consequential.

AI Tools to Help Physicians

A host of AI tools have already emerged to help physicians with their diagnoses. Just a few years ago, most medical decisions were based entirely on the knowledge in the head of the doctor at the time the decision was made, though it has long been clear that the data deluge coming from biomedical sciences exceeds any human’s capacity to process it even superficially. Today, clinical decision support systems have arisen to present physicians and other care providers with access to a wealth of information at the point of care. This leverages what computers are naturally good at—storing, recalling, and correlating vast amounts of information virtually instantaneously—and links it to an expert human’s deep ability to reason intuitively and think creatively.

When these expert systems first came along in the 1980s and 1990s, they were met with hostility by many physicians who worried that computers would soon be in charge of medical decision making, taking the “doctor’s touch” out of the equation and binding the hands of physicians whose opinions differed from the computer’s analysis. But that’s not what happened. Research has shown that these systems have gotten better and better at helping doctors spot potential outcomes they might have missed without taking the ultimate decision-making authority out of their hands.(10) The physician can still say no—at least for now.

We are fast approaching a time when “centaur doctors” combining the best parts of human intelligence and AI assistance will be empowered to make bold medical decisions with far fewer unintended consequences. That’s vitally important, because medical mistakes account for about a quarter of a million deaths annually in the United States alone.(11) (Setting aside the recent COVID-19 pandemic, these errors are the third leading cause of death in the nation after heart disease and cancer.)(12) It is not even a little bit bold to say that AI-enabled healthcare already has saved countless lives.(13)


An AI program called MedAware has helped doctors avoid accidentally prescribing the wrong medication.(14) The system was pioneered by Dr. Gidi Stein after he heard about a 9-year-old boy who died because a doctor clicked the wrong box, ordering a prescription for blood thinners instead of asthma medicine. Mistakes like this are frighteningly common. About 70 percent of medication errors that may result in adverse effects are prescription errors.(15) And it’s not hard to understand how this could be such a pervasive problem. The FDA has approved tens of thousands of prescription drug products, many of which have very similar names. There’s Novolin and Novolog. There’s vinblastine and vincristine. There’s hydroxyzine and hydralazine. If you recall that doctors have famously bad handwriting, you can imagine how problematic this was in the days when most “scripts” were written by hand, but it’s still a challenge in the digital age, when a simple misspelling or temporary lapse in memory can deliver the wrong medication to a patient. So when a doctor prescribes a medicine that doesn’t match the patient’s medical needs as assessed by MedAware, that physician gets an alert. The system also signals doctors if they attempt to prescribe a medication that could interact negatively with one of the patient’s existing medications—another common error that is almost never checked by physicians.

In hospitals around the world using MedAware, the doctor still has the final say. Sometimes an unusual prescription is warranted in a particular case. The system simply offers an extra check—one that is particularly beneficial when physicians are overworked and exhausted. And it is saving lives.(16)
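One way to see how software could catch look-alike prescriptions is simple string similarity over the formulary. This is only an illustration of the general idea, not a description of how MedAware actually works, and the 0.7 threshold is an arbitrary choice for the example:

```python
# Flagging look-alike drug names with string similarity.
# Illustration of the general idea only; not MedAware's actual method.

from difflib import SequenceMatcher

FORMULARY = ["Novolin", "Novolog", "vinblastine", "vincristine",
             "hydroxyzine", "hydralazine"]

def lookalikes(name, formulary, threshold=0.7):
    """Return formulary entries whose spelling closely resembles `name`."""
    return [other for other in formulary
            if other != name
            and SequenceMatcher(None, name.lower(),
                                other.lower()).ratio() >= threshold]

print(lookalikes("Novolin", FORMULARY))  # ['Novolog']
```

A real safety system would of course weigh far more than spelling, such as the patient’s diagnoses, history, and current medications, but even this crude check shows how the confusable pairs named above can be surfaced automatically.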

There’s another advantage: the risk of making mistakes often keeps doctors from thinking creatively, restricting their options to a small number of familiar treatments. These practices, at their best, are grounded in clinical trials, but with the combined power of AI and individual data clouds, we can do much better than “following the average,” taking an individual’s unique genetic makeup, biochemistry, lifestyle, and personal history into account. By eliminating simple errors and making available a wealth of scientifically validated insights specific to each person, an AI-assisted doctor could quickly and confidently evaluate tens of thousands of possible outcomes—in the context of each patient’s unique biology and medical conditions—before settling on a much smaller selection of high-quality recommendations.



  1. M. Baek et al., “Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network,” Science 373 (2021): 871–876.

  2. J. C. Phillips et al., “Scalable Molecular Dynamics on CPU and GPU Architectures with NAMD,” Journal of Chemical Physics 153 (2020): 044130; X. Liu et al., “Molecular Dynamics Simulations and Novel Drug Discovery,” Expert Opinion on Drug Discovery 13 (2018): 23–37; Y. Wang, J. M. Lamim Ribeiro, and P. Tiwary, “Machine Learning Approaches for Analyzing and Enhancing Molecular Dynamics Simulations,” Current Opinion in Structural Biology 61 (2020): 139–145.

  3. D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory (Wiley and Sons, 1949).

  4. M. D. Huesch and T. J. Mosher, “Using It or Losing It? The Case for Data Scientists inside Health Care,” NEJM Catalyst, May 4, 2017.

  5. S. Nuti and M. Vainieri, “Managing Waiting Times in Diagnostic Medical Imaging,” BMJ Open 2 (2012).

  6. R. Yousef et al., “A Holistic Overview of Deep Learning Approach in Medical Imaging,” Multimedia Systems 28 (2022): 881–914; S. K. Zhou et al., “Deep Reinforcement Learning in Medical Imaging: A Literature Review,” Medical Image Analysis 73 (2021): 102193; A. Esteva et al., “A Guide to Deep Learning in Healthcare,” Nature Medicine 25 (2019): 24–29.

  7. M. S. Kim et al., “Artificial Intelligence and Lung Cancer Treatment Decision: Agreement with Recommendation of Multidisciplinary Tumor Board,” Translational Lung Cancer Research 9 (2020): 507–514; A. Mitani et al., “Detection of Anaemia from Retinal Fundus Images via Deep Learning,” Nature Biomedical Engineering 4 (2020): 18–27; S. M. McKinney et al., “International Evaluation of an AI System for Breast Cancer Screening,” Nature 577 (2020): 89–94; O. J. Oktay et al., “Evaluation of Deep Learning to Augment Image-Guided Radiotherapy for Head and Neck and Prostate Cancers,” JAMA Network Open 3 (2020): e2027426.

  8. V. Gulshan et al., “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs,” Journal of the American Medical Association 316 (2016): 2402–2410.

  9. A. Y. Hannun et al., “Cardiologist-Level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms Using a Deep Neural Network,” Nature Medicine 25 (2019): 65–69.

  10. L. Moja et al., “Effectiveness of Computerized Decision Support Systems Linked to Electronic Health Records: A Systematic Review and Meta-Analysis,” American Journal of Public Health 104 (2014): e12–e22.

  11. J. G. Anderson and K. Abrahamson, “Your Health Care May Kill You: Medical Errors,” Studies in Health Technology and Informatics 234 (2017): 13–17.

  12. Anderson and Abrahamson, “Your Health Care May Kill You.”

  13. E. J. Topol, “High-Performance Medicine: The Convergence of Human and Artificial Intelligence,” Nature Medicine 25 (2019): 44–56; A. Haque, A. Milstein, and L. Fei-Fei, “Illuminating the Dark Spaces of Healthcare with Ambient Intelligence,” Nature 585 (2020): 193–202; R. T. Sutton et al., “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success,” NPJ Digital Medicine 3 (2020): 17.

  14. R. Rozenblum et al., “Using a Machine Learning System to Identify and Prevent Medication Prescribing Errors: A Clinical and Cost Analysis Evaluation,” Joint Commission Journal on Quality and Patient Safety 46 (2020): 3–10.

  15. G. P. Velo and P. Minuz, “Medication Errors: Prescribing Faults and Prescription Errors,” British Journal of Clinical Pharmacology 67 (2009): 624–628.

  16. Rozenblum et al., “Using a Machine Learning System.”