Generative AI is having a major impact on biotechnology research. We’ll focus on two major areas of interest: drug discovery and protein structure prediction.

Drug discovery (Kirkpatrick, Peter, and Clare Ellis. 2004. “Chemical Space.” Nature 432 (7019): 823–23. https://doi.org/10.1038/432823a) involves the exploration and development of new pharmaceutical compounds to combat diseases and other medical conditions. Protein structure prediction, by contrast, focuses on computationally modeling a protein's three-dimensional structure from its amino acid sequence.

Searching chemical space with generative molecular graph networks

At its base, a medicine, whether drugstore aspirin or an antibiotic prescribed by a doctor, is a chemical graph consisting of nodes (atoms) and edges (bonds) (shown in the figure “Chemical graph”). Like text, graphs are not fixed in length: different molecules contain different numbers of atoms and bonds. There are many ways to encode a graph, including a binary representation based on numeric codes for the individual fragments (shown in the figure “Chemical graph”) and SMILES strings, which are linearized text representations of a molecule's structure.
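To make the “atoms as nodes, bonds as edges” idea concrete, here is a minimal sketch of aspirin as a plain Python graph. The `MolecularGraph` class, its methods, and the `aspirin` helper are hypothetical illustrations, not part of any cheminformatics library; hydrogens are omitted (a common “heavy-atom” convention), and the bonds follow the Kekulé SMILES string `CC(=O)OC1=CC=CC=C1C(=O)O`. In practice a toolkit such as RDKit would parse the SMILES string for you.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MolecularGraph:
    """Hypothetical minimal molecule graph: atoms are nodes, bonds are edges."""
    atoms: list = field(default_factory=list)  # element symbol for each node
    bonds: list = field(default_factory=list)  # (atom_i, atom_j, bond_order) edges

    def add_atom(self, element: str) -> int:
        """Append a node and return its index."""
        self.atoms.append(element)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        """Append an undirected edge between atoms i and j."""
        self.bonds.append((i, j, order))

    def ring_count(self) -> int:
        """Independent rings in a connected graph: edges - nodes + 1."""
        return len(self.bonds) - len(self.atoms) + 1


def aspirin() -> MolecularGraph:
    """Heavy-atom graph of aspirin, following CC(=O)OC1=CC=CC=C1C(=O)O."""
    g = MolecularGraph()
    for element in ["C", "C", "O", "O", "C", "C", "C", "C", "C", "C", "C", "O", "O"]:
        g.add_atom(element)
    for i, j, order in [
        (0, 1, 1), (1, 2, 2), (1, 3, 1),      # acetyl group: CH3-C(=O)-O-
        (3, 4, 1),                             # ester oxygen attached to the ring
        (4, 5, 2), (5, 6, 1), (6, 7, 2),       # benzene ring, alternating bonds
        (7, 8, 1), (8, 9, 2), (9, 4, 1),       # ...closing the ring back to atom 4
        (9, 10, 1), (10, 11, 2), (10, 12, 1),  # carboxylic acid: -C(=O)OH
    ]:
        g.add_bond(i, j, order)
    return g


mol = aspirin()
print(len(mol.atoms), len(mol.bonds), mol.ring_count())  # 13 13 1
print(Counter(mol.atoms))  # Counter({'C': 9, 'O': 4})
```

A smaller molecule such as ethanol (`CCO`) would need only 3 nodes and 2 edges, which is why generative models for molecules must handle graphs of varying size rather than fixed-length inputs.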

The number of potential features in a chemical graph is quite large; in fact, the number of potential chemical structures in the same size and property range as known drugs has been estimated at $10^{60}$, even larger than the number of research papers on generative models. For reference, the number of atoms in the observable universe (Villanueva, John Carl. 2009. “How Many Atoms Are There in the Universe?” Universe Today. July 31, 2009. https://www.universetoday.com/36302/atoms-in-) is between $10^{78}$ and $10^{82}$.
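A quick back-of-the-envelope calculation shows why $10^{60}$ candidates cannot simply be enumerated. The screening rate of one billion molecules per second used below is a hypothetical assumption chosen to be generous, not a figure from the source.

```python
# Order-of-magnitude estimate quoted in the text
DRUG_LIKE_MOLECULES = 10**60

# Hypothetical throughput (an assumption): one billion molecules evaluated per second
MOLECULES_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def years_to_enumerate(n_molecules: int, rate: int) -> float:
    """Years needed to evaluate every molecule exactly once at a fixed rate."""
    return (n_molecules / rate) / SECONDS_PER_YEAR


years = years_to_enumerate(DRUG_LIKE_MOLECULES, MOLECULES_PER_SECOND)
print(f"{years:.2e} years")
```

Even at that rate, the exhaustive scan takes on the order of $10^{43}$ years, vastly longer than the roughly 13.8-billion-year age of the universe.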

One can appreciate, then, that a major challenge of drug discovery (finding new drugs for existing and emerging diseases) is the sheer size of the space one might need to search. Experimental drug screening, in which thousands, millions, or even billions of compounds are tested in high-throughput experiments to find a chemical needle in a haystack with therapeutic potential, has been used for decades. However, computational methods such as machine learning have opened the door to “virtual screening” on a far larger scale.
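In its simplest form, virtual screening means scoring every candidate in a compound library with a predictive model and keeping only the best few for experimental follow-up. The sketch below assumes a hypothetical linear `predict_activity` function as a stand-in for a trained machine learning model, and random feature vectors as stand-ins for real molecular descriptors; neither comes from the source.

```python
import heapq
import random


def predict_activity(features: list, weights: list) -> float:
    """Hypothetical stand-in for a trained model: a linear score over descriptors."""
    return sum(f * w for f, w in zip(features, weights))


def virtual_screen(library: dict, weights: list, top_k: int) -> list:
    """Score every candidate and return the top_k (score, molecule_id) pairs, best first."""
    scored = (
        (predict_activity(features, weights), mol_id)
        for mol_id, features in library.items()
    )
    return heapq.nlargest(top_k, scored)


random.seed(0)
N_DESCRIPTORS = 8
weights = [random.uniform(-1, 1) for _ in range(N_DESCRIPTORS)]
# Hypothetical library of 100,000 candidates with random descriptor vectors
library = {
    f"mol_{i}": [random.random() for _ in range(N_DESCRIPTORS)]
    for i in range(100_000)
}

hits = virtual_screen(library, weights, top_k=5)
for score, mol_id in hits:
    print(f"{mol_id}: {score:.3f}")
```

`heapq.nlargest` keeps only `top_k` candidates in its heap while it consumes the generator of scores, which matters when the library is far larger than the shortlist sent on to the lab.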
