A High School Guide: How AI Helps Scientists Design New Proteins
From DeepMind’s AlphaFold to AI-designed COVID antivirals, artificial intelligence is revolutionizing protein design. This short guide breaks down how machines learn the “language of life” and help scientists build a healthier, cleaner world.
10/29/2025 · 5 min read


You’ve seen AI (Artificial Intelligence) everywhere, from the apps on your phone to chatbots on websites. But one of the most exciting uses of AI is in protein engineering, a field where scientists design brand-new proteins to solve pressing challenges.
This includes everything from biopharma applications like new medicines to industrial biotechnology applications like designing enzymes that can break down plastic waste or make laundry detergent more effective.
But how does a computer “design” a protein? How does it achieve the intelligence to do this? It’s not magic—but it is incredibly smart. Here’s a simple breakdown.
🧩 What’s the Big Problem?
First, what is a protein? Think of it as a long necklace made from 20 different kinds of beads, called amino acids. The specific order of these beads—the sequence—determines how the necklace will fold up into a complex 3D shape.
That final 3D shape is everything. It’s what allows a protein to do its job: each protein folds into a distinct shape that lets it grab onto a virus or break down a stain.
The problem is the number of possibilities. A tiny protein might have 100 beads. With 20 different options for each spot, the number of possible sequences is 20 multiplied by itself 100 times, roughly a 1 followed by 130 zeros. That is far more than the number of atoms in the observable universe.
A scientist trying to find the one perfect sequence by guessing is like trying to win the lottery a billion times in a row.
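To get a feel for the size of that number, here is a small Python sketch that counts the possible sequences for a 100-bead protein. The figure used for the atoms in the universe (10 to the power of 80) is a commonly quoted rough estimate and is an assumption here, not something measured exactly.

```python
# Count the possible sequences for a 100-residue protein built from 20 amino acids.
NUM_AMINO_ACIDS = 20
PROTEIN_LENGTH = 100

num_sequences = NUM_AMINO_ACIDS ** PROTEIN_LENGTH

# Assumption: a commonly quoted rough estimate of atoms in the observable universe.
ATOMS_IN_UNIVERSE = 10 ** 80

# Counting digits is an easy way to compare numbers this large.
digits = len(str(num_sequences))
sequences_per_atom = num_sequences // ATOMS_IN_UNIVERSE

print(f"Possible sequences: a number with {digits} digits")
print(f"Sequences per atom: a number with {len(str(sequences_per_atom))} digits")
```

Python handles integers of any size, so the count is exact rather than rounded.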
This is where AI comes in.
🧠 Step 1: The AI “Learns the Language” of Proteins
Before an AI can design a new protein, it has to learn the rules of existing proteins. Scientists do this by feeding it a massive dataset, usually containing the sequences of millions of real proteins found in nature (from humans, bacteria, plants, etc.).
The AI learns by playing a “fill-in-the-blank” game over and over:
It takes a real protein sequence:
...Alanine - Leucine - Glycine - Valine...
It hides one of the “beads”:
...Alanine - Leucine - [ ? ] - Valine...
It tries to guess the missing bead based on all its neighbors.
By playing this game billions of times, the AI doesn’t just memorize sequences; it learns the grammar of proteins. It learns that if you have an Alanine and Leucine on one side, and a Valine on the other, the missing piece is likely Glycine, because that combination tends to give a stable structure.
DeepMind’s AlphaFold project built on a similar idea of learning patterns from known proteins. It learned the “language” of proteins so well that it can now predict how most proteins fold into their 3D shapes, something that used to take scientists years in the lab. Today, the AlphaFold database includes over 200 million predicted protein structures, helping researchers understand diseases like Parkinson’s and develop new vaccines faster.
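The fill-in-the-blank game can be sketched in a few lines of Python. This is a deliberately tiny stand-in, not how real protein models are built: the training sequences are made up, the "model" is just a table of counts, and it only looks at the two immediate neighbors rather than the whole sequence. Sequences use one-letter codes (A = Alanine, L = Leucine, G = Glycine, V = Valine).

```python
from collections import Counter, defaultdict

# Hypothetical toy training set; real models train on millions of sequences.
TRAINING_SEQUENCES = ["ALGV", "ALGV", "ALGV", "ALAV", "MLGV", "ALGI"]

def train(sequences):
    """Count which residue appears between each (left, right) neighbor pair."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(1, len(seq) - 1):
            counts[(seq[i - 1], seq[i + 1])][seq[i]] += 1
    return counts

def guess_masked(counts, left, right):
    """Return the residue seen most often between the two neighbors, or None."""
    seen = counts.get((left, right))
    if not seen:
        return None
    return seen.most_common(1)[0][0]

model = train(TRAINING_SEQUENCES)

# ...Leucine - [ ? ] - Valine...
prediction = guess_masked(model, "L", "V")
print(prediction)  # prints "G" (Glycine) for this toy data
```

A real model replaces the count table with a neural network that considers every bead in the sequence, but the training game is the same: hide a bead, guess it, check the answer.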
🔬 Step 2: The AI “Learns the Job” (Connecting Sequence to Function)
Learning the grammar (how to fold) is great, but it’s not enough. The AI also needs to learn what a protein does.
To do this, scientists give the AI a different kind of dataset: one that links each specific sequence to how well it does a specific job.
Scientists create a library of thousands of different protein sequences in the lab, test each for a specific function (like “how well does it stick to a flu virus?” or “how fast does it break down a grass stain?”), and feed that data to the AI.
The AI now has a lookup table:
Sequence 1 → Sticks to flu: 9/10
Sequence 2 → Sticks to flu: 2/10
Sequence 3 → Breaks down grass: 10/10
Now the AI can connect grammar (Step 1) to job (Step 2). It starts to learn which amino acids in which positions are responsible for a particular function.
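One simple way to picture "learning which positions matter" is to average the lab scores for every bead at every position. The sketch below does that with invented data; the sequences, the scores, and the averaging approach are all illustrative assumptions, and real models use far more sophisticated statistics.

```python
from collections import defaultdict

# Hypothetical lab results: (sequence, binding score out of 10). Not real data.
LAB_DATA = [
    ("ALGV", 9),
    ("ALGI", 8),
    ("ALAV", 3),
    ("MLAV", 2),
    ("MLGV", 8),
]

def average_score_by_position(data):
    """For each (position, residue) pair, average the scores of sequences that contain it."""
    totals = defaultdict(lambda: [0.0, 0])  # (position, residue) -> [score sum, count]
    for seq, score in data:
        for pos, residue in enumerate(seq):
            totals[(pos, residue)][0] += score
            totals[(pos, residue)][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

averages = average_score_by_position(LAB_DATA)

# Compare the two residues that were tried at position 2 (the third bead).
print(f"G at position 2: {averages[(2, 'G')]:.1f}")
print(f"A at position 2: {averages[(2, 'A')]:.1f}")
```

In this toy data, sequences with G in the third spot score much higher than those with A, which hints that this position matters for the job.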
At the University of Washington’s Institute for Protein Design, scientists used AI to create “mini-proteins” that bind tightly to the COVID-19 spike protein. These didn’t exist in nature—they were designed completely from scratch. Some became the basis for experimental antivirals and rapid diagnostics during the pandemic.
🧪 Step 3: The AI Becomes a “Designer”
Once the AI has learned the “grammar” and the “job,” scientists can use it in two powerful ways.
The first way is to use AI as a scoring machine.
Scientists often have thousands of new protein ideas. Before spending months of effort testing them in a lab, they ask the AI:
“Hey AI, I have 100,000 new sequences. Which of these will be stable? And which ones are most likely to stick to the flu virus?”
The AI “scores” every sequence for both stability and function. It might report back:
“These 500 sequences look stable and have a 90% chance of working. The rest are probably junk.”
This saves a huge amount of time and money.
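Here is a minimal sketch of that filtering step. The two scoring functions are hypothetical stand-ins for trained models: the "stability" rule (fraction of hydrophobic beads) and the "binding" rule (presence of a made-up motif, LG) are invented for illustration and are not real biophysics.

```python
# Rough, illustrative grouping of hydrophobic amino acids (one-letter codes).
HYDROPHOBIC = set("AVILMFWC")

def stability_score(seq):
    """Toy rule: fraction of hydrophobic residues. Stand-in for a trained model."""
    return sum(residue in HYDROPHOBIC for residue in seq) / len(seq)

def binding_score(seq):
    """Toy rule: 1.0 if the made-up motif 'LG' is present, else 0.0."""
    return 1.0 if "LG" in seq else 0.0

def shortlist(candidates, min_stability=0.5, min_binding=0.5):
    """Keep candidates that pass both thresholds, best combined score first."""
    passing = [
        seq for seq in candidates
        if stability_score(seq) >= min_stability and binding_score(seq) >= min_binding
    ]
    return sorted(
        passing,
        key=lambda seq: stability_score(seq) + binding_score(seq),
        reverse=True,
    )

candidates = ["ALGV", "GGGS", "MLGI", "SLGS", "AVIL"]
best = shortlist(candidates)
print(best)  # only the sequences that pass both checks
```

With real models the list would hold 100,000 candidates instead of five, but the pattern is the same: score everything cheaply on a computer, then send only the shortlist to the lab.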
Companies like Amgen and Pfizer use AI scoring tools to predict which protein-based drugs (called biologics) are most likely to work before they’re made. This helps speed up drug development and reduces the need for thousands of failed experiments.
The second way is to use AI as a generator to make sequences from scratch.
This is the new, exciting frontier. Instead of just scoring ideas, the AI creates brand-new proteins from scratch. Scientists can give the AI a challenge, and it will “generate” a novel protein sequence that meets the goal.
There are a few main ways these “generators” work:
🧩 Autoregressive Models (Like Predictive Text)
Analogy: This works like your phone’s text predictor, writing a protein one “bead” at a time.
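The predictive-text idea can be sketched with a very small model that only remembers which bead tends to follow which. This is an assumption-heavy toy: the training sequences are invented, and a real autoregressive model looks at everything written so far, not just the previous bead.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy training sequences (one-letter amino acid codes).
TRAINING_SEQUENCES = ["MALGV", "MALAV", "MGLGV", "MALGI"]
START, END = "^", "$"  # markers for the beginning and end of a sequence

def train_bigrams(sequences):
    """Count how often each residue follows the previous one."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = START + seq + END
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_length=20):
    """Write a sequence one residue at a time, each choice based on the previous residue."""
    seq, prev = [], START
    while len(seq) < max_length:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        seq.append(nxt)
        prev = nxt
    return "".join(seq)

model = train_bigrams(TRAINING_SEQUENCES)
new_sequence = generate(model, random.Random(0))
print(new_sequence)
```

Changing the seed passed to `random.Random` gives a different sequence each time, which is how a generator can propose many candidates from the same learned patterns.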
Researchers at Meta AI (Facebook) developed ESMFold, a tool built on a protein language model that predicts the structures of new proteins very quickly, helping scientists design enzymes or materials that don’t exist in nature.
🔍 Masked Models (Like a Fill-in-the-Blank Game)
Analogy: A scientist gives the AI a known protein and “masks” the parts they want redesigned. The AI fills in the blanks to create a new version.
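As a rough sketch of masked redesign, the code below takes a template with two hidden spots and fills them in. The "model" here is only a table of which bead is most common at each position in a made-up family of related sequences; a real masked model would instead use the surrounding beads as context. The family, the template, and the `_` mask symbol are all assumptions for illustration.

```python
from collections import Counter

MASK = "_"

# Hypothetical family of related sequences, all the same length (a toy alignment).
FAMILY = ["MALGV", "MALGI", "MSLGV", "MALAV", "MSLGV"]

def position_profile(family):
    """Count the residues seen at each position across the family."""
    return [Counter(column) for column in zip(*family)]

def fill_masks(template, profile):
    """Replace each masked position with the residue most common at that position."""
    return "".join(
        profile[i].most_common(1)[0][0] if residue == MASK else residue
        for i, residue in enumerate(template)
    )

profile = position_profile(FAMILY)

# The scientist keeps M, L and V fixed and asks for the other two spots to be redesigned.
redesigned = fill_masks("M_L_V", profile)
print(redesigned)
```

The unmasked beads are never touched, which is the point of this approach: scientists keep the parts of the protein they trust and let the AI rework the rest.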
In 2023, Google DeepMind and Isomorphic Labs used this approach to re-engineer enzymes that make drug molecules faster and more precisely—cutting down production time and waste in pharmaceutical manufacturing.
🌫️ Diffusion Models (Like an Image Generator)
Analogy: This method starts with random “noise” (a jumbled protein sequence) and gradually refines it into a stable, functional protein—just like an AI art generator makes a clear image from static.
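The noise-to-protein loop can be sketched like this. Everything specific is an assumption: `toy_denoiser` is a hand-written stand-in for the trained neural network a real diffusion model would use, and its rule (hydrophobic beads at even positions, polar beads at odd ones) is made up, as is the rough grouping of amino acids.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = "AVILMFWC"   # rough, illustrative grouping
POLAR = "STNQDEKRHGPY"     # everything else, for this toy example

def toy_denoiser(seq, rng):
    """Hypothetical stand-in for a trained network: guess a 'clean' sequence.

    Residues that already fit the made-up pattern are kept; the rest are replaced.
    """
    cleaned = []
    for i, residue in enumerate(seq):
        allowed = HYDROPHOBIC if i % 2 == 0 else POLAR
        cleaned.append(residue if residue in allowed else rng.choice(allowed))
    return cleaned

def add_noise(seq, noise_level, rng):
    """Randomly replace each residue with probability noise_level."""
    return [rng.choice(AMINO_ACIDS) if rng.random() < noise_level else r for r in seq]

def sample(length, steps, rng):
    """Start from pure noise, then alternate denoising with less and less re-noising."""
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for step in range(steps, 0, -1):
        guess = toy_denoiser(seq, rng)
        # The noise level shrinks each step and reaches 0 on the final step.
        seq = add_noise(guess, (step - 1) / steps, rng)
    return "".join(seq)

designed = sample(length=12, steps=10, rng=random.Random(0))
print(designed)
```

Early steps still scramble most of the sequence, while later steps change less and less, so the result settles gradually instead of being written in one pass.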
In 2024, Generate:Biomedicines used diffusion-based AI models to design proteins that help the immune system better recognize cancer cells. These designs are now being tested as next-generation cancer therapies.
⚗️ Why Good Data Is Still the Most Important Part
Here’s the catch: the AI is only as smart as the data you give it. This is the “Garbage In, Garbage Out” rule.
If you train an AI with messy, inconsistent, or incomplete data, it will learn the wrong rules—and design proteins that fail in the lab.
That’s why breakthroughs in biologics or biotechnology still require real scientists running clean, high-quality experiments. The AI helps connect the dots, but it can’t replace good data or human judgment.
When AlphaFold’s predictions were released, scientists still had to verify them using lab techniques like X-ray crystallography and cryo-electron microscopy. AI got them 90% of the way there—but humans had to check the final 10%.
🚀 The Future: AI + Scientists = Discovery Machines
The future of protein engineering is teamwork:
AI explores billions of possibilities and suggests the best ones.
Scientists test, refine, and validate those ideas in the lab.
Together, they’re designing new medicines, enzymes that eat plastic, proteins that capture carbon dioxide, and materials that can heal themselves—things that once sounded like science fiction.
The next generation of breakthroughs will come from this partnership: AI’s creativity + human curiosity.
