AI-Driven Protein Design: An Honest CRO Perspective

AI-driven protein design covers a specific set of methods (diffusion models, hallucination-based scoring, and sequence design via inverse folding) applied to a specific problem: generating proteins with a predefined function from scratch. The field went from research curiosity to production tool in about three years. This is what currently works and what doesn’t, from a CRO that runs both the design and the wet-lab validation.

What “AI-driven” actually means in protein design

Three model families do most of the work in 2026:

Diffusion models (RFdiffusion, RFantibody) generate protein backbones by reversing a noising process. For binder design they are conditioned on target hotspot residues and output 3D backbone coordinates for plausible binders; the sequence is assigned afterwards by inverse folding. We’ve covered the operational details in RFdiffusion in Practice.
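To make hotspot conditioning concrete, here is a minimal sketch that assembles an RFdiffusion binder-design command. The override names (`contigmap.contigs`, `ppi.hotspot_res`, `inference.input_pdb`, `inference.output_prefix`, `inference.num_designs`) follow the binder-design example in the public RFdiffusion README as we understand it; check them against the version you have installed. The helper name, target file, residue range, and hotspot residues are hypothetical.

```python
import shlex
from typing import Sequence


def rfdiffusion_binder_cmd(
    target_pdb: str,
    target_chain: str,
    target_residues: tuple[int, int],
    binder_length: tuple[int, int],
    hotspots: Sequence[int],
    num_designs: int,
    output_prefix: str,
) -> list[str]:
    """Assemble a (hypothetical helper for an) RFdiffusion binder-design argv list."""
    start, end = target_residues
    lo, hi = binder_length
    # Target segment kept as fixed structure, '/0' chain break, then a binder
    # whose length is sampled between lo and hi residues.
    contig = f"[{target_chain}{start}-{end}/0 {lo}-{hi}]"
    # Hotspots: target residues the generated binder should contact.
    hotspot_res = "[" + ",".join(f"{target_chain}{r}" for r in hotspots) + "]"
    return [
        "./scripts/run_inference.py",
        f"contigmap.contigs={contig}",
        f"ppi.hotspot_res={hotspot_res}",
        f"inference.input_pdb={target_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
    ]


if __name__ == "__main__":
    # Hypothetical target file and hotspot residues, for illustration only.
    cmd = rfdiffusion_binder_cmd("target.pdb", "A", (1, 150), (70, 100),
                                 [59, 83, 91], 10, "out/binder")
    print(shlex.join(cmd))  # shell-quoted so the bracketed overrides survive
```

Returning an argv list rather than a single string keeps the bracketed, space-containing overrides intact when passed to `subprocess.run`.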

Hallucination + scoring pipelines (BindCraft) iteratively propose backbone-and-sequence pairs and score them against a target. They produce fewer designs per run than diffusion, but each design is more structurally plausible because the scoring prunes during generation rather than after.
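The difference between pruning during generation and filtering afterwards can be shown with a toy. This is not BindCraft’s algorithm (its own optimization runs through AlphaFold2 and is not reproduced here); `toy_score` is a hypothetical stand-in for a structure-based binding score. Both strategies get roughly the same number of scoring calls.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = "AILMFVW"
CHARGED = "DEKR"


def toy_score(seq: str) -> float:
    """HYPOTHETICAL stand-in for a structure-based binding score (0 to 1).

    Rewards hydrophobic residues at every third position and charged residues
    elsewhere. A real pipeline would score a predicted complex instead.
    """
    good = sum(
        (aa in HYDROPHOBIC) if i % 3 == 0 else (aa in CHARGED)
        for i, aa in enumerate(seq)
    )
    return good / len(seq)


def random_seq(length: int, rng: random.Random) -> str:
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def generate_then_filter(length: int, budget: int, seed: int = 0) -> float:
    """Score `budget` independent random sequences and keep the best afterwards."""
    rng = random.Random(seed)
    return max(toy_score(random_seq(length, rng)) for _ in range(budget))


def propose_and_prune(length: int, budget: int, seed: int = 0) -> float:
    """Spend the budget on single mutations, rejecting any that lower the score."""
    rng = random.Random(seed)
    seq = random_seq(length, rng)
    best = toy_score(seq)
    for _ in range(budget):
        pos = rng.randrange(length)
        cand = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        score = toy_score(cand)
        if score >= best:  # pruning happens here, during generation
            seq, best = cand, score
    return best


if __name__ == "__main__":
    print("generate then filter:", generate_then_filter(30, 2000))
    print("propose and prune:   ", propose_and_prune(30, 2000))
```

The toy landscape is separable, which flatters the greedy loop; real binding landscapes are rugged, which is why production tools use far more sophisticated optimizers. The qualitative point survives: scoring inside the loop concentrates effort on candidates that are already plausible.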

Inverse folding models (ProteinMPNN, ESM3) take a backbone and predict amino acid sequences likely to fold into it. They are the bridge between structure generation and synthesizable DNA.
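As a sketch of the inverse-folding step, the helper below builds a ProteinMPNN command that samples several sequences for one backbone. The flags (`--pdb_path`, `--pdb_path_chains`, `--out_folder`, `--num_seq_per_target`, `--sampling_temp`, `--seed`, `--batch_size`) follow the example scripts shipped with the ProteinMPNN repository as we understand them; confirm against your checkout. The helper name and file paths are hypothetical, and which chain holds the binder depends on the upstream tool.

```python
import shlex


def proteinmpnn_cmd(
    pdb_path: str,
    design_chains: str,
    out_folder: str,
    num_seq_per_target: int = 5,
    sampling_temp: float = 0.1,
    seed: int = 37,
) -> list[str]:
    """Hypothetical helper: ProteinMPNN argv that samples sequences for one backbone."""
    return [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        # Only these chains are redesigned; other chains (e.g. the target)
        # are read as fixed structural context.
        "--pdb_path_chains", design_chains,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seq_per_target),
        # Higher sampling temperature gives more diverse sequences.
        "--sampling_temp", str(sampling_temp),
        "--seed", str(seed),
        "--batch_size", "1",
    ]


if __name__ == "__main__":
    # Hypothetical backbone file; chain "A" as the binder is an assumption.
    print(shlex.join(proteinmpnn_cmd("binder_0.pdb", "A", "mpnn_out")))
```

The default of 5 sequences per backbone matches the campaign described in the next paragraph.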

A typical campaign chains them: RFdiffusion generates 50,000 backbones → ProteinMPNN designs 5 sequences per backbone → AlphaFold2 or Boltz-2 re-predicts each design’s structure to confirm it folds as intended → developability filters cut the list to ~500 → wet-lab synthesis. The whole pipeline runs in 24-72 hours on H100 infrastructure.
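Taking the numbers in that chain at face value, the funnel arithmetic is worth spelling out: 50,000 × 5 = 250,000 sequences each need a structure prediction, and ~500 survivors means roughly 1 in 500 reaches synthesis. A minimal sketch (the `Funnel` class is illustrative, not part of any tool):

```python
from dataclasses import dataclass


@dataclass
class Funnel:
    n_backbones: int        # RFdiffusion backbones generated
    seqs_per_backbone: int  # ProteinMPNN sequences per backbone
    n_to_synthesis: int     # designs left after structure and developability filters

    @property
    def n_sequences(self) -> int:
        """Sequences that each need a structure prediction."""
        return self.n_backbones * self.seqs_per_backbone

    @property
    def overall_pass_rate(self) -> float:
        """Fraction of designed sequences that reach the wet lab."""
        return self.n_to_synthesis / self.n_sequences


if __name__ == "__main__":
    funnel = Funnel(n_backbones=50_000, seqs_per_backbone=5, n_to_synthesis=500)
    print(funnel.n_sequences)                 # 250000
    print(f"{funnel.overall_pass_rate:.1%}")  # 0.2%
```

The structure-prediction stage, not backbone generation, dominates the compute in this funnel, since it runs once per sequence rather than once per backbone.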

What works in 2026

Hotspot-conditioned binder generation against well-defined targets. When the target has a clear epitope (an extracellular receptor domain, an enzyme active site, a viral protein interface), RFdiffusion and BindCraft consistently produce binders that pass computational triage. We’ve designed binders against PD-L1, TIGIT, CD8α, CD3ε, and several membrane-protein extracellular domains; the design step is no longer the bottleneck.
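Computational triage typically means keeping designs that the structure predictor is confident in and that refold as designed. A minimal sketch, assuming per-design metrics have already been extracted from the predictor output; the default cut-offs (pLDDT > 80, interface PAE < 10 Å, binder RMSD < 1 Å) are values commonly quoted for binder triage and should be treated as assumptions to recalibrate per target.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    design_id: str
    plddt: float            # mean binder pLDDT, 0-100
    pae_interaction: float  # mean inter-chain predicted aligned error, in Å
    binder_rmsd: float      # designed vs re-predicted binder backbone, in Å


def passes_triage(p: Prediction, min_plddt: float = 80.0,
                  max_pae: float = 10.0, max_rmsd: float = 1.0) -> bool:
    """Keep confident predictions with a confident interface that refold as designed."""
    return (p.plddt > min_plddt
            and p.pae_interaction < max_pae
            and p.binder_rmsd < max_rmsd)


if __name__ == "__main__":
    # Hypothetical metric values for illustration.
    preds = [
        Prediction("d1", 88.0, 6.5, 0.7),
        Prediction("d2", 91.0, 14.0, 0.5),  # confident fold, weak interface
        Prediction("d3", 72.0, 8.0, 0.9),   # low overall confidence
    ]
    print([p.design_id for p in preds if passes_triage(p)])
```

Passing this filter says nothing about expression or aggregation, which is why developability filtering is a separate step.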

De novo scaffolds with custom topologies. AI design produces folds that don’t exist in the PDB, including helical bundles tuned to specific surface complementarity. This was effectively impossible with template-based methods.

Sequence diversity at fixed structure. Inverse folding lets you generate dozens of sequences that fold to the same backbone. This is useful for both campaign-level diversity and downstream developability triage.
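One simple way to quantify and enforce that diversity is pairwise sequence identity. Because sequences designed on the same backbone share its length, positions can be compared directly without alignment. A minimal sketch; the 80% identity cap and the short example sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two equal-length sequences (no alignment needed)."""
    if len(a) != len(b):
        raise ValueError("expected sequences designed on the same backbone")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average identity over all pairs; lower means a more diverse set."""
    return mean(percent_identity(a, b) for a, b in combinations(seqs, 2))


def diverse_subset(seqs: list[str], max_identity: float) -> list[str]:
    """Greedily keep sequences below max_identity to every sequence already kept."""
    kept: list[str] = []
    for s in seqs:
        if all(percent_identity(s, k) < max_identity for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    # Hypothetical short sequences; real binders are much longer.
    designs = ["MKELLK", "MKELIK", "AKDLVR"]
    print(mean_pairwise_identity(designs))
    print(diverse_subset(designs, max_identity=80.0))
```

The greedy pass is order-dependent; sorting candidates by a triage score first means the best-scoring member of each near-duplicate cluster is the one kept.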

What doesn’t (yet) work

Flat targets without obvious hotspots. When the target’s surface lacks a clear interaction patch (transcription factors, intrinsically disordered regions, non-classical epitopes), AI design hits the same wall structural biologists hit decades ago. The diffusion model has to be told where to bind; if you don’t know, the model doesn’t either.

Membrane proteins beyond the extracellular domain. Diffusion models trained on soluble protein structures struggle when the binder needs to engage a transmembrane region or a lipid-embedded epitope. Conditioning on membrane-context priors has shown some progress, but production-grade results still require careful target preparation that depends on structural-biology judgment.

Wet-lab hit rates without aggressive filtering. The headline numbers (“90% of designs bind!”) in academic papers reflect carefully curated test sets. Real-world hit rates from raw RFdiffusion output, before developability filtering, fall well below the published headlines. The model doesn’t filter for what wet labs care about (expression, solubility, aggregation), so RFdiffusion outputs need a separate developability check; that filtering step closes much of the gap.
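To show what a sequence-level developability check can look like, here is a minimal sketch that flags a few common liabilities: high hydrophobic content, long hydrophobic stretches, an odd number of cysteines, and N-linked glycosylation sequons (N-X-S/T, X ≠ P). The rules and thresholds are illustrative assumptions, not any vendor’s actual filter; production checks usually add predicted aggregation, solubility, and expression models.

```python
import re
from dataclasses import dataclass

HYDROPHOBIC = set("AILMFVW")
# N-linked glycosylation sequon N-X-S/T with X != P; the lookahead also
# counts overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


@dataclass
class DevelopabilityReport:
    hydrophobic_fraction: float
    longest_hydrophobic_run: int
    n_cysteines: int
    n_sequons: int

    def passes(self, max_hydrophobic: float = 0.40, max_run: int = 5) -> bool:
        # HYPOTHETICAL thresholds for illustration; calibrate per scaffold
        # and expression system.
        return (
            self.hydrophobic_fraction <= max_hydrophobic
            and self.longest_hydrophobic_run <= max_run
            and self.n_cysteines % 2 == 0  # an odd count guarantees an unpaired cysteine
            and self.n_sequons == 0  # glycosylation matters in yeast and mammalian expression
        )


def assess(seq: str) -> DevelopabilityReport:
    """Compute simple sequence-level liability flags for one design."""
    run = longest = 0
    for aa in seq:
        run = run + 1 if aa in HYDROPHOBIC else 0
        longest = max(longest, run)
    return DevelopabilityReport(
        hydrophobic_fraction=sum(aa in HYDROPHOBIC for aa in seq) / len(seq),
        longest_hydrophobic_run=longest,
        n_cysteines=seq.count("C"),
        n_sequons=len(SEQUON.findall(seq)),
    )


if __name__ == "__main__":
    # Hypothetical designs: the second has a sequon and a 7-residue hydrophobic run.
    designs = ["MKELLKEAEKRLKEG", "MNGSLLIVFAWK"]
    print([s for s in designs if assess(s).passes()])
```

Returning a report object rather than a bare boolean keeps the reason for each rejection, which is the kind of signal a closed design loop can use.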

The validation gap

This is where AI-driven protein design becomes interesting commercially. Software-only AI-design companies generate designs and ship them to customers; the customer (or a CRO) then runs the experimental validation. The end-to-end hit rate is the joint product of design quality and validation strategy, so a weakness on either side caps the result. When the design pipeline and the screen are not co-designed, their failure modes compound: designs the pipeline scores highly can be exactly the ones the screen cannot express or detect.

A concrete example: a diffusion model that produces structurally plausible but aggregation-prone binders looks great on AlphaFold2 confidence metrics such as pLDDT. The same designs fail to express in yeast or mammalian display. If the design loop doesn’t get the wet-lab feedback, it doesn’t learn, and the next campaign repeats the same failure mode.

Closed-loop campaigns — where design proposes, screen rejects, and design re-proposes against the screen’s feedback — produce meaningfully higher hit rates than open-loop campaigns at comparable cost. This is the operational case for integrating AI design with experimental validation rather than treating them as separate vendor relationships.
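The mechanism behind the closed-loop advantage can be sketched as a toy simulation. Every number here is hypothetical (a 30% chance a design binds, 50% of raw designs aggregation-prone), and the feedback step is a deliberately crude stand-in for retraining, reweighting, or adding a filter; it illustrates the shape of the loop, not any real campaign.

```python
import random
from dataclasses import dataclass


@dataclass
class Design:
    binds: bool
    aggregation_prone: bool  # HYPOTHETICAL failure mode that in-silico scores miss


def design_round(n: int, p_aggregation: float, rng: random.Random) -> list[Design]:
    """Toy design step: binding and aggregation are drawn independently."""
    return [Design(binds=rng.random() < 0.3,
                   aggregation_prone=rng.random() < p_aggregation)
            for _ in range(n)]


def screen(designs: list[Design]) -> tuple[float, float]:
    """Toy screen: aggregation-prone designs fail to express, so never count as hits.

    Returns (hit_rate, expression_failure_rate).
    """
    n = len(designs)
    hits = sum(d.binds and not d.aggregation_prone for d in designs)
    failed = sum(d.aggregation_prone for d in designs)
    return hits / n, failed / n


def run_campaign(rounds: int, closed_loop: bool, n: int = 1000, seed: int = 0) -> list[float]:
    """Run several design-then-screen rounds and return each round's hit rate."""
    rng = random.Random(seed)
    p_aggregation = 0.5  # HYPOTHETICAL share of aggregation-prone raw designs
    hit_rates = []
    for _ in range(rounds):
        hit_rate, failure_rate = screen(design_round(n, p_aggregation, rng))
        hit_rates.append(hit_rate)
        if closed_loop:
            # Toy feedback: steer the next round away from the observed failure mode.
            p_aggregation *= 1 - failure_rate
    return hit_rates


if __name__ == "__main__":
    print("open loop:  ", run_campaign(3, closed_loop=False))
    print("closed loop:", run_campaign(3, closed_loop=True))
```

Both campaigns screen the same number of designs per round, so the cost is comparable; only the closed loop uses the screen’s failures to change what gets proposed next, while the open loop keeps repeating the round-one failure mode.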

Practical recommendations for buyers

If you’re scoping an AI-driven protein design campaign:

Ask the vendor for their wet-lab hit rate on a comparable target. Not the published rate from the original RFdiffusion paper. Their rate, on their last campaign.
Ask whether developability filters are applied before delivery. A list of 1,000 designs without filters is much less useful than 100 filtered designs.
Decide who owns the validation. If the AI vendor doesn’t run wet-lab validation, you’ll need to coordinate with a separate display/screening provider. The data-flow latency between vendors is the silent killer of these campaigns.
Budget for at least two design rounds. Single-shot AI design rarely produces leads good enough to skip iteration.

The integrated approach

Ranomics designs binders with RFdiffusion, BindCraft, and BoltzGen, validates them with yeast and mammalian display, and feeds the experimental results back into the design model on the same campaign. The closed loop runs in 6-8 weeks for the AI Binder Sprint and longer for Custom Campaigns. The hit rates we publish are end-to-end: design through validated binder, no curation.

That doesn’t make AI-driven protein design easy. It makes the difficulty visible, which is the part that matters for project planning.