De Novo Protein Design: How the Pipeline Works in Practice

We run de novo binder design campaigns every week at Ranomics. The pipeline has matured to the point where generating thousands of structurally plausible binder candidates is routine. The hard part is no longer computational. The hard part is confirming that any of them actually bind.

This post describes how a modern de novo design campaign works end to end, from target structure through experimental screening, and where the real failure modes live.

The design layer: three generators running in parallel

A de novo campaign starts with a target structure and a defined binding surface. We specify hotspot residues on the target (the residues we want the designed binder to engage) and run three generators in parallel.

RFdiffusion generates protein backbones by reversing a learned noise process, producing novel scaffolds that are geometrically complementary to the specified hotspot residues. It does not start from any known protein. The outputs are backbone coordinate sets; no sequences yet.
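
For concreteness, here is a minimal sketch of an RFdiffusion binder run, wrapped in Python. The override flags follow the public RFdiffusion README; the input path, contig string, and hotspot residues are placeholders rather than values from a real campaign.

```python
# Hedged sketch of an RFdiffusion binder run; flags follow the public
# RFdiffusion README, paths and residue numbers are placeholders.
import subprocess

subprocess.run([
    "python", "scripts/run_inference.py",
    "inference.input_pdb=inputs/target.pdb",   # target structure
    "inference.output_prefix=outputs/binder",  # backbone PDBs land here
    "inference.num_designs=1000",              # backbones to sample
    # Keep target residues A1-150 fixed; sample a 70-100 residue binder:
    "contigmap.contigs=[A1-150/0 70-100]",
    # Hotspot residues on the target that the binder must engage:
    "ppi.hotspot_res=[A59,A83,A91]",
])
```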

BindCraft takes a different approach: it jointly optimizes backbone geometry and sequence in a single pass, using AlphaFold2 as an internal scoring function. BindCraft generates complete binder candidates (structure and sequence together) and applies its own filters for predicted binding affinity and structural confidence.
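
BindCraft is configured through a JSON settings file rather than command-line overrides. Here is a minimal sketch of a target-settings file, with key names following the example settings shipped in the public BindCraft repository and all values as placeholders (treat the exact keys as an assumption to verify against the repo):

```python
# Hedged sketch of a BindCraft target-settings file; key names follow
# the repo's example settings, values are placeholders.
import json

target_settings = {
    "design_path": "./runs/my_target/",      # output directory
    "binder_name": "my_binder",
    "starting_pdb": "./inputs/target.pdb",   # target structure
    "chains": "A",                           # target chain(s)
    "target_hotspot_residues": "59,83,91",   # interface residues to engage
    "lengths": [65, 120],                    # min/max binder length
    "number_of_final_designs": 100,          # stop once this many pass filters
}

with open("target_settings.json", "w") as fh:
    json.dump(target_settings, fh, indent=2)
```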

Boltzgen uses a generative flow-matching approach to sample from a Boltzmann distribution over protein conformations, producing diverse backbone geometries that satisfy the target interface constraints.

Running all three in parallel is deliberate. Each generator explores structure space differently. RFdiffusion excels at producing compact, high-confidence scaffolds. BindCraft tends to find solutions that score well on AlphaFold metrics but may occupy regions of topology space the others miss. Boltzgen adds conformational diversity that neither of the other two reliably captures. The union of their outputs gives broader coverage of viable binding modes than any single method alone.

Sequence design: ProteinMPNN for backbone-only generators

RFdiffusion and Boltzgen produce backbones without sequences. Those backbones need sequences that will fold into the intended geometry and present the correct interface residues. ProteinMPNN handles this step: given a backbone, it predicts amino acid sequences optimized for structural stability and foldability.

We typically generate 8 to 16 sequences per backbone, then filter on ProteinMPNN confidence scores before advancing candidates. BindCraft skips this step entirely because it designs sequence and structure jointly.
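
A sketch of that step using the public ProteinMPNN CLI; the backbone path and chain ID are placeholders, and the score parser assumes the FASTA header format the tool currently writes:

```python
# Hedged sketch: sample 16 sequences for one designed backbone with
# ProteinMPNN. Flags follow the public ProteinMPNN README.
import re
import subprocess

subprocess.run([
    "python", "protein_mpnn_run.py",
    "--pdb_path", "outputs/binder_0.pdb",  # backbone from the design step
    "--pdb_path_chains", "B",              # design only the binder chain
    "--out_folder", "mpnn_out",
    "--num_seq_per_target", "16",          # 8-16 sequences per backbone
    "--sampling_temp", "0.1",
    "--seed", "37",
])

# ProteinMPNN FASTA headers carry a per-sequence score (average negative
# log-likelihood; lower means higher model confidence), which is the
# number we filter on before advancing candidates.
def mpnn_score(fasta_header: str) -> float:
    return float(re.search(r"\bscore=([\d.]+)", fasta_header).group(1))
```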

Structural validation: filtering before synthesis

Before any candidate reaches a gene synthesis order, it passes through computational structural validation. We run designed sequences through Boltz-2, ESMFold, and ColabFold to predict their folded structures independently of the design model that generated them.
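
As one concrete leg of that re-prediction, ESMFold can be run through the fair-esm package's public API. A minimal sketch, assuming a single-chain candidate and placeholder file paths (note that `esmfold_v1` pulls in extra dependencies such as OpenFold):

```python
# Hedged sketch: re-predict a designed sequence with ESMFold via fair-esm.
import torch
import esm  # pip install fair-esm

model = esm.pretrained.esmfold_v1().eval().cuda()

# Read the designed sequence (placeholder path).
with open("designs/candidate_001.fasta") as fh:
    sequence = "".join(line.strip() for line in fh if not line.startswith(">"))

with torch.no_grad():
    pdb_str = model.infer_pdb(sequence)  # returns a PDB-format string

with open("predictions/candidate_001_esmfold.pdb", "w") as fh:
    fh.write(pdb_str)
```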

The key metric is whether the predicted structure matches the designed structure. If a sequence was designed to fold into a four-helix bundle that engages the target through a specific interface, the predicted structure should reproduce that geometry. We measure this as backbone RMSD between the design model and the predicted structure, and we check that the predicted interface contacts match the designed ones.
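
A minimal version of that RMSD check using Biopython's `Superimposer`; paths are placeholders, only Cα atoms are compared, and the design and prediction are assumed to have matching residue ordering:

```python
# Hedged sketch: backbone Ca RMSD between designed and re-predicted models.
from Bio.PDB import PDBParser, Superimposer

parser = PDBParser(QUIET=True)
design = parser.get_structure("design", "designs/candidate_001.pdb")
pred = parser.get_structure("pred", "predictions/candidate_001_esmfold.pdb")

# Pair Ca atoms by residue order (assumes equal length and numbering).
design_ca = [r["CA"] for r in design.get_residues() if "CA" in r]
pred_ca = [r["CA"] for r in pred.get_residues() if "CA" in r]

sup = Superimposer()
sup.set_atoms(design_ca, pred_ca)  # superpose prediction onto design
print(f"backbone Ca RMSD: {sup.rms:.2f} A")
```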

Candidates that fail structural validation (those where the predicted fold deviates significantly from the design) are discarded before synthesis. This step eliminates 60% to 80% of initial candidates depending on the target. It is the single most cost-effective filter in the pipeline.
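
The gate itself can be a pair of cutoffs over the two checks above. The thresholds below are illustrative assumptions for the sketch, not our production values:

```python
# Illustrative pass/fail gate; cutoffs are assumptions, not Ranomics'
# production thresholds.
def passes_validation(ca_rmsd: float, contact_recovery: float) -> bool:
    # ca_rmsd: backbone Ca RMSD (angstroms) between design and prediction
    # contact_recovery: fraction of designed interface contacts reproduced
    return ca_rmsd < 2.0 and contact_recovery > 0.8
```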

Experimental screening: where campaigns succeed or fail

The survivors of computational filtering are synthesized and screened experimentally. At Ranomics, this means yeast surface display as the primary screening platform, with mammalian display for candidates that require post-translational modifications or where yeast expression is problematic.

Yeast display lets us screen thousands of candidates in parallel against labeled target protein, sorting for binding by FACS. The throughput is high enough to test the full computationally filtered set in a single campaign. First-round hit rates for well-designed campaigns typically fall in the 5% to 15% range, meaning 5% to 15% of synthesized candidates show detectable binding to the target.

That number is the real metric of a de novo design campaign. Not how many backbones were generated, not how many passed computational filters, but how many confirmed binders came out of experimental screening.

What determines success vs. failure

After running dozens of these campaigns, the patterns are clear.

Target surface quality matters more than generator choice. Flat, featureless surfaces are harder to design binders against than concave pockets or surfaces with prominent loops. No amount of computational sampling compensates for a target that presents minimal geometric features for a binder to grip.

Hotspot selection is the highest leverage decision. The residues you designate as the binding interface on the target constrain everything downstream. Choosing hotspots that are solvent-accessible, structurally rigid, and chemically diverse produces better candidates than choosing hotspots based solely on biological relevance.
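
Solvent accessibility, at least, can be checked up front. Here is a sketch using Biopython's Shrake-Rupley implementation to rank residues on the target chain by exposed surface area (path and chain ID are placeholders; rigidity and chemical diversity need separate analyses):

```python
# Rank target-chain residues by solvent-accessible surface area as a
# first-pass sanity check on candidate hotspots.
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley

structure = PDBParser(QUIET=True).get_structure("target", "inputs/target.pdb")
ShrakeRupley().compute(structure, level="R")  # per-residue SASA

chain = structure[0]["A"]
residues = [r for r in chain if r.id[0] == " "]  # skip heteroatoms/waters
for r in sorted(residues, key=lambda r: r.sasa, reverse=True)[:10]:
    print(r.get_resname(), r.id[1], f"{r.sasa:.1f} A^2")
```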

Computational filtering must be aggressive. The temptation is to advance marginal candidates because synthesis is “only” a few hundred dollars per gene. In practice, every candidate that enters experimental screening consumes assay capacity. Tight structural validation cutoffs save weeks of wet-lab time.

The team running computation must also run the experiments. When design and validation are separated across organizations, the feedback loop breaks. If a campaign yields zero hits, understanding whether the failure was in hotspot selection, backbone sampling, sequence design, or expression requires access to both the computational parameters and the experimental data. We keep both under one roof for this reason.

The bottleneck is experimental, and that changes the economics

Five years ago, the bottleneck in protein design was computation: generating enough plausible candidates was slow and expensive. Today, a single cloud GPU run produces more candidates than any lab can screen in a quarter. The constraint has flipped.

This means the economic value in de novo design is not in running the algorithms. It is in the experimental infrastructure to validate what the algorithms produce, and in the expertise to close the loop between computational predictions and biochemical reality.

De novo protein design works. It produces binders for targets where no natural scaffold exists and no library would reach. The question is no longer whether the approach is viable. The question is whether your validation pipeline can keep pace with what the generators produce.
