
Closing the Loop: How AI Protein Design and Display Screening Work as a Single System

Most protein engineering programs draw a line between computation and experiment. One team runs RFdiffusion or BindCraft, hands off a set of designs, and waits. Another team clones, transforms, and screens. The two groups may not speak again until results come back weeks later.

This is how most campaigns are structured. It is also why many of them underperform.

The programs that consistently produce high-affinity, developable binders do not treat design and screening as sequential steps. They treat them as a single coupled system, where the output of each stage directly shapes the input of the next. The distinction matters because the failure modes at the interface between computation and experiment are different from the failure modes within either discipline alone.

What Computational Design Actually Produces

Tools like RFdiffusion, BindCraft, and Boltzgen generate protein backbones and sequences optimized against structural and energetic objectives. RFdiffusion produces backbone geometries conditioned on a target interface. BindCraft refines these into full sequence designs with binding energy optimization. Boltzgen samples conformational diversity across the design landscape.

What these tools do not produce is experimental validation. A design with a predicted Rosetta interface energy (dG_separated) of -15 REU and a high pLDDT score is a hypothesis. It is a well-informed hypothesis, but it remains untested until it is expressed, displayed on a cell surface, and sorted against its target.

The practical implication: computational metrics like Rosetta binding energy, AlphaFold confidence, and shape complementarity are necessary filters, but they are not sufficient predictors of experimental success. We routinely see designs that score well computationally but fail to express, fail to display, or fail to bind in a yeast display assay. The reverse also happens: designs that rank modestly in silico perform well experimentally.
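
As a concrete illustration, here is a minimal sketch of what such a first-pass gate might look like. The metric names mirror those above, but the thresholds are illustrative assumptions, not validated cutoffs: passing this filter makes a design worth screening, not a confirmed binder.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    dg_separated: float  # Rosetta interface energy, REU (more negative is better)
    plddt: float         # AlphaFold confidence, 0-100
    sc: float            # shape complementarity, 0-1

def passes_in_silico_gate(d: Design) -> bool:
    """A necessary filter, not a sufficient one: failing designs are dropped,
    but a passing design is still only a hypothesis."""
    return (
        d.dg_separated <= -10.0  # assumed cutoff
        and d.plddt >= 80.0      # assumed cutoff
        and d.sc >= 0.6          # assumed cutoff
    )

designs = [Design("binder_001", -15.2, 88.4, 0.67),
           Design("binder_002", -7.9, 91.0, 0.58)]
screen_set = [d for d in designs if passes_in_silico_gate(d)]
```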

This is not a failure of the computational tools. It is a reflection of the gap between the objectives these tools optimize for and the full set of biophysical properties that determine experimental success.

Three Ways to Lose Value Between Design and Screening

Designs that do not express. Computational tools optimize for structure and binding, not for translational efficiency, folding kinetics in the yeast secretory pathway, or compatibility with the Aga2p fusion context. A design can have a perfect predicted structure and still misfold when expressed as a surface display construct.

Libraries that are too narrow. Teams often select the top 50 or 100 designs by computational score and screen only those. This approach treats the computational pipeline as a precision tool when it is better understood as a diversity generator. The ranking function is noisy. Screening a narrow, top-ranked set discards designs that would have been hits.

No feedback path. The most common failure is organizational: the team screens one batch of designs, identifies hits, and moves to affinity maturation without feeding experimental outcomes back into the design pipeline. The computational model never learns which of its predictions were right.

Designing Libraries for Screening, Not for Rankings

Generate more designs than you plan to screen and filter aggressively on expressibility proxies, not just binding metrics. Predicted solubility, aggregation propensity, and the presence of unpaired cysteines or N-linked glycosylation sites in the display context are all worth filtering on before cloning.
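
A minimal sketch of what these sequence-level pre-filters can look like, assuming designs arrive as plain amino acid strings. The rules are crude proxies, not predictors: an odd cysteine count as a stand-in for an unpaired cysteine, and the N-X-[S/T] sequon (X not P) for a potential N-linked glycosylation site.

```python
import re

# N-linked glycosylation sequon: N, any residue except P, then S or T.
SEQUON = re.compile(r"N[^P][ST]")

def expressibility_flags(seq: str) -> dict:
    """Crude, illustrative pre-cloning checks on a designed sequence."""
    return {
        "odd_cysteine_count": seq.count("C") % 2 == 1,  # possible unpaired Cys
        "n_glyc_sequons": [m.start() for m in SEQUON.finditer(seq)],
    }

print(expressibility_flags("MKNVSACDEC"))  # hypothetical sequence
# {'odd_cysteine_count': False, 'n_glyc_sequons': [2]}
```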

Maintain structural diversity in the screened set. If RFdiffusion produces designs with three distinct backbone topologies for the same epitope, carry all three into screening even if one topology scores lower on average. The computational energy function is not accurate enough to reliably distinguish between topologies. The screening assay is.
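
One way to implement this, as a sketch: group designs by a backbone-topology label (assumed here to come from clustering the design outputs) and reserve screening slots per topology instead of taking a global top-k. The dictionary keys are assumptions about how the designs are annotated.

```python
from collections import defaultdict

def stratified_pick(designs: list[dict], per_topology: int) -> list[dict]:
    """Select the best designs within each topology rather than globally,
    so a noisy ranking function cannot silently drop a whole topology."""
    by_topo = defaultdict(list)
    for d in designs:
        by_topo[d["topology"]].append(d)
    picked = []
    for group in by_topo.values():
        group.sort(key=lambda d: d["energy"])  # more negative is better
        picked.extend(group[:per_topology])
    return picked

# library = stratified_pick(designs, per_topology=800)  # 3 topologies -> ~2,400 clones
```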

Include internal controls. Designs with known binding properties, non-binding scaffolds, and expression-positive but target-negative controls allow you to calibrate the assay and distinguish real signal from display-level artifacts. This is standard practice in library screening but is often skipped when screening computationally designed sets because the batch sizes are smaller.

The Feedback Loop That Changes Everything

The most valuable data in any design campaign is the first round of experimental results. Not because the first hits are the final product, but because the experimental outcomes recalibrate the entire computational pipeline.

Consider what a single round of yeast display screening tells you. FACS sorting against the target separates binders from non-binders. Display-level measurements (anti-tag staining) separate designs that express and fold from those that do not. NGS on the sorted populations quantifies enrichment ratios for every design in the library.

This is a rich, multivariate dataset. It tells you which backbone topologies actually produce binders. Which sequence features correlate with expression. Which predicted metrics were informative and which were noise. This is exactly the information the computational tools need to improve.
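
As one concrete slice of that dataset, here is a minimal sketch of the per-design enrichment computation from NGS read counts, assuming count dictionaries for the pre-sort (input) and post-sort populations. The pseudocount value is an assumption to guard against zero counts.

```python
import math

def log2_enrichment(input_counts: dict, sorted_counts: dict, pseudo: float = 0.5) -> dict:
    """Per-design log2 enrichment of post-sort frequency over input frequency."""
    n_in = sum(input_counts.values())
    n_out = sum(sorted_counts.values())
    return {
        design: math.log2(
            ((sorted_counts.get(design, 0) + pseudo) / (n_out + pseudo))
            / ((count + pseudo) / (n_in + pseudo))
        )
        for design, count in input_counts.items()
    }

print(log2_enrichment({"a": 100, "b": 100}, {"a": 180, "b": 20}))
# 'a' enriched (positive), 'b' depleted (negative)
```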

Teams that feed this data back into the design pipeline before running a second round of design see measurably better outcomes. The second batch of designs is not just iterating on hits from round one. It is generated by a pipeline that has been recalibrated against experimental reality.
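
One simple way to do that recalibration, sketched here under the assumption that scikit-learn is available: fit a classifier from the round-1 computational metrics to the observed enrichment calls, then rank round-2 candidates by predicted hit probability instead of any single raw score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: one row per round-1 design, columns = computational metrics
#    (e.g. dG_separated, pLDDT, shape complementarity)
# y: 1 if the design enriched in the sort, 0 otherwise
def recalibrate(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model

# Rank the next design batch by learned probability of success:
# p_hit = recalibrate(X_round1, y_round1).predict_proba(X_round2)[:, 1]
```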

At Ranomics, this is the default workflow. Computational design and yeast display screening run as alternating cycles, not as a linear handoff. The experimental data from each round directly informs the design parameters, filtering criteria, and diversity strategy of the next.

What Integrated Campaigns Look Like in Practice

A typical coupled workflow for a binder design campaign (a code sketch of the full loop follows these steps):


Round 0: Epitope selection. Identify epitopes on your target: Epitope Scout scores and ranks surface patches on any PDB structure, free to use.


Round 1: Broad exploration. Generate 5,000 to 10,000 designs across multiple backbone topologies, targeting multiple hotspots on the antigen surface using RFdiffusion, BindCraft, and Boltzgen in parallel. Each tool explores different regions of design space: RFdiffusion for backbone diversity, BindCraft for direct binding energy optimization, Boltzgen for conformational sampling. Filter to 2,000 to 4,000 on expression proxies and structural diversity. Clone as a pooled library. Screen by yeast display with FACS. Sequence enriched and depleted populations by NGS.

Analysis: Recalibrate. Identify which computational metrics predicted experimental outcomes. Retrain or re-weight the filtering criteria. Identify the top-performing backbone topologies. Flag sequence features associated with expression failure.

Round 2: Focused refinement. Generate new designs using the validated topologies. Apply the recalibrated filters. Include point variants of round 1 hits for affinity maturation. Screen again with tighter sorting gates.

Round 3 (if needed): Optimization. Narrow to the best 10 to 20 candidates. Characterize individually: SPR kinetics, thermal stability, cross-reactivity. If the program moves to mammalian display for final filtering, feed the developability data back into the design pipeline as well.
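
Putting the rounds together, a skeleton of the alternating cycle. Every function here is a hypothetical placeholder, stubbed out to keep the sketch self-contained; none of these are real tool or Ranomics APIs. The point is the shape of the loop: each round's screening data recalibrates the next round's design and filtering.

```python
def generate_designs(target, n, model=None):
    return []  # stub: RFdiffusion / BindCraft / Boltzgen, filtered with `model` if present

def screen_by_yeast_display(library):
    return {}  # stub: pooled cloning, FACS sorting, NGS enrichment per design

def recalibrate_from(library, results):
    return object()  # stub: re-weight filters on observed outcomes (as sketched above)

def run_campaign(target, n_rounds=3):
    model = None  # no experimental prior before round 1
    results = {}
    for _ in range(n_rounds):
        library = generate_designs(target, n=10_000, model=model)
        results = screen_by_yeast_display(library)
        model = recalibrate_from(library, results)  # feedback into the next round
    return results

run_campaign(target="antigen.pdb")
```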

The total timeline is often comparable to a traditional directed evolution campaign, but the quality of the final candidates is higher because every round is informed by both computational prediction and experimental measurement.

The Bottleneck Has Moved

Two years ago, the bottleneck in protein design was generating plausible structures. Tools like RFdiffusion and BindCraft have largely solved that problem. The computational side can now produce thousands of diverse, structurally plausible designs in hours.

The bottleneck is now at the interface: how efficiently you convert computational diversity into experimental data, and how effectively you feed experimental results back into the design pipeline. Teams that treat this interface as an engineering problem, not an administrative handoff, are the ones producing the best molecules.

The tools exist on both sides. The gap is in the workflow that connects them.

Start a project with Ranomics

  • Compute-to-clone: AI design coupled to yeast/mammalian display validation in a single program.
  • AI Binder Sprint: Integrated design + display campaigns in 6–8 weeks.
