RFdiffusion in Practice: What Works and What Fails

Most explanations of RFdiffusion describe how diffusion models work in general: add noise to protein structures, train a model to reverse the process, condition on a target surface. That is correct but not operationally useful. If you are planning or evaluating an RFdiffusion campaign, what you need to know is where the method performs well, where it struggles, and which parameter choices actually affect outcomes.

This post covers what we have learned running RFdiffusion campaigns at Ranomics across dozens of targets.

Scaffold topology bias: what the model generates and what it avoids

RFdiffusion was trained on structures from the Protein Data Bank. This means the model has strong priors toward topologies that are well-represented in the PDB: compact globular folds, alpha-helical bundles, mixed alpha/beta structures, and immunoglobulin-like domains.

In practice, this creates predictable patterns in what the model generates. For hotspot-conditioned binder design, the most common output topologies are three-helix bundles and four-helix bundles with short connecting loops. These are compact, thermodynamically stable folds that present well-defined binding surfaces, and they work well for many targets.

The limitation appears when the target geometry demands something different. If the epitope sits in a narrow groove, the model needs to generate an extended loop or beta-hairpin to reach it. RFdiffusion can produce these topologies, but at lower frequency and with lower structural confidence scores. You will need to increase the number of designs sampled (from 10,000 to 50,000 or more) to find enough non-helical solutions in the output pool.

Membrane protein targets present a specific challenge. RFdiffusion has no explicit membrane model. If the target epitope lies in a membrane-proximal region, generated scaffolds may clash with the lipid bilayer in a physiological context even though they look structurally valid in isolation. Filtering against a membrane plane model after generation is necessary for these targets.
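A minimal sketch of such a post-generation membrane filter is shown below. It assumes the complex has already been oriented so that the membrane normal lies along z and the bilayer surface sits at a known z value, with the extracellular side at higher z. The Atom class, the function names, and the zero-clash default are hypothetical and are not part of RFdiffusion.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    chain: str
    z: float  # coordinate along the assumed membrane normal, in angstroms


def count_membrane_clashes(atoms, binder_chain, membrane_z, tolerance=0.0):
    """Count binder atoms that sit below the assumed membrane surface."""
    return sum(
        1
        for atom in atoms
        if atom.chain == binder_chain and atom.z < membrane_z - tolerance
    )


def passes_membrane_filter(atoms, binder_chain, membrane_z, max_clashes=0):
    """Keep a design only if at most max_clashes binder atoms enter the bilayer."""
    return count_membrane_clashes(atoms, binder_chain, membrane_z) <= max_clashes
```

In a real campaign the atoms would come from the designed complex after superposition onto a membrane-oriented copy of the target. A flat plane is the simplest possible stand-in for the bilayer.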

Hotspot conditioning: the highest-leverage decision in any campaign

The residues you specify as hotspots constrain the entire downstream output. Getting this wrong is the most common reason campaigns fail to produce validated binders.

Too few hotspots (1 to 2 residues) give the model too much freedom. Generated scaffolds contact the specified residues, but the rest of the interface is unconstrained, leading to weak and geometrically variable binding modes. Hit rates after experimental screening are consistently low.

Too many hotspots (more than 8 to 10 residues) overconstrain the problem. The model struggles to find scaffold geometries that simultaneously contact all specified positions, and output diversity collapses. You end up with many near-identical designs that all fail or all succeed together.

The productive range is 3 to 8 hotspot residues for most targets. Within this range, the choice of which residues to specify matters more than the count. Prioritize residues that are solvent-accessible, structurally rigid (low B-factor or high pLDDT), and chemically diverse (mix of hydrophobic and polar contacts). Avoid specifying hotspots on flexible loops or disordered regions, as the model will generate scaffolds that contact the modeled position of those residues, which may not reflect their actual conformation.

One pattern we see repeatedly: researchers specify hotspots based solely on biological importance (e.g., residues known to be critical for receptor-ligand binding) without checking structural accessibility. A residue can be biologically critical and geometrically buried. RFdiffusion will attempt to reach it, producing scaffolds with strained geometries that fail during structural validation.
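The selection criteria in this section (accessibility, rigidity, chemical diversity, and a count of 3 to 8) can be sketched as a pre-flight check. This is an illustration under stated assumptions: the SurfaceResidue class, the relative-SASA and B-factor cutoffs, and the hydrophobic residue set are hypothetical choices, and the per-residue values would have to come from your own structure analysis. The final helper formats the selection as a ppi.hotspot_res override in the style used by the public RFdiffusion inference script.

```python
from dataclasses import dataclass


@dataclass
class SurfaceResidue:
    chain: str
    number: int
    name: str        # three-letter residue code
    rel_sasa: float  # relative solvent accessibility, 0.0 to 1.0
    b_factor: float  # crystallographic B-factor


# Illustrative hydrophobic set; everything else is treated as polar here.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}


def filter_candidates(residues, min_rel_sasa=0.25, max_b_factor=40.0):
    """Drop buried or flexible residues (cutoffs are assumed, not canonical)."""
    return [
        r for r in residues
        if r.rel_sasa >= min_rel_sasa and r.b_factor <= max_b_factor
    ]


def hotspot_warnings(hotspots):
    """Return human-readable warnings for a proposed hotspot set."""
    warnings = []
    if not 3 <= len(hotspots) <= 8:
        warnings.append(f"{len(hotspots)} hotspots is outside the 3 to 8 range")
    kinds = {r.name in HYDROPHOBIC for r in hotspots}
    if len(kinds) < 2:
        warnings.append("hotspots are not a mix of hydrophobic and polar residues")
    return warnings


def hotspot_override(hotspots):
    """Format hotspots as a command-line override, e.g. ppi.hotspot_res=[A59,A83]."""
    labels = ",".join(f"{r.chain}{r.number}" for r in hotspots)
    return f"ppi.hotspot_res=[{labels}]"
```

Running the accessibility filter before the count check means a biologically important but buried residue is dropped early instead of producing strained scaffolds later.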

Partial diffusion: when to start from an existing scaffold

Standard RFdiffusion starts from pure noise and generates backbones from scratch. Partial diffusion starts from an existing structure, adds a controlled amount of noise, and then runs the reverse diffusion process conditioned on a new target.

This is useful in two specific scenarios.

Scaffold grafting. You have a validated binder scaffold against one target and want to adapt it to a related target with a different epitope. Partial diffusion preserves the global fold topology while allowing the interface region to be rebuilt. The noise level controls how much of the original structure is retained. Low noise (10 to 20% of full diffusion steps) makes minor adjustments; high noise (60 to 80%) essentially redesigns the scaffold while keeping a topological bias toward the original.

Topology steering. When you want the output to have a specific fold class (e.g., beta-sheet scaffold for a target that requires a flat binding surface), you can seed with an exemplar structure of that topology. This biases the output distribution toward the desired fold type without rigidly constraining it.

Partial diffusion does not guarantee the output will retain the input fold. At high noise levels, the model may diverge entirely from the seed structure. Always verify that the output topology matches your intent before advancing candidates.
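The noise percentages quoted for scaffold grafting can be translated into a step count with a small helper. This sketch assumes a full schedule of 50 steps, which is the default in the public RFdiffusion configuration as far as we know; check your own config before relying on it. The function names are hypothetical. The diffuser.partial_T override follows the public RFdiffusion inference script.

```python
def partial_steps(noise_fraction, total_steps=50):
    """Convert a noise fraction (0 to 1) into a number of partial diffusion steps."""
    if not 0.0 < noise_fraction <= 1.0:
        raise ValueError("noise_fraction must be in (0, 1]")
    # Always noise for at least one step.
    return max(1, round(noise_fraction * total_steps))


def partial_diffusion_override(noise_fraction, total_steps=50):
    """Format the step count as a command-line override string."""
    return f"diffuser.partial_T={partial_steps(noise_fraction, total_steps)}"
```

Under the 50-step assumption, the low-noise range of 10 to 20% corresponds to 5 to 10 steps and the high-noise range of 60 to 80% to 30 to 40 steps.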

Target-dependent failure modes

Flat, featureless surfaces. Some targets (certain cytokine receptors, viral capsid proteins, designed repeat proteins) present large, flat or convex surfaces with minimal topographic features. RFdiffusion generates scaffolds that sit on these surfaces but lack the geometric complementarity that drives high-affinity binding. These campaigns require larger sampling (30,000 to 50,000 designs minimum) and aggressive post-design filtering, and they still tend to produce lower hit rates than campaigns against concave or grooved epitopes.

Glycosylated surfaces. RFdiffusion does not model glycans. If the target surface near the epitope is glycosylated in vivo, generated scaffolds may clash with glycan chains that are not represented in the input structure. Check the target for known glycosylation sites and exclude or flag epitopes that are partially occluded by glycans.
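When no curated glycosylation annotation is at hand, a first-pass check is to scan the target sequence for N-linked sequons (N-X-S/T, where X is not proline) and flag hotspots that fall near them. The sketch below is a rough screen only: a sequon is not proof of glycosylation, proximity in sequence is a crude proxy for proximity in space, and the function names and the five-residue window are hypothetical.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, first_residue_number=1):
    """Return residue numbers of Asn in N-X-S/T motifs (X is not Pro)."""
    return [
        match.start() + first_residue_number
        for match in SEQUON.finditer(sequence.upper())
    ]


def hotspots_near_sequons(hotspot_numbers, sequon_numbers, window=5):
    """Flag hotspots within `window` residues (in sequence) of a sequon Asn."""
    return [
        h for h in hotspot_numbers
        if any(abs(h - s) <= window for s in sequon_numbers)
    ]
```

Any flagged hotspot is a prompt to inspect the structure and experimental glycosylation data, not an automatic exclusion.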

Very large interfaces. Designing scaffolds that span more than approximately 1,500 square angstroms of buried surface area pushes the model toward large, multi-domain architectures that are harder to express and fold. For targets requiring extensive interfaces, consider splitting the design into smaller binders that engage adjacent but non-overlapping epitopes.

Avoiding redundant candidate pools

A subtle failure mode is generating thousands of designs that look structurally diverse but converge to a small number of unique sequences after ProteinMPNN. This happens when multiple distinct backbones present similar local geometries at the interface: ProteinMPNN assigns similar amino acid identities at the contact positions, and the resulting sequences cluster tightly.

Monitor sequence identity across the candidate pool before committing to synthesis. If more than 30 to 40% of candidates share greater than 80% sequence identity, the effective diversity of the pool is lower than the structural diversity suggests. In this case, increase the scaffold length range to force more topological variation, reduce the number of hotspot constraints, or run ProteinMPNN at higher sampling temperatures (0.3 to 0.5), which increases sequence diversity at the cost of average predicted stability.
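A minimal sketch of this redundancy check follows. It reads the rule as "the fraction of candidates that have at least one other candidate above 80% identity", which is one possible interpretation. To stay self-contained it compares only equal-length sequences position by position and treats sequences of different length as unrelated; a production check would use a proper alignment or clustering tool. The function names are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions; 0.0 if lengths differ (no alignment done)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def redundant_fraction(sequences, identity_cutoff=0.8):
    """Fraction of sequences with at least one neighbor above the identity cutoff."""
    n = len(sequences)
    if n == 0:
        return 0.0
    redundant = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if identity(sequences[i], sequences[j]) > identity_cutoff:
                redundant[i] = redundant[j] = True
    return sum(redundant) / n


def pool_is_redundant(sequences, max_fraction=0.3, identity_cutoff=0.8):
    """Apply the 30% (lower bound of 30 to 40%) threshold from the text."""
    return redundant_fraction(sequences, identity_cutoff) > max_fraction
```

The pairwise loop is quadratic in pool size, which is acceptable for a few thousand candidates but would need clustering software for much larger pools.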

The bottom line

RFdiffusion is a powerful backbone generator, but it is one step in a multi-step pipeline. Its outputs are only as good as the hotspot specification, and its failures are often invisible until experimental screening. Running large campaigns, filtering aggressively, and combining RFdiffusion with BindCraft and BoltzGen to cover different regions of structure space is the approach that consistently produces validated binders.

Frequently asked questions

How many hotspot residues should I specify for RFdiffusion?

The productive range is 3 to 8 hotspot residues for most targets. Too few (one or two) give the model too much freedom and produce weak, geometrically variable binding modes. Too many (more than 8 to 10) overconstrain the problem and collapse output diversity. Within the productive range, which residues you choose matters more than the count: prioritize solvent-accessible, structurally rigid, chemically diverse positions.

What types of target does RFdiffusion struggle with?

RFdiffusion struggles with flat, featureless target surfaces, glycosylated surfaces, and very large interfaces. It has strong priors toward compact helical bundles from its PDB training data, so non-helical topologies appear at lower frequency. It has no explicit membrane or glycan model, so designs against membrane-proximal or glycosylated epitopes can clash with structure not represented in the input. These targets need much larger sampling and aggressive filtering.

What is partial diffusion in RFdiffusion?

Partial diffusion starts from an existing structure, adds a controlled amount of noise, then runs the reverse diffusion process conditioned on a new target, instead of starting from pure noise. It is useful for scaffold grafting, adapting a validated binder to a related target, and for topology steering toward a desired fold class. The noise level controls how much of the original structure is retained.

Why do RFdiffusion designs sometimes lack sequence diversity?

Thousands of structurally diverse backbones can converge to a small number of unique sequences after ProteinMPNN, because distinct backbones presenting similar interface geometry receive similar amino acid assignments. Monitor sequence identity across the pool before synthesis. If more than 30 to 40 percent of candidates share over 80 percent identity, increase the scaffold length range, reduce hotspot constraints, or raise the ProteinMPNN sampling temperature.