BindCraft vs RFdiffusion for Binder Design

Both BindCraft and RFdiffusion produce de novo protein binders. Both are open source, both generate structures conditioned on a target, and both can deliver binders that work in the wet lab. They are not interchangeable. In our hands, they fail in different ways, succeed on different targets, and fit different stages of a combined pipeline. This is where each one wins.

The two methods in one paragraph each

RFdiffusion is a denoising diffusion model trained on protein structures. It generates protein backbones by progressively denoising random coordinates conditioned on a target structure and hotspot residues. The output is a set of backbone coordinates with no sequence; sequence is added afterward by ProteinMPNN. RFdiffusion produces high backbone diversity at scale; we routinely run 10,000-50,000 backbones per campaign.
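To make the conditioning inputs concrete, here is a minimal sketch that assembles an RFdiffusion binder-design command line. The override names (inference.input_pdb, contigmap.contigs, ppi.hotspot_res, inference.num_designs) follow the binder example in the public RFdiffusion README as we understand it; the file paths, residue numbers, and the helper function itself are hypothetical, so check them against your installed version before running anything.

```python
import shlex


def rfdiffusion_binder_command(target_pdb, target_contig, binder_len_range,
                               hotspots, num_designs, output_prefix):
    """Build an argument list for RFdiffusion's scripts/run_inference.py.

    Hypothetical helper: it only formats strings and does not run RFdiffusion.
    """
    lo, hi = binder_len_range
    # "/0 " marks a chain break between the fixed target and the new binder chain.
    contigs = f"[{target_contig}/0 {lo}-{hi}]"
    hotspot_res = "[" + ",".join(hotspots) + "]"
    return [
        "python", "scripts/run_inference.py",
        f"inference.input_pdb={target_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
        f"contigmap.contigs={contigs}",
        f"ppi.hotspot_res={hotspot_res}",
    ]


cmd = rfdiffusion_binder_command(
    target_pdb="target.pdb",            # hypothetical input file
    target_contig="A1-150",             # target residues to keep fixed
    binder_len_range=(70, 100),         # binder length sampled in this range
    hotspots=["A59", "A83", "A91"],     # illustrative hotspot residues
    num_designs=10000,
    output_prefix="out/binder",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a job script
```

Building the command as a list and quoting it with shlex.join avoids the shell mangling the square brackets in the contig and hotspot overrides.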

BindCraft combines hallucination-based scaffold generation with integrated sequence scoring during generation. Each design step proposes a candidate, scores it against multiple metrics (interaction quality, foldability, secondary structure), and accepts or rejects. The output is a backbone-plus-sequence pair, not just a backbone. BindCraft produces fewer designs per unit compute than RFdiffusion, but each one has already passed several internal filters.
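For comparison, a BindCraft run is driven by a target settings file rather than command-line overrides. The sketch below writes one out as JSON. The key names mirror the example target settings shipped in the BindCraft repository as we understand them; treat them as an assumption and compare against the example file in your checkout. All values and the helper function are hypothetical.

```python
import json


def bindcraft_target_settings(design_path, binder_name, starting_pdb, chains,
                              hotspots, lengths, number_of_final_designs):
    """Assemble a target-settings dict (hypothetical helper, assumed key names)."""
    min_len, max_len = lengths
    return {
        "design_path": design_path,
        "binder_name": binder_name,
        "starting_pdb": starting_pdb,
        "chains": chains,
        # Hotspots are passed as one comma-separated string.
        "target_hotspot_residues": ",".join(hotspots),
        "lengths": [min_len, max_len],
        "number_of_final_designs": number_of_final_designs,
    }


settings = bindcraft_target_settings(
    design_path="designs/example_target/",   # hypothetical output directory
    binder_name="example_binder",
    starting_pdb="target.pdb",               # hypothetical input file
    chains="A",
    hotspots=["A56", "A115"],                # illustrative residues
    lengths=(60, 100),
    number_of_final_designs=100,
)
settings_json = json.dumps(settings, indent=4)
print(settings_json)
```

Note that the stopping condition is a number of accepted designs, not a number of sampled trajectories, which is why BindCraft run time is harder to predict up front.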

The architectural difference matters

RFdiffusion samples broadly. BindCraft samples narrowly but with internal vetoes. This is the core trade-off.

If your target has many viable binding modes and you want to discover them, RFdiffusion’s diversity is the advantage. If your target is constrained to one or two reasonable interaction geometries, BindCraft’s filtering during generation cuts the candidate-evaluation overhead at the cost of design diversity.

We see this most clearly on flat targets versus targets with deep pockets. On a target with a deep pocket and one obvious binding hotspot, BindCraft converges quickly on candidates that fit; RFdiffusion produces many backbones that approach the pocket from sub-optimal angles. On a flat target with multiple plausible interaction surfaces, RFdiffusion’s broader sampling produces more candidates that turn out to bind in unexpected (and useful) modes after wet-lab validation.

Computational cost (production scale)

Metric	RFdiffusion	BindCraft
Backbones per H100-hour	~250-400	~50-80
Sequences per backbone	5-8 (via ProteinMPNN)	1 (integrated)
Wall-clock for 10K candidates	~25-40 hours on 1xH100	~125-200 hours on 1xH100
Memory footprint	~22 GB VRAM	~16 GB VRAM
Failure rate (NaN, OOM, divergence)	Low	Moderate

These numbers come from our own runs on PD-L1, TIGIT, and CD8a target structures. Performance varies with target size, contig length, and hotspot count.
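The wall-clock row follows directly from the throughput row. A short sketch of the arithmetic, using only the figures in the table above; the assumption that throughput scales linearly with GPU count is ours and is optimistic for real clusters.

```python
def wall_clock_hours(n_designs, per_hour_low, per_hour_high, n_gpus=1):
    """Return (best_case, worst_case) hours to generate n_designs.

    Assumes throughput scales linearly with n_gpus (an idealization).
    """
    best = n_designs / (per_hour_high * n_gpus)
    worst = n_designs / (per_hour_low * n_gpus)
    return best, worst


# Designs per H100-hour, taken from the table above.
THROUGHPUT = {"RFdiffusion": (250, 400), "BindCraft": (50, 80)}

for tool, (low, high) in THROUGHPUT.items():
    best, worst = wall_clock_hours(10_000, low, high)
    print(f"{tool}: {best:.0f}-{worst:.0f} h for 10K candidates on 1 GPU")
```

Passing n_gpus=4 gives a rough estimate for a small multi-GPU node, subject to the linear-scaling caveat.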

When RFdiffusion wins

Targets with a large hotspot region or multiple interaction sites. If you can specify five or more hotspot residues spanning a non-trivial surface area, RFdiffusion’s diversity gives you binders that approach from multiple angles. Some of those will be the binders you wanted; others reveal binding modes you hadn’t considered.

Campaigns where you’ll filter aggressively post-hoc. RFdiffusion produces many candidates fast. If your downstream pipeline includes developability filters, AlphaFold2 confidence scoring, and quick experimental triage, the volume becomes an asset rather than a liability.
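As an illustration of the post-hoc triage step, the sketch below filters candidates on two AlphaFold2-derived confidence metrics and ranks the survivors. The field names, the thresholds (pLDDT of at least 80, interface PAE of at most 10), and the toy records are assumptions for illustration, not values from our pipeline; tune them per target.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    plddt: float            # mean binder pLDDT, on a 0-100 scale
    pae_interaction: float  # mean inter-chain predicted aligned error


def triage(candidates, min_plddt=80.0, max_pae_interaction=10.0):
    """Keep candidates passing both thresholds, best interface PAE first."""
    kept = [
        c for c in candidates
        if c.plddt >= min_plddt and c.pae_interaction <= max_pae_interaction
    ]
    return sorted(kept, key=lambda c: c.pae_interaction)


# Toy records with made-up scores, only to exercise the filter.
pool = [
    Candidate("design_1", plddt=88.0, pae_interaction=6.5),
    Candidate("design_2", plddt=72.0, pae_interaction=5.0),   # fails pLDDT
    Candidate("design_3", plddt=91.0, pae_interaction=14.2),  # fails PAE
    Candidate("design_4", plddt=85.0, pae_interaction=4.1),
]
survivors = triage(pool)
print([c.name for c in survivors])
```

With tens of thousands of backbones, a cheap filter like this is what turns volume into a short list rather than a screening burden.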

Scaffold-grafting and partial-diffusion workflows. RFdiffusion supports partial diffusion (denoising only a region of an existing scaffold), which is the right tool for grafting a designed binding loop onto a stable framework. BindCraft has less mature support for this workflow.

Membrane proteins where binding mode is uncertain. On the extracellular domain of a GPCR, for instance, you may not know which face of the protein is the right interaction surface. RFdiffusion’s diversity surfaces multiple plausible options; BindCraft’s filtering may converge on whichever face the scoring function happens to prefer.

When BindCraft wins

Small targets with one obvious interaction hotspot. If the target has a single binding site (a peptide, a small extracellular domain, a defined epitope), BindCraft converges faster and the integrated scoring filters out junk during generation. You spend less compute and less wet-lab budget on candidates that were never going to work.

When you want one good binder per ~100 candidates rather than ten good binders per 10,000. For a pilot-scale campaign with a limited screening budget, BindCraft’s higher quality-per-design is worth the lower throughput.

Compact backbone topologies. BindCraft’s scoring penalizes designs that don’t fold cleanly to a small, stable topology. If your application needs a binder you can express in E. coli or yeast at high yield, the bias toward compact, well-folded outputs is a feature.

Iterative refinement against a single design intent. When you’ve decided what kind of binder you want (a 60-residue helical bundle hitting a specific patch) and you want many sequence variants, BindCraft’s sample-and-filter loop is more efficient than running RFdiffusion plus separate ProteinMPNN runs.

How we chain them

Most of our AI Binder Sprint campaigns use both methods, not one. A common pattern:

1. RFdiffusion sweep, 10,000-30,000 backbones, broad sampling. Goal: find the right interaction modes against the target.
2. Cluster the RFdiffusion outputs by binding-mode geometry (typically 3-5 distinct modes survive AlphaFold2 confidence triage).
3. For the most promising mode, run BindCraft with hotspot conditioning that locks the geometry RFdiffusion discovered. Goal: generate higher-quality variants of the discovered solution.
4. Wet-lab screen the union: RFdiffusion winners plus BindCraft winners. Hit rates from BindCraft variants typically run 1.5-2x higher per design, but the absolute number of viable hits is comparable because RFdiffusion produces many more candidates.

This chain costs more compute than running either tool alone, but the hit rate per dollar of wet-lab validation is meaningfully higher because the candidates that reach screening are pre-filtered.
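The clustering step in the chain above can be approximated very simply. This sketch is one possible approach, not our production method: it summarizes each design by the direction from the target centroid to the binder centroid and groups designs whose directions fall within a fixed angle of a cluster leader. It assumes all designs have already been superimposed on the same target frame; the 30-degree cutoff and the toy coordinates are hypothetical.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def approach_vector(target_coords, binder_coords):
    """Unit vector from target centroid to binder centroid.

    Assumes the two centroids do not coincide.
    """
    t, b = centroid(target_coords), centroid(binder_coords)
    d = tuple(b[i] - t[i] for i in range(3))
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(x / norm for x in d)


def angle_deg(u, v):
    """Angle between two unit vectors, clamped against rounding error."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


def cluster_by_approach(vectors, max_angle_deg=30.0):
    """Greedy leader clustering: join the first cluster whose leader is close."""
    clusters = []  # list of (leader_vector, member_indices)
    for idx, v in enumerate(vectors):
        for leader, members in clusters:
            if angle_deg(leader, v) <= max_angle_deg:
                members.append(idx)
                break
        else:
            clusters.append((v, [idx]))
    return [members for _, members in clusters]


# Toy example: one target, three binders (two from a similar direction).
target = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
binders = [[(1.0, 10.0, 0.0)], [(2.0, 10.0, 0.0)], [(11.0, 0.0, 0.0)]]
vectors = [approach_vector(target, b) for b in binders]
clusters = cluster_by_approach(vectors)
print(clusters)  # [[0, 1], [2]]
```

A centroid direction ignores binder orientation and contact residues, so a real pipeline would likely cluster on interface residue sets or interface RMSD instead; the greedy structure stays the same.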

The cases where neither wins

Both methods struggle on:

Very small targets (peptides under 15-20 residues). Target conditioning loses traction in both tools when there is little structural context.
Intrinsically disordered targets. Neither model has a stable structure to anchor binding-mode prediction.
Targets that require allosteric binders rather than active-site binders. The methods optimize for direct contact, not allosteric effect.

For these cases, the right tool is neither one: it is directed evolution from a focused library, or rational design guided by deep mutational scanning (DMS). AI binder design isn’t the universal answer.

Decision summary

If you can choose only one tool: RFdiffusion is the better default for unfamiliar targets; BindCraft is the better default for well-characterized targets. For production campaigns, run both, and run each across several parameter strategies. In our work, the compute is cheap relative to wet-lab time, and the marginal hit rate from sampling multiple parameter profiles per tool is consistently worth the extra GPU-hours.
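The rules of thumb in this article can be written down as a small decision helper. This is only a restatement of the heuristics above, not a validated selector; the function name, argument names, and the 20-residue cutoff (the upper end of the 15-20 range mentioned earlier) are our illustrative choices.

```python
def suggest_tool(target_residues, disordered, needs_allosteric,
                 well_characterized, single_hotspot, min_target_residues=20):
    """Return a first-pass tool suggestion from the article's heuristics."""
    # Cases where both methods struggle.
    if target_residues < min_target_residues or disordered or needs_allosteric:
        return "neither"
    # Well-characterized target with one obvious hotspot.
    if well_characterized and single_hotspot:
        return "BindCraft"
    # Unfamiliar targets or multiple plausible binding modes.
    return "RFdiffusion"


print(suggest_tool(120, False, False, well_characterized=True, single_hotspot=True))
print(suggest_tool(300, False, False, well_characterized=False, single_hotspot=False))
print(suggest_tool(12, False, False, well_characterized=True, single_hotspot=True))
```

The helper returns a single default; for production campaigns the recommendation above still stands: run both.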

If you’re scoping a campaign and want a second opinion on which tool fits your target, see our AI protein binder design services, start a Binder Pilot, or reach out via the contact page.

Frequently asked questions

Should I use BindCraft or RFdiffusion for de novo binder design?

RFdiffusion is the better default for unfamiliar targets and flat surfaces with multiple plausible binding modes, because it samples backbone geometry broadly. BindCraft is the better default for well-characterized targets with a single obvious hotspot, because its integrated scoring filters weak designs during generation. For production campaigns, the strongest results come from running both rather than choosing one.

What is the difference between BindCraft and RFdiffusion?

RFdiffusion is a denoising diffusion model that generates protein backbones conditioned on a target, with sequence added afterward by ProteinMPNN. BindCraft combines scaffold generation with sequence scoring in a single pass, so each output is a backbone and sequence pair that has already passed internal filters for interaction quality and foldability. RFdiffusion samples broadly; BindCraft samples narrowly with built-in vetoes.

Is RFdiffusion or BindCraft faster?

RFdiffusion produces far more designs per unit of GPU compute, on the order of 250 to 400 backbones per H100-hour versus roughly 50 to 80 for BindCraft. RFdiffusion also needs a separate ProteinMPNN step for sequences. BindCraft is slower per design, but each output is pre-filtered, so it spends less downstream wet-lab budget on candidates that were never going to work.

Can BindCraft and RFdiffusion be used together?

Yes, and most production campaigns use both. A common pattern is to run a broad RFdiffusion sweep first to discover viable binding modes, cluster the outputs by interaction geometry, then run BindCraft with hotspot conditioning that locks the most promising geometry to generate higher-quality variants. Wet-lab screening then tests the union of RFdiffusion and BindCraft winners.
