ColabFold Batch Triage for RFdiffusion and BindCraft Designs

Generative protein design tools like RFdiffusion and BindCraft produce hundreds or thousands of candidate sequences per run. Most of them will not fold the way the design intended, and many of the ones that do will not bind the target. The job of triage is to discard the broken candidates before any of them reach gene synthesis, and the cheapest reliable filter is a fast self-consistency fold.

ColabFold is the tool that makes this filter cheap enough to actually run. This post walks through how to use colabfold_batch to triage a design pool, what thresholds to apply, and how much of the pool typically survives each stage.

Why a Separate Fold Step Is Needed

A generative design pipeline produces three things for every candidate: a designed backbone, a sequence intended to fold into that backbone, and a model-internal confidence estimate (BindCraft’s AF2 loss, RFdiffusion plus ProteinMPNN’s joint score, BoltzGen’s Boltz-2 confidence). Those internal scores are useful but not sufficient. The backbone generator and the sequence design model share assumptions, so a design that scores well by its own internal metric can still yield a sequence that an independent structure predictor does not fold into the intended backbone.

The standard fix is the self-consistency check, sometimes called design plus rosetta-fold-back, AF2 RMSD filtering, or simply “AF2 filtering”:

1. Take the designed sequence from your generator.
2. Fold the sequence independently with ColabFold (single-sequence mode, since these are de novo sequences with no natural homologs).
3. Align the predicted structure back to the intended design backbone.
4. Measure RMSD over the binder chain and pLDDT averaged over the binder residues.
5. Threshold on both.

The independent fold step catches the sequences that the generator scored as viable but that AlphaFold 2, asked independently, does not fold into the intended structure.

Why ColabFold and Not Full-MSA AF2

The triage filter has to run on every candidate, which for a meaningful campaign means hundreds to low thousands of folds. Full-MSA AlphaFold 2 at 30 to 90 minutes per fold makes this filter infeasible. ColabFold at 1 to 2 minutes per fold on a modern GPU makes it routine.
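As a back-of-envelope check on those runtimes, the sketch below converts minutes per fold into total GPU-hours. The pool size of 1,500 is borrowed from the funnel example in this post, and the function name is purely illustrative.

```python
def gpu_hours(n_folds: int, minutes_per_fold: float) -> float:
    """Total GPU-hours when n_folds run back to back on a single GPU."""
    return n_folds * minutes_per_fold / 60.0


pool_size = 1500  # assumed pool size, matching the funnel example in this post

# (fastest, slowest) minutes per fold, as quoted in the text above
runtimes = {
    "full-MSA AlphaFold 2": (30, 90),
    "single-sequence ColabFold": (1, 2),
}

for label, (fast, slow) in runtimes.items():
    low = gpu_hours(pool_size, fast)
    high = gpu_hours(pool_size, slow)
    print(f"{label}: {low:.0f} to {high:.0f} GPU-hours")
```

Under these assumptions the full-MSA route costs 750 to 2,250 GPU-hours for the pool, against 25 to 50 for single-sequence ColabFold.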

For designed sequences specifically, the MMseqs2 frontend question is moot. De novo binder designs have no natural homologs, so either pipeline should run in single-sequence mode. Single-sequence ColabFold is the standard tool, and its output is interchangeable with what the stock AlphaFold 2 pipeline would produce on the same sequence with the MSA disabled.

ESMFold is the other option in this niche. It is roughly twice as fast as single-sequence ColabFold but uses the ESM-2 language-model weights rather than AlphaFold 2, so it gives a second opinion rather than a faster version of the same one. In practice many design pipelines run both: ColabFold for the standard filter, ESMFold for an orthogonal cross-check on borderline candidates.

Running colabfold_batch on a Design Pool

The basic command on a LocalColabFold install or hosted GPU:

colabfold_batch \
  --msa-mode single_sequence \
  --num-recycle 3 \
  --model-type alphafold2_multimer_v3 \
  designs.fasta \
  results_dir/

Inputs that matter for triage:

FASTA format. Each entry is a designed binder sequence. For multimer prediction with the target, put the binder and target sequences separated by a colon on the same FASTA record. For binder-alone folds (cheaper and often sufficient), put each binder sequence as a single-chain record.
MSA mode. Always single_sequence for de novo designs. A built MSA on a designed sequence is a contamination source, not a feature.
Recycle count. Three is the default and works for most triage. Going to six can pull marginal candidates over a pLDDT threshold but doubles the runtime per fold.
Model checkpoint. For multimer prediction, use the multimer weights. For binder-alone folds, the standard monomer weights are fine.
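The FASTA conventions above can be sketched as a small writer. This is a minimal illustration, not part of ColabFold: the function names are hypothetical and the sequences in the example are toy placeholders, not real designs. The binder is written first so that it becomes the first chain in the prediction.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def fasta_record(name: str, binder: str, target: Optional[str] = None) -> str:
    """Format one FASTA record; binder and target are joined with ':' for a complex."""
    sequence = binder if target is None else f"{binder}:{target}"
    return f">{name}\n{sequence}\n"


def write_design_fasta(path: str,
                       designs: Iterable[Tuple[str, str]],
                       target: Optional[str] = None) -> None:
    """Write every (name, binder_sequence) pair in `designs` to a single FASTA file."""
    text = "".join(fasta_record(name, seq, target) for name, seq in designs)
    Path(path).write_text(text)


# Toy placeholder sequences, for illustration only.
print(fasta_record("design_0001", "MKEELLKK"), end="")
print(fasta_record("design_0001", "MKEELLKK", target="GSHMTTPE"), end="")
```

Calling write_design_fasta with target=None gives the binder-alone file; passing the target sequence gives the multimer file for the second fold stage.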

The output is a set of files per sequence, written into the results directory and prefixed with the FASTA record name: five ranked PDB files (model_1 through model_5), a PAE matrix per model, and a JSON summary with pLDDT and pTM scores.

What to Filter On

Four numbers do most of the work:

1. pLDDT averaged over the binder

The headline confidence score. Compute the mean pLDDT over the binder residues (chain A in a multimer fold, or the whole sequence in a monomer fold). A typical threshold is 80 for promotion to the next filter stage. Above 90 is excellent; below 70 the sequence is unlikely to fold into the intended structure.

For binder-alone folds (no target), pLDDT and self-consistency RMSD are the only gates available, since there is no interface to score.
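A minimal sketch of the binder pLDDT calculation follows. It assumes the per-model scores JSON stores a per-residue list under a "plddt" key and that the binder is the first chain in the record; check both assumptions against your own output files. The function names are illustrative.

```python
import json
from statistics import mean
from typing import Optional, Sequence


def load_plddt(scores_json_path: str) -> list:
    """Read the per-residue pLDDT list from a scores JSON (assumed key: 'plddt')."""
    with open(scores_json_path) as handle:
        return json.load(handle)["plddt"]


def binder_mean_plddt(plddt: Sequence[float], binder_len: Optional[int] = None) -> float:
    """Mean pLDDT over the first `binder_len` residues, or over all residues if None."""
    values = plddt if binder_len is None else plddt[:binder_len]
    if not values:
        raise ValueError("no residues to average")
    return mean(values)
```

For a monomer fold, leave binder_len as None. For a multimer fold with the binder listed first, pass the binder length so the target residues are excluded from the average.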

2. Self-consistency RMSD

Align the predicted binder structure to the designed backbone and measure C-alpha RMSD. The threshold most groups use is 2 Å. Above 2 Å means the predicted fold disagrees with the intended fold, and the sequence is unlikely to recapitulate the design when expressed.

This is the most important single filter. A 1,500-design RFdiffusion run with RMSD-only filtering at 2 Å typically loses 50 to 80 percent of the pool. That is the expected and correct behavior.
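This post does not prescribe an alignment method, so the sketch below uses Kabsch superposition, one common choice, to compute C-alpha RMSD. It assumes NumPy is available and that the two coordinate arrays list the same residues in the same order; extracting C-alpha coordinates from the PDB files is left out.

```python
import numpy as np


def kabsch_rmsd(predicted: np.ndarray, designed: np.ndarray) -> float:
    """C-alpha RMSD after optimal rigid superposition of `predicted` onto `designed`.

    Both inputs are (N, 3) arrays with residues in matching order.
    """
    if predicted.shape != designed.shape or predicted.ndim != 2 or predicted.shape[1] != 3:
        raise ValueError("expected two (N, 3) coordinate arrays of equal shape")

    # Remove translation by centering both structures on their centroids.
    p = predicted - predicted.mean(axis=0)
    q = designed - designed.mean(axis=0)

    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    aligned = p @ rotation.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))
```

A design would then pass the gate described above when kabsch_rmsd(predicted_ca, designed_ca) is at most 2.0.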

3. Interface pLDDT (for multimer folds)

If you folded the binder together with the target, compute pLDDT over just the interface residues (binder residues within 5 Å of any target residue in the predicted complex). High interface pLDDT means AlphaFold 2 is confident in the inter-chain contacts, not just the chain folds. A typical threshold is 75 to 80.

Interface pLDDT correlates with experimental binding probability in published benchmarks more strongly than overall pLDDT does.
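The interface definition above can be sketched with the standard library alone. The input layout is an assumption made for illustration: one list of atom coordinates per binder residue and a flat list of target atom coordinates, both taken from the predicted complex. A brute-force distance scan like this is fine at binder scale but would want a spatial index for large targets.

```python
from math import dist
from statistics import mean


def interface_residues(binder_atoms, target_atoms, cutoff=5.0):
    """Indices of binder residues with any atom within `cutoff` angstroms of the target.

    binder_atoms: one list of (x, y, z) tuples per binder residue.
    target_atoms: flat list of (x, y, z) tuples covering the whole target.
    """
    return [
        i for i, atoms in enumerate(binder_atoms)
        if any(dist(a, t) <= cutoff for a in atoms for t in target_atoms)
    ]


def interface_plddt(binder_plddt, binder_atoms, target_atoms, cutoff=5.0):
    """Mean pLDDT over interface residues, or None when nothing is in contact."""
    idx = interface_residues(binder_atoms, target_atoms, cutoff)
    return mean(binder_plddt[i] for i in idx) if idx else None
```

A return value of None is worth treating as a failure in its own right, since it means the predicted complex has no binder residue in contact with the target.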

4. PAE on the interface block

The off-diagonal block of the PAE matrix covering binder-to-target residue pairs. Low PAE values across that block mean AlphaFold 2 is confident in the relative positioning of the two chains. High values mean the model has placed the two chains together but is uncertain about their relative position. A typical threshold is mean PAE under 10 Å on the binder-target block.
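One way to compute that number is sketched here, assuming the PAE matrix is a square list of lists with the binder residues first. PAE is not symmetric, so this version averages both off-diagonal blocks; averaging only one direction is an equally defensible choice.

```python
from statistics import mean


def interface_pae(pae, binder_len):
    """Mean PAE over both binder-target off-diagonal blocks of a square PAE matrix.

    pae: square list of lists, binder residues first, then target residues.
    """
    n = len(pae)
    if not 0 < binder_len < n:
        raise ValueError("binder_len must leave at least one residue in each chain")
    # Rows are binder residues, columns are target residues.
    binder_to_target = [pae[i][j] for i in range(binder_len) for j in range(binder_len, n)]
    # Rows are target residues, columns are binder residues.
    target_to_binder = [pae[i][j] for i in range(binder_len, n) for j in range(binder_len)]
    return mean(binder_to_target + target_to_binder)
```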

Together, pLDDT, RMSD, interface pLDDT, and PAE give a four-axis filter that catches most of the candidates that will not work experimentally.
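Putting the four axes together, a filter might look like the sketch below. The class and function names are hypothetical, the default thresholds are the typical values quoted in this post, and the metrics are assumed to have been computed already. Interface metrics are optional so the same function covers binder-alone folds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignMetrics:
    name: str
    binder_plddt: float
    rmsd: float
    interface_plddt: Optional[float] = None  # None for binder-alone folds
    interface_pae: Optional[float] = None    # None for binder-alone folds


def passes_triage(m: DesignMetrics,
                  min_plddt: float = 80.0,
                  max_rmsd: float = 2.0,
                  min_interface_plddt: float = 75.0,
                  max_interface_pae: float = 10.0) -> bool:
    """Apply the fold-level gates, then the interface gates when they are available."""
    if m.binder_plddt < min_plddt or m.rmsd > max_rmsd:
        return False
    if m.interface_plddt is not None and m.interface_plddt < min_interface_plddt:
        return False
    if m.interface_pae is not None and m.interface_pae >= max_interface_pae:
        return False
    return True
```

Keeping the thresholds as keyword arguments makes it easy to tighten or loosen a gate per target without touching the filter logic.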

A Reasonable Funnel

For a 1,500-design RFdiffusion plus ProteinMPNN pool against a single target, a triage funnel that works in practice:

Stage	Tool	Typical pool size
Raw RFdiffusion plus ProteinMPNN output	None	1,500
ColabFold single-sequence binder-only fold	colabfold_batch	1,500 → 1,500
Filter pLDDT > 80 and RMSD < 2 Å	Local Python script	1,500 → 400
ColabFold multimer fold against target	colabfold_batch	400 → 200
Filter interface pLDDT > 75 and PAE < 10 Å	Local Python script	200 → 100
Developability filters (charge, hydrophobic patch)	Local Python script	100 → 60
Final ranked subset for gene synthesis	Manual review	60 → 60

The numbers vary by target and by how aggressive the early filters are. Sixty candidates is a typical size for a first wet-lab pool: small enough to synthesize as a single oligo pool, large enough that the wet-lab hit rate of 5 to 20 percent gives several real hits.
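The "several real hits" figure is simple arithmetic, shown here as a short sketch. It treats the quoted hit-rate range as a flat expectation per candidate, which is an assumption; real hit rates vary widely by target.

```python
def expected_hits(pool_size: int, hit_rate: float) -> float:
    """Expected number of binders from a pool at a given wet-lab hit rate."""
    return pool_size * hit_rate


pool = 60  # first wet-lab pool size from the funnel above
low = expected_hits(pool, 0.05)
high = expected_hits(pool, 0.20)
print(f"{low:.0f} to {high:.0f} expected hits from {pool} candidates")
```

At 60 candidates that works out to roughly 3 to 12 expected hits.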

When BindCraft and BoltzGen Need the Same Treatment

BindCraft has its own internal AF2 multimer loss, which is essentially a single-sequence AlphaFold 2 fold built into the design loop. Designs that come out of BindCraft therefore already pass a soft version of the self-consistency check. In practice the filter still catches things, because the BindCraft internal loss optimizes against a single AF2 model checkpoint and an independent five-model ColabFold fold can disagree.

BoltzGen uses Boltz-2 internally, which is a different model family than AlphaFold 2. A ColabFold filter on BoltzGen output is fully orthogonal to BoltzGen’s internal scoring and catches a different set of failure modes. We recommend running ColabFold triage on BoltzGen outputs even more than on BindCraft outputs for that reason.

RFdiffusion plus ProteinMPNN has no built-in fold check, so the ColabFold filter is the entire quality gate.

Practical Notes on Throughput

A few details that matter when running colabfold_batch at scale:

MSA server politeness. Single-sequence mode skips the MMseqs2 server entirely, which is convenient. If you ever switch back to MSA mode for natural-sequence triage, batch your jobs and respect the rate limits on the public MMseqs2 server.
Disk usage. Five PDB files plus a PAE matrix plus JSON per sequence adds up. A 1,500-sequence batch produces about 1 GB of output. Plan storage accordingly.
GPU memory. Monomer folds of typical binder lengths (60 to 150 residues) fit comfortably on a 16 GB GPU. Multimer folds with the target attached (target plus binder, total 200 to 600 residues) want at least 24 GB and ideally 40 GB.
Parallelism. colabfold_batch is single-GPU. Splitting a large FASTA across multiple GPUs is a shell-level parallelization. Use one process per GPU and partition the input file.
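The one-process-per-GPU pattern can be sketched as follows. The helper names and output layout are illustrative, the colabfold_batch flags are the ones from the command earlier in this post, and pinning each process with the CUDA_VISIBLE_DEVICES environment variable is an assumption about how the GPUs are exposed on your machine.

```python
import os
import subprocess
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition(records, n_shards):
    """Deal records round-robin into n_shards lists."""
    return [records[i::n_shards] for i in range(n_shards)]


def launch(fasta_path, out_root, n_gpus):
    """Write one shard per GPU, start one colabfold_batch per shard, return exit codes."""
    procs = []
    for gpu, shard in enumerate(partition(read_fasta(fasta_path), n_gpus)):
        if not shard:
            continue  # fewer records than GPUs
        shard_path = Path(out_root) / f"shard_{gpu}.fasta"
        shard_path.parent.mkdir(parents=True, exist_ok=True)
        shard_path.write_text("".join(f">{h}\n{s}\n" for h, s in shard))
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin process to one GPU
        cmd = ["colabfold_batch", "--msa-mode", "single_sequence", "--num-recycle", "3",
               str(shard_path), str(Path(out_root) / f"gpu_{gpu}")]
        procs.append(subprocess.Popen(cmd, env=env))
    return [p.wait() for p in procs]
```

A call such as launch("designs.fasta", "results", n_gpus=4) would split the pool four ways. Round-robin dealing keeps the shards close in size, which matters because the slowest shard sets the wall-clock time.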

What Comes After Triage

A clean filtered pool of 50 to 100 binder candidates is a starting point, not an endpoint. The next stages are typically:

Gene synthesis (oligo pool for high-throughput screens, or individual genes for purification).
Display screening (yeast or mammalian display with FACS sorting and NGS readout) for pool-level hit calling.
Purification and orthogonal affinity measurement (SPR, BLI) for the top 5 to 20 ranked hits.

The Binder Pilot at Ranomics is scoped to take a filtered design pool through exactly these stages. The AI Binder Sprint is the flagship version for multi-target or multi-round work, and includes ColabFold triage inside the in-silico loop alongside multiple generative tools.

Common Pitfalls

A few mistakes that recur on first-time triage runs:

Folding designs with a built MSA. Always use single-sequence mode for de novo sequences. MSA mode on a designed sequence contaminates the prediction with whatever distant homologs the MMseqs2 server happens to hit.
Using only pLDDT as the filter. A confident fold of a wrong structure is still wrong. Always pair pLDDT with self-consistency RMSD.
Skipping the multimer fold. Binder-alone folds catch sequences that do not fold at all. They cannot tell you whether the sequence will dock onto the target. Multimer folds on the survivors are where interface pLDDT and interface PAE earn their keep.
Synthesizing the full filtered pool without manual review. Outliers (very long, very short, very charged) often pass numerical filters but fail downstream. A 30-second eyeball on each survivor before synthesis catches a handful of bad candidates every batch.
Treating ColabFold confidence as binding confidence. A confident fold and a confident interface prediction are necessary but not sufficient for a real binder. The wet lab is still the only ground truth.
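The outlier check in the manual-review pitfall can be partly automated so the eyeball pass starts from a flagged list. In this sketch the length bounds reuse the 60 to 150 residue range given above as typical for binders, while the charge limit of 10 is an arbitrary placeholder. The charge estimate is deliberately crude and ignores histidine and the termini.

```python
def net_charge(sequence: str) -> int:
    """Crude net charge at neutral pH: +1 per K or R, -1 per D or E."""
    positive = sum(sequence.count(res) for res in "KR")
    negative = sum(sequence.count(res) for res in "DE")
    return positive - negative


def review_flags(sequence: str, min_len: int = 60, max_len: int = 150,
                 max_abs_charge: int = 10) -> list:
    """Return the reasons a survivor deserves a closer look before synthesis."""
    flags = []
    if len(sequence) < min_len:
        flags.append("short")
    if len(sequence) > max_len:
        flags.append("long")
    if abs(net_charge(sequence)) > max_abs_charge:
        flags.append("charged")
    return flags
```

An empty list does not mean a design is safe to synthesize; it only means none of these three simple checks fired.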

Summary

ColabFold’s job inside a binder design campaign is filtering, not generation. Single-sequence inference mode, which skips the MSA search entirely, makes it cheap enough to run on every candidate sequence, and the self-consistency loop catches most of the candidates that would have wasted wet-lab time. A pool that goes into ColabFold at 1,500 designs and comes out at 60 is a normal and healthy ratio. The campaigns that ship working binders are the ones that take the filter seriously, not the ones that try to skip it.

You can run the ColabFold tool on tools.ranomics.com against a designed sequence right now, with no install and a hosted A100. For a full batch triage workflow, the colabfold_batch command is available in both the LocalColabFold install and the hosted environment.

ColabFold technology page: how the MMseqs2 frontend works and where ColabFold sits in a design campaign.
Binder Pilot: ranked hit list from your top filtered designs, scoped for academic and seed-biotech teams.
AI Binder Sprint: multi-algorithm campaign with ColabFold sitting in the in-silico triage loop alongside RFdiffusion, BindCraft, and BoltzGen.
