Sequence design

ProteinMPNN inverse folding sequence design

ProteinMPNN solves the inverse-folding problem: given a backbone, which amino acid sequence will fold into that structure? It is a message-passing neural network that has become the standard pairing for RFdiffusion, BoltzGen, and other de novo backbone generators in modern protein design pipelines.

Run sequence design on a generated backbone. Temperature sampling, fixed-position interface constraints, and stability scoring are all exposed as standard arguments.

Based on Dauparas et al., Science 2022. Used inside every Ranomics wet-lab binder campaign that passes through RFdiffusion or BoltzGen.

How it works

From backbone coordinates to scored sequences

Input backbone

Provide backbone coordinates from RFdiffusion, BoltzGen, or any de novo generator. Optionally pin hotspot contact residues as fixed positions.

MPNN inference

The message-passing network considers each residue position in the context of its local geometric environment and learned neighbor identities.
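As a rough illustration of what "local geometric environment" means, the sketch below builds a k-nearest-neighbor graph over C-alpha coordinates, the kind of sparse neighborhood a message-passing network exchanges information along. The function name, the toy coordinates, and `k=2` are illustrative assumptions; the published model uses a much larger neighborhood and richer inter-atomic features.

```python
import math

def knn_graph(ca_coords, k):
    """For each residue, return the indices of its k nearest residues by CA-CA distance."""
    neighbors = []
    for i, a in enumerate(ca_coords):
        # Distance from residue i to every other residue, paired with that residue's index.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(ca_coords) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy 5-residue trace (made-up coordinates, in angstroms).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
graph = knn_graph(coords, k=2)
print(graph[0])  # neighbors of the first residue
```

Each residue then "sees" only its listed neighbors during a message-passing step, which is why sequence choices depend on spatial context and not just on position along the chain.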

Sampled sequences

Generate 8-16 sequences per backbone at temperatures 0.1-0.5. Each sequence receives a model-confidence score reflecting predicted foldability.

Filter and rank

Apply solubility, aggregation, and motif filters; rank by MPNN score; advance the highest-confidence designs to AlphaFold2 structural validation.

Methodology

How we configure ProteinMPNN in production runs

Five settings define a Ranomics ProteinMPNN job. These are the same defaults we use in wet-lab campaigns and expose in the tools-hub UI for self-serve users.

8-16

Sequences per backbone

For each backbone we generate 8-16 candidate sequences. This produces sequence diversity across a single topology, so the downstream AF2 validator can pick the best-folding variant rather than betting on a single sample.

0.1-0.5

Temperature sampling

Low T (0.1) yields conservative, high-confidence sequences; high T (0.4-0.5) explores more diversity at the cost of foldability. Default 0.2-0.3 is the empirical sweet spot for binder design.
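To make the temperature trade-off concrete, here is a minimal sketch of temperature-scaled sampling at a single position. It is not the ProteinMPNN implementation; the logits are made up, and `temperature_probs` and `sample_residue` are hypothetical helper names.

```python
import math
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def temperature_probs(logits, temperature):
    """Convert per-residue logits to probabilities at a given sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_residue(logits, temperature, rng):
    """Draw one amino acid letter from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]

# Toy logits (made up): leucine favored, alanine second, everything else flat.
toy_logits = [0.0] * 20
toy_logits[AMINO_ACIDS.index("L")] = 2.0
toy_logits[AMINO_ACIDS.index("A")] = 1.5

rng = random.Random(0)
for t in (0.1, 0.3, 0.5):
    p_leu = temperature_probs(toy_logits, t)[AMINO_ACIDS.index("L")]
    draws = "".join(sample_residue(toy_logits, t, rng) for _ in range(10))
    print(f"T={t}: P(L)={p_leu:.3f} sample={draws}")
```

At low temperature the distribution collapses onto the top-scoring residue; raising the temperature flattens it, so more alternatives are sampled at each position.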

Fixed

Interface position constraints

Hotspot contact residues identified during backbone generation are locked. ProteinMPNN optimizes everything else while preserving the binding geometry at the designed interface.
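A small sketch of how locked positions might be written down for a job. The nested layout (design name, then chain, then 1-indexed positions) follows our understanding of the fixed-positions JSON lines file accepted by the open-source ProteinMPNN scripts; treat that as an assumption and check it against the version you run. The design name, chain IDs, and positions are made up.

```python
import json

def fixed_positions_record(design_name, hotspots_by_chain, all_chains):
    """Map every chain to the sorted, de-duplicated positions to hold fixed.

    Chains without hotspots get an empty list, meaning every position
    on that chain is left free to be redesigned.
    """
    return {
        design_name: {
            chain: sorted(set(hotspots_by_chain.get(chain, [])))
            for chain in all_chains
        }
    }

# Hypothetical design: three interface contacts pinned on chain A, none on chain B.
record = fixed_positions_record("binder_001", {"A": [45, 12, 48]}, ["A", "B"])
print(json.dumps(record))  # one JSON object per line, one line per design
```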

Score

Stability and foldability ranking

Each sequence receives a score reflecting the model's confidence that it will fold into the target backbone. We rank sequences within each backbone and advance only the top fraction to AlphaFold2 validation.
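Within-backbone ranking can be sketched as a sort followed by a cut. The 25% cutoff and the example scores are hypothetical, not our production values. The sketch assumes the score is an average negative log-probability, as in the open-source ProteinMPNN output, so lower is better; flip `lower_is_better` if your scores run the other way.

```python
import math

def top_fraction(scored_sequences, fraction=0.25, lower_is_better=True):
    """Return the best-scoring fraction of (name, score) pairs, keeping at least one."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(
        scored_sequences,
        key=lambda pair: pair[1],
        reverse=not lower_is_better,
    )
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Eight made-up sequences designed on the same backbone.
candidates = [
    ("seq_a", 1.12), ("seq_b", 0.87), ("seq_c", 0.95), ("seq_d", 1.30),
    ("seq_e", 0.91), ("seq_f", 1.05), ("seq_g", 0.99), ("seq_h", 1.21),
]
advanced = top_fraction(candidates, fraction=0.25)
print(advanced)
```

Ranking within a backbone rather than across the whole pool keeps one easy-to-design topology from crowding out the others.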

Filters

Solubility and motif checks

Predicted solubility, aggregation propensity, free cysteines, and N-glycosylation sites are screened before sequences leave the design pool. Problematic motifs are filtered out, not just flagged.
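The two motif checks that depend only on sequence can be sketched with a regular expression. This is a simplified illustration under stated assumptions: it uses the canonical N-glycosylation sequon (Asn, any residue except Pro, then Ser or Thr) and treats any cysteine as potentially free, since pairing cannot be judged from sequence alone. Solubility and aggregation need dedicated predictors and are not shown. Function names are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (for example "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def n_glyc_sites(seq):
    """Return 1-indexed positions of the Asn in each N-X-S/T sequon (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

def passes_motif_filters(seq, allow_cysteines=False):
    """Reject sequences with an N-glycosylation sequon or, by default, any cysteine."""
    if n_glyc_sites(seq):
        return False
    if not allow_cysteines and "C" in seq:
        return False
    return True

# Made-up sequences for illustration.
pool = ["MKLAEELLKK", "MKNGSAELLK", "MKCAEELLKK", "MKNPSAELLK"]
kept = [s for s in pool if passes_motif_filters(s)]
print(kept)
```

Returning a filtered pool, not a list of warnings, mirrors the "filtered out, not just flagged" policy described above.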

Pipeline position

Where ProteinMPNN sits in a de novo design run

Between backbone gen and AF2

Backbone generators produce coordinates without amino acid identities. ProteinMPNN assigns the sequence; AlphaFold2 then predicts the structure of each sequence from scratch, and we keep only the sequences whose predicted structure matches the input backbone. Three discrete stages, three different models.
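The "keep only the ones that match" step can be sketched as a threshold filter over precomputed metrics. The sketch assumes the backbone RMSD between designed and predicted structures and the mean pLDDT have already been computed elsewhere. The cutoffs (2.0 Å, pLDDT 80) are illustrative assumptions, not our production settings, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    name: str
    backbone_rmsd: float  # angstroms, predicted vs. designed backbone after superposition
    mean_plddt: float     # AlphaFold2 confidence on a 0-100 scale

def self_consistent(predictions, max_rmsd=2.0, min_plddt=80.0):
    """Keep predictions that stay close to the designed backbone with high confidence."""
    return [
        p for p in predictions
        if p.backbone_rmsd <= max_rmsd and p.mean_plddt >= min_plddt
    ]

# Made-up validation results for three designed sequences.
results = [
    Prediction("seq_b", backbone_rmsd=1.1, mean_plddt=91.0),
    Prediction("seq_e", backbone_rmsd=3.4, mean_plddt=88.0),  # folds, but to a different shape
    Prediction("seq_g", backbone_rmsd=1.6, mean_plddt=62.0),  # close, but low confidence
]
passing = self_consistent(results)
print([p.name for p in passing])
```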

Pairs with RFdiffusion and BoltzGen

Both RFdiffusion and BoltzGen output sequence-agnostic backbones, and ProteinMPNN is the standard inverse-folder for both. The RFdiffusion paper and the broader open-source binder design community treat MPNN as the default sequence design step.

Skip for BindCraft

BindCraft handles its own sequence design inside an iterative co-optimization loop against AlphaFold2. Running ProteinMPNN on BindCraft output is redundant and can hurt scores. Use BindCraft’s native designs directly, validate, and move on.

When to use ProteinMPNN

The default sequence design step for de novo binders

If you are running RFdiffusion or BoltzGen, you need ProteinMPNN. Backbone generators do not assign amino acid identities, and naive sequence design from a Rosetta-style energy function recovers far fewer foldable, expressible designs than a learned inverse folder.

ProteinMPNN replaces hand-tuned design with a single inference pass that produces ranked, scored, ready-to-validate sequences. Dauparas et al. (Science, 2022) report a sequence recovery of 52.4% on native protein backbones, compared with 32.9% for Rosetta, and the model has since been validated experimentally in dozens of published binder design campaigns.

Just generated backbones with RFdiffusion and need sequences before validation

Running BoltzGen and want to redesign sequences with explicit interface constraints

Optimizing a known binder backbone for a different target or stability profile

Designing soluble variants of a structural scaffold while preserving the fold

Comparing inverse-folding outputs against a physics-based baseline on the same backbone
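For the baseline comparison in the last use case, the usual yardstick is sequence recovery: the fraction of positions where the designed residue matches the native one. A minimal sketch, assuming both sequences are already aligned position by position; the sequences are made up and the function name is hypothetical.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue equals the native residue."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

# Toy example: the design differs from the native sequence at two of eight positions.
native = "MKTAYIAK"
designed = "MKSAYLAK"
print(f"{sequence_recovery(native, designed):.2f}")
```

Computing the same number for an inverse-folding design and a physics-based design on one backbone gives a like-for-like comparison, although recovery alone says nothing about whether a sequence expresses or binds.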

What's next after MPNN

From designed sequences to validated binders

ProteinMPNN gives you sequences. Wet-lab validation tells you which ones actually bind. Two entry points depending on scope.

Starter program

Validate your MPNN-designed sequences

The Binder Pilot is a short, fixed-scope campaign with one round of design, a smaller pool, ranked hits, and a technical report. Scoped for academic labs, seed biotech, industrial SMBs, and student research groups who already have a target and want validated binders without committing to a full multi-round program.

Flagship program

Multi-algorithm de novo campaign

The AI Binder Sprint runs RFdiffusion, BindCraft, and BoltzGen in parallel over 6-8 weeks with milestone check-ins and a 100% binder guarantee. ProteinMPNN handles sequence design for the RFdiffusion and BoltzGen branches inside this pipeline.

Run ProteinMPNN on your backbone

Upload a backbone, set the temperature, and pin your hotspot positions. You get back 8-16 scored sequences, ready for AlphaFold2 validation or wet-lab handoff.
