ProteinMPNN protein sequence design
Inverse folding for amino acid sequence design across computationally generated protein backbones
ProteinMPNN designs sequences for generated backbones
Backbone generation and sequence design are separate problems in computational protein engineering. RFdiffusion and Boltzgen produce backbone coordinates (the three-dimensional shape of the protein) but do not assign amino acid identities. ProteinMPNN solves the complementary problem: given a backbone, which amino acid sequences are likely to fold into that structure?
ProteinMPNN uses a message-passing neural network architecture that considers the local geometric environment of each residue position. It learns which amino acid identities are compatible with the backbone geometry at each position, producing sequences that are predicted to fold stably into the target structure.
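To make "local geometric environment" concrete, here is a minimal sketch of the kind of residue graph a message-passing network operates on: each residue is connected to its nearest neighbors in space. The function name knn_graph, the toy coordinates, and k=2 are assumptions for illustration only; the actual model uses a larger neighborhood and richer backbone features than plain C-alpha distances.

```python
import math

def knn_graph(ca_coords, k):
    """For each residue, return the indices of its k nearest residues by C-alpha distance."""
    neighbors = []
    for i, a in enumerate(ca_coords):
        # Distance from residue i to every other residue, paired with that residue's index.
        dists = [(math.dist(a, b), j) for j, b in enumerate(ca_coords) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy backbone: four C-alpha atoms on a line, 3.8 angstroms apart (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
graph = knn_graph(coords, k=2)

for i, nbrs in enumerate(graph):
    print(f"residue {i} exchanges messages with residues {sorted(nbrs)}")
```

In a message-passing layer, each residue would update its representation from the features of these neighbors, so the predicted amino acid at a position depends on the geometry around it rather than on the position alone.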
Multiple sequences per backbone
For each backbone generated by RFdiffusion or Boltzgen, we generate 8-16 sequences using temperature sampling. This produces sequence diversity across a single backbone topology. BindCraft handles its own sequence design internally as part of its iterative co-optimization loop.
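The effect of temperature sampling can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not our production code: it treats positions as independent (ProteinMPNN decodes autoregressively, so each position also depends on residues already chosen), and the logits, temperatures, and function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_sequence(per_position_logits, temperature, rng):
    """Sample one residue per position from temperature-scaled probabilities."""
    seq = []
    for logits in per_position_logits:
        probs = softmax_with_temperature(logits, temperature)
        seq.append(rng.choices(AMINO_ACIDS, weights=probs, k=1)[0])
    return "".join(seq)

rng = random.Random(0)
# Hypothetical per-position logits for a 30-residue backbone.
toy_logits = [[rng.gauss(0.0, 2.0) for _ in AMINO_ACIDS] for _ in range(30)]

# Draw 8 sequences at a low and at a high (hypothetical) temperature.
cold = {sample_sequence(toy_logits, 0.1, rng) for _ in range(8)}
hot = {sample_sequence(toy_logits, 1.0, rng) for _ in range(8)}
print(len(cold), "unique sequences at low temperature")
print(len(hot), "unique sequences at high temperature")
```

Dividing the logits by a small temperature concentrates probability on the top-ranked residue at each position, so repeated draws converge on similar sequences; a larger temperature flattens the distribution and spreads the draws out.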
Fixed-position constraints are applied at predicted hotspot contact residues. These positions are locked to preserve the binding geometry at the interface, while allowing ProteinMPNN to optimize the remaining positions for stability and foldability.
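As a rough sketch of what locking positions means, the example below keeps the residue identity at a set of fixed indices and lets a proposal function fill in everything else. It assumes the locked positions already carry residue identities in a starting sequence; the sequence, the hotspot indices, and the random proposal function (a stand-in for the model's predictions) are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def redesign_with_fixed_positions(template, fixed_positions, propose_residue):
    """Return a new sequence in which fixed positions keep the template residue.

    template: starting sequence (string)
    fixed_positions: set of 0-based indices that must not change
    propose_residue: callable(index) -> one-letter amino acid for a designable position
    """
    designed = []
    for i, aa in enumerate(template):
        designed.append(aa if i in fixed_positions else propose_residue(i))
    return "".join(designed)

rng = random.Random(0)
template = "MKTAYIAKQR"   # hypothetical starting sequence
hotspots = {2, 5, 6}      # hypothetical hotspot contact indices (0-based)

designed = redesign_with_fixed_positions(
    template, hotspots, lambda i: rng.choice(AMINO_ACIDS)
)
print(template)
print(designed)
```

Only the designable positions are passed to the proposal function, so the interface residues come through unchanged no matter how the rest of the sequence is sampled.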
Controlling the diversity-stability tradeoff
Low temperature: conservative sequences with higher predicted stability and less sequence diversity. Best when the backbone geometry is well suited to the target.
Intermediate temperature: balanced sampling, our default for most campaigns. Good diversity with reasonable predicted foldability.
High temperature: diverse sequences and more exploration. Higher risk of non-folding designs, but occasionally finds unexpected solutions.
Stability and foldability metrics
Each ProteinMPNN output receives a score reflecting the model's confidence that the sequence is compatible with the target backbone. We rank all sequences per backbone and advance the top-ranked candidates to structural validation.
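The ranking step can be sketched as follows. This assumes the score is a mean negative log-likelihood over positions, where lower means the model is more confident; if a pipeline reports a different confidence measure, the sort direction may need to flip. The record layout, backbone names, probabilities, and the keep parameter are hypothetical.

```python
import math

def mean_negative_log_likelihood(chosen_probs):
    """Average of -log(p) over positions; lower means the model is more confident."""
    return sum(-math.log(p) for p in chosen_probs) / len(chosen_probs)

def rank_per_backbone(designs, keep):
    """Group designs by backbone and keep the `keep` lowest-scoring sequences of each."""
    by_backbone = {}
    for d in designs:
        by_backbone.setdefault(d["backbone"], []).append(d)
    return {
        bb: sorted(ds, key=lambda d: d["score"])[:keep]
        for bb, ds in by_backbone.items()
    }

# Hypothetical designs: the probability the model assigned to each chosen residue.
raw = [
    ("bb1", "seqA", [0.9, 0.8, 0.7]),
    ("bb1", "seqB", [0.5, 0.4, 0.6]),
    ("bb2", "seqC", [0.3, 0.9, 0.2]),
    ("bb2", "seqD", [0.6, 0.7, 0.8]),
]
designs = [
    {"backbone": bb, "sequence": name, "score": mean_negative_log_likelihood(p)}
    for bb, name, p in raw
]

top = rank_per_backbone(designs, keep=1)
for bb, ds in top.items():
    print(bb, [(d["sequence"], round(d["score"], 3)) for d in ds])
```

Ranking within each backbone, rather than across the whole pool, keeps one easy-to-design backbone from crowding out the others.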
Additional filters cover predicted solubility, aggregation propensity, and known problematic sequence motifs (e.g., unpaired cysteines, or N-glycosylation sites in designs that are not meant to be glycosylated).
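The motif checks are the simplest of these filters to illustrate. The sketch below flags N-linked glycosylation sequons (N, then any residue except P, then S or T) and applies a coarse cysteine check. The function names and the odd-count rule are assumptions for illustration; solubility and aggregation predictions need dedicated tools and are not shown.

```python
import re

# N-linked glycosylation sequon: N, any residue except P, then S or T.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return the 0-based start index of every N-X-S/T sequon (X is not P)."""
    return [m.start() for m in SEQUON.finditer(seq)]

def count_cysteines(seq):
    return seq.count("C")

def passes_motif_filters(seq):
    """Reject sequences with any sequon or an odd number of cysteines.

    An odd count guarantees at least one unpaired cysteine. An even count does
    not prove that all cysteines are paired, so this is only a coarse check.
    """
    return not find_sequons(seq) and count_cysteines(seq) % 2 == 0

# Hypothetical candidate sequences.
for seq in ["MKTAYIAKQR", "MKNGSAYIAK", "MKCAYIAKQR"]:
    print(seq, "pass" if passes_motif_filters(seq) else "fail")
```

Whether cysteines actually form disulfides depends on the three-dimensional structure, so a sequence-level count like this is best treated as a first pass before structure-based checks.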
Between backbone generation and structural validation
Backbone coordinates from RFdiffusion or Boltzgen
Sequence design + scoring
Scored sequences for validation
See how ProteinMPNN fits into the full pipeline
ProteinMPNN is one step in our integrated design-to-screening workflow. Explore the full pipeline or start a project.