When designing protein binders from scratch, one of the most consequential decisions is how you constrain the computational search. An unconstrained RFdiffusion campaign will generate diverse backbones that geometrically approach the target surface, but with no guidance about where on that surface the binder should actually make contact. The result is a large pool of candidates that engage varying epitopes with varying efficiency, most of which will not bind detectably.
Hotspot conditioning changes this. By specifying which target residues the designed scaffold should contact, you focus the generative model on the region of the target where binding energy can actually be extracted.
Not all interface residues contribute equally to binding energy
The principle is well established in protein-protein interaction biochemistry: at most natural protein interfaces, a small subset of residues (typically 2-8 of the 10-20 interface contacts) accounts for the majority of the binding free energy. These are the hotspot residues. The remaining interface residues contribute little to delta-G(binding) and play mainly structural or solvent-exclusion roles.
Alanine scanning mutagenesis, the classic method for identifying hotspots, replaces each interface residue with alanine and measures the change in binding affinity. Residues whose Ala mutation weakens binding by a delta-delta-G above a threshold, typically set between 1 and 2 kcal/mol, are operationally defined as hotspots. This has been measured for hundreds of protein-protein interactions, and the pattern is consistent: binding energy is concentrated.
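Turning alanine-scan affinities into hotspot calls is a small calculation: convert each wild-type and mutant dissociation constant into delta-delta-G = RT ln(Kd_mut / Kd_wt) and compare it with the cutoff. The sketch below assumes Kd values measured at 25 °C in consistent units. The mutation labels and affinities in the example are hypothetical, and the 2.0 kcal/mol default is simply one end of the 1-2 kcal/mol range above, not a fixed standard.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_kd(kd_wt: float, kd_mut: float, temp_k: float = 298.15) -> float:
    """delta-delta-G (kcal/mol) of a mutation, computed from dissociation constants.

    delta-delta-G = dG_mut - dG_wt = RT * ln(Kd_mut / Kd_wt).
    Positive values mean the Ala substitution weakens binding.
    """
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("Kd values must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_mut / kd_wt)


def call_hotspots(kd_wt: float, kd_by_mutant: dict[str, float],
                  threshold: float = 2.0, temp_k: float = 298.15) -> dict[str, float]:
    """Return {mutation: delta-delta-G} for mutations at or above the hotspot threshold."""
    ddgs = {mut: ddg_from_kd(kd_wt, kd, temp_k) for mut, kd in kd_by_mutant.items()}
    return {mut: ddg for mut, ddg in ddgs.items() if ddg >= threshold}


if __name__ == "__main__":
    # Hypothetical alanine-scan data: wild-type Kd of 10 nM, mutant Kds in molar units.
    scan = {"Y33A": 2.0e-6, "K45A": 1.5e-8, "W52A": 1.5e-6, "E60A": 8.0e-8}
    for mut, ddg in call_hotspots(1.0e-8, scan).items():
        print(f"{mut}: {ddg:.2f} kcal/mol")
```

With these made-up numbers, Y33A (a 200-fold affinity loss) and W52A (150-fold) pass the 2.0 kcal/mol cutoff, while E60A (8-fold, roughly 1.2 kcal/mol) would qualify only if the threshold were lowered to 1 kcal/mol. A 10-fold loss corresponds to about 1.36 kcal/mol at 25 °C.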
For de novo design, this means: if you specify hotspot residues and constrain the design model to contact them, you are directing the scaffold to engage the part of the target surface where the energy reward for binding is highest. Unconstrained generation, by contrast, may produce scaffolds that contact peripheral residues with low contribution to binding energy, generating sequences that look structurally plausible but fail in the binding assay.
How to identify hotspot residues computationally
When experimental alanine scanning data exist, they are the preferred source. Where they do not:
Interface energy decomposition. If a complex structure is available (or can be modeled with AlphaFold-Multimer / Boltz-2), per-residue interface energy can be decomposed using Rosetta’s InterfaceAnalyzer or FoldX. Residues with large negative contributions to the interface score are candidate hotspots.
Evolutionary conservation at functional sites. Residues conserved across orthologs at a binding interface are under selection pressure that correlates with functional importance. ConSurf or custom evolutionary trace analysis can identify these positions.
Literature-derived binding epitopes. For targets with published structures of their natural protein-protein complexes, the natural binding interface defines the epitope. Structures of ligand-receptor or protein-inhibitor complexes in the PDB are a direct source.
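Cryo-EM and HDX-MS data. A cryo-EM structure of a native complex can localize the interface even at moderate resolution. Hydrogen-deuterium exchange mass spectrometry (HDX-MS) measures changes in backbone amide exchange on binding, identifying protected (likely interfacial) regions at peptide-level resolution without an atomic-resolution structure.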
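When both a modeled complex and a multiple sequence alignment are available, the first two sources can be cross-checked against each other. The sketch below keeps residues whose per-residue interface energy is strongly favorable and whose alignment column is conserved (low Shannon entropy). It assumes the per-residue energies have already been exported (for example from InterfaceAnalyzer or FoldX) into a dictionary, and that one alignment column has been extracted per target residue. The input format, both cutoffs, and the combination rule itself are illustrative assumptions rather than a published protocol. Energy units also differ between tools (Rosetta energy units vs. kcal/mol), so the energy cutoff has to be set per tool.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column.upper() if aa not in "-."]
    if not residues:
        return math.inf  # an all-gap column carries no conservation signal
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def candidate_hotspots(interface_energy: dict[str, float],
                       columns: dict[str, str],
                       energy_cutoff: float = -1.0,
                       max_entropy: float = 1.0) -> list[str]:
    """Residues that are both energetically important and conserved, most favorable first.

    interface_energy: per-residue interface energy (more negative = more favorable).
    columns: alignment column for each target residue, keyed by the same residue ID.
    Both cutoffs are hypothetical defaults.
    """
    hits = [
        (res, energy)
        for res, energy in interface_energy.items()
        if energy <= energy_cutoff
        and res in columns
        and column_entropy(columns[res]) <= max_entropy
    ]
    hits.sort(key=lambda item: item[1])  # most negative energy first
    return [res for res, _ in hits]


if __name__ == "__main__":
    # Hypothetical per-residue energies and 8-sequence alignment columns.
    energies = {"A33": -2.4, "A45": -0.3, "A52": -1.8, "A60": -1.2}
    cols = {"A33": "YYYYYYYF", "A45": "KRKQEKRS", "A52": "WWWWWWWW", "A60": "EDKQSTNA"}
    print(candidate_hotspots(energies, cols))
```

In the example, A60 has a favorable energy but a completely variable column (entropy of 3 bits), so it is dropped, and A45 is dropped for its weak energy. Requiring agreement between independent lines of evidence trades some sensitivity for a shorter, more defensible list.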
Specifying hotspots for RFdiffusion
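In practice, hotspots are specified as a list of residue identifiers (chain and residue number) from the target structure. RFdiffusion uses them to condition backbone generation, steering the generated scaffolds to place residues within contact distance of the specified hotspots. This is a learned conditioning signal rather than a hard constraint, so not every design will contact every hotspot, and contacts should be checked in the output.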
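In the public RFdiffusion repository, binder design is run through scripts/run_inference.py with Hydra-style overrides, and hotspots are passed in the form ppi.hotspot_res=[A59,A83,A91]. The helper below only assembles such a command line; it does not run RFdiffusion. The target path, contig, binder length range, and residue IDs in the example are placeholders, and options a real campaign would also set (checkpoint choice, noise scale) are omitted, so the repository README remains the reference for current flags.

```python
import re
import shlex

# Chain letter followed by a residue number, e.g. "A59" (insertion codes not handled).
HOTSPOT_ID = re.compile(r"^[A-Za-z]\d+$")


def rfdiffusion_binder_command(target_pdb: str, target_contig: str, binder_length: str,
                               hotspots: list[str], output_prefix: str,
                               num_designs: int = 10) -> str:
    """Assemble (but do not run) an RFdiffusion binder-design command line."""
    bad = [h for h in hotspots if not HOTSPOT_ID.match(h)]
    if bad:
        raise ValueError(f"malformed hotspot IDs: {bad}")
    args = [
        "./scripts/run_inference.py",
        f"inference.input_pdb={target_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
        # Target segment, chain break (/0), then the designed binder length range.
        f"contigmap.contigs=[{target_contig}/0 {binder_length}]",
        "ppi.hotspot_res=[" + ",".join(hotspots) + "]",
    ]
    return shlex.join(args)  # quotes arguments that contain brackets or spaces


if __name__ == "__main__":
    print(rfdiffusion_binder_command(
        target_pdb="target.pdb", target_contig="A1-150", binder_length="70-100",
        hotspots=["A59", "A83", "A91"], output_prefix="outputs/binder"))
```

Quoting each override as a single shell argument matters because the contig string contains a space; without quotes the shell would split it into two arguments. Validating the hotspot IDs up front catches typos before a long GPU job starts.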
Practical guidance from production campaigns:
Specify 3-8 hotspot residues. Fewer than 3 gives insufficient constraint; the model will produce diverse backbones but many won’t contact the intended region. More than roughly 10 overconstrains the problem and reduces the scaffold diversity you need for a successful screen.
Prefer hotspots in the core of the epitope, not the periphery. Edge residues at the interface boundary are often partially solvent-exposed and contribute less to binding energy. Central, buried hotspot residues are better anchors for a designed scaffold.
Check structural accessibility. A computationally predicted hotspot that is in a crystal packing contact, involved in an allosteric site, or at the base of a very deep groove may not be practically accessible to an external binder. Visual inspection of the target structure before finalizing the hotspot list is worthwhile.
Avoid specifying hotspots in disordered or flexible loop regions. RFdiffusion’s conditioning on hotspot positions assumes those positions are well-defined in 3D space. High B-factor residues or regions with significant structural heterogeneity across crystal forms are poor hotspot anchors.
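Parts of this checklist can be automated before a run. The sketch below flags a hotspot list that falls outside the suggested size range, residues with unusually high B-factors (scored as a z-score against the rest of the structure, so the cutoff does not depend on the structure's overall B-factor scale), and residues with low relative solvent accessibility in the unbound target. It assumes per-residue B-factors and relative SASA values have already been computed with a structure parser and SASA tool and keyed by residue ID. The cutoffs (z above 1.5, relative SASA below 0.1) are hypothetical defaults, and the check does not replace visual inspection for crystal contacts or deep grooves.

```python
import statistics


def review_hotspots(hotspots: list[str], b_factors: dict[str, float],
                    rel_sasa: dict[str, float], max_b_z: float = 1.5,
                    min_rel_sasa: float = 0.1) -> list[str]:
    """Return human-readable warnings for a proposed hotspot list (empty list = no flags)."""
    warnings = []
    n = len(hotspots)
    if n < 3:
        warnings.append(f"{n} hotspots: likely too few to constrain the search")
    elif n > 10:
        warnings.append(f"{n} hotspots: likely overconstrained")
    elif n > 8:
        warnings.append(f"{n} hotspots: above the suggested 3-8 range")

    values = list(b_factors.values())
    mean_b = statistics.mean(values)
    sd_b = statistics.pstdev(values)
    for res in hotspots:
        if res not in b_factors:
            warnings.append(f"{res}: not found in the target structure")
            continue
        # Normalized B-factor: how far this residue sits above the structure-wide mean.
        z = (b_factors[res] - mean_b) / sd_b if sd_b > 0 else 0.0
        if z > max_b_z:
            warnings.append(f"{res}: B-factor z-score {z:.1f}, possibly flexible")
        # Missing SASA values are treated as inaccessible so they get a second look.
        if rel_sasa.get(res, 0.0) < min_rel_sasa:
            warnings.append(f"{res}: relative SASA below {min_rel_sasa}, may be inaccessible")
    return warnings


if __name__ == "__main__":
    # Hypothetical per-residue values for a small target.
    b = {"A30": 25, "A33": 28, "A34": 30, "A37": 27, "A40": 26, "A90": 85, "A95": 24, "A99": 29}
    sasa = {"A30": 0.45, "A33": 0.60, "A34": 0.05, "A90": 0.70}
    for w in review_hotspots(["A30", "A33", "A34", "A90"], b, sasa):
        print(w)
```

With the made-up values above, A34 is flagged as nearly buried in the free target and A90 as a likely flexible residue, while A30 and A33 pass. The size check mirrors the 3-8 guidance and only warns; whether to act on a flag remains a judgment call after inspecting the structure.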
Hotspot specification vs. hit rate: what the data show
In Ranomics’ campaigns, hotspot-conditioned runs consistently outperform unconstrained generation in confirmed hit rate per screened sequence. The improvement is most pronounced for targets with complex surface topology where the productive epitope is a small fraction of the total accessible surface.
For flat or extended surfaces with multiple equally accessible regions, the gain is smaller. In those cases, unconstrained generation covers the productive epitope by chance at reasonable frequency.
The decision to invest in hotspot definition (experimental alanine scanning, or rigorous computational decomposition) should be scaled to the target difficulty and campaign budget. For a straightforward extracellular domain, an AlphaFold-Multimer-derived hotspot estimate is usually sufficient. For a challenging target where previous campaigns have failed, investing in experimental epitope definition before running a de novo design campaign is often the highest-value use of resources.
Ranomics uses hotspot-guided design in every binder campaign.