Ranomics
Scientific research and computational biology
yeast display, library design, protein engineering, NGS, VHH, nanobody

How Big a Yeast Display Library Do You Need for a 10 nM Binder?

The most common scoping question on a yeast display intake call is some version of “how big a library do we need?” The honest answer is that it depends on where you are starting and where you are going. This post walks through the math for a 10 nM KD target, which is a typical specification for a preclinical antibody lead or a usable tool binder. The same framework applies to tighter or looser specifications with adjusted numbers.

Starting Material Sets the Floor

The library you screen is not the library you build. The useful diversity is the diversity of the starting population that already has some probability of binding the target. This varies by an order of magnitude or more depending on how the library was constructed.

Naive libraries. A fully synthetic library with random CDR diversification has no bias toward your target. Most hits that enrich in a naive library sit in the 100 nM to 1 µM range, and reaching 10 nM almost always requires a separate affinity maturation campaign built on a first-round lead. Plan for two campaigns in sequence: a discovery campaign at 10^8 to 10^9 diversity, followed by a maturation library built around a validated hit.

Immunized libraries. A VHH library cloned from an immunized llama or alpaca has already been selected in vivo for binding to the target. The starting population is enriched for antigen-specific B cells, and clonal lineages that bind the target are over-represented. A well-constructed immunized library of 10^7 to 10^8 transformants can reach single-digit nanomolar affinity directly, without a separate maturation step. The practical constraint is library construction quality, not the diversity ceiling.

Computational pools. A library of de novo designs from RFdiffusion, BindCraft, or a related pipeline is already pre-filtered for shape complementarity and predicted binding energy. Most of the designs that make it through self-consistency filtering have a measurable probability of binding. A well-curated pool of 10^3 to 10^5 computational designs, screened on yeast, can deliver sub-100 nM hits in a single campaign, and affinity maturation on the top hits reaches 10 nM without a second discovery library.

The KD goal and the starting material together define the minimum useful library size. For a 10 nM target, a naive library needs 10^8 or more to give affinity maturation a working lead. An immunized library needs 10^7 or more to capture the rare high-affinity lineages. A computational pool only needs to cover the designs that survived filtering, typically 10^3 to 10^5.

The Coverage Math: Poisson Sampling

Library construction is a sampling process. If your transformation yields 10^7 transformants in total and your theoretical DNA diversity is 10^8, you have not sampled 10^8 sequences; you have sampled roughly 10^7 sequences, with replacement, from a 10^8 pool. The Poisson distribution tells you what fraction of theoretical variants appear at least once.
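A minimal sketch of that calculation, assuming every theoretical variant is equally likely to be drawn. Under that assumption the copy number of each variant is Poisson with mean T/D (transformants over theoretical diversity), so the fraction present at least once is 1 − e^(−T/D). Real libraries carry codon and cloning bias, which lowers coverage, so treat the result as an upper bound. The function name is illustrative.

```python
import math

def poisson_coverage(transformants: float, theoretical_diversity: float) -> float:
    """Expected fraction of theoretical variants present at least once.

    Assumes a uniform DNA pool, so each variant's copy number is
    Poisson-distributed with mean transformants / theoretical_diversity.
    """
    mean_copies = transformants / theoretical_diversity
    return 1.0 - math.exp(-mean_copies)

# The example from the text: 10^7 transformants drawn from a 10^8 pool.
print(f"{poisson_coverage(1e7, 1e8):.1%}")
# Oversampling the pool threefold pushes coverage to about 95%.
print(f"{poisson_coverage(3e8, 1e8):.1%}")
```

For the 10^7-from-10^8 case this gives about 9.5%, or roughly 9.5 × 10^6 distinct variants: slightly fewer than the number of transformants, because some variants are drawn more than once.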

For the NGS readout, the same Poisson math applies to reads. If you want to see 95% of your library members at least once in an NGS run, you need roughly three times the library size in reads. For a 10^8-member input library, that is ~3 × 10^8 reads per round, across three to four rounds, plus the zero-selection baseline. On a NextSeq 550 or similar high-output Illumina platform this is feasible, though a full campaign at this depth spans several runs; a single MiSeq run falls well short. Scope NGS capacity before you scope library size.
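Inverting the same relation gives the read count for a target coverage c over N members: R = −N ln(1 − c). This is a sketch that assumes reads are spread evenly across members; skewed populations need more depth. The five-sample campaign below is a hypothetical upper bound that budgets every sample at full input depth.

```python
import math

def reads_for_coverage(library_size: float, coverage: float) -> float:
    """Reads needed so a fraction `coverage` of members is seen at least once.

    Inverts coverage = 1 - exp(-reads / library_size), assuming every
    member is equally abundant in the sequenced sample.
    """
    return -library_size * math.log(1.0 - coverage)

per_sample = reads_for_coverage(1e8, 0.95)
print(f"{per_sample:.2e} reads per sample")  # roughly 3x the library size

# Hypothetical campaign: zero-selection baseline plus four sorted rounds,
# each budgeted at full input depth (later rounds usually need far less).
samples = 1 + 4
print(f"{samples * per_sample:.2e} reads in total")
```

Because −ln(0.05) ≈ 3.0, the "three times the library size" rule of thumb falls straight out of the formula; pushing coverage to 99% raises the multiplier to about 4.6.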

Read requirements drop as selection concentrates the library. Post-sort populations contain far fewer unique clones, so the same read count buys deeper per-clone coverage and finer enrichment resolution on the surviving clones. Many groups budget higher read counts for the zero-selection baseline and the final sorted population, with lighter coverage on intermediate rounds.

Sort Gate Stringency

A FACS sort on a yeast display library selects on a single criterion: binding signal normalized to surface expression, typically drawn as a diagonal gate on a binding-versus-expression plot. The stringency of the gate sets how much of the library passes each round.

At a 10 nM target concentration, the expected fraction of displayed variants that bind at or above the gate threshold is roughly the fraction of the library with KD at or below ~10 nM, since a clone whose KD equals the labeling concentration is half-occupied at equilibrium. For a naive library this is typically 0.01% to 0.1% of the library. For an immunized library it can be 0.1% to 1%. For a computational pool it can be 1% to 10%. Those rates compound across rounds.

A typical four-round FACS campaign against a 10 nM target starts by sorting 10^8 cells in round 1, recovers around 10^4 to 10^6 positive cells, grows them to 10^7 to 10^8, then sorts again. By round 4 the post-sort population is dominated by a few hundred to a few thousand enriched clones. NGS resolves the ranking.

If the input library is too small relative to the expected positive fraction, the round 1 post-sort population can drop below the ~10^4 cells needed to regrow a representative population. If the input library is too large relative to the expected positive fraction, round 1 oversamples bystanders and FACS throughput becomes the bottleneck. Library size and expected hit rate need to be specified together.
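A rough round-1 check, written as a hypothetical helper: multiply the cells run through the sorter by the expected positive fraction, compare the result against the ~10^4 recovery floor, and estimate sorter time. The rate of 2 × 10^4 events per second is an assumption for illustration, not a specification for any particular instrument; the positive fractions reuse the ranges quoted above.

```python
# Hypothetical round-1 sanity check; the sorter rate is an assumed value.
MIN_RECOVERED_CELLS = 1e4    # recovery floor quoted in the text
SORTER_EVENTS_PER_SEC = 2e4  # assumed event rate; substitute your instrument's

def round1_check(cells_analyzed: float, positive_fraction: float) -> dict:
    """Expected round-1 positives and sort time for one input size and hit rate."""
    positives = cells_analyzed * positive_fraction
    sort_hours = cells_analyzed / SORTER_EVENTS_PER_SEC / 3600.0
    return {
        "expected_positives": positives,
        "sort_hours": sort_hours,
        "too_few_positives": positives < MIN_RECOVERED_CELLS,
    }

scenarios = [
    ("naive, 1e8 cells, 0.1% positive", 1e8, 1e-3),
    ("naive, 1e7 cells, 0.01% positive", 1e7, 1e-4),   # input too small
    ("naive, 1e9 cells, 0.01% positive", 1e9, 1e-4),   # sorter time dominates
    ("computational, 1e7 cells, 1% positive", 1e7, 1e-2),
]
for label, cells, frac in scenarios:
    r = round1_check(cells, frac)
    print(f"{label}: {r['expected_positives']:.0e} positives, "
          f"{r['sort_hours']:.1f} h of sorting, "
          f"under floor: {r['too_few_positives']}")
```

The undersized naive input recovers only ~10^3 cells, while the 10^9-cell input clears the floor but ties up the sorter for about 14 hours at the assumed rate. The pre-enriched pool reaches 10^5 positives from a tenth of the cells.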

The KD Ladder Across Rounds

Multi-round yeast display selection against a specific KD target uses a decreasing target concentration ladder. A common specification for a 10 nM endpoint goal:

  • Round 1: target concentration at 100 nM. Tenfold above the goal, to capture anything binding at or tighter than the goal with margin. Relatively permissive gate.
  • Round 2: target concentration at 10 nM. At the goal. Tighter gate to enrich true 10 nM binders over bystanders.
  • Round 3: target concentration at 1 nM. Tenfold below the goal. Selects for the tightest binders in the enriched population. This round is sometimes replaced or paired with an off-rate selection, where the library is labeled at a saturating target concentration and then chased with excess unlabeled target, so that only slowly dissociating clones retain signal.
  • Round 4 (optional): counter-selection, a specificity panel, or a 100 pM round for further affinity resolution. Used when the downstream readout needs specificity information (for example, cross-reactivity with close homologs) rather than raw affinity.
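To see how each rung discriminates between clones, here is a sketch of equilibrium occupancy, θ = [L] / ([L] + KD), evaluated along the ladder. It assumes binding has reached equilibrium and that the cells do not deplete the free target. Both assumptions get harder to satisfy at 1 nM and below, where longer incubations and larger labeling volumes are usually needed. The clone KD values are illustrative.

```python
def occupancy(target_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of displayed binders occupied at a free target concentration."""
    return target_nM / (target_nM + kd_nM)

ladder_nM = {"Round 1": 100.0, "Round 2": 10.0, "Round 3": 1.0}
clone_kds_nM = [1.0, 10.0, 100.0, 1000.0]  # illustrative clone affinities

for label, conc in ladder_nM.items():
    row = ", ".join(f"KD {kd:g} nM: {occupancy(conc, kd):.0%}" for kd in clone_kds_nM)
    print(f"{label} at {conc:g} nM -> {row}")
```

At 100 nM, a 10 nM clone sits near 91% occupancy while a 1 µM clone sits near 9%. At 1 nM the same 10 nM clone drops to about 9%, which is why the third rung favors clones tighter than the goal.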

The ladder converts a broad 10^8 library into a ranked list of tens to low thousands of clones at the endpoint. NGS on each round lets you track enrichment and pick the clones that climb consistently across rounds rather than the ones that win a single round by luck.
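One common way to score round-over-round enrichment from NGS counts is sketched below under stated assumptions. Clone frequencies are normalized to total reads per round, a pseudocount of 0.5 avoids log(0) for clones that drop out, and a clone counts as a consistent climber only if its log2 frequency change is positive at every step. The function names and read counts are hypothetical, and this is not presented as any specific pipeline's method.

```python
import math

def log2_enrichment(counts_by_round, pseudocount=0.5):
    """Log2 frequency change per clone between consecutive rounds.

    counts_by_round: list of dicts (clone ID -> read count), ordered from the
    zero-selection baseline to the final sorted round.
    """
    clones = set().union(*counts_by_round)
    totals = [sum(counts.values()) for counts in counts_by_round]
    scores = {}
    for clone in clones:
        # Pseudocount avoids log(0) for clones absent from a round.
        freqs = [(counts.get(clone, 0) + pseudocount) / total
                 for counts, total in zip(counts_by_round, totals)]
        scores[clone] = [math.log2(after / before)
                         for before, after in zip(freqs, freqs[1:])]
    return scores

def consistent_climbers(scores):
    """Clones whose frequency rises at every round-to-round step."""
    return sorted(clone for clone, steps in scores.items()
                  if all(step > 0 for step in steps))

# Hypothetical read counts for three clones across baseline + three rounds.
rounds = [
    {"A": 10, "B": 10, "C": 980},    # zero-selection baseline
    {"A": 40, "B": 300, "C": 660},   # round 1
    {"A": 200, "B": 100, "C": 700},  # round 2
    {"A": 600, "B": 50, "C": 350},   # round 3
]
print(consistent_climbers(log2_enrichment(rounds)))  # ['A']
```

In the toy data, clone B wins round 1 and then falls away, the single-round luck the text warns about, while clone A rises at every step.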

Library Size Sanity Checks

Two failure modes are worth checking before committing to synthesis.

Theoretical diversity exceeds the yeast ceiling. If your diversification pattern encodes more than ~10^9 theoretical DNA variants, you cannot sample it in a single yeast library. Transformation efficiency caps at around 10^8 transformants per electroporation, and pooling many electroporations can stretch this to 10^9 at significant expense. If the math says 10^10 or more, you are guaranteed to undersample. Either narrow the diversification pattern, use a trimer mix to remove redundant codons, or split the design into focused sub-libraries that cover different regions of sequence space.
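Here is a sketch of the ceiling check for a simple design with n randomized positions. It assumes NNK codons (32 codons per position, one of them the TAG stop) versus a trimer mix taken here as 20 codons with no stops; trimer sets that omit cysteine would have 19. It also reports the stop-free fraction, (31/32)^n for NNK, as a proxy for stop codon load. The 10^9 ceiling is the figure quoted above, and the function name is hypothetical.

```python
import math

YEAST_CEILING = 1e9  # single-library ceiling quoted in the text

def diversity_report(n_positions, codons_per_position, stops_per_position=0):
    """Theoretical DNA diversity and stop-codon load for n randomized positions."""
    dna_diversity = codons_per_position ** n_positions
    # Probability that none of the n positions draws a stop codon.
    stop_free = ((codons_per_position - stops_per_position)
                 / codons_per_position) ** n_positions
    return {
        "dna_diversity": dna_diversity,
        "log10_diversity": math.log10(dna_diversity),
        "stop_free_fraction": stop_free,
        "exceeds_ceiling": dna_diversity > YEAST_CEILING,
    }

for n in (6, 8):
    nnk = diversity_report(n, 32, stops_per_position=1)  # NNK: 32 codons, 1 stop (TAG)
    tri = diversity_report(n, 20)                        # assumed 20-codon trimer mix
    print(f"{n} positions: NNK 10^{nnk['log10_diversity']:.1f} "
          f"({nnk['stop_free_fraction']:.0%} stop-free, "
          f"over ceiling: {nnk['exceeds_ceiling']}); "
          f"trimer 10^{tri['log10_diversity']:.1f} "
          f"(over ceiling: {tri['exceeds_ceiling']})")
```

Six NNK positions already reach about 10^9 DNA variants, with roughly one clone in six carrying a stop codon. The same six positions as trimers stay near 6 × 10^7. At eight positions, even the trimer design exceeds the ceiling.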

Library smaller than 10^6. If your pool has fewer than a million variants and no pre-enrichment for binding (for example, 500 rationally designed point mutants), yeast display is often not the right screening format. Individual-clone ELISA or low-throughput SPR against a handful of candidates can deliver affinity-ranked hits faster, without the overhead of library construction, FACS, and NGS. Yeast display earns its cost when the diversity is high enough that individual-clone screening is impractical; pre-filtered computational pools, covered below, are the main exception to this size floor.

The Yeast Display Library Planner flags both of these cases automatically against the yeast transformation ceiling and the NGS read budget you specify.

Computational Design Plus Display: The Best Case

The campaign architecture with the cleanest library size math is a pool of 10^3 to 10^5 computationally generated designs screened on yeast. The pool is small enough to transform fully, large enough to justify FACS over individual-clone screening, and pre-filtered enough that the per-clone hit rate is meaningful.

A representative workflow: RFdiffusion generates 10,000 backbone designs against a defined hotspot patch. ProteinMPNN redesigns sequences onto the backbones. Self-consistency filtering by ESMFold, ColabFold, or Boltz-2 removes designs that do not refold to the intended backbone (typically 50% to 80% of raw output). The filtered pool of 1,000 to 3,000 designs is synthesized as an oligo pool, cloned into the yeast display vector, transformed, and sorted against the target.

Because the starting pool is already enriched for predicted binders, hit rates in round 1 typically run at 1% to 10%, not 0.01%. The FACS throughput requirement drops by two orders of magnitude. Three rounds at a decreasing KD ladder often reach 10 nM directly, without a separate affinity maturation campaign. End-to-end timeline runs three to four weeks from a ready-to-synthesize design pool to a ranked hit list with NGS enrichment data.

This architecture is the backbone of the AI Binder Sprint flagship program, which pairs RFdiffusion, BindCraft, and similar generative tools with in-house yeast display screening.

The Short Answer

For a 10 nM endpoint KD target:

  • Naive library: 10^8 or larger, plus a follow-on affinity maturation campaign.
  • Immunized library: 10^7 to 10^8, single campaign feasible.
  • Computational pool: 10^3 to 10^5, single campaign feasible with a cleaner hit rate distribution than a naive library.

Library size is a downstream variable. Start with the target KD, the starting material, and the sort budget; library size falls out of that specification, not the other way around.

Scoping a Campaign

To run these numbers against your specific target and starting material, the Yeast Display Library Planner takes library inputs and a sort plan and returns theoretical diversity, achievable diversity, stop codon load, and NGS read depth per round. It flags cases where the plan is mathematically infeasible before any synthesis budget is committed.

For a scoped yeast display campaign end to end, the Yeast Display service page covers platform capabilities. The AI Binder Sprint pairs computational design with display validation for campaigns that benefit from a pre-enriched starting pool. To scope a project, use the contact form and include the target, the starting library architecture (naive, immunized, or computational), and the KD specification.
