Ranomics
Scientific research and computational biology
NGS, yeast display, library screening, bioinformatics, antibody discovery

Deconvoluting Polyclonal Hits: Strategies for Characterizing Enriched Library Pools

Your yeast display or mammalian display screen is finished, and now you face a complex NGS dataset in which simply choosing the most abundant clone can lead to costly mistakes. This guide provides a strategic framework for deconvoluting polyclonal hits by moving beyond raw frequency to analyze enrichment ratios and patterns of convergent evolution.

The Foundation: Why Simple Abundance is Not Enough

Relying solely on the final frequency of a clone is a flawed strategy because it ignores the history of the selection process.

Reasons a clone can be highly abundant without being a top performer:

  • “Jackpot” Effect: The clone was already over-represented in the initial library due to synthesis or cloning bias
  • PCR Amplification Bias: Some sequences amplify more efficiently than others, inflating their read counts
  • Modest Binders with High Display Levels: A cell displaying many copies of a mediocre binder can appear brighter in the sort than a cell displaying far fewer copies of an elite binder

The key is not to ask “Which clone is most common at the end?” but rather, “Which clone showed the most significant and consistent improvement throughout the selection?”

The Primary Metric: Calculating Enrichment Ratios

Enrichment Ratio = (Frequency of Variant in Final Round) / (Frequency of Variant in Unselected Library)

You must deep-sequence both your final enriched pool and your initial, unselected (Round 0) library. A variant that started at a frequency of 0.001% and ended at 1% (a 1000x enrichment) is often far more interesting than a variant that started at 0.5% and ended at 2% (a 4x enrichment).
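
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the function names, the input format (a dictionary of read counts per variant), and the optional pseudocount are all assumptions made for the example. The pseudocount is one common way to avoid dividing by zero when a variant was not observed in Round 0.

```python
def frequencies(counts):
    """Convert raw read counts per variant into fractions of the pool."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def enrichment_ratios(round0_counts, final_counts, pseudocount=0.5):
    """Enrichment = final-round frequency / Round 0 frequency, per variant.

    The pseudocount (an assumption, not part of the formula above) is added
    to every count so that variants missing from one pool do not produce a
    division by zero. Set it to 0 to apply the plain formula.
    """
    variants = set(round0_counts) | set(final_counts)
    r0 = frequencies({v: round0_counts.get(v, 0) + pseudocount for v in variants})
    fin = frequencies({v: final_counts.get(v, 0) + pseudocount for v in variants})
    return {v: fin[v] / r0[v] for v in variants}


# Hypothetical read counts that reproduce the two variants described above.
round0 = {"rare_variant": 1, "common_variant": 500, "everything_else": 99_499}
final = {"rare_variant": 1_000, "common_variant": 2_000, "everything_else": 97_000}

ratios = enrichment_ratios(round0, final, pseudocount=0)
for name in ("rare_variant", "common_variant"):
    print(f"{name}: {ratios[name]:.0f}x enrichment")
```

With the pseudocount set to 0, the two variants come out at roughly 1000x and 4x, matching the worked numbers in the text. With real data, a variant supported by only a handful of Round 0 reads has a noisy ratio, so it is worth reporting the underlying read counts alongside it.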

Identifying Convergent Evolution: The Power of Sequence Families

Group the enriched variants into families of closely related sequences, then look for these key patterns:

  • Are entire families enriching? If a cluster of 20 related sequences all show high enrichment ratios, that is strong evidence the underlying structural solution is robust and effective.
  • Are there consensus mutations? Aligning sequences within an enriching family identifies key consensus mutations driving improved function.
  • Are there shared motifs across different families? Sometimes, different sequence families will independently discover the same solution at a key position.
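
One simple way to explore these patterns is to cluster sequences by mismatch count and then take a per-position consensus within each cluster. The sketch below assumes equal-length sequences (for example, a single CDR of fixed length) and uses Hamming distance with single-linkage grouping. The function names, the distance threshold, and the toy sequences are illustrative assumptions; real datasets with length variation would need an alignment-based distance instead.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_families(seqs, max_dist=1):
    """Single-linkage grouping: a sequence joins (and merges) every family
    containing at least one member within max_dist mismatches of it."""
    families = []
    for s in seqs:
        linked, rest = [], []
        for fam in families:
            close = any(len(s) == len(t) and hamming(s, t) <= max_dist for t in fam)
            (linked if close else rest).append(fam)
        merged = [s] + [t for fam in linked for t in fam]
        families = rest + [merged]
    return families


def consensus(family):
    """Most common residue at each position of an equal-length family."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*family))


# Toy sequences, invented for illustration only.
enriched = ["ARDYW", "ARDFW", "ARDYF", "GGSTT"]
for fam in sorted(cluster_families(enriched), key=len, reverse=True):
    print(len(fam), "members, consensus:", consensus(fam))
```

Comparing the consensus of each enriching family against the parental sequence highlights candidate driver mutations, and comparing consensus positions between families is one way to spot motifs that were discovered independently.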

Putting It All Together: A Candidate Selection Framework

Selection matrix criteria:

  • Enrichment Ratio: Quantitative measure of selection success
  • Final Abundance: Abundant enough to be real, not a sequencing artifact
  • Family Convergence: Part of a larger enriching family (confidence score)
  • Sequence Liabilities: Red flags like glycosylation sites, deamidation motifs, or unpaired cysteines
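
The sequence-liability check lends itself to a simple pattern scan. The sketch below flags the canonical N-linked glycosylation sequon (N-X-S/T, where X is any residue except proline), two commonly cited deamidation-prone motifs (NG and NS), and an odd number of cysteines as a rough proxy for an unpaired cysteine. The motif list and the odd-count heuristic are assumptions chosen for illustration, not an exhaustive developability screen.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
LIABILITY_PATTERNS = {
    "N-glycosylation sequon": re.compile(r"(?=(N[^P][ST]))"),
    "deamidation-prone motif": re.compile(r"(?=(N[GS]))"),
}


def find_liabilities(seq):
    """Return a list of (liability name, 1-based position, motif) tuples."""
    flags = []
    for name, pattern in LIABILITY_PATTERNS.items():
        for match in pattern.finditer(seq):
            flags.append((name, match.start() + 1, match.group(1)))
    # Heuristic (assumption): an odd cysteine count suggests one is unpaired.
    if seq.count("C") % 2 == 1:
        flags.append(("odd cysteine count", None, None))
    return flags


# Toy sequence, invented for illustration only.
for flag in find_liabilities("QVNGSAC"):
    print(flag)
```

A flag is a prompt for closer inspection rather than an automatic rejection: whether a motif matters depends on where it sits in the structure and on the intended expression host.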

Hypothetical case:

  • Candidate A: #1 most abundant (5% final frequency), modest 15x enrichment, orphan clone
  • Candidate B: #30 most abundant (0.5% final frequency), massive 800x enrichment, lead member of a family of 25 enriching variants sharing a key mutation at position H52

Candidate B is a far more compelling and well-validated hit than Candidate A.
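
To make the comparison concrete, the criteria can be combined into a rough ranking. The sketch below filters out candidates below a minimum final frequency, then scores the rest on log-scaled enrichment and family size, with a penalty per sequence liability. The `Candidate` fields, the weights, and the frequency threshold are hypothetical choices for illustration; the article does not prescribe a specific scoring formula.

```python
from dataclasses import dataclass
from math import log10


@dataclass
class Candidate:
    name: str
    final_freq: float   # fraction of the final pool, e.g. 0.05 for 5%
    enrichment: float   # fold enrichment over Round 0
    family_size: int    # enriching variants in the family (1 = orphan)
    liabilities: int = 0


def score(c, family_weight=0.5, liability_penalty=1.0):
    """Hypothetical composite score; the weights are arbitrary assumptions."""
    return (log10(c.enrichment)
            + family_weight * log10(c.family_size)
            - liability_penalty * c.liabilities)


def rank(candidates, min_final_freq=1e-4):
    """Drop candidates too rare to trust, then sort by descending score."""
    credible = [c for c in candidates if c.final_freq >= min_final_freq]
    return sorted(credible, key=score, reverse=True)


candidates = [
    Candidate("A", final_freq=0.05, enrichment=15, family_size=1),
    Candidate("B", final_freq=0.005, enrichment=800, family_size=25),
]
for c in rank(candidates):
    print(f"Candidate {c.name}: score {score(c):.2f}")
```

Under these assumed weights Candidate B ranks ahead of Candidate A despite its tenfold lower final frequency. A score like this is best used to shortlist clones for individual expression and binding measurements, not to replace them.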

Conclusion: From Data to Discovery

Deconvoluting a polyclonal NGS dataset is an investigative process that blends quantitative analysis with biological intuition.

For a worked example of enrichment-ratio ranking and convergent-hotspot deconvolution from a real campaign, see our case study on pH-dependent antibody engineering via yeast surface display — 640 clones, six FACS sorts, and the convergent residues (designated H-1 and H-2) that the analysis surfaced.

