Your yeast display or mammalian display screen is finished, but now you face a complex NGS dataset in which simply choosing the most abundant clone can lead to costly mistakes. This guide provides a strategic framework for deconvoluting polyclonal hits by moving beyond simple frequency to analyze enrichment ratios and patterns of convergent evolution.
The Foundation: Why Simple Abundance is Not Enough
Relying solely on the final frequency of a clone is a flawed strategy because it ignores the history of the selection process.
Reasons a clone can be highly abundant without being a top performer:
- “Jackpot” Effect: Over-represented in the initial library due to synthesis or cloning bias
- PCR Amplification Bias: Some sequences are amplified more easily than others
- Modest Binders with High Display Levels: A cell displaying many copies of a mediocre binder can give a brighter signal than a cell displaying few copies of an elite binder
The key is not to ask “Which clone is most common at the end?” but rather, “Which clone showed the most significant and consistent improvement throughout the selection?”
The Primary Metric: Calculating Enrichment Ratios
Enrichment Ratio = (Frequency of Variant in Final Round) / (Frequency of Variant in Unselected Library)
You must deep-sequence both your final enriched pool and your initial, unselected (Round 0) library. A variant that started at a frequency of 0.001% and ended at 1% (a 1000x enrichment) is often far more interesting than a variant that started at 0.5% and ended at 2% (a 4x enrichment).
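The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the function names, the read counts, and the optional pseudocount (added so that variants unseen in Round 0 do not cause a division by zero) are all assumptions introduced here.

```python
def to_frequencies(counts):
    """Convert raw read counts per variant into fractional frequencies."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def enrichment_ratios(round0_counts, final_counts, pseudocount=1):
    """Enrichment ratio = final-round frequency / Round 0 frequency.

    The pseudocount is an assumption of this sketch, not part of the formula
    above. With pseudocount=0, a variant absent from Round 0 raises
    ZeroDivisionError.
    """
    variants = set(round0_counts) | set(final_counts)
    round0 = {v: round0_counts.get(v, 0) + pseudocount for v in variants}
    final = {v: final_counts.get(v, 0) + pseudocount for v in variants}
    f0, f1 = to_frequencies(round0), to_frequencies(final)
    return {v: f1[v] / f0[v] for v in variants}


# Hypothetical read counts chosen to mirror the two variants described above:
# variant_1 goes from 0.001% to 1%, variant_2 from 0.5% to 2%.
round0 = {"variant_1": 1, "variant_2": 500, "other": 99_499}
final = {"variant_1": 1_000, "variant_2": 2_000, "other": 97_000}

ratios = enrichment_ratios(round0, final, pseudocount=0)
for name in ("variant_1", "variant_2"):
    print(f"{name}: {ratios[name]:.0f}x")
```

With real data, where many variants have only a handful of Round 0 reads, keeping a nonzero pseudocount is usually the safer choice, because ratios built on one or two reads are very noisy.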
Identifying Convergent Evolution: The Power of Sequence Families
Key patterns to look for:
- Are entire families enriching? If a cluster of 20 related sequences all show high enrichment ratios, it gives strong confidence that this structural solution is robust and effective.
- Are there consensus mutations? Aligning sequences within an enriching family can reveal the consensus mutations that are likely driving improved function.
- Are there shared motifs across different families? Sometimes, different sequence families will independently discover the same solution at a key position.
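One way to look for these patterns is sketched below, assuming equal-length sequences (for example, a single CDR), a simple Hamming-distance threshold for family membership, and a known parental sequence. The function names, the thresholds, and the toy sequences are hypothetical; real datasets with length variation would typically need a proper alignment step first.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_families(seqs, max_dist=2):
    """Single-linkage clustering: a sequence joins (and merges) every family
    that contains a member of the same length within max_dist mismatches."""
    families = []
    for s in seqs:
        linked, rest = [], []
        for fam in families:
            near = any(len(m) == len(s) and hamming(m, s) <= max_dist for m in fam)
            (linked if near else rest).append(fam)
        merged = [s] + [m for fam in linked for m in fam]
        families = rest + [merged]
    return families


def consensus_mutations(family, parent, min_fraction=0.8):
    """Positions (1-based) where at least min_fraction of the family shares
    the same non-parental residue, reported as e.g. 'Y4W'."""
    hits = {}
    for pos, wt in enumerate(parent):
        residue, n = Counter(m[pos] for m in family).most_common(1)[0]
        if residue != wt and n / len(family) >= min_fraction:
            hits[pos + 1] = f"{wt}{pos + 1}{residue}"
    return hits


# Toy example: three related variants of a hypothetical parent, plus one orphan.
parent = "ARDYYGSS"
enriched = ["ARDWYGSS", "ARDWYGST", "ARDWFGSS", "QQQQQQQQ"]

for fam in cluster_families(enriched):
    if len(fam) > 1:
        print(len(fam), "members, consensus:", consensus_mutations(fam, parent))
```

Running the same consensus call on each family and comparing the reported positions is a simple way to spot motifs shared across otherwise unrelated families.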
Putting It All Together: A Candidate Selection Framework
Selection matrix criteria:
- Enrichment Ratio: Quantitative measure of selection success
- Final Abundance: Abundant enough to be real, not a sequencing artifact
- Family Convergence: Part of a larger enriching family (confidence score)
- Sequence Liabilities: Red flags like glycosylation sites, deamidation motifs, or unpaired cysteines
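The sequence-liability check in particular lends itself to automation. The sketch below assumes three simple heuristics: the N-X-S/T sequon (X not proline) for N-linked glycosylation, NG and NS motifs for deamidation risk, and an odd cysteine count as a crude proxy for an unpaired cysteine. The function name and the exact motif set are assumptions for illustration; a real liability screen would usually be broader and structure-aware.

```python
import re

# Lookaheads are used so that overlapping motifs are all reported.
LIABILITY_PATTERNS = {
    "n_glycosylation": re.compile(r"(?=N[^P][ST])"),
    "deamidation": re.compile(r"(?=N[GS])"),
}


def find_liabilities(seq):
    """Return {liability_name: [1-based positions]} for motifs found in seq."""
    flags = {}
    for name, pattern in LIABILITY_PATTERNS.items():
        positions = [m.start() + 1 for m in pattern.finditer(seq)]
        if positions:
            flags[name] = positions
    # Crude heuristic: an odd number of cysteines cannot all be disulfide-paired.
    if seq.count("C") % 2 == 1:
        flags["unpaired_cysteine"] = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
    return flags


# Hypothetical sequences, not taken from any real campaign.
for seq in ("ARNGSCW", "ARDYYGSS"):
    print(seq, find_liabilities(seq))
```

A flagged motif is a prompt for closer inspection rather than an automatic rejection, since context determines whether the liability matters.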
Hypothetical case:
- Candidate A: #1 most abundant (5% final frequency), modest 15x enrichment, orphan clone
- Candidate B: #30 most abundant (0.5% final frequency), massive 800x enrichment, lead member of a family of 25 enriching variants sharing a key mutation at position H52
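Candidate B is a far more compelling and better-supported hit than Candidate A: its enrichment is roughly 50 times higher, and 24 related variants independently point to the same solution.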
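To make the comparison explicit, the selection matrix can be turned into a simple composite score. The sketch below is one possible scheme, not a standard one: the log-scaled sum, the minimum-abundance cutoff, and the liability penalty are all assumptions chosen for illustration, and since the case above does not state liabilities for either candidate, both are assumed to have none.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    final_freq: float       # fraction of reads in the final round
    enrichment: float       # final frequency / Round 0 frequency
    family_size: int        # 1 for an orphan clone
    n_liabilities: int = 0  # not stated in the case above; assumed 0


def score(c, min_final_freq=1e-4, liability_penalty=0.5):
    """Hypothetical composite score: log10(enrichment) + log10(family size),
    minus a penalty per liability. Clones below the abundance cutoff are
    excluded as possible sequencing artifacts."""
    if c.final_freq < min_final_freq:
        return float("-inf")
    return (math.log10(c.enrichment)
            + math.log10(c.family_size)
            - liability_penalty * c.n_liabilities)


candidates = [
    Candidate("A", final_freq=0.05, enrichment=15, family_size=1),
    Candidate("B", final_freq=0.005, enrichment=800, family_size=25),
]

ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(f"Candidate {c.name}: score {score(c):.2f}")
```

Under these assumed weights Candidate B ranks first despite its lower final abundance. Final abundance acts only as a filter here, so it cannot rescue a weakly enriched orphan clone.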
Conclusion: From Data to Discovery
Deconvoluting a polyclonal NGS dataset is an investigative process that blends quantitative analysis with biological intuition.
For a worked example of enrichment-ratio ranking and convergent-hotspot deconvolution from a real campaign, see our case study on pH-dependent antibody engineering via yeast surface display — 640 clones, six FACS sorts, and the convergent residues (designated H-1 and H-2) that the analysis surfaced.