Jacquere Libraries: Balancing On-Target Efficiency and Off-Target Risk in CRISPR Screening

The Dilemma of CRISPR Screening

Genome-wide CRISPR-Cas9 knockout screening is a core tool in functional genomics. By systematically perturbing genes and observing the resulting phenotypic changes, researchers can reveal gene function at scale. However, the technology has always faced two challenges that constrain each other: a guide RNA may fail to cut its intended gene efficiently (insufficient on-target efficacy), and it may also cut at unintended sites (off-target activity).

These two problems have spurred numerous prediction algorithms and design strategies, but a unified framework that balances both has long been lacking. Meanwhile, as high-dimensional readouts (such as single-cell RNA sequencing and high-content imaging) become widespread, demand for more compact and efficient screening libraries is growing, and continual updates to genome annotation are gradually rendering existing libraries obsolete.

On March 25, 2026, John G. Doench's team at the Broad Institute of MIT and Harvard proposed a systematic solution in a paper: they designed and validated new genome-wide CRISPR-Cas9 knockout libraries, Jacquere (human) and Julianna (mouse), and developed a new strategy, CRISPick aggregate CFD, to balance on-target efficacy against off-target avoidance.

Reinterpreting Off-Target Activity from GUIDE-seq Data

Building a good library requires a precise understanding of how off-target activity behaves. Using GUIDE-seq data from 114 unique guides, the research team systematically quantified the activity rate of off-target sites in CD4+/CD8+ T cells, U2OS, and HEK293 cell lines:

  • 0 mismatches: 100% activity rate
  • 1 mismatch: 43% activity rate
  • 2 mismatches: 5.7% activity rate
  • 3 mismatches: 0.4% activity rate

These results show that off-target activity drops sharply as the number of mismatches increases. Notably, alignment shifts caused by RNA/DNA bulges do not increase activity, which reduces the number of dimensions that off-target assessment has to consider.
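To make the steepness of this drop concrete, the short sketch below computes the fold reduction in activity rate for each additional mismatch, using only the four rates listed above. The function name `fold_drop` is illustrative and not taken from the paper.

```python
# Activity rates (percent) by mismatch count, as reported in the list above.
activity_rate = {0: 100.0, 1: 43.0, 2: 5.7, 3: 0.4}


def fold_drop(rates):
    """Return {n: rate[n - 1] / rate[n]} for each additional mismatch n."""
    counts = sorted(rates)
    return {n: rates[prev] / rates[n] for prev, n in zip(counts, counts[1:])}


drops = fold_drop(activity_rate)
for n, fold in drops.items():
    print(f"{n - 1} -> {n} mismatches: {fold:.1f}-fold lower activity rate")
```

Each additional mismatch costs more than the previous one (about 2.3-fold, then 7.5-fold, then roughly 14-fold), which is why sites with several mismatches contribute little to real off-target risk.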

Further analysis revealed a high correlation between the CFD (cutting frequency determination) score and the activity rate measured by GUIDE-seq (Pearson R = 0.93), providing a reliable basis for assessing off-target risk based on computational prediction.

CRISPick Aggregate CFD: A Balanced Off-Target Classifier

Based on these findings, the research team proposed the CRISPick aggregate CFD classifier. The core idea is to score each guide by the sum of the CFD scores of all its off-target sites, and to separate high-specificity from low-specificity guides by applying a threshold to that sum.

A key question is: how many mismatched off-target sites should be included? The team evaluated the classifier's F1 score at several mismatch thresholds:

  • Considering only 0 SDR mismatches: F1 = 0.74
  • Considering at most 1 SDR mismatch: F1 = 0.77
  • Considering at most 2 SDR mismatches: F1 = 0.69

Including off-target sites with too many mismatches adds a large amount of noise from inactive sites and reduces classifier performance. The best-performing approach is to consider only off-target sites with at most 1 mismatch in the SDRs and to set the aggregate CFD threshold at 4.8.
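The classifier described above can be sketched in a few lines. This is a minimal illustration, not the CRISPick implementation: the `OffTargetSite` type and its field names are hypothetical, the input list is assumed to exclude the intended on-target site, and the CFD scores and mismatch counts are assumed to have been computed upstream.

```python
from dataclasses import dataclass

AGGREGATE_CFD_THRESHOLD = 4.8  # threshold reported in the text
MAX_SDR_MISMATCHES = 1         # mismatch cutoff reported in the text


@dataclass
class OffTargetSite:
    cfd: float           # CFD score of this site, between 0.0 and 1.0
    sdr_mismatches: int  # mismatches in the SDRs (hypothetical field name)


def aggregate_cfd(sites, max_mismatches=MAX_SDR_MISMATCHES):
    """Sum CFD scores over sites with at most `max_mismatches` mismatches."""
    return sum(s.cfd for s in sites if s.sdr_mismatches <= max_mismatches)


def is_low_specificity(sites, threshold=AGGREGATE_CFD_THRESHOLD):
    """Flag a guide whose aggregate CFD exceeds the threshold."""
    return aggregate_cfd(sites) > threshold


# Made-up guide: 3 perfect matches, 4 one-mismatch sites, 10 two-mismatch sites.
promiscuous = (
    [OffTargetSite(1.0, 0)] * 3
    + [OffTargetSite(0.6, 1)] * 4
    + [OffTargetSite(0.9, 2)] * 10  # ignored: more than 1 mismatch
)
specific = [OffTargetSite(0.5, 1)] * 2

print(is_low_specificity(promiscuous), is_low_specificity(specific))
```

The two-mismatch sites are dropped before summing, which mirrors the observation that counting them mostly adds inactive sites rather than signal.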

Validation on the Avana library dataset shows that the classifier's performance is essentially unaffected by TP53 status: the F1 score is 0.73 for wild-type TP53 and 0.72 for mutant TP53.
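For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows the calculation from a confusion matrix; the counts in the example are invented for illustration and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 80 guides correctly flagged as low-specificity,
# 20 flagged incorrectly, and 40 low-specificity guides missed.
example_f1 = f1_score(tp=80, fp=20, fn=40)
print(f"F1 = {example_f1:.2f}")
```

Because F1 penalizes both missed and wrongly flagged guides, it falls when inactive sites inflate the scores of guides that are in fact specific.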

The study also compared the classifier against the widely used GuideScan specificity score. Because GuideScan includes too many inactive off-target sites (87.9% of the sites it includes were inactive, compared to only 56.3% for CRISPick aggregate CFD), it performed significantly worse. This reinforces an important principle: not every alignable off-target site deserves attention, and over-inclusion only dilutes the signal.

Jacquere Library Design Strategy

Guided by the CRISPick aggregate CFD classifier, the research team designed a novel human whole-genome CRISPR-Cas9 knockout library, Jacquere. Its design process reflects multiple considerations:

  • Gene Coverage: Integrating protein-coding gene annotations from RefSeq, GENCODE, and CHESS databases, covering a total of 20,550 genes.
  • On-Target Priority: Guides were first selected by sorting from highest to lowest RS3 score to ensure targeting efficacy.
  • Off-Target Filtering: Guides with a CRISPick aggregate CFD score exceeding 4.8 were excluded, while off-target sites with a CFD of 1.0 were avoided (protein-coding regions were prioritized for avoidance).
  • SNP Avoidance: Sites with a gnomAD variant frequency exceeding 5%, or with a frequency exceeding 12.5% in the African/African American subgroup, were excluded.
  • Compact Design: Each gene was assigned a quota of 3 guides; 99.5% of genes met the quota, and only 0.08% of guides exceeded the SNP frequency threshold.
  • Paralog Coverage: Multimapping guides were allowed to cover paralogous genes, and a single guide could target multiple genes simultaneously (e.g., FCGR2B and FCGR2C).

The entire library contains 60,550 unique guides, of which 95.0% were selected in the first round of selection, indicating that most genes had highly active and specific guides available.
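The selection logic for a single gene can be sketched as follows. This is a simplified reading of the steps above, not the authors' pipeline: the `Candidate` fields are hypothetical, all scores are assumed to be precomputed, the fallback used when a gene cannot fill its quota is an assumption, and the avoidance of CFD 1.0 sites and the handling of paralogs are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    seq: str
    rs3: float            # on-target score, higher is better
    aggregate_cfd: float  # summed off-target CFD for this guide
    snp_freq: float       # overall gnomAD variant frequency at the site
    snp_freq_afr: float   # frequency in the African/African American subgroup


def passes_filters(g):
    """Apply the off-target and SNP thresholds listed in the text."""
    return (
        g.aggregate_cfd <= 4.8
        and g.snp_freq <= 0.05
        and g.snp_freq_afr <= 0.125
    )


def pick_guides(candidates, quota=3):
    """Pick up to `quota` guides for one gene, best RS3 first."""
    ranked = sorted(candidates, key=lambda g: g.rs3, reverse=True)
    picked = [g for g in ranked if passes_filters(g)][:quota]
    if len(picked) < quota:
        # Assumed fallback: fill the quota with the best remaining guides.
        rest = [g for g in ranked if g not in picked]
        picked += rest[: quota - len(picked)]
    return picked


# Made-up candidates for one gene.
example = [
    Candidate("guide_A", 0.9, 6.0, 0.00, 0.00),  # fails the aggregate CFD filter
    Candidate("guide_B", 0.8, 1.0, 0.00, 0.00),
    Candidate("guide_C", 0.7, 0.5, 0.01, 0.02),
    Candidate("guide_D", 0.6, 0.2, 0.00, 0.00),
]
print([g.seq for g in pick_guides(example)])
```

Ranking by RS3 first and filtering second means the most active guide is kept whenever it is also specific enough, which matches the on-target priority described above.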

Figure 1. Composition of Jacquere and comparison across CRISPR-Cas9ko genome-wide libraries: library design overview, guide overlap with Brunello and Gattinara, paralog targeting example, and RS3 score comparison. (Drepanos, et al. 2026)

Comprehensive Comparison with Existing Libraries

The Jacquere library outperforms existing mainstream libraries in several dimensions:

  • Highest Gene Coverage: Only 1 in 164 genes had no available guides, compared to 1 in 32 for MinLibCas9 and 1 in 13 for TKOv3.
  • Highest On-Target Efficacy: The RS3 score distribution was significantly higher than that of other libraries (Mann-Whitney p < 0.00001).
  • Lowest Off-Target Risk: The proportion of off-target sites with a CFD of 1.0 was only 3.2%, compared to 12.6% for the VBC library and 6.3% for MinLib2.
  • Best Essential Gene Recall: Jacquere exhibited the lowest false negative rate in recall tests of 201 essential genes.

Experimental Validation: Lower False Negative Rate

The research team performed depletion screen validation in A549 (lung cancer) and A375 (melanoma) cell lines. Good consistency was observed between biological replicates (Pearson r = 0.93 and 0.95, respectively).
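Replicate consistency of this kind is typically measured by correlating the log-fold changes from two replicates. Below is a minimal pure-Python sketch of the Pearson coefficient; the function name and the sample values are illustrative and unrelated to the paper's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up log-fold changes for five guides in two replicates.
replicate_a = [-2.1, -0.3, 0.1, -1.5, 0.2]
replicate_b = [-1.9, -0.1, 0.0, -1.7, 0.3]
print(f"r = {pearson_r(replicate_a, replicate_b):.2f}")
```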

Compared to the widely used Brunello library, Jacquere showed significant advantages:

  • ROC-AUC: 0.96 vs 0.92
  • PR-AUC: 0.98 vs 0.95

More importantly, the false negative rate was significantly reduced. Of the 131 essential genes that Brunello failed to detect, Jacquere recovered 97, with a similar false positive rate.
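ROC-AUC in this setting can be read as the probability that a randomly chosen essential gene is more depleted than a randomly chosen non-essential gene. The sketch below computes it directly from that definition; the function name and the log-fold changes are invented for illustration, and the quadratic loop is for clarity rather than speed.

```python
def roc_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up gene-level log-fold changes from a depletion screen.
essential_lfc = [-2.0, -1.5, -0.2]
nonessential_lfc = [0.1, -0.3, 0.0]

# Essential genes deplete, so negate the LFC to make "higher" mean "more essential".
auc = roc_auc([-v for v in essential_lfc], [-v for v in nonessential_lfc])
print(f"ROC-AUC = {auc:.2f}")
```

In this toy example one essential gene (LFC of -0.2) is less depleted than one non-essential gene (LFC of -0.3), so 8 of the 9 pairs are ordered correctly.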

The study also compared the performance of single-guide and dual-guide vectors. Dual-guide vectors introduced more false positives: the depletion rate of non-essential genes was as high as 31.1% in Vienna-dual and 18.0% in Vienna-single, while Jacquere only reached 2.7%. This suggests that the dual-guide strategy, although it aims for higher screening efficiency, may introduce additional noise, and the trade-off needs to be weighed carefully.

Evaluating Hit Identification Methods

A well-designed library paired with an inappropriate analysis method can still produce misleading results. The research team compared three commonly used hit identification methods: Z-score, MAGeCK RRA, and MAGeCK MLE.

All three methods effectively distinguished between essential and non-essential genes, and performed better on Jacquere data. However, MAGeCK RRA has significant limitations: it relies solely on rank information while ignoring effect size, potentially leading to false positives (an anomaly in a single guide can cause a gene to be reported as a hit) and false negatives (when the depletion levels of different guides are inconsistent).
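Unlike a rank-only method, a Z-score approach keeps the effect size of each guide. The sketch below shows one common formulation and is not necessarily the exact procedure used in the paper: guide log-fold changes are standardized against a set of control guides (an assumed input), and a gene's guides are combined by dividing their summed Z-scores by the square root of the number of guides.

```python
from math import sqrt
from statistics import mean, stdev


def guide_z_scores(lfc_by_guide, control_guides):
    """Standardize every guide's LFC against the control-guide distribution."""
    controls = [lfc_by_guide[g] for g in control_guides]
    mu, sigma = mean(controls), stdev(controls)
    return {g: (v - mu) / sigma for g, v in lfc_by_guide.items()}


def gene_z_score(guide_z, guides_for_gene):
    """Combine a gene's guide-level Z-scores into one gene-level score."""
    zs = [guide_z[g] for g in guides_for_gene]
    return sum(zs) / sqrt(len(zs))


# Made-up data: three control guides and three guides for one gene.
lfc = {"ctrl_1": -0.1, "ctrl_2": 0.0, "ctrl_3": 0.1,
       "gene_g1": -2.0, "gene_g2": -1.8, "gene_g3": -2.2}
z = guide_z_scores(lfc, ["ctrl_1", "ctrl_2", "ctrl_3"])
gene_z = gene_z_score(z, ["gene_g1", "gene_g2", "gene_g3"])
print(f"gene-level Z = {gene_z:.1f}")
```

Because every guide contributes its magnitude, a single outlier guide moves the gene score less than it would move a rank-based statistic when the other guides show no effect.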

The research team provided a specific cautionary example: in an NK cell study, MAGeCK RRA reported Calhm2 as a positive selection hit, but none of its targeting guides were even among the top 25% of positively selected guides in the library. In such cases, results from a single method are unreliable.

Hits should therefore be cross-validated with multiple methods, so that the systematic bias of any single method does not produce false findings.
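Two simple checks follow from this recommendation and can be sketched as below. Both helpers are hypothetical and not part of MAGeCK or any other tool: the first lists which of a gene's guides fall within the top fraction of the library by log-fold change (the check that the NK cell example fails), and the second keeps only genes called by at least a minimum number of methods.

```python
def guides_in_top_fraction(lfc_by_guide, guides_for_gene, fraction=0.25):
    """Return the gene's guides ranked within the top `fraction` by LFC."""
    ranked = sorted(lfc_by_guide, key=lfc_by_guide.get, reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top = set(ranked[:cutoff])
    return [g for g in guides_for_gene if g in top]


def consensus_hits(hits_by_method, min_methods=2):
    """Return genes reported as hits by at least `min_methods` methods."""
    counts = {}
    for hits in hits_by_method.values():
        for gene in set(hits):
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c >= min_methods}


# Made-up library of eight guides; the top 25% is the two highest LFCs.
lfc = {"g1": 3.0, "g2": 2.5, "g3": 0.4, "g4": 0.3,
       "g5": 0.1, "g6": 0.0, "g7": -0.2, "g8": -1.0}
suspicious = guides_in_top_fraction(lfc, ["g3", "g4", "g5"])
supported = guides_in_top_fraction(lfc, ["g1", "g4", "g6"])

calls = {"zscore": ["A", "B"], "rra": ["A", "C"], "mle": ["A", "B"]}
print(suspicious, supported, sorted(consensus_hits(calls)))
```

A reported positive-selection hit with an empty list from the first check, or one that only a single method supports, is worth inspecting at the guide level before it is followed up.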

Limitations and Future Prospects

While the Jacquere library performed exceptionally well in design and validation, the research team candidly pointed out several limitations:

  • The library design did not consider the impact of copy number amplification regions, although this can be compensated for during analysis using tools such as CRISPRCleanR.
  • Some genes lack highly specific and active PAM-adjacent target sequences, necessitating the inclusion of suboptimal guides.
  • SNPs and in-frame mutations may still cause guides to fail in specific genetic contexts.
  • Well-designed libraries still require appropriate downstream analysis to maximize their value.

This work provides the CRISPR screening field with a rigorously validated, transparently designed genome-wide library. More importantly, it establishes a generalizable framework for balancing on-target efficacy against off-target avoidance, which makes screening results more reliable.

Reference

  1. Drepanos, L. M., et al. (2026). Balancing off-target and on-target considerations for optimized CRISPR-Cas9 knockout library design. Cell Genomics. DOI: 10.1016/j.xgen.2026.101190.