B-cell receptors (BCRs) are not only molecular keys for antigen recognition, but also record the history of B-cell activation, differentiation, and clonal evolution. However, traditional bioinformatics methods struggle to capture these complex nonlinear relationships. Single-cell sequencing provides detailed insights, but its high cost limits large-scale clinical applications.
In a study published in Briefings in Bioinformatics, researchers led by Profs. GU Hongcang and ZHANG Fan from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences developed BCRInsight, an artificial intelligence (AI) language model capable of decoding the complex "fingerprints" of BCRs. The model is based on phenotype-aware contrastive learning and leverages self-supervised learning on large-scale sequence datasets.
BCRInsight integrates amino acid sequences with gene annotations and metadata, in a manner analogous to paired sentence encoding in natural language processing. The model is built on a 12-layer Transformer encoder with 86 million parameters.
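To make the paired-sentence analogy concrete, the following is a minimal Python sketch of how an amino acid sequence and its annotations could be packed into one two-segment input, in the style used for sentence pairs in natural language processing. It is not BCRInsight's actual tokenizer: the special tokens, the vocabulary layout, the function names, and the example annotation strings are all assumptions chosen for illustration.

```python
# Illustrative sketch only: token names, vocabulary, and annotation strings
# are assumptions, not BCRInsight's actual tokenizer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]


def build_vocab(annotation_tokens):
    """Map special tokens, amino acids, and annotation tokens to integer ids."""
    tokens = SPECIALS + list(AMINO_ACIDS) + sorted(set(annotation_tokens))
    return {tok: i for i, tok in enumerate(tokens)}


def encode_pair(sequence, annotations, vocab):
    """Encode as: [CLS] residues [SEP] annotations [SEP].

    Segment id 0 marks the sequence segment, 1 marks the annotation segment.
    """
    seg_a = ["[CLS]"] + list(sequence) + ["[SEP]"]
    seg_b = list(annotations) + ["[SEP]"]
    tokens = seg_a + seg_b
    unk = vocab["[UNK]"]
    input_ids = [vocab.get(tok, unk) for tok in tokens]
    segment_ids = [0] * len(seg_a) + [1] * len(seg_b)
    return tokens, input_ids, segment_ids


# Toy input: a short CDR3-like string plus hypothetical annotation tokens.
annotations = ["IGHV3-23", "IGHJ4", "isotype=IgG"]
vocab = build_vocab(annotations)
tokens, input_ids, segment_ids = encode_pair("CARDYW", annotations, vocab)

print(tokens)
print(segment_ids)
```

In a setup like this, the segment ids let the encoder tell residues apart from annotation tokens while still attending across both segments.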
BCRInsight demonstrated strong performance in benchmark evaluations. It accurately deconvolved B-cell subset compositions from bulk BCR-seq data at a fraction of the cost of single-cell approaches. In antibody paratope prediction, it achieved an area under the receiver operating characteristic curve (AUROC) of 0.962, outperforming nine advanced international methods in direct comparisons.
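For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The short Python sketch below computes it that way on made-up data. It assumes binary per-residue labels (paratope or not), which is a common setup but is not confirmed here as the study's evaluation protocol, and the labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = residue belongs to the paratope, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))
```

On this toy data, eight of the nine positive-negative pairs are ranked correctly, giving an AUROC of about 0.889. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.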
Moreover, the researchers found that the model exhibited emergent structural awareness. Even without any three-dimensional structural training, BCRInsight’s attention mechanisms consistently focused on the HCDR3 loops, which are central to antigen binding, and on other functionally important regions. This indicates that the model can infer structural and functional features from sequence and metadata alone, representing a breakthrough in predictive immunology.
BCRInsight offers a cost-effective tool for decoding immune repertoires and enables large-scale patient analysis. By identifying key paratope regions and functional patterns, it may also support antibody design, therapeutic optimization, and the development of personalized vaccines and immunotherapies.