AIRR-ML-25: Adaptive Immune Profiling Challenge

Overview

In this competition, you’ll develop machine learning models to simultaneously perform two tasks: (a) predict the immune state (e.g. disease, healthy) of individuals based on so-called adaptive immune repertoires (sets of protein sequences), and (b) identify immune state-associated receptor sequences (those that explain immune state in the first task). The goal is to expedite ML-based solutions for immunodiagnostics and therapeutics discovery.

Timeline¶

November 05, 2025 - Start Date. (opens at 08:00 AM CET)
December 17, 2025 - Final Submission Deadline (closes at 07:59 AM CET).

All deadlines are at 11:59 PM CET on the corresponding day unless otherwise noted. The competition organizers reserve the right to update the contest timeline if they deem it necessary.

How to Participate¶

The competition is open to everyone, and will be hosted on the popular Kaggle platform. All you need to do is create a Kaggle account, accept the competition rules, and start coding! The competition will be live on November 05 at the following URL: https://www.kaggle.com/competitions/adaptive-immune-profiling-challenge-2025.

Prizes¶

Monetary rewards¶

1st Place - $ 5,000
2nd Place - $ 3,000
3rd Place - $ 2,000

Eligibility¶

To be eligible for the prize money, participants must make their code open source.

Sponsorship¶

Competition prizes are kindly sponsored by The Research Council of Norway.

Scientific manuscript authorship¶

The top 10 performing participants on the final Leaderboard rankings will be invited to contribute their model descriptions, related discussions, and code to a scientific paper summarizing the competition's scientific outcome. Nature Methods has "accepted in principle" the publication of this work.

Organizers¶

Many awesome people have contributed to making this community challenge happen, including:

Chakravarthi Kanduri¹,², Thomas Konstantinovsky³,†, Puneet Rawat⁴,†, Milena Pavlovic¹,², Damon H. May⁵, Rebecca Elyanow⁵, Bryan Howie⁵, Harlan S. Robins⁵, Crina Curca⁶, Bryan Hariadi⁶, Ashwath Kumar⁶, Jose Jacob⁶, Efthymia Papalexi⁶, Charles Roco⁶, Alex Rosenberg⁶, AIRR-Community Machine Learning Working Group, Justin Barton⁷, Günter Klambauer⁸, Encarnita Mariotti-Ferrandiz⁹, Pieter Meysman¹⁰, Eline T. Luning Prak¹¹, Lindsay G. Cowell¹², Todd M. Brusko¹³,¹⁴,¹⁵, Gur Yaari³,¹⁶,‡, Victor Greiff⁴,¹⁷,‡, Geir Kjetil Sandve¹,²,‡

¹ Scientific Computing and Machine Learning section, Department of Informatics, University of Oslo, Norway
² UiORealArt Convergence Environment, University of Oslo, Norway
³ Faculty of Engineering and Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Israel
⁴ Department of Immunology, University of Oslo, Oslo, Norway
⁵ Adaptive Biotechnologies, Seattle, WA, USA
⁶ Parse Biosciences, Seattle, WA, USA
⁷ Institute of Structural and Molecular Biology, University of London, United Kingdom
⁸ Institute for Machine Learning, Johannes Kepler University Linz, Austria
⁹ Sorbonne Université, INSERM, UMRS959, Immunology-Immunopathology-Immunotherapy (i3) lab, Paris, France
¹⁰ Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium
¹¹ Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
¹² Department of Health Data Science and Biostatistics, Peter O'Donnell Jr. School of Public Health; Department of Immunology, School of Biomedical Sciences; UT Southwestern Medical Center, Dallas, TX, USA
¹³ Department of Pathology, Immunology, and Laboratory Medicine, Diabetes Institute, College of Medicine, University of Florida, Gainesville, FL, USA
¹⁴ Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL, USA
¹⁵ Department of Biochemistry and Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
¹⁶ Department of Pathology, Yale School of Medicine, New Haven, CT, USA
¹⁷ Imprint Labs, LLC, New York, NY, USA

† Equal contribution
‡ Equal contribution

Correspondence: geirksa@ifi.uio.no, victor.greiff@medisin.uio.no, gur.yaari@yale.edu

Note: The contributors list shown above does not reflect the final list of authors, and authorship order, for the scientific manuscript summarizing the competition's scientific outcome. As described above, the top 10 performing participants on the final Leaderboard rankings will be invited to contribute to this manuscript and become co-authors.

Thanks to the AIRR-community for the shared vision and collective perspective in organizing this challenge.

Further details¶

Description of problem¶

Imagine your body's immune system as a vast, personal army, constantly on guard against invaders like viruses and bacteria. Each soldier in this army is an "immune receptor," a tiny protein designed to recognize and fight off threats. What is truly incredible is the sheer variety of these soldiers: you have billions of unique immune receptors, each one a potential weapon against a new disease!

When a new enemy (what researchers call an "antigen," like a specific virus variant) attacks, only a tiny handful of these billions of immune receptors are the perfect match to bind to it and neutralize the threat. It is like finding a needle in a haystack, but your body does it all the time.

Now, here is the exciting challenge: What if we could peek into this personal army of immune receptors from many different people? We will have collections of their unique immune receptors (called "repertoires"), and we will also know if those individuals have a certain immune state (e.g. diseased or healthy).

The big questions for this competition:

Can we predict a person's disease just by looking at their immune receptor "fingerprint"? Without knowing which receptors fight which diseases, can your machine learning models learn to identify patterns in these immune receptor collections that tell us if someone is sick or healthy?
Can we identify the "contributing" immune receptors? If our models can predict disease, can they also tell us which specific immune receptors are most strongly linked to a particular disease? This would be like finding the star soldiers in the immune army!

Solving these problems is a huge step forward for medicine. It could lead to new ways to diagnose diseases earlier and even develop targeted treatments based on our own immune system's unique capabilities.

Evaluation¶

The competition includes a total of eight training datasets and ten test datasets. For each repertoire_id across all test datasets, participants have to return the probability that the repertoire is label-positive. In addition, for each training dataset, participants have to return a ranked list of the top 50,000 unique rows (including junction_aa, v_call, and j_call) that contribute most to the optimal classification, regardless of the data encoding used. These label-associated sequences have to be sorted by some form of importance score, from most to least important; we may use only the top-n sequences from the ordered list of 50k sequences for evaluation. The predicted probabilities and the ranked sequences will be used to compute the area under the ROC curve (AUC) and the Jaccard similarity, respectively, for each dataset. A weighted average of both measures across all included datasets will be the basis for ranking on the competition leaderboard.
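
To make the two metrics concrete, here is a minimal Python sketch of how they could be computed locally for a single dataset. It is not the organizers' scoring code. The helper names (roc_auc, jaccard, top_n_unique, dataset_score), the toy sequences, the choice of reference set, the value of n, and the equal 0.5/0.5 weighting are all assumptions made for illustration; the actual weights and top-n cutoff are not stated on this page.

```python
from typing import Iterable, List, Sequence, Tuple

# One "row" as described above: (junction_aa, v_call, j_call).
Row = Tuple[str, str, str]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive repertoire gets a higher score than a random negative one
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def jaccard(predicted: Iterable[Row], reference: Iterable[Row]) -> float:
    """Jaccard similarity |A intersect B| / |A union B| between two sets of rows."""
    a, b = set(predicted), set(reference)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_n_unique(scored_rows: Iterable[Tuple[Row, float]], n: int) -> List[Row]:
    """Sort rows from most to least important and keep the first n unique rows."""
    seen = set()
    ranked: List[Row] = []
    for row, _score in sorted(scored_rows, key=lambda rs: rs[1], reverse=True):
        if len(ranked) >= n:
            break
        if row not in seen:
            seen.add(row)
            ranked.append(row)
    return ranked


def dataset_score(auc: float, jac: float, w_auc: float = 0.5) -> float:
    """Weighted average of both metrics; the 0.5 weight is an assumption."""
    return w_auc * auc + (1.0 - w_auc) * jac


# Toy example with made-up values (not competition data).
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
auc = roc_auc(labels, probabilities)

scored = [
    (("CASSLGF", "TRBV5-1", "TRBJ2-7"), 0.8),
    (("CASSPGQ", "TRBV7-9", "TRBJ1-1"), 0.3),
    (("CASSLGF", "TRBV5-1", "TRBJ2-7"), 0.5),  # duplicate row, lower score
    (("CASRDNE", "TRBV12-3", "TRBJ2-1"), 0.6),
]
ranked = top_n_unique(scored, n=2)
reference = {
    ("CASSLGF", "TRBV5-1", "TRBJ2-7"),
    ("CASSPGQ", "TRBV7-9", "TRBJ1-1"),
}
jac = jaccard(ranked, reference)
score = dataset_score(auc, jac)
```

In this toy case, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, and the two top-ranked rows share one row with the two-row reference set, so the Jaccard similarity is 1/3. Deduplicating before truncating matters because the submission asks for unique rows; a duplicate would otherwise take up a slot in the ranked list.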

Additional resources¶

Link to come: A pre-registered protocol describing all the details of the competition including extensive background information, dataset descriptions, evaluation process, and pilot data providing reference benchmarks

What's the state-of-the-art in mining Adaptive Immune Repertoires?

Examples of state-of-the-art methods

📧 Stay Updated¶

Don't miss any important updates about the challenge! Subscribe to our newsletter for:

Competition announcements and timeline updates
Technical insights and tips from organizers
Community highlights and participant spotlights
Results and findings from the challenge

Acknowledgements¶

Adaptive Biotechnologies has generously provided ~500 unpublished TCRβ repertoires from a cohort of donors with known status with respect to HSV-2 infection.

Parse Biosciences has generously provided unpublished experimental antigen-specific TCR sequences for use in synthetic datasets. TCR Sequencing of 1 Million Antigen-Reactive Human T Cells in a Single Experiment, https://www.parsebiosciences.com/datasets/tcr-sequencing-of-1-million-antigen-reactive-human-t-cells-in-a-single-experiment/; Parse Biosciences, Seattle, USA, Accessed 13 March 2025.

Citation¶

AIRR-ML-2025 Organizers. AIRR-ML-2025: Adaptive Immune Profiling Challenge. https://www.kaggle.com/competitions/adaptive-immune-profiling-challenge-2025, 2025. Kaggle.