Overview of LIgO simulation parameters

This page provides an overview of LIgO simulation parameters and their potential biological reflection. Refers to Supplementary Table 2 in the LIgO manuscript.

Generative models for simulation of background AIRs

Link to documentation:

Simulation parameter(s)

Potential biological reflection

ExperimentalImport provides import of existing experimental data of one of the following data formats: AIRR, Generic, IGoR, IReceptor, ImmuneML, ImmunoSEQRearrangement, ImmunoSEQSample, MiXCR, OLGA, SingleLineReceptor, TenxGenomics, VDJdb

These parameters relate biologically to germline gene usage differences across individuals as previously shown by us and others (Glanville et al. 2011; Slabodkin et al. 2021).

OLGA provides simulation of synthetic AIRs using a V(D)J recombination model, including a user-defined custom model

Same as above

Motif definition

Links to documentation:

Simulation parameter(s)

Potential biological reflection

SeedMotif describes a motif using a (i) seed (the initial k-mer) and (ii) allowed deviations from the seed. The allowed deviations can be described with allowed gap length (min_gap, max_gap), distribution of number of allowed mismatches to the seed (hamming_distance_probabilities), distribution of mismatches across the seed positions (position_weights), and substitution probabilities for nucleotides or amino acid (alphabet_weights).

Can be used to describe antigen-binding motifs, which are present in signal-specfic AIRs as was previously observed by us and others (Akbar et al. 2021; Shrock et al. 2023; Ostmeyer et al. 2019; Shugay et al. 2018; Goncharov et al. 2022; Chronister et al. 2021)

PWM — motif described in a form of a positional weight matrix

Same as above

Signal definition

Link to documentation:

Simulation parameter(s)

Potential biological reflection

Motifs — several motifs within an immune signal

Relates to motifs co-occurrence within the same AIR (2 motifs max), since immune signals may comprise multiple co-occurring motifs within the same AIR (Akbar et al. 2021; Glanville et al. 2017; Dash et al. 2017)

Sequence_position_weights describes signal distribution across CDR3 IMGT positions

Helps preserve the conservative start and end of CDR3, and controls a motif predominance in a specific location within CDR3

V_call, J_call restrict the V or J gene (allele) in the immune signal

Relates to previous observations that some signal-specific AIRs are associated with a specific germline gene alleles (Feeney et al. 1996; Avnir et al. 2016; Liu and Lucas 2003; Imkeller et al. 2018; Fagiani, Catanzaro, and Lanni 2020)

Clonal_frequency describes clonal frequency distribution, can be defined separately for each group of immune signals

Relates to previous observation by us and others that clonal frequency distributions may change across immune statuses (Greiff et al. 2015; Chiffelle et al. 2020).

Custom signal function (source_file, is_present_func) — any user-defined function which takes as arguments motifs, v_call, j_call, and sequence_position_weights

Helps to define any custom immune signal

Parameters of the simulation

Link to documentation:

Simulation parameter(s)

Potential biological reflection

Is_repertoire defines receptor or repertoire level of simulation

Relates to different types of available AIRR data

Paired defines how to pair the output data

Relates to different types of available AIRR data

Sequence_type — defines the nucleotide or amino acid type of simulated AIRs

Relates to different types of available AIRR data

Species — human (default) or mouse

Relates to different types of available AIRR data

Keep_p_gen_dist implements importance sampling, i.e., subsample signal-specific AIRs with respect to pgen distribution of background AIRs

Relates to previous observations by us and other that found marked differences in pgen and clonal frequency which may relate to immune signal (Kanduri et al. 2023; Pogorelyy et al. 2019)

Remove_seqs_with_signals filters signal-specific AIRs from the background if True

Helps to control the exact number of signal-specific receptors within a set of AIRs (if True) or make simulated data more complex (if False)

References

  • Akbar, Rahmad, Philippe A. Robert, Milena Pavlović, Jeliazko R. Jeliazkov, Igor Snapkov, Andrei Slabodkin, Cédric R. Weber, et al. 2021. “A Compact Vocabulary of Paratope-Epitope Interactions Enables Predictability of Antibody-Antigen Binding.” Cell Reports 34 (11): 108856.

  • Avnir, Yuval, Corey T. Watson, Jacob Glanville, Eric C. Peterson, Aimee S. Tallarico, Andrew S. Bennett, Kun Qin, et al. 2016. “IGHV1-69 Polymorphism Modulates Anti-Influenza Antibody Repertoires, Correlates with IGHV Utilization Shifts and Varies by Ethnicity.” Scientific Reports 6 (February):20842.

  • Chiffelle, Johanna, Raphael Genolet, Marta As Perez, George Coukos, Vincent Zoete, and Alexandre Harari. 2020. “T-Cell Repertoire Analysis and Metrics of Diversity and Clonality.” Current Opinion in Biotechnology 65 (October):284–95.

  • Chronister, William D., Austin Crinklaw, Swapnil Mahajan, Randi Vita, Zeynep Koşaloğlu-Yalçın, Zhen Yan, Jason A. Greenbaum, et al. 2021. “TCRMatch: Predicting T-Cell Receptor Specificity Based on Sequence Similarity to Previously Characterized Receptors.” Frontiers in Immunology 12 (March):640725.

  • Dash, Pradyot, Andrew J. Fiore-Gartland, Tomer Hertz, George C. Wang, Shalini Sharma, Aisha Souquette, Jeremy Chase Crawford, et al. 2017. “Quantifiable Predictive Features Define Epitope-Specific T Cell Receptor Repertoires.” Nature 547 (7661): 89–93.

  • Fagiani, Francesca, Michele Catanzaro, and Cristina Lanni. 2020. “Molecular Features of IGHV3-53-Encoded Antibodies Elicited by SARS-CoV-2.” Signal Transduction and Targeted Therapy 5 (1): 170.

  • Feeney, A. J., M. J. Atkinson, M. J. Cowan, G. Escuro, and G. Lugo. 1996. “A Defective Vkappa A2 Allele in Navajos Which May Play a Role in Increased Susceptibility to Haemophilus Influenzae Type B Disease.” The Journal of Clinical Investigation 97 (10): 2277–82.

  • Glanville, Jacob, Huang Huang, Allison Nau, Olivia Hatton, Lisa E. Wagar, Florian Rubelt, Xuhuai Ji, et al. 2017. “Identifying Specificity Groups in the T Cell Receptor Repertoire.” Nature 547 (June):94–98.

  • Glanville, Jacob, Tracy C. Kuo, H-Christian von Büdingen, Lin Guey, Jan Berka, Purnima D. Sundar, Gabriella Huerta, et al. 2011. “Naive Antibody Gene-Segment Frequencies Are Heritable and Unaltered by Chronic Lymphocyte Ablation.” Proceedings of the National Academy of Sciences of the United States of America 108 (50): 20066–71.

  • Goncharov, Mikhail, Dmitry Bagaev, Dmitrii Shcherbinin, Ivan Zvyagin, Dmitry Bolotin, Paul G. Thomas, Anastasia A. Minervina, et al. 2022. “VDJdb in the Pandemic Era: A Compendium of T Cell Receptors Specific for SARS-CoV-2.” Nature Methods 19 (9): 1017–19.

  • Greiff, Victor, Pooja Bhat, Skylar C. Cook, Ulrike Menzel, Wenjing Kang, and Sai T. Reddy. 2015. “A Bioinformatic Framework for Immune Repertoire Diversity Profiling Enables Detection of Immunological Status.” Genome Medicine 7 (1): 49.

  • Haakenson, Jeremy K., Ruiqi Huang, and Vaughn V. Smider. 2018. “Diversity in the Cow Ultralong CDR H3 Antibody Repertoire.” Frontiers in Immunology 9 (June):1262.

  • Imkeller, Katharina, Stephen W. Scally, Alexandre Bosch, Gemma Pidelaserra Martí, Giulia Costa, Gianna Triller, Rajagopal Murugan, et al. 2018. “Antihomotypic Affinity Maturation Improves Human B Cell Responses against a Repetitive Epitope.” Science 360 (6395): 1358–62.

  • Kanduri, Chakravarthi, Lonneke Scheffer, Milena Pavlović, Knut Dagestad Rand, Maria Chernigovskaya, Oz Pirvandy, Gur Yaari, Victor Greiff, and Geir K. Sandve. n.d. “simAIRR: Simulation of Adaptive Immune Repertoires with Realistic Receptor Sequence Sharing for Benchmarking of Immune State Prediction Methods.” GigaScience. https://doi.org/10.1093/gigascience/giad074.

  • Liu, Leyu, and Alexander H. Lucas. 2003. “IGH V3-23*01 and Its Allele V3-23*03 Differ in Their Capacity to Form the Canonical Human Antibody Combining Site Specific for the Capsular Polysaccharide of Haemophilus Influenzae Type B.” Immunogenetics 55 (5): 336–38.

  • Ostmeyer, Jared, Scott Christley, Inimary T. Toby, and Lindsay G. Cowell. 2019. “Biophysicochemical Motifs in T-Cell Receptor Sequences Distinguish Repertoires from Tumor-Infiltrating Lymphocyte and Adjacent Healthy Tissue.” Cancer Research 79 (7): 1671–80.

  • Pogorelyy, Mikhail V., Anastasia A. Minervina, Mikhail Shugay, Dmitriy M. Chudakov, Yuri B. Lebedev, Thierry Mora, and Aleksandra M. Walczak. 2019. “Detecting T Cell Receptors Involved in Immune Responses from Single Repertoire Snapshots.” PLoS Biology 17 (6): e3000314.

  • Roark, Ryan S., Hui Li, Wilton B. Williams, Hema Chug, Rosemarie D. Mason, Jason Gorman, Shuyi Wang, et al. 2021. “Recapitulation of HIV-1 Env-Antibody Coevolution in Macaques Leading to Neutralization Breadth.” Science 371 (6525). https://doi.org/10.1126/science.abd2638.

  • Sethna, Zachary, Yuval Elhanati, Curtis G. Callan, Aleksandra M. Walczak, and Thierry Mora. 2019. “OLGA: Fast Computation of Generation Probabilities of B- and T-Cell Receptor Amino Acid Sequences and Motifs.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btz035.

  • Shrock, Ellen L., Richard T. Timms, Tomasz Kula, Elijah L. Mena, Anthony P. West Jr, Rui Guo, I-Hsiu Lee, et al. 2023. “Germline-Encoded Amino Acid-Binding Motifs Drive Immunodominant Public Antibody Responses.” Science 380 (6640): eadc9498.

  • Shugay, Mikhail, Dmitriy V. Bagaev, Ivan V. Zvyagin, Renske M. Vroomans, Jeremy Chase Crawford, Garry Dolton, Ekaterina A. Komech, et al. 2018. “VDJdb: A Curated Database of T-Cell Receptor Sequences with Known Antigen Specificity.” Nucleic Acids Research 46 (D1): D419–27.

  • Slabodkin, Andrei, Maria Chernigovskaya, Ivana Mikocziova, Rahmad Akbar, Lonneke Scheffer, Milena Pavlović, Habib Bashour, et al. 2021. “Individualized VDJ Recombination Predisposes the Available Ig Sequence Space.” Genome Research, November. https://doi.org/10.1101/gr.275373.121.