How to simulate co-occurring immune signals
LIgO supports the simulation of co-occurring immune signals using rejection sampling. In this tutorial, we replicate the simulation from use case 2 of the LIgO manuscript. Briefly, we perform a repertoire-level simulation in which some TCR beta chains (TRBs) contain two signals that belong to two different immune events. We define signal 1 as the 3-mer GDT and signal 2 as the 3-mer SGL.
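To make the idea of rejection sampling concrete, here is a minimal toy sketch. It is not LIgO's implementation: the uniform random generator, the reduced alphabet, the sequence length, and the helper names `label` and `rejection_sample` are all assumptions made for illustration (LIgO draws sequences from a generative model such as OLGA). The sketch draws sequences, checks which seeds they contain, and keeps a sequence only if the group it belongs to still has room.

```python
import random

SIGNALS = {"signal1": "GDT", "signal2": "SGL"}  # seeds used in this tutorial


def label(seq):
    """Return the tuple of signal names whose seed occurs in seq."""
    return tuple(name for name, seed in sorted(SIGNALS.items()) if seed in seq)


def rejection_sample(quotas, rng, length=12, alphabet="GDTSL", max_draws=200_000):
    """Draw random sequences and keep each one only if its group still has room.

    quotas maps a label tuple, e.g. ("signal1", "signal2"), to a required count.
    The uniform toy generator over a small alphabet is a hypothetical stand-in
    for a real generative model.
    """
    bins = {key: [] for key in quotas}
    draws = 0
    while any(len(bins[key]) < quotas[key] for key in quotas):
        if draws >= max_draws:
            raise RuntimeError("quotas not met within max_draws")
        draws += 1
        seq = "".join(rng.choice(alphabet) for _ in range(length))
        key = label(seq)
        if key in bins and len(bins[key]) < quotas[key]:
            bins[key].append(seq)  # accepted
        # otherwise the sequence is rejected and a new one is drawn
    return bins


# A tiny "repertoire" of 10 sequences with the same ratios as in this tutorial.
quotas = {
    ("signal1", "signal2"): 1,  # both signals
    ("signal1",): 2,            # signal 1 only
    ("signal2",): 2,            # signal 2 only
    (): 5,                      # no signal
}
repertoire = rejection_sample(quotas, random.Random(0))
```

Sequences carrying both seeds are much rarer than sequences carrying one, so most draws are rejected before the `("signal1", "signal2")` group is filled. This is why co-occurring signals can require many generated sequences.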
Step 1: Define immune signals
We begin by defining the immune signals to simulate. This step is the same as in a standard LIgO simulation, even though we want two signals to occur within one receptor.
definitions:
  motifs:
    motif1:
      seed: GDT
    motif2:
      seed: SGL
  signals:
    signal1:
      motifs: [motif1]
    signal2:
      motifs: [motif2]
Step 2: Define the frequency of each individual signal and of the pair of signals in a repertoire
  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      simulation_strategy: RejectionSampling
      sim_items:
        AIRR1:
          generative_model:
            chain: beta
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          is_noise: false
          number_of_examples: 10 # we simulate 10 repertoires
          receptors_in_repertoire_count: 1000 # we simulate 1000 TRBs in each repertoire
          signals:
            signal1__signal2: 0.1 # 10% of TRBs contain both signal 1 and signal 2
            signal1: 0.2 # 20% of TRBs contain signal 1
            signal2: 0.2 # 20% of TRBs contain signal 2
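The proportions above translate into receptor counts per repertoire. The sketch below works through that arithmetic. It assumes that the three entries under signals describe disjoint groups of receptors, so that the remainder carries no signal; the helper name `expected_counts` and the `no_signal` key are hypothetical and not part of LIgO.

```python
def expected_counts(receptors_per_repertoire, proportions):
    """Convert signal proportions into receptor counts for one repertoire.

    Assumes the groups are disjoint, so whatever is left over has no signal.
    """
    counts = {
        name: round(receptors_per_repertoire * fraction)
        for name, fraction in proportions.items()
    }
    counts["no_signal"] = receptors_per_repertoire - sum(counts.values())
    return counts


# Values taken from the simulation item in this tutorial.
proportions = {"signal1__signal2": 0.1, "signal1": 0.2, "signal2": 0.2}
counts = expected_counts(1000, proportions)
# counts == {"signal1__signal2": 100, "signal1": 200, "signal2": 200, "no_signal": 500}
```

Under this assumption, each of the 10 repertoires would hold 100 receptors with both signals, 200 with each single signal, and 500 without any signal.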
Step 3: Run the simulation with the following YAML file
definitions:
  motifs:
    motif1:
      seed: GDT
    motif2:
      seed: SGL
  signals:
    signal1:
      motifs: [motif1]
    signal2:
      motifs: [motif2]
  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      simulation_strategy: RejectionSampling
      sim_items:
        AIRR1:
          generative_model:
            chain: beta
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          is_noise: false
          number_of_examples: 10 # we simulate 10 repertoires
          receptors_in_repertoire_count: 1000 # we simulate 1000 TRBs in each repertoire
          signals:
            signal1__signal2: 0.1 # 10% of TRBs contain both signal 1 and signal 2
            signal1: 0.2 # 20% of TRBs contain signal 1
            signal2: 0.2 # 20% of TRBs contain signal 2
instructions:
  inst1:
    export_p_gens: false # computing generation probabilities (via OLGA) can take some time
    max_iterations: 1000
    number_of_processes: 4
    sequence_batch_size: 100000
    simulation: sim1
    type: LigoSim
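Before launching a long run, it can help to sanity-check the signals block by hand. The following sketch is a hypothetical pre-flight check and not a LIgO function. It assumes that a key such as signal1__signal2 joins signal names with a double underscore, as in the file above, and that the proportions describe disjoint groups and therefore should not sum to more than 1.

```python
def check_signal_proportions(defined_signals, proportions, sep="__"):
    """Return a list of problems found in a signals block (empty if none)."""
    problems = []
    for key, fraction in proportions.items():
        # Every name in a combined key must refer to a defined signal.
        for name in key.split(sep):
            if name not in defined_signals:
                problems.append(f"{key}: unknown signal '{name}'")
        if not 0.0 <= fraction <= 1.0:
            problems.append(f"{key}: proportion {fraction} is outside [0, 1]")
    total = sum(proportions.values())
    if total > 1.0 + 1e-9:  # small tolerance for floating-point rounding
        problems.append(f"proportions sum to {total}, which exceeds 1")
    return problems


# Values copied from the YAML file in this tutorial.
defined = {"signal1", "signal2"}
proportions = {"signal1__signal2": 0.1, "signal1": 0.2, "signal2": 0.2}
problems = check_signal_proportions(defined, proportions)
```

For the values used here the returned list is empty; a misspelled signal name or proportions that add up to more than 1 would each produce one entry.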