How to check feasibility of the simulation parameters
LIgO supports simulations with any immune signal (any k-mer, any PWM, or any combination of them), so it can be hard to know in advance how frequent a signal will be or how it co-occurs with other signals of interest. To help with this, LIgO provides a separate analysis called FeasibilitySummary. It is especially useful when rejection sampling is the simulation strategy, or when implanting is the strategy and background receptors that accidentally contain some of the signals have to be removed.
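To see why signal frequency matters when rejection sampling is used, consider a simplified model (an assumption for illustration, not LIgO's internal code): sequences are drawn independently from the generative model, and each contains a given signal with probability p. The number of draws needed to keep k sequences with that signal then follows a negative binomial distribution with mean k / p. The function name `expected_draws` and the frequencies below are hypothetical.

```python
def expected_draws(k: int, p: float) -> float:
    """Expected number of generated sequences needed to keep k sequences
    that contain a signal occurring with probability p per sequence.

    Simplifying assumption: sequences are independent draws, so the number
    of draws follows a negative binomial distribution with mean k / p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return k / p


# Hypothetical signal frequencies, chosen only for illustration.
for p in (0.5, 0.01, 0.0001):
    print(f"p = {p}: ~{expected_draws(1000, p):,.0f} draws for 1000 sequences")
```

Under this model, a signal carried by 1 in 10,000 generated sequences needs about 10 million draws to collect 1,000 matching sequences, which is the kind of cost the feasibility analysis helps spot before a simulation is launched.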
This analysis generates a given number of sequences from each generative model used in the simulation and reports the following information per generative model:
- the frequency of each signal,
- how many sequences contain a given number of signals,
- joint probabilities for pairs of signals,
- conditional probabilities of observing one signal in a sequence given that another signal is already present,
- the sequence length distribution,
- warnings if some signals are very rare or very frequent.
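The signal statistics listed above can be illustrated with a small sketch. It assumes, for illustration only (this is not LIgO's implementation), that each signal is a single k-mer seed and that a sequence contains a signal if the seed occurs in it as a substring; the sequences are made up. PWM-based signals or gapped motifs would need a different matching rule.

```python
from collections import Counter
from itertools import permutations

# Signals as single k-mer seeds (same seeds as the motifs in this tutorial).
signals = {"signal1": "AS", "signal2": "G", "signal3": "C", "signal4": "SLVTY"}

# Made-up amino acid sequences standing in for generated receptors.
sequences = ["CASSLGQ", "CASSLVTYF", "ASRD", "QQYNS"]


def signals_in(seq: str) -> set[str]:
    """Return the names of all signals whose seed occurs in seq."""
    return {name for name, seed in signals.items() if seed in seq}


hits = [signals_in(s) for s in sequences]
n = len(sequences)

# Frequency of each signal: fraction of sequences containing it.
freq = {name: sum(name in h for h in hits) / n for name in signals}

# How many sequences contain 0, 1, 2, ... signals.
count_dist = Counter(len(h) for h in hits)

# Joint probability P(A and B), keyed (A, B), and
# conditional probability P(B | A), keyed (B, A).
joint, cond = {}, {}
for a, b in permutations(signals, 2):
    both = sum(a in h and b in h for h in hits) / n
    joint[(a, b)] = both
    cond[(b, a)] = both / freq[a] if freq[a] > 0 else float("nan")

print(freq)
print(dict(count_dist))
print(cond[("signal4", "signal1")])  # P(signal4 | signal1)
```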
This tutorial shows how to run the FeasibilitySummary analysis.
Step 1: Define potential immune signals and the simulation
The first step is to define a simulation. In this tutorial, we define 4 signals with one motif each and request 2 immune repertoires corresponding to 2 individuals: the first repertoire should have signal1 in 50% of its sequences and signal4 in 20%, and the second should have signal1 in 10% and signal2 in 20% of its sequences. For more details on defining immune signals, see How to define immune signals and immune events.
The specification for the signals and the simulation looks like this:
```yaml
motifs:
  motif1:
    seed: AS
  motif2:
    seed: G
  motif3:
    seed: C
  motif4:
    seed: SLVTY
signals:
  signal1:
    motifs:
    - motif1
  signal2:
    motifs:
    - motif2
  signal3:
    motifs:
    - motif3
  signal4:
    motifs:
    - motif4
simulations:
  sim1:
    is_repertoire: true
    paired: false
    sequence_type: amino_acid
    sim_items:
      repertoire_group1:
        generative_model:
          default_model_name: humanTRB
          model_path: null
          type: OLGA
        immune_events:
          ievent1: true
          ievent2: false
        is_noise: false
        number_of_examples: 1
        receptors_in_repertoire_count: 6
        seed: 100
        signals:
          signal1: 0.5
          signal4: 0.2
      repertoire_group2:
        generative_model:
          default_model_name: humanTRB
          model_path: null
          type: OLGA
        immune_events:
          ievent1: false
          ievent2: false
        is_noise: false
        number_of_examples: 1
        receptors_in_repertoire_count: 10
        seed: 2
        signals:
          signal1: 0.1
          signal2: 0.2
    simulation_strategy: RejectionSampling
```
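As a quick sanity check on the proportions in this specification, the sketch below (an illustration, not part of LIgO) mirrors the sim_items values as a Python dictionary and computes the expected number of receptors per signal in each repertoire. It assumes each proportion is the fraction of receptors that should carry that signal and treats the signal groups as disjoint; the function `expected_counts` and the `no_signal` key are names introduced here for illustration.

```python
# Signal proportions and repertoire sizes copied from sim1 above.
sim_items = {
    "repertoire_group1": {"receptors_in_repertoire_count": 6,
                          "signals": {"signal1": 0.5, "signal4": 0.2}},
    "repertoire_group2": {"receptors_in_repertoire_count": 10,
                          "signals": {"signal1": 0.1, "signal2": 0.2}},
}


def expected_counts(item: dict) -> dict:
    """Expected number of receptors per signal, plus receptors with no signal.

    Assumption: each proportion is the fraction of receptors carrying that
    signal, and the signal groups do not overlap.
    """
    total = item["receptors_in_repertoire_count"]
    props = item["signals"]
    if sum(props.values()) > 1:
        raise ValueError("signal proportions sum to more than 1")
    counts = {name: p * total for name, p in props.items()}
    counts["no_signal"] = total - sum(counts.values())
    return counts


for name, item in sim_items.items():
    counts = expected_counts(item)
    print(name, {k: round(v, 2) for k, v in counts.items()})
```

For repertoire_group1, 6 × 0.2 gives 1.2 receptors with signal4, which is not a whole number. A simulation cannot produce a fractional receptor, so the realized count will differ from this value; choosing a repertoire size for which every proportion gives a whole number avoids the ambiguity.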
Step 2: Define how to generate the summary
With the simulation defined, we can now specify the feasibility summary. The instruction needs the number of receptor sequences to generate for the analysis (sequence_count) and the name of the simulation to analyze (simulation). The more sequences are generated, the better the estimates of signal frequencies; however, generating, annotating and summarizing more sequences also takes longer.
```yaml
inst1:
  sequence_count: 100
  simulation: sim1
  type: FeasibilitySummary
```
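The trade-off between sequence_count and accuracy can be made concrete with a binomial approximation. This is an assumption for illustration, not how LIgO computes its report: if a signal occurs in a fraction p of generated sequences, the frequency estimated from n sequences has standard error `sqrt(p * (1 - p) / n)`, and the chance of never observing the signal at all is `(1 - p) ** n`. The function name and the frequencies below are hypothetical.

```python
import math


def frequency_uncertainty(p: float, n: int) -> tuple[float, float]:
    """For a signal present in a fraction p of generated sequences, return
    (standard error of the estimated frequency, probability of never
    observing the signal) when n sequences are generated.

    Treats sequences as independent Bernoulli trials, a simplifying assumption.
    """
    std_error = math.sqrt(p * (1 - p) / n)
    p_unseen = (1 - p) ** n
    return std_error, p_unseen


# Hypothetical signal frequencies; n = 100 matches sequence_count above.
for n in (100, 10_000):
    for p in (0.1, 0.001):
        se, miss = frequency_uncertainty(p, n)
        print(f"n={n:>6}, p={p}: std. error ~{se:.4f}, P(unseen) ~{miss:.3f}")
```

With sequence_count set to 100, a signal carried by 0.1% of sequences is missed entirely about 90% of the time (0.999^100 ≈ 0.905), whereas 10,000 sequences make that very unlikely, at the cost of a longer run.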
Step 3: Run the analysis and explore results
The full specification of the feasibility summary is the following:
```yaml
definitions:
  motifs:
    motif1:
      seed: AS
    motif2:
      seed: G
    motif3:
      seed: C
    motif4:
      seed: SLVTY
  signals:
    signal1:
      motifs:
      - motif1
    signal2:
      motifs:
      - motif2
    signal3:
      motifs:
      - motif3
    signal4:
      motifs:
      - motif4
  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      sim_items:
        repertoire_group1:
          generative_model:
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          immune_events:
            ievent1: true
            ievent2: false
          is_noise: false
          number_of_examples: 1
          receptors_in_repertoire_count: 6
          seed: 100
          signals:
            signal1: 0.5
            signal4: 0.2
        repertoire_group2:
          generative_model:
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          immune_events:
            ievent1: false
            ievent2: false
          is_noise: false
          number_of_examples: 1
          receptors_in_repertoire_count: 10
          seed: 2
          signals:
            signal1: 0.1
            signal2: 0.2
      simulation_strategy: RejectionSampling
instructions:
  inst1:
    sequence_count: 100
    simulation: sim1
    type: FeasibilitySummary
output:
  format: HTML
```
Save this specification as specs.yaml and run the feasibility analysis:
```shell
ligo specs.yaml simulation_output
```
Under the simulation_output folder, open the index.html file to explore the results.