How to check feasibility of the simulation parameters

LIgO supports simulations with any immune signal (any k-mer, any PWM, or any combination of them), so it might be hard to know in advance how frequent a signal is or how often it co-occurs with other signals of interest. For that purpose, LIgO provides a separate analysis called FeasibilitySummary. It is especially useful when rejection sampling is used as the simulation strategy, or when implanting is used and background receptors that accidentally contain some of the signals have to be removed.

This analysis simulates a predefined number of sequences for each generative model provided in the simulation and reports the following information per generative model:

  • frequencies of each signal,

  • how many sequences contain how many signals,

  • joint probabilities for pairs of signals,

  • conditional probabilities of observing one signal in the sequences given that another is already observed,

  • sequence length distribution,

  • warnings if some signals are very rare or very frequent.
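To make the reported quantities concrete, here is a minimal sketch of how such statistics could be computed by hand. It is not LIgO's implementation: the function names and the four toy sequences are hypothetical, and a signal is assumed to be present whenever its seed occurs anywhere in the sequence. The seeds are the ones used in the specification later in this tutorial.

```python
from collections import Counter
from itertools import permutations

# Seeds taken from the tutorial specification; one motif per signal.
SEEDS = {"signal1": "AS", "signal2": "G", "signal3": "C", "signal4": "SLVTY"}

def annotate(sequence, seeds=SEEDS):
    """Return the set of signals whose seed occurs anywhere in the sequence (assumption)."""
    return {name for name, seed in seeds.items() if seed in sequence}

def summarize(sequences, seeds=SEEDS):
    """Compute simple feasibility statistics for a list of amino acid sequences."""
    n = len(sequences)
    annotations = [annotate(s, seeds) for s in sequences]
    # Frequency of each signal.
    freq = {name: sum(name in a for a in annotations) / n for name in seeds}
    # How many sequences contain how many signals.
    signals_per_seq = Counter(len(a) for a in annotations)
    # Joint probability for every ordered pair of distinct signals.
    joint = {
        (a, b): sum(a in ann and b in ann for ann in annotations) / n
        for a, b in permutations(seeds, 2)
    }
    # conditional[(a, b)] is P(a | b); skipped when b was never observed.
    conditional = {(a, b): joint[(a, b)] / freq[b] for (a, b) in joint if freq[b] > 0}
    # Sequence length distribution.
    lengths = Counter(len(s) for s in sequences)
    return freq, signals_per_seq, joint, conditional, lengths

# Hypothetical toy sequences, for illustration only.
toy_sequences = ["CASSLVTYF", "CASGF", "CAWF", "AGGF"]
freq, signals_per_seq, joint, conditional, lengths = summarize(toy_sequences)
print(freq)
print(dict(signals_per_seq))
```

Short seeds such as a single amino acid match most sequences, while longer seeds match few, which is the kind of imbalance the warnings in the report are meant to flag.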

This tutorial shows how to run the FeasibilitySummary analysis.

Step 1: Define potential immune signals and the simulation

The first step is to define a simulation. In this tutorial we define 4 signals with one motif each and request 2 immune repertoires corresponding to 2 individuals. The first repertoire is specified to have signal1 in 50% of the sequences and signal4 in 20%. The second repertoire is specified to have signal1 in 10% of the sequences and signal2 in 20%. For more details on defining immune signals, see How to define immune signals and immune events.

The specification for the signals and the simulation looks like this:

motifs:
  motif1:
    seed: AS
  motif2:
    seed: G
  motif3:
    seed: C
  motif4:
    seed: SLVTY
signals:
  signal1:
    motifs:
    - motif1
  signal2:
    motifs:
    - motif2
  signal3:
    motifs:
    - motif3
  signal4:
    motifs:
    - motif4
simulations:
  sim1:
    is_repertoire: true
    paired: false
    sequence_type: amino_acid
    sim_items:
      repertoire_group1:
        generative_model:
          default_model_name: humanTRB
          model_path: null
          type: OLGA
        immune_events:
          ievent1: true
          ievent2: false
        is_noise: false
        number_of_examples: 1
        receptors_in_repertoire_count: 6
        seed: 100
        signals:
          signal1: 0.5
          signal4: 0.2
      repertoire_group2:
        generative_model:
          default_model_name: humanTRB
          model_path: null
          type: OLGA
        immune_events:
          ievent1: false
          ievent2: false
        is_noise: false
        number_of_examples: 1
        receptors_in_repertoire_count: 10
        seed: 2
        signals:
          signal1: 0.1
          signal2: 0.2
    simulation_strategy: RejectionSampling
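As a quick sanity check on the proportions in this specification, the requested fractions can be multiplied by the repertoire sizes. The sketch below is plain arithmetic on values copied from the specification; the dictionary layout and function name are hypothetical and are not part of LIgO.

```python
# Repertoire sizes and signal proportions copied from the specification above.
sim_items = {
    "repertoire_group1": {"receptors": 6, "signals": {"signal1": 0.5, "signal4": 0.2}},
    "repertoire_group2": {"receptors": 10, "signals": {"signal1": 0.1, "signal2": 0.2}},
}

def expected_counts(item):
    """Expected number of receptors per signal: proportion times repertoire size."""
    return {sig: round(item["receptors"] * p, 2) for sig, p in item["signals"].items()}

for name, item in sim_items.items():
    print(name, expected_counts(item))
```

For repertoire_group1, signal4 comes out at 1.2 receptors, which is not a whole number. How such a value is resolved is not covered in this tutorial, so small repertoires are worth checking against the simulation output.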

Step 2: Define how to generate the summary

For the simulation with the given parameters, we can now specify how to generate the feasibility summary. We need to provide the number of receptor sequences to generate for the analysis and connect the analysis to the simulation we are interested in. The higher the number of generated sequences, the better the estimate of signal occurrences. However, generating, annotating, and summarizing more sequences results in longer running times.

inst1:
  sequence_count: 100
  simulation: sim1
  type: FeasibilitySummary
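The trade-off behind sequence_count can be illustrated with the standard error of an estimated proportion. This is a rough sketch that assumes the generated sequences are independent draws, so that signal occurrence follows a binomial model. The signal frequency of 0.05 is a hypothetical value, not a LIgO result.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent sequences."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical true signal frequency of 5%, evaluated at several sequence counts.
for n in (100, 1000, 10000):
    print(n, round(standard_error(0.05, n), 4))
```

Under this assumption the error shrinks with the square root of the sequence count, so a tenfold more precise estimate needs roughly a hundredfold more sequences. For rare signals, 100 sequences may contain no occurrence at all.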

Step 3: Run the analysis and explore results

The full specification of the feasibility summary is the following:

definitions:
  motifs:
    motif1:
      seed: AS
    motif2:
      seed: G
    motif3:
      seed: C
    motif4:
      seed: SLVTY
  signals:
    signal1:
      motifs:
      - motif1
    signal2:
      motifs:
      - motif2
    signal3:
      motifs:
      - motif3
    signal4:
      motifs:
      - motif4
  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      sim_items:
        repertoire_group1:
          generative_model:
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          immune_events:
            ievent1: true
            ievent2: false
          is_noise: false
          number_of_examples: 1
          receptors_in_repertoire_count: 6
          seed: 100
          signals:
            signal1: 0.5
            signal4: 0.2
        repertoire_group2:
          generative_model:
            default_model_name: humanTRB
            model_path: null
            type: OLGA
          immune_events:
            ievent1: false
            ievent2: false
          is_noise: false
          number_of_examples: 1
          receptors_in_repertoire_count: 10
          seed: 2
          signals:
            signal1: 0.1
            signal2: 0.2
      simulation_strategy: RejectionSampling
instructions:
  inst1:
    sequence_count: 100
    simulation: sim1
    type: FeasibilitySummary
output:
  format: HTML

Save this specification as specs.yaml and run the feasibility analysis:

ligo specs.yaml simulation_output

Under the simulation_output folder, open the index.html file to explore the results.
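The signal frequencies in the report can be turned into a rough estimate of the effort rejection sampling would need. The sketch below assumes that sequences are drawn independently and that a sequence is kept only if it contains the signal, so the average number of draws is the number of needed sequences divided by the signal frequency. The function name and the frequency used are hypothetical.

```python
def expected_draws(needed, frequency):
    """Average number of generated sequences needed to obtain `needed` sequences with the signal."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    return needed / frequency

# Example: 3 signal-carrying receptors and a hypothetical estimated frequency of 0.25.
print(expected_draws(3, 0.25))
```

A signal with a very low estimated frequency makes this number grow quickly, which is one reason to check feasibility before starting a large simulation.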