How to choose between rejection sampling and signal implantation
LIgO supports two ways of simulating receptors that contain an immune signal: rejection sampling and signal implantation. Which one is the better choice depends on the signal and its motifs. Here, we briefly describe both strategies and give some recommendations for choosing the right one for a specific use case.
Signal implantation
For a given receptor sequence, e.g., AAAAAAA, and a signal that has a motif CCC, implanting the signal could result in the sequence AACCCAA. Implantation overwrites part of the sequence with the subsequence determined by the motif and the other signal parameters.
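The overwrite described above can be sketched in a few lines of Python. This is a minimal illustration, not LIgO's implementation: the function name `implant_motif` and its `position` parameter are hypothetical, and a real signal definition may involve further parameters (such as allowed positions or genes) that are ignored here.

```python
def implant_motif(sequence: str, motif: str, position: int) -> str:
    """Overwrite part of `sequence` with `motif`, starting at 0-based `position`.

    Hypothetical helper for illustration only; the sequence length is preserved.
    """
    if position < 0 or position + len(motif) > len(sequence):
        raise ValueError("motif does not fit into the sequence at this position")
    # Keep the prefix and suffix, replace the middle with the motif.
    return sequence[:position] + motif + sequence[position + len(motif):]


implanted = implant_motif("AAAAAAA", "CCC", 2)
print(implanted)  # AACCCAA
```

Because the motif replaces existing residues instead of being inserted between them, the resulting sequence has the same length as the original, but it is no longer a sequence that the generative model itself produced.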
Rejection sampling
All receptor sequences in LIgO come from a generative model, most often the OLGA tool, which generates a set of background sequences. For the purpose of simulation, LIgO then annotates each sequence as containing or not containing the user-defined signals. If rejection sampling is used, all sequences that do not contain the signal are discarded. With this type of simulation, no sequence is overwritten. Instead, sequences are generated until enough of them fulfill the conditions imposed by the signal definition.
For example, for a set of sequences AAA, CCC, TTT, CCA coming from a generative model, and a signal that has a motif CC, only sequences CCC and CCA will be kept as signal-specific.
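The example above can be written as a short Python sketch. It is an illustration under simplifying assumptions, not LIgO's code: the signal is reduced to a plain substring match, and `toy_generator` is a hypothetical stand-in for a real generative model such as OLGA.

```python
import random
from typing import Callable, List


def rejection_sample(generate: Callable[[], str], motif: str,
                     n_wanted: int, max_draws: int = 100_000) -> List[str]:
    """Draw sequences from `generate` and keep only those containing `motif`.

    Stops once `n_wanted` sequences are kept or `max_draws` draws are used up.
    """
    kept: List[str] = []
    for _ in range(max_draws):
        sequence = generate()
        if motif in sequence:  # simplified signal check: substring match only
            kept.append(sequence)
            if len(kept) == n_wanted:
                break
    return kept


# Filtering a fixed background set, as in the example above.
background = ["AAA", "CCC", "TTT", "CCA"]
signal_specific = [s for s in background if "CC" in s]
print(signal_specific)  # ['CCC', 'CCA']

# Hypothetical generator that draws uniformly from the same background set.
rng = random.Random(0)
def toy_generator() -> str:
    return rng.choice(background)

sampled = rejection_sample(toy_generator, "CC", n_wanted=5)
```

None of the kept sequences is modified, so each of them is a sequence the generator could have produced on its own. The `max_draws` limit is an added safeguard in this sketch so that a very rare motif cannot make the loop run indefinitely.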
Choosing between simulation strategies
Rejection sampling might be a better choice if:
it is important to preserve biological rules of sequence generation,
the generation probabilities of sequences should be preserved,
the signals naturally occur often in the sequences coming from the generative model.
Implanting might be a better choice if:
waiting for signal-specific sequences to be generated by the generative model might take a long time due to rare motifs or rare combinations of motifs, positions and genes,
the generative model does not contain any biological rules of sequence generation anyway (e.g., random sequences are generated for the purpose of some simple benchmarking).
How long a simulation takes to complete depends on the chosen immune signals, and it is hard to know in advance, when defining a signal, how long the generation will take. To estimate this, LIgO provides the Feasibility Summary analysis. For a tutorial on this analysis, see How to check feasibility of the simulation parameters.
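To see why rare signals make rejection sampling slow, a back-of-the-envelope estimate can help. The sketch below is not the Feasibility Summary analysis and does not use LIgO's API. It assumes a small pilot sample of background sequences is available, treats the signal as a plain substring match, and uses the hypothetical helper name `estimate_expected_draws`. If a fraction p of generated sequences carries the signal, collecting n signal-specific sequences needs about n / p draws on average.

```python
from typing import List, Optional


def estimate_expected_draws(pilot: List[str], motif: str,
                            n_wanted: int) -> Optional[float]:
    """Estimate how many sequences must be generated to obtain `n_wanted` hits.

    Returns None when the pilot sample is empty or contains no hit, because
    the acceptance rate cannot be estimated in that case.
    """
    if not pilot:
        return None
    hits = sum(motif in sequence for sequence in pilot)
    if hits == 0:
        return None
    acceptance_rate = hits / len(pilot)
    return n_wanted / acceptance_rate


pilot_sample = ["AAA", "CCC", "TTT", "CCA"]
print(estimate_expected_draws(pilot_sample, "CC", n_wanted=100))  # 200.0
print(estimate_expected_draws(pilot_sample, "GG", n_wanted=100))  # None
```

A motif found in half of the pilot sequences needs roughly twice as many draws as the number of sequences wanted, while a motif found in one sequence per million would need about a million draws per kept sequence. In the second situation implanting is likely the more practical option.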