Calculates the pairwise p-epitope distances between all sequences in a character vector, using defined epitope sites for a given influenza subtype.
Usage
dist_pepi(
  seqs,
  subtype,
  mode = "dominant",
  harmonize_b_lineages = TRUE,
  ambiguous_residues = "xX?"
)
Arguments
- seqs
Named character vector of amino acid sequences. All strings must be the same length, and each must represent a full-length HA protein sequence.
- subtype
Character. Influenza subtype or lineage. Currently allowed values are `"H1N1"`, `"A(H3N2)"`, `"B/Yamagata"`, `"B/Victoria"`, and `"B/Presplit"`. If `harmonize_b_lineages` is `TRUE`, you can also specify just `"B"`.
- mode
Character. How to summarize the epitope-wise distances. Options:
  - `"dominant"`/`"max"`: return the maximum epitope distance (Gupta et al. 2006).
  - `"anderson"`: average of normalized distances (Anderson et al. 2018).
  - `"all"`/`"average"`/`"mean"`: mean over all epitope residues (Pan et al. 2010).
  - `"median"`: median of per-epitope distances.
  - `NULL` or empty string: returns the full vector of distances.
- harmonize_b_lineages
Logical. If `TRUE`, harmonizes B lineages using a unified epitope definition. Defaults to `TRUE`.
- ambiguous_residues
Character vector. Residue symbols to exclude from comparison (e.g. `"xX?"`, the default).
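To make the `mode` options more concrete, the following is a minimal sketch of how per-epitope distances might be summarized for a single pair of sequences. It is not the package's implementation: the `epitopes` list uses made-up site positions rather than real HA epitope definitions, the two toy sequences are far shorter than a real HA protein, and the `"anderson"` mode is left out because its normalization is not described here.

```r
# Hypothetical epitope definitions: positions are illustrative only,
# not the real HA epitope sites used by dist_pepi().
epitopes <- list(A = c(1, 2, 3, 4), B = c(5, 6), C = c(7, 8, 9, 10))

# Two toy sequences split into single residues.
seq1 <- strsplit("ACDEFGHIKL", "")[[1]]
seq2 <- strsplit("ACDKFGHIRR", "")[[1]]

# Per-epitope distance: fraction of residues in each epitope that differ.
per_epitope <- vapply(
  epitopes,
  function(sites) mean(seq1[sites] != seq2[sites]),
  numeric(1)
)

# "dominant"/"max": the largest per-epitope distance.
dominant <- max(per_epitope)

# "all"/"average"/"mean": fraction of differing residues pooled over
# every epitope site.
all_sites <- unlist(epitopes, use.names = FALSE)
pooled <- mean(seq1[all_sites] != seq2[all_sites])

# "median": median of the per-epitope distances.
med <- median(per_epitope)

print(per_epitope)
print(c(dominant = dominant, pooled = pooled, median = med))
```

In this toy case epitope C carries the most substitutions, so it determines the dominant distance, while the pooled value is diluted by the unchanged sites in epitopes A and B.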
Value
A symmetric numeric matrix of p-epitope distances with sequence names as row and column names.
Details
The function computes the lower triangle of a distance matrix using the `pepitope()` function, then mirrors it to fill the full matrix. Ambiguous residues are removed from each pairwise comparison before computing distance.
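The lower-triangle-then-mirror approach can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `toy_dist()` is a hypothetical stand-in for `pepitope()` that compares all positions rather than epitope sites, and it assumes that "removing" ambiguous residues means dropping any position where either sequence of the pair carries one.

```r
# Hypothetical stand-in for pepitope(): fraction of differing positions,
# skipping positions where either sequence has an ambiguous residue.
toy_dist <- function(a, b, ambiguous = "xX?") {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  amb <- strsplit(ambiguous, "")[[1]]
  keep <- !(a %in% amb | b %in% amb)
  if (!any(keep)) return(NA_real_)
  mean(a[keep] != b[keep])
}

# Toy named sequences of equal length (not real HA sequences).
seqs <- c(s1 = "ACDEF", s2 = "ACDKF", s3 = "AXDKY")
n <- length(seqs)
d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))

# Fill only the lower triangle: each pair is computed once.
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d[j, i] <- toy_dist(seqs[[j]], seqs[[i]])
  }
}

# Mirror the lower triangle into the upper triangle.
d[upper.tri(d)] <- t(d)[upper.tri(d)]

print(d)
```

Computing each pair once halves the number of pairwise comparisons, which matters because the work grows quadratically with the number of sequences.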