
Calculates the pairwise p-epitope distances between all sequences in a character vector, using defined epitope sites for a given influenza subtype.

Usage

dist_pepi(
  seqs,
  subtype,
  mode = "dominant",
  harmonize_b_lineages = TRUE,
  ambiguous_residues = "xX?"
)

Arguments

seqs

Named character vector of amino acid sequences. Each string must be of equal length and represent a full-length HA protein sequence.

subtype

Character. Influenza subtype or lineage. Currently allowed values are `"H1N1"`, `"A(H3N2)"`, `"B/Yamagata"`, `"B/Victoria"`, and `"B/Presplit"`. If `harmonize_b_lineages` is `TRUE`, you can also specify just `"B"`.

mode

Character. How to summarize the epitope-wise distances. Options:

- `"dominant"`/`"max"`: return the maximum epitope distance (Gupta et al. 2006).
- `"anderson"`: average of normalized distances (Anderson et al. 2018).
- `"all"`/`"average"`/`"mean"`: mean over all epitope residues (Pan et al. 2010).
- `"median"`: median of per-epitope distances.
- `NULL` or empty string: returns the full vector of distances.
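To make the difference between these summaries concrete, here is a minimal sketch in base R. It is not the package's implementation: `summarize_epitopes()`, `n_sites`, and `n_diffs` are hypothetical names, and the per-epitope counts are illustrative values for a single pair of sequences. The `"anderson"` mode is left out because its normalization is not specified on this page.

```r
# Illustrative per-epitope data for one pair of sequences (not from the package).
n_sites <- c(A = 19, B = 22, C = 27, D = 41, E = 22)  # residues per epitope
n_diffs <- c(A = 2,  B = 5,  C = 0,  D = 3,  E = 1)   # substitutions per epitope

# Hypothetical helper: summarize per-epitope distances according to `mode`.
summarize_epitopes <- function(n_diffs, n_sites, mode = "dominant") {
  # Per-epitope distance: fraction of substituted residues in each epitope.
  p_epi <- n_diffs / n_sites
  # NULL or "" returns the full vector of per-epitope distances.
  if (is.null(mode) || identical(mode, "")) return(p_epi)
  switch(mode,
    dominant = ,
    max = max(p_epi),
    # Mean over all epitope residues: pool substitutions across epitopes.
    all = ,
    average = ,
    mean = sum(n_diffs) / sum(n_sites),
    median = stats::median(p_epi),
    stop("unsupported mode in this sketch: ", mode)
  )
}

summarize_epitopes(n_diffs, n_sites, "dominant")  # largest per-epitope fraction
summarize_epitopes(n_diffs, n_sites, "all")       # pooled over all epitope sites
summarize_epitopes(n_diffs, n_sites, "median")    # median of per-epitope fractions
```

With these counts, the dominant summary is driven entirely by epitope B, while the pooled summary dilutes that signal across all epitope residues.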

harmonize_b_lineages

Logical. If `TRUE`, harmonizes B lineages using a unified epitope definition. Defaults to `TRUE`.

ambiguous_residues

Character vector. Residue symbols to exclude from comparison (e.g. `"xX?"`, the default).

Value

A symmetric numeric matrix of p-epitope distances with sequence names as row and column names.

Details

The function computes the lower triangle of a distance matrix using the `pepitope()` function, then mirrors it to fill the full matrix. Ambiguous residues are removed from each pairwise comparison before computing distance.
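The fill-and-mirror pattern described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `toy_pair_dist()` is a hypothetical stand-in for `pepitope()` that simply returns the fraction of differing positions, and it assumes that "removing" ambiguous residues means dropping every position where either sequence carries one of the symbols in `ambiguous_residues`.

```r
# Hypothetical stand-in for pepitope(): fraction of differing positions between
# two equal-length sequences, skipping positions that are ambiguous in either.
toy_pair_dist <- function(a, b, ambiguous_residues = "xX?") {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  amb <- strsplit(ambiguous_residues, "")[[1]]
  keep <- !(a %in% amb | b %in% amb)
  if (!any(keep)) return(NA_real_)
  mean(a[keep] != b[keep])
}

# Hypothetical helper: compute the lower triangle, then mirror it.
toy_dist_matrix <- function(seqs, ...) {
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      d[i, j] <- toy_pair_dist(seqs[[i]], seqs[[j]], ...)
    }
  }
  # Copy the lower triangle into the upper triangle to make d symmetric.
  d[upper.tri(d)] <- t(d)[upper.tri(d)]
  d
}

# Toy sequences (far shorter than a real HA protein); "X" marks an ambiguous residue.
seqs <- c(s1 = "MKTII", s2 = "MKTIX", s3 = "MRTLI")
d <- toy_dist_matrix(seqs)
d
```

Computing only the lower triangle halves the number of pairwise comparisons, which matters because the cost grows quadratically with the number of sequences.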

References

- Gupta et al. (2006), PMID: 16460844
- Pan et al. (2010), PMID: 21123189
- Anderson et al. (2018), PMID: 29433425

See also

`pepitope()`