Computes pairwise substitution distances between aligned amino acid sequences in a vector, using a specified substitution matrix. Only the lower triangle of the distance matrix is calculated to reduce redundant computations.

Usage

dist_substitution(seqs, method = "grantham", ambiguous_residues = "xX?-")

Arguments

seqs

A named character vector of aligned amino acid sequences. All sequences must have the same length, and every element must have a name.

method

Character string specifying the substitution matrix to use. Supported values are "grantham" and "flu" (case-insensitive).

ambiguous_residues

A character string whose individual characters are treated as ambiguous residues and removed before distances are computed. The default is "xX?-".

Value

A symmetric numeric matrix of pairwise mean substitution distances, with rows and columns named after the sequences in seqs and zeros on the diagonal.
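
As a rough illustration of what "mean substitution distance" means, the sketch below recomputes the distance between the two influenza A sequences from the Examples section by hand. It assumes that the distance is the per-site substitution score averaged over the full alignment length, which is consistent with the example output but is not confirmed to be the package's internal code. The lookup table `grantham_subset` is a hypothetical helper holding only the six Grantham (1974) scores needed for these two sequences, not the full matrix.

```r
# Two aligned sequences taken from the Examples section.
a <- "mktiialsyifclvlgqdfpgndnstat"
b <- "mktiialsnilclvfaqkipgndnstat"

# Split into single residues (upper-cased) and locate mismatching sites.
ra <- strsplit(toupper(a), "")[[1]]
rb <- strsplit(toupper(b), "")[[1]]
mismatch <- which(ra != rb)
pairs <- paste0(ra[mismatch], rb[mismatch])

# Hypothetical lookup: Grantham scores for only the residue pairs that
# occur between these two sequences (the full matrix is 20 x 20).
grantham_subset <- c(YN = 143, FL = 22, LF = 22, GA = 60, DK = 101, FI = 21)

# Identical sites score 0; mismatching sites take the looked-up score.
site_scores <- numeric(length(ra))
site_scores[mismatch] <- grantham_subset[pairs]

# Assumption: the reported distance is the mean over all aligned sites.
mean_dist <- mean(site_scores)
round(mean_dist, 5)
#> [1] 13.17857
```

The result matches the A/H1N1 versus A/H3N2 entry in the example output. These sequences contain no ambiguous residues, so the sketch says nothing about how removed residues affect the denominator.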

Details

Only the lower triangle of the matrix is computed to avoid redundant calculations. The diagonal is set to zero. The matrix is then symmetrized before being returned.
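
The lower-triangle strategy described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `pairwise_lower()` and `toy_distance()` are hypothetical names, and the toy scorer uses the fraction of mismatching sites in place of a real substitution matrix.

```r
# Hypothetical stand-in for a substitution-matrix score:
# the fraction of aligned sites at which two sequences differ.
toy_distance <- function(a, b) {
  ra <- strsplit(a, "")[[1]]
  rb <- strsplit(b, "")[[1]]
  mean(ra != rb)
}

# Fill only the lower triangle, then mirror it into the upper triangle.
pairwise_lower <- function(seqs, dist_fun) {
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)[-1]) {        # rows 2..n
    for (j in seq_len(i - 1)) {      # columns 1..(i - 1)
      d[i, j] <- dist_fun(seqs[[i]], seqs[[j]])
    }
  }
  # The upper triangle and diagonal are still 0, so adding the
  # transpose symmetrizes the matrix and leaves the diagonal at 0.
  d + t(d)
}

seqs <- c(s1 = "ACDE", s2 = "ACDF", s3 = "GCDF")
d <- pairwise_lower(seqs, toy_distance)
isSymmetric(d)
#> [1] TRUE
```

For n sequences this evaluates the scoring function n(n - 1)/2 times rather than n^2 times, which is where the saving comes from.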

Examples

seqs <- c(
  "A/H1N1/South Carolina/1/1918" = "mktiialsyifclvlgqdfpgndnstat",
  "A/H3N2/Darwin/9/2021" = "mktiialsnilclvfaqkipgndnstat",
  "B/Sichuan/379/1999" = "drictgitssnsphvvktatqgevnvtg"
)
dist_substitution(seqs, method = "grantham")
#>                              A/H1N1/South Carolina/1/1918 A/H3N2/Darwin/9/2021
#> A/H1N1/South Carolina/1/1918                      0.00000             13.17857
#> A/H3N2/Darwin/9/2021                             13.17857              0.00000
#> B/Sichuan/379/1999                               86.78571             81.21429
#>                              B/Sichuan/379/1999
#> A/H1N1/South Carolina/1/1918           86.78571
#> A/H3N2/Darwin/9/2021                   81.21429
#> B/Sichuan/379/1999                      0.00000