
Pairwise Substitution Distances Between Amino Acid Sequences
Source: R/substitution.R
dist_substitution.Rd
Computes pairwise substitution distances between aligned amino acid sequences in a vector, using a specified substitution matrix. Only the lower triangle of the distance matrix is calculated to reduce redundant computations.
Arguments
- seqs
A named character vector of aligned amino acid sequences. All sequences must be of equal length and named.
- method
Character string specifying the substitution matrix to use. Supported values are "grantham" and "flu" (case-insensitive).
- ambiguous_residues
A character string of ambiguous residues to remove before computing distances.
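The requirements on seqs (named, equal length) can be checked before calling the function. The following is a minimal sketch of such a check; check_seqs is a hypothetical helper written for illustration and is not part of the package, which may validate its input differently.

```r
# Hypothetical pre-flight check (not part of the package):
# verifies that `seqs` is a named character vector of equal-length strings.
check_seqs <- function(seqs) {
  if (!is.character(seqs)) {
    stop("seqs must be a character vector")
  }
  nms <- names(seqs)
  if (is.null(nms) || anyNA(nms) || any(nms == "")) {
    stop("all sequences must be named")
  }
  if (length(unique(nchar(seqs))) > 1) {
    stop("all sequences must have equal length")
  }
  invisible(seqs)
}

# Passes silently: both sequences are named and three residues long.
check_seqs(c(a = "mkt", b = "mkv"))

# Signals an error: the second sequence is shorter than the first.
msg <- tryCatch(
  check_seqs(c(a = "mkt", b = "mk")),
  error = function(e) conditionMessage(e)
)
```

Running a check like this first makes alignment problems surface with a clear message rather than as an unexpected distance value.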
Details
Only the lower triangle of the matrix is computed to avoid redundant calculations. The diagonal is set to zero. The matrix is then symmetrized before being returned.
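The lower-triangle strategy described above can be sketched as follows. This is an illustration only, not the package's implementation: pairwise_lower and mismatch_fraction are hypothetical names, and the fraction of mismatched positions stands in for a real substitution-matrix lookup such as "grantham".

```r
# Hypothetical sketch: fill only the lower triangle, then mirror it.
pairwise_lower <- function(seqs, pair_dist) {
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {  # j < i, so only entries below the diagonal
      d[i, j] <- pair_dist(seqs[[i]], seqs[[j]])
    }
  }
  # The upper triangle and diagonal are still zero, so adding the
  # transpose mirrors the lower triangle without double counting.
  d + t(d)
}

# Stand-in distance (an assumption): fraction of positions that differ.
mismatch_fraction <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  mean(x != y)
}

toy <- c(s1 = "mkt", s2 = "mkv", s3 = "aaa")
m <- pairwise_lower(toy, mismatch_fraction)
isSymmetric(m)  # TRUE
```

For n sequences this evaluates the pairwise distance n(n - 1)/2 times instead of n^2 times, which is the saving the lower-triangle approach is after.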
Examples
seqs <- c(
"A/H1N1/South Carolina/1/1918" = "mktiialsyifclvlgqdfpgndnstat",
"A/H3N2/Darwin/9/2021" = "mktiialsnilclvfaqkipgndnstat",
"B/Sichuan/379/1999" = "drictgitssnsphvvktatqgevnvtg"
)
dist_substitution(seqs, method = "grantham")
#>                              A/H1N1/South Carolina/1/1918 A/H3N2/Darwin/9/2021
#> A/H1N1/South Carolina/1/1918                      0.00000             13.17857
#> A/H3N2/Darwin/9/2021                             13.17857              0.00000
#> B/Sichuan/379/1999                               86.78571             81.21429
#>                              B/Sichuan/379/1999
#> A/H1N1/South Carolina/1/1918           86.78571
#> A/H3N2/Darwin/9/2021                   81.21429
#> B/Sichuan/379/1999                      0.00000