Computes the average substitution distance between two aligned amino acid sequences based on a specified substitution matrix. Currently supports the Grantham and FLU matrices.

Usage

substitution(seq1, seq2, method = "grantham", ambiguous_residues = "xX?-")

Arguments

seq1

A character string representing the first aligned amino acid sequence.

seq2

A character string representing the second aligned amino acid sequence.

method

Character string specifying the substitution matrix to use. Supported values are "grantham" and "flu" (case-insensitive).

ambiguous_residues

A character string listing the characters treated as ambiguous residues. These are removed from the sequences before the distance is computed. Defaults to "xX?-".

Value

A numeric scalar representing the mean substitution distance between seq1 and seq2.

Details

The function first removes ambiguous residues from both sequences using remove_ambiguous_residues, then validates the remaining residues and looks up the distance for each aligned residue pair in the chosen substitution matrix. The distances are summed and divided by the sequence length to give the mean.
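
The steps described above can be sketched in a few lines of R. This is an illustration only, not the package's implementation: the helper name mean_substitution_sketch and the matrix toy_matrix are hypothetical, the matrix values are made up rather than taken from Grantham or FLU, and the sketch assumes that an alignment column is dropped whenever either sequence is ambiguous at that position.

```r
# Toy symmetric distance matrix over three residues.
# Illustrative values only; these are NOT real Grantham or FLU entries.
toy_matrix <- matrix(
  c(0, 2, 4,
    2, 0, 6,
    4, 6, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "G", "S"), c("A", "G", "S"))
)

# Hypothetical helper: mean per-site distance between two aligned sequences.
mean_substitution_sketch <- function(seq1, seq2, dist_matrix,
                                     ambiguous_residues = "xX?-") {
  # Aligned sequences must have the same number of columns.
  stopifnot(nchar(seq1) == nchar(seq2))

  r1 <- strsplit(seq1, "", fixed = TRUE)[[1]]
  r2 <- strsplit(seq2, "", fixed = TRUE)[[1]]
  ambiguous <- strsplit(ambiguous_residues, "", fixed = TRUE)[[1]]

  # Assumption: drop a column if either sequence is ambiguous there,
  # so the two sequences stay aligned after filtering.
  keep <- !(r1 %in% ambiguous | r2 %in% ambiguous)
  r1 <- r1[keep]
  r2 <- r2[keep]

  # Validate: every remaining residue must be a row/column of the matrix.
  if (!all(c(r1, r2) %in% rownames(dist_matrix))) {
    stop("unsupported residue in input")
  }
  if (length(r1) == 0) {
    return(NA_real_)
  }

  # A two-column character matrix picks one (row, column) entry per site.
  mean(dist_matrix[cbind(r1, r2)])
}

mean_substitution_sketch("AGS-A", "AGAXS", toy_matrix)
```

With these toy values the fourth column is dropped because it contains "-" and "X", and the remaining per-site distances 0, 0, 4 and 4 average to 2. Whether the real function normalizes by the filtered or the original length is not stated here; the sketch uses the filtered length.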

Support for additional matrices, such as BLOSUM and Sneath's index, is planned. If you need a specific substitution matrix, please let us know.

References

- Grantham, R. (1974). Amino acid difference formula to help explain protein evolution. Science, 185(4154), 862-864. doi:10.1126/science.185.4154.862

- Dang, C.C., Le, Q.S., Gascuel, O., & Le, V.S. (2010). FLU, an amino acid substitution model for influenza proteins. BMC Evolutionary Biology, 10, 99. doi:10.1186/1471-2148-10-99