
This function removes positions containing specified ambiguous residues (e.g., "X") from a character vector of aligned sequences. Removal is performed listwise: any position containing an ambiguous residue in any sequence is removed from all sequences, so the sequences remain aligned.

Usage

remove_ambiguous_residues(seqs, ambiguous_residues = "xX?")

Arguments

seqs

A character vector of aligned amino acid sequences. Sequences may optionally be named.

ambiguous_residues

A length-1 character string where each character represents a residue to treat as ambiguous. Defaults to `"xX?"`.

Value

A character vector of the same length as `seqs`, with the same names (if any), where all positions containing ambiguous residues in any sequence have been removed.

Details

If `ambiguous_residues` is an empty string `""`, no residues will be removed.

Some distance metrics also require gapped positions to be removed. A common modification for this is to add the gap character to `ambiguous_residues`, e.g., `"xX?-"`.
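The listwise behaviour described here can be illustrated with a minimal base R sketch. This is not the package's source: `sketch_remove_ambiguous()` is a hypothetical stand-in written only to show one way the documented behaviour could be reproduced, and it assumes all sequences have the same width.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
sketch_remove_ambiguous <- function(seqs, ambiguous_residues = "xX?") {
  # Nothing to do for an empty residue set or an empty input.
  if (!nzchar(ambiguous_residues) || length(seqs) == 0L) return(seqs)

  # Assumption: aligned sequences all share one width.
  stopifnot(length(unique(nchar(seqs))) == 1L)

  # Split the single residue string into individual characters.
  ambiguous <- strsplit(ambiguous_residues, "", fixed = TRUE)[[1]]

  # Build a sequences-by-positions character matrix.
  residues <- do.call(rbind, strsplit(seqs, "", fixed = TRUE))

  # Keep a column only if no sequence has an ambiguous residue there.
  is_ambiguous <- matrix(residues %in% ambiguous, nrow = nrow(residues))
  keep <- colSums(is_ambiguous) == 0

  # Reassemble each row and restore the original names (if any).
  out <- vapply(
    seq_len(nrow(residues)),
    function(i) paste(residues[i, keep], collapse = ""),
    character(1)
  )
  names(out) <- names(seqs)
  out
}

gapped <- c(a = "AC-XFG", b = "AXCDFG", c = "ACDYFG")

# Default: positions 2 and 4 are dropped; the gap at position 3 is kept.
sketch_remove_ambiguous(gapped)
# c(a = "A-FG", b = "ACFG", c = "ADFG")

# Treating "-" as ambiguous also drops position 3.
sketch_remove_ambiguous(gapped, "xX?-")
# c(a = "AFG", b = "AFG", c = "AFG")

# An empty string leaves the input unchanged.
sketch_remove_ambiguous(gapped, "")
```

Working on a character matrix keeps the column-wise test explicit: a position survives only if every sequence passes, which is what keeps the output aligned.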

Examples

seqs <- c(a = "ACDXFG", b = "AXCXFG", c = "ACDYFG")
remove_ambiguous_residues(seqs)
#>      a      b      c 
#> "ADFG" "ACFG" "ADFG" 
# Positions 2 and 4 contain "X" in at least one sequence, so they are removed from all three.
# Returns c(a = "ADFG", b = "ACFG", c = "ADFG")