
Downstream alignment and phylogenetics

QIIME 2 contains tools for sequence alignment, alignment filtering, and phylogenetic reconstruction in the q2-alignment and q2-phylogeny plugins. These are not installed with genome-sampler by default, but can be accessed by installing the QIIME 2 amplicon distribution.

QIIME 2 often wraps other widely used tools in plugins rather than reimplementing them. For example, under the hood, genome-sampler’s sample-diversity action uses vsearch (Rognes et al., 2016). In this document we’ll build an alignment with MAFFT (Katoh & Standley, 2013), apply a pre-computed alignment position mask, and then build a phylogenetic tree with IQ-TREE 2 (Minh et al., 2020).

Obtain reference sequence and alignment mask

wget -O alignment-mask.qza https://raw.githubusercontent.com/caporaso-lab/genome-sampler/r2020.8/snakemake/tutorial-data/alignment-mask.qza
wget -O sarscov2-reference-genome.qza https://raw.githubusercontent.com/caporaso-lab/genome-sampler/r2020.8/snakemake/tutorial-data/sarscov2-reference-genome.qza
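
A failed or interrupted download can leave an empty or truncated file behind, and the problem may only surface in a later step. The sketch below is one way to sanity-check the two files before continuing. It relies on QIIME 2 artifacts being zip archives, so a valid file should begin with the zip signature `PK`. The function names `is_zip_like` and `check_artifacts` are hypothetical helpers introduced here, not part of QIIME 2 or genome-sampler, and the check does not validate the archive contents.

```shell
#!/bin/sh
# Sanity-check downloaded QIIME 2 artifacts before using them.
# Assumption: a .qza file is a zip archive, so it should start with the
# two-byte zip signature "PK". This does not validate the contents.

# is_zip_like FILE: succeed if FILE exists, is non-empty, and starts with "PK".
is_zip_like() {
    [ -s "$1" ] || return 1
    [ "$(head -c 2 "$1")" = "PK" ]
}

# check_artifacts FILE...: report on each file; return non-zero if any fails.
check_artifacts() {
    rc=0
    for f in "$@"; do
        if is_zip_like "$f"; then
            echo "ok: $f"
        else
            echo "problem: $f is missing, empty, or not a zip archive" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_artifacts alignment-mask.qza sarscov2-reference-genome.qza` should print an `ok:` line for each file if both downloads succeeded. For a more thorough check, `qiime tools validate` can be run on each artifact.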

Align sequences and build a tree

First, we’ll add the SARS-CoV-2 reference sequence to the sequence collection obtained in the tutorial. Notice that we’re working with the .qza file that was created in that tutorial, not the .fasta file that we exported.

qiime feature-table merge-seqs \
  --i-data sequences.qza \
  --i-data sarscov2-reference-genome.qza \
  --o-merged-data sequences-w-ref.qza

Next, we’ll perform sequence alignment using MAFFT.

qiime alignment mafft \
  --i-sequences sequences-w-ref.qza \
  --o-alignment aligned-sequences-w-ref.qza

After aligning the sequences, a “mask” can be applied to filter positions from the alignment that are likely to be uninformative. At present, we’re experimenting with a pre-computed alignment mask.

qiime genome-sampler mask \
  --i-alignment aligned-sequences-w-ref.qza \
  --i-mask alignment-mask.qza \
  --o-masked-alignment masked-aligned-sequences-w-ref.qza

Finally, we build a tree from the resulting alignment. This will generate an unrooted phylogenetic tree.

qiime phylogeny iqtree \
  --i-alignment masked-aligned-sequences-w-ref.qza \
  --o-tree unrooted-tree.qza

This .qza file can be viewed directly with iTOL to get a quick look.
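
The four commands in this section can also be collected into one script so that the workflow stops at the first failing step. The following is a minimal sketch that reuses the commands and file names exactly as given above. The `run` and `build_tree` helpers and the `DRY_RUN` variable are assumptions introduced here for illustration. With `DRY_RUN=1` the commands are only printed, which is a convenient way to review them before starting a long alignment.

```shell
#!/bin/sh
# run CMD...: echo the command, then execute it unless DRY_RUN=1.
# DRY_RUN is a hypothetical variable used only by this sketch.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# build_tree: merge the reference, align, mask, and infer an unrooted tree.
# Each step must succeed before the next one starts.
build_tree() {
    run qiime feature-table merge-seqs \
        --i-data sequences.qza \
        --i-data sarscov2-reference-genome.qza \
        --o-merged-data sequences-w-ref.qza || return 1

    run qiime alignment mafft \
        --i-sequences sequences-w-ref.qza \
        --o-alignment aligned-sequences-w-ref.qza || return 1

    run qiime genome-sampler mask \
        --i-alignment aligned-sequences-w-ref.qza \
        --i-mask alignment-mask.qza \
        --o-masked-alignment masked-aligned-sequences-w-ref.qza || return 1

    run qiime phylogeny iqtree \
        --i-alignment masked-aligned-sequences-w-ref.qza \
        --o-tree unrooted-tree.qza
}
```

After sourcing the script, set `DRY_RUN=1` and call `build_tree` to print the four commands, or leave `DRY_RUN` unset to run them in order.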

All of the .qza files that were generated in this example can be exported using qiime tools export. Exporting sequence or alignment files produces FASTA files by default, and exporting the phylogenetic tree produces a Newick file by default. See the QIIME 2 exporting documentation for more details.
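
As a concrete illustration of the export step, the sketch below prints one `qiime tools export` command per artifact. The `export_command` helper and the `exported-<name>` output directory convention are assumptions made here, not QIIME 2 defaults, and the helper assumes file names without whitespace.

```shell
#!/bin/sh
# export_command ARTIFACT: print a `qiime tools export` command that writes
# ARTIFACT's contents to a directory named exported-<artifact name>.
# The directory naming is a convention chosen for this sketch.
export_command() {
    outdir="exported-$(basename "$1" .qza)"
    printf 'qiime tools export --input-path %s --output-path %s\n' "$1" "$outdir"
}

# Print export commands for the masked alignment and the tree built above.
for artifact in \
    masked-aligned-sequences-w-ref.qza \
    unrooted-tree.qza
do
    export_command "$artifact"
done
```

The printed commands can be reviewed first and then run, for example by piping the script's output to `sh`.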

References
  1. Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4, e2584. 10.7717/peerj.2584
  2. Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol., 30(4), 772–780.
  3. Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., & Lanfear, R. (2020). IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol., 37(5), 1530–1534.