QIIME 2 contains tools for sequence alignment, alignment filtering, and phylogenetic reconstruction in the q2-alignment and q2-phylogeny plugins. These are not installed with genome-sampler by default, but can be accessed by installing the QIIME 2 amplicon distribution.
QIIME 2 often wraps other widely used tools in QIIME 2 plugins rather than implement them directly.
For example, under the hood, genome-sampler’s sample-diversity action uses vsearch (Rognes et al., 2016).
In this document we’ll build an alignment with MAFFT (Katoh & Standley, 2013), apply a pre-computed alignment position mask, and then build a phylogenetic tree with IQ-TREE 2 (Minh et al., 2020).
Obtain reference sequence and alignment mask
wget -O alignment-mask.qza https://raw.githubusercontent.com/caporaso-lab/genome-sampler/r2020.8/snakemake/tutorial-data/alignment-mask.qza
wget -O sarscov2-reference-genome.qza https://raw.githubusercontent.com/caporaso-lab/genome-sampler/r2020.8/snakemake/tutorial-data/sarscov2-reference-genome.qza
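Before moving on, it can be worth confirming that both downloads succeeded. The following is a minimal sketch: `check_artifacts` is a hypothetical helper name (not part of QIIME 2 or genome-sampler) that rejects missing or empty files and then calls `qiime tools peek`, which prints an artifact's UUID, type, and data format.

```shell
# Hypothetical helper; the name check_artifacts is illustrative only.
check_artifacts() {
  local artifact
  for artifact in "$@"; do
    # A failed download often leaves a missing or zero-byte file.
    if [ ! -s "$artifact" ]; then
      echo "missing or empty: $artifact" >&2
      return 1
    fi
    # Print the artifact's UUID, semantic type, and data format.
    qiime tools peek "$artifact" || return 1
  done
}
```

It could be called as `check_artifacts alignment-mask.qza sarscov2-reference-genome.qza`.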
Align sequences and build a tree
First, we’ll add the SARS-CoV-2 reference sequence to the sequence collection obtained in the tutorial.
Notice that we’re working with the .qza file that was created in that tutorial, not the .fasta file that we exported.
qiime feature-table merge-seqs \
--i-data sequences.qza \
--i-data sarscov2-reference-genome.qza \
--o-merged-data sequences-w-ref.qza
Next, we’ll perform sequence alignment using MAFFT.
qiime alignment mafft \
--i-sequences sequences-w-ref.qza \
--o-alignment aligned-sequences-w-ref.qza
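Aligning many full genomes can be slow, so the alignment step may benefit from multiple threads. The sketch below wraps the same command in a hypothetical function, `align_sequences`, and passes `--p-n-threads`. It assumes your installed q2-alignment version provides that option, which `qiime alignment mafft --help` will confirm.

```shell
# Hypothetical wrapper; the function name and argument order are illustrative.
# Usage: align_sequences INPUT.qza OUTPUT.qza [THREADS]
align_sequences() {
  local input="$1" output="$2" threads="${3:-1}"
  # Same MAFFT call as above, with an explicit thread count (default 1).
  qiime alignment mafft \
    --i-sequences "$input" \
    --p-n-threads "$threads" \
    --o-alignment "$output"
}
```

For example, `align_sequences sequences-w-ref.qza aligned-sequences-w-ref.qza 4` would request four threads.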
After aligning the sequences, a “mask” can be applied to filter out alignment positions that are likely to be uninformative. At present, we’re experimenting with a [pre-computed alignment mask](#alignment-mask).
qiime genome-sampler mask \
--i-alignment aligned-sequences-w-ref.qza \
--i-mask alignment-mask.qza \
--o-masked-alignment masked-aligned-sequences-w-ref.qza
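To make the idea of a position mask concrete, here is a toy sketch. It is not genome-sampler's implementation and does not read .qza files. It assumes a hypothetical mask written as a string of 1s (keep the column) and 0s (drop the column), one character per alignment column, and an aligned FASTA file with each sequence on a single line.

```shell
# Conceptual illustration only; mask_columns is a hypothetical name.
# Usage: mask_columns MASK < aligned.fasta
mask_columns() {
  awk -v mask="$1" '
    /^>/ { print; next }                 # header lines pass through unchanged
    {
      out = ""
      for (i = 1; i <= length($0); i++)  # walk the alignment columns
        if (substr(mask, i, 1) == "1")   # keep only columns flagged with 1
          out = out substr($0, i, 1)
      print out
    }'
}
```

Every sequence loses the same columns, so the masked sequences remain aligned to one another.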
Finally, we build a tree from the resulting alignment. This will generate an unrooted phylogenetic tree.
qiime phylogeny iqtree \
--i-alignment masked-aligned-sequences-w-ref.qza \
--o-tree unrooted-tree.qza
This .qza file can be viewed directly with iTOL to get a quick look.
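Tree searches in IQ-TREE are stochastic, so repeated runs can produce slightly different trees. If reproducibility matters, the tree-building step could be wrapped as in the sketch below. `build_tree` is a hypothetical function name, the default seed of 42 is arbitrary, and `--p-seed` is assumed to be available in your installed q2-phylogeny version (check `qiime phylogeny iqtree --help`).

```shell
# Hypothetical wrapper; the function name and default seed are illustrative.
# Usage: build_tree ALIGNMENT.qza TREE.qza [SEED]
build_tree() {
  local alignment="$1" tree="$2" seed="${3:-42}"
  # Same IQ-TREE call as above, with a fixed random seed.
  qiime phylogeny iqtree \
    --i-alignment "$alignment" \
    --p-seed "$seed" \
    --o-tree "$tree"
}
```

For example: `build_tree masked-aligned-sequences-w-ref.qza unrooted-tree.qza 2020`.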
All of the .qza files that were generated in this example can be exported using qiime tools export.
Exporting sequence or alignment files produces FASTA files by default, and exporting the phylogenetic tree produces a Newick file by default.
See the QIIME 2 exporting documentation for more details.
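As a minimal sketch of exporting several artifacts in one pass, the hypothetical helper below (`export_artifacts` is not a QIIME 2 command) runs `qiime tools export` once per artifact. It writes each export to its own subdirectory, named after the artifact file.

```shell
# Hypothetical helper; the name and directory layout are illustrative.
# Usage: export_artifacts OUTPUT_DIR ARTIFACT.qza [ARTIFACT.qza ...]
export_artifacts() {
  local outdir="$1" artifact name
  shift
  for artifact in "$@"; do
    # Strip any leading path and the .qza suffix to name the subdirectory.
    name=$(basename "$artifact" .qza)
    qiime tools export \
      --input-path "$artifact" \
      --output-path "$outdir/$name" || return 1
  done
}
```

For example, `export_artifacts exported masked-aligned-sequences-w-ref.qza unrooted-tree.qza` would place the exported files under `exported/masked-aligned-sequences-w-ref/` and `exported/unrooted-tree/`.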
- Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4, e2584. doi:10.7717/peerj.2584
- Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol., 30(4), 772–780.
- Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., & Lanfear, R. (2020). IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol., 37(5), 1530–1534.