genome-sampler label-seqs
Modifies sequence identifiers by either adding or removing metadata. If metadata and one or more columns are provided, the values in the specified metadata columns are appended to each sequence id, following the original id and separated by the delimiter. If neither metadata nor columns are provided, the first occurrence of the delimiter and all characters following it are removed from every sequence id.
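The two modes described above can be sketched in plain Python. This is a minimal illustration, not genome-sampler's implementation: the function name relabel_ids, the use of a list of id strings in place of a FeatureData artifact, the dict-of-dicts metadata, and the error raised when only one of metadata/columns is given are all hypothetical simplifications.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified model of label-seqs: sequence ids are plain
# strings and metadata is a dict mapping id -> {column: value}.
# This is NOT the genome-sampler API, only an illustration of the
# two modes (add metadata / remove metadata).
def relabel_ids(
    seq_ids: List[str],
    delimiter: str,
    metadata: Optional[Dict[str, Dict[str, str]]] = None,
    columns: Optional[List[str]] = None,
) -> List[str]:
    if metadata is not None and columns:
        # Add mode: original id first, then each requested column value,
        # all joined by the delimiter. Assumes every id has every column.
        return [
            delimiter.join([sid] + [metadata[sid][col] for col in columns])
            for sid in seq_ids
        ]
    if metadata is not None or columns:
        # Assumption: supplying only one of metadata/columns is treated as
        # an error here; the plugin's behavior in this case is not documented.
        raise ValueError("metadata and columns must be provided together")
    # Remove mode: drop the first delimiter and everything after it.
    return [sid.split(delimiter, 1)[0] for sid in seq_ids]


metadata = {
    "seq1": {"country": "USA", "date": "2020-03-01"},
    "seq2": {"country": "Canada", "date": "2020-03-05"},
}
labeled = relabel_ids(["seq1", "seq2"], "|", metadata, ["country", "date"])
print(labeled)                    # ['seq1|USA|2020-03-01', 'seq2|Canada|2020-03-05']
print(relabel_ids(labeled, "|"))  # ['seq1', 'seq2']
```

Because removal cuts at the first delimiter, this round trip recovers the original ids as long as the ids themselves do not contain the chosen delimiter.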
Citations
Inputs
- seqs:
FeatureData[Sequence¹ | AlignedSequence²] The sequences to be re-labeled. [required]
Parameters
- delimiter:
Str%Choices('|', ',', '+', ':', ';') The delimiter between the sequence id and each metadata entry. [required]
- metadata:
Metadata The metadata to embed in the sequence header. [optional]
- columns:
List[Str] The metadata columns whose values are appended to each sequence id. [optional]
- missing_value:
Str Value used to indicate a missing metadata column value for a sequence. [default: 'missing']
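How missing_value might be applied when building a single label is sketched below. The helper build_label and the rule for what counts as missing (an id absent from the metadata, a column absent for that id, or a None value) are assumptions for illustration; the documentation above does not specify how the plugin detects missing values.

```python
from typing import Dict, List, Optional

# Hypothetical sketch of how missing_value could fill gaps when building
# one label; not the plugin's actual code. A value counts as missing if
# the sequence id is absent from the metadata, the column is absent for
# that id, or the stored value is None (all assumptions).
def build_label(
    seq_id: str,
    delimiter: str,
    metadata: Dict[str, Dict[str, Optional[str]]],
    columns: List[str],
    missing_value: str = "missing",
) -> str:
    row = metadata.get(seq_id, {})
    values = []
    for col in columns:
        value = row.get(col)
        # Substitute the placeholder for anything treated as missing.
        values.append(missing_value if value is None else value)
    return delimiter.join([seq_id] + values)


metadata = {"seq1": {"country": "USA", "date": None}}
print(build_label("seq1", "|", metadata, ["country", "date"]))
# seq1|USA|missing
print(build_label("seq2", ";", metadata, ["country"], missing_value="NA"))
# seq2;NA
```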
Outputs
- labeled_seqs:
FeatureData[Sequence¹ | AlignedSequence²] The re-labeled sequences, of the same type as the input seqs. [required]
- Bolyen, E., Dillon, M. R., Bokulich, N. A., Ladner, J. T., Larsen, B. B., Hepp, C. M., Lemmer, D., Sahl, J. W., Sanchez, A., Holdgraf, C., Sewell, C., Choudhury, A. G., Stachurski, J., McKay, M., Engelthaler, D. M., Worobey, M., Keim, P., & Gregory Caporaso, J. (2020). Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity. F1000 Research, 9(657), 657. 10.12688/f1000research.24751.1