genome-sampler sample-longitudinal¶
Sample dates at random without replacement from each user-defined interval. Dates should be provided in ISO 8601 format (for example, 2020-03-15), both in the metadata and for start_date.
Citations¶
Inputs¶
- context_seqs:
FeatureData[Sequence] The context sequences to be sampled from. Providing this will restrict the IDs sampled to only those which have an associated sequence.[optional]
Parameters¶
- dates:
MetadataColumn[Categorical] Dates to sample from.[required]
- start_date:
Str Start date of first interval. Dates before this date will be excluded. The start date plus the days_per_interval defines the bounds of the sampling intervals. If not provided, this will default to the first date in metadata.[optional]
- samples_per_interval:
Int%Range(1, None) The number of random dates to select in each interval.[default: 7]
- days_per_interval:
Int%Range(1, None) The length of each interval in days.[default: 7]
- seed:
Int%Range(0, None) Seed used for random number generators.[optional]
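The way these parameters interact can be illustrated with a minimal Python sketch. This is not genome-sampler's implementation: the function name `sample_longitudinal`, the plain dict of ID-to-date strings, and the IDs in the example are all hypothetical. The sketch also assumes that an interval holding fewer dates than samples_per_interval contributes all of its dates, which the description above does not state.

```python
import random
from collections import defaultdict
from datetime import date


def sample_longitudinal(dates, start_date=None, samples_per_interval=7,
                        days_per_interval=7, seed=None):
    """Illustrative sketch only; not the plugin's actual code."""
    # Parse ISO 8601 dates (YYYY-MM-DD) keyed by ID.
    parsed = {id_: date.fromisoformat(d) for id_, d in dates.items()}
    # Default the start date to the earliest date in the metadata.
    start = date.fromisoformat(start_date) if start_date else min(parsed.values())

    # Group IDs by interval index; dates before the start date are excluded.
    intervals = defaultdict(list)
    for id_, d in parsed.items():
        if d >= start:
            intervals[(d - start).days // days_per_interval].append(id_)

    rng = random.Random(seed)
    selected = []
    for index in sorted(intervals):
        ids = sorted(intervals[index])  # stable order so a seed is reproducible
        # Assumption: take every ID when the interval has too few.
        k = min(samples_per_interval, len(ids))
        selected.extend(rng.sample(ids, k))  # sampling without replacement
    return selected


# Hypothetical metadata: s6 predates start_date and is never selected.
dates = {
    "s1": "2020-03-01", "s2": "2020-03-02", "s3": "2020-03-05",
    "s4": "2020-03-09", "s5": "2020-03-10", "s6": "2020-02-20",
}
picked = sample_longitudinal(dates, start_date="2020-03-01",
                             samples_per_interval=2, days_per_interval=7,
                             seed=42)
```

Here the first interval (2020-03-01 through 2020-03-07) holds three dates, of which two are drawn, and the second interval holds exactly two, so both are kept. Repeating the call with the same seed returns the same selection.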
Outputs¶
- selection:
FeatureData[Selection] The selected IDs (i.e., the subsampled dates).[required]
- Bolyen, E., Dillon, M. R., Bokulich, N. A., Ladner, J. T., Larsen, B. B., Hepp, C. M., Lemmer, D., Sahl, J. W., Sanchez, A., Holdgraf, C., Sewell, C., Choudhury, A. G., Stachurski, J., McKay, M., Engelthaler, D. M., Worobey, M., Keim, P., & Caporaso, J. G. (2020). Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity. F1000Research, 9, 657. https://doi.org/10.12688/f1000research.24751.1