
sample-longitudinal

genome-sampler sample-longitudinal

Sample dates at random, without replacement, from each user-defined interval. Dates must be provided in ISO 8601 format (for example, 2020-03-15), both in the metadata and for start_date.
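As a quick pre-flight check on the date format requirement, the following minimal sketch uses only the Python standard library to find metadata entries that are not valid ISO 8601 calendar dates. The helper names (parse_iso_date, find_invalid_dates) and the ID-to-date dictionary are hypothetical and are not part of genome-sampler; the plugin's own parsing may accept or reject slightly different inputs.

```python
from datetime import date


def parse_iso_date(text: str) -> date:
    """Parse a calendar date such as '2020-03-15'; raise ValueError otherwise."""
    return date.fromisoformat(text.strip())


def find_invalid_dates(dates_by_id: dict) -> list:
    """Return the IDs whose date string is not a valid ISO 8601 calendar date.

    `dates_by_id` is a hypothetical stand-in for the metadata column:
    a mapping of sequence ID to date string.
    """
    bad = []
    for sample_id, text in dates_by_id.items():
        try:
            parse_iso_date(text)
        except ValueError:
            bad.append(sample_id)
    return bad
```

Running find_invalid_dates on the date column before sampling can surface entries such as 03/15/2020 that would need to be rewritten as 2020-03-15.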

Citations

Bolyen et al., 2020

Inputs

context_seqs: FeatureData[Sequence]

The context sequences to be sampled from. If provided, only IDs that have an associated sequence are sampled. [optional]
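The effect of providing context_seqs can be pictured as a filter applied before sampling. The sketch below is illustrative only: restrict_to_sequenced and its plain-Python arguments are hypothetical stand-ins for the metadata column and the sequence IDs, not genome-sampler's API.

```python
def restrict_to_sequenced(dates_by_id: dict, sequence_ids) -> dict:
    """Keep only the entries whose ID also appears among the sequence IDs.

    `dates_by_id` maps ID -> date string; `sequence_ids` is any iterable of
    the IDs present in the context sequences (both are assumptions made
    for this illustration).
    """
    keep = set(sequence_ids)
    return {
        sample_id: d for sample_id, d in dates_by_id.items() if sample_id in keep
    }
```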

Parameters

dates: MetadataColumn[Categorical]

Dates to sample from. [required]

start_date: Str

Start date of the first interval. Dates before this date are excluded. The start date and days_per_interval together define the bounds of the sampling intervals. If not provided, this defaults to the earliest date in the metadata. [optional]
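To make the relationship between start_date and days_per_interval concrete, here is a minimal sketch of how a date could be assigned to an interval. It assumes consecutive, non-overlapping intervals that each cover exactly days_per_interval days beginning on the start date; whether genome-sampler draws its boundaries in exactly this way is an assumption, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def interval_index(d: date, start: date, days_per_interval: int) -> Optional[int]:
    """Return the zero-based interval containing `d`, or None if `d` is excluded."""
    if d < start:
        return None  # dates before the start date are not sampled
    return (d - start).days // days_per_interval


def interval_bounds(index: int, start: date, days_per_interval: int) -> Tuple[date, date]:
    """Return the first and last day (inclusive) of the given interval."""
    first = start + timedelta(days=index * days_per_interval)
    last = first + timedelta(days=days_per_interval - 1)
    return first, last
```

With a start date of 2020-01-01 and seven-day intervals, 2020-01-07 falls in interval 0 and 2020-01-08 opens interval 1 under these assumptions.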

samples_per_interval: Int % Range(1, None)

The number of random dates to select in each interval. [default: 7]

days_per_interval: Int % Range(1, None)

The length of each interval in days. [default: 7]

seed: Int % Range(0, None)

Seed used for the random number generator. [optional]
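The parameters above can be tied together in a small standard-library sketch of interval-based sampling without replacement. This is not genome-sampler's implementation: the function name, the dictionary input, the interval boundaries, and the choice to keep every ID when an interval holds fewer than samples_per_interval entries are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from datetime import date


def sample_longitudinal_sketch(dates_by_id, start_date=None,
                               samples_per_interval=7, days_per_interval=7,
                               seed=None):
    """Pick up to `samples_per_interval` IDs at random from each interval."""
    parsed = {i: date.fromisoformat(t) for i, t in dates_by_id.items()}
    if not parsed:
        return []
    # Assumption: without an explicit start date, use the earliest date.
    start = date.fromisoformat(start_date) if start_date else min(parsed.values())

    # Group IDs by interval; iterate in sorted order so that results do not
    # depend on the insertion order of the input mapping.
    buckets = defaultdict(list)
    for sample_id in sorted(parsed):
        d = parsed[sample_id]
        if d >= start:  # dates before the start date are excluded
            buckets[(d - start).days // days_per_interval].append(sample_id)

    rng = random.Random(seed)  # seed=None draws from system entropy
    selected = []
    for index in sorted(buckets):
        ids = buckets[index]
        # Assumption: sparse intervals contribute everything they contain.
        k = min(samples_per_interval, len(ids))
        selected.extend(rng.sample(ids, k))  # sampling without replacement
    return selected
```

Calling the function twice with the same inputs and the same seed returns the same selection, which is the reproducibility that a seed parameter is meant to provide.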

Outputs

selection: FeatureData[Selection]

The selected IDs (i.e., the subsampled dates). [required]

References
  1. Bolyen, E., Dillon, M. R., Bokulich, N. A., Ladner, J. T., Larsen, B. B., Hepp, C. M., Lemmer, D., Sahl, J. W., Sanchez, A., Holdgraf, C., Sewell, C., Choudhury, A. G., Stachurski, J., McKay, M., Engelthaler, D. M., Worobey, M., Keim, P., & Caporaso, J. G. (2020). Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity. F1000Research, 9, 657. 10.12688/f1000research.24751.1