miller/docs6/docs/date-time-examples.md.in
2021-08-25 22:01:32 -04:00

# Date/time examples
## How can I filter by date?
Given input like
GENMD_RUN_COMMAND
cat dates.csv
GENMD_EOF
we can use [strptime](reference-dsl-builtin-functions.md#strptime) to parse the date field into seconds since the epoch and then do numeric comparisons. The format string passed to `strptime` must match the date format used in your input data. For example:
GENMD_RUN_COMMAND
mlr --csv filter '
  strptime($date, "%Y-%m-%d") > strptime("2018-03-03", "%Y-%m-%d")
' dates.csv
GENMD_EOF
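The format-string matching described above can be tried on a small inline sample. This is a minimal sketch, assuming a hypothetical input whose dates are written month/day/year; the column names `date` and `event` and the two sample rows are invented for illustration and are not taken from `dates.csv`:

```shell
# Hypothetical sample rows with US-style dates (not from dates.csv).
# The field is parsed with "%m/%d/%Y" to match the data, while the
# literal cutoff is parsed with "%Y-%m-%d" to match how it is written.
filtered=$(printf 'date,event\n03/01/2018,before\n03/05/2018,after\n' |
  mlr --csv filter '
    strptime($date, "%m/%d/%Y") > strptime("2018-03-03", "%Y-%m-%d")
  ')
echo "$filtered"
```

Since both sides of the comparison are reduced to seconds since the epoch, the two format strings do not need to agree with each other, only with the text each one parses. Here only the 03/05/2018 row should pass the filter.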
Caveat: localtime-handling in timezones with DST is still a work in progress; see [https://github.com/johnkerl/miller/issues/170](https://github.com/johnkerl/miller/issues/170). See also [https://github.com/johnkerl/miller/issues/208](https://github.com/johnkerl/miller/issues/208) -- thanks @aborruso!
## Finding missing dates
Suppose you have some date-stamped data which may (or may not) be missing entries for one or more dates:
GENMD_RUN_COMMAND
head -n 10 data/miss-date.csv
GENMD_EOF
GENMD_RUN_COMMAND
wc -l data/miss-date.csv
GENMD_EOF
Since there are 1372 lines in the data file, some automation is called for. To find the missing dates, you can convert the dates to seconds since the epoch using `strptime`, then compute adjacent differences (the `cat -n` simply inserts record-counters):
GENMD_RUN_COMMAND
mlr --from data/miss-date.csv --icsv \
  cat -n \
  then put '$datestamp = strptime($date, "%Y-%m-%d")' \
  then step -a delta -f datestamp \
| head
GENMD_EOF
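Then filter for records whose adjacent difference is not 86400 (the number of seconds in a day). The `$n != 1` condition skips the first record, which has no predecessor and therefore gets a delta of 0: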
GENMD_RUN_COMMAND
mlr --from data/miss-date.csv --icsv \
  cat -n \
  then put '$datestamp = strptime($date, "%Y-%m-%d")' \
  then step -a delta -f datestamp \
  then filter '$datestamp_delta != 86400 && $n != 1'
GENMD_EOF
Given this, it's now easy to see where the gaps are:
GENMD_RUN_COMMAND
mlr --icsv --opprint cat -n then filter '$n >= 770 && $n <= 780' data/miss-date.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint cat -n then filter '$n >= 1115 && $n <= 1125' data/miss-date.csv
GENMD_EOF
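The adjacent differences can also show how many dates are missing at each gap, since a delta of k days means k - 1 dates were skipped. The following is a minimal sketch on hypothetical inline data rather than `data/miss-date.csv`; the `value` column and the `missing_days` field name are assumptions introduced here for illustration:

```shell
# Hypothetical three-row sample with one missing date (2012-03-07).
gaps=$(printf 'date,value\n2012-03-05,10\n2012-03-06,11\n2012-03-08,12\n' |
  mlr --icsv --ocsv \
    put '$datestamp = strptime($date, "%Y-%m-%d")' \
    then step -a delta -f datestamp \
    then filter '$datestamp_delta > 86400' \
    then put '$missing_days = $datestamp_delta / 86400 - 1' \
    then cut -f date,missing_days)
echo "$gaps"
```

This should report one missing day before 2012-03-08. The `> 86400` test is a simplification: unlike `!= 86400` it needs no `$n != 1` guard for the first record, but it also ignores duplicate or out-of-order dates, which the `!=` form would flag.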