miller/docs/src/reference-dsl-time.md.in
2023-12-19 09:52:16 -05:00

364 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# DSL datetime/timezone functions
Dates/times are not a separate data type; Miller uses ints for
[seconds since the epoch](https://en.wikipedia.org/wiki/Unix_time) and strings for formatted
date/times. In this page we take a look at what some of the various options are
for processing datetimes and timezones in your data.
See also the [section on time-related
functions](reference-dsl-builtin-functions.md#time-functions) for
information auto-generated from Miller's online-help strings.
## Epoch seconds
[Seconds since the epoch](https://en.wikipedia.org/wiki/Unix_time), or _Unix
Time_, is seconds (positive, zero, or negative) since midnight January 1 1970
UTC. This representation has several advantages, and is quite common in the
computing world.
Since this is a [number](reference-main-arithmetic.md) in Miller -- 64-bit
signed integer or double-precision floating-point -- it can represent dates
billions of years into the past or future without worry of overflow. (There is
no [year-2038 problem](https://en.wikipedia.org/wiki/Year_2038_problem) here.)
Being numbers, epoch-seconds are easy to store in databases, communicate over
networks in binary format, etc. Another benefit of epoch-seconds is that
they're independent of timezone or daylight-savings time.
One minus is that, being just numbers, they're not particularly human-readable
-- hence the to-string and from-string functions described below. Another
caveat (not really a minus) is that _epoch milliseconds_, rather than epoch
seconds, are common in some contexts, particularly JavaScript. If you ever
(anywhere) see a timestamp for the year 49,000-something -- probably someone is
treating epoch-milliseconds as epoch-seconds.
GENMD-RUN-COMMAND
mlr -n put 'end {
print sec2gmt(1500000000);
print sec2gmt(1500000000000);
}'
GENMD-EOF
You can get the current system time, as epoch-seconds, using the
[systime](reference-dsl-builtin-functions.md#systime) DSL function:
GENMD-CARDIFY-HIGHLIGHT-ONE
mlr --c2p --from example.csv put '$t = systime()'
color shape flag k index quantity rate t
yellow triangle true 1 11 43.6498 9.8870 1634784588.045347
red square true 2 15 79.2778 0.0130 1634784588.045385
red circle true 3 16 13.8103 2.9010 1634784588.045386
red square false 4 48 77.5542 7.4670 1634784588.045393
purple triangle false 5 51 81.2290 8.5910 1634784588.045394
red square false 6 64 77.1991 9.5310 1634784588.045417
purple triangle false 7 65 80.1405 5.8240 1634784588.045418
yellow circle true 8 73 63.9785 4.2370 1634784588.045419
yellow circle true 9 87 63.5058 8.3350 1634784588.045421
purple square false 10 91 72.3735 8.2430 1634784588.045422
GENMD-EOF
The [systimeint](reference-dsl-builtin-functions.md#systimeint) DSL function
is nothing more than a keystroke-saver for `int(systime())`.
## UTC times with standard format
One way to make epoch-seconds human-readable, while maintaining some of their
benefits such as being independent of timezone and daylight savings, is to use
the [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) format. This was the
first (and initially only) human-readable date/time format supported by Miller
going all the way back to Miller 1.0.0.
You can get these from epoch-seconds using the
[sec2gmt](reference-dsl-builtin-functions.md#sec2gmt) DSL function.
(Note that the terms _UTC_ and _GMT_ are used interchangeably in Miller.)
We also have [sec2gmtdate](reference-dsl-builtin-functions.md#sec2gmtdate) DSL function.
GENMD-RUN-COMMAND
mlr -n put 'end {
print sec2gmt(0);
print sec2gmt(1234567890.123);
print sec2gmt(-1234567890.123);
print;
print sec2gmtdate(0);
print sec2gmtdate(1234567890.123);
print sec2gmtdate(-1234567890.123);
}'
GENMD-EOF
## Local times with standard format; specifying timezones
You can use similar formatting for dates in your preferred timezone, not just UTC/GMT.
We have the
[sec2localtime](reference-dsl-builtin-functions.md#sec2localtime),
[sec2localdate](reference-dsl-builtin-functions.md#sec2localdate), and
[localtime2sec](reference-dsl-builtin-functions.md#localtime2sec) DSL functions.
You can specify the timezone using any of the following:
* An environment variable, e.g. `export TZ=Asia/Istanbul` at your system prompt (`set TZ=Asia/Istanbul` in Windows).
* Using the `--tz` flag. This sets the `TZ` environment variable, but only internally to the `mlr` process.
* Within a DSL expression, you can assign to `ENV["TZ"]`.
* By supplying an additional argument to any of the functions with `local` in their names.
Regardless, if you specify an invalid timezone, you'll be clearly notified:
GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr --from example.csv --tz This/Is/A/Typo cat
GENMD-EOF
GENMD-RUN-COMMAND
export TZ=Asia/Istanbul
mlr -n put 'end { print sec2localtime(0) }'
GENMD-EOF
GENMD-RUN-COMMAND
mlr --tz America/Sao_Paulo -n put 'end { print sec2localtime(0) }'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put 'end {
ENV["TZ"] = "Asia/Istanbul";
print sec2localtime(0);
print sec2localdate(0);
print localtime2sec("2000-01-02 03:04:05");
print;
ENV["TZ"] = "America/Sao_Paulo";
print sec2localtime(0);
print sec2localdate(0);
print localtime2sec("2000-01-02 03:04:05");
}'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put 'end {
print sec2localtime(0, 0, "Asia/Istanbul");
print sec2localdate(0, "Asia/Istanbul");
print localtime2sec("2000-01-02 03:04:05", "Asia/Istanbul");
print;
print sec2localtime(0, 0, "America/Sao_Paulo");
print sec2localdate(0, "America/Sao_Paulo");
print localtime2sec("2000-01-02 03:04:05", "America/Sao_Paulo");
}'
GENMD-EOF
Note that for local times, Miller omits the `T` and the `Z` you see in GMT times.
We also have the
[gmt2localtime](reference-dsl-builtin-functions.md#gmt2localtime) and
[localtime2gmt](reference-dsl-builtin-functions.md#localtime2gmt) convenience functions:
GENMD-RUN-COMMAND
mlr -n put 'end {
ENV["TZ"] = "Asia/Istanbul";
print gmt2localtime("1970-01-01T00:00:00Z");
print localtime2gmt("1970-01-01 00:00:00");
}'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put 'end {
print gmt2localtime("1970-01-01T00:00:00Z", "America/Sao_Paulo");
print gmt2localtime("1970-01-01T00:00:00Z", "Asia/Istanbul");
print localtime2gmt("1970-01-01 00:00:00", "America/Sao_Paulo");
print localtime2gmt("1970-01-01 00:00:00", "Asia/Istanbul");
}'
GENMD-EOF
## Custom formats: strptime and strftime
The to-string and from-string functions we've seen so far are low-keystroking:
with a little bit of typing you can convert datetimes to/from epoch seconds.
The minus, however, is flexibility. This is where the
[strftime](reference-dsl-builtin-functions.md#strftime) and
[strptime](reference-dsl-builtin-functions.md#strptime) functions come into play.
Notes:
* The names `strftime` and `strptime` far predate Miller; they were chosen for familiarity. The `f` is for _format_: from epoch-seconds to human-readable string. The `p` is for _parse_: for doing the reverse.
* Even though Miller is written in Go as of Miller 6, it still largely preserves [C-like](https://en.wikipedia.org/wiki/C_date_and_time_functions#strftime) `strftime` and `strptime` semantics. As noted below, not all format strings used by the C library are recognized.
* For `strftime`, this is thanks to [https://github.com/lestrrat-go/strftime](https://github.com/lestrrat-go/strftime), with a Miller-specific modification for fractional seconds.
* For `strftime`, this is thanks to [https://github.com/pbnjay/strptime](https://github.com/pbnjay/strptime), with Miller-specific modifications.
Available format strings for `strftime`, taken directly from [https://github.com/lestrrat-go/strftime](https://github.com/lestrrat-go/strftime) except for `%1..%9`, `%s`, `%N`, and `%O` which are Miller-specific additions:
| Pattern | Description |
|---------|-------------|
| `%A` | national representation of the full weekday name |
| `%a` | national representation of the abbreviated weekday |
| `%B` | national representation of the full month name |
| `%b` | national representation of the abbreviated month name |
| `%C` | (year / 100) as decimal number; single digits are preceded by a zero |
| `%c` | national representation of time and date |
| `%D` | equivalent to `%m/%d/%y` |
| `%d` | day of the month as a decimal number (01-31) |
| `%e` | the day of the month as a decimal number (1-31); single digits are preceded by a blank |
| `%F` | equivalent to `%Y-%m-%d` |
| `%H` | the hour (24-hour clock) as a decimal number (00-23) |
| `%h` | same as `%b` |
| `%I` | the hour (12-hour clock) as a decimal number (01-12) |
| `%j` | the day of the year as a decimal number (001-366) |
| `%k` | the hour (24-hour clock) as a decimal number (0-23); single digits are preceded by a blank |
| `%l` | the hour (12-hour clock) as a decimal number (1-12); single digits are preceded by a blank |
| `%M` | the minute as a decimal number (00-59) |
| `%m` | the month as a decimal number (01-12) |
| `%n` | a newline |
| `%N` | zero-padded nanoseconds |
| `%O` | non-zero-padded nanoseconds |
| `%p` | national representation of either "ante meridiem" (a.m.) or "post meridiem" (p.m.) as appropriate. |
| `%R` | equivalent to `%H:%M` |
| `%r` | equivalent to `%I:%M:%S %p` |
| `%s` | integer seconds since the epoch |
| `%S` | the second as a decimal number (00-60) |
| `%1S`, ..., `%9S` | the second as a decimal number (00-60) with 1..9 decimal places, respectively |
| `%T` | equivalent to `%H:%M:%S` |
| `%t` | a tab |
| `%U` | the week number of the year (Sunday as the first day of the week) as a decimal number (00-53) |
| `%u` | the weekday (Monday as the first day of the week) as a decimal number (1-7) |
| `%V` | the week number of the year (Monday as the first day of the week) as a decimal number (01-53) |
| `%v` | equivalent to `%e-%b-%Y` |
| `%W` | the week number of the year (Monday as the first day of the week) as a decimal number (00-53) |
| `%w` | the weekday (Sunday as the first day of the week) as a decimal number (0-6) |
| `%X` | national representation of the time |
| `%x` | national representation of the date |
| `%Y` | the year with century as a decimal number |
| `%y` | the year without century as a decimal number (00-99) |
| `%Z` | the time zone name |
| `%z` | the time zone offset from UTC |
| `%%` | a `%` |
Available format strings for `strptime`:
| Pattern | Description |
|---------|-------------|
| `%%` | A literal '%' character. |
| `%b` | Month as locales abbreviated name. |
| `%B` | Month as locales full name. |
| `%d` | Day of the month as a zero-padded decimal number. |
| `%f` | Microsecond as a decimal number, zero-padded on the left. |
| `%H` | Hour (24-hour clock) as a zero-padded decimal number. |
| `%I` | Hour (12-hour clock) as a zero-padded decimal number. |
| `%j` | Three-digit day of year, like 004 or 363. |
| `%m` | Month as a zero-padded decimal number. |
| `%M` | Minute as a zero-padded decimal number. |
| `%p` | Locales equivalent of either AM or PM. |
| `%S` | Second as a zero-padded decimal number. |
| `%y` | Year without century as a zero-padded decimal number. |
| `%Y` | Year with century as a decimal number. |
| `%z` | UTC offset in the form +HHMM or -HHMM. |
| `%Z` | Time zone name. UTC, EST, CST -- only if you're in that timezone. |
Examples:
GENMD-RUN-COMMAND
mlr -n put 'end {
print strftime(0, "%Y-%m-%dT%H:%M:%SZ");
print strftime(0, "%FT%TZ");
print strfntime(123, "%N");
print strfntime(123, "%O");
}'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put 'end {
ENV["TZ"] = "Asia/Istanbul";
print strftime(0, "%Y-%m-%d %H:%M:%S");
print strftime(0, "%Y-%m-%d %H:%M:%S %Z");
print strftime(0, "%Y-%m-%d %H:%M:%S %z");
print strftime(0, "%A, %B %e, %Y");
print strftime(123456789, "%I:%M %p");
}'
GENMD-EOF
Unfortunately, names from `%A` and `%B` are only available in English, as an artifact of a design
choice in the Go `time` library which Miller (and its `strftime` / `strptime` supporting packages as
noted above) rely on.
## A note on timezones
A note on timezones for `strptime`:
* Three-letter timezone names such as `CST` are recognized _only if you're in them_. (`UTC` is an exception.) This is because these aren't globally unique: `CST` can stand for `Central Standard Time`, `_Cuba Standard Time`, `_China Standard Time`, etc.
* Timezone specifiers which _are_ globally unique are of the form `-0400` and `+0500`.
* Specifiers like `-04:30`, `UTC-8`, and `Asia/Istanbul` were not supported in Miller 5 (which used the C `strptime` library), and are likewise not supported in Miller 6. See however the `TZ` environment-variable examples below.
* If you wish to match a final `Z` in the input, use a final `Z` in the format string. For example (see [ISO8601](https://en.wikipedia.org/wiki/ISO_8601)) you can match the timestamp `1970-01-01T00:00:00Z` using the format string `%FT%TZ`.
## Fractional seconds
For historical reasons, Miller's `strftime` and `strptime` use different format specifications for fractional seconds. Examples:
GENMD-RUN-COMMAND
mlr -n put 'end {
print strftime(123456.789, "%Y-%m-%d %H:%M:%S");
print strftime(123456.789, "%Y-%m-%d %H:%M:%1S");
print strftime(123456.789, "%Y-%m-%d %H:%M:%3S");
print strftime(123456.789, "%Y-%m-%d %H:%M:%6S");
print strptime("1970-01-02 10:17:36.789000", "%Y-%m-%d %H:%M:%S");
print strptime("1970-01-02 10:17:36.789000", "%Y-%m-%d %H:%M:%S.%f");
}'
GENMD-EOF
## strptime_local and strftime_local
We also have
[strftimelocal](reference-dsl-builtin-functions.md#strftimelocal) and
[strptimelocal](reference-dsl-builtin-functions.md#strptimelocal):
GENMD-RUN-COMMAND
mlr -n put 'end {
ENV["TZ"] = "America/Anchorage";
print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z");
print strftime_local(0, "%Y-%m-%d %H:%M:%S %z");
print strftime_local(0, "%A, %B %e, %Y");
print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S");
print;
ENV["TZ"] = "Asia/Hong_Kong";
print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z");
print strftime_local(0, "%Y-%m-%d %H:%M:%S %z");
print strftime_local(0, "%A, %B %e, %Y");
print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S");
}'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put 'end {
print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z", "America/Anchorage");
print strftime_local(0, "%Y-%m-%d %H:%M:%S %z", "America/Anchorage");
print strftime_local(0, "%A, %B %e, %Y", "America/Anchorage");
print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S", "America/Anchorage");
print;
print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z", "Asia/Hong_Kong");
print strftime_local(0, "%Y-%m-%d %H:%M:%S %z", "Asia/Hong_Kong");
print strftime_local(0, "%A, %B %e, %Y", "Asia/Hong_Kong");
print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S", "Asia/Hong_Kong");
}'
GENMD-EOF
## Relative times
You can get the seconds since the Miller process start using
[uptime](reference-dsl-builtin-functions.md#uptime):
GENMD-CARDIFY-HIGHLIGHT-ONE
mlr --c2p --from example.csv put '$u=uptime()'
color shape flag k index quantity rate u
yellow triangle true 1 11 43.6498 9.8870 0.0011110305786132812
red square true 2 15 79.2778 0.0130 0.0011241436004638672
red circle true 3 16 13.8103 2.9010 0.0011250972747802734
red square false 4 48 77.5542 7.4670 0.0011301040649414062
purple triangle false 5 51 81.2290 8.5910 0.0011301040649414062
red square false 6 64 77.1991 9.5310 0.002481222152709961
purple triangle false 7 65 80.1405 5.8240 0.0024831295013427734
yellow circle true 8 73 63.9785 4.2370 0.0024831295013427734
yellow circle true 9 87 63.5058 8.3350 0.0024852752685546875
purple square false 10 91 72.3735 8.2430 0.002485990524291992
GENMD-EOF
Time-differences can be done in seconds, of course; you can also use the following if you like:
GENMD-RUN-COMMAND
mlr -F | grep hms
GENMD-EOF
## References
* Non-Miller-specific list of formatting characters for `strftime` and `strptime`: [https://devhints.io/strftime](https://devhints.io/strftime)
* List of valid timezone names: [https://en.wikipedia.org/wiki/List_of_tz_database_time_zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)