Miller lets you use regular expressions (of type POSIX.2) in the following contexts:
- In mlr filter with =~ or !=~, e.g. mlr filter '$url =~ "http.*com"i'
- In mlr put with sub or gsub, e.g. mlr put '$url = sub($url, "http.*com", "")'
- In mlr having-fields, e.g. mlr having-fields --any-matching '^sda[0-9]'
- In mlr cut, e.g. mlr cut -r -f '^status$,^sda[0-9]'
- In mlr rename, e.g. mlr rename -r '^(sda[0-9]).*$,dev/\1'
- In mlr grep, e.g. mlr --csv grep 00188555487 myfiles*.csv
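As a rough illustration of what the cut and rename examples do to field names, here is a minimal Python sketch. The helpers cut_r and rename_r and the sample record are hypothetical, not Miller APIs. Python's re module stands in for Miller's regex engine, and for patterns this simple the two should agree. The comma-splitting also assumes no regex contains a comma.

```python
import re

# Hypothetical helpers modelling two verbs above on one record (a dict of
# field name -> value). Python's re module stands in for Miller's engine.

def cut_r(record, regexes):
    """Like mlr cut -r -f: keep fields whose names match any regex."""
    patterns = [re.compile(r) for r in regexes.split(",")]
    return {k: v for k, v in record.items()
            if any(p.search(k) for p in patterns)}

def rename_r(record, spec):
    """Like mlr rename -r 'regex,replacement', with \\1-style captures."""
    regex, replacement = spec.split(",", 1)
    pattern = re.compile(regex)
    # re.sub leaves names that don't match unchanged.
    return {pattern.sub(replacement, k, count=1): v for k, v in record.items()}

record = {"status": "ok", "sda1_reads": 10, "sdb1_reads": 3, "status_code": 200}
print(cut_r(record, "^status$,^sda[0-9]"))
# {'status': 'ok', 'sda1_reads': 10}
print(rename_r(record, r"^(sda[0-9]).*$,dev/\1"))
# {'status': 'ok', 'dev/sda1': 10, 'sdb1_reads': 3, 'status_code': 200}
```

Note that status_code is dropped by the cut: the $ in ^status$ is an explicit end anchor.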
Points demonstrated by the above examples:
- There are no implicit start-of-string or end-of-string anchors; use ^ and/or $ explicitly.
- Miller regexes are wrapped in double quotes rather than slashes.
- An i after the closing double quote, as in "http.*com"i, makes the regex case-insensitive.
- Capture groups are wrapped with (...) rather than \(...\); use \( and \) to match literal parentheses.
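The matching rules above can be checked with a small Python model. This is not Miller itself. The function matches is a hypothetical helper, and the URL strings are made-up inputs. re.search is unanchored like Miller's =~, and re.IGNORECASE plays the role of the trailing i.

```python
import re

# A Python model of the points above (not Miller itself): re.search is
# unanchored like Miller's =~, and re.IGNORECASE stands in for the trailing i.

def matches(value, regex, case_insensitive=False):
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(regex, value, flags) is not None

url = "see https://example.com/page"
print(matches(url, "http.*com"))                          # True: no implicit anchors
print(matches(url, "^http.*com$"))                        # False: anchored explicitly
print(matches("HTTPS://EXAMPLE.COM", "http.*com"))        # False
print(matches("HTTPS://EXAMPLE.COM", "http.*com", True))  # True, like "http.*com"i
print(matches("f(x)", r"\(x\)"))                          # True: \( \) are literal parens
```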
Example, with the regex taken from a field of each record rather than from a string literal:
cat data/regex-in-data.dat
name=jane,regex=^j.*e$
name=bill,regex=^b[ou]ll$
name=bull,regex=^b[ou]ll$
mlr filter '$name =~ $regex' data/regex-in-data.dat
name=jane,regex=^j.*e$
name=bull,regex=^b[ou]ll$
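To see why bill is dropped, here is a minimal Python sketch of the same filter. It applies each record's own regex to that record's name. The helper parse_dkvp is hypothetical and handles only simple key=value,key=value lines with no quoting or escaping. Python's re stands in for Miller's engine.

```python
import re

# parse_dkvp is a hypothetical helper for simple key=value,key=value lines
# (no quoting or escaping); it is not part of Miller.
def parse_dkvp(line):
    return dict(pair.split("=", 1) for pair in line.split(","))

# The three records of data/regex-in-data.dat, one per line.
lines = [
    "name=jane,regex=^j.*e$",
    "name=bill,regex=^b[ou]ll$",
    "name=bull,regex=^b[ou]ll$",
]

# Models mlr filter '$name =~ $regex': each record supplies its own regex.
kept = []
for line in lines:
    rec = parse_dkvp(line)
    if re.search(rec["regex"], rec["name"]):
        kept.append(line)
print("\n".join(kept))
# name=jane,regex=^j.*e$
# name=bull,regex=^b[ou]ll$
```

The record bill fails because i is not in the character class [ou].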
Regex captures
Regex captures of the form \0 through \9 are supported as follows:
- Captures have in-function context for sub and gsub. For example, the \1 and \2 in the first sub refer to the first sub's regex, and the \1, \2, and \3 in the second sub refer to the second sub's regex:
mlr put '$b = sub($a, "(..)_(...)", "\2-\1"); $c = sub($a, "(..)_(.)(..)", ":\1:\2:\3")'
- Captures endure for the entirety of a put for the =~ and !=~ operators. For example, here \1 and \2 are set by the =~ operator and are used by both subsequent assignment statements:
mlr put '$a =~ "(..)_(....)"; $b = "left_\1"; $c = "right_\2"'
- The captures are not retained across multiple puts. For example, here \1 and \2 won't be expanded from the regex capture:
mlr put '$a =~ "(..)_(....)"' then {... something else ...} then put '$b = "left_\1"; $c = "right_\2"'
- Up to nine captures are supported: \1 through \9, while \0 is the entire match string; \15 is treated as \1 followed by an unrelated 5.
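The capture rules above can be approximated with a small Python model. Everything here is hypothetical: captures_of, expand, sub, and the Put class are illustrations, not Miller APIs. The input "ab_cdef" is a made-up value, and Python's re stands in for Miller's regex engine. expand reads a single digit after each backslash, because Python's own replacement syntax would read \15 as group 15. The sketch models only successful =~ matches. It leaves references to nonexistent captures as literal text, which is an assumption about Miller's behavior.

```python
import re

def captures_of(m):
    """\\0 is the whole match; \\1..\\9 are the parenthesized groups."""
    return [m.group(0)] + [g or "" for g in m.groups()]

def expand(template, groups):
    """Expand \\0..\\9 one digit at a time, so \\15 is capture 1 then "5".
    References to captures that don't exist are left as-is (assumption)."""
    def repl(ref):
        i = int(ref.group(1))
        return groups[i] if i < len(groups) else ref.group(0)
    return re.sub(r"\\([0-9])", repl, template)

def sub(value, regex, replacement):
    """Hypothetical model of sub(): replaces the first match only, and its
    \\1, \\2, ... refer to this call's own regex."""
    m = re.search(regex, value)
    if m is None:
        return value
    return value[:m.start()] + expand(replacement, captures_of(m)) + value[m.end():]

class Put:
    """Hypothetical model of one mlr put: captures set by =~ persist for
    the rest of its statements. Failed matches are not modeled."""
    def __init__(self):
        self.groups = []

    def match(self, value, regex):
        m = re.search(regex, value)
        if m:
            self.groups = captures_of(m)
        return m is not None

    def interpolate(self, literal):
        return expand(literal, self.groups)

a = "ab_cdef"
# Each sub() resolves its own captures; the unmatched tail "f" is kept.
print(sub(a, "(..)_(...)", r"\2-\1"))        # cde-abf
print(sub(a, "(..)_(.)(..)", r":\1:\2:\3"))  # :ab:c:def

# One put: =~ sets \1 and \2 for both later statements.
put = Put()
put.match(a, "(..)_(....)")
print(put.interpolate(r"left_\1"))     # left_ab
print(put.interpolate(r"right_\2"))    # right_cdef
print(put.interpolate(r"[\0] [\15]"))  # [ab_cdef] [ab5]

# A second put (after "then") starts with no captures:
print(Put().interpolate(r"left_\1"))   # left_\1
```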