Natural sort (#932)

* Add natural sort order as an option for the sort verb

* Add natural sort order as an option for the sort DSL function

* doc-build artifacts for on-line help

* webdocs

* codespell fix

* unit-test files for sort verb

* unit-test files for sort DSL function
John Kerl 2022-02-08 00:35:28 -05:00 committed by GitHub
parent b3127ebcb5
commit ca9505dfaf
28 changed files with 341 additions and 63 deletions

View file

@ -1731,6 +1731,8 @@ VERBS
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:
@ -2496,10 +2498,17 @@ FUNCTIONS FOR FILTER/PUT
(class=math #args=1) Hyperbolic sine.
sort
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
splita
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on.
@ -3162,5 +3171,5 @@ SEE ALSO
2022-02-07 MILLER(1)
2022-02-08 MILLER(1)
</pre>

View file

@ -1710,6 +1710,8 @@ VERBS
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:
@ -2475,10 +2477,17 @@ FUNCTIONS FOR FILTER/PUT
(class=math #args=1) Hyperbolic sine.
sort
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
splita
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on.
@ -3141,4 +3150,4 @@ SEE ALSO
2022-02-07 MILLER(1)
2022-02-08 MILLER(1)

View file

@ -212,6 +212,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:

View file

@ -671,10 +671,17 @@ Map example: select({"a":1, "b":3, "c":5}, func(k,v) {return v >= 3}) returns {"
### sort
<pre class="pre-non-highlight-non-pair">
sort (class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
sort (class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
</pre>
## Math functions

View file

@ -2803,6 +2803,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:

View file

@ -24,10 +24,12 @@ Miller gives you three ways to sort your data:
## Sorting records: the sort verb
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more
information) reorders entire records within the data stream. You can sort
lexically (with or without case-folding) or numerically, ascending or
descending; and you can sort primary by one column, then secondarily by
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more information) reorders
entire records within the data stream. You can sort lexically (with or without case-folding),
numerically, or naturally (see
[https://en.wikipedia.org/wiki/Natural_sort_order](https://en.wikipedia.org/wiki/Natural_sort_order)
or [https://github.com/facette/natsort](https://github.com/facette/natsort) for more about natural
sorting); ascending or descending; and you can sort primarily by one column, then secondarily by
another, etc.
Input data:
@ -143,13 +145,13 @@ a b c
## The sort function by example
* It returns a sorted copy of an input array or map.
* Without second argument, uses the natural ordering.
* With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending.
* Without a second argument, it uses Miller's default ordering: numbers first, sorted numerically, then strings, sorted lexically.
* With a string as the second argument, it takes sorting flags from that string: `"f"` for lexical, `"c"` for case-folded lexical, or `"t"` for natural sort order. An additional `"r"` in the string requests reverse/descending order.
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with natural ordering</b>
<b> # Sort array with default ordering</b>
<b> print sort([5,2,3,1,4]);</b>
<b> }</b>
<b>'</b>
@ -161,7 +163,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with reverse-natural ordering</b>
<b> # Sort array with reverse-default ordering</b>
<b> print sort([5,2,3,1,4], "r");</b>
<b> }</b>
<b>'</b>
@ -173,7 +175,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with custom function: natural ordering</b>
<b> # Sort array with custom function: another way to get default ordering</b>
<b> print sort([5,2,3,1,4], func(a,b) { return a <=> b});</b>
<b> }</b>
<b>'</b>
@ -185,7 +187,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with custom function: reverse-natural ordering</b>
<b> # Sort array with custom function: another way to get reverse-default ordering</b>
<b> print sort([5,2,3,1,4], func(a,b) { return b <=> a});</b>
<b> }</b>
<b>'</b>
@ -197,7 +199,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with natural ordering on keys</b>
<b> # Sort map with default ordering on keys</b>
<b> print sort({"c":2, "a": 3, "b": 1});</b>
<b> }</b>
<b>'</b>
@ -213,7 +215,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with reverse-natural ordering on keys</b>
<b> # Sort map with reverse-default ordering on keys</b>
<b> print sort({"c":2, "a": 3, "b": 1}, "r");</b>
<b> }</b>
<b>'</b>
@ -229,7 +231,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with custom function: natural ordering on values</b>
<b> # Sort map with custom function: default ordering on values</b>
<b> print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv});</b>
<b> }</b>
<b>'</b>
@ -245,7 +247,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with custom function: reverse-natural ordering on values</b>
<b> # Sort map with custom function: reverse-default ordering on values</b>
<b> print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av});</b>
<b> }</b>
<b>'</b>
@ -258,6 +260,18 @@ a b c
}
</pre>
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Natural sort</b>
<b> print sort(["a1","a10","a100","a2","a20","a200"], "t");</b>
<b> }</b>
<b>'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
["a1", "a2", "a10", "a20", "a100", "a200"]
</pre>
In the rest of this page we'll look more closely at these variants.
## Simple sorting of arrays

View file

@ -8,10 +8,12 @@ Miller gives you three ways to sort your data:
## Sorting records: the sort verb
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more
information) reorders entire records within the data stream. You can sort
lexically (with or without case-folding) or numerically, ascending or
descending; and you can sort primary by one column, then secondarily by
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more information) reorders
entire records within the data stream. You can sort lexically (with or without case-folding),
numerically, or naturally (see
[https://en.wikipedia.org/wiki/Natural_sort_order](https://en.wikipedia.org/wiki/Natural_sort_order)
or [https://github.com/facette/natsort](https://github.com/facette/natsort) for more about natural
sorting); ascending or descending; and you can sort primarily by one column, then secondarily by
another, etc.
Input data:
@ -55,13 +57,13 @@ GENMD-EOF
## The sort function by example
* It returns a sorted copy of an input array or map.
* Without second argument, uses the natural ordering.
* With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending.
* Without a second argument, it uses Miller's default ordering: numbers first, sorted numerically, then strings, sorted lexically.
* With a string as the second argument, it takes sorting flags from that string: `"f"` for lexical, `"c"` for case-folded lexical, or `"t"` for natural sort order. An additional `"r"` in the string requests reverse/descending order.
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with natural ordering
# Sort array with default ordering
print sort([5,2,3,1,4]);
}
'
@ -70,7 +72,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with reverse-natural ordering
# Sort array with reverse-default ordering
print sort([5,2,3,1,4], "r");
}
'
@ -79,7 +81,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with custom function: natural ordering
# Sort array with custom function: another way to get default ordering
print sort([5,2,3,1,4], func(a,b) { return a <=> b});
}
'
@ -88,7 +90,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with custom function: reverse-natural ordering
# Sort array with custom function: another way to get reverse-default ordering
print sort([5,2,3,1,4], func(a,b) { return b <=> a});
}
'
@ -97,7 +99,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with natural ordering on keys
# Sort map with default ordering on keys
print sort({"c":2, "a": 3, "b": 1});
}
'
@ -106,7 +108,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with reverse-natural ordering on keys
# Sort map with reverse-default ordering on keys
print sort({"c":2, "a": 3, "b": 1}, "r");
}
'
@ -115,7 +117,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with custom function: natural ordering on values
# Sort map with custom function: default ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv});
}
'
@ -124,12 +126,21 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with custom function: reverse-natural ordering on values
# Sort map with custom function: reverse-default ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av});
}
'
GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Natural sort
print sort(["a1","a10","a100","a2","a20","a200"], "t");
}
'
GENMD-EOF
In the rest of this page we'll look more closely at these variants.
## Simple sorting of arrays

1
go.mod
View file

@ -17,6 +17,7 @@ module github.com/johnkerl/miller
go 1.15
require (
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb // indirect
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca // indirect
github.com/johnkerl/lumin v1.0.0 // indirect
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51

2
go.sum
View file

@ -1,5 +1,7 @@
github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb h1:IT4JYU7k4ikYg1SCxNI1/Tieq/NFvh6dzLdgi7eu0tM=
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb/go.mod h1:bH6Xx7IW64qjjJq8M2u4dxNaBiDfKK+z/3eGDpXEQhc=
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca h1:NuA6w6b01Ojdig+4K1l9p4Pp3unlv4owphbOiENm8m4=
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca/go.mod h1:c4Mb67Mg9+pl6OlxvnFBUiiQOSlXfh0QukINLl54OD0=
github.com/johnkerl/lumin v1.0.0 h1:CV34cHZOJ92Y02RbQ0rd4gA0C06Qck9q8blOyaPoWpU=

View file

@ -1792,15 +1792,23 @@ key and value, and map-element key and value; it should return the updated accum
name: "sort",
class: FUNC_CLASS_HOFS,
help: `Given a map or array as first argument and string flags or function as optional second argument,
returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by
map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for
natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second
argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a <
b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again
returning < 0, 0, or > 0, using a and b's keys and values.`,
returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and
then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can
contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural
sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for
arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively;
for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or
> 0, using a and b's keys and values.`,
examples: []string{
`Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].`,
`Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.`,
`Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].`,
` Note that this is numbers before strings.`,
`Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].`,
` Note that this is uppercase before lowercase.`,
`Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].`,
`Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].`,
`Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].`,
`Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].`,
`Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.`,
},
variadicFuncWithState: SortHOF,
minimumVariadicArity: 1,

View file

@ -12,6 +12,8 @@ import (
"strconv"
"strings"
"github.com/facette/natsort"
"github.com/johnkerl/miller/internal/pkg/lib"
"github.com/johnkerl/miller/internal/pkg/mlrval"
"github.com/johnkerl/miller/internal/pkg/runtime"
@ -565,6 +567,7 @@ const (
sortTypeLexical tSortType = iota
sortTypeCaseFold
sortTypeNumerical
sortTypeNatural
)
// decodeSortFlags maps strings like "cr" in the second argument to sort
@ -580,6 +583,8 @@ func decodeSortFlags(flags string) (tSortType, bool) {
sortType = sortTypeLexical
case 'c':
sortType = sortTypeCaseFold
case 't':
sortType = sortTypeNatural
case 'r':
reverse = true
}
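The hunk above shows only part of the flag decoder. As a rough, self-contained illustration of the idea, the sketch below decodes a flag string such as "cr" or "t" into a sort kind plus a reverse bit. The names `sortKind`, `kindNumerical`, and `decodeFlags` are hypothetical, and two behaviors are assumptions rather than facts taken from the source: unknown characters are ignored, and a later type flag overrides an earlier one.

```go
package main

import "fmt"

// sortKind is a hypothetical stand-in for the sort-type enum.
type sortKind int

const (
	kindNumerical sortKind = iota // assumed default: numbers numerically, then strings lexically
	kindLexical
	kindCaseFold
	kindNatural
)

// decodeFlags is a hypothetical decoder for flag strings such as "cr" or "t".
// Assumptions: unknown characters are ignored, and a later type flag
// overrides an earlier one.
func decodeFlags(flags string) (sortKind, bool) {
	kind := kindNumerical
	reverse := false
	for _, c := range flags {
		switch c {
		case 'n':
			kind = kindNumerical
		case 'f':
			kind = kindLexical
		case 'c':
			kind = kindCaseFold
		case 't':
			kind = kindNatural
		case 'r':
			reverse = true
		}
	}
	return kind, reverse
}

func main() {
	kind, reverse := decodeFlags("tr")
	fmt.Println(kind == kindNatural, reverse) // true true
}
```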
@ -608,6 +613,8 @@ func sortA(
sortALexical(a, reverse)
case sortTypeCaseFold:
sortACaseFold(a, reverse)
case sortTypeNatural:
sortANatural(a, reverse)
}
return output
@ -649,6 +656,18 @@ func sortACaseFold(array []*mlrval.Mlrval, reverse bool) {
}
}
func sortANatural(array []*mlrval.Mlrval, reverse bool) {
if !reverse {
sort.Slice(array, func(i, j int) bool {
return natsort.Compare(array[i].String(), array[j].String())
})
} else {
sort.Slice(array, func(i, j int) bool {
return natsort.Compare(array[j].String(), array[i].String())
})
}
}
// sortMK implements sort on maps by key, with string flags rather than callback UDF.
func sortMK(
input1 *mlrval.Mlrval,
@ -680,6 +699,8 @@ func sortMK(
sortMKLexical(keys, reverse)
case sortTypeCaseFold:
sortMKCaseFold(keys, reverse)
case sortTypeNatural:
sortMKNatural(keys, reverse)
}
// Make a new map with keys in the new sort order.
@ -741,6 +762,18 @@ func sortMKCaseFold(array []string, reverse bool) {
}
}
func sortMKNatural(array []string, reverse bool) {
if !reverse {
sort.Slice(array, func(i, j int) bool {
return natsort.Compare(strings.ToLower(array[i]), strings.ToLower(array[j]))
})
} else {
sort.Slice(array, func(i, j int) bool {
return natsort.Compare(strings.ToLower(array[j]), strings.ToLower(array[i]))
})
}
}
// sortAF implements sort on arrays with callback UDF.
func sortAF(
input1 *mlrval.Mlrval,

View file

@ -18,6 +18,10 @@ import (
)
type CmpFuncBool func(input1, input2 *Mlrval) bool
// The Go sort API is just a bool a<b, not triple a<b, a==b, a>b. Miller does the latter since when
// we sort primarily on field 1, then secondarily on field 2, etc., we need to be able to detect
// ties on field 1 so we can know whether to compare on field 2 or not.
type CmpFuncInt func(input1, input2 *Mlrval) int // -1, 0, 1 for <=>
// ----------------------------------------------------------------

View file

@ -15,6 +15,8 @@ package mlrval
import (
"strings"
"github.com/facette/natsort"
)
// LexicalAscendingComparator is for lexical sort: it stringifies
@ -63,13 +65,6 @@ func CaseFoldDescendingComparator(input1 *Mlrval, input2 *Mlrval) int {
return CaseFoldAscendingComparator(input2, input1)
}
// TODO
//func _xcmp(input1, input2 *Mlrval) int {
// fmt.Fprintf(os.Stderr, "mlr: functions cannot be sorted.\n")
// os.Exit(1)
// return 0
//}
func NumericAscendingComparator(input1 *Mlrval, input2 *Mlrval) int {
return Cmp(input1, input2)
}
@ -79,3 +74,19 @@ func NumericAscendingComparator(input1 *Mlrval, input2 *Mlrval) int {
func NumericDescendingComparator(input1 *Mlrval, input2 *Mlrval) int {
return -Cmp(input1, input2)
}
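func NaturalAscendingComparator(input1, input2 *Mlrval) int {
sa := input1.String()
sb := input2.String()
// natsort.Compare(sa, sb) reports whether sa precedes sb in natural order.
// Note that this function returns 1, not -1, in that case; the sort verb's
// -t and -tr option handling pairs the two comparators to match.
if sa == sb {
return 0
} else if natsort.Compare(sa, sb) {
return 1
} else {
return -1
}
}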
func NaturalDescendingComparator(input1, input2 *Mlrval) int {
return NaturalAscendingComparator(input2, input1)
}
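A boolean "less" predicate can be adapted to a three-way comparator in a general way. The sketch below shows one such adapter. `threeWay` is a hypothetical helper, and it uses the conventional sign (negative when the first argument sorts first), which is an assumption of this sketch rather than a description of the comparators above.

```go
package main

import "fmt"

// threeWay adapts a boolean "less" predicate into a three-way comparator.
// Assumption: less is a strict ordering, so less(a, b) and less(b, a) are
// never both true. Values that are not less in either direction count as ties.
func threeWay(less func(a, b string) bool) func(a, b string) int {
	return func(a, b string) int {
		switch {
		case less(a, b):
			return -1
		case less(b, a):
			return 1
		default:
			return 0
		}
	}
}

func main() {
	byLength := func(a, b string) bool { return len(a) < len(b) }
	cmp := threeWay(byLength)
	fmt.Println(cmp("ab", "abc"), cmp("abc", "ab"), cmp("ab", "cd")) // -1 1 0
}
```

Calling the predicate in both directions detects ties without a separate string-equality check, which matters when distinct strings compare as equal under the chosen ordering.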

View file

@ -84,6 +84,8 @@ func transformerSortUsage(
fmt.Fprintf(o, "-n {comma-separated field names} Numerical ascending; nulls sort last\n")
fmt.Fprintf(o, "-nf {comma-separated field names} Same as -n\n")
fmt.Fprintf(o, "-nr {comma-separated field names} Numerical descending; nulls sort first\n")
fmt.Fprintf(o, "-t {comma-separated field names} Natural ascending\n")
fmt.Fprintf(o, "-tr {comma-separated field names} Natural descending\n")
fmt.Fprintf(o, "-h|--help Show this message.\n")
fmt.Fprintf(o, "\n")
fmt.Fprintf(o, "Example:\n")
@ -151,6 +153,25 @@ func transformerSortParseCLI(
}
}
} else if opt == "-t" {
// See comments over "-n" -- similar hack.
if args[argi] == "-r" {
// Treat like "-tr"
argi++
subList := cli.VerbGetStringArrayArgOrDie(verb, "-tr", args, &argi, argc)
for _, item := range subList {
groupByFieldNames = append(groupByFieldNames, item)
comparatorFuncs = append(comparatorFuncs, mlrval.NaturalAscendingComparator)
}
} else {
subList := cli.VerbGetStringArrayArgOrDie(verb, opt, args, &argi, argc)
for _, item := range subList {
groupByFieldNames = append(groupByFieldNames, item)
comparatorFuncs = append(comparatorFuncs, mlrval.NaturalDescendingComparator)
}
}
} else if opt == "-r" {
subList := cli.VerbGetStringArrayArgOrDie(verb, opt, args, &argi, argc)
for _, item := range subList {

View file

@ -1710,6 +1710,8 @@ VERBS
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:
@ -2475,10 +2477,17 @@ FUNCTIONS FOR FILTER/PUT
(class=math #args=1) Hyperbolic sine.
sort
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
splita
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on.
@ -3141,4 +3150,4 @@ SEE ALSO
2022-02-07 MILLER(1)
2022-02-08 MILLER(1)

View file

@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2022-02-07
.\" Date: 2022-02-08
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2022-02-07" "\ \&" "\ \&"
.TH "MILLER" "1" "2022-02-08" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -2155,6 +2155,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:
@ -3816,10 +3818,17 @@ Map example: select({"a":1, "b":3, "c":5}, func(k,v) {return v >= 3}) returns {"
.RS 0
.\}
.nf
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
.fi
.if n \{\
.RE
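
The default ordering described in the help text above (numbers first, compared numerically, then strings, compared lexically) can be illustrated with a small Python sketch. This is only an approximation for illustration, not Miller's implementation: `default_sort_key` is a hypothetical helper, and it ignores booleans, empty values, maps, and other types the real function has to handle.

```python
def default_sort_key(value):
    # Hypothetical helper: numbers get rank 0 and compare numerically,
    # everything else gets rank 1 and compares as a string by code point.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return (0, value, "")
    return (1, 0, str(value))


def default_sort(values):
    # Returns a sorted copy, leaving the input untouched.
    return sorted(values, key=default_sort_key)


def case_folded_sort(values, reverse=False):
    # Rough analogue of the "c" and "cr" flags for string-only input.
    return sorted(values, key=str.casefold, reverse=reverse)


if __name__ == "__main__":
    print(default_sort([3, "A", 1, "B", 22]))
    print(default_sort(["E", "a", "c", "B", "d"]))
    print(case_folded_sort(["E", "a", "c", "B", "d"]))
    print(case_folded_sort(["E", "a", "c", "B", "d"], reverse=True))
```

The rank in the first tuple slot keeps numbers and strings from ever being compared with each other, which is what puts numbers ahead of strings.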

@ -914,6 +914,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.
Example:

@ -0,0 +1 @@
mlr --c2j -n put -f ${CASEDIR}/mlr

@ -0,0 +1,6 @@
["X1", "X10", "X100", "X2", "X20", "X200"]
["X200", "X20", "X2", "X100", "X10", "X1"]
["X1", "X2", "X10", "X20", "X100", "X200"]
["X200", "X100", "X20", "X10", "X2", "X1"]
[
]

@ -0,0 +1,7 @@
end {
a = ["X1", "X10", "X100", "X2", "X20", "X200"];
print sort(a);
print sort(a, "r");
print sort(a, "t");
print sort(a, "tr");
}
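
For readers unfamiliar with natural ordering, the four calls in this test case can be mirrored in Python. This is a sketch under the assumption that natural order means "compare digit runs as integers and everything else as text"; `natural_key` is a hypothetical helper and is not the comparison routine Miller itself uses, so edge cases such as leading zeros, signs, or decimals may differ.

```python
import re


def natural_key(text):
    # Hypothetical helper: split into alternating non-digit and digit runs.
    # With a capturing group, re.split puts the digit runs at odd indices,
    # so those are converted to integers and compared numerically.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


a = ["X1", "X10", "X100", "X2", "X20", "X200"]

lexical_ascending = sorted(a)                                   # like sort(a)
lexical_descending = sorted(a, reverse=True)                    # like sort(a, "r")
natural_ascending = sorted(a, key=natural_key)                  # like sort(a, "t")
natural_descending = sorted(a, key=natural_key, reverse=True)   # like sort(a, "tr")

if __name__ == "__main__":
    for result in (lexical_ascending, lexical_descending,
                   natural_ascending, natural_descending):
        print(result)
```

Because text and integer parts always alternate in the same positions, two keys never compare a string against an integer, so the list comparison is safe for arbitrary input strings.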

@ -0,0 +1 @@
mlr --csv sort -t name test/input/natural-sort.csv

@ -0,0 +1,36 @@
n,name
2,10X Radonius
4,20X Radonius
5,20X Radonius Prime
6,30X Radonius
7,40X Radonius
3,200X Radonius
1,1000X Radonius Maximus
12,Allegia 6R Clasteron
8,Allegia 50 Clasteron
10,Allegia 50B Clasteron
11,Allegia 51 Clasteron
9,Allegia 500 Clasteron
14,Alpha 2
16,Alpha 2A
18,Alpha 2A-900
17,Alpha 2A-8000
13,Alpha 100
15,Alpha 200
19,Callisto Morphamax
20,Callisto Morphamax 500
22,Callisto Morphamax 600
25,Callisto Morphamax 700
21,Callisto Morphamax 5000
23,Callisto Morphamax 6000 SE
24,Callisto Morphamax 6000 SE2
26,Callisto Morphamax 7000
31,Xiph Xlater 5
30,Xiph Xlater 40
32,Xiph Xlater 50
35,Xiph Xlater 58
29,Xiph Xlater 300
33,Xiph Xlater 500
28,Xiph Xlater 2000
34,Xiph Xlater 5000
27,Xiph Xlater 10000

@ -0,0 +1 @@
mlr --csv sort -tr name test/input/natural-sort.csv

@ -0,0 +1,36 @@
n,name
27,Xiph Xlater 10000
34,Xiph Xlater 5000
28,Xiph Xlater 2000
33,Xiph Xlater 500
29,Xiph Xlater 300
35,Xiph Xlater 58
32,Xiph Xlater 50
30,Xiph Xlater 40
31,Xiph Xlater 5
26,Callisto Morphamax 7000
24,Callisto Morphamax 6000 SE2
23,Callisto Morphamax 6000 SE
21,Callisto Morphamax 5000
25,Callisto Morphamax 700
22,Callisto Morphamax 600
20,Callisto Morphamax 500
19,Callisto Morphamax
15,Alpha 200
13,Alpha 100
17,Alpha 2A-8000
18,Alpha 2A-900
16,Alpha 2A
14,Alpha 2
9,Allegia 500 Clasteron
11,Allegia 51 Clasteron
10,Allegia 50B Clasteron
8,Allegia 50 Clasteron
12,Allegia 6R Clasteron
1,1000X Radonius Maximus
3,200X Radonius
7,40X Radonius
6,30X Radonius
5,20X Radonius Prime
4,20X Radonius
2,10X Radonius

@ -0,0 +1,36 @@
n,name
1,1000X Radonius Maximus
2,10X Radonius
3,200X Radonius
4,20X Radonius
5,20X Radonius Prime
6,30X Radonius
7,40X Radonius
8,Allegia 50 Clasteron
9,Allegia 500 Clasteron
10,Allegia 50B Clasteron
11,Allegia 51 Clasteron
12,Allegia 6R Clasteron
13,Alpha 100
14,Alpha 2
15,Alpha 200
16,Alpha 2A
17,Alpha 2A-8000
18,Alpha 2A-900
19,Callisto Morphamax
20,Callisto Morphamax 500
21,Callisto Morphamax 5000
22,Callisto Morphamax 600
23,Callisto Morphamax 6000 SE
24,Callisto Morphamax 6000 SE2
25,Callisto Morphamax 700
26,Callisto Morphamax 7000
27,Xiph Xlater 10000
28,Xiph Xlater 2000
29,Xiph Xlater 300
30,Xiph Xlater 40
31,Xiph Xlater 5
32,Xiph Xlater 50
33,Xiph Xlater 500
34,Xiph Xlater 5000
35,Xiph Xlater 58