diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
index e5cfb1704..8b6cb20ec 100644
--- a/.github/workflows/codespell.yml
+++ b/.github/workflows/codespell.yml
@@ -33,7 +33,4 @@ jobs:
 with:
 check_filenames: true
 ignore_words_file: .codespellignore
- # ignore_words_list: denom,inout,iput,nd,nin,numer,te,wee
- # There is a word "RO" in docs/src/shapes-of-data.md.in and docs/src/shapes-of-data.md
- # which is listed in .codespellignore but which codespell refuses to ignore. Not sure why.
 skip: "*.csv,*.dkvp,*.txt,*.js,*.html,*.map,./tags,./test/cases,./docs/src/shapes-of-data.md.in,./docs/src/shapes-of-data.md"
diff --git a/docs/src/data/colours.csv b/docs/src/data/colours.csv
index f6dbe24aa..e0ce75494 100644
--- a/docs/src/data/colours.csv
+++ b/docs/src/data/colours.csv
@@ -1,3 +1,3 @@
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
 masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
diff --git a/docs/src/shapes-of-data.md b/docs/src/shapes-of-data.md
index 75ead4426..bab58b7f0 100644
--- a/docs/src/shapes-of-data.md
+++ b/docs/src/shapes-of-data.md
@@ -36,7 +36,7 @@ Use the `file` command to see if there are CR/LF terminators (in this case, ther
 file data/colours.csv
-data/colours.csv: UTF-8 Unicode text
+data/colours.csv: Unicode text, UTF-8 text
 
 Look at the file to find names of fields:
@@ -45,18 +45,15 @@ Look at the file to find names of fields:
 cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
-masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
+masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 
 Extract a few fields:
-
-mlr --csv cut -f KEY,PL,RO data/colours.csv 
-
-
-(only blank lines appear)
+
+mlr --csv cut -f KEY,PL,TO data/colours.csv 
 
 Use XTAB output format to get a sharper picture of where records/fields are being split:
@@ -65,12 +62,12 @@ Use XTAB output format to get a sharper picture of where records/fields are bein
 mlr --icsv --oxtab cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 
-Using XTAB output format makes it clearer that `KEY;DE;...;RO;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
+Using XTAB output format makes it clearer that `KEY;DE;...;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
 mlr --icsv --ifs semicolon --oxtab cat data/colours.csv 
@@ -83,9 +80,9 @@ ES  Blanco
 FI  Valkoinen
 FR  Blanc
 IT  Bianco
-NL  Witter
+NL  Wit
 PL  Biały
-RO  Alb
+TO  Alb
 TR  Beyaz
 
 KEY masterdata_colourcode_2
@@ -97,17 +94,17 @@ FR  Noir
 IT  Nero
 NL  Zwart
 PL  Czarny
-RO  Negru
+TO  Negru
 TR  Siyah
 
Using the new field-separator, retry the cut:
-mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv 
+mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv 
 
-KEY;PL;RO
+KEY;PL;TO
 masterdata_colourcode_1;Biały;Alb
 masterdata_colourcode_2;Czarny;Negru
 
diff --git a/docs/src/shapes-of-data.md.in b/docs/src/shapes-of-data.md.in
index b54719a1f..c32b0dad1 100644
--- a/docs/src/shapes-of-data.md.in
+++ b/docs/src/shapes-of-data.md.in
@@ -18,35 +18,34 @@ Use the `file` command to see if there are CR/LF terminators (in this case, ther
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
 file data/colours.csv
-data/colours.csv: UTF-8 Unicode text
+data/colours.csv: Unicode text, UTF-8 text
 GENMD-EOF
 
 Look at the file to find names of fields:
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
 cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
-masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
+masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 GENMD-EOF
 
 Extract a few fields:
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
-mlr --csv cut -f KEY,PL,RO data/colours.csv
-(only blank lines appear)
+mlr --csv cut -f KEY,PL,TO data/colours.csv
 GENMD-EOF
 
 Use XTAB output format to get a sharper picture of where records/fields are being split:
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --icsv --oxtab cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 GENMD-EOF
 
-Using XTAB output format makes it clearer that `KEY;DE;...;RO;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
+Using XTAB output format makes it clearer that `KEY;DE;...;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --icsv --ifs semicolon --oxtab cat data/colours.csv
@@ -57,9 +56,9 @@ ES  Blanco
 FI  Valkoinen
 FR  Blanc
 IT  Bianco
-NL  Witter
+NL  Wit
 PL  Biały
-RO  Alb
+TO  Alb
 TR  Beyaz
 
 KEY masterdata_colourcode_2
@@ -71,15 +70,15 @@ FR  Noir
 IT  Nero
 NL  Zwart
 PL  Czarny
-RO  Negru
+TO  Negru
 TR  Siyah
 GENMD-EOF
 
 Using the new field-separator, retry the cut:
 
 GENMD-CARDIFY-HIGHLIGHT-ONE
-mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv
-KEY;PL;RO
+mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv
+KEY;PL;TO
 masterdata_colourcode_1;Biały;Alb
 masterdata_colourcode_2;Czarny;Negru
 GENMD-EOF