mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
data-sharing doc page: DKVP in Ruby and Python
parent
2e2c348091
commit
20dc9f0151
32 changed files with 739 additions and 0 deletions
1
.gitignore
vendored
|
|
@ -40,6 +40,7 @@ catc0
|
|||
catm
|
||||
gmon.out
|
||||
*.o
|
||||
*.pyc
|
||||
.swp
|
||||
.swo
|
||||
.*.swp
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
58
doc/content-for-data-sharing.html
Normal file
|
|
@ -0,0 +1,58 @@
|
|||
POKI_PUT_TOC_HERE
|
||||
|
||||
<p/>
|
||||
<button style="font-weight:bold;color:maroon;border:0" onclick="expand_all();" href="javascript:;">Expand all sections</button>
|
||||
<button style="font-weight:bold;color:maroon;border:0" onclick="collapse_all();" href="javascript:;">Collapse all sections</button>
|
||||
|
||||
<p/> As discussed in the section on
POKI_PUT_LINK_FOR_PAGE(file-formats.html)HERE, Miller supports several
different file formats. Different tools are good at different things, so
it’s important to be able to move data into and out of other languages.
CSV and JSON are well-known, of course; here are some examples using DKVP
(delimited key-value pairs) format, with Python and Ruby.
|
||||
|
||||
|
||||
<h1>DKVP I/O in Python</h1>
|
||||
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_dkvp_python');" href="javascript:;">Toggle section visibility</button>
|
||||
<div id="section_toggle_dkvp_python" style="display: block">
|
||||
|
||||
<p/>
|
||||
Here are the I/O routines:
|
||||
|
||||
POKI_INCLUDE_ESCAPED(polyglot-dkvp-io/dkvp_io.py)HERE
|
||||
|
||||
And here is an example using them:
|
||||
|
||||
POKI_RUN_COMMAND{{cat polyglot-dkvp-io/example.py}}HERE
|
||||
|
||||
Run as-is:
|
||||
|
||||
POKI_RUN_COMMAND{{python polyglot-dkvp-io/example.py < data/small}}HERE
|
||||
|
||||
Run as-is, then pipe to Miller for pretty-printing:
|
||||
|
||||
POKI_RUN_COMMAND{{python polyglot-dkvp-io/example.py < data/small | mlr --opprint cat}}HERE
|
||||
|
||||
</div>
|
||||
<h1>DKVP I/O in Ruby</h1>
|
||||
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_dkvp_ruby');" href="javascript:;">Toggle section visibility</button>
|
||||
<div id="section_toggle_dkvp_ruby" style="display: block">
|
||||
|
||||
<p/>
|
||||
Here are the I/O routines:
|
||||
|
||||
POKI_INCLUDE_ESCAPED(polyglot-dkvp-io/dkvp_io.rb)HERE
|
||||
|
||||
And here is an example using them:
|
||||
|
||||
POKI_RUN_COMMAND{{cat polyglot-dkvp-io/example.rb}}HERE
|
||||
|
||||
Run as-is:
|
||||
|
||||
POKI_RUN_COMMAND{{ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small}}HERE
|
||||
|
||||
Run as-is, then pipe to Miller for pretty-printing:
|
||||
|
||||
POKI_RUN_COMMAND{{ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat}}HERE
|
||||
|
||||
</div>
|
||||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html"><b>Cookbook part 1</b></a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html"><b>Cookbook part 2</b></a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html"><b>Cookbook part 3</b></a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
490
doc/data-sharing.html
Normal file
|
|
@ -0,0 +1,490 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||||
<html lang="en">
|
||||
|
||||
<!-- PAGE GENERATED FROM template.html and content-for-data-sharing.html BY poki. -->
|
||||
<!-- PLEASE MAKE CHANGES THERE AND THEN RE-RUN poki. -->
|
||||
<head>
|
||||
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
|
||||
<meta name="description" content="Miller documentation"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/> <!-- mobile-friendly -->
|
||||
<meta name="keywords"
|
||||
content="John Kerl, Kerl, Miller, miller, mlr, OLAP, data analysis software, regression, correlation, variance, data tools, " />
|
||||
|
||||
<title> Sharing data with other languages </title>
|
||||
<link rel="stylesheet" type="text/css" href="css/miller.css"/>
|
||||
<link rel="stylesheet" type="text/css" href="css/poki-callbacks.css"/>
|
||||
</head>
|
||||
|
||||
<!-- ================================================================ -->
|
||||
<script type="text/javascript">
|
||||
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
|
||||
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
|
||||
</script>
|
||||
<script type="text/javascript">
|
||||
try {
|
||||
var pageTracker = _gat._getTracker("UA-15651652-1");
|
||||
pageTracker._trackPageview();
|
||||
} catch(err) {}
|
||||
</script>
|
||||
|
||||
<!-- ================================================================ -->
|
||||
<script type="text/javascript">
|
||||
function toggle_div(div) {
|
||||
if (div != null) {
|
||||
if (div.id.startsWith("section_toggle_")) {
|
||||
var state = div.style.display;
|
||||
if (state == "block") {
|
||||
div.style.display = "none";
|
||||
} else {
|
||||
div.style.display = "block";
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
function expand_div(div) {
|
||||
if (div != null) {
|
||||
if (div.id.startsWith("section_toggle_")) {
|
||||
div.style.display = "block";
|
||||
}
|
||||
}
|
||||
}
|
||||
function collapse_div(div) {
|
||||
if (div != null) {
|
||||
if (div.id.startsWith("section_toggle_")) {
|
||||
div.style.display = "none";
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function toggle_by_name(divName) {
|
||||
toggle_div(document.getElementById(divName));
|
||||
}
|
||||
function expand_by_name(divName) {
|
||||
expand_div(document.getElementById(divName));
|
||||
}
|
||||
function collapse_by_name(divName) {
|
||||
collapse_div(document.getElementById(divName));
|
||||
}
|
||||
|
||||
function expand_all() {
|
||||
var divs = document.getElementsByTagName("div");
|
||||
for(var i = 0; i < divs.length; i++) {
|
||||
expand_div(divs[i]);
|
||||
}
|
||||
}
|
||||
function collapse_all() {
|
||||
var divs = document.getElementsByTagName("div");
|
||||
for(var i = 0; i < divs.length; i++){
|
||||
collapse_div(divs[i]);
|
||||
}
|
||||
}
|
||||
</script>
|
||||
|
||||
<!--
|
||||
The background image is from a screenshot of a Google search for "data analysis
|
||||
tools", lightened and sepia-toned. Over this was placed a Mac Terminal app with
|
||||
very light-grey font and translucent background, in which a few statistical
|
||||
Miller commands were run with pretty-print-tabular output format.
|
||||
<body background="pix/sepia-overlay.jpg">
|
||||
-->
|
||||
<body bgcolor="#ffffff">
|
||||
|
||||
<!-- ================================================================ -->
|
||||
<table width="100%">
|
||||
<tr>
|
||||
|
||||
<!-- navbar -->
|
||||
<td width="15%">
|
||||
<!--
|
||||
<img src="pix/mlr.jpg" />
|
||||
<img style="border-width:1px; color:black;" src="pix/mlr.jpg" />
|
||||
-->
|
||||
|
||||
<div class="pokinav">
|
||||
<center><titleinbody>Miller</titleinbody></center>
|
||||
|
||||
<!-- PAGE LIST GENERATED FROM template.html BY poki -->
|
||||
<br/><b>Overview:</b>
|
||||
<br/>• <a href="index.html">About Miller</a>
|
||||
<br/>• <a href="10-min.html">Miller in 10 minutes</a>
|
||||
<br/>• <a href="file-formats.html">File formats</a>
|
||||
<br/>• <a href="feature-comparison.html">Miller features in the context of the Unix toolkit</a>
|
||||
<br/>• <a href="record-heterogeneity.html">Record-heterogeneity</a>
|
||||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html"><b>Sharing data with other languages</b></a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
<br/>• <a href="data-examples.html">Data-diving examples</a>
|
||||
<br/>• <a href="manpage.html">Manpage</a>
|
||||
<br/>• <a href="reference.html">Reference</a>
|
||||
<br/>• <a href="reference-verbs.html">Reference: Verbs</a>
|
||||
<br/>• <a href="reference-dsl.html">Reference: DSL</a>
|
||||
<br/>• <a href="release-docs.html">Documents by release</a>
|
||||
<br/>• <a href="build.html">Installation, portability, dependencies, and testing</a>
|
||||
<br/><b>Background:</b>
|
||||
<br/>• <a href="why.html">Why?</a>
|
||||
<br/>• <a href="whyc.html">Why C?</a>
|
||||
<br/>• <a href="etymology.html">Why call it Miller?</a>
|
||||
<br/>• <a href="originality.html">How original is Miller?</a>
|
||||
<br/>• <a href="performance.html">Performance</a>
|
||||
<br/><b>Repository:</b>
|
||||
<br/>• <a href="to-do.html">Things to do</a>
|
||||
<br/>• <a href="contact.html">Contact information</a>
|
||||
<br/>• <a href="https://github.com/johnkerl/miller">GitHub repo</a>
|
||||
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
|
||||
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
|
||||
<br/> <br/> <br/> <br/> <br/> <br/>
|
||||
</div>
|
||||
</td>
|
||||
|
||||
<!-- page body -->
|
||||
<td>
|
||||
<!--
|
||||
This is a visually gorgeous feature (here & in the CSS): it allows for
|
||||
independent scroll of the nav and body panels. In particular the nav
|
||||
stays on-screen as you scroll the body.
|
||||
|
||||
However, two problems:
|
||||
|
||||
(1) In Firefox & Chrome both I get janky end-of-body scrolls: there is
|
||||
more content but I can't scroll down to it unless I repeatedly retry the
|
||||
scrolldown. Which is weird.
|
||||
|
||||
(2) Worse, only the first page renders in PDF (again, Firefox & Chrome).
|
||||
|
||||
For now I'm disabling this separate-scroll feature. A frontender, I am
|
||||
not ... maybe someday I'll find a config which gets *all* the features
|
||||
I want; for now, it's a tradeoff.
|
||||
-->
|
||||
|
||||
<!-- Implementation details: one bit is right here:
|
||||
|
||||
div style="overflow-y:scroll;height:1500px"
|
||||
|
||||
and the other bit is in css/poki-callbacks.css:
|
||||
|
||||
.pokinav {
|
||||
display: inline-block;
|
||||
background: #e8d9bc;
|
||||
border: 1;
|
||||
box-shadow: 0px 0px 3px 3px #C9C9C9;
|
||||
margin: 10px;
|
||||
padding-top: 10px;
|
||||
padding-bottom: 10px;
|
||||
padding-left: 10px;
|
||||
padding-right: 10px;
|
||||
overflow-y: scroll; < - - - - - - here
|
||||
height: 1500px;
|
||||
}
|
||||
|
||||
-->
|
||||
<div>
|
||||
<center> <titleinbody> Sharing data with other languages </titleinbody> </center>
|
||||
<p/>
|
||||
|
||||
<!-- BODY COPIED FROM content-for-data-sharing.html BY poki -->
|
||||
<div class="pokitoc">
|
||||
<center><b>Contents:</b></center>
|
||||
• <a href="#DKVP_I/O_in_Python">DKVP I/O in Python</a><br/>
|
||||
• <a href="#DKVP_I/O_in_Ruby">DKVP I/O in Ruby</a><br/>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
<p/>
|
||||
<button style="font-weight:bold;color:maroon;border:0" onclick="expand_all();" href="javascript:;">Expand all sections</button>
|
||||
<button style="font-weight:bold;color:maroon;border:0" onclick="collapse_all();" href="javascript:;">Collapse all sections</button>
|
||||
|
||||
<p/> As discussed in the section on
<a href="file-formats.html">File formats</a>, Miller supports several
different file formats. Different tools are good at different things, so
it’s important to be able to move data into and out of other languages.
CSV and JSON are well-known, of course; here are some examples using DKVP
(delimited key-value pairs) format, with Python and Ruby.
|
||||
|
||||
|
||||
<a id="DKVP_I/O_in_Python"/><h1>DKVP I/O in Python</h1>
|
||||
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_dkvp_python');" href="javascript:;">Toggle section visibility</button>
|
||||
<div id="section_toggle_dkvp_python" style="display: block">
|
||||
|
||||
<p/>
|
||||
Here are the I/O routines:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
#!/usr/bin/env python

# ================================================================
# Example of DKVP I/O using Python.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ python -i dkvp_io.py
#
# # READ
# >>> map = dkvpline2map('x=1,y=2', '=', ',')
# >>> map
# OrderedDict([('x', 1), ('y', 2)])
#
# # MODIFY
# >>> map['z'] = map['x'] + map['y']
# >>> map
# OrderedDict([('x', 1), ('y', 2), ('z', 3)])
#
# # WRITE
# >>> line = map2dkvpline(map, '=', ',')
# >>> line
# 'x=1,y=2,z=3'
#
# ================================================================

import re
import collections

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
# Note that both are passed to re.split, so they are treated as regular expressions.
def dkvpline2map(line, ips, ifs):
    pairs = re.split(ifs, line)
    map = collections.OrderedDict()
    for pair in pairs:
        # Split on the first pair separator only, so values may contain it.
        key, value = re.split(ips, pair, maxsplit=1)

        # Type inference: try int, then float; otherwise leave as string.
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                pass

        map[key] = value
    return map

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs):
    pairs = []
    for key in map:
        pairs.append(str(key) + ops + str(map[key]))
    return ofs.join(pairs)
|
||||
</pre>
|
||||
</div>
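As a rough illustration of the type inference described in the routines above, here is a minimal self-contained sketch. The names `infer_type`, `parse_dkvp`, and `format_dkvp` are hypothetical stand-ins chosen for this example, not part of `dkvp_io.py`, and the sketch splits on literal separators rather than regular expressions. It tries `int` first, then `float`, and otherwise keeps the string; it also splits each pair only once, so a value may itself contain the pair separator.

```python
from collections import OrderedDict


def infer_type(text):
    """Try int, then float; otherwise keep the original string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_dkvp(line, ips="=", ifs=","):
    """Parse one DKVP line into an ordered map (hypothetical helper)."""
    record = OrderedDict()
    for pair in line.split(ifs):
        # Split once only, so 'expr=u=v' yields key 'expr' and value 'u=v'.
        key, value = pair.split(ips, 1)
        record[key] = infer_type(value)
    return record


def format_dkvp(record, ops="=", ofs=","):
    """Format an ordered map back into one DKVP line (hypothetical helper)."""
    return ofs.join(str(k) + ops + str(v) for k, v in record.items())


record = parse_dkvp("a=pan,i=1,y=0.5,expr=u=v")
for key, value in record.items():
    # Show each field with the name of its inferred type.
    print(key, type(value).__name__)
print(format_dkvp(record))
```

One caveat with this approach: Python's `float` also accepts strings such as `nan` and `inf`, so a text field holding one of those would be inferred as a number.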
|
||||
<p/>
|
||||
|
||||
And here is an example using them:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ cat polyglot-dkvp-io/example.py
#!/usr/bin/env python

import sys
import re
import dkvp_io

while True:
    # Read the original record; stop at end of input or at the first blank line:
    line = sys.stdin.readline().strip()
    if line == '':
        break
    map = dkvp_io.dkvpline2map(line, '=', ',')

    # Drop a field:
    map.pop('x')

    # Compute some new fields:
    map['ab'] = map['a'] + map['b']
    map['iy'] = map['i'] + map['y']

    # Add new fields which show type of each already-existing field.
    # Copy the key list first, since the loop adds keys to the map:
    keys = list(map.keys())
    for key in keys:
        # Convert "<type 'int'>" to just "int", etc.:
        type_string = str(map[key].__class__)
        type_string = re.sub("<type '", "", type_string)
        type_string = re.sub("'>", "", type_string)
        map['t'+key] = type_string

    # Write the modified record:
    print(dkvp_io.map2dkvpline(map, '=', ','))
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
Run as-is:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ python polyglot-dkvp-io/example.py < data/small
|
||||
a=pan,b=pan,i=1,y=0.726802862743,ab=panpan,iy=1.72680286274,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
|
||||
a=eks,b=pan,i=2,y=0.522151108333,ab=ekspan,iy=2.52215110833,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
|
||||
a=wye,b=wye,i=3,y=0.338318525517,ab=wyewye,iy=3.33831852552,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
|
||||
a=eks,b=wye,i=4,y=0.134188743284,ab=ekswye,iy=4.13418874328,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
|
||||
a=wye,b=pan,i=5,y=0.863624469903,ab=wyepan,iy=5.8636244699,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
Run as-is, then pipe to Miller for pretty-printing:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ python polyglot-dkvp-io/example.py < data/small | mlr --opprint cat
a   b   i y              ab     iy            ta  tb  ti  ty    tab tiy
pan pan 1 0.726802862743 panpan 1.72680286274 str str int float str float
eks pan 2 0.522151108333 ekspan 2.52215110833 str str int float str float
wye wye 3 0.338318525517 wyewye 3.33831852552 str str int float str float
eks wye 4 0.134188743284 ekswye 4.13418874328 str str int float str float
wye pan 5 0.863624469903 wyepan 5.8636244699  str str int float str float
|
||||
</pre>
|
||||
</div>
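The example script above stops at the first blank line and raises `KeyError` if a record has no `x` field. A more forgiving loop might look like the following sketch. It is an illustration under stated assumptions, not the author's method: `parse_dkvp`, `format_dkvp`, and `transform` are hypothetical names, and the sample records are made up for the demonstration rather than taken from `data/small`.

```python
from collections import OrderedDict


def parse_dkvp(line, ips="=", ifs=","):
    """Hypothetical stand-in for the parsing routine: int, then float, else string."""
    record = OrderedDict()
    for pair in line.split(ifs):
        key, value = pair.split(ips, 1)
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                pass
        record[key] = value
    return record


def format_dkvp(record, ops="=", ofs=","):
    """Hypothetical stand-in for the formatting routine."""
    return ofs.join(str(k) + ops + str(v) for k, v in record.items())


def transform(lines):
    """Yield one output line per non-blank input line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines instead of ending the run
        record = parse_dkvp(line)
        record.pop("x", None)  # no error when the field is absent
        if "a" in record and "b" in record:
            record["ab"] = str(record["a"]) + str(record["b"])
        yield format_dkvp(record)


# Made-up sample input; in a pipeline this would be sys.stdin instead.
sample = ["a=pan,b=pan,i=1,x=0.3\n", "\n", "a=eks,b=wye,i=2\n"]
for out in transform(sample):
    print(out)
```

Because `transform` accepts any iterable of lines, the same function can be fed `sys.stdin` when the script sits in a pipeline with Miller.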
|
||||
<p/>
|
||||
|
||||
</div>
|
||||
<a id="DKVP_I/O_in_Ruby"/><h1>DKVP I/O in Ruby</h1>
|
||||
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_dkvp_ruby');" href="javascript:;">Toggle section visibility</button>
|
||||
<div id="section_toggle_dkvp_ruby" style="display: block">
|
||||
|
||||
<p/>
|
||||
Here are the I/O routines:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
#!/usr/bin/env ruby

# ================================================================
# Example of DKVP I/O using Ruby.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ irb -I. -r dkvp_io.rb
#
# # READ
# irb(main):001:0> map = dkvpline2map('x=1,y=2', '=', ',')
# => {"x"=>1, "y"=>2}
#
# # MODIFY
# irb(main):002:0> map['z'] = map['x'] + map['y']
# => 3
#
# # WRITE
# irb(main):003:0> line = map2dkvpline(map, '=', ',')
# => "x=1,y=2,z=3"
#
# ================================================================

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs)
  map = {}
  line.split(ifs).each do |pair|
    # Split on the first pair separator only, so values may contain it.
    (k, v) = pair.split(ips, 2)

    # Type inference: try Integer, then Float; otherwise leave as string.
    begin
      v = Integer(v)
    rescue ArgumentError
      begin
        v = Float(v)
      rescue ArgumentError
        # Leave as string
      end
    end

    map[k] = v
  end
  map
end

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs)
  map.collect{|k,v| k.to_s + ops + v.to_s}.join(ofs)
end
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
And here is an example using them:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ cat polyglot-dkvp-io/example.rb
#!/usr/bin/env ruby

require 'dkvp_io'

ARGF.each do |line|
  # Read the original record:
  map = dkvpline2map(line.chomp, '=', ',')

  # Drop a field:
  map.delete('x')

  # Compute some new fields:
  map['ab'] = map['a'] + map['b']
  map['iy'] = map['i'] + map['y']

  # Add new fields which show type of each already-existing field:
  keys = map.keys
  keys.each do |key|
    map['t'+key] = map[key].class
  end

  # Write the modified record:
  puts map2dkvpline(map, '=', ',')
end
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
Run as-is:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small
|
||||
a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=String,tb=String,ti=Fixnum,ty=Float,tab=String,tiy=Float
|
||||
a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=String,tb=String,ti=Fixnum,ty=Float,tab=String,tiy=Float
|
||||
a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=String,tb=String,ti=Fixnum,ty=Float,tab=String,tiy=Float
|
||||
a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=String,tb=String,ti=Fixnum,ty=Float,tab=String,tiy=Float
|
||||
a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=String,tb=String,ti=Fixnum,ty=Float,tab=String,tiy=Float
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
Run as-is, then pipe to Miller for pretty-printing:
|
||||
|
||||
<p/>
|
||||
<div class="pokipanel">
|
||||
<pre>
|
||||
$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat
a   b   i y                   ab     iy                 ta     tb     ti     ty    tab    tiy
pan pan 1 0.7268028627434533  panpan 1.7268028627434533 String String Fixnum Float String Float
eks pan 2 0.5221511083334797  ekspan 2.5221511083334796 String String Fixnum Float String Float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 String String Fixnum Float String Float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304  String String Fixnum Float String Float
wye pan 5 0.8636244699032729  wyepan 5.863624469903273  String String Fixnum Float String Float
|
||||
</pre>
|
||||
</div>
|
||||
<p/>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</td>
|
||||
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html"><b>FAQ</b></a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html"><b>Internationalization</b></a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
|
|||
<br/>• <a href="internationalization.html">Internationalization</a>
|
||||
<br/><b>Using Miller:</b>
|
||||
<br/>• <a href="faq.html">FAQ</a>
|
||||
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
|
||||
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
||||
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
||||
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ internationalization.html Internationalization
|
|||
|
||||
sep:details <b>Using Miller:</b>
|
||||
faq.html FAQ
|
||||
data-sharing.html Sharing data with other languages
|
||||
cookbook.html Cookbook part 1
|
||||
cookbook2.html Cookbook part 2
|
||||
cookbook3.html Cookbook part 3
|
||||
|
|
|
|||
58
doc/polyglot-dkvp-io/dkvp_io.py
Normal file
|
|
@ -0,0 +1,58 @@
|
|||
#!/usr/bin/env python

# ================================================================
# Example of DKVP I/O using Python.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ python -i dkvp_io.py
#
# # READ
# >>> map = dkvpline2map('x=1,y=2', '=', ',')
# >>> map
# OrderedDict([('x', 1), ('y', 2)])
#
# # MODIFY
# >>> map['z'] = map['x'] + map['y']
# >>> map
# OrderedDict([('x', 1), ('y', 2), ('z', 3)])
#
# # WRITE
# >>> line = map2dkvpline(map, '=', ',')
# >>> line
# 'x=1,y=2,z=3'
#
# ================================================================

import re
import collections

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
# Note that both are passed to re.split, so they are treated as regular expressions.
def dkvpline2map(line, ips, ifs):
    pairs = re.split(ifs, line)
    map = collections.OrderedDict()
    for pair in pairs:
        # Split on the first pair separator only, so values may contain it.
        key, value = re.split(ips, pair, maxsplit=1)

        # Type inference: try int, then float; otherwise leave as string.
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                pass

        map[key] = value
    return map

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs):
    pairs = []
    for key in map:
        pairs.append(str(key) + ops + str(map[key]))
    return ofs.join(pairs)
|
||||
52
doc/polyglot-dkvp-io/dkvp_io.rb
Normal file
|
|
@ -0,0 +1,52 @@
|
|||
#!/usr/bin/env ruby

# ================================================================
# Example of DKVP I/O using Ruby.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ irb -I. -r dkvp_io.rb
#
# # READ
# irb(main):001:0> map = dkvpline2map('x=1,y=2', '=', ',')
# => {"x"=>1, "y"=>2}
#
# # MODIFY
# irb(main):002:0> map['z'] = map['x'] + map['y']
# => 3
#
# # WRITE
# irb(main):003:0> line = map2dkvpline(map, '=', ',')
# => "x=1,y=2,z=3"
#
# ================================================================

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs)
  map = {}
  line.split(ifs).each do |pair|
    (k, v) = pair.split(ips, 2)

    # Type inference:
    begin
      v = Integer(v)
    rescue ArgumentError
      begin
        v = Float(v)
      rescue ArgumentError
        # Leave as string
      end
    end

    map[k] = v
  end
  map
end

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs)
  map.collect{|k,v| k.to_s + ops + v.to_s}.join(ofs)
end
31
doc/polyglot-dkvp-io/example.py
Normal file
@@ -0,0 +1,31 @@
#!/usr/bin/env python

import sys
import dkvp_io

while True:
    # Read the original record:
    line = sys.stdin.readline().strip()
    if line == '':
        break
    map = dkvp_io.dkvpline2map(line, '=', ',')

    # Drop a field:
    map.pop('x')

    # Compute some new fields:
    map['ab'] = map['a'] + map['b']
    map['iy'] = map['i'] + map['y']

    # Add new fields which show type of each already-existing field.
    # Snapshot the keys first, since the loop adds entries to the map.
    keys = list(map.keys())
    for key in keys:
        # Use the bare type name: "int", "float", "str", etc.
        map['t'+key] = type(map[key]).__name__

    # Write the modified record:
    print(dkvp_io.map2dkvpline(map, '=', ','))
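To make the effect of the filter above concrete, here is a self-contained sketch that applies the same three steps (drop x, compute ab and iy, append a type field per existing key) to a single record held in a string. The helper names infer and transform, the inline parser, and the sample record are assumptions made for this illustration; they are not part of the commit.

```python
import collections


def infer(value):
    """Try int, then float; otherwise keep the string unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def transform(line, ps='=', fs=','):
    """Hypothetical one-record version of the stdin filter."""
    # Parse the DKVP line into an insertion-ordered record.
    record = collections.OrderedDict()
    for pair in line.split(fs):
        key, value = pair.split(ps, 1)
        record[key] = infer(value)

    # Drop a field.
    record.pop('x')

    # Compute new fields: string concatenation and numeric addition.
    record['ab'] = record['a'] + record['b']
    record['iy'] = record['i'] + record['y']

    # Snapshot the keys before adding one type field per existing key.
    for key in list(record.keys()):
        record['t' + key] = type(record[key]).__name__

    return fs.join(k + ps + str(v) for k, v in record.items())


# Made-up sample record for illustration.
print(transform('a=pan,b=wye,i=1,x=0.5,y=0.25'))
# a=pan,b=wye,i=1,y=0.25,ab=panwye,iy=1.25,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
```

Note that the type fields cover the computed fields ab and iy as well, because the key snapshot is taken after they are added.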
24
doc/polyglot-dkvp-io/example.rb
Normal file
@@ -0,0 +1,24 @@
#!/usr/bin/env ruby

# Load dkvp_io.rb from this script's directory, whatever the load path is.
require_relative 'dkvp_io'

ARGF.each do |line|
  # Read the original record:
  map = dkvpline2map(line.chomp, '=', ',')

  # Drop a field:
  map.delete('x')

  # Compute some new fields:
  map['ab'] = map['a'] + map['b']
  map['iy'] = map['i'] + map['y']

  # Add new fields which show type of each already-existing field:
  keys = map.keys
  keys.each do |key|
    map['t'+key] = map[key].class
  end

  # Write the modified record:
  puts map2dkvpline(map, '=', ',')
end
@@ -113,6 +113,7 @@ Miller commands were run with pretty-print-tabular output format.
<br/>• <a href="internationalization.html">Internationalization</a>
<br/><b>Using Miller:</b>
<br/>• <a href="faq.html">FAQ</a>
<br/>• <a href="data-sharing.html">Sharing data with other languages</a>
<br/>• <a href="cookbook.html">Cookbook part 1</a>
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
<br/>• <a href="cookbook3.html">Cookbook part 3</a>