Archive for the ‘json’ Category
Parslet and JSON
Parslet is a small Ruby library for constructing parsers based on Parsing Expression Grammars (PEG). It’s written by Kaspar Schiess and various contributors.
This blog post introduces Parslet with a parser example. Since JSON has very easy to grasp railroad diagrams for its syntax, it might make for a good example.
Please note that the JSON parser here won’t compete for speed with available libraries. No benchmarks here.
Our goal is to take as input JSON strings and output the resulting value.
For the impatient, the end result is at https://gist.github.com/966020
How is an array encoded in JSON ?
How would that look in our parser ?
class Parser < Parslet::Parser rule(:spaces) { match('\s').repeat(1) } # at least 1 space character (space, tab, new line, carriage return) rule(:spaces?) { spaces.maybe } # a bunch of spaces or not rule(:comma) { spaces? >> str(',') >> spaces? } # a comma surrounded by optional spaces rule(:array) { str('[') >> spaces? >> (value >> (comma >> value).repeat).maybe.as(:array) >> spaces? >> str(']') } end
What is this value thing ?
string or number or object or …
rule(:value) { string | number | object | array | str('true').as(:true) | str('false').as(:false) | str('null').as(:null) }
All is good, a few parsing rules laters, we have a complete JSON parser, but wait, what does it output ?
p MyJson::Parser.new.parse(%{ [ 1, 2, 3, null, "asdfasdf asdfds", { "a": -1.2 }, { "b": true, "c": false }, 0.1e24, true, false, [ 1 ] ] }) # => {:array=>[{:number=>"1"@5}, {:number=>"2"@8}, {:number=>"3"@11}, {:null=>"null"@14}, {:string=>"asdfasdf asdfds"@25}, {:object=>{:entry=>{:val=>{:number=>"-1.2"@50}, :key=>{:string=>"a"@46}}}}, {:object=>[{:entry=>{:val=>{:true=>"true"@65}, :key=>{:string=>"b"@61}}}, {:entry=>{:val=>{:false=>"false"@76}, :key=>{:string=>"c"@72}}}]}, {:number=>"0.1e24"@89}, {:true=>"true"@97}, {:false=>"false"@103}, {:array=>{:number=>"1"@112}}]}
Oh well, that is not exactly what we want as final result. Parslet calls the output of its parser a “intermediate tree”. It separates parsing from transformation.
We need a transformer and it looks like :
class Transformer < Parslet::Transform class Entry < Struct.new(:key, :val); end rule(:array => subtree(:ar)) { ar.is_a?(Array) ? ar : [ ar ] } rule(:object => subtree(:ob)) { (ob.is_a?(Array) ? ob : [ ob ]).inject({}) { |h, e| h[e.key] = e.val; h } } rule(:entry => { :key => simple(:ke), :val => simple(:va) }) { Entry.new(ke, va) } rule(:string => simple(:st)) { st.to_s } rule(:number => simple(:nb)) { nb.match(/[eE\.]/) ? Float(nb) : Integer(nb) } rule(:null => simple(:nu)) { nil } rule(:true => simple(:tr)) { true } rule(:false => simple(:fa)) { false } end
Patterns in the intermediate tree are indentified and replaced, producing a final output (or yet another intermediate result, it’s up to you).
The complete parser (and transformer and small test) is at https://gist.github.com/966020
There isn’t much more I could say. Ah yes, about testing. Kaspar explains it in the tricks, you can directly test parsing rules individually :
class MyJsonTest < Test::Unit::TestCase def parser MyJson::Parser.new end def test_parser_number_integer assert_equal 1, parser.number("1") end def test_parser_number_float assert_equal 1.0, parser.number("1.0") end def test_parser_number_not_a_number assert_raise Parslet::ParseFailed do parser.number("whatever") end end end
Happy parsing (and transforming) !
the json parser : https://gist.github.com/966020
documentation : http://kschiess.github.com/parslet/
source code : https://github.com/kschiess/parslet
mailing list : ruby.parslet@librelist.com
irc : freenode.net #parslet
No animals got benchmarked during this blog post.