processi

about processes and engines

Parslet and JSON

Parslet is a small Ruby library for constructing parsers based on Parsing Expression Grammars (PEG). It’s written by Kaspar Schiess and various contributors.

This blog post introduces Parslet with a parser example. JSON makes a good candidate, since its syntax is described by railroad diagrams that are very easy to grasp.

Please note that the JSON parser here won’t compete for speed with available libraries. No benchmarks here.

Our goal is to take a JSON string as input and output the corresponding Ruby value.
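To make that target concrete, here is what Ruby's bundled json library returns for a small input. It is used only as a reference for the values we are after; it is not part of the Parslet parser built in this post, and the input string is made up for the illustration.

```ruby
require 'json'

# Reference only: the standard library's parser, used here to show the
# kind of Ruby values we want our own parser to end up producing.
input = '[ 1, null, "asdf", { "a": -1.2 }, true, [ 1 ] ]'

expected = JSON.parse(input)
p expected
```

JSON arrays become Ruby arrays, objects become hashes, null becomes nil, and numbers become Integer or Float instances.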

For the impatient, the end result is at https://gist.github.com/966020

How is an array encoded in JSON ? An opening bracket, zero or more values separated by commas, then a closing bracket.

How would that look in our parser ?

  class Parser < Parslet::Parser

    rule(:spaces) { match('\s').repeat(1) }
      # at least 1 space character (space, tab, new line, carriage return)

    rule(:spaces?) { spaces.maybe }
      # a bunch of spaces or not

    rule(:comma) { spaces? >> str(',') >> spaces? }
      # a comma surrounded by optional spaces

    rule(:array) {
      str('[') >> spaces? >>
      (value >> (comma >> value).repeat).maybe.as(:array) >>
      spaces? >> str(']')
    }
  end

What is this value thing ?

string or number or object or …

    rule(:value) {
      string | number |
      object | array |
      str('true').as(:true) | str('false').as(:false) |
      str('null').as(:null)
    }

All is good: a few parsing rules later, we have a complete JSON parser. But wait, what does it output ?

p MyJson::Parser.new.parse(%{
  [ 1, 2, 3, null,
    "asdfasdf asdfds", { "a": -1.2 }, { "b": true, "c": false },
    0.1e24, true, false, [ 1 ] ]
})
# => {:array=>[{:number=>"1"@5}, {:number=>"2"@8}, {:number=>"3"@11}, {:null=>"null"@14}, {:string=>"asdfasdf asdfds"@25}, {:object=>{:entry=>{:val=>{:number=>"-1.2"@50}, :key=>{:string=>"a"@46}}}}, {:object=>[{:entry=>{:val=>{:true=>"true"@65}, :key=>{:string=>"b"@61}}}, {:entry=>{:val=>{:false=>"false"@76}, :key=>{:string=>"c"@72}}}]}, {:number=>"0.1e24"@89}, {:true=>"true"@97}, {:false=>"false"@103}, {:array=>{:number=>"1"@112}}]}

Oh well, that is not exactly what we want as a final result. Parslet calls the output of its parser an “intermediate tree”. It separates parsing from transformation.

We need a transformer, and it looks like this :

  class Transformer < Parslet::Transform

    class Entry < Struct.new(:key, :val); end

    rule(:array => subtree(:ar)) {
      ar.is_a?(Array) ? ar : [ ar ]
    }
    rule(:object => subtree(:ob)) {
      (ob.is_a?(Array) ? ob : [ ob ]).inject({}) { |h, e| h[e.key] = e.val; h }
    }

    rule(:entry => { :key => simple(:ke), :val => simple(:va) }) {
      Entry.new(ke, va)
    }

    rule(:string => simple(:st)) {
      st.to_s
    }
    rule(:number => simple(:nb)) {
      nb.match(/[eE\.]/) ? Float(nb) : Integer(nb)
    }

    rule(:null => simple(:nu)) { nil }
    rule(:true => simple(:tr)) { true }
    rule(:false => simple(:fa)) { false }
  end

Patterns in the intermediate tree are identified and replaced, producing a final output (or yet another intermediate result, it’s up to you).
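To see the idea without Parslet in the way, here is a plain Ruby sketch of the same replacement, applied bottom-up to a hand-written tree. Plain strings stand in for Parslet's slices and the to_value helper is hypothetical; it only mimics what the transformer rules do, it is not how Parslet::Transform is implemented.

```ruby
# A hand-written stand-in for the intermediate tree (assumption: plain
# Strings instead of Parslet slices, for illustration only).
tree = {
  :array => [
    { :number => "1" },
    { :null => "null" },
    { :object => { :entry => { :key => { :string => "a" }, :val => { :number => "-1.2" } } } },
    { :array => { :number => "1" } } # a single element is not wrapped in an Array
  ]
}

# Hypothetical helper: replaces each labelled node with its Ruby value,
# starting from the leaves.
def to_value(node)
  return node.map { |n| to_value(n) } if node.is_a?(Array)

  key, sub = node.first
  case key
  when :array
    sub.is_a?(Array) ? to_value(sub) : [to_value(sub)]
  when :object
    entries = sub.is_a?(Array) ? sub : [sub]
    entries.each_with_object({}) do |e, h|
      h[to_value(e[:entry][:key])] = to_value(e[:entry][:val])
    end
  when :string then sub.to_s
  when :number then sub =~ /[eE.]/ ? Float(sub) : Integer(sub)
  when :null   then nil
  when :true   then true
  when :false  then false
  else raise ArgumentError, "unknown node: #{key.inspect}"
  end
end

result = to_value(tree)
p result
```

Note how the single-element array and the single-entry object have to be wrapped by hand, which is what the `is_a?(Array)` checks in the transformer are for.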

The complete parser (and transformer and small test) is at https://gist.github.com/966020

There isn’t much more I could say. Ah yes, about testing: as Kaspar explains in the tricks, you can test parsing rules individually :

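class MyJsonTest < Test::Unit::TestCase
  def parser
    MyJson::Parser.new
  end
  # a rule method returns a parslet atom, call #parse on it; the result
  # is still the intermediate tree, so we compare the matched text
  def test_parser_number_integer
    assert_equal "1", parser.number.parse("1")[:number].to_s
  end
  def test_parser_number_float
    assert_equal "1.0", parser.number.parse("1.0")[:number].to_s
  end
  def test_parser_number_not_a_number
    assert_raise Parslet::ParseFailed do
      parser.number.parse("whatever")
    end
  end
end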

Happy parsing (and transforming) !

 

the json parser : https://gist.github.com/966020

documentation : http://kschiess.github.com/parslet/
source code : https://github.com/kschiess/parslet
mailing list : ruby.parslet@librelist.com
irc : freenode.net #parslet

No animals got benchmarked during this blog post.

 

Written by John Mettraux

May 11, 2011 at 7:10 am

Posted in json, parslet, ruby

5 Responses

  1. Instead of `ar.is_a?(Array) ? ar : [ ar ]` you can simply use `Array(ar)`.

    Konstantin Haase

    May 11, 2011 at 7:46 am

  2. http://kschiess.gihtub.com/ <- should be *kschiess.github.com* as I suppose ;)

    Kazuhiro

    May 11, 2011 at 9:52 am

