Publicidad:
La Coctelera

un rato de sol

i will work harder, ja ja ja, no ahora en serio.

Categoría: erlang

1 Noviembre 2009

Parsing SPARQL in Erlang with a monadic combinatory parser

When I had to learn Haskell, some years ago, the task was frustrating for some time. I had the impression of not being able to put all the pieces together, of missing the big picture. Of course, I had to learn the syntax of the language, lazy evaluation, currying and partial application of functions, monads, do notation, arrows, etc. All of those parts seemed to be powerful tools, nice building blocks to build something, but what I was missing was the way of assembling all these components together in order to build something useful in an elegant way.

After a few weeks of struggle I eventually had my "aha!! moment", the moment I realized how powerful and elegant Haskell functional code was. That moment for me happened using the Parsec (http://www.haskell.org/haskellwiki/Parsec) library to build a parser.

Parsec is a monadic parser combinatory library for the Haskell language. It can be used to build parsers for context-sensitive infinite look-ahead grammars in a very different way we are used to with tools like YACC or events based parsers.

With Parsec, the grammar of the language to parse is not defined in an external file using some kind of BNF notation, but in plain Haskell code. Each rule of the grammar can be defined as a parser function of a monadic data type, accepting a state and a type of tokens. The function will do the parsing consuming tokens and returning either a successful result and the not consumed input or an error. The simplest parser you can build, as stated in the Parsec documentation, is the char parser. It is already defined in the Parsec module and just consumes a character:

simple :: Parser Char
 simple  = letter
 

Others Parsers are already defined in the Parsec module, e.g. for strings. All of these parser functions are full blown parsers that can be run over some input using the parse function:

 >parse simple "" "some characters"
 Right 's'
 

'Right' in the output is signaling the success status and the tokens consumed, 's' in this case. If the parser fails, 'Left' is returned:

 >parse simple "" "123"
 Left (line 1, column 1):
 unexpected "1"
 expecting letter
 

The interesting thing about Parsec's parsers is their monadic nature. It allows us to combine simple parsers to build more powerful and expressive parsers. The Parsec library comes loaded with tons of monadic combinators for parsers: many, many1, skipMany, skipMany1, sepBy, sepBy1, endBy, endBy1, sepEndBy, sepEndBy1, count, between, option, choice, manyTill, chainl, chainl1, eof, notFollowedBy, anyToken... For instance, we can create a parser to parse any number of chars using the previously defined parser and the combinator 'many':

 complex = many simple
 >parse complex "" "abc123"
 Right "abc"
 

Since parsers are monad instances, Haskell's "do notation" can be used to write the parsers in a more convenient way. The final result is that using Parsec you can write a parser for a grammar almost by simply translating the rules to Haskell code. An example from the Parsec documentation:

 ------->EBNF:
 receipt     ::= product* total
 produkt    ::= "return" price ";"
                      | identifier price ";"
 total       ::= price "total"
 price       ::= digit+ "." digit digit
 
 
 ------>Parsec:
 receipt = do{ ps <- many produkt
 ; p <- total
 
 ; return (sum ps == p)
 }
 
 produkt = do{ symbol "return"
 ; p <- price
 ; semi
 
 ; return (-p)
 }
 <|> do{ identifier
 ; p <- price
 ; semi
 
 ; return p
 }
 "produkt"
 
 total = do{ p <- price
 ; symbol "total"
 
 ; return p
 }
 
 price = lexeme (do{ ds1 <- many1 digit
 ; char '.'
 ; ds2 <- count 2 digit
 
 ; return (convert 0 (ds1 ++ ds2))
 })
 "price"
 where
 convert n [] = n
 convert n (d:ds) = convert (10*n + digitToInt d) ds
 

The code should be pretty straight forward. One can see how productions are implemented using Parsec and Haskell's do notation, and how the semantic transformation of the production rule is incorporated after the definition of the production.

I really encourage you to take a good look at Haskell and the little wonders, like Parsec,that it contains.

Nevertheless, my current work is based in another interesting functional language, Erlang. Recently I've been needing some parsing power for my project. After spending some time taking a look at Erlang's parsing libraries like YEPP, I really started missing Parsec. For a moment I had the idea of building a Parsec-like library for Erlang. Fortunately Jeffrey Meunler has already done it: http://www.engr.uconn.edu/~jeffm/Source/Erlang/.

This version of Parsec is much smaller than the original Parsec library for Haskell, but is handy and really does its work. The most annoying part is dealing with Erlang's unconvenient functional syntax. The Erlang's version of the 'simple' and 'complex' char parsers shown before will look like this:

 simple(S) ->
     parser:pAlphaNum(S) .
 
 >simple("abc")
 {"a","bc"}
 
 complex(S) ->
 (parse:pMany( fun parser:pAlphaNum/1 ))(S) .
 
 >complex("abc")
 {"abc",[]}
 

I've used this library to build a small parser for a subset of the SPARQL query language. Here you can see the production and how they are implemented in Erlang:

 %% @doc
 %% Main public function of the parser. It accepts a SPARQL query in a string
 %% and returns a tuple with a list of variables and a list of triple patterns {S,P,O,G} as binaries or
 %% error, if some error was found during the parsing process
 -spec parse_sparql_query(list()) -> {[bitstring()],[{bitstring(), bitstring(), bitstring(), bitstring()}]} | error .
 parse_sparql_query(Input) ->
     parser:parse(sparql_parser(), Input) .
 
 
 %% @doc
 %% This function returns the SPARQL parser composing top level parsers
 %% [5] 	SelectQuery  ::=  'SELECT' ( 'DISTINCT' | 'REDUCED' )? ( Var+ | '*' ) DatasetClause* WhereClause SolutionModifier
 sparql_parser() ->
     fun( I ) ->
             case (parser:pAnd( [
                                   parser:pString("SELECT"),
                                   fun parser:pSpaces/1,
                                   parser:pOr([parser:pString("*"),
                                               parser:pMany( fun var_parser/1 )]),
                                   fun parser:pSpaces/1,
                                   parser:pMaybe(parser:pString("WHERE")),
                                   fun parser:pSpaces/1,
                                   fun group_graph_pattern_parser/1
                                 ])) ( I ) of
                 {["SELECT",_,Vars,_,_W,_,Triples],[]} -> {Vars,lists:map(fun({triple,S,P,O}) -> {S,P,O,undefined} end, Triples)} ;
                 Other                                 -> Other
             end
     end .
 
 %% @doc
 %% [32] TriplesSameSubject ::=  VarOrTerm PropertyListNotEmpty | TriplesNode PropertyList
 triples_same_subject_parser( S ) ->
     case (parser:pOr([parser:pAnd([fun var_or_term_parser/1,
                                    fun property_list_not_empty_parser/1]),
                      parser:pAnd([fun triples_node_parser/1,
                                   parser:pMaybe(fun property_list_not_empty_parser/1)])]))( S ) of
 
         {[Sub,Ps],SP} -> case Sub of
                              {blank, BId, BTs} -> {lists:flatten(
                                                      lists:map(fun({predicate, P, O, Ts}) ->
                                                                        [{triple, BId, P, O} | Ts]
                                                                end,lists:flatten(Ps)) ++ BTs),
                                                    SP} ;
                              _Other            -> {lists:flatten(
                                                      lists:map(fun({predicate, P, O, Ts}) ->
                                                                        [{triple, Sub, P, O} | Ts]
                                                                end, lists:flatten(Ps))),
                                                    SP}
                          end ;
         error         -> error ;
         Other         -> Other
     end .
 
 %% @doc
 %% [38]    	TriplesNode  	  ::=    	Collection | BlankNodePropertyList
 triples_node_parser( S ) ->
     (parser:pOr([fun blank_node_property_list_parser/1]))( S ) .
 
 
 %% @doc
 %% [39]    	BlankNodePropertyList  	  ::=    	'[' PropertyListNotEmpty ']'
 blank_node_property_list_parser( S ) ->
     case (parser:pAnd([fun parser:pSpaces/1,
                        parser:pString("["),
                        fun parser:pSpaces/1,
                        fun property_list_not_empty_parser/1,
                        fun parser:pSpaces/1,
                        parser:pString("]"),
                        fun parser:pSpaces/1]))( S ) of
         error              -> error ;
         {[_,"[",_,Ps,_,"]",_], SP} -> BId = gen_blank_id(),
                                       {{blank, BId, lists:flatten(lists:map(fun({predicate, P, O, Ts}) -> [{triple, BId, P, O} | Ts] end, Ps))}, SP}
     end .
 
 
 %% @doc
 %% Generates a new identifier for a blank node.
 gen_blank_id() ->
     list_to_binary(io_lib:format("?_~p",[trunc(random:uniform() * 100000)])) .
 
 
 %% @doc
 %% [74]    	VAR1  	  ::=    	'?' VARNAME
 %% [75]   	VAR2 	  ::=   	'$' VARNAME
 var_parser( S ) ->
     case (parser:pAnd( [parser:pOr([parser:pString("?"),
                                     parser:pString("$")]),
                         fun( Inp ) ->
                                 {Res, More} = (parser:pMany( fun parser:pAlphaNum/1 ))( Inp ),
                                 case Res of
                                     []  ->  fail ;
                                     _Other -> {_Sp, MoreP} = parser:pSpaces(More),
                                               {plaza_utils:to_binary(Res),MoreP}
                                 end
                         end]))( S ) of
         fail            -> fail ;
         {[_S,Var], SP}  -> {Var,SP}
     end .
 
 
 %% @doc
 %% [20]	GroupGraphPattern  ::= 	'{' TriplesBlock? ( ( GraphPatternNotTriples | Filter ) '.'? TriplesBlock? )* '}'
 group_graph_pattern_parser( S ) ->
     case (parser:pAnd([parser:pString("{"),
                        fun parser:pSpaces/1,
                        fun triples_block_parser/1,
                        fun parser:pSpaces/1,
                        parser:pString("}")]))( S ) of
         error              -> error ;
         {["{",_,Ts,_,"}"],SP}  -> {Ts, SP}
     end .
 
 
 %% @doc
 %% [31]	ConstructTriples  ::=  	TriplesSameSubject ( '.' ConstructTriples? )?
 triples_block_parser( S ) ->
     case (parser:pAnd([fun triples_same_subject_parser/1,
                        parser:pMaybe(parser:pAnd([fun parser:pSpaces/1,
                                                   parser:pString("."),
                                                   fun parser:pSpaces/1,
                                                   fun triples_block_parser/1]))]))( S ) of
         error                      -> error ;
         {[Ts,[[_,".",_,OTs]]], SP} -> {lists:flatten(Ts ++ OTs), SP} ;
         {[Ts, OTs],SP}             -> {lists:flatten(Ts ++ OTs), SP}
     end .
 
 
 %% @doc
 %% Parses a var or a term. This rule is NOT in the SPARQL grammar.
 var_or_term_parser( S ) ->
     case (parser:pOr( [fun var_parser/1,
                        parser:pAnd([parser:pString("<"),
                                     parser:pMany(fun parser:pAlphaNum/1 ),
                                     parser:pString(">")])
                       ] ))( S ) of
         fail             -> fail ;
         {["<", Other, ">"], Sp}  -> {list_to_binary(Other), Sp} ;
         {Var, SP}                -> {list_to_binary([<<"?">>,Var]),SP} 
     end .
 
 
 %% @doc
 %% [35]	ObjectList  ::=	Object ( ',' Object )*
 object_list_parser( S ) ->
     case (parser:pAnd( [parser:pOr([fun var_or_term_parser/1,
                                     fun triples_node_parser/1]),
                         fun( Sp ) -> case (parser:pMany(parser:pAnd([fun parser:pSpaces/1,
                                                                      parser:pString(","),
                                                                      fun parser:pSpaces/1,
                                                                      parser:pOr([fun var_or_term_parser/1,
                                                                                  fun triples_node_parser/1 ])]))) ( Sp ) of
                                          error        -> error ;
                                          {Terms, Spp} -> {lists:map(fun([_,",",_,VarOrTerm]) -> VarOrTerm end, Terms), Spp}
                                      end
                         end ] ))( S ) of
         fail             -> fail ;
         {[P,Ps],Sp}      -> {lists:map(fun(Obj) -> case Obj of
                                                        {blank,BId,Ts} -> {obj, BId, Ts } ;
                                                        _Other         -> {obj, Obj, [] }
                                                    end
                                        end, lists:flatten([P|Ps])), Sp}
     end .
 
 
 %% @doc
 %% [33]	PropertyListNotEmpty  ::=  Verb ObjectList ( ';' ( Verb ObjectList )? )*
 property_list_not_empty_parser( S ) ->
     case (parser:pAnd( [fun var_or_term_parser/1,
                         fun object_list_parser/1,
                         parser:pMany(parser:pAnd([fun parser:pSpaces/1,
                                                   parser:pString(";"),
                                                   fun parser:pSpaces/1,
                                                   parser:pMaybe(parser:pAnd([fun var_or_term_parser/1,
                                                                              fun object_list_parser/1]))]))
                        ]))( S ) of
         fail   -> fail ;
         {[P,[],Triples],Sp} ->  {lists:map(fun({obj, O, Ts}) -> {predicate, P, O, Ts} end, Triples), Sp};
         {[P,O,Ps],Sp}       ->  {lists:flatten(
                                    lists:map(fun([Pr,Triples]) ->
                                                      lists:map(fun({obj, Ob, Ts}) -> {predicate, Pr, Ob, Ts} end,
                                                                Triples) end,
                                              [[P,O]|lists:map(fun([_,";",_,[[IP,IObjs]]]) -> [IP, IObjs] end, Ps)])), Sp}
     end .
 

Whit these definitions you can parse SPARQL queries easilly as the next test shows:

 select_sparql_1_test() ->
     Res = parse_sparql_query("SELECT * WHERE { ?s ?p ?o . ?x ?U ?Z}"),
     ?assertEqual({"*",
                   [{<<"?s">>,<<"?p">>,<<"?o">>,undefined},
                    {<<"?x">>,<<"?U">>,<<"?Z">>,undefined}]},
                 Res ) .
 

Parsec clones are also available for other languages like Java, Ruby or Python.

servido por Antonio 2 comentarios compártelo

7 Septiembre 2009

egearmand: a gearman server written in erlang

I've just pushed to my github repository (http://github.com/antoniogarrote/egearmand-server/tree/master)  the version 0.0.5 of egearmand, an erlang implementation of the gearman (http://gearman.org/index.php) server.

Gearman (http://en.wikipedia.org/wiki/Gearman) is a project initiated by Danga (the creators of memcache), specifying a very simple platform for distributed computing around the gearman protocol.

Egearmand offers a distributed implementation of the protocol running on erlang nodes using mnesia. If you want to try it, just clone the project from github, compile it with rake and starts egearmand from the command line. Take a look at the README document for the details.

Egearmand tries to take advantage of erlang strengths offering a fault tolerant distributed version of gearman (it is already an OTP compliant application) and adding some interesting ideas, like the support for extensions. For instance, an extension allowing gearman clients to create, publish and consume RabbitMQ queues is included.

The current version of egearmand is 0.0.5, and it implements the full gearman protocol but it is not, by any means, production ready yet. It might break in unsuspected ways, so have fun with it but be careful :)

Tags: erlang, gearman

servido por Antonio sin comentarios compártelo

27 Agosto 2009

Installing rabbitmq-erlang-client in OS X

Installing the erlang client for rabbitmq can be a little tricky, specially if you are new to erlang development and you have tried to follow the instructions at http://hopper.squarespace.com/blog/2008/1/12/introducing-the-erlang-amqp-client.html.

Requirements: erlang OTP, python, mercurial.

This is how I set up the beast under OS X, hope it can be helpful.

1.- Use mercurial to clone last version of rabbitmq:

hg clone http://hg.rabbitmq.com/rabbitmq-codegen

hg clone http://hg.rabbitmq.com/rabbitmq-server

cd rabbitmq-server

make PYTHON=/opt/local/bin/python2.5

It should have compiled the server without any problem. You can set up python path with the PYTHON flag.

2.-Then you need to download and install the erlang client:

hg clone http://hg.rabbitmq.com/rabbitmq-erlang-client/

cd rabbitmq-erlang-client/

make

It should have compiled the client code. If you get some warnings about the ssl_record, it means you don't have a recent version of rabbitmq at ../rabbitmq-server. The change in that record can be seen at: http://hg.rabbitmq.com/rabbitmq-server/rev/9d6c6a354a395257493cbf8a84332326bfb71a59

The erlang client needs access to the rabbitmq server code. By default it will look for it at ../rabbitmq-server.

After compiling the code, I made it available to erlang applications by copying the output to the erlang's OTP home directory:

sudo cp -rf rabbitmq-server /usr/local/lib/erlang/lib/rabbitmq_server-1.6.0

sudo cp -rf rabbitmq-erlang-client /usr/local/lib/erlang/lib/rabbitmq_erlang_client-1.6.0

ln -s /usr/local/lib/erlang/lib/rabbitmq_server-1.6.0 /usr/local/lib/erlang/lib/rabbit_common

Now you are ready to start rabbitmq server and test the client. The last soft link allows to retrieve rabbitmq server record definitions from the erlang client. With that symlink you can include the client's records with -include_lib("rabbitmq_erlang_client/include/amqp_client.hrl"). The code at http://hopper.squarespace.com/blog/2008/1/12/introducing-the-erlang-amqp-client.html seems to be out of date, so you will need to start the client in a different way:

$erl -mnesia dir  -boot start_sasl -s rabbit

...OTP and rabbitmq starting...

1> amqp_connection:start_direct().
<0.140.0>

If everything is fine you should obtain a beautiful Pid as a result of the previous function invocation. Take a look at the code of the client to see how to use the client's API

Tags: erlang, rabbitmq

servido por Antonio sin comentarios compártelo


Sobre mí

Avatar de Antonio

un rato de sol

Barcelona/Salamanca, España
ver perfil »
contacto »
Trabajador del metal y del acero, en la gloriosa XING AG, escribo software con el que poder ganarme el jornal. En mi tiempo libre sigo tecleando código de bonitos colores a medio camino entre lo sublime y lo terrible. Últimamente me gustan mucho los gatos.

Fotos

Antonio Garrote Hernández todavía no ha subido ninguna foto.

¡Anímale a hacerlo!

Buscar

suscríbete

Selecciona el agregador que utilices para suscribirte a este blog (también puedes obtener la URL de los feeds):

¿Qué es esto?

Crea tu blog gratis en La Coctelera