When I had to learn Haskell, some years ago, the task was frustrating for some time. I had the impression of not being able to put all the pieces together, of missing the big picture. Of course, I had to learn the syntax of the language, lazy evaluation, currying and partial application of functions, monads, do notation, arrows, etc. All of those parts seemed to be powerful tools, nice building blocks to build something, but what I was missing was the way of assembling all these components together in order to build something useful in an elegant way.

After a few weeks of struggle I eventually had my "aha!! moment", the moment I realized how powerful and elegant Haskell functional code was. That moment for me happened using the Parsec (http://www.haskell.org/haskellwiki/Parsec) library to build a parser.

Parsec is a monadic parser combinatory library for the Haskell language. It can be used to build parsers for context-sensitive infinite look-ahead grammars in a very different way we are used to with tools like YACC or events based parsers.

With Parsec, the grammar of the language to parse is not defined in an external file using some kind of BNF notation, but in plain Haskell code. Each rule of the grammar can be defined as a parser function of a monadic data type, accepting a state and a type of tokens. The function will do the parsing consuming tokens and returning either a successful result and the not consumed input or an error. The simplest parser you can build, as stated in the Parsec documentation, is the char parser. It is already defined in the Parsec module and just consumes a character:

simple :: Parser Char
 simple  = letter
 

Others Parsers are already defined in the Parsec module, e.g. for strings. All of these parser functions are full blown parsers that can be run over some input using the parse function:

 >parse simple "" "some characters"
 Right 's'
 

'Right' in the output is signaling the success status and the tokens consumed, 's' in this case. If the parser fails, 'Left' is returned:

 >parse simple "" "123"
 Left (line 1, column 1):
 unexpected "1"
 expecting letter
 

The interesting thing about Parsec's parsers is their monadic nature. It allows us to combine simple parsers to build more powerful and expressive parsers. The Parsec library comes loaded with tons of monadic combinators for parsers: many, many1, skipMany, skipMany1, sepBy, sepBy1, endBy, endBy1, sepEndBy, sepEndBy1, count, between, option, choice, manyTill, chainl, chainl1, eof, notFollowedBy, anyToken... For instance, we can create a parser to parse any number of chars using the previously defined parser and the combinator 'many':

 complex = many simple
 >parse complex "" "abc123"
 Right "abc"
 

Since parsers are monad instances, Haskell's "do notation" can be used to write the parsers in a more convenient way. The final result is that using Parsec you can write a parser for a grammar almost by simply translating the rules to Haskell code. An example from the Parsec documentation:

 ------->EBNF:
 receipt     ::= product* total
 produkt    ::= "return" price ";"
                      | identifier price ";"
 total       ::= price "total"
 price       ::= digit+ "." digit digit
 
 
 ------>Parsec:
 receipt = do{ ps <- many produkt
 ; p <- total
 
 ; return (sum ps == p)
 }
 
 produkt = do{ symbol "return"
 ; p <- price
 ; semi
 
 ; return (-p)
 }
 <|> do{ identifier
 ; p <- price
 ; semi
 
 ; return p
 }
 "produkt"
 
 total = do{ p <- price
 ; symbol "total"
 
 ; return p
 }
 
 price = lexeme (do{ ds1 <- many1 digit
 ; char '.'
 ; ds2 <- count 2 digit
 
 ; return (convert 0 (ds1 ++ ds2))
 })
 "price"
 where
 convert n [] = n
 convert n (d:ds) = convert (10*n + digitToInt d) ds
 

The code should be pretty straight forward. One can see how productions are implemented using Parsec and Haskell's do notation, and how the semantic transformation of the production rule is incorporated after the definition of the production.

I really encourage you to take a good look at Haskell and the little wonders, like Parsec,that it contains.

Nevertheless, my current work is based in another interesting functional language, Erlang. Recently I've been needing some parsing power for my project. After spending some time taking a look at Erlang's parsing libraries like YEPP, I really started missing Parsec. For a moment I had the idea of building a Parsec-like library for Erlang. Fortunately Jeffrey Meunler has already done it: http://www.engr.uconn.edu/~jeffm/Source/Erlang/.

This version of Parsec is much smaller than the original Parsec library for Haskell, but is handy and really does its work. The most annoying part is dealing with Erlang's unconvenient functional syntax. The Erlang's version of the 'simple' and 'complex' char parsers shown before will look like this:

 simple(S) ->
     parser:pAlphaNum(S) .
 
 >simple("abc")
 {"a","bc"}
 
 complex(S) ->
 (parse:pMany( fun parser:pAlphaNum/1 ))(S) .
 
 >complex("abc")
 {"abc",[]}
 

I've used this library to build a small parser for a subset of the SPARQL query language. Here you can see the production and how they are implemented in Erlang:

 %% @doc
 %% Main public function of the parser. It accepts a SPARQL query in a string
 %% and returns a tuple with a list of variables and a list of triple patterns {S,P,O,G} as binaries or
 %% error, if some error was found during the parsing process
 -spec parse_sparql_query(list()) -> {[bitstring()],[{bitstring(), bitstring(), bitstring(), bitstring()}]} | error .
 parse_sparql_query(Input) ->
     parser:parse(sparql_parser(), Input) .
 
 
 %% @doc
 %% This function returns the SPARQL parser composing top level parsers
 %% [5] 	SelectQuery  ::=  'SELECT' ( 'DISTINCT' | 'REDUCED' )? ( Var+ | '*' ) DatasetClause* WhereClause SolutionModifier
 sparql_parser() ->
     fun( I ) ->
             case (parser:pAnd( [
                                   parser:pString("SELECT"),
                                   fun parser:pSpaces/1,
                                   parser:pOr([parser:pString("*"),
                                               parser:pMany( fun var_parser/1 )]),
                                   fun parser:pSpaces/1,
                                   parser:pMaybe(parser:pString("WHERE")),
                                   fun parser:pSpaces/1,
                                   fun group_graph_pattern_parser/1
                                 ])) ( I ) of
                 {["SELECT",_,Vars,_,_W,_,Triples],[]} -> {Vars,lists:map(fun({triple,S,P,O}) -> {S,P,O,undefined} end, Triples)} ;
                 Other                                 -> Other
             end
     end .
 
 %% @doc
 %% [32] TriplesSameSubject ::=  VarOrTerm PropertyListNotEmpty | TriplesNode PropertyList
 triples_same_subject_parser( S ) ->
     case (parser:pOr([parser:pAnd([fun var_or_term_parser/1,
                                    fun property_list_not_empty_parser/1]),
                      parser:pAnd([fun triples_node_parser/1,
                                   parser:pMaybe(fun property_list_not_empty_parser/1)])]))( S ) of
 
         {[Sub,Ps],SP} -> case Sub of
                              {blank, BId, BTs} -> {lists:flatten(
                                                      lists:map(fun({predicate, P, O, Ts}) ->
                                                                        [{triple, BId, P, O} | Ts]
                                                                end,lists:flatten(Ps)) ++ BTs),
                                                    SP} ;
                              _Other            -> {lists:flatten(
                                                      lists:map(fun({predicate, P, O, Ts}) ->
                                                                        [{triple, Sub, P, O} | Ts]
                                                                end, lists:flatten(Ps))),
                                                    SP}
                          end ;
         error         -> error ;
         Other         -> Other
     end .
 
 %% @doc
 %% [38]    	TriplesNode  	  ::=    	Collection | BlankNodePropertyList
 triples_node_parser( S ) ->
     (parser:pOr([fun blank_node_property_list_parser/1]))( S ) .
 
 
 %% @doc
 %% [39]    	BlankNodePropertyList  	  ::=    	'[' PropertyListNotEmpty ']'
 blank_node_property_list_parser( S ) ->
     case (parser:pAnd([fun parser:pSpaces/1,
                        parser:pString("["),
                        fun parser:pSpaces/1,
                        fun property_list_not_empty_parser/1,
                        fun parser:pSpaces/1,
                        parser:pString("]"),
                        fun parser:pSpaces/1]))( S ) of
         error              -> error ;
         {[_,"[",_,Ps,_,"]",_], SP} -> BId = gen_blank_id(),
                                       {{blank, BId, lists:flatten(lists:map(fun({predicate, P, O, Ts}) -> [{triple, BId, P, O} | Ts] end, Ps))}, SP}
     end .
 
 
 %% @doc
 %% Generates a new identifier for a blank node.
 gen_blank_id() ->
     list_to_binary(io_lib:format("?_~p",[trunc(random:uniform() * 100000)])) .
 
 
 %% @doc
 %% [74]    	VAR1  	  ::=    	'?' VARNAME
 %% [75]   	VAR2 	  ::=   	'$' VARNAME
 var_parser( S ) ->
     case (parser:pAnd( [parser:pOr([parser:pString("?"),
                                     parser:pString("$")]),
                         fun( Inp ) ->
                                 {Res, More} = (parser:pMany( fun parser:pAlphaNum/1 ))( Inp ),
                                 case Res of
                                     []  ->  fail ;
                                     _Other -> {_Sp, MoreP} = parser:pSpaces(More),
                                               {plaza_utils:to_binary(Res),MoreP}
                                 end
                         end]))( S ) of
         fail            -> fail ;
         {[_S,Var], SP}  -> {Var,SP}
     end .
 
 
 %% @doc
 %% [20]	GroupGraphPattern  ::= 	'{' TriplesBlock? ( ( GraphPatternNotTriples | Filter ) '.'? TriplesBlock? )* '}'
 group_graph_pattern_parser( S ) ->
     case (parser:pAnd([parser:pString("{"),
                        fun parser:pSpaces/1,
                        fun triples_block_parser/1,
                        fun parser:pSpaces/1,
                        parser:pString("}")]))( S ) of
         error              -> error ;
         {["{",_,Ts,_,"}"],SP}  -> {Ts, SP}
     end .
 
 
 %% @doc
 %% [31]	ConstructTriples  ::=  	TriplesSameSubject ( '.' ConstructTriples? )?
 triples_block_parser( S ) ->
     case (parser:pAnd([fun triples_same_subject_parser/1,
                        parser:pMaybe(parser:pAnd([fun parser:pSpaces/1,
                                                   parser:pString("."),
                                                   fun parser:pSpaces/1,
                                                   fun triples_block_parser/1]))]))( S ) of
         error                      -> error ;
         {[Ts,[[_,".",_,OTs]]], SP} -> {lists:flatten(Ts ++ OTs), SP} ;
         {[Ts, OTs],SP}             -> {lists:flatten(Ts ++ OTs), SP}
     end .
 
 
 %% @doc
 %% Parses a var or a term. This rule is NOT in the SPARQL grammar.
 var_or_term_parser( S ) ->
     case (parser:pOr( [fun var_parser/1,
                        parser:pAnd([parser:pString("<"),
                                     parser:pMany(fun parser:pAlphaNum/1 ),
                                     parser:pString(">")])
                       ] ))( S ) of
         fail             -> fail ;
         {["<", Other, ">"], Sp}  -> {list_to_binary(Other), Sp} ;
         {Var, SP}                -> {list_to_binary([<<"?">>,Var]),SP} 
     end .
 
 
 %% @doc
 %% [35]	ObjectList  ::=	Object ( ',' Object )*
 object_list_parser( S ) ->
     case (parser:pAnd( [parser:pOr([fun var_or_term_parser/1,
                                     fun triples_node_parser/1]),
                         fun( Sp ) -> case (parser:pMany(parser:pAnd([fun parser:pSpaces/1,
                                                                      parser:pString(","),
                                                                      fun parser:pSpaces/1,
                                                                      parser:pOr([fun var_or_term_parser/1,
                                                                                  fun triples_node_parser/1 ])]))) ( Sp ) of
                                          error        -> error ;
                                          {Terms, Spp} -> {lists:map(fun([_,",",_,VarOrTerm]) -> VarOrTerm end, Terms), Spp}
                                      end
                         end ] ))( S ) of
         fail             -> fail ;
         {[P,Ps],Sp}      -> {lists:map(fun(Obj) -> case Obj of
                                                        {blank,BId,Ts} -> {obj, BId, Ts } ;
                                                        _Other         -> {obj, Obj, [] }
                                                    end
                                        end, lists:flatten([P|Ps])), Sp}
     end .
 
 
 %% @doc
 %% [33]	PropertyListNotEmpty  ::=  Verb ObjectList ( ';' ( Verb ObjectList )? )*
 property_list_not_empty_parser( S ) ->
     case (parser:pAnd( [fun var_or_term_parser/1,
                         fun object_list_parser/1,
                         parser:pMany(parser:pAnd([fun parser:pSpaces/1,
                                                   parser:pString(";"),
                                                   fun parser:pSpaces/1,
                                                   parser:pMaybe(parser:pAnd([fun var_or_term_parser/1,
                                                                              fun object_list_parser/1]))]))
                        ]))( S ) of
         fail   -> fail ;
         {[P,[],Triples],Sp} ->  {lists:map(fun({obj, O, Ts}) -> {predicate, P, O, Ts} end, Triples), Sp};
         {[P,O,Ps],Sp}       ->  {lists:flatten(
                                    lists:map(fun([Pr,Triples]) ->
                                                      lists:map(fun({obj, Ob, Ts}) -> {predicate, Pr, Ob, Ts} end,
                                                                Triples) end,
                                              [[P,O]|lists:map(fun([_,";",_,[[IP,IObjs]]]) -> [IP, IObjs] end, Ps)])), Sp}
     end .
 

Whit these definitions you can parse SPARQL queries easilly as the next test shows:

 select_sparql_1_test() ->
     Res = parse_sparql_query("SELECT * WHERE { ?s ?p ?o . ?x ?U ?Z}"),
     ?assertEqual({"*",
                   [{<<"?s">>,<<"?p">>,<<"?o">>,undefined},
                    {<<"?x">>,<<"?U">>,<<"?Z">>,undefined}]},
                 Res ) .
 

Parsec clones are also available for other languages like Java, Ruby or Python.