Monadic Parser

Monadic Parser

Parser

A parser is a function that takes a string as input, then parses the string, and gives back an output.

However, the process may fail as the input string may not be a valid input for the parser. To describe the result, here we use a list [a] to represent the result, where a is the type of the output, and when parsing fails, we get an empty list.

1
newtype Parser a = Parser { parse :: String -> [(a, String)] }

Now we begin with “atoms” of parser, which consumes the smallest possible input, i.e. a char:

1
2
3
4
item :: Parser Char
item = Parser $ \s -> case s of
[] -> []
(x:xs) -> [(x, xs)]

Here we want to compose parsers, so declaring Parser as a Functor, Applicative, and even Monad is necessary:

1
2
3
4
5
6
7
8
9
10
11
12
13
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f (Parser p) = Parser $ \s -> [(f a, s') | (a, s') <- p s]

instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)] -- keep the input string unchanged
-- (<*>) :: Parser (a -> b) -> Parser a -> Parser b
pg <*> px = Parser $ \s -> [(g x, s'') | (g, s') <- parse pg s, (x, s'') <- parse px s']

instance Monad Parser where
return = pure
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
px >>= f = Parser $ \s -> concat [parse (f x) s' | (x, s') <- parse px s]

By virtue of the do notation, we can now compose parsers easily:

1
2
3
4
5
6
7
8
three = do 
x <- item
item
z <- item
return (x, z)

> parse three "abcdef"
[(('a','c'),"def")]

The remarkable thing is that the Monad instance of Parser allows what ,In contract, when a computation is lifted by Applicative, the type can’t be changed, so any middle operation cannot change the type of the computation.

Alternative

You may notice that we were just doing some linear parsing in the previous section. Now we want to make choices in parsing, so we need to define Alternative:

1
2
3
4
-- import Control.Applicative
instance Applicative Alternative where
empty :: f a
(<|>) :: f a -> f a -> f a

The class Alternative is a subclass of Applicative, and it is used to represent choice, by the operand <|>. To ensure what we implement really stands for making choices, we need to obey the following laws, which shows the essence of choice:

1
2
3
Associativity: x <|> (y <|> z) = (x <|> y) <|> z
Left Identity: empty <|> x = x
Right Identity: x <|> empty = x

Making many & some choices

The concept is a bit like while and do-while, we repeat the computation until it fails, and we get the result.

1
2
3
4
5
6
7
-- many : Zero or more.
many :: Alternative f => f a -> f [a]
many v = some v <|> pure []

-- some : One or more.
some :: Alternative f => f a -> f [a]
some v = (:) <$> v <*> many v

Their diffecence is that some must succeed at least once, while many may fail at the beginning. We use examples to clarify the difference:

1
2
3
4
5
> many Nothing
Just []

> some Nothing
Nothing

Then the Parser instance serves as a perfect example of Alternative:

1
2
3
4
5
6
instance Alternative Parser where
empty = Parser $ \s -> []
-- (<|>) :: Parser a -> Parser a -> Parser a
p <|> q = Parser $ \s -> case parse p s of
[] -> parse q s
[(x, s')] -> [(x, s')]

Now we can use <|> to make choices in parsing:

1
2
3
4
> parse (item <|> return 'd') "abc"
[('a',"bc")]
> parse (empty <|> return 'd') "abc" -- fails
[('d',"abc")]

Get down to Parsers!

We often want to make judgments on the input string, so we need to define some predicates:

1
2
3
4
-- Whether the next chat satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat p = do t <- item
if p t then return t else empty

Then we can define some parsers that consume specific chars and strings:

1
2
3
4
5
6
7
char :: Char -> Parser Char
char c = sat (== c)

string :: String -> Parser String
string (x:xs) = do char x
string xs
return (x:xs)

Using many, we also have something that consumes multiple spaces, processes natural numbers:

1
2
3
4
5
6
7
space :: Parser ()
space = do many (sat isSpace)
return ()

nat :: Parser Int
nat = do xs <- some digit
return (read xs)

When using spaces to separate different parts of the string, we can use token to consume the spaces:

1
2
3
4
5
6
7
8
9
10
11
12
token :: Parser a -> Parser a
token p = do space
v <- p
space
return v

-- Tokenized parsers
symbol :: String -> Parser String
symbol xs = token $ string xs

natural :: Parser Int
natural = token nat

Arithmetic Expressions

Now we can define a parser for arithmetic expressions. Given the properties of the operators, we can describe the wanted syntax as the following context free grammar:

1
2
3
4
expr ::= term '+' expr | term
term ::= factor '*' term | factor
factor ::= digit | '(' expr ')'
digit ::= '0' | '1' | ... | '9'

Once we have the grammar, we can easily translate it into Haskell code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
expr :: Parser Int
expr = do
t <- term
do
symbol "+"
e <- expr
return (t + e)
<|> do
symbol "-"
e <- expr
return (t - e)
<|> return t

term :: Parser Int
term = do
f <- factor
do
symbol "*"
t <- term
return (f * t)
<|> do
symbol "/"
t <- term
return (f `div` t)
<|> return f

factor :: Parser Int
factor = do
symbol "("
e <- expr
symbol ")"
return e
<|> natural

Finally it’s time to parse and evaluate the final value!

1
2
eval :: String -> Int
eval = fst . head . parse expr

Foldable and ...

Foldable and Friends

Foldable

The type class Foldable generally relies on Monoid, and it describes data structures that can be folded to a summary value, so the mappend(<>) defined on Monoid is needed to show that the containing data in Foldable can be combined:

1
2
3
4
5
class Foldable t where
fold :: Monoid a => t a -> a
foldMap :: Monoid b => (a -> b) -> t a -> b
foldr :: (a -> b -> b) -> b -> t a -> b
foldl :: (b -> a -> b) -> b -> t a -> b

Here is a foldMap example on lists:

1
2
3
foldMapList :: Monoid m => (a -> m) -> [a] -> m
foldMapList f [] = mempty
foldMapList f (x:xs) = f x <> foldMapList f xs

The minimal definition is foldMap or foldr, and using one of these two we can define them all:

1
2
3
fold = foldMap id
foldMap f = foldr (mappend . f) mempty
toList = foldMap (\x -> [x]) -- Here, we use the monoid of lists, which is concatenation

However, the folding directions are not absolute. For example, we can define foldl in terms of foldr (even though it is not efficient enough, it still shows some type of homomorphism):

1
2
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
foldl f z ys = foldr (\x g -> g . f x) id ys z

To understand the entire process, we first notice that if we want to reverse the computation order, we may leave the computation “undone”, and gradually fold other elements into the process.

So in this implementation of foldl, the type b is used as the “undone” computation, i.e. function a -> a (like (+) 1), and the initial value is the identity map. After iterations, we shall get:

1
2
3
4
5
id -- \x -> x
\x -> f x y0
\x -> f (f x y0) y1
...
\x -> f (... f (f (f x y0) y1) ... yN)

When the folding process ends, we apply the “undone” computation to the initial value z, and we get the final result, in a reversed order.


In conclusion, “Folding” is the abstraction of sequential operations (or sequential recursion) on a data structure. Example:

1
map f = foldr ((:) . f) [] -- or foldMap (pure . f)

Traversable

Now let’s take a look at map.

1
2
3
map :: (a -> b) -> [a] -> [b]
map g [] = []
map g (x:xs) = g x : map g xs

We get the motivation of Traversable from the need of generalizing map to deal with effects brought by various functors.

1
2
3
4
5
6
class (Functor t, Foldable t) => Traversable t where
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)

instance Traversable [] where
traverse g [] = pure []
traverse g (x:xs) = pure (:) <*> g x <*> traverse g xs

In the default definition of Traversable, another function sequenceA is introduced. It folds a series of effects into a single effect:

1
2
3
4
5
6
7
8
9
sequenceA :: Applicative f => t (f a) -> f (t a)
sequenceA = traverse id

--- Examples:
> sequenceA [Just 1, Just 2, Just 3]
Just [1, 2, 3]

> sequenceA [Just 1, Nothing, Just 3]
Nothing

We can also define traverse in turn:

1
traverse g = sequenceA . fmap g