Monadic Parser

Monadic Parser

Parser

A parser is a function that takes a string as input, then parses the string, and gives back an output.

However, the process may fail as the input string may not be a valid input for the parser. To describe the result, here we use a list [a] to represent the result, where a is the type of the output, and when parsing fails, we get an empty list.

1
newtype Parser a = Parser { parse :: String -> [(a, String)] }

Now we begin with “atoms” of parser, which consumes the smallest possible input, i.e. a char:

1
2
3
4
item :: Parser Char
item = Parser $ \s -> case s of
[] -> []
(x:xs) -> [(x, xs)]

Here we want to compose parsers, so declaring Parser as a Functor, Applicative, and even Monad is necessary:

1
2
3
4
5
6
7
8
9
10
11
12
13
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f (Parser p) = Parser $ \s -> [(f a, s') | (a, s') <- p s]

instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)] -- keep the input string unchanged
-- (<*>) :: Parser (a -> b) -> Parser a -> Parser b
pg <*> px = Parser $ \s -> [(g x, s'') | (g, s') <- parse pg s, (x, s'') <- parse px s']

instance Monad Parser where
return = pure
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
px >>= f = Parser $ \s -> concat [parse (f x) s' | (x, s') <- parse px s]

By virtue of the do notation, we can now compose parsers easily:

1
2
3
4
5
6
7
8
three = do 
x <- item
item
z <- item
return (x, z)

> parse three "abcdef"
[(('a','c'),"def")]

The remarkable thing is that the Monad instance of Parser allows what ,In contract, when a computation is lifted by Applicative, the type can’t be changed, so any middle operation cannot change the type of the computation.

Alternative

You may notice that we were just doing some linear parsing in the previous section. Now we want to make choices in parsing, so we need to define Alternative:

1
2
3
4
-- import Control.Applicative
instance Applicative Alternative where
empty :: f a
(<|>) :: f a -> f a -> f a

The class Alternative is a subclass of Applicative, and it is used to represent choice, by the operand <|>. To ensure what we implement really stands for making choices, we need to obey the following laws, which shows the essence of choice:

1
2
3
Associativity: x <|> (y <|> z) = (x <|> y) <|> z
Left Identity: empty <|> x = x
Right Identity: x <|> empty = x

Making many & some choices

The concept is a bit like while and do-while, we repeat the computation until it fails, and we get the result.

1
2
3
4
5
6
7
-- many : Zero or more.
many :: Alternative f => f a -> f [a]
many v = some v <|> pure []

-- some : One or more.
some :: Alternative f => f a -> f [a]
some v = (:) <$> v <*> many v

Their diffecence is that some must succeed at least once, while many may fail at the beginning. We use examples to clarify the difference:

1
2
3
4
5
> many Nothing
Just []

> some Nothing
Nothing

Then the Parser instance serves as a perfect example of Alternative:

1
2
3
4
5
6
instance Alternative Parser where
empty = Parser $ \s -> []
-- (<|>) :: Parser a -> Parser a -> Parser a
p <|> q = Parser $ \s -> case parse p s of
[] -> parse q s
[(x, s')] -> [(x, s')]

Now we can use <|> to make choices in parsing:

1
2
3
4
> parse (item <|> return 'd') "abc"
[('a',"bc")]
> parse (empty <|> return 'd') "abc" -- fails
[('d',"abc")]

Get down to Parsers!

We often want to make judgments on the input string, so we need to define some predicates:

1
2
3
4
-- Whether the next chat satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat p = do t <- item
if p t then return t else empty

Then we can define some parsers that consume specific chars and strings:

1
2
3
4
5
6
7
char :: Char -> Parser Char
char c = sat (== c)

string :: String -> Parser String
string (x:xs) = do char x
string xs
return (x:xs)

Using many, we also have something that consumes multiple spaces, processes natural numbers:

1
2
3
4
5
6
7
space :: Parser ()
space = do many (sat isSpace)
return ()

nat :: Parser Int
nat = do xs <- some digit
return (read xs)

When using spaces to separate different parts of the string, we can use token to consume the spaces:

1
2
3
4
5
6
7
8
9
10
11
12
token :: Parser a -> Parser a
token p = do space
v <- p
space
return v

-- Tokenized parsers
symbol :: String -> Parser String
symbol xs = token $ string xs

natural :: Parser Int
natural = token nat

Arithmetic Expressions

Now we can define a parser for arithmetic expressions. Given the properties of the operators, we can describe the wanted syntax as the following context free grammar:

1
2
3
4
expr ::= term '+' expr | term
term ::= factor '*' term | factor
factor ::= digit | '(' expr ')'
digit ::= '0' | '1' | ... | '9'

Once we have the grammar, we can easily translate it into Haskell code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
expr :: Parser Int
expr = do
t <- term
do
symbol "+"
e <- expr
return (t + e)
<|> do
symbol "-"
e <- expr
return (t - e)
<|> return t

term :: Parser Int
term = do
f <- factor
do
symbol "*"
t <- term
return (f * t)
<|> do
symbol "/"
t <- term
return (f `div` t)
<|> return f

factor :: Parser Int
factor = do
symbol "("
e <- expr
symbol ")"
return e
<|> natural

Finally it’s time to parse and evaluate the final value!

1
2
eval :: String -> Int
eval = fst . head . parse expr
作者

timetraveler314

发布于

2023-11-04

更新于

2023-11-04

许可协议