# Learning Souffle

## Introduction

Datalog is a programming language derived from Prolog. Compared to Prolog, Datalog has a wide use in database and program analysis, though it imposes a few restrictions on certain syntax of Prolog and thus becomes turing-incomplete. Souffle is one of the efficient implementation of Datalog and I met it for the first time when I was learning the Software Analysis Course taught by Prof. Xiong. Nevertheless, I was too naive to understand the language at that time. Recently, I take the “Program Analysis” course and I have to write about 1k lines of Souffle. This blog post is thus to record my experience on writing Souffle.

The code of this post could be tested online at here

The Chinese version of this post can be found here

## Case Study: Reachable

Souffle has a very different programming model compared to most popular programming languages. The very first object in Souffle is “relation”. Programmers provide Souffle with input (facts), the relations and the rules and Souffle would resolve the rules to output the result. For a concrete example:

1 | .decl edge(n: symbol, m: symbol) |

This Souffle program would output all reachable nodes for each node in the directed graph. Let’s understand it line by line.

1 | .decl edge(n: symbol, m: symbol) |

In Soufle, `.decl`

is used to declare a relation and `Souffle`

support both `symbol`

and `number`

as the data type. Note, `symbol`

in fact is pretty like the string.

1 | edge("a", "b"). |

These three lines defines some known relationship. Such things are called `facts`

.

1 | .decl reachable (n: symbol, m: symbol) |

Another relation `reachable`

is defined and `.ouptut`

directs Souffle to output its content.

1 | reachable(x, y):- edge(x, y). |

These two lines define two rules with the relation `reachable`

:

- If there is an edge between x and y, then y is reachable for x, which is obvious.
- If there is an edge between x and y while z is reachable for y, then z is reachable for x.

It looks pretty intuitive, but here I would like to emphasize some hidden lexeme behind the rules:

- Any variables that show both in the right and left, have
*any*lexeme. e.g. x and y in the first rule. - Any variables in the right but not in the left, have
*exist*lexeme. e.g. y in the second rule.

With these two points, the actual meaning of the second rule is that: For *any* x and z in `reachable`

, if there `exists`

y, such that there is an edge between x and y and z is reachable for y, then z is reachable for x.

It looks more complex, though, it can help learn how Souffle solve rules.

## How Souffle Works

To understand souffle better, it’s essential to learn how souffle actually works. Note, the actual meaning of `edge`

and `reachable`

relation is defined by us and Souffle only considers them as two relations and thus Souffle won’t really build a directed graph to solve it. To learn the implementation, the definition of `Least Fixed Point`

should be introduced. In brief, given a set `S`

and a function `g`

with domain on sets, if:

S = g(S)

Then we say `S`

is a `Least Fixed Point`

.

Souffle in fact derives new facts repeatly from known rules and relations until every relation arrives at its `Least Fixed Point`

. Take the example above to illustrate this process:

- There is no rules for
`edge`

relation, so all facts of`edge`

relation consists of the least fixed point of`edge`

. - Repeatly apply the first rule of
`reachable`

to add all facts of`edge`

to`reachable`

and get`(a, b), (b, c), (c, d)`

. - Apply the second rule of
`reachable`

with regard of the previous explanation, several new facts`(a, c), (a, d), (b, d)`

are added. At this moment,`reachable`

has`(a, b), (b, c), (c, d), (a, c), (a, d), (b, d)`

. - Try all rules in the program and no sets change. In this case, all sets arrive at the least fixed point and the program ends.

## Limitation

The least fixed point algorithm makes it easy to write intuitive Souffle code in many scenarios, however it’s not the silver bullet.

There is a strong requirement on the algorithm above: The `g`

should be monotone, and in Souffle it means if in cyclic rules, negative rules are prohibited. In this case, each time Souffle applies a rule, the set is guarantee-ed to increase. That’s to say, the following program won’t compile:

1 | .decl p(x: number) |

In addition, to be more expressive (recall that Datalog usually is not turing-complete), like 1 to many relation, Souffle provides support for some complex data structure like the linked list:

1 | .type IntList = [next: IntList, x: number] |

However, it’s obvious that the linked list implementation is neither intuitive and easy to use compared to common programming languages.

Lastly, except for the linked list, I also found it’s hard to express aggregation in Souffle. Say we define a “cheapShop” as “all apples sold in this shop are cheaper than 25” and give facts:

1 | .decl apple(price: number, shop: number) |

To implement `cheapShop`

, a straightforward attempt might be:

1 | cheapShop(shop) :- apple(x, shop), x < 25. |

Unfortunately, it’s incorrect as the x means *exits* and thus shop 2 also meets the requirements. In this case, it’s really hard to express something like “*All* apples in *some* shop” and an indirect solution is using `count`

:

1 | .decl cheapCount(shop: number, cnt: number) |

However, `count`

cannot always work due to the witness problem.

## Postscript

Souffle is a fascinating and efficient programming language and still in active development at this moment. However, in the end, I switch to python to implement my program analysis course project due to the limitation above. In short, the experience of writing Souffle is good and intuitive overall, and personally I wish it good luck.