Learning Souffle
Introduction
Datalog is a programming language derived from Prolog. Compared to Prolog, Datalog has a wide use in database and program analysis, though it imposes a few restrictions on certain syntax of Prolog and thus becomes turing-incomplete. Souffle is one of the efficient implementation of Datalog and I met it for the first time when I was learning the Software Analysis Course taught by Prof. Xiong. Nevertheless, I was too naive to understand the language at that time. Recently, I take the “Program Analysis” course and I have to write about 1k lines of Souffle. This blog post is thus to record my experience on writing Souffle.
The code of this post could be tested online at here
The Chinese version of this post can be found here
Case Study: Reachable
Souffle has a very different programming model compared to most popular programming languages. The very first object in Souffle is “relation”. Programmers provide Souffle with input (facts), the relations and the rules and Souffle would resolve the rules to output the result. For a concrete example:
1 | .decl edge(n: symbol, m: symbol) |
This Souffle program would output all reachable nodes for each node in the directed graph. Let’s understand it line by line.
1 | .decl edge(n: symbol, m: symbol) |
In Soufle, .decl
is used to declare a relation and Souffle
support both symbol
and number
as the data type. Note, symbol
in fact is pretty like the string.
1 | edge("a", "b"). |
These three lines defines some known relationship. Such things are called facts
.
1 | .decl reachable (n: symbol, m: symbol) |
Another relation reachable
is defined and .ouptut
directs Souffle to output its content.
1 | reachable(x, y):- edge(x, y). |
These two lines define two rules with the relation reachable
:
- If there is an edge between x and y, then y is reachable for x, which is obvious.
- If there is an edge between x and y while z is reachable for y, then z is reachable for x.
It looks pretty intuitive, but here I would like to emphasize some hidden lexeme behind the rules:
- Any variables that show both in the right and left, have any lexeme. e.g. x and y in the first rule.
- Any variables in the right but not in the left, have exist lexeme. e.g. y in the second rule.
With these two points, the actual meaning of the second rule is that: For any x and z in reachable
, if there exists
y, such that there is an edge between x and y and z is reachable for y, then z is reachable for x.
It looks more complex, though, it can help learn how Souffle solve rules.
How Souffle Works
To understand souffle better, it’s essential to learn how souffle actually works. Note, the actual meaning of edge
and reachable
relation is defined by us and Souffle only considers them as two relations and thus Souffle won’t really build a directed graph to solve it. To learn the implementation, the definition of Least Fixed Point
should be introduced. In brief, given a set S
and a function g
with domain on sets, if:
S = g(S)
Then we say S
is a Least Fixed Point
.
Souffle in fact derives new facts repeatly from known rules and relations until every relation arrives at its Least Fixed Point
. Take the example above to illustrate this process:
- There is no rules for
edge
relation, so all facts ofedge
relation consists of the least fixed point ofedge
. - Repeatly apply the first rule of
reachable
to add all facts ofedge
toreachable
and get(a, b), (b, c), (c, d)
. - Apply the second rule of
reachable
with regard of the previous explanation, several new facts(a, c), (a, d), (b, d)
are added. At this moment,reachable
has(a, b), (b, c), (c, d), (a, c), (a, d), (b, d)
. - Try all rules in the program and no sets change. In this case, all sets arrive at the least fixed point and the program ends.
Limitation
The least fixed point algorithm makes it easy to write intuitive Souffle code in many scenarios, however it’s not the silver bullet.
There is a strong requirement on the algorithm above: The g
should be monotone, and in Souffle it means if in cyclic rules, negative rules are prohibited. In this case, each time Souffle applies a rule, the set is guarantee-ed to increase. That’s to say, the following program won’t compile:
1 | .decl p(x: number) |
In addition, to be more expressive (recall that Datalog usually is not turing-complete), like 1 to many relation, Souffle provides support for some complex data structure like the linked list:
1 | .type IntList = [next: IntList, x: number] |
However, it’s obvious that the linked list implementation is neither intuitive and easy to use compared to common programming languages.
Lastly, except for the linked list, I also found it’s hard to express aggregation in Souffle. Say we define a “cheapShop” as “all apples sold in this shop are cheaper than 25” and give facts:
1 | .decl apple(price: number, shop: number) |
To implement cheapShop
, a straightforward attempt might be:
1 | cheapShop(shop) :- apple(x, shop), x < 25. |
Unfortunately, it’s incorrect as the x means exits and thus shop 2 also meets the requirements. In this case, it’s really hard to express something like “All apples in some shop” and an indirect solution is using count
:
1 | .decl cheapCount(shop: number, cnt: number) |
However, count
cannot always work due to the witness problem.
Postscript
Souffle is a fascinating and efficient programming language and still in active development at this moment. However, in the end, I switch to python to implement my program analysis course project due to the limitation above. In short, the experience of writing Souffle is good and intuitive overall, and personally I wish it good luck.