Learning Souffle

Introduction

Datalog is a programming language derived from Prolog. Compared to Prolog, Datalog has a wide use in database and program analysis, though it imposes a few restrictions on certain syntax of Prolog and thus becomes turing-incomplete. Souffle is one of the efficient implementation of Datalog and I met it for the first time when I was learning the Software Analysis Course taught by Prof. Xiong. Nevertheless, I was too naive to understand the language at that time. Recently, I take the “Program Analysis” course and I have to write about 1k lines of Souffle. This blog post is thus to record my experience on writing Souffle.

The code of this post could be tested online at here

The Chinese version of this post can be found here

Case Study: Reachable

Souffle has a very different programming model compared to most popular programming languages. The very first object in Souffle is “relation”. Programmers provide Souffle with input (facts), the relations and the rules and Souffle would resolve the rules to output the result. For a concrete example:

1
2
3
4
5
6
7
8
.decl edge(n: symbol, m: symbol)
edge("a", "b").
edge("b", "c").
edge("c", "d").
.decl reachable (n: symbol, m: symbol)
.output reachable
reachable(x, y):- edge(x, y).
reachable(x, z):- edge(x, y), reachable(y, z).

This Souffle program would output all reachable nodes for each node in the directed graph. Let’s understand it line by line.

1
.decl edge(n: symbol, m: symbol)

In Soufle, .decl is used to declare a relation and Souffle support both symbol and number as the data type. Note, symbol in fact is pretty like the string.

1
2
3
edge("a", "b").
edge("b", "c").
edge("c", "d").

These three lines defines some known relationship. Such things are called facts.

1
2
.decl reachable (n: symbol, m: symbol)
.output reachable

Another relation reachable is defined and .ouptut directs Souffle to output its content.

1
2
reachable(x, y):- edge(x, y).
reachable(x, z):- edge(x, y), reachable(y, z).

These two lines define two rules with the relation reachable:

  1. If there is an edge between x and y, then y is reachable for x, which is obvious.
  2. If there is an edge between x and y while z is reachable for y, then z is reachable for x.

It looks pretty intuitive, but here I would like to emphasize some hidden lexeme behind the rules:

  1. Any variables that show both in the right and left, have any lexeme. e.g. x and y in the first rule.
  2. Any variables in the right but not in the left, have exist lexeme. e.g. y in the second rule.

With these two points, the actual meaning of the second rule is that: For any x and z in reachable, if there exists y, such that there is an edge between x and y and z is reachable for y, then z is reachable for x.

It looks more complex, though, it can help learn how Souffle solve rules.

How Souffle Works

To understand souffle better, it’s essential to learn how souffle actually works. Note, the actual meaning of edge and reachable relation is defined by us and Souffle only considers them as two relations and thus Souffle won’t really build a directed graph to solve it. To learn the implementation, the definition of Least Fixed Point should be introduced. In brief, given a set S and a function g with domain on sets, if:

S = g(S)

Then we say S is a Least Fixed Point.

Souffle in fact derives new facts repeatly from known rules and relations until every relation arrives at its Least Fixed Point. Take the example above to illustrate this process:

  1. There is no rules for edge relation, so all facts of edge relation consists of the least fixed point of edge.
  2. Repeatly apply the first rule of reachable to add all facts of edge to reachable and get (a, b), (b, c), (c, d).
  3. Apply the second rule of reachable with regard of the previous explanation, several new facts (a, c), (a, d), (b, d) are added. At this moment, reachable has (a, b), (b, c), (c, d), (a, c), (a, d), (b, d).
  4. Try all rules in the program and no sets change. In this case, all sets arrive at the least fixed point and the program ends.

Limitation

The least fixed point algorithm makes it easy to write intuitive Souffle code in many scenarios, however it’s not the silver bullet.

There is a strong requirement on the algorithm above: The g should be monotone, and in Souffle it means if in cyclic rules, negative rules are prohibited. In this case, each time Souffle applies a rule, the set is guarantee-ed to increase. That’s to say, the following program won’t compile:

1
2
3
4
5
.decl p(x: number)
.decl q(x: number)

p(x) :- !q(x).
q(x) :- !p(x).

In addition, to be more expressive (recall that Datalog usually is not turing-complete), like 1 to many relation, Souffle provides support for some complex data structure like the linked list:

1
2
3
4
5
6
7
8
.type IntList = [next: IntList, x: number]
.decl L(l: IntList)
L([nil,10]).
L([r1,x+10]) :- L(r1), r1=[r2,x], x < 30.
.decl Flatten(x: number)
Flatten(x) :- L([_,x]).
.output Flatten
.output L

However, it’s obvious that the linked list implementation is neither intuitive and easy to use compared to common programming languages.

Lastly, except for the linked list, I also found it’s hard to express aggregation in Souffle. Say we define a “cheapShop” as “all apples sold in this shop are cheaper than 25” and give facts:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
.decl apple(price: number, shop: number)

apple(10, 1).
apple(23, 2).
apple(32, 2).

.decl cheapApple(price: number, shop: number)

cheapApple(price, number) :-
apple(price, number), price <= 25.

// A cheap shop only sells apples cheaper than 25
// and thus only shop 1 meets this requirement.
.decl cheapShop(shop: number)

To implement cheapShop, a straightforward attempt might be:

1
cheapShop(shop) :- apple(x, shop), x < 25.

Unfortunately, it’s incorrect as the x means exits and thus shop 2 also meets the requirements. In this case, it’s really hard to express something like “All apples in some shop” and an indirect solution is using count:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
.decl cheapCount(shop: number, cnt: number)

cheapCount(shop, cnt) :-
apple(price, shop),
count : { cheapApple(_, shop) } = cnt.

.decl appleCount(shop: number, cnt: number)

appleCount(shop, cnt) :-
apple(price, shop),
count : {apple(_, shop)} = cnt.

cheapShop(shop) :-
appleCount(shop, cnt),
cheapCount(shop, cnt2),
cnt = cnt2.

.output cheapShop

However, count cannot always work due to the witness problem.

Postscript

Souffle is a fascinating and efficient programming language and still in active development at this moment. However, in the end, I switch to python to implement my program analysis course project due to the limitation above. In short, the experience of writing Souffle is good and intuitive overall, and personally I wish it good luck.

Reference