Template-Based Verification of Heap-Manipulating Programs

We propose a shape analysis suitable for analysis engines that perform automatic invariant inference using an SMT solver. The proposed solution includes an abstract template domain that encodes the shape of the program heap based on logical formulae over bit-vectors. It is based on computing a points-to relation between pointers and symbolic addresses of abstract memory objects. Our abstract heap domain can be combined with value domains in a straightforward manner, which in particular allows us to reason about the shapes and contents of heap structures at the same time. The information obtained from the analysis can be used to prove memory safety and reachability properties, expressed by user assertions, of programs manipulating dynamic data structures, mainly linked lists. The solution has been implemented in the 2LS framework and compared against state-of-the-art tools that perform best in the heap-related categories of the well-known Software Verification Competition (SV-COMP). Results show that 2LS outperforms these tools on benchmarks requiring combined reasoning about unbounded data structures and their numerical contents.


I. INTRODUCTION
Reasoning about dynamic data structures is one of the core problems in software verification. As our experiments revealed, the techniques implemented in state-of-the-art verification tools for C programs, such as those competing in the Software Verification Competition (SV-COMP), have shortcomings when it comes to combined reasoning about the shape and content of data structures. We address this problem in the context of template-based program verification.
Template-based verification uses a logic-based synthesis approach to infer the invariants required for proving program properties. It delegates semantic reasoning to SMT solvers and focuses on the design of appropriate template domains and efficient algorithms for finding the optimal template parameters (i.e. least fixed points in the abstract interpretation sense [14]). The use of such templates makes it straightforward to compute invariants describing both shape and value properties of data structures, which is more difficult when combining domains that are based on different principles.
Running example: To better illustrate the concepts and methods proposed in the paper, we use the program in Listing 1 as a running example. It creates a singly-linked list, each node containing a value between 10 and 20 (Lines 7-15). The list is afterwards traversed repeatedly, and the value of each node is either incremented by 1 or halved (Lines 16-22). We add an assertion that, in every iteration, the value of each node stays between 10 and 20. The goal of the analysis is to prove that the assertion always holds. This requires an analysis capable of reasoning about unbounded linked data structures and the numerical content of their nodes at the same time. To prove this property, we have to infer that the val field of the dynamic objects allocated on Lines 7 and 13 always holds a value in the range [10,20].
With the help of our technique, we infer an invariant for the loop on Line 10 that states the following:
• tail may point to the sets of Node objects created on Lines 7 and 13. We denote these sets $ao_7$ and $ao_{13}$, respectively.
• The next field of $ao_7$ may point to $ao_{13}$ or null. Its val field has a value in the interval [10,10].
• The next field of $ao_{13}$ may point to $ao_{13}$ or null. However, its val field has a value in the interval [10,20].
This means that $ao_{13}$ abstracts a set of Node objects whose val fields have values in the interval [10,20]. For the loop on Line 18, we infer the invariant that the val fields of $ao_7$ and $ao_{13}$ must both lie in the interval [10,20], which implies that the property holds.
Contributions: The contributions of this paper, which form the contents of Sections III-VII, are as follows:
1) We propose a novel abstract template domain for reasoning over heap-allocated data structures, such as singly and doubly linked lists, using a template-based parameter synthesis engine.
2) We show how to build product and power domain combinations of our heap domain with structural domains (e.g. trace partitioning) and with value domains, such as template polyhedra, that capture the content of data structures.
3) We implement our abstract heap domain in the 2LS verification tool for C programs. We demonstrate the power of the proposed domain on benchmarks that require combined reasoning about the shape and content of data structures, showing that other tools which performed well in SV-COMP cannot handle these examples.
II. TEMPLATE-BASED PROGRAM VERIFICATION
This section describes the approach to program verification using template-based synthesis of inductive invariants on which the 2LS tool [35] is based and which also underlies our approach. The source program is first translated into static single assignment (SSA) form. Using this program representation, the verification task can then be expressed as a second-order logical formula. However, since suitable solvers for such formulae are not available, the verification problem is reduced to synthesising loop invariants using parametrised templates, with an SMT solver finding suitable values of the parameters.

A. Program Verification Using Inductive Invariants
A state of a program is a logical interpretation of the logical variables corresponding to the program variables. A set of states can be described using a formula: the states in the set are the models of the formula. Given a vector of variables $\vec{x}$, the predicate $Init(\vec{x})$ describes the initial states. A transition relation is described as a formula $Trans(\vec{x}, \vec{x}')$.
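From these, it is possible to determine the set of reachable states as the least fixed point of the transition relation starting from the states described by $Init(\vec{x})$. This is, however, difficult to compute, so instead we use an inductive invariant: a set of states that contains all initial states and is closed under the transition relation. A verification task requires showing that the set of reachable states does not intersect with the set of error states $Err(\vec{x})$. Using the concept of inductive invariants and existential second-order quantification ($\exists_2$), we can formalise it as:

$$\exists_2\, Inv.\ \forall \vec{x}, \vec{x}'.\ \big(Init(\vec{x}) \Rightarrow Inv(\vec{x})\big) \wedge \big(Inv(\vec{x}) \wedge Trans(\vec{x}, \vec{x}') \Rightarrow Inv(\vec{x}')\big) \wedge \big(Inv(\vec{x}) \Rightarrow \neg Err(\vec{x})\big) \tag{1}$$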
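The three conditions of this formalisation (initiation, consecution and safety) can be checked by brute force on a toy system, as in the following Python sketch for the loop `while (x < 10) x = x + 1;` started from x = 0, with error states x > 12. The finite state space and the helper names (`init`, `trans`, `err`, `is_safe_inductive_invariant`) are illustrative assumptions; the approach described here discharges these conditions with an SMT solver rather than by enumeration.

```python
# Toy, explicit-state reading of the three conditions of Eq. (1).
# The finite state space is an assumption made only so that the universally
# quantified conditions can be checked by enumeration; 2LS uses an SMT solver.

STATES = range(0, 16)


def init(x: int) -> bool:
    return x == 0


def trans(x: int, y: int) -> bool:
    # one iteration of: while (x < 10) x = x + 1;
    return x < 10 and y == x + 1


def err(x: int) -> bool:
    return x > 12


def is_safe_inductive_invariant(inv) -> bool:
    """Initiation, consecution and safety of a candidate invariant."""
    initiation = all(inv(x) for x in STATES if init(x))
    consecution = all(inv(y) for x in STATES for y in STATES
                      if inv(x) and trans(x, y))
    safety = all(not err(x) for x in STATES if inv(x))
    return initiation and consecution and safety


print(is_safe_inductive_invariant(lambda x: x <= 10))  # True
print(is_safe_inductive_invariant(lambda x: x <= 5))   # False: not inductive
print(is_safe_inductive_invariant(lambda x: True))     # False: contains error states
```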

B. Invariant Inference via Templates
Handling Eq. (1) directly would require a solver capable of second-order quantification. Since a suitably general and efficient second-order solver is not currently available, the problem is reduced to one that can be solved by an iterative application of a first-order solver. This reduction is done by restricting the form of the inductive invariant $Inv$ to $T(\vec{x}, \delta)$, where $T$ is a fixed expression (a so-called template) over the program variables $\vec{x}$ and the template parameters $\delta$. This restriction corresponds to the choice of an abstract domain in abstract interpretation: a template only captures the properties of the program state space that are relevant for the analysis. The second-order search for an invariant thus becomes a first-order search for the template parameters:

$$\exists \delta.\ \forall \vec{x}, \vec{x}'.\ \big(Init(\vec{x}) \Rightarrow T(\vec{x}, \delta)\big) \wedge \big(T(\vec{x}, \delta) \wedge Trans(\vec{x}, \vec{x}') \Rightarrow T(\vec{x}', \delta)\big) \wedge \big(T(\vec{x}, \delta) \Rightarrow \neg Err(\vec{x})\big) \tag{2}$$

Although the problem is now expressible in first-order logic, the formula contains a quantifier alternation, which poses a problem for current SMT solvers. This is solved by iteratively checking the negated formula (to turn $\forall$ into $\exists$) for different choices of constants $d$ as candidates for the template parameters $\delta$:

$$\exists \vec{x}, \vec{x}'.\ \big(Init(\vec{x}) \wedge \neg T(\vec{x}, d)\big) \vee \big(T(\vec{x}, d) \wedge Trans(\vec{x}, \vec{x}') \wedge \neg T(\vec{x}', d)\big) \tag{3}$$

For a value $d$, the template formula $T(\vec{x}, d)$ is an inductive invariant if and only if Eq. (3) is unsatisfiable.
From the abstract interpretation point of view, $d$ is an abstract value, i.e. it represents (concretises to) the set of all program states $\vec{x}$ that satisfy the formula $T(\vec{x}, d)$. The abstract values representing the infimum $\bot$ and the supremum $\top$ of the abstract domain denote the empty set and the whole state space, respectively: $T(\vec{x}, \bot) \equiv \mathit{false}$ and $T(\vec{x}, \top) \equiv \mathit{true}$ [8].
Formally, the concretisation function $\gamma$ is:

$$\gamma(d) = \{\vec{x} \mid T(\vec{x}, d) \equiv \mathit{true}\}$$

For the abstraction function, to get the most precise abstract value representing a given concrete program state $\vec{x}$, we let $\alpha(\vec{x}) = \min\{d \mid T(\vec{x}, d) \equiv \mathit{true}\}$. Since the abstract domain forms a complete lattice, the existence of such a minimal value $d$ is guaranteed.
The algorithm for invariant inference starts with $d = \bot$ and iteratively solves Eq. (3) using an SMT solver. If the formula is unsatisfiable, then an invariant has been found; otherwise, the solver returns a model. The model represents a counterexample to the current instantiation of the template being an invariant. The value of the template parameter $d$ is then updated by joining it with the abstract value $d'$ representing the model, using a domain-specific join operator [8]. For example, assume we have a program with a loop that counts from 0 to 10 in variable $x$, and a template $x \leq d$. Assume that the current value of the parameter $d$ is 3 and we get a new model with $x = 4$, i.e. $d' = 4$. Then we update the parameter to 4 by computing $d \sqcup d' = \max(d, d')$, because $\max$ is the join operator for a domain that tracks numerical upper bounds.
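The iteration can be sketched in Python for the counting example, under the assumption that the SMT query of Eq. (3) is replaced by a brute-force search over a small finite range of states. The names `template`, `join`, `find_counterexample` and `infer` are hypothetical. For the template $x \leq d$, the abstraction of a counterexample state is the state value itself, so the join reduces to max.

```python
# Hypothetical sketch: template-based invariant inference for the loop
#   x = 0; while (x < 10) x++;
# with the upper-bound template T(x, d) := x <= d.
# The SMT query of Eq. (3) is replaced by a brute-force search over a
# small finite state space, which is only feasible for this toy example.

BOTTOM = None  # abstract value ⊥: T(x, ⊥) is false for every x


def template(x, d):
    """T(x, d) := x <= d, with T(x, ⊥) == False."""
    return d is not BOTTOM and x <= d


def init(x):
    return x == 0


def trans(x, x_next):
    return x < 10 and x_next == x + 1


def join(d, d_new):
    """Join for upper bounds: max, with ⊥ as the neutral element."""
    if d is BOTTOM:
        return d_new
    return max(d, d_new)


STATES = range(-5, 20)  # finite stand-in for the bit-vector domain


def find_counterexample(d):
    """Return a state violating initiation or consecution, else None."""
    for x in STATES:
        if init(x) and not template(x, d):
            return x
        for x_next in STATES:
            if template(x, d) and trans(x, x_next) and not template(x_next, d):
                return x_next
    return None


def infer():
    d = BOTTOM
    while (x_cex := find_counterexample(d)) is not None:
        # alpha(x_cex) = x_cex is the least d' with T(x_cex, d') true
        d = join(d, x_cex)
    return d


print(infer())  # 10
```

Starting from $\bot$, the first counterexample is the initial state 0, and each further one pushes the bound up by one until $x \leq 10$ is inductive.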

C. Source Program Encoding
In this paper, we deal with non-recursive programs with all function calls inlined. As said above, we encode the program into a formula representing a specific static single assignment (SSA) form. As usual, the SSA contains a fresh copy $x_i$ of each variable $x$ for each program location $i$ where the value of $x$ is modified, and for acyclic programs it represents exactly the strongest postcondition of the program. The effect of loops is over-approximated as described in [8]. In this encoding, special variables called guards are used to track the control flow of the program. In particular, for each program location $i$, a Boolean variable $g_i$ is introduced, and its value encodes whether the program location is reachable.
To see how the over-approximation of program loops is achieved, note that, at the loop head, the program path coming from before the loop joins with the path coming from the end of the loop (assuming that all paths within the loop join before its end, and likewise for the paths coming from before the loop). To make the SSA acyclic, we cut the path coming from the end of the loop. We then represent the value of each variable $x$ at the loop head using a phi variable $x^{phi}$ whose value is defined by a non-deterministic choice between the value coming from before the loop, say $x_0$, and the value coming from the end of the loop. The latter value is represented by a newly introduced loop-back variable $x^{lb}$. In particular, we let $x^{phi} = g^{ls}\ ?\ x^{lb} : x_0$, where $g^{ls}$ is a so-called loop-select Boolean guard that is left unconstrained in order to model the non-deterministic choice. Moreover, to over-approximate the effect of the loop, the value of the loop-back variable $x^{lb}$ is initially unconstrained too and is later constrained by the derived candidate loop invariants.
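A simplified Python sketch of this encoding for `x = 0; while (x < 10) x = x + 1;` follows, assuming a small enumerated value range in place of the solver. Because the loop-select guard `g_ls` is left free, a single check covers both the entry path (`g_ls` false, $x^{phi} = x_0$) and the back edge (`g_ls` true, $x^{phi} = x^{lb}$). The candidate invariant constrains $x^{lb}$ only when `g_ls` holds; all names are illustrative and this is not the exact 2LS encoding.

```python
# Simplified sketch of the SSA encoding of
#     x = 0; while (x < 10) x = x + 1;
# with a phi variable, an unconstrained loop-select guard g_ls and a
# loop-back variable x_lb that is constrained only by the candidate invariant.
# Enumerating a small value range stands in for the SMT solver (assumption).

DOMAIN = range(-4, 16)


def ssa(g_ls: bool, x_lb):
    x0 = 0
    x_phi = x_lb if g_ls else x0   # x_phi = g_ls ? x_lb : x_0
    g_body = x_phi < 10            # guard: the loop body is executed
    x1 = x_phi + 1                 # value flowing back along the cut edge
    return g_body, x1


def counterexamples(inv):
    """Models of: (g_ls => Inv(x_lb)) ∧ SSA ∧ g_body ∧ ¬Inv(x_1)."""
    found = set()
    for g_ls in (False, True):
        # x_lb is irrelevant when g_ls is false, so it is not enumerated then
        for x_lb in (DOMAIN if g_ls else [None]):
            if g_ls and not inv(x_lb):
                continue
            g_body, x1 = ssa(g_ls, x_lb)
            if g_body and not inv(x1):
                found.add((g_ls, x_lb, x1))
    return found


print(counterexamples(lambda x: False))    # {(False, None, 1)}: ⊥ misses the entry path
print(counterexamples(lambda x: x <= 5))   # {(True, 5, 6)}: not preserved by the loop
print(counterexamples(lambda x: x <= 10))  # set(): x <= 10 is inductive
```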
We now propose a representation of heap memory and of operations over it, designed to be used within the approach laid out in Section II. The proposal respects the fact that the considered SSA form is an acyclic program representation that over-approximates the reachable values of variables used in loops.

A. Abstract Memory Representation
Under our assumption of fully inlined, non-recursive programs, static memory objects correspond simply to a finite set $Var$ of program variables: we do not need to consider the stack. We let $PVar, SVar \subseteq Var$, $PVar \cap SVar = \emptyset$, be the sets of variables of pointer and structure type, respectively. A linked data structure in C is typically defined using a struct type, which groups together named fields for the payload data and the link pointers (see Lines 1-4 in Listing 1). We use $Fld$ to denote the finite set of fields used in the given program. Let $PFld \subseteq Fld$ be the set of all pointer-typed fields.
1) Abstract Dynamic Objects: We use abstract dynamic objects to represent dynamic memory objects, i.e. those that are allocated on the heap using malloc (or one of its variants). An abstract dynamic object represents a set of concrete dynamic objects allocated at the same allocation site $i$, e.g. by the same malloc call located at Line $i$ in Listing 1. However, a single abstract dynamic object is not sufficient to represent all concrete dynamic objects allocated by a given malloc, because the program may use several independent objects created at the same allocation site at the same time. Typically, this issue is solved by the analysis algorithm materialising dynamic objects on demand. We take a different approach and statically over-approximate the maximum number $n_i$ of concrete objects required (see the pre-materialisation step below). Hence, for each allocation site $i$, we use a set $AO_i = \{ao^1_i, \ldots, ao^{n_i}_i\}$ of abstract dynamic objects. We let $AO = \bigcup_i AO_i$ and require $Var \cap AO = \emptyset$ and $AO_i \cap AO_j = \emptyset$ for $i \neq j$. The set of all objects of our program abstraction is then $Obj = AO \cup Var$.
Pairs consisting of an abstract dynamic object and a field, i.e. elements of the set $AO \times Fld$, represent an abstraction of the corresponding fields of all the represented concrete objects. We use the "dot" notation to represent such pairs: e.g. $ao_i.next$ denotes the abstraction of the next field of all the concrete dynamic objects represented by $ao_i$.
We define $Ptr = PVar \cup ((SVar \cup AO) \times PFld)$ to be the set of all pointers of the given program abstraction. Pointers can be assigned addresses of objects. Since we currently do not support pointer arithmetic, the only addresses that we consider are the symbolic addresses of static and dynamic objects together with the special address null. The symbolic address of an abstract dynamic object $ao_i$ is an abstraction of the symbolic addresses of the concrete dynamic objects represented by $ao_i$. To get the address of both static and dynamic objects, we use the &-operator. Hence, the set $Addr$ of addresses that we consider is defined as $Addr = \{\&o \mid o \in Obj\} \cup \{null\}$.
2) Pre-Materialisation: As mentioned above, instead of materialising dynamic objects on demand, we pre-materialise a sufficient number $n_i$ of them for each allocation site $i$ and encode them into our SSA representation. For this abstraction to be sound, it is sufficient that $n_i$ equals the maximal number of distinct concrete objects allocated at $i$ that are simultaneously pointed to by some pointer at any location of the analysed program.
For each allocation site $i$, we compute the number $n_i$ as follows. First, using a standard static may-alias analysis, we over-approximate, for each program location $j$, the set $P^i_j$ of all pointer expressions of the source program that may point to some object allocated at $i$. These might be pointer variables from $PVar$, pointer-typed fields of static objects from $SVar \times PFld$, or pointer-typed fields of dynamic objects accessed through dereferences of pointers, i.e. elements of $PVar \times PFld$. For simplicity, we assume that all chained dereferences of the form $p \to f_1 \to f_2$ with $f_1, f_2 \in PFld$ are broken into two expressions using an intermediate variable. Overall, $P^i_j \subseteq PVar \cup ((SVar \cup PVar) \times PFld)$. Next, we compute the must-alias relation $\sim_j$: for each pair of pointers $p$ and $q$ and each program location $j$, $p \sim_j q$ iff $p$ and $q$ must point to the same concrete dynamic object at $j$. Finally, we partition the set $P^i_j$ into equivalence classes by $\sim_j$, and $n_i$ is given by the maximal number of such classes at any $j$.
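Computing $n_i$ amounts to counting must-alias classes, which the following Python sketch does with a small union-find structure. The alias facts in the example are hypothetical inputs (a real implementation would take them from the may- and must-alias analyses), chosen so that, as on Line 14 of Listing 1, two pointer expressions may refer to different objects from the same site.

```python
from typing import Dict, List, Set, Tuple


def required_objects(may_point: Dict[int, Set[str]],
                     must_alias: Dict[int, List[Tuple[str, str]]]) -> int:
    """n_i = max over locations j of the number of ~_j classes of P^i_j.

    may_point[j]  -- P^i_j: pointer expressions that may point to site i at j
    must_alias[j] -- pairs (p, q) with p ~_j q
    """
    n = 0
    for j, exprs in may_point.items():
        parent = {e: e for e in exprs}  # union-find forest over P^i_j

        def find(e: str) -> str:
            while parent[e] != e:
                parent[e] = parent[parent[e]]  # path halving
                e = parent[e]
            return e

        for p, q in must_alias.get(j, []):
            if p in parent and q in parent:
                parent[find(p)] = find(q)
        n = max(n, len({find(e) for e in exprs}))
    return n


# Hypothetical alias facts for one allocation site (not derived from Listing 1).
may_point = {
    13: {"p"},
    14: {"p", "tail"},
    15: {"p", "tail", "tail->next"},
}
must_alias = {15: [("tail->next", "p")]}
print(required_objects(may_point, must_alias))  # 2
```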

B. Operations over the Abstract Memory Representation
1) Dynamic Memory Allocation: We represent a call to malloc at program location $i$ by a non-deterministic choice among the addresses of the objects from the set $AO_i$. Hence, a statement p = malloc(...) at $i$ is translated to the formula

$$p_i = g^{os}_{i,1}\ ?\ \&ao^1_i : \big(g^{os}_{i,2}\ ?\ \&ao^2_i : \big(\cdots : \&ao^{n_i}_i\big)\big)$$

where $g^{os}_{i,j}$, $1 \leq j < n_i$, are free Boolean variables, so-called object-select guards.
Example. In Listing 1, two calls of malloc occur on Lines 7 and 13. For Line 7, a single abstract dynamic object $ao_7$ is created, as there is just one concrete object allocated. The malloc on Line 13 must be represented by two objects $ao^1_{13}$ and $ao^2_{13}$ as, e.g., on Line 14, the variables tail and p may point to different concrete objects allocated by this malloc call. Specifically, the statement on Line 13 will be translated into the equality $p_{13} = g^{os}_{13}\ ?\ \&ao^1_{13} : \&ao^2_{13}$. The abstract dynamic objects $ao^1_{13}$ and $ao^2_{13}$ then collectively represent all concrete dynamic objects allocated in the loop.
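The nested choice can be generated mechanically for any $n_i$. The following Python sketch renders the formula as a string and evaluates which object a given assignment of the object-select guards selects. The function names and the textual syntax are illustrative assumptions, not the 2LS implementation; for $n_i = 2$ the rendered formula coincides with the equality for Line 13 (with $g^{os}_{13,1}$ written as $g^{os}_{13}$).

```python
def encode_malloc(lhs: str, site: int, n: int) -> str:
    """Render p_i = g^os_{i,1} ? &ao^1_i : ( ... : &ao^n_i) for n objects."""
    if n < 1:
        raise ValueError("an allocation site needs at least one object")
    expr = f"&ao^{n}_{site}"
    # build the nested conditional from the innermost choice outwards
    for j in range(n - 1, 0, -1):
        expr = f"(g^os_{{{site},{j}}} ? &ao^{j}_{site} : {expr})"
    return f"{lhs}_{site} = {expr}"


def chosen_object(guards: list, site: int) -> str:
    """Evaluate the nested choice: the first true guard selects its object."""
    for j, g in enumerate(guards, start=1):
        if g:
            return f"ao^{j}_{site}"
    return f"ao^{len(guards) + 1}_{site}"


print(encode_malloc("p", 13, 2))
# p_13 = (g^os_{13,1} ? &ao^1_13 : &ao^2_13)
print(encode_malloc("p", 13, 3))
# p_13 = (g^os_{13,1} ? &ao^1_13 : (g^os_{13,2} ? &ao^2_13 : &ao^3_13))
```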
2) Reading through Dereferenced Pointers: We handle expressions of the form $p \to f$ for $p \in PVar$, $f \in Fld$ appearing on the right-hand side of assignments or in conditions as follows. We first perform a may-points-to analysis, which over-approximates, for each pointer $p \in Ptr$ and each program location $i$, the set of objects from $Obj$ that $p$ may point to at $i$. Using the result of the analysis, we can replace the pointer dereference $p \to f$ by a choice among the values of the field $f$ of the objects possibly pointed to by $p$.
To facilitate the replacement, we introduce purely logical dereference variables. Assume that at program location $i$ there appears an R-expression $p \to f$ and that the pointer $p$ may point to a set of objects $O = \{o_1, \ldots, o_m\} \subseteq Obj$ at $i$. We replace the use of $p \to f$ by a fresh variable $drf(p).f_i$ whose value is defined by the formula

$$drf(p).f_i = (p_j = \&o_1)\ ?\ o_1.f_k : \big((p_j = \&o_2)\ ?\ o_2.f_k : \big(\cdots (p_j = \&o_m)\ ?\ o_m.f_k : o_\bot.f\big)\big)$$

where $j, k$ are the relevant versions of the concerned variables at program location $i$ and $o_\bot$ denotes a special "unknown object" (the result of dereferencing an unknown or invalid (null) address).
Example. We give the translation of the assignment p = p->next from Line 18 in Listing 1. Since the assignment is executed at the end of each loop iteration, we define its program location to be Line 22. At this program location, p may point to the set of objects $\{ao_7, ao^1_{13}, ao^2_{13}\}$. Hence, the assignment will be represented by the following formula, where $j, k$ again denote the relevant versions:

$$p_{22} = drf(p).next_{22} \wedge drf(p).next_{22} = (p_j = \&ao_7)\ ?\ ao_7.next_k : \big((p_j = \&ao^1_{13})\ ?\ ao^1_{13}.next_k : \big((p_j = \&ao^2_{13})\ ?\ ao^2_{13}.next_k : o_\bot.next\big)\big)$$

Two remarks are in order. First, as an optimisation, once a dereference variable has been created and the value of the concerned expression does not change, we reuse the existing dereference variable. Second, when dealing with a statement like $v = p \to f$, the use of the dereference variable may seem unnecessary, as one could plug $v_i$ instead of $drf(p).f_i$ into the formula defining the value of $drf(p).f_i$. This can be done but, as explained below, the use of dereference variables can give us more precision when dealing with sequences of read and write operations.
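For a single valuation of the SSA variables, the defining formula of a dereference variable simply selects the field of the object whose address $p$ holds, falling back to the unknown object $o_\bot$. The Python sketch below evaluates it that way; the field values are hypothetical, and the string "o_bot.f" is only a stand-in for the value of $o_\bot.f$.

```python
from typing import Dict, List

UNKNOWN = "o_bot.f"  # stand-in for the field of the unknown object o_⊥


def read_through(p_value: str, objects: List[str], field_of: Dict[str, str]) -> str:
    """drf(p).f_i = (p_j = &o_1) ? o_1.f_k : ... : o_⊥.f, for one valuation."""
    for o in objects:
        if p_value == "&" + o:
            return field_of[o]
    return UNKNOWN  # null or an address outside O


objects = ["ao_7", "ao^1_13", "ao^2_13"]  # objects p may point to at Line 22
# hypothetical values of the versions o.next_k
next_of = {"ao_7": "&ao^1_13", "ao^1_13": "&ao^2_13", "ao^2_13": "null"}

print(read_through("&ao^1_13", objects, next_of))  # &ao^2_13
print(read_through("null", objects, next_of))      # o_bot.f
```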
3) Writing through a Dereference: When writing into an abstract dynamic object $ao_i$, we need to respect the fact that only one concrete object abstracted by $ao_i$ is actually written to, while the others keep their original value. Hence, we need to join the original and the new value. We again use dereference variables to facilitate the transformation.
Assume that at program location $i$, we have an assignment $p \to f = v$, $p \in PVar$, $f \in Fld$, $v \in Var$, and that $p$ may point to a set of objects $O \subseteq Obj$ at the entry to $i$. We replace the L-expression $p \to f$ by a fresh variable $drf(p).f_i$ whose value is defined by the value of $v$, i.e. we assert that $drf(p).f_i = v_l$, where $v_l$ is the version of $v$ valid at program location $i$. We then use $drf(p).f_i$ to update the value of the field $f$ of the referenced object, using the formula

$$\bigwedge_{o \in O} o.f_i = (p_j = \&o \wedge g^{os}_i)\ ?\ drf(p).f_i : o.f_k$$

where $j, k$ are the relevant versions of the variables $p$ and $o.f$ at program location $i$. The formula expresses the fact that $o.f_i$ gets updated if $p$ equals the address of $o$, and otherwise its value remains unchanged; $k$ is the last program location before $i$ where the value of $o.f$ was changed. The object-select guard $g^{os}_i$, which is a freshly introduced unconstrained Boolean variable, enforces that the value of field $f$ is changed in only one of the concrete objects abstracted by $o$, while it remains unchanged in the other objects abstracted by $o$. If $o$ is not allocated in a loop (and hence represents a single instance), $g^{os}_i$ may be omitted.
Example. For illustration, the assignment tail->next = p from Line 15 of Listing 1 will be translated into the instance of the formula above with tail in place of $p$, next in place of $f$ and p in place of $v$, where $O$ contains the objects that tail may point to at Line 15.
As mentioned above, the use of dereference variables may increase the precision of our analysis. This happens in particular when we write into an abstract object through some pointer and later read the written value back through the same pointer (or a pointer aliased with it) without any change of the pointers and the concerned value in between. Then, we get back exactly the value that we wrote, which would otherwise not happen due to the joins involved.
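The effect of the object-select guard can be observed by evaluating the update formula for both values of the free guard, as in the Python sketch below (names and values are hypothetical). For an object allocated in a loop, the field may afterwards hold either its old or its new value, which is exactly the join mentioned above; for an object representing a single instance, the guard is omitted and the update is strong.

```python
from typing import Dict, List, Set


def write_through(p_value: str, objects: List[str], old: Dict[str, int],
                  new_value: int, g_os: bool, summaries: Set[str]) -> Dict[str, int]:
    """o.f_i = (p_j = &o ∧ g_os) ? drf(p).f_i : o.f_k for every o in O.

    The object-select guard is used only for summary objects (allocated in a
    loop); for an object representing a single instance it is omitted.
    """
    result = {}
    for o in objects:
        hit = p_value == "&" + o and (g_os if o in summaries else True)
        result[o] = new_value if hit else old[o]
    return result


def possible_values(p_value, objects, old, new_value, summaries) -> Dict[str, Set[int]]:
    """Values each o.f_i may take when the free guard g_os is unconstrained."""
    values: Dict[str, Set[int]] = {o: set() for o in objects}
    for g_os in (False, True):
        updated = write_through(p_value, objects, old, new_value, g_os, summaries)
        for o, v in updated.items():
            values[o].add(v)
    return values


def show(values: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    return {o: sorted(v) for o, v in values.items()}


objects = ["ao_7", "ao_13"]
old = {"ao_7": 10, "ao_13": 12}  # hypothetical previous values of o.val
summaries = {"ao_13"}            # ao_13 abstracts many concrete nodes

print(show(possible_values("&ao_13", objects, old, 20, summaries)))
# {'ao_7': [10], 'ao_13': [12, 20]}   weak update: old and new value joined
print(show(possible_values("&ao_7", objects, old, 20, summaries)))
# {'ao_7': [20], 'ao_13': [12]}       strong update on a single-instance object
```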
4) Memory Free: Since the free operation has no effect on the heap reachability itself, we defer its discussion to Section V devoted to checking memory safety.

IV. AN ABSTRACT DOMAIN FOR HEAP ANALYSIS
We now work towards our template-based abstract domain for reasoning about properties of heap-manipulating programs, starting from a base shape domain and refining it. We show that, since all domains in the considered approach are based on templates, the new domain can easily be combined with other domains, e.g. for inferring properties of the numerical data stored in data structures.

A. Base Abstract Shape Domain
In the considered approach, an abstract domain needs to have the form of a template: a fixed, parametrised, quantifier-free first-order logic formula describing the desired property of a program. As described in Section II, templates are used to efficiently compute loop invariants of the analysed program. These are used to constrain the values of the loop-back variables that are used in the SSA-based program encoding to over-approximate the values returning from the end of the loop to the loop head. Hence, a loop invariant describes a property that holds for some program variables at the end of the loop body after any iteration of the loop. We therefore limit our shape domain to the set $Ptr^{lb}$ of all loop-back pointers. Let $L$ be the set of all loops in the program. Since there is one loop-back pointer variable for each pointer variable and each loop, we define $Ptr^{lb} = Ptr \times L$. We denote elements $(p, l) \in Ptr^{lb}$ by $p^{lb}_i$, where $i$ is the program location of the end of the loop $l$. Intuitively, the value of $p^{lb}_i$ is an abstraction of the value of the pointer $p$ coming from the end of the body of the loop $l$. The property that our base shape domain describes is the may-point-to relation between the set $Ptr^{lb}$ and the set $Addr$. The template of our base shape domain has the form of the formula

$$\bigwedge_{p^{lb}_i \in Ptr^{lb}} \Big(\bigvee_{a \in d_{p^{lb}_i}} p^{lb}_i = a\Big)$$

where each template parameter $d_{p^{lb}_i} \subseteq Addr$ is the set of addresses that $p^{lb}_i$ may point to, and each conjunct is a template row. Abstract values of template rows corresponding to pointer fields of abstract dynamic objects allow the domain to describe unbounded linked paths in the heap, such as list segments.
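Example. In Listing 1, a list segment is created by the first loop. Objects in the segment are linked through the pointer field next, and they are represented by the abstract dynamic objects $ao^1_{13}$ and $ao^2_{13}$. In our base shape domain, the shape of this segment will be described by an invariant for the first loop, specifically by the two template rows for $ao^1_{13}.next^{lb}_{16}$ and $ao^2_{13}.next^{lb}_{16}$:

$$ao^1_{13}.next^{lb}_{16} = \&ao^1_{13} \vee ao^1_{13}.next^{lb}_{16} = \&ao^2_{13} \vee ao^1_{13}.next^{lb}_{16} = null$$
$$ao^2_{13}.next^{lb}_{16} = \&ao^1_{13} \vee ao^2_{13}.next^{lb}_{16} = \&ao^2_{13} \vee ao^2_{13}.next^{lb}_{16} = null$$

These formulae say that the next fields of both $ao^1_{13}$ and $ao^2_{13}$ may either point to one of the objects themselves or to null. This describes an unbounded linked path in the heap composed of objects abstracted by $ao^1_{13}$ or $ao^2_{13}$ and terminated by null.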
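The following Python sketch mimics what the template rows of the base shape domain record, assuming concrete lists built as in the running example and, for brevity, a single abstract object $ao_{13}$ for Line 13 (as in the invariant of Section I) instead of two materialisations. The abstraction of a concrete heap maps each pointer row to the set of abstract addresses it points to, and the join of two abstract values is row-wise set union. This is not the SMT-driven inference itself, only an illustration of the abstract values it ranges over; all names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Heap = Dict[str, Optional[str]]  # concrete node -> successor node (None = null)
AbsValue = Dict[str, Set[str]]   # template row (pointer field) -> set of addresses


def alpha(heap: Heap, site_of: Dict[str, str]) -> AbsValue:
    """Most precise shape value: each row collects the abstract addresses it may hold."""
    rows: AbsValue = {}
    for node, succ in heap.items():
        target = "null" if succ is None else "&" + site_of[succ]
        rows.setdefault(site_of[node] + ".next", set()).add(target)
    return rows


def join(d1: AbsValue, d2: AbsValue) -> AbsValue:
    """Row-wise union of the may-point-to sets."""
    return {row: d1.get(row, set()) | d2.get(row, set())
            for row in d1.keys() | d2.keys()}


def row_formula(ptr: str, addrs: Set[str]) -> str:
    """Render a row as a disjunction of equalities."""
    return " ∨ ".join(f"{ptr} = {a}" for a in sorted(addrs))


def build_list(length: int) -> Tuple[Heap, Dict[str, str]]:
    """A concrete list whose head comes from Line 7 and the rest from Line 13."""
    names = [f"c{k}" for k in range(length)]
    heap: Heap = {}
    site_of: Dict[str, str] = {}
    for k, name in enumerate(names):
        heap[name] = names[k + 1] if k + 1 < length else None
        site_of[name] = "ao_7" if k == 0 else "ao_13"
    return heap, site_of


d: AbsValue = {}
for length in range(1, 5):
    d = join(d, alpha(*build_list(length)))
for row in sorted(d):
    print(row_formula(row, d[row]))
# ao_13.next = &ao_13 ∨ ao_13.next = null
# ao_7.next = &ao_13 ∨ ao_7.next = null
```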

B. Guarded Shape Templates
In order to use the base shape domain in our approach, we have to augment it with information about the guard variables that encode the program's control flow in the SSA. The guards express when the corresponding loop-back control edge is executed and the loop-back pointer has a defined value. A row of a guarded shape template is defined as a formula

$$T^G_{p^{lb}_i}(d_{p^{lb}_i}) \equiv G_{p^{lb}_i} \Rightarrow \bigvee_{a \in d_{p^{lb}_i}} p^{lb}_i = a$$

where $d_{p^{lb}_i} \subseteq Addr$ is the template parameter of the row. The row guard $G_{p^{lb}_i}$ is a conjunction of the following guards:
• The guard $g^{lh}_j$ linked with the head of the loop $l$ located at program location $j$, encoding that the loop $l$ is reachable.
• The guard $g^{ls}_i$ linked with the use of $p^{lb}_i$. The value of $g^{ls}_i$ is true if $p^{lb}_i$ is chosen as the value of the corresponding phi variable at the head of $l$ (see Section II-C).
• If $p^{lb}_i$ describes a pointer field of some abstract dynamic object (i.e. it has the form $ao^k_j.f^{lb}_i$ for some $ao^k_j \in AO$, $f \in Fld$), we also use the guard $g^{ao^k_j}$ linked with the allocation of $ao^k_j$ at program location $j$. This guard conjoins the guard expressing reachability of program location $j$ with the object-select guards $g^{os}_{j,l}$ and their negations denoting the allocation of the $k$-th materialisation $ao^k_j$ of the object allocated at $j$.
Example. In Section IV-A, we presented a shape invariant describing the linked segment created by the first loop from Listing 1. The corresponding guards for the two template rows of that invariant are $G_{ao^1_{13}.next^{lb}_{16}} = g_{10} \wedge g^{ls}_{16} \wedge (g_{13} \wedge g^{os}_{13})$ and $G_{ao^2_{13}.next^{lb}_{16}} = g_{10} \wedge g^{ls}_{16} \wedge (g_{13} \wedge \neg g^{os}_{13})$. Here, the loop head guard is $g_{10}$, the loop-select guard is $g^{ls}_{16}$, and the allocation guard is given by the reachability guard $g_{13}$ of the allocation site and by the appropriate object-select guard ($g^{os}_{13}$ for $ao^1_{13}$ and $\neg g^{os}_{13}$ for $ao^2_{13}$, respectively).

C. Shape Domain with Symbolic Loop Paths
Unfortunately, guarded shape templates are not precise enough for many heap-manipulating programs. One often needs the invariant of a loop to distinguish which loops were or were not executed before reaching the given loop. This can, e.g., distinguish which objects were allocated and can hence be processed in the given loop.
To deal with the above problem, we introduce the concept of symbolic loop paths and compute different invariants for different paths. Since we use loop-select guards to express the control flow through the loops (see Section II-C), a symbolic loop path is simply a conjunction of loop-select guards. Let $G^{ls}$ be the set of all loop-select guards of all loops in a program. A symbolic loop path $\pi$ is then formally defined as $\pi = \bigwedge_{g \in G^{ls}} l_g$, where $l_g$ is a literal of the variable $g$, i.e. either $g$ or $\neg g$. We use $\Pi$ to denote the set of all symbolic loop paths of a given program. A shape template extended with symbolic loop paths is then given by the formula

$$T^L \equiv \bigwedge_{\pi \in \Pi \setminus \{\pi_\bot\}} \pi \Rightarrow T^G_\pi$$

where the $T^G_\pi$ formulae are guarded shape templates as defined in Section IV-B. Here, $\pi_\bot$ is a special path containing negative literals only; on that path, no loop invariants are computed.
Example.We now show invariants for the pointer p for the second loop of the program in Listing 1.Using our (trace-insensitive) guarded shape domain, the corresponding template row would be T G p lb

22
({&ao 1  13 , &ao 2 13 , null}).In other words, p would be understood as possibly pointing to ao 1  13 or ao 2  13 even on paths where they were not allocated.However, symbolic loop paths allow us to obtain two different invariants depending on the execution of the first loop (for simplicity, we only provide the appropriate template row): namely, g ls 16 ∧ g ls 22 ⇒ T G p lb

22
({&ao 1 13 , &ao 2 13 , null}) for the case when the body of the first loop is executed and ¬g ls 16 ∧ g ls 22 ⇒ T G p lb

22
({null}) for the case when the body of the first loop is not executed.
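Viewed as a data structure, the template with symbolic loop paths maps each path other than $\pi_\bot$ to a parameter of the guarded row. The Python sketch below enumerates the paths over the two loop-select guards of Listing 1 and stores the two row parameters from the example; encoding paths as tuples of (guard, polarity) pairs is an illustrative assumption. On the path $g^{ls}_{16} \wedge \neg g^{ls}_{22}$ the row guard of $p^{lb}_{22}$ is false, so the row is vacuous there and is shown with the empty parameter.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Path = Tuple[Tuple[str, bool], ...]  # conjunction of literals (guard, polarity)


def symbolic_paths(guards: List[str]) -> List[Path]:
    """All symbolic loop paths over the loop-select guards except π_⊥."""
    paths = []
    for polarities in product((True, False), repeat=len(guards)):
        if not any(polarities):
            continue  # π_⊥: only negative literals, no invariant is computed
        paths.append(tuple(zip(guards, polarities)))
    return paths


def show(path: Path) -> str:
    return " ∧ ".join(g if pos else "¬" + g for g, pos in path)


# Power-template value for the row of p^lb_22: each path maps to the set of
# addresses p^lb_22 may point to (values from the example above).
row_p_lb_22: Dict[Path, FrozenSet[str]] = {
    (("g_ls_16", True), ("g_ls_22", True)): frozenset({"&ao^1_13", "&ao^2_13", "null"}),
    (("g_ls_16", False), ("g_ls_22", True)): frozenset({"null"}),
}


def may_point_to(path: Path, addr: str) -> bool:
    return addr in row_p_lb_22.get(path, frozenset())


for path in symbolic_paths(["g_ls_16", "g_ls_22"]):
    print(show(path), "->", sorted(row_p_lb_22.get(path, frozenset())))
# g_ls_16 ∧ g_ls_22 -> ['&ao^1_13', '&ao^2_13', 'null']
# g_ls_16 ∧ ¬g_ls_22 -> []
# ¬g_ls_16 ∧ g_ls_22 -> ['null']
```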

D. Combinations of Domains
The true power of the template-based verification approach lies in the simplicity of domain combinations. Since templates are general logical formulae, they can easily be composed, forming abstract domains capable of describing more complex properties of programs while relying on the solver to do the heavy lifting on the combination of the domain operations and the mutual reduction of their abstract values.
1) Power Templates: The definition of shape templates with symbolic loop paths shows one way in which a complex template can be formed from a simpler one. In this case, the template parameter, i.e. the abstract value, maps particular symbolic loop paths to sets of parameters of the original shape template. In fact, the shape domain could be replaced by any other abstract domain. The symbolic paths template can hence be viewed as a power template, in the sense of power domains [15], which assigns to each element of the base domain an element of the exponent domain.
2) Product Templates: From the perspective of program analysis, a very interesting possibility is the combination of the shape domain with an abstract domain capable of describing values of variables of non-pointer types, e.g. numerical variables (such as the well-known interval or octagon domains). The simplest way to achieve such a combination is to use a Cartesian product template, which combines templates of different kinds that are used independently side by side. The guarded shape template $T^G$ from Section IV-B can be combined with a template $T^V$ for the analysis of numerical values by simply taking their conjunction, i.e. $T^G \wedge T^V$. This not only allows us to analyse programs that use pointer and numerical variables simultaneously, but also to reason about the contents of data structures on the heap. We achieve this by analysing numerical fields of abstract dynamic objects using the value part of the template.
In addition, we use this product template as the inner template of the template with symbolic loop paths, forming an even stronger abstract domain:

$$T \equiv \bigwedge_{\pi \in \Pi \setminus \{\pi_\bot\}} \pi \Rightarrow \big(T^G_\pi \wedge T^V_\pi\big)$$

Using this domain for the running example allows us to analyse the shape and the contents of the linked list at the same time, obtaining the invariants described in Section I that enable us to prove the given property of interest.
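A Cartesian product of a shape row and an interval row for one abstract object can be sketched in Python as a pair whose join is taken component-wise. The concrete nodes below are hypothetical, chosen to match the description of $ao_{13}$ in Section I; guards and symbolic loop paths are omitted, and the interval domain stands in for the template polyhedra used in the implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Interval = Optional[Tuple[int, int]]  # None stands for ⊥


@dataclass(frozen=True)
class NodeValue:
    """Product abstract value for one abstract object: shape row × value row."""
    next_targets: FrozenSet[str]  # parameter of the row for ao.next
    val: Interval                 # parameter of the interval row for ao.val


BOTTOM = NodeValue(frozenset(), None)


def join_interval(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def join(x: NodeValue, y: NodeValue) -> NodeValue:
    """Component-wise join: set union for the shape part, hull for the interval."""
    return NodeValue(x.next_targets | y.next_targets, join_interval(x.val, y.val))


def alpha(next_addr: str, val: int) -> NodeValue:
    """Most precise product value for a single concrete node."""
    return NodeValue(frozenset({next_addr}), (val, val))


# Hypothetical concrete nodes abstracted by ao_13: (target of next, val).
nodes = [("&ao_13", 12), ("&ao_13", 10), ("null", 20)]
d = BOTTOM
for nxt, v in nodes:
    d = join(d, alpha(nxt, v))
print(sorted(d.next_targets), d.val)  # ['&ao_13', 'null'] (10, 20)
```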

V. MEMORY SAFETY ANALYSIS
Apart from checking user-defined assertions, we can also verify memory safety.This includes a number of properties: (1) pointer dereferencing safety, (2) free safety, and (3) absence of memory leaks.

A. Dereferencing a null Pointer
Since our invariants over-approximate the reachable program states, we can soundly verify may (or, more precisely, must-not) properties. To check dereferences of null, for each expression $*p$ occurring at a program location $i$, we verify the assertion $p_j \neq null$, where $p_j$ is the version of $p$ valid at $i$.

B. Free Safety
Free safety includes the absence of dereferencing a freed pointer and of freeing an already freed pointer (a so-called "double free"). To prove the absence of these errors, we introduce a new special variable $fr$ initialised to null, which is then non-deterministically set to the address of the object to be freed in a free call. We replace each call of the form free(p) at program location $i$ by a formula $fr_i = g^{fr}_i\ ?\ p_j : fr_k$, where $p_j$ and $fr_k$ are the versions of p and $fr$, respectively, valid at $i$, and $g^{fr}_i$ is a free Boolean variable (a so-called free guard). Treating $fr$ as a standard pointer-typed variable allows us to over-approximate the set of all freed addresses with the help of our shape domain. Then, at each program location $i$ where either $*p$ or free(p) occurs, we can check the assertion $p_j \neq fr_k$ to prove free safety (here, $p_j$ and $fr_k$ are again the versions of p and $fr$, respectively, valid at $i$).
Even though this approach is sound, it is often too imprecise: freeing one of the concrete objects does not mean that all objects were freed and that it is no longer safe to dereference or free the abstract object. To improve precision, we modify the representation of malloc calls. At each allocation site $i$, we add one more object $ao^{co}_i$ to the set $\{ao^k_i\}$. The object can be chosen as the result of the allocation non-deterministically like any other $ao^k_i$, but it is guaranteed to be allocated only once (by an additional condition checking that, upon its allocation, no loop-back pointer can point to it). Hence, $ao^{co}_i$ represents a concrete object. Then, for each allocation site $i$, we only allow $\&ao^{co}_i$ to be assigned to $fr$. The checks for free safety described above are done on concrete objects only, avoiding the imprecision stemming from dealing with multiple objects represented by a single abstract object, which would join the possibly different values of these objects. Also, as $ao^{co}_i$ represents an arbitrary concrete object allocated at $i$, if safety can be proven for it, it can be assumed to hold for any other object allocated at $i$.
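The checks of Sections V-A and V-B can be pictured on may-point-to sets, as in the Python sketch below: `free_stmt` over-approximates the possible values of $fr$ after free(p), and an access through p is reported as possibly unsafe if p may hold an address that $fr$ may hold. The sets are hypothetical and the sketch models only the basic scheme, not the $ao^{co}_i$ refinement. Excluding null from the free-safety check is an assumption of the sketch, motivated by free(NULL) being a no-op in C.

```python
from typing import Set


def free_stmt(fr_may: Set[str], p_may: Set[str]) -> Set[str]:
    """fr_i = g^fr_i ? p_j : fr_k with a free guard: fr keeps its old
    possible values or takes any possible value of p."""
    return fr_may | p_may


def deref_not_null(p_may: Set[str]) -> bool:
    """Section V-A: the assertion p_j != null holds for all possible values."""
    return "null" not in p_may


def free_safe(p_may: Set[str], fr_may: Set[str]) -> bool:
    """Section V-B: p_j != fr_k for every possible non-null address of p."""
    return not ((p_may - {"null"}) & fr_may)


fr = {"null"}                      # fr is initialised to null
p_may = {"&ao_7"}                  # hypothetical may-point-to sets
q_may = {"&ao^1_13", "&ao^2_13"}

fr = free_stmt(fr, p_may)          # free(p)
print(free_safe(q_may, fr))        # True: q cannot point to a freed object
print(free_safe(p_may, fr))        # False: *p or a second free(p) may be unsafe
print(deref_not_null(q_may))       # True
```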

C. Absence of Memory Leaks
Using $fr$, we then check whether some $ao^{co}_i$ object may not be freed at the end of the program (if there is a leak, it must be possible to show it on some concrete object). Unfortunately, as we do not track the sequencing of abstract objects representing a set of objects allocated at an allocation site (even when they form a list segment), our analysis typically sees that $ao^{co}_i$ may be skipped in the deallocation loops, and hence remains inconclusive on memory leaks.

VI. IMPLEMENTATION
We implemented the proposed shape domain within the 2LS framework [35], which uses the template-based verification method described in Section II. We extended the SSA form generated by the framework to handle dynamic memory allocation. 2LS is based on the CPROVER framework [13], which includes an SMT solver based on reduction to propositional logic. We used Glucose 4.0 as the back-end solver in our experiments. We let 2LS inline all functions before running our analysis. For the combination with numerical domains described in Section IV-D, we use the template polyhedra domain that is already part of 2LS. Our approach handles any sequential C program; however, invariants are not inferred for array contents or for memory manipulation using pointer arithmetic.

VII. EXPERIMENTS
We performed experiments to show how our approach improves the performance of 2LS and how it compares to other state-of-the-art software verification tools. We used BenchExec [4] to run the experiments with the time limit set to 900 s and the memory limit set to 15 GB. The first comparison was done on the subcategories of the SV-COMP benchmarks [36] related to memory safety, namely ReachSafety-ControlFlow, ReachSafety-Heap, MemSafety-Heap, MemSafety-LinkedLists, and MemSafety-Others. Tasks in ReachSafety are checked for reachability of an error condition. Tasks in MemSafety are checked for the absence of invalid pointer dereferences, invalid frees, and memory leaks. We compared our implementation to the version of 2LS from SV-COMP'17, which lacks the proposed shape analysis.
The results are shown in Table I. The proposed method significantly improves the performance of the tool. Because it lacked support for heap analysis, the old version of 2LS often reported wrong results and therefore had a negative score in three subcategories. With our analysis, 2LS obtained a positive score in all subcategories, and it is also faster in some of them.
Although the results show an improvement, we are still unable to compete with the best tools of SV-COMP'18 in the heap categories.This is mainly because our analysis does not yet support pointer arithmetic and is not yet expressive enough to handle various kinds of trees or nested lists.
However, the main purpose of our work was to extend the possibilities of analysing combined shape and value properties of programs. To evaluate this, we performed an experiment comparing our tool with the leaders of SV-COMP'18 in the heap-related categories. The tasks combine the manipulation of unbounded data structures with the need to reason about the data stored in them. All these tasks are correct programs created by our team, since no such programs are part of the SV-COMP benchmarks yet. For each task, we verify that no error state is reachable. The results of the evaluation are shown in Table II. Numbers in the table represent the CPU time in seconds needed for the analysis of the task. The value "unknown" means that the tool was not able to analyse the task.
On these benchmarks, 2LS significantly outperforms the other tools. Even the tools specialised in shape analysis, Forester [17] and Predator [16], often report unknown, time out, or even report a false error. This is probably caused by their inability to reason about the data stored in the lists. More general tools such as Symbiotic [9] or Ultimate Automizer [18] often time out, probably because they lack an efficient abstraction for combining shape and value properties. CPAChecker [3] (in the CPA-Seq configuration from SV-COMP'18) solves four tasks but times out on the rest.

VIII. RELATED WORK
There is a vast body of work on shape analysis. We can only give an overview of the main lines of research in this section. For a more complete survey, we refer to [25].
Many of the existing approaches to shape analysis are based on abstract interpretation [14], some of them dating back to the 1980s [23]. In particular, the TVLA engine [34] introduced a flexible approach based on abstract interpretation over a set of user-supplied predicates. In comparison, our approach can be viewed as using a set of parametrised predicates.
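To make the notion of a parametrised predicate more tangible, consider a points-to template for a loop-back pointer $p$. It is a disjunction of equalities between $p$ and the symbolic addresses in a parameter set $d$, read here simply as $p \in d$. Only the shape of the template is fixed in advance. The addresses in $d$ are inferred rather than supplied by the user. The Python sketch below is a deliberately simplified illustration under our own assumptions, not the 2LS algorithm. The SMT queries are replaced by explicit enumeration over a hypothetical finite loop. The loop is given as the set init of addresses $p$ may hold on loop entry and a function step modelling one loop iteration. Starting from the empty set, d is weakened by every counterexample to inductiveness until none remains.

```python
from typing import Callable, Iterable

Addr = str

def synthesise_points_to(
    init: set[Addr],
    step: Callable[[Addr], Iterable[Addr]],
) -> set[Addr]:
    """Find the least parameter d such that 'p in d' is an inductive invariant.

    init: addresses p may hold on loop entry (hypothetical input).
    step: addresses p may hold after one iteration, given its old value.
    """
    d: set[Addr] = set()          # bottom element: the predicate is false
    while True:
        # Counterexamples to inductiveness: values of p reachable on entry,
        # or after one iteration starting from d, that d does not cover.
        cex = [a for a in init if a not in d]
        cex += [b for a in d for b in step(a) if b not in d]
        if not cex:
            return d              # d covers entry values and is closed under step
        d |= set(cex)             # weaken the template with the new addresses
```

Consider a hypothetical loop that starts with p = NULL and repeatedly prepends a node allocated at site 5 (q = malloc(...); q->next = p; p = q;). For it, synthesise_points_to({"null"}, lambda a: {"&ao_5"}) returns {"null", "&ao_5"}, i.e. the invariant p = null ∨ p = &ao_5. The iteration refines template parameters rather than executing the program, although this toy version still uses an explicit step function where 2LS would query a solver.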
Several further approaches based on abstract interpretation and various underlying formalisms (logics, automata, graphs) are mentioned below. In general, our approach differs from them in how invariants are obtained. We synthesise inductive invariants by gradually refining the parameters of templates via SMT solving on the SSA form, with no iterative execution. These approaches instead iteratively execute the program using abstract transformers and widening until a fixed point is reached. Hence, our approach does not use widening over gradually growing instances of dynamic data structures to capture unbounded sets of instances of such structures. Nor does it use on-demand materialisation of concrete memory nodes from an abstract representation of a set of such nodes followed by re-abstracting the resulting …
Various extensions of Hoare logic have been developed to cope with heap-manipulating programs. For example, [22] proposed a way to reason about lists using the Mona tool, which was then extended to more complex data structures [29] and their contents [27]. Another program logic is separation logic [32], which enables reasoning about local memory modifications rather than looking at the memory as a whole. It has been used for deductive program verification based on user-provided annotations [11]. Fully automated approaches based on separation logic and abstract interpretation have also been proposed and used, e.g., in the Space Invader [37] and SLAyer [2] tools.
More recently, automation of separation logic using SMT solvers by reduction to effectively propositional logic has been proposed in [31], [20], [21]. A different approach [30] uses the Houdini algorithm to find inductive invariants over heap predicates generated from grammars. These works share with our method the use of SMT solvers to reason about heap properties; however, each of them uses a different technique for synthesising the invariant predicates. For an overview of template-based analysis techniques for numerical properties, we refer to [8].
Other fully automated approaches based on abstract interpretation build on shape graphs [26], such as the Predator tool [16], or on tree automata and regular tree model checking, such as [6] or the Forester tool [17]. These approaches primarily aim at handling unbounded heap structures. Combining them with reasoning about value properties is not easy, as shown by [1], [19], which extended Forester with reasoning about finite data and with specialised support for handling ordered list segments. As our experiments showed, Forester and Predator could handle almost none of our examples.
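Several further abstract domains have been proposed for combining shape and data domains (e.g., [10], [5]). Our approach has the advantage that such domain combinations need not be designed from scratch, since our shape templates can be combined with existing value templates in a straightforward manner.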
Beyond the mentioned tools, several participants in SV-COMP, such as CPAChecker [3], Symbiotic [9], Ultimate Automizer [18], or CBMC [13], provide support for dealing with dynamic data structures and their content.However, they cannot handle data structures of unbounded size.
All the above methods are store-based, i.e., they describe the heap explicitly by a graph encoded in different ways. Other approaches are inspired by storeless semantics [24] and use pointer access paths [12], [33], [28], [7] to describe reachability properties on the heap. This idea proved most suitable for our purposes. A pointer access path does not concretely express the heap state; it only describes which dynamic objects are reachable from a pointer. Using a set of access paths for each pointer, one can efficiently describe the shape of the heap. However, unlike our method, these approaches use abstract interpretation over CFGs, and their support for reasoning about data content is limited [28].

IX. CONCLUSIONS AND FUTURE WORK
We present a verification approach for heap-manipulating programs based on template-based invariant synthesis. We propose an abstract template domain capable of expressing reachability in dynamic data structures. We show that the domain can easily be combined with other domains to form power and product domains that are able to express complex properties about the shape and the contents of data structures. We experimentally evaluate our approach, implemented within the 2LS framework. We plan to extend the technique to support pointer arithmetic and to develop templates that can express more complex data structure shapes, such as trees, skip lists, or nested lists. Moreover, we are working on using our method to infer function summaries to enable modular verification.

TABLE I: Comparison of 2LS using the proposed method with the previous version of the tool over the SV-COMP benchmarks.

TABLE II: Comparison of 2LS with other tools on examples combining unbounded data structures and their stored data.