Using JBSE¶
This section will describe in details how to use JBSE to symbolically execute Java bytecode programs and analyze them by means of assertions and assumptions.
The symbolic execution classes¶
The class that is responsible of performing symbolic execution of a program is jbse.jvm.Engine
. It implements a symbolic JVM that can execute an arbitrary Java method in a step-by-step fashion, i.e., one bytecode at time. In the case a symbolic state has more than one successor, the Engine
selects one of them as the next state, and stores the others in an internal data structure. At the end of a path, the Engine
can be backtracked to one of the previously stored states.
The Engine
class offers a low-level API, that allows a fine-grain control of symbolic execution, but is also quite tedious to use correctly. For this reason, JBSE offers the jbse.jvm.Runner
class. A Runner
object drives an Engine
to perform an exhaustive symbolic execution of a method. To this end, the Runner
repeatedly steps and backtracks the Engine
until it visits all (or a suitable subset of) the states in the symbolic execution tree of the method under execution. Users can customize the behavior of a Runner
object by registering a suitable listener object that provides a set of callback methods. The Runner
will invoke the appropriate callback method upon occurrence of an event, e.g., before or after a bytecode step, at the entry of a method, upon backtrack, etc.
By injecting behavior through callbacks it is possible to customize the behavior of a Runner
object and create applications. The jbse.apps.run.Run
class is an example of an application that is done this way. As discussed in the introduction, this application symbolically executes a method and prints to the console information about the execution as, e.g., the traversed states, their path conditions, the shape of the symbolic execution tree, the violations of the assertions, etc. We will introduce the capabilities of the Run
application in a later section, while this section focuses on the use of the Engine
and Runner
classes.
Creating a symbolic executor¶
The creation of an Engine
is performed by a suitable Builder object of class jbse.jvm.EngineBuilder
. This object exposes a single method EngineBuilder.build(EngineParameters)
, that accepts as argument an object of class jbse.jvm.EngineParameters
and returns a new Engine
. The EngineParameters
object gathers the (many) configuration parameters that must be provided to create an Engine
. Similarly, a Runner
must be created via a jbse.jvm.RunnerBuilder
object by invoking its RunnerBuilder.build(RunnerParameters)
method. The jbse.jvm.RunnerParameters
class offers a superset of the methods of the EngineParameter
class, i.e., all the methods to configure the parameters necessary to create the Engine
underlying the Runner
, plus the methods to configure the parameters specific to the Runner
(e.g., the listener). We now describe how an EngineParameters
or a RunnerParameters
object is set to suitably configure an Engine
or a Runner
.
Setting the paths¶
An Engine
(and similarly a Runner
) must know the paths where it can find the classes of the application that it must symbolically execute, and of the libraries the application invokes, including the standard Java library. Mirroring what Oracle’s as Hotspot JVM does, JBSE distinguishes the bootstrap classpath, where the core JDK classes reside, the extensions classpath, home of the classes that must be loaded with the extensions mechanism, and the user classpath, pointing to the classes that belong to the application to be run (the latter is what we usually mean when we talk about the “classpath of an application”). The bootstrap and the extensions classpath are, by default, automatically configured by the EngineParameters
and RunnerParameters
objects by detecting the bootstrap and the extensions classpath of the JDK on which JBSE executes. This means that in most cases you do not need to configure them manually. Should you ever need to override this default, you can set the bootstrap classpath by specifying the home directory of a Java 8 JDK setup via the EngineParameters.setJavaHome(String)
or EngineParameters.setJavaHome(Path)
methods. To modify the extensions classpath you may add paths to it by invoking EngineParameters.addExtClasspath(String...)
or EngineParameters.addExtClasspath(Path...)
. If you want to completely override it, you need to first clear it from its default value by invoking EngineParameters.clearExtClasspath()
. Finally, to add paths to the user classpath use either the EngineParameters.addUserClasspath(String...)
or the EngineParameters.addExtClasspath(Path...)
method (by default, the user classpath is empty). The RunnerParameter
class offers identical methods.
Setting the target method¶
To specify which method the Engine
must symbolically execute, invoke EngineParameter.setMethodSignature(String, String, String)
with the signature of the method. The signature of the method is composed by three strings: The name of class the method belongs to, the method descriptor (a string listing the types of the method parameters and of the return value), and the method’s name. The name of the class must be specified in binary name format: It must be fully qualified with the name of the package, and possibly of the other classes, that contain it, the separator dots must be replaced by slash (/
) characters in case the container on the left of the dot is a package, and by a dollar sign ($
) in case the container is a class. For example the binary name of the class java.util.LinkedList
is java/util/LinkedList
, and the binary name of the nested class java.util.LinkedList.Node
is java/util/LinkedList$Node
. The descriptor is necessary to distinguish overloaded methods. It is composed by a list of parameters types, enclosed in parentheses, followed by the type of the return value, or V
if the method is declared void
. The types of the parameters must be encoded as type descriptors according to Table 4.3-A of the Java Virtual Machine Specification.
Java type |
Type descriptor |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
ClassName |
|
JavaType |
|
Table 1 reports the correspondence between Java types and types descriptors for reference. As an example, if we want to symbolically execute a method double m(int, Object[][])
of class foo.Baz
, we must configure the EngineParameters
object by invoking setMethodSignature("foo/Baz", "(I[[Ljava/lang/Object;)D", "m")
.
Setting the calculator¶
An Engine
needs to be able to create, from time to time, new symbolic expression. When this happens, some basic manipulations are usually performed on the created symbolic expression at the purpose of simplifying it. For example, it is possible to configure an Engine
so it, whenever it must add the symbol a
to the number 0
, simplifies the resulting expression a + 0
to a
. To this end, an Engine
depends on an object extending the abstract class jbse.val.Calculator
, that it uses to create and manipulate all the symbolic expressions, and that performs all the simplifications on-the-fly. It is therefore necessary to inject the dependency to a suitable subclass of Calculator
through the method setCalculator(Calculator)
of the classes EngineParameters
and RunnerParameters
. Currently, the only concrete subclass of Calculator
is the class jbse.rewr.CalculatorRewriting
. A CalculatorRewriting
applies a set of rewriting rules to simplify all the symbolic expressions it produces. It is possible to plug the rewriting rules, implemented as subclasses of jbse.rewr.RewriterCalculatorRewriting
, by invoking the CalculatorRewriting.addRewriter(RewriterCalculatorRewriting)
method. The package jbse.rewr
contains a collection of rewriting rules performing some useful simplifications. The most important ones, that are essentially compulsory, are:
jbse.rewr.RewriterExpressionOrConversionOnSimplex
: necessary to simplify all the expressions whose operands are numeric, e.g., to simplify3 + 2
to5
;jbse.rewr.RewriterFunctionApplicationOnSimplex
: similar to the previous, where the operator is a (symbolic) function application assin
,cos
,max
,min
…jbse.rewr.RewriterZeroUnit
: simplifies some operations with zero or one that have trivial result: e.g., simplifiesa * 0
to0
, and1 * b
tob
;jbse.rewr.RewriterNegationElimination
: eliminates double negations simplifying, e.g.,- (- a)
toa
.
The other rewriters in the package jbse.rewr
can be used to simplify nonlinear expression with trigonometric operators and square roots. Historically they have been used to check properties involving distances in the Cartesian plane and polar-to-cartesian and their inverse coordinates conversions. A more mundane setup of JBSE would be as follows:
import jbse.jvm.EngineParameters;
import jbse.rewr.CalculatorRewriting;
import jbse.rewr.RewriterExpressionOrConversionOnSimplex;
import jbse.rewr.RewriterFunctionApplicationOnSimplex;
import jbse.rewr.RewriterNegationElimination;
import jbse.rewr.RewriterZeroUnit;
...
EngineParameters p = new EngineParameters();
...
CalculatorRewriting calc = new CalculatorRewriting();
calc.addRewriter(new RewriterExpressionOrConversionOnSimplex());
calc.addRewriter(new RewriterFunctionApplicationOnSimplex());
calc.addRewriter(new RewriterZeroUnit());
calc.addRewriter(new RewriterNegationElimination());
p.setCalculator(calc);
Unfortunately the order the rewriters are added to the calculator matters. Moreover, some rewriters depend on the presence of other rewriters. Refer the Javadoc of the rewriters classes for more information.
Setting the decision procedures¶
To prune the unfeasible branches of the symbolic execution tree an Engine
must decide whether a symbolic expression of boolean type is satisfiable. To this end, the Engine
uses an object with class jbse.dec.DecisionProcedureAlgorithms
. Upon configuration it is necessary to create this object and inject the dependency by invoking the setDecisionProcedure(DecisionProcedureAlgorithms)
method exposed by the EngineParameters
and RunnerParameters
classes. This because a DecisionProcedureAlgorithms
object can be configured according to the capabilities it needs to provide.
For the sake of simplicity, a JBSE decision procedure object recognizes a proper subset of the possible boolean clauses produced by an Engine
. As a consequence, no single decision procedure object is able to decide the satisfiability of all the clauses generated by all the possible symbolic executions. For this reason, decision procedure objects must be organized in a Chain of Responsibility: Whenever a decision procedure is unable to decide the satisfiability of a clause, it delegates the task to the next decision procedure in the chain.
A JBSE decision procedure class extends the abstract class jbse.dec.DecisionProcedure
, and provides a set of methods to check the satisfiability of all the possible boolean clauses that JBSE may produce. The class DecisionProcedureAlgorithms
, on the other hand, is a Decorator adding to an arbitrary DecisionProcedure
a set of methods that, for each bytecode instruction, perform the correct sequences of satisfiability queries necessary to implement the correct bytecode semantics. The necessary steps to configure the Engine
are therefore:
Creating a set of
DecisionProcedure
objects to decide a sufficient subset of the clauses that might appear in the path conditions of the program under analysis, and arranging them in a Chain of Responsibility;Wrapping the topmost
DecisionProcedure
in the Chain of Responsibility in aDecisionProcedureAlgorithms
object;Setting the
EngineParameters
orRunnerParameters
object via thesetDecisionProcedure(DecisionProcedureAlgorithms)
method.
Most decision procedures are declared in the package jbse.dec
. They are typically configured through their constructor, that is also used to set the next decision procedure in the Chain of Responsibility. The most important classes are:
jbse.dec.DecisionProcedureClassInit
: This decision procedure recognizes only the clauses that predicate on the initialization status of a class (i.e., whether a class must be assumed to be loaded before the start of symbolic execution or not). It is indispensable, and therefore shall always be present in a Chain of Responsibility of decision procedures.jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA
: This decision procedure interacts via console with any SMT solver that is compliant with the SMTLIB 2 standard and that supports the AUFNIRA logic. Currently the SMT solver is used to decide only the numeric clauses. The only SMT solvers we are sure that work with JBSE are Z3 and CVC4.jbse.dec.DecisionProcedureLICS
: This decision procedure implements the LICS rule language that allows to restrain the possible resolutions of (sets of) symbolic references. It therefore recognizes the clauses that predicate on symbolic references.jbse.dec.DecisionProcedureAlwSat
: This decision procedure is a dummy decision procedure that recognizes all the clauses and always answers that a clause is satisfiable. It must be used as the last decision procedure in the Chain of Responsibility.
These classes typically yield, when combined, a sufficiently powerful and flexible solver. A possible configuration example code follows:
import jbse.dec.DecisionProcedureAlwSat;
import jbse.dec.DecisionProcedureClassInit;
import jbse.dec.DecisionProcedureLICS;
import jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA;
import jbse.jvm.EngineParameters;
import jbse.rewr.CalculatorRewriting;
import jbse.rules.ClassInitRulesRepo;
import jbse.rules.LICSRulesRepo;
EngineParameters p = new EngineParameters();
...
CalculatorRewriting calc = new CalculatorRewriting();
...
p.setCalculator(calc)
ArrayList<String> z3CommandLine = new ArrayList<>();
z3CommandLine.add("/opt/local/bin/z3");
z3CommandLine.add("-smt2");
z3CommandLine.add("-in");
z3CommandLine.add("-t:100");
LICSRulesRepo licsRules = new LICSRulesRepo();
ClassInitRulesRepo initRules = new ClassInitRulesRepo();
...
DecisionProcedureAlwSat decAlwSat = new DecisionProcedureAlwSat(calc);
DecisionProcedureSMTLIB2_AUFNIRA decSMT = new DecisionProcedureSMTLIB2_AUFNIRA(decAlwSat, z3CommandLine);
DecisionProcedureLICS decLICS = new DecisionProcedureLICS(decSMT, licsRules)
DecisionProcedureClassInit decInit = new DecisionProcedureClassInit(decLICS, initRules);
DecisionProcedureAlgorithms decAlgo = new DecisionProcedureAlgorithms(decInit);
p.setDecisionProcedure(decAlgo);
We add some final remarks:
The
DecisionProcedureAlwSat
constructor accepts as parameter aCalculator
. It is good practice (although not strictly necessary) to pass the same calculator object passed to theEngineParameters
orRunnerParameters
object via thesetCalculator
method.A
DecisionProcedureSMTLIB2_AUFNIRA
must receive as a constructor parameter the command line that must be used to invoke the SMT solver. This is of course platform- and environment-dependent. In the above example we proposed a possible command line for invoking Z3 in a UNIX-like environment.Differently from other decision procedures, where it is possible to interactively send and retract assertions, the
DecisionProcedureLICS
andDecisionProcedureClassInit
objects must be configured at construction time by passing suitable objects (with classjbse.rules.LICSRulesRepo
andjbse.rules.ClassInitRulesRepo
, respectively) that gather the assertions about reference resolution and class initialization, respectively. We will discuss these constraints in a later section.
Building the engine¶
Once set an EngineParameters
object, we are ready to create an Engine
; It is sufficient to create an EngineBuilder
object and invoke its build
method as follows:
...
import jbse.jvm.EngineParameters;
...
EngineParameters p = new EngineParameters();
...
EngineBuilder b = new EngineBuilder();
Engine e = b.build(p);
Setting the listener¶
The procedure for creating a Runner
is mostly identical to the one for creating an Engine
, with the obvious difference that the involved classes are RunnerParameters
and RunnerBuilder
(the name of the methods are identical). The true distinguishing feature is that a RunnerParameters
object allows to register a listener for the symbolic execution process: Differently from an Engine
, that must be stepped bytecode-by-bytecode, a Runner
fully executes a method symbolically without interruption. To allow a degree of observability and controllability of a Runner
’s symbolic execution, this notifies a listener object extending the jbse.jvm.Runner.Actions
class at appropriate moments—before a bytecode step, after a bytecode step, at the entry of a method… An application may thus monitor an execution by defining a suitable listener object and registering it by invoking the setActions(Runner.Actions)
method of the RunnerParameters
object:
...
import jbse.jvm.Runner;
import jbse.jvm.Runner.Actions;
import jbse.jvm.RunnerBuilder;
import jbse.jvm.RunnerParameters;
...
RunnerParameters p = new RunnerParameters();
...
Actions a = new Runner.Actions() {
@Override
public boolean atStart() {
...
}
@Override
public boolean atStepPre() {
...
}
...
}
p.setActions(a);
RunnerBuilder b = new RunnerBuilder();
Runner r = b.build(p);
For maximum observability and controllability of the symbolic execution process it is possible at any time to extract the Engine
object underlying a Runner
.
Using the symbolic executor¶
The symbolic execution process performed by JBSE is simple and complex at the same time. On one hand the gist of the process is easy to grasp:
Execute the next bytecode;
If you are at a branch in the symbolic execution tree select a way to go (otherwise, follow the only way to go);
Repeat.
When there are no more bytecodes to execute (i.e., the state of symbolic execution is stuck) it is possible to backtrack to one of the previously visited branching points in the symbolic execution tree and follow an unvisited direction, until all the paths of the symbolic execution tree are covered, or some budget is exhausted, whatever comes first.
On the other hand, JBSE aims at being as conformant as possible to the Java Virtual Machine Specification v.8, and as behaviorally similar as possible to its Hotspot implementation. This means, for instance, that JBSE follows quite closely the Hotspot bootstrap process and loads and initializes the same large set of core classes, in the same order as Hotspot. The consequence of such a degree of compliance is that JBSE must manage complex and large application states, even before it starts executing the user’s code. Symbolic execution of a method is performed in phases: the pre-initial phase bootstraps the execution by loading and initializing the core classes, then an initial state is created where the frame of the target method is pushed on the stack of the state emerging from the pre-initial phase, and finally during the post-initial phase the target method is actually executed.
The Engine
class exposes a low-level interface to symbolic execution. After its creation, an Engine
must be initialized by invoking its init()
method, and after that it can be stepped by repeatedly invoking its step()
method as long as a successor state exists, a thing that can be checked by invoking the canStep()
method. This happens until the execution state gets stuck, or you decide to prematurely end the exploration of the current execution path by invoking the stopCurrentPath()
method.
import jbse.jvm.Engine;
Engine e;
...
e.init();
while (e.canStep()) {
e.step();
...
//possibly:
e.stopCurrentPath();
...
}
When the Engine
is at a branch point in the symbolic execution tree, it automatically selects one of the successor states as the next one. The step()
method returns a Memento jbse.tree.StateTree.BranchPoint
object, that can be used to detect whether the Engine
is passing through a same branch point again. To backtrack to the next pending branch in the symbolic execution tree invoke the backtrack()
method, not before checking whether such a pending branch exists with the method canBacktrack()
.
import jbse.jvm.Engine;
Engine e;
...
e.init();
while (true) {
while (e.canStep()) {
e.step();
...
}
if (e.canBacktrack()) {
e.backtrack();
} else {
break;
}
}
By stepping and backtracking it is possible to perform a depth-first visit of the symbolic execution tree.