Using JBSE

This section will describe in details how to use JBSE to symbolically execute Java bytecode programs and analyze them by means of assertions and assumptions.

The symbolic execution classes

The class that is responsible of performing symbolic execution of a program is jbse.jvm.Engine. It implements a symbolic JVM that can execute an arbitrary Java method in a step-by-step fashion, i.e., one bytecode at time. In the case a symbolic state has more than one successor, the Engine selects one of them as the next state, and stores the others in an internal data structure. At the end of a path, the Engine can be backtracked to one of the previously stored states.

The Engine class offers a low-level API, that allows a fine-grain control of symbolic execution, but is also quite tedious to use correctly. For this reason, JBSE offers the jbse.jvm.Runner class. A Runner object drives an Engine to perform an exhaustive symbolic execution of a method. To this end, the Runner repeatedly steps and backtracks the Engine until it visits all (or a suitable subset of) the states in the symbolic execution tree of the method under execution. Users can customize the behavior of a Runner object by registering a suitable listener object that provides a set of callback methods. The Runner will invoke the appropriate callback method upon occurrence of an event, e.g., before or after a bytecode step, at the entry of a method, upon backtrack, etc.

By injecting behavior through callbacks it is possible to customize the behavior of a Runner object and create applications. The jbse.apps.run.Run class is an example of an application that is done this way. As discussed in the introduction, this application symbolically executes a method and prints to the console information about the execution as, e.g., the traversed states, their path conditions, the shape of the symbolic execution tree, the violations of the assertions, etc. We will introduce the capabilities of the Run application in a later section, while this section focuses on the use of the Engine and Runner classes.

Creating a symbolic executor

The creation of an Engine is performed by a suitable Builder object of class jbse.jvm.EngineBuilder. This object exposes a single method EngineBuilder.build(EngineParameters), that accepts as argument an object of class jbse.jvm.EngineParameters and returns a new Engine. The EngineParameters object gathers the (many) configuration parameters that must be provided to create an Engine. Similarly, a Runner must be created via a jbse.jvm.RunnerBuilder object by invoking its RunnerBuilder.build(RunnerParameters) method. The jbse.jvm.RunnerParameters class offers a superset of the methods of the EngineParameter class, i.e., all the methods to configure the parameters necessary to create the Engine underlying the Runner, plus the methods to configure the parameters specific to the Runner (e.g., the listener). We now describe how an EngineParameters or a RunnerParameters object is set to suitably configure an Engine or a Runner.

Setting the paths

An Engine (and similarly a Runner) must know the paths where it can find the classes of the application that it must symbolically execute, and of the libraries the application invokes, including the standard Java library. Mirroring what Oracle’s as Hotspot JVM does, JBSE distinguishes the bootstrap classpath, where the core JDK classes reside, the extensions classpath, home of the classes that must be loaded with the extensions mechanism, and the user classpath, pointing to the classes that belong to the application to be run (the latter is what we usually mean when we talk about the “classpath of an application”). The bootstrap and the extensions classpath are, by default, automatically configured by the EngineParameters and RunnerParameters objects by detecting the bootstrap and the extensions classpath of the JDK on which JBSE executes. This means that in most cases you do not need to configure them manually. Should you ever need to override this default, you can set the bootstrap classpath by specifying the home directory of a Java 8 JDK setup via the EngineParameters.setJavaHome(String) or EngineParameters.setJavaHome(Path) methods. To modify the extensions classpath you may add paths to it by invoking EngineParameters.addExtClasspath(String...) or EngineParameters.addExtClasspath(Path...). If you want to completely override it, you need to first clear it from its default value by invoking EngineParameters.clearExtClasspath(). Finally, to add paths to the user classpath use either the EngineParameters.addUserClasspath(String...) or the EngineParameters.addExtClasspath(Path...) method (by default, the user classpath is empty). The RunnerParameter class offers identical methods.

Setting the target method

To specify which method the Engine must symbolically execute, invoke EngineParameter.setMethodSignature(String, String, String) with the signature of the method. The signature of the method is composed by three strings: The name of class the method belongs to, the method descriptor (a string listing the types of the method parameters and of the return value), and the method’s name. The name of the class must be specified in binary name format: It must be fully qualified with the name of the package, and possibly of the other classes, that contain it, the separator dots must be replaced by slash (/) characters in case the container on the left of the dot is a package, and by a dollar sign ($) in case the container is a class. For example the binary name of the class java.util.LinkedList is java/util/LinkedList, and the binary name of the nested class java.util.LinkedList.Node is java/util/LinkedList$Node. The descriptor is necessary to distinguish overloaded methods. It is composed by a list of parameters types, enclosed in parentheses, followed by the type of the return value, or V if the method is declared void. The types of the parameters must be encoded as type descriptors according to Table 4.3-A of the Java Virtual Machine Specification.

Table 1 Type descriptors for Java types

Java type

Type descriptor

byte

B

char

C

double

D

float

F

int

I

long

J

ClassName

L ClassBinaryName ;

JavaType[]

[TypeDescriptor

Table 1 reports the correspondence between Java types and types descriptors for reference. As an example, if we want to symbolically execute a method double m(int, Object[][]) of class foo.Baz, we must configure the EngineParameters object by invoking setMethodSignature("foo/Baz", "(I[[Ljava/lang/Object;)D", "m").

Setting the calculator

An Engine needs to be able to create, from time to time, new symbolic expression. When this happens, some basic manipulations are usually performed on the created symbolic expression at the purpose of simplifying it. For example, it is possible to configure an Engine so it, whenever it must add the symbol a to the number 0, simplifies the resulting expression a + 0 to a. To this end, an Engine depends on an object extending the abstract class jbse.val.Calculator, that it uses to create and manipulate all the symbolic expressions, and that performs all the simplifications on-the-fly. It is therefore necessary to inject the dependency to a suitable subclass of Calculator through the method setCalculator(Calculator) of the classes EngineParameters and RunnerParameters. Currently, the only concrete subclass of Calculator is the class jbse.rewr.CalculatorRewriting. A CalculatorRewriting applies a set of rewriting rules to simplify all the symbolic expressions it produces. It is possible to plug the rewriting rules, implemented as subclasses of jbse.rewr.RewriterCalculatorRewriting, by invoking the CalculatorRewriting.addRewriter(RewriterCalculatorRewriting) method. The package jbse.rewr contains a collection of rewriting rules performing some useful simplifications. The most important ones, that are essentially compulsory, are:

  • jbse.rewr.RewriterExpressionOrConversionOnSimplex: necessary to simplify all the expressions whose operands are numeric, e.g., to simplify 3 + 2 to 5;

  • jbse.rewr.RewriterFunctionApplicationOnSimplex: similar to the previous, where the operator is a (symbolic) function application as sin, cos, max, min

  • jbse.rewr.RewriterZeroUnit: simplifies some operations with zero or one that have trivial result: e.g., simplifies a * 0 to 0, and 1 * b to b;

  • jbse.rewr.RewriterNegationElimination: eliminates double negations simplifying, e.g., - (- a) to a.

The other rewriters in the package jbse.rewr can be used to simplify nonlinear expression with trigonometric operators and square roots. Historically they have been used to check properties involving distances in the Cartesian plane and polar-to-cartesian and their inverse coordinates conversions. A more mundane setup of JBSE would be as follows:

import jbse.jvm.EngineParameters;
import jbse.rewr.CalculatorRewriting;
import jbse.rewr.RewriterExpressionOrConversionOnSimplex;
import jbse.rewr.RewriterFunctionApplicationOnSimplex;
import jbse.rewr.RewriterNegationElimination;
import jbse.rewr.RewriterZeroUnit;
...

EngineParameters p = new EngineParameters();
...
CalculatorRewriting calc = new CalculatorRewriting();
calc.addRewriter(new RewriterExpressionOrConversionOnSimplex());
calc.addRewriter(new RewriterFunctionApplicationOnSimplex());
calc.addRewriter(new RewriterZeroUnit());
calc.addRewriter(new RewriterNegationElimination());
p.setCalculator(calc);

Unfortunately the order the rewriters are added to the calculator matters. Moreover, some rewriters depend on the presence of other rewriters. Refer the Javadoc of the rewriters classes for more information.

Setting the decision procedures

To prune the unfeasible branches of the symbolic execution tree an Engine must decide whether a symbolic expression of boolean type is satisfiable. To this end, the Engine uses an object with class jbse.dec.DecisionProcedureAlgorithms. Upon configuration it is necessary to create this object and inject the dependency by invoking the setDecisionProcedure(DecisionProcedureAlgorithms) method exposed by the EngineParameters and RunnerParameters classes. This because a DecisionProcedureAlgorithms object can be configured according to the capabilities it needs to provide.

For the sake of simplicity, a JBSE decision procedure object recognizes a proper subset of the possible boolean clauses produced by an Engine. As a consequence, no single decision procedure object is able to decide the satisfiability of all the clauses generated by all the possible symbolic executions. For this reason, decision procedure objects must be organized in a Chain of Responsibility: Whenever a decision procedure is unable to decide the satisfiability of a clause, it delegates the task to the next decision procedure in the chain.

A JBSE decision procedure class extends the abstract class jbse.dec.DecisionProcedure, and provides a set of methods to check the satisfiability of all the possible boolean clauses that JBSE may produce. The class DecisionProcedureAlgorithms, on the other hand, is a Decorator adding to an arbitrary DecisionProcedure a set of methods that, for each bytecode instruction, perform the correct sequences of satisfiability queries necessary to implement the correct bytecode semantics. The necessary steps to configure the Engine are therefore:

  • Creating a set of DecisionProcedure objects to decide a sufficient subset of the clauses that might appear in the path conditions of the program under analysis, and arranging them in a Chain of Responsibility;

  • Wrapping the topmost DecisionProcedure in the Chain of Responsibility in a DecisionProcedureAlgorithms object;

  • Setting the EngineParameters or RunnerParameters object via the setDecisionProcedure(DecisionProcedureAlgorithms) method.

Most decision procedures are declared in the package jbse.dec. They are typically configured through their constructor, that is also used to set the next decision procedure in the Chain of Responsibility. The most important classes are:

  • jbse.dec.DecisionProcedureClassInit: This decision procedure recognizes only the clauses that predicate on the initialization status of a class (i.e., whether a class must be assumed to be loaded before the start of symbolic execution or not). It is indispensable, and therefore shall always be present in a Chain of Responsibility of decision procedures.

  • jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA: This decision procedure interacts via console with any SMT solver that is compliant with the SMTLIB 2 standard and that supports the AUFNIRA logic. Currently the SMT solver is used to decide only the numeric clauses. The only SMT solvers we are sure that work with JBSE are Z3 and CVC4.

  • jbse.dec.DecisionProcedureLICS: This decision procedure implements the LICS rule language that allows to restrain the possible resolutions of (sets of) symbolic references. It therefore recognizes the clauses that predicate on symbolic references.

  • jbse.dec.DecisionProcedureAlwSat: This decision procedure is a dummy decision procedure that recognizes all the clauses and always answers that a clause is satisfiable. It must be used as the last decision procedure in the Chain of Responsibility.

These classes typically yield, when combined, a sufficiently powerful and flexible solver. A possible configuration example code follows:

import jbse.dec.DecisionProcedureAlwSat;
import jbse.dec.DecisionProcedureClassInit;
import jbse.dec.DecisionProcedureLICS;
import jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA;
import jbse.jvm.EngineParameters;
import jbse.rewr.CalculatorRewriting;
import jbse.rules.ClassInitRulesRepo;
import jbse.rules.LICSRulesRepo;

EngineParameters p = new EngineParameters();
...
CalculatorRewriting calc = new CalculatorRewriting();
...
p.setCalculator(calc)

ArrayList<String> z3CommandLine = new ArrayList<>();
z3CommandLine.add("/opt/local/bin/z3");
z3CommandLine.add("-smt2");
z3CommandLine.add("-in");
z3CommandLine.add("-t:100");
LICSRulesRepo licsRules = new LICSRulesRepo();
ClassInitRulesRepo initRules = new ClassInitRulesRepo();
...
DecisionProcedureAlwSat decAlwSat = new DecisionProcedureAlwSat(calc);
DecisionProcedureSMTLIB2_AUFNIRA decSMT = new DecisionProcedureSMTLIB2_AUFNIRA(decAlwSat, z3CommandLine);
DecisionProcedureLICS decLICS = new DecisionProcedureLICS(decSMT, licsRules)
DecisionProcedureClassInit decInit = new DecisionProcedureClassInit(decLICS, initRules);
DecisionProcedureAlgorithms decAlgo = new DecisionProcedureAlgorithms(decInit);
p.setDecisionProcedure(decAlgo);

We add some final remarks:

  • The DecisionProcedureAlwSat constructor accepts as parameter a Calculator. It is good practice (although not strictly necessary) to pass the same calculator object passed to the EngineParameters or RunnerParameters object via the setCalculator method.

  • A DecisionProcedureSMTLIB2_AUFNIRA must receive as a constructor parameter the command line that must be used to invoke the SMT solver. This is of course platform- and environment-dependent. In the above example we proposed a possible command line for invoking Z3 in a UNIX-like environment.

  • Differently from other decision procedures, where it is possible to interactively send and retract assertions, the DecisionProcedureLICS and DecisionProcedureClassInit objects must be configured at construction time by passing suitable objects (with class jbse.rules.LICSRulesRepo and jbse.rules.ClassInitRulesRepo, respectively) that gather the assertions about reference resolution and class initialization, respectively. We will discuss these constraints in a later section.

Building the engine

Once set an EngineParameters object, we are ready to create an Engine; It is sufficient to create an EngineBuilder object and invoke its build method as follows:

...
import jbse.jvm.EngineParameters;
...

EngineParameters p = new EngineParameters();
...
EngineBuilder b = new EngineBuilder();
Engine e = b.build(p);

Setting the listener

The procedure for creating a Runner is mostly identical to the one for creating an Engine, with the obvious difference that the involved classes are RunnerParameters and RunnerBuilder (the name of the methods are identical). The true distinguishing feature is that a RunnerParameters object allows to register a listener for the symbolic execution process: Differently from an Engine, that must be stepped bytecode-by-bytecode, a Runner fully executes a method symbolically without interruption. To allow a degree of observability and controllability of a Runner’s symbolic execution, this notifies a listener object extending the jbse.jvm.Runner.Actions class at appropriate moments—before a bytecode step, after a bytecode step, at the entry of a method… An application may thus monitor an execution by defining a suitable listener object and registering it by invoking the setActions(Runner.Actions) method of the RunnerParameters object:

...
import jbse.jvm.Runner;
import jbse.jvm.Runner.Actions;
import jbse.jvm.RunnerBuilder;
import jbse.jvm.RunnerParameters;
...

RunnerParameters p = new RunnerParameters();
...
Actions a = new Runner.Actions() {
  @Override
  public boolean atStart() {
    ...
  }

  @Override
  public boolean atStepPre() {
    ...
  }

  ...
}
p.setActions(a);

RunnerBuilder b = new RunnerBuilder();
Runner r = b.build(p);

For maximum observability and controllability of the symbolic execution process it is possible at any time to extract the Engine object underlying a Runner.

Using the symbolic executor

The symbolic execution process performed by JBSE is simple and complex at the same time. On one hand the gist of the process is easy to grasp:

  • Execute the next bytecode;

  • If you are at a branch in the symbolic execution tree select a way to go (otherwise, follow the only way to go);

  • Repeat.

When there are no more bytecodes to execute (i.e., the state of symbolic execution is stuck) it is possible to backtrack to one of the previously visited branching points in the symbolic execution tree and follow an unvisited direction, until all the paths of the symbolic execution tree are covered, or some budget is exhausted, whatever comes first.

On the other hand, JBSE aims at being as conformant as possible to the Java Virtual Machine Specification v.8, and as behaviorally similar as possible to its Hotspot implementation. This means, for instance, that JBSE follows quite closely the Hotspot bootstrap process and loads and initializes the same large set of core classes, in the same order as Hotspot. The consequence of such a degree of compliance is that JBSE must manage complex and large application states, even before it starts executing the user’s code. Symbolic execution of a method is performed in phases: the pre-initial phase bootstraps the execution by loading and initializing the core classes, then an initial state is created where the frame of the target method is pushed on the stack of the state emerging from the pre-initial phase, and finally during the post-initial phase the target method is actually executed.

The Engine class exposes a low-level interface to symbolic execution. After its creation, an Engine must be initialized by invoking its init() method, and after that it can be stepped by repeatedly invoking its step() method as long as a successor state exists, a thing that can be checked by invoking the canStep() method. This happens until the execution state gets stuck, or you decide to prematurely end the exploration of the current execution path by invoking the stopCurrentPath() method.

import jbse.jvm.Engine;

Engine e;
...
e.init();
while (e.canStep()) {
  e.step();

  ...
  //possibly:
  e.stopCurrentPath();
  ...
}

When the Engine is at a branch point in the symbolic execution tree, it automatically selects one of the successor states as the next one. The step() method returns a Memento jbse.tree.StateTree.BranchPoint object, that can be used to detect whether the Engine is passing through a same branch point again. To backtrack to the next pending branch in the symbolic execution tree invoke the backtrack() method, not before checking whether such a pending branch exists with the method canBacktrack().

import jbse.jvm.Engine;

Engine e;
...
e.init();
while (true) {
  while (e.canStep()) {
    e.step();
    ...
  }
  if (e.canBacktrack()) {
    e.backtrack();
  } else {
    break;
  }
}

By stepping and backtracking it is possible to perform a depth-first visit of the symbolic execution tree.