esbmc / esbmc Goto Github PK

The efficient SMT-based context-bounded model checker (ESBMC)

License: Other

C++ 43.14% C 49.16% Yacc 0.04% Shell 0.14% Makefile 0.25% Python 0.60% Perl 0.21% CMake 0.54% HTML 0.82% SWIG 2.31% QMake 0.01% PowerShell 0.01% Scilab 1.60% Solidity 0.43% Kotlin 0.02% Vim Script 0.01% Cuda 0.74% Crystal 0.01%

c cpp solidity-contracts bmc incremental-learning k-induction smt-solver kotlin automated-testing automated-verification

esbmc's Introduction

ESBMC-CHERI Video & Download

This video describes how to obtain, build and run ESBMC-CHERI on an example.

A pre-compiled binary for Linux is available in the pre-release ESBMC-CHERI, for other systems/archs the BUILDING.md document explains the necessary installation steps.

The ESBMC model checker

ESBMC (the Efficient SMT-based Context-Bounded Model Checker) is a mature, permissively licensed open-source context-bounded model checker for verifying single- and multithreaded C/C++, CUDA, CHERI, Kotlin, Python, and Solidity programs. It can automatically verify predefined safety properties (e.g., bounds check, pointer safety, overflow) and user-defined program assertions. In addition, ESBMC supports the Clang compiler as its C/C++/CHERI/CUDA frontend, the Soot framework via Jimple as its Java/Kotlin frontend, the ast2json package as its Python frontend, implements the Solidity grammar production rules as its Solidity frontend, and IEEE floating-point arithmetic for various SMT solvers. In addition, ESBMC implements state-of-the-art incremental BMC and k-induction proof-rule algorithms based on Satisfiability Modulo Theories (SMT) and Constraint Programming (CP) solvers.

To build ESBMC, please see the BUILDING file. To get started, we recommend first reading some of the background material/publications to understand exactly what this technique can provide, for example, our SV-COMP papers, which are available online.

We also provide a short video that explains ESBMC:

https://www.youtube.com/watch?v=uJ5Jn0sxm08&t=2182s

In a workshop between Arm Research and the University of Manchester, this video was delivered as part of a technical talk on exploiting the SAT revolution for automated software verification.

We offer a post-graduate course in software security that explains the internals of ESBMC.

https://ssvlab.github.io/lucasccordeiro/courses/2020/01/software-security/index.html

This course unit introduces students to basic and advanced approaches to formally building verified trustworthy software systems, where trustworthiness comprises five attributes: reliability, availability, safety, resilience and security.

The canonical public location of ESBMCs source is on GitHub:

https://github.com/esbmc/esbmc

While our main website is esbmc.org.

Architecture

The figure below illustrates the current ESBMC architecture. The tool inputs a C/C++/CUDA, Java/Kotlin, Solidity, or CHERI-C program, then converts an abstract syntax tree (AST) into a state transition system called a GOTO program. Its symbolic execution engine unrolls the GOTO program and generates a sequence of static single assignments (SSAs). The SSAs are then converted to an SMT formula, which is satisfiable if and only if the program contains errors.

Awards

Distinguished Paper Award at ICSE’11
Best Paper Award at SBESC’15
Most Influential Paper Award at ASE’23
Best Tool Paper Award at SBSeg'23
29 awards from international competitions on software verification (SV-COMP) and testing (Test-Comp) 2012-2023 at TACAS/FASE (Strength: Bug Finding and Code Coverage).

Selected Publications

Yiannis Charalambous, Norbert Tihanyi, Ridhi Jain, Youcheng Sun, Mohamed Amine Ferrag, Lucas C. Cordeiro. A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification. Technical Report, CoRR abs/2305.14752, 2023. DOI
Rafael Menezes, Daniel Moura, Helena Cavalcante, Rosiane de Freitas, Lucas C. Cordeiro . ESBMC-Jimple: verifying Kotlin programs via jimple intermediate representation In ISSTA'22, pp. 777-780, 2022. DOI
Franz Brauße, Fedor Shmarov, Rafael Menezes, Mikhail R. Gadelha, Konstantin Korovin, Giles Reger, Lucas C. Cordeiro. ESBMC-CHERI: towards verification of C programs for CHERI platforms with ESBMC In ISSTA'22, pp. 773-776, 2022. DOI
Felipe R. Monteiro, Mikhail R. Gadelha, Lucas C. Cordeiro. Model checking C++ programs. In Softw. Test. Verification Reliab. 32(1), 2022. DOI, Video, Open access.
Mikhail R. Gadelha, Lucas C. Cordeiro, Denis A. Nicole. An Efficient Floating-Point Bit-Blasting API for Verifying C Programs. In VSTTE, pp. 178-195, 2020. DOI
Mikhail Y. R. Gadelha, Felipe R. Monteiro, Jeremy Morse, Lucas C. Cordeiro, Bernd Fischer, Denis A. Nicole. ESBMC 5.0: an industrial-strength C model checker. In ASE, pp. 888-891, 2018. DOI
Jeremy Morse, Lucas C. Cordeiro, Denis A. Nicole, Bernd Fischer. Model checking LTL properties over ANSI-C programs with bounded traces. In Softw. Syst. Model. 14(1), pp. 65-81, 2015. DOI
Mikhail Y. R. Gadelha, Hussama Ibrahim Ismail, Lucas C. Cordeiro. Handling loops in bounded model checking of C programs via k-induction. In Int. J. Softw. Tools Technol. Transf. 19(1), pp. 97-114, 2017. DOI
Phillipe A. Pereira, Higo F. Albuquerque, Isabela da Silva, Hendrio Marques, Felipe R. Monteiro, Ricardo Ferreira, Lucas C. Cordeiro. SMT-based context-bounded model checking for CUDA programs. In Concurr. Comput. Pract. Exp. 29(22), 2017. DOI
Lucas C. Cordeiro, Bernd Fischer, João Marques-Silva. SMT-Based Bounded Model Checking for Embedded ANSI-C Software. In IEEE Trans. Software Eng. 38(4), pp. 957-974, 2012. DOI
Lucas C. Cordeiro, Bernd Fischer. Verifying multi-threaded software using smt-based context-bounded model checking. In ICSE, pp. 331-340, 2011. DOI

Features

ESBMC detects errors in software by simulating a finite prefix of the program execution with all possible inputs. Classes of implementation errors that can be detected include:

User-specified assertion failures
Out-of-bounds array access
Illegal pointer dereferences, such as:
- Dereferencing null
- Performing an out-of-bounds dereference
- Double-free of malloc'd memory
- Misaligned memory access
Integer overflows
Undefined behavior on shift operations
Floating-point for NaN
Divide by zero
Memory leaks

Concurrent software (using the pthread api) is verified by explicitly exploring interleavings, producing one symbolic execution per interleaving. By default, pointer-safety, array-out-of-bounds, division-by-zero, and user-specified assertions will be checked for; one can also specify options to check concurrent programs for:

Deadlock (only on pthread mutexes and convars)
Data races (i.e. competing writes)
Atomicity violations at visible assignments
Lock acquisition ordering

By default, ESBMC performs a "lazy" depth-first search of interleavings -- it can also encode (explicitly) all interleavings into a single SMT formula.

Many SMT solvers are currently supported:

Z3 4.8+
Bitwuzla
Boolector 3.0+
MathSAT
CVC4
Yices 2.2+

In addition, ESBMC can be configured to use the SMTLIB interactive text format with a pipe to communicate with an arbitrary solver process, although not-insignificant overheads are involved.

A limited subset of C++98/03 is supported, too -- a library modeling the STL is also available.

Differences from CBMC

ESBMC is a fork of CBMC v2.9 (2008), the C Bounded Model Checker. The primary differences between the two are:

CBMC focuses on SAT-based encodings of unrolled programs, while ESBMC targets SMT-based encodings.
CBMC's concurrency support is an entirely symbolic encoding of a concurrent program in one SAT formula, while ESBMC explores each interleaving individually using context-bounded verification.
CBMC uses a modified C parser written by James Roskind and a C++ parser based on OpenC++, while ESBMC relies on the Clang front-end.
ESBMC implements the Solidity grammar production rules as its Solidity frontend, while CBMC does not implement a Solidity frontend.
ESBMC verifies Kotlin programs with a model of the standard Kotlin libraries and checks a set of safety properties, while CBMC cannot handle Kotlin programs.
CBMC implements k-induction, requiring three different calls: to generate the CFG, to annotate the program, and to verify it, whereas ESBMC handles the whole process in a single call. Additionally, CBMC does not have a forward condition to check if all states were reached and relies on a limited loop unwinding.
ESBMC adds some additional types to the program's internal representation.

Open source

ESBMC has now been released as open-source software -- mainly distributed under the terms of the Apache License 2.0. ESBMC contains a significant amount of other people's software. However, please see the COPYING file for an explanation of who-owns-what and under what terms they are distributed.

We'd be extremely happy to receive contributions to make ESBMC better (under the terms of the Apache License 2.0). Please file a pull request against the public GitHub repo if you'd like to submit anything. General discussion and release announcements will be made via GitHub. Please post an issue on GitHub to contact us about research or collaboration.

Getting started

We need a better guide for getting started with ESBMC, although we hope to improve this in the future. Examining some of the benchmarks in the SV-COMP competition (http://sv-comp.sosy-lab.org/) would be a good start, using the ESBMC command line for the relevant competition year.

Contributing to the code base

Here are some steps to contributing to the code base:

Compile and execute esbmc. Building
Fork the repository
Clone the repository git clone [email protected]:YOURNAME/esbmc.git
Create a branch from the master branch (default branch)
Make your changes
Check the formatting with clang-format (use Clang 9)
Push your changes to your branch
Create a Pull Request targeting the master branch

Here is an example of preparing a pull request (PR)

A) Ensure you are in the master branch and your fork is updated.

git checkout master
git fetch upstream
git pull --rebase upstream master
git push origin HEAD:master

Note that if you have not yet setup the upstream, you need to type the following command:

git remote add upstream https://github.com/esbmc/esbmc

B) Create a local branch (e.g., model-pthread-create) from the master branch:

git checkout -b model-pthread-equal --track master

C) Add your changes via commits to the local branch:

git add path-to-file/file.cpp
git commit -sm "added opertational model for pthread_equal"

Note that you can check your changes via git status. Note also that every PR should contain at least two test cases to check your implementation: one successful and one failed test case.

D) Push your changes in the local branch to the ESBMC repository:

git push origin model-pthread-equal

New contributors can check issues marked with good first issue by clicking here.

Documentation

A limited number of classes have been marked up with doxygen documentation headers. Comments are put in the header files declaring classes and methods. HTML documentation can be generated by running:

doxygen .doxygen

The output will be in docs/HTML; open index.html to get started.

Acknowledgments

ESBMC is a joint project with the Federal University of Amazonas (Brazil), University of Manchester (UK), University of Southampton (UK), and University of Stellenbosch (South Africa).

The ESBMC development was supported by various research funding agencies, including CNPq (Brazil), CAPES (Brazil), FAPEAM (Brazil), EPSRC (UK), Royal Society (UK), British Council (UK), European Commission (Horizon 2020), and companies including ARM, Intel, Motorola Mobility, Nokia Institute of Technology and Samsung. The ESBMC development is currently funded by ARM, EPSRC grants EP/T026995/1 and EP/V000497/1, Ethereum Foundation, EU H2020 ELEGANT 957286, Intel, Motorola Mobility (through Agreement N° 4/2021), and Soteria project awarded by the UK Research and Innovation for the Digital Security by Design (DSbD) Programme.

esbmc's People

Contributors

Stargazers

Watchers

Forkers

jungyh0218 ssvlab hbgit linkitools jeremygu rjmunro neo4reo rainoftime elseviersoftwarex donatoclun lucasccordeiro feliperodri pablodiego alecs184 lennonchaves kaled-alshmrany fatimahkj vanderson-rocha 85lx pmatos drandynisbet coolboy43 beyonddream-productions alaafkirin ahmedhashwa kunjsong01 janislley jhsorge grassgod shmarovfedor owenfucell ignaciolopezmiguel rikayan daniel-yrm zhenkuny hymnoflight fbrausse ericksonalves jacobcairncross research-fork ar4s-eth maktheus correcthorsebatterystaple georgerennie matildemark juliana039 chenfengwei1 will-leeson helena345 elegant-h2020 capuanob chenfengwei0 gxf1986 z5146542 anishamadouche research-tools-pavt tcm3nc dangyajun lawleymax johnlindonrobinson vaibtan eyaget xlizhi brcfarias kai-zhichengzhou ru0ch3n grosa1 maysjhd yiannis128 nmannall anthonysdu alanpjohn rain6851 linuer rafaelsamenezes liyufeiat001 pengzeshan jihopark01 chriszai jacobyiu adilanwar2399 posticarus intrigus-lgtm prototypec14 haru-02

esbmc's Issues

pthreadtypes.h redefines __clockid_t inconsistently on different platforms

This C++ test:

https://gist.github.com/jmorse/85d2d244fc1c913e77a2

Does not compile on ubuntu:

~/esbmc ~/2mutex_wait_c++.cpp -I ~esbmc/cpp/includes
ESBMC version 1.25.4 64-bit x86_64 linux
file /home/jmorse/2mutex_wait_c++.cpp: Parsing
Converting
Type-checking /home/jmorse/2mutex_wait_c++.cpp
file pthread.h line 606: found no match for symbol `__clockid_t', candidates are:
  type int
  type int

CONVERSION ERROR

It works fine on debian wheezy.

Why don't we do value-set analysis for constants?

Consider the following program:

int thread_1 = 0;
int thread_2 = 0;

int main()
{
  if(nondet_bool())
    thread_2 = 1;

  if(nondet_bool())
    thread_1 = 1;

  assert(thread_1 == 2 || thread_2 == 2);
}

ESBMC calls the solver for this program, but if we're using value-set analysis not only for pointers, we could statically solve this problem.

Of course, that will most likely increase the memory usage.

De-stitching in value_sett drops alignment information

An sv-comp bodge in value_sett currently extracts a variety of pointer information from chains of concats / byte_extracts / bitcasts by simply ignoring those ireps. However, they change the alignment of the corresponding pointed-at exprs. At the very least the alignment information should be dropped

(This is sv-comp cleanup).

Phi functions with identical outcomes should be flattened

i.e., if an if-then-else has the same expression on either side, we should eliminate it. This isn't an existing simplification.

Constant offset sometimes not detected during dereference

In #60, with the code in [0], we ultimately decompose the struct-type assignments into byte array operations, because malloc allocates arrays of bytes. This is fine. However, the dereferencing process doesn't appear to correctly identify that the index into the malloc'd 'out' array is constant (when unrolled). This results in a lot of dynamic operations on the array, which is going to be a lot more expensive.

This constant offset situation should be completely detectable, and the program should be decomposed into assignments to explicitly identified elements in the underlying byte array.

[0] https://gist.github.com/jmorse/3156debe8d2e90350add

simplify reachability tree using incremental SMT solving

Consider the following multi-threaded C program:

# include <pthread.h>
# include <assert.h>

int g=nondet_int(), x;
void *t1(void *arg) {
  int i=0;
  if (g+i>=0) {
    i++;
    g=i;
  } else {
    x++;
  }
  return NULL;
}

void *t2(void *arg) {
  int i=0;
  if (g+i>=0) {
    i++;
    g=i;
  } else {
    x++;
  }
  return NULL;
}

int main() {
  pthread_t id1, id2;
  __ESBMC_assume(g>=0);
  pthread_create(&id1, NULL, t1, NULL);
  pthread_create(&id2, NULL, t2, NULL);
  assert(g>=0);
  return 0;
}

If we run this program as

$time esbmc test.c --z3

then we get 386 thread interleavings in aprox. 1s.

If we run this program as:

$time esbmc test.c --smt-during-symex --smt-thread-guard --smt-symex-guard --z3

then we get 173 thread interleavings in aprox. 5s.

If we run this program as:

$time esbmc test.c --smt-during-symex --smt-symex-guard --z3

then we also get 173 thread interleavings, but in aprox. 2s.

I think we can improve that check about the smt-symex-guard, i.e., come up with "heuristics" for not excessively call the SMT solver during symbolic execution.

There is already some code available at symex_goto.c (https://github.com/esbmc/esbmc-private/tree/simplify_rt), where I have introduced the following method:

bool goto_symext::is_global_expr(const expr2tc &expr)

to check whether a given guard contains global variables and also:

has_op(const expr2tc &e)

to limit the number/type of operands to a given expression.

The idea is to come up with "heuristics" for not excessively call the SMT solver at goto_symext::symex_goto(const expr2tc &old_guard) via ask_solver_question().

terminate called after throwing an instance of 'type2t::symbolic_type_excp*'

This C file was generated by means of GCC pre-processing.

gcc_output_esbmc.c.zip

$esbmc gcc_output_cbmc.c --no-bounds-check --no-pointer-check --no-div-by-zero-check --clang-frontend

Unwinding loop 206 iteration 1 file state-space.h line 202 function fxp_state_space_representation
Unwinding loop 207 iteration 3 file state-space.h line 201 function fxp_state_space_representation
Unwinding loop 208 iteration 1 file state-space.h line 208 function fxp_state_space_representation
Unwinding loop 209 iteration 1 file state-space.h line 207 function fxp_state_space_representation
**** WARNING: no body for function c::_assert
terminate called after throwing an instance of 'type2t::symbolic_type_excp*'
Aborted (core dumped)

CBMC is able to correctly verify it in less than 5 seconds (if we replace __ESBMC_assume by __CPROVER_assume).

Investigate a 'switch' irep

Given a vector of guards and expressions, we can avoid encoding a very long if-chain by producing a free variable, then asserting values into the free variable using an implication. I.E in not-even-pseudo-code:

type something = new free variable;
assert guard-a-is-true -> something == value1;
assert guard-b-is-true -> something == value2;
...
assert guard-n-is-true -> something == valuen;

This will reduce the complexity of SMT AST that has to be manipulated by the solver. It might also just generally improve the performance of the solver and reduce state space (more assertions, etc). It'd also mean that variable consolidation at the end of a switch block is much simpler to encode (which will improve the performance of what Gennaro is doing right now).

ETA: not now.

C++ frontend accepts arrays of void

Example program:

https://gist.github.com/jmorse/c52d7250e4c1b8b91456

This is rejected by g++ and clang++, but accepted by ESBMC. While I don't think detecting 100% of compilation problems is going to happen, this one is particularly painful because the C++ library uses and depends on this (undefined) behaviour.

apply incremental SMT to goto_symext::claim

@jmorse: Is there some way to stop the symbolic execution as soon as we find a property violation during incremental SMT?

Take a look at this code:

void
goto_symext::claim(const expr2tc &claim_expr, const std::string &msg) {
  if (unwinding_recursion_assumption)
    return ;

  // Can happen when evaluating certain special intrinsics. Gulp.
  if (cur_state->guard.is_false())
    return;

  total_claims++;

  expr2tc new_expr = claim_expr;
  cur_state->rename(new_expr);

  // first try simplifier on it
  do_simplify(new_expr);

  if (is_true(new_expr))
    return;

  auto rte = std::dynamic_pointer_cast<runtime_encoded_equationt>(target);
  equality2tc question(true_expr, new_expr);
  symex_slicet slicer;
  symex_target_equationt eq = dynamic_cast<const symex_target_equationt&>(*target);
  slicer.slice_for_symbols(eq, question);

  tvt res = rte.get()->ask_solver_question(question);

  if (res.is_false())
    //stop symbolic execution and report the counterexample
  else if (res.is_true())
    return ;
}

As soon as we get FALSE, I would like to stop the symbolic execution and report the counterexample.

Leaked memory detection won't work in --schedule

Due to some fixes just applied on the 'mpor' branch to stop it from crashing. This is a minor problem.

value set xchg values not erased

As observed with Mikhail right now value_sett::__ESBMC_xchg_ptr@61!0[] and so forth keep on popping up in the value set tracking, where they should be deleted.

C++ front-end is broken since ESBMC v3.0

If we run ~/esbmc-private/src/esbmc/esbmc double-linked-list.cpp double-linked-list_app.cpp -I ~/esbmc-private/src/cpp/library/, ESBMC reports:

Generating GOTO Program
GOTO program creation time: 0.953s
GOTO program processing time: 0.053s
No solver specified; defaulting to Boolector
Starting Bounded Model Checking
Symbolic type id in type_byte_size
symbol
- symbol_name : cpp::tag.node
  Aborted (core dumped)

The C++ front-end seems to be completely broken since ESBMC v3.0. Is it related to some refactoring about structs/unions?

Here are the files:

//double-linked-list.cpp
# include <iostream>
# include <cassert>
# include "double-linked-list.h"

using namespace std;

mlist::mlist(node\* h) : head(h)
{
    cout << "List is being created" << endl;
    head=NULL;
}

node\* mlist::search_node(int k)
{
    node\* l = new node();
    l=head;
    while(l!=NULL && l->key!=k)
    {
        l = l->next;
    }
    return l;
}

int mlist::delete_node(node *l)
{
    if (l->previous!=NULL)
        l->previous->next = l->next;
    else
        head = l->next;


if (l->next!=NULL)
    l->next->previous=l->previous;

return 0;


}

int mlist::insert_node(int k)
{
    node\* tmp = new node();
    tmp->key = k;
    tmp->next = head;
    if (head!=NULL)
        head->previous =tmp;
    head = tmp;
    tmp->previous = NULL;


return 0;


}

node\* mlist::get_head()
{
    return head;
}

//double-linked-list_app.cpp
# include <iostream>
# include <cassert>
# include "double-linked-list.h"

using namespace std;

int main(void){

mlist *mylist = new mlist(NULL);

mylist->insert_node(2);
mylist->insert_node(5);
mylist->insert_node(1);
mylist->insert_node(3);

node *temp = mylist->get_head();

cout << "list all elements: " << endl;

while(temp)
{
    cout << temp->key << endl;
    temp = temp->next;
}

temp = mylist->search_node(2);
cout << "search for element 2: " << (temp->key == 2 ? "found" : "not found") << endl;
assert(temp->key == 2);

mylist->delete_node(temp);

temp = mylist->get_head();

cout << "list all elements except 2: " << endl;
while(temp)
{
    cout << temp->key << endl;
    temp = temp->next;
}
}

Stack trace printing in cex is poor

i.e., it's not clear which function the line number belongs to. Could be a lot better.

Distinguish array and struct 'with''s

Currently more than a little code is devoted to working out when a 'with' applies to a struct or an array, and making decisions based on that. If we changed array operations to 'store' instead, everyone would be a lot happier.

Regression tests should have a "no simplify" mode

I've noticed that at least one test (01_cbmc_Fixedbv8) only works because the assertion is simplified out; the actual implementation of the operation it tests in the SMT backend is broken.

We should be able to run our regression tests in some kind of 'no simplify' mode (i.e. with the --no-simplify option) so that we know all our behaviours are correct in the way we're modeling them, and there isn't some simplification gunk getting in the way. This will be slower, but meh.

Configure TRAVIS CI to run ESBMC regression tests

@feliperodri: can you please talk to Lennon to configure TRAVIS CI in order to run the ESBMC regression for every commit in github? Lennon has already done this for the DSVerifier project.

Assertion not printed on memory deref failures

In some cases, an assertion isn't printed when a dereference fails. This can be replicated in c380f763 in 03_nec13 with boolector. I figure it's because the memory dereference is being sliced out, but the assertion about its correctness isn't, leaving nothing to be printed as the failing instruction.

terminate called after throwing an instance of 'array_type2t::dyn_sized_array_excp*'

ESBMC fails for this particular program when we call it as:

esbmc main.c --unwind 10

# include <stdio.h>
# include <stdlib.h>
# include <assert.h>

typedef int __nodetype;

typedef struct node {
    __nodetype key;
    struct node *next;
} NODE;

static unsigned int vertices;

int get_graph_size(char *argv[]) {
  unsigned int size=0;
  int a, b;
  FILE *fp;
  fp = fopen(argv[1],"r");
  if (!fp) {
    printf("Failed to open the file %s.\n",argv[1]);
    return -1;
  }
  while (!feof(fp)) {
    fscanf(fp,"%d%d", &a, &b);
    size = a > size ? a : size;
    size = b > size ? b : size;
  }

  ++size;
  printf("graph size: %d\n", size);

  return size;
}

void insert_node(NODE* list[], int a, int b){
    NODE* l = (NODE*)malloc(sizeof(NODE));
    if (list[a] == NULL) {
        l->key = b;
        l->next = NULL;
    } else {
        l->key = b;
        l->next = list[a];
    }
    list[a] = l;
}

void print_adjacent_list(NODE *list[]){
  int i;
  NODE *tmp;
  printf("\nPrinting adjacent list...\n\n");
  for(i=0; i<vertices; i++) {
    if (list[i]!=NULL) {
      printf("(%d) ==> %d ", i, list[i]->key) ;
      tmp = list[i]->next;
      while (tmp != NULL) {
        printf("==> %d  ", tmp->key);
        tmp = tmp->next;
      }
    }
    printf("\n");
  }
}

void print_adjacent_matrix(__nodetype matrix[vertices][vertices]) {
  int i, j;
  printf("\nPrinting adjacent matrix...\n\n");
  for(i=0; i<vertices; i++) {
    for(j=0; j<vertices; j++) {
      printf("%d ", matrix[i][j]);
    }
    printf("\n");
  }
}

int create_adjacent_list(char *argv[], NODE *list[]) {
  int i, a, b, prev_a, prev_b;
  FILE *fp;
  fp = fopen(argv[1],"r");

  for(i=0; i<vertices; i++)
    list[i]=NULL;

  if (!fp) {
    printf("Failed to open the file %s.\n",argv[1]);
    return -1;
  }

  while (!feof(fp)) {
    fscanf(fp,"%d%d", &a, &b);
    if (prev_a != a || prev_b != b) {
          printf("a: %d, b: %d\n", a, b);
      insert_node(list, a, b);
    }
    prev_a = a;
    prev_b = b;
  }
  return 0;
}

int create_adjacent_matrix(char *argv[], __nodetype matrix[vertices][vertices]) {
  int i, j, a, b, prev_a, prev_b;
  FILE *fp;
  fp = fopen(argv[1],"r");

  for(i=0; i<vertices; i++)
    for(j=0; j<vertices; j++)
      matrix[i][j] = 0;

  if (!fp) {
    printf("Failed to open the file %s.\n",argv[1]);
    return -1;
  }

  while (!feof(fp)) {
    fscanf(fp,"%d%d", &a, &b);
    if (prev_a != a || prev_b != b) {
          printf("a: %d, b: %d\n", a, b);
      matrix[a][b]=1;
    }
    prev_a = a;
    prev_b = b;
  }
  return 0;
}

int main(int argc, char *argv[]){

  vertices = get_graph_size(argv);

  NODE *adjacent_list[vertices];
  __nodetype adjacent_matrix[vertices][vertices];

  create_adjacent_list(argv, adjacent_list);
  print_adjacent_list(adjacent_list);

  create_adjacent_matrix(argv, adjacent_matrix);
  print_adjacent_matrix(adjacent_matrix);

  return 0;
}

Support custom sized bitvectors

Request from the cseq team.

They're using frama-c to do some interval analysis and define custom bitvectors with arbitrary size. In a number of cases, the verification time is much smaller.

CBMC uses this weird semantic:
__CPROVER_bitvector[10] a;

Obviously, the clang frontend will not allow that, even if we define the __CPROVER_bitvector type.

The easiest solution I could come up is to use source annotations, which we already use for infinite sized arrays. I propose something like:

attribute((annotate("bv=10")))
int a;

This is parsed by clang, we can still compile the code and it will require little effort to implement.

The down side: it will only be supported by the new frontend.

Invalid assertion on two dimension array value with non-determinstic index

ESBMC detect an invalid assertion violation when :

the assertion condition is about a value in a two dimension array
an index get a non-deterministic value, that is forced to a specific value by pruning all values but one

ESBMC seems to process correctly assertions about the value of the non-deterministic index alone, or
about the value in the array with constant indexes.

/* esbmc esbmc_array_nondet_err.c */
int main(void)
{
    unsigned int b;
    unsigned int a = 0;
    int table[2][2]  = {{1,0}, {1,0}};

    if (nondet_uint() == 0) {
        a = 1;
    } else {
        __ESBMC_assume(0);
    }
    __ESBMC_assert(table[1][0], "FOO"); /* Assertion ok */
    __ESBMC_assert(a == 1, "FOO"); /* Assertion ok */
    b = table[a][0];
    __ESBMC_assert(b, "ERROR"); /* Assertion violated */
    __ESBMC_assert(table[a][0], "ERROR"); /* arrays bound violated */
    return (0);
}

Trace :

/*
ESBMC version 4.2.0 64-bit x86_64 linux
file esbmc_array_nondet_err.c: Parsing
Converting
Type-checking esbmc_array_nondet_err
Generating GOTO Program
GOTO program creation time: 0.074s
GOTO program processing time: 0.000s
Starting Bounded Model Checking
Symex completed in: 0.001s (27 assignments)
Slicing time: 0.000s
Generated 6 VCC(s), 5 remaining after simplification
No solver specified; defaulting to Boolector
Encoding remaining VCC(s) using bit-vector arithmetic
Encoding to solver time: 0.003s
Solving with solver Boolector 2.4.1
Encoding to solver time: 0.003s
Runtime decision procedure: 0.000s
Building error trace

Counterexample:

State 1 file esbmc_array_nondet_err.c line 8 function main thread 0
c::main
----------------------------------------------------
  c::main::$tmp::return_value_nondet_uint$1=0 (0)

State 4 file esbmc_array_nondet_err.c line 17 function main thread 0
c::main
----------------------------------------------------
  esbmc_array_nondet_err::main::1::b=0 (0)

State 5 file esbmc_array_nondet_err.c line 18 function main thread 0
c::main
----------------------------------------------------
Violated property:
  file esbmc_array_nondet_err.c line 18 function main
  ERROR
  (_Bool)b

VERIFICATION FAILED
*/

Consider replacing the int-to-pointer conversion code.

Right now in SMT, when you convert an int to a pointer, it gets converted to a gigantic ITE chain that tests whether the int representation of the pointer is in a particular range. This might be more efficiently represented as a free pointer with some implications applied to it.

This can be benchmarked to death.

Implement a dereference cache

Looking at C++ performance stuff, we do spend some time repeatedly dereferencing 'this', i.e. what the current object is. And for the majority of object code, 'this' doesn't change very often and points at the same object (or set of objects). So, we might benefit from caching expressions built to access members of 'this' or other dereferenced pointers. So long as it isn't renamed to level 2, it can be re-used for the lifetime of the pointer itself.

This is massively handwavey, not urgent, and 'future work'ish.

Track pointer values through integer casts

Basically, we loose all pointer tracking data after casting the pointer to an integer, which alas happens occasionally (see: CIL).

Why not instead, track pointer values through integers, i.e. "this integer might point at this object". That would vastly simplify the SMT casting to and from of these values, and mean that at symex time we can determine what a pointer ''might'' point at, meaning we can handle derefs correctly.

This probably means fiddling with the format of some typecast expressions, but also the pointer tracking. Kroening already does this though!

Benchmark putting alignment assertion in pointer assignment

The C spec says that even just pointer values that aren't correctly aligned for the type of the variable cause undefined behaviour. In theory we could apply the alignment assertion on each recalculation of a pointer, catching errors earlier and possibly with fewer assertions injected from dereferences.

Test for oversized or negative shift operands

C99's 6.5.7.3 dictates that if the right hand operand of a shift operation is either negative or >= the bitwidth of the operand, then the behaviour is undefined. However, we don't assert for or against this.

This is an issue because boolector will only accept shift operands where the shift amount is a of width log2(lhs_bitwidth), i.e. neither negative nor large enough to shift a number all the way out of a variable. So it's likely to have diverging behaviour from Z3 and other solvers.

False "array bound violated error" of two dimension arrays

A failing assertion over a value on the second column of a two dimension array
raise the error array bounds violated instead of the assertion violation message.

A minimal example follows :

/* esbmc esbmc_array_bound_err.c */
int main(void)
{ 
    int a = 1;
    int b = 0;
    int t[2][1]  = { {0}, {1}};
    __ESBMC_assert(t[a][b] == 0, "TEST1");
    return (0);
}

Error trace :

ESBMC version 4.2.0 64-bit x86_64 linux
file esbmc_array_bound_err.c: Parsing
Converting
Type-checking esbmc_array_bound_err
Generating GOTO Program
GOTO program creation time: 0.094s
GOTO program processing time: 0.002s
Starting Bounded Model Checking
Symex completed in: 0.003s (17 assignments)
Slicing time: 0.000s
Generated 5 VCC(s), 1 remaining after simplification
No solver specified; defaulting to Boolector
Encoding remaining VCC(s) using bit-vector arithmetic
Encoding to solver time: 0.000s
Solving with solver Boolector 2.4.1
Encoding to solver time: 0.000s
Runtime decision procedure: 0.002s
Building error trace

Counterexample:

State 1 file esbmc_array_bound_err.c line 7 function main thread 0
c::main
----------------------------------------------------
Violated property:
  file esbmc_array_bound_err.c line 7 function main
  array bounds violated: array `t'[] upper bound
  t[a][b] == 0

VERIFICATION FAILED

Keep minimal context across runs in incremental-bmc

As discussed in issue #119, we were adding symbols with same name but different types to the context and everything went bananas.

I just remembered that namespacet has actually two contexts. Every search eventually goes through:

bool namespacet::lookup(
  const irep_idt &name,
  const symbolt *&symbol) const
{
  const symbolt* s = nullptr;

  s = context1->find_symbol(name);
  if(s != nullptr)
  {
    symbol = s;
    return false;
  }

  if(context2 != nullptr)
  {
    s = context2->find_symbol(name);
    if(s != nullptr)
    {
      symbol = s;
      return false;
    }
  }

  return true;
}

which searches in both contexts.

Currently, the two contexts are used by the frontend to merge contexts from different translation units but we can extend that, adding a mode where context1 the minimal context, generated after the goto transformations and context 2 contains the new symbols from the symbolic execution.

When running in incremental-bmc (or k-induction), we'll simply need to clear the second context and avoid the memory bloat.

pkg-config isn't on ubuntu by default

I'm experimenting with what people will experience if they install ESBMC on a ubuntu VM by default. The answer is that autoconf acts weird, but I'm not sure it's our fault.

One problem though is that pkg.m4, which provides the ability to look up pkg-config config files, for example for python, doesn't seem to exist on a default ubuntu install unless you ask for it. This is an issue because m4 just won't expand the PKG_* macros, and it'll lead to a crash at configure time, which people typically aren't prepared for. It's difficult to work out the solution too.

There's some kind of magical invocation one can give m4 that tells it to not complete if certain macros remain undefined, we could use that to prevent future pain. This is very much low priority though.

Header inclusion pulls in a lot of the C library, unnecessarily.

That is; if you include stdio.h say, then there are references everywhere for all the IO functions, and so they get pulled into the final program.

Possibly instead the set of functions that are called in should be used to work out how much of the C library to include.

Consider propagating assignments

After wrangling a problem with inlining, it turns out that turning it off in tests like esbmc-cpp/esbmc-algorithm/algorithm109, it can make a difference of at least an hour (with a SSA program size 1/4 of the size). The only thing I can think this is due to is the reduced number of assignments between variables and function argument variables in the course of a function call, although I haven't tested this.

We can emulate that in symex via some kind of variable name propagation: during renaming, we link up local variable names to the last assigned variable with that value (but only if it's reading it). It probably means the sat solver has less search space, or something.

Should be optional, because it'd be horrendous to debug an SSA program with this enabled. Also, not high priority.

Infinite loop (?) on symbolic execution

OBS: the new llvm frontend is required in order to reproduce this error.

The program in [1] causes esbmc to hang during symbolic execution. Maybe an infinite loop.

Symex trace shows:

    num_total_threads=0;
    num_threads_running=0;
    __ESBMC_deallocated=ARRAY_OF(FALSE);
    __ESBMC_alloc=ARRAY_OF(FALSE);
    p=&{ 2, 4 }[0];
    __ESBMC_alloc_size=ARRAY_OF(0);
    pc=&{ 1f, 10f, 100f }[0];
    __ESBMC_is_dynamic=ARRAY_OF(FALSE);
    FUNCTION_CALL:  pthread_start_main_hook()
    ATOMIC_BEGIN
    num_total_threads=num_total_threads + 1;
    num_threads_running=num_threads_running + 1;
    ATOMIC_END
    END_FUNCTION
    FUNCTION_CALL:  main()
    signed int n;
    n=2;
    signed int * p;
    p=&n;
    p=&{ *p, 0 }[0];

Eliminate C++ components in types during conversion

Previously I mangled the GOTO conversion code to flatten all 'symbol' types, so that there wasn't a need to resolve type names at symex time. That saved a large amount of memory, because ESBMC didn't have to repeatedly look up and duplicate types during code interpretation.

However this is a pain in the neck when dealing with C++ code, because the entirety of the class definition is baked into the type. However, this isn't actually necessary: all name resolution should have been done during the code typechecking, so at no point during symex should we actually have to look at the member components of a class and make a decision on it. (The GOTO code should be at a level even lower than C; ideally no C++ special cases, save exceptions perhaps).

So, we should consider nurking non-code components of structs, to aid readability of stuff while debugging C++ code.

Ditch 'invalid object' assignments

When we fail to dereference something, a read to a 'invalid_object' symbol with some unique numbering occurs, so that the SMT formula is well formed. That means we can encode the expressions to SMT and encounter the relevant assertion failure; or not if the guard is nondeterministically false at that point.

Anyhoo, these invalid object symbols can turn into assignments if they're on the lhs of a dereferenced assignment. But they're 100% superfluous, and can be discarded. Most annoying is that they end up in phi variable joins. They all end up being sliced away, but it adds overhead.

More multi-dimensional-array tests

Given how often we run into errors with these, there should be many many more tests that feature them.

Eliminate big-int's to_{u,}long

In favour of to_{u,}int64. 'long' is too machine dependant for me.

Exception related assertions should be optional

I.E, some option of the form --no-exception-assertions, or something to turn off the "Unhandled" claims. We might end up needing to disable this in a future TACAS competition or something.

Nonexistant bit operators can be deleted.

For example: "bvnxor". Nothing generates this irep. Yet there must be roughly a hundred lines of boilerplate scattered around to accommodate it. (I suspect it's inherited from CBMCs hardware interaction stuff).

Try to eliminate claim duplication

One (longer term) simplification we could perform is checking that we're not repeatedly issuing the same assertions about something. For example, repeatedly asserting that a pointer (that hasn't been modified) isn't null. This would reduce the number of claims and hopefully the amount of SMT/SSA that gets converted.

Detect whether we can build 32/64 bit C libraries

Prior to 7989562d, the sv-comp benchmark pthread/sigma_false-unreach-call.i croaked saying that time_t's definition differed between the benchmark and the machine esbmc was compiled on. This seems really really unlikely, because it used to work, and nothing has really changed. Except, uh, autotools.

This is really really fishy, but I can't look at it at this exact moment.

Segmentation fault (core dumped)

ESBMC crashes in this particular C++ example:

[lucascordeiro@localhost src]$ esbmc binary-tree.cpp -I ~/esbmc-cpp-linux-64-static/libraries/
ESBMC version 3.0.0 64-bit x86_64 linux
file binary-tree.cpp: Parsing
Converting
Type-checking binary-tree.cpp
Segmentation fault (core dumped)
# 
# include <iostream>
# include <cassert>

using namespace std;

template <class T>
class nodet
{
public:
    T key;
    nodet *p;
    nodet *left;
    nodet *right;
};

template <class T>
class treet
{
public:
    treet();
    void preorder_tree_walk(nodet<T> *x);
    void inorder_tree_walk(nodet<T> *x);
    void posorder_tree_walk(nodet<T> *x);
    void insert(nodet<T> *z, T key);
    nodet<T> *tree_search(nodet<T> *x, T k);
    nodet<T> *tree_minimum(nodet<T> *x);
    nodet<T> *tree_maximum(nodet<T> *x);
    nodet<T> *get_root();
private:
    nodet<T> *root;
};

template <class T>
treet<T>::treet()
{
    root = NULL;
}

template <class T>
void treet<T>::preorder_tree_walk(nodet<T> *x)
{
  if (x!=NULL)
  {
    cout << "node: " << x->key << endl;
    preorder_tree_walk(x->left);
    preorder_tree_walk(x->right);
  }
}

template <class T>
void treet<T>::inorder_tree_walk(nodet<T> *x)
{
  if (x!=NULL)
  {
    inorder_tree_walk(x->left);
    cout << "node: " << x->key << endl;
    inorder_tree_walk(x->right);
  }
}

template <class T>
void treet<T>::posorder_tree_walk(nodet<T> *x)
{
  if (x!=NULL)
  {
    posorder_tree_walk(x->left);
    posorder_tree_walk(x->right);
    cout << "node: " << x->key << endl;
  }
}

template <class T>
void treet<T>::insert(nodet<T> *z, T key)
{
  nodet<T> *y, *x;
  z = new nodet<T>;
  z->key=key; z->left=NULL; z->right=NULL; y = NULL; x=root;
  while (x != NULL)
  {
    y = x;
    if (z->key < x->key) x = x->left;
    else x = x->right;
  }

  z->p = y;
  if (y==NULL) {root = z; root->left=NULL; root->right=NULL;}
  else if (z->key < y->key) y->left = z;
  else if (z->key > y->key) y->right = z;
}

template <class T>
nodet<T> *treet<T>::tree_search(nodet<T> *x, T k)
{
  if (x == NULL || k == x->key)
    return x;
  if (k < x->key)
    tree_search(x->left, k);
  else
    tree_search(x->right,k);
}

template <class T>
nodet<T> *treet<T>::tree_minimum(nodet<T> *x)
{
  while (x->left != NULL)
    x=x->left;
  return x;
}

template <class T>
nodet<T> *treet<T>::tree_maximum(nodet<T> *x)
{
  while (x->right != NULL)
    x=x->right;
  return x;
}

template <class T>
nodet<T> *treet<T>::get_root()
{
    return root;
}

int main(void)
{
  nodet<char> *t = new nodet<char>;
  treet<char> *n = new treet<char>;
  n->insert(t, 'F');
  n->insert(t, 'C');
  n->insert(t, 'E');
  n->insert(t, 'L');
  n->insert(t, 'G');
  n->insert(t, 'A');
  n->insert(t, 'B');
  n->insert(t, 'I');
  n->insert(t, 'H');
  n->insert(t, 'J');

  cout << "### inorder_tree_walk: " << endl;
  n->inorder_tree_walk(n->get_root());

  cout << "### preorder_tree_walk: " << endl;
  n->preorder_tree_walk(n->get_root());

  cout << "### posorder_tree_walk: " << endl;
  n->posorder_tree_walk(n->get_root());

  t = n->tree_search(n->get_root(),'A');
  assert(t->key == 'A');

  cout << "### tree_search: " << 'A' << "==" << t->key << endl;
  t = n->tree_minimum(n->get_root());
  assert(t->key == 'A');

  cout << "### tree_minimum: " << t->key << endl;
  t = n->tree_maximum(n->get_root());
  assert(t->key == 'L');
  cout << "### tree_maximum: " << t->key << endl;

  return 0;
}

Generate some memory model regression tests

We need a set of tests that cover all combinations of dereferencing differently typed objects through differently typed pointers, with different offsets, const and dynamic. Otherwise, the way it works will randomly change.

Smtlib printing needs tweaking

Particularly the fact that most solvers don't like '' characters, at all, and when bitvectors are printed they should never be negative.

Improve incremental BMC by generating a single SSA

Right now, we have stuff like:

*** Iteration number 1 ***
Starting Bounded Model Checking
Unwinding loop 1 iteration 1 file Problem05_label01_false-unreach-call.c line 11131 function main
Symex completed in: 4.829s
size of program expression: 26118 assignments
Slicing time: 4.277sinstead
Generated 2 VCC(s), 2 remaining after simplification
Encoding remaining VCC(s) using bit-vector arithmetic
Encoding to solver time: 0.109s
Solving with solver MathSAT5 version 5.3.14 (0b98b661254c) (Nov 17 2016 10:59:45, gmp 5.1.3, gcc 4.8.5, 64-bit)
Runtime decision procedure: 0.059s
No bug has been found in the base case

BMC program time: 9.277s
Starting Bounded Model Checking
Unwinding loop 1 iteration 1 file Problem05_label01_false-unreach-call.c line 11131 function main
Symex completed in: 5.126s
size of program expression: 26117 assignments
Slicing time: 4.266s
Generated 2 VCC(s), 2 remaining after simplification
Encoding remaining VCC(s) using bit-vector arithmetic
Encoding to solver time: 0.109s
Solving with solver MathSAT5 version 5.3.14 (0b98b661254c) (Nov 17 2016 10:59:45, gmp 5.1.3, gcc 4.8.5, 64-bit)
Runtime decision procedure: 0.075s
The forward condition is unable to prove the property

The symex code runs twice for each k (three times for k-induction). It's usually the most expensive step of the verification along with the slicer.

We could instead:

Add new flags to the SSA assignments to whether or not it should be converted for the current step (maybe we can extend the ignored flag from bool to an enum: IGNORE, BASE_ONLY, FORW_ONLY and INDU_ONLY).
Convert the base case to SSA to SMT, if it's unsat, push the unwinding assertions and run again. Something similar can be done for the inductive step, we can push the nondet assignments but we would have to remove the unwinding assertions from the formula and I'm not sure if this is doable (maybe we can push the unwinding assumptions instead of removing the unwinding assertions). Other thing to consider is the slicer, adding new assignments during the inductive step might need the original unsliced program to produce the correct SMT formula.
Profit.

We'll still run the symex code for new values of k, but we can't avoid that.

Over-stitching of pointer accesses when dereferencing

dereferencet::construct_from_dyn_offset receives a scalar variable and a type, and a dynamic offset into the scalar variable from which data will be extracted. Most of the time we're extracting matching types, so an integer from an integer, and this function does very little.

However the test it makes to decide whether types 'match' is base_type_eq, which requires that the types of pointers match. This then leads to the implicit-cast feature of C void pointers not matching causing that test to fail, even when a pointer value is being extracted from a pointer value. (It's just their subtypes that don't match, which shouldn't make a difference). The cost is additional byte stitching, which is seriously expensive at SMT time.

This is easy to fix with a more forgiving type-equality test, but I don't have the time now. This function could also be supplied with alignment info to avoid the subsequent if2tc.

Add support for __ESBMC_array_of(foo) to the frontend

As discussed in issue #131, we can avoid the loop by using a new reserved function __ESBMC_array_of(foo).

This can be easily done on both frontends (the C++ frontends won't need to be changed).

value_sett::object_numbering is bad (in general).

It's:

A global
Static
Not thread safe
Retains all pointed-at objects, inflating memory usage

And can totally be eliminated now that cheap-ish expression hashing exists.

Try-catch is broken for simple C++ programs

If we run
$ ~/esbmc-v3.0.2-linux-64/bin/esbmc main.cpp

ESBMC fails to detect the assertion violation in the following program:
*** Exception thrown of type signed_int at file main.cpp line 4
*** Caught by catch(signed_int) at file line
Symex completed in: 0.000s
size of program expression: 12 assignments
Slicing time: 0.000s
Generated 0 VCC(s), 0 remaining after simplification
VERIFICATION SUCCESSFUL
BMC program time: 0.000s

include <assert.h>

void myfunction () {
throw 1;
}

int main (void) {
try {
myfunction();
}
catch (int) { assert(0); }
return 0;
}

Possible memory leak on migrate_type

I'm testing a program that Genarro sent me (it's more than 90KLOC) and ESBMC is using a massive amount of memory during symex.

While CBMC uses around 3GB, ESBMC uses 17GB+ memory. Compiling using the leak sanitizer, the GOTO model generation fails because the c2goto binary is leaking on migrate_type, here's one trace:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x5131fb in operator new(unsigned long) /home/nikola/final/llvm.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:78:35
    esbmc/esbmc#1 0x985b52 in get_type_from_pool(typet const&, std::map<typet, irep_container<type2t>, std::less<typet>, std::allocator<std::pair<typet const, irep_container<type2t> > > >&) /home/mramalho/esbmc/build/util/../../src/util/irep2.cpp:812:12
    esbmc/esbmc#2 0x985cf3 in type_poolt::get_signedbv(typet const&) /home/mramalho/esbmc/build/util/../../src/util/irep2.cpp:848:10
    esbmc/esbmc#3 0xb7b68e in migrate_type(typet const&, irep_container<type2t>&, namespacet const*, bool) /home/mramalho/esbmc/build/util/../../src/util/migrate.cpp:288:30
    esbmc/esbmc#4 0xb78958 in real_migrate_type(typet const&, irep_container<type2t>&, namespacet const*, bool) /home/mramalho/esbmc/build/util/../../src/util/migrate.cpp:106:5
    esbmc/esbmc#5 0xb7b5d1 in migrate_type(typet const&, irep_container<type2t>&, namespacet const*, bool) /home/mramalho/esbmc/build/util/../../src/util/migrate.cpp:283:12
    esbmc/esbmc#6 0x622393 in c_sizeof(typet const&, namespacet const&) /home/mramalho/esbmc/build/ansi-c/../../src/ansi-c/c_sizeof.cpp:25:3
    esbmc/esbmc#7 0x67b08e in clang_c_adjust::adjust_sizeof(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:472:18
    esbmc/esbmc#8 0x67732d in clang_c_adjust::adjust_expr(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:106:5
    esbmc/esbmc#9 0x67bdf8 in clang_c_adjust::adjust_operands(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:1238:5
    esbmc/esbmc#10 0x67b3d0 in clang_c_adjust::adjust_expr_binary_arithmetic(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:243:3
    esbmc/esbmc#11 0x6774e1 in clang_c_adjust::adjust_expr(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:119:5
    esbmc/esbmc#12 0x67bdf8 in clang_c_adjust::adjust_operands(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:1238:5
    esbmc/esbmc#13 0x677710 in clang_c_adjust::adjust_expr(exprt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:147:5
    esbmc/esbmc#14 0x676e70 in clang_c_adjust::adjust_symbol(symbolt&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:55:5
    esbmc/esbmc#15 0x676b56 in clang_c_adjust::adjust() /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_adjust_expr.cpp:46:5
    esbmc/esbmc#16 0x63265f in clang_c_languaget::typecheck(contextt&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, message_handlert&) /home/mramalho/esbmc/build/clang-c-frontend/../../src/clang-c-frontend/clang_c_language.cpp:161:15
    esbmc/esbmc#17 0x87a2ea in language_filest::typecheck(contextt&) /home/mramalho/esbmc/build/util/../../src/util/language_file.cpp:99:31
    esbmc/esbmc#18 0x84621c in language_uit::typecheck() /home/mramalho/esbmc/build/langapi/../../src/langapi/language_ui.cpp:182:21
    esbmc/esbmc#19 0x518c2d in c2goto_parseopt::doit() /home/mramalho/esbmc/build/c2goto/../../src/c2goto/c2goto.cpp:54:9
    esbmc/esbmc#20 0x8852d2 in parseoptions_baset::main() /home/mramalho/esbmc/build/util/../../src/util/parseoptions.cpp:50:10
    esbmc/esbmc#21 0x5155db in main /home/mramalho/esbmc/build/c2goto/../../src/c2goto/c2goto.cpp:74:19
    esbmc/esbmc#22 0x7fa8dcea957f in __libc_start_main (/lib64/libc.so.6+0x2057f)

All the leaks happen when calculating the sizeof operator. The full leak report: leak.txt

@jmorse, what do you think? is it false positive?

I tried to remove the new operator and return the object allocated by real_migrate_type but the leak continued. I can't find who's holding a reference for the type2tc object, that's keeping the shared_ptr alive.

MacOS 64-bit size_t's bleed through into ESBMC

Fisch is encountering some 64-bit size_t's when running some ESBMC tests, using the --64 option. This actually shouldn't happen, because by design size_t should remain 32 bits at all time (the underlying pointer munging code for malloc isn't robust enough).

This leads to some malformed SMT that causes Z3 errors. However more peculiar is that it doesn't happen when running without --64. That means it isn't necessarily the mac environment leaking in, it might be something else we're #including.