verifying the absolute value function

02 Sep, 2023

my friend artjom and i have been trying to verify randy bryant's bit twiddling hacks. starting with the absolute value function.

the bithack is defining the absolute value of an signed integer x as x^m + ~m + 1 where m = x >> 31.

verification by pen & paper 🤭

we attempted to prove this on pen an paper to observer the mechanisms of how it works. m will be -1 in the negative case (which is 32 bits of 1s) and 0 in the positive case (32 bits of 0s). when you xor with x, the negative case flips the bits, and positive case keeps the bits -- i.e. if x < 0 then x^m = ~x and if x >= 0 then x^m = x. if x is negative it's then just performing twos compliment (flip the bits of x and add 1), we already have flipped the bits with x^m and ~m is equal to 0 (flipping 32 bits of 1s) and then we add 1. in the positive case we essentially add a -1 (flipping all the bits of 0) followed by a 1.

here are some of the things we learned/relearned:

bit shifting to the right of unsigned integers, causes sign extension
bit shifting right 32 bits in gcc gets you back the same number (bit shifting a negative number x to the right 31 times (x>>31) will give you -1 (sign extended) but shifting it 32 times willl give you back x, this is kinda weird behavior)

we kinda wanted to do better than just a paper proof, one idea we had was to come up with a sat encoding of the problem -- i.e. come up with formulas for each bit (b_31, b_30, ... , b_1, b_0) in their boolean logic form, and a sat formula saying that the boolean logic encoding of the bits using the hack is the same as the boolean logic encoding for just multiplying by -1 if it's negative and keeping the same bit if it's positive.

we had some meta questions about verification here, some of which were just us brainfarting:

what is the high level thing we're trying to prove? (if the output bits of the bithack model is the same as the bit canonical model then they are the same)
us two working out the boolean logic form of each bit feels like we could introduce error into the system. and would modeling the system (how bits work) in coq/pvs be any better?
given the c program and a target system/architecture how do we verify program correctness?
what is the bridge between the program and the model -- e.g. model bits in coq?

verification using CBMC

CBMC is a model checker for C programs, it takes C programs and transforms them into equations -- using techniques like SSA, loop unrolling, symbolic execution -- before feeding them into solvers.

The following C program has a reference implementation and another that uses the non-branching bit version. CBMC can check if the two abs implementions produce the same results for all unsigned integers.

int abs_ref(int x) { return x < 0 ? -x : x; }

int abs_bits(int x) {
  int m = x >> 31;
  return (x ^ m) + ~m + 1;
}

int main() {
  int t = nondet_int();
  int ar = abs_ref(t);
  int ab = abs_bits(t);
  __CPROVER_assert(ar == ab, "Err: abs_bits does not return the same value as abs_ref");
}

the result from running an automated tool like that of a model checker are not that exciting, maybe counter examples produced are more interesting to look at. anyways running the following command cbmc abs.c will produce

CBMC version 5.12 (cbmc-5.12) 64-bit x86_64 linux
Parsing abs.c
Converting
Type-checking abs
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
size of program expression: 54 steps
simple slicing removed 5 assignments
Generated 1 VCC(s), 1 remaining after simplification
Passing problem to propositional reduction
converting SSA
Running propositional reduction
Post-processing
Solving with MiniSAT 2.2.1 with simplifier
565 variables, 1571 clauses
SAT checker: instance is UNSATISFIABLE
Runtime decision procedure: 0.00408032s

** Results:
abs.c function main
[main.assertion.1] line 16 Err: abs_bit did not return the same value as abs_ref: SUCCESS

** 0 of 1 failed (1 iterations)
VERIFICATION SUCCESSFUL

thinking that verification conditions might be a useful thing to look at, i ran cbmc --show-vcc abs.c and got back some...

incomprehensible output

CBMC version 5.12 (cbmc-5.12) 64-bit x86_64 linux
Parsing abs.c
Converting
Type-checking abs
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
size of program expression: 54 steps
simple slicing removed 5 assignments
Generated 1 VCC(s), 1 remaining after simplification
VERIFICATION CONDITIONS:

file abs.c line 16 function main
Err: abs_bit did not return the same value as abs_ref
{-1} __CPROVER_alloca_object#1 = NULL
{-2} __CPROVER_dead_object#1 = NULL
{-3} __CPROVER_deallocated#1 = NULL
{-4} __CPROVER_malloc_is_new_array#1 ⇔ false
{-5} __CPROVER_malloc_object#1 = NULL
{-6} __CPROVER_malloc_size#1 = 0
{-7} __CPROVER_memory_leak#1 = NULL
{-8} __CPROVER_next_thread_id#1 = 0
{-9} __CPROVER_next_thread_key!0#1 = 0
{-10} __CPROVER_pipe_count#1 = 0
{-11} __CPROVER_rounding_mode!0#1 = 0
{-12} __CPROVER_thread_id!0#1 = 0
{-13} __CPROVER_thread_key_dtors!0#1 = array_of #source_location=""(NULL)
{-14} __CPROVER_thread_keys!0#1 = array_of #source_location=""(NULL)
{-15} __CPROVER_threads_exited#1 = array_of #source_location=""(false)
{-16} main::1::x!0@1#2 = nondet_symbol #source_location="" identifier="symex::nondet0"
{-17} abs_ref::x!0@1#1 = main::1::x!0@1#2
{-18} abs_ref#return_value!0#1 = (abs_ref::x!0@1#1 ≥ 0 ? abs_ref::x!0@1#1 : -abs_ref::x!0@1#1)
{-19} main::1::res_ref!0@1#2 = abs_ref#return_value!0#1
{-20} abs_bit::x!0@1#1 = main::1::x!0@1#2
{-21} abs_bit::1::m!0@1#2 = ashr #source_location=""(abs_bit::x!0@1#1, 31)
{-22} abs_bit#return_value!0#1 = bitnot #source_location=""(abs_bit::1::m!0@1#2) + bitxor #source_location=""(abs_bit::1::m!0@1#2, abs_bit::x!0@1#1) + 1
{-23} main::1::res_bit!0@1#2 = abs_bit#return_value!0#1
├──────────────────────────
{1} main::1::res_ref!0@1#2 = main::1::res_bit!0@1#2