Paper notes: ARTISTE
Title: ARTISTE: Automatic generation of hybrid data structure signatures from
binary code executions
PDF: 8bd55709138804473a2e1d974ef8afd7.pdf
Another runtime system that aims to recover both primitive types and
complex ones (structures, arrays and dynamic arrays) without requiring
source code or debugging symbols. The implementation supports
both Windows (PE) and Linux (ELF) binaries, but the paper does not mention
whether a dynamic binary instrumentation (DBI) framework is used. An
execution log is captured, containing
executed instructions, operands, and also memory allocations and
deallocations. Based on this log, primitive data types are inferred via
x86 instructions. For example, both operands of an add instruction are
given the type num32. The prototypes of standard library functions are also
used for the same purpose, and to narrow down the types (e.g. from num32
to size_t).
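
The inference step described above could be sketched roughly as follows. This is an illustration only, not the paper's implementation: the log entry layout, the PARENT refinement table, the PROTOTYPES table and the function names are all assumptions made for this note, and treating a register name as a stable location ignores register reuse.

```python
from typing import Dict, List, Optional, Tuple

# Assumed refinement relation: each key is a more specific form of its value.
PARENT: Dict[str, str] = {"size_t": "num32"}

# Assumed prototype table: argument types per library function.
PROTOTYPES: Dict[str, List[str]] = {"malloc": ["size_t"]}

LogEntry = Tuple[str, str, List[str]]  # (kind, name, operand/argument locations)


def is_refinement(new: Optional[str], old: str) -> bool:
    """Return True if `new` equals `old` or is a descendant of it."""
    while new is not None:
        if new == old:
            return True
        new = PARENT.get(new)
    return False


def infer_types(log: List[LogEntry]) -> Dict[str, str]:
    """Assign a type to each location, keeping the most specific one seen."""
    types: Dict[str, str] = {}

    def assign(loc: str, ty: str) -> None:
        current = types.get(loc)
        # Only overwrite when the new type narrows the current one.
        if current is None or is_refinement(ty, current):
            types[loc] = ty

    for kind, name, locations in log:
        if kind == "insn" and name == "add":
            # Both operands of an add are treated as 32-bit numbers.
            for loc in locations:
                assign(loc, "num32")
        elif kind == "call":
            # Library prototypes narrow the argument types.
            for loc, ty in zip(locations, PROTOTYPES.get(name, [])):
                assign(loc, ty)
    return types


example_log: List[LogEntry] = [
    ("insn", "add", ["eax", "[ebp-8]"]),
    ("call", "malloc", ["eax"]),
]
inferred = infer_types(example_log)
```

In this sketch eax starts as num32 because of the add and is narrowed to size_t once it is passed to malloc, while [ebp-8] stays num32. A later add on eax would not widen it back, since num32 does not refine size_t.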
The interesting part of the paper is that a heap graph is constructed and logged at periodic intervals during execution. This heap graph includes which functions operate on an allocation, and the types assigned to its offsets. Based on that, an allocation is denoted as an array or a structure, and a signature is created for it. These signatures are organized in a tree and updated during the analysis of the execution log.
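
A minimal sketch of how an allocation might be labelled as an array or a structure from the types observed at its offsets. The notes above do not record the criteria the paper actually uses, so the rule here (one type repeated at a uniform stride means array, anything else means structure) is an assumed heuristic, and classify_allocation and the signature layout are hypothetical.

```python
from typing import Dict, Tuple


def classify_allocation(offset_types: Dict[int, str]) -> Tuple[str, tuple]:
    """Label an allocation and build a simple signature for it.

    offset_types maps a byte offset inside the allocation to the type
    inferred for that offset.
    """
    offsets = sorted(offset_types)
    kinds = {offset_types[o] for o in offsets}
    strides = {b - a for a, b in zip(offsets, offsets[1:])}

    # Assumed heuristic: a single type repeated at a constant stride.
    if len(offsets) >= 2 and len(kinds) == 1 and len(strides) == 1:
        element_type = next(iter(kinds))
        stride = next(iter(strides))
        return "array", (element_type, stride, len(offsets))

    # Otherwise treat it as a structure: an ordered list of typed fields.
    return "struct", tuple((o, offset_types[o]) for o in offsets)


array_like = classify_allocation({0: "num32", 4: "num32", 8: "num32"})
struct_like = classify_allocation({0: "size_t", 4: "num32"})
```

Here array_like is labelled an array of three num32 elements with a stride of 4, and struct_like is labelled a structure with a size_t field at offset 0 and a num32 field at offset 4. A real system would also have to use the per-allocation function information from the heap graph, which this sketch leaves out.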