argp         posts     research     bugs

Paper notes: ARTISTE

Title: ARTISTE: Automatic generation of hybrid data structure signatures from binary code executions
PDF: 8bd55709138804473a2e1d974ef8afd7.pdf

Another runtime system with the goal of recovering both primitive types, and complex ones (structures, arrays and dynamic arrays), without the requirement of source code or debugging symbols. The implementation supports both Windows (PE) and Linux (ELF) binaries, but the paper does not mention if a DBI framework is used. An execution log is captured which contains executed instructions, operands, and also memory allocations and deallocations. Based on this log, primitive data types are inferred via x86 instructions. For example, both operands of an add instruction are given the type num32. The prototypes of standard library functions are also used for the same purpose, and to narrow down the types (e.g. from num32 to size_t).

The interesting part of the paper is that a heap graph is constructed and logged at periodic intervals during execution. This heap graph includes which functions operate on an allocation, and the types assigned to its offsets. Based on that, an allocation is denoted as an array or a structure, and a signature is created for it. These signatures are organized in a tree and updated during the analysis of the execution log.