Paper notes: BinSim
Title: BinSim: Trace-based semantic binary diffing via system call sliced segment equivalence checking
PDF: 0070bdf0db1fe29e6730d8e63662b0b3.pdf
BinSim is a dynamic system for identifying differences between two malware samples based on the executed system and API calls. I decided to read the paper in order to see if I can reuse anything in my usual bindiffing workflow for patch analysis and symbol porting. Since it is dynamic, its first analysis step is to run the two target x86 executables with the same input. BinSim uses BitBlaze as its dynamic analysis infrastructure, and therefore its next step is to lift the execution trace logs to BitBlaze’s Vine IL. The system and API calls from the two traces are collected and aligned based on their call sequence. Then, the arguments of the matched calls are checked for equivalence. This is done by collecting all instructions that affect them using backward slicing, and calculating each slice’s weakest precondition symbolic formula. Finally, the two formulas, each representing an aligned system or API call from the target executables, are checked for equivalence with a constraint solver. Similarity scores (1.0 for a match, 0.7 for an approximate match, and 0.5 for an unknown case) are used to summarize the equivalence for the whole binaries.