June 9, 2020
This is the first post in a series that describes how we built tools to rapidly identify and characterize “format extensions”: modifications and new feature additions in parsers of complex formats. In this puzzle, we were given a set of binaries and a few input files – in this instance PDFs. Our task was to precisely characterize any new feature(s) present in the binaries and describe how the input files triggered them. Moreover, our goal was to build tools to enable a human to do this faster and/or more completely than they could previously. Our approach was to make the best use of the inputs that triggered modified behaviors, with a combination of fine-grained static binary diffing and execution trace and memory trace analysis.