A year ago, we released a series of blog posts documenting our research into the world of binary hashing. While we speculated about the efficacy of this technique for binary diffing, our primary goal was to recognize similar code between binaries so that annotations could be ported from one analyzed binary to another, and many of our design choices reflected this end goal. Luckily, we’ve been given the opportunity to explore how these hashing techniques could be applied to the world of “bindiffing” through DARPA’s Assured Micropatching (AMP)¹ program.
As part of this ongoing research, we have developed NinjaDiff, an open-source binary diffing plugin for Binary Ninja. In this blog post, we discuss the underlying algorithms and the technical design choices we made while building the tool.
This is the first post in a series describing how we built tools to rapidly identify and characterize “format extensions”: modifications and new features added to parsers of complex formats. In this puzzle, we were given a set of binaries and a few input files (in this instance, PDFs). Our task was to precisely characterize any new feature(s) present in the binaries and describe how the input files triggered them. Moreover, our goal was to build tools that enable a human to do this faster and/or more completely than was previously possible. Our approach was to make the best use of the inputs that triggered modified behaviors by combining fine-grained static binary diffing with execution-trace and memory-trace analysis.
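To make the idea of combining static diffing with trace analysis more concrete, here is a minimal sketch of one way the two signals could be intersected. It is not our tooling; it assumes that a static diff has already produced a set of changed basic-block start addresses in the modified binary, and that each execution trace has been reduced to the basic-block addresses it hit. The function names `covered` and `rank_candidates` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def covered(traces: Iterable[Iterable[int]]) -> Set[int]:
    """Union of basic-block start addresses hit across a collection of traces."""
    result: Set[int] = set()
    for trace in traces:
        result.update(trace)
    return result


def rank_candidates(
    changed_blocks: Set[int],
    trigger_traces: Iterable[Iterable[int]],
    baseline_traces: Iterable[Iterable[int]],
) -> Tuple[List[int], List[int]]:
    """Split statically changed blocks by how the traces reach them (hypothetical helper).

    strong: changed blocks reached only by the triggering inputs
    weak:   changed blocks reached by triggering inputs and also by baseline inputs
    Changed blocks that no triggering input reaches are omitted.
    """
    trig = covered(trigger_traces)
    base = covered(baseline_traces)
    strong = changed_blocks & (trig - base)
    weak = (changed_blocks & trig) - strong
    return sorted(strong), sorted(weak)
```

The "strong" list is where a human analyst would likely look first, since those blocks are both new or modified and exercised exclusively by the inputs that trigger the new feature.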
In our previous blog post, we described some examples of where binary hashing can help solve problems and compared a number of algorithms for both basic-block and graph-aware hashing. Today we are releasing Hashashin, a tool that combines some of these algorithms to let security researchers port Binary Ninja annotations from one binary to another.
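For readers unfamiliar with the distinction between basic-block and graph-aware hashing, the sketch below illustrates both under simplifying assumptions. It is not Hashashin's algorithm. Instructions are modeled as hypothetical `(mnemonic, operands)` tuples rather than pulled from a disassembler; immediates and addresses are replaced with a placeholder so relocated copies of the same code hash identically; and a single Weisfeiler-Lehman-style round mixes each block's hash with its successors' hashes so that the hash reflects control-flow structure.

```python
import hashlib
import re
from typing import Dict, List, Tuple

# Hypothetical instruction model: (mnemonic, [operand strings]).
Instruction = Tuple[str, List[str]]

# Operands that are bare hex or decimal constants (immediates or addresses).
_IMM = re.compile(r"^(0x[0-9a-fA-F]+|-?\d+)$")


def normalize(insn: Instruction) -> str:
    """Replace constant operands with a placeholder so relocations don't change the hash."""
    mnemonic, operands = insn
    ops = ["IMM" if _IMM.match(op) else op for op in operands]
    return mnemonic + " " + ",".join(ops)


def block_hash(block: List[Instruction]) -> str:
    """Basic-block hash: digest of the normalized instruction sequence."""
    h = hashlib.sha256()
    for insn in block:
        h.update(normalize(insn).encode())
        h.update(b"\n")
    return h.hexdigest()


def graph_aware_hashes(
    blocks: Dict[int, List[Instruction]],
    succs: Dict[int, List[int]],
    rounds: int = 1,
) -> Dict[int, str]:
    """Refine each block's hash with its successors' hashes (WL-style relabeling)."""
    labels = {addr: block_hash(insns) for addr, insns in blocks.items()}
    for _ in range(rounds):
        new_labels = {}
        for addr, label in labels.items():
            neighbors = sorted(labels[s] for s in succs.get(addr, []))
            combined = label + "|" + ",".join(neighbors)
            new_labels[addr] = hashlib.sha256(combined.encode()).hexdigest()
        labels = new_labels
    return labels


def function_hash(
    blocks: Dict[int, List[Instruction]],
    succs: Dict[int, List[int]],
    rounds: int = 1,
) -> str:
    """Address-independent function hash: digest of the sorted refined block hashes."""
    labels = sorted(graph_aware_hashes(blocks, succs, rounds).values())
    return hashlib.sha256("".join(labels).encode()).hexdigest()
```

Because block addresses are only used as dictionary keys and the final labels are sorted, the same function loaded at a different base address (with different call targets) produces the same `function_hash`, while changing a single mnemonic, such as `je` to `jne`, changes it. Matching hashes like these across two binaries is what makes it possible to carry annotations from one to the other.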
As security researchers, we often spend a lot of time looking into the internals of libraries used in the products we are assessing. This brings some common time sinks, such as identifying library versions. While library version identification is relatively straightforward on the surface, other tasks are clearly more challenging, such as applying signatures to stripped binaries or porting defined types across libraries and similar codebases.