Some time ago, while browsing academic articles, I accidentally bumped into a patent application titled "BINARY REWRITING WITHOUT RELOCATION INFORMATION" (filed 05/24/2010). I have always found this topic interesting, so I decided to read the paper and comment on what I consider (IMHO) its most important approaches.
First of all, if you have ever tried writing a disassembler, you have probably heard about the never-ending code vs data dilemma. Relocation information can improve the correctness of this analysis, but only up to a point (the result is typically far from perfect). Results obtained from static disassembly are often incorrect or do not provide full code coverage. Without perfect disassembly it is hard to perform static binary rewriting, since things may become unstable (see my Aslan notes for details). So what was really proposed in this patent application that makes it tick even without relocation information? (I can only assume it works, since I have not seen their engine in action.) I will first try to describe the most interesting techniques they are using and then comment on them at the end of this note.
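To make the code vs data dilemma a bit more concrete, here is a minimal sketch using a made-up toy instruction set (the opcodes, their lengths and the helper names `linear_sweep` and `recursive_traversal` are hypothetical; this is not x86 and not anything from the patent). A linear sweep decodes the inline data bytes placed after a jump as a bogus instruction and desynchronizes, while a recursive traversal that follows control flow gets it right. The catch is that recursive traversal can only follow targets it can compute statically; an indirect jump or call gives it nothing to follow.

```python
# Toy ISA (hypothetical, for illustration only):
#   0x01 NOP (1 byte), 0x02 JMP rel8 (2 bytes, unsigned offset from next insn),
#   0x03 RET (1 byte), 0x04 MOV imm32 (5 bytes)
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 1, 0x04: 5}
NAMES = {0x01: "nop", 0x02: "jmp", 0x03: "ret", 0x04: "mov"}


def linear_sweep(code):
    """Decode every byte in order, assuming everything is code."""
    out, pc = [], 0
    while pc < len(code):
        size = LENGTHS.get(code[pc])
        if size is None or pc + size > len(code):
            out.append((pc, "db"))  # undecodable: emit as a single data byte
            pc += 1
            continue
        out.append((pc, NAMES[code[pc]]))
        pc += size
    return out


def recursive_traversal(code, entry=0):
    """Follow control flow from the entry point; unreachable bytes stay undecoded."""
    seen, work = {}, [entry]
    while work:
        pc = work.pop()
        if pc in seen or not 0 <= pc < len(code):
            continue
        op = code[pc]
        if op not in LENGTHS or pc + LENGTHS[op] > len(code):
            continue
        seen[pc] = NAMES[op]
        if op == 0x02:  # unconditional jump: only the target is reachable
            work.append(pc + 2 + code[pc + 1])
        elif op != 0x03:  # RET ends the path; everything else falls through
            work.append(pc + LENGTHS[op])
    return sorted(seen.items())


# JMP over two bytes of inline data (0x04, 0xAA), then NOP NOP NOP RET
CODE = bytes([0x02, 0x02, 0x04, 0xAA, 0x01, 0x01, 0x01, 0x03])
```

Here `linear_sweep(CODE)` returns `[(0, 'jmp'), (2, 'mov'), (7, 'ret')]`, swallowing the data bytes and the three NOPs into a fake MOV, whereas `recursive_traversal(CODE)` returns `[(0, 'jmp'), (4, 'nop'), (5, 'nop'), (6, 'nop'), (7, 'ret')]`.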
In their approach, the binary reader & disassembler converts the input binary to LLVM IR, which is then mixed with the instrumentation code added by the user (I assume). LLVM performs some IR optimizations in the next stage (including additional binary-aware optimizations, huhu). The optimized IR is then passed to the code generation engine, which among other things performs binary layout modifications and then produces the output binary. This is enough for an introduction; now let's focus on the important things. In this post I will mainly limit myself to the disassembler and binary layout modification internals.
One of the problems in static binary rewriting is covering the indirect jump / call cases. Consider a basic instruction "call reg". Statically you can't really guess what value reg holds at the time the call gets executed (of course you can try some register propagation voodoo, but there are many cases where this fails too). So if reg was initialized in some "weird" way and your engine was not able to update that offset (I assume the reader knows what I am talking about here) and point it to the moved location (obtained after the binary rewriting process), you fail and the rewritten application crashes (in most cases). So what did the patent guys do here? According to the patent application, it seems that they have statically instrumented the indirect call instruction (in our case "call reg"). So every time it gets executed, the instrumentation code checks the translation table for the destination address and returns the modified one. Of course this is not enough, since sometimes there may be no translation address for that area at all. This may happen, for example, when the code coverage was not complete. So to cover those cases too, they decided to keep the original code (a priori at the same original location). When the translation fails, the original (not rewritten) code is executed and hopefully control will be transferred back to the rewritten code... For indirect jumps it seems they are using a slightly different method (see pages 5-6).
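Here is how I read the indirect call handling, written as a minimal Python model rather than real instrumentation code. The section addresses, the table contents and the names `resolve_indirect_target` and `landing_region` are all made up for illustration; the actual engine presumably emits equivalent logic as machine code in front of each "call reg".

```python
# Hypothetical model of the instrumented "call reg" described in the patent.
# Invented layout: original .text starts at 0x401000, rewritten copy at 0x801000.
ORIGINAL_TEXT = range(0x401000, 0x402000)
REWRITTEN_TEXT = range(0x801000, 0x802000)

# Translation table built by the rewriter: original address -> moved address.
# Only code the disassembler actually found ends up here.
TRANSLATION_TABLE = {0x401000: 0x801000, 0x401050: 0x801090}


def resolve_indirect_target(target, table=TRANSLATION_TABLE):
    """Return the address the instrumented 'call reg' actually transfers to."""
    # Hit: the target was disassembled and moved, so use the rewritten copy.
    if target in table:
        return table[target]
    # Miss (e.g. incomplete code coverage): the original code is still mapped
    # at its original location, so fall back to executing it there.
    return target


def landing_region(addr):
    """Classify where control ends up after the call."""
    if addr in REWRITTEN_TEXT:
        return "rewritten"
    if addr in ORIGINAL_TEXT:
        return "original"
    return "unmapped"
```

For example, `resolve_indirect_target(0x401050)` yields `0x801090` (rewritten code), while `0x401700`, which the disassembler never reached in this model, is returned unchanged and lands in the preserved original code.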
Additionally, the same thing is done to data sections (so all the data and code are preserved at their original locations). This helps in covering additional cases, for example when a function pointer (callback) is passed as a parameter to another function (external or not). This is also hard, if not impossible, to cover by static analysis alone.
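A minimal sketch of why keeping the original sections helps with callbacks, again with invented addresses and hypothetical helper names (`call_from_rewritten_code`, `call_from_external_code`, `landing_region`). The preserved .data section still holds the original address of the callback. Rewritten code translates it through the lookup, while an external, non-instrumented caller (think of a library routine invoking the callback) jumps to the raw pointer and lands in the preserved original code. Both paths hit valid code; without the original copy the second one would crash.

```python
# Invented layout: original .text at 0x401000, rewritten copy at 0x801000.
ORIGINAL_TEXT = range(0x401000, 0x402000)
REWRITTEN_TEXT = range(0x801000, 0x802000)
TRANSLATION = {0x401200: 0x801340}  # the callback was found and moved

# The original .data section is kept untouched, so the function pointer it
# stores still holds the ORIGINAL address of the callback.
DATA_SECTION = {"callback_ptr": 0x401200}


def call_from_rewritten_code(ptr):
    """Instrumented indirect call: translate if possible, else keep original."""
    return TRANSLATION.get(ptr, ptr)


def call_from_external_code(ptr):
    """Non-instrumented caller (e.g. a library invoking the callback)."""
    return ptr  # jumps straight to the raw pointer value


def landing_region(addr):
    """Classify where control ends up after the call."""
    if addr in REWRITTEN_TEXT:
        return "rewritten"
    if addr in ORIGINAL_TEXT:
        return "original"
    return "unmapped"
```

With `ptr = DATA_SECTION["callback_ptr"]`, the rewritten caller ends up in the rewritten copy and the external caller ends up in the original code, so the same untouched pointer value works for both.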
OK, time for some comments. The general idea of emitting instrumentation code (I assume this is what the authors of the patent call an "embodiment"?) is pretty old. Dynamic binary instrumentation (DBI) engines, as the name says, have relied on this technique from the start. In DBI, control transfer instructions are also instrumented and resolved through translation tables. So at this point the only difference appears to be the type of instrumentation emission (static vs virtual). Additionally, in 2009 I released my SpiderPig paper, in which I described my technique of virtual code integration, which also preserves the original code and data layout (basically to improve the rewriting process). So I don't see anything new here either. Not to mention that keeping both copies increases the binary size pretty heavily. Another thing is that when the translation address (mentioned before) is not available, they simply execute the original destination. However, there is no guarantee that the executed original code will ever transfer control back to the rewritten code. The authors also claim that their engine works with obfuscated binaries, but as the real world shows, such binaries often (or almost always) come paired with self-modifying code, and this is harder to handle.
Finally, the number of claims (the numbered "embodiments" at the end of the patent application) written by the authors is quite ridiculous (before I read that part I wanted to point out here the things I liked in their approach, silly me). However, I am aware that your personal judgment, dear reader, may be quite different. Anyway, let's hope someone finds this post entertaining, because I don't.
P.S. My deepest condolences to people and friends from Japan regarding the earthquake, the tsunami, the nuclear reactors and the overall current situation. On the other hand, I never suspected there were so many nuclear experts on Twitter.