There are a few papers, reports, etc. out there about using dynamic binary instrumentation (DBI) for shellcode detection or for protecting against return-oriented programming (ROP) attacks (put that phrase into a Google search and you should find some). As far as I know, most of those solutions are based on external DBI engines (either Pin or DynamoRIO). Speaking from my experience with dynamic binary instrumentation engines (mainly DynamoRIO) and also with my own engine (used in the MmmBop project), I would like to share my opinions on this topic.
First of all, DBI engines (no matter whether DynamoRIO, Pin or Valgrind) are awesome in many ways. They surely can make your life easier, and with a little bit of tweaking you can build your own shellcode detector & analyzer in a reasonable amount of time (of course they are not limited to that). But there are some major drawbacks. Usually, from an analyst's point of view, you don't have to instrument your entire operating system; in fact, with current tools it is not even possible. So typically you get a malicious file, run the target process with DBI enabled and voila :-). However, when we are talking about a generic product/tool for shellcode detection or for protecting against return-oriented programming attacks, we can't limit ourselves to instrumenting AcroRd32.exe only, can we? What are the problems with instrumenting a large number of processes? Well, there are a few.
The first one is speed. According to Derek Bruening, DynamoRIO causes an average slowdown of 34%, whereas Pin's average slowdown is close to 71% (please note this data comes from 2008). Even if those engines are faster now, the slowdown should still be noticeable (especially when multiple processes are instrumented). Also note that this is the slowdown caused by the DBI engine alone, without any additional anti-ROP/shellcode detection mechanism on top.
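To put those percentages in perspective, here is a minimal back-of-the-envelope sketch in Python. It only applies the 2008 averages quoted above to a hypothetical 10-second workload; the function name and the workload length are my own illustration, and real overhead varies a lot from program to program.

```python
# Average slowdowns quoted above (2008 data); real numbers vary per program.
AVERAGE_SLOWDOWN = {"DynamoRIO": 0.34, "Pin": 0.71}


def instrumented_runtime(native_seconds, slowdown):
    """Runtime under a DBI engine, given the native runtime and a fractional slowdown."""
    return native_seconds * (1.0 + slowdown)


if __name__ == "__main__":
    native = 10.0  # hypothetical workload: 10 seconds without instrumentation
    for engine, slowdown in AVERAGE_SLOWDOWN.items():
        print(f"{engine}: {instrumented_runtime(native, slowdown):.1f} s")
```

Any detection logic running on top of the engine only adds to these figures.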
The second one is memory usage. Once again, this is not a big factor when instrumenting only one or a few processes. Problems start when you try to instrument many of them simultaneously (speaking from my DynamoRIO experience in ~2009). I'm not sure whether this has been optimized since, but DynamoRIO used to reserve 128MB of address space up front (by default).
However, I think it is hard to get reliable numbers here, since they depend heavily on the specific program being instrumented.
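A rough sketch of what that 128MB default means in practice, again in Python. The 2GB user address space is my assumption (the default for a 32-bit Windows process), and the process count is hypothetical. Keep in mind that reserved address space is not the same as committed memory, so this is an upper bound on what the engine sets aside, not a measurement of actual RAM usage.

```python
MIB = 1024 * 1024

# Default up-front reservation mentioned above.
DEFAULT_RESERVE = 128 * MIB

# Assumption: 2 GB of user address space per 32-bit Windows process.
USER_SPACE_32BIT = 2048 * MIB


def reserve_fraction(reserve=DEFAULT_RESERVE, user_space=USER_SPACE_32BIT):
    """Fraction of one process's user address space taken by the reservation."""
    return reserve / user_space


def total_reserved(process_count, reserve=DEFAULT_RESERVE):
    """Address space reserved across all instrumented processes, in bytes."""
    return process_count * reserve


if __name__ == "__main__":
    processes = 50  # hypothetical number of instrumented processes
    print(f"per process: {reserve_fraction():.2%} of user address space")
    print(f"{processes} processes: {total_reserved(processes) // MIB} MiB reserved")
```

How much of that reservation is actually committed depends on the instrumented program, which is exactly why reliable numbers are hard to get.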
I doubt that dynamic binary instrumentation engines are currently suitable for building a security product for the masses. Some time ago I worked on something similar, but those performance issues were a no-go. Maybe in the future (with faster CPUs or, more importantly, cheaper RAM) this will work on a larger scale.
However, it is possible that I am wrong and such solutions already exist (and are used in the real world). I am aware of a company called Determina that had a security product based on program shepherding. However, I am not sure whether I can classify that as a product with a DBI engine; anyway, I'm limiting my opinions here to Pin and DynamoRIO. On a side note: I remember seeing a paper about parallelizing dynamic instrumentation for performance, but I have no idea whether it is used anywhere in the real world. Perhaps someone has different thoughts on this matter?
Finally, I would like to share an old DynamoRIO trick I used to check whether my program was being instrumented. I'm not sure if it still works, but back in the day, by using the FSTENV instruction (a popular getPC method) you were able to retrieve the real value of EIP (the address inside DynamoRIO's code cache, not the faked application address). Since DynamoRIO basic blocks are stored in memory that is always writable (I assume for performance reasons), you can do some additional funky things :-)
OK, enough of my ramblings...
Alex Sotirov pointed out that Determina actually used DynamoRIO to instrument all Windows services, and that their product was running on real systems.