Rule Metadata & Exploit Signature Difficulties

Creating reliable and performant detection logic for Suricata and Snort is the primary focus of the Emerging Threats team; however, on a day-to-day basis we are also looking to improve several other aspects of our rules. One of these aspects is rule metadata, which can give those using our rulesets incredibly useful insight in many ways. More specifically, this post looks at metadata from an exploit signature perspective: how metadata can be used to extract information about exploit signatures, and the ongoing changes we are making to it.

Currently, when a signature is created for an exploit, we apply a confidence level in metadata for the signature based on several factors. One of the major factors here is how dynamic the exploit may be. To better explain this factor, consider the following brief comparison of buffer overflows and use-after-free vulnerabilities from a network detection perspective.

Buffer Overflow

Suppose a web application suffers from a vulnerability in which an HTTP URI parameter in a specific PHP file causes a buffer overflow when it contains more than 300 characters. We can easily write a granular, reliable signature by detecting the presence of the vulnerable parameter and the PHP file in conjunction with the number of bytes required to trigger the overflow.

Here is an example POC of such a scenario (CVE-2004-2466). It uses a different file type, but the premise is the same: https://www.exploit-db.com/exploits/50999

And here is an example signature snippet that could be used to detect this exploitation, based on the information available in the POC:

http.uri; content:"chat.ghp?username="; fast_pattern; pcre:"/^[^&]{200}/R";

We would assign a confidence level of ‘high’ to such a signature. You can be quite lenient with the number of bytes detected here, since it’s highly unlikely that anybody will have a username longer than 200 bytes. Additionally, you can create more generic detections, such as looking for a NOP sled in the URI, which we have rules for in our current live production ruleset. Another detection method relies on byte_test: when bytes at a known location in the packet structure indicate the size of a field, the rule can check whether that size exceeds the expected or maximum value. This is another area where we can apply laser focus for detection with minimal false positive rates.
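To make the logic of the snippet concrete, here is a minimal Python sketch of what the `content` plus relative `pcre` pair checks on a URI string, along with a sketch of the byte_test idea. It is an approximation under stated assumptions, not how Suricata implements matching: URI normalization is skipped, the function names are hypothetical, and the byte_test layout (a 2-byte big-endian length field at a known offset) is an assumed example.

```python
import re
import struct

# Content match from the snippet: the vulnerable file and parameter.
VULN_MARKER = "chat.ghp?username="
# pcre "/^[^&]{200}/R": 200 characters that are not a parameter separator.
# No "^" here: Python's match(string, pos) already anchors at pos,
# which plays the role of the /R (relative) flag.
OVERLONG_VALUE = re.compile(r"[^&]{200}")


def matches_overflow_uri(uri: str) -> bool:
    """Approximate the content + relative pcre pair on a URI string."""
    start = uri.find(VULN_MARKER)
    while start != -1:
        # Check immediately after this occurrence of the content match.
        if OVERLONG_VALUE.match(uri, start + len(VULN_MARKER)):
            return True
        # Try any later occurrence of the content as well.
        start = uri.find(VULN_MARKER, start + 1)
    return False


def length_field_exceeds(payload: bytes, offset: int, max_len: int) -> bool:
    """Approximate byte_test:2,>,<max_len>,<offset>; on a 2-byte big-endian field."""
    if len(payload) < offset + 2:
        return False  # field not present, so nothing to test
    (declared_len,) = struct.unpack_from(">H", payload, offset)
    return declared_len > max_len
```

For example, `matches_overflow_uri("/chat.ghp?username=" + "A" * 250)` is true, while a normal username such as `alice&password=x` is not; likewise, a declared length of 300 against an assumed maximum of 256 trips the byte_test-style check.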

Use-After-Free

On the other end of the detection spectrum, we have Use-After-Free vulnerabilities and their seemingly endless possibilities. Here are a few publicly available POCs for various Use-After-Free exploits that all present the same problem:

https://www.exploit-db.com/exploits/46968
https://www.exploit-db.com/exploits/46205
https://www.exploit-db.com/exploits/45279
https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2021/CVE-2021-1879.html

For our analysis, let’s consider CVE-2021-1879, a Use-After-Free vulnerability in Apple WebKit’s QuickTimePluginReplacement.

At first glance, we have a neatly written POC and things look relatively straightforward. Writing a POC-based signature here is a piece of cake, but that’s not reliable coverage. Diving deeper into this POC, we quickly identify many issues that make reliable detection impossible. If we first consider which aspects of the POC may remain static and be indicative of exploitation, we already have our first problem: much of this POC consists of snippets that are incredibly dynamic. We can’t use variable names in our signature, the values assigned to those variables are easily changed, the remote address is of course dynamic, and all of the methods used are incredibly generic. Imagine writing an exploit signature in which the only static content is ".getElementById(" and "window.requestAnimationFrame(". Do you see the issue?
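A quick sketch of why that matters: the Python below treats those two strings as the entire detection (a plain substring check standing in for `content` matches) and runs it against a short, hypothetical snippet of ordinary animation code. The names `naive_static_match` and `BENIGN_ANIMATION` are illustrative only.

```python
# The only POC strings that look stable across rewrites.
STATIC_STRINGS = (".getElementById(", "window.requestAnimationFrame(")


def naive_static_match(js: str) -> bool:
    """True if every static string appears: a stand-in for plain content matches."""
    return all(s in js for s in STATIC_STRINGS)


# Hypothetical, ordinary page script: moves an element on every frame.
BENIGN_ANIMATION = """
const box = document.getElementById("box");
function step(ts) {
  box.style.left = (ts / 10) % 300 + "px";
  window.requestAnimationFrame(step);
}
window.requestAnimationFrame(step);
"""

print(naive_static_match(BENIGN_ANIMATION))  # True: a "hit" on benign code
```

Any page that animates an element by ID satisfies both strings, so a signature built only on them would fire constantly on benign traffic.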

Next, we can consider which dynamic aspects we might cover in more creative ways, such as with PCRE. Before we even begin, we’re already close to falling into the trap of a POC-based signature; let me explain why. Many aspects of the POC could be covered with some PCRE magic. For example, we can write PCRE to detect a possible garbage collector (code written to force garbage collection), and the same goes for snippets of code designed to force JIT compilation in other, similar vulnerability classes, but I digress. Suppose we write some PCRE looking for a possible garbage collector. It may look like the following, but how many different ways do you think you could write a garbage collector in JS?

for\s*\(let\s*(?P<gc_counter>[A-Za-z0-9_-]{1,20})\s*=\s*\d{1,8}\s*\x3b\s*(?P=gc_counter)\s*(?:<|>)\s*(?:0x)?\d{2,}\s*\x3b\s*(?P=gc_counter)(?:\+{2}|-{2})\s*\)
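To see how brittle this is, the following Python sketch compiles the expression above unchanged (Python’s `re` module accepts the same `(?P<name>...)` and `(?P=name)` named-group syntax) and runs it against a few hypothetical rewrites of the same garbage-collection loop. Only the variant written the way the pattern expects matches; the variant labels and loop bodies are made up for illustration.

```python
import re

# The garbage collector PCRE from above, split across two raw strings.
GC_LOOP = re.compile(
    r"for\s*\(let\s*(?P<gc_counter>[A-Za-z0-9_-]{1,20})\s*=\s*\d{1,8}\s*\x3b\s*"
    r"(?P=gc_counter)\s*(?:<|>)\s*(?:0x)?\d{2,}\s*\x3b\s*(?P=gc_counter)(?:\+{2}|-{2})\s*\)"
)

# Hypothetical, functionally equivalent ways to write the same loop.
VARIANTS = {
    "for/let, as in the POC": "for (let i = 0; i < 100000; i++) { new ArrayBuffer(0x100); }",
    "var instead of let": "for (var i = 0; i < 100000; i++) { new ArrayBuffer(0x100); }",
    "i += 1 instead of i++": "for (let i = 0; i < 100000; i += 1) { new ArrayBuffer(0x100); }",
    "while loop": "let i = 0; while (i < 100000) { new ArrayBuffer(0x100); i++; }",
    "hex bound with letters": "for (let i = 0; i < 0xffff; i++) { new ArrayBuffer(0x100); }",
}


def gc_pattern_hits(variants):
    """Map each variant label to whether the PCRE finds a match."""
    return {label: bool(GC_LOOP.search(code)) for label, code in variants.items()}


for label, hit in gc_pattern_hits(VARIANTS).items():
    print(f"{label}: {hit}")
```

Each trivial rewrite (a different declaration keyword, a different increment, a different loop construct, or a hex bound containing letters) slips past the pattern, and every fix to cover one of them widens the expression toward matching benign loops.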

What should we do next? We could use PCRE and named capture groups to detect various interesting variables and where they are used, but now we’re definitely writing a POC-based signature. There are many ways to write this same exploit, and writing a signature based on the structure of the code is a quick way to drastically reduce your true positive rate. Time to take a step back and think about what we are trying to achieve. According to the provided write-up, the vulnerability is essentially triggered because an object is freed by the garbage collector while references to it remain afterward, meaning that the freed (now corrupted) object is still accessible and points to memory that has already been freed. To reliably detect malicious activity, we would need to monitor object types during runtime, which is not possible with an IDS.

A full POC-based signature for this vulnerability may look like this:

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"EXPLOIT [CVE-2021-1879] - Use-After-Free in QuickTimePluginReplacement"; flow:established,from_server; file.data; content:"window.requestAnimationFrame|28|"; fast_pattern; content:"var"; pcre:"/^\s*(?P<worker>[A-Za-z0-9_-]{1,20})\s*=\s*null\x3b.{1,300}(?P=worker)\s*=\s*document\.getElementById\(\x22(?P=worker)\x22\)\x3b.{1,300}\.addEventListener\(\x22DOMNodeInserted\x22\s*,\s*(?P<callback0>[A-Za-z0-9_-]{1,20}).{0,300}(?P=worker)(?P<worker_ext>(\.\w{1,20})+)\s*=\s*\d+\x3b.{1,300}function\s*(?P=callback0)\([^\)]+\)\s*\{\s*.{1,300}\.requestAnimationFrame\((?P<callback>[A-Za-z0-9_-]{1,20})\)\x3b.{1,300}function\s*(?P<garbagecollector>[A-Za-z0-9_-]{1,20})\(\)\s*\{\s*.{0,100}for\s*\(let\s*(?P<gc_counter>[A-Za-z0-9_-]{1,20})\s*=\s*\d{1,8}\s*\x3b\s*(?P=gc_counter)\s*(?:<|>)\s*(?:0x)?\d{2,}\s*\x3b\s*(?P=gc_counter)(?:\+{2}|-{2})\s*\)\s*.{1,300}function\s*(?P=callback)\([^\)]+\)\s*\{\s*.{1,300}(?P=garbagecollector)\(\)\s*\x3b\s*.{1,300}\((?P=worker)(?P=worker_ext)\)/Rs"; classtype:exploit; sid:90000000; rev:1;)

The immediate downside? Reorder the functions without changing any of the code and you have a bypass.
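A cut-down illustration of that bypass in Python: the hypothetical pattern below keeps only one ordering constraint from the full rule, namely that the garbage-collector function is defined before a callback that calls it. Because JavaScript hoists function declarations, swapping the two definitions changes nothing at runtime, yet the pattern stops matching. This is a simplified fragment written for illustration, not the full signature, and the JS snippets are made up.

```python
import re

# Hypothetical fragment of the full rule's ordering logic: a function with no
# parameters that starts a "for (let" loop (the garbage collector) must be
# defined BEFORE a function with parameters whose body calls it.
ORDERED_FRAGMENT = re.compile(
    r"function\s*(?P<gc>\w{1,20})\(\)\s*\{.{0,100}for\s*\(let"
    r".{1,300}function\s*\w{1,20}\([^)]+\)\s*\{.{1,300}(?P=gc)\(\)",
    re.S,  # like the rule's /s flag, let "." cross newlines
)

GC_FIRST = """
function gc() { for (let i = 0; i < 1000; i++) { new ArrayBuffer(0x1000); } }
function cb(ts) { gc(); }
"""

# Same two declarations, swapped. Function declarations are hoisted in
# JavaScript, so the script behaves the same, but the ordered pattern fails.
CB_FIRST = """
function cb(ts) { gc(); }
function gc() { for (let i = 0; i < 1000; i++) { new ArrayBuffer(0x1000); } }
"""

print(bool(ORDERED_FRAGMENT.search(GC_FIRST)))  # True
print(bool(ORDERED_FRAGMENT.search(CB_FIRST)))  # False
```

The full rule chains many such ordered constraints, so there are many independent places where a simple reordering defeats it.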

The problem is that we’d be writing network detections on code in transit without any knowledge of what is truly happening. An IDS cannot query object types or memory allocations while the code runs, so we can’t reliably detect such exploits over the wire; endpoint detection should be used instead. Additionally, to consider these types of vulnerabilities ‘covered’, we’d have to write signatures for every variation. How many ways do you think you could write the exploit POCs above? Exactly.

Such exploits will always be assigned a confidence level of ‘low’. The only real solution would be for an IDS, Suricata for example, to spin up a virtual instance of the affected engine and query the elements crucial to detecting exploitation activity. Imagine your Suricata instance creating a JavaScript engine VM (such as V8) every time the static components of the signature above match on random benign traffic. This is not likely to happen.

Metadata and Finding Exploit Signatures

On a more positive note, Emerging Threats rules contain metadata that can be useful for identifying specific exploit signatures; however, until now, not every signature designed with exploitation in mind has been easy to find. Many signatures in other rule categories, such as INFO and HUNTING (categories we know many people have disabled by default), are incredibly powerful for detecting exploit activity.

Today, I have started adding ‘possible_exploitation’ to the ‘external tag’ metadata tag for such rules. Grepping the ruleset for this tag surfaces the more vague or generic exploit-hunting signatures that you would not otherwise see when grepping for specific CVEs.
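As a sketch of that search that is slightly stricter than a plain grep, the Python below prints every rule whose `metadata` keyword contains `possible_exploitation` (case-insensitively), including commented-out, disabled rules. It assumes one rule per line and that the tag appears as a literal token inside `metadata:...;`; the function name `tagged_rules`, the script name in the usage comment, and the exact tag formatting are assumptions, not part of the ruleset tooling.

```python
import re
import sys

# Assumption: one rule per line, and the tag shows up as the literal token
# "possible_exploitation" somewhere inside "metadata:...;".
METADATA = re.compile(r"metadata:\s*([^;]*);")


def tagged_rules(lines, tag="possible_exploitation"):
    """Yield rules (enabled or commented out) whose metadata contains tag."""
    for line in lines:
        rule = line.strip()
        match = METADATA.search(rule)
        if match and tag.lower() in match.group(1).lower():
            yield rule


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_tagged.py <rules_file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as rules_file:
        for rule in tagged_rules(rules_file):
            print(rule)
```

Restricting the match to the metadata keyword avoids hits on rules that merely mention the phrase in a message or content match.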

Here’s an example of a rule now tagged with ‘possible_exploitation’. This rule is now available in the OPEN ruleset:

#alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ETPRO HUNTING V8 JavaScript Engine JIT Forcing Observed - Investigate Possible Exploitation M2"; flow:from_server,established; http.stat_code; content:"200"; file.data; pcre:"/\s(?P<count_var>[\w\-.]{1,30})\s*=\s*(0x[a-f0-9]{3,12}|\d{4,10}).{1,500}(?P<jit_func>[\w\-.]{1,30})\s*=\s*.{1,500}for\s*\(var\s*(?P<counter>[\w-]{1,20})\s*=\s*\d+\s*\x3b\s*(?P=counter)\s*<\s*(?P=count_var)\s*\x3b\s*\+{2}(?P=counter)\s*\)\s*(?P=jit_func)\(/i"; classtype:unknown; sid:2850489; rev:1; metadata:created_at 2021_11_18, former_category HUNTING, updated_at 2021_11_18;)

On the topic of grepping for specific CVEs, I’d like to point out that it’s valuable to grep both the signature message and the specific ‘cve’ metadata tag.

For CVE message: grep -iE 'msg:"[^"]+CVE-' <rules_file>

For CVE metadata tag: grep -i 'cve cve_' <rules_file>
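The two searches can also be combined. This hypothetical Python helper applies the same two patterns as the grep commands above to each rule line (assuming one rule per line) and collects the SIDs matched by each. The set difference then shows rules that carry a CVE ID only in metadata, like the IOC-based signatures discussed in the next section. The function name and sample rules are illustrative.

```python
import re

# The same two patterns as the grep commands, case-insensitive.
MSG_CVE = re.compile(r'msg:"[^"]+CVE-', re.I)
META_CVE = re.compile(r"cve cve_", re.I)
SID = re.compile(r"\bsid:\s*(\d+)\s*;")


def cve_coverage(lines):
    """Return (SIDs with a CVE in msg, SIDs with cve metadata)."""
    in_msg, in_meta = set(), set()
    for line in lines:
        sid = SID.search(line)
        if not sid:
            continue  # not a rule line (blank, comment text, etc.)
        if MSG_CVE.search(line):
            in_msg.add(sid.group(1))
        if META_CVE.search(line):
            in_meta.add(sid.group(1))
    return in_msg, in_meta


rules = [
    'alert http any any -> any any (msg:"ET EXPLOIT Example CVE-2021-1879 Attempt"; sid:100; rev:1; metadata:cve CVE_2021_1879;)',
    'alert dns any any -> any any (msg:"ET INFO Example Domain in DNS Lookup"; sid:101; rev:1; metadata:cve CVE_2021_1879;)',
    'alert http any any -> any any (msg:"ET INFO Unrelated"; sid:102; rev:1;)',
]
in_msg, in_meta = cve_coverage(rules)
print(sorted(in_meta - in_msg))  # ['101']: CVE only in metadata
```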

Missing Metadata

You may notice that some signatures contain a CVE ID in metadata but not in the signature message, and that some of these signatures sit in less severe classtypes than expected. This is intentional, though the reason may not be obvious. When evaluating signatures based on exploits, we try to consider the response to seeing a signature fire. A good example is IOC-based signatures (DNS/TLS) for infrastructure observed using exploits in the wild (ITW). These signatures may have a CVE ID added to their metadata to show that the infrastructure was observed in certain exploit campaigns, but they are not indicative of the exploit itself.

How we use metadata, both for exploits and across the ruleset as a whole, is something we are constantly discussing, revising, and improving. Internal discussions about clarifying our metadata usage are frequent, and we will continue to make updates to metadata going forward.

Finally, if you have metadata suggestions or questions about how we do things, please feel free to contact us through our feedback portal, or contact me directly on Twitter, and I’ll respond as soon as possible.

Written on June 9, 2023