As I’m sure many of you know, I was scheduled to talk at a couple of different conferences (SANS Threat Hunting Summit and x33fcon), but was unable to because I needed open heart surgery. Not wanting all of the preparation work for these conferences to go to waste, I posted the slides here: https://www.slideshare.net/JackCrook/billions-billions-of-logs (if you have not looked at these slides, I would suggest that you do before continuing). After posting, I received some great responses to the slides and wanted to write a blog post discussing a few of my thoughts, since I doubt I will ever give the presentation. I also wanted to exercise my mind a bit by focusing on one of the things that I love to do (hunting). I’ve been exercising my body every day and it has responded well, so the same should be true for my mind, right?
We know that companies are consuming huge amounts of data and that the people tasked with hunting through these logs often struggle with the volume. The question is: How do we, as hunters and defenders, navigate this sea of data and still be able to find what our current detection technologies aren’t finding? This is not an easy task, but I think a lot of it comes down to the questions that we ask.
What am I looking for?
I think we often try to generalize the things we are hunting for. This generalization can lead to broader queries and introduce vast amounts of unneeded logs into your results. Know exactly what it is that you are looking for and, to the best of your knowledge, what it would look like in the logs that you have. Keep in mind that the smaller the “thing” you are trying to find, the more focused your query can become.
Why am I looking for it?
Define what is important to your org and scope your hunts accordingly. Once you define what you will be hunting, break that down into smaller pieces and tackle those one at a time. Prioritize and continue to iterate through your defined process.
How do I find it?
Don’t get tunnel vision. If your queries are producing far too many results, know that they will be very difficult to transition to automated alerting. Can you look at the problem you are trying to solve differently and therefore produce different results? This is, by the way, the topic for the rest of this post.
Take KC7 (stage 7 of the kill chain, actions on objectives) as an example. Often the actions that attackers take are the same actions taken legitimately by users millions upon millions of times a day. From authenticating to a domain controller to mounting a network share, these are legitimate actions that can be seen every minute of every day on a normal network. They can also be signs of malicious activity. So how can we tell the difference? This is where looking at the problem differently comes into play. Before we delve into that, though, we need to look at attackers and how they may operate. We can then use what we think we know about them to our advantage.
I used the following hypotheses in my slides, and I think you would see some or all of them hold true in the majority of intrusions.
- Comprised of multiple actions.
- Actions typically happen over short time spans.
- Will often use legitimate Windows utilities.
- Will often use tools brought in with them.
- Will often need to elevate permissions.
- Will need to access multiple machines.
- Will need to access files on a filesystem.
Based on the above hypotheses we are able to derive attacker needs.
- Execution
- Credentials
- Enumeration
- Authentication
- Data Movement
I also know that during an active intrusion, attackers are very goal focused. Their actions can typically be associated with one of the above needs, and you may see multiple needs from a single entity in very short time spans. If you contrast that with how a normal user looks on a network, the behavior is typically very different. We can use these differences to our advantage: instead of looking for individual actions that would likely produce millions of results, we can look for patterns of behavior that fall into multiple needs. By looking for indications of needs and chaining them together by time and an additional common entity, such as source host, destination host, or user, we can greatly reduce the amount of data we need to hunt through.
So how do we get there?
- Develop queries for specific actions based on attacker needs
a. Accuracy of query is key
b. Volume of output is not
- Enhance data with queries from detection technologies
- Store output of queries in central location
- Each query makes a link
- The sum of links makes up a chain
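
To make the idea of links and chains a bit more concrete, here is a minimal Python sketch. It is not the implementation from my slides, just one way the steps above could look. The `Link` record, the `find_chains` function, the 30 minute window, and the threshold of three distinct needs are all assumptions made up for illustration, and the sample data is invented. It assumes each query has already written its hits to a central location as a need, a common entity, and a timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Link:
    """One hit from one query (hypothetical record layout)."""
    need: str            # attacker need the query targets, e.g. "Credentials"
    entity: str          # common entity, e.g. source host or user
    timestamp: datetime  # when the action was logged


def find_chains(links, window=timedelta(minutes=30), min_needs=3):
    """Return {entity: [links]} for entities that show at least
    `min_needs` distinct needs within a single `window` of time."""
    by_entity = defaultdict(list)
    for link in links:
        by_entity[link.entity].append(link)

    chains = {}
    for entity, items in by_entity.items():
        items.sort(key=lambda l: l.timestamp)
        start = 0
        for end in range(len(items)):
            # Slide the start forward until the span fits in the window.
            while items[end].timestamp - items[start].timestamp > window:
                start += 1
            span = items[start:end + 1]
            if len({l.need for l in span}) >= min_needs:
                chains[entity] = span  # keep the first qualifying span
                break
    return chains


# Invented sample data: host-a shows three needs in 12 minutes,
# host-b only authenticates twice, hours apart.
t0 = datetime(2000, 1, 1, 9, 0)
links = [
    Link("Enumeration", "host-a", t0),
    Link("Credentials", "host-a", t0 + timedelta(minutes=5)),
    Link("Authentication", "host-a", t0 + timedelta(minutes=12)),
    Link("Authentication", "host-b", t0),
    Link("Authentication", "host-b", t0 + timedelta(hours=4)),
]

chains = find_chains(links)
print(sorted(chains))  # ['host-a']
```

Notice that no single query here needs to be low volume. The authentication query alone would be useless for alerting, but once it is chained with other needs from the same entity in a short time span, what is left to review is small.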
Attackers will continue to operate in our environments and the amount of data that we collect will continue to grow. Being able to come up with creative ways to utilize this data and still find evil is essential. I would love to know your thoughts on this or if you have other ways of dealing with billions and billions of logs. Feel free to reach out in the comment section or on Twitter @jackcr.
I would also like to thank everyone who has reached out to me over the past weeks. You are all awesome and have helped and inspired me more than you will ever know. I can’t thank you all enough. I would also like to thank SANS and x33fcon for giving me the opportunity to speak and for being amazing when I had to cancel!