Sunday, May 17, 2020

It's all in the numbers


In my last few posts I talked about hunting for anomalies in network data.  I wanted to expand on that a bit and specifically talk about a way we can create metadata around detectable events and use those additional data points for hunting or anomaly detection.  The hope is that the metadata will help point us to areas of investigation that we might not otherwise pursue.

For this post I'm again using the BOTS data from Splunk and I've created several saved searches based on behaviors we may see during an intrusion.  Once the saved searches run, the output results are logged to a summary index.  More on that topic can be found here: http://findingbad.blogspot.com/2017/02/hunting-for-chains.html.  The goal is to get all of our detection data into a queryable location and into a form that we can count.

For our saved searches we want to ensure the following.

Create detections based on behaviors:
  • Focus on accuracy regardless of fidelity.
  • A field that will signify an intrusion phase where this detection would normally be seen.
  • A field where a weight can be assigned based on criticality.
  • A common field that can be found in each detection output that will identify the asset or user (src_ip, hostname, username...).
Once the output of our saved searches begins to populate the summary index we would like to have results similar to the screenshot below:

The following is the definition of the fields (a short sketch showing one way these values might be computed follows the list):
(Note: the events in the screenshot have been deduped.  All calculations have taken place, but I am limiting the number of rows.  Much of what is identified in the output is data from the last detection before the dedup occurred.)
  • hostname: Self explanatory, but I am also using the src_ip where the hostname can't be determined.
  • source: The name of the saved search.
  • weight: Number assigned that represents criticality of event.
  • phase: Identifier assigned for phase of intrusion.
  • tweight: The sum weight of all detected events.
  • dscount: The distinct count of unique detection names (source field).
  • pcount: The number of unique phases identified.
  • scount: Total number of detections identified.
  • phasemult: An additional value given for number of unique phases identified where that number is > 1.
  • sourcemult: An additional value given for number of unique sources identified where that number is > 1.
  • weighted: The sum score of all values from above.
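As a rough illustration of how these derived values might be computed, here is a minimal pandas sketch.  The example events, weights, and the bonus scheme for phasemult and sourcemult are assumptions; in practice these calculations happen in the Splunk search against the summary index.

import pandas as pd

# Hypothetical detection events pulled from the summary index.  Columns mirror
# the fields described above: hostname, source (saved search name), weight, phase.
events = pd.DataFrame([
    {"hostname": "host1", "source": "powershell_download", "weight": 3, "phase": "delivery"},
    {"hostname": "host1", "source": "new_local_admin", "weight": 4, "phase": "persistence"},
    {"hostname": "host1", "source": "smb_exec", "weight": 4, "phase": "lateral_movement"},
    {"hostname": "host2", "source": "rare_user_agent", "weight": 1, "phase": "c2"},
])

summary = events.groupby("hostname").agg(
    tweight=("weight", "sum"),        # sum weight of all detected events
    dscount=("source", "nunique"),    # distinct count of detection names
    pcount=("phase", "nunique"),      # number of unique phases
    scount=("source", "count"),       # total number of detections
).reset_index()

# Assumed bonus scheme: award the phase/source count itself when it is > 1.
summary["phasemult"] = summary["pcount"].where(summary["pcount"] > 1, 0)
summary["sourcemult"] = summary["dscount"].where(summary["dscount"] > 1, 0)

# weighted: the sum of all of the values above.
summary["weighted"] = summary[["tweight", "dscount", "pcount", "scount",
                               "phasemult", "sourcemult"]].sum(axis=1)
print(summary.sort_values("weighted", ascending=False))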
There are a few points that I want to discuss around the additional fields that I've assigned and the reasons behind them.
  • Phases (phase,pcount,phasemult): Actors or insiders will need to step through multiple phases of activity before data theft occurs.  Identifying multiple phases in a given period of time may be an indicator of malicious activity.
  • Sources (source,scount,dscount,sourcemult): A large number of detections may be less concerning if all detections are finding the same activity over and over.  Actors or insiders need to perform multiple steps before data theft occurs, and therefore a smaller number of detections that span different actions would be more concerning.
  • Weight: Weight is based on criticality.  If I see a large weight with few detections, I can assume the behavior may have a higher likelihood of being malicious.
  • Weighted: High scores tend to reflect a larger number of identified behaviors, where those behaviors span multiple phases.
Now that we've performed all of these calculations and have a good understanding of what they are, we can run k-means and cluster the results.  I downloaded a csv from the Splunk output and named it cluster.csv.  Using the code below you can see I chose 3 clusters using the tweight, phasemult and scount fields.  I believe that the combination of these fields can be a good representation of anomalous behavior (I could also plug in other combinations and potentially surface other behaviors).
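A minimal sketch of that clustering step (assuming the exported cluster.csv contains the hostname, tweight, phasemult, scount and weighted fields described above) might look something like this:

import pandas as pd
from sklearn.cluster import KMeans

# cluster.csv is the CSV exported from the Splunk output above.
df = pd.read_csv("cluster.csv")

# Cluster on the combination of fields chosen to represent anomalous behavior.
X = df[["tweight", "phasemult", "scount"]].apply(pd.to_numeric, errors="coerce").fillna(0)

km = KMeans(n_clusters=3, random_state=0)
df["cluster"] = km.fit_predict(X)

# Review the members of each cluster (columns assumed from the export).
for label, members in df.groupby("cluster"):
    print(f"--- cluster {label} ---")
    print(members[["hostname", "tweight", "phasemult", "scount", "weighted"]])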

The following is the contents of those clusters.

Based on the output, the machine in cluster 1 definitely should be investigated.  I would investigate the machines in cluster 2 as well.

Granted, this is a fairly small data set, but it is a great representation of what can be done in much larger environments.  The scheduling of this method could also be automated, with the results actioned, correlated, alerted on, etc.

Again I would like to thank the Splunk team for producing and releasing BOTS.  It's a great set of data to test with and learn from.




Thursday, May 7, 2020

Hunting for Beacons Part 2


In my last post I talked about a method of hunting for beacons using a combination of Splunk and K-Means to identify outliers in network flow data.  I wanted to write a quick update to that post so that I can expand on a few things.

In that blog post I gave these different points that help define general parameters that I can begin to craft a search around.  This helps to define what it is I'm trying to look for and, in a way, builds a sort of framework that I can follow as I begin looking for this behavior.

  1. Beacons generally create uniform byte patterns
  2. Active C2 generates non uniform byte patterns
  3. There are far more flows that are uniform than non uniform
  4. Active C2 happens in spurts
  5. These patterns will be anomalous when compared to normal traffic

Using the definition above, I went out and built a method that will identify anomalous traffic patterns that may indicate malicious beaconing.  It worked well for the sample data I was using, but when implementing this method against a much larger dataset I had problems.  There were far more anomalous data points, and therefore the fidelity of the data I was highlighting was much lower (if you haven't read my last post I would recommend it).  The other issue was that it took much longer to pivot into the data of interest and then try to understand why that pattern was identified as an outlier.  I then decided to see if I could take what I was trying to do with k-means and build it into my Splunk query.  Here is what I came up with:

The entire search looks like this:


index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80) |stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip |eval beacon_avg=('beacon_count' / 'total_count') |stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eval incount=mvcount(bytes_in) |join dest_ip [|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") |stats values(login) as login by dest_ip |eval login_count=mvcount(login)] |eventstats avg(beacon_count) as overall_average |eval beacon_percentage=('beacon_count' / 'overall_average') |table src_ip,dest_ip,bytes_out,beacon_count,beacon_avg,beacon_percentage,overall_average,unique_count,total_count,incount,login_count |sort beacon_percentage desc

Breaking it down:


Collect the data that will be parsed:
  • index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80)


Count the number of times each unique byte size occurs between a src and dst:
  • |stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out


Count the total number of times all byte sizes occur regardless of size, along with the distinct number of unique byte sizes:
  • |eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip


Calculate the share of traffic each src,dst,byte size accounts for when compared to all traffic between the src,dst:
  • |eval beacon_avg=('beacon_count' / 'total_count')


Define fields that may be manipulated, tabled, counted:
  • |stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out


Count the number of unique bytes_in sizes between src,dst,bytes_out.  Can be used to further define parameters with respect to beacon behavior:
  • |eval incount=mvcount(bytes_in)


*** Everything below is in addition to the original query ***

Generally there will be a limited number of users beaconing to a single destination.  If this query is looking at an authenticated proxy, this will count the total number of users communicating with the destination (this can also add a lot of overhead to your query):
  • |join dest_ip [|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") |stats values(login) as login by dest_ip |eval login_count=mvcount(login)]


Calculate the average number of counts between all src,dst,bytes_out:
  • |eventstats avg(beacon_count) as overall_average


Calculate the volume percentage by src,dst,bytes_out based off the overall_average:
  • |eval beacon_percentage=('beacon_count' / 'overall_average')


And the output from the Splunk botsv3 data:

You can see from the output above that the first 2 machines were ones identified as compromised.  The volume of their beacons was 1600 and 400 times more than the average volume of traffic between src,dst,bytes_out.  By adding the bottom portion of the search I've basically built the outlier detection into the query.  You could even add a parameter to the end of the search like "|where beacon_percentage > 500" and only surface anomalous traffic.  Also, by adjusting the numbers in these fields you can really turn the levers and tune the query to different environments.

(beacon_count,beacon_avg,beacon_percentage,overall_average,unique_count,total_count,incount,login_count)

If you were to apply this to proxy data you could also run multiple queries based on category.  This may increase the speed and take some of the load off Splunk.

I've also not given up on K-Means.  I just pivoted to using a different method for this.

Friday, May 1, 2020

Hunting for Beacons


A few years ago I wrote a post about ways that you can correlate different characteristics of backdoor beaconing.  By identifying and combining these different characteristics you may be able to identify unknown backdoors and possibly generate higher fidelity alerting.  The blog can be found here: http://findingbad.blogspot.com/2018/03/c2-hunting.html

What I didn't talk about was utilizing flow data to identify C2.  With the use of SSL or encrypted traffic you may lack the required data to correlate different characteristics and need to rely on other sources of information.  So how do we go hunting for C2 in network flows?  First we need to define what that may look like.

  1. Beacons generally create uniform byte patterns
  2. Active C2 generates non uniform byte patterns
  3. There are far more flows that are uniform than non uniform
  4. Active C2 happens in spurts
  5. These patterns will be anomalous when compared to normal traffic

I've said for a long time that one way to find malicious beaconing in network flow data is to look for patterns of beacons (uniform byte patterns) and alert when the patterns drastically change (non uniform byte patterns).  The problem I had was figuring out how to do just that with the tools I had.  I think we (or maybe just me) often get stuck on a single idea.  When we hit a roadblock we lose momentum and can eventually let the idea go, though it may remain in the back of our heads.

Last week I downloaded the latest Splunk BOTS data source and loaded it into a Splunk instance I have running on a local VM.  I wanted to use this to explore some ideas I had using Jupyter Notebook.  That's when the light went off.  Below is what I came up with.

This Splunk search performs the following:

  1. Collects all flows that are greater than 0 bytes
  2. Counts the number of flows by each unique byte count by src_ip, dest_ip, and dest_port (i_bytecount)
  3. Counts the total number of flows between src_ip, dest_ip (t_bytecount)
  4. Counts the unique number of byte counts by src_ip, dest_ip (distinct_byte_count)
  5. Generates a percentage of traffic by unique byte count between src_ip, dest_ip (avgcount)

The thought being that a beacon will have a high percentage of the overall traffic between 2 endpoints.  Active C2 will be variable in byte counts, which is represented by distinct_byte_count.
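A rough pandas sketch of the same aggregation (assuming flow records exported with src_ip, dest_ip, dest_port and bytes_out columns) might look something like this:

import pandas as pd

# Flow records exported from Splunk (stream:tcp / stream:ip); column names assumed.
flows = pd.read_csv("flows.csv")

# 1. Keep only flows with more than 0 bytes out.
flows = flows[flows["bytes_out"] > 0]

# 2. Number of flows for each unique byte count by src_ip, dest_ip, dest_port (i_bytecount).
grouped = (flows.groupby(["src_ip", "dest_ip", "dest_port", "bytes_out"])
                .size().reset_index(name="i_bytecount"))

# 3/4. Total flows and distinct byte counts between src_ip, dest_ip.
totals = grouped.groupby(["src_ip", "dest_ip"]).agg(
    t_bytecount=("i_bytecount", "sum"),
    distinct_byte_count=("bytes_out", "nunique"),
).reset_index()
grouped = grouped.merge(totals, on=["src_ip", "dest_ip"])

# 5. Percentage of traffic represented by each unique byte count (avgcount).
grouped["avgcount"] = grouped["i_bytecount"] / grouped["t_bytecount"]

# Same shape as the CSV used by the clustering code below.
grouped.to_csv("ByteAvgs1.csv", index=False)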

I then wanted to identify anomalous patterns (if any) within this data.  For this I used K-Means clustering, as I wanted to see if there were patterns that were outside of the norm.  Using the following Python code:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 - registers the 3d projection
from sklearn.cluster import KMeans

# Load the CSV exported from the Splunk search and force the fields to numeric.
df = pd.read_csv("ByteAvgs1.csv")
df['t_bytecount'] = pd.to_numeric(df['t_bytecount'], errors='coerce')
df['i_bytecount'] = pd.to_numeric(df['i_bytecount'], errors='coerce')
df['avgcount'] = pd.to_numeric(df['avgcount'], errors='coerce')
df['distinct_byte_count'] = pd.to_numeric(df['distinct_byte_count'], errors='coerce')
df['bytes_out'] = pd.to_numeric(df['bytes_out'], errors='coerce')

# Cluster on the beacon percentage, total flow count and distinct byte counts.
X = df[['avgcount', 't_bytecount', 'distinct_byte_count']]
X = X.reset_index(drop=True)
km = KMeans(n_clusters=2)
km.fit(X)
labels = km.labels_

# Plot all three fields in 3D, colored by cluster label.
fig = plt.figure(1, figsize=(7, 7))
ax = fig.add_subplot(projection='3d')
ax.view_init(elev=48, azim=134)
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], X.iloc[:, 2],
           c=labels.astype(float), edgecolor="k")
ax.set_xlabel("Beacon Percentage")
ax.set_ylabel("Total Count")
ax.set_zlabel("Unique")
plt.title("K Means", fontsize=14)
plt.show()


I was able to visualize the following clusters:

While the majority of the traffic looks normal, there are definitely a few outliers.  The biggest outlier based on the Beacon Percentage and Total Count is:

There were 3865 flows, with 97% of them being the same byte count.  There were also 19 unique byte counts between these 2 IPs.

Taking a quick look into the IP, we can assume that this machine was compromised based on the command for the netcat relay (it will take more analysis to confirm):

Obviously this is a quick look into a limited data set and needs more runtime to prove it out.  Though it does speak to exploring new ideas and new methods (or in this case, old ideas and new methods).  You never know what you may surface.

I'd also like to thank the Splunk team for making the data available to everyone.  If you would like to download it, you can find it here: https://www.splunk.com/en_us/blog/security/botsv3-dataset-released.html.


Wednesday, October 3, 2018

This Day 25 Years Ago

This is definitely a different post for me, but today marks 25 years since Operation Gothic Serpent, or what has become known as Blackhawk Down.  This also marks a significant point in my life and one that will remain in my thoughts daily for the next 25 years.  I was in the 24th Infantry Division and my company had just assumed the role of immediate ready company and my platoon was immediate ready platoon (kind of an on-call status, like in IT, but with the possibility of becoming much, much more intense).  As part of this immediate ready platoon, we were tasked with being anywhere in the world within 18 hours if needed.  To my knowledge, that day was the first time a mechanized infantry platoon was deployed with that kind of speed.  They always say it helps to talk about things, so I would like to talk about that day from my point of view.

My first memory of that day was around 2AM.  I remember being woken up by my beeper and thinking it was a training alert (there were no cell phones back then 😃).  We had just returned from spending 4 weeks in the field and my assumption was a test.  Leadership was testing their ability to contact everyone.  I figured a simple phone call and I would be back in bed.  I made the phone call and was informed that I needed to be in formation in 30 minutes.  A little upset that they were taking it this far, I got dressed, kissed my wife of 6 months goodbye, and told her I would be back soon.

When I arrived at work I saw the look on people’s faces.  Serious, scared, anxious.  These weren’t the faces I’d seen during routine alerts.  I began hearing of CNN’s reporting on the events happening in Mogadishu.  This was quickly turning into the day I totally wasn’t expecting.  Shortly after I had arrived we were told we needed to draw weapons before formation.  I remember that pit in my stomach, knowing where I was heading and knowing that I would be leaving my wife, who had only lived in the area for 5 weeks, behind.  I hadn’t said goodbye to her and prayed that I would be able to at some point before we left.

During formation it was confirmed that we were being deployed to Somalia.  3rd PLT 3/15 INF (which was my platoon) would be the immediate ready platoon and we were to head over to the gym where we would receive our shots, have wills drawn up, and take care of any other legal needs.  If you haven’t experienced the vaccine process, it’s like an assembly line.  Going from station to station until you eventually reach the end.

The next few hours are kind of vague.  I believe a lot of it was hurry up and wait, but they were probably dealing with some logistical issues.  At any rate, my squad leader allowed me to go home for a few minutes so I could let my wife know what was going on.  While I was at home I was also able to call my parents and let them know too.  I really appreciated that I was able to see her and talk to her.  Soon I had to head back though.  I knew the buses would be arriving that would take us to Hunter Army Airfield where our gear and vehicles were prestaged.  The hardest thing in my life was saying goodbye.  Scared for her because she would be alone in a place she didn’t know.  Wondering if I would ever see her again and knowing she was thinking the same thing.  Finally turning and walking away was absolutely heart-wrenching for me.

Soon the buses did show up.  We all boarded and made our way from Ft. Stewart to Savannah.  I watched the cars out of the window and the people walking around.  I thought about how different our lives were at that point in time.  They were headed to the park and I had no idea what I was heading to.  They could plan the rest of their day and I wasn’t sure what I could plan.  It was ok though.  I had been in Desert Storm and knew that it could happen at any time again.  There would have been no way that I would not have gone.  That’s not what we do.  Arriving at Hunter Army Airfield I saw the 2 C5 Galaxies that would take us on our journey.  The Bradleys hadn’t been loaded yet, but they soon would be and we would be on our way.

From the buses we moved to some underground barracks where we were issued ammo.  Once our magazines were loaded we moved to the range where we had the opportunity to zero our weapons.  This is the one time you want to make sure it’s dead on (pun intended).

Soon we were able to board the plane.  If you’ve never been in one of these, it’s hard to explain how big they are.  The seating is above the cargo area and holds about 40 people.  There is one window seat and everyone faces backwards.  I was the lucky one and had the window seat.  It kept me occupied during the trip, even if I was only looking at water much of the time.  I couldn’t sleep and it was good to have something to do.  As we got closer we had people go down to the cargo area where 2 of the 4 Bradleys were.  They loaded the 25mm chain gun as well as missiles into the launcher.  I don’t believe anyone knew what to expect when we landed and the nose of the plane opened up.  I can tell you that nobody wanted to be caught off guard though.

The flight was many many hours.  As we got closer to landing you could see the focus of people change.  It went from talking and joking around to determination and anticipation.  It was time to do the job we were sent to do.  Whatever that may turn out to be.  I would spend the next 5 months in Somalia.  


My experience that day is different than many because I deployed in response to those events.  Please remember those that were there and gave everything they had for their brothers.

This picture was taken late Oct '93 just outside what would become Victory Base.  It was home for a few weeks for my platoon.


Saturday, September 8, 2018

Thoughts After the Sans 2018 ThreatHunting Summit

Over the past few days I've had the pleasure of attending the Sans ThreatHunting Summit.  Not only was this a terrific event, but it also gave me the opportunity to see how others in our community are tackling problems that we all are dealing with.  I was able to look at the things I am doing and see if there are ways that I can improve or things that I can incorporate into my current processes.  The summit speakers and attendees also helped spark new ideas, as well as things I would like to dig into more.

One of the thoughts I had during the summit was when Alex Pinto (@alexcpsec) and Rob Lee (@RobertMLee) were discussing machine learning.  I believe ML may be hard to implement into a detection strategy unless it’s for a very narrow and specific use case.  As the scope widens, the accuracy of your results may suffer.  What would happen, though, if we started building models based on a wider scope, but built them in a way that would cluster with other models?  Would we be able to cluster the results of these different models in a way that may then highlight an attacker performing different actions during an intrusion?  I’m spitballing here, but as an example:
  1. A model looking at all flow data for anomalous network patterns between machines.
  2. A model that is looking for anomalous authentication patterns.  
Can the results of these 2 models then be clustered by src ip or dest ip (or some other attribute) so that the cluster is a higher fidelity event than the results of each individual model?  I’m not sure, as I don’t have a lot of experience with ML, so I’m just throwing that out there.

Rick McElroy (@InfoSecRick) was also talking about something similar during his keynote.  Analysts need context when they are looking at events as it’s often very hard to classify something as malicious until you have additional supporting evidence (I summarized).  I believe we can often build multiple points of context into our alerting though.  By building visibility around triggers (actions), regardless of how noisy they may be individually, we can then generate alerts where there are multiple data points and therefore produce higher fidelity alerts while reducing the overall number.  An example may be:
  1. PowerShell initiating an HTTP request.
  2. First seen non alpha-numeric character pattern.
  3. Multiple flags where the flag is 3 characters or less.
By being able to generate an alert on any of the 3 characteristics, but not doing so until I have met a threshold of 2 or more, I have dramatically increased the fidelity of the alert.  Or we could generate a suspicious PowerShell event based on any 1 of the 3 occurring and send an alert when an additional suspicious action on the host has been identified within a certain time frame.  An executable being written to a Temp directory may be an example (or any other detection you may have that will map to the host).  The cool thing about this is that you can start to dynamically see behaviors vs singular events.
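A toy sketch of that thresholding logic (the detection names and events are made up for illustration) might look something like this:

from collections import defaultdict

# Hypothetical low-fidelity detections, each mapped back to a host.
detections = [
    {"host": "wks01", "name": "powershell_http_request"},
    {"host": "wks01", "name": "first_seen_non_alnum_pattern"},
    {"host": "wks02", "name": "powershell_http_request"},
]

# Collect the distinct detection names seen per host and only alert
# once a threshold of 2 or more different characteristics is met.
by_host = defaultdict(set)
for d in detections:
    by_host[d["host"]].add(d["name"])

for host, names in by_host.items():
    if len(names) >= 2:
        print(f"ALERT {host}: {sorted(names)}")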

ATT&CK was discussed quite a bit throughout the summit (@likethecoins and @its_a_feature_).  This is such a cool framework.  Analysts can wrap their heads around the things that they can (and should) be hunting for.  I’m curious how many companies have adopted this framework and are using it to build and validate their detection.  If you start building the visibility around the types of things listed in ATT&CK, can you then start clustering events generated and map them through the framework?  The more data points that map, does that raise the confidence of the behavior, machine in question or user associated with the events?


My flight was delayed today, so I’ve been sitting at the airport for the last several hours.  This is a quick post, but I wanted to get these thoughts jotted down while I had some time. 

Saturday, June 30, 2018

Methods of Detection

I talk a lot about how I may go about finding adversary behavior, but I have not spoken very much about how teams may be alerted.  This is a much needed conversation in my opinion.  As teams gain capability and visibility, their alert volumes will likely increase too.  The obvious example may be the team that implemented a threat feed and wants to incorporate a watchlist that is largely derived from this.  Sure, you will probably receive alerts from this and management will be happy that you are finally doing "Intel driven detection" ;), but how will your analysts work these without the proper context as to why that indicator may be bad?

I believe there are 3 different forms of detection.  Using these correctly can:

1. Decrease analyst fatigue.
2. Decrease false positive rate.
3. Decrease alert volume.
4. Gain additional visibility.
5. Gain additional alerting capability.


Detections that are fed directly to an analyst as an alert. 
      
These detections are generally high fidelity and are well documented.  Available to the analyst are descriptions of what the intention of the detection rules are along with true positive and false positive examples. 

Detections that are used for correlation.

These detections are generally low fidelity.  They may happen often in our environments as normal activity, but when combining multiple detections and looking at order / timing, they may indicate malicious actions being taken by an attacker.  I also believe that all detections that go directly to analysts should go in this bucket as well.  You never know when looking at a cluster of detections will change how an alert is categorized.

The downside of alerting from dynamically correlated events is that they may be more difficult to analyze.  You may often be looking for behaviors and those analysts with limited experience may miss key indicators that point to malicious behavior.  If tuned correctly the alert volume should be low so it may be possible to route all alerts derived from these to more senior analysts.

Detections written to increase visibility.

These detections are used to increase our ability to perform direct alerting as well as correlation.  An example may be that we want to know when a Windows command prompt is spawned across a smb session.  We can use an IDS such as snort to gain this visibility.  We can then feed that into our correlation bucket or directly to an analyst depending on fidelity.  This is just one example, think about other technologies your org has that would allow you to write rules and gain additional capability (HIPS, HIDS, Proxy, Sysmon..).

So now if we take our initial example of alerting directly off a newly purchased threat feed, we may see (based on alert volume and fidelity) that a better option could be to use these detections in the correlation bucket.  An example could then be Watchlist alert + Rare User-Agent + URI Volume.  Alone these detections may fire 1000's of times a day, but together they may mean a newly discovered compromise.

Thursday, March 29, 2018

C2 Hunting

For an adversary to be successful in your environment they will need a way to enter and leave your network.  This can obviously happen in many different ways.  One way may be an attacker utilizing 3rd party access, another possibly gaining access through an externally facing device, but more often than not, this is facilitated by a backdoor being placed on a machine within your network, or at least the initial stages are.  Going with this assumption, it then makes sense that we spend a large amount of time and effort trying to identify indications of backdoors.

So when you sit down and think about the problem, ask yourself: what does a backdoor look like?  What does it look like when it’s initially placed on the machine?  What does it look like when it starts?  What does it look like when it beacons?  What does it look like when it’s actively being used?  For this post I will be focusing on beacon behaviors, but remember that there are many other opportunities to hunt for and identify these.

When we investigate IDS alerts that are related to C2 activity, what are some of the indications we look for that may help tip the scale toward saying that the alert is a true positive?  Or to put it another way, what are some of the things that may be common about C2?

  • User-Agent is rare
  • User-Agent is new
  • Domain is rare
  • Domain is new
  • High frequency of http connections
  • URI is same
  • URI varies but length is constant
  • Domain varies but length is constant
  • Missing referrer
  • Missing or same referrer to multiple uri’s on single dest.

Not all of the above will be true of every beacon, but in the vast majority of instances, more than one statement will be true.  If I look for multiples of the above, by source and destination pairs, I believe that I will have a higher chance of identifying malicious beacon traffic than by analyzing each individually.

Next we need to generate some traffic so that we can validate our theories.  If you are wondering about a list of backdoors that would be good to test, have a look at attack.mitre.org and the backdoors that have been used by the various actors that are tracked.  I also can’t emphasize enough the importance of having an environment that you can use for testing out theories.  Being able to perform and log the actions that you want to find can often lead to new ideas when you see the actual data that is generated.  You also need to know that your queries will really find what you are looking for.  For this testing I set up 3 VMs, which are listed below.

Machine 1 
  • Ubuntu 16.04
  • InetSim
  • Bro
  • Splunkforwarder

Machine 2
  • Ubuntu 16.04
  • Free Splunk

Machine 3
  • Windows 7
  • Default route and DNS is set to the IP address of Machine 1.

Flow
  • Obviously the malware will be executed on Machine 3.  For backdoors that communicate with a domain based C2, a DNS lookup will occur and the dns name will resolve to Machine 1.  For IP based C2, the traffic will follow the default route on Machine 3 and Machine 1 will respond (using an iptables redirect and nat rule).
  • InetSim will respond to the C2 communication.
  • Bro will log the http traffic and forward logs to the Splunk server.
  • Scheduled queries will run within the Splunk environment to identify C2 behaviors that we define.
  • Results of queries will be logged to separate index within Splunk.
  • Scheduled search will run against this new index in an attempt to identify multiple behaviors on either a host or destination.


I used the file for this blog post from the link below.  It’s named Cobaltstrike.exe, but I don’t believe it’s a Cobaltstrike backdoor.  I believe it serves the purpose for this post though: how can we go about finding unknown backdoors, or backdoors that we don’t have signatures for?


https://www.hybrid-analysis.com/sample/5b16d3c8451a1ea7633aae14c28f30c2d5c9b925d9f607938828bf543db9c582?environmentId=100

The result of executing this particular backdoor can be seen in the screenshot of correlated events below.  To get a better understanding of how this correlation occurred I'll go over the queries that got us here.

When an http based backdoor communicates, it will reach out to a URI.  The URI or the URI structure is typically coded into the backdoor.  If the backdoor beacons to multiple URI's on the same C2 host, these URI's are very often the same character length.  This query looks for source/destination pairs with greater than 6 connections to multiple URI's, all of which are the same length.
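A minimal sketch of that logic, expressed against exported Bro http records (column names assumed), might look something like this:

import pandas as pd

# Bro http log exported to CSV; src_ip, dest_ip and uri column names are assumed.
http = pd.read_csv("http_log.csv")
http["uri_len"] = http["uri"].str.len()

pairs = http.groupby(["src_ip", "dest_ip"]).agg(
    connections=("uri", "count"),           # total requests for the pair
    unique_uris=("uri", "nunique"),         # multiple URI's...
    unique_lengths=("uri_len", "nunique"),  # ...that are all the same length
).reset_index()

# Greater than 6 connections, to multiple URI's, all of the same length.
hits = pairs[(pairs["connections"] > 6) &
             (pairs["unique_uris"] > 1) &
             (pairs["unique_lengths"] == 1)]
print(hits)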


Just as the URI or URI structure is often coded into a backdoor, a User-Agent string is as well.  These User-Agents are very often unique due to misspellings, version mismatches or simply random naming.  By stacking User-Agents you will find rare ones, but very often, after investigating these, they will wind up being legitimate traffic.  By combining rare UA's with additional C2 behavior you can quickly focus on the connections you should be looking at.  This query looks for fewer than 10 source hosts, all using a single UA, communicating to the same destination.


When you want to identify how a host wound up visiting a specific URL you would typically look at the referrer field.  Very often the referrer is left blank with C2 traffic or can be hardcoded with a single referrer for every beacon.  It can be odd to see the same referrer field to multiple URI's, all on the same destination host.  This query identifies a single referrer listed for multiple URI's on a single destination.


This query simply looks at volume of traffic between a source and a destination.  When combined with additional behaviors, this can be a good indicator of malicious traffic.


There are many additional signs of malicious beacon traffic.  By spending time identifying these behaviors and incorporating them into some type of detection workflow, your chances of spotting malicious over benign becomes much greater.  By applying this methodology you gain additional coverage over signature based detection or new capability where you currently don't have detection, but have the data (i.e. proxy logs). 

All questions and comments are welcome.  Feel free to reach out on twitter @jackcr.