Saturday, February 27, 2021

More Behavioral Hunting and Insider Data Theft

I consider hunting for insider data theft to be the apex in user behavioral analysis.  I recently gave a presentation on this at an internal conference that my team holds once a year.  The talk was titled "How I spent my pandemic" and focused on the things that I've built, discovered and learned over these past months as they relate to this topic.  I'd like to share some of those things in this post.

If you've paid attention to the many indictments handed down by the DoJ over the past few years, you can see that espionage via insider theft is real and happens quite often.  Companies, however, don't seem as well positioned to identify evidence of this type of theft.  I believe part of the reason is that there are no public repositories of knowledge: no security companies blogging about cases they've investigated, and nobody publicly talking about what works and doesn't work in the detection realm.  I liken it to how the APT was viewed and discussed 10 years ago. 

The end goals of insiders and many state sponsored external actors are the same, but how they get there can be very different.  Generally speaking, there is no exploitation to gain access to a target network.  No malware to maintain or facilitate further access.  No internal recon, or the many other actions that are typically taken by external actors during active intrusions.  On the contrary, insiders will often use approved applications to target data they typically access, on devices they are assigned.  What we are left with are changes in their behavior.  That's great.  We have changes in behavior, but changes happen all of the time for many different reasons.  The question is how we find the changes that matter.  Here is where we begin.

I've written previous blogs around behavior anomalies at an individual level, but have not discussed measuring behavior of a population of people.  The thought around this is that data theft in general should be an anomaly, so when this occurs the user should end up in a cluster that is far outside the norm.  Here's the Splunk query I'm using to generate the numbers I'm using for scoring and clustering:

index=a_summary_index 
|stats values(phase_id) as phase_id count(phase_id) as detection_count dc(phase_id) as dc_phase_count values(source_id) as source_id dc(source_id) as dc_detection_count by user 
|eventstats count(user) as user_count by phase_id 
|nomv phase_id 
|nomv source_id 
|eval userhash=md5(user) 
|eval phase_hash=md5(phase_id) 
|eval source_hash=md5(source_id) 
|eventstats count(user) as user_source_count by source_id 
|table user,userhash,phase_hash,source_hash,phase_id,source_id,user_count,user_source_count,detection_count,dc_phase_count,dc_detection_count 
|eval user_count_mult=case(user_count=1, 200, user_count<5, 150, user_count<10, 50, user_count>=10, 0) 
|eval dc_phase_count_mult=case(dc_phase_count<2, 0, dc_phase_count>=2, 100) 
|eval dc_detection_count_mult=case(dc_detection_count>2, 100, dc_detection_count=2, 50, dc_detection_count=1, 0) 
|eval addedweight = (user_count_mult+dc_phase_count_mult+dc_detection_count_mult)

To explain the search:

index=a_summary_index |stats values(phase_id) as phase_id count(phase_id) as detection_count dc(phase_id) as dc_phase_count values(source_id) as source_id dc(source_id) as dc_detection_count by user |eventstats count(user) as user_count by phase_id |nomv phase_id |nomv source_id |eval userhash=md5(user) |eval phase_hash=md5(phase_id) |eval source_hash=md5(source_id) |eventstats count(user) as user_source_count by source_id |table user,userhash,phase_hash,source_hash,phase_id,source_id,user_count,user_source_count,detection_count,dc_phase_count,dc_detection_count 

  • Each detection is logged to a summary index so that I can search over past events.
  • phase_id is a name given to a stage that the user is at in relation to data exfil
  • source_id is the name of the search that generated the detection
  • dc_phase_count is the distinct count of phases
  • dc_detection_count is the distinct count of different detection names
  • user_count is the count of users that were seen in a distinct "phase_id".
  • nomv phase_id: converts multivalue field to a single value (used for hashing)
  • nomv source_id: converts multivalue field to a single value (used for hashing)
  • userhash is the md5 hash of the username
  • phase_hash is the md5 hash of the distinct phases that the user was seen in
  • source_hash is the md5 hash of the distinct detection names that were generated by the user
  • user_source_count is the count of users with matching detections
The remainder of the search is used for scoring:

|eval user_count_mult=case(user_count=1, 200, user_count<5, 150, user_count<10, 50, user_count>=10, 0) |eval dc_phase_count_mult=case(dc_phase_count<2, 0, dc_phase_count>=2, 100) |eval dc_detection_count_mult=case(dc_detection_count>2, 100, dc_detection_count=2, 50, dc_detection_count=1, 0) |eval addedweight = (user_count_mult+dc_phase_count_mult+dc_detection_count_mult)

  • user_count_mult is a value given to the number of users seen with a phase_id.  Fewer users = higher score
  • dc_phase_count_mult is a value given to the count of distinct phases
  • dc_detection_count_mult is a value given to the number of distinct detection names the user generated
  • addedweight is the sum of the above values
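The same case() logic is easy to prototype outside of Splunk. Below is a minimal Python re-implementation of the scoring, shown only for readability; the thresholds are copied from the query, but the function name is mine:

```python
def score_user(user_count, dc_phase_count, dc_detection_count):
    """Mirror the SPL case() scoring: fewer users and more phases/detections = higher score."""
    if user_count == 1:
        user_count_mult = 200
    elif user_count < 5:
        user_count_mult = 150
    elif user_count < 10:
        user_count_mult = 50
    else:
        user_count_mult = 0
    # Any user seen in 2+ phases gets a flat bump.
    dc_phase_count_mult = 100 if dc_phase_count >= 2 else 0
    if dc_detection_count > 2:
        dc_detection_count_mult = 100
    elif dc_detection_count == 2:
        dc_detection_count_mult = 50
    else:
        dc_detection_count_mult = 0
    return user_count_mult + dc_phase_count_mult + dc_detection_count_mult
```

A user who is alone in their phase, spans multiple phases and trips several distinct detections maxes out at 400, matching the outlier score discussed below.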
Sample output would look like the following:

I can then use kmeans to cluster the output:

The outlier in this case was cluster number 5 which generated a risk score of 400:
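The cluster screenshots aren't reproduced here, but a minimal sketch of the k-means step might look like the following. The toy data, the choice of k, and the feature selection are my assumptions, not the exact setup from the post:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the scored Splunk output (real data would be exported to csv).
df = pd.DataFrame({
    "user":           ["u1", "u2", "u3", "u4"],
    "user_count":     [50, 45, 60, 1],
    "dc_phase_count": [1, 1, 1, 3],
    "addedweight":    [0, 50, 0, 400],
})
X = df[["user_count", "dc_phase_count", "addedweight"]]
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(X)
# The lone high-score user should land in a cluster of its own.
outliers = df[df["cluster"] == df.loc[df["addedweight"].idxmax(), "cluster"]]
```

The idea is the same at scale: most users collapse into large, boring clusters, and the users executing on intent fall out into small, high-score ones.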

I feel that when a user begins to execute on their intent to steal data, they will generate anomalous clusters of detections.  These detections can be measured, scored and highlighted against the overall population of users.  Looking for these higher scores is a great way to hunt for anomalous behavior patterns.  This method could be applied to external threats as well, since an external attacker will likely generate anomalous behavior patterns when moving laterally within your environment.

Tuesday, July 7, 2020

Insider Threat Hunting

If you subscribe to the notion that a user who is intent on stealing data from your org will require a change in their behavior, then identifying that change is critically important.  As this change happens, they will take actions that they have not previously taken.  These actions can be seen as anomalies, and that is where we want to identify and analyze their behavior.

I've been studying insider IP theft, particularly those with a connection to China, for a number of years now.  I feel that, in a way, this problem mimics the APT of 10 years ago.  Nobody with exposure to it is willing to talk about the things that work or don't.  This leaves open the opportunity for this behavior to successfully continue.  While I'm not going to share specific signatures, I would like to talk about the logic I use for hunting.   This is my attempt to generate conversation around an area that, in my opinion, can't be ignored.

Just as in network intrusions, there are phases that an insider will likely go through before data exfil occurs.  But unlike network intrusions, these phases are not all required, as the insider probably has all the access they need.  They may perform additional actions to hide what they are doing.  They may collect data from different areas of the network.  They may even simply exfil data with no other steps.  There are no set TTPs for classes of insiders, but below are some phases that you could see:

Data Discovery
Data Collection
Data Packaging
Data Obfuscation
Data Exfil

I've also added a few additional points that may be of interest.  These aren't necessarily phases, but they may add to the story behind a user's behavior.  I'm including them in the phase category for scoring purposes though.  The scoring will be explained more below.  The additional points of interest are:

Motive - Is there a reason behind their actions?
Job Position - Does their position give them access to sensitive data?
Red Flags - e.g. Employee has submitted 2 week notice.

By assigning these tags, behaviors that enter multiple phases suddenly become more interesting.  In many cases, multiple phases such as data packaging -> data exfil should rise above a single phase such as data collection.  This is because a rule is only designed to accurately identify an action, regardless of intent.  But by looking at the sum of these actions we can begin to surface behaviors.  This is not to say that the total count of a single rule or a single instance of a highly critical rule will not draw the same attention.  It should, and that's where rule weighting comes in. 

Weighting is simply assigning a number score to the criticality of an action.  If a user performs an action that is being watched, a score is assigned to their total weight (weighted) for the day.  Depending on a user's behavior, their weighted score may rise over the course of that day.  If a user begins exhibiting anomalous behavior and a threshold is met, based on certain conditions, an alert may fire.

A quick explanation of alert generation.  My first attempt at this was simply to correlate multiple events per user.  As I developed new queries, the number of alerts I received grew drastically.  There was really no logic other than looking for multiple events, which simply led to noise.  I then sat down and thought about the reasons why I would want to be notified and came up with:

User has been identified in multiple rules + single phase + weight threshold met (500)
User has been identified in multiple phases + weight threshold met (300)
User exceeds weight threshold (1000)

To describe this logic in numbers it would look like:
|where ((scount > 1 and TotalWeight > 500) OR (pcount > 1 and TotalWeight > 300) OR (TotalWeight > 1000))

By implementing those 3 requirements I was able to eliminate the vast majority of noise and began surfacing interesting behavior.  I did wonder what I might be missing though.  Were there users exhibiting behavior that I would potentially want to investigate?  Obviously my alert logic wouldn't identify everything of interest.  I needed a good way to hunt for threads to pull, so I set about describing behaviors in numbers.  I wrote about this a little in a previous post, but I'll go through it again.

Using the existing detection rules, I used the rule name, weight and phase fields to create metadata that would describe a user's behavior.  Here are the fields I created and how I use them:

Total Weight - The sum weight of a user's behavior.
Distinct rule count - The number of unique rule names a user has been seen in.
Total rule count - The total number of rules a user has been seen in.
Phase count - The number of phases a user has been seen in.

Knowing that riskier behavior often involves multiple actions taken by a user, I created the following fields to convey this.

Phase multiplier - Additional value given to a user that is seen in multiple phases.  Increases for every phase above 1.
Source multiplier - Additional value given to a user that is seen in multiple rules.  Increases for every rule above 1.

We then add Total Weight + Total rule count + Phase multiplier + Source multiplier to get the users weighted score.
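As a hypothetical illustration, that sum could be computed as below. The post doesn't give the exact multiplier increments, so the 50-point step per extra phase/rule is an assumption:

```python
def weighted_score(total_weight, total_rule_count, phase_count,
                   distinct_rule_count, per_extra=50):
    # Assumed increment: 50 points for each phase/rule beyond the first.
    phase_multiplier = max(phase_count - 1, 0) * per_extra
    source_multiplier = max(distinct_rule_count - 1, 0) * per_extra
    # Weighted score = Total Weight + Total rule count + Phase multiplier + Source multiplier.
    return total_weight + total_rule_count + phase_multiplier + source_multiplier
```

A user seen in one phase and one rule scores little beyond their raw weight, while multi-phase, multi-rule activity compounds quickly.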

By generating these numbers we can not only observe how a user acted over the course of that day, but also surface anomalous behavior when compared to how other users acted.  For this I'm using an isolation forest and feeding it the total weight, phase count, total rule count and weighted numbers.  I feel these values best describe how a user acted and are therefore best used to identify anomalous activity. 

I'm also storing this metadata so that I can:

Look at their behavior patterns over time.  This particular user was identified on 4 different days:

Compare their sum of activity to other anomalous users.  This will help me identify the scale of their behavior.  This user's actions are leaning outside of normal patterns:

I can also look at the daily activity and compare that against top anomalous users or where they rank as a percentage.  You can see on the plot below that the user's actions were anomalous on a couple of different days. 
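One way to compute that percentage rank is with pandas; the toy frame, user names and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical daily weighted scores per user.
daily = pd.DataFrame({
    "user":     ["a", "b", "c", "d", "e"],
    "weighted": [120, 90, 150, 980, 110],
})
# Percentile rank of each user's daily score (1.0 = most anomalous that day).
daily["pct_rank"] = daily["weighted"].rank(pct=True)
top = daily.sort_values("pct_rank", ascending=False).head(3)
```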

There are also a number of other use cases for retaining this metadata.

It has taken a lot of time and effort to get to this point, and this is still a work in progress.  I can say though that I have found this to be a great way to quickly identify the threads that need to be pulled.

Again, I'm sharing this so that maybe a conversation will begin to happen.  Are orgs hunting for insiders, and if so, how?  It's a conversation that's long overdue in my opinion.

Monday, June 22, 2020

Dynamic Correlation, ML and Hunting

Hunting has been my primary responsibility for the last several years.  Over this time I've done a lot of experimentation around different processes and methods of finding malicious activity in log data. What has always stayed true though is the need for a solid understanding of the hypothesis you're using, familiarity with all the data you can take advantage of and a method to produce/analyze the results.  For this post I'd like to share one of the ideas I've been working on lately. 

I've previously written a number of blog posts on beaconing.  Over time I've refined much of how I go about looking for these anomalous connections.  My rule of thumb for hunting beacons (or other types of malicious activity) is to ignore static IOC's as those are best suited for detection.  Instead, focus on behaviors or clusters of behaviors that will return higher confidence output.  Here's how I'm accomplishing this in a single Splunk search.

  1. Describe what you are looking for in numbers.  This will allow you to have much more control over your conditional statements which impacts the quality of your output.
  2. Define those attributes that you are interested in and assign number values to them.  These attributes will be your points of correlation.
  3. Reduce your output to those connections that exhibit any of what you are looking for.  This is the correlation piece where we can use the total score of all attributes identified within a src/dest pair.  Higher sums translate to greater numbers of attributes identified.

Below is a screenshot of the search I came up with.  This is again using the botsv3 data set from Splunk's Boss of the SOC competition.  Thanks Splunk!

The following is a description of the fields in the output.

-dest: Based on the data source, this field may include the ip address or domain name.
-src_ip: Source of request
-dest_ip: Destination of request
-bytes_out: Data sent from src_ip.
-distinct_event_count: The total number of connections per destination.
-i_bytecount: The total count of bytes_out by src/dest/bytes_out. Large numbers may indicate beaconing.
-t_bytecount: The total count of connections between src/dest.
-avgcount: i_bytecount / t_bytecount.  Beacon percentage.  Values closer to 1 are more indicative of a beacon.
-distinct_byte_count: Total count of bytes_out (used in determining percentages for beaconing).
-incount: The count of unique bytes_in values.  When compared with t_bytecount you may see the responses to beacons.
-time_count: The number of hours the src/dest have been communicating.  Large numbers may indicate persistent beaconing.
-o_average: The average beacon count between all src/dests.
-above: The percentage above o_average.
-beaconmult:  weight multiplier given to higher above averages.
-evtmult: weight multiplier given to destinations with higher volume connections.
-timemult: weight multiplier given to connections that last multiple hours.
-addedweight: The sum of all multipliers.

You can see from the search results that we reduced 30k+ events down to 1700 that exhibit some type of behavior that we're interested in.  This is good, but still not feasible to analyze every event individually.  I have a couple of choices to reduce my output at this point.  I can adjust my weighted condition to something like "|where weighted > 100" which would have the effect of requiring multiple characteristics being correlated.  My other choice is to use some type of anomaly detection to surface those odd connections.  You can probably tell from the "ML" portion of the title which direction I'm going to go.  So from here we need a method to pick out the anomalies as the vast majority of this data is likely legitimate traffic.  For this I'll be inserting our results into a MySQL database.  I don't necessarily need to for this analysis, but it's a way for me to keep the metadata of the connections for greater periods of time.  This will allow me to do longer term analysis based on the data that is being stored.

Once it's in the database we can use python and various ML algorithms to surface anomalous traffic.  For this I'll be using an Isolation Forest.  I'll also be choosing fields that I think best represents what a beacon looks like as I don't want to feed every field through this process.

distinct_event_count: Overall activity.
time_count: How persistent is the traffic?
above: How does the beacon frequency compare to all other traffic?
addedweight: How many beacon characteristics does this traffic exhibit?
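Since the original code is only shown as a screenshot, here is a minimal sketch of that isolation-forest step, assuming the connection metadata has already been pulled from MySQL into a DataFrame. The toy rows and the contamination value are assumptions:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Toy rows standing in for the stored connection metadata.
df = pd.DataFrame({
    "distinct_event_count": [40, 55, 38, 61, 900],
    "time_count":           [2, 3, 2, 1, 23],
    "above":                [1.1, 0.8, 1.3, 0.9, 400.0],
    "addedweight":          [0, 0, 50, 0, 300],
})
features = ["distinct_event_count", "time_count", "above", "addedweight"]
clf = IsolationForest(contamination=0.2, random_state=0)
df["anomaly"] = clf.fit_predict(df[features])  # -1 marks anomalous rows
suspects = df[df["anomaly"] == -1]
```

The contamination parameter controls what fraction of connections gets flagged, so in a real environment it would be tuned down to surface only the strongest outliers.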

The following screenshot contains the code as well as the output.

Looking at the top 3 tenths of 1 percent of the most anomalous src/dest pairs, you can see that there are 4 destination ip addresses that may need investigating.  If you've read my last 2 posts on beaconing, one ip should look familiar.  This ip was definitely used for C2.  Another ip is also interesting.  Taking a quick look at that destination in the botsv3 data, you can see a memcached injection that appears to be successful.  Additional investigation of the src ip's in this output would definitely be justified. 

I will say that this method is very good at identifying beacons, but beacons are not always malicious.  Greater work may be needed to surface those types of malicious connections.  Some additional ideas may be first-seen ip's, or incorporating proxy data where even more characteristics can be defined, scored and correlated.

A large portion of hunting is experimentation so experiment with the data and see what you can come up with!

Sunday, May 17, 2020

It's all in the numbers

In my last few posts I talked about hunting for anomalies in network data.  I wanted to expand on that a bit and specifically talk about a way we can create metadata around detectable events and use those additional data points for hunting or anomaly detection.  The hope being that the metadata will help point us to areas of investigation that we may not normally take.

For this post I'm again using the BOTS data from Splunk, and I've created several saved searches based on behaviors we may see during an intrusion.  Once the saved searches run, the output results are logged to a summary index.  More on that topic can be found here:  The goal is to get all of our detection data into a queryable location, and in a form that we can count.

For our saved searches we want to ensure the following.

Create detections based on behaviors:
  • Focus on accuracy regardless of fidelity.
  • A field that will signify an intrusion phase where this detection would normally be seen.
  • A field where a weight can be assigned based on criticality.
  • A common field that can be found in each detection output that will identify the asset or user (src_ip, hostname, username...).
Once the output of our saved searches begins to populate the summary index we would like to have results similar to the screenshot below:

The following is the definition of the fields:
(Note: the events in the screenshot have been deduped.  All calculations have taken place, but I am limiting the number of rows.  Much of what is identified in the output is data from the last detection before the dedup occurred.)
  • hostname: Self explanatory, but I am also using the src_ip where the hostname can't be determined.
  • source: The name of the saved search.
  • weight: Number assigned that represents criticality of event.
  • phase: Identifier assigned for phase of intrusion.
  • tweight: The sum weight of all detected events.
  • dscount: The distinct count of unique detection names (source field).
  • pcount: The number of unique phases identified.
  • scount: Total number of detections identified.
  • phasemult: An additional value given for number of unique phases identified where that number is > 1.
  • sourcemult: An additional value given for number of unique sources identified where that number is > 1.
  • weighted: The sum score of all values from above.
There are a few points that I want to discuss around the additional fields that I've assigned and the reasons behind them.
  • Phases (phase,pcount,phasemult): Actors or insiders will need to step through multiple phases of activity before data theft occurs.  Identifying multiple phases in a given period of time may be an indicator of malicious activity. 
  • Sources (source,scount,dscount,sourcemult): A large number of detections may be less concerning if all detections are finding the same activity over and over.  Actors or insiders need to perform multiple steps before data theft occurs, and therefore fewer detections, where those detections surround different actions, would be more concerning.
  • Weight: Weight is based on criticality.  If I see a large weight with few detections, I can assume the behavior has a higher likelihood of being malicious.
  • Weighted: High scores tend to reflect more behaviors identified, where those behaviors span multiple phases and detections.
Now that we've performed all of these calculations and have a good understanding of what they are, we can run k-means and cluster the results.  I downloaded a csv from the splunk output and named it cluster.csv.  Using the below code you can see I chose 3 clusters using the tweight, phasemult and scount fields.  I believe that the combinations of these fields can be a good representation of anomalous behavior (I could also plug in other combinations and have the potential to surface other behaviors.).
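The code in the post is a screenshot, so the following is a hedged reconstruction of that k-means step. The post loads cluster.csv; a toy frame stands in for it here, and the sklearn options are assumptions:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for cluster.csv (the csv exported from the Splunk output).
df = pd.DataFrame({
    "hostname":  ["h1", "h2", "h3", "h4", "h5", "h6"],
    "tweight":   [10, 20, 15, 300, 900, 25],
    "phasemult": [0, 0, 0, 100, 200, 0],
    "scount":    [1, 2, 1, 8, 20, 2],
})
# Cluster on the three fields chosen in the post: tweight, phasemult, scount.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(df[["tweight", "phasemult", "scount"]])
```

With real data, printing each cluster's members (as in the post's output) quickly shows which machines sit far from the quiet majority.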

The following is the contents of those clusters.

Based on the output, the machine in cluster 1 definitely should be investigated.  I would also investigate those machines in cluster 2 as well.

Granted, this is a fairly small data set, but it is a great representation of what can be done in much larger environments.  The scheduling of this method could also be automated, with the results actioned, correlated or alerted on.

Again I would like to thank the Splunk team for producing and releasing BOTS.  It's a great set of data to test with and learn from.

Thursday, May 7, 2020

Hunting for Beacons Part 2

In my last post I talked about a method of hunting for beacons using a combination of Splunk and K-Means to identify outliers in network flow data.  I wanted to write a quick update to that post so that I can expand on a few things.

In that blog post I gave these different points that help define general parameters that I can begin to craft a search around.  This helps to define what it is I'm trying to look for and, in a way, builds a sort of framework that I can follow as I begin looking for this behavior.

  1. Beacons generally create uniform byte patterns
  2. Active C2 generates non uniform byte patterns
  3. There are far more flows that are uniform than non uniform
  4. Active C2 happens in spurts
  5. These patterns will be anomalous when compared to normal traffic

Using the definition above, I went out and built a method that will identify anomalous traffic patterns that may indicate malicious beaconing.  It worked well for the sample data I was using, but when implementing this method in a much larger dataset I had problems.  There were far more anomalous datapoints, and therefore the fidelity of the data I was highlighting was much lower (if you haven't read my last post I would recommend it).  The other issue was that it took much longer to pivot into the data of interest and then try to understand why a pattern was identified as an outlier.  I then decided to see if I could take what I was trying to do with k-means and build it into my Splunk query.  Here is what I came up with:

The entire search looks like this:

index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80) 
|stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out 
|eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip 
|eval beacon_avg=('beacon_count' / 'total_count') 
|stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out 
|eval incount=mvcount(bytes_in) 
|join dest_ip 
[|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") 
|stats values(login) as login by dest_ip 
|eval login_count=mvcount(login)] 
|eventstats avg(beacon_count) as overall_average 
|eval beacon_percentage=('beacon_count' / 'overall_average') 
|table src_ip,dest_ip,bytes_out,beacon_count,beacon_avg,beacon_percentage,overall_average,unique_count,total_count,incount,login_count 
|sort beacon_percentage desc

Breaking it down:

Collect the data that will be parsed:
  • index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80)

Count the number of times a unique byte size occurs between a src and dst:
  • |stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out

Count the total number of times all bytes sizes occur regardless of size and the distinct number of unique byte sizes:
  • |eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip

Calculate average volume of src,dst,byte size when compared to all traffic between src,dst:
  • |eval beacon_avg=('beacon_count' / 'total_count')

Define fields that may be manipulated, tabled, counted:
  • |stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out

Count the number of unique bytes_in sizes between src,dst,bytes_out.  Can be used to further define parameters with respect to beacon behavior:
  • |eval incount=mvcount(bytes_in)

*** The lines below are additions to the original query ***

Generally there will be a limited number of users beaconing to a single destination.  If this query is looking at an authenticated proxy, this will count the total number of users communicating with the destination (this can also add a lot of overhead to your query):
  • |join dest_ip [|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") |stats values(login) as login by dest_ip |eval login_count=mvcount(login)]

Calculate the average number of counts between all src,dst,bytes_out:
  • |eventstats avg(beacon_count) as overall_average

Calculate the volume percentage by src,dst,bytes_out based off the overall_average:
  • |eval beacon_percentage=('beacon_count' / 'overall_average')
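For readers more comfortable outside of SPL, the core beacon_count / total_count / beacon_percentage aggregation can be sketched in pandas. The toy flows are made up; the column names mirror the Splunk fields:

```python
import pandas as pd

# Toy flow records: one src/dest pair beaconing with a fixed 512-byte payload.
flows = pd.DataFrame({
    "src_ip":    ["10.0.0.5"] * 6 + ["10.0.0.9"] * 3,
    "dest_ip":   ["1.2.3.4"] * 6 + ["5.6.7.8"] * 3,
    "bytes_out": [512, 512, 512, 512, 512, 700, 100, 2000, 350],
})
# beacon_count: how often each exact bytes_out size repeats per src/dest pair.
g = (flows.groupby(["src_ip", "dest_ip", "bytes_out"])
          .size().rename("beacon_count").reset_index())
# total_count: all flows between the src/dest pair.
g["total_count"] = g.groupby(["src_ip", "dest_ip"])["beacon_count"].transform("sum")
g["beacon_avg"] = g["beacon_count"] / g["total_count"]
# beacon_percentage: each row's volume relative to the overall average.
g["beacon_percentage"] = g["beacon_count"] / g["beacon_count"].mean()
```

The repeating 512-byte flow dominates its pair's traffic and stands well above the overall average, which is exactly the pattern the query sorts to the top.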

And the output from the Splunk botsv3 data:

You can see from the output above that the first 2 machines were ones identified as compromised.  The volume of their beacons was 1600 and 400 times more than the average volume of traffic between src,dst,bytes_out.  By adding the bottom portion of the search I've basically built the outlier detection into the query.  You could even add a parameter to the end of the search like "|where beacon_percentage > 500" and only surface anomalous traffic.  Also, by adjusting the numbers in these fields you can really turn the levers and tune the query to different environments. 


If you were to apply this to proxy data you could also run multiple queries based on category.  This may increase the speed and take some of the load off Splunk.

I've also not given up on K-Means.  I just pivoted to using a different method for this.

*** Adding an update to include a Splunk search with a risk scoring function ***

index=someindex sourcetype=somesourcetype earliest=-1d 
|stats count(bytes_out) as "i_bytecount" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out 
|eventstats sum(i_bytecount) as t_bytecount dc(bytes_out) as distinct_byte_count by src_ip,dest_ip 
|eval avgcount=('i_bytecount' / 't_bytecount') 
|stats values(i_bytecount) as i_bytecount values(distinct_byte_count) as distinct_byte_count values(avgcount) as avgcount values(t_bytecount) as t_bytecount values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eval incount=mvcount(bytes_in) 
|join dest_ip 
[|search index=someindex sourcetype=somesourcetye earliest=-1d 
|bucket _time span=1h 
|stats values(user) as user values(_time) as _time dc(url) as distinct_url_count count as distinct_event_count by dest_ip,dest 
|eval time_count=mvcount(_time) 
|eval login_count=mvcount(user)] 
|table dest,src_ip,dest_ip,bytes_out,distinct_url_count,distinct_event_count,i_bytecount,distinct_byte_count,avgcount,t_bytecount,incount,login_count,user,time_count 
|search t_bytecount > 1 login_count < 3 
|eventstats avg(i_bytecount) as o_average 
|eval above=('i_bytecount' / 'o_average') 
|eval avgurl=(distinct_url_count / distinct_event_count) 
|eval usermult=case(login_count=1, 100, login_count=2, 50, login_count>2, 0) 
|eval evtmult=case(distinct_event_count>300, 100, distinct_event_count>60, 50, distinct_event_count<=60, 0) 
|eval beaconmult=case(above>100, 200, above>5, 100, above<=5, 0) 
|eval urlmult=case(avgurl>.06 AND avgurl<.94, 0, avgurl>.95 ,100, avgurl<.05, 100) 
|eval timemult=case(time_count > 7, 100, time_count<=7, 0) 
|eval addedweight = (evtmult+usermult+beaconmult+urlmult+timemult) 
|dedup dest 
|search addedweight > 250

Friday, May 1, 2020

Hunting for Beacons

A few years ago I wrote a post about ways that you can correlate different characteristics of backdoor beaconing.  By identifying and combining these different characteristics you may be able to identify unknown backdoors and possibly generate higher fidelity alerting.  The blog can be found here:

What I didn't talk about was utilizing flow data to identify C2.  With the use of ssl or encrypted traffic you may lack the required data to correlate different characteristics and need to rely on other sources of information.  So how do we go hunting for C2 in network flows?  First we need to define what that may look like.

  1. Beacons generally create uniform byte patterns
  2. Active C2 generates non uniform byte patterns
  3. There are far more flows that are uniform than non uniform
  4. Active C2 happens in spurts
  5. These patterns will be anomalous when compared to normal traffic

I've said for a long time that one way to find malicious beaconing in network flow data is to look for patterns of beacons (uniform byte patterns) and alert when the patterns drastically change (non uniform byte patterns).  The problem I had was figuring out how to do just that with the tools I had.  I think we (or maybe just me) often get stuck on a single idea.  When we hit a roadblock we lose momentum and can eventually let the idea go, though it may remain in the back of your head. 

Last week I downloaded the latest Splunk BOTS data source and loaded it into a Splunk instance I have running on a local VM.  I wanted to use this to explore some ideas I had using Jupyter Notebook.  That's when the light went off.  Below is what I came up with.

This Splunk search performs the following:

  1. Collects all flows that are greater than 0 bytes
  2. Counts the number of flows by each unique byte count by src_ip, dest_ip, and dest_port (i_bytecount)
  3. Counts the total number of flows between src_ip, dest_ip (t_bytecount)
  4. Counts the unique number of byte counts by src_ip, dest_ip (distinct_byte_count)
  5. Generates a percentage of traffic by unique byte count between src_ip, dest_ip (avgcount)

The thought being that a beacon will have a high percentage of the overall traffic between 2 endpoints.  Active C2 will be variable in byte counts, which is represented by distinct_byte_count.
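The five steps above can be sketched in pandas as well. This is only an approximation of the logic on toy data, assuming flow records with `src_ip`, `dest_ip`, `dest_port`, and `bytes_out` fields (field names will vary by flow source):

```python
import pandas as pd

# Toy flow records: one beaconing pair (mostly 312-byte flows plus one
# large active burst) and one pair with varied traffic.
flows = pd.DataFrame({
    "src_ip":    ["10.0.0.5"] * 6 + ["10.0.0.9"] * 3,
    "dest_ip":   ["203.0.113.7"] * 6 + ["198.51.100.2"] * 3,
    "dest_port": [443] * 6 + [80] * 3,
    "bytes_out": [312, 312, 312, 312, 312, 9841, 100, 250, 512],
})

flows = flows[flows["bytes_out"] > 0]          # step 1: flows > 0 bytes
pair = ["src_ip", "dest_ip"]

# step 2: flow count per unique byte count by src_ip, dest_ip, dest_port
agg = (flows.groupby(pair + ["dest_port", "bytes_out"])
            .size().rename("i_bytecount").reset_index())
# step 3: total flows between src_ip and dest_ip
agg["t_bytecount"] = agg.groupby(pair)["i_bytecount"].transform("sum")
# step 4: unique byte counts between src_ip and dest_ip
agg["distinct_byte_count"] = agg.groupby(pair)["bytes_out"].transform("nunique")
# step 5: percentage of the pair's traffic held by each byte count
agg["avgcount"] = 100 * agg["i_bytecount"] / agg["t_bytecount"]

print(agg.sort_values("avgcount", ascending=False).head())
```

On this toy data the 312-byte flows dominate their pair (5 of 6 flows, roughly 83%), which is exactly the beacon-heavy shape the search is meant to surface.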

I then wanted to identify anomalous patterns (if any) within this data.  For this I used K-Means clustering, as I wanted to see if there were patterns that were outside of the norm, using the following Python code:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans

# Load the per-pair summary exported from the Splunk search
df = pd.read_csv("ByteAvgs1.csv")
for col in ["t_bytecount", "i_bytecount", "avgcount",
            "distinct_byte_count", "bytes_out"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Cluster on the three features that describe beacon-like behavior
X = df[["avgcount", "t_bytecount", "distinct_byte_count"]]
X = X.reset_index(drop=True)
km = KMeans(n_clusters=2)
labels = km.fit_predict(X)  # fit the model before reading labels

fig = plt.figure(1, figsize=(7, 7))
ax = Axes3D(fig, rect=[0, 0, 0.95, 1], elev=48, azim=134)
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], X.iloc[:, 2],
           c=labels.astype(float), edgecolor="k")
ax.set_xlabel("Beacon Percentage")
ax.set_ylabel("Total Count")
ax.set_zlabel("Distinct Byte Count")
plt.title("K Means", fontsize=14)

I was able to visualize the following clusters:

While the majority of the traffic looks normal, there are definitely a few outliers.  The biggest outlier based on the Beacon Percentage and Total Count is:

There were 3865 flows, with 97% all being the same byte count.  There were also 19 unique byte counts between these 2 IPs.
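As a triage step, candidates with this shape (a very high beacon percentage over a large number of flows, plus a modest set of distinct byte counts from active sessions) can be surfaced with a simple filter and sort. A sketch on hypothetical summary rows, with made-up IPs and a threshold chosen for illustration:

```python
import pandas as pd

# Hypothetical per-pair summary like the one fed to K-Means; the first
# row mirrors the shape of the outlier described above.
summary = pd.DataFrame({
    "src_ip":  ["10.0.0.5", "10.0.0.9", "10.0.0.12"],
    "dest_ip": ["203.0.113.7", "198.51.100.2", "192.0.2.44"],
    "avgcount": [97.0, 33.0, 12.0],       # % of flows sharing one byte count
    "t_bytecount": [3865, 120, 4000],     # total flows between the pair
    "distinct_byte_count": [19, 3, 600],  # unique byte counts seen
})

# Rank candidates: near-uniform traffic with enough volume to matter
candidates = (summary[summary["avgcount"] > 90]
              .sort_values("t_bytecount", ascending=False))
print(candidates[["src_ip", "dest_ip", "avgcount", "t_bytecount"]])
```

The clustering does the same job visually; a filter like this just turns the eyeball step into something you can schedule.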

Taking a quick look into the IP, we can assume that this machine was compromised based on the command for the netcat relay (it will take more analysis to confirm):

Obviously this is a quick look into a limited data set and needs more runtime to prove it out.  Though it does speak to exploring new ideas and new methods (or in this case, old ideas and new methods).  You never know what you may surface.

I'd also like to thank the Splunk team for making the data available to everyone.  If you would like to download it, you can find it here:

Wednesday, October 3, 2018

This Day 25 Years Ago

This is definitely a different post for me, but today marks 25 years since Operation Gothic Serpent, or what has become known as Black Hawk Down.  It also marks a significant point in my life, one that will remain in my thoughts daily for the next 25 years.  I was in the 24th Infantry Division; my company had just assumed the role of immediate ready company, and my platoon was the immediate ready platoon (kind of like an on-call status in IT, but with the possibility of becoming much, much more intense).  As the immediate ready platoon, we were tasked with being anywhere in the world within 18 hours if needed.  To my knowledge, that day was the first time a mechanized infantry platoon was deployed with that kind of speed.  They always say it helps to talk about things, so I would like to talk about that day from my point of view.

My first memory of that day was around 2AM.  I remember being woken up by my beeper and thinking it was a training alert (there were no cell phones back then 😃).  We had just returned from spending 4 weeks in the field, and my assumption was that this was a test, leadership checking their ability to contact everyone.  I figured a simple phone call and I would be back in bed.  I made the phone call and was informed that I needed to be in formation in 30 minutes.  A little upset that they were taking it this far, I got dressed, kissed my wife of 6 months goodbye, and told her I would be back soon.

When I arrived at work I saw the look on people’s faces.  Serious, scared, anxious.  These weren’t the faces I’d seen during routine alerts.  I began hearing about CNN’s reporting on the events happening in Mogadishu.  This was quickly turning into a day I totally wasn’t expecting.  Shortly after I arrived, we were told we needed to draw weapons before formation.  I remember that pit in my stomach, knowing where I was heading and knowing that I would be leaving my wife, who had only lived in the area for 5 weeks, behind.  I hadn’t said goodbye to her and prayed that I would be able to at some point before we left.

During formation it was confirmed that we were being deployed to Somalia.  C Company, 3rd PLT, 3/15 INF (which was my platoon) would be the immediate ready platoon and was to head over to the gym, where we would receive our shots, have wills drawn up, and take care of any other legal needs.  If you haven’t experienced the vaccine process, it’s like an assembly line, going from station to station until you eventually reach the end.

The next few hours are kind of vague.  I believe a lot of it was hurry up and wait, but they were probably dealing with some logistical issues.  At any rate, my squad leader allowed me to go home for a few minutes so I could let my wife know what was going on.  While I was at home I was also able to call my parents and let them know too.  I really appreciated that I was able to see her and talk to her.  Soon I had to head back, though.  I knew the busses would be arriving to take us to Hunter Army Airfield, where our gear and vehicles were prestaged.  The hardest thing in my life was saying goodbye.  Scared for her because she would be alone in a place she didn’t know.  Wondering if I would ever see her again and knowing she was thinking the same thing.  Finally turning and walking away was absolutely heart-wrenching for me.

Soon the busses did show up.  We all boarded and made our way from Ft. Stewart to Savannah.  I watched the cars out of the window and the people walking around.  I thought about how different our lives were at that point in time.  They were headed to the park, and I had no idea what I was heading to.  They could plan the rest of their day, and I wasn’t sure what I could plan.  It was ok though.  I had been in Desert Storm and knew that it could happen again at any time.  There would have been no way that I would not have gone.  That’s not what we do.  Arriving at Hunter Army Airfield, I saw the 2 C-5 Galaxies that would take us on our journey.  The Bradleys hadn’t been loaded yet, but they soon would be, and we would be on our way.

From the busses we moved to some underground barracks where we were issued ammo.  Once our magazines were loaded, we moved to the range, where we had the opportunity to zero our weapons.  This is the one time you want to make sure it’s dead on (pun intended).

Soon we were able to board the plane.  If you’ve never been in one of these, it’s hard to explain how big they are.  The seating is above the cargo area and holds about 40 people.  There is one window seat, and everyone faces backwards.  I was the lucky one and had the window seat.  It kept me occupied during the trip, even if I was only looking at water much of the time.  I couldn’t sleep, and it was good to have something to do.  As we got closer, we had people go down to the cargo area where 2 of the 4 Bradleys were.  They loaded the 25mm chain gun as well as missiles into the launcher.  I don’t believe anyone knew what to expect when we landed and the nose of the plane opened up.  I can tell you that nobody wanted to be caught off guard, though.

The flight took many, many hours.  As we got closer to landing, you could see people’s focus change.  It went from talking and joking around to determination and anticipation.  It was time to do the job we were sent to do, whatever that might turn out to be.  I would spend the next 5 months in Somalia.

My experience that day is different from many others’ because I deployed in response to those events.  Please remember those that were there and gave everything they had for their brothers.

This picture was taken in late Oct '93, just outside what would become Victory Base.  It was home for my platoon for a few weeks.