In my last post I
talked about a method of hunting for beacons using a combination of Splunk and
K-Means to identify outliers in network flow data. I wanted to write a quick update to that post
so that I can expand on a few things.
In that blog post I
listed several points that help define general parameters I can
craft a search around. This
clarifies what I'm trying to look for and, in a way, builds a framework I can follow
as I begin looking for this behavior.
- Beacons generally create uniform byte patterns
- Active C2 generates non-uniform byte patterns
- There are far more flows that are uniform than non-uniform
- Active C2 happens in spurts
- These patterns will be anomalous when compared to normal traffic
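The first two points can be illustrated with a small sketch (hypothetical flow records, Python rather than SPL): a beacon's heartbeat produces one dominant bytes_out size, while active C2 scatters across many sizes.

```python
from collections import Counter

# Hypothetical bytes_out values for one src/dst pair:
# 50 identical beacon check-ins plus a few active-C2 transfers.
flows = [1024] * 50 + [812, 2048, 4096]

counts = Counter(flows)
size, freq = counts.most_common(1)[0]

# The uniform beacon size dominates the flow records.
print(size, freq)  # 1024 50
```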
Using the definition above, I went out and built a
method that will identify anomalous traffic patterns that may indicate
malicious beaconing. It worked well for
the sample data I was using, but when I applied this method to a much larger
dataset I ran into problems. There were far more anomalous
data points, so the fidelity of the data I was
highlighting was much lower (if you haven't read my last post I would recommend
it). The other issue was that it took
much longer to pivot into the data of interest and then understand why
a given pattern was identified as an outlier.
I then decided to see if I could take what I was trying to do with
k-means and build it into my Splunk
query. Here is what I came up with:
The entire search looks like this:
index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80) |stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip |eval beacon_avg=('beacon_count' / 'total_count') |stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eval incount=mvcount(bytes_in) |join dest_ip [|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") |stats values(login) as login by dest_ip |eval login_count=mvcount(login)] |eventstats avg(beacon_count) as overall_average |eval beacon_percentage=('beacon_count' / 'overall_average') |table src_ip,dest_ip,bytes_out,beacon_count,beacon_avg,beacon_percentage,overall_average,unique_count,total_count,incount,login_count |sort beacon_percentage desc
Breaking it down:
Collect the data that will be parsed:
- index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") (dest_port=443 OR dest_port=80)
Count the number of
times a unique bytes_out size occurs between a src and dst:
- |stats count(bytes_out) as "beacon_count" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out
Count the total
number of times all bytes sizes occur regardless of size and the distinct
number of unique byte sizes:
- |eventstats sum(beacon_count) as total_count dc(bytes_out) as unique_count by src_ip,dest_ip
Calculate how much of the traffic between a src and dst each
byte size accounts for:
- |eval beacon_avg=('beacon_count' / 'total_count')
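The three steps above can be sketched in Python (hypothetical flows; the variable names mirror the fields in the query):

```python
from collections import Counter, defaultdict

# Hypothetical flow records: (src_ip, dest_ip, bytes_out).
flows = [("10.0.0.5", "203.0.113.9", 1024)] * 8 + [
    ("10.0.0.5", "203.0.113.9", 700),
    ("10.0.0.5", "203.0.113.9", 1400),
]

# stats count(bytes_out) by src_ip, dest_ip, bytes_out
beacon_count = Counter(flows)

# eventstats sum(beacon_count) and dc(bytes_out) by src_ip, dest_ip
total_count = defaultdict(int)
unique_count = defaultdict(int)
for (src, dst, size), c in beacon_count.items():
    total_count[(src, dst)] += c
    unique_count[(src, dst)] += 1

# eval beacon_avg = beacon_count / total_count
beacon_avg = {
    key: c / total_count[key[:2]] for key, c in beacon_count.items()
}

# 8 of the 10 flows used the same 1024-byte payload.
print(beacon_avg[("10.0.0.5", "203.0.113.9", 1024)])  # 0.8
```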
Flatten the fields so they
can be manipulated, tabled, and counted:
- |stats values(beacon_count) as beacon_count values(unique_count) as unique_count values(beacon_avg) as beacon_avg values(total_count) as total_count values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out
Count the number of
unique bytes_in sizes per src,dst,bytes_out. This can be used to further define parameters with respect to beacon behavior:
- |eval incount=mvcount(bytes_in)
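The mvcount step amounts to a distinct count of bytes_in per group, since values() keeps only unique entries (hypothetical data):

```python
# Hypothetical bytes_in values for one src,dst,bytes_out group.
bytes_in = [512, 512, 1460, 512]

# Splunk's values() deduplicates, so mvcount on the result
# is a distinct count.
incount = len(set(bytes_in))
print(incount)  # 2
```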
*** Below is an
addition to the original query ***
Generally there will
be a limited number of users beaconing to a single destination. If this query is looking at an authenticated
proxy, this will count the total number of users communicating with the
destination (this can also add a lot of overhead to your query):
- |join dest_ip [|search index=botsv3 earliest=0 (sourcetype="stream:tcp" OR sourcetype="stream:ip") |stats values(login) as login by dest_ip |eval login_count=mvcount(login)]
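The subsearch's per-destination user count could be sketched like this (hypothetical records):

```python
from collections import defaultdict

# Hypothetical (dest_ip, login) pairs from authenticated proxy logs.
events = [
    ("203.0.113.9", "alice"),
    ("203.0.113.9", "alice"),
    ("198.51.100.2", "alice"),
    ("198.51.100.2", "bob"),
]

# stats values(login) by dest_ip, then mvcount(login)
logins = defaultdict(set)
for dest_ip, login in events:
    logins[dest_ip].add(login)
login_count = {dest_ip: len(users) for dest_ip, users in logins.items()}

print(login_count)  # {'203.0.113.9': 1, '198.51.100.2': 2}
```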
Calculate the
average beacon_count across all src,dst,bytes_out combinations:
- |eventstats avg(beacon_count) as overall_average
Calculate the volume
percentage for each src,dst,bytes_out based on the overall_average:
- |eval beacon_percentage=('beacon_count' / 'overall_average')
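These last two steps amount to comparing each group's count against the mean (hypothetical counts):

```python
# Hypothetical beacon_count per (src_ip, dest_ip, bytes_out) group.
beacon_count = {
    ("10.0.0.5", "203.0.113.9", 1024): 400,
    ("10.0.0.6", "198.51.100.2", 740): 50,
    ("10.0.0.7", "198.51.100.3", 812): 150,
}

# eventstats avg(beacon_count) as overall_average
overall_average = sum(beacon_count.values()) / len(beacon_count)  # 200.0

# eval beacon_percentage = beacon_count / overall_average
beacon_percentage = {k: c / overall_average for k, c in beacon_count.items()}

# The suspected beacon sits at twice the overall average.
print(beacon_percentage[("10.0.0.5", "203.0.113.9", 1024)])  # 2.0
```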
And the output from
the Splunk botsv3 data:
You can see from the
output above that the first two machines were ones identified as compromised. The volume of their beacons was 1,600 and 400
times the average volume of traffic between src,dst,bytes_out. By adding the bottom portion of the search
I've essentially built the outlier detection into the query. You could even add a clause to the end of
the search like "|where beacon_percentage > 500" to surface only
anomalous traffic. Also, by adjusting the thresholds on these fields you can tune the query to different environments.
(beacon_count,beacon_avg,beacon_percentage,overall_average,unique_count,total_count,incount,login_count)
If you were to apply
this to proxy data you could also run multiple queries based on category. This may increase the speed and take some of the load off Splunk.
I've also not given
up on K-Means. I just pivoted to using a
different method for this.
*** Adding an update to include a Splunk search with a risk scoring function ***
index=someindex sourcetype=somesourcetype earliest=-1d
|stats count(bytes_out) as "i_bytecount" values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out
|eventstats sum(i_bytecount) as t_bytecount dc(bytes_out) as distinct_byte_count by src_ip,dest_ip
|eval avgcount=('i_bytecount' / 't_bytecount')
|stats values(i_bytecount) as i_bytecount values(distinct_byte_count) as distinct_byte_count values(avgcount) as avgcount values(t_bytecount) as t_bytecount values(bytes_in) as bytes_in by src_ip,dest_ip,bytes_out |eval incount=mvcount(bytes_in)
|join dest_ip
[|search index=someindex sourcetype=somesourcetype earliest=-1d
|bucket _time span=1h
|stats values(user) as user values(_time) as _time dc(url) as distinct_url_count count as distinct_event_count by dest_ip,dest
|eval time_count=mvcount(_time)
|eval login_count=mvcount(user)]
|table dest,src_ip,dest_ip,bytes_out,distinct_url_count,distinct_event_count,i_bytecount,distinct_byte_count,avgcount,t_bytecount,incount,login_count,user,time_count
|search t_bytecount > 1 login_count < 3
|eventstats avg(i_bytecount) as o_average
|eval above=('i_bytecount' / 'o_average')
|eval avgurl=(distinct_url_count / distinct_event_count)
|eval usermult=case(login_count=1, 100, login_count=2, 50, login_count>2, 0)
|eval evtmult=case(distinct_event_count>300, 100, distinct_event_count>60, 50, true(), 0)
|eval beaconmult=case(above>100, 200, above>5, 100, true(), 0)
|eval urlmult=case(avgurl>=.95, 100, avgurl<=.05, 100, true(), 0)
|eval timemult=case(time_count > 7, 100, time_count<=7, 0)
|eval addedweight = (evtmult+usermult+beaconmult+urlmult+timemult)
|dedup dest
|search addedweight > 250
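Mentally, the scoring block reduces to an if/elif chain. Here is a Python sketch of the intended weights; note that Splunk's case() returns the first matching clause, so the larger thresholds have to be tested first, and the urlmult boundaries here close the small gaps between the original ranges (an assumption about intent):

```python
def risk_score(login_count, distinct_event_count, above, avgurl, time_count):
    """Additive risk score mirroring the eval/case chain in the query."""
    # Few users talking to the destination is more beacon-like.
    usermult = 100 if login_count == 1 else 50 if login_count == 2 else 0
    # Many events in the window suggests a periodic check-in.
    evtmult = 100 if distinct_event_count > 300 else 50 if distinct_event_count > 60 else 0
    # Volume far above the overall average for this byte size.
    beaconmult = 200 if above > 100 else 100 if above > 5 else 0
    # URL ratio near 0 or 1: the host hits one URL over and over.
    urlmult = 100 if (avgurl >= 0.95 or avgurl <= 0.05) else 0
    # Activity spread across more than 7 hourly buckets.
    timemult = 100 if time_count > 7 else 0
    return usermult + evtmult + beaconmult + urlmult + timemult

# A single user hammering one URL far above the average volume:
print(risk_score(1, 400, 150, 0.02, 10))  # 600
# Typical browsing stays below the 250 threshold:
print(risk_score(3, 40, 2, 0.5, 2))       # 0
```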