The Advanced Query feature within the Samurai MDR application uses Microsoft's Kusto Query Language (KQL). In this article we discuss the basics of KQL, the logic of a query and provide some examples to get you started.
What is KQL?
In short, KQL is as it states, a Query Language.
The "K" in KQL (Kusto) is named after Jacques Cousteau, the infamous ocean explorer! Just like Jacques's exploration into the depths of the oceans, finding previously unknown volcanic basins, KQL provides you the ability to explore the expanse of your telemetry data.
Why and when use KQL?
Of course, you are not going to find any volcanic basins or new species of dolphin in your data, however in the cybersecurity context it will allow you to find actionable information. Use of KQL will allow you to investigate your data to answer simple questions such as 'is my log source generating data' through to tracing the sequence of events in a breach. You may be familiar with the term Threat Hunting, effectively searching for malicious, suspicious or nefarious activity - whether that be proactive via determining a hypothesis through to hunts based on Indicators of Compromise (IOCs) and Indicators of Attack (IOAs). In essence, using KQL helps you answer the following questions:
- Does X exist
- Where does X exist?
- Why does X exist?
- How to respond?
A typical query is structured to search, locate information and produce results.
The structure may include:
- What? table to query
- Pipe (|) for command separation
- Order data
- Modify Columns in results
Lets walk through some simple examples to understand the logic.
Find events between two hosts
events | where src_ip == "10.170.236.50" and dest_ip == "10.179.236.106"
- The first step in this query outlines what to query, in this example it is the "events" table. By default Samurai always queries the "events" table.
- The pipe ( | ) command is always used for command separation.
- We then use a 'where' operator to filter within the query for the source ip address (src_ip) of "10.170.236.50" and a destination IP address (dst_ip) of "10.179.236.106"
When looking at the results of a query, you will be presented with associated Fields based on the query which allows you to narrow down your search. KQL query statements work like a funnel, starting with a large data set and passing it through multiple operators until it is filtered, summarized or rearranged as required.
By selecting a Favorite Field or Other Field you can start to narrow down your results to your requirements. Alternatively you may choose to include the fields within the query itself or use the Project operator to include specific column fields within your result.
Refer to Advanced Query Functionality to understand more on Favorite Fields and Other Fields.
Search for events with source IP 10.170.236.50 and display a table with a few selected columns
The pipe ( | ) command is always used for command separation
|where src_ip == "10.170.236.50"
|order by timestamp
- Query the 'events' table
- Filter events using the where operator for source IP address "10.170.236.50"
- Use the order operator to order results by timestamp
- Use the project operator to include the column fields "timestamp, action, src_ip, src_port, dest_ip, dest_port"
The simple examples above make use of common operators, use the KQL quick reference guide for more info on operators which includes a comprehensive list with definitions.
Lets now walk through some more complex examples.
Frequency of Events
A common requirement is to find the frequency of occurrence of an event. For instance, in this example we are reviewing Amazon VPC Flow logs and finding which destination IP addresses are receiving the most connections to port numbers below 1024.
| where product == "Secure Firewall (FTD)"
and toint(dest_port) < 1024
| summarize connections = count() by dest_ip
| order by connections
What is also worth noting in this example is that we first need to convert the destination port number to an integer type (as it is a text field) before checking if it is a low-numbered port (less than 1024). For more information on the data types used in the data lake, you can refer to the article on event field data types.
This query produces output which looks something like this:
The result helps us to see which IP addresses are accepting the most connections to privileged ports.
Querying raw logs
Advanced Query isn't only able to query logs from sources which originate from supported integrations. The fact that logs from any kind of syslog source can be ingested into the Samurai platform makes it possible to query the raw content of these logs using Advanced Query.
In the example below, we are taking authentication logs from a host, and querying for failed authentication attempts. We are able to use the extend operator and extract function to create our own fields from the log lines, parsing them using regular expressions.
| where host == "10.1.1.1"
and (raw contains "Invalid"or raw contains "fail")
and raw !contains "connect"
| extend message = substring(raw, 16)
| extend src_host = extract("([A-Za-z0-9\\-]+) .+", 1, message)
| extend msg_info = extract("\\[[0-9]+\\]\\: (.+)", 1, message)
| extend app_src = extract("[A-Za-z0-9\\-]+ ([a-zA-Z0-9\\-]+).+", 1, message)
| extend user = extract(" ([A-Za-z0-9\\-]+) from ", 1, msg_info)
| extend src_ip = extract("from ([0-9a-f\\.\\:]+)", 1, msg_info)
| project timestamp,
| summarize attempts = count() by src_ip
| order by attempts
Once we have extracted the fields we want, we can then go on to perform more operations. In this case we are summarizing the logs by counting the failed authentication attempts by source IP address, and ordering the list so that the IP address with the most failed attempts is listed first. In this case, this helps us to find potential brute force attackers who are trying to guess passwords through brute force tactics.
Be Specific when constructing queries!
Used correctly, Advanced Query can perform sophisticated queries matching against a data set measured in terabytes within seconds! However poorly constructed queries can cause problems, cause dreaded browser slowdowns, or even trigger a query time-out when exceeding the maximum allowed query wait time. The more specific you are with your query, the quicker you are able to will get to the Result.
Refer to the Microsoft documentation Query Limits for further information on limitations.
Lets look at an example:
- Try not run a query with no criteria (for example simply 'events') against a long timer period. Whilst this might be tempting to view all events, this will match ALL events in your Samurai tenant, delivering a sub optimal experience - results for such a query could potentially be measured in Gigabytes or at times Terabytes!
Instead, try to be as specific as possible:
- If you are querying activity for a specific source host, add a where statement specifically asking for results from a specific source:
where src == "172.21.33.99"
- # matching results: 9 100 000 events
- Approx time to completion: Partial results in 45 seconds.
- If you are researching activity related to a specific source type, extend the query to specifically ask for results matching a type:
where src == "172.21.33.99" and type == "WEBPROXY"
- # matching results: 3 700 000 events
- Approx time to completion: Full Results in 40 seconds.
- If you are looking for specific fields, extend the query to specifically project specific fields:
| where src == "172.21.33.99" and type == "WEBPROXY"
| project timestamp, src, url
- # matching results: 3 700 000 events
- Approx time to completion: Full Results in 7 seconds
As you may have realized from reading this article, Advanced Query is a powerful tool - only limited by your own understanding of KQL and in determining what questions or hypothesis against your data you may have. We recommend you start by writing a few simple queries and review the Microsoft documentation. If you need a reminder of usage in the Samurai MDR application, be sure to review Advanced Query Functionality.