Introduction To Cyber Incidents
Cyber incidents are a part of life. It is no longer a matter of if you will have an incident, but rather when and how frequently. In ransomware alone, there has been a 144% increase in the average ransom demand, with a 78% increase in payments being made, which makes this a lucrative business venture for attackers.
This problem is only made worse by the fact that threat actors require fewer skills today to perform a successful attack. Solutions such as ransomware-as-a-service make it easy for even a low-skilled threat actor to hold an entire organisation to ransom. The same can’t, however, be said for the blue (defence) team. While there have been technological advancements, the drastic skills shortage remains, and new attack vectors are being published regularly. The age-old saying still holds:
While the best way to learn how to deal with an incident is to actually have an incident, this is not really an effective learning strategy. This is where tabletop exercises come in. You can almost think about these exercises as a ‘Dungeons and Dragons’ session for blue teams, but with ransomware instead! During these exercises, the blue team works through an incident scenario to test their processes, playbooks, and response to a scripted cyber incident. These exercises can be used to highlight shortcomings, such as the team not having sufficient knowledge about modern attacker techniques, or certain critical processes not being documented. Most organisations that have a blue team should be doing these exercises on a fairly regular basis to keep their team sharp and ready for a real-life incident.
These exercises themselves do, however, have some shortcomings. The biggest shortcoming is the fact that talking about an action is not the same as carrying it out and gaining an understanding of the cost and implications of performing that action in practice. Saying that you will rotate the KRBTGT account’s password twice to invalidate forged golden tickets is easy, but if you have ever done this, you’ll know how disruptive this action can be to critical services and systems that rely on Active Directory for authentication. Especially those pesky legacy services.
Furthermore, these tabletop exercises often do not follow the same process that a real-world attacker would, and they tend to focus on specific phases of the cyber kill chain. With blue teams being generally more familiar with the initial phases of the kill chain (reconnaissance and delivery), they are often able to act against the threat actor quite quickly, before any real goal execution can occur.
Lastly, the effectiveness of controls and countermeasures is often hyper-inflated during these exercises. Our personal favourite is the ultimate Anti-Virus (AV) or Endpoint Detection and Response (EDR) tool that is seemingly able to catch all malware and halt the attackers dead in their tracks. From real-life experience, we can tell you that this is seldom the case. With these shortcomings in mind, our journey of preparation against a cyber incident started with a request from a client.
We were asked to solve three main challenges that the client was facing with their standard tabletop exercises:
- Could the tabletop scenario be more aligned with real-world attacks?
- Although tabletop exercises are more process-driven and focused, was there any way that the actual investigation team could participate in the exercise?
- Since the scenario for every tabletop exercise is different, how could a team’s ability and improvement be accurately measured across different scenarios?
With these requests, we set out on a journey to try and find potential solutions and quickly realised that we needed to rethink how we performed tabletop exercises. To address the request, we had to solve three distinct challenges:
Challenge 1 – Technical Components
The first challenge that we had to solve was a technical challenge. During tabletop exercises, teams usually ask investigative questions which the facilitators then answer. This does not really allow for the investigation team to perform an actual investigation. Determining incident scope is a vital component of any incident investigation and being handed this scope on a silver platter impacts the credibility of the exercise.
To address this shortcoming, we created a lab environment where we could simulate an organisation’s infrastructure. By creating a simulation lab, we could launch a real-life attack against this organisation for our scenario. While the attack was being executed, we could capture key forensic elements such as artefacts, indicators of compromise, and log information.
We created the lab environment using Vagrant as an Infrastructure-as-Code solution. The lab represented a fictitious organisation complete with Active Directory through multiple domain controllers and over 6000 employees. It also hosted several key infrastructure elements, such as mail servers and web applications.
With the lab created, we could then leverage our red team capabilities to launch a simulated red team attack against this organisation that followed the entire cyber kill chain all the way to goal execution. Our first scenario looked like this:
While the attack was being executed, we were able to capture log information and forensic evidence that could be provided to the investigation team during the tabletop exercise for analysis. While the attack simulated the entire cyber kill chain, the goal was to break the attack into smaller components. This would allow us to swap out the components to easily create different scenarios. For each of these components, forensic evidence, as well as the attack path, was stored.
Using a Python script, each attack component’s log information could be spliced together with other attack components to create the overall attack path for a scenario. During the exercise, log information could then be replayed in real time to the blue team for investigation.
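The splicing and replay idea can be sketched in a few lines of Python. This is a minimal illustration and not the script we used: the `LogEvent` structure, the `splice` and `replay` functions, the fixed gap between components, and the `speed` parameter are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, List


@dataclass(frozen=True)
class LogEvent:
    """One captured log line (hypothetical structure for this sketch)."""
    offset: timedelta  # time since the start of the component or scenario
    source: str        # host or system that produced the log line
    message: str


def splice(components: List[List[LogEvent]],
           gap: timedelta = timedelta(minutes=5)) -> List[LogEvent]:
    """Chain attack components back to back into one scenario timeline.

    Each component starts `gap` after the last event of the previous one,
    so swapping a component changes the scenario without re-recording.
    """
    timeline: List[LogEvent] = []
    start = timedelta(0)
    for events in components:
        ordered = sorted(events, key=lambda e: e.offset)
        for event in ordered:
            timeline.append(
                LogEvent(start + event.offset, event.source, event.message))
        if ordered:
            start = start + ordered[-1].offset + gap
    return timeline


def replay(timeline: List[LogEvent],
           emit: Callable[[LogEvent], None],
           speed: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Emit events in order, waiting between them as the recording did.

    `speed` > 1 compresses the waits; `sleep` is injectable so the
    function can be exercised without actually waiting.
    """
    previous = timedelta(0)
    for event in timeline:
        wait = (event.offset - previous).total_seconds() / speed
        if wait > 0:
            sleep(wait)
        emit(event)
        previous = event.offset
```

In this sketch, `emit` could be anything that forwards an event to the blue team, for example a function that writes to a file watched by a log shipper.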
Challenge 2 – Communication
Communication is vital during cyber incidents. This only became more difficult during the pandemic, when incidents had to be dealt with over VoIP calls.
We realised that to assess the team’s communication skills, we would have to intercept their communication during the exercise. Furthermore, we still had to solve the shortcoming that discussing an action was not the same as carrying out that action.
As a potential solution to this problem, we hosted an instant messaging platform called Zulip, similar to Slack or Discord, and allowed users to create various channels for communication during the exercise. We created a few pre-defined channels on the platform to facilitate communication with key internal and external stakeholders, namely ExCo, IT Support, Vendor, and Supporting CSOC channels, and used them to monitor the team’s communication. For example, using the IT Support channel, teams would have to log a ticket for a specific action, such as host isolation, to be taken. Of course, we as facilitators would take on all the different stakeholder roles to be able to correspond with the team.
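As an illustration of what logging a ticket in a channel could look like, the sketch below builds a message payload in the shape that Zulip's send-message API accepts (`type`, `to`, `topic`, `content`). The `Ticket` class, the `ticket_to_message` helper, and the wording of the message are assumptions for this example and not part of our platform setup.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Ticket:
    """A request the blue team logs with a stakeholder (hypothetical)."""
    action: str        # e.g. "host isolation"
    target: str        # e.g. a hostname
    requested_by: str  # team member logging the ticket


def ticket_to_message(ticket: Ticket,
                      channel: str = "IT Support") -> Dict[str, str]:
    """Build a channel message payload for the given ticket.

    One topic per action and target keeps each ticket in its own
    thread, which makes the conversation easier to review afterwards.
    """
    return {
        "type": "stream",
        "to": channel,
        "topic": f"{ticket.action}: {ticket.target}",
        "content": (
            f"Request from {ticket.requested_by}: "
            f"please perform '{ticket.action}' on {ticket.target}."
        ),
    }
```

With the `zulip` Python package, a payload of this shape can be passed to `zulip.Client.send_message`; the facilitators playing the IT Support role would then reply in the same topic.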
Challenge 3 – Assessment Criteria
The last challenge to solve was the assessment challenge. This again begs the question:
“How do you assess a team’s ability when the scenario keeps on changing?”
To solve this, we had to abstract the assessment from the scenario. We had to make sure that we had criteria that could remain constant, regardless of the specific scenario for the exercise. To do this, we decided to tie the criteria back to the Incident Management framework.
We used the NIST Incident Management framework as a base:
Instead of focusing on the team’s response to the scenario, we would focus on the team’s response based on the framework. Was the team able to identify the scope? How well did the team perform their triage abilities? Each element of the framework was broken down in a questionnaire that was used during the exercise to measure the team’s response.
But only tying questions to a framework would not be enough. We also had to make sure to phrase the questions better. Let’s look at a few examples.
- Rather than asking if the team isolated the infected workstation, which is a scenario-specific question, we could ask: were the team’s containment procedures adequate? The answer to this question could be assessed even if the scenario changes, since what is adequate would depend on the scenario.
- Instead of asking if the team sent the summary report that the CEO requested, we asked whether the team kept relevant stakeholders updated, as this is a universal ask during any incident.
Using scenario-independent questions, our hope was that we could compare the answers to these questions across multiple scenarios to show the team’s progress and improvement.
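To make the idea concrete, here is a minimal sketch of how scenario-independent criteria could be scored and compared between exercises. It is not our actual questionnaire: the question keys, the assignment of each question to a framework phase, and the 0 to 4 rating scale are assumptions for illustration. The phase names follow NIST SP 800-61.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    key: str       # short identifier used in the rating sheet (assumed)
    phase: str     # incident management phase the question belongs to
    question: str  # phrased so it applies to any scenario


# Example criteria; the mapping of questions to phases is an assumption.
CRITERIA: List[Criterion] = [
    Criterion("scope", "Detection and Analysis",
              "Was the team able to identify the scope?"),
    Criterion("triage", "Detection and Analysis",
              "How well did the team perform triage?"),
    Criterion("stakeholders", "Detection and Analysis",
              "Did the team keep relevant stakeholders updated?"),
    Criterion("containment", "Containment, Eradication, and Recovery",
              "Were the team's containment procedures adequate?"),
]


def phase_scores(ratings: Dict[str, int]) -> Dict[str, float]:
    """Average the facilitator ratings (assumed 0-4 scale) per phase."""
    by_phase = defaultdict(list)
    for criterion in CRITERIA:
        if criterion.key in ratings:
            by_phase[criterion.phase].append(ratings[criterion.key])
    return {phase: mean(values) for phase, values in by_phase.items()}


def improvement(earlier: Dict[str, float],
                later: Dict[str, float]) -> Dict[str, float]:
    """Per-phase change between two exercises, whatever their scenarios."""
    return {phase: later[phase] - earlier[phase]
            for phase in earlier if phase in later}
```

Because the ratings are keyed by criterion rather than by scenario event, the output of `phase_scores` for one year's exercise can be compared directly with the next year's, even when the attack path is completely different.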
With potential solutions to our three initial challenges, it was time to start our three-year journey of running the new and improved tabletop exercises.
Tabletop Journey Iterations
To date, we have executed three different iterations of these exercises. An iteration for us is a yearly run of the same scenario (with minor tweaks) against various blue teams. From these exercises, we would gain new insights and make improvements to our solution for future iterations. We are nowhere near finished, but we do believe we are moving in the right direction to ensure that tabletop exercises provide the best possible learning experience for blue teams.
Our first iteration was the most stressful, since it was the very first time that we got to test our new solution in the wild. After running more than 20 of these exercises in Iteration 1, we gained the following valuable insights:
- Blue teams generally have more knowledge in dealing with the initial phases of the cyber kill chain. If a threat actor slips through the cracks and is discovered too late, it is incredibly difficult to perform incident response.
- The eradication step was often missing in Containment, Eradication, and Recovery. Teams were quite effective at containing the incident, but without adequate eradication, the threat actor’s access would persist through the recovery phase, meaning the incident would still be live directly after the recovery actions were taken. This showed why accurate and effective eradication is a must.
- The teams that performed the best were teams that had clear leadership in the room. While democracy is certainly important to hear everyone’s opinions and ideas, during an incident, you often need autocracy, where an individual makes the difficult final call and takes accountability for all actions.
- Teams often did not make use of subject matter experts. A web application would be compromised, and the team would not leverage the product owner or developers. These SMEs can often provide valuable information to the team that could help them better understand the scope of the attack. We often think that our blue teams must be subject matter experts in everything, which is simply not feasible.
Using these insights, we made some improvements for Iteration 2:
- To allow the teams to feel more comfortable during the exercise, we incorporated their specific SIEM product into the lab environment.
- We believe that modern attacker techniques often change the way incident response and management should be performed. To test this theory, our second iteration scenarios focused more on modern attacker techniques, such as supply chain attacks.
- We increased the size of our lab environment to allow us to host even more systems such as a Git server and an HTTP proxy server.
Again, after running several exercises during Iteration 2, we gathered the following insights:
- Most playbooks do not yet cater for modern attacker techniques, such as supply chain attacks, where the application is indirectly attacked, meaning log visibility is significantly reduced.
- Several teams did not have adequate first responder training, meaning forensic evidence would often be destroyed during the recovery phase.
- The teams that performed well were the ones that took adequate notes during the exercise, to not only document the actions that the team had taken, but also the effect that these actions had on the incident.
- Understanding incident scope is vital to ensure an appropriate response. Without decent knowledge of the incident scope, over- or under-reactions were observed.
- Making the lab environment larger does not really contribute to the exercise, since it is already such an action-packed event.
Using the insights from Iteration 2, the following improvements were made for Iteration 3:
- Since the larger environment did not really add value, we decreased the size of the environment again.
- We decided that we no longer wanted to replay the attack, but rather run the red team component live. Our belief was that this would allow us to better adapt our attack based on the team’s response, as well as provide the team with an opportunity to truly understand the effects of the actions that they take.
- To allow the teams to better respond to the incident and gain more telemetry, Microsoft Defender Advanced Threat Protection (MDATP) was introduced as an EDR in the lab environment.
- We again decided to focus on modern attacker techniques for our scenario, since it yielded positive results in the second iteration.
After the completion of Iteration 3 earlier this month, we gathered the following insights:
- The teams that engaged in regular tabletop exercises performed better. We usually only do one of these exercises for a team each year, which leaves 364 days where the team has time to work on their shortcomings and improve. Teams that regularly ran their own tabletop exercises for training performed significantly better.
- The recent Log4j vulnerability had significantly improved the teams’ capabilities to assess and deal with the risk associated with new critical 0-day vulnerabilities being released. While an incident from these vulnerabilities is still difficult to deal with, most teams had processes in place to manage the risks of a new 0-day vulnerability coming to light.
- Current incident processes and playbooks do not adequately cater for modern attacker techniques. The latest Rogue Certificate exploits published by SpecterOps mean that account breaches can no longer be fixed by simply resetting a password.
- After assessing the capabilities of the various teams over a three-year period, we can say that our assessment criteria stand the test of time. We could tangibly show the areas where the teams made improvements, as well as the key areas of shortcomings.
Cyber Incidents Preparation: Conclusion
With all of this said, what does this actually mean for the future of tabletop exercises and the journey that all organisations must go on to prepare for an incident? We have a couple of pointers:
- Your blue team’s capabilities should be assessed independently of the specific tabletop scenario. This can be achieved by creating criteria based on your incident management framework and making sure to generalise your questions in such a manner that they will work for every scenario at hand.
- When dealing with an incident, understanding the scope of the attack is vital to ensure an adequate response. However, this requires a team effort between both Incident Managers and Responders, as well as subject matter experts being pulled in based on the specific incident. Make sure that your exercises incorporate all of these teams to truly test your readiness.
- Playbooks. Love them, hate them, they do serve a vital purpose. However, we should be updating our playbooks to ensure that they can cater for modern attacker techniques as well.
- Tabletop exercises should be interactive. Having a lab environment helps to make the exercises more realistic and provides the team with an opportunity to test the actions that they are proposing.
To help with that last point, we released a collection of our lab creation scripts. These scripts can be used to create a lab environment where your blue team will be able to run attacks and perform technical response during the tabletop exercises. These scripts are available on our GitHub page here.
Thank you for coming on this journey with us, but we are far from done. Our hope is to improve our lab environment to allow us to simulate even more types of attacks! We also want to stay on top of modern attacks such as Log4j. Furthermore, there are some organisations that are fairly mature when it comes to incident response. For these organisations, we are currently working on what we call later-phase exercises. Rather than worry about the initial phases of the cyber kill chain, we directly test response at the goal execution phase. Our R&D team has been working on tooling that can simulate things such as safe ransomware that we can use to test what would happen from these assume-breach perspectives. This will allow these more mature teams to test different response aspects, such as what it would truly take to recover from backups, or whether their network segregation is truly adequate to stop the propagation of ransomware.
If you are interested in starting your journey of preparing for a cyber incident or simply have a question, feel free to reach out!