Automated Purple Team Framework: Exercise Methodology

What is Purple Teaming?

Purple teaming is a mode of collaboration between red team and blue team with a common objective: improve threat detection and incident response.

The Problem: How to Generate Regular Exercises Without Overloading Teams?

Security teams face a recurring dilemma. On one hand, purple team exercises are among the best ways to validate and improve the SOC's ability to detect and respond to threats. On the other hand, they consume significant time and human resources: each exercise requires preparation that can be lengthy, and the time available constrains the complexity of the actions that can be carried out. Analyzing the results, and the back-and-forth between teams to rework detection rules and tool configurations, also takes time. The result: teams manage to run only a handful of exercises per year, or none at all.

Analysis of the Existing Landscape

Before designing the framework, I examined the solutions available on the market.

CALDERA (MITRE) is a very comprehensive open-source tool for attack automation. However, it does not integrate naturally into automated infrastructures: there is no lab management, its default binaries are all flagged by a stock Microsoft Defender, and it offers no native support for complex, SOC-evasion-oriented campaigns. Moreover, after a few tests, it proved difficult to rework the application to fit our needs.

Atomic Red Team offers an excellent library of atomic tests aligned with the MITRE ATT&CK framework. But it is primarily a collection of unit scripts, not a complete framework capable of managing labs, users, complex scenarios, and result comparison. And, as with CALDERA (which reuses this library), these TTPs are all detected by Defender as-is.

Vectr is designed for tracking and documenting purple team exercises. It is a very good reporting tool, but it does not offer execution automation, infrastructure management, or integration with SOC tools.

What is missing from the current landscape is a single tool that combines rapid deployment of a lab close to the real network, automated execution of attack scripts, and automatic comparison of results against SOC alerts. My framework reuses components and ideas from the solutions above and adds the missing pieces.

Structure of a Purple Team Exercise

Before diving into the technical details, it is important to understand the logical flow of an exercise.

Phase 1: Preparation. The scope and objectives of the exercise are defined, and the teams align on the TTPs (Tactics, Techniques, Procedures) to be tested. The white team prepares the test infrastructure so that the rules and actions agreed on by the red and blue teams can be exercised, the blue team connects its tools to the platform, and the red team prepares its attack scenarios.

Phase 2: Execution. The red team executes each TTP in a transparent, documented manner. The blue team observes and validates whether a detection occurred.

Phase 3: Joint Analysis. For each undetected TTP, detection rules are created or improved. SIEM tuning and alert adjustment are performed.

The Automated Purple Team Framework

To address this problem, I designed a framework that can be broken down into several components, each playing a specific role in fully automating the exercise cycle.

Technology Stack

Ludus (Proxmox): the virtualization platform, built on Proxmox, that hosts the test labs. I chose Ludus for its open API, its ability to deploy complete test environments from a single YAML file, and how easily that API integrates into my backend.

FastAPI as backend: it offers the flexibility needed to orchestrate all the technologies in the framework. It interfaces with the Ludus API and the SOC tools' APIs, and manages the framework's state. I used SQLAlchemy as the ORM layer, which allows rapid, iterative evolution of the database schema.

PostgreSQL centralizes all data: users, Ludus API keys, exercise configurations, metadata (script configuration, execution history, user roles, lab states), as well as exercise logs, etc.

Redis captures execution logs in real-time. Ansible scripts emit their logs in a Redis stream, which the frontend continuously consults to display execution live.
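The producer/consumer pattern here can be sketched as follows. This is a minimal in-memory stand-in mimicking Redis stream semantics (XADD to append, XREAD to poll past a last-seen ID); the real framework would use a Redis client for the same calls, and the field names are illustrative:

```python
import itertools

class LogStream:
    """In-memory stand-in for a Redis stream (XADD / XREAD semantics)."""
    def __init__(self):
        self._entries = []           # list of (entry_id, fields)
        self._seq = itertools.count(1)

    def xadd(self, fields):
        # Append one log event and return its stream ID, like Redis XADD.
        entry_id = f"{next(self._seq)}-0"
        self._entries.append((entry_id, fields))
        return entry_id

    def xread(self, last_id="0-0"):
        # Return every entry added after last_id, as the frontend poller would.
        return [(eid, f) for eid, f in self._entries
                if int(eid.split("-")[0]) > int(last_id.split("-")[0])]

# Producer side: the Ansible runner emits each log line as it happens.
stream = LogStream()
stream.xadd({"task": "Gathering Facts", "status": "ok"})
last = stream.xadd({"task": "Run attack step", "status": "changed"})

# Consumer side: the frontend polls with the last ID it has seen.
new_entries = stream.xread(last_id="0-0")
```

Because each entry keeps its ID, the frontend can resume exactly where it left off after a refresh, without losing or duplicating log lines.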

Vue 3 / Vite as frontend for a rapid and easy-to-maintain development interface.

Docker Ansible Runner is a custom Docker image that contains all Ansible tools. Each time an attack is executed, a container is instantiated with the correct VPN configurations, launches the Ansible scripts, then destroys itself automatically once completed.

The Docker Ansible Runner: The Executor

When you launch an attack from the frontend:

  1. A Docker container is instantiated with the VPN configuration file of the target lab.
  2. The container integrates the attack scripts to be executed and connects to the lab via VPN.
  3. Ansible executes tasks on the target machines in the lab.
  4. During execution, all logs are streamed in real-time to Redis.
  5. Once completed, the container destroys itself automatically.

This design ensures that each execution is isolated, reproducible, and fully traced.
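The five steps above can be sketched as the command the backend would assemble for one run. This is a hedged sketch: the image name, mount paths, and environment variable are illustrative, not the framework's real values.

```python
import shlex

def build_runner_command(lab_id, playbook, vpn_config_path,
                         image="purple/ansible-runner:latest", timeout_s=3600):
    """Assemble the `docker run` invocation for one ephemeral execution.

    `--rm` guarantees the container removes itself on exit (success, error,
    or timeout alike); the VPN config is mounted read-only so the runner can
    only reach the target lab, never the Ludus server itself.
    """
    cmd = [
        "docker", "run", "--rm",
        "--cap-add=NET_ADMIN",                 # needed to bring the VPN tunnel up
        "-v", f"{vpn_config_path}:/vpn/lab.conf:ro",
        "-e", f"LAB_ID={lab_id}",
        image,
        "timeout", str(timeout_s),             # hard cap: kill Ansible if it hangs
        "ansible-playbook", playbook,
    ]
    return cmd

cmd = build_runner_command("lab-42", "attacks/kerberoast.yml", "/tmp/lab-42.ovpn")
print(shlex.join(cmd))
```

Delegating cleanup to `--rm` rather than to application code is what makes orphaned containers impossible even when the backend itself crashes mid-run.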

Attack Scripts:

Scripts are written as Ansible playbooks (YAML) and encapsulate the logic of an attack. Each must include:

Connection variables: the method for accessing the target machine (WinRM for Windows, SSH for Linux), plus the username and password to use.

Task variables: those that parameterize the script’s behavior (paths, registries, exploitation parameters).

From the frontend, at the time of execution preparation, it is possible to override the values of variables. This allows reusing the same script on different labs, with different domain names, different IP ranges, different complexity levels.
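As an illustration, a script skeleton might look like the following. Variable names and the payload task are examples, not the framework's actual schema; frontend overrides would typically map to Ansible extra vars (`-e`), which take precedence over the playbook's defaults:

```yaml
# Illustrative skeleton only -- variable names are examples.
- name: Example attack script
  hosts: targets
  vars:
    # Connection variables, overridable per lab from the frontend
    ansible_connection: winrm
    ansible_user: Administrator
    ansible_password: "{{ lab_password }}"   # injected at launch time
    # Task variables that parameterize the attack itself
    target_domain: corp.local                # override with -e target_domain=...
    payload_path: 'C:\Windows\Temp\payload.exe'
  tasks:
    - name: Drop payload on the target
      ansible.windows.win_copy:
        src: files/payload.exe
        dest: "{{ payload_path }}"
```

Because extra vars sit at the top of Ansible's variable precedence, the same playbook runs unchanged against any lab once the frontend supplies lab-specific values.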

Each script is associated with metadata in the database:

The author, the attack level (script-kiddie, intermediate, advanced, APT), its visibility (public or private), and counters for executions, successes, and errors. This enables proper tracking of each script's evolution on the platform.

Execution logs for each script are kept in the database, which allows consulting precisely what happened during a previous execution.
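The metadata described above could be modeled like this. It is a simplified stand-in shown as a plain dataclass; the real framework maps an equivalent structure through SQLAlchemy, and the field names are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class AttackLevel(Enum):
    SCRIPT_KIDDIE = "script-kiddie"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"
    APT = "apt"

@dataclass
class AttackScript:
    name: str
    author: str
    level: AttackLevel
    public: bool = True      # visibility: public or private
    executions: int = 0
    successes: int = 0
    errors: int = 0

    def record_run(self, ok: bool) -> None:
        """Update the counters after each execution."""
        self.executions += 1
        if ok:
            self.successes += 1
        else:
            self.errors += 1

    @property
    def success_rate(self) -> float:
        return self.successes / self.executions if self.executions else 0.0

script = AttackScript("kerberoast", "alice", AttackLevel.INTERMEDIATE)
script.record_run(ok=True)
script.record_run(ok=False)
```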

Lab Management via Ludus

The Ludus API has been fully integrated into the backend. From the web interface, you can manage:

  • Users and their respective labs.
  • Machines: startup/shutdown, deletion, snapshots, deployment.
  • Activation of testing mode.

The frontend sends requests to the backend, which uses the user’s API key to relay them to the Ludus server via an SSH tunnel. This allows credential isolation and granular access control.
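The relay step can be sketched as follows. This is an assumption-laden sketch: the tunnel port and the `X-API-KEY` header name are illustrative, and in the real backend the stored key would be looked up per authenticated user rather than passed in directly.

```python
def build_ludus_request(user_api_key: str, endpoint: str, tunnel_port: int = 8081):
    """Relay a frontend request to Ludus through the local SSH tunnel.

    The Ludus server is never exposed directly: an SSH tunnel forwards
    127.0.0.1:<tunnel_port> to it, and each call carries the *requesting
    user's* own API key, so access control stays granular per user.
    """
    url = f"https://127.0.0.1:{tunnel_port}/{endpoint.lstrip('/')}"
    headers = {"X-API-KEY": user_api_key}  # header name is an assumption here
    return url, headers

url, headers = build_ludus_request("user-key-abc", "/range/list")
```

Keeping the key server-side and addressing Ludus only through the tunnel is what isolates credentials from the frontend entirely.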

For this part, only members of the red team and white team have access. To use it, they must register their Ludus API key, which is stored in the database.

Role and Permission System

The platform segments access according to user responsibilities and roles:

Superuser: full access. Manages all users and all configurations.

Red_Team_Manager / Blue_Team_Manager: can manage members of their respective team.

Red_Team_Admin / Blue_Team_Admin: extended rights within their team.

Red_Team_User / Blue_Team_User: access to features specific to their team.

White_Team: access to the Ludus section to create and administer labs.

The white team is responsible for infrastructure: preparation of Ludus ranges, configuration of YAML deployment files, lab administration. The red team creates and executes attack scripts. The blue team accesses the detection dashboard to analyze results.

Each new user registers on the platform as a red team, blue team, or white team user. An administrator can then elevate the account's role, or remove the account from a team it does not belong to. Managers can only manage users from their own team. The framework is assumed to be internal to an organization, so new accounts are not left role-less pending manual validation and team assignment by a manager or administrator; this limits delays and friction.
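The role checks above can be sketched as a simple mapping. The role names mirror the post, but the exact permission set behind each role is illustrative:

```python
# Hedged sketch: permission names are examples, not the platform's real ones.
ROLE_PERMISSIONS = {
    "Superuser":         {"manage_all_users", "manage_labs", "run_attacks", "view_dashboard"},
    "Red_Team_Manager":  {"manage_red_team", "run_attacks"},
    "Blue_Team_Manager": {"manage_blue_team", "view_dashboard"},
    "Red_Team_Admin":    {"run_attacks", "edit_scripts"},
    "Blue_Team_Admin":   {"view_dashboard", "edit_detection_rules"},
    "Red_Team_User":     {"run_attacks"},
    "Blue_Team_User":    {"view_dashboard"},
    "White_Team":        {"manage_labs"},
}

def can(role: str, permission: str) -> bool:
    """Return True if the role grants the permission; Superuser gets everything."""
    if role == "Superuser":
        return True
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Centralizing the mapping in one table keeps each FastAPI route down to a single `can(user.role, "...")` guard.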

VM Configuration: Custom Ansible Roles

Ludus automates the deployment of ranges via YAML Ansible configuration files. I created several custom roles to link VMs to real SOC tools:

  • WEC Role: installs and configures Windows Event Collector on a Windows Server. Pushes a GPO to collect all logs from domain machines and send them to the SOC pipeline.

  • Sysmon Role: installs and configures Sysmon on all Windows machines.

  • Linux_Logs Role: configures direct sending of Linux logs to the SOC pipeline.

  • EDR_Windows / EDR_Linux Role: configures the EDR agent on lab machines and links it to the real EDR solution used by the teams.

Other roles were created or adapted to deploy Exchange, ADCS, and other usual services.

These custom roles ensure that the test labs reproduce the organization's real infrastructure as faithfully as possible.

The Scenario Builder: Building Realistic Attacks

In addition to executing isolated scripts, the platform integrates an attack scenario builder.

From this interface, you can create a structured scenario aligned with the MITRE ATT&CK framework: reconnaissance, resource development, initial access, execution, persistence, etc. In each phase, you add the corresponding attack scripts, configure their variables, and select target machines.

The interface offers direct navigation of the MITRE ATT&CK framework, which allows quickly identifying TTPs to test and adding them to the scenario.

Once the scenario is configured, you can save it to replay it later on another lab or at another date. You select the target Ludus lab and launch the entire scenario in a single click.
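A saved scenario could be structured as an ordered list of ATT&CK tactics, each holding parameterized scripts and targets. This is a minimal sketch with illustrative field, script, and host names:

```python
scenario = {
    "name": "AD takeover - intermediate",
    "phases": [
        {"tactic": "TA0001 Initial Access",
         "steps": [{"script": "phishing_payload", "targets": ["ws01"],
                    "vars": {"target_domain": "corp.local"}}]},
        {"tactic": "TA0004 Privilege Escalation",
         "steps": [{"script": "unquoted_service_path", "targets": ["ws01"], "vars": {}}]},
        {"tactic": "TA0008 Lateral Movement",
         "steps": [{"script": "pass_the_hash", "targets": ["srv01"], "vars": {}}]},
    ],
}

def flatten(scenario):
    """Expand a saved scenario into the ordered execution queue for one lab."""
    return [(phase["tactic"], step["script"], target)
            for phase in scenario["phases"]
            for step in phase["steps"]
            for target in step["targets"]]

queue = flatten(scenario)
```

Because the scenario is pure data, replaying it on another lab or date is just re-running `flatten` with a different target lab selected.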

The goal is to allow the red team to build a base of scenarios and knowledge, and to be able to duplicate scenarios or TTPs under several variants and evasion levels. On the blue team side, this offers clear visibility on detection coverage by phase.

The Blue-Team Dashboard: Measuring Coverage

The dashboard queries the tool that centralizes all SOC alerts and automatically compares:

Executed scripts versus rules that should have triggered an alert.

Rules that actually fired versus those that were expected.

It displays:

  • A global detection rate percentage.
  • The list of rules that did not fire and that need work.
  • The list of unexpected rules that fired.
  • A performance graph: red team versus SOC detection capacity for the exercise.
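The comparison reduces to set operations between the rules expected by the blue team and the rules that actually fired. A minimal sketch, with illustrative rule names:

```python
def detection_report(expected_rules: set, fired_rules: set) -> dict:
    """Compare expected vs fired SIEM rules, as the dashboard does."""
    detected   = expected_rules & fired_rules
    missed     = expected_rules - fired_rules   # rules that need work
    unexpected = fired_rules - expected_rules   # fired but not expected
    rate = 100 * len(detected) / len(expected_rules) if expected_rules else 0.0
    return {"rate": rate, "missed": sorted(missed), "unexpected": sorted(unexpected)}

report = detection_report(
    expected_rules={"kerberoasting", "pth_detection", "new_service_created"},
    fired_rules={"kerberoasting", "psexec_generic"},
)
```

The same three outputs feed the dashboard directly: the global rate, the missed-rules worklist, and the unexpected firings worth investigating.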

Isolation of Ansible Logs: Not Polluting the Pipeline and Avoiding False Positives

Since the IP range Ludus uses for VPN access to a lab is predefined, I created exclusion rules so that logs originating from this range do not pollute the SOC pipeline. The range is excluded from the log collection tools on Linux, on Windows, and on the WEC.

This ensures that the SOC pipeline only processes relevant logs and that false positives from executing Ansible tasks do not bias the detection metrics.
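The filter logic amounts to a membership test against the known VPN range. A sketch with an illustrative CIDR (Ludus assigns its own predefined range):

```python
import ipaddress

# Illustrative range only; the real value is the lab's predefined VPN network.
ANSIBLE_VPN_RANGE = ipaddress.ip_network("10.254.0.0/24")

def should_forward(source_ip: str) -> bool:
    """Drop events generated by the Ansible runner before they reach the SOC."""
    return ipaddress.ip_address(source_ip) not in ANSIBLE_VPN_RANGE
```

In practice the equivalent rule lives in each collector's own configuration (rsyslog/auditd filters on Linux, WEC subscription filters on Windows) rather than in Python; the code only illustrates the test being applied.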

Technical Challenges and Solutions

Several challenges were encountered during development.

  • Real-time log synchronization. Streaming Ansible logs via Redis required work to ensure that no log was lost between execution in the container and display on the frontend, particularly during container interruptions. It was also necessary to improve their readability and not expose raw logs to users.

  • Container lifecycle management. The Docker Ansible Runner must destroy itself cleanly after each execution, even on error or Ansible timeout, to avoid orphaned containers left connected to the VPN. It also had to be isolated so that it can only execute its scripts once connected to the lab's VPN, and never against the Ludus server itself.

  • Script portability. A script written for a specific lab must be able to be reused on any other lab. This requires rigorous variable management and special attention to environment differences (domain name, IP ranges, OS versions). For this, I created a reusable skeleton for creating new scripts to avoid errors.

  • Alert mapping. Automatic correlation between a SOC alert and the attack script that triggered it is based on timestamps. Before executing a script, a log is generated and sent to the SOC pipeline, and the same at the end, to allow sorting alerts and relating them to a script. Expected rules are defined in the script’s metadata by the blue team.
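The timestamp-based attribution can be sketched as a window query between the two marker logs. The slack value and alert fields are illustrative; the slack simply absorbs ingestion delay in the SOC pipeline:

```python
from datetime import datetime, timedelta

def alerts_for_script(alerts, start_marker, end_marker, slack_s=30):
    """Attribute SOC alerts to one script run using its start/end marker logs."""
    lo = start_marker - timedelta(seconds=slack_s)
    hi = end_marker + timedelta(seconds=slack_s)
    return [a for a in alerts if lo <= a["timestamp"] <= hi]

t0 = datetime(2024, 5, 1, 10, 0, 0)
alerts = [
    {"rule": "kerberoasting", "timestamp": t0 + timedelta(seconds=40)},
    {"rule": "old_noise",     "timestamp": t0 - timedelta(minutes=10)},
]
matched = alerts_for_script(alerts, start_marker=t0,
                            end_marker=t0 + timedelta(minutes=2))
```

The matched alerts are then checked against the expected rules stored in the script's metadata to decide detected versus missed.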

Results and Validation

The framework was validated on several scenarios covering different ATT&CK phases. A concrete example: a three-phase scenario (initial access via simulated phishing, privilege escalation, lateral movement) executed on a Ludus lab with an Active Directory domain, Sysmon on the Windows machines, and logs sent to the real SOC pipeline.

On this scenario, the blue team dashboard provided immediately:

A view of what was detected and what was not, the SIEM rules that fired, and those that need work. This data, which would have taken several days to collect between exercise planning, execution, and post-exercise review, was generated in a few minutes.

Note: the objective is not necessarily to achieve 100% detection. A 100% rate would more likely mean that the scripts are too noisy or poorly designed. The objective is to establish a measurable baseline and evolve it over time to progressively improve the detection score. This score can also temporarily decrease when the red team develops new, more complex attack methods that evade the current rules. These evolutions stimulate healthy competition between red team and blue team, with the common goal of improving the overall security of the organization.

Perspectives and Evolution

This framework addresses the initial problem: enabling regular purple team exercises without heavily mobilizing the teams each time. Once the scripts are written and the labs configured, relaunching a complete exercise takes a few clicks.

Areas for improvement to consider:

  • Automatic Scheduling. Plan recurring executions without human intervention for continuous monitoring of detection coverage.

  • Security CI/CD Integration. Automatically trigger an exercise when SIEM rules are updated or infrastructure is changed.

  • Script Library Enrichment. The more the red team feeds the database, the wider the MITRE coverage tested becomes and the more relevant future executions become.

  • Automated Reporting. Generate periodic reports on the evolution of the detection rate for the relevant teams.

In the end, the goal is for a large share of the exercises to run autonomously. Teams intervene only to analyze results and improve detection rules: the human load shifts from planning and execution to analysis and improvement.