πŸ“˜ Part 2: From Events to Dashboards

🌸 1. Running the ML Sidecar, Lookups & Visual Exploration

In Part 1, we focused on the engine itself: how authentication behavior is modeled, clustered, scored, and turned into explainable anomaly signals.

In this second part, let’s be a bit more practical.

We’ll walk through the project structure, the input data we feed the engine, running the pipeline, what gets written back to Splunk, and the dashboards built on top.

No deep math this time, just connecting the dots end to end.

Also, if you want to deep-dive into the math and the code, Part 1 and the repository are the places to look.


πŸ—‚οΈ 2. Project Structure: What Lives Where?

Before running anything, it helps to understand how the pieces are separated. The repository is intentionally split into three main parts:

splunk_ml_sidecar/
β”‚
β”œβ”€β”€ ml_sidecar/                     ← Python ML engine
β”‚   β”œβ”€β”€ config/settings.yaml
β”‚   β”œβ”€β”€ core/*.py                   ← Features, modeling, pipeline logic
β”‚   β”œβ”€β”€ models/                     ← Trained K-Means + metadata
β”‚   β”œβ”€β”€ run_auto.py
β”‚   └── README.md
β”‚
β”œβ”€β”€ splunk_ml_sidecar_app/           ← Splunk app
β”‚   β”œβ”€β”€ local/collections.conf
β”‚   β”œβ”€β”€ local/transforms.conf
β”‚   β”œβ”€β”€ local/inputs.conf
β”‚   └── local/data/*
β”‚
└── auth-windows-log-generator-as-json-with-real-user-behaviour.py

The last item, auth-windows-log-generator-as-json-with-real-user-behaviour.py, is a helper script we used to generate realistic synthetic Windows authentication logs for testing and experimentation.

Why this separation? Keeping these layers separate makes the system easier to reason about, and easier to replace individual parts later.

Also, keep in mind: the goal of this project is to illustrate that we can build a self-controlled system. There is, of course, room to enhance the algorithm and the general logic.


πŸ§ͺ 3. Input Data: What Are We Feeding the Engine?

For this project, we used synthetic Windows authentication events generated as JSON. If you want, you can also check the generator script in the repository.

Example event:

{"TimeCreated": "2025-12-15T07:55:44.007717Z", "user": "svc-web", "src_user": "svc-web", "src": "10.10.2.120", "dest": "SERVER011", "signature_id": 4624, "signature": "An account was successfully logged on", "action": "success", "process": "C:\\Windows\\System32\\lsass.exe"}

These events are ingested into Splunk first (as normal logs) and then read back by the ML Sidecar.
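As a rough illustration (this is not the actual generator script, and the function name is hypothetical), an event shaped like the example above could be produced with a few lines of Python:

```python
import json
from datetime import datetime, timezone

def make_auth_event(user, src, dest, success=True):
    """Build one synthetic Windows authentication event as a JSON string.
    Field names mirror the example event above; values are illustrative."""
    event = {
        "TimeCreated": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "user": user,
        "src_user": user,
        "src": src,
        "dest": dest,
        # 4624 = successful logon, 4625 = failed logon (Windows event IDs)
        "signature_id": 4624 if success else 4625,
        "signature": ("An account was successfully logged on" if success
                      else "An account failed to log on"),
        "action": "success" if success else "failure",
        "process": "C:\\Windows\\System32\\lsass.exe",
    }
    return json.dumps(event)
```

The real generator additionally simulates per-user behavior patterns; this sketch only shows the event shape.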

πŸ’‘ The key idea: Splunk remains the system of record for logs. The ML Sidecar only reads data and writes results to KVStore. Also, because the output can get crowded, don’t forget to adjust your lookup configurations for your environment.


▢️ 4. Running the Pipeline

Once data is available in Splunk, running the full pipeline is intentionally simple. You can also find the prerequisites in the splunk_ml_sidecar/README.md file.

  1. Clone the repository from GitHub. We will use the splunk_ml_sidecar directory, but you can also check out our other demos in this repository. 🐣
       gh repo clone seynur/seynur-demos
    
  2. Install the ML Engine
       cd splunk_ml_sidecar/ml_sidecar
       pip install -e .
    
  3. Configure:
    • Set both the ingestion and output Splunk REST tokens (ingestion.auth_token & output.auth_token) and the base Splunk URLs (ingestion.base_url & output.base_url). You can also modify any other configuration in the settings.yaml file as desired.

    Note: Ingestion and output are configured separately, so you can, for example, collect input from one Splunk server and send the results to another.

    Note: At the moment, K-Means is the only algorithm implemented in this study.
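    The relevant part of settings.yaml could look roughly like this (only the keys named above are from the project; the surrounding structure is a sketch, so check the repository's settings.yaml for the exact layout):

```yaml
# Sketch of the ingestion/output keys in settings.yaml (other fields omitted)
ingestion:
  base_url: "https://splunk-ingest.example.com:8089"
  auth_token: "<your-splunk-rest-token>"
output:
  base_url: "https://splunk-output.example.com:8089"
  auth_token: "<your-splunk-rest-token>"
```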

  4. Set OUT_FILE in the generator script to the full path where the synthetic log file should be written, then generate the synthetic authentication logs.

       python3 auth-windows-log-generator-as-json-with-real-user-behaviour.py
    
  5. Configure Splunk to ingest the synthetic data.

       # splunk_ml_sidecar/splunk_ml_sidecar_app/local/inputs.conf
    
       [monitor://<full-path-of-the-input-file>]
       disabled = false
       index = ml_sidecar
       sourcetype = ml:sidecar:json
    
    
  6. Restart Splunk after adding the app, then validate that the KVStore collections below and the inputs exist in Splunk.

       auth_events_lookup
       auth_outlier_events_lookup
       auth_cluster_profiles_lookup
       auth_user_profiles_lookup
       auth_user_thresholds
    


From the splunk_ml_sidecar/ml_sidecar/ directory:

python run_auto.py
Figure 1: Example output of the run_auto.py script on the CLI.

That’s it. 🌸

Behind the scenes, this triggers the full chain:

  1. Load configuration from settings.yaml
  2. Pull authentication events from Splunk via REST
  3. Train or load an existing K-Means model
  4. Check for behavioral drift
  5. Score all events
  6. Apply adaptive thresholds (user/cluster/global)
  7. Build profiles and enriched event records
  8. Write everything into Splunk KVStore
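The adaptive thresholding in step 6 can be sketched as a simple fallback chain: prefer a user-specific threshold, fall back to the user's cluster, and finally to a global default. Function and variable names here are hypothetical, not the sidecar's actual API:

```python
def resolve_threshold(user, cluster, user_thresholds, cluster_thresholds,
                      global_threshold):
    """Return the most specific anomaly threshold available:
    per-user first, then per-cluster, then the global default."""
    if user in user_thresholds:
        return user_thresholds[user]
    if cluster in cluster_thresholds:
        return cluster_thresholds[cluster]
    return global_threshold
```

An event is then flagged as an outlier when its distance-based score exceeds the resolved threshold.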

If this is the first run, a model is trained. On later runs, the engine reuses the existing model unless the drift check flags enough change in your new data, in which case it retrains.
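One simple way to implement such a drift check (a sketch under assumed statistics, not the sidecar's actual logic) is to flag retraining when a feature's mean in the new batch moves too many standard deviations away from the training-time mean:

```python
def drift_detected(train_mean, new_mean, train_std, tolerance=3.0):
    """Hypothetical drift check: flag retraining when the new batch's mean
    drifts more than `tolerance` training-time standard deviations away
    from the mean observed when the model was trained."""
    if train_std == 0:
        # Degenerate case: any change at all counts as drift.
        return new_mean != train_mean
    return abs(new_mean - train_mean) / train_std > tolerance
```

In practice this would run per feature (or on distances to the K-Means centroids), with the tolerance tuned to your environment.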


πŸ“¦ 5. What Gets Written Back to Splunk?

Instead of indexes, the Sidecar writes structured KVStore collections. This keeps dashboards fast and avoids recomputing ML logic during searches.

Main KVStore Collections:

    • auth_events_lookup: scored authentication events
    • auth_outlier_events_lookup: events flagged as anomalous
    • auth_cluster_profiles_lookup: per-cluster behavioral profiles
    • auth_user_profiles_lookup: per-user behavioral profiles
    • auth_user_thresholds: adaptive per-user thresholds
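As a sketch of how results could land in KVStore over REST (the endpoint shape follows Splunk's documented KVStore batch_save API; the function names and the stdlib-only HTTP client here are illustrative, not the sidecar's actual code):

```python
import json
import urllib.request

def kvstore_url(base_url, app, collection):
    """Build Splunk's KVStore batch-save endpoint URL for a collection."""
    return (f"{base_url}/servicesNS/nobody/{app}"
            f"/storage/collections/data/{collection}/batch_save")

def write_kvstore(base_url, token, app, collection, records):
    """POST a list of record dicts into a KVStore collection in one call."""
    req = urllib.request.Request(
        kvstore_url(base_url, app, collection),
        data=json.dumps(records).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Batching the writes keeps the number of REST round-trips small even when every event gets an enriched record.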


πŸ“Š 6. Bonus: Dashboards - Making Sense of the Output

Once the KVStore collections are populated, dashboards become very lightweight. And yes, for now I only created a dashboard rather than a fancy UI. If you want, you can check out the dashboards instead of stepping through the SPL searches one by one. πŸͺ„

There are three different sections in the β€œAuthentication ML Anomaly Detection Test” dashboard. It is also generated automatically with the splunk_ml_sidecar_app.

Typical views include:

Furthermore, you can check the lookups directly.


πŸ“˜ 7. Wrapping Up

With Part 1 and Part 2 together, we now have the full picture:

There’s still plenty of room to extend this:

But even in this state, you can think of it as a starting point for developing your own model rather than something to use directly in production, a playground for behavior-first security analytics.

πŸ“ Full code & documentation: GitHub Repository

If you end up trying something similar or break it interestingly, I’d love to hear about it. 😊

Connect with me on LinkedIn or drop a comment on the blog.

Until next time πŸ‘‹πŸ”


References: