Better late than never – I wrote the S3 version a long time ago but somehow never got around to the GCS one.
If you have been following the previous articles in this series (Getting started with LocalStack, Archiving frozen buckets to S3 on LocalStack), you already know the process. Splunk moves buckets through the hot → warm → cold → frozen cycle. By default, Splunk deletes a bucket once it reaches the frozen stage. The coldToFrozenScript hook lets you catch the bucket before it is deleted and move it somewhere else.
The script coldToGCS.py sits in seynur-tools. It archives just the journal.zst file — the compressed logs — and that’s all you need. ✨
The main feature is how it differentiates between local and production modes. If the EMULATOR_HOST variable is set, the script uploads directly through the GCS JSON API using only the standard library (urllib), with no Google Cloud Storage SDK dependency. That path is used only when running locally.
```python
#!/usr/bin/env python3
import sys, os, subprocess
# Define the base GCS path where journal.zst files will be uploaded
# ex: "gs://my-bucket/frozen_buckets"
GCS_BUCKET = "gs://gcs-frozen-test-bucket"
# Set to the local fake-gcs-server endpoint for local testing; leave empty for production
# ex: EMULATOR_HOST = "http://localhost:4443"
EMULATOR_HOST = "http://localhost:4443"
...
```
Figure 1: The three pieces of the configuration side by side. Left: coldToGCS.py with GCS_BUCKET and EMULATOR_HOST set for local testing. Top right: inputs.conf defining the data input for the test index. Bottom right: indexes.conf with coldToFrozenScript configured.
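To make the local mode a bit more concrete, here is a minimal sketch of what an SDK-free upload through the JSON API could look like. This is not the code from coldToGCS.py: the function names `build_upload_request` and `upload_journal` are made up for illustration. The only part taken from the real API is the simple-upload endpoint (a POST to `/upload/storage/v1/b/<bucket>/o` with `uploadType=media`).

```python
import urllib.parse
import urllib.request


def build_upload_request(emulator_host, bucket, object_name, data):
    """Build a JSON API simple-upload request (hypothetical helper)."""
    # The object name travels in the query string, so it must be URL-encoded.
    query = urllib.parse.urlencode({"uploadType": "media", "name": object_name})
    url = f"{emulator_host.rstrip('/')}/upload/storage/v1/b/{bucket}/o?{query}"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def upload_journal(emulator_host, bucket, object_name, journal_path):
    """Upload one journal.zst to the emulator and return the HTTP status."""
    # Reads the whole file into memory: fine for a test bucket,
    # worth rethinking for very large journals.
    with open(journal_path, "rb") as f:
        request = build_upload_request(emulator_host, bucket, object_name, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the emulator running, something like `upload_journal("http://localhost:4443", "gcs-frozen-test-bucket", "oyku_test_gcs/some_bucket/rawdata/journal.zst", "/path/to/journal.zst")` would push a single file. No credentials are involved, which is exactly why this path only makes sense locally.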
I normally write extensive walkthroughs, but this time it’s going to be short, partly because the steps are indeed very straightforward and partly because I’ve already written three posts in this series and my fingers hurt.
fake-gcs-server is the GCS equivalent of LocalStack: an easy-to-run Docker container that imitates the GCS API in your local environment. Six steps, let’s do it.
```bash
docker run -d --name fake-gcs-server -p 4443:4443 \
  fsouza/fake-gcs-server -scheme http -port 4443
```
```bash
python3 -c "
from google.cloud import storage
import google.auth.credentials
client = storage.Client(
    project='test-project',
    credentials=google.auth.credentials.AnonymousCredentials(),
    client_options={'api_endpoint': 'http://localhost:4443'},
)
client.create_bucket('gcs-frozen-test-bucket')
print('Bucket created:', list(client.list_buckets()))
"
```
Configure coldToGCS.py
Two lines. You can do this.
```python
GCS_BUCKET = "gs://gcs-frozen-test-bucket"
EMULATOR_HOST = "http://localhost:4443"
```
indexes.conf
```ini
[oyku_test_gcs]
coldPath = $SPLUNK_DB/oyku_test_gcs/colddb
homePath = $SPLUNK_DB/oyku_test_gcs/db
thawedPath = $SPLUNK_DB/oyku_test_gcs/thaweddb
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/etc/apps/org_frozen_buckets_to_cloud_app/bin/coldToGCS.py"
frozenTimePeriodInSecs = 10
maxHotSpanSecs = 10
maxHotBuckets = 1
maxWarmDBCount = 1
```
With frozenTimePeriodInSecs = 10, Splunk will not make you wait long for a bucket to freeze, but you do have to restart it for the new settings to take effect, so do that. Then roll the hot buckets:
```bash
bin/splunk _internal call /data/indexes/oyku_test_gcs/roll-hot-buckets
```
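Once a bucket reaches the frozen stage, Splunk runs the coldToFrozenScript with the bucket directory as its argument and expects exit code 0 before it removes the bucket. The sketch below shows what the skeleton of such a hook might look like. It is an illustration, not the contents of coldToGCS.py: `object_name_for` and `main` are hypothetical names, and deriving the index name from the directory two levels above the bucket is an assumption based on the `<index>/colddb/<bucket>` layout in the stanza above.

```python
import os
import sys


def object_name_for(bucket_dir):
    """Map .../<index>/colddb/<bucket> to <index>/<bucket>/rawdata/journal.zst."""
    bucket_dir = os.path.normpath(bucket_dir)
    bucket_name = os.path.basename(bucket_dir)
    # Assumption: the index directory sits two levels above the bucket.
    index_name = os.path.basename(os.path.dirname(os.path.dirname(bucket_dir)))
    return f"{index_name}/{bucket_name}/rawdata/journal.zst"


def main(argv):
    """Return 0 if the bucket could be archived, non-zero otherwise."""
    if len(argv) != 2:
        print("usage: <script> <bucket_dir>", file=sys.stderr)
        return 1
    bucket_dir = argv[1]
    journal = os.path.join(bucket_dir, "rawdata", "journal.zst")
    if not os.path.isfile(journal):
        print(f"journal not found: {journal}", file=sys.stderr)
        return 1
    # A real hook would upload here; this sketch only reports the plan.
    print(f"would archive {journal} as {object_name_for(bucket_dir)}")
    return 0
```

A real script would finish with `sys.exit(main(sys.argv))`. Returning non-zero on any failure matters, because a 0 tells Splunk the bucket is safe to delete.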
```bash
python3 -c "
from google.cloud import storage
import google.auth.credentials
client = storage.Client(
    project='test-project',
    credentials=google.auth.credentials.AnonymousCredentials(),
    client_options={'api_endpoint': 'http://localhost:4443'},
)
for blob in client.list_blobs('gcs-frozen-test-bucket'):
    print(blob.name)
"
```
If you’ve done all this correctly, you should see:
```text
oyku_test_gcs/db_1781627016_1781627016_3/rawdata/journal.zst
```
There is your frozen bucket, sitting safely in a fake cloud on your computer. Progress.
Figure 2: The flow in three parts. Top: fake-gcs-server is started with Docker on port 4443. Middle: the Python one-liner creates the test bucket and confirms that it exists. Bottom: after Splunk rolls the hot bucket, the check against the GCS emulator shows the archived file under oyku_test_gcs/db_1781627016_1781627016_3/rawdata/journal.zst.
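If you would rather not install the SDK just to verify the result, the same check can be sketched with the standard library against the JSON API’s object listing endpoint (`GET /storage/v1/b/<bucket>/o`). The helper names `object_names` and `list_objects` are hypothetical, and the sketch ignores pagination, which is an acceptable shortcut for a test bucket holding a handful of files.

```python
import json
import urllib.request


def object_names(list_response):
    """Extract object names from a parsed objects.list response."""
    # An empty bucket returns no "items" key at all.
    return [item["name"] for item in list_response.get("items", [])]


def list_objects(emulator_host, bucket):
    """Fetch and return the object names in a bucket on the emulator."""
    url = f"{emulator_host.rstrip('/')}/storage/v1/b/{bucket}/o"
    with urllib.request.urlopen(url) as response:
        return object_names(json.load(response))
```

Calling `list_objects("http://localhost:4443", "gcs-frozen-test-bucket")` with the emulator running should return the same object names as the SDK one-liner.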
Does it work on your machine? Awesome! Now try it in production. The difference from the local setup above is smaller than you might think.
```bash
brew install google-cloud-sdk          # macOS
# sudo apt install google-cloud-sdk   # Debian/Ubuntu
gcloud init
gcloud auth login
```
Or use a service account (grownup way):
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account.json"
```
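Since the hook runs from Splunk rather than from your shell, it can be worth checking that the variable is actually visible and points at a usable key before the first bucket freezes. Here is a small preflight sketch, assuming a standard service account key file (those carry `"type": "service_account"`). The function name `credential_problems` is made up for this example.

```python
import json
import os


def credential_problems(environ=os.environ):
    """Return a list of problems with the service account setup (empty if OK)."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"key file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except (OSError, ValueError) as exc:
        return [f"key file is not readable JSON: {exc}"]
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return ["key file does not look like a service account key"]
    return []
```

An empty list means the basics are in place. It says nothing about whether the account has the right permissions on the bucket.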
Create the bucket
```bash
gsutil mb gs://your-bucket-name
```
Provide Permissions
The minimum roles required are Storage Object Creator and Storage Object Viewer. Don’t skip this step, or you’ll regret it when you have to come back and do it later.
coldToGCS.py
```python
GCS_BUCKET = "gs://your-bucket-name"
EMULATOR_HOST = ""
```
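To show how a single variable can flip the behaviour, here is a rough sketch of the decision. The local branch comes from the description at the top of the post. The production branch is an assumption: since the script imports subprocess and the setup installs the Cloud SDK, the sketch guesses at a `gsutil cp` call, which may not be what coldToGCS.py actually does. All three function names are hypothetical.

```python
def choose_mode(emulator_host):
    """An empty EMULATOR_HOST means production, anything else means local."""
    return "emulator" if emulator_host.strip() else "production"


def parse_bucket(gcs_bucket):
    """Split 'gs://bucket/prefix' into ('bucket', 'prefix')."""
    if not gcs_bucket.startswith("gs://"):
        raise ValueError(f"expected a gs:// path, got: {gcs_bucket}")
    bucket, _, prefix = gcs_bucket[len("gs://"):].partition("/")
    return bucket, prefix.strip("/")


def gsutil_command(journal_path, gcs_bucket, object_name):
    """Assumed production upload: build the argument list for 'gsutil cp'."""
    destination = f"{gcs_bucket.rstrip('/')}/{object_name}"
    return ["gsutil", "cp", journal_path, destination]
```

In production mode the list from `gsutil_command` could be handed to `subprocess.run(..., check=True)`. In emulator mode, `parse_bucket` gives the bare bucket name that the JSON API endpoint needs.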
indexes.conf
Just restart Splunk.
That’s All Folks! 🎉
And that’s the GCS flavor: same Splunk hook, same lifecycle, different cloud. Got your infrastructure running on GCP? You now have an easy way to test archiving locally with fake-gcs-server, and switching to production comes down to clearing a single variable.
There’s no restore guide for this one (as always, see the S3 post if you need any help), but your frozen data will be there when you want it.
Any questions or comments? Contact me on LinkedIn!
Until next time! ☁️