Logging to your database is worth it

By Daniel Samson · 2026-04-23

I nearly wrote this as "never log to your database". Then I remembered I'm the person who argues you can do almost everything in one relational database — and that logging to the database is one of the more useful things I do. So let me walk the absolutist version back. The idea is sound. The thing that took production down for fourteen hours wasn't the logging; it was the missing retention policy.

Why I log to the database in the first place

My request pipeline is genuinely complex — many stages, async jobs, conversions, retries, work fanning out across queues and workers. Following a single request's journey through log files, or grepping a log aggregator, is miserable when the flow itself is the hard part to see. Structured rows in the database — with feature, request_id, job_uuid, movie_id columns — mean I can query the flow instead of reconstructing it by hand.
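Here is a minimal sketch of such a table and a per-request query. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs anywhere. The four columns feature, request_id, job_uuid and movie_id come from the text above. The table name app_log, the ts, stage, level and message columns, and the write_log and request_timeline helpers are all hypothetical.

```python
import sqlite3

# Hypothetical schema. feature, request_id, job_uuid and movie_id come from the post;
# ts, stage, level and message are illustrative assumptions.
SCHEMA = """
CREATE TABLE app_log (
    id         INTEGER PRIMARY KEY,
    ts         TEXT NOT NULL,      -- ISO-8601 timestamp
    feature    TEXT NOT NULL,
    stage      TEXT NOT NULL,
    level      TEXT NOT NULL,      -- e.g. 'info', 'error'
    request_id TEXT NOT NULL,
    job_uuid   TEXT,
    movie_id   INTEGER,
    message    TEXT NOT NULL
)
"""

def open_log_db() -> sqlite3.Connection:
    """In-memory database with the log table created."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def write_log(conn, ts, feature, stage, level, request_id, message,
              job_uuid=None, movie_id=None):
    """Insert one structured log row."""
    conn.execute(
        "INSERT INTO app_log (ts, feature, stage, level, request_id, job_uuid, movie_id, message) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (ts, feature, stage, level, request_id, job_uuid, movie_id, message),
    )

def request_timeline(conn, request_id):
    """Every row for one request, oldest first: the flow as a query."""
    return conn.execute(
        "SELECT ts, stage, level, job_uuid, message FROM app_log "
        "WHERE request_id = ? ORDER BY ts, id",
        (request_id,),
    ).fetchall()

if __name__ == "__main__":
    conn = open_log_db()
    write_log(conn, "2026-04-01T10:00:00", "import", "ingest", "info", "req-1", "received", movie_id=42)
    write_log(conn, "2026-04-01T10:00:05", "import", "convert", "info", "req-1", "job queued",
              job_uuid="job-a", movie_id=42)
    write_log(conn, "2026-04-01T10:00:03", "import", "ingest", "info", "req-2", "received")
    for row in request_timeline(conn, "req-1"):
        print(row)
```

The useful part is the last function. Following a request becomes `WHERE request_id = ? ORDER BY ts` instead of a grep across files, and the same filter works for job_uuid or movie_id.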

The payoff: the whole pipeline in one click

Because the logs are structured relational rows rather than opaque text lines, I built a tool on top of them. One click gives me an overview of errors across the entire request pipeline, grouped by stage, and from there I can drill straight into a single request's full timeline. The pipeline's complexity is exactly what makes its flow hard to follow, and putting the logs in the database is what let me make that flow visible. With logs in a flat file I'd be grepping and guessing. With logs in the database it's a JOIN and a dashboard.
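The error overview can be sketched as a single GROUP BY. This is an illustration under assumptions, not the tool described above. sqlite3 stands in for Postgres, and the app_log table with its stage, level and request_id columns, along with the error_overview helper, is hypothetical.

```python
import sqlite3

# Hypothetical minimal log table; sqlite3 stands in for Postgres here.
SCHEMA = """
CREATE TABLE app_log (
    id         INTEGER PRIMARY KEY,
    stage      TEXT NOT NULL,
    level      TEXT NOT NULL,
    request_id TEXT NOT NULL,
    message    TEXT NOT NULL
)
"""

def error_overview(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Error count per pipeline stage, plus how many distinct requests each stage affected."""
    return conn.execute(
        """
        SELECT stage,
               COUNT(*)                   AS errors,
               COUNT(DISTINCT request_id) AS affected_requests
        FROM app_log
        WHERE level = 'error'
        GROUP BY stage
        ORDER BY errors DESC, stage
        """
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO app_log (stage, level, request_id, message) VALUES (?, ?, ?, ?)",
        [
            ("convert", "error", "req-1", "conversion failed"),
            ("convert", "error", "req-1", "retry failed"),
            ("convert", "error", "req-2", "conversion failed"),
            ("ingest", "error", "req-3", "upload rejected"),
            ("ingest", "info", "req-1", "received"),
        ],
    )
    for stage, errors, affected in error_overview(conn):
        print(f"{stage}: {errors} errors across {affected} requests")
```

Each row of this result is the natural entry point for a drill-down: filter app_log by that stage, then by one of its request_id values.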

This is the "one database" argument, applied

It's the same case I make for using Postgres as your cache, your queue, and your event log: one engine, one query language, one place to look. Logs as queryable relational data are no different. When log rows live right next to the records they refer to, you can JOIN them directly, and that kind of observability is far more awkward to build on a separate logging stack. I stand by all of that.

So what actually bit me?

Not the logging. The lack of retention. The table grew to 44 million rows and 9.2 GB — ten indexes, append-only, no pruning, no partitioning — and quietly filled the production database's disk over thirty days until everything fell over. A great idea with a missing operational guardrail is still a great idea; it just needs the guardrail.
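To get a sense of scale, here is what those figures work out to. The sketch assumes growth was roughly linear over the thirty days and that 1 GB means 10^9 bytes. The result is about 1.47 million rows and about 307 MB a day, or roughly 209 bytes per row (indexes included, if the 9.2 GB figure counts them). The days_until_threshold helper and the 100 GB disk in the usage lines are hypothetical. They only show how much warning a utilisation alert would give at that rate.

```python
GB = 10**9  # assumption: decimal gigabytes

def daily_growth(total: float, days: int) -> float:
    """Average growth per day, assuming roughly linear growth."""
    return total / days

def days_until_threshold(used_bytes: float, capacity_bytes: float,
                         bytes_per_day: float, threshold: float = 0.70) -> float:
    """Days before disk utilisation crosses `threshold` at a constant growth rate (0 if already past)."""
    headroom = threshold * capacity_bytes - used_bytes
    return max(headroom, 0) / bytes_per_day

rows_per_day = daily_growth(44_000_000, 30)   # about 1.47 million rows/day
bytes_per_day = daily_growth(9.2 * GB, 30)    # about 307 MB/day
bytes_per_row = 9.2 * GB / 44_000_000         # about 209 bytes/row

if __name__ == "__main__":
    print(f"{rows_per_day:,.0f} rows/day, {bytes_per_day / 10**6:,.0f} MB/day, {bytes_per_row:.0f} bytes/row")
    # Hypothetical disk: 100 GB capacity with 40 GB already used.
    print(f"{days_until_threshold(40 * GB, 100 * GB, bytes_per_day):.0f} days until 70%")
```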

How to keep the upside without the outage

  • Add a retention job from day one — prune rows older than N days on a schedule.

  • Partition by day and drop old partitions. Dropping a partition is near-instant, whereas a giant DELETE is its own incident.

  • Index only the columns your tooling actually queries — every index is a tax on every insert.

  • Alert on table growth and disk utilisation, so a slow leak surfaces at 70% rather than at 100%.

  • If log volume is genuinely high, put the table on its own disk volume so a logging spike can't starve your application data.
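The partitioning and retention advice can be sketched as a small maintenance step that runs daily, for example from cron or from the pg_cron extension if you use it. The step makes sure today's and tomorrow's partitions exist and drops any partition older than the retention window. This Python sketch only generates the Postgres statements and does not run them. The parent table name app_log, its created_at range key, the helper names, and the 14-day retention in the usage lines are assumptions.

```python
from datetime import date, timedelta

# Assumed parent table (hypothetical), created once:
#   CREATE TABLE app_log (..., created_at timestamptz NOT NULL)
#       PARTITION BY RANGE (created_at);

def partition_name(parent: str, day: date) -> str:
    return f"{parent}_{day:%Y%m%d}"

def create_partition_sql(parent: str, day: date) -> str:
    """Range partition covering [day, day + 1)."""
    nxt = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(parent, day)} "
        f"PARTITION OF {parent} "
        f"FOR VALUES FROM ('{day.isoformat()}') TO ('{nxt.isoformat()}');"
    )

def maintenance_sql(parent: str, today: date, retention_days: int, existing_days) -> list[str]:
    """Statements for one daily run: pre-create upcoming partitions, drop expired ones."""
    # Create today's and tomorrow's partitions so inserts never hit a missing range.
    stmts = [create_partition_sql(parent, today + timedelta(days=i)) for i in range(2)]
    cutoff = today - timedelta(days=retention_days)
    for day in sorted(existing_days):
        if day < cutoff:
            # Dropping a whole partition avoids a huge DELETE and the dead rows it leaves for VACUUM.
            stmts.append(f"DROP TABLE IF EXISTS {partition_name(parent, day)};")
    return stmts

if __name__ == "__main__":
    existing = [date(2026, 4, 1) + timedelta(days=i) for i in range(22)]
    for stmt in maintenance_sql("app_log", date(2026, 4, 23), 14, existing):
        print(stmt)
```

A partition is a table in its own right, so retention becomes a DROP of that table rather than a row-by-row delete. This is why the step stays cheap no matter how many rows a day holds.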

The balance

Log to your database when the queryability buys you something real — like being able to see your whole request pipeline, and every error in it, in a single click. Just don't log to it the way I first did: unbounded, unmonitored, on the same disk as everything else. The lesson was never "don't". It was "with retention".