Building a time-series database from scratch taught me more about distributed systems than any course.
Why build a TSDB?
Time-series data has unique properties: append-only writes, time-range queries, high compression potential. Building Helios forced me to understand these deeply.
LSM-tree storage
The Log-Structured Merge-tree is write-optimized. Writes go to an in-memory buffer (MemTable), which periodically flushes to immutable sorted files (SSTables). Background compaction merges SSTables.
Raft consensus
Implementing Raft from the paper was the hardest part. The paper is clear, but the edge cases are subtle. Leader election, log replication, and membership changes each have gotchas.
Gorilla compression
Time-series data compresses well. Timestamps are usually regular, so delta-of-delta encoding works. Values often repeat or change slightly, so XOR encoding helps. Helios achieves 12x compression.