Jun 2026
GPU Systems
K8s was designed for stateless microservices, not 8-GPU nodes with NVLink topologies. Here's what we built instead — and why bare-metal scheduling outperforms container orchestration by 40% on inference throughput.
May 2026
Compiler
How we achieved a 2.3x speedup on attention layers by fusing softmax, masking, and dropout into a single CUDA kernel. A walkthrough of our MLIR-based fusion pass.
Apr 2026
Distributed Systems
When you have 10,000 AI agents coordinating across regions, strong consistency is a luxury you can't afford. We use operation-based CRDTs for state management — here's the architecture.
Mar 2026
Infrastructure
We trained a lightweight model to predict thermal throttling 90 seconds before it happens. By preemptively migrating workloads, we eliminated 99.7% of thermally induced performance drops.
Feb 2026
Performance
From network stack bypass (DPDK) to custom memory allocators: the full story of how we brought P99 latency for LLM token generation down from 12 ms to 1.8 ms.
Jan 2026
Launch
Why we started MetalBear, what problems we're solving, and our vision for the future of AI infrastructure. The manifesto for building things that never break.