
Engineering Notes

Deep dives into the systems we build. No fluff, no marketing — just technical writing from the team in the trenches.

Jun 2026 GPU Systems

Why We Ditched Kubernetes for GPU Workloads

K8s was designed for stateless microservices, not 8-GPU nodes with NVLink topologies. Here's what we built instead, and why bare-metal scheduling delivers 40% higher inference throughput than container orchestration.

Read article →

May 2026 Compiler

Kernel Fusion Strategies for Transformer Inference

How we achieved a 2.3x speedup on attention layers by fusing softmax, masking, and dropout into a single CUDA kernel. A walkthrough of our MLIR-based fusion pass.

Read article →

Apr 2026 Distributed Systems

CRDTs for Agent State: Eventual Consistency at Scale

When you have 10,000 AI agents coordinating across regions, strong consistency is a luxury you can't afford. We use operation-based CRDTs for state management — here's the architecture.

Read article →

Mar 2026 Infrastructure

Predictive Thermal Management: ML for Our Own Hardware

We trained a lightweight model to predict thermal throttling 90 seconds before it happens. By preemptively migrating workloads, we eliminated 99.7% of thermal-induced performance drops.

Read article →

Feb 2026 Performance

Sub-2ms P99: How We Optimized Our Inference Pipeline

From kernel-bypass networking (DPDK) to custom memory allocators: the full story of how we cut P99 latency for LLM token generation from 12ms to 1.8ms.

Read article →

Jan 2026 Launch

Hello, World. We're MetalBear.

Why we started MetalBear, what problems we're solving, and our vision for the future of AI infrastructure. The manifesto for building things that never break.

Read article →
