Engineering Blog

Technical deep dives from the team building AI-data infrastructure. MCP, Node.js, distributed systems, and lessons from production.

DuckDB Over Parquet via MCP: How We Query 175.8 Million Rows in 627ms

PostgreSQL choked on a 175.8-million-row taxi dataset: 67 GB of CSV that took 18 hours just to COPY in. We converted it to 7.7 GB of Parquet, pointed DuckDB at it, and built a query router that picks the right engine per dataset. Here are the 4 gotchas that almost derailed it, from lazy VIEWs to MCP transport serialization.

Why You Can't Timeout SSE Streams in Node.js (and the Combined Pattern That Finally Works)

After 7 iterations we discovered that AbortSignal, Promise.race, and every other standard timeout mechanism fail on SSE streams in Node.js. Event loop scheduling is the culprit: I/O callbacks starve timer callbacks when keepalives flow, and kernel-buffered TCP chunks are still delivered after the socket is destroyed. No single fix covers all failure modes. Here's the combined pattern that finally works.

serverExternalPackages: How One Config Line Cut Our Next.js Build from 30 GB to 1.16 GB

Our Next.js app had never had a production build; it had run next dev in production since v0.10.x. When we finally tried next build, it was OOM-killed at 28 GB. 10 attempts and one config entry later, we achieved a 26x RSS reduction. Here's the full journey, every failed attempt, and the fix.