Thoughts on code, artificial intelligence, and the future of tech.
How to reduce latency and cost when deploying LLMs using quantization and efficient inference servers.
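To make the topic concrete, here is a minimal sketch of the kind of setup the post describes: serving a quantized model behind an efficient inference engine, using vLLM with an AWQ-quantized checkpoint. The model name, memory fraction, and sampling settings are illustrative assumptions, not recommendations from the post.

```python
from vllm import LLM, SamplingParams

# Load a pre-quantized (AWQ, 4-bit weight) checkpoint into the vLLM engine.
# The checkpoint name below is only an example of an AWQ-quantized model.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    quantization="awq",           # tell vLLM the weights use AWQ quantization
    gpu_memory_utilization=0.90,  # fraction of GPU memory for weights + KV cache
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM batches requests continuously, which is where most of the
# throughput (and therefore cost) improvement over naive decoding comes from.
outputs = llm.generate(
    ["Summarize the benefits of quantization in one sentence."],
    sampling,
)

for out in outputs:
    print(out.outputs[0].text)
```

Quantizing weights to 4 bits shrinks memory footprint and speeds up memory-bound decoding, while the inference server's batching and paged KV cache keep the GPU busy across many concurrent requests.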