JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
Repositories
A simple, performant, and scalable JAX LLM!