Amazon’s cloud unit said it is partnering with Cerebras Systems to offer what it claims will be the fastest AI inference available on its Bedrock service in the coming months. The setup splits inference into two stages: prompt “prefill” on AWS’s Trainium chips and token “decode” on Cerebras’s CS-3 systems, linked by Amazon’s high-speed Elastic Fabric Adapter. The companies say this design targets the biggest latency bottleneck in modern LLMs.

AWS is the first cloud provider to host Cerebras’s disaggregated inference solution and plans to add leading open-source models and Amazon Nova on Cerebras hardware later this year. The move underscores intensifying competition among cloud providers and chipmakers to speed up inference for applications such as real-time coding; AWS cites broader Trainium adoption by Anthropic and OpenAI, though pricing and exact performance figures were not disclosed. As with many AI rollouts, results will hinge on real-world workloads and the economics of scaling specialized hardware in AWS data centers.
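The idea behind the disaggregation is simple: the prompt pass is one large, compute-dense batch, while token generation is a sequential, cache-hungry loop, so the two stages suit different hardware. The sketch below is a minimal, illustrative rendering of that split in plain Python; the function names, the toy KVCache class, and the string-based stand-in for a model are assumptions for exposition only and do not correspond to any AWS, Bedrock, Trainium, or Cerebras API.

# Toy sketch of disaggregated LLM inference: a "prefill" stage processes the
# full prompt once and produces a KV cache, which is handed to a separate
# "decode" stage that generates tokens one at a time. All names here are
# illustrative, not real AWS or Cerebras interfaces.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Per-token key/value state produced by prefill and extended by decode."""
    entries: list[str] = field(default_factory=list)


def prefill(prompt_tokens: list[str]) -> KVCache:
    """Compute attention state for the whole prompt in one parallel pass.

    In the architecture described above, this compute-dense step is the part
    AWS says runs on Trainium."""
    return KVCache(entries=[f"kv({tok})" for tok in prompt_tokens])


def transfer(cache: KVCache) -> KVCache:
    """Stand-in for moving the KV cache across the interconnect (EFA in the
    announcement). Here it is just a copy."""
    return KVCache(entries=list(cache.entries))


def decode(cache: KVCache, max_new_tokens: int) -> list[str]:
    """Generate tokens one at a time, each step reading the full cache.

    This sequential, memory-bound loop is the part the companies say moves to
    Cerebras hardware."""
    generated = []
    for step in range(max_new_tokens):
        token = f"tok{step}"              # a real model would sample from logits
        cache.entries.append(f"kv({token})")
        generated.append(token)
    return generated


if __name__ == "__main__":
    prompt = ["Explain", "disaggregated", "inference"]
    cache = prefill(prompt)               # stage 1: prompt processing
    cache = transfer(cache)               # handoff between accelerators
    print(decode(cache, max_new_tokens=5))  # stage 2: token generation

In a real deployment the handoff is the sensitive step, since the cached attention state for a long prompt can be large, which is presumably why the companies emphasize the high-speed interconnect between the two stages.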
Related articles:
Amazon Bedrock is now generally available
Elastic Fabric Adapter (EFA)