api.cerebras.ai

Item: api.cerebras.ai
Rating: 4.1
Author: FreeAIRouter

免费大模型额度 · 免费模型 · 社区口碑 · 风险评估

网站类型官方站(免费层)

状态可用

平台openai-compatible

口碑分82 / 100

风险等级低

置信度85%

官网https://cloud.cerebras.ai ↗

发现来源mnfst/awesome-free-llm-apis

网站介绍

Cerebras Inference 是芯片厂商 Cerebras Systems 基于自研晶圆级 CS-3 硬件提供的超高速 LLM 推理服务，主营按量付费 API/云服务，并附带一个无需信用卡、长期有效的免费开发者层（每日 100 万 token）。

免费额度

免费开发者层：每日 100 万 token，30 RPM 限速，免信用卡、即时开通、长期有效（不过期）；免费层上下文窗口暂限 8,192 token，超额返回 429 直到 UTC 00:00 重置。

免费模型

Llama 3.3 70BLlama 4 ScoutQwen3 32BQwen3 235BGPT-OSS 120B (OpenAI 开源)DeepSeek R1 Distill (轮换)

优点

正规上市芯片厂商（Cerebras Systems）官方端点，存在性与合规性确定，无盗 key/跑路类中转风险
每日 100 万 token 免费额度是同类中最慷慨的之一，长期有效、免信用卡、即时开通无需等待
晶圆级硬件推理极快（Llama 4 Scout 实测约 2600 token/秒，部分模型 2000+ token/秒），适合速度敏感场景
提供 Llama 3.3 70B、Qwen3 235B、GPT-OSS 120B 等较大开源模型，足以做原型与小型工具

缺点

免费层限速偏严（30 RPM），因推理太快，工具易突发超限触发 429 失败（claude-code-router 等已报告）
免费层上下文暂限 8,192 token，长上下文任务受限
免费模型均为第三方开源模型（Llama/Qwen/GPT-OSS），非自研大模型，不含闭源前沿模型
超出每日额度后须等到 UTC 00:00 重置或升级付费

风险点

免费层限速 30 RPM 偏紧，高频/Agent 突发请求易触发 429 失败
免费层上下文暂限 8,192 token，长文任务受限
免费额度与限速由厂商单方设定，未来可能调整（如曾临时收紧上下文）
仅提供第三方开源模型，无闭源前沿模型

社区口碑综述

Cerebras 是上市晶圆级 AI 芯片厂商的官方推理服务，口碑整体正面：免费层被多家攻略站评为 2026 年最慷慨之一（每日 100 万 token、免卡、即时、不过期），速度极快。主要负面集中在免费层 30 RPM 限速偏紧——由于推理过快，Agent 类工具容易突发超限触发 429（GitHub claude-code-router issue 等有反馈），以及免费层 8K 上下文上限。属正规第一方免费试用层，无中转站式风险。

使用建议

对速度敏感的原型、小型内部工具与试点项目，Cerebras 免费层是当前性价比与体验俱佳的选择，正规可放心试用。但需注意 30 RPM 限速与 8K 上下文上限：高并发或 Agent 场景应做好限流与 429 退避；长上下文任务需评估付费层或其他方案。

社区提及 (5)

Adam Holter (blog) · positive

Cerebras opens a free 1M tokens/day inference tier; real speed benchmarks show ~2600 tokens/sec on Llama 4 Scout. One million tokens per day is enough for serious prototyping, small internal tools, or a pilot with real users. ↗

Awesome Agents - Free AI API Guide 2026 · positive

Cerebras' free tier is the rare one that's truly instant: no card, no waitlist, and enough daily tokens to build something real. Recommended alongside Groq as best for speed-critical applications. ↗

GitHub - claude-code-router issue #1178 · negative

Users report Cerebras' low rate limits cause failures; because inference is so fast, tools send rapid bursts that exceed the 30 RPM / RPS limits and requests fail with 429 instead of falling back. ↗

Cerebras Inference docs - Rate Limits · neutral

Free tier: 30 requests/minute, 1M tokens/day, context window temporarily limited to 8,192 tokens across all models; exceeding the daily quota returns 429 until UTC 00:00 reset. ↗

GitHub - claude-code-router issue #1178 · negative

Users report Cerebras' low rate limits cause failures; because inference is so fast, tools send rapid bursts that exceed the 30 RPM limit and requests fail with 429 instead of falling back. ↗

参考来源

In English

Summary: Cerebras Inference is the ultra-fast LLM inference service from chipmaker Cerebras Systems on its in-house wafer-scale CS-3 hardware; primarily pay-as-you-go API/cloud, with a no-credit-card, long-lived free developer tier (1M tokens/day).

Free quota: Free developer tier: 1,000,000 tokens/day, 30 RPM limit, no credit card, instant activation, ongoing (does not expire); free-tier context window temporarily capped at 8,192 tokens, with 429 returned over quota until the UTC 00:00 reset.