api.cerebras.ai

Free LLM quotas · Free models · Community reputation · Risk assessment

Site type: official site (free tier)
Status: available
Platform: openai-compatible
Reputation score: 82 / 100
Risk level:
Confidence: 85%
Discovered via: mnfst/awesome-free-llm-apis

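Site overview

Cerebras Inference is an ultra-fast LLM inference service from chipmaker Cerebras Systems, running on its in-house wafer-scale CS-3 hardware. Its core business is a pay-as-you-go API and cloud service, and it also offers a long-lived free developer tier (1 million tokens per day) that requires no credit card.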

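Free quota

Free developer tier: 1 million tokens per day, rate-limited to 30 requests per minute (RPM), with no credit card required, instant activation, and no expiry. The free-tier context window is temporarily capped at 8,192 tokens. Requests over the daily quota return HTTP 429 until the reset at 00:00 UTC.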
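To make these limits concrete, here is a minimal Python sketch of the arithmetic. It assumes that every token in a request counts against the daily quota and that the quota resets exactly at 00:00 UTC; the function names are illustrative and not part of any Cerebras SDK.

```python
from datetime import datetime, timedelta, timezone

DAILY_TOKENS = 1_000_000   # free-tier daily quota stated on this page
RPM_LIMIT = 30             # free-tier requests per minute stated on this page
CONTEXT_LIMIT = 8_192      # free-tier context window stated on this page


def seconds_until_utc_reset(now: datetime) -> float:
    """Seconds from a timezone-aware `now` until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def max_requests_per_day(tokens_per_request: int) -> int:
    """Requests per day permitted by both the token quota and the RPM limit.

    Assumption: all tokens of a request count toward the daily quota.
    """
    by_tokens = DAILY_TOKENS // tokens_per_request
    by_rpm = RPM_LIMIT * 60 * 24  # 43,200 requests if sent at the full rate all day
    return min(by_tokens, by_rpm)
```

Under these assumptions, requests that fill the whole 8,192-token window exhaust the daily quota after 122 calls, so for large prompts the token quota binds long before the 30 RPM limit does.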

Free models

Llama 3.3 70B
Llama 4 Scout
Qwen3 32B
Qwen3 235B
GPT-OSS 120B (OpenAI open-weight)
DeepSeek R1 Distill (rotating)

Pros

Cons

Risks

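Community reputation summary

Cerebras Inference is the official inference service of a publicly listed wafer-scale AI chipmaker, and its reputation is positive overall. Several guide sites rate its free tier among the most generous of 2026 (1 million tokens per day, no card, instant access, no expiry), and inference is extremely fast. The main complaints concern the tight 30 RPM limit on the free tier: because inference is so fast, agent-style tools tend to send bursts that exceed the limit and trigger 429 errors (reported in a GitHub claude-code-router issue, among others). The 8K context cap on the free tier is the other common complaint. This is a legitimate first-party free tier and carries none of the risks associated with third-party relay (reseller) sites.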

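Usage recommendations

For speed-sensitive prototypes, small internal tools, and pilot projects, the Cerebras free tier currently offers a strong combination of value and experience, and as a legitimate first-party service it can be trialed with confidence. Keep the 30 RPM limit and the 8K context cap in mind: high-concurrency or agent workloads should throttle requests and back off on 429 responses, and long-context tasks should be evaluated against the paid tier or other providers.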
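One way to follow the throttling and backoff advice is sketched below in Python. It is a minimal illustration, not an official client: `MinIntervalLimiter`, `call_with_backoff`, and the `send` callable (assumed to return a `(status, body)` tuple) are hypothetical names, and the retry parameters are arbitrary defaults. Backoff only helps with per-minute bursts; a 429 caused by the daily quota will persist until the 00:00 UTC reset.

```python
import random
import time


class MinIntervalLimiter:
    """Spaces calls at least 60/rpm seconds apart, so no bursts are sent."""

    def __init__(self, rpm=30, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.interval


def call_with_backoff(send, limiter, max_retries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call send() and retry on HTTP 429 with exponential backoff plus jitter.

    `send` is a hypothetical zero-argument callable returning (status, body).
    Returns the last (status, body) seen, which may still be a 429.
    """
    for attempt in range(max_retries + 1):
        limiter.wait()
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status, body
```

A caller would create one shared `MinIntervalLimiter(rpm=30)` per API key and route every request through `call_with_backoff`. Spacing requests evenly trades some latency for predictability, which suits agent tools that would otherwise fire several calls within the same second.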

Latest probe

Reachable: yes (HTTP 200)
Latency: 1005 ms
Platform: openai-compatible
Models endpoint: requires authentication
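Because the platform is listed as openai-compatible and the models endpoint requires authentication, a model listing request would presumably look like the Python sketch below. The `/v1/models` path, the Bearer-token header, and the `data[].id` response shape are assumptions taken from the OpenAI API convention rather than from this page, and the function names are illustrative.

```python
import json
import urllib.request

# Assumption: host from this page plus the conventional OpenAI-style /v1 prefix.
BASE_URL = "https://api.cerebras.ai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for the model list."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_model_ids(api_key: str, timeout: float = 10.0) -> list[str]:
    """Send the request and return model ids.

    Assumes an OpenAI-style payload of the form {"data": [{"id": ...}, ...]}.
    """
    with urllib.request.urlopen(build_models_request(api_key), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]
```

Without a valid key, `urllib.request.urlopen` would be expected to raise `urllib.error.HTTPError`, which is consistent with the probe result above.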

Community mentions (5)

Adam Holter (blog) · positive
Cerebras opens a free 1M tokens/day inference tier; real speed benchmarks show ~2600 tokens/sec on Llama 4 Scout. One million tokens per day is enough for serious prototyping, small internal tools, or a pilot with real users.
Awesome Agents - Free AI API Guide 2026 · positive
Cerebras' free tier is the rare one that's truly instant: no card, no waitlist, and enough daily tokens to build something real. Recommended alongside Groq as best for speed-critical applications.
GitHub - claude-code-router issue #1178 · negative
Users report Cerebras' low rate limits cause failures; because inference is so fast, tools send rapid bursts that exceed the 30 RPM / RPS limits and requests fail with 429 instead of falling back.
Cerebras Inference docs - Rate Limits · neutral
Free tier: 30 requests/minute, 1M tokens/day, context window temporarily limited to 8,192 tokens across all models; exceeding the daily quota returns 429 until UTC 00:00 reset.
GitHub - claude-code-router issue #1178 · negative
Users report Cerebras' low rate limits cause failures; because inference is so fast, tools send rapid bursts that exceed the 30 RPM limit and requests fail with 429 instead of falling back.

References

In English

Summary: Cerebras Inference is the ultra-fast LLM inference service from chipmaker Cerebras Systems on its in-house wafer-scale CS-3 hardware; primarily pay-as-you-go API/cloud, with a no-credit-card, long-lived free developer tier (1M tokens/day).

Free quota: Free developer tier: 1,000,000 tokens/day, 30 RPM limit, no credit card, instant activation, ongoing (does not expire); free-tier context window temporarily capped at 8,192 tokens, with 429 returned over quota until the UTC 00:00 reset.