201 models across 13 providers. 153 free · 47 vision-capable · all accessible today.
Showing 201 of 201 models
Anthropic
Claude Opus 4.8 — Anthropic's flagship, served keyless via Pollinations. No user API key needed.
OpenAI
GPT-5.5 — OpenAI's frontier reasoning model with 1M context, keyless via Pollinations.
Claude Opus 4 — Anthropic's most capable model, frontier on all benchmarks, free via GitHub PAT.
DeepSeek
DeepSeek's flagship V4 Pro — frontier reasoning, 256K context, top benchmark scores.
DeepSeek V4 Pro — flagship model, 1M context, top benchmarks.
Claude Sonnet 4.5 — latest hybrid reasoning Sonnet, fast and frontier, free via GitHub PAT.
o3 — OpenAI's most powerful reasoning model, extended chain-of-thought, free via GitHub PAT.
DeepSeek V4 Pro — flagship model, free via GitHub PAT.
DeepSeek V4 Pro on NVIDIA NIM — flagship model, free tier.
DeepSeek V4 Pro on Together AI — flagship reasoning at scale.
Nvidia
NVIDIA Nemotron 3 Ultra — 253B flagship, top-tier reasoning & code, free on OpenRouter.
Google
Google's most capable Gemini yet — frontier reasoning, 1M context, free via AI Studio.
Claude 4 Sonnet — Anthropic's flagship coding & reasoning model, free via GitHub PAT.
NVIDIA Nemotron 3 Ultra — 253B flagship with elite reasoning & code, free via GitHub PAT.
Claude (large tier) — balanced Anthropic model, keyless via Pollinations.
NVIDIA Nemotron 3 Ultra — 253B flagship reasoning & code, keyless via Pollinations.
DeepSeek Pro — frontier reasoning & coding, keyless via Pollinations.
xAI
Grok 4.3 — xAI's frontier model, keyless via Pollinations.
DeepSeek R1-0528 — updated R1 with improved reasoning and fewer hallucinations.
Updated DeepSeek R1 — improved reasoning, fewer hallucinations.
Updated DeepSeek R1 on NVIDIA NIM — improved reasoning, free tier.
NVIDIA Nemotron 3 Ultra — 253B parameter flagship, best-in-class reasoning & code generation, free on NIM.
GPT-4.1 — OpenAI's flagship 2025 model, free via GitHub Models PAT (rate-limited).
o1 — frontier reasoning model with extended chain-of-thought, free via GitHub PAT.
Grok 4 — xAI's frontier model. Free monthly credits for new accounts.
DeepSeek R1 full — frontier reasoning with chain-of-thought, 671B MoE.
Full DeepSeek R1 671B MoE — frontier reasoning with chain-of-thought.
DeepSeek R1 — full 671B reasoning model, free via GitHub PAT.
Alibaba
Qwen 3.6 on NVIDIA NIM — latest Alibaba flagship, free tier.
Qwen 3.6 on Together AI — Alibaba's latest frontier model.
Full DeepSeek R1 on Together AI — 671B MoE reasoning model.
Qwen3 235B MoE — best open reasoning model at scale via Together AI.
o3-mini — OpenAI's reasoning model with visible chain-of-thought, free via GitHub.
Claude 3.5 Sonnet — proven flagship, strong on code, free via GitHub PAT.
Qwen 3.6 Plus — premium Alibaba model, 1M context, top reasoning.
Qwen3 235B MoE — largest open Qwen, frontier quality.
Meta
Llama 4 Maverick 128E MoE on Groq — 1M context, vision, frontier quality.
Llama 4 Maverick on Together AI — 128E MoE with 512K context.
Llama 4 Maverick 128E MoE — frontier capability, 1M context, free on NVIDIA NIM.
Mistral
Mistral's largest model on NVIDIA NIM — 675B parameter frontier reasoning.
Llama 4 Maverick — 1M context MoE, free via GitHub PAT.
OpenAI large tier with reasoning — keyless via Pollinations.
MiniMax
MiniMax M3 — 1M context, vision + reasoning, keyless via Pollinations.
Qwen large tier — frontier Alibaba reasoning, keyless via Pollinations.
Llama 4 Maverick 128E MoE — 1M context, keyless via Pollinations.
DeepSeek V4 — strong general reasoning and coding, 128K context.
MiniMax M3 — latest flagship, 1M context, frontier reasoning.
Llama 4 Maverick 128E MoE — 1M context, frontier quality.
o4-mini — fast reasoning model with vision, free via GitHub PAT.
GPT-4.5 Preview — creative and nuanced responses, free via GitHub PAT.
NVIDIA's largest Nemotron 3 MoE — 120B total, frontier OSS performance.
GPT-4o — proven multimodal flagship, free via GitHub PAT.
Gemini 3.5 Flash — fast multimodal Google model, 1M context, keyless via Pollinations.
OpenAI's open-source GPT 120B — frontier quality released as OSS.
Moonshot AI
Moonshot AI's Kimi K2.6 — 1T+ MoE, frontier coding & agentic tasks, free on OpenRouter.
Meta's massive 405B — frontier open model via Together AI.
OpenAI's open-source GPT 120B on NVIDIA NIM — frontier OSS quality.
GPT-5.4 Mini — fast multimodal OpenAI model, keyless via Pollinations.
DeepSeek V3-class chat & code model, keyless via Pollinations.
Kimi K2.6 — Moonshot's frontier MoE for coding & agentic tasks, keyless via Pollinations.
Grok 3 — strong general capability, real-time X data, free monthly credits.
DeepSeek Prover V2 — formal mathematics and theorem proving specialist.
DeepSeek V3.1 — refined V3 with better instruction following and coding.
DeepSeek V3.2 — fast, capable chat model at very low cost.
MiniMax M2.7 — 200K context, strong reasoning at low cost.
Qwen 3.6 Flash — 1M context, fast and cheap Alibaba flagship.
Llama 4 Scout 16E MoE on Groq LPU — 512K context, vision-capable, blazing fast.
OpenAI's open-source GPT 120B on Cerebras CS-3 — frontier quality at record speed.
NVIDIA's flagship Nemotron Super — Llama-base with elite RLHF.
Llama 4 Scout 17B-16E — 512K context MoE, free via GitHub PAT.
Llama 4 Scout 16E MoE — long-context, vision, keyless via Pollinations.
Qwen3 Coder Flash — fast code-specialised model with 1M context.
Llama 4 Scout MoE — massive 10M context, vision-capable, very affordable.
Meta's flagship 70B served via Groq — best free balance of quality and speed.
DeepSeek V4 Flash — 1M context, extremely cheap on OpenRouter.
Llama 3.3 70B Turbo on Together AI — genuinely free (no token cost).
Llama 3.3 70B FP8 quantized on Cloudflare edge — quality at edge speed.
Meta's latest 70B on NVIDIA's GPU cloud — 128K context.
DeepSeek V4 Flash on NVIDIA NIM — fast frontier reasoning, free tier.
o1-mini — faster reasoning variant, free via GitHub PAT.
Llama 3.3 70B on GitHub Models — Meta's flagship 70B, free via GitHub PAT.
Mistral Large 2 — top multilingual reasoning, free via GitHub PAT.
Qwen Coder — code-specialised Qwen, keyless via Pollinations.
Mistral Large — top multilingual reasoning, keyless via Pollinations.
DeepSeek V4 Flash — fast and free via DeepSeek's direct API.
DeepSeek V3 via direct API — proven MoE backbone for chat and code.
Kimi K2 Thinking — extended chain-of-thought reasoning variant.
NousResearch
Meta's Llama 3.1 405B fine-tuned by NousResearch — massive but free.
DeepSeek Coder V3 — code-specialised model, strong on HumanEval and SWE-bench.
Xiaomi
MiMo V2.5 Pro — Xiaomi's pro-tier, 1M context, enhanced agentic coding.
Moonshot AI's Kimi K2 on NVIDIA NIM — strong coding MoE, free tier.
Llama 70B distilled from DeepSeek R1 — fast reasoning on Groq's LPU (Qwen 32B variant was retired by Groq).
Qwen3 Next 80B MoE with 3B active — strong reasoning, free on OpenRouter.
Fast Gemini 2.5 Flash with thinking mode — best speed/quality for free.
Flagship Mistral Large 2 — top-tier multilingual reasoning.
Qwen3 Next 80B MoE on NVIDIA — 3B active, strong reasoning.
Mistral Large 2 hosted free on NVIDIA NIM — top multilingual reasoning.
GPT-4.1 mini — fast multimodal model, free via GitHub PAT.
Claude 4.5 Haiku — fast & cheap Claude tier, free via GitHub PAT.
ZAI
ZAI GLM — bilingual Chinese/English model, keyless via Pollinations.
Qwen3's code-specialized model — top OSS coding performance.
Meta's flagship Llama 3.1 70B hosted on HuggingFace Inference API.
Llama 3.1 70B with optimized Turbo serving on Together AI.
DeepSeek V3 on Together AI — frontier open reasoning model.
Battle-tested Llama 3.1 70B — proven workhorse with 128K context.
Kimi K2.5 — Moonshot AI's refined model with strong coding and reasoning.
Large multimodal Llama 90B on Groq — powerful vision + reasoning.
Meta's classic Llama 3 70B on Groq's LPU — reliable 8K context, proven quality.
Meta's Llama 3.3 70B — top OSS quality with 128K context, free on OpenRouter.
MiniMax M2.5 — 200K context, very affordable on OpenRouter.
DeepSeek R1 Qwen 32B distill on Cloudflare — edge reasoning model.
Multimodal Llama 3.2 — sees images in 128K context on NVIDIA GPUs.
Codestral — Mistral's code-specialised model, free via GitHub PAT.
Cohere
Cohere Command R+ — RAG-optimised flagship, free via GitHub PAT.
Qwen Vision — multimodal Qwen, keyless via Pollinations.
Grok 2 with vision — multimodal Grok, free monthly credits.
Qwen3 32B dense — strong reasoning at very low cost.
Tencent
Tencent Hy3 — 256K context reasoning model, very affordable.
Qwen3 32B on Groq LPU — blazing fast thinking model.
Qwen3 32B on Cerebras CS-3 — record inference speed for thinking models.
Alibaba's thinking model — surprisingly strong at reasoning for 32B.
Larger MoE with 8x22B experts — strong coding, math, and reasoning.
Mixtral 8x22B on Together AI — highest quality MoE open model.
Google's Gemma 4 31B — latest open Gemma with vision and strong reasoning.
Gemini 2.0 Flash stable release — multimodal, great free tier.
ZAI GLM-4.7 on Cerebras wafer-scale silicon — fast bilingual model.
Large Qwen 2.5 72B via HuggingFace inference API — free tier.
Qwen 2.5 72B Turbo optimized serving on Together AI.
AI21
AI21 Jamba 1.5 Large — SSM-transformer hybrid, 256K context, free via GitHub.
Grok 3 mini — efficient Grok tier, free monthly credits.
Xiaomi MiMo V2.5 — 1M context reasoning MoE, very cheap on OpenRouter.
Xiaomi MiMo 2.5 on NVIDIA NIM — reasoning MoE, free tier.
DeepSeek R1 distilled into Qwen 32B — fast reasoning on Groq LPU.
DeepSeek R1 distilled Qwen 32B on Cerebras wafer-scale — fastest reasoning.
OpenAI's open-source GPT 20B — efficient frontier quality for free.
Best coding model via Together AI — 32B Qwen Coder for serious dev work.
OpenAI's open-source GPT 20B on NVIDIA NIM — efficient frontier quality.
GPT-OSS 20B Reasoning via Pollinations — completely keyless, no signup needed.
GPT-OSS 20B via Pollinations fast lane — minimum latency, completely keyless.
Qwen3 30B MoE with 3B active — efficient reasoning, very cheap.
NVIDIA Nemotron 3 Nano Omni — reasoning-specialized MoE variant.
ZAI's GLM-4.5 Air — bilingual Chinese/English model, fast and free.
Mistral's code-first model — 80+ languages, fill-in-middle support.
HuggingFace H4
Massive 141B MoE Zephyr trained with ORPO — excellent instruction following.
GPT-4o mini — fast, capable, free via GitHub PAT.
Gemma 4 MoE variant — 26B total with only 4B active, efficient and fast.
NVIDIA Nemotron 3 Nano 30B MoE — efficient inference with 3B active parameters.
Direct from Mistral API — sparse MoE with 8 experts, 32K context.
Smallest GPT-4.1 — ultra-fast, free via GitHub PAT.
Claude 3 Haiku — fastest classic Claude, free via GitHub PAT.
Mistral Small 3 — efficient and capable, free via GitHub PAT.
Cohere Command R — efficient RAG and tools, free via GitHub PAT.
Qwen3 14B — mid-size thinking model, great speed/quality balance.
Largest open Gemma model — strong instruction following.
Microsoft
Phi-3.5 MoE — 16x3.8B sparse, free via GitHub PAT.
Multimodal Llama 3.2 11B — sees images at Groq speed.
NVIDIA Nemotron Nano 12B with vision — compact multimodal model.
Lightest Gemini 2.0 — extremely fast for high-volume tasks.
Mistral's Small — free experimental tier, balanced quality.
Llama 3.2 11B multimodal via Together AI — vision + text.
Compact NVIDIA Nemotron Nano — punches above weight at high speed.
Microsoft Phi-4 with multimodal vision on NVIDIA — compact image understanding.
Phi-4 — Microsoft's efficient 14B with strong reasoning, free via GitHub PAT.
Google's open Gemma 2 9B on Groq — strong small-model quality.
CogComp
Dolphin Mistral 24B Venice Edition — creative fine-tune, free on OpenRouter.
Mixtral fine-tuned with DPO by NousResearch — excellent creative tasks.
Nano-sized Nemotron — ultra-fast, Llama-base RLHF tuned.
AI21 Jamba 1.5 Mini — efficient Jamba variant, free via GitHub PAT.
Qwen3 8B — compact but capable with dual thinking modes.
NVIDIA Nemotron Nano 9B v2 — fast and capable small model.
Microsoft Phi-4 Mini on NVIDIA NIM — tiny but capable 3.8B model.
Fast lightweight Llama 3.1 on Groq's LPU — ideal for high-volume tasks.
Edge-hosted Llama 8B — ultra-low latency from any region globally.
Lightweight Llama 8B on NVIDIA — fast and free, 128K context.
Phi-3.5 vision — multimodal Phi, free via GitHub PAT.
TII
TII's massive 180B open model — one of the largest OSS LLMs.
12B model jointly trained with NVIDIA — 128K context, multilingual.
OpenChat
OpenChat 3.5 — top of HuggingFace open-chat leaderboard on release.
OpenChat 3.5 at the edge — previously #1 open-chat model.
Tiny Phi-3.5 Mini with 128K context — Microsoft's efficient SLM.
Qwen 2.5 7B on Cloudflare edge — strong coding for a small model.
Defog
Specialized SQL generation model on Cloudflare — text-to-SQL.
Original Mistral 7B — the model that sparked the OSS revolution.
Mistral 7B v0.3 via HuggingFace — free inference API.
Classic Mistral 7B via Together AI — reliable small model.
Original Mistral 7B on Cloudflare Workers AI — fast edge inference.
Meta's CodeLlama 34B — strong code generation and debugging.
Google's Gemma 7B on Cloudflare edge — reliable global inference.
Beloved 7B fine-tune of Mistral from HuggingFace H4 team.
Ministral 3B — ultra-light Mistral, free via GitHub PAT.
Compact 3B with strong quality-speed trade-off on Groq's LPU hardware.
Tiny 3B Llama — fast, free, fine for simple tasks.
Liquid AI
Liquid Foundation Model 2.5 tiny with thinking — novel non-transformer architecture.
Tiny Llama 3B on Cloudflare edge — global, fast, free.
Tiny Llama 3.2 on NVIDIA — ideal for high-volume edge workloads.
Microsoft's Phi-2 on Cloudflare — efficient 2.7B model.
Liquid Foundation Model 2.5 tiny instruct — blazing fast novel architecture.
Llama 2 13B quantized at the edge — classic, reliable baseline.
Ultra-tiny 1B model — blazing speed for simple classification/extraction tasks.
Smallest Llama on Cloudflare — for global edge classification tasks.
Zhang Peiyuan
Tiny 1.1B model — fastest possible inference for simple tasks.