Trending Models, in One Click
From concept to deployment, with no infrastructure worries.

Llama-3.1-8B-Instruct
A reliable general-purpose chat model for Q&A, writing, and everyday app assistants.
GPU: 1x L4 · VRAM: 24GB

Qwen2.5-7B-Instruct
A small, fast chat model that’s great for typical assistant tasks on modest GPUs.
GPU: 1x L4 · VRAM: 24GB

GPT-OSS-20B
An open-weights LLM designed for local, edge-friendly inference and rapid experimentation without heavy infrastructure.
GPU: 1x L40S · VRAM: 48GB

DeepSeek-R1-Distill-Qwen-32B
A reasoning-focused LLM best for tough logic, math, and coding-style prompts (distilled for easier serving).
GPU: 1x L40S · VRAM: 48GB
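The GPU pairings above follow a common rule of thumb: model weights in fp16/bf16 take about 2 bytes per parameter, and you leave headroom for the KV cache and runtime overhead. A minimal sketch of that estimate (the helper name and the 2-bytes-per-parameter default are illustrative assumptions, not part of any deployment API):

```python
def estimated_weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM needed for model weights alone.

    fp16/bf16 weights use ~2 bytes per parameter; 8-bit quantization ~1 byte.
    This ignores KV cache, activations, and framework overhead, which can
    add several GB on top, so real deployments need headroom beyond this.
    """
    # params_billion * 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

# Llama-3.1-8B in bf16: ~16 GB of weights, so a 24GB L4 fits with headroom.
print(estimated_weight_gb(8))        # 16.0
# A 32B model in bf16 would need ~64 GB of weights, more than one 48GB L40S,
# which is why larger models are typically served quantized.
print(estimated_weight_gb(32, 1.0))  # 32.0 at 8-bit
```

This back-of-the-envelope check explains why the 7B/8B models above list a 24GB L4 while the 20B/32B models step up to a 48GB L40S.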