Connections rebuilt, NVIDIA models, chat composer modes

Settings gets a model-first Connections page, NVIDIA arrives as a managed provider, and a new Humans Only composer mode pauses specialists.

0.9.0 changes how you manage models and providers, brings NVIDIA into the managed catalog, and adds a chat composer mode that lets people talk without a specialist jumping in. Specialist building, billing, the editor, and notifications also get upgrades.

Providers

You can now connect to NVIDIA as a managed provider. NVIDIA’s chat models are imported into the catalog — deduplicated and filtered — with consistent metadata, so model picking, billing, and routing work the same way they do for every other provider.

Top 7 models

nvidia/llama-3.1-nemotron-70b-instruct
nvidia/nemotron-3-super-120b-a12b
nvidia/nemotron-3-nano-30b-a3b
meta/llama-3.3-70b-instruct
deepseek-ai/deepseek-v3.2
moonshotai/kimi-k2-thinking
qwen/qwen3-coder-480b-a35b-instruct

All 19 models

deepseek-ai/deepseek-v3.1-terminus
deepseek-ai/deepseek-v3.2
deepseek-ai/deepseek-v4-flash
deepseek-ai/deepseek-v4-pro
meta/llama-3.1-70b-instruct
meta/llama-3.1-8b-instruct
meta/llama-3.2-1b-instruct
meta/llama-3.2-3b-instruct
meta/llama-3.3-70b-instruct
moonshotai/kimi-k2-instruct
moonshotai/kimi-k2-instruct-0905
moonshotai/kimi-k2-thinking
moonshotai/kimi-k2.5
nvidia/llama-3.1-nemotron-70b-instruct
nvidia/nemotron-3-nano-30b-a3b
nvidia/nemotron-3-super-120b-a12b
qwen/qwen2.5-coder-32b-instruct
qwen/qwen3-coder-480b-a35b-instruct
qwen/qwen3.5-122b-a10b

Settings: model-first Connections

The Connections page is now built around the models you actually use. A search-driven Models list with curated defaults sits on top, variants from multiple providers merge into a single row, and toggling a model whose provider is not yet configured takes you straight to the right setup section. Provider rows collapse into a single switch with sign-in always visible, and admin actions move into a per-row gear menu. API key, Cloudflare, and OpenRouter panels save automatically when you click away. Connecting OpenRouter no longer auto-imports hundreds of models — you start with a curated shortlist and pick what you want from there.
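As a rough illustration of how variants from several providers can collapse into one row, here is a minimal sketch. It assumes variants are matched on an identical, case-insensitive model ID; the `ModelRow` type, the `merge_variants` function, and the provider names are hypothetical and are not the app's actual matching logic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ModelRow:
    model_id: str
    providers: list[str] = field(default_factory=list)


def merge_variants(offers: list[tuple[str, str]]) -> list[ModelRow]:
    """Group (provider, model_id) pairs into one row per model.

    Assumption: two offers are the same model when their IDs match
    case-insensitively. Rows keep first-seen order.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for provider, model_id in offers:
        key = model_id.lower()
        if provider not in grouped[key]:
            grouped[key].append(provider)
    return [ModelRow(model_id, providers) for model_id, providers in grouped.items()]


# Sample data only: "provider-a" and "provider-b" are placeholders.
rows = merge_variants([
    ("provider-a", "meta/llama-3.3-70b-instruct"),
    ("provider-b", "meta/llama-3.3-70b-instruct"),
    ("provider-a", "deepseek-ai/deepseek-v3.2"),
])
```

In this sketch the first row lists both providers, so a single toggle in the Models list can stand for every provider that serves that model.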

Switching a provider’s role no longer fails when you toggle Cloudflare-over-OpenRouter as default and fallback. The model catalog now refreshes immediately after you change enablement, lockdown, role, or allowlist, instead of taking up to an hour to update. Workspace balances stay correct after a plan change.

Chat: Humans Only mode, recency, and badges

A new composer mode toggle (Tab to cycle) lets you switch the chat input between Everyone and Humans Only. In Humans Only your message still posts to everyone, but specialists won’t reply unless you @-mention them directly. The mode is per-conversation, persisted on your device, and a ribbon above the input makes it obvious that automatic replies are paused.
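The reply rule can be pictured as a small predicate. This is a sketch under stated assumptions: the `ComposerMode` enum, the `may_specialist_reply` function, and the `@handle` mention pattern are all hypothetical names chosen for illustration, not the app's internals.

```python
import re
from enum import Enum


class ComposerMode(Enum):
    EVERYONE = "everyone"
    HUMANS_ONLY = "humans_only"


def mentioned_handles(text: str) -> set[str]:
    """Return the lowercase handles that appear as @-mentions in the text."""
    return {m.lower() for m in re.findall(r"@([\w-]+)", text)}


def may_specialist_reply(mode: ComposerMode, text: str, handle: str) -> bool:
    """Decide whether a specialist is allowed to reply to a message."""
    if mode is ComposerMode.EVERYONE:
        return True  # automatic replies are allowed
    # Humans Only: the message still posts, but only a direct mention unlocks a reply.
    return handle.lower() in mentioned_handles(text)
```

For example, in Humans Only mode a message reading "@Researcher can you check?" would allow a specialist with the handle `researcher` to answer, while "hello team" would not.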

Direct messages now move to the top of the sidebar on activity, the way Slack and Discord do, and unread DM rows render in semibold so you can see them at a glance. Channels intentionally keep their existing order. The macOS dock icon and window title both reflect the total unread count, and the title prefix and badge clear cleanly when you read messages or navigate away.

Token usage from streaming chat completions is now reported reliably end to end. Final usage details from OpenAI-compatible providers are preserved, billing reconciles against actual token counts, and disconnecting mid-stream releases your reservation immediately instead of locking up wallet headroom. Together these close the gap that produced occasional false 402 errors during bursts of transient failures.
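To show why a lingering reservation can cause a false 402, here is a simplified reserve, settle, and release model. Everything in it is an assumption made for illustration: the `Wallet` class, the integer credit units, and the choice to charge nothing on release are not taken from the actual billing system.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    balance: int  # credits available, in arbitrary units
    reserved: dict[str, int] = field(default_factory=dict)

    @property
    def headroom(self) -> int:
        """Credits not yet claimed by an in-flight request."""
        return self.balance - sum(self.reserved.values())

    def reserve(self, request_id: str, estimate: int) -> bool:
        """Hold an estimated cost before streaming starts."""
        if estimate > self.headroom:
            return False  # the caller would surface a 402-style error here
        self.reserved[request_id] = estimate
        return True

    def settle(self, request_id: str, actual_cost: int) -> None:
        """Charge the actual token cost and drop the reservation."""
        self.reserved.pop(request_id, None)
        self.balance -= actual_cost

    def release(self, request_id: str) -> None:
        """Drop a reservation without charging, e.g. after a mid-stream disconnect."""
        self.reserved.pop(request_id, None)
```

In this model, a disconnected request that never calls `release` keeps its estimate locked, so a later `reserve` can fail even though the balance is sufficient. Releasing immediately restores the headroom.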

Notifications now suppress chimes for messages you sent yourself and treat the focused conversation consistently across unread counts, OS notifications, and routing. Clicking a notification from the bell now marks the conversation read.

Specialists: marketplace redesign and IQ that counts knowledge

The specialists marketplace and draft cards have been rebuilt to the new design. There’s a new card primitive with theme-aware hover surfaces, curated category groupings, and a visual header on the marketplace page. Deleting a draft no longer removes it before you confirm: the native confirm has been replaced with a proper dialog. Updated sidebar nav rows and dark-mode secondary buttons, along with refreshed hover, shadow, and icon-tile styling, round it out.

Specialist IQ now counts indexed knowledge sources — including indexed files, folders, and URLs — and recalculates after knowledge changes. Specialists no longer get stuck on a false “Knowledge: Missing” state.

In the specialist builder, manually regenerating an overview no longer overwrites your work with a generic fallback when generation fails. Failures now show a clear error in the Review card and a toast with the failure detail, and your existing overview is preserved. The first-pass overview during specialist creation also produces a deterministic, specialist-specific draft.

Local specialist commands now use your saved GitHub token for gh and git, so cloning, pushing, and querying GitHub work without prompting. Authentication stays local-only, respects values you’ve set explicitly, and honors sandbox deny rules.
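One way to picture "respects values you’ve set explicitly" is an environment builder that only fills in the token when nothing is already set. This sketch is an assumption about the approach, not the app's code: `command_env` is a hypothetical helper, it covers only the `gh` CLI (which reads `GH_TOKEN` and `GITHUB_TOKEN`), and it does not model git credentials or sandbox deny rules.

```python
from typing import Mapping, Optional


def command_env(
    base_env: Mapping[str, str],
    saved_token: Optional[str],
) -> dict[str, str]:
    """Build the environment for a local command.

    The saved token is injected as GH_TOKEN only when the caller has not
    already provided GH_TOKEN or GITHUB_TOKEN, so explicit values win.
    """
    env = dict(base_env)
    if saved_token and "GH_TOKEN" not in env and "GITHUB_TOKEN" not in env:
        env["GH_TOKEN"] = saved_token  # read by the gh CLI
    return env
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when launching the command locally.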

Inside the specialist builder, opening the VSCode or browser panel automatically closes the details panel so the layout always has room for chat plus one mini-app. The browser specialist’s element picker once again returns its result correctly, so picking elements in the embedded browser works.

Editor and projects

The code panel title now resolves from the active conversation, so popouts and “open in editor” stay in sync with the conversation you’re looking at. The VSCode integration also reacts to repository updates without a restart when opened roots change.

Repository sources in projects now reconcile their clone status from the local copy at startup and after sync failures, so a project that was cloning when you closed the app converges to either ready or an actionable error instead of getting stuck on “probing” or “Loading branches…”. The bundled Git runtime also checks itself before being selected, falling back to the system Git when the bundled version isn’t usable.
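The self-check and fallback for the Git runtime can be sketched as follows. The function names and the choice of `git --version` as the health check are assumptions for illustration. The real check may test more than this.

```python
import shutil
import subprocess
from typing import Optional


def git_is_usable(path: str) -> bool:
    """Return True if the binary at `path` runs and identifies itself as Git."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return False  # missing, not executable, or timed out
    return result.returncode == 0 and result.stdout.startswith("git version")


def select_git(bundled_path: Optional[str]) -> Optional[str]:
    """Prefer the bundled Git when it passes the check, else the system Git."""
    if bundled_path and git_is_usable(bundled_path):
        return bundled_path
    return shutil.which("git")  # None when no system Git is on PATH
```

Checking before selecting means a broken bundled binary is caught once at startup rather than surfacing later as a failed clone.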

Sandbox and platform

Tool calls to external services now retry with exponential backoff, so brief connection drops stop turning into hard failures. The Linux sandbox enforces stricter system-call rules and now allows the calls recent Node versions emit during normal logging, eliminating spurious sandbox violations during tool execution.
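For readers unfamiliar with the pattern, here is a minimal sketch of retrying with exponential backoff. The attempt count, delays, added jitter, and the choice to retry only on `ConnectionError` are illustrative assumptions, not the values the app uses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on ConnectionError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many clients (an addition in this sketch).
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults above, the waits before the second, third, and fourth attempts start at roughly 0.5, 1, and 2 seconds, so a brief connection drop is absorbed instead of failing the tool call.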

A platform update fixes a deadlock and a Windows freeze where resizing, hiding, and showing the browser or VSCode mini-app could leave it blank.

Auth and account switching

Adding an account, logging into a duplicate account, or switching workspaces no longer leaves stale sidebar or chat state behind. A full-screen transition now covers the workspace switch, workspace-bound state and cached data are cleared cleanly, and authenticated screens no longer render before the workspace finishes loading. The render-loop bug that fast switches could trigger is gone.

Polish and fixes

Production, dev, nightly, and beta desktop builds now have distinct names (“Zephyr Agency”, “Zephyr Agency Dev”, and so on), so you can keep multiple channels installed without them clobbering each other’s data. Avatar online dots use a brighter emerald in dark mode so presence reads clearly against dark sidebars.