Uni-CLI — Agent-Native Command Budget

Honest measurement of the context budget behind unicli SITE CMD. Numbers in the "Results" section below are produced by npm run bench and are reproducible in CI fixture mode and on a dev machine in live mode.

Why This File Exists

Agent-native infrastructure should publish real cost numbers. Uni-CLI measures both the command invocation and the response body so public claims stay tied to the current code, fixtures, and output contract.

The current fixture bench measures v2 AgentEnvelope response bodies at 357-415 tokens for representative --limit 5 list-style calls. Total invocation-plus-response budgets land at 364-423 tokens in the same suite. unicli list is much larger because it intentionally emits the full 235-site / 1450-command catalog.

This file ships real numbers or says TODO: -- nothing in between.

How We Measure

Build the CLI (npm run build) so dist/main.js is current.
Run each (site, command) pair at fixed inputs (default args, --limit 5 unless noted). Capture stdout as JSON.
Tokenise the piped stdout via an o200k_base heuristic approximator (bench/tokens.ts). The heuristic matches real tiktoken counts within roughly 6-8% on English and compact JSON; rounding to tens of tokens is honest at this precision.
Record p50 and p95 over the configured number of iterations (BENCH_RUNS, default 50). Fixture mode times the in-process tokenisation; live mode times the subprocess wall clock.
Also capture the invocation-string token count so the agent-side command cost is visible.
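The p50/p95 step above can be illustrated with a nearest-rank percentile. This is a minimal sketch, not the code in bench/adapter-call.ts: the function name `percentile`, the nearest-rank method (rather than interpolation), and the sample timings are assumptions made for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. Hypothetical helper; the real
// harness may use a different method.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Illustrative wall-clock samples in milliseconds (made up for this sketch).
const wallMs = [0.005, 0.003, 0.08, 0.004, 0.005];

console.log(percentile(wallMs, 50)); // 0.005
console.log(percentile(wallMs, 95)); // 0.08
```

In fixture mode the payload is fixed, so every iteration yields the same token count and the p50 and p95 token columns coincide; only the timing columns vary between iterations.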

The harness lives under bench/ and is wired into npm run bench.

Mode	Command	Network	Use
live	`npm run bench`	yes	Dev-machine sanity check, refreshes fixtures.
fixture	`BENCH_FIXTURES_ONLY=1 npm run bench`	no	CI and reproducible reports. Reads `bench/fixtures/*.json`.

Fixture files are committed under bench/fixtures/ alongside the scripts. Legacy fixture payloads are normalized into the current v2 AgentEnvelope shape before token counting, so the benchmark tracks the current public output contract even when source fixtures predate the envelope migration.

Results

Generated 2026-04-30T12:50:15.804Z on Node v22.22.2 / darwin-arm64. Mode: fixture (20 iterations per case). Reproduce with npm run bench (local live mode) or BENCH_FIXTURES_ONLY=1 npm run bench (CI-deterministic fixture mode).

Cold start: `unicli list`

metric	value
wall p50	47 ms
wall p95	82 ms
response tokens	73370
response chars	264131
sites listed	235
commands listed	1450
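As a quick sanity check on the table above, the rows can be turned into two derived ratios: characters per token and tokens per listed command. The sketch below only does arithmetic on the published values; the variable names are illustrative.

```typescript
// Values copied from the cold-start table.
const responseTokens = 73370;
const responseChars = 264131;
const commandsListed = 1450;

// Average characters represented by one estimated token.
const charsPerToken = responseChars / responseTokens;

// Average catalog cost of a single listed command, in tokens.
const tokensPerCommand = responseTokens / commandsListed;

console.log(charsPerToken.toFixed(2)); // "3.60"
console.log(tokensPerCommand.toFixed(1)); // "50.6"
```

At roughly 51 tokens per command, the full catalog costs about as much as 180 of the representative --limit 5 calls measured below.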

Adapter call: p50/p95 response tokens

category	command	invocation tokens	response p50 tokens	response p95 tokens	wall p50 ms	wall p95 ms	mode
news	`unicli hackernews top --limit 5`	9	404	404	0.005	0.08	fixture
social	`unicli reddit hot --limit 5`	8	415	415	0.005	0.016	fixture
social-cn	`unicli 36kr hot --limit 5`	7	357	357	0.003	0.004	fixture
dev	`unicli github-trending daily --limit 5`	11	400	400	0.005	0.006	fixture

Public call budget

metric	value
Smallest total call budget	364 tokens
Largest total call budget	423 tokens
Median total call budget	412 tokens
Representative response token span	357-415 tokens
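Each total call budget is the invocation token count plus the response p50 token count from the adapter-call table. The sketch below recomputes the summary rows from those four cases; the `BenchRow` type and `median` helper are illustrative names, not part of the harness.

```typescript
interface BenchRow {
  command: string;
  invocationTokens: number;
  responseP50Tokens: number;
}

// Rows copied from the adapter-call table.
const rows: BenchRow[] = [
  { command: "unicli hackernews top --limit 5", invocationTokens: 9, responseP50Tokens: 404 },
  { command: "unicli reddit hot --limit 5", invocationTokens: 8, responseP50Tokens: 415 },
  { command: "unicli 36kr hot --limit 5", invocationTokens: 7, responseP50Tokens: 357 },
  { command: "unicli github-trending daily --limit 5", invocationTokens: 11, responseP50Tokens: 400 },
];

// Total budget per call, sorted ascending: [364, 411, 413, 423].
const totals = rows
  .map((r) => r.invocationTokens + r.responseP50Tokens)
  .sort((a, b) => a - b);

// Median of an already-sorted array; averages the middle pair when even.
function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const summary = {
  smallest: totals[0],
  largest: totals[totals.length - 1],
  median: median(totals),
};

console.log(summary); // { smallest: 364, largest: 423, median: 412 }
```

With an even number of cases the median is the mean of the two middle totals, (411 + 413) / 2 = 412.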

Public Budget

The public operating target is straightforward:

common list-style calls should stay under 600 total tokens at --limit 5;
failure envelopes should stay compact enough for an agent to repair without loading unrelated documentation;
full-catalog output should remain explicit, not automatic.

The current fixture suite clears that bar. If a future command class needs a larger payload, it should expose pagination, --limit, or --compact.
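One way to keep the 600-token target from drifting is a small gate that lists any case at or over the budget. This is a hypothetical sketch: `TOTAL_TOKEN_BUDGET`, `overBudget`, and the `unicli example big` case are invented for illustration, and the repository may enforce the target differently or not at all.

```typescript
interface CallBudget {
  command: string;
  totalTokens: number; // invocation tokens + response tokens
}

// Public target: list-style calls stay under 600 total tokens at --limit 5.
const TOTAL_TOKEN_BUDGET = 600;

// Returns the commands whose total is not strictly under the budget.
function overBudget(cases: CallBudget[], budget: number = TOTAL_TOKEN_BUDGET): string[] {
  return cases.filter((c) => c.totalTokens >= budget).map((c) => c.command);
}

// Totals from the current fixture suite.
const current: CallBudget[] = [
  { command: "unicli hackernews top --limit 5", totalTokens: 413 },
  { command: "unicli reddit hot --limit 5", totalTokens: 423 },
  { command: "unicli 36kr hot --limit 5", totalTokens: 364 },
  { command: "unicli github-trending daily --limit 5", totalTokens: 411 },
];

console.log(overBudget(current)); // []

// A made-up oversized case shows what a failure would look like.
const withRegression = [
  ...current,
  { command: "unicli example big --limit 5", totalTokens: 640 },
];

console.log(overBudget(withRegression)); // [ 'unicli example big --limit 5' ]
```

The largest current total, 423 tokens, leaves 177 tokens of headroom under the target.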

Reproducibility

The bench/ directory is self-contained:

bench/tokens.ts — token estimator (no native deps).
bench/cold-start.ts — unicli list cold-start runner.
bench/adapter-call.ts — per-command p50/p95 runner (live or fixture mode).
bench/report.ts — orchestrator, writes bench/results.json and patches this file between a pair of marker comments.
bench/fixtures/ — captured JSON responses (rerun npm run bench in live mode to refresh; commit the diff if upstream shape changes).

npm run bench is not part of npm run verify because its default (live) mode needs network access. CI runs it only in fixture mode, on an explicit workflow dispatch or a scheduled maintenance check.

Last reviewed: 2026-04-28.
