pi-autoresearch is an automated experimentation-loop tool built for pi. It adapts to a variety of optimization goals: test speed, bundle size, LLM training performance, build time, Lighthouse scores, and more.
| Type | Description |
|---|---|
| Extension | Toolset + live component + /autoresearch dashboard |
| Skill | Define optimization goal, generate session files, start the experiment loop |
| Tool | Description |
|---|---|
| `init_experiment` | One‑time session configuration – sets the experiment name, metric, unit, and optimization direction (min/max). |
| `run_experiment` | Executes any command, measures wall‑clock time, and captures output. |
| `log_experiment` | Logs experiment results, automatically commits code, and updates the component and dashboard. |
A status component is always visible at the top of the editor. Example:
```
🔬 autoresearch 12 runs 8 kept │ best: 42.3s
```
It shows the number of runs, kept runs, and the best result so far.
Type /autoresearch to open the full‑featured results dashboard.
Ctrl+X toggles the display. Escape closes it.
All experiment data is aggregated here for easy review.

When you invoke the autoresearch-create skill, the tool will ask for (or infer from context) the goal, command, metric, and relevant file scope. It then generates two core files and immediately starts the experiment loop. Optionally, a check script can be created:
| File | Purpose |
|---|---|
| `autoresearch.md` | Session document – records the goal, metric, relevant file scope, and tried approaches. A new agent can resume the session using only this file. |
| `autoresearch.sh` | Benchmark script – contains prerequisite checks and the task execution logic. Outputs a metric line in the format `METRIC name=number`. |
| `autoresearch.checks.sh` | (Optional) Reverse‑pressure check script – runs tests, type checks, lints, etc. after each successful benchmark. If this script fails, the "keep" operation is prevented. |
Run the following commands:

```bash
pi install https://github.com/davebcn87/pi-autoresearch
cp -r extensions/pi-autoresearch ~/.pi/agent/extensions/
cp -r skills/autoresearch-create ~/.pi/agent/skills/
```
Run /reload to load the newly added extension and skill. Then type the following command to start the skill:

```
/skill:autoresearch-create
```
The agent will ask for the optimization goal, command, metric, and relevant file scope (or infer them from context). It then creates a branch, generates autoresearch.md and autoresearch.sh, runs the baseline benchmark, and immediately begins the experiment loop.
The agent autonomously executes the loop:
run_experiment → log_experiment → keep effective changes or roll back ineffective ones → repeat.
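The keep-or-rollback decision can be sketched in plain shell. This is illustrative only: in practice the agent drives it through run_experiment and log_experiment, and the numbers below are made up.

```shell
#!/bin/bash
# Illustrative keep-or-rollback decision for a "min" experiment;
# the real loop is driven by the agent, not this script.
set -euo pipefail

best=42.3        # best metric kept so far (made-up value)
candidate=41.8   # metric from the latest run (made-up value)

# Keep the change only when the candidate improves on the best.
if awk -v c="$candidate" -v b="$best" 'BEGIN{exit !(c < b)}'; then
  echo "keep: commit the change (new best: $candidate)"
else
  echo "rollback: discard the change"
fi
```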
The loop continues indefinitely unless manually interrupted.

Every experiment result is appended to autoresearch.jsonl in the project. Each line represents one run, recording its metric, status, commit hash, and description.
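A line in autoresearch.jsonl might look like the following sketch. The field names and values are illustrative assumptions, not the tool's exact schema.

```shell
# Append one illustrative run record (field names and values are assumptions):
printf '%s\n' \
  '{"run": 12, "metric": 41.8, "status": "kept", "commit": "a1b2c3d", "description": "enable test parallelism"}' \
  >> autoresearch.jsonl
```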
`autoresearch.md` records all attempted approaches, so a new agent can obtain the full context.

Type /autoresearch to open the full dashboard, view results tables, and see the best run. Press Escape at any time to stop the loop; the agent will provide a summary report of the experiments.

| Scenario | Optimization Metric | Command |
|---|---|---|
| Test speed optimization | seconds (lower is better) | pnpm test |
| Bundle size optimization | kilobytes (lower is better) | pnpm build && du -sb dist |
| LLM training optimization | bits per byte on validation set (lower is better) | uv run train.py |
| Build speed optimization | seconds (lower is better) | pnpm build |
| Lighthouse score optimization | performance score (higher is better) | lighthouse http://localhost:3000 --output=json |
The extension provides domain‑agnostic infrastructure, while the skill holds domain‑specific knowledge. This separation lets one extension support an unlimited number of application domains.
```
┌──────────────────────┐        ┌──────────────────────────┐
│ Extension (global)   │        │ Skill (per domain)       │
│                      │        │                          │
│ run_experiment       │◄───────│ Command: pnpm test       │
│ log_experiment       │        │ Metric: seconds (min)    │
│ component + dashboard│        │ Scope: vitest config     │
│                      │        │ Ideas: pooling, parallel │
└──────────────────────┘        └──────────────────────────┘
```
Two core files ensure that sessions can survive restarts and context resets:
- `autoresearch.jsonl` – Append‑only log file, recording per‑run metrics, status, commit hashes, and descriptions.
- `autoresearch.md` – Dynamically updated document containing the goal, tried approaches, dead ends, and key achievements.

Create an `autoresearch.checks.sh` file to add correctness checks (e.g., tests, type checks, linters). This ensures optimizations don't break existing functionality. Example script:
```bash
#!/bin/bash
set -euo pipefail
pnpm test --run
pnpm typecheck
```
- If the check script fails, the run is recorded as `checks_failed` (behavior identical to a crash: no code commit, and changes are rolled back).
- Results carry a `checks_failed` status, distinguishing correctness failures from benchmark crashes.
- A timeout for the checks can be configured (`checks_timeout_seconds` in run_experiment).
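The reverse-pressure behavior can be approximated in plain shell. This is illustrative only: the extension handles it internally, and the 120-second limit below is an assumed value standing in for `checks_timeout_seconds`.

```shell
#!/bin/bash
# Illustrative: run the check script under a time limit and decide the
# run's outcome, mirroring checks_failed vs. a normal keep.
set -euo pipefail

if timeout 120 ./autoresearch.checks.sh; then
  echo "checks passed: keep is allowed"
else
  echo "checks_failed: no commit, changes rolled back"
fi
```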