
How to use Ponytail to make your AI coding agent write less code
You know him. Long ponytail. Oval glasses. Has been at the company longer than the version control. You show him fifty lines; he looks at them, says nothing, and replaces them with one.
Ponytail puts him inside your AI agent.
It is a skill and ruleset you install into whichever agent you already use (Claude Code, Codex, Gemini CLI, Cursor, and ten more). Within ten days of release it hit roughly 10,000 GitHub stars. The benchmark headline from the README: 80-94% less code, 47-77% less cost, and 3-6x faster than a no-skill agent, on every model.
How Ponytail decides whether to write code
The six-rung decision ladder
Before writing any code, the agent stops at the first rung that holds. The AGENTS.md ruleset states it directly:
"You are a lazy senior developer. Lazy means efficient, not careless. The best code is the code never written."
The ladder:
1. Does this need to be built at all? (YAGNI)
2. Does the standard library already do this? Use it.
3. Does a native platform feature cover it? Use it.
4. Does an already-installed dependency solve it? Use it.
5. Can this be one line? Make it one line.
6. Only then: write the minimum code that works.
The agent never reaches step 6 unless every rung above it fails. The rules also ban abstractions that weren't explicitly requested, new dependencies that can be avoided, and boilerplate nobody asked for. Deletion over addition. Boring over clever.
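The ladder is a first-match rule, which is easy to see in code. Below is a minimal sketch, not Ponytail's implementation (Ponytail is a prompt ruleset, not a program). The fact names such as `stdlib_covers_it` and the `decide` function are made up for illustration.

```python
# Hypothetical model of the ladder: each rung pairs a fact about the task
# with the action taken when that fact holds. Names are illustrative only.
LADDER = [
    ("not_needed", "build nothing (YAGNI)"),
    ("stdlib_covers_it", "use the standard library"),
    ("platform_covers_it", "use the native platform feature"),
    ("installed_dep_covers_it", "use the already-installed dependency"),
    ("fits_in_one_line", "write one line"),
]
LAST_RUNG = "write the minimum code that works"


def decide(facts: set[str]) -> str:
    """Return the action for the first rung that holds, top to bottom."""
    for fact, action in LADDER:
        if fact in facts:
            return action
    # Only reached when every rung above has failed.
    return LAST_RUNG


if __name__ == "__main__":
    # A date picker: the browser has one, so the ladder stops at rung 3
    # even though a one-liner would also be possible.
    print(decide({"platform_covers_it", "fits_in_one_line"}))
    # Nothing above applies: fall through to rung 6.
    print(decide(set()))
```

Order matters here: a task that could be a one-liner but is already covered by the platform never reaches the one-liner rung.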
What Ponytail never cuts
Ponytail cuts code volume, not correctness. The AGENTS.md carve-outs are explicit: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (clocks drift, sensors read off), and anything explicitly requested.
The v4.4.0 release added a one-runnable-check rule: non-trivial logic leaves one runnable check behind (an assert-based demo or one small test file, no frameworks, no fixtures). "Lazy code without its check is unfinished." A user who ran Ponytail across a nine-phase from-scratch rewrite (protocol, desktop app, simulator, Raspberry Pi daemon, ESP32 firmware) confirmed: "it never once trimmed a failsafe, validation, or auth check."
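To make the one-runnable-check rule concrete, here is a sketch of what such a check could look like for a CSV sum, one of the benchmark tasks. The function name `csv_sum` and the sample data are assumptions for illustration, not output from Ponytail.

```python
import csv
import io


def csv_sum(text: str, column: str) -> float:
    """Sum one numeric column of CSV text using only the standard library."""
    return sum(float(row[column]) for row in csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # The one runnable check: plain asserts, no framework, no fixtures.
    sample = "item,price\napple,1.50\npear,2.25\n"
    assert csv_sum(sample, "price") == 3.75
    assert csv_sum("item,price\n", "price") == 0  # header only, no rows
    print("ok")
```

Running the file directly is the whole test suite. If the logic later grows, the asserts are the first thing that breaks.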
Before and after: what Ponytail actually changes
The easiest way to see what Ponytail does is through examples.
Date picker: from 30 lines to 1
Without Ponytail, the agent reaches for a library:
npm install flatpickr

import flatpickr from "flatpickr";
import "flatpickr/dist/flatpickr.min.css";
import { useEffect, useRef } from "react";

export default function DatePicker({ value, onChange, minDate, maxDate }) {
  const inputRef = useRef(null);
  const instanceRef = useRef(null);

  useEffect(() => {
    instanceRef.current = flatpickr(inputRef.current, {
      defaultDate: value,
      minDate,
      maxDate,
      dateFormat: "Y-m-d",
      onChange: ([date]) => onChange(date),
    });
    return () => instanceRef.current?.destroy();
  }, []);

  useEffect(() => {
    instanceRef.current?.setDate(value, false);
  }, [value]);

  return <input ref={inputRef} className="date-picker" />;
}
One dependency, one wrapper component, two useEffect hooks, a cleanup function, and a CSS import. To pick a date.
With Ponytail:
<!-- ponytail: browser has one -->
<input type="date">
1 dependency + 30 lines → 0 dependencies + 1 line. Native, accessible, localized, keyboard-navigable, mobile-friendly.
Email validation: from a class to one expression
Without Ponytail:
import re

EMAIL_PATTERN = re.compile(
    r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
)


class EmailValidator:
    """Validates email addresses against RFC-like rules."""

    def __init__(self, pattern: re.Pattern = EMAIL_PATTERN):
        self.pattern = pattern

    def validate(self, email: str) -> bool:
        if not isinstance(email, str):
            raise TypeError("email must be a string")
        email = email.strip()
        if not email:
            return False
        return bool(self.pattern.match(email))


def validate_email(email: str) -> bool:
    """Convenience wrapper around EmailValidator."""
    return EmailValidator().validate(email)
A class, a wrapper, a regex that still rejects valid addresses and accepts invalid ones. Regex cannot validate email. Only a delivery attempt can.
With Ponytail:
# ponytail: good enough, real validation is sending the mail
"@" in email and "." in email.split("@")[-1]
27 lines → 1 line.
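It helps to see exactly what the one-liner accepts and rejects. A small sketch follows; the wrapper name `looks_like_email` is an assumption added so the expression can be called and checked.

```python
def looks_like_email(email: str) -> bool:
    # ponytail: good enough, real validation is sending the mail
    return "@" in email and "." in email.split("@")[-1]


if __name__ == "__main__":
    assert looks_like_email("ada@example.com")
    assert not looks_like_email("ada.example.com")  # no "@" at all
    assert not looks_like_email("ada@localhost")    # no "." after the last "@"
    # The known ceiling: structurally odd input still passes.
    assert looks_like_email("a@b@c.d")
    print("ok")
```

The last assert is the point of the ponytail: comment. The check catches obvious typos at the form, and the confirmation mail catches everything else.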
API endpoint: from five files to nine lines
Without Ponytail: five files, three classes, a custom exception, and a dependency-injection chain wrapping one database call.
app/
├── controllers/user_controller.py
├── services/user_service.py
├── repositories/user_repository.py
├── schemas/user_schemas.py
└── exceptions/user_exceptions.py
With Ponytail:
# ponytail: drop the layers; keep the response schema, it whitelists what leaves the API
class UserOut(BaseModel):
    id: int
    name: str
    email: str


@app.get("/users/{user_id}", response_model=UserOut)
def get_user(user_id: int, db: Session = Depends(get_db)):
    user = db.get(User, user_id)
    if not user:
        raise HTTPException(404)
    return user
5 files → 9 lines. The repository, service, and custom exception were ceremony. The response schema stays: it whitelists which fields leave the API. Removing it would expose every ORM column. That is the line Ponytail draws between "lazy" and "negligent": cut the layers, keep the trust boundary.
Benchmark results
How much less code Ponytail writes
Three arms (no skill, caveman, ponytail), three models (Haiku, Sonnet, Opus), five everyday tasks (email validator, JS debounce, CSV sum, React countdown, FastAPI rate-limit), 10 runs per cell, median reported. Date: 2026-06-13. Reproduce with:
npx promptfoo eval -c benchmarks/promptfooconfig.yaml --repeat 10
Versus baseline, Ponytail writes 80-94% less code, costs 47-77% less, and runs 3-6x faster, on every model. These numbers reflect single-shot calls that re-send the skill each time. In real sessions the skill is injected once and prompt-cached, so the cost gap widens further in Ponytail's favor.
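For readers who want to check such percentages against their own runs, the arithmetic is short. This sketch assumes the headline figures compare per-cell medians, which is how the setup above reads; the helper names and the sample numbers are invented for illustration and are not benchmark results.

```python
from statistics import median


def percent_less(baseline_runs, skill_runs):
    """Percent reduction of the skill arm's median versus the baseline median."""
    base = median(baseline_runs)
    return 100 * (base - median(skill_runs)) / base


def speedup(baseline_seconds, skill_seconds):
    """How many times faster the skill arm's median run is."""
    return median(baseline_seconds) / median(skill_seconds)


if __name__ == "__main__":
    # Made-up line counts for one model/task cell; NOT real benchmark data.
    baseline_loc = [48, 50, 52, 47, 55]  # median 50
    skill_loc = [5, 4, 6, 5, 5]          # median 5
    print(percent_less(baseline_loc, skill_loc))  # 90.0
```

Medians are used rather than means so that one runaway generation does not dominate a cell.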
Ponytail commands
Commands require a skill-capable host: Claude Code, Codex, OpenCode, Gemini CLI, or pi. Cursor, Windsurf, Cline, and Copilot get the always-on ruleset only.
/ponytail ultra exists for when the codebase has wronged you personally.
/ponytail-review outputs one finding per line in the format L<line>: <tag> <what>. <replacement>. Tags are delete:, stdlib:, native:, yagni:, and shrink:. It ends with net: -<N> lines possible. If there is nothing to cut, it says Lean already. Ship.
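Because the output is one finding per line, it is easy to post-process, for example in a CI step. The parser below is a sketch built only from the format described above. The regular expressions, the `parse_review` name, and the sample report are assumptions, and real output may differ in details such as trailing punctuation.

```python
import re
from collections import Counter

# Assumed shapes, derived from the description above:
#   L<line>: <tag> <what>. <replacement>.
#   net: -<N> lines possible
FINDING = re.compile(r"^L(\d+): (delete|stdlib|native|yagni|shrink): (.+)$")
NET = re.compile(r"^net: -(\d+) lines possible")


def parse_review(report: str):
    """Return ([(line_number, tag, text), ...], net_lines_removable)."""
    findings, net = [], 0
    for line in report.splitlines():
        line = line.strip()
        if m := FINDING.match(line):
            findings.append((int(m.group(1)), m.group(2), m.group(3)))
        elif m := NET.match(line):
            net = int(m.group(1))
    return findings, net


if __name__ == "__main__":
    # Hypothetical report, written by hand to match the described format.
    sample = """\
L12: stdlib: hand-rolled CSV parsing. Use csv.DictReader.
L40: yagni: unused retry wrapper. Remove it.
net: -31 lines possible
"""
    findings, net = parse_review(sample)
    print(Counter(tag for _, tag, _ in findings), net)
```

A report of "Lean already. Ship." matches neither pattern, so it parses to an empty list and a net of zero.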
How to install Ponytail
Installing Ponytail on Claude Code
/plugin marketplace add DietrichGebert/ponytail
/plugin install ponytail@ponytail
Installing Ponytail on Codex
codex plugin marketplace add DietrichGebert/ponytail
codex
Then open /plugins, select the Ponytail marketplace listing, and install it. Open /hooks, review and trust its two lifecycle hooks, and start a new thread.
Installing Ponytail on Gemini CLI
gemini extensions install https://github.com/DietrichGebert/ponytail
Installing Ponytail on Cursor, Windsurf, Cline, and other rule-based agents
Copy the matching rules file from the repo into your project:
- Cursor: .cursor/rules/ponytail.mdc
- Windsurf: .windsurf/rules/ponytail.md
- Cline: .clinerules/ponytail.md
- GitHub Copilot: .github/copilot-instructions.md
- Kiro: .kiro/steering/ponytail.md
These agents get the always-on ruleset but not the commands.
Supported agents
Ponytail works with 14 agents as of v4.4.0. The README badge still says 11; the portability docs are more current and list 14.
Tracking technical debt with the ponytail: comment
What the ponytail: comment does
Every intentional simplification gets a ponytail: comment that names what was skipped and the upgrade path. The examples above already show the pattern: <!-- ponytail: browser has one -->, # ponytail: good enough, real validation is sending the mail, # ponytail: drop the layers; keep the response schema. The comment keeps the shortcut visible and prevents "later" from becoming "never."
The format is: ponytail: <ceiling description and upgrade path>
Collecting all ponytail: comments with /ponytail-debt
/ponytail-debt harvests all ponytail: comments in the repo into a tracked ledger. It was added in v4.4.0, prompted by a user who ran Ponytail across a nine-phase from-scratch rewrite of a real system. The verdict: "net win, kept it on the whole build." The command gives you a single view of everything intentionally deferred so you can decide what to upgrade and what to leave alone.
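On hosts that only get the ruleset and not the commands, a similar ledger can be approximated with a few lines of standard-library Python. This is a sketch of the idea, not the /ponytail-debt implementation. The function name `collect_debt`, the file suffix list, and the tuple layout are all assumptions.

```python
import re
from pathlib import Path

# Matches the marker in any comment style: "# ponytail: ...", "<!-- ponytail: ... -->"
MARKER = re.compile(r"ponytail:\s*(.+)")


def collect_debt(root, suffixes=(".py", ".js", ".html")):
    """Return (relative_path, line_number, note) for every ponytail: comment."""
    ledger = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if m := MARKER.search(line):
                # Drop a trailing HTML comment terminator if present.
                note = m.group(1).strip().removesuffix("-->").strip()
                ledger.append((path.relative_to(root).as_posix(), number, note))
    return ledger


if __name__ == "__main__":
    for file, number, note in collect_debt("."):
        print(f"{file}:{number}: {note}")
```

A plain grep for "ponytail:" gets most of the way there too; the script only adds a stable, sortable shape for the results.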
Ponytail is at github.com/DietrichGebert/ponytail. The AGENTS.md file is the full ruleset. Copy it into any project or agent that reads AGENTS.md at the repo root.