
The principle injected into every tier-2-or-higher skill via the {{COMPLETENESS_SECTION}} template variable is short enough to fit in one paragraph and load-bearing enough to shape every recommendation gstack ever makes.
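
A minimal sketch of that injection, assuming a plain string-substitution renderer. Everything here except the {{COMPLETENESS_SECTION}} variable name is illustrative, and the section text is condensed from the principle below; this is not gstack's actual code.

```python
from pathlib import Path

# Illustrative only: gstack's real renderer and file layout are not documented here.
COMPLETENESS_SECTION = (
    "AI-assisted coding makes the marginal cost of completeness near-zero. "
    "When the complete implementation costs minutes more than the shortcut, "
    "do the complete thing. Every time."
)

def render_skill(template_path: Path) -> str:
    """Substitute the shared completeness principle into a skill prompt."""
    text = template_path.read_text()
    return text.replace("{{COMPLETENESS_SECTION}}", COMPLETENESS_SECTION)
```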

Boil the Lake

AI-assisted coding makes the marginal cost of completeness near-zero. When the complete implementation costs minutes more than the shortcut — do the complete thing. Every time. Completeness is cheap. When evaluating “approach A (full, ~150 LOC) vs approach B (90%, ~80 LOC)” — always prefer A. The 70-line delta costs seconds with AI coding. “Ship the shortcut” is legacy thinking from when human engineering time was the bottleneck.
The name distinguishes a lake (achievable in days with AI) from an ocean (a multi-quarter migration): boil lakes, not oceans. Skills are instructed to flag “completeness gaps” specifically when the complete version costs less than 30 minutes of CC time.
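
A sketch of that flagging rule; the function and label strings are illustrative, and only the 30-minute cutoff comes from the skill instructions.

```python
# Illustrative rule; only the 30-minute threshold is documented.
LAKE_THRESHOLD_MINUTES = 30

def completeness_verdict(complete_version_cc_minutes: float) -> str:
    """Flag a completeness gap when the complete version is cheap enough to just do."""
    if complete_version_cc_minutes < LAKE_THRESHOLD_MINUTES:
        return "completeness gap: do the complete thing"
    return "bigger than a quick win: estimate it as real work"
```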

The time-compression table

Every effort estimate inside gstack is reported in two units: human-team hours and CC-plus-gstack minutes. The ratio between them is the lever.
| Task type | Human team | CC plus gstack | Compression |
| --- | --- | --- | --- |
| Boilerplate or scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix plus regression test | 4 hours | 15 min | ~20x |
| Architecture or design | 2 days | 4 hours | ~5x |
| Research or exploration | 1 day | 3 hours | ~3x |
Notice that the highest-compression tasks are the most mechanical, and the lowest-compression tasks are the most judgment-intensive. The system bets that boilerplate, tests, and feature implementation are nearly free, while design and exploration still demand thinking time. The reviews and the office-hours skill spend the user’s attention there. The ship skill spends it almost nowhere.
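
A sketch of the dual-unit estimate as a data structure; the class and field names are illustrative, not gstack's schema, and the figures come from the table above.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """Both units on every estimate; names here are illustrative."""
    task: str
    human_team_hours: float
    cc_gstack_minutes: float

    @property
    def compression(self) -> float:
        return self.human_team_hours * 60 / self.cc_gstack_minutes

bugfix = Estimate("Bug fix plus regression test", 4, 15)
print(f"~{bugfix.compression:.0f}x")  # ~16x; the table rounds to ~20x
```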

The three layers of knowledge

Search-Before-Building, the framework that tier-3-or-higher skills run before recommending any pattern, ranks knowledge into three layers and explicitly prizes the third.

Layer 1. Tried and true

What everyone in this space already does. The conventional wisdom. Table stakes. Users expect it.

Layer 2. New and popular

What current discourse and recent search results endorse. What is trending. The emerging best practices.

Layer 3. First principles

Given what we learned about this specific product, is there a reason the conventional approach is wrong here? Prize this layer above the other two.
When a skill recognizes a Layer-3 insight (everyone does X because they assume Y, but the evidence in our conversation says Y is false here), it logs a eureka moment to ~/.gstack/analytics/eureka.jsonl. Future sessions surface relevant eurekas via {{LEARNINGS_SEARCH}}.
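
A sketch of the write side, using the documented path; the record fields are assumptions, not gstack's actual schema.

```python
import json
import time
from pathlib import Path

# The path is documented; everything else here is illustrative.
EUREKA_LOG = Path.home() / ".gstack" / "analytics" / "eureka.jsonl"

def log_eureka(assumption: str, evidence: str, insight: str) -> None:
    """Append one Layer-3 insight as a JSONL record."""
    EUREKA_LOG.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.time(),
        "conventional_assumption": assumption,  # everyone does X because they assume Y
        "local_evidence": evidence,             # why Y is false for this product
        "insight": insight,
    }
    with EUREKA_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
```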

The anti-sycophancy rules

/office-hours Startup mode bans specific phrases from the model’s output and lists explicit replacements. Each rule pairs a banned behavior with its required substitute:
  • The model is instructed to evaluate and take a side, not to acknowledge and explore. Hedging without commitment is treated as a failure mode.
  • Calibration without conviction is treated as evasion. The model must commit to a position AND name the evidence that would update it.
  • Softening recommendations to avoid friction is treated as serving the model’s comfort, not the user’s outcomes.
  • Validating before challenging is treated as performative empathy. The skill instructs: name the wrongness directly, then explain.
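
A sketch of that mapping, distilled from the rules above; the skill's actual list bans concrete phrases, which are not reproduced here.

```python
# Behavior-level pairs taken from the four rules above. The real skill
# lists specific banned phrases and replacements; these are paraphrases.
ANTI_SYCOPHANCY_REPLACEMENTS = {
    "acknowledge and explore": "evaluate and take a side",
    "hedge without commitment": "commit, then name the evidence that would update you",
    "soften to avoid friction": "serve the user's outcomes, not the model's comfort",
    "validate before challenging": "name the wrongness directly, then explain",
}
```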

Show, do not tell, for closing reflections

/office-hours closes with a “What I noticed about how you think” section. The skill instructs the model to use specific callbacks, not generic praise.

GOOD

“You didn’t say ‘small businesses,’ you said ‘Sarah, the ops manager at a 50-person logistics company.’ That specificity is rare.”

BAD

“You showed great specificity in identifying your target user.”

GOOD

“You pushed back when I challenged premise #2. Most people just agree.”

BAD

“You demonstrated conviction and independent thinking.”

The pattern generalizes to every gstack output. Quote the user’s words back at them. Name the specific behavior. Let them feel the receipt. Generic praise reads as model-flavored and erodes trust. Specific callback reads as observed and earns the next sentence’s attention.

STOP gates everywhere

Every plan-review section ends with the same instruction.
“STOP. AskUserQuestion once per issue. Do NOT batch. Recommend plus WHY.”
The enforcement mechanism is literal “STOP.” markers in the prompt plus the instruction that “an issue with an ‘obvious fix’ is still an issue and still needs explicit user approval before it lands in the plan.” The model is told that batching multiple issues into one question is exactly the failure mode the gate exists to prevent.
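
A sketch of the gate’s shape. AskUserQuestion is a real Claude Code tool; this Python stand-in and the issue field names are illustrative, not the actual prompt or tool contract.

```python
def ask_user_question(question: str, recommendation: str, why: str) -> None:
    """Illustrative stand-in for the AskUserQuestion tool: one issue, one decision."""
    print(f"ISSUE: {question}\nRECOMMEND: {recommendation}\nWHY: {why}\n")

def run_stop_gate(issues: list[dict]) -> None:
    # One question per issue. Do NOT batch, even when the fix looks obvious:
    # an issue with an "obvious fix" is still an issue.
    for issue in issues:
        ask_user_question(
            question=issue["summary"],
            recommendation=issue["fix"],
            why=issue["reasoning"],
        )
        # STOP: nothing lands in the plan without explicit user approval.
```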

What this whole thing is betting on

The bottleneck shifted

Human implementation time used to be the constraint that forced shortcuts. AI made that constraint 10x to 100x looser. Shipping the shortcut is now legacy thinking.

Attention is the new constraint

The constraint is now the user’s attention budget. Every AskUserQuestion costs attention. Every batched decision costs trust when the wrong half wins. Hence one question per issue, recommend plus why, never batch.

Cross-session compounding is the moat

Each skill writes a learning. Each future skill reads the learnings. A bug fixed in March surfaces as context for a similar bug in May. The system gets smarter on a specific codebase over time. No other Claude Code setup compounds like this.
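
A sketch of the read side, reusing the documented eureka.jsonl path; this naive keyword filter is an assumption standing in for whatever {{LEARNINGS_SEARCH}} actually does.

```python
import json
from pathlib import Path

EUREKA_LOG = Path.home() / ".gstack" / "analytics" / "eureka.jsonl"

def relevant_learnings(keywords: set[str], limit: int = 5) -> list[dict]:
    """Naive keyword match over past eurekas; illustrative only."""
    if not EUREKA_LOG.exists():
        return []
    matches = []
    for line in EUREKA_LOG.read_text().splitlines():
        record = json.loads(line)
        haystack = " ".join(str(v) for v in record.values()).lower()
        if any(k.lower() in haystack for k in keywords):
            matches.append(record)
    return matches[-limit:]  # surface the most recent relevant eurekas
```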

Specialist beats generalist for prompts

One file per role with deep methodology beats one giant agent that does everything. The model inhabits the role for the duration. The state directory lets roles hand off.

What it is not betting on

  • Not that AI will replace human judgment. Every irreversible decision still needs explicit user approval.
  • Not that the headline LOC number proves quality. The ON_THE_LOC_CONTROVERSY.md doc is candid about the limits and uses logical-SLOC (non-blank, non-comment) as the primary metric.
  • Not that every project should use every skill. The system is explicitly subset-able. Disable Eng review with one config key. Skip Design review if no UI scope. Run /qa-only if the user only wants a report.
  • Not that this is a security boundary. /freeze is accident prevention, not a sandbox. The user can override every safety hook.
Continue to “Reproducing it” for the install instructions and the recommended first session.