Every AI coding session starts from zero.
You explain your project structure. Again. You describe the coding patterns you’ve established. Again. You remind it about that database migration issue from last week. Again. And when it inevitably makes the same mistake you corrected three sessions ago, you wonder why you’re paying for an “intelligent” assistant.
The pattern is predictable. You start a session, explain everything, get productive work done, end the session. Tomorrow, the slate is wiped clean. Your context is gone. Your lessons evaporated.
This isn’t a limitation of the AI itself. It’s a limitation of how we use it. The AI doesn’t remember because we haven’t given it anything to remember from.
Where This Comes From
This approach draws on two key sources:
Ralph: The AI Agent Whisperer by Geoffrey Huntley explores how to effectively work with AI agents, including the importance of structured context and clear task definitions.
Effective Harnesses for Long-Running Agents from Anthropic’s engineering blog discusses building systems that help AI agents work effectively over extended periods, with insights on context management, verification, and error recovery.
The Agentic Context System is a practical implementation of these ideas, tailored for everyday AI-assisted development.
The Problem: Context Rot
To understand why external files work, you need to understand what happens inside an AI conversation.
Every AI model has a context window, a fixed amount of text it can process at once. For Claude, this is around 200,000 tokens. Sounds like a lot, but in a coding session with file contents, error messages, and back-and-forth discussion, you burn through it fast.
Here’s what happens in practice:
Early in a session: You explain the architecture. The AI “knows” that your app uses a specific database pattern, that certain files shouldn’t be touched, that tests need to pass before committing.
Mid-session: The context window fills. Older messages get compressed or dropped to make room for new ones. That architecture explanation from the start? It’s fuzzy now, or gone entirely.
Late in a session: The AI makes a decision that contradicts something you established earlier. It’s not being stubborn. It literally doesn’t have access to that information anymore.
This is context rot: the gradual degradation of shared understanding as conversations grow longer than the AI’s working memory.
External files don’t rot. The longer your project runs, the more valuable they become: lessons learned three months ago stay accessible, architecture decisions from the first week remain clear, and nothing degrades.
The Solution: Externalize Everything
The fix is straightforward: write it down.
Instead of keeping context in your head (or hoping the AI magically retains it), you store it in files. Plain text files that the AI reads at session start and updates at session end. No database. No API. No subscription. Just files.
This is the Agentic Context System: a structured set of files that give any AI coding assistant persistent memory across sessions.
| Metric | Without Context System | With Context System |
|---|---|---|
| Session start time | 10-15 min explaining context | 30 sec reading files |
| Coding consistency | Varies wildly session to session | Follows established patterns |
| Progress tracking | Mental notes, Slack messages | Documented with phase gates |
| Mistake repetition | Same errors every week | Lessons learned persist |
| Agent handoffs | Start from scratch each time | Structured context transfer |
The File System
The Agentic Context System organizes files into three categories: Context Files that the AI reads at session start, Task Files that track active work, and Templates that standardize workflows.
your-project/
├── CLAUDE.md # Project bible (tech specs, architecture)
├── CURRENT_TASK.md # Active task with phase tracking
├── Prompt.md # Session execution protocol
├── features/
│ ├── index.json # Lightweight status map + session plan
│ ├── backlog.json # Slim list (id, name, deps only)
│ └── active/ # Full definitions for active features
│ └── FEATURE_ID.json
└── .claude/
├── lessons-learned.md # Accumulated wisdom from sessions
├── commands/
│ └── work.md # Planning mode with implementation hints
├── templates/
│ └── HANDOFF.md # Context spawn template
└── static/
├── patterns.md # Stack-specific patterns
├── checklists.md # Phase-specific verification
└── rules.md # Critical operating rules
Each file has a specific purpose. The key improvement over simpler systems is separation of concerns: lightweight files for quick status checks, detailed files only when needed.
CLAUDE.md: The Project Bible
The CLAUDE.md file is the most important file in the system. It’s the technical specification for your project, the source of truth that the AI references for every decision.
Here’s how a CLAUDE.md might look for a todo API project. The file starts with a project overview:
# CLAUDE.md - Todo API Technical Reference
## PROJECT OVERVIEW
A REST API for managing tasks.
- **Project Name:** todo-api
- **Tech Stack:** Bun, Hono, SQLite (via Bun's built-in sqlite)
### Core Principles
1. **Type Safety**: TypeScript everywhere, no `any` types
2. **Test First**: Write tests before implementation when possible
3. **Simple APIs**: RESTful design, clear error messages
Next, the architecture section documents the project structure:
## ARCHITECTURE
### Directory Structure
todo-api/
├── src/
│ ├── index.ts # Entry point, Hono app setup
│ ├── routes/
│ │ └── todos.ts # Todo CRUD routes
│ ├── db/
│ │ └── index.ts # Database connection
│ └── types/
│ └── todo.ts # TypeScript interfaces
├── tests/
│ ├── routes/
│ │ └── todos.test.ts
│ └── setup.ts # Test utilities
├── package.json
└── tsconfig.json
Finally, coding standards and common commands:
## CODING STANDARDS
- Use TypeScript with strict mode
- Format with Prettier
- Lint with ESLint
- Test with Bun's built-in test runner
## COMMANDS
# Development
bun install # Install dependencies
bun run dev # Start dev server
bun test # Run tests
bun run lint # Lint code
What to include in CLAUDE.md:
- Project overview and tech stack
- Directory structure with explanations
- Coding standards and conventions
- Common commands for development
The file should answer one question: “What do I need to know to work on this project?”
Prompt.md: The Execution Protocol
The Prompt.md file is new in this improved system. It defines how the AI should work, not just what it should know.
# Session Protocol
## Startup
1. CURRENT_TASK.md → Active feature
2. features/index.json → Check status_map for dependency passes
3. features/active/FEATURE_ID.json → Get full feature definition
4. Read by task type (patterns.md for backend, design.md for frontend)
## Phase Gate Rule
After completing Phase N:
1. Run: `bun test && bun run lint`
2. If pass: Commit `[FEATURE_ID] Phase N: Description`
3. If fail: Fix before Phase N+1 (counts toward 3-attempt limit)
4. Update CURRENT_TASK.md:
- Set Phase N status to `complete` with commit hash
- Increment Current Phase to N+1
- Reset Attempts to 0/3
## Attempt Tracking
On any test/lint failure:
1. Increment attempt counter (e.g., 1/3 → 2/3)
2. Log error in Attempt Log: `[YYYY-MM-DD HH:MM] - Error description`
3. At 3/3: Set status to BLOCKED, document all attempts, STOP work
## End
git status # Clean
bun test # Pass
bun run lint # Pass
Phase Gates & Verification
The biggest improvement in this system is phase gates: mandatory checkpoints that prevent the AI from moving forward until the current phase is verified.
┌─────────────────────────────────────────────────────────────────┐
│ PHASE GATE FLOW │
│ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Phase 1 │───▶│ VERIFY │───▶│Phase 2 │───▶│ VERIFY │───▶ ... │
│ │ Build │ │& Commit│ │ Build │ │& Commit│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │ │ │
│ ▼ ▼ │
│ FAIL? Fix FAIL? Fix │
│ (counts as (counts as │
│ attempt) attempt) │
│ │
└─────────────────────────────────────────────────────────────────┘
Each feature is broken into execution phases:
- Schema & Types - Data structures, interfaces
- Validation - Input validation with tests
- Queries - Database/API operations with tests
- UI Components - User interface pieces
- Routes - Page/endpoint wiring
- Integration - End-to-end testing
After each phase:
- Run tests and linting
- If pass: Commit with descriptive message
- If fail: Fix the issue (counts as an attempt)
- Update CURRENT_TASK.md with phase status
This creates accountability. The AI can’t claim “phase 2 done” without a passing commit. You can trace exactly when each phase completed and what changes were included.
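To make the gate concrete, here is a minimal sketch of the check the protocol describes, written as a Bun script. The commit message is a hypothetical example; the template itself leaves the mechanics to the agent rather than to a script.

```typescript
// Sketch of a phase gate check, assuming Bun is installed and the repo has
// "test" and "lint" scripts. The commit message below is a hypothetical example.
function passes(cmd: string[]): boolean {
  return Bun.spawnSync(cmd, { stdout: "inherit", stderr: "inherit" }).exitCode === 0;
}

const message = "[CORE-001] Phase 2: Validation";

if (passes(["bun", "test"]) && passes(["bun", "run", "lint"])) {
  // Gate passed: commit before moving to the next phase.
  passes(["git", "add", "-A"]);
  passes(["git", "commit", "-m", message]);
} else {
  // Gate failed: fix, log the attempt in CURRENT_TASK.md, and count it toward the 3-attempt limit.
  console.error("Phase gate failed - log an attempt before retrying");
}
```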
Attempt Tracking & Blocking
AI agents can get stuck in loops, trying the same failing approach repeatedly. The attempt tracking system prevents this.
## Current Phase
**Phase:** 2 - Validation
**Attempts:** 2/3
### Attempt Log
[2024-01-15 10:30] - bun test failed: validateTitle expect() assertion error
[2024-01-15 10:45] - bun test failed: validation function not exported
The rules:
- Each test/lint failure increments the attempt counter
- Log what went wrong and what was tried
- At 3 attempts: STOP. Mark as BLOCKED. Wait for human.
┌─────────────────────────────────────────────────────────────────┐
│ ATTEMPT TRACKING │
│ │
│ Attempt 1 ──▶ Fail ──▶ Log + Fix ──▶ Attempt 2 ──▶ Fail │
│ │ │
│ ▼ │
│ Log + Fix ──▶ Attempt 3 │
│ │ │
│ ▼ │
│ Fail ──▶ BLOCKED │
│ │ │
│ ▼ │
│ Wait for Human │
│ │
└─────────────────────────────────────────────────────────────────┘
When a task is BLOCKED:
## Status
**BLOCKED** - Exceeded 3 attempts on Phase 2
### Attempt Log
[2024-01-15 10:30] - bun test failed: validation function not found
[2024-01-15 10:45] - Added export, now type error in todo.ts
[2024-01-15 11:00] - Fixed types, now circular import error
### What Was Tried
1. Added missing export to validators/index.ts
2. Fixed CreateTodoInput type definition
3. Attempted to break circular dependency by moving types to separate file
### What's Needed
Human review of the circular dependency between validators/todo.ts
and types/todo.ts. May need architectural decision about where
validation types should live.
The AI stops making things worse. You come back to a documented situation, not a mess.
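The 3-attempt rule is simple enough to express directly. Here is a minimal sketch with a hypothetical helper; in practice the agent applies this logic by editing the Attempts counter and Attempt Log in CURRENT_TASK.md rather than running code.

```typescript
// Hypothetical helper illustrating the 3-attempt rule.
interface AttemptState {
  attempts: number;      // failures so far in the current phase
  maxAttempts: number;   // 3 in this system
  log: string[];         // "[timestamp] - description" entries
}

function recordFailure(state: AttemptState, description: string) {
  const attempts = state.attempts + 1;
  const log = [...state.log, `[${new Date().toISOString()}] - ${description}`];
  const blocked = attempts >= state.maxAttempts; // at 3/3: stop and wait for a human
  return { ...state, attempts, log, blocked };
}

// Example: the second failure on a phase.
const next = recordFailure(
  { attempts: 1, maxAttempts: 3, log: [] },
  "bun test failed: validation function not exported"
);
console.log(next.blocked ? "BLOCKED - document attempts and stop" : `Attempts: ${next.attempts}/3`);
```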
Feature Registry: The features/ Directory
Instead of a monolithic feature_list.json, the improved system uses a directory structure:
features/
├── index.json # Lightweight: status + session plan
├── backlog.json # Slim: id, name, dependencies only
└── active/ # Full definitions for current work
└── CORE-001.json
index.json (lightweight, always read):
{
"meta": {
"project": "todo-api",
"total_features": 12,
"completed_count": 3,
"next_priority": "CORE-002",
"session_plan": {
"1": {
"name": "Foundation",
"features": ["SETUP-001", "SETUP-002"],
"token_budget": 50000
},
"2": {
"name": "Core API",
"features": ["CORE-001", "CORE-002", "CORE-003"],
"token_budget": 80000
}
}
},
"status_map": {
"SETUP-001": { "passes": true, "session": 1 },
"SETUP-002": { "passes": true, "session": 1 },
"CORE-001": { "passes": false, "session": 2 }
}
}
backlog.json (slim reference):
{
"features": [
{
"id": "CORE-001",
"name": "Database Layer",
"session": 2,
"dependencies": ["SETUP-001"]
},
{
"id": "CORE-002",
"name": "Todo CRUD Routes",
"session": 2,
"dependencies": ["CORE-001"]
}
]
}
active/CORE-001.json (full definition):
{
"id": "CORE-001",
"name": "Database Layer",
"description": "SQLite database for storing todos using Bun's built-in sqlite",
"session": 2,
"token_budget": 25000,
"dependencies": ["SETUP-001"],
"acceptance_criteria": [
"Can create a new database file",
"Can add a todo item",
"Can list all todo items",
"Can mark a todo as complete",
"Can delete a todo",
"All operations have unit tests"
],
"execution_phases": [
{
"phase": 1,
"name": "Schema & Types",
"tasks": ["Todo interface", "Database schema"]
},
{
"phase": 2,
"name": "Validation",
"tasks": ["Title validation", "Input sanitization"]
},
{
"phase": 3,
"name": "Queries",
"tasks": ["CRUD operations", "Query tests"]
}
],
"quality_gates": [
"All CRUD operations work",
"Database handles errors gracefully",
"Error responses follow ApiError format"
]
}
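Here is a sketch of how this separation pays off in practice: a small helper (hypothetical, not part of the template) reads only the lightweight files to pick the next feature, and loads the full definition only once it knows which one it needs.

```typescript
// Sketch of next-feature selection, assuming the index.json / backlog.json /
// active/ layout shown above. The helper itself is hypothetical.
import { readFileSync } from "node:fs";

interface BacklogItem { id: string; name: string; session: number; dependencies: string[] }
type StatusMap = Record<string, { passes: boolean; session: number }>;

const index = JSON.parse(readFileSync("features/index.json", "utf8"));
const backlog = JSON.parse(readFileSync("features/backlog.json", "utf8"));
const status: StatusMap = index.status_map;

// First feature that hasn't passed yet and whose dependencies all pass.
const next = (backlog.features as BacklogItem[]).find(
  (f) => !status[f.id]?.passes && f.dependencies.every((dep) => status[dep]?.passes)
);

if (next) {
  // Only now load the heavy file - full definitions stay out of context until needed.
  const full = JSON.parse(readFileSync(`features/active/${next.id}.json`, "utf8"));
  console.log(`Next: ${full.id} - ${full.name} (session ${full.session})`);
}
```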
Session Planning
Features are grouped into sessions — coherent chunks of work that fit within a token budget and represent a logical milestone.
"session_plan": {
"1": {
"name": "Foundation",
"features": ["SETUP-001", "SETUP-002"],
"token_budget": 50000
},
"2": {
"name": "Core Database",
"features": ["CORE-001", "CORE-002"],
"token_budget": 80000
},
"3": {
"name": "API Routes",
"features": ["ROUTE-001", "ROUTE-002", "ROUTE-003"],
"token_budget": 100000
}
}
Before each session:
- Check which session you’re on
- Verify all features from previous sessions pass
- Load the full definitions for this session’s features
- Check that dependencies are met
Session completion:
- All features in the session have `passes: true`
- All tests pass
- Clean git status
- CURRENT_TASK.md updated to next session’s first feature
This creates natural checkpoints and helps estimate project progress.
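The session-completion check is mechanical, so it can be verified directly against index.json. A small sketch, using a hypothetical helper and the same file layout as above:

```typescript
// Sketch of the "is this session done?" check against features/index.json.
import { readFileSync } from "node:fs";

const index = JSON.parse(readFileSync("features/index.json", "utf8"));
const session = "2"; // the session being verified

const ids: string[] = index.meta.session_plan[session].features;
const open = ids.filter((id) => index.status_map[id]?.passes !== true);

console.log(
  open.length === 0
    ? `Session ${session} complete - point CURRENT_TASK.md at the next session's first feature`
    : `Session ${session} still open: ${open.join(", ")}`
);
```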
CURRENT_TASK.md: Real-Time Progress
The CURRENT_TASK.md file tracks the active task with phase-level detail.
# CORE-001: Database Layer
**Status:** IN_PROGRESS | **Session:** 2
## Phase Progress
| Phase | Name | Status | Commit |
|-------|-----------------|----------|---------|
| 1 | Schema & Types | complete | a1b2c3d |
| 2 | Validation | complete | e4f5g6h |
| 3 | Queries | in_progress | - |
| 4 | UI Components | pending | - |
| 5 | Routes | pending | - |
| 6 | Integration | pending | - |
## Current Phase
**Phase:** 3 - Queries
**Attempts:** 1/3
### Attempt Log
[2024-01-15 14:30] - bun test failed: listTodos assertion error
## Acceptance Criteria
- [x] Can create a new database file
- [x] Can add a todo item
- [ ] Can list all todo items
- [ ] Can mark a todo as complete
- [ ] Can delete a todo
- [ ] All operations have unit tests
## Next
- Fix listTodos() to return correct order
- Implement markComplete() and deleteTodo()
- Complete test coverage
The phase table makes it immediately clear:
- What’s been committed
- What’s in progress
- Where the AI is stuck (if blocked)
Implementation Hints
When planning a feature, the improved system includes implementation hints — concrete file paths, pattern references, and code snippets.
## Implementation Hints
### Files to Create/Modify
- `src/db/index.ts` (create)
- `src/types/todo.ts` (modify: add Todo interface)
- `tests/db/todos.test.ts` (create)
### Pattern References
- Database setup: `src/db/connection.ts` lines 1-20
- Interface with validation: `src/types/user.ts` lines 1-25
### Test Cases (write first)
1. Empty database returns empty list
2. Add todo returns generated ID
3. List todos returns in creation order
4. Complete todo updates completedAt field
5. Delete non-existent todo returns null
### Code Snippets
```typescript
interface Todo {
  id: number;
  title: string;
  description: string | null;
  completedAt: string | null;
  createdAt: string;
}
```
These hints serve multiple purposes:
- New agents know exactly where to look
- Patterns are explicitly referenced, not guessed
- Test cases are defined before implementation
- Key interfaces are sketched out
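For example, the first two test cases listed above might be written before any implementation exists. This is a sketch using bun:test and the in-memory database pattern from lessons-learned.md; the table schema is an assumption, and real tests would call the functions in src/db/todos.ts rather than raw SQL.

```typescript
// Sketch of writing the listed test cases first, using an in-memory database.
import { describe, it, expect, beforeEach } from "bun:test";
import { Database } from "bun:sqlite";

let db: Database;

beforeEach(() => {
  db = new Database(":memory:"); // fast, isolated, synchronous
  db.run("CREATE TABLE todos (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL)");
});

describe("todos", () => {
  it("returns an empty list for an empty database", () => {
    expect(db.query("SELECT * FROM todos").all()).toEqual([]);
  });

  it("returns a generated id when adding a todo", () => {
    const row = db
      .query("INSERT INTO todos (title) VALUES (?) RETURNING id")
      .get("buy milk") as { id: number };
    expect(row.id).toBe(1);
  });
});
```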
Context Handoff Protocol
When spawning a new agent (a new conversation, a parallel worker, or a restart after a session timeout), context can be lost. The handoff protocol preserves it.
Create `.claude/HANDOFF.md` before spawning:
```markdown
# Handoff: CORE-001

## Quick Start
1. Read this file
2. Read `/CURRENT_TASK.md`
3. Run `bun test` to verify clean state

## Current State
- **Phase:** 3 - Queries
- **Status:** in_progress
- **Last commit:** e4f5g6h

## Key Files
### Pattern to follow
- `src/db/connection.ts` lines 1-20 (database setup pattern)
### Types modified
- `src/types/todo.ts`: Todo interface
### Files created this session
- `src/db/todos.ts`
- `tests/db/todos.test.ts`

## Test Command
bun test tests/db/todos.test.ts

## Context Notes
Using Bun's built-in SQLite (synchronous API, not async). Decided to use
integer IDs, not UUIDs, for simplicity.

## Remaining Work
- Fix listTodos() ordering
- Implement markComplete()
- Implement deleteTodo()
- Add remaining tests
```
The new agent:
1. Reads HANDOFF.md first
2. Gets immediate context about where things stand
3. Knows exactly which files to check
4. Understands decisions already made
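Nothing about the handoff requires tooling: the file is plain markdown that the outgoing agent writes by hand. If you do want to script it, a sketch might look like this (the helper and its state shape are hypothetical):

```typescript
// Hypothetical helper that writes .claude/HANDOFF.md from the current state.
import { writeFileSync } from "node:fs";

interface Handoff {
  feature: string;
  phase: string;
  status: string;
  lastCommit: string;
  keyFiles: string[];
  notes: string;
  remaining: string[];
}

function writeHandoff(h: Handoff): void {
  const body = [
    `# Handoff: ${h.feature}`,
    "",
    "## Current State",
    `- **Phase:** ${h.phase}`,
    `- **Status:** ${h.status}`,
    `- **Last commit:** ${h.lastCommit}`,
    "",
    "## Key Files",
    ...h.keyFiles.map((f) => `- ${f}`),
    "",
    "## Context Notes",
    h.notes,
    "",
    "## Remaining Work",
    ...h.remaining.map((r) => `- ${r}`),
    "",
  ].join("\n");
  writeFileSync(".claude/HANDOFF.md", body);
}
```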
The Workflow
The improved Agentic Context System follows a phase-gated loop:
┌─────────────────────────────────────────────────────────────────┐
│ THE AGENTIC LOOP │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ READ │───▶│ WORK │───▶│ VERIFY │───▶│ COMMIT │ │
│ │ Context │ │ Phase N │ │ & Test │ │ Phase N │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ ▲ │ │ │
│ │ │ ▼ │
│ │ FAIL?│ ┌─────────┐ │
│ │ │ │ UPDATE │ │
│ │ ▼ │ TASK.md │ │
│ │ [Attempt++] └─────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ 3 attempts? Phase N+1? │
│ │ │ │ │
│ │ ▼ │ │
│ │ BLOCKED │ │
│ │ │ │
│ └─────────────────────────────────────────────┘ │
│ (next phase) │
└─────────────────────────────────────────────────────────────────┘
Session Start:
- AI reads CLAUDE.md, Prompt.md, CURRENT_TASK.md
- AI checks features/index.json for session context
- AI loads full feature definition from features/active/
- AI continues from the current phase
During Session (for each phase):
- Build the phase deliverables
- Run tests and linting
- If pass: Commit with `[FEATURE_ID] Phase N: Description`
- If fail: Fix (increment attempt counter), retry
- At 3 failures: BLOCKED, stop
- Update CURRENT_TASK.md phase table
- Move to next phase
Session End:
- All phases complete → Update status_map to `passes: true`
- Clean git status
- If spawning new agent → Create HANDOFF.md
The Supporting Files
Three additional files round out the system:
rules.md
Critical rules that must never be violated: one task at a time, commit after each phase, never break tests, 3-attempt blocking rule. These are the non-negotiables.
checklists.md
Phase-specific verification procedures. Schema checklist. Validation checklist. Integration checklist. The AI runs these before marking phases complete.
lessons-learned.md
Accumulated wisdom from past sessions. That SQLite connection issue. That import that always breaks. Patterns that work. Mistakes to avoid.
rules.md excerpt:
## CRITICAL RULES - NEVER VIOLATE
1. **PHASE GATE RULE**
- Never move to Phase N+1 until Phase N passes tests
- Every phase completion requires a commit
- Failed tests count as attempts toward the 3-attempt limit
2. **ONE TASK AT A TIME**
- Never work on more than one feature per session
- Complete the current task before starting another
- If blocked, mark as BLOCKED rather than switching tasks
3. **ATTEMPT TRACKING**
- Log every failure with timestamp and description
- At 3 attempts: STOP. Mark BLOCKED. Wait for human.
- Never silently retry without logging
lessons-learned.md example:
## Database
1. BUN SQLITE IS SYNCHRONOUS
- Bad: `await db.query(...).all()` // won't work
- Good: `db.query(...).all()` // sync API
- Discovered in CORE-001 when async/await caused errors
2. USE INTEGER IDS, NOT UUIDS
- SQLite auto-increment is simpler and faster
- UUIDs add complexity without benefit for this project
- Decided in CORE-001 planning phase
3. USE IN-MEMORY DB FOR TESTS
- Good: `new Database(':memory:')`
- Keeps tests fast and isolated
The lessons-learned file grows over time. Each mistake becomes a permanent memory. Because the AI reads the file at session start, it is far less likely to make the same error twice.
A Real Session Walkthrough
Here’s how a session typically flows. We’re building a todo API and working on the database layer.
Starting state: CURRENT_TASK.md shows Phase 1 complete, Phase 2 in progress at attempt 1/3.
You say: “Read my project files and continue the current task.”
What the AI does:
1. Reads context: CLAUDE.md (Bun/Hono/SQLite stack), Prompt.md (phase gates, attempt rules), CURRENT_TASK.md (Phase 2, 1/3 attempts), lessons-learned.md (Bun SQLite is sync)
2. Checks attempt log: Previous attempt failed on type export. Fixes the export.
3. Completes Phase 2: Validation tests pass. Commits `[CORE-001] Phase 2: Validation`.
4. Updates CURRENT_TASK.md: Phase 2 → complete with commit hash. Phase 3 → in_progress. Attempts → 0/3.
5. Starts Phase 3: Implements CRUD queries. Tests pass. Commits `[CORE-001] Phase 3: Queries`.
6. Continues through phases until the feature is complete or the session ends.
Ending state: All phases complete. Feature passes. status_map updated. Ready for next feature.
The key observation: the AI used the synchronous SQLite pattern from lessons-learned.md without being told. It checked the attempt log and fixed the specific issue. It committed after each phase. That’s the point of the system: structured, verifiable progress.
Getting Started
Setting up the Agentic Context System takes about 5 minutes:
# Clone the template
git clone https://github.com/balevdev/agentic-context-system.git my-project
cd my-project
# Initialize git (if starting fresh)
git init
git add .
git commit -m "chore: initialize agentic context system"
# Customize for your project
# 1. Edit CLAUDE.md with your project specs
# 2. Edit features/index.json with your session plan
# 3. Create features/active/FEATURE_ID.json for your first feature
# 4. Edit .claude/static/checklists.md with your test commands
# 5. Start your first session
Customization Guide
The template includes placeholder commands. Replace them with your stack-specific tools:
| Stack | Test Command | Type Check | Lint |
|---|---|---|---|
| Bun | bun test | bun run typecheck | bun run lint |
| Node.js/TypeScript | npm test | tsc --noEmit | |
| Python | pytest | mypy . | |
| Go | go test ./... | go vet ./... | |
Update .claude/static/checklists.md with your specific commands. The AI will use these during verification.
Project Type Customization
REST API: Add endpoint patterns, error response format, authentication approach, database schema.
Web App: Add component patterns, state management, routing conventions.
CLI Tool: Add command structure, argument parsing patterns, help text conventions.
Library: Add public API documentation, versioning strategy, backward compatibility rules.
Why This Works
The improved system works because it addresses the specific failure modes of AI coding: context rot, unverified progress, repeated failed attempts, forgotten lessons, and lossy handoffs between agents.
Conclusion
AI coding assistants start every session from zero because there’s nothing to remember from. The Agentic Context System provides that something: a structured set of files that externalize your project’s architecture, coding standards, progress, and accumulated lessons.
The improved system adds accountability through phase gates, reliability through attempt tracking, clarity through implementation hints, and continuity through handoff protocols.
The AI reads these files at session start and updates them at session end. Context persists. Progress accumulates. Mistakes become documented lessons. And you can trace exactly what happened, when, and why.
If you work with AI coding assistants on projects spanning multiple sessions, set up the Agentic Context System. Clone the template, customize the files for your project, and start your next session with context instead of explanations.