Most failures in multi-step agent systems trace back to mismanaged context, not model size. This repository collects concise, transferable "skills" that teach how to curate a limited attention budget: when to compress, mask, or offload context, and when to evolve skill definitions, so that agents keep relevant signals and avoid lost-in-the-middle and context-poisoning failures.
What Sets It Apart
- Progressive disclosure: skills load lightweight metadata at startup and full content only when activated. Fewer tokens are spent on idle details, task focus is more reliable, and the approach stays practical for long-running sessions.
- Platform-agnostic skill spec: content and examples are vendor-neutral (Claude Code, Cursor, general plugins), with pseudocode that can be adapted across environments. Teams can adopt the principles without reworking provider-specific tooling.
- Operational and evaluation-first: includes evaluation modules (LLM-as-judge patterns, pairwise comparison, rubrics) and examples for production telemetry. You get guidance for measuring regressions and fairness, not just for prototyping.
- Architecture and examples: coverage ranges from compression and memory systems to hosted/background agents and BDI-like mental-state mapping. This makes it useful both for architecture design and for mapping concrete engineering tasks to context strategies.
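To make the progressive-disclosure idea above concrete, here is a minimal Python sketch. It is not taken from the repository: the `Skill` and `SkillRegistry` classes, the `load_body` callback, and the example skill names are all hypothetical, chosen only to illustrate "metadata at startup, full content on activation".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Skill:
    """Hypothetical skill record: cheap metadata plus a deferred body."""
    name: str
    description: str                  # lightweight metadata, always visible
    load_body: Callable[[], str]      # called only when the body is first needed
    _body: Optional[str] = field(default=None, init=False, repr=False)

    def body(self) -> str:
        # Load the full content once, then reuse the cached copy.
        if self._body is None:
            self._body = self.load_body()
        return self._body


class SkillRegistry:
    """Hypothetical registry that assembles the context shown to the model."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self._active: Set[str] = set()

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def activate(self, name: str) -> None:
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        self._active.add(name)

    def build_context(self) -> str:
        # Every skill contributes one metadata line.
        lines: List[str] = ["Available skills:"]
        for skill in self._skills.values():
            lines.append(f"- {skill.name}: {skill.description}")
        # Only activated skills contribute their full body.
        for name in sorted(self._active):
            lines.append(f"\n## {name}\n{self._skills[name].body()}")
        return "\n".join(lines)


if __name__ == "__main__":
    registry = SkillRegistry()
    # Illustrative skill names and bodies, not the repository's actual files.
    registry.register(Skill("compression", "Summarize older turns.",
                            lambda: "Full compression instructions..."))
    registry.register(Skill("evaluation", "Judge outputs with rubrics.",
                            lambda: "Full evaluation instructions..."))
    print(registry.build_context())   # metadata only
    registry.activate("compression")
    print(registry.build_context())   # metadata plus one full body
```

In this sketch an idle skill costs one short line of context, and its body is read only after `activate` is called. A real implementation would likely load bodies from files and might also deactivate skills to reclaim budget; both are left out here.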
Who It's For and Trade-offs
Great fit if you design or maintain LLM-based agent systems, build multi-agent orchestration, or need repeatable evaluation frameworks. The repo is especially helpful for engineers who must squeeze value from limited context budgets and want concrete patterns rather than ad-hoc prompts.
Look elsewhere if you need a drop-in runtime, heavyweight SDK, or polished UI: the repository is instructional and example-driven (MIT-licensed) rather than a single integrated framework. It assumes familiarity with agent concepts and requires implementers to adapt pseudocode to their stack.
Where It Fits
Think of this as a skills-and-patterns playbook focused on context engineering, complementary to orchestration toolkits (LangChain-like frameworks) and model-serving platforms. Use it to inform system design, tooling decisions, and evaluation methodology rather than as a turnkey runtime.
