Building an internal agent: Context window compaction @ Irrational Exuberance
Hi folks,
This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email.
Posts from this week:
- Building an internal agent: Context window compaction
- Building an internal agent: Progressive disclosure and handling large files
- Building an internal agent: Adding support for Agent Skills
Building an internal agent: Context window compaction
Although my model of choice for most internal workflows remains GPT-4.1 for its predictable speed and high adherence to instructions, even its 1,047,576-token context window can run out of space. When that happens, your agent either needs to give up, or it needs to compact that large context window into a smaller one. Here are our notes on implementing compaction.
This is part of the Building an internal agent series.
Why compaction matters
Long-running workflows with many tool calls or user messages, along with any workflow dealing with large files, often run out of space in their context window. Although context window exhaustion isn't a concern in most internal agent use-cases, it's ultimately not possible to implement a robust, reliable agent without solving this problem, and compaction is a straightforward solution.
How we implemented it
Initially, in the beautiful moment where we assumed compaction wouldn’t be a relevant concern to our internal workflows, we implemented an extremely naive solution to compaction: if we ever ran out of tokens, we discarded older tool responses until we had more space, then continued. Because we rarely ran into compaction, the fact that this worked poorly wasn’t a major issue, but eventually the inelegance began to weigh on me as we started dealing with more workflows with large files.
When brainstorming our second iteration of compaction, I got anchored on the beautiful idea that compaction should be sequenced after implementing support for sub-agents, but I was never able to ground that intuition in a concrete reason why it was necessary, and we implemented compaction without sub-agent support.
The gist of our approach to compaction is:
- After every user message (including tool responses), add a system message with the consumed and available tokens in the context window. In that system message, we also include the updated list of available `files` that can be read from
- User messages and tool responses greater than 10,000 tokens are exposed as a new "virtual file", with only their first 1,000 tokens included in the context window. The agent must use file manipulation tools to read more than those first 1,000 tokens (both 1k and 10k are configurable values)
- Add a set of "base tools" that are always available to agents, specifically including the virtual file manipulation tools, as we'd finally reached a point where most agents simply could not operate without a large number of mostly invisible internal tools. These tools were `file_read`, which can read entire files, line ranges within a file, or byte ranges within a file, and `file_regex`, which is similar but performs a regex scan against a file up to a certain number of matches. Every use of a file is recorded in the `files` data, so the agent knows what has and hasn't been read into the context window (particularly relevant for preloaded files), along the lines of:

  ```
  <files>
    <file id='a' name='image.png' size='32kb'>
      <file_read />
    </file>
    <file id='a' name='image.png' size='32kb'>
      <file_read start_line=10 end_line=20 />
    </file>
  </files>
  ```

  This was surprisingly annoying to implement cleanly, mostly because I came onto this idea after iteratively building the agent as a part-time project for several months. If I could start over, I would start with files as a core internal construct, rather than adding them on later.
- If a message pushes us over 80% (configurable value) of the model's available context window, use the compaction prompt that Reddit claims Claude Code uses. The prompt isn't particularly special, it just already exists and seems pretty good
- After compacting, add the prior context window as a virtual file to allow the agent to retrieve pieces of context that it might have lost (a minimal sketch of the whole flow follows this list)
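To make that flow concrete, here is a minimal sketch of the two thresholds in Python. Everything in it (the whitespace-based token counting, the `Agent` shape, the placeholder summary) is a hypothetical stand-in, not our actual implementation:

```python
from dataclasses import dataclass, field

VIRTUAL_FILE_THRESHOLD = 10_000  # tokens before a message becomes a virtual file
PREVIEW_TOKENS = 1_000           # tokens of a virtual file kept inline
COMPACTION_TRIGGER = 0.8         # fraction of the window that triggers compaction

def count_tokens(text: str) -> int:
    # Crude stand-in; a real implementation would use the model's tokenizer.
    return len(text.split())

@dataclass
class Agent:
    context_window: int
    history: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)  # virtual file contents

    def ingest(self, message: str) -> None:
        if count_tokens(message) > VIRTUAL_FILE_THRESHOLD:
            # Oversized messages become virtual files: only a preview stays
            # inline, and the file tools can read the rest on demand.
            self.files.append(message)
            message = " ".join(message.split()[:PREVIEW_TOKENS]) + " [see virtual file]"
        self.history.append(message)

        used = sum(count_tokens(m) for m in self.history)
        # Status message with consumed/available tokens and readable files.
        self.history.append(
            f"[system] {used}/{self.context_window} tokens used, "
            f"{len(self.files)} files available"
        )

        if used > COMPACTION_TRIGGER * self.context_window:
            # Preserve the old context window as a virtual file, then replace
            # the history with a summary (placeholder for the output of the
            # compaction prompt).
            self.files.append("\n".join(self.history))
            self.history = ["[summary of compacted conversation]"]
```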
Each of these steps is quite simple, but in combination they really do provide a fair amount of power for handling complex, prolonged workflows. Admittedly, we still have a configurable cap on the number of tools that can be called in a workflow (to avoid agents spinning out), but this means that agents dealing with large or complex data are much more likely to succeed usefully.
How is it working? / What’s next?
Whereas for most of our new internal agent features, there are obvious problems or iterations, this one feels like it’s good enough to forget for a long, long time. There are two reasons for this: first, most of our workflows don’t require large context windows, and, second, honestly this seems to work quite well.
If context windows get significantly larger in the future, which I don't see much evidence of at the moment, then we will simply increase some of the default values to use more tokens, but the core algorithm here seems good enough.
Building an internal agent: Progressive disclosure and handling large files
One of the most useful initial extensions I made to our workflows was injecting associated images into the context window automatically, to improve the quality of responses to tickets and messages that relied heavily on screenshots. This was quick and made the workflows significantly more powerful.
More recently, a number of workflows have attempted to operate on large, complex files like PDFs or DOCXs, and the naive approach of shoving them into the context window hasn't worked particularly well. This post explains how we've adapted the principle of progressive disclosure to allow our internal agents to work with large files.
This is part of the Building an internal agent series.
Large files and progressive disclosure
Progressive disclosure is the practice of limiting what is added to the context window to the minimum necessary amount, and adding more detail over time as necessary.
A good example of progressive disclosure is how agent skills are implemented:
- Initially, you only add the description of each available skill into the context window
- You then load the `SKILL.md` on demand
- The `SKILL.md` can specify other files to be further loaded as helpful
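As a rough Python illustration of that three-step flow (the `Skill` shape and helper names here are hypothetical, not from any particular framework):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Skill:
    name: str
    description: str  # the only part injected into the prompt up front
    path: Path        # directory containing SKILL.md and any extra files

def skills_summary(skills: list[Skill]) -> str:
    # Step 1: only name and description enter the context window.
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)

def load_skill(skill: Skill) -> str:
    # Step 2: the agent loads the full SKILL.md on demand.
    return (skill.path / "SKILL.md").read_text()

def load_extra(skill: Skill, relative: str) -> str:
    # Step 3: SKILL.md can reference further files to load as helpful.
    return (skill.path / relative).read_text()
```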
In our internal use-case, we have skills for JIRA formatting, Slack formatting, and Notion formatting. Some workflows require all three, but the vast majority of workflows require at most one of these skills, and it’s straightforward for the agent to determine which are relevant to a given task.
File management is a particularly interesting progressive disclosure problem, because files are so helpful in many scenarios, but are also so very large. For example, requests for help in Slack are often along the lines of "I need help with this login issue" plus an attached screenshot, which is impossible to solve without including that image in the context window. In other workflows, you might want to analyze a daily data export that is 5-10MB as a PDF, but only 10-20kb of tables and text once extracted. This gets even messier when the goal is to compare across multiple PDFs, each of which is quite large.
Our approach
Our high-level approach to the large-file problem is as follows:
- Always include metadata about available files in the prompt, similar to the list of available skills. This will look something like:

  ```
  Files:
  - id: f_a1
    name: my_image.png
    size: 500,000
    preloaded: false
  - id: f_b3
    name: ...
  ```

  The key thing is that each `id` is a reference that the agent is able to pass to tools. This allows it to operate on files without loading their contents into the context window.
- Automatically preload the first N kb of files into the context window, as long as they are appropriate mimetypes for loading (png, pdf, etc). This is per-workflow configurable, and could be set as low as `0` if a given workflow didn't want to preload any files. I'm still of mixed minds about whether preloading is worth doing, as it takes some control away from the agent.
- Provide three tools for operating on files (sketched after this list):
  - `load_file(id)` loads an entire file into the context window
  - `peek_file(id, start, stop)` loads a section of a file into the context window
  - `extract_file(id)` transforms PDFs, PPTs, DOCX, and so on into simplified textual versions
- Provide a `large_files` skill which explains how and when to use the above tools to work with large files. Generally, it encourages using `extract_file` on any PDF, DOCX, or PPT file the agent wants to work with, and otherwise loading or peeking depending on the available space in the context window
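Here is a rough sketch of what that registry and those three tools might look like in Python; the `File` shape and the placeholder `extract_text` converter are illustrative assumptions rather than our real implementation:

```python
from dataclasses import dataclass

@dataclass
class File:
    id: str            # reference the agent passes to tools, e.g. "f_a1"
    name: str
    data: bytes
    preloaded: bool = False

REGISTRY: dict[str, File] = {}

def load_file(id: str) -> bytes:
    """Load an entire file into the context window."""
    return REGISTRY[id].data

def peek_file(id: str, start: int, stop: int) -> bytes:
    """Load only a byte range of a file into the context window."""
    return REGISTRY[id].data[start:stop]

def extract_file(id: str) -> str:
    """Convert PDFs, PPTs, DOCX, and so on into simplified text."""
    # Stand-in: a real implementation would dispatch to a per-format parser.
    return extract_text(REGISTRY[id].data)

def extract_text(data: bytes) -> str:
    # Placeholder converter; swap in pypdf, python-docx, etc. as appropriate.
    return data.decode("utf-8", errors="replace")

def files_metadata() -> str:
    """Render the metadata block injected into every prompt."""
    return "Files:\n" + "\n".join(
        f"- id: {f.id}\n  name: {f.name}\n  size: {len(f.data):,}\n"
        f"  preloaded: {str(f.preloaded).lower()}"
        for f in REGISTRY.values()
    )
```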
This approach was quick to implement, and provides significantly more control to the agent to navigate a wide variety of scenarios involving large files. It’s also a good example of how the “glue layer” between LLMs and tools is actually a complex, sophisticated application layer rather than merely glue.
How is this working?
This has worked well. In particular, one of our internal workflows is oriented around giving feedback on documents attached to a ticket, in comparison to other similar, existing documents. That workflow simply did not work at all prior to this approach, and now works fairly well without workflow-specific support for handling these sorts of large files, because the `large_files` skill handles that in a reusable fashion without workflow authors being aware of it.
What's next?
Generally, this feels like a stand-alone set of functionality that doesn’t require significant future investment, but there are three places where we will need to continue building:
- Until we add sub-agent support, our capabilities are constrained. In many cases, the ideal scenario of dealing with a large file is opening it in a sub-agent with a large context window, asking that sub-agent to summarize its contents, and then taking that summary into the primary agent’s context window.
- It seems likely that `extract_file` should be modified to return a referencable, virtual `file_id` that is used with `peek_file` and `load_file` rather than returning contents directly. This would make for a more robust tool even when extracting from very large files. In practice, extracted content has always been quite compact.
- Finally, operating within an AWS Lambda requires pure Python packages, and pure Python is not very fast at parsing complex XML-derived document formats like DOCX. We could solve this by adding a layer to our Lambda with the `lxml` dependencies in it, and at some point we might.
Altogether, a very helpful extension for our internal workflows.
Building an internal agent: Adding support for Agent Skills
When Anthropic introduced Agent Skills, I was initially a bit skeptical of the problem they solved (can't we just use prompts and tools?), but I've subsequently come to appreciate them, and have explicitly implemented skills in our internal agent framework. This post talks about the problems skills solve, how the engineering team at Imprint implemented them, how well they've worked for us, and where we might take them next.
This is part of the Building an internal agent series.
What problem do Agent Skills solve?
Agent Skills are a series of techniques that solve three important workflow problems:
- use progressive disclosure to more effectively utilize the constrained context windows
- minimize conflicting or unnecessary context in the context window
- provide reusable snippets for recurring problems, so that individual workflow creators don't each have to solve things like Slack formatting or dealing with large files
All three of these problems initially seemed insignificant when we started building out our internal workflows, but once the number of internal workflows reached into the dozens, all three became difficult to manage. Without reusable snippets, I lost the leverage to improve all workflows at once, and without progressive disclosure the agents would get a vast amount of irrelevant content that could confuse them, particularly when it came to things like inconsistencies between Markdown and Slack's mrkdwn formatting language, both of which are important to different tools used by our workflows.
How we implemented Agent Skills
As a disclaimer, I recognize that it's not necessary to implement Agent Skills yourself, as you can integrate with e.g. Claude's API support for Agent Skills. However, one of our design decisions is to remain largely platform agnostic, such that we can switch across model providers, and consequently we decided to implement skills within our framework.
With that out of the way, we started implementing by reviewing the Agent Skills documentation at agentskills.io, and cloning their Python reference implementation skills-ref into our repository to make it accessible to Claude Code.
The resulting implementation has these core features:
- Skills are in the `skills/` directory of our repository, with each skill consisting of its own sub-directory with a `SKILL.md`
- Each skill is a Markdown file with metadata along these lines:

  ```
  ---
  name: pdf-processing
  description: Extract text and tables...
  metadata:
    author: example-org
    version: "1.0"
  ---
  ```

- The list of available skills, including their description from metadata, is injected into the system prompt at the beginning of each workflow, and the `load_skills` tool is available to the agent to load the entire file into the context window.
- Updated workflow configuration to optionally specify required, allowed, and prohibited skills to modify the list of exposed skills injected into the system prompt (see the sketch after this list). My guess is that requiring specific skills for a given workflow is a bit of an anti-pattern, "just let the agent decide!", but it was trivial to implement and the sort of thing that I could imagine is useful in the future.
- Used the Notion MCP to retrieve all the existing prompts in our prompt repository, identify existing implicit skills in the prompts we had created, write those initial skills, and identify which Notion prompts to edit to eliminate the now-redundant sections of their prompts.
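As a sketch of how the frontmatter parsing and per-workflow filtering might look in Python: this assumes PyYAML for the frontmatter, and the `required`/`allowed`/`prohibited` field names mirror the description above rather than our exact schema:

```python
from pathlib import Path
import yaml  # assumes PyYAML is available

def parse_skill(skill_dir: Path) -> dict:
    text = (skill_dir / "SKILL.md").read_text()
    # Frontmatter sits between the first two '---' markers.
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    return {"name": meta["name"], "description": meta["description"], "body": body}

def exposed_skills(all_skills: list[dict], config: dict) -> list[dict]:
    # required/allowed/prohibited are the per-workflow knobs described above.
    allowed = set(config.get("allowed") or [s["name"] for s in all_skills])
    required = set(config.get("required") or [])
    prohibited = set(config.get("prohibited") or [])
    names = (allowed | required) - prohibited
    return [s for s in all_skills if s["name"] in names]
```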
Then we shipped it into production.
How they’ve worked
Humans make mistakes all the time. For example, I’ve seen many dozens of JIRA tickets from humans that don’t explain the actual problem they are having. People are used to that, and when a human makes a mistake, they blame the human. However, when agents make a mistake, a surprising percentage of people view it as a fundamental limitation of agents as a category, rather than thinking that, “Oh, I should go update that prompt.”
Skills have been extremely helpful as the tool to continue refining down these edge cases
where we’ve relied on implicit behavior because specifying the exact behavior was simply overwhelming.
As one example, we ask that every Slack message end with a link to the prompt that drove the
response. That always worked, but the details of the formatting would vary in an annoying, distracting
way: sometimes it would be the equivalent of `[title](link)`, sometimes a bare `link`, sometimes `[link](link)`.
With skills, it is now (almost always) consistent, without anyone thinking to include those instructions
in their workflow prompts.
Similarly, handling large files requires a series of different tools that benefit from In-Context Learning (aka ICL, which is a fancy term for including a handful of examples of correct and incorrect usage), which absolutely no one is going to add to their workflow prompt but is extremely effective at improving how the workflow uses those tools.
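For example, the ICL section of such a skill might look something like this hypothetical excerpt (not our actual `large_files` skill):

```
## Examples

Correct: extract a PDF before reading it
  extract_file("f_a1")   # returns compact text and tables

Incorrect: load a 5MB PDF directly
  load_file("f_a1")      # floods the context window with raw content
```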
For something that I was initially deeply skeptical about, I now wish I had implemented skills much earlier.
Where we might go next
While our skills implementation is working well today, there are a few opportunities I’d like to take advantage of in the future:
- Add a `load_subskill` tool to support files in `skills/{skill}/*` beyond the `SKILL.md`. So far, this hasn't been a major blocker, but as some skills get more sophisticated, the ability to split varied use-cases into distinct files would improve our ability to use skills for progressive disclosure
- One significant advantage that Anthropic has over us is their sandboxed Python interpreter, which allows skills to include entire Python scripts to be specified and run by tools. For example, a script for parsing PDFs might be included in a skill, which is extremely handy. We don't currently have a sandboxed interpreter handy for our agents, but this could, in theory anyway, significantly cut down on the number of custom skills we need to implement. At a minimum, it would do a much better job at operations that require reliable math, versus relying on the LLM to do its best at math-y operations.
I think both of these are actually pretty straightforward to implement. The first is just a simple feature that Claude could implement in a few minutes. The latter feels annoying to implement, but could also be done in less than an hour by running a second Lambda running Node.js with Pyodide, and exposing access to that Lambda as a tool. It's just so inelegant for a Python process to call a Node.js process to run sandboxed Python that I haven't done it quite yet.
That's all for now! Hope to hear your thoughts on Twitter at @lethain!