It’s time for context management to become Agentic.


Context Compression Problem

Most context management today focuses on what should go into the context and how to find the right things to put in, such as RAG, memory systems, etc., with little discussion of how to clean it up.

Current cleanup mostly relies on hitting a context-window threshold, such as 80%, which triggers a compaction pass. This was probably first introduced by Claude Code, and has since become a standard feature.

Claude Code's compaction logic: swap in a compaction system prompt, send the full message history, and append a compaction user prompt.

As you can see, this is a non-cached call over the full history, which consumes quite a few tokens.

```
# normal request
System Prompt
Messages

# compaction request
Compact System Prompt
Messages
Compact User Prompt
```
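The two request shapes can be sketched as follows. The function and field names here are illustrative, not Claude Code's actual internals; the point is that the compaction call changes the system prompt, which invalidates the prompt-cache prefix:

```python
# Hypothetical sketch of a Claude-Code-style compaction request.
# Names are illustrative, not the real Claude Code implementation.

def build_normal_request(system_prompt, messages):
    """A regular agent turn: stable system prompt + running history (cacheable)."""
    return {"system": system_prompt, "messages": messages}

def build_compaction_request(messages, compact_system_prompt, compact_user_prompt):
    """Compaction swaps the system prompt and appends a summarize instruction.
    Because the prefix changes, the full history is billed as fresh input tokens."""
    return {
        "system": compact_system_prompt,
        "messages": messages + [{"role": "user", "content": compact_user_prompt}],
    }
```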

Many people have likely run into the situation where, after compaction, so much is lost that the conversation becomes hard to continue.

Theoretically, an Agent should be able to perceive “I need to compress” earlier, achieving more semantic/task-level context compression.

Ideally, an Agent can actively manage its own context, actively choosing to load and unload content, which should be very useful in long conversations/multi-topic scenarios.

Today’s Agents are like a program that can only allocate memory but cannot free memory, relying only on compression and then restarting.

Kimi D-Mail

I probably first saw the d-mail feature on kimi-cli.

When the AI realizes it has just done something with low information density, such as reading a large file of which only a small part was useful, it calls d-mail to time-travel: the agent returns to the context from before the read, carrying a message that tells its past self "read xxx and found xxx".

  • kimi docs: https://github.com/MoonshotAI/kimi-cli/blob/main/src/kimi_cli/tools/dmail/dmail.md
  • ByteDance internal docs: https://bytetech.info/articles/7571069998476165146
  • Public research materials: https://leslieo2.github.io/posts/agent-control-via-timetravel-checkpoints/
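A minimal sketch of the rollback mechanism described above, assuming the history is a flat message list. This is an illustration of the idea, not kimi-cli's actual d-mail implementation:

```python
# Hypothetical d-mail-style "time travel": roll the message list back to a
# checkpoint and deliver a note to the past self. Names are assumptions;
# see the kimi-cli docs for the real tool.

def dmail(messages, checkpoint_index, note):
    """Return a new history truncated at checkpoint_index, with `note`
    appended as if sent from the future (e.g. "read xxx and found xxx")."""
    past = messages[:checkpoint_index]  # copy; the original stays intact
    past.append({"role": "user", "content": f"[d-mail from your future self] {note}"})
    return past
```

The low-information-density detour (the big file read) never re-enters the context; only the distilled note survives.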

Pi Session Tree

Later I came across pi agent, whose session design is very interesting.

  • It has a complete, transparent, and vendor-agnostic unified context storage, and sessions can be easily handed over to other models to continue reasoning.
  • Sessions are stored in the form of a tree, where each message is a node, providing branching and jumping functions between nodes.

The /tree command allows jumping to any node, optionally carrying a summary, which is very similar to d-mail.

The author wrote about the design philosophy here, recommended reading https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
In case you didn’t know, openclaw is developed using pi.

Of course, currently many agents have context storage and jump functions.

  • Context storage and restoration are basically /resume.
  • Claude/Codex both jump by pressing esc twice.
  • Opencode doesn’t have that, but it does have a /fork command, which may be too obscure: fork isn’t even mentioned in the documentation.

But anyway, these are all human-oriented, not agent-oriented.

Pi is very easy to write extensions for, so, simply put: find a way to hand /tree to the AI.

Git-Like Tree

I think the session tree is easy to analogize to a git workflow.

  • Each message is a commit.
  • Jumping is checkout, you can jump to any commit.
  • The summary action is more like an MR (Merge Request): instead of carrying over all the messy commits, they get squashed into a single mr-commit.

For example:

```
├─ user: "Develop feature X"
│   └─ assistant: "plan..."                          <- 1. base branch
│       ├─ user: "Try developing with method A"      <- 2. create branch-1 off base
│       │   └─ assistant: "work..."
│       │       └─ [......]
│       │           └─ user: "Not working well"      <- 3. after a pile of commits, open an MR back to base
│       └─ sum: "Tried method A..."                  <- 4. merged as one streamlined mr-commit, not all commits
│           └─ user: "Try developing with method B"  <- 5. continue development
│               └─ assistant: "..."
```

The data on the left is what the pi tree can provide. To allow the agent to jump accurately, all messages are tagged with IDs, and the agent only needs to call the tree jump with the ID.

But in practice, after a lot of dialogue the session tree becomes huge. Dumping the whole tree into the context at once would blow it up, so trimming is a must.

Session Tree -> Session Log

Pi tree contains all branches. A complete tree might look like this, allowing infinite nested backtracking, and even backtracking to a historical branch again.

But the agent actually only needs to perceive the content of the current session, and does not need to perceive other branches, because the content of all branch messages is already included in the SUM node.

At this point you will notice that looking only at the current line of messages (the red line in the figure) = the whole conversation of the current session, and everything seems to reduce to summary compression again.

But there is one difference: we need to attach jump markers to this summary.

Jumping on Session Log

Session Log = Tagged Summary Message of the current session.

Just like git log.

```
35d4182f (ROOT)
ba87607d USER: xxx
a8e58e1d AI: xxxx
37ac65e1 TOOL: xxxx
36c8ea0b SUM: xxxx   <- the summary message, like an mr-commit
236d45e1 USER: xxxx
a8e58e1d (HEAD) AI: xxxx
```

Deciding to jump is a git checkout that carries a message:

```
context_checkout("8c5265a1", "summary...")
```

The ReAct loop will become like this:

Note that the summary here is still a non-cached, full-history call. If it ran on every ReAct loop, the cost would explode.

There are several improvement ideas:

  1. Adjust trigger timing

    Lower frequency? Trigger based on specific scenarios/rules?

    In a sense, it seems to go back to the question of when and how to compress, and it is the same in terms of cost, just different in structural compression logic.

  2. Build within the session

    Summary and jump decisions continue in the current session, so caching can be used.

    Although the current session contains the full conversation history, the messages carry no IDs. How do we provide jump markers during the summary?

    2.1. Use message content as the ID. Agent: I want to go back to the point containing the “xxxxxx” message.
    2.2. The agent builds markers itself during the dialogue, recording key nodes as it goes. This skeleton map of key nodes = the session log.

    I think 2.2 is more interesting.
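Option 2.1 can be sketched as a lookup that treats a verbatim quote as the jump target. This is a hypothetical helper for illustration, not part of pi-context:

```python
def find_by_content(messages, quote):
    """Option 2.1 sketch: the agent quotes a message verbatim instead of
    using an ID; we resolve it to the index of the earliest match."""
    for i, msg in enumerate(messages):
        if quote in msg["content"]:
            return i
    return None  # quote not found; the agent must rephrase
```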

Loop: Build - Perceive - Jump

Such a loop needs to be embedded in the agent’s dialogue.

  1. Build: The Agent actively marks key nodes during the dialogue to form a skeleton map.
    1. Because each session action is part of history, it will be saved to the session and comes with a message id.
  2. Perceive: Observe the context state and current position through the skeleton map.
  3. Jump: Decide to jump in the skeleton map and carry a message.
```
35d4182f (ROOT)
a8e58e1d (plan-done) AI: xxxx
36c8ea0b SUM: try A fail, reason: xx...
236d45e1 USER: try B
a8e58e1d (HEAD, try-B-start) AI: xxxx
```

Tool Design

Still borrowing git concepts, three tools are designed:

  • context_tag: git tag, mark nodes.
  • context_log: git log, view context skeleton.
  • context_checkout: git checkout, jump on the skeleton.
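A minimal sketch of what these three tools might look like behind the scenes, with the session modeled as a flat list of nodes. The ids, schemas, and folding behavior are my assumptions, not pi-context's actual implementation:

```python
import hashlib

class ContextTools:
    """Toy model of the three git-like tools over a flat session node list."""

    def __init__(self):
        self.nodes = []   # linear view of the current session
        self.head = None

    def _new_id(self, seed):
        return hashlib.sha1(seed.encode()).hexdigest()[:8]

    def append(self, role, text):
        node = {"id": self._new_id(f"{len(self.nodes)}:{role}:{text}"),
                "tag": None, "role": role, "text": text}
        self.nodes.append(node)
        self.head = node["id"]
        return node["id"]

    def context_tag(self, name):
        # git tag: mark the current HEAD node as a key point
        self.nodes[-1]["tag"] = name

    def context_log(self):
        # git log: render the context skeleton the agent perceives
        lines = []
        for n in self.nodes:
            head = " (HEAD)" if n["id"] == self.head else ""
            tag = f" ({n['tag']})" if n["tag"] else ""
            lines.append(f"{n['id']}{head}{tag} {n['role'].upper()}: {n['text'][:40]}")
        return "\n".join(lines)

    def context_checkout(self, node_id, summary):
        # git checkout: jump back to node_id, folding the abandoned
        # segment into a single SUM node that carries the summary
        idx = next(i for i, n in enumerate(self.nodes) if n["id"] == node_id)
        self.nodes = self.nodes[: idx + 1]
        return self.append("sum", summary)
```

Exposed as tool calls, the agent tags as it works, reads the log to orient itself, and checks out with a summary when a segment turns out to be a dead end.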

To help the AI perceive and decide, it should see not just the context skeleton but also context usage, dialogue depth, and how far it is from the nearest tag, with reminders to tag in time. A HUD is prepended, so context_log output looks something like this:

```
[Context Dashboard]
• Context Usage: 0.9% (8.2k/1.0M)
• Segment Size: 4 steps since last tag 'exp-b-start'
---------------------------------------------------
ba87607d ...
78c541e2 ...
```
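The dashboard numbers are easy to derive from session state. A minimal rendering sketch, where the layout follows the example above and the token accounting (and its rounding) is an assumption:

```python
def render_hud(used_tokens, window_tokens, steps_since_tag, last_tag):
    """Render a hypothetical Context Dashboard header for context_log."""
    pct = used_tokens / window_tokens * 100
    return (
        "[Context Dashboard]\n"
        f"• Context Usage: {pct:.1f}% ({used_tokens / 1000:.1f}k/{window_tokens / 1_000_000:.1f}M)\n"
        f"• Segment Size: {steps_since_tag} steps since last tag '{last_tag}'"
    )
```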

There is still a lot of work to design a better context log:

  • Recent messages are best displayed in full.
  • If there are too many tags, secondary folding also needs to be considered.
  • Specify message id and range to view folded details, just like browsing git log.

Skill

To help the Agent use these tools well, a skill was also added covering:

  • Context knowledge, why compress.
  • When and how to use tools.
  • How to tag.
  • How to make decisions after observing context log, when to jump, and where to jump to.
  • How to generate checkout messages, what important messages should be included.
  • Best practices and cases.

Back to the Future: Lossless Time Travel

d-mail jumping goes back to the past. I also want to travel to the future.

For example, a simple bug fixing problem to simulate a multi-thread dialogue scenario.

The green line is the d-mail-style trip back to the past; a purple line is also needed for travel back to the future. All time travel is lossless.

The implementation is quite simple: every SUM node records where it came from, so you can checkout back at any time.

```
35d4182f (ROOT)
a8e58e1d (plan-done) AI: xxxx
36c8ea0b (from 8ea0891b) SUM: try A fail, reason: xx...
236d45e1 USER: try B
a8e58e1d (HEAD, try-B-start) AI: xxxx
```
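A sketch of the lossless part, assuming each SUM node stores a `from` pointer to the branch tip it folded away. Field names are illustrative; the pi-context internals may differ:

```python
class SessionTree:
    """Toy tree where jumps are lossless: abandoned branches stay stored,
    and every SUM node records the tip it was summarized from."""

    def __init__(self):
        self.nodes = {}   # id -> node; nothing is ever deleted
        self.head = None

    def add(self, node_id, role, text, parent=None, origin=None):
        self.nodes[node_id] = {"id": node_id, "role": role, "text": text,
                               "parent": parent or self.head, "from": origin}
        self.head = node_id

    def checkout(self, node_id, summary_id=None, summary=""):
        """Jump to node_id; the abandoned segment is folded into a SUM node
        that remembers the old HEAD it came from."""
        old_head = self.head
        self.head = node_id
        if summary_id:
            self.add(summary_id, "sum", summary, parent=node_id, origin=old_head)

    def travel_forward(self, sum_id):
        """Back to the future: re-checkout the tip a SUM node folded away."""
        self.head = self.nodes[sum_id]["from"]
```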

Another advantage of the session tree is that as long as a branch isn’t too old, it is still in the cache.

There is still a lot of work to be done for better time travel:

  • It doesn’t have to be a real jump back; it could also be retrieval of specific messages, perhaps via a recall tool.
  • Historical messages all live in files, so perhaps just say: this original message is in xxxx.jsonl. The agent searches and reads it itself, then backtracks to before the read once it’s done.
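The recall idea could be sketched as a tiny tool that greps the session file directly. The .jsonl format and field names here are assumptions:

```python
import json

def recall(session_path, keyword, limit=3):
    """Hypothetical recall tool: scan a session .jsonl for messages
    containing `keyword`, so the agent can re-read old content without
    permanently re-loading it into the context."""
    hits = []
    with open(session_path) as f:
        for line in f:
            msg = json.loads(line)
            if keyword in msg.get("content", ""):
                hits.append(msg)
                if len(hits) == limit:
                    break
    return hits
```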

Conclusion

This was just built, so I don’t know yet how much improvement it brings; it needs more real-world validation. Welcome to try it.

```
npm install -g @mariozechner/pi-coding-agent
pi install npm:pi-context
```

https://github.com/ttttmr/pi-context

In theory it can be ported to other tools as well; after all, they all have session storage.

Some Other Thoughts

  • Giving the Agent a structured context and letting it orchestrate and manage that context itself may be a promising direction, especially for multi-thread/long-horizon tasks.
    • Personal assistant: For example, chatting in Doubao, changing topics, and then jumping back to the original topic.
    • wide/deep-research might also be useful because there are many choices and a lot of noise.
  • Branch exploration and then backtracking is a bit like a sub-agent sharing historical context, and the d-mail message is the response of the sub-agent.
    • The advantage of sub-agents is concurrency.
  • Actually, it is also a bit like plan. Compared to planning-with-files, it is more like planning-in-context-files.
  • If tag/checkout is paired with optional matching git operations, context and local files can be backtracked synchronously.
  • If all sessions of an agent are on a huge session tree and can be backtracked at any time, is that memory? Through continuous summary, important content is naturally retained on the main line of the session tree, and unimportant content is gradually diluted in distant branches.
  • The summary field attached to the OpenAI Responses API is well suited to building session-log skeletons, but unfortunately pi doesn’t support it.

Ads

I’ve developed other pi extensions as well; welcome to try them.