How to Read Tech Work and Write Better Docs
Three things to understand:
Why did it appear? What new abstraction does it introduce? What does it inspire in your own work?
Part I — What questions should you bring when learning something new?
I recommend keeping six types of questions in mind.
1. What is the core tension this work addresses?
Don’t start by asking “how does it work?” Ask instead:
Before this work existed, what pain were people experiencing?
For example, when reading about agents, harnesses, or Kaggle automation, ask:
Is it solving the problem of agents being unable to run autonomously over long periods?
Or the unreliability of tool calls?
Or state management?
Or evaluation and observability?
Or multi-agent coordination?
Good technical work usually revolves around a single core tension.
For example:
A typical agent demo can complete short tasks,
but real-world tasks involve long waits, changing external state, failure recovery,
cost control, and context loss.
Which of these does this work try to solve?
Once you identify that tension, all the design decisions become much easier to understand.
2. What is the core insight?
In other words:
What is the one thing the author most wants you to believe?
That thing is usually not an API — it’s a design judgment.
For example:
An agent should not just be a chat loop — it should be an event-driven system.
Or:
LLM capability is not the bottleneck — environmental feedback and evaluation mechanisms are.
Or:
Long-running tasks should not block the agent; they should be split into submit, observe, recover, and aggregate.
When reading a piece of work, try to compress it into a single sentence:
The core insight of this work is: __________
If you can’t compress it, you’re still reading the surface features.
3. What new abstraction does it introduce?
The value of technical work is often not the code — it’s the abstraction.
Ask yourself:
How does it decompose a complex system into concepts?
What is the relationship between those concepts?
Is there a way of slicing the problem I hadn't thought of before?
Common abstractions in agent systems include:
Agent
Tool
Memory
State
Session
Task
Run
Trace
Event
Environment
Evaluator
Checkpoint
Policy
Skill
Subagent
The question to ask is:
Why slice it this way? What problem does this decomposition solve? Does this decomposition have side effects?
For instance, when comparing Claude Code, OpenAI Agents SDK, LangGraph, Hermes, and opencode, don’t just compare “who has what feature” — compare their abstraction boundaries:
Claude Code is more like a coding agent runtime
LangGraph is more like a stateful workflow graph
OpenAI Agents SDK is more like an agent orchestration + tracing framework
Hermes and opencode are more like terminal-native autonomous coding agents
This matters far more than memorizing feature lists.
4. Where are the system boundaries?
This is the skill most worth developing.
Ask:
What does it solve?
What does it not solve?
What does it assume already exists?
Where does it push complexity?
Many projects claim to “let agents automatically complete tasks,” but in reality they may assume:
Tasks are short
The environment is stable
Tool calls are reliable
A user is constantly supervising
Failures are handled manually
No long-term state is needed
No rigorous evaluation is needed
Actively look for these hidden assumptions.
For a Kaggle Agent or Research OS scenario, ask:
Can it handle long waits?
Can it handle multiple concurrent experiments?
Can it recover from interruptions?
Can it tell reliable experimental results from unreliable ones?
Can it avoid redundant trial-and-error?
Can it accumulate experience over time?
Can subsequent agent sessions pick up prior context?
If a piece of work doesn’t address these questions, it may only be partially useful for your use case.
5. What are the trade-offs?
Every design choice has a cost.
Ask:
What capability did it gain, and what did it sacrifice to get it?
Common trade-offs:
Flexibility vs. controllability
Degree of automation vs. interpretability
Concurrency vs. state consistency
General-purpose framework vs. domain-specific optimization
LLM autonomy vs. harness constraints
Rapid demo vs. production reliability
For example:
LangGraph's state graph makes workflows more controllable, but reduces agent freedom.
Claude Code is powerful, but many of its internal mechanisms are opaque.
Writing your own harness is flexible, but has a high development cost.
Fully autonomous agents look impressive, but are prone to runaway behavior, wasted budget, and hard-to-debug failures.
When studying a piece of work, don’t just look at its advantages — look at what it chose to sacrifice.
6. What does it inspire in your own work?
Finally, bring it back to yourself.
After reading each piece of work, write four sentences:
The problem this work solves is:
Its core idea is:
Where it doesn't fit my situation:
What I can borrow from it:
For example:
This work solves the observability problem for agents in complex tasks.
Its core idea is to trace every tool call, model decision, and state change.
It doesn't directly address long Kaggle kernel wait times or experiment aggregation.
But I can borrow its trace schema to record my agent's decision-making across experiments.
That is what it truly means to “understand” something.
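To make the "borrow its trace schema" step from the example above concrete, here is a minimal sketch of what such a schema could look like. It is not the schema of any particular tracing tool: `TraceEvent`, its field names, and the three `kind` values are assumptions chosen to mirror "tool call, model decision, and state change."

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any

# Hypothetical event kinds, mirroring "tool call, model decision, state change".
KINDS = {"tool_call", "model_decision", "state_change"}


@dataclass
class TraceEvent:
    experiment_id: str   # which experiment this event belongs to
    kind: str            # must be one of KINDS
    summary: str         # short human-readable description
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")


def to_jsonl(events: list[TraceEvent]) -> str:
    """Serialize events as one JSON object per line, in the order given."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def events_for(events: list[TraceEvent], experiment_id: str) -> list[TraceEvent]:
    """Return the decision trail of a single experiment."""
    return [e for e in events if e.experiment_id == experiment_id]


trace = [
    TraceEvent("exp-1", "model_decision", "try a larger learning rate"),
    TraceEvent("exp-1", "tool_call", "submit notebook", {"tool": "submit"}),
    TraceEvent("exp-2", "model_decision", "add feature crosses"),
    TraceEvent("exp-1", "state_change", "status: running -> complete"),
]
exp1_trail = events_for(trace, "exp-1")
```

Keying every event by experiment is the design choice that matters here: it lets a later session reconstruct why a given experiment was launched without replaying the whole conversation.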
Part II — If you’re writing your own technical documentation, what should it say?
If you want others to find your work interesting, the documentation cannot just be:
I built a Kaggle agent that can automatically run experiments.
That’s too generic.
You need to tell a compelling technical story:
Current agents are good at writing code, but they can’t reliably run long-term experiments. What I built is an agent OS for Kaggle / research automation — one where agents can submit experiments, wait asynchronously, restore context, aggregate results, and continuously improve.
The reader should immediately understand: this is not a generic wrapper — it’s solving a real system-level problem.
1. Start with the pain point, not the feature list
Don’t open with:
This project is an AI agent framework for Kaggle competitions.
That’s flat.
A better opening:
Current coding agents are good at writing code in short interactive sessions, but they struggle with long-running experimental workflows. In Kaggle-style research, an agent often needs to submit a notebook, wait for remote execution, inspect logs and scores, compare multiple experiments, and continue from prior decisions. Most agents either block while waiting or lose context after the run finishes.
This opening immediately establishes a sense of the problem.
2. Then state your core position
You need a thesis statement.
For example:
My argument is that Kaggle automation should not be an agent writing code in a loop; it should be designed as an event-driven research operating system.
Or:
The key idea is to separate experiment execution from agent reasoning: agents propose and launch experiments, while an external watcher observes remote execution and feeds structured results back into a persistent research state.
This sentence is critical — it determines how others perceive your work.
You’re not saying “I called the Kaggle CLI.” You’re saying:
I decoupled agent reasoning, experiment execution, external waiting, and result aggregation.
That has architectural substance.
3. Describe the system architecture, not a pile of features
Your documentation should present a logical structure:
Agent / Planner
↓
Experiment Producer
↓
Kaggle Kernel Submitter
↓
Watcher / Poller
↓
Result Collector
↓
Research State Store
↓
Aggregator / Next-step Agent
And explain:
Producer initiates experiments
Watcher monitors remote kernel status without blocking the agent
Collector downloads logs, outputs, and leaderboard scores
State Store persists experiment history, hypotheses, results, and failure reasons
Aggregator compares experiments and generates the next direction
This is far more compelling than listing “supports auto-submission, log downloads, and state persistence.”
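The component responsibilities above can be sketched as a small in-memory pipeline. This is an illustration under stated assumptions, not a reference implementation: `FakeRemote` stands in for the remote execution platform (it is not a Kaggle client), and every class and function name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Experiment:
    exp_id: str
    hypothesis: str
    status: str = "proposed"      # proposed -> running -> complete
    score: float | None = None


class FakeRemote:
    """Stand-in for the remote platform; finishes a job after N status checks."""

    def __init__(self, scores: dict[str, float]) -> None:
        self._scores = scores
        self._polls_left: dict[str, int] = {}

    def submit(self, exp_id: str, polls_until_done: int) -> None:
        self._polls_left[exp_id] = polls_until_done

    def is_done(self, exp_id: str) -> bool:
        # Each status check moves the fake job one step closer to completion.
        self._polls_left[exp_id] -= 1
        return self._polls_left[exp_id] <= 0

    def fetch_score(self, exp_id: str) -> float:
        return self._scores[exp_id]


@dataclass
class StateStore:
    experiments: dict[str, Experiment] = field(default_factory=dict)


def produce(store: StateStore, remote: FakeRemote, exp_id: str,
            hypothesis: str, polls_until_done: int) -> None:
    """Producer: record the hypothesis, then hand the job to the remote."""
    store.experiments[exp_id] = Experiment(exp_id, hypothesis, status="running")
    remote.submit(exp_id, polls_until_done)


def watch_once(store: StateStore, remote: FakeRemote) -> list[str]:
    """Watcher: one polling pass; returns the ids that finished in this pass."""
    return [
        exp.exp_id
        for exp in store.experiments.values()
        if exp.status == "running" and remote.is_done(exp.exp_id)
    ]


def collect(store: StateStore, remote: FakeRemote, exp_id: str) -> None:
    """Collector: pull the result and attach it to the stored experiment."""
    exp = store.experiments[exp_id]
    exp.score = remote.fetch_score(exp_id)
    exp.status = "complete"


def aggregate(store: StateStore) -> Experiment | None:
    """Aggregator: pick the best completed experiment (higher is better)."""
    done = [e for e in store.experiments.values() if e.status == "complete"]
    return max(done, key=lambda e: e.score, default=None)


store = StateStore()
remote = FakeRemote(scores={"exp-a": 0.81, "exp-b": 0.84})
produce(store, remote, "exp-a", "larger learning rate", polls_until_done=2)
produce(store, remote, "exp-b", "add feature crosses", polls_until_done=1)

completion_order: list[str] = []
while any(e.status == "running" for e in store.experiments.values()):
    for exp_id in watch_once(store, remote):  # the agent is free between passes
        collect(store, remote, exp_id)
        completion_order.append(exp_id)

best = aggregate(store)
```

In this sketch the second experiment finishes first, and nothing breaks: each stage only reads and writes the state store, so no stage depends on another stage's process staying alive.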
4. Tell a concrete story
Technical documentation needs a scenario so readers understand why it matters.
For example:
A typical run looks like this:
1. The agent inspects the current solution and proposes three experiments.
2. It submits the first notebook to Kaggle and records the hypothesis.
3. Instead of blocking, the watcher tracks the remote kernel status.
4. When the run finishes, logs, outputs, and scores are collected.
5. The aggregator compares the result against previous experiments.
6. The next agent session resumes from the structured research state and decides what to try next.
This is more effective than abstract description because readers can picture the system at work.
5. Clearly explain how you differ from a typical agent wrapper
This is the most important part.
Proactively answer:
How is this different from just letting Claude Code / opencode / Hermes run on its own?
You could write:
Unlike a normal coding-agent wrapper, this system treats long-running experiments as external events rather than blocking tool calls. The agent does not need to stay alive while a Kaggle kernel is running. Instead, experiment metadata, hypotheses, logs, and results are persisted, allowing future agent sessions to resume from a shared research state.
This directly addresses the problem you’ve cared about all along.
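A minimal sketch of what "resume from a shared research state" could mean in practice, assuming a single JSON file serves as the state store. The file layout and the function names `launch`, `record_result`, and `resume` are hypothetical; a real system might use a database instead.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the research state, or start empty if no session has run yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"experiments": {}}


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


def launch(path: Path, exp_id: str, hypothesis: str) -> None:
    """Session A: persist why the experiment exists, then exit. No waiting."""
    state = load_state(path)
    state["experiments"][exp_id] = {
        "hypothesis": hypothesis,
        "status": "running",
        "score": None,
    }
    save_state(path, state)


def record_result(path: Path, exp_id: str, score: float) -> None:
    """External watcher: attach the result once the remote job has finished."""
    state = load_state(path)
    state["experiments"][exp_id].update(status="complete", score=score)
    save_state(path, state)


def resume(path: Path) -> list[str]:
    """Session B: rebuild context purely from the persisted state."""
    state = load_state(path)
    return [
        f"{exp_id}: {exp['hypothesis']} -> {exp['status']} (score={exp['score']})"
        for exp_id, exp in sorted(state["experiments"].items())
    ]


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "research_state.json"
    launch(state_path, "exp-1", "larger learning rate")  # agent session ends here
    record_result(state_path, "exp-1", 0.84)             # runs with no agent alive
    summary = resume(state_path)                         # a fresh session picks up
```

The three calls share nothing except the file, which is the point: they could run in three different processes, hours apart.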
6. Document the failure modes you address
This section will make your documentation look genuinely professional.
For example:
Failure modes this system is designed for:
- Agent blocks while waiting for remote execution.
- Agent forgets why an experiment was launched.
- Multiple experiments finish out of order.
- Logs and scores are not linked to hypotheses.
- New agent sessions cannot continue prior reasoning.
- Experiments are repeated because previous attempts were not summarized.
- Human users cannot inspect why the agent chose a direction.
This is more convincing than a feature list, because readers will feel:
You’ve been through these problems yourself — that’s why you know this system needs to exist.
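Two of the failure modes above, results finishing out of order and experiments being repeated, largely come down to how experiments are keyed. The sketch below assumes each experiment gets an id derived from its hypothesis and config; `fingerprint` and `Registry` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(hypothesis: str, config: dict) -> str:
    """Stable id: the same hypothesis and config always map to the same key."""
    canonical = json.dumps({"h": hypothesis, "c": config}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class Registry:
    def __init__(self) -> None:
        self.launched: dict[str, dict] = {}  # exp_id -> hypothesis and config
        self.results: dict[str, float] = {}  # exp_id -> score

    def launch(self, hypothesis: str, config: dict) -> str | None:
        """Register an experiment; return None if it was already tried."""
        exp_id = fingerprint(hypothesis, config)
        if exp_id in self.launched:
            return None
        self.launched[exp_id] = {"hypothesis": hypothesis, "config": config}
        return exp_id

    def record(self, exp_id: str, score: float) -> None:
        """Results are matched by id, so arrival order does not matter."""
        if exp_id not in self.launched:
            raise KeyError(f"result for unknown experiment {exp_id}")
        self.results[exp_id] = score

    def report(self) -> list[tuple[str, float]]:
        """Pair every finished score with the hypothesis that motivated it."""
        return [
            (self.launched[exp_id]["hypothesis"], score)
            for exp_id, score in self.results.items()
        ]


registry = Registry()
first = registry.launch("larger learning rate", {"lr": 0.01})
second = registry.launch("add feature crosses", {"crosses": True})
duplicate = registry.launch("larger learning rate", {"lr": 0.01})

# The second experiment finishes before the first one.
registry.record(second, 0.84)
registry.record(first, 0.81)
report = registry.report()
```

Here `duplicate` comes back as `None`, so a repeated attempt is caught before any compute is spent, and each score stays attached to its hypothesis regardless of arrival order.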
7. Write your core design principles
To give your work intellectual depth, articulate your design principles.
For example:
Design principles:
1. Non-blocking by default
Long-running external jobs should not occupy an agent session.
2. Persistent research state
Every experiment should be linked to its hypothesis, code version, logs, outputs, score, and conclusion.
3. Agent-agnostic execution
The system should be able to use Claude Code, opencode, Hermes, or other agents as interchangeable workers.
4. Event-driven continuation
A finished kernel should trigger structured result collection and future reasoning, rather than relying on the original agent process to stay alive.
5. Minimal local state
Whenever possible, live platform state such as kernel status and submission quota should be queried from Kaggle instead of duplicated locally.
These principles map well onto what you’ve been thinking through.
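Principles 2 and 5 can be made checkable in code. The sketch below is one possible shape, assuming a flat record per experiment; `ExperimentRecord`, its field names, and `fake_platform_status` are hypothetical, and the status lookup is a stand-in rather than a real platform call.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass
class ExperimentRecord:
    exp_id: str
    hypothesis: Optional[str] = None
    code_version: Optional[str] = None
    logs_path: Optional[str] = None
    outputs_path: Optional[str] = None
    score: Optional[float] = None
    conclusion: Optional[str] = None
    # Deliberately no `status` field: live status is queried, not stored.

    def missing_links(self) -> list[str]:
        """Names of the links that principle 2 asks for but are still unset."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


def current_status(record: ExperimentRecord,
                   query_platform: Callable[[str], str]) -> str:
    """Principle 5: ask the platform each time instead of caching the answer."""
    return query_platform(record.exp_id)


def fake_platform_status(exp_id: str) -> str:
    """Hypothetical stand-in for a live platform lookup."""
    return {"exp-1": "complete"}.get(exp_id, "unknown")


record = ExperimentRecord(
    exp_id="exp-1",
    hypothesis="larger learning rate",
    code_version="abc123",
    score=0.84,
)
gaps = record.missing_links()
status = current_status(record, fake_platform_status)
```

A record with a score but no conclusion shows up in `gaps`, which gives the aggregator a simple rule: do not reason from an experiment until its links are complete.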
8. Be explicit about what you’re not solving
This actually increases credibility.
For example:
Non-goals:
- This is not a new LLM.
- This is not a replacement for Claude Code or opencode.
- This does not try to fully constrain the agent with a rigid workflow.
- This does not assume every experiment will improve the score.
- This is not initially optimized for large-scale distributed training.
This shows readers that you have a clear sense of your boundaries.
9. Write a roadmap
Don’t only document what’s already done — also describe what comes next.
For example:
Roadmap:
- Experiment registry
- Kaggle watcher
- Structured result collector
- Research memory / summary store
- Multi-agent experiment aggregator
- Langfuse-based tracing
- Automatic failure classification
- Self-improving skill updates
This makes the project feel like an evolving system rather than a one-off script.
Part III — A recommended documentation structure
Here is a structure I suggest for your README or technical writeup:
# Project Name
## 1. Motivation
Why are existing coding agents ill-suited for long-running Kaggle / research workflows?
## 2. Core Idea
Decouple agent reasoning, experiment execution, remote waiting, and result aggregation.
## 3. System Overview
Diagram: Producer / Watcher / Collector / State Store / Aggregator.
## 4. Example Workflow
From proposing an experiment → submitting the kernel → waiting → collecting results → summarizing → deciding the next step.
## 5. Key Abstractions
Experiment, Run, Hypothesis, Result, Research State, Agent Session.
## 6. Failure Modes Addressed
Blocking, forgetting, repeated experiments, out-of-order results, inability to resume context.
## 7. Design Principles
Non-blocking, persistent state, agent-agnostic, event-driven, minimal local state.
## 8. Comparison with Existing Tools
Claude Code / opencode / Hermes / LangGraph / ML experiment trackers.
## 9. Current Status
What has been implemented vs. what is still in design.
## 10. Roadmap
Plans for future development.
Part IV — The single most important point
When reading others’ work, look for:
Problem → insight → abstraction → trade-offs → what you can borrow
When writing your own, follow a parallel sequence:
Pain point → core position → system abstraction → design trade-offs → concrete scenario
Don’t write:
I added support for features A, B, C, and D.
Write:
Why is the existing approach not good enough?
What is my core judgment?
How am I reframing the problem?
What capabilities does the system gain as a result?
Only then will your work feel like it has genuine ideas behind it — rather than just another automation script.