
Planning Agents vs Reactive Agents

When to plan ahead vs react step by step: ReAct vs plan-and-execute vs Tree-of-Thoughts, the cost of replanning, and the speculative-execution parallel.

Jatin Bansal@blog:~/ai-engineering$ open planning-vs-reactive-agents

A migration-assistant agent is rewriting 400 SQL stored procedures from Oracle to Postgres. The first run uses a pure ReAct loop: pick a proc, read it, write the Postgres version, run the test, fix what failed, move to the next. After 90 minutes it has finished 14 procedures and burned $38 of tokens because every iteration re-prefills the running history of all 14 prior conversions. The second run swaps in a plan-and-execute architecture: the planner emits a 400-step typed plan in one call, an executor with a cheaper model handles each step in isolation, and a re-planner only fires on failure. Same 400 procs, 22 minutes, $4.70. The model didn’t get smarter between runs; the loop did. When to plan ahead versus when to react step by step is the central control-flow decision in an agent, and it’s not the same answer for every task.

Opening bridge

Yesterday’s piece on the agent loop walked through ReAct as the canonical thought-action-observation cycle and previewed plan-and-execute as the batching alternative. It listed the trade-offs in a table and stopped there. Today we open up that table: when each pattern wins, what the in-between architectures look like, how to reason about the cost of replanning, and how to wire a planner/executor split that doesn’t sacrifice the adaptability ReAct buys you. The Agents subtree is structured around the loop, the harness, and the choices inside both — this article is the choice that lives inside the loop body.

What separates a planner from a reactor

Both architectures are agents in Simon Willison’s working sense — they run tools in a loop to achieve a goal. The difference is where the decision-making lives.

  • Reactive agents decide what to do next at each step. The ReAct loop is the archetype: emit one thought, one action, observe the result, then think again from the updated context. The agent has no commitment beyond the current step; it can swerve on the basis of the most recent observation.
  • Planning agents commit to a multi-step plan up front, then execute against it. The plan is a typed artifact — usually a JSON list of steps — produced by one expensive call to a strong model. Execution is mechanical: dispatch each step, splice the result back, move on.

The reactive agent does online reflection inside the action sequence; the planning agent does whole-task reflection before any action. Both are real strategies. They have different cost curves, different failure modes, and different sweet spots. Production systems converge on a hybrid where the boundary between the two shifts depending on how confident the agent is about what comes next.
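
To make the difference concrete, here is a minimal sketch of the two control flows with the model replaced by plain Python callables. The names `decide_next`, `make_plan`, and `run_tool` are hypothetical stand-ins, not part of any SDK; the only point is where the decisions happen.

```python
from typing import Callable, Optional

Action = str
Observation = str

def reactive_loop(
    decide_next: Callable[[list[Observation]], Optional[Action]],
    run_tool: Callable[[Action], Observation],
    max_steps: int = 20,
) -> list[Observation]:
    """Decide one action at a time; every observation can change the next decision."""
    history: list[Observation] = []
    for _ in range(max_steps):
        action = decide_next(history)      # one "model call" per step
        if action is None:                 # the decider signals completion
            break
        history.append(run_tool(action))
    return history

def planning_loop(
    make_plan: Callable[[str], list[Action]],
    run_tool: Callable[[Action], Observation],
    goal: str,
) -> list[Observation]:
    """Commit to the whole action list up front, then execute it mechanically."""
    plan = make_plan(goal)                 # one "model call" per task
    return [run_tool(action) for action in plan]

# Toy stand-ins so the sketch runs without a model.
calls = {"decide": 0, "plan": 0}

def decide_next(history):
    calls["decide"] += 1
    return f"step-{len(history) + 1}" if len(history) < 3 else None

def make_plan(goal):
    calls["plan"] += 1
    return ["step-1", "step-2", "step-3"]

def run_tool(action):
    return f"done:{action}"

reactive_result = reactive_loop(decide_next, run_tool)
planned_result = planning_loop(make_plan, run_tool, "demo goal")
```

Both loops produce the same three observations in this toy setup. The reactive loop consults the decider four times (three actions plus the final stop), while the planning loop consults the planner once.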

The distributed-systems parallel

Three analogies map cleanly here, and each one explains a different facet.

Speculative vs reactive execution in a CPU. A modern superscalar processor doesn’t wait for each instruction to retire before issuing the next — it speculatively executes down a predicted branch and rolls back if the prediction misses. Plan-and-execute is speculative execution at the agent level. The planner emits a sequence of “predicted” tool calls; the executor runs them as if the world won’t change; if a step fails or returns a surprise, the re-planner is the pipeline flush — expensive but recoverable. Pure ReAct is the in-order, non-speculative pipeline: it waits at every step, so it can’t get ahead, but it never has to throw work away. The cost-of-replanning math is the cost-of-misprediction math; on tasks with high branch-prediction accuracy (well-specified tasks with predictable structure), speculation wins big.

Query planner vs interpreter. A relational database can interpret a SQL query (parse, walk the tree, execute as you walk) or it can plan it (parse, optimize, emit an execution graph, run the graph). Interpretation is cheaper for one-shot trivial queries; planning amortizes its up-front cost across complex queries with sharable sub-expressions. The classical SQL planner identifies join orders and pushes predicates down — exactly what a Tree-of-Thoughts (Yao et al., NeurIPS 2023) agent does when it evaluates multiple candidate plans and picks the best before committing. The interpreted path is ReAct; the planned-and-optimized path is plan-and-execute with whole-task reasoning. The right choice is determined by the query, not by ideology.

Make vs a shell script. A Makefile declares a dependency graph and lets the build system parallelize and skip steps; a shell script runs commands one after another. Plan-and-execute is make — the plan is a DAG (when the planner emits dependencies) or a sequence, and the executor can parallelize independent steps. ReAct is the shell script — readable, sequential, fully adaptive, but blocked at every step. LLMCompiler (Kim et al., ICML 2024) is the explicit DAG variant: its planner emits a function-call graph and its executor runs the independent edges in parallel, reporting up to 3.7× latency speedup and 6.7× cost reduction over ReAct on tasks where the dependency structure is exposable.

The four points on the curve

Plot agent architectures on a single axis from “decide each step at runtime” to “decide all steps before any action.” Four points are worth naming, in increasing planning-up-front:

  1. Pure ReAct. One LLM call per step. Maximally adaptive, maximally chatty. The default for chat agents and exploratory work. Covered in detail in the agent loop article.

  2. Plan-and-execute (sequential). One planner call up front emits a JSON list of typed steps; an executor (often a cheaper model, sometimes raw code) runs the steps in order. After each step or on failure, an optional re-planner can revise the remaining plan. LangChain’s canonical writeup is the practical reference; the research lineage is Plan-and-Solve (Wang et al., ACL 2023).

  3. ReWOO (reasoning without observation). A variant of plan-and-execute where the planner emits the entire plan including placeholders for observations. The executor runs all tool calls (often in parallel), then a final “Solver” call synthesizes the answer from the collected observations. The planner never sees any observation; the solver sees them all at once. ReWOO (Xu et al., 2023) trades adaptability for token efficiency: two LLM calls (planner and solver) plus N tool calls replace ReAct’s N LLM calls, each of which re-reads the growing history. Token usage drops substantially on tasks where the plan is robust to surprises.

  4. Tree-of-Thoughts (ToT). Multiple candidate plans (or partial plans) are generated and scored; the agent explores the tree with backtracking, much like a chess engine’s search. The Tree-of-Thoughts paper reports lifting Game-of-24 success from 4% (chain-of-thought) to 74%, but at multi-call cost per step that only pays off when the task has high branching and a cheap evaluator (a unit test, a constraint check, a verifier model). Running ToT as the top-level loop is rare in production; applying it to hard sub-problems inside a larger agent is increasingly common.
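
The ReWOO flow in item 3 can be sketched without a model. The plan format below (`#E1`-style placeholders, a `tool` name, an `arg` string) and the two toy tools are assumptions for illustration, loosely modeled on the paper's notation rather than taken from any specific library.

```python
import re

# A plan as the planner might emit it: later steps reference earlier
# observations through placeholders instead of seeing their contents.
plan = [
    {"id": "#E1", "tool": "lookup", "arg": "capital of France"},
    {"id": "#E2", "tool": "shout", "arg": "#E1"},
]

# Hypothetical tools; real ones would hit a search API, a database, etc.
TOOLS = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
    "shout": lambda s: s.upper(),
}

PLACEHOLDER = re.compile(r"#E\d+")

def run_rewoo_plan(steps: list[dict]) -> dict[str, str]:
    """Worker phase: run every tool call, substituting earlier observations."""
    evidence: dict[str, str] = {}
    for step in steps:
        # Replace each placeholder with the observation it names.
        arg = PLACEHOLDER.sub(lambda m: evidence[m.group(0)], step["arg"])
        evidence[step["id"]] = TOOLS[step["tool"]](arg)
    return evidence

def solver_prompt(goal: str, steps: list[dict], evidence: dict[str, str]) -> str:
    """Solver phase input: the plan and all observations, seen at once."""
    lines = [f"{s['id']} = {s['tool']}({s['arg']}) -> {evidence[s['id']]}" for s in steps]
    return f"Goal: {goal}\n" + "\n".join(lines)

evidence = run_rewoo_plan(plan)
prompt = solver_prompt("Name the capital of France in capitals", plan, evidence)
```

This sketch runs the steps sequentially for simplicity. Steps whose arguments contain no placeholder could be dispatched concurrently, which is where the parallelism mentioned above would come from.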

Two patterns deserve flags but live one chapter away from the planner/reactor decision proper:

  • Reflexion. Reflexion (Shinn et al., NeurIPS 2023) is post-hoc reflection — after a trial fails, the agent writes a verbal “what went wrong” note to memory and tries again. It can wrap either a reactive or a planning agent. The dedicated reflection article picks up reflection as a first-class memory operation — the Generative Agents importance-threshold trigger, the salient-question-then-evidence-anchored-insight pipeline, and the self-reinforcing-error failure mode that bites both Reflexion-style and Generative-Agents-style implementations.
  • The five Anthropic workflow patterns. Anthropic’s “Building effective agents” carves up the space differently — prompt-chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer. Read it as orthogonal to today’s axis: each of those workflows can be reactive or planned, depending on where you put the LLM calls.

When planning beats reacting

The honest answer is: plan when the task has low entropy per step. Three signs the planner pays off:

  • The step sequence is largely deterministic given the goal. Bulk migrations, ETL jobs, structured reports, “scrape these 50 URLs and produce a CSV” — the model knows the steps from the goal alone. Reacting at every step burns model calls re-deriving what the planner already knows.
  • Steps are weakly coupled. If step 5’s input doesn’t strongly depend on step 4’s output, the executor can use a smaller, cheaper model per step. The planner’s whole-task reasoning is too expensive to repeat per step; the executor’s narrow-step reasoning is cheap. Plan-and-execute is the architecture that exploits this asymmetry.
  • The cost of replanning is low. When a step fails and the plan needs revision, how expensive is it to re-plan? If the re-planner is small and the failure is localized, replanning is cheap and you get adaptability back. If re-planning needs the strong model and re-derives most of the plan, you’ve lost most of the planner’s cost advantage.

Three signs the planner doesn’t pay off and ReAct is right:

  • The task is exploratory. Open-ended research, debugging, “figure out why X is failing.” Each observation genuinely changes what to do next; a plan made before the first observation is fiction. ReAct’s adaptivity is the point.
  • Tool results vary in shape. If the executor has to make non-trivial decisions about how to interpret each result, you’ve snuck the planner back into the executor. Just run ReAct.
  • The task is short. For 1-3 step tasks, the planner overhead doesn’t amortize. The single extra LLM call of a planner-then-executor pipeline is a 33% latency penalty on a 3-step task; it’s a 3% penalty on a 30-step task.

A useful diagnostic: if your planner emits a fixed plan and execution always follows it without revision, you’re not running an agent — you’re running a workflow. That’s not a criticism; it’s a clarification. Anthropic’s “Building effective agents” draws this line explicitly. Workflows are simpler, cheaper, and easier to test. Reach for an agent only when the dynamism is load-bearing.

The cost of replanning

The planning vs reacting trade-off boils down to one quantity: the expected cost of replanning, multiplied by the probability that you’ll have to.

Let:

  • Cp = cost of a single planner call (large model, whole-task reasoning)
  • Ce = cost of a single executor step (small model, narrow reasoning)
  • Cr = cost of a single ReAct step (large model, narrow + whole-task reasoning amortized)
  • N = number of steps in the task
  • p = probability that the plan needs revision after some step (the “branch miss rate”)
  • k = average step count at which the revision fires (replanning loses partial progress)

Approximate total costs:

  • ReAct: N · Cr
  • Plan-and-execute, no revision: Cp + N · Ce
  • Plan-and-execute, with revision probability p at step k: Cp + N · Ce + p · (Cp + (N - k) · Ce)

The plan-and-execute path wins when Cp + N · Ce + p · (Cp + (N - k) · Ce) < N · Cr. With Cr ≈ 3 · Ce (a typical large/small model gap), Cp ≈ 5 · Ce, and assuming revisions fire around the midpoint (k ≈ N/2), the break-even is somewhere around N ≥ 4 for p ≤ 0.3: at N = 4 and p = 0.3, plan-and-execute costs about 11.1 · Ce against 12 · Ce for ReAct. The numbers shift with your provider’s prices and your model split, but the shape doesn’t: planning pays off as N grows and as p shrinks. The mistake is to use planning on a task with high p, where you collect the planner overhead and the replanning penalty and lose to ReAct on both ends.

This is exactly the branch-prediction calculation a CPU runs: speculate when the prediction is likely right, fall back to in-order execution when branches are unpredictable. Tasks where p is structurally high (exploratory research, debugging, chat) live in the unpredictable regime and want ReAct. Tasks with low p (bulk transformations, well-defined data pipelines) live in the predictable regime and want plan-and-execute.
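
The cost formulas above are easy to turn into a small calculator. The sketch below uses the illustrative ratios from the text (Ce = 1, Cr = 3, Cp = 5) and adds one assumption of its own: revisions fire at the midpoint of the plan (k = N/2). The function names are made up for this example.

```python
from typing import Optional

def react_cost(n: int, c_r: float) -> float:
    """ReAct: one large-model call per step."""
    return n * c_r

def plan_execute_cost(n: int, c_p: float, c_e: float, p: float = 0.0, k: float = 0.0) -> float:
    """Plan-and-execute with revision probability p firing at step k."""
    return c_p + n * c_e + p * (c_p + (n - k) * c_e)

def break_even_steps(c_p: float, c_e: float, c_r: float, p: float, max_n: int = 1000) -> Optional[int]:
    """Smallest N at which plan-and-execute is cheaper, assuming k = N / 2."""
    for n in range(1, max_n + 1):
        if plan_execute_cost(n, c_p, c_e, p, k=n / 2) < react_cost(n, c_r):
            return n
    return None   # planning never wins within max_n steps

# Illustrative unit costs from the text: Ce = 1, Cr = 3 * Ce, Cp = 5 * Ce.
C_E, C_R, C_P = 1.0, 3.0, 5.0
table = {p: break_even_steps(C_P, C_E, C_R, p) for p in (0.0, 0.3, 1.0)}
```

Under these assumptions the break-even step count is 3 with no revisions, 4 at p = 0.3, and 7 when every run needs one revision. The model is linear in N, so it ignores the growing-history prefill that makes real ReAct runs more expensive as they get longer; plug in measured per-call costs before relying on it.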

Code: a plan-and-execute agent in Python

A migration-assistant pattern. The planner emits a typed Plan; the executor processes each step with a cheaper model; a re-planner fires if a step fails. Install: pip install anthropic pydantic. Uses the Anthropic SDK and Pydantic for the plan schema — same pattern as the structured-output article.

python
import json, time
from typing import Literal
from pydantic import BaseModel, Field
from anthropic import Anthropic

client = Anthropic()

# --- typed plan schema (the planner emits this; the executor consumes it) ---
class PlanStep(BaseModel):
    id: int
    action: Literal["read_file", "transform", "write_file", "run_tests"]
    args: dict
    depends_on: list[int] = Field(default_factory=list)  # for parallel execution

class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]

PLANNER_MODEL = "claude-opus-4-7"
EXECUTOR_MODEL = "claude-haiku-4-5"   # cheaper executor; pick what fits your stack

# --- planner: one large-model call, structured output via tool use ---
PLAN_TOOL = {
    "name": "submit_plan",
    "description": "Submit the multi-step plan for the migration task.",
    "input_schema": Plan.model_json_schema(),
}

def plan(goal: str) -> Plan:
    resp = client.messages.create(
        model=PLANNER_MODEL,
        max_tokens=4096,
        tools=[PLAN_TOOL],
        tool_choice={"type": "tool", "name": "submit_plan"},   # force the call
        messages=[{"role": "user", "content": f"Produce a step-by-step plan to: {goal}"}],
    )
    tool_block = next(b for b in resp.content if b.type == "tool_use")
    return Plan.model_validate(tool_block.input)

# --- executor: one small-model call per step (or raw code where deterministic) ---
def execute_step(step: PlanStep, state: dict) -> dict:
    # Deterministic actions short-circuit the LLM entirely.
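    if step.action == "read_file":
        # Context managers close the file handles even if the read or write fails.
        with open(step.args["path"]) as f:
            return {"text": f.read()}
    if step.action == "write_file":
        with open(step.args["path"], "w") as f:
            f.write(step.args["content"])
        return {"ok": True}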
    if step.action == "run_tests":
        # subprocess.run(...) in real life; stubbed here
        return {"passed": True}

    # The transform action is the only one that needs the model.
    resp = client.messages.create(
        model=EXECUTOR_MODEL,
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Transform per spec.\n\nInput:\n{step.args['input']}\n\nSpec:\n{step.args['spec']}",
        }],
    )
    return {"output": "".join(b.text for b in resp.content if b.type == "text")}

# --- the orchestrator: plan once, execute sequentially, replan on failure ---
def run(goal: str, *, max_replans=2, deadline_s=600.0):
    started = time.monotonic()
    current_plan = plan(goal)
    state: dict = {}
    replans = 0

    while True:
        for step in current_plan.steps:
            if time.monotonic() - started > deadline_s:
                raise TimeoutError(f"deadline exceeded at step {step.id}")
            try:
                result = execute_step(step, state)
                state[step.id] = result
                # progress check: tests failed? trigger replan
                if step.action == "run_tests" and not result.get("passed", True):
                    raise RuntimeError(f"tests failed at step {step.id}")
            except Exception as e:
                if replans >= max_replans:
                    # Chain the original exception so the traceback keeps the root cause.
                    raise RuntimeError(f"replan budget exhausted: {e}") from e
                replans += 1
                # Replan from the failed step forward, with the failure as context.
                current_plan = replan(goal, current_plan, step.id, str(e), state)
                break   # restart the for-loop with the new plan
        else:
            return state   # ran to completion without a failure-triggered break

def replan(goal: str, prior: Plan, failed_at: int, error: str, state: dict) -> Plan:
    """One re-planner call. Pass enough state for the planner to revise meaningfully."""
    summary = json.dumps({k: v for k, v in state.items() if k < failed_at})
    resp = client.messages.create(
        model=PLANNER_MODEL,
        max_tokens=4096,
        tools=[PLAN_TOOL],
        tool_choice={"type": "tool", "name": "submit_plan"},
        messages=[{
            "role": "user",
            "content": (
                f"Revise the plan for goal: {goal}\n"
                f"Original plan:\n{prior.model_dump_json(indent=2)}\n"
                f"Step {failed_at} failed with: {error}\n"
                f"Prior step outputs:\n{summary}\n"
                f"Emit a new plan that picks up from step {failed_at}."
            ),
        }],
    )
    tool_block = next(b for b in resp.content if b.type == "tool_use")
    return Plan.model_validate(tool_block.input)

Three things to notice. First, the planner uses forced tool choice (tool_choice={"type": "tool", ...}) to guarantee the plan comes back as a typed object — the same pattern as schema-coerced output. Second, the executor short-circuits the LLM for deterministic actions (file I/O, test execution) and only calls the model for transformations; this is where the cost savings come from. Third, the replan budget is bounded — two replans is a defensible default. If you’re replanning more than that on a single task, the planner is not the right tool for this task and ReAct probably is.

Code: a planning agent in TypeScript with LangGraph

LangGraph is the framework with the cleanest plan-and-execute primitives. Install: npm install @langchain/langgraph @langchain/anthropic @langchain/core zod. Uses LangGraph and the LangChain Anthropic provider.

typescript
import { ChatAnthropic } from "@langchain/anthropic";
import { StateGraph, Annotation, END, START } from "@langchain/langgraph";
import { z } from "zod";

// --- typed plan schema ---
const PlanStep = z.object({
  id: z.number().int(),
  action: z.enum(["read_file", "transform", "write_file", "run_tests"]),
  args: z.record(z.any()),
  dependsOn: z.array(z.number().int()).default([]),
});
const Plan = z.object({ goal: z.string(), steps: z.array(PlanStep) });
type PlanT = z.infer<typeof Plan>;
type PlanStepT = z.infer<typeof PlanStep>;

// --- graph state: the plan, executed steps, and a replan counter ---
const AgentState = Annotation.Root({
  goal: Annotation<string>,
  plan: Annotation<PlanT | null>({ default: () => null, reducer: (_, x) => x }),
  results: Annotation<Record<number, unknown>>({
    default: () => ({}),
    reducer: (acc, x) => ({ ...acc, ...x }),
  }),
  failedAt: Annotation<number | null>({ default: () => null, reducer: (_, x) => x }),
  replans: Annotation<number>({ default: () => 0, reducer: (_, x) => x }),
});

const planner = new ChatAnthropic({ model: "claude-opus-4-7", temperature: 0 });
const executor = new ChatAnthropic({ model: "claude-haiku-4-5", temperature: 0 });

// --- nodes: plan, execute, replan, decide-next ---
async function planNode(s: typeof AgentState.State) {
  const llm = planner.withStructuredOutput(Plan);
  const plan = await llm.invoke(`Produce a step-by-step plan to: ${s.goal}`);
  return { plan };
}

async function executeNode(s: typeof AgentState.State) {
  const results: Record<number, unknown> = {};
  for (const step of s.plan!.steps) {
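    // Skip only steps that already succeeded; a failed step must re-run after a replan.
    const prior = s.results[step.id] as { error?: unknown; passed?: boolean } | undefined;
    if (prior && prior.error === undefined && prior.passed !== false) continue;
    try {
      results[step.id] = await runStep(step);
      // Only an explicit passed === false counts as a test failure (the stub returns { ok: true }).
      if (step.action === "run_tests" && (results[step.id] as any).passed === false) {
        return { results, failedAt: step.id };
      }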
    } catch (e) {
      results[step.id] = { error: String(e) };
      return { results, failedAt: step.id };
    }
  }
  return { results, failedAt: null };
}

async function replanNode(s: typeof AgentState.State) {
  const llm = planner.withStructuredOutput(Plan);
  const plan = await llm.invoke(
    `Revise the plan for: ${s.goal}\nFailed at step ${s.failedAt}\n` +
    `Prior results:\n${JSON.stringify(s.results, null, 2)}\n` +
    `Emit a new plan that picks up from step ${s.failedAt}.`,
  );
  return { plan, replans: s.replans + 1, failedAt: null };
}

// --- conditional edge: success | replan | abort ---
function route(s: typeof AgentState.State): "execute" | "replan" | typeof END {
  if (s.failedAt === null) return END;
  if (s.replans >= 2) return END;          // abort with partial results
  return "replan";
}

const graph = new StateGraph(AgentState)
  .addNode("plan", planNode)
  .addNode("execute", executeNode)
  .addNode("replan", replanNode)
  .addEdge(START, "plan")
  .addEdge("plan", "execute")
  .addConditionalEdges("execute", route, { replan: "replan", [END]: END })
  .addEdge("replan", "execute")
  .compile();

async function runStep(step: PlanStepT): Promise<unknown> {
  // deterministic short-circuits; LLM only for transforms
  if (step.action === "transform") {
    const out = await executor.invoke(
      `Transform per spec.\nInput:\n${step.args.input}\nSpec:\n${step.args.spec}`,
    );
    return { output: out.content };
  }
  // read_file / write_file / run_tests handled by your runtime
  return { ok: true };
}

export async function ask(goal: string) {
  return graph.invoke({ goal });
}

The graph makes the planner/executor/replanner split explicit. Each node is one concern: plan calls the large model once, execute runs the typed plan against the executor, replan revises on failure, and the route function is the policy that bounds the replan budget. The pattern matches the Python version one-for-one — the LangGraph value is the explicit state machine and the observability that comes with it, not the orchestration math.

For the in-between architectures: ReWOO is this graph with the execute node fanned out into parallel tool calls and a final solver node appended; LLMCompiler is the same but with the planner emitting a DAG and the executor scheduling independent nodes concurrently; Tree-of-Thoughts replaces the single planner call with a planner-and-scorer pair and adds backtracking on low-score branches.
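
The planner-and-scorer idea can be sketched as a breadth-first beam search, one of the search strategies the Tree-of-Thoughts paper describes. In the toy below, `propose` and `score` stand in for the model's candidate generator and the cheap evaluator; the number-picking task is invented purely so the sketch runs without a model.

```python
from typing import Callable, Sequence

State = tuple[int, ...]

def beam_search(
    root: State,
    propose: Callable[[State], Sequence[State]],
    score: Callable[[State], float],
    depth: int,
    width: int,
) -> State:
    """Expand every state in the frontier, keep the `width` best, repeat."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        # Low-scoring branches are dropped here; this is the pruning step.
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return max(frontier, key=score)

# Toy task: pick three numbers from {1, 3, 4} whose sum is exactly 10.
TARGET = 10
propose = lambda s: [s + (n,) for n in (1, 3, 4)]
score = lambda s: -abs(TARGET - sum(s))      # 0 is a perfect score

best = beam_search((), propose, score, depth=3, width=2)
greedy = beam_search((), propose, score, depth=3, width=1)
```

With a beam of two the search reaches the target via (4, 3, 3), while the greedy width-one run commits to (4, 4) early and cannot recover. The price is one evaluator call per candidate at every level, which is why a cheap scorer matters.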

Trade-offs, failure modes, gotchas

Stale plans. The planner’s view of the world is fixed at plan time; by step 7, the world has moved. The longer the plan and the more it touches external state, the staler it gets. Mitigations: keep plans short (under ~10 steps), make every step idempotent, and validate preconditions in the executor before running each step. The validator can be deterministic (“does this file still exist?”) or a one-line LLM check; the point is to fail fast when the plan no longer applies.
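
A deterministic precondition check can be as small as the sketch below. The registry, the two check functions, and the `StalePlanError` name are hypothetical; the action names match the plan schema used in the code earlier in the article.

```python
import os
from typing import Callable, Optional

# Each check returns an error string when the plan's assumption no longer
# holds, or None when the step is still safe to run.
def _file_exists(args: dict) -> Optional[str]:
    return None if os.path.isfile(args["path"]) else f"missing file: {args['path']}"

def _dir_writable(args: dict) -> Optional[str]:
    parent = os.path.dirname(os.path.abspath(args["path"]))
    return None if os.access(parent, os.W_OK) else f"cannot write to: {parent}"

PRECONDITIONS: dict[str, Callable[[dict], Optional[str]]] = {
    "read_file": _file_exists,
    "write_file": _dir_writable,
}

class StalePlanError(RuntimeError):
    """Raised before a step runs when its precondition fails."""

def check_preconditions(action: str, args: dict) -> None:
    check = PRECONDITIONS.get(action)
    if check is None:
        return                      # no precondition registered for this action
    problem = check(args)
    if problem is not None:
        raise StalePlanError(f"{action}: {problem}")
```

Calling `check_preconditions(step.action, step.args)` at the top of the executor turns a stale plan into an immediate, cheap failure that the re-planner can act on, instead of a confusing error several steps later.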

Replanning cascades. A replan triggers, the new plan also fails, you replan again. In the worst case the agent enters a replan loop where each new plan is wrong in a new way. The fix is the same as the ReAct no-progress detector from the agent loop article: cap the replans (2 is a reasonable default), and include the prior failed plans in the re-planner’s context so it doesn’t re-emit the same broken structure. Without the prior context, the re-planner has no memory; with it, the re-planner can do real learning across attempts.
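
One way to give the re-planner that memory is a small history object that records each failed plan, renders it into prompt text, and flags a new plan that repeats an old structure. This is a sketch with invented names (`ReplanHistory`, `plan_fingerprint`), using plain dicts for steps rather than the Pydantic models above.

```python
import hashlib
import json

def plan_fingerprint(steps: list[dict]) -> str:
    """Stable hash of a plan's structure (actions and args, ignoring step ids)."""
    canonical = json.dumps(
        [{"action": s["action"], "args": s["args"]} for s in steps],
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

class ReplanHistory:
    """Tracks failed plans so the re-planner sees them and repeats are caught."""

    def __init__(self, max_replans: int = 2):
        self.max_replans = max_replans
        self.failed: list[dict] = []     # each entry: {"steps": [...], "error": str}

    def record_failure(self, steps: list[dict], error: str) -> None:
        self.failed.append({"steps": steps, "error": error})

    def budget_left(self) -> bool:
        # The initial plan plus max_replans revisions may fail before we stop.
        return len(self.failed) <= self.max_replans

    def is_repeat(self, steps: list[dict]) -> bool:
        seen = {plan_fingerprint(f["steps"]) for f in self.failed}
        return plan_fingerprint(steps) in seen

    def context_block(self) -> str:
        """Text to append to the re-planner prompt."""
        return "\n".join(
            f"Attempt {i}: failed with {f['error']!r}; plan was {json.dumps(f['steps'])}"
            for i, f in enumerate(self.failed, start=1)
        )
```

The fingerprint only catches exact structural repeats. A re-planner that rephrases the same broken idea will slip past it, so the replan cap remains the real backstop.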

The planner doesn’t see the executor’s full output. A common ReWOO-style failure: the planner emits placeholders like <result_of_step_3> and the solver references them, but the executor’s actual output for step 3 is 50 KB of irrelevant tool noise. The solver chokes on token budget or misses the relevant fact. Mitigation: the executor should summarize each step’s output before storing it for the solver — same pattern as JIT context fetching. Cheaper still, define the tool to return a structured object with only the fields the plan will actually consume.
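
A minimal sketch of the second mitigation: keep only the fields the plan consumes and cap long strings before the result is stored for the solver. The field names and the `compact_observation` helper are invented for this example, and plain truncation stands in for a real summarizer.

```python
import json

def compact_observation(raw: dict, keep: list[str], max_chars: int = 500) -> dict:
    """Keep only the fields the plan consumes and cap long string values."""
    compact = {}
    for key in keep:
        if key not in raw:
            continue
        value = raw[key]
        if isinstance(value, str) and len(value) > max_chars:
            value = value[:max_chars] + f"... [{len(value) - max_chars} chars truncated]"
        compact[key] = value
    return compact

raw_result = {
    "status": "ok",
    "rows_migrated": 1284,
    "debug_log": "x" * 50_000,          # the irrelevant tool noise
    "warnings": ["implicit cast on column 7"],
}
stored = compact_observation(raw_result, keep=["status", "rows_migrated", "warnings"])
```

The 50 KB log never reaches the solver, and the stored object is small enough to splice into a prompt alongside every other step's result.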

Over-planning short tasks. A planner-and-executor pipeline is a bad fit for “what’s the weather in Tokyo?” — the planner overhead dominates the work. Two mitigations: route by task length (a cheap classifier upstream decides ReAct vs plan-and-execute) or let the planner emit a “trivial — execute directly” tag that skips the executor and runs ReAct instead. Some production systems combine both: a router picks the architecture, and the chosen architecture can still fall back to the other.

Forced tool choice on the planner can hide bad plans. If you set tool_choice: { type: "tool", name: "submit_plan" } to guarantee a typed plan, the model will always emit a plan, even when the right answer is “this task isn’t plannable, run ReAct.” Either include “this task isn’t plannable” as a valid plan output, or let the model emit free text and validate post-hoc. The trade-off is between strict-typed input to the executor and the planner’s ability to refuse.
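
The first option can be sketched as an escape hatch inside the forced tool's schema. The `plannable` flag, the `reason` field, and the `choose_architecture` router below are illustrative additions, not part of the Plan model shown earlier.

```python
# JSON Schema for the forced tool call. The planner must still call the tool,
# but it can now say "do not plan this" in a typed way.
PLAN_OR_REFUSE_SCHEMA = {
    "type": "object",
    "properties": {
        "plannable": {"type": "boolean"},
        "reason": {"type": "string"},
        "steps": {"type": "array", "items": {"type": "object"}},
    },
    "required": ["plannable", "reason", "steps"],
}

def choose_architecture(tool_input: dict) -> str:
    """Route on the planner's own verdict: 'plan_and_execute' or 'react'."""
    if not tool_input.get("plannable", False):
        return "react"
    if not tool_input.get("steps"):
        # A "plannable" verdict with no steps is treated as a refusal too.
        return "react"
    return "plan_and_execute"

refusal = {"plannable": False, "reason": "open-ended debugging", "steps": []}
accepted = {"plannable": True, "reason": "bulk migration", "steps": [{"id": 1}]}
```

The executor still receives strictly typed input, and the planner keeps a sanctioned way to hand the task back to a reactive loop.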

Plan adherence drift in the executor. When the executor is a smaller model, it sometimes goes off-plan — adds a step the plan didn’t include, skips a step that looked redundant, or interprets a step’s args creatively. The cleanest fix is to make each step’s action a typed enum (as in the code above) and dispatch to deterministic code wherever possible; the only LLM call should be for steps that genuinely require model judgment. This is also where structured output and constrained decoding earn their keep — typed step args close the gap between “the plan said X” and “the executor did X.” Over many steps this kind of executor drift becomes its own failure mode — the long-horizon reliability article covers detection (drift scoring, MOP precursors) and recovery (checkpoint-and-restart) for that regime.

Parallelism vs idempotency. LLMCompiler-style parallel execution is a huge latency win but only safe when steps are truly independent. Two steps that both call update_ticket for the same ticket aren’t independent — they race. The planner’s depends_on field needs to be authoritative; the executor should refuse to parallelize steps unless the dependency graph is acyclic and complete. The blast radius of a wrong depends_on is exactly the blast radius of a wrong concurrency primitive in any distributed system.
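
The structural half of that check fits in a few lines with the standard library's `graphlib`. The sketch below takes a mapping from step id to its `depends_on` list, rejects unknown dependencies and cycles, and groups the remaining steps into waves that could run concurrently. The `parallel_waves` name is made up for this example.

```python
from graphlib import CycleError, TopologicalSorter

def parallel_waves(depends_on: dict[int, list[int]]) -> list[list[int]]:
    """Group step ids into waves; steps in one wave have no unmet dependencies.

    Raises ValueError if a dependency points at an unknown step or forms a cycle.
    """
    unknown = {d for deps in depends_on.values() for d in deps} - set(depends_on)
    if unknown:
        raise ValueError(f"dependencies on unknown steps: {sorted(unknown)}")

    sorter = TopologicalSorter(depends_on)
    try:
        sorter.prepare()                     # raises CycleError on a cycle
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc

    waves: list[list[int]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                  # mark the wave finished
    return waves

plan_deps = {1: [], 2: [], 3: [1, 2], 4: [3]}
waves = parallel_waves(plan_deps)
```

This validates that the graph is acyclic and closed, not that it is complete: a dependency the planner forgot to declare (two steps updating the same ticket) is invisible here and needs a separate check, such as refusing to co-schedule steps that touch the same resource.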

Caching interacts with planning. Prompt caching loves long static prefixes, which is exactly what ReAct produces (the growing conversation history is mostly stable from step to step). Plan-and-execute breaks this: the planner sees one prompt, the executor sees many small prompts, and there’s no shared prefix across them. The mitigation is to cache the planner’s plan-emission prompt (the system prompt + tool schema) and to cache the executor’s task-template prompt (system prompt + step-template). Each cache hits on its own pattern; you just don’t get the cross-step cache reuse ReAct gets for free.
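
For the executor side, the idea is to keep the static template in a cacheable prefix and put only the per-step content after it. The sketch below builds keyword arguments for `client.messages.create` using Anthropic's `cache_control` marker on a system block; it does not call the API, and the system prompt text and `cached_request` helper are hypothetical.

```python
def cached_request(model: str, system_prompt: str, user_content: str, max_tokens: int = 2048) -> dict:
    """Build kwargs for a Messages API call with the static prefix marked cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Static part: identical across calls, so it is marked as a cache breakpoint.
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        # Dynamic part: differs per step and is placed after the cached prefix.
        "messages": [{"role": "user", "content": user_content}],
    }

# Hypothetical executor template shared by every transform step.
EXECUTOR_SYSTEM = "You convert Oracle PL/SQL to PostgreSQL PL/pgSQL. Follow the spec exactly."

requests = [
    cached_request("claude-haiku-4-5", EXECUTOR_SYSTEM, f"Transform step {i}")
    for i in range(3)
]
```

Providers typically enforce a minimum cacheable prefix length, so a one-line system prompt like this one is unlikely to be cached in practice. The pattern pays off when the executor template carries substantial shared material such as conversion rules or examples.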

Cost accounting becomes harder. With ReAct, every step’s tokens are visible on a single conversation. With plan-and-execute, the planner, executor (N calls), and re-planner are separate conversations with different cost lines. The harness needs to aggregate. Most off-the-shelf observability tools (Langfuse, LangSmith) handle this; rolling your own and forgetting to aggregate is how budget blowouts hide.
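
If you do roll your own, the aggregation itself is small. The sketch below is a hypothetical `CostLedger` keyed by role; the per-million-token prices are placeholders, not real provider rates.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class CostLedger:
    """Aggregates token usage across planner, executor, and re-planner calls."""
    # Price per million tokens, keyed by role: (input, output).
    prices: dict[str, tuple[float, float]]
    usage: dict[str, list[int]] = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def record(self, role: str, input_tokens: int, output_tokens: int) -> None:
        self.usage[role][0] += input_tokens
        self.usage[role][1] += output_tokens

    def cost_by_role(self) -> dict[str, float]:
        out = {}
        for role, (tokens_in, tokens_out) in self.usage.items():
            price_in, price_out = self.prices[role]
            out[role] = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
        return out

    def total(self) -> float:
        return sum(self.cost_by_role().values())

# Placeholder prices; substitute your provider's current rates.
ledger = CostLedger(prices={"planner": (10.0, 50.0), "executor": (1.0, 5.0)})
ledger.record("planner", input_tokens=2000, output_tokens=1000)
for _ in range(3):
    ledger.record("executor", input_tokens=1000, output_tokens=500)
```

Call `record` wherever a response comes back, using the token counts from the response's usage metadata, and compare `total()` against the task budget before each replan.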

Further reading

  • Anthropic — “Building effective agents” — the December 2024 piece that draws the workflow-vs-agent line and lays out the five orthogonal patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer). Read alongside today’s article for the full taxonomy of “where to put the LLM calls.”
  • LangChain — “Plan-and-Execute Agents” — the canonical engineering writeup on the planner/executor split, with reference implementations for plan-and-execute, ReWOO, and LLMCompiler. The clearest single source on the in-between architectures named in this article.
  • Tree-of-Thoughts (Yao et al., NeurIPS 2023) — the paper that demonstrated deliberate search over candidate plans, with the eye-popping Game-of-24 result (4% → 74%). The cleanest reference for when an agent should plan multiple candidates before committing to one.
  • LLMCompiler (Kim et al., ICML 2024) — the DAG-emitting planner with parallel execution, reporting 3.7× latency speedup and 6.7× cost reduction over ReAct on tasks with exposable dependency structure. The clearest argument that “agent” and “make” can be the same thing.
  • The Agent Loop: ReAct and Its Descendants — the loop body that today’s piece sits on top of. ReAct is the reactive baseline; plan-and-execute is the alternative; both share the same six-step iteration covered in detail there.
  • Long-Horizon Task Reliability — the cost-of-replanning math from this piece composes with the cost-of-restart math there. Replanning is cheap drift correction; restart-from-checkpoint is expensive drift correction; the choice depends on how much state you’d lose to a restart.
  • Structured Output: JSON Mode and Schema Coercion — the schema-coercion mechanics behind the typed Plan object. Plan-and-execute is a structured-output use case at its core: the model’s primary deliverable is a JSON list of typed steps.
  • Multi-Agent Orchestration — the next step up from the single-agent control-flow choice. When planning vs reacting isn’t the right axis any more — when the task wants parallel subagents with isolated context windows — supervisor/swarm/hierarchical patterns are how you scale out without losing the termination story.