Models & Algorithms

Retrieval Planning: ReAct vs Self-Ask vs Plan-and-Solve

Now that we've diagnosed Query Planning failures, it's time to fix them. Let's compare when each of these three patterns shines.

Why Retrieval Planning?

In the previous post, we examined three failure points in Query Planning:

  • Decomposition: Breaking questions incorrectly
  • Sequencing: Wrong execution order
  • Grounding: Queries not matching documents

Three main approaches solve these problems:

| Pattern | Core Idea | One-liner |
|---|---|---|
| **ReAct** | Think → Act → Observe loop | "Take a step, see what happens, think again" |
| **Self-Ask** | Generate follow-up questions | "What do I need to know first to answer this?" |
| **Plan-and-Solve** | Plan everything first, then execute | "Draw the map, then start walking" |

Pattern 1: ReAct (Reasoning + Acting)

Core Structure

text
Thought → Action → Observation → Thought → Action → ... → Answer

ReAct alternates between reasoning and acting at each step. It decides the next action based on search results, adapting flexibly to unexpected situations.

How It Works

python
class ReActAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str, max_steps: int = 5) -> str:
        context = f"Question: {query}\n"

        for step in range(max_steps):
            # 1. Thought: Reason about what to do next
            thought = self.llm.generate(
                f"{context}\nThought {step+1}:"
            )
            context += f"Thought {step+1}: {thought}\n"

            # Check termination
            if "Final Answer:" in thought:
                return self.extract_answer(thought)

            # 2. Action: Decide search query
            action = self.llm.generate(
                f"{context}\nAction {step+1}: Search["
            )
            search_query = action.split("]")[0]
            context += f"Action {step+1}: Search[{search_query}]\n"

            # 3. Observation: Execute search and observe results
            results = self.retriever.search(search_query)
            observation = self.format_results(results)
            context += f"Observation {step+1}: {observation}\n"

        return "Could not find answer within max steps"

Execution Example

text
Question: What did Microsoft's CEO say when OpenAI's CEO was fired?

Thought 1: I need to find out when OpenAI's CEO was fired first.
Action 1: Search[OpenAI CEO fired date]
Observation 1: Sam Altman was fired by OpenAI's board on November 17, 2023.

Thought 2: Now I need to find what Microsoft's CEO said on that date.
Action 2: Search[Satya Nadella November 17 2023 Sam Altman]
Observation 2: Satya Nadella expressed support for Sam Altman and...

Thought 3: I have enough information to answer.
Final Answer: Satya Nadella expressed support for Sam Altman...

Pros and Cons

| Pros | Cons |
|---|---|
| Dynamic adaptation: changes path based on results | High token usage: full context sent each step |
| Easy debugging: Thoughts expose the reasoning | Infinite loop risk: needs termination conditions |
| Strong exception handling: explores alternatives | Low consistency: different paths for the same question |

When to Use

  • Questions are unpredictable (diverse domains, open-ended)
  • Strategy needs to change based on results
  • Debugging is important (need to trace reasoning)

Pattern 2: Self-Ask

Core Structure

text
Question → Follow-up Question → Intermediate Answer → ... → Final Answer

Self-Ask repeatedly asks "What do I need to know first to answer this?" It explicitly generates sub-questions, answers each, then combines for the final answer.

How It Works

python
class SelfAskAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        context = f"Question: {query}\n"
        context += "Are follow-up questions needed here: "

        while True:
            # Decide if follow-up needed
            needs_followup = self.llm.generate(context)

            if "No" in needs_followup or "Final Answer" in needs_followup:
                # Generate final answer
                final = self.llm.generate(
                    f"{context}\nSo the final answer is:"
                )
                return final

            # Generate follow-up question
            context += "Yes.\n"
            followup = self.llm.generate(
                f"{context}Follow-up question:"
            )
            context += f"Follow-up question: {followup}\n"

            # Search and answer follow-up
            results = self.retriever.search(followup)
            intermediate = self.generate_intermediate_answer(followup, results)
            context += f"Intermediate answer: {intermediate}\n"
            context += "Are follow-up questions needed here: "

Execution Example

text
Question: Who was CEO before Sam Altman returned?

Are follow-up questions needed here: Yes.
Follow-up question: When did Sam Altman return as OpenAI CEO?
Intermediate answer: He returned on November 22, 2023.

Are follow-up questions needed here: Yes.
Follow-up question: Who was OpenAI CEO just before November 22, 2023?
Intermediate answer: Emmett Shear was interim CEO from November 20.

Are follow-up questions needed here: No.
So the final answer is: The CEO before Sam Altman's return was Emmett Shear.

Pros and Cons

| Pros | Cons |
|---|---|
| Structured decomposition: explicit sub-questions | Depth limited: too many hops degrade performance |
| Intermediate answers cacheable (see the sketch below) | Weak branching: optimized for linear chains |
| Easy verification: each intermediate answer can be checked | Hard to parallelize: sequential dependencies |
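
Because each sub-question is a plain string, intermediate answers are straightforward to memoize. A minimal sketch of the cacheable-intermediates idea, assuming a hypothetical standalone answer_followup helper backed by the same retriever and LLM calls used above (not part of the SelfAskAgent itself):

python
from functools import lru_cache

@lru_cache(maxsize=256)
def answer_followup(question: str) -> str:
    """Hypothetical helper: answer a sub-question once, serve repeats from the cache."""
    results = retriever.search(question)  # assumes a module-level retriever, as in the other snippets
    return generate_intermediate_answer(question, results)  # assumed LLM-backed helper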

When to Use

  • Chain-structured multi-hop questions (A → B → C)
  • Need to cache or verify intermediate results
  • Question decomposition structure is clear

Pattern 3: Plan-and-Solve

Core Structure

text
Question → Plan (all steps) → Execute Step 1 → Execute Step 2 → ... → Answer

Plan-and-Solve creates a complete plan first, then executes sequentially. Dependencies and parallelization are identified during planning.

How It Works

python
class PlanAndSolveAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        # 1. Planning: Create complete plan
        plan = self.create_plan(query)

        # 2. Execution: Follow the plan
        results = {}
        for step in plan.steps:
            # Inject results from dependent steps
            resolved_query = self.resolve_dependencies(step, results)

            # Execute search
            search_results = self.retriever.search(resolved_query)
            results[step.id] = self.extract_answer(step, search_results)

        # 3. Synthesis: Combine results
        return self.synthesize(query, results)

    def create_plan(self, query: str) -> Plan:
        prompt = f"""
        Question: {query}

        Create a step-by-step plan to answer this question.
        For each step, specify:
        - id: unique identifier for the step
        - query: what to search for
        - depends_on: list of ids this step depends on (empty if none)

        Output as JSON.
        """
        plan_json = self.llm.generate(prompt)
        return Plan.from_json(plan_json)
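
The create_plan method parses the model's JSON into a Plan object that the snippet never defines. A minimal sketch of what Plan and its steps might look like; the field names mirror the prompt and the execution example below, everything else is an assumption:

python
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanStep:
    id: str
    query: str
    depends_on: List[str] = field(default_factory=list)

@dataclass
class Plan:
    steps: List[PlanStep]

    @classmethod
    def from_json(cls, raw: str) -> "Plan":
        # Parse the model's JSON plan into typed step objects
        data = json.loads(raw)
        return cls(steps=[PlanStep(**step) for step in data["steps"]])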

Execution Example

text
Question: How did Tesla's stock and competitors react after they cut prices?

=== PLANNING PHASE ===
{
  "steps": [
    {"id": "s1", "query": "Tesla price cut date", "depends_on": []},
    {"id": "s2", "query": "Tesla stock reaction {s1.date}", "depends_on": ["s1"]},
    {"id": "s3", "query": "Competitor reaction Tesla price cut", "depends_on": ["s1"]},
    {"id": "s4", "query": "Synthesis", "depends_on": ["s2", "s3"]}
  ]
}

=== EXECUTION PHASE ===
Step s1: Tesla cut prices on January 13, 2023.
Step s2 (parallel): Tesla stock rose 8%.
Step s3 (parallel): Competitors responded with their own price cuts.
Step s4: [Synthesis]

=== FINAL ANSWER ===
After Tesla's January 2023 price cut, its stock rose 8%,
and competitors responded with their own cuts.
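
Note that the run() sketch above walks the plan strictly in order. To actually execute s2 and s3 in parallel as shown, steps can be grouped by dependency and each ready batch run concurrently. A minimal sketch, assuming the Plan/PlanStep structures sketched earlier and a hypothetical execute_step(step, results) callable that resolves dependencies, searches, and extracts an answer:

python
from concurrent.futures import ThreadPoolExecutor

def execute_plan_parallel(plan, execute_step):
    """Run plan steps in dependency order, executing each ready batch in parallel."""
    results, remaining = {}, list(plan.steps)
    while remaining:
        # A step is ready once every step it depends on has a result
        ready = [s for s in remaining if all(d in results for d in s.depends_on)]
        if not ready:
            raise ValueError("Plan contains circular or unresolved dependencies")
        with ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda s: execute_step(s, results), ready))
        for step, answer in zip(ready, answers):
            results[step.id] = answer
        ready_ids = {s.id for s in ready}
        remaining = [s for s in remaining if s.id not in ready_ids]
    return results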

Pros and Cons

| Pros | Cons |
|---|---|
| Parallel execution: dependencies enable optimization | Hard to modify: difficult to change path mid-execution |
| Predictable: the plan can be reviewed before execution | Plan failure means total failure |
| Efficient: minimizes unnecessary searches | Complex questions degrade plan quality |

When to Use

  • Question structure is predictable
  • Need parallel processing for speed
  • Need to review/approve plan before execution

Pattern Comparison

Structural Comparison

text
ReAct:        Think → Act → Observe → Think → Act → ... (loop)
Self-Ask:     Question → Follow-up → Answer → Follow-up → ... (chain)
Plan-Solve:   Plan all steps → Execute s1 → Execute s2 → ... (sequential/parallel)

Detailed Comparison

| Criteria | ReAct | Self-Ask | Plan-and-Solve |
|---|---|---|---|
| **Adaptability** | High (re-evaluates each step) | Medium | Low (plan is fixed) |
| **Efficiency** | Low (high token usage) | Medium | High (parallelizable) |
| **Predictability** | Low | Medium | High |
| **Debugging** | Easy (trace Thoughts) | Easy (trace Follow-ups) | Medium |
| **Complex questions** | Strong | Medium | Weak (depends on plan quality) |
| **Implementation** | Medium | Easy | Hard |

Decision Flow

text
Assess question type
    │
    ├─ Unpredictable, open-ended ──────────→ ReAct
    │
    ├─ Clear chain structure (A→B→C) ──────→ Self-Ask
    │
    └─ Parallelizable, clear structure ────→ Plan-and-Solve
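
This decision flow can be encoded as a thin router in front of the three agents. A minimal sketch; classify_question is a hypothetical helper (e.g. an LLM call that labels the question), and the agent classes are the ones sketched earlier in this post:

python
def route(query: str, llm, retriever) -> str:
    """Dispatch to an agent based on a coarse question-type label."""
    # Hypothetical helper: labels the question as "open_ended", "chain", or "structured"
    question_type = classify_question(llm, query)

    if question_type == "open_ended":
        return ReActAgent(llm, retriever).run(query)
    if question_type == "chain":
        return SelfAskAgent(llm, retriever).run(query)
    return PlanAndSolveAgent(llm, retriever).run(query)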

Hybrid Approaches: Mix Them in Practice

In production, hybrids are often more effective than pure patterns.

Plan-then-ReAct

python
class HybridAgent:
    """Start with Plan-and-Solve, fall back to ReAct on failure"""

    def run(self, query: str) -> str:
        # 1. Try planning first
        plan = self.create_plan(query)

        # 2. Execute plan, collecting per-step results
        results = {}
        for step in plan.steps:
            try:
                result = self.execute_step(step)
                if not self.is_valid(result):
                    raise InvalidResultError()
                results[step.id] = result
            except Exception:
                # 3. Fall back to ReAct on failure
                return self.react_fallback(query, step)

        return self.synthesize(results)

    def react_fallback(self, query: str, failed_step: Step) -> str:
        """Switch to ReAct mode for flexible resolution"""
        context = f"Original question: {query}\n"
        context += f"Failed at: {failed_step.query}\n"
        context += "Switching to exploratory mode...\n"

        return self.react_agent.run(context)

Self-Ask with Parallel Execution

python
from concurrent.futures import ThreadPoolExecutor

class ParallelSelfAsk:
    """Decompose with Self-Ask, execute independent questions in parallel"""

    def run(self, query: str) -> str:
        # 1. Generate all follow-up questions first
        followups = self.generate_all_followups(query)

        # 2. Analyze dependencies
        deps = self.analyze_dependencies(followups)

        # 3. Parallel for independent, sequential for dependent
        results = {}
        for group in self.topological_groups(deps):
            # Same-group questions depend only on earlier groups,
            # so they can be answered concurrently
            with ThreadPoolExecutor() as pool:
                answers = list(pool.map(self.answer_followup, group))
            results.update(dict(zip(group, answers)))

        return self.synthesize(query, results)

Implementation Tips

1. Clear Termination Conditions

python
# Prevent infinite loops in ReAct
MAX_STEPS = 7
CONFIDENCE_THRESHOLD = 0.8

def should_stop(thought: str, step: int, confidence: float) -> bool:
    if step >= MAX_STEPS:
        return True
    if "Final Answer" in thought:
        return True
    if confidence > CONFIDENCE_THRESHOLD:
        return True
    return False
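
The snippet leaves open where the confidence value comes from. One option, purely an assumption here, is to have the model self-rate after each observation:

python
def estimate_confidence(llm, context: str) -> float:
    """Ask the model to rate how confident it is that the gathered
    evidence already answers the question (0.0 to 1.0)."""
    raw = llm.generate(
        f"{context}\nOn a scale from 0 to 1, how confident are you that the "
        f"information above fully answers the question? Reply with a number only."
    )
    try:
        return max(0.0, min(1.0, float(raw.strip())))
    except ValueError:
        return 0.0  # Unparseable reply: treat as low confidence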

2. Search Failure Handling

python
def search_with_fallback(query: str) -> List[Document]:
    # Primary: Exact search
    results = retriever.search(query)
    if results:
        return results

    # Secondary: Query expansion
    expanded = expand_query(query)
    results = retriever.search(expanded)
    if results:
        return results

    # Tertiary: Keyword extraction
    keywords = extract_keywords(query)
    return retriever.search(" ".join(keywords))
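
expand_query and extract_keywords are left undefined above. Minimal sketches of both, assuming a module-level llm as in the other snippets; the prompt wording and the stopword list are illustrative only:

python
import re
from typing import List

STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "or",
             "what", "when", "who", "how", "did", "was", "is"}

def expand_query(query: str) -> str:
    """Ask the LLM to rephrase the query with synonyms and related terms."""
    return llm.generate(
        f"Rewrite this search query with synonyms and related terms, keep it short: {query}"
    )

def extract_keywords(query: str) -> List[str]:
    """Keep only the content-bearing tokens."""
    tokens = re.findall(r"[A-Za-z0-9']+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]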

3. Context Compression

python
def compress_context(context: str, max_tokens: int = 2000) -> str:
    """Compress long context to save tokens"""
    if count_tokens(context) <= max_tokens:
        return context

    # Keep only recent N steps
    steps = parse_steps(context)
    recent = steps[-3:]  # Last 3 steps

    # Summarize earlier steps
    summary = summarize(steps[:-3])

    return f"[Summary of earlier steps: {summary}]\n" + format_steps(recent)

Conclusion

The three patterns are complementary, not competing.

text
ReAct:         Maximum flexibility, strong for unpredictable questions
Self-Ask:      Structured decomposition, optimal for chain questions
Plan-Solve:    Maximum efficiency, enables parallelization and review

Production Recommendations:

  1. Default to Plan-and-Solve (efficient)
  2. Fall back to ReAct on plan failure (flexible)
  3. Use Self-Ask for clear chain questions (structured)

Multi-hop RAG performance ultimately depends on choosing the right pattern for the situation.
