Models & Algorithms

Retrieval Planning: ReAct vs Self-Ask vs Plan-and-Solve

Now that we've diagnosed Query Planning failures, it's time to fix them. Let's compare when each of these three patterns shines.

Why Retrieval Planning?

In the previous post, we examined three failure points in Query Planning:

  • Decomposition: Breaking questions incorrectly
  • Sequencing: Wrong execution order
  • Grounding: Queries not matching documents

Three main approaches solve these problems:

| Pattern | Core Idea | One-liner |
|---|---|---|
| **ReAct** | Think → Act → Observe loop | "Take a step, see what happens, think again" |
| **Self-Ask** | Generate follow-up questions | "What do I need to know first to answer this?" |
| **Plan-and-Solve** | Plan everything first, then execute | "Draw the map, then start walking" |

Pattern 1: ReAct (Reasoning + Acting)

Core Structure

text
Thought → Action → Observation → Thought → Action → ... → Answer

ReAct alternates between reasoning and acting at each step. It decides the next action based on search results, adapting flexibly to unexpected situations.

How It Works

python
class ReActAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str, max_steps: int = 5) -> str:
        context = f"Question: {query}\n"

        for step in range(max_steps):
            # 1. Thought: Reason about what to do next
            thought = self.llm.generate(
                f"{context}\nThought {step+1}:"
            )
            context += f"Thought {step+1}: {thought}\n"

            # Check termination
            if "Final Answer:" in thought:
                return self.extract_answer(thought)

            # 2. Action: Decide search query
            action = self.llm.generate(
                f"{context}\nAction {step+1}: Search["
            )
            search_query = action.split("]")[0]
            context += f"Action {step+1}: Search[{search_query}]\n"

            # 3. Observation: Execute search and observe results
            results = self.retriever.search(search_query)
            observation = self.format_results(results)
            context += f"Observation {step+1}: {observation}\n"

        return "Could not find answer within max steps"

Execution Example

text
Question: What did Microsoft's CEO say when OpenAI's CEO was fired?

Thought 1: I need to find out when OpenAI's CEO was fired first.
Action 1: Search[OpenAI CEO fired date]
Observation 1: Sam Altman was fired by OpenAI's board on November 17, 2023.

Thought 2: Now I need to find what Microsoft's CEO said on that date.
Action 2: Search[Satya Nadella November 17 2023 Sam Altman]
Observation 2: Satya Nadella expressed support for Sam Altman and...

Thought 3: I have enough information to answer.
Final Answer: Satya Nadella expressed support for Sam Altman...

Pros and Cons

| Pros | Cons |
|---|---|
| Dynamic adaptation: changes path based on results | High token usage: full context sent each step |
| Easy debugging: Thoughts expose the reasoning | Infinite loop risk: needs termination conditions |
| Strong exception handling: explores alternatives | Low consistency: different paths for the same question |

When to Use

  • Questions are unpredictable (diverse domains, open-ended)
  • Strategy needs to change based on results
  • Debugging is important (need to trace reasoning)

Pattern 2: Self-Ask

Core Structure

text
Question → Follow-up Question → Intermediate Answer → ... → Final Answer

Self-Ask repeatedly asks "What do I need to know first to answer this?" It explicitly generates sub-questions, answers each, then combines for the final answer.

How It Works

python
class SelfAskAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        context = f"Question: {query}\n"
        context += "Are follow-up questions needed here: "

        while True:
            # Decide if follow-up needed
            needs_followup = self.llm.generate(context)

            if "No" in needs_followup or "Final Answer" in needs_followup:
                # Generate final answer
                final = self.llm.generate(
                    f"{context}\nSo the final answer is:"
                )
                return final

            # Generate follow-up question
            context += "Yes.\n"
            followup = self.llm.generate(
                f"{context}Follow-up question:"
            )
            context += f"Follow-up question: {followup}\n"

            # Search and answer follow-up
            results = self.retriever.search(followup)
            intermediate = self.generate_intermediate_answer(followup, results)
            context += f"Intermediate answer: {intermediate}\n"
            context += "Are follow-up questions needed here: "

Execution Example

text
Question: Who was CEO before Sam Altman returned?

Are follow-up questions needed here: Yes.
Follow-up question: When did Sam Altman return as OpenAI CEO?
Intermediate answer: He returned on November 22, 2023.

Are follow-up questions needed here: Yes.
Follow-up question: Who was OpenAI CEO just before November 22, 2023?
Intermediate answer: Emmett Shear was interim CEO from November 20.

Are follow-up questions needed here: No.
So the final answer is: The CEO before Sam Altman's return was Emmett Shear.

Pros and Cons

| Pros | Cons |
|---|---|
| Structured decomposition: explicit sub-questions | Depth limited: too many hops degrade performance |
| Intermediate answers cacheable (see the sketch below) | Weak branching: optimized for linear chains |
| Easy verification: each intermediate answer can be checked | Hard to parallelize: sequential dependencies |
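
Because each sub-question is a plain string, intermediate answers are straightforward to memoize. A minimal sketch of the cacheable-intermediates idea, assuming a hypothetical standalone answer_followup helper backed by the same retriever and LLM calls used above (not part of the SelfAskAgent itself):

python
from functools import lru_cache

@lru_cache(maxsize=256)
def answer_followup(question: str) -> str:
    """Hypothetical helper: answer a sub-question once, serve repeats from the cache."""
    results = retriever.search(question)  # assumes a module-level retriever, as in the other snippets
    return generate_intermediate_answer(question, results)  # assumed LLM-backed helper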

When to Use

  • Chain-structured multi-hop questions (A → B → C)
  • Need to cache or verify intermediate results
  • Question decomposition structure is clear

Pattern 3: Plan-and-Solve

Core Structure

text
Question → Plan (all steps) → Execute Step 1 → Execute Step 2 → ... → Answer

Plan-and-Solve creates a complete plan first, then executes sequentially. Dependencies and parallelization are identified during planning.

How It Works

python
class PlanAndSolveAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        # 1. Planning: Create complete plan
        plan = self.create_plan(query)

        # 2. Execution: Follow the plan
        results = {}
        for step in plan.steps:
            # Inject results from dependent steps
            resolved_query = self.resolve_dependencies(step, results)

            # Execute search
            search_results = self.retriever.search(resolved_query)
            results[step.id] = self.extract_answer(step, search_results)

        # 3. Synthesis: Combine results
        return self.synthesize(query, results)

    def create_plan(self, query: str) -> Plan:
        prompt = f"""
        Question: {query}

        Create a step-by-step plan to answer this question.
        For each step, specify:
        - id: unique identifier for the step
        - query: what to search for
        - depends_on: list of ids this step depends on (empty if none)

        Output as JSON.
        """
        plan_json = self.llm.generate(prompt)
        return Plan.from_json(plan_json)
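
The create_plan method parses the model's JSON into a Plan object that the snippet never defines. A minimal sketch of what Plan and its steps might look like; the field names mirror the prompt and the execution example below, everything else is an assumption:

python
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanStep:
    id: str
    query: str
    depends_on: List[str] = field(default_factory=list)

@dataclass
class Plan:
    steps: List[PlanStep]

    @classmethod
    def from_json(cls, raw: str) -> "Plan":
        # Parse the model's JSON plan into typed step objects
        data = json.loads(raw)
        return cls(steps=[PlanStep(**step) for step in data["steps"]])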

Execution Example

text
Question: How did Tesla's stock and competitors react after they cut prices?

=== PLANNING PHASE ===
{
  "steps": [
    {"id": "s1", "query": "Tesla price cut date", "depends_on": []},
    {"id": "s2", "query": "Tesla stock reaction {s1.date}", "depends_on": ["s1"]},
    {"id": "s3", "query": "Competitor reaction Tesla price cut", "depends_on": ["s1"]},
    {"id": "s4", "query": "Synthesis", "depends_on": ["s2", "s3"]}
  ]
}

=== EXECUTION PHASE ===
Step s1: Tesla cut prices on January 13, 2023.
Step s2 (parallel): Tesla stock rose 8%.
Step s3 (parallel): Competitors responded with their own price cuts.
Step s4: [Synthesis]

=== FINAL ANSWER ===
After Tesla's January 2023 price cut, its stock rose 8%,
and competitors responded with their own cuts.
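
Note that the run() sketch above walks the plan strictly in order. To actually execute s2 and s3 in parallel as shown, steps can be grouped by dependency and each ready batch run concurrently. A minimal sketch, assuming the Plan/PlanStep structures sketched earlier and a hypothetical execute_step(step, results) callable that resolves dependencies, searches, and extracts an answer:

python
from concurrent.futures import ThreadPoolExecutor

def execute_plan_parallel(plan, execute_step):
    """Run plan steps in dependency order, executing each ready batch in parallel."""
    results, remaining = {}, list(plan.steps)
    while remaining:
        # A step is ready once every step it depends on has a result
        ready = [s for s in remaining if all(d in results for d in s.depends_on)]
        if not ready:
            raise ValueError("Plan contains circular or unresolved dependencies")
        with ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda s: execute_step(s, results), ready))
        for step, answer in zip(ready, answers):
            results[step.id] = answer
        ready_ids = {s.id for s in ready}
        remaining = [s for s in remaining if s.id not in ready_ids]
    return results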

Pros and Cons

| Pros | Cons |
|---|---|
| Parallel execution: dependencies enable optimization | Hard to modify: difficult to change path mid-execution |
| Predictable: the plan can be reviewed before execution | Plan failure means total failure |
| Efficient: minimizes unnecessary searches | Complex questions degrade plan quality |

When to Use

  • Question structure is predictable
  • Need parallel processing for speed
  • Need to review/approve plan before execution

Pattern Comparison

Structural Comparison

text
ReAct:        Think → Act → Observe → Think → Act → ... (loop)
Self-Ask:     Question → Follow-up → Answer → Follow-up → ... (chain)
Plan-Solve:   Plan all steps → Execute s1 → Execute s2 → ... (sequential/parallel)

Detailed Comparison

| Criteria | ReAct | Self-Ask | Plan-and-Solve |
|---|---|---|---|
| **Adaptability** | High (re-evaluates each step) | Medium | Low (plan is fixed) |
| **Efficiency** | Low (high token usage) | Medium | High (parallelizable) |
| **Predictability** | Low | Medium | High |
| **Debugging** | Easy (trace Thoughts) | Easy (trace Follow-ups) | Medium |
| **Complex questions** | Strong | Medium | Weak (depends on plan quality) |
| **Implementation** | Medium | Easy | Hard |

Decision Flow

text
Assess question type
    │
    ├─ Unpredictable, open-ended ──────────→ ReAct
    │
    ├─ Clear chain structure (A→B→C) ──────→ Self-Ask
    │
    └─ Parallelizable, clear structure ────→ Plan-and-Solve
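
This decision flow can be encoded as a thin router in front of the three agents. A minimal sketch; classify_question is a hypothetical helper (e.g. an LLM call that labels the question), and the agent classes are the ones sketched earlier in this post:

python
def route(query: str, llm, retriever) -> str:
    """Dispatch to an agent based on a coarse question-type label."""
    # Hypothetical helper: labels the question as "open_ended", "chain", or "structured"
    question_type = classify_question(llm, query)

    if question_type == "open_ended":
        return ReActAgent(llm, retriever).run(query)
    if question_type == "chain":
        return SelfAskAgent(llm, retriever).run(query)
    return PlanAndSolveAgent(llm, retriever).run(query)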

Hybrid Approaches: Mix Them in Practice

In production, hybrids are often more effective than pure patterns.

Plan-then-ReAct

python
class HybridAgent:
    """Start with Plan-and-Solve, fall back to ReAct on failure"""

    def run(self, query: str) -> str:
        # 1. Try planning first
        plan = self.create_plan(query)

        # 2. Execute plan, collecting per-step results
        results = {}
        for step in plan.steps:
            try:
                result = self.execute_step(step)
                if not self.is_valid(result):
                    raise InvalidResultError()
                results[step.id] = result
            except Exception:
                # 3. Fall back to ReAct on failure
                return self.react_fallback(query, step)

        return self.synthesize(results)

    def react_fallback(self, query: str, failed_step: Step) -> str:
        """Switch to ReAct mode for flexible resolution"""
        context = f"Original question: {query}\n"
        context += f"Failed at: {failed_step.query}\n"
        context += "Switching to exploratory mode...\n"

        return self.react_agent.run(context)

Self-Ask with Parallel Execution

python
from concurrent.futures import ThreadPoolExecutor

class ParallelSelfAsk:
    """Decompose with Self-Ask, execute independent questions in parallel"""

    def run(self, query: str) -> str:
        # 1. Generate all follow-up questions first
        followups = self.generate_all_followups(query)

        # 2. Analyze dependencies
        deps = self.analyze_dependencies(followups)

        # 3. Parallel for independent, sequential for dependent
        results = {}
        for group in self.topological_groups(deps):
            # Same-group questions depend only on earlier groups,
            # so they can be answered concurrently
            with ThreadPoolExecutor() as pool:
                answers = list(pool.map(self.answer_followup, group))
            results.update(dict(zip(group, answers)))

        return self.synthesize(query, results)

Implementation Tips

1. Clear Termination Conditions

python
# Prevent infinite loops in ReAct
MAX_STEPS = 7
CONFIDENCE_THRESHOLD = 0.8

def should_stop(thought: str, step: int, confidence: float) -> bool:
    if step >= MAX_STEPS:
        return True
    if "Final Answer" in thought:
        return True
    if confidence > CONFIDENCE_THRESHOLD:
        return True
    return False
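
The snippet leaves open where the confidence value comes from. One option, purely an assumption here, is to have the model self-rate after each observation:

python
def estimate_confidence(llm, context: str) -> float:
    """Ask the model to rate how confident it is that the gathered
    evidence already answers the question (0.0 to 1.0)."""
    raw = llm.generate(
        f"{context}\nOn a scale from 0 to 1, how confident are you that the "
        f"information above fully answers the question? Reply with a number only."
    )
    try:
        return max(0.0, min(1.0, float(raw.strip())))
    except ValueError:
        return 0.0  # Unparseable reply: treat as low confidence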

2. Search Failure Handling

python
def search_with_fallback(query: str) -> List[Document]:
    # Primary: Exact search
    results = retriever.search(query)
    if results:
        return results

    # Secondary: Query expansion
    expanded = expand_query(query)
    results = retriever.search(expanded)
    if results:
        return results

    # Tertiary: Keyword extraction
    keywords = extract_keywords(query)
    return retriever.search(" ".join(keywords))
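
expand_query and extract_keywords are left undefined above. Minimal sketches of both, assuming a module-level llm as in the other snippets; the prompt wording and the stopword list are illustrative only:

python
import re
from typing import List

STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "or",
             "what", "when", "who", "how", "did", "was", "is"}

def expand_query(query: str) -> str:
    """Ask the LLM to rephrase the query with synonyms and related terms."""
    return llm.generate(
        f"Rewrite this search query with synonyms and related terms, keep it short: {query}"
    )

def extract_keywords(query: str) -> List[str]:
    """Keep only the content-bearing tokens."""
    tokens = re.findall(r"[A-Za-z0-9']+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]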

3. Context Compression

python
def compress_context(context: str, max_tokens: int = 2000) -> str:
    """Compress long context to save tokens"""
    if count_tokens(context) <= max_tokens:
        return context

    # Keep only recent N steps
    steps = parse_steps(context)
    recent = steps[-3:]  # Last 3 steps

    # Summarize earlier steps
    summary = summarize(steps[:-3])

    return f"[Summary of earlier steps: {summary}]\n" + format_steps(recent)

Conclusion

The three patterns are complementary, not competing.

text
ReAct:         Maximum flexibility, strong for unpredictable questions
Self-Ask:      Structured decomposition, optimal for chain questions
Plan-Solve:    Maximum efficiency, enables parallelization and review

Production Recommendations:

  1. Default to Plan-and-Solve (efficient)
  2. Fall back to ReAct on plan failure (flexible)
  3. Use Self-Ask for clear chain questions (structured)

Multi-hop RAG performance ultimately depends on choosing the right pattern for the situation.
