Temporal RAG: Why RAG Always Gets 'When' Questions Wrong
"Who was the CEO in 2023?" "What about now?" — Why RAG gives wrong answers to these simple questions, and how to fix it.

Introduction: RAG's Time Blindness
Ask your RAG system this question:
"Who is OpenAI's CEO?"
Answer: "Sam Altman."
Good. Now ask this:
"Who was OpenAI's CEO on November 18, 2023?"
Answer: "Sam Altman."
Wrong. Sam Altman was fired on November 17, 2023, and did not return until November 22; on November 18, OpenAI's interim CEO was Mira Murati.
More Failure Cases
| Question | Expected Answer | RAG Answer | Problem |
|---|---|---|---|
| "Tesla stock price in 2022?" | $100-400 range | "Currently $248" | Ignores timeframe |
| "Last year vs this year Apple earnings?" | Comparison analysis | Mixed data | Time confusion |
| "Who is Twitter's CEO?" | "Linda Yaccarino" (2024) | "Elon Musk" (2022) | Stale data |
| "COVID cases status" | Latest data | 2021 peak data | Recency failure |
| "Company policy back then?" | Historical policy | Current policy | Can't track past |
Why Does This Happen?
The Fundamental Limitation of Embeddings
Vector embeddings capture only semantic similarity; they encode nothing about when a statement was true.
# These two sentences have very high embedding similarity
text1 = "Sam Altman is the CEO of OpenAI" # 2024 document
text2 = "Sam Altman is the CEO of OpenAI" # 2020 document
# But also high similarity with this
text3 = "Mira Murati is the CEO of OpenAI" # Nov 2023 document
# Embeddings don't know 'when'
similarity(embed(text1), embed(text2)) ≈ 1.0 # Same content
similarity(embed(text1), embed(text3)) ≈ 0.85 # Both relevant to CEO question
Types of Temporal Questions
1. Point-in-Time Questions
- "What was Q3 2023 revenue?"
- "What was the policy back then?"
- "What happened this time last year?"
2. Time Range Questions
- "Changes from 2020 to 2023"
- "Trends over the last 3 months"
- "What's different this year?"
3. Relative Time Questions
- "Recent news" (when exactly?)
- "How was it before?" (how far back?)
- "What changed since then?"
4. Temporal Comparison Questions
- "Year-over-year growth rate"
- "Before vs after policy change"
- "Performance before and after CEO change"
5. Time Series Questions
- "Quarterly revenue trends"
- "Annual user growth"
- "Monthly traffic changes"
Solution 1: Metadata Filtering
The most basic approach. Add time metadata to documents and filter during search.
Implementation
from datetime import datetime, timedelta
from typing import Optional
import chromadb

class TemporalVectorStore:
    """Time-aware vector store"""

    def __init__(self):
        self.client = chromadb.Client()
        self.collection = self.client.create_collection("temporal_docs")

    def add_document(self, doc_id: str, text: str, timestamp: datetime,
                     source: Optional[str] = None):
        """Add a document with time metadata"""
        self.collection.add(
            ids=[doc_id],
            documents=[text],
            metadatas=[{
                # Store epoch seconds: Chroma's $gte/$lte operators compare numbers
                "timestamp": timestamp.timestamp(),
                "iso_date": timestamp.isoformat(),
                "year": timestamp.year,
                "month": timestamp.month,
                "quarter": (timestamp.month - 1) // 3 + 1,
                "source": source or "unknown"
            }]
        )

    def query_with_time_filter(
        self,
        query: str,
        start_date: Optional[datetime] = None,
        end_date: Optional[datetime] = None,
        top_k: int = 5
    ) -> dict:
        """Time-filtered search"""
        where_filter = {}
        if start_date and end_date:
            where_filter = {
                "$and": [
                    {"timestamp": {"$gte": start_date.timestamp()}},
                    {"timestamp": {"$lte": end_date.timestamp()}}
                ]
            }
        elif start_date:
            where_filter = {"timestamp": {"$gte": start_date.timestamp()}}
        elif end_date:
            where_filter = {"timestamp": {"$lte": end_date.timestamp()}}

        return self.collection.query(
            query_texts=[query],
            n_results=top_k,
            where=where_filter if where_filter else None
        )
Temporal Expression Parsing
import re
from datetime import datetime, timedelta

class TemporalQueryParser:
    """Extract temporal information from queries"""

    def parse(self, query: str, reference_date: datetime = None) -> dict:
        """Extract a time range from the query"""
        if reference_date is None:
            reference_date = datetime.now()

        result = {
            "original_query": query,
            "start_date": None,
            "end_date": None,
            "temporal_type": "none"
        }

        # Absolute year, e.g. "revenue in 2023"
        year_match = re.search(r'\b(19|20)\d{2}\b', query)
        if year_match:
            year = int(year_match.group(0))
            result["start_date"] = datetime(year, 1, 1)
            result["end_date"] = datetime(year, 12, 31)
            result["temporal_type"] = "absolute_year"
            return result

        # Recent N days, e.g. "last 30 days"
        recent_days = re.search(r'(last|recent|past)\s*(\d+)\s*days?', query, re.I)
        if recent_days:
            days = int(recent_days.group(2))
            result["start_date"] = reference_date - timedelta(days=days)
            result["end_date"] = reference_date
            result["temporal_type"] = "relative_recent"
            return result

        # Last year / this year
        if 'last year' in query.lower():
            last_year = reference_date.year - 1
            result["start_date"] = datetime(last_year, 1, 1)
            result["end_date"] = datetime(last_year, 12, 31)
            result["temporal_type"] = "relative_year"
            return result
        if 'this year' in query.lower():
            result["start_date"] = datetime(reference_date.year, 1, 1)
            result["end_date"] = reference_date
            result["temporal_type"] = "relative_year"
            return result

        # Current / now: interpret as the last 7 days
        if any(kw in query.lower() for kw in ['current', 'now', 'today']):
            result["start_date"] = reference_date - timedelta(days=7)
            result["end_date"] = reference_date
            result["temporal_type"] = "current"
            return result

        return result
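Wiring the parser and the store together gives a minimal end-to-end flow; a short sketch, assuming both classes above are in scope:
store = TemporalVectorStore()
parser = TemporalQueryParser()

time_info = parser.parse("Tesla stock price in 2022?")
results = store.query_with_time_filter(
    query="Tesla stock price",
    start_date=time_info["start_date"],  # 2022-01-01
    end_date=time_info["end_date"]       # 2022-12-31
)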
Limitations
Metadata filtering is simple but has limitations:
- Hard filtering: Documents just outside the boundary are completely excluded
- Sparsity problem: No results if no documents exist in the specified period (a mitigation is sketched after this list)
- Complex expressions: Hard to handle "early 2020s" type expressions
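A common mitigation for the sparsity problem is to progressively widen the window until enough documents come back; a minimal sketch (the `min_results` threshold and the doubling step are illustrative choices, not from this post):
def query_with_widening(store, query, start_date, end_date,
                        min_results=3, max_attempts=4):
    """Retry with a progressively wider time window around the original range."""
    results = None
    for _ in range(max_attempts):
        results = store.query_with_time_filter(query, start_date, end_date)
        if len(results["ids"][0]) >= min_results:
            return results
        half_span = (end_date - start_date) / 2
        start_date, end_date = start_date - half_span, end_date + half_span
    return results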
Solution 2: Temporal Decay
Assign higher weights to recent documents.
Implementation
import numpy as np
from datetime import datetime
from typing import List, Optional

class TemporalDecayScorer:
    """Time-based score decay"""

    def __init__(self, half_life_days: int = 30):
        """
        half_life_days: period for a score to halve.
        Example: with 30 days, a 30-day-old document keeps 50% of its score.
        """
        self.half_life_days = half_life_days
        self.decay_rate = np.log(2) / half_life_days

    def exponential_decay(self, doc_date: datetime,
                          reference_date: datetime = None) -> float:
        """Exponential decay: the newer the document, the higher the weight"""
        if reference_date is None:
            reference_date = datetime.now()
        age_days = (reference_date - doc_date).days
        return float(np.exp(-self.decay_rate * age_days))

    def gaussian_decay(self, doc_date: datetime,
                       target_date: datetime,
                       sigma_days: int = 30) -> float:
        """
        Gaussian decay: peaks near a specific point in time.
        Suitable for point-in-time questions.
        """
        diff_days = abs((target_date - doc_date).days)
        return float(np.exp(-(diff_days ** 2) / (2 * sigma_days ** 2)))

    def apply_temporal_score(
        self,
        results: List[dict],
        query_type: str = "recent",
        target_date: Optional[datetime] = None
    ) -> List[dict]:
        """Apply temporal scoring to search results"""
        scored_results = []
        for result in results:
            ts = result['metadata']['timestamp']
            # Accept either epoch seconds or an ISO-8601 string
            doc_date = (datetime.fromtimestamp(ts)
                        if isinstance(ts, (int, float))
                        else datetime.fromisoformat(ts))
            semantic_score = result.get('score', 1.0)

            if query_type == "recent":
                # Prefer recent documents
                temporal_score = self.exponential_decay(doc_date)
            elif query_type == "point_in_time" and target_date:
                # Prefer documents near the target point
                temporal_score = self.gaussian_decay(doc_date, target_date)
            else:
                temporal_score = 1.0

            # Final score = semantic score * temporal score
            final_score = semantic_score * temporal_score
            scored_results.append({
                **result,
                'semantic_score': semantic_score,
                'temporal_score': temporal_score,
                'final_score': final_score
            })

        # Re-sort by final score
        scored_results.sort(key=lambda x: x['final_score'], reverse=True)
        return scored_results
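Applied to two hypothetical hits with equal semantic scores (made-up data, purely for illustration):
scorer = TemporalDecayScorer(half_life_days=30)
hits = [
    {'text': 'old doc', 'score': 0.9,
     'metadata': {'timestamp': '2023-01-15T00:00:00'}},
    {'text': 'new doc', 'score': 0.9,
     'metadata': {'timestamp': '2024-01-15T00:00:00'}},
]
ranked = scorer.apply_temporal_score(hits, query_type="recent")
# 'new doc' now ranks first: same semantic score, much higher temporal score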
Solution 3: Time-Aware Embedding
Encode time information in the embedding itself.
Method 1: Add Time Tokens
import numpy as np
from datetime import datetime

class TimeAwareEmbedder:
    """Embed with explicit time context in the text"""

    def __init__(self, embedding_model):
        self.model = embedding_model

    def add_temporal_context(self, text: str, timestamp: datetime) -> str:
        """Prefix the text with its date"""
        time_prefix = f"[DATE: {timestamp.strftime('%Y-%m-%d')}] "
        return time_prefix + text

    def embed_with_time(self, text: str, timestamp: datetime) -> np.ndarray:
        """Generate an embedding that carries temporal context"""
        temporal_text = self.add_temporal_context(text, timestamp)
        return self.model.encode(temporal_text)
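This works with any embedding model that exposes an `encode` method, such as sentence-transformers:
from sentence_transformers import SentenceTransformer
from datetime import datetime

embedder = TimeAwareEmbedder(SentenceTransformer("all-MiniLM-L6-v2"))
vec = embedder.embed_with_time(
    "Sam Altman is the CEO of OpenAI",
    datetime(2024, 1, 15)
)
# The "[DATE: 2024-01-15]" prefix lets the model distinguish otherwise identical texts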
Method 2: Combine Time Embedding
class TemporalEmbedding:
    """Combine a text embedding with a small time embedding"""

    def __init__(self, text_dim: int = 768, time_dim: int = 6):
        # time_dim matches the six features produced by encode_time below
        self.text_dim = text_dim
        self.time_dim = time_dim

    def encode_time(self, timestamp: datetime) -> np.ndarray:
        """Encode a timestamp as a small feature vector"""
        features = np.array([
            timestamp.year / 3000,  # crude normalization into [0, 1]
            timestamp.month / 12,
            timestamp.day / 31,
            timestamp.hour / 24,
            timestamp.weekday() / 7,
            timestamp.timetuple().tm_yday / 366
        ])
        return features

    def combine_embeddings(self, text_emb: np.ndarray,
                           time_emb: np.ndarray,
                           alpha: float = 0.1) -> np.ndarray:
        """Concatenate text and time embeddings, weighted by alpha"""
        combined = np.concatenate([
            text_emb * (1 - alpha),
            time_emb * alpha
        ])
        return combined / np.linalg.norm(combined)
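Note that the query side must apply the same transformation: a query embedding needs the same six time features (for example, derived from the parsed target date) concatenated with the same alpha, or query and document vectors will live in mismatched spaces and the similarity scores become meaningless.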
Solution 4: Temporal Reranking
Use an LLM to re-evaluate temporal relevance after retrieval.
Implementation
from typing import List

class TemporalReranker:
    """LLM-based temporally-aware reranking"""

    def __init__(self, llm_client):
        self.llm = llm_client

    def rerank(self, query: str, documents: List[dict],
               temporal_context: dict) -> List[dict]:
        """Rerank documents, taking the temporal context into account"""
        prompt = f"""Given the query and temporal context, rank these documents by relevance.

Query: {query}
Temporal Context: {temporal_context}

Documents:
"""
        for i, doc in enumerate(documents):
            prompt += f"""
[{i+1}] Date: {doc['metadata']['timestamp']}
Content: {doc['text'][:500]}...
"""
        prompt += """
For each document, provide:
1. Temporal relevance score (0-1): how well the document's date matches the query's temporal intent
2. Content relevance score (0-1): how relevant the content is
3. Final ranking

Output as a JSON array."""

        response = self.llm.generate(prompt)
        rankings = self._parse_rankings(response)
        return self._apply_rankings(documents, rankings)
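The two helper methods are not shown in the post; one plausible minimal version, assuming the LLM answers with a JSON array of objects like `{"index": 1, "final_score": 0.9}` (add these to `TemporalReranker`):
    def _parse_rankings(self, response: str) -> List[dict]:
        """Extract the JSON array from the LLM response, tolerating surrounding prose"""
        import json  # local import keeps this snippet self-contained
        start, end = response.find('['), response.rfind(']') + 1
        return json.loads(response[start:end])

    def _apply_rankings(self, documents: List[dict],
                        rankings: List[dict]) -> List[dict]:
        """Reorder documents by the LLM's final scores (indices are 1-based)"""
        ordered = sorted(rankings, key=lambda r: r['final_score'], reverse=True)
        return [documents[r['index'] - 1] for r in ordered]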
Solution 5: Temporal Knowledge Graph
Build a Knowledge Graph with a time axis.
Concept
Traditional KG: (Sam Altman) --[CEO_OF]--> (OpenAI)
Temporal KG: (Sam Altman) --[CEO_OF {start: 2019, end: 2023-11-17}]--> (OpenAI)
(Mira Murati) --[CEO_OF {start: 2023-11-17, end: 2023-11-20}]--> (OpenAI)
(Emmett Shear) --[CEO_OF {start: 2023-11-20, end: 2023-11-22}]--> (OpenAI)
(Sam Altman) --[CEO_OF {start: 2023-11-22, end: null}]--> (OpenAI)
Implementation
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, List

@dataclass
class TemporalTriple:
    """Triple with temporal validity information"""
    subject: str
    predicate: str
    object: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = currently valid
    confidence: float = 1.0
    source: str = ""

class TemporalKnowledgeGraph:
    """Time-aware Knowledge Graph"""

    def __init__(self):
        self.triples: List[TemporalTriple] = []
        self.entity_index = {}  # entity -> triples
        self.time_index = {}    # (year, month) -> triples (unused in this sketch)

    def add_triple(self, triple: TemporalTriple):
        """Add and index a triple"""
        self.triples.append(triple)
        # Index under both subject and object
        for entity in [triple.subject, triple.object]:
            if entity not in self.entity_index:
                self.entity_index[entity] = []
            self.entity_index[entity].append(triple)

    def query_at_time(self, entity: str, predicate: str,
                      at_time: datetime) -> List[TemporalTriple]:
        """Query triples valid at a specific point in time"""
        results = []
        for triple in self.entity_index.get(entity, []):
            if predicate and triple.predicate != predicate:
                continue
            # Temporal validity check: valid_from <= at_time <= valid_to
            if triple.valid_from <= at_time:
                if triple.valid_to is None or triple.valid_to >= at_time:
                    results.append(triple)
        return results

    def query_history(self, entity: str,
                      predicate: Optional[str] = None) -> List[TemporalTriple]:
        """Query the full history of an entity's facts"""
        results = [
            t for t in self.entity_index.get(entity, [])
            if predicate is None or t.predicate == predicate
        ]
        # Sort chronologically
        results.sort(key=lambda x: x.valid_from)
        return results

# Usage example
tkg = TemporalKnowledgeGraph()

# Add OpenAI CEO history
tkg.add_triple(TemporalTriple(
    subject="Sam Altman",
    predicate="CEO_OF",
    object="OpenAI",
    valid_from=datetime(2019, 3, 1),
    valid_to=datetime(2023, 11, 17),
    source="news_001"
))
tkg.add_triple(TemporalTriple(
    subject="Mira Murati",
    predicate="CEO_OF",
    object="OpenAI",
    valid_from=datetime(2023, 11, 17),
    valid_to=datetime(2023, 11, 20),
    source="news_002"
))

# Query: who was CEO on November 18, 2023?
print("OpenAI CEO on November 18, 2023:")
for triple in tkg.query_at_time("OpenAI", "CEO_OF", datetime(2023, 11, 18)):
    print(f"  {triple.subject} ({triple.valid_from:%Y-%m-%d} ~ {triple.valid_to:%Y-%m-%d})")
# -> Mira Murati (2023-11-17 ~ 2023-11-20)
Real Usage Examples
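The examples below call a `TemporalRAG` orchestrator that this post never defines; here is a minimal hypothetical sketch of how the components above could be wired together (the constructor arguments and prompt format are my own choices):
class TemporalRAG:
    """Hypothetical orchestrator combining the components above"""

    def __init__(self, vector_store, kg, llm, embedder):
        self.store = vector_store
        self.kg = kg
        self.llm = llm
        self.embedder = embedder
        self.parser = TemporalQueryParser()

    def query(self, question: str) -> dict:
        # 1. Parse the temporal intent of the question
        time_info = self.parser.parse(question)
        # 2. Retrieve documents restricted to that time range
        docs = self.store.query_with_time_filter(
            question, time_info["start_date"], time_info["end_date"])
        # 3. Answer with the LLM, grounded in the documents and the time context
        prompt = (f"Answer using only the context below.\n"
                  f"Temporal context: {time_info}\n"
                  f"Context: {docs}\n"
                  f"Question: {question}")
        return {"answer": self.llm.generate(prompt), "time_info": time_info}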
Example 1: CEO Change History
rag = TemporalRAG(vector_store, kg, llm, embedder)
# Question 1: Specific past point
result = rag.query("Who was OpenAI's CEO on November 18, 2023?")
print(result["answer"])
# Output: "On November 18, 2023, OpenAI's CEO was Mira Murati.
# She was appointed as interim CEO after Sam Altman was fired on November 17,
# and was later replaced by Emmett Shear on November 20."
# Question 2: Current
result = rag.query("Who is OpenAI's CEO now?")
print(result["answer"])
# Output: "As of January 2024, OpenAI's CEO is Sam Altman.
# He returned on November 22, 2023 and remains in the position."
# Question 3: History
result = rag.query("Has OpenAI's CEO ever changed?")
print(result["answer"])
# Output: "Yes, OpenAI's CEO has changed multiple times.
# - Sam Altman (Mar 2019 ~ Nov 17, 2023)
# - Mira Murati interim CEO (Nov 17-20, 2023)
# - Emmett Shear interim CEO (Nov 20-22, 2023)
# - Sam Altman returns (Nov 22, 2023 ~ present)"
Example 2: Financial Data Time Series
# Question: Comparison analysis
result = rag.query("Compare Tesla's 2022 vs 2023 revenue")
print(result["answer"])
# Output: "Tesla annual revenue comparison:
# - 2022: $81.5B (51% YoY increase)
# - 2023: $96.8B (19% YoY increase)
# Growth continued in 2023 but at a slower rate."
# Question: Specific quarter
result = rag.query("What was Tesla's Q3 2023 performance?")
print(result["answer"])
# Output: "Tesla Q3 2023 results:
# - Revenue: $23.4B
# - Net income: $1.9B
# - Deliveries: 435,059 vehicles"
Performance Optimization Tips
1. Time Index Partitioning
import chromadb

chroma = chromadb.Client()

# Separate collections by year
collections = {
    2022: chroma.create_collection("docs_2022"),
    2023: chroma.create_collection("docs_2023"),
    2024: chroma.create_collection("docs_2024"),
}

# Query only the relevant year's partition
def query_by_year(query, year, top_k=5):
    if year in collections:
        return collections[year].query(query_texts=[query], n_results=top_k)
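Range queries then fan out only over the partitions that overlap the window; a small hypothetical helper:
def query_by_range(query, start_year, end_year, top_k=5):
    """Query every yearly partition in [start_year, end_year] and collect hits."""
    hits = []
    for year in range(start_year, end_year + 1):
        if year in collections:
            hits.append(collections[year].query(query_texts=[query],
                                                n_results=top_k))
    return hits  # merge and rerank across partitions as needed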
2. Time-based Caching
# Cache by time range
cache_key = f"{query_hash}_{start_date}_{end_date}"
cached_result = cache.get(cache_key)
if cached_result:
    return cached_result
3. Incremental Indexing
# Add only new documents (avoid full reindexing)
def incremental_index(new_docs):
    for doc in new_docs:
        if doc.timestamp > last_indexed_time:
            vector_store.add(doc)
    # Update the Knowledge Graph as well
    kg.update_from_docs(new_docs)
Summary
Core Problems
- Vector embeddings don't encode time information
- Can't understand time expressions like "recent", "back then", "current"
- Can't track fact changes over time
Solution Comparison
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Metadata Filtering | Simple, fast | Hard filtering, boundary issues | Clear time range questions |
| Temporal Decay | Natural recency preference | Not suitable for past point questions | "Latest news" type |
| Time-Aware Embedding | Fundamental solution | Requires training, complex | Large-scale systems |
| Temporal Reranking | High accuracy | LLM cost, slow | High accuracy requirements |
| Temporal KG | Perfect fact change tracking | High build cost | Structured knowledge domains |
Recommended Combination
- Quick start: Metadata Filtering + Temporal Decay
- Balanced: Above + Temporal Reranking
- Complete solution: All techniques + Temporal KG
Next Steps
- Multi-hop Temporal Reasoning
- Event-based Temporal Indexing
- Temporal Question Decomposition