Production Survival Guide for Vibe Coders

5 Non-Negotiable Standards for Enterprise Deployment

In the age of vibe coding, anyone can deploy an app. But preventing post-launch disasters isn't about coding skills—it's about engineering standards.

Just clicking the Vercel deploy button? Here are the 5 safety measures that enterprises never skip before launching a service.

Step 1: Visibility (Logging & Monitoring)

Enterprises don't drive blindfolded. By the time users report issues, you're already too late.

Minimum Standard: Log Status Code / Response Time / Error Stack for all API requests

Key Principle: "I know before users tell me" is the starting point of operations.

python
import logging
import time
import uuid
from functools import wraps

# Basic logging setup
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s | %(levelname)s | %(message)s'
)
logger = logging.getLogger(__name__)

def generate_request_id() -> str:
    """Short unique ID used to correlate log lines for one request."""
    return uuid.uuid4().hex[:8]

def log_request(func):
    """API request logging decorator"""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start_time = time.time()
        request_id = generate_request_id()

        try:
            result = await func(*args, **kwargs)
            elapsed = time.time() - start_time

            logger.info(f"[{request_id}] {func.__name__} | "
                       f"status=200 | duration={elapsed:.3f}s")
            return result

        except Exception as e:
            elapsed = time.time() - start_time
            logger.error(f"[{request_id}] {func.__name__} | "
                        f"status=500 | duration={elapsed:.3f}s | "
                        f"error={type(e).__name__}: {str(e)}")
            raise

    return wrapper

# Usage example (assumes openai_client is an initialized AsyncOpenAI instance)
@log_request
async def call_llm_api(prompt: str):
    response = await openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

Pro Tip: Wire up real-time alerting with Sentry, Datadog, or a similar tool so you know immediately when things break at 3 AM.
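
A minimal sketch of that wiring with the Sentry Python SDK, assuming the sentry-sdk package is installed and a SENTRY_DSN environment variable is set (risky_operation is a placeholder, not part of the original example):

python
import os
import sentry_sdk

# Assumption: sentry-sdk is installed and SENTRY_DSN is configured in the environment
sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    environment="production",
    traces_sample_rate=0.1,   # sample 10% of transactions for performance data
)

def risky_operation():
    """Placeholder for your own code path."""
    raise RuntimeError("simulated failure")

# Unhandled exceptions are reported automatically once init() has run;
# handled ones can still be captured explicitly before re-raising.
try:
    risky_operation()
except Exception as e:
    sentry_sdk.capture_exception(e)
    raise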

Step 2: Environment Variable Validation (Fail-Fast Env)

"It worked on my machine?" 90% of incidents stem from missing env vars and secrets.

Minimum Standard: Validate all required env vars (API Key, DB URL, etc.) at app startup

Key Principle: If anything is missing, the server shouldn't start (Fail-Fast)

python
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import field_validator

class Settings(BaseSettings):
    """Required env vars - app won't start if any are missing"""

    # API Keys
    OPENAI_API_KEY: str
    ANTHROPIC_API_KEY: str

    # Database
    DATABASE_URL: str

    # Optional with defaults
    MAX_TOKENS: int = 4000
    TIMEOUT_SECONDS: int = 30

    @field_validator('OPENAI_API_KEY', 'ANTHROPIC_API_KEY')
    @classmethod
    def validate_api_key(cls, v: str, info) -> str:
        if not v or v.startswith('sk-xxx'):
            raise ValueError(f"{info.field_name} is not set or is a placeholder")
        return v

    @field_validator('DATABASE_URL')
    @classmethod
    def validate_db_url(cls, v: str) -> str:
        if not v.startswith('postgresql://'):
            raise ValueError("DATABASE_URL must be a postgresql:// connection string")
        if 'localhost' in v or '127.0.0.1' in v:
            raise ValueError("Production DATABASE_URL should not point at localhost")
        return v

    model_config = SettingsConfigDict(env_file=".env")

# Validate at startup - fails if invalid
try:
    settings = Settings()
    print("Environment validated successfully")
except Exception as e:
    print(f"FATAL: Environment validation failed - {e}")
    raise SystemExit(1)

Security Essentials:

  • Never commit .env files to Git (add them to .gitignore)
  • For production, load secrets from a manager such as AWS Secrets Manager or Vercel Environment Variables (see the sketch below)
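
A hedged sketch of the Secrets Manager approach, assuming boto3 is installed, the runtime already has AWS credentials, and a secret named prod/openai-api-key exists (the name is purely illustrative):

python
import boto3

def load_secret(secret_id: str) -> str:
    """Fetch a plaintext secret from AWS Secrets Manager at startup."""
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]

# Illustrative secret name; the key lives in Secrets Manager, not in code or .env
OPENAI_API_KEY = load_secret("prod/openai-api-key")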

Step 3: Availability Guardrails (Timeout & Retry)

If one slow external API can bring down your entire service, that's a disqualifying failure.

Minimum Standard: Enforce timeouts on all external requests

Key Principle: "One dies, the rest survive"

python
import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)

# HTTP client with timeout config
http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(
        connect=5.0,    # Connection timeout: 5s
        read=30.0,      # Read timeout: 30s
        write=10.0,     # Write timeout: 10s
        pool=5.0        # Connection pool timeout: 5s
    )
)

# Retry decorator (exponential backoff)
@retry(
    stop=stop_after_attempt(3),                    # Max 3 attempts
    wait=wait_exponential(multiplier=1, max=10),   # Exponential backoff, capped at 10s
    retry=retry_if_exception_type((
        httpx.TimeoutException,
        httpx.NetworkError,
        httpx.HTTPStatusError,   # 5xx surfaced by raise_for_status() below
    )),
    reraise=True
)
async def call_external_api(url: str, payload: dict) -> dict:
    """External API call with timeout + retry"""
    response = await http_client.post(url, json=payload)

    # Don't retry 4xx errors (client's fault)
    if 400 <= response.status_code < 500:
        raise ValueError(f"Client error: {response.status_code}")

    response.raise_for_status()
    return response.json()

# Fallback pattern (call_openai / call_anthropic are assumed provider wrapper functions)
async def call_with_fallback(prompt: str) -> str:
    """Switch to fallback when main fails"""
    try:
        return await call_openai(prompt)
    except Exception as e:
        logger.warning(f"OpenAI failed, falling back to Claude: {e}")
        try:
            return await call_anthropic(prompt)
        except Exception as e2:
            logger.error(f"All LLM providers failed: {e2}")
            return "Sorry, a temporary error occurred. Please try again later."

Step 4: Resource/Cost Control (Rate Limit & Cost Guard)

Allowing unlimited requests is like leaving your wallet open on the street.

Minimum Standard: Per-IP/user call limits + cost caps

Essential: Without idempotency, payments or requests can fire twice.

python
from datetime import datetime, timedelta
from collections import defaultdict
from typing import Any
import hashlib

class RateLimiter:
    """Simple in-memory Rate Limiter"""

    def __init__(self, max_requests: int = 100, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = datetime.now()
        cutoff = now - self.window

        # Remove requests outside window
        self.requests[user_id] = [
            t for t in self.requests[user_id] if t > cutoff
        ]

        if len(self.requests[user_id]) >= self.max_requests:
            return False

        self.requests[user_id].append(now)
        return True

class CostGuard:
    """Cost guardrails"""

    def __init__(self, daily_limit: float = 100.0):
        self.daily_limit = daily_limit
        self.daily_cost = 0.0
        self.last_reset = datetime.now().date()

    def check_and_add(self, estimated_cost: float) -> bool:
        today = datetime.now().date()

        # Reset on new day
        if today > self.last_reset:
            self.daily_cost = 0.0
            self.last_reset = today

        # Check limit
        if self.daily_cost + estimated_cost > self.daily_limit:
            logger.warning(f"Daily cost limit reached: ${self.daily_cost:.2f}")
            return False

        self.daily_cost += estimated_cost

        # Warn at 80%
        if self.daily_cost > self.daily_limit * 0.8:
            logger.warning(f"Cost warning: 80% of daily limit used (${self.daily_cost:.2f})")

        return True

class IdempotencyGuard:
    """Duplicate request prevention"""

    def __init__(self, ttl_seconds: int = 300):
        self.cache = {}  # Use Redis in production
        self.ttl = timedelta(seconds=ttl_seconds)

    def get_key(self, user_id: str, request_data: dict) -> str:
        data_str = f"{user_id}:{sorted(request_data.items())}"
        return hashlib.sha256(data_str.encode()).hexdigest()

    def check_duplicate(self, user_id: str, request_data: dict) -> tuple[bool, Any]:
        key = self.get_key(user_id, request_data)
        now = datetime.now()

        if key in self.cache:
            cached_time, cached_result = self.cache[key]
            if now - cached_time < self.ttl:
                logger.info(f"Duplicate request detected, returning cached result")
                return True, cached_result

        return False, None

    def store_result(self, user_id: str, request_data: dict, result: Any):
        key = self.get_key(user_id, request_data)
        self.cache[key] = (datetime.now(), result)

# Usage example (estimate_cost and process_request are assumed app-specific helpers)
rate_limiter = RateLimiter(max_requests=100, window_seconds=60)
cost_guard = CostGuard(daily_limit=50.0)
idempotency = IdempotencyGuard()

async def handle_request(user_id: str, request_data: dict):
    # 1. Rate Limit check
    if not rate_limiter.is_allowed(user_id):
        return {"error": "Too many requests. Please wait."}, 429

    # 2. Duplicate request check
    is_duplicate, cached = idempotency.check_duplicate(user_id, request_data)
    if is_duplicate:
        return cached, 200

    # 3. Cost check
    estimated_cost = estimate_cost(request_data)
    if not cost_guard.check_and_add(estimated_cost):
        return {"error": "Daily limit exceeded. Try again tomorrow."}, 503

    # 4. Actual processing
    result = await process_request(request_data)

    # 5. Cache result
    idempotency.store_result(user_id, request_data, result)

    return result, 200
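
One way to put handle_request behind an HTTP endpoint, sketched here with FastAPI as an assumed framework (the /chat route and X-User-Id header are illustrative choices):

python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/chat")
async def chat_endpoint(request: Request):
    body = await request.json()
    # Illustrative identity: prefer an authenticated user ID, fall back to client IP
    user_id = request.headers.get("X-User-Id") or request.client.host
    result, status_code = await handle_request(user_id, body)
    return JSONResponse(content=result, status_code=status_code)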

Step 5: LLM Context Management (Token Governance)

LLM apps get more expensive and slower as conversations grow. This isn't a performance issue—it's an operational strategy issue.

Minimum Standard: Enforce Max Tokens limit + summarization logic required

Key Principle: If input is too long, validate at the gate—don't burn API calls

python
import tiktoken

class TokenGovernor:
    """Token usage management"""

    def __init__(
        self,
        max_input_tokens: int = 4000,
        max_output_tokens: int = 1000,
        max_history_messages: int = 10
    ):
        self.max_input = max_input_tokens
        self.max_output = max_output_tokens
        self.max_history = max_history_messages
        self.encoder = tiktoken.encoding_for_model("gpt-4")

    def count_tokens(self, text: str) -> int:
        return len(self.encoder.encode(text))

    def validate_input(self, prompt: str) -> tuple[bool, str]:
        """Input validation - check before API call"""
        token_count = self.count_tokens(prompt)

        if token_count > self.max_input:
            return False, f"Input too long. ({token_count} tokens > {self.max_input} limit)"

        return True, ""

    def trim_history(self, messages: list[dict]) -> list[dict]:
        """Trim conversation history - keep only last N"""
        if len(messages) <= self.max_history:
            return messages

        # Always keep system messages
        system_msgs = [m for m in messages if m.get("role") == "system"]
        other_msgs = [m for m in messages if m.get("role") != "system"]

        # Keep only recent messages
        trimmed = other_msgs[-(self.max_history - len(system_msgs)):]

        return system_msgs + trimmed

    def summarize_if_needed(self, messages: list[dict]) -> list[dict]:
        """Summarize older conversation when tokens exceed limit"""
        total_tokens = sum(self.count_tokens(m.get("content", "")) for m in messages)

        if total_tokens <= self.max_input:
            return messages

        # Preserve system + last 2 messages
        system_msgs = [m for m in messages if m.get("role") == "system"]
        recent = [m for m in messages if m.get("role") != "system"][-2:]
        old_msgs = [m for m in messages if m.get("role") != "system"][:-2]

        if not old_msgs:
            return messages

        # "Summarize" older conversation (simple truncation stand-in here;
        # a real implementation could call an LLM to produce a proper summary)
        old_content = "\n".join(m.get("content", "") for m in old_msgs)
        summary = f"[Previous conversation summary: {old_content[:500]}...]"

        summary_msg = {"role": "system", "content": summary}

        return system_msgs + [summary_msg] + recent

# Usage example
governor = TokenGovernor(
    max_input_tokens=4000,
    max_output_tokens=1000,
    max_history_messages=10
)

async def chat(user_input: str, history: list[dict]) -> str:
    # 1. Input validation
    is_valid, error_msg = governor.validate_input(user_input)
    if not is_valid:
        return error_msg

    # 2. History cleanup
    history = governor.trim_history(history)
    history = governor.summarize_if_needed(history)

    # 3. Add new message
    history.append({"role": "user", "content": user_input})

    # 4. API call
    response = await openai_client.chat.completions.create(
        model="gpt-4",
        messages=history,
        max_tokens=governor.max_output
    )

    return response.choices[0].message.content

Pre-Deployment Checklist

Step 1. Visibility
  ☐ Are all API requests logged?
  ☐ Do you get alerts on errors?

Step 2. Env Vars
  ☐ Is there env var validation?
  ☐ Are API keys NOT in code?

Step 3. Availability
  ☐ Do all external calls have timeouts?
  ☐ Is retry logic using exponential backoff?
  ☐ Is there a fallback path?

Step 4. Cost
  ☐ Is rate limiting applied?
  ☐ Is there a daily cost cap?
  ☐ Is there idempotency handling?

Step 5. Tokens
  ☐ Is there an input token limit?
  ☐ Is there history management logic?
If 3 or more items are ☐, you're not ready for production.
