코호트 분석 SQL로 직접 구현하기: 리텐션의 모든 것

GA4 없이도 코호트 분석 가능합니다. 가입월별 리텐션, N일 리텐션 SQL로 직접 구현.

TL;DR

코호트(Cohort): 특정 시점에 같은 행동을 한 사용자 그룹
리텐션: 코호트가 시간이 지나도 서비스를 계속 사용하는 비율
SQL로 구현: Window 함수와 CTE로 GA4 없이 직접 분석 가능
핵심 인사이트: 어떤 코호트가 잔존율이 높은지, 언제 이탈이 심한지 파악

1. 코호트 분석이란?

왜 코호트 분석이 필요한가?

잘못된 분석:

sql

이번 달 DAU: 10,000명 → "좋아!"

올바른 분석:

sql

1월 가입자 중 3개월 후 활성 사용자: 20%
2월 가입자 중 3개월 후 활성 사용자: 35%
→ "2월에 뭔가 바뀌었구나!"

코호트 분석은 시간에 따른 사용자 행동 변화를 추적합니다.

코호트의 종류

코호트 기준	예시
가입 시점	2024년 1월 가입자
첫 구매 시점	첫 구매가 2024년 Q1인 고객
유입 채널	구글 광고로 유입된 사용자
첫 사용 기능	프리미엄 기능을 먼저 사용한 사용자

2. 기본 테이블 구조

sql

-- 사용자 테이블
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    created_at TIMESTAMP,    -- 가입일
    signup_channel VARCHAR(50)
);

-- 활동 로그 테이블
CREATE TABLE user_activities (
    id INT PRIMARY KEY,
    user_id INT,
    activity_type VARCHAR(50),
    created_at TIMESTAMP
);

-- 주문 테이블
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    amount DECIMAL(10, 2),
    created_at TIMESTAMP
);

3. 월별 가입 코호트 리텐션

가장 기본적인 코호트 분석

sql

WITH user_cohorts AS (
    -- 각 사용자의 가입월 (코호트)
    SELECT
        user_id,
        DATE_TRUNC('month', created_at) as cohort_month
    FROM users
),
user_activities_monthly AS (
    -- 각 사용자의 월별 활동 여부
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', created_at) as activity_month
    FROM user_activities
),
cohort_data AS (
    -- 코호트와 활동 데이터 결합
    SELECT
        uc.cohort_month,
        ua.activity_month,
        -- 가입 후 몇 개월 차인지
        DATE_PART('year', ua.activity_month) * 12 + DATE_PART('month', ua.activity_month)
        - DATE_PART('year', uc.cohort_month) * 12 - DATE_PART('month', uc.cohort_month)
        as months_since_signup,
        COUNT(DISTINCT uc.user_id) as active_users
    FROM user_cohorts uc
    JOIN user_activities_monthly ua ON uc.user_id = ua.user_id
    WHERE ua.activity_month >= uc.cohort_month
    GROUP BY uc.cohort_month, ua.activity_month
),
cohort_sizes AS (
    -- 각 코호트의 총 사용자 수
    SELECT
        cohort_month,
        COUNT(*) as cohort_size
    FROM user_cohorts
    GROUP BY cohort_month
)
SELECT
    cd.cohort_month,
    cs.cohort_size,
    cd.months_since_signup,
    cd.active_users,
    ROUND(cd.active_users * 100.0 / cs.cohort_size, 2) as retention_rate
FROM cohort_data cd
JOIN cohort_sizes cs ON cd.cohort_month = cs.cohort_month
ORDER BY cd.cohort_month, cd.months_since_signup;

결과:

cohort_month	cohort_size	months_since_signup	active_users	retention_rate
2024-01-01	1000	0	1000	100.00
2024-01-01	1000	1	450	45.00
2024-01-01	1000	2	320	32.00
2024-01-01	1000	3	280	28.00
2024-02-01	1200	0	1200	100.00
2024-02-01	1200	1	600	50.00

4. 피벗 테이블 형태로 보기

코호트 리텐션 매트릭스

sql

WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', created_at) as cohort_month
    FROM users
),
user_activities_monthly AS (
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', created_at) as activity_month
    FROM user_activities
),
retention_data AS (
    SELECT
        uc.cohort_month,
        DATE_PART('year', ua.activity_month) * 12 + DATE_PART('month', ua.activity_month)
        - DATE_PART('year', uc.cohort_month) * 12 - DATE_PART('month', uc.cohort_month)
        as month_number,
        COUNT(DISTINCT uc.user_id) as users
    FROM user_cohorts uc
    LEFT JOIN user_activities_monthly ua
        ON uc.user_id = ua.user_id
        AND ua.activity_month >= uc.cohort_month
    GROUP BY uc.cohort_month, month_number
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(*) as size
    FROM user_cohorts
    GROUP BY cohort_month
)
SELECT
    TO_CHAR(r.cohort_month, 'YYYY-MM') as cohort,
    cs.size as cohort_size,
    MAX(CASE WHEN month_number = 0 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M0",
    MAX(CASE WHEN month_number = 1 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M1",
    MAX(CASE WHEN month_number = 2 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M2",
    MAX(CASE WHEN month_number = 3 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M3",
    MAX(CASE WHEN month_number = 4 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M4",
    MAX(CASE WHEN month_number = 5 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M5",
    MAX(CASE WHEN month_number = 6 THEN ROUND(users * 100.0 / cs.size, 1) END) as "M6"
FROM retention_data r
JOIN cohort_sizes cs ON r.cohort_month = cs.cohort_month
GROUP BY r.cohort_month, cs.size
ORDER BY r.cohort_month;

결과 (리텐션 매트릭스):

cohort	cohort_size	M0	M1	M2	M3	M4	M5	M6
2024-01	1000	100	45	32	28	25	23	22
2024-02	1200	100	50	38	32	29	27	-
2024-03	1500	100	55	42	36	33	-	-
2024-04	1100	100	48	35	30	-	-	-

5. N일 리텐션 (Day 1, Day 7, Day 30)

모바일 앱에서 많이 사용하는 지표

sql

WITH user_first_activity AS (
    -- 각 사용자의 첫 활동일
    SELECT
        user_id,
        DATE(MIN(created_at)) as first_activity_date
    FROM user_activities
    GROUP BY user_id
),
daily_activities AS (
    -- 사용자별 활동 날짜
    SELECT DISTINCT
        user_id,
        DATE(created_at) as activity_date
    FROM user_activities
),
retention_flags AS (
    SELECT
        ufa.user_id,
        ufa.first_activity_date,
        -- Day N 활동 여부 체크
        MAX(CASE WHEN da.activity_date = ufa.first_activity_date + INTERVAL '1 day'
            THEN 1 ELSE 0 END) as day_1,
        MAX(CASE WHEN da.activity_date = ufa.first_activity_date + INTERVAL '7 days'
            THEN 1 ELSE 0 END) as day_7,
        MAX(CASE WHEN da.activity_date = ufa.first_activity_date + INTERVAL '14 days'
            THEN 1 ELSE 0 END) as day_14,
        MAX(CASE WHEN da.activity_date = ufa.first_activity_date + INTERVAL '30 days'
            THEN 1 ELSE 0 END) as day_30
    FROM user_first_activity ufa
    LEFT JOIN daily_activities da ON ufa.user_id = da.user_id
    GROUP BY ufa.user_id, ufa.first_activity_date
)
SELECT
    DATE_TRUNC('week', first_activity_date) as cohort_week,
    COUNT(*) as cohort_size,
    ROUND(SUM(day_1) * 100.0 / COUNT(*), 2) as day_1_retention,
    ROUND(SUM(day_7) * 100.0 / COUNT(*), 2) as day_7_retention,
    ROUND(SUM(day_14) * 100.0 / COUNT(*), 2) as day_14_retention,
    ROUND(SUM(day_30) * 100.0 / COUNT(*), 2) as day_30_retention
FROM retention_flags
GROUP BY DATE_TRUNC('week', first_activity_date)
ORDER BY cohort_week;

결과:

cohort_week	cohort_size	day_1_retention	day_7_retention	day_14_retention	day_30_retention
2024-01-01	500	42.00	25.00	18.00	12.00
2024-01-08	520	45.00	28.00	20.00	14.00
2024-01-15	480	40.00	22.00	15.00	10.00

6. 구매 코호트: 재구매율 분석

첫 구매 후 재구매 패턴

sql

WITH first_purchase AS (
    -- 각 고객의 첫 구매
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(created_at)) as first_purchase_month
    FROM orders
    GROUP BY user_id
),
monthly_purchases AS (
    -- 고객별 월별 구매 여부
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', created_at) as purchase_month
    FROM orders
),
repurchase_data AS (
    SELECT
        fp.first_purchase_month as cohort,
        DATE_PART('year', mp.purchase_month) * 12 + DATE_PART('month', mp.purchase_month)
        - DATE_PART('year', fp.first_purchase_month) * 12 - DATE_PART('month', fp.first_purchase_month)
        as months_since_first,
        COUNT(DISTINCT fp.user_id) as customers
    FROM first_purchase fp
    JOIN monthly_purchases mp ON fp.user_id = mp.user_id
    WHERE mp.purchase_month >= fp.first_purchase_month
    GROUP BY fp.first_purchase_month, months_since_first
),
cohort_sizes AS (
    SELECT first_purchase_month, COUNT(*) as size
    FROM first_purchase
    GROUP BY first_purchase_month
)
SELECT
    TO_CHAR(rd.cohort, 'YYYY-MM') as cohort,
    cs.size,
    rd.months_since_first,
    rd.customers,
    ROUND(rd.customers * 100.0 / cs.size, 2) as repurchase_rate
FROM repurchase_data rd
JOIN cohort_sizes cs ON rd.cohort = cs.first_purchase_month
WHERE rd.months_since_first <= 6
ORDER BY rd.cohort, rd.months_since_first;

7. 채널별 코호트 비교

어떤 채널의 사용자가 더 오래 남는가?

sql

WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', created_at) as cohort_month,
        signup_channel
    FROM users
),
user_activities_monthly AS (
    SELECT DISTINCT
        user_id,
        DATE_TRUNC('month', created_at) as activity_month
    FROM user_activities
),
channel_retention AS (
    SELECT
        uc.signup_channel,
        DATE_PART('year', ua.activity_month) * 12 + DATE_PART('month', ua.activity_month)
        - DATE_PART('year', uc.cohort_month) * 12 - DATE_PART('month', uc.cohort_month)
        as month_number,
        COUNT(DISTINCT uc.user_id) as active_users,
        COUNT(DISTINCT CASE WHEN ua.activity_month = uc.cohort_month
            THEN uc.user_id END) as cohort_size
    FROM user_cohorts uc
    LEFT JOIN user_activities_monthly ua
        ON uc.user_id = ua.user_id
        AND ua.activity_month >= uc.cohort_month
    GROUP BY uc.signup_channel, month_number
)
SELECT
    signup_channel,
    month_number,
    active_users,
    ROUND(active_users * 100.0 / NULLIF(MAX(cohort_size) OVER (PARTITION BY signup_channel), 0), 2) as retention_rate
FROM channel_retention
WHERE month_number BETWEEN 0 AND 6
ORDER BY signup_channel, month_number;

결과:

signup_channel	month_number	active_users	retention_rate
google_ads	0	5000	100.00
google_ads	1	2000	40.00
google_ads	3	1000	20.00
organic	0	3000	100.00
organic	1	1800	60.00
organic	3	1200	40.00

→ Organic 유입이 유료 광고보다 리텐션이 2배!

8. 롤링 리텐션 vs 클래식 리텐션

클래식 리텐션

"정확히 Day 7에 접속한 사용자"

롤링 리텐션 (더 관대함)

"Day 7 이후 언제든 접속한 사용자"

sql

WITH user_first_activity AS (
    SELECT
        user_id,
        DATE(MIN(created_at)) as first_date
    FROM user_activities
    GROUP BY user_id
),
daily_activities AS (
    SELECT DISTINCT user_id, DATE(created_at) as activity_date
    FROM user_activities
),
retention_comparison AS (
    SELECT
        ufa.user_id,
        ufa.first_date,
        -- 클래식: 정확히 Day 7
        MAX(CASE WHEN da.activity_date = ufa.first_date + 7 THEN 1 ELSE 0 END) as classic_d7,
        -- 롤링: Day 7 이후 아무 때나
        MAX(CASE WHEN da.activity_date >= ufa.first_date + 7 THEN 1 ELSE 0 END) as rolling_d7
    FROM user_first_activity ufa
    LEFT JOIN daily_activities da ON ufa.user_id = da.user_id
    GROUP BY ufa.user_id, ufa.first_date
)
SELECT
    DATE_TRUNC('week', first_date) as cohort_week,
    COUNT(*) as users,
    ROUND(SUM(classic_d7) * 100.0 / COUNT(*), 2) as classic_d7_retention,
    ROUND(SUM(rolling_d7) * 100.0 / COUNT(*), 2) as rolling_d7_retention
FROM retention_comparison
GROUP BY DATE_TRUNC('week', first_date)
ORDER BY cohort_week;

9. LTV by 코호트

코호트별 고객 생애 가치

sql

WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', created_at) as cohort_month
    FROM users
),
user_revenue AS (
    SELECT
        user_id,
        SUM(amount) as total_revenue,
        COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
)
SELECT
    TO_CHAR(uc.cohort_month, 'YYYY-MM') as cohort,
    COUNT(DISTINCT uc.user_id) as users,
    COUNT(DISTINCT ur.user_id) as paying_users,
    ROUND(COUNT(DISTINCT ur.user_id) * 100.0 / COUNT(DISTINCT uc.user_id), 2) as conversion_rate,
    ROUND(AVG(ur.total_revenue), 2) as avg_revenue_per_paying_user,
    ROUND(SUM(ur.total_revenue) / COUNT(DISTINCT uc.user_id), 2) as revenue_per_user
FROM user_cohorts uc
LEFT JOIN user_revenue ur ON uc.user_id = ur.user_id
GROUP BY uc.cohort_month
ORDER BY uc.cohort_month;

10. 실전 팁

1. 코호트 크기 필터링

작은 코호트는 통계적으로 불안정합니다.

sql

HAVING COUNT(*) >= 100  -- 최소 100명 이상

2. 시각화를 위한 데이터 포맷

sql

-- 히트맵 시각화용 long format
SELECT
    cohort_month,
    month_number,
    retention_rate
FROM retention_data
ORDER BY cohort_month, month_number;

3. 이탈률 = 100 - 리텐션

sql

SELECT
    cohort,
    month_number,
    retention_rate,
    100 - retention_rate as churn_rate,
    LAG(retention_rate) OVER (PARTITION BY cohort ORDER BY month_number) - retention_rate as monthly_churn
FROM retention_matrix;

결론

분석 유형	질문	SQL 핵심
월별 코호트	언제 가입한 사용자가 잘 남는가?	DATE_TRUNC + month 차이 계산
N일 리텐션	Day 1/7/30에 몇 %가 돌아오는가?	INTERVAL + CASE WHEN
채널별 비교	어떤 채널이 LTV가 높은가?	GROUP BY channel
롤링 리텐션	이후 언제든 돌아오는 비율	>= 비교

코호트 분석 핵심:

시간에 따른 사용자 행동을 추적
평균이 아닌 그룹별 트렌드 파악
제품 개선의 효과를 정량적으로 측정

References

Amplitude - Retention Analysis Guide
Mixpanel - Cohort Analysis Best Practices
Mode Analytics - SQL for Cohort Analysis

코호트 분석 SQL로 직접 구현하기: 리텐션의 모든 것

코호트 분석 SQL로 직접 구현하기: 리텐션의 모든 것

TL;DR

1. 코호트 분석이란?

왜 코호트 분석이 필요한가?

코호트의 종류

2. 기본 테이블 구조

3. 월별 가입 코호트 리텐션

가장 기본적인 코호트 분석

4. 피벗 테이블 형태로 보기

코호트 리텐션 매트릭스

5. N일 리텐션 (Day 1, Day 7, Day 30)

모바일 앱에서 많이 사용하는 지표

6. 구매 코호트: 재구매율 분석

첫 구매 후 재구매 패턴

7. 채널별 코호트 비교

어떤 채널의 사용자가 더 오래 남는가?

8. 롤링 리텐션 vs 클래식 리텐션

클래식 리텐션

롤링 리텐션 (더 관대함)

9. LTV by 코호트

코호트별 고객 생애 가치

10. 실전 팁

1. 코호트 크기 필터링

2. 시각화를 위한 데이터 포맷

3. 이탈률 = 100 - 리텐션

결론

References

더 많은 콘텐츠를 받아보세요

이메일로 받아보기

관련 포스트

Data Analysis Cookbook: SQL과 Pandas로 데이터 분석 마스터하기

데이터 품질 체크 자동화: NULL, 중복, 일관성 검증 SQL 템플릿

이상치 탐지를 SQL로: Z-Score와 IQR 기반 아웃라이어 찾기