EKS Observability

Resource Optimization

Junseok Oh

Sr. Solutions Architect

Amazon Web Services

Vertical Pod Autoscaler & Auto Mode

How VPA and Auto Mode work together

VPA

Analyzes Pod resource usage
Recommends optimal requests/limits
Adjusts CPU/Memory automatically
Analysis based on 14 days of history

VPA-optimized requests

Auto Mode selects the best-fit node

Auto Mode

Selects nodes based on actual Pod requests
Matches the best-fit instance type
Optimizes bin-packing
Performs consolidation automatically

⚙

VPA

Automatically optimizes Pod CPU/Memory requests

⚡

Auto Mode

Automatically selects suitable instances based on the optimized requests

✔

Synergy

Less wasted resources + lower node cost

📖 Learn more: VPA Pod Resize guide

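VPA mode comparison

How each updateMode behaves and when to use it

Off mode

Provides recommendations only and does not change Pods. This is the safest mode for analysis.

Pros

No Pod restarts; safe for analysis; can be tried in production first

Cons

Recommendations are not applied automatically; manual updates are required

Recommended use cases

Analyzing new workloads, pre-validating production changes, understanding resource usage patterns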

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"

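Initial mode

Applies recommendations only when a Pod is created and does not change running Pods.

Pros

Running Pods stay stable; recommendations are picked up naturally during rolling updates

Cons

Not applied immediately; you have to wait until new Pods are created

Recommended use cases

Stateful workloads, long-running batch jobs, services that are sensitive to restarts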

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"

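Recreate mode

Recreates Pods to apply a recommendation whenever it changes. Brief downtime is possible.

Pros

Optimizations are applied immediately; maximizes resource efficiency; fully automated

Cons

Pods restart; a PodDisruptionBudget (PDB) is needed; potential downtime

Recommended use cases

Stateless workloads, dev/staging environments, services with multiple replicas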

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"

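Auto mode

Behaves the same as Recreate. (Deprecated; use Recreate instead.)

Pros

Minimal intervention; automatic resource adjustment; convenient to manage

Cons

Deprecated; identical to Recreate; Pods restart

Recommended use cases

General web services, microservices, flexible operating environments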

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"

VPA recommendation simulator

Visualizing VPA recommendations based on resource usage

CPU current request

1000m

Memory current request

2Gi

CPU actual usage

300m

Memory actual usage

512Mi

Estimated resource savings

65%

# VPA Recommendation Status
recommendation:
  containerRecommendations:
  - containerName: app
    target:
      cpu: "345m"
      memory: "640Mi"
    lowerBound:
      cpu: "173m"
      memory: "384Mi"
    upperBound:
      cpu: "863m"
      memory: "1280Mi"
# Savings: CPU 66% | Mem 69%
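The savings figures in the simulator output can be reproduced from the current requests (1000m, 2Gi) and the recommended target (345m, 640Mi). The following is a minimal sketch of that arithmetic; the helper names are hypothetical and only handle the units that appear on this slide.

```python
# Minimal sketch: estimate request savings from a VPA target.
# The parse_* helpers are hypothetical and only cover the 'm', 'Mi', and 'Gi' units.

def parse_cpu_millicores(value: str) -> int:
    """Convert '345m' or '1' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_mib(value: str) -> int:
    """Convert '640Mi' or '2Gi' to MiB."""
    if value.endswith("Gi"):
        return int(value[:-2]) * 1024
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported unit: {value}")


def savings_percent(current: int, target: int) -> float:
    """Share of the current request that the target would free up."""
    return (current - target) * 100 / current


cpu_saving = savings_percent(parse_cpu_millicores("1000m"), parse_cpu_millicores("345m"))
mem_saving = savings_percent(parse_memory_mib("2Gi"), parse_memory_mib("640Mi"))
print(f"CPU {cpu_saving:.1f}% | Mem {mem_saving:.2f}%")  # CPU 65.5% | Mem 68.75%
```

Rounded to whole percentages, these match the 66% and 69% shown in the simulator output.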

VPA + Auto Mode integration pattern

The automated flow from resource optimization to node cost savings

1

VPA resource monitoring

Analyzes 14 days of Pod CPU/Memory usage patterns

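2

Generate recommendations

VPA computes lower bound, target, and upper bound values

3

Update Pod requests

VPA updates the Pod spec automatically (Recreate/Auto mode)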

4

Auto Mode detects the change

Scheduling is re-evaluated against the new Pod requests

5

Select the best-fit instance

Downsize from c6g.large to c6g.medium

6

Run consolidation

Consolidates over-provisioned nodes and removes empty ones


📊 Resource monitoring

# Metrics collected by VPA
CPU usage (14-day P95): 280m / request: 500m
Memory usage (P95):     420Mi / request: 1Gi

# Utilization
CPU Utilization:    56%  ← over-provisioned
Memory Utilization: 41%  ← over-provisioned

💡 Generating recommendations

# kubectl describe vpa my-app-vpa
Recommendation:
  Container: my-app
    Lower Bound:  CPU: 200m,  Mem: 350Mi
    Target:       CPU: 300m,  Mem: 512Mi  ← recommended
    Upper Bound:  CPU: 450m,  Mem: 700Mi
    Uncapped:     CPU: 300m,  Mem: 512Mi

→ Possible savings: CPU 40%, Memory 50%

🔄 Updating Pod requests

# VPA updates the Pod spec
resources:
  requests:
    cpu: 500m → 300m  # -40%
    memory: 1Gi → 512Mi  # -50%
  limits:
    cpu: 1000m → 600m
    memory: 2Gi → 1Gi
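In the panel above the limits shrink in step with the requests. This is consistent with VPA's documented default of preserving each container's limit-to-request ratio when it changes a request. A minimal sketch of that proportional scaling, using a hypothetical helper and plain integers (millicores and MiB) rather than Kubernetes quantity strings:

```python
# Minimal sketch: keep the original limit:request ratio when a request changes.
# scale_limit is a hypothetical helper, not part of any VPA API.

def scale_limit(old_request: int, old_limit: int, new_request: int) -> int:
    """Return the limit that preserves old_limit / old_request for new_request."""
    return new_request * old_limit // old_request


cpu_limit_m = scale_limit(old_request=500, old_limit=1000, new_request=300)    # millicores
mem_limit_mib = scale_limit(old_request=1024, old_limit=2048, new_request=512)  # MiB

print(cpu_limit_m, mem_limit_mib)  # 600 1024
```

Both results match the slide: the CPU limit goes from 1000m to 600m and the memory limit from 2Gi to 1Gi (1024Mi).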

🔍 Auto Mode detects the change

# Karpenter detects the Pod change
Node pool capacity:
  Before: 3x c6g.large  (6 vCPU, 12Gi)
  After:  lower resource requests → spare capacity

# Pods are smaller, so they fit on smaller nodes
Scheduling: c6g.medium is sufficient

📦 Selecting the best-fit instance

# Instance downsizing
Before: c6g.large   ($0.068/h × 3) = $0.204/h
After:  c6g.medium  ($0.034/h × 3) = $0.102/h

# Hourly savings
Savings: $0.102/h (50% lower)
Monthly: ~$73/mo saved

🗑️ Running consolidation

# Karpenter Consolidation
Detected: node-2 utilization < 30%
Action:   move Pods to node-1 and node-3
Result:   node-2 removed

Before: 3 nodes → After: 2 nodes
Cost:   $0.034/h additional savings ($0.102/h → $0.068/h)

→ Total cost savings: VPA + Consolidation = ~67% ($0.204/h → $0.068/h)
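The roughly 67% total follows from the hourly prices on the previous panel: three c6g.large nodes are first replaced by three c6g.medium nodes, and consolidation then removes one of them, leaving 2 × $0.034 = $0.068/h. A minimal sketch of the arithmetic, assuming a 720-hour (30-day) month for the monthly figure:

```python
# Minimal sketch: node cost before and after downsizing and consolidation.
# Prices come from the slide; HOURS_PER_MONTH is an assumption (30-day month).

HOURS_PER_MONTH = 720
C6G_LARGE = 0.068   # $/h
C6G_MEDIUM = 0.034  # $/h


def saving_ratio(before: float, after: float) -> float:
    """Fraction of the original cost that is saved."""
    return (before - after) / before


before = 3 * C6G_LARGE               # 3 x c6g.large
after_downsizing = 3 * C6G_MEDIUM    # 3 x c6g.medium
after_consolidation = 2 * C6G_MEDIUM  # one node removed

downsizing_saving = saving_ratio(before, after_downsizing)
total_saving = saving_ratio(before, after_consolidation)
monthly_saving = (before - after_downsizing) * HOURS_PER_MONTH

print(f"downsizing: {downsizing_saving:.0%}, total: {total_saving:.0%}, "
      f"monthly (downsizing only): ${monthly_saving:.2f}")
```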

Resource request optimization strategy

QoS class and request/limit configuration guide

CPU Settings

Request 500m

Limit 1000m

Memory Settings

Request 512Mi

Limit 512Mi

Guaranteed

request == limit

Burstable

request < limit

BestEffort

no request/limit

Production Best Practices

CPU: set a request only, with no limit - Burstable, which avoids CPU throttling
Memory: request == limit - Guaranteed memory, which reduces OOM risk
Recommended pattern: CPU request only + Memory guaranteed

Recommended Pattern

resources:
  requests:
    cpu: "500m"        # CPU request only
    memory: "512Mi"    # Memory guaranteed
  limits:
    # cpu: (no limit)    # Allows bursting
    memory: "512Mi"    # Same as request

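Current settings: CPU Burstable + Memory Guaranteed - this is the recommended pattern! Note that Kubernetes assigns the QoS class per Pod, so a Pod configured this way is classified as Burstable overall.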
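Kubernetes derives a single QoS class for the whole Pod from its containers' requests and limits. The sketch below is a simplified model of those rules, not the kubelet's implementation: it compares quantity strings literally (so "500m" and "0.5" would not be treated as equal) and looks only at CPU and memory.

```python
# Minimal sketch of Pod QoS classification (simplified; string comparison only).
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Resources]) -> str:
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


recommended = {"requests": {"cpu": "500m", "memory": "512Mi"},
               "limits": {"memory": "512Mi"}}
print(qos_class([recommended]))  # Burstable
```

Under this model the recommended pattern (CPU request only, memory request equal to limit) yields a Burstable Pod, while a Pod becomes Guaranteed only when every container sets CPU and memory limits equal to its requests.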

📖 Learn more: Resource optimization guide

Cost savings case study (Before/After)

Real-world results from VPA + Auto Mode optimization

✖ Before optimization

Nodes: 20 (m5.xlarge)

CPU utilization 25%

Memory utilization 30%

Monthly cost

$5,840

✔ After optimization

Nodes: 8 (mix of m6g.large and c6g.xlarge)

CPU utilization 65%

Memory utilization 70%

Monthly cost

$1,752

Total Savings

70%

VPA right-sizing

20-40%

Migration to Graviton (ARM)

20%

Spot usage

60-70%

Consolidation

10-30%

HPA — Horizontal Pod Autoscaler

Horizontal scaling based on CPU, memory, and custom metrics

# HorizontalPodAutoscaler (autoscaling/v2)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # target 70% average CPU utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # target 80% average memory utilization
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute window to damp scale-in flapping
    scaleUp:
      stabilizationWindowSeconds: 30

# Requires Prometheus Adapter to be installed
# helm install prometheus-adapter prometheus-community/prometheus-adapter

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-rps-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100  # 100 RPS per Pod
  - type: External
    external:
      metric:
        name: sqs_queue_length
      target:
        type: AverageValue
        averageValue: 50

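HPA limitations

min: 1 - cannot scale to zero (wasted cost)
Custom metrics - Prometheus Adapter must be installed and managed separately
AWS integration - SQS, MSK, and similar sources need a separate CloudWatch adapter
No PromQL support - an adapter has to translate queries, which adds complexity
Single HPA - limited to one per Deployment (KEDA consolidates triggers in a ScaledObject)

What KEDA solves: it creates and manages the HPA for you, natively supports 60+ external metric sources, and enables Scale-to-Zero.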

HPA control flow

Metrics Server / Prometheus Adapter
Collects metrics and serves them through the metrics APIs

↓ queried every 15 seconds

HPA Controller
Compares the current value with the target value

↓ computes replicas

Deployment
desiredReplicas = ceil(currentReplicas × currentValue / targetValue)

Resource

CPU / Memory
Built in

Custom

Pods / External
Requires an adapter
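The replica formula in the flow above can be worked through in a few lines. The sketch below is a simplified model rather than the controller's code: it applies the formula per metric, skips scaling when the ratio is within the default 10% tolerance, takes the largest proposal when several metrics are configured, and clamps the result to minReplicas/maxReplicas. Stabilization windows and missing-metric handling are left out, and the function names are hypothetical.

```python
# Minimal sketch of the HPA replica calculation (simplified model).
import math
from typing import List, Tuple


def desired_for_metric(current_replicas: int, current_value: float,
                       target_value: float, tolerance: float = 0.1) -> int:
    """desiredReplicas = ceil(currentReplicas * currentValue / targetValue)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, metrics: List[Tuple[float, float]],
                     min_replicas: int, max_replicas: int) -> int:
    """metrics is a list of (current_value, target_value) pairs."""
    proposals = [desired_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 4 replicas at 90% CPU (target 70%) and 60% memory (target 80%)
print(desired_replicas(4, [(90, 70), (60, 80)], min_replicas=2, max_replicas=20))  # 6
```

CPU proposes ceil(4 × 90 / 70) = 6 and memory proposes 3, so the larger value wins and the Deployment scales to 6 replicas.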

KEDA — Event-Driven Autoscaling

Beyond HPA: event-driven scaling and Scale-to-Zero

Event Sources
SQS / MSK / Prometheus

→

KEDA
ScaledObject

→

HPA
Created and managed automatically

→

Deployment
0 ~ N Replicas

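Item	HPA (default)	KEDA
Metric sources	CPU / Memory only	60+ external scalers (SQS, MSK, Prometheus, etc.)
Minimum replicas	min: 1	min: 0 (Scale-to-Zero → cost savings)
Configuration	Manage HPA resources directly	ScaledObject → HPA created and managed automatically
Prometheus integration	Requires Prometheus Adapter	Native PromQL support
AWS integration	Separate CloudWatch adapter install	Built-in SQS, MSK, and CloudWatch scalers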

🔄

Scale-to-Zero

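Scales to 0 when there is no traffic, cutting cost

🔌

60+ Scalers

AWS, GCP, Azure, Kafka, Redis, and more

📊

PromQL Native

Reuses existing Prometheus infrastructure

⚙️

CRD-based

GitOps-friendly declarative management

📖 Learn more: KEDA in practice guide

Scaling on AWS metrics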

SQS Queue Depth / MSK Consumer Lag / ALB Request Count

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    name: sqs-consumer
  minReplicaCount: 0  # Scale-to-Zero!
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      queueURL: https://sqs.ap-northeast-2...
      queueLength: "5"  # 1 Pod per 5 messages
      awsRegion: ap-northeast-2

🎯 Native SQS scaler
aws-sqs-queue: no Prometheus required
queueLength: "5": 1 Pod per 5 messages
Scale-to-Zero: 0 Pods when the queue is empty
IRSA-based authentication (TriggerAuthentication)

💡 IRSA setup required: use a TriggerAuthentication CRD to give KEDA read-only access to SQS
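To make "1 Pod per 5 messages" concrete, the sketch below models how a queue depth maps to a replica count for the ScaledObject above. It is a simplified illustration under stated assumptions, not KEDA's implementation: the real path goes through the generated HPA and is affected by the polling interval and cooldown period. The function name is hypothetical.

```python
# Minimal sketch: map SQS queue depth to replicas (simplified model of the ScaledObject).
import math


def replicas_for_queue(messages: int, queue_length: int,
                       min_replicas: int, max_replicas: int) -> int:
    """queue_length is the target number of messages per Pod."""
    if messages <= 0:
        # Empty queue: fall back to the minimum (0 means Scale-to-Zero).
        return min_replicas
    wanted = math.ceil(messages / queue_length)
    return max(min_replicas, min(max_replicas, wanted))


for depth in (0, 3, 23, 500):
    print(depth, replicas_for_queue(depth, queue_length=5, min_replicas=0, max_replicas=20))
```

With queueLength 5 and a range of 0 to 20, an empty queue gives 0 Pods, 3 messages give 1, 23 messages give 5, and 500 messages are capped at 20.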

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: msk-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      query: |
        sum(kafka_consumergroup_lag{
          group="my-consumer",
          topic="orders"
        })
      threshold: "1000"  # 1 Pod per 1000 messages of lag

📊 Prometheus scaler + MSK
Monitors consumer lag with the prometheus scaler
Uses the kafka_consumergroup_lag metric
threshold: "1000": 1 Pod per 1000 messages of lag
Reuses the existing Prometheus infrastructure as-is

⚠️ Scale-to-Zero is not recommended for MSK: with zero consumers, lag cannot be detected → keep minReplicaCount: 1

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: alb-scaler
spec:
  scaleTargetRef:
    name: web-app
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  - type: aws-cloudwatch
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      namespace: AWS/ApplicationELB
      dimensionName: TargetGroup
      dimensionValue: targetgroup/my-tg/...
      metricName: RequestCountPerTarget
      targetMetricValue: "100"  # 100 requests per Pod
      metricStatPeriod: "60"
      awsRegion: ap-northeast-2  # required by the scaler; region assumed from the SQS example

☁️ Direct CloudWatch queries
aws-cloudwatch scaler: no Prometheus required
RequestCountPerTarget metric
Fine-grained control per TargetGroup
CloudWatch read-only permissions through IRSA

💡 Advantage over ALB + HPA: native CloudWatch queries remove the adapter overhead and improve accuracy

📖 Learn more: KEDA in practice guide (SQS, MSK, and ALB scalers)

Istio Gateway RPS-based scaling

Automatic scaling on Istio metrics with PromQL + KEDA

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: istio-rps-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 120
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      query: |
        sum(rate(istio_requests_total{
          destination_service=~"my-app.*"
        }[1m]))
      threshold: "50"  # 50 RPS per Pod