ECS → EKS Migration Deep Dive

Multi-Cluster Active-Active Architecture (30min)

오준석 (Junseok Oh)

Sr. Solutions Architect, AWS

From ECS to EKS: Why Consider the Move?

"We were running well on ECS, but as our services grew, the limits started to show."

ECS limits that appear at scale

  • Multi-cluster setups are not natively supported: blast radius is hard to manage
  • Network control is limited: no Pod-level Security Groups or Network Policy
  • IP exhaustion: in awsvpc mode every Task occupies an ENI IP, draining subnets
  • Ecosystem: the CNCF ecosystem (Helm, Karpenter, Argo, Istio, and so on) is not available

Expected benefits of moving to EKS

  • Minimize blast radius with Multi-Cluster Active-Active
  • Improve IP efficiency about 6x with Prefix Delegation
  • Standardized traffic management with Gateway API + Istio
  • Intelligent autoscaling with Karpenter + KEDA

Target: Active-Active Multi-Cluster

Single Cluster vs Multi-Cluster

Single Cluster

  • Single point of management
  • Fast Pod-to-Pod communication inside the cluster
  • Drawback: a cluster failure takes down every service
  • Drawback: upgrades cause downtime
  • Blast Radius: all services

Multi-Cluster (Active-Active)

  • Each zone operates independently
  • Cluster upgrades can be done without downtime
  • Advantage: blast radius reduced by 50%
  • Advantage: can absorb a zone failure
  • Trade-off: higher operational complexity

Multi-Cluster Separation Strategy

Criterion | Single Cluster | Multi-Cluster
Number of services | < 50 services | 50+ services
Security isolation | Namespace RBAC is enough | Compliance/regulatory requirements
Team independence | Shared resources are acceptable | Independent releases per team required
Blast Radius | Impact on all services is acceptable | Failure scope must be limited
Upgrades | Downtime is acceptable | Zero downtime required

Verdict: migrating all services to EKS + zero-downtime upgrades required → Multi-Cluster recommended

Pattern | Structure | Good fit when
Split by environment | Dev Account / Staging Account / Prod Account | Full isolation between environments is needed
Split by workload | Frontend Account / Backend Account / Data Account | Per-team cost separation, independent operation
Hybrid | Prod-Service / Prod-Data / NonProd | Recommended: service/data split + environment split

Key point: standardize accounts with AWS Organizations + an OU structure, and provision new accounts automatically with Control Tower

Pattern | Clusters | Pros | Cons
By environment | Dev + Staging + Prod | Simple, clear isolation | Prod cluster grows oversized
By domain | Service A + Service B + Platform | Team autonomy | Cluster management burden
By function | Service Plane + Data Plane | Optimized for workload characteristics | Complex service-to-service communication
By AZ | Zone-A + Zone-C | Maximum availability | Data synchronization must be considered

Recommended: AZ-based Active-Active combined with Service/Data Plane separation

Service Plane vs Data Plane Separation

[Diagram: AWS VPC (Multi-AZ) → Application Load Balancer → Amazon EKS Cluster]
  • Service Plane (Stateless): Namespace service-ns, API Pod and Web Pod, Spot / Fargate · Deployment · HPA, EFS (ReadWriteMany)
  • A Network Policy sits between the two planes
  • Data Plane (Stateful, tainted nodes): Namespace data-ns, db-0 and db-1 each on EBS, On-Demand · StatefulSet · KEDA, Headless Service (stable DNS)

EKS Multi-Plane Architecture Overview

Separating the Service Plane from the Data Plane optimizes cost and maximizes data stability

Key design points
  • Resource separation prevents Noisy Neighbor problems
  • Instance types chosen per workload characteristics (Spot vs On-Demand)
  • Clear network isolation and storage allocation strategy

Workload Lifecycle: Stateless vs Stateful

Benefits of Service/Data Plane Separation

Service Plane

  • Workloads: API, Web, Mobile BFF
  • Characteristics: Stateless, easy to scale horizontally
  • Nodes: Spot Instance 70% + On-Demand 30%
  • Scaling: HPA + Karpenter (RPS based)
  • Upgrades: Canary/Rolling (fast rollout)
  • Failure impact: temporary service latency (fast recovery)

Data Plane

  • Workloads: Kafka Consumer, Batch, ML Inference
  • Characteristics: Stateful/Long-running, data consistency matters
  • Nodes: On-Demand 90% + Spot 10% (non-critical batch only)
  • Scaling: KEDA (Queue depth/Lag based)
  • Upgrades: Blue/Green (safety first)
  • Failure impact: possible data loss (recovery takes time)

Multi-Account Network Topology

Prod Service Account

EKS Service Cluster

Prod Data Account

EKS Data Cluster

NonProd Account

EKS Dev/Staging
Transit Gateway
↑ VPC Peering ↑

Shared VPC

ArgoCD, Monitoring

Route 53

Private Hosted Zone / DNS Resolution

NLB Weighted Routing Deep Dive

Introduction to Gateway API

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: nginx
spec:
  controllerName: gateway.nginx.org/nginx-gateway-controller

Role: Gateway template defined by the infrastructure provider

Owner: Platform Team

production-gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      protocol: HTTPS
      port: 443

hwahae-gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: hwahae-gateway
  namespace: gateway-system
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: hwahae-tls
      allowedRoutes:
        namespaces:
          from: All

Role: the actual load balancer instance

Owner: Cluster Admin

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: production-gateway
  rules:
    - matches:
        - path: {type: PathPrefix, value: /api}
      backendRefs:
        - name: api-service
          port: 8080

Role: application routing rules

Owner: Application Developer

VPC CNI & Prefix Delegation

Instance | Max ENIs | IPs per ENI | Max Pods
t3.medium | 3 | 6 | 17
t3.large | 3 | 12 | 35
m5.xlarge | 4 | 15 | 58

Problem: the number of Secondary IPs that can be assigned per ENI is limited

Result: Pods per node are capped → horizontal scaling costs more

Mode | Allocation unit | t3.medium max Pods
Secondary IP | Individual IP | 17
Prefix Delegation | /28 (16 IPs) | 110
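
The figures in the two tables above can be reproduced with the commonly cited VPC CNI sizing formula. The sketch below is a rough model, not the CNI's actual code: the function name is hypothetical, and the cap of 110 Pods for smaller instances under Prefix Delegation is an assumption taken from the table.

```python
def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
             pd_cap: int = 110) -> int:
    """Estimate max Pods per node under the VPC CNI (hypothetical helper)."""
    # The first address on each ENI is the ENI's own primary IP,
    # so it cannot be handed to a Pod.
    slots = enis * (ips_per_eni - 1)
    if prefix_delegation:
        # Each slot carries a /28 prefix (16 addresses) instead of a single IP.
        slots *= 16
    # +2 covers host-network Pods (aws-node, kube-proxy) that need no Pod IP.
    pods = slots + 2
    # Assumption: with Prefix Delegation the practical limit is a fixed cap
    # (110 in the table above) rather than the raw address count.
    return min(pods, pd_cap) if prefix_delegation else pods


for name, enis, ips in [("t3.medium", 3, 6), ("t3.large", 3, 12), ("m5.xlarge", 4, 15)]:
    print(name, max_pods(enis, ips), max_pods(enis, ips, prefix_delegation=True))
```

For t3.medium the raw Prefix Delegation address count is 3 × 5 × 16 + 2 = 242, so the cap, not the IP space, becomes the binding limit.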

/28 Prefix = 16 IPs allocated at once

  • Faster IP allocation (1 API call → 16 IPs)
  • Pod density per node increases 6x or more
  • Best performance on Nitro instances

Requirements:

  • EKS 1.21+ / VPC CNI 1.9+
  • Nitro-based instances (prefix assignment is only available on Nitro)
  • Confirm the subnet has free contiguous /28 blocks

Caveats:

  • Existing nodes must be restarted
  • Windows nodes not supported
  • Recommended together with Custom Networking

Custom Networking - ENIConfig (Pod subnet per AZ)

eniconfig.yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-northeast-2a
spec:
  subnet: subnet-0abc123def456789a  # Pod-only subnet
  securityGroups:
    - sg-0123456789abcdef0

Security Group for Pods: Comparing the Options

SGP (Branch ENI)

  • A Security Group can be applied per Pod
  • Consumes Branch ENIs → low Pod density
  • Existing SG rules can be reused
  • m5.large: at most 9 Branch ENIs
Verdict: limited Pod density

Prefix Delegation + Network Policy

  • Node SG is shared; Pod isolation is done with Network Policy
  • Allocation in /28 prefix units → about 6x higher Pod density
  • Cilium L3/L4/L7 filtering
  • Covers most isolation requirements
Verdict: recommended combination

Fargate

  • A Security Group can be applied per Task (serverless)
  • Not affected by ENI limits
  • DaemonSets not supported
  • EKS control plane cost is billed separately
Verdict: for special workloads

Recommended: apply SGP selectively to the Pods that need it and use Prefix Delegation + Network Policy for the rest. Most existing SGP Pods can be moved to Network Policy

SGP Branch ENI Constraints and Alternatives

Security Group for Pods (SGP) architecture:

Node ENI (Primary) → the node's own traffic
Trunk ENI → manages Branch ENIs (1 per node)
Branch ENI → dedicated to SGP Pods (VLAN tagging)

Constraints:

Instance | Max Branch ENIs | Max SGP Pods
t3.medium | 6 | 6
m5.large | 9 | 9
m5.xlarge | 18 | 18
  • Cannot be combined with Prefix Delegation (Branch ENIs get individual IPs)
  • SGP Pods are limited by the Branch ENI count → Pod density per node drops sharply

Option 1: minimize SGP + use NetworkPolicy (recommended)

network-policy.yaml
# Control L3/L4 traffic with NetworkPolicy (when SGP is not needed)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-access-only
spec:
  podSelector:
    matchLabels:
      app: api-server
  # Restrict egress only; without policyTypes, Ingress is also selected
  # by default and all inbound traffic to these Pods would be denied.
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.100.0/24  # RDS subnet
      ports:
        - port: 5432

Option 2: move Pods that require SGP to Fargate

  • A Fargate Pod has its own ENI → no Branch ENI constraint
  • Fits cases where an SG is mandatory, such as direct RDS access

Option 3: VPC CNI v1.15+ POD_SECURITY_GROUP_ENFORCING_MODE=standard

  • In standard mode, SGP and Prefix Delegation can be used together
  • Note the precedence between NetworkPolicy and SGP

Most Pods

  • Prefix Delegation (high Pod density)
  • L3/L4 traffic control with NetworkPolicy
  • Spot Instances can be used

Pods that require an SG (direct RDS access, etc.)

  • Option A: Fargate (no SGP constraint)
  • Option B: dedicated node group (SGP enabled)
  • Option C: standard mode (v1.15+)

Recommended: move most Pods to NetworkPolicy, and isolate only the Pods that truly need an SG (such as direct RDS access) on Fargate or a dedicated node group

Subnet IP Exhaustion: Initial Design vs Retrofit

Applied at initial design (recommended)

  • Add a Secondary CIDR when creating the VPC
  • Pre-provision Pod-only subnets (/19 or larger)
  • Configure the per-AZ ENIConfig mapping
  • Advantage: applied without downtime
  • Advantage: more freedom in subnet sizing
  • Cost: no additional cost

Retrofitted in production

  • Adding a Secondary CIDR → possible (VPC setting)
  • Creating Pod-only subnets → possible
  • Enabling Custom Networking → requires a rolling restart of nodes
  • Caution: existing Pod IPs change
  • Caution: ENIConfig only takes effect on new nodes
  • Recommended: Blue/Green node group replacement

Conclusion: applying a 100.64.0.0/16 Secondary CIDR + Custom Networking at initial cluster design is strongly recommended. Retrofitting is possible, but it requires a rolling restart of nodes, and Blue/Green node group replacement is the safe way to do it.
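
To get a feel for the sizing above, the following sketch splits the 100.64.0.0/16 Secondary CIDR into /19 Pod subnets using Python's standard `ipaddress` module. The helper names are hypothetical, and the /28 count is only an upper bound: AWS reserves 5 addresses per subnet, and fragmentation reduces the number of contiguous /28 blocks in practice.

```python
import ipaddress

# CIDR and /19 subnet size come from the text above; names are illustrative.
SECONDARY_CIDR = ipaddress.ip_network("100.64.0.0/16")
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def pod_subnets(cidr=SECONDARY_CIDR, new_prefix=19):
    """Split the secondary CIDR into equally sized Pod-only subnets."""
    return list(cidr.subnets(new_prefix=new_prefix))


def prefix_capacity(subnet):
    """Upper bound on the number of /28 prefixes a subnet can hold."""
    return subnet.num_addresses // 16


subnets = pod_subnets()
first = subnets[0]
print(len(subnets))                                   # how many /19 subnets fit
print(first)                                          # first Pod subnet
print(first.num_addresses - AWS_RESERVED_PER_SUBNET)  # usable addresses
print(prefix_capacity(first))                         # /28 blocks (upper bound)
```

With one /19 per AZ, a two-AZ layout such as Zone-A + Zone-C uses only two of the available subnets, which leaves room for additional clusters in the same range.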

ECS → EKS Migration Roadmap

  1. Phase 1, Foundation: VPC design, EKS cluster provisioning, Gateway API installation (2 weeks)
  2. Phase 2, Pilot service: migrate a single service, build the CI/CD pipeline, set up monitoring (2 weeks)
  3. Phase 3, Gradual cutover: migrate services one by one, adjust traffic weights, scale down ECS (4 weeks)
  4. Phase 4, Full cutover: shut down ECS, enable Multi-Cluster, stabilize operations (2 weeks)

๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ์‚ฌ์ „ ์ฒดํฌ๋ฆฌ์ŠคํŠธ

  • ๋„คํŠธ์›Œํฌ ์ค€๋น„

    - VPC Secondary CIDR ์ถ”๊ฐ€ (100.64.0.0/16)

    - Pod ์ „์šฉ ์„œ๋ธŒ๋„ท ์ƒ์„ฑ (/19 ์ด์ƒ)

    - Security Group ์ •๋ฆฌ ๋ฐ ํ‘œ์ค€ํ™”

  • IAM ์ค€๋น„

    - EKS ํด๋Ÿฌ์Šคํ„ฐ ์—ญํ•  ์ƒ์„ฑ

    - ๋…ธ๋“œ ๊ทธ๋ฃน ์—ญํ•  ์ƒ์„ฑ

    - IRSA์šฉ OIDC Provider ์„ค์ •

    - Pod Identity ์ •์ฑ… ์ค€๋น„

  • GitOps ์ค€๋น„

    - Git ์ €์žฅ์†Œ ๊ตฌ์กฐ ์„ค๊ณ„

    - ArgoCD ์„ค์น˜ ๊ณ„ํš

    - Helm Chart / Kustomize ์„ ํƒ

  • ๋ชจ๋‹ˆํ„ฐ๋ง ์ค€๋น„

    - CloudWatch Container Insights ํ™œ์„ฑํ™”

    - Prometheus/Grafana ์Šคํƒ ๊ณ„ํš

    - ๊ธฐ์กด ECS ๋ฉ”ํŠธ๋ฆญ ๋Œ€์‹œ๋ณด๋“œ ๋งคํ•‘

Block 01 Quiz

Q1: What are the main advantages of a Multi-Cluster Active-Active architecture?
Q2: Which Gateway API resource do application developers manage?
Q3: Which tool is used to synchronize workloads across clusters in a Multi-Cluster environment?
Q4: With NLB weighted routing, what is the recommended setting while one cluster is under maintenance?

GitOps & Progressive Delivery (30min)

CI/CD Pain Point

Current problems

  • Push model based on Jenkins/CodePipeline
  • Deployment state is hard to track
  • Rollbacks need manual intervention
  • Configuration is inconsistent across environments

What GitOps brings

  • Git = Single Source of Truth
  • Automated Drift Detection
  • Metric-based automatic rollback
  • Per-environment configuration as code

IaC Strategy: Terraform + ArgoCD

[Diagram: GitHub as the Source of Truth (infra-repo/, k8s-manifests/), split by a boundary into two layers]
  • Terraform, Infrastructure Layer: GitHub Actions (terraform plan/apply) · TF State (S3 + DynamoDB Lock) · Terraform Modules (VPC, EKS, IAM, RDS, ElastiCache, Secrets) · AWS Resources (VPC, EKS×2, IAM, Aurora, Redis, NLB)
  • ArgoCD, Application Layer: ArgoCD Controller (Auto/Manual Sync) · Git Sync (Poll 3min / Webhook) · Application Manifests (Helm, Kustomize, YAML, ApplicationSet) · EKS Clusters (Cluster A in AZ-a, Cluster C in AZ-c)

IaC Strategy Overview

Clearly separate the responsibilities of infrastructure (Terraform) and applications (ArgoCD)

Core design
  • Terraform: state management of AWS infrastructure resources
  • ArgoCD: continuous synchronization of K8s workloads
  • Git is the Single Source of Truth

Terraform vs ArgoCD vs ACK Decision Guide

Resource | Terraform | ArgoCD | ACK
EKS cluster, VPC, Subnet | ✅ Recommended | ❌ | ❌
EKS Add-on (VPC CNI, CoreDNS) | ✅ Recommended | △ Possible | △ Possible
K8s Controller (ALB, ExternalDNS) | △ Possible | ✅ Recommended | ❌
App Deployment, Service | ❌ | ✅ Recommended | ❌
App-specific S3 Bucket | △ Possible | △ Possible | ✅ Recommended
App-specific SQS Queue | △ Possible | △ Possible | ✅ Recommended
App-specific IAM Role | ✅ Recommended | ❌ | △ Possible
Shared RDS/Aurora | ✅ Recommended | ❌ | △ Possible
Shared ElastiCache | ✅ Recommended | ❌ | △ Possible

Core principle: if the resource shares the app's lifecycle → ACK/ArgoCD; if it is infrastructure-level → Terraform

AWS Controllers for Kubernetes (ACK)

  • Manage AWS resources declaratively as Kubernetes CRs
  • Create/delete S3, SQS, RDS, and so on with kubectl apply
  • Integrates naturally with ArgoCD (GitOps)

sqs-queue.yaml
# Create an SQS Queue with ACK
apiVersion: sqs.services.k8s.aws/v1alpha1
kind: Queue
metadata:
  name: order-queue
  namespace: production
spec:
  queueName: hwahae-order-queue
  visibilityTimeout: "30"
  tags:
    team: backend

Advantage: the app and its AWS resources are managed through the same GitOps workflow
Limitation: support for infrastructure resources such as IAM and VPC is limited

Is it an app-specific AWS resource?
├─ Yes → Does the resource share the app's lifecycle?
│   ├─ Yes → ACK (GitOps integration)
│   └─ No → Terraform
└─ No (shared resource) → Terraform

Is it a security-sensitive resource? (IAM Role, SG)
├─ Yes → Terraform (change history + plan verification)
└─ No → ACK or Terraform
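
The two decision trees above can be folded into one small function. This is only one possible reading: the source presents the trees separately, so letting the security check override the lifecycle check is an assumption here, and the function and parameter names are hypothetical.

```python
def choose_tool(app_specific: bool, same_lifecycle_as_app: bool,
                security_sensitive: bool) -> str:
    """Pick a management tool for an AWS resource (illustrative only)."""
    # Assumption: security-sensitive resources (IAM Role, SG) stay in
    # Terraform regardless of ownership, for change history and plan review.
    if security_sensitive:
        return "Terraform"
    # App-specific resources that live and die with the app fit ACK.
    if app_specific and same_lifecycle_as_app:
        return "ACK"
    # Shared resources, or app resources with a different lifecycle.
    return "Terraform"


print(choose_tool(app_specific=True, same_lifecycle_as_app=True, security_sensitive=False))
print(choose_tool(app_specific=True, same_lifecycle_as_app=True, security_sensitive=True))
print(choose_tool(app_specific=False, same_lifecycle_as_app=False, security_sensitive=False))
```

Under this reading an app-specific SQS queue lands on ACK, while an app-specific IAM role or a shared Aurora cluster stays in Terraform.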

Mapping to the current structure:

  • terraform/: EKS, VPC, shared RDS, IAM → keep
  • terraform/applications/<app>/: app-specific SG, IAM → keep (Terraform recommended for security)
  • New: app-specific S3, SQS, SNS → consider moving to ACK

EKS Capabilities (managed Add-ons, Auto Mode, etc.)

Item | EKS Add-on (Terraform) | ArgoCD | Deciding factor
VPC CNI | ✅ | △ | EKS API integration, automatic upgrades
CoreDNS | ✅ | △ | EKS version compatibility managed automatically
kube-proxy | ✅ | △ | Tied to the EKS version
AWS LB Controller | △ | ✅ | Helm values need customizing
External DNS | ❌ | ✅ | K8s-native configuration
Cert Manager | ❌ | ✅ | K8s-native configuration
Karpenter | △ | ✅ | NodePool CRDs fit ArgoCD better
ADOT/CloudWatch Agent | ✅ | △ | Easy to manage as an EKS Add-on

Principle: core networking/system components managed through the EKS API go to Terraform; the remaining controllers go to ArgoCD

ACK Hands-On Setup

ack-setup.yaml
# 1. Install the ACK S3 Controller (ArgoCD Application)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ack-s3-controller
  namespace: argocd
spec:
  project: default  # required by the Application spec
  source:
    chart: s3-chart
    repoURL: public.ecr.aws/aws-controllers-k8s
    targetRevision: v1.0.15
    helm:
      values: |
        serviceAccount:
          annotations:
            eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ACK-S3-Role
  destination:
    server: https://kubernetes.default.svc
    namespace: ack-system

ACK Resource Manifests

s3-bucket.yaml
# App-specific S3 Bucket
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: hwahae-user-uploads
  namespace: production
spec:
  name: hwahae-user-uploads-prod
  versioning:
    status: Enabled
  encryption:
    rules:
      - applyServerSideEncryptionByDefault:
          sseAlgorithm: aws:kms

sqs-queue.yaml
# App-specific SQS Queue
apiVersion: sqs.services.k8s.aws/v1alpha1
kind: Queue
metadata:
  name: order-processing
  namespace: production
spec:
  queueName: hwahae-order-processing
  visibilityTimeout: "60"
  messageRetentionPeriod: "345600"
  redrivePolicy: |
    { "deadLetterTargetArn": "...-dlq", "maxReceiveCount": 3 }

Effect: AWS resources are managed through K8s manifests → self-service for app teams, unified under ArgoCD GitOps

ArgoCD Sync vs Argo Rollouts

ArgoCD Sync (standard deployment)

  • Detects Git changes → automatic/manual Sync
  • Uses the Deployment's default RollingUpdate
  • Advantage: simple setup, fast deployment
  • Drawback: no fine-grained traffic control
  • Good fit: internal services, development environments

syncPolicy:
  automated:
    prune: true
    selfHeal: true

Argo Rollouts (Progressive Delivery)

  • Supports Canary / Blue-Green deployments
  • Traffic weights adjusted step by step
  • Advantage: metric-based automatic rollback
  • Advantage: integrates with Istio VirtualService
  • Good fit: production, customer-facing services

strategy:
  canary:
    steps:
      - setWeight: 10
      - analysis:
          templates:
            - templateName: success-rate
      - setWeight: 50

Argo Rollouts + Istio Canary Architecture

๋ฐฐํฌ ํ๋ฆ„

Git Push
โ–ธ
ArgoCD Sync
โ–ธ
Argo Rollouts
โ–ธ
Istio VirtualService

Stable (v1)

ํ˜„์žฌ ํ”„๋กœ๋•์…˜ ํŠธ๋ž˜ํ”ฝ
weight: 90% โ†’ 0%

Canary (v2)

์ƒˆ ๋ฒ„์ „ ํŠธ๋ž˜ํ”ฝ
weight: 10% โ†’ 100%

AnalysisRun

  • โ— Prometheus ๋ฉ”ํŠธ๋ฆญ ์กฐํšŒ
  • โ— ์„ฑ๊ณต๋ฅ  โ‰ฅ 95% โ†’ Promote
  • โ— ์‹คํŒจ ์‹œ โ†’ ์ž๋™ Rollback
Promote
โ†’ 100%
Rollback
โ†’ 0%

Canary Deployment Flow

  1. 10%: initial traffic shift; wait 2 minutes, then analyze metrics
  2. 30%: increase traffic on success; monitor the error rate
  3. 50%: half of traffic shifted; verify stability for 5 minutes
  4. 80%: most traffic shifted; final verification
  5. 100%: full cutover; promote to Stable
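
The canary flow above can be modeled as a simple loop. This is a simplified sketch rather than how Argo Rollouts is implemented: it assumes one success-rate measurement per step and reuses the 95% threshold from the AnalysisRun slide; the names are hypothetical.

```python
CANARY_STEPS = [10, 30, 50, 80, 100]   # traffic weights from the flow above
SUCCESS_THRESHOLD = 0.95               # promote only at or above 95% success


def run_canary(success_rates):
    """Walk the canary steps; return (final canary weight, outcome).

    success_rates holds one measured success rate per step (an assumption;
    a real analysis may sample several times per step).
    """
    if len(success_rates) != len(CANARY_STEPS):
        raise ValueError("need exactly one success rate per canary step")
    for weight, rate in zip(CANARY_STEPS, success_rates):
        if rate < SUCCESS_THRESHOLD:
            # A failed analysis sends all traffic back to Stable.
            return 0, f"rollback at {weight}%"
    return 100, "promoted"


print(run_canary([0.99, 0.99, 0.98, 0.99, 0.99]))
print(run_canary([0.99, 0.90, 0.99, 0.99, 0.99]))
```

The second call fails its analysis at the 30% step, so the canary weight returns to 0 and Stable keeps serving all traffic.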

Supported Traffic Providers: Istio, NGINX Ingress, ALB, SMI, Apache APISIX, Traefik

Zone-Aware PDB Defense Scenario

[Animated diagram, step 1 of 7: the Zone A Cluster (Spot) runs App D under a PDB with maxUnavailable: 2, while the Zone C Cluster keeps handling traffic stably. Legend: App D (PDB), Running, Terminating, PDB Blocked, Pending]

Zone-Aware Rollouts

GitHub Self-Hosted Runner (ARC)

Actions Runner Controller (ARC)

  • Runs GitHub Actions Self-Hosted Runners on EKS
  • Runner Pods scale automatically (HRA)
  • Runners can be isolated per workload

Advantages:

  • Access to resources inside the VPC (RDS, ElastiCache)
  • IAM Role used for ECR Push
  • Faster builds with a build cache on a PVC

Flow: GitHub Actions Workflow → Webhook Event (workflow_job) → ARC Controller (EKS) → Runner Pod created (ephemeral) → Job runs ▸ ECR Push ▸ ArgoCD Sync → Runner Pod deleted

# Runner Pod resources
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"
# Runner scaling
minRunners: 1
maxRunners: 10

HorizontalRunnerAutoscaler - Autoscaling

hra.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: hwahae-runners-autoscaler
  namespace: arc-runners
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: hwahae-runners
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - hwahae/backend
        - hwahae/frontend

NGINX Gateway Fabric vs Istio Gateway

NGINX Gateway Fabric

Role: North-South traffic (external → cluster)

  • L4/L7 load balancing
  • TLS termination
  • Rate Limiting
  • WAF integration possible
  • Gateway API native

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      port: 443
      protocol: HTTPS

Istio Service Mesh

Not part of the current requirements (to be reviewed later)

Note: considerations when adopting a Service Mesh

  • East-West traffic (Pod ↔ Pod) mTLS
  • Traffic mirroring, Circuit Breaker
  • Canary deployment (Argo Rollouts integration)
  • Resource overhead (Sidecar Proxy)
  • Higher operational complexity

Conclusion: at this stage NGINX Gateway Fabric alone is sufficient. Revisit a Service Mesh when traffic observability requirements arise

GitOps Best Practices

  • Repository structure
    - Decide between monorepo and polyrepo
    - base/ + overlays/ Kustomize layout
    - Separate environments by branch or by directory

  • Sync strategy
    - Dev/Staging: automated + selfHeal: true
    - Production: manual, or configure a Sync Window
    - Apply prune: true with care

  • Security
    - Sealed Secrets or External Secrets Operator
    - RBAC: restrict namespaces per AppProject
    - SSO integration (OIDC)

  • Operations
    - Configure notifications (Slack, PagerDuty)
    - Build an Application status dashboard
    - Review Diff reports regularly

AWS Resource Management: Terraform vs ArgoCD vs ACK

Resource type | Management tool | Examples | Change frequency
Infrastructure foundation | Terraform | VPC, EKS Cluster, RDS, IAM Role | Low
Core EKS add-ons | Terraform (EKS Add-on) | VPC CNI, CoreDNS, kube-proxy, EBS CSI | Low
Cluster controllers | ArgoCD | ALB Controller, External DNS, ESO, Cert Manager | Medium
App-specific AWS resources | ArgoCD or ACK | Per-app SQS Queue, S3 Bucket, IAM Policy | High

Consistent with the current principle: "Resources inside EKS go to ArgoCD, shared AWS resources go to Terraform"

When to adopt ACK: when app teams need to manage AWS resources such as SQS and S3 directly through K8s manifests. At the current stage, the Terraform + ArgoCD split is sufficient

Block 02 Quiz

Q1: What is the core principle of GitOps?
Q2: What happens in Argo Rollouts when a Canary deployment fails?
Q3: What are the main advantages of Zone-Aware Rollouts?
Q4: What is the recommended Sync policy for production in ArgoCD?

Scaling Strategy: Karpenter & KEDA (25 min)

Scaling Pain Point

Cluster Autoscaler vs Karpenter

Grafana: EKS Node Monitoring

CPU Requests vs Usage: visualizing what Karpenter bases its decisions on

NodePool Configuration

nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Mix of Spot + On-Demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Diverse instance families
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "m6i", "m6a", "c5", "c6i", "r5", "r6i"]
        # Architecture
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  # Resource limits
  limits:
    cpu: 1000
    memory: 1000Gi
  # Disruption policy
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m

Karpenter Consolidation

KEDA RPS-Based Autoscaling

KEDA Architecture

KEDA Scalers

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0  # scale to zero is possible
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-northeast-2.amazonaws.com/123456789/orders
        queueLength: "5"  # 1 Pod per 5 messages
        awsRegion: ap-northeast-2

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_per_second
      threshold: "100"  # scale out above 100 RPS
      query: |
        sum(rate(http_requests_total{
          service="api-gateway"
        }[1m]))

triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 0 9 * * 1-5   # weekdays 09:00
      end: 0 18 * * 1-5    # weekdays 18:00
      desiredReplicas: "10"  # keep 10 replicas during business hours

triggers:
  - type: aws-cloudwatch
    metadata:
      namespace: AWS/ApplicationELB
      dimensionName: LoadBalancer
      dimensionValue: app/my-alb/1234567890
      metricName: RequestCount
      targetMetricValue: "1000"
      awsRegion: ap-northeast-2
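
For the SQS trigger above ("1 Pod per 5 messages"), the replica count follows from a ceiling division clamped to the min/max bounds. The sketch below is a simplification under that assumption: the real HPA algorithm driven by KEDA also applies tolerance and stabilization windows, and the function name is hypothetical.

```python
import math


def desired_replicas(queue_messages: int, queue_length_target: int = 5,
                     min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Approximate replicas for a queue-length target (simplified model)."""
    # One Pod per `queue_length_target` messages, rounded up.
    wanted = math.ceil(queue_messages / queue_length_target)
    # Clamp to minReplicaCount / maxReplicaCount from the ScaledObject.
    return max(min_replicas, min(max_replicas, wanted))


for backlog in (0, 12, 1000):
    print(backlog, desired_replicas(backlog))
```

An empty queue yields 0 replicas (scale to zero), 12 messages yield 3 Pods, and a backlog of 1000 messages is capped at maxReplicaCount.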

KEDA + Istio Metrics

keda-istio.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: istio-rps-scaler
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.istio-system:9090
        metricName: istio_requests_per_second
        threshold: "50"  # keep 50 RPS per Pod
        query: |
          sum(rate(istio_requests_total{
            destination_service="api-server.production.svc.cluster.local",
            response_code!~"5.*"
          }[1m]))
          /
          count(kube_pod_info{
            namespace="production",
            pod=~"api-server-.*"
          })

Batch & Schedule Workload Patterns

Batch-Only Karpenter NodePool

batch-nodepool.yaml
# Batch-only NodePool, optimized for Spot
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch
spec:
  template:
    metadata:
      labels:
        workload-type: batch
    spec:
      requirements:
        # Batch is Spot only (cost optimization)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Compute-optimized families
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5", "c5a", "c6i", "c6a", "c7i", "c7a"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: batch
      # Reclaim nodes quickly after Batch Pods finish
      expireAfter: 2h
  limits:
    cpu: 200
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s  # clean up empty nodes after 30 seconds

batch-pod-example.yaml
# Add a Toleration + NodeSelector to Batch Pods
spec:
  nodeSelector:
    workload-type: batch
  tolerations:
    - key: workload-type
      value: batch
      effect: NoSchedule

Effect: nodes are reclaimed automatically once batch jobs finish → zero idle cost

HPA → Karpenter → KEDA Adoption Roadmap

  1. Phase 1 (Week 1-2): install Karpenter and configure NodePools; run alongside the existing Cluster Autoscaler; validate with test workloads
  2. Phase 2 (Week 3-4): remove Cluster Autoscaler; enable Karpenter Consolidation; raise the Spot Instance ratio
  3. Phase 3 (Week 5-6): install KEDA; apply SQS-based worker scaling; integrate Prometheus metrics
  4. Phase 4 (Week 7-8): scaling on Istio metrics; predictive scaling with the Cron scaler; apply Zero Scaling (cost optimization)

Block 03 Quiz

Q1: Why is Karpenter faster than Cluster Autoscaler?
Q2: What is the purpose of Karpenter Consolidation?
Q3: What is KEDA's biggest advantage?
Q4: Why does a Spot Instance diversification strategy matter?

Security & Platform Engineering (20 min)

EKS Security Architecture

Security layers:

  1. IAM Layer: OIDC Provider → IRSA/Pod Identity → access to AWS services
  2. Cluster Layer: RBAC, Access Entries, cluster endpoint access control
  3. Workload Layer: Pod Security Standards, Network Policy
  4. Data Layer: Secrets Manager, KMS encryption

aws-auth vs Access Entries

aws-auth ConfigMap (Legacy)

  • ConfigMap-based IAM mapping
  • Requires manual editing (a mistake can lock you out)
  • Change history is hard to track
  • Can conflict with GitOps
  • Drawback: deleting the ConfigMap locks the cluster

# aws-auth ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::role/NodeRole
      username: system:node:{{...}}
      groups:
        - system:bootstrappers
        - system:nodes

Access Entries (recommended)

  • IAM mapping based on the EKS API
  • Managed through AWS CLI/Console/IaC
  • Audit logs recorded automatically in CloudTrail
  • Fully supported by Terraform/CDK
  • Advantage: access is preserved even if the ConfigMap is deleted

# Create an Access Entry
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::role/DevRole \
  --type STANDARD

Configuring Access Entries

bash
# 1. Create the Access Entry
aws eks create-access-entry \
  --cluster-name hwahae-prod \
  --principal-arn arn:aws:iam::123456789012:role/DevOpsRole \
  --type STANDARD

# 2. Associate an Access Policy (grants RBAC permissions)
aws eks associate-access-policy \
  --cluster-name hwahae-prod \
  --principal-arn arn:aws:iam::123456789012:role/DevOpsRole \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster

# Available policies:
# - AmazonEKSClusterAdminPolicy (full administrator)
# - AmazonEKSAdminPolicy (namespace administrator)
# - AmazonEKSEditPolicy (edit permissions)
# - AmazonEKSViewPolicy (read permissions)

terraform/access-entries.tf
# Manage Access Entries with Terraform
resource "aws_eks_access_entry" "devops" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = aws_iam_role.devops.arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "devops" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = aws_iam_role.devops.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSAdminPolicy"

  access_scope {
    type       = "namespace"
    namespaces = ["production", "staging"]
  }
}

RBAC & OIDC Workflow

STEP 1: External Identity Provider (OIDC)
// ID Token (JWT) Claims
"sub": "alice-id"
"email": "alice@company.com"
"groups": ["dev-team", "admin-group"]
The K8s API server reads this token and recognizes that the request came from the user Alice and the group dev-team.

STEP 2: SUBJECTS
User (alice@...) + Group (dev-team) + ServiceAccount, bound to →

STEP 3: BINDING
RoleBinding ("Connects Subject to Role"), refers to →

STEP 4: PERMISSIONS
Role (Reader), Namespace: App-A
Allowed Actions
- get, list, watch pods
- update deployments

OIDC and K8s group mapping
  • The token is the ID card: Subjects are mapped from the groups field inside the JWT
  • Setting: --oidc-groups-claim=groups
  • ServiceAccount: internal account for Pods/processes, no OIDC needed
  • Using RBAC: specify kind: Group in the RoleBinding's subjects

RBAC โ€” ClusterRole, Role, RoleBinding

IAM Role (AWS)
  โ†“ Access Entries
EKS Cluster
  โ†“ ClusterRoleBinding / RoleBinding
Kubernetes RBAC
  โ”œโ”€ ClusterRole    โ†’ ํด๋Ÿฌ์Šคํ„ฐ ์ „์ฒด ๋ฒ”์œ„
  โ”œโ”€ Role           โ†’ ๋„ค์ž„์ŠคํŽ˜์ด์Šค ๋ฒ”์œ„
  โ”œโ”€ ClusterRoleBinding โ†’ ClusterRole โ†” Subject ์—ฐ๊ฒฐ
  โ””โ”€ RoleBinding    โ†’ Role โ†” Subject ์—ฐ๊ฒฐ
๋ฆฌ์†Œ์Šค๋ฒ”์œ„์šฉ๋„
ClusterRoleํด๋Ÿฌ์Šคํ„ฐ ์ „์ฒด๋…ธ๋“œ ์กฐํšŒ, CRD ๊ด€๋ฆฌ, ์ „์ฒด ๋„ค์ž„์ŠคํŽ˜์ด์Šค ์ฝ๊ธฐ
Role๋„ค์ž„์ŠคํŽ˜์ด์ŠคํŠน์ • ๋„ค์ž„์ŠคํŽ˜์ด์Šค ๋‚ด Pod/Service/ConfigMap ๊ด€๋ฆฌ
ClusterRoleBindingํด๋Ÿฌ์Šคํ„ฐ ์ „์ฒดClusterRole์„ ์‚ฌ์šฉ์ž/๊ทธ๋ฃน์— ๋ฐ”์ธ๋”ฉ
RoleBinding๋„ค์ž„์ŠคํŽ˜์ด์ŠคRole ๋˜๋Š” ClusterRole์„ ๋„ค์ž„์ŠคํŽ˜์ด์Šค ๋ฒ”์œ„๋กœ ๋ฐ”์ธ๋”ฉ

ํ•ต์‹ฌ: RoleBinding์€ ClusterRole๋„ ์ฐธ์กฐ ๊ฐ€๋Šฅ โ†’ ๋„ค์ž„์ŠคํŽ˜์ด์Šค ๋ฒ”์œ„๋กœ ์ถ•์†Œ ์ ์šฉ

ClusterRole: namespace-viewer# 1. ClusterRole โ€” ๊ณตํ†ต ์ฝ๊ธฐ ๊ถŒํ•œ (์ „์ฒด ํŒ€ ๊ณต์œ ) apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: namespace-viewer rules: - apiGroups: [""] resources: ["pods", "services", "configmaps", "events"] verbs: ["get", "list", "watch"] - apiGroups: ["apps"] resources: ["deployments", "replicasets", "statefulsets"] verbs: ["get", "list", "watch"]
Role: deployer# 2. Role โ€” ๋„ค์ž„์ŠคํŽ˜์ด์Šค๋ณ„ ๋ฐฐํฌ ๊ถŒํ•œ apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: deployer namespace: backend rules: - apiGroups: ["apps"] resources: ["deployments", "replicasets"] verbs: ["get", "list", "watch", "create", "update", "patch"] - apiGroups: [""] resources: ["pods", "pods/log", "pods/exec"] verbs: ["get", "list", "watch", "create"] - apiGroups: [""] resources: ["configmaps", "secrets"] verbs: ["get", "list", "watch", "create", "update"]
RoleBinding# 3. RoleBinding โ€” ํŒ€๋ณ„ ๋„ค์ž„์ŠคํŽ˜์ด์Šค ๊ถŒํ•œ ๋ถ€์—ฌ apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: backend-deployer namespace: backend subjects: # IAM Role โ†’ Access Entry โ†’ Kubernetes Group ๋งคํ•‘ - kind: Group name: backend-team apiGroup: rbac.authorization.k8s.io roleRef: kind: Role name: deployer apiGroup: rbac.authorization.k8s.io
ClusterRoleBinding# 4. ClusterRoleBinding โ€” ์ „์ฒด ์ฝ๊ธฐ ๊ถŒํ•œ apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: all-teams-viewer subjects: - kind: Group name: all-developers apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole name: namespace-viewer apiGroup: rbac.authorization.k8s.io
RoleBinding with ClusterRole# 5. RoleBinding์œผ๋กœ ClusterRole ๋ฒ”์œ„ ์ถ•์†Œ apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: frontend-viewer namespace: frontend subjects: - kind: Group name: frontend-team apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole # ClusterRole์ด์ง€๋งŒ name: namespace-viewer # ์ด ๋„ค์ž„์ŠคํŽ˜์ด์Šค๋งŒ ์ ์šฉ๋จ apiGroup: rbac.authorization.k8s.io
# Map an IAM Role → Kubernetes Groups
aws eks create-access-entry \
  --cluster-name hwahae-prod \
  --principal-arn arn:aws:iam::123456789012:role/BackendDevRole \
  --type STANDARD \
  --kubernetes-groups backend-team,all-developers
IAM Role: BackendDevRole
  ↓ Access Entry (kubernetes-groups: backend-team, all-developers)
  ├─ ClusterRoleBinding: all-teams-viewer → cluster-wide read
  └─ RoleBinding: backend-deployer → deploy permissions in the backend namespace

Recommended pattern:

  • Platform Team: cluster-admin ClusterRoleBinding
  • Backend Team: deployer Role in its own namespace + cluster-wide viewer ClusterRole
  • Frontend Team: deployer Role in its own namespace + cluster-wide viewer ClusterRole
  • CI/CD (ARC): one deployer RoleBinding per target namespace
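
As a minimal sketch of the first and last bullets, the two bindings might look like the following. The group name `platform-team`, the ServiceAccount `arc-runner`, and the namespace `arc-runners` are assumptions for illustration and do not appear in the original deck. `cluster-admin` is the built-in Kubernetes ClusterRole, and `deployer` is the Role defined earlier.

```yaml
# Platform Team: cluster-wide admin through the built-in cluster-admin ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-team-admin        # hypothetical name
subjects:
  - kind: Group
    name: platform-team            # assumed group, set via the team's access entry
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
# CI/CD (ARC): bind the existing deployer Role in one target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployer              # hypothetical name
  namespace: backend
subjects:
  - kind: ServiceAccount
    name: arc-runner               # assumed runner ServiceAccount
    namespace: arc-runners         # assumed runner namespace
roleRef:
  kind: Role
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```

Repeating the second binding per target namespace keeps the CI/CD identity limited to the namespaces it actually deploys to.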

External Secrets Operator Flow

How it works:

  1. An ExternalSecret CR is created → the ESO controller detects it
  2. The controller looks up the referenced SecretStore → authenticates to AWS via IRSA
  3. It reads the value from Secrets Manager
  4. It creates or refreshes the Kubernetes Secret automatically
  5. The Pod consumes the Secret
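
The flow above can be sketched with a SecretStore and an ExternalSecret. This is an illustration under assumptions, not the deck's actual manifests: the resource names, the ServiceAccount `eso-backend`, and the Secrets Manager key `prod/backend/db` are hypothetical, and the `apiVersion` may differ depending on the installed ESO release.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager          # hypothetical name
  namespace: backend
spec:
  provider:
    aws:
      service: SecretsManager
      region: ap-northeast-2
      auth:
        jwt:
          serviceAccountRef:
            name: eso-backend        # assumed ServiceAccount annotated with an IAM role (IRSA)
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: backend-db
  namespace: backend
spec:
  refreshInterval: 1h                # how often ESO re-reads Secrets Manager
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: backend-db                 # Kubernetes Secret that ESO creates and refreshes
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD         # key inside the Kubernetes Secret
      remoteRef:
        key: prod/backend/db         # hypothetical Secrets Manager secret name
        property: password           # JSON field inside that secret
```

Pods then mount or reference `backend-db` like any other Secret; they never call Secrets Manager directly.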

Network Policy Implementation

# NetworkPolicy is supported in VPC CNI v1.14+
# Enable command:
# kubectl set env daemonset aws-node -n kube-system \
#   ENABLE_NETWORK_POLICY=true
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - port: 5432
# Calico extra feature: GlobalNetworkPolicy
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny-all
spec:
  selector: all()
  types:
    - Ingress
    - Egress
  # No rules = all traffic is blocked

2. Allow DNS (required)

allow-dns.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP    # DNS falls back to TCP for large responses
          port: 53

Platform Engineering: Self-Service Architecture

Automatic Per-Team Deployment with ApplicationSet

appset-team-onboarding.yaml
# 1. ApplicationSet: Git Directory Generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-onboarding
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/hwahae/platform-config
        revision: main
        directories:
          - path: "teams/*"    # teams/backend, teams/frontend, ...
  template:
    metadata:
      name: "team-{{path.basename}}"
    spec:
      project: "{{path.basename}}"
      source:
        repoURL: https://github.com/hwahae/platform-config
        path: "{{path}}"
        targetRevision: main
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
teams/backend/kustomization.yaml
# Team directory layout: teams/backend/
resources:
  - namespace.yaml         # Namespace + Labels
  - rbac.yaml              # Role + RoleBinding
  - resource-quota.yaml    # CPU/Memory limits
  - limit-range.yaml       # Pod default resources
  - network-policy.yaml    # Default Deny + DNS Allow
  - service-monitor.yaml   # Automatic Prometheus scraping

Result: adding a folder under teams/ is enough → the namespace and all baseline policies are created automatically
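
One detail worth sketching: the ApplicationSet template sets `project` to the directory name, and Argo CD will not sync an Application whose project does not exist. The deck does not show how those projects are provisioned, so the AppProject below is only an assumed companion manifest for the `backend` team; its description and its whitelist scope are illustrative choices.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: backend                      # must equal the folder name under teams/
  namespace: argocd
spec:
  description: Baseline resources for the backend team   # illustrative text
  sourceRepos:
    - https://github.com/hwahae/platform-config
  destinations:
    - server: https://kubernetes.default.svc
      namespace: backend             # the team may deploy only into its own namespace
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace                # needed because the bundle includes namespace.yaml
```

Keeping one project per team means a mistake in one team's folder cannot write into another team's namespace.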

Namespace Provisioner Bundle

namespace-bundle.yaml
# 1. ResourceQuota: per-team resource ceiling
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
limit-range.yaml
# 2. LimitRange: Pod defaults and maximums (this example sets defaults only)
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:            # default limits
        cpu: "200m"
        memory: "256Mi"
      defaultRequest:     # default requests
        cpu: "100m"
        memory: "128Mi"
network-policy.yaml
# 3. Default NetworkPolicy: deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
allow-dns.yaml
# 4. Allow DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP    # DNS falls back to TCP for large responses
          port: 53

Self-Service flow: a developer creates teams/my-team/ → opens a PR → the Platform Team reviews it → ArgoCD creates the Namespace and the bundle automatically

Block 04 Quiz

Q1: Why are Access Entries better than the aws-auth ConfigMap?
Q2: What is the role of the External Secrets Operator?
Q3: In the Network Policy default-deny pattern, what must always be allowed?
Q4: What are the advantages of IRSA (IAM Roles for Service Accounts)?

ECS → EKS Migration Deep Dive

Observability & Next Steps (20 min)

오준석 (Junseok Oh)

Sr. Solutions Architect, AWS

Observability 3 Pillars

  • Logs: immutable record of individual events; used for debugging and audit trails. Tools: Loki / CloudWatch
  • Metrics: time-series numeric data; the basis for aggregation and alarms. Tools: Prometheus / CloudWatch
  • Traces: the path of a request across distributed services; used to analyze where latency occurs. Tools: X-Ray / Tempo

Links between pillars: Logs ⇄ Metrics by label matching; Metrics ⇄ Traces by exemplars.

Link Logs ↔ Traces by TraceID | Jump from Metrics → Traces through exemplars

Current vs. Target Observability Stack

Current:
Area | Tool | Limitation
Logs | CloudWatch Logs, OpenSearch | Rising cost, complex queries
Metrics | CloudWatch Metrics | Custom metric cost, 15-month retention
Traces | X-Ray (partially adopted) | Sampling limits, difficult context propagation
Dashboard | CloudWatch Dashboards | Limited visualization

Target:
Area | Tool | Advantage
Logs | Loki + Fluent Bit | Lightweight, label-based queries, cost-efficient
Metrics | Prometheus (AMP) + Grafana | PromQL, unlimited custom metrics
Traces | Tempo + OpenTelemetry | Vendor-neutral, full sampling possible
Dashboard | Grafana (AMG) | Unified visualization, alerting, correlation analysis

Prometheus + Grafana Architecture

App Pods (/metrics) ← scrape: Prometheus (pull-based collection) → remote write → AMP (Amazon Managed Prometheus) → PromQL → Grafana (AMG): dashboards & alerting
ServiceMonitor (Prometheus Operator CRD) → configures Prometheus scrape targets

ServiceMonitor

Scrape targets are discovered automatically through label selectors. No manual configuration is needed when a new service is deployed.

Pull-Based Collection

Prometheus periodically scrapes the /metrics endpoint (every 30s in this setup; upstream Prometheus defaults to 1m).

AWS Managed

AMP + AMG minimize the operational burden. HA and storage are managed automatically.
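
The remote write hop from the in-cluster Prometheus to AMP can be sketched as a `prometheus.yml` fragment. The workspace ID `ws-EXAMPLE` is a placeholder, the queue settings are illustrative starting values, and the sketch assumes the Prometheus Pod runs with an IAM role that is allowed to write to the workspace.

```yaml
# prometheus.yml fragment (sketch; the workspace ID is a placeholder)
global:
  scrape_interval: 30s
remote_write:
  - url: https://aps-workspaces.ap-northeast-2.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    sigv4:
      region: ap-northeast-2         # requests are signed with the Pod's IAM credentials
    queue_config:
      max_samples_per_send: 1000
      max_shards: 200
      capacity: 2500
```

With the Prometheus Operator, the same settings would go under `spec.remoteWrite` of the Prometheus custom resource rather than in a hand-written file.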

Prometheus + Grafana Setup

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hwahae-api
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: hwahae-api      # automatic discovery
  endpoints:
    - port: metrics        # name of the metrics port
      interval: 30s        # scrape every 30 seconds
      path: /metrics
  namespaceSelector:
    matchNames:
      - production
      - staging
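
A ServiceMonitor selects Services rather than Pods, and `endpoints[].port` refers to a port name on that Service. A matching Service for the ServiceMonitor above might look like this sketch; the port numbers and the `http` port are assumptions, since the deck does not show the Service.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hwahae-api
  namespace: production
  labels:
    app: hwahae-api          # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: hwahae-api
  ports:
    - name: http             # assumed application port
      port: 80
      targetPort: 8080
    - name: metrics          # matched by endpoints[].port in the ServiceMonitor
      port: 9090             # assumed metrics port
      targetPort: 9090
```

If the Service lacks the label or the named port, Prometheus silently finds no targets, which is the most common reason a new service does not appear.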
grafana-dashboard-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hwahae-dashboard
  labels:
    grafana_dashboard: "1"   # loaded automatically by the Grafana sidecar
data:
  hwahae.json: |
    {
      "title": "Hwahae API Dashboard",
      "panels": [...]
    }

Datadog vs OSS Stack

Datadog

Item | Details
Cost | $15-23 per host per month + additional charges for logs/APM
Features | All-in-one (Logs, Metrics, APM, RUM)
Operational burden | Low (SaaS)
Pros | Fast adoption, unified UI, AI-based analysis
Cons | Vendor lock-in, hard-to-predict cost, data ownership

OSS Stack (Prometheus + Loki + Tempo)

Item | Details
Cost | Infrastructure cost only (AMP: $0.03 per GB)
Features | Tools must be combined (each has a clear role)
Operational burden | Medium (low when using AWS managed services)
Pros | Cost transparency, vendor neutrality, customization
Cons | Initial learning curve, integration setup required

Coexistence Strategy with the Existing OpenSearch

  • Phase 1: Metrics first (Month 1-2). Keep the existing OpenSearch, add Prometheus (AMP) + Grafana (AMG), and run CloudWatch Container Insights in parallel.
  • Phase 2: Dual log pipeline (Month 3-4). Fluent Bit dual output to OpenSearch + ClickHouse; new services ship to ClickHouse only.
  • Phase 3: Introduce tracing (Month 5-6). Install the OpenTelemetry Collector, store traces in ClickHouse, and correlate signals in Grafana.
  • Phase 4: Consolidation (Month 7+). Monitor OpenSearch traffic, then either scale it down in stages or keep both; make the final decision after a cost comparison.

Fluent Bit Dual Output Configuration

fluent-bit-dual-output.yaml
# Fluent Bit ConfigMap: dual output
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  output.conf: |
    # Output 1: existing OpenSearch (keeps existing dashboards)
    [OUTPUT]
        Name        opensearch
        Match       kube.*
        Host        vpc-hwahae-logs.ap-northeast-2.es.amazonaws.com
        Port        443
        TLS         On
        AWS_Auth    On
        AWS_Region  ap-northeast-2
        Index       k8s-logs
        Type        _doc

    # Output 2: new ClickHouse (Grafana integration)
    [OUTPUT]
        Name              http
        Match             kube.*
        Host              clickhouse.monitoring.svc
        Port              8123
        URI               /?query=INSERT+INTO+k8s_logs+FORMAT+JSONEachRow
        Format            json_stream
        Json_date_key     timestamp
        Json_date_format  iso8601

    # Output 3: specific namespaces go to ClickHouse only
    # Note: kube.* in Outputs 1 and 2 also matches these records, so the
    # new-service tag must be excluded from them (for example by re-tagging
    # upstream) for this output to be truly ClickHouse-only.
    [OUTPUT]
        Name    http
        Match   kube.var.log.containers.*_new-service_*
        Host    clickhouse.monitoring.svc
        Port    8123
        URI     /?query=INSERT+INTO+k8s_logs+FORMAT+JSONEachRow
        Format  json_stream

Transition strategy:

  • Existing services → ship to both OpenSearch and ClickHouse
  • New services → ClickHouse only
  • After 2-3 months of running both, measure how much OpenSearch is still used → decide whether to scale it down

Key Metric Monitoring Tips

Cluster Overview:

# Node CPU utilization
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pod restart alert (more than 3 restarts within 5 minutes)
increase(kube_pod_container_status_restarts_total[5m]) > 3

# Pending Pod detection (Karpenter scaling delay)
kube_pod_status_phase{phase="Pending"} > 0

Application RED Metrics:

# Rate: requests per second
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)

# Error: 5xx error ratio
sum(rate(istio_requests_total{response_code=~"5.*"}[5m]))
  / sum(rate(istio_requests_total[5m]))

# Duration: P99 latency
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service))

Karpenter:

# Node provisioning time
karpenter_provisioner_scheduling_duration_seconds

# Number of Spot interruptions
karpenter_interruption_received_messages_total

# Node consolidation events
karpenter_disruption_pods_disrupted_total

# Pod density per node
count(kube_pod_info) by (node)

Alert rules:

  • Spot interruptions > 5 per day → consider diversifying instance types
  • Pending Pods > 0 for 2 minutes → check Karpenter or resource limits
  • Consolidation failures → check PDB settings
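
The first two alert rules above could be expressed as a PrometheusRule, sketched below on the assumption that the in-cluster Prometheus is run by the Prometheus Operator. The rule names, severity labels, and annotation text are illustrative. The Karpenter metric name is taken from the queries above and may differ between Karpenter versions; it counts all interruption messages, not only Spot reclaims.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-capacity-alerts      # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: karpenter-capacity
      rules:
        - alert: PendingPods
          expr: sum(kube_pod_status_phase{phase="Pending"}) > 0
          for: 2m                      # must hold for 2 minutes before firing
          labels:
            severity: warning
          annotations:
            summary: Pods have been Pending for more than 2 minutes
            description: Check Karpenter limits and namespace resource quotas.
        - alert: FrequentSpotInterruptions
          expr: sum(increase(karpenter_interruption_received_messages_total[1d])) > 5
          labels:
            severity: info
          annotations:
            summary: More than 5 interruption messages in the last 24 hours
```

The `for: 2m` clause is what turns the instantaneous Pending query into the "for 2 minutes" condition, so short scheduling delays do not page anyone.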
KEDA:

# SQS queue depth (KEDA CloudWatch Scaler)
aws_sqs_approximate_number_of_messages_visible_average

# KEDA scaling status
keda_scaledobject_ready

# Ratio of worker Pods to queue messages
# (both sides are aggregated so their label sets match)
count(kube_pod_info{pod=~"order-worker.*"})
  / sum(aws_sqs_approximate_number_of_messages_visible_average)

Key point: monitor the ratio of worker Pods to queue depth to confirm that scaling keeps up with demand

Blue/Green Cluster Upgrade

  1. Prepare: provision a cluster on the new version and verify add-on compatibility
  2. Deploy: sync workloads through GitOps and run smoke tests
  3. Shift traffic: Route 53 weights 10% → 50% → 100%
  4. Verify: monitor metrics and logs through a 7-day stabilization period
  5. Clean up: decommission the old cluster to cut cost
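
Step 3 can be sketched as a pair of Route 53 weighted records, here written as a CloudFormation template. The domain, the load balancer hostnames, and the parameter names are assumptions for illustration. Route 53 weights are relative values, so 90/10 sends roughly 10% of lookups to the new cluster.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  BlueWeight:
    Type: Number
    Default: 90          # step through 90 -> 50 -> 0
  GreenWeight:
    Type: Number
    Default: 10          # step through 10 -> 50 -> 100
Resources:
  BlueRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.             # hypothetical zone
      Name: api.example.com.
      Type: CNAME
      TTL: "60"                                # short TTL so weight changes take effect quickly
      SetIdentifier: blue-cluster
      Weight: !Ref BlueWeight
      ResourceRecords:
        - blue-ingress.example.com             # hypothetical old-cluster load balancer
  GreenRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: green-cluster
      Weight: !Ref GreenWeight
      ResourceRecords:
        - green-ingress.example.com            # hypothetical new-cluster load balancer
```

Each traffic step is then a stack update that changes only the two weight parameters, which also makes rollback a one-line change.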

DR Strategy: GitOps Rebuild vs Velero Backup

GitOps-Based Rebuild

  • Source of truth: Git repository = cluster state
  • Recovery: ArgoCD Sync → full restore
  • Target: best suited to stateless workloads
  • RTO: 30 minutes to 1 hour
  • Additional infrastructure: none required
  • Cost: none (uses Git)

Velero Backup/Restore

  • Source of truth: backup of Kubernetes API objects + PV backup (Velero reads resources through the API server rather than taking etcd snapshots)
  • Recovery: selective restore per namespace or resource
  • Target: required for stateful workloads
  • RTO: 15 to 30 minutes
  • Additional infrastructure: S3 + Velero server
  • Cost: S3 storage

Recommendation: stateless services → GitOps rebuild (zero additional cost); databases → use RDS native backup/restore; apply Velero selectively, only to workloads with PVs
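
Applying Velero only to workloads with PVs could be sketched as a scoped Schedule. The namespace `data-services`, the schedule time, and the retention period are assumptions, and the sketch presumes Velero is already installed with an S3 backup location configured.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: stateful-daily               # hypothetical name
  namespace: velero
spec:
  schedule: "0 18 * * *"             # cron expression, evaluated in UTC
  template:
    includedNamespaces:
      - data-services                # assumed namespace holding PV-backed workloads
    snapshotVolumes: true            # also snapshot the persistent volumes
    ttl: 168h0m0s                    # keep each backup for 7 days
```

Stateless namespaces are left out on purpose, since the GitOps rebuild path already covers them.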

Cost Monitoring & Optimization

KubeCost

  • Cost allocation by namespace and workload
  • Idle resource detection and alerting
  • Cost forecasting dashboard
  • Can start on the free tier

Right-sizing

  • Adjust resources based on VPA recommendations
  • Optimize requests/limits
  • Combine with Karpenter Consolidation
  • Prevent over-provisioning

Savings Plans + Spot

  • Compute Savings Plans (1yr/3yr)
  • Use Spot Instances (Karpenter)
  • 40-70% savings compared with On-Demand
  • Consider moving from RI → SP

At a spend of $5K-$20K per month → start with the KubeCost free tier, gain per-namespace cost visibility, then apply right-sizing. Use Savings Plans for long-term savings
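
Right-sizing from VPA recommendations can start in recommendation-only mode, as in this sketch. It assumes the Vertical Pod Autoscaler components are installed in the cluster and uses the `hwahae-api` Deployment name purely as an example target.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hwahae-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hwahae-api                 # example target; substitute the real workload
  updatePolicy:
    updateMode: "Off"                # only compute recommendations, never evict Pods
```

The recommended values then appear in the object's status and can be copied into the workload's requests/limits through the normal GitOps review path.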

Distributed Tracing Implementation: ADOT → X-Ray / Tempo

App (OTel SDK) → ADOT Collector (DaemonSet) → X-Ray (AWS native) / Tempo (Grafana integration)

Quick Start: AWS Managed

  • Install the ADOT Collector DaemonSet (available as an EKS add-on)
  • Automatic X-Ray integration → view traces in the AWS Console
  • Java/Node.js auto-instrumentation keeps code changes to a minimum

# Install the ADOT add-on
aws eks create-addon \
  --cluster-name hwahae-eks \
  --addon-name adot \
  --addon-version v0.92.1-eksbuild.1

Grafana Integration: Correlating Logs/Metrics/Traces

  • Additionally install Grafana Tempo (S3 backend)
  • Add a Tempo exporter to the ADOT Collector
  • Link Loki ↔ Prometheus ↔ Tempo in Grafana
  • Switch between logs, metrics, and traces in one click using the TraceID
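
Adding Tempo alongside X-Ray can be sketched as a collector configuration with two trace exporters. Tempo ingests OTLP, so the "Tempo exporter" is written here as a standard `otlp` exporter. The Tempo service address is an assumption, and `insecure: true` presumes plain-text traffic inside the cluster.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}                          # batch spans before export
exporters:
  awsxray:
    region: ap-northeast-2
  otlp/tempo:
    endpoint: tempo-distributor.monitoring.svc:4317   # hypothetical Tempo address
    tls:
      insecure: true                 # assumes no TLS between collector and Tempo
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray, otlp/tempo]   # same spans go to both backends
```

Sending to both backends during the transition lets teams compare X-Ray and Tempo views of the same request before settling on one.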

Expanding Service Instrumentation

  • Auto-instrumentation → add manual spans
  • Custom spans for key sections of business logic
  • Trace DB queries and external API calls
  • Define a sampling strategy (Head-based → Tail-based)
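
A tail-based strategy could be sketched with the OpenTelemetry Collector's `tail_sampling` processor, as below. This assumes the collector build includes that processor, and the thresholds and percentages are illustrative only. Tail sampling needs every span of a trace to reach the same collector instance, so it usually runs on a gateway collector rather than on the per-node DaemonSet.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s               # wait for a trace's spans before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]      # always keep traces that contain an error
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500          # always keep traces slower than 500 ms
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10    # keep 10% of the remaining traces
```

A trace is kept if any policy matches, so errors and slow requests survive even when the baseline rate is low.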

Current state "distributed tracing not yet in place" → starting with stage 1, ADOT + X-Ray, is recommended. Installing it as an EKS add-on means it can be managed with Terraform

30/60/90-Day Migration Roadmap

  • Day 1-30: Foundation
    - Provision EKS clusters (Terraform/eksctl)
    - Set up the GitOps pipeline (ArgoCD)
    - Configure Karpenter node autoscaling
    - Apply baseline network policy and security settings
  • Day 31-60: Workload migration
    - Migrate stateless services first
    - Shift traffic gradually with canary deployments
    - Build the observability stack (Prometheus, Loki)
    - Apply KEDA event-driven scaling
  • Day 61-90: Optimization and stabilization
    - Migrate stateful services (DB integration)
    - Optimize cost (Spot, Savings Plans)
    - Automate operations (upgrade pipeline)
    - Shut down and clean up the ECS clusters

Deep Dive Reference Links

Block 05 Quiz

Q1: Which of the following is not one of the three pillars of observability?
Q2: How does Prometheus collect metrics?
Q3: Which strategy is recommended for EKS cluster upgrades?

Thank you!

Questions and feedback on the whole session are welcome


Speaker: 오준석 (Junseok Oh) · Email: junseoko@amazon.com

GitBook: atomoh.gitbook.io/kubernetes-docs