kkh1902
📦 Partition & 🧲 Clustering: A Summary

2026. 1. 30. 19:41

One-line summary

  • Partition 👉 splits the data into “big chunks”
  • Clustering 👉 keeps “similar values together” inside each chunk

1️⃣ Partition

Concept

👉 Physically split a table by a date or range column

orders table
 ├─ 2024-01-01
 ├─ 2024-01-02
 ├─ 2024-01-03

Why use it?

  • Only the partitions a query needs are read
  • Query cost ↓
  • Speed ↑
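To make the pruning idea concrete, here is a minimal Python sketch. It is not how any particular warehouse is implemented: the row layout, `build_partitions`, and `query_since` are hypothetical names used only for illustration. Each date gets its own bucket, and a date filter skips whole buckets without reading them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows: (order_date, user_id, amount).
ROWS = [
    (date(2024, 1, 1), 1, 100),
    (date(2024, 1, 1), 2, 250),
    (date(2024, 1, 2), 1, 80),
    (date(2024, 1, 3), 3, 40),
    (date(2024, 1, 3), 2, 60),
]


def build_partitions(rows):
    """Group rows into one bucket per order_date (daily partitions)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def query_since(partitions, start):
    """Return matching rows and the number of rows actually read."""
    scanned = 0
    result = []
    for day, bucket in partitions.items():
        if day < start:
            continue  # pruned: this partition is never opened
        scanned += len(bucket)
        result.extend(bucket)
    return result, scanned


partitions = build_partitions(ROWS)
matched, scanned = query_since(partitions, date(2024, 1, 3))
print(f"matched={len(matched)} scanned={scanned} total={len(ROWS)}")
```

With the filter on 2024-01-03, only that day's bucket is read; the other two partitions are skipped entirely, which is where the cost and speed gains come from.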

When to use it?

  • Queries almost always filter on a date
  • WHERE order_date >= '2024-01-01'

👉 Logs, orders, event data = partition by default


Rules of thumb in practice (important)

  • ✅ Time columns (date / timestamp)
  • ❌ Not identifier-style columns such as user_id or category (these fit better as clustering keys)
  • Daily granularity is the default

2️⃣ Clustering

Concept

👉 Inside each partition, sort and group rows by specific columns

2024-01-01 partition
 ├─ user_id = 1
 ├─ user_id = 1
 ├─ user_id = 2
 ├─ user_id = 2

Why use it?

  • You often filter or join on a specific column
  • Less data is scanned inside each partition
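A rough Python sketch of why sorted data scans less. The assumption here, labeled as such, is that the engine stores sorted rows in small blocks and keeps a min/max value per block; `to_blocks`, `lookup`, and the block size are hypothetical and only illustrate the idea.

```python
# Hypothetical rows of one daily partition: (user_id, product_id).
PARTITION_ROWS = [(2, 11), (1, 10), (3, 12), (1, 13), (2, 14), (3, 15)]


def to_blocks(rows, block_size):
    """Sort rows by user_id, then cut them into blocks with min/max metadata."""
    sorted_rows = sorted(rows, key=lambda row: row[0])
    blocks = []
    for i in range(0, len(sorted_rows), block_size):
        chunk = sorted_rows[i:i + block_size]
        blocks.append({"min": chunk[0][0], "max": chunk[-1][0], "rows": chunk})
    return blocks


def lookup(blocks, user_id):
    """Read only the blocks whose [min, max] range can contain user_id."""
    scanned = 0
    result = []
    for block in blocks:
        if user_id < block["min"] or user_id > block["max"]:
            continue  # skipped using metadata alone
        scanned += len(block["rows"])
        result.extend(row for row in block["rows"] if row[0] == user_id)
    return result, scanned


blocks = to_blocks(PARTITION_ROWS, block_size=2)
rows, scanned = lookup(blocks, user_id=2)
print(rows, scanned)
```

Because equal user_id values sit next to each other after sorting, the lookup reads one block of two rows instead of all six rows in the partition.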

When to use it?

  • Columns that appear often in WHERE / JOIN
  • WHERE user_id = 123

👉 Columns like user_id, product_id, country


3️⃣ Partition vs Clustering at a glance

Aspect       | Partition                 | Clustering
Basis        | Date / range              | Column values
Unit size    | Large                     | Relatively small
Effect       | Shrinks what gets scanned | Makes the scan more efficient
Required?    | Almost always             | Nice to have
Cost savings | ⭐⭐⭐⭐                      | ⭐⭐

4️⃣ Using both together (the standard pattern)

PARTITION BY DATE(order_date)
CLUSTER BY user_id, product_id

Read order

  1. Select the date partitions
  2. Inside them, search the user_id groups

👉 The best combination for performance + cost
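The two clauses above are fragments, so here is a small Python sketch that renders them into a full statement. It assumes BigQuery-style DDL; the table name `shop.orders`, the column list, and the helper `build_create_table` are hypothetical and not from the original post.

```python
def build_create_table(table, columns, partition_col, cluster_cols):
    """Render a BigQuery-style CREATE TABLE statement as a string."""
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} (\n  {column_sql}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )


ddl = build_create_table(
    table="shop.orders",  # hypothetical dataset.table name
    columns=[
        ("order_id", "INT64"),
        ("user_id", "INT64"),
        ("product_id", "INT64"),
        ("order_date", "TIMESTAMP"),  # assumed TIMESTAMP, hence DATE(...) below
    ],
    partition_col="order_date",
    cluster_cols=["user_id", "product_id"],
)
print(ddl)
```

The order of the clustering columns matters in this style of DDL: rows are sorted by user_id first, then product_id, so put the most frequently filtered column first.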


5️⃣ Common mistakes in practice ❌

  • Partitioning a small table that does not need it
  • Splitting partitions too finely (hourly)
  • Putting too many columns in the clustering key

👉 1 partition column + 1-2 clustering columns is usually optimal
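A quick bit of arithmetic shows why hourly partitions get out of hand. The three-year retention window is an assumed example, and `partition_count` is a hypothetical helper; warehouses typically cap the number of partitions per table, so check the quota of the engine you use.

```python
def partition_count(days, partitions_per_day):
    """Number of partitions a table accumulates over a retention window."""
    return days * partitions_per_day


RETENTION_DAYS = 3 * 365  # assumed retention: three years of data

daily = partition_count(RETENTION_DAYS, 1)
hourly = partition_count(RETENTION_DAYS, 24)
print(f"daily={daily} hourly={hourly}")
```

Same data, 24 times more partitions to track, and each one holds far fewer rows, so the per-partition overhead starts to outweigh the pruning benefit.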


6️⃣ Recommended patterns for dbt / DW

Silver (DW Core)

  • PARTITION BY date
  • CLUSTER BY business_key

Gold (DM / Fact)

  • PARTITION BY date
  • CLUSTER BY join_key

7️⃣ 20-second interview answer (worth memorizing)

“Partitioning splits a table by a coarse key such as date to narrow the scan range, and clustering sorts the data inside each partition by frequently queried columns to make reads more efficient. In practice, you usually partition by date and cluster by the columns most often used in joins and filters.”


Final key sentence

👉 Partition decides ‘how much to read’; Clustering decides ‘how to find it fast’.

