kkh1902
Steadily
What is Airflow?

2026. 2. 1. 21:13

Airflow is
👉 a tool for managing “when, in what order, and how” data jobs run.

In one line:

“An orchestration tool that automatically runs multiple jobs in a defined order and at defined times”


Why use Airflow?

In practice, data work usually looks like this 👇

  • B can only run after job A finishes
  • If B fails, C must not run
  • Everything runs automatically at 2 a.m. every day
  • On failure, you get an alert and rerun the job

👉 Doing all of this by hand turns into a nightmare fast
👉 So Airflow manages it for you


5 core concepts (know these and you're 80% there)

1️⃣ DAG (Directed Acyclic Graph)

  • The blueprint of the entire workflow
  • Defines the “this task → next task” relationships
Extract → Transform → Load

👉 A set of tasks with a defined order
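To make "directed" and "acyclic" concrete, here is a minimal sketch using only Python's standard library. It is a toy model, not Airflow code: the `etl` dictionary and the `run_order` helper are names made up for this example. It shows that Extract → Transform → Load resolves to one valid order, and that adding a back edge means the graph is no longer a DAG.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
etl = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_order(dependencies):
    """Return one valid execution order, or raise CycleError if there is a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(etl)
print(order)

# Making extract depend on load creates a loop, so no valid order exists.
cyclic = {**etl, "extract": {"load"}}
try:
    run_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

The "acyclic" part is what guarantees that some valid execution order always exists.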


2️⃣ Task

  • A single unit of work inside a DAG
  • Examples:
    • Running Python
    • Running SQL
    • Running Bash
    • Calling an API

👉 Several tasks = one DAG


3️⃣ Operator

  • Defines how a task is executed
  • Commonly used:
    • PythonOperator
    • BashOperator
    • SQL and BigQuery operators

👉 How a task gets executed
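The Task/Operator split can be sketched in plain Python. The classes below (`ToyOperator`, `ToyPythonOperator`, `ToyBashOperator`) are hypothetical stand-ins written for this example, not Airflow's real classes. They only illustrate the idea that an operator is a template for "how to run", and each instance of it is one task. The Bash variant assumes a Unix-like shell is available.

```python
import subprocess


class ToyOperator:
    """Hypothetical base class: each instance represents one task."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self):
        raise NotImplementedError


class ToyPythonOperator(ToyOperator):
    """Runs a Python callable."""

    def __init__(self, task_id, python_callable):
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self):
        return self.python_callable()


class ToyBashOperator(ToyOperator):
    """Runs a shell command and returns its stripped standard output."""

    def __init__(self, task_id, bash_command):
        super().__init__(task_id)
        self.bash_command = bash_command

    def execute(self):
        result = subprocess.run(
            self.bash_command,
            shell=True,
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit code raises, i.e. the task fails
        )
        return result.stdout.strip()


tasks = [
    ToyPythonOperator("add", lambda: 1 + 2),
    ToyBashOperator("say_hello", "echo hello"),
]
results = {task.task_id: task.execute() for task in tasks}
print(results)
```

Both tasks expose the same `execute()` method, so whatever runs them does not need to know whether a task is Python or Bash.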


4️⃣ Scheduler

  • Asks “Is it time to run this DAG now?”
  • Decides when runs are triggered

👉 The brain
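The "is it time yet?" question can be sketched for the daily 2 a.m. case. This is a simplified illustration, not how Airflow's scheduler is implemented. The real one is considerably more involved, and `next_run_after` and `is_due` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def next_run_after(last_run, hour=2):
    """Return the first daily slot at `hour`:00 that is strictly after `last_run`."""
    candidate = last_run.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= last_run:
        candidate += timedelta(days=1)
    return candidate


def is_due(now, last_run, hour=2):
    """True once `now` has reached the next scheduled slot."""
    return now >= next_run_after(last_run, hour)


last_run = datetime(2026, 2, 1, 2, 0)
too_early = is_due(datetime(2026, 2, 1, 21, 13), last_run)
on_time = is_due(datetime(2026, 2, 2, 2, 0, 5), last_run)
print(too_early, on_time)
```

A scheduler loop would call a check like this repeatedly and hand due work to the executor.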


5️⃣ Executor

  • Decides where tasks actually run
  • Examples:
    • LocalExecutor
    • CeleryExecutor
    • KubernetesExecutor

👉 The muscle
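As a rough analogy for what "where tasks run" means, the sketch below runs the same jobs through two interchangeable toy executors, one running them one by one and one using a thread pool. `ToySequentialExecutor` and `ToyThreadExecutor` are hypothetical classes for illustration only. Airflow's real executors dispatch work to processes, Celery workers, or Kubernetes pods.

```python
from concurrent.futures import ThreadPoolExecutor


class ToySequentialExecutor:
    """Runs jobs one after another in the current thread."""

    def run(self, callables):
        return [fn() for fn in callables]


class ToyThreadExecutor:
    """Runs jobs on a pool of worker threads."""

    def __init__(self, max_workers=4):
        self.max_workers = max_workers

    def run(self, callables):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(fn) for fn in callables]
            # Collect results in submission order.
            return [future.result() for future in futures]


# Bind n as a default argument so each lambda keeps its own value.
jobs = [lambda n=n: n * n for n in range(4)]

sequential = ToySequentialExecutor().run(jobs)
threaded = ToyThreadExecutor().run(jobs)
print(sequential, threaded)
```

The jobs themselves do not change. Only the place they run does, which is why the executor can be swapped through configuration.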


How Airflow works (the very simple version)

1️⃣ Define the DAG (a Python file)
2️⃣ The Scheduler decides when to run it
3️⃣ The Executor runs the tasks
4️⃣ Success/failure states are recorded
5️⃣ Check everything at a glance in the UI
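The flow above can be compressed into a small stdlib-only sketch. It runs tasks in dependency order, retries failures, records a state per task, and skips anything downstream of a failure (the "if B fails, C must not run" rule). `run_pipeline` is a hypothetical helper, and the state labels are chosen for this example rather than taken from Airflow's source.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, retries=1):
    """Run callables in dependency order and return {task_id: state}."""
    states = {}
    for task_id in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(task_id, set())
        # Do not run a task unless every upstream task succeeded.
        if any(states[up] != "success" for up in upstream):
            states[task_id] = "upstream_failed"
            continue
        for _attempt in range(retries + 1):
            try:
                tasks[task_id]()
                states[task_id] = "success"
                break
            except Exception:
                states[task_id] = "failed"
    return states


def always_fails():
    raise RuntimeError("B failed")


attempts = {"count": 0}


def flaky():
    """Fails on the first call, succeeds on the second."""
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient error")


dependencies = {"A": set(), "B": {"A"}, "C": {"B"}}

failed_run = run_pipeline(
    {"A": lambda: None, "B": always_fails, "C": lambda: None}, dependencies
)
retried_run = run_pipeline(
    {"A": lambda: None, "B": flaky, "C": lambda: None}, dependencies
)
print(failed_run)
print(retried_run)
```

In the first run C never executes because B failed. In the second run the retry rescues B, so C runs as usual.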


What Airflow does “not” do ❌

❌ Not a data transformation tool
❌ Does not optimize SQL performance
❌ Does not replace Spark

👉 Airflow is responsible only for “scheduling + ordering + state management”


When Airflow is a good fit (checklist)

✅ Strongly recommended when

  • You have ETL pipelines
  • Tasks depend on each other
  • You need recurring (scheduled) runs
  • You need retries/alerts on failure
  • You run production pipelines

❌ You probably don't need it when

  • One-off analysis
  • Only 1–2 tasks
  • Manual runs are fine
  • Notebook-centric experiments

Airflow vs DBT (clearing up the confusing points)

Category     | Airflow                  | DBT
Role         | Orchestration            | Data transformation
Focus        | When / order / execution | SQL logic
Language     | Python                   | SQL
DAG          | Task workflow            | Model dependencies
Relationship | Runs DBT                 | Is run by Airflow

👉 Using Airflow + DBT together is the standard approach


Understanding by analogy 🧠

  • Airflow: the factory's production manager
  • DBT: a processing machine inside the factory
  • DW: the warehouse
  • Spark: a large automated processing line

A common pattern in practice

Airflow
 ├─ collect raw data
 ├─ Spark / Python ETL
 ├─ dbt run
 ├─ dbt test
 └─ completion notification

👉 Airflow is “the control tower that ties the whole pipeline together”
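In an Airflow DAG file, dependencies like these are typically written with the `>>` operator. The sketch below mimics that syntax with a hypothetical `ToyTask` class, so it runs without Airflow installed. It assumes the five steps run strictly in the order listed, and the task ids are illustrative names, not taken from a real project.

```python
from graphlib import TopologicalSorter


class ToyTask:
    """Hypothetical stand-in for a task that supports `a >> b` chaining."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()

    def __rshift__(self, other):
        # `self >> other` means: other runs after self.
        other.upstream.add(self.task_id)
        return other  # returning the right-hand task allows chaining


ingest = ToyTask("ingest_raw")
etl = ToyTask("spark_or_python_etl")
dbt_run = ToyTask("dbt_run")
dbt_test = ToyTask("dbt_test")
notify = ToyTask("notify_done")

# Declare the pipeline order in one line.
ingest >> etl >> dbt_run >> dbt_test >> notify

all_tasks = (ingest, etl, dbt_run, dbt_test, notify)
dependencies = {task.task_id: task.upstream for task in all_tasks}
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Each step only declares what it follows, and the overall order falls out of those declarations.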


One-sentence summary for the blog

Airflow is not a tool that processes the data itself,
but a tool that makes sure data jobs run ‘on time, in the right order, and reliably’.

 
