
airflow balancer

Utilities for tracking hosts and ports and load balancing DAGs


Overview

airflow-balancer is a utility library for Apache Airflow that tracks host and port usage via YAML files. It enables you to:

  • Track Hosts: Define and manage a pool of worker hosts with different capabilities (OS, queues, tags)
  • Manage Ports: Track port usage across your host infrastructure to avoid conflicts
  • Load Balance: Intelligently select hosts based on queues, operating systems, tags, or custom criteria
  • Integrate with Airflow: Automatically create Airflow pools for each host and port

Integration with airflow-laminar Stack

airflow-balancer is tightly integrated with the airflow-laminar ecosystem:

  • airflow-pydantic: Core data models (Host, Port, BalancerConfiguration) are defined in airflow-pydantic, providing full Pydantic validation, type checking, and JSON/YAML serialization support
  • airflow-config: Configuration loading via Hydra for hierarchical configs with defaults, overrides, and environment-specific settings

With airflow-balancer, you can register host and port usage in configuration:

_target_: airflow_balancer.BalancerConfiguration
default_username: timkpaine
hosts:
  - name: host1
    size: 16
    os: ubuntu
    queues: [primary]

  - name: host2
    os: ubuntu
    size: 16
    queues: [workers]

  - name: host3
    os: macos
    size: 8
    queues: [workers]

ports:
  - host: host1
    port: 8080

  - host_name: host2
    port: 8793
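
Because Host, Port, and BalancerConfiguration are ordinary Pydantic models (see the airflow-pydantic note above), the same configuration can also be built programmatically. A minimal sketch, assuming Port is importable alongside the other models and using the field names from the YAML above:

from airflow_balancer import BalancerConfiguration, Host, Port

# Equivalent to the YAML above, constructed directly in Python;
# Pydantic validates fields (name, size, os, queues, ...) on construction
config = BalancerConfiguration(
    default_username="timkpaine",
    hosts=[
        Host(name="host1", size=16, os="ubuntu", queues=["primary"]),
        Host(name="host2", size=16, os="ubuntu", queues=["workers"]),
    ],
    ports=[Port(host_name="host1", port=8080)],
)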

Either via airflow-config or directly, you can then select among the available hosts for use in your DAGs:

from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow_balancer import BalancerConfiguration, load

balancer_config: BalancerConfiguration = load("balancer.yaml")

host = balancer_config.select_host(queue="workers")
port = balancer_config.free_port(host=host)

...

operator = SSHOperator(ssh_hook=host.hook(), ...)

Visualization

Configuration, host, and port listings are built into the extension, available either from the top bar in Airflow or as a standalone viewer (via the airflow-balancer-viewer CLI).
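
For example, to browse a configuration outside of Airflow, an invocation along these lines should work (the argument shown is an assumption; consult airflow-balancer-viewer --help for the actual interface):

airflow-balancer-viewer balancer.yaml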

Installation

You can install via pip:

pip install airflow-balancer

For use with Apache Airflow 2.x:

pip install airflow-balancer[airflow]

For use with Apache Airflow 3.x:

pip install airflow-balancer[airflow3]

Or via conda:

conda install airflow-balancer -c conda-forge

Using with airflow-config

The recommended approach is to use airflow-balancer as an extension within your airflow-config configuration:

# config/config.yaml
# @package _global_
_target_: airflow_config.Configuration
defaults:
  - extensions/balancer@extensions.balancer

# config/extensions/balancer.yaml
# @package extensions.balancer
_target_: airflow_balancer.BalancerConfiguration

default_username: airflow
default_key_file: /home/airflow/.ssh/id_rsa
hosts:
  - name: worker1
    size: 16
    os: ubuntu
    queues: [workers]

Then load the configuration in your DAG file and access the balancer extension:

from airflow_config import load_config

config = load_config("config", "config")
balancer = config.extensions["balancer"]

# Select a host and use its SSH hook
host = balancer.select_host(queue="workers")
operator = SSHOperator(ssh_hook=host.hook(), ...)
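
Putting this together in a DAG file, a minimal sketch (the Airflow 2.x-style DAG boilerplate and SSH command are illustrative; load_config, select_host, and hook() are the calls shown above):

from datetime import datetime

from airflow.models import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow_config import load_config

config = load_config("config", "config")
balancer = config.extensions["balancer"]

with DAG(dag_id="balanced_example", start_date=datetime(2024, 1, 1), schedule=None):
    # Host selection happens at DAG-parse time, picking from the "workers" queue
    host = balancer.select_host(queue="workers")
    SSHOperator(
        task_id="run_on_worker",
        ssh_hook=host.hook(),
        command="echo hello",
    )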

Using with airflow-pydantic

Since the core models are defined in airflow-pydantic, you can leverage its testing utilities:

from airflow_balancer import BalancerConfiguration, Host
from airflow_balancer.testing import pools, variables
from airflow_pydantic import Variable

# Testing with mocked pools
with pools():
    config = BalancerConfiguration(
        hosts=[Host(name="test-host", size=8, queues=["test"])]
    )
    assert config.select_host(queue="test").name == "test-host"

# Using Airflow Variables for credentials
host = Host(
    name="secure-host",
    username="admin",
    password=Variable(key="host_password"),
)
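
These utilities combine naturally into ordinary unit tests. A sketch of a pytest-style check that a YAML file parses into a valid configuration (the file path and expected host name come from the airflow-config example above; load is the loader shown earlier):

from airflow_balancer import BalancerConfiguration, load
from airflow_balancer.testing import pools


def test_balancer_yaml_is_valid():
    with pools():
        # Pydantic validation runs on load; a malformed file raises here
        config: BalancerConfiguration = load("config/extensions/balancer.yaml")
        assert config.hosts, "expected at least one host"

        # Host selection works against mocked Airflow pools
        host = config.select_host(queue="workers")
        assert host.name == "worker1"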

License

This software is licensed under the Apache 2.0 license. See the LICENSE file for details.

Note

This library was generated using copier from the Base Python Project Template repository.