跳转到主要内容

Contents

Vital links

Airflow deployment solutions

Introductions and tutorials

Airflow Summit 2020 videos

The first Airflow Summit 2020 was held in July 2020. It was a truly global, fully online event that was co-hosted by 9 Airflow Meetups from all over the world (MelbourneTokyoBangaloreWarsawAmsterdamLondonNYCBayArea).

It featured 40+ talks and three workshops. You can check out the talk recordings as a YouTube Airflow Summit 2020 Playlist or see the individual talks here:

Best practices, lessons learned and cool use cases

Books, blogs, podcasts, and such

Slide deck presentations and online videos

Libraries, Hooks, Utilities

  • AirFly - Auto generate Airflow's dag.py on the fly.
  • DEAfrica Airflow - Airflow libraries used by Digital Earth Africa, an humanitarian effort to utilize satellite imagery of Africa.
  • Airflow plugins - Central collection of repositories of various plugins for Airflow, including mailchimp, trello, sftp, GitHub, etc.
  • fileflow - Collection of modules to support large data transfers between Airflow operators through either local file system or S3. This addresses a gap where data is too large for XCOMs but too small or inconvenient for loading directly in the operator. Built by Industry Dive.
  • fairflow - Library to abstract away Airflow's Operators with functional pieces that transform the data from one operator to another.
  • airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operator on Airflow itself, clearing out various bits of the backing metadata store.
  • test_dags - a more complete solution for DAG integrity tests (first Circle of Data’s Inferno are the first.
  • dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.
  • whirl - Fast iterative local development and testing of Apache Airflow workflows.
  • airflow-code-editor - A plugin for Apache Airflow that allows you to edit DAGs in browser.
  • Pylint-Airflow - A Pylint plugin for static code analysis on Airflow code.
  • afctl - A CLI tool that includes everything required to create, manage and deploy airflow projects faster and smoother.
  • Dag Dependencies viewer - A plugin which creates a view to visualize dependencies between the Airflow DAGs
  • Airflow ECR Plugin - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR.
  • AirflowK8sDebugger - A library for generate k8s pod yaml templates from an Airflow dag using the KubernetesPodOperator.
  • Oozie to Airflow - A tool to easily convert between Apache Oozie workflows and Apache Airflow workflows.
  • Airflow Ditto - An extensible framework to do transformations to an Airflow DAG and convert it into another DAG which is flow-isomorphic with the original DAG, to be able to run it on different environments (e.g. on different clouds, or even different container frameworks - Apache Spark on YARN vs Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight-DAG transforms.
  • gusty - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is available here.
  • Meltano - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces Singer for extraction and loading, leverages dbt for transformation, and integrates with Airflow for orchestration.
  • DAG checks - The dag-checks consist of checks that can help you in maintaining your Apache Airflow instance.
  • Airflow DVC plugin - Plugin for open-source version-control system for data science and Machine Learning pipelines - DVC.

Meetups

Commercial Airflow-as-a-service providers

  • Google Cloud Composer - Google Cloud Composer is a managed service built atop Google Cloud and Airflow.
  • Qubole - Qubole is mainly known as a service-and-support company for Apache Hive, but also provides Airflow as a component of its platform.
  • Astronomer.io - Astronomer provides complete ETL lifecycle solutions and appears to be entirely focused on providing Airflow-based products.
  • AWS MWAA - Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale.

Cloud Composer resources

This section contains articles that apply to Cloud Composer — a service built by Google Cloud based on Apache Airflow. Tricks and solutions are described here that are intended for Cloud Composer, but may be applicable to vanilla Airflow.

Non-English resources

Sample projects

原文:https://github.com/jghoman/awesome-apache-airflow