1. Installing Airflow
pip install airflow
pip install airflow[celery,mysql,password]
* Note: on CentOS, before running pip install airflow[mysql], first run: yum install -y mysql-devel python-devel python-setuptools
2. Installing MySQL 5.6
Install path: /home/hadoop/install/mysql5.6
1. Download from https://dev.mysql.com/downloads/mysql/ (choose the source distribution)
2. Extract: tar xvfz mysql-5.6.17.tar.gz
3. Build:
cmake -D MYSQL_DATADIR=/home/hadoop/install/mysql5.6/data -D SYSCONFDIR=/home/hadoop/install/mysql5.6/etc -D CMAKE_INSTALL_PREFIX=/home/hadoop/install/mysql5.6 .
make
make install
4. Configure MySQL
cd /home/hadoop/install/mysql5.6
cp support-files/my-default.cnf ./
Edit my-default.cnf as needed (the notes below cover the required character_set_server setting), then run:
scripts/mysql_install_db --defaults-file=./my-default.cnf  # when this finishes, a my.cnf is generated in this directory
/bin/sh bin/mysqld_safe --defaults-file=./my.cnf &
Notes: * character_set_server=latin1 is required; with utf8, Airflow occasionally fails at runtime with an "Invalid utf8 character string: '80027D'" error.
* Add export LD_LIBRARY_PATH=/home/hadoop/install/mysql5.6/lib to ~/.bashrc so Airflow uses this installation's client library and mysql.sock; otherwise it falls back to the system default mysql.sock.
Create the airflow database and user:
mysql -uroot  # log in as root, no password
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'localhost' IDENTIFIED BY 'airflow';  # database airflow, user airflow, password airflow
FLUSH PRIVILEGES;
3. Configuring Airflow
airflow initdb  # afterwards, verify that the tables were created in the airflow database
cd ~/airflow  # installing Airflow creates ~/airflow by default
vim airflow.cfg  # edit the relevant settings
For the specifics, refer to the production deployment configuration.
* If initdb fails with a fernet key error (raise TypeError(msg) / TypeError: Incorrect padding), fix it as follows:
pip install cryptography
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
Add the generated key to airflow.cfg: fernet_key = YOUR_GENERATED_KEY
* auth_backend = airflow.contrib.auth.backends.password_auth  # the 1.8.1 cfg file omits this parameter; it must be added, otherwise you get "airflow.exceptions.AirflowException: Failed to import authentication backend"
The important settings are:
1. airflow_home = /home/hadoop/airflow  # Airflow's home, the auto-generated ~/airflow; useful for the "view log" link when alerts fire
2. dags_folder  # directory containing the DAG files
3. base_log_folder  # directory where task-run logs are written
4. executor = CeleryExecutor  # we schedule with the CeleryExecutor
5. sql_alchemy_conn = mysql://airflow:airflow@localhost:23306/airflow  # SQLAlchemy connection string for the metadata database
6. parallelism = 32  # maximum number of task instances running at once across the installation
7. dag_concurrency = 32  # maximum number of task instances the scheduler runs concurrently within a single DAG
8. plugins_folder  # plugins directory; create a new empty directory and point this setting at it
9. celeryd_concurrency = 32  # number of worker processes each Celery worker starts
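The settings above can be sanity-checked from Python using only the stdlib configparser; the inline sample below stands in for ~/airflow/airflow.cfg (values taken from this document, Python 3 shown).

```python
# Parse an airflow.cfg-style file and verify the key settings listed above.
from configparser import ConfigParser

sample = """\
[core]
airflow_home = /home/hadoop/airflow
dags_folder = /home/hadoop/airflow/dags
base_log_folder = /home/hadoop/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = mysql://airflow:airflow@localhost:23306/airflow
parallelism = 32
dag_concurrency = 32
"""

cfg = ConfigParser()
cfg.read_string(sample)  # for the real file: cfg.read('/home/hadoop/airflow/airflow.cfg')

assert cfg.get('core', 'executor') == 'CeleryExecutor'
assert cfg.getint('core', 'parallelism') == 32
print(cfg.get('core', 'sql_alchemy_conn'))
# -> mysql://airflow:airflow@localhost:23306/airflow
```

Note that ConfigParser performs %-interpolation by default, so a MySQL password containing % would need escaping as %%.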
4. Adding a user (run the following in an interactive python shell):
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser
user = PasswordUser(models.User())
user.username = 'XXX'
user.email = 'YYY'
user.password = 'ZZZ'  # use the password setter so the password is stored hashed, not in plaintext
session = settings.Session()  # open a session against the metadata database
session.add(user)
session.commit()
session.close()
exit()
5. Restart script (kill any running worker/webserver/scheduler processes, then relaunch):
ps -ef |grep 'celeryd' |awk '{print $2}'|xargs kill -9
ps -ef |grep 'airflow webserver' |awk '{print $2}' |xargs kill -9
ps -ef |grep 'airflow-webserver' |awk '{print $2}' |xargs kill -9
ps -ef |grep 'airflow scheduler' |awk '{print $2}' |xargs kill -9
nohup airflow worker >worker.log 2>&1 &
nohup airflow scheduler >scheduler.log 2>&1 &
nohup airflow webserver >webserver.log 2>&1 &
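With the worker, scheduler, and webserver running, a minimal DAG dropped into dags_folder can smoke-test the whole CeleryExecutor setup. This is an illustrative sketch using 1.8-era imports; the smoke_test name, owner, and schedule are assumptions, not values from these notes.

```python
# smoke_test.py -- place in dags_folder; a minimal DAG to verify the deployment.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'hadoop',                    # assumed owner name
    'start_date': datetime(2017, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('smoke_test', default_args=default_args, schedule_interval='@daily')

# A single task: if it turns green in the web UI, the Celery worker,
# scheduler, and metadata database are all wired together correctly.
echo_date = BashOperator(task_id='echo_date', bash_command='date', dag=dag)
```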
6. Developing your own Operator
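The notes give no code for this section, so here is a minimal sketch of a custom operator against the 1.8-era API; the EchoOperator name and its message parameter are illustrative. Put the module under the plugins_folder configured above, or import it directly from a DAG file.

```python
# my_operators.py -- minimal custom operator sketch for Airflow 1.8.x.
# EchoOperator and its `message` parameter are illustrative names.
import logging

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class EchoOperator(BaseOperator):
    """Log a templated message; its return value is pushed to XCom."""

    template_fields = ('message',)  # rendered with Jinja before execute()

    @apply_defaults
    def __init__(self, message, *args, **kwargs):
        super(EchoOperator, self).__init__(*args, **kwargs)
        self.message = message

    def execute(self, context):
        # context carries execution_date, task_instance, etc.
        logging.info('EchoOperator says: %s', self.message)
        return self.message  # returned values go to XCom automatically
```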
7. Developing your own Sensor
http://michal.karzynski.pl/blog/2017/03/19/developing-workflows-with-apache-airflow/
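Following the same pattern as the blog post linked above, a custom sensor subclasses BaseSensorOperator and implements poke(), which is retried every poke_interval until it returns True. This sketch assumes the 1.8-era import path; FileExistsSensor and filepath are illustrative names.

```python
# my_sensors.py -- minimal custom sensor sketch for Airflow 1.8.x.
# FileExistsSensor and `filepath` are illustrative names.
import os

from airflow.operators.sensors import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class FileExistsSensor(BaseSensorOperator):
    """Succeed once the given path exists on the worker's filesystem."""

    @apply_defaults
    def __init__(self, filepath, *args, **kwargs):
        super(FileExistsSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath

    def poke(self, context):
        # Return True to succeed; False to sleep and retry after poke_interval.
        return os.path.exists(self.filepath)
```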