Apache Airflow 1.10.2– Active Directory Authentication (via LDAP[s])

This basic guide assumes a functional Airflow deployment, either without authentication or with LDAP authentication under the legacy UI. It also assumes Apache Airflow 1.10.2, installed via pip with MySQL and Redis, running on Amazon Linux on an EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if it exists. This can be done by removing the values to the right of the equal sign under [ldap] in the airflow.cfg configuration file. Alternatively, the whole [ldap] section can be removed.

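Next, modify airflow.cfg to remove ‘authentication = True’ under the [webserver] section. Also remove the authentication backend line (auth_backend), if it exists. As readers point out in the comments below, ‘rbac = True’ must also be set under [webserver]; without it, Airflow 1.10.x does not read webserver_config.py.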
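The airflow.cfg changes described above can also be previewed with a short script. This is a minimal sketch, not part of the original procedure: the function name prepare_cfg_for_rbac is hypothetical, the auth_backend key name is assumed to be the legacy backend setting, and the rbac flag comes from reader comments further down. Note that configparser drops comments when it writes, so keep a backup of the original file.

```python
import configparser
import io


def prepare_cfg_for_rbac(cfg_text: str) -> str:
    """Return airflow.cfg text with the legacy LDAP settings removed.

    Hypothetical helper for illustration; comments in the input are lost.
    """
    # interpolation=None because airflow.cfg values may contain literal '%'
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    # Drop the [ldap] section used by the legacy UI (no error if absent)
    parser.remove_section("ldap")

    if not parser.has_section("webserver"):
        parser.add_section("webserver")

    # Remove the legacy switches; remove_option simply returns False if absent
    parser.remove_option("webserver", "authentication")
    parser.remove_option("webserver", "auth_backend")

    # Reported by readers as required for webserver_config.py to be read
    parser.set("webserver", "rbac", "True")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "[webserver]\n"
        "authentication = True\n"
        "auth_backend = airflow.contrib.auth.backends.ldap_auth\n"
        "\n"
        "[ldap]\n"
        "uri = ldaps://example:636\n"
    )
    print(prepare_cfg_for_rbac(sample))
```

Reviewing the printed result before overwriting the real file is the safer way to use it.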

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (this is where airflow.cfg is also located). The contents should reflect the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/'
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate at the specified location to verify the SSL certificate presented by Active Directory. The $ldap placeholder must therefore be a resolvable name whose SSL certificate is signed by $root_CA.crt. Also note that any user who logs in with this configuration in place will be an Admin (more on this below).
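Before restarting Airflow, it can help to confirm that the directory server's certificate really does verify against the CA file. The following is a rough sketch using only the Python standard library; the function names and the host and path arguments are placeholders, not anything Airflow or Flask AppBuilder provides.

```python
import socket
import ssl


def make_verifying_context(ca_file=None):
    """Build a TLS context that verifies the server certificate and hostname.

    With ca_file=None the system trust store is used; otherwise only the
    given CA bundle is trusted.
    """
    return ssl.create_default_context(cafile=ca_file)


def check_ldaps_certificate(host, ca_file, port=636, timeout=10.0):
    """Open a TLS connection to host:port and return the peer certificate.

    Raises ssl.SSLCertVerificationError if the chain or hostname check fails.
    """
    ctx = make_verifying_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname enables SNI and hostname verification
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Example call (placeholder values, requires network access):
# cert = check_ldaps_certificate("ad.example.com", "/path/to/root_CA.crt")
# print(cert["subject"])
```

If this raises a verification error, the same CA file is unlikely to work in AUTH_LDAP_TLS_CACERTFILE.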

Once this configuration is in place, it will likely be desirable to remove all existing users, using the following set of commands from the mysql CLI, logged into the airflow DB instance:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once the webserver comes up, log in as the user intended to be the Admin. This will allow this user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.
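One way to avoid editing webserver_config.py between the two restarts is to read the registration role from the environment. This is only a sketch of that idea: the variable name AIRFLOW_REGISTRATION_ROLE and the helper function are made up for illustration, the role list assumes Airflow's default RBAC roles, and the webserver still has to be restarted for a new value to take effect.

```python
import os

# Airflow's default RBAC roles (assumed unchanged in this deployment)
ALLOWED_ROLES = {"Admin", "Op", "User", "Viewer", "Public"}


def registration_role(environ=os.environ, default="Viewer"):
    """Pick the self-registration role from a (hypothetical) env variable."""
    role = environ.get("AIRFLOW_REGISTRATION_ROLE", default)
    if role not in ALLOWED_ROLES:
        raise ValueError("Unknown registration role: {!r}".format(role))
    return role


# In webserver_config.py this would replace the hard-coded assignment
AUTH_USER_REGISTRATION_ROLE = registration_role()
```

Defaulting to "Viewer" means a forgotten variable fails safe instead of granting Admin to every new user.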

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

Note that Apache Airflow introduced the RBAC UI with version 1.10. The legacy (Flask-Admin) UI remains available throughout the 1.10.x series and is removed in Airflow 2.0.

Hello,

Thanks for your article.

I am getting the error below when I open the Airflow URL.
{{decorators.py:28}} WARNING – Access is Denied for: can_index on: Airflow

Could you please help me if you know what is wrong?

Hi Atya,

Did you add the ‘can_index’ permission to the Public role while logged in as an Admin?

Can you give me more detail of what you have done so far?

Thanks,
Josh

Hi Josh,

This is very useful information for me. I implemented Airflow in our environment and integrated it with Active Directory successfully using the instructions above. Thank you so much for your help.
Having said that, I think you missed one configuration parameter in the pre-requisites: rbac = True in airflow.cfg. Without this option, Airflow does not read webserver_config.py. Please correct me if I am wrong, or incorporate this point in the blog if it is useful.

Thanks
Ramakrishna

I had been struggling for a long time to get RBAC to work via FAB. I wasn’t aware that “AUTH_USER_REGISTRATION = True” is the key to allowing LDAP users to log in. I was under the impression that all LDAP users could log in to the system by default.
Thank you for this explanatory blog.

Hi Subham,

Can you give some details around what you’ve done so far? A sanitized config would be ideal. Also, what do you see in the logs?

Thanks,
Josh

Hi,
I have walked through what I can of this setup for a WSL install of Airflow. I’ve got 1.10.4, and when I split the config out to create the .py file, it no longer pops up a login page and just lets me in. I think I have two issues: LDAP isn’t working, and the .py config file isn’t being read on start. Below is my file.

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldap://$ldap:389/'
AUTH_LDAP_SEARCH = "DC=***,DC=local"
AUTH_LDAP_BIND_USER = 'CN=***,OU=***,DC=***,DC=***,DC=local'
AUTH_LDAP_BIND_PASSWORD = '***'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '~/airflow/$root_ca.crt'

This is a brand new install using the defaults: SQLite and SequentialExecutor. I did extract a domain cert to load onto this webserver, but I can’t seem to find how to do that, if needed, with this script. Any help is greatly appreciated!

Hi Trevor,

Do you have any logging output that might indicate the problem?

I see a problem with your LDAP configuration: AUTH_LDAP_USE_TLS is set to False and AUTH_LDAP_SERVER uses plain ldap:// on port 389, so neither LDAPS nor STARTTLS is in use. Please try LDAPS on port 636, or enable TLS with the AUTH_LDAP_USE_TLS parameter.

Also, it’s a good idea to allow self-signed certificates until you get authentication working. Once you’ve established that everything else works, start working on denying self-signed certificates.
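The relationship between the URI scheme and AUTH_LDAP_USE_TLS can be captured in a small sanity check. The function below is a hypothetical helper, not part of Flask AppBuilder, and it encodes an assumption: that ldaps:// should be paired with AUTH_LDAP_USE_TLS = False (the connection is already encrypted), while ldap:// needs AUTH_LDAP_USE_TLS = True to be upgraded with STARTTLS.

```python
from urllib.parse import urlparse


def ldap_transport(server, use_tls):
    """Classify an AUTH_LDAP_SERVER / AUTH_LDAP_USE_TLS combination.

    Returns "ldaps", "starttls" or "plaintext"; raises ValueError for
    combinations that are assumed to be mistakes.
    """
    scheme = urlparse(server).scheme.lower()
    if scheme == "ldaps":
        if use_tls:
            # STARTTLS on top of an already encrypted LDAPS connection
            raise ValueError("ldaps:// with AUTH_LDAP_USE_TLS = True")
        return "ldaps"
    if scheme == "ldap":
        return "starttls" if use_tls else "plaintext"
    raise ValueError("Unsupported scheme: {!r}".format(scheme))


if __name__ == "__main__":
    # Placeholder host name, for illustration only
    print(ldap_transport("ldaps://ad.example.com:636/", False))
    print(ldap_transport("ldap://ad.example.com:389/", False))
```

A "plaintext" result means credentials would cross the network unencrypted, which is the situation described in the reply above.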

Thanks,
Josh

Hi Josh,

Thanks for sharing this article; I find it very useful. I followed all the steps mentioned in the article, but when I try to access the Airflow webserver console, the login screen never appears and I am logged in directly as admin. Below is my webserver_config.py:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://********:636'
AUTH_LDAP_SEARCH = "DC=***,DC=***,DC=**"
AUTH_LDAP_BIND_USER = 'CN=***,OU=***,OU=***,DC=***,DC=***,DC=***'
AUTH_LDAP_BIND_PASSWORD = '***'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/ca/***.cer'

I am using RHEL 7.4 on an Amazon EC2 instance. Will this make a difference? I also checked the webserver logs and didn’t see any error or exception. It seems as if webserver_config.py is not being used by Airflow.

Thanks for help !!

Hi Pritt,

Did you run ‘airflow initdb’ after setting up your configuration to generate the RBAC tables? Please try that, then restart the Airflow webserver and let me know if that works.

Thanks,
Josh

Hi Josh,

Yes, I tried running the airflow initdb command, and that didn’t make any difference either.

Thanks for your time !!

Hi,
Thank you for your article.
I can propose a way to avoid setting auto-registration to Admin.
You can just add the required admins with:

airflow create_user -r Admin -u ... etc

Hi Denis,

Thanks for posting that – it’s a great idea, and better than setting all new users to Admin.

Let me test that out and make sure it works for me and then I can update the post.

Thanks,
Josh

I have the same issue, where it just goes directly to the login page without prompting for login, after following the steps above. Airflow 1.10.5 running on RHEL 7.0.

Is there a way to run the Admin and Viewer roles together instead of bringing down the webserver? It’s not good practice to make a configuration change in a live application.

Thank you! It worked for me. The challenges I faced:

The python-ldap module is a prerequisite, and it in turn requires:
sudo yum install python-devel openldap-devel

Then I set the RBAC flag to True, and it worked. Thanks again.

Hi Josh,
A couple of things:
1) Where are you defining the AD group names, so that users who belong to a given group can be admin?
2) Are you updating the database from the command line or through the Airflow UI?
3) Is restarting Airflow in prod a good idea?
4) How do all the users become Viewers after the DB update and webserver restart?

So we cannot separate the user base by AD group, for example one group for admins, one for users/ops and another for viewers, with the members of each group automatically mapped to those roles, without doing the registration?

That is correct – you have to manage roles and permissions (authorization) inside airflow. AD is only for login (authentication).
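Although roles are assigned inside Airflow, the set of AD users allowed to log in at all can be narrowed with an LDAP search filter. The sketch below assumes the installed Flask AppBuilder version honours the AUTH_LDAP_SEARCH_FILTER setting, and the group DN shown is a made-up placeholder; the helper functions are illustrative only. In Active Directory, a plain memberOf filter matches direct group members, not members of nested groups.

```python
def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def group_search_filter(group_dn):
    """Build a filter that matches only direct members of group_dn."""
    return "(memberOf={})".format(escape_filter_value(group_dn))


# Placeholder DN; in webserver_config.py this would sit next to AUTH_LDAP_SEARCH
AUTH_LDAP_SEARCH_FILTER = group_search_filter(
    "CN=airflow-users,OU=groups,DC=domain,DC=organization,DC=com"
)
```

This only controls who may authenticate; mapping those users to Admin, Op or Viewer still happens in the Airflow UI as described above.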

Is there any way to secure the experimental API URL via LDAP? We are using Airflow installed on an Azure VM. Will the same code (with appropriate modifications) work there as well?

How can I enable Airflow authentication for two LDAP groups:
one for Admin,
and the other for Public with limited privileges?

I think it’s worth mentioning that you must have the following installed:
On CentOS: openldap-devel
For Python: python-ldap, ldap3

Otherwise, thanks, very useful.

I created an AD service account to bind for LDAP authentication. As it’s an AD account, AIRFLOW__LDAP__USER_NAME_ATTR=sAMAccountName, and AIRFLOW__LDAP__BIND_PASSWORD is the password of the service account that I created for LDAP auth. However, I’m getting an incorrect login details error. What could I be doing wrong? Should that AD account be part of some group? Please let me know.

Hi Monica,

When using the FAB (Flask AppBuilder) method, you need to use the following attributes instead of the ones you listed:


AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'

I think that should get you in the right direction.

Thanks,
Josh

Isn’t it possible to configure LDAP authentication via the airflow.cfg file (without webserver_config.py)?
Version 1.10.11

In the airflow.cfg file there are these parameters under the [ldap] section.

[ldap]
# set this to ldaps://:
uri =
user_filter = objectClass=*
user_name_attr = uid
group_member_attr = memberOf
superuser_filter =
data_profiler_filter =
bind_user = cn=Manager,dc=example,dc=com
bind_password = insecure
basedn = dc=example,dc=com
cacert = /etc/ca/ldap_ca.crt
search_scope = LEVEL

Hi Raul,

The ldap authentication configuration in the airflow.cfg file is for the flask-admin version. Using the webserver_config.py allows the use of the FAB based web UI and supports RBAC. The biggest driver for me using the FAB based web UI was RBAC so that users could be assigned different levels of authorization.

Thanks,
Josh

Oh, thanks.
The Flask-Admin version is the old way that is going to be removed in version 2, am I correct?

What are these settings for:

AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '<>'

If I omit these settings, it also works for me; perhaps the first two default to False.
But even with the cacertfile location omitted, login still works for me.

I am using LDAPS on port 636.

By the way, at the moment I need to specify “AUTH_LDAP_SEARCH” as the point from which it starts searching for users.
But is it also possible to grant permissions through LDAP groups?
(For example, user1 and user2 are in a group named group1, so only these two users can log in.)

Hi Raul,

Your certificate is likely not valid if adding AUTH_LDAP_TLS_CACERTFILE causes a login failure. Can you validate the certificate using openssl?


openssl s_client -connect $ldaphost:636

Thanks,
Josh

No.
Whether I add or omit this parameter in webserver_config.py, login works the same way for me.
AUTH_LDAP_TLS_CACERTFILE = ''

So it doesn’t matter whether I have this parameter set or not; it still works.

The output of this command looks like this (don’t I also need to add a client certificate to connect?).

# openssl s_client -connect $ldaphost:636
CONNECTED(00000003)
depth=1 DC = com, DC = domain, CN = YYYYYYYY
verify return:1
depth=0 CN = adhostX.domain.com
verify return:1

Certificate chain
….
Server certificate
….

No client certificate CA names sent
Client Certificate Types: RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: ……….
Shared Requested Signature Algorithms: ……….
Peer signing digest: SHA256
Server Temp Key: ECDH, P-521, 521 bits

SSL handshake has read 2446 bytes and written 551 bytes

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES256-SHA384
Session-ID: …..
Session-ID-ctx:
Master-Key: …………………..
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
Start Time: 1598419312
Timeout : 300 (sec)
Verify return code: 0 (ok)

Raul

Hi Josh,

The blog is really informative. Thank you.

I am trying to configure Azure Active Directory with Airflow, which is deployed in an Azure Kubernetes cluster through helm chart(stable/airflow).

Any idea on it would be great help.

Thanks.

I am trying basic AUTH_DB authentication with RBAC on Airflow 1.12. However, I am getting a 404 error in the web UI and the warning below in the logs:
WARNING – Access is Denied for: can_index on: Airflow

I have added one user as follows:
airflow create_user -r Admin -u admin -e admin@example.com -f admin -l user -p test

Any idea how to add the can_index permission to specific roles or users?

Silpa,

Were you able to find a solution for configuring Apache Airflow with Azure AD? I have the same requirement. If you find any related information, please post it.

Hello Josh
Running
docker logs airflow-webserver

Return:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/flask_appbuilder/security/manager.py", line 874, in auth_user_ldap
import ldap
ModuleNotFoundError: No module named 'ldap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Do you know what I need to install?
Thanks a lot.

Hi Eneh,

That means the python ldap library is missing. You need to install it with pip. Depending on the distro you’re using for the container, it should be one of the following:


pip3 install pyldap
pip install ldap3
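A quick way to confirm whether the interpreter inside the container can see the module named in the traceback is to ask Python directly. This is just a small illustrative check; the function name is hypothetical, and it should be run with the same Python that runs the webserver.

```python
import importlib.util


def module_available(name="ldap"):
    """Return True if the named top-level module can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # 'ldap' is the module that the traceback reports as missing
    print("ldap importable:", module_available("ldap"))
```

If this prints False after installing a package, the package was probably installed for a different interpreter or user.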

Please share your Dockerfile, if you need any more help.

Thanks,
Josh

Thanks a lot Josh
I am pasting the airflow-webserver Dockerfile

FROM apache/airflow:1.10.12-python3.7
USER root
RUN apt-get update -yqq \
&& apt-get upgrade -yqq \
&& apt-get install -yqq --no-install-recommends \
iputils-ping \
openssh-server \
#sshpass \
gcc python3-dev \
&& apt-get autoremove -yqq --purge \
&& apt-get clean
RUN /usr/local/bin/python -m pip install --upgrade pip
RUN pip install ansible==2.9.11 \
&& pip install ansible-runner==1.4.6

RUN pip install bottle==0.12.18 \
&& pip install openpyxl==3.0.3 \
&& pip install textfsm==1.1.0 \
&& pip install paramiko==2.7.1 \
&& pip install influxdb==5.3.1 \
&& pip install pymongo==3.11.1 \
&& pip3 install python-powerdns==0.2.1\
&& pip install ttp==0.6.0
RUN pip install pysyslogclient==0.1.1
RUN pip install ansible-runner==1.4.6
RUN mkdir /etc/ansible && chmod +755 /etc/ansible
RUN mkdir /usr/local/airflow && chown -R airflow /usr/local/airflow
RUN chmod 777 /usr/local/airflow
RUN mkdir /usr/local/backup && chmod 777 /usr/local/backup
COPY src/ansible/ansible.cfg /etc/ansible
COPY src/ansible/hosts /etc/ansible
COPY src/airflow/airflow.cfg /opt/airflow
USER airflow
WORKDIR /usr/local/airflow
"dockerfiles/webserver/Dockerfile"
openssh-server \
gcc python3-dev \
&& apt-get autoremove -yqq --purge \
&& apt-get clean
RUN /usr/local/bin/python -m pip install --upgrade pip
RUN pip install ansible==2.9.11 \
&& pip install ansible-runner==1.4.6
RUN pip install bottle==0.12.18 \
&& pip install openpyxl==3.0.3 \
&& pip install textfsm==1.1.0 \
&& pip install paramiko==2.7.1 \
&& pip install influxdb==5.3.1 \
&& pip install pymongo==3.11.1 \
&& pip3 install python-powerdns==0.2.1\
&& pip install ttp==0.6.0
RUN pip install pysyslogclient==0.1.1
RUN pip install ansible-runner==1.4.6
RUN mkdir /etc/ansible && chmod +755 /etc/ansible
RUN mkdir /usr/local/airflow && chown -R airflow /usr/local/airflow
RUN chmod 777 /usr/local/airflow
RUN mkdir /usr/local/backup && chmod 777 /usr/local/backup
COPY src/ansible/ansible.cfg /etc/ansible
COPY src/ansible/hosts /etc/ansible
COPY src/airflow/airflow.cfg /opt/airflow
USER airflow
WORKDIR /usr/local/airflow

Thank you so much Josh

Hi Eneh,

Add the following to your Dockerfile – in the appropriate location(s):


RUN apt-get install -y libldap2-dev libssl-dev libsasl2-dev
..
RUN pip install pyldap

Once I added that, I was able to get it working.

While the Dockerfile you pasted does not work, I assume there was a copy/paste error and that is not your correct Dockerfile.

Thanks,
Josh
