Apache Airflow 1.10.2– Active Directory Authentication (via LDAP[s])

This basic guide assumes a functional Airflow deployment, either without authentication or with LDAP authentication under the legacy UI scheme. It also assumes Apache Airflow 1.10.2, installed via pip and backed by MySQL and Redis, running on Amazon Linux on an EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if there is one. This can be done by simply removing the values to the right of the equals sign under [ldap] in airflow.cfg. Alternatively, the whole [ldap] section can be removed.

Next, modify airflow.cfg to remove ‘authenticate = True’ under the [webserver] section. Also remove the authentication backend line (auth_backend), if it exists.
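The two airflow.cfg clean-up steps above can also be scripted. What follows is only a minimal sketch using Python's standard configparser; the function name strip_legacy_auth is made up for illustration, and the key names authenticate and auth_backend are what I believe Airflow 1.10 uses, so check them against your own file. Note that configparser drops comments when it writes the file back, so keep a copy of the original.

```python
import configparser
import io


def strip_legacy_auth(cfg_text):
    """Return airflow.cfg text without the legacy LDAP/auth settings.

    Hypothetical helper: removes the whole [ldap] section and the
    'authenticate' / 'auth_backend' keys under [webserver] (assumed names).
    """
    # interpolation=None so that literal '%' characters in values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(cfg_text)

    # remove_section() simply returns False when the section does not exist
    parser.remove_section("ldap")

    if parser.has_section("webserver"):
        for key in ("authenticate", "auth_backend"):
            parser.remove_option("webserver", key)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

To use it, read airflow.cfg into a string, pass it through the function, and write the result to a new file that you review before swapping it in.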

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (the same directory that holds airflow.cfg). The contents should look like the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/'
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate at the location specified, so that the SSL certificate presented by Active Directory can be verified. The $ldap placeholder must therefore be a resolvable name whose SSL certificate is signed by $root_CA.crt. Also note that any user who logs in with this configuration in place will be an Admin (more to come on this).
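Before pointing Airflow at the directory, it can help to confirm that the certificate really validates against the CA file. Here is a rough sketch using only the Python standard library; the function names are my own, and the host name and CA path in the usage note are placeholders rather than values from a real deployment. It checks the TLS handshake only and does not perform an LDAP bind.

```python
import socket
import ssl
from urllib.parse import urlparse


def parse_ldap_uri(uri):
    """Split an LDAP URI into (scheme, host, port), filling in the default port."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("ldap", "ldaps"):
        raise ValueError("expected an ldap:// or ldaps:// URI, got %r" % uri)
    if not parsed.hostname:
        raise ValueError("no host name in %r" % uri)
    default_port = 636 if parsed.scheme == "ldaps" else 389
    return parsed.scheme, parsed.hostname, parsed.port or default_port


def verify_ldaps_certificate(uri, cafile, timeout=10.0):
    """Open a TLS connection and return the subject of the server certificate.

    Raises ssl.SSLError if the chain or host name does not validate against
    cafile, and OSError if the host cannot be reached at all.
    """
    scheme, host, port = parse_ldap_uri(uri)
    if scheme != "ldaps":
        raise ValueError("certificate check only applies to ldaps:// URIs")
    # Passing cafile means only that CA file is trusted, not the system store
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]
```

For example, verify_ldaps_certificate('ldaps://ad.example.com:636/', '/path/to/root_CA.crt') should return the certificate subject when the chain is good and raise otherwise.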

Once this configuration is in place, it will likely be desirable to remove all existing users. Run the following commands from the mysql CLI while connected to the Airflow database:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver
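Rather than guessing when the webserver is back, one option is to poll it until it answers. This is a small sketch with the standard library only; the function names are invented for this example, and the URL you pass in is an assumption about your deployment (port 8080 is the Airflow default, but yours may differ).

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 502
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout and similar
        return None


def wait_until_up(url, attempts=30, delay=10.0, probe=http_status, sleep=time.sleep):
    """Poll url until probe reports 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling wait_until_up('http://localhost:8080/') after the restart returns True as soon as the page responds, or False after the attempts run out.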

Once the webserver comes up, log in as the user intended to be the Admin. This will allow this user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.
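If editing webserver_config.py twice feels error-prone, one alternative is to read the registration role from the environment and default to the safe value. This is a sketch, not part of the original setup: the variable name AIRFLOW_REGISTRATION_ROLE and the helper function are made up for this example, and the role list assumes Airflow's stock RBAC roles, so extend it if you have custom ones. The webserver still has to be restarted for a change to take effect.

```python
import os

# Airflow's default RBAC roles; add custom role names here if you use any
ALLOWED_ROLES = ("Admin", "Op", "User", "Viewer", "Public")


def registration_role(environ=os.environ, default="Viewer"):
    """Pick the self-registration role from the environment, falling back to Viewer.

    AIRFLOW_REGISTRATION_ROLE is a hypothetical variable name for this sketch.
    """
    role = environ.get("AIRFLOW_REGISTRATION_ROLE", default)
    if role not in ALLOWED_ROLES:
        raise ValueError("unknown registration role: %r" % role)
    return role


# In webserver_config.py this line would replace the hard-coded assignment
AUTH_USER_REGISTRATION_ROLE = registration_role()
```

For the first login, start the webserver once with AIRFLOW_REGISTRATION_ROLE=Admin in its environment, then unset it and restart so that everyone else registers as a Viewer.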

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

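Note that Apache Airflow introduced the RBAC UI with version 1.10. On the 1.10.x series it has to be enabled with rbac = True under [webserver] in airflow.cfg, as several commenters below point out; the legacy UI was removed in version 2.0.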



Comments

47 responses to “Apache Airflow 1.10.2– Active Directory Authentication (via LDAP[s])”

  1. Hello,

    Thanks for your article.

    I am getting the error below when I open the Airflow URL.
    {{decorators.py:28}} WARNING – Access is Denied for: can_index on: Airflow

    Could you please help me if you know what is wrong?

  2. Hi Atya,

    Did you add the ‘can_index’ permission to the Public role while logged in as an Admin?

    Can you give me more detail of what you have done so far?

    Thanks,
    Josh

  3. Ramakrishna Ganji

    Hi Josh,

    This is very useful information for me. I implemented Airflow in our environment and integrated it with Active Directory successfully using the above instructions. Thank you so much for your help.
    Having said that, I think you missed one configuration parameter in the pre-requisites: rbac = True in airflow.cfg. Without this option, Airflow does not read webserver_config.py. Please correct me if I am wrong, or incorporate this point in the blog if it is useful.

    Thanks
    Ramakrishna

  4. Hi Ramakrishna,

    That configuration option has been deprecated:
    https://github.com/apache/airflow/blob/master/UPDATING.md

    Thanks,
    Josh

  5. Ishita Virmani

    I had been struggling for a long time to get RBAC to work via FAB. I wasn’t aware that “AUTH_USER_REGISTRATION = True” is the key to allowing LDAP users to log in. I was under the impression that all LDAP users could log in to the system by default.
    Thank you for this explanatory blog.

  6. Very good and helpful write-up. Thanks Josh!

  7. subham mishra

    I am getting the “Airflow 404 = lots of circles” error.

  8. Hi Subham,

    Can you give some details around what you’ve done so far? A sanitized config would be ideal. Also, what do you see in the logs?

    Thanks,
    Josh

  9. Hi,
    I have walked through what I can of this setup for a WSL install of Airflow. I’ve got 1.10.4, and when I split the config out to create the .py file, it no longer pops up a login page and just lets me in. I think I have two issues: LDAP isn’t working, and the .py config file isn’t being read on start. Below is my file.

    import os
    from airflow import configuration as conf
    from flask_appbuilder.security.manager import AUTH_LDAP
    basedir = os.path.abspath(os.path.dirname(__file__))

    SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

    CSRF_ENABLED = True

    AUTH_TYPE = AUTH_LDAP

    AUTH_ROLE_ADMIN = 'Admin'
    AUTH_USER_REGISTRATION = True

    AUTH_USER_REGISTRATION_ROLE = "Admin"
    # AUTH_USER_REGISTRATION_ROLE = "Viewer"

    AUTH_LDAP_SERVER = 'ldap://$ldap:389/
    AUTH_LDAP_SEARCH = "DC=***,DC=local"
    AUTH_LDAP_BIND_USER = 'CN=***,OU=***,DC=***,DC=***,DC=local'
    AUTH_LDAP_BIND_PASSWORD = '***'
    AUTH_LDAP_UID_FIELD = 'sAMAccountName'
    AUTH_LDAP_USE_TLS = False
    AUTH_LDAP_ALLOW_SELF_SIGNED = False
    AUTH_LDAP_TLS_CACERTFILE = '~/airflow/$root_ca.crt

    This is a brand new install using the defaults: SQLite and the SequentialExecutor. I did extract a domain cert to load onto this webserver, but I can’t seem to find how to do that, if it is needed, with this script. Any help is greatly appreciated!

  10. Hi Trevor,

    Do you have any logging output that might indicate the problem?

    I see a problem with your LDAP configuration — you have specified that you don’t want to use TLS/STARTTLS, but you specify port 389 in the AUTH_LDAP_SERVER config value, as well as not specifying LDAPS. Please try to use LDAPS on port 636 or allow TLS under the AUTH_LDAP_USE_TLS parameter.

    Also, it’s a good idea to allow self signed certificates until you get authentication working. At that point, start working on denying self signed certificates once you’ve established that everything else is working.

    Thanks,
    Josh

  11. Hi Josh,

    Thanks for sharing this article, I find it very useful. I followed all the steps mentioned in the article, but when I try to access the Airflow webserver console, the login screen never appears and I am directly logged in as admin. Below is my webserver_config.py

    import os
    from airflow import configuration as conf
    from flask_appbuilder.security.manager import AUTH_LDAP
    basedir = os.path.abspath(os.path.dirname(__file__))

    SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

    CSRF_ENABLED = True

    AUTH_TYPE = AUTH_LDAP

    AUTH_ROLE_ADMIN = 'Admin'
    AUTH_USER_REGISTRATION = True

    AUTH_USER_REGISTRATION_ROLE = "Admin"
    # AUTH_USER_REGISTRATION_ROLE = "Viewer"

    AUTH_LDAP_SERVER = 'ldaps://********:636'
    AUTH_LDAP_SEARCH = "DC=***,DC=***,DC=**"
    AUTH_LDAP_BIND_USER = 'CN=***,OU=***,OU=***,DC=***,DC=***,DC=***'
    AUTH_LDAP_BIND_PASSWORD = '***'
    AUTH_LDAP_UID_FIELD = 'sAMAccountName'
    AUTH_LDAP_USE_TLS = False
    AUTH_LDAP_ALLOW_SELF_SIGNED = False
    AUTH_LDAP_TLS_CACERTFILE = '/etc/ca/***.cer'

    I am using RHEL 7.4 on an Amazon EC2 instance; will this make a difference? I also checked the webserver logs and didn’t see any error or exception. It seems as if webserver_config.py is not being used by Airflow.

    Thanks for help !!

  12. Hi Pritt,

    Did you run ‘airflow initdb’ after setting up your configuration to generate the RBAC tables? Please try that, and then restart the Airflow webserver and let me know if that works.

    Thanks,
    Josh

  13. Hi Josh,

    Yes, I tried running the airflow initdb command, and that didn’t make any difference either.

    Thanks for your time !!

  14. Hi,
    thank you for your article.
    I can propose one way to avoid setting auto-registration to Admin.
    You can just add the required admins with

    airflow create_user -r Admin -u ... etc

  15. Hi Denis,

    Thanks for posting that – it’s a great idea, and better than setting all new users to Admin.

    Let me test that out and make sure it works for me and then I can update the post.

    Thanks,
    Josh

  16. I have the same issue where it just goes directly to the login page without prompting for login, after following the steps above. Airflow 1.10.5 running on RHEL 7.0.

  17. Is there a way to run the Admin and Viewer roles together instead of bringing down the webserver? It’s not good practice to make a configuration change in a live application.

  18. Thank you! It worked for me. The challenges I faced:

    The python-ldap module is a pre-req, and for that you need these packages first:
    sudo yum install python-devel openldap-devel

    Then I set the RBAC flag to True and it worked. Thanks again.

  19. Hi Josh
    A couple of things:
    1) Where are you defining the AD group names, so that users who belong to that group can be admin?
    2) Are you updating the database from the command line or through the Airflow UI?
    3) Would restarting Airflow in prod be a good idea?
    4) And how would all the users be Viewers after the DB update and webserver restart?

  20. Hi Deepak,

    While authentication is handled by Active Directory under this scenario, authorization is managed inside Airflow using the RBAC UI:

    https://airflow.readthedocs.io/en/latest/howto/add-new-role.html

    Thanks,
    Josh

  21. deepak p

    So we cannot separate the user base by AD group, e.g. one group for admins, one for users/ops and another for viewers, so that the members of each group are automatically mapped to those roles without doing the registration?

  22. That is correct – you have to manage roles and permissions (authorization) inside airflow. AD is only for login (authentication).

  23. Lal Prasad R

    Is there any way to secure the experimental API URL via LDAP? We are using Airflow installed on an Azure VM. Will the same code (with appropriate modifications) work there as well?

  24. chakri

    How do I enable Airflow authentication for two LDAP groups:
    one for Admin
    and the other for Public with limited privileges?

  25. Hi Chakri,

    While this is, in theory, supported, I was never able to get this working.

    Thanks,
    Josh

  26. I think it’s worth mentioning, that you must have installed
    On Centos: openldap-devel
    For Python: python-ldap, ldap3

    Otherwise, thanks, very useful.

  27. Monica

    I created an AD service account to bind for LDAP authentication. As it’s an AD account, AIRFLOW__LDAP__USER_NAME_ATTR=sAMAccountName, and AIRFLOW__LDAP__BIND_PASSWORD is the password of the service account that I created for LDAP auth. However, I’m getting an incorrect login details error. What could I be doing wrong? Should that AD account be part of some group? Please let me know.

  28. Hi Monica,

    When using the FAB – Flask Appbuilder method, you need to use the following attributes instead of the ones you listed:


    AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
    AUTH_LDAP_BIND_PASSWORD = '**************'
    AUTH_LDAP_UID_FIELD = 'sAMAccountName'

    I think that should get you in the right direction.

    Thanks,
    Josh

  29. Isn’t it possible to configure ldap authentication via airflow.cfg file (without webserver_config.py)..?
    Version 1.10.11

    In airflow.cfg file there are these parameters under [ldap] section.

    [ldap]
    # set this to ldaps://:
    uri =
    user_filter = objectClass=*
    user_name_attr = uid
    group_member_attr = memberOf
    superuser_filter =
    data_profiler_filter =
    bind_user = cn=Manager,dc=example,dc=com
    bind_password = insecure
    basedn = dc=example,dc=com
    cacert = /etc/ca/ldap_ca.crt
    search_scope = LEVEL

  30. Hi Raul,

    The ldap authentication configuration in the airflow.cfg file is for the flask-admin version. Using the webserver_config.py allows the use of the FAB based web UI and supports RBAC. The biggest driver for me using the FAB based web UI was RBAC so that users could be assigned different levels of authorization.

    Thanks,
    Josh

  31. Oh, thanks.
    Flask-admin version is the old way that is going to be removed from version 2, am I correct..?

  32. What are these settings for:

    AUTH_LDAP_USE_TLS = False
    AUTH_LDAP_ALLOW_SELF_SIGNED = False
    AUTH_LDAP_TLS_CACERTFILE = ‘<>’

    If I omitted these settings, it also worked for me, perhaps first two have some defaults, which default to False.
    But omitting cacertfile location – login still works for me.

    Using ldaps on port 636

    By the way. At the moment I need to specify “AUTH_LDAP_SEARCH” from where it starts searching users.
    But is it possible also, that I can give permissions through ldap groups also..?
    (for example, user1, user2 are present in group named group1 – so only these 2 users can login)

  33. Hi Raul,

    Your certificate is likely not valid if adding AUTH_LDAP_TLS_CACERTFILE causes a login failure. Can you validate the certificate using openssl?


    openssl s_client -connect $ldaphost:636

    Thanks,
    Josh

  34. No.
    If I add or omit (do not add) this parameter to webserver_config.py file – login still works for me the same way.
    AUTH_LDAP_TLS_CACERTFILE = ‘’

    So it doesn’t matter if I have this parameter set or not. Still works.

    Output of this query is like this (shouldn’t I need to add client certificate also to connect..?).

    # openssl s_client -connect $ldaphost:636
    CONNECTED(00000003)
    depth=1 DC = com, DC = domain, CN = YYYYYYYY
    verify return:1
    depth=0 CN = adhostX.domain.com
    verify return:1

    Certificate chain
    ….
    Server certificate
    ….

    No client certificate CA names sent
    Client Certificate Types: RSA sign, DSA sign, ECDSA sign
    Requested Signature Algorithms: ……….
    Shared Requested Signature Algorithms: ……….
    Peer signing digest: SHA256
    Server Temp Key: ECDH, P-521, 521 bits

    SSL handshake has read 2446 bytes and written 551 bytes

    New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA384
    Server public key is 2048 bit
    Secure Renegotiation IS supported
    Compression: NONE
    Expansion: NONE
    No ALPN negotiated
    SSL-Session:
    Protocol : TLSv1.2
    Cipher : ECDHE-RSA-AES256-SHA384
    Session-ID: …..
    Session-ID-ctx:
    Master-Key: …………………..
    Key-Arg : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1598419312
    Timeout : 300 (sec)
    Verify return code: 0 (ok)

    Raul

  35. Hi Josh,

    The blog is really informative. Thank you.

    I am trying to configure Azure Active Directory with Airflow, which is deployed in an Azure Kubernetes cluster through helm chart(stable/airflow).

    Any idea on it would be great help.

    Thanks.

  36. Hi Josh,

    The blog is really informative. Thank you.

    I am trying to configure Azure Active Directory with Airflow which is deployed in Azure Kubernetes service through helm chart (stable/airflow).

    Any idea on it would be of great help.

    Thanks.

  37. This really helped. Thanks Josh.

  38. I am trying basic AUTH_DB type authentication with RBAC with airflow 1.12. However I am getting 404 error on web UI and below warning in the logs:
    WARNING – Access is Denied for: can_index on: Airflow

    I have added one user as below:
    airflow create_user -r Admin -u admin -e admin@example.com -f admin -l user -p test

    Any idea how we add the can_index permission to specific roles or users?

  39. Chiranjeevi

    Silpa,

    Were you able to find a solution for configuring Apache Airflow with Azure AD? I have the same requirement. If you find any related information, please post it.

  40. Hello Josh
    Running
    docker logs airflow-webserver

    Return:
    Traceback (most recent call last):
    File "/home/airflow/.local/lib/python3.7/site-packages/flask_appbuilder/security/manager.py", line 874, in auth_user_ldap
    import ldap
    ModuleNotFoundError: No module named 'ldap'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):

    Do you know what I need to install?
    Thanks a lot

  41. Hi Eneh,

    That means the python ldap library is missing. You need to install it with pip. Depending on the distro you’re using for the container, it should be one of the following:


    pip3 install pyldap
    pip install ldap3

    Please share your Dockerfile, if you need any more help.

    Thanks,
    Josh

  42. Thanks a lot Josh
    I am pasting the airflow-webserver Dockerfile

    FROM apache/airflow:1.10.12-python3.7
    USER root
    RUN apt-get update -yqq \
    && apt-get upgrade -yqq \
    && apt-get install -yqq --no-install-recommends \
    iputils-ping \
    openssh-server \
    #sshpass \
    gcc python3-dev \
    && apt-get autoremove -yqq --purge \
    && apt-get clean
    RUN /usr/local/bin/python -m pip install --upgrade pip
    RUN pip install ansible==2.9.11 \
    && pip install ansible-runner==1.4.6

    RUN pip install bottle==0.12.18 \
    && pip install openpyxl==3.0.3 \
    && pip install textfsm==1.1.0 \
    && pip install paramiko==2.7.1 \
    && pip install influxdb==5.3.1 \
    && pip install pymongo==3.11.1 \
    && pip3 install python-powerdns==0.2.1\
    && pip install ttp==0.6.0
    RUN pip install pysyslogclient==0.1.1
    RUN pip install ansible-runner==1.4.6
    RUN mkdir /etc/ansible && chmod +755 /etc/ansible
    RUN mkdir /usr/local/airflow && chown -R airflow /usr/local/airflow
    RUN chmod 777 /usr/local/airflow
    RUN mkdir /usr/local/backup && chmod 777 /usr/local/backup
    COPY src/ansible/ansible.cfg /etc/ansible
    COPY src/ansible/hosts /etc/ansible
    COPY src/airflow/airflow.cfg /opt/airflow
    USER airflow
    WORKDIR /usr/local/airflow
    “dockerfiles/webserver/Dockerfile”
    openssh-server \
    gcc python3-dev \
    && apt-get autoremove -yqq --purge \
    && apt-get clean
    RUN /usr/local/bin/python -m pip install --upgrade pip
    RUN pip install ansible==2.9.11 \
    && pip install ansible-runner==1.4.6
    RUN pip install bottle==0.12.18 \
    && pip install openpyxl==3.0.3 \
    && pip install textfsm==1.1.0 \
    && pip install paramiko==2.7.1 \
    && pip install influxdb==5.3.1 \
    && pip install pymongo==3.11.1 \
    && pip3 install python-powerdns==0.2.1\
    && pip install ttp==0.6.0
    RUN pip install pysyslogclient==0.1.1
    RUN pip install ansible-runner==1.4.6
    RUN mkdir /etc/ansible && chmod +755 /etc/ansible
    RUN mkdir /usr/local/airflow && chown -R airflow /usr/local/airflow
    RUN chmod 777 /usr/local/airflow
    RUN mkdir /usr/local/backup && chmod 777 /usr/local/backup
    COPY src/ansible/ansible.cfg /etc/ansible
    COPY src/ansible/hosts /etc/ansible
    COPY src/airflow/airflow.cfg /opt/airflow
    USER airflow
    WORKDIR /usr/local/airflow

    Thank you so much Josh

  43. Hi Eneh,

    Add the following to your Dockerfile – in the appropriate location(s):


    RUN apt-get install libldap2-dev libssl-dev libsasl2-dev
    ..
    RUN pip install pyldap

    Once I added that, I was able to get it working.

    While the Dockerfile you pasted does not work, I assume there was a copy/paste error and that is not your correct Dockerfile.

    Thanks,
    Josh

  44. Hi Josh. I’m using Airflow 1.10.15, and I need to validate LDAP users from different OUs. Is it possible?

  45. No, validating LDAP users from different OUs is not possible, especially with the move to Flask App builder and all authorization being controlled within Airflow. This makes it so that LDAP (or AD) is used for authentication only.

    Also, 1.10.15 is old, you should look at upgrading to 2+.

    Thanks,
    Josh

  46. VINAY PUNGANOOR

    Hi, I am trying to set up Airflow on a VM. The LDAP authentication is working only for the bind user but not for any other users. The AD audit logs show that the other users are authenticated successfully. Not sure what we are missing here.

    Below is the error I am getting:

    DEBUG – LDAP bind TRY with username:

  47. Hi Vinay,

    I can’t tell what might be going on without seeing your configuration. If it’s possible to post a sanitized version, then I may be able to repro and help you out.

    Thanks,
    Josh
