Apache Airflow 1.10.2– Active Directory Authentication (via LDAP[s])

This basic guide assumes a functional airflow deployment, albeit without authentication, or perhaps, with LDAP authentication under the legacy UI scheme. This guide also assumes apache airflow 1.10.2, installed via pip using MySQL and Redis. The guide also assumes Amazon Linux on an EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if it exists. This can be done by simply removing the values to the right of the equal sign under [ldap] in the airflow.cfg configuration file. Alternately, the [ldap] section can be removed.

Next, modify airflow.cfg to remove ‘authentication = True’, under the [webserver] section. Also, remove the authentication backend line, if it exists.

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (this is where airflow.cfg is also located). The contents should reflect the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate in the location specified to verify the SSL certificate given by Active Directory so the $ldap variable must be a resolvable name which has a valid SSL certificate signed by $root_CA.crt. Also note that any user who logs in with this configuration in place will be an Admin (more to come on this).

Once this configuration is in place, it will likely be desirable to remove all existing users, using the following set of commands from the mysql CLI, logged into the airflow DB instance:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once the webserver comes up, login as the user intended to be the Admin. This will allow this user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

Note that Apache airflow introduced RBAC with version 1.10 and dropped support for the legacy UI after version 1.10.2.

References:
Airflow
Updating Airflow
Flask AppBuilder LDAP Authentication
Flask AppBuilder Configuration

SSH in a for loop is a solution…

I just read an article by Jay Valentine on LinkedIn where he talks about Puppet and how they were not profitable, and also noted that Chef is not, and has never been, profitable. That got me to thinking, why are IT professionals investing in these technologies (time, knowledge, effort…).

As an IT pro, it’s tempting to become a “fan boy” — someone who learns something difficult to use, and then because so much has been invested (time, effort, knowledge), it benefits the IT pro to evangelize the tool or software to make it more relevant (and thus make the IT pro’s skills more valuable and relevant).

This happens to me all the time, Linux, cfengine, puppet, ruby, etc… With little regard for objective analysis of what would work best. I had switched to puppet, from cfengine, when I heard Redhat had adopted Puppet. That was long ago, and they have since switched to Ansible — time to focus more on containers and, when necessary, Ansible. (Although I will continue to support my clients in whatever technology they desire, like any good consultant.)

While this is not a complete waste and is, most of the time, a very good thing, since it will enable quick execution on projects with known skills and tools, it is not ideal in the long run. The reason for this is that all of these projects and tools become very complicated over time. Take puppet or chef — they do require a significant amount of knowledge to effectively deploy. Even worse, they change rapidly. A system deployed one year could require a major re-write (of the manifest/recipe) the following year, if it were upgraded. Many deployments of these configuration management tools go for years without major updates because the effort in upgrading large numbers of services, servers, and configurations is incredible.

This is a huge amount of technical debt. I’d now venture to say that the more time you must spend deploying a configuration management solution, the more technical debt you will incur, unless you do have a very focused plan to upgrade frequently, and maintain a dedicated “puppet/chef/xxxx” IT pro.

I recall reading and/or hearing the famous Luke Kanies (of Puppetlabs) quote where he says, “ssh in a for loop is not a solution”… This has always bothered me, and I couldn’t quantify the reason very well, but it’s similar to the basic text processing argument in old school linux circles — text output is universal. Any app, tool, utility, can process text. Once you move to binary or other output, you lose the ability to universally process the output. It may be more efficient to process it in other manners, but it’s no longer universal.

“SSH in a for loop” is universal.

Standalone puppet with hiera 5 error…

With puppet moving more and more away from supporting a standalone model, it’s somewhat difficult to get puppet standalone working. I recently got bit by a hiera update that caused my puppet standalone deployments to stop interacting with hiera the way that I had deployed it.

Affected versions:

  • puppet 4.10.10
  • hiera 3.4.3

The error that I was receiving was similar to the following — note that this example cites an error with the ec2tagfacts module, which I have modified to work with puppet 4.*:

Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'ec2tagfacts::aws_access_key_id' failed: DataBinding 'hiera': v5 hiera.yaml is only to be used inside an environment or a module and cannot be given to the global hiera at $path_to/puppet/manifests/site.pp:12:3 on node $this_node

The new way of managing hiera (via puppet server) is to contain hiera within each environment and module. This does not work with [the way I use] puppet standalone because of the way you have to reference the hiera configuration. I need to try putting puppet in the default locations and try that at some point.

I was able to resolve the issue by downgrading hiera to version 3.1.1. I am testing with other versions. Updates to follow.

Retrieving puppet facts from AWS System Manager

AWS System Manager makes it easy to store and retrieve parameters for use across servers, services, and applications in AWS. One great benefit is storing secrets for use, as needed. I recently needed to retrieve some parameters to place in a configuration file via puppet and wrote a short script to retrieve these values as facts.

Create a script like the following in /etc/facter/facts.d, make it executable.

#!/bin/bash

aws configure set region us-east-1
application_username=$(aws ssm get-parameter --name application_username | egrep "Value" | awk -F\" '{print $4}')
application_password=$(aws ssm get-parameter --name application_password --with-decryption | egrep "Value" | awk -F\" '{print $4}')

echo "application_username=${application_username}"
echo "application_password=${application_password}"

exit 0;

Note that this assumes the username is not an encrypted secret, while the password is.

This can be tested with the following:

# facter -p application_username
# facter -p application_password

These facts can then be used in templates, like the following:

# config.cfg.erb
connection_string = <%= @application_username %>:<%= @application_password %>

Puppet deprecation in stdlib module…

As part of the long upgrade to become fully compatible with puppet 4 and drop puppet 3 support — version 4.13+ of the stdlib module introduced some breaking changes for other modules that I use. I recently upgrade some individual modules using the ‘puppet module upgrade’ method.

Upon upgrading, I received the following message:

Error: Evaluation Error: Error while evaluating a Function Call, undefined method `function_deprecation' ...

The solution, for now, until the modules that I use are upgraded to work with the newer version of stdlib, is to downgrade to version 4.12 of puppetlabs-stdlib.

Before downgrading, check to see if there are other modules which require a version greater than 4.12, ie:

> for file in $(find modules/ -type f -name metadata.json); do echo -n ${file}; echo -n ":  "; egrep -i stdlib ${file} || echo ""; done
modules//apt/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 4.5.0 < 5.0.0"}
modules//archive/metadata.json:        "name": "puppetlabs/stdlib",
modules//concat/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 4.2.0 < 5.0.0"}
modules//ec2tagfacts/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 3.2.0 < 5.0.0"},
modules//epel/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 3.0.0"}
modules//gnupg/metadata.json:  
modules//inifile/metadata.json:  
modules//java/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 2.4.0 < 5.0.0"}
modules//jenkins/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 4.6.0 < 5.0.0"},
modules//lvm/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">=4.1.0 < 5.0.0"}
modules//mysql/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 3.2.0 < 5.0.0"},
modules//python/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">= 4.6.0 < 6.0.0"},
modules//rvm/metadata.json:      {"name":"puppetlabs/stdlib","version_requirement":">=4.2.0"},
modules//staging/metadata.json:  
modules//vcsrepo/metadata.json:  
modules//zypprepo/metadata.json:  

The following modules require a version of puppetlabs-stdlib greater than 4.12:

  • apt – 4.5+
  • concat – 4.2.0+
  • jenkins- 4.6.0+
  • python4.6.0+
  • rvm – 4.2.0+

Sting with the apt module, the apt module is version 4.1.0. According to the puppetlabs-apt change log page, they recommend staying on version 2.3.0 unless you’re ready for the latest puppet 4 changes (they actually say any version 2 release but we’ll let that go after figuring it out…), so we need to downgrade that module first:

> puppet module upgrade puppetlabs-apt --version 2.3.0 --modulepath=modules/
Notice: Preparing to upgrade 'puppetlabs-apt' ...
Notice: Found 'puppetlabs-apt' (v2.4.0) in .../puppet/modules ...
Notice: Downloading from https://forgeapi.puppetlabs.com ...
Error: Could not upgrade module 'puppetlabs-apt' (v2.4.0 -> v2.3.0)
  Downgrading is not allowed.

Ok, so you can’t use the upgrade command to downgrade modules, so I need to be a little more heavy handed. I removed the apt module manually:

> rm -rf modules/apt/

Next, install version 2.3.0 of puppetlabs-apt:

> puppet module install puppetlabs-apt --version 2.3.0 --modulepath=modules/
Notice: Preparing to install into .../puppet/modules ...
Notice: Downloading from https://forgeapi.puppetlabs.com ...
Notice: Installing -- do not interrupt ...
.../puppet/modules
└─┬ puppetlabs-apt (v2.3.0)
  └── puppetlabs-stdlib (v4.19.0)

Now that I’m deep into this problem, I see great value in using something like r10k or Code Manager (which uses r10k) to manage these modules and dependencies.

Another potentially useful tip – I removed the puppetlabs-stdlib module and ran a module list command, and it then told me which modules were dependent upon the missing module, and which versions:

> puppet module list --modulepath=modules/
Warning: Missing dependency 'puppetlabs-stdlib':
  'puppetlabs-apt' (v2.3.0) requires 'puppetlabs-stdlib' (>= 4.5.0 < 5.0.0)
  'puppet-archive' (v2.0.0) requires 'puppetlabs-stdlib' (>= 4.13.0 < 5.0.0)
  'puppetlabs-concat' (v2.2.0) requires 'puppetlabs-stdlib' (>= 4.2.0 < 5.0.0)
  'bryana-ec2tagfacts' (v0.2.0) requires 'puppetlabs-stdlib' (>= 3.2.0 < 5.0.0)
  'stahnma-epel' (v1.2.2) requires 'puppetlabs-stdlib' (>= 3.0.0)
  'puppetlabs-java' (v1.5.0) requires 'puppetlabs-stdlib' (>= 2.4.0 < 5.0.0)
  'rtyler-jenkins' (v1.7.0) requires 'puppetlabs-stdlib' (>= 4.6.0 < 5.0.0)
  'puppetlabs-lvm' (v0.7.0) requires 'puppetlabs-stdlib' (>=4.1.0 < 5.0.0)
  'puppetlabs-mysql' (v3.10.0) requires 'puppetlabs-stdlib' (>= 3.2.0 < 5.0.0)
  'stankevich-python' (v1.14.2) requires 'puppetlabs-stdlib' (>= 4.6.0 < 6.0.0)
  'maestrodev-rvm' (v1.13.1) requires 'puppetlabs-stdlib' (>=4.2.0)

Moving on to using r10k..

gem install r10k
...

I then created a Puppetfile using my modules and am using r10k to manage them.

Note: converting to using r10k took around 20 minutes — if you’re not using r10k (or Code Manager), it’s time to start.

Latest Amazon EC2 AMI Supports Puppet 3.7.4

Good news! After quite some time without a supported puppet and ruby combination from the EC2 yum repositories, the latest AMI has support for puppet 3.7.4.

This will make deploying puppet environments easier and not require use of the gem and the development packages requirement to compile it.

puppet search function deprecation

With the release of puppet 3.7, the search function is now deprecated, and will be removed in 4.0. This is a feature that I had used by recommendation of a puppet cookbook when creating virtual resources and managing users that I have now removed.

Using the search function basically added the namespace of an existing class to another class to allow the second class access to the existing classes resources; virtual resources in this scenario. E.g, using the search function:

# /etc/puppet/modules/user/manifests/virtual.pp
class user::virtual {
  @user { "cacti":
    ensure => present,
    uid    => 999,
}

# /etc/puppet/modules/user/manifests/system.pp
class user::system {
  search user::virtual

  realize( User["cacti"])
}

# /etc/puppet/manifests/site.pp

class base {
  include user::system
}

In the above example I created a virtual user that I could then include anywhere and any number of times, and then realize that user where appropriate.

To fix the problem and stop using the search function, I simply included the user::virtual class everywhere that I included the user::system class, e.g.:

# /etc/puppet/modules/user/manifests/virtual.pp
class user::virtual {
  @user { "cacti":
    ensure => present,
    uid    => 999,
}

# /etc/puppet/modules/user/manifests/system.pp
class user::system {
  realize( User["cacti"])
}

# /etc/puppet/manifests/site.pp

class base {
  include user::virtual
  include user::system
}

This resolved my issue. Let me know in the comments if you have other uses of search and how this change might impact you.

– josh