AWS Access Keys in S3 Bucket Policies

In the past, I've seen what appeared to be AWS Access Keys used as the AWS principal in S3 bucket policies. I could never figure out why this was happening, and nobody appeared to be adding them. The Access Key never showed up as a valid user Access Key in a search of IAM objects, either.

It turns out that if you have an S3 bucket policy that references an IAM user, and you delete that user, the principal is replaced with a string that looks like an access key. This is the deleted user's unique IAM ID, an internal identifier AWS uses to track that principal even after the friendly name is gone.
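For illustration, here is roughly what that looks like, using hypothetical account, user, and bucket names. A statement that originally referenced the user by ARN:

{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::123456789012:user/some-user" },
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::example-bucket/*"
}

will, after the user is deleted, be displayed with the user's unique ID in place of the ARN, for example:

  "Principal": { "AWS": "AIDACKCEVSQ6C2EXAMPLE" },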

Note: Although the string looks like one, an AWS Access Key is not a valid principal in an S3 bucket policy.

https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-bucket-user-policy-specifying-principal-intro.html

Apache Airflow 1.10.2 – Active Directory Authentication (via LDAPS)

This basic guide assumes a functional Airflow deployment, albeit without authentication, or perhaps with LDAP authentication under the legacy UI scheme. It also assumes Apache Airflow 1.10.2 installed via pip, using MySQL and Redis, on an Amazon Linux EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if it exists. This can be done by simply removing the values to the right of the equals sign under [ldap] in the airflow.cfg configuration file. Alternatively, the [ldap] section can be removed entirely.

Next, modify airflow.cfg to remove 'authenticate = True' under the [webserver] section. Also remove the auth_backend line, if it exists.
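For reference, the relevant part of [webserver] ends up looking roughly like the following in Airflow 1.10.x (rbac = True is what enables the FAB-based UI that reads webserver_config.py; exact key names can vary slightly between minor versions):

[webserver]
rbac = True
# legacy authentication settings removed:
# authenticate = True
# auth_backend = airflow.contrib.auth.backends.ldap_auth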

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (this is where airflow.cfg is also located). The contents should reflect the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/'
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate at the location specified by AUTH_LDAP_TLS_CACERTFILE in order to verify the SSL certificate presented by Active Directory, so the $ldap value must be a resolvable name with a valid SSL certificate signed by $root_CA.crt. Also note that any user who logs in with this configuration in place will be an Admin (more on this below).
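One quick way to sanity-check that the CA file actually validates the directory's certificate (before bouncing the webserver) is something like:

openssl s_client -connect $ldap:636 -CAfile /etc/pki/ca-trust/source/anchors/$root_CA.crt </dev/null 2>/dev/null | grep 'Verify return code'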

Once this configuration is in place, it will likely be desirable to remove all existing users, using the following set of commands from the mysql CLI, logged into the airflow DB instance:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once the webserver comes up, log in as the user intended to be the Admin. This will allow that user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

Note that Apache Airflow introduced RBAC with version 1.10 and dropped support for the legacy UI after version 1.10.2.

References:
Airflow
Updating Airflow
Flask AppBuilder LDAP Authentication
Flask AppBuilder Configuration

SSH in a for loop is a solution…

I just read an article by Jay Valentine on LinkedIn where he talks about Puppet not being profitable, and he notes that Chef is not, and never has been, profitable. That got me thinking: why are IT professionals investing in these technologies (time, knowledge, effort)?

As an IT pro, it’s tempting to become a “fan boy” — someone who learns something difficult to use, and then because so much has been invested (time, effort, knowledge), it benefits the IT pro to evangelize the tool or software to make it more relevant (and thus make the IT pro’s skills more valuable and relevant).

This happens to me all the time: Linux, cfengine, puppet, ruby, etc., with little regard for objective analysis of what would work best. I switched from cfengine to puppet when I heard Red Hat had adopted Puppet. That was long ago, and they have since switched to Ansible, so it's time to focus more on containers and, when necessary, Ansible. (Although I will continue to support my clients in whatever technology they desire, like any good consultant.)

While this is not a complete waste, and is most of the time a very good thing since it enables quick execution on projects with known skills and tools, it is not ideal in the long run. The reason is that all of these projects and tools become very complicated over time. Take puppet or chef: they require a significant amount of knowledge to deploy effectively. Even worse, they change rapidly. A system deployed one year could require a major rewrite (of the manifest/recipe) the following year if it were upgraded. Many deployments of these configuration management tools go for years without major updates because the effort of upgrading large numbers of services, servers, and configurations is enormous.

This is a huge amount of technical debt. I'd now venture to say that the more time you must spend deploying a configuration management solution, the more technical debt you will incur, unless you have a very focused plan to upgrade frequently and a dedicated "puppet/chef/xxxx" IT pro to maintain it.

I recall reading and/or hearing the famous Luke Kanies (of Puppet Labs) quote, "ssh in a for loop is not a solution". This has always bothered me, and I couldn't quantify the reason very well, but it's similar to the basic text processing argument in old-school Linux circles: text output is universal. Any app, tool, or utility can process text. Once you move to binary or other output, you lose the ability to universally process the output. It may be more efficient to process it in other ways, but it's no longer universal.

“SSH in a for loop” is universal.
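For example, with nothing more than a shell and key-based SSH, a quick fleet-wide check looks something like this (hosts.txt being a hypothetical file with one hostname per line):

# run a simple command on every host in the list
for host in $(cat hosts.txt); do
  ssh -o BatchMode=yes "${host}" 'uptime'
done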

Standalone puppet with hiera 5 error…

With puppet moving further and further away from supporting a standalone model, it's somewhat difficult to get puppet standalone working. I recently got bitten by a hiera update that caused my puppet standalone deployments to stop interacting with hiera the way I had deployed them.

Affected versions:

  • puppet 4.10.10
  • hiera 3.4.3

The error that I was receiving was similar to the following — note that this example cites an error with the ec2tagfacts module, which I have modified to work with puppet 4.*:

Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'ec2tagfacts::aws_access_key_id' failed: DataBinding 'hiera': v5 hiera.yaml is only to be used inside an environment or a module and cannot be given to the global hiera at $path_to/puppet/manifests/site.pp:12:3 on node $this_node

The new way of managing hiera (via puppet server) is to contain the hiera configuration within each environment and module. This does not work with [the way I use] puppet standalone because of the way the global hiera configuration has to be referenced. I need to try putting puppet in the default locations and test that at some point.
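For context, my standalone runs reference a global hiera.yaml explicitly, along these lines (paths here are placeholders); it is this global reference that the v5 format no longer permits:

puppet apply --hiera_config $path_to/puppet/hiera.yaml $path_to/puppet/manifests/site.pp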

I was able to resolve the issue by downgrading hiera to version 3.1.1. I am testing with other versions. Updates to follow.
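Assuming hiera was installed as a gem alongside puppet (as in my case), the downgrade amounts to something like:

gem uninstall hiera
gem install hiera -v 3.1.1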

Adding Global Environment Variables to Jenkins via puppet…

When using Jenkins in any environment, it's useful to have variables related to that environment available to Jenkins jobs. I recently worked on a project where I used puppet to deploy global environment variables to Jenkins for use with AWS commands; typically, to execute the awscli, one must know the region, account, and other details.

In order to make global environment variables available to Jenkins, we can create an init.groovy.d directory in $JENKINS_HOME as part of the Jenkins puppet profile, e.g.:

class profile::jenkins::install () {
...
  file { '/var/lib/jenkins/init.groovy.d':
    ensure => directory,
    owner  => jenkins,
    group  => jenkins,
    mode   => '0755',
  }
...
}

We then need to create the puppet (ERB) template that we will deploy to this location as a groovy script:

import jenkins.model.Jenkins
import hudson.slaves.EnvironmentVariablesNodeProperty
import hudson.slaves.NodeProperty

def instance             = Jenkins.instance
def environment_property = new EnvironmentVariablesNodeProperty();

// Groovy iterates a single (non-collection) object once, so this sets the
// environment variables on the new property. The <%= ... %> tags are ERB
// placeholders that puppet fills in from facts at catalog compile time.
for (property in environment_property) {
  property.envVars.put("AWS_VARIABLE1", "<%= @ec2_tag_variable1 -%>")
  property.envVars.put("AWS_VARIABLE2", "<%= @ec2_tag_variable2 -%>")
  property.envVars.put("AWS_VARIABLE3", "<%= @ec2_tag_variable3 -%>")
}

// Attach the property to the Jenkins node and persist the configuration.
instance.nodeProperties.add(environment_property)

instance.save()

Note that in this instance, I am using the ec2tagfacts puppet module that allows me to use EC2 tags as facts in puppet. I will later move to dynamic fact enumeration using a script with facter.

The next step is to add another file resource to the Jenkins puppet profile to place the groovy script in the proper location and restart the Jenkins service:

class profile::jenkins::install () {
...
  file { '/var/lib/jenkins/init.groovy.d/aws-variables.groovy':
    ensure  => present,
    mode    => '0755',
    owner   => jenkins,
    group   => jenkins,
    notify  => Service['jenkins'],
    content => template('jenkins/aws-variables.groovy.erb'),
  }
...
}

Now when puppet next runs, it will deploy the groovy script and restart Jenkins for the change to take effect.

Note that these environment variables are not viewable under System Information under Manage Jenkins, but are only available inside each Jenkins job, e.g. inside a shell build step:

#!/bin/bash -x

echo "${AWS_VARIABLE1}"

Retrieving puppet facts from AWS Systems Manager

AWS Systems Manager (SSM) makes it easy to store and retrieve parameters for use across servers, services, and applications in AWS. One great benefit is storing secrets for retrieval as needed. I recently needed to retrieve some parameters to place in a configuration file via puppet, and wrote a short script to expose these values as facts.

Create a script like the following in /etc/facter/facts.d and make it executable:

#!/bin/bash

# External facts must print key=value pairs on stdout.
# The egrep/awk pipeline pulls the "Value" field out of the JSON returned
# by 'aws ssm get-parameter'; only the password is stored encrypted, so
# only that call needs --with-decryption.
aws configure set region us-east-1
application_username=$(aws ssm get-parameter --name application_username | egrep "Value" | awk -F\" '{print $4}')
application_password=$(aws ssm get-parameter --name application_password --with-decryption | egrep "Value" | awk -F\" '{print $4}')

echo "application_username=${application_username}"
echo "application_password=${application_password}"

exit 0;

Note that this assumes the username is not an encrypted secret, while the password is (hence the --with-decryption flag).

This can be tested with the following:

# facter -p application_username
# facter -p application_password
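For reference, the corresponding SSM parameters can be created along these lines (parameter names match the script above; the values shown are placeholders):

aws ssm put-parameter --name application_username --type String --value 'app_user'
aws ssm put-parameter --name application_password --type SecureString --value 'app_secret'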

These facts can then be used in templates, like the following:

# config.cfg.erb
connection_string = <%= @application_username %>:<%= @application_password %>

Running Apache 2 under Ubuntu 16.04 on Docker

I recently wanted to set up a new Ubuntu 16.04 image running Apache under Docker for some AWS ECS/Fargate testing I was doing, and encountered the following error:

docker run -p 8085:80 aws-ecr-hello-world:v0.5
[Thu Mar 15 00:11:31.074011 2018] [core:warn] [pid 1] AH00111: Config variable ${APACHE_LOCK_DIR} is not defined
[Thu Mar 15 00:11:31.074576 2018] [core:warn] [pid 1] AH00111: Config variable ${APACHE_PID_FILE} is not defined
AH00526: Syntax error on line 74 of /etc/apache2/apache2.conf:
Invalid Mutex directory in argument file:${APACHE_LOCK_DIR}

This is a typical Ubuntu problem where the /etc/apache2/envvars file needs to be sourced before apache2 can start properly. To figure out which variables needed to be added, I commented out the CMD that starts apache and instead entered a command to print out the contents of the envvars file. I also added a sed command to print out line 74 of apache2.conf so I could further troubleshoot what was happening there. (Only the last CMD in a Dockerfile takes effect, so the two CMD lines below were used one at a time.)

# Dockerfile
...
CMD ["cat", "/etc/apache2/envvars"]
CMD ["sed", "-n", "74p", "/etc/apache2/apache2.conf"]
...

This output showed that I had to add a few environment variables to the Dockerfile, and verify that they existed when I ran the container:

# Dockerfile
...
ENV APACHE_RUN_USER  www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR   /var/log/apache2
ENV APACHE_PID_FILE  /var/run/apache2/apache2.pid
ENV APACHE_RUN_DIR   /var/run/apache2
ENV APACHE_LOCK_DIR  /var/lock/apache2
ENV APACHE_LOG_DIR   /var/log/apache2

...

I also verified that the directories would exist to prevent any issues there:

# Dockerfile
...
RUN mkdir -p $APACHE_RUN_DIR
RUN mkdir -p $APACHE_LOCK_DIR
RUN mkdir -p $APACHE_LOG_DIR
...

After I finished that, I rebuilt the image and was able to run the container without issues.

The full Dockerfile is:

FROM ubuntu:16.04

# Install dependencies
RUN apt-get update -y
RUN apt-get install -y apache2

# Write hello world message
RUN echo "Hello World!" > /var/www/index.html

# Configure apache
RUN a2enmod rewrite
RUN chown -R www-data:www-data /var/www


ENV APACHE_RUN_USER  www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR   /var/log/apache2
ENV APACHE_PID_FILE  /var/run/apache2/apache2.pid
ENV APACHE_RUN_DIR   /var/run/apache2
ENV APACHE_LOCK_DIR  /var/lock/apache2
ENV APACHE_LOG_DIR   /var/log/apache2

RUN mkdir -p $APACHE_RUN_DIR
RUN mkdir -p $APACHE_LOCK_DIR
RUN mkdir -p $APACHE_LOG_DIR

EXPOSE 80

# CMD ["sed", "-n", "74p", "/etc/apache2/apache2.conf"]
# CMD ["cat", "/etc/apache2/envvars"]
 CMD ["/usr/sbin/apache2", "-D",  "FOREGROUND"]

Build the container image:

> docker build -t aws-ecr-hello-world:v0.9.1 .
Sending build context to Docker daemon   2.56kB
Step 1/18 : FROM ubuntu:16.04
 ---> f975c5035748
Step 2/18 : RUN apt-get update -y
 ---> Using cache
 ---> 1716ac62d2f6
Step 3/18 : RUN apt-get install -y apache2
 ---> Using cache
 ---> b03c08c103b5
Step 4/18 : RUN echo "Hello World!" > /var/www/index.html
 ---> Using cache
 ---> a8352375b937
Step 5/18 : RUN a2enmod rewrite
 ---> Using cache
 ---> 313f2e8046ec
Step 6/18 : RUN chown -R www-data:www-data /var/www
 ---> Using cache
 ---> c2e7512d4fe8
Step 7/18 : ENV APACHE_RUN_USER  www-data
 ---> Using cache
 ---> 2054c48681ae
Step 8/18 : ENV APACHE_RUN_GROUP www-data
 ---> Using cache
 ---> 493b20667534
Step 9/18 : ENV APACHE_LOG_DIR   /var/log/apache2
 ---> Using cache
 ---> 8c5029eb8e83
Step 10/18 : ENV APACHE_PID_FILE  /var/run/apache2/apache2.pid
 ---> Using cache
 ---> 701ddcccf335
Step 11/18 : ENV APACHE_RUN_DIR   /var/run/apache2
 ---> Using cache
 ---> 6700b8a02ca0
Step 12/18 : ENV APACHE_LOCK_DIR  /var/lock/apache2
 ---> Using cache
 ---> ac692e86caf7
Step 13/18 : ENV APACHE_LOG_DIR   /var/log/apache2
 ---> Using cache
 ---> 660af37232bc
Step 14/18 : RUN mkdir -p $APACHE_RUN_DIR
 ---> Running in 02978786f1b5
Removing intermediate container 02978786f1b5
 ---> 3e5ef0c00431
Step 15/18 : RUN mkdir -p $APACHE_LOCK_DIR
 ---> Running in 68408f3091c1
Removing intermediate container 68408f3091c1
 ---> 90efa3a2f9bc
Step 16/18 : RUN mkdir -p $APACHE_LOG_DIR
 ---> Running in f1ee7e4d5a4b
Removing intermediate container f1ee7e4d5a4b
 ---> 9fb6a50c6792
Step 17/18 : EXPOSE 80
 ---> Running in f3fd904326e4
Removing intermediate container f3fd904326e4
 ---> b4ba8575620d
Step 18/18 : CMD ["/usr/sbin/apache2", "-D",  "FOREGROUND"]
 ---> Running in a3cba653d7b3
Removing intermediate container a3cba653d7b3
 ---> 0bfa187abf69
Successfully built 0bfa187abf69
Successfully tagged aws-ecr-hello-world:v0.9.1

Run the container:

> docker run -p 8085:80 aws-ecr-hello-world:v0.9.1
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
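That last message is only a warning; it can be silenced by setting ServerName globally, for example with a line like this in the Dockerfile (the value is arbitrary for this test):

RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf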

UPDATE: it looks like James Turnbull had already solved this problem here.

The Phoenix Project

I recently stumbled upon a novel about managing IT Operations: “The Phoenix Project”, by Gene Kim, Kevin Behr, and George Spafford. Wow, what a great read. This book accurately describes many of my experiences in IT at many different companies.

This book has some exceptional concepts around optimizing the interaction between Development, IT Operations, and the customer. The book pushes the reader (and the fictional characters) to visualize development and IT Operations as a factory floor, using the same terminology to analyze dysfunction. Each station includes: machine, man, method, measure. These components can be used to optimize workflow, find bottlenecks, and improve the overall efficiency of the factory or development/IT Operations space.

The Three Ways

The First Way: Left to right workflow from development to IT Operations to the Customer. Small batch sizes and intervals of work. Reduce WIP or inventory of tasks.

The Second Way: Constant feedback from right to left at all stages.

The Third Way: Creating a culture that fosters continual experimentation (risk) and the understanding that repetition and practice are the prerequisites to mastery.

The Four Types of Work

Business projects: Business initiatives, tracked and PM’d.

Internal IT projects: Infrastructure or IT projects.

Changes: Scheduled updates, releases, etc. (configuration management).

Unplanned work or recovery work: Production issues, unplanned incidents or problems that disrupt the above 3 types of work.

As a DevOps consultant, I can immediately use these concepts to improve the value that I provide to each client by working with these concepts as a guide.

I highly recommend this book.

PowerShell to ElasticSearch to find ElastAlert

I recently worked on an interesting project where I needed to use a powershell script to query ElasticSearch to find a document that was inserted via ElastAlert.

The purpose of this exercise was to determine whether a service had been marked down recently, which in turn would determine whether to run an operation that might take down the passive node in an active/passive HA configuration.

The following script snippet searches ElasticSearch for any entries from the past week with the specified rule name that have more than 0 hits and matches.

    $Rule_Name = "Rule name here"

    $Es_Endpoint = "elastic_search_dns_endpoint"
    $Es_Index    = "elastalert_writeback_index"
    $Es_Type     = "elastalert_status"

    $Body = @{
      "query" = @{
        "bool" = @{
          "filter" = @(
            @{  
              "term" = @{
                "rule_name" = $Rule_Name;
              }   
            };  
            @{  
              "range" = @{
                "hits" = @{
                  "gt" = 0 
                }   
              }   
            };
            @{  
              "range" = @{
                "matches" = @{
                  "gt" = 0 
                }   
              }   
            };    
            @{  
              "range" = @{
                "@timestamp" = @{
                  "gt" = "now-1w"
                }   
              }   
            }   
          )   
        }   
      }   
    }   

    $Json_Body = $Body | ConvertTo-Json -Depth 10

    # Un-comment as needed for troubleshooting
    # Write-Output $Json_Body

    $Response = Invoke-RestMethod -Method POST -URI https://$Es_Endpoint/$Es_Index/_search  -Body $Json_Body -ContentType 'application/json'

    # Un-comment these as needed for troubleshooting
    # Write-Output ($Response | Format-List | Out-String)
    # Write-Output ($Response.hits.total | Out-String)

    if ($Response.hits.total -gt 0) {
      $Restore = 0 
    }   

Once the query returns, the script checks to see if the number of hits exceeds 0, which means at least one entry satisfied the query parameters. Based on this response, action can then be taken on the HA service in question.

ruby aws-sdk strikes again…

When using ruby to upload files to S3 and trying to use multipart upload, beware the following ArgumentError:

...param_validator.rb:32:in `validate!': unexpected value at params[:server_side_encryption] (ArgumentError)
...
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-core-3.6.0/lib/seahorse/client/request.rb:70:in `send_request'
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-s3-1.4.0/lib/aws-sdk-s3/client.rb:3980:in `list_parts'
...

The options passed to list_parts must not include “server_side_encryption”. I always forget to remove this parameter.

A good way that I have found to solve this issue is:

...
      input_opts = {
        bucket:                 bucket,
        key:                    key,
        server_side_encryption: "AES256",
      }

      if defined? mime_type
        input_opts = input_opts.merge({
          content_type: mime_type,
        })
      end
...
      # list_parts does not accept these options, so strip them before reusing the hash
      input_opts.delete_if {|key,value| key.to_s.eql?("content_type") }
      input_opts.delete_if {|key,value| key.to_s.eql?("server_side_encryption") }

      input_opts = input_opts.merge({
          :upload_id   => mpu_create_response.upload_id,
      })

      parts_resp = s3.list_parts(input_opts)
...

You can see here that I delete keys that may have been added earlier so that the final options hash is valid for the list_parts call.
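Since the keys in input_opts above are symbols, the same cleanup can also be written more directly; an equivalent, given the hash as defined earlier:

input_opts.delete(:content_type)
input_opts.delete(:server_side_encryption)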