Proxying SQL Server Reporting Services with HAProxy

A common requirement with SQL Server Reporting Services (SSRS) is to place a reverse proxy in front of the server so it is not exposed directly on the internet. This is difficult to do with nginx, Apache, and others because of NTLM authentication (nginx does offer NTLM support, but only in its paid version). One easy fix is to use HAProxy in TCP mode.

A simple configuration like the following works well. Note that this configuration requires an SSL certificate and key (combined in one file) and terminates SSL at the HAProxy service.

global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend main
    bind *:80
    bind *:443 ssl crt /etc/haproxy/$path_to_cert_and_key_in_one_file
    option tcplog
    mode tcp
    default_backend             ssrs

backend ssrs
    mode tcp
    server  $ssrs_hostname $ssrs_ip_address:80 check
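
Once HAProxy has been reloaded with this configuration, a quick sanity check is a curl against the Reports virtual directory. This is only a sketch; the hostname below is a placeholder and the path depends on how SSRS is configured:

$ curl -v https://$haproxy_hostname/Reports/ -o /dev/null

An HTTP 401 response advertising NTLM/Negotiate in the WWW-Authenticate headers typically indicates that traffic is passing through the proxy to SSRS.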

AWS Access Keys in S3 Bucket Policies

In the past, I’ve seen what appeared to be AWS access keys listed as an AWS principal in S3 bucket policies. I could never figure out why this was happening, and nobody appeared to be adding them. The access key never showed up as a valid user access key in a search of IAM objects, either.

It turns out that if you have an S3 bucket policy with a reference to an IAM user, and delete that user, the principal will be replaced with a string that appears to be an access key. I assume that this is an internal pointer that AWS uses to track that user.
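
As an illustration, a policy statement that originally referenced an IAM user ARN might end up looking something like the following after that user is deleted. The principal shown here is a made-up placeholder; in practice it is a unique ID that resembles an access key:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "AIDAEXAMPLEEXAMPLEID" },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}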

Note: While it may be syntactically accepted, an AWS access key is not a valid principal in a policy attached to an S3 bucket.

https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-bucket-user-policy-specifying-principal-intro.html

Apache Airflow 1.10.2 – Active Directory Authentication (via LDAP[s])

This basic guide assumes a functional Airflow deployment, either without authentication or with LDAP authentication under the legacy UI scheme. It also assumes Apache Airflow 1.10.2 installed via pip, using MySQL and Redis, running on Amazon Linux on an EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if it exists. This can be done by simply removing the values to the right of the equals sign under [ldap] in the airflow.cfg configuration file. Alternatively, the [ldap] section can be removed entirely.

Next, modify airflow.cfg to remove ‘authenticate = True’ under the [webserver] section. Also remove the auth_backend line, if it exists.
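
For reference, the relevant portion of the [webserver] section might look something like the following after these edits. This is a sketch; note that in 1.10.x the RBAC UI (which is what reads webserver_config.py) is enabled with rbac = True:

[webserver]
# legacy authentication options removed:
# authenticate = True
# auth_backend = airflow.contrib.auth.backends.ldap_auth

# enable the RBAC (Flask AppBuilder) UI
rbac = True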

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (this is where airflow.cfg is also located). The contents should reflect the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/'
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate in the location specified to verify the SSL certificate given by Active Directory so the $ldap variable must be a resolvable name which has a valid SSL certificate signed by $root_CA.crt. Also note that any user who logs in with this configuration in place will be an Admin (more to come on this).

Once this configuration is in place, it will likely be desirable to remove all existing users, using the following set of commands from the mysql CLI, logged into the airflow DB instance:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once the webserver comes up, log in as the user intended to be the Admin. This will allow that user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

Note that Apache airflow introduced RBAC with version 1.10 and dropped support for the legacy UI after version 1.10.2.

References:
Airflow
Updating Airflow
Flask AppBuilder LDAP Authentication
Flask AppBuilder Configuration

Adding Global Environment Variables to Jenkins via puppet…

When using Jenkins in any environment, it’s useful to have variables related to that environment available to Jenkins jobs. I recently worked on a project where I used puppet to deploy global environment variables to Jenkins for use with AWS commands — typically to execute the awscli, one must have knowledge of the region, account, and other items.

In order to make global environment variables available to Jenkins, we can create an init.groovy.d directory in $JENKINS_HOME, as part of the Jenkins puppet profile, ie:

class profile::jenkins::install () {
...
  file { '/var/lib/jenkins/init.groovy.d':
    ensure => directory,
    owner  => jenkins,
    group  => jenkins,
    mode   => '0755',
  }
...
}

We then need to create the Puppet template (ERB) that we will deploy to this location, as a groovy script:

import jenkins.model.Jenkins
import hudson.slaves.EnvironmentVariablesNodeProperty

def instance             = Jenkins.instance
def environment_property = new EnvironmentVariablesNodeProperty()

// populate the node property with the values rendered by puppet
environment_property.envVars.put("AWS_VARIABLE1", "<%= @ec2_tag_variable1 -%>")
environment_property.envVars.put("AWS_VARIABLE2", "<%= @ec2_tag_variable2 -%>")
environment_property.envVars.put("AWS_VARIABLE3", "<%= @ec2_tag_variable3 -%>")

// attach the property to the node and persist the change
instance.nodeProperties.add(environment_property)

instance.save()

Note that in this instance, I am using the ec2tagfacts puppet module that allows me to use EC2 tags as facts in puppet. I will later move to dynamic fact enumeration using a script with facter.

The next step is to add another file resource to the Jenkins puppet profile to place the groovy script in the proper location and restart the Jenkins Service:

class profile::jenkins::install () {
...
  file { '/var/lib/jenkins/init.groovy.d/aws-variables.groovy':
    ensure  => present,
    mode    => '0755',
    owner   => jenkins,
    group   => jenkins,
    notify  => Service['jenkins'],
    content => template('jenkins/aws-variables.groovy.erb'),
  }
...
}

Now when puppet next runs, it will deploy the groovy script and restart Jenkins so the change takes effect.
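
To verify the change after a run, something like the following can be used (a sketch; the service command assumes the stock Jenkins packaging):

# puppet agent -t
# ls -l /var/lib/jenkins/init.groovy.d/aws-variables.groovy
# service jenkins status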

Note that these environment variables are not viewable under System Information under Manage Jenkins, but are only available inside each Jenkins job, ie inside a shell build section:

#!/bin/bash -x

echo "${AWS_VARIABLE1}"

Retrieving puppet facts from AWS Systems Manager

AWS Systems Manager (SSM) makes it easy to store and retrieve parameters for use across servers, services, and applications in AWS. One great benefit is storing secrets that can be retrieved as needed. I recently needed to retrieve some parameters to place in a configuration file via puppet and wrote a short script to retrieve these values as facts.

Create a script like the following in /etc/facter/facts.d and make it executable.

#!/bin/bash

aws configure set region us-east-1
application_username=$(aws ssm get-parameter --name application_username | egrep "Value" | awk -F\" '{print $4}')
application_password=$(aws ssm get-parameter --name application_password --with-decryption | egrep "Value" | awk -F\" '{print $4}')

echo "application_username=${application_username}"
echo "application_password=${application_password}"

exit 0;

Note that this assumes the username is not an encrypted secret, while the password is.
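
For reference, parameters like these might be created with something along these lines (hypothetical names and values; the password is stored as a SecureString so it is encrypted at rest):

aws ssm put-parameter --name application_username --type String --value "app_user"
aws ssm put-parameter --name application_password --type SecureString --value "s3cr3t"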

This can be tested with the following:

# facter -p application_username
# facter -p application_password

These facts can then be used in templates, like the following:

# config.cfg.erb
connection_string = <%= @application_username %>:<%= @application_password %>
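
And the template would typically be rendered onto disk from a Puppet manifest with a file resource along these lines (a sketch; the module and path names are hypothetical):

file { '/etc/myapp/config.cfg':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0600',
  content => template('myapp/config.cfg.erb'),
}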

PowerShell to ElasticSearch to find ElastAlert

I recently worked on an interesting project where I needed to use a powershell script to query ElasticSearch to find a document that was inserted via ElastAlert.

The purpose of this exercise was to determine whether or not a service had been marked down recently, which in turn determined whether it was safe to run an operation that might take down the passive node in an active/passive HA configuration.

The following script snippet searches ElasticSearch for any entries from the past week with the specified rule name that have more than 0 hits and more than 0 matches.

    $Rule_Name = "Rule name here"

    $Es_Endpoint = "elastic_search_dns_endpoint"
    $Es_Index    = "elastalert_writeback_index"
    $Es_Type     = "elastalert_status"

    $Body = @{
      "query" = @{
        "bool" = @{
          "filter" = @(
            @{  
              "term" = @{
                "rule_name" = $Rule_Name;
              }   
            };  
            @{  
              "range" = @{
                "hits" = @{
                  "gt" = 0 
                }   
              }   
            };
            @{  
              "range" = @{
                "matches" = @{
                  "gt" = 0 
                }   
              }   
            };    
            @{  
              "range" = @{
                "@timestamp" = @{
                  "gt" = "now-1w"
                }   
              }   
            }   
          )   
        }   
      }   
    }   

    $Json_Body = $Body | ConvertTo-Json -Depth 10

    # Un-comment as needed for troubleshooting
    # Write-Output $Json_Body

    $Response = Invoke-RestMethod -Method POST -URI https://$Es_Endpoint/$Es_Index/_search  -Body $Json_Body -ContentType 'application/json'

    # Un-comment these as needed for troubleshooting
    # Write-Output ($Response | Format-List | Out-String)
    # Write-Output ($Response.hits.total | Out-String)

    if ($Response.hits.total -gt 0) {
      $Restore = 0 
    }   

Once the query returns, the script checks to see if the number of hits exceeds 0, which means at least one entry satisfied the query parameters. Based on this response, action can then be taken on the HA service in question.
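
In the full script (not shown here), $Restore is assumed to be initialized before the query and then used to gate the operation. A minimal sketch of that surrounding logic:

    # assume the operation may proceed unless the query below says otherwise
    $Restore = 1

    # ... the query shown above runs here and may set $Restore = 0 ...

    if ($Restore -eq 1) {
      # safe to proceed with the operation against the passive node
    }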

ruby aws-sdk strikes again…

When using ruby to upload files to S3 and trying to use multipart upload, beware the following ArgumentError:

...param_validator.rb:32:in `validate!': unexpected value at params[:server_side_encryption] (ArgumentError)
...
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-core-3.6.0/lib/seahorse/client/request.rb:70:in `send_request'
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-s3-1.4.0/lib/aws-sdk-s3/client.rb:3980:in `list_parts'
...

The options passed to list_parts must not include “server_side_encryption”. I always forget to remove this parameter.

A good way that I have found to solve this issue is:

...
      input_opts = {
        bucket:                 bucket,
        key:                    key,
        server_side_encryption: "AES256",
      }

      if defined? mime_type
        input_opts = input_opts.merge({
          content_type: mime_type,
        })
      end
...
      input_opts.delete_if {|key,value| key.to_s.eql?("content_type") }
      input_opts.delete_if {|key,value| key.to_s.eql?("server_side_encryption") }

      input_opts = input_opts.merge({
          :upload_id   => mpu_create_response.upload_id,
      })

      parts_resp = s3.list_parts(input_opts)
...

You can see here that I delete values that may have been added so that the final options hash will work with the list_parts call.
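
Another option is to build the list_parts parameters explicitly instead of pruning the shared hash, which avoids the problem entirely. A rough sketch, using the same variable names as above:

parts_resp = s3.list_parts(
  bucket:    bucket,
  key:       key,
  upload_id: mpu_create_response.upload_id,
)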

ELK + EA — Silencing ElastAlert Alerts

Many shops are realizing the benefit of the ELK stack / Elastic Stack and the great flexibility it brings to an infrastructure in the form of centralized logging and reporting, which has always been critical when troubleshooting difficult and/or distributed problems. Having many input options to choose from (via Elastic Beats), and lots of flexibility in data manipulation (via Logstash), has only increased the usefulness of this stack. As a consultant, I’m finding this stack deployed more and more often with clients, and it’s enjoyable to work with.

I’ve had the opportunity to implement ElastAlert to provide monitoring and alerting services against an established Elastic Stack deployment. ElastAlert is a Yelp product that is written in Python and is “a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in ElasticSearch”.

With ElastAlert, much of what has traditionally been monitored via Nagios, or a similar tool, can now be done against ElasticSearch. ElastAlert also provides many notification options, templates, and formats. There is also a fairly straightforward enhancement process whereby local modifications can be made against the framework, without diverging from the main code base, when additional processing or manipulation is desired.

Coming from a very strong background in Nagios and related tools, the one failing I see in ElastAlert (which has an existing enhancement request) is that there is no ability to silence, suppress, or acknowledge alerts. They are either alerting or not, relative to the realert setting. This is a huge inconvenience if alerts are not all immediately resolved, as ElastAlert will continue to notify the alerting endpoint with alerts that are already being acted upon and require no new action. This is (IMO) a bad way to do business, as it may result in missed alerts and poor service to customers.

ElastAlert will send out an alert on every run unless the rule in question has an entry of the silence type (ie “_type:silence”) in the elastalert_status index (or equivalent metadata index) with an “until” field that has not yet expired. This is how ElastAlert maintains the proper realert time period for alerts where notifications have already been sent and the run interval is more frequent than the realert time. We can add an appropriate entry to this index ourselves to silence an alert, providing the same functionality as acknowledging in Nagios or New Relic, or similar behavior in other alerting systems.

To provide an example, start with a rule that has the following configuration:

es_host: 172.16.0.4
es_port: 80
aws_region: us-west-2
name: "nginx-web-app error in logs"
index: sales-promos-*
type: any
filter:
  - query_string:
      query: "type:log AND source:'/var/log/nginx/sales-promo.log' AND message: *error*"
query_delay:
  minutes: 1
query_key: host
realert:
  minutes: 15
alert:
 - "sns"
sns_topic_arn: arn:aws:sns:us-west-2:************:duty-pager-alerts

The rule name that we would use is “nginx-web-app error in logs”. The realert time is 15 minutes. This means that any time we get errors in the logs, we’ll see an error alert every 15 minutes for as long as the error condition continues. In order to suppress this alert for four hours (matching the “until” timestamp below), we’d issue the following curl command (or similar):

$ export ES_ENDPOINT=172.16.0.4
$ export ES_INDEX=elastalert_status

$ curl -X POST https://${ES_ENDPOINT}/${ES_INDEX}/silence/ -d '{
  "rule_name": "nginx-web-app error in logs.ip-172-16-0-10",
  "@timestamp": "2017-08-07T16:43:24.000000Z",
  "exponent": 0,
  "until": "2017-08-07T20:43:24.000000Z"
}'

Note also that when using a query_key, the node identified by the query_key can be silenced without silencing the alert in general. This is incredibly helpful as one problem should not disable the entire monitoring system. This example above shows silencing the alert for only the host with a hostname of ip-172-16-0-10. Note that if a query_key is specified in a silence entry when the rule does not have a query_key defined, ElastAlert will fail to run.
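
To see which silence entries currently exist for a rule (and to find the document ID of a specific entry), a search along these lines can be used. This is a sketch against the same endpoint and index as above; adjust the query style and headers for your ElasticSearch version:

$ curl -X POST https://${ES_ENDPOINT}/${ES_INDEX}/_search -d '{
  "query": { "match": { "rule_name": "nginx-web-app error in logs.ip-172-16-0-10" } }
}'

The _id of each hit is the index ID referenced below.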

To delete an entry, in the event an error has occurred, issue a curl delete, after locating the index ID of the entry to delete, ie:

$ curl -X DELETE https://${ES_ENDPOINT}/${ES_INDEX}/silence/${index_id_of_bad_entry}

It may take a few minutes for ElasticSearch to re-index the data after a delete so the error may not go away immediately.

Adding git branch and aws profile to your bash prompt…

As a consultant who works in AWS for numerous clients, one of the most important things to keep track of is which AWS CLI profile I am currently using. To help clarify this, I’ve recently added the AWS profile to my bash prompt to remove all doubt. In addition, I’ve added the git branch that I’m currently on. I don’t know if I’ll keep both of these around, as they add some latency before the prompt returns, but so far it has been useful.

To do this, add the following to your ~/.bashrc:


    git_branch() {
      git branch > /dev/null 2>&1
      if [[ $? -gt 0 ]]
      then
        echo "(none)"
      else
        git_branch=$(git branch | awk '/\*/ {print "("$2")"}')
        echo "${git_branch}"
      fi  
    }
    
    aws_profile() {
      aws_profile=$(aws configure list | egrep profile | awk '{print "("$2")"}')
      if [[ "${aws_profile}" == "(<not)" ]]
      then
        echo "(none)"
      else
        echo "${aws_profile}"
      fi  
    }
    
    export PS1='\n-$?- \u@\h \w >\ngit branch:   $(git_branch)\naws profile:  $(aws_profile)\n\n> '

Note that this will overwrite any PS1 that you might have already set up, so use caution. This results in a multi-line prompt that looks like the following:

-0- username@hostname ~/consulting >
git branch:   (master)
aws profile:  (test-dev-account)

> 

To get the new prompt, either logout and login, or source the .bashrc file, ie:

. ~/.bashrc

This prompt is very useful as it provides the return code of the last command (-0-), the user@hostname, the current location on disk, and then two lines with the git branch of the current repository, and then the aws profile that is currently active. The prompt then follows up with two newlines and an angle bracket for the cursor.

Throttling Requests with the Ruby aws-sdk

A common problem of late is having requests throttled when using the ruby aws-sdk gem to access AWS services. Handling these exceptions is fairly trivial with a while loop like the following:

retry_count   = 0 
retry_success = 0 

while retry_success == 0
  retry_success = 1
  begin

    #
    # enter code to interact with AWS here
    #

  rescue Aws::APIGateway::Errors::TooManyRequestsException => tmre

  #
  # note that different AWS services have different exceptions
  # for this type of response, be sure to check your error output
  #

    sleep_time = ( 2 ** retry_count )
    retry_success = 0 
    sleep sleep_time
    retry_count = retry_count + 1 

  end
end

Note that different AWS services raise different exceptions to indicate a throttling scenario, so be sure to check the output received, or the documentation, for which exception to handle. Also note that additional exceptions should be handled for bad requests and for missing, duplicate, unavailable, or malformed objects.
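
The same pattern can be wrapped in a small helper with a retry cap so it is easy to reuse around individual calls. This is a rough sketch; the client and call in the usage comment are hypothetical examples:

# retry the yielded block with exponential backoff, up to max_retries times
def with_backoff(max_retries = 5)
  retries = 0
  begin
    yield
  rescue Aws::APIGateway::Errors::TooManyRequestsException
    raise if retries >= max_retries
    sleep(2 ** retries)
    retries += 1
    retry
  end
end

# usage:
# with_backoff { apigateway.get_rest_apis(limit: 500) }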