WordPress hack attempt…

Gotta love the script kiddies – keeping us entertained by “hacking” static HTML sites:

5.135.230.129 230.159.29.94[CHR(0)]'<?php print(238947899389478923-34567343546345); ?>' $domain.com - [11/Mar/2019:10:46:51 -0700] "GET / HTTP/1.1" 200 1533 "http://www.google.com/'<?php print(238947899389478923-34567343546345); ?>'" "Mozilla/5.9'<?php print(238947899389478923-34567343546345); ?>'"

Apache Airflow 1.10.2 – Active Directory Authentication (via LDAP[s])

This basic guide assumes a functional Airflow deployment, albeit without authentication, or perhaps with LDAP authentication under the legacy UI scheme. It also assumes Apache Airflow 1.10.2, installed via pip, using MySQL and Redis, running on Amazon Linux on an EC2 instance.

Pre-requisites:

    An Active Directory service account to use as the bind account.

First, modify airflow.cfg to remove the existing LDAP configuration, if it exists. This can be done by simply removing the values to the right of the equal sign under [ldap] in the airflow.cfg configuration file. Alternately, the [ldap] section can be removed.
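
For example, a cleared section might look like the following (a sketch only; the exact key names vary slightly between Airflow releases, so match whatever is present in your airflow.cfg):

[ldap]
uri =
user_filter =
user_name_attr =
bind_user =
bind_password =
basedn =
cacert =
search_scope =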

Next, modify airflow.cfg to remove ‘authenticate = True’ under the [webserver] section. Also remove the auth_backend line, if it exists.
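
One thing worth calling out: webserver_config.py is only read when the RBAC UI is enabled, so the [webserver] section should also contain rbac = True (assuming it is not already set). A minimal sketch of the relevant lines:

[webserver]
rbac = True
# authenticate = True and auth_backend removed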

And finally, create a webserver_config.py file in the AIRFLOW_HOME directory (this is where airflow.cfg is also located). The contents should reflect the following:

import os
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_LDAP
basedir = os.path.abspath(os.path.dirname(__file__))

SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

CSRF_ENABLED = True

AUTH_TYPE = AUTH_LDAP

AUTH_ROLE_ADMIN = 'Admin'
AUTH_USER_REGISTRATION = True

AUTH_USER_REGISTRATION_ROLE = "Admin"
# AUTH_USER_REGISTRATION_ROLE = "Viewer"

AUTH_LDAP_SERVER = 'ldaps://$ldap:636/'
AUTH_LDAP_SEARCH = "DC=domain,DC=organization,DC=com"
AUTH_LDAP_BIND_USER = 'CN=bind-user,OU=serviceAccounts,DC=domain,DC=organization,DC=com'
AUTH_LDAP_BIND_PASSWORD = '**************'
AUTH_LDAP_UID_FIELD = 'sAMAccountName'
# STARTTLS is not needed here since the server URI above uses ldaps://
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_CACERTFILE = '/etc/pki/ca-trust/source/anchors/$root_CA.crt'

Note that this requires a valid CA certificate at the specified location to verify the SSL certificate presented by Active Directory, so the $ldap variable must be a resolvable name with a valid SSL certificate signed by $root_CA.crt. Also note that, with this configuration in place, any user who logs in will be an Admin (more on this below).
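
To confirm the chain of trust before wiring it into Airflow, something like the following can be used (a sketch; $ldap and the CA path are the same placeholders used above):

openssl s_client -connect $ldap:636 -CAfile /etc/pki/ca-trust/source/anchors/$root_CA.crt < /dev/null | grep 'Verify return code'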

Once this configuration is in place, it will likely be desirable to remove all existing users. This can be done with the following commands from the mysql CLI, connected to the airflow database:

SET FOREIGN_KEY_CHECKS=0;
truncate table ab_user;
truncate table ab_user_role;
SET FOREIGN_KEY_CHECKS=1;

Next, restart the webserver process:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once the webserver comes up, log in as the user intended to be the Admin; this will allow that user to manage other users later on.

After logging in as the Admin, modify the webserver_config.py to reflect the following change(s):

# AUTH_USER_REGISTRATION_ROLE = "Admin"
AUTH_USER_REGISTRATION_ROLE = "Viewer"

Now restart the webserver process once more:

initctl stop airflow-webserver;sleep 300;initctl start airflow-webserver

Once that is done, all new users will register as ‘Viewers’. This will give them limited permissions. The Admin user(s) can then assign proper permissions, based on company policies. Note that this does not allow random people to register — only users in AD can register.

I also like to modify the ‘Public’ role to add ‘can_index’ so that anonymous users can see the UI, although they do not see DAGs or other items.

Note that Apache Airflow introduced RBAC with version 1.10, and the legacy UI is deprecated in favor of the RBAC UI.

References:
Airflow
Updating Airflow
Flask AppBuilder LDAP Authentication
Flask AppBuilder Configuration

Tuning EC2 Network Stack

I recently had an issue with web requests taking 1.2-1.5 seconds from a service hosted in AWS. I had a small SSD-backed EC2 instance with a small SSD-backed RDS instance running a WordPress site, and this level of performance was not acceptable. After a bit of troubleshooting I discovered that the network was suffering from major congestion.

To determine whether or not performance is suffering due to network congestion, I’d recommend first ruling out swapping, CPU usage, and disk IO, as those can all contribute to network congestion and related symptoms.
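
A quick first pass at ruling those out might look like this (a sketch; it assumes the sysstat package is installed for iostat):

# memory and swap pressure
free -m
vmstat 1 5

# CPU and disk utilization
top -b -n 1 | head -20
iostat -x 1 5

Once those items have been ruled out, issue the following command to review the current network state: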

# netstat -s | egrep 'backlog|queue|retrans|fail'
    75 input ICMP message failed.
    0 ICMP messages failed
    145 failed connection attempts
    1986800 segments retransmited
    1773 packets pruned from receive queue because of socket buffer overrun
    77017125 packets directly queued to recvmsg prequeue.
    15266317022 packets directly received from backlog
    114432650212 packets directly received from prequeue
    155427 packets dropped from prequeue
    104055915 packets header predicted and directly queued to user
    9660 fast retransmits
    885 forward retransmits
    547 retransmits in slow start
    126 sack retransmits failed
    130424 packets collapsed in receive queue due to low socket buffer

The netstat -s command will return summary statistics [since last reboot] for each protocol. The above command searches for specific items related to the backlog, queue, retransmissions, and failures, which will give us a good summary of how healthy the stack is under the current load.

With the above output, we can see that there are a number of retransmitted segments, packets pruned from the receive queue due to socket buffer overrun, packets collapsed in receive queue due to low socket buffer, and others.

To fix this issue, I took a look at the current default settings for three parameters: the network device backlog and the TCP read and write buffer sizes:

# sysctl -a | egrep 'max_backlog|tcp_rmem|tcp_wmem'
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.tcp_wmem = 4096        20480   4194304

The default backlog is pretty small at 1000 packets. The read and write buffer sizes are also pretty small for a production server that handles large amounts of web traffic. In each of the tcp_rmem and tcp_wmem lines, the first value is the minimum memory allocated to each network socket, the second is the default amount allocated to each socket, and the third is the maximum. Providing three numbers allows the OS to manage the buffers dynamically under varying load conditions.

To adjust these values, I typically make an entry or modification to /etc/sysctl.conf and then issue the ‘sysctl -p’ command to read them into memory.

I made a few adjustments to these values to get to the values seen below. After each modification, I would test service performance to ensure the system was operating smoothly and without bottlenecks on CPU, memory, disk, or network. Keep in mind that larger values result in larger memory consumption, especially with more network connections, which requires a careful analysis of memory and swap usage after tuning and performance testing.

net.core.netdev_max_backlog = 10000
net.ipv4.tcp_rmem = 20480 174760 25165824
net.ipv4.tcp_wmem = 20480 174760 25165824
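
As a sketch (assuming root access and that these particular values suit your workload), applying the change looks like this:

# back up and append the tuned values to /etc/sysctl.conf
cp -p /etc/sysctl.conf /etc/sysctl.conf.bak
cat >> /etc/sysctl.conf <<'EOF'
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_rmem = 20480 174760 25165824
net.ipv4.tcp_wmem = 20480 174760 25165824
EOF

# load the new values into the running kernel and confirm
sysctl -p
sysctl net.core.netdev_max_backlog net.ipv4.tcp_rmem net.ipv4.tcp_wmem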

These values produced a much quicker response from the web service, reducing page load times by over half a second per page, and cleaned up the failure, queue, and backlog counters in the network statistics.

I think that the default EC2 values are far too conservative and this might be due to the marketing around the larger instances being more IO performant. I’d highly recommend tuning the TCP stack to get more performance from smaller instances.

Rebooting: quick tip

Note to self: whenever rebooting a server, log in via SSH and restart the OpenSSH daemon first to validate that it will come back up.

I just updated an AWS instance and rebooted it without doing this. A recent OpenSSH update requires that AuthorizedKeysCommandUser be defined if AuthorizedKeysCommand is defined; without it, the OpenSSH daemon will not start.
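
For reference, a sketch of the fix and the pre-reboot sanity check (the wrapper path and the ‘nobody’ user are assumptions; adjust for your setup):

# /etc/ssh/sshd_config: AuthorizedKeysCommand now needs a matching user
AuthorizedKeysCommand /usr/libexec/openssh/ssh-ldap-wrapper
AuthorizedKeysCommandUser nobody

# validate the config and restart sshd before rebooting
sshd -t && service sshd restart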

Luckily I can tell Puppet to fix this and will be able to log in in 30 minutes, but that’s 30 minutes I’d prefer not to wait.

– josh

AWS VPC DB Security Group

The other day I was working with a client and creating a CloudFormation template that used RDS instances within a VPC. I found that, while creating the DB security group object, I was getting an error like the following:

STACK_EVENT  CloudFormationName  DBSecurityGroupName   
       AWS::RDS::DBSecurityGroup                2012-12-17T22:30:20Z  CREATE_FAILED
       Please see the documentation for authorizing DBSecurityGroup ingress. For VPC,
       EC2SecurityGroupName and EC2SecurityGroupOwner must be omitted.To 
       authorize only the source address of this request (and no other address), pass
       xx.xx.xx.xx/32 as the CIDRIP parameter.

It turns out that beyond the requirement for a DB subnet group, I also needed to change the way that I create DB security groups within the VPC. I solved this problem by using the CIDRIP parameter and included the IP ranges of two private subnets:

    "DBSecurityGroupName": {
       "Type": "AWS::RDS::DBSecurityGroup",
       "Properties": {
          "EC2VpcId" : { "Ref" : "VpcName" },
          "DBSecurityGroupIngress" : [ { "CIDRIP": "10.1.201.0/24" }, { "CIDRIP": "10.1.301.0/24" } ],
          "GroupDescription": "Application Server Access"
        }   
    },

The examples given on the official docs page did not help with this issue; I copied them and they failed for this particular scenario, so I experimented until I was able to get this working.
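
For completeness, the RDS instance then references this security group (and a DB subnet group) roughly like this; the resource and parameter names here are placeholders:

    "DatabaseInstance": {
       "Type": "AWS::RDS::DBInstance",
       "Properties": {
          "DBSubnetGroupName": { "Ref": "DBSubnetGroupName" },
          "DBSecurityGroups": [ { "Ref": "DBSecurityGroupName" } ],
          "Engine": "MySQL",
          "DBInstanceClass": "db.t1.micro",
          "AllocatedStorage": "10",
          "MasterUsername": { "Ref": "DBUser" },
          "MasterUserPassword": { "Ref": "DBPassword" }
        }
    },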

SSH Public Key Authentication via OpenLDAP on RHEL/CentOS 6.x

With the release of RHEL/CentOS 6.x there are some changes to the way clients authenticate over SSH using public keys stored in OpenLDAP. I was able to get this working with the following modifications.

Pre-requisites:
* RHEL / CentOS 6.x
* openssh-ldap

Set up sshd_config with the AuthorizedKeysCommand directive. This will execute ssh-ldap-wrapper and output the user’s public key:

AuthorizedKeysCommand /usr/libexec/openssh/ssh-ldap-wrapper

Next, ensure a proper ldap.conf exists in /etc/ssh, and be sure to set up the appropriate level of TLS security to suit your environment:

ldap_version 3
bind_policy soft

binddn cn=readonly,ou=people,dc=example,dc=com
bindpw secret

# use one of the following, depending on whether STARTTLS is required
ssl no
#ssl start_tls
tls_reqcert never
tls_cacertdir /etc/openldap/cacerts

host 10.x.x.x
port 389
base dc=example,dc=com

If the LDAP server is set up with the proper schema and contains public keys, this configuration should work.
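
To sanity-check the setup before relying on it for logins, the wrapper can be run by hand (the jdoe account is a placeholder; it assumes that user exists in LDAP with an sshPublicKey attribute):

# should print the user's public key(s) straight from LDAP
/usr/libexec/openssh/ssh-ldap-wrapper jdoe

# then restart sshd to pick up the new AuthorizedKeysCommand
service sshd restart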

For more information on how to set up the schema and insert public keys, review the documents here, but be sure to note that things have changed with client configuration.

DNS Vulnerability

Tuesday a vulnerability was made public which affects all users of the DNS system. The vulnerability was discovered by Dan Kaminsky, a prominent researcher in the area of DNS, who organized a coordinated patch release by the major vendors so that there would be no delay between the vulnerability becoming public and patches being available.

http://doxpara.com/

This is a critical fix that should be applied ASAP to all DNS servers, regardless of vendor.
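
As a quick sanity check (a sketch using the DNS-OARC test service, assuming it is reachable and that your.resolver.ip is replaced with the resolver you want to test), the following should report how random the resolver’s source ports look:

dig +short porttest.dns-oarc.net TXT @your.resolver.ip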