EBS Volumes – deleteOnTermination ?

June 18th, 2013

When using EC2 instances with EBS-backed storage, whether or not your instances are set up to delete their EBS volumes on termination can be a big deal, especially if you burn AMIs and provision instances over and over. You could find yourself paying for many EBS volumes that are no longer attached to anything.

Audit your systems with a command similar to this one – the last column in the output is whether or not deleteOnTermination is set:

for instanceid in $(ec2-describe-instances | awk '/INSTANCE/ {print $2}')
do
  echo "InstanceID: ${instanceid}"
  ec2-describe-instance-attribute -v -b "${instanceid}" | egrep "BLOCKDEVICE.*false"
done

If you see output like the following:

InstanceID: i-xxxxxxxx
  BLOCKDEVICE     /dev/sda1        vol-xxxxxxxx    2013-05-24T20:32:05.000Z        false

…you have instances with volumes that will not delete when the instance is terminated.

To fix this, run the following command for each instance, and burn another AMI:

ec2-modify-instance-attribute -b '/dev/sda1=vol-xxxxxxxx:true' i-xxxxxxxx
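
Before changing anything, it can help to preview the commands that would be run. The following is a minimal sketch rather than part of the original tooling: print_fix_commands is a hypothetical helper that reads ec2-describe-instance-attribute -b output on stdin and prints one modify command per volume whose last column is false. It assumes the column layout shown in the sample output above.

```shell
# Hypothetical dry-run helper: prints commands, runs nothing.
# Assumes the BLOCKDEVICE column layout shown in the sample output.
print_fix_commands() {
  awk -v id="$1" '
    $1 == "BLOCKDEVICE" && $NF == "false" {
      # \047 is the octal escape for a single quote
      printf "ec2-modify-instance-attribute -b \047%s=%s:true\047 %s\n", $2, $3, id
    }'
}

# Example using the sample line from above:
printf 'BLOCKDEVICE /dev/sda1 vol-xxxxxxxx 2013-05-24T20:32:05.000Z false\n' |
  print_fix_commands i-xxxxxxxx
```

In practice the input would come from ec2-describe-instance-attribute -b for the instance in question, and the printed commands can be reviewed before being run.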

I made a simple bash script that iterates over all EC2 instances in an account and sets deleteOnTermination on every attached volume that does not already have it.

#!/bin/bash

#
# Audit instances to set all volumes to deleteOnTermination
#

for instanceid in $(ec2-describe-instances | awk '/INSTANCE/ {print $2}')
do
  # Split the result on newlines only; note $'\n', not '\n'
  IFS=$'\n'
  result=$(ec2-describe-instance-attribute -v -b "${instanceid}" | egrep "BLOCKDEVICE.*false")
  for line in ${result}
  do
    echo "${line}"
    device=$(echo "${line}" | awk '{print $2}')
    volume=$(echo "${line}" | awk '{print $3}')
    if ! ec2-modify-instance-attribute -b "${device}=${volume}:true" "${instanceid}"
    then
      echo "command failed for ${instanceid}"
    fi
  done
  unset IFS
done

exit 0

Note that IFS must be set with $'\n' rather than '\n'. The quoted form '\n' sets IFS to the two characters backslash and n, so the result is not split line by line and volumes after the first on an instance can be skipped, which means re-running the script until nothing is reported.

Could not initialize master info structure; more error messages can be found in the MySQL error log

June 10th, 2013

At times I’ve received this message when restarting a mysql slave server:

Could not initialize master info structure; more error messages can be found in the MySQL error log

This often occurs when the hostname of the slave changes (for example, during a master failover) and relay-log is not set in my.cnf, because the default relay log file names are derived from the hostname.

The answer here is simply to issue SHOW SLAVE STATUS\G and record the current master log file and position. Then reset the slave and change master back to exactly where it was. Note that this requires the replication credentials.

1. display current replication information

SHOW SLAVE STATUS\G

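Note the Relay_Master_Log_File and Exec_Master_Log_Pos values. These mark the last event the slave actually applied; Master_Log_File and Read_Master_Log_Pos only show how far the I/O thread has read, and the reset discards relay log events that were read but not yet applied.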

2. Reset the slave – note that this will completely reset the slave, removing all master info – do not do this if you do not have your master info noted.

RESET SLAVE ALL;

3. Set up replication again by issuing a change master statement.

CHANGE MASTER TO
  MASTER_HOST='master_ip',
  MASTER_USER='replicationuser',
  MASTER_PASSWORD='replicationpass',
  MASTER_LOG_FILE='(from above)',
  MASTER_LOG_POS= (from above);

Then start the slave and you should be good.

START SLAVE;
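
Step 1 can be scripted so that the coordinates are saved before the reset. This is a minimal sketch: status_field is a hypothetical helper that pulls one named field out of SHOW SLAVE STATUS\G output, and the status listing below is made up for illustration. It prints both the read and the executed coordinates so that either pair is on record.

```shell
# Hypothetical helper: print the value of one field from
# SHOW SLAVE STATUS\G output supplied on stdin.
status_field() {
  awk -F': ' -v name="$1" '
    { key = $1; gsub(/^[ \t]+/, "", key) }
    key == name { print $2; exit }'
}

# Trimmed, made-up status listing for illustration:
status='              Master_Log_File: mysql-bin.000043
          Read_Master_Log_Pos: 9001
        Relay_Master_Log_File: mysql-bin.000042
          Exec_Master_Log_Pos: 107'

for field in Master_Log_File Read_Master_Log_Pos Relay_Master_Log_File Exec_Master_Log_Pos
do
  printf '%s=%s\n' "${field}" "$(printf '%s\n' "${status}" | status_field "${field}")"
done
```

On a real server the listing would come from something like mysql -e 'SHOW SLAVE STATUS\G' redirected to a file, which is then kept until replication is running again.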

MySQL – Large Dataset Dump and Restore

June 5th, 2013

When performing a dump and restore of large datasets in MySQL that take multiple days to perform, I typically use the process outlined here.

Note that it is critical to connect to the server in question and run every command inside a screen session. This prevents a dropped client connection from killing the operation. I have been known to start a multi-hour restore, need to take my laptop to another client, and regret not having followed this advice.

Dumping Data to Disk

1. Stop replication after noting binary log positions, if applicable:

STOP SLAVE;
SHOW SLAVE STATUS\G
SHOW MASTER STATUS;

Then edit your my.cnf file and temporarily add the following – just in case the mysql daemon restarts due to a corrupt record or a server reboot:

# /etc/my.cnf
skip-slave-start = 1

2. Set the bind-address parameter to localhost and restart mysql to prevent network access to the server in question – a rogue connection could write or consume resources from the mysql daemon and slow down the dump and/or restore.

# /etc/my.cnf
bind-address = 127.0.0.1

3. Dump data

mysqldump  [--skip-add-drop-table] [--replace] -u root -pxxxxxxxx $database $table | gzip > /mnt/backups/$dumpfile-0.sql.gz
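
A dump that runs for days can die partway through without it being obvious from the file alone. The following is a minimal sketch of a completeness check: dump_is_complete is a hypothetical helper that looks for the "-- Dump completed" comment mysqldump writes as its last line. It assumes comments were not disabled (for example with --skip-comments).

```shell
# Hypothetical helper: succeed only if the gzipped dump ends with the
# "-- Dump completed" comment that mysqldump writes as its last line.
# Assumes comments were not disabled (for example with --skip-comments).
dump_is_complete() {
  gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed'
}

# Usage:
#   dump_is_complete /mnt/backups/$dumpfile-0.sql.gz || echo "dump is incomplete"
```

This decompresses the whole file to reach the last line, so on very large dumps it is best run inside the same screen session.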

Dumping Partial Data

Sometimes you export due to a corrupt InnoDB record or the dump might fail at some point. If you know approximately where this occurred, use the following command to dump a partial set:

mysqldump  --skip-add-drop-table --replace -u root -pxxxxxxxx $database $table --where="id > XXXXXXXX" | gzip > /mnt/backups/$dumpfile-N.sql.gz

** Note the use of the --replace option to ensure you can over-write existing data. This allows you to estimate conservatively where the previous dump may have stopped.

Restoring Data

** Note that you can use the tail -n +XX command to skip any drop/create statements in the file if you forgot to tell mysqldump to skip those commands.

gzip -dc /mnt/backups/$backupfile.sql.gz | tail -n +49 | mysql -u root -pxxxxxxxx $database

I will also run this shell script in a screen window to clear up any binary logs that might accumulate, if binary logging is enabled. Note: do not use this command if you are restoring a master which has an active slave during the process.

# while restoring, clear binary logs to prevent disk filling issues:
while true;
do
  echo "clearing binary logs";
  echo "reset master;" | mysql -u root -pxxxxxxxx;
  sleep 60;
done

Cleanup

  • Enable / re-establish replication
  • Enable backups
  • Enable network access by disabling ‘bind-address’ in my.cnf
  • Enable slave start, disable ’skip-slave-start’ in my.cnf

Book Review: Instant Chef Starter, by John Ewart

April 7th, 2013

Instant Chef Starter is an introductory book about Chef, an open-source configuration management and automation platform. John Ewart and Packt Publishing have published a book that will allow a system administrator with no prior Chef experience to get Chef up and running within a day, if not a few hours, by using this guide. If one has already decided that buying a book is the best way to learn Chef basics, I can easily recommend this one.

The book opens with a clear introduction of Chef, along with descriptions of basic components, concepts, and terminology. The biggest benefit to using Chef, or any configuration management software, is to automate and ease the burden of multi-server administration.

(Note that John elected to omit Chef solo and hosted Chef from this book.)

The following section covers installation with a promise to give clear guidance on installation to Debian-based and Red Hat-based distributions, as well as a source-based install. While the Debian-based and source-based installation instructions were very clear and easy to follow, the Red Hat-based instructions were missing. I’m sure that finding the proper installation method would not be difficult for most, and given most Chef users are Debian/Ubuntu proponents, this omission is minor.

Over the next few sections, John takes us through bootstrapping a Chef client, managing cookbooks, recipes, and attributes, as well as data bags and templates. I like the introduction via knife and the web UI, followed by a move on to knife for more command-line power. This allows the reader to see both sides of Chef management and choose what might work best for them.

While this book was designed to introduce Chef to a beginner, I would have liked to see mention made of idempotent operations, source control backed cookbooks and recipes, as well as running chef-client regularly to maintain system state over time rather than ad-hoc execution.

While managing a few servers is not a daunting task for any but the most inexperienced system administrators, I would have liked to see an example that better quantifies the benefits of a configuration management tool set once the number of managed servers reaches double digits or more. Take, for instance, the management of 30 servers. An operation that takes 2 minutes per server adds up to 1 hour across 30 servers, assuming no issues are discovered along the way. With Chef, the same work could be done in under 10 minutes, leaving the administrator free to move on to more important tasks.

While I did find a couple of issues with the book, overall I enjoyed reading it, and found that it provided clear instruction on how to deploy Chef into an enterprise environment. I regularly use Chef to manage client systems and appreciate the benefits that it brings to a system administrator or devops engineer. Configuration management advances and brings standardization to the profession of system administration.

John Ewart, thanks for the good read. I hope to see more from you in the future.

Upcoming book review: Instant Chef Starter

April 4th, 2013

Packt Publishing has sent me a copy of Instant Chef Starter, by John Ewart, to review. I should have this review up within the week.

This is a great opportunity as one of the most important tools that any system administrator can learn is configuration management, and Chef is a leader in that space. Chef is unique in that it integrates with major cloud providers and allows management from provisioning to deletion of the virtual server behind the service.

Proxy Splunk via Apache

March 22nd, 2013

I have had to set up Apache to proxy Splunk several times over the past 6 months and keep forgetting the Splunk configuration to make this work.

Be sure to set the following in /opt/splunk/etc/system/local/web.conf:

enableSplunkWebSSL = 0
root_endpoint = /splunk
tools.proxy.on = True

This assumes a proxy configuration of the following for apache 2.2:

ProxyPass /splunk http://localhost:8000/splunk
ProxyPassReverse /splunk http://localhost:8000/splunk

Be sure to secure your proxy before enabling!

Restart splunk:

/opt/splunk/bin/splunk stop
/opt/splunk/bin/splunk start
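
Since the point of this note is that these settings are easy to forget, here is a minimal sketch of a check to run before restarting. missing_web_conf_settings is a hypothetical helper; it compares lines literally after stripping whitespace, so it assumes the values are written exactly as shown above.

```shell
# Hypothetical helper: report which of the three settings listed above
# are absent from the given web.conf. Whitespace is ignored; values are
# compared literally.
missing_web_conf_settings() {
  for want in 'enableSplunkWebSSL=0' 'root_endpoint=/splunk' 'tools.proxy.on=True'
  do
    tr -d ' \t' < "$1" | grep -qxF "${want}" || echo "missing: ${want}"
  done
}

# Usage:
#   missing_web_conf_settings /opt/splunk/etc/system/local/web.conf
```

No output means all three settings were found.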

How to completely remove a file from git!

February 11th, 2013

I recently made a mistake and committed a 2GB ISO file to git. I did not notice immediately and made several more local commits, none of which could be pushed to GitHub. I did some research and figured out how to fix this problem.

Note that this might not be a good idea if you’ve successfully pushed to your remote repository and share it with others.

1. Remove the file from each commit with git filter-branch:

git filter-branch -f --tree-filter 'rm -f $FILENAME' HEAD

2. Remove the references:

git reflog expire --expire=now --all

3. Run garbage collection:

git gc --aggressive --prune=now

That worked for me. Let me know if you have a similar experience at: linux (at) itsecureadmin (dot) com.
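
To confirm that the file is really gone, it helps to list the largest blobs still in the repository. The following is a minimal sketch: largest_blobs is a hypothetical helper that sorts the output of git cat-file --batch-check by size. If the file still shows up after the steps above, the backup refs that filter-branch leaves under refs/original/ may still point at it.

```shell
# Hypothetical helper: read "<type> <sha> <size> <path>" lines on stdin
# and print the N largest blobs as "<size> <path>" (default N is 10).
largest_blobs() {
  awk '$1 == "blob" { size = $3; $1 = $2 = $3 = ""; sub(/^ +/, ""); print size, $0 }' |
    sort -rn | head -n "${1:-10}"
}

# Usage inside a repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 5
```

Sizes are in bytes, so a 2GB ISO stands out at the top of the list.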

ldapmodify fails with “Server is unwilling to perform (53)”

January 2nd, 2013

I recently ran into an issue when setting up a new LDAP directory using OpenLDAP 2.4.23 on Mac OS X. The issue was that I would get the following error when attempting to modify any entry in the directory:

modifying entry "olcDatabase={1}bdb,cn=config"
ldap_modify: Server is unwilling to perform (53)
        additional info: shadow context; no update referral

A few web searches for this error indicated that it might be bad credentials, or that the server was set up as a replication consumer, which would force a read-only state. I had dumped the directory from another server where it was an MMR member and thought I had removed the required parts to make it work here (olcSyncRepl, olcServerID, etc.).

I confirmed that the credentials were correct by issuing a search using the credentials used in my attempt to modify the directory.

The problem was that I had the olcMirrorMode directive set to FALSE. The fix was to remove this from the LDIF that I was importing with slapadd and re-import.
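
A quick scan of the LDIF before running slapadd would have caught this. The following is a minimal sketch: replication_attrs is a hypothetical helper, and the attribute list is limited to the replication-related names mentioned in this post.

```shell
# Hypothetical helper: list replication-related attributes still present
# in an LDIF file, with line numbers. Names are matched case-insensitively.
replication_attrs() {
  grep -niE '^(olcSyncrepl|olcMirrorMode|olcServerID):' "$1"
}

# Usage:
#   replication_attrs config.ldif
```

No output means none of the listed attributes remain in the file.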

Note that you could alternatively set olcMirrorMode to TRUE, which should resolve this scenario if you are running MMR and require the olcSyncRepl directives.
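
If the mirror mode route is the one you need, the change can be made in the LDIF before importing. This is a minimal sketch, assuming the attribute appears exactly as "olcMirrorMode: FALSE" on its own line; set_mirrormode_true is a hypothetical helper that writes the edited LDIF to stdout and leaves the original file untouched.

```shell
# Hypothetical helper: print the LDIF with olcMirrorMode flipped from
# FALSE to TRUE. The input file is not modified.
set_mirrormode_true() {
  sed 's/^olcMirrorMode:[[:space:]]*FALSE[[:space:]]*$/olcMirrorMode: TRUE/' "$1"
}

# Usage:
#   set_mirrormode_true config.ldif > config-mmr.ldif
```

The resulting file is then the one to import with slapadd.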

AWS VPC DB Security Group

December 18th, 2012

The other day I was working with a client and creating a CloudFormation template that used RDS instances within a VPC. While creating the DB security group object, I was getting an error like the following:

STACK_EVENT  CloudFormationName  DBSecurityGroupName
       AWS::RDS::DBSecurityGroup                2012-12-17T22:30:20Z  CREATE_FAILED
       Please see the documentation for authorizing DBSecurityGroup ingress. For VPC,
       EC2SecurityGroupName and EC2SecurityGroupOwner must be omitted.To
       authorize only the source address of this request (and no other address), pass
       xx.xx.xx.xx/32 as the CIDRIP parameter.

It turns out that beyond the requirement for a DB subnet group, I also needed to change the way that I create DB security groups within the VPC. I solved this problem by using the CIDRIP parameter and included the IP ranges of two private subnets:

    "DBSecurityGroupName": {
       "Type": "AWS::RDS::DBSecurityGroup",
       "Properties": {
          "EC2VpcId" : { "Ref" : "VpcName" },
          "DBSecurityGroupIngress" : [ { "CIDRIP": "10.1.201.0/24" }, { "CIDRIP": "10.1.301.0/24" } ],
          "GroupDescription": "Application Server Access"
        }
    },

The examples given on the official docs page did not help with this issue. I copied them and they failed for this particular scenario, so I ended up experimenting until I got it working.
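
Because a stack takes a while to fail, it is worth sanity-checking CIDR values before they go into a template. The following is a minimal sketch: valid_cidr is a hypothetical helper that only checks the IPv4 format, the octet range, and the prefix length. Note that the second range in the template above has an octet greater than 255 and fails this check, so treat both ranges as placeholders for your own subnets.

```shell
# Hypothetical helper: succeed if the argument looks like a valid IPv4
# CIDR block (four octets of 0-255 and a prefix length of 0-32).
valid_cidr() {
  printf '%s\n' "$1" | awk -F'[./]' '
    !/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i > 255) exit 1
      if ($5 > 32) exit 1 }'
}

# Check the two ranges used in the template above:
for cidr in 10.1.201.0/24 10.1.301.0/24
do
  if valid_cidr "${cidr}"; then echo "${cidr} ok"; else echo "${cidr} invalid"; fi
done
```

This does not check that the range actually belongs to a subnet in the VPC, only that it is well formed.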

Configure Mac OS X Network Interface from the Command Line

December 10th, 2012

Command line network configuration for the Mac is quite a bit different when compared to Linux or Unix. The networksetup command is used instead of ifconfig to configure devices.

The first step is to get a list of all network services – these are really physical and virtual devices, VPNs, etc.

networksetup -listallnetworkservices

The next step is to get the current settings for the network service that you might want to configure – the following example is using the service “Ethernet” and is enclosed in quotes as some of the services have spaces in the names:

networksetup -getinfo "Ethernet"

To configure an interface manually, use something like the following:

networksetup -setmanual "Ethernet" 192.168.1.10 255.255.255.0 192.168.1.1

Confirm that the settings are correct by issuing the getinfo command once more:

networksetup -getinfo "Ethernet"
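
The first two steps can be combined to review every enabled service at once. This is a minimal sketch: enabled_services is a hypothetical helper, and it assumes the listing starts with a one-line notice and marks disabled services with a leading asterisk.

```shell
# Hypothetical helper: read the -listallnetworkservices listing on stdin
# and print only the enabled service names. Assumes the first line is an
# explanatory notice and that disabled services start with "*".
enabled_services() {
  sed -e '1d' -e '/^\*/d'
}

# Usage:
#   networksetup -listallnetworkservices | enabled_services |
#     while IFS= read -r service; do networksetup -getinfo "${service}"; done
```

Reading one name per line keeps service names that contain spaces intact.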