Archive for the ‘Open Source Software’ Category

Packt Publishing 2000th Title Campaign

Thursday, March 27th, 2014

I’m a big fan of Packt Publishing and the work they do to provide quality ebooks to folks.

Check out Packt’s amazing Buy One, Get One Free offer http://bit.ly/1j26nPN.

MySQL 5.6 Transportable Tablespaces

Monday, February 24th, 2014

Transportable tablespaces in MySQL 5.6 are an amazing feature that allows InnoDB tables to be exported from one database and imported into another. This recently allowed me to consolidate four large but lightly used databases onto one server with little downtime by exporting, copying to a new server, and importing.

In versions prior to 5.6, your options were mysqldump or a full backup/restore, and that was difficult with a large (200GB+) database as it could take several days to a week. With transportable tablespaces, I was able to export/copy/import a 200GB+ database in under 5 hours.

Note that you need to have innodb_file_per_table enabled for this to work.
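At a high level, the workflow looks like this (a minimal sketch, assuming a wp database containing the wp_users table and the default datadir; hostnames and paths are placeholders):

-- on the destination server: create an identical empty table, then
ALTER TABLE wp_users DISCARD TABLESPACE;

-- on the source server, keeping this session open:
FLUSH TABLES wp_users FOR EXPORT;
-- while the session is open, copy wp_users.ibd and wp_users.cfg from
-- /var/lib/mysql/wp/ to the destination datadir (chown them to mysql), then
UNLOCK TABLES;

-- back on the destination server:
ALTER TABLE wp_users IMPORT TABLESPACE;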

Also, if you get the following error, you need to ALTER the column(s) in question to upgrade to the 5.6 storage format:

ERROR 1808 (HY000): Schema mismatch (Column updated_at precise type mismatch.)

For more information on the timestamp, date, and datetime storage changes, check here. Note that running an OPTIMIZE on the table to rebuild it will *not* work (I tried). You must run an ALTER on the specific column(s) in question.

For example, to update a datetime column in the wp_users table that has the following definition:

`user_registered` datetime NOT NULL DEFAULT '0000-00-00 00:00:00'

Use this statement:


ALTER TABLE wp_users
CHANGE COLUMN user_registered user_registered datetime NOT NULL DEFAULT '0000-00-00 00:00:00';

Also check out online DDL in MySQL 5.6!
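Many ALTER TABLE operations in 5.6 can run without blocking writes. For example, adding an index to the same table online might look like this (a sketch):

ALTER TABLE wp_users
ADD INDEX idx_user_registered (user_registered),
ALGORITHM=INPLACE, LOCK=NONE;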

git branching strategy – what’s yours?

Tuesday, January 28th, 2014

When it comes to deploying code to various environments, the ideal scenario is to continuously deliver a specific branch to production, after delivering to dev/test/staging with unit tests to validate the code along the way. The other end of the spectrum is to manually deploy to and test in each environment. All of the clients I work with are comfortable somewhere in between: deployments to pre-production environments are automated so that validation can be performed manually, and production deployments are triggered manually with immediate human validation.

What does this have to do with git branching, you say? If you don’t deploy code to production immediately, it’s a good idea to logically mark it as different from what is in production. The most common logical markers used in git are branches and tags. A branch is a copy made from another branch that then becomes a separate line of development within the repository, while a tag is a label attached to one very specific commit, repository-wide, regardless of branch. A branch may be merged with other branches while a tag remains static.
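For example, with hypothetical names:

git checkout -b staging        # a branch: a separate line of development that keeps moving
git tag v1.4.2                 # a tag: a fixed label on one specific commit
git push origin staging v1.4.2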

There are three primary strategies to deployments with these logical markers:

  • creating a tag for each deployment and deploying that tag
  • creating a branch for each environment – deployments are triggered by merging into that branch
  • creating a branch for pre-prod work and deploying master to production

I’ve created the list based on the order in which I have seen them applied with various clients, past and present. The most fool-proof strategy I have seen by far is creating a branch for each environment and auto-deploying by merging into that branch. That strategy can easily be implemented in Jenkins by creating a job for each environment and setting the branch in the configuration.
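Promoting code to staging might then look like this (a sketch, assuming branches named develop and staging, with a Jenkins job watching the staging branch):

git checkout staging
git merge --no-ff develop     # merging into the environment branch...
git push origin staging       # ...is what triggers the staging deploy job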

If you create a tag or new branch for pre-prod work and plan to deploy that to staging, for example, you must specify somewhere which tag or branch to deploy, and this makes the deployment more interactive. The more interactive the process is, the more potential for error. Also, if you work with a large team and you’re coordinating deployments, you must be very clear as to which branch/tag gets deployed [and has been deployed] to each environment.

What’s your strategy?

Outside Access to VPC RDS Instance

Wednesday, November 13th, 2013

Many applications inside Amazon Web Services use MySQL inside a Virtual Private Cloud (VPC), where it is not accessible from the outside network. Oftentimes clients will want to connect to the database directly to inspect data, run a visualization tool, or simply connect a locally run application. One solution to this problem is to NAT connections through the application hosts to the RDS instance(s), using an ELB as the public entry point:

internet -> ELB -> webapp server -> RDS instance

The first step in this process is to enable forwarding on the webapp servers by editing /etc/sysctl.conf.

# /etc/sysctl.conf
net.ipv4.ip_forward = 1

Enable the new setting:

sysctl -p

The next step is to add the iptables rules to route traffic through. First, add a destination NAT rule to the PREROUTING chain of the nat table, which will intercept all traffic to port 3306 and forward it to the IP:port specified; then set up masquerading in the POSTROUTING chain (another option is to SNAT):

iptables -t nat -A PREROUTING -p tcp --destination-port 3306 -j DNAT --to-destination 10.x.x.x:3306
iptables -t nat -A POSTROUTING -j MASQUERADE

Note that this is not a perfect solution, as the RDS instance's internal IP address may change at some point. I tried to use the endpoint DNS name, but iptables would not accept it, and the man page does indicate that an IP address must be used.
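One workaround is a small cron job that re-resolves the endpoint and rewrites the rule when the IP changes (a rough sketch; the endpoint hostname is a placeholder, and it assumes the DNAT rule is the only rule in the PREROUTING chain):

#!/bin/bash
# placeholder endpoint -- substitute your own
ENDPOINT="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"

# current internal IP of the endpoint (last line skips any CNAMEs)
NEWIP=$(dig +short "${ENDPOINT}" | tail -n 1)

# IP currently in the DNAT rule
CURIP=$(iptables -t nat -S PREROUTING | sed -n 's/.*--to-destination \([0-9.]*\):3306.*/\1/p')

if [ -n "${NEWIP}" ] && [ "${NEWIP}" != "${CURIP}" ]
then
  # assumes ours is the only PREROUTING rule
  iptables -t nat -F PREROUTING
  iptables -t nat -A PREROUTING -p tcp --destination-port 3306 -j DNAT --to-destination "${NEWIP}:3306"
fi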

After that, be sure to allow access from the desired IP range in the security groups that control access to the ELB as well as the EC2 instances.
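With the EC2 command line tools, that looks something like this (placeholder security group and CIDR):

# allow an office network to reach port 3306
ec2-authorize sg-xxxxxxxx -P tcp -p 3306 -s 203.0.113.0/24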

Delete Orphaned AMI-Related Snapshots

Wednesday, August 14th, 2013

I recently worked with a client who had a number of Amazon EC2 AMIs on which not all of the disk volumes were set to delete on termination. This caused quite a few snapshots to become orphaned when the associated AMI was deleted, which was discovered when there were hundreds of snapshots and no active snapshot plan.

To fix this issue, I wrote a script that loops through all snapshots that were created as part of an AMI and deletes them if that AMI no longer exists.

Note that this script should be paired with a process that sets all volumes to delete on termination, to prevent future orphans.

This script requires the EC2 command line tools.

Note that you should be 100% comfortable with this script before running it, as it will delete snapshots. On the other hand, they are only snapshots; it will not delete EC2 instances.

#!/bin/bash

# Collect the IDs of all AMIs that still exist
images=$(ec2-describe-images | awk '/IMAGE/ {print $2}')
invalid_count=0
valid_count=0

# Split ec2-describe-snapshots output on newlines only
IFS=$'\n'

for snapshot in $(ec2-describe-snapshots)
do
  snapshotid=$(echo "${snapshot}" | sed -n 's/.*\(snap-[a-z0-9]\{4,8\}\).*/\1/p')
  amiid=$(echo "${snapshot}" | sed -n 's/.*\(ami-[a-z0-9]\{4,8\}\).*/\1/p')

  if [ -z "${amiid}" ]
  then
    # not related to an AMI
    continue
  fi

  valid=$(echo "${images}" | egrep -c "${amiid}")
  if [ "${valid}" -gt 0 ]
  then
    valid_count=$((valid_count+1))
  else
    echo "Deleting orphaned snapshot ${snapshotid} which belongs to non-existent AMI ${amiid}"
    invalid_count=$((invalid_count+1))
    ec2-delete-snapshot "${snapshotid}"
  fi

done

unset IFS

echo "Valid snapshots:  ${valid_count}"
echo "Invalid snapshots:  ${invalid_count}"

exit 0

Let me know if you find any issues.

- josh

puppet node name using FQDN rather than short name

Wednesday, August 14th, 2013

I recently deployed puppet to a number of machines and ran into an issue getting one of the hosts to retrieve its catalog.

* puppet 2.7
* Amazon Linux

Wed Aug 14 22:11:39 +0000 2013 Puppet (err): Could not retrieve catalog from remote server: 
Error 403 on SERVER: Forbidden request: hostname.example.com.
(10.0.1.20) access to /catalog/hostname.example.com. [find] authenticated  at /etc/puppet/auth.conf:52

I was confused as to why this client was identifying itself using the fully qualified domain name and why it was failing to read the catalog, when I had just deployed half a dozen other clients without any issues.

I was able to identify the problem: this client was the only one in the deployment with a search domain configured in /etc/resolv.conf. This resulted in the certificate being generated for the FQDN and the puppet client identifying itself using the FQDN, while my node declarations were using the short hostname (via LDAP).

I ended up removing the search domain from the /etc/resolv.conf file to resolve this issue. It looks like the “right” answer might be to specify the node_name in puppet.conf to tell puppet how to identify each client although that might be less secure.
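If you go the puppet.conf route, pinning the agent's certificate name is one way to do it (a sketch; myhost is a placeholder for the short hostname):

# /etc/puppet/puppet.conf on the agent
[agent]
    certname = myhost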

Have you encountered this issue? How have you solved it?

- josh

Book Review: Instant RSpec Test Driven Development How-to

Friday, July 12th, 2013

Instant RSpec Test Driven Development How-to, by Charles Feduke, is another book in the Instant series published by Packt Publishing, designed to get the reader up and running quickly (Short, Fast, Focused). This book covers test driven development (TDD) using RSpec with Ruby and is designed for developers of all experience levels.

As a solutions architect who supports Rails applications written with RSpec tests, I found that this book bridges a gap, giving me more familiarity with the tests that are relied upon to validate application code.

While reading through this book I was able to quickly and easily get my environment set up with the appropriate gems and packages installed, as well as begin development on a basic application using test driven development. Charles walks the reader through creating a sample application and later moving it into a Rails application with ActiveRecord.

There were many examples throughout the book, and it lends itself to following along with a shell window open beside the e-reader. I found myself wondering what some of the examples were doing at times, but Charles always came through and explained them within a few pages to satisfy my curiosity. The presentation method in this book is to get you up and running and then explain things a bit later on. I’m sure those more familiar with Ruby development would catch on a bit quicker.

While Charles does a great job in this book explaining code refactoring, creating concise code blocks, and giving great examples, he calls out a difference between mocking and stubbing but doesn’t really explain what this critical difference is in a way that I understood. He also covers JSON validation and file uploads, which seem to be required for every API lately.

The book ends with coverage of capybara for client side testing. This is a great end to the book as after you develop your application function it’s time to validate that the user experience is as expected.

Overall, this was a good read. I thought it was a bit above my level as far as Ruby experience goes, but it is a useful tool that I will reference and use in the future as I continue to work and learn on Ruby/Rails projects. Thanks, Charles; I look forward to seeing more in the future.

Bash Tip: Modify and repeat last command…

Friday, July 12th, 2013

I will often issue a command at the bash prompt and want to re-issue the same command, albeit with a slight modification. This can be a pain if the command is lengthy and I’ve often thought it should be easier. I finally got around to trying something new to make it easier.

1. search for a package with yum

sudo yum --enablerepo=epel search ssldump

2. issue the same command, replacing search with install:

$(echo !! | sed 's/search/install/')

Now that I’ve typed that all out, I realize it’s not much of a savings, and with a different command it could be more painful, although it does seem clever.

After a bit of research, I came across this shorter version (second step only for brevity):

^search^install^

Much more elegant for single-occurrence replacement. For multiple occurrences, the previous approach works again with an added /g:

$(echo !! | sed 's/search/install/g')
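Bash's history modifiers can also do the global substitution without spawning sed:

!!:gs/search/install/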

Check out the bash reference.

Upcoming Book Review: Instant RSpec Test-Driven Development

Tuesday, July 2nd, 2013

I have received a copy of Instant RSpec Test-Driven Development from Packt Publishing and will be reviewing this book within the next couple of weeks.

EBS Volumes – deleteOnTermination ?

Tuesday, June 18th, 2013

When using EC2 instances with EBS-backed storage, whether or not your instances are set up to delete their EBS volumes on termination can be a big deal, especially if you burn AMIs and provision instances over and over. You could find yourself with many unused EBS volumes and pay for lots of storage you don’t use.

Audit your systems with a command similar to this one – the last column in the output is whether or not deleteOnTermination is set:

for instanceid in $(ec2-describe-instances | awk '/INSTANCE/ {print $2}')
do
  echo "InstanceID: ${instanceid}"
  ec2-describe-instance-attribute -v -b "${instanceid}" | egrep "BLOCKDEVICE.*false"
done

If you see output like the following:

InstanceID: i-xxxxxxxx            
  BLOCKDEVICE     /dev/sda1        vol-xxxxxxxx    2013-05-24T20:32:05.000Z        false   

…you have instances with volumes that will not delete when the instance is terminated.

To fix this, run the following command for each instance, and burn another AMI:

ec2-modify-instance-attribute -b '/dev/sda1=vol-xxxxxxxx:true' i-xxxxxxxx

I made a simple bash script that will iterate over all EC2 instances in an account and set deleteOnTermination on every attached volume that does not already have it set.

#!/bin/bash

#
# Audit instances to set all volumes to deleteOnTermination
#

for instanceid in $(ec2-describe-instances | awk '/INSTANCE/ {print $2}')
do
  # Split the BLOCKDEVICE output on newlines only
  IFS=$'\n'
  result=$(ec2-describe-instance-attribute -v -b "${instanceid}" | egrep "BLOCKDEVICE.*false")
  for line in ${result}
  do
    echo "${line}"
    device=$(echo "${line}" | awk '{print $2}')
    volume=$(echo "${line}" | awk '{print $3}')
    ec2-modify-instance-attribute -b "${device}=${volume}:true" "${instanceid}"
    if [ $? -gt 0 ]
    then
      echo "command failed for ${instanceid}"
    fi
  done
  unset IFS
done

exit 0

Note that when I ran this, multiple volumes were set properly for some instances and not for others. I did not take the time to troubleshoot this discrepancy. Patches welcome.