Book Review: Puppet Reporting and Monitoring

July 11th, 2014

I just finished reading “Puppet Reporting and Monitoring” and gained quite a bit from it; I will be changing the way I work with Puppet to use techniques I learned here. I have traditionally used Puppet and other configuration management tools as simple configuration tools and haven’t expected much back from them beyond a failure/success email. Michael Duffy reveals his fantastic knowledge of Puppet and this topic throughout as he explains how to set up and obtain useful information from Puppet using built-in and freely available tools.

The first few chapters cover basic report setup, existing dashboard options, and report processors. This is good information that gives the beginning-to-intermediate Puppet administrator the background required to really take advantage of the rest of the book: custom report processors, PuppetDB and its API, and custom reports and dashboards.

Michael gives quite a few examples, clearly explaining how to write basic reports in Ruby using readily available gems and frameworks, which the reader can immediately put to use retrieving and displaying data about the managed environment. There is significant emphasis on PuppetDB and its API, which might not benefit those with heavy investments in other ENCs, but Michael makes a great case for PuppetDB through his examples and reports and might just convince the otherwise protesting reader to use it.

Overall, I would highly recommend this book. Too little emphasis is placed on reporting and monitoring in the configuration management space, and I applaud Michael Duffy’s efforts in getting this book out and taking the time to share his expertise.

Group vs Role – what’s the difference?

July 8th, 2014

I often hear people use the terms group and role as if they are completely different things, while I would argue that they are the same thing.

A role is a function assumed by a person or thing in a particular scenario. A group is a number of things considered similar. One might reasonably conclude, then, that a role is a group of users, or a set of ACLs applied to a user.

Now wait a minute, you say, those are two different things. Yeah, sure, they are, but a group without ACLs is nothing. A role without ACLs is a group. Therefore, a group is a role.

The issue is that roles are controlled within an application, grouping together ACLs that are then applied to users and groups, which may or may not be retrieved from a directory. Then administrators start assigning both groups and users to roles, because somebody thought it would be a good idea to provide that functionality. This only makes things more complicated and harder to manage.

Simplify your application and infrastructure. Start thinking in terms of a logical container of N users with specific ACLs applied: that is a role (aka a group).
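
To make that concrete, here is a minimal sketch using POSIX groups and filesystem ACLs; the group, user, and path names are hypothetical. The group carries the ACLs, and that combination is the role.

# the group *is* the role once ACLs are attached to it
groupadd report-editors

# attach the ACLs to the group (requires ACL support on the filesystem)
setfacl -R -m g:report-editors:rwX /srv/reports

# granting the role to a user is just group membership
usermod -aG report-editors alice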

Puppet Reporting and Monitoring

July 8th, 2014

I received a new book from Packt Publishing: Puppet Reporting and Monitoring. It’s a good read so far, and I will be posting a review soon.

Rebooting: quick tip

April 9th, 2014

Note to self: whenever rebooting a server, log in via SSH and restart the OpenSSH daemon first to validate that it will come back up.

I just updated an AWS instance and rebooted it without doing this. A recent OpenSSH update requires that AuthorizedKeysCommandUser be defined whenever AuthorizedKeysCommand is defined, and the OpenSSH daemon will not start without it.
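
In hindsight, a config test would have caught this before the reboot. A minimal sketch (the command path is hypothetical; the directives are standard sshd_config(5) options):

# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/local/bin/fetch-authorized-keys
AuthorizedKeysCommandUser nobody

# validate the config before restarting or rebooting; exits non-zero on errors
sshd -t && service sshd restart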

Luckily I can tell Puppet to fix this and will be able to log in in 30 minutes, but that’s 30 minutes I’d prefer not to wait.

- josh

Packt Publishing 2000th Title Campaign

March 27th, 2014

I’m a big fan of Packt Publishing and the work they do to provide quality ebooks to folks.

Check out Packt’s amazing Buy One, Get One Free offer: http://bit.ly/1j26nPN

MySQL 5.6 Transportable Tablespaces

February 24th, 2014

Transportable tablespaces in MySQL 5.6 are an amazing feature that allows InnoDB tables to be exported from one database and imported into another. This recently allowed me to consolidate 4 large but lightly used databases onto one server with little downtime by exporting, copying to a new server, and importing.

In versions prior to 5.6, your options were mysqldump or a backup/restore, which was difficult with a large (200GB+) database as it could take several days to a week. With transportable tablespaces, I was able to export/copy/import a 200GB+ database in under 5 hours.

Note that you need innodb_file_per_table enabled for this to work.
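
The overall flow looks something like this; a rough sketch, assuming a hypothetical database mydb and table big_table, with paths guessed, and the destination table already created with an identical definition:

# on the source server: hold the export lock open while the files are copied
mysql mydb <<'SQL'
FLUSH TABLES big_table FOR EXPORT;
system cp /var/lib/mysql/mydb/big_table.ibd /var/lib/mysql/mydb/big_table.cfg /tmp/export/
UNLOCK TABLES;
SQL

# copy big_table.ibd and big_table.cfg to the destination server, then:
mysql mydb -e "ALTER TABLE big_table DISCARD TABLESPACE;"
# move the copied files into the destination datadir and fix ownership, then:
mysql mydb -e "ALTER TABLE big_table IMPORT TABLESPACE;"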

Also, if you get the following error, you need to ALTER the column(s) in question to upgrade them to the 5.6 storage format:

ERROR 1808 (HY000): Schema mismatch (Column updated_at precise type mismatch.)

For more information on the timestamp/date/datetime storage changes, check here. Note that running an OPTIMIZE on the table to rebuild it will *not* work (I tried). You must run an ALTER on the specific column(s) in question.

For example, to update a datetime column in the wp_users table that has the following definition:

`user_registered` datetime NOT NULL DEFAULT '0000-00-00 00:00:00'

Use this statement:

ALTER TABLE wp_users
CHANGE COLUMN user_registered user_registered datetime NOT NULL DEFAULT '0000-00-00 00:00:00';

Also check out online DDL in MySQL 5.6!

git branching strategy – what’s yours?

January 28th, 2014

When it comes to deploying code to various environments, the ideal scenario is to continuously deliver a specific branch to production, after delivering to dev/test/staging with unit tests validating the code along the way. The other end of the spectrum is to manually deploy to and test in each environment. All of the clients I work with are comfortable somewhere in between: deployments to pre-production environments are automated so validation can be performed manually, and production deployments are triggered manually with immediate human validation.

What does this have to do with git branching, you say? If you don’t deploy code to production immediately, it’s a good idea to logically mark it as different from what is in production. The most common logical markers in git are branches and tags. A branch is a copy made from another branch that then becomes a separate line of development within the repository, while a tag is a fixed, repository-wide pointer to a specific commit, independent of any branch. A branch may be merged with other branches, while a tag remains static.

There are three primary strategies to deployments with these logical markers:

  • creating a tag for every deployment and deploying the tag
  • creating a branch for each environment – deployments are triggered by merging into that branch
  • creating a branch for pre-prod work and deploying master to production

I’ve ordered the list by how often I have seen each strategy applied with clients I work with now and have worked with in the past. By far the most fool-proof strategy I have seen is creating a branch for each environment and auto-deploying by merging into that branch. It can easily be implemented in Jenkins by creating a job for each environment and setting the branch in the job configuration.
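
As a sketch, promoting code under the branch-per-environment strategy looks something like this (the branch names staging and production are assumptions; yours may differ):

# promote whatever has been validated in staging to production
git checkout production
git merge --ff-only staging    # fail loudly rather than create a surprise merge commit
git push origin production     # a Jenkins job watching 'production' picks this up and deploys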

If you create a tag or a new branch for pre-prod work and plan to deploy it to staging, for example, you must specify somewhere which tag or branch you want to deploy, and this makes deployment a more interactive process. The more interactive the process, the more potential for error. Also, if you work with a large team and are coordinating deployments, you must be very clear about which branch/tag gets deployed [and has been deployed] to each environment.

What’s your strategy?

Outside Access to VPC RDS Instance

November 13th, 2013

Many applications inside Amazon Web Services use MySQL on RDS inside a Virtual Private Cloud (VPC), where it is not accessible from the outside network. Oftentimes clients will want to connect to the database directly to inspect data, run a visualization tool, or simply connect a locally run application. One solution is to expose the webapp servers through an ELB and have them NAT connections to the RDS instance(s):

internet -> ELB -> webapp server -> RDS instance

The first step in this process is to enable IP forwarding on the webapp servers by editing /etc/sysctl.conf:

# /etc/sysctl.conf
net.ipv4.ip_forward = 1

Enable the new setting:

sysctl -p

The next step is to add the iptables rules to route traffic through. First, add a destination NAT rule to the PREROUTING chain to intercept all traffic to port 3306 and forward it to the IP:port specified; then set up masquerading in the POSTROUTING chain (another option is SNAT):

# rewrite the destination of any traffic arriving on port 3306 to the RDS instance
iptables -t nat -A PREROUTING -p tcp --destination-port 3306 -j DNAT --to-destination 10.x.x.x:3306
# rewrite the source so return traffic flows back through this host
iptables -t nat -A POSTROUTING -j MASQUERADE

Note that this is not a perfect solution, as the RDS instance’s internal IP address may change at some point. I tried to use the endpoint DNS name, but iptables would not accept it, and the man page does indicate that an IP address must be used.
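
One way to paper over that limitation might be to re-resolve the endpoint periodically and rebuild the rule. A rough sketch with a hypothetical endpoint name; note it flushes the entire PREROUTING chain, so adapt it if you keep other rules there:

#!/bin/bash
# re-resolve the RDS endpoint and point the DNAT rule at the current IP (run from cron)
RDS_ENDPOINT="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"
RDS_IP=$(dig +short "${RDS_ENDPOINT}" | tail -n1)

iptables -t nat -F PREROUTING
iptables -t nat -A PREROUTING -p tcp --destination-port 3306 -j DNAT --to-destination "${RDS_IP}:3306"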

Finally, be sure to allow access from the desired IP range in the security groups that control access to the ELB and the EC2 instances.
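
At that point, a client outside the VPC should be able to connect through the ELB as if it were the database itself (the hostname and credentials here are placeholders):

mysql -h my-elb-123456789.us-east-1.elb.amazonaws.com -P 3306 -u appuser -p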

Delete Orphaned AMI-Related Snapshots

August 14th, 2013

I recently worked with a client that had a number of Amazon EC2 AMIs on which not all of the disk volumes were set to delete on termination. This caused quite a few snapshots to become orphaned when the associated AMI was deleted, which was discovered when there were hundreds of snapshots and no active snapshot plan.

To fix this issue, I wrote a script that loops through all snapshots that were created as part of an AMI and deletes them if that AMI no longer exists.

Note that this should be paired with a process that sets all volumes to delete on termination, to prevent future orphans.

This script requires the EC2 command line tools.

Note that you should be 100% comfortable with this script before running it, as it will delete snapshots. On the other hand, they are only snapshots; it will not delete EC2 instances.

#!/bin/bash

# all AMI IDs that still exist in this account/region
images=$(ec2-describe-images | awk '/IMAGE/ {print $2}')
invalid_count=0
valid_count=0

# iterate over whole lines of output, not whitespace-separated words
IFS='
'

for snapshot in $(ec2-describe-snapshots)
do
  snapshotid=$(echo "${snapshot}" | sed -n 's/.*\(snap-[a-z0-9]\{4,8\}\).*/\1/p')
  amiid=$(echo "${snapshot}" | sed -n 's/.*\(ami-[a-z0-9]\{4,8\}\).*/\1/p')

  if [ -z "${amiid}" ]
  then
    # not related to an AMI
    continue
  fi

  valid=$(echo "${images}" | egrep -c "${amiid}")
  if [ "${valid}" -gt 0 ]
  then
    valid_count=$((valid_count+1))
  else
    echo "Deleting orphaned snapshot ${snapshotid} which belongs to non-existent AMI ${amiid}"
    invalid_count=$((invalid_count+1))
    ec2-delete-snapshot "${snapshotid}"
  fi

done

unset IFS

echo "Valid snapshots:  ${valid_count}"
echo "Invalid snapshots:  ${invalid_count}"

exit 0

Let me know if you find any issues.

- josh

puppet node name using FQDN rather than short name

August 14th, 2013

I recently deployed Puppet to a number of machines and ran into an issue getting one of the hosts to retrieve its catalog.

* Puppet 2.7
* Amazon Linux

Wed Aug 14 22:11:39 +0000 2013 Puppet (err): Could not retrieve catalog from remote server: 
Error 403 on SERVER: Forbidden request: hostname.example.com.
(10.0.1.20) access to /catalog/hostname.example.com. [find] authenticated  at /etc/puppet/auth.conf:52

I was confused as to why this client was reporting with the fully qualified domain name and why it was failing to read the catalog when I had just deployed half a dozen other clients without any issues.

I was able to identify the problem: this was the only client in the deployment with a search domain configured in /etc/resolv.conf. That resulted in the certificate being generated for the FQDN and the Puppet client identifying itself by the FQDN, while my node declarations (via LDAP) used the short hostname.

I ended up removing the search domain from /etc/resolv.conf to resolve the issue. It looks like the “right” answer might be to set node_name in puppet.conf to tell Puppet how to identify each client, although that might be less secure.
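
Another option might be to pin the agent’s certificate name explicitly. A minimal, untested sketch; the hostname is a placeholder, and the right knobs may vary by Puppet version:

# /etc/puppet/puppet.conf on the agent
[agent]
    certname = shorthostname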

Have you encountered this issue? How have you solved it?

- josh