Bash tip: Print begin and end timestamp on apache logs…

July 9th, 2010

This morning I needed to audit some log files that I had recently processed through AWstats, after receiving a report of a discrepancy in the data: one day appeared to be missing. I used the following bash script to print the start and end timestamp of each log file:

# List the logs oldest first; field 4 of each line is the timestamp.
for file in $( ls -tr *.gz ) ;
do
  BEGIN=$(zcat "${file}" | head -n 1 | awk '{print $4}');
  END=$(zcat "${file}" | tail -n 1 | awk '{print $4}');
  echo "${file} - ${BEGIN} - ${END}";
done

Note that each log file was named uniquely by web server and logrotate number, e.g. webserver1.access_log.XX.gz.
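The same check can be done with a single decompression pass per file. This is a minimal sketch, assuming the timestamp is field 4 as in the script above; the function name log_span is made up for illustration:

```shell
# Print "<file> - <first timestamp> - <last timestamp>" for one gzipped log.
# Assumption: the timestamp is field 4, as in the combined log format.
log_span() {
  gzip -dc -- "$1" | awk -v name="$1" '
    NR == 1 { first = $4 }   # timestamp of the first line
            { last = $4 }    # overwritten on every line, so it ends as the last
    END     { sub(/^\[/, "", first); sub(/^\[/, "", last)
              print name " - " first " - " last }'
}
```

Calling it as `for f in *.gz; do log_span "$f"; done` prints one line per log file, with the leading bracket stripped from each timestamp.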

System Administrator Technical Interviews

July 8th, 2010

I have had the opportunity to interview many candidates over the past few months and have a few tips:

  • When indicating that you have VMware experience, clearly indicate which features you have experience with. I have interviewed many candidates who claim to be experts on VI3/vSphere and yet have never used clustering or shared storage.
  • When asked about rating yourself from 1-5 or 1-10, make sure you understand which side is the proficient side and give an example of what you think is proficient in a particular area.

My methodology is to ask the interviewee to rate themselves and then ask them what that rating means to them. If they rate themselves a 4 out of 5 with general Linux system administration, I then ask them to give me a few examples of what somebody who has a 4/5 rating would be able to do. I then ask them questions based on that assessment. If you can’t win on those terms, you typically can’t win.

It is not my desire to stump somebody in an interview; I would prefer to ask them questions about what they have done in the past and get into a good dialogue about things they are familiar with. Do your interviewer a favor and be very clear on your resume and during the interview process.

Apache 2.2 – Return 200 OK for missing images

June 18th, 2010

I recently faced a problem where I needed to configure Apache to return a 200 OK when it received a request for an image that was missing, along with a custom 404 ErrorDocument which was an image. The reason for this requirement is that when Outlook 2003/2007 displays an HTML page where an image request returns a ‘404 Not Found’, it displays a broken image link icon, which is a little red ‘x’.

The solution that I ended up using was to configure mod_rewrite to look for any requests that were not valid files, links, or directories and return the custom ErrorDocument image if these conditions were true. This results in a 200 OK for all requests, even for missing images. Note that with this in place Apache never uses the configured ErrorDocument 404.

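This configuration must be set at the directory level (a <Directory> block or .htaccess file) and not the virtual host level. RewriteBase is only valid in directory context, and at the virtual host level %{REQUEST_FILENAME} has not yet been mapped to a filesystem path, so the -f, -l, and -d tests would not see the real files.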

  RewriteEngine On
  RewriteBase /
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-l
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !^/server-status [NC]
  RewriteRule .* /missing.jpg [L]

Reporting search referrals

May 4th, 2010

Here is a quick awk command that will parse Apache web logs and print a simple virtual host/date/referral CSV report that only includes referrals from Google, Bing, or Yahoo:

awk 'tolower($11) ~ "google|bing|yahoo" {print $2 "," $4 "," $11}' "${input_file}" >> report.csv
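To see which fields the command picks out, here is a sketch wrapped in a function (referral_report is a hypothetical name). It assumes, as the one-liner does, a log format where field 2 is the virtual host, field 4 the timestamp, and field 11 the quoted referrer:

```shell
# Print "vhost,timestamp,referrer" for requests referred by a search engine.
# Assumption: field 2 = virtual host, field 4 = timestamp, field 11 = referrer.
referral_report() {
  awk 'tolower($11) ~ /google|bing|yahoo/ { print $2 "," $4 "," $11 }' "$1"
}
```

Running `referral_report access_log >> report.csv` appends the matching rows. The match is a plain substring test, so any referrer containing one of the three names is included, not only the search engines themselves.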

Linux guest hangs at “starting udev” (VMware vSphere)

March 23rd, 2010

Having recently upgraded the Virtual Infrastructure at work to vSphere, I have encountered many scenarios with CentOS 5.3 guests not booting or taking a long time to boot. The last message on the console typically indicates that it’s hanging while starting udev.

The fix for this issue is to follow the timekeeping practices described in the VMware Timekeeping KB.

The basic steps are:
1. Append the following options to the kernel line of /boot/grub/grub.conf:

clocksource=acpi_pm notsc divider=10

clocksource=acpi_pm – uses the Power Management Timer (PMTMR) available in some southbridges as the primary timing source
notsc – disables the timestamp counter
divider=10 – reduces the frequency of timer interrupts by a factor of 10 (from 1000/second to 100/second)

2. Disable time sync through VMware Tools (note that a one-time sync will still happen on bootup, pause, resume, etc.):

vmware-guestd --cmd "vmx.set_option synctime 1 0"

3. Set up time sync through NTP:
a. Set up your /etc/ntp.conf to point to a good NTP server pool.
b. Set NTP to start and persist across reboots.

# yum -y install ntp
# chkconfig ntpd on
# /etc/init.d/ntpd start
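After the reboot, it is worth confirming that the options from step 1 actually reached the running kernel. A small sketch; check_kernel_opts is a hypothetical helper, and it reads /proc/cmdline unless another file is given:

```shell
# Report which of the three timekeeping options are on the kernel command line.
# Reads /proc/cmdline by default; pass a file name to check a saved copy.
check_kernel_opts() {
  cmdline_file=${1:-/proc/cmdline}
  # Pad with spaces so each option is matched as a whole word.
  cmdline=" $(cat "$cmdline_file") "
  missing=0
  for opt in clocksource=acpi_pm notsc divider=10; do
    case "$cmdline" in
      *" $opt "*) echo "ok: $opt" ;;
      *)          echo "missing: $opt"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

The function returns non-zero if any option is missing, so it can be dropped into a post-build check.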

Expanding an ext3 filesystem online

March 4th, 2010

One common scenario that I face in my daily work is adding disk space to various filesystems. Setting up systems correctly so that this is possible will save time and frustration. One of the easiest cases is adding disk space to a virtual machine when the guest is using LVM and ext3.

As always, please be sure to back up your data before trying any filesystem or disk manipulation.

After adding the virtual hard disk using the VI client, provision the space from within the virtual machine using the following steps:

1. Re-scan storage:

echo "- - -" > /sys/class/scsi_host/host0/scan

2. Create a physical volume from the new device (note: check with your SAN admin to see if you need to create a partition and align it appropriately):

pvcreate /dev/sdb

3. Extend the volume group to the new PV (physical volume):

vgextend vg01 /dev/sdb

4. Extend the LV (logical volume) to the desired size:

lvextend -L +2G /dev/vg01/lvol05

5. Resize the filesystem to cover the newly extended LV:

resize2fs /dev/vg01/lvol05

Your newly resized filesystem should now be available.
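One way to double-check the sequence before touching a real system is to print the commands first. This is a sketch only: grow_plan is a hypothetical helper, and nothing in it runs the LVM tools.

```shell
# Print (do not run) the LVM steps above for a given device, VG, LV and size.
# Arguments: new device, volume group, logical volume, size increase.
grow_plan() {
  dev=$1 vg=$2 lv=$3 size=$4
  echo "pvcreate $dev"
  echo "vgextend $vg $dev"
  echo "lvextend -L $size /dev/$vg/$lv"
  echo "resize2fs /dev/$vg/$lv"
}
```

For the example in this post, `grow_plan /dev/sdb vg01 lvol05 +2G` prints the four commands shown above in order, ready for review.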

I have not yet tried expanding existing VMDK files on the fly with vSphere but I plan to test that out next.

Fedora 12 – Disable mouse focus

February 23rd, 2010

One problem that I’ve had with Fedora 12 is that when enabling compiz the focus starts to follow the mouse pointer. This behavior is annoying to me as I don’t like events to occur unless I explicitly ask for them (i.e. with a click).

To disable this feature, perform the following:
1. Install control-center-extra
2. Open Applications -> System Tools -> Configuration Editor
3. Select the checkbox at /apps/compiz/general/allscreens/options/click_to_focus

That worked for me.

Bash Tips – Parse mail logs for used mail boxes

February 19th, 2010

I was auditing a set of mail servers at work the other day, getting a list of all active user accounts, and developed this little one-liner:

zgrep LOGIN /var/log/mail.log.[1-9].gz | sed -n 's/.*user=\(.*\), ip.*/\L\1/p' | sort | uniq >> /tmp/mailbox.list

This script finds all logins in the rotated mail logs, extracts only the account@domain portion in lowercase, then sorts the results and appends one line per unique account to /tmp/mailbox.list.
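The \L case conversion in that sed expression is a GNU extension. A sketch of a variant that avoids it by lowercasing with tr; active_mailboxes is a made-up name, and the log line layout (user=..., ip=...) is assumed to match the one-liner above:

```shell
# Read mail log lines on stdin; print each logged-in account once, lowercased.
# Assumption: login lines contain "LOGIN" and "user=<account>, ip=...".
active_mailboxes() {
  grep LOGIN |
    sed -n 's/.*user=\([^,]*\), ip.*/\1/p' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}
```

Usage would be along the lines of `zcat /var/log/mail.log.[1-9].gz | active_mailboxes >> /tmp/mailbox.list`.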

MySQL /tmp Usage with Optimize Table Command

February 10th, 2010

I’m currently trying to prune a MyISAM table with 200 million rows down to 100 million rows. As part of this process, I am simply removing any orphaned records. This is a simple tracking table where every record must have an associated record in the users table. The total size on disk is 11G of index data and 9G of table data.

The basic process to prune this table is to:
1. Set up a replication slave for this purpose
2. Batch deletes from the table throughout the day
3. Run optimize on the table each night

I was surprised to come in this morning to find that the optimize that I had started last night at 6PM had not finished yet. Looking on the server, there was no load, no swapping, no disk I/O, nothing. It was as if MySQL had fallen asleep at the wheel. I looked at the MySQL error log and discovered the problem: /tmp was full.

-0- josh@mysql-server04 /var/lib/mysql >
> tail /var/log/mysql/mysql.err
100210  7:38:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  7:48:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  7:58:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:08:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:18:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:28:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:38:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:48:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  8:58:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
100210  9:08:09 [ERROR] /usr/sbin/mysqld: Disk is full writing '/tmp/STJSSGYU' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs

The disk had been full since 4:00 AM, and MySQL had been blocked on it ever since. I quickly ran df -h to confirm that the disk was full, then extended the logical volume by 2G for a total of 4G of space, and MySQL continued processing the optimize.

-0- josh@mysql-server04 ~ >
> sudo lvextend -L +2G /dev/vg01/lvol03
  Extending logical volume lvol03 to 3.97 GB
  Logical volume lvol03 successfully resized

-0- josh@mysql-server04 ~ >
> resize2fs !$
resize2fs /dev/vg01/lvol03
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/vg01/lvol03 is mounted on /tmp; on-line resizing required
Performing an on-line resize of /dev/vg01/lvol03 to 1040384 (4k) blocks.
The filesystem on /dev/vg01/lvol03 is now 1040384 blocks long.

I was then curious to see the files that were in /tmp and listed the contents to find nothing there.

-0- josh@mysql-server04 /tmp >
> find .
.
./.winbindd
./.winbindd/pipe
./.ICE-unix
./lost+found

I then issued another ‘df -h’ and found that the volume was 88% full. Something didn’t add up here. The next step was to use ‘lsof’ to find any files that had been deleted but were still held open by a running process.

-0- josh@mysql-server04 /tmp >
> lsof | grep -i delet
mysqld     1404     mysql    7u      REG              253,2           0         12 /tmp/ibERGn68 (deleted)
mysqld     1404     mysql    8u      REG              253,2           0         13 /tmp/ibX1mdbt (deleted)
mysqld     1404     mysql    9u      REG              253,2           0         14 /tmp/ibjjh3fN (deleted)
mysqld     1404     mysql   10u      REG              253,2           0         15 /tmp/ibjHbE38 (deleted)
mysqld     1404     mysql   14u      REG              253,2           0         16 /tmp/ib3F6Tfw (deleted)
mysqld     1404     mysql   68u      REG              253,2  3385629393         17 /tmp/STJSSGYU (deleted)
hald-addo  5043 haldaemon  txt       REG              253,3       15720     258081 /usr/libexec/hald-addon-keyboard.#prelink#.RmsJR9 (deleted)
hald-addo  5046 haldaemon  txt       REG              253,3       15720     258081 /usr/libexec/hald-addon-keyboard.#prelink#.RmsJR9 (deleted)
hald-addo  5050 haldaemon  txt       REG              253,3       15720     258081 /usr/libexec/hald-addon-keyboard.#prelink#.RmsJR9 (deleted)

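There is the culprit! MySQL still has a large temporary file open even though it has been deleted. A deleted file keeps consuming disk space until the last process holding it open closes it, which is why it shows up in lsof but not in the directory listing. That explains everything.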
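To put a number on it, the sizes of the deleted-but-open files can be summed straight from the lsof output. A rough sketch, assuming the column layout shown above (field 7 is the size, field 9 the file name); deleted_open_bytes is a hypothetical name, and other lsof versions may print extra columns:

```shell
# Sum the size of deleted-but-still-open files under a path prefix.
# Reads lsof output on stdin. Assumption: field 7 = SIZE, field 9 = NAME.
deleted_open_bytes() {
  awk -v p="$1" '
    $NF == "(deleted)" && index($9, p) == 1 { sum += $7 }
    END { printf "%.0f\n", sum }'
}
```

Run as `lsof | deleted_open_bytes /tmp/`, this prints the total in bytes; for the listing above that is the 3385629393 bytes held by /tmp/STJSSGYU.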

Note to self: ensure the /tmp volume on large MySQL servers is larger than the value of myisam_max_sort_file_size. On this server the value was set to 10G while the /tmp volume was only 2G.
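A check like this is easy to script ahead of a big optimize. A minimal sketch; tmp_has_room is a made-up helper, and the byte count to compare against has to be supplied by hand (for example the configured myisam_max_sort_file_size):

```shell
# Succeed if the filesystem holding DIR has at least NEEDED bytes available.
tmp_has_room() {
  dir=$1 needed=$2
  # df -P prints one line per filesystem; column 4 is available 1K blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ $((avail_kb * 1024)) -ge "$needed" ]
}
```

With the 10G setting from this server, `tmp_has_room /tmp 10737418240 || echo "/tmp is too small"` would have flagged the 2G volume before the optimize was started.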

Command line replace with perl

February 5th, 2010

I often run into issues on the command line where I’d like to perform a non-greedy search and replace. The regex flavors used by sed, grep, and egrep have no non-greedy quantifier, AFAIK, so I resort to perl. Here is how it’s done:

perl -pe 's|"http://(.*?)/.*$|$1|g' 

The above example will take a list of requested URIs or search domains from an Apache log and print out only the domain. I’m using pipes instead of slashes as the regex delimiter to eliminate the need to escape the slashes.

Note the question mark used to make a non-greedy match.
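For this particular pattern, where the match ends at the first slash, a negated character class gets the same result in plain sed without a non-greedy quantifier. A sketch for comparison (extract_domain is a made-up name); it is not a general replacement for .*?:

```shell
# Read quoted URLs on stdin and print only the domain part.
# [^/]* stops at the first slash, which is what .*? does in the perl version.
extract_domain() {
  sed 's|"http://\([^/]*\)/.*$|\1|'
}
```

For example, piping the text "http://www.example.com/path/page.html" (with its leading quote) through extract_domain prints www.example.com.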