AWS Access Keys in S3 Bucket Policies

In the past, I’ve seen what appeared to be AWS Access Keys listed as the AWS principal in S3 bucket policies. I could never figure out why this was happening, and nobody appeared to be adding them. The Access Key also never showed up as a valid user Access Key when I searched IAM objects.

It turns out that if an S3 bucket policy references an IAM user and that user is deleted, the principal is replaced with a string that looks like an access key. This appears to be the deleted user’s IAM unique ID, an internal identifier AWS uses to track the user. IAM user IDs begin with AIDA, while access key IDs begin with AKIA, which makes the two easy to confuse.

Note: while it is syntactically valid, an AWS Access Key used as a principal in an S3 bucket policy does not refer to a valid principal.
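To spot these leftovers, a bucket policy can be scanned for AWS principals that are neither ARNs, 12-digit account IDs, nor the * wildcard. This is a minimal sketch using Ruby’s standard json library, assuming the policy JSON has already been fetched (for example with aws s3api get-bucket-policy); the orphaned_principals helper and the sample policy are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns AWS principals in a bucket policy that are not
# ARNs, 12-digit account IDs, or the "*" wildcard. A deleted IAM user's unique
# ID (for example one starting with "AIDA") ends up in this list.
def orphaned_principals(policy_json)
  # Statement may be a single object or an array of objects
  statements = [JSON.parse(policy_json)['Statement']].flatten.compact
  statements.flat_map do |stmt|
    principal = stmt['Principal']
    next [] unless principal.is_a?(Hash)
    Array(principal['AWS']).reject do |p|
      p == '*' || p.start_with?('arn:') || p.match?(/\A\d{12}\z/)
    end
  end
end

# Made-up example policy with one valid ARN and one leftover unique ID
policy = <<~JSON
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam::123456789012:user/alice", "AIDAEXAMPLEUSERID1234"]},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }]
  }
JSON

p orphaned_principals(policy)  # prints ["AIDAEXAMPLEUSERID1234"]
```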

Adding Global Environment Variables to Jenkins via puppet…

When using Jenkins in any environment, it’s useful to make variables related to that environment available to Jenkins jobs. I recently worked on a project where I used puppet to deploy global environment variables to Jenkins for use with AWS commands; executing the awscli, for example, typically requires knowing the region, account, and other details.

To make global environment variables available to Jenkins, we can create an init.groovy.d directory in $JENKINS_HOME as part of the Jenkins puppet profile. Jenkins runs any groovy scripts in this directory at startup. For example:

class profile::jenkins::install () {
  file { '/var/lib/jenkins/init.groovy.d':
    ensure => directory,
    owner  => jenkins,
    group  => jenkins,
    mode   => '0755',
  }
}

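We then need to create the puppet template that we will deploy to this location as a groovy script. Note that the template uses ERB syntax (<%= @var -%>) despite the .epp name, so it must be rendered with puppet’s template() function rather than epp():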

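import jenkins.model.Jenkins
import hudson.slaves.EnvironmentVariablesNodeProperty

def instance    = Jenkins.instance
def globalProps = instance.globalNodeProperties

// Reuse the existing global environment variable property if there is one,
// so that each restart does not add another copy
def existing    = globalProps.getAll(EnvironmentVariablesNodeProperty)
def envProperty = existing ? existing[0] : new EnvironmentVariablesNodeProperty()
if (!existing) {
  globalProps.add(envProperty)
}

// The values below are filled in by puppet when the template is rendered
envProperty.envVars.put("AWS_VARIABLE1", "<%= @ec2_tag_variable1 -%>")
envProperty.envVars.put("AWS_VARIABLE2", "<%= @ec2_tag_variable2 -%>")
envProperty.envVars.put("AWS_VARIABLE3", "<%= @ec2_tag_variable3 -%>")

instance.save()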


Note that in this instance, I am using the ec2tagfacts puppet module that allows me to use EC2 tags as facts in puppet. I will later move to dynamic fact enumeration using a script with facter.
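Because puppet’s template() function renders these files with Ruby’s ERB, one way to check a line of the template before deploying it is to render it locally with the ERB class from Ruby’s standard library. This is a minimal sketch, assuming Ruby 2.6 or later (for the trim_mode: keyword); the fact value is a made-up stand-in for what ec2tagfacts would supply.

```ruby
require 'erb'

# Made-up stand-in for a fact that ec2tagfacts would supply to puppet
@ec2_tag_variable1 = 'example-value'

# One line from the groovy template; trim_mode '-' accepts the -%> tags it uses
template_line = 'put("AWS_VARIABLE1", "<%= @ec2_tag_variable1 -%>")'
rendered = ERB.new(template_line, trim_mode: '-').result(binding)

puts rendered  # prints: put("AWS_VARIABLE1", "example-value")
```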

The next step is to add another file resource to the Jenkins puppet profile that places the groovy script in the proper location and restarts the Jenkins service:

class profile::jenkins::install () {
  file { '/var/lib/jenkins/init.groovy.d/aws-variables.groovy':
    ensure  => present,
    mode    => '0755',
    owner   => jenkins,
    group   => jenkins,
    # assumes a service { 'jenkins': } resource is declared elsewhere
    notify  => Service['jenkins'],
    content => template('jenkins/aws-variables.groovy.epp'),
  }
}

Now when puppet next runs, it will deploy the groovy script and restart Jenkins so that the variables take effect.

Note that these environment variables are not viewable under System Information in Manage Jenkins; they are only available inside each Jenkins job, e.g. inside an Execute shell build step:

#!/bin/bash -x

echo "${AWS_VARIABLE1}"

Retrieving puppet facts from AWS Systems Manager

AWS Systems Manager Parameter Store makes it easy to store and retrieve parameters for use across servers, services, and applications in AWS. One great benefit is the ability to store secrets and retrieve them as needed. I recently needed to retrieve some parameters to place in a configuration file via puppet, and wrote a short script to retrieve these values as facts.

Create a script like the following in /etc/facter/facts.d and make it executable:


#!/bin/bash
# Region for the aws cli calls below (avoids rewriting ~/.aws/config on every run)
export AWS_DEFAULT_REGION=us-east-1

# --query/--output text return just the parameter value, so no JSON parsing is needed
application_username=$(aws ssm get-parameter --name application_username --query Parameter.Value --output text)
application_password=$(aws ssm get-parameter --name application_password --with-decryption --query Parameter.Value --output text)

echo "application_username=${application_username}"
echo "application_password=${application_password}"

exit 0

Note that this assumes the username is stored as a plain String parameter, while the password is a SecureString, which is why only the second call uses --with-decryption.
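Executable external facts like this script report facts by printing one key=value pair per line. To make that contract concrete, here is a simplified Ruby illustration (not Facter’s own implementation) of turning such output into a hash. The parse_external_facts name and the sample values are made up; splitting on the first = only means a value such as a password may itself contain =.

```ruby
# Simplified illustration (not Facter's own code): parse key=value lines from
# an executable external fact into a hash of fact names to values.
def parse_external_facts(output)
  output.each_line.with_object({}) do |line, facts|
    key, value = line.chomp.split('=', 2)
    # Skip blank or malformed lines that have no "=" separator
    next if key.nil? || key.empty? || value.nil?
    facts[key] = value
  end
end

sample = "application_username=appuser\napplication_password=pa=ss\n"
facts = parse_external_facts(sample)
puts facts['application_password']  # prints: pa=ss
```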

This can be tested with the following:

# facter -p application_username
# facter -p application_password

These facts can then be used in templates, like the following:

# config.cfg.erb
connection_string = <%= @application_username %>:<%= @application_password %>

ruby aws-sdk strikes again…

When using ruby to upload files to S3 and trying to use multipart upload, beware the following ArgumentError:

...param_validator.rb:32:in `validate!': unexpected value at params[:server_side_encryption] (ArgumentError)
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-core-3.6.0/lib/seahorse/client/request.rb:70:in `send_request'
	from /var/lib/jenkins/.gem/ruby/gems/aws-sdk-s3-1.4.0/lib/aws-sdk-s3/client.rb:3980:in `list_parts'

The options passed to list_parts must not include “server_side_encryption”. I always forget to remove this parameter.

A good way that I have found to solve this issue is:

      input_opts = {
        bucket:                 bucket,
        key:                    key,
        server_side_encryption: "AES256",
      }

      if defined? mime_type
        input_opts = input_opts.merge({
          content_type: mime_type,
        })
      end

      mpu_create_response = s3.create_multipart_upload(input_opts)

      # ... upload the parts with s3.upload_part ...

      # list_parts does not accept these options, so strip them first
      input_opts.delete_if { |k, _v| %w[content_type server_side_encryption].include?(k.to_s) }

      input_opts = input_opts.merge({
        :upload_id => mpu_create_response.upload_id,
      })

      parts_resp = s3.list_parts(input_opts)

You can see here that I delete values that may have been added so that the final options hash will work with the list_parts call.
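An alternative to deleting the keys that list_parts rejects is to copy only the keys it needs, so an option added to the create call later cannot break list_parts. This is a minimal sketch, assuming Ruby 2.5 or later for Hash#slice; the list_parts_opts helper and the sample values are hypothetical.

```ruby
# Hypothetical helper: build list_parts options from an allowlist of keys
# instead of deleting the keys that list_parts rejects.
def list_parts_opts(input_opts, upload_id)
  input_opts.slice(:bucket, :key).merge(upload_id: upload_id)
end

input_opts = {
  bucket:                 'example-bucket',
  key:                    'backups/example.tar.gz',
  server_side_encryption: 'AES256',
  content_type:           'application/gzip',
}

p list_parts_opts(input_opts, 'example-upload-id')
```

Because slice returns a new hash, the original input_opts is left untouched and can still be reused for other calls.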

Throttling Requests with the Ruby aws-sdk

A common problem of late is having requests throttled when using the ruby aws-sdk gem to access AWS services. Handling these exceptions is fairly trivial with a while loop like the following:

retry_count   = 0
retry_success = 0

while retry_success == 0
  retry_success = 1
  begin

    # enter code to interact with AWS here

  rescue Aws::APIGateway::Errors::TooManyRequestsException => tmre

    # note that different AWS services have different exceptions
    # for this type of response, be sure to check your error output

    sleep_time = (2 ** retry_count)
    retry_success = 0
    sleep sleep_time
    retry_count = retry_count + 1
  end
end


Note that different services raise different exceptions to indicate throttling, so be sure to check the output you receive or the service documentation to find the right exception to handle. Additional exceptions should also be handled for bad requests and for missing, duplicate, unavailable, or malformed objects.
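The loop above retries indefinitely, and every client backs off by exactly the same amount. A minimal sketch of a bounded variant with random jitter added to the exponential delay follows; ThrottledError and with_backoff are hypothetical names, and in real code the errors: list would hold the service-specific throttling exception. It is also worth knowing that aws-sdk clients retry throttling errors on their own by default, up to the client’s retry_limit option, so a loop like this mostly matters once those built-in retries are exhausted.

```ruby
# Hypothetical exception standing in for a service-specific throttling error
# such as Aws::APIGateway::Errors::TooManyRequestsException.
class ThrottledError < StandardError; end

# Calls the block, retrying on the given error classes with exponential
# backoff plus random jitter, and re-raises after max_retries retries.
# sleeper is injectable so the logic can be exercised without waiting.
def with_backoff(errors: [ThrottledError], max_retries: 5, base: 1.0,
                 sleeper: ->(s) { sleep(s) })
  retries = 0
  begin
    yield
  rescue *errors
    raise if retries >= max_retries
    delay = base * (2**retries)
    sleeper.call(delay + rand * delay)  # between 1x and 2x the base delay
    retries += 1
    retry
  end
end

# Usage (the error class depends on the service being called):
# with_backoff(errors: [Aws::APIGateway::Errors::TooManyRequestsException]) do
#   # enter code to interact with AWS here
# end
```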

Multipart uploads to s3 using aws-sdk v2 for ruby…

The Ruby guys over at AWS have done a great job of explaining file uploads to S3, but they left out how to perform multipart uploads, setting them aside as “advanced use cases.”

Before getting started:


  • identify an S3 bucket to upload a file to — use an existing bucket or create a new one
  • create or identify a user with an access key and secret access key
  • install version 2 of the aws-sdk gem
  • this example uses ruby 2.2.0, other versions may not be supported or work the same way

A multipart upload consists of four steps: set up a client connection to S3, call create_multipart_upload, make one or more calls to upload_part, and finally, if all works out, call complete_multipart_upload. Otherwise, the final step is a call to abort_multipart_upload.
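Put together, the four steps can be sketched as a single method. This is a minimal sketch, not the script built in this article: multipart_upload is a hypothetical name, it works with any object that responds to the aws-sdk S3 client methods named above (which keeps it testable without AWS), and it collects each part’s ETag from the upload_part responses instead of calling list_parts. Keep in mind that S3 requires every part except the last to be at least 5 MiB.

```ruby
# Minimal sketch of the four steps. `s3` is anything responding to the
# aws-sdk S3 client methods used here; part ETags are collected from the
# upload_part responses rather than from list_parts.
def multipart_upload(s3, bucket:, key:, io:, part_size:)
  upload_id = s3.create_multipart_upload(bucket: bucket, key: key).upload_id
  parts = []
  begin
    part_number = 1
    until io.eof?
      resp = s3.upload_part(bucket: bucket, key: key, upload_id: upload_id,
                            part_number: part_number, body: io.read(part_size))
      parts << { etag: resp.etag, part_number: part_number }
      part_number += 1
    end
    s3.complete_multipart_upload(bucket: bucket, key: key, upload_id: upload_id,
                                 multipart_upload: { parts: parts })
  rescue StandardError
    # Abort so the parts already uploaded stop accruing storage charges
    s3.abort_multipart_upload(bucket: bucket, key: key, upload_id: upload_id)
    raise
  end
end
```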

Before starting the upload, I will first define a constant for the file size above which the upload uses multipart rather than a single put_object call. The same constant is also used as the size of each part:

PART_SIZE = 100 * 1024 * 1024 # 100MB
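S3 also limits an upload to 10,000 parts and requires every part except the last to be at least 5 MiB, so a fixed 100MB part size tops out at roughly 1 TB. A sketch of choosing a part size that respects both limits, assuming a hypothetical helper named part_size_for:

```ruby
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS     = 10_000           # S3 maximum number of parts per upload

# Hypothetical helper: the smallest part size that is at least `preferred`
# and MIN_PART_SIZE while keeping the upload within MAX_PARTS parts.
def part_size_for(file_size, preferred = 100 * 1024 * 1024)
  needed = (file_size + MAX_PARTS - 1) / MAX_PARTS  # ceiling division
  [preferred, MIN_PART_SIZE, needed].max
end

puts part_size_for(1024**3)      # 1 GiB file, prints 104857600
puts part_size_for(2 * 1024**4)  # 2 TiB file, prints 219902326
```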

Next, reopen the File class and add a method that yields the file one part at a time:

class File
  # Yields successive chunks of up to part_size bytes until end of file
  def each_part(part_size=PART_SIZE)
    yield read(part_size) until eof?
  end
end

The rest of this script assumes that four or five arguments were passed to the script on the command line. We pass all of these in so that we can keep credentials out of source control and use this script with any bucket and any object. The optional ‘prefix’ parameter is prepended to the key, which makes it easy to organize objects into a directory-like structure.

(access,secret,bucket,localfile,prefix) = ARGV

Next, we need to establish an authenticated client connection to S3:

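require 'aws-sdk'   # version 2 of the aws-sdk gem

s3 = Aws::S3::Client.new(
  region:      'us-east-1',
  credentials: Aws::Credentials.new(access, secret)
)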

This will establish a connection to the us-east-1 region using the access key and secret access key variables listed.

Next, I will set up the key with any path information removed and the optional prefix prepended:

  filebasename = File.basename(localfile)

  if prefix
    key = prefix + "/" + filebasename
  else
    key = filebasename
  end
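In Ruby an empty string is truthy, so an empty prefix in the code above would produce a key like "/file.tar.gz", and a prefix with a trailing slash would produce "backups//file.tar.gz"; S3 treats both as keys distinct from the intended one. A minimal sketch of a more forgiving version, with s3_key as a hypothetical helper name:

```ruby
# Hypothetical helper: basename of localfile, prefixed with `prefix` if it
# is non-empty, with leading/trailing slashes on the prefix removed.
def s3_key(localfile, prefix = nil)
  name    = File.basename(localfile)
  cleaned = prefix.to_s.gsub(%r{\A/+|/+\z}, '')
  cleaned.empty? ? name : "#{cleaned}/#{name}"
end

puts s3_key('/tmp/data/file.tar.gz', 'backups/')  # prints: backups/file.tar.gz
puts s3_key('/tmp/data/file.tar.gz')              # prints: file.tar.gz
```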

The next step is to open the file and determine whether or not it’s large enough to warrant a multipart upload:

  File.open(localfile, 'rb') do |file|
    if file.size > PART_SIZE
      puts "File size over #{PART_SIZE} bytes, using multipart upload..."

Next, we use create_multipart_upload to start the upload; it returns an upload_id that is needed to manage the rest of the process. Here bucket and key hold your S3 bucket and the intended name of the file on S3 (aka the key).

      input_opts = {
        bucket: bucket,
        key:    key,
      }

      mpu_create_response = s3.create_multipart_upload(input_opts)

If all worked well there, we can then upload the parts. I like to give some level of progress as the parts are uploaded so let’s add some code to do that as well:

      total_parts = file.size.to_f / PART_SIZE
      current_part = 1 

Next, we can iterate through the file parts as we upload them:

      file.each_part do |part|

        part_response = s3.upload_part({
          body:        part,
          bucket:      bucket,
          key:         key,
          part_number: current_part,
          upload_id:   mpu_create_response.upload_id,
        })

        # optional progress output
        percent_complete = (current_part.to_f / total_parts.to_f) * 100
        percent_complete = 100 if percent_complete > 100
        percent_complete = sprintf('%.2f', percent_complete.to_f)
        puts "percent complete: #{percent_complete}"
        current_part = current_part + 1

      end


This loop also computes the percentage complete and prints a progress line after each part is uploaded. The progress output is optional, but I’ve found my clients really appreciate it.
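An alternative to clamping the percentage is to track bytes sent rather than part numbers; each line then reports the same figure as the loop above, and the last part lands exactly on 100% without a clamp. A small sketch with a hypothetical progress_lines helper, computed from sizes alone so it runs without S3 (in the real loop, sent would grow by part.bytesize):

```ruby
# Hypothetical helper: one progress line per part, based on bytes sent.
def progress_lines(file_size, part_size)
  sent  = 0
  lines = []
  while sent < file_size
    sent += [part_size, file_size - sent].min  # the last part may be short
    lines << format('percent complete: %.2f', sent * 100.0 / file_size)
  end
  lines
end

puts progress_lines(250, 100)
# prints:
# percent complete: 40.00
# percent complete: 80.00
# percent complete: 100.00
```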

Once all of the parts are uploaded, we can finish the multipart upload with a call to complete_multipart_upload:

      input_opts = input_opts.merge({
          :upload_id   => mpu_create_response.upload_id,
      })

      # list_parts returns up to 1,000 parts per call, so very large
      # uploads would need to page through the results
      parts_resp = s3.list_parts(input_opts)

      input_opts = input_opts.merge(
          :multipart_upload => {
            :parts =>
              parts_resp.parts.map { |part|
                { :part_number => part.part_number,
                  :etag        => part.etag }
              }
          }
      )

      mpu_complete_response = s3.complete_multipart_upload(input_opts)

Note that we called list_parts to get some required information (the part number and ETag) about each part. This would allow us to make a multi-threaded upload client, but that would require a different approach from what I have done here, so we will leave that for advanced use cases (ha!).

Finally, be sure to close the if statement with an else branch that calls put_object for files no larger than PART_SIZE, and close the File.open block:

    else
      s3.put_object(bucket: bucket, key: key, body: file)
    end
  end

The script can then be called with the following format:

./upload_file.rb $access_key $secret_key $bucket $upload_file $optional_prefix

That completes the multipart upload. One critical point not covered here: if you do not complete an upload, its parts keep consuming storage that you are paying for, so you need to either abort it or complete it to free up the space. I will share a utility that I use to clean these up in a future article.