Backup Route 53 zones

We have all heard about DNS catastrophes.  I just read about horror story on reddit the other day, where an Azure root DNS zone was accidentally deleted with no backup.  I experienced a similar disaster a few years ago – a simple DNS change managed to knock out internal DNS for an entire domain, which contained hundreds of records.  Reading the post hit close to home, uncovering some of my own past anxiety, so I began poking around for solutions.  Immediately, I noticed that backing up DNS records is usually skipped over as part of the backup process.  Folks just tend to never do it, for whatever reason.

I did discover, though, that backing up DNS is easy.  So I decided to fix the problem.

I wrote a simple shell script that dumps out all Route53 zones for a given AWS account to a json file, and uploads the zones to an S3 bucket.  The script is a handful lines, which is perfect because it doesn’t take much effort to potentially save your bacon.

If you don’t host DNS in AWS, the script can be modified to work for other DNS providers (assuming they have public API’s).

Here’s the script:

#!/usr/bin/env bash

set -e

# Dump route 53 zones to a text file and upload to S3.

BACKUP_DIR=/home/<user>/dns-backup
BACKUP_BUCKET=<bucket>
# Use full paths for cron
CLIPATH="/usr/local/bin"

# Dump all zones to a file and upload to s3
function backup_all_zones () {
  local zones
  # Enumerate all zones
  zones=$($CLIPATH/aws route53 list-hosted-zones | jq -r '.HostedZones[].Id' | sed "s/\/hostedzone\///")
  for zone in $zones; do
  echo "Backing up zone $zone"
  $CLIPATH/aws route53 list-resource-record-sets --hosted-zone-id $zone > $BACKUP_DIR/$zone.json
  done

  # Upload backups to s3
  $CLIPATH/aws s3 cp $BACKUP_DIR s3://$BACKUP_BUCKET --recursive --sse
}

# Create backup directory if it doesn't exist
mkdir -p $BACKUP_DIR
# Backup up all the things
time backup_all_zones

Be sure to update the <user> and <bucket> in the script to match your own environment settings.  Dumping the DNS records to json is nice because it allows for a more programmatic way of working with the data.  This script can be run manually, but is much more useful if run automatically.  Just add the script to a cronjob and schedule it to dump DNS periodically.

For this script to work, the aws cli and jq need to be installed.  The installation is skipped in this post, but is trivial.  Refer to the links for instructions.

The aws cli needs to be configured to use an API key with read access from Route53 and the ability to write to S3.  Details are skipped for this step as well – be sure to consult the AWS documentation on setting up IAM permissions for help with setting up API keys.  Another, simplified approach is to use a pre-existing key with admin credentials (not recommended).

Read More

Limit Jenkins Multibranch Pipeline Builds

Update: Thanks to Torben Kerr for adding instructions for setting up these properties at the multibranch build level rather then for each Jenkinsfile.  You can check out his “fix” here.

As the Jenkins pipeline functionality continues to rapidly evolve – the project documentation (or lack thereof), has been a consistent pain point as a user. Invariably, the documentation is either out of date or completely missing.  I expect the docs to improve as the project matures, but for now, the cake is a lie.  I ran into this roadblock recently, looking for a way to limit the number of concurrent builds that happen in Jenkins, using the pipeline.  In all of my anguish, I hope this post will help others in avoiding the tediousness of finding the seemingly simple functionality of limiting concurrent builds, as well as give some insight into strategies for figuring out how to find undocumented features in Jenkins.

While this feature is fairly obvious for old-style Jenkins jobs, a simple check box in the job configuration – finding the same functionality for pipelines is seemingly non existent.  Through extensive Googling and Stack Overflowing, I discovered this feature was recently added to the Multibranch plugin.  Specifically, I found an issue in the (awful) issue tracker used by Jenkins, which in turn led me to uncover some code in a semi recent PR that basically allows concurrency to be turned on or off.  Of course when I tried to use the code from the PR it didn’t work right away.  So I had to go deeper.

Eventually, I  stumbled across a SO post that discusses how to use the properties functionality of pipelines.  Equipped with this new piece of information, I finally had enough substance to start playing around with the code.  To make the creation of pipelines easier, Jenkins also recently added a snippet generator, which allows users to build out sample snippets quickly.

To use the snippet generator, either drill into an existing pipeline style job using a similar URL as below:

https://jenkins.example.com/job/<jobname>/pipeline-syntax/

Or create a new job, and click on the “Pipeline Syntax” link after it has been created to test out different snippets.

pipeline syntax

Inside the snippet generator there are a number of “steps” to choose from.  From the information I had already gathered, I just selected the properties step to create the basic skeleton of what I wanted and was able to use the disableConcurrentBuilds() function I found earlier. Below is a snippet of what the code in your Jenkinsfile might actually look like:

node {
 // This oneliner is what limits concurrent builds
 properties([disableConcurrentBuilds()])

 // Do stuff
 ...
}

Yep.  That’s it.  Just make sure to put the properties() function at the beginning of the node block, otherwise concurrency won’t be adjusted right away and could lead to problems.  Another thing to note; the step to disable concurrency could just as easily be moved into workflow libraries and applied at the global level and applied at the beginning of all jobs if you wanted to limit concurrency for all pipeline builds, since the code is just Groovy.  Finally, the code will disable concurrent builds on a per branch basis.  Essentially, if you push many different branches it will still build all of them, it will just limit each branch to one build at a time and will queue up jobs for any commits that get pushed after the initial job has been created.  I know that is a mouthful.  Let me know in the comments if this explanation needs any clarification.

While I love open source software, sometimes project’s move so fast that certain areas of it get neglected.  I am thankful for things like Github, because I was able use it to piece together all the other information I found to come up with a solution.  But, I would argue having good documentation not only saves folks like me the time and energy of the crazy searches, it also makes it much easier for potentially new users to look at, and understand what is going on.  I will be 100% honest and say that Jenkins pipelines are not for the faint of heart, and I’m sure there are many others who will agree with this sentiment.  I know it is easier said than done, but anything right now would be an improvement in my opinion.

Read More

Generate Certbot certificates with a container

This is a little bit of a follow up post to the origin post about generating certs with the DNS challenge.  I decided to create a little container that can be used to generate a certificate based on the newly renamed dehyrdated script with the extras to make DNS provisioning easy.

A few things have changed in the evolution of Let’s Encrypt and its tooling since the last post was written.  First, some of the tools have been renamed so I’ll just try to clear up some of the names if there is any confusion.  The official Let’s Encrypt client has been renamed to Certbot.  The shell script used to provision the certificates has been renamed as well.  What used to be called letsencrypt.sh has been renamed to dehydrated.

The Docker image can be found here.  The image is essentially the dehydrated script with a few other dependencies to make the DNS challenge work, including Ruby, a ruby script DNS hook and a few Gems that the script relies on.

The following is an example of how to run the script:

docker run -it --rm \
    -v $(pwd):/dehydrated \
    -e AWS_ACCESS_KEY_ID="XXX" \
    -e AWS_SECRET_ACCESS_KEY="XXX" \
    jmreicha/dehydrated-dns --cron --domain test.example.com --hook ./route53.rb --challenge dns-01

Just replace test.example.com with the desired domain.  Make sure that you have the DNS zone added to route53 and make sure the AWS credentials used have the appropriate permissions to read and write records on route53 zone.

The command is essentially the same as the command in the original post but is a lot more convenient to run now because you can specify where on your local system you want to dump the generated certificates to and you can also easily specify/update the AWS credentials.

I’d like to quickly explain the decision to containerize this process.  Obviously the dehydrated tool has been designed and written to be a standalone tool but in order to generate certificates using the DNS challenge requires a few extra tidbits to be added.  Cooking all of the requirements into a container makes the setup portable so it can be easily automated on different environments and flexible so that it can be run in a variety of setups, with different domain names and AWS credentials.  With the container approach, the certs could potentially be dropped out on to a Windows machine running Docker for Windows if desired, for example.

tl;dr This setup may be overkill for some, but it has worked out well for my purposes.  Feel free to give it a try if you want to test out creating Certbot certs with the deyhrdated tool in a container.

Read More

Fixing docker-machine shared folder performance issues

It is a known issue that vboxsf (Virtualbox Shared Folders) has performance problems.  This ugly fact becomes a problem if you use docker-machine with the default Virtualbox driver to mount volumes, both on Windows and OS X, especially when mounting directories with a large amount (~17k and above files).  Linux does not suffer from this performance problem since it is able to run Docker natively and does not require you to run docker-machine.

There are various issues floating around Github referencing this problem, most of which remain unresolved.

Unfortunately there is currently not a proper fix for the vboxsf performance issues on OS X and Windows.  In fact, I reached out to the Virtualbox developers around a year ago asking what the deal was and was basically told that fixing shared folder performance was not a high priority issue for their dev team.

After hearing the unsettling news, I set out to find a good way to deal with shared volumes.  I stumbled across a few different approaches to solving the problem but most of them ended up being glitchy (at the time) or overly complicated.  There is a nice write up that mentions many of the tools that I tried myself.

Having tried most of the methods out there, easily the best workaround I have found is to use NFS file shares to mount the “Users” directory using a tool called docker-machine-nfs.  It is easy to install and run and most importantly it just works out of the box, which is exactly what most folks are looking for.

Sadly this method does not work on Windows.  And as far as I can tell there is not a good workaround to this problem if you are running docker-machine on Windows.  It does sounds like some folks maybe have had some success using samba but I have not attempted to get fast volumes working on Windows so can’t say for sure what the best approach is.

To install docker-machine-nfs

curl -s https://raw.githubusercontent.com/adlogix/docker-machine-nfs/master/docker-machine-nfs.sh |
  sudo tee /usr/local/bin/docker-machine-nfs > /dev/null && \
  sudo chmod +x /usr/local/bin/docker-machine-nfs

To run it

Make sure you already have created a docker-machine VM and verify that it is running.  Then run the following command.

docker-machine-nfs <machine-name>

And that’s pretty much it…  It requires admin access to do the NFS mounting so you might need to punch in your password, other than that you can pretty much follow along with what the output is doing.

There are a few caveats to be aware of.

I have noticed that on newer versions of docker-machine, if you run the script too quickly after creating the VM, docker-machine ends up having issues communicating with the Docker daemon.  The work around (for now) is just to wait ~30 seconds the docker-machine VM to boot fully before running the command to mount nfs.

There is also currently an issue on the docker-machine side on version 0.5.5 and above that breaks docker-machine-nfs on the first run, which is described here.  The workaround for that issue is to modify the script and place a “sleep 20” in the script, as per the comments in the issue.  The author appears to have brought the issue up with docker-machine developers so should fixed properly in the near future.

Read More

Autosnap AWS snapshot and volume management tool

This is my first serious attempt at a Python tool on github.  I figured it was about time, as I’ve been leveraging Open Source tools for a long time, I might as well try to give a little bit back.  Please check out the project and leave feedback by emailing, opening a github or issue or commenting here, I’d love to see what can be done with this tool, there are lots of bugs to shake out and things to improve.  Even better if you have some code you’d like to contribute, this is very much a work in progress!

Here is the project – https://github.com/jmreicha/autosnap.

Introduction

Essentially, this tool is designed to ease the management of the snapshot and volume lifecycle in an AWS environment.  I have discovered that snapshots and volumes can be used together to form a simple backup management system, so by simplifying the management of these resources, by utilizing the power of the AWS API, you can easily manage backups of your AWS data.

While this obviously isn’t a full blown backup tool, it can do a few handy things like leverage tags to create and destroy backups based on custom expiration dates and create snapshots based on a few other criteria, all managed with tags.  Another cool thing about handling backups this way is that you get amazing resiliency by storing snapshots to S3, as well as dirt cheap storage.  Obviously if you have a huge number of servers and volumes your mileage will vary, but this solution should scale up in to the hundreds, if not thousands pretty easily.  The last big bonus is that you can nice granularity for backups.

For example, if you wanted to keep a weeks worth of backups across all your servers in a region, you would simply use this tool to set an expiration tag of 7 days and voila.  You will have rolling backups, based on snapshots for the previous seven days.  You can get the backup schedule fairly granular, because the snapshots are tagged down to the hour. It would be easy to get them down to the second if that is something people would find useful, I could see DB snapshots being important enough but for now it is set to the hour.

The one drawback is that this needs to be run on a daily basis so you would need to add it to a cron job or some other tool that runs tasks periodically.  Not a drawback really as much of a side note to be aware of.

Configuration

There is a tiny bit of overhead to get started, so I will show you how to get going.  You will need to either set up a config file or let autosnap build you one.  By default, autosnap will help create one the first time you run it, so you can use this command to build it:

autosnap

If you would like to provide your own config, create a file called ‘.config‘ in the base directory of this project.  Check the README on the github page for the config variables and for any clarifications you may need.

Usage

Use the –help flag to get a feeling for some of the functions of this tool.

$ autosnap --help

usage: autosnap [--config] [--list-vols] [--manage-vols] [--unmanage-vols]
 [--list-snaps] [--create-snaps] [--remove-snaps] [--dry-run]
 [--verbose] [--version] [--help]

optional arguments:
 --config          create or modify configuration file
 --list-vols       list managed volumes
 --manage-vols     manage all volumes
 --unmanage-vols   unmanage all volumes
 --list-snaps      list managed snapshots
 --create-snaps    create a snapshot if it is managed
 --remove-snaps    remove a snapshot if it is managed
 --version         show program's version number and exit
 --help            display this help and exit

The first thing you will need to do is let autosnap manage the volumes in a region:

autosnap --manage-vols

This command will simply add some tags to help with the management of the volumes.  Next, you can take a look and see what volumes got  picked up and are now being managed by autosnap

autosnap --list-vols

To take a snapshot of all the volumes that are being managed:

autosnap --create-snaps

And you can take a look at your snapshots:

autosnap --list-snaps

Just as easily you can remove snapshots older than the specified expiration date:

autosnap --remove-snaps

There are some other useful features and flags but the above commands are pretty much the meat and potatoes of how to use this tool.

Conclusion

I know this is not going to be super useful for everybody but it is definitely a nice tool to have if you work with AWS volumes and snapshots on a semi regular basis.  As I said, this can easily be improved so I’d love to hear what kinds of things to add or change to make this a great tool.  I hope to start working on some more interesting projects and tools in the near future, so stay tuned.

Read More