Fix the JenkinsAPI No valid crumb error

If you are working with the Python-based JenkinsAPI library, you might run into the “No valid crumb was included in the request” error. The traceback below will probably look familiar if you’ve run into this issue.

Traceback (most recent call last):
 File "myscript.py", line 47, in <module>
 deploy()
 File "myscript.py", line 24, in deploy
 jenkins.build_job('test')
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/jenkins.py", line 165, in build_job
 self[jobname].invoke(build_params=params or {})
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/job.py", line 209, in invoke
 allow_redirects=False
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/utils/requester.py", line 143, in post_and_confirm_status
 response.text.encode('UTF-8')
jenkinsapi.custom_exceptions.JenkinsAPIException: Operation failed. url=https://jenkins.example.com/job/test/build, data={'json': '{"parameter": [], "statusCode": "303", "redirectTo": "."}'}, headers={'Content-Type': 'application/x-www-form-urlencoded'}, status=403, text=b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n<title>Error 403 No valid crumb was included in the request</title>\n</head>\n<body><h2>HTTP ERROR 403</h2>\n<p>Problem accessing /job/test/build. Reason:\n<pre> No valid crumb was included in the request</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>\n\n</body>\n</html>\n'

It is good practice to harden Jenkins by turning on the “Prevent Cross Site Request Forgery exploits” option in the security settings, so if you see this error it is actually a good sign: the CSRF protection is doing its job.

The Fix

This error threw me off at first, but it didn’t take long to find a quick fix. The jenkinsapi library ships a CrumbRequester class (in jenkinsapi.utils.crumb_requester) that fetches a crumb from Jenkins and attaches it to the requests it sends. You can use the following example as a guideline in your own code.

from jenkinsapi.jenkins import Jenkins
from jenkinsapi.utils.crumb_requester import CrumbRequester

JENKINS_USER = 'user'
JENKINS_PASS = 'pass'
JENKINS_URL = 'https://jenkins.example.com'

# Create a requester that fetches a CSRF crumb and attaches it to each request
crumb = CrumbRequester(username=JENKINS_USER, password=JENKINS_PASS, baseurl=JENKINS_URL)

# Pass the crumb requester to Jenkins so every API call carries the crumb
jenkins = Jenkins(JENKINS_URL, username=JENKINS_USER, password=JENKINS_PASS, requester=crumb)

...

The code looks very similar to creating a normal Jenkins connection object; the only difference is that we create a crumb requester and pass it in, so each request carries a crumb in addition to the username/password combination. Once the crumbed connection object has been created, you can continue writing your Python code as you normally would. If you’re interested in learning more about crumbs and CSRF, the Jenkins documentation on CSRF protection is a good place to start, or just Google for CSRF for more info.
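To see roughly what the crumb requester does for you, here is a minimal sketch using only the standard library: ask Jenkins for a crumb at /crumbIssuer/api/json, then send it back as a header on the POST that triggers the build. This is not jenkinsapi’s code, and the helper names are hypothetical. The sketch keeps cookies between the two requests because newer Jenkins releases tie the crumb to the web session.

```python
import base64
import http.cookiejar
import json
import urllib.request


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def crumb_header(crumb_json):
    """Turn the crumbIssuer JSON into a header, e.g. {'Jenkins-Crumb': '...'}."""
    return {crumb_json["crumbRequestField"]: crumb_json["crumb"]}


def trigger_build(base_url, job, user, password):
    """Fetch a crumb, then POST to /job/<job>/build with the crumb attached."""
    # Share one cookie jar so the crumb and the build request use the same session
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    auth = basic_auth_header(user, password)

    crumb_req = urllib.request.Request(base_url + "/crumbIssuer/api/json", headers=auth)
    with opener.open(crumb_req) as resp:
        crumb = json.load(resp)

    headers = {**auth, **crumb_header(crumb)}
    build_req = urllib.request.Request(
        "{}/job/{}/build".format(base_url, job), data=b"", headers=headers, method="POST")
    with opener.open(build_req) as resp:
        return resp.status  # a 2xx status means Jenkins queued the build
```

Usage would look like trigger_build(JENKINS_URL, 'test', JENKINS_USER, JENKINS_PASS). In practice the CrumbRequester approach above is less code; the sketch is only meant to show where the crumb comes from.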

This issue was slightly confusing/annoying, but I’d rather deal with an extra few lines of code and know that my Jenkins server is secure.


Quicktip: Manage Memory Usage with Supervisord

I have been using Supervisord for process management for quite a while now but had no idea it could manage memory usage (among other things) until just recently.

There is a Python project called Superlance which essentially adds some extra functionality to supervisord for managing processes and memory.  The docs are a little thin so I thought it would be a good idea to highlight some of the functionality for folks that just want a few examples of how it works or can be used in a useful way.

Obviously you will want to have supervisor installed and configured already.  That can be done with pip or via apt-get.  You will also need to make sure you have a proper [unix_http_server] section in your /etc/supervisor/supervisord.conf file.

To install Superlance (on Ubuntu 14.04):

sudo pip install superlance

This will download and install a handful of Python scripts that can then be plugged into Supervisor. Check the Superlance documentation if you are interested in the other plugins.

Then you will need to add a section to your supervisor config so that memmon can manage memory usage:

[eventlistener:memmon]
command=memmon -p <program_name>=3GB
events=TICK_60

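The “-p <program_name>” option corresponds to the name in a [program:<program_name>] section of your supervisor configuration, and the value after the = sign is the memory limit; with events=TICK_60, memmon checks every 60 seconds and restarts the program if it exceeds that limit. There are other options for managing process groups and more advanced use cases, but this should cover most basic scenarios.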
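The decision memmon makes on each tick boils down to parsing the “name=size” value and comparing it with each program’s resident memory. The sketch below covers only that parse-and-compare step; it is not memmon’s source, the function names are hypothetical, and the 1024-based units are an assumption.

```python
import re

# Assumed 1024-based multipliers for the KB/MB/GB suffixes
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_byte_size(text):
    """Turn '3GB', '200MB' or '4096' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]B)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    number, unit = match.groups()
    return int(number) * (UNITS[unit.upper()] if unit else 1)


def parse_program_limit(spec):
    """Split a '-p' style value such as 'myapp=3GB' into (name, bytes)."""
    name, sep, size = spec.partition("=")
    if not sep:
        raise ValueError("expected <program_name>=<size>, got %r" % spec)
    return name, parse_byte_size(size)


def programs_over_limit(rss_by_program, limits):
    """Names whose resident memory (in bytes) is above their configured limit."""
    return sorted(name for name, rss in rss_by_program.items()
                  if name in limits and rss > limits[name])


if __name__ == "__main__":
    limits = dict([parse_program_limit("myapp=3GB")])
    print(programs_over_limit({"myapp": 4 * 1024 ** 3, "worker": 100}, limits))  # ['myapp']
```

Programs without a limit are ignored, which mirrors the idea that memmon only watches the programs you name with -p.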

You will need to reload the supervisor configuration after making your changes. Unfortunately, the supervisor process needs to be fully reloaded:

sudo supervisorctl reload

If you want to check that supervisor picks up the new memmon section before restarting it, you can use reread:

sudo supervisorctl reread

I would suggest reading through the Superlance docs and checking out the other scripts. They add another layer of functionality to supervisord that I didn’t know existed.


Autosnap AWS snapshot and volume management tool

This is my first serious attempt at a Python tool on GitHub. I figured it was about time: I’ve been leveraging open source tools for a long time, so I might as well try to give a little bit back. Please check out the project and leave feedback by emailing, opening a GitHub issue or commenting here. I’d love to see what can be done with this tool; there are lots of bugs to shake out and things to improve. Even better if you have some code you’d like to contribute, this is very much a work in progress!

Here is the project – https://github.com/jmreicha/autosnap.

Introduction

Essentially, this tool is designed to ease the management of the snapshot and volume lifecycle in an AWS environment. I have discovered that snapshots and volumes can be used together to form a simple backup management system, so by using the AWS API to simplify the management of these resources, you can easily manage backups of your AWS data.

While this obviously isn’t a full-blown backup tool, it can do a few handy things, like using tags to create and destroy backups based on custom expiration dates and creating snapshots based on a few other criteria, all managed with tags. Another cool thing about handling backups this way is that snapshots are stored in S3, so you get amazing resiliency as well as dirt cheap storage. Obviously if you have a huge number of servers and volumes your mileage will vary, but this solution should scale into the hundreds, if not thousands, pretty easily. The last big bonus is that you get nice granularity for backups.

For example, if you wanted to keep a week’s worth of backups across all your servers in a region, you would simply use this tool to set an expiration tag of 7 days and voila: you will have rolling backups, based on snapshots, for the previous seven days. You can make the backup schedule fairly granular because the snapshots are tagged down to the hour. It would be easy to get them down to the second if people would find that useful (I could see it mattering for DB snapshots), but for now it is set to the hour.
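The tag-based expiration described above can be sketched in a few lines: stamp each snapshot with an expiry rounded to the hour, then select the snapshots whose expiry has passed. This is a minimal sketch, not autosnap’s code; the tag name EXPIRES_TAG, the timestamp format and the snapshot dictionaries are hypothetical stand-ins for what the AWS API returns (see the project README for the real tag keys).

```python
from datetime import datetime, timedelta

# Hypothetical tag key and format; all times are assumed to be UTC
EXPIRES_TAG = "expires"
TAG_FORMAT = "%Y-%m-%d-%H"  # hour granularity, as described above


def expiration_tag(created, retention_days):
    """Expiry value to tag a new snapshot with, truncated to the hour."""
    return (created + timedelta(days=retention_days)).strftime(TAG_FORMAT)


def expired_snapshots(snapshots, now):
    """IDs of managed snapshots whose expiry tag is at or before `now`."""
    expired = []
    for snap in snapshots:
        tag = snap["tags"].get(EXPIRES_TAG)
        if tag is None:
            continue  # unmanaged snapshot: never touch it
        if datetime.strptime(tag, TAG_FORMAT) <= now:
            expired.append(snap["id"])
    return expired
```

Skipping snapshots without the tag is the important design choice: only snapshots the tool created and tagged are ever candidates for deletion.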

The one drawback is that this needs to be run on a daily basis, so you would need to add it to a cron job or some other tool that runs tasks periodically. It’s not really a drawback so much as a side note to be aware of.

Configuration

There is a tiny bit of overhead to get started, so I will show you how to get going.  You will need to either set up a config file or let autosnap build you one.  By default, autosnap will help create one the first time you run it, so you can use this command to build it:

autosnap

If you would like to provide your own config, create a file called ‘.config’ in the base directory of this project. Check the README on the GitHub page for the config variables and any clarifications you may need.

Usage

Use the --help flag to get a feel for some of the functions of this tool.

$ autosnap --help

usage: autosnap [--config] [--list-vols] [--manage-vols] [--unmanage-vols]
 [--list-snaps] [--create-snaps] [--remove-snaps] [--dry-run]
 [--verbose] [--version] [--help]

optional arguments:
 --config          create or modify configuration file
 --list-vols       list managed volumes
 --manage-vols     manage all volumes
 --unmanage-vols   unmanage all volumes
 --list-snaps      list managed snapshots
 --create-snaps    create a snapshot if it is managed
 --remove-snaps    remove a snapshot if it is managed
 --version         show program's version number and exit
 --help            display this help and exit

The first thing you will need to do is let autosnap manage the volumes in a region:

autosnap --manage-vols

This command simply adds some tags to help with the management of the volumes. Next, you can see which volumes got picked up and are now being managed by autosnap:

autosnap --list-vols

To take a snapshot of all the volumes that are being managed:

autosnap --create-snaps

And you can take a look at your snapshots:

autosnap --list-snaps

Just as easily you can remove snapshots older than the specified expiration date:

autosnap --remove-snaps

There are some other useful features and flags but the above commands are pretty much the meat and potatoes of how to use this tool.
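Since the tool needs to run daily, the two core commands can be wrapped in a small script for cron to call. This is a sketch under assumptions: the script itself is hypothetical, and it assumes the --dry-run flag listed in the usage line applies to both --create-snaps and --remove-snaps.

```python
import subprocess
import sys


def daily_commands(dry_run=False):
    """The two autosnap calls a daily job would make, in order."""
    extra = ["--dry-run"] if dry_run else []
    return [
        ["autosnap", "--create-snaps"] + extra,  # snapshot every managed volume
        ["autosnap", "--remove-snaps"] + extra,  # then prune expired snapshots
    ]


def run_daily(dry_run=False):
    for cmd in daily_commands(dry_run):
        # check=True stops the run if a step fails, so old snapshots are not
        # pruned when creating new ones did not succeed
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_daily(dry_run="--dry-run" in sys.argv[1:])
```

Creating before removing means a failed run never leaves you with fewer backups than before. A crontab entry such as 0 2 * * * /usr/bin/python3 /path/to/autosnap_daily.py (path and schedule hypothetical) would then run it nightly.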

Conclusion

I know this is not going to be super useful for everybody, but it is definitely a nice tool to have if you work with AWS volumes and snapshots on a semi-regular basis. As I said, this can easily be improved, so I’d love to hear what kinds of things to add or change to make this a great tool. I hope to start working on some more interesting projects and tools in the near future, so stay tuned.


Cloud Backup Tutorial

I have been knee-deep in backups for the past few weeks, but I think I can finally see light at the end of the tunnel. What looked like a simple enough idea to implement turned out to be a much more complicated task to accomplish. I don’t know why, but there seems to be practically no information at all out there covering this topic. Maybe it’s just because backups suck? Either way, they are extremely important to the vitality of a company, and without a workable set of backups you are screwed if something happens to your data. So today I am going to write about managing cloud data and cloud backups and hopefully shine some light on this seemingly foreign topic.

Part of being a cloud-based company means dealing with cloud-based storage. Some of the terms involved are slightly different from the standard backup and storage terminology. Things like buckets, object-based storage, S3, GCS and boto all come to mind when dealing with cloud-based storage and backups. It turns out that there are a handful of tools out there for dealing with our storage requirements, which I will be discussing today.

The Google and Amazon APIs are nice because they allow third parties to build tools that manage storage outside of the official, standard tools. In my journey to find a solution I ran across several workable tools that I would like to mention. The end goal of this project was to sync a massive amount of files and data from S3 storage to GCS. I found that the following tools each met at least some of my requirements, and each has its own set of uses. They are included here in no particular order:

  • duplicity/duply – This tool works with S3 for small scale storage.
  • Rclone – This one looks very promising, supports S3 to GCS sync.
  • aws-cli – The official command line tool supported by AWS.

S3cmd – This was the first tool that came close to doing what I wanted. It’s a really nice tool for smallish numbers of files, has a number of handy features and is capable of syncing S3 buckets. Unfortunately, the way it is designed does not allow for reading and writing a large number of files, so it is best suited to smaller sets of data.

s3s3mirror – This is an extremely fast copy tool written in Java and hosted on GitHub. This thing is awesome at copying data quickly: it was able to copy about 6 million files in a little over 5 hours the other day. One extremely nice feature is that it has intelligent sync built in, so it knows which files have already been copied over. Even better, it is faster still when it only needs to read (that is, when most files are already in place), so once your initial sync has completed, additional syncs are blazing fast.

This is a jar file so you will need to have Java installed on your system to run it.

sudo apt-get install default-jre-headless

Then you will need to grab the code from GitHub.

git clone git@github.com:cobbzilla/s3s3mirror.git

And to run it from inside the cloned directory:

cd s3s3mirror
./s3s3mirror.sh first-bucket/ second-bucket/

That’s pretty much it. There are some handy flags, but this is the main command. There is an -r flag for changing the retry count, a -v flag for verbosity and troubleshooting, and a --dry-run flag to see what will happen.

The only downside of this tool is that it only seems to support S3 at this point, although the source is posted on GitHub, so it could easily be adapted to work with GCS, which is something I am actually looking at doing.
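The “intelligent sync” idea behind s3s3mirror (and gsutil’s rsync) comes down to comparing two listings and copying only what is missing or different. The sketch below is not either tool’s code; the function name and the (size, checksum) listing format are hypothetical, and fetching the listings from the S3 or GCS APIs is left out.

```python
def keys_to_copy(source, destination):
    """Keys that are missing from destination or differ from source.

    Both arguments map object key -> (size_in_bytes, checksum).
    """
    return sorted(key for key, meta in source.items()
                  if destination.get(key) != meta)


if __name__ == "__main__":
    src = {"a.txt": (10, "x"), "b.txt": (20, "y")}
    dst = {"a.txt": (10, "x")}
    print(keys_to_copy(src, dst))  # ['b.txt']
```

Comparing checksums across providers is not always that simple (S3 ETags for multipart uploads are not plain MD5s), so a real tool may fall back to sizes or timestamps; once most keys match, a sync mostly reads listings, which is why repeat runs are so fast.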

Gsutil – The Python command line tool created and developed by Google. This is the most powerful tool that I have found so far. It has a ton of command line options and the ability to communicate with other cloud providers, and it is open source and under active development and maintenance. Gsutil is scriptable and has code for dealing with failures: it can retry failed copies, supports resumable transfers, and can check which files and directories already exist for scenarios where synchronizing buckets is important.

The first step to using gsutil after installation is to run through the configuration with the gsutil config command. Follow the instructions to link gsutil with your account. After the initial configuration has been run, you can modify or update all the gsutil goodies by editing the config file, which lives in ~/.boto by default. One config change worth mentioning is parallel_process_count and parallel_thread_count. These control how much data can get shoved through gsutil at once, so on really beefy boxes you can crank these numbers up quite a bit higher than their defaults. To use the parallel processing, you simply need to set the -m flag on your gsutil command:

gsutil -m rsync -r gs://source-bucket gs://destination-bucket
gsutil -m cp -r gs://source-bucket/* gs://destination-bucket

One very nice feature of gsutil is its built-in support for AWS and S3 storage. To enable this, add your AWS access key ID and secret access key to the [Credentials] section of your ~/.boto config file as aws_access_key_id and aws_secret_access_key. After that, you can test the updated config by listing your buckets that live on S3:

gsutil ls s3://
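If you set up several machines, the two ~/.boto edits described above (S3 credentials and parallelism) can be scripted. A minimal sketch with configparser, assuming the standard [Credentials] and [GSUtil] sections; the update_boto function is hypothetical. ConfigParser does not preserve the comments that gsutil config writes, so back up the file first.

```python
import configparser
import os


def update_boto(path, aws_key_id, aws_secret, processes=None, threads=None):
    """Set S3 credentials and (optionally) gsutil parallelism in a .boto file."""
    # interpolation=None leaves any '%' characters in existing values alone
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is simply skipped
    for section in ("Credentials", "GSUtil"):
        if not config.has_section(section):
            config.add_section(section)

    config.set("Credentials", "aws_access_key_id", aws_key_id)
    config.set("Credentials", "aws_secret_access_key", aws_secret)
    if processes is not None:
        config.set("GSUtil", "parallel_process_count", str(processes))
    if threads is not None:
        config.set("GSUtil", "parallel_thread_count", str(threads))

    with open(path, "w") as fh:
        config.write(fh)
    os.chmod(path, 0o600)  # the file holds secrets; keep it private


# Example (placeholder values):
# update_boto(os.path.expanduser("~/.boto"), "YOUR_KEY_ID", "YOUR_SECRET",
#             processes=8, threads=10)
```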

So your final command to sync an S3 bucket to Google Cloud would look similar to the following:

gsutil -m cp -R s3://bucket-name gs://bucket-name

Notice the -R flag, which makes the copy recursive, so everything in one bucket is copied to the other rather than just a single level.

There is one final tool I’d like to cover. It isn’t a command line tool, but it turns out to be incredibly useful for copying large sets of data from S3 into GCS: the GCS Online Import tool. Fill out the interest form, and after a little while you should hear from somebody at Google about setting up and using your new account. It is free to use. Once you have been approved, you will need to provide a little bit of information to set up sync jobs: your AWS ID and key, as well as permission for your Google account to sync the data. It is all very straightforward, and if you have any questions the support is excellent. This tool saved me from having to sync my S3 storage to GCS manually, which would have taken at least 7 days (and that was even with a monster EC2 instance).

Ultimately, the tools you choose will depend on your specific requirements.  I ended up using a combination of s3s3mirror, AWS bucket versioning, the Google cloud import tool and gsutil.  But my requirements are probably different from the next person and each backup scenario is unique so a combination of these various tools allows for flexibility to accomplish pretty much all scenarios.  Let me know if you have any questions or know of some other tools that I have failed to mention here.  Cloud backups are an interesting and unique challenge that I am still mastering so I would love to hear any tips and tricks you may have.


Getting Python Fabric setup in Windows

This has really turned into a wild goose chase.  Initially my goal when I set out on this project was simply to get Fabric up and running so I could test out some different features on some network gear.  It seems like the Python integration in Windows is very different than it is in the Linux world where everything is all bundled up nice and neatly.  There are several separate, seemingly unrelated pieces that all need to fit together to get Python and Fabric working correctly in a Windows environment, which can be very perplexing at first, hence my need to write a post so I don’t have to remember all this complexity for next time.  I thought I might as well show people how I got this to work instead of picking and choosing different bits of information from the internet.

The following is a list of links that I found helpful in getting everything up and going; flip back here for the different resources and components:

There are a few steps to getting up and running. For basic Python functionality it should be enough to download and install Python with the standard Windows installer; accepting the defaults should be fine. I recommend going with Python 2.7 rather than 3.3, because far more libraries (Fabric included) support it. You will also want to double-check that you download the correct version for your OS, either 32-bit or 64-bit.
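Binary installers (such as the PyCrypto ones mentioned later) have to match the bitness of the Python interpreter, not of Windows itself: a 32-bit Python on 64-bit Windows needs 32-bit packages. A small sketch to check what you actually installed; it runs on both Python 2.7 and 3.x.

```python
from __future__ import print_function  # lets this also run on Python 2.7

import struct
import sys


def interpreter_info():
    """(major, minor, bits) for the Python interpreter running this script."""
    bits = struct.calcsize("P") * 8  # pointer size: 4 bytes -> 32-bit, 8 -> 64-bit
    return sys.version_info[0], sys.version_info[1], bits


if __name__ == "__main__":
    major, minor, bits = interpreter_info()
    print("Python %d.%d, %d-bit" % (major, minor, bits))
```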

Once you have your Python install up and going you will want to get pip installed. You will use this tool to get Python modules because it aids tremendously with downloading, managing and installing useful Python code.

So to get up and running with pip, first make sure that your Python and pip installers are matched for your environment; for example, the 2.7 pip installer will not work with a 3.3 Python installation. Second, you will need to make sure you have the Distribute package installed in your Python environment as well, since this is what allows pip to work. Once you have these modules installed, you will need to switch to the directory where pip is installed (or add it to your PATH environment variable). For me it was located here:

C:\Python27\Scripts\pip.exe

So the command to install Fabric would be as follows:

pip.exe install fabric

You would think that’s all you need to get Fabric working, right? Well, it turns out that using this method we do not have the correct version of PyCrypto installed.


So using the link posted above, go ahead and download and install the correct version of PyCrypto (version 2.1.0). That still doesn’t fix it, though! It just gets us to a different error. I used this post and this post as guides for getting the correct version of PyCrypto installed on the Windows machine.

Okay, so now we should have a fully functioning Python environment with Fabric installed.  The only main issue that remains at this point (to my knowledge at least) is that pip still doesn’t work quite right when attempting to install various Python packages.  To get that part working you will need MinGW32 installed (reference above for links).  But that is basically out of the scope of this post, I will write another post about it if there is any interest or you can ask me if you have issues as always.
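Before moving on to the network gear, a quick check that the pieces import cleanly can save some head scratching. A minimal sketch, assuming Fabric 1.x (whose entry point is fabric.api) and PyCrypto (which installs as the Crypto package); the check_modules helper is hypothetical.

```python
from __future__ import print_function  # lets this also run on Python 2.7

import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_modules(["fabric.api", "Crypto"]).items()):
        print("%-12s %s" % (name, "OK" if ok else "MISSING or broken"))
```

Importing fabric.api also pulls in Fabric’s SSH dependencies, so a bad PyCrypto build usually shows up here rather than halfway through a deployment.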

The only other piece left is to get Fabric up and going with our Cisco gear. Take a look at the docs to get acquainted with basic Fabric usage; it is fairly straightforward for the most part.

One thing I was not aware of was how Cisco CLI and devices would behave when using Fabric to control them remotely. I was having issues with Fabric flaking out whenever I went into config mode on a Cisco switch. It turns out that when you enter config mode you are essentially dropped into a new shell, and Fabric doesn’t have a nice way to deal with that. So something like this will bomb out:

from fabric.api import run

def test():
    # Fails: after "conf t" the switch drops into a new config shell
    run("conf t", shell=False)
    run("int 1/0/1", shell=False)
    run("no shut", shell=False)
    run("exit", shell=False)

The “conf t” command opens your new shell, and the Cisco gear freaks out because it doesn’t know what to do with the next command. I should also mention that shell=False is somewhat unrelated to this issue; it just stops Fabric from wrapping each command in bash, its default shell. The workaround? Use the open_shell command in Fabric and end each command with \n so it is sent as its own line. A sample command using this format would look something like the following:

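from fabric.api import open_shell

def test():
    # Send the whole IOS session as one string; each "\n" submits a line
    open_shell("conf t \n"
               "ip name-server 1.1.1.1 \n"
               "exit \n"
               "exit \n")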

Yeah, this is sort of hacky, and I’m not sure it will be able to do everything I am looking for, but hey, at least it kind of works. I am currently looking for a more robust and easier way around this limitation, so if you have any suggestions let me know.
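To make the open_shell workaround a little less hacky, the session string can be built from a list of config commands. This is a sketch with a hypothetical helper, not part of Fabric. One deliberate change from the example above: it leaves config mode with the IOS end command instead of a first exit, because end returns to privileged EXEC from any sub-mode (such as after int 1/0/1), whereas a single exit would only step back one level.

```python
def ios_config_session(commands):
    """Build one string for Fabric's open_shell(): enter config mode, run the
    commands, return to privileged EXEC with 'end', then 'exit' to log out so
    the shell closes."""
    lines = ["conf t"] + list(commands) + ["end", "exit"]
    return "".join(line + "\n" for line in lines)


# Usage inside a Fabric 1.x task:
# from fabric.api import open_shell
#
# def enable_port():
#     open_shell(ios_config_session(["int 1/0/1", "no shut"]))
```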

Credit goes to markmm on reddit for letting me know about this workaround as well as the people who hang out on the #fabric irc channel on freenode.
