Getting Python Fabric setup in Windows

This has really turned into a wild goose chase.  Initially my goal when I set out on this project was simply to get Fabric up and running so I could test out some different features on some network gear.  It seems like the Python integration in Windows is very different than it is in the Linux world where everything is all bundled up nice and neatly.  There are several separate, seemingly unrelated pieces that all need to fit together to get Python and Fabric working correctly in a Windows environment, which can be very perplexing at first, hence my need to write a post so I don’t have to remember all this complexity for next time.  I thought I might as well show people how I got this to work instead of picking and choosing different bits of information from the internet.

The following is a list of links that I have found to be helpful in getting everything up and going, flip back to here for the different resources and components:

There’s a few steps for getting up and running.  For basic Python functionality it should be enough to download and install Python via the basic installer in your Windows environment.  Accepting the defaults should be enough.  Also, I recommend going with Python 2.7, rather than 3.3 because it has much better backwards compatibility.  You will also want to double check to make sure you download the correct version for you OS as well, either 32-bit or 64-bit.

Once you have your Python install up and going you will want to get pip installed. You will use this tool to get Python modules because it aids tremendously with downloading, managing and installing useful Python code.

So to get up and running with pip, first make sure that you have the correctly matched version of Python and the pip installed for your environment.  For example the 2.7 pip installer will not work with a 3.3 Python installation.  Second, you will need to make sure you have the Distribute package installed in your Python environment as well.  This is the tool that will allow pip to work.  Once you have these modules installed you will need to switch to the directory where pip is installed (or add it to your ENV path variable).  For me it was located in the following location:

C:\Python27\Scripts\pip.exe

So the command to install Fabric would be as follows:

pip.exe install fabric

You would think that’s all you need to get fabric working right?  Well it turns out that using this method we do not have the correct version of Pycrypto installed.

pycrypto error

So using the link posted above go ahead and get the correct version of Pycrypto downloaded and installed (version 2.1.0).  That still doesn’t fix it though!  It just gets us to a different error.  I used this post and this post as a guide for getting the correct version of Pycrypto installed on the Windows machine.

Okay, so now we should have a fully functioning Python environment with Fabric installed.  The only main issue that remains at this point (to my knowledge at least) is that pip still doesn’t work quite right when attempting to install various Python packages.  To get that part working you will need MinGW32 installed (reference above for links).  But that is basically out of the scope of this post, I will write another post about it if there is any interest or you can ask me if you have issues as always.

The only other piece left then is to get Fabric up and going with our Cisco gear.  Take a look at the docs for basic usage on getting acquainted with Fabric, it is fairly straight forward for the most part.

One thing I was not aware of was the way Cisco CLI and devices would behave when using Fabric to control them remotely.  I was having issues with Fabric flaking out whenever I went into config mode on a Cisco switch.  It turns out that when you enter into config mode you are essentially dropped into a new shell and Fabric doesn’t have a nice way to deal with that.  So something like this will bomb out,

def test():
	run("conf t", shell=False)
	run("int 1/0/1", shell=False)
	run("no shut", shell=False)
	run("exit", shell=False)

The “conf t” command opens your new shell and the Cisco gear freaks out because it doesn’t know what to do with the next command.  I should also mention the shell=False is somewhat unrelated to this issue but it gets around Fabric trying to use bash as its default shell.  The workaround?  Use the open_shell command in Fabric and escape each command by using \n to escape to a new line.  So a sample command using this format would be something like the following,

def test():
	open_shell("conf t \n"
		   "ip name-server 1.1.1.1 \n"
		   "exit \n"
		   "exit \n"
		   )

Yeah this is sort of hacky, and I’m not sure if it will be able to do everything I am looking for but hey at least it kind of works.  I am currently looking for a more robust and easier way around this limitation so if you have any suggestions let me know.

Credit goes to markmm on reddit for letting me know about this workaround as well as the people who hang out on the #fabric irc channel on freenode.

Read More

Document storage: Part 6

Document Storage Project

This is Part 6: Tying it all together.

All that’s left to do now is write a script that will:

  • Detect when a new file’s been uploaded.
  • Turn it into a searchable PDF with OCR.
  • Put the finished PDF in a suitable directory so we can easily browse for it later.

This is actually pretty easy. inotifywait(1) will tell us whenever a file’s been closed, we can use that as our trigger to OCR the document.

Our script is therefore in two parts:

Part 1: will watch the /home/incoming directory for any files that are closed.
Part 2: will be called by the script in part 1 every time a file is created.

Part 1

This script lives in /home/scripts and is called watch-dir.

#!/bin/bash
INCOMING="/home/incoming"
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

inotifywait -m --format '%:e %f' -e CLOSE_WRITE "${INCOMING}"  2>/dev/null | while read LINE
do
        FILE="${INCOMING}"/`echo ${LINE} | cut -d" " -f2-`
        "${DIR}"/process-image "${FILE}" &
done

Part 2

This script lives in /home/scripts and is called process-image.

#!/bin/bash

# Dead easy - at least in theory!
# Take a single argument - filename of the file to process. 
# Do all the necessary processing to make it a 
# searchable PDF.

OUTFILE="`basename "${1}"`"
TEMPFILE="`mktemp`"

if [ -s "${1}" ]
then
	# We use the first part of the filename as a classification.
	CLASSIFICATION=`echo ${OUTFILE} | cut -f1 -d"-"`
	OUTDIR="/home/http/documents/${CLASSIFICATION}/`date +%Y`/`date +%Y-%m`/`date +%Y-%m-%d`"

	if [ ! -d "${OUTDIR}" ]
	then
		mkdir -p "${OUTDIR}" || exit 1
	fi

	# We have to move our file to a temporary location right away because 
	# otherwise pdfsandwich uses the file's own location for 
	# temporary storage. Well and good - but the file's location is 
	# subject to an inotify that will call this script!

	mv "${1}" "${TEMPFILE}" || exit 1

	# Have we a colour or a mono image? Probably quicker to find out 
	# and process accordingly rather than treat everything as RGB.
	# We assume the first page is representative of everything
        COLOURDEPTH=`convert "${TEMPFILE}[0]" -verbose -identify /dev/null 2>/dev/null | grep "Depth:" | awk -F'[/-]' '{print $2}'`
	if [ "${COLOURDEPTH}" -gt 1 ]
	then
		SANDWICHOPTS="-rgb"
	fi
	pdfsandwich ${SANDWICHOPTS} -o "${OUTDIR}/${OUTFILE}" "${TEMPFILE}" > /dev/null 2>&1
	rm "${TEMPFILE}"
fi

There’s just one thing missing: pdfsandwich. This is actually something I found elsewhere on the web. It hasn’t made it into any of the major distro repositories as far as I can tell, but it’s easy enough to compile and install yourself. Find it here.

Run /home/scripts/watch-dir every time we boot – the easiest way to do this is to include a line in /etc/rc.local that calls it:

/home/scripts/watch-dir &

Get it started now (unless you were planning on rebooting):

nohup /home/scripts/watch-dir &

Now you should be able to scan in documents, they’ll be automatically OCR’d and made available on the internal website you set up in part 3.

Further enhancements are left to the reader; suggestions include:

  • Automatically notifying sphider-plus to reindex when a document is added. (You’ll need a newer version of sphider-plus to do this. Unfortunately there is a cost associated with this, but it’s pretty cheap. Get it from here).
  • There is a bug in pdfsandwich (actually, I think the bug is probably in tesseract or hocr2pdf, both of which are called by pdfsandwich): under certain circumstances which I haven’t been able to nail down, sometimes you’ll find that in the finished PDF one page of a multi-page document will only show the OCR’d layer, not the original document. Track down this bug, fix it and notify the maintainer of the appropriate package so that the upstream package can also be fixed.
  • This isn’t terribly good for bulk scanning – if you want to scan in 50 one-page documents, you have to scan them individually otherwise they’ll be treated as a single 50 page document. Edit the script so we can somehow communicate with it that certain documents should be split into their constituent pages and store the resulting PDFs in this way.
  • Like all OCR-based solutions, this won’t give you a perfect representation of the source text in the finished PDF. But I’m quite sure the accuracy can be improved, very likely without having to make significant changes to how this operates. Carry out some experiments to figure out optimum settings for accuracy and edit the scripts accordingly.

Read More

Protip December: Customize your Powershell profile

Powershell is great but is a little boring if anything, out of the box, with its drab white foreground and all.  It isn’t exactly informative either, so I wanted to show everybody a quick trick to customize this look and feel to make things look a little bit cleaner.  Hopefully this introduction will demonstrate one of the many features that makes Powershell a great tool for Windows admins, which is its flexibility.

This customization file, called profile.ps1 can be located in one of two places.

  • The first location is the global location and would be useful when you want all users to have a customized Powershell profile.  This profile should be placed in C:\WINDOWS\system32\WindowsPowerShell\v1.0\Profile.ps1.
  • The second location is for the local profile and would be specific to each user account.  This file overrides the global configuration file and should be placed in C:\Username\My Documents\WindowsPowerShell\Profile.ps1.

By default these files don’t exist so you will have to navigate to the respective directory and create the initial, empty profile.ps1 file.

Once you have the file created just pop this chunk of code into your profile.ps1 file.

function prompt {
	$path = ""
	$pathbits = ([string]$pwd).split("\", [System.StringSplitOptions]::RemoveEmptyEntries)
	if($pathbits.length -eq 1) {
		$path = $pathbits[0] + "\"
	} else {
		$path = $pathbits[$pathbits.length - 1]
	}
	$userLocation = $env:username + '@' + [System.Environment]::MachineName + ' ' + $path
	$host.UI.RawUi.WindowTitle = $userLocation
    Write-Host($userLocation) -nonewline -foregroundcolor Green 

	Write-Host('>') -nonewline -foregroundcolor Green    
	return " "
}

Once you have this bit added you will need to reload Powershell and voila,

Customized powershell look
Here’s our customized Powershell look.

Much better.  As you can see we now have some additional, out of the box information including:

  • Our current username – jreichardt
  • Our computer name – JOSH-TEST
  • Our current directory – separated by the | (pipe) symbol
  • As well as a nice green foreground font to help with the readability.

Pretty simple to get set up but definitely adds a lot to the default look and feel.  Let me know if you have any other cool Powershell customization’s or tricks that you think are worth sharing.

Read More

Project: Document Storage Made Simple

I’ve recently identified a need to setup a means of indexing and browsing documents. This is a great little project to demonstrate how we can tie lots of tools together with Linux, so I’m going to write up what I’m doing as I go along.

The needs are pretty simple, but I haven’t found anything out of the box that suits. In particular, I’ve got the following needs:

  • This is only for a couple of people, it doesn’t need the complexity associated with a full-blown document management system.
  • It does need to index PDFs – including PDFs that have been generated from scanned in pages rather than from an office suite.
  • Scanned PDFs may not have had any sort of OCR process applied to them – but that doesn’t mean I don’t want to be able to search for them!
  • It needs to be able to do this with minimal interaction – as a rule of thumb, if it’s even conceivably possible to automate part of the process, that part of the process must be automated. I can think of better things to do with my time than click “Next…”
  • Must be able to interact with the system via a web browser.
  • Anything running on the public Internet is out – most of the information I’m scanning in has no business being anywhere near the public Internet.
  • Must be dead easy to backup. Anything that involves databases, Tomcat etc. is probably far too complicated.
  • Budget: About £250+VAT for a MFD that has a duplexing scanner unit. Other than that: £0. Most multifunction devices come with software that will OCR scanned files and index them, but further investigation suggests it usually fails the web-based and the “minimal interaction” requirement. I have a spare computer I can use sitting around, but I can’t justify a fortune on software. That may change in the future, but it’s what we’ve got now.

So, here’s the question: Can I do it? Read on….
(more…)

Read More

Make MacVim’s mvim script use tabs and play nice with the command line

tl;dr: Replace the mvim script with this modified version: https://gist.github.com/3780676

MacVim comes with a really sweet script called mvim, which lets you launch MacVim and edit files from the command line. Unfortunately, this script is a little weak in a few ways:

  • It doesn’t let you edit multiple files.
  • It doesn’t let you pass in command line options.
  • It doesn’t let you use new tabs for opening new files into an existing window.
  • It doesn’t let you pipe stdin into vim for viewing (great with diffs).

All those things are awesome, so let’s make the mvim script better! How do we do that?

Well, first we add some extra command line options parsing to detect if we’re in diff mode, if we’re using stdin, and to preserve options for passing back into MacVim later. At Line 60 we add the following:

# Add new flags for different modes
stdin=false
diffmode=false
# Preserve command line options lazily
while [ -n "$1" ]; do
  case $1 in 
  -d) diffmode=true; shift;;
  -?*) opts="$opts $1"; shift;;
  -) stdin=true; break;;
  *) echo "*"; break;; 
  esac
done

This is a pretty normal bash argument getting loop. We look for -d (diff mode) and – (stdin) separate from other arguments. We also need to modify the command that starts MacVim to handle our different modes, etc. So we replace that command (originally on line 69):

# Last step:  fire up vim.
# The program should fork by default when started in GUI mode, but it does
# not; we work around this when this script is invoked as "gvim" or "rgview"
# etc., but not when it is invoked as "vim -g".
if [ "$gui" ]; then
	# Note: this isn't perfect, because any error output goes to the
	# terminal instead of the console log.
	# But if you use open instead, you will need to fully qualify the
	# path names for any filenames you specify, which is hard.
	exec "$binary" -g $opts ${1:+"$@"}
else
	exec "$binary" $opts ${1:+"$@"}
fi

With this better command:

# Last step:  fire up vim.
# The program should fork by default when started in GUI mode, but it does
# not; we work around this when this script is invoked as "gvim" or "rgview"
# etc., but not when it is invoked as "vim -g".
if [ "$gui" ]; then
  # Note: this isn't perfect, because any error output goes to the
  # terminal instead of the console log.
  # But if you use open instead, you will need to fully qualify the
  # path names for any filenames you specify, which is hard.

  # Handle stdin
  if $stdin; then
    exec "$binary" -g $opts -
  elif $diffmode; then
    exec "$binary" -f -g -d $opts $*
  elif $tabs && [[ `$binary --serverlist` = "VIM" ]]; then
    #make macvim open stuff in the same window instead of new ones
    exec "$binary" -g $opts --remote-tab-silent ${1:+"$@"} & 
    wait
  else
    exec "$binary" -g $opts ${1:+"$@"}
  fi
else
  exec "$binary" $opts ${1:+"$@"}
fi

The first two branches are pretty clear – they just invoke the MacVim binary in the correct way, for our different modes. The third one uses the very awesome –remote-tab-silent option, which gives us the ability to reuse the same window with new tabs when we edit multiple files. Neato!

Finally, if you don’t want to do the modifications yourself, it’s available as a gist, so you can download it and use it as a drop-in replacement for the vanilla mvim script: https://gist.github.com/3780676

Read More