Export SNMP metrics with the Prometheus Operator

There are quite a few use cases for monitoring outside of Kubernetes, especially for previously built infrastructure and otherwise legacy systems.  Additional monitoring adds an extra layer of complexity to your monitoring setup and configuration, but fortunately Prometheus makes this extra complexity easier to manage and maintain, inside of Kubernetes.

In this post I will describe a nice clean way to monitor things that are internal to Kubernetes using Prometheus and the Prometheus Operator.  The advantage of this approach is that it allows the Operator to manage and monitor infrastructure, and it allows Kubernetes to do what it’s good at; make sure the things you want are running for you in an easy to maintain, declarative manifest.

If you are already familiar with the concepts in Kubernetes then this post should be pretty straight forward.  Otherwise, you can pretty much copy/paste most of these manifests into your cluster and you should have a good way to monitor things in your environment that are external to Kubernetes.

Below is an example of how to monitor external network devices using the Prometheus SNMP exporter.  There are many other exporters that can be used to monitor infrastructure that is external to Kubernetes but currently it is recommended to set up these configurations outside of the Prometheus Operator to basically separate monitoring concerns (which I plan on writing more about in the future).

Create the deployment and service

Here is what the deployment might look like.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: snmp-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: snmp-exporter
  template:
    metadata:
    labels:
      app: snmp-exporter
  spec:
    containers:
    - image: oakman/snmp-exporter
    command: ["/bin/snmp_exporter"]
    args: ["--config.file=/etc/snmp_exporter/snmp.yml"]
    name: snmp-exporter
    ports:
    - containerPort: 9116
      name: metrics

And the accompanying service.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: snmp-exporter
  name: snmp-exporter
spec:
  ports:
  - name: http-metrics
    port: 9116
    protocol: TCP
    targetPort: metrics
  selector:
    app: snmp-exporter

At this point you would have a pod in your cluster, attached to a static IP address.  To see if it worked you can check to make sure a service IP was created.  The service is basically what the Operator uses to create targets in Prometheus.

kubectl get sv

From this point you can 1) set up your own instance of Prometheus using Helm or by deploying via yml manifests or 2) set up the Prometheus Operator.

Today we will walk through option 2, although I will probably cover option 1 at some point in the future.

Setting up the Prometheus Operator

The beauty of using the Prometheus Operator is that it gives you a way to quickly add or change Prometheus specific configuration via the Kubernetes API (custom resource definition) and some custom objects provided by the operator, including AlertManager, ServiceMonitor and Prometheus objects.

The first step is to install Helm, which is a little bit outside of the scope of this post but there are lots of good guides on how to do it.  With Helm up and running you can easily install the operator and the accompanying kube-prometheus manifests which give you access to lots of extra Kubernetes metrics, alerts and dashboards.

helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
helm install --name prometheus-operator --set rbacEnable=true --namespace monitoring coreos/prometheus-operator
helm install coreos/kube-prometheus --name kube-prometheus --namespace monitoring

After a few moments you can check to see that resources were created correctly as a quick test.

kubectl get pods -n monitoring

NOTE: You may need to manually add the “prometheus” service account to the monitoring namespace after creating everything.  I ran into some issues because Helm didn’t do this automatically.  You can check this with kubectl get events.

Prometheus Operator configuration

Below are steps for creating custom objects (CRDs) that the Prometheus Operator uses to automatically generate configuration files and handle all of the other management behind the scenes.

These objects are wired up in a way that configs get reloaded and Prometheus will automatically get updated when it sees a change.  These object definitions basically convert all of the Prometheus configuration into a format that is understood by Kubernetes and converted to Prometheus configuration with the operator.

First we make a servicemonitor for monitoring the the snmp exporter.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: snmp-exporter
    prometheus: kube-prometheus # tie servicemonitor to correct Prometheus
  name: snmp-exporter
spec:
  jobLabel: k8s-app
  selector:
    app: snmp-exporter
  namespaceSelector:
    matchNames:
    - monitoring

  endpoints:
  - interval: 60s
    port: http-metrics
    params:
      module:
      - if_mib # Select which SNMP module to use
      target:
      - 1.2.3.4 # Modify this to point at the SNMP target to monitor
    path: "/snmp"
    targetPort: 9116

Next, we create a custom alert and tie it our Prometheus Operator.  The alert doesn’t do anything useful, but is a good demo for showing how easy it is to add and manage alerts using the Operator.

Create an alert-example.yml configuration file, add it as a configmap to k8s and mount it in as a configuration with the ruleSelector label selector and the prometheus operator will do the rest. Below shows how to hook up a test rule into an existing Prometheus (kube-prometheus) alert manager, handled by the prometheus-operator.

kind: ConfigMap
apiVersion: v1
metadata:
 name: josh-test
 namespace: monitoring
 labels:
 role: alert-rules # Standard convention for organizing alert rules
 prometheus: kube-prometheus # tie to correct Prometheus
data:
 test.rules.yaml: |
 groups:
 - name: test.rules # Top level description in Prometeheus
 rules:
 - alert: TestAlert
 expr: vector(1)

Once you have created the rule definition via configmap just use kubectl to create it.

kubectl create -f alert-example.yml -n monitoring

Testing and troubleshooting

You will probably need to port forward the pod to get access to the IP and port in the cluster

kubectl port-forward snmp-exporter-<name> 9116

Then you should be able to visit the pod in your browser (or with curl).

localhost:9116

The exporter itself does a lot more so you will probably want to play around with it.  I plan on covering more of the details of other external exporters and more customized configurations for the SNMP exporter.

For example, if you want to do any sort of monitoring past basic interface stats, etc. you will need to generate and build your own set of MIBs to gather metrics from your infrastructure and also reconfigure your ServiceMonitor object in Kubernetes to use the correct MIBs so that the Operator updates the configuration correctly.

Conclusion

The amount of options for how to use Prometheus is one area of confusion when it comes to using Prometheus, especially for newcomers.  There are lots of ways to do things and there isn’t much direction on how to use them, which can also be viewed as a strength since it allows for so much flexibility.

In some situations it makes sense to use an external (non Operator managed Prometheus) when you need to do things like manage and tune your own configuration files.  Likewise, the Prometheus Operator is a great fit when you are mostly only concerned about managing and monitoring things inside Kubernetes and don’t need to do much external monitoring.

That said, there is some support for external monitoring using the Prometheus Operator, which I want to write about in a different post.  This support is limited to a handful of different external exporters (for the time being) so the best advice is to think about what kind of monitoring is needed and choose the best solution for your own use case.  It may turn out that both types of configurations are needed, but it may also end up being just as easy to use one method or another to manage Prometheus and its configurations.

Read More

Mount a volume using Ignition and Terraform

Sometimes when provisioning a server you may want to configure and provision storage as part of the bootstrapping and booting process.  For example, the other day I ran into an issue where I needed to define a disk, partition it, mount it to a specified location and then create a few directories in it.  It turned out to be surprisingly not straight forward to provision this storage and I learned quite a few things that I thought were worth sharing.

I’d just like to mention that ignition works like magic.  If you aren’t familiar, Ignition is basically a tool to help provision and configure servers, very similar to cloud-config except by default Ignition only runs once, on first boot.  The magic of Ignition is that it injects itself into initramfs before the OS ever eve boots and manipulating the system.  Ignition can be read in from  remote URL so that it can easily be provisioned in bare metal infrastructures.  There were several pieces to this puzzle.

The first was getting down all of the various ignition configuration components in Terraform.  Nothing was particularly complicated, there was just a lot of trial and error to get everything working.  Terraform has some really nice documentation for working with Ignition configurations, I’d recommend starting there and just playing around to figure out some of the various bits and pieces of configuration that Ignition can do.  There is some documentation on Ignition troubleshooting as well which I found to be helpful when things weren’t working correctly.

Below each portion of the Ignition configuration gets declared inside of a “ignition_config” block.  The Ignition configuration then points towards each invidual component that we want Ignition to configure. e.g. systemd, filesystem, directories, etc.

data "ignition_config" "staging_rancher_host_stateful" {
  systemd = [
     "${data.ignition_systemd_unit.mount_data.id}",
  ]

  filesystems = [
    "${data.ignition_filesystem.data_fs.id}",
  ]

  directories = [
    "${data.ignition_directory.data_dir.id}",
  ]

  disks = [
    "${data.ignition_disk.data_disk.id}",
  ]
}

This part of the setup is pretty straight forward.  Create a data block with the needed ignition configuration to mount the disk to the correct location,  format the device if it hasn’t already been formatted and create the desired directory and then create the Systemd unit to configure the mount point for the OS.  Here’s what each of the data blocks might look like.

data "ignition_filesystem" "data_fs" {
   name = "data"

  mount {
    device = "/dev/xvdb1"
    format = "ext4"
  }
}

data "ignition_directory" "data_dir" {
  filesystem = "data"
  path = "/data"
  uid = 500
  gid = 500
}

data "ignition_disk" "data_disk" {
  device = "/dev/xvdb"

  partition {
    number = 1
    start = 0
    size = 0
  }
}

Next, create the Systemd unit.

data "ignition_systemd_unit" "mount_data" {
  content = "${file("./data.mount")}"
  name = "data.mount"
}

Another challenge was getting the Systemd unit to mount the disk correctly.  I don’t work with Systemd frequently so initially had some trouble figuring this part out.  Basically, Systemd expects the service/unit definition name to EXACTLY match what’s declared inside the “Where” clause of the service definition.

For example, the following configuration needs to be named data.mount because that is what is defined in the service.

[Unit]
Description=Mount /data
Before=local-fs.target

[Mount]
What=/dev/xvdb1
Where=/data
Type=ext4

[Install]
WantedBy=local-fs.target

After all the kinks have been worked out of the Systemd unit(s) and other above Terraform Ignition configuration you should be able to deploy this and have Ignition provision disks for you automatically when the OS comes up.  This can be extended as much as needed for getting initial disks  set up correctly and is a huge step in automating your infrastructure in a nice repeatable way.

There is currently an open issue with Ignition currently where it breaks when attempting to re-provision a previously configured disk on a new machine.  Basically the Ignition process chokes because it sees the device has already been partitioned and formatted and can’t do it again.  I ran into this scenario where I was trying to create a basically floating persistent data EBS volume that gets attached to servers in an autoscaling group and wanted to allow the volume to be able to move around freely if the server gets killed off.

Read More

Configure S3 to store load balancer logs using Terraform

If you’ve ever encountered the following error (or similar) when setting up an AWS load balancer to write its logs to an s3 bucket using Terraform then you are not alone.  I decided to write a quick note about this problem because it is the second time I have been bitten by this and had to spend time Googling around for an answer.  The AWS documentation for creating and attaching the policy makes sense but the idea behind why you need to do it is murky at best.

aws_elb.alb: Failure configuring ELB attributes: InvalidConfigurationRequest: Access Denied for bucket: <my-bucket> Please check S3bucket permission
status code: 409, request id: xxxx

For reference, here are the docs for how to manually create the policy by going through the AWS console.  This method works fine for manually creating and attaching to the policy to the bucket.  The problem is that it isn’t obvious why this needs to happen in the first place and also not obvious to do in Terraform after you figure out why you need to do this.  Luckily Terraform has great support for IAM, which makes it easy to configure the policy and attach it to the bucket correctly.

Below is an example of how you can create this policy and attach it to your load balancer log bucket.

data "aws_elb_service_account" "main" {}

data "aws_iam_policy_document" "s3_lb_write" {
    policy_id = "s3_lb_write"

    statement = {
        actions = ["s3:PutObject"]
        resources = ["arn:aws:s3:::<my-bucket>/logs/*"]

        principals = {
            identifiers = ["${data.aws_elb_service_account.main.arn}"]
            type = "AWS"
        }
    }
}

Notice that you don’t need to explicitly define the principal like you do when setting up the policy manually.  Just use the ${data.aws_elb_service_account.main.arn} variable and Terraform will figure out the region that the bucket is in and pick out the correct parent ELB ID to attach to the policy.  You can verify this by checking the table from the link above and cross reference it with the Terraform output for creating and attaching the policy.

You shouldn’t need to update anything in the load balancer config for this to work, just rerun the failed command again and it should work.  For completeness here is what that configuration might look like.

...
access_logs {
    bucket = "${var.my_bucket}"
    prefix = "logs"
    enabled = true
}
...

This process is easy enough but still begs the question of why this seemingly unnecessary process needs to happen in the first place?  After searching around for a bit I finally found this:

When Amazon S3 receives a request—for example, a bucket or an object operation—it first verifies that the requester has the necessary permissions. Amazon S3 evaluates all the relevant access policies, user policies, and resource-based policies (bucket policy, bucket ACL, object ACL) in deciding whether to authorize the request.

Okay, so it basically looks like when the load balancer gets created, the load balancer gets associated with an AWS owned ID, which we need to explicitly give permission to, through IAM policy:

If the request is for an operation on an object that the bucket owner does not own, in addition to making sure the requester has permissions from the object owner, Amazon S3 must also check the bucket policy to ensure the bucket owner has not set explicit deny on the object

Note

A bucket owner (who pays the bill) can explicitly deny access to objects in the bucket regardless of who owns it. The bucket owner can also delete any object in the bucket.

There we go.  There is a little bit more information in the link above but now it makes more sense.

Read More

Organization and time management tricks

As part of my reflection over the past year, one item I decided to focus on was getting better at goal setting and following through with goals.  As a result, I decided to do some research on tools to help organize tasks.  I have been using Kanban style boards for organizing work for many years now, so thought it might be a good fit for organizing my personal life.  And so far, it has been great.  There is a section later in this point that has more details on my current at home Trello flow if anyone is interested.

As I started down this path, I noticed that many areas of organization and time management skills are commonly neglected, both in work and in day to day life at home, so thought I’d share some other tricks I have learned along the way.  Keeping on top of the surrounding chaos is a skill that is acquired, and therefore is something that needs to be worked at regularly.  Just like getting better at anything else, the process requires constant feeding and attention.  My own workflow and style is still constantly evolving, as I still try new tools and techniques regularly.  As always, if you have any good advice or useful tools please feel free to share them.

In addition to my Trello discovery, this post will mostly be a brain dump of some of my thoughts and ideas about the processes and tools that I have come to enjoy using in my daily planning, various workflows, and just my overall approach to time management and tasks.  I realize that everyone is different, with their own styles for breaking down work and organizing their thoughts and ideas so don’t take this post as the best way to do any one thing.  I just want to point out things that I have found to work over the years and hopefully people can take bits and pieces that fit well into their current way of managing the chaos.

For whatever reason, I have always enjoyed organization, attention to detail and meticulous planning.  As such, I have read quite a bit about organization and prioritization, including the book that really kick started my interest in getting organized professionally – Time Management for System Administrators: Stop Working Late and Start Working Smart, which I highly recommend.

The conclusion I have come to is that there isn’t one right way to do things for everybody.  You have to experiment with different approaches until you find a set of tools and workflows that suits your tastes and makes your more productive.  Most importantly, if somebody already has a tool they really like, don’t force them to change the way they do things.  Instead, re-fit your workflow to accommodate.

Connecting the dots

The most important aspect I have found for creating your own style of organization is to experiment and just use what works best for you.  The tools and apps I mention in this post all work well for me.  As an example, one thing that is important to me is that I can access all of my tools, notes, tasks, whatever easily across different devices.  Having the ability to view Trello boards is great when I’m on the go but need a quick reference for something I’m working on.  Likewise, I pretty much only use Keep to track and update my to-do list items or add new events to calendars, but these tools are also available on any computer I use if I need to look at something quickly or update a list.

A number of these tools also integrate with each other, which helps further improve organization.  For example, Google apps and Trello integrate nicely, so Google calendars can be configured to keep track of various Trello boards.  Trello can also hook into Google Drive so that online documents can be attached to cards.  There are a number of other Trello integrations which I haven’t explored yet but look useful, including Zapier, Dropbox, Slack, Outlook, Gmail, etc.  Since Trello is so flexible it can be tailored to work with other tools and workflows.

Simple to-do lists

Simple lists for keeping track of lists and small daily tasks has become invaluable to helping me stay on top of my responsibilities.  I keep these quick lists on my phone in Google Keep and none of them usually ever have more than 5 items at a time.  If I see that the task is more complicated I will move it into something that is easier to organize and prioritize.  For example, I might add a to-do item for fixing the dishwasher.  I have discovered that lists can quickly get cluttered and become messy, which will basically work against you and make you not want to use them.

Therefore, if any task or item in my to-do list takes more than a few hours of messing with and/or parts need to get ordered or prioritized or anything that can get more complicated and take more than a day to do, I will move it off the freshly minted to-do list into Trello.

Trello

As described above, Trello is a great tool for organizing a variety of different activities.  Trello is flexible so can be used for any number of use cases you can imagine, which allows for some great use cases.  As I said, my main objective for using Trello was to more effectively track my goals and tasks as well as those of my family.

In the past I have tried to use documentation and note taking apps to organize and prioritize things I would categorize as “projects” but it has just never stuck very well.  Trello really adds another dimension to organization and prioritization of tasks that you just can’t get with something like OneNote, even though I love OneNote.  These tools just slightly overlap I would say and Trello is just much better suited towards driving work.

My board relies heavily on labels for doing different kinds of filtering between goals and tasks.  If I want my wife to know what’s going on with trip planning for example, I can add a label for her and I can also add her as a subscriber so that she can easily receive updates/changes via email, or if she wants to look at all the progress just open up the card (the mobile version is really good at this).

family trello board

Notes, attachments, lists, etc, all are being used so I am least leveraging some of the other features that make Trello useful.

All said and done, I am very happy with how easy Trello is to use and how easy it is to track progress of my various happenings.  The board I came up with is fairly simple, but I’m looking forward to getting better at it and adding “features” in the future.  I know that Trello can do much more for project and task planning so I will be looking into beefing up my board with some of the integrations.  For now, the bones are there at least.

OneNote

I have written about OneNote and technical documention in the past so won’t cover it in a lot of detail here.  Suffice it to say though, I use this tool on pretty much a daily basis.  Pretty much everything I write down goes into OneNote.  Including all of my notes, documentation, thoughts, journals, etc.  Any time I have an idea or take a note, it goes into OneNote.  I like to have separate notebooks for organizing anything from my personal journal to notes about finance and investing to technical documentation and useful links in my daily work.

OneNote is great at handling things likes screenshots, checklists

One word of wisdom if you are thinking of using a note taking app for organizing notes.  Your notes really need an INTENSE amount love and care to make them useful and easy to reference, which means constantly updating them, moving pages around, as well as deleting and merging pages that are no longer relevant.  The notes are a living and breathing document and are pretty much in a constant state of change.  If you don’t keep your thoughts organized and up to date then things will quickly get unwieldy.

I have used Evernote in the past but strongly prefer OneNote because it matches my organization style more closely and the sharing and syncing features fit well with all my other workflows.

Google Calendar

Getting a good calendar app to keep track of daily time and date commitments is crucial.

I really like Google calendar because it is easy to use, is very portable and cross platform.  For me, everything that has a due date goes into Google calendar.  As an added bonus I can easily share my calendar out to family members as well (assuming they also use Google) and can view their calendars which allows everyone to stay in sync that needs to and helps tremendously with keeping on top of things that are going on, day to day.

There is a fantastic section that covers the topic of using a calendar to stay organized in the Time Management for Systems Administrators book, which I highly recommend if you aren’t really sure how or where to get started with organizing your own calendar.

Honorable mentions

Here are some of the other tools I have found useful as organization and productivity tools but don’t really fit well as time management tools.  If you are interested, feel free to check them out.

Read More

Bash tricks

bash

Update 2/18/18 – add some handy alt shortcuts

Bash is great.  As I have discovered over the years, Bash contains many different layers, like a good movie or a fine wine.  It is fun to explore and expose these different layers and find uses for them.  As my experience level has increased, I have (slowly) uncovered a number of these features of Bash that make life easier and worked to incorporate them in different ways into my own workflows and use them within my own style.

The great thing about fine arts, Bash included, is that there are so many nuances and for Bash, a huge number of features and uses, which makes the learning process that much more fun.

It does take a lot of time and practice to get used to the syntax and to become effective with these shortcuts.  I use this page as a reference whenever I think of something that sounds like it would be useful and could save time in a script or a command.  At first, it may take more time to look up how to use these shortcuts, but eventually, with practice and drilling will become second nature and become real time savers.

Shell shortcuts

Navigating the Bash shell is easy to do.  But it takes time to learn how to do well.  Below are a number of shortcuts that make the navigation process much more efficient.  I use nearly all of the shortcuts daily (except Ctrl + t and Ctrl + xx, which I only recently discovered).  In a similar vein, I wrote a separate post long ago about setting up CLI shortcuts on iterm that can further augment the capabilities of the CLI.

This is a nice reference with more examples and features

  • Ctrl + a => Return to the start of the command you’re typing
  • Ctrl + e => Go to the end of the command you’re typing
  • Ctrl + u => Cut everything before the cursor to a special clipboard
  • Ctrl + k => Cut everything after the cursor to a special clipboard
  • Ctrl + y => Paste from the special clipboard that Ctrl + u and Ctrl + k save their data to
  • Ctrl + t => Swap the two characters before the cursor (you can actually use this to transport a character from the left to the right, try it!)
  • Ctrl + w => Delete the word / argument left of the cursor
  • Ctrl + l => Clear the screen
  • Ctrl + _ => Undo previous key press
  • Ctrl + xx => Toggle between current position and the start of the line

There are some nice Alt key shortcuts in Linux as well.  You can map the alt key in OSX pretty easily to unlock these shortcuts.

  • Alt + l => Uncapitalize the next word that the cursor is under (If the cursor is in the middle of the the word it will capitalize the last half of the word).
  • Alt + u => Capitalize the word that the cursor is under
  • Alt + t => Swap words or arguments that the cursor is under with the previous
  • Alt + . => Paste the last word of the previous command
  • Alt + b => Move backward one word
  • Alt + f => Move forward one word
  • Alt + r => Undo any changes that have been done to the current command

Argument tricks

Argument tricks can help to grow the navigation capabilities that Bash shortcuts provide and can even further speed up your effectiveness in the terminal.  Below is a list of special arguments that can be passed to any command that can be expanded into various commands.

Repeating

  • !! => Repeat the previous (full) command
  • !foo => Repeat the most recent command that starts with ‘foo‘ (e.g. !ls)
  • !^ => Repeat the first argument of the previous command
  • !$ => Repeat the last argument of the previous command
  • !* => Repeat all arguments of last command
  • !:<number> => Repeat a specifically positioned argument
  • !:1-2 => Repeat a range of arguments

Printing

  • !$:p => Print out the word that !$ would substitute
  • !*:p => Print out the previous command except for the last word
  • !foo:p =>Print out the command that !foo would run

Special parameters

When writing scripts , there are a number of special parameters you can feed into the shell.  This can be convenient for doing lots of different things in scripts.  Part of the fun of writing scripts and automating things is discovering creative ways to fit together the various pieces of the puzzle in elegant ways.  The “special” parameters listed below can be seen as pieces of the puzzle, and can be very powerful building blocks in your scripts.

Here is a full reference from the Bash documentation

  • $* => Expand parameters. Expands to a single word for each parameter separated by IFS delimeter – think spaces
  • $@ => Expand parameters. Each parameter expand to a separate word, enclosed by “” –  think arrays
  • $# => Expand the number of parameters of a command
  • $? => Expand the exit status of the previous command
  • $$ => Expand the pid of the shell
  • $! => Expand the pid of the most recent command
  • $0 => Expand the name of the shell or script
  • $_ => Expand the last previous argument

Conclusion

There are some many crevices and cracks of Bash to explore, I keep finding new and interesting things about Bash that lead down new paths and help my skills grow.  I hope some of these tricks give you some ideas that can help and improve your own Bash style and workflows in the future.

Read More