Monitoring email flow with MFM

This is a sponsored post by the folks over at EveryCloud.  They have recently developed and released a new tool to help manage and troubleshoot email issues, which is starting to get some traction, especially in Exchange environments.  As a mail admin in a previous life, I can sympathize with the desire for better monitoring tools.  Here’s their post.


Managing mail flow is a challenge for every systems administrator, and the price of a mistake is very high. Any interruption in mail flow can spell disaster for a company, disrupting daily operations and leaving the management team, the IT team and the systems administrator scrambling for solutions.

While there are a number of mail flow solutions on the market, they tend to be quite pricey, making it difficult for systems administrators, especially those who work for small businesses and start-ups, to justify the cost.

For those who do not already know, the folks at EveryCloud have recently launched a free service – Mail Flow Monitor (MFM).  The EveryCloud MFM tool is the only free round-trip mail flow monitor on the market, giving systems administrators the ability to observe their organizations’ email systems 24 hours a day, 7 days a week, 365 days a year, all without spending a penny.

mfm dashboard

Some of the features of Mail Flow Monitor include:

  • A full-featured round trip monitor, with start-to-finish email tracking and monitoring
  • Systems administrators can receive real-time text and email alerts whenever a delay or rejection occurs, sent to your cell phone, your email address or an alternative email address.
  • Timely monitoring means issues can be addressed quickly, before they spiral out of control
  • The system sends a test email every few minutes to a monitoring mailbox on your server. You set up a forward to send the emails back, and the EveryCloud team does the rest.
  • MFM is cloud-based, which means there is nothing to update or manage.
  • MSPs and IT resellers can create an account and manage as many customers as they wish via the EveryCloud Partner Area, all completely free!

When you consider that competing mail flow monitoring solutions generally cost about $30 a month, it is easy to see the savings potential. A $360 annual saving may not seem like much, but since these products are typically priced per domain, the charges can add up quickly. In addition, per-domain charges can make managing a complex IT operation difficult, an extra layer of hard work that systems administrators do not need.

From the smallest startups to the largest multinational corporations, modern businesses live and die on their email. An unexpected email breakdown, significant bottleneck or major failure could make the firm’s email inaccessible and unreliable for hours or even days, and every minute of downtime costs the company money.


Tips for monitoring Rancher Server

Last week I encountered an interesting bug in Rancher that managed to cause some major problems across my Rancher infrastructure.  Basically, the bug was causing the Rancher agent clients to continuously bounce between disconnected, reconnecting and reconnected states, which only manifested itself either after a 12 hour period or after deactivating/activating agents (for example, adding a new host to an environment).  The only way to temporarily fix the issue was to restart the rancher-server container.

With some help, we were eventually able to resolve the issue.  I picked up a few nice lessons along the way and became intimately acquainted with some of the inner workings of Rancher.  Through this experience I learned some tips on how to effectively monitor the Rancher server environment that I would otherwise not have been exposed to, and I would like to share them today.

All said and done, I view this experience as a positive one.  Hitting the bug has not only helped mitigate this specific issue for other users in the future, but also taught me a lot about how Rancher works internally.  If you’re interested in the full story, you can read all of the details of the incident, including steps to reliably reproduce it and how it was ultimately resolved, here.  The bug was specific to Rancher v1.5.1-3, so upgrading to 1.5.4 should fix it if you come across it.

Before diving into the specifics of this post, I just want to give a shout out to the Rancher community, including @cjellik, @ibuildthecloud, @ecliptok and @shakefu.  The Rancher developers, team and community members were extremely friendly and helpful in addressing and fixing the issue.  Between the late-night messages in the Rancher Slack, the many, many logs and the countless hours of debugging and troubleshooting, I just want to say thank you to everyone for the help.  The small things go a long way, and it shows how great the growing Rancher community is.

Effective monitoring

I use Sysdig as the main source of container and infrastructure monitoring.  To accomplish the metric collection, I run the Sysdig agent as a systemd service that starts when a server boots, so when a server dies or a new one is added, the agent comes up automatically and begins shipping metric data to Sysdig Cloud for consumption through the web interface.
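
As a rough sketch, enabling the agent looks something like the following (this assumes the agent package ships a systemd unit named dragent, which is what the standard Sysdig agent install uses; adjust the unit name if yours differs).

sudo systemctl enable dragent   # start the Sysdig agent on boot
sudo systemctl start dragent
sudo systemctl status dragent   # sanity check that the agent is up and shipping metrics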

I have used this data to create custom dashboards which give me a good overview of what is happening in the Rancher server environment (and others) at any given time.

sysdig dashboard

The other important thing I discovered through this process was the role that the Rancher database plays.  For the Rancher HA setup, I am using an externally hosted RDS instance for the Rancher database, and thanks to the metrics in Sysdig I was able to find some interesting correlations while troubleshooting.  For example, if the database gets stressed it can cause other unintended side effects, so I set up some additional monitors and alerts for the database.

Luckily, Sysdig makes the collection of these additional AWS metrics seamless.  Sysdig offers an AWS integration which pulls in CloudWatch metrics and allows you to add them to dashboards and alert on them from Sysdig, which has been very nice so far.

Below are some metrics that are useful for diagnosing and troubleshooting various Rancher server issues.

  • Memory usage % (server)
  • CPU % (server)
  • Heap used over time (server)
  • Number of network connections (server)
  • Network bytes by application (server)
  • Freeable memory over time (RDS)
  • Network traffic over time (RDS)

As you can see, there are quite a few things you can measure with metrics alone.  Often though, this isn’t enough to get the entire picture of what is happening in an environment.

Logs

It is also important to have access to (useful) logs in the infrastructure in order to gain insight into WHY metrics are showing up the way they do and also to help correlate log messages and errors to what exactly is going on in an environment when problems occur.  Docker has had the ability for a while now to use log drivers to customize logging, which has been helpful to us.  In the beginning, I would just SSH into the server and tail the logs with the “docker logs” command but we quickly found that to be cumbersome to do manually.

One alternative to tailing the logs manually is to configure the Docker daemon to automatically send logs to a centralized log collection system.  I use Logstash in my infrastructure with the “gelf” log driver as part of the bootstrap command that runs to start the Rancher server container, but there are other logging systems if Logstash isn’t the right fit.  Here is what the relevant configuration looks like.

...
--log-driver=gelf \
--log-opt gelf-address=udp://<logstash-server>:12201 \
--log-opt tag=rancher-server \
...

Just specify the public address of the Logstash log collector and optionally add tags.  The extra tags make filtering the logs much easier, so I definitely recommend adding at least one.
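
For context, a complete launch command with those options might look something like the following.  The image tag, restart policy and port mapping are assumptions, so adjust them for your environment and substitute your own Logstash address.

docker run -d --restart=unless-stopped --name rancher-server \
  --log-driver=gelf \
  --log-opt gelf-address=udp://<logstash-server>:12201 \
  --log-opt tag=rancher-server \
  -p 8080:8080 \
  rancher/server:v1.5.4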

Here are a few of the Logstash filters for parsing the Rancher logs.  Be aware, though, that it is currently not possible to log full Java stack traces in Logstash using the gelf input.

if [tag] == "rancher-server" {
  mutate { remove_field => "command" }
  grok {
    match => [ "host", "ip-(?<ipaddr>\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3})" ]
  }

  # Various filters for Rancher server log messages
  grok {
    match => [ "message", "time=\"%{TIMESTAMP_ISO8601}\" level=%{LOGLEVEL:debug_level} msg=\"%{GREEDYDATA:message_body}\"" ]
    match => [ "message", "%{TIMESTAMP_ISO8601} %{WORD:debug_level} (?<context>\[.*\]) %{GREEDYDATA:message_body}" ]
    match => [ "message", "%{DATESTAMP} http: %{WORD:http_type} %{WORD:debug_level}: %{GREEDYDATA:message_body}" ]
  }
}

There are some issues open for addressing this, but there does not seem to be much movement on the topic, so if you see stack traces split across a lot of individual messages, that is the reason.

One option to mitigate the stack trace problem is to run a local log collection agent (in a container, of course) on the rancher-server host, such as Filebeat or Fluentd, that can clean up the logs before sending them on to Logstash, Elasticsearch or some other centralized logging system.  This approach has the added benefit of supporting encryption, which GELF does not have (currently).

If you don’t have a centralized logging solution, or just don’t care about shipping the rancher-server logs to one, the easiest option is to tail the logs locally as mentioned previously, using the json-file log driver.  The only additional configuration I would recommend for json-file is turning on log rotation, which can be accomplished with the following options.

...
 --log-driver=json-file \
 --log-opt max-size=100mb \
 --log-opt max-file=2 \
...

Adding these logging options will ensure that the container logs for rancher-server never fill up the disk on the server.
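
A quick way to confirm the options took effect on a running container is to inspect its log configuration (substitute whatever name you gave the rancher-server container).

docker inspect --format '{{ json .HostConfig.LogConfig }}' rancher-server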

Bonus: Debug logs

Additional debug logs can be found inside each rancher-server container.  Since these debug logs are typically not needed in day-to-day operations, they are sort of an easter egg, tucked away in /var/lib/cattle/logs/ inside the rancher-server container.  The easiest way to analyze the logs is to get them off the server and onto a local machine.

Below is a sample of how to do this.

docker exec -it <rancher-server> bash
cd /var/lib/cattle/logs
cp cattle-debug.log /tmp

Then, from the host that the container is running on, you can docker cp the logs out of the container into the host’s working directory.

docker cp <rancher-server>:/tmp/cattle-debug.log .

From here you can either analyze the logs in a text editor available on the server, or you can copy the logs over to a local machine.  In the example below, the server uses ssh keys for authentication and I chose to copy the logs from the server into my local /tmp directory.

 scp -i ~/.ssh/<rancher-server-pem> user@rancher-server:/tmp/cattle-debug.log /tmp/cattle-debug.log

With a local copy of the logs you can examine them in your favorite text editor or upload them elsewhere for further analysis.

Conclusion

With all of our Rancher server metrics flowing into Sysdig Cloud and our logs into Logstash, it is much easier for multiple people to quickly view and analyze what is going on with the Rancher servers.  In HA Rancher environments with more than one rancher-server running, it also makes filtering logs by server or IP much easier.  Since we use two hosts in our HA setup, we can now easily filter the logs for only the server that is acting as the master.

As these container-based environments grow, they also become much more complicated to troubleshoot.  With better logging and monitoring systems in place it is much easier to tell what is going on at a glance, and with the addition of the monitoring solution we can be much more proactive about finding issues earlier and mitigating potential problems faster.


Dockerizing Sentry

I have created a GitHub project with basic instructions for getting started.  You can take a look over there to see how all of this works and to get ideas for your own setup.

I used the following links as reference for my approach to Dockerizing Sentry.

https://registry.hub.docker.com/u/slafs/sentry
https://github.com/rchampourlier/docker-sentry

If you have existing configurations to use, it is probably a good idea to start from those.  You can check my GitHub repo to see what a basic configuration looks like.  If you are starting from scratch, or are using version 7.1.x or above, you can use the “sentry init” command to generate a skeleton configuration to work from.
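
If you go the “sentry init” route, a minimal sketch looks something like this.  It assumes a throwaway Python 2.7 environment with the same pinned Sentry version used in the Dockerfile later in this post, and that sentry init writes its skeleton to ~/.sentry by default (check sentry init --help if yours behaves differently).

pip install sentry==7.7.4
sentry init   # generates a skeleton sentry.conf.py under ~/.sentry to use as a starting point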

For this setup to work you will need the following prebuilt Docker images/containers. I suggest using something simple like docker-compose to stitch the containers together.

  • redis – https://registry.hub.docker.com/_/redis/
  • postgres – https://registry.hub.docker.com/_/postgres/
  • memcached – https://hub.docker.com/_/memcached/
  • nginx – https://hub.docker.com/_/nginx/

NOTE: If you are running this on OS X you may need to do some trickery and grant special permissions at the host (Mac) level, e.g. create the ~/docker/postgres directory and give it the correct permissions (I just used 777 recursively for testing; make sure to lock it down if you put this in production).
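
For reference, the manual version of that workaround might look like the following; the path mirrors the example in the note above, and 777 is only acceptable for local testing.

mkdir -p ~/docker/postgres      # host directory backing the Postgres data volume
chmod -R 777 ~/docker/postgres  # testing only; lock this down for production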

I wrote a little script in my GitHub project that takes care of setting up all of the directories on the host OS that are needed for data to persist.  The script also generates a self-signed cert to use for proxying Sentry through Nginx.  Without the certificate, the statistics pages in the Sentry web interface will be broken.

To run the script, run the following command and follow the prompts.  Also make sure you have docker-compose installed beforehand to run all the needed commands.

sudo ./setup.sh

The certs that get generated are self-signed, so you will see the red lock in your browser.  I haven’t tried it yet, but I imagine using Let’s Encrypt to create the certificates would be very easy.  Let me know if you have had any success generating Nginx certs for Docker containers; I might write a follow-up post.
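
For what it’s worth, a self-signed pair similar to the one the script produces can also be generated by hand with openssl.  The output paths below match the Nginx volume mounts in the compose file later in this post, and the CN is a placeholder; substitute your own hostname.

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout nginx/sentry.key -out nginx/sentry.crt \
  -subj "/CN=sentry.example.com"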

Preparing Postgres

After setting up directories and creating certificates, the first thing necessary to get up and going is to add the Sentry superuser to Postgres (9.4 or later).  To do this, you will need to fire up the Postgres container.

docker-compose up -d postgres

Then to connect to the Postgres DB you can use the following command.

docker-compose run postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'

Once you are logged in to the Postgres container you will need to set up a few Sentry DB related things.

First, create the role.

CREATE ROLE sentry superuser;

And then allow it to login.

ALTER ROLE sentry WITH LOGIN;

Create the Sentry DB.

CREATE DATABASE sentry;

When you are done in the container, \q will drop you out of the psql shell.
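
If you would rather script those statements than type them interactively, something along these lines should also work; it reuses the same linked-container environment variables as the interactive command above, and runs CREATE DATABASE in its own session so it is not wrapped in a transaction.

docker-compose run --rm postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres -c "CREATE ROLE sentry SUPERUSER LOGIN;"'
docker-compose run --rm postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres -c "CREATE DATABASE sentry;"'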

After you’re done configuring the DB components, you will need to “prime” Sentry by running it for the first time.  This will probably take a little while because it also requires building and pulling all the other needed Docker images.

docker-compose build
docker-compose up

If you try to browse to the Sentry URL (e.g. the IP/port of your Sentry container, or the docker-machine IP if you’re on OS X), you will quickly notice errors in the logs and 503s when you hit the site.

Repair the database (if needed)

If this is the first time you have run through the setup, you will need to run the following command to repair the database.

docker-compose run sentry sentry upgrade

The default Postgres database username and password in this setup is sentry.  As part of the setup, the upgrade prompt will ask you to create a new user and password; make a note of what those are.  You will definitely want to change these credentials if you use this outside of a test or development environment.

After upgrading/preparing the database, you should be able to bring up the stack again.

docker-compose up -d && docker-compose logs

Now you should be able to get to the Sentry URL and start configuring.  To manage usernames and passwords, you can visit the /admin URL and set up the accounts.


Next steps

The Sentry server should come up and allow you in, but it will likely need more configuration.  Using the power of docker-compose, it is easy to add in any custom configurations you have.  For example, if you need to adjust Sentry-level configuration, all you need to do is edit ./sentry/sentry.conf.py and then restart the stack to pick up the changes.  Likewise, if you need to make changes to Nginx or Celery, just edit the relevant configuration file and bump the stack using “docker-compose up -d”.
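
As a concrete example, a config change can be as simple as the following.  The service names are the ones from the compose file below, and restarting individual services is just one way to bump the stack; “docker-compose up -d” works as well.

vi ./sentry/sentry.conf.py                        # adjust Sentry settings
docker-compose restart sentry celery celerybeat   # pick up the new config

vi ./nginx/sentry.conf                            # adjust the Nginx proxy config
docker-compose restart nginx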

I have attempted to configure as many sane defaults as possible in the base config to make the configuration steps easier.  You will probably want to check some of the following settings in the sentry/sentry.conf.py file.

  • SENTRY_ADMIN_EMAIL – For notifications
  • SENTRY_URL_PREFIX – This is especially important for getting stats working
  • SENTRY_ALLOW_ORIGIN – Where to allow communications from
  • ALLOWED_HOSTS – Which hosts can communicate with Sentry

If you have SENTRY_URL_PREFIX set up correctly, you should see something like the following when you visit the /queue page, which indicates statistics are working.

Sentry Queue

If you want to set up any kind of email alerting, make sure to check out the mail server settings.

docker-compose.yml example file

The following configuration shows how the Sentry stack should look.  The meat of the logic is in this configuration but since docker-compose is so flexible, you can modify this to use any custom commands, different ports or any other configurations you may need to make Sentry work in your own environment.

# Caching
redis:
  image: redis:2.8
  hostname: redis
  ports:
    - "6379:6379"
  volumes:
    - "/data/redis:/data"

memcached:
  image: memcached
  hostname: memcached
  ports:
    - "11211:11211"

# Database
postgres:
  image: postgres:9.4
  hostname: postgres
  ports:
    - "5432:5432"
  volumes:
    - "/data/postgres/etc:/etc/postgresql"
    - "/data/postgres/log:/var/log/postgresql"
    - "/data/postgres/lib/data:/var/lib/postgresql/data"

# Customized Sentry configuration
sentry:
  build: ./sentry
  hostname: sentry
  ports:
    - "9000:9000"
    - "9001:9001"
  links:
    - postgres
    - redis
    - celery
    - memcached
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"


# Celery
celery:
  build: ./sentry
  hostname: celery
  environment:
    - C_FORCE_ROOT=true
  command: "sentry celery worker -B -l WARNING"
  links:
    - postgres
    - redis
    - memcached
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"

# Celerybeat
celerybeat:
  build: ./sentry
  hostname: celerybeat
  environment:
    - C_FORCE_ROOT=true
  command: "sentry celery beat -l WARNING"
  links:
    - postgres
    - redis
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"

# Nginx
nginx:
  image: nginx
  hostname: nginx
  ports:
    - "80:80"
    - "443:443"
  links:
    - sentry
  volumes:
    - "./nginx/sentry.conf:/etc/nginx/conf.d/default.conf"
    - "./nginx/sentry.crt:/etc/nginx/ssl/sentry.crt"
    - "./nginx/sentry.key:/etc/nginx/ssl/sentry.key"

The Dockerfiles for each of these components are fairly straightforward.  In fact, the same Dockerfile can be used for the Sentry, Celery and Celerybeat services.

Sentry

# Kombu breaks in 2.7.11
FROM python:2.7.10

# Set up sentry user
RUN groupadd sentry && useradd --create-home --home-dir /home/sentry -g sentry sentry
WORKDIR /home/sentry

# Sentry dependencies
RUN pip install \
 psycopg2 \
 mysql-python \
 supervisor \
 # Threading
 gevent \
 eventlet \
 # Memcached
 python-memcached \
 # Redis
 redis \
 hiredis \
 nydus

# Sentry
ENV SENTRY_VERSION 7.7.4
RUN pip install sentry==$SENTRY_VERSION

# Set up directories
RUN mkdir -p /home/sentry/.sentry \
 && chown -R sentry:sentry /home/sentry/.sentry \
 && chown -R sentry /var/log

# Configs
COPY sentry.conf.py /home/sentry/.sentry/sentry.conf.py

#USER sentry
EXPOSE 9000/tcp 9001/udp

# Making sentry commands easier to run
RUN ln -s /home/sentry/.sentry /root

CMD sentry --config=/home/sentry/.sentry/sentry.conf.py start

Since the customized Sentry config is rather lengthy, I will point you to the GitHub repo again.  There are a few values that you will need to provide, but they should be pretty self-explanatory.

Once the configs have all been put into place you should be good to go.  A bonus would be to add an Upstart service that takes care of managing the stack if the server gets rebooted or the containers get stuck in an unstable state.  The configuration is fairly easy to do, and many other guides and posts have been written about how to accomplish it.


Graphite threshold alerting with Sensu

Instrumenting your code to report application level metrics is definitely one of the most powerful monitoring tasks you can accomplish.  It is damn satisfying to get working the first time as well.  Having the ability to look at your application and how it is performing at a granular level can help identify potential issues or bottlenecks but can also give you a greater understanding of how people are interacting with the application at a broad scale.  Everybody loves having these types of metrics to talk about their apps and products so this style of monitoring is a great win for the whole team.

I don’t want to dive into the specifics of WHAT you should monitor here; that will be unique to every environment.  Instead of covering the what and how of instrumenting the code to report specific metrics, I will run through an example of what the process might look like for instrumenting a check and alarm for monitoring and alerting purposes at an operations level.  I am not a developer, so I don’t spend a lot of time thinking about what types of things are important to collect metrics on.  Usually my job is instead to figure out how to monitor and alert effectively, based on the metrics that developers come up with.

Sensu has a great plugin to check Graphite thresholds in their plugin repo.  If you haven’t looked already, take a minute to glance over the options a little bit and see how the plugin works.  It is a pretty simple plugin but has been able to do everything I need it to.

One common monitoring task is to check how long requests are taking.  So in this example, we are querying the Graphite server and reporting a critical status (exit code 2) if the average request response time is more than 7 seconds.

Here is the command you would run manually to check this threshold.  Make sure to download the script if you haven’t already; you can just copy the code directly or clone the repo if you are doing this manually.  If you are using Sensu you can use the sensu_plugin LWRP to grab the script (more on that below).

./check-data -s <servername:port> -t "<graphite query>" -c 7000 -u user -p password
./check-data -s graphite.example.com -t "alias(stats.timer.server.response_time.mean, 'Mean')" -c 7000 -u myuser -p awesomepassword

There are a few things to note.  The -s flag specifies which Graphite server or endpoint to hit, -t specifies the target (the Graphite query to run the check against), -c sets the critical threshold, and -u and -p are used if your Graphite server requires authentication.  If your Graphite instance is public it should probably use auth; if it is internal only, that is less important.  Obviously these are just dummy values, included to give you a better idea of what a real command looks like; use your own values in their place.

The query runs against a statsd metric for the mean response time of a request, which gets recorded from the code (this is the “developers instrumenting their code” part I mentioned).  This check is specific to my environment, so make sure to modify your queries to alert on a useful metric and threshold in your own environment.

Here’s an example of what the graphite graph (rendered in Grafana) looks like.

Sensu Graph

Obviously this is just a sample but it should give you the general idea of what to look for.

If you examine the script, there are a few Ruby gem requirements to get it to run, which you will need to install if you haven’t already: sensu-plugin, json, json-uri and openssl.  You don’t need sensu-plugin if you are just running the check manually, but you WILL need it installed on the Sensu client that will be running the scheduled check.  That can be done manually or with the Sensu Chef cookbook (specifically the parts for enabling Sensu’s embedded Ruby and Ruby gems), which I recommend using anyway if you plan on doing any type of deployment at scale with Sensu.
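
If you are installing the gems by hand on a client, something like the following should work against Sensu’s embedded Ruby; the path shown is the default for Sensu Core packages, so adjust it if yours differs.

sudo /opt/sensu/embedded/bin/gem install sensu-plugin json   # gems used by the check script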

Here is what the Chef code looks like if you use Sensu to deploy this check automatically.

sensu_check "check_request_time" do 
  command "#{node['sensu']['plugindir']}/check-data.rb -s graphite.example.com -t \"alias(stats.timers.server.facedetection.response_time.mean, 'Mean')\" -c 7000 -a 360 -u myuser -p awesomepassword"
  handlers ["pagerduty", "slack"] 
  subscribers ["core"] 
  interval 60 
  standalone true 
  additional(:notification => "Request time above threshold", :occurrences => 5)
end

This should look familiar if you have any background with the Sensu Chef cookbook.  Basically, we are using the sensu_check LWRP to execute the script with the parameters we want, using the pagerduty and slack handlers, which are just fancy ways to pipe out the results of the check.  We are also running it on a 60-second interval as a standalone check, which means it will be executed on the client node (not the Sensu server itself).  Finally, we are saying that after 5 failed checks we want to append a message to the handler output that says exactly what is going wrong.

You can stick this logic in an existing recipe or create a new one that handles your metric threshold checks.  I’m not sure what the best practice is for where to put the check, but I have a recipe that runs standalone threshold checks, and sticking this logic in there seems to work.  Once the logic has been added, you should be able to run chef-client for the new check to get picked up.
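
On the client node, that last step might look something like this.  The check definition path is what the Sensu Chef cookbook typically generates for the resource name above, so treat the exact filename as an assumption.

sudo chef-client                                       # converge so the sensu_check resource is written out
cat /etc/sensu/conf.d/checks/check_request_time.json   # confirm the check definition landed
sudo service sensu-client restart                      # only needed if the cookbook did not already reload the client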



DevOps Conferences

I did a post quite a while ago that highlighted some of the cooler sysadmin- and operations-oriented conferences on my radar at the time.  Since then I have changed jobs and am now in a DevOps-oriented position, so I’d like to revisit the subject and update that list to reflect some of the cool conferences in the DevOps space.

I’d like to start off by saying that even if you can’t make it to the bigger conferences, local groups and meetups are an excellent way to get out and meet other professionals who do what you do.  Local groups are also an excellent way to stay in the loop on what’s current and to learn about what others are doing.  If you are interested in eventually becoming a presenter or speaker, local meetups and groups can be a great way to get started.  There are numerous opportunities and communities (especially in bigger cities); check here for information or to see if there is a DevOps meetup near you.  If there is nothing nearby, start one!  If you can’t find any DevOps groups, look for Linux or developer groups and network from there; DevOps is becoming popular in broader circles.

After you get your feet wet with meetups, the next place to look is conferences that sound interesting to you.  There are about a million different options to choose from: security conferences, developer conferences, server and network conferences, all the way down the line.  I am sticking strictly with DevOps-related conferences because that is currently what I am interested in and know best.

Feel free to comment if I missed any conferences that you think should be on this list.

DevOps Days (Multiple dates)

Perhaps the most DevOps-centric conferences on this list.  They are a great way to meet and network with fellow DevOps professionals.  The space and industry are changing constantly, and staying on top of all of the changes is crucial to being successful.  Another nice thing about DevOps Days is that they are spread out around the country (and the world) and throughout the year, so they are very accessible.  WARNING: DevOps Days are not tied to any one set of DevOps tools but rather to the principles and techniques and how to apply them to different environments.  If you are looking for super in-depth technical talks, this one may not be for you.

ChefConf (March)

The main Chef conference.  Each of the main configuration management tools has a large conference, but I chose to highlight Chef because that’s what we use at my job.  There are lots of good talks with a Chef-centered theme that are also great because the practices can be applied with other tools.  For example, there are many DevOps themes at ChefConf, including continuous integration and deployment topics, how to scale environments, tying different tools together and general configuration management techniques.  Highly recommended for Chef users; feel free to substitute the other big configuration management tool conferences here if Chef isn’t your cup of tea (Salt, Puppet, Ansible).

CoreOS Fest (May)

  • 2015 videos haven’t been posted yet

Admittedly, this is a much smaller, more niche conference, but it is still awesome.  It is the first conference put on by the folks at CoreOS and was designed to help the community keep up with what is going on in the CoreOS and container world.  The venue is pretty small, but the content at this year’s conference was very good.  There were some epic announcements and talks, including Tectonic announcements and Kubernetes deep dives, so if container technology is something you’re interested in, this conference is definitely worth checking out.

Velocity (May)

This one just popped up on my DevOps conference radar.  I have been hearing good things about this conference for a while now but have not had the opportunity to go.  It always has interesting speakers and topics, and a number of DevOps thought leaders show up for this event.  One cool thing about this conference is that there is a variety of topics at any one time, so it offers a nice, wide spectrum of information; for example, there are technical tracks covering different areas of DevOps.

DockerCon (June)

Docker has been growing at a crazy pace, so this seems like the big conference to check out if you are in the container space.  This conference is similar to CoreOS Fest but focuses more heavily on Docker (obviously).  I haven’t had a chance to go to one of these yet, but containers and Docker have so much momentum that they are very difficult to avoid.  Many people also believe that container technologies are the path to the future, so it is a good idea to be as close to the action as you can.

Monitorama (June)

I think this is one of the coolest conferences, but that is probably just because I am so obsessed with monitoring and metrics collection.  Monitoring is one of those topics that isn’t always fun to deal with, but the talks and technologies at this conference actually make me excited about it.  To most, monitoring is a necessary evil, and a lot of the content from this conference can make your life easier in all aspects of monitoring, from new trends and tools to how to correctly monitor and scale infrastructure.  Talks can be technical, but they are well worth it if monitoring is something that interests you.

AWS Re:Invent (November)

This one is a monster.  It is the big conference that AWS puts on every year to announce new products and technologies as well as provide some incredibly helpful technical talks.  I believe it is one of the pricier and more exclusive conferences, but it offers a lot in the way of content and detail.  It has some of the best, most technical discussions I have seen and has been invaluable as a learning resource.  All of the videos from the conference are posted on YouTube, so you can get access to this information for free.  Obviously the content is AWS-related, but I have found it a great way to learn.

Conclusion

Even if you don’t have a lot of time to travel or get out to these conferences, nearly all of them post videos from the event so you can watch them whenever you want.  This is an INCREDIBLE learning resource, and it is FREE.  The only downside to the videos is that you can’t ask questions, but it is easy to find the presenters’ contact info if you feel like reaching out.

That being said, you tend to get a lot more out of attending in person.  The main benefit of going to conferences over watching the videos alone is that you get to meet and talk to others in the space, get a feel for what everybody else is doing and check out cool tools you might otherwise never hear about.  At every conference I attend, I learn about some incredibly useful new tech that I had never heard of, and I always run into interesting people I would otherwise not have the opportunity to meet.

So, if you can, definitely get out to these conferences, meet and talk to people, and get as much out of them as you can.  If you can’t make it, check out the videos afterwards for some really great nuggets of information; they are a great way to keep your skills sharp and current.

If you have any more conferences to add to this list I would be happy to update it!  I am always looking for new conferences and DevOps related events.
