Exchange Transport Service won’t start

After an outage this weekend, I’d like to take a minute to describe what happened and how it was resolved.  If you are having trouble starting your Exchange Transport service, you may be running into the same issue I had.  Luckily there is an easy remedy.  The Exchange message queue database had become corrupted, which caused the Transport service to fail.  Because the Transport service wasn’t running, the Edge Sync process was failing, which in turn caused external mail delivery to fail.  Obviously a big issue, since you cannot receive any email from external domains while this is broken.

To troubleshoot this, there are a few obvious signs to look at first.  Start with your disk sizes, which I wrote about in my previous post.  If your disks are full or filling up then you are pretty much dead in the water until you fix the disk issue.  In my scenario disk space was not the problem, so the next place I turned to was the logs.  I found a number of interesting entries in the Windows Application event log that gave me some clues.  I want to detail as many of these messages as I can so that people who are having similar issues know what to look for.

[Screenshots: four Transport error entries from the Windows Application event log]

There are a few possible resolutions to this problem.  One solution I found through some Google searches is to attempt to repair the corruption in the queue database by running it through ESEUTIL.  There is no guarantee this will work and it can take a lot of time, depending on the size of your queue database.  There is some good information here about the mail queue and how it works.

If you decide to repair the database, the mail queue files are in the following directory:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\data\Queue

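Inside this directory is the queue database file, mail.que.  This is the file that you will need to repair.  (The tmp.edb file sitting next to it is only a temporary database that Exchange recreates at startup.)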

The other method is much simpler and was the solution I went with.  Instead of attempting to repair the database corruption, simply stop the Transport service if it is still running, rename the queue folder (which keeps the old files around as a copy) and start the Transport service again.  Doing this forces the Transport service to create a fresh queue database along with all of the accompanying files it needs to get up and running.  It is faster and simpler, IMO.  The only problem with this approach is that any messages that were stuck in the queue when the corruption occurred will be lost.  For me, this was an acceptable loss.  If it isn’t for you, you will probably have to use the first method and attempt to repair the database, or try to work from a shadow copy or backup to get unstuck.
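For anyone who would rather script the rename than do it by hand, here is a minimal sketch of the idea in Python.  The queue path is the one from this post.  The service name `MSExchangeTransport`, the `.old-<timestamp>` suffix and all of the function names are my own assumptions for illustration, so verify them against your own server before running anything.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Path taken from the post; adjust it for your own install.
QUEUE_DIR = Path(r"C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\data\Queue")
# Assumed Windows service name for the Transport service; verify before use.
SERVICE = "MSExchangeTransport"


def backup_name(queue_dir: Path, now: datetime) -> Path:
    """Return a sibling path such as Queue.old-20140105-093000."""
    return queue_dir.with_name(f"{queue_dir.name}.old-{now:%Y%m%d-%H%M%S}")


def set_aside_queue(queue_dir: Path, now: datetime) -> Path:
    """Rename the queue folder so Transport builds a fresh one on next start."""
    target = backup_name(queue_dir, now)
    if target.exists():
        raise FileExistsError(target)
    queue_dir.rename(target)
    return target


def reset_queue(queue_dir: Path = QUEUE_DIR, service: str = SERVICE) -> Path:
    """Stop the service, set the old queue aside, then start the service."""
    # "net stop" exits non-zero when the service is already stopped, which is
    # expected in this scenario, so the result is deliberately not checked.
    subprocess.run(["net", "stop", service], check=False)
    moved = set_aside_queue(queue_dir, datetime.now())
    subprocess.run(["net", "start", service], check=True)
    return moved
```

Calling `reset_queue()` from an elevated prompt on the affected server would do the whole thing in one go.  The old folder is renamed rather than deleted, so it is still there if you later decide to attempt a repair.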


Monitor your Exchange disk sizes

A word to the wise.  If you are suddenly unable to send and receive email messages in your Exchange environment, make sure the Exchange server disks aren’t filling up.  Today I ran across an interesting issue (and by interesting I mean it could have caused a serious outage) where Windows updates were being routinely downloaded for our next patch management cycle but, unknown to us, were also causing our email services to stop functioning correctly.  I am thankful the scenario didn’t get ugly, and the event gives me the opportunity to talk about a few things that I think might be useful for readers and other admins.

It turns out that this month’s wave of Windows updates caused the disks on our Hub Transport servers to quietly fill up during the day, unbeknownst to any of the admins.  Under normal circumstances this download process is by design and almost never becomes an issue.  In this case, however, there was not enough disk space left for Exchange to work correctly.  This could have been disastrous had we not known that the disk was starting to fill up.  We could have been chasing our tails for much longer and the situation could have become far more stressful.  For some reason, the company likes to be able to send and receive emails.  Thank god for monitoring that works.

There are a couple of things that need to be investigated at this point.  Had we not known that the Windows updates were what was filling the disk, a logical place to start looking for clues would have been the log files on the suspect servers.  I would like to take a little time to go over some steps for looking at logs in an Exchange environment.  When thinking about potential disk space issues a few questions come to mind.  Are log files growing rapidly?  Did somebody turn on verbose logging and forget to turn it off?  To verify the logs aren’t the issue there are a few good places to look.  If you have ever used message tracking in Exchange you know how powerful it can be, but its logs can also contribute to your disk filling up.  Here is the location where the message tracking logs are stored:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking

Another location, used when you turn on verbose logging to troubleshoot send or receive connectors, holds the smtpsend and smtpreceive directories.  These can fill up quite quickly if you forget to turn off verbose logging on a send or receive connector when you are done troubleshooting.  This location is here:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLog

Finally, there is a location for logging protocol settings on the hub transport.  These logs can be found here:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLog

I would like to point out quickly that the behavior of all of these logging methods can be modified using the Exchange Management Shell (EMS), and some of the more detailed settings can only be modified through the EMS.

If these quick spot checks don’t uncover any immediate problems, another good technique for finding where your disk space is going is to use a tool that enumerates file locations and file sizes.  There are a few tools available; one I like to use is Space Sniffer.  It is fast, easy to use and gives a good visual representation of directory and file sizes.  The tool can do much more but in this case we were just interested in finding the disk issue quickly.  We were able to quickly see that the %windir%\softwaredistribution\download folder was growing rapidly.  I just happen to know that this is the temporary location Windows uses to store Windows update files before they are installed.
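If you can’t install a graphical tool on the server, the same "where did my space go" question can be roughed out with a few lines of Python.  This is only a sketch and not a replacement for a proper tool.  The function names are my own, and it simply totals the bytes under each immediate subdirectory of whatever folder you point it at.

```python
import os
from pathlib import Path


def dir_sizes(root: Path) -> dict[Path, int]:
    """Map each immediate subdirectory of root to the total bytes beneath it."""
    sizes: dict[Path, int] = {}
    for child in root.iterdir():
        if not child.is_dir():
            continue  # loose files directly under root are ignored
        total = 0
        for dirpath, _dirnames, filenames in os.walk(child):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # Files can vanish or be locked while we scan; skip them.
                    pass
        sizes[child] = total
    return sizes


def largest(root: Path, top: int = 10) -> list[tuple[Path, int]]:
    """Return the top-N subdirectories of root, biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]
```

Running `largest(Path(r"C:\Windows"))` and then drilling into whichever folder comes out on top should surface something like the softwaredistribution folder after a couple of passes.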

There are a few things that can be done here.  You can clear the temporary Windows update files, delete other unnecessary files or grow your disks.  We were lucky because our Hub Transport servers are VMs, and increasing the disk size of these servers is simple.  That seems like the best option if it is a possibility: if something like this happens again we will have the additional space, so the Exchange servers won’t bog down.

Ultimately we prevented the disaster from occurring, but the incident is a great illustration of the lesson I’d like to share.  Make sure you have a good monitoring and alerting solution in place.  Otherwise you may not have any clue where to start looking.  Without a reliable monitoring tool it would have been much more difficult to track this problem down, because our Exchange environment is large and complex.  Because we have good monitoring tools we were able to quickly identify the problem and resolve it before anything bad happened.  On a side note, I am still thinking about how we can take this monitoring and alerting one step further and become proactive instead of reactive, but for now the monitoring tools are doing their job.  If you have any thoughts on proactive monitoring and alerting for these types of disk issues let me know.  I’d love to hear how you handle it.
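To make the proactive idea a little more concrete, here is a rough sketch of what I have in mind: alert on a free-space threshold, and also project how long until the disk fills based on two samples.  This is not what our monitoring tool does.  The 15% default, the straight-line projection and the function names are all assumptions of mine for illustration.

```python
import shutil
from typing import Optional


def free_percent(total: int, free: int) -> float:
    """Free space as a percentage of the volume size."""
    return 100.0 * free / total


def hours_until_full(free_then: int, free_now: int, hours_between: float) -> Optional[float]:
    """Straight-line projection of when the disk fills.

    Returns None when free space is flat or growing between the two samples.
    """
    consumed = free_then - free_now
    if consumed <= 0:
        return None
    return free_now / (consumed / hours_between)


def check_volume(path: str, min_free_percent: float = 15.0) -> Optional[str]:
    """Return a warning string when the volume is below the threshold, else None."""
    usage = shutil.disk_usage(path)
    pct = free_percent(usage.total, usage.free)
    if pct < min_free_percent:
        return f"{path}: only {pct:.1f}% free"
    return None
```

The threshold check is the reactive half.  The projection is the proactive half: a disk with 40% free that is losing space quickly deserves an alert sooner than one that has been sitting at 12% free for months.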


Why Computer Science degrees translate to System Administration

I run across a lot of articles and posts that talk about how a degree in Computer Science is usually irrelevant to system administration and that you are just as well off with another degree or no degree at all. I think that line of logic is very short sighted and today I am going to (or at least attempt to) explain why. By no means am I criticizing these other approaches. In fact I believe there is more than one way to skin a cat, and I have met many highly successful admins who reached their positions by these alternate means. I just want to clarify that I am not telling readers that the CS route to becoming a system admin is the only right way to go. I am simply relating my own experiences in system administration to my background in CS and making a case for why pursuing a degree in Computer Science, or any other engineering degree for that matter, isn’t going to hurt your chances of becoming a sysadmin.

When you think of Computer Science you think of programming or maybe math, at least I do. Most CS programs these days are heavily oriented towards programming and the scientific and mathematical applications of programming as it applies to the world around us. As an aside, I am beginning to see many more programs that are tailored to specific disciplines inside the realm of IT, which looks promising. This is a great hybrid approach in my opinion because it gives students a chance to look at a few alternate options. Coding isn’t my passion, so having an option to become a system administrator without the intense coding of a CS program looks like an attractive approach.

It is true that many of the mundane daily tasks related to system administration don’t involve 8 hours a day of reading and writing code. Because of this I think it is important to characterize and distinguish a sysadmin as somebody who relies on software tools and programming to solve problems and technical challenges but doesn’t necessarily devote all of their time and energy to living in and interacting with code. The relationship of the sysadmin to programming is more of an indirect one, though still very important.

The farther along I get in my journey as a sysadmin, the more I realize how much the CS background is helping me.  I have a solid foundation in many of the core concepts taught through the CS program, which in turn have indirectly influenced my abilities as a system administrator for the better.  The first and most valuable asset my CS background has given me is the ability to write and understand code.  This is extremely useful in my daily slew of activities.  It allows me to approach problems with a programmatic methodology, automate repeatable tasks with scripts, build intuition for why databases or programs are slow, debug issues systematically, and on and on.  Obviously these skills can be learned elsewhere, but having them rolled into your education as part of the Computer Science package deal is very convenient.  I would much rather have this set of skills and the ability to look at things from a different perspective than have to learn each of these techniques separately.  Somebody coming from a business or similar background is unlikely to know about silly things like big O notation or how different algorithms work at a fundamental level.  It just isn’t part of their background, so they don’t spend time thinking about these things.

This really parlays into other areas well, and you are setting yourself up for a diversified and broad horizon of future employment prospects. For example, take a pure sysadmin who knows no programming or CS; at their core they know system administration. But what if they get burnt out (which is common in this profession) or don’t keep their skills up to match their position? There is nowhere in the industry for these individuals to turn, unless they want to go into management. That is why I believe individuals who never learn to program, or at least to understand how system administration and programming relate to each other, are essentially crippling their future prospects. With a diverse background the CS sysadmin could potentially move into a DevOps role, a pure programming and development role or a management role. In today’s diverse IT ecosystem, programming and development skills are very much sought after, so demand is high for these other types of positions and skill sets.

Another well known fact of the IT industry, which I don’t necessarily agree with but which nonetheless exists, is that just having a CS degree will open doors that would otherwise stay closed. I personally believe that a degree shouldn’t dictate your position, but by having a degree you set yourself up for some unique opportunities and certainly are not hurting yourself. For example, all other things being equal, somebody scanning through resumes has to choose between an applicant with a degree in Computer Science and one with a degree in Philosophy. Which do you think will be picked? Like I said, I don’t think the hiring process is fair or even has much to do with skill, but a degree can be used to get ahead of the competition, so it can be valuable as a strategic component in the hiring process if nothing else.

Here’s what I am saying. You don’t have to have a degree in Computer Science to be a great System Administrator. But the CS background definitely equips you with the tools to understand some of the more abstract technical concepts and ideas, and gives you a robust framework for working through and solving these difficult and complex problems. Ultimately the most important factors in being a good sysadmin (let alone anything else) are a combination of many different things, including a willingness to learn and the amount of experience an individual possesses. There is no cookie cutter way to build the perfect sysadmin and you will invariably find a very diverse group of people in this profession, but a head start with a CS degree is certainly one path that won’t hurt you and is an attribute many good sysadmins share.


Reflections on the year

It is the time of year again to reflect on some of the things that happened during 2013.  As usual, it is impossible to predict what will happen in the future and what kinds of experiences will shape you and what kinds of difficult challenges you will encounter and overcome.  Luckily in 2013 there weren’t any challenges that I wasn’t able to overcome in one way or another.

A lot happened in the past year that is worth going over.  The first thing I’d like to mention is that I hit my 2nd full year of blogging, which was really exciting for me.  I have nearly 100 blog posts published to date and I really feel like I am just getting started.  I began to experiment a lot more with the format and content of the blog and I have found that enjoyable.  I have also begun to experiment with different techniques to monetize the blog, which has been interesting as well.  I think it will be really fun to see how the blog grows in the coming year.  One thing I would like to see more of is unique perspectives from other sysadmin/IT bloggers, because I feel like that will really spark some other areas of growth.

Other high notes of the year include my first trip to Cisco Live!, which was a great experience.  I learned so much from that conference and it wound up being a great trip.  I have taken on more responsibilities in my current position.  I have begun implementing some fun and interesting techniques and projects as well, including a fully featured testing environment with load balancers, SAN, clustered Hyper-V, SQL, etc.  That has been a great tool not only for my own experience learning the technologies but also for the organization, to help prototype and test potential technologies.  This past year has also been valuable from a networking standpoint.  I took part in a full blown wireless upgrade project, I helped with the management and move forward plan for our current switches, and in general I learned a ton of new stuff about networking technologies that I did not see myself learning, which has been valuable and fun for me.

While things went well for the most part there is always room for improvement.  Areas of improvement for next year include more involvement in automation, for one.  I am really getting a good taste now of automation and I think it will be huge for my career growth as well as a benefit to my current employer.  I would also like to see myself involved in more (people) networking, whether it be through conferences or other user group gatherings.  I think networking with other IT pros is something I need to continue to work on.

Finally, outside of work I have some other stuff I’m working on getting off the ground that I’d like to mention.  First, and most exciting for me, is my side business: I repair mobile devices (iPads, iPhones, Android, etc.).  The learning experience from that project has been great so far and I would really like to expand some of the things I’m doing with it into the next year.  Part of getting this up and going will be learning how to develop Android and iOS apps, building a repair tracking system, and learning many more of the nuances of running a business that I had no idea about before I started this project.  Last but not least, I met my wonderful girlfriend.  She has been a true blessing to me so far and I just wanted to give her a shout out while I am writing this up.  So to bring things together here, I am really looking forward to all of the rewards and opportunities that go along with hard work and persistence.

There will be more of the same this coming year and I am excited for it.  From career goals to personal projects, I would like to see myself continuing to learn, continuing to improve processes and continuing to become a person who can take on responsibilities and whom people can depend on to get things done.  I know it will be hard work and won’t always be fun but I know it will be worth it.  Next year should be fun, so until then have a happy New Year!


Design Group Policy for easy troubleshooting

When I am looking up GP answers on teh google’s, I tend to see a lot of one off fixes for setting up and fixing group policies that either don’t exist or are intended for policies that are broken the majority of the time.  I recently watched a great video over at the Channel 9 website by Darren Mar-Elia of GPOguy fame about using best practices and design principles for managing your Group Policy environment.  Here is the link to that video.

That video really got me thinking about how I could improve my GP management skills in my day to day environment.  So I decided to take as many ideas as I could from his talk, and from elsewhere in my searches across the interwebz, to come up with some of my own best practices and guidelines for managing Group Policy.

The following is an overview of the ideas and techniques that I came up with and what has worked well in my experience with regards to managing Group Policy.

Group Policy organizational best practices:

  • Use a “U”, “S” or “C” to denote whether a Group Policy is a User, Server or Computer policy
  • Tack on a version at the end of the specific Group Policy.  Brand new Group Policies begin at v1.0
  • Every time a policy changes, increment the version number.  This makes things easier to troubleshoot when using gpresult
  • Each GPO has one specific use case.  DO NOT LUMP MULTIPLE FUNCTIONS INTO ONE POLICY
  • Use very detailed and descriptive names to denote what a GPO is and does

Here are some example policies that I have been working on in a test environment.  I think they capture many of the above best practices quite nicely.  Please feel free to adapt this technique to suit your own specific needs.  This is only a template and I’d like to see how it can be improved.

[Screenshot: example Group Policy names following the conventions above]

As you can see, using this format it is easy to tell whether or not this is a computer policy, what specifically the policy is doing and which version of the policy we’re at currently.
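One nice side effect of a strict naming format is that it can be checked by a script.  Here is a small Python sketch of that idea.  To be clear, the exact layout it checks ("C - Descriptive Name - v1.0") is a hypothetical one I made up for this example, as is the choice to bump only the minor version number, so swap in whatever layout your own team settles on.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical layout: scope letter, descriptive name and version, joined by " - ".
# Example: "C - Disable Windows Firewall - v1.0"
NAME_PATTERN = re.compile(
    r"^(?P<scope>[USC]) - (?P<purpose>.+) - v(?P<major>\d+)\.(?P<minor>\d+)$"
)

SCOPES = {"U": "User", "S": "Server", "C": "Computer"}


class GpoName(NamedTuple):
    scope: str
    purpose: str
    major: int
    minor: int


def parse_gpo_name(name: str) -> Optional[GpoName]:
    """Split a conforming GPO name into its parts, or return None."""
    m = NAME_PATTERN.match(name)
    if m is None:
        return None
    return GpoName(SCOPES[m["scope"]], m["purpose"], int(m["major"]), int(m["minor"]))


def bump_minor(name: str) -> str:
    """Return the same GPO name with the minor version incremented by one."""
    parsed = parse_gpo_name(name)
    if parsed is None:
        raise ValueError(f"not a conforming GPO name: {name!r}")
    scope_letter = name[0]
    return f"{scope_letter} - {parsed.purpose} - v{parsed.major}.{parsed.minor + 1}"
```

Feeding a list of your existing GPO names through `parse_gpo_name` and printing the ones that come back as `None` is a quick way to find policies that have drifted from the convention.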

The most crucial part of using this system is getting other Group Policy admins to buy in to the technique.  If you don’t clearly lay out your expectations then keeping policies up to date and organized could become a pain point down the road.  The other caveat is getting the other GP admins in the habit of creating policies that address only one specific task, are split into either user or computer policies and have descriptive names.  If the environment currently uses multi-purpose policies that contain both user and computer settings then this may be a new concept for many of the admins, but the extra effort of setting this type of environment up will be totally worth the initial overhead.

I definitely think that this technique can be improved and I am always tinkering with it to see how I can get it to work better, but for now it is at a good point.  If you make the transition to organizing and improving your management of Group Policy, or already have some solid best practices of your own, let me know.  I would love to hear about what you are doing and how I can incorporate more techniques into my own management style.
