After more than 20 years of consulting in PC tech support, I’ve seen ridiculous actions taken in the name of improving performance. I’ve seen hundreds of high-end thin clients, each sporting more horsepower than a typical PC, deployed simply to run a remote desktop client. I’ve seen whole server racks deployed to do the work of a single server. I’ve seen video cards designed for gaming installed in desktops to make a line-of-business application work better.
In most of these cases and others like them, unquestionably wasteful decisions were made because of a common fear of one of the worst types of user complaints an IT pro can hear: “It’s running too slow!” Those words issued from the right lips into the right ears can touch off a political disaster that often ends with a pile of wasted time and money. Many times, simply being seen to take any action at all — regardless of whether it helps — is more valued than the frequently painstaking process of figuring out what the problem really is (and indeed whether there even is one in the first place).
The real challenge for IT pros faced with a high-profile performance complaint is to quickly and decisively determine where the problem may lie before wasteful measures that serve only to distract from the real issue are forced down our throats. This almost always requires three things: the right tools in place before the complaint is made, great communication skills, and, in the worst cases, the intellectual curiosity to dig into the weeds in search of a smoking gun. Here’s what you need to know about each of these three critical troubleshooting strategies.
Computer Repair Strategy #1: Preparation
One of the most critical parts of any performance-troubleshooting challenge isn’t necessarily proving where a problem is, but rather where it isn’t. Begin with the end in mind and work backwards. If you can immediately proclaim upon receiving a performance complaint that the network, compute, and storage infrastructures are not to blame for any perceived slowness, you can short-circuit the knee-jerk “it’s the hardware!” conclusion often made early on in the process.
Hardware tends to get the automatic blame because it is easy for pretty much everyone to understand (or think they understand). Even a completely non-technical stakeholder will know that a 10Gbps Ethernet switch is probably somewhere around 10 times as fast as a 1Gbps Ethernet switch. When doubts crop up as to whether the network might be the cause of a performance problem, it’s simple for the non-technical to seize upon a 10-fold increase in performance as a potential solution. The same goes for all kinds of hardware: servers, storage, and network alike.
Being forearmed against this requires you to have thorough monitoring tools in place. If you’re monitoring literally every piece of gear in your infrastructure, you’ll always be able to prove whether you’re really bumping into the limits of the infrastructure or whether the problem is caused by something else. Although monitoring systems take some work to get running and dialed in, I’ve seen time and time again how incredibly useful they are in steering a troubleshooting conversation in the right direction.
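As a rough illustration of what "proving it isn't the hardware" can look like, the sketch below summarizes a batch of utilization samples against a known capacity. The function name, the 80% "busy" threshold, and the sample numbers are all assumptions made up for this example; a real monitoring system would supply its own data and thresholds.

```python
from typing import Dict, Sequence


def headroom_summary(samples: Sequence[float], capacity: float,
                     busy_fraction: float = 0.8) -> Dict[str, float]:
    """Summarize how close a resource came to its capacity.

    samples       -- measurements in the same unit as capacity (e.g. Mbps)
    capacity      -- the rated limit of the resource
    busy_fraction -- assumed threshold above which a sample counts as "busy"
    """
    if not samples:
        raise ValueError("need at least one sample")
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    busy_limit = busy_fraction * capacity
    busy_count = sum(1 for s in samples if s >= busy_limit)
    return {
        "peak_utilization": max(samples) / capacity,  # worst single sample
        "busy_share": busy_count / len(samples),      # share of busy samples
    }


# Made-up throughput samples (Mbps) on a hypothetical 1000 Mbps link.
link_samples = [100, 200, 300, 400]
print(headroom_summary(link_samples, capacity=1000))
```

If the peak stays far below capacity during the period the user complained about, you have a concrete number to put in front of anyone proposing a hardware upgrade.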
Although simply monitoring performance metrics of the infrastructure will help, also consider configuring your monitoring systems to evaluate user-facing criteria. For example, you’d typically configure a monitoring system to monitor storage latency, storage throughput, database query response times, and network throughput to evaluate the performance of a database-driven line-of-business application. Those are all great metrics to have, but also consider scripting a process that logs into the application as a normal user would, performs a few basic functions that a typical user would perform, and logs out. If you configure that script to be timed by your monitoring system, you’ll have a monitoring agent that tells you when something is wrong from the user’s point of view, even when the other, more specific checks miss it.
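Here is a minimal sketch of that kind of timed, user-facing check. It is not tied to any particular monitoring product: the step functions (log_in, run_report, log_out), the 5-second warning threshold, and the OK/WARNING/CRITICAL status names are placeholders I have assumed for illustration, and you would swap in real calls against your own application.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_synthetic_check(steps: List[Step], warn_seconds: float) -> Dict[str, object]:
    """Run each named step in order, time it, and report an overall status."""
    timings: Dict[str, float] = {}
    started = time.perf_counter()
    for name, action in steps:
        step_started = time.perf_counter()
        try:
            action()
        except Exception as exc:
            # A step that fails outright is worse than a slow one: stop here.
            return {"status": "CRITICAL", "failed_step": name,
                    "error": str(exc), "timings": timings}
        timings[name] = time.perf_counter() - step_started
    total = time.perf_counter() - started
    status = "OK" if total <= warn_seconds else "WARNING"
    return {"status": status, "total_seconds": total, "timings": timings}


# Hypothetical stand-ins for real user actions; the sleeps only simulate work.
def log_in() -> None:
    time.sleep(0.01)


def run_report() -> None:
    time.sleep(0.01)


def log_out() -> None:
    time.sleep(0.01)


result = run_synthetic_check(
    [("log_in", log_in), ("run_report", run_report), ("log_out", log_out)],
    warn_seconds=5.0,  # assumed threshold; tune it to what users tolerate
)
print(result["status"], result["timings"])
```

Timing each step separately, rather than only the whole run, means that when the check does turn slow you already know whether the login, the work itself, or the logout is responsible.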
Computer Repair Strategy #2: Good Communication
Many times, if you were sitting behind the user’s shoulder when he or she encountered a problem, you’d immediately be able to identify the cause. However, when you’re not there and the problem happens randomly or without enough consistency to reproduce, knowing how to ask questions and interpret the answers is absolutely vital. In fact, someone who knows how to ask good questions and logically parse the replies, but knows next to nothing about technology, may be in a better position to identify a problem than the most technically astute person who lacks these skills. Often, asking an affected user to keep a log of exactly what happens, and when it happens, can really help when it comes to matching user complaints to system logs and performance charts.
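Once a user has written down when something happened, lining that up against system logs is mostly a matter of filtering by time. The sketch below assumes the system events have already been parsed into (timestamp, message) pairs; the function name, the five-minute window, and the sample entries are invented for illustration only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]


def events_near(report_time: datetime, events: List[Event],
                window: timedelta = timedelta(minutes=5)) -> List[Event]:
    """Return the events logged within `window` before or after a user report."""
    return [e for e in events if abs(e[0] - report_time) <= window]


# Made-up sample data standing in for parsed system log entries.
system_events = [
    (datetime(2024, 1, 1, 9, 0), "backup job started"),
    (datetime(2024, 1, 1, 9, 4), "database lock wait exceeded"),
    (datetime(2024, 1, 1, 10, 0), "backup job finished"),
]

# The user's own log says the application froze at 9:02.
user_report = datetime(2024, 1, 1, 9, 2)

for stamp, message in events_near(user_report, system_events):
    print(stamp.isoformat(), message)
```

The window is deliberately generous because the clock on a user's desk and the clock on a server rarely agree to the second, and users tend to note the time after the fact.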
If you can coach the user on what to look for and how to document events accurately, you can save yourself a lot of time otherwise spent chasing problems created by approximation, hearsay, or an overactive imagination. An accurate description of the problem, captured not long after it is first reported, is perhaps the most valuable clue you can work with in the troubleshooting process. I can’t tell you how many times I’ve helped someone chase a problem that didn’t actually exist, at least not in the way it was described. Not only is this frustrating for the IT pros who waste their time chasing ghosts, it’s also intensely frustrating for stakeholders who see no progress being made on an issue they’ve reported.
Computer Repair Strategy #3: Some Curiosity
Once you have a good description of exactly what the problem looks like in the field, you’ll usually already know what the problem is or be able to hand it to a third party, such as an application vendor, to solve. However, the most difficult performance problems won’t be untangled that easily. The really fun ones leave no trace of their existence. Performance graphs will look well within limits, error logs will be devoid of useful errors, and hardware/software specifications will all be satisfied. In these situations, you need to be able to think outside of the box for new ways to tear into the problem. As with communication, not every IT pro has this skill. Above all, combine all three of these strategies with patience and attention to detail.