Somebody Else’s Problem

Things that are Somebody Else’s Problem (SEP) are invisible. Douglas Adams famously joked about this in “The Hitchhiker’s Guide to the Galaxy,” but the effect is serious and real.

For example, local British politicians were falling over each other trying to attract data centers. They were focusing on the cachet of having Google or Facebook in their town, and the half-dozen jobs for the electricians and plumbers maintaining them. Supplying these energy-hungry behemoths with power was Somebody Else’s Problem.

Now they have so many data centers in West London that their electrical grid is overloaded, and they won’t be able to build more housing until they have upgraded their main cables. That’ll be sometime in the 2030s.

As an IT leader, it is your job to ensure that each team knows the problems they might cause for other parts of the organization.

Documentation is Unnecessary Until You Need It

If you have a fire in your server room, your insurance pays out. Insurance is expensive, but a necessary part of your risk management strategy. For many risks, there is a way to get almost free insurance. Yet few people take it. I am talking about documentation.

A chocolate factory in Belgium didn’t follow its own processes and did not document its production. When kids started falling sick with salmonella all over Europe, suspicion fell on the Kinder egg factory in Arlon. The authorities asked for the production documentation. Because the factory couldn’t provide it, the whole plant was shut down. If they had had documentation, they would have been insured against this risk. They could have shut down just one production line instead of the whole plant.

So the reason you might not be able to get chocolate eggs this Easter is bad documentation.

Are You Monitoring Your Automated Systems?

It is hard to anticipate the real world. I’m sure the wet concrete on the road in Japan looked just like solid ground to the delivery robot. Consequently, it happily trundled into the urban swamp and got stuck. The story does not report whether the delivery company managed to get their robot out before the concrete hardened…

This is why you need careful monitoring of all the fully automated systems you are deploying. The first line of defense is automated metrics and their normal interval. For a delivery robot, the distance covered over a minute should be greater than zero and less than 270 (if you have limited the robot to e.g. 10 mph). The second line of defense consists of humans who will evaluate the alarms and take appropriate action. The third line of defense are developers who will fix the software and the alarms.

Too many automated systems are simply unleashed and depend on customers to detect that something is wrong and complain. You want to figure out you have a problem before the image of your robot encased in concrete starts trending on Twitter.

Shooting the messenger

Even though the clueless Governor of Missouri tried to shoot the messenger, he missed. Last year, a reporter published his findings that private data on more than 100,000 teachers was available to anyone who knew how to click “View Source” on a web page. The Governor held a widely-ridiculed press conference where he vowed to prosecute the “hackers” who had told the world about the incompetence of the state IT department.

A thorough report by law enforcement now roundly exonerates the journalist. It also exposes that personal information on more than half a million people had been available for a decade to anyone who care to look.

Even professional IT organizations occasionally fail like the state of Missouri did here. You have a little simple system, you are under schedule pressure, and you forgot to book time with the security team. So you roll it out without a security review. The antidote to this is to maintain a complete systems inventory with a field for the name and email of the person who did the security review. That will show you if this step got skipped, and allow you to quickly ask questions about any alleged security issues before you start shooting at the messenger.

We are Still Building our IT on Shaky Ground

Once again, researchers have demonstrated the shaky foundation under our IT infrastructure. A lot of modern code is being built with Node.js, making use of Node Package Manager (npm) to pull in libraries that your code depends on.

There is no evidence that evil Russian hackers built npm. But if it didn’t exist, it would be a priority for the cyber-warfare command of our adversaries to build something like it and tempt us to use it.

The problem is that it is easy to use npm wrong, and hard to use it right. We’ve already seen many cases where organizations simply pull the latest packages from npm when they build. That means that as soon as the central package in the npm repository is corrupted or taken down, that failure will ripple through many layers of code that depend on it.

The latest discovery is that 8,000 packages have maintainers with an expired email domain. That allows any hacker to purchase that domain, re-create the maintainer email and take over the package.

Ask your CISO or other security function how your IT organization makes sure that every project only pulls npm packages from your official repository of security-vetted packages.

Do you have control over the libraries that go into you projects?

Yet again, a rogue developer took down thousands of applications that depended on his library. Unhappy with the fact that open source developers work for free and companies use open source to make lots of money, he deliberately broke the faker.js and colors.js NPM libraries.

Interestingly, the more than 20,000 projects that depend on these two libraries download them almost 30 million times per week. That means a lot of projects are downloading the code from the NPM repository for every build.

In a professional IT organization, all your projects don’t just pull the latest version, they pull a specific version. And you don’t pull straight from the internet, but from the “blessed repository” with the officially approved version of everything. Are you sure you don’t have projects that just pull the latest libraries down from wherever?

Risk and Reward

This week’s episode of my podcast Beneficial Intelligence is about risks and rewards. Humans are a successful species because we are good at calculating risks and rewards. Similarly, organizations are successful if they are good at calculating the risks they face and the rewards they can gain.

Different people have different risk profiles, and companies also have different appetite for risk. Industries like aerospace and pharmaceuticals face large consequences if something goes wrong and have a low risk tolerance. Hedge funds, on the other hand, takes big risks to reap large rewards.

It is easy to create incentives for building things fast and cheap, but it is harder to create incentives that reward quality. Most organizations don’t bother with quality incentives and try to ensure quality through QA processes instead. As Boeing found out, even a strong safety culture does not protect against misaligned incentives.

As an IT leader at any level, it is your job to consider the impact of your incentive structure. If you can figure out a way to incentivize user friendliness, robustness and other quality metrics, you can create a successful IT organization. If you depend on QA processes to counterbalance powerful incentives to ship software, corners will be cut.

Listen here or find “Beneficial Intelligence” wherever you get your podcasts.