The ROI on AI Projects is Still Negative

Unless you are Microsoft, your IT solutions are expected to provide a positive return on investment. You might have heard that Microsoft loses $20 a month on every GitHub Copilot customer, and that is after the customer pays $10 for the product. If you are a heavy user of Copilot, you might be costing Microsoft up to $80 every month.

Some organizations are rich enough to afford unprofitable products like this, but they typically have to spend their own money: VCs seem to have soured on the idea that “we lose money on every customer, but we make up for it in volume.”

If you are running an AI project right now, you should be clear that it will not pay for itself. Outside a very narrow range of applications, typically image recognition, AI is still experimental. If you have approved an AI project based on a business case showing a positive ROI, question the assumptions behind it. The AI failures are piling up, and even the largest, best-run, and most experienced organizations in the world cannot make money implementing AI yet. You probably can’t, either. Unless you have money to burn, let someone else figure out how to get AI to pay for itself.

AI is not Coming for Your Job

Unless you write corporate mission statements, AI is not coming for your job. Generative AI like ChatGPT works by continually adding the most likely next word. That ensures that an AI-written text is a bland average of all the texts it has read. It is unlikely to be thought-provoking or even useful.
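A toy sketch of the mechanism shows why. Real models work on tokens and sample from a probability distribution rather than using the hand-made table and greedy choice below, but the gravitational pull toward the most common continuation is the same:

```python
# Toy illustration of next-word prediction (hand-made probabilities, not a real model).
# A generative model repeatedly appends the most likely next word given the text so far,
# which is why unguided output drifts toward the blandest, most average continuation.

toy_model = {
    ("our", "mission"): {"is": 0.9, "statement": 0.1},
    ("mission", "is"): {"to": 0.95, "simple": 0.05},
    ("is", "to"): {"empower": 0.5, "deliver": 0.3, "disrupt": 0.2},
    ("to", "empower"): {"stakeholders": 0.6, "customers": 0.4},
}

def generate(prompt, steps=4):
    words = prompt.split()
    for _ in range(steps):
        context = tuple(words[-2:])
        candidates = toy_model.get(context)
        if not candidates:
            break
        # Greedy choice: always take the single most probable next word.
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

print(generate("our mission"))  # -> "our mission is to empower stakeholders"
```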

I was reminded of how useless an AI-generated text is when LinkedIn invited me to participate in a “collaborative article.” The AI generates a text on a subject, and I am supposed to add a real-life story or lesson next to it. Unfortunately, the AI text is a collection of trivial platitudes. LinkedIn asked me to rate the article, and I immediately clicked “It’s not so great” (because there was no lower rating). Sadly, the feedback options did not include “Your AI text adds no value.”

The striking writers in Hollywood want guarantees from the studios that they won’t be replaced with AI. They need not worry. A script written by AI will be mind-numbingly boring. What AI might do for the film and TV industry is to take over boring housekeeping tasks like ensuring continuity – was the blood on his left or right jacket sleeve? But it won’t write the next hit show or movie.

The right way to use AI in its current state is to use it deductively – to analyze stuff. Programmers who inherit a huge pile of undocumented code benefit from having ChatGPT or its siblings explain the code. Using AI inductively to generate text might be fun, but it doesn’t create any value.

The Guard Rail Pattern

There is a simple way to prevent many IT disasters, and it is sadly underused. It’s not on the standard lists of design patterns, but I call it the “Guard Rail” pattern.

It would have prevented the IT disaster that dominates the news cycle in Denmark these days. Techno-optimists have forced a new digital property valuation system on the long-suffering Danes, and it is an unmitigated catastrophe. The point is to replace the professional appraisers who determine the value of a property for tax purposes with a computer system, and many of the results from the computer are way off. Implementing the Guard Rail pattern would mean comparing the output from the new system to the old one and routing valuations that are, for example, more than 3x higher to manual processing.
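A sketch of what that guard rail could look like, with hypothetical function names and an illustrative 3x threshold:

```python
# Guard Rail sketch: compare the new automated valuation to the previous one and
# divert implausible results to manual review instead of publishing them.
# property_id, the valuations, and the two handler functions are hypothetical.

MAX_RATIO = 3.0            # flag anything more than 3x the previous valuation
MIN_RATIO = 1 / MAX_RATIO  # ...and anything less than a third of it

def release_valuation(property_id, old_valuation, new_valuation,
                      publish, queue_for_manual_review):
    ratio = new_valuation / old_valuation
    if MIN_RATIO <= ratio <= MAX_RATIO:
        publish(property_id, new_valuation)  # within reason: let it through
    else:
        queue_for_manual_review(property_id, old_valuation, new_valuation)
```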

A colleague just shared a video of the latest iteration of the Tesla Full Self Driving mode. This version seems to be fully based on Machine Learning. Previous versions used ML to detect objects and traditional algorithmic programming to determine how to drive. As always infatuated with his own cleverness, Elon Musk does not seem to think that guard rails are necessary. Never mind that the FSD Tesla would have run a red light had the driver not stopped it. Implementing the Guard Rail pattern would mean that a completely separate system gets to evaluate the output from the ML driver before it gets passed to the steering, accelerator, and brakes.

When I attach a computer to my (traditional) car to read the log, I can see many “unreasonable value from sensor” warnings. This indicates that traditional car manufacturers implement the Guard Rail pattern, running a reasonableness check on inputs before passing the values to the adaptive cruise control, lane assist, and other systems. The Boeing 737 MAX 8 flight control software, on the other hand, was missing a crucial Guard Rail, allowing the computer to override the pilots and fly two aircraft into the ground.
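The same pattern on the input side might look like this sketch, with made-up sensor names and limits:

```python
# Guard Rail on inputs: reject sensor readings outside a physically plausible range
# before driver-assistance systems act on them. The ranges below are illustrative.

PLAUSIBLE_RANGES = {
    "wheel_speed_kmh": (0.0, 300.0),
    "distance_to_car_ahead_m": (0.0, 250.0),
    "steering_angle_deg": (-45.0, 45.0),
}

def validated(sensor, value):
    low, high = PLAUSIBLE_RANGES[sensor]
    if low <= value <= high:
        return value
    # Log an "unreasonable value from sensor" warning and return None so the
    # consuming system falls back to a safe default or hands control to the driver.
    print(f"unreasonable value from sensor {sensor}: {value}")
    return None
```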

In your IT organization, discuss where it makes sense to implement the Guard Rail pattern. Your experienced developers can probably remember several examples where Guard Rails would have saved you from embarrassing failures. There is no need to keep making these mistakes when there is an easy fix.

Another Large IT Project Failure – and How it Could Have Been Avoided

The City of Birmingham can be added to the long list of organizations that went bankrupt trying to replace their ERP system. They were running a heavily customized SAP system and tried to implement Oracle Fusion. As often happens in this kind of project, the costs exploded from the initial estimate of $25 million to $125 million at the last count. They are not done yet, and since they’ve stopped paying their bills, they might never be.

When you are faced with a legacy system no longer fit for purpose, don’t fall prey to the dangerous illusion that you can run one large project to replace it. A project is a collaborative enterprise intended to reach a well-defined goal. But for a large IT project, the project duration alone (four years and counting in Birmingham) ensures that the goalposts will have moved several times before you are done. Your Program Manager is not likely to be among the few hundred people in the world with the exceptional project and change management skills needed to pull off such a project.

A series of smaller projects to carve out and replace functionality in smaller chunks does not promise to solve all your problems in one fell swoop. But it has a much higher chance of success.

Would You Notice the Quality of Your AI Dropping?

You know that ChatGPT is getting more politically correct. But did you know that it is also getting dumber? Researchers have repeatedly asked it to perform tasks like generating code to solve math problems. In March 2023, GPT-4 could generate functioning code 50% of the time. By June, that ability had dropped to 10%. If you are not paying, you are stuck with GPT-3.5, which managed 20% correct code in March but was down to practically zero in June.

This phenomenon is known to AI researchers as “drift.” It happens when you don’t like the answers the machine gives and take the shortcut of tweaking its parameters instead of expensively re-training the model on a more appropriate data set. Twisting the arm of an AI to produce more socially acceptable answers has been shown to have unpredictable and sometimes negative consequences.

If you are using any AI-based services, do you know what the engine behind the solution is? If you ask, and your vendor is willing to tell you, you will find that most SaaS AI solutions today are running ChatGPT with a thin veneer of fine-tuning. Unless you continually test your AI solution with a suite of standard tests, you will never notice that the quality of your AI solution has gone down the drain because OpenAI engineers are pursuing the goal of not offending anyone.
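A minimal sketch of such a standing test suite, assuming a generic ask_model call that wraps whatever your vendor exposes; the test cases and the pass threshold are placeholders to be replaced with your own:

```python
# Sketch of a recurring regression suite for an AI-backed feature.
# ask_model is a stand-in for your vendor's API call; TEST_CASES and
# threshold are placeholders you would replace with domain-specific checks.

TEST_CASES = [
    {"prompt": "What is 12 * 12? Answer with the number only.", "expected": "144"},
    {"prompt": "Extract the invoice number from: 'Invoice INV-0042, due 1 May'.",
     "expected": "inv-0042"},
]

def run_suite(ask_model, threshold=0.9):
    passed = sum(
        1 for case in TEST_CASES
        if case["expected"] in ask_model(case["prompt"]).lower()
    )
    score = passed / len(TEST_CASES)
    status = "passed" if score >= threshold else "FAILED"
    print(f"AI quality check {status}: {score:.0%} (threshold {threshold:.0%})")
    return score
```

Run it on a schedule and track the score over time; a falling curve is your early warning that the engine behind your solution has changed.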

Do Your Employees Follow your AI Guidelines?

Unless you override it, your organization’s policy for AI-driven tools is “anything goes.” That’s because your developers want to get their job done as quickly as possible. If that involves having GitHub Copilot write part of the code or copying a code block into ChatGPT for debugging help, so be it.

If you don’t have secrets, maybe that’s fine with you. But even though OpenAI is not training ChatGPT on user prompts, they have not been very diligent about keeping them safe. You should assume that everything your developers paste into ChatGPT will eventually leak.

That includes your data. AI tools are very good at data cleaning and visualization. Your Data Scientists are surely pasting data into ChatGPT and getting back fully functional Python code to run in a Jupyter Notebook. Unless you tell them not to.

If I asked one of your developers or Data Scientists about your policy on AI tools, would they know it? And would they follow the rules or would they take the 10x or 100x productivity boost?

How Do We Make IT Projects More Successful?

At least nuclear waste storage is worse. In his book “How Big Things Get Done,” Professor Bent Flyvbjerg ranks 25 categories of projects by their average cost overrun. IT projects are the fifth-worst offender, better than nuclear but worse than buildings, rail, airports, tunnels, and many others. We all know many public IT failures (Denmark has its fair share), and the private sector has suffered many more, even if less publicized.

What can we do about it? One chapter in the book is dedicated to creating better estimates. The problem with our estimating today is that we treat every project as unique. We then estimate each bit, and our usual how-hard-can-it-be optimism leads to the underestimation so common in IT. Flyvbjerg argues that we should start by identifying the class of projects this new project belongs to. The average for this class of projects is then the starting point for our estimate, adjusted up or down.

For example, you estimate an ERP project by looking at other ERP projects. If the cost in your industry is $20 million on average, that is your initial value. Then adjust up or down depending on whether your project is smaller or larger – or more straightforward or more complex – than the members of the reference class.
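In miniature, and with purely illustrative numbers, the calculation looks like this:

```python
# Reference class forecasting in miniature: anchor on the class average,
# then adjust for known differences. All numbers are illustrative.

reference_class_average = 20_000_000   # average cost of comparable ERP projects

adjustments = {
    "smaller user base than typical": 0.8,    # scale down 20%
    "heavily customized legacy system": 1.3,  # scale up 30%
}

estimate = reference_class_average
for reason, factor in adjustments.items():
    estimate *= factor
    print(f"{reason}: x{factor} -> ${estimate:,.0f}")

print(f"Anchored estimate: ${estimate:,.0f}")  # $20,800,000 in this illustration
```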

Bring this book with you to the beach this summer so that you can help our industry move forward when you return from vacation. IT projects exceed their budgets by an average of 73%. We can do better.

AI Will Not Destroy Humanity

AI doesn’t pose an extinction risk. And it has already created brand new jobs in the catastrophizing industry.

The only reason AI industry leaders like Sam Altman and Demis Hassabis jump on that bandwagon is to encourage more government red tape. If you are a powerful incumbent, asking for as many constraints on your industry as possible makes sense. The EU, ever happy to regulate industries originating elsewhere, is delighted to oblige. With compliance departments numbering in the thousands, these massive organizations can handle any amount of regulation thrown at them. But a lean startup will get regulated out of business.

The most fascinating part of AI is local, small-scale AI. We currently have massive, centralized AI running in enormous data centers, but since LLaMA escaped from the Facebook lab, tinkerers and hobbyists have been running Large Language Models on their own computers. Of course, OpenAI, Microsoft, and Google would like such small competitors to be regulated away.

Did You Hear the One About the Gullible Lawyer?

You need the best arguments to win a discussion, get a project approved, or win a court case. But if you are short of preparation time, you might take a shortcut like the New York lawyer who asked ChatGPT for help.

Ever willing to help, ChatGPT offered six cases supporting the lawyer’s argument. Unfortunately, they were entirely made up. That might pass in a marketing blog post, but it does not hold up in court. The gullible lawyer claims he did not know that ChatGPT might be hallucinating, but he is, of course, facing sanctions for lying to the court.

IT professionals know that ChatGPT cannot be trusted to answer truthfully. It is not much of a problem for a programmer because the compiler or the unit tests will catch defective answers. But the rest of the world doesn’t know.

Now is the time to remind everyone in the organization of your company policy on using ChatGPT and its ilk (you do have such a policy, right?). Tell the story of the gullible New York lawyer to make the point clear.

Does it Pay to Move to the Cloud? Or Back?

Most organizations that decide to move workloads to the cloud are missing a crucial piece of information: What it costs to run the system on-premise. In a viral blog post, David Heinemeier Hansson shared his specific calculations for Basecamp and HEY. Moving back from the cloud makes perfect business sense for him. Of course, your calculation will be different, but unless you know what it costs to run on-premise, you are comparing an uncertain cloud cost with a completely unknown on-premise cost.
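A back-of-the-envelope sketch of the comparison, with every number a placeholder to be replaced by your own:

```python
# You can only make this comparison with both numbers in hand.
# Every figure below is a placeholder; substitute your own.

annual_cloud_cost = 1_200_000  # current invoices: compute, storage, egress, support

# On-premise: amortized hardware plus the running costs a cloud bill hides.
server_purchase = 600_000
amortization_years = 5
annual_on_prem_cost = (
    server_purchase / amortization_years  # hardware, amortized
    + 150_000                             # colocation, power, cooling
    + 400_000                             # ops staff time attributable to these systems
    + 100_000                             # licenses, backup, spare parts
)

print(f"Cloud:      ${annual_cloud_cost:>12,.0f} per year")
print(f"On-premise: ${annual_on_prem_cost:>12,.0f} per year")
```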

As a CIO, you are expected to make sound business decisions. You can only do that if you have both numbers.