Disclaimer: Even though this blog post sounds harsh, I believe the people involved in TSB's 2018 mainframe migration always acted in good faith, with the best interests of the business and its customers in mind, and according to the best knowledge available at the time.
The Project Worth £62 Million in Fines
On April 22, 2018, TSB Bank executed what they described as “one of the most complex migrations in UK banking history.” The project had everything conventional wisdom says you need for success: three years of planning, 85 specialized subcontractors, multiple board-level reviews, third-party audits, and comprehensive testing frameworks.
Within hours of going live, a significant portion of TSB's 5.2 million customers were locked out of their accounts. Digital banking collapsed. Branch systems failed. The chaos continued for weeks.
The final damage: a £62 million fine from regulators, over £32 million in customer compensation, a CEO resignation, and hundreds of millions spent cleaning up the mess afterwards. The UK's Financial Conduct Authority (FCA) conducted a comprehensive investigation, and its findings reveal uncomfortable truths about how large-scale IT migrations actually fail, and what it takes for them to succeed.
The Scale of Mainframe Migration
Mainframe migrations aren't just software updates; they're institutional transformations. When TSB set out to migrate from Lloyds Banking Group's platform to a new system called Proteo4UK, they weren't just moving code. They were attempting to crystallize and transfer decades of accumulated business logic, regulatory requirements, edge cases, and institutional knowledge embedded in systems serving 5.2 million customers.
The FCA report documents that the Proteo4UK platform comprised 221 applications, most of them existing systems expected to plug in with little change. Beyond COBOL, CICS, VSAM, and DB2, there were other languages like Assembler, Easytrieve, Telon, and PL/1; databases like IMS and IDMS; complex batch jobs with intricate dependencies; and integration points with dozens of external systems.
Each of these components represents years of business decisions, regulatory compliance requirements, and operational knowledge. Missing any connection creates cascading failures.
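To make the dependency problem concrete, here is a toy sketch (in Python, with invented program names; real mainframe analysis tooling is far more involved) of mapping static CALL statements between COBOL programs. Note what even this simple mapping misses: dynamic calls like `CALL WS-PROGRAM` only resolve at runtime, which is exactly how connections get overlooked during migration planning.

```python
import re
from collections import defaultdict

# Illustrative sketch only (not TSB's tooling): extract static CALL
# targets from COBOL sources to build a rough dependency graph.
# Dynamic calls (CALL WS-PROGRAM) resolve only at runtime and are
# invisible to this kind of scan.
CALL_PATTERN = re.compile(r"\bCALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)

def call_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each program name to the set of programs it statically calls."""
    graph = defaultdict(set)
    for program, code in sources.items():
        for target in CALL_PATTERN.findall(code):
            graph[program].add(target.upper())
    return dict(graph)

# Hypothetical programs standing in for real banking modules.
sources = {
    "PAYMENTS": "PROCEDURE DIVISION.\n    CALL 'FXRATES' USING WS-REQ.\n    CALL 'LEDGER' USING WS-TXN.",
    "LEDGER":   "PROCEDURE DIVISION.\n    CALL 'AUDITLOG' USING WS-ENTRY.",
}
for prog, targets in sorted(call_graph(sources).items()):
    print(prog, "->", sorted(targets))
# LEDGER -> ['AUDITLOG']
# PAYMENTS -> ['FXRATES', 'LEDGER']
```

Even this trivial scan shows why every edge matters: if the migration team moves PAYMENTS without knowing LEDGER calls AUDITLOG, the failure only surfaces downstream, in production.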
The Impossibility of Planning Everything Upfront
Plan for three years, hand the plan to 85 contractors, and let them get the job done. Easy peasy?
Here’s the uncomfortable truth that emerges from the TSB disaster: you cannot know in advance how long a mainframe migration will take or what it will cost.
TSB tried. They created an “Integrated Master Plan” (IMP) in March 2016 targeting November 2017 for migration, a deliberately ambitious two-year timeline that was publicly announced. The FCA report notes this timeline was “based on very little information” and described as “deliberately very ambitious” to act as a “forcing mechanism.”
But reality had other plans. By September 2017, TSB acknowledged they would miss the November deadline. They spent weeks re-planning, creating what they called the “Defender Plan” with new target dates in Q1 2018. Then came re-plan after re-plan after re-plan, but they had already locked themselves into a public deadline before understanding what work remained.
The lesson: Setting fixed timelines and budgets for migration before you truly understand the system is not planning, it’s wishful thinking that creates pressure to cut corners later.
The Third-Party Trap: 85 Contractors and Zero Control
One of the most striking findings in the FCA report is TSB's reliance on 85 third-party subcontractors managed through their outsourcing partner, SABIS. We've all been there, trying to coordinate projects across multiple partners, but a project with 85? That's a recipe for disaster from day one.
The lesson: Outsourcing the core migration work to a web of contractors means outsourcing knowledge, control, and ultimately, accountability. When things go wrong (and in complex migrations, things always go wrong) you need people who can make decisions immediately, understand the full context, and take ownership. You can’t have 85 subcontractors in a call trying to diagnose why a million customers can’t access their accounts.
How to Successfully Migrate in the AI Era
The FCA report, combined with the Bank of England’s operational resilience framework, reveals what actually works. Success requires abandoning the illusion of complete upfront planning and embracing a fundamentally different approach, especially in the age of AI.
1. Plan only as much as needed
The TSB disaster proves that comprehensive upfront planning is a myth. Despite three years of planning, the FCA found that “TSB prepared a plan in circumstances where it had not yet finished defining its requirements (what the system was supposed to do and how it was supposed to be).”
Now in the AI era, code is cheaper and quicker to produce, making an agile approach to large projects more relevant than ever. Plan enough to start intelligently, and accept that most learning happens when you actually begin the work. You'll discover dependencies you didn't know existed. You'll find business rules that contradict the documentation. You'll encounter edge cases that only emerge under real conditions.
Build your plan around discovery and adaptation, not comprehensive prediction. Then, just get to work and move incrementally, learning as you go.
2. Context Is All You Need
Whether you are a human or an LLM, you need the right context to make decisions or create anything useful. With too little context, your decisions are poorly informed; with too much, you can't see the forest for the trees. The same applies to whole organisations: everybody should share the right context to make their decisions effectively. This was one of the things TSB failed to achieve, a shared context among everyone involved.
In my opinion, the right context for anyone working with a mainframe contains:
- The vision: What we are doing and why
- The system: How does the system really work
- The users: How customers and employees actually use the system in practice
- Business logic and regulation: What compliance requirements and business processes are embedded in the code
Of course, a developer and a business analyst have different depths of understanding for each category, but everybody should keep a certain base level up to date in their head.
The problem is, that's a lot of information to take in. Calendars are full and people's working memories are constantly at their limit. So how can people absorb and comprehend all this information while going about their daily tasks? I think this is the most significant use case for AI: distilling humongous amounts of information for easy human consumption. Tools like Nomain analyse your complete codebase, enrich the information with whatever data you have available, and provide full context for everybody across the organisation, from CTO to junior developer. All they need to do is ask.
3. Small Teams Can Achieve Amazing Things
As TSB demonstrated, having a massive number of developers doesn't mean you can achieve massive things in a short time. Mostly, it seems, it's just a massive waste of money.
If you compare average startup sizes now and prior to 2020, you can see that team sizes have steadily declined. In fact, it seems that a team of 15 developers (the average size of a Series A startup in 2024), equipped with the right AI tools, can build products worth tens of millions.
Of course, 15 developers will never be able to migrate a whole core banking system on their own, but the same idea applies. The mentality should always be that a “special forces team” equipped with AI understanding and coding tools beats the “just throw bodies at the problem” approach every time.
It's also easy to understate how important it is to keep an in-house team at the core of all development. When you move incrementally, your team learns constantly, getting better and more efficient every day. If you use outsourced teams, this learning often gets disrupted, as high employee turnover means you effectively have a new team every year.
4. Make Sure You Stay In Control
AI is a good servant, but a bad master. Sure, you can prompt AI to generate you a simple app, but when it comes to legacy systems with tens of millions of lines of code, you just can’t give the whole rewriting task to AI agents.
That sounds very tempting, but if you have ever tried vibe coding, you know the fundamental problem. It's great at the start, but once the codebase grows and problems start to compound, it becomes very difficult to keep moving ahead. Since you yourself have no idea about the architecture or tech choices, it gets harder and harder to explain to the AI agent what is actually not working. As a result, you end up fixing bugs manually and wading through tons of code, just as you would if you rewrote the code yourself anyway.
My advice: always use AI as a tool to increase the speed of humans, not to replace them. This applies to mainframe migration projects as well.
5. The Right Tools
The mainframe development landscape is remarkably inconsistent. Some companies embrace the latest development and AI tools: GitHub Copilot, CI/CD pipelines, modern IDEs. Others have virtually nothing. Their developers write code directly into the mainframe through terminal emulators: no assisted editing, no debuggers, just white or green text on a black screen. For these organizations, adopting a modern code editor like VS Code would be the best place to start. The report about TSB doesn’t include any information about developer tooling, but my guess would be they belonged in the latter group.
Most modern AI tools focus on code generation. It’s an obvious use case, but it misses the real problem in mainframe environments. Mainframe developers spend only 16% of their time actually writing code. The rest is spent on searching for context and coordinating with stakeholders.
Consider this: developers typically spend 3-7 days per feature just locating where in the codebase to make their changes. The remaining time goes to meetings, aligning with business requirements, and piecing together how the system works, leaving very little time for actually coding.
This is why the true efficiency breakthrough for mainframe work doesn’t come from generating code faster, it comes from understanding existing code better and sharing that knowledge across the organization. This is why an AI knowledge platform should be the first tool in your mainframe modernization toolkit.
Platforms like Nomain enable users to grasp the big picture and drill into details in a fraction of the traditional time. Users can ask questions about the codebase, locate functionality, diagnose bugs, discuss new features, and share insights, all in seconds rather than days.
Developers aren’t the only beneficiaries. Business analysts, product owners, and other stakeholders constantly make technical decisions about the mainframe, but most can’t read mainframe code. A knowledge platform becomes their superpower. They can now assess how difficult a new feature might be to implement before escalating to development teams, saving everyone’s time and enabling better-informed decisions.
AI knowledge platforms can analyze legacy codebases to:
- Map business logic, dependencies, and data flows automatically, revealing connections that would take weeks to discover manually
- Connect code to business rules and operational data, bridging the gap between technical implementation and business intent
- Create accessible, current knowledge instead of documentation, ensuring critical understanding doesn’t exist only in retiring experts’ heads
- Enable rapid system comprehension, eliminating weeks of code archaeology before teams can make changes
In essence, they create a comprehensive, connected understanding of how your mainframe systems actually work, not how you hope they work or how outdated documentation claims they work.
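To illustrate the idea of connecting code to business intent in miniature (this is a toy, nothing like a production knowledge platform; the program name and the rule reference are invented for the example): even a trivial keyword index over COBOL comment lines lets a non-developer jump from a business term straight to a code location.

```python
import re
from collections import defaultdict

# Toy illustration of "connecting code to business intent": index
# COBOL comment lines (marked with '*', simplified here) by keyword,
# so a business analyst can ask "where is the overdraft rule?"
# without reading COBOL. All names below are invented.
def build_index(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map lowercase keywords from comments to program:line locations."""
    index = defaultdict(list)
    for program, code in sources.items():
        for lineno, line in enumerate(code.splitlines(), start=1):
            if line.lstrip().startswith("*"):  # comment line
                for word in re.findall(r"[A-Za-z]{4,}", line):
                    index[word.lower()].append(f"{program}:{lineno}")
    return dict(index)

sources = {
    "ACCTCHK": (
        "      * Enforce overdraft limit per FCA rule CONC 5.2\n"
        "       CHECK-LIMIT.\n"
        "           IF WS-BALANCE < WS-LIMIT ...\n"
    ),
}
index = build_index(sources)
print(index["overdraft"])  # ['ACCTCHK:1']
```

A real platform adds semantic search, data-flow analysis, and links to operational data on top, but the core value is the same mapping: business term in, code location out, in seconds.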
The Path Forward
The TSB migration failed not because of insufficient planning or technical incompetence, but because of too much faith in upfront planning and trusting that a large enough number of developers can achieve anything. They set public deadlines before understanding the work. They outsourced critical knowledge to 85 subcontractors. They skipped testing to meet arbitrary dates. They went “all in” with a big bang migration with no rollback option.
Every one of these decisions seemed reasonable at the time. They’re what conventional project management recommends and what executives expect. They’re what regulators thought they were supervising.
And they led to catastrophic failure.
The successful path is different:
- Build your core team in-house with people who have complete context about your systems, business, users, and regulatory requirements
- Create shared understanding by using an AI knowledge platform across your entire organization so everyone knows what you’re doing and why
- Equip your team with modern AI tools like Nomain, Cursor, and/or GitHub Copilot
- Plan only what you need to start intelligently, then learn and adapt as you actually do the work
- Migrate incrementally, start small, prove it works, learn from it, then scale up
- Use AI everywhere it helps, but maintain control and human judgment for critical decisions
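The "migrate incrementally" point is often realised as a strangler-fig setup: route a small, well-understood slice of traffic to the new system while the legacy system remains the fallback. A minimal sketch, with all names and segments invented:

```python
# Strangler-fig routing sketch (illustrative, not TSB's architecture):
# migrate one account segment at a time, fall back to legacy on failure.
MIGRATED_SEGMENTS = {"SAVINGS"}  # start small: one product line

def handle(request, legacy, modern):
    """Route to the new system only for migrated segments, and always
    keep a working fallback so a bad increment degrades, not destroys."""
    if request["segment"] in MIGRATED_SEGMENTS:
        try:
            return modern(request)
        except Exception:
            return legacy(request)  # the rollback path TSB didn't have
    return legacy(request)

# Usage: current accounts stay on legacy until savings is proven.
legacy = lambda r: f"legacy:{r['id']}"
modern = lambda r: f"modern:{r['id']}"
print(handle({"segment": "SAVINGS", "id": 1}, legacy, modern))  # modern:1
print(handle({"segment": "CURRENT", "id": 2}, legacy, modern))  # legacy:2
```

Once the savings slice is proven in production, you grow `MIGRATED_SEGMENTS` one increment at a time, each step small enough to learn from and reverse.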
The bottom line for successful mainframe migration isn’t more planning. It’s better understanding, incremental progress, and the right tools to maintain clarity as you go.
Stop planning everything upfront. Start understanding everything deeply. Get Nomain to help your team see clearly. Then get to work. One piece at a time.
Ready to build the deep understanding your migration needs before you commit to a timeline? Learn how Nomain helps teams get over the communication overhead and move faster here: www.nomain.com