The LLM Plateau and What Software Engineers Can Do About It

When was the last time a new LLM blew you away with its level of intelligence? Is OpenAI’s o3 mini really any better than o1? Is DeepSeek really any worse?

There have been great innovations in the past year in efficiency and features, but the models aren’t getting much smarter. We’re charging towards a Large Language Model plateau.

OpenAI’s recent announcement of their roadmap for GPT-4.5/5 is a strong signal of the coming plateau. Along with the announcement of GPT-4.5 arriving within “weeks” and GPT-5 coming soon was the news that users will no longer be able to choose which model answers a given question. Reading deeper into the announcement tells us two things:

  1. GPT-5 isn’t ready, but OpenAI felt the need to show progress, so they’re releasing a new model as GPT-4.5.
  2. OpenAI is betting that most users can’t differentiate between the models, so they’re implementing a UI change that will reduce costs.

Both of these imply that there aren’t any major intelligence leaps on the horizon, at least from OpenAI.

So what’s next?

All of this is not to say that innovation is going to slow or stop, but that the new frontier in LLMs is not going to be in AGI or superintelligence. Advances over the next few years are going to focus on training efficiency, cheaper implementations, and UI advancements.

DeepSeek has already started making strides in training efficiency with GRPO (Group Relative Policy Optimization). This is only the beginning, and plans to spend $500 billion on hardware are either ridiculous folly or intentional grift.
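For a sense of where that efficiency comes from: GRPO drops the separate critic model that PPO-style training needs and instead scores each sampled response against the other samples for the same prompt. Here’s a minimal sketch of that group-relative advantage calculation; the rewards and group size are made up for illustration, not DeepSeek’s actual setup:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO's core trick: normalize each sampled response's reward
    against the group's mean and std, so no learned critic is needed."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)  # epsilon avoids divide-by-zero

# Hypothetical example: 4 responses sampled for the same prompt,
# scored by some reward function (e.g. a correctness check).
rewards = np.array([1.0, 0.0, 0.5, 1.0])
print(group_relative_advantages(rewards))
# Responses above the group mean get positive advantages and are
# reinforced; below-average ones get pushed down.
```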

Implementation costs are already plummeting, and the raft of competing models is only going to pressure the market to process more requests faster and, crucially, to increase context windows so that LLMs can respond to more and more complex questions.

Finally, people are already starting to realize that chat is not always the best interface for AI. Tools like Microsoft Copilot, Cursor, and Windsurf are going to get better and better at assisting engineers in their day-to-day tasks.

How Can We as Software Engineers Prepare?

We’re in an AI bubble right now. Companies talking about reducing their workforce due to AI are either too bullish or masking layoffs that they were going to make anyway. At some point, the bubble will burst and there will be a great need for engineers again. But we’ll need to know how to efficiently leverage these new tools.

Anyone who’s tried to use an LLM on a project bigger than a few files knows that they’re bad at maintaining a consistent design. Even with larger context windows, models like to rewrite irrelevant parts of the code, and they have a recency bias: they fixate on solving the current problem and ignore the big picture.

The good news is, this forces some best practices that we should be following anyway: namely the Single Responsibility Principle and Test-Driven Development. LLMs are good at working on small pieces of testable code. Writing code this way lets us take better advantage of LLMs while keeping all of the existing benefits of TDD.
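In practice, that means handing the LLM one small, well-specified unit at a time, with a test that pins down the behavior before any implementation exists. A minimal sketch of the workflow (the slugify function and its tests are invented for illustration):

```python
# test_slug.py -- write the test first, then hand the spec to the LLM.
from slug import slugify

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("What's next?") == "whats-next"
```

```python
# slug.py -- a single-responsibility unit, small enough for an LLM
# to write (or rewrite) without touching the rest of the codebase.
import re

def slugify(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)    # drop punctuation
    return re.sub(r"\s+", "-", text.strip())    # spaces -> hyphens
```

The LLM only ever sees the spec and the unit it’s writing, so there’s nothing else for it to rewrite.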

This leaves design and architecture up to us humans. This is an area where we’re still way better than AI. We can consider things like extensibility, readability, performance, and scalability all at once, then break the project into small testable units for the LLM to write.

These skills will be essential in a world where AI models stagnate, with the added bonus of being good practice.

LLMs are also very good at reading code, and there are more ways to leverage that than you might think.

AI code reviews: feed your coding standards document, along with a diff of the changes, into an LLM and ask it to review them. As context windows increase in size, you’ll eventually be able to feed your whole codebase in and ask it to flag unintentional side effects.
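Here’s a minimal sketch of what that could look like with the OpenAI Python client; the model name, standards file path, and prompt wording are all assumptions, not a prescribed setup:

```python
# pip install openai; assumes OPENAI_API_KEY is set in the environment.
import subprocess
from openai import OpenAI

client = OpenAI()

standards = open("CODING_STANDARDS.md").read()  # hypothetical path
diff = subprocess.check_output(["git", "diff", "main"], text=True)

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any capable chat model works here
    messages=[
        {"role": "system",
         "content": "You are a code reviewer. Flag violations of the "
                    "coding standards below and any likely bugs.\n\n" + standards},
        {"role": "user", "content": diff},
    ],
)
print(response.choices[0].message.content)
```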

AI documentation: LLMs are also great at summarizing large pieces of code and at documenting APIs. When context windows get large enough, they’ll be able to extract and diagram your code’s design. You could even regenerate this documentation continuously so that it’s never out of date.
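As a sketch, “never out of date” could be as simple as a CI step that re-summarizes every module on each merge; the paths and model below are placeholders:

```python
# regen_docs.py -- a hypothetical CI step: re-summarize every module
# on merge so the docs can't drift out of date.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
docs = Path("docs")
docs.mkdir(exist_ok=True)

for source in Path("src").rglob("*.py"):
    summary = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable chat model
        messages=[
            {"role": "system", "content":
             "Summarize this module's purpose and public API in markdown."},
            {"role": "user", "content": source.read_text()},
        ],
    ).choices[0].message.content
    # One doc file per module; flat naming is fine for a sketch.
    (docs / f"{source.stem}.md").write_text(summary or "")
```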

AI user stories: This may be controversial, but LLMs can help automate the most time-consuming part of writing user stories: converting your natural language descriptions into acceptance criteria. Taking a general description and turning it into a detailed set of requirements that can then be reviewed for accuracy can help reduce requirements gaps and help engineers work on the correct implementation.
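A sketch of the prompt side of this, using Gherkin-style Given/When/Then criteria as one possible output format (the feature description is invented):

```python
from openai import OpenAI

client = OpenAI()

description = (  # hypothetical feature request from a stakeholder
    "Users should be able to reset their password from the login page."
)

criteria = client.chat.completions.create(
    model="gpt-4o",  # assumption
    messages=[
        {"role": "system", "content":
         "Convert the feature description into Gherkin-style acceptance "
         "criteria (Given/When/Then). List any ambiguities separately "
         "so a human can resolve them."},
        {"role": "user", "content": description},
    ],
).choices[0].message.content
print(criteria)  # review these with the team before committing to them
```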

Conclusion

Please remember that I’m a sign, not a cop. I’m speculating based on my experiences with LLMs and having lived through many different bubble/bust cycles over the past 30 years, but nobody can predict the future… except maybe a super intelligent AI.