The Long Journey of AI in Software Engineering

Can AI automate everything and replace programmers? Photo: Midjourney

A team of researchers has just published a comprehensive map of the challenges facing artificial intelligence (AI) in software development, and proposed a research roadmap to push the field further.

Imagine a future where AI quietly takes over the mundane tasks of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human software engineers can focus on system architecture, design, and creative problems that machines can’t solve yet. Recent advances in AI seem to be bringing that vision closer.

However, a new study by scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and partner research institutes argues that, to realize that future, we must first look squarely at the very real challenges of the present.

“Many people say programmers are no longer needed because AI automates everything,” said Armando Solar-Lezama, a professor of electrical engineering and computer science at MIT, a senior researcher at CSAIL, and the study’s senior author. “In fact, we’ve made significant progress. The tools are much more powerful than they were before. But there’s still a long way to go to realize the full potential of automation.”

Solar-Lezama argues that the prevailing view reduces software engineering to something like a student programming assignment: being handed a specification for a small function and writing the code for it, or solving a LeetCode-style exercise. The reality is far more complex. It ranges from everyday refactorings that improve a design to large-scale migrations that move millions of lines of code from COBOL to Java and change a company's entire technology base.

Measurement and communication remain difficult problems

Industrial-scale code optimizations, such as tuning GPU kernels or the multi-layer improvements behind Chrome’s V8 engine, are still difficult to evaluate. Current benchmarks are designed mostly for small, self-contained problems. The most widely used one, SWE-Bench, simply asks an AI model to fix a bug reported on GitHub. That is closer to a student programming exercise than to industrial practice: it touches only a few hundred lines of code, risks data leakage because the repositories are public, and ignores a wide range of real-world scenarios, such as AI-assisted refactoring, human-machine pair programming, and high-performance system rewrites spanning millions of lines of code. Until benchmarks expand to cover these higher-stakes scenarios, measuring progress, and thus accelerating it, will remain an open challenge.

Human-machine communication is another major barrier. Alex Gu, a PhD student and the study's first author, says that interacting with AI today is still like "a fragile communication line". When he asks an AI to generate code, he often receives large, unstructured files accompanied by only a few simple, superficial tests. The same gap shows in AI's inability to make effective use of the software tools human engineers rely on, such as debuggers and static analyzers.

Call to action from the community

The authors argue that there is no silver bullet for these problems and call for community-scale efforts on three fronts: building datasets that reflect how programmers actually develop software (which code they keep, which code they discard, and how code is refactored over time); creating shared evaluation suites that measure refactoring quality, the durability of bug fixes, and the correctness of system migrations; and building transparent tools that let AI express its uncertainty and invite human intervention.

Gu sees this as a “call to action” for large-scale open-source collaboration of a kind that no single lab can deliver on its own. Solar-Lezama envisions progress coming in small, incremental steps, “research findings that solve pieces of the problem one at a time”, that gradually transform AI from a “code suggestion tool” into a true engineering partner.

“Why does this matter? Software is already the foundation of finance, transportation, healthcare, and just about every everyday activity. But the human effort to build and maintain it securely is becoming a bottleneck,” Gu said. “An AI that can do the heavy lifting without making hidden errors would free up programmers to focus on creativity, strategy, and ethics. But to get there, we have to understand: finishing a piece of code is the easy part; the hard part is everything else.”

(Condensed and translated from MIT News)

Source: https://vietnamnet.vn/hanh-trinh-dai-cua-ai-trong-ky-thuat-phan-mem-tu-dong-hoa-2426456.html