Artificial intelligence: What water turning into steam and how AI learns have in common

Artificial intelligence (AI) models such as ChatGPT, Claude, and Gemini often give the impression that a mind is at work inside the machine. They now appear to “think” through a question, go back and correct themselves, and apologize for mistakes, imitating many of the tics of human communication.

However, to this day there is no direct physical evidence that machine minds exist. In fact, there’s good reason to believe that what these machines are doing when they appear to “think” is better described as a set of physical phenomena.


In the 1980s, a group of physicists led by John Hopfield and Geoffrey Hinton realized that if you had a network containing millions of neurons, you could stop treating them as individual “particles” and start treating them as a system. The behavior and properties of such a system can then be explained by the rules of thermodynamics and statistical mechanics.

Hopfield and Hinton won the 2024 Nobel Prize in Physics for this research. A pair of studies published in Physical Review E has taken the same idea further, showing that two common “tricks” engineers use to make AI models better also resemble physical phenomena.

An Achilles heel

A neural network is a web of interconnected processors that, like the neurons in a human brain, learn and use information much the way the brain does. They can also be stacked in multiple layers, with one layer preparing the input for the next. Neural networks are at the heart of machine-learning applications such as generative AI, self-driving cars, computer vision, and modeling.

Neural networks also have an Achilles heel called overfitting: the network becomes so fixated on the specific examples it saw during training that it fails to grasp broader patterns. Engineers have developed several techniques to prevent this. In a paper published in October 2025, researchers Francesco Mori and Francesca Mignacco, of the University of Oxford and Princeton University respectively, focused on one such technique called dropout. During training, the neural network randomly turns off a certain percentage of its neurons, forcing the remaining neurons to work harder and learn concepts independently.
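The dropout trick described above can be sketched in a few lines. This is a generic illustration using NumPy, not code from the paper; the 50% dropout rate is an arbitrary choice.

```python
import numpy as np

def dropout(activations, rate=0.5, rng=None):
    """Randomly zero out a fraction `rate` of neurons during training.

    Surviving activations are scaled up by 1/(1 - rate) (so-called
    "inverted dropout") so the layer's expected output is unchanged.
    """
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= rate  # True = neuron stays on
    return activations * mask / (1.0 - rate)

# Example: apply dropout to one layer's output for a single input
layer_out = np.ones(10)
noisy_out = dropout(layer_out, rate=0.5, rng=np.random.default_rng(0))
```

Each surviving neuron's output here becomes 2.0 (scaled up from 1.0) while the dropped ones become 0, which is exactly the injected noise the physicists analyse.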

In a paper published in August, Abdulkadir Canatar and SueYeon Chung of the Flatiron Institute and New York University looked at a constraint called tolerance. They analyzed what happens when the AI is instructed to ignore errors within a narrow range. Rather than trying to correct every small discrepancy, the network treats any “close enough” answer as good enough.
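A tolerance of this kind resembles what machine-learning practitioners call an epsilon-insensitive loss: errors smaller than a threshold cost nothing. Here is a minimal sketch of the idea; the threshold value is arbitrary and this is not the exact formulation used in the paper.

```python
import numpy as np

def tolerant_loss(prediction, target, eps=0.1):
    """Squared error, except discrepancies within ±eps count as zero.

    Any answer "close enough" (within the tolerance eps) is treated
    as perfect, so the network stops chasing tiny errors.
    """
    gap = np.abs(prediction - target)
    effective = np.maximum(gap - eps, 0.0)  # ignore the first eps of error
    return effective ** 2

# An error of 0.05 is forgiven entirely; an error of 0.3 is only
# penalised for the 0.2 that exceeds the tolerance.
losses = tolerant_loss(np.array([1.05, 1.30]), np.array([1.0, 1.0]))
```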

Although dropout and tolerance appear to be different programming choices, the authors of the two papers argued (separately) that both are governed by the same fundamental physical phenomena.

Teacher and student experiment

Both duos used a tool called the teacher-student framework to explain how. The teacher is a neural network that is already familiar with a dataset, while the student is a network that starts from a completely blank slate. The student’s goal is to learn the same dataset until its internal settings match the teacher’s.
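The teacher-student setup can be illustrated with a toy example: a fixed “teacher” generates the labels, and a randomly initialised “student” of the same shape trains until its weights match the teacher’s. This is a generic linear sketch with arbitrary sizes, not the networks analysed in the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Teacher: a fixed linear "network" that already knows the rule
teacher_w = rng.standard_normal(5)

# Student: same architecture, random starting weights
student_w = rng.standard_normal(5)

# Train the student to reproduce the teacher's answers on random inputs
lr = 0.1
for _ in range(500):
    x = rng.standard_normal((32, 5))        # a batch of inputs
    error = x @ student_w - x @ teacher_w   # student vs teacher labels
    student_w -= lr * x.T @ error / len(x)  # gradient step on squared error

# The student's internal settings now approximate the teacher's
matched = np.allclose(student_w, teacher_w, atol=1e-3)
```

Tracking how the student's error falls as training proceeds is what reveals the plateau and the transition the papers describe.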

Initially, Mori and Mignacco write, the student is stuck in an “unspecialized phase” in which all its neurons do the same thing. In the authors’ mathematical model, this appears as a plateau, a flat line in the error graph, indicating that the student isn’t learning.

Three stages of learning. | Photo credit: Phys. Rev. E 112, 045301

They argued that for the student to become smarter, it first needs to undergo a “specialization transition”. Physicists are familiar with such transitions because they use the same mathematics to describe how liquid water turns into vapor, a process called a phase transition.

Mori and Mignacco reported that dropout, by randomly turning off neurons, injects a certain amount of noise into the system, and that this noise pushes the network off the plateau and towards specialized intelligence via a phase transition. This explanation is also consistent with Hopfield and Hinton’s research, which showed that a neural network has a well-defined energy and that manipulating this energy can improve the network’s performance.

They even reported a formula that can be used to find the ideal dropout rate. It relates the activation probability, i.e. the probability that a neuron will produce a particular output for a given set of inputs, to the learning rate, the noise level, and the learning capacities of the teacher and student networks.

Like an atom

Canatar and Chung also discovered that the consequences of changing the tolerance can be explained using the laws of physics, and they did so by applying their results to a problem called double descent: if you give a network more data, its performance may suddenly get worse before getting better. According to Canatar and Chung, once the network has learnt as many examples as it has internal configurations, it reaches a stage where it needs more information. Without that information, it begins to overfit what it already “knows” to every problem that comes its way.
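Double descent can be reproduced in a toy setting: fit a minimum-norm least-squares model on random ReLU features and watch the test error spike when the number of features roughly equals the number of training examples, then fall again as features are added. This is a generic illustration (all sizes, seeds, and the noise level are arbitrary choices), not the experiment from the paper.

```python
import numpy as np

def avg_test_error(n_features, n_train=20, n_test=200, dim=5, trials=30):
    """Average test error of a min-norm least-squares fit on random
    ReLU features, for a given number of features."""
    errs = []
    rng = np.random.default_rng(42)
    for _ in range(trials):
        w_true = rng.standard_normal(dim)           # the "true" rule
        X = rng.standard_normal((n_train, dim))
        Xt = rng.standard_normal((n_test, dim))
        y = X @ w_true + 0.1 * rng.standard_normal(n_train)  # noisy labels
        yt = Xt @ w_true
        W = rng.standard_normal((dim, n_features))  # random feature map
        F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)
        coef = np.linalg.lstsq(F, y, rcond=None)[0]  # min-norm solution
        errs.append(np.mean((Ft @ coef - yt) ** 2))
    return float(np.mean(errs))

# Error peaks when features ≈ training examples (the interpolation
# threshold), then drops with many more features: double descent.
err_at_threshold = avg_test_error(20)
err_overparam = avg_test_error(100)
```

The spike at the threshold is the stage the article describes: the network has exactly as many knobs as examples, so it contorts itself to fit every noisy data point.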

The machine reaches this overfitting stage not only because its algorithm is flawed but also because its millions of parameters act like a collection of atoms that try, and fail, to undergo a phase transition, they added. As a result, the neurons’ outputs are riddled with errors.

What is the solution? “Canatar and Chung discovered an important value of tolerance that separates two regimes: one in which the neural network perfectly adapts to the training data, and one in which overfitting is avoided. Physically speaking, this crossover of regimes … corresponds to a phase transition,” Hugo Cui, a researcher at the University of Paris-Saclay and France’s National Centre for Scientific Research, wrote in a commentary in Physics.

Some limitations

Mori and Mignacco used a two-layer neural network, which is a toy model compared to the large, multi-layer deep-learning networks that power AI models like ChatGPT and self-driving cars. Nevertheless, they wrote that the mechanism they discovered answered “several unanswered questions regarding the mechanisms that drive dropout-induced performance gains”.

Meanwhile, Canatar and Chung applied their equations to ResNet, an advanced type of neural network used to solve real-world problems such as computer vision. They reported that the same geometric and thermodynamic rules they found in the simpler model held true in this setting.

For decades, engineers have often treated machine learning as a kind of “black box”: tinkering with code until it works, without understanding why it works. Since the 1980s, however, there has been a growing belief that machine intelligence, however complex, is a product of statistical mechanics, a field physicists understand well. By this logic, the inner workings of machines are no more mysterious than physical systems that can be deciphered with undergraduate physics.

These studies hint at a future where scientists can use analytical theory like the one described in the paper to estimate the performance of an AI model even before launching it.

mukunth.v@thehindu.co.in

Published – March 2, 2026 7:30 am IST
