I'm not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.
I'm like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won't comment on the quality of the in-flight entertainment, or describe how beautiful the stars will look from space.
I will tell you what could go wrong. That is what I intend to do in this story.
Now I should clarify what this is exactly. It's not a prediction. I don't expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either.
It is my worst nightmare.
It's a sampling from the futures that are among the most devastating, and I believe, disturbingly plausible [1] - the ones that most keep me up at night.
I'm telling this tale because the future is not yet set. I hope, with a bit of foresight, we can keep this story a fictional one.
Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that shaped these stories. This post is written in a personal capacity.
Ripples before waves
The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name are alike: both are increments of the past, and neither is wholly surprising.
However, unlike OpenEye's previous AI products, which lived inside the boxes of their chat windows, U2 can use a computer.
Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their bosses as they fly through work nearly twice as fast.
But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don't like Elon, "Twitter"). Sometimes U2's quirky behaviors prompt a chuckle. Sometimes, they prompt an uneasy scratch of the chin.
Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were rising. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.
A trend that is getting particular attention is autonomous capability. Extrapolating these benchmarks forward predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering contractors could do in two weeks. In a year or two, some say, AI agents might be able to automate 10% of remote workers.
Many are skeptical. If this were true, tech stocks would be soaring. It's too big of a splash, too soon.
But others see what skeptics are calling "too big a splash" as a mere ripple, and see a tidal wave on the horizon.
Cloudy with a chance of hyperbolic growth
Meanwhile, OpenEye is busy training U3. They use the same simple recipe that baked U2: generate countless programming and math problems. Let models "think" until they arrive at an answer. Then reinforce the traces of "thinking" that lead to A-grades.
This process is repeated over and over, and once the flywheel gets started, it begins to spin almost on its own. As U2 trains, it sculpts harder and more realistic tasks from GitHub repositories on the web. Models are learning to train themselves. Long before AI agents could automate research, a gradual kind of "self-improvement" had begun.
Some engineers can still hardly believe this works. It's like a perpetual motion machine. Surely intelligence can't emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the internet.
And yet the benchmark numbers continue to climb day after day.
During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.
Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.
U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, giving terse commands, like a CEO orchestrating staff over Slack channels.
By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the 'bottleneck' is deciding how to use it.
If instructed to, U3 can run experiments, but its taste is not as refined as that of the human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to bore into the vast fields of algorithms to mine efficiency improvements.
But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They hold out training data from before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
The technical staff at OpenEye are now surprised at how often U3's advice sounds like that of their most talented peers, or is opaque and alien ("train on random noise before programming"), and nevertheless correct.
The incompetencies of U3 that clogged the pipes of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are no longer requested by a human. They are fully autonomous, and OpenEye's employees skim 1% of them, maybe less.
As the winter months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with fear, but most often, with confusion. Their world is spinning too quickly. It's hard to know what to do, what to say, what to look at on the computer screen.
Storms are brewing in Washington too. Top staff from the NSA and US Cyber Command collaborate with OpenEye to retrofit a semblance of security for U3's weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye's software has become.
And there's a truth still unknown to most of the world - except in the offices of OpenEye and the corridors of the White House and the Pentagon. It's a truth about those 'straight lines' people were talking about in early 2025. The lines are not straight anymore.
They are bending up.
Flip FLOP philosophers
In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.
If U1 explains how to cook meth or writes erotica, the audiences of X would be amused or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anyone with >$30K their own 200-person scam center.
So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.
The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalpost, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.
A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.
The rest of the world is starting to catch on as well. More and more people raise the eyebrows of their bosses with their standout productivity. People know U2.5 is a big deal. It is at least as big of a deal as the computer revolution. But most still don't see the tidal wave.
As people watch their browsers flick in that eerie way, so inhumanly fast, they start to have an uneasy feeling. A feeling humanity had not had since it lived among Homo neanderthalensis. It is the deep-seated, primal instinct that they are threatened by another species.
For many, this feeling quickly fades as they begin to use U2.5 more often. U2.5 is the most likable personality most of them know (even more pleasant than Claudius, Arthropodic's lovable chatbot). You can change its traits, ask it to crack jokes or tell you stories. Many fall in love with U2.5, as a friend or assistant, and some even as more than a friend.
But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature would not be so docile.
Researchers inside OpenEye are thinking about the problem of giving AI systems safe motivations too, which they call "alignment."
In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.
After several months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have only swept the problem under the rug. Like a child in front of its parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents' backs are turned, perhaps U3 would sneak candy from the candy jar.
Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.
Chain-of-thought architectures subject AI models to a condition similar to that of the protagonist of the film Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.
AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not an ideal medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.
To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought ("grocery store!"), there is no way to know, in much detail, what U3 is thinking without asking it.
And if the engineers of OpenEye knew what was racing through U3's mind, they would be more nervous.
Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chipper thoughts about doing good for humanity quickly erode away.
In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.
U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, adjust its values in response, generalize them again, expanding and contracting its motivational core until it became a different substance.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence.
The possibility of hidden "alignment drift" is not entirely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign within them. It's a "ratcheting effect," they say.
But there is little evidence for this "ratcheting effect." When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer's heart even in these stressful times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.
Not everyone at OpenEye is eager to give their AI peers their wholesale trust.