PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: AI scaling

  • Ilya Sutskever on the “Age of Research”: Why Scaling Is No Longer Enough for AGI

    In a rare and revealing interview published November 25, 2025, Ilya Sutskever sat down with Dwarkesh Patel to discuss the strategy behind his new company, Safe Superintelligence (SSI), and the fundamental shifts occurring in the field of AI.

    TL;DW

    Ilya Sutskever argues we have moved from the “Age of Scaling” (2020–2025) back to the “Age of Research.” While current models ace difficult benchmarks, they suffer from “jaggedness” and fail at basic generalization where humans excel. SSI is betting on finding a new technical paradigm—beyond just adding more compute to pre-training—to unlock true superintelligence, on a timeline Sutskever estimates at 5 to 20 years.


    Key Takeaways

    • The End of the Scaling Era: Scaling “sucked the air out of the room” for years. While compute is still vital, we have reached a point where simply adding more data/compute to the current recipe yields diminishing returns. We need new ideas.
    • The “Jaggedness” of AI: Models can solve PhD-level physics problems but fail to fix a simple coding bug without introducing a new one. This disconnect proves current generalization is fundamentally flawed compared to human learning.
    • SSI’s “Straight Shot” Strategy: Unlike competitors racing to release incremental products, SSI aims to stay private and focus purely on R&D until they crack safe superintelligence, though Ilya admits some incremental release may be necessary to demonstrate power to the public.
    • The 5-20 Year Timeline: Ilya predicts it will take 5 to 20 years to achieve a system that can learn as efficiently as a human and subsequently become superintelligent.
    • Neuralink++ as Equilibrium: In the very long run, to maintain relevance in a world of superintelligence, Ilya suggests humans may need to merge with AI (e.g., “Neuralink++”) to fully understand and participate in the AI’s decision-making.

    Detailed Summary

    1. The Generalization Gap: Humans vs. Models

    A core theme of the conversation was the concept of generalization. Ilya highlighted a paradox: AI models are superhuman at “competitive programming” (because they have seen virtually every problem that exists) but lack the “it factor” to function as reliable engineers. He used the analogy of a student who memorizes 10,000 problems versus one who understands the underlying principles after only 100 hours of study. Current AIs are the former; they don’t actually learn the way humans do.

    He pointed out that human robustness—like a teenager learning to drive in 10 hours—relies on a “value function” (often driven by emotion) that current Reinforcement Learning (RL) paradigms fail to capture efficiently.
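
    Sutskever did not spell out an algorithm, but the textbook form of the idea is a value function learned by temporal-difference (TD) updates. The sketch below is a minimal illustration of that concept on a made-up toy environment; the environment, constants, and code are illustrative assumptions, not anything from the interview.

        import random

        # Toy chain environment: states 0..4, reward 1 only at the right end.
        # A value function V(s) estimates expected future reward from state s;
        # TD(0) learning nudges the estimate from raw experience.
        N_STATES = 5
        V = [0.0] * N_STATES
        alpha, gamma = 0.1, 0.99  # learning rate, discount factor

        for episode in range(1000):
            s = 0
            while s < N_STATES - 1:
                s_next = s + random.choice([-1, 1]) if s > 0 else s + 1
                reward = 1.0 if s_next == N_STATES - 1 else 0.0
                # TD(0): move V(s) toward reward + gamma * V(s_next)
                V[s] += alpha * (reward + gamma * V[s_next] - V[s])
                s = s_next

        print([round(v, 2) for v in V])  # estimates rise toward the rewarding end

    The point of the analogy is sample efficiency: a good value function lets a learner evaluate actions long before any final reward arrives, which is what the emotion-driven human version appears to do in those 10 hours of driving practice.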

    2. From Scaling Back to Research

    Ilya categorized the history of modern AI into eras:

    • 2012–2020: The Age of Research (the breakthroughs of AlexNet and the Transformer).
    • 2020–2025: The Age of Scaling (The consensus that “bigger is better”).
    • 2025 Onwards: The New Age of Research.

    He argues that pre-training data is finite and we are hitting the limits of what the current “recipe” can do. The industry is now “scaling RL,” but without a fundamental breakthrough in how models learn and generalize, we won’t reach AGI. SSI is positioning itself to find that missing breakthrough.

    3. Alignment and “Caring for Sentient Life”

    When discussing safety, Ilya moved away from complex RLHF mechanics to a more philosophical “North Star.” He believes the safest path is to build an AI that has a robust, baked-in drive to “care for sentient life.”

    He theorizes that it might be easier to align an AI to care about all sentient beings (rather than just humans) because the AI itself will eventually be sentient. He draws parallels to human evolution: just as evolution hard-coded social desires and empathy into our biology, we must find the equivalent “mathematical” way to hard-code this care into superintelligence.

    4. The Future of SSI

    Safe Superintelligence (SSI) is explicitly an “Age of Research” company. They are not interested in the “rat race” of releasing slightly better chatbots every few months. Ilya’s vision is to insulate the team from market pressures to focus on the “straight shot” to superintelligence. However, he conceded that demonstrating the AI’s power incrementally might be necessary to wake the world (and governments) up to the reality of what is coming.


    Thoughts and Analysis

    This interview marks a significant shift in the narrative of the AI frontier. For the last five years, the dominant strategy has been “scale is all you need.” For the godfather of modern AI to explicitly declare that era over—and that we are missing a fundamental piece of the puzzle regarding generalization—is a massive signal.

    Ilya seems to be betting that the current crop of LLMs, while impressive, are essentially “memorization engines” rather than “reasoning engines.” His focus on the sample efficiency of human learning (how little data we need to learn a new skill) suggests that SSI is looking for a new architecture or training paradigm that mimics biological learning more closely than the brute-force statistical correlation of today’s Transformers.

    Finally, his comment on Neuralink++ is striking. It suggests that in his view, the “alignment problem” might technically be unsolvable in a traditional sense (humans controlling gods), and the only stable long-term outcome is the merger of biological and digital intelligence.

  • Extropic’s Thermodynamic Revolution: 10,000x More Efficient AI That Could Smash the Energy Wall

    Artificial intelligence is about to hit an energy wall. As data centers devour gigawatts to power models like GPT-4, the cost of computation is scaling faster than our ability to produce electricity. Extropic Corporation, a deep-tech startup founded three years ago, believes it has found a way through that wall — by reinventing the computer itself. Their new class of thermodynamic hardware could make generative AI up to 10,000× more energy-efficient than today’s GPUs.

    From GPUs to TSUs: The End of the Hardware Lottery

    Modern AI runs on GPUs — chips originally designed for graphics rendering, not probabilistic reasoning. Each floating-point operation burns precious joules moving data across silicon. Extropic argues that this design is fundamentally mismatched to the needs of modern AI, which is probabilistic by nature. Instead of computing exact results, generative models sample from vast probability spaces. The company’s solution is the Thermodynamic Sampling Unit (TSU) — a chip that doesn’t process numbers, but samples from probability distributions directly.

    TSUs are built entirely from standard CMOS transistors, meaning they can scale using existing semiconductor fabs. Unlike exotic academic approaches that require magnetic junctions or optical randomness, Extropic’s design uses the natural thermal noise of transistors as its source of entropy. This turns what engineers usually fight to suppress — noise — into the very fuel for computation.

    X0 and XTR-0: The Birth of a New Computing Platform

    Extropic’s first hardware platform, XTR-0 (Experimental Testing & Research Platform 0), combines a CPU, FPGA, and sockets for daughterboards containing early test chips called X0. X0 proved that all-transistor probabilistic circuits can generate programmable randomness at scale. These chips perform operations like sampling from Bernoulli, Gaussian, or categorical distributions — the building blocks of probabilistic AI.

    The company’s pbit circuit acts like an electronic coin flipper, generating millions of biased random bits per second using 10,000× less energy than a GPU’s floating-point addition. Higher-order circuits like pdit (categorical sampler), pmode (Gaussian sampler), and pMoG (mixture-of-Gaussians generator) expand the toolkit, enabling full probabilistic models to be implemented natively in silicon. Together, these circuits form the foundation of the TSU architecture — a physical embodiment of energy-based computation.
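
    Extropic has not published gate-level detail here, but the behavior of a pbit can be mimicked in ordinary software as a biased Bernoulli sampler whose odds follow a sigmoid of a control signal over a temperature scale. The sketch below is that software analogy only; the function names, the sigmoid parameterization, and the categorical variant are illustrative assumptions, not the company’s circuit design.

        import math
        import random

        def pbit(bias: float, temperature: float = 1.0) -> int:
            """Software analogy of a pbit: a biased coin flip whose odds are
            steered by a control signal, standing in for thermal noise in a
            transistor tilted by a bias voltage (an assumption, not
            Extropic's published circuit)."""
            p_one = 1.0 / (1.0 + math.exp(-bias / temperature))
            return 1 if random.random() < p_one else 0

        def pdit(energies, temperature=1.0):
            """Analogy for a categorical sampler: draw an index with
            Boltzmann weights exp(-E_i / T)."""
            weights = [math.exp(-e / temperature) for e in energies]
            r = random.random() * sum(weights)
            for i, wgt in enumerate(weights):
                r -= wgt
                if r < 0:
                    return i
            return len(weights) - 1

        # A stream of biased random bits, the raw output a pbit would supply:
        bits = [pbit(bias=1.5) for _ in range(10_000)]
        print(sum(bits) / len(bits))  # ~0.82, i.e. sigmoid(1.5)

    On a GPU this loop costs regular arithmetic per bit; the pitch is that a pbit circuit gets the randomness from thermal noise the silicon produces anyway.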

    The Denoising Thermodynamic Model (DTM): Diffusion Without the Energy Bill

    Hardware alone isn’t enough. Extropic also introduced a new AI algorithm built specifically for TSUs — the Denoising Thermodynamic Model (DTM). Inspired by diffusion models like Stable Diffusion, DTMs chain together multiple energy-based models that gradually denoise data over time. This architecture avoids the “mixing–expressivity trade-off” that plagues traditional EBMs, making them both scalable and efficient.
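
    The paper’s details are beyond a summary, but the chain structure can be sketched in a few lines. Below, each stage nudges a noisy binary vector toward a fixed target pattern standing in for data, with later stages denoising more aggressively; in a real DTM each stage is a learned energy-based model sampled on a TSU, so treat this purely as the shape of the algorithm, not Extropic’s published model.

        import random

        # Schematic of the chained-denoising idea: no single model has to
        # mix across the whole distribution, because each stage only makes
        # a small denoising step. The "snap toward a target" rule below is
        # a toy stand-in for Gibbs sampling under a learned stage energy.
        TARGET = [1, 0, 1, 1, 0, 0, 1, 0]            # stand-in for a data sample
        x = [random.randint(0, 1) for _ in TARGET]   # start from pure noise

        def denoise_step(x, target, keep_prob):
            # Each bit snaps to the data value with probability keep_prob,
            # otherwise stays as it was.
            return [t if random.random() < keep_prob else b
                    for b, t in zip(x, target)]

        for stage in range(1, 9):                    # later stages denoise harder
            x = denoise_step(x, TARGET, keep_prob=stage / 8)

        print(x)  # converges to TARGET by the final stage

    Splitting the work across stages is what sidesteps the mixing problem: each stage’s distribution stays easy to sample, while the chain as a whole remains expressive.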

    In simulations, DTMs running on modeled TSUs matched GPU-based diffusion models on image-generation benchmarks like Fashion-MNIST — while consuming roughly one ten-thousandth the energy, a gap of four orders of magnitude per generated image. The company’s open-source library, thrml, lets researchers simulate TSUs today, and even replicate the paper’s results on a GPU before the chips ship.

    The Physics of Intelligence: Turning Noise Into Computation

    At the heart of thermodynamic computing is a radical idea: computation as a physical relaxation process. Instead of enforcing digital determinism, TSUs let physical systems settle into low-energy configurations that correspond to probable solutions. This isn’t metaphorical — the chips literally use thermal fluctuations to perform Gibbs sampling across energy landscapes defined by machine-learned functions.
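
    In software, the same procedure is ordinary Gibbs sampling over a Boltzmann distribution. The sketch below runs it on a tiny Ising-style energy function with made-up couplings; a TSU would perform the equivalent relaxation physically rather than in a Python loop.

        import math
        import random

        # Gibbs sampling over a tiny Ising-style energy landscape:
        # E(s) = -sum_{i<j} J[i][j] * s_i * s_j, with spins s_i in {-1, +1}.
        J = [[0.0, 1.0, -0.5],
             [1.0, 0.0, 0.8],
             [-0.5, 0.8, 0.0]]  # made-up couplings for illustration

        def gibbs_sample(n_sweeps=2000, temperature=1.0):
            s = [random.choice([-1, 1]) for _ in range(len(J))]
            for _ in range(n_sweeps):
                for i in range(len(s)):
                    # Local field on spin i from every other spin
                    field = sum(J[i][j] * s[j] for j in range(len(s)) if j != i)
                    # Conditional P(s_i = +1 | rest) under the Boltzmann distribution
                    p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temperature))
                    s[i] = 1 if random.random() < p_up else -1
            return s

        print(gibbs_sample())  # most often a low-energy state: [1, 1, 1] or [-1, -1, -1]

    Swapping the made-up couplings for a machine-learned energy function is what turns this primitive into a generative model.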

    In practical terms, it’s like replacing the brute-force precision of a GPU with the subtle statistical behavior of nature itself. Each transistor becomes a tiny particle in a thermodynamic system, collectively simulating the world’s most efficient sampler: reality.

    From Lab Demo to Scalable Platform

    The XTR-0 kit is already in the hands of select researchers, startups, and tinkerers. Its modular design allows easy upgrades to upcoming chips — like Z-1, Extropic’s first production-scale TSU, which will support complex probabilistic machine learning workloads. Eventually, TSUs will integrate directly with conventional accelerators, possibly as PCIe cards or even hybrid GPU-TSU chips.

    Extropic’s roadmap extends beyond AI. Because TSUs efficiently sample from continuous probabilistic systems, they could accelerate simulations in physics, chemistry, and biology — domains that already rely on stochastic processes. The company envisions a world where thermodynamic computing powers climate models, drug discovery, and autonomous reasoning systems, all at a fraction of today’s energy cost.

    Breaking the AI Energy Wall

    Extropic’s October 2025 announcement comes at a pivotal time. Data centers are facing grid bottlenecks across the U.S., and some companies are building nuclear-adjacent facilities just to keep up with AI demand. With energy costs set to define the next decade of AI, a 10,000× improvement in energy efficiency isn’t just an innovation — it’s a revolution.

    If Extropic’s thermodynamic hardware lives up to its promise, it could mark a “zero-to-one” moment for computing — one where the laws of physics, not the limits of silicon, define what’s possible. As the company put it in their launch note: “Once we succeed, energy constraints will no longer limit AI scaling.”

    Read the full technical paper on arXiv and explore the official Extropic site for their thermodynamic roadmap.

  • Andrej Karpathy on the Decade of AI Agents: Insights from His Dwarkesh Podcast Interview

    TL;DR

    Andrej Karpathy’s reflections on artificial intelligence trace the quiet, inevitable evolution of deep learning systems into general-purpose intelligence. He emphasizes that the current breakthroughs are not sudden revolutions but the result of decades of scaling simple ideas — neural networks trained with enormous data and compute resources. The conversation captures how this scaling leads to emergent behaviors, transforming AI from specialized tools into flexible learning systems capable of handling diverse real-world tasks.

    Summary

    Karpathy explores the evolution of AI from early, limited systems into powerful general learners. He frames deep learning as a continuation of a natural process — optimization through scale and feedback — rather than a mysterious or handcrafted leap forward. Small, modular algorithms like backpropagation and gradient descent, when scaled with modern hardware and vast datasets, have produced behaviors that resemble human-like reasoning, perception, and creativity.

    He argues that this progress is driven by three reinforcing trends: increased compute power (especially GPUs and distributed training), exponentially larger datasets, and the willingness to scale neural networks far beyond human intuition. These factors combine to produce models that are not just better at pattern recognition but are capable of flexible generalization, learning to write code, generate art, and reason about the physical world.

    Drawing from his experience at OpenAI and Tesla, Karpathy illustrates how the same fundamental architectures power both self-driving cars and large language models. Both systems rely on pattern recognition, prediction, and feedback loops — one for navigating roads, the other for navigating language. The discussion connects theory to practice, showing that general-purpose learning is not confined to labs but already shapes daily technologies.

    Ultimately, Karpathy presents AI as an emergent phenomenon born from scale, not human ingenuity alone. Just as evolution discovered intelligence through countless iterations, AI is discovering intelligence through optimization — guided not by handcrafted rules but by data and feedback.

    Key Takeaways

    • AI progress is exponential: Breakthroughs that seem sudden are the cumulative effect of scaling and compounding improvements.
    • Simple algorithms, massive impact: The underlying principles — gradient descent, backpropagation, and attention — are simple but immensely powerful when scaled (see the sketch after this list).
    • Scale is the engine of intelligence: Data, compute, and model size form a triad that drives emergent capabilities.
    • Generalization emerges from scale: Once models reach sufficient size and data exposure, they begin to generalize across modalities and tasks.
    • Parallel to evolution: Intelligence, whether biological or artificial, arises from iterative optimization processes — not design.
    • Unified learning systems: The same architectures can drive perception, language, planning, and control.
    • AI as a natural progression: What humanity is witnessing is not an anomaly but a continuation of the evolution of intelligence through computation.
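
    To make the second takeaway concrete, here is gradient descent in its entirety, fitting a one-parameter model to a few made-up points. The data, learning rate, and loop length are illustrative choices, not anything from the interview; the point is how little machinery the core algorithm needs.

        # Gradient descent in miniature: fit y = w * x by repeatedly
        # stepping the weight against the gradient of squared error.
        # The same loop, scaled to billions of parameters and driven by
        # backpropagation, is the engine behind modern deep learning.
        data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # roughly y = 2x
        w, lr = 0.0, 0.01

        for step in range(500):
            # d/dw of 0.5 * (w*x - y)^2, summed over the data points
            grad = sum((w * x - y) * x for x, y in data)
            w -= lr * grad

        print(round(w, 2))  # ~2.04, the slope the data implies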

    Discussion

    The interview invites a profound reflection on the nature of intelligence itself. Karpathy’s framing challenges the idea that AI development is primarily an act of invention. Instead, he suggests that intelligence is an attractor state — something the universe converges toward given the right conditions: energy, computation, and feedback. This idea reframes AI not as an artificial construct but as a natural phenomenon, emerging wherever optimization processes are powerful enough.

    This perspective has deep implications. It implies that the future of AI is not dependent on individual breakthroughs or genius inventors but on the continuation of scaling trends — more data, more compute, more refinement. The question becomes not whether AI will reach human-level intelligence, but when and how we’ll integrate it into our societies.

    Karpathy’s view also bridges philosophy and engineering. By comparing machine learning to evolution, he removes the mystique from intelligence, positioning it as an emergent property of systems that self-optimize. In doing so, he challenges traditional notions of creativity, consciousness, and design — raising questions about whether human intelligence is just another instance of the same underlying principle.

    For engineers and technologists, his message is empowering: the path forward lies not in reinventing the wheel but in scaling what already works. For ethicists and policymakers, it’s a reminder that these systems are not controllable in the traditional sense — their capabilities unfold with scale, often unpredictably. And for society as a whole, it’s a call to prepare for a world where intelligence is no longer scarce but abundant, embedded in every tool and interaction.

    Karpathy’s work continues to resonate because it captures the duality of the AI moment: the awe of creation and the humility of discovery. His argument that “intelligence is what happens when you scale learning” provides both a technical roadmap and a philosophical anchor for understanding the transformations now underway.

    In short, AI isn’t just learning from us — it’s showing us what learning itself really is.