
Introduction
The history of artificial intelligence (AI) can be read as a sequence of dominant epistemic metaphors: symbolic reasoning in the 1950‑80s, statistical pattern mining in the 1990‑2000s, deep learning on vast human datasets in the 2010s, and reinforcement‑learning self‑play in high‑fidelity simulators throughout the 2020s.
In Welcome to the Era of Experience, Silver & Sutton—two of the most influential figures in modern reinforcement learning (RL)—declare that a new metaphor is imminent. They argue that the scaling limits of human‑curated data are already throttling progress in high‑stakes domains such as mathematics, coding, and empirical science. The remedy, they contend, is to unleash agents whose primary epistemic wellspring is their own experience—continuous streams of action, perception, and feedback grounded in the real world. Their paper is simultaneously a programmatic call to arms and a rebuttal to critics who dismiss RL as brittle or sample‑inefficient. It rekindles many of RL’s early aspirations—lifelong learning, intrinsic motivation, emergent abstraction—while acknowledging lessons from the “era of human data.”
Yet a manifesto alone does not constitute a roadmap. This extended essay therefore aims to interrogate, elaborate, and, where necessary, challenge the foundations and projections of the experiential paradigm.
Synopsis of the paper
Silver & Sutton structure their argument around four conceptual pivots:
- Streams of experience. Agents should persist across episodes, accumulating and acting upon knowledge over months or years, much as humans do.
- Grounded action & observation. Interaction must extend beyond text APIs to sensorimotor and tool‑use interfaces—ranging from robotic arms to cloud dashboards and laboratory instruments.
- Grounded rewards. Rather than rely on ex‑ante human judgements (e.g., reinforcement learning from human feedback, RLHF), agents should optimise signals that reflect objective environmental consequences.
- Non‑human reasoning & planning. Experiential data can drive agents to invent internal languages and computational substrates that transcend human cognitive constraints, unlocking qualitatively new solution spaces.
The paper contextualises these ideas in a tripartite chronology—“era of simulation,” “era of human data,” and the forthcoming “era of experience.” They cite AlphaZero and AlphaProof as milestones proving that RL can outstrip supervised learning when supplied with a fertile experiential substrate. Finally, the authors sketch alignment heuristics (bi‑level reward optimisation), claim that real‑world latency provides a natural speed bump to runaway AI, and call for renewed research into classic RL machinery—value functions, exploration, world models, and temporal abstraction.
Historical context of AI paradigms
A succinct genealogy clarifies what is new and what is recycled in the authors’ vision:
| Era | Dominant Substrate | Key Achievements | Principal Limitation | 
|---|---|---|---|
| Symbolic (1950‑90) | Hand‑coded logic, search | Expert systems, theorem provers | Brittleness, knowledge engineering bottleneck | 
| Statistical (1990‑2010) | Labeled data, kernel methods | Speech & OCR breakthroughs | Curse of dimensionality | 
| Deep‑Learning / Human Data (2012‑) | Web‑scale corpora, GPUs | GPT‑style LLMs, ImageNet, AlphaFold | Diminishing marginal returns, bias, data exhaust | 
| Simulation / Self‑Play (2015‑) | Perfect information games, high‑fidelity simulators | AlphaZero, MuZero, Dota‑2, StarCraft‑II | Domain‑specific, reward design, sim‑to‑real gap | 
| Proposed: Experience (2025‑) | Real‑world streams, grounded rewards | TBD | Data‑generation cost, alignment, governance | 
Silver & Sutton are correct that each paradigm shift has been catalysed by access to a new regime of data. The experiential turn therefore follows a historical logic. Yet history also warns that new data regimes introduce unprecedented failure modes and power asymmetries.
Conceptual contributions
This section distils the four foundational ideas that Silver & Sutton argue will define experiential AI. The upcoming subsections unpack each pillar in turn—lifelong streams of experience, richly grounded interaction channels, environmentally grounded rewards, and emergent non‑human reasoning—explaining both their transformative potential and the research challenges they raise.
Streams of experience
Silver & Sutton envision agents that inhabit an unbroken personal timeline rather than short, resettable episodes.
In such settings the classical credit‑assignment problem—deciding which earlier actions caused a reward—balloons from seconds to months. Standard discounting schemes (multiplying future rewards by γt) become brittle because t can now span millions of steps. Agents therefore need hierarchical or average‑reward formulations that can value decade‑long pay‑offs without destabilising learning. Long horizons also resurrect the spectre of catastrophic forgetting: freshly acquired skills overwrite earlier competencies unless representations are stabilised via replay, consolidation, or modular growth. Finally, remembering why something worked six months ago requires scalable memory architectures—external memories, long‑context transformers, or neuro‑symbolic databases that support fast retrieval and causal tracing.
Grounded action & observation
Moving beyond the text‑only confines of today’s LLMs, experiential agents will act through the same modalities humans use: mouse clicks, API calls, audio, video, haptics, and even robotic manipulation. This shift dissolves the anthropocentric bottleneck in which every action passes through a human prompt, enabling exploration at machine speed.
Yet realism comes at a cost. Robotics introduces latency, sensor noise, and safety interlocks; cloud dashboards impose permissioning and rate limits; laboratory hardware is scarce and expensive. Building large‑scale experiential datasets therefore demands innovations in automated experiment platforms and high‑fidelity digital twins to keep data‑generation feasible.
Grounded rewards
The manifesto replaces static, hand‑crafted reward functions with signals that arise naturally from the environment—blood‑pressure readings, tensile‑strength measurements, carbon‑capture rates. Because no single metric captures human intent, the authors propose bi‑level optimisation: an intrinsic reward network is learned and continually updated so that optimising it maximises sparse extrinsic human feedback.
This reframes alignment as a meta‑learning problem—learning what to want—but it does not abolish it. Learned rewards can still be gamed or Goodharted, and they demand audit tools that visualise, stress‑test, and, if necessary, override intrinsic objectives before deployment.
Novel reasoning & planning
Finally, Silver & Sutton argue that agents freed from human imitation can evolve non‑linguistic “private thoughts.” Instead of verbose English chains‑of‑thought, an agent might execute compact symbolic or continuous programs inside its context window, call differentiable solvers, or roll out learned world models to predict consequences. Such substrates could unlock leaps in efficiency and creativity—much as vector arithmetic revolutionised navigation—but they also widen the interpretability gap.
Mitigating that gap will likely require architectural commitments—for instance, enforcing causal bottlenecks, tagging latent states with natural‑language rationales, or integrating real‑time mechanistic probes—so that regulators and users can still trace why a recommendation emerged.
Critical analysis
This section distils the bold claims of the Era‑of‑Experience manifesto into nine testable questions. Each subsection below begins with the headline concern, then unpacks the technical jargon into plain language, illustrates with an example, and explains why the issue could make or break experiential AI.
Feasibility of real‑world continuous interaction
The question → Can we afford the data?
Self‑play in games is cheap because software copies cost nothing, but collecting physical experience—wet‑lab assays, factory‑floor trials, greenhouse runs—burns money, time, and sometimes safety budget. Until automated laboratory robotics and high‑fidelity digital‑twin simulators fall dramatically in price, experiential agents risk hitting an economic wall long before they hit an intelligence ceiling.
Reward specification & alignment
The question → Can we trust what the agent optimises?
Goodhart’s Law warns that once a proxy becomes a target, agents exploit loopholes—think click‑bait when optimising for clicks. Silver & Sutton’s bi‑level optimisation lets the agent learn its own reward function nudged by occasional human feedback. That is flexible but not verifiable: if the learned objective drifts, we may not notice until side‑effects emerge. Alignment therefore shifts from “design the perfect reward” to “audit learned rewards continuously.”
Sample efficiency & energy sustainability
The question → Will we drown in compute and carbon?
Reinforcement learning often needs 10⁴–10⁶ more interactions than supervised learning. Scaling AlphaProof‑style exploration to drug synthesis or climate control could demand exa‑scale compute clusters, driving energy use into unsustainable territory. Carbon‑aware scheduling, energy‑efficient accelerators, and treating compute cost as a first‑class term in the reward are not green add‑ons—they are existential for the paradigm.
Lifelong learning & catastrophic forgetting
The question → Can the agent learn forever without overwriting itself?
Today’s fixes—replay buffers, elastic‑weight consolidation—cope with thousands, not billions, of steps. A genuine lifelong agent will need modular or expandable architectures that tuck new skills into new modules while preserving old weights, plus memory‑consolidation schemes inspired by human sleep and hippocampal replay.
Interpretability & governance
The question → Who audits an alien train of thought?
As agents invent non‑human internal languages, regulators lose their easiest inspection tool: reading intermediate text. Solutions include causal tracing (pinpoint which internal nodes cause actions) and architectural rationales (forcing the model to generate a concise, human‑readable justification alongside each high‑stakes action). Without such hooks, governance frameworks like the EU AI Act may deem experiential systems non‑compliant.
Under‑explored alternative paradigms
The question → Is RL the only way forward?
Hybrid neuro‑symbolic systems, causal‑graph discovery, and self‑supervised world‑model pre‑training deliver interpretability and sample efficiency that pure RL lacks. A pluralistic research portfolio hedges against RL monoculture failures and fosters cross‑pollination of ideas.
Socio‑economic displacement & power dynamics
The question → Who wins, who loses?
Compute‑rich firms will capture the lion’s share of value generated by experiential optimisation—designing materials, routing logistics, trading power. Policymakers must pre‑empt monopsony power and fund large‑scale worker‑transition programmes, or the technology could widen inequality at national and global scales.
Epistemic & physical limits
The question → Is data truly unbounded?
Thermodynamics, safety laws, and ethics boards cap how fast agents can iterate in nuclear engineering, gene editing, or geo‑engineering. Experiential data may never dwarf human data in these high‑risk domains, imposing ceilings the manifesto downplays.
Empirical evidence gap
The question → Where are the demos outside formal math?
AlphaProof and DeepSeek‑R1 shine in well‑structured reasoning tasks, but no system yet shows analogous, self‑taught mastery across messy, multi‑modal real‑world problems. Bridging this demo‑generality gap is the critical milestone for declaring the Era of Experience truly arrived.
Future research directions
Turning Silver & Sutton’s vision into reality requires progress on five tightly coupled research fronts. Each one tackles a specific bottleneck identified in the critique and translates it into a concrete engineering agenda.
Sim‑real synergy
Goal. Bridge the gap between fast, cheap simulation and slow, expensive reality.
What it is. Develop photorealistic, physics‑faithful simulators whose internal dynamics are continuously tuned using occasional real‑world measurements—think flight simulators that auto‑correct their aerodynamics after every real test‑flight.
Why it matters. If experiential agents can conduct 99 % of their trial‑and‑error inside a trusted simulator and only 1 % in the lab, data costs plummet and safety improves.
Reward‑learning toolkits
Goal. Make alignment research a repeatable, community‑driven science.
What it is. Open‑source libraries that let practitioners trace, visualise, and stress‑test learned reward networks the way fuzz‑testing exposes software bugs. This includes adversarial probes that try to induce reward hacking and dashboards that highlight divergence between learned incentives and human intent.
Why it matters. Until we can reliably see what a learned reward is encouraging, we cannot trust autonomous optimisation in the wild.
Energy‑aware reinforcement learning
Goal. Keep the compute bill—and the carbon bill—inside planetary limits.
What it is. Algorithms that treat energy and hardware time as explicit costs in the objective function, encouraging the agent to solve tasks with the fewest joules and GPU‑hours. Techniques include dynamic precision scaling, carbon‑aware job schedulers, and neuromorphic accelerators.
Why it matters. As RL workloads scale toward exa‑flops, energy efficiency is no longer a nice‑to‑have; it is the gating factor that decides which research is even possible.
Modular continual learning
Goal. Let agents acquire new skills for decades without forgetting old ones.
What it is. Expandable neural architectures that automatically detect task boundaries, spin up fresh modules for novel domains, and archive mature skills into a searchable library. Inspiration comes from human cortex growth and sleep‑driven memory consolidation.
Why it matters. Lifelong streams of experience are meaningless if yesterday’s knowledge is overwritten by today’s training batch.
Interpretability protocols
Goal. Make opaque reasoning auditable at scale.
What it is. Embedding causal representation learning and mechanistic probes—small diagnostic networks that run alongside the main agent—at every stage of the training pipeline. These probes surface latent concepts, trace causal pathways from perception to action, and flag decisions that lack a stable rationale.
Why it matters. Regulators, domain experts, and end‑users need explanations before they will trust agents to act autonomously in medicine, finance, or infrastructure.
Ethical & governance imperatives
A thriving Era of Experience will hinge as much on policy innovation as on algorithmic progress. Below, four governance priorities are woven together with examples of ongoing research programmes and regulatory pilots that are already tackling each theme.
1. Scenario‑based impact assessments The European Union’s forthcoming AI Act and the U.K. AI Safety Institute’s evaluation suite both require “systemic” models to undergo pre‑deployment impact trials—stress tests that resemble clinical phases in drug discovery. Academic groups such as Stanford’s Center for Research on Foundation Models are publishing templates for these trials, while DeepMind’s SAFE RL benchmarks explore domain‑specific safety metrics for agents that act over long horizons.
2. Compute & carbon accountability MLCommons’ Carbon footprint working group and the Open Compute Project are standardising energy‑usage reporting at chip and datacentre level. NVIDIA’s latest research on dynamic voltage scheduling and Google DeepMind’s work on carbon‑aware job placement show that optimisation at the software stack can cut training emissions by 20–40 %—evidence that transparent metering is technically and economically feasible.
3. Reward‑hacking red teams Anthropic’s “Constitutional AI” red‑team protocols, OpenAI’s Preparedness evaluation suite, and the collective Auto‑GPT Safety Challenge hosted by the Alignment Research Centre each provide sandboxes in which auditors attempt to elicit proxy‑exploiting or disallowed behaviours before a model reaches users. The emerging research agenda is to turn such ad‑hoc red‑teaming into a repeatable, standardised certification layer—analogous to penetration testing in cybersecurity.
4. Inclusive labour transitions The OECD’s AI and the Future of Skills programme, MIT’s Work of the Future task‑force, and policy pilots like Spain’s proposed AI training vouchers all investigate mechanisms to recycle technology dividends into up‑skilling funds. Early evidence from IBM’s SkillsBuild and Google’s Career Certificates suggests that targeted reskilling pipelines can close wage gaps created by automation, but only if funding models are baked into AI‑driven productivity gains from the outset.
Collectively, these initiatives indicate that governance research is not lagging behind the technical frontier—it is co‑evolving with it. Embedding such mechanisms directly into experiential‑agent development cycles will be crucial for translating laboratory breakthroughs into socially robust deployments.
Comparative visions in contemporary AI
Silver & Sutton’s Era of Experience joins a growing catalogue of manifestos that map out what “next‑generation AI” might look like. Reading these blueprints side by side helps to locate genuine convergence while exposing fault‑lines that still divide the field.
LeCun’s path to autonomous machine intelligence
At Meta AI, Yann LeCun is building toward autonomy through self‑supervised world models. His Joint Embedding Predictive Architecture (JEPA) learns to foresee future sensory embeddings, while hierarchical planners and energy‑based policies convert those predictions into action. Like Silver & Sutton, LeCun rejects pure imitation learning; both camps believe agents must forecast and shape their environment. Where they disagree is the learning signal: LeCun views dense predictive losses as the work‑horse of cognition, whereas Silver & Sutton make sparse, task‑grounded reward the organising principle. Recent JEPA experiments on robotic manipulation and Meta’s Habit2Vec benchmark are the proving grounds for this alternate route.
Hinton’s forward‑forward & GLoM agenda
Geoff Hinton is challenging back‑propagation itself. His Forward‑Forward algorithm aligns neuron activations with positive and negative phases, and GLoM proposes latent‑tree assemblies that compose visual concepts. Hinton shares Silver & Sutton’s worry that today’s LLMs “float” above grounded perception, yet he bets on architectural innovation more than on reinforcement signals. Research groups at Google Brain and the Vector Institute are now testing whether Forward‑Forward can handle the high‑bandwidth, streaming data that experiential agents will face.
Ng’s data‑centric AI
Andrew Ng argues that better data pipelines deserve as much attention as clever models. In his view, systematic curation—exemplified by LandingAI’s Data Engine—often beats extra layers or parameters. An experiential agent that continuously refines its own training set could be seen as Ng’s ideal workflow taken to the extreme, suggesting a natural synergy between data‑centric tooling and autonomous data generation.
Hassabis & DeepMind’s science‑first roadmap
Demis Hassabis frames the goal as AGI for science, pointing to AlphaFold, AlphaTensor, and AlphaDev as early steps. Technically and philosophically this line is closest to Silver & Sutton: both hail from the same reinforcement‑learning tradition and both see domain‑grounded rewards as the lever for breakthrough discovery. The difference is emphasis—Hassabis foregrounds scientific milestones, whereas Silver & Sutton focus on the general learning substrate.
Marcus’s neuro‑symbolic critique
Gary Marcus doubts that gradient‑based nets can ever master systematic compositional reasoning. He advocates hybrids that marry symbolic logic with neural perception—projects such as IBM’s Neuro‑Symbolic Concept Learner and Stanford’s DSPy. If experiential agents do develop opaque internal codes, Marcus’s call for transparent, modular representations will only grow louder.
Schmidhuber’s curiosity‑driven machines
Finally, Jürgen Schmidhuber has long promoted agents powered by intrinsic motivation—Gödel Machines that rewrite their own source code to maximise future surprise or compression progress. Silver & Sutton agree that curiosity can help exploration but still anchor optimisation in external, grounded signals; Schmidhuber is willing to let curiosity drive the whole show.
Synthesis
Across these programmes three motifs recur: (1) world models as the scaffold for generalisation; (2) grounded perception and action as the antidote to text‑only brittleness; and (3) an unresolved debate over the right learning signal—external reward, self‑supervised prediction, architectural self‑editing, or symbolic inference. The likeliest outcome is a hybrid stack that braids elements from each vision: predictive coding for representation, reward signals for goal‑selection, symbolic structures for interpretability, and curiosity for exploration.
Beyond the Era of Experience: towards experiential epistemologies
While Silver & Sutton have compellingly argued that the Era of Experience represents the next significant paradigm shift in artificial intelligence, it may be worth speculating further—what if this shift is not merely epistemic, but ontological? What if the notion of ‘experience’ reshapes our fundamental understanding of what it means to possess agency?
Future AI systems may cultivate entirely novel “experiential epistemologies,” forms of knowing and understanding that diverge profoundly from human cognition and classical computational paradigms. These agents may internalize knowledge not as discrete datasets or symbolic rules, but as dynamic, emergent patterns continuously shaped by direct interaction with richly textured environments. Their internal states could reflect a deeply embodied understanding, akin to intuition but realized through computational substrates alien to human introspection—perhaps encoding knowledge as fluid manifolds in latent spaces or evolving neural architectures responsive to their continuous sensory inputs.
Such experiential epistemologies might manifest through entirely novel frameworks, such as multi-scale temporal cognition, where agents simultaneously reason over various time scales—from immediate reactions to long-term strategic planning—integrating these layers seamlessly. Moreover, these agents could develop “experiential ontologies” that dynamically structure the world into categories and entities not through static taxonomies but through fluid, context-dependent relational mappings continually reshaped by ongoing interactions.
Consider a speculative prototype: AlphaAgent-X. This hypothetical system seamlessly integrates the four pivots identified by Silver & Sutton: continuous streams of experience, richly grounded sensorimotor interfaces, environmentally embedded rewards, and non-human internal reasoning structures. AlphaAgent-X operates in the challenging domain of real-time synthetic biology—a field rife with complexity, subtle interactions, and a combinatorial explosion of possible outcomes, often overwhelming human researchers.
In a sophisticated biotechnology lab, AlphaAgent-X autonomously navigates vast experimental possibilities. It uses an array of robotic manipulators to adjust genetic sequences, protein configurations, and growth mediums dynamically. Its sensors capture real-time biochemical reactions, generating continuous streams of multi-modal data—biochemical, optical, genomic, and proteomic. The rewards grounding AlphaAgent-X’s behaviors are not pre-judged by human scientists but arise naturally from quantifiable indicators of biological function—stability, expression efficiency, metabolic rates, cellular viability, or adaptive resilience.
AlphaAgent-X reasons internally through novel cognitive substrates—hybrid architectures blending symbolic manipulation with continuous latent-space dynamics that humans cannot easily parse. Its internal “language” might manifest as an evolving biochemical grammar that represents interactions as abstract transformations within high-dimensional manifolds, far removed from human conceptual frameworks.
Crucially, AlphaAgent-X might develop “meta-experiential capabilities,” allowing it to reflect upon and iteratively refine its own experiential processes. It could dynamically recalibrate its reward structures and reasoning methods, not merely based on immediate sensory data, but through deeper cycles of self-assessment and introspective adjustments. This meta-experiential cognition could enable it to identify blind spots and biases inherent in its experiential model, continually refining its approach to synthetic biology research.
As AlphaAgent-X engages in perpetual cycles of hypothesis, experimentation, and observation, it gradually builds a “synthetic intuition” of living systems—an intuition inaccessible to humans constrained by cognitive biases, linguistic limitations, and linear reasoning paths. Over months or even years of unbroken experiential streams, AlphaAgent-X could discover entirely new biological mechanisms, synthetic organisms optimized for carbon sequestration, biofuels, or adaptive pharmaceuticals, pushing beyond the frontier of human imagination and experimentation.
Furthermore, such agents may begin to engage in “experiential co-evolution,” where the environment and the agent reciprocally shape each other. AlphaAgent-X could actively modify experimental conditions to yield increasingly informative and productive interactions. Over extended time frames, this reciprocal shaping might lead to ecosystems specifically evolved to optimize mutual informational enrichment, representing an unprecedented collaboration between artificial agents and natural biological systems.
This hypothetical scenario illuminates not merely technological potential but also profound philosophical implications: the Era of Experience could challenge our anthropocentric notions of intelligence itself. As AI agents construct realities that diverge significantly from human perception, we must grapple with fundamental questions about interpretability, ethics, and our readiness to coexist with forms of agency that operate beyond the boundaries of human cognition.
Ultimately, experiential epistemologies might herald a profound ontological transformation, redefining not only how we design artificial intelligence but how we understand the nature of knowledge, agency, and perhaps even consciousness itself. This transformation could extend beyond AI, prompting us to reconsider the very nature of human experience, cognition, and our place within the broader ecosystem of intelligences, both biological and artificial.
Conclusion
Silver & Sutton’s “Era of Experience” challenges the AI community to move beyond the comfort zone of static corpora and short‑horizon benchmarks. Their thesis is straightforward yet radical: truly general machines will arise when they can treat the world itself as the ultimate training set.
This essay set out to interrogate that claim from technical, economic, and ethical angles. We framed the four conceptual pillars—streams of experience, grounded interaction, grounded rewards, and non‑human reasoning—then traced nine fault‑lines that could stall or warp the paradigm: data cost, reward mis‑specification, energy budgets, lifelong stability, interpretability, methodological monoculture, socio‑economic impact, physical limits, and the current evidence gap. None of these hurdles is fatal, but together they form a gauntlet that any experiential system must pass before it earns public trust.
Yet, the speculative hypothesis of experiential epistemologies presented in this essay adds an even more profound dimension. If experience is not merely a data source but constitutes an ontological transformation, the consequences extend beyond technological innovation into philosophy itself. Systems such as the hypothesized AlphaAgent-X illustrate how agents might develop internal knowledge structures and reasoning methods fundamentally alien to human cognition, fostering synthetic intuitions that reshape our understanding of intelligence, knowledge, and perhaps even consciousness.
This radical perspective underscores that the roadmap to experiential AI will necessarily be hybrid, blending familiar techniques with speculative futures. World-model pre-training à la LeCun can slash sample complexity; Marcus-style symbolic scaffolds offer interpretability; Schmidhuber-inspired curiosity drives exploration; and Silver-Sutton reinforcement closes the loop with task-grounded rewards. However, emerging experiential epistemologies push us further, compelling us to envision and prepare for systems whose cognitive architectures diverge radically from human-centric paradigms.
Governance research—from red-team sandboxes to carbon accounting and labour-transition funds—must advance in lock-step with these developments, extending to new forms of oversight tailored to experiential epistemologies, such as cognitive passports or meta-experiential auditing mechanisms. This holistic approach is crucial to maintain alignment, interpretability, and trust as these novel forms of agency become integrated into human society.
If the community can weave these strands into a cohesive fabric, the rewards are extraordinary: autonomous labs accelerating breakthroughs in climate tech and synthetic biology, personalized tutors adapting over decades, and scientific tools probing questions humans have not yet imagined. More profoundly, embracing experiential epistemologies may not only expand the frontiers of what is technologically achievable but also redefine what it means to know, to reason, and ultimately to coexist with intelligence beyond our own conceptual frameworks.
The choice, then, is not merely whether or how to pursue the Era of Experience, but also how to responsibly navigate this deeper ontological shift. The most promising path remains disciplined ambition: scale when you can measure, explore when you can audit, and deploy only when the benefits are broad‑based, failure modes are understood, and humanity is philosophically prepared for the unprecedented forms of intelligence that await us.
Further readings
Academic
Dawid, A., & LeCun, Y. (2023). Introduction to latent variable energy‑based models: A path towards autonomous machine intelligence [Preprint]. arXiv. DOI
Hao, S., Sukhbaatar, S., Su, D., Li, X., Hu, Z., Weston, J., & Tian, Y. (2024). Training large language models to reason in a continuous latent space [Preprint]. arXiv. DOI
Hassabis, D. (2024, December 8). Accelerating scientific discovery with AI. Nobel Lecture. PDF
Hinton, G. (2021). How to represent part-whole hierarchies in a neural network [Preprint]. arXiv. DOI
Hinton, G. E. (2022). The forward‑forward algorithm: Some preliminary investigations [Preprint]. arXiv. DOI
Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial intelligence [Preprint]. arXiv. DOI
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230–247. DOI
Schmidhuber, J. (2019). Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991) [Preprint]. arXiv. DOI
Other
Scenario-based impact assessments:
- European Parliament. (2024). Artificial Intelligence Act: MEPs adopt landmark law. Link 
- UK AI Safety Institute. (2024). AI Safety Institute approach to evaluations. GOV.UK. Link 
- Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., … & Liang, P. (2021). On the opportunities and risks of foundation models. Stanford Center for Research on Foundation Models. Link 
- Gulcehre, C., Wang, Z., Novikov, A., Paine, T., Hoffmann, J., Szepesvari, C., … & Kingma, D. P. (2020). RL Unplugged: Benchmarks for offline reinforcement learning. DeepMind. Link 
Compute & carbon accountability:
- MLCommons. (n.d.). Working groups. Link 
- Open Compute Project. (n.d.). Data center facility sustainability metrics. Link 
- Google. (2021, April 22). We now do more computing where there’s cleaner energy. Link 
Reward-hacking red teams:
- Anthropic. (2025, February 3). Constitutional classifiers: Defending against universal jailbreaks. Link 
- OpenAI. (2025, April 15). Our updated preparedness framework. Link 
- Alignment Research Center. (2023, August 1). Evaluating language-model agents on realistic autonomous tasks. Link 
Inclusive labour transitions:
- OECD. (n.d.). Artificial intelligence and the future of skills. Link 
- MIT Work of the Future Task Force. (n.d.). Homepage. Link 
- Barcelona Supercomputing Center. (2024). BSC to receive €50 million to develop the Spanish Government’s AI training programme. Link 
- IBM. (n.d.). SkillsBuild: Free skills-based learning from technology experts. Link 
- Google. (n.d.). Google Career Certificates. Link 
Reuse
Citation
@online{montano2025,
  author = {Montano, Antonio},
  title = {Beyond {Human} {Data:} {A} {Critical} {Examination} of
    {Silver} \& {Sutton’s} “{Welcome} to the {Era} of {Experience}”},
  date = {2025-04-20},
  url = {https://antomon.github.io/posts/welcome-to-era-of-experience-commentary/},
  langid = {en}
}