
Silicon Valley Bets Big On ‘Environments’ To Train AI Agents

Big Tech CEOs have long touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today's consumer AI agents for a spin, whether it's Perplexity's Comet or OpenAI's ChatGPT Agent, and you'll quickly see how limited the technology still is. Making AI agents more robust may require a new set of techniques that the industry is still discovering.

One of those techniques is the reinforcement learning (RL) environment: a carefully simulated workspace where an agent can be trained on multi-step tasks. Much as labeled datasets powered the previous wave of AI, RL environments are starting to look like a critical ingredient in the development of agents.

Leading AI labs are increasingly demanding more RL environments, and there is no shortage of startups hoping to supply them, according to AI researchers, founders, and investors who spoke to TechCrunch.

A new generation of well-funded startups, such as Mechanize and Prime Intellect, has emerged to meet the demand for RL environments, each hoping to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they are investing more in RL environments to keep pace with the industry's shift from static datasets to interactive simulations. The major labs are weighing significant investments as well: The Information reports that Anthropic executives have discussed spending more than $1 billion on RL environments over the next year.

Founders and investors hope one of these startups emerges as the "Scale AI for environments," a nod to the $29 billion data-labeling powerhouse that fueled the chatbot era.

The question is whether RL environments will truly push the frontier of AI progress.

Fundamentally, RL environments are training grounds that simulate what an AI agent would do in a real software application. One founder, in a recent interview, described building them as "like designing an extremely boring video game." An environment might, for instance, simulate a web browser and task an agent with purchasing a pair of socks from an online store.

That sounds like a simple assignment, but there are plenty of ways an agent can stumble. It might get lost navigating the site's drop-down menus, or buy far too many socks. And because developers cannot predict exactly which wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.
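To make the mechanics concrete, here is a minimal sketch of what such an environment's interface could look like, in Python, following the familiar reset/step pattern. The SockShopEnv class, its action names, and its reward values are illustrative assumptions rather than any company's actual implementation; the point is only that the environment grades the agent on the outcome of a multi-step task while tolerating unexpected moves.

# Illustrative sketch only: a toy, Gym-style RL environment for a
# "buy one pair of socks" task. Names and reward values are assumptions.

class SockShopEnv:
    """Simulates a tiny storefront the agent interacts with step by step."""

    def __init__(self, max_steps: int = 20):
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.cart = 0              # pairs of socks in the cart
        self.checked_out = False
        self.steps = 0
        return self._observe()

    def step(self, action: str):
        """Apply one agent action; return (observation, reward, done, info)."""
        self.steps += 1
        reward = 0.0

        if action == "add_socks_to_cart":
            self.cart += 1
        elif action == "checkout":
            self.checked_out = True
            # Grade the outcome: exactly one pair of socks earns reward.
            reward = 1.0 if self.cart == 1 else -1.0
        else:
            # Unexpected actions (say, wandering through menus) should not
            # crash the environment; it stays robust and still returns a signal.
            reward = -0.01

        done = self.checked_out or self.steps >= self.max_steps
        return self._observe(), reward, done, {}

    def _observe(self):
        return {"cart": self.cart, "checked_out": self.checked_out}

A production environment would expose far richer observations (a rendered browser state, for example) and a much larger action space, but the grading idea is the same: the agent is scored on whether it actually completed the task.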

Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or work across multiple software applications to complete a task. Others are narrower, aimed at teaching an agent specific tasks in enterprise software applications.

While RL environments are the hot thing in Silicon Valley right now, there is plenty of precedent for the technique. One of OpenAI's first projects back in 2016 was building "RL Gyms," which were quite similar to the modern conception of environments. That same year, Google DeepMind's AlphaGo system beat a world champion at the board game Go, also using RL techniques within a simulated environment.
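For readers who haven't used that tooling, the loop those early Gym releases popularized, and that today's agent environments still build on, is a simple reset/step/reward cycle. Below is a minimal sketch using the open-source Gymnasium package (the maintained successor to OpenAI Gym), with a random policy standing in for a trained agent.

# Minimal Gym-style interaction loop using Gymnasium.
import gymnasium as gym

# Create a classic control environment and start an episode.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

done = False
while not done:
    # A real agent would pick actions from a learned policy;
    # here a random sample stands in for one.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()

Modern agent environments swap the toy physics task for a simulated app or browser, but the agent is still graded step by step through the same kind of interface.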

A crowded field

AI data-labeling firms such as Scale AI, Surge, and Mercor are building RL environments in an effort to keep up with the times. These companies have strong ties to AI labs and greater resources than many startups in the field.

Close behind Surge is Mercor, a $10 billion startup that has worked with OpenAI, Meta, and Anthropic. According to marketing documents obtained by TechCrunch, Mercor is pitching investors on its business of building RL environments for domain-specific tasks such as coding, healthcare, and law.

In an interview with TechCrunch, Mercor CEO Brendan Foody said that few understand how large the opportunity around RL environments really is.

Scale AI dominated the data-labeling market before Meta's $14 billion investment and the departure of its CEO. Since then, Scale has lost its standing as a data provider to Google and OpenAI, and it now even has to compete for Meta's data-labeling work. Even so, Scale is building environments of its own to meet the moment.

"This is just the nature of the business [Scale AI] is in," said Chetan Rane, Scale AI's head of product for agents and RL environments. He said Scale has proven its ability to adapt quickly, pointing to its early work on autonomous vehicles, the company's first business unit, and its pivot when ChatGPT arrived. Now, he said, Scale is adapting once again to new frontier areas such as agents and environments.

Some newer players are focusing on environments from the outset. One of them is Mechanize, a startup founded roughly six months ago with the audacious goal of "automating all jobs." Co-founder Matthew Barnett tells TechCrunch that his company is starting with RL environments for AI coding agents.

Barnett says Mechanize aims to supply AI labs with a small number of robust RL environments, rather than the wide range of simpler environments that larger data firms produce. The startup is offering software engineers $500,000 to build RL environments, far more than an hourly contractor at Scale AI or Surge could earn.

Other startups are betting that RL environments will matter well beyond the big AI labs. Prime Intellect, a startup backed by Menlo Ventures, Founders Fund, and AI researcher Andrej Karpathy, is aiming its RL environments at smaller developers.

Last month, Prime Intellect launched an RL environments hub that aims to be a "Hugging Face for RL environments." The idea is to give open-source developers access to the same resources as the large AI labs, and to sell those developers compute along the way.

Training generally capable agents in RL environments can be more computationally expensive than earlier AI training techniques, according to Prime Intellect researcher Will Brown. That creates an opportunity not just for companies building RL environments, but also for the GPU providers that power the process.

"RL environments will be too big for any one firm to dominate," Brown said in an interview. "We are working to create solid open-source infrastructure around them as part of our efforts. We are thinking about this more long term, but it is a nice onramp to adopting GPUs, because our service is compute."
