New Algorithm Lets AI Learn From Mistakes, Become a Little More Human

An AI That Looks Back

In recent months, researchers at OpenAI have focused on developing artificial intelligence (AI) that learns better. Thanks to the reinforcement learning methods in OpenAI Baselines, their machine learning algorithms can now train themselves, so to speak. A new algorithm goes a step further and lets their AI learn from its own mistakes, almost as human beings do.

The development comes from a new open-source algorithm called Hindsight Experience Replay (HER), which OpenAI researchers released earlier this week. As its name suggests, HER helps an AI agent “look back” in hindsight as it completes a task. Specifically, the AI reframes failures as successes, according to OpenAI’s blog.

“The key insight that HER formalizes is what humans do intuitively: Even though we have not succeeded at a specific goal, we have at least achieved a different one,” the researchers wrote. “So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”

Simply put, every failed attempt an AI makes while working toward a goal counts as a success at another, unintended “virtual” goal.

Think back to when you learned how to ride a bike. On the first couple of tries, you failed to balance properly. Even so, those attempts taught you how not to ride, and what to avoid when balancing on a bike. Every failure brought you closer to your goal, because that’s how human beings learn.

Rewarding Every Failure

With HER, OpenAI wants its AI agents to learn the same way. The method also offers an alternative to the usual reward systems in reinforcement learning. For an AI to learn on its own, it needs a reward system. In one model, the AI either reaches its goal and gets an algorithmic “cookie” or it doesn’t. Another model gives out cookies depending on how close the AI is to achieving its goal.
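The two reward schemes can be sketched in a few lines of Python. This is only an illustration, not OpenAI's code: the function names, the 0.05 tolerance, and the use of straight-line distance are all assumptions made for the example.

```python
import math


def sparse_reward(position, goal, tolerance=0.05):
    """All-or-nothing "cookie": 1.0 if the agent is within an (assumed)
    tolerance of the goal, otherwise 0.0."""
    return 1.0 if math.dist(position, goal) <= tolerance else 0.0


def shaped_reward(position, goal):
    """Distance-based reward: the closer the agent is to the goal,
    the higher (less negative) the reward."""
    return -math.dist(position, goal)


goal = (1.0, 0.0)
far_miss = (0.0, 0.0)
near_miss = (0.5, 0.0)

# The all-or-nothing scheme cannot tell the two misses apart ...
print(sparse_reward(far_miss, goal), sparse_reward(near_miss, goal))
# ... while the distance-based scheme ranks the near miss higher.
print(shaped_reward(far_miss, goal), shaped_reward(near_miss, goal))
```

In this toy version the distance-based reward is simple because the goal is a single point. For real tasks, deciding what “close” means is the hard part.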

Neither method is perfect. The first one stalls learning, because an AI either gets it or it doesn’t, and a failed attempt gives it nothing to learn from. The second one, on the other hand, can be quite tricky to implement, according to IEEE Spectrum. By treating the outcome of every attempt as a goal in hindsight, HER gives an AI agent a reward even when it fails to accomplish the specified task. This helps the AI learn faster and better.

“By doing this substitution, the reinforcement learning algorithm can obtain a learning signal since it has achieved some goal; even if it wasn’t the one that you meant to achieve originally. If you repeat this process, you will eventually learn how to achieve arbitrary goals, including the goals that you really want to achieve,” according to OpenAI’s blog.
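The substitution the researchers describe can be illustrated with a toy example. The sketch below is not OpenAI's implementation. It assumes an agent walking along a number line, a cookie-style reward of 1.0 only when the goal is reached, and the simplest form of relabeling, in which the state where the episode actually ended is substituted as the goal. All names (`Transition`, `run_episode`, `relabel_with_hindsight`) are hypothetical.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def reward_fn(achieved: int, goal: int) -> float:
    """All-or-nothing reward: 1.0 only if the goal was reached."""
    return 1.0 if achieved == goal else 0.0


def run_episode(start: int, goal: int, actions: List[int]) -> List[Transition]:
    """Apply a fixed list of moves on a number line and record each step."""
    state = start
    episode = []
    for action in actions:
        next_state = state + action
        episode.append(
            Transition(state, action, next_state, goal, reward_fn(next_state, goal))
        )
        state = next_state
    return episode


def relabel_with_hindsight(episode: List[Transition]) -> List[Transition]:
    """Pretend the state the episode ended in was the goal all along,
    and recompute every reward against that substituted goal."""
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=reward_fn(t.next_state, achieved))
        for t in episode
    ]


# The agent wanted to reach 5 but only got to 3, so every reward is 0.0.
episode = run_episode(start=0, goal=5, actions=[1, 1, 1])
# In hindsight, the same steps are a successful trip to 3.
hindsight = relabel_with_hindsight(episode)

# Both versions go into the buffer the learner would sample from.
replay_buffer = episode + hindsight
print([t.reward for t in episode])
print([t.reward for t in hindsight])
```

The original episode carries no reward at all, while the relabeled copy ends with one, which is the “learning signal” the quote refers to. A learner trained on both copies is conditioned on the goal, so what it learns about reaching 3 can carry over to the goals it was actually given.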

OpenAI has demonstrated how HER works in its Fetch simulation.

None of this means that HER makes it easy for AI agents to learn specific tasks. “Learning with HER on real robots is still hard since it still requires a significant amount of samples,” OpenAI’s Matthias Plappert told IEEE Spectrum.

In any case, as OpenAI’s simulations demonstrated, HER can be quite helpful at “encouraging” AI agents to learn even from their mistakes, much as we all do. The major difference is that AIs don’t get frustrated like the rest of us feeble folks.

The post New Algorithm Lets AI Learn From Mistakes, Become a Little More Human appeared first on Futurism.
