
Inspired by Alison Gopnik’s The Scientist in the Crib: can we train robots and AIs to play, and to become creative the way children naturally do? At Stanford today, I heard an update on some fascinating research that aims to do just that, and it may help us better understand human autism in the process.

Using adversarial reinforcement learning between two neural networks (or models), the researchers reward the exploration of novelty. The World Model maintains a representation of the world, and the Self Model is rewarded for learning something new, a reasonable semantic proxy for creativity. Over time, the robot will look around for new things to play with. And as they introduce social play, the researchers hope the agents will learn signaling cues, like the tentative motions an autonomous car might make to test whether another driver is attentive and will yield to a lane merge.
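The dynamic between the two models can be sketched in a few lines. This is a toy illustration of the idea, not the team's actual code: the "world model" here is just a visit counter whose prediction error decays with familiarity, and the "self model" is a policy that picks whichever interaction it expects to surprise the world model most.

```python
# Toy sketch (not the authors' implementation) of the world-model /
# self-model curiosity loop. States, names, and update rules are
# illustrative stand-ins for the learned networks in the real system.

class WorldModel:
    """Predicts the consequences of interactions; improves with experience."""
    def __init__(self):
        self.visits = {}  # state -> number of times experienced

    def prediction_error(self, state):
        # Error shrinks as a state becomes familiar, so novelty decays.
        return 1.0 / (1 + self.visits.get(state, 0))

    def train(self, state):
        self.visits[state] = self.visits.get(state, 0) + 1

class SelfModel:
    """Tracks the world model's error map and adversarially challenges it
    by choosing the action expected to be most surprising."""
    def choose(self, world_model, candidate_states):
        return max(candidate_states, key=world_model.prediction_error)

wm, sm = WorldModel(), SelfModel()
states = ["red_block", "blue_ball", "wall"]
history = []
for step in range(9):
    s = sm.choose(wm, states)  # seek the currently most novel interaction
    history.append(s)
    wm.train(s)                # exploring it makes it less novel next time
```

Running the loop, the agent cycles through all three objects rather than fixating on one: each interaction lowers that object's novelty, so attention moves on, which is the "what used to be novel becomes old hat" cycle the researchers describe.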

Autistic children play very differently from other children. They tend to ignore differences among objects and repeat activities for long periods, such as arranging heterogeneous objects in a line, always a line. It appears robotic. They do not show the same novelty-seeking behavior that may underlie creative exploration and play. By six months of age, an autistic child will look down more often than up, whereas other children have moved on, looking upward in their search for novel visual material.

The neural net model seems like an implementation of the memory-prediction framework for intelligence as described by Jeff Hawkins in On Intelligence. Jeff argues that the 100 billion neurons in our neocortex provide a vast amount of memory that learns a model of the world. These memory-based models continuously make low-level predictions in parallel across all of our senses. We only notice them when a prediction is incorrect. Higher in the hierarchy, we make predictions at higher levels of abstraction (the crux of intelligence, creativity and all that we consider being human), but the structures are fundamentally the same.
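The key mechanic of the memory-prediction framework, that we only notice our predictions when they fail, can be illustrated with a tiny example of my own (not from the book): each layer predicts its input stream and forwards only the mispredictions up the hierarchy.

```python
# Toy illustration (mine, not from On Intelligence) of a memory-based
# layer that predicts its sensory input and passes only *failed*
# predictions upward; correct predictions are silently absorbed.

def layer(predicted, actual):
    """Return the surprising elements a layer would forward up the hierarchy."""
    return [a for p, a in zip(predicted, actual) if p != a]

expected = list("abcabc")   # the learned model anticipates "abcabc..."
observed = list("abxabc")   # one prediction is violated
surprises = layer(expected, observed)
# Only the single failed prediction ('x') reaches higher levels;
# the five correct predictions never rise to attention.
```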

I asked if this might relate to Hinton’s new research focus on coincidence detection (the neurons that fire together wire together), and this team saw their system as the ultimate coincidence detector.

Learning to Play with Intrinsically-Motivated Self-Aware Agents
(PI Dan Yamins, pictured with the poster above, Feb 2018):
“Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a “world-model” network that learns to predict the dynamic consequences of the agent’s actions. Simultaneously, we train a separate explicit “self-model” that allows the agent to track the error map of its own world-model, and then uses the self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in complex novel physical environments.

These ideas rely on a virtuous cycle in which the agent actively self-curricularizes as it pushes the boundaries of what its world-model-prediction systems can achieve. As world-modeling capacity improves, what used to be novel becomes old hat, and the cycle starts again.

As the agent optimizes the accuracy of its world-model, a separate explicit “self-model” neural network simultaneously learns to predict the errors of the agent’s own world-model. Based on the self-model, the agent then uses an action policy that seeks to take actions that adversarially challenge the current state of its world-model. We demonstrate that this intrinsically-motivated self-aware architecture stably engages in the virtuous reinforcement learning cycle described above, spontaneously learning to understand self-generated ego-motion and to selectively pay attention to, localize, recognize, and interact with objects, without having any of these concepts built in. This learning occurs through an emergent active self-supervised process in which new capacities arise at distinct “developmental milestones” like those in human infants.”

The Catalyst program in the Stanford School of Engineering provides seed funding for new interdisciplinary work (an idea I have been socializing with the Dean for a few years now). Here is the agenda of cool projects we saw today, selected over the program’s first two years from 66 proposals.

P.S. Alison Gopnik and the Berkeley / MIT crew are working on similar avenues in game play: “Another problem with reinforcement learning is that programs can get stuck trying the same successful strategy over and over, instead of risking something new. Drs. Pathak and Agrawal have designed a program to use curiosity in mastering videogames. It has two crucial features to do just that. First, instead of just getting rewards for a higher score, it’s also rewarded for being wrong. The program tries to predict what the screen will look like shortly after it makes a new move. If the prediction is right, the program won’t make that move again—it’s the same old same old. But if the prediction is wrong, the program will make the move again, trying to get more information. The machine is always driven to try new things and explore possibilities. If artificial intelligence is really going to compete with natural intelligence, more childlike insatiable curiosity may help.” — WSJ
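The "rewarded for being wrong" mechanic in that quote reduces to a simple rule: the curiosity bonus is the error of the agent's own forward prediction. A hedged sketch, with made-up numbers standing in for predicted and actual screen features (none of this is from the WSJ piece or the underlying paper):

```python
# Illustrative sketch of a curiosity bonus: the agent is rewarded in
# proportion to how badly it predicted the next observation, so familiar
# moves earn little and surprising moves earn a lot.

def intrinsic_reward(predicted_next, actual_next):
    """Squared prediction error over observation features as a curiosity bonus."""
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))

# A well-understood move is predicted accurately -> tiny bonus.
familiar = intrinsic_reward(predicted_next=[0.9, 0.1], actual_next=[1.0, 0.0])
# A novel move defies the prediction -> large bonus, so the agent repeats
# it to gather more information, exactly the behavior described above.
novel = intrinsic_reward(predicted_next=[0.5, 0.5], actual_next=[0.0, 1.0])
assert novel > familiar  # curiosity steers toward what it cannot yet predict
```

In practice this bonus is added to (or replaces) the game score, which is what keeps such agents from getting stuck repeating one successful strategy.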

One response to “Playtime for Robots — Creativity in AI”

  1. Deep learning… recapitulating our biology, again! The paper is at arxiv.org/pdf/1802.07442.pdf (PI Dan Yamins is in the poster photo above). And to track human development and autism: head-mounted cameras and Google Glass.
