Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods.
We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These trajectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our method fits time-varying linear dynamics models to speed up learning, but does not rely on learning a global model, which can be difficult when the dynamics are complex and discontinuous. We show that this hybrid approach requires many fewer samples than model-free methods, and can handle complex, nonsmooth dynamics that can pose a challenge for model-based techniques. We present experiments showing that our method can be used to learn complex neural network policies that successfully execute simulated robotic manipulation tasks in partially observed environments with numerous contact discontinuities and underactuation.
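As a rough illustration of what fitting time-varying linear dynamics models can look like, the sketch below estimates a separate model x_{t+1} ≈ A_t x_t + B_t u_t + c_t for each time step from a batch of sampled trajectories. It uses plain ridge-regularized least squares, which is an assumption made here for brevity and is not necessarily the estimator used in the work described above. The function name `fit_time_varying_linear_dynamics`, the array layout, and the `reg` parameter are all hypothetical.

```python
import numpy as np


def fit_time_varying_linear_dynamics(states, actions, reg=1e-6):
    """Fit x_{t+1} ~= A_t x_t + B_t u_t + c_t separately for each time step.

    states:  array of shape (N, T + 1, dx), N sampled trajectories
    actions: array of shape (N, T, du)
    reg:     ridge regularization weight (an assumption of this sketch)
    Returns a list of T tuples (A_t, B_t, c_t).
    """
    n_samples, horizon, du = actions.shape
    dx = states.shape[2]
    models = []
    for t in range(horizon):
        # Regressors [x_t, u_t, 1]; the constant column gives the offset c_t.
        Z = np.hstack([states[:, t, :], actions[:, t, :], np.ones((n_samples, 1))])
        Y = states[:, t + 1, :]
        # Ridge-regularized normal equations: (Z^T Z + reg I) W = Z^T Y.
        W = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]), Z.T @ Y)
        A_t = W[:dx].T
        B_t = W[dx:dx + du].T
        c_t = W[-1]
        models.append((A_t, B_t, c_t))
    return models
```

Each fitted model is valid only near the sampled trajectories at its own time step, which is why such models are described as local and are refitted after every batch of new samples instead of being treated as a global dynamics model.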
Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world. However, current motion skill learning methods typically restrict the behavior to a compact, low-dimensional representation, limiting its expressiveness and generality. In this paper, we extend a recently developed policy search method and use it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations. Our approach learns a set of trajectories for the desired motion skill by using iteratively refitted time-varying linear models, and then unifies these trajectories into a single control policy that can generalize to new situations. To enable this method to run on a real robot, we introduce several improvements that reduce the sample count and automate parameter selection. We show that our method can acquire fast, fluent behaviors after only minutes of interaction time, and can learn robust controllers for complex tasks, including putting together a toy airplane, stacking tight-fitting LEGO blocks, placing wooden rings onto tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps onto bottles.
Due to a general shift in the manufacturing paradigm from mass production towards mass customization, reconfigurable automation technologies, such as robots, are required. However, current industrial robot solutions are notoriously difficult to program, leading to high changeover times when new products are introduced by manufacturers. In order to compete on global markets, the factories of tomorrow need complete production lines, including automation technologies, that can effortlessly be reconfigured or repurposed when the need arises. In this paper we present the concept of general, self-asserting robot skills for manufacturing. We show how a relatively small set of skills is derived from current factory worker instructions, and how these can be transferred to industrial mobile manipulators. General robot skills can not only be implemented on these robots, but also be intuitively concatenated to program the robots to perform a variety of tasks, through the use of simple task-level programming methods. We demonstrate various approaches to this, extensively tested with several people inexperienced in robotics. We validate our findings through several deployments of the complete robot system in running production facilities at an industrial partner. It follows from these experiments that the use of robot skills, together with the associated task-level programming framework, is a viable solution to introducing robots that factory workers can program intuitively and on the fly to perform new tasks.