In a paper published yesterday, titled Learning to Dress: Synthesizing Human Dressing Motion via Deep Reinforcement Learning, researchers use reinforcement learning to synthesize animated human motions for putting on clothes. The team is made up of two Ph.D. students from the Georgia Institute of Technology, two of its professors, and a researcher from Google Brain.
Dressing, whether putting on a t-shirt or a jacket, is something we do every day. Yet it is a computationally costly and complex task to simulate. The paper combines techniques from physics simulation and machine learning to produce the animation: a physics engine simulates the character and cloth motion, while deep reinforcement learning trains a neural network to produce the character's motion.
The authors of the paper introduce a salient representation of haptic information to guide the dressing process. This haptic information is then used in the reward function to provide learning signals when training the network. Because the task is too complex to perform in one go, the dressing task is separated into several subtasks for better control.
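To make the idea concrete, here is a minimal sketch of what a haptic-aware reward term could look like, assuming the cloth simulator exposes per-sensor contact forces on the character's body. The names (contact_forces, limb_progress, the weights) are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

# Hypothetical sketch of a haptic-aware reward term. It rewards forward
# progress of the limb through the garment while penalizing large aggregate
# contact (haptic) forces sensed on the body.
def dressing_reward(contact_forces, limb_progress, w_progress=1.0, w_force=0.01):
    force_magnitudes = np.linalg.norm(contact_forces, axis=-1)  # one value per sensor
    haptic_penalty = force_magnitudes.sum()
    return w_progress * limb_progress - w_force * haptic_penalty
```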
A policy sequencing algorithm is introduced to match the distribution of output states from one task to the input distribution for the next task. The same approach is used to produce character controllers for various dressing tasks like wearing a t-shirt, wearing a jacket, and robot-assisted dressing of a sleeve.
The approach taken by the authors splits the dressing task into a sequence of subtasks, and a state machine guides the transitions between them. Dressing a jacket, for example, is broken into four such subtasks.
A separate reinforcement learning problem is formulated for each subtask in order to learn a control policy.
The policy sequencing algorithm ensures that these individual control policies, when executed one after another, lead to a successful dressing sequence. It does so by matching the initial state distribution of each subtask with the final states produced by the previous subtask in the sequence. Applying the resulting control policies produces a variety of successful dressing motions.
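The following sketch illustrates that sequencing idea under simple assumptions: terminal states collected from rollouts of one subtask's trained policy seed the initial-state distribution used to train the next. The helpers env, train_policy, and rollout are hypothetical placeholders, not APIs from the paper.

```python
import random

# Illustrative sketch of policy sequencing: each subtask is trained starting
# from states the previous subtask's policy actually ends in, so the chain of
# policies fits together at execution time.
def sequence_policies(env, subtasks, train_policy, rollout, n_seed_rollouts=100):
    policies = []
    initial_states = [env.default_initial_state()]  # start of the full dressing task
    for subtask in subtasks:
        # Train this subtask's policy from the previous subtask's terminal states.
        policy = train_policy(env, subtask, initial_states)
        policies.append(policy)
        # Collect terminal states to seed training of the next subtask.
        initial_states = [rollout(env, policy, random.choice(initial_states))
                          for _ in range(n_seed_rollouts)]
    return policies
```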
Each subtask in the dressing task is formulated as a partially observable Markov decision process (POMDP). Character dynamics are simulated with the Dynamic Animation and Robotics Toolkit (DART) and cloth dynamics with NVIDIA PhysX.
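A rough Gym-style sketch of one such subtask environment is shown below. The character and cloth simulators are placeholder objects; in the paper they correspond to DART and PhysX, but the method names used here (step, contact_forces, haptic_features, and so on) are assumptions for illustration, not the real APIs of those libraries.

```python
import numpy as np

# Minimal sketch of a dressing subtask as a POMDP-style environment.
class DressingSubtaskEnv:
    def __init__(self, character_sim, cloth_sim, reward_fn, horizon=500):
        self.character_sim = character_sim   # rigid-body dynamics (DART in the paper)
        self.cloth_sim = cloth_sim           # cloth dynamics (PhysX in the paper)
        self.reward_fn = reward_fn
        self.horizon = horizon
        self.t = 0

    def reset(self, initial_state=None):
        self.t = 0
        self.character_sim.reset(initial_state)
        self.cloth_sim.reset()
        return self._observe()

    def step(self, action):
        # Advance both simulators with the chosen joint targets/torques.
        self.character_sim.step(action)
        self.cloth_sim.step(self.character_sim.body_state())
        self.t += 1
        obs = self._observe()
        reward = self.reward_fn(self.cloth_sim.contact_forces(), obs)
        done = self.t >= self.horizon
        return obs, reward, done, {}

    def _observe(self):
        # Partial observation: joint state plus a compact haptic summary,
        # rather than the full cloth state.
        return np.concatenate([self.character_sim.joint_state(),
                               self.cloth_sim.haptic_features()])
```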
Using deep reinforcement learning and physics simulation, the authors successfully created a system that learns to animate a character putting on clothing. The system learns each subtask individually and then connects them with a state machine. The authors found that carefully selecting the cloth observations and the reward functions was an important factor in the success of the approach.
This system currently handles only upper-body dressing; dressing the lower body would require incorporating balance control into the controller. The number of subtasks might also be reduced by using a control policy architecture with memory, which would allow for greater generalization of the learned skills.
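A policy with memory could, for example, look like the small recurrent network sketched below, which carries state across time steps instead of relying on separate memoryless subtask policies. The architecture and dimensions are illustrative assumptions, not something specified in the paper.

```python
import torch
import torch.nn as nn

# Sketch of a control policy with memory: an LSTM lets a single policy carry
# information across phases of the dressing motion.
class RecurrentDressingPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries memory between calls.
        features, hidden = self.lstm(obs_seq, hidden)
        actions = torch.tanh(self.action_head(features))  # bounded joint targets
        return actions, hidden
```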
You can read the research paper at the Georgia Institute of Technology website.