Best Paper Award, ICRA 2019
[pdf] [site] [ppt]
Abstract
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback, and it is non-trivial to manually design a controller that combines modalities with such different characteristics. Deep reinforcement learning (DRL) has succeeded in learning control policies from high-dimensional inputs, but its sample complexity makes these algorithms difficult to deploy on real robots. We use self-supervision to learn a compact, multimodal representation of the sensory inputs, which can then be used to improve the sample efficiency of policy learning. We evaluate our method on a peg insertion task, generalizing over different geometries, configurations, and clearances, while remaining robust to external perturbations. We present results in simulation and on a real robot.
Introduction
Fig. 1: Force sensor readings in the z-axis (height) and visual observations are shown with corresponding stages of a peg insertion task. The force reading transitions from (1) the arm moving in free space to (2) making contact with the box. While aligning the peg, the forces capture the sliding contact dynamics on the box surface (3, 4). Finally, in the insertion stage, the forces peak as the robot attempts to insert the peg at the edge of the hole (5), and decrease when the peg slides into the hole (6).
The main contributions are:
- A multimodal representation learning model from which contact-rich manipulation policies can be learned.
- A demonstration on insertion tasks that effectively uses both haptic and visual feedback for hole search, peg alignment, and insertion (see Fig. 1). An ablation study compares how each modality contributes to task performance.
- An evaluation of generalization to tasks with different peg geometries, and of robustness to perturbations and sensor noise.
Multimodal Representation Model
Fig. 2: Neural network architecture for multimodal representation learning with self-supervision. The network takes data from three different sensors as input: RGB images, F/T readings over a 32ms window, and end-effector position and velocity. It encodes and fuses this data into a multimodal representation based on which controllers for contact-rich manipulation can be learned. This representation learning network is trained end-to-end through self-supervision.
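To make the fusion scheme in the caption concrete, below is a minimal PyTorch sketch of a multimodal encoder that maps the three sensor streams into a single latent vector. The layer sizes, the 128-dimensional latent, and the module names are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Sketch of a sensor-fusion encoder: an RGB image, a window of
    force/torque (F/T) readings, and proprioception are encoded
    separately and fused into one latent vector.
    Layer sizes are illustrative assumptions, not the paper's."""

    def __init__(self, latent_dim=128):
        super().__init__()
        # Image encoder: 128x128 RGB -> feature vector (assumed CNN).
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # F/T encoder: 32-step window of 6-axis wrench readings.
        self.force_enc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Proprioception encoder: end-effector position and velocity
        # (6-D input assumed here).
        self.proprio_enc = nn.Sequential(
            nn.Linear(6, 32), nn.ReLU(),
            nn.Linear(32, latent_dim),
        )
        # Fusion: concatenate and project to the shared representation.
        self.fusion = nn.Linear(3 * latent_dim, latent_dim)

    def forward(self, image, force_window, proprio):
        z = torch.cat([
            self.image_enc(image),
            self.force_enc(force_window),
            self.proprio_enc(proprio),
        ], dim=-1)
        return self.fusion(z)


# Example usage with dummy batches of size 4.
enc = MultimodalEncoder()
z = enc(torch.randn(4, 3, 128, 128),   # RGB images
        torch.randn(4, 32, 6),          # F/T window
        torch.randn(4, 6))              # end-effector pos/vel
print(z.shape)  # torch.Size([4, 128])
```

A downstream policy network would then take this fused latent as its observation, which is how the representation can improve sample efficiency relative to learning directly from raw pixels and force signals.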
Policy Learning and Controller Design
Fig. 3: Our controller takes end-effector position displacements from the policy at 20Hz and outputs robot torque commands at 200Hz. The trajectory generator interpolates high-bandwidth robot trajectories from low-bandwidth policy actions. The impedance PD controller tracks the interpolated trajectory. The operational space controller uses the robot dynamics model to transform Cartesian-space accelerations into commanded joint torques. The resulting controller is compliant and reactive.
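A rough sketch of this control cascade is given below, assuming linear interpolation for the trajectory generator and the standard operational-space mapping tau = J^T (Lambda a_des). The gains, the 3-D task space, and the robot interface (ee_position, ee_velocity, jacobian, task_space_inertia, send_torques) are placeholders for illustration, not the authors' implementation.

```python
import numpy as np

POLICY_HZ, CONTROL_HZ = 20, 200            # rates from Fig. 3
STEPS_PER_ACTION = CONTROL_HZ // POLICY_HZ
KP, KD = 50.0, 2.0 * np.sqrt(50.0)         # assumed impedance PD gains

def run_action(robot, policy_delta_x):
    """Track one low-bandwidth policy action (an end-effector
    displacement) with a high-bandwidth impedance controller."""
    x_start = robot.ee_position()          # hypothetical robot accessor
    x_goal = x_start + policy_delta_x
    for i in range(1, STEPS_PER_ACTION + 1):
        # Trajectory generator: linear interpolation between poses.
        x_des = x_start + (i / STEPS_PER_ACTION) * (x_goal - x_start)
        # Impedance PD law in Cartesian space -> desired acceleration.
        x, xdot = robot.ee_position(), robot.ee_velocity()
        a_des = KP * (x_des - x) - KD * xdot
        # Operational-space control: map the Cartesian acceleration to
        # joint torques using the robot dynamics model.
        J = robot.jacobian()               # 3 x n task Jacobian
        Lambda = robot.task_space_inertia()  # 3 x 3 task-space inertia
        tau = J.T @ (Lambda @ a_des)
        robot.send_torques(tau)
```

The split of rates is the key design choice: the learned policy only needs to reason at 20Hz over small displacements, while compliance and reactivity come from the 200Hz torque-level loop.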
Experiments: Design and Setup
Fig. 4: Simulated Peg Insertion: Ablative study of representations trained on different combinations of sensory modalities. We compare our full model, trained with a combination of visual and haptic feedback and proprioception, with baselines that are trained without vision, or haptics, or either. (b) The graph shows partial task completion rates with different feedback modalities, and we note that both the visual and haptic modalities play an integral role for contact-rich tasks.
Reward Design
Experiments: Results
Real Robot Experiments
Fig. 5: (a) 3D printed pegs used in the real robot experiments and their box clearances. (b) Qualitative predictions: We visualize examples of optical flow predictions from our representation model (using color scheme in [22]). The model predicts different flow maps on the same image conditioned on different next actions indicated by projected arrows.
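The action-conditioned flow prediction in Fig. 5(b) can be illustrated with a small decoder head that takes the fused latent together with a candidate next action and outputs a 2-channel flow map. The shapes, layer sizes, and 3-D action dimensionality below are assumptions for the sketch, not the paper's model.

```python
import torch
import torch.nn as nn

class FlowPredictionHead(nn.Module):
    """Illustrative action-conditioned optical-flow decoder: given the
    fused multimodal latent and a candidate next action, predict a
    2-channel (dx, dy) flow map. Shapes and layers are assumptions."""

    def __init__(self, latent_dim=128, action_dim=3, flow_size=32):
        super().__init__()
        self.flow_size = flow_size
        self.decode = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * flow_size * flow_size),
        )

    def forward(self, z, action):
        flow = self.decode(torch.cat([z, action], dim=-1))
        return flow.view(-1, 2, self.flow_size, self.flow_size)


# Same latent, two different candidate actions -> two different flow
# predictions, mirroring the qualitative comparison in Fig. 5(b).
head = FlowPredictionHead()
z = torch.randn(1, 128)
flow_a = head(z, torch.tensor([[0.01, 0.0, 0.0]]))
flow_b = head(z, torch.tensor([[0.0, 0.01, 0.0]]))
```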
Fig. 6: Real Robot Peg Insertion: We evaluate our Full Model on the real hardware with different peg shapes, indicated on the x-axis. The learned policies achieve the tasks with a high success rate. We also study transferring the policies and representations from trained pegs to novel peg shapes (last four bars). The robot effectively re-uses previously trained models to solve new tasks.