Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

Mantis is a versatile vision-language-action model that empowers robots to perform complex manipulation tasks through innovative disentangled visual foresight, progressive training, and adaptive temporal integration mechanisms. The key features of Mantis i