Mantis is a versatile vision-language-action model that empowers robots to perform complex manipulation tasks through innovative disentangled visual foresight, progressive training, and adaptive temporal integration mechanisms. The key features of Mantis i