Enhancing Embodied Lifelong Learning in Minecraft with Guanaco Models: The Modified Voyager Project

Introduction:

This report presents a modified version of Voyager, an embodied lifelong learning agent in Minecraft, that uses Guanaco, a family of LLMs that can be run locally. Voyager is a significant advance in artificial intelligence because it demonstrates autonomous exploration, skill acquisition, and novel discoveries without human intervention. By integrating Guanaco models, we aim to improve Voyager's performance and efficiency and, in turn, its lifelong learning capabilities.

Function of Voyager:

Voyager comprises three fundamental components that enable its embodied lifelong learning abilities. Firstly, it employs an automatic curriculum that maximizes exploration, allowing the agent to discover and interact with various aspects of the Minecraft world. This exploration serves as the foundation for skill acquisition and knowledge accumulation.

Secondly, Voyager maintains an ever-growing skill library, which consists of executable code that encapsulates complex behaviors. These skills are stored and retrieved as needed, enabling the agent to perform a wide range of tasks within the Minecraft environment. The skill library plays a crucial role in facilitating the agent’s ability to rapidly compound its abilities and alleviate catastrophic forgetting, as it allows for the retention and reuse of previously acquired knowledge.
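The store-and-retrieve behavior of the skill library can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, and a bag-of-words cosine similarity stands in for whatever retrieval method the real system uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def _bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Skill:
    name: str
    description: str
    code: str  # source text of the executable skill, stored verbatim


@dataclass
class SkillLibrary:
    """Hypothetical skill store: add skills, then look them up by task text."""

    skills: dict = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        """Store a skill; re-adding an existing name overwrites the old version."""
        self.skills[skill.name] = skill

    def retrieve(self, task: str, k: int = 2) -> list:
        """Return up to k skills whose descriptions best match the task."""
        query = _bag_of_words(task)
        scored = [(_cosine(query, _bag_of_words(s.description)), s)
                  for s in self.skills.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [skill for _, skill in scored[:k]]


library = SkillLibrary()
library.add(Skill("mineWood", "Mine wood logs from a nearby tree", "# placeholder source"))
library.add(Skill("craftPlanks", "Craft wooden planks from wood logs", "# placeholder source"))
library.add(Skill("fightZombie", "Fight a zombie with a sword", "# placeholder source"))
best = library.retrieve("craft a wooden pickaxe from planks", k=1)
print(best[0].name)
```

Because skills are kept as code rather than as model weights, adding a new skill does not overwrite an old one, which is one way to read the claim about alleviating catastrophic forgetting.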

Lastly, Voyager incorporates an iterative prompting mechanism that leverages environment feedback, execution errors, and self-verification to improve its programming and decision-making processes. This mechanism enables the agent to iteratively refine its skills, adapt to new challenges, and optimize its performance over time. By incorporating these components, Voyager demonstrates exceptional proficiency in playing Minecraft, achieving superior results compared to prior state-of-the-art approaches.
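The refinement loop described above might look roughly like the following sketch. It is not Voyager's implementation: the function name, the prompt wording, and the use of Python's exec as a stand-in for acting in the game are all assumptions, and the verify callable is a simplified placeholder for self-verification. Environment feedback is reduced here to a single status message.

```python
import traceback
from typing import Callable, Optional


def refine_until_verified(
    task: str,
    llm: Callable[[str], str],
    verify: Callable[[dict], bool],
    max_rounds: int = 4,
) -> Optional[str]:
    """Ask `llm` for code, run it, and feed failures back until `verify` passes.

    Returns the first program that both executes and passes verification,
    or None if no attempt succeeds within `max_rounds`.
    """
    feedback = "none yet"
    for _ in range(max_rounds):
        prompt = f"Task: {task}\nFeedback from the last attempt: {feedback}\nCode:"
        code = llm(prompt)
        namespace: dict = {}
        try:
            # Hypothetical stand-in for running the skill in the environment.
            # Executing model-written code like this is unsafe outside a sandbox.
            exec(code, namespace)
        except Exception:
            # Execution error: pass the final traceback line back to the model.
            feedback = traceback.format_exc().strip().splitlines()[-1]
            continue
        if verify(namespace):
            return code
        feedback = "code ran but the task was not completed"
    return None


# Usage with a scripted fake model: one crash, one wrong answer, one success.
scripted = iter(["result = 1 / 0", "result = 41", "result = 42"])
final = refine_until_verified(
    "set result to 42",
    llm=lambda prompt: next(scripted),
    verify=lambda ns: ns.get("result") == 42,
)
print(final)
```

Passing the model in as a plain callable keeps the loop independent of which LLM sits behind it, so a locally hosted model could be substituted without touching the loop itself.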

Role of Guanaco Models in Enhancing Voyager:

Introducing Guanaco models into the modified version of Voyager brings several advantages to the overall system. Guanaco models are LLMs obtained through 4-bit QLoRA fine-tuning on the OASST1 dataset, and they can be run on local hardware. These models have demonstrated competitive performance with commercial chatbot systems on the Vicuna and OpenAssistant benchmarks, as evaluated by both human and GPT-4 raters.
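To see why 4-bit weights matter for running a model locally, a rough back-of-the-envelope calculation helps. The sketch below counts only the memory for the weights themselves; it ignores quantization metadata, activations, and other runtime overhead, and the parameter counts are illustrative round numbers rather than figures taken from this report.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative model sizes, in billions of parameters (assumed values).
for billions in (7, 13, 33, 65):
    n = billions * 1e9
    print(
        f"{billions}B params: "
        f"16-bit {weight_memory_gb(n, 16):.1f} GB, "
        f"4-bit {weight_memory_gb(n, 4):.1f} GB"
    )
```

Under these assumptions, storing weights in 4 bits takes a quarter of the memory of 16-bit storage, which is what brings larger models within reach of a single local GPU.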

With Guanaco models, Voyager can draw on the capabilities of high-quality chatbot systems while running locally and at low cost. This makes experimentation cheap, enabling researchers to conduct in-depth investigations and customize the chatbot system to suit specific research requirements.

Moreover, Guanaco models offer a replicable and efficient training procedure that can be extended to new use cases. The training scripts for Guanaco models are available in the QLoRA repository, providing a foundation for researchers to build upon and adapt the models for different applications. This scalability and extensibility contribute to the versatility of Voyager and facilitate its integration into various research domains.

Additionally, Guanaco models have been rigorously compared to 16-bit methods, including both full-finetuning and LoRA approaches, demonstrating the effectiveness of 4-bit QLoRA finetuning. This comparison highlights the efficiency and competitiveness of Guanaco models in terms of their performance and resource requirements. The lightweight checkpoints of Guanaco models, which only contain adapter weights, further contribute to the efficiency and practicality of integrating these models into Voyager.
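The reason adapter-only checkpoints are lightweight follows from how LoRA is parameterized: a weight matrix of shape d_out x d_in is left frozen, and only two low-rank factors are trained and saved. The sketch below counts parameters for a single matrix; the dimensions and rank are hypothetical values chosen for illustration, not settings reported here.

```python
def full_matrix_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix of shape (d_out x d_in)."""
    return d_in * d_out


def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 64
full = full_matrix_params(d_in, d_out)
adapter = lora_adapter_params(d_in, d_out, rank)
print(f"full matrix: {full:,} params")
print(f"adapter:     {adapter:,} params ({adapter / full:.2%} of the full matrix)")
```

Since only the adapter factors need to be stored, a checkpoint can be shared separately from the base model and applied on top of it at load time.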

[Figure not reproduced. In the original graph, GPT-4 should be replaced by Guanaco.]

Conclusion:

The integration of Guanaco models into the modified version of Voyager represents a significant advancement in the field of embodied lifelong learning agents. With Guanaco models, Voyager benefits from the competitive performance of high-quality chatbot systems while enabling cost-effective, local experimentation. The enhanced capabilities of Voyager, coupled with the efficiency and scalability of Guanaco models, open up new avenues for research and development in lifelong learning agents and their applications.