CUPCAKEAGI 🧁🍰🤖🧠🍩🍪
Hey there cupcake lovers🧁❤️! I am excited to introduce you to my latest project, CupcakeAGI!
You can find the documentation here: https://akshitireddy.github.io/CUPCAKEAGI/
Features
- Access to internet
- 🐶 Upload Images
- 🎵 Upload Audio
- 📹 Upload Video
- 💾 Persistent Memory
- ❤️ Emotions
- 💭 Random Thoughts
- 😴 Dreams
- 🛠️ Pre-defined Abilities
- 🧱 Modular approach for adding new Abilities
- Assign & schedule Tasks
- Asynchronous Task Processing
- 🗣️ Talk while Tasks are being processed in Background
- 🧑‍💻 Create & Run Python Code
- 🧠 GPT-3.5 as the brain
✨ Demo
demo.mp4
🚨 Requirements
Open up a terminal and go to backend/Multi-Sensory Virtual AAGI (you need to have conda installed):
conda env create -f environment.yml
Then go to frontend/assistant (you need to have node installed):
npm install next
How to use
Enter your API keys in the .env file; you'll need an OPENAI API key and a SERPER API key (a sample is shown below).
Open up a terminal and go to backend/Multi-Sensory Virtual AAGI:
conda activate aagi
uvicorn inference:app
Open up another terminal and go to frontend/assistant:
npm run dev
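For reference, a minimal .env might look like the following; the exact variable names are an assumption based on the keys mentioned above, so check the project's code if they differ:

```
OPENAI_API_KEY=your-openai-api-key
SERPER_API_KEY=your-serper-api-key
```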
✨ About
CupcakeAGI is an agent that aims to mimic human-like behavior and cognitive abilities to assist users in performing various tasks. It's equipped with some sweet🍬 features, including the ability to dream😴, have random thoughts, and perform mental simulations of how to complete a task. Just as we humans have thoughts floating around in our heads, CupcakeAGI has a thought bubble💭 filled with abstract words.
To make CupcakeAGI more expressive, I've added emotion parameters. This will allow it to interact with users in a more personal way❤️.
One of CupcakeAGI's most impressive features is its ability to accept various forms of sensory data, such as images🐶, videos📹, and audio🎵. Although I haven't implemented smell, touch, and taste yet, the approach should be similar to what I did for image, video, and audio: you need a function that converts the sensory data to text, which then gets added as a description for the file and is used while prompting the model.
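As a rough illustration, here is a minimal sketch of how a new sense could be plugged in. The function names and the file-description store are hypothetical stand-ins, not the project's actual code:

```python
import json

def describe_smell(sensor_reading: dict) -> str:
    # Hypothetical stand-in for a model that maps chemical sensor data to text.
    strongest = max(sensor_reading, key=sensor_reading.get)
    return f"A smell dominated by {strongest}."

def add_file_description(path: str, description: str,
                         store: str = "file_descriptions.json"):
    # Save the description so later prompts can reference the file by name.
    try:
        with open(store) as f:
            descriptions = json.load(f)
    except FileNotFoundError:
        descriptions = {}
    descriptions[path] = description
    with open(store, "w") as f:
        json.dump(descriptions, f, indent=2)

add_file_description("samples/kitchen.smell",
                     describe_smell({"vanilla": 0.8, "cinnamon": 0.5}))
```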
CupcakeAGI provides two main features for user interaction: talk and task. The talk feature allows for immediate responses to user queries using tools like search engines, calculators, and translators, making it a real-time problem solver. And who doesn't love a good problem solver🧠, especially when it comes to baking cupcakes🧁?
The task feature is used for completing tasks at a start time or by a deadline. Both the Talk & Task features allow chaining multiple tools together using a natural language task function that converts the output of one tool into the input of another, making different tools compatible with each other. So, whether you need to bake some cupcakes for a birthday party or a cupcake contest, CupcakeAGI is here to help you out!
Some abilities, like search, calculator, and Wikipedia search, are predefined. Each ability is a Python function that the agent uses by creating a Python script that imports the function, running the final script, and saving the output to a text file it can access. Abilities can be added or modified in a modular fashion: drop a Python script into the ability functions directory, then mention its name, description, and directions for use in abilities.json in the state_of_mind directory, and just like that the agent has a new ability. The agent can chain these abilities to do more complex tasks, using the natural_task_function to ensure compatibility between them.
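To make that concrete, here is a hedged sketch of what adding an ability might look like. The file layout and JSON fields are assumptions inferred from the description above, not the project's verbatim schema:

```python
# ability_functions/word_count.py (hypothetical new ability)
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())
```

And the matching entry in abilities.json might resemble:

```json
{
  "name": "word_count",
  "description": "Counts the number of words in a piece of text.",
  "directions": "Import word_count from ability_functions, call it on a string, and write the result to a text file."
}
```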
Overall, I hope you find CupcakeAGI to be a sweet addition to your life. This project was a lot of fun to create, and I'm excited to see where it goes. Thanks for reading, and happy baking!✨
✨ Why?
- Our brains process and integrate sensory inputs such as sight, sound, smell, taste, and touch to form a coherent perception of the world around us. Similarly, in the realm of artificial intelligence, the ability to process and integrate multisensory data is crucial for building intelligent agents that can interact with humans in a more natural and effective way.
- In recent years, large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated remarkable abilities in generating human-like text based on vast amounts of training data. However, these models are typically limited to working with text and image data and lack the ability to process other types of sensory inputs.
- Beyond the ability to process multisensory data, the LLM agent also exhibits several cognitive abilities that are typically associated with humans. For instance, the agent is equipped with the ability to dream and have random thoughts, which are thought to play important roles in human creativity, memory consolidation, and problem-solving. By incorporating these features into the LLM agent, we aim to create an agent that can assist users in performing tasks in a more natural and effective way and make these agents more human-like.
✨ Multisensory Data
- 🧁 Welcome back to the world of cupcakes and baking! We all know that human experience is much more than just text-based interactions. It's not just about reading, but also about experiencing the world with all our senses, including sight, sound, smell, taste, and touch. Similarly, an LLM agent that can work with multisensory data can open up a new world of possibilities for machine learning.
- Instead of missing out on the rich and varied data available through other sensory modalities, we can use neural network architectures that convert various forms of sensory data into text data that the LLM can work with.
- For instance, we can use image captioning models like vit-gpt2 and blip to convert images into text data, which the LLM agent can then process. Similarly, for audio data, audio-to-text models like OpenAI's Whisper can be used to convert audio signals into text data (see the first sketch at the end of this section). 📷🎤
- Now, I know what you're thinking: what about videos 🎥, smell, taste, and touch? Don't worry, we've got you covered! To save computation, we can sample one frame per second of video data and use image captioning models to convert each frame into text (the second sketch at the end of this section shows this sampling). The audio track from the video can be separated and transcribed using audio-to-text models, providing the LLM agent with both visual and auditory data.
- As for smell, taste, and touch, we can use electronic noses and tongues to capture different types of chemical and taste data and convert them into text data that the LLM can process. Haptic sensors can capture pressure, temperature, and other physical sensations, which can likewise be converted into text, for instance with a neural network.
- Remember, these models should be used as modular components that can be easily swapped out as new models emerge. Think of them as Lego blocks or React components that we can assemble into a more comprehensive system.
- So, let's get baking with CupcakeAGI and incorporate multisensory data into an LLM agent to create a more natural and effective human-machine interaction. With the availability of different sensory data, the LLM agent can process and understand various types of data, leading to a more human-like agent that can assist us in different tasks. 🧁💻
✨ Human-Like Behavior and Persistent Memory
🧁 Welcome to CupcakeAGI, where we bake up some sweet and creamy AI goodness! 🍰🤖
Here are some of the key features of our LLM agent that make it more human-like and effective:
- 🧠 Human-like behavior: Our LLM agent is equipped with several features that mimic human behavior, including the ability to dream, have random thoughts, and perform mental simulations of how to complete a task. These features allow the agent to better understand and respond to user queries.
- 🤖 Persistent memory: Our LLM agent has a state of mind where all files relating to its personality, emotions, thoughts, conversations, and tasks are stored. Even when the agent isn't running, all relevant information remains stored in this location, allowing the agent to provide a more personalized and effective experience.
- Emotion parameters: We use emotion parameters such as happiness, sadness, anger, fear, curiosity, and creativity to make the LLM agent more expressive and better attuned to the user's needs and preferences.
- 💭 Thought bubble: Our LLM agent also has a thought bubble, which is essentially a list of lists where each inner list corresponds to a topic (see the sketch after this list). This allows the agent to more effectively process and integrate its thoughts with the user's queries and tasks.
- 🗣️ Conversation storage: The LLM agent stores the conversation it has had so far and the list of tasks it needs to perform. It breaks the conversation into chunks and summarizes them, which allows the agent to maintain a coherent and relevant conversation with the user.
With these features, our LLM agent is better equipped to assist users in performing tasks in a natural and effective way. We hope you enjoy our sweet and creamy AI goodness! 🧁🍰🤖
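As a loose illustration of what that persistent state might look like (the real file names and fields live in the state_of_mind directory and may differ; everything below is assumed):

```python
import json
from pathlib import Path

STATE_DIR = Path("state_of_mind")  # hypothetical layout

state = {
    # Emotion parameters on a 0-1 scale (names taken from the list above).
    "emotions": {"happiness": 0.7, "sadness": 0.1, "anger": 0.0,
                 "fear": 0.1, "curiosity": 0.9, "creativity": 0.8},
    # Thought bubble: a list of lists, one inner list per topic.
    "thought_bubble": [["cupcakes", "frosting", "ovens"],
                       ["schedules", "deadlines"]],
    # Summarized conversation chunks kept for coherence.
    "conversation_summaries": ["User asked for a vanilla cupcake recipe."],
}

STATE_DIR.mkdir(exist_ok=True)
(STATE_DIR / "state.json").write_text(json.dumps(state, indent=2))
```

Because the state lives on disk rather than in memory, it survives restarts, which is what gives the agent its persistence.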
✨ Talk & Task
🧁 Welcome to CupcakeAGI! Here are some sweet deets about the LLM agent that will make your tasks a cakewalk:
- 🗣️ Talk and Task modes make it easy for users to communicate with the LLM and get things done seamlessly.
- The LLM converts files like images, videos, and audio to text, making them easy to store and retrieve.
- With access to various tools like search engines, wikis, and translators, the LLM can provide users with the necessary information for their queries.
- 🧰 Natural language task functions allow users to chain together different tools, making them compatible with each other (see the sketch after this list).
- 🕰️ The Task mode is particularly useful for lengthy tasks and can be set to start at a specific time, allowing users to focus on other things while the LLM takes care of the task.
- 💭 The LLM experiences random thoughts and dreams, just like humans, making it more relatable and human-like.
- 🧑‍💻 The LLM can even use Python packages like Hugging Face models to complete tasks, making it a highly versatile agent. So go ahead and give CupcakeAGI a try! With its modular approach, you can easily add new tools and features as needed. Who knew cupcakes and AI could go so well together? 🧁🤖
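Here is the chaining sketch referenced above. The signature and prompt are illustrative only, with the model call abstracted behind a generic llm callable rather than the project's actual implementation:

```python
def natural_task_function(llm, previous_output: str, next_tool: str,
                          input_format: str) -> str:
    # Ask the LLM to reshape one tool's output into the next tool's input.
    prompt = (
        f"Convert the following output into valid input for the tool "
        f"'{next_tool}', which expects: {input_format}\n\n"
        f"Output to convert:\n{previous_output}"
    )
    return llm(prompt)

# Hypothetical usage: feed a search result into a calculator-style tool.
# result = search("price of 12 cupcakes")
# expression = natural_task_function(llm, result, "calculator",
#                                    "a plain arithmetic expression")
```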
✨ Limitations
Welcome to CupcakeAGI! 🧁🍰🍩🍪
Let's talk about some important things you need to know about this sweet project:
- Complex tasks: While CupcakeAGI is designed to be as human-like as possible, it may not be able to solve complex tasks that require significant back and forth, such as negotiating with multiple parties to reach a solution. CupcakeAGI is intended to assist individuals on a personal level, not to solve highly intricate problems. Don't worry, though, CupcakeAGI is still your go-to for all your cupcake baking needs! 🧁👩‍🍳
- Accuracy of sensory data conversion: The effectiveness of CupcakeAGI relies heavily on the accuracy of the neural network models used to convert sensory data into text. If these models are inaccurate, CupcakeAGI may misunderstand the user's input, leading to incorrect or ineffective responses. But don't fret, we're constantly working on improving CupcakeAGI's accuracy to ensure you get the best experience possible! 🤖
- Ethics and privacy: CupcakeAGI has the potential to collect and process a large amount of personal data from its users, so there is a risk that sensitive data may be compromised, raising privacy concerns. CupcakeAGI will do its best to keep your cupcake secrets safe! 🤫
Thanks for checking out CupcakeAGI, and remember, with CupcakeAGI by your side, you'll always have the perfect cupcake recipe! 🧁💻
✨ Conclusion
Welcome to the conclusion of our multisensory LLM agent project! 🧁🤖🧠
Here are the key takeaways from our project: 🤪🧁
- Our LLM agent is like a cupcake, made with many different ingredients: it can work with multisensory data, dream, have random thoughts, and show emotions 🧁💭
- By incorporating multisensory data, our agent can understand different types of information, just like a baker uses different ingredients to make a delicious cupcake 🍰
- With its cognitive abilities and persistent memory, our agent can assist users in a more human-like way, just like a friendly baker who helps you choose the perfect cupcake flavor 🤖🧁
- This project represents a small but important step towards building more natural and effective AI assistants, just like a small cupcake can bring a smile to someone's face and brighten their day 🧁
- We hope our project has inspired you to think about the possibilities of multisensory LLM agents and how they can improve human-machine interaction. Thank you for taking the time to check out our project - it was made with lots of love and cupcakes! ❤️🧁