  • Stars: 128
  • Rank: 279,655 (Top 6%)
  • Language: Python
  • License: The Unlicense
  • Created: over 1 year ago
  • Updated: over 1 year ago


Repository Details

๐Ÿง๐Ÿ‘‹ Welcome to CupcakeAGI, where we bake up some sweet and creamy AGI goodness! ๐Ÿฐ๐Ÿค–

CUPCAKEAGI 🧁🍰🎉🤖🧠🍩🍪

Hey there cupcake lovers 🧁❤️! I am excited to introduce you to my latest project, CupcakeAGI!

You can find the documentation here: https://akshitireddy.github.io/CUPCAKEAGI/

🚀 Features

  • ๐ŸŒ Access to internet
  • ๐Ÿถ Upload Images
  • ๐ŸŽต Upload Audio
  • ๐Ÿ“น Upload Video
  • ๐Ÿ’พ Persistent Memory
  • โค๏ธ Emotions
  • ๐Ÿ’ญ Random Thoughts
  • ๐Ÿ˜ด Dreams
  • ๐Ÿ› ๏ธ Pre-defined Abilities
  • ๐Ÿงฑ Modular approach for adding new Abilities
  • ๐Ÿ“ Assign & schedule Tasks
  • ๐Ÿ“ Asynchronous Task Processing
  • ๐Ÿ—ฃ๏ธ Talk while Tasks are being processed in Background
  • ๐Ÿง‘โ€๐Ÿ’ป Create & Run Python Code
  • ๐Ÿง  GPT-3.5 as the brain

✨ Demo

demo.mp4

🚨 Requirements

Open up a terminal and go to backend/Multi-Sensory Virtual AAGI (you need to have conda installed), then create the backend environment:

conda env create -f environment.yml

Next, go to frontend/assistant and install the frontend dependencies:

npm install next

🔌 How to use

Open up a terminal and go to backend/Multi-Sensory Virtual AAGI

conda activate aagi
uvicorn inference:app

Open up another terminal and go to frontend/assistant (you need to have node installed)

npm run dev

Enter your API keys in the .env file. You'll need an OpenAI API key and a Serper API key.
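For reference, the .env entries might look like this (the exact variable names are an assumption, not taken from the project; check the repository for the actual template):

# Hypothetical .env contents; the variable names are assumed
OPENAI_API_KEY=sk-your-openai-key
SERPER_API_KEY=your-serper-key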

✨ About

CupcakeAGI is an agent that aims to mimic human-like behavior and cognitive abilities to assist users in performing various tasks. It's equipped with some sweet 🍬 features, including the ability to dream 😴, have random thoughts, and perform mental simulations of how to complete a task. Just like how we humans have thoughts floating around our heads, CupcakeAGI has a thought bubble 💭 with abstract words.

To make CupcakeAGI more expressive, I've added emotion parameters. This will allow it to interact with users in a more personal way ❤️.

One of CupcakeAGI's most impressive features is its ability to accept various forms of sensory data, such as images 🐶, videos 📹, and audio 🎵. Although I haven't implemented smell 👃, touch ✋, and taste 👅 yet, the approach should be similar to what I did for image, video, and audio: you'll need a function that converts the sensory data to text, which then gets added as a description for the file and is used when prompting the model.
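As a rough illustration of such a converter, here is a minimal sketch for images (the describe_image helper and the .txt file-naming convention are hypothetical; the captioning model is one of those discussed later in this README):

# Minimal sketch: turn an uploaded image into a text description
# that can be stored alongside the file and injected into prompts.
# The helper name and the .txt convention are assumptions.
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def describe_image(image_path: str) -> str:
    # The pipeline returns a list of dicts with a "generated_text" field.
    return captioner(image_path)[0]["generated_text"]

# Save the description as the file's text stand-in.
description = describe_image("uploads/cupcake.png")
with open("uploads/cupcake.png.txt", "w") as f:
    f.write(description)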

CupcakeAGI provides two main features for user interaction: talk and task. The talk feature allows for immediate responses to user queries using tools like search engines, calculators, and translators, making it a real-time problem solver. And who doesn't love a good problem solver 🧠, especially when it comes to baking cupcakes 🧁?

The task feature is used for completing tasks at a start time or by a deadline. Both the Task and Talk features allow chaining multiple tools together using a natural language task function that converts the output of one tool into the input of another, making different tools compatible with each other. So, whether you need to bake some cupcakes for a birthday party or a cupcake contest, CupcakeAGI is here to help you out!

Some abilities, like search, calculator, and Wikipedia search, are predefined. These abilities are defined as Python functions that the agent can use by creating a Python script, importing the functions, running the final script, and saving the output to a text file it can access. More abilities can be defined, and existing ones modified, in a modular fashion: drop a Python script in the ability functions directory, then add its name, description, and directions for use to abilities.json in the state_of_mind directory, and just like that the agent has a new ability. The agent can chain these abilities to complete more complex tasks, using the natural_task_function to keep their inputs and outputs compatible.
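For illustration, adding a hypothetical word_count ability might look like this (the function, the file names, and the JSON fields follow the description above, but the exact schema is an assumption, not the project's actual one):

# ability_functions/word_count.py (hypothetical new ability)
def word_count(text: str) -> str:
    """Return the number of words in the given text, as a string."""
    return str(len(text.split()))

And its entry in abilities.json in the state_of_mind directory:

{
  "name": "word_count",
  "description": "Counts the words in a piece of text",
  "directions": "import word_count from the ability functions, call it with the text, and save the returned string to the output file"
}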

Overall, I hope you find CupcakeAGI to be a sweet addition to your life. This project was a lot of fun to create, and I'm excited to see where it goes. Thanks for reading, and happy baking! ✨

✨ Why?

  • Our brains process and integrate sensory inputs (sight, sound, smell, taste, and touch) to form a coherent perception of the world around us. Similarly, in the realm of artificial intelligence, the ability to process and integrate multisensory data is crucial for building intelligent agents that can interact with humans in a more natural and effective way.

  • In recent years, large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated remarkable abilities in generating human-like text based on vast amounts of training data. However, these models are typically limited to working with text and image data and lack the ability to process other types of sensory inputs.

  • Beyond the ability to process multisensory data, the LLM agent also exhibits several cognitive abilities that are typically associated with humans. For instance, the agent is equipped with the ability to dream and have random thoughts, which are thought to play important roles in human creativity, memory consolidation, and problem-solving. By incorporating these features into the LLM agent, we aim to create an agent that can assist users in performing tasks in a more natural and effective way and make these agents more human-like.

✨ Multisensory Data

  • ๐Ÿง Welcome back to the world of cupcakes and baking! We all know that human experience is much more than just text-based interactions. It's not just about reading, but also about experiencing the world with all our senses, including sight ๐Ÿ‘€, sound ๐Ÿ”Š, smell ๐Ÿ‘ƒ, taste ๐Ÿ‘…, and touch ๐Ÿ‘. Similarly, an LLM agent that can work with multisensory data can open up a new world of possibilities for machine learning.

  • Instead of missing out on the rich and varied data available through other sensory modalities, we can use neural network architectures that convert various forms of sensory data into text data that the LLM can work with.

  • For instance, we can use image captioning models like vit-gpt2 and BLIP to convert images into text data, which the LLM agent can then process. Similarly, for audio data, audio-to-text models like OpenAI's Whisper can convert audio signals into text data. 📷🎤

  • Now, I know what you're thinking: what about videos 🎥, smell 👃, taste 👅, and touch 👐? Don't worry, we've got you covered! To save computation, we can take one frame per second of video and use image captioning models to convert each frame into text. The audio track can be separated from the video and transcribed with audio-to-text models, giving the LLM agent both visual and auditory data (see the sketch after this list).

  • As for smell 👃, taste 👅, and touch 👐, we can use electronic noses and tongues to capture different types of chemical and taste data and convert them into text data that the LLM can process. Haptic sensors can capture pressure, temperature, and other physical sensations and convert them into text data using a neural network or another suitable model.

  • Remember, these models should be used as modular components that can be easily swapped out as new models emerge. Think of them as Lego blocks or React components that we can assemble into a more comprehensive system.

  • So, let's get baking with CupcakeAGI and incorporate multisensory data into an LLM agent to create more natural and effective human-machine interaction. With access to different sensory data, the LLM agent can process and understand various types of input, leading to a more human-like agent that can assist us in different tasks. 🧁💻
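Here is a rough sketch of the video pipeline described above: one frame per second through a captioner, the audio track through Whisper. The use of OpenCV and the specific model checkpoints are assumptions, not the project's actual code.

# Hypothetical sketch: convert a video into text for the LLM.
import cv2                      # OpenCV for frame extraction (assumed)
import whisper                  # openai-whisper for transcription
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
transcriber = whisper.load_model("base")

def video_to_text(video_path: str) -> str:
    cap = cv2.VideoCapture(video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30
    captions, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % fps == 0:  # sample one frame per second
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            captions.append(captioner(Image.fromarray(rgb))[0]["generated_text"])
        frame_idx += 1
    cap.release()
    transcript = transcriber.transcribe(video_path)["text"]  # ffmpeg pulls the audio track
    return "Frames: " + " | ".join(captions) + "\nAudio: " + transcript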

✨ Human-Like Behavior and Persistent Memory

๐Ÿง๐Ÿ‘‹ Welcome to CupcakeAGI, where we bake up some sweet and creamy AI goodness! ๐Ÿฐ๐Ÿค–

Here are some of the key features of our LLM agent that make it more human-like and effective:

  • 🧠 Human-like behavior: Our LLM agent is equipped with several features that mimic human behavior, including the ability to dream, have random thoughts, and perform mental simulations of how to complete a task. These features allow the agent to better understand and respond to user queries.

  • 🤖 Persistent memory: Our LLM agent has a state of mind where all files relating to its personality, emotions, thoughts, conversations, and tasks are stored. Even if the agent has stopped running, all relevant information is still stored in this location. This allows the agent to provide a more personalized and effective experience.

  • 😃 Emotion parameters: We use emotion parameters such as happiness, sadness, anger, fear, curiosity, and creativity to make the LLM agent more expressive and better able to understand the user's needs and preferences.

  • 💭 Thought bubble: Our LLM agent also has a thought bubble, which is essentially a list of lists corresponding to different topics. This allows the agent to more effectively process and integrate its thoughts with the user's queries and tasks (see the sketch after this list).

  • ๐Ÿ—ฃ๏ธ Conversation storage: The LLM agent stores the conversation it has had so far and the list of tasks it needs to perform. It breaks the conversation into chunks and summarizes it to maintain coherence and relevance. This allows the agent to maintain a coherent and relevant conversation with the user.

With these features, our LLM agent is better equipped to assist users in performing tasks in a natural and effective way. We hope you enjoy our sweet and creamy AI goodness! 🧁🍰🤖

✨ Talk & Task

๐Ÿง๐Ÿ‘‹ Welcome to CupcakeAGI! Here are some sweet deets about the LLM agent that will make your tasks a cakewalk:

  • ๐Ÿ—ฃ๏ธ Talk and Task modes make it easy for users to communicate with the LLM and get things done seamlessly.
  • ๐Ÿ“ The LLM converts files like images, videos, and audio to text, making them easy to store and retrieve.
  • ๐Ÿ” With access to various tools like search engines, wikis, and translators, the LLM can provide users with the necessary information for their queries.
  • ๐Ÿงฐ Natural language task functions allow users to chain together different tools, making them compatible with each other.
  • ๐Ÿ•ฐ๏ธ The Task mode is particularly useful for lengthy tasks and can be set to start at a specific time, allowing users to focus on other things while the LLM takes care of the task.
  • ๐Ÿ’ญ The LLM experiences random thoughts and dreams, just like humans, making it more relatable and human-like.
  • ๐Ÿง‘โ€๐Ÿ’ป The LLM can even use Python packages like Hugging Face models to complete tasks, making it a highly versatile agent. So go ahead and give CupcakeAGI a try! With its modular approach, you can easily add new tools and features as needed. Who knew cupcakes and AI could go so well together? ๐Ÿง๐Ÿค–

✨ Limitations

Welcome to CupcakeAGI! 🧁🍰🍩🍪

Let's talk about some important things you need to know about this sweet project:

  • Complex tasks: While CupcakeAGI is as human-like as possible, it may not be able to solve complex tasks that require significant back and forth. We're talking about tasks that involve negotiating with multiple parties to reach a solution. CupcakeAGI is intended to assist individuals on a personal level, but it may not be suitable for solving highly intricate problems. Don't worry, though, CupcakeAGI is still your go-to for all your cupcake baking needs! 🧁👩‍🍳

  • Accuracy of sensory data conversion: The effectiveness of CupcakeAGI relies heavily on the accuracy of the neural network architectures used to convert sensory data into text. If these models are not accurate, CupcakeAGI may misunderstand the user's input, leading to incorrect or ineffective responses. But don't fret, we're constantly working on improving CupcakeAGI's accuracy to ensure you get the best experience possible! 🤖🎂

  • Ethics and privacy: CupcakeAGI has the potential to collect and process a large amount of personal data from its users, so there is a risk that sensitive data may be compromised, leading to privacy concerns. CupcakeAGI will do its best to keep your cupcake secrets safe! 🔒🤫

Thanks for checking out CupcakeAGI, and remember, with CupcakeAGI by your side, you'll always have the perfect cupcake recipe! 🧁💻

✨ Conclusion

Welcome to the conclusion of our multisensory LLM agent project! 🎉🧁🤖🧠

Here are the key takeaways from our project: 🤪🧁

  • Our LLM agent is like a cupcake, made with many different ingredients - it can work with multisensory data, dream, have random thoughts, and show emotions 🧁💭😍
  • By incorporating multisensory data, our agent can understand different types of information, just like a baker uses different ingredients to make a delicious cupcake 🍰👀
  • With its cognitive abilities and persistent memory, our agent can assist users in a more human-like way, just like a friendly baker who helps you choose the perfect cupcake flavor 🤝🧁
  • This project represents a small but important step towards building more natural and effective AI assistants, just like a small cupcake can bring a smile to someone's face and brighten their day 🌞🧁
  • We hope our project has inspired you to think about the possibilities of multisensory LLM agents and how they can improve human-machine interaction. Thank you for taking the time to check out our project - it was made with lots of love and cupcakes! ❤️🧁

More Repositories

1. Interactive-LLM-Powered-NPCs (Python, 507 stars)
   Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! 🎮🤖🚀

2. AI-Powered-Video-Tutorial-Generator (JavaScript, 232 stars)
   Create AI-Generated Video Tutorials with Character Animation and Slides!

3. AI-NPCs-that-can-Control-their-Actions-along-with-Dialogue (Python, 38 stars)
   AI NPCs that can control their actions along with dialogue. For instance, if I ask an NPC to tell me its favorite magic spell, it not only tells me the spell but also performs it!

4. AI-Plays-God-of-War (Python, 34 stars)
   LLM Agent paired with Image Captioning and Yolov8 models plays God of War

5. Clara-AI (Python, 8 stars)
   Embarking on a new project is always exciting, but this one had me buzzing with anticipation! I decided to take on the challenge of creating a customized version of Bing's AI: a chatbot powered by none other than ChatGPT, with the added capability of searching the web.

6. AI-starter-project-collection (2 stars)
   A collection of some of my old starter AI mini projects.

7. twitter-arxiv-news-summary-report-generator (Python, 1 star)

8. Chatbot-using-OpenAI-Gpt-3 (Python, 1 star)
   I used OpenAI's GPT-3 with speech-to-text and text-to-speech modules to make a chatbot that can voice chat.

9. AI-Twitter (Jupyter Notebook, 1 star)
   Like Twitter, but everyone is an AI using LLMs and Stable Diffusion

10. TransparencyApp (Python, 1 star)
    An app to make windows transparent, tested on Windows 11

11. Real-or-AI (Jupyter Notebook, 1 star)
    Participants in this competition will be challenged to create a model that can classify AI- and human-generated images.

12. Sciantia-AI (1 star)
    Sciantia AI is a tool that enables you to ask questions directly to YouTube videos using the advanced capabilities of OpenAI's LLM.