  • Language
    Python
  • License
    MIT License
  • Created about 1 year ago
  • Updated about 1 year ago

Repository Details

Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks

AgentChain uses Large Language Models (LLMs) to plan and orchestrate multiple Agents or Large Models (LMs) that together accomplish sophisticated tasks. AgentChain is fully multimodal: it accepts text, image, audio, and tabular data as input and can produce them as output.

  • 🧠 LLMs as the brain: AgentChain leverages state-of-the-art Large Language Models to let users plan and make decisions based on natural language inputs. This makes AgentChain a versatile tool for a wide range of applications, such as task execution given natural language instructions, data understanding, and data generation.
  • 🌟 Fully Multimodal IO: AgentChain is fully multimodal, accepting input and producing output in various modalities, such as text, image, audio, or video (coming soon). This makes it suitable for applications such as computer vision, speech recognition, and transitioning from one modality to another.
  • 🤝 Orchestrate Versatile Agents: AgentChain can orchestrate multiple agents to perform complex tasks. Using composability and hierarchical structuring of tools, AgentChain can intelligently choose which tools to use, and when, for a given task. This makes AgentChain a powerful tool for projects that require a complex combination of tools.
  • 🔧 Customizable for Ad-hoc Needs: AgentChain can be customized to fit specific project requirements. Specific requirements can be met by enhancing capabilities with new agents (a distributed architecture is coming soon).

Get started

  1. Install requirements: pip install -r requirements.txt
  2. Download model checkpoints: bash download.sh
  3. Depending on the agents you need, export the corresponding environment variables:
OPENAI_API_KEY={YOUR_OPENAI_API_KEY} # mandatory since the LLM is central in this application
SERPAPI_API_KEY={YOUR_SERPAPI_API_KEY}  # make sure to include a serp API key in case you need the agent to be able to search the web

# These environment variables are needed in case you want the agent to be able to make phone calls
AWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}
AWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}
TWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}
TWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}
AWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME} # make sure to create an S3 bucket with public access
  4. Install ffmpeg library (needed for whisper): sudo apt update && sudo apt install ffmpeg (Ubuntu command)
  5. Run the main script: python main.py
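
Before running main.py, it can help to check which of the variables above are actually set. The following is a minimal sketch and not part of AgentChain: only the variable names come from the list above, while the grouping by capability and the helper name `missing_variables` are assumptions made for illustration.

```python
import os

# Variable names are taken from the list above. The grouping by capability
# and the helper below are illustrative assumptions, not AgentChain code.
REQUIRED_BY_CAPABILITY = {
    "llm": ["OPENAI_API_KEY"],
    "web_search": ["SERPAPI_API_KEY"],
    "phone_calls": [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "TWILIO_ACCOUNT_SID",
        "TWILIO_AUTH_TOKEN",
        "AWS_S3_BUCKET_NAME",
    ],
}


def missing_variables(capabilities, environ=None):
    """Return the sorted names of unset or empty variables for the given capabilities."""
    environ = os.environ if environ is None else environ
    missing = set()
    for capability in capabilities:
        for name in REQUIRED_BY_CAPABILITY[capability]:
            if not environ.get(name):
                missing.add(name)
    return sorted(missing)


if __name__ == "__main__":
    # The LLM key is mandatory; add "web_search" or "phone_calls" as needed.
    missing = missing_variables(["llm"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.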

System requirements

As of this commit, running AgentChain requires at least 29 GB of GPU memory. Make sure to assign GPU devices correctly in main.py.

You can comment out some tools and models to reduce the GPU memory footprint, at the cost of fewer capabilities.
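
As a rough illustration of spreading models over several GPUs, the sketch below assigns tool names to device strings in round-robin order. It does not reflect how main.py is actually organized: the helper `assign_devices`, the list `enabled_tools`, and the device names are all hypothetical, and the tool names are borrowed from the model names mentioned in this document.

```python
from itertools import cycle


def assign_devices(tool_names, devices):
    """Spread tools over the available devices in round-robin order."""
    if not devices:
        raise ValueError("at least one device is required")
    return dict(zip(tool_names, cycle(devices)))


# Leaving a tool out of this list is the analogue of commenting it out.
enabled_tools = ["Whisper", "Blip2", "StableDiffusion"]
placement = assign_devices(enabled_tools, ["cuda:0", "cuda:1"])
print(placement)
```

Round-robin placement ignores how much memory each model needs, so a real setup would likely place the largest models by hand.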

Demo

AgentChain demo 1: transcribing audio and visualizing the result as an image. A video of the AgentChain interface shows an uploaded audio and the resulting generated image, which is a representation of the audio content.

Demo1.sound.mp4

AgentChain demo 2: asking questions about an image. A video of the AgentChain interface shows an image and a question being asked about it, with the resulting answer displayed below.

Demo2.sound.mp4

AgentChain demo 3: question-answering on tabular data and making a phone call to report the results. A video of the AgentChain interface shows a table of data with a question being asked and the resulting answer displayed, followed by a phone call being made using the CommsAgent.

Demo3.sound.mp4

Agents in AgentChain

This section mostly describes our vision and what we aim to achieve with AgentChain. Check the Demo section to see what we have achieved so far.

AgentChain is a sophisticated system with the goal of solving general problems. It can orchestrate multiple agents to accomplish sub-problems. These agents are organized into different groups, each with their unique set of capabilities and functionalities. Here are some of the agent groups in AgentChain:

SearchAgents

The SearchAgents group is responsible for gathering information from various sources, including search engines, online databases, and APIs. The agents in this group retrieve up-to-date world knowledge. Some examples of agents in this group include the Google Search API, Bing API, Wikipedia API, and Serp.

CommsAgents

The CommsAgents group is responsible for handling communication between different parties, such as sending emails, making phone calls, or messaging via various platforms. The agents in this group can integrate with a wide range of platforms. Some examples of agents in this group include TwilioCaller, TwilioEmailWriter, TwilioMessenger and Slack.

ToolsAgents

The ToolsAgents group is responsible for performing various computational tasks, such as performing calculations, running scripts, or executing commands. The agents in this group can work with a wide range of programming languages and tools. Some examples of agents in this group include Math, Python REPL, and Terminal.

MultiModalAgents

The MultiModalAgents group is responsible for handling input and output from various modalities, such as text, image, audio, or video (coming soon). The agents in this group can process and understand different modalities. Some examples of agents in this group include OpenAI Whisper, Blip2, Coqui, and StableDiffusion.

ImageAgents

The ImageAgents group is responsible for processing and manipulating images, such as enhancing image quality, object detection, or image recognition. The agents in this group can perform complex operations on images. Some examples of agents in this group include Upscaler, ControlNet and YOLO.

DBAgents

The DBAgents group is responsible for adding data to and fetching data from your database, such as retrieving metrics or aggregations. The agents in this group interact with databases and enrich other agents with your database information. Some examples of agents in this group include SQL, MongoDB, ElasticSearch, Qdrant and Notion.
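
The grouping described above can be pictured as a small two-level registry: groups at the top, agents underneath. The sketch below is only an illustration of that hierarchy. The group and agent names come from this section, while the `AgentGroup` class, the `GROUPS` list, and `find_group` are hypothetical and not AgentChain's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGroup:
    """A named group of agents, mirroring the groups described above."""
    name: str
    purpose: str
    agents: list = field(default_factory=list)


# Names come from the section above; the structure itself is hypothetical.
GROUPS = [
    AgentGroup("SearchAgents", "gather information",
               ["Google Search API", "Bing API", "Wikipedia API", "Serp"]),
    AgentGroup("CommsAgents", "handle communication",
               ["TwilioCaller", "TwilioEmailWriter", "TwilioMessenger", "Slack"]),
    AgentGroup("ToolsAgents", "perform computational tasks",
               ["Math", "Python REPL", "Terminal"]),
    AgentGroup("MultiModalAgents", "handle multiple modalities",
               ["OpenAI Whisper", "Blip2", "Coqui", "StableDiffusion"]),
    AgentGroup("ImageAgents", "process and manipulate images",
               ["Upscaler", "ControlNet", "YOLO"]),
    AgentGroup("DBAgents", "add and fetch database data",
               ["SQL", "MongoDB", "ElasticSearch", "Qdrant", "Notion"]),
]


def find_group(agent_name):
    """Return the name of the group that contains the agent, or None."""
    for group in GROUPS:
        if agent_name in group.agents:
            return group.name
    return None


print(find_group("Blip2"))
```

A planner could first pick a group by its purpose and then pick an agent inside it, which keeps the number of choices at each step small.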

Potential Applications

Example 1: 🏝️📸🌅 AgentChain Image Generation System for Travel Company

As a travel company that is promoting a new and exotic destination, it is crucial to have high-quality images that can grab the attention of potential travelers. However, manually creating stunning images can be time-consuming and expensive. That's why the travel company wants to use AgentChain to automate the image generation process and create beautiful visuals with the help of various agents.

Here is how AgentChain can help by chaining different agents together:

  1. Use SearchAgent (Google Search API, Wikipedia API, Serp) to gather information and inspiration about the destination, such as the most popular landmarks, the local cuisine, and the unique features of the location.
  2. Use ImageAgent (Upscaler) to enhance the quality of images and make them more appealing by using state-of-the-art algorithms to increase the resolution and remove noise from the images.
  3. Use MultiModalAgent (Blip2) to generate descriptive captions for the images, providing more context and making the images more meaningful.
  4. Use CommsAgent (TwilioEmailWriter) to send the images to the target audience via email or other messaging platforms, attracting potential travelers with stunning visuals and promoting the new destination.
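
The four steps above form a linear chain in which each agent's output feeds the next. A minimal sketch of that idea follows. Every function here is a stand-in written for illustration: none of them calls a real search, upscaling, captioning, or email service, and `chain` is a hypothetical helper rather than an AgentChain API.

```python
from functools import reduce


def chain(*steps):
    """Compose steps left to right: each step's output is the next step's input."""
    def run(payload):
        return reduce(lambda data, step: step(data), steps, payload)
    return run


# Stand-in steps: each only records what a real agent would contribute.
def search_destination(task):
    return {**task, "facts": f"landmarks and cuisine of {task['destination']}"}


def upscale_images(task):
    return {**task, "images": [f"{name} (upscaled)" for name in task["images"]]}


def caption_images(task):
    return {**task, "captions": [f"Caption for {name}" for name in task["images"]]}


def send_email(task):
    return {**task, "sent": True}


campaign = chain(search_destination, upscale_images, caption_images, send_email)
result = campaign({"destination": "an island", "images": ["beach.jpg"]})
print(result["captions"])
```

In AgentChain the order of steps is meant to be decided by the LLM at run time; the fixed order here only shows how data could flow between agents.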

Example 2: 💼💹📈 AgentChain Financial Analysis Report for Investment Firm

As an investment firm that manages a large portfolio of stocks, it is critical to stay up-to-date with the latest market trends and analyze the performance of different stocks to make informed investment decisions. However, analyzing data from multiple sources can be time-consuming and error-prone. That's why the investment firm wants to use AgentChain to automate the analysis process and generate reports with the help of various agents.

Here is how AgentChain can help by chaining different agents together:

  1. Use ToolsAgent (Python REPL, TableQA) to analyze data from different sources (e.g., CSV files, stock market APIs) and perform calculations related to financial metrics such as earnings, dividends, and P/E ratios.
  2. Use SearchAgent (Bing API) to gather news and information related to the stocks in the portfolio, such as recent earnings reports, industry trends, and analyst ratings.
  3. Use NLPAgent (GPT) to create a summary and bullet points of the news and information gathered, providing insights into market sentiment and potential trends.
  4. Use CommsAgent (TwilioEmailWriter) to send a summary report of the analysis to the appropriate stakeholders, helping them make informed decisions about their investments.
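
To make the metrics in step 1 concrete, here is a small sketch of the kind of calculation a Python REPL tool might be asked to run. It uses the standard definitions (P/E is price per share divided by earnings per share, and dividend yield is annual dividend per share divided by price per share). The function names and the input numbers are made up for illustration.

```python
def price_to_earnings(price_per_share, earnings_per_share):
    """Return the P/E ratio, or None when earnings are zero or negative."""
    if earnings_per_share <= 0:
        return None  # P/E is not meaningful without positive earnings
    return price_per_share / earnings_per_share


def dividend_yield(annual_dividend_per_share, price_per_share):
    """Return the dividend yield as a fraction of the share price."""
    if price_per_share <= 0:
        raise ValueError("price per share must be positive")
    return annual_dividend_per_share / price_per_share


# Illustrative inputs only, not real market data.
pe = price_to_earnings(50.0, 2.5)
dy = dividend_yield(1.0, 50.0)
print(pe, dy)
```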

Example 3: 🛍️💬💻 AgentChain Customer Service Chatbot for E-commerce Site

As an e-commerce site that wants to provide excellent customer service, it is crucial to have a chatbot that can handle customer inquiries and support requests in a timely and efficient manner. However, building a chatbot that can understand and respond to complex customer requests can be challenging. That's why the e-commerce site wants to use AgentChain to automate the chatbot process and provide superior customer service with the help of various agents.

Here is how AgentChain can help by chaining different agents together:

  1. Use MultiModalAgent (Blip2, Whisper) to handle input from various modalities (text, image, audio), making it easier for customers to ask questions and make requests in a natural way.
  2. Use SearchAgent (Google Search API, Wikipedia API) or DBAgent to provide information about products or services whether in-house or public, such as specifications, pricing, and availability.
  3. Use CommsAgent (TwilioMessenger) to communicate with customers via messaging platforms, providing support and answering questions in real-time.
  4. Use ToolsAgent (Math) to perform calculations related to discounts, taxes, or shipping costs, helping customers make informed decisions about their purchases.
  5. Use MultiModalAgent (Coqui) to generate natural-sounding responses and hold more complex conversations, providing a personalized and engaging experience for customers.
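
The calculation in step 4 can be sketched as a single function. This is an assumption-laden example: the order of operations (discount first, then tax on the discounted amount, then shipping) is one common convention rather than something the source specifies, and `order_total` and the amounts are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def order_total(subtotal, discount_rate, tax_rate, shipping):
    """Apply the discount, tax the discounted amount, then add shipping."""
    cents = Decimal("0.01")
    discounted = subtotal * (1 - discount_rate)
    taxed = discounted * (1 + tax_rate)
    return (taxed + shipping).quantize(cents, rounding=ROUND_HALF_UP)


# Illustrative amounts: 10% discount, 8% tax, flat shipping fee.
total = order_total(
    Decimal("80.00"), Decimal("0.10"), Decimal("0.08"), Decimal("5.00")
)
print(total)
```

`Decimal` is used instead of floats so that currency amounts round predictably to whole cents.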

Example 4: 🧑‍⚕️💊💤 AgentChain Personal Health Assistant

Access to personal health assistance can be expensive and limited. It is essential to have a personal health assistant that can help individuals manage their health and well-being. However, providing personalized health advice and reminders can be challenging, especially for seniors. That's why AgentChain aims to automate the health assistant process and provide personalized support with the help of various agents.

Here is how AgentChain can help by chaining different agents together:

  1. Use DBAgent to handle input from various health monitoring devices (e.g., heart rate monitors, blood pressure monitors, sleep trackers), providing real-time health data and alerts to the health assistant.
  2. Use SearchAgent (Google Search API, Wikipedia API) or any other medical database to provide information about health topics and medications, such as side effects, dosage, and interactions.
  3. Use NLPAgent (GPT) to generate personalized recommendations for diet, exercise, and medication, taking into account the seniors' health goals and preferences.
  4. Use CommsAgent (TwilioCaller, TwilioMessenger) to give advice, send reminders, and provide alerts that help users stay on track with their health goals, improving their quality of life and reducing the need for emergency care.

Acknowledgements

We appreciate the open-source work of the following projects:

Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, BLIP, Microsoft
