  • Stars: 8,364
  • Rank: 4,278 (top 0.09%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 2 years ago
  • Updated: 8 months ago


Repository Details

The user analytics platform for LLMs

The next-generation platform to monitor and optimize your AI costs in one place

Nebuly is the next-generation platform to monitor and optimize your AI costs in one place. It connects to all of your AI cost sources (compute, API providers, AI software licenses, etc.) and centralizes them to give you full visibility on a per-model basis. The platform also provides optimization recommendations and a co-pilot model that can guide you through the optimization process. It builds on top of our open-source tools, letting you optimize each step of your AI stack to squeeze out the best possible cost performance.

If you like the idea, give us a star to show your support for the project!

Apply for early access to the enterprise version here.

AI costs monitoring (SDK)

The monitoring platform lets you track 100% of your AI costs. We support three main cost buckets:

  • Infrastructure and compute (AWS, Azure, GCP, on-prem)
  • AI-related software/tool licenses (OpenAI, Cohere, Scale AI, Snorkel, Pinecone, HuggingFace, Databricks, etc.)
  • People (Jira, GitLab, Asana, etc.)
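To illustrate what centralizing these buckets means in practice, here is a minimal Python sketch that aggregates per-model spend across cost sources. The record fields and values are purely hypothetical, not the Nebuly SDK's actual schema or API:

```python
from collections import defaultdict

# Hypothetical cost records; field names and amounts are illustrative only,
# not the schema used by the Nebuly platform.
records = [
    {"model": "gpt-4",   "bucket": "api",     "usd": 120.0},
    {"model": "gpt-4",   "bucket": "compute", "usd": 310.5},
    {"model": "llama-2", "bucket": "compute", "usd": 88.0},
    {"model": "gpt-4",   "bucket": "people",  "usd": 45.0},
]

def costs_per_model(records):
    """Sum USD costs per model across all cost buckets."""
    totals = defaultdict(float)
    for r in records:
        totals[r["model"]] += r["usd"]
    return dict(totals)

print(costs_per_model(records))  # → {'gpt-4': 475.5, 'llama-2': 88.0}
```

Grouping by the `bucket` field instead would give the per-source breakdown (infrastructure vs. licenses vs. people) described above.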

The easiest way to install the SDK is via pip:

pip install nebuly

The list of the supported integrations will be available soon.

AI cost optimization

Once you have full visibility over your AI costs, you are ready to optimize them. We have developed multiple open-source tools to reduce the cost and improve the performance of your AI systems:

✅ Speedster: reduce inference costs by leveraging SOTA optimization techniques that best couple your AI models with the underlying hardware (GPUs and CPUs)

✅ Nos: reduce infrastructure costs by leveraging real-time dynamic partitioning and elastic quotas to maximize the utilization of your Kubernetes GPU cluster

✅ ChatLLaMA: reduce hardware and data costs by leveraging fine-tuning optimization techniques and RLHF alignment

Contributing

As an open-source project in a rapidly evolving field, we welcome contributions of all kinds, including new features, improved infrastructure, and better documentation. If you're interested in contributing, please see the linked page for more information on how to get involved.


Join the community | Contribute to the library