• Stars
    24
  • Rank 986,245 (Top 20 %)
  • Language
    Python
  • Created 2 months ago
  • Updated 2 months ago

Repository Details

An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing and batching to deliver high-quality text extraction from complex PDF documents. Ideal for businesses seeking efficient document digitization and data extraction solutions.
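The batching and parallel processing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `batched`, `extract_text_stub`, and `process_pages` are hypothetical names, and the stub stands in for whatever model call the project really makes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def extract_text_stub(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a model call; it only tags each page."""
    return [f"text of {page}" for page in batch]


def process_pages(
    pages: Sequence[str],
    batch_size: int = 4,
    max_workers: int = 3,
    extract: Callable[[List[str]], List[str]] = extract_text_stub,
) -> List[str]:
    """Split pages into batches and process the batches concurrently."""
    batches = list(batched(pages, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so page order is kept.
        results = pool.map(extract, batches)
        return [text for batch_result in results for text in batch_result]


if __name__ == "__main__":
    demo_pages = [f"page-{i}" for i in range(10)]
    for line in process_pages(demo_pages):
        print(line)
```

Threads suit this shape of work because each batch would spend most of its time waiting on a network response; the batch size and worker count here are arbitrary placeholders.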

More Repositories

1

telnyx-whisper-bulk-caller

A script that uses the Telnyx API to make bulk calls to a given list of numbers and analyzes the audio recordings with Whisper. This way, we don't have to read all the transcriptions.
Python
129
star
2

write-into-menubar

Write whatever you want into the OS X menu bar using BitBar + Alfred Powerpack.
Shell
34
star
3

context-aware-srt-translation-gpt

A repository that attempts to translate subtitles with GPT-3.5 Turbo without losing context (using the dynamic window context method).
Python
27
star
4

data-preparation-for-fine-tuning

A Python project for preparing and analyzing datasets from JSONL files. It includes tools for shuffling, categorizing, and generating reports on dataset content.
Python
14
star
5

rust-hyper-load-balanced-api-client

A high-performance Rust tool for sending API requests (to LLMs in my case) with built-in weighted load balancing, retry mechanisms, and rate limiting. Using hyper for fast request handling, it manages large volumes of asynchronous requests and is optimized for 10K requests per second.
Rust
12
star
6

fastapi-http-proxy-with-caching

A FastAPI-based HTTP proxy with request caching using Redis, designed to forward requests while caching responses for efficient repeated queries.
Python
9
star
7

bulk-openai-embeddings-creator

A multi-threaded script & CLI tool for generating embeddings from text using multiple OpenAI endpoints. Supports resuming from previously processed data and customizable thread configurations.
Python
8
star
8

cluster-by-similarity-high-dim-vectors

Python script for automated clustering of embeddings with DBSCAN, using cosine similarity for flexible, size-agnostic grouping (I hate K-means); outputs analysis metrics.
Python
7
star
9

copilot-pr-first-comment-updater

Python script that appends the copilot:all label to the first comments of closed pull requests in a specified GitHub repository, allowing easy tracking and management of processed PRs.
Python
4
star
10

n8n-docker-ffmpeg

Repository for setting up n8n with ffmpeg using Docker Compose, including beginner-friendly instructions and systemd service configuration for automatic startup.
Dockerfile
3
star
11

pineconedb-appscript-integration-for-sheets

A Google Apps Script custom function to fetch similar categories from a vector database using OpenAI and Pinecone APIs. This function can be used directly in Google Sheets.
JavaScript
3
star
12

go-native-squid-proxy

GoNativeSquidProxy is a high-performance, scalable proxy server fully written in Go, designed to efficiently handle HTTP/HTTPS requests as a modern alternative to Squid.
Go
2
star
13

Go-JSON-AzureSearch-Prepper

A Go utility for processing and combining JSON files, making them ready for integration with Azure Search AI.
Go
2
star
14

kv-backup-enhanced

A script that helps you download 1,200 Cloudflare KV records every 5 minutes (a cap imposed by Cloudflare's global rate limit).
Python
2
star
15

GoogleSheets-Translator

Automate translations in Google Sheets using Google Translate, optimized for large datasets through batch processing with the =GOOGLETRANSLATE formula.
JavaScript
1
star
16

translation-cli-by-openai-api

This CLI tool makes it easy to translate a large number of strings from an XLSX file into many languages using OpenAI. It has powerful features like a progress bar and the ability to customize the file structure.
1
star