# Intel-Xeon-LLM-RAG-Inference-Setup
This repository provides a comprehensive guide to setting up and running an LLM inference server optimized for Intel Xeon machines, with a focus on Retrieval Augmented Generation (RAG). It includes step-by-step instructions for configuring a Docker-based server environment and a Python client.
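For orientation before the detailed steps, the sketch below shows the general shape of a RAG client call against such a server. It is a minimal sketch, assuming the Docker server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8000`; the URL, port, and model name are illustrative placeholders, not values taken from this guide, so substitute the ones from your own server configuration.

```python
"""Minimal RAG client sketch (assumed OpenAI-compatible endpoint)."""
import requests

# Assumed endpoint and model name -- adjust to match your server setup.
SERVER_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # placeholder

def rag_query(question: str, context_chunks: list[str]) -> str:
    # Core RAG pattern: inject the retrieved passages into the prompt
    # so the model answers grounded in the supplied context.
    context = "\n\n".join(context_chunks)
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context.\n\n" + context},
            {"role": "user", "content": question},
        ],
        "max_tokens": 256,
    }
    resp = requests.post(SERVER_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    chunks = ["Intel Xeon processors include AMX instructions that accelerate inference."]
    print(rag_query("What accelerates inference on Xeon?", chunks))
```

In a full RAG pipeline, `context_chunks` would come from a retrieval step (e.g. a vector-store similarity search) rather than being passed in by hand; the sections that follow cover standing up the server side in Docker and the Python client environment.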