Task-oriented finetuning for better embeddings on neural search
Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.
Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud. With Finetuner, you can easily enhance the performance of pre-trained models, making them production-ready without extensive labeling or expensive hardware.
- Better embeddings: Create high-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems, clustering, duplicate detection, anomaly detection, or other uses.
- Low budget, high expectations: Bring considerable improvements to model performance, making the most of just a few hundred training samples, and finish fine-tuning in as little as an hour.
- Performance promise: Enhance pre-trained models so that they deliver state-of-the-art performance on domain-specific applications.
- Simple yet powerful: Easy access to 40+ mainstream loss functions, 10+ optimizers, layer pruning, weight freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.
- All-in-cloud: Train using our GPU infrastructure, and manage runs, experiments, and artifacts on Jina AI Cloud without worrying about resource availability, complex integration, or infrastructure costs.
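Several of the techniques listed above, such as metric-learning losses and hard-negative mining, come together in the batch-hard triplet loss. The following NumPy sketch illustrates that general technique under stated assumptions (cosine distance, a fixed margin of 0.2, integer class labels); it is not Finetuner's implementation, and `batch_hard_triplet_loss` is a hypothetical name.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings: np.ndarray, labels: np.ndarray, margin: float = 0.2) -> float:
    """Batch-hard triplet loss using cosine distance (illustrative sketch only).

    For every anchor, the hardest positive is the farthest sample with the same
    label and the hardest negative is the closest sample with a different label.
    """
    # L2-normalise so that distance = 1 - cosine similarity
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - emb @ emb.T  # pairwise cosine distances, shape (n, n)

    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self  # same label, excluding the anchor itself
    neg_mask = ~same            # different label

    losses = []
    for i in range(len(labels)):
        if not pos_mask[i].any() or not neg_mask[i].any():
            continue  # anchor has no valid positive or negative in this batch
        hardest_pos = dist[i][pos_mask[i]].max()  # hard-positive mining
        hardest_neg = dist[i][neg_mask[i]].min()  # hard-negative mining
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0


labels = np.array([0, 0, 1, 1])
separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
collapsed = np.ones((4, 2))  # every sample mapped to the same point
print(batch_hard_triplet_loss(separated, labels))  # 0.0
print(batch_hard_triplet_loss(collapsed, labels))
```

The well-separated batch costs nothing, while the collapsed batch costs roughly the margin (0.2): the loss only penalizes anchors whose hardest negative is not at least `margin` farther away than their hardest positive, which is what pushes fine-tuned embeddings apart by class.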
Documentation
Pretrained Text Embedding Models
| Name | Parameters | Dimensions | Hugging Face |
|---|---|---|---|
| jina-embedding-t-en-v1 | 14M | 312 | link |
| jina-embedding-s-en-v1 | 35M | 512 | link |
| jina-embedding-b-en-v1 | 110M | 768 | link |
| jina-embedding-l-en-v1 | 330M | 1024 | link |
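Once text is embedded with one of these models, semantic search reduces to ranking documents by cosine similarity to the query embedding. This sketch assumes the embeddings have already been computed: random 312-dimensional vectors (the dimension listed for jina-embedding-t-en-v1) stand in for real model outputs, and `top_k_cosine` is a hypothetical helper, not a Finetuner function.

```python
from typing import List

import numpy as np


def top_k_cosine(query: np.ndarray, docs: np.ndarray, k: int = 3) -> List[int]:
    """Return the indices of the k documents most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document to the query
    return np.argsort(-scores)[:k].tolist()  # highest similarity first


# Random stand-ins for 312-dimensional embeddings of 100 documents
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(100, 312))
# A query that is a slightly perturbed copy of document 42
query_embedding = doc_embeddings[42] + 0.01 * rng.normal(size=312)
print(top_k_cosine(query_embedding, doc_embeddings, k=3))
```

Normalizing both sides turns the dot product into cosine similarity, so the same code works for any of the four dimensions in the table; for large collections, a vector index would replace the brute-force `argsort`.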
Benchmarks
| Model | Task | Metric | Pretrained | Finetuned | Delta | Run it! |
|---|---|---|---|---|---|---|
| BERT | Quora Question Answering | MRR | 0.835 | 0.967 | 15.8% | |
| | | Recall | 0.915 | 0.963 | 5.3% | |
| ResNet | Visual similarity search on TLL | mAP | 0.110 | 0.196 | 78.2% | |
| | | Recall | 0.249 | 0.460 | 84.7% | |
| CLIP | Deep Fashion text-to-image search | MRR | 0.575 | 0.676 | 17.4% | |
| | | Recall | 0.473 | 0.564 | 19.2% | |
| M-CLIP | Cross market product recommendation (German) | MRR | 0.430 | 0.648 | 50.7% | |
| | | Recall | 0.247 | 0.340 | 37.7% | |
| PointNet++ | ModelNet40 3D Mesh Search | MRR | 0.791 | 0.891 | 12.7% | |
| | | Recall | 0.154 | 0.242 | 57.1% | |
All metrics were evaluated at k = 20 after training for 5 epochs with the Adam optimizer, using learning rates of 1e-4 for ResNet, 1e-7 for CLIP, 1e-5 for the BERT models, and 5e-4 for PointNet++. Delta is the relative improvement of the fine-tuned score over the pretrained score.
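To compare your own runs against the table, the metrics can be computed from ranked retrieval results. This is a minimal sketch assuming the standard definitions of MRR@k and Recall@k; Finetuner's own evaluator may differ in details (for example, how queries without relevant items are handled), and the function names are hypothetical. The Delta column matches the relative change (finetuned - pretrained) / pretrained.

```python
from typing import Sequence, Set


def mrr_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int = 20) -> float:
    """Mean reciprocal rank of the first relevant hit within each query's top-k."""
    total = 0.0
    for results, rel in zip(ranked, relevant):
        for rank, doc_id in enumerate(results[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked)


def recall_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int = 20) -> float:
    """Fraction of each query's relevant items found in its top-k, averaged over queries."""
    total = 0.0
    for results, rel in zip(ranked, relevant):
        total += len(set(results[:k]) & rel) / len(rel)
    return total / len(ranked)


def relative_delta(pretrained: float, finetuned: float) -> float:
    """Relative improvement in percent."""
    return (finetuned - pretrained) / pretrained * 100


ranked = [["a", "b", "c"], ["x", "y", "z"]]
print(mrr_at_k(ranked, [{"b"}, {"q"}]))          # 0.25
print(recall_at_k(ranked, [{"b", "q"}, {"x"}]))  # 0.75
print(round(relative_delta(0.110, 0.196), 1))    # 78.2 (ResNet mAP row)
```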
Install
Make sure you have Python 3.8+ installed. Finetuner can be installed via pip by executing:
pip install -U finetuner
To submit fine-tuning jobs to the cloud, install the full version instead:
pip install "finetuner[full]"
Note: Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is 0.4.1, which is still available for installation via pip. See Finetuner git tags and releases.
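The two version constraints in this section (Python 3.8+, and cloud-based computing from Finetuner 0.5.0 onward) can be checked with a small helper. This is a hypothetical sketch, not part of Finetuner: `runs_in_cloud`, `python_supported`, and `CLOUD_SINCE` are illustrative names, and the parser handles only plain X.Y.Z release strings. The installed version is read with the standard-library `importlib.metadata`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

CLOUD_SINCE: Tuple[int, ...] = (0, 5, 0)  # first release that computes on Jina AI Cloud


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; pre-release suffixes are not handled."""
    return tuple(int(part) for part in v.split("."))


def runs_in_cloud(finetuner_version: str) -> bool:
    """True if this Finetuner release runs its computation on Jina AI Cloud."""
    return parse_version(finetuner_version) >= CLOUD_SINCE


def python_supported() -> bool:
    """True if the running interpreter meets the Python 3.8+ requirement."""
    return sys.version_info >= (3, 8)


try:
    installed = version("finetuner")
    print(installed, "(cloud)" if runs_in_cloud(installed) else "(local)")
except PackageNotFoundError:
    print("finetuner is not installed")
```

Versions are compared as integer tuples rather than strings, so that, for example, 0.10.0 correctly counts as newer than 0.5.0.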
Articles about Finetuner
Check out our published blog posts and tutorials to see Finetuner in action!
- Fine-tuning with Low Budget and High Expectations
- Hype and Hybrids: Search is more than Keywords and Vectors
- Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models
- How Much Do We Get by Finetuning CLIP?
If you find Jina Embeddings useful in your research, please cite the following paper:
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Support
- Use Discussions to talk about your use cases, questions, and support queries.
- Join our Discord community and chat with other community members about ideas.
- Join our Engineering All Hands meet-up to discuss your use case and learn about Jina AI's new features.
  - When? The second Tuesday of every month
  - Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel
Join Us
Finetuner is backed by Jina AI and licensed under Apache-2.0.
We are actively hiring AI engineers and solution engineers to build the next generation of open-source AI ecosystems.