Repository Details
A nimble and innovative implementation of the Direct Preference Optimization (DPO) algorithm with Causal Transformer and LSTM models, inspired by the DPO paper on fine-tuning unsupervised language models from preference data.
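For orientation, the core of DPO is a single loss over preference pairs; a minimal sketch of that loss for one (chosen, rejected) pair is shown below. The function name and argument names are illustrative, not from this repository; the inputs are assumed to be summed log-probabilities of each response under the trainable policy and a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy (pi_*) or frozen reference (ref_*) model.
    beta controls how strongly the policy is pulled toward the preference.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: shrinks as the policy learns
    # to prefer the chosen response more than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; pushing the policy toward the chosen response drives the loss below that.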