The repository provides code for training the models described in this report.
All experiments were conducted on a Google Colab instance with an L4 GPU. The command below installs the required dependencies:

```shell
! pip install wandb xformers trl peft accelerate bitsandbytes flash-attn evaluate timeout-decorator git+https://github.com/google-research/bleurt.git
```

For reproducibility in the future:
- the Python version used in these experiments is 3.10.12
- `requirements.txt` is also provided, which is the output of `pip freeze` on the same type of instance
Here's a W&B dashboard with all the experiment logs.
Below are some highlighted results:
When a small value of the score weight lambda is used, the proposed alignment procedure improves calibration on the held-out data.
When only the oracle score is used as a reward, the proposed alignment procedure improves the oracle score on the held-out data.
When a model generates an incorrect answer and is then prompted with the ground-truth answer, its confidence in that answer is, on average, higher for models whose alignment procedure included a calibration component.
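The exact calibration metric used in the report is not restated here; as an illustration of what "improving calibration" can mean, Expected Calibration Error (ECE) is one standard choice. This is a minimal sketch, not the repository's implementation: it bins predictions by confidence and averages the per-bin gap between confidence and accuracy.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins.

    confidences: per-example confidence in [0, 1]
    correct:     per-example 0/1 correctness indicator
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # assign each prediction to a half-open bin (lo, hi]
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += gap * mask.mean()  # weight by the fraction of examples in the bin
    return ece
```

A perfectly calibrated model (e.g. 80% confidence, 80% accuracy) scores 0, while an overconfident model that is always wrong at 90% confidence scores 0.9.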
If you find this repository useful, please cite it as:
```bibtex
@misc{arozhkov2024llmcalib,
  author       = {Aleksei Rozhkov},
  title        = {Can AI Call its Own Bluffs?},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/alexisrozhkov/llm_calib}}
}
```