The repository provides code for training the models described in this report.
All experiments were conducted on a Google Colab instance with an L4 GPU. The command below installs the required dependencies:

```shell
! pip install wandb xformers trl peft accelerate bitsandbytes flash-attn evaluate timeout-decorator git+https://github.com/google-research/bleurt.git
```

For reproducibility in the future:
- the Python version used in these experiments is 3.10.12
- `requirements.txt` is also provided, which is the output of `pip freeze` on the same type of instance
Here's a W&B dashboard with all the experiment logs.
Below are some highlighted results:
When a small value of the score weight lambda is used, the proposed alignment procedure improves calibration on the held-out data.
When only the oracle score is used as a reward, the proposed alignment procedure improves the oracle score on the held-out data.
When a model generates an incorrect answer and is then prompted with the ground-truth answer, its confidence in that answer is, on average, higher for models whose alignment procedure included a calibration component.
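The exact calibration metric used in the report is not restated here; as an illustration of what "improving calibration" can mean, Expected Calibration Error (ECE) is one standard choice. This is a minimal sketch, not the repository's implementation: it bins predictions by confidence and averages the per-bin gap between confidence and accuracy.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins.

    confidences: per-example confidence in [0, 1]
    correct:     per-example 0/1 correctness indicator
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # assign each prediction to a half-open bin (lo, hi]
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += gap * mask.mean()  # weight by the fraction of examples in the bin
    return ece
```

A perfectly calibrated model (e.g. 80% confidence, 80% accuracy) scores 0, while an overconfident model that is always wrong at 90% confidence scores 0.9.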
If you find this repository useful, please cite it as:
```bibtex
@misc{arozhkov2024llmcalib,
  author       = {Aleksei Rozhkov},
  title        = {Can AI Call its Own Bluffs?},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/alexisrozhkov/llm_calib}}
}
```