A benchmark for foundation models on behavioral-science tasks, evaluated at the individual and distributional levels.
reasoning_effort: the reasoning-effort level used for evaluation. It applies only to reasoning models.
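The rule above can be illustrated with a minimal sketch. It assumes a hypothetical helper, build_eval_settings, that assembles per-model evaluation settings. The caller supplies an is_reasoning_model flag, and reasoning_effort is kept only when that flag is true. The helper name, the flag, and the example value "medium" are illustrative assumptions. They are not taken from the BehaviorBench codebase.

```python
from typing import Optional


def build_eval_settings(model: str, is_reasoning_model: bool,
                        reasoning_effort: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble evaluation settings for one model.

    reasoning_effort is kept only for reasoning models; for any other
    model it is silently dropped, since the setting does not apply.
    """
    settings = {"model": model}
    # Only reasoning models receive the reasoning_effort setting.
    if is_reasoning_model and reasoning_effort is not None:
        settings["reasoning_effort"] = reasoning_effort
    return settings


if __name__ == "__main__":
    # A reasoning model keeps the setting; a non-reasoning model does not.
    print(build_eval_settings("model-a", True, "medium"))
    print(build_eval_settings("model-b", False, "medium"))
```

Dropping the value, rather than raising an error, lets a single shared configuration be reused across a mixed list of models.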
If you use BehaviorBench or BeFM in your work, please consider citing:
@misc{behaviorbench2026,
title = {{BehaviorBench}: Benchmarking Foundation Models for Behavioral Science Tasks},
author = {Huang, Jin* and Xie, Yutong* and Song, Wanli and Zhang, Xingjian and Yuan, Walter and Jackson, Matthew O. and Mei, Qiaozhu},
year = {2026},
note = {Preprint coming soon}
}