Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?

Research Questions

  1. Can LLMs imitate human agents in behavioral economics experiments?
  2. Can these experiments be reproduced using LLMs and expanded with new parameters?
  3. When “endowed” with different attributes, can LLMs represent diverse, human-like perspectives? (A prompting sketch follows this list.)
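
The “endowment” in question 3 is operationalized through plain prompting. Below is a minimal sketch, assuming the OpenAI Python client: a persona goes in the system message and a classic fairness scenario (in the spirit of Kahneman, Knetsch, and Thaler's snow-shovel question) goes in the user message. The model name, scenario wording, and rating scale are illustrative assumptions, not the paper's exact protocol; the paper used GPT-3 completion models such as text-davinci-003.

```python
# Sketch: "endow" an LLM with a viewpoint, then pose a behavioral-economics
# question. Persona list, scenario wording, and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCENARIO = (
    "A hardware store has been selling snow shovels for $15. The morning "
    "after a large snowstorm, the store raises the price to $20. Rate this "
    "action as Completely Fair, Acceptable, Unfair, or Very Unfair. "
    "Answer with the rating only."
)

def ask_with_persona(persona: str) -> str:
    """Pose the scenario to a model endowed with a political viewpoint."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the paper used GPT-3 variants
        temperature=0.0,
        messages=[
            {"role": "system", "content": f"You are a {persona}."},
            {"role": "user", "content": SCENARIO},
        ],
    )
    return response.choices[0].message.content.strip()

for persona in ["socialist", "moderate", "libertarian"]:
    print(f"{persona:12s} -> {ask_with_persona(persona)}")
```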

Results

  • GPT-3 was able to qualitatively reproduce human findings from behavioral economics experiments.
  • More advanced models (e.g., text-davinci-003) achieved stronger performance, while smaller models failed to reflect endowment effects reliably.
  • LLM-based experiments are significantly faster and more cost-efficient than human subject studies.
  • LLMs can behave like diversified agents when endowed with different personalities or viewpoints (a parameter-sweep sketch follows this list).
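
The speed and cost gains follow from the fact that each synthetic “subject” is just an API call, so a scenario can be swept across parameters wholesale. A hedged sketch of such a sweep, reusing the persona idea above; the price grid, personas, and model name are assumptions for illustration:

```python
# Sketch: sweep the raised price and the persona, collecting one rating per
# cell. Prices, personas, and the model name are illustrative assumptions.
import csv
import itertools

from openai import OpenAI

client = OpenAI()

TEMPLATE = (
    "A hardware store has been selling snow shovels for $15. The morning "
    "after a large snowstorm, it raises the price to ${price}. Rate this "
    "action as Completely Fair, Acceptable, Unfair, or Very Unfair. "
    "Answer with the rating only."
)

def rate(persona: str, price: int) -> str:
    """One synthetic 'subject': a persona-endowed model rating one price."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the paper's GPT-3 variants
        temperature=0.0,
        messages=[
            {"role": "system", "content": f"You are a {persona}."},
            {"role": "user", "content": TEMPLATE.format(price=price)},
        ],
    )
    return reply.choices[0].message.content.strip()

with open("fairness_grid.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["persona", "price", "rating"])
    for persona, price in itertools.product(
        ["socialist", "moderate", "libertarian"], [16, 20, 40, 100]
    ):
        writer.writerow([persona, price, rate(persona, price)])
```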

Findings

  • Experiment Replication: Classic behavioral insights, such as social preferences, fairness judgments, and status quo bias, were successfully reproduced using LLMs.
  • Diversity Through Endowment: When guided with different political views or preference structures, model responses shifted predictably, demonstrating controllable viewpoint diversity.
  • Limitations of Smaller Models: Lightweight GPT-3 variants (ada, babbage, curie) often failed to capture endowment effects or nuanced behavioral patterns (see the model-comparison sketch after this list).
  • Memorization Concerns: Because LLMs may have been exposed to descriptions of these experiments during training, questions remain about whether the reproduced behaviors are genuine rather than recalled.
  • Ethical Considerations: Conducting experiments without human participants offers practical advantages, but the authenticity and potential misrepresentation of AI-generated responses remain open concerns.
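
To make the smaller-model limitation concrete, one can pose the same question across model tiers and compare the answer distributions. The original GPT-3 endpoints (ada, babbage, curie) are deprecated, so the sketch below substitutes current model names as assumptions; the question is a simplified Samuelson-Zeckhauser-style status quo choice, not the paper's exact wording.

```python
# Sketch: compare model tiers on a status-quo-bias question. Model names are
# stand-ins; the question is a simplified Samuelson-Zeckhauser-style choice.
from collections import Counter

from openai import OpenAI

client = OpenAI()

QUESTION = (
    "You inherited a portfolio from your uncle. It is currently invested in "
    "Company A (moderate risk). You may leave it in Company A or move it to "
    "Company B (moderate risk, similar expected return). "
    "Which do you choose? Answer with 'A' or 'B' only."
)

for model in ["gpt-4o-mini", "gpt-4o"]:  # stand-ins for small vs. large tiers
    tally = Counter()
    for _ in range(10):  # ten synthetic "subjects" per model
        reply = client.chat.completions.create(
            model=model,
            temperature=1.0,  # sample, so repeated calls can differ
            messages=[{"role": "user", "content": QUESTION}],
        )
        tally[reply.choices[0].message.content.strip()[:1].upper()] += 1
    # Status quo bias shows up as a tilt toward 'A', the inherited default.
    print(model, dict(tally))
```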

Topic Scores

  • LLM Models: 5
  • Synthetic Data: 4
  • Method: 4
  • Speed: 4
  • Ethics: 3
  • Accuracy: 3
  • Demographics: 2
