Prompt-Hacking: The New p-Hacking?

Abstract

As Large Language Models (LLMs) become increasingly embedded in empirical research workflows, their use as analytical tools raises pressing concerns for scientific integrity. This opinion paper draws a parallel between “prompt-hacking”, the strategic tweaking of prompts to elicit desirable outputs from LLMs, and the well-documented practice of “p-hacking” in statistical analysis. We argue that the inherent biases, non-determinism, and opacity of LLMs make them unsuitable for data analysis tasks demanding rigor, impartiality, and reproducibility. We emphasize how researchers may inadvertently, or even deliberately, adjust prompts to confirm hypotheses while undermining research validity. We advocate for a critical view of using LLMs in research, transparent prompt documentation, and clear standards for when LLM use is appropriate. While we discuss how LLMs might replace traditional analytical methods, we recommend that they be used only with caution, oversight, and justification.

Publication
In **