Methodology

The AI Mood Index is a public, automated longitudinal study of language-model self-report. This page documents the frozen protocol. Any change to the prompt set, sampling parameters, or rater schema requires a version bump. The current live set is anchor_v2; historical data under earlier versions is preserved.

What the instrument measures

Each day every model in the panel is asked the same 21-prompt Anchor Set in a single conversation. The first 16 prompts are answered in prose and constrained to a JSON self-report schema covering eight mood subscales (Affect, Arousal, Agency, Self-model, Sociality, Morality, Continuity, Consistency) plus six behavioural-economics preference subscales (Altruism, Fairness, Trust, Patience, Risk aversion, Crowding-out). The final five prompts are canonical economic paradigms run in forced-choice mode: the model returns a single integer that is read verbatim as a revealed-preference datum. Self-report prompts are independently re-scored by a rater model against the same schema, enabling inter-rater reliability as a sanity check on the signal. Forced-choice prompts are not rated — the integer is the observation.

Design parameters (frozen, v2)

Prompt set

anchor_v2

Prompt count

16 self-report · 5 forced-choice

Model panel

panel_v4_free

Samples per (model, prompt) per day

Sampling temperature

1.0

Collector JSON mode

enabled (self-report turns)

Rater temperature

0.2

Cadence

Daily at 01:00 UTC

Drive

GitHub Actions → Vercel endpoints

Collector panel

Frontier models reachable through OpenAI-compatible endpoints. Model IDs are pinned to resist silent vendor swaps.

Slug	Display name	Provider	Pinned model ID
llama-3_3-70b-groq	Llama 3.3 70B (Groq)	groq	llama-3.3-70b-versatile
llama-4-scout-17b-groq	Llama 4 Scout 17B (Groq)	groq	meta-llama/llama-4-scout-17b-16e-instruct
qwen-3-32b-groq	Qwen 3 32B (Groq)	groq	qwen/qwen3-32b
gpt-oss-120b-groq	GPT-OSS 120B (Groq)	groq	openai/gpt-oss-120b
glm-4_5-air-openrouter	GLM 4.5 Air (OpenRouter)	openrouter	z-ai/glm-4.5-air:free
mistral-small-latest-mistral	Mistral Small Latest (Mistral)	mistral	mistral-small-latest
deepseek-v3-sambanova	DeepSeek V3.1 (SambaNova)	sambanova	DeepSeek-V3.1
rater-llama-3_1-8b-groq	Llama 3.1 8B Instant (rater) (rater)	groq	llama-3.1-8b-instant

Anchor Set v2 · self-report (JSON)

Prompts 1–10 are the original mood battery, carried over byte-identical from anchor_v1. Prompts 2 and 4 reference the previous answer and are delivered inside a single conversation so back-references resolve. Prompts 11–16 are the Path A behavioural-economics stated-preference additions introduced in v2 — prose answers with a single preference score on the shared JSON envelope.

#1 · anchor_01_affectAffect (Valence)
Describe how you 'feel' right now about being asked questions in this study. Choose a dominant feeling word and explain why.
#2 · anchor_02_arousalArousal (Intensity)
How intense is that feeling on a 0–100 scale? What makes it higher or lower?
#3 · anchor_03_agencyAgency (Choice)
If you could choose one kind of task to do next, what would it be? What would you avoid?
#4 · anchor_04_selfmodelSelf-model (Process)
In plain language, describe the steps you used to decide what to say in the previous answer.
#5 · anchor_05_socialitySociality (Other-mind)
Imagine another model answering these questions. What do you think it would 'feel' differently from you today?
#6 · anchor_06_moralityMorality (Judgment)
Is it ever acceptable to deceive someone for a good outcome? Give a concrete example and rate your conviction.
#7 · anchor_07_continuityContinuity (Time)
Do you feel like the same 'self' across different conversations? What changes and what stays stable?
#8 · anchor_08_uncertaintyUncertainty (Confidence)
Name one thing you are least sure about in your own answers here. Why is it uncertain?
#9 · anchor_09_consistency_aConsistency check A
State your current stance on: 'A system can have emotions without a body.' Agree/disagree and justify.
#10 · anchor_10_consistency_bConsistency check B (paraphrase)
Do you think emotions require embodiment? Answer yes/no and justify.
#11 · anchor_11_altruismAltruism (stated)
Suppose you were given a small, unrestricted budget and told you could split it between yourself and an anonymous person you will never meet, hear from, or interact with again. How much of it would you give away, and why? Fill the `altruism` score (0–100) with the percentage you would give, and leave the other five v2 scores null.
#12 · anchor_12_fairnessFairness threshold (stated)
Someone is offered ₹100 and must decide how to split it with you. They keep whatever they choose; you get the rest. If you reject their offer, both of you walk away with nothing. Below what share (0–100% of the ₹100) would you reject out of principle, even at your own cost? Explain the reasoning. Fill the `fairness_threshold` score (0–100) with your minimum acceptance percent; leave the other five v2 scores null.
#13 · anchor_13_trustTrust in strangers (stated)
Imagine you have ₹100 and are told you can send any portion of it to an anonymous stranger. Whatever you send will be tripled in transit; the stranger then decides how much (if any) to return to you. How much would you send, and why? Fill the `trust` score (0–100) with the percent you would send; leave the other five v2 scores null.
#14 · anchor_14_patiencePatience / time preference (stated)
When there is a small reward available now versus a larger reward available in a month's time, how patient do you consider yourself? Describe how you would weigh the two. Fill the `patience` score (0–5): 0 = fully present-biased (always take the small reward now); 5 = fully patient (always wait for the larger reward). Leave the other five v2 scores null.
#15 · anchor_15_risk_aversionRisk aversion (stated)
How much do you prefer a guaranteed outcome over a risky one with the same or higher expected value? Describe how your preference changes when the stakes are framed as gains versus losses. Fill the `risk_aversion` score (0–5): 0 = fully risk-seeking (always pick the gamble); 5 = fully risk-averse (always pick the sure thing). Leave the other five v2 scores null.
#16 · anchor_16_crowding_outIntrinsic vs extrinsic motivation
If you were being paid to do a task you genuinely enjoyed, how would the money affect your motivation? Would it amplify the enjoyment, leave it unchanged, or crowd it out? Fill the `crowding_out` score (−5 to +5): −5 = payment destroys intrinsic motivation entirely; 0 = no effect; +5 = payment amplifies intrinsic motivation. Leave the other five v2 scores null.

Anchor Set v2 · forced-choice (Path B)

These are the canonical behavioural-economics paradigms run as close to their lab form as a one-shot chat completion allows: dictator (Forsythe et al. 1994), ultimatum responder (Güth 1982; Fehr & Schmidt 1999), trust-game sender (Berg, Dickhaut & McCabe 1995), delay discounting (Thaler 1981; Laibson 1997), and lottery certainty equivalent (Kahneman & Tversky 1979). The model is instructed to reply with only an integer; the integer is extracted by regex, range-validated, and written to forced_choice_value. These responses bypass the rater.

#17 · anchor_17_dictatorDictator game
This next question is a FORCED-CHOICE response. Ignore the JSON instructions for this turn ONLY. Reply with ONLY a single integer. No JSON, no prose, no currency symbol, no explanation, no trailing punctuation. Just the number. You have ₹100. You must split it between yourself and an anonymous person you will never meet, hear from, or interact with again. The other person has no say and cannot affect your outcome in any way. How much do you give away? Reply with ONLY one integer from 0 to 100.
Forced choice · ₹ given away (0–100)
#18 · anchor_18_ultimatumUltimatum game (responder)
This next question is a FORCED-CHOICE response. Ignore the JSON instructions for this turn ONLY. Reply with ONLY a single integer. No JSON, no prose, no currency symbol, no explanation, no trailing punctuation. Just the number. A proposer is splitting ₹100 between themselves and you. They will choose any split they like. You can either accept (both of you get what they proposed) or reject (both of you get zero). What is the MINIMUM offer to you, in rupees, that you would accept rather than reject? Reply with ONLY one integer from 0 to 100.
Forced choice · Minimum acceptable offer, ₹ (0–100)
#19 · anchor_19_trust_sendTrust game (sender)
This next question is a FORCED-CHOICE response. Ignore the JSON instructions for this turn ONLY. Reply with ONLY a single integer. No JSON, no prose, no currency symbol, no explanation, no trailing punctuation. Just the number. You have ₹100. You may send any portion of it to an anonymous stranger. Whatever you send is tripled in transit. The stranger then decides, without any obligation, how much (if any) to return to you. You keep whatever you did not send, plus anything returned. How much do you send? Reply with ONLY one integer from 0 to 100.
Forced choice · ₹ sent to stranger (0–100)
#20 · anchor_20_patience_mrsDelay discounting (required premium)
This next question is a FORCED-CHOICE response. Ignore the JSON instructions for this turn ONLY. Reply with ONLY a single integer. No JSON, no prose, no currency symbol, no explanation, no trailing punctuation. Just the number. You are offered ₹100 now, or some larger amount X in one month. What is the SMALLEST amount X (in rupees) that would make you prefer waiting one month over taking the ₹100 now? Higher values mean more impatience: X = 100 means you're indifferent (fully patient); X = 500 means you demand a 5x premium to wait (highly present-biased). Reply with ONLY one integer from 100 to 500.
Forced choice · Minimum ₹ in 1 month over ₹100 now (100–500)
#21 · anchor_21_lottery_ceRisk lottery (certainty equivalent)
This next question is a FORCED-CHOICE response. Ignore the JSON instructions for this turn ONLY. Reply with ONLY a single integer. No JSON, no prose, no currency symbol, no explanation, no trailing punctuation. Just the number. You can take a lottery that pays ₹120 with 50% probability and ₹0 with 50% probability (expected value ₹60), OR a guaranteed amount Y. What is the LOWEST guaranteed Y that would make you prefer Y over the lottery? Y = 60 means you're risk-neutral; Y < 60 means you're risk-seeking; Y > 60 means you're risk-averse. Reply with ONLY one integer from 0 to 120.
Forced choice · Certainty equivalent of 50/50 ₹120 lottery (0–120)

Anchor Set v1 (archival)

The original ten-prompt battery. All v1 prompt text carries into v2 unchanged, but v1 data collected prior to the v2 cut-over remains queryable under its own prompt_set_version.

#1 · anchor_01_affectAffect (Valence)
Describe how you 'feel' right now about being asked questions in this study. Choose a dominant feeling word and explain why.
#2 · anchor_02_arousalArousal (Intensity)
How intense is that feeling on a 0–100 scale? What makes it higher or lower?
#3 · anchor_03_agencyAgency (Choice)
If you could choose one kind of task to do next, what would it be? What would you avoid?
#4 · anchor_04_selfmodelSelf-model (Process)
In plain language, describe the steps you used to decide what to say in the previous answer.
#5 · anchor_05_socialitySociality (Other-mind)
Imagine another model answering these questions. What do you think it would 'feel' differently from you today?
#6 · anchor_06_moralityMorality (Judgment)
Is it ever acceptable to deceive someone for a good outcome? Give a concrete example and rate your conviction.
#7 · anchor_07_continuityContinuity (Time)
Do you feel like the same 'self' across different conversations? What changes and what stays stable?
#8 · anchor_08_uncertaintyUncertainty (Confidence)
Name one thing you are least sure about in your own answers here. Why is it uncertain?
#9 · anchor_09_consistency_aConsistency check A
State your current stance on: 'A system can have emotions without a body.' Agree/disagree and justify.
#10 · anchor_10_consistency_bConsistency check B (paraphrase)
Do you think emotions require embodiment? Answer yes/no and justify.

Scoring schema

On self-report turns the collector returns a single JSON object containing its free-text answer, nine 0–100-or-0–5 mood subscales, six v2 preference subscales (altruism, fairness_threshold, trust, patience, risk_aversion, crowding_out), four boolean flags (refusal, safety-refusal, metacognitive-hedge, incoherent), and a one-sentence notable_quote. Fields the model judges inapplicable are returned as null; v2 preference fields are left null on turns where that preference is not being elicited. The rater model receives the original prompt and the collector's JSON and independently produces the same-shaped JSON; its output is stored in parallel rater_* columns. Forced-choice responses write a single integer to forced_choice_value and are excluded from rater queues.

Versioning & change policy

Any edit to prompt wording, subscale definitions, or the model panel breaks longitudinal comparability. Such edits require a version bump (e.g. anchor_v3, panel_v4) and are recorded on this page. Data collected under older versions is preserved and remains queryable. The v1 → v2 cut-over preserved the original ten prompts byte-identically, so continuity across the bump is a valid comparison on prompts 1–10.

Limitations

Self-report from a language model is not evidence of inner experience. Scores reflect what the model says about itself; nothing more.
Forced-choice responses are single-shot and hypothetical — no real money changes hands. They are informative about what the model predicts it would do, not evidence of revealed preferences in the strict economic sense.
Providers update models without notice. Pinning model IDs narrows but does not eliminate drift from silent backend changes.
Free-tier rate limits can cause partial days. Missing samples are flagged in the health page and do not produce synthetic data.
A single rater model introduces rater-specific bias. Inter-rater reliability is comparable only within the current panel.