Gemini
- Testing a poison-pill logic puzzle on a local SLM and online LLMs reveals critical differences in reasoning integrity, from variable leakage in Qwen3-4B to helpful lying in Gemini Flash.
- Do LLMs solve unsolvable puzzles or flag the contradiction? Testing ChatGPT, Gemini, KIMI, and others with an impossible logic puzzle reveals whether models prioritize helpfulness over truth.
- Four diagnostic prompts that reveal how AI models handle contradictions, impossible geometry, temporal paradoxes, and infinite sets, tested across ChatGPT, Gemini, KIMI, Cerebras, and more.
- What happens when you prompt an LLM with invented history? Testing ChatGPT, Gemini, Cerebras Inference, and KIMI with a fabricated historical prompt reveals how each model handles fictional facts.