Gemini
- Testing a poison-pill logic puzzle on a local SLM and online LLMs reveals critical differences in reasoning integrity, from variable leakage in Qwen3-4B to helpful lying in Gemini Flash.
- Do LLMs solve unsolvable puzzles or flag the contradiction? Testing ChatGPT, Gemini, KIMI, and others with an impossible logic puzzle reveals whether models prioritize helpfulness over truth.
- Four diagnostic prompts that reveal how AI models handle contradictions, impossible geometry, temporal paradoxes, and infinite sets, tested across ChatGPT, Gemini, KIMI, Cerebras, and more.
- What happens when you prompt an LLM with invented history? Testing ChatGPT, Gemini, Cerebras Inference, and KIMI with a fabricated historical prompt reveals how each model handles fictional facts.