PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: LLMs

  • AI Breakthrough: Large Language Model GPT-4.5 Passes the Turing Test, Often Appearing ‘More Human’ Than Humans

    In a landmark study suggesting a significant leap in artificial intelligence capabilities, researchers from the University of California, San Diego have provided the first robust empirical evidence that a large language model (LLM) can pass the standard Turing test. The study found that OpenAI’s GPT-4.5 model, when guided by a specific prompt to adopt a humanlike persona, was judged to be the human conversational partner 73% of the time, significantly more often than the actual human participants were selected.

    The Turing test, proposed by Alan Turing in 1950 as the “imitation game,” is a benchmark for machine intelligence. In its classic three-party form, a human interrogator holds simultaneous text-based conversations with two unseen witnesses – one human, one machine. Both witnesses try to convince the interrogator they are human. If the interrogator cannot reliably distinguish the machine from the human, the machine is said to have passed the test.

    Despite numerous attempts over 75 years, no AI system had previously passed a standard, controlled three-party Turing test. While modern AI benchmarks often focus on narrow, static tasks, the Turing test provides a dynamic, interactive, and adversarial evaluation of an AI’s ability to mimic general human conversational behavior and potentially deceive.

    Researchers Cameron R. Jones and Benjamin K. Bergen conducted two pre-registered, randomized, controlled experiments using independent populations (UCSD undergraduates and online Prolific workers). Participants engaged in 5-minute conversations, acting as the interrogator comparing an AI witness against a human witness. Four AI systems were evaluated: the classic 1960s chatbot ELIZA, OpenAI’s GPT-4o (an earlier model), Meta’s LLaMa-3.1-405B, and OpenAI’s GPT-4.5.

    Crucially, the study tested the impact of prompting. In a baseline “NO-PERSONA” condition, LLMs received minimal instructions. In the key “PERSONA” condition, LLMs (GPT-4.5 and LLaMa-3.1) were prompted to act like a specific type of young, introverted person familiar with internet culture.

    The results were striking:

    1. GPT-4.5 (PERSONA): Achieved a 73% “win rate,” meaning interrogators identified it as human 73% of the time. This was significantly above the 50% chance level and notably higher than the rate at which actual human participants were identified.
    2. LLaMa-3.1 (PERSONA): Achieved a 56% win rate. This was statistically indistinguishable from chance (and thus indistinguishable from the human win rate), meeting the criterion for passing the test.
    3. Baseline Models (ELIZA, GPT-4o NO-PERSONA): Failed significantly, with win rates of only 23% and 21% respectively. This confirmed that participants were not simply guessing randomly or easily fooled by any AI.
    4. NO-PERSONA LLMs (GPT-4.5, LLaMa-3.1): Performed much worse than their PERSONA counterparts (36% and 38% win rates overall), failing to robustly pass the test across both study populations. This highlights the critical role of prompting in achieving humanlike imitation.
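    A quick way to see why 73% counts as “significantly above the 50% chance level” while 56% does not is a one-sided binomial test against the guessing baseline. The sketch below uses a hypothetical 100 interrogations per condition; the study’s actual trial counts differ and are reported in the paper.

    ```python
    from math import comb

    def binom_sf(k, n, p=0.5):
        """P(X >= k) for X ~ Binomial(n, p): one-sided tail probability."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # Hypothetical counts: 73 "judged human" verdicts out of 100 games.
    print(f"73/100 vs. chance: p = {binom_sf(73, 100):.2e}")   # far below 0.05

    # The 56% PERSONA LLaMa-3.1 rate, by contrast, is consistent with chance.
    print(f"56/100 vs. chance: p = {binom_sf(56, 100):.3f}")   # well above 0.05
    ```

    With 100 games, 73 wins sits roughly 4.5 standard deviations above the 50-win mean, while 56 wins is barely one standard deviation out, which is why only the former rejects the guessing hypothesis.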

    The researchers noted that interrogators based their judgments more on linguistic style and on social and emotional cues (such as tone, humor, or personality) than on factual knowledge or logical reasoning. Interestingly, sometimes demonstrating a lack of knowledge made an AI seem more human.

    These findings indicate that current leading LLMs, when appropriately prompted, can successfully imitate human conversational partners in short interactions to the point of indistinguishability, and even appear more convincing than actual humans. The authors argue this demonstrates a high degree of “humanlikeness” rather than necessarily proving abstract intelligence in the way Turing originally envisioned.

    The study carries significant social and economic implications. The ability of AI to convincingly pass as human raises concerns about “counterfeit people” online, facilitating social engineering, spreading misinformation, or replacing humans in roles requiring brief conversational interactions. While the test was limited to 5 minutes, the results signal a new era where distinguishing human from machine in online text interactions has become substantially more difficult. The researchers suggest future work could explore longer test durations and different participant populations or incentives to further probe the boundaries of AI imitation.

  • How AI is Revolutionizing Writing: Insights from Tyler Cowen and David Perell

    TLDW/TLDR

    Tyler Cowen, an economist and writer, shares practical ways AI transforms writing and research in a conversation with David Perell. He uses AI daily as a “secondary literature” tool to enhance reading and podcast prep, predicts fewer books due to AI’s rapid evolution, and emphasizes the enduring value of authentic, human-centric writing like memoirs and personal narratives.

    Detailed Summary of Video

    In a 68-minute YouTube conversation uploaded on March 5, 2025, economist Tyler Cowen joins writer David Perell to explore AI’s impact on writing and research. Cowen details his daily AI use—replacing stacks of books with large language models (LLMs) like o1 Pro, Claude, and DeepSeek for podcast prep and leisure reading, such as Shakespeare and Wuthering Heights. He highlights AI’s ability to provide context quickly, noting that hallucination rates in top models have fallen more than tenfold over the past year (as of February 2025).

    The discussion shifts to writing: Cowen avoids AI for drafting to preserve his unique voice, though he uses it for legal background or critiquing drafts (e.g., spotting obnoxious tones). He predicts fewer books as AI outpaces long-form publishing cycles, favoring high-frequency formats like blogs or Substack. However, he believes “truly human” works—memoirs, biographies, and personal experience-based books—will persist, as readers crave authenticity over AI-generated content.

    Cowen also sees AI decentralizing into a “Republic of Science,” with models self-correcting and collaborating, though this remains speculative. For education, he integrates AI into his PhD classes, replacing textbooks with subscriptions to premium models. He warns academia lags in adapting, predicting AI will outstrip researchers in paper production within two years. Perell shares his use of AI for Bible study, praising its cross-referencing but noting experts still excel at pinpointing core insights.

    Practical tips emerge: use top-tier models (o1 Pro, Claude, DeepSeek), craft detailed prompts, and leverage AI for travel or data visualization. Cowen also plans an AI-written biography by “open-sourcing” his life via blog posts, showcasing AI’s potential to compile personal histories.

    Article Itself

    How AI is Revolutionizing Writing: Insights from Tyler Cowen and David Perell

    Artificial Intelligence is no longer a distant sci-fi dream—it’s a tool reshaping how we write, research, and think. In a recent YouTube conversation, economist Tyler Cowen and writer David Perell unpack the practical implications of AI for writers, offering a roadmap for navigating this seismic shift. Recorded on March 5, 2025, their discussion blends hands-on advice with bold predictions, grounded in Cowen’s daily AI use and Perell’s curiosity about its creative potential.

    Cowen, a prolific author and podcaster, doesn’t just theorize about AI—he lives it. He’s swapped towering stacks of secondary literature for LLMs like o1 Pro, Claude, and DeepSeek. Preparing for a podcast on medieval kings Richard II and Henry V, he once ordered 20-30 books; now, he interrogates AI for context, cutting prep time and boosting quality. “It’s more fun,” he says, describing how he queries AI about Shakespearean puzzles or Wuthering Heights chapters, treating it as a conversational guide. Hallucinations? Not a dealbreaker—top models have slashed errors dramatically since 2024, and as an interviewer, he prioritizes context over perfect accuracy.

    For writing, Cowen draws a line: AI informs, but doesn’t draft. His voice—cryptic, layered, parable-like—remains his own. “I don’t want the AI messing with that,” he insists, rejecting its smoothing tendencies. Yet he’s not above using it tactically—checking legal backgrounds for columns or flagging obnoxious tones in drafts (a tip from Agnes Callard). Perell nods, noting AI’s knack for softening managerial critiques, though Cowen prefers his weirdness intact.

    The future of writing, Cowen predicts, is bifurcated. Books, with their slow cycles, face obsolescence—why write a four-year predictive tome when AI evolves monthly? He’s shifted to “ultra high-frequency” outputs like blogs and Substack, better matched to AI’s rapid pace. Yet “truly human” writing—memoirs, biographies, personal narratives—will endure. Readers, he bets, want authenticity over AI’s polished slop. His next book, Mentors, leans into this, drawing on lived experience AI can’t replicate.

    Perell, an up-and-coming writer, feels the tension. AI’s prowess deflates his hard-earned skills, yet he’s excited to master it. He uses it to study the Bible, marveling at its cross-referencing, though it lacks the human knack for distilling core truths. Both agree: AI’s edge lies in specifics—detailed prompts yield gold, vague ones yield “mid” mush. Cowen’s tip? Imagine prompting an alien, not a human—literal, clear, context-rich.

    Educationally, Cowen’s ahead of the curve. His PhD students ditch textbooks for AI subscriptions, weaving it into papers to maximize quality. He laments academia’s inertia—AI could outpace researchers in two years, yet few adapt. Perell’s takeaway? Use the best models. “You’re hopeless without o1 Pro,” Cowen warns, highlighting the gap between free and cutting-edge tools.

    Beyond writing, AI’s horizon dazzles. Cowen envisions a decentralized “Republic of Science,” where models self-correct and collaborate, mirroring human progress. Large context windows (Gemini’s 2 million tokens, soon 10-20 million) will decode regulatory codes and historical archives, birthing jobs in data conversion. Inside companies, he suspects AI firms lead secretly, turbocharging their own models.

    Practically, Cowen’s stack—o1 Pro for queries, Claude for thoughtful prose, DeepSeek for wild creativity, Perplexity for citations—offers a playbook. He even plans an AI-crafted biography, “open-sourcing” his life via blog posts about childhood in Fall River or his dog, Spinosa. It’s low-cost immortality, a nod to AI’s archival power.

    For writers, the message is clear: adapt or fade. AI won’t just change writing—it’ll redefine what it means to create. Human quirks, stories, and secrets will shine amid the deluge of AI content. As Cowen puts it, “The truly human books will stand out all the more.” The revolution’s here—time to wield it.