PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: AI Assistant

  • Composer: Building a Fast Frontier Model with Reinforcement Learning

    Composer represents Cursor’s most ambitious step yet toward a new generation of intelligent, high-speed coding agents. Built through deep reinforcement learning (RL) and large-scale infrastructure, Composer delivers frontier-level results at speeds up to four times faster than comparable models. It isn’t just another large language model; it’s an actively trained software engineering assistant optimized to think, plan, and code with precision — in real time.

    From Cheetah to Composer: The Evolution of Speed

    The origins of Composer go back to an experimental prototype called Cheetah, an agent Cursor developed to study how much faster coding models could get before hitting usability limits. Developers consistently preferred the speed and fluidity of an agent that responded instantly, keeping them “in flow.” Cheetah proved the concept, but it was Composer that matured it — integrating reinforcement learning and mixture-of-experts (MoE) architecture to achieve both speed and intelligence.

    Composer’s training goal was simple but demanding: make the model capable of solving real-world programming challenges in real codebases using actual developer tools. During RL, Composer was given tasks like editing files, running terminal commands, performing semantic searches, or refactoring code. Its objective wasn’t just to get the right answer — it was to work efficiently, using minimal steps, adhering to existing abstractions, and maintaining code quality.
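
    Cursor has not published Composer’s actual reward function, but the objective described above can be sketched as a toy scoring rule that combines the signals the post names: correctness, step efficiency, and code quality. Everything here (the weights, the signal names) is illustrative, not Cursor’s implementation.

```python
# Illustrative sketch only: a toy RL episode reward combining the signals
# described in the post -- correctness, step efficiency, and code quality.
# Weights and signal names are invented for illustration.

def episode_reward(tests_passed: bool, steps_taken: int, max_steps: int,
                   lint_errors: int) -> float:
    """Score one episode of an agent working in a codebase."""
    if not tests_passed:
        return 0.0                                  # no credit for wrong answers
    reward = 1.0
    reward -= 0.5 * (steps_taken / max_steps)       # prefer fewer tool calls
    reward -= 0.1 * lint_errors                     # penalize sloppy edits
    return max(reward, 0.0)
```

    A reward shaped like this pushes the policy toward solutions that are not only correct but cheap to produce, which is the behavior the post attributes to Composer.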

    Training on Real Engineering Environments

    Rather than relying on synthetic datasets or static benchmarks, Cursor trained Composer within a dynamic software environment. Every RL episode simulated an authentic engineering workflow — debugging, writing unit tests, applying linter fixes, and performing large-scale refactors. Over time, Composer developed behaviors that mirror an experienced developer’s workflow. It learned when to open a file, when to search globally, and when to execute a command rather than speculate.

    Cursor’s evaluation framework, Cursor Bench, measures progress by realism rather than abstract metrics. It compiles actual agent requests from engineers and compares Composer’s solutions to human-curated optimal responses. This lets Cursor measure not just correctness, but also how well the model respects a team’s architecture, naming conventions, and software practices — metrics that matter in production environments.
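
    The internals of Cursor Bench are not public, but the idea of grading a solution on correctness and convention adherence can be illustrated with a hypothetical checker. The snake_case rule below is a stand-in for whatever team conventions the real harness measures.

```python
# Hypothetical sketch of a Cursor Bench-style check (the real harness is not
# public): grade an agent's code on correctness plus convention adherence.
import re

def follows_naming_convention(code: str) -> bool:
    """Toy convention check: every defined function name is snake_case."""
    names = re.findall(r"def (\w+)\(", code)
    return all(re.fullmatch(r"[a-z_][a-z0-9_]*", n) for n in names)

def score_solution(agent_code: str, tests_pass: bool) -> dict:
    """Combine a correctness signal with a style signal, as the post describes."""
    return {
        "correct": tests_pass,
        "matches_conventions": follows_naming_convention(agent_code),
    }
```

    Splitting the score this way lets an evaluator flag solutions that work but would not survive code review, which is the gap Cursor Bench is described as measuring.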

    Reinforcement Learning as a Performance Engine

    Reinforcement learning is at the heart of Composer’s performance. Unlike supervised fine-tuning, which simply mimics examples, RL rewards Composer for producing high-quality, efficient, and contextually relevant work. It actively learns to choose the right tools, minimize unnecessary output, and exploit parallelism across tasks. The model was even rewarded for avoiding unsupported claims — pushing it to generate more verifiable and responsible code suggestions.

    As RL progressed, emergent behaviors appeared. Composer began autonomously running semantic searches to explore codebases, fixing linter errors, and even generating and executing tests to validate its own work. These self-taught habits transformed it from a passive text generator into an active agent capable of iterative reasoning.
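
    The behaviors described above (search, edit, run tests, iterate) follow the shape of a standard tool-use loop. The sketch below is a minimal, generic version of such a loop, not Cursor’s implementation; the policy and tools are stubs.

```python
# Minimal agent-loop sketch (generic, not Cursor's code): a policy picks a
# tool, observes the result, and iterates until its own tests pass or the
# step budget runs out.

def run_agent(choose_action, tools: dict, max_steps: int = 10) -> bool:
    """choose_action maps an observation to a (tool_name, argument) pair."""
    observation = "start"
    for _ in range(max_steps):
        tool_name, arg = choose_action(observation)
        observation = tools[tool_name](arg)      # execute the chosen tool
        if tool_name == "run_tests" and observation == "all tests passed":
            return True                          # agent validated its own work
    return False                                 # budget exhausted
```

    In training, the step budget is exactly what a reward on efficiency acts against: agents that validate their work in fewer iterations score higher.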

    Infrastructure at Scale: Thousands of Sandboxed Agents

    Behind Composer’s intelligence is a massive engineering effort. Training large MoE models efficiently requires significant parallelization and precision management. Cursor’s infrastructure, built with PyTorch and Ray, powers asynchronous RL at scale. Their system supports thousands of simultaneous environments, each a sandboxed virtual workspace where Composer experiments safely with file edits, code execution, and search queries.

    To achieve this scale, the team integrated MXFP8 MoE kernels with expert parallelism and hybrid-sharded data parallelism. This setup allows distributed training across thousands of NVIDIA GPUs with minimal communication cost — effectively combining speed, scale, and precision. Because the model trains natively in MXFP8, it also runs faster at inference with no post-training quantization step, giving developers real-world performance gains immediately.
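
    The core idea behind microscaling formats like MXFP8 is that each small block of values shares one scale factor, so low-precision elements can still cover a wide dynamic range. The toy sketch below shows block scaling with integer elements rather than real FP8 encodings with power-of-two exponents, purely to illustrate the mechanism.

```python
# Toy illustration of block-scaled ("microscaling") quantization, the idea
# behind formats like MXFP8. Real MXFP8 stores FP8 elements with a shared
# power-of-two exponent per block; this sketch rounds to small integers
# under a shared scale just to show the mechanism.

def quantize_block(values, levels: int = 127):
    """Map a block of floats to small integers plus one shared scale."""
    scale = max(abs(v) for v in values) / levels or 1.0   # 1.0 for all-zero blocks
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [x * scale for x in q]
```

    Because the scale is stored once per block rather than once per value, storage stays small while the quantization error stays proportional to the block’s largest magnitude.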

    Cursor’s infrastructure can spawn hundreds of thousands of concurrent sandboxed coding environments. This capability, adapted from their Background Agents system, was essential to unify RL experiments with production-grade conditions. It ensures that Composer’s training environment matches the complexity of real-world coding, creating a model genuinely optimized for developer workflows.

    The Cursor Bench and What “Frontier” Means

    Composer’s benchmark performance earned it a place in what Cursor calls the “Fast Frontier” class — models designed for efficient inference while maintaining top-tier quality. This group includes systems like Haiku 4.5 and Gemini Flash 2.5. While GPT-5 and Sonnet 4.5 remain the strongest overall, Composer outperforms nearly every open-weight model, including Qwen Coder and GLM 4.6. In tokens-per-second performance, Composer’s throughput is among the highest ever measured under the standardized Anthropic tokenizer.

    Built by Developers, for Developers

    Composer isn’t just research — it’s in daily use inside Cursor. Engineers rely on it for their own development, using it to edit code, manage large repositories, and explore unfamiliar projects. This internal dogfooding loop means Composer is constantly tested and improved in real production contexts. Its success is measured by one thing: whether it helps developers get more done, faster, and with fewer interruptions.

    Cursor’s goal isn’t to replace developers, but to enhance them — providing an assistant that acts as an extension of their workflow. By combining fast inference, contextual understanding, and reinforcement learning, Composer turns AI from a static completion tool into a real collaborator.

    Wrap Up

    Composer represents a milestone in AI-assisted software engineering. It demonstrates that reinforcement learning, when applied at scale with the right infrastructure and metrics, can produce agents that are not only faster but also more disciplined, efficient, and trustworthy. For developers, it’s a step toward a future where coding feels as seamless and interactive as conversation — powered by an agent that truly understands how to build software.

  • High Agency: The Founder Superpower You Can Actually Train

    TL;DW

    High agency—the habit of turning every constraint into a launch‑pad—is the single most valuable learned skill a founder can cultivate. In Episode 703 of My First Million (May 5, 2025), Sam Parr and Shaan Puri interview marketer and writer George Mack, who distills five years of research into the “high agency” playbook and shows how it powers billion‑dollar outcomes, from sniping the domain HighAgency.com at an expiring‑domain auction to Nick Mowbray’s bootstrapped toy empire.


    Key Takeaways

    1. High agency defined: Act on the question “Does it break the laws of physics?”—if not, go and do it.
    2. Domain‑name coup: Mack monitored an expiring URL, sniped HighAgency.com for pocket change, and lit up Times Square to launch it.
    3. Nick Mowbray case study: Door‑to‑door sales → built a shed‑factory in China → $1 B annual profit—proof that resourcefulness beats resources.
    4. Agency > genetics: Environment (US optimism vs. UK reserve) explains output gaps more than raw talent.
    5. Frameworks that build agency: Turning‑into‑Reality lists, Death‑Bed Razor, speed‑bar “time attacks,” negative‑visualization “hardship as a service.”
    6. Dance > Prozac: A 2025 meta‑analysis ranks dance therapy above exercise and SSRIs for lifting depression—high agency for mental health.
    7. LLMs multiply agency: Prompt‑driven “vibe‑coding” lets non‑technical founders ship software in hours.
    8. Teenage obsessions predict adult success: Ask hires what they could teach for an hour unprompted.
    9. Action test: “Who would you call to break you out of a third‑world jail?”—find and hire those people.
    10. Nation‑un‑schooling & hardship apps: Future opportunities lie in products that cure cultural limiting beliefs and simulate adversity on demand.

    The Most Valuable Learned Skill for Any Founder: High Agency

    Meta Description

    Discover why high agency—the relentless drive to turn every obstacle into leverage—is the ultimate competitive advantage for startup founders, plus practical tactics from My First Million Episode 703.

    1. What Exactly Is “High Agency”?

    High agency is the practiced refusal to wait for permission. It is Paul Graham’s “relentlessly resourceful” mindset, operationalized as everyday habit. If a problem doesn’t violate physics, a high‑agency founder assumes it’s solvable and sets a clock on the solution.

    2. George Mack’s High‑Agency Origin Story

    • The domain heist: Mack noticed HighAgency.com was lapsing after 20 years. He hired brokers, tracked the drop, and outbid only one rival—a cannabis ad shop—for near‑registrar pricing.
    • Times Square takeover: He cold‑emailed billboard owners, bartered favors, and flashed “High Agency Got Me This Billboard” to millions for the cost of a SaaS subscription.

    Outcome: 10,000+ in‑depth interactions (DMs and emails) from exactly the kind of people he wanted to reach.

    3. Extreme Examples That Redefine Possible

    • Nick Mowbray, ZURU Toys: moved to China at 18, built a DIY shed‑factory, and emailed every retail buyer daily until one cracked. Result: $1 B in annual profit and the fastest‑growing diaper and hair‑care lines.
    • Ed Thorp: invented a shoe computer to beat roulette, then created the first “quant” hedge fund. Result: became a market‑defining billionaire.
    • Sam Parr’s piano: a “24‑hour speed bar” in which he decided on, sourced, purchased, and took delivery of a grand piano within one day. Result: proof that timeframes are negotiable.

    4. Frameworks to Increase Your Agency

    4.1 Turning‑Into‑Reality (TIR)

    1. Write the value you want to embody (e.g., “high agency”).
    2. Brainstorm actions that visibly express that value.
    3. Execute the one that makes you giggle—it usually signals asymmetrical upside.

    4.2 The Death‑Bed Razor

    Visualize meeting your best‑possible self on your final day; ask what action today closes the gap. Instant priority filter.

    4.3 Break Your Speed Bar

    Pick a task you assume takes weeks; finish it in 24 hours. The nervous‑system shock recalibrates every future estimate.

    4.4 Hardship‑as‑a‑Service

    Daily negative‑visualization apps (e.g., “wake up in a WW2 trench”) create gratitude and resilience on demand—an untapped billion‑dollar SaaS niche.

    5. Why Agency Compounds in the AI Era

    LLMs turn prompts into code, copy, and prototypes. That 10× execution leverage magnifies the delta between people who act and people who observe. As Mack jokes, “Everything is an agency issue now—algorithms included.”

    6. Building High‑Agency Culture in Your Startup

    • Hire for weird teenage hobbies. Obsession signals intrinsic drive.
    • Run “jail‑cell drills.” Ask employees for their jailbreak call list; encourage them to become that contact.
    • Reward depth, not vanity metrics. Track DMs, conversions, and retained users over impressions or views.
    • Institutionalize speed‑bars. Quarterly “48‑hour sprints” reset organizational pace.
    • Teach the agency question. Embed “Does this break physics?” in every project brief.

    7. Action Checklist for Founders

    • Audit your last 100 YouTube views; block sub‑30‑minute fluff.
    • Pick one “impossible” task—ship it inside a weekend.
    • Draft a TIR list tonight; execute the funniest idea by noon tomorrow.
    • Add a “Negative Visualization” minute to your stand‑ups.
    • Subscribe to HighAgency.com for the library of real‑world case studies.

    Wrap Up

    Markets change, technology shifts, capital cycles boom and bust—but high agency remains meta‑skill #1. Practice the frameworks above, hire for it, and your startup gains a moat no competitor can replicate.

  • You Won’t Believe What Gemini Can Do Now (Deep Research & 2.0 Flash)

    Google’s Gemini has just leveled up, and the results are mind-blowing. Forget everything you thought you knew about AI assistance: Deep Research and 2.0 Flash are here to transform how you research and interact with AI.

    Deep Research: Your Personal AI Research Powerhouse

    Tired of spending countless hours sifting through endless web pages for research? Deep Research is about to become your new best friend. This groundbreaking feature automates the entire research process, delivering comprehensive reports on even the most complex topics in minutes. Here’s how it works:

    1. Dive into Gemini: Head over to the Gemini interface (available on desktop and mobile web, with the mobile app joining the party in early 2025 for Gemini Advanced subscribers).
    2. Unlock Deep Research: Find the model drop-down menu and select “Gemini 1.5 Pro with Deep Research.” This activates the magic.
    3. Ask Your Burning Question: Type your research query into the prompt box. The more specific you are, the better the results. Think “the impact of AI on the future of work” instead of just “AI.”
    4. Approve the Plan (or Tweak It): Deep Research will generate a step-by-step research plan. Take a quick look; you can approve it as is or make any necessary adjustments.
    5. Watch the Magic Happen: Once you give the green light, Deep Research gets to work. It scours the web, gathers relevant information, and refines its search on the fly. It’s like having a super-smart research assistant working 24/7.
    6. Behold the Comprehensive Report: In just minutes, you’ll have a neatly organized report packed with key findings and links to the original sources. No more endless tabs or lost links!
    7. Export and Explore Further: Export the report to a Google Doc for easy sharing and editing. Want to dig deeper? Just ask Gemini follow-up questions.

    Imagine the Possibilities:

    • Market Domination: Get the edge on your competition with lightning-fast market analysis, competitor research, and location scouting.
    • Ace Your Studies: Conquer complex research papers, presentations, and projects with ease.
    • Supercharge Your Projects: Plan like a pro with comprehensive data and insights at your fingertips.

    Gemini 2.0 Flash: Experience AI at Warp Speed

    If you thought Gemini was fast before, prepare to be amazed. Gemini 2.0 Flash is an experimental model built for lightning-fast performance in chat interactions. Here’s how to experience the future:

    1. Find 2.0 Flash: Locate the model drop-down menu in the Gemini interface (desktop and mobile web).
    2. Select the Speed Demon: Choose “Gemini 2.0 Flash Experimental.”
    3. Engage at Light Speed: Start chatting with Gemini and experience the difference. It’s faster, more responsive, and more intuitive than ever before.

    A Few Things to Keep in Mind about 2.0 Flash:

    • It’s Still Experimental: Remember that 2.0 Flash is a work in progress. It might not always work perfectly, and some features might be temporarily unavailable.
    • Limited Compatibility: Not all Gemini features are currently compatible with 2.0 Flash.

    The Future is Here

    Deep Research and Gemini 2.0 Flash are not just incremental updates; they’re a paradigm shift in AI assistance. Deep Research empowers you to conduct research faster and more effectively than ever before, while 2.0 Flash offers a glimpse into the future of seamless, lightning-fast AI interactions. Get ready to be amazed.