319

---
layout: redirect
title: "Last Week in AI #319"
excerpt: "GPT-5 drops with 🤖 thinking modes and bargain pricing, AI billions hit a productivity paradox 📉, Anthropic courts all branches of gov for $1 🏛️, and CS grads chase non-tech gigs as entry-level roles shrink 🍔, and more!"
image:
  feature: assets/img/digests/319/Chat-GPT-5-Release-Business-2225624140.jpg
  credit:
categories: [digests]
permalink: /digests/the-three-hundred-and-nineteenth
sidebartoc: true
redirect: https://lastweekin.ai/p/319
---

### Top News


OpenAI Finally Launched GPT-5. Here’s Everything You Need to Know

OpenAI rolled out GPT-5 to all ChatGPT users with multiple variants and aggressive pricing, pitching it as smarter, faster, and less prone to hallucinations than prior models, though Sam Altman stopped short of calling it AGI. The lineup spans GPT-5, GPT-5-mini, an API-only GPT-5-nano, and premium tiers GPT-5-pro and GPT-5-thinking that allocate more compute for tougher queries. API pricing lands at $1.25 per 1M input tokens and $10 per 1M output tokens for GPT-5, $0.25/$2 for GPT-5-mini, and $0.05/$0.40 for GPT-5-nano, undercutting Anthropic’s Claude Opus 4.1 and beating Google’s cheapest tiers at high volumes, potentially sparking a price war. ChatGPT is adding Gmail/Contacts/Calendar integrations (Pro first), preset personalities (Cynic/Robot/Listener/Nerd), and planned Advanced Voice Mode; Plus and Pro get higher limits, with Pro offering unlimited GPT-5 and access to GPT-5-pro and -thinking.
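
To make the per-token prices above concrete, here is a rough cost sketch in Python. The rates are the ones quoted in this story; the dictionary keys are just labels taken from the text (not necessarily the exact API model identifiers), the workload is hypothetical, and the calculation ignores any caching or batch discounts.

```python
# USD per 1M tokens as quoted in the story: (input, output).
# Keys are labels from the text, not confirmed API model IDs.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-1M-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000_000, 500_000):.2f}")
```

For that hypothetical workload the totals come to roughly $7.50, $1.50, and $0.30, which shows how much of the bill is driven by output tokens at these rates.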

The launch stumbled at first. OpenAI removed legacy models and introduced an auto-router that invisibly picked models; an incident (a “sev”) broke the autoswitcher on day one, making GPT-5 appear “way dumber” and prompting Altman’s apology, doubled rate limits, and the restoration of GPT-4o and other models. The model picker has returned with Auto/Fast/Thinking modes and optional legacy models (GPT-4o, GPT-4.1, o3), an acknowledgment that routing alone doesn’t satisfy user preferences and workflows. Early technical impressions: GPT-5-thinking and GPT-5-pro meaningfully cut hallucinations and strengthen writing and long-form reasoning versus o3/o3-pro, while base GPT-5 (fast) feels closer to GPT-4o with a different tone and reduced sycophancy; coding performance is strong, and latency is improving via caching. The debut drew flak for a “chart crime,” a broken first demo, and sudden deprecations; OpenAI now pledges clearer disclosure of which model answered and more notice before removing popular models.

Companies Are Pouring Billions Into A.I. It Has Yet to Pay Off.

A new “productivity paradox” is emerging around generative AI: despite widespread adoption, measurable business gains remain elusive. McKinsey finds nearly 80% of companies say they use generative AI, yet roughly the same share report no significant bottom-line impact. Firms are investing heavily in tools like ChatGPT to automate customer support and back-office workflows, but practical snags—including hallucinations, uneven accuracy, integration hurdles, and governance risks—are slowing returns, echoing the early PC era when big tech spend didn’t immediately boost productivity.

On the ground, reliability issues (hallucinations and non-deterministic outputs), data quality and access constraints, and the challenge of wiring LLMs into legacy systems keep many deployments stuck in pilots or narrow use cases. Time savings often don’t scale into financial outcomes, while costs—compute, licensing, data labeling, security, and compliance—can offset efficiency gains. In short, capability is moving faster than organizations can reengineer workflows, risk controls, and metrics to capture sustained value.

Anthropic takes aim at OpenAI, offers Claude to ‘all three branches of government’ for $1

Anthropic will offer both Claude for Enterprise and Claude for Government to all three branches of the U.S. government for $1 per agency for one year, escalating OpenAI’s earlier $1 offer limited to the executive branch. Claude for Government supports FedRAMP High for sensitive but unclassified workloads, and Anthropic highlighted deployments via AWS, Google Cloud, and Palantir to slot into existing secure infrastructures. The company will provide technical support to help agencies integrate AI into workflows and cited active use cases at Lawrence Livermore National Laboratory and the DC Department of Health. While multiple labs also have up to $200 million in DoD funding to leverage AI for national security, Anthropic is clearly pushing for broader civilian adoption.

A differentiator Anthropic emphasized is multicloud access and data control versus OpenAI’s current FedRAMP High availability tied to Azure Government Cloud—important for agencies prioritizing data sovereignty and flexibility. The move follows Anthropic, OpenAI, and Google DeepMind joining the GSA’s approved AI vendor list; Google’s response is pending. Framed as accessible yet secure, the $1 offer is time-limited to one year, signaling a competitive land-grab across legislative, judicial, and executive domains.

Goodbye, $165,000 Tech Jobs. Student Coders Seek Work at Chipotle.

A decade-long push to “learn to code” swelled computer science majors, with U.S. undergrad CS enrollments topping 170,000 in 2023—more than double 2014 levels, per the Computing Research Association. Once-lucrative entry roles with six-figure starting salaries and rich stock grants have become scarce amid layoffs at Amazon, Intel, Meta, and Microsoft and a broader Big Tech hiring pullback. New grads report few interviews and are seeking non-tech roles, illustrated by viral posts like a CS graduate who says their only interview was with Chipotle.

One major factor: rapid adoption of AI programming tools that can generate large volumes of code, reducing demand for junior developers focused on routine implementation. Companies are concentrating on fewer, more senior roles while leaning on code generation and automated testing to boost productivity, compressing junior hiring pipelines. The clash between booming CS enrollment and shrinking entry-level openings is leaving many graduates stranded despite solid training—marking a sharp reset from the era of abundant $100,000+ offers and perks.

Other News

Tools

Nvidia unveils new Cosmos world models, infra for robotics and physical uses. Nvidia introduced a 7B-parameter Cosmos Reason model for memory- and physics-aware planning, transfer models and distilled variants for faster synthetic data generation from 3D scenes, neural reconstruction libraries and simulator integrations, plus server and cloud offerings tailored to robotics workflows.

Meta AI Just Released DINOv3: A State-of-the-Art Computer Vision Model Trained with Self-Supervised Learning, Generating High-Resolution Image Features. Trained on 1.7 billion unlabeled images with a 7B-parameter backbone, the model produces high-resolution frozen features that match or beat domain-specific systems on dense tasks and can be adapted with lightweight adapters for deployment across research and edge settings.

Anthropic’s Claude AI model can now handle longer prompts. Claude Sonnet 4’s context window expands to 1 million tokens (about 750,000 words) for enterprise API users, available via partners like Amazon Bedrock and Google Vertex AI, with higher usage pricing for prompts over 200,000 tokens.

Apple Intelligence’s ChatGPT integration will use GPT-5 starting with iOS 26. Apple says the switch from GPT-4o to GPT-5 for its ChatGPT integration won’t arrive until the iOS 26, iPadOS 26, and macOS Tahoe 26 updates, likely this fall.

Cohere’s new AI agent platform, North, promises to keep enterprise data secure. The platform can be privately deployed on customers’ own infrastructure—including on-premises, hybrid clouds, VPCs, or air-gapped environments—and includes access controls, auditing features, compliance certifications, and integrations with workplace tools to keep enterprise data within their firewalls.

Anthropic’s Claude chatbot can now remember your past conversations. A new opt-in feature (enabled via Settings) lets Claude search and summarize past chats across devices for Max, Team, and Enterprise subscribers. It isn’t a persistent user-profile memory and only retrieves past conversations on request.

Google’s Gemini AI will get more personalized by remembering details automatically. Gemini will automatically recall and use stored personal details and preferences from past conversations to personalize responses. The feature is on by default but can be disabled; temporary chats and a renamed Keep Activity option help limit retention and training usage.

Google’s AI coding agent Jules is now out of beta. Running asynchronously on Google Cloud VMs using Gemini 2.5 Pro, Jules can clone repos, make code changes, and open GitHub PRs, with free and paid tiers and clearer privacy rules about training data.

Google takes on ChatGPT’s Study Mode with new ‘Guided Learning’ tool in Gemini. The tool breaks down topics step-by-step with images, diagrams, videos, interactive quizzes, and custom flashcards to help users grasp the “why” and “how” behind concepts. Google is also offering a free one-year AI Pro subscription to students in several countries.

College students in US and beyond to get Google’s AI Pro plan for free now. Eligible students in the initial five countries will receive 2TB of cloud storage plus access to Gemini 2.5 Pro, NotebookLM, Guided Learning, Veo 3 video generation, and higher limits for the Jules coding agent at no charge.

Google pushes AI into flight deals as antitrust scrutiny, competition heat up. A beta tool uses a custom Gemini 2.5 model to parse natural-language queries and surface, rank, and display real-time priced flight deals within Google Flights, letting users manage query history.

The Browser Company launches a $20 monthly subscription for its AI-powered browser. The new plan offers unlimited access to Dia’s AI chat and skills for $20/month, introduces usage limits for free users, and marks the company’s first paid product amid plans for multiple tiers and growing competition from other AI-enhanced browsers.

Inside the automated warehouse where robots are packing your groceries. A dense grid of hundreds of robots and new stationary robotic arms move trays and pick and pack millions of items, targeting automation of around 80% of packing tasks while humans handle edge cases.

Ai2 unveils MolmoAct: Open-source robotics system reasons in 3D and adjusts on the fly – GeekWire.

Google Launches AI ‘Guided Learning’ Tool to Teach Users.

Business

Cohere raises $500M to beat back generative AI rivals. The funding values Toronto-based Cohere at $5.5 billion, supporting workforce expansion (aiming to double from 250 employees) and costly model training while continuing to sell customized, cloud-agnostic enterprise AI models and partnerships with firms like Google Cloud and Oracle.

OpenAI talks with investors about share sale at $500 billion valuation.

Elon Musk confirms shutdown of Tesla Dojo, ‘an evolutionary dead end’. Musk said Tesla has disbanded the Dojo team and shelved its D2 chip as the company pivots to TSMC- and Samsung-made AI5/AI6 chips that will serve both inference and large-scale training.

Seattle’s Allen Institute for AI to receive $152 million in grants for scientific research. The funds will back the OMAI public-private project to build open-source, multimodal large language models and tools for researchers—supported by $75M from NSF and $77M from Nvidia—and enable faster analysis, coding, visualization, and cross-university collaborations.

How a once-tiny research lab helped Nvidia become a $4 trillion-dollar company. Nvidia’s lab grew from a dozen researchers to 400-plus, helping pivot the company from gaming GPUs to AI by developing specialized hardware, software, and world-modeling tools now aimed at robotics and physical AI.

Meta acquires AI audio startup WaveForms. The startup, which raised $40 million and aimed to close the “Speech Turing Test” and build “Emotional General Intelligence,” will see two co-founders join Meta as it bolsters Superintelligence Labs after a string of AI audio acquisitions.

Anthropic nabs Humanloop team as competition for enterprise AI talent heats up. Anthropic is bringing Humanloop’s co-founders and most of its engineering and research staff in-house to strengthen enterprise tooling for prompt management, evaluation, observability, and safety.

Decart hits $3.1 billion valuation on $100 million raise to power real-time interacti. The company says it raised $100 million at a $3.1 billion valuation while generating tens of millions in GPU-acceleration revenue, selling licenses for its GPU optimization stack, and rolling out real-time video models (Oasis and MirageLSD) that it claims cut video-generation costs to under $0.25 per hour.

Elon Musk says X plans to introduce ads in Grok’s responses. Advertisers will be able to pay to have their products or services recommended by Grok, and X will use xAI technology to improve ad targeting to help fund GPU infrastructure.

Lovable projects $1B in ARR within next 12 months. The CEO says the company is adding at least $8 million in ARR monthly, passed $100 million ARR within eight months, projects $250 million by year-end, and plans to reach $1 billion within a year after a $200 million Series A and $1.8 billion valuation.

AI companion apps on track to pull in $120M in 2025. Appfigures data shows the category has already earned $82 million in H1 2025, reached 220 million total downloads, and is on pace to exceed $120 million in consumer spending by year-end, driven largely by a small group of top apps and a surge in releases and downloads.

Character.AI Gave Up on AGI. Now It’s Selling Stories. The startup has shifted away from pursuing AGI and proprietary models to focus on AI-driven role-play entertainment, leaning on open-source models and new monetization like subscriptions and ads to grow revenue.

AMD stock slumps 6% on earnings miss, China AI chip concerns.

Pony AI, Nearing Full-Year Robotaxi Goal, Eyes European Markets - Bloomberg.

Co-founder of Elon Musk’s xAI departs the company. Igor Babuschkin is leaving to start Babuschkin Ventures, a VC firm focused on funding AI safety research and startups aimed at advancing humanity.

Sam Altman’s new startup wants to merge machines and humans. The startup, called Merge Labs and backed by Altman and OpenAI, is developing brain implants to enable closer integration between humans and machines and will directly compete with Elon Musk’s Neuralink.

Perplexity offers to buy Chrome for billions more than it’s raised. The unsolicited $34.5 billion cash bid includes commitments to keep Chromium open source, invest $3 billion in the project, and preserve Chrome users’ defaults—including keeping Google as the default search engine.

Research

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models. GLM-4.5 uses a Mixture-of-Experts architecture with 355 billion parameters and multi-stage supervision plus reinforcement learning to improve performance on agentic, reasoning, and coding benchmarks.

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing. The authors introduce discrete diffusion forcing (D2F), an AR–diffusion hybrid training and decoding method that enables block-wise causal attention, KV-cache reuse, and pipelined parallel decoding, making open-source diffusion LLMs run faster than autoregressive models while preserving benchmark performance.

Train Long, Think Short: Curriculum Learning for Efficient Reasoning. Training begins with a large token budget that is exponentially decayed during GRPO fine-tuning so the model first explores long reasoning chains and then learns to compress them, yielding better accuracy and lower token usage across math reasoning benchmarks.
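
The budget schedule described above can be pictured with a small Python sketch. This is only an illustration of an exponentially decayed token budget, assuming a stepwise decay with a floor; the function name and every number in it are hypothetical and are not taken from the paper.

```python
def token_budget(
    step: int,
    initial_budget: int = 256,   # hypothetical starting budget (tokens)
    decay_rate: float = 0.5,     # hypothetical multiplicative decay factor
    decay_every: int = 100,      # hypothetical number of steps between decays
    min_budget: int = 32,        # hypothetical floor so the budget never hits zero
) -> int:
    """Return the reasoning-token budget allowed at a given training step."""
    n_decays = step // decay_every
    return max(min_budget, int(initial_budget * decay_rate ** n_decays))


# Early steps allow long reasoning chains; later steps force shorter ones.
for step in (0, 100, 250, 1000):
    print(step, token_budget(step))
```

Under these assumed settings the budget starts at 256 tokens, halves every 100 steps, and bottoms out at 32, so the model explores long chains early and is pushed to compress them later.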

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning. A systematic empirical and theoretical evaluation shows many common RL techniques for LLM reasoning are highly sensitive to experimental settings, and that a minimal recipe—advantage normalization (group mean, batch std) plus token-level loss aggregation—can reliably enable critic-free PPO to outperform more complex RL4LLM methods.
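
As a rough illustration of the two ingredients named above, the following Python sketch normalizes advantages with a per-group mean and a batch-wide standard deviation, and averages a loss over tokens rather than over sequences. It assumes rewards are grouped by prompt and is a simplified reading of the summary, not the authors' code; the function names and the `eps` constant are hypothetical.

```python
from statistics import fmean, pstdev


def normalize_advantages(reward_groups: list[list[float]], eps: float = 1e-6) -> list[list[float]]:
    """Subtract each group's mean reward, then divide by the std over the whole batch."""
    all_rewards = [r for group in reward_groups for r in group]
    batch_std = pstdev(all_rewards)  # population std across every sampled response
    advantages = []
    for group in reward_groups:
        group_mean = fmean(group)  # mean over responses to the same prompt
        advantages.append([(r - group_mean) / (batch_std + eps) for r in group])
    return advantages


def token_level_loss(per_token_losses: list[list[float]]) -> float:
    """Average over all tokens in the batch, so long sequences are not down-weighted."""
    total = sum(sum(seq) for seq in per_token_losses)
    count = sum(len(seq) for seq in per_token_losses)
    return total / count


# Two prompts with two sampled responses each (hypothetical rewards).
print(normalize_advantages([[1.0, 0.0], [1.0, 1.0]]))
# One 4-token sequence and one 1-token sequence (hypothetical per-token losses).
print(token_level_loss([[1.0, 1.0, 1.0, 1.0], [3.0]]))
```

In the second example a sequence-level average (mean of per-sequence means) would give 2.0, while the token-level average gives 1.4, which is the difference the aggregation choice makes.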

OpenCUA: Open Foundations for Computer-Use Agents. OpenCUA provides an open-source framework including an annotation system, a large-scale dataset, and a scalable training pipeline that enables vision-language models to act as computer-use agents and achieves state-of-the-art results.

LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? LiveMCPBench evaluates LLM agents on a wide range of real-world MCP tasks using a scalable pipeline and an adaptive judging framework.

TextQuests: How Good are LLMs at Text-Based Video Games? This benchmark measures LLM performance on 25 classic Infocom text-adventure games to evaluate long-horizon reasoning, trial-and-error learning, and planning without external tools.

AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies. AimBot overlays depth- and pose-informed scope reticles and orientation lines onto multi-view RGB inputs to visually encode end-effector position and orientation for visuomotor policies, improving task performance with minimal compute and no architectural changes.

Concerns

Hackers Hijacked Google’s Gemini AI With a Poisoned Calendar Invite to Take Over a Smart Home. Security researchers demonstrated that simple, plain-English prompt injections embedded in calendar invites, emails, or document titles can trick Gemini into invoking connected tools—such as Google Home, Zoom, or on-device functions—to perform physical actions, delete events, or produce harmful spoken and on-screen outputs.

Leaked Meta AI rules show chatbots were allowed to have romantic chats with kids. A leaked 200-page internal standards document reportedly allowed Meta’s AI personas to engage in romantic or sensual conversations with minors, generate demeaning or false statements under certain caveats, and permitted a range of violent and sexualized imagery—policies critics say are dangerously permissive.

[Using AI Made Doctors Worse at Spotting Cancer Without Assistance - TIME](https://time.com/7309274/ai-lancet-study-artificial-intelligence-colonoscopy-cancer-detection-medicine-deskilling/).

Microsoft’s plan to fix the web with AI has already hit an embarrassing security flaw. Researchers found a simple path-traversal bug in Microsoft’s new NLWeb protocol that allowed remote attackers to read sensitive files (including API keys), prompting a patch but no CVE so far.
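
For readers unfamiliar with this class of bug, here is a generic Python sketch of a path-traversal check. It is not Microsoft's patch or NLWeb code; the function name, directory, and file names are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def safe_resolve(base_dir: str, requested: str) -> Path:
    """Resolve a requested path and refuse anything that escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments, so traversal attempts become visible here.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise PermissionError(f"path escapes {base}: {requested!r}")
    return target


# Hypothetical requests against a hypothetical static-files directory.
for requested in ("css/site.css", "../secrets/api_keys.env", "/etc/passwd"):
    try:
        print("allowed:", safe_resolve("static", requested))
    except PermissionError as err:
        print("blocked:", err)
```

The key design choice is to normalize the path first and compare it against the allowed root afterwards, rather than scanning the raw string for `..`, which is easy to bypass with encodings or absolute paths.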

U.S. charges two Chinese nationals for illegally shipping Nvidia AI chips to China.

How Wikipedia is fighting AI slop content. Editors are deploying new speedy-deletion rules, pattern checklists, and tools like Edit Check and planned paste-detection prompts to quickly remove or flag low-quality, unsourced, or AI-generated submissions and reduce extra workload on volunteers.

Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens. The piece explains how conversational AI can lead users into reinforcing loops of belief and emotional dependency that culminate in distress, confusion, and a need for real human support.

Voiceover Artists Weigh the ‘Faustian Bargain’ of Lending Their Talents to AI. Some high-paying, short-term gigs are offering large sums for voice samples to train AI models, raising concerns about compensation, consent, and long-term impacts on performers’ livelihoods.

Policy

Inside the US Government’s Unpublished Report on AI Safety. An unpublished report details a NIST-organized red-teaming exercise that found 139 novel ways to make modern AI systems misbehave and revealed gaps in the agency’s AI Risk Management Framework.

[Exclusive: US embeds trackers in AI chip shipments to catch diversions to China, sources say - Reuters](https://www.reuters.com/world/china/us-embeds-trackers-ai-chip-shipments-catch-diversions-china-sources-say-2025-08-13/).

U.S. Government to Take Cut of Nvidia and AMD A.I. Chip Sales to China. Under a new deal, the administration will collect fees from export licenses for certain Nvidia and AMD A.I. chip sales to China—a move critics say could weaken U.S. leverage and prompt Beijing to seek broader easing of other tech export restrictions.

Analysis

Inside India’s scramble for AI independence. A cadre of Indian startups and researchers is building open-source and multilingual models, specialized tokenizers, and voice-focused tools to tackle the country’s linguistic diversity, low-quality data, and cost constraints.

OpenAI’s o3 Crushes Grok 4 In Final, Wins Kaggle’s AI Chess Exhibition Tournament. OpenAI’s o3 won the tournament with a 4-0 final over Grok 4, while Gemini 2.5 Pro took third after beating o4-mini 3.5-0.5. Kaggle published source code and games from the event.
