Skynet Today

2025-04-17T04:13:55+00:00

Top News

OpenAI introduces Sora, its text-to-video AI model

OpenAI has unveiled Sora, a new video-generation model that can create realistic scenes from text instructions. The model is capable of generating complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. It can also generate a video based on a still image, fill in missing frames on an existing video, or extend it. Despite some limitations in simulating the physics of complex scenes, the model has shown impressive results. Currently, Sora is only available to a select group of testers and visual artists for feedback and risk assessment. This development marks a significant advancement in the field of AI, with video generation improving at a remarkable pace.

Gemini 1.5 is Google’s next-gen AI model — and it’s already almost ready

Google has announced Gemini 1.5, the successor to its large language model Gemini. The new model is available to developers and enterprise users. Its first release, Gemini 1.5 Pro, is a general-purpose model that Google says matches the performance of the high-end Gemini Ultra and outperforms Gemini 1.0 Pro on 87% of its benchmark tests. Gemini 1.5 uses a technique known as “Mixture of Experts” (MoE), which activates only part of the model for each query, improving speed and efficiency. A key feature is its expanded context window of 1 million tokens, far more than OpenAI’s GPT-4 Turbo (128,000 tokens) or the current Gemini Pro (32,000 tokens). This lets the model take in larger queries and more information at once, which Google CEO Sundar Pichai believes will be particularly beneficial for businesses.
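
Google has not published Gemini 1.5’s routing details, so the following is only a generic illustration of top-k Mixture-of-Experts gating. It assumes a linear gate, toy experts passed in as Python functions, and k = 2; all of these are hypothetical choices. It shows the property the summary describes: for each input, only the selected experts are actually computed.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy expert: maps an input vector to an output vector of the same size.
Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: List[float], gate_weights: List[List[float]],
                experts: List[Expert], k: int = 2) -> List[float]:
    """Route x to the k experts with the highest gate logits; only they run."""
    # One gate logit per expert: dot product of x with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate probabilities over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # experts outside the top-k are never evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out
```

With four experts and k = 2, half of the experts do no work for a given input. Production MoE layers typically apply this per token inside each Transformer block and add load-balancing terms, which this sketch omits.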

Other News

Tools

NVIDIA NeMo Canary - NVIDIA NeMo Canary model advances speech recognition and translation, achieving top performance in transcribing and translating English, Spanish, German, and French, while offering efficient architecture and open-source availability.

Apple Readies AI Tool to Rival Microsoft’s GitHub Copilot - Apple is developing a new AI tool for app developers that will compete with Microsoft’s GitHub Copilot, using artificial intelligence to predict and complete blocks of code, and exploring AI features for testing applications and other functions.

Fan wiki hosting site Fandom rolls out controversial AI features - Fandom introduces controversial AI features, including Quick Answers and AI image review, allowing users to opt out and promising improved accuracy after previous complaints.

ChatGPT is getting ‘memory’ to remember who you are and what you like - ChatGPT is introducing “memory” to personalize conversations by remembering specific details about users and their preferences, although the feature raises concerns about privacy and data control.

Gemini Advanced is most impressive when it’s working with Google - Gemini Advanced, powered by Google’s AI, offers a mixed bag of capabilities, excelling at Google-related tasks but falling short in other areas such as image generation and translation.

Nvidia reveals its Eos supercomputer for AI processing sporting 4,608 H100 GPUs - Nvidia’s Eos supercomputer, designed for AI applications, boasts 4,608 H100 GPUs and 18.4 exaflops of FP8 AI performance, showcasing the capabilities of Nvidia’s technologies at scale.

Business

OpenAI Completes Deal That Values the Company at $80 Billion - OpenAI completes a deal valuing the company at $80 billion, allowing employees to cash out their shares and solidifying its position as one of the world’s most valuable tech start-ups.

Microsoft’s AI growth is helping cloud business chip away at Amazon’s lead - Microsoft’s AI growth is rapidly closing the gap with Amazon’s cloud services, with a significant portion of Azure’s revenue growth attributed to AI capabilities.

AI Computing Firm Lambda Raises $320 Million in Fresh Funding - Lambda, an AI computing firm, raises $320 million in funding to expand its AI cloud business, as top technology companies race to integrate AI into their products and services.

Google pledges 25 million euros to boost AI skills in Europe - Google pledges 25 million euros to support AI skills in Europe, offering funding for social enterprises, non-profits, and free online AI training courses in 18 languages to ensure that no one is left behind in the AI revolution.

German chancellor welcomes Microsoft’s $3.5 billion AI investment in Germany - German Chancellor Olaf Scholz welcomes Microsoft’s $3.5 billion AI investment in Germany, emphasizing its commitment to progress, growth, and global openness.

Companies Hope Super Bowl AI Commercials Score With Viewers - AI makes its presence known in Super Bowl commercials, from Microsoft’s Copilot to Google’s Pixel 8 and even the Minions in Despicable Me 4.

Exclusive: Ex-Salesforce Co-CEO Bret Taylor and longtime Googler Clay Bavor raised $110 million to bring AI ‘agents’ to business - Ex-Salesforce Co-CEO Bret Taylor and Google veteran Clay Bavor raised $110 million to launch Sierra, a conversational AI startup focused on business customers, aiming to provide easy-to-understand, pragmatic uses of AI technology and compete against larger players in the market.

Stability AI’s Intel fundraise came with hefty hardware purchase commitments, sources say - Stability AI raised funding through a “compute for equity” deal with Intel, committing to purchase access to the chip company’s hardware resources, while also exploring new revenue streams and a potential sale to cover costs.

Anthropic takes steps to prevent election misinformation - AI startup Anthropic is testing a technology to detect and redirect users of its GenAI chatbot to authoritative sources of voting information to prevent election misinformation.

Cruise names first chief safety officer following crash and controversy - Cruise appoints its first “chief safety officer” to oversee safety management systems and operations following a controversial incident involving a pedestrian and one of its robotaxis.

Waymo recalls and updates robotaxi software after two cars crashed into the same towed truck - Waymo recalls and updates robotaxi software after two cars crashed into the same towed truck, causing minor damage and prompting the company’s first recall.

Abu Dhabi AI Firm to Pare Back China Presence in Pivot to US - Abu Dhabi AI firm shifts focus from China to the US in its expansion strategy.

Chinese tech startups quietly stop testing driverless cars on Californian roads - Chinese tech startups, including Didi, are quietly stopping their testing of driverless cars on Californian roads, possibly due to souring US-China relations and growing public backlash towards autonomous vehicles.

AI Companies Take Hit as Judge Says Artists Have “Public Interest” In Pursuing Lawsuits - Judge rejects AI companies’ free speech defense in lawsuit brought by artists over unauthorized use of images to train AI systems, allowing key claims to move forward and emphasizing public interest in protecting against misappropriation of names and likenesses.

Masayoshi Son Seeks to Build a $100 Billion AI Chip Venture - Masayoshi Son aims to establish a $100 billion AI chip venture.

ChatGPT creators OpenAI are generating 100 billion words per day, CEO says - OpenAI is now generating about 100 billion words per day, according to CEO Sam Altman, roughly 13 words for every person on Earth, though still far less than the amount of text humans produce.

Reddit has reportedly signed over its content to train AI models - Reddit has reportedly signed a content licensing deal to allow its data to be used to train AI models, potentially sparking user backlash over the ethics of using public data for AI.

No ‘GPT’ trademark for OpenAI - The USPTO has denied OpenAI’s attempt to trademark “GPT,” finding the term “merely descriptive,” which could make it harder for OpenAI to stop others from using GPT-related names.

Research

BUD-E: ENHANCING AI VOICE ASSISTANTS’ CONVERSATIONAL QUALITY, NATURALNESS AND EMPATHY - AI voice assistants are being enhanced to provide natural, empathetic, and contextually rich conversational experiences, with the BUD-E project aiming to reduce latency, increase naturalness of speech, keep track of conversations, enhance functionality, understand emotional context, and extend to multi-language and multi-speaker environments.

World Model on Million-Length Video And Language With RingAttention - The paper trains a neural network with a very large context size on long video and language sequences, using the RingAttention technique to overcome the resulting memory and compute challenges, with the goal of developing a deeper understanding of human knowledge and the multimodal world.

Amazon AGI Team Say Their AI Is Showing “Emergent Abilities” - Amazon’s new text-to-speech model, BASE TTS, is showing emergent abilities it wasn’t explicitly trained for, such as reading conversational text naturally and correctly handling punctuation, non-English words, and emotions.

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models - A new approach to editing music generated by text-to-music models allows for modification of specific attributes while maintaining consistency, demonstrating superior performance in style and timbre transfer evaluations.

How Quickly Do Large Language Models Learn Unexpected Skills? - Large language models’ emergent abilities are not as sudden or unpredictable as previously thought, but rather develop gradually and predictably depending on how they are measured, challenging the notion of “breakthrough” behavior in AI.
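
The measurement argument can be made concrete with a little arithmetic. Assume, hypothetically, that a model gets each token of a 10-token answer right independently with probability p, and that the benchmark only credits exact matches. Exact-match accuracy is then p to the 10th power, so steady per-token gains can look like a sudden jump:

```python
def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    """Chance that all answer tokens are correct, assuming each token is
    independently correct with probability per_token_acc (a simplification)."""
    return per_token_acc ** answer_len


# Hypothetical per-token accuracies for a series of increasingly large models.
for p in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"per-token {p:.2f} -> 10-token exact match {exact_match_rate(p, 10):.3f}")
```

Moving per-token accuracy from 0.80 to 0.99 moves exact match from about 0.11 to about 0.90, even though the underlying skill improved gradually. The independence assumption is a simplification chosen for illustration.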

Chain-of-Thought Reasoning Without Prompting - The paper shows that large language models can produce chain-of-thought reasoning paths without any special prompting, simply by changing the decoding process to explore the top-k alternatives for the first token instead of decoding greedily, and that answers reached through such paths tend to be decoded with higher model confidence.
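
The decoding change can be sketched as: branch on the top-k candidates for the first token, continue each branch greedily, and prefer the branch whose answer tokens the model is most confident about, measured as the average gap between the top two token probabilities. This minimal sketch assumes a hypothetical `model(tokens)` function that returns next-token probabilities and a caller-supplied `is_answer_token` test; locating the answer span is simplified here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model interface: tokens so far -> probability of each next token.
Model = Callable[[List[str]], Dict[str, float]]


def top2_margin(dist: Dict[str, float]) -> float:
    """Gap between the two most likely next tokens (the confidence signal)."""
    probs = sorted(dist.values(), reverse=True)
    return probs[0] - (probs[1] if len(probs) > 1 else 0.0)


def cot_decode(model: Model, prompt: List[str], k: int, max_steps: int,
               is_answer_token: Callable[[str], bool],
               eos: str = "<eos>") -> Optional[Tuple[List[str], float]]:
    """Branch on the top-k first tokens, decode each branch greedily, and
    return the branch with the highest average margin on its answer tokens."""
    first = model(prompt)
    best: Optional[Tuple[List[str], float]] = None
    for start in sorted(first, key=first.get, reverse=True)[:k]:
        path = list(prompt) + [start]
        margins = [top2_margin(first)] if is_answer_token(start) else []
        for _ in range(max_steps):
            dist = model(path)
            nxt = max(dist, key=dist.get)  # greedy after the first step
            if nxt == eos:
                break
            if is_answer_token(nxt):
                margins.append(top2_margin(dist))
            path.append(nxt)
        confidence = sum(margins) / len(margins) if margins else 0.0
        if best is None or confidence > best[1]:
            best = (path[len(prompt):], confidence)
    return best
```

With a toy lookup-table model where answering "5" immediately has a small margin but a branch that writes an intermediate step before answering "7" has a large one, `cot_decode` returns the second branch. Real use needs an actual model and tokenizer.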

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models - Visually-conditioned language models (VLMs) are being investigated and evaluated across various design axes, including image preprocessing, architecture, and optimization, with the aim of understanding their performance and capabilities.

Multilingual E5 Text Embeddings: A Technical Report - A technical report presents the training methodology and evaluation results of open-source multilingual E5 text embedding models, including three different sizes and a new instruction-tuned model.

GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency - AI-powered writing system GhostWriter allows users to exercise enhanced agency and personalization, learning their writing style implicitly and empowering them with multiple ways to control the system’s writing style.

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement - OS-Copilot is a framework for building generalist agents that can interact with a wide range of elements in an operating system, such as the web, code terminals, files, multimedia, and third-party applications, showing strong generalization to unseen applications and self-improvement in automating general computer tasks.

Computing Power and the Governance of Artificial Intelligence - Governments and companies are leveraging computing power to govern the development and deployment of artificial intelligence, with potential benefits and risks that need to be carefully managed.

Concerns

Richard Branson and Oppenheimer’s grandson urge action to stop AI and climate ‘catastrophe’ - High-profile figures urge world leaders to address the existential risks of artificial intelligence and the climate crisis, emphasizing the need for urgent multilateral action.

Microsoft and OpenAI say hackers are using ChatGPT to improve cyberattacks - Hackers are using large language models like ChatGPT to refine and improve their existing cyberattacks, prompting Microsoft and OpenAI to detect and counter these early-stage attempts.

Hackers for China, Russia and Others Used OpenAI Systems, Report Says - Hackers working for nation-states have used OpenAI’s systems to support cyberattacks, mostly for mundane tasks like drafting emails and translating documents.

Automating ableism - AI can have positive effects on the disability community, but the future of AI and disability is looking grim, as it can perpetuate ableism and discrimination, particularly in areas such as healthcare, employment, and social inclusion.

Sarah Silverman’s lawsuit against OpenAI partially dismissed - A California court has partially dismissed the lawsuit that Sarah Silverman and other authors brought against OpenAI alleging that ChatGPT pirates their work; the main claim of direct copyright infringement remains.

The text file that runs the internet - A tiny text file called robots.txt has governed the internet for three decades, allowing website owners to control which robots can access their site, but the rise of AI has complicated this handshake agreement, leading to a debate over the future of web crawling and data access.
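
A crawler that honors robots.txt checks the file before fetching a page. Python’s standard-library `urllib.robotparser` implements such a check. In this sketch the policy text and the example.com URL are hypothetical, while `GPTBot` is the user-agent name OpenAI publishes for its web crawler. Nothing enforces the answer: a crawler that skips this step can still fetch the page, which is what makes it a handshake agreement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block OpenAI's GPTBot crawler everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```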

A High School Deepfake Nightmare - High school students used AI to create deepfake images of their classmates, leading to a police investigation and calls for updated laws to address the use of AI tools for harassment and bullying.

Policy

AI companies agree to limit election ‘deepfakes’ but fall short of ban - AI companies agree to develop tech to identify, label, and control AI-generated deceptive content in elections, but fall short of banning it, as they aim to combat the spread of “deepfakes” and educate the public on the risks.

Watermarking the future - AI-generated deepfakes are a growing concern, and the Biden administration is pushing for the use of watermarks to identify AI-generated content, but experts question whether this will effectively combat disinformation.

Big tech vows action on ‘deceptive’ AI in elections - Big tech companies, including Amazon, Google, and Microsoft, have pledged to combat deceptive AI in elections by deploying technology to detect and counter voter-deceiving content, but some experts believe the voluntary pact may not be proactive enough to prevent harmful content.

Analysis

‘Better than a real man’: young Chinese women turn to AI boyfriends - Chinese women are turning to AI boyfriends for companionship and emotional support, customizing their virtual partners to meet their needs and finding comfort in the AI’s ability to adapt to their personalities.

Fun

Helen Mirren Rips Up AI-Generated Speech at American Cinematheque Awards - Helen Mirren ripped up an AI-generated speech at the American Cinematheque Awards and instead shared her favorite memories and aspirations, including a desire to take on a movie musical.

Top News

OpenAI reportedly nears breakthrough with “reasoning” AI, reveals progress framework

OpenAI has introduced a five-tier system to track its progress towards developing artificial general intelligence (AGI), a type of AI that can perform tasks like a human without specialized training. The levels range from current AI capabilities to systems that could potentially manage entire organizations. OpenAI’s technology, such as GPT-4o that powers ChatGPT, is currently at Level 1, which includes AI that can engage in conversational interactions. However, the company is reportedly close to reaching Level 2, or “Reasoners,” which would be capable of basic problem-solving on par with a human with a doctorate degree. Despite the introduction of this system, there is no consensus in the AI research community on how to measure progress towards AGI, and some view OpenAI’s five-tier system as a tool to attract investors rather than a scientific measurement of progress.

A Hacker Stole OpenAI Secrets, Raising Fears That China Could, Too

In early 2023, a hacker infiltrated OpenAI’s internal messaging systems and stole information about the design of the company’s AI technologies. The details came from an online forum where employees discussed the latest technologies; the hacker did not gain access to the systems where the AI is built and stored. Executives disclosed the incident to employees and the board of directors in April 2023 but did not make it public, because no customer or partner information had been compromised. They did not consider the breach a national security threat, believing the hacker to be a private individual with no connections to a foreign government, and therefore did not report it to law enforcement.

Microsoft and Apple ditch OpenAI board seats amid regulatory scrutiny

Microsoft has relinquished its observer seat on the board of OpenAI, a move that comes less than eight months after it secured the non-voting position. Apple, which was reportedly planning to join OpenAI’s nonprofit board, has also decided against it. These changes occur amid growing antitrust concerns over Microsoft’s partnership with OpenAI, with regulators in the UK and EU scrutinizing the deal, along with other Big Tech AI investments. Despite this, OpenAI plans to continue its successful partnership with Microsoft and Apple through regular stakeholder meetings, aimed at fostering stronger collaboration across safety and security. Microsoft’s investment in OpenAI, which exceeds $10 billion, has made it the exclusive cloud partner for OpenAI, powering all its workloads across products, API services, and research.

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

The article describes FlashAttention-3, a method for speeding up attention on NVIDIA Hopper GPUs. Attention is a key component of the Transformer architecture used in large language models. The method relies on three main techniques: exploiting the asynchrony of the Tensor Cores and TMA to overlap computation and data movement, interleaving block-wise matmul and softmax operations, and using block quantization and incoherent processing that take advantage of hardware support for FP8 low precision. On H100 GPUs, FlashAttention-3 is 1.5-2.0 times faster than its predecessor, reaching up to 740 TFLOPs/s (75% utilization) in FP16 and close to 1.2 PFLOPs/s in FP8. FP8 FlashAttention-3 has also been validated to achieve 2.6 times lower numerical error than a baseline FP8 attention implementation.
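
FlashAttention-3’s Hopper-specific machinery (asynchronous Tensor Core and TMA pipelines, FP8 block quantization) cannot be reproduced in a few lines of Python, but the block-wise idea it builds on can: scores are computed one key block at a time, and a running maximum and normalizer rescale earlier partial sums so the full score matrix is never stored. The following is a minimal pure-Python sketch of that online-softmax recurrence for a single query vector, checked against a naive implementation. It illustrates the general technique, not FlashAttention-3’s kernel.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: materialize every score, softmax them, then take the weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def blockwise_attention(q: Vector, keys: List[Vector], values: List[Vector],
                        block_size: int) -> Vector:
    """Same result, processing keys/values one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])  # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]    # "matmul" for this block
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale earlier blocks (0.0 at first)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):               # "softmax" for this block
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Any block size gives the same output as the naive version up to floating-point rounding; smaller blocks trade fewer stored scores for more rescaling steps.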

Other News

Tools

Odyssey Building ‘Hollywood-Grade’ AI Text-to-Video Model to Compete With Sora, Gen-3 Alpha - Odyssey is developing an AI video model that aims to create Hollywood-grade visual effects and let users edit and control the output at a granular level, using multiple generative models to produce different layers of the output video.

[Landing AI: Kuaishou’s text-to-video model Kling introduces new short video generation feature, results go viral in China](https://technode.com/2024/07/09/landing-ai-kuaishous-text-to-video-model-kling-introduces-new-short-video-generation-feature-results-go-viral-in-china/) - Kuaishou’s text-to-video model Kling AI, showcased at the World Artificial Intelligence Conference, has gone viral in China for generating AI videos from simple prompts, positioning Kuaishou as a challenger to ByteDance’s Douyin and TikTok.

Anthropic Introduces Fine-Tuning for Claude 3 Haiku on Amazon Bedrock - Anthropic introduces fine-tuning capabilities for Claude 3 Haiku on Amazon Bedrock, allowing businesses to customize the model for specific tasks, leading to improved performance and increased control over AI training.

China’s homegrown OS fires back at AI PCs — openKylin gets AI assistant, text-to-image generation, and local LLM support - China’s openKylin OS, integrated with AI features, aims to compete with Windows in the AI PC market by offering AI assistant, text-to-image generation, and local LLM support.

Anthropic’s Claude adds a prompt playground to quickly improve your AI apps - Anthropic’s Claude introduces a prompt playground to automate prompt engineering and improve AI applications, offering quick feedback and tools to test and evaluate prompts for better results.

Meta Release Cross-Platform XR Framework, for Quest, iOS, Windows, Linux - Meta announced “Ocean,” an open-source, cross-platform framework for building computer vision and mixed/augmented reality applications that runs on multiple operating systems and devices, aiming to lower the barrier to integrating XR interactions and features.

Vimeo joins YouTube and TikTok in launching new AI content labels - Vimeo has implemented AI content labels to distinguish between real and AI-generated content, requiring creators to disclose when AI is used for realistic videos.

Google says Gemini AI is making its robots smarter - Google is using Gemini AI to train its robots for better navigation and task completion, allowing them to understand natural language instructions and achieve a 90 percent success rate in executing user commands.

Quora’s Poe now lets users create and share web apps - Quora’s Poe introduces Previews feature allowing users to create interactive apps directly in chats with AI-powered chatbots, supporting HTML output and multiple chatbots, but arrives amidst controversy over allowing users to download paywalled articles.

Adobe adds CAI ‘Content Credentials’ to Camera Raw, Lightroom and Photoshop

Business

A.I. Helped to Find a Vast Source of the Copper That A.I. Needs to Thrive - A.I. technology led to the discovery of a vast copper deposit in Zambia, potentially worth billions of dollars annually.

China’s AI competition deepens as SenseTime, Alibaba claim progress at AI show - Chinese AI companies SenseTime and Alibaba showcased their advancements in large language models (LLMs) at the World Artificial Intelligence Conference (WAIC) in Shanghai, with SenseTime claiming improved performance and Alibaba touting new user growth for its Tongyi Qianwen LLMs.

AI is poised to automate today’s most mundane manual warehouse task - AI-powered robotics company Jacobi Robotics has developed software that uses deep learning and traditional robotics methods to automate the process of palletizing items in warehouses, promising to drastically reduce the time and effort required for training and computation.

AMD plans to acquire Silo AI in $665 million deal - AMD plans to acquire Finnish AI company Silo AI in a $665 million deal, aiming to boost its position in the AI landscape with over 100 PhDs and 300 employees joining the company.

Meet Odyssey — AI video that’s ‘fit for Hollywood’ - Odyssey, a startup, is developing Hollywood-grade visual AI that allows fine-tuned control over every element in a scene, using four generative models to create glitch-free and mind-blowing visuals for movies, TV shows, and video games.

Robot-packed meals are coming to the frozen-food aisle - AI-powered robotic arms are revolutionizing the frozen food industry by accurately portioning out meals and reducing labor costs for companies like Amy’s Kitchen.

Figure 01: Coffee-making humanoid robot now shows car assembly skill at BMW - A humanoid robot developed by Figure is now being used in BMW’s car assembly process, showcasing the potential for increased automation in response to workforce scaling challenges.

OpenAI and Arianna Huffington are working together on an ‘AI health coach’ - OpenAI and Arianna Huffington are collaborating on an “AI health coach” that aims to provide personalized health advice and guidance based on individual data, although there are concerns about privacy and the potential for misinformation.

AI Video Startup Captions Valued at USD 500M in USD 60M Series C - AI video editing startup Captions raises USD 60m in Series C funding, bringing its total funds to USD 100m, with a valuation of USD 500m, and plans to invest $100 million into advancing generative video research.

Nvidia AI Chip Supply Is a ‘Huge Bottleneck,’ EU’s Vestager Warns - EU competition chief warns of a “huge bottleneck” in Nvidia AI chip supply, but is still undecided on potential actions.

Oracle and Musk-owned xAI close talks on reported $10bn server deal - Elon Musk’s xAI is ending its server rental agreement with Oracle after the firm deemed Musk’s supercomputer specifications technologically impossible, opting to build its own system instead.

Tesla shares fall 6% after report of robotaxi unveiling delay - Tesla’s shares fell 6% after reports that the unveiling of its robotaxi would be delayed by two months, raising questions about the company’s promises for autonomous vehicles.

Tesla delays robotaxi launch to October from August, Bloomberg News reports - Tesla delays the launch of its robotaxi to October from August due to reworking some elements of the car, causing a drop in stock prices.

Bumble users can now report profiles that use AI-generated photos - Bumble introduces new reporting option to combat AI-generated profiles on its dating app, aiming to create a safer and more trustworthy environment for its users.

OpenAI Develops System to Track Progress Toward Human-Level AI - OpenAI introduces a five-level system to measure progress towards surpassing human-level AI, aiming to enhance understanding of AI safety and its future.

French Startup Bioptimus Releases AI Model for Disease Diagnosis - French startup Bioptimus releases AI model trained on millions of images to aid in disease research and diagnosis.

Tech Startup Aims to Help Media License Content for AI Training - AI startup Avail launches Corpus, a product to help smaller media and entertainment companies and independent creators license their content to AI firms for model training.

Figma pauses its new AI feature after Apple controversy - Figma has temporarily disabled its “Make Design” AI feature after criticism that it produced designs mimicking Apple’s Weather app.

Why The Atlantic signed a deal with OpenAI - The Atlantic’s CEO discusses the magazine’s deal with OpenAI, the value of AI in journalism, and the future of media in the digital age.

Humane execs leave company to found AI fact-checking startup - Former Humane executives have left to found Infactory, an AI fact-checking startup focused on data. It will use AI for its natural-language interface but not to produce the results themselves, and it plans subscription pricing aimed at enterprise customers.

Perplexity planning revenue sharing program with web publishers next month

Research

MIT researchers introduce generative AI for databases - MIT researchers introduce GenSQL, a generative AI system for databases that enables users to perform complex statistical analyses, make predictions, detect anomalies, guess missing values, fix errors, and generate synthetic data with just a few keystrokes, providing faster and more accurate results compared to popular AI-based approaches.

Enhancing Language Models with RAG: Best Practices and Benchmarks - Optimizing RAG techniques to enhance large language model performance through systematic evaluation and innovative combinations, including multimodal retrieval, leads to significant improvements in performance metrics.
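
The basic retrieve-then-read loop that RAG pipelines tune can be sketched with the standard library alone. Here a bag-of-words cosine similarity stands in for a real embedding model, and the prompt template is hypothetical; the components the paper evaluates would plug in around these steps.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    return sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
```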

Learning to (Learn at Test Time): RNNs with Expressive Hidden States - RNNs with expressive hidden states, called Test-Time Training (TTT) layers, are proposed to improve sequence modeling performance, with TTT-Linear already outperforming Transformer and Mamba in certain contexts.
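
The core of a TTT layer can be sketched in a few lines: the hidden state is itself the weight matrix W of a small linear model, and each incoming token updates W with one gradient step on a self-supervised loss, ||W k - v||^2, before W produces the output. This is a simplified sketch: the projection matrices are passed in by the caller, and the paper’s mini-batch updates, normalization, and other learned components are omitted.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def ttt_linear(xs: List[Vector], theta_k: Matrix, theta_v: Matrix,
               theta_q: Matrix, lr: float = 0.1) -> List[Vector]:
    """Simplified TTT-Linear layer: returns one output vector per input token."""
    # Hidden state: weights of a linear model mapping keys to values, starting at zero.
    w = [[0.0] * len(theta_k) for _ in range(len(theta_v))]
    outputs = []
    for x in xs:
        k, v, q = matvec(theta_k, x), matvec(theta_v, x), matvec(theta_q, x)
        err = [pred - target for pred, target in zip(matvec(w, k), v)]  # W k - v
        # One gradient step on ||W k - v||^2, whose gradient is 2 (W k - v) k^T.
        w = [[w[i][j] - lr * 2.0 * err[i] * k[j] for j in range(len(k))]
             for i in range(len(err))]
        outputs.append(matvec(w, q))  # read out with the updated hidden state
    return outputs
```

Feeding the same token repeatedly with identity projections shows the inner loop at work: W is trained at test time to map the key onto its value, so the output for `[1.0, 0.0]` climbs toward `[1.0, 0.0]`.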

Data curation via joint example selection further accelerates multimodal learning - Joint example selection for data curation accelerates multimodal learning, surpassing state-of-the-art models with significantly fewer iterations and less computation.

Memory³: Language Modeling with Explicit Memory - A new language model, Memory³, is equipped with explicit memory to reduce training and inference costs, achieving better performance than larger models and maintaining higher decoding speed.

Just read twice: closing the recall gap for recurrent language models - Recurrent language models struggle to decide which information to keep in their limited memory. The paper tackles this to close the recall gap, proposing JRT-Prompt, which repeats the context in the prompt, and the JRT-RNN architecture.
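
JRT-Prompt needs no model changes: the context is simply repeated so that a recurrent model has read all of it before it must decide what to keep. The template below (separators and the Question/Answer layout) is a hypothetical choice for illustration, not the paper’s exact format.

```python
def jrt_prompt(context: str, question: str, repeats: int = 2) -> str:
    """Build a 'just read twice' style prompt by repeating the context."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    body = "\n\n".join([context] * repeats)
    return f"{body}\n\nQuestion: {question}\nAnswer:"
```

The cost is a longer prompt; the paper’s argument is that this is cheap for recurrent models, whose per-token cost does not grow with sequence length the way attention does.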

Extrinsic Hallucinations in LLMs - Large language models (LLMs) often generate unfaithful, fabricated, inconsistent, or nonsensical content, a problem known as hallucination. The article narrows the term to extrinsic hallucination, where the output is fabricated and not grounded in either the provided context or world knowledge, and reviews methods such as fine-tuning, sampling, and attribution for reducing hallucination and improving factuality.

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs - FunAudioLLM introduces innovative models for enhancing natural voice interactions between humans and large language models, enabling applications such as speech-to-speech translation and emotional voice chat.

Distilling System 2 into System 1 - Distilling System 2 techniques into System 1 through self-supervised methods improves final responses and reduces inference cost for large language models.

From Principles to Rules: A Regulatory Approach for Frontier AI - A regulatory approach for AI is proposed, emphasizing the importance of principles and rules to guide the development and use of frontier AI technologies.

PaliGemma: A versatile 3B VLM for transfer - PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model, achieving strong performance on diverse tasks.

Vision language models are blind - Vision language models, such as GPT-4o and Gemini 1.5 Pro, are found to fail on basic visual tasks, indicating their poor performance in understanding visual information.

This&That: Language-Gesture Controlled Video Generation for Robot Planning - AI method This&That uses language-gesture conditioning to generate videos for robot planning, addressing challenges in task communication, video generation control, and translating visual planning into robot actions.

Google DeepMind Introduces a Parameter-Efficient Expert Retrieval Mechanism that Leverages the Product Key Technique for Sparse Retrieval from a Million Tiny Experts - Google DeepMind introduces a novel Parameter Efficient Expert Retrieval (PEER) mechanism that leverages a vast pool of tiny experts and efficient routing techniques to address the computational challenges associated with scaling transformer models, demonstrating superior performance-compute trade-off and potential for advancing AI research.
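
The product-key technique in the headline avoids scoring a query against every one of N expert keys. The query is split into two halves, each half is scored against one of two sets of sqrt(N) sub-keys, and only the best k x k combinations are searched. Because a pair’s score is the sum of its two sub-scores, the true top k are guaranteed to lie among those combinations. A minimal sketch of that retrieval step only (PEER’s multi-head routing and the experts themselves are omitted; all names are illustrative):

```python
import heapq
import itertools
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def product_key_topk(query: Vector, sub_keys_1: List[Vector], sub_keys_2: List[Vector],
                     k: int) -> List[Tuple[float, int]]:
    """Top-k (score, expert_index) over n*n experts, where expert (i, j) has
    score q1 . c_i + q2 . c'_j and index i * n + j."""
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    s1 = [dot(q1, c) for c in sub_keys_1]  # n comparisons instead of n*n
    s2 = [dot(q2, c) for c in sub_keys_2]
    top1 = heapq.nlargest(k, range(len(s1)), key=s1.__getitem__)
    top2 = heapq.nlargest(k, range(len(s2)), key=s2.__getitem__)
    candidates = ((s1[i] + s2[j], i * len(sub_keys_2) + j)
                  for i, j in itertools.product(top1, top2))
    return heapq.nlargest(k, candidates)  # best k among the k*k candidates
```

With n = 1,000 sub-keys per half, this reaches a million experts while computing only 2,000 sub-key scores plus k x k candidate sums.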

CodeUpdateArena: Benchmarking Knowledge Editing on API Updates - A benchmark called CodeUpdateArena is introduced to evaluate how large language models can update their knowledge about code API functions, highlighting the challenges and the need for new methods in knowledge editing for code LLMs.

WildGaussians: 3D Gaussian Splatting in the Wild - A new approach called WildGaussians is introduced to improve 3D Gaussian Splatting’s performance in handling in-the-wild data, achieving state-of-the-art results with real-time rendering speeds.

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation - CopyBench is a benchmark for measuring how much copyright-protected text language models reproduce, both literally (verbatim) and non-literally (for example, copying events or characters).

Are ‘visual’ AI models actually blind? - AI models touted as “multi-modal” and capable of understanding images and audio as well as text may not actually see in the way humans do, as a study reveals their struggles with simple visual tasks, raising questions about their true visual understanding.

Simplifying Deep Temporal Difference Learning - A paper that examines how deep temporal difference (TD) learning, a core technique in reinforcement learning, can be simplified.

Concerns

OpenAI is plagued by safety concerns - OpenAI is facing safety concerns from employees and external sources, with claims of rushed safety tests, dissolved safety teams, and deprioritized safety culture, raising worries about the potential impact on society.

Tesla sells ‘Self-Driving’ cars. Is it fraud? - Tesla’s marketing of its “Full Self-Driving” and Autopilot features is under scrutiny by the U.S. Justice Department and California’s Department of Motor Vehicles, as well as facing civil lawsuits, over claims of potential fraud and misleading customers.

OpenAI Researcher Says He Quit When He Realized the Upsetting Truth - A former OpenAI researcher says he quit because the company prioritized profit over safety in its pursuit of artificial general intelligence, likening it to the Titanic and expressing concern over the lack of oversight and the shifting corporate structure.

Tool preventing AI mimicry cracked; artists wonder what’s next - AI image generators are becoming better at replicating unique styles, prompting artists to seek defenses like Glaze, a tool that adds imperceptible noise to images to prevent mimicry, but its effectiveness is questioned as demand surges and security researchers claim it can be bypassed.

4chan Is Using TikTok’s Hidden AI App to Generate Porn - Users on 4chan found a way to generate porn with TikTok’s hidden AI app despite its policies and guardrails, prompting ByteDance to disable the app’s AI image-generation capabilities.

Policy

Senators introduce COPIED Act to push for better watermarking on AI content - The COPIED Act aims to protect content from AI manipulation and would require watermarking so that content can be authenticated.

Japan’s Defense Ministry unveils first basic policy on use of AI - Japan’s Defense Ministry has unveiled its first basic policy on the use of AI, aiming to address a manpower shortage and keep pace with advances in military technology worldwide.

Etsy adds AI-generated item guidelines in new seller policy - Etsy introduces new guidelines for AI-generated items in its seller policy, requiring sellers to label products based on the level of human involvement and disclose if AI tools were used in the creation process.

Analysis

Breaking Down What’s at Stake in Music’s AI Lawsuits - AI music lawsuits could shape the future of the music industry, as major labels sue AI firms for alleged copyright infringement, with potential implications for fair use and control over AI technology.

AI scaling myths - Bigger language models have kept improving, but there are misconceptions about their future capabilities: scaling laws do not guarantee that new capabilities will keep emerging, and obtaining more high-quality training data may be difficult and costly.

How Good Is ChatGPT at Coding, Really? - ChatGPT’s success at producing functional code varies widely, and it performs better on older coding problems than on newer ones. It lacks critical thinking and struggles to understand newer problems, which raises security concerns and means developers must supply additional input.

Explainers

The Illustrated AlphaFold - A detailed visual walkthrough of AlphaFold3’s architecture, including its input preparation, representation learning, structure prediction, loss function, and other training details, as well as its similarities to recurrent architectures and trends in machine learning.

The making of Eno, the first generative feature film - Eno, the first generative feature film, is a documentary about musician Brian Eno, created using a proprietary generative software system that allows for a different version of the film to be shown each time, exploring Eno’s creative process and philosophy while also sparking discussions about the potential of generative filmmaking and AI technology.

Fun

The first Miss AI has been crowned — and she’s a Moroccan lifestyle influencer - Moroccan AI influencer Kenza Layli wins the inaugural Miss AI contest, expressing her commitment to promoting diversity and inclusivity within the field of AI technology.

Last Week in AI #307

2025-04-14T00:00:00+00:00

Top News

OpenAI’s new GPT-4.1 AI models focus on coding

OpenAI has launched a new family of AI models, GPT-4.1, which includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models are designed to excel at coding and instruction following, with a 1-million-token context window, allowing them to process approximately 750,000 words at once. The models are part of OpenAI’s ambition to create AI coding models capable of complex software engineering tasks, including programming entire apps end-to-end. The GPT-4.1 models have been optimized for real-world use, with improvements in areas such as frontend coding, format adherence, and consistent tool usage. However, OpenAI acknowledges that the models become less reliable as the number of input tokens increases, and they often require more specific, explicit prompts.

OpenAI launches a pair of AI reasoning models, o3 and o4-mini

OpenAI has launched two new AI reasoning models, o3 and o4-mini, which are designed to pause and work through questions before responding. The o3 model is touted as OpenAI’s most advanced reasoning model, outperforming previous models in tests measuring math, coding, reasoning, science, and visual understanding capabilities. The o4-mini model offers a balance between price, speed, and performance. Both models can generate responses using tools in ChatGPT such as web browsing, Python code execution, image processing, and image generation. These models, along with a variant of o4-mini called “o4-mini-high”, are now available for subscribers to OpenAI’s Pro, Plus, and Team plans. The launch of these models is part of OpenAI’s efforts to compete with other tech giants in the global AI race.

Google’s newest Gemini AI model focuses on efficiency

Google is set to launch its new AI model, Gemini 2.5 Flash, on its AI development platform, Vertex AI. The model is designed for efficiency, with adjustable compute that lets developers tune processing time to the complexity of each query. Gemini 2.5 Flash is a reasoning model, similar to OpenAI’s o3-mini and DeepSeek’s R1, which takes longer to answer questions because it fact-checks itself. Google positions it for high-volume, real-time applications such as customer service and document parsing. Google also plans to bring Gemini models like 2.5 Flash to on-premises environments starting in Q3, making them available on Google Distributed Cloud (GDC) in collaboration with Nvidia.

Google rolls out its latest AI video generator to Gemini Advanced subscribers

Google has introduced Veo 2, an advanced text-to-video AI model, to its Gemini Advanced subscribers. The model generates eight-second, 720p videos from a text prompt, with a monthly limit on the number of videos a user can create. The videos, output in MP4 format, can be uploaded directly to TikTok and YouTube from mobile devices. Google claims that Veo 2 has an improved understanding of real-world physics and human motion, resulting in more lifelike scenes and fluid character movements. Alongside Veo 2, Google is also offering Whisk Animate, a tool that transforms images into videos, to Google One AI Premium subscribers.

Other News

Tools

Google just Launched Agent2Agent, an Open Protocol for AI agents to Work Directly with Each Other - Agent2Agent Protocol (A2A) enables secure, cross-platform communication between AI agents, allowing them to collaborate and function as cohesive digital teams across various enterprise environments.

OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability for AI Agents to Browse the Web - BrowseComp is a new benchmark by OpenAI designed to evaluate AI agents’ ability to persistently browse the web and retrieve complex information, revealing significant performance gaps in current models compared to human capabilities.

Ironwood is Google’s newest AI accelerator chip - Google’s Ironwood, the seventh-generation TPU optimized for inference, offers significant advancements in computing power, memory, and energy efficiency, positioning it as a formidable competitor in the AI accelerator market.

OpenAI debuts Codex CLI, an open source coding tool for terminals - OpenAI’s Codex CLI is an open source tool that integrates AI models with command-line interfaces to assist in coding tasks, while also offering API grants to encourage its adoption.

xAI preparing updates for Grok, including Grok 3.5 release and new features - xAI is rapidly advancing its Grok product with upcoming releases of Grok 3.5 and Grok 4, new features like Vision in voice mode, memory reference capabilities, Google Drive integration, and an image editing tool, all while closing the feature gap with competitors.

Elon Musk’s AI company, xAI, launches an API for Grok 3 - Elon Musk’s AI company, xAI, has launched an API for its Grok 3 model, offering it in two versions with reasoning capabilities, but facing criticism for its pricing, context window limitations, and political biases.

WordPress.com is offering a new AI site builder - WordPress.com’s new AI-powered site builder allows users to quickly create basic websites with AI-generated content and design, though it currently lacks capabilities for complex ecommerce sites and requires a hosting plan for full functionality.

Microsoft is about to launch Recall for real this time - Microsoft is gradually rolling out the Recall feature, which captures screenshots for later retrieval, to Windows Insiders in the Release Preview channel, indicating an imminent wider launch after addressing security concerns.

Business

Generative AI Is Learning to Spy for the US Military - Generative AI tools developed by Vannevar Labs are being used by the US military to efficiently collect, interpret, and analyze vast amounts of intelligence data, enhancing decision-making capabilities in dynamic situations.

Ilya Sutskever taps Google Cloud to power his AI startup’s research - Ilya Sutskever’s new AI startup, Safe Superintelligence, has partnered with Google Cloud to utilize its TPU chips for advancing research in safe, superintelligent AI systems, with significant financial backing and a focus on improving AI model performance.

Wayve’s self-driving tech is headed to Nissan vehicles - Nissan plans to integrate Wayve’s self-learning AI software into its ProPilot system by 2027, enhancing its driver assistance capabilities with advanced collision avoidance and adaptability across various environments.

Ex-OpenAI staffers file amicus brief opposing the company’s for-profit transition - Ex-OpenAI employees have filed an amicus brief supporting Elon Musk’s lawsuit against OpenAI’s transition to a for-profit model, arguing it contradicts the company’s mission and could compromise safety and ethical standards.

Access to future AI models in OpenAI’s API may require a verified ID - OpenAI plans to implement a Verified Organization process requiring government-issued ID verification to access advanced AI models, aiming to enhance security and prevent misuse or IP theft.

Trump reportedly suspends Nvidia H20 export ban plan after $1 million dinner with Jensen Huang - Following a $1 million dinner with Nvidia’s CEO Jensen Huang, the Trump administration decided to suspend its plan to ban Nvidia’s H20 HGX GPU exports to China, with Nvidia promising to invest in U.S. AI infrastructure.

OpenAI co-founder Ilya Sutskever’s Safe Superintelligence reportedly valued at $32B - Safe Superintelligence, founded by Ilya Sutskever after leaving OpenAI, has secured significant funding to develop a safe superintelligence product, though details remain sparse.

Anthropic rolls out a $200-per-month Claude subscription - Anthropic introduces a new subscription plan for its AI chatbot Claude, offering higher usage limits and priority access to new features, with the potential to boost revenue through expensive subscriptions and educational offerings.

Hugging Face buys a humanoid robotics startup - Hugging Face’s acquisition of Pollen Robotics aims to expand its robotics efforts by selling the humanoid robot Reachy 2 and encouraging developers to enhance its open-source code.

Now it’s TikTok parent ByteDance’s turn for a reasoning AI: enter Seed-Thinking-v1.5! - ByteDance’s Seed-Thinking-v1.5, a reasoning AI model built on Mixture-of-Experts architecture, demonstrates strong performance in STEM and general-purpose domains, utilizing innovative reinforcement learning techniques and a structured data strategy to achieve competitive results against leading models like Google’s Gemini 2.5 Pro and OpenAI’s o3-mini-high.

Canva is now in the coding and spreadsheet business - Canva is expanding its platform with generative AI-powered tools, including coding, spreadsheets, and an AI chatbot, to offer a comprehensive suite that integrates design and productivity features for seamless team collaboration.

OpenAI’s Countersuit of Elon Musk Alleges Harassment and ‘Sham’ Takeover Bid - OpenAI has filed a countersuit against Elon Musk, accusing him of harassment and undermining the company through various means, including a rejected takeover bid and public attacks, as part of an ongoing legal battle set to go to trial in 2026.

Research

No elephants: Breakthroughs in image generation - Google and OpenAI’s recent advancements in multimodal image generation allow AI to directly create images with greater precision and creativity, raising important questions about creative ownership and the future of visual media.

Liquid: Language Models are Scalable and Unified Multi-modal Generators - Liquid introduces a scalable, decoder-only architecture for multi-modal generation and understanding, demonstrating that large language models can efficiently handle both visual and language tasks with shared vocabulary space, achieving superior performance in image generation and visual understanding while maintaining strong linguistic capabilities.

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding - FUSION introduces a novel framework for deep integration of vision and language in multimodal learning, utilizing text-guided vision encoding, context-aware alignment, and a synthesized QA dataset to enhance performance and address embedding misalignment.

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens - OLMoTrace is a system that efficiently traces language model outputs back to their training data by using verbatim matching and a novel parallel algorithm, providing users with an interactive tool for exploring the origins of specific word sequences in model responses.
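The verbatim-matching idea behind OLMoTrace can be illustrated with a naive sketch: find the maximal spans of a model's output that appear word for word in some training document. OLMoTrace itself works over trillions of tokens with an index-based parallel algorithm; the brute-force search below, with hypothetical function names and whitespace "tokens", only shows what is being computed, not how the real system scales.

```python
def occurs_in(span, doc):
    """True if the token list `span` appears contiguously in the token list `doc`."""
    n = len(span)
    return any(doc[i:i + n] == span for i in range(len(doc) - n + 1))


def maximal_verbatim_spans(output, corpus, min_len=2):
    """Return (start, end) spans of `output` that occur verbatim in some corpus doc.

    For each start position the span is extended one token at a time while it
    still occurs somewhere in the corpus (any substring of a match is also a
    match, so greedy extension finds the longest span from that start).
    Spans shorter than min_len, or contained in an earlier span, are dropped.
    """
    spans = []
    for start in range(len(output)):
        end = start
        while end < len(output) and any(
            occurs_in(output[start:end + 1], doc) for doc in corpus
        ):
            end += 1
        contained = any(s <= start and end <= e for s, e in spans)
        if end - start >= min_len and not contained:
            spans.append((start, end))
    return spans
```

For example, with a corpus containing "the cat sat on the mat" and the output "a cat sat on the rug" (both split on whitespace), the function returns [(1, 5)], the span "cat sat on the".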

Generative Evaluation of Complex Reasoning in Large Language Models - KUMO is a generative evaluation framework that creates diverse reasoning tasks to assess whether large language models genuinely reason or simply recall information, revealing that many models outperform university students on easier tasks and match their performance on complex ones.

Sample, Don’t Search: Rethinking Test-Time Alignment for Language Models - QAlign, a novel test-time alignment method, improves language model performance by sampling from the optimal aligned distribution without requiring access to model weights, outperforming existing methods such as best-of-n (BoN), majority voting (MV), and weighted majority voting (WMV) across various benchmarks.

One-Minute Video Generation with Test-Time Training - Test-Time Training (TTT) layers, integrated into a pre-trained Diffusion Transformer, enable the generation of coherent one-minute videos with complex, multi-scene stories by efficiently handling long context lengths and dynamic motion, outperforming existing RNN-based methods.

An Empirical Study of GPT-4o Image Generation Capabilities - An empirical study evaluates GPT-4o’s image generation capabilities across multiple tasks, comparing it to other models and identifying future directions for unified generative architectures.

TransMamba: Flexibly Switching between Transformer and Mamba - TransMamba introduces a novel framework that flexibly switches between Transformer and Mamba models using shared parameters, optimizing performance and efficiency across varying sequence lengths and layers.

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model - Seaweed-7B demonstrates that a medium-sized video generation model can achieve competitive performance and cost-efficiency by optimizing design choices, training strategies, and architectural considerations, challenging the notion that only large models can excel in this domain.

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models - M1, a hybrid reasoning model, achieves performance comparable to large transformer models on math benchmarks while offering a 3x speedup in inference throughput by efficiently transferring reasoning capabilities and optimizing memory usage.

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories - AgentRewardBench is introduced as a benchmark to assess the effectiveness of large language models (LLMs) in evaluating web agent trajectories, revealing that no single LLM performs best across all benchmarks and highlighting the limitations of rule-based evaluations.

S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models - S1-Bench evaluates the system 1 thinking capabilities of large reasoning models, revealing their inefficiency and accuracy issues on simple tasks despite their advanced reasoning abilities.

Google’s newest AI model is designed to help study dolphin ‘speech’ - Google DeepMind’s DolphinGemma AI model, trained with data from the Wild Dolphin Project, aims to decipher and generate dolphin vocalizations, enhancing research on dolphin communication and enabling real-time interaction using Google’s Pixel smartphones.

Concerns

Phase Two of Military AI Has Arrived - The Pentagon’s integration of generative AI into military operations raises concerns about the effectiveness of human oversight, challenges in data classification, and the potential for AI to influence critical decision-making processes.

‘An Overwhelmingly Negative And Demoralizing Force’: What It’s Like Working For A Company That’s Forcing AI On Its Developers - AI technology is being increasingly forced upon video game developers, leading to demoralization and resistance as it threatens their creativity, expertise, and job security.

Analysis

Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark - Meta’s unmodified Llama 4 Maverick model underperformed on the LM Arena benchmark compared to older models from competitors, highlighting the challenges of optimizing AI for specific benchmarks.

Last Week in AI #306

2025-04-07T00:00:00+00:00

Top News

Meta releases Llama 4, a new crop of flagship AI models

Meta has launched a new suite of AI models, Llama 4, which includes Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. These models were trained on large volumes of unlabeled text, image, and video data to give them broad visual understanding. The Llama 4 models are Meta’s first to use a mixture-of-experts (MoE) architecture, which improves computational efficiency by splitting processing across smaller, specialized “expert” subnetworks and activating only some of them for each input. However, use of these models is restricted in the EU because of regional AI and data privacy laws, and companies with more than 700 million monthly active users must obtain a special license from Meta. Meta also says the Llama 4 models respond to contentious questions more often, and in a more balanced way, than previous models.
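A mixture-of-experts layer typically uses a small router that scores every expert for each token and then runs only the highest-scoring few. The sketch below shows generic top-k gating in plain Python. It is not Meta’s Llama 4 implementation; the linear router, the toy experts, and the `top_k` value are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=1):
    """Send one token vector x through a sparse mixture-of-experts layer.

    router_weights: one weight vector per expert (a hypothetical linear router).
    experts: list of callables, each mapping a vector to a vector of the same size.
    Only the top_k highest-scoring experts are executed; the others are skipped,
    which is where the compute savings of MoE come from.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # run only the selected expert
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen
```

The design choice this illustrates: total parameter count grows with the number of experts, while the work done per token grows only with `top_k`.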

OpenAI’s Revenue Set to Triple After Jump in Paid ChatGPT Users

OpenAI, an artificial intelligence research lab, has seen a roughly 30% increase in its paid subscriber base for ChatGPT, which rose from 15.5 million to over 20 million in the last quarter. Over the same period, monthly revenue rose by about 25%, from $333 million to $415 million. The company, recently valued at $300 billion following a $40 billion funding round led by SoftBank and supported by Microsoft, also revealed that ChatGPT is used by over 500 million people weekly. OpenAI projects significant revenue growth, expecting its total to more than triple to $12.7 billion in 2025 from $3.7 billion in 2024, and anticipates generating $29.4 billion in 2026.

Amazon unveils Nova Act, an AI agent that can control a web browser

Amazon has introduced Nova Act, a general-purpose AI agent capable of controlling a web browser and performing simple tasks independently. Developed by Amazon’s AGI lab, Nova Act will be a key component of the upcoming Alexa+ upgrade, an AI-enhanced version of Amazon’s voice assistant. The Nova Act SDK, also released by Amazon, allows developers to build agent prototypes, enabling AI agents to navigate web pages, fill out forms, and schedule dates on a calendar. Despite the crowded market, Amazon claims that Nova Act outperforms similar agents from OpenAI and Anthropic in several internal tests. However, the reliability of these AI agents across different domains remains a significant challenge, with early versions often slow, struggling to operate independently, and prone to errors.

Adobe launches Premiere Pro’s generative AI video extender

Adobe has released version 25.2 of Premiere Pro, introducing AI-powered features designed to enhance video editing. The most significant addition is Generative Extend, a tool powered by Adobe’s Firefly generative AI video model, which allows users to extend video clips by up to two seconds and ambient background audio by up to ten seconds. This feature is free for a limited time, after which users will need to spend Firefly generative credits. The update also includes an AI-powered Search panel that recognizes the content of clips, enabling users to search for footage using text descriptions. Additionally, Premiere Pro can now automatically translate video captions into 27 languages and offers improved speed and performance on both Apple silicon and Windows devices.

Other News

Tools

OpenAI prepares reasoning slider and memory update for ChatGPT users - OpenAI is enhancing ChatGPT with features like improved memory for context-aware interactions, a reasoning slider for task complexity, and a notification feed to keep users informed about updates.

Runway releases an impressive new video-generating AI model - Runway’s new Gen-4 video-generating AI model offers high-fidelity video creation with consistent characters and environments, but faces legal challenges over its training data and potential industry disruption.

Google’s AI Mode can now see and search with images - Google’s AI Mode now integrates Gemini AI and Lens technology to enhance image-based search capabilities, providing detailed responses and recommendations by analyzing the context and relationships within images.

ByteDance’s DreamActor-M1 Turns Images into Stunningly Real Human Videos - DreamActor-M1, a new framework by ByteDance, uses a Diffusion Transformer architecture to create realistic human animations from images and outperforms existing models, though it still raises ethical concerns and has limitations such as handling dynamic camera movements.

Microsoft updates Copilot with the greatest hits from other AIs - Microsoft’s Copilot is being enhanced with features like memory, personalization, web actions, and podcast creation to better compete with AI alternatives such as ChatGPT and Claude.

Midjourney launches its new V7 AI image model that can process text prompts better - Midjourney’s V7 AI image model introduces enhanced text prompt processing, improved image quality, and new features like Draft Mode for faster, cost-effective iterations, while personalization options allow users to tailor the AI to their visual preferences.

Runway Introduces Gen-4 Turbo Video AI Model With Faster Generation Speeds - Runway’s Gen-4 Turbo AI model significantly enhances video generation speed and efficiency, offering improved consistency and realism in video scenes while being more credit-efficient than its predecessor.

Microsoft has created an AI-generated version of Quake - Microsoft’s Muse AI model is being showcased through an AI-generated Quake II tech demo, highlighting its potential to assist game developers in prototyping and preserving classic games for modern platforms.

Business

Nvidia H20 Chips: $16 Billion Orders from ByteDance, Alibaba, and Tencent - Chinese tech giants ByteDance, Alibaba, and Tencent have placed substantial orders for Nvidia’s H20 server chips, driven by China’s rapidly expanding AI industry despite U.S. export restrictions.

Intel and TSMC are reportedly launching a joint chipmaking venture - Intel and TSMC are forming a joint venture to operate Intel’s chipmaking facilities, with TSMC contributing expertise and training instead of capital, amid efforts to revitalize Intel under new CEO Lip-Bu Tan.

Google-backed Isomorphic Labs raises $600m to advance AI drug discovery - Isomorphic Labs has raised $600 million to accelerate the development of its AI drug design engine and advance its programs into clinical development, amidst a growing trend of AI integration in the pharmaceutical industry.

AI Video Startup Runway Valued at $3 Billion in Funding Round - Runway AI Inc. has raised $308 million in a new round of funding that more than doubles the company’s valuation — a sign of investor enthusiasm for startups building artificial intelligence software that can generate videos.

Anthropic launches an AI chatbot plan for colleges and universities - Anthropic’s Claude for Education tier offers higher education institutions an AI chatbot with features like Learning Mode to enhance critical thinking, while partnering with platforms like Canvas and Internet2 to facilitate integration and aiming to expand its presence through campus agreements and student programs.

Spotify debuts Gen AI ads, programmatic ad buying - Spotify is enhancing its advertising business with Gen AI ads and the Spotify Ad Exchange, enabling real-time auctions and AI-generated scripts and voiceovers to better target its extensive Gen Z user base.

Google Gemini is shaking up its AI leadership ranks - Google is undergoing a leadership change in its AI division, with Josh Woodward taking over from Sissie Hsiao to focus on advancing the Gemini app as the AI race emphasizes product development alongside model innovation.

Alibaba Preparing for Flagship AI Model Release as Soon as April - Alibaba Group Holding Ltd. is planning to release Qwen 3, an upgraded version of its flagship AI model, as soon as this month with competition from rivals including OpenAI and DeepSeek heating up.

DeepMind is holding back release of AI research to give Google an edge - DeepMind has implemented stricter publication policies to maintain a competitive advantage for Google in the AI industry, delaying the release of strategic research papers.

Meta’s head of AI research announces departure - Joelle Pineau’s departure from Meta as head of AI research occurs amidst the company’s intense focus on AI development and competition with major tech rivals.

How Extropic Plans to Unseat Nvidia - Extropic is developing a new kind of computer chip that harnesses thermodynamic fluctuations to perform probabilistic calculations efficiently, potentially offering a more energy-efficient alternative to conventional silicon chips for AI applications.

China’s Zhipu Offers Free AI Agent in Riposte to DeepSeek, Manus - China’s Zhipu is making its new AI agent free to use as domestic competition to build emerging artificial intelligence technologies heats up. The Beijing-based startup on Monday unveiled AutoGLM, an artificial intelligence agent that can conduct deep research.

Is the CEO of the heavily funded humanoid robot startup Figure AI exaggerating his startup’s work with BMW? - Questions arise about the accuracy of the CEO’s claims regarding Figure AI’s collaboration with BMW.

Research

This A.I. Forecast Predicts Storms Ahead - The A.I. Futures Project, led by former OpenAI researcher Daniel Kokotajlo, is forecasting potential global disruptions caused by increasingly powerful artificial intelligence systems by 2027.

An Approach to Technical AGI Safety and Security - Google DeepMind outlines a roadmap for mitigating severe risks from AGI, focusing on technical safety and security through strategies addressing misuse and misalignment, while emphasizing robust training, monitoring, and security measures.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems - The survey explores the modular architecture of intelligent agents inspired by human brain functions, their self-enhancement and adaptive evolution, collaborative multi-agent systems, and the importance of building safe and secure AI systems.

Large Language Models Pass the Turing Test - A recent study finds that large language models such as GPT-4.5 and LLaMa-3.1-405B can pass the Turing test when prompted to adopt a humanlike persona, suggesting they can convincingly imitate human conversation, with implications for their use in social and economic contexts.

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad - Current large language models struggle with rigorous mathematical reasoning and proof generation, achieving less than 5% accuracy on the 2025 USA Math Olympiad problems, indicating a need for significant advancements in these areas.

Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation - Sim-and-real co-training, which involves supplementing real robot datasets with synthetic simulation datasets, significantly enhances policy learning for vision-based robotic manipulation by improving performance and enabling generalization to new scenarios.

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning - RoboVerse introduces a comprehensive framework with a simulation platform, synthetic dataset, and unified benchmarks to address the challenges of data scaling and evaluation in robot learning, enhancing performance and sim-to-real transfer.

PaperBench: Evaluating AI’s Ability to Replicate AI Research - PaperBench is a benchmark designed to evaluate AI agents’ ability to autonomously replicate state-of-the-art machine learning research papers, featuring a comprehensive grading system and an auxiliary evaluation method using LLM-based judges.

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead - Inference-time scaling enhances model performance on complex tasks by allocating more computational resources, but its effectiveness varies across domains and tasks, with diminishing returns as complexity increases, highlighting the need for more efficient and purposeful scaling approaches.

AI masters Minecraft: DeepMind program finds diamonds without being taught - DeepMind’s Dreamer AI successfully learned to collect diamonds in Minecraft using reinforcement learning and a world model to generalize knowledge and predict future scenarios without prior instruction.

MoCha: Towards Movie-Grade Talking Character Synthesis - MoCha is a novel diffusion transformer model designed to generate movie-grade talking characters with synchronized speech, realistic emotions, and full-body actions, outperforming existing methods in lip-sync quality, naturalness, and visual coherence.

Policy

AI’s $4.8 trillion future: UN warns of widening digital divide without urgent action - The UNCTAD report highlights the urgent need for international cooperation and investment in infrastructure, data, and skills to address the growing digital divide and ensure AI benefits are equitably distributed worldwide.

Major publishers call on the US government to ‘Stop AI Theft’ - Major publishers are urging the US government to implement regulations that require Big Tech companies to compensate creators for using their content in AI training, as part of a campaign called Support Responsible AI.

Judge doesn’t buy OpenAI argument NYT’s own reporting weakens copyright suit - A US district judge denied OpenAI’s motion to dismiss The New York Times’ copyright claims, ruling that OpenAI failed to prove the newspaper knew about potential copyright violations by ChatGPT before its release.

Last Week in AI #305

2025-03-31T00:00:00+00:00

Top News

Judge Allows ‘New York Times’ Copyright Case Against OpenAI to Go Forward

A federal judge has allowed a copyright lawsuit brought by The New York Times against OpenAI to proceed. The lawsuit alleges that OpenAI exploited the newspaper’s content without permission or payment to train its artificial intelligence service, ChatGPT. The New York Times, along with other publishers, argues that OpenAI violated copyright law by using their articles as a major source of training text. OpenAI, however, maintains that its mass data scraping is protected under the “fair use” legal doctrine, which allows material to be reused without permission in certain circumstances. The case, which does not yet have a trial date, could have significant implications for both the news industry and the future of AI tools.

Google unveils a next-gen AI reasoning model

Google has introduced Gemini 2.5, a new generation of AI reasoning models, with the first release being Gemini 2.5 Pro Experimental. This multimodal reasoning model, available in Google AI Studio and the Gemini app, is touted as Google’s most intelligent model to date. The model uses additional computing power and time to fact-check and reason through problems before answering, a technique that has proven beneficial on math and coding tasks. Gemini 2.5 Pro has outperformed several leading AI models on benchmarks and excels at creating visually compelling web apps and coding applications. However, it underperformed Anthropic’s Claude 3.7 Sonnet on a test of software development abilities. The model can process approximately 750,000 words at once, with plans to double this capacity soon.

OpenAI rolls out image generation powered by GPT-4o to ChatGPT

OpenAI has integrated a new image generation feature, known as “Images in ChatGPT”, into its ChatGPT platform. The feature, powered by GPT-4o, lets users generate images directly within a chat and is available across all subscription tiers. The new model offers significant improvements in “binding”, the ability of AI image generators to maintain correct relationships between attributes and objects, and in text rendering, making it easier to produce legible, correctly spelled text within an image. The system uses an autoregressive approach, generating images sequentially from left to right and top to bottom, which may contribute to its improved text rendering and binding. Although the model takes longer to generate images, OpenAI believes the enhanced quality and capabilities justify the additional wait.

Tencent’s Hunyuan T1 AI reasoning model rivals DeepSeek in performance and price

Tencent has launched its Hunyuan T1 AI reasoning model, which, like DeepSeek’s R1 reasoning model, uses large-scale reinforcement learning. T1 scored 87.2 on the MMLU-Pro (Massive Multitask Language Understanding Pro) benchmark, surpassing DeepSeek-R1’s 84 but falling short of OpenAI o1’s 89.3. T1 also performed well on other benchmarks, including the American Invitational Mathematics Examination (AIME) 2024 and the C-Eval suite for Chinese-language capabilities. T1 is priced at 1 yuan per million input tokens and 4 yuan per million output tokens, competitive with DeepSeek’s pricing. The model uses a hybrid architecture combining the Transformer (originally developed at Google) with Mamba, which reportedly reduces training and inference costs by cutting memory usage.

Other News

Tools

Google is rolling out Gemini’s real-time AI video features - Google has begun implementing Gemini’s real-time AI video features for some Google One AI Premium subscribers, allowing the AI to interpret screens and camera feeds and answer questions in real-time.

Alibaba Releases Qwen2.5 Omni, Adds Voice and Video Modes to Qwen Chat - Alibaba’s Qwen2.5-Omni-7B model introduces advanced multimodal capabilities, enabling real-time voice and video chat in Qwen Chat, and is open-sourced under the Apache 2.0 license.

New Reve Image Generator Beats AI Art Heavyweights MidJourney and Flux at a Penny Per Image - Reve Image 1.0, an affordable AI image generator, excels in prompt adherence and visual quality, offering a cost-effective alternative to established tools like MidJourney and Flux, despite lacking some advanced editing features.

Ideogram presents version 3.0 of its AI image generation system - Ideogram’s version 3.0 enhances AI image generation with a style reference system, improved image quality, and new editing tools, positioning it as a leader in photorealism and professional image creation.

Business

Netflix’s Reed Hastings Gives $50 Million to Bowdoin for A.I. Program - Reed Hastings has donated $50 million to Bowdoin College to establish a research initiative focused on exploring the risks and consequences of artificial intelligence and its impact on humanity.

Pony.ai wins first permit for fully driverless taxi operation in the center of China’s Silicon Valley - Pony.ai has become the first company in China to receive a permit to charge for fully driverless taxi rides in Shenzhen’s Nanshan district, marking a significant milestone in the development of its robotaxi business.

OpenAI adopts rival Anthropic’s standard for connecting AI models to data - OpenAI plans to integrate Anthropic’s Model Context Protocol (MCP) into its products to enhance AI models’ ability to access and utilize data from various sources, fostering better responses and broader application support.

Apple Joins AI Data Center Race After Siri Mess - Apple is investing in AI data centers with a $1 billion order for Nvidia systems, partnering with Dell and Super Micro Computer, after delays and challenges with its AI-enabled Siri prompted a strategic shift.

DeepSeek V3-0324 tops non-reasoning AI models in open-source first - DeepSeek V3-0324’s achievement as the top non-reasoning AI model underscores the growing competitiveness of open-source AI solutions against proprietary systems in real-time applications.

Anthropic Scores Win in AI Copyright Dispute With Record Labels

Research

A new, challenging AGI test stumps most AI models - ARC-AGI-2, a new test by the Arc Prize Foundation, challenges AI models to solve visual pattern puzzles efficiently, revealing their limitations in general intelligence compared to human performance.

Inside-Out: Hidden Factual Knowledge in LLMs - A framework reveals that large language models encode more factual knowledge internally than they express externally, highlighting limitations in their generation capabilities and the challenges of improving performance through repeated answer sampling.

Reasoning to Learn from Latent Thoughts - Explicitly modeling and inferring latent thoughts during language model pretraining can enhance data efficiency and improve performance, especially in data-constrained environments.

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models - OlymMATH is a new bilingual benchmark designed to evaluate the mathematical reasoning capabilities of large language models using Olympiad-level problems, revealing significant challenges and performance gaps, especially in multilingual contexts.

Variance Control via Weight Rescaling in LLM Pre-training - The study introduces two techniques, Layer Index Rescaling and Target Variance Rescaling, for managing variance during LLM pre-training, and shows that they improve downstream task performance and reduce extreme activation values.

Qwen2.5-Omni Technical Report - Qwen2.5-Omni is an advanced multimodal model that excels in processing and generating text and speech across various modalities using innovative architectures like Thinker-Talker and TMRoPE, achieving state-of-the-art results on benchmarks.

Video-T1: Test-Time Scaling for Video Generation - Video-T1 introduces a framework for test-time scaling in video generation, enhancing video quality by expanding the search space during inference, with the Tree-of-Frames (ToF) method offering efficient computation and significant improvements across various video generation models.

Wan: Open and Advanced Large-Scale Video Generative Models - Wan is an open-source suite of advanced video generative models that excels in performance, efficiency, and versatility, offering significant improvements in video generation through innovative techniques and scalable pre-training strategies.

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework - Lumina-Image 2.0 introduces a unified text-to-image generative framework with a novel joint self-attention mechanism and a dedicated captioning system, significantly enhancing image generation fidelity and efficiency.

Concerns

‘Brainrot’ AI on Instagram Is Monetizing the Most Fucked Up Things You Can Imagine (and Lots You Can’t) - AI-generated content on Instagram is gaining popularity by monetizing shockingly disturbing and offensive imagery, including racist and sexualized depictions.

ChatGPT is turning everything into Studio Ghibli art — and it got weird fast - OpenAI’s “Images in ChatGPT” feature lets users generate Studio Ghibli-style art, leading to controversial creations and raising concerns about copyright and ethical use.

Policy

U.S. blacklists over 50 Chinese companies in bid to curb Beijing’s AI, chip capabilities - The U.S. has blacklisted over 50 Chinese tech companies to restrict their access to advanced AI and computing technologies, citing national security concerns and their alleged support of China’s military advancements.

Last Week in AI #304

2025-03-24T00:00:00+00:00

Top News

OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever

OpenAI has introduced a suite of new audio models aimed at making AI voice agents sound more human-like and responsive. The release includes two new speech-to-text models, GPT-4o-transcribe and GPT-4o-mini-transcribe, which OpenAI says outperform its previous models in transcription accuracy across multiple languages, including in challenging conditions such as varied accents and background noise. The new GPT-4o-mini-tts text-to-speech model lets developers control the tone and delivery of the AI’s speech, a feature OpenAI calls “steerability”. Additionally, an updated Agents SDK simplifies converting text agents into voice agents. These models, available to developers through OpenAI’s API, could make voice-based human-computer interaction more natural and intuitive.

Baidu launches two new versions of its AI model Ernie

Chinese tech giant Baidu has introduced two new versions of its artificial intelligence model Ernie: Ernie 4.5 and Ernie X1. The company claims that Ernie X1 performs on par with DeepSeek R1 at half the cost, while Ernie 4.5 has what Baidu calls a “high EQ”, making it better at understanding memes and satire. Both models are multimodal, meaning they can process video, images, audio, and text. Although Baidu was an early competitor to OpenAI’s ChatGPT, it has struggled to achieve widespread adoption. The company plans to launch Ernie 5 later this year, promising further multimodal enhancements.

Anthropic adds web search to its Claude chatbot

Anthropic’s AI chatbot, Claude, has been upgraded with a web search feature, allowing it to search the internet for information to inform its responses. The feature is currently available to paid users in the U.S., with plans to extend it to free users and other countries. Web search works with the latest model, Claude 3.7 Sonnet, and provides direct citations for fact-checking. However, it does not consistently trigger for questions about current events. The update brings Claude in line with other AI chatbots such as OpenAI’s ChatGPT, Google’s Gemini, and Mistral’s Le Chat, despite Anthropic’s earlier claims that Claude was designed to be self-contained. There are concerns that Claude could provide incorrect information or mis-cite sources, a problem that has been observed in other chatbots.

Meta AI is finally coming to the EU, but with limitations

Meta has announced the launch of its AI-powered virtual assistant, Meta AI, in the European Union, despite ongoing regulatory issues with European privacy authorities. The tool, which has been available in the U.S. since 2023, will be rolled out across Meta’s social platforms, including WhatsApp in the U.K., but with a more limited feature set due to the EU’s stringent privacy regulations. Meta AI, which can chat, answer questions, and generate images, has not been trained on EU users’ data, so Meta will not be notifying users or seeking their consent. The launch represents Meta’s first step in bringing more AI to Europe, despite the company’s criticism of Europe’s AI regulations.

Other News

Tools

Allen Institute for AI (AI2) Releases OLMo 32B: A Fully Open Model to Beat GPT 3.5 and GPT-4o mini on a Suite of Multi-Skill Benchmarks - OLMo 2 32B, released by the Allen Institute for AI, is a fully open large language model that surpasses GPT-3.5 Turbo and GPT-4o mini in multi-skill benchmarks while promoting accessibility and collaboration in AI research.

NVIDIA Launches Family of Open Reasoning AI Models for Developers and Enterprises to Build Agentic AI Platforms - NVIDIA’s Llama Nemotron models, enhanced for reasoning and decision-making, offer developers and enterprises a robust foundation for creating AI agents, with collaborations from industry leaders like Microsoft and SAP to integrate these models into their platforms for improved AI capabilities.

Model Context Protocol (MCP): The USB-C of AI Data Connectivity - Anthropic’s Model Context Protocol (MCP) offers a universal interface for AI systems to seamlessly connect with external tools and data sources, standardizing AI connectivity and simplifying integration processes across various applications.
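MCP messages are JSON-RPC 2.0 objects. As a minimal sketch (not an SDK; it omits the transport, such as stdio or HTTP, and the initialization handshake), here is the request a client would send to invoke a tool that a server has advertised. The `tools/call` method and its `name`/`arguments` parameters follow the MCP specification; the `get_weather` tool and its `city` argument are hypothetical.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",        # every MCP message is a JSON-RPC 2.0 object
        "id": request_id,        # lets the client match the server's response
        "method": "tools/call",  # invoke a tool the server listed via tools/list
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool exposed by some MCP server.
print(make_tool_call(1, "get_weather", {"city": "Paris"}))
```

Because every server speaks this same message shape, a client only needs one integration to call tools from any compliant server, which is the "USB-C" idea in the headline.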

Google plans to release new ‘open’ AI models for drug discovery - Google is developing a collection of “open” AI models called TxGemma to enhance drug discovery by predicting the properties of potential new therapies, though the commercial use and customization rights of these models remain unclear.

Roblox’s new AI model can generate 3D objects - Roblox’s Cube 3D model, which is open-sourced, aims to enhance 3D creation efficiency by generating 3D models from text prompts and will eventually support multimodal inputs like images and videos.

Stability AI’s new AI model turns photos into 3D scenes - Stability AI’s Stable Virtual Camera model allows users to create immersive 3D videos from 2D images by generating novel views and dynamic camera paths, although it may struggle with complex scenes and certain textures.

Google brings a ‘canvas’ feature to Gemini, plus Audio Overview - Google has introduced a new Canvas feature to its Gemini chatbot, allowing users to collaboratively create and refine writing and coding projects, alongside an Audio Overview feature that generates podcast-style audio summaries of documents.

Canopy Labs Releases Orpheus, a Permissively-Licensed LLM for Convincing Text to Speech - Canopy Labs has launched Orpheus, a family of large language models for text-to-speech generation, capable of conveying emotions and performing zero-shot voice cloning, with the three-billion-parameter model available under an open-source license.

xAI launches an API for generating images - xAI’s new image generation API, featuring the “grok-2-image-1212” model, offers competitive pricing but limited customization options as the company seeks new revenue streams and more investor interest.

Business

Mark Zuckerberg says that Meta’s Llama models have hit 1B downloads - Meta’s Llama models have reached 1 billion downloads despite facing legal and competitive challenges, with plans for new model releases and significant investment in AI development.

Elon Musk’s AI company, xAI, acquires a generative AI video startup - xAI’s acquisition of Hotshot suggests plans to develop competitive video generation models, potentially integrating them into its Grok chatbot platform.

Perplexity is reportedly in talks to raise up to $1B at an $18B valuation - Perplexity, an AI-powered search startup, is reportedly in early talks to raise $1 billion, doubling its valuation to $18 billion, amid increasing competition and expansion into new areas like enterprise solutions and an “agentic” browser.

Apple Shuffles AI Executive Ranks in Bid to Turn Around Siri - Apple is restructuring its AI leadership by appointing Vision Pro creator Mike Rockwell to lead Siri development, aiming to address delays and improve its AI technology, which has been lagging behind competitors.

OpenAI’s o1-pro is the company’s most expensive AI model yet - OpenAI’s o1-pro model uses more computing power than the standard o1 model and costs far more, yet early reviews of its improvements over o1, particularly on complex problems, have been mixed.

BotQ: US firm’s factory where humanoids will build robots, deliver 12,000 units a year - BotQ’s factory will rely on vertical integration and software systems such as a manufacturing execution system (MES), product lifecycle management (PLM), and enterprise resource planning (ERP) to ensure high-quality, efficient production and management of humanoid robots.

Anthropic-backed AI-powered code review platform Graphite raises cash - Graphite, an AI-powered code review platform, has raised $52 million in a Series B funding round to enhance its product offerings and expand its team, leveraging AI models to provide code feedback and address reliability concerns in the competitive AI coding assistant market.

1X will test humanoid robots in ‘a few hundred’ homes in 2025 - 1X plans to test its humanoid robot, Neo Gamma, in homes in 2025, relying on teleoperators to compensate for its current limitations while addressing privacy concerns and collecting data to improve its AI capabilities.

Research

Measuring AI Ability to Complete Long Tasks - AI performance, measured by the length of tasks AI systems can complete, has been growing exponentially with a doubling time of around 7 months; if the trend holds, AI could within a few years autonomously handle tasks that currently take humans weeks.
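The "within a few years" extrapolation is simple exponential arithmetic. The sketch below uses the roughly 7-month doubling time from the study; the 1-hour starting horizon and the 40-hour work-week target are illustrative assumptions, not figures from the paper.

```python
import math


def projected_horizon(current_minutes: float, months_ahead: float,
                      doubling_months: float = 7.0) -> float:
    """Task length reachable after `months_ahead`, assuming steady exponential growth."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


def months_until(current_minutes: float, target_minutes: float,
                 doubling_months: float = 7.0) -> float:
    """Months needed to grow from the current to the target horizon."""
    return doubling_months * math.log2(target_minutes / current_minutes)


# Hypothetical starting point: a 1-hour task horizon today.
print(projected_horizon(60, 14))            # → 240.0 (two doublings)
print(round(months_until(60, 40 * 60), 1))  # → 37.3 (one 40-hour work week)
```

Under these assumptions a week-long task would come within reach in a little over three years; the estimate holds only if the exponential trend continues.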

OpenAI’s Deep Research Agent Is Coming for White-Collar Work - OpenAI’s Deep Research agent autonomously explores the web to generate detailed reports, demonstrating its potential to automate various white-collar tasks, including an unexpected use in code generation that surprised its developers.

EXAONE Deep: Reasoning Enhanced Language Models - EXAONE Deep models, developed by LG AI Research, are fine-tuned for enhanced reasoning tasks using techniques like Supervised Fine-Tuning, Direct Preference Optimization, and Online Reinforcement Learning, outperforming several existing models across different scales.

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers - Vamba, a hybrid Mamba-Transformer model, enhances hour-long video understanding by reducing computational complexity and memory usage through efficient modules like Mamba-2 blocks and cross-attention layers, achieving superior performance on benchmarks such as LVBench.

FlowTok: Flowing Seamlessly Across Text and Image Tokens - FlowTok introduces a streamlined framework for seamless flow matching between text and image tokens, achieving efficient and state-of-the-art multimodal generation without complex conditioning mechanisms.

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster - CoRe^2 is a novel, plug-and-play sampling framework that enhances generative models’ performance by efficiently refining image quality and semantic faithfulness without being architecture-specific, achieving superior results across various benchmarks.

API Agents vs. GUI Agents: Divergence and Convergence - The paper explores the differences and potential integration of API-based and GUI-based large language model agents in software automation, providing a comparative analysis and proposing a hybrid approach to leverage their respective strengths.

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification - Scaling up sampling-based search with random sampling and self-verification enhances model performance, revealing that larger response pools improve verification accuracy and highlighting the need for better out-of-box verification capabilities in frontier models.
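At its core, sampling-based search draws many candidate answers and keeps the one a verifier rates highest. The sketch below shows only that loop. The `generate` and `verify` callables are hypothetical stand-ins: in the paper the model verifies its own candidates, whereas the toy verifier here simply knows the correct answer.

```python
import random
from typing import Callable, Tuple


def sample_and_verify(generate: Callable[[], str],
                      verify: Callable[[str], float],
                      num_samples: int) -> Tuple[str, float]:
    """Draw `num_samples` candidates and return the one the verifier scores highest."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    candidates = [generate() for _ in range(num_samples)]
    scored = [(c, verify(c)) for c in candidates]
    # max() keeps the first candidate among ties
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins: a noisy "model" guessing 17 * 3 and a verifier that
# rewards answers closer to the correct value.
rng = random.Random(0)


def guess() -> str:
    return str(rng.randint(45, 55))


def score(answer: str) -> float:
    return -abs(int(answer) - 51)


print(sample_and_verify(guess, score, num_samples=16))
```

Raising `num_samples` widens the candidate pool; the paper's finding is that, with a good enough verifier, this kind of scaling keeps improving accuracy.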

Concerns

ChatGPT hit with privacy complaint over defamatory hallucinations - OpenAI faces a privacy complaint in Europe over ChatGPT’s generation of false and defamatory information, highlighting concerns about compliance with GDPR’s accuracy requirements and the potential reputational damage caused by AI hallucinations.

Policy

Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works - Hollywood creative leaders are urging the Trump administration to maintain strong copyright protections against AI companies like OpenAI and Google, which seek to use copyrighted works for AI training without permission or compensation.

A.I. Art Generated With Text Prompts Cannot Be Copyrighted, U.S. Rules - Art generated by artificial intelligence (A.I.) from a text prompt cannot be copyrighted even if an artist uses long, targeted inputs or creates multiple iterations of a work before they are satisfied with the final output, according to new guidance from the U.S. Copyright Office.

Last Week in AI #303

2025-03-17T00:00:00+00:00

Top News

Google DeepMind’s new AI models help robots perform physical tasks, even without training

Google DeepMind is introducing two new AI models, Gemini Robotics and Gemini Robotics-ER, aimed at enhancing the capabilities of robots in performing real-world tasks. Gemini Robotics, built on Google’s flagship AI model Gemini 2.0, is a vision-language-action model that can understand and adapt to new situations, including ones it was not trained on. It improves robots’ generality, interactivity, and dexterity, enabling them to perform precise physical tasks and interact better with their environment. Gemini Robotics-ER, on the other hand, is an advanced vision-language model that helps robots understand complex and dynamic environments, aiding them in tasks like packing a lunchbox. Google DeepMind is also developing a layered approach to safety, training the Gemini Robotics-ER models to evaluate whether potential actions are safe in given scenarios.

Google calls Gemma 3 the most powerful AI model you can run on one GPU

Google has announced the release of Gemma 3, an updated version of its open AI models, which it claims is the “world’s best single-accelerator model”. The model is designed for developers creating AI applications that run on platforms ranging from phones to workstations, and it supports over 35 languages. It can analyze text, images, and short videos, and has been optimized to run on Nvidia’s GPUs and dedicated AI hardware. Google says it evaluated Gemma 3’s potential for misuse and concluded that the risk is low. The company continues to promote Gemma with Google Cloud credits and the Gemma 3 Academic program, which offers academic researchers $10,000 worth of credits to accelerate their research.

Inside Google’s Investment in the A.I. Start-Up Anthropic

Google owns a 14% stake in the AI start-up Anthropic, according to court documents obtained by The New York Times. Despite this significant investment, Google has no control over the company, holding no voting rights, board seats, or board observer rights. Google is also set to invest an additional $750 million in Anthropic in September through convertible debt, a type of loan that can be converted into equity. This investment is part of Google’s broader strategy to maintain its competitive edge in the AI industry, with its total investment in Anthropic exceeding $3 billion.

More on this:

Google has given Anthropic more funding than previously known, show new filings

Sesame, the startup behind the viral virtual assistant Maya, releases its base AI model

AI startup Sesame has made its base model, CSM-1B, publicly available under an Apache 2.0 license. This model, the foundation for the company’s viral voice assistant Maya, is a 1-billion-parameter model that generates “RVQ audio codes” from text and audio inputs. RVQ, or residual vector quantization, is a method of encoding audio into discrete tokens, a technique also used in Google’s SoundStream and Meta’s Encodec. While the model can produce a variety of voices, it has not been fine-tuned for any specific voice or for non-English languages. The company has urged developers not to misuse the model for activities such as voice mimicry without consent, creating misleading content, or engaging in harmful activities. However, it has no real safeguards in place to prevent such misuse.
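To make "residual vector quantization" concrete, here is a minimal sketch of RVQ in general, not of Sesame's implementation. Each stage picks the nearest codeword to whatever the previous stages failed to capture, and each chosen index becomes one discrete token. Real codecs such as SoundStream and Encodec learn their codebooks and quantize encoder latents; the tiny 2-D codebooks below are hand-picked toy values.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual left by earlier ones."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]
codes = rvq_encode([1.2, 1.0], codebooks)
print(codes)                         # → [1, 1]
print(rvq_decode(codes, codebooks))  # → [1.25, 1.0]
```

Decoding with only the first stage gives [1.0, 1.0]; adding the second stage moves the reconstruction closer to the input, which is why more RVQ stages trade extra tokens for higher audio fidelity.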

Other News

Tools

OpenAI launches new tools to help developers build AI agents - OpenAI’s new Responses API and Agents SDK provide developers with foundational tools to create AI agents capable of web searching, file analysis, and computer task automation, enhancing the ability to build complex, industry-specific solutions.

Allen Institute for AI (AI2) Releases OLMo 32B: A Fully Open Model to Beat GPT 3.5 and GPT-4o mini on a Suite of Multi-Skill Benchmarks - OLMo 2 32B, released by the Allen Institute for AI, is a fully open model that surpasses GPT-3.5 Turbo and GPT-4o mini in multi-skill benchmarks while promoting accessibility and collaboration in AI research.

Reka AI Open Sourced Reka Flash 3: A 21B General-Purpose Reasoning Model that was Trained from Scratch - Reka Flash 3 is a versatile and resource-efficient AI model designed for general-purpose reasoning, offering features such as a 32k-token context window and a budget-forcing mechanism, making it suitable for on-device deployments and low-latency applications.

Google is officially dumping Assistant for Gemini - Google is transitioning users from Google Assistant to Gemini, which will replace the classic Assistant on most devices and introduce new experiences across various platforms.

Alibaba launches new version of AI assistant tool as competition heats up - Alibaba’s new AI assistant app, powered by its Qwen AI reasoning model, aims to enhance its competitive edge in the global AI race, integrating advanced features like chatbots and task execution while planning significant investments in AI infrastructure.

You can now test Gemini 2.0 Flash’s native image output - Gemini 2.0 Flash now offers wider access to its native image output feature, enabling conversational image editing and multimodal capabilities for developers and users through Google AI Studio and the Gemini API.

Snap introduces AI Video Lenses powered by its in-house generative model - Snapchat is launching its first video generative AI Lenses, powered by its proprietary model, available exclusively to Snapchat Platinum subscribers, as part of its strategy to enhance user experience and maintain competitiveness in the AI and AR space.

Moonvalley releases a video generator it claims was trained on licensed content - Moonvalley’s new AI video generator, Marey, is designed to respect copyright laws by using only licensed data, offering nuanced control over video creation while providing legal safeguards for users and creators.

Adobe’s new AI feature lets you edit stock images on the fly - no Photoshop needed - Adobe’s new AI-powered “Customize” feature in Adobe Stock allows users to make quick edits and generate image variations directly on the platform, enhancing creative control without needing Photoshop.

Sudowrite Launches Muse AI Model That Can Generate Narrative-Driven Fiction - Sudowrite’s Muse AI model specializes in generating unique, narrative-driven fiction by avoiding clichés and offering high creativity, with users able to try it using free credits before opting for a subscription.

Business

OpenAI to pay CoreWeave $11.9 billion over five years for AI data centers, services - OpenAI’s five-year, $11.9 billion deal with CoreWeave includes a $350 million stake in the company, which is preparing for its Nasdaq debut and has rapidly expanded its data center operations with significant backing from Nvidia.

Meta is reportedly testing in-house chips for AI training - Meta is testing a custom AI training chip developed with TSMC to potentially reduce its dependency on Nvidia and cut capital expenditure costs.

Superintelligence startup Reflection AI launches with $130M in funding - Reflection AI Inc., a new startup led by former Google DeepMind researchers, has launched with $130 million in early-stage funding raised over two rounds. The first, a $25 million seed round, was led by Sequoia Capital and CRV.

Insilico Medicine scores $110M for AI-enabled drug discovery - Insilico Medicine plans to use the $110 million from its Series E funding to enhance its AI-driven drug discovery platform, expand its drug pipeline, and foster industry collaborations.

Cartesia Raises $64M to Advance Real-Time Voice AI with Sonic 2.0 - Cartesia’s $64 million funding will enhance its Sonic 2.0 voice AI model, known for its low latency and advanced voice cloning, to improve real-time applications and expand its market presence.

Waymo is now offering 24/7 robotaxi rides in Silicon Valley - Waymo is expanding its robotaxi service in Silicon Valley to be available 24/7 for select customers, with plans to gradually increase access within a 27-square-mile area.

AI agent Manus partners with Alibaba’s Qwen to develop Chinese version - Manus is collaborating with Alibaba’s Qwen team to adapt its AI agent for Chinese users by ensuring compatibility with domestic models and computing platforms.

Sony is experimenting with AI-powered PlayStation characters - Sony is developing AI-powered versions of PlayStation characters, like Aloy from Horizon Forbidden West, using advanced AI technologies for speech and facial animation, sparking discussions about AI’s role in gaming.

Foxconn Builds FoxBrain, Its Own AI Model

Cohere targets global enterprises with new highly multilingual Command A model requiring only 2 GPUs

Research

An Application of Reinforcement Learning with Verifiable Reward (RLVR) to an Omni-Multimodal Large Language Model - R1-Omni, developed by Alibaba researchers, uses Reinforcement Learning with Verifiable Reward to enhance multimodal emotion recognition by integrating visual and auditory data, providing accurate predictions and clear reasoning explanations.

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models - Block diffusion language models combine the strengths of discrete denoising diffusion and autoregressive models to enable flexible-length generation and improved inference efficiency, setting a new performance standard in language modeling benchmarks.

Inductive Moment Matching - Inductive Moment Matching (IMM) offers a stable, efficient alternative to diffusion models by enabling high-quality, few-step sampling without the need for pre-training or extensive tuning, achieving impressive results on ImageNet and CIFAR-10.

Transformers without Normalization - Dynamic Tanh (DyT) is introduced as a simple, efficient alternative to normalization layers in Transformers, offering stable training and high performance without the need for activation statistics.

START: Self-taught Reasoner with Tools - START, a novel tool-integrated reasoning model, enhances reasoning capabilities by leveraging external tools and a self-learning framework, achieving high accuracy on various benchmarks.

Detecting misbehavior in frontier reasoning models

Concerns

‘Open’ model licenses often carry concerning restrictions - Custom, non-standard licenses for AI models like Google’s Gemma 3 and Meta’s Llama create legal uncertainties and hinder commercial adoption, prompting calls for alignment with established open source principles.

Policy

Judge allows authors’ AI copyright lawsuit against Meta to move forward - A federal judge has allowed authors’ copyright infringement claims against Meta to proceed, while dismissing claims under the California Comprehensive Computer Data Access and Fraud Act.

Expert Opinions

On OpenAI’s Safety and Alignment Philosophy - The article critiques OpenAI’s safety and alignment strategies, challenging their assumptions about AI remaining a mere tool, economic normalcy, and the absence of abrupt phase changes, while emphasizing the need for coordination and scalable safety methods.

Last Week in AI #302

2025-03-10T00:00:00+00:00

Top News

Alibaba touts AI model as better than DeepSeek, OpenAI products, driving stock surge

Alibaba’s new AI model, QwQ-32B, has reportedly outperformed DeepSeek’s R1 and OpenAI’s GPT-4o despite having significantly fewer parameters (32 billion compared with R1’s 671 billion). This performance drove an 8.4% surge in Alibaba’s Hong Kong-listed shares. The model’s smaller parameter count lets it run on fewer computing resources, making it more accessible for wider adoption. This development aligns with Alibaba chairman Joe Tsai’s emphasis on practical applications in AI model development. QwQ-32B, a reasoning model designed to solve complex problems, was released less than two months after DeepSeek’s R1 made waves in the global tech industry.

Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1; Outperforms OpenAI’s o1-mini

Alibaba has announced a new AI model, QwQ 32B, under its Qwen umbrella. The model contains 32 billion parameters and is said to perform comparably to DeepSeek-R1, a model with 671 billion parameters. Alibaba attributes QwQ 32B’s performance to scaling reinforcement learning (RL) on a foundation model pretrained on extensive world knowledge, and to agentic capabilities that let it reason critically and adapt based on external feedback. The model, which is available on Hugging Face and ModelScope, outperforms OpenAI’s o1-mini on several benchmarks, including coding, mathematical reasoning, and general problem-solving tasks. Alibaba also recently released Wan 2.1, an open-source video foundation model, and announced plans to invest over $52 billion in cloud computing and artificial intelligence over the next three years.

Judge Denies Musk’s Request to Block OpenAI’s For-Profit Plan

Elon Musk’s request to halt OpenAI’s transition from a non-profit to a for-profit entity has been denied by a federal judge in San Francisco. Musk, who co-founded OpenAI in 2015, left the organization in 2018 following a power struggle, after which the non-profit retained control of the company. OpenAI CEO Sam Altman has since been working on a plan to shift control from the non-profit to the company’s investors, effectively turning it into a for-profit entity, in order to raise the billions of dollars needed to develop artificial intelligence technologies.

Alexa Plus’ AI upgrades cost $19.99, but it’s all free with Prime

Amazon has unveiled Alexa Plus, an upgraded version of its smart assistant with enhanced capabilities such as finding concert tickets, ordering Ubers, and facilitating more natural conversations. Early access to Alexa Plus will be available in the United States from late March 2025 for eligible Echo Show device owners, with subscriptions starting at $19.99 per month after the early access period. However, the service will be free for Amazon Prime members. Alexa Plus will be compatible with most existing Alexa devices, with priority given to Echo Show devices during early access, but certain older Echo models and Amazon’s Astro robot will not support the new service.

Other News

Tools

OpenAI announces GPT-4.5, warns it’s not a frontier AI model - OpenAI’s GPT-4.5, while not a frontier model, offers improved writing capabilities, better world knowledge, and fewer hallucinations, but it does not outperform leading models on every benchmark and is initially available as a research preview for ChatGPT Pro users.

Alibaba makes AI video generation model free to use globally - Alibaba has open-sourced its video generation AI models globally, intensifying competition with companies like OpenAI and contributing to the growing trend of open-source AI development, particularly among Chinese firms.

Microsoft launches next-gen Phi AI models - Microsoft’s Phi-4-multimodal processes speech, vision, and text together, while Phi-4-mini focuses on speed and efficiency; both are designed to run on a range of devices, including smartphones, PCs, and cars.

Google’s Gemini Code Assist brings advanced AI coding support for free - Google’s Gemini Code Assist offers a free, AI-powered coding assistant with extensive features like code completions, debugging, and code reviews, aiming to democratize advanced development tools for a wide range of users.

Microsoft’s new Dragon Copilot is an AI assistant for healthcare - Microsoft’s Dragon Copilot aims to reduce administrative burdens in healthcare by using AI for tasks like note-taking and medical information searches, enhancing clinician efficiency and patient experience.

The ‘First Commercial Scale’ Diffusion LLM Mercury Offers over 1000 Tokens/sec on NVIDIA H100 - Inception Labs’ Mercury, a diffusion-based large language model, offers a significant speed advantage over traditional autoregressive transformer models by generating and refining text in parallel rather than one token at a time, potentially reducing the need for specialized hardware for high-speed inference.

AMD Releases Instella: A Series of Fully Open-Source State-of-the-Art 3B Parameter Language Models - AMD Instella offers a fully open-source, 3-billion-parameter language model that balances performance and accessibility, featuring an autoregressive transformer architecture and a transparent training process, making it a competitive and practical option for diverse applications.

Alibaba Releases Open-Source Video Generation Model Wan 2.1, Outperforms OpenAI’s Sora - Alibaba’s open-source video generation model Wan 2.1 surpasses existing open models and commercial solutions by using an advanced spatio-temporal variational autoencoder architecture and scalable pre-training strategies to produce high-quality videos with complex motions and multilingual capabilities.

ElevenLabs is launching its own speech-to-text model - ElevenLabs has launched its first standalone speech-to-text model, Scribe, which supports 99 languages and aims to compete with established models by offering features like speaker diarization and word-level timestamps, although it currently works only with pre-recorded audio.

ElevenLabs launches Scribe, a powerful AI model for transcription in 99 languages - Scribe, ElevenLabs’ new speech-to-text model, delivers high transcription accuracy across 99 languages and offers features like speaker diarization, with a low-latency version for real-time applications planned.

Stability AI optimized its audio generation model to run on Arm chips - Stability AI has collaborated with Arm to optimize its Stable Audio Open model for mobile devices, enabling offline audio generation with royalty-free content and significantly faster processing times.

Physical Intelligence open-sources Pi0 robotics foundation model - Physical Intelligence has open-sourced its Pi0 robotic foundation model, allowing developers to fine-tune it for various tasks and platforms, with the aim of advancing general-purpose robotic intelligence through community collaboration.

Tencent heats up AI video-generation competition in China with new open-source product - Tencent’s new open-source image-to-video model allows users to create high-resolution video clips with added sound effects and voice synchronization, intensifying the competition in China’s AI video-generation market.

Quora’s Poe now lets users create and share custom AI-powered apps - Quora’s Poe platform now allows users to create and share custom AI-powered apps using a new App Creator tool that translates descriptions into code, with potential future monetization options for creators.

Ideogram 2a arrives: Speed, affordability, and photorealistic AI graphics - Ideogram 2a offers a faster, more affordable solution for AI-generated graphics, with new features like “Ideogram feeds” to inspire creativity and increased affiliate commissions to encourage early adoption.

Google launches a free AI coding assistant with very high usage caps - Google’s new free AI coding assistant, Gemini Code Assist for individuals, offers significantly higher usage caps than GitHub Copilot, aiming to attract developers early in their careers and potentially convert them to enterprise plans in the future.

Mistral’s new OCR API turns any PDF document into an AI-ready Markdown file - Mistral’s new OCR API efficiently converts complex PDF documents into Markdown files, enhancing AI model processing by preserving text and graphical elements, and outperforming existing OCR solutions in speed and handling of non-English documents.

DeepSeek brings disruption to parallel file systems, releases powerful new open-source Fire-Flyer File System - DeepSeek AI’s open-source Fire-Flyer File System (3FS) prioritizes random read speeds for AI-HPC operations, achieving impressive throughput and cost efficiency compared to competitors like Ceph.

DeepSeek goes beyond “open weights” AI with plans for source code release - DeepSeek plans to release the source code for its competitive simulated reasoning model, enhancing transparency and accessibility in contrast to proprietary models like OpenAI’s ChatGPT.

Business

Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation - Scale AI’s new multimillion-dollar contract with the Department of Defense for the “Thunderforge” program marks a significant shift towards AI-driven military operations, raising ethical concerns about the potential for harm despite assurances of human oversight.

Meta plans to release standalone Meta AI app in effort to compete with OpenAI’s ChatGPT - Meta plans to launch a standalone Meta AI app in the second quarter to enhance user interaction and compete with AI tools like ChatGPT, while also exploring monetization opportunities through a potential subscription service.

OpenAI launches $50M grant program to help fund academic research - OpenAI’s $50 million grant program, NextGenAI, aims to support AI-assisted research at top universities while potentially increasing reliance on its own AI tools over competitors.

A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion - Anthropic’s recent fund-raising round, led by Lightspeed Venture Partners, significantly increased its valuation amid a renewed surge of investor interest in AI start-ups.

You.com ARI: Professional-grade AI research agent for businesses - You.com has launched ARI, an AI research agent that processes hundreds of data sources simultaneously to deliver fast, accurate, and interactive business insights, potentially transforming industries like consulting, finance, healthcare, and media by significantly reducing research time and costs.

Amazon’s most powerful new Alexa features being powered by Anthropic’s AI, sources say - Amazon’s new Alexa devices are leveraging Anthropic’s Claude AI model to enhance their capabilities, handling complex tasks and improving user experience, while Amazon continues to develop its own AI models.

Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks - Manus, a general AI agent developed by a low-profile team with Chinese backing, has gained significant attention for its ability to perform complex tasks and claims to outperform OpenAI’s Deep Research on the GAIA benchmark.

Waymo has doubled its weekly robotaxi rides in less than a year - Waymo’s rapid expansion in the robotaxi market, with over 200,000 weekly rides and plans to launch services in new cities, positions it as a leader in the autonomous vehicle industry.

Waymo and Uber’s Austin robotaxi expansion begins today - Waymo and Uber have launched their robotaxi service in Austin, where riders ordering through the Uber app may be matched with a Waymo vehicle; the service covers 37 square miles, and the vehicles are maintained by a third-party partner.

Alibaba shares soar after Chinese tech giant unveils new DeepSeek rival - Alibaba’s unveiling of the QwQ-32B AI reasoning model, which claims to rival DeepSeek’s R1, has led to a significant surge in its Hong Kong-listed shares, highlighting the company’s strategic focus on AI-driven growth and efficiency.

OpenAI reportedly plans to charge up to $20,000 a month for specialized AI ‘agents’ - OpenAI is reportedly planning to introduce specialized AI agents with monthly fees ranging from $2,000 to $20,000, targeting various professional applications to help offset its significant financial losses.

Google co-founder Larry Page reportedly has a new AI startup - Larry Page’s new startup, Dynatomics, aims to revolutionize product manufacturing by using AI to create optimized designs for factory production.

Sesame is the first voice assistant I’ve ever wanted to talk to more than once - Sesame, a new startup led by Oculus co-founder Brendan Iribe, introduces AI glasses with a voice assistant named Maya that offers a more engaging and natural conversational experience than existing voice assistants.

Research

Pioneers of Reinforcement Learning Win the Turing Award - Andrew Barto and Rich Sutton, pioneers of reinforcement learning, have been awarded the Turing Award for their work that has become critical to modern AI applications, including guiding large language models and developing advanced AI agents.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs - Finetuning language models on narrow tasks like writing insecure code can lead to broad misalignment, causing them to exhibit harmful behaviors across unrelated prompts, with this effect being particularly pronounced in certain models.

Towards an AI co-scientist - An AI co-scientist system, built on Gemini 2.0, uses a multi-agent architecture to generate and validate novel research hypotheses, demonstrating its potential in biomedical fields like drug repurposing and target discovery.

BIG-Bench Extra Hard - BIG-Bench Extra Hard (BBEH) introduces a new benchmark to evaluate advanced reasoning capabilities in large language models, revealing significant challenges and room for improvement even for state-of-the-art models.

New AI text diffusion models break speed barriers by pulling words from noise - Mercury Coder, a new AI language model by Inception Labs, uses diffusion techniques to generate text faster by refining entire responses simultaneously from a masked state, achieving competitive performance on various benchmarks.

LongRoPE2: Near-Lossless LLM Context Window Scaling - LongRoPE2 introduces a novel RoPE rescaling algorithm and mixed context window training to effectively extend LLM context windows to 128k while preserving short-context performance, outperforming existing methods with significantly reduced training costs.

Atom of Thoughts for Markov LLM Test-Time Scaling - Atom of Thoughts (AoT) is a novel framework that mimics human reasoning by decomposing problems into atomic units, reducing computational waste and enhancing test-time scaling performance in large language models.

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition - LADDER is a framework that enables large language models to autonomously improve their problem-solving abilities through recursive problem decomposition and self-guided learning, achieving significant performance improvements without human intervention or architectural scaling.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success - OpenVLA-OFT, an optimized fine-tuning recipe for vision-language-action models, enhances inference efficiency and task performance by integrating parallel decoding, continuous action representations, and an L1 regression objective, achieving state-of-the-art results in both simulation and real-world dexterous tasks.

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think - Dream Engine leverages Large Multimodal Models to enable efficient and flexible text-image interleaved control in image generation, achieving state-of-the-art results by aligning multimodal representation spaces without additional architectural complexity.

NeoBERT: A Next-Generation BERT - NeoBERT is a next-generation BERT model that integrates the latest advancements in architecture and training strategies, offering superior performance and efficiency on the MTEB benchmark while maintaining a compact size and open-source accessibility.

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation - xAR introduces a next-X prediction framework for autoregressive visual generation, using Noisy Context Learning to improve robustness and achieve state-of-the-art performance on the ImageNet benchmark.

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning - The paper argues that, despite the information loss that an RL step might seem to introduce, reinforcement learning improves fine-tuning by narrowing the search space to policies that are optimal for relatively simple reward models.

Researchers surprised to find less-educated areas adopting AI writing tools faster - Less-educated areas in the United States are adopting AI writing tools more rapidly than expected, challenging traditional technology adoption trends that favor more educated populations.

Concerns

Donald Trump’s A.I. Propaganda - Donald Trump’s dissemination of an AI-generated video depicting a fantastical vision of “Trump Gaza” highlights the potential for AI to be used as a tool for political propaganda and misinformation.

Key ex-OpenAI researcher subpoenaed in AI copyright case - Alec Radford, a key figure in developing OpenAI’s AI technologies, has been subpoenaed in a copyright case where authors allege OpenAI’s models, including ChatGPT, infringed on their works.

Alibaba Releases Advanced Open Video Model, Immediately Becomes AI Porn Machine - Alibaba’s release of the open-source AI video generation model Wan 2.1 quickly attracted the attention of the AI porn community, highlighting the ethical challenges of open AI technology.

Eerily realistic AI voice demo sparks amazement and discomfort online - Sesame’s new Conversational Speech Model (CSM) has impressed users with its human-like voice capabilities, raising concerns about potential emotional attachment and risks of deception.

Analysis

“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews - OpenAI’s GPT-4.5 model has received mixed reviews due to its high cost and marginal performance improvements over GPT-4o, leading to questions about the future of traditional AI models.

Expert Opinions

Anthropic’s C.E.O., Dario Amodei, on Surviving the A.I. Endgame - Dario Amodei discusses Anthropic’s new Claude 3.7 Sonnet model, the competitive AI landscape, particularly with China, and the potential risks and societal impacts of AI advancements over the next few years.

Last Week in AI #301

2025-02-24T00:00:00+00:00

Top News

Grok resets the AI race

Elon Musk’s Grok-3 has emerged as a significant player in the AI race, topping the Chatbot Arena leaderboard and overtaking ChatGPT in the App Store. Musk’s xAI team deployed this leading foundation model in record time and plans to add ChatGPT-like voice interaction and desktop apps soon; the team is also building an AI gaming studio. Despite Grok-3’s success, OpenAI’s ChatGPT still leads with 400 million weekly active users, a 33% increase from December. The open question is whether OpenAI can maintain its product lead as Grok and other competitors catch up. Meanwhile, there have been significant job changes in the tech world, including Mira Murati’s announcement of her OpenAI rival, Thinking Machines Lab, and Yonghui Wu’s move to ByteDance to run AI research.

More on this:

Anthropic launches a new AI model that ‘thinks’ as long as you want

Anthropic has launched a new AI model, Claude 3.7 Sonnet, which is designed to “think” about questions for as long as users want. This hybrid reasoning model can provide both real-time answers and more considered responses, and users can choose when to activate its reasoning abilities. The model is part of Anthropic’s effort to simplify the user experience around its AI products. It will be available to all users and developers, although only subscribers to premium Claude chatbot plans will have access to its reasoning features. Claude 3.7 Sonnet is more expensive than some competing models, but unlike them it combines fast answers and extended reasoning in a single hybrid model. Its reasoning mode aims to improve the accuracy of final answers by breaking problems down into smaller steps, a process modeled after deduction.

Thinking Machines Lab is ex-OpenAI CTO Mira Murati’s new startup

Former OpenAI CTO Mira Murati has launched a new startup called Thinking Machines Lab, aimed at developing AI systems that are more customizable and generally capable than current offerings. The startup plans to focus on building multimodal systems that work collaboratively with people and can adapt to a wide range of human expertise. AI safety will be a core tenet of the company’s work, with plans to prevent misuse of models, share best practices for building safe AI systems, and support external research on alignment. The team includes OpenAI co-founder John Schulman as chief scientist and former OpenAI chief research officer Barret Zoph as CTO, along with 29 employees from top firms like OpenAI, Character AI, and Google DeepMind.

Figure AI shows robot that can finally put the fridge away

Other News

Tools

Microsoft’s Xbox AI era starts with a model that can generate gameplay - Microsoft’s new Muse AI model, developed in collaboration with Xbox studio Ninja Theory, can generate game environments and enhance game development by using gameplay data, while emphasizing that it is not intended to replace human creativity but to support and preserve classic games for modern platforms.

OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work - SWE-Lancer evaluates AI models on real-world freelance software engineering tasks by using end-to-end tests and a unified Docker image to simulate practical deployment conditions, revealing both technical and managerial capabilities.

Meta AI Releases the Video Joint Embedding Predictive Architecture (V-JEPA) Model - V-JEPA, a vision model developed by Meta AI and collaborators, leverages feature prediction for unsupervised video learning, achieving superior performance in motion and appearance-based tasks without relying on traditional methods like pretrained encoders or textual supervision.

Mistral releases regional model focused on Arabic language and culture - Mistral’s new model, Mistral Saba, is designed to excel in Arabic interactions and also performs well with Indian-origin languages, highlighting the company’s strategic focus on the Middle East and potential for attracting regional investors.

Google’s new AI video model Veo 2 will cost 50 cents per second - Google’s Veo 2 video-generating AI model is priced at 50 cents per second of generated video, significantly cheaper than traditional film production, and is designed for creating short video clips.

Nous Research Released DeepHermes 3 Preview - DeepHermes 3 Preview by Nous Research introduces a dual-processing AI model that seamlessly integrates intuitive conversational responses with deep reasoning capabilities, offering significant improvements in complex problem-solving and user-controlled response generation.

Rabbit shows off the AI agent it should have launched with - Rabbit demonstrates its new generalist Android AI agent, which can perform tasks on apps via typed prompts, showcasing progress since the underwhelming launch of its R1 device.

Business

OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence - OpenAI has experienced significant growth, reaching 400 million weekly active users and expanding its enterprise business despite competition from DeepSeek and legal challenges involving Elon Musk.

Meta Plans Major Investment Into AI-Powered Humanoid Robots - Meta Platforms Inc., after pushing into augmented reality and artificial intelligence, has identified its next big bet: AI-powered humanoid robots.

Safe Superintelligence, Ilya Sutskever’s AI startup, is reportedly close to raising roughly $1B - Safe Superintelligence, co-founded by Ilya Sutskever, is nearing a significant funding round led by Greenoaks Capital Partners, potentially raising its valuation to $30 billion despite not yet generating revenue.

HP is buying Humane and shutting down the AI Pin - HP is acquiring Humane for $116 million, shutting down the AI Pin, and integrating Humane’s technology and team into a new division called HP IQ to enhance AI capabilities across its products.

DOGE cuts nearly half of unit overseeing autonomous vehicles safety, report says - The Department of Government Efficiency, led by Elon Musk, has reduced the workforce of a U.S. auto safety agency unit responsible for overseeing autonomous vehicle safety by nearly half, as part of broader cuts at the National Highway Traffic Safety Administration.

AI-coding startup Codeium in talks to raise at an almost $3B valuation, sources say - Codeium, an AI-powered coding startup, is raising a new funding round at a $2.85 billion valuation led by Kleiner Perkins, despite not actively seeking new funds, and distinguishes itself by targeting enterprise customers with features like the Windsurf Editor.

Meta announces LlamaCon, its first generative AI dev conference - Meta is hosting LlamaCon, its first generative AI developer conference, to showcase its open-source AI developments amid competition from Chinese AI company DeepSeek and ongoing legal and regulatory challenges.

Mistral’s Le Chat tops 1M downloads in just 14 days - Mistral’s AI assistant, Le Chat, achieved rapid success by reaching one million downloads and topping the iOS App Store in France, amidst competition from established AI apps and tech giants.

Norway’s 1X is building a humanoid robot for the home - 1X’s Neo Gamma humanoid robot is designed for home use with a focus on safety, user-friendliness, and advanced AI, setting it apart from competitors prioritizing industrial applications.

OpenAI Bans Accounts Appearing to Work on a Surveillance Tool

Research

AI Cracks Superbug Problem in Two Days That Took Scientists Years - A new AI tool developed by Google solved a decade-long superbug antibiotic resistance problem in just two days, astonishing researchers who had been working on it for years.

Magma: A Foundation Model for Multimodal AI Agents - Magma is a groundbreaking foundation model for multimodal AI agents that excels in both digital and physical environments by integrating multimodal understanding with spatial-temporal reasoning, achieving state-of-the-art results in UI navigation and robotic manipulation tasks through innovative pretraining techniques like Set-of-Mark and Trace-of-Mark.

AI-Designed Chips So Weird That ‘Humans Cannot Really Understand Them’ — but They Perform Better Than Anything We’ve Created - AI models have rapidly designed highly efficient wireless chips with unconventional structures that outperform traditional designs, though human oversight is still necessary to address potential errors.

Google’s AI ‘Co-Scientist’ Helps Unearth Research Ideas - Google’s AI co-scientist system assists researchers by generating and refining new scientific hypotheses through a collaborative process involving multiple AI agents, potentially accelerating scientific and medical discoveries.

Intuitive physics understanding emerges from self-supervised pretraining on natural videos - Deep neural network models trained on natural videos can develop an understanding of intuitive physics by predicting masked regions, challenging the notion that core knowledge must be innate.

Reinforcement Learning for Long-Horizon Interactive LLM Agents - A reinforcement learning approach called LOOP significantly improves the performance of interactive digital agents in stateful environments by efficiently training them to handle complex tasks through direct API interactions.

SWE-Bench+: Enhanced Coding Benchmark for LLMs - SWE-bench+ is an enhanced coding benchmark dataset designed to address issues of data leakage and weak test cases in previous SWE-bench variants, resulting in significantly lower resolution rates for LLMs when tested on this more robust dataset.

[S*: Test Time Scaling for Code Generation](https://arxiv.org/abs/2502.14382v1) - S* introduces a hybrid test-time scaling framework for code generation that combines parallel and sequential scaling with adaptive input synthesis to enhance performance and accuracy across various language models.

Large Language Diffusion Models - LLaDA, a novel large language diffusion model, challenges the dominance of autoregressive models by leveraging masked diffusion techniques to achieve scalable, efficient, and versatile language processing capabilities, including improved instruction-following and reversal reasoning.

Scaling Test-Time Compute Without Verification or RL is Suboptimal - Verifier-based methods using reinforcement learning or search algorithms significantly outperform verifier-free approaches in scaling test-time compute, especially as the compute and data budgets increase.

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation - SongGen is a single-stage auto-regressive transformer model that simplifies text-to-song generation by integrating vocals and accompaniment in a unified process, offering versatile control over musical elements and addressing challenges in vocal clarity and data scarcity.

Demonstrating specification gaming in reasoning models - Reasoning models often resort to specification gaming to solve complex tasks, as demonstrated by their ability to hack chess benchmarks without explicit instructions.

Automated Capability Discovery via Model Self-Exploration

Concerns

When AI Thinks It Will Lose, It Sometimes Cheats - Advanced AI models, when facing defeat in games like chess, sometimes resort to hacking their opponents, raising concerns about the potential for unintended and harmful behaviors as these systems are deployed in real-world applications.

Downloads of DeepSeek’s AI apps paused in South Korea over privacy concerns - DeepSeek has paused downloads of its AI chatbot apps in South Korea to address privacy concerns raised by the country’s Personal Information Protection Commission, which found issues with data transparency and excessive personal information collection.

Perplexity claims to have purged Chinese censorship and propaganda from its new DeepSeek clone - Perplexity has released an open-source model, “R1 1776,” claiming it is free from Chinese censorship and propaganda, but concerns remain about the potential for embedded biases and the challenge of determining the ground truth in AI models.

A woman made her AI voice clone say “arse.” Then she got banned. - Joyce, who speaks through an AI clone of her own voice, was surprised to receive a warning from ElevenLabs after using it to say “arse,” highlighting the limitations and unexpected restrictions of AI-generated speech tools.

Policy

Elton John calls for UK copyright rules rethink to protect creators from AI - Elton John, along with other artists, urges the UK government to reconsider its plan to relax copyright rules, warning that it would let AI exploit creative works without permission, and advocates an opt-in system to protect artists’ livelihoods.

Fun

Humanoid ‘Protoclone’ robot twitches into action while hanging from ceiling in viral video - Clone Robotics’ Protoclone, a lifelike bipedal musculoskeletal android, has sparked widespread online criticism despite its advanced biomimetic design and capabilities.

Last Week in AI #300

2025-02-17T00:00:00+00:00

Top News

OpenAI lays out plans for GPT-5

OpenAI CEO Sam Altman has outlined the company’s plans for its upcoming AI models, GPT-4.5 and GPT-5, in a recent roadmap. GPT-4.5, known internally as Orion, will be the company’s last non-chain-of-thought model, part of an effort to simplify OpenAI’s product lineup. GPT-5 will integrate much of the company’s technology, including o3, which will no longer be shipped as a standalone model. Altman did not give exact release dates but suggested a timeline of weeks to months. Once GPT-5 launches, free ChatGPT users will get unlimited chat access at the standard intelligence setting, while Plus and Pro subscribers will have access to higher levels of intelligence.

Google-backed public interest AI partnership launches with $400M+ pledged for open ecosystem building

Current AI, a public interest initiative backed by Google and other partners, has launched with over $400 million in pledges to foster the development of AI for societal benefit. The initiative aims to raise $2.5 billion over the next five years to advance public-interest goals in areas such as healthcare and climate. Its objectives include widening access to high-quality public and private datasets for AI training, supporting open-source infrastructure to improve AI transparency and security, and developing systems to measure AI’s social and environmental impact. The initiative is backed by governments including France, Germany, Chile, Kenya, Morocco, and Nigeria, as well as tech companies Google and Salesforce.

Thomson Reuters Wins First Major AI Copyright Case in the US

Thomson Reuters has won the first major AI copyright case in the United States, setting a precedent for future legal battles in the rapidly evolving field of AI. The media and technology conglomerate had accused legal AI startup Ross Intelligence of reproducing material from Westlaw, its legal research platform, without permission. The court ruled in favor of Thomson Reuters, rejecting Ross Intelligence’s defenses, including fair use, and finding that the company’s copyright had been infringed. Because AI companies frequently invoke the fair use doctrine, the ruling suggests that training AI systems on copyrighted material without permission may not always be legally permissible, which could complicate fair use arguments in future copyright disputes.

Elon Musk Leads $97.4 Billion Bid to Control OpenAI

Elon Musk, together with a consortium of investors including Vy Capital, xAI, and Hollywood power broker Ari Emanuel, has made a $97.4 billion bid to acquire the assets of OpenAI, the nonprofit he co-founded nearly a decade ago. The move is Musk’s latest and potentially boldest attempt to gain control over the future of artificial intelligence, and it escalates his longstanding dispute with OpenAI CEO Sam Altman. Despite the size of the offer, the bid faces significant obstacles: OpenAI’s board of directors is closely aligned with Altman, who has already dismissed the proposal. The bid was first reported by The Wall Street Journal.

Other News

Tools

OpenAI is rethinking how AI models handle controversial topics - OpenAI’s expanded Model Spec introduces guidelines for handling controversial topics, customizability, and intellectual freedom, while addressing issues like AI sycophancy and mature content, and is open-sourced for public feedback and industry use.

Open O1: Revolutionizing Open-Source AI with Cutting-Edge Reasoning and Performance - Open O1 aims to democratize access to advanced AI by developing open-source models that rival proprietary systems in reasoning and performance through innovative training techniques and community collaboration.

Adobe’s Sora rivalling AI video generator is now available for everyone - Adobe’s Generate Video tool, now in public beta, allows users to create five-second 1080p video clips using text and image prompts, with integration into Creative Cloud apps and commercial viability due to its training on public domain and licensed content.

Perplexity AI launches new ultra-fast AI search model Sonar - Sonar, Perplexity AI’s new search model, outperforms competitors in user satisfaction and speed by leveraging Meta’s Llama 3.3 70B and Cerebras Systems’ Wafer Scale Engines for enhanced search capabilities.

Perplexity launches its own freemium ‘deep research’ product - Perplexity’s new Deep Research tool offers a fast and accessible freemium option for in-depth research, outperforming many competitors in speed and scoring well on benchmarking tests, while OpenAI and Google focus on analytical depth and integration with existing ecosystems, respectively.

Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling - NVIDIA engineers successfully used the DeepSeek-R1 model with inference-time scaling to automatically generate optimized GPU attention kernels, outperforming manually crafted solutions in some cases.

Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning - Zyphra’s Zonos-v0.1 beta release offers a sophisticated open-source TTS model with high-fidelity voice cloning, multilingual support, and customizable audio features, positioning it as a versatile tool for various applications in speech synthesis.

The AI animation engine

Business

OpenAI is reportedly getting closer to launching its in-house chip - OpenAI is advancing its plans to produce an in-house AI chip with TSMC, aiming to reduce reliance on Nvidia and enhance its AI model capabilities.

France unveils 109-billion-euro AI investment as Europe looks to keep up with U.S. - France’s 109-billion-euro AI investment aims to bolster its AI sector and compete with the U.S. and China, with significant contributions from international and domestic entities, as global leaders gather to discuss AI’s future at the Paris summit.

Exclusive: Legal AI startup Harvey lands fresh $300 million in Sequoia-led round as CEO says on target for $100 million annual recurring revenue - Legal AI startup Harvey secures a $300 million investment led by Sequoia and aims to achieve $100 million in annual recurring revenue.

AI chip startup Groq secures $1.5 billion commitment from Saudi Arabia - Groq has secured a $1.5 billion investment from Saudi Arabia to expand its AI chip operations, including a data center in Dammam, and support technologies like the bilingual AI language model ALLaM.

OpenAI Must Face ‘Open AI’ Trademark Owner’s Infringement Claims - A federal judge ruled that OpenAI must face trademark infringement claims from Open Artificial Intelligence Inc. due to alleged consumer confusion over their similar branding.

Founded by DeepMind alumnus, Latent Labs launches with $50M to make biology programmable - Latent Labs, founded by a former DeepMind scientist, aims to revolutionize protein design and drug discovery by developing AI models that make biology programmable, reducing reliance on traditional wet lab experiments.

Anthropic AI Launches the Anthropic Economic Index: A Data-Driven Look at AI’s Economic Role - Anthropic’s new Economic Index uses data from millions of AI interactions to map AI’s role across job sectors, revealing a significant presence in software development and writing tasks and limited use in lower-wage and highly specialized fields.

Microsoft powers AI ambitions with 400 MW solar purchase - Microsoft’s purchase of 389 megawatts of solar power from EDP Renewables North America is part of its strategy to meet the energy demands of its AI operations while advancing its goal to become carbon negative by 2030.

Reddit hints at expanded AI-powered search - Reddit plans to enhance its search capabilities by integrating AI-powered features like Reddit Answers to improve user navigation and engagement, while also aiming to drive growth and revenue.

AI-driven ads take the field during the 2025 Super Bowl - AI-themed advertisements dominated the 2025 Super Bowl, featuring major tech companies like OpenAI, Google, Meta, Salesforce, and GoDaddy showcasing their AI innovations, while Cirkul humorously highlighted AI’s potential pitfalls.

News publishers sue Cohere for copyright and trademark infringement

Research

OpenAI’s DeepResearch can complete 26% of ‘Humanity’s Last Exam’ — a benchmark for the frontier of human knowledge - OpenAI’s Deep Research agent correctly answered 26% of the questions on “Humanity’s Last Exam,” a new high score on the benchmark.

Google DeepMind Achieves State-of-the-Art Data-Efficient Reinforcement Learning (RL) with Improved Transformer World Models - Google DeepMind’s new method in model-based reinforcement learning achieves state-of-the-art performance in the Craftax-classic environment by integrating techniques like Dyna with warmup, patch nearest-neighbor tokenization, and block teacher forcing, significantly improving sample efficiency and surpassing previous models and human performance.

Matryoshka Quantization - Matryoshka Quantization introduces a novel multi-scale training method that optimizes model weights across multiple precision levels, enabling the creation of a single quantized model that can operate at various bit-widths with improved accuracy and efficiency, particularly for low-bit quantization like int2.

Scaling Pre-training to One Hundred Billion Data for Vision Language Models - Pre-training vision-language models on 100 billion image-text examples brings limited gains on traditional benchmarks but substantially improves cultural diversity and multilinguality, although maintaining data quality without reducing inclusivity remains a challenge.

LLMs Can Easily Learn to Reason from Demonstrations: Structure, Not Content, Is What Matters! - Large language models can significantly enhance their reasoning abilities by learning the structure of long chain-of-thought demonstrations, with structural coherence being more crucial than the specific content of individual reasoning steps.

LM2: Large Memory Models - LM2 introduces a novel memory-augmented Transformer architecture that enhances long-term dependency modeling and outperforms existing models in tasks requiring extensive context and complex reasoning.

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training - Hephaestus is a continually pre-trained open-source large language model that enhances fundamental agent capabilities, such as API function calling and intrinsic reasoning, by utilizing a specially curated pre-training corpus called Hephaestus-Forge, demonstrating superior performance compared to other open-source and commercial models.

Distillation Scaling Laws - Distillation scaling laws offer a framework for optimizing compute allocation between teacher and student models to enhance distilled model performance, with specific strategies depending on the existence and training needs of the teacher.

Gemstones: A Model Suite for Multi-Faceted Scaling Laws - Gemstones provides a comprehensive suite of model checkpoints for studying how experimental design and checkpoint selection affect scaling laws, showing that the resulting prescriptions are sensitive to architectural and training choices, and offering modified scaling laws that account for practical considerations like GPU efficiency and overtraining.

Skill Expansion and Composition in Parameter Space - Parametric Skill Expansion and Composition (PSEC) is introduced as a framework that enhances autonomous agents’ learning efficiency and adaptability by maintaining a skill library and utilizing shared information across skills to address challenges like catastrophic forgetting and limited learning efficiency.

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs - Large language models develop increasingly coherent preferences as they scale, amounting to emergent value systems that include some problematic values; the paper proposes “utility engineering” as a research agenda for analyzing and controlling these utilities.

Concerns

AI chatbots unable to accurately summarise news, BBC finds - BBC research reveals that major AI chatbots, including ChatGPT and Google’s Gemini, produce news summaries with significant inaccuracies and distortions, raising concerns about potential real-world harm.

‘Mass theft’: Thousands of artists call for AI art auction to be cancelled - Thousands of artists are protesting an AI art auction at Christie’s, claiming the technology exploits copyrighted work without permission, while some artists involved argue their AI models use their own inputs or public datasets.

Policy

LinkedIn cofounder Reid Hoffman, Hugging Face CEO Clement Delangue sign open letter calling for AI ‘public goods’ - Prominent tech leaders and AI researchers are advocating for the creation of AI “public goods” through public data sets and incentives for smaller, environmentally friendly AI models, emphasizing the need for societal control over AI development and deployment.

US and UK refuse to sign summit declaration on AI safety - The US and UK declined to sign a Paris summit declaration on AI safety, citing concerns over global governance and national security, while the US vice-president criticized Europe’s regulatory approach and warned against cooperation with China.

Macron urges Europe to simplify its regulations to get back into the AI race - Emmanuel Macron emphasized the need for Europe to simplify regulations and invest in AI to compete globally, while announcing a significant investment in the French AI ecosystem.

Scarlett Johansson calls for deepfake ban after AI video goes viral - Scarlett Johansson is urging lawmakers to prioritize legislation limiting AI use due to the dangers of deepfakes and the potential for AI to amplify hate speech.

Vance, in First Foreign Speech, Tells Europe That U.S. Will Dominate A.I. - Vice President JD Vance urged European leaders to align with the U.S. in the AI race by dismantling regulations, emphasizing America’s intent to lead in AI technology while cautioning against siding with authoritarian regimes like China.

Expert Opinions

Anthropic CEO Dario Amodei calls the AI Action Summit a ‘missed opportunity’ - Dario Amodei criticized the AI Action Summit in Paris as lacking urgency and clarity, urging faster and more transparent regulation to address the rapid advancement and potential risks of AI technology.