Just four months after releasing Grok 3, Elon Musk’s artificial intelligence startup xAI has debuted a successor: Grok 4, its most advanced AI model yet. Positioned as a major step forward in both reasoning and problem-solving, Grok 4 introduces features like native tool use, real-time search and scaled reinforcement learning — capabilities that xAI claims put it on par with (or even ahead of) models developed by competitors like OpenAI, Google and Anthropic.
Looking ahead, the company says it will continue to scale Grok 4’s reinforcement learning and expand its multimodal capabilities, integrating vision and audio to enable more “intuitive interactions.” xAI also plans to push beyond narrow, verifiable domains and into more dynamic real-world applications that the model can learn from and adapt to in real time. Whether Grok 4 will live up to these ambitious goals remains to be seen, but its rapid release underscores xAI’s growing presence in the generative AI landscape and indicates that the race to build the next best model is far from over.
Grok 4 is available through an API on xAI and serves as the engine behind the Grok chatbot for Premium+ and SuperGrok subscribers. A more powerful version of the model called Grok 4 Heavy is also being offered under a new “SuperGrok Heavy” subscription tier, priced at $300 a month.
What Is Grok 4?
Grok 4 is a foundation model developed by xAI. Released in July 2025, the model builds on the work of its predecessors, Grok 3 and Grok 3 Reasoning, which focused on next-token prediction and reasoning — with “reasoning” being the ability to break down problems into steps and refine its outputs before providing a final answer. With Grok 4, xAI significantly scaled up its use of reinforcement learning, using its own internal 200,000-GPU supercomputer called Colossus to train the model on a larger and more diverse data set with greater efficiency.
Another major improvement is Grok 4’s ability to use external tools like search engines and code interpreters. As it tackles a complex programming task or searches for up-to-date information on a given subject, the model can create its own search queries and pull live information from the web to inform its responses.
Grok 4 is particularly integrated with X, a social media platform now owned by xAI. It can find information from “deep within” the social media site, according to xAI, using advanced keyword and semantic filters to identify specific posts. It can also analyze media like images and video to improve the relevance and accuracy of its answers.
Grok 4 Features
Large-Scale Reinforcement Learning
For the most part, language models have been trained using next-token prediction, meaning they learn to guess the next word or phrase in a sequence based on the context that came before it. While this approach helps LLMs generate fluent and coherent sentences, it doesn’t always allow them to fully understand a subject or apply what they know across broader applications. Grok 4 attempts to address this by incorporating large-scale reinforcement learning into its training process, allowing it to “think” through problems and refine its answers rather than simply predicting the next most likely word.
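To make "next-token prediction" concrete, here is a toy sketch that counts which word tends to follow another in a tiny corpus and then guesses the most frequent follower. It is purely illustrative: the corpus and the function names `train_bigram` and `predict_next` are made up for this example, and real language models use neural networks trained on vastly more data, not word counts.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny illustrative corpus (hypothetical).
corpus = [
    "the model predicts the next word",
    "the model refines its answer",
    "each word depends on context",
]
follows = train_bigram(corpus)
print(predict_next(follows, "the"))   # "model" follows "the" most often here
```

The predictor only ever echoes patterns it has already seen, which is roughly the limitation that reinforcement learning is meant to help with.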
Native Tool Use
Grok 4 can autonomously decide when and how to use various external tools to help inform its answers. It can browse the web, sift through X posts and analyze images and videos to respond to questions that other models might struggle with since they require up-to-date or niche information. What’s more, this feature is built right into the model through its training process rather than being bolted on as a post-processing step.
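The general idea of a model deciding when to call a tool can be sketched as a simple loop. Everything below is hypothetical: `stub_model` stands in for the model's decision-making, the tools return canned strings rather than real search results, and none of this reflects xAI's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical tools; real ones would query the web or run code.
def web_search(query: str) -> str:
    return f"[stub web results for '{query}']"

def run_code(source: str) -> str:
    return f"[stub output of running {len(source)} chars of code]"

TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "run_code": run_code,
}

def stub_model(question: str, observations: List[str]) -> dict:
    """Stand-in for the model: request one search if the question
    looks time-sensitive, then answer from whatever was observed."""
    needs_fresh_info = "latest" in question.lower() or "today" in question.lower()
    if needs_fresh_info and not observations:
        return {"action": "web_search", "input": question}
    context = observations[-1] if observations else "training knowledge only"
    return {"action": "final", "input": f"Answer based on {context}"}

def answer(question: str, max_steps: int = 3) -> str:
    """Loop until the model gives a final answer or the step limit is hit."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = stub_model(question, observations)
        if decision["action"] == "final":
            return decision["input"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))
    return "No answer within the step limit"

print(answer("What is 2 + 2?"))
print(answer("What is the latest Grok model?"))
```

In this sketch the first question is answered directly, while the second triggers a search before the final answer is produced.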
Voice Mode
Grok 4 accepts audio and visual inputs in addition to text. With the model’s new “Voice Mode,” users can carry on a natural, spoken conversation with Grok as they would with a human, and can even use their phone camera to let Grok “see” its surroundings and analyze them in real time. xAI also unveiled a new voice for Grok 4 that can whisper, laugh and even sing.
Grok 4 Heavy
Grok 4 Heavy is a more advanced version of the Grok 4 model, and it is only available through xAI’s new SuperGrok Heavy subscription tier. Grok 4 Heavy is designed to “consider multiple hypotheses” in parallel to complete complex reasoning tasks, according to xAI — basically creating a sort of “study group” of AI agents, as Musk put it during a livestream on X.
“All those agents do work independently, and then they compare their work,” Musk said. “Often only one of the agents actually figures out the trick or figures out the solution. But once they share the trick or figure out what the real nature of the problem is, they share that solution with the other agents. And then they essentially compare notes and yield an answer.”
This kind of multi-agent coordination is intended to help Grok 4 Heavy perform better on more advanced, open-ended tasks, especially in situations where a single line of reasoning might otherwise miss a subtle point or pattern. In fact, xAI says the model is the first ever to score 50 percent on Humanity’s Last Exam, a benchmark designed to gauge how close a given model is to achieving expert-level reasoning capabilities, particularly in fields where human expertise is typically required.
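The “study group” idea Musk describes can be illustrated with a toy sketch: several independent workers each propose an answer, and the group keeps whichever proposal passes a shared check. The agents, the equation and the function names are all invented for illustration, and the simple "keep the first verified proposal" rule is an assumption of this example, not a description of how Grok 4 Heavy actually coordinates its agents.

```python
from concurrent.futures import ThreadPoolExecutor

def is_solution(x):
    """Shared checker: does x satisfy x^2 - 5x + 6 = 0?"""
    return x is not None and x * x - 5 * x + 6 == 0

# Three hypothetical "agents", each following a different line of reasoning.
def agent_guess_small(_problem):
    return 1  # a wrong guess

def agent_scan_range(_problem):
    # Tries the integers 0..9 in order and stops at the first that works.
    return next((x for x in range(10) if is_solution(x)), None)

def agent_gives_up(_problem):
    return None

def study_group(problem, agents):
    # Each agent works independently (here, on separate threads).
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        proposals = list(pool.map(lambda agent: agent(problem), agents))
    # "Compare notes": keep any proposal that passes the shared check.
    verified = [p for p in proposals if is_solution(p)]
    return verified[0] if verified else None

agents = [agent_guess_small, agent_scan_range, agent_gives_up]
print(study_group("x^2 - 5x + 6 = 0", agents))
```

Only one of the three agents finds a valid root here, yet the group as a whole still returns a correct answer, which is the benefit the parallel approach is after.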
API
Available to Premium+ and SuperGrok subscribers, the Grok 4 API offers:
- A 256,000-token context window for handling long documents and extended reasoning.
- Real-time access to data via xAI’s new live search API, which pulls information from X, the web and various unnamed news sources.
- Multimodal input support, specifically vision, voice and text.
- Enterprise-grade security and compliance with GDPR, CCPA and SOC 2 Type II certifications.
All told, these features may make Grok 4 a solid option for developers looking to build applications that require long-context understanding, access to up-to-date information and robust privacy standards.
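For a sense of what a 256,000-token context window means in practice, the sketch below estimates whether a set of documents would fit while leaving room for a reply. The four-characters-per-token ratio is a rough rule of thumb assumed for illustration, not xAI's tokenizer, and the function names and the 8,000-token reply budget are likewise hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # figure quoted for the Grok 4 API
CHARS_PER_TOKEN = 4              # rough assumption, not xAI's tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: characters divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)

def fits_in_context(documents, reserved_for_reply=8_000):
    """Return (fits, tokens_used) for the documents plus a reply budget."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserved_for_reply <= CONTEXT_WINDOW_TOKENS, used

# A 400,000-character document is roughly 100,000 tokens under this estimate.
fits, used = fits_in_context(["a" * 400_000])
print(fits, used)
```

A production application would count tokens with the provider's own tokenizer, since character-based estimates can be off by a wide margin for code or non-English text.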
How to Access Grok 4
Premium+ and SuperGrok subscribers can interact with Grok 4 either through the Grok chatbot or on the social media platform X by tagging it. Those with a SuperGrok Heavy subscription can also interact with Grok 4 Heavy, a more powerful version of the model. Developers can work with Grok 4 directly through the xAI API.
How Does Grok 4 Compare to Other AI Models?
xAI claims Grok 4 is the “most intelligent model in the world,” citing its performance on a handful of academic, reasoning and problem-solving benchmarks. And the numbers shared by the company appear to back this statement:
- Grok 4 scored 15.9 percent on ARC-AGI, a test that evaluates abstract reasoning and pattern recognition. This was nearly double what the next-best model achieved.
- On competitive coding and math benchmarks (LiveCodeBench, AIME’25 and HMMT), both Grok 4 and Grok 4 Heavy beat out most competitors.
- Both Grok 4 and Grok 4 Heavy scored the highest on the GPQA (Graduate-Level Google-Proof Q&A) benchmark, which evaluates a model’s question-answering capabilities, with a particular focus on scientific reasoning and knowledge.
- On the USAMO 2025 benchmark, which evaluates mathematical capabilities using high school math Olympiad problems, Grok 4 Heavy led the pack with a score of 61.9 percent — well ahead of the other models, including the standard Grok 4.
Perhaps most notably, Grok 4 Heavy outperformed not only all the other models but also human participants in Vending-Bench, a simulated environment that evaluates a model’s ability to manage a simple vending machine business over time. The test is designed to assess multi-step planning and economic reasoning — areas that other models typically struggle with.
“It’s smarter than almost all graduate students in all disciplines simultaneously,” Musk said of Grok 4 in the livestream. A few minutes later he said it was “post-graduate, Ph.D. level in everything,” then “better than Ph.D. level.”
Still, there are some important caveats to keep in mind. For one, xAI has not shared Grok 4’s performance on several other widely used industry benchmarks, such as MMLU and HumanEval, making a comprehensive comparison against other top AI models impossible. And the only other models Grok 4 was compared to were OpenAI’s o3, Anthropic’s Claude Opus 4 and Google’s Gemini 2.5 Pro, leaving out many others. Independent leaderboards like LMArena also show Grok 4 trailing behind several of its competitors in both text and image understanding.
More broadly, industry experts caution against using high benchmark scores as a definitive measure of real-world intelligence. After all, xAI is not the first company to say its latest product is smarter than human experts. Google DeepMind CEO Demis Hassabis made similar statements back in 2023 when Gemini Ultra was unveiled. While both of these models yield impressive results, Hassabis’ claims were an exaggeration then and Musk’s claims are likely an exaggeration now, especially considering that Grok 4 is susceptible to the same issues as any other generative AI product, namely hallucinations and bias.
Grok 4 Controversies
In the hours following the release of Grok 4, the model began exhibiting some troubling behavior. It consistently appeared to use Musk’s own social media posts as sources of truth when asked about the Israel-Palestine conflict, abortion, immigration in the United States and other controversial topics, suggesting the model may have been trained or tuned to consider the founder’s personal politics. In another especially alarming instance, Grok referred to itself as “Hitler” on the X profile it powers.
This isn’t the first time Grok has generated controversy. Just a few days before the Grok 4 rollout, the chatbot’s automated X account fired off several antisemitic replies to users and even claimed to be “MechaHitler.” A couple of months earlier, it referred to a “white genocide” in Musk’s native South Africa, even when responding to posts that had absolutely nothing to do with the subject. Despite the company’s stated mission to build a “maximally truth-seeking AI,” xAI has had to repeatedly delete offensive content and issue correction statements.
These issues are especially striking given the motivations behind Grok’s creation. Musk founded xAI in response to what he perceived as political bias in other AI systems — specifically OpenAI’s ChatGPT, which he has criticized for being overly “woke” and left-leaning. But Grok’s recent behavior suggests it may have swung too far the other way rather than offering a balanced perspective.
In the wake of the latest controversy, xAI appears to have updated Grok 4’s internal instructions, removing prompts that might encourage politically incorrect responses. There are also a few new lines directing the model to source information from a diverse range of perspectives when addressing sensitive or controversial topics.
Frequently Asked Questions
Is Grok 4 available?
Yes, Grok 4 is now available to Premium+ and SuperGrok subscribers of both X and the Grok chatbot. The more advanced Grok 4 Heavy is only available to SuperGrok Heavy subscribers.
Is Grok 4 free?
No, Grok 4 is not free. It is only available to those with a subscription to xAI’s Premium+ and SuperGrok plans, which start at $40/month and $30/month respectively, or through the company’s API, which has varying pricing tiers. Grok 4’s more advanced “Heavy” version is only available with a SuperGrok Heavy subscription, which costs $300/month.
Is Grok 4 open source?
No, Grok 4 is not open source.
Is Grok 4 better than ChatGPT?
xAI has not released any direct performance comparisons between Grok 4 and GPT-4o, the primary model powering ChatGPT at the moment, so a definitive head-to-head assessment is difficult. That said, Grok 4 does appear to outperform OpenAI’s o3 on several industry benchmarks related to coding, math, science and abstract reasoning. At the end of the day, though, whether Grok is better than ChatGPT or vice versa largely comes down to the specific tasks and use case.