xAI & Perplexity leapfrog the competition

By David Franklin on July 18, 2025 in People Technology

David Franklin, Industry Principal, Artificial Intelligence, Yardi: "The reasoning abilities of modern AI models have become, in many ways, superhuman."

In the world of AI models, it’s usually a game of inches. Anthropic releases a new version of Claude that is ever so slightly better than ChatGPT on X, Y or Z benchmarks, Google releases a version of Gemini that is a tiny bit better on other benchmarks and so on. Rarely do we get to see a step change like what xAI just dropped with Grok 4.

Similarly, Perplexity, best known for their AI-powered search engine, just released their own web browser with agentic AI built right in, enabling incredible use cases directly within a tool that everyone is already intimately familiar with. This may be the first shot fired in a new round of browser wars.

On top of these amazing developments, there’s a lot more to talk about. Let’s dig in!

xAI

What happened

xAI released Grok 4, the latest version of their flagship LLM, which they claim is “the most intelligent model in the world.” Grok 4 includes native tool use with real-time search integration and even has an interactive voice mode, similar to ChatGPT. Importantly, Grok 4 has topped the leaderboards on many critical benchmarks, most notably Humanity’s Last Exam and ARC-AGI 2, two of the most challenging benchmarks for LLMs today.

Why it matters

Up until now, xAI was not considered to be at the frontier of LLM labs. Grok 3 was a decent, but unremarkable, model (except for the fact that it was known to go completely off the rails, spouting antisemitic hate speech and referring to itself as “MechaHitler”) and was not competitive with tools from OpenAI, Anthropic or Google. xAI has since made a massive investment in compute, building Colossus, a 200,000-GPU cluster that allowed them to do 10X the training of their earlier models, with a specific focus on what’s called “reinforcement learning.”

The result is that they have leapfrogged the competition, outperforming everyone else by a significant margin on many benchmarks. Specifically, I’d like to focus on a test called “Humanity’s Last Exam,” a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The test consists of 2,500 challenging questions across more than a hundred subjects. On this benchmark, xAI reports that Grok 4 scored as high as 44.4%, which is incredible, as OpenAI’s flagship reasoning model, o3, only scored 24.9%. Wow. For context, your average human would likely not be able to answer any of these difficult questions, and a PhD-level human would only be able to answer a few percent of the questions, specifically in their area of expertise.

The key takeaway here is that the reasoning abilities of modern AI models have become, in many ways, superhuman. This will have profound implications for the software that can be built and the way in which you as individuals will interact with it.

Perplexity

What happened

Perplexity, the company best known for their AI-powered search tool, has released Comet, an AI-integrated web browser. While there have certainly been plug-ins for existing browsers to access tools like ChatGPT for search, this is among the first fully AI-native browsers.

Why it matters

Many of you may remember the browser wars of the 2000s, where Netscape, Firefox and Opera battled Internet Explorer for supremacy until 2008, when Chrome came in to eat everyone’s lunch. Ah, what a time that was … I think we may have seen the first shot fired in a new, AI-centric war that will redefine the way in which we interact with the old world wide web. Will Google be able to pivot quickly enough to stay on top? Time will tell.

Cloudflare

What happened

Cloudflare, the world’s largest connectivity cloud company, will block AI bots from crawling websites without permission by default. Website owners will have the option of allowing bots to crawl their sites and can even monetize the transactions!

As Matthew Prince, co-founder and CEO of Cloudflare, puts it, “If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone: creators, consumers, tomorrow’s AI founders and the future of the web itself.”

Why it matters

To quote from Cloudflare’s press release: “For decades, the internet has operated on a simple exchange: search engines index content and direct users back to original websites, generating traffic and ad revenue for websites of all sizes. This cycle rewards creators that produce quality content, with money and a following, while helping users discover new and relevant information. That model is now broken. AI crawlers collect content like text, articles and images to generate answers without sending visitors to the original source, depriving content creators of revenue and the satisfaction of knowing someone is viewing their content. If the incentive to create original, quality content disappears, society ends up losing and the future of the internet is at risk.”

How many of you have experienced Google’s “AI Overview” where your query was answered to your satisfaction and you didn’t need to click through to the source website? It’s happening to me more and more. The paradigm is shifting, and this is Cloudflare’s answer. Importantly, it’s not just a binary choice between blocking and allowing bots. Cloudflare is creating a marketplace that benefits both the content creators and the AI companies, who can license clean, rich and well-structured data.

fileAI

What happened

fileAI has launched their agentic AI workflow automation platform. Their tools allow businesses to automate the processing of documents and the extraction of schema in order to create new workflows to address challenges like accounts payable (AP) automation or general ledger reconciliation.

As they describe it: “Secure, scalable AI automation for any industry. From standard AP/AR workflows, to complex insurance claims processing, fileAI leverages agentic AI to automate any manual business process, across any industry.”

Why it matters

It sounds really impressive, right? Unfortunately, I’m going to use this as a counterexample for why AI is not necessarily the be-all and end-all for software. I’m fairly technical, and I attempted to use their software to extract data from a couple of files, including an invoice and a lease. After struggling for an hour or so to understand how the software worked, I failed to make meaningful progress and gave up. I suspect that this is a familiar pattern for many of you, where a seemingly miraculous new software product promises to “automate any manual business process” yet doesn’t live up to the hype.

This is not to say that fileAI is a bad product. I’m sure that in the right hands, with the right training, it could be fantastic. What I want to emphasize is that, often, the best solutions are those that are easiest to implement and adopt.

This is why Yardi is so focused on building single-stack solutions that are pre-integrated with our existing tools and that are simple to use, specifically for the purpose of managing real estate business operations. Smart Lease is a prime example of this. It’s a superior product because it fits seamlessly into the existing lease administration ecosystem that is already part of Yardi Voyager. You can learn how to use it in a few minutes, and you never have to leave the tool that you already know and love.

Thank you all for tuning in. Please subscribe to be notified whenever a new post comes out, and we look forward to seeing you on the next one.

Stay curious, my friends!

Author bio

David is the industry principal of AI at Yardi, where he works closely with the sales team to help clients understand how Yardi’s solutions align with their business needs. A real estate technology, AI and IT guru with deep sales expertise and entrepreneurial roots, David brings decades of experience bridging the gap between technical innovation and real-world application. David’s superpower is making complex technical concepts approachable, interesting and easy to understand — especially for non-technical audiences. When he’s not working, you can find him skiing, rock climbing or racing his Tesla.

Disclaimer

This article is for general information purposes only. The opinions, analysis and commentary expressed are not and cannot be relied on as legal advice, and do not necessarily reflect the views of Yardi Systems, Inc., or any of its affiliates.