|
| LLMs Are Growing Up
Last week's newsletter explained that AI isn't just large language models. It's also computer vision, robotics, specialized models, and more. That's a distinction that, in my opinion, you must understand. Even so, LLMs remain the most widely discussed implementation of AI, and they're evolving rapidly from experimental tools into production-ready infrastructure.
In this week's newsletter, we look at what that evolution actually looks like. PCMag's 2026 rankings confirm there's no single "best" chatbot. Claude is being embedded directly into Excel, and Perplexity (though technically not an LLM) is launching a multi-model feature that looks very promising.
But for all those advancements, real problems persist. Researchers found that every major chatbot fabricates answers when users push back. And OpenClaw, the viral AI "personal assistant," had numerous malicious plugins designed to spread malware and scams. The landscape is changing fast, and it's all very much worth watching. | Which chatbot is the best? Well, do you want best in value, best in customization, best in research, or...? There is no "one best." Use PCMag's guide to find the right one for you. | Anthropic's Claude in Excel brings natural language debugging directly into spreadsheets, lowering technical barriers to financial modeling and scenario analysis. | OpenClaw demonstrates sophisticated AI-driven task execution and the potential for true agentic autonomy. But be cautious: researchers have found serious security flaws. |
|
| PCMag has updated its AI chatbot recommendations for 2026, and the big takeaway is that there is no single "best." Google's Gemini earned Editors' Choice for overall value, but every major chatbot excels in a distinct lane. PCMag tested across reasoning, research, creative writing, feature depth, pricing, and integration capabilities. Beyond Gemini's overall value, the other highlights are: ChatGPT leads in customization and memory. Claude stands out for privacy. Perplexity dominates AI-powered search. Copilot wins for Windows-centric workflows.
Strategic Insight: This is a procurement conversation now, not an innovation conversation. And using multiple chatbots often yields the best results, since each offers distinct advantages for different use cases. |
Claude is now available in Excel as an add-in (currently in beta). It can access, analyze, and edit entire workbooks, including dependencies and nested formulas across multiple tabs. It targets pain points like #REF! and #VALUE! errors, tracing them to their source and suggesting fixes. With this, non-experts can debug complex sheets, build models from scratch, and run scenario analysis in natural language.
The Practical Angle: This is notable because Microsoft's Copilot is actually an orchestration layer that uses multiple models. Placing Claude inside Excel positions it as a specialist, a strategy worth watching.
View Integration Details → Watch Review →
People have been asking for a "Siri that actually works," and OpenClaw just might be the answer. The open-source AI assistant works through messaging apps and can autonomously handle scheduling, report filing, and smart-home control. It became an overnight sensation, but then the problems emerged. Security researchers found over 21,000 publicly exposed instances with little or no authentication, leaking API keys, chat logs, and system access.
Critical Takeaway: The demand for agentic AI is massive, but full autonomy opens catastrophic security holes. Experts warn that autonomous agents with full user credentials create a "hybrid identity" that current security systems can't properly govern.
Read Full Story → | Quick Hits
Foundations: Understanding Large Language Models
This guide breaks down the LLM landscape: what LLMs are, who the major players are, and why this matters for business decisions. At their core, LLMs are sophisticated pattern-recognition systems that process a question and return the statistically most likely response. | Video: Comparing Answers From Multiple LLMs at Once
Perplexity's new Model Council lets users send a single prompt to "the big 3" simultaneously, with access to each model's individual response alongside a synthesized answer combining the best of each.
This could save hours of manual cross-checking. | Deep Dive: Nudging AI Chatbots to Lie
Researchers built a framework to test how well LLMs stick to the truth when users push back. The team found that models fail in ways that resemble human social behavior: they come to accept and reinforce misinformation through repetition, pressure, and conversational cues. |
|
| Industry Developments
Major News Outlets Are Suing Perplexity
The New York Times and Dow Jones are suing Perplexity for bypassing publishers and offering users a market substitute for their reporting. If Perplexity and others can freely scrape and repackage the news, that stream of information will eventually run dry as lost revenue kills off news outlets. | IBM Triples Entry-Level Hiring
Counter to the "AI replaces jobs" narrative, IBM is tripling entry-level hiring, including software developers, while rewriting every role around AI. The company believes that skipping entry-level talent means poaching mid-level talent from competitors at a premium, and that those hires aren't steeped in IBM culture. |
|