NVIDIA and Microsoft Reinvent the PC: RTX Spark Runs 120B-Parameter LLMs Locally, Without Cloud

AI Hardware

2026-06-04 · 8 min read

At Computex 2026, NVIDIA and Microsoft unveiled RTX Spark — a superchip bringing 1 petaflop of AI compute and 128GB unified memory into a laptop, capable of running 120B-parameter models entirely locally with no cloud dependency.

TAIPEI / SAN FRANCISCO, 1 June 2026. At Computex 2026, NVIDIA and Microsoft jointly unveiled the most ambitious collaboration in either company's history: the RTX Spark platform, an ARM superchip designed to bring data-centre-class AI compute directly into laptops and compact desktops and, with it, the ability to run 120-billion-parameter language models entirely locally, with no cloud dependency.

The announcement triggered an immediate market reaction: NVIDIA shares (NVDA) gained 6.26% on the day of the launch, adding approximately $316.68 billion to the company's market capitalisation, which reached $5.38 trillion.

What RTX Spark is and why it matters

RTX Spark is not a faster laptop GPU. It is an entirely new architecture: a superchip combining a 20-core Grace ARM CPU with a Blackwell GPU that has 6,144 CUDA cores and fifth-generation Tensor Cores, connected via an NVLink-C2C bus at 900 GB/s. All of this sits in a single package, with up to 128GB of unified LPDDR5X memory and 1 petaflop of AI compute.

A 120-billion-parameter model such as Llama or Qwen, with a 1-million-token context window, can run in real time directly on a laptop — with a response latency of under 2 seconds — without any call to an external server.

For comparison: the same models, run today on existing Copilot+ laptops, require dozens of round-trips to Azure, with response times of 10-15 seconds per complex task. On RTX Spark, the live demo at Build 2026 showed a complete workflow — "find the contract from last March, summarise the key clauses and send a revised version to the legal department" — completed in under 2 seconds.

Technical architecture: what makes RTX Spark different

The Grace Blackwell Superchip in consumer form factor

Until now, the Grace Blackwell family was exclusive to data centres, in the DGX and HGX servers. RTX Spark marks the first time the full NVIDIA stack (CUDA, TensorRT, OptiX, DLSS, Reflex) arrives in a laptop just 14 millimetres thick and weighing 1.36 kilograms.

Unified memory — eliminating the classic bottleneck

The fundamental problem with running LLMs on consumer hardware has always been memory: a 70B-parameter model in FP16 stores 2 bytes per parameter, so its weights alone require about 140GB of RAM. On traditional architectures, the CPU and GPU have separate memory pools, and data transfer between them creates a severe bottleneck.
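The arithmetic behind that figure is easy to reproduce. The sketch below is a back-of-envelope estimate only: it counts weights and ignores the KV cache, activations and the operating system, and the function name `weight_footprint_gb` is illustrative rather than part of any NVIDIA tool. The 128GB budget is the unified memory capacity quoted for RTX Spark.

```python
def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * (bits / 8 bytes) / (1e9 bytes per GB)
    return params_billions * bits_per_param / 8


UNIFIED_MEMORY_GB = 128  # capacity quoted in the article

if __name__ == "__main__":
    for params in (70, 120):
        for name, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
            gb = weight_footprint_gb(params, bits)
            verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
            print(f"{params}B @ {name}: {gb:.0f} GB of weights "
                  f"({verdict} in {UNIFIED_MEMORY_GB} GB)")
```

Under these assumptions a 120B model only fits comfortably once its weights are quantised to 4 bits per parameter, which leaves the remaining memory for the context window and the rest of the system.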

RTX Spark eliminates this through unified LPDDR5X memory: both the Grace CPU and the Blackwell GPU access the same 128GB pool at 273 GB/s, without the bandwidth penalty of a traditional PCIe bus. This is the same architectural approach that made Apple Silicon competitive for local inference, but with the full CUDA stack and native Windows compatibility.
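Memory bandwidth also sets a rough ceiling on generation speed. A common rule of thumb, used here as an assumption rather than a figure NVIDIA has published, is that each generated token requires reading every active weight once, so tokens per second cannot exceed bandwidth divided by the bytes read per token. The sketch applies that rule to the 273 GB/s quoted above. The 6B-active mixture-of-experts configuration is purely hypothetical, and the estimate ignores KV cache reads, batching and speculative decoding.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Weight size in decimal gigabytes."""
    return params_billions * bits_per_param / 8


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on single-stream decode speed when memory-bandwidth bound."""
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 273.0  # unified memory bandwidth quoted in the article

if __name__ == "__main__":
    # Dense model: all 120B parameters are read for every token.
    dense = decode_ceiling_tokens_per_s(BANDWIDTH_GB_S, weights_gb(120, 4))
    # Hypothetical sparse model: only 6B parameters active per token (assumption).
    sparse = decode_ceiling_tokens_per_s(BANDWIDTH_GB_S, weights_gb(6, 4))
    print(f"dense 120B @ FP4:       <= {dense:.2f} tokens/s")
    print(f"6B-active MoE @ FP4:    <= {sparse:.1f} tokens/s")
```

The gap between the two results suggests why sparse mixture-of-experts models are the more natural fit for bandwidth-limited local hardware, although real throughput depends on the runtime and the model.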

Inference performance

NVIDIA claims a 2× improvement over the previous generation for top agentic models, through optimisations in llama.cpp and vLLM. Fifth-generation Tensor Cores support FP4, FP8, FP16 and BF16; the FP4 format halves model size relative to FP8, and quarters it relative to FP16, without significant quality loss.

The Microsoft partnership: Windows becomes an agentic OS

At Microsoft Build 2026, held simultaneously in San Francisco, the company unveiled Windows Copilot Runtime — a new Windows 11 subsystem that gives AI agents secure access to local files, system settings, peripherals and applications, all running on RTX Spark hardware.

NVIDIA OpenShell

OpenShell is NVIDIA's framework for running autonomous agents on Windows, built on top of Microsoft's new OS security primitives. In concrete terms, an AI agent can:

Access files and applications the user explicitly grants it.
Execute multi-step tasks overnight, when the computer is not in active use.
Run entirely locally — data never leaves the machine.

The system includes architectural guardrails: an agent cannot access more than is explicitly granted, and every action is logged in immutable audit trails. The most popular open-source agent projects — Hermes Agent and OpenClaw — are already integrating OpenShell into their native Windows applications.
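The two guardrails described above, explicit grants and tamper-resistant logging, can be illustrated with a small sketch. This is not OpenShell's API, which the article does not describe: `AuditLog` and `ScopedAgent` are hypothetical names, the grant model is reduced to a list of directories, and the hash chain makes tampering detectable rather than making the log physically immutable.

```python
import hashlib
import json
from pathlib import PurePosixPath

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log; each entry commits to the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, action: str, target: str, allowed: bool) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"action": action, "target": target, "allowed": allowed, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("action", "target", "allowed", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


class ScopedAgent:
    """Agent that may only read inside directories the user granted."""

    def __init__(self, granted_dirs: list[str], log: AuditLog) -> None:
        self.granted = [PurePosixPath(d) for d in granted_dirs]
        self.log = log

    def can_read(self, path: str) -> bool:
        p = PurePosixPath(path)
        # Reject ".." outright: pure paths are not normalised, so traversal
        # segments would otherwise slip past the prefix check.
        allowed = ".." not in p.parts and any(p.is_relative_to(g) for g in self.granted)
        self.log.append("read", path, allowed)  # denied attempts are logged too
        return allowed


if __name__ == "__main__":
    log = AuditLog()
    agent = ScopedAgent(["/home/user/contracts"], log)
    print(agent.can_read("/home/user/contracts/2025/march.pdf"))
    print(agent.can_read("/home/user/private/diary.txt"))
    print(log.verify())
```

Logging denied requests alongside granted ones is a deliberate choice in this sketch: an audit trail is most useful when it also shows what an agent tried and failed to do.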

RTX Spark complete specifications

CPU: Grace ARM, 20 cores, peak efficiency.
GPU: Blackwell RTX, 6,144 CUDA cores, 5th-gen Tensor.
Tensor Cores: FP4 / FP8 / FP16 / BF16.
AI compute: 1 PetaFLOP.
Unified memory: up to 128GB LPDDR5X at 273 GB/s.
CPU-GPU interconnect: NVLink-C2C, 900 GB/s.
Max local LLM: 120B parameters, 1M token context.
Form factors: 14-16" laptop and compact desktop. Minimum thickness 14mm, weight ~1.36 kg.
Display: Tandem OLED, G-SYNC, colour-accurate.
Software stack: CUDA, TensorRT, OptiX, DLSS 4.5, Reflex.
Availability: autumn 2026.

Who is building RTX Spark devices

NVIDIA confirmed at Computex 2026 that eight major manufacturers already have designs in development for autumn 2026. Confirmed for launch: ASUS (ProArt P16, P14, Mini PC), Dell, HP, Lenovo, Microsoft Surface, MSI. Following: Acer, GIGABYTE.

ASUS revealed it is re-architecting Adobe Photoshop and Premiere Pro for RTX Spark, delivering 2× performance versus current versions. Blender will be able to render 90GB+ 3D scenes and edit 12K 4:2:2 video in real time. The Microsoft Surface RTX Spark Dev Box — announced at Build 2026 — is specifically designed for developers.

The competitive context: Intel, AMD, Qualcomm, Apple

vs. Apple Silicon (M4 Ultra): Apple has been the unchallenged leader in local inference on laptops. RTX Spark brings the same unified memory approach, but with the full CUDA stack and the Windows ecosystem — native compatibility with PyTorch, Hugging Face, vLLM, no porting required.

vs. Qualcomm Snapdragon X Elite: RTX Spark offers 5-10× more raw AI compute, at the cost of higher power consumption.

vs. Intel Core Ultra / AMD Ryzen AI: both meet Microsoft's 40 TOPS Copilot+ requirement but are far from 1 petaflop. They remain competitive in gaming and x86 workstation segments where software compatibility is the priority.

Pricing remains unconfirmed. Estimates put RTX Spark laptops between $1,500 and $2,500 for entry configurations, with premium models exceeding $3,000.

What this means for companies using AI

Real privacy for sensitive data — contracts, medical records, financial information processed exclusively on-device.
Elimination of cloud API costs for high volumes — a local 120B model means zero token costs for internal use.
AI agents that work offline — complex tasks executed overnight without an active internet connection.
Local fine-tuning on proprietary data — 128GB unified memory allows on-device training of 7B-13B models on internal datasets.
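For the fine-tuning point, a rough memory estimate helps show which training regimes plausibly fit in 128GB. The sketch assumes the widely used mixed-precision Adam accounting of about 16 bytes per trainable parameter (16-bit weights and gradients, plus 32-bit master weights and two optimiser moments) and ignores activations. The 1% trainable fraction for adapter-style fine-tuning is an illustrative assumption, not a figure from NVIDIA or Microsoft.

```python
BYTES_PER_TRAINABLE_PARAM = 16  # 2 (weights) + 2 (grads) + 4 (master) + 4 + 4 (Adam moments)
BYTES_PER_FROZEN_PARAM = 2      # frozen base model kept in 16-bit precision
UNIFIED_MEMORY_GB = 128         # capacity quoted in the article


def full_finetune_gb(params_billions: float) -> float:
    """Weights, gradients and optimiser state when every parameter is trained."""
    return params_billions * BYTES_PER_TRAINABLE_PARAM


def adapter_finetune_gb(params_billions: float, trainable_fraction: float = 0.01) -> float:
    """Frozen base model plus a small set of trainable adapter parameters."""
    frozen = params_billions * BYTES_PER_FROZEN_PARAM
    trainable = params_billions * trainable_fraction * BYTES_PER_TRAINABLE_PARAM
    return frozen + trainable


if __name__ == "__main__":
    for size in (7, 13):
        full = full_finetune_gb(size)
        adapter = adapter_finetune_gb(size)
        print(f"{size}B full fine-tune:    ~{full:.0f} GB "
              f"({'fits' if full < UNIFIED_MEMORY_GB else 'exceeds'} {UNIFIED_MEMORY_GB} GB)")
        print(f"{size}B adapter fine-tune: ~{adapter:.1f} GB")
```

Under these assumptions, full fine-tuning is tight at 7B and out of reach at 13B, while adapter-based methods leave ample headroom across the whole 7B-13B range.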

Relevance for the Visual AI Labs ecosystem

The RTX Spark platform directly validates the strategic direction we pursue at Visual AI Labs: building AI systems that run locally or within the company's own infrastructure, without cloud dependency, with data that never leaves the organisation's perimeter.

The solutions we build — intelligent document processing, internal AI agents, assistants trained on proprietary data — are designed with this architecture in mind. RTX Spark means these solutions will now also run on employees' premium workstations or laptops, not just on dedicated servers.
