Skip to main content
  1. Posts/

Sizing the DGX Spark in Gigapixels

Author
Nate Burr
Building software-defined broadcast infrastructure.
Table of Contents

A DGX Spark fits on a shelf next to a paperback. It costs less than the rack rails on most broadcast appliances. The interesting question is what kind of broadcast work that small box can actually do: how many uncompressed ST2110 streams it can ingest, how many encoded outputs it can produce, and how much ABR work it can sustain before something gives.

There is a clean way to think about this. Every step in a broadcast pipeline (receiving a 2110 stream, encoding an H.264 output, packing an ABR rendition) has a cost that scales with the same three numbers: width × height × framerate. Multiply those together and you have pixels per second. Divide by 10⁹ and you have gigapixels per second (GP/s), a universal yardstick that works across codecs, resolutions, and framerates.

This post sizes the DGX Spark in GP/s. The math is honest enough to plan against, and the unit is portable: once you know what a workload costs in GP/s and what a Spark can spend, sizing turns into arithmetic.

TL;DR. A single DGX Spark delivers roughly 5 GP/s of ST2110 ingest and ~3.5 GP/s of H.264 NVENC encode. That’s around 40 simultaneous 1080p59.94 ingests and ~18 simultaneous four-rung 1080p ABR ladders, or 72 encoded outputs from one book-sized box. A 3-node N+1 HA cluster doubles those numbers (~80 ingests, ~36 ABR ladders, 144 encoded outputs) for ~$14K hardware in a half-rack. The framing is what makes the number portable: count pixels, not channels.

Common broadcast formats, in GP/s
#

For reference, the formats you actually encounter:

FormatResolutionPixels/secGP/s
1080p59.941920 × 1080124.3 M0.124
1080p601920 × 1080124.4 M0.124
720p601280 × 72055.3 M0.055
720p301280 × 72027.6 M0.028
480p30854 × 48012.3 M0.012
360p30640 × 3606.9 M0.007
240p30426 × 2403.1 M0.003
4K UHD p59.943840 × 2160497.7 M0.498
8K UHD p607680 × 43201991.2 M1.991

Key relationships: 4K is a 1080p stream at the same framerate (it’s a linear function of pixel count, not a magic threshold). 8K is 16×. Halving the framerate halves the cost. A 1080p59.94 stream costs essentially the same as 1080p60. Pick whichever the camera produces.

ST2110 ingest: ~5 GP/s on a single Spark
#

The DGX Spark ships with two 200 GbE ConnectX-7 NICs, totaling 400 Gbps of theoretical network capacity. ST2110-20 (uncompressed video) is the heavy traffic. A 1080p59.94 V210 (10-bit 4:2:2) stream is ~2.7 Gbps on the wire.

That puts a naive network ceiling at roughly 74 streams per NIC. In practice, ingest is gated by CPU and memory bandwidth before the network. DPDK packet processing, the RX-to-host-memory copy, and the kernel and pipeline service overhead all consume budget. Working from first principles, a single Spark sustains:

InputGP/s per streamStreams per SparkTotal
1080p59.94 ST2110-200.124~40~5.0 GP/s
4K UHD p59.94 ST2110-200.498~10~5.0 GP/s
8K UHD p60 ST2110-201.991~2–3~5.0 GP/s

The ~5 GP/s ceiling is what matters. Whether you spend it on many small streams or a few large ones, it’s the same budget. ±20% band on these numbers; they are first-principles estimates, not bench data.

ST 2022-7 dual-fabric is built-in. Spark’s two NICs handle red/blue redundancy natively. Every flow is mirrored at the network layer, not in software. The 5 GP/s is the unique-content ceiling, not halved by redundancy.

Unified memory: no GPUDirect required
#

The Spark’s Grace CPU and Blackwell GPU share one physical pool of LPDDR5X memory: a single address space, not two. That changes the ingest story in a way that matters.

On a discrete-GPU broadcast node, the NIC writes ST2110 packets to host RAM, and getting those packets to GPU memory for processing requires GPUDirect RDMA, a separate stack that DMAs straight from NIC to GPU over PCIe, bypassing host memory. It works when NIC firmware, GPU driver, motherboard PCIe topology, and IOMMU configuration all cooperate. When any one of those doesn’t, you eat a host-RAM round-trip and lose the zero-copy benefit.

The Spark has nothing to plumb. A 2110 packet that lands from the NIC is already in memory the GPU reads directly. No DMA dance, no GPUDirect, no NIC-and-GPU vendor compatibility matrix. The V210-to-NV12 conversion runs on the GPU against the buffer the NIC already wrote to. NVENC reads NV12 from the same memory pool. The whole pipeline lives in one address space.

When grains leave the box, MXL ships them over RDMA via DMF: the cross-node story is the same zero-copy shape, just over the wire.

What you don’t pay is the GPUDirect plumbing tax that makes discrete-GPU broadcast hardware a configuration nightmare.

Encoding one output: NVENC at ~3.5 GP/s
#

The Spark has a Blackwell-class NVENC engine (gen 8/9). One NVENC engine handles roughly 25–35 realtime 1080p60 H.264 streams before quality presets bite. That’s the NVENC budget.

Converting to GP/s:

Codec1080p60 streams (single output)GP/s budget
H.26425–353.1–4.4
HEVC (H.265)20–282.5–3.5
AV120–282.5–3.5

Call it ~3.5 GP/s as a working number for H.264. That’s the NVENC budget to spend across single outputs, ABR ladders, and any other realtime encode work.

For a single encoded output, the math is direct:

  • One 1080p60 H.264 encode consumes 0.124 GP/s, negligible against the 3.5 GP/s budget. You have 28× headroom for whatever else you want to do.
  • One 4K UHD p60 H.264 encode consumes 0.498 GP/s, about 14% of the NVENC budget. A single Spark runs ~7 simultaneous 4K H.264 encodes.
  • One 8K UHD p60 H.264 encode consumes 1.991 GP/s, over half the budget. Two of those fills a Spark.

Encoding an ABR ladder: ~18 ladders per Spark
#

The interesting case is adaptive bitrate (ABR) encoding: the same input rendered at multiple resolutions and framerates so a player can adapt to network conditions. ABR is where the GP/s framing pays off, because the cost of a ladder is just the sum of its rungs.

A standard 1080p ABR ladder targeting most consumer playback:

RungResolutionGP/s
1080p601920 × 10800.124
720p601280 × 7200.055
480p30854 × 4800.012
240p30426 × 2400.003
Total per ladder0.194

At 0.194 GP/s per ladder, a single Spark’s 3.5 GP/s NVENC budget delivers ~18 simultaneous ABR ladders. That’s 18 inputs, each producing four output renditions. 72 encoded outputs in realtime, from a $5K book-sized box.

A heavier 4K-input ladder for premium delivery:

RungResolutionGP/s
4K UHD p603840 × 21600.498
1080p601920 × 10800.124
720p601280 × 7200.055
480p30854 × 4800.012
Total per ladder0.689

At 0.689 GP/s per ladder, a single Spark sustains ~5 simultaneous 4K ABR ladders. Twenty outputs. Still in a box that fits on a desk.

Mixing ingest and encode on one box
#

The ingest budget (~5 GP/s) and the encode budget (~3.5 GP/s) are independent: different hardware, different bottlenecks. You can spend both at the same time, with a small memory-bandwidth tax for the V210-to-NV12 conversion between them (~310 MB/s per 1080p59 stream that’s being encoded, not for streams that are only ingested and forwarded).

A realistic mixed configuration on a single Spark:

  • 40 × 1080p59.94 ST2110 ingests (5 GP/s in)
  • 18 of those inputs encoded as full ABR ladders (3.5 GP/s out, 72 encoded outputs)
  • The remaining 22 ingest channels forwarded across the cluster (MXL grains shipped over RDMA via DMF on the second NIC, no encode cost)
  • Plus headroom for per-stream AI workloads. Whisper Large-V3 transcription runs realtime per channel on Spark today, and that’s what powers our multilingual live-captioning demo.

One appliance. One PSU. Two NIC cables.

3-node HA cluster: 2× single-node capacity
#

Single-node math gives you a sizing unit. A 3-node N+1 HA cluster gives you a deployable broadcast facility.

The pattern: three Sparks, Kubernetes scheduling JBS pods across them, MXL grains crossing the cluster over RDMA via DMF on the second NIC. N+1 means any one node can fail and the remaining two carry full workload. Effective cluster capacity is 2× single-node.

Cluster (3-node N+1 HA)GP/sIn broadcast units
ST2110 ingest~10 GP/s~80 × 1080p59.94 streams, or ~20 × 4K UHD
NVENC encode~7 GP/s~36 four-rung ABR ladders (144 encoded outputs in realtime)
Total broadcast capacity~17 GP/s

What the redundancy actually covers:

  • Pod-level failure. Kubernetes reschedules the pod onto a surviving node. MXL’s recent grain history is already in cross-node memory (RDMA via DMF), so the surviving node has the working frame buffer when the pod comes up. Failover happens without dropping the broadcast.
  • Node-level failure. Same mechanism at the scale of all pods on the failed node. The N+1 headroom absorbs the workload without dropping channels.
  • Network fabric failure. ST 2022-7 dual-fabric on the first NIC handles the red/blue plane. Cross-node MXL rides the second NIC. Each plane has independent redundancy.

Hardware cost: three Sparks at ~$14K total. A complete broadcast facility, half-rack form factor, for the price of one specialized appliance.

What the GP/s framing doesn’t capture
#

A capacity number is the start of sizing, not the end. Things that do not show up in GP/s but matter for real deployments:

  • AI workloads: realtime vs batched. The realtime budget is tight: 33.33 ms per frame at 30 fps, 16.67 ms at 60 fps. Whisper transcription fits inside that on Spark today. Most other AI workloads don’t, which forces batching: collect N frames, run them through the model together, accept higher per-frame latency in exchange for throughput. Plan AI per-stream only where you’ve measured inference under budget; everything else is batched by necessity, not preference.
  • Encoder quality presets. The 25–35 streams per NVENC engine assumes balanced presets. Slower presets (better quality) drop the count; faster presets raise it. The GP/s number scales with whichever you choose.
  • HDR, 10-bit, and 4:2:2 paths. Higher bit depths and chroma subsampling factor in beyond pure pixel rate. 10-bit 4:2:2 is roughly 1.5× the cost of 8-bit 4:2:0 at the same resolution. Pure pixel rate is the right first-order approximation; bit depth is a second-order multiplier on top.

Why this matters
#

The capacity-per-dollar curve for broadcast hardware has been roughly flat for a decade. Each generation of dedicated 2110 appliance is incrementally faster than the last, at roughly the same price. The DGX Spark sits an order of magnitude off that curve: ~8.5 GP/s of total broadcast capacity (ingest + encode) in a $5K box, or ~17 GP/s in a $14K three-node N+1 HA cluster. Both come with a Blackwell GPU per node for AI inference on top.

The gigapixel/second framing makes that concrete. It says: don’t argue about channel counts or codec families. Count pixels per second. The Spark’s number is a lot bigger than its price tag, and that’s the story.

Book a 30-min Demo

Capacity numbers are first-principles estimates with a ±20% band, not bench data.