SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning
Update March 27, 2026:
We’ve seen incredible adoption of SAM 3 over the last few months, and during that time, we’ve been working behind the scenes on updates to improve video processing efficiency. Today, we’re pleased to introduce SAM 3.1.
As a drop-in replacement for SAM 3, our updated model delivers a significant boost in video processing efficiency by introducing object multiplexing, which allows the model to track up to 16 objects in a single forward pass. This innovation doubles processing speed for videos with a moderate number of tracked objects, increasing throughput from 16 to 32 frames per second on a single H100 GPU. As a result, SAM 3.1 enables real-time object tracking in complex videos while reducing overall GPU resource requirements, making high-performance applications feasible on smaller, more accessible hardware.
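To see why sharing one forward pass roughly doubles throughput, consider a simple cost model: per-object tracking pays a decoder cost for every object on every frame, while a multiplexed pass amortizes most of that cost across objects. The sketch below is illustrative only; the millisecond figures and the `batch_overhead` factor are hypothetical numbers chosen to mirror the 16 → 32 FPS improvement, not measured SAM 3.1 internals.

```python
def fps_per_object_passes(n_objects, backbone_ms=20.0, decoder_ms=5.0):
    """Frame rate when every tracked object needs its own decoder pass."""
    frame_ms = backbone_ms + n_objects * decoder_ms
    return 1000.0 / frame_ms

def fps_multiplexed(n_objects, backbone_ms=20.0, decoder_ms=5.0, batch_overhead=0.25):
    """Frame rate when up to 16 objects share one multiplexed decoder pass.

    Each extra object adds only a small marginal cost (batch_overhead) rather
    than a full decoder pass; 0.25 is an assumed value for illustration.
    """
    assert n_objects <= 16, "SAM 3.1 multiplexes up to 16 objects per pass"
    frame_ms = backbone_ms + decoder_ms * (1 + batch_overhead * (n_objects - 1))
    return 1000.0 / frame_ms

# With ~8 tracked objects, this toy model goes from roughly 17 FPS to
# roughly 30 FPS -- the same order of speedup described above.
print(fps_per_object_passes(8), fps_multiplexed(8))
```

The key design point is that the marginal cost per object drops from a full decoder pass to a small batched increment, so the speedup grows with the number of tracked objects.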
Meta Segment Anything Model 3 (SAM 3) Overview
Meta has released Segment Anything Model 3 (SAM 3), the next-generation unified model for detection, segmentation, and tracking of objects in both images and videos. It supports highly flexible prompts including:
- Text prompts (short open-vocabulary noun phrases, e.g., “striped red umbrella”)
- Exemplar image prompts
- Traditional visual prompts (points, boxes, masks)
This addresses a major limitation of earlier models by enabling promptable concept segmentation — finding and segmenting all instances of a concept, even rare or nuanced ones not in fixed label sets.
Key Improvements
- 2x performance gain on the new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation in images and videos.
- Better accuracy in crowded scenes and interactive tasks compared to previous SAM models and strong baselines (e.g., OWLv2, Gemini 2.5 Pro).
- Fast inference: ~30ms per image (even with 100+ objects) on an H200 GPU; near real-time for video with multiple objects.
- SAM 3.1 update: Introduces multiplexing — processes all tracked objects together in a single pass instead of separate passes per object. This reduces redundant computation, lowers memory usage, and improves efficiency and accuracy, especially in crowded or complex video scenes.
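The multiplexing change in the last bullet can be sketched as a difference in how many decoder passes a video requires. The functions below are schematic stand-ins: the real model batches objects along a tensor dimension inside the network, not in Python loops, and `encode_frame` is a placeholder for the shared backbone.

```python
def encode_frame(frame):
    """Stand-in for the shared image backbone, run once per frame either way."""
    return sum(frame)  # placeholder for an image embedding

def track_per_object(frames, n_objects):
    """Pre-3.1 style: one decoder pass per object per frame."""
    passes = 0
    for frame in frames:
        _ = encode_frame(frame)
        for _obj in range(n_objects):
            passes += 1  # each object repeats work the others could share
    return passes

def track_multiplexed(frames, n_objects, max_batch=16):
    """SAM 3.1 style: objects share decoder passes, up to 16 at a time."""
    passes = 0
    for frame in frames:
        _ = encode_frame(frame)
        passes += -(-n_objects // max_batch)  # ceil(n_objects / max_batch)
    return passes

frames = [[1, 2, 3]] * 10          # 10 toy frames
print(track_per_object(frames, 8))   # 80 decoder passes
print(track_multiplexed(frames, 8))  # 10 decoder passes
```

For eight tracked objects, the per-frame decoder work drops from eight passes to one, which is the structural source of the efficiency and memory savings described above.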
Data & Training Innovation
Meta built a scalable hybrid data engine combining SAM 3, Llama-based AI annotators, and human reviewers. This made annotation ~5x faster for negative prompts and enabled creation of a massive training dataset covering over 4 million unique concepts.
Additional Releases
- SAM 3D: Open-source models and data for 3D object/scene reconstruction and human pose/shape estimation from a single image.
- Segment Anything Playground: A user-friendly web platform where anyone (no coding required) can experiment with SAM 3 for creative edits, annotations, and media modification (e.g., pixelating faces, adding effects, spotlighting objects).
- SA-Co benchmark dataset for community evaluation and research.
- Fine-tuning code and approaches to help users adapt SAM 3 to specific domains.
Real-World Applications
- Facebook Marketplace: “View in Room” feature uses SAM 3/SAM 3D to let users visualize furniture and decor in their own space.
- Creator tools: New effects coming to Instagram’s Edits app (apply dynamic effects to specific people/objects with one tap), Meta AI app (Vibes), and meta.ai.
- Science & Conservation: Powers new public wildlife datasets (SA-FARI for camera traps, FathomNet for underwater imagery) in partnership with Conservation X Labs and others.
Future Directions
SAM 3 performs well on short prompts and common scenarios, but fine-grained, domain-specific concepts (e.g., medical terms) may require fine-tuning. It also has room to grow in handling very long or compositional prompts, and in multi-object video tracking that shares context across objects even more efficiently.
Overall, SAM 3 makes advanced visual understanding more accessible and powerful, with open weights, code, data, and a playground for broad experimentation. It continues Meta’s push to empower creators, researchers, and developers while enabling practical applications in e-commerce, content creation, and scientific monitoring.