Annotate, convert, augment, merge, and train computer vision models — entirely on your machine. No cloud, no accounts, no bill.
Every computer vision researcher knows the loop: wrong format, conversion scripts, class conflicts, cloud costs. VisOS ends that loop.
Roboflow, Scale AI, V7 — pay per image, per seat, per export. Your dataset budget goes to the platform, not the model.
COCO downloaded, YOLO expected. One hour writing conversion scripts. Repeat for every framework switch.
Your proprietary medical, industrial, or defense data going to third-party servers. Compliance risk. Latency. Loss of control.
Annotation in one tool, augmentation in another, training in a notebook. Zero unified workflow, maximum context switching.
VisOS wraps the entire computer vision data workflow in a single, local UI. From raw images to trained model — no cloud, no accounts, no fragmentation.
A complete toolchain built for computer vision practitioners, not cloud vendors.
Six-tool canvas editor: bounding box, polygon, keypoint, brush, select/edit, and SAM Wand. Full undo/redo. Auto-saves.
*6 tools · keyboard shortcuts*

Run inference with any loaded model directly on your dataset. GroundingDINO supports zero-shot annotation via text prompts.
*SAM 3 · GroundingDINO · YOLO*

Convert between any of 13+ formats with one click. Images and annotations stay in sync.
*13+ formats · batch conversion*

Toggle-based visual pipeline builder with 30+ transforms. Preview outputs before committing.
*30+ transforms · live preview*

Train detection, segmentation, and classification models with live loss/accuracy charts and GPU monitoring.
*YOLOv8–10 · RF-DETR · live metrics*

Find exact and near-duplicate images via MD5, pHash, aHash, or CLIP embeddings.
*MD5 · pHash · aHash · CLIP*

Turn video into annotatable datasets. Extract every Nth frame, keyframes on scene change, or manual scrubber selection.
*MP4 · AVI · MOV · MKV · WebM*

Rename, merge, delete, or extract classes via a simple table UI. No JSON editing required.
*bulk ops · no JSON editing*

Combine multiple datasets with a class-mapping UI to resolve naming conflicts before the merge.
*class mapping · conflict resolution*

| Tool | Shortcut | Description |
|---|---|---|
| Select / Edit | V | Move, resize, and adjust existing annotations |
| Bounding Box | B | Draw axis-aligned rectangular regions for object detection |
| Polygon | P | Freeform polygon for instance segmentation and irregular shapes |
| Keypoint | L | Place landmark points for pose estimation tasks |
| Brush | R | Freehand paint masks for pixel-level segmentation |
| SAM Wand | auto | Click-based segment anything — auto-activates when a SAM model is loaded |
Toggle any combination, preview samples, then apply.
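To illustrate the idea behind a toggle-based pipeline, here is a minimal sketch of composing enabled transforms into a single function. The transform names and the `build_pipeline` helper are illustrative assumptions, not VisOS's actual internals, and the "images" are tiny 2-D lists rather than real arrays.

```python
# Illustrative sketch of a toggle-based transform pipeline.
# Function names here are hypothetical, not VisOS's real API.
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse row order."""
    return img[::-1]

TRANSFORMS = {"hflip": hflip, "vflip": vflip}

def build_pipeline(enabled):
    """Compose the enabled transforms into one callable, applied in order."""
    steps = [TRANSFORMS[name] for name in enabled]
    def apply(img):
        for step in steps:
            img = step(img)
        return img
    return apply

pipe = build_pipeline(["hflip", "vflip"])
print(pipe([[1, 2], [3, 4]]))  # [[4, 3], [2, 1]]
```

Previewing before committing is then just running the composed function on a sample batch instead of the full dataset.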
Annotate, manage, and train models for segmentation, object detection, and keypoint detection.
Every tool in VisOS is built for speed and clarity. Here's what working with your datasets actually looks like.
Auto-detected on load. Convert to any other format with a single click.
Download pretrained weights inside the app or import your own .pt / .onnx files.
Type a text prompt like "red car" or "person holding phone" — GroundingDINO annotates without any training examples.
Exact byte-for-byte duplicates
Visually similar images
Fast approximate similarity
Semantically similar content
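To make the hash-based methods concrete, here is a pure-Python sketch of average hashing (aHash) and Hamming-distance matching on tiny synthetic grayscale "images". Real implementations hash a downscaled 8×8 image; this simplified version is for illustration only and is not VisOS's actual code.

```python
# Simplified aHash sketch: 1 bit per pixel — brighter than the mean or not.
def ahash(pixels):
    """Return a bit string: '1' where a pixel exceeds the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming(h1, h2):
    """Count differing bits; a small distance means visually similar."""
    return sum(a != b for a, b in zip(h1, h2))

img_a = [[10, 200], [30, 220]]
img_b = [[12, 198], [28, 225]]   # same brightness pattern, slight noise
img_c = [[200, 10], [220, 30]]   # inverted pattern

print(hamming(ahash(img_a), ahash(img_b)))  # 0 — near-duplicate
print(hamming(ahash(img_a), ahash(img_c)))  # 4 — different
```

MD5 catches only byte-identical files, while hashes like this survive recompression and minor edits; CLIP embeddings go further and match on semantic content.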
Everything in VisOS exists to serve one mission: give CV practitioners complete control over their data and models.
Your images never leave your machine. No uploads, no cloud processing, no third-party servers. Medical, defense, and proprietary datasets stay where they belong.
13+ annotation formats supported. YOLO, COCO, Pascal VOC, TFRecord — auto-detected on load, converted in one click. Never write another conversion script.
No per-image pricing. No seat licenses. No export fees. VisOS is free and open source under AGPL 3.0. Your dataset budget goes to compute, not platforms.
FastAPI backend + Next.js frontend, managed by a single Python process controller. No containers, no environment variables, no configuration.
The UI is served at localhost:3000 — never connect directly to port 8000. For remote GPU servers: `ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server`

Base URL: `http://localhost:8000/api` · Interactive Swagger UI at `/docs`
| Resource | Endpoints |
|---|---|
| Datasets | GET /datasets · POST /load-local · POST /upload · DELETE /{id} |
| Images | GET /images · GET /{image_id} · PUT /annotations |
| Classes | POST /extract-classes · /delete-classes · /merge-classes |
| Conversion | POST /convert · POST /merge · GET /formats |
| Augmentation | POST /augment-enhanced |
| Video | POST /video/extract |
| Duplicates | POST /find-duplicates · /remove-duplicates |
| Models | GET /models · POST /download · POST /import · POST /{id}/load |
| Auto-Annotate | POST /auto-annotate · GET /jobs |
| Training | POST /start · GET /{id}/status · /pause · /resume · /stop |
| System | GET /health |
```python
import requests

response = requests.post(
    "http://localhost:8000/api/training/start",
    json={
        "dataset_id": "my-dataset",
        "architecture": "yolov8n",
        "task": "detect",
        "epochs": 100,
        "batch_size": 16,
        "imgsz": 640,
        "export_format": "onnx"
    }
)
job = response.json()
# Poll: GET /training/{job["id"]}/status
```
```python
import requests

dataset_id = "my-dataset"
requests.post(
    f"http://localhost:8000/api/datasets/{dataset_id}/auto-annotate",
    json={
        "model_id": "groundingdino-base",
        "prompt": "hard hat . safety vest",  # "." separates multiple prompts
        "confidence": 0.35
    }
)
```
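Both training and auto-annotation run as background jobs, so clients poll a status endpoint until the job reaches a terminal state. The sketch below shows the polling pattern with the network call abstracted behind a callable; the `"completed"`/`"failed"` status values are assumptions, not the documented VisOS schema.

```python
import time

def poll(fetch_status, interval=0.0, max_tries=50):
    """Call fetch_status() until it reports a terminal state.

    fetch_status is any zero-argument callable, e.g. a lambda wrapping
    requests.get(...).json()["status"] against a live server.
    """
    for _ in range(max_tries):
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

# Stand-in for a live server: a sequence of status responses.
states = iter(["queued", "running", "running", "completed"])
print(poll(lambda: next(states)))  # completed
```

Injecting the fetcher keeps the retry logic testable without a running backend; in practice you would pass a small wrapper around the `GET .../status` call.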
Prerequisites: Python 3.10+ and Node.js 18+. First run takes 2–5 min while PyTorch downloads (~1.5 GB).
Clone the repository from GitHub. No submodules, no monorepo complexity.
git clone https://github.com/Dan04ggg/VisOS.git && cd VisOS
One command creates the virtualenv, installs all deps, starts both servers, and opens your browser.
python3 run.py restart
Point VisOS at a local folder or ZIP. Format is auto-detected. Start annotating immediately.
→ http://localhost:3000
```shell
python3 run.py start          # Start both servers
python3 run.py stop           # Stop cleanly
python3 run.py restart        # Full restart
python3 run.py restart-back   # Backend only
python3 run.py restart-front  # Frontend only
python3 run.py status         # Show PIDs and ports
python3 run.py logs           # Tail live output
```
Common issues and how to fix them fast.
Python failed to start. Check `.logs/backend.log`. Common causes: Python < 3.10, port 8000 already in use, or a missing OpenCV system dependency.

Normal — PyTorch is large. Check `.logs/backend.log` to watch pip progress. Allow 2–5 minutes on first run.

Auto-detection looks for specific files (`data.yaml`, `instances_train.json`, `Annotations/*.xml`). Match the folder structure exactly.
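Detection of this kind usually amounts to checking for each format's marker file in order. A minimal sketch, based only on the marker files listed above — the real VisOS detector may check more signals:

```python
import glob
import os

def detect_format(root):
    """Guess the annotation format from marker files (illustrative sketch)."""
    if os.path.isfile(os.path.join(root, "data.yaml")):
        return "yolo"           # YOLO: data.yaml at the dataset root
    if os.path.isfile(os.path.join(root, "instances_train.json")):
        return "coco"           # COCO: instances_*.json annotation file
    if glob.glob(os.path.join(root, "Annotations", "*.xml")):
        return "pascal_voc"     # Pascal VOC: Annotations/*.xml
    return None                 # unrecognized — fix the folder structure
```

If your dataset is not recognized, comparing your tree against these marker paths is the fastest diagnosis.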
Lower the batch size (try 4–8), reduce image size to 320, or switch to a smaller architecture like `yolov8n`. Check VRAM with `nvidia-smi`.

Run `python3 run.py stop`. macOS/Linux fallback: `lsof -ti:3000 | xargs kill -9`. Windows: `netstat -ano | findstr :3000`.

Run `npm install` manually in the project root, then `python3 run.py restart-front`.
`ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server`

No sign-up. No cloud. No credit card. Clone, run, and own your entire computer vision pipeline from day one.