Free · Local · Open Source

Computer Vision Workbench Fully Local

Annotate, convert, augment, merge, and train computer vision models — entirely on your machine. No cloud, no accounts, no bill.

13+ Formats
30+ Augmentations
20+ SOTA Models
$0 Cost
bash — VisOS quick start
git clone https://github.com/Dan04ggg/VisOS.git
Cloning into 'VisOS'...
cd VisOS && python3 run.py restart
# Creating virtualenv, installing deps...
Backend running on :8000
Frontend running on :3000
Opening browser...
python 3.11 · venv active
both servers running
The Problem

CV dataset work is a grind.
It doesn't have to be.

Every computer vision researcher knows the loop: wrong format, conversion scripts, class conflicts, cloud costs. VisOS ends that loop.

01
Pain Point 01

Expensive SaaS Platforms

Roboflow, Scale AI, V7 — pay per image, per seat, per export. Your dataset budget goes to the platform, not the model.

02
Pain Point 02

Format Conversion Hell

COCO downloaded, YOLO expected. One hour writing conversion scripts. Repeat for every framework switch.

03
Pain Point 03

Mandatory Cloud Uploads

Your proprietary medical, industrial, or defense data going to third-party servers. Compliance risk. Latency. Loss of control.

04
Pain Point 04

Fragmented Toolchain

Annotation in one tool, augmentation in another, training in a notebook. Zero unified workflow, maximum context switching.

The Solution

One local app.
Complete pipeline.

VisOS wraps the entire computer vision data workflow in a single, local UI. From raw images to trained model — no cloud, no accounts, no fragmentation.

  • Auto-detect dataset formats on load
  • Convert between any of 13+ formats in one click
  • Auto-annotate with SOTA models including SAM 3 and GroundingDINO
  • 30+ augmentations, live preview before commit
  • Train, monitor, and export to ONNX / TensorRT
Load Dataset
auto-detect · any format
Auto-Annotate
SAM 3 · GroundingDINO · YOLO
Augment Pipeline
30+ transforms · live preview
Train & Export
PyTorch · ONNX · TensorRT
Capabilities

Everything you need.
Nothing you don't.

A complete toolchain built for computer vision practitioners, not cloud vendors.

01

Canvas Annotation

Six-tool canvas editor: bounding box, polygon, keypoint, brush, select/edit, and SAM Wand. Full undo/redo. Auto-saves.

6 tools · keyboard shortcuts
02

Auto-Annotation

Run inference with any loaded model directly on your dataset. GroundingDINO supports zero-shot annotation via text prompt.

SAM 3 · GroundingDINO · YOLO
03

Format Conversion

Convert between any of 13+ formats with one click. Images and annotations stay in sync.

13+ formats · batch conversion
04

Augmentation Pipeline

Toggle-based visual pipeline builder with 30+ transforms. Preview outputs before committing.

30+ transforms · live preview
05

Local Training

Train detection, segmentation, and classification models with live loss/accuracy charts and GPU monitoring.

YOLOv8–10 · RF-DETR · live metrics
06

Duplicate Detection

Find exact and near-duplicate images via MD5, pHash, aHash, or CLIP embeddings.

MD5 · pHash · aHash · CLIP
07

Video Frame Extraction

Turn video into annotatable datasets. Extract every Nth frame, keyframes on scene change, or manual scrubber selection.

MP4 · AVI · MOV · MKV · WebM
08

Class Management

Rename, merge, delete, or extract classes via a simple table UI. No JSON editing required.

bulk ops · no JSON editing
09

Dataset Merging

Combine multiple datasets with a class-mapping UI to resolve naming conflicts before the merge.

class mapping · conflict resolution
Annotation Tools

Six precision tools for every task

Tool             Shortcut   Description
Select / Edit    V          Move, resize, and adjust existing annotations
Bounding Box     B          Draw axis-aligned rectangular regions for object detection
Polygon          P          Freeform polygon for instance segmentation and irregular shapes
Keypoint         L          Place landmark points for pose estimation tasks
Brush            R          Freehand paint masks for pixel-level segmentation
SAM Wand         auto       Click-based segment anything — auto-activates when a SAM model is loaded
Augmentation Transforms

30+ transforms in the pipeline builder

Toggle any combination, preview samples, then apply.

Horizontal Flip · Vertical Flip · Rotation · Scale · Translation · Shear · Perspective · Random Crop · Brightness · Contrast · Saturation · Hue Shift · Grayscale · Gaussian Blur · Gaussian Noise · Sharpen · JPEG Compression · Cutout · Mosaic · MixUp · Elastic Deformation · Grid Distortion · Histogram Equalisation · Channel Shuffle · Invert · Posterize · Solarize
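
Geometric transforms must update annotations alongside pixels. As a minimal illustration of what that means (this is not VisOS internals, just the underlying math): in normalized YOLO coordinates, a horizontal flip mirrors the box center's x to 1 − x while the box size is unchanged.

```python
def hflip_yolo_box(box):
    """Horizontally flip a YOLO-format box (x_center, y_center, w, h),
    all normalized to [0, 1]. Only the x coordinate mirrors."""
    x, y, w, h = box
    return (1.0 - x, y, w, h)

# A box centered at x=0.25 mirrors to x=0.75; y and size are unchanged.
print(hflip_yolo_box((0.25, 0.5, 0.2, 0.4)))
```

The same principle applies to every spatial transform in the list: rotation, scale, crop, and perspective each carry a matching coordinate update for boxes, polygons, and keypoints.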
All-in-One Platform

For All Annotation
Types

Annotate, manage, and train models for segmentation, object detection, and keypoint detection.

In the App

See it in action.

Every tool in VisOS is built for speed and clarity. Here's what working with your datasets actually looks like.

localhost:3000/Annotation
Annotation Canvas
assets/images/annotate_rounded_bottom.png
Canvas Annotation
Six-tool precision canvas editor
The annotation canvas supports bounding box, polygon, keypoint, brush, select/edit, and the SAM Wand. Full undo/redo with keyboard shortcuts and auto-save. Everything you need, nothing you don't.
01
Auto-Annotation
assets/images/annotate_grounding-dino.png
Auto-Annotation
Zero-shot with GroundingDINO, SAM / SAM 2 / SAM 3, and YOLO WORLD
Type a text prompt and watch the model annotate your entire dataset. No training examples needed.
Augmentation Pipeline
assets/images/augmentation.png
Augmentation Pipeline
Toggle-based pipeline builder
Enable any of 30+ transforms, preview sample outputs live, then apply to your full dataset.
Training View
assets/images/training_view.png
Local Training
Live loss and accuracy charts
Monitor GPU usage, pause and resume runs, and export to ONNX or TensorRT — all from the browser.
Format Conversion
assets/images/convert_format.png
Format Conversion
Convert between 13+ formats in one click
YOLO, COCO, Pascal VOC, TFRecord and more — auto-detected on load, exported in one click. Images and annotations stay in sync.
Duplicate Detection
assets/images/datasets_view.png
Duplicate Detection
MD5, pHash, aHash, and CLIP
Find exact and near-duplicate images with configurable similarity threshold. Keep the best, remove the rest.
Format Support

Load & export 13+ formats

Auto-detected on load. Convert to any other format with a single click.

YOLO · COCO · Pascal VOC · LabelMe · CreateML · TensorFlow CSV · ImageNet Classification · YOLO OBB · COCO Panoptic · Cityscapes · ADE20K · DOTA · TFRecord
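
To make concrete what conversion actually involves: YOLO stores normalized center coordinates, while COCO stores pixel top-left x/y plus width/height. A minimal sketch of one such mapping (illustrative only, not the VisOS converter):

```python
def yolo_to_coco_bbox(box, img_w, img_h):
    """Convert a normalized YOLO box (x_center, y_center, w, h)
    to a COCO pixel box [x_min, y_min, width, height]."""
    xc, yc, w, h = box
    bw, bh = w * img_w, h * img_h
    return [xc * img_w - bw / 2, yc * img_h - bh / 2, bw, bh]

# A box covering the center half of a 640x480 image:
print(yolo_to_coco_bbox((0.5, 0.5, 0.5, 0.5), 640, 480))
```

Multiply that by 13+ formats, each with its own file layout and class-index conventions, and the value of one-click conversion is clear.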
Model Support

SOTA models.
All local.

Download pretrained weights inside the app or import your own .pt / .onnx files.

Detection · Segmentation · Classification · Keypoints
YOLOv5 n/s
YOLOv8 n/s/m/l/x
YOLOv9 n/s/m/c/e
YOLOv10 n/s/m/b/l/x
RT-DETR L/X
RF-DETR Base/Large
Zero-Shot Segmentation
SAM ViT-B/L
SAM 2 Tiny/Small
SAM 2 Base+/Large
SAM 2.1 Tiny/Small
SAM 2.1 Base+/Large
SAM 3
Zero-Shot Detection
GroundingDINO Tiny
GroundingDINO Base
YOLO WORLD S/M/L
OWL-ViT Base (Patch32)
OWL-ViT Base (Patch16)
OWL-ViT Large (Patch14)
Zero-Shot Annotation

Type a text prompt like "red car" or "person holding phone" — GroundingDINO annotates without any training examples.

Duplicate Detection

Four methods, one interface

MD5 Hash

Exact byte-for-byte duplicates

pHash

Visually similar images

aHash

Fast approximate similarity

CLIP Embeddings

Semantically similar content
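
The MD5 method is the simplest of the four; a pure-stdlib sketch of exact-duplicate grouping (illustrative only, not the VisOS implementation):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(folder):
    """Group files by MD5 digest; any group with more than
    one file is a set of byte-for-byte duplicates."""
    groups = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

pHash, aHash, and CLIP extend this idea from exact bytes to perceptual and semantic similarity, which is why they need a configurable threshold rather than a simple equality check.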

Core Pillars

Built on three
unshakeable principles.

Everything in VisOS exists to serve one mission: give CV practitioners complete control over their data and models.

Total Privacy

Your images never leave your machine. No uploads, no cloud processing, no third-party servers. Medical, defense, and proprietary datasets stay where they belong.

100% Local · Air-gap ready

Format Freedom

13+ annotation formats supported. YOLO, COCO, Pascal VOC, TFRecord — auto-detected on load, converted in one click. Never write another conversion script.

13+ formats · auto-detect

Zero Cost

No per-image pricing. No seat licenses. No export fees. VisOS is free and open source under AGPL 3.0. Your dataset budget goes to compute, not platforms.

Free forever · AGPL 3.0
Architecture

Clean stack.
No Docker required.

FastAPI backend + Next.js frontend, managed by a single Python process controller. No containers, no environment variables, no configuration.

Frontend · :3000

Next.js App Router
React Components
API Proxy Routes
Canvas Annotation Editor
Live Training Charts
TypeScript · Tailwind CSS
HTTP / REST

Backend · :8000

FastAPI entrypoint
dataset_parsers.py
format_converter.py
annotation_tools.py
model_integration.py
training.py · augmentation.py
Proxy Pattern — Zero CORS
Next.js API routes forward all requests to FastAPI. The browser only ever talks to localhost:3000 — never directly to port 8000. For remote GPU servers: ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server
API Reference

Full REST API.
Interactive docs included.

Base URL: http://localhost:8000/api  ·  Interactive Swagger UI at /docs

Resource         Endpoints
Datasets         GET /datasets · POST /load-local · POST /upload · DELETE /{id}
Images           GET /images · GET /{image_id} · PUT /annotations
Classes          POST /extract-classes · /delete-classes · /merge-classes
Conversion       POST /convert · POST /merge · GET /formats
Augmentation     POST /augment-enhanced
Video            POST /video/extract
Duplicates       POST /find-duplicates · /remove-duplicates
Models           GET /models · POST /download · POST /import · POST /{id}/load
Auto-Annotate    POST /auto-annotate · GET /jobs
Training         POST /start · GET /{id}/status · /pause · /resume · /stop
System           GET /health
python · start training
import requests

response = requests.post(
    "http://localhost:8000/api/training/start",
    json={
        "dataset_id": "my-dataset",
        "architecture": "yolov8n",
        "task": "detect",
        "epochs": 100,
        "batch_size": 16,
        "imgsz": 640,
        "export_format": "onnx"
    }
)
job = response.json()
# Poll: GET /training/{job["id"]}/status
python · zero-shot auto-annotate
import requests

dataset_id = "my-dataset"

requests.post(
    f"http://localhost:8000/api/datasets/{dataset_id}"
    "/auto-annotate",
    json={
        "model_id": "groundingdino-base",
        "prompt": "hard hat . safety vest",
        "confidence": 0.35
    }
)
Getting Started

Up and running in 3 steps

Prerequisites: Python 3.10+ and Node.js 18+. First run takes 2–5 min while PyTorch downloads (~1.5 GB).

01
Clone

Get the code

Clone the repository from GitHub. No submodules, no monorepo complexity.

git clone https://github.com/Dan04ggg/VisOS.git && cd VisOS
02
Start

Launch everything

One command creates the virtualenv, installs all deps, starts both servers, and opens your browser.

python3 run.py restart
03
Use

Load your dataset

Point VisOS at a local folder or ZIP. Format is auto-detected. Start annotating immediately.

→ http://localhost:3000
Process Manager

Full control over both processes

bash · run.py commands
python3 run.py start           # Start both servers
python3 run.py stop            # Stop cleanly
python3 run.py restart         # Full restart
python3 run.py restart-back    # Backend only
python3 run.py restart-front   # Frontend only
python3 run.py status          # Show PIDs and ports
python3 run.py logs            # Tail live output
Documentation

Troubleshooting & Tips

Common issues and how to fix them fast.

"Backend not connected"

Python failed to start. Check .logs/backend.log. Common causes: Python < 3.10, port 8000 in use, missing OpenCV system dependency.

First startup hangs

Normal — PyTorch is large. Check .logs/backend.log to watch pip progress. Allow 2–5 minutes on first run.

"Dataset format not recognized"

Auto-detection looks for specific files (data.yaml, instances_train.json, Annotations/*.xml). Match the folder structure exactly.
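
That heuristic can be approximated in a few lines. A simplified sketch of marker-file detection, using only the marker files named above (the actual VisOS parser is more thorough):

```python
from pathlib import Path

# Marker files/patterns checked in order (simplified).
MARKERS = [
    ("YOLO", "data.yaml"),
    ("COCO", "instances_train.json"),
    ("Pascal VOC", "Annotations/*.xml"),
]

def guess_format(root):
    """Return the first format whose marker pattern matches
    inside the dataset root, or None if nothing matches."""
    root = Path(root)
    for fmt, pattern in MARKERS:
        if list(root.glob(pattern)):
            return fmt
    return None
```

If your dataset isn't recognized, compare its layout against these markers: a YOLO dataset without its data.yaml, or a VOC dataset with XMLs outside Annotations/, will fail detection.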

OOM during training

Lower batch size (try 4–8), reduce image size to 320, or switch to a smaller arch like yolov8n. Check VRAM with nvidia-smi.

Port still in use after crash

Run python3 run.py stop. macOS/Linux fallback: lsof -ti:3000 | xargs kill -9. Windows: netstat -ano | findstr :3000.

Blank frontend / 500 error

Run npm install manually in the project root, then python3 run.py restart-front.

Security Note
The FastAPI backend serves files directly from your local filesystem. Do not expose port 8000 to the public internet without authentication. For remote GPU servers, use SSH port forwarding: ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server
Open Source · Free Forever
Start building better
CV datasets today.

No sign-up. No cloud. No credit card. Clone, run, and own your entire computer vision pipeline from day one.

TypeScript 58%
Python 40%
Free · Local · No Cloud
AGPL 3.0 License