Annotate, convert, augment, merge, and train computer vision models — entirely on your machine. No cloud, no accounts, no bill.
Every computer vision researcher knows the loop: wrong format, conversion scripts, class conflicts, cloud costs. VisOS ends that loop.
Roboflow, Scale AI, V7 — pay per image, per seat, per export. Your dataset budget goes to the platform, not the model.
COCO downloaded, YOLO expected. One hour writing conversion scripts. Repeat for every framework switch.
Your proprietary medical, industrial, or defense data going to third-party servers. Compliance risk. Latency. Loss of control.
Annotation in one tool, augmentation in another, training in a notebook. Zero unified workflow, maximum context switching.
VisOS wraps the entire computer vision data workflow in a single, local UI. From raw images to trained model — no cloud, no accounts, no fragmentation.
A complete toolchain built for computer vision practitioners, not cloud vendors.
Six-tool canvas editor: bounding box, polygon, keypoint, brush, select/edit, and SAM Wand. Full undo/redo. Auto-saves.
*6 tools · keyboard shortcuts*

Run inference with any loaded model directly on your dataset. GroundingDINO supports zero-shot annotation via text prompts.
*SAM 3 · GroundingDINO · YOLO*

Convert between any of 13+ formats with one click. Images and annotations stay in sync.
*13+ formats · batch conversion*

Toggle-based visual pipeline builder with 30+ transforms. Preview outputs before committing.
*30+ transforms · live preview*

Train detection, segmentation, and classification models with live loss/accuracy charts and GPU monitoring.
*YOLOv8–10 · RF-DETR · live metrics*

Find exact and near-duplicate images via MD5, pHash, aHash, or CLIP embeddings.
*MD5 · pHash · aHash · CLIP*

Turn video into annotatable datasets. Extract every Nth frame, keyframes on scene change, or manual scrubber selection.
*MP4 · AVI · MOV · MKV · WebM*

Rename, merge, delete, or extract classes via a simple table UI. No JSON editing required.
*bulk ops · no JSON editing*

Combine multiple datasets with a class-mapping UI to resolve naming conflicts before the merge.
*class mapping · conflict resolution*

| Tool | Shortcut | Description |
|---|---|---|
| Select / Edit | V | Move, resize, and adjust existing annotations |
| Bounding Box | B | Draw axis-aligned rectangular regions for object detection |
| Polygon | P | Freeform polygon for instance segmentation and irregular shapes |
| Keypoint | L | Place landmark points for pose estimation tasks |
| Brush | R | Freehand paint masks for pixel-level segmentation |
| SAM Wand | auto | Click-based segment anything — auto-activates when a SAM model is loaded |
Toggle any combination, preview samples, then apply.
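To illustrate the idea behind a toggle-based pipeline, here is a minimal sketch of composing enabled transforms into a single function. The transform names and the `build_pipeline` helper are illustrative assumptions, not VisOS's actual internals, and the "images" are tiny 2-D lists rather than real arrays.

```python
# Illustrative sketch of a toggle-based transform pipeline.
# Function names here are hypothetical, not VisOS's real API.
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse row order."""
    return img[::-1]

TRANSFORMS = {"hflip": hflip, "vflip": vflip}

def build_pipeline(enabled):
    """Compose the enabled transforms into one callable, applied in order."""
    steps = [TRANSFORMS[name] for name in enabled]
    def apply(img):
        for step in steps:
            img = step(img)
        return img
    return apply

pipe = build_pipeline(["hflip", "vflip"])
print(pipe([[1, 2], [3, 4]]))  # [[4, 3], [2, 1]]
```

Previewing before committing is then just running the composed function on a sample batch instead of the full dataset.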
Annotate, manage, and train models for segmentation, object detection, and keypoint detection.
Every tool in VisOS is built for speed and clarity. Here's what working with your datasets actually looks like.
Auto-detected on load. Convert to any other format with a single click.
Download pretrained weights inside the app or import your own .pt / .onnx files.
Type a text prompt like "red car" or "person holding phone" — GroundingDINO annotates without any training examples.
Exact byte-for-byte duplicates
Visually similar images
Fast approximate similarity
Semantically similar content
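To make the hash-based methods concrete, here is a pure-Python sketch of average hashing (aHash) and Hamming-distance matching on tiny synthetic grayscale "images". Real implementations hash a downscaled 8×8 image; this simplified version is for illustration only and is not VisOS's actual code.

```python
# Simplified aHash sketch: 1 bit per pixel — brighter than the mean or not.
def ahash(pixels):
    """Return a bit string: '1' where a pixel exceeds the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming(h1, h2):
    """Count differing bits; a small distance means visually similar."""
    return sum(a != b for a, b in zip(h1, h2))

img_a = [[10, 200], [30, 220]]
img_b = [[12, 198], [28, 225]]   # same brightness pattern, slight noise
img_c = [[200, 10], [220, 30]]   # inverted pattern

print(hamming(ahash(img_a), ahash(img_b)))  # 0 — near-duplicate
print(hamming(ahash(img_a), ahash(img_c)))  # 4 — different
```

MD5 catches only byte-identical files, while hashes like this survive recompression and minor edits; CLIP embeddings go further and match on semantic content.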
Everything in VisOS exists to serve one mission: give CV practitioners complete control over their data and models.
Your images never leave your machine. No uploads, no cloud processing, no third-party servers. Medical, defense, and proprietary datasets stay where they belong.
13+ annotation formats supported. YOLO, COCO, Pascal VOC, TFRecord — auto-detected on load, converted in one click. Never write another conversion script.
No per-image pricing. No seat licenses. No export fees. VisOS is free and open source under AGPL 3.0. Your dataset budget goes to compute, not platforms.
FastAPI backend + Next.js frontend, managed by a single Python process controller. No containers, no environment variables, no configuration.
The UI is served at localhost:3000 — never connect directly to port 8000. For remote GPU servers: `ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server`

Base URL: `http://localhost:8000/api` · Interactive Swagger UI at `/docs`
| Resource | Endpoints |
|---|---|
| Datasets | GET /datasets · POST /load-local · POST /upload · DELETE /{id} |
| Images | GET /images · GET /{image_id} · PUT /annotations |
| Classes | POST /extract-classes · /delete-classes · /merge-classes |
| Conversion | POST /convert · POST /merge · GET /formats |
| Augmentation | POST /augment-enhanced |
| Video | POST /video/extract |
| Duplicates | POST /find-duplicates · /remove-duplicates |
| Models | GET /models · POST /download · POST /import · POST /{id}/load |
| Auto-Annotate | POST /auto-annotate · GET /jobs |
| Training | POST /start · GET /{id}/status · /pause · /resume · /stop |
| System | GET /health |
```python
import requests

response = requests.post(
    "http://localhost:8000/api/training/start",
    json={
        "dataset_id": "my-dataset",
        "architecture": "yolov8n",
        "task": "detect",
        "epochs": 100,
        "batch_size": 16,
        "imgsz": 640,
        "export_format": "onnx"
    }
)
job = response.json()
# Poll: GET /training/{job["id"]}/status
```
```python
import requests

dataset_id = "my-dataset"
requests.post(
    f"http://localhost:8000/api/datasets/{dataset_id}/auto-annotate",
    json={
        "model_id": "groundingdino-base",
        "prompt": "hard hat . safety vest",  # "." separates multiple prompts
        "confidence": 0.35
    }
)
```
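Both training and auto-annotation run as background jobs, so clients poll a status endpoint until the job reaches a terminal state. The sketch below shows the polling pattern with the network call abstracted behind a callable; the `"completed"`/`"failed"` status values are assumptions, not the documented VisOS schema.

```python
import time

def poll(fetch_status, interval=0.0, max_tries=50):
    """Call fetch_status() until it reports a terminal state.

    fetch_status is any zero-argument callable, e.g. a lambda wrapping
    requests.get(...).json()["status"] against a live server.
    """
    for _ in range(max_tries):
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

# Stand-in for a live server: a sequence of status responses.
states = iter(["queued", "running", "running", "completed"])
print(poll(lambda: next(states)))  # completed
```

Injecting the fetcher keeps the retry logic testable without a running backend; in practice you would pass a small wrapper around the `GET .../status` call.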
Prerequisites: Python 3.10+ and Node.js 18+. First run takes 2–5 min while PyTorch downloads (~1.5 GB).
Clone the repository from GitHub. No submodules, no monorepo complexity.
git clone https://github.com/Dan04ggg/VisOS.git && cd VisOS
One command creates the virtualenv, installs all deps, starts both servers, and opens your browser.
python3 run.py restart
Point VisOS at a local folder or ZIP. Format is auto-detected. Start annotating immediately.
→ http://localhost:3000
```shell
python3 run.py start          # Start both servers
python3 run.py stop           # Stop cleanly
python3 run.py restart        # Full restart
python3 run.py restart-back   # Backend only
python3 run.py restart-front  # Frontend only
python3 run.py status         # Show PIDs and ports
python3 run.py logs           # Tail live output
```
Common issues and how to fix them fast.
Python failed to start. Check `.logs/backend.log`. Common causes: Python < 3.10, port 8000 already in use, or a missing OpenCV system dependency.

Normal — PyTorch is large. Check `.logs/backend.log` to watch pip progress. Allow 2–5 minutes on first run.

Auto-detection looks for specific files (`data.yaml`, `instances_train.json`, `Annotations/*.xml`). Match the folder structure exactly.
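Detection of this kind usually amounts to checking for each format's marker file in order. A minimal sketch, based only on the marker files listed above — the real VisOS detector may check more signals:

```python
import glob
import os

def detect_format(root):
    """Guess the annotation format from marker files (illustrative sketch)."""
    if os.path.isfile(os.path.join(root, "data.yaml")):
        return "yolo"           # YOLO: data.yaml at the dataset root
    if os.path.isfile(os.path.join(root, "instances_train.json")):
        return "coco"           # COCO: instances_*.json annotation file
    if glob.glob(os.path.join(root, "Annotations", "*.xml")):
        return "pascal_voc"     # Pascal VOC: Annotations/*.xml
    return None                 # unrecognized — fix the folder structure
```

If your dataset is not recognized, comparing your tree against these marker paths is the fastest diagnosis.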
Lower the batch size (try 4–8), reduce image size to 320, or switch to a smaller architecture like `yolov8n`. Check VRAM with `nvidia-smi`.

Run `python3 run.py stop`. macOS/Linux fallback: `lsof -ti:3000 | xargs kill -9`. Windows: `netstat -ano | findstr :3000`.

Run `npm install` manually in the project root, then `python3 run.py restart-front`.
`ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@server`

No sign-up. No cloud. No credit card. Clone, run, and own your entire computer vision pipeline from day one.