Dataset Versioning & Lineage Tracking
Track dataset changes, manage versions, and maintain complete data lineage for reproducible AI model development and experiment tracking.
Autonomous Driving v... v3.2, Dec 15, 2024 (Latest): 2.8M, 45
Autonomous Driving v... v3.1, Nov 28, 2024: 2.1M, 42
Autonomous Driving v... v3.0, Oct 12, 2024: 1.6M, 38
Autonomous Driving v... v2.8, Sep 5, 2024: 1.2M, 32
Autonomous Driving v... v2.5, Aug 18, 2024: 850K, 28
Autonomous Driving v... v2.0, Jul 30, 2024: 420K, 20
Git-like Branching
Version datasets with visual Git-like branching

Manage dataset evolution with Git-style branching workflows. Create versions for experiments, track changes across annotation versions, and glean label insights and analytics while maintaining a complete version history.
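The Git-style model can be illustrated with a minimal sketch. It assumes a dataset version is an immutable, content-addressed snapshot (frame IDs mapped to labels) with a pointer to its parent, and that a branch is a movable pointer to a snapshot, as in Git. `Snapshot`, `DatasetRepo`, and their methods are hypothetical illustrations, not part of the Ocular SDK.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable dataset version (frozen, so it cannot be edited later)."""
    annotations: tuple  # sorted ((frame_id, (label, ...)), ...) pairs
    parent: Optional[str]  # ID of the previous snapshot on the branch, or None
    message: str

    @property
    def id(self) -> str:
        # Content-addressed ID: same data, parent, and message give the same ID.
        payload = json.dumps([self.annotations, self.parent, self.message])
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


class DatasetRepo:
    """Hypothetical Git-style store: branches are movable pointers to snapshots."""

    def __init__(self):
        self.snapshots: dict = {}
        self.branches: dict = {"main": None}

    def commit(self, branch: str, annotations: dict, message: str) -> str:
        items = tuple(sorted((f, tuple(sorted(ls))) for f, ls in annotations.items()))
        snap = Snapshot(items, self.branches[branch], message)
        self.snapshots[snap.id] = snap
        self.branches[branch] = snap.id
        return snap.id

    def branch(self, name: str, source: str = "main") -> None:
        # A new branch starts at the source branch's current snapshot.
        self.branches[name] = self.branches[source]

    def reset(self, branch: str, snapshot_id: str) -> None:
        # Roll a branch back to an earlier snapshot; no snapshot is deleted.
        self.branches[branch] = snapshot_id

    def checkout(self, branch: str) -> dict:
        head = self.branches[branch]
        if head is None:
            return {}
        return {f: list(ls) for f, ls in self.snapshots[head].annotations}

    def history(self, branch: str) -> list:
        # Walk parent pointers from the branch head back to the first snapshot.
        ids, cur = [], self.branches[branch]
        while cur is not None:
            ids.append(cur)
            cur = self.snapshots[cur].parent
        return ids


repo = DatasetRepo()
v1 = repo.commit("main", {"f1": ["vehicle"], "f2": ["pedestrian"]}, "initial labels")
repo.branch("sign-relabel")
v2 = repo.commit("sign-relabel", {"f1": ["vehicle", "traffic_sign"], "f2": ["pedestrian"]}, "add signs")
experiment_history = repo.history("sign-relabel")  # [v2, v1]
print(repo.checkout("main"))  # main is untouched: {'f1': ['vehicle'], 'f2': ['pedestrian']}
repo.reset("sign-relabel", v1)  # roll the experiment back to the baseline
```

Because snapshots are frozen and never deleted, a rollback only moves the branch pointer. The experimental snapshot stays in the store and can be restored later.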

Visual Dataset Splitting
Partition datasets into training, validation, & test sets
Generate New Version

Generate a version of your dataset for export or model training. This version acts as a snapshot and cannot be edited later.

Source Dataset Details
5M total frames, 50 total labels
Adjust Breakdown (split points at 3.5M and 4.6M)
Training: 3.5M
Validation: 1.1M
Testing: 400K
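The 3.5M / 1.1M / 400K breakdown above corresponds to a 70% / 22% / 8% split of the 5M source frames. Below is a minimal sketch of a reproducible split. It assumes frames are identified by string IDs and uses a seeded shuffle. `split_frames` and its parameters are hypothetical, not Ocular's splitting method.

```python
import random


def split_frames(frame_ids, fractions=(0.7, 0.2, 0.1), seed=42):
    """Deterministically partition frame IDs into training/validation/test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(frame_ids)  # sort first so the caller's ordering never matters
    random.Random(seed).shuffle(ids)  # same seed -> same shuffle -> same split
    n = len(ids)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder absorbs rounding
    }


# Same 70/22/8 proportions as the 5M breakdown, on 1,000 illustrative frame IDs.
frames = [f"frame_{i:05d}" for i in range(1000)]
splits = split_frames(frames, fractions=(0.70, 0.22, 0.08))
print({name: len(ids) for name, ids in splits.items()})  # {'train': 700, 'validation': 220, 'test': 80}
```

A seeded shuffle is stable only for a fixed set of frames. Adding frames in a later version reshuffles every assignment. Hashing each frame ID into a bucket is a common alternative that keeps earlier assignments stable across versions.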
Train Models on Versions
Train models on versions of your dataset & track and analyze performance
Dataset Version 2 (Latest)
v0.0.2
Version created on May 26, 2025. Generated by Michael Moyo.
Model Training History
Model 3: YOLO_11med
Model 2: YOLO_11med
Version Details
9.2K frames, 1.4K labels, 15.2K annotations

Annotation Label Distribution:

Vehicle: 32.0%
Pedestrian: 28.0%
Traffic Sign: 15.0%
Traffic Light: 12.0%
Other: 13.0%

Training: 6.5K frames (70%)
Testing: 1.8K frames (20%)
Validation: 924 frames (10%)
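The Model Training History panel ties each model (Model 2 and Model 3, both YOLO_11med) to Dataset Version 2 (v0.0.2). A minimal sketch of such a lineage record follows. It assumes a version is referenced by its version string. `TrainingRun` and `runs_for_version` are hypothetical, not the Ocular SDK's data model.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical lineage record: which model was trained on which dataset version."""
    model_name: str
    architecture: str
    dataset_version: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def runs_for_version(runs, dataset_version):
    """Training history for one dataset version, in the order the runs were recorded."""
    return [r for r in runs if r.dataset_version == dataset_version]


runs = [
    TrainingRun("Model 2", "YOLO_11med", "v0.0.2"),
    TrainingRun("Model 3", "YOLO_11med", "v0.0.2"),
]
for run in runs_for_version(runs, "v0.0.2"):
    print(json.dumps(asdict(run)))  # one JSON line per run, suitable for an audit log
```

The records are frozen so an audit trail cannot be altered after the fact. JSON lines are one simple way to persist them.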
Dataset Version Analytics
Dataset version analytics and insights

Gain deep insights into your dataset evolution with comprehensive analytics across all versions. Track label distribution changes, annotation quality metrics, and dataset composition over time to understand how your data has evolved and identify areas for improvement or potential bias.

Label Analytics (chart: label split, annotation counts on an axis from 0 to 6.9K for Vehicle, Pedestrian, Traffic Sign, Other, and Traffic Light)
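Tracking label distribution changes across versions amounts to computing each label's share of annotations and then diffing those shares between versions. This minimal sketch reproduces the Vehicle 32.0% / Pedestrian 28.0% / Traffic Sign 15.0% / Traffic Light 12.0% / Other 13.0% split for v0.0.2 and compares it against a hypothetical next version. The function names and the extra annotations are illustrative assumptions.

```python
from collections import Counter


def label_distribution(labels):
    """Share of annotations per label, in percent (one decimal), most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


def distribution_shift(old, new):
    """Percentage-point change per label from one version's distribution to the next."""
    return {
        label: round(new.get(label, 0.0) - old.get(label, 0.0), 1)
        for label in sorted(set(old) | set(new))
    }


# Annotation labels reproducing the 32/28/15/12/13 split shown for v0.0.2.
v2_labels = (["Vehicle"] * 32 + ["Pedestrian"] * 28 + ["Traffic Sign"] * 15
             + ["Traffic Light"] * 12 + ["Other"] * 13)
# Hypothetical next version that adds 20 traffic-light annotations.
v3_labels = v2_labels + ["Traffic Light"] * 20

old = label_distribution(v2_labels)
new = label_distribution(v3_labels)
shift = distribution_shift(old, new)
print(shift)  # positive = label gained share, negative = label lost share
```

A large positive or negative shift for one label is an easy signal to review for sampling bias before training on the new version.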
Dataset Version Export
Export datasets in multiple formats

Export your dataset versions in popular formats like COCO, YOLO, and JSON. Access your data through direct downloads, programmatic API calls, or integrate seamlessly with Jupyter notebooks for immediate analysis and model training.

Formats: COCO, YOLO, YOLO v11, JSON
Access via: Notebook, cURL, URL
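notebook.py
```python
# Run in a Jupyter notebook cell; the leading "!" executes a shell command
!pip install ocular-ai

from ocular import Ocular

# Authenticate with your API key
ocu = Ocular(api_key="YOUR_API_KEY")

# Navigate workspace -> project -> version -> export by ID
project = ocu.workspace("e04b9bf6-ccc3-4490-bf98-52ed10b7bd38").project("4f4c12d9-696d-4fcf-9989-f1797c545d2f")
version = project.version("2d0e3eca-c13c-4dfc-bd99-27b5385516b2")
export = version.export("004b3548-308c-4a08-b75f-b419072e3e3c")

# Download the exported dataset
dataset = export.download()
```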
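COCO and YOLO encode the same bounding box differently. COCO stores `[x_min, y_min, width, height]` in absolute pixels. A YOLO label line stores a zero-based class index followed by the box center and size, each normalized to [0, 1] by the image dimensions. The sketch below shows that conversion. The function names are hypothetical, and whether Ocular's exporter works exactly this way is an assumption.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, width, height] in pixels -> YOLO (x_c, y_c, w, h) in [0, 1]."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,  # box center x, normalized by image width
        (y_min + h / 2) / img_h,  # box center y, normalized by image height
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_index, bbox, img_w, img_h):
    """One line of a YOLO .txt label file: zero-based class index, then the normalized box."""
    xc, yc, w, h = coco_bbox_to_yolo(bbox, img_w, img_h)
    return f"{class_index} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 50x80 px box with its top-left corner at (100, 200) in a 640x480 frame.
print(coco_bbox_to_yolo([100, 200, 50, 80], 640, 480))
print(yolo_label_line(0, [100, 200, 50, 80], 640, 480))
```

COCO `category_id` values are not guaranteed to be contiguous. An exporter therefore has to map them to zero-based YOLO class indices before writing label lines.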
BENEFITS
Dataset versioning built for reliable model training
Reproducible Training Results

Lock in dataset versions so that every experiment runs against exactly the same data, a prerequisite for reproducing model training results.

Experiment with Confidence

Test different dataset versions and track their impact on model performance without losing your baseline datasets.

Rollback When Needed

Quickly revert to previous dataset versions if new annotations or data changes negatively impact your model accuracy.

Team Collaboration

Multiple team members can work on different dataset branches simultaneously without conflicts or data overwrites.

Complete Data Lineage

Track exactly which dataset version was used to train each model for full audit trails and compliance requirements.

Performance Tracking

Monitor how dataset improvements over versions correlate with model performance gains across training runs.

Git-like Branching

Create branches for different dataset experiments while maintaining a clean main dataset version for production training.

Compliance & Audit Ready

Maintain detailed records of dataset changes and model training history for regulatory compliance and quality assurance.

Ocular SDK
Engineered for ambitious AI projects
ocular-sdk.py
```python
from ocular import Ocular

# Initialize the SDK with your API key
ocular = Ocular(api_key="YOUR_API_KEY")

# Access a workspace
workspace = ocular.workspace("workspaceID")

# Get a project from the workspace
project = workspace.project("projectID")

# Get a version from the project
version = project.version("versionID")

# Get an export from the version
export = version.export("exportID")

# Download the exported dataset
dataset_path = export.download()
```
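A version ID records which data you requested. A fingerprint of the downloaded files lets you confirm what you actually trained on. The sketch below hashes every file's relative path and contents in a stable order. It assumes `export.download()` returns a local directory path (the SDK snippet names the result `dataset_path`, but its exact return type is an assumption). `fingerprint_dataset` is a hypothetical helper, not an SDK function.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint_dataset(root):
    """SHA-256 fingerprint of a directory tree: relative paths plus file contents."""
    root = Path(root)
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),  # stable order on every OS
    )
    for path in files:
        file_hash = hashlib.sha256()
        with path.open("rb") as fh:
            # Hash in 1 MiB chunks so large image files never load fully into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                file_hash.update(chunk)
        digest.update(path.relative_to(root).as_posix().encode() + b"\0")
        digest.update(file_hash.digest())
    return digest.hexdigest()


# Usage with the SDK snippet (assumes download() returns a local directory):
#   dataset_path = export.download()
#   print(fingerprint_dataset(dataset_path))

if __name__ == "__main__":
    # Self-check on a throwaway directory: editing one label changes the fingerprint.
    with tempfile.TemporaryDirectory() as tmp:
        label = Path(tmp) / "labels" / "frame_0001.txt"
        label.parent.mkdir()
        label.write_text("0 0.5 0.5 0.2 0.3\n")
        before = fingerprint_dataset(tmp)
        assert before == fingerprint_dataset(tmp)
        label.write_text("1 0.5 0.5 0.2 0.3\n")
        assert before != fingerprint_dataset(tmp)
```

Storing this fingerprint alongside the model run gives the audit trail a check that does not depend on trusting the version label alone.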
Integrations
Integrate with your tech stack

Ocular integrates with your existing tech stack through multiple integrations that slot seamlessly into your workflows and pipelines.

Even when you connect external data sources, all data stays on your existing infrastructure.

AWS
GCP
Azure
Slack
Security
Enterprise-grade security

Ocular uses enterprise-grade, battle-tested security measures and protocols to protect your data.

All our systems are built with security in mind and are constantly monitored and audited.

SOC 2
Ready to transform your unstructured data into AI?