DataDreamer

Quickstart

DataDreamer lets you create annotated datasets from scratch using generative AI and foundational computer vision models. You can then train your own models for edge AI applications, such as object detection, without collecting real-world data. To generate a dataset with custom classes, you only need to run two commands:
Command Line
pip install datadreamer
datadreamer --class_names person moon robot

Overview

DataDreamer is an advanced toolkit engineered to facilitate the development of edge AI models, irrespective of initial data availability. Distinctive features of DataDreamer include:
  • Synthetic Data Generation: Eliminate the dependency on extensive datasets for AI training. DataDreamer empowers users to generate synthetic datasets from the ground up, utilizing advanced AI algorithms capable of producing high-quality, diverse images.
  • Knowledge Extraction from Foundational Models: DataDreamer draws on the knowledge embedded in large, pre-trained foundation models and transfers it to smaller, custom-built models, improving their performance.
  • Efficient and Potent Models: The primary objective of DataDreamer is to enable the creation of compact models that are both size-efficient for integration into any device and robust in performance for specialized tasks.

Features

  • Prompt Generation: Automate the creation of image prompts using powerful language models.
    • Provided class names: ["horse", "robot"]
    • Generated prompt: "A photo of a horse and a robot coexisting peacefully in the midst of a serene pasture."
  • Image Generation: Generate synthetic datasets with state-of-the-art generative models.
  • Dataset Annotation: Leverage foundation models to label datasets automatically.
  • Edge Model Training: Train efficient small-scale neural networks for edge deployment. (not part of this library)

Installation

To install with pip:
Command Line
pip install datadreamer

Available models

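| Model Category    | Model Names              | Description/Notes                      |
|-------------------|--------------------------|----------------------------------------|
| Prompt Generation | Mistral-7B-Instruct-v0.1 | Semantically rich prompts              |
| Prompt Generation | TinyLlama-1.1B-Chat-v1.0 | Tiny LM                                |
| Prompt Generation | Simple random generator  | Joins randomly chosen object names     |
| Image Generation  | SDXL-1.0                 | Slow and accurate (1024x1024 images)   |
| Image Generation  | SDXL-Turbo               | Fast and less accurate (512x512 images)|
| Image Generation  | SDXL-Lightning           | Fast and accurate (1024x1024 images)   |
| Image Annotation  | OWLv2                    | Open-Vocabulary object detector        |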
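
To give a feel for what "joins randomly chosen object names" can mean for the simple random generator, here is a minimal shell sketch. It is an illustration of the idea only, not DataDreamer's implementation: the function name simple_prompt, the SEED variable, and the "A photo of ..." template are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: this is NOT DataDreamer's code.
# Usage: simple_prompt MIN MAX name...
# Set SEED (hypothetical variable) to make the choice reproducible.
simple_prompt() {
  min=$1; max=$2; shift 2
  printf '%s\n' "$@" | awk -v min="$min" -v max="$max" -v seed="${SEED:-}" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    { names[NR] = $0 }
    END {
      # Choose how many objects to mention, between min and max inclusive.
      k = min + int(rand() * (max - min + 1))
      if (k > NR) k = NR
      if (k < 1) k = 1
      # Partial Fisher-Yates shuffle: the first k slots become a random sample.
      for (i = 1; i <= k; i++) {
        j = i + int(rand() * (NR - i + 1))
        t = names[i]; names[i] = names[j]; names[j] = t
      }
      out = names[1]
      for (i = 2; i <= k; i++) out = out ", " names[i]
      print "A photo of " out
    }'
}

# Print one prompt that mentions 1 to 3 of the given class names.
simple_prompt 1 3 person moon robot
```

A generator like this is fast because it needs no language model, but every prompt follows the same template, which is why the language-model generators in the table produce more varied prompts.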

Example

Command Line
datadreamer --save_dir path/to/save_directory --class_names person moon robot --prompts_number 20 --prompt_generator simple --num_objects_range 1 3 --image_generator sdxl-turbo
This command generates images for the specified objects, saving them and their annotations in the given directory. The script allows customization of the generation process through various parameters, adapting to different needs and hardware configurations.
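
If you expect to change these parameters often, it may help to keep them in a small script. The sketch below rebuilds the example command from named variables and prints it instead of running it. The flags and values are the ones from the example above; the variable names and the datadreamer_cmd function are hypothetical helpers introduced for this illustration.

```shell
#!/bin/sh
# Settings from the example above, one per line so they are easy to change.
SAVE_DIR="path/to/save_directory"
CLASS_NAMES="person moon robot"   # space-separated; names must not contain spaces
PROMPTS_NUMBER=20
PROMPT_GENERATOR="simple"
IMAGE_GENERATOR="sdxl-turbo"

# Print the full command instead of running it (a dry run).
datadreamer_cmd() {
  # $CLASS_NAMES is unquoted on purpose so each class becomes its own argument.
  echo datadreamer \
    --save_dir "$SAVE_DIR" \
    --class_names $CLASS_NAMES \
    --prompts_number "$PROMPTS_NUMBER" \
    --prompt_generator "$PROMPT_GENERATOR" \
    --num_objects_range 1 3 \
    --image_generator "$IMAGE_GENERATOR"
}

datadreamer_cmd
```

Once the printed command looks right, removing the leading echo inside datadreamer_cmd makes the function run the generation for real.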

Useful tips

  • Batched generation: To speed up the generation process, consider increasing the batch size with --batch_size_prompt, --batch_size_image and --batch_size_annotation parameters. If you are running out of memory, try reducing the batch size.
  • Better image quality: For better image quality, consider tuning the following parameters:
    • --image_generator: Choose a model with higher image quality. SDXL-Turbo -> SDXL-Lightning -> SDXL (from fastest to slowest, and from lowest to highest quality).
    • --use_image_tester and --image_tester_patience: Enable iterative image generation and use the CLIP model to select the best images. Consider increasing the patience to get better results.
  • Number of objects per image: To generate images with a different number of objects, use the --num_objects_range parameter. For example, --num_objects_range 1 3 generates images with 1, 2, or 3 objects. Values higher than 3 are not recommended due to the limited ability of the current models to generate complex scenes.
  • Prompt generation: To generate more diverse prompts, consider using --prompt_generator tiny, which uses a small language model to generate prompts.
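
The tips above can be combined in a single invocation. The following is a sketch under stated assumptions: the flag names come from this page, but the values (a batch size of 4, a patience of 3, the spelling sdxl-lightning by analogy with sdxl-turbo) and the treatment of --use_image_tester as an on/off switch are assumptions, as are the helper names quality_cmd and halve. The script only prints the command.

```shell
#!/bin/sh
# Flag names are taken from this page; all values are assumed placeholders.
BATCH_SIZE=4   # lower this first if you run out of memory
PATIENCE=3     # higher values mean more attempts per image

# Halve a batch size, never going below 1 (useful after an out-of-memory error).
halve() {
  n=$(( $1 / 2 ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Print a quality-oriented command without running it.
quality_cmd() {
  echo datadreamer \
    --class_names person moon robot \
    --prompt_generator tiny \
    --image_generator sdxl-lightning \
    --num_objects_range 1 3 \
    --batch_size_prompt "$BATCH_SIZE" \
    --batch_size_image "$BATCH_SIZE" \
    --batch_size_annotation "$BATCH_SIZE" \
    --use_image_tester \
    --image_tester_patience "$PATIENCE"
}

quality_cmd
```

If generation fails for lack of memory, setting BATCH_SIZE="$(halve "$BATCH_SIZE")" and printing the command again gives the next configuration to try.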