Batch sizing, assistant analysis, and performance requirements

Overview/Purpose

This is a sub-guide to the Batch Workflow guide. It focuses on points the main guide leaves uncovered but that are crucial for understanding the Batch Workflow, providing concrete numbers, details, and guidelines on the following topics:

  • How to determine the batch size for your batch?

  • How to analyze an AI assistant’s performance?

  • How to make sure your AI assistants will perform?

Audience

All solutions roles should be familiar with the contents of this guide.

Batch sizing

What is a batch?

A batch is a collection of images you plan to use in a specific workflow step. For example, your batches might differ slightly for:

  • Manual annotation;

  • AI-assisted annotation;

  • Manual or AI-powered Quality Control;

  • Model building purposes.

In Hasty, the Batch Workflow philosophy is the core pillar that helps us achieve fantastic automation results. The idea is to annotate and QA batches of ever-increasing size as the models’ confidence grows and automation capabilities progress from Level 1 to Level 3.

Batches are also the mechanism by which we deliver work to a client: we communicate project progress in relation to batches, share when batches are complete and ready to use, and count annotations in completed batches for billing purposes.

The naming convention for the data

Hasty has three automation Levels, so to make inter-team communication smoother, we suggest using the following naming convention for the data you use throughout the project.

Level X Batch Y

where X is the automation Level number (from 1 to 3), and Y is the number of a specific batch (starting from 1, with no upper limit).
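
If you script the creation or renaming of datasets, a tiny helper can keep the names consistent. The sketch below is illustrative only (it is not part of Hasty), and the function name is made up for this example.

```python
# Illustrative helper (not part of Hasty) for generating dataset names
# that follow the "Level X Batch Y" convention.

def batch_name(level: int, batch: int) -> str:
    """Return a dataset name like 'Level 2 Batch 14'."""
    if not 1 <= level <= 3:
        raise ValueError("Hasty has three automation Levels: 1, 2, or 3")
    if batch < 1:
        raise ValueError("Batch numbering starts at 1")
    return f"Level {level} Batch {batch}"

print(batch_name(1, 100))  # Level 1 Batch 100
print(batch_name(2, 1))    # Level 2 Batch 1 (Y resets when the Level changes)
```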

Some other tips that might help you keep track of your data:

  • A batch is a dataset in Hasty;

  • You can rename any dataset at any time;

  • Reset Y when switching automation Levels. For example, the last batch of Level 1 is Level 1 Batch 100, but the first batch of Level 2 is Level 2 Batch 1.

Level 1 batches (annotate an object in a matter of seconds)

Each project starts with raw batches that should be manually processed to activate more advanced automation features. So, at this stage, there are no AI assistants, no auto-labeling, and no AI Consensus Scoring.

The main tools for annotating the Level 1 batches are:

Unfortunately, it might be challenging to determine the approximate number of images needed to switch from Level 1 to Level 2 automation as it depends on the use case, the number of classes, the exact vision AI task, and the images themselves.

In general, Level 1 batches are relatively small. A common rule of thumb is to include fewer than 1000 images while ensuring the batch is balanced class-wise. That is a reasonable starting point, but the following guidelines should help you find a better approximation:

  • Identify the vision AI task for your use case:

    • Classification task;

    • Object Detection;

    • Segmentation task (Instance or Semantic);

    • Attribute Prediction task.

  • Identify the number of classes you will have in your project:

    • 2 classes;

    • 2 - 10 classes;

    • More than 10 classes.

  • Assess the batch images by the following parameters:

    • Resolution;

    • Size of the objects;

    • Whether objects overlap or not.

The ideal scenario is as follows:

  • Classification (any number of classes, any image parameters) - at least 100 images per class;

  • Attribute Prediction (any number of classes/attributes, any image parameters) - at least 100 images per attribute;

  • Object Detection (any number of classes, any image parameters) - at least 1000 annotations per class;

  • Instance/Semantic Segmentation (any number of classes, any image parameters) - at least 1000 annotations per class.

However, you can build a more complex logic based on your findings. For example:

  • I’m solving an Object Detection task with 3 classes: Person, Chair, and Sofa;

  • The Resolution of my images is high, and the objects are clearly visible in the images without having to zoom in;

  • Still, there is a lot of overlapping between Person & Chair and Person & Sofa classes;

  • In a perfect world, I’d have 1000 annotations per class, but that might not be enough in my case because of the overlap;

  • So, I’ll add extra annotations per class that cover the overlapping cases (for example, 100 more annotations per class) to preemptively address the edge case.

Please keep in mind that these are guidelines only. Your use case might be unique, so do not rush the choice of a Level 1 batch size; carefully assess your task before making a decision. As a rule of thumb, keep the Level 1 batch size between 500 and 2000 images. A rough calculator encoding the numbers above is sketched below.
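
To make the guidelines above easier to apply, here is a rough, illustrative calculator that encodes the “ideal scenario” numbers and the overlap adjustment from the example. The function and dictionary names are made up for this sketch, and the thresholds are guideline values rather than hard rules.

```python
# Rough back-of-the-envelope calculator for a Level 1 batch target,
# encoding the "ideal scenario" numbers from this guide. These are
# guideline values, not hard rules.

GUIDELINE_TARGETS = {
    "classification": ("images per class", 100),
    "attribute_prediction": ("images per attribute", 100),
    "object_detection": ("annotations per class", 1000),
    "segmentation": ("annotations per class", 1000),  # instance or semantic
}

def level1_target(task: str, n_classes: int, heavy_overlap: bool = False) -> str:
    unit, per_class = GUIDELINE_TARGETS[task]
    if heavy_overlap and "annotations" in unit:
        per_class += 100  # extra annotations covering the overlapping edge case
    total = per_class * n_classes
    return f"~{total} {unit.split(' per ')[0]} total ({per_class} {unit})"

# Object Detection with Person, Chair, and Sofa, and lots of overlap:
print(level1_target("object_detection", n_classes=3, heavy_overlap=True))
# ~3300 annotations total (1100 annotations per class)
```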

Level 2 batches (annotate a full image in a matter of seconds)

With Level 2 batches, you will already have Level 2 automation features to annotate all objects in an image in one action. The time savings compared to Level 1 are drastic as human input decreases radically. However, this automation requires a higher level of model confidence than Level 1, which is why a project starts with Level 1 tools and only deploys Level 2 tools once they are ready.

In Hasty, Level 2 automation is achieved with the use of AI assistants. These assistants learn in the background while you annotate. When they have reached a certain confidence score, you as a user can start to use them and get suggestions not only for individual objects but for a complete image. The assistants retrain and improve as more images are completed.

In general, there are no strict limits for Level 2 batches, but we’d suggest a maximum batch size of 10 000 images.

With Level 2 batches, it’s important to keep track of the models’ performance. If the models did well on prior batches, that might be the signal to test the Auto Labeling feature and move to Level 3 batches after sorting the labeling queue with Active Learning.

Level 3 batches (annotate a full image batch/project in a matter of seconds)

With Level 3 batches, you will start using the Level 3 automation feature Auto Labeling to annotate the full image batch/project in a matter of seconds. Level 3 batches are comparable to the whole project and might consist of tens of thousands of images (10 000+).

Troubleshooting

Your Level 3 batches might be gigantic, which makes them challenging to process from a technical point of view. From our experience, it is better to split Level 3 batches into sub-batches of at most 20 000 images. They still count as one batch in the workflow, but each sub-batch gets its own feature run (for example, an Auto Labeling or an AI Consensus Scoring run). This approach minimizes the impact of technical problems: if you schedule a run on the whole batch and it fails at 99%, you lose all of the progress and need to restart, whereas for a small sub-batch the cost of such a failure is much lower. Additionally, smaller batches mean shorter feature runs, as the models have fewer images to process.
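
A minimal, Hasty-independent sketch of the splitting idea: chunk the Level 3 batch into sub-batches of at most 20 000 images and schedule one feature run per chunk. The image IDs and the helper name are hypothetical.

```python
# Generic sketch (independent of the Hasty API) for splitting a huge
# Level 3 batch into sub-batches of at most 20 000 images, so that each
# feature run (e.g. Auto Labeling, AI CS) covers one sub-batch only.

from typing import Iterable, List

def split_into_sub_batches(image_ids: List[str], max_size: int = 20_000) -> Iterable[List[str]]:
    for start in range(0, len(image_ids), max_size):
        yield image_ids[start:start + max_size]

image_ids = [f"img_{i:06d}" for i in range(55_000)]  # a hypothetical Level 3 batch
for i, sub_batch in enumerate(split_into_sub_batches(image_ids), start=1):
    # Schedule one feature run per sub-batch here; a failed run then only
    # costs you that sub-batch, not the whole Level 3 batch.
    print(f"Sub-batch {i}: {len(sub_batch)} images")
# Sub-batch 1: 20000 images
# Sub-batch 2: 20000 images
# Sub-batch 3: 15000 images
```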

| Automation level | Simple project (e.g., Binary Classification - any image parameters) | Medium project (e.g., Object Detection, <10 classes, few overlapping objects) | Complex project (e.g., Instance Segmentation, 10-30 classes, overlapping objects) |
| --- | --- | --- | --- |
| 1 | 100 images per class (batch size: 200 images) | 500 - 1000 annotations per class | 1000 annotations per class + an additional 100 annotations per class with overlapping objects |
| 2 | Any number of images (the batch size can be small - 250 images per class) | 5 000 images while maintaining the class balance | 5 000 - 10 000 images while maintaining the class balance and making the data as diverse as possible (Active Learning) |
| 3 | 5 000 - 10 000 images | 10 000 - 20 000 images | 10 000 - 20 000 images |

Table 1: Cheat sheet to guide batch sizing

Assistant analysis

The main tools for processing Level 2 batches are AI assistants. Hasty offers a variety of assistants, each corresponding to a specific vision AI task:

  • Image Classification Assistant

  • Label Attribute Assistant

  • Class Predictor Assistant

  • Object Detection Assistant

  • Instance Segmentation Assistant

  • Semantic Segmentation Assistant

How to analyze an AI assistant’s performance?

Looking at the data

In line with the principle of “making the complex simple”, the best way to assess assistants is to look at the data. If CloudWorkers are using the assistants, the assistants are working well. It is that simple.

Open the project report (see the related guide) and check which tools are being used to create annotations. Ask these questions:

  • Are the assistants being used to create annotations?

  • Is the use of assistants, or lack thereof, consistent across all CloudWorkers?

  • Is the use of assistants increasing with the progression of batches?

  • What are the CloudWorkers saying about the assistants?

Answering the questions above will help you see whether the issue lies with the annotation guidelines or the model, with specific CloudWorkers, or whether it is simply an early stage of the project and the assistants are still improving, as is the norm.
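
If you prefer numbers over eyeballing the report, the questions above can be quantified from an annotation export. The sketch below assumes a hypothetical CSV with batch, worker, and tool columns; the file name, column names, and tool identifiers are illustrative and do not reflect an actual Hasty export format.

```python
# Sketch of how the questions above could be quantified, assuming a
# hypothetical CSV export of annotations with columns: batch, worker, tool.
# The file name, column names, and tool identifiers are illustrative.

import pandas as pd

ASSISTANT_TOOLS = {"object_detection_assistant", "instance_segmentation_assistant"}

df = pd.read_csv("annotations_export.csv")
df["assisted"] = df["tool"].isin(ASSISTANT_TOOLS)

# Share of annotations created with assistants, per batch (should grow over time)
print(df.groupby("batch")["assisted"].mean().round(2))

# Share of assisted annotations per CloudWorker (should be roughly consistent)
print(df.groupby("worker")["assisted"].mean().round(2))
```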

NB: yes, don’t be afraid to ask!

Anecdotal evidence about what is not working is valuable information for ML engineers, and our clients often find these insights and feedback just as valuable. This is where the whole concept of our adaptive workflow comes to life.

Looking at the model metrics

Fortunately, all of these assistants can be analyzed similarly. Each Hasty project has an AI Assistants status page that can be used for a quick yet effective overview of each AI assistant.

On this page, you can see how the models improve over time and get an idea of how you are progressing toward annotation automation. Every potential model available in Hasty is displayed here. It is important to note that these plots give a relative indication of model performance: we can see whether models are improving over time, but it is tricky to make absolute claims from these plots, such as “This is a good model.”

Even trickier is interpreting the results when several variables change at once. If performance is getting better, does it simply mean that our sample of data is not diverse enough? If it is getting worse, is it because annotation quality is decreasing, because the newly labeled data is simply more diverse, or because the examples are trickier? Looking at these plots is interesting and informative, but exercise caution when drawing conclusions from the plots alone; usually, you need to take the full context of the project into account, and that requires experience.

If a model has been successfully trained, you will see a similar output:

There are four pieces of important information:

  • The status of the model - is it activated, training (activated but with a new model in the works), failed (you shouldn't see this - if you do, contact us), or not activated (not enough data yet);

  • Next, you see the “Trains automatically next” section. This tells you how many new annotations or images you need to annotate to train a new model (differs depending on the assistant and stage);

  • Then, we have the Labels required section. This shows you how close you are to triggering the training of a new model;

  • Finally, we have the graph that shows you how your model improves over time (it displays Loss over images, with the only exception being the Label class prediction assistant, which displays Accuracy over labels). Please note that the points on these graphs are not validated on the same validation set - we keep growing both the validation and train sets as the project grows, so these numbers are not absolute truths (for now).

Analyzing an assistant is easy:

  • You can take a look at how it performs on your data;

    • Whether the predicted classes are correct;

    • Whether the predicted shapes are correct;

    • Whether the bounding boxes are accurate;

    • etc.

Such a visual approach might be helpful for a basic overview or for identifying flaws in your annotation strategy. To get a more general picture, though:

  • You can take a look at the output graphs on the AI Assistants status page.

Here's an example from an internal project we prepared for a customer demo. The training loss starts very low, while the validation loss is much higher. As we continued to annotate, the two metrics converged, indicating a better-performing model.

The ideal case looks like this: a decreasing trend for both the train and validation curves as the number of annotations grows.

What's important to know here is that you might not see gradual improvement from the start. Machine learning models are fickle beasts and often take significant time to become accurate.
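
If you track the loss values yourself (for example, by noting them down after each training run), a quick plot makes convergence easy to eyeball. The sketch below assumes a hypothetical CSV with annotations, train_loss, and val_loss columns; it is not a Hasty export.

```python
# Quick sketch for eyeballing convergence, assuming you have recorded the
# train/validation loss per training run yourself (e.g., in a CSV with the
# columns annotations, train_loss, val_loss). Not a Hasty export format.

import pandas as pd
import matplotlib.pyplot as plt

history = pd.read_csv("assistant_loss_history.csv")

plt.plot(history["annotations"], history["train_loss"], label="train loss")
plt.plot(history["annotations"], history["val_loss"], label="validation loss")
plt.xlabel("Number of annotations")
plt.ylabel("Loss")
plt.title("Ideal case: both curves decrease and converge")
plt.legend()
plt.show()
```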

How to make sure your AI assistants will perform?

An ML model is only as good as the data it was trained on. Hasty automation features are AI-powered, meaning they use ML models to streamline your workflow. For the Level 2 and 3 automation features (AI assistants, AI CS, Auto Labeling) to deliver the best possible performance, you should carefully consider the following:

  • Make sure your batch is not too small. Check out the Level 1 Batch sizing section to learn more;

  • Make sure your batch is balanced class-wise (see the balance-check sketch after this list);

  • Do not underestimate the Active Learning labeling queue - it’ll help you achieve better model performance faster;

  • Make sure your annotation strategy makes sense and does not produce noisy data;

  • Check the performance of the models every 200 images or so (when working with Level 1 batches):

    • It will be good for a quick evaluation of the current models’ state;

    • Also, it’ll play a pivotal role in ensuring that your annotation strategy produces high-quality data, as you’ll see how a machine sees it.

  • Switching from Level 1 to Level 2 and from Level 2 to Level 3 is your decision, but you should ensure that the models are ready for such a switch. Otherwise, it might be costly and painful to fix.
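
As referenced in the list above, here is a minimal, illustrative class-balance check. The class counts are toy data and the 30% threshold is an arbitrary choice; in practice, compute the counts from your batch’s annotations and pick a threshold that fits your use case.

```python
# Illustrative class-balance check (referenced in the list above). The class
# counts are toy data; in practice you would compute them from your batch's
# annotations. The 30% threshold is an arbitrary example value.

from collections import Counter

annotations_per_class = Counter({"person": 1200, "chair": 1100, "sofa": 180})

largest = max(annotations_per_class.values())
for cls, count in annotations_per_class.items():
    if count < 0.3 * largest:  # flag classes well behind the largest class
        print(f"'{cls}' looks under-represented: {count} vs {largest} annotations")
# 'sofa' looks under-represented: 180 vs 1200 annotations
```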
