imagededup is a Python package for finding exact and near-duplicate images in a collection of images. It is useful for finding and removing duplicate images from a dataset before training a model.
imagededup provides several algorithms for finding duplicates. This tutorial shows how to use a convolutional neural network (CNN) to find duplicate images in a directory.
Using the pip package manager, install imagededup from the command line. pip also installs TensorFlow 2 because imagededup requires it.
pip install imagededup
We will use 9 images stored in the images directory and try to find duplicates of image01.jpg.
 
We create a convolutional neural network using the CNN class. To find duplicate images, we call the find_duplicates method. The image_dir parameter specifies the path to the directory that contains the images. If the scores parameter is True, similarity scores are returned together with the duplicates. The min_similarity_threshold parameter sets the minimum similarity score: an image is reported as a duplicate if its similarity score is greater than this value.
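The thresholding rule can be illustrated without imagededup. The following is a minimal sketch, assuming that similarity is measured as cosine similarity between per-image feature vectors. The helper names (cosine_similarity, similar_above_threshold), the file names and the 3-element vectors are hypothetical and made up for illustration; they are not real CNN encodings or part of the imagededup API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_above_threshold(query, candidates, min_similarity_threshold=0.9):
    """Return (name, score) pairs scoring above the threshold, best first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    matches = [(name, score) for name, score in scored
               if score > min_similarity_threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors, invented for this example only.
query_vec = [1.0, 0.0, 0.0]
toy_vectors = {
    'near_copy.jpg': [0.9, 0.1, 0.0],   # points almost the same way as the query
    'different.jpg': [0.0, 1.0, 0.0],   # orthogonal to the query
}

toy_matches = similar_above_threshold(query_vec, toy_vectors)
print(toy_matches)
```

With these toy vectors only near_copy.jpg passes the 0.9 threshold. Raising the threshold makes the check stricter, and lowering it admits looser matches.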
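from imagededup.methods import CNN
from imagededup.utils import plot_duplicates

# Directory with the images and the image whose duplicates we want to plot
imgDir = 'images'
img = 'image01.jpg'

# Create the CNN-based duplicate finder
cnn = CNN()
# Map each file name to a list of (duplicate file name, similarity score) tuples
duplicates = cnn.find_duplicates(image_dir=imgDir, scores=True,
                                 min_similarity_threshold=0.9)
# Display the chosen image together with the duplicates found for it
plot_duplicates(image_dir=imgDir, duplicate_map=duplicates, filename=img)
The find_duplicates method returns a dictionary of the following form: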
{
    'image01.jpg': [
        ('image05.jpg', 0.9601821),
        ('image07.jpg', 0.95339376),
        ('image09.jpg', 0.9276193)
    ],
    'image02.jpg': [
        ('image06.jpg', 0.93348324)
    ],
    'image03.jpg': [
        ('image04.jpg', 0.92860264),
        ('image08.jpg', 0.91540354)
    ],
    'image04.jpg': [
        ('image03.jpg', 0.92860264),
    ...
}
Finally, we use the plot_duplicates function to display the duplicates found for image01.jpg.
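Since the goal is often to remove duplicates from a dataset, the returned dictionary can also be processed in plain Python. The following is a minimal sketch, assuming the dictionary has the form produced with scores=True (file name mapped to a list of (file name, score) tuples). The files_to_remove helper and the sample file names and scores are hypothetical and are not part of imagededup.

```python
def files_to_remove(duplicate_map):
    """Keep the first image of each duplicate group and list the others."""
    to_remove = set()
    for filename in sorted(duplicate_map):
        if filename in to_remove:
            # Already marked as a duplicate of an image we decided to keep.
            continue
        for duplicate_name, _score in duplicate_map[filename]:
            if duplicate_name != filename:
                to_remove.add(duplicate_name)
    return sorted(to_remove)


# Hypothetical result map, invented for this example only.
sample_map = {
    'a.jpg': [('b.jpg', 0.95)],
    'b.jpg': [('a.jpg', 0.95)],
    'c.jpg': [],
}

removable = files_to_remove(sample_map)
print(removable)  # ['b.jpg']
```

The sketch keeps the alphabetically first file of each group, which is an arbitrary choice. Review the list before deleting anything, because a high similarity score does not guarantee that two images are true duplicates.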

 
             
                         
                         
                        