Once upon a time I was sitting in on an online adoption event hosted by a rescue organization that saves dogs from the dog meat trade in Korea. Whenever a dog was shown on camera, someone would ask, “What breed is that dog?” and the usual response was “We don’t know exactly.” The key word is “exactly,” and the reason (as the hosts explained many times) is that the dogs are rescued from multiple farms all over Korea.
This inspired me to try to build a way to identify a dog’s breed from a single image. After doing some research, it turns out people have already created breed classifiers, but of course they don’t cover every breed. I therefore set out to identify two breeds, Shiba Inu and Jindo, both of which I saw often at the adoption event but which weren’t in any existing, available models.
Since the goal was to classify a dog from an image, I knew I would be working with images as my data points and eventually building a convolutional neural network (CNN).
The first step in this process was collecting data, which in this case meant images. I searched Google Images with the queries “Jindo dog” and “Shiba dog,” then wrote an image scraper that used Selenium to scroll through the entire results page and collect the links to all the images. Each image had a distinct class tag, so I was able to build a big list of image links, which I then downloaded using the urllib3 library. After looking through the images to make sure they were pictures of actual dogs, I ended up with about 300 images for each breed.
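As a rough sketch of the download step (the filename helper and directory layout here are my own assumptions for illustration, not the exact scraper code):

```python
import os
import urllib3

def filename_for(url, index, breed):
    # hypothetical helper: derive a local filename like "jindo_3.png"
    last = url.rsplit('/', 1)[-1]
    ext = last.rsplit('.', 1)[-1].split('?')[0] if '.' in last else 'jpg'
    return f"{breed}_{index}.{ext}"

def download_images(urls, breed, out_dir='images'):
    # fetch each scraped link with urllib3 and save it to disk,
    # one subfolder per breed
    os.makedirs(os.path.join(out_dir, breed), exist_ok=True)
    http = urllib3.PoolManager()
    for i, url in enumerate(urls):
        resp = http.request('GET', url)
        path = os.path.join(out_dir, breed, filename_for(url, i, breed))
        with open(path, 'wb') as f:
            f.write(resp.data)
```

The breed-per-subfolder layout pays off later, since Keras can infer class labels from directory names.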
Since training a CNN from scratch would require at least 900 images per class, I had to conduct offline augmentation: taking the original images and randomly applying a combination of rotation, cropping, zooming, and horizontal flipping to create new images. To perform the augmentation, I used the ImageDataGenerator class from the Keras library three times on a selection of images from the directory holding them all. Below is an example of the code I used.
from keras.preprocessing.image import ImageDataGenerator

random_seed = 123
img_gen1 = ImageDataGenerator(rescale=1./255,
                              width_shift_range=0.2,
                              fill_mode='reflect')
data_generation_1 = img_gen1.flow_from_directory(
    'images/',  # directory with the jindo/ and shiba/ subfolders
    target_size=(224, 224),
    batch_size=604,
    classes=['jindo', 'shiba'],
    class_mode='binary',
    seed=random_seed)
After doing this and saving everything, I ended up with a balanced data set of about 1,800 images in total. Even though an augmented image shows the same dog, to the CNN it might as well be a picture of a different dog; the end result is simply more examples of a dog paired with the proper classification. To give a sense of the data, some augmented images are shown below.
After gathering enough images, I was finally able to use Keras to build some CNN models. I started with only four layers and got badly overfit results, or one-class guesses, when training for a few epochs. This made me realize that identifying dog breeds is much more complex than I thought and that I would need more layers.
I knew that, time-wise, the best eventual approach was transfer learning, but before that I built a model following the AlexNet structure (a famous CNN design with five convolutional layers and three fully connected layers) to get a taste of building a deeper network. I had some help by following the code here, but I adjusted the dropout and learning rates to fight the overfitting. Despite this, the model was still overfit, so it was time to apply transfer learning.
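For reference, an AlexNet-style Keras model looks roughly like this. The filter counts follow the original AlexNet design; the 224×224 input shape and sigmoid output are my own choices for this binary task, not necessarily what I ran:

```python
from tensorflow.keras import layers, models

# AlexNet-style sketch: 5 convolutional layers + 3 fully connected layers
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(96, 11, strides=4, activation='relu'),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding='same', activation='relu'),
    layers.Conv2D(384, 3, padding='same', activation='relu'),
    layers.Conv2D(256, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),  # dropout to fight the overfitting mentioned above
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),  # binary: jindo vs. shiba
])
```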
First off, the benefit of transfer learning is not having to train your model from scratch, which would take a long time if you wanted a very accurate model. Transfer learning means taking a model that has already been trained for one purpose, like classifying hot dogs, and reusing it for a different purpose, like classifying dog breeds. There are two main methods: feature extraction, which keeps the pretrained layers frozen and trains only a new output layer on top of the features they already learned; and fine-tuning, which additionally unfreezes some of the later layers and retrains them on the new data. For this project, I did feature extraction by loading in the VGG16 model from Keras and freezing each layer with:
from keras.applications.vgg16 import VGG16

model = VGG16(include_top=False, input_shape=(224, 224, 3))
# mark loaded layers as not trainable
for layer in model.layers:
    layer.trainable = False
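To complete the feature-extraction setup, a new classifier head then has to be trained on top of the frozen base. A sketch of that step (the 128-unit head size is my own assumption, not necessarily what the final model used):

```python
from tensorflow.keras import layers, models

def add_binary_head(base):
    # freeze the pretrained base and attach a small trainable classifier
    for layer in base.layers:
        layer.trainable = False
    x = layers.Flatten()(base.output)
    x = layers.Dense(128, activation='relu')(x)     # head size is an assumption
    out = layers.Dense(1, activation='sigmoid')(x)  # binary: jindo vs. shiba
    full = models.Model(inputs=base.input, outputs=out)
    full.compile(optimizer='adam', loss='binary_crossentropy',
                 metrics=['accuracy'])
    return full

# usage (downloads ImageNet weights on first run):
# from tensorflow.keras.applications import VGG16
# model = add_binary_head(VGG16(include_top=False, input_shape=(224, 224, 3)))
```

Only the two Dense layers get updated during training, which is what makes feature extraction so much faster than training VGG16 from scratch.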
I ended up with a final model with an accuracy of 0.8333 and an F1 score of 0.8148.
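For context on what those two numbers mean together: accuracy is (TP + TN) / total, while F1 is the harmonic mean of precision and recall. Here is one hypothetical confusion-matrix split consistent with both reported scores (the counts are illustrative only, not the actual test results):

```python
# hypothetical counts consistent with accuracy ≈ 0.8333 and F1 ≈ 0.8148
tp, tn, fp, fn = 11, 14, 2, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(f1, 4))  # 0.8333 0.8148
```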
Some of the properly classified images are below, and the model doesn’t seem to be doing too bad a job. They do show, however, that my data images aren’t exactly clean, which makes sense since I only did a quick pass to make sure each image contained a single dog.
Some of the misclassified images are here:
From the misclassified images and the confusion matrix, it seems the model has trouble identifying Shibas. One potential cause could be that the features the CNN activates on aren’t distinctive enough to separate a Shiba from a Jindo.
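One way to quantify that kind of imbalance from a confusion matrix is per-class recall. With hypothetical counts matching the pattern I saw (more Shibas missed than Jindos), the computation looks like:

```python
import numpy as np

# hypothetical confusion matrix, rows = true class, cols = predicted class,
# in the order [jindo, shiba]
cm = np.array([[14, 2],    # 14 jindos correct, 2 called shiba
               [3, 11]])   # 3 shibas called jindo, 11 correct

per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)  # jindo recall comes out higher than shiba recall
```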
Unfortunately, since a CNN isn’t very transparent, one can’t directly see what the model is looking at when it classifies. As a next step, Grad-CAM can overlay a heat map showing which parts or shapes of an image are being activated. After trying for a while, I figured this would mean adding a visualization step somewhere before the output layer, but I haven’t successfully implemented it yet. Another next step would be to not limit this model to only Shiba and Jindo; since there are already plenty of breed-classification models out there, a cool follow-up would be to combine this one with them to make the coverage more thorough.
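The core Grad-CAM computation itself is small: average the gradients of the class score with respect to each feature map to get one weight per channel, then take a ReLU of the weighted sum of the maps. A numpy sketch with toy tensors (in practice the feature maps and gradients come from the trained model, e.g. via tf.GradientTape):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: arrays of shape (H, W, C).
    Returns an (H, W) heat map normalized to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))                       # one weight per channel
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0)  # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# toy example: 4x4 feature maps with 2 channels
rng = np.random.default_rng(0)
maps = rng.random((4, 4, 2))
grads = rng.random((4, 4, 2))
heat = grad_cam(maps, grads)
```

Resized up to the input resolution and overlaid on the original photo, this heat map would show which regions pushed the model toward “Shiba” or “Jindo.”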