CS539: Skin Cancer

Background and Motivation

Skin cancer is one of the most common malignancies worldwide. According to the American Cancer Society's Facts and Figures, more people are diagnosed with skin cancer each year in the U.S. than with all other cancer types combined. Early diagnosis is essential to keep the death rate in check, and that is where machine learning and deep learning techniques such as convolutional neural networks (CNNs) come into play. Research has shown that CNNs can detect more melanomas than experienced dermatologists. This motivated us to work on this project.

Data Sources

Skin Cancer: Malignant vs. Benign

https://www.kaggle.com/fanconic/skin-cancer-malignant-vs-benign

The dataset consists of 2,637 training images (1,440 benign and 1,197 malignant) and 660 testing images (360 benign and 300 malignant), each with a resolution of 224 x 224 pixels.

Benign

Malignant

We hold out 15% of the benign images and 15% of the malignant images as a validation set and keep the other 85% as the training set. The split must be fixed so that the validation set is identical for every model we train. Otherwise, an image held out for one model could be a training image for another, and this data leakage leads to very high accuracy on the validation images but relatively low accuracy on the test images.
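A fixed 85/15 split of this kind can be sketched as follows. This is a minimal illustration and not necessarily how the project implemented it: the function name `stratified_split`, the seed value, and the placeholder file names are all assumptions. It splits each class separately and uses a seeded random generator so that every run produces the same validation set.

```python
import random

def stratified_split(paths_by_class, valid_frac=0.15, seed=42):
    """Split each class separately so both sets keep the class balance."""
    rng = random.Random(seed)  # local RNG: the same seed gives the same split on every run
    train, valid = [], []
    for label in sorted(paths_by_class):
        # Sort first so the order of the input list cannot change the split.
        paths = sorted(paths_by_class[label])
        rng.shuffle(paths)
        n_valid = round(len(paths) * valid_frac)
        valid += [(p, label) for p in paths[:n_valid]]
        train += [(p, label) for p in paths[n_valid:]]
    return train, valid

# Hypothetical file names standing in for the 1,440 benign and
# 1,197 malignant training images.
images = {
    "benign": [f"benign/{i}.jpg" for i in range(1440)],
    "malignant": [f"malignant/{i}.jpg" for i in range(1197)],
}
train_set, valid_set = stratified_split(images)
print(len(train_set), len(valid_set))
```

Because the seed is fixed, calling `stratified_split(images)` again before training each model returns exactly the same validation images.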

Objectives

We use data augmentation to increase the variety of the training images through selected geometric transformations. We then take pre-trained models such as ResNet, VGG, and EfficientNet from the FastAI Deep Learning Library, apply transfer learning to them to classify the skin cancer images, and improve the accuracy further with Ensemble Learning Techniques (ELT). The best model from this process predicts whether a lesion is malignant or benign.
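One common ensemble technique is soft voting, where the predicted probabilities of several models are averaged before a label is chosen. The sketch below is only an illustration of that idea, not the project's actual ensemble: the function names, the 0.5 threshold, and the probability values are made up for the example, and the real inputs would be the outputs of the fine-tuned models.

```python
def soft_vote(prob_lists, weights=None):
    """Return the weighted average of per-model malignant probabilities per image."""
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weight for every model
    total = sum(weights)
    n_images = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_images)
    ]

def to_labels(probs, threshold=0.5):
    """Map averaged probabilities to class names (threshold is an assumption)."""
    return ["malignant" if p >= threshold else "benign" for p in probs]

# Made-up malignant probabilities from three hypothetical models for three images.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.8, 0.6, 0.1],
    [0.7, 0.7, 0.3],
]
ensemble = soft_vote(model_probs)
labels = to_labels(ensemble)
print(labels)
```

For the second image, one model alone would predict benign (0.4), but the averaged probability is above the threshold, so the ensemble predicts malignant. Passing `weights` lets a stronger model count for more in the average.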