Explicit Content Detection Model

The client is a major Android app developer specializing in utility and file recovery tools. They were building an app that helps users filter and hide explicit content on their devices in real time, across a wide variety of Android smartphones, and they reached out to BroutonLab to develop a custom model. Fast inference was a key requirement, because users own Android phones with vastly different hardware specifications. High accuracy was equally important, as was letting users adjust the settings that determine what content is considered NSFW.
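One way to make "what counts as NSFW" user-adjustable is to keep the model's output as per-category scores and apply user-controlled thresholds on top. The sketch below is only an illustration of that idea: the category names, default values, and the `FilterSettings` and `should_hide` helpers are hypothetical and are not taken from the client's app.

```python
from dataclasses import dataclass, field

# Hypothetical categories and defaults; the label set agreed with the
# client is not described in this case study.
DEFAULT_THRESHOLDS = {"nudity": 0.5, "suggestive": 0.8}


@dataclass
class FilterSettings:
    """User-adjustable sensitivity: a lower threshold hides more content."""
    thresholds: dict = field(default_factory=lambda: dict(DEFAULT_THRESHOLDS))


def should_hide(scores, settings):
    """Return True if any category score reaches its user-set threshold."""
    return any(
        scores.get(category, 0.0) >= threshold
        for category, threshold in settings.thresholds.items()
    )


# Assumed model output for one image: a score in [0, 1] per category.
scores = {"nudity": 0.10, "suggestive": 0.65}

default = FilterSettings()
strict = FilterSettings(thresholds={"nudity": 0.3, "suggestive": 0.6})

print(should_hide(scores, default))  # False
print(should_hide(scores, strict))   # True
```

Keeping the thresholds outside the model means a user can change sensitivity without any retraining or a new model download.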

The biggest challenges in explicit content detection were:
- The subjective nature of the task: what is considered explicit?
- The lack of high-quality datasets
- The inability to use large, powerful models due to the hardware restrictions of Android smartphones

Two open-source datasets were available, but both contained a lot of mislabeled data. After agreeing with the client on what should be considered "explicit", a custom dataset was collected and annotated. The dataset was later expanded to further improve the accuracy of the model. First, pretrained models were tested to measure their accuracy and performance. Then several models were trained and tested on our custom dataset. Large models were quantized and pruned.
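To illustrate what quantizing and pruning a model means, here is a minimal pure-Python sketch of magnitude pruning followed by symmetric 8-bit quantization of a single weight vector. The case study does not say which schemes or tools were used, so the functions and the example weights below are assumptions for illustration only; in practice these steps are usually done with a deep learning framework's own tooling.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map stored integers back to approximate float weights."""
    return [v * scale for v in quantized]


# Made-up weights, not taken from the actual model.
weights = [0.02, -1.27, 0.40, -0.05, 0.90, 0.01]

pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

print(pruned)  # [0.0, -1.27, 0.4, 0.0, 0.9, 0.0]
print(q)       # [0, -127, 40, 0, 90, 0]
```

Pruning lets the zeroed weights be skipped or stored compactly, while quantization stores each remaining weight in one byte instead of four. Both trade a small amount of accuracy for a smaller model and faster inference, which matters on low-end phones.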

The resulting neural network was accurate and light enough to run on Android in real time, and the app has seen a large increase in the number of downloads. The client also saved the thousands of dollars that a subscription-based service would have cost over the years.
