2020 has made every industry reimagine how to move forward in light of COVID-19, civil rights movements, an election year and countless other big news moments. On a human level, we’ve had to adjust to a new way of living. We’ve started to accept these changes and figure out how to live our lives under these new pandemic rules. While humans settle in, AI is struggling to keep up.
The issue with AI training in 2020 is that, all of a sudden, we’ve changed our social and cultural norms. The truths that we have taught these algorithms are often no longer true. With visual AI specifically, we’re asking it to immediately interpret the new way we live, using context it doesn’t have yet.
Algorithms are still adjusting to new visual cues and trying to understand how to accurately identify them. As visual AI catches up, we also need to place renewed importance on routine updates in the AI training process so that inaccurate training datasets and preexisting open-source models can be corrected.
Computer vision models are struggling to appropriately tag depictions of the new scenes and situations we find ourselves in during the COVID-19 era. Categories have shifted. For example, say there’s an image of a father working at home while his son is playing. AI still categorizes it as “leisure” or “relaxation.” It does not identify it as “work” or “office,” even though working with your kids next to you is a very common reality for many families during this time.
On a more technical level, we physically have different pixel depictions of our world. At Getty Images, we’ve been training AI to “see.” This means algorithms can identify images and categorize them based on the pixel makeup of that image and decide what it includes. Rapidly changing how we go about our daily lives means that we’re also shifting what a category or tag (such as “cleaning”) entails.
Think of it this way — cleaning may now include wiping down surfaces that already visually appear clean. Algorithms have been previously taught that to depict cleaning, there needs to be a mess. Now, this looks very different. Our systems have to be retrained to account for these redefined category parameters.
This applies on a smaller scale as well. Someone could be grabbing a doorknob with a small wipe or cleaning their steering wheel while sitting in their car. What was once a trivial detail now holds importance as people try to stay safe. We need to catch these small nuances so that such images are tagged appropriately. Then AI can start to understand our world in 2020 and produce accurate outputs.
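When category definitions such as “cleaning” or “work” shift, one way to decide which categories to retrain first is to re-annotate a sample of existing images under the updated definitions and measure how often the new label differs from the old tag. The sketch below assumes you already have such (old tag, new tag) pairs; the function name and data layout are hypothetical and not a description of any particular production pipeline.

```python
from collections import Counter


def relabel_disagreement(samples):
    """Per-category share of images whose old tag differs from a fresh label.

    samples: iterable of (old_tag, new_tag) pairs for the same image, where
    old_tag comes from the existing training data and new_tag from a
    re-annotation pass under the updated category definitions (hypothetical).
    """
    totals = Counter()
    changed = Counter()
    for old_tag, new_tag in samples:
        totals[new_tag] += 1
        if old_tag != new_tag:
            changed[new_tag] += 1
    # Highest disagreement first: these categories are the first candidates to retrain.
    rates = {tag: changed[tag] / totals[tag] for tag in totals}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


samples = [
    ("leisure", "work"),
    ("work", "work"),
    ("other", "cleaning"),
    ("cleaning", "cleaning"),
    ("other", "cleaning"),
]
print(relabel_disagreement(samples))
```

Grouping by the new tag answers the question “of the images that now count as cleaning, how many did the old labels miss?”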
Another issue for AI right now is that machine learning algorithms are still trying to understand how to identify and categorize faces with masks. Faces are being detected as solely the top half of the face, or as two faces — one with the mask and a second of only the eyes. This creates inconsistencies and inhibits accurate usage of face detection models.
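While a detector is being retrained, the double-detection symptom can sometimes be patched in post-processing. This minimal sketch assumes detections arrive as (x1, y1, x2, y2) boxes and drops any box that lies mostly inside a larger, already-kept box; the 0.8 threshold and the function names are illustrative assumptions. Unlike standard non-maximum suppression, it measures overlap relative to the smaller box, because an eyes-only box inside a full-face box has a low intersection-over-union even though it is clearly a duplicate.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is degenerate."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def intersection(a, b):
    """Area of overlap between two boxes (zero if they do not touch)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def merge_nested(boxes, threshold=0.8):
    """Drop boxes that lie mostly inside a larger, already-kept box.

    A box counts as a duplicate when at least `threshold` of its own area
    falls inside a kept box, e.g. an eyes-only detection inside a masked face.
    """
    kept = []
    # Largest first, so a full-face box is kept before any box nested inside it.
    for box in sorted(boxes, key=area, reverse=True):
        a = area(box)
        if a == 0:
            continue  # skip degenerate detections
        if all(intersection(box, k) / a < threshold for k in kept):
            kept.append(box)
    return kept


detections = [(100, 100, 200, 220), (110, 110, 190, 150), (300, 100, 400, 220)]
print(merge_nested(detections))
```

This only hides the inconsistency in the output; the underlying model still needs better training data.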
One path forward is to retrain algorithms to perform better when given solely the top portion of the face (above the mask). The mask problem is similar to classic face detection challenges such as someone wearing sunglasses or detecting the face of someone in profile. Now masks are commonplace as well.
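A common way to approximate that retraining is occlusion augmentation: paint over the lower part of known face boxes so the model learns that the eyes, brow and forehead are enough to call something a face. This is a minimal sketch that uses plain Python lists as images; a real pipeline would operate on image arrays and might paste realistic mask textures instead of a flat fill. The function names and the 0.45 to 0.6 range for where the “mask” begins are assumptions.

```python
import random


def occlude_lower_face(image, box, start_frac=0.5, fill=0):
    """Return a copy of `image` with the lower part of the face box painted over.

    image: list of pixel rows, a stand-in for a real image array.
    box: (x1, y1, x2, y2) face box, with x2 and y2 exclusive.
    start_frac: fraction of the box height at which the synthetic mask begins.
    """
    x1, y1, x2, y2 = box
    mask_top = y1 + int((y2 - y1) * start_frac)
    out = [row[:] for row in image]  # copy so the original sample is untouched
    for y in range(mask_top, y2):
        for x in range(x1, x2):
            out[y][x] = fill
    return out


def augment(dataset, prob=0.5, seed=0):
    """Occlude a random subset of faces; the face box (the label) is unchanged."""
    rng = random.Random(seed)
    result = []
    for image, box in dataset:
        if rng.random() < prob:
            image = occlude_lower_face(image, box, start_frac=rng.uniform(0.45, 0.6))
        result.append((image, box))
    return result


image = [[9] * 4 for _ in range(4)]
print(occlude_lower_face(image, (0, 0, 4, 4)))
```

Keeping the label fixed is the point of the technique: the same face box is still a face, with only its upper half visible.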
What this shows us is that computer vision models still have a long way to go before truly being able to “see” in our ever-evolving social landscape. The way to counter this is to build robust datasets. Then, we can train computer vision models to account for the myriad different ways a face may be obstructed or covered.
At this point, we’re expanding the parameters of what the algorithm sees as a face — be it a person wearing a mask at a grocery store, a nurse wearing a mask as part of their day-to-day job or a person covering their face for religious reasons.
As we create the content needed to build these robust datasets, we should be aware of the potential for increased unintentional bias. While some bias will always exist within AI, we are now seeing imbalanced datasets depicting our new normal. For example, we are seeing more images of white people wearing masks than of people of other ethnicities.
This may be the result of strict stay-at-home orders under which photographers have limited access to communities other than their own and are unable to diversify their subjects. It may be due to the ethnicity of the photographers choosing to shoot this subject matter, or to the level of impact COVID-19 has had on different regions. Regardless of the reason, this imbalance will lead to algorithms that detect a white person wearing a mask more accurately than a person of any other race or ethnicity.
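If images carry demographic or regional metadata, measuring that imbalance is straightforward. A minimal sketch, assuming each record has a set of tags and a group label (both hypothetical field names), computes each group’s share of the images for a given tag and lists groups below a chosen floor. What floor counts as balanced is a judgment call, and the counts say nothing about why a gap exists.

```python
from collections import Counter


def representation_report(records, tag, group_key="group"):
    """Share of each group among images carrying `tag`.

    records: list of metadata dicts with a set of "tags" and a group field.
    Both field names are hypothetical; use whatever your metadata provides.
    """
    counts = Counter(r[group_key] for r in records if tag in r["tags"])
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


def underrepresented(report, floor):
    """Groups whose share of the tagged images falls below `floor`."""
    return sorted(g for g, share in report.items() if share < floor)


records = (
    [{"tags": {"mask"}, "group": "a"}] * 6
    + [{"tags": {"mask"}, "group": "b"}] * 3
    + [{"tags": {"mask"}, "group": "c"}]
    + [{"tags": {"beach"}, "group": "c"}] * 5
)
report = representation_report(records, "mask")
print(report, underrepresented(report, floor=0.2))
```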
Data scientists and those who build products with models have an increased responsibility to check the accuracy of models in light of shifts in social norms. Routine checks and updates to training data and models are key to ensuring the quality and robustness of models, now more than ever. If outputs are inaccurate, data scientists can then quickly identify them and course-correct.
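A routine check can be as simple as re-running a face detector on a labeled evaluation set after each update and comparing recall per group (for example masked versus unmasked faces, or the demographic groups discussed above) against the previous run. The sketch below assumes evaluation results are already available as (group, detected) pairs for faces known to be present; the 0.05 tolerance and the function names are illustrative assumptions.

```python
def per_group_recall(results):
    """Fraction of known faces detected, per group.

    results: iterable of (group, detected) pairs, detected being a bool.
    """
    hits, totals = {}, {}
    for group, detected in results:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(detected)
    return {g: hits[g] / totals[g] for g in totals}


def flag_regressions(current, baseline, tolerance=0.05):
    """Groups whose recall dropped more than `tolerance` below the baseline run."""
    return sorted(
        g for g, r in current.items() if g in baseline and baseline[g] - r > tolerance
    )


results = [("a", True)] * 9 + [("a", False)] + [("b", True)] * 6 + [("b", False)] * 4
current = per_group_recall(results)
print(current, flag_regressions(current, baseline={"a": 0.9, "b": 0.8}))
```

Comparing groups against each other, not just against the baseline, would also surface the imbalance described above.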
It’s also worth mentioning that our current way of living is here to stay for the foreseeable future. Because of this, we must be cautious about the open-source datasets we’re leveraging for training purposes. Datasets that can be altered should be. Open-source models that cannot be altered need a disclaimer so it’s clear which projects might be negatively affected by the outdated training data.
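Deciding which datasets or models need such a disclaimer starts with knowing when their data was collected. As a rough sketch, assuming each catalog entry records the date of its newest sample (a hypothetical field), the following flags anything whose data ends before a chosen cutoff. The March 2020 cutoff and the names are assumptions, and a date check is only a first filter, since newer data can still under-represent the new norms.

```python
from datetime import date

# Hypothetical cutoff: data collected entirely before this date predates the pandemic.
PANDEMIC_CUTOFF = date(2020, 3, 1)


def needs_disclaimer(datasets, cutoff=PANDEMIC_CUTOFF):
    """Names of datasets whose most recent sample was collected before `cutoff`.

    datasets: mapping of dataset name to the date of its newest sample.
    """
    return sorted(name for name, newest in datasets.items() if newest < cutoff)


catalog = {"faces_v1": date(2019, 6, 30), "faces_v2": date(2020, 9, 1)}
for name in needs_disclaimer(catalog):
    print(f"{name}: trained on pre-pandemic data; masked faces and new norms may be missing")
```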
Identifying the new context we’re asking the system to understand is the first step toward moving visual AI forward. Then we need more content: more depictions of the world around us and the diverse perspectives on it. As we amass this new content, we should take stock of potential new biases and of ways to update existing open-source datasets. We all have to monitor for inconsistencies and inaccuracies. Persistence and dedication to retraining computer vision models is how we’ll bring AI into 2020.