Utilizing Viral YouTube Challenges as Curated Data Sets for Deep Learning

Google just published a really interesting article about how they developed their depth estimation algorithm using data from the viral "Mannequin Challenge". In this popular YouTube challenge, people in a variety of scenarios hold rigid poses while a handheld camera moves through the scene. This makes for a fantastic data set: humans are usually the salient subject of a camera, and because everyone is frozen, the data avoids the additional computational complexity that human movement would normally create. The challenge drew participants from all over the world in vastly different settings, which makes the data set particularly useful.
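
To make the intuition concrete: when everyone in the scene is frozen, two frames from the moving camera behave roughly like a stereo pair, so depth can be triangulated from parallax. The snippet below is a minimal sketch of the textbook rectified-stereo relation (depth = focal length × baseline / disparity). It is not Google's actual pipeline, and the function name and all numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (meters) of a static point seen from two camera positions.

    Assumes an idealized, rectified pair of views: the camera moved sideways
    by baseline_m between frames and the point shifted disparity_px pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the camera")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only: 1000 px focal length, camera moved 0.1 m.
near = depth_from_disparity(1000.0, 0.1, 50.0)  # large pixel shift -> close point
far = depth_from_disparity(1000.0, 0.1, 10.0)   # small pixel shift -> distant point
print(near, far)
```

If a person moves between the two frames, the pixel shift mixes camera parallax with the person's own motion and this relation no longer holds, which is why the frozen poses matter.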

The results are incredible

Using over 2000 videos they were able to achieve fantastic results when compared with other state-of-the-art depth estimation approaches.

Given how well this crowd-sourced data set worked, what could we learn from other viral video data sets?

The ALS Ice Bucket Challenge and the Onset of Hypothermia

The first thing that came to my mind was the ALS Ice Bucket Challenge, in which participants are doused with ice water while their reactions are filmed. This curated data set shares some of the valuable features of the Mannequin Challenge but offers a different avenue of investigation. Can we use data from these videos to detect the symptoms of hypothermia or other temperature-induced maladies? There are almost 2 million results when searching for the "Ice Bucket Challenge". We have a remarkable opportunity to use these memes to generate valuable insights into human reactions to stimuli.

Cinnamon Challenge and Respiratory Inflammation

I don't advocate anyone give this one a try, but the Cinnamon Challenge had participants attempt to swallow a spoonful of cinnamon, which causes most people to cough violently and inevitably inhale fine particles of cinnamon. The participants experience a high degree of respiratory distress and, once again, are captured on camera for us to analyze.

Just looking through the list of viral challenges, a few look like they could provide valuable medical insights and may be worth investigating.

Ghost Pepper Challenge - Irritation/Nausea/Vomiting/Analgesic Reactions

Rotating Corn Challenge - Loose Teeth/Tooth Decay/Gum Disease

Tide Pod Challenge - Poisoning

Kylie Jenner Lip Challenge - Inflammation/Allergic reactions

Car Surfing Challenge - Scrapes/Lacerations/Bruising/Broken Bones/Overall Life Expectancy

What other challenges could provide insight for us?

References:

Learning the Depths of Moving People by Watching Frozen People

https://www.youtube.com/watch?v=fj_fK74y5_0

Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction

https://ai.googleblog.com/2019/05/moving-camera-moving-people-deep.html

You can read "Learning the Depths of Moving People by Watching Frozen People" here:

https://arxiv.org/pdf/1904.11111.pdf

Acknowledgements

The research described in this post was done by Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu and Bill Freeman. We would like to thank Miki Rubinstein for his valuable feedback.

AI Bias - Questions on the Future of Image Recognition

I'm interested in collaborating on a project about bias in AI. I made a prototype of an image recognition app that detects and classifies objects in a photo. After running a few tests, I began to notice that race and gender occasionally appeared as categories.

This made me more broadly curious about the practical implications of how AI/machine learning is designed and implemented, and the impact these choices could have in the future. There are many image recognition platforms available to developers, and they approach the problem in different ways. Some use metadata from curated image data sets, some use images shared on social media, and some use human workers (Mechanical Turk) to tag photos. How do these models differ with respect to inherent cultural, religious, and ethnic biases? Classifying more abstract notions such as race, gender, or emotion is complicated and leaves a lot of interpretation up to the viewer. There is also the problem of the null set, in which ambiguous classifications may not be tagged at all, leaving crucial information out of predictive models.
As a result of these different modes of classification:

What does this AI think a gender or a race is?

How is the data seen as significant, and under what circumstances should it be used?

Should AI be designed such that it is "color blind"?
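
One rough way to start comparing platforms on these questions is to tally how often demographic labels, and missing labels, show up in their output. The sketch below assumes each platform's results have already been collected as a list of label strings per image. The `SENSITIVE_TERMS` set, the `audit_labels` function, and the sample data are all hypothetical and not taken from any real API, and counting untagged images is only a crude stand-in for the null set problem.

```python
from collections import Counter

# Hypothetical watch list; a real audit would need a carefully chosen vocabulary.
SENSITIVE_TERMS = {"man", "woman", "male", "female", "race", "ethnicity"}


def audit_labels(labels_per_image):
    """Summarize sensitive and missing labels in a batch of tagging results.

    labels_per_image: list of lists of label strings, one inner list per image.
    Returns (counts of sensitive labels, number of images with no labels).
    """
    sensitive = Counter()
    untagged = 0
    for labels in labels_per_image:
        if not labels:
            untagged += 1  # nothing was tagged for this image
            continue
        for label in labels:
            normalized = label.strip().lower()
            if normalized in SENSITIVE_TERMS:
                sensitive[normalized] += 1
    return sensitive, untagged


# Made-up results for three images, for illustration only.
sample = [["Dog", "Park"], ["Woman", "Bicycle"], []]
counts, untagged = audit_labels(sample)
print(dict(counts), untagged)
```

Running the same tally over the output of several platforms for the same photos would give a first, very coarse picture of where their labeling choices diverge.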

Please let me know your thoughts in the comments below!

If you are interested in collaborating, or playing with the prototype that led to this discussion, join the mailing list at AIBias.com or Showblender.com


http://imgur.com/a/75qNn