Designing for the New A(I)udience

How will design trends evolve to address the emergence of a new audience of machine vision systems and their integration with human perception?

Background

Trends in design have varied dramatically over the years to adapt to the ever-changing expectations of audiences.  As technology advances and integrates with humans, our very perception of the world is enhanced. Computer vision operates using distinctly different techniques than human vision.  By examining the differences between these complementary modalities, we can develop new design patterns better suited to communicating with both audiences.

Over the last couple of years I've become really excited about the machine learning tools that have become available (Turi Create, Core ML, TensorFlow, etc.).  After testing workflows across various object detection and image annotation technologies, I began to notice a pattern in detection performance that seems to be present in each of them.

Simply shaped, flat logos are hard to detect with computer vision.

The current trends of Material or Flat Design create some very elegant and intuitive products that are pleasing to the human eye.  Unfortunately, this is not the case for our AI friends, which rely upon shape and pattern complexity for object detection. The fewer visible significant features an object has, the more ambiguous the job of recognition becomes, leading to mis-categorization and false positives.

By understanding this aspect of computer vision, you can choose to emphasize or de-emphasize objects in a computer vision context by adjusting certain design parameters.  These principles are consistent across 2D, 3D, and XR contexts. The most obvious design features that influence recognition are texture and shape.
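
To make that concrete, here is a minimal sketch, assuming OpenCV and a pair of hypothetical logo images, that counts ORB keypoints as a rough proxy for how much shape and texture detail a design exposes to a feature-based detector:

```python
import cv2

def feature_count(path: str) -> int:
    """Count ORB keypoints as a rough proxy for shape/texture complexity."""
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=5000)
    return len(orb.detect(image, None))

# Hypothetical files: a flat, minimal logo vs. a detailed, shaded one.
for logo in ["flat_logo.png", "shaded_logo.png"]:
    print(logo, feature_count(logo))
```

A flat icon tends to yield very few keypoints, while a shaded, textured design gives a detector far more to grab onto.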


Testing

Take, for example, how the Budweiser logo has changed over time, and compare the performance of a naively trained image recognition model across these different logos.  What immediately becomes apparent is that the newest, and to my design sense best-looking, logos are the least likely to be detected. Look at the shape complexity of the logos: the flat icons are often missed entirely, while those with more complex shading and texture are recognized far more reliably. What other features stand out to you?
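
I won't reproduce my exact workflow here, but a naive test along these lines is easy to sketch. The following assumes TensorFlow and hypothetical logo image files, and uses a pretrained ImageNet classifier's confidence as a rough stand-in for detectability; a real test would use a detector trained on the logos themselves:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, decode_predictions, preprocess_input)

model = MobileNetV2(weights="imagenet")

# Hypothetical files for an older, textured logo and a modern, flat one.
for path in ["logo_1936.png", "logo_2016.png"]:
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), 0))
    preds = model.predict(batch, verbose=0)
    print(path, decode_predictions(preds, top=3)[0])  # top guesses + confidence
```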

A Two-Audience Approach

By understanding these differing perspectives, we can develop a new system of visual communication that speaks to both audiences.  There may be times when a unified design theme is perceived equally well by both human and machine, but I think there is an opportunity to use their differences to our advantage.  Let's examine the possibilities of visual communication outside the bounds of human perception.

Imagine a scenario where a sign contains one image that can be viewed by humans, and another invisible image, superimposed upon the first, printed in the infrared or ultraviolet spectrum to communicate with computer vision without the need to modify or compromise the design of either.  Most of the imaging technology we carry with us is optimized for wavelengths between 400 and 700 nm, so an alternative imaging solution would need to be implemented. There are some off-the-shelf approaches that could be used to encode information in a resilient and robust way.

Infrared or ultraviolet printing of embedded data in QR codes offers improved object salience without impacting other human design requirements.  We can modify existing signage with UV stenciling, or design new, more robust systems that integrate both channels from the start. This design pattern has implications for road signs, disability tools, fonts, camouflage, marketing, or any XR application.
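
As a sketch of the machine-readable layer, the open-source qrcode package can generate the pattern that would then be printed in UV- or IR-reflective ink; the payload and file name here are purely illustrative:

```python
import qrcode

payload = "SIGN:speed_limit=65;curve_ahead=0.5km"  # hypothetical embedded data
img = qrcode.make(payload)
img.save("uv_layer.png")  # overlay this layer beneath the visible design
```

Because the code sits in a spectrum humans can't see, the visible design is left entirely untouched.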

Self-driving cars and road sign design: how should road markings and signs be improved for easier detection by computer vision?  What other information could be embedded in static or dynamic signage?  Imagine lane separators that indicate upcoming interchange information, or off-ramp distances with curve information; toll lanes, interchange details, turn vectors, environmental warnings, speed limits, and parking zone information. We can imagine a global initiative to improve the signage used for navigation, which could drastically shorten the time to market for self-driving cars and other automation. There could even be digital signage carrying human-readable information alongside real-time encoded data, such as current traffic, emergency services, and weather conditions, in the non-visible spectrum.
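
On the vehicle side, a decoder could be as simple as this sketch, which assumes an IR-sensitive camera exposed as an ordinary capture device and uses OpenCV's built-in QR detector:

```python
import cv2

detector = cv2.QRCodeDetector()
camera = cv2.VideoCapture(0)  # hypothetical IR-sensitive camera

while camera.isOpened():
    ok, frame = camera.read()
    if not ok:
        break
    data, points, _ = detector.detectAndDecode(frame)
    if data:
        print("Sign payload:", data)  # e.g. speed limit or curve info
        break

camera.release()
```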

How can we improve disability tools? Aided by computer vision features such as text-to-speech enhanced with embedded information, we could augment a user's awareness of their surroundings.  Perhaps even UV/IR machine vision enhancements to braille?
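
A minimal accessibility flow might pair that decoder with text-to-speech; this sketch assumes the pyttsx3 package and a hypothetical IR capture of a sign:

```python
import cv2
import pyttsx3

detector = cv2.QRCodeDetector()
frame = cv2.imread("sign_ir_capture.png")  # hypothetical IR capture

data, _, _ = detector.detectAndDecode(frame)
if data:
    engine = pyttsx3.init()
    engine.say(data)       # read the embedded description aloud
    engine.runAndWait()
```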

How can companies improve their salience in a computer vision context?


Post your ideas in the comments below.

Utilizing Viral YouTube Challenges as Curated Data Sets for Deep Learning

Google just published a really interesting article about how they developed their depth estimation algorithm using data from the popular viral "Mannequin Challenge".   This YouTube challenge had people in a variety of scenarios holding rigid poses while a handheld camera moved through the scene.  This provides a fantastic data set: humans are usually the salient target of a camera, and the computational complexity introduced by kinetic human movement isn't present in this data.  The challenge drew diverse participation from all over the world, in vastly differing settings, making the data set particularly useful.

The results are incredible

Using over 2,000 videos, they were able to achieve fantastic results compared with other state-of-the-art depth estimation approaches.


Given the successful utilization of this crowd-sourced data set, what other utility can be drawn from the viral video data sets already available?

The ALS Ice Bucket Challenge and the Onset of Hypothermia

The first thing that came to my mind was the ALS Ice Bucket Challenge, in which participants are doused with ice water while their reactions are filmed.  This curated data set shares some of the valuable features of the Mannequin Challenge, but offers us a different avenue of investigation.  Could we use data from these videos to detect the symptoms of hypothermia or other temperature-induced maladies?  A search for the "Ice Bucket Challenge" returns almost two million results.  We have a remarkable opportunity to use these memes to generate valuable insights into human reactions to stimuli, as sketched below.
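
As a sketch of how such a data set might be prepared, assuming the videos have already been collected locally (the file name and sampling rate are illustrative), OpenCV can sample frames for later analysis:

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)
video = cv2.VideoCapture("ice_bucket_clip.mp4")  # hypothetical local file
fps = video.get(cv2.CAP_PROP_FPS) or 30.0
frame_idx = saved = 0

while True:
    ok, frame = video.read()
    if not ok:
        break
    if frame_idx % int(fps) == 0:  # sample roughly one frame per second
        cv2.imwrite(f"frames/frame_{saved:05d}.png", frame)
        saved += 1
    frame_idx += 1

video.release()
```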

The Cinnamon Challenge and Respiratory Inflammation

I don't advocate that anyone give this one a try, but the Cinnamon Challenge had participants attempt to swallow a spoonful of cinnamon, which caused most individuals to cough violently and inevitably inhale fine particles of cinnamon.  These individuals experience a high degree of respiratory distress, and once again are captured on camera for us to analyze.

Just looking through the list of viral challenges, a few look like they could provide valuable medical insights and may be worth investigating.

Ghost Pepper Challenge - Irritation/Nausea/Vomiting/Analgesic Reactions

Rotating Corn Challenge - Loose Teeth/Tooth Decay/Gum Disease

Tide Pod Challenge - Poisoning

Kylie Jenner Lip Challenge - Inflammation/Allergic reactions

Car Surfing Challenge - Scrapes/Lacerations/Bruising/Broken Bones/Overall Life Expectancy

What other challenges can provide insight for us?

References:

"Learning the Depths of Moving People by Watching Frozen People" (video): https://www.youtube.com/watch?v=fj_fK74y5_0

"Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction" (Google AI Blog): https://ai.googleblog.com/2019/05/moving-camera-moving-people-deep.html

"Learning the Depths of Moving People by Watching Frozen People" (paper): https://arxiv.org/pdf/1904.11111.pdf

Acknowledgements

The research described in this post was done by Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, and Bill Freeman, who thank Miki Rubinstein for his valuable feedback.