Eyes for AI – Computer Vision

Thursday, 30 May 2024 00:11

Machines that understand the world

We humans have a pair of eyes. When you open them, light from the surroundings enters and forms an image on the retina. The retina encodes this image as electrical impulses, and the optic nerve carries them to the brain’s visual cortex, where the signals are interpreted in light of prior knowledge. Visual perception enables you to interact with your loved ones, read, learn, play, work, drive, and enjoy scenic views. Essentially, computer vision (CV) aims to match, or even exceed, these human visual perception capabilities using cameras and computers.

Most of the recent advances in CV were made within the last 20 years, thanks to hardware-accelerated computing platforms such as general-purpose graphics processing units (GP-GPUs) and to research successes in machine learning, particularly deep learning. On some tasks CV now surpasses average human capabilities, pointing towards superhuman vision that allows self-driving vehicles and autonomous robots to advance. CV is the branch of artificial intelligence (AI) that provides the eyes, and it draws on fields such as image processing, pattern recognition, machine learning, mathematics, physics, and signal processing.

Humans achieve visual perception without ever thinking about it. For computers, however, learning these visual perception tasks is hugely difficult for many reasons. Visual perception goes beyond calculation; it is more about learning models of the visual world. A typical computer vision system starts with cameras, which come in numerous types, e.g., monocular cameras, pan-tilt-zoom cameras, stereo cameras, and structured-light depth cameras. A single capture by the camera is called an image frame, which is the input to the CV system. The computing platform can be a mobile phone, laptop, PC, embedded system, or supercomputer. Nowadays, most of these systems use hardware accelerators such as GP-GPUs.
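To make the idea of an image frame concrete, the short Python sketch below treats a frame as a plain grid of pixels and converts it from colour to grayscale, a common first step before further analysis. It is only an illustration: the 2x2 frame and the function name to_grayscale are made up for this example, and the weights are the widely used ITU-R BT.601 luma coefficients rather than anything specific to a particular camera or system.

```python
def to_grayscale(frame):
    """Convert an RGB frame to grayscale.

    frame: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Uses the ITU-R BT.601 luma weights (an assumption for this sketch).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


# A hypothetical 2x2 colour frame: red, green, blue, and white pixels.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

height, width = len(frame), len(frame[0])
gray = to_grayscale(frame)

print("frame size:", height, "x", width)
print("grayscale frame:", gray)
```

A real camera frame has the same structure, only with far more pixels, which is why hardware accelerators matter.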

CV Capabilities

At its core, CV tries to see the world and understand it. Therefore, recognizing shapes, objects, written text (optical character recognition, OCR), human faces, human posture, and activities are the key capabilities on which research and development are focused. Most of these capabilities require extracting features from the image that relate to the matter of interest, then training machine learning models and running inference with them to classify items into different classes and to recognize the associations and interactions between them.
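The three steps named above (feature extraction, training, and inference) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how production systems work: the images are synthetic 8x8 stripe patterns, the two gradient features are hand-crafted, and the classifier is a simple nearest-centroid model, whereas modern systems usually learn their features with deep networks. All function names here are hypothetical.

```python
import random


def make_image(kind, size=8, noise=0.1, rng=random):
    """Synthetic grayscale image (rows of values in [0, 1]) with stripes."""
    img = []
    for r in range(size):
        row = []
        for c in range(size):
            stripe = c if kind == "vertical" else r
            base = 1.0 if stripe % 2 == 0 else 0.0
            value = base + rng.uniform(-noise, noise)
            row.append(min(1.0, max(0.0, value)))
        img.append(row)
    return img


def extract_features(img):
    """Feature extraction: mean absolute horizontal and vertical gradients."""
    h, w = len(img), len(img[0])
    gx = sum(abs(img[r][c + 1] - img[r][c])
             for r in range(h) for c in range(w - 1)) / (h * (w - 1))
    gy = sum(abs(img[r + 1][c] - img[r][c])
             for r in range(h - 1) for c in range(w)) / ((h - 1) * w)
    return (gx, gy)


def train(samples):
    """Training: average the feature vectors of each class (its centroid)."""
    sums, counts = {}, {}
    for img, label in samples:
        feats = extract_features(img)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def predict(model, img):
    """Inference: return the class whose centroid is closest to the features."""
    feats = extract_features(img)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


rng = random.Random(0)
training_set = [(make_image(kind, rng=rng), kind)
                for kind in ("vertical", "horizontal") for _ in range(20)]
model = train(training_set)

new_image = make_image("vertical", rng=rng)
print("predicted class:", predict(model, new_image))
```

The same pattern of features, training, and inference carries over to real tasks such as face or traffic sign recognition, with far richer features and models.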

Applications of CV

Anything that can be performed with human vision is a potential candidate for CV to perform efficiently and accurately. CV is a core enabling technology for autonomous robotics, self-driving vehicles, augmented reality, and visual data analytics. Each of these gives rise to numerous applications in fields such as agriculture, industry, medicine, defence, mining, space, and the social sector.

In autonomous robotics and self-driving vehicles, CV helps with the following key tasks.

Localization with respect to the surroundings and mapping (simultaneous localization and mapping, SLAM)
Obstacle detection and avoidance
Human robot interaction (HRI)
Lane detection
Traffic sign detection
Pedestrian, vehicle, building, and other object recognition

In augmented reality, CV provides the following core enabling capabilities.

Object detection and recognition
Localization and registration
Human interaction
Generative models for avatar generation

In visual analytics, CV provides the following key capabilities.

Recognize objects, persons, and text of interest
Recognize faces and fingerprints
3D reconstruction and mapping of the objects
Recognizing personal behaviour, person-person interactions, and person-object interactions

Further, the following business segments have various applications.

Industrial

Factory automation - recognize objects and estimate their pose for handling pallets and materials
Manufacturing - automated defect analysis
Warehouses - automated parcel sorting, address sorting using OCR and autonomous moving of the parcels by delivery robots

Agriculture

Soil and land surveying using aerial imagery
Crop monitoring - diagnosing issues using aerial and close-up images, crop estimation
Spraying - vegetation and weed detection to decide where to spray
Harvesting – autonomous farm vehicle operation
Quality control – grade the harvest into different categories

Mining

Automated heavy vehicles and extraction
Automated sorting of different grades of minerals
Automated mapping of resources and minerals

Medical

Reconstruction of MRI and CT images
Diagnosis of specific medical conditions by analysing scans
Patient monitoring
Surgical robots – anatomical landmarks identification
Training medical staff – augmented reality for superimposing anatomical and underlying layers

Transport

Advanced Driver Assistance Systems (ADAS) and Intelligent transport systems (ITS)
Real time traffic analysis and control
Automated ticketing and parking with number plate recognition
Law enforcement - detecting mobile phone use and distracted driver behaviour
Driver fatigue detection

Space

Satellite image analysis for climate, natural disaster prediction and monitoring
Automated space exploration, planetary surface navigation
Space robotics for construction and maintenance missions
Automated identification of astronomical events and anomalous objects
Monitoring for asteroids approaching Earth

Retail

Barcodes (already in use for a long time)
Automated checkouts, cashierless stores, stock monitoring
Theft detection
Customer behaviour analysis - user heatmap
Virtual clothes fitting
Product quality recognition – e.g., for fish, fruits, vegetables, etc.

Social

Large crowd monitoring, recognizing social violations
Recognizing vandalism and threats
Social identification/passport
Victim identification in natural disasters and emergencies

Opportunities in enterprise and social development in Sri Lanka

Research and development of CV- and AI-related products, solutions, and services for export
CV and VR/AR based applications for tourism
Agricultural crop monitoring and spraying
Wildlife monitoring, e.g., elephants
Implementing a more disciplined road traffic system with CV-based, unbiased traffic violation monitoring
Automated surveillance and monitoring of suspicious activities such as corruption, to keep society peaceful
Automated medical diagnosis to offset the shortage of medical specialists and to give more accurate results
Remote sensing for natural disaster monitoring and analysis
Waste management - sorting and monitoring

We are living at a very interesting point in the evolution of computer vision and AI in general. Hardware is becoming computationally more powerful, more compact, more cost-effective, and more power-efficient, while software is becoming more robust at handling finer details, and researchers keep pushing the boundaries. Sri Lanka has the potential to be a leader in these emerging technologies.

Author: Dr. Kalana Withanage obtained a PhD (2019) in computer vision/human-robot interaction (HRI) from the University of South Australia, and a BSc (Hons) in Electrical and Information Engineering (2008) from the University of Ruhuna, Sri Lanka.