Eyes for AI – Computer Vision

Thursday, 30 May 2024 00:11

Machines that understand the world

We humans have a pair of eyes. When you open them, light from the surroundings enters and forms an image on the retina. The retina converts that image into electrical impulses, which the optic nerve carries to the brain’s visual cortex, where the signals are interpreted against prior knowledge. Visual perception enables you to interact with your loved ones, read, learn, play, work, drive, and enjoy scenic views. Essentially, computer vision (CV) aims to match, or even exceed, these human visual perception capabilities using cameras and computers.

Most recent advances in CV have come within the last 20 years, thanks to the evolution of hardware-accelerated computing platforms such as GP-GPUs (general-purpose graphics processing units) and to research successes in machine learning, particularly deep learning. CV now surpasses average human performance on some tasks and is showing signs of superhuman vision capabilities, which is helping self-driving vehicles and autonomous robots advance. CV is the branch of artificial intelligence (AI) that provides the eyes, and it draws on fields such as image processing, pattern recognition, machine learning, mathematics, physics, and signal processing.

Humans achieve visual perception without ever thinking about it. For computers, however, learning these visual perception tasks is hugely difficult for many reasons. Visual perception goes beyond calculation; it is about learning models of the visual world. A typical computer vision system starts with cameras, which come in numerous types, e.g., monocular cameras, pan-tilt-zoom cameras, stereo cameras, and structured-light depth cameras. A single capture by the camera is called an image frame, which is the input to the CV system. The computing platform can be a mobile phone, laptop, PC, embedded system, or supercomputer. Nowadays, most of these systems use hardware accelerators such as GP-GPUs.



CV Capabilities

At its core, CV tries to see the world and understand it. Research and development therefore focus on key capabilities such as recognizing shapes, objects, written text (OCR), human faces, human posture, and activities. Most of these capabilities involve extracting features from the image that relate to the matter of interest, training machine learning models, and then using those models at inference time to classify items into different classes and to recognize associations and interactions between them.
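To make the extract-features, train, and infer steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the tiny 8x8 synthetic striped images, the two hand-made gradient features, and the nearest-centroid classifier are all hypothetical choices made for this example, and real CV systems typically learn their features with deep networks.

```python
# Minimal sketch of the extract-features -> train -> infer pipeline.
# Image size, features, and classifier are illustrative assumptions only.
import random

SIZE = 8  # hypothetical 8x8 grayscale frames, pixel values roughly 30..220


def make_image(kind, rng):
    """Create a noisy synthetic frame with vertical or horizontal stripes."""
    image = []
    for row in range(SIZE):
        line = []
        for col in range(SIZE):
            # Vertical stripes vary across columns, horizontal ones across rows.
            index = col if kind == "vertical" else row
            base = 200 if index % 2 == 0 else 50
            line.append(base + rng.randint(-20, 20))
        image.append(line)
    return image


def extract_features(image):
    """Feature extraction: mean absolute intensity change along rows and columns."""
    along_rows = sum(abs(image[r][c + 1] - image[r][c])
                     for r in range(SIZE) for c in range(SIZE - 1))
    along_cols = sum(abs(image[r + 1][c] - image[r][c])
                     for r in range(SIZE - 1) for c in range(SIZE))
    count = SIZE * (SIZE - 1)
    return (along_rows / count, along_cols / count)


def train(samples):
    """Training: average the feature vectors of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total)
            for label, total in sums.items()}


def predict(model, features):
    """Inference: return the class whose centroid is closest to the features."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


rng = random.Random(0)
training_set = [(extract_features(make_image(kind, rng)), kind)
                for kind in ("vertical", "horizontal") for _ in range(20)]
model = train(training_set)

test_image = make_image("horizontal", rng)
print(predict(model, extract_features(test_image)))  # prints "horizontal"
```

The same three stages appear in practical systems, only with far richer features and models; the sketch keeps each stage in its own function so that any one of them could be swapped out independently.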

Applications of CV

Anything that can be done with human vision is a potential candidate for CV to perform efficiently and accurately. CV is a core enabling technology for autonomous robotics, self-driving vehicles, augmented reality, and visual data analytics. Each of these creates numerous applications in fields such as agriculture, industry, medicine, defence, mining, space, and the social sector.



In autonomous robotics and self-driving vehicles, CV helps with the following key tasks.

  • Localization with respect to the surroundings and a map (SLAM)
  • Obstacle detection and avoidance
  • Human-robot interaction (HRI)
  • Lane detection
  • Traffic sign detection
  • Recognition of pedestrians, vehicles, buildings, and other objects

In augmented reality, CV provides the following core enabling capabilities.

  • Object detection and recognition
  • Localization and registration
  • Human interaction
  • Generative models for avatar generation

In visual analytics, CV provides the following key capabilities.

  • Recognizing objects, persons, and text of interest
  • Recognizing faces and fingerprints
  • 3D reconstruction and mapping of objects
  • Recognizing personal behaviour, person-person interactions, and person-object interactions

CV also has various applications in the following business segments.

Industrial

  • Factory automation - recognizing objects and finding their pose for handling pallets and materials
  • Manufacturing - automated defect analysis
  • Warehouses - automated parcel sorting, address sorting using OCR, and autonomous movement of parcels by delivery robots

   



Agriculture

  • Soil and land surveying using aerial imagery
  • Crop monitoring - diagnosing issues using aerial and close-up images, crop estimation
  • Spraying - vegetation and weed detection to decide where to spray
  • Harvesting - autonomous farm vehicle operation
  • Quality control - grading the harvest into different categories

    

Mining

  • Automated heavy vehicles and extraction
  • Automated sorting of different grades of minerals
  • Automated mapping of resources and minerals

Medical

  • Reconstruction of MRI and CT images
  • Diagnosis of specific medical conditions by analysing scans
  • Patient monitoring
  • Surgical robots - identification of anatomical landmarks
  • Training medical staff - augmented reality for superimposing anatomical and underlying layers

     

Transport

  • Advanced Driver Assistance Systems (ADAS) and Intelligent Transport Systems (ITS)
  • Real-time traffic analysis and control
  • Automated ticketing and parking with number plate recognition
  • Law enforcement - detecting mobile phone use and distracted driving
  • Driver fatigue detection

     

Space

  • Satellite image analysis for climate and for natural disaster prediction and monitoring
  • Automated space exploration and planetary surface navigation
  • Space robotics for construction and maintenance missions
  • Automated identification of astronomical events and anomalous objects
  • Monitoring for asteroids that may threaten Earth



Retail

  • Barcodes (already in use for a long time)
  • Automated checkouts, cashierless stores, stock monitoring
  • Theft detection
  • Customer behaviour analysis - user heatmaps
  • Virtual clothes fitting
  • Product quality recognition - e.g., for fish, fruits, vegetables, etc.

Social 

  • Large crowd monitoring, recognizing social violations
  • Recognizing vandalism and threats
  • Social identification/passport
  • Victim identification in natural disasters and emergencies

     

Opportunities in enterprise and social development in Sri Lanka

  • Researching and developing CV and AI related products, solutions, and services for export
  • CV and VR/AR based applications for tourism
  • Agricultural crop monitoring and spraying
  • Wildlife monitoring, e.g., elephants
  • A more disciplined road traffic system through unbiased, CV-based monitoring of traffic violations
  • Automated surveillance and monitoring for suspected activities such as corruption, and to keep society peaceful
  • Automated medical diagnosis to offset the shortage of medical specialists and to improve accuracy
  • Remote sensing for natural disaster monitoring and analysis
  • Waste management - sorting and monitoring

We are living at a very interesting point in the evolution of computer vision and of AI in general. Hardware is becoming more computationally powerful, compact, cost-effective, and power-efficient, software is becoming more robust and better at handling finer details, and researchers keep pushing the boundaries. Sri Lanka has the potential to be a leader in these emerging technologies.

 

Author: Dr. Kalana Withanage obtained a PhD (2019) in computer vision/human-robot interaction (HRI) from the University of South Australia and a BSc (Hons) in Electrical and Information Engineering (2008) from the University of Ruhuna, Sri Lanka.

 
