TLDR:
Computer vision is the field of AI that enables machines to interpret and understand visual information from images and video. Its tasks range from basic image classification to complex scene understanding, 3D reconstruction, and visual reasoning. Modern computer vision is dominated by deep learning, particularly convolutional neural networks (CNNs) and Vision Transformers (ViTs).
Major Tasks
The main tasks include image classification (what is in this image?), object detection (locating and classifying objects within an image), semantic segmentation (assigning a class label to every pixel), instance segmentation (separating distinct instances of the same class), pose estimation (locating human body keypoints), face recognition, optical character recognition (OCR), visual question answering, image generation (now commonly done with diffusion models), and video understanding (action recognition, tracking).
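The difference between several of these tasks is easiest to see in the shape of their outputs. The sketch below is a toy illustration only: the 4x6 masks, the class id 1 standing for "cat", and the helper names classes_present and boxes_from_instances are all hypothetical, and no model is involved. It shows that a semantic mask stores one class id per pixel, an instance mask stores one object id per pixel, and detection-style bounding boxes can be read off the instance mask.

```python
# Toy annotations for a 4x6 image (hypothetical data, no model involved).
# Semantic mask: one class id per pixel. 0 = background, 1 = "cat".
semantic_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
# Instance mask: one object id per pixel. 0 = background, 1 and 2 = two cats.
instance_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def classes_present(mask):
    """Classification-style answer: which non-background classes appear."""
    return sorted({c for row in mask for c in row if c != 0})


def boxes_from_instances(mask):
    """Detection-style answer: {instance_id: (x_min, y_min, x_max, y_max)},
    using inclusive pixel coordinates."""
    boxes = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == 0:
                continue  # skip background pixels
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


image_labels = classes_present(semantic_mask)
detections = boxes_from_instances(instance_mask)
print(image_labels)  # [1]
print(detections)    # {1: (1, 0, 2, 1), 2: (4, 1, 5, 2)}
```

Note that the semantic mask alone cannot tell the two cats apart, which is exactly the gap instance segmentation fills.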
Industry Applications
Computer vision powers diverse applications: autonomous vehicles (Tesla, Waymo, Cruise), medical imaging (radiology, pathology, dermatology), manufacturing quality control, agricultural monitoring (crop health, yield prediction), retail (cashierless stores, theft prevention), security and surveillance, content moderation, augmented and virtual reality, and, increasingly, creative tools. Foundation models such as CLIP (image-text matching) and SAM (promptable segmentation) have transformed the field by enabling zero-shot use, that is, handling categories or objects the model was never specifically trained on.
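To make "zero-shot" concrete, the following sketch mimics the matching step used in CLIP-style classification: an image embedding is compared against the embeddings of candidate text prompts, and the closest prompt wins. It is a simplified illustration under stated assumptions. The three-dimensional vectors are hand-made placeholders rather than real model outputs, and the function names cosine and zero_shot_classify are hypothetical; in a real system the vectors would come from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, text_embeddings):
    """Pick the prompt whose embedding is most similar to the image embedding.

    New categories can be added just by adding prompts; no retraining step
    is involved, which is what makes the procedure zero-shot.
    """
    scores = {
        prompt: cosine(image_embedding, vec)
        for prompt, vec in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder embeddings (assumed values, not produced by any real model).
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(image_embedding, text_embeddings)
print(label)  # a photo of a cat
```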
Legal and Ethical Issues
Computer vision raises particularly acute legal and ethical questions. Facial recognition is restricted or banned in many jurisdictions: the EU AI Act prohibits most real-time remote biometric identification in publicly accessible spaces by law enforcement, subject to narrow exceptions; several US cities ban government use; and the Illinois Biometric Information Privacy Act (BIPA) imposes consent and disclosure obligations on private entities that collect biometric data. Other concerns include demographic bias in detection and recognition systems (several landmark studies have found higher error rates for people with darker skin tones), surveillance and its chilling effects on civil liberties, data privacy and consent for training data, and copyright when models are trained on copyrighted imagery. Companies deploying computer vision should map their use cases against applicable biometric, privacy, and AI regulations early.
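One practical way to check for the demographic bias mentioned above is to break a system's error rate down by group instead of reporting a single overall figure. The sketch below is a minimal illustration of that arithmetic, not a complete fairness audit. The records are invented purely to make the code runnable, the group names are placeholders, and error_rates_by_group is a hypothetical helper.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute the error rate per group.

    Each record is (group, predicted_label, true_label).
    Returns {group: errors / total} for every group that appears.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Invented evaluation records (illustrative only, not real measurements).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

rates = error_rates_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'group_a': 0.25, 'group_b': 0.5}
print(gap)    # 0.25
```

A large gap between groups is a signal to investigate further, for example by checking how well each group is represented in the training and evaluation data.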