TL;DR:
An embedding is a dense vector representation of data—text, images, audio, code—that captures semantic meaning in a continuous numerical space. Embeddings are the foundation of modern AI: they enable semantic search, recommendations, clustering, and serve as input to LLMs and other downstream models.
How Embeddings Work
An embedding model (e.g., OpenAI's text-embedding-3 family, Cohere Embed, Voyage AI, or open-source sentence-transformers models) takes input data and produces a fixed-length vector, typically 384 to 3,072 dimensions. Semantically similar inputs map to vectors that are close together in the embedding space, as measured by cosine similarity or Euclidean distance. These geometric relationships are learned during training on large datasets, typically with objectives that pull related pairs together and push unrelated pairs apart.
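The distance measure above can be sketched in a few lines of pure Python. The vectors here are tiny hand-made toys, not real model outputs; production embeddings have hundreds to thousands of dimensions, but the math is identical:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (||a|| * ||b||), in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real models emit 384-3,072 dims).
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.15, 0.05]
car = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(cat, kitten))  # near 1.0: semantically close
print(cosine_similarity(cat, car))     # much lower: semantically distant
```

Cosine similarity is usually preferred over raw Euclidean distance because it ignores vector magnitude; many APIs return unit-normalized vectors, in which case the two rankings coincide.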
Use Cases
Embeddings power many production AI applications: semantic search (finding documents by meaning rather than keywords), recommendation systems (finding items similar to user preferences), clustering and classification (grouping similar items), deduplication (finding near-duplicate content), and retrieval-augmented generation (RAG) pipelines, where retrieved passages ground LLM outputs. They are foundational infrastructure—almost every production AI system uses embeddings somewhere in the stack.
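At its core, semantic search is "embed the query, rank the corpus by similarity." A minimal sketch with toy precomputed vectors follows; a real system would call an embedding model for the query and use a vector index (e.g., FAISS or a vector database) instead of a linear scan:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def semantic_search(query_vec, corpus, top_k=2):
    # corpus: list of (doc_id, embedding) pairs, embedded offline.
    # Brute-force scoring; swap in an ANN index at scale.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy corpus: ids and hand-made 3-dim "embeddings".
corpus = [
    ("doc_cats", [0.9, 0.8, 0.1]),
    ("doc_cars", [0.1, 0.1, 0.9]),
    ("doc_dogs", [0.7, 0.6, 0.2]),
]
query = [0.8, 0.7, 0.1]  # pretend this embeds "feline pets"

print(semantic_search(query, corpus))  # doc_cats ranks first
```

The same ranking loop serves recommendations (query vector = a user's preference embedding) and deduplication (flag pairs whose similarity exceeds a threshold).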
Choosing an Embedding Model
Selection criteria include: semantic quality (measured on benchmarks like MTEB), dimensionality (higher dimensions can capture more nuance but cost more in storage and compute), domain specialization (general-purpose vs. legal/medical/code-specialized models), supported languages (multilingual capability), and pricing/licensing. Many production teams maintain multiple embedding models for different content types, with periodic re-embedding when models improve significantly.
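The storage side of the dimensionality trade-off is easy to quantify. A back-of-the-envelope sketch (raw float32 vectors only; index structures and metadata add overhead on top):

```python
def index_size_gb(n_vectors, dims, bytes_per_dim=4):
    # Raw vector storage at float32 (4 bytes/dimension),
    # excluding ANN index overhead and metadata.
    return n_vectors * dims * bytes_per_dim / 1e9

# 10M documents: a 1536-dim model vs. a 384-dim model.
print(index_size_gb(10_000_000, 1536))  # ~61.4 GB
print(index_size_gb(10_000_000, 384))   # ~15.4 GB
```

A 4x reduction in dimensions cuts storage (and similarity-compute cost) roughly 4x, which is why teams often benchmark whether a smaller or quantized model meets their quality bar before defaulting to the largest one.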