TLDR:
A vector database is a database optimized for storing and querying high-dimensional vector representations of data (embeddings), enabling fast similarity search across millions to billions of vectors. Vector databases are foundational infrastructure for modern AI applications, particularly retrieval-augmented generation (RAG) systems.
How Vector Databases Differ from Traditional Databases
Traditional relational and document databases query by exact match or keyword matching. Vector databases query by semantic similarity: they find items whose embedding vectors lie closest to a query vector in high-dimensional space, typically measured by cosine similarity, dot product, or Euclidean distance. This enables retrieval based on meaning rather than literal text match, such as matching a query for "forgot my login" to a help article titled "Steps to recover account access", which is a use case traditional databases cannot serve efficiently.
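To make the difference concrete, here is a minimal Python sketch of exact semantic search: score every stored vector by cosine similarity to the query and keep the top k. The three-dimensional vectors and document titles are made up for illustration (real embedding models emit hundreds to thousands of dimensions), and the helper names cosine_similarity and nearest are hypothetical. Scanning every vector like this is exact, but its cost grows linearly with the collection, which is the cost approximate nearest neighbor indexes are designed to avoid.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity."""
    ranked = sorted(items, key=lambda key: cosine_similarity(query, items[key]), reverse=True)
    return ranked[:k]


# Hypothetical 3-dimensional "embeddings" chosen by hand for illustration.
docs = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.2, 0.1],
    "Quarterly revenue report": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of "forgot my login"
print(nearest(query, docs))
```

With these vectors, "Steps to recover account access" lands in the top two even though it shares no words with "forgot my login"; a keyword index would not connect them.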
Core Architecture
Vector databases use approximate nearest neighbor (ANN) algorithms, such as HNSW (Hierarchical Navigable Small World graphs), IVF (inverted file indexes), and ScaNN (Scalable Nearest Neighbors), to make similarity search fast at scale. They typically support filtering on metadata (e.g., retrieving similar documents belonging to a specific user), hybrid search (combining vector similarity with keyword matching), and namespace isolation for multi-tenant applications. Indexing strategies trade off recall, latency, memory usage, and update speed.
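The sketch below illustrates the IVF idea and its recall trade-off in plain Python. It is a toy, assuming fixed, hand-picked centroids (a real IVF index learns them, typically with k-means) and an exact scan inside each bucket; ToyIVFIndex, nprobe, and the where filter are illustrative names, not any vendor's API. Vectors are filed under their nearest centroid, and a query scans only the nprobe closest buckets, so a small nprobe is faster but can miss true neighbors that landed in an adjacent bucket.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Minimal IVF-style index: each vector is stored in the inverted list of its
    nearest centroid, and a query scans only the nprobe closest lists."""

    def __init__(self, centroids):
        # Assumption: centroids are given up front; real IVF trains them on the data.
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}  # bucket -> [(id, vec, meta)]

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda i: dist2(vec, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vec, metadata):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.lists[bucket].append((vec_id, vec, metadata))

    def search(self, query, k=3, nprobe=1, where=None):
        """Return up to k ids; where is an optional dict of metadata equality filters."""
        candidates = []
        for bucket in self._nearest_centroids(query, nprobe):
            for vec_id, vec, meta in self.lists[bucket]:
                if where and any(meta.get(field) != value for field, value in where.items()):
                    continue  # metadata filter applied while scanning candidates
                candidates.append((dist2(query, vec), vec_id))
        candidates.sort()
        return [vec_id for _, vec_id in candidates[:k]]


index = ToyIVFIndex(centroids=[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)])
for vec_id, vec, user in [("a", (4.0, 0.0), "alice"), ("b", (6.0, 0.0), "bob"),
                          ("c", (1.0, 1.0), "alice"), ("d", (9.0, 9.0), "bob")]:
    index.add(vec_id, vec, {"user": user})

query = (4.9, 0.0)
print(index.search(query, k=2, nprobe=1))  # scans one bucket and misses "b"
print(index.search(query, k=2, nprobe=2))  # scans two buckets: the exact top 2 here
print(index.search(query, k=2, nprobe=4, where={"user": "bob"}))
```

Here "b" is the true second-nearest neighbor, but it sits in the bucket of centroid (10, 0), so nprobe=1 misses it and nprobe=2 recovers it. Combining the "bob" filter with nprobe=1 would return nothing at all, which hints at why filtered ANN search needs more care in production systems than in this toy.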
Leading Vendors and Open Source Options
Major vector database products include Pinecone (managed cloud), Weaviate (open source + managed), Qdrant (open source + managed), Milvus (open source), and Chroma (open source). Established databases have added vector capabilities: pgvector for PostgreSQL, MongoDB Atlas Vector Search, Elasticsearch's dense_vector field type, and Redis vector search. The right choice depends on scale, deployment model (cloud vs. on-prem), and integration with the existing stack. For most early-stage applications, pgvector or a managed service is sufficient.
Vector Stores in the Compliance Perimeter
A vector database is a derived copy of your corpus, and compliance treats it that way. Personal data embedded into vectors generally remains personal data, because inversion and linkage risks keep it identifiable. The store therefore needs lawful-basis coverage, access controls that mirror the source systems, retention rules, and erasure mechanics that actually delete vectors and index entries: deleting source documents while their embeddings persist fails erasure obligations under KVKK (Turkey's Personal Data Protection Law) and the GDPR. Procurement adds the location question: a managed vector service hosted abroad involves a cross-border transfer, to which Turkey's 2024 standard-contract regime applies. Security reviews increasingly name the vector layer explicitly, so RAG products should document it like any database of record.
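Erasure depends on bookkeeping: the system has to know which vectors were derived from which source document, and deletion has to reach the index as well as the raw vector storage. The following Python sketch shows that bookkeeping on a hypothetical in-memory store; ToyVectorStore, add_chunk, erase_document, and references are illustrative names, and index_postings merely stands in for ANN index entries. It is a sketch of the data flow, not a compliance mechanism on its own. Many production ANN indexes implement deletes as tombstones that are purged only on compaction or index rebuild, so an erasure procedure may also need to confirm that step.

```python
class ToyVectorStore:
    """Hypothetical in-memory store that tracks document -> vector ids so erasure
    can find every embedding derived from a document (not a vendor API)."""

    def __init__(self):
        self.vectors = {}         # vector id -> embedding
        self.doc_to_vectors = {}  # source document id -> set of vector ids (chunks)
        self.index_postings = {}  # stand-in for ANN index entries: bucket -> set of ids

    def add_chunk(self, doc_id, vector_id, embedding, bucket):
        self.vectors[vector_id] = embedding
        self.doc_to_vectors.setdefault(doc_id, set()).add(vector_id)
        self.index_postings.setdefault(bucket, set()).add(vector_id)

    def erase_document(self, doc_id):
        """Hard-delete every vector derived from doc_id, from storage and index alike.
        Returns the number of vectors removed."""
        vector_ids = self.doc_to_vectors.pop(doc_id, set())
        for vid in vector_ids:
            self.vectors.pop(vid, None)
            for postings in self.index_postings.values():
                postings.discard(vid)
        return len(vector_ids)

    def references(self, vector_id):
        """True if storage or any index structure still holds vector_id."""
        return vector_id in self.vectors or any(
            vector_id in postings for postings in self.index_postings.values()
        )


store = ToyVectorStore()
store.add_chunk("doc-42", "doc-42#0", [0.1, 0.2], bucket=0)
store.add_chunk("doc-42", "doc-42#1", [0.3, 0.1], bucket=1)
store.add_chunk("doc-7", "doc-7#0", [0.9, 0.8], bucket=1)

removed = store.erase_document("doc-42")
print(removed, store.references("doc-42#0"), store.references("doc-7#0"))
```

The references check is the kind of verification an erasure runbook can automate: after deletion, no storage or index structure should still hold any vector derived from the erased document, while unrelated documents stay intact.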