Vector Databases
A vector database is a specialized storage system designed to hold and search embedding vectors — the numeric representations of words, images, and audio that preserve their semantic meaning. While a traditional relational database is optimized for exact matches on columns, a vector database is optimized for similarity: finding the items whose vectors are closest to a query vector in a high-dimensional space.
Vector databases are the storage backbone behind semantic search, recommendations, classification, image retrieval, and retrieval-augmented generation (RAG). They give an LLM long-term memory over data it was never trained on.
Why a Dedicated Database?
Embedding vectors are typically 512–3,072 floats each. Comparing a query vector against millions of stored vectors with brute-force math is too slow for interactive use. Vector databases solve this with specialized approximate nearest neighbor (ANN) indexes such as HNSW, IVF, and DiskANN, which can run similarity searches in milliseconds.
On top of fast search, a modern vector database provides:
- CRUD operations — upsert, retrieve, delete records.
- Metadata storage — keep the original payload next to the vector.
- Filtering — combine vector similarity with structured WHERE-style filters.
- Hybrid search — merge vector similarity with full-text keyword search.
- Scalability — handle millions or billions of vectors.
The End-to-End Workflow
- Inputs — raw non-numeric data: words, images, audio.
- Vectorization — an embedding model turns each input into a numeric vector that preserves its meaning.
- Vector Database — the vectors (plus the original record) are stored and indexed.
- AI Model — at query time the model compares a user's vector against the stored vectors and pulls the most relevant records.
- Outputs — the retrieved records are used to summarize text, find related data, or generate new content such as images.
Common Vector Database Providers
| Provider | Type | Notes |
|---|---|---|
| Azure AI Search | Cloud service | Full-text + vector + hybrid search, with enterprise features. |
| Azure Cosmos DB | Multi-model database | Vector search alongside document, key-value, and graph data. |
| Azure SQL / SQL Server 2025 | Relational database | Native vector column type. |
| PostgreSQL + pgvector | Relational database | Open-source extension that adds vector columns and indexes. |
| Qdrant / Milvus / Weaviate / Pinecone | Dedicated vector DB | Purpose-built for high-volume vector workloads. |
| InMemoryVectorStore | In-process | Lightweight store, ideal for prototyping and demos. |
Vector Databases in .NET
The Microsoft.Extensions.VectorData library defines a provider-agnostic API. Every provider implements the same VectorStore and VectorStoreCollection<TKey,TRecord> abstractions, so you can swap one backend for another without rewriting your application logic.
Install the packages used in this demo:
Defining a Vector Store Record
A record is a plain C# class decorated with attributes:
| Attribute | Purpose |
|---|---|
[VectorStoreKey] | Marks the primary key property. |
[VectorStoreData] | Marks a payload property stored alongside the vector. |
[VectorStoreData(IsIndexed = true)] | Makes a property usable in filters. |
[VectorStoreVector(n)] | Marks the embedding property, where n is the vector dimension. |
Core Collection Operations
| Method | Description |
|---|---|
EnsureCollectionExistsAsync() | Creates the collection if it doesn't already exist. |
UpsertAsync(record or IEnumerable) | Inserts or replaces one or many records in a single call. |
GetAsync(key) | Retrieves a single record by its key. |
SearchAsync(vector, top: N) | Returns the N most similar records. |
SearchAsync(vector, top: N, new VectorSearchOptions<T> { Filter = ... }) | Similarity search with a LINQ metadata filter. |
DeleteAsync(key) | Deletes a record by its key. |
Full Example
The following demo builds a small museum artifact catalog. Each artifact has an era, a title, and a curator's description. The demo creates a collection, batch-upserts all artifacts, runs a plain semantic search, runs a filtered search restricted to one era, then retrieves, updates, and deletes individual records to show every core CRUD operation.
Key Takeaways
- A vector database stores embedding vectors and searches them by similarity.
- It turns raw words, images, and audio into long-term semantic memory for AI models.
- Modern vector DBs combine similarity search with metadata filters and hybrid keyword search.
- In .NET, the Microsoft.Extensions.VectorData abstractions let you swap providers (InMemory, Azure AI Search, Cosmos DB, Qdrant…) without rewriting app logic.
- The core operations are
Upsert,Get,Search, andDelete— just like any database.
Reference
Vector databases for .NET AI apps — Microsoft Learn