Define your data model

Microsoft.Extensions.VectorData uses a model-first approach to interacting with databases.

All methods to upsert or get records use strongly typed model classes. There are two ways to define the data model:

  • By decorating properties on the model classes with attributes that indicate the purpose of each property.
  • By defining your storage schema using a record definition that you supply separately from the data model. The record definition is a VectorStoreCollectionDefinition that contains properties.

Here's an example of a class, or data model, whose properties are decorated with VectorStore*Attribute attributes.

public class Hotel
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public required string HotelName { get; set; }

    [VectorStoreData(IsFullTextIndexed = true)]
    public required string Description { get; set; }

    [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public required string[] Tags { get; set; }
}
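
The same schema can also be expressed without attributes, as a record definition. The following is a minimal sketch of a definition that mirrors the Hotel class above. It assumes that the definition property classes expose the same settings (IsIndexed, IsFullTextIndexed, DistanceFunction, IndexKind) as the corresponding attributes. The variable name hotelDefinition is illustrative.

```csharp
using System;
using Microsoft.Extensions.VectorData;

// Sketch: a record definition that mirrors the attributes on Hotel.
// Assumption: the property classes accept the same settings as the attributes.
VectorStoreCollectionDefinition hotelDefinition = new()
{
    Properties =
    [
        new VectorStoreKeyProperty("HotelId", typeof(ulong)),
        new VectorStoreDataProperty("HotelName", typeof(string)) { IsIndexed = true },
        new VectorStoreDataProperty("Description", typeof(string)) { IsFullTextIndexed = true },
        new VectorStoreVectorProperty("DescriptionEmbedding", typeof(ReadOnlyMemory<float>?), dimensions: 4)
        {
            DistanceFunction = DistanceFunction.CosineSimilarity,
            IndexKind = IndexKind.Hnsw
        },
        new VectorStoreDataProperty("Tags", typeof(string[])) { IsIndexed = true }
    ]
};
```

A definition like this is supplied when you get the collection from the vector store, which lets the model class itself stay free of attributes.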

Data model properties

Note

The .NET property types supported for keys, data, and vectors vary across databases. For information on supported types, check the documentation of your chosen vector store provider.

Key property

Each data model must have a key property that uniquely identifies each record in the collection.

Use the VectorStoreKeyAttribute attribute to indicate that your property is the primary key of the record.

[VectorStoreKey]
public ulong HotelId { get; set; }

The following table shows the parameters for VectorStoreKeyAttribute.

Parameter | Required | Description
IsAutoGenerated | No | Indicates whether the key value is auto-generated by the database. The default is false.
StorageName | No | Supplies an alternative name for the property in the database. Not all providers support this parameter; for example, providers that support alternatives such as JsonPropertyNameAttribute might not.

Data property

Data properties hold general-purpose content such as text, tags, or other metadata that is retrieved when searching for records, and can optionally also be indexed for filtering.

Use the VectorStoreDataAttribute attribute to indicate that your property contains general data that is not a key or a vector.

[VectorStoreData(IsIndexed = true)]
public required string HotelName { get; set; }

The following table shows the parameters for VectorStoreDataAttribute.

Parameter | Required | Description
IsIndexed | No | Indicates whether the property should be indexed for filtering, for databases that require opting in to indexing per property. The default is false.
IsFullTextIndexed | No | Indicates whether the property should be indexed for full-text search, for databases that support full-text search. The default is false.
StorageName | No | Supplies an alternative name for the property in the database. Not all providers support this parameter; for example, providers that support alternatives such as JsonPropertyNameAttribute might not.
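
As an illustration of StorageName, here's a minimal sketch that keeps an idiomatic .NET property name while storing the value under a different name. The storage name "hotel_name" is a made-up example, and the setting only takes effect with providers that support it.

```csharp
using Microsoft.Extensions.VectorData;

public class Hotel
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    // Stored as "hotel_name" in the database (hypothetical name),
    // provided the chosen provider supports StorageName.
    [VectorStoreData(IsIndexed = true, StorageName = "hotel_name")]
    public required string HotelName { get; set; }
}
```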

Vector property

Vector properties contain the embedding vectors used for similarity search; in advanced scenarios, a data model can have multiple vector properties to support searching over different aspects of the record.

Use the VectorStoreVectorAttribute attribute to indicate that your property contains a vector.

[VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

It's also possible to use VectorStoreVectorAttribute on properties that don't have a vector type, for example, a property of type string. When a property is decorated in this way, you need to provide an IEmbeddingGenerator instance to the vector store. When upserting the record, the text that's in the string property is automatically converted and stored as a vector in the database. (It's not possible to retrieve a vector using this mechanism.)

[VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
public string DescriptionEmbedding { get; set; }

Tip

For more information on how to use built-in embedding generation, see Vector properties and embedding generation.

The following table shows the parameters for VectorStoreVectorAttribute.

Parameter | Required | Description
Dimensions | Yes | The number of dimensions that the vector has. This is required when creating a vector index for a collection.
IndexKind | No | The type of index to use for the vector. The default varies by vector store type.
DistanceFunction | No | The function used to compare vectors during vector search over this property. The default varies by vector store type.
StorageName | No | Supplies an alternative name for the property in the database. Not all providers support this parameter; for example, providers that support alternatives such as JsonPropertyNameAttribute might not.

Common index kinds and distance function types are supplied as static values on the IndexKind and DistanceFunction classes. Individual vector store implementations might also use their own index kinds and distance functions, where the database supports unusual types.

Vector properties and embedding generation

Vector databases store embeddings, which are numerical representations of your data generated by an embedding model. Before data can be stored or searched, it must first be converted to embeddings. Microsoft.Extensions.VectorData (MEVD) provides two approaches to embedding generation: manual and automatic.

Manual, low-level embedding generation

You can define your vector property as float[] or ReadOnlyMemory<float>, representing the embedding directly, and generate embeddings yourself before each operation:

[VectorStoreVector(Dimensions: 1536)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

When searching, you'd generate the embedding for your query text and pass it to SearchAsync:

ReadOnlyMemory<float> searchEmbedding =
    (await embeddingGenerator.GenerateAsync("Find a happy hotel")).Vector;

var searchResult = collection.SearchAsync(searchEmbedding, top: 3);

While this works, it requires you to manage embedding generation at every call site.

Automatic embedding generation

The recommended approach is to configure an IEmbeddingGenerator<TInput,TEmbedding> on your vector store. This lets you define your vector property using the source type (for example, string) instead of float[] or ReadOnlyMemory<float>. MEVD then handles embedding generation automatically during both upsert and search operations.

First, define the vector property as string:

[VectorStoreVector(Dimensions: 1536)]
public string DescriptionEmbedding { get; set; }

Then, configure an embedding generator when creating your vector store:

VectorStore vectorStore = new QdrantVectorStore(
    new QdrantClient("localhost"),
    ownsClient: true,
    new QdrantVectorStoreOptions
    {
        EmbeddingGenerator = embeddingGenerator
    });

You can now pass text directly, and MEVD generates the embeddings for you:

// Search with a plain text query - embedding is generated automatically.
var searchResult = collection.SearchAsync("Find a happy hotel", top: 3);
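
SearchAsync returns its results as an asynchronous stream, so the matches become available as you enumerate it. The following sketch shows one way to consume them. It assumes a collection whose vector store has an embedding generator configured. The Hotel class is a trimmed-down version of the earlier model, and HotelSearch and PrintTopMatchesAsync are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;

// Trimmed-down model with a string vector property (automatic embedding generation).
public class Hotel
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData]
    public required string HotelName { get; set; }

    [VectorStoreVector(Dimensions: 1536)]
    public required string DescriptionEmbedding { get; set; }
}

public static class HotelSearch
{
    // Sketch: enumerate the async stream returned by SearchAsync.
    // Assumes an embedding generator is configured for the collection.
    public static async Task PrintTopMatchesAsync(VectorStoreCollection<ulong, Hotel> collection)
    {
        await foreach (VectorSearchResult<Hotel> result in
            collection.SearchAsync("Find a happy hotel", top: 3))
        {
            // Each result carries the matching record and its similarity score.
            Console.WriteLine($"{result.Record.HotelName} (score: {result.Score})");
        }
    }
}
```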

Important

Vector properties configured this way don't support retrieving the generated vector or the original text from the database. If you need to store the original text, add a separate data property.

Embedding generators can also be configured at the collection, record definition, or individual vector property level. Different embedding models support different vector sizes; make sure the Dimensions value matches the model you've configured. For more information on embedding generators and the Microsoft.Extensions.AI abstractions, see Embeddings in .NET.
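
As a rough sketch of the property-level option, the following record definition attaches a generator to a single vector property. It assumes that VectorStoreVectorProperty exposes an EmbeddingGenerator property. The names GlossarySchema and Create, and the property names in the definition, are illustrative.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;

public static class GlossarySchema
{
    // Sketch: configure the embedding generator on one vector property
    // rather than on the whole vector store.
    // Assumption: VectorStoreVectorProperty has an EmbeddingGenerator property.
    public static VectorStoreCollectionDefinition Create(
        IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator) => new()
    {
        Properties =
        [
            new VectorStoreKeyProperty("Key", typeof(string)),
            new VectorStoreDataProperty("Definition", typeof(string)),
            // The property type is string, so text is embedded on upsert and search.
            new VectorStoreVectorProperty("DefinitionEmbedding", typeof(string), dimensions: 1536)
            {
                EmbeddingGenerator = embeddingGenerator
            }
        ]
    };
}
```

Configuring the generator per property can be useful when different vector properties in the same record need different embedding models. The Dimensions value must still match the model behind each generator.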

Dynamic mapping to a .NET Dictionary

There are cases where it isn't desirable or possible to map a strongly typed .NET type to the database. For example, imagine that you don't know at compile time what your database schema looks like, and the schema is only provided via configuration. Creating a .NET type that reflects the schema would be impossible in this case. Instead, you can map dynamically by using a Dictionary<string, object?> for the record type. Each property is added to the Dictionary with the property name as the key and the property value as the value.

Note

Most apps will simply use strongly typed .NET types to model their data. Dynamic mapping via Dictionary<string, object?> is for advanced, arbitrary data-mapping scenarios.

Supply schema information when using Dictionary

When you use a Dictionary, providers still need to know what the database schema looks like. Without the schema information, the provider would not be able to create a collection or know how to map to and from the storage representation that each database uses.

You can use a record definition to provide the schema information. Unlike a data model, a record definition can be created from configuration at runtime when schema information isn't known at compile time.

Example

To use Dictionary with a provider, specify it as your data model when you create the collection. Also provide a record definition.

VectorStoreCollectionDefinition definition = new()
{
    Properties =
    [
        new VectorStoreKeyProperty("Key", typeof(string)),
        new VectorStoreDataProperty("Term", typeof(string)),
        new VectorStoreDataProperty("Definition", typeof(string)),
        new VectorStoreVectorProperty("DefinitionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 1536)
    ]
};

// Use GetDynamicCollection instead of the regular GetCollection method
// to get an instance of a collection using Dictionary<string, object?>.
VectorStoreCollection<object, Dictionary<string, object?>> dynamicDataModelCollection =
    vectorStore.GetDynamicCollection("glossary", definition);

// Since schema information is available from the record definition,
// it's possible to create a collection with the right vectors,
// dimensions, indexes, and distance functions.
await dynamicDataModelCollection.EnsureCollectionExistsAsync();

// When retrieving a record from the collection,
// access key, data, and vector values via the dictionary entries.
// The result is null if no record has the given key, so use a null-conditional access.
Dictionary<string, object?>? record = await dynamicDataModelCollection.GetAsync("SK");
Console.WriteLine(record?["Definition"]);
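
Writing a record works the same way in reverse: you populate a Dictionary whose keys match the property names in the record definition and upsert it. The following is a minimal sketch. The term and definition text are placeholder values, the caller is assumed to have produced the embedding already, and GlossaryWriter, BuildRecord, and AddAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;

public static class GlossaryWriter
{
    // Builds a record whose keys match the property names in the record definition.
    // The term and definition text are placeholder values for illustration.
    public static Dictionary<string, object?> BuildRecord(ReadOnlyMemory<float> embedding) => new()
    {
        ["Key"] = "SK",
        ["Term"] = "SK",
        ["Definition"] = "Placeholder definition text.",
        ["DefinitionEmbedding"] = embedding
    };

    // Upserts the record through the dynamic collection.
    public static async Task AddAsync(
        VectorStoreCollection<object, Dictionary<string, object?>> collection,
        ReadOnlyMemory<float> embedding)
    {
        await collection.UpsertAsync(BuildRecord(embedding));
    }
}
```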