Comprehensive Guide to Filtering in Qdrant


Filtering is a crucial feature in vector databases like Qdrant, allowing users to refine search results based on specific criteria. This guide will explore how filtering works in Qdrant, its implementation, and best practices for optimal performance.

What is Filtering?

In the context of vector search, filtering allows you to narrow down search results based on metadata associated with your vectors. For example, if you’re searching for laptops in an e-commerce database, you might want to filter results based on price, brand, or specifications.
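
As a rough illustration of the idea (a brute-force sketch, not how Qdrant executes queries), filtered search can be thought of as "keep only the items whose metadata passes a predicate, then rank what is left by similarity". The catalog, field names, and the `filtered_search` helper below are made up for this example.

```python
import math

# Hypothetical catalog: (id, embedding, metadata). All values are illustrative.
items = [
    (1, [0.9, 0.1], {"brand": "acme", "price": 899.0}),
    (2, [0.8, 0.2], {"brand": "zenith", "price": 1499.0}),
    (3, [0.1, 0.9], {"brand": "acme", "price": 650.0}),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_search(query, predicate, limit):
    """Drop items whose metadata fails the predicate, then rank the rest."""
    candidates = [(pid, vec) for pid, vec, meta in items if predicate(meta)]
    ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [pid for pid, _ in ranked[:limit]]

query = [1.0, 0.0]
unfiltered = filtered_search(query, lambda meta: True, limit=2)
under_1000 = filtered_search(query, lambda meta: meta["price"] <= 1000, limit=2)
print(unfiltered)   # item 2 is the second-closest vector overall
print(under_1000)   # item 2 is excluded by price, so item 3 takes its place
```

The filter changes which items are eligible, not how similarity is computed: item 2 is close to the query but never appears once the price condition is applied.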

How Qdrant Handles Filtering

Qdrant searches dense vectors with an HNSW graph index, in which each point is linked to its near neighbors. When a filter is applied, the points that fail it can no longer be traversed, which breaks connections in the search graph and can leave matching points unreachable. To overcome this, Qdrant creates additional connections between points that share indexed metadata values, so the graph stays navigable and search remains efficient even with filters applied.
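
The effect of broken connections can be seen on a toy graph. The sketch below is a simplified model, not Qdrant's actual index: the four-node graph, the `passes_filter` flags, and the `reachable` helper are assumptions made for illustration. It shows a matching point becoming unreachable once a filtered-out neighbor is skipped, and becoming reachable again after one extra link between matching points is added.

```python
from collections import deque

# Toy proximity graph (illustrative only): node -> neighbours.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

# Hypothetical filter outcome: "c" does not satisfy the filter.
passes_filter = {"a": True, "b": True, "c": False, "d": True}
allowed = {node for node, ok in passes_filter.items() if ok}

def reachable(adjacency, start, allowed_nodes):
    """Breadth-first traversal that only steps onto nodes passing the filter."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour in allowed_nodes and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Without extra links, "d" matches the filter but cannot be reached from "a".
before = reachable(graph, "a", allowed)

# Add one extra connection between two matching nodes, then search again.
patched = {node: list(neighbours) for node, neighbours in graph.items()}
patched["b"].append("d")
patched["d"].append("b")
after = reachable(patched, "a", allowed)

print(sorted(before))
print(sorted(after))
```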

Implementing Filters in Qdrant

Let’s look at an example of how to implement filtering in Qdrant using Python:

import requests

# Example data (id, vector, payload); assumed to be already stored in the online_store collection
laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99, "category": "laptop"}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99, "category": "laptop"}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99, "category": "laptop"}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99, "category": "laptop"}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99, "category": "laptop"})
]

# Search request with filter
search_request = {
    "vector": [0.2, 0.1, 0.9, 0.7],
    "filter": {
        "must": [
            {
                "key": "category",
                "match": { "value": "laptop" }
            },
            {
                "key": "price",
                "range": {
                    "lte": 1000
                }
            }
        ]
    },
    "limit": 3,
    "with_payload": True,
    "with_vector": False
}

response = requests.post(
    "http://localhost:6333/collections/online_store/points/search",
    json=search_request
)

print(response.json())

This example returns up to three points in the laptop category priced at or below $1000 (lte is inclusive), ranked by similarity to the query vector.
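
The search above only returns results if the example data is already in the collection. One way to load it, sketched under the assumption of a local Qdrant instance and a cosine distance metric (the article does not state either), is to create the collection and upsert the points through the REST API. The helper names `build_upsert_body` and `create_and_fill` are hypothetical.

```python
BASE_URL = "http://localhost:6333"  # assumed local instance, as in the search example
COLLECTION = "online_store"

laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99, "category": "laptop"}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99, "category": "laptop"}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99, "category": "laptop"}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99, "category": "laptop"}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99, "category": "laptop"}),
]

def build_upsert_body(rows):
    """Convert (id, vector, payload) tuples into the body of a points upsert."""
    return {
        "points": [
            {"id": point_id, "vector": vector, "payload": payload}
            for point_id, vector, payload in rows
        ]
    }

def create_and_fill(rows):
    """Create the collection and upload the rows. Needs a running Qdrant."""
    import requests  # imported lazily so the body builder works without it

    vector_size = len(rows[0][1])
    requests.put(
        f"{BASE_URL}/collections/{COLLECTION}",
        # "Cosine" is an assumption; pick the metric that suits your embeddings.
        json={"vectors": {"size": vector_size, "distance": "Cosine"}},
    ).raise_for_status()
    requests.put(
        f"{BASE_URL}/collections/{COLLECTION}/points",
        params={"wait": "true"},  # wait until the points are stored
        json=build_upsert_body(rows),
    ).raise_for_status()

upsert_body = build_upsert_body(laptops)
print(len(upsert_body["points"]))
```

Calling `create_and_fill(laptops)` once before running the search should make the filtered query return the laptops with ids 1, 3, and 5 in some similarity order.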

Advanced Filtering Techniques

Complex Filters

Qdrant supports combining multiple conditions using the following operators:

  • must: All conditions must be met (AND logic)
  • should: At least one condition must be met (OR logic)
  • must_not: The condition must not be met (NOT logic)

Example of a complex filter:

{
  "filter": {
    "should": [
      {
        "key": "price",
        "range": {
          "lte": 1000
        }
      },
      {
        "key": "subcategory",
        "match": { "value": "ultrabook" }
      }
    ]
  }
}

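This filter matches any point that is either priced at or below $1000 or has the subcategory ultrabook. Note that it does not restrict the category; to limit results to laptops, add a must clause on category alongside should.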
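
To make the clause semantics concrete, here is a small local evaluator that applies must, should, and must_not to a payload dictionary. It is a simplified model written for this guide (it handles only match and range conditions, and treats a missing or empty should clause as satisfied), not Qdrant's implementation; the helper names are hypothetical.

```python
import operator

# Range keys used in this guide's filters mapped to Python comparisons.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def check_condition(payload, condition):
    """Evaluate a single match or range condition against one payload."""
    value = payload.get(condition["key"])
    if "match" in condition:
        return value == condition["match"]["value"]
    if "range" in condition:
        if not isinstance(value, (int, float)):
            return False  # missing or non-numeric values never satisfy a range
        return all(
            RANGE_OPS[name](value, bound)
            for name, bound in condition["range"].items()
        )
    raise ValueError(f"unsupported condition: {condition}")

def passes(payload, flt):
    """Combine the clauses: all of must, any of should, none of must_not."""
    must_ok = all(check_condition(payload, c) for c in flt.get("must", []))
    should = flt.get("should")
    should_ok = any(check_condition(payload, c) for c in should) if should else True
    must_not_ok = not any(check_condition(payload, c) for c in flt.get("must_not", []))
    return must_ok and should_ok and must_not_ok

cheap_or_ultrabook = {
    "should": [
        {"key": "price", "range": {"lte": 1000}},
        {"key": "subcategory", "match": {"value": "ultrabook"}},
    ]
}
not_refurbished = {
    "must_not": [{"key": "condition", "match": {"value": "refurbished"}}]
}

print(passes({"price": 899.99, "subcategory": "gaming"}, cheap_or_ultrabook))
print(passes({"price": 1500, "subcategory": "gaming"}, cheap_or_ultrabook))
print(passes({"price": 899.99, "condition": "refurbished"}, not_refurbished))
```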

Nested Filters

For complex data structures, such as arrays of objects in metadata, you can use nested filters. Here’s an example with dinosaur diet preferences:

{
  "filter": {
    "must": [{
      "nested": {
        "key": "diet",
        "filter": {
          "must": [
            {
              "key": "food",
              "match": { "value": "meat" }
            },
            {
              "key": "likes",
              "match": { "value": true }
            }
          ]
        }
      }
    }]
  }
}

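This filter finds dinosaurs that like meat: both conditions must hold for the same element of the diet array, so a dinosaur whose diet lists meat with likes set to false does not match, even if it likes some other food.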
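
The difference between checking conditions per array element and checking them across the whole array is easy to miss, so here is a local sketch of both behaviours. The payloads and the helpers `nested_match` and `flat_match` are hypothetical and only model the semantics; they do not call Qdrant.

```python
# Two hypothetical dinosaur payloads with an array of diet entries.
likes_meat = {
    "diet": [
        {"food": "meat", "likes": True},
        {"food": "leaves", "likes": False},
    ]
}
dislikes_meat = {
    "diet": [
        {"food": "meat", "likes": False},
        {"food": "leaves", "likes": True},
    ]
}

def nested_match(payload, key, required):
    """True if a single array element satisfies every required field."""
    return any(
        all(element.get(field) == value for field, value in required.items())
        for element in payload.get(key, [])
    )

def flat_match(payload, key, required):
    """True if each required field is satisfied by some element, not
    necessarily the same one."""
    elements = payload.get(key, [])
    return all(
        any(element.get(field) == value for element in elements)
        for field, value in required.items()
    )

required = {"food": "meat", "likes": True}
print(nested_match(likes_meat, "diet", required))
print(nested_match(dislikes_meat, "diet", required))
print(flat_match(dislikes_meat, "diet", required))
```

The second dinosaur passes the element-independent check because "meat" and "likes: true" each appear somewhere in the array, which is exactly the false positive that a nested filter is meant to avoid.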

Optimizing Filter Performance

Indexing Metadata

To improve search performance with filtering, it’s recommended to create indexes on metadata fields. This allows Qdrant to quickly find relevant data points without scanning the entire collection.

Example of creating an index:

import requests

index_request = {
    "field_name": "category",
    "field_schema": "keyword"
}

response = requests.put(
    "http://localhost:6333/collections/online_store/index",
    json=index_request
)

Benefits of Indexing

  1. Faster searches: Indexes allow quick location of data points based on metadata values.
  2. Reduced load: Less computation is required when searching with filters.
  3. Resource optimization: Efficient use of memory and CPU time.
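
The filters in this guide also use a range condition on price, so that field is a natural candidate for an index as well. The sketch below builds one index request per filtered field, assuming "keyword" for category and "float" for price; the `FIELD_SCHEMAS` mapping and helper names are illustrative, and the collection URL is the same assumed local instance used earlier.

```python
COLLECTION_URL = "http://localhost:6333/collections/online_store"

# Payload fields filtered on in this guide and the index type assumed for each.
FIELD_SCHEMAS = {
    "category": "keyword",  # exact-match filtering
    "price": "float",       # range filtering
}

def index_requests(field_schemas):
    """Build one index-creation request body per payload field."""
    return [
        {"field_name": name, "field_schema": schema}
        for name, schema in field_schemas.items()
    ]

def create_indexes(field_schemas):
    """Send the index requests. Needs a running Qdrant instance."""
    import requests  # imported lazily so the body builder works without it

    for body in index_requests(field_schemas):
        requests.put(f"{COLLECTION_URL}/index", json=body).raise_for_status()

bodies = index_requests(FIELD_SCHEMAS)
for body in bodies:
    print(body)
```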

Working with Scrolling

For scenarios where you need to retrieve all data points matching a certain filter, you can use the scrolling method. This allows you to iterate through results page by page.

Example of using scrolling:

scroll_request = {
    "filter": {
        "must": [
            {
                "key": "category",
                "match": { "value": "laptop" }
            }
        ]
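    },
    "limit": 10,
    "with_payload": True,
    "with_vector": False,
    # order_by takes a field name or a single object (not a list)
    # and requires a payload index on that field
    "order_by": { "key": "price" }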
}

response = requests.post(
    "http://localhost:6333/collections/online_store/points/scroll",
    json=scroll_request
)
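
A single scroll request returns one page. To walk through every matching point, a common pattern is to feed the next_page_offset from each response back as the offset of the next request until it comes back empty. The loop below is a sketch of that pattern without order_by (to my understanding, offset-based paging is not available when order_by is set); `scroll_all` and `fetch_from_qdrant` are hypothetical helper names, and the page fetcher is passed in so the loop can be tried without a server.

```python
def scroll_all(fetch_page, scroll_filter, page_size=10):
    """Collect every point matching the filter, one page at a time.

    fetch_page is any callable that takes a scroll request body and returns
    the parsed JSON response.
    """
    points = []
    offset = None
    while True:
        body = {
            "filter": scroll_filter,
            "limit": page_size,
            "with_payload": True,
            "with_vector": False,
        }
        if offset is not None:
            body["offset"] = offset  # resume where the previous page ended
        result = fetch_page(body)["result"]
        points.extend(result["points"])
        offset = result.get("next_page_offset")
        if offset is None:  # no further pages
            return points

def fetch_from_qdrant(body):
    """Page fetcher for a local Qdrant instance (assumed URL and collection)."""
    import requests  # imported lazily so scroll_all can be tested offline

    response = requests.post(
        "http://localhost:6333/collections/online_store/points/scroll",
        json=body,
    )
    response.raise_for_status()
    return response.json()

laptop_filter = {"must": [{"key": "category", "match": {"value": "laptop"}}]}
# With a running instance: all_laptops = scroll_all(fetch_from_qdrant, laptop_filter)
```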

Best Practices

  1. Filtering on numeric values: When filtering on floating-point numeric values, use ranges instead of exact matches to avoid rounding errors.
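  2. Pagination: When paginating, make each request pick up exactly where the previous one stopped, for example by passing the previous response's next_page_offset as the offset of the next scroll request, or by tightening the filter to exclude points already retrieved, so that results are neither duplicated nor skipped.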
  3. Multi-tenant systems: In systems with multiple users, it’s recommended to index fields that denote the tenant (e.g., user_id) and mark them as is_tenant: true for optimized search and data isolation.
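
Two of these practices translate directly into request fragments. The sketch below first shows why exact equality on floats is fragile and builds a narrow range condition instead, then builds a tenant index request and a per-tenant filter. The tolerance value, the `around` helper, and the tenant id "user_42" are assumptions chosen for illustration.

```python
def around(value, tolerance=1e-6):
    """Range bounds that accept values within a small tolerance of `value`."""
    return {"gte": value - tolerance, "lte": value + tolerance}

# Floating-point arithmetic rarely lands on the exact literal you expect.
total = 0.1 + 0.2
print(total == 0.3)  # exact comparison is unreliable for computed floats

# A narrow range condition used in place of an exact match on price.
price_condition = {"key": "price", "range": around(899.99)}

# Keyword index on the tenant field, flagged as the tenant key.
tenant_index_request = {
    "field_name": "user_id",
    "field_schema": {"type": "keyword", "is_tenant": True},
}

# Every query in a multi-tenant setup then restricts results to one tenant.
tenant_filter = {
    "must": [
        {"key": "user_id", "match": {"value": "user_42"}},  # hypothetical tenant id
        price_condition,
    ]
}
print(tenant_filter)
```

The tenant index request goes to the same index endpoint shown in the indexing section, and the filter can be placed in any search or scroll request.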

By following these guidelines and leveraging Qdrant’s powerful filtering capabilities, you can create efficient and accurate vector search systems tailored to your specific needs.
