Filtering is a crucial feature in vector databases like Qdrant, allowing users to refine search results based on specific criteria. This guide will explore how filtering works in Qdrant, its implementation, and best practices for optimal performance.
What is Filtering?
In the context of vector search, filtering allows you to narrow down search results based on metadata associated with your vectors. For example, if you’re searching for laptops in an e-commerce database, you might want to filter results based on price, brand, or specifications.
How Qdrant Handles Filtering
Qdrant indexes dense vectors with HNSW, a graph in which each point is linked to its nearest neighbors, and a search walks this graph toward the query. When a filter excludes many points, some of the remaining points can become unreachable, because the only paths to them ran through points the filter removed. To overcome this, Qdrant adds extra links to the graph at indexing time between points that share the same value of an indexed payload field, so search stays efficient even with restrictive filters applied.
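To make the connectivity problem concrete, here is a toy sketch in plain Python. It is not Qdrant's HNSW code: the graph, the payload values and both helper functions are hypothetical. A walk that may only step through points passing the filter gets stuck once the filtered-out points are gone, and adding links between points that share a payload value reconnects the survivors.

```python
from collections import deque

# Hypothetical toy graph (not Qdrant's HNSW): point id -> neighbor ids.
EDGES = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
# Hypothetical payload values: only points 1, 3 and 5 are laptops.
CATEGORY = {1: "laptop", 2: "phone", 3: "laptop", 4: "phone", 5: "laptop"}


def reachable(edges, start, allowed):
    """Points reachable from `start` when the walk may only visit allowed points."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def add_payload_links(edges, category):
    """Return a copy of the graph with extra links between points sharing a value."""
    linked = {node: set(nbrs) for node, nbrs in edges.items()}
    groups = {}
    for node, value in category.items():
        groups.setdefault(value, []).append(node)
    for members in groups.values():
        for a in members:
            linked[a].update(b for b in members if b != a)
    return linked


laptops_only = {p for p, c in CATEGORY.items() if c == "laptop"}
print(reachable(EDGES, 1, laptops_only))                               # {1}
print(reachable(add_payload_links(EDGES, CATEGORY), 1, laptops_only))  # {1, 3, 5}
```

Linking every pair inside a group only works at toy sizes; the sketch shows why extra links help, not how Qdrant decides which links to add.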
Implementing Filters in Qdrant
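Let’s look at an example that calls Qdrant’s REST API from Python with the requests library. It assumes a Qdrant instance running locally on the default port 6333:
import requests

QDRANT_URL = "http://localhost:6333"
COLLECTION = "online_store"

# Example data: (id, vector, payload)
laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99, "category": "laptop"}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99, "category": "laptop"}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99, "category": "laptop"}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99, "category": "laptop"}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99, "category": "laptop"}),
]

# Create a collection for 4-dimensional vectors (Cosine distance is an example choice).
# This call fails if the collection already exists.
requests.put(
    f"{QDRANT_URL}/collections/{COLLECTION}",
    json={"vectors": {"size": 4, "distance": "Cosine"}},
).raise_for_status()

# Upload the points together with their payload (metadata)
requests.put(
    f"{QDRANT_URL}/collections/{COLLECTION}/points?wait=true",
    json={"points": [
        {"id": pid, "vector": vector, "payload": payload}
        for pid, vector, payload in laptops
    ]},
).raise_for_status()

# Search request with filter: category == "laptop" AND price <= 1000
search_request = {
    "vector": [0.2, 0.1, 0.9, 0.7],
    "filter": {
        "must": [
            {"key": "category", "match": {"value": "laptop"}},
            {"key": "price", "range": {"lte": 1000}},
        ]
    },
    "limit": 3,
    "with_payload": True,
    "with_vector": False,
}

response = requests.post(
    f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
    json=search_request,
)
response.raise_for_status()
print(response.json())
This example returns the three points closest to the query vector among laptops priced at $1000 or less.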
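When checking what a filtered search should return, it can help to compute the exact answer locally on a small sample. The sketch below is plain Python, not a Qdrant API: the helper names are hypothetical, and it assumes Cosine distance. It applies the same two conditions to the laptops sample data, ranks the survivors by cosine similarity, and keeps the top three. Qdrant’s HNSW search is approximate, so on large collections its results can differ slightly from this exact baseline.

```python
import math

# Same sample data as in the request example: (id, vector, payload).
laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99, "category": "laptop"}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99, "category": "laptop"}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99, "category": "laptop"}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99, "category": "laptop"}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99, "category": "laptop"}),
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_filtered_search(points, query, limit, max_price):
    """Filter first, then rank every survivor exactly (a brute-force baseline)."""
    candidates = [
        (pid, vec)
        for pid, vec, payload in points
        if payload.get("category") == "laptop"
        and payload.get("price", math.inf) <= max_price
    ]
    scored = sorted(((cosine(query, vec), pid) for pid, vec in candidates), reverse=True)
    return [pid for _, pid in scored[:limit]]


top = exact_filtered_search(laptops, [0.2, 0.1, 0.9, 0.7], limit=3, max_price=1000)
print(top)  # [1, 3, 5]
```

Only points 1, 3 and 5 pass the price condition, so with limit 3 the filter alone decides which points come back; the similarity scores only decide their order.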
Advanced Filtering Techniques
Complex Filters
Qdrant supports combining multiple conditions using the following operators:
- must: All conditions must be met (AND logic)
- should: At least one condition must be met (OR logic)
- must_not: None of the conditions may be met (NOT logic)
When several clauses appear in the same filter, each clause must be satisfied.
Example of a complex filter:
{
    "filter": {
        "should": [
            {
                "key": "price",
                "range": {
                    "lte": 1000
                }
            },
            {
                "key": "subcategory",
                "match": { "value": "ultrabook" }
            }
        ]
    }
}
This filter returns laptops that are priced at $1000 or less, categorized as ultrabooks, or both.
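To see how must, should and must_not combine, here is a small local evaluator written for this guide. It is a hypothetical helper, not part of Qdrant or its client, and it only understands flat match and range conditions (plus clauses nested as conditions); Qdrant evaluates filters server-side and supports many more condition types. It takes the inner filter object, i.e. the value of the "filter" key in the examples above.

```python
RANGE_OPS = {
    "gt": lambda v, b: v > b,
    "gte": lambda v, b: v >= b,
    "lt": lambda v, b: v < b,
    "lte": lambda v, b: v <= b,
}


def condition_matches(cond, payload):
    """Evaluate one condition (or a clause used as a condition) against a payload."""
    if any(k in cond for k in ("must", "should", "must_not")):
        return filter_matches(cond, payload)
    value = payload.get(cond["key"])
    if value is None:
        return False  # a missing field never satisfies match or range
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        return all(
            RANGE_OPS[op](value, bound)
            for op, bound in cond["range"].items()
            if bound is not None
        )
    raise ValueError(f"unsupported condition: {cond}")


def filter_matches(flt, payload):
    """AND over must, OR over should (when present), NOT over must_not."""
    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(condition_matches(c, payload) for c in must)
        and (not should or any(condition_matches(c, payload) for c in should))
        and not any(condition_matches(c, payload) for c in must_not)
    )


complex_filter = {"should": [
    {"key": "price", "range": {"lte": 1000}},
    {"key": "subcategory", "match": {"value": "ultrabook"}},
]}
print(filter_matches(complex_filter, {"price": 1299.99, "subcategory": "ultrabook"}))  # True
print(filter_matches(complex_filter, {"price": 1299.99, "subcategory": "gaming"}))     # False
```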
Nested Filters
For complex data structures, such as arrays of objects in metadata, you can use nested filters. Here’s an example with dinosaur diet preferences:
{
    "filter": {
        "must": [{
            "nested": {
                "key": "diet",
                "filter": {
                    "must": [
                        {
                            "key": "food",
                            "match": { "value": "meat" }
                        },
                        {
                            "key": "likes",
                            "match": { "value": true }
                        }
                    ]
                }
            }
        }]
    }
}
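This filter finds dinosaurs that like meat: both conditions must hold for the same element of the diet array, so a single entry must have food equal to "meat" and likes equal to true.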
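The reason to use nested instead of two plain conditions on diet[].food and diet[].likes is that plain conditions on an array field can each be satisfied by a different element. The plain-Python sketch below contrasts the two behaviors on hypothetical payloads modeled on the dinosaur example; it mimics the semantics and is not Qdrant code.

```python
# Hypothetical payloads modeled on the dinosaur example above.
dinosaurs = {
    "t-rex": [{"food": "leaves", "likes": False}, {"food": "meat", "likes": True}],
    "diplodocus": [{"food": "leaves", "likes": True}, {"food": "meat", "likes": False}],
}


def flat_match(diet):
    """Plain array conditions: each may be satisfied by a different element."""
    return (any(item["food"] == "meat" for item in diet)
            and any(item["likes"] is True for item in diet))


def nested_match(diet):
    """Nested filter: both conditions must hold for the same element."""
    return any(item["food"] == "meat" and item["likes"] is True for item in diet)


for name, diet in dinosaurs.items():
    print(name, flat_match(diet), nested_match(diet))
# t-rex True True
# diplodocus True False
```

The diplodocus likes leaves and dislikes meat, yet the flat version still matches it because "meat" and "likes: true" come from different entries.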
Optimizing Filter Performance
Indexing Metadata
To improve search performance with filtering, create payload indexes on the metadata fields you filter on. An index lets Qdrant find matching data points quickly instead of checking the payload of every point in the collection.
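Example of creating indexes for the two fields used in the filters above:
import requests

QDRANT_URL = "http://localhost:6333"

# Keyword index for exact matches on "category",
# float index for range filters (and ordering) on "price"
for field_name, field_schema in [("category", "keyword"), ("price", "float")]:
    response = requests.put(
        f"{QDRANT_URL}/collections/online_store/index",
        json={"field_name": field_name, "field_schema": field_schema},
    )
    response.raise_for_status()
Benefits of Indexing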
- Faster searches: Indexes allow quick location of data points based on metadata values.
- Reduced load: Less computation is required when searching with filters.
- Trade-off: each index occupies additional memory (or disk), so index the fields you actually filter on rather than every field.
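The speed-up comes from looking values up instead of inspecting every payload. The toy keyword index below is only an illustration of that idea, a dictionary from payload value to point ids built once ahead of queries; Qdrant’s real payload indexes are separate structures with their own storage, and the payloads here are hypothetical.

```python
from collections import defaultdict

# Hypothetical payloads: point id -> payload.
payloads = {
    1: {"category": "laptop"},
    2: {"category": "phone"},
    3: {"category": "laptop"},
    4: {"category": "tablet"},
}


def full_scan(payloads, key, value):
    """Without an index: inspect every payload, so cost grows with collection size."""
    return {pid for pid, p in payloads.items() if p.get(key) == value}


def build_keyword_index(payloads, key):
    """Build value -> set of point ids once, before any query runs."""
    index = defaultdict(set)
    for pid, p in payloads.items():
        if key in p:
            index[p[key]].add(pid)
    return index


index = build_keyword_index(payloads, "category")
print(index.get("laptop", set()))                 # {1, 3}: one dictionary lookup
print(full_scan(payloads, "category", "laptop"))  # {1, 3}: visits all four payloads
```

The index answers the same question as the scan, but it has to be stored and kept up to date, which is the memory trade-off mentioned in the list.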
Working with Scrolling
When you need every data point matching a filter rather than the nearest neighbors of a query vector, use the scroll API. It takes no query vector and returns the matching points page by page.
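Example of scrolling through laptops ordered by price (ordering by a field requires a payload index on it, for example a float index on price):
import requests

scroll_request = {
    "filter": {
        "must": [
            {"key": "category", "match": {"value": "laptop"}}
        ]
    },
    "limit": 10,
    "with_payload": True,
    "with_vector": False,
    # order_by takes a single field (a string or an object), not a list
    "order_by": {"key": "price", "direction": "asc"},
}

response = requests.post(
    "http://localhost:6333/collections/online_store/points/scroll",
    json=scroll_request,
)
response.raise_for_status()
print(response.json()["result"]["points"])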
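To fetch every matching point rather than a single page, repeat the scroll request and pass the returned next_page_offset back as offset until it comes back null. The sketch below assumes that response shape ({"result": {"points": [...], "next_page_offset": ...}}); scroll_all and fetch_page are hypothetical names, and fetch_page is any callable that sends a request body and returns the "result" object, which keeps the loop testable without a running server. Because ordered scrolling (order_by) does not return a next-page offset, the loop drops order_by from the request.

```python
def scroll_all(fetch_page, body):
    """Yield every point matching body's filter by following next_page_offset.

    fetch_page(request) must return the "result" object of a scroll response,
    i.e. a dict with "points" and "next_page_offset".
    """
    # Offset-based paging is not available together with order_by.
    request = {k: v for k, v in body.items() if k != "order_by"}
    while True:
        result = fetch_page(request)
        yield from result["points"]
        offset = result.get("next_page_offset")
        if offset is None:
            return
        request = {**request, "offset": offset}


# Usage against a local server (assumes the requests package and the online_store collection):
# import requests
# URL = "http://localhost:6333/collections/online_store/points/scroll"
# fetch = lambda req: requests.post(URL, json=req).json()["result"]
# for point in scroll_all(fetch, scroll_request):
#     print(point["id"], point["payload"]["price"])
```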
Best Practices
- Filtering on numeric values: When filtering on floating-point numeric values, use ranges instead of exact matches to avoid rounding errors.
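- Pagination: Ordered scrolling (order_by) does not return a next-page offset. To get the next page, pass the last returned value as start_from and exclude the points already returned with that value using a must_not has_id condition, since several points can share the same value.
- Multi-tenant systems: In systems with multiple users, index the field that identifies the tenant (e.g., user_id) as a keyword index with is_tenant: true, and filter on it in every request. The flag lets Qdrant store each tenant’s points together for faster per-tenant search, while the filter keeps each user’s results isolated.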
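The pagination and multi-tenant practices translate into request bodies. The sketch below builds them as plain dictionaries: next_ordered_page is a hypothetical helper, user_id is an example field name, and the sketch assumes start_from is an inclusive bound and that the previous request returned the ordering field in each point’s payload. Check the field names (start_from, has_id, is_tenant) against the API reference for your Qdrant version.

```python
def next_ordered_page(request, page):
    """Build the follow-up scroll body for order_by pagination, or None when done.

    request: the previous scroll body, containing "order_by": {"key": ...}.
    page: the points returned for it, each with "id" and "payload".
    """
    if not page:
        return None
    key = request["order_by"]["key"]
    last_value = page[-1]["payload"][key]
    # With an inclusive start_from, points tied on last_value would come back again.
    tied_ids = [p["id"] for p in page if p["payload"][key] == last_value]

    flt = dict(request.get("filter", {}))  # shallow copy: the old request stays unchanged
    flt["must_not"] = list(flt.get("must_not", [])) + [{"has_id": tied_ids}]
    return {
        **request,
        "filter": flt,
        "order_by": {**request["order_by"], "start_from": last_value},
    }


# Keyword index on the (hypothetical) tenant field, flagged with is_tenant.
TENANT_INDEX = {
    "field_name": "user_id",
    "field_schema": {"type": "keyword", "is_tenant": True},
}
```

TENANT_INDEX is sent to the same collection index endpoint used in the indexing example; the tenant filter itself still has to be added to every search or scroll request.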
By following these guidelines and leveraging Qdrant’s powerful filtering capabilities, you can create efficient and accurate vector search systems tailored to your specific needs.