Multi-Stage Vector Querying Using Matryoshka Representation Learning (MRL) in Qdrant


Data retrieval is a crucial component of an efficient Retrieval-Augmented Generation (RAG) application. The effectiveness of retrieval directly impacts the application's performance, accuracy, and reliability.

There are various methods of data retrieval from vector databases. Some of the most efficient ones are:

  1. Self-Query Retrieval
  2. Multi-Stage Query
  3. Auto-Merging Retrieval
  4. Hybrid Retrieval

In this article, we will explore Multi-Stage Query for data retrieval using Matryoshka Representation Learning (MRL) to increase the efficiency of fetching data from the database.

So, let’s first understand: What is Matryoshka Representation Learning?

Matryoshka Representation Learning

The name Matryoshka is inspired by the concept of Russian nesting dolls, also known as stacking dolls, which are a set of dolls in decreasing sizes nested within one another.

The core idea of MRL is to learn representations at multiple levels of granularity within a single embedding, similar to how Matryoshka dolls nest within each other in decreasing sizes. The leading dimensions of the full vector form a usable, lower-resolution embedding on their own, and each larger prefix adds detail, giving the model an effective hierarchical structure that lets us trade precision for speed without re-encoding the data.
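For intuition, here is a minimal sketch (using NumPy and a randomly generated vector as a stand-in for a real MRL embedding) of how a single full-size vector can be truncated into smaller, nested prefixes and re-normalized for cosine similarity:

import numpy as np

# Stand-in for a full-size MRL embedding (e.g. 2048 dimensions).
full_embedding = np.random.rand(2048).astype(np.float32)

def truncate_embedding(vector, dims):
    """Keep only the first `dims` dimensions and re-normalize for cosine similarity."""
    prefix = vector[:dims]
    return prefix / np.linalg.norm(prefix)

small = truncate_embedding(full_embedding, 512)    # coarse, cheap to compare
medium = truncate_embedding(full_embedding, 1024)  # middle "doll"
large = truncate_embedding(full_embedding, 2048)   # full resolution
Python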

There are various advantages of using MRL:

  1. Enhanced Search Efficiency: MRL embeddings allow multi-stage search where initial filtering can be done with smaller embeddings, thus speeding up the search process.
  2. Improved Accuracy: The ability to refine searches with higher-resolution embeddings after an initial broad match ensures that the most relevant results are surfaced.
  3. Flexibility: Depending on the use case, the resolution of the embeddings can be adjusted, providing flexibility in terms of precision and performance.

Now let’s understand: What is Multi-Stage Query Retrieval?

Multi-Stage Query Retrieval

Among the various methods for retrieving data from a database, Multi-Stage Query retrieval is one of the fastest. Although it can trade away a little precision, it is preferred in many applications because of its high-speed performance and because it saves a significant amount of computational power and memory.

Let’s try to understand how Multi-Stage Query retrieval works.

Multi-Stage Query retrieval focuses on retrieving data in different stages within a hierarchical structure, starting with smaller embeddings and increasing in size.

Steps Involved in Multi-Stage Querying:

  1. Transformation into Vector Embeddings: First, the data is transformed into vector embeddings of different sizes.
  2. Initial Search: The query is matched against the smaller embeddings, where the search is faster, to select a broad set of candidate chunks.
  3. Secondary Search: The selected candidates are then searched again (re-ranked) using the larger embeddings.

This process ensures that data fetching does not take a lot of time since it is relatively faster to search over embeddings with fewer dimensions. To ensure accuracy, a secondary search is conducted over the selected chunks of data using larger embeddings.
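To make the idea concrete, here is a minimal, self-contained sketch of this coarse-then-fine pattern, using NumPy and randomly generated vectors as stand-ins for real embeddings:

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a corpus: 10,000 documents with 2048-dim "large" embeddings.
large_db = rng.standard_normal((10_000, 2048)).astype(np.float32)
large_db /= np.linalg.norm(large_db, axis=1, keepdims=True)

# The first 512 dimensions act as the "small" (Matryoshka-style) embeddings.
small_prefix = large_db[:, :512]
small_db = small_prefix / np.linalg.norm(small_prefix, axis=1, keepdims=True)

# Build the query at both resolutions.
query_large = rng.standard_normal(2048).astype(np.float32)
query_large /= np.linalg.norm(query_large)
query_small = query_large[:512] / np.linalg.norm(query_large[:512])

# Stage 1: cheap search over the small embeddings, keep the top 250 candidates.
coarse_scores = small_db @ query_small
candidates = np.argsort(-coarse_scores)[:250]

# Stage 2: precise re-ranking of the candidates with the large embeddings, keep the top 5.
fine_scores = large_db[candidates] @ query_large
top_5 = candidates[np.argsort(-fine_scores)[:5]]
print(top_5)
Python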

Let’s dive deeper and understand more with a practical implementation:

Let’s Code

The first step is to install the necessary Python libraries required for the project.

pip install sentence_transformers
pip install qdrant-client
pip install langchain
pip install -U langchain-community
pip install pypdf
pip install openai
Python

Once the installation is complete, import the modules and sub-modules we will use in the code.

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client import models
from qdrant_client.http.models import VectorParams, Distance
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
Python

Before beginning the main code, let’s initialize the OpenAI client so we can use its embeddings module. Please remember to set the OPENAI_API_KEY environment variable.
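If the key is not already set in your environment, one quick way to set it in a notebook (with a placeholder value, not a real key) is:

import os

# Placeholder shown here; replace it with your actual OpenAI API key.
os.environ["OPENAI_API_KEY"] = "sk-..."
Python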

from openai import OpenAI
openai_client = OpenAI()
Python

Let’s now create two functions: one for creating smaller embeddings and another for creating larger embeddings.

Small Embeddings Function

Starting with the function for small embeddings first. Here we are using text-embedding-3-small as the embedding model and explicitly setting the vector dimensions to 512.

def small_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return openai_client.embeddings.create(input=[text], model=model, dimensions=512).data[0].embedding
Python

Large Embeddings Function

Now we need to create the function for larger embeddings. Here we are using text-embedding-3-large as the embedding model and explicitly setting the vector dimensions to 2048, since a larger embedding captures more detail and generally improves retrieval quality.

def large_embedding(text, model="text-embedding-3-large"):
    text = text.replace("\n", " ")
    return openai_client.embeddings.create(input=[text], model=model, dimensions=2048).data[0].embedding
Python
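As a quick sanity check (assuming your API key is set and the calls succeed), the returned vectors should have the dimensions we requested:

print(len(small_embedding("hello world")))  # expected: 512
print(len(large_embedding("hello world")))  # expected: 2048
Python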

Loading the Dataset

It’s time to load the dataset. Here we are using a PDF that explains algorithmic trading in detail.

loaders = [
    PyPDFLoader("/content/TEGI0570.pdf"),
]
Python

For further processing, the PDF needs to be broken into smaller chunks that can later be retrieved based on their relevance. Here we are breaking the PDF into chunks of 550 characters each, with an overlap of 50 characters between consecutive chunks.

docs = []
chunk_size=550
chunk_overlap = 50

r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
    )
for loader in loaders:
    docs.extend(loader.load())
splits = r_splitter.split_documents(docs)
Python

Let’s find out how many chunks we obtained after splitting the document.

len(splits)
Python

The output is 558, which means we have divided the PDF text into 558 small chunks, each at most 550 characters long.
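To get a feel for what a chunk looks like, we can print the first one (the exact output will depend on your PDF):

print(splits[0].page_content)
print(splits[0].metadata)  # source file and page number
Python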

Now, since our data processing is done, we can proceed to insert the data into the Qdrant vector database.

Initializing the Qdrant Client

Let’s start by initializing the Qdrant client. Here we are using in-memory storage for the database.

client = QdrantClient(":memory:")
COLLECTION_NAME = "multi_stage_db"
Python

Now we need to create the collection in the database and define its schema. In this schema, we store two named vectors per point: one with small embeddings of 512 dimensions and another with larger embeddings of 2048 dimensions. Moreover, we use COSINE similarity to match the query embeddings against the chunk embeddings stored in the database.

client.recreate_collection(
    collection_name=COLLECTION_NAME,
    vectors_config={
        "small-embedding": models.VectorParams(
            size=512,
            distance=models.Distance.COSINE,
            datatype=models.Datatype.FLOAT16
        ),
        "large-embedding": models.VectorParams(
            size=2048,
            distance=models.Distance.COSINE,
            datatype=models.Datatype.FLOAT16
        ),
    },
)
Python
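We can quickly confirm that the collection was created with both named vector configurations (the exact response structure may vary slightly between qdrant-client versions):

info = client.get_collection(COLLECTION_NAME)
print(info.config.params.vectors)  # should show both "small-embedding" and "large-embedding"
Python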

Now that the schema and the collection are created, we can upload the data into the database. For that, the two functions we declared above, small_embedding and large_embedding, will be used to convert each chunk into vector embeddings of the respective sizes.

for i in range(0,len(splits)):
  client.upsert(
      collection_name=COLLECTION_NAME,
      points=[
          models.PointStruct(
              id=i,
              vector={
                  "small-embedding":small_embedding(splits[i].page_content),
                  "large-embedding":large_embedding(splits[i].page_content),
              },
          )
      ],
  )
Python
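Once the loop finishes, we can verify that all chunks made it into the collection by counting the stored points:

print(client.count(collection_name=COLLECTION_NAME).count)  # expected: 558
Python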

Now that all the data has been added to the database, it’s time to test the retrieval.

We will first declare a query whose answer should be present in the database, and then convert it into vector embeddings of size 512 and 2048 respectively.

query_text = "what are common measurements and mismeasurements of risk "

small_vector = small_embedding(query_text)
large_vector = large_embedding(query_text)
Python

Now, to match the query embeddings with the embeddings present in the database, we will perform multi-stage vector querying with the help of Qdrant.

result = client.query_points(
    collection_name= COLLECTION_NAME,
    prefetch=models.Prefetch(
        query=small_vector,
        using="small-embedding",
        limit=250,
    ),
    query=large_vector,
    using="large-embedding",
    limit=5,
)
Python

Explanation of Multi-Stage Querying

In this code snippet, we optimize the search process by using embeddings of different sizes. Initially, we use a smaller query embedding of size 512 to match against other smaller embeddings in the database. This approach is computationally efficient and speeds up the initial search phase due to the reduced dimensionality. Once the initial matching is complete, we retrieve the top 250 most similar embeddings from the database. Next, we perform a more detailed and precise matching using larger embeddings of size 2048. However, this second phase of matching is only conducted on the 250 embeddings identified in the first step. By focusing on this smaller subset, we maintain efficiency while benefiting from the higher accuracy of the larger embeddings.
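To see how strongly the final large-embedding stage scores each match, we can also print the similarity scores alongside the point IDs (the exact values will depend on your document and query):

for point in result.points:
    print(point.id, round(point.score, 4))
Python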

Displaying the Results

Let’s now check the output of the above code. The result contains the IDs of the top 5 vector embeddings most similar to the query embedding. To make it more readable, we map those IDs back to the original chunks and print their text. We can do this with a small piece of code that extracts the IDs from the result variable and uses them to look up the corresponding chunks.

ids = [item.id for item in result.points]
for i in ids:
  print(splits[i].page_content)
  print('\n')
  print('-'*75)
Python

Conclusion

In conclusion, the multi-stage query process with MRL embeddings represents a powerful method for balancing computational efficiency with search accuracy in large-scale data environments. By using smaller embeddings for initial filtering and subsequently refining with larger, more detailed embeddings, this approach not only accelerates search times but also enhances the precision of similarity matching. This technique can be particularly beneficial in applications where both speed and accuracy are crucial, such as in real-time information retrieval systems or large-scale document search engines.
