Vector Embeddings
The Vector Embeddings field in Xano, powered by pgVector, lets you store large numerical representations (embeddings) of your data so that specific pieces of content can be referenced and compared efficiently. This is particularly useful for machine learning and AI applications, which go beyond queries like "find this specific piece of data" by working with complex relationships and semantic meaning, allowing an algorithm to parse your data and deliver tailored results.
What are vector embeddings?
Get a primer on what embeddings are and how they are used.
Vector Filters
Learn about the new filters added to work with embeddings in Xano.
Preparing Your Data
See an example of how your data might look when preparing to use embeddings.
Generating Embeddings
See an example of generating embeddings using an OpenAI model and learn how to store them in Xano.
Utilizing Embeddings
Learn how to use your generated embeddings to tailor custom responses from an AI model.
Common Issues & FAQ
Get quick answers to the most common questions and concerns when using embeddings.
You can think of embeddings like points on a map. When using embeddings to determine relationships between pieces of data, your chosen ML model will use the distance between points to decide what information to return.
In the example below, you can see two pieces of information that are very similar in context, so they have a short distance between them. The third piece of information is unlikely to be referenced in any content that is similar to the other two pieces of information, so it is farther away on our 'map' of embeddings. You will use the associated vector filters to calculate distance between data points.
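For intuition, here is a minimal sketch of that 'map' using made-up three-item vectors (real embeddings contain hundreds or thousands of items). The two related pieces of content score close together, while the unrelated one sits far away:

```python
# Made-up 3-item "embeddings" for three pieces of content. Real embeddings
# have far more items, but the distance math works the same way.
import math

def cosine_similarity(a, b):
    # Closer to 1 means the vectors point in nearly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

building_an_api = [0.91, 0.10, 0.05]
securing_an_api = [0.88, 0.15, 0.07]
baking_cookies  = [0.02, 0.11, 0.95]

print(cosine_similarity(building_an_api, securing_an_api))  # ~0.99 -> very close on the "map"
print(cosine_similarity(building_an_api, baking_cookies))   # ~0.09 -> far apart
```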
A typical structure of a table that utilizes embeddings would look like this:
Beyond the standard id and created_at fields, we have a text field that contains the actual content, a page number to indicate sections of the content, and our embeddings field.
To ensure your generation of embeddings is effective, it is important to separate your content into logical sections, such as sections of a manual. Read more on chunking here.
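As a rough illustration of chunking, the sketch below splits a long document into paragraph-based sections before embedding. The file name and character limit are placeholders; boundaries that follow your own content (manual sections, documentation pages, and so on) will usually work better than a fixed character count.

```python
# Split source text into logical chunks before generating embeddings.
# The blank-line split and ~1,000 character cap are arbitrary examples.
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Hypothetical source file; each chunk becomes one record (text + page number).
manual_text = open("api_basics.txt").read()
records = [{"page": i + 1, "text": chunk} for i, chunk in enumerate(chunk_text(manual_text))]
```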
To keep things simple, we will be using portions of the Xano documentation as our example data; specifically, some of our documentation from API Basics. You can download our data set as a CSV below and follow along.
You will need an AI model of your choice to generate your embeddings. It is recommended, but not required, to use the same provider for generating embeddings and for delivering responses; what matters is that every embedding you compare is generated by the same embedding model.
Start by adding an embeddings field to your table.
Give your field a name, an optional description, and specify the number of items. It is worth experimenting to find the number of items that works best for the content in each record; Xano's embeddings field currently supports a maximum of 2,000 items per record. For this example, we will use OpenAI's text-embedding-3-small, which generates up to 1,536 items.
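If you want the number of items in your field to line up with the embeddings you generate, note that OpenAI's text-embedding-3 models accept a dimensions parameter that shortens the returned vector. The sketch below is an illustrative example only; confirm the parameter against OpenAI's current API reference.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# text-embedding-3-small returns 1,536 items by default, which fits under
# Xano's 2,000-item maximum. The optional dimensions parameter lets you
# request a shorter vector if you want to experiment with smaller field sizes.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="APIs are the building blocks of Xano.",
    dimensions=512,  # optional; omit to get the full 1,536 items
)
print(len(resp.data[0].embedding))  # 512
```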
You should also apply an index to your embeddings field. Unlike standard indexing, the number of records does not dictate whether a vector field needs an index.
Leveraging OpenAI's Embeddings API, we can generate embeddings for each section of our data (each record in our table).
Hint: Use a Database Trigger to auto-generate embeddings for new or updated content. Example here.
In our function stack, we first need to retrieve the record we want to generate embeddings for, using a Get Record or Query All Records function. You can also use a loop to generate embeddings for all records at once, but keep in mind that a database trigger is the most efficient solution.
After we run this function stack, we can see that our new embeddings field has been updated with the data.
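If it helps to see the equivalent flow outside of a function stack, here is a hedged sketch that generates an embedding for one record with OpenAI's Embeddings API and then writes it back through an API endpoint exposed from your Xano workspace. The Xano URL, endpoint path, and field names are placeholders for illustration only.

```python
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One record from the docs table (placeholder content).
record = {"id": 1, "page": 1, "text": "APIs in Xano are built visually with function stacks..."}

# Generate the embedding for this record's text.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=record["text"],
)
embedding = resp.data[0].embedding  # list of up to 1,536 floats

# Store the embedding back on the record via a hypothetical Xano endpoint.
requests.patch(
    f"https://your-workspace.xano.io/api:docs/docs/{record['id']}",
    json={"embeddings": embedding},
    timeout=30,
)
```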
Using database triggers is the most efficient method for generating embeddings, because they can be configured to run whenever new data is added or existing data is updated, without resource-hungry loops. If you haven't yet tried out triggers in Xano, we recommend reviewing that documentation before continuing to get an introduction to how they work.
Set up a trigger on the table that contains your information to run on inserts and updates.
Publishing these changes will immediately set the trigger to live, and it will react to any record edits or additions to that table!
Once again, you will want to leverage a model of your choice to generate responses. For this example, we will continue working with the data set shown above, and utilize OpenAI's gpt-3.5-turbo model to generate tailored responses.
Our function stack will consist of three main steps:
Generate embeddings based on the question asked
Query our docs table for potentially matching content
Send the matching content and the query to OpenAI to generate a response
Set up your API request to generate embeddings for the question asked.
Use a Query All Records function to query your table, and make sure to set the following parameters:
Paging is essential, as limiting the amount of data we send to our AI model directly controls the costs incurred. If your model's pricing is not based on tokens, this may be less critical for your use case, but it is still important to consider how much data you need to send to be effective without sacrificing efficiency.
When using an indexed column, you should always use an ascending sort order; a descending sort prevents your database from using the index and dramatically degrades performance. If you need results in the opposite order, use the reciprocal vector filter (for example, Negative Inner Product instead of Inner Product) rather than reversing the sort.
Let's test it out!
In addition to the new field type, we have introduced new filters, used within an eval, to calculate the distance between vectors.
Cosine Similarity / Cosine Distance
These should be used when your vectors vary in magnitude (length), since cosine measures compare direction only.
Cosine Similarity measures how similar two vectors are; the smaller the value, the more similar they are.
If you want to find the opposite, you can use Cosine Distance instead.
Inner Product / Negative Inner Product
These should be used when working with normalized vectors, such as those from OpenAI.
They measure the similarity between two normalized vectors. Inner Product will deliver the most dissimilar vectors first, while Negative Inner Product will deliver the most similar first.
L1 Distance (Manhattan): Imagine you're in a city and you can only move along the grid of streets (like moving from block to block). The L1 distance is like the distance you travel if you can only move along these streets. You sum up the absolute differences in the coordinates.
L2 Distance (Euclidean): This is the regular straight-line distance between two points in space. If you were a bird flying from one point to another, this is the distance you'd cover.
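For reference, here are plain-Python versions of the measures behind these filters. They are shown only for intuition about what each filter computes; inside Xano you apply the corresponding filter in an eval rather than writing this yourself.

```python
# Plain-Python equivalents of the vector distance measures, for intuition only.
import math

def l1_distance(a, b):
    # Manhattan: sum of absolute coordinate differences ("city block" travel).
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean: straight-line ("as the bird flies") distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_inner_product(a, b):
    return -inner_product(a, b)

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)
```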
The first step of the trigger will be your API call. The only difference from the call demonstrated above is that instead of having to get a record first, we already have the record data in the trigger's new input.
The second and final step will be to edit the record to add the newly generated embeddings.
In the Output tab, click the ✏️ icon next to Return, and enable paging.
We now need to calculate the similarity between the question asked and the records in our table, so that we only return the pages that most closely resemble the question. We will use the Inner Product filter within an eval to determine which records to return; when working with OpenAI embeddings, the Inner Product is particularly useful for measuring the alignment or overlap between vectors, which helps us gauge the similarity between the question and each record. The field the eval is applied to will be the vector field from our database. Give it a name, add the Inner Product filter, and in the filter's vector field, reference the embedding data returned by the previous API call.
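Conceptually, this step ranks records by how closely their embeddings align with the question's embedding, and paging then trims the list to the few best matches. A rough sketch of that ranking is below, assuming records and question_embedding are already loaded; it uses the negative inner product so that an ascending sort puts the most similar records first, so adjust it to match whichever filter you configure in the eval.

```python
# Rank records by similarity to the question. Sorting ascending on the
# negative inner product puts the most aligned (most similar) records first;
# per_page mirrors the paging setting that bounds what we send to the model.
def negative_inner_product(a, b):
    return -sum(x * y for x, y in zip(a, b))

per_page = 3
ranked = sorted(
    records,
    key=lambda r: negative_inner_product(r["embeddings"], question_embedding),
)
top_matches = ranked[:per_page]
```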
In our third and final step, we will once again call our model API and send it the data returned in the previous Query All Records step, along with the question being asked.
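To round out the picture, here is a hedged sketch of that final step using OpenAI's chat completions API; the prompt wording and the variable names top_matches and question are illustrative rather than Xano's exact payload.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Combine the best-matching pages into a single block of context.
context = "\n\n".join(match["text"] for match in top_matches)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Answer using only the documentation excerpts provided."},
        {"role": "user",
         "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```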