Vector Embeddings

The Vector Embeddings field in Xano, powered by pgVector, enables you to store large numerical representations (embeddings) of data, to facilitate efficient referencing and comparison of specific points of data. This is particularly useful for machine learning and AI applications, because they go beyond queries like "find this specific piece of data" by using complex relationships and semantic meanings, enabling algorithmic parse and delivery of tailored results.

What are vector embeddings?

You can think of embeddings like points on a map. When using embeddings to determine relationships between pieces of data, your chosen ML model will use the distance between points to decide what information to return.

In the example below, you can see two pieces of information that are very similar in context, so they have a short distance between them. The third piece of information is unlikely to be referenced in any content that is similar to the other two pieces of information, so it is farther away on our 'map' of embeddings. You will use the associated vector filters to calculate distance between data points.

A visual representation of embeddings

Preparing your Data

A typical structure of a table that utilizes embeddings would look like this:

{
    "id":"integer",
    "created_at":"timestamp",
    "text":"text",
    "page":"integer",
    "embeddings":"vector"
    }

Beyond the standard id and created_at fields, we have a text field which contains the actual content, a page number to indicate sections of the content, and our embeddings field.

To ensure your generation of embeddings is effective, it is important to separate your content into logical sections, such as sections of a manual. Read more on chunking here.

To provide a simplistic example, we will be using portions of the Xano documentation. Specifically, some of our documentation from API Basics. You can download our data set as a CSV below and follow along.

13KB
embeddings-sample-data.csv

Generating Embeddings

You will need to utilize an AI model of your choice to generate your embeddings. It is recommended to use a similar model for generating embeddings that you will use to also deliver responses, but this is not required, as embeddings are standardized.

Start by adding an embeddings field to your table.

Example of adding a vector embeddings database field

Give your field a name, description (if you wish), and specify the number of items. It would benefit you to experiment with the number of items that is most effective for the content in each record. Xano's Embeddings field currently has support for a maximum of 2,000 items per record. For this example, we will be using OpenAI's text-embedding-3-small, which will generate up to 1,536 items.

You should also apply an index to your embeddings field. When indexing vector fields, unlike normal indexing, the number of records does not dictate the need for an index.

Leveraging OpenAI's Embeddings API, we can generate embeddings for each section of our data (each record in our table).

Hint: Use a Database Trigger to auto-generate embeddings for new or updated content. Example here.

The Function Stack

In our function stack, we first need to retrieve the record that we want to generate embeddings for using a Get Record or Query All Records statement. You can also use a loop to generate records for all items at once, but keep in mind that a database trigger is the most efficient solution.

An example of a function stack to generate embeddings using OpenAI's Embeddings API

After we run this function stack, we can see that our new embeddings field has been updated with the data.

An example of data with generated embeddings

Generating Embeddings with Database Triggers

Using database triggers is the most optimal method for generating embeddings because they can be configured to run when new data is added, or existing data is updated, without resource-hungry loops. If you haven't yet tried out triggers in Xano, we recommend reviewing that documentation before continuing to get an introduction to how they work.

  • Set up a trigger on the table that contains your information to run on inserts and updates.

  • The first step of the trigger will be your API call. The only difference between the call demonstrated above is that instead of having to get a record first, we already have the record data in the new input of the trigger.

  • The second and final step will be to edit the record to add the newly generated embeddings.

Publishing these changes will immediately set the trigger to live, and it will react to any record edits or additions to that table!

Utilizing Embeddings to Generate Responses

Once again, you will want to leverage a model of your choice to generate responses. For this example, we will continue working with the data set shown above, and utilize OpenAI's gpt-3.5-turbo model to generate tailored responses.

The Function Stack

Our function stack will be comprised of three main steps:

  1. Generate embeddings based on the question asked

  2. Query our docs table for potentially matching content

  3. Send the matching content and the query to OpenAI to generate a response

Set up your API request to generate embeddings for the question asked.

Use a Query All Records function to query your table, and make sure to set the following parameters:

  • In the Output tab, click the ✏️icon next to Return, and enable paging.

  • Paging is essential, as it will directly control any costs incurred by limiting the data we send to our AI model. If your model employs a different pricing structure that is not based on tokens, this may not be necessary for your use case, but it is still important to consider how much data is required to be effective without sacrificing efficiency.

  • When using an indexed column, you should always use an ascending sort order. Using descending order would prevent your database from using the index and degrade the performance dramatically. Instead, you are provided reciprocal methods inverting the order of results via the various vector filters.

  • We now need to calculate the similarity between the question asked and the records in our table. This enables us to only return pages that most closely resemble the question asked. We will use the Inner Product filter with an eval to determine which records to return, because when working with OpenAI embeddings, the Inner Product is particularly useful for measuring alignment or overlap between vectors, which in turn helps us gauge the similarity between the question and the records more effectively. The field with the eval applied will be the vector field from our database. Give it a name, and add the Inner Product filter. In the vector field for the filter, reference the embedding data returned by the output of the previous API call.

  • In our third and final step, we will once again call our model API and send it the data returned in the previous Query All Records step, along with the question being asked.

Let's test it out!

Vector Filters

In addition to the new field type, we have introduced new filters, to be utilized within an eval, to calculate distance between vectors.

  • Cosine Similarity / Cosine Distance

    • These should be used when your vectors have an amplitude and a length.

    • Cosine Similarity measures how similar two sets of vectors are. The smaller the value, the more similar they are.

    • If you want to find the opposite, you can use Cosine Distance instead.

  • Inner Product / Negative Inner Product

    • These should be used when working with normalized vectors, such as those from OpenAI.

    • Measures the similarity between two sets of normalized vectors.

    • Inner product will deliver the most dissimilar vectors, while Negative Inner Product will deliver the most similar first.

  • L1 Distance (Manhattan): Imagine you're in a city and you can only move along the grid of streets (like moving from block to block). The L1 distance is like the distance you travel if you can only move along these streets. You sum up the absolute differences in the coordinates.

  • L2 Distance (Euclidean): This is the regular straight-line distance between two points in space. If you were a bird flying from one point to another, this is the distance you'd cover.

Common Issues and FAQ

My answers are misleading or incorrect.

If the answers being generated by your model of choice are not accurate, misleading, or not as in-depth as you expect, that could be due to one or more of the following:

  • Data preparation

    • Make sure that your data is separated into logical sections so that your embeddings can be generated more precisely.

  • Prompt crafting

    • The prompt you are using to generate responses may need adjustment to ensure accurate responses are returned and it is prepared to answer the questions in the format provided.

Querying my vector database table is very slow

Vector embeddings can result in a significant amount of data pretty quickly. Make sure you are applying the proper indexes to your Vector fields, and using the correct sorting in your query, which can help tremendously with query speed.

I am experiencing high cost when generating embeddings and responses.

While Xano has no control over the rates charged by your ML provider of choice (such as OpenAI), you have control of the amount of data passed in your API requests, which in most pricing models is directly responsible for how many tokens are used in generation, and what you are charged for.

I'm experiencing slow performance when generating embeddings or responses.

While Xano can not be responsible for the amount of time taken for an external service to generate and return data, there are some factors you can consider and adjust to increase performance.

  • Ensure your data is well organized and unbiased.

    • Making sure that your data is separated in a logical manner and presents unbiased information can ensure that the machine learning model employed can generate responses faster.

  • How much contextual data are you sending?

    • Sending too much data can increase the time it takes for your model to return a response. You can adjust this by modifying the paging settings on the database query.

  • Did you choose the right model?

    • It is possible that the model you have chosen to generate responses is not the best tool for the job. Embeddings generated are standardized, so you should be free to swap models anytime, but keep in mind that some models may not be fully operable with embeddings generated by another.

  • Third-party limits

    • Does the third-party service you are using to generate embeddings rate limit or otherwise speed-check requests you are sending?

I have questions about security when it comes to working with embeddings.

Because Xano only operates as the data storage for the content, embeddings, and the vehicle for sending requests to third party services, it is important to review that third party's privacy and security policies to ensure that sensitive data is treated with care.

Last updated