Vector
Vector turns any dataset into a vector table for semantic search. Select a primary key, choose source columns, add embedding columns, then populate and query by vector similarity.
What is Vector?
Use Vector to generate and persist embeddings from your dataset so you can run semantic search, nearest-neighbor lookups, and similarity exploration. Vectors are kept aligned to your source rows via a primary key, enabling safe, incremental updates.
Use cases
- Generate embeddings from one or more text columns to power semantic search, nearest-neighbor lookups, or similarity exploration.
- Persist those embeddings as a vector table you can refresh and query.
- Keep vectors aligned with your source data using a primary key so updates are safe and incremental.
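To illustrate why a primary key makes incremental updates possible, here is a minimal sketch, not Vector's actual implementation. It assumes each stored vector is tracked alongside a fingerprint of the text it was computed from; the names `row_key`, `text_fingerprint`, and `plan_refresh` are hypothetical.

```python
import hashlib


def row_key(row, key_cols):
    """Build a hashable key from the primary key columns of a row."""
    return tuple(row[c] for c in key_cols)


def text_fingerprint(text):
    """Stable fingerprint of the source text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(source_rows, stored_fingerprints, key_cols, text_col):
    """Decide which keys need new embeddings and which can be dropped.

    stored_fingerprints maps primary key -> fingerprint of the text that
    was embedded last time (an assumed bookkeeping structure).
    """
    to_embed = []
    seen = set()
    for row in source_rows:
        key = row_key(row, key_cols)
        seen.add(key)
        # New key, or text changed since the last run: re-embed.
        if stored_fingerprints.get(key) != text_fingerprint(row[text_col]):
            to_embed.append(key)
    # Keys that no longer exist in the source can be removed.
    to_delete = [k for k in stored_fingerprints if k not in seen]
    return to_embed, to_delete
```

Under these assumptions, rows whose key and text are unchanged are skipped, so a refresh only pays for embeddings that are new or stale.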
Before you start
- Your source dataset must have a valid primary key (unique, non-null).
- Only numeric or string columns can be part of the key.
- Columns in the key can’t be removed until the key is deleted.
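If you want to check a candidate key before setting it, the requirements above can be sketched as follows. This is an illustrative pre-check, not the validation Vector itself runs; it assumes rows are available as Python dictionaries, and `validate_primary_key` is a hypothetical name.

```python
from numbers import Number


def validate_primary_key(rows, key_cols):
    """Return a list of problems; an empty list means the key looks valid."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        values = tuple(row.get(c) for c in key_cols)
        row_ok = True
        for col, value in zip(key_cols, values):
            if value is None:
                problems.append(f"row {i}: {col} is null")
                row_ok = False
            # bool is a Number subclass in Python, so exclude it explicitly.
            elif isinstance(value, bool) or not isinstance(value, (Number, str)):
                problems.append(f"row {i}: {col} is not a number or string")
                row_ok = False
        if not row_ok:
            continue
        # Uniqueness is checked on the combination of all key columns.
        if values in seen:
            problems.append(f"row {i}: duplicate key {values}")
        seen.add(values)
    return problems
```

Note that with a multi-column key, uniqueness applies to the combination of values, so individual columns may repeat.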
Typical workflow
- Set Primary Key: Choose one or more columns to uniquely identify rows.
- Pick Source Columns: Toggle which source fields to include in the vector table.
- Add Vector Column(s): Create one or more embedding columns, selecting an embedding model and a source text column for each.
- Preview: While modeling, preview up to 100 rows to verify your setup.
- Populate: Compute embeddings in batches and store the vector table (state switches to populated).
- Search: Enter a query string, choose a vector column, and run nearest-neighbor ordering.
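The Search step orders rows by how close their vectors are to the embedded query. The sketch below shows the idea with cosine similarity over an in-memory table. The similarity metric Vector uses is not stated here, so cosine is an assumption, and `cosine_similarity` and `nearest_neighbors` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, table, k=5):
    """Return up to k (key, score) pairs, most similar first.

    table maps primary key -> stored vector (assumed layout).
    """
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in table.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In practice the query string is first embedded with the same model as the chosen vector column, so that the query vector and the stored vectors are comparable.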
Model options and limits
Models
- OpenAI text-embedding-3-small (default, 1536 dims)
- OpenAI text-embedding-3-large (3072 dims)
- OpenAI text-embedding-ada-002 (1536 dims)
- A Google Vertex AI option appears in the UI but is not implemented server-side.
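Because each model produces vectors of a fixed length, a quick length check can catch a column populated with the wrong model. The following is a minimal sketch under that assumption; `EMBEDDING_DIMS` and `check_vector` are hypothetical names, and the 1536 value for text-embedding-ada-002 comes from OpenAI's model documentation.

```python
# Output dimensions of the supported OpenAI models at their default size.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}
DEFAULT_MODEL = "text-embedding-3-small"


def check_vector(vector, model=DEFAULT_MODEL):
    """Raise ValueError if the vector length does not match the model."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"unsupported model: {model}")
    expected = EMBEDDING_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dims for {model}, got {len(vector)}")
    return vector
```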
Input limits
- Text to embed must be non-empty and at most 5,000 characters long.
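To find rows that would violate the input limit before you populate, a check along these lines may help. It is a sketch only: it assumes the limit counts characters the way Python's `len` does, and it treats only a zero-length or missing value as empty; how Vector handles whitespace-only text is not specified here. `check_embedding_input` is a hypothetical name.

```python
MAX_CHARS = 5000  # Limit stated in this document.


def check_embedding_input(text):
    """Return the text unchanged, or raise ValueError if it breaks the limits."""
    if text is None or text == "":
        raise ValueError("text to embed must be non-empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; the limit is {MAX_CHARS}")
    return text
```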
Performance
- Preview shows up to 100 rows.
- For large datasets, populate and query operations run in batches.
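Batching can be pictured as splitting the rows into fixed-size chunks and embedding one chunk per request. The sketch below is illustrative only: the batch size Vector uses is not documented here, so `batch_size=100` is an arbitrary placeholder, and `batched`, `populate`, and the `embed_batch` callable are hypothetical.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def populate(rows, embed_batch, batch_size=100):
    """Embed (key, text) pairs chunk by chunk and return key -> vector.

    embed_batch is any callable that takes a list of texts and returns
    one vector per text, in the same order.
    """
    vectors = {}
    for chunk in batched(rows, batch_size):
        texts = [text for _, text in chunk]
        for (key, _), vector in zip(chunk, embed_batch(texts)):
            vectors[key] = vector
    return vectors
```

Working in chunks keeps memory use and request size bounded regardless of how many rows the dataset has.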