OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG...
OceanBase releases seekdb, an open-source AI-native hybrid search database for multi-model RAG and AI agents.
OceanBase has released seekdb, an open-source, AI-native search database under the Apache 2.0 license. It unifies relational data, vector data, text, JSON, and GIS in one engine and exposes hybrid search and in-database AI workflows.
What is seekdb?
seekdb is a lightweight, embeddable version of the OceanBase engine, aimed at AI applications rather than general-purpose distributed deployments. It runs as a single-node database, supports both embedded mode and client/server mode, and remains compatible with MySQL drivers and SQL syntax.
Key Features of seekdb
- Embedded database supported
- Standalone database supported
- Distributed database not supported
- Relational data with standard SQL
- Vector search
- Full text search
- JSON data
- Spatial GIS data
- Hybrid search as the core feature
Hybrid Search
The feature OceanBase emphasizes most is hybrid search: a query that combines vector-based semantic retrieval, full-text keyword retrieval, and scalar filters, and ranks the combined results in a single step. seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH with two entry points:
- DBMS_HYBRID_SEARCH.SEARCH which returns results as JSON, sorted by relevance
- DBMS_HYBRID_SEARCH.GET_SQL which returns the concrete SQL string used for execution
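To make "a single query and a single ranking step" concrete, the sketch below applies a scalar filter, ranks the surviving rows once by vector similarity and once by keyword matches, and then fuses the two rankings with reciprocal rank fusion (RRF). It is plain Python for illustration only and is not seekdb's implementation: the article does not say which fusion method DBMS_HYBRID_SEARCH uses, and every name here (`Doc`, `hybrid_search`, `rrf_k`) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    text: str
    embedding: list
    category: str  # scalar column used for filtering


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query_terms, text):
    # Toy keyword score: count of query-term occurrences (a stand-in for BM25).
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms)


def hybrid_search(docs, query_vec, query_text, category, top_k=3, rrf_k=60):
    # 1. Scalar filter: keep only rows matching the predicate.
    candidates = [d for d in docs if d.category == category]

    # 2. Two independent rankings over the same candidates.
    by_vector = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_vec, d.embedding),
        reverse=True,
    )
    terms = query_text.lower().split()
    by_keyword = sorted(
        candidates, key=lambda d: keyword_score(terms, d.text), reverse=True
    )

    # 3. Single ranking step: RRF (assumed here) sums 1 / (rrf_k + rank).
    fused = {}
    for ranking in (by_vector, by_keyword):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc.doc_id] = fused.get(doc.doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


DOCS = [
    Doc(1, "vector index tuning guide", [1.0, 0.0], "docs"),
    Doc(2, "full text search with bm25", [0.0, 1.0], "docs"),
    Doc(3, "vector search and full text search together", [0.7, 0.7], "docs"),
    Doc(4, "vector search release notes", [1.0, 0.0], "blog"),
]

results = hybrid_search(DOCS, [0.6, 0.8], "vector search", "docs")
```

Document 3 ranks first because it leads both the vector ranking and the keyword ranking, while document 4 never appears because the scalar filter removes it before ranking.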
Vector and Full Text Engine Details
At its core, seekdb exposes a modern vector and full-text stack. For vectors, seekdb:
- Supports dense vectors and sparse vectors
- Supports Manhattan, Euclidean, inner product, and cosine distance metrics
- Provides in-memory index types such as HNSW, HNSW SQ, HNSW BQ
- Provides disk-based index types including IVF and IVF PQ
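For reference, the four metrics listed above can be written out directly. This is a minimal Python sketch of the textbook definitions, not seekdb code; the dictionary representation used for sparse vectors is an assumption chosen for illustration.

```python
import math


def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    # L2 distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    # Dot product: larger values mean more similar vectors.
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def sparse_inner_product(a, b):
    # Sparse vectors as {dimension: weight}; only shared dimensions contribute.
    return sum(weight * b[dim] for dim, weight in a.items() if dim in b)


u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
```

Note that inner product is a similarity rather than a distance, so an engine that sorts by it has to order results from largest to smallest, or negate the value.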
For text, seekdb offers full-text search with:
- Keyword, phrase, and Boolean queries
- BM25 ranking for relevance
- Multiple tokenizer modes
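BM25 scores a document higher when query terms appear often in it, are rare across the corpus, and the document is not unusually long. The sketch below implements the common Okapi BM25 formula in Python. The whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, and the IDF variant are assumptions; the article does not state which settings seekdb uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms get a larger weight.
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: long documents are penalized via b.
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


corpus = [
    "hybrid search combines vector and text search",
    "json and gis data types",
    "full text search with bm25 ranking",
]
scores = bm25_scores("text search", corpus)
```

The first document scores highest because "search" occurs twice in it, and the second scores zero because it contains neither query term.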
AI Functions Inside the Database
seekdb includes built-in AI function expressions that let you call models directly from SQL, without a separate application service mediating every call. The main functions are:
- AI_EMBED to convert text into embeddings
- AI_COMPLETE for text generation using a chat or completion model
- AI_RERANK to rerank a list of candidates
- AI_PROMPT to assemble prompt templates and dynamic values into a JSON object for AI_COMPLETE
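The idea behind AI_PROMPT, packaging a template and its dynamic values as one JSON object that a completion call can consume, can be sketched in a few lines of Python. This is only an illustration of the concept: the article does not give the JSON layout seekdb produces, so the `template`/`args` shape and the `{0}`, `{1}` placeholder syntax below are assumptions, and `build_prompt` and `render_prompt` are hypothetical helper names, not seekdb functions.

```python
import json


def build_prompt(template, *values):
    # Package the template and its dynamic values as one JSON string.
    # ASSUMPTION: the {"template": ..., "args": [...]} layout is illustrative.
    return json.dumps({"template": template, "args": list(values)})


def render_prompt(prompt_json):
    # Substitute {0}, {1}, ... placeholders with the packaged values,
    # which is roughly what a completion function would do before
    # sending the text to a model.
    prompt = json.loads(prompt_json)
    return prompt["template"].format(*prompt["args"])


prompt_json = build_prompt(
    "Answer using only this context: {0}\nQuestion: {1}",
    "seekdb runs as a single-node database.",
    "Does seekdb support distributed deployment?",
)
rendered = render_prompt(prompt_json)
```

Keeping the template and values separate until the last moment means the same template can be reused across rows, with each row supplying its own values.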
Multi-model Data and Workloads
seekdb is built to handle multiple data models and their workloads in one engine, including:
- Relational data
- Vector data
- Text data
- JSON data
- Spatial GIS data
This makes seekdb a fit for applications that need a single storage and indexing layer across these data types, rather than a separate system for each.