# Scalable Objects Persistence
Note: This package was developed with an AI copilot. I want to keep an open development approach in this package (not finicky and narrow) so that "automaton" cycles stay efficient. As a result of this AI-first philosophy, the source code here may be refactored over time.
The sop/ai package is the SOP AI Kit — a versatile AI Platform that transforms SOP from a storage engine into a complete Computing Platform.
It provides a complete toolkit for building local, privacy-first AI applications, backed by the power of SOP's B-Tree storage engine.
### Vector Store (`ai/vector`)

A persistent, ACID-compliant vector store that runs on your local filesystem.
- Store configuration is persisted in the `sys_config` registry.
- Use the `Optimize()` method to rebalance clusters (Centroids) and ensure optimal search performance as data grows.
- `Upsert` or `Delete` will return an error until `Optimize` completes.
- `Optimize` automatically detects and cleans up any stale artifacts before starting fresh.
- `Delete()` marks items as deleted in both the Index and Content stores, ensuring they are immediately hidden from search results.
- The `Optimize()` process acts as a garbage collector: it detects these tombstones and performs a physical delete on the underlying data, reclaiming storage space during the maintenance cycle.
- Deduplication can be disabled (`SetDeduplication(false)`) for maximum write performance when data is known to be unique.
- The Content store uses a `ContentKey` struct as the B-Tree key.
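To make the key design concrete, here is a self-contained sketch of a composite B-Tree key with a tombstone flag. The field names and the ordering rule below are illustrative assumptions, not the actual `ContentKey` definition:

```go
package main

import (
	"fmt"
	"sort"
)

// contentKey is a hypothetical composite B-Tree key. Encoding status
// flags such as Deleted directly in the key lets a scan skip
// tombstoned items without loading their payloads.
type contentKey struct {
	ID      string
	Version int
	Deleted bool
}

// less orders keys by ID, then by descending Version, so the newest
// version of an item is encountered first.
func less(a, b contentKey) bool {
	if a.ID != b.ID {
		return a.ID < b.ID
	}
	return a.Version > b.Version
}

func main() {
	keys := []contentKey{
		{ID: "doc-2", Version: 1},
		{ID: "doc-1", Version: 2, Deleted: true}, // tombstone
		{ID: "doc-1", Version: 1},
	}
	sort.Slice(keys, func(i, j int) bool { return less(keys[i], keys[j]) })

	// A search pass hides tombstoned entries; a later Optimize()-style
	// garbage-collection pass would physically remove them.
	for _, k := range keys {
		if k.Deleted {
			continue
		}
		fmt.Printf("%s v%d\n", k.ID, k.Version)
	}
}
```

The same idea underlies the index-side key: putting the discriminating fields in the key keeps both filtering and range scans cheap.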
- The Index store encodes `CentroidID`, `Distance`, `Version`, and `Deleted` status directly in the key.
- Lifecycle: write data, run `Optimize()`, then serve queries. Temporary build artifacts are discarded for efficiency.

### Agent Framework (`ai/agent`)

The core of the Computing Platform. It allows you to define complex, multi-step workflows using Natural Language Programming (Scripts).
See ai/agent/README.md for full documentation on Scripts, Swarm Computing, and the Tool Registry.
The SOP Agent is equipped with a dual-memory system that leverages the Database Engine itself:
- **Short-term memory**: a `RunnerSession` / `ConversationThread` tracks Topics and Goals. Unlike standard chat history (a flat list), this structured approach allows the Agent to maintain distinct threads of thought and switch contexts without hallucinating, via a rigorous "Executive Function".
- **Long-term memory**: a persistent knowledge store (`llm_knowledge`). The Agent "learns" by performing ACID transactions against its own mind. This "B-Tree Powerhouse" approach ensures that knowledge is scalable, ordered, and corruption-free, unlike brittle JSON/vector-only memory systems.

### Generators & Embedders (`ai/generator`, `ai/embed`)

Interfaces for connecting to AI models.
### Model Store (`ai/database/model_store.go`)

A unified interface for persisting AI models, from small "Skills" (Perceptrons) to large "Brains" (Neural Nets).
- Backed by a B-Tree implementation (`BTreeModelStore`) for reliability and consistency.
- Models are keyed by `{Category, Name}`, allowing for organized grouping of model artifacts.

### Text Search (`search`)

A transactional, embedded text search engine.
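The `search` package itself is transactional and B-Tree-backed; as a rough intuition for what any text index does, here is a toy in-memory inverted index (this is not the `search` package's API):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// index maps each lower-cased term to the set of document IDs
// containing it — the core structure behind most text search engines.
type index map[string]map[string]bool

func (ix index) add(docID, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if ix[term] == nil {
			ix[term] = map[string]bool{}
		}
		ix[term][docID] = true
	}
}

// search returns the IDs of documents containing ALL query terms.
func (ix index) search(query string) []string {
	var hits map[string]bool
	for _, term := range strings.Fields(strings.ToLower(query)) {
		posting := ix[term]
		if hits == nil {
			hits = map[string]bool{}
			for id := range posting {
				hits[id] = true
			}
			continue
		}
		for id := range hits { // intersect with this term's postings
			if !posting[id] {
				delete(hits, id)
			}
		}
	}
	ids := make([]string, 0, len(hits))
	for id := range hits {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	ix := index{}
	ix.add("doc-1", "SOP is a storage library")
	ix.add("doc-2", "a text search engine")
	fmt.Println(ix.search("storage library"))
}
```

A real engine adds persistence, ranking, and transactional updates on top of this structure.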
### Scripts (`ai/SCRIPTS.md`)

A unique Hybrid Execution engine that runs inside the Agent.
- Mixes Deterministic Logic (e.g., `loop`, fetch tables) with Non-Deterministic AI (e.g., `ask "Analyze this data"`).
- Provides a conversational interface for interacting with your data and building scripts.
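The hybrid-execution idea — deterministic control flow driving a non-deterministic "ask" step — can be sketched as a plain function that takes the AI step as a callback. Everything below is illustrative; it is not the Scripts engine itself:

```go
package main

import (
	"fmt"
	"strings"
)

// askFunc stands in for a non-deterministic AI call ("ask ...").
type askFunc func(prompt string) string

// runScript mixes deterministic logic (a loop over rows) with an AI
// step applied to each row, mirroring the hybrid-execution idea.
func runScript(rows []string, ask askFunc) []string {
	results := make([]string, 0, len(rows))
	for _, row := range rows { // deterministic loop
		results = append(results, ask("Analyze this data: "+row)) // AI step
	}
	return results
}

func main() {
	// A deterministic stand-in for a real model, so output is stable.
	mockAsk := func(prompt string) string {
		return strings.TrimPrefix(prompt, "Analyze this data: ") + " -> ok"
	}
	fmt.Println(runScript([]string{"row1", "row2"}, mockAsk))
}
```

The deterministic skeleton stays testable and repeatable, while the `ask` callback is the only non-deterministic piece.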
The SOP AI Kit is designed to play nicely with the broader AI ecosystem while adhering to strict software engineering standards.
- Implement the `ai.Generator` interface to connect any other provider.
- The Python binding (`sop4py`) includes convenience methods for LangChain integration.
- The vector store is generic (`VectorStore[T]`), allowing you to store strongly-typed structs or dynamic `map[string]any` payloads.
- Deployment is configured via `VectorDBOptions` (Python) or the `Database` struct (Go). Replication is optional in both modes.

The SOP AI package is built as a high-level abstraction layer on top of the General Purpose SOP engine. This design ensures that both use cases share the same robust foundation while offering appropriate interfaces for their respective domains.
- Both layers run on the same `infs` B-Tree storage engine, ensuring identical performance, reliability, and ACID compliance.
- **General Purpose (`sop`)**: Exposes low-level B-Tree primitives and explicit transaction management for building custom data structures (Key-Value stores, Registries).
- **AI Layer (`sop/ai`)**: Abstracts B-Trees into domain-specific "Vector Stores" and "Model Stores" with implicit transaction handling for ease of use.

For detailed code examples and usage patterns, please see the AI Cookbook.
For a deep dive into persisting AI models, configurations, and weights, see the Model Store Tutorial.
You can use the ai package directly in your Go applications to build custom solutions.
```go
package main

import (
	"context"
	"fmt"

	"github.com/sharedcode/sop"
	"github.com/sharedcode/sop/ai"
	"github.com/sharedcode/sop/ai/embed"
	"github.com/sharedcode/sop/ai/vector"
	"github.com/sharedcode/sop/database"
)

func main() {
	// 1. Initialize the Vector Database.
	db := database.NewDatabase(sop.DatabaseOptions{
		Type:          sop.Standalone,
		StoresFolders: []string{"./my_knowledge_base"},
	})

	// 2. Start a Transaction. (Errors are ignored here for brevity;
	// check them in real code.)
	ctx := context.Background()
	trans, _ := db.BeginTransaction(ctx, sop.ForWriting)
	defer trans.Rollback(ctx) // Safety rollback if we exit early.

	// 3. Open an index for a specific domain (e.g., "documents").
	idx, _ := db.OpenVectorStore(ctx, "documents", trans, vector.Config{})

	// 4. Initialize an Embedder.
	// (In production, use a real embedding model. Here we use the simple keyword hasher.)
	emb := embed.NewSimple("simple-embedder", 64, nil)

	// 5. Add Data (Upsert).
	item := ai.Item[map[string]any]{
		ID:     "doc-1",
		Vector: nil, // Will be filled below.
		Payload: map[string]any{
			"text":     "SOP is a high-performance Go library for storage.",
			"category": "tech",
		},
	}

	// Generate the vector from the payload text.
	vecs, _ := emb.EmbedTexts(ctx, []string{item.Payload["text"].(string)})
	item.Vector = vecs[0]

	// Save to the DB and commit.
	idx.UpsertBatch(ctx, []ai.Item[map[string]any]{item})
	trans.Commit(ctx)

	// 6. Search (Retrieve) in a new read transaction.
	trans, _ = db.BeginTransaction(ctx, sop.ForReading)
	idx, _ = db.OpenVectorStore(ctx, "documents", trans, vector.Config{})

	query := "storage library"
	queryVecs, _ := emb.EmbedTexts(ctx, []string{query})
	hits, _ := idx.Query(ctx, queryVecs[0], 5, nil)
	for _, hit := range hits {
		fmt.Printf("Found: %s (Score: %.2f)\n", hit.Payload["text"], hit.Score)
	}
	trans.Commit(ctx)
}
```
This demo showcases a complete “Doctor-Nurse” AI pipeline running entirely locally. It demonstrates how to chain agents together using the SOP AI framework.
The system consists of two agents working in a pipeline:
- **The Nurse** (`nurse_local`)
- **The Doctor** (`doctor_pipeline`)
Before the agents can run, we must build their knowledge bases. We use a dedicated ETL (Extract, Transform, Load) tool called sop-etl.
The entire process is defined in etl_workflow.json and consists of three steps:
1. Extract the source data (`doctor_data.json`).
2. Load the Nurse store (`data/nurse_local`), indexing symptoms for semantic retrieval.
3. Load the Doctor store (`data/doctor_core`), indexing diseases and their associated symptoms.

We provide a script to build the tools, run the ETL pipeline, and verify the agents:
```sh
./rebuild_doctor.sh
```
This script will:
1. Build the `sop-etl` and `sop-ai` binaries.
2. Run the ETL pipeline defined in `etl_workflow.json`.

Then start the Doctor agent:

```sh
./sop-ai -config data/doctor_pipeline.json
```
Example Interaction:
```
Patient> I have a bad cough and a runny nose
AI Doctor: [1] Common Cold... (Score: 0.92)
```
- `etl_workflow.json`: Defines the ETL pipeline steps and parameters (e.g., batch sizes, input/output paths).
- `data/doctor_pipeline.json`: Configuration for the main Doctor agent. It specifies that it should use the `nurse_local` agent as its "embedder" (translator).
- `data/nurse_local.json`: Configuration for the Nurse agent.

The system supports two types of "Nurse" agents for embedding/translation:
- `nurse_local`
- `nurse_translator`
To switch between them, you would update the embedder configuration in the agent’s JSON file.