sop

Scalable Objects Persistence


Project maintained by SharedCode

SOP AI Kit

Note: This package was developed with an AI copilot. Development here deliberately follows an open, AI-first approach rather than a narrow, finicky one, in order to stay efficient across automated iteration cycles. As a result, the source code here may be refactored frequently.

The sop/ai package is the SOP AI Kit — a versatile AI Platform that transforms SOP from a storage engine into a complete Computing Platform.

It provides a complete toolkit for building local, privacy-first AI applications backed by the power of SOP’s B-Tree storage engine.

Core Components

1. Vector Database (ai/vector)

A persistent, ACID-compliant vector store that runs on your local filesystem.
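Vector stores rank results by a similarity metric, most commonly cosine similarity. The following is a minimal, self-contained sketch of that scoring step (illustrative only, not SOP’s internal implementation):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors:
// dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := []float32{1, 0, 1}
	doc := []float32{1, 0.5, 1}
	fmt.Printf("score: %.3f\n", cosine(query, doc)) // score: 0.943
}
```

A query vector is compared against every candidate (or an index-pruned subset), and the top-k highest-scoring items are returned, as `idx.Query` does in the example further below.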

2. Versatile Scripting Engine (ai/agent)

The core of the Computing Platform. It allows you to define complex, multi-step workflows using Natural Language Programming (Scripts).

See ai/agent/README.md for full documentation on Scripts, Swarm Computing, and the Tool Registry.

3. Memory Architecture (SOP Unique Design)

The SOP Agent is equipped with a dual-memory system that leverages the Database Engine itself.

4. Generators & Embedders (ai/generator, ai/embed)

Interfaces for connecting to AI models: generators for text output and embedders for vector embeddings.

5. Model Store (ai/database/model_store.go)

A unified interface for persisting AI models, from small “Skills” (Perceptrons) to large “Brains” (Neural Nets).

6. Text Search

A transactional, embedded text search engine.

7. Script System (ai/SCRIPTS.md)

A unique Hybrid Execution engine that runs inside the Agent.

8. AI Copilot (Interactive Mode)

A conversational interface for interacting with your data and building scripts.

Standards & Compatibility

The SOP AI Kit is designed to play nicely with the broader AI ecosystem while adhering to strict software engineering standards.

Supported Interfaces

Deployment Standards

Unified Architecture

The SOP AI package is built as a high-level abstraction layer on top of the General Purpose SOP engine. This design ensures that both use cases share the same robust foundation while offering appropriate interfaces for their respective domains.

API Cookbook

For detailed code examples and usage patterns, please see the AI Cookbook.

Model Store Tutorial

For a deep dive into persisting AI models, configurations, and weights, see the Model Store Tutorial.

Usage as a Library

You can use the ai package directly in your Go applications to build custom solutions.

Example: Building a Simple RAG App

package main

import (
    "context"
    "fmt"

    "github.com/sharedcode/sop"
    "github.com/sharedcode/sop/ai"
    "github.com/sharedcode/sop/ai/embed"
    "github.com/sharedcode/sop/ai/vector"
    "github.com/sharedcode/sop/database"
)

func main() {
    // NOTE: error returns are ignored below for brevity; handle them in real code.
    // 1. Initialize the Vector Database
    db := database.NewDatabase(sop.DatabaseOptions{
        Type:          sop.Standalone,
        StoresFolders: []string{"./my_knowledge_base"},
    })
    
    // 2. Start a Transaction
    ctx := context.Background()
    trans, _ := db.BeginTransaction(ctx, sop.ForWriting)
    defer trans.Rollback(ctx) // Safety rollback

    // 3. Open an index for a specific domain (e.g., "documents")
    idx, _ := db.OpenVectorStore(ctx, "documents", trans, vector.Config{})

    // 4. Initialize an Embedder
    // (In production, use a real embedding model. Here we use the simple keyword hasher)
    emb := embed.NewSimple("simple-embedder", 64, nil)

    // 5. Add Data (Upsert)
    item := ai.Item[map[string]any]{
        ID: "doc-1",
        Vector: nil, // Will be filled below
        Payload: map[string]any{
            "text": "SOP is a high-performance Go library for storage.",
            "category": "tech",
        },
    }
    // Generate vector
    vecs, _ := emb.EmbedTexts(ctx, []string{item.Payload["text"].(string)})
    item.Vector = vecs[0]

    // Save to DB
    idx.UpsertBatch(ctx, []ai.Item[map[string]any]{item})
    
    // Commit the transaction
    trans.Commit(ctx)

    // 6. Search (Retrieve) - New Read Transaction
    trans, _ = db.BeginTransaction(ctx, sop.ForReading)
    idx, _ = db.OpenVectorStore(ctx, "documents", trans, vector.Config{})
    
    query := "storage library"
    queryVecs, _ := emb.EmbedTexts(ctx, []string{query})
    
    hits, _ := idx.Query(ctx, queryVecs[0], 5, nil)
    
    for _, hit := range hits {
        fmt.Printf("Found: %s (Score: %.2f)\n", hit.Payload["text"], hit.Score)
    }
    trans.Commit(ctx)
}

The Doctor Demo: A Local RAG Pipeline

This demo showcases a complete “Doctor-Nurse” AI pipeline running entirely locally. It demonstrates how to chain agents together using the SOP AI framework.

Architecture

The system consists of two agents working in a pipeline:

  1. Nurse Agent (nurse_local):
    • Role: The “Translator”.
    • Task: Takes colloquial patient symptoms (e.g., “tummy hurt”, “hot”) and translates them into standardized clinical terms (e.g., “abdominal pain”, “fever”).
    • Mechanism: Uses a local vector database to find the closest matching clinical terms.
  2. Doctor Agent (doctor_pipeline):
    • Role: The “Diagnostician”.
    • Task: Takes the clinical terms from the Nurse and searches its medical knowledge base to suggest possible conditions.
    • Mechanism: Uses a separate local vector database populated with disease-symptom mappings.
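The Nurse→Doctor hand-off above can be sketched as two chained functions. Here plain map lookups stand in for the two vector databases; the mappings and function names are illustrative, not the demo’s real data:

```go
package main

import (
	"fmt"
	"strings"
)

// nurse translates colloquial symptoms into clinical terms.
// A map stands in for the Nurse's vector store.
func nurse(symptoms []string) []string {
	clinical := map[string]string{
		"tummy hurt": "abdominal pain",
		"hot":        "fever",
		"runny nose": "rhinorrhea",
	}
	out := make([]string, 0, len(symptoms))
	for _, s := range symptoms {
		if term, ok := clinical[strings.ToLower(s)]; ok {
			out = append(out, term)
		}
	}
	return out
}

// doctor maps clinical terms to candidate conditions.
// A map stands in for the Doctor's disease-symptom vector store.
func doctor(terms []string) []string {
	conditions := map[string]string{
		"abdominal pain": "gastritis",
		"fever":          "influenza",
		"rhinorrhea":     "common cold",
	}
	out := make([]string, 0, len(terms))
	for _, t := range terms {
		if c, ok := conditions[t]; ok {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	patient := []string{"tummy hurt", "hot"}
	fmt.Println(doctor(nurse(patient))) // [gastritis influenza]
}
```

In the real demo, each lookup is a semantic vector search against a separate SOP store, so near-misses (“tummy ache”) still resolve.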

ETL Workflow (Data Ingestion)

Before the agents can run, we must build their knowledge bases. We use a dedicated ETL (Extract, Transform, Load) tool called sop-etl.

The entire process is defined in etl_workflow.json and consists of three steps:

  1. Prepare: Downloads a raw healthcare dataset (CSV) and converts it into JSON format (doctor_data.json).
  2. Build Nurse DB: Ingests the data into the Nurse’s vector store (data/nurse_local), indexing symptoms for semantic retrieval.
  3. Build Doctor DB: Ingests the data into the Doctor’s vector store (data/doctor_core), indexing diseases and their associated symptoms.
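Step 1’s CSV-to-JSON conversion can be sketched with the standard library. The column names below are made up; the real dataset’s schema is whatever etl_workflow.json declares:

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// csvToJSON converts CSV text (first row = header) into a JSON array
// of objects, one object per data row.
func csvToJSON(raw string) (string, error) {
	rows, err := csv.NewReader(strings.NewReader(raw)).ReadAll()
	if err != nil {
		return "", err
	}
	header := rows[0]
	records := make([]map[string]string, 0, len(rows)-1)
	for _, row := range rows[1:] {
		rec := map[string]string{}
		for i, col := range header {
			rec[col] = row[i]
		}
		records = append(records, rec)
	}
	out, err := json.Marshal(records)
	return string(out), err
}

func main() {
	raw := "disease,symptom\ncommon cold,runny nose\ninfluenza,fever\n"
	js, err := csvToJSON(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(js)
}
```

sop-etl performs this transform (plus the download) and then feeds the resulting JSON records to the ingestion steps.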

Quick Start

We provide a script to build the tools, run the ETL pipeline, and verify the agents.

  1. Run the Rebuild Script:
    ./rebuild_doctor.sh
    

    This script will:

    • Build sop-etl and sop-ai binaries.
    • Clean up old data.
    • Run the ETL workflow defined in etl_workflow.json.
    • Run sanity tests.
  2. Run the Agent Manually: Once the data is built, you can chat with the Doctor agent:
    ./sop-ai -config data/doctor_pipeline.json
    

    Example Interaction:

    Patient> I have a bad cough and a runny nose
    AI Doctor: [1] Common Cold... (Score: 0.92)
    

Configuration Files

Heuristic vs LLM Embedders

The system supports two types of “Nurse” agents for embedding/translation:

  1. Heuristic Agent (nurse_local):
    • How it works: Uses a local dictionary and vector search with manually curated synonyms.
    • Performance: Extremely fast and deterministic.
    • Use Case: Default for this demo. Tuned for high performance in specific areas (e.g., lung-related diseases).
    • Pros: No external dependencies (no Ollama required), predictable.
    • Cons: Requires manual tuning for new slang/terms.
  2. LLM Agent (nurse_translator):
    • How it works: Uses a local LLM (via Ollama) to semantically understand and translate user input.
    • Performance: Slower (depends on GPU/CPU), but more flexible.
    • Use Case: General-purpose understanding without manual synonym mapping.
    • Pros: Understands context and nuance better out-of-the-box.
    • Cons: Requires running Ollama, higher latency.

To switch between them, you would update the embedder configuration in the agent’s JSON file.
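The heuristic agent’s synonym mechanism boils down to normalized dictionary lookup backed by vector search. A minimal sketch of the lookup half, with an illustrative (not the shipped) synonym table:

```go
package main

import (
	"fmt"
	"strings"
)

// synonyms is a hand-curated slang-to-clinical mapping, the kind of
// table the heuristic nurse_local agent relies on.
var synonyms = map[string]string{
	"tummy hurt":    "abdominal pain",
	"hot":           "fever",
	"can't breathe": "dyspnea",
}

// translate normalizes the input and returns the clinical term,
// falling back to the raw input when no synonym is known.
func translate(colloquial string) string {
	key := strings.ToLower(strings.TrimSpace(colloquial))
	if term, ok := synonyms[key]; ok {
		return term
	}
	return colloquial
}

func main() {
	fmt.Println(translate("  Tummy hurt ")) // abdominal pain
	fmt.Println(translate("headache"))      // headache (no mapping; passed through)
}
```

This illustrates the trade-off described above: lookups are fast and deterministic, but every new slang term must be added to the table by hand, whereas the LLM agent generalizes without curation at the cost of latency.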