Scalable Objects Persistence


Project maintained by SharedCode.

SOP Architecture Guide

This document outlines the high-level architecture of the Scalable Objects Persistence (SOP) library, focusing on the package structure and the design decisions behind public vs. internal components.

Package Structure & Visibility

SOP follows a strict separation between public APIs and internal implementation details to ensure a stable and safe developer experience.

Project Structure

Here is an overview of the project’s folder structure to help you navigate the codebase:

sop/
├── adapters/          # Interface adapters (Redis, Cassandra)
├── ai/                # AI/Vector database modules & Python bindings
├── bindings/          # Cross-language bindings (Python)
├── btree/             # Core B-Tree data structure implementation
├── cache/             # Caching interfaces and implementations
├── database/          # High-level Database API (Entry point)
├── fs/                # Filesystem registry & I/O
├── incfs/             # Hybrid backend (Cassandra + Filesystem)
├── infs/              # Standard backend (Filesystem only)
├── inmemory/          # In-memory backend for standalone mode
├── internal/          # Internal implementation details (hidden)
├── jsondb/            # JSON document store
├── restapi/           # REST API server example
├── search/            # Search engine implementation (BM25)
└── streamingdata/     # Large object (BLOB) streaming support

Public Packages

These packages are intended for direct use by consumers of the library:

Internal Packages

Design Principles

  1. Encapsulation: Complex storage logic (like the inredck blob management) is hidden behind clean, high-level interfaces (incfs).
  2. ACID Transactions: All public operations are designed to participate in SOP’s Two-Phase Commit (2PC) transaction model.
  3. Pluggable Backends: The architecture supports different backend implementations (infs vs incfs) sharing common interfaces where possible.
  4. UI-Driven Configuration: Advanced store configuration, specifically CEL Expressions for custom sorting, is managed exclusively via the Data Admin UI. Language bindings (Go, Python, etc.) do not expose APIs for setting these expressions in code. This ensures that complex, dynamic logic is centralized in the administrative layer rather than scattered across application code.

Development Guidelines

Consistency & Caching Architecture

SOP employs a multi-tiered caching strategy designed to balance high performance with strict consistency, preventing stale reads even in high-throughput scenarios.

The Stale Read Challenge

In a distributed system with local caching (L1), a common risk is Local Staleness:

  1. Transaction A on Host 1 updates Node X.
  2. Transaction B on Host 2 reads Node X from its local L1 cache.
  3. Host 2’s L1 cache might still hold the old version of Node X, leading to a “Stale Read.”

The SOP Solution: Indirect Synchronization (The “Pheromone” Approach)

SOP solves this by utilizing a technique inspired by Swarm Intelligence “Pheromone” algorithms. Instead of synchronizing the bulky data (L1 Cache) across the swarm, we only synchronize the tiny navigational signals (Registry Handles).

  1. Registry as Authority: The Registry maps a Virtual ID (UUID) to a Physical Handle (Version + Physical ID). This mapping resides in the L2 Cache (Redis) or the persistent Registry file.
  2. The “Check-First” Flow:
    • Step 1: When a transaction requests a Node by its Virtual ID, it always queries the Registry (L2) first.
    • Step 2: The Registry returns the current Physical ID (e.g., NodeID_v2).
    • Step 3: The transaction then checks the L1 cache for NodeID_v2.
    • Step 4:
      • Hit: If NodeID_v2 is in L1, it is returned (Fast).
      • Miss: If not, it is fetched from the Blob Store.
  3. Why Staleness is Impossible:
    • If Host 1 updates Node X to v2, the Registry is updated to point Node X -> NodeID_v2.
    • Host 2’s L1 cache might still have NodeID_v1.
    • When Host 2 requests Node X, the Registry tells it: “The current node is NodeID_v2.”
    • Host 2 looks for NodeID_v2 in L1. It won’t find it (or will find the correct new data). It will never accidentally return NodeID_v1 because the Registry handle didn’t ask for it.
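The check-first flow above can be sketched as a minimal, self-contained Go program. The `Store`, `Handle`, and map-backed tiers here are illustrative stand-ins for the Registry, L1 cache, and Blob Store, not SOP's actual types:

```go
package main

import "fmt"

// Handle is the tiny navigational signal stored in the Registry:
// it maps a stable Virtual ID to the current Physical ID + Version.
type Handle struct {
	PhysicalID string
	Version    int
}

type Store struct {
	registry map[string]Handle // Registry (L2): Virtual ID -> Handle
	l1       map[string][]byte // L1 cache, keyed by Physical ID
	blobs    map[string][]byte // Blob Store, keyed by Physical ID
}

// Get always resolves the Virtual ID via the Registry first, then consults
// L1 using the *current* Physical ID. A stale L1 entry keyed by an old
// Physical ID can never be returned, because nothing ever asks for it.
func (s *Store) Get(virtualID string) ([]byte, error) {
	h, ok := s.registry[virtualID]
	if !ok {
		return nil, fmt.Errorf("unknown virtual id %q", virtualID)
	}
	if node, hit := s.l1[h.PhysicalID]; hit {
		return node, nil // L1 hit on the current version: fast path
	}
	node, ok := s.blobs[h.PhysicalID]
	if !ok {
		return nil, fmt.Errorf("blob %q missing", h.PhysicalID)
	}
	s.l1[h.PhysicalID] = node // warm L1 for the next read
	return node, nil
}

func main() {
	s := &Store{
		registry: map[string]Handle{"nodeX": {PhysicalID: "nodeX_v2", Version: 2}},
		l1:       map[string][]byte{"nodeX_v1": []byte("old")}, // stale entry
		blobs:    map[string][]byte{"nodeX_v2": []byte("new")},
	}
	v, _ := s.Get("nodeX")
	fmt.Println(string(v)) // prints "new"; the stale bytes are unreachable
}
```

Note that the stale `nodeX_v1` entry is never invalidated; it simply ages out of L1, which is what lets the swarm skip broadcast invalidation entirely.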

Swarm Architecture Benefit: This design eliminates the need for chatty peer-to-peer messaging, broadcast invalidations, or heavy L1-L2 synchronization protocols. We intentionally do not cache Registry Handles in L1. By forcing this “tiny” indirect synchronization via the Registry (acting like a minimal pheromone trail), SOP allows the swarm to operate in a lightweight manner without the overhead of heavy cache coherence traffic.

ACID Enforcement: The “Theory of Relativity” Approach

While the “Pheromone” synchronization ensures access to fresh data, SOP guarantees strict ACID compliance through a rigorous Two-Points-in-Time validation mechanism during the commit phase, effectively enforcing a “Theory of Relativity” for transactions.

Most distributed systems compromise consistency for speed, settling for “Eventual Consistency” and its loosely defined state. SOP refuses this compromise.

  1. Point A (Read Time): When the transaction acts (reads/writes), it captures the specific Version of every artifact involved.
  2. Point B (Commit Time): During Phase 2 Commit, the system re-verifies these versions against the Registry (Source of Truth).

If the version at Point B matches the version at Point A, it proves that “Time Stood Still” for that transaction relative to the data it touched. No other actor interfered with the state. This mechanism guarantees Snapshot Isolation and strictly serializable behavior without the heavy locking overhead of traditional relational databases.

The Result: State-of-the-Art performance with enterprise-grade ACID guarantees. SOP delivers the speed of “Eventual Consistency” systems while strictly enforcing the correctness of a traditional RDBMS.

Component Interaction & Backends

SOP supports two primary backends, each with a distinct architecture for handling metadata and data.

1. Filesystem Backend (infs)

Designed for distributed, high-scale environments as well as single-node deployments.

2. Hybrid Backend (incfs)

An alternative backend for distributed environments that “Powers up” your existing Cassandra infrastructure.

Transaction Data Flow

The flow of data during a Commit operation is similar for both backends, but the Commit Point—the moment the transaction becomes durable—differs.

incfs (Hybrid) Flow

sequenceDiagram
    participant App
    participant SOP as SOP Transaction
    participant Redis as Redis (Lock/Cache)
    participant FS as Blob Store (FS)
    participant Cass as Registry (Cassandra)

    App->>SOP: Commit()
    
    rect rgb(240, 248, 255)
        note right of SOP: Phase 1: Prepare
        SOP->>Redis: Acquire Locks (Rows/Items)
        SOP->>SOP: Conflict Detection
        SOP->>FS: Write "Dirty" Nodes (WAL/Temp)
    end

    rect rgb(255, 250, 240)
        note right of SOP: Phase 2: Commit
        SOP->>Cass: Update Registry (Virtual ID -> New Physical Location)
        SOP->>Redis: Update Cache (New Nodes)
        SOP->>Redis: Release Locks
    end

    rect rgb(240, 255, 240)
        note right of SOP: Cleanup (Async)
        SOP->>FS: Delete Old/Obsolete Nodes
    end

    SOP-->>App: Success

infs (Filesystem) Flow

The flow is identical to the above, except Cassandra is replaced by the Filesystem Registry.

  1. Prepare: Nodes are written to the Blob Store.
  2. Commit: The Registry file on disk is atomically updated (via fsync) to point to the new node locations.
  3. Cleanup: Old blobs are removed.

Key Concepts

Dual-View Architecture & Serialization Efficiency

SOP employs a unique “Dual-View” Architecture that decouples the storage format from the runtime representation, achieving high performance for both strongly-typed and dynamic use cases.

1. The “Common Ground”: JSON on Disk

The underlying storage format for B-Tree nodes and items is JSON (via DefaultMarshaler). This neutral format on disk allows data to be agnostic of the consuming application’s type system.

2. Direct Deserialization (Zero Waste)

SOP avoids the common “double conversion” penalty found in many ORMs or hybrid systems.

The Data Manager reads the exact same bytes as the Application but requests a map[string]any. The JSON decoder constructs the map directly, bypassing the need for the original Struct definition. This allows administrative tools to view and manipulate data without needing the application’s source code or type definitions.
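The dual views can be demonstrated with plain `encoding/json`; the `Person` type and field names are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is the application's strongly-typed view of an item.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// typedView decodes the stored bytes straight into the app's struct.
func typedView(raw []byte) (Person, error) {
	var p Person
	err := json.Unmarshal(raw, &p)
	return p, err
}

// dynamicView decodes the *same* bytes into a schema-free map, with no
// intermediate struct and no double conversion. This is how admin tooling
// can read data without the application's type definitions.
func dynamicView(raw []byte) (map[string]any, error) {
	var m map[string]any
	err := json.Unmarshal(raw, &m)
	return m, err
}

func main() {
	raw := []byte(`{"name":"Ada","age":36}`) // bytes as stored on disk
	p, _ := typedView(raw)
	m, _ := dynamicView(raw)
	fmt.Println(p.Name, m["age"]) // Ada 36
}
```

One caveat of the dynamic view: JSON numbers decode to `float64` in a `map[string]any`, so tooling must not assume integer types.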

3. JsonDBMapKey: The Intelligence Layer

While map[string]any provides flexibility, it lacks inherent ordering. JsonDBMapKey bridges this gap by injecting a Proxy Comparer into the B-Tree.
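A proxy comparer in this spirit can be sketched as an ordering function over `map[string]any` items. The `byField` helper and its semantics are hypothetical illustrations, not JsonDBMapKey's actual API; `sort.Slice` stands in for B-Tree key ordering:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// comparer orders two schema-free items; a proxy comparer like this can be
// handed to the B-Tree wherever a key ordering is required.
type comparer func(a, b map[string]any) int

// byField builds a comparer over a named string field.
func byField(field string) comparer {
	return func(a, b map[string]any) int {
		av, _ := a[field].(string)
		bv, _ := b[field].(string)
		return strings.Compare(av, bv)
	}
}

func main() {
	items := []map[string]any{
		{"name": "zoe"}, {"name": "ada"}, {"name": "mae"},
	}
	cmp := byField("name")
	// sort.Slice stands in for B-Tree insertion order in this sketch.
	sort.Slice(items, func(i, j int) bool { return cmp(items[i], items[j]) < 0 })
	for _, it := range items {
		fmt.Println(it["name"]) // ada, mae, zoe
	}
}
```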

Deployment Modes

SOP is designed to run in two distinct modes, catering to different scale requirements.

1. Hybrid Mode (Distributed)

2. Standalone Mode (Embedded)

AI & Agent Architecture

SOP introduces a novel architecture for AI Agents, distinguishing itself from standard “RAG” or “Chatbot” implementations by leveraging the B-Tree as the central nervous system.

1. The “Powerhouse” B-Tree Memory

Unlike systems that rely on vector databases or flat text files for memory, SOP treats Memory as a Database System.

2. Hybrid Scripting Engine (Explicit Execution)

The SOP Scripting Engine (ai/agent) follows a unique “Explicit Execution” design pattern.

Backend Comparison: Isolation & Concurrency

When choosing a backend, it is crucial to understand how they handle isolation, locking, and multi-tenancy. Both backends support high concurrency, but their locking scopes differ.

| Feature | FileSystem (infs) | Cassandra (incfs) |
|---|---|---|
| Primary Use Case | High-performance distributed or local clusters. | Environments with existing Cassandra infrastructure. |
| Multi-Tenancy | Directory-Based: each database is a separate folder on disk. | Keyspace-Based: each database is a separate Keyspace in Cassandra. |
| Locking Scope | BaseFolder:StoreName. Locks are isolated to the specific database folder; two stores with the same name in different folders do not block each other. | Keyspace:StoreName. Locks are isolated to the specific Keyspace; two stores with the same name in different keyspaces do not block each other. |
| Concurrency | High. Operations on different databases (folders) are completely independent. | High. Operations on different keyspaces are completely independent. |
| Metadata Storage | Custom high-performance Hash Map on disk. | Cassandra Tables (store, registry, etc.). |
| Data Storage | Filesystem Blobs. | Filesystem Blobs. |
| Coordination | Redis (Distributed) or In-Memory (Standalone). | Redis. |

Isolation & Locking Details

SOP uses Redis (in distributed mode) to manage transaction locks. The key design principle is that locking is scoped to the logical database.

This architecture ensures that SOP can host thousands of independent databases (tenants) on the same infrastructure without lock contention between them.
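Database-scoped locking boils down to how lock keys are composed. This sketch mirrors the BaseFolder:StoreName and Keyspace:StoreName schemes from the table above; the exact key format is an assumption for illustration:

```go
package main

import "fmt"

// lockKey scopes a lock to the logical database that owns the store.
// The "database:store" format here is illustrative, not SOP's literal
// Redis key layout.
func lockKey(database, store string) string {
	return fmt.Sprintf("%s:%s", database, store)
}

func main() {
	// Two stores with the same name in different databases produce
	// different lock keys, so they never contend with each other.
	a := lockKey("/data/tenantA", "orders")
	b := lockKey("/data/tenantB", "orders")
	fmt.Println(a, b, a == b) // the keys differ
}
```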

Reliability & Self-Healing

SOP incorporates advanced mechanisms to ensure data integrity and system stability, particularly in distributed environments where infrastructure components like Redis may restart or fail.

Redis Restart Detection (Clustered Mode)

In Clustered mode, SOP relies on Redis for transaction locking and coordination. A Redis restart can lose volatile lock information, leaving transactions in an indeterminate state. To mitigate this, SOP implements a “Not Restarted” Token mechanism:

  1. The Token: A special volatile key (notrestarted) is maintained in Redis with a sliding expiration (TTL).
  2. Detection: The background servicer (onIdle) periodically checks for this token.
    • Presence: If the token exists, Redis is stable.
    • Absence: If the token is missing (e.g., after a restart), the system infers a potential restart event.
  3. Action: Upon detecting a restart, the system triggers a Lock Resurrection process. It scans for incomplete transactions (via Priority Logs) and re-acquires the necessary locks to allow those transactions to either complete or roll back safely.

Transaction Lifecycle Management

This multi-layered approach ensures that SOP databases remain “rock solid” and self-healing, minimizing the need for manual administrative intervention.

Future Optimization Roadmap

As SOP scales to trillions, or even hundreds of trillions, of items, the current linear chaining of registry segment files (while effective and simple) presents an opportunity for optimization.