# Scalable Objects Persistence
This document outlines the high-level architecture of the Scalable Objects Persistence (SOP) library, focusing on the package structure and the design decisions behind public vs. internal components.
SOP follows a strict separation between public APIs and internal implementation details to ensure a stable and safe developer experience.
Here is an overview of the project’s folder structure to help you navigate the codebase:
```
sop/
├── adapters/      # Interface adapters (Redis, Cassandra)
├── ai/            # AI/Vector database modules & Python bindings
├── bindings/      # Cross-language bindings (Python)
├── btree/         # Core B-Tree data structure implementation
├── cache/         # Caching interfaces and implementations
├── database/      # High-level Database API (Entry point)
├── fs/            # Filesystem registry & I/O
├── incfs/         # Hybrid backend (Cassandra + Filesystem)
├── infs/          # Standard backend (Filesystem only)
├── inmemory/      # In-memory backend for standalone mode
├── internal/      # Internal implementation details (hidden)
├── jsondb/        # JSON document store
├── restapi/       # REST API server example
├── search/        # Search engine implementation (BM25)
└── streamingdata/ # Large object (BLOB) streaming support
```
These packages are intended for direct use by consumers of the library:
- github.com/sharedcode/sop/infs: The primary and recommended backend. It uses the local filesystem for both metadata (via a high-performance hashmap) and data. Redis is used strictly for caching and coordination (locking), not for data persistence.
- github.com/sharedcode/sop/incfs: The "Hybrid" backend. It combines Cassandra (for the metadata Registry) with the filesystem (for data blobs).
- github.com/sharedcode/sop/internal/inredck: Internal plumbing shared by incfs and streamingdata. The low-level machinery (Cassandra registry and blob management) is hidden behind clean, high-level interfaces (incfs).

Design guidelines:

- The backends (infs vs incfs) share common interfaces where possible.
- When modifying internal/inredck, be aware that changes here can affect both incfs and streamingdata. Always run the full integration test suite (SOP_RUN_INCFS_IT=1) after modifications.
- New features should surface through incfs or streamingdata, delegating to internal packages for the heavy lifting.

## Caching Architecture

SOP employs a multi-tiered caching strategy designed to balance high performance with strict consistency, preventing stale reads even in high-throughput scenarios.
In a distributed system with local caching (L1), a common risk is Local Staleness: a machine's L1 cache can keep serving an old copy of a node after another machine has already committed a newer version.
SOP solves this by utilizing a technique inspired by Swarm Intelligence “Pheromone” algorithms. Instead of synchronizing the bulky data (L1 Cache) across the swarm, we only synchronize the tiny navigational signals (Registry Handles).
The flow works as follows:

1. Every version of a node gets its own physical ID (e.g., NodeID_v1, then NodeID_v2).
2. A reader first consults the Registry, which resolves the logical node (Node X) to its current physical ID, NodeID_v2.
3. If NodeID_v2 is in L1, it is returned (Fast); otherwise it is fetched from the backend and cached.
4. When a writer commits v2, the Registry is updated to point Node X -> NodeID_v2.
5. Another machine's L1 may still hold the stale NodeID_v1, but its next Registry lookup says "fetch NodeID_v2."
6. That reader then looks for NodeID_v2 in L1. It won't find it (or will find the correct new data). It will never accidentally return NodeID_v1, because the Registry handle didn't ask for it.

Swarm Architecture Benefit: This design eliminates the need for stressful messaging, broadcast invalidations, or heavy L1-L2 synchronization protocols. We intentionally do not cache Registry Handles in L1. By forcing this "tiny" indirect synchronization via the Registry (acting like a minimal pheromone trail), SOP allows the swarm to operate in a lightweight manner without the overhead of heavy cache coherence traffic.
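The flow above can be sketched in Go. This is a minimal illustration, not SOP's actual API: the `Registry`, `l1Cache`, and `fetchNode` names are hypothetical stand-ins for the registry-handle indirection described here.

```go
package main

import "fmt"

// Registry maps a logical node ID to its current physical (versioned) ID.
// It is the only shared, always-fresh signal; node payloads themselves
// are never synchronized across the swarm.
type Registry map[string]string

// l1Cache is a node-local cache keyed by *physical* ID.
var l1Cache = map[string][]byte{
	"NodeID_v1": []byte("old payload"), // stale entry left over from before the commit
}

// fetchNode resolves the logical ID through the Registry first, then
// consults L1 using the physical ID. A stale L1 entry can never be
// returned, because the Registry never hands out the old physical ID.
func fetchNode(reg Registry, logicalID string) ([]byte, bool) {
	physicalID := reg[logicalID] // tiny "pheromone" lookup (never cached in L1)
	payload, ok := l1Cache[physicalID]
	return payload, ok // on a miss, the caller fetches from the backend and fills L1
}

func main() {
	reg := Registry{"NodeX": "NodeID_v2"} // a writer has committed v2
	_, hit := fetchNode(reg, "NodeX")
	fmt.Println(hit) // false: NodeID_v1 is ignored even though it still sits in L1
}
```

The stale `NodeID_v1` entry simply ages out of L1; no invalidation message is ever sent.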
While the “Pheromone” synchronization ensures access to fresh data, SOP guarantees strict ACID compliance through a rigorous Two-Points-in-Time validation mechanism during the commit phase, effectively enforcing a “Theory of Relativity” for transactions.
Most distributed systems compromise on consistency to achieve speed, settling for “Eventual Consistency” (loosey-goosey state). SOP refuses this compromise.
A transaction records the version of each piece of data when it first reads it (Point A) and re-checks those versions at commit time (Point B). If the version at Point B matches the version at Point A, it proves that "Time Stood Still" for that transaction relative to the data it touched: no other actor interfered with the state. This mechanism guarantees Snapshot Isolation and strictly serializable behavior without the heavy locking overhead of traditional relational databases.
The Result: State-of-the-Art performance with enterprise-grade ACID guarantees. SOP delivers the speed of “Eventual Consistency” systems while strictly enforcing the correctness of a traditional RDBMS.
SOP supports two primary backends, each with a distinct architecture for handling metadata and data.
### Filesystem Backend (infs) - Recommended

Designed for distributed, high-scale environments as well as single-node deployments.
In benchmarks, infs performed 25% faster than incfs.

### Cassandra Backend (incfs)

An alternative backend for distributed environments that "powers up" your existing Cassandra infrastructure.
The flow of data during a Commit operation is similar for both backends, but the Commit Point—the moment the transaction becomes durable—differs.
### inredcfs (Hybrid) Flow

```mermaid
sequenceDiagram
    participant App
    participant SOP as SOP Transaction
    participant Redis as Redis (Lock/Cache)
    participant FS as Blob Store (FS)
    participant Cass as Registry (Cassandra)
    App->>SOP: Commit()
    rect rgb(240, 248, 255)
        note right of SOP: Phase 1: Prepare
        SOP->>Redis: Acquire Locks (Rows/Items)
        SOP->>SOP: Conflict Detection
        SOP->>FS: Write "Dirty" Nodes (WAL/Temp)
    end
    rect rgb(255, 250, 240)
        note right of SOP: Phase 2: Commit
        SOP->>Cass: Update Registry (Virtual ID -> New Physical Location)
        SOP->>Redis: Update Cache (New Nodes)
        SOP->>Redis: Release Locks
    end
    rect rgb(240, 255, 240)
        note right of SOP: Cleanup (Async)
        SOP->>FS: Delete Old/Obsolete Nodes
    end
    SOP-->>App: Success
```
### infs (Filesystem) Flow

The flow is identical to the above, except Cassandra is replaced by the Filesystem Registry.
The registry hashmap file is updated atomically (with fsync) to point to the new node locations.

SOP stores item metadata (such as the Version, Deleted flags, or CentroidID) directly in the B-Tree node. This allows structural operations to be performed by scanning Keys only, avoiding the I/O cost of fetching large Data Blobs (Values).

The Commit Point differs per backend:

- inredcfs: The commit point is the atomic update of the Registry in Cassandra. Once the registry row is updated to point to the new blob location, the transaction is durable.
- infs: The commit point is the atomic update of the Registry hashmap on the Filesystem.

## Dual-View Architecture

SOP employs a unique "Dual-View" Architecture that decouples the storage format from the runtime representation, achieving high performance for both strongly-typed and dynamic use cases.
The underlying storage format for B-Tree nodes and items is JSON (via DefaultMarshaler). This neutral format on disk allows data to be agnostic of the consuming application’s type system.
SOP avoids the common "double conversion" penalty found in many ORMs or hybrid systems:

- Typical hybrid system: Disk Bytes -> Generic Map -> Strong Struct (double allocation/conversion).
- SOP application path: Disk Bytes -> Strong Struct (direct json.Unmarshal).
- SOP data-manager path: Disk Bytes -> map[string]any (direct json.Unmarshal).

The Data Manager reads the exact same bytes as the Application but requests a map[string]any. The JSON decoder constructs the map directly, bypassing the need for the original Struct definition. This allows administrative tools to view and manipulate data without needing the application's source code or type definitions.
### JsonDBMapKey: The Intelligence Layer

While map[string]any provides flexibility, it lacks inherent ordering. JsonDBMapKey bridges this gap by injecting a Proxy Comparer into the B-Tree.
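A proxy comparer in this spirit might look like the following sketch. The `MapKey` and `compareBy` names are hypothetical (not JsonDBMapKey's actual API), and it assumes string-typed key fields for brevity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MapKey is a dynamic, schema-less key, as a stand-in for a JSON document key.
type MapKey = map[string]any

// compareBy returns a comparer that imposes a total order on MapKeys by the
// given fields, which is what a B-Tree needs to place and find items.
func compareBy(fields ...string) func(a, b MapKey) int {
	return func(a, b MapKey) int {
		for _, f := range fields {
			av, _ := a[f].(string)
			bv, _ := b[f].(string)
			if c := strings.Compare(av, bv); c != 0 {
				return c
			}
		}
		return 0 // all compared fields equal
	}
}

func main() {
	keys := []MapKey{
		{"last": "Turing", "first": "Alan"},
		{"last": "Hopper", "first": "Grace"},
	}
	cmp := compareBy("last", "first")
	sort.Slice(keys, func(i, j int) bool { return cmp(keys[i], keys[j]) < 0 })
	fmt.Println(keys[0]["last"]) // Hopper
}
```

The comparer is injected once per store, so the B-Tree itself stays agnostic of the key's shape.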
## Deployment Modes

SOP is designed to run in two distinct modes, catering to different scale requirements.
### Clustered Mode (infs) - Recommended

Multiple processes or machines share the same database, coordinating through Redis, using infs or incfs.

### Standalone Mode

A single process runs with in-memory coordination, using infs (or inmemory for pure RAM).

## AI Agent Architecture

SOP introduces a novel architecture for AI Agents, distinguishing itself from standard "RAG" or "Chatbot" implementations by leveraging the B-Tree as the central nervous system.
Unlike systems that rely on vector databases or flat text files for memory, SOP treats Memory as a Database System.
- Knowledge as a Store: The Agent's long-term knowledge lives in a B-Tree store (e.g., llm_knowledge). This ensures that the Agent's knowledge base is transactional, ordered, and scalable ($O(\log N)$ retrieval). The Agent can safely update its own mind (Self-Learning) without corruption.
- Structured Conversation: Chat history is modeled as a structured store (e.g., ConversationThread), not a flat list of tokens. This gives the Agent "Executive Function": the ability to track topics, manage context switches, and maintain a rigorous "Train of Thought" separate from the raw chat history.

### Scripting Engine

The SOP Scripting Engine (ai/agent) follows a unique "Explicit Execution" design pattern.
It combines deterministic operations (e.g., scan, filter) with Probabilistic Reasoning (e.g., ask).

## Backend Comparison

When choosing a backend, it is crucial to understand how they handle isolation, locking, and multi-tenancy. Both backends support high concurrency, but their locking scopes differ.
| Feature | FileSystem (infs) | Cassandra (incfs) |
|---|---|---|
| Primary Use Case | High-performance distributed or local clusters. | Environments with existing Cassandra infrastructure. |
| Multi-Tenancy | Directory-Based: Each database is a separate folder on disk. | Keyspace-Based: Each database is a separate Keyspace in Cassandra. |
| Locking Scope | `BaseFolder:StoreName`. Locks are isolated to the specific database folder. Two stores with the same name in different folders do not block each other. | `Keyspace:StoreName`. Locks are isolated to the specific Keyspace. Two stores with the same name in different keyspaces do not block each other. |
| Concurrency | High. Operations on different databases (folders) are completely independent. | High. Operations on different keyspaces are completely independent. |
| Metadata Storage | Custom high-performance Hash Map on disk. | Cassandra Tables (store, registry, etc.). |
| Data Storage | Filesystem Blobs. | Filesystem Blobs. |
| Coordination | Redis (Distributed) or In-Memory (Standalone). | Redis. |
SOP uses Redis (in distributed mode) to manage transaction locks. The key design principle is that locking is scoped to the logical database.
- infs: Lock keys are scoped by the StorePath (the folder path). Given two databases db1/users and db2/users, a transaction on db1/users acquires a lock on db1:users. It will never block a transaction on db2/users.
- incfs: Lock keys are scoped by the Keyspace. Given keyspaceA.users and keyspaceB.users, a transaction on keyspaceA acquires a lock on keyspaceA:users. It will never block keyspaceB.

This architecture ensures that SOP can host thousands of independent databases (tenants) on the same infrastructure without lock contention between them.
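The scoping rule amounts to prefixing every lock key with the logical database. A minimal sketch (helper names are illustrative, not SOP's actual functions):

```go
package main

import "fmt"

// fsLockKey builds an infs lock key: scoped by the database folder, so
// same-named stores in different folders can never contend.
func fsLockKey(basePath, store string) string {
	return basePath + ":" + store
}

// cassLockKey builds an incfs lock key: scoped by the Cassandra keyspace.
func cassLockKey(keyspace, store string) string {
	return keyspace + ":" + store
}

func main() {
	fmt.Println(fsLockKey("db1", "users"))         // db1:users
	fmt.Println(fsLockKey("db2", "users"))         // db2:users (independent lock)
	fmt.Println(cassLockKey("keyspaceA", "users")) // keyspaceA:users
}
```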
SOP incorporates advanced mechanisms to ensure data integrity and system stability, particularly in distributed environments where infrastructure components like Redis may restart or fail.
In Clustered mode, SOP relies on Redis for transaction locking and coordination. A Redis restart could potentially lose volatile lock information, leaving transactions in an indeterminate state. To mitigate this, SOP implements a “Not Restarted” Token mechanism:
- A token (notrestarted) is maintained in Redis with a sliding expiration (TTL).
- A background routine (onIdle) periodically checks for this token.
- If the token is missing, SOP concludes that Redis may have restarted and that volatile lock state cannot be trusted, and triggers recovery.
This multi-layered approach ensures that SOP databases remain “rock solid” and self-healing, minimizing the need for manual administrative intervention.
As SOP scales to handle trillion to hundreds of trillions of items, the current linear chaining of registry segment files (while effective and simple) presents an opportunity for optimization.
Currently, segment files (registry-1, registry-2, …) are created by linear chaining as needed, and lookup across the chain is linear, $O(N)$ in the number of segments.
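The current behavior amounts to probing segments in order, as in this sketch (the `segment` and `lookup` names are illustrative, not the registry's real layout):

```go
package main

import "fmt"

// segment models one registry segment file: key -> handle.
type segment map[string]string

// lookup probes the segment chain in order until the key is found,
// giving O(N) cost in the number of segments.
func lookup(segments []segment, key string) (string, bool) {
	for _, seg := range segments {
		if h, ok := seg[key]; ok {
			return h, ok
		}
	}
	return "", false
}

func main() {
	segs := []segment{
		{"a": "h1"},            // registry-1
		{"b": "h2", "c": "h3"}, // registry-2
	}
	h, _ := lookup(segs, "c")
	fmt.Println(h) // h3
}
```

An index over segments (or key-range partitioning) would cut this to sub-linear cost, which is the optimization opportunity noted above.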