sop

Scalable Objects Persistence


Project maintained by SharedCode

Configuration & Tuning Guide

This guide details the configuration options available in SOP and provides recommendations for tuning your stores for different workloads.

StoreOptions Reference

The StoreOptions struct is the primary way to configure a B-Tree store.

| Field | Type | Description | Default / Recommendation |
|---|---|---|---|
| Name | string | Short name of the store. Must be unique within the repository. | Required. |
| SlotLength | int | Number of items stored in a single B-Tree node. | Default: 2000. Max: 10,000. Higher values improve density. Trade-off: Larger nodes increase memory usage (L1/L2 cache). Read & Write latency is generally unaffected due to EC striping (parallel I/O). 4,000 is a recommended balance for high scale. |
| IsUnique | bool | Enforces uniqueness of keys. | true for primary keys, false for non-unique indexes. |
| IsValueDataInNodeSegment | bool | Stores the Value directly inside the B-Tree node. | Best for Small Data (< 1KB). Improves locality. If false, stores Value in a separate file/blob. |
| IsValueDataActivelyPersisted | bool | If true, persists the Value to a separate file immediately upon Add. | Best for Big Data & Streaming. Prevents large values from bloating the B-Tree structure. Makes commit faster as data is already persisted. |
| IsValueDataGloballyCached | bool | Caches the Value in Redis. | true for read-heavy workloads. false for write-heavy or very large data. |
| LeafLoadBalancing | bool | Checks siblings for space before splitting a node. | false (default). Set to true to save space at the cost of insert latency. |
| BlobStoreBaseFolderPath | string | Base path for the filesystem blob store. | Required for infs / incfs. |
| CELexpression | string | CEL expression used for custom key comparison/sorting. | Optional. |
| MapKeyIndexSpecification | string | JSON specification for compound indexes on Map keys. | Optional. |

Active Persistence Optimization

SOP features a unique optimization for handling large data (e.g., media files, GBs/TBs of data) called Active Persistence.
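
As a rough illustration of how the StoreOptions fields combine, the sketch below uses a local stand-in struct that mirrors some of the documented field names. It is not the real `sop.StoreOptions` definition, and the `Normalize` helper is hypothetical. Treating a zero SlotLength as "use the default" is an assumption made for the example; only the default (2000) and maximum (10,000) come from the reference table.

```go
package main

import (
	"errors"
	"fmt"
)

// StoreOptions is a local stand-in mirroring documented field names.
// It is NOT the actual sop.StoreOptions type.
type StoreOptions struct {
	Name                         string
	SlotLength                   int
	IsUnique                     bool
	IsValueDataInNodeSegment     bool
	IsValueDataActivelyPersisted bool
	IsValueDataGloballyCached    bool
	LeafLoadBalancing            bool
}

const (
	defaultSlotLength = 2000  // documented default
	maxSlotLength     = 10000 // documented maximum
)

// Normalize (hypothetical helper) fills in the documented default and
// rejects values outside the documented limits.
func Normalize(o StoreOptions) (StoreOptions, error) {
	if o.Name == "" {
		return o, errors.New("store name is required")
	}
	if o.SlotLength == 0 {
		// Assumption: zero means "not set".
		o.SlotLength = defaultSlotLength
	}
	if o.SlotLength < 0 || o.SlotLength > maxSlotLength {
		return o, fmt.Errorf("slot length %d is outside 1..%d", o.SlotLength, maxSlotLength)
	}
	return o, nil
}

func main() {
	// Small values (< 1KB): keep them inside the B-Tree node for locality.
	small, err := Normalize(StoreOptions{
		Name:                      "user_profiles",
		IsUnique:                  true,
		IsValueDataInNodeSegment:  true,
		IsValueDataGloballyCached: true,
	})
	fmt.Println(small.Name, small.SlotLength, err)

	// Large values: persist them to a separate file on Add.
	big, err := Normalize(StoreOptions{
		Name:                         "media",
		SlotLength:                   4000,
		IsUnique:                     true,
		IsValueDataActivelyPersisted: true,
	})
	fmt.Println(big.Name, big.SlotLength, err)
}
```

The store names `user_profiles` and `media` are placeholders chosen for the example.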

Performance Tuning

Workload: Read-Heavy (e.g., User Profiles, Product Catalog)

Workload: Write-Heavy (e.g., Event Logging, IoT Telemetry)

Workload: Large Objects (e.g., Images, Documents)

Physical Storage & Redundancy

SOP uses a dual-layer approach to storage configuration so that system-critical metadata and high-volume data are handled separately.

1. Registry Redundancy (StoresFolders)

The StoresFolders option (found in DatabaseOptions or global config) defines the root partitions for the Database Registry and System Tables.

2. Data Striping & Reliability (Erasure Coding)

For the actual B-Trees and BLOB data (User Data), SOP uses Erasure Coding.

Batch Size

When performing bulk operations (e.g., UpdateMany, RemoveMany), SOP processes items in batches.
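
To make the idea of batch processing concrete, here is a minimal, generic sketch of splitting a list of items into fixed-size batches. It does not reproduce SOP's internal batching; the `Batches` helper and the batch size of 3 are illustrative assumptions, and the guide does not state SOP's actual batch size or how it is configured.

```go
package main

import "fmt"

// Batches (hypothetical helper) splits items into consecutive slices
// holding at most size elements each. The last batch may be shorter.
func Batches[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	keys := []int{1, 2, 3, 4, 5, 6, 7}
	for i, b := range Batches(keys, 3) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```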

Cache Configuration

SOP supports pluggable caching backends.

Cache Factory

You can configure the global cache factory to switch between Redis (distributed) and In-Memory (standalone) modes.

import "github.com/sharedcode/sop"

func init() {
    // Option 1: Redis (default). Requires a running Redis server.
    sop.SetCacheFactory(sop.Redis)

    // Option 2: In-Memory. No external dependencies.
    // Use this instead of the call above, not in addition to it.
    // sop.SetCacheFactory(sop.InMemory)
}

Store Cache Config

The StoreCacheConfig struct controls how data is cached within the chosen backend.

| Field | Type | Description |
|---|---|---|
| RegistryCacheDuration | time.Duration | TTL for registry entries (Virtual ID -> Physical Location). |
| StoreInfoCacheDuration | time.Duration | TTL for store metadata. |
| NodeCacheDuration | time.Duration | TTL for B-Tree nodes. |
| ValueDataCacheDuration | time.Duration | TTL for value data (if stored separately). |
| IsNodeCacheTTL | bool | If true, accessing a node extends its cache TTL (Sliding Window). |
| IsValueDataCacheTTL | bool | If true, accessing a value extends its cache TTL. |

Registry Partitioning & Tuning

SOP uses a “Registry” to map logical IDs (UUIDs) to physical file locations. This registry is partitioned into multiple “Segment Files” to manage file sizes and concurrency.

Registry Hash Mod

The RegistryHashModValue determines the granularity of this partitioning. This value is configured per database (per StoresBaseFolder) via TransactionOptions.

Scaling & File Handles: SOP automatically allocates additional segment files (e.g., registry-1.reg, registry-2.reg) as needed.
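
One way to picture hash-mod partitioning is as a function from a logical ID to a sector offset inside a segment file. The sketch below is an assumption-laden illustration, not SOP's code: the `SectorOffset` helper is hypothetical, FNV-1a stands in for whatever hash SOP actually uses, and the 4096-byte sector size is taken from the "250 * 4096" file-size figure in this guide's capacity table.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sectorSize is taken from the "250 * 4096" sizing in the capacity table.
const sectorSize = 4096

// SectorOffset (hypothetical) maps a logical ID to the byte offset of its
// hash bucket within one segment file. FNV-1a is a stand-in hash; the hash
// SOP actually uses is not specified in this guide.
func SectorOffset(id string, hashMod uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id))
	sector := h.Sum64() % hashMod // bucket index in [0, hashMod)
	return sector * sectorSize
}

func main() {
	// Placeholder IDs; real SOP IDs are UUIDs.
	for _, id := range []string{"id-a", "id-b", "id-c"} {
		fmt.Printf("%s -> offset %d\n", id, SectorOffset(id, 250))
	}
}
```

A larger hash mod value yields more buckets per file, which is why it raises per-file capacity at the cost of a larger file.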

Capacity Planning Table

The following table estimates the storage capacity for a single Registry Segment File based on the RegistryHashModValue.

Assumptions:

| Hash Mod Value | Segment File Size (Disk) | Estimated Capacity (Key/Value Pairs) |
|---|---|---|
| 250 (Default) | ~1 MB (250 * 4096) | 77,500,000 (77.5 Million) |
| 500 | ~2 MB | 155,000,000 (155 Million) |
| 10,000 | ~41 MB | 3,100,000,000 (3.1 Billion) |
| 100,000 | ~410 MB | 31,000,000,000 (31 Billion) |
| 400,000 | ~1.6 GB | 124,000,000,000 (124 Billion) |
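
The rows above scale linearly with the hash mod value, so they can be reproduced with two multiplications. In this sketch the 4096-byte sector size comes from the "250 * 4096" note in the default row, and the 310,000 items-per-unit constant is simply 77,500,000 / 250 derived from that same row; neither function is part of SOP.

```go
package main

import "fmt"

const (
	sectorBytes  = 4096    // from the "250 * 4096" note in the default row
	itemsPerUnit = 310_000 // 77,500,000 / 250, derived from the default row
)

// SegmentFileBytes estimates the on-disk size of one segment file.
func SegmentFileBytes(hashMod int64) int64 { return hashMod * sectorBytes }

// EstimatedCapacity estimates the key/value pairs one segment file can index.
func EstimatedCapacity(hashMod int64) int64 { return hashMod * itemsPerUnit }

func main() {
	for _, m := range []int64{250, 500, 10_000, 100_000, 400_000} {
		fmt.Printf("mod=%d size=%.1f MB capacity=%d\n",
			m, float64(SegmentFileBytes(m))/1e6, EstimatedCapacity(m))
	}
}
```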

Capacity Planning (Max Density)

The following table shows the theoretical maximum capacity, assuming a Slot Length of 20,000 and a typical B-Tree load factor of 68%.

Assumptions:

| Hash Mod Value | Segment File Size (Disk) | Estimated Capacity (Key/Value Pairs) |
|---|---|---|
| 250 (Default) | ~1 MB | 224,400,000 (224.4 Million) |
| 500 | ~2 MB | 448,800,000 (448.8 Million) |
| 10,000 | ~41 MB | 8,976,000,000 (8.976 Billion) |
| 100,000 | ~410 MB | 89,760,000,000 (89.76 Billion) |
| 400,000 | ~1.6 GB | 359,040,000,000 (359.04 Billion) |
| 750,000 (Max) | ~3 GB | 673,200,000,000 (673.2 Billion) |
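
The max-density figures can be decomposed into a per-node item count times a per-sector multiplier. A Slot Length of 20,000 at a 68% load factor gives 13,600 items per node; dividing the default row by 250 and by 13,600 leaves a factor of 66 per sector. That factor of 66 is back-solved from the table and is an inference, not a value stated in this guide, and `MaxDensityCapacity` is a hypothetical helper.

```go
package main

import "fmt"

// perSector is back-solved from the table: 224,400,000 / 250 / 13,600 = 66.
// It is an inference, not a documented SOP constant.
const perSector = 66

// MaxDensityCapacity estimates items per segment file for a given hash mod
// value, slot length, and load factor expressed as a whole percentage.
func MaxDensityCapacity(hashMod, slotLength, loadFactorPct int64) int64 {
	itemsPerNode := slotLength * loadFactorPct / 100 // 20,000 * 68% = 13,600
	return hashMod * perSector * itemsPerNode
}

func main() {
	for _, m := range []int64{250, 500, 10_000, 100_000, 400_000, 750_000} {
		fmt.Printf("mod=%d capacity=%d\n", m, MaxDensityCapacity(m, 20_000, 68))
	}
}
```

Integer percentages are used instead of floating point so the results match the table exactly.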

Note on Horizontal Scaling: The capacity figures above apply to a single registry segment file. When a “sector” (which serves as a hash bucket) within a segment file becomes full, SOP automatically allocates a new segment file (e.g., registry-2.reg). The total capacity scales linearly with the number of files.

  • Example: If your usage requires 5 segment files, your total capacity is 5x the figures shown in the table.

Performance Constraint: It is recommended to limit the number of segment files to 5-10 at most.

  • Reasoning: Segment files are traversed sequentially (like a linked list) when searching for a Virtual ID. Searching for an ID could require visiting up to N files in the worst case (where N is the number of segments).
  • Warning: If you use a small RegistryHashModValue for billions of items, the system will generate many segment files, causing registry lookups to consume excessive IOPS.
  • Best Practice: Fine-tune the RegistryHashModValue and B-Tree SlotLength to accommodate your target capacity within a minimal number of segment files.
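
The trade-off described above can be turned into a small planning calculation: given a target item count, estimate how many segment files a hash mod value implies, or find the smallest value that stays within a segment-file budget. This sketch reuses the 310,000 items-per-unit figure derived from the first capacity table (77,500,000 / 250) and assumes capacity scales linearly with perfectly even hashing. Because a new file is allocated as soon as any one sector fills, real deployments would likely need headroom. Both helpers are hypothetical.

```go
package main

import "fmt"

// itemsPerUnit is derived from the first capacity table: 77,500,000 / 250.
const itemsPerUnit = 310_000

func ceilDiv(a, b int64) int64 { return (a + b - 1) / b }

// SegmentsNeeded estimates how many segment files (and so the worst-case
// number of files visited per lookup) a target item count implies.
func SegmentsNeeded(targetItems, hashMod int64) int64 {
	return ceilDiv(targetItems, hashMod*itemsPerUnit)
}

// MinHashMod returns the smallest hash mod value that fits targetItems
// within at most maxSegments segment files.
func MinHashMod(targetItems, maxSegments int64) int64 {
	return ceilDiv(targetItems, maxSegments*itemsPerUnit)
}

func main() {
	const target = 1_000_000_000 // 1 billion items
	fmt.Println("segments at default 250:", SegmentsNeeded(target, 250))
	mod := MinHashMod(target, 5)
	fmt.Println("min hash mod for <= 5 segments:", mod)
	fmt.Println("segments at that value:", SegmentsNeeded(target, mod))
}
```

Under these assumptions, one billion items at the default value of 250 would need 13 segment files, which exceeds the recommended limit, while a value of 646 keeps the count at 5.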

Alternative Optimization: Slot Length

Instead of increasing RegistryHashModValue, you can also optimize for large datasets by increasing the B-Tree SlotLength.

Tuning Guidelines

| Scenario | Recommendation | Rationale |
|---|---|---|
| Small to Medium Datasets (< 100M items) | Default (250) | Keeps segment files small (~1MB), minimizing I/O overhead for partial updates. |
| Large Datasets (> 1B items) | Increase (e.g., 1000 - 5000) | Creates larger segment files (4MB - 20MB). Reduces the total number of files on disk, which is better for filesystem performance and backup operations. |
| High Concurrency | Moderate (500) | Balances file size with lock contention (though SOP uses row-level locking, file handles are still a resource). |

Note: Changing RegistryHashModValue after a store has been created is not supported and will result in data inaccessibility. This value must be set once during the initial creation of the repository.