Merged
275 changes: 274 additions & 1 deletion Cargo.lock

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions Makefile
@@ -130,3 +130,15 @@ publish-pypi: $(MATURIN) build-wheels
 publish-crate:
 	@echo "Publishing sandd daemon to crates.io..."
 	cargo publish --package sandd
+
+.PHONY: benchmark
+benchmark: $(MATURIN)
+	@echo "Running benchmarks for sandd daemon..."
+	@echo ""
+	cargo bench --package sandd
+
+.PHONY: benchmark-results
+benchmark-results:
+	@echo "Benchmark results for sandd daemon:"
+	@echo ""
+	open target/criterion/report/index.html
5 changes: 3 additions & 2 deletions README.md
@@ -148,13 +148,14 @@ server = Server(connect="tunnel", tunnel_config=config)
# ✓ No public IPs required
```

-See [Tunnel Mode Guide](./docs/TUNNEL.md) for setup instructions.
+See [Tunnel Mode Guide](./docs/proposals/TUNNEL.md) for setup instructions.

 ## Documentation

 - [Quick Start Guide](./docs/QUICKSTART.md)
 - [Architecture Details](./docs/ARCHITECTURE.md)
-- [Protocol Specification](./docs/PROTOCOL.md)
+- [Protocol Specification](./docs/proposals/PROTOCOL.md)
+- [Tunnel Mode Guide](./docs/proposals/TUNNEL.md)
 - [Development Guide](./docs/DEVELOP.md)
 - [Examples](./examples)

4 changes: 2 additions & 2 deletions docs/DEVELOP.md
@@ -310,7 +310,7 @@ Include motivation and context.

 WebSocket-based JSON protocol for agent-daemon communication.

-For complete protocol specification, see [PROTOCOL.md](PROTOCOL.md).
+For complete protocol specification, see [proposals/PROTOCOL.md](proposals/PROTOCOL.md).

 ## Resources

@@ -322,6 +322,6 @@ For complete protocol specification, see [PROTOCOL.md](PROTOCOL.md).
 ## Questions?

 - Check [ARCHITECTURE.md](ARCHITECTURE.md) for design details
-- Check [PROTOCOL.md](PROTOCOL.md) for protocol specification
+- Check [proposals/PROTOCOL.md](proposals/PROTOCOL.md) for protocol specification
 - Check [STATUS.md](STATUS.md) for implementation status
 - Check [QUICKSTART.md](QUICKSTART.md) for usage examples
File renamed without changes.
279 changes: 279 additions & 0 deletions docs/proposals/SNAPSHOTS.md
@@ -0,0 +1,279 @@
# SandD Snapshot System

## Overview

A Git-inspired snapshot system for capturing and restoring workspace state in agent sandboxes. This is a **pure snapshot system**, not version control: it focuses on capturing and restoring state rather than tracking changes over time.

## Key Features

- **Hierarchical trees**: Efficient for large projects (100k+ files)
- **Content-addressable storage**: Automatic deduplication via BLAKE3 hashing
- **Cross-platform**: Works on Linux, macOS, Windows without special privileges
- **Tag-based filtering**: Organize snapshots with multiple tags
- **Independent snapshots**: No parent chains; each snapshot stands alone

---

## Similar Systems

This design takes inspiration from:

- **VM Snapshots** (VMware/VirtualBox): State capture/restore
- **ZFS/Btrfs Snapshots**: Filesystem-level snapshots
- **Docker Layers**: Image layers with content addressing
- **Time Machine**: Point-in-time backups

We use Git's storage model (hierarchical trees, content-addressable objects) but with snapshot semantics: no history, branches, or other version control features.

---

## Architecture Overview

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      SandD Daemon                       │
│  ┌────────────────────────────────────────────────────┐ │
│  │           Snapshot Manager (Public API)            │ │
│  └────────────────┬───────────────────────────────────┘ │
│                   │                                     │
│  ┌────────────────┴───────────────────────────────────┐ │
│  │                 Object Store (CAS)                 │ │
│  │  - Store blobs by content hash                     │ │
│  │  - Retrieve blobs by hash                          │ │
│  │  - Automatic deduplication                         │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
┌────────────────────────────────┐
│  Filesystem Storage            │
│  .snapshots/                   │
│  ├── objects/                  │
│  │   ├── ab/                   │
│  │   │   └── cdef123...        │
│  │   └── 12/                   │
│  │       └── 3456...           │
│  ├── snapshots/                │
│  │   ├── snap-uuid-1.json      │
│  │   └── snap-uuid-2.json      │
│  └── HEAD                      │
└────────────────────────────────┘
```

**Note:** `ab/` and `12/` are subdirectories named after the first 2 characters of content hashes. This is explained in detail below.

### Storage Model (Git-Inspired)

**Content-Addressable Storage:**
- Files stored by BLAKE3 hash (64 hex characters, e.g., `abc123def456...`)
- Automatic deduplication (same content = same hash = stored once)
- Immutable objects (never modified after creation)
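
The deduplication property can be illustrated with a minimal in-memory sketch. `MemoryStore` and `content_key` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 only so the example has no dependencies; it is not cryptographic and produces a 16-character key rather than 64.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE3: a 64-bit std hash rendered as 16 hex chars.
/// NOT cryptographic; a real store would use a 32-byte BLAKE3 digest.
fn content_key(data: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory object store keyed by content hash.
#[derive(Default)]
pub struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    /// Store `data` and return its key. Identical content maps to the
    /// same key, so it is kept only once (deduplication).
    pub fn put(&mut self, data: &[u8]) -> String {
        let key = content_key(data);
        // Objects are immutable: an existing entry is never overwritten.
        self.objects
            .entry(key.clone())
            .or_insert_with(|| data.to_vec());
        key
    }

    /// Look up an object by its content key.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }

    /// Number of distinct objects currently stored.
    pub fn len(&self) -> usize {
        self.objects.len()
    }
}
```

Storing the same bytes twice returns the same key and leaves the object count unchanged, which is the behavior the list above describes.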

**Hash-Based Directory Sharding:**

To keep directory lookups fast (many filesystems slow down once a single directory holds more than roughly 10k files), objects are split into subdirectories named after the **first 2 characters** of their hash:

```
Hash: abc123def456789... (64 chars)
      ↑↑ ↑↑↑↑↑↑↑↑↑↑↑↑↑
      │  └─ Filename
      └─ Subdirectory name

Stored as: objects/ab/c123def456789...
                   ↑↑ ↑↑↑↑↑↑↑↑↑↑↑↑↑
                   │  └─ Rest of hash (62 chars)
                   └─ First 2 chars (256 possible: 00-ff)
```

**Why this works:**
- BLAKE3 hashes are uniformly distributed (cryptographic property)
- First 2 hex chars = 16² = 256 possible subdirectories (00, 01, ..., fe, ff)
- 10,000 objects = ~39 objects per subdirectory (10000/256)
- Well-established pattern (Git's loose object store uses the same two-character fan-out, and other content-addressed stores shard in similar ways)

**Example:**

```
File: main.rs
Content: "fn main() {}"
Hash: ab7c3ef21a9b4d5e6f8a1c2d3e4f5a6b...
Stored at: objects/ab/7c3ef21a9b4d5e6f8a1c2d3e4f5a6b...
                   ↑↑ ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
                   │  Remaining 62 characters
                   First 2 characters
```
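
The sharding rule can be sketched as a small path-mapping function. `object_path` and `avg_objects_per_shard` are hypothetical helpers written for illustration, not part of the proposed API; the sketch assumes the hash is already available as a hex string.

```rust
use std::path::{Path, PathBuf};

/// Map a hex content hash to its sharded location under `objects_dir`.
/// Returns `None` if the hash is too short or is not ASCII hex.
pub fn object_path(objects_dir: &Path, hex_hash: &str) -> Option<PathBuf> {
    if hex_hash.len() < 3 || !hex_hash.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Splitting at byte 2 is safe because the string is all ASCII.
    let (shard, rest) = hex_hash.split_at(2);
    Some(objects_dir.join(shard).join(rest))
}

/// Average objects per shard for a two-character hex prefix
/// (16 * 16 = 256 shards), assuming uniformly distributed hashes.
pub fn avg_objects_per_shard(total_objects: u64) -> f64 {
    total_objects as f64 / 256.0
}
```

For example, `object_path(Path::new("objects"), "ab7c3ef2")` yields `objects/ab/7c3ef2`, and `avg_objects_per_shard(10_000)` is about 39, matching the estimate above.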

**Tree Structure:**
```
workspace/
├── src/
│   ├── main.rs     → Hash: ab7c3ef2...
│   └── lib.rs      → Hash: cd8e9f1a...
└── Cargo.toml      → Hash: 12a4b6c8...

Becomes:

objects/
├── ab/
│   └── 7c3ef2...   ← Blob: main.rs content
├── cd/
│   └── 8e9f1a...   ← Blob: lib.rs content
├── 12/
│   └── a4b6c8...   ← Blob: Cargo.toml content
├── ef/
│   └── aabbcc...   ← Tree: src/ directory structure (JSON)
└── 99/
    └── 887766...   ← Tree: root directory structure (JSON)

snapshots/snap-uuid.json → points to root tree (998877...)
```
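
To make the blob/tree relationship concrete, here is a rough sketch of walking from a root tree hash to every file it reaches, which is roughly what a restore has to do. The `Object` enum and `list_files` function are assumptions made for illustration; the proposal stores trees as JSON on disk, whereas this sketch keeps everything in a `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical object model: a blob holds file bytes, a tree maps
/// entry names to the hashes of child objects.
pub enum Object {
    Blob(Vec<u8>),
    Tree(Vec<(String, String)>), // (entry name, child hash)
}

/// Walk from `root_hash` and collect (path, blob hash) for every file.
/// Missing objects are silently skipped here; a real restore would
/// report them as errors.
pub fn list_files(
    store: &HashMap<String, Object>,
    root_hash: &str,
) -> Vec<(String, String)> {
    let mut files = Vec::new();
    // Explicit stack of (path prefix, tree hash) instead of recursion.
    let mut stack = vec![(String::new(), root_hash.to_string())];
    while let Some((prefix, hash)) = stack.pop() {
        if let Some(Object::Tree(entries)) = store.get(&hash) {
            for (name, child) in entries {
                let path = if prefix.is_empty() {
                    name.clone()
                } else {
                    format!("{}/{}", prefix, name)
                };
                match store.get(child) {
                    Some(Object::Tree(_)) => stack.push((path, child.clone())),
                    Some(Object::Blob(_)) => files.push((path, child.clone())),
                    None => {}
                }
            }
        }
    }
    // Sort for a stable, path-ordered result.
    files.sort();
    files
}
```

Because each snapshot only records a root tree hash, two snapshots of a mostly unchanged workspace share every unchanged blob and subtree.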

---

## Core API

```rust
pub struct SnapshotManager {
    store: ObjectStore,
    snapshots_dir: PathBuf,
}

impl SnapshotManager {
    /// Initialize snapshot manager
    pub fn new(root: PathBuf) -> Result<Self>;

    /// Create a snapshot of workspace
    pub async fn create_snapshot(
        &self,
        workspace: &Path,
        message: Option<String>,
        tags: Option<Vec<String>>,
    ) -> Result<String>; // Returns snapshot ID

    /// Restore snapshot to destination
    pub async fn restore_snapshot(
        &self,
        snapshot_id: &str,
        destination: &Path,
    ) -> Result<()>;

    /// List all snapshots (optionally filtered by tags)
    pub async fn list_snapshots(
        &self,
        filter_tags: Option<Vec<String>>,
    ) -> Result<Vec<SnapshotInfo>>;

    /// Find snapshots by tag
    pub async fn find_by_tag(&self, tag: &str) -> Result<Vec<SnapshotInfo>>;

    /// Get snapshot by ID
    pub async fn get_snapshot(&self, id: &str) -> Result<Snapshot>;

    /// Delete snapshot
    pub async fn delete_snapshot(&self, id: &str) -> Result<()>;
}
```
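
As a sketch of how the tag filter in `list_snapshots` might behave, the function below keeps snapshots that carry every requested tag. This is an assumption: the proposal does not say whether multiple filter tags mean "all" or "any". The `SnapshotInfo` fields shown are limited to the ones used elsewhere in this document (`id`, `message`, `tags`), and `filter_by_tags` is a hypothetical helper.

```rust
/// Hypothetical shape of the summary returned by `list_snapshots`;
/// only `id`, `message`, and `tags` appear in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotInfo {
    pub id: String,
    pub message: String,
    pub tags: Vec<String>,
}

/// Keep snapshots that carry every tag in `filter_tags`.
/// `None` or an empty filter returns all snapshots.
/// Assumption: "all tags must match"; the proposal leaves this open.
pub fn filter_by_tags(
    snapshots: &[SnapshotInfo],
    filter_tags: Option<&[String]>,
) -> Vec<SnapshotInfo> {
    let wanted: &[String] = filter_tags.unwrap_or(&[]);
    snapshots
        .iter()
        .filter(|s| wanted.iter().all(|t| s.tags.contains(t)))
        .cloned()
        .collect()
}
```

Under this reading, `find_by_tag(tag)` would be the single-tag case of the same filter.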

---

## Protocol Integration

**Note:** See [Protocol Specification](PROTOCOL.md) for complete message format details.

**New message types:**

```rust
pub enum Request {
    CreateSnapshot {
        daemon_id: String,
        workspace_path: String,
        message: String,
        tags: Vec<String>,
    },

    RestoreSnapshot {
        daemon_id: String,
        snapshot_id: String,
        destination: String,
    },

    ListSnapshots { daemon_id: String },
    DeleteSnapshot { daemon_id: String, snapshot_id: String },
    GarbageCollect { daemon_id: String },
}

pub enum Response {
    SnapshotCreated {
        snapshot_id: String,
        file_count: usize,
        total_size: u64,
        duration_ms: u64,
    },

    SnapshotRestored { file_count: usize, duration_ms: u64 },
    Snapshots { snapshots: Vec<SnapshotInfo> },
    SnapshotDeleted { freed_bytes: u64 },
    GarbageCollected { objects_deleted: usize, bytes_freed: u64 },
}
```
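
The request and response variants appear to pair up one-to-one. The sketch below spells out that pairing and shows how a router might extract the target daemon. The pairing is inferred from the variant names rather than stated in the proposal, and `expected_response` and `daemon_id` are hypothetical helpers.

```rust
/// Request variants copied from the proposal above.
pub enum Request {
    CreateSnapshot {
        daemon_id: String,
        workspace_path: String,
        message: String,
        tags: Vec<String>,
    },
    RestoreSnapshot {
        daemon_id: String,
        snapshot_id: String,
        destination: String,
    },
    ListSnapshots { daemon_id: String },
    DeleteSnapshot { daemon_id: String, snapshot_id: String },
    GarbageCollect { daemon_id: String },
}

/// Name of the `Response` variant expected on success.
/// Assumption: the pairing is inferred from the variant names.
pub fn expected_response(req: &Request) -> &'static str {
    match req {
        Request::CreateSnapshot { .. } => "SnapshotCreated",
        Request::RestoreSnapshot { .. } => "SnapshotRestored",
        Request::ListSnapshots { .. } => "Snapshots",
        Request::DeleteSnapshot { .. } => "SnapshotDeleted",
        Request::GarbageCollect { .. } => "GarbageCollected",
    }
}

/// Every request carries the ID of the daemon it is addressed to.
pub fn daemon_id(req: &Request) -> &str {
    match req {
        Request::CreateSnapshot { daemon_id, .. }
        | Request::RestoreSnapshot { daemon_id, .. }
        | Request::ListSnapshots { daemon_id }
        | Request::DeleteSnapshot { daemon_id, .. }
        | Request::GarbageCollect { daemon_id } => daemon_id,
    }
}
```

Matching exhaustively on `Request` means the compiler flags any handler that is not updated when a new message type is added.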

---


## Example Usage

```rust
use sandd::snapshot::SnapshotManager;
use std::path::{Path, PathBuf};

// Assumption: a boxed-error alias stands in for the crate's own `Result` type.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

#[tokio::main]
async fn main() -> Result<()> {
    let manager = SnapshotManager::new(
        PathBuf::from("/var/sandd/snapshots")
    )?;

    // Create snapshot with optional message and tags
    let snapshot_id = manager.create_snapshot(
        Path::new("/workspace/agent-123"),
        Some("Before task execution".to_string()),
        Some(vec!["pre-task".to_string()]),
    ).await?;

    println!("Created snapshot: {}", snapshot_id);

    // List all snapshots
    let snapshots = manager.list_snapshots(None).await?;
    for snap in snapshots {
        println!("{}: {} (tags: {:?})", snap.id, snap.message, snap.tags);
    }

    // Find snapshots by tag
    let pre_task_snapshots = manager.find_by_tag("pre-task").await?;

    // Get specific snapshot details
    let snapshot = manager.get_snapshot(&snapshot_id).await?;
    println!("Files: {}, Size: {} bytes", snapshot.file_count, snapshot.total_size);

    // Restore if needed
    manager.restore_snapshot(
        &snapshot_id,
        Path::new("/tmp/restored"),
    ).await?;

    Ok(())
}
```

---

## Alternatives Considered

| Alternative | Why Not? |
|-------------|----------|
| Docker volumes | Requires Docker, container-only |
| BTRFS/ZFS | Requires specific filesystem + root |
| overlayfs | Requires root, Linux only |
| fuse-overlayfs | 3-4x I/O overhead, requires /dev/fuse |
| rsync | No built-in versioning, manual management |

**Decision:** Use the Git storage model. It is proven, cross-platform, and needs no special privileges or filesystem support.
File renamed without changes.