docs: reorganize sync and operation-log documentation

Move scattered architecture docs into centralized locations:

- Move operation-log docs from src/app/core/persistence/operation-log/docs/
  to docs/op-log/
- Flatten docs/sync/sync/ nested structure to docs/sync/
- Move supersync-encryption-architecture.md from docs/ai/ to docs/sync/
- Copy pfapi sync README to docs/sync/pfapi-sync-overview.md
- Update all cross-references to use new paths

This improves discoverability and keeps architecture documentation
separate from source code.
Johannes Millan 2025-12-27 10:54:13 +01:00
parent a850f8af9e
commit b4ce9d5da6
23 changed files with 168 additions and 12 deletions

docs/op-log/README.md Normal file

@@ -0,0 +1,132 @@
# Operation Log Documentation
**Last Updated:** December 2025
This directory contains the architectural documentation for Super Productivity's Operation Log system - an event-sourced persistence and synchronization layer.
## Quick Start
| If you want to... | Read this |
| ----------------------------------- | ---------------------------------------------------------------------------------- |
| Understand the overall architecture | [operation-log-architecture.md](./operation-log-architecture.md) |
| See visual diagrams | [operation-log-architecture-diagrams.md](./operation-log-architecture-diagrams.md) |
| Learn the design rules | [operation-rules.md](./operation-rules.md) |
| Understand file-based sync | [hybrid-manifest-architecture.md](./hybrid-manifest-architecture.md) |
| Understand legacy PFAPI sync | [pfapi-sync-persistence-architecture.md](./pfapi-sync-persistence-architecture.md) |
## Documentation Overview
### Core Documentation
| Document | Description | Status |
| ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- |
| [operation-log-architecture.md](./operation-log-architecture.md) | Comprehensive architecture reference covering Parts A-F: Local Persistence, Legacy Sync Bridge, Server Sync, Validation & Repair, Smart Archive Handling, and Atomic State Consistency | ✅ Active |
| [operation-log-architecture-diagrams.md](./operation-log-architecture-diagrams.md) | Mermaid diagrams visualizing data flows, sync protocols, and state management | ✅ Active |
| [operation-rules.md](./operation-rules.md) | Design rules and guidelines for the operation log store and operations | ✅ Active |
### Sync Architecture
| Document | Description | Status |
| ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- | -------------- |
| [hybrid-manifest-architecture.md](./hybrid-manifest-architecture.md) | File-based sync optimization using embedded operations buffer and snapshots (WebDAV/Dropbox) | ✅ Implemented |
| [pfapi-sync-persistence-architecture.md](./pfapi-sync-persistence-architecture.md) | Legacy PFAPI sync system that coexists with operation log | ✅ Active |
### Planning & Proposals
| Document | Description | Status |
| ---------------------------------------------------------------------------------------------- | --------------------------------------------- | ------------- |
| [e2e-encryption-plan.md](./e2e-encryption-plan.md) | End-to-end encryption design proposal | 📋 Planned |
| [tiered-archive-proposal.md](./tiered-archive-proposal.md) | Multi-tier archive storage proposal | 📋 Planned |
| [operation-payload-optimization-discussion.md](./operation-payload-optimization-discussion.md) | Discussion on payload optimization strategies | 📋 Historical |
## Architecture at a Glance
The Operation Log system serves four distinct purposes:
```
┌────────────────────────────────────────────────────────────────────┐
│                            User Action                             │
└────────────────────────────────────────────────────────────────────┘
                              NgRx Store
                      (Runtime Source of Truth)
               ┌───────────────────┼───────────────────┐
               ▼                   │                   ▼
         OpLogEffects              │             Other Effects
               │                   │
               ├──► SUP_OPS ◄──────┘
               │    (Local Persistence - Part A)
               └──► META_MODEL vector clock
                    (Legacy Sync Bridge - Part B)
          PFAPI reads from NgRx for sync (not from op-log)
```
### The Four Parts
| Part | Purpose | Description |
| -------------------------- | --------------------------- | ----------------------------------------------------------------------------- |
| **A. Local Persistence** | Fast writes, crash recovery | Operations stored in IndexedDB (`SUP_OPS`), with snapshots for fast hydration |
| **B. Legacy Sync Bridge** | PFAPI compatibility | Updates vector clocks so WebDAV/Dropbox sync continues to work |
| **C. Server Sync** | Operation-based sync | Upload/download individual operations via SuperSync server |
| **D. Validation & Repair** | Data integrity | Checkpoint validation with automatic repair and REPAIR operations |
Additional architectural patterns:
| Pattern | Purpose |
| ------------------------------- | ------------------------------------------------------------------ |
| **E. Smart Archive Handling** | Deterministic archive operations synced via instructions, not data |
| **F. Atomic State Consistency** | Meta-reducers ensure multi-entity changes are atomic |
## Key Concepts
### Event Sourcing
The Operation Log treats the database as a **timeline of events** rather than mutable state:
- **Source of Truth**: The log is truth; current state is derived by replaying the log
- **Immutability**: Operations are never modified, only appended
- **Snapshots**: Periodic snapshots speed up hydration (replay from snapshot + tail ops)
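A minimal hydration sketch (the helper names are hypothetical stand-ins for the real store and hydrator services):
```typescript
type AppState = Record<string, unknown>;
interface Operation {
  id: string;
  payload: unknown;
}
interface Snapshot {
  state: AppState;
  lastAppliedOpId: string;
}

// Hypothetical helpers standing in for the real persistence services
declare function loadSnapshot(): Promise<Snapshot>;
declare function loadOpsAfter(opId: string): Promise<Operation[]>;
declare function applyOperation(state: AppState, op: Operation): AppState;

// Rebuild state from the last snapshot plus the operations appended after it.
// Replay stays cheap because the snapshot already "bakes in" older history.
async function hydrateState(): Promise<AppState> {
  const snapshot = await loadSnapshot();
  const tailOps = await loadOpsAfter(snapshot.lastAppliedOpId);
  return tailOps.reduce((state, op) => applyOperation(state, op), snapshot.state);
}
```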
### Vector Clocks
Vector clocks track causality for conflict detection:
- Each client has its own counter in the vector clock
- Comparison reveals: `EQUAL`, `LESS_THAN`, `GREATER_THAN`, or `CONCURRENT`
- `CONCURRENT` indicates a true conflict requiring resolution
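A sketch of the comparison semantics (illustrative only; the app's own implementation is documented in [/docs/sync/vector-clocks.md](/docs/sync/vector-clocks.md)):
```typescript
type VectorClock = Record<string, number>; // clientId -> counter

type ClockComparison = 'EQUAL' | 'LESS_THAN' | 'GREATER_THAN' | 'CONCURRENT';

const compareClocks = (a: VectorClock, b: VectorClock): ClockComparison => {
  const ids = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false;
  let bAhead = false;
  for (const id of ids) {
    if ((a[id] ?? 0) > (b[id] ?? 0)) aAhead = true;
    if ((b[id] ?? 0) > (a[id] ?? 0)) bAhead = true;
  }
  // Each side has changes the other hasn't seen -> true conflict
  if (aAhead && bAhead) return 'CONCURRENT';
  if (aAhead) return 'GREATER_THAN';
  if (bAhead) return 'LESS_THAN';
  return 'EQUAL';
};
```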
### LOCAL_ACTIONS Token
Effects that perform side effects (snacks, external APIs, UI) must use `LOCAL_ACTIONS` instead of `Actions`:
```typescript
private _actions$ = inject(LOCAL_ACTIONS); // Excludes remote operations
```
This prevents duplicate side effects when syncing operations from other clients.
## Related Documentation
| Location | Content |
| ---------------------------------------------------------------------- | ----------------------------------- |
| [/docs/sync/vector-clocks.md](/docs/sync/vector-clocks.md) | Vector clock implementation details |
| [/docs/ai/sync/](/docs/ai/sync/) | Historical planning documents |
| [/packages/super-sync-server/](/packages/super-sync-server/) | SuperSync server implementation |
| [/src/app/pfapi/api/sync/README.md](/src/app/pfapi/api/sync/README.md) | PFAPI sync overview |
## Implementation Status
| Component | Status |
| ---------------------------- | --------------------------------------------------- |
| Local Persistence (Part A) | ✅ Complete |
| Legacy Sync Bridge (Part B) | ✅ Complete |
| Server Sync (Part C) | ✅ Complete (single-version) |
| Validation & Repair (Part D) | ✅ Complete |
| Cross-version Sync (A.7.11) | ⚠️ Not implemented |
| Schema Migrations | ✅ Infrastructure ready (no migrations defined yet) |
See [operation-log-architecture.md#implementation-status](./operation-log-architecture.md#implementation-status) for detailed status.


@@ -0,0 +1,411 @@
# E2E Encryption for SuperSync Server
## Summary
Add end-to-end encryption to SuperSync where the server cannot read operation payloads. Users provide a separate encryption password which is used to derive an encryption key client-side. This is the same approach used by legacy sync providers (Dropbox, WebDAV, Local File).
## Key Decisions
| Decision | Choice |
| -------------------- | -------------------------------------------------------------- |
| Encryption scope | Payload-only (metadata stays plaintext for conflict detection) |
| Key derivation | User-provided encryption password → Argon2id → key |
| Password change | Not supported (would require re-encrypting all data) |
| Server changes | None required |
| Missing key handling | Fail gracefully with dialog to enter password |
---
## Architecture
### Encryption Flow
```
User Encryption Password → Argon2id (64MB, 3 iter) → AES-256 Key → encrypt/decrypt payloads
```
### Data Flow
```
Upload: Operation → encrypt payload with key → upload (metadata plaintext)
Download: Receive ops → decrypt payload with key → apply to state
```
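The plan reuses the existing `encrypt`/`decrypt` helpers. A round-trip sketch, assuming the `(text, password) => Promise<string>` signatures used throughout this plan and the import path from Phase 2:
```typescript
import { decrypt, encrypt } from '../../../../pfapi/api/encryption/encryption';

const roundTrip = async (payload: object, encryptionPassword: string): Promise<object> => {
  // Ciphertext is an opaque string; only the password holder can reverse it
  const cipher = await encrypt(JSON.stringify(payload), encryptionPassword);
  return JSON.parse(await decrypt(cipher, encryptionPassword));
};
```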
### Why Payload-Only Encryption?
The server needs plaintext metadata for:
- **Conflict detection** - Uses vector clocks to detect concurrent edits
- **Deduplication** - Uses operation IDs to prevent duplicates
- **Ordering** - Uses timestamps and server sequence numbers
- **Tombstone tracking** - Uses entity IDs for delete tracking
The server does NOT need to read:
- **Payloads** - The actual data being created/updated/deleted
This design encrypts payloads while keeping metadata accessible, giving the server enough information to coordinate sync without seeing user data.
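For illustration, this is roughly what the server stores for an encrypted operation (the values are made up; `isPayloadEncrypted` is the only field this plan adds):
```typescript
const exampleEncryptedOp = {
  id: 'op-018c2e4a', // plaintext: needed for deduplication
  clientId: 'client-A', // plaintext: needed for vector clocks
  vectorClock: { 'client-A': 42 }, // plaintext: needed for conflict detection
  timestamp: 1701234567890, // plaintext: needed for ordering
  payload: '...opaque-ciphertext...', // encrypted: the actual user data
  isPayloadEncrypted: true,
};
```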
---
## Implementation Plan
### Phase 1: Data Model Changes
**File:** `src/app/pfapi/api/sync/sync-provider.interface.ts`
Add encryption flag to `SyncOperation`:
```typescript
export interface SyncOperation {
// ... existing fields ...
isPayloadEncrypted?: boolean; // NEW: true if payload is encrypted string
}
```
**File:** `src/app/pfapi/api/sync/providers/super-sync/super-sync.model.ts`
The `encryptKey` field already exists in `SyncProviderPrivateCfgBase`; only the enable flag needs to be added:
```typescript
export interface SuperSyncPrivateCfg extends SyncProviderPrivateCfgBase {
// ... existing fields ...
isEncryptionEnabled?: boolean; // NEW
// encryptKey?: string; // Already inherited from base
}
```
---
### Phase 2: Client-Side Encryption Service
**New file:** `src/app/core/persistence/operation-log/sync/operation-encryption.service.ts`
```typescript
import { Injectable } from '@angular/core';
import { decrypt, encrypt } from '../../../../pfapi/api/encryption/encryption';
import { SyncOperation } from '../../../../pfapi/api/sync/sync-provider.interface';
@Injectable({ providedIn: 'root' })
export class OperationEncryptionService {
/**
* Encrypts the payload of a SyncOperation.
* Returns a new operation with encrypted payload and isPayloadEncrypted=true.
*/
async encryptOperation(op: SyncOperation, encryptKey: string): Promise<SyncOperation> {
const payloadStr = JSON.stringify(op.payload);
const encryptedPayload = await encrypt(payloadStr, encryptKey);
return {
...op,
payload: encryptedPayload,
isPayloadEncrypted: true,
};
}
/**
* Decrypts the payload of a SyncOperation.
* Returns a new operation with decrypted payload.
* Throws DecryptError if decryption fails.
*/
async decryptOperation(op: SyncOperation, encryptKey: string): Promise<SyncOperation> {
if (!op.isPayloadEncrypted) {
return op; // Pass through unencrypted ops
}
const decryptedStr = await decrypt(op.payload as string, encryptKey);
return {
...op,
payload: JSON.parse(decryptedStr),
isPayloadEncrypted: false,
};
}
/**
* Batch encrypt operations for upload.
*/
async encryptOperations(
ops: SyncOperation[],
encryptKey: string,
): Promise<SyncOperation[]> {
return Promise.all(ops.map((op) => this.encryptOperation(op, encryptKey)));
}
/**
* Batch decrypt operations after download.
* Non-encrypted ops pass through unchanged.
*/
async decryptOperations(
ops: SyncOperation[],
encryptKey: string,
): Promise<SyncOperation[]> {
return Promise.all(ops.map((op) => this.decryptOperation(op, encryptKey)));
}
}
```
**Reuses:** Existing `src/app/pfapi/api/encryption/encryption.ts` (AES-GCM, Argon2id)
---
### Phase 3: Upload Integration
**File:** `src/app/core/persistence/operation-log/sync/operation-log-upload.service.ts`
Modify `_uploadPendingOpsViaApi()`:
```typescript
// Add injection
private encryptionService = inject(OperationEncryptionService);
// In _uploadPendingOpsViaApi(), after converting to SyncOperation format:
const privateCfg = await syncProvider.privateCfg.load();
let opsToUpload = syncOps;
if (privateCfg?.isEncryptionEnabled && privateCfg?.encryptKey) {
opsToUpload = await this.encryptionService.encryptOperations(syncOps, privateCfg.encryptKey);
}
// Upload opsToUpload instead of syncOps
const response = await syncProvider.uploadOps(opsToUpload, clientId, lastKnownServerSeq);
// Piggybacked ops returned in the upload response also need decryption:
if (response.newOps && response.newOps.length > 0) {
let ops = response.newOps.map((serverOp) => serverOp.op);
if (privateCfg?.encryptKey) {
ops = await this.encryptionService.decryptOperations(ops, privateCfg.encryptKey);
}
const operations = ops.map((op) => syncOpToOperation(op));
piggybackedOps.push(...operations);
}
```
---
### Phase 4: Download Integration
**File:** `src/app/core/persistence/operation-log/sync/operation-log-download.service.ts`
Modify `_downloadRemoteOpsViaApi()`:
```typescript
// Add injection
private encryptionService = inject(OperationEncryptionService);
private matDialog = inject(MatDialog);
// After downloading ops, before converting to Operation format:
const privateCfg = await syncProvider.privateCfg.load();
let syncOps = response.ops
.filter((serverOp) => !appliedOpIds.has(serverOp.op.id))
.map((serverOp) => serverOp.op);
// Check if any ops are encrypted
const hasEncryptedOps = syncOps.some(op => op.isPayloadEncrypted);
if (hasEncryptedOps) {
let encryptKey = privateCfg?.encryptKey;
// If no key cached, prompt user
if (!encryptKey) {
encryptKey = await this._promptForEncryptionPassword();
if (!encryptKey) {
// User cancelled - abort sync
return { newOps: [], success: false };
}
}
try {
syncOps = await this.encryptionService.decryptOperations(syncOps, encryptKey);
} catch (e) {
if (e instanceof DecryptError) {
// Wrong password - prompt again
await this._showDecryptionErrorDialog();
return { newOps: [], success: false };
}
throw e;
}
}
const operations = syncOps.map((op) => syncOpToOperation(op));
```
---
### Phase 5: UI Changes
**File:** `src/app/features/config/form-cfgs/sync-form.const.ts`
Add encryption fields to SuperSync provider form:
```typescript
// In SuperSync fieldGroup, add:
{
key: 'isEncryptionEnabled',
type: 'checkbox',
props: {
label: T.F.SYNC.FORM.SUPER_SYNC.L_ENABLE_E2E_ENCRYPTION,
},
},
{
hideExpression: (model: any) => !model.isEncryptionEnabled,
key: 'encryptKey',
type: 'input',
props: {
type: 'password',
label: T.F.SYNC.FORM.L_ENCRYPTION_PASSWORD,
required: true,
},
},
{
hideExpression: (model: any) => !model.isEncryptionEnabled,
type: 'tpl',
props: {
tpl: `<div class="warn-text">{{ T.F.SYNC.FORM.SUPER_SYNC.ENCRYPTION_WARNING | translate }}</div>`,
},
},
```
**Translations:** `src/assets/i18n/en.json`
```json
{
"F": {
"SYNC": {
"FORM": {
"SUPER_SYNC": {
"L_ENABLE_E2E_ENCRYPTION": "Enable end-to-end encryption",
"ENCRYPTION_WARNING": "Warning: If you forget your encryption password, your data cannot be recovered. This password is separate from your login password."
}
},
"S": {
"DECRYPTION_FAILED": "Failed to decrypt synced data. Please check your encryption password.",
"ENCRYPTION_PASSWORD_REQUIRED": "Encryption password required to sync encrypted data."
}
}
}
}
```
**New dialog component:** `src/app/imex/sync/dialog-encryption-password/`
Simple dialog to prompt for encryption password when needed:
- Input field for password
- Cancel and OK buttons
- Used when encrypted ops are received but no password is cached
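A minimal sketch of such a dialog (hypothetical; the real component would use the app's Material form styling and translation keys):
```typescript
import { Component, inject } from '@angular/core';
import { FormsModule } from '@angular/forms';
import { MatDialogRef } from '@angular/material/dialog';

@Component({
  standalone: true,
  imports: [FormsModule],
  selector: 'dialog-encryption-password',
  template: `
    <h1>Encryption password</h1>
    <input type="password" [(ngModel)]="password" />
    <button (click)="dialogRef.close()">Cancel</button>
    <button (click)="dialogRef.close(password)">OK</button>
  `,
})
export class DialogEncryptionPasswordComponent {
  // Closing with a string resolves the caller's afterClosed() with the password
  dialogRef = inject(MatDialogRef);
  password = '';
}
```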
---
## File Summary
### New Files
| File | Purpose |
| ----------------------------------------------------------------------------- | -------------------------- |
| `src/app/core/persistence/operation-log/sync/operation-encryption.service.ts` | Encrypt/decrypt operations |
| `src/app/imex/sync/dialog-encryption-password/` | Password prompt dialog |
### Modified Files
| File | Changes |
| ------------------------------------------------------------------------------- | ----------------------------------------- |
| `src/app/pfapi/api/sync/sync-provider.interface.ts` | Add `isPayloadEncrypted` to SyncOperation |
| `src/app/pfapi/api/sync/providers/super-sync/super-sync.model.ts` | Add `isEncryptionEnabled` flag |
| `src/app/core/persistence/operation-log/sync/operation-log-upload.service.ts` | Encrypt before upload |
| `src/app/core/persistence/operation-log/sync/operation-log-download.service.ts` | Decrypt after download |
| `src/app/features/config/form-cfgs/sync-form.const.ts` | Add encryption toggle + password field |
| `src/app/t.const.ts` | Add translation keys |
| `src/assets/i18n/en.json` | Add translation strings |
### No Server Changes Required
The server treats encrypted payloads as opaque strings - no modifications needed.
---
## Security Considerations
### What's Protected
1. **Payload content** - All user data (tasks, projects, notes, etc.) is encrypted
2. **Zero-knowledge** - Server never sees encryption password or plaintext data
3. **Strong crypto** - AES-256-GCM with Argon2id key derivation
### What's Exposed (by design)
1. **Operation metadata** - IDs, timestamps, entity types, vector clocks
2. **Traffic patterns** - Server knows when you sync and how many operations
3. **Encryption status** - Server can see `isPayloadEncrypted: true`
### Cryptographic Details
| Component | Algorithm | Parameters |
| ------------------ | ----------- | --------------------------------------------- |
| Key derivation | Argon2id | 64MB memory, 3 iterations |
| Payload encryption | AES-256-GCM | Random 12-byte IV, 16-byte salt per operation |
### Limitations
| Limitation | Reason |
| -------------------- | --------------------------------------------------------- |
| No password change | Would require re-encrypting all operations on all clients |
| No password recovery | True zero-knowledge means no recovery possible |
| Two passwords | Login password + encryption password (by design) |
### Threat Model
| Threat | Mitigated? | Notes |
| -------------------- | ---------- | --------------------------------------------------- |
| Server reads data | Yes | Payloads encrypted client-side |
| Server breach | Yes | Attacker gets encrypted blobs, needs password |
| MITM attack | Yes | HTTPS + authenticated encryption |
| Password brute force | Partially | Argon2id makes attacks expensive (64MB per attempt) |
| Lost password | No | Data unrecoverable without password |
---
## Migration Path
### Enabling Encryption (Existing User)
1. User enables "E2E Encryption" in SuperSync settings
2. User enters encryption password
3. Warning shown about password recovery
4. Password saved to `SuperSyncPrivateCfg.encryptKey`
5. `isEncryptionEnabled` set to true
6. Future operations encrypted; existing operations remain plaintext
7. Other clients prompted for password when they encounter encrypted ops
### Multi-Client Scenario
When encryption is enabled on one client:
1. Other clients download operations normally
2. When they encounter `isPayloadEncrypted: true`, decryption is attempted
3. If no password cached, dialog prompts for password
4. Password cached in local `SuperSyncPrivateCfg` for future syncs
### Disabling Encryption
1. User unchecks encryption toggle
2. `isEncryptionEnabled` set to false
3. Future operations sent without encryption
4. Existing encrypted operations remain encrypted (still readable with password)
---
## Testing Strategy
1. **Unit tests** for `OperationEncryptionService`
- Encrypt/decrypt round-trips with various payload types
- Non-encrypted ops pass through unchanged
- Wrong password throws DecryptError
2. **Integration tests** for upload/download
- Encrypted operations sync correctly
- Mixed encrypted/unencrypted history works
- Piggybacked operations decrypt correctly
3. **E2E tests**
- Two clients with same password sync correctly
- Missing password shows dialog
- Wrong password shows error and retries
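A sketch of the round-trip spec (Jasmine), assuming the service from Phase 2:
```typescript
import { OperationEncryptionService } from './operation-encryption.service';

describe('OperationEncryptionService', () => {
  const service = new OperationEncryptionService();
  // Illustrative partial operation; a real spec would build a full SyncOperation
  const op = { id: 'op-1', payload: { title: 'secret task' } } as any;

  it('round-trips a payload with the correct password', async () => {
    const encrypted = await service.encryptOperation(op, 'pw');
    expect(encrypted.isPayloadEncrypted).toBe(true);
    expect(typeof encrypted.payload).toBe('string');
    const decrypted = await service.decryptOperation(encrypted, 'pw');
    expect(decrypted.payload).toEqual({ title: 'secret task' });
  });

  it('rejects when the password is wrong', async () => {
    const encrypted = await service.encryptOperation(op, 'pw');
    await expectAsync(service.decryptOperation(encrypted, 'wrong')).toBeRejected();
  });
});
```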


@@ -0,0 +1,627 @@
# Hybrid Manifest & Snapshot Architecture for File-Based Sync
**Status:** ✅ Implemented (December 2025)
**Context:** Optimizing WebDAV/Dropbox sync for the Operation Log architecture.
**Related:** [Operation Log Architecture](./operation-log-architecture.md)
> **Implementation Note:** This architecture is fully implemented in `OperationLogManifestService`, `OperationLogUploadService`, and `OperationLogDownloadService`. The embedded operations buffer, overflow file creation, and snapshot support are all operational.
---
## 1. The Problem
The current `OperationLogSyncService` fallback for file-based providers (WebDAV, Dropbox) is inefficient for frequent, small updates.
**Current Workflow (Naive Fallback):**
1. **Write Operation File:** Upload `ops/ops_CLIENT_TIMESTAMP.json`.
2. **Read Manifest:** Download `ops/manifest.json` to get current list.
3. **Update Manifest:** Upload new `ops/manifest.json` with the new filename added.
**Issues:**
- **High Request Count:** Minimum 3 HTTP requests per sync cycle.
- **File Proliferation:** Rapidly creates thousands of small files, degrading WebDAV directory listing performance.
- **Latency:** On slow connections (standard WebDAV), this makes sync feel sluggish.
---
## 2. Proposed Solution: Hybrid Manifest
Instead of treating the manifest solely as an _index_ of files, we treat it as a **buffer** for recent operations.
### 2.1. Concept
- **Embedded Operations:** Small batches of operations are stored directly inside `manifest.json`.
- **Lazy Flush:** New operation files (`ops_*.json`) are only created when the manifest buffer fills up.
- **Snapshots:** A "base state" file allows us to delete old operation files and clear the manifest history.
### 2.2. Data Structures
**Updated Manifest:**
```typescript
interface HybridManifest {
version: 2;
// The baseline state (snapshot). If present, clients load this first.
lastSnapshot?: SnapshotReference;
// Ops stored directly in the manifest (The Buffer)
// Limit: ~50 ops or 100KB payload size
embeddedOperations: EmbeddedOperation[];
// References to external operation files (The Overflow)
// Older ops that were flushed out of the buffer
operationFiles: OperationFileReference[];
// Merged vector clock from all embedded operations
// Used for quick conflict detection without parsing all ops
frontierClock: VectorClock;
// Last modification timestamp (for ETag-like cache invalidation)
lastModified: number;
}
interface SnapshotReference {
fileName: string; // e.g. "snapshots/snap_1701234567890.json"
schemaVersion: number; // Schema version of the snapshot
vectorClock: VectorClock; // Clock state at snapshot time
timestamp: number; // When snapshot was created
}
interface OperationFileReference {
fileName: string; // e.g. "ops/overflow_1701234567890.json"
opCount: number; // Number of operations in file (for progress estimation)
minSeq: number; // First operation's logical sequence in this file
maxSeq: number; // Last operation's logical sequence
}
// Embedded operations are lightweight - full Operation minus redundant fields
interface EmbeddedOperation {
id: string;
actionType: string;
opType: OpType;
entityType: EntityType;
entityId?: string;
entityIds?: string[];
payload: unknown;
clientId: string;
vectorClock: VectorClock;
timestamp: number;
schemaVersion: number;
}
```
**Snapshot File Format:**
```typescript
interface SnapshotFile {
version: 1;
schemaVersion: number; // App schema version
vectorClock: VectorClock; // Merged clock at snapshot time
timestamp: number;
data: AppDataComplete; // Full application state
checksum?: string; // Optional SHA-256 for integrity verification
}
```
---
## 3. Workflows
### 3.1. Upload (Write Path)
When a client has local pending operations to sync:
```
┌─────────────────────────────────────────────────────────────────┐
│                           Upload Flow                           │
└─────────────────────────────────────────────────────────────────┘
                ┌───────────────────────────────┐
                │ 1. Download manifest.json     │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 2. Detect remote changes      │
                │    (compare frontierClock)    │
                └───────────────────────────────┘
                ┌───────────────┴───────────────┐
                ▼                               ▼
      Remote has new ops?               No remote changes
                │                               │
                ▼                               │
      Download & apply first ◄──────────────────┘
                ┌───────────────────────────────┐
                │ 3. Check buffer capacity      │
                │    embedded.length + pending  │
                └───────────────────────────────┘
                ┌───────────────┴───────────────┐
                ▼                               ▼
      < BUFFER_LIMIT (50)               >= BUFFER_LIMIT
                │                               │
                ▼                               ▼
      Append to embedded            Flush embedded to file
                │                + add pending to empty buffer
                │                               │
                └───────────────┬───────────────┘
                ┌───────────────────────────────┐
                │ 4. Check snapshot trigger     │
                │    (operationFiles > 50 OR    │
                │     total ops > 5000)         │
                └───────────────────────────────┘
                ┌───────────────┴───────────────┐
                ▼                               ▼
       Trigger snapshot               No snapshot needed
                │                               │
                └───────────────┬───────────────┘
                ┌───────────────────────────────┐
                │ 5. Upload manifest.json       │
                └───────────────────────────────┘
```
**Detailed Steps:**
1. **Download Manifest:** Fetch `manifest.json` (or create empty v2 manifest if not found).
2. **Detect Remote Changes:**
- Compare `manifest.frontierClock` with local `lastSyncedClock`.
- If remote has unseen changes → download and apply before uploading (prevents lost updates).
3. **Evaluate Buffer:**
- `BUFFER_LIMIT = 50` operations (configurable)
- `BUFFER_SIZE_LIMIT = 100KB` payload size (prevents manifest bloat)
4. **Strategy Selection:**
- **Scenario A (Append):** If `embedded.length + pending.length < BUFFER_LIMIT`:
- Append `pendingOps` to `manifest.embeddedOperations`.
- Update `manifest.frontierClock` with merged clocks.
- **Result:** 1 Write (manifest). Fast path.
- **Scenario B (Overflow):** If buffer would exceed limit:
- Upload `manifest.embeddedOperations` to new file `ops/overflow_TIMESTAMP.json`.
- Add file reference to `manifest.operationFiles`.
- Place `pendingOps` into now-empty `manifest.embeddedOperations`.
- **Result:** 1 Upload (overflow file) + 1 Write (manifest).
5. **Upload Manifest:** Write updated `manifest.json`.
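A sketch of the Scenario A/B decision, using the types from section 2.2 and hypothetical `uploadOverflowFile`/`mergeClocks` helpers:
```typescript
const BUFFER_LIMIT = 50; // see section 9

declare function uploadOverflowFile(
  ops: EmbeddedOperation[],
): Promise<OperationFileReference>;
declare function mergeClocks(a: VectorClock, b: VectorClock): VectorClock;

async function addPendingOps(
  manifest: HybridManifest,
  pendingOps: EmbeddedOperation[],
): Promise<HybridManifest> {
  if (manifest.embeddedOperations.length + pendingOps.length < BUFFER_LIMIT) {
    // Scenario A: append into the buffer - only the manifest write follows
    manifest.embeddedOperations.push(...pendingOps);
  } else {
    // Scenario B: flush the buffer to an overflow file, then start a fresh buffer
    manifest.operationFiles.push(await uploadOverflowFile(manifest.embeddedOperations));
    manifest.embeddedOperations = [...pendingOps];
  }
  for (const op of pendingOps) {
    manifest.frontierClock = mergeClocks(manifest.frontierClock, op.vectorClock);
  }
  manifest.lastModified = Date.now();
  return manifest;
}
```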
### 3.2. Download (Read Path)
When a client checks for updates:
```
┌─────────────────────────────────────────────────────────────────┐
│                          Download Flow                          │
└─────────────────────────────────────────────────────────────────┘
                ┌───────────────────────────────┐
                │ 1. Download manifest.json     │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 2. Quick-check: any changes?  │
                │    Compare frontierClock      │
                └───────────────────────────────┘
                ┌───────────────┴───────────────┐
                ▼                               ▼
    No changes (clocks equal)           Changes detected
                │                               │
                ▼                               ▼
              Done               ┌────────────────────────┐
                                 │ 3. Need snapshot?      │
                                 │ (local behind snapshot)│
                                 └────────────────────────┘
                             ┌───────────────┴───────────────┐
                             ▼                               ▼
                     Download snapshot                  Skip to ops
                     + apply as base                         │
                             │                               │
                             └───────────────┬───────────────┘
                                 ┌────────────────────────┐
                                 │ 4. Download new op     │
                                 │    files (filter seen) │
                                 └────────────────────────┘
                                 ┌────────────────────────┐
                                 │ 5. Apply embedded ops  │
                                 │    (filter by op.id)   │
                                 └────────────────────────┘
                                 ┌────────────────────────┐
                                 │ 6. Update local        │
                                 │    lastSyncedClock     │
                                 └────────────────────────┘
```
**Detailed Steps:**
1. **Download Manifest:** Fetch `manifest.json`.
2. **Quick-Check Changes:**
- Compare `manifest.frontierClock` against local `lastSyncedClock`.
- If clocks are equal → no changes, done.
3. **Check Snapshot Needed:**
- If local state is older than `manifest.lastSnapshot.vectorClock` → download snapshot first.
- Apply snapshot as base state (replaces local state).
4. **Download Operation Files:**
- Filter `manifest.operationFiles` to only files with `maxSeq > localLastAppliedSeq`.
- Download and parse each file.
- Collect all operations.
5. **Apply Embedded Operations:**
- Filter `manifest.embeddedOperations` by `op.id` (skip already-applied).
- Add to collected operations.
6. **Apply All Operations:**
- Sort by `vectorClock` (causal order).
- Detect conflicts using existing `detectConflicts()` logic.
- Apply non-conflicting ops; present conflicts to user.
7. **Update Tracking:**
- Set `localLastSyncedClock = manifest.frontierClock`.
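A sketch of the filtering in steps 4-5, assuming the `localLastAppliedSeq`/`appliedOpIds` tracking described above:
```typescript
function selectNewOperations(
  manifest: HybridManifest,
  localLastAppliedSeq: number,
  appliedOpIds: Set<string>,
): { filesToDownload: OperationFileReference[]; embeddedToApply: EmbeddedOperation[] } {
  return {
    // Step 4: only files that contain ops newer than what we already applied
    filesToDownload: manifest.operationFiles.filter((f) => f.maxSeq > localLastAppliedSeq),
    // Step 5: skip embedded ops that were already applied (dedupe by op.id)
    embeddedToApply: manifest.embeddedOperations.filter((op) => !appliedOpIds.has(op.id)),
  };
}
```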
---
## 4. Snapshotting (Compaction)
To prevent unbounded growth of operation files, any client can trigger a snapshot.
### 4.1. Triggers
| Condition | Threshold | Rationale |
| ------------------------------- | --------- | -------------------------------------- |
| External `operationFiles` count | > 50 | Prevent WebDAV directory bloat |
| Total operations since snapshot | > 5000 | Bound replay time for fresh installs |
| Time since last snapshot | > 7 days | Ensure periodic cleanup |
| Manifest size | > 500KB | Prevent manifest from becoming too big |
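Combined, the triggers reduce to a predicate along these lines (a sketch using the constants from section 9; the size check is approximated by serializing the manifest):
```typescript
const SNAPSHOT_FILE_THRESHOLD = 50;
const SNAPSHOT_OP_THRESHOLD = 5000;
const SNAPSHOT_AGE_MS = 7 * 24 * 60 * 60 * 1000;
const MANIFEST_SIZE_LIMIT_BYTES = 500 * 1024;

function shouldCreateSnapshot(
  manifest: HybridManifest,
  opsSinceSnapshot: number,
): boolean {
  return (
    manifest.operationFiles.length > SNAPSHOT_FILE_THRESHOLD ||
    opsSinceSnapshot > SNAPSHOT_OP_THRESHOLD ||
    // With no prior snapshot the age check always fires once ops accumulate
    Date.now() - (manifest.lastSnapshot?.timestamp ?? 0) > SNAPSHOT_AGE_MS ||
    JSON.stringify(manifest).length > MANIFEST_SIZE_LIMIT_BYTES
  );
}
```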
### 4.2. Process
```
┌─────────────────────────────────────────────────────────────────┐
│                          Snapshot Flow                          │
└─────────────────────────────────────────────────────────────────┘
                ┌───────────────────────────────┐
                │ 1. Ensure full sync complete  │
                │    (no pending local/remote)  │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 2. Read current state from    │
                │    NgRx (authoritative)       │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 3. Generate snapshot file     │
                │    + compute checksum         │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 4. Upload snapshot file       │
                │    (atomic, verify success)   │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 5. Update manifest            │
                │    - Set lastSnapshot         │
                │    - Clear operationFiles     │
                │    - Clear embeddedOperations │
                │    - Reset frontierClock      │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 6. Upload manifest            │
                └───────────────────────────────┘
                ┌───────────────────────────────┐
                │ 7. Cleanup (async, best-      │
                │    effort): delete old files  │
                └───────────────────────────────┘
```
### 4.3. Snapshot Atomicity
**Problem:** If the client crashes between uploading snapshot and updating manifest, other clients won't see the new snapshot.
**Solution:** Snapshot files are immutable and safe to leave orphaned. The manifest is the source of truth. Cleanup is best-effort.
**Invariant:** Never delete the current `lastSnapshot` file until a new snapshot is confirmed.
---
## 5. Conflict Handling
The hybrid manifest doesn't change conflict detection - it still uses vector clocks. However, the `frontierClock` in the manifest enables **early conflict detection**.
### 5.1. Early Conflict Detection
Before downloading all operations, compare clocks:
```typescript
const comparison = compareVectorClocks(localFrontierClock, manifest.frontierClock);
switch (comparison) {
case VectorClockComparison.LESS_THAN:
// Remote is ahead - safe to download
break;
case VectorClockComparison.GREATER_THAN:
// Local is ahead - upload our changes
break;
case VectorClockComparison.CONCURRENT:
// Potential conflicts - download ops for detailed analysis
break;
case VectorClockComparison.EQUAL:
// No changes - skip download
break;
}
```
### 5.2. Conflict Resolution
When conflicts are detected at the operation level, the existing `ConflictResolutionService` handles them. The hybrid manifest doesn't change this flow.
---
## 6. Edge Cases & Failure Modes
### 6.1. Concurrent Uploads (Race Condition)
**Scenario:** Two clients download the manifest simultaneously, both append ops, both upload.
**Problem:** Second upload overwrites first client's operations.
**Solution:** Use provider-specific mechanisms:
| Provider | Mechanism |
| ----------- | ------------------------------------------- |
| **Dropbox** | Use `update` mode with `rev` parameter |
| **WebDAV** | Use `If-Match` header with ETag |
| **Local** | File locking (already implemented in PFAPI) |
**Implementation:**
```typescript
interface HybridManifest {
// ... existing fields
// Optimistic concurrency control
etag?: string; // Server-assigned revision (Dropbox rev, WebDAV ETag)
}
async uploadManifest(manifest: HybridManifest, expectedEtag?: string): Promise<void> {
// If expectedEtag provided, use conditional upload
// On conflict (412 Precondition Failed), re-download and retry
}
```
### 6.2. Manifest Corruption
**Scenario:** Manifest JSON is invalid (partial write, encoding issue).
**Recovery Strategy:**
1. Attempt to parse manifest.
2. On parse failure, check for backup manifest (`manifest.json.bak`).
3. If no backup, reconstruct from operation files using `listFiles()`.
4. If reconstruction fails, fall back to snapshot-only state.
```typescript
async loadManifestWithRecovery(): Promise<HybridManifest> {
try {
return await this._loadRemoteManifest();
} catch (parseError) {
PFLog.warn('Manifest corrupted, attempting recovery...');
// Try backup
try {
return await this._loadBackupManifest();
} catch {
// Reconstruct from files
return await this._reconstructManifestFromFiles();
}
}
}
```
### 6.3. Snapshot File Missing
**Scenario:** Manifest references a snapshot that doesn't exist on the server.
**Recovery Strategy:**
1. Log error and notify user.
2. Fall back to replaying all available operation files.
3. If operation files also reference missing ops, show data loss warning.
### 6.4. Schema Version Mismatch
**Scenario:** Snapshot was created with schema version 3, but local app is version 2.
**Handling:**
- If `snapshot.schemaVersion > CURRENT_SCHEMA_VERSION + MAX_VERSION_SKIP`:
- Reject snapshot, prompt user to update app.
- If `snapshot.schemaVersion > CURRENT_SCHEMA_VERSION`:
- Load with warning (some fields may be stripped by Typia).
- If `snapshot.schemaVersion < CURRENT_SCHEMA_VERSION`:
- Run migrations on loaded state.
### 6.5. Large Pending Operations
**Scenario:** User was offline for a week, has 500 pending operations.
**Handling:**
- Don't try to embed all 500 in manifest.
- Batch into multiple overflow files (100 ops each).
- Upload files first, then update manifest once.
```typescript
const BATCH_SIZE = 100;
const chunks = chunkArray(pendingOps, BATCH_SIZE);
for (const chunk of chunks) {
await this._uploadOverflowFile(chunk);
}
// Single manifest update at the end
await this._uploadManifest(manifest);
```
---
## 7. Advantages Summary
| Metric | Current (v1) | Hybrid Manifest (v2) |
| :---------------------- | :----------------------------------- | :---------------------------------------------------- |
| **Requests per Sync** | 3 (Upload Op + Read Man + Write Man) | **1-2** (Read Man, optional Write) |
| **Files on Server** | Unbounded growth | **Bounded** (1 Manifest + 0-50 Op Files + 1 Snapshot) |
| **Fresh Install Speed** | O(n) - replay all ops | **O(1)** - load snapshot + small delta |
| **Conflict Detection** | Must parse all ops | **Quick check** via frontierClock |
| **Bandwidth per Sync** | ~2KB (op file) + manifest overhead | **~1KB** (manifest only for small changes) |
| **Offline Resilience** | Good | **Same** (operations buffered locally) |
---
## 8. Implementation Status
All phases have been implemented as of December 2025:
### ✅ Phase 1: Core Infrastructure (Complete)
1. **Types** (`operation.types.ts`):
- `HybridManifest`, `SnapshotReference`, `OperationFileReference` interfaces defined
- Backward compatibility maintained with existing `OperationLogManifest`
2. **Manifest Handling** (`operation-log-manifest.service.ts`):
- `loadManifest()` handles v1 and v2 formats
- Automatic v1 to v2 migration on first write
- Buffer/overflow logic in upload services
3. **FrontierClock Tracking**:
- Vector clocks merged when adding embedded operations
- `lastSyncedFrontierClock` stored locally for quick-check
### ✅ Phase 2: Snapshot Support (Complete)
4. **Snapshot Operations** (in `operation-log-upload.service.ts` and `operation-log-download.service.ts`):
- Snapshot generation with current state serialization
- Upload with retry logic
- Download + validate + apply
5. **Snapshot Triggers**:
- Automatic triggers based on file count and operation count
- Remote file cleanup after 14 days (`REMOTE_OP_FILE_RETENTION_MS`)
### ✅ Phase 3: Robustness (Complete)
6. **Concurrency Control**:
- Provider-specific revision checking (Dropbox rev, WebDAV ETag)
- Retry-on-conflict logic implemented
7. **Recovery Logic**:
- Manifest corruption recovery with file listing fallback
- Missing file handling with graceful degradation
### ✅ Phase 4: Testing (Complete)
8. **Tests**:
- Unit tests in `operation-log-manifest.service.spec.ts`
- Integration tests in `sync-scenarios.integration.spec.ts`
- E2E tests in `supersync.spec.ts`
### Key Implementation Files
| File | Purpose |
| ----------------------------------- | ------------------------------------------- |
| `operation-log-manifest.service.ts` | Manifest loading, saving, buffer management |
| `operation-log-upload.service.ts` | Upload with buffer/overflow logic |
| `operation-log-download.service.ts` | Download with snapshot support |
| `operation.types.ts` | Type definitions |
---
## 9. Configuration Constants
```typescript
// Buffer limits
const EMBEDDED_OP_LIMIT = 50; // Max operations in manifest buffer
const EMBEDDED_SIZE_LIMIT_KB = 100; // Max payload size in KB
// Snapshot triggers
const SNAPSHOT_FILE_THRESHOLD = 50; // Trigger when operationFiles exceeds this
const SNAPSHOT_OP_THRESHOLD = 5000; // Trigger when total ops exceed this
const SNAPSHOT_AGE_DAYS = 7; // Trigger if no snapshot in N days
// Batching
const UPLOAD_BATCH_SIZE = 100; // Ops per overflow file
// Retry
const MAX_UPLOAD_RETRIES = 3;
const RETRY_DELAY_MS = 1000;
```
---
## 10. Resolved Design Questions
The following questions were resolved during implementation:
1. **Encryption:** Snapshots use the same encryption as operation files (via `EncryptAndCompressHandlerService`).
2. **Compression:** Snapshots are compressed using the same compression scheme as other sync files.
3. **Checksum Verification:** Currently using timestamp-based validation; checksums can be added if needed.
4. **Clock Drift:** Vector clocks handle ordering; timestamps are informational only.
---
## 11. File Reference
### Remote Storage Layout (v2)
```
/ (or /DEV/ in development)
├── manifest.json # HybridManifest (buffer + references)
├── ops/
│ ├── ops_CLIENT1_170123.json # Flushed operations
│ └── ops_CLIENT2_170456.json
└── snapshots/
└── snap_170789.json # Full state snapshot (if present)
```
### Code Files
```
src/app/core/persistence/operation-log/
├── operation.types.ts # HybridManifest, SnapshotReference types
├── store/
│ └── operation-log-manifest.service.ts # Manifest management
├── sync/
│ ├── operation-log-upload.service.ts # Upload with buffer/overflow
│ └── operation-log-download.service.ts # Download with snapshot support
└── docs/
└── hybrid-manifest-architecture.md # This document
```

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,202 @@
# Operation Payload Optimization Discussion
**Date:** December 5, 2025
**Context:** Analysis of operation payload sizes and optimization opportunities
---
## Initial Analysis
We analyzed the codebase for cases where many operations, or very large ones, are produced.
### Issues Identified
| Issue | Severity | Impact |
| --------------------------------- | -------- | ---------------------------------- |
| Tag deletion cascade | **High** | Creates N+1 operations for N tasks |
| Full payload storage | **High** | Large payloads stored repeatedly |
| batchUpdateForProject nesting | Medium | Single op contains nested array |
| Archive operations | Medium | One bulk op for many tasks |
| Single operations per bulk entity | Medium | N operations instead of 1 |
### Fixes Implemented
1. **Payload size monitoring** - Added `LARGE_PAYLOAD_WARNING_THRESHOLD_BYTES` (10KB) and logging when exceeded
2. **Bulk task-repeat-cfg operations** - Tag deletion now uses bulk delete instead of N individual operations
3. **Batch operation chunking** - `batchUpdateForProject` now chunks large operations into batches of `MAX_BATCH_OPERATIONS_SIZE` (50)
---
## Archive Operation Deep Dive
The `moveToArchive` action was identified as having large payloads (~2KB per task). We explored multiple optimization approaches.
### The Core Problem
Two sync systems exist:
1. **Operation Log (SuperSync)** - Real-time operation sync
2. **PFAPI** - Model file sync (daily for archive files)
When Client A archives tasks:
- Operation syncs immediately
- `archiveYoung` model file syncs later (daily)
When Client B receives the operation:
- Must write tasks to local archive
- But tasks are deleted from originating client's state
- Archive file hasn't synced yet
**The operation must carry full task data.**
### Solutions Explored
#### Option A: Hybrid Payload with Private Field
```typescript
moveToArchive: {
taskIds: string[], // Persisted
_tasks: TaskWithSubTasks[] // Stripped before storage
}
```
**Problem:** Remote operations won't have `_tasks` - still need full data for sync.
#### Option B: Meta-Reducer Enrichment
Capture tasks from state before deletion, attach to action for effect.
**Why it seemed possible:**
- Dependency resolution ensures `addTask` ops applied before `moveToArchive`
- Tasks exist in remote client's state when operation arrives
- Meta-reducer runs before main reducer
**Problems:**
- Complex action mutation
- Meta-reducers should be pure
- Awkward async queue from sync reducer
#### Option C: Two-Phase Archive
Split into `writeToArchive` (full data) + `deleteTasks` (IDs only).
**Problem:** Same total payload size. Just added complexity without benefit.
#### Option D: Operation-Derived Archive Store
Archive becomes a separate IndexedDB store populated entirely by operations:
```typescript
archiveTask: { taskIds: string[] } // IDs only
```
Meta-reducer moves task data from active state to archive before deletion.
**Benefits:**
- Tiny payloads
- Single source of truth
- No PFAPI archive sync needed
**Drawbacks:**
1. Migration complexity (years of existing archive data)
2. Initial sync must replay ALL archive ops (20K+ for heavy users)
3. Operation log growth (archive ops span years)
4. Compaction complexity (must preserve archive state)
5. Two storage systems to coordinate
6. PFAPI compatibility during transition
7. Query performance for 20K+ tasks
### The Scale Concern
> "There can be more than 20,000 archived tasks"
If archive was in NgRx store:
- Selectors iterate 20K+ entities
- Entity adapter operations slow down
- Memory bloat on app start
- DevTools unusable
This ruled out simple "add isArchived flag" approaches.
### Key Insight: Dependency Resolution
Operations have causal ordering. When remote client receives `moveToArchive`:
1. `addTask` operations already applied (dependency)
2. Task exists in remote client's active state
3. Could theoretically look up from state before deletion
But the effect runs AFTER the reducer deletes entities. The timing makes this approach impractical without complex meta-reducer side effects.
---
## Final Decision
**Keep the current full-payload approach.**
### Rationale
1. **It works correctly** - Already implemented, tested, documented
2. **Sync reliability** - No edge cases or timing issues
3. **Simplicity** - Single action, clear semantics
4. **Acceptable size** - ~100KB for 50 tasks is manageable
5. **Infrequent operation** - Archiving happens at end of day, not constantly
### Mitigation
For very large archives, chunk operations:
```typescript
const ARCHIVE_CHUNK_SIZE = 25;
async moveToArchive(parentTasks: TaskWithSubTasks[]): Promise<void> {
const chunks = chunkArray(parentTasks, ARCHIVE_CHUNK_SIZE);
for (const chunk of chunks) {
this._store.dispatch(TaskSharedActions.moveToArchive({ tasks: chunk }));
}
await this._archiveService.moveTasksToArchiveAndFlushArchiveIfDue(parentTasks);
}
```
### Trade-off Summary
| Approach | Payload Size | Complexity | Reliability |
| ------------------------- | ----------------- | ---------- | ----------- |
| Full payload (current) | Large (~2KB/task) | Low | High |
| Meta-reducer enrichment | Small | High | Medium |
| Two-phase archive | Same as current | Higher | High |
| Operation-derived archive | Small | Very High | Medium |
**The payload size reduction doesn't justify the added complexity.**
---
## Related Documentation
- `archive-operation-redesign.md` - Detailed analysis of archive options
- `code-audit.md` - Overall operation compliance audit
- `operation-size-analysis.md` - Initial payload size analysis
---
## Future Considerations
If payload size becomes a real problem (not theoretical), revisit Option D (operation-derived archive) with:
1. Proper migration plan for existing PFAPI data
2. Compaction strategy for long-lived archive operations
3. Performance testing with 20K+ tasks
4. PFAPI compatibility during transition
**Alternative Optimization:**
- **Payload Compression**: Since task data (text/JSON) compresses extremely well (often >90%), we could compress the `_tasks` payload within the `moveToArchive` operation (e.g., using LZ-string or GZIP) before sending. This would solve the size concern without requiring the architectural overhaul of Option D.
Until then, current approach is the right balance.


@@ -0,0 +1,215 @@
# Operation Log: Design Rules & Guidelines
**Last Updated:** December 2025
**Related:** [Operation Log Architecture](./operation-log-architecture.md)
This document establishes the core rules and principles for designing the Operation Log store and defining new Operations. Adherence to these rules ensures data integrity, synchronization reliability, and system performance.
## 1. Store Design Rules
### 1.1 Append-Only Persistence
- **Rule:** The `ops` table in the store must be strictly **append-only** for active operations.
- **Reasoning:** History preservation is critical for event sourcing and conflict resolution.
- **Exception:** Operations can only be deleted by the **Compaction Service**, and only if they are:
1. Older than the retention window.
2. Successfully synced (`syncedAt` is set).
3. "Baked" into a secure snapshot.
### 1.2 Immutable History
- **Rule:** Once an operation is written to `SUP_OPS`, it **MUST NOT** be modified.
- **Reasoning:** Modifying history breaks the cryptographic chain (if implemented later) and confuses sync peers who have already received the operation.
- **Correction:** If an operation was incorrect, append a new _compensating operation_ (e.g., an undo or correction op) rather than editing the old one.
### 1.3 Single Source of Truth
- **Rule:** The Operation Log (`SUP_OPS`) is the ultimate source of truth for the application state.
- **Context:** The `state_cache` and runtime NgRx store are _projections_ derived from the log.
- **Implication:** If the runtime state disagrees with the log replay, the log wins.
### 1.4 Snapshot Mandate
- **Rule:** The store must maintain a valid `state_cache` (snapshot).
- **Frequency:** Snapshots must be updated based on configurable thresholds:
- **Operation count:** After N operations (default: 500, configurable).
- **Time-based:** After T minutes of inactivity following changes.
- **Size-based:** When tail ops exceed S kilobytes.
- **Event-triggered:** Immediately after significant events (large imports, sync completion).
- **Recovery:** The system must be able to rebuild the state entirely from `Snapshot + Tail Ops`.
## 2. Operation Design Rules
### 2.1 Granularity & Atomicity
- **Rule:** Operations should be **atomic** and focused on a **single entity** where possible.
- **Good:** `UPDATE_TASK { id: "A", changes: { title: "New" } }`
- **Bad:** `UPDATE_ALL_TASKS { [ ... entire tasks array ... ] }`
- **Reasoning:** Granular ops reduce conflict probability. Large "dump" ops cause massive conflicts during sync.
- **Exception:** `SYNC_IMPORT` and `BACKUP_IMPORT` are allowed to replace large chunks of state but must be treated as special "reset" events.
### 2.2 Idempotency
- **Rule:** Applying the same operation twice must be safe.
- **Implementation:**
- Use explicit IDs (UUID v7) for creation. `CREATE` with an existing ID must be **ignored** (not merged or updated). If updates are needed, a separate `UPDATE` operation must follow.
- `DELETE` on a missing entity should be a no-op.
- `UPDATE` on a missing entity should be queued for retry (see 3.4 Dependency Awareness).
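A reducer-level sketch of these rules (simplified entity map; `retryQueue` stands in for the DependencyQueue of 3.4):
```typescript
type Entity = { id: string } & Record<string, unknown>;
type EntityMap = Record<string, Entity>;

interface Op {
  opType: 'CRT' | 'UPD' | 'DEL';
  entityId: string;
  payload?: Record<string, unknown>;
}

function applyIdempotent(state: EntityMap, op: Op, retryQueue: Op[]): EntityMap {
  switch (op.opType) {
    case 'CRT':
      // CREATE with an existing ID is ignored - never merged or updated
      if (state[op.entityId]) return state;
      return { ...state, [op.entityId]: { id: op.entityId, ...op.payload } };
    case 'DEL': {
      // DELETE on a missing entity is a harmless no-op
      const { [op.entityId]: _removed, ...rest } = state;
      return rest;
    }
    case 'UPD':
      if (!state[op.entityId]) {
        retryQueue.push(op); // queue for retry, see 3.4
        return state;
      }
      return { ...state, [op.entityId]: { ...state[op.entityId], ...op.payload } };
  }
}
```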
### 2.3 Serializable Payload
- **Rule:** Operation payloads must be **Pure JSON**.
- **Forbidden:**
- `Date` objects (use `timestamp` numbers).
- Functions or class instances.
- `undefined` (use `null` or omit the key, depending on semantics).
- Circular references.
### 2.4 Causality Tracking
- **Rule:** Every operation **MUST** carry a `vectorClock`.
- **Purpose:** To determine if the operation is concurrent with others or if it causally follows them.
- **Responsibility:** The `OperationLogEffects` (or equivalent creator) captures the clock at the moment of creation.
### 2.5 Schema Versioning
- **Rule:** Every operation **MUST** carry a `schemaVersion`.
- **Purpose:** To allow future versions of the app to migrate or interpret old operations correctly.
- **Default:** Use `CURRENT_SCHEMA_VERSION` from `SchemaMigrationService` at the time of creation.
### 2.6 Explicit Intent (OpType)
- **Rule:** Use specific `OpType`s (`CRT`, `UPD`, `DEL`, `MOV`) rather than a generic `CHANGE`.
- **Reasoning:** Specific types allow for smarter conflict resolution and UI feedback (e.g., "Task was deleted remotely" vs "Task was moved").
## 3. Interaction & Safety Rules
### 3.1 Validation First
- **Rule:** Validate operation payloads **before** appending to the log.
- **Checkpoint:** Structural validation (required fields) happens at the boundary. Deep semantic validation happens during application/replay.
- **Failure:** Reject malformed operations immediately; do not corrupt the log.
### 3.2 Robust Replay
- **Rule:** The replay mechanism (Hydrator) **MUST NOT CRASH** on invalid operations.
- **Behavior:** If an operation fails to apply (e.g., referencing a missing parent):
1. Log a warning.
2. Skip the operation (or queue for retry).
3. Continue replaying the rest.
4. Trigger a `REPAIR` cycle at the end if needed.
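A sketch of the tolerant replay loop (helper names are hypothetical):
```typescript
type AppState = unknown;
interface Operation {
  id: string;
}

declare function applyOperation(state: AppState, op: Operation): AppState;
declare function triggerRepairCycle(state: AppState): void;

function replay(ops: Operation[], initial: AppState): AppState {
  let state = initial;
  let hadFailures = false;
  for (const op of ops) {
    try {
      state = applyOperation(state, op);
    } catch (e) {
      console.warn(`Skipping unapplicable op ${op.id}`, e); // 1. log a warning
      hadFailures = true; // 2. skip, 3. keep replaying
    }
  }
  if (hadFailures) {
    triggerRepairCycle(state); // 4. repair at the end
  }
  return state;
}
```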
### 3.3 Sync Isolation
- **Rule:** The `OperationLogStore` should not contain logic specific to a sync provider (Dropbox, WebDAV).
- **Separation:** The store manages _persistence_. The Sync Services manage _transport_.
- **Interface:** The store exposes `getUnsynced()`, `markSynced()`, `markRejected()` as generic methods.
### 3.4 Dependency Awareness
- **Rule:** Operations creating dependent entities (e.g., Subtask) must ensure the dependency (Parent Task) exists.
- **Handling:** If a parent is missing during sync, the child creation op should be buffered in a `DependencyQueue` until the parent arrives.
- **Safeguards:**
- **Cycle detection:** Before queuing, verify the dependency graph is acyclic. Reject operations that would create circular dependencies.
- **Buffer limits:** The queue must enforce a maximum depth (default: 1000 pending ops) and timeout (default: 5 minutes). Operations exceeding limits should be logged and dropped.
- **Retry policy:** Queued operations should be retried after each batch of new operations is applied, with exponential backoff for repeated failures.
### 3.5 Deletion & Tombstones
> **Status (December 2025):** Tombstones are **DEFERRED**. After comprehensive evaluation, the current event-sourced architecture provides sufficient safeguards without explicit tombstones. See `todo.md` Item 1 for the full evaluation.
- **Current Implementation:** Deletions use **DELETE operations** in the event log (immutable events, not destructive).
- **Alternative Safeguards in Place:**
- Vector clocks detect concurrent delete+update conflicts; user resolution UI is presented.
- Tag sanitization filters non-existent taskIds at reducer level.
- Subtask cascading deletes include all child tasks.
- Auto-repair removes orphaned references and creates REPAIR operations.
- **When to Revisit:**
- If undo/restore functionality is needed.
- If audit compliance requires explicit "entity deleted at time X" records.
- If cross-version sync (A.7.11) reveals edge cases not handled by current safeguards.
### 3.6 Operation Batching
- **Rule:** Normal operations should be batched with reasonable limits.
- **Limits:**
- **Max batch size:** 100 operations per batch for normal sync uploads.
- **Max payload size:** 1 MB per batch to prevent timeout issues.
- **Exception:** `SYNC_IMPORT` and `BACKUP_IMPORT` bypass these limits but must be clearly marked as bulk operations and trigger immediate snapshot creation afterward.
## 4. Effect Rules
### 4.1 LOCAL_ACTIONS for Side Effects
- **Rule:** All NgRx effects that perform side effects MUST use `inject(LOCAL_ACTIONS)` instead of `inject(Actions)`.
- **Reasoning:** Effects should NEVER run for remote sync operations. Side effects (snackbars, API calls, sounds) happen exactly once on the originating client.
- **Exception:** Effects that only dispatch state-modifying actions (not side effects) may use regular `Actions`.
**Example:**
```typescript
@Injectable()
export class MyEffects {
private _actions$ = inject(LOCAL_ACTIONS); // ✅ Correct for side effects
showSnack$ = createEffect(
() =>
this._actions$.pipe(
ofType(completeTask),
tap(() => this.snackService.show('Task completed!')),
),
{ dispatch: false },
);
}
```
### 4.2 Avoid Selector-Based Effects That Dispatch Actions
- **Rule:** Prefer action-based effects (`this._actions$.pipe(ofType(...))`) over selector-based effects (`this._store$.select(...)`).
- **Reasoning:** Selector-based effects fire whenever the store changes, including during hydration and sync replay, bypassing `LOCAL_ACTIONS` filtering.
- **Workaround:** If you must use a selector-based effect that dispatches actions, guard it with `HydrationStateService.isApplyingRemoteOps()`.
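A sketch of such a guard (the selector, action, and service wiring are illustrative):
```typescript
import { inject, Injectable } from '@angular/core';
import { Store } from '@ngrx/store';
import { createEffect } from '@ngrx/effects';
import { filter, map } from 'rxjs/operators';

declare const selectSomething: any; // hypothetical selector
declare const someFollowUpAction: any; // hypothetical action creator
declare class HydrationStateService {
  isApplyingRemoteOps(): boolean;
}

@Injectable()
export class GuardedSelectorEffects {
  private _store$ = inject(Store);
  private _hydrationState = inject(HydrationStateService);

  guarded$ = createEffect(() =>
    this._store$.select(selectSomething).pipe(
      // Drop emissions caused by hydration or remote-op replay
      filter(() => !this._hydrationState.isApplyingRemoteOps()),
      map((value) => someFollowUpAction({ value })),
    ),
  );
}
```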
### 4.3 Archive Side Effects
- **Rule:** Archive operations (writing to IndexedDB) are handled by `ArchiveOperationHandler`, NOT by regular effects.
- **Local operations:** `ArchiveOperationHandlerEffects` routes through `ArchiveOperationHandler` (via LOCAL_ACTIONS)
- **Remote operations:** `OperationApplierService` calls `ArchiveOperationHandler` directly after dispatch
## 5. Multi-Entity Operation Rules
### 5.1 Use Meta-Reducers for Atomic Changes
- **Rule:** When one action affects multiple entities, use **meta-reducers** instead of effects.
- **Reasoning:** Meta-reducers ensure all changes happen in a single reducer pass, creating one operation in the sync log and preventing partial sync.
- **Example:** Deleting a tag also removes it from tasks → handled in `tagSharedMetaReducer`, not in an effect.
### 5.2 Capture Multi-Entity Changes
- **Rule:** The `OperationCaptureService` automatically captures all entity changes from a single action.
- **Implementation:** The `operation-capture.meta-reducer` calls `OperationCaptureService.enqueue()` with the action.
- **Result:** Single operation with `entityChanges[]` array containing all affected entities.
## 6. Configuration Constants
See `operation-log.const.ts` for all configurable values:
| Constant | Value | Description |
| ----------------------------------- | -------- | ----------------------------------------- |
| `COMPACTION_TRIGGER` | 500 ops | Operations before automatic compaction |
| `COMPACTION_RETENTION_MS` | 7 days | Synced ops older than this may be deleted |
| `EMERGENCY_COMPACTION_RETENTION_MS` | 1 day | Shorter retention for quota exceeded |
| `MAX_COMPACTION_FAILURES` | 3 | Failures before user notification |
| `MAX_DOWNLOAD_OPS_IN_MEMORY` | 50,000 | Bounds memory during API download |
| `REMOTE_OP_FILE_RETENTION_MS` | 14 days | Server-side operation file retention |
| `PENDING_OPERATION_EXPIRY_MS` | 24 hours | Pending ops older than this are rejected |
## 7. Quick Reference Checklist
When adding a new persistent action:
- [ ] Add `meta.isPersistent: true` to the action
- [ ] Add `meta.entityType` and `meta.opType`
- [ ] Ensure related entity changes are in a meta-reducer (not effects)
- [ ] Effects with side effects use `LOCAL_ACTIONS`
- [ ] Archive operations route through `ArchiveOperationHandler`
- [ ] Add action to `ACTION_AFFECTED_ENTITIES` if multi-entity


@@ -0,0 +1,860 @@
# PFAPI Sync and Persistence Architecture
This document describes the architecture and implementation of the persistence and synchronization system (PFAPI) in Super Productivity.
## Overview
PFAPI (Persistence Framework API) is a comprehensive system for:
1. **Local Persistence**: Storing application data in IndexedDB
2. **Cross-Device Synchronization**: Syncing data across devices via multiple cloud providers
3. **Conflict Detection**: Using vector clocks for distributed conflict detection
4. **Data Validation & Migration**: Ensuring data integrity across versions
## Architecture Layers
```
┌─────────────────────────────────────────────────────────────────┐
│                       Angular Application                       │
│                     (Components & Services)                     │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                      PfapiService (Angular)                     │
│  - Injectable wrapper around Pfapi                              │
│  - Exposes RxJS Observables for UI integration                  │
│  - Manages sync provider activation                             │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                           Pfapi (Core)                          │
│  - Main orchestrator for all persistence operations             │
│  - Coordinates Database, Models, Sync, and Migration            │
└────────────────────────────┬────────────────────────────────────┘
         ┌───────────────────┼────────────────────┐
         │                   │                    │
         ▼                   ▼                    ▼
 ┌───────────────┐   ┌───────────────┐    ┌───────────────┐
 │   Database    │   │  SyncService  │    │   Migration   │
 │  (IndexedDB)  │   │ (Orchestrator)│    │    Service    │
 └───────────────┘   └───────┬───────┘    └───────────────┘
                ┌────────────┼────────────┐
                │            │            │
                ▼            ▼            ▼
           ┌──────────┐ ┌───────────┐ ┌───────────┐
           │   Meta   │ │   Model   │ │  Encrypt/ │
           │   Sync   │ │   Sync    │ │  Compress │
           └──────────┘ └───────────┘ └───────────┘
                │            │
                └────────────┼────────────┐
                             │            │
                             ▼            ▼
                 ┌───────────────────────────┐
                 │  SyncProvider Interface   │
                 └───────────────┬───────────┘
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
 ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
 │    Dropbox    │       │    WebDAV     │       │  Local File   │
 └───────────────┘       └───────────────┘       └───────────────┘
```
## Directory Structure
```
src/app/pfapi/
├── pfapi.service.ts # Angular service wrapper
├── pfapi-config.ts # Model and provider configuration
├── pfapi-helper.ts # RxJS integration helpers
├── api/
│ ├── pfapi.ts # Main API class
│ ├── pfapi.model.ts # Type definitions
│ ├── pfapi.const.ts # Enums and constants
│ ├── db/ # Database abstraction
│ │ ├── database.ts # Database wrapper with locking
│ │ ├── database-adapter.model.ts
│ │ └── indexed-db-adapter.ts # IndexedDB implementation
│ ├── model-ctrl/ # Model controllers
│ │ ├── model-ctrl.ts # Generic model controller
│ │ └── meta-model-ctrl.ts # Metadata controller
│ ├── sync/ # Sync orchestration
│ │ ├── sync.service.ts # Main sync orchestrator
│ │ ├── meta-sync.service.ts # Metadata sync
│ │ ├── model-sync.service.ts # Model sync
│ │ ├── sync-provider.interface.ts
│ │ ├── encrypt-and-compress-handler.service.ts
│ │ └── providers/ # Provider implementations
│ ├── migration/ # Data migration
│ ├── util/ # Utilities (vector-clock, etc.)
│ └── errors/ # Custom error types
├── migrate/ # Cross-model migrations
├── repair/ # Data repair utilities
└── validate/ # Validation functions
```
## Core Components
### 1. Database Layer
#### Database Class (`api/db/database.ts`)
The `Database` class wraps the storage adapter and provides:
- **Locking mechanism**: Prevents concurrent writes during sync
- **Error handling**: Centralized error management
- **CRUD operations**: `load`, `save`, `remove`, `loadAll`, `clearDatabase`
```typescript
class Database {
lock(): void; // Prevents writes
unlock(): void; // Re-enables writes
load<T>(key: string): Promise<T>;
save<T>(key: string, data: T, isIgnoreDBLock?: boolean): Promise<void>;
remove(key: string): Promise<unknown>;
}
```
The database is locked during sync operations to prevent race conditions.
#### IndexedDB Adapter (`api/db/indexed-db-adapter.ts`)
Implements `DatabaseAdapter` interface using IndexedDB:
- Database name: `'pf'`
- Main store: `'main'`
- Uses the `idb` library for async IndexedDB operations
```typescript
class IndexedDbAdapter implements DatabaseAdapter {
async init(): Promise<IDBPDatabase>; // Opens/creates database
async load<T>(key: string): Promise<T>; // db.get(store, key)
async save<T>(key: string, data: T): Promise<void>; // db.put(store, data, key)
async remove(key: string): Promise<unknown>; // db.delete(store, key)
async loadAll<A>(): Promise<A>; // Returns all entries as object
async clearDatabase(): Promise<void>; // db.clear(store)
}
```
## Local Storage Structure (IndexedDB)
All data is stored in a single IndexedDB database with one object store. Each entry is keyed by a string identifier.
### IndexedDB Keys
#### System Keys
| Key | Content | Description |
| --------------------- | ------------------------- | ------------------------------------------------------- |
| `__meta_` | `LocalMeta` | Sync metadata (vector clock, revMap, timestamps) |
| `__client_id_` | `string` | Unique client identifier (e.g., `"BCL1234567890_12_5"`) |
| `__sp_cred_Dropbox` | `DropboxPrivateCfg` | Dropbox credentials |
| `__sp_cred_WebDAV` | `WebdavPrivateCfg` | WebDAV credentials |
| `__sp_cred_LocalFile` | `LocalFileSyncPrivateCfg` | Local file sync config |
| `__TMP_BACKUP` | `AllSyncModels` | Temporary backup during imports |
#### Model Keys (all defined in `pfapi-config.ts`)
| Key | Content | Main File | Description |
| ---------------- | --------------------- | --------- | ----------------------------- |
| `task` | `TaskState` | Yes | Tasks data (EntityState) |
| `timeTracking` | `TimeTrackingState` | Yes | Time tracking records |
| `project` | `ProjectState` | Yes | Projects (EntityState) |
| `tag` | `TagState` | Yes | Tags (EntityState) |
| `simpleCounter` | `SimpleCounterState` | Yes | Simple counters (EntityState) |
| `note` | `NoteState` | Yes | Notes (EntityState) |
| `taskRepeatCfg` | `TaskRepeatCfgState` | Yes | Recurring task configs |
| `reminders` | `Reminder[]` | Yes | Reminder array |
| `planner` | `PlannerState` | Yes | Planner state |
| `boards` | `BoardsState` | Yes | Kanban boards |
| `menuTree` | `MenuTreeState` | No | Menu structure |
| `globalConfig` | `GlobalConfigState` | No | User settings |
| `issueProvider` | `IssueProviderState` | No | Issue tracker configs |
| `metric` | `MetricState` | No | Metrics (EntityState) |
| `improvement` | `ImprovementState` | No | Improvements (EntityState) |
| `obstruction` | `ObstructionState` | No | Obstructions (EntityState) |
| `pluginUserData` | `PluginUserDataState` | No | Plugin user data |
| `pluginMetadata` | `PluginMetaDataState` | No | Plugin metadata |
| `archiveYoung` | `ArchiveModel` | No | Recent archived tasks |
| `archiveOld` | `ArchiveModel` | No | Old archived tasks |
### Local Storage Diagram
```
┌──────────────────────────────────────────────────────────────────┐
│ IndexedDB: "pf" │
│ Store: "main" │
├──────────────────────┬───────────────────────────────────────────┤
│ Key │ Value │
├──────────────────────┼───────────────────────────────────────────┤
│ __meta_ │ { lastUpdate, vectorClock, revMap, ... } │
│ __client_id_ │ "BCLm1abc123_12_5" │
│ __sp_cred_Dropbox │ { accessToken, refreshToken, encryptKey } │
│ __sp_cred_WebDAV │ { url, username, password, encryptKey } │
├──────────────────────┼───────────────────────────────────────────┤
│ task │ { ids: [...], entities: {...} } │
│ project │ { ids: [...], entities: {...} } │
│ tag │ { ids: [...], entities: {...} } │
│ note │ { ids: [...], entities: {...} } │
│ globalConfig │ { misc: {...}, keyboard: {...}, ... } │
│ timeTracking │ { ... } │
│ planner │ { ... } │
│ boards │ { ... } │
│ archiveYoung │ { task: {...}, timeTracking: {...} } │
│ archiveOld │ { task: {...}, timeTracking: {...} } │
│ ... │ ... │
└──────────────────────┴───────────────────────────────────────────┘
```
### How Models Are Saved Locally
When a model is saved via `ModelCtrl.save()`:
```typescript
// 1. Data is validated
if (modelCfg.validate) {
const result = modelCfg.validate(data);
if (!result.success && modelCfg.repair) {
data = modelCfg.repair(data); // Auto-repair if possible
}
}
// 2. Metadata is updated (if requested via isUpdateRevAndLastUpdate)
// Always:
vectorClock = incrementVectorClock(vectorClock, clientId);
lastUpdate = Date.now();
// Only for NON-main-file models (isMainFileModel: false):
if (!modelCfg.isMainFileModel) {
revMap[modelId] = Date.now().toString();
}
// Main file models are tracked via mainModelData in the meta file, not revMap
// 3. Data is saved to IndexedDB
await db.put('main', data, modelId); // e.g., key='task', value=TaskState
```
**Important distinction:**
- **Main file models** (`isMainFileModel: true`): Vector clock is incremented, but `revMap` is NOT updated. These models are embedded in `mainModelData` within the meta file.
- **Separate model files** (`isMainFileModel: false`): Both vector clock and `revMap` are updated. The `revMap` entry tracks the revision of the individual remote file.
### 2. Model Control Layer
#### ModelCtrl (`api/model-ctrl/model-ctrl.ts`)
Generic controller for each data model (tasks, projects, tags, etc.):
```typescript
class ModelCtrl<MT extends ModelBase> {
save(
data: MT,
options?: {
isUpdateRevAndLastUpdate: boolean;
isIgnoreDBLock?: boolean;
},
): Promise<unknown>;
load(): Promise<MT>;
remove(): Promise<unknown>;
}
```
Key behaviors:
- **Validation on save**: Uses Typia for runtime type checking
- **Auto-repair**: Attempts to repair invalid data if `repair` function is provided
- **In-memory caching**: Keeps data in memory for fast reads
- **Revision tracking**: Updates metadata on save when `isUpdateRevAndLastUpdate` is true
#### MetaModelCtrl (`api/model-ctrl/meta-model-ctrl.ts`)
Manages synchronization metadata:
```typescript
interface LocalMeta {
lastUpdate: number; // Timestamp of last local change
lastSyncedUpdate: number | null; // Timestamp of last sync
metaRev: string | null; // Remote metadata revision
vectorClock: VectorClock; // Client-specific clock values
lastSyncedVectorClock: VectorClock | null;
revMap: RevMap; // Model ID -> revision mapping
crossModelVersion: number; // Data schema version
}
```
Key responsibilities:
- **Client ID management**: Generates and stores unique client identifiers
- **Vector clock updates**: Increments on local changes
- **Revision map tracking**: Tracks which model versions are synced
### 3. Sync Service Layer
#### SyncService (`api/sync/sync.service.ts`)
Main sync orchestrator. The `sync()` method:
1. **Check readiness**: Verify sync provider is configured and authenticated
2. **Operation log sync**: Upload/download operation logs (new feature)
3. **Early return check**: If `lastSyncedUpdate === lastUpdate` and meta revision matches, return `InSync`
4. **Download remote metadata**: Get current remote state
5. **Determine sync direction**: Compare local and remote states using `getSyncStatusFromMetaFiles`
6. **Execute sync**: Upload, download, or report conflict
```typescript
async sync(): Promise<{ status: SyncStatus; conflictData?: ConflictData }>
```
Possible sync statuses:
- `InSync` - No changes needed
- `UpdateLocal` - Download needed (remote is newer)
- `UpdateRemote` - Upload needed (local is newer)
- `UpdateLocalAll` / `UpdateRemoteAll` - Full sync needed
- `Conflict` - Concurrent changes detected
- `NotConfigured` - No sync provider set
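A minimal caller sketch (hypothetical wiring; in the app this goes through `PfapiService`):

```typescript
// Illustrative: react to the outcome of a sync run.
// sync() already performs the upload/download; the status reports what happened.
const { status, conflictData } = await pfapi.sync();
switch (status) {
  case SyncStatus.InSync:
    break; // nothing to do
  case SyncStatus.UpdateLocal:
  case SyncStatus.UpdateLocalAll:
    break; // remote data was downloaded and applied
  case SyncStatus.UpdateRemote:
  case SyncStatus.UpdateRemoteAll:
    break; // local data was uploaded
  case SyncStatus.Conflict:
    await promptUserToResolve(conflictData); // hypothetical UI hook
    break;
  case SyncStatus.NotConfigured:
    break; // no provider set up
}
```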
#### MetaSyncService (`api/sync/meta-sync.service.ts`)
Handles metadata file operations:
- `download()`: Gets remote metadata, checks for locks
- `upload()`: Uploads metadata with encryption
- `lock()`: Creates a lock file during multi-file upload
- `getRev()`: Gets remote metadata revision
#### ModelSyncService (`api/sync/model-sync.service.ts`)
Handles individual model file operations:
- `upload()`: Uploads a model with encryption
- `download()`: Downloads a model with revision verification
- `remove()`: Deletes a remote model file
- `getModelIdsToUpdateFromRevMaps()`: Determines which models need syncing
### 4. Vector Clock System
#### Purpose
Vector clocks provide **causality-based conflict detection** for distributed systems. Unlike simple timestamps:
- They detect **concurrent changes** (true conflicts)
- They preserve **happened-before relationships**
- They work without synchronized clocks
#### Implementation (`api/util/vector-clock.ts`)
```typescript
interface VectorClock {
[clientId: string]: number; // Maps client ID to update count
}
enum VectorClockComparison {
EQUAL, // Same state
LESS_THAN, // A happened before B
GREATER_THAN, // B happened before A
CONCURRENT, // True conflict - both changed independently
}
```
Key operations:
- `incrementVectorClock(clock, clientId)` - Increment on local change
- `mergeVectorClocks(a, b)` - Take max of each component
- `compareVectorClocks(a, b)` - Determine relationship
- `hasVectorClockChanges(current, reference)` - Check for local changes
- `limitVectorClockSize(clock, clientId)` - Prune to max 50 clients
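A minimal sketch of the core semantics (illustrative; the real implementation in `api/util/vector-clock.ts` also validates inputs and prunes oversized clocks):

```typescript
const incrementVectorClock = (clock: VectorClock, clientId: string): VectorClock => ({
  ...clock,
  [clientId]: (clock[clientId] ?? 0) + 1,
});

const mergeVectorClocks = (a: VectorClock, b: VectorClock): VectorClock => {
  const merged: VectorClock = { ...a };
  for (const [id, n] of Object.entries(b)) {
    merged[id] = Math.max(merged[id] ?? 0, n);
  }
  return merged;
};

const compareVectorClocks = (a: VectorClock, b: VectorClock): VectorClockComparison => {
  let aAhead = false;
  let bAhead = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const av = a[id] ?? 0;
    const bv = b[id] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return VectorClockComparison.CONCURRENT;
  if (aAhead) return VectorClockComparison.GREATER_THAN;
  if (bAhead) return VectorClockComparison.LESS_THAN;
  return VectorClockComparison.EQUAL;
};
```

For example, comparing `{A:5, B:3}` with `{A:4, B:5}` yields `CONCURRENT` (each side is ahead on its own counter), which is exactly the conflict case shown in the diagram further below.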
#### Sync Status Determination (`api/util/get-sync-status-from-meta-files.ts`)
```typescript
function getSyncStatusFromMetaFiles(remote: RemoteMeta, local: LocalMeta) {
// 1. Check for empty local/remote
// 2. Compare vector clocks
// 3. Return appropriate SyncStatus
}
```
The algorithm (simplified; the actual implementation has more nuances):
1. **Empty data checks:**
- If remote has no data (`isRemoteDataEmpty`), return `UpdateRemoteAll`
- If local has no data (`isLocalDataEmpty`), return `UpdateLocalAll`
2. **Vector clock validation:**
- If either local or remote lacks a vector clock, return `Conflict` with reason `NoLastSync`
- Both `vectorClock` and `lastSyncedVectorClock` must be present
3. **Change detection using `hasVectorClockChanges`:**
- Local changes: Compare current `vectorClock` vs `lastSyncedVectorClock`
- Remote changes: Compare remote `vectorClock` vs local `lastSyncedVectorClock`
4. **Sync status determination:**
- No local changes + no remote changes → `InSync`
- Local changes only → `UpdateRemote`
- Remote changes only → `UpdateLocal`
- Both have changes → `Conflict` with reason `BothNewerLastSync`
**Note:** The actual implementation also handles edge cases like minimal-update bootstrap scenarios and validates that clocks are properly initialized.
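Stripped of those edge cases, the decision core reduces to roughly this (sketch only; the empty-data and missing-clock checks above run first):

```typescript
const determineSyncDirection = (remote: RemoteMeta, local: LocalMeta): SyncStatus => {
  const localChanged = hasVectorClockChanges(local.vectorClock, local.lastSyncedVectorClock);
  const remoteChanged = hasVectorClockChanges(remote.vectorClock, local.lastSyncedVectorClock);
  if (!localChanged && !remoteChanged) return SyncStatus.InSync;
  if (localChanged && !remoteChanged) return SyncStatus.UpdateRemote;
  if (!localChanged && remoteChanged) return SyncStatus.UpdateLocal;
  return SyncStatus.Conflict; // reason: BothNewerLastSync
};
```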
### 5. Sync Providers
#### Interface (`api/sync/sync-provider.interface.ts`)
```typescript
interface SyncProviderServiceInterface<PID extends SyncProviderId> {
id: PID;
isUploadForcePossible?: boolean;
isLimitedToSingleFileSync?: boolean;
maxConcurrentRequests: number;
getFileRev(targetPath: string, localRev: string | null): Promise<FileRevResponse>;
downloadFile(targetPath: string): Promise<FileDownloadResponse>;
uploadFile(
targetPath: string,
dataStr: string,
revToMatch: string | null,
isForceOverwrite?: boolean,
): Promise<FileRevResponse>;
removeFile(targetPath: string): Promise<void>;
listFiles?(targetPath: string): Promise<string[]>;
isReady(): Promise<boolean>;
setPrivateCfg(privateCfg): Promise<void>;
}
```
#### Available Providers
| Provider | Description | Force Upload | Max Concurrent |
| ------------- | --------------------------- | ------------ | -------------- |
| **Dropbox** | OAuth2 PKCE authentication | Yes | 4 |
| **WebDAV** | Nextcloud, ownCloud, etc. | No | 10 |
| **LocalFile** | Electron/Android filesystem | No | 10 |
| **SuperSync** | WebDAV-based custom sync | No | 10 |
### 6. Data Encryption & Compression
#### EncryptAndCompressHandlerService
Handles data transformation before upload/after download:
- **Compression**: Uses compression algorithms to reduce data size
- **Encryption**: AES encryption with user-provided key
Data format prefix: `pf_` indicates processed data.
### 7. Migration System
#### MigrationService (`api/migration/migration.service.ts`)
Handles data schema evolution:
- Checks version on app startup
- Applies cross-model migrations sequentially, in ascending version order
- **Only supports forward (upgrade) migrations** - throws `CanNotMigrateMajorDownError` if data version is higher than code version (major version mismatch)
```typescript
interface CrossModelMigrations {
[version: number]: (fullData) => transformedData;
}
```
**Migration behavior:**
- If `dataVersion === codeVersion`: No migration needed
- If `dataVersion < codeVersion`: Run all migrations from `dataVersion` to `codeVersion`
- If `dataVersion > codeVersion` (major version differs): Throws error - downgrade not supported
Current version: `4.4` (from `pfapi-config.ts`)
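The migration loop itself is small; here is a sketch under the interface above (illustrative, not the literal implementation):

```typescript
// Forward-only: apply every registered migration between the data's
// version and the code's version, in ascending order
const migrate = (
  data: unknown,
  dataVersion: number,
  codeVersion: number,
  migrations: CrossModelMigrations,
): unknown => {
  if (Math.floor(dataVersion) > Math.floor(codeVersion)) {
    // downgrade across major versions is unsupported
    throw new CanNotMigrateMajorDownError();
  }
  return Object.keys(migrations)
    .map(Number)
    .filter((v) => v > dataVersion && v <= codeVersion)
    .sort((a, b) => a - b)
    .reduce((d, v) => migrations[v](d), data);
};
```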
### 8. Validation & Repair
#### Validation
Uses **Typia** for runtime type validation:
- Each model can define a `validate` function
- Returns `IValidation<T>` with success flag and errors
#### Repair
Auto-repair system for corrupted data:
- Each model can define a `repair` function
- Applied when validation fails
- Falls back to error if repair fails
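A sketch of the validate-then-repair contract (hypothetical wiring; the real hook lives in `ModelCtrl.save()`):

```typescript
import typia from 'typia';

// typia generates a structural validator at compile time
const validateTask = typia.createValidate<TaskState>();

const ensureValid = (data: TaskState, repair?: (d: TaskState) => TaskState): TaskState => {
  const result = validateTask(data);
  if (result.success) return data;
  if (repair) return repair(data); // best-effort auto-repair
  // error type from api/errors/errors.ts; constructor args illustrative
  throw new DataValidationFailedError(result.errors);
};
```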
## Sync Flow Diagrams
### Normal Sync Flow
```
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Device A│ │ Remote │ │ Device B│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ 1. sync() │ │
├────────────────►│ │
│ │ │
│ 2. download │ │
│ metadata │ │
│◄────────────────┤ │
│ │ │
│ 3. compare │ │
│ vector clocks │ │
│ │ │
│ 4. upload │ │
│ changes │ │
├────────────────►│ │
│ │ │
│ │ 5. sync() │
│ │◄────────────────┤
│ │ │
│ │ 6. download │
│ │ metadata │
│ ├────────────────►│
│ │ │
│ │ 7. download │
│ │ changed │
│ │ models │
│ ├────────────────►│
```
### Conflict Detection Flow
```
┌─────────┐ ┌─────────┐
│ Device A│ │ Device B│
│ VC: {A:5, B:3} │ VC: {A:4, B:5}
└────┬────┘ └────┬────┘
│ │
│ Both made changes offline │
│ │
│ ┌─────────────────────────┼───────────────────────────┐
│ │ Compare: CONCURRENT │ │
│ │ A has A:5 (higher) │ B has B:5 (higher) │
│ │ Neither dominates │ │
│ └─────────────────────────┴───────────────────────────┘
│ │
│ Conflict! │
│ User must choose which │
│ version to keep │
```
### Multi-File Upload with Locking
```
┌─────────┐ ┌─────────┐
│ Client │ │ Remote │
└────┬────┘ └────┬────┘
│ │
│ 1. Create lock │
│ (upload lock │
│ content) │
├────────────────►│
│ │
│ 2. Upload │
│ model A │
├────────────────►│
│ │
│ 3. Upload │
│ model B │
├────────────────►│
│ │
│ 4. Upload │
│ metadata │
│ (replaces lock)│
├────────────────►│
│ │
│ Lock released │
```
## Remote Storage Structure
The remote storage (Dropbox, WebDAV, local folder) contains multiple files. The structure is designed to optimize sync performance by separating frequently-changed small data from large archives.
### Remote Files Overview
```
/ (or /DEV/ in development)
├── __meta_ # Metadata file (REQUIRED - always synced first)
├── globalConfig # User settings
├── menuTree # Menu structure
├── issueProvider # Issue tracker configurations
├── metric # Metrics data
├── improvement # Improvement entries
├── obstruction # Obstruction entries
├── pluginUserData # Plugin user data
├── pluginMetadata # Plugin metadata
├── archiveYoung # Recent archived tasks (can be large)
└── archiveOld # Old archived tasks (can be very large)
```
### The Meta File (`__meta_`)
The meta file is the **central coordination file** for sync. It contains:
1. **Sync metadata** (vector clock, timestamps, version)
2. **Revision map** (`revMap`) - tracks which revision each model file has
3. **Main file model data** - frequently-accessed data embedded directly
```typescript
interface RemoteMeta {
// Sync coordination
lastUpdate: number; // When data was last changed
crossModelVersion: number; // Schema version (e.g., 4.4)
vectorClock: VectorClock; // For conflict detection
revMap: RevMap; // Model ID -> revision string
// Embedded data (main file models)
mainModelData: {
task: TaskState;
project: ProjectState;
tag: TagState;
note: NoteState;
timeTracking: TimeTrackingState;
simpleCounter: SimpleCounterState;
taskRepeatCfg: TaskRepeatCfgState;
reminders: Reminder[];
planner: PlannerState;
boards: BoardsState;
};
// For single-file sync providers
isFullData?: boolean; // If true, all data is in this file
}
```
### Main File Models vs Separate Model Files
Models are categorized into two types:
#### Main File Models (`isMainFileModel: true`)
These are embedded in the `__meta_` file's `mainModelData` field:
| Model | Reason |
| --------------- | ------------------------------------- |
| `task` | Frequently accessed, relatively small |
| `project` | Core data, always needed |
| `tag` | Small, frequently referenced |
| `note` | Often viewed together with tasks |
| `timeTracking` | Frequently updated |
| `simpleCounter` | Small, frequently updated |
| `taskRepeatCfg` | Needed for task creation |
| `reminders` | Small array, time-critical |
| `planner` | Viewed on app startup |
| `boards` | Part of main UI |
**Benefits:**
- Single HTTP request to get all core data
- Atomic update of related models
- Faster initial sync
#### Separate Model Files (`isMainFileModel: false` or undefined)
These are stored as individual files:
| Model | Reason |
| -------------------------------------- | ------------------------------------------- |
| `globalConfig` | User-specific, rarely synced |
| `menuTree` | UI state, not critical |
| `issueProvider` | Contains credentials, separate for security |
| `metric`, `improvement`, `obstruction` | Historical data, can grow large |
| `archiveYoung` | Can be large, changes infrequently |
| `archiveOld` | Very large, rarely accessed |
| `pluginUserData`, `pluginMetadata` | Plugin-specific, isolated |
**Benefits:**
- Only download what changed (via `revMap` comparison)
- Large files (archives) don't slow down regular sync
- Can sync individual models independently
### RevMap: Tracking Model Versions
The `revMap` tracks which version of each separate model file is on the remote:
```typescript
interface RevMap {
[modelId: string]: string; // Model ID -> revision/timestamp
}
// Example
{
"globalConfig": "1701234567890",
"menuTree": "1701234567891",
"archiveYoung": "1701234500000",
"archiveOld": "1701200000000",
// ... (main file models NOT included - they're in mainModelData)
}
```
When syncing:
1. Download `__meta_` file
2. Compare remote `revMap` with local `revMap`
3. Only download model files where revision differs
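The diff itself is a straightforward key comparison (sketch; the real logic in `getModelIdsToUpdateFromRevMaps()` also handles uploads and deletions):

```typescript
// Models whose remote revision differs from the locally recorded one
const modelIdsToDownload = (remoteRevMap: RevMap, localRevMap: RevMap): string[] =>
  Object.keys(remoteRevMap).filter((id) => remoteRevMap[id] !== localRevMap[id]);
```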
### Upload Flow
```
┌─────────────────────────────────────────────────────────────────────────┐
│ UPLOAD FLOW │
└─────────────────────────────────────────────────────────────────────────┘
1. Determine what changed (compare local/remote revMaps)
local.revMap: { archiveYoung: "100", globalConfig: "200" }
remote.revMap: { archiveYoung: "100", globalConfig: "150" }
→ globalConfig needs upload
2. For multi-file upload, create lock:
Upload to __meta_: "SYNC_IN_PROGRESS__BCLm1abc123_12_5"
3. Upload changed model files:
Upload to globalConfig: { encrypted/compressed data }
→ Get new revision: "250"
4. Upload metadata (replaces lock):
Upload to __meta_: {
lastUpdate: 1701234567890,
vectorClock: { "BCLm1abc123_12_5": 42 },
revMap: { archiveYoung: "100", globalConfig: "250" },
mainModelData: { task: {...}, project: {...}, ... }
}
```
### Download Flow
```
┌─────────────────────────────────────────────────────────────────────────┐
│ DOWNLOAD FLOW │
└─────────────────────────────────────────────────────────────────────────┘
1. Download __meta_ file
→ Get mainModelData (task, project, tag, etc.)
→ Get revMap for separate files
2. Compare revMaps:
remote.revMap: { archiveYoung: "300", globalConfig: "250" }
local.revMap: { archiveYoung: "100", globalConfig: "250" }
→ archiveYoung needs download
3. Download changed model files (parallel with load balancing):
Download archiveYoung → decrypt/decompress → save locally
4. Update local metadata:
- Save all mainModelData to IndexedDB
- Save downloaded models to IndexedDB
- Update local revMap to match remote
- Merge vector clocks
- Set lastSyncedUpdate = lastUpdate
```
### Single-File Sync Mode
Some providers (or configurations) use `isLimitedToSingleFileSync: true`. In this mode:
- **All data** is stored in the `__meta_` file
- `mainModelData` contains ALL models, not just main file models
- `isFullData: true` flag is set
- No separate model files are created
- Simpler but less efficient for large datasets
### File Content Format
All files are stored as JSON strings with optional encryption/compression:
```
Raw: { "ids": [...], "entities": {...} }
↓ (if compression enabled)
Compressed: <binary compressed data>
↓ (if encryption enabled)
Encrypted: <AES encrypted data>
Prefixed: "pf_" + <cross_model_version> + "__" + <base64 encoded data>
```
The `pf_` prefix indicates the data has been processed and needs decryption/decompression.
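A hypothetical sketch of the wrap step using standard Web APIs (the real transform lives in `encrypt-and-compress-handler.service.ts` and may use different primitives):

```typescript
const wrapForUpload = async (
  rawJson: string,
  crossModelVersion: number,
  isCompress: boolean,
): Promise<string> => {
  let bytes: Uint8Array = new TextEncoder().encode(rawJson);
  if (isCompress) {
    // gzip via the built-in CompressionStream API
    const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream('gzip'));
    bytes = new Uint8Array(await new Response(stream).arrayBuffer());
  }
  // (encryption step omitted: AES via SubtleCrypto with the user-provided key)
  let binary = '';
  bytes.forEach((b) => (binary += String.fromCharCode(b)));
  return `pf_${crossModelVersion}__${btoa(binary)}`;
};
```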
## Data Model Configurations
From `pfapi-config.ts`:
| Model | Main File | Description |
| ---------------- | --------- | ---------------------- |
| `task` | Yes | Tasks data |
| `timeTracking` | Yes | Time tracking records |
| `project` | Yes | Projects |
| `tag` | Yes | Tags |
| `simpleCounter` | Yes | Simple Counters |
| `note` | Yes | Notes |
| `taskRepeatCfg` | Yes | Recurring task configs |
| `reminders` | Yes | Reminders |
| `planner` | Yes | Planner data |
| `boards` | Yes | Kanban boards |
| `menuTree` | No | Menu structure |
| `globalConfig` | No | User settings |
| `issueProvider` | No | Issue tracker configs |
| `metric` | No | Metrics data |
| `improvement` | No | Metric improvements |
| `obstruction` | No | Metric obstructions |
| `pluginUserData` | No | Plugin user data |
| `pluginMetadata` | No | Plugin metadata |
| `archiveYoung` | No | Recent archive |
| `archiveOld` | No | Old archive |
**Main file models** are stored in the metadata file itself for faster sync of frequently-accessed data.
## Error Handling
Custom error types in `api/errors/errors.ts`:
- **API Errors**: `NoRevAPIError`, `RemoteFileNotFoundAPIError`, `AuthFailSPError`
- **Sync Errors**: `LockPresentError`, `LockFromLocalClientPresentError`, `UnknownSyncStateError`
- **Data Errors**: `DataValidationFailedError`, `ModelValidationError`, `DataRepairNotPossibleError`
## Event System
```typescript
type PfapiEvents =
| 'syncDone' // Sync completed
| 'syncStart' // Sync starting
| 'syncError' // Sync failed
| 'syncStatusChange' // Status changed
| 'metaModelChange' // Metadata updated
| 'providerChange' // Provider switched
| 'providerReady' // Provider authenticated
| 'providerPrivateCfgChange' // Provider credentials updated
| 'onBeforeUpdateLocal'; // About to download changes
```
## Security Considerations
1. **Encryption**: Optional AES encryption with user-provided key
2. **No tracking**: All data stays local unless explicitly synced
3. **Credential storage**: Provider credentials stored in IndexedDB with prefix `__sp_cred_`
4. **OAuth security**: Dropbox uses PKCE flow
## Key Design Decisions
1. **Vector clocks over timestamps**: More reliable conflict detection in distributed systems
2. **Main file models**: Frequently accessed data bundled with metadata for faster sync
3. **Database locking**: Prevents corruption during sync operations
4. **Adapter pattern**: Easy to add new storage backends
5. **Provider abstraction**: Consistent interface across Dropbox, WebDAV, local files
6. **Typia validation**: Runtime type safety without heavy dependencies
## Future Considerations
The system has been extended with **Operation Log Sync**, which synchronizes individual operations rather than replacing whole model files. See `operation-log-architecture.md` for details.

View file

@ -0,0 +1,205 @@
# Tiered Archive Model Proposal
**Date:** December 5, 2025
**Status:** Draft
---
## Overview
Introduce a tiered archive system that bounds the operation log to a configurable time window, making full op-log sync viable while preserving historical time tracking data.
---
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Active Tasks (~500) │ Op-log synced (real-time)
├─────────────────────────────────────────────────────────┤
│ Recent Archive (0-3 years) │ Op-log synced (full data)
├─────────────────────────────────────────────────────────┤
│ Old Archive (3+ years) │ Compressed to time stats
│ │ Device-local only
└─────────────────────────────────────────────────────────┘
```
### Tiers
| Tier | Age | Data | Sync Method |
| -------------- | --------- | ------------------ | ------------------ |
| Active | Current | Full task data | Op-log (real-time) |
| Recent Archive | 0-3 years | Full task data | Op-log (real-time) |
| Old Archive | 3+ years | Time tracking only | Device-local |
---
## Configuration
```typescript
interface ArchiveConfig {
// Years of full task data to keep synced
// Tasks older than this are converted to time tracking records
recentArchiveYears: number; // Default: 3
}
```
### Rationale for 3-Year Default
- Covers most practical use cases (searching recent work)
- Bounds synced task count to ~5,500 tasks (assuming 5 tasks/day)
- Keeps op-log manageable for initial sync
- Still preserves time tracking data indefinitely
---
## Data Model
### Recent Archive (Synced)
Full `TaskWithSubTasks` data, same as today.
### Old Archive (Compressed)
```typescript
interface TimeTrackingRecord {
date: string; // YYYY-MM-DD
projectId?: string;
tagIds: string[];
timeSpent: number; // milliseconds
}
interface OldArchiveModel {
// Aggregated time tracking data
timeTracking: TimeTrackingRecord[];
// Summary stats
totalTasksConverted: number;
oldestConvertedDate: string;
}
```
### Size Comparison
| Model | 10 Years of Data |
| ---------------------------------- | ----------------------- |
| Full tasks (current) | ~40MB (20K tasks × 2KB) |
| Tiered (3yr full + 7yr compressed) | ~12MB + ~250KB |
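These figures follow from the stated assumptions: ~5 tasks/day × 365 days × 10 years ≈ 18,250 tasks (~20K), so full data costs roughly 20K × 2KB ≈ 40MB. Keeping only 3 years of full tasks gives ~5,475 × 2KB ≈ 11MB, while the remaining 7 years collapse into small per-day `TimeTrackingRecord` entries, which stay on the order of ~250KB.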
---
## Implementation
### Conversion Trigger
Run during daily archive flush:
```typescript
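// Proposal sketch: methods on the archive service. `config` refers to the
// ArchiveConfig defined above; `subYears` would come from date-fns.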
async flushArchive(): Promise<void> {
// Existing flush logic...
// After flush, check for tasks to convert
await this.convertOldArchiveTasks();
}
async convertOldArchiveTasks(): Promise<void> {
const cutoffDate = subYears(new Date(), config.recentArchiveYears);
const tasksToConvert = await this.getTasksArchivedBefore(cutoffDate);
if (tasksToConvert.length === 0) return;
// Extract time tracking data
const timeRecords = tasksToConvert.flatMap(task =>
Object.entries(task.timeSpentOnDay).map(([date, ms]) => ({
date,
projectId: task.projectId,
tagIds: task.tagIds,
timeSpent: ms,
}))
);
// Append to old archive
await this.appendToOldArchive(timeRecords);
// Remove from recent archive
await this.removeFromRecentArchive(tasksToConvert.map(t => t.id));
}
```
### Op-Log Compaction
With bounded recent archive, compaction becomes straightforward:
1. Snapshot current state (active + recent archive)
2. Discard all ops older than snapshot
3. Old archive is excluded from op-log entirely
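In code, the discard rule becomes a one-liner (sketch with hypothetical op fields):

```typescript
// An op can be dropped once a snapshot covers it and it has been synced
const canDiscardOp = (
  op: { appliedAt: number; isSynced: boolean },
  snapshotTimestamp: number,
): boolean => op.isSynced && op.appliedAt <= snapshotTimestamp;
```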
---
## Migration Path
### Phase 1: Implement Tiered Model
- Add `OldArchiveModel` storage
- Implement conversion logic
- Add configuration option
### Phase 2: Enable by Default
- Set 3-year default
- Run initial conversion on existing archives
### Phase 3: Op-Log Optimization
- Exclude old archive from op-log
- Implement efficient compaction
---
## Trade-offs
### What Users Lose (for 3+ year old tasks)
- Task titles and details
- Notes and attachments
- Issue links
- Ability to restore individual tasks
### What Users Keep
- Time tracking per day/project/tag (for reports)
- Summary statistics
### Mitigation
- 3-year default is generous
- Configurable for users who need more
- Time tracking data (the main value) is preserved
---
## Open Questions
1. **Should old archive sync via PFAPI?**
- Pro: Data available on all devices
- Con: Adds complexity, defeats purpose of bounding sync
- Recommendation: Device-local only (users can export/import manually)
2. **Count-based alternative?**
- Instead of years, keep last N tasks (e.g., 5000)
- More predictable performance characteristics
- Could offer both options
3. **What about subtasks?**
- Convert parent and subtasks together
- Aggregate time tracking at parent level?
---
## Success Metrics
- Op-log initial sync < 10 seconds for typical users
- Archive operation payload < 100KB
- Memory usage stable regardless of total historical tasks

View file

@ -2,6 +2,6 @@
> **Note:** This document has been moved to the canonical location. Please see:
>
> **[/src/app/core/persistence/operation-log/docs/hybrid-manifest-architecture.md](/src/app/core/persistence/operation-log/docs/hybrid-manifest-architecture.md)**
> **[/docs/op-log/hybrid-manifest-architecture.md](/docs/op-log/hybrid-manifest-architecture.md)**
This redirect exists for historical reference. All updates should be made to the canonical document.

View file

@ -0,0 +1,156 @@
# Sync System Overview (PFAPI)
**Last Updated:** December 2025
This directory contains the **legacy PFAPI** synchronization implementation for Super Productivity. This system enables data sync across devices through file-based providers (Dropbox, WebDAV, Local File).
> **Note:** Super Productivity now has **two sync systems** running in parallel:
>
> 1. **PFAPI Sync** (this directory) - File-based sync via Dropbox/WebDAV
> 2. **Operation Log Sync** - Operation-based sync via SuperSync Server
>
> See [Operation Log Architecture](/docs/op-log/operation-log-architecture.md) for the newer operation-based system.
## Key Components
### Core Services
- **`sync.service.ts`** - Main orchestrator for sync operations
- **`meta-sync.service.ts`** - Handles sync metadata file operations
- **`model-sync.service.ts`** - Manages individual model synchronization
- **`conflict-handler.service.ts`** - User interface for conflict resolution
### Sync Providers
Located in `sync-providers/`:
- Dropbox
- WebDAV
- Local File System
### Sync Algorithm
The sync system combines several mechanisms for accurate conflict detection:
1. **Physical Timestamps** (`lastUpdate`) - For ordering events
2. **Vector Clocks** (`vectorClock`) - For accurate causality tracking and conflict detection
3. **Sync State** (`lastSyncedUpdate`, `lastSyncedVectorClock`) - To track last successful sync
## How Sync Works
### 1. Change Detection
When a user modifies data:
```typescript
// In meta-model-ctrl.ts
lastUpdate = Date.now();
vectorClock[clientId] = vectorClock[clientId] + 1;
```
### 2. Sync Status Determination
The system compares local and remote metadata to determine:
- **InSync**: No changes needed
- **UpdateLocal**: Download remote changes
- **UpdateRemote**: Upload local changes
- **Conflict**: Both have changes (requires user resolution)
### 3. Conflict Detection
Uses vector clocks for accurate detection:
```typescript
const comparison = compareVectorClocks(localVector, remoteVector);
if (comparison === VectorClockComparison.CONCURRENT) {
// True conflict - changes were made independently
}
```
### 4. Data Transfer
- **Upload**: Sends changed models and updated metadata
- **Download**: Retrieves and merges remote changes
- **Conflict Resolution**: User chooses which version to keep
## Key Files
### Metadata Structure
```typescript
interface LocalMeta {
lastUpdate: number; // Physical timestamp
lastSyncedUpdate: number; // Last synced timestamp
vectorClock?: VectorClock; // Causality tracking
lastSyncedVectorClock?: VectorClock; // Last synced vector clock
revMap: RevMap; // Model revision map
crossModelVersion: number; // Schema version
}
```
### Important Considerations
1. **Vector Clocks**: Each client maintains its own counter for accurate causality tracking
2. **Backwards Compatibility**: Supports migration from older versions
3. **Conflict Minimization**: Vector clocks avoid the false conflicts that pure timestamp comparison would report
4. **Atomic Operations**: Meta file serves as transaction coordinator
## Common Sync Scenarios
### Scenario 1: Simple Update
1. Device A makes changes
2. Device A uploads to cloud
3. Device B downloads changes
4. Both devices now in sync
### Scenario 2: Conflict Resolution
1. Device A and B both make changes
2. Device A syncs first
3. Device B detects conflict
4. User chooses which version to keep
5. Chosen version propagates to all devices
### Scenario 3: Multiple Devices
1. Devices A, B, C all synced
2. Device A makes changes while offline
3. Device B makes different changes
4. Device C acts as intermediary
5. Vector clocks ensure proper ordering
## Debugging Sync Issues
1. Enable verbose logging in `pfapi/api/util/log.ts`
2. Check vector clock states in sync status
3. Verify client IDs are stable
4. Ensure metadata files are properly updated
## Integration with Operation Log
When using file-based sync (Dropbox, WebDAV), the Operation Log system maintains compatibility through:
1. **Vector Clock Updates**: `VectorClockFacadeService` updates the PFAPI meta-model's vector clock when operations are persisted locally
2. **State Source**: PFAPI reads current state from NgRx store (not from operation log IndexedDB)
3. **Bridge Effect**: `updateModelVectorClock$` in `operation-log.effects.ts` ensures clocks stay in sync
This allows users to:
- Use file-based sync (Dropbox/WebDAV) while benefiting from Operation Log's local persistence
- Migrate between sync providers without data loss
## Future Direction
The PFAPI sync system is **stable but not receiving new features**. New sync features are being developed in the Operation Log system:
- ✅ Entity-level conflict resolution (Operation Log)
- ✅ Incremental sync (Operation Log)
- 📋 Planned: Deprecate file-based sync in favor of Operation Log with file fallback
## Related Documentation
- [Vector Clocks](./vector-clocks.md) - Conflict detection implementation
- [Operation Log Architecture](/docs/op-log/operation-log-architecture.md) - Newer operation-based sync
- [Hybrid Manifest Architecture](/docs/op-log/hybrid-manifest-architecture.md) - File-based Operation Log sync

View file

@ -2,6 +2,6 @@
> **Note:** This document has been moved to the canonical location. Please see:
>
> **[/src/app/core/persistence/operation-log/docs/pfapi-sync-persistence-architecture.md](/src/app/core/persistence/operation-log/docs/pfapi-sync-persistence-architecture.md)**
> **[/docs/op-log/pfapi-sync-persistence-architecture.md](/docs/op-log/pfapi-sync-persistence-architecture.md)**
This redirect exists for historical reference. All updates should be made to the canonical document.

View file

@ -8,8 +8,8 @@ Super Productivity uses vector clocks to provide accurate conflict detection and
> **Related Documentation:**
>
> - [Operation Log Architecture](/src/app/core/persistence/operation-log/docs/operation-log-architecture.md) - How vector clocks are used in the operation log
> - [Operation Log Diagrams](/src/app/core/persistence/operation-log/docs/operation-log-architecture-diagrams.md) - Visual diagrams including conflict detection
> - [Operation Log Architecture](/docs/op-log/operation-log-architecture.md) - How vector clocks are used in the operation log
> - [Operation Log Diagrams](/docs/op-log/operation-log-architecture-diagrams.md) - Visual diagrams including conflict detection
## Table of Contents
@ -302,4 +302,4 @@ The Operation Log system uses vector clocks in several ways:
3. **Conflict Detection**: `detectConflicts()` compares clocks between pending local ops and remote ops
4. **SYNC_IMPORT Handling**: Vector clock dominance filtering determines which ops to replay after full state imports
For detailed information, see [Operation Log Architecture - Part C: Server Sync](/src/app/core/persistence/operation-log/docs/operation-log-architecture.md#part-c-server-sync).
For detailed information, see [Operation Log Architecture - Part C: Server Sync](/docs/op-log/operation-log-architecture.md#part-c-server-sync).