From bbd50c223a09027e26921dff1994299fd5feef09 Mon Sep 17 00:00:00 2001 From: Johannes Millan Date: Mon, 8 Dec 2025 10:54:36 +0100 Subject: [PATCH] fix(sync): persist compaction counter when no cache exists The incrementCompactionCounter() function returned 1 without persisting when no state cache existed. This caused the compaction threshold to never be reached, leading to unbounded operation log growth. Also updated review docs: - Marked compaction counter bug as fixed - Identified memory leak in TaskElectronEffects as false positive (take(1) ensures automatic cleanup) - Identified TODAY_TAG "board-style" claim as false positive (code correctly REMOVES TODAY_TAG from task.tagIds) - Identified cache invalidation claim as false positive (sequence-based invalidation works correctly) - Identified repair mutex race condition as false positive (JavaScript single-threaded, no await between assignment) --- docs/ai/ngrx-implementation-analysis.md | 92 ++++ docs/ai/ngrx-implementation-review.md | 411 ++++++++++++++++++ docs/ai/operation-log-review-plan.md | 145 ++++++ .../operation-log-implementation-review.md | 308 +++++++++++++ .../store/operation-log-store.service.ts | 3 +- 5 files changed, 958 insertions(+), 1 deletion(-) create mode 100644 docs/ai/ngrx-implementation-analysis.md create mode 100644 docs/ai/ngrx-implementation-review.md create mode 100644 docs/ai/operation-log-review-plan.md create mode 100644 docs/ai/sync/operation-log-implementation-review.md diff --git a/docs/ai/ngrx-implementation-analysis.md b/docs/ai/ngrx-implementation-analysis.md new file mode 100644 index 000000000..6586a030b --- /dev/null +++ b/docs/ai/ngrx-implementation-analysis.md @@ -0,0 +1,92 @@ +# NgRx Implementation Analysis + +**Date:** December 8, 2025 +**Scope:** `src` directory, focusing on `FeatureStoresModule`, `Task` feature, and core state management patterns. +**Last Updated:** December 8, 2025 (added cross-references, validated findings) + +## 1. Architecture Overview + +The application employs a mature, feature-based NgRx architecture. It avoids a monolithic root state in favor of a modular approach where each domain feature manages its own slice of the state. + +- **Modular Organization:** State is strictly compartmentalized. The `src/app/root-store/feature-stores.module.ts` acts as a central registry, importing over 20 distinct feature modules (e.g., `tasks`, `projects`, `tags`, `config`). This is a scalable and clean pattern that keeps feature concerns separated. +- **Root Setup:** The root store is initialized in `src/main.ts` rather than `AppModule`. This unique setup likely caters to the specific bootstrapping requirements of the Electron/Web hybrid environment, allowing for early state initialization or meta-reducer configuration before the full Angular app launches. +- **Core State (Tasks):** The `task` feature (`src/app/features/tasks/store/`) is the most complex part of the application. It utilizes `@ngrx/entity` to manage a normalized state of tasks. This is crucial for handling the relational nature of tasks (parents, subtasks) and their efficient updates. +- **Interaction Pattern:** The application uses a hybrid "Service-Facade" pattern. Components like `TaskListComponent` inject the `Store` directly for dispatching actions but often rely on Services (e.g., `TaskService`) to expose selectors as Observables. This provides a pragmatic balance between the strict Command/Query segregation of NgRx and the developer ergonomics of Services. + +## 2. Potential Bugs & Risks + +### Type Safety Violations in Effects + +- **Location:** `src/app/features/tasks/store/task-internal.effects.ts` +- **Issue:** The `autoSetNextTask$` effect contains distinct type safety bypasses: + - `// @ts-ignore` is used to suppress errors. + - `const a = action as any;` is used to access properties on union types without proper type narrowing. +- **Risk:** This bypasses TypeScript's compile-time safety. If the structure of the actions handled by this effect (`updateTask`, `moveProjectTaskToBacklogList`, etc.) changes, this code could fail silently at runtime, potentially breaking the "auto-next-task" feature. + +### Complex Reducer Logic + +- **Location:** `src/app/features/tasks/store/task.reducer.ts` +- **Issue:** The `moveSubTask` reducer handles a complex multi-step state update. It manually updates the old parent's `subTaskIds`, the new parent's `subTaskIds`, and recalculates time estimates (`reCalcTimesForParentIfParent`) for both. +- **Risk:** Orchestrating relational data updates manually within a reducer is brittle. If one step in this sequence calculates incorrectly or fails, the state could become inconsistent (e.g., a task thinks it has a parent, but the parent doesn't list it as a subtask). + +### ~~Critical: Compaction Counter Bug~~ - FIXED + +- **Location:** `src/app/core/persistence/operation-log/store/operation-log-store.service.ts:434-439` +- **Original Issue:** When no state cache existed, `incrementCompactionCounter()` returned `1` without persisting the value. +- **Status:** Fixed on 2025-12-08. Counter now persists when no cache exists. + +## 3. Redundancies & Complexity + +### Logic in Effects + +- **Observation:** The `TaskInternalEffects` class contains a private method `_findNextTask`. This method encapsulates significant business logic for determining which task should be focused next. +- **Impact:** Coupling complex business logic with the Effects class makes it harder to test. Effects should ideally be "dumb" pipelines that connect Actions to Services. + +### Cross-Feature Coupling + +- **Observation:** The `Planner` feature and `Task` feature are tightly coupled via Actions. For instance, `PlannerActions.planTaskForDay` is caught and handled inside `task.reducer.ts`. +- **Impact:** While "Event-Driven" communication via Actions is a correct NgRx pattern, this "scattered" logic means that understanding how a Task entity changes requires auditing reducers in multiple different feature directories. + +## 4. Testing Analysis + +- **Strengths:** + - **Selector Coverage:** `src/app/features/tasks/store/task.selectors.spec.ts` is robust. It mocks the full state tree (including dependencies like `Project` and `Tag`) and verifies logic for complex derived data (e.g., `selectTasksDueForDay`). + - **Artifact coverage:** Most Store artifacts (Reducers, Effects, Selectors) have corresponding `*.spec.ts` files, indicating a strong testing culture. +- **Gaps:** + - The complex helper logic in `task.reducer.util.ts` (used by the reducer for calculations) should be verified to have its own independent unit tests, as the reducer relies heavily on its correctness for data integrity. + - Effects coverage is at 51% - see `docs/ai/ngrx-implementation-review.md` for the full list. + +## 5. Well-Designed Patterns + +### TODAY_TAG Virtual Pattern (Correctly Implemented) + +- **Location:** Meta-reducers in `src/app/root-store/meta/task-shared-meta-reducers/` +- **Pattern:** TODAY_TAG is correctly implemented as a "virtual tag" where: + - Membership is determined by `task.dueDay`, NOT by `task.tagIds` + - `TODAY_TAG.taskIds` only stores ordering + - Code actively removes any legacy TODAY_TAG from `task.tagIds` (cleanup) +- **Note:** This pattern was initially misidentified as a bug in early reviews. The code is correct. + +### Meta-Reducer Pattern for Atomic Operations + +- **Location:** `src/app/root-store/meta/task-shared-meta-reducers/` +- **Pattern:** Multi-entity changes are handled atomically in meta-reducers to ensure sync consistency. +- **See:** `docs/ai/today-tag-architecture.md` and CLAUDE.md point 8. + +## 6. Recommendations + +1. **Refactor for Type Safety:** + - Rewrite the `autoSetNextTask$` effect in `task-internal.effects.ts`. Use proper TypeScript Discriminated Unions or type guards to narrow the `action` type safely instead of casting to `any`. +2. **Extract Logic from Effects:** + - Move the `_findNextTask` logic into a stateless utility function or a domain service. This would allow it to be unit tested in isolation from the RxJS stream and the Store. +3. **Solidify Reducer Helpers:** + - Ensure that `src/app/features/tasks/store/task.reducer.util.ts` has comprehensive unit tests covering edge cases, specifically for `reCalcTimesForParentIfParent`, to prevent data corruption during subtask moves. +4. ~~**Fix Critical Bugs:**~~ - COMPLETED + - ✅ Compaction counter bug fixed on 2025-12-08. + +## 7. Related Documentation + +- `docs/ai/ngrx-implementation-review.md` - Detailed bug list and testing gaps +- `docs/ai/sync/operation-log-implementation-review.md` - Operation log specific issues +- `docs/ai/operation-log-review-plan.md` - Architecture review and implementation plan +- `docs/ai/today-tag-architecture.md` - TODAY_TAG virtual pattern explanation diff --git a/docs/ai/ngrx-implementation-review.md b/docs/ai/ngrx-implementation-review.md new file mode 100644 index 000000000..0cd490174 --- /dev/null +++ b/docs/ai/ngrx-implementation-review.md @@ -0,0 +1,411 @@ +# NgRx Implementation Review + +**Date:** December 8, 2025 +**Branch:** `feat/operation-logs` +**Reviewed by:** Claude (AI-assisted code review) +**Last Updated:** December 8, 2025 (corrected false positives) + +--- + +## Executive Summary + +A comprehensive review of the NgRx implementation identified **22 issues** across bugs, redundancies, architecture concerns, and testing gaps. + +| Category | Critical | High | Medium | Total | +| ------------ | ----------- | ---- | ------ | ----- | +| Bugs | 0 (1 fixed) | 4 | 8 | 12 | +| Testing Gaps | 3 | 4 | 2 | 9 | + +### Testing Coverage + +| Category | Total | Tested | Coverage | Status | +| ------------- | ----- | ------ | -------- | ---------------- | +| Reducers | 35 | 28 | 80% | Good | +| Effects | 39 | 20 | **51%** | **Critical Gap** | +| Selectors | 13 | 12 | 92% | Excellent | +| Meta-reducers | 10 | 8 | 80% | Good | + +--- + +## Corrected Analysis (False Positives Removed) + +### ~~TODAY_TAG Pattern Inconsistency~~ - FALSE POSITIVE + +**Status:** Not a bug - code is correct. + +**Original Claim:** Functions like `handleTransferTask` and `handlePlanTaskForDay` use "board-style" (adding TODAY_TAG to `task.tagIds`). + +**Actual Code Behavior:** The code explicitly **REMOVES** TODAY_TAG from `task.tagIds`: + +```typescript +// planner-shared.reducer.ts:102-114 +// Remove TODAY from task.tagIds if present +if (hasTaskTodayTag) { + state = { + ...state, + [TASK_FEATURE_NAME]: taskAdapter.updateOne( + { + id: task.id, + changes: { tagIds: currentTagIds.filter((id) => id !== TODAY_TAG.id) }, + }, + state[TASK_FEATURE_NAME], + ), + }; +} +``` + +The comments in the code confirm: "Ensure TODAY_TAG is NOT in task.tagIds (cleanup if present from legacy data)". + +**Conclusion:** The code correctly enforces the virtual tag pattern by removing any legacy TODAY_TAG references from `task.tagIds`. No fix needed. + +--- + +### ~~Repair Mutex Race Condition~~ - FALSE POSITIVE + +**Status:** Not a bug in JavaScript's single-threaded model. + +**Original Claim:** Between promise creation and mutex assignment, another call can pass the `if (!this._repairMutex)` check. + +**Why It's Not a Bug:** JavaScript is single-threaded. The code: + +```typescript +const repairPromise = (async () => { ... })(); // Returns immediately +this._repairMutex = repairPromise; // Assigned synchronously on next line +``` + +There is no `await` between these lines, so the event loop cannot yield. No other code can execute between promise creation and mutex assignment. + +**Optional Improvement:** Setting the mutex before creating the promise is a valid defensive coding practice for clarity, but the current implementation is not buggy. + +--- + +### ~~Cache Not Invalidated on Append~~ - FALSE POSITIVE + +**Status:** Not a bug - cache self-invalidates. + +**Original Claim:** `append()` doesn't invalidate `_appliedOpIdsCache`, causing stale data. + +**Actual Implementation:** + +```typescript +async getAppliedOpIds(): Promise> { + const currentLastSeq = await this.getLastSeq(); + // Cache valid only if lastSeq hasn't changed + if (this._appliedOpIdsCache && this._cacheLastSeq === currentLastSeq) { + return new Set(this._appliedOpIdsCache); + } + // Rebuild cache if seq changed + ... +} +``` + +When new ops are appended, `lastSeq` changes (auto-increment). The cache automatically invalidates because `currentLastSeq !== _cacheLastSeq`. + +**Conclusion:** No bug - the sequence-based invalidation works correctly. + +--- + +## Critical Bugs (Fix Immediately) + +### 1. ~~Compaction Counter Never Persists~~ - FIXED + +**Severity:** Critical +**File:** `src/app/core/persistence/operation-log/store/operation-log-store.service.ts` +**Lines:** 434-439 +**Status:** Fixed on 2025-12-08 + +**Original Issue:** When no state cache exists, `incrementCompactionCounter()` returned 1 without persisting the value. + +**Fix Applied:** + +```typescript +if (!cache) { + // No state cache yet - create one with counter starting at 1 + await store.put({ id: 'current', compactionCounter: 1 }); + await tx.done; + return 1; +} +``` + +--- + +### ~~Memory Leak in TaskElectronEffects~~ - FALSE POSITIVE + +**Status:** Not a bug - `take(1)` ensures automatic cleanup. + +**Original Claim:** Subscriptions accumulate without cleanup in the IPC handler. + +**Why It's Not a Bug:** Looking at the actual code: + +```typescript +this._store$.pipe( + select(selectCurrentTask), + withLatestFrom(...), + take(1), // <-- Completes after 1 emission, auto-unsubscribes +).subscribe(...) +``` + +The `take(1)` operator ensures the observable completes after receiving one value, which automatically triggers unsubscription. Each IPC event creates a short-lived subscription that self-destructs after emitting once. + +--- + +## High Priority Issues + +### 1. Snapshot Saved Before Validation + +**File:** `src/app/core/persistence/operation-log/store/operation-log-hydrator.service.ts` +**Lines:** 167-176 + +Snapshot saved at line 171, validation runs at line 176. If validation finds corruption and repairs, the snapshot is stale. + +**Fix:** Move validation (Checkpoint C) BEFORE saving snapshot. + +--- + +### 2. Missing Subtask Cascade in Tag Deletion + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts` +**Lines:** 85-116 + +Orphan detection only runs on parent tasks. Subtasks of surviving parents that lose all tags become orphaned but aren't cleaned up. + +**Fix:** After removing tags from tasks, also check subtasks of surviving parents. + +--- + +### 3. Race Conditions in TaskDueEffects + +**File:** `src/app/features/tasks/store/task-due.effects.ts` +**Lines:** 40-65 + +Issues: + +- Double sync wait (before AND after debounce) +- Multiple effects (`addTasksForTomorrow$`, `removeOverdueFormToday$`) compete on same signals +- Potential infinite loop if action triggers date change + +**Fix:** Consolidate sync waits, ensure effects don't trigger each other. + +--- + +### 4. Inconsistent Error Handling Across Effects + +| File | Has Error Handling | +| -------------------------- | --------------------- | +| `short-syntax.effects.ts` | Yes (`catchError()`) | +| `task-electron.effects.ts` | No | +| `idle.effects.ts` | No | +| `tag.effects.ts` | No (async operations) | + +**Fix:** Standardize error handling pattern across all effects. Use `catchError()` with appropriate fallbacks. + +--- + +## Medium Priority Issues + +### 5. Missing Bounds Checking + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/planner-shared.reducer.ts` +**Line:** 247 + +```typescript +const targetIndex = taskIds.indexOf(toTaskId); +taskIds.splice(targetIndex, 0, fromTask.id); // What if indexOf returns -1? +``` + +If `toTaskId` not found, `targetIndex = -1`, and `splice(-1, 0, id)` inserts at wrong position. + +--- + +### 6. TOCTOU Race in Compaction + +**File:** `src/app/core/persistence/operation-log/store/operation-log-compaction.service.ts` +**Lines:** 66-93 + +Time-of-check-time-of-use between `getLastSeq()` and `deleteOpsWhere()`. New ops could be written between these calls. + +--- + +### 7. State Capture ID Non-Deterministic + +**File:** `src/app/core/persistence/operation-log/processing/state-change-capture.service.ts` +**Lines:** 354-360 + +`JSON.stringify(action)` hash is non-deterministic for object property order. Could cause capture ID mismatches. + +--- + +### 8. LOCAL_ACTIONS Filter Edge Case + +**File:** `src/app/util/local-actions.token.ts` +**Line:** 37 + +```typescript +return actions$.pipe(filter((action: Action) => !(action as any).meta?.isRemote)); +``` + +Actions without `meta` field pass through (undefined !== true). Remote operations without explicit `isRemote: true` could trigger effects twice. + +--- + +### 9. Duplicated TODAY_TAG Logic + +**Locations:** + +- `planner-shared.reducer.ts:158-234` +- `short-syntax-shared.reducer.ts:241-309` + +Same logic for "moving to/from today" duplicated. Changes require updates in multiple places. + +**Fix:** Extract to shared helper function. + +--- + +### 10. getTag() Throws But Callers Don't Handle + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/task-shared-helpers.ts` +**Lines:** 51-80 + +Meta-reducers are synchronous and can't catch thrown errors. If `getTag()` throws, entire reducer chain fails. + +**Fix:** Use `getTagOrUndefined()` pattern consistently, or add guards before calling `getTag()`. + +--- + +### 11. Async Operations Without Error Handling + +**File:** `src/app/features/tag/store/tag.effects.ts` +**Lines:** 101-118 + +```typescript +tap(async (tagIdsToRemove: string[]) => { + await this._taskArchiveService.removeTagsFromAllTasks(tagIdsToRemove); + // No error handling, effect completes before async finishes +}); +``` + +**Fix:** Use proper async handling with `concatMap` and error boundaries. + +--- + +### 12. Orphan Detection Doesn't Verify Project Exists + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts` +**Lines:** 105-108 + +Checks `!task.projectId` but doesn't verify project still exists (could be deleted in same batch). + +--- + +## Testing Gaps + +### Critical: Effects Coverage at 51% + +**19 untested effect files:** + +**Priority 1 (Core):** + +- `src/app/imex/sync/sync.effects.ts` - Core sync engine +- `src/app/core-ui/layout/store/layout.effects.ts` - Navigation/panel management +- `src/app/features/tasks/store/task-related-model.effects.ts` - Task model sync +- `src/app/features/work-context/store/work-context.effects.ts` - Context switching + +**Priority 2 (Features):** + +- `src/app/features/project/store/project.effects.ts` +- `src/app/features/tag/store/tag.effects.ts` +- `src/app/features/tasks/store/short-syntax.effects.ts` +- `src/app/features/tasks/store/task-internal.effects.ts` +- `src/app/features/tasks/store/task-ui.effects.ts` +- Issue provider effects (caldav, gitlab, jira, open-project) + +--- + +### Critical: Disabled Test Files + +- `src/app/root-store/app-state/app-state.effects.spec.ts` - Completely commented out +- `src/app/features/note/store/note.effects.spec.ts` - Completely commented out +- `src/app/features/boards/store/boards.selectors.spec.ts` - Disabled + +--- + +### Missing Meta-Reducer Tests + +- `src/app/root-store/meta/task-shared-meta-reducers/issue-provider-shared.reducer.ts` +- `src/app/root-store/meta/task-shared-meta-reducers/task-repeat-cfg-shared.reducer.ts` + +--- + +### Missing Integration Tests + +- Race conditions (concurrent append + compaction) +- Error recovery (quota exceeded mid-write) +- Multi-tab coordination (lock deadlocks) +- 3-way conflict resolution +- Very large conflict sets (100+ ops on same entity) +- Network failure mid-pagination (2 tests marked `pending()`) + +--- + +## Architecture Observations + +### Well-Designed Aspects + +1. **Meta-reducer pattern** - Atomic multi-entity operations ensure sync consistency +2. **Persistent action pattern** - Explicit `isPersistent: true` enables operation logging +3. **LOCAL_ACTIONS token** - Filters remote operations from side-effect triggers +4. **Entity adapters** - Normalized state with proper NgRx patterns +5. **Self-healing selectors** - `computeOrderedTaskIdsForTag()` dynamically computes membership +6. **TODAY_TAG virtual pattern** - Correctly implemented with cleanup for legacy data + +### Concerns + +1. **Meta-reducer ordering** - 10 ordered reducers create tight coupling; order changes break logic +2. **Action metadata duplication** - Easy to forget `isPersistent` or use incorrect `opType` +3. **Three-way relationships** - `task.tagIds`, `tag.taskIds`, `planner.days` require careful synchronization + +--- + +## Prioritized Fix Recommendations + +### Phase 1: Critical (Immediate) - COMPLETED + +| # | Issue | Status | +| --- | ------------------ | -------------------------------------- | +| 1 | Compaction counter | **FIXED** (2025-12-08) | +| 2 | ~~Memory leak~~ | FALSE POSITIVE - `take(1)` auto-cleans | + +### Phase 2: High Priority (Current) + +| # | Issue | File | +| --- | ------------------------------ | ----------------------------------------- | +| 1 | Snapshot timing | operation-log-hydrator.service.ts:167-176 | +| 2 | Subtask orphan detection | tag-shared.reducer.ts | +| 3 | Race conditions | task-due.effects.ts:40-65 | +| 4 | Error handling standardization | Multiple effects | + +### Phase 3: Add Tests + +| # | Test | Priority | +| --- | -------------------------------------------- | -------- | +| 1 | Re-enable app-state.effects.spec.ts | Critical | +| 2 | sync.effects.spec.ts (new) | Critical | +| 3 | issue-provider-shared.reducer.spec.ts (new) | High | +| 4 | task-repeat-cfg-shared.reducer.spec.ts (new) | High | +| 5 | layout.effects.spec.ts (new) | High | + +### Phase 4: Medium Priority + +- Bounds checking in planner-shared.reducer.ts +- Extract duplicated TODAY_TAG logic +- Fix getTag() error handling pattern +- Add async error handling in tag.effects.ts + +--- + +## Related Documentation + +- `docs/ai/sync/operation-log-implementation-review.md` - Detailed operation log review +- `docs/ai/operation-log-review-plan.md` - Original review plan +- `src/app/core/persistence/operation-log/docs/operation-log-architecture.md` - Architecture docs +- `docs/ai/today-tag-architecture.md` - TODAY_TAG virtual pattern diff --git a/docs/ai/operation-log-review-plan.md b/docs/ai/operation-log-review-plan.md new file mode 100644 index 000000000..b228c1cd8 --- /dev/null +++ b/docs/ai/operation-log-review-plan.md @@ -0,0 +1,145 @@ +# Operation Log Architecture Review & Plan + +**Date:** Monday, December 8, 2025 +**Status:** Planning +**Module:** `src/app/core/persistence/operation-log/` +**Last Updated:** December 8, 2025 (added compaction counter bug, validated findings) + +## Executive Summary + +A comprehensive architectural analysis of the `operation-log` module has identified **one remaining critical issue**: a data loss risk in the file-based synchronization mechanism (WebDAV/Dropbox/etc.). The compaction counter bug has been fixed. This document outlines the findings and the proposed implementation plan to address remaining issues. + +## 1. Critical Findings + +### ✅ ~~Critical Bug: Compaction Counter Never Persists~~ - FIXED + +**Severity:** Critical +**Location:** `OperationLogStoreService.incrementCompactionCounter` (lines 434-439) +**Status:** Fixed on 2025-12-08 + +**Original Issue:** When no state cache existed, the function returned `1` without persisting that value. The compaction threshold was never reached. + +**Fix Applied:** + +```typescript +if (!cache) { + // No state cache yet - create one with counter starting at 1 + await store.put({ id: 'current', compactionCounter: 1 }); + await tx.done; + return 1; +} +``` + +--- + +### 🚨 Data Loss Risk in Manifest Sync (File-Based Providers) + +**Severity:** Critical +**Location:** `OperationLogManifestService.uploadRemoteManifest` & `OperationLogUploadService` + +**The Issue:** +When uploading the list of operation files (`manifest.json`), the service explicitly forces an overwrite (`revToMatch: null`, `force: true`). This creates a "Lost Update" race condition. + +**Scenario:** + +1. **Client A** reads manifest: `['file1']`. +2. **Client B** reads manifest: `['file1']`. +3. **Client A** uploads `file2`, writes manifest: `['file1', 'file2']`. +4. **Client B** uploads `file3`, writes manifest: `['file1', 'file3']`. + +**Outcome:** +`file2` (Client A's data) is orphaned. **Client C** will never see Client A's operations. If `file3` depends on `file2` (e.g., Update Task vs Create Task), Client C will crash or have corrupted state. + +**Recommendation:** +Implement Optimistic Locking. + +- `loadRemoteManifest` must return the file's `rev` (revision ID). +- `uploadRemoteManifest` must accept `revToMatch`. +- Implement a retry loop: if upload fails due to conflict (412), re-download, merge file lists (union), and retry. + +--- + +### ⚠️ Performance Bottleneck in Compaction + +**Severity:** High +**Location:** `OperationLogStoreService.deleteOpsWhere` + +**The Issue:** +Compaction deletes old operations based on the `appliedAt` timestamp. However, the IndexedDB store only has indexes for `byId` and `bySyncedAt`. + +**Impact:** +Compaction performs a full table scan of the `ops` store. As the log grows (the very problem compaction solves), this operation becomes slower ($O(N)$), potentially causing UI jank or locking timeouts during the critical compaction phase. + +**Recommendation:** +Update `DB_VERSION` and add an index for `appliedAt` in `OperationLogStoreService.init()`. + +--- + +### ⚠️ UX Deadlock in Conflict Resolution + +**Severity:** Medium +**Location:** `ConflictResolutionService` + +**The Issue:** +If the user cancels the Conflict Resolution dialog (e.g., clicks outside or presses Escape), the service returns without applying any changes or advancing the sync state. + +**Impact:** +The sync cycle aborts. The next sync attempt will likely hit the exact same conflict, prompting the user again. This creates a "stuck" state where the user is pestered indefinitely until they resolve it. + +**Recommendation:** +Treat "Cancel" as "Defer" (pause sync temporarily) or provide a "Skip/Ignore" option that allows other non-conflicting ops to proceed (with strict dependency checks). At minimum, log a clear warning that sync is stalled. + +## 2. Architectural Observations + +- **Redundant Logic:** `ConflictResolutionService` (Step 3) duplicates the logic of applying operations found in `OperationApplierService` (error handling, snackbars, etc.). This violates DRY and risks inconsistent behavior. + - _Plan:_ Refactor `ConflictResolutionService` to delegate fully to `OperationApplier`. +- **Locking Integrity:** `OperationLogEffects` (writer) and `OperationLogCompactionService` (cleaner) correctly use the `sp_op_log` lock, preventing race conditions during snapshot creation. +- **Fresh Client Safety:** The `isWhollyFreshClient` check in `OperationLogSyncService` correctly safeguards against accidental remote overwrites by empty clients. +- **Cache Self-Invalidation:** The `_appliedOpIdsCache` uses sequence-based invalidation which works correctly (not a bug as previously thought). + +## 3. Missing Test Coverage + +The current test suite lacks coverage for distributed system edge cases: + +1. **Manifest Concurrency:** No tests simulate two clients updating the manifest simultaneously to reproduce the lost update scenario. +2. **Compaction Race:** No tests verify that `append` waits for `compact` (locking verification). +3. **Partial Sync:** No tests for scenarios where a client downloads `file1` but fails to download `file2` (network error). +4. **Compaction Counter Persistence:** No test verifies counter is persisted when cache doesn't exist. + +## 4. Implementation Plan + +### Phase 1: Fix Critical Bugs (High Priority) + +1. **Fix Compaction Counter Bug:** + + - **File:** `src/app/core/persistence/operation-log/store/operation-log-store.service.ts` + - **Lines:** 428-447 + - **Change:** Persist counter=1 when no cache exists. + +2. **Modify `OperationLogManifestService`:** + + - Update `loadRemoteManifest` to return `{ manifest, rev }`. + - Update `uploadRemoteManifest` to accept `revToMatch`. + +3. **Update `OperationLogUploadService`:** + - Implement the read-modify-write loop with retries for `_uploadPendingOpsViaFiles`. + - Logic: `while (retries < max) { load(rev) -> merge -> upload(rev) }` + +### Phase 2: Optimize Store Indexes + +1. **Update `OperationLogStoreService`:** + - Increment `DB_VERSION`. + - Add `opStore.createIndex('byAppliedAt', 'appliedAt')` in the upgrade callback. + - Optimize `deleteOpsWhere` to use `IDBKeyRange.upperBound` on the new index. + +### Phase 3: Refactor Conflict Resolution + +1. **Refactor `ConflictResolutionService`:** + - Remove custom batch application logic. + - Use `OperationApplierService.applyOperations` directly. + +### Phase 4: Enhance Test Suite + +1. Create `manifest-concurrency.spec.ts` to reproduce the lost update scenario and verify the fix. +2. Create `compaction-counter.spec.ts` to verify counter persistence. +3. Add performance benchmarks for compaction with large datasets. diff --git a/docs/ai/sync/operation-log-implementation-review.md b/docs/ai/sync/operation-log-implementation-review.md new file mode 100644 index 000000000..13712b562 --- /dev/null +++ b/docs/ai/sync/operation-log-implementation-review.md @@ -0,0 +1,308 @@ +# Operation Log Implementation Review + +**Date:** 2025-12-08 +**Branch:** `feat/operation-logs` +**Last Updated:** December 8, 2025 (corrected false positives) + +This document summarizes findings from a comprehensive review of the operation log implementation, covering bugs, redundancies, architecture issues, and test coverage gaps. + +--- + +## Executive Summary + +The operation log implementation has a solid architectural foundation with event sourcing, vector clocks, and meta-reducers for atomic operations. However, several issues and test gaps were discovered that should be addressed before production use. + +| Category | Critical | High | Medium | Total | +| --------- | ----------- | ---- | ------ | ----- | +| Bugs | 0 (1 fixed) | 3 | 6 | 9 | +| Test Gaps | 3 | 4 | 3 | 10 | + +--- + +## Corrected Analysis (False Positives Removed) + +### ~~Repair Mutex Race Condition~~ - FALSE POSITIVE + +**Status:** Not a bug in JavaScript's single-threaded model. + +**Original Claim:** Between promise creation and mutex assignment, another call can pass the `if (!this._repairMutex)` check. + +**Why It's Not a Bug:** JavaScript is single-threaded. The code at lines 700-722: + +```typescript +const repairPromise = (async () => { ... })(); // Returns immediately +this._repairMutex = repairPromise; // Assigned synchronously on next line +``` + +There is no `await` between these lines, so the event loop cannot yield. No other code can execute between promise creation and mutex assignment. + +**Optional Improvement:** Setting the mutex before creating the promise is a valid defensive coding practice for clarity, but the current implementation is not buggy. + +--- + +### ~~Cache Not Invalidated on Append~~ - FALSE POSITIVE + +**Status:** Not a bug - cache self-invalidates. + +**Original Claim:** `append()` doesn't invalidate `_appliedOpIdsCache`, causing stale data. + +**Actual Implementation:** + +```typescript +async getAppliedOpIds(): Promise> { + const currentLastSeq = await this.getLastSeq(); + // Cache valid only if lastSeq hasn't changed + if (this._appliedOpIdsCache && this._cacheLastSeq === currentLastSeq) { + return new Set(this._appliedOpIdsCache); + } + // Rebuild cache if seq changed + ... +} +``` + +When new ops are appended, `lastSeq` changes (auto-increment). The cache automatically invalidates because `currentLastSeq !== _cacheLastSeq`. + +**Conclusion:** No bug - the sequence-based invalidation works correctly. + +--- + +## Critical Issues (Fix Immediately) + +### 1. ~~incrementCompactionCounter() Bug - Compaction Never Triggers~~ - FIXED + +**File:** `src/app/core/persistence/operation-log/store/operation-log-store.service.ts:434-439` +**Status:** Fixed on 2025-12-08 + +**Original Issue:** When no cache exists, returned 1 without persisting. Compaction threshold was never reached. + +**Fix Applied:** + +```typescript +if (!cache) { + // No state cache yet - create one with counter starting at 1 + await store.put({ id: 'current', compactionCounter: 1 }); + await tx.done; + return 1; +} +``` + +--- + +## High Priority Issues + +### 2. Missing Subtask Cascade in Tag Deletion + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts:85-116` + +When tags deleted, orphan detection only runs on parent tasks. Subtasks of surviving parents that lose all tags become orphaned but aren't cleaned up. + +### 3. Snapshot Saved Before Validation + +**File:** `src/app/core/persistence/operation-log/store/operation-log-hydrator.service.ts:167-176` + +Snapshot saved at line 171, validation runs at line 176. If validation finds corruption and repairs, the snapshot is now stale/wrong. + +**Fix:** Move validation (Checkpoint C) BEFORE saving snapshot. + +### 4. No Error Handling in applyShortSyntax Sub-Functions + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/short-syntax-shared.reducer.ts:53-126` + +Multiple state mutations (`moveTaskToProject`, `handleScheduleWithTime`, `handlePlanForDay`) with no try-catch. If any throws mid-operation, state becomes partially updated. + +### 5. Missing Recovery When clientId Load Fails + +**File:** `src/app/core/persistence/operation-log/operation-log.effects.ts:70-76` + +If `loadClientId()` fails or returns empty, operation is lost. No retry mechanism. + +--- + +## Medium Priority Issues + +### 6. TOCTOU Race in Compaction + +**File:** `src/app/core/persistence/operation-log/store/operation-log-compaction.service.ts:66-93` + +Time-of-check-time-of-use between `getLastSeq()` and `deleteOpsWhere()`. New ops written between these calls could theoretically cause issues. + +### 7. Orphan Detection Doesn't Verify Project Exists + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts:105-108` + +```typescript +if (newTagIds.length === 0 && !task.projectId && !task.parentId) { +``` + +Checks `!task.projectId` but doesn't verify project still exists (could be deleted in same batch). + +### 8. State Capture ID Non-Deterministic + +**File:** `src/app/core/persistence/operation-log/processing/state-change-capture.service.ts:354-360` + +`JSON.stringify(action)` hash is non-deterministic for object property order. Could cause capture ID mismatch. + +### 9. Stale Capture Cleanup Only on New Capture + +**File:** `src/app/core/persistence/operation-log/processing/state-change-capture.service.ts:231` + +`cleanupStaleCaptures()` only runs when new capture created. If operations are infrequent, stale captures accumulate. + +### 10. TimeTracking Archive Not Cleaned Atomically + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts:188-210` + +Current time tracking cleaned in meta-reducer, archives handled async in effect. Can cause sync inconsistency. + +### 11. LOCAL_ACTIONS Filter Edge Case + +**File:** `src/app/util/local-actions.token.ts:37` + +```typescript +return actions$.pipe(filter((action: Action) => !(action as any).meta?.isRemote)); +``` + +If action has no `meta` field, it passes (undefined !== true). Remote operations without explicit `isRemote: true` could trigger effects twice. + +--- + +## Test Coverage Gaps + +### Critical Missing Tests + +| Gap | Scenario | Impact | +| -------------------------- | ------------------------------ | ---------------- | +| **Race conditions** | Concurrent append + compaction | Data loss | +| **Error recovery** | Quota exceeded mid-write | State corruption | +| **Multi-tab coordination** | Tab A append + Tab B compact | Lock deadlock | + +### Missing Scenarios + +1. **Concurrent operations:** No test for append during compaction +2. **Quota exceeded:** Emergency compaction path never tested +3. **Schema migration:** Version mismatch during hydration +4. **3-way conflicts:** Only 2-way conflict resolution tested +5. **Conflict on deleted entity:** Update for entity local deleted +6. **Very large conflict sets:** 100+ ops on same entity +7. **Download retry:** Network failure mid-pagination (2 tests skipped with `pending()`) + +### Test Infrastructure Issues + +- `sync-scenarios.integration.spec.ts:18-32`: All SimulatedClients share same IndexedDB (not true isolation) +- `operation-log-download.service.spec.ts:360-373`: 2 tests marked `pending()` due to timeout +- Benchmark tests disabled (`xdescribe`) + +--- + +## Architecture Observations + +### Well-Designed Aspects + +1. **Event sourcing model** with vector clocks for conflict detection +2. **Meta-reducer pattern** for atomic multi-entity operations +3. **Snapshot + tail replay** for fast hydration +4. **LOCAL_ACTIONS token** for filtering remote operations from effects +5. **Sequence-based cache invalidation** for `_appliedOpIdsCache` + +### Potential Redundancies + +1. **Dual compaction triggers:** Both counter-based and time-based cleanup +2. **Multiple validation checkpoints:** A, B, C, D with similar logic +3. **Lock service fallback:** Both Web Locks API and localStorage locking maintained + +### Known Limitations (Documented) + +- Cross-version sync (A.7.11) not implemented +- Archive data bypasses operation log intentionally +- Schema version currently fixed at v1 + +--- + +## Implementation Plan + +### Phase 1: Fix Critical Bug + +#### 1.1 Fix incrementCompactionCounter() Bug + +**File:** `src/app/core/persistence/operation-log/store/operation-log-store.service.ts` +**Lines:** 428-447 + +### Phase 2: Fix High Priority Issues + +#### 2.1 Fix Snapshot Timing in Hydrator + +**File:** `src/app/core/persistence/operation-log/store/operation-log-hydrator.service.ts` +**Lines:** 167-176 + +**Fix:** Move validation (Checkpoint C) BEFORE saving snapshot. + +#### 2.2 Add Subtask Orphan Re-detection + +**File:** `src/app/root-store/meta/task-shared-meta-reducers/tag-shared.reducer.ts` +**Lines:** After 116 + +**Fix:** After removing tags from tasks, run orphan detection on subtasks of surviving parents. + +### Phase 3: Add Missing Tests + +#### 3.1 Race Condition Tests + +**File:** `src/app/core/persistence/operation-log/integration/race-conditions.integration.spec.ts` (new) + +Tests to add: + +- Concurrent append + compaction +- Concurrent upload + download +- Multi-tab lock acquisition +- Rapid local ops during sync + +#### 3.2 Error Recovery Tests + +**File:** `src/app/core/persistence/operation-log/integration/error-recovery.integration.spec.ts` (new) + +Tests to add: + +- Quota exceeded mid-write +- Network failure during download pagination +- Corrupted operation in log +- Schema mismatch during hydration + +#### 3.3 Fix Pending Download Tests + +**File:** `src/app/core/persistence/operation-log/sync/operation-log-download.service.spec.ts` +**Lines:** 360-373 + +**Fix:** Either mock timers properly or increase test timeout. + +### Phase 4: Verify & Run Tests + +1. Run `npm run checkFile` on all modified files +2. Run existing tests: `npm test` +3. Run new integration tests +4. Verify no regressions + +--- + +## Files Reference + +| Category | Files | +| ----------------- | --------------------------------------------------------------------------------------------------------------------- | +| **Core effects** | `operation-log.effects.ts` | +| **Store** | `operation-log-store.service.ts`, `operation-log-hydrator.service.ts`, `operation-log-compaction.service.ts` | +| **State capture** | `state-change-capture.service.ts`, `state-capture.meta-reducer.ts` | +| **Sync** | `operation-log-sync.service.ts`, `conflict-resolution.service.ts`, `vector-clock.service.ts` | +| **Meta-reducers** | `planner-shared.reducer.ts`, `short-syntax-shared.reducer.ts`, `tag-shared.reducer.ts`, `task-shared-crud.reducer.ts` | +| **Filter** | `local-actions.token.ts` | + +--- + +## Files to Modify + +| File | Changes | +| ------------------------------------------- | ---------------------------- | +| `operation-log-store.service.ts` | Fix compaction counter | +| `operation-log-hydrator.service.ts` | Fix snapshot timing | +| `tag-shared.reducer.ts` | Add subtask orphan detection | +| `operation-log-download.service.spec.ts` | Fix pending tests | +| (new) `race-conditions.integration.spec.ts` | Add race condition tests | +| (new) `error-recovery.integration.spec.ts` | Add error recovery tests | diff --git a/src/app/core/persistence/operation-log/store/operation-log-store.service.ts b/src/app/core/persistence/operation-log/store/operation-log-store.service.ts index ebd3fa1cc..05327082d 100644 --- a/src/app/core/persistence/operation-log/store/operation-log-store.service.ts +++ b/src/app/core/persistence/operation-log/store/operation-log-store.service.ts @@ -432,7 +432,8 @@ export class OperationLogStoreService { const cache = await store.get('current'); if (!cache) { - // No state cache yet - counter starts at 1 + // No state cache yet - create one with counter starting at 1 + await store.put({ id: 'current', compactionCounter: 1 }); await tx.done; return 1; }