photoprism/internal/ai/vision/README.md
Michael Mayer 28eb11d468 TensorFlow: Trigger explicit GC to free C-allocated tensor memory #5394
Signed-off-by: Michael Mayer <michael@photoprism.app>
2025-12-23 12:06:26 +01:00

## PhotoPrism — Vision Package
**Last Updated:** December 23, 2025
### Overview
`internal/ai/vision` provides the shared model registry, request builders, and parsers that power PhotoPrism's caption, label, face, NSFW, and future generate workflows. It reads `vision.yml`, normalizes models, and dispatches calls to one of three engines:
- **TensorFlow (builtin)** — default Nasnet / NSFW / Facenet models, no remote service required. Long-running TensorFlow inference can accumulate C-allocated tensor memory until GC finalizers run, so PhotoPrism periodically triggers garbage collection to return that memory to the OS; tune with `PHOTOPRISM_TF_GC_EVERY` (default **200**, `0` disables). Lower values reduce peak RSS but increase GC overhead and can slow indexing, so keep the default unless memory pressure is severe.
- **Ollama** — local or proxied multimodal LLMs. See [`ollama/README.md`](ollama/README.md) for tuning and schema details. The engine defaults to `${OLLAMA_BASE_URL:-http://ollama:11434}/api/generate`, trimming any trailing slash on the base URL; set `OLLAMA_BASE_URL=https://ollama.com` to opt into cloud defaults.
- **OpenAI** — cloud Responses API. See [`openai/README.md`](openai/README.md) for prompts, schema variants, and header requirements.
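
The periodic collection described in the TensorFlow bullet can be pictured with a small counter. This is a minimal sketch, not PhotoPrism's actual implementation: the names `gcEveryFromEnv` and `gcTicker` are hypothetical, the use of `runtime.GC()` is an assumption, and the counter is not safe for concurrent use. Only the variable name, the default of 200, and the meaning of `0` come from the text above.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// gcEveryFromEnv parses a PHOTOPRISM_TF_GC_EVERY-style value.
// Empty, invalid, or negative input falls back to the documented
// default of 200; 0 disables the periodic collection.
func gcEveryFromEnv(value string) int {
	n, err := strconv.Atoi(value)
	if err != nil || n < 0 {
		return 200
	}
	return n
}

// gcTicker (hypothetical) counts inference runs and reports when a
// collection is due.
type gcTicker struct {
	every int
	runs  int
}

// tick records one inference run and calls runtime.GC() on every
// `every`-th run. It returns true when a collection was triggered.
func (t *gcTicker) tick() bool {
	if t.every <= 0 {
		return false
	}
	t.runs++
	if t.runs%t.every != 0 {
		return false
	}
	runtime.GC() // gives finalizers a chance to free C-allocated tensor memory
	return true
}

func main() {
	ticker := &gcTicker{every: gcEveryFromEnv(os.Getenv("PHOTOPRISM_TF_GC_EVERY"))}
	triggered := 0
	for i := 0; i < 1000; i++ {
		if ticker.tick() {
			triggered++
		}
	}
	fmt.Println("collections triggered:", triggered)
}
```

With the default of 200, a thousand inference runs trigger five collections; lowering the value trades indexing speed for a lower peak RSS.
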
### Configuration
#### Models
The `vision.yml` file is usually kept in the `storage/config` directory (override with `PHOTOPRISM_VISION_YAML`). It defines a list of models under `Models:`. Key fields are captured below. If a type is omitted entirely, PhotoPrism will auto-append the built-in defaults (labels, nsfw, face, caption) so you no longer need placeholder stanzas. The `Thresholds` block is optional; missing or out-of-range values fall back to defaults.
| Field | Default | Notes |
|-------------------------|----------------------------------------|------------------------------------------------------------------------------------|
| `Type` (required) | — | `labels`, `caption`, `face`, `nsfw`, `generate`. Drives routing & scheduling. |
| `Name` | derived from type/version | Display name; lower-cased by helpers. |
| `Model` | `""` | Raw identifier override; precedence: `Service.Model` → `Model` → `Name`. |
| `Version` | `latest` (non-OpenAI) | OpenAI payloads omit version. |
| `Engine` | inferred from service/alias | Aliases set formats, file scheme, resolution. Explicit `Service` values still win. |
| `Run` | `auto` | See Run modes table below. |
| `Default` | `false` | Keep one per type for TensorFlow fallbacks. |
| `Disabled` | `false` | Registered but inactive. |
| `Resolution` | 224 (TensorFlow) / 720 (Ollama/OpenAI) | Thumbnail edge in px; TensorFlow models default to 224 unless you override. |
| `System` / `Prompt` | engine defaults | Override prompts per model. |
| `Format` | `""` | Response hint (`json`, `text`, `markdown`). |
| `Schema` / `SchemaFile` | engine defaults / empty | Inline vs file JSON schema (labels). |
| `TensorFlow` | nil | Local TF model info (paths, tags). |
| `Options` | nil | Sampling/settings merged with engine defaults. |
| `Service` | nil | Remote endpoint config (see below). |
#### Run Modes
| Value | When it runs | Recommended use |
|-----------------|------------------------------------------------------------------|------------------------------------------------|
| `auto` | TensorFlow defaults during index; external via metadata/schedule | Leave as-is for most setups. |
| `manual` | Only when explicitly invoked (CLI/API) | Experiments and diagnostics. |
| `on-index` | During indexing + manual | Fast built-in models only. |
| `newly-indexed` | Metadata worker after indexing + manual | External/Ollama/OpenAI without slowing import. |
| `on-demand` | Manual, metadata worker, and scheduled jobs | Broad coverage without index path. |
| `on-schedule` | Scheduled jobs + manual | Nightly/cron-style runs. |
| `always` | Indexing, metadata, scheduled, manual | High-priority models; watch resource use. |
| `never` | Never executes | Keep definition without running it. |
> **Note:** For performance reasons, `on-index` is only supported for the built-in TensorFlow models.
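
The table can be read as a decision function. The following is a rough sketch under stated assumptions, not the scheduler's real code: `RunContext`, `shouldRun`, and the `builtin` flag are hypothetical names, and treating `auto` as also allowing manual runs is an assumption.

```go
package main

import "fmt"

// RunContext (hypothetical) identifies where a model run was requested from.
type RunContext int

const (
	Manual   RunContext = iota // explicit CLI/API invocation
	Index                      // during indexing
	Metadata                   // metadata worker after indexing
	Schedule                   // scheduled jobs
)

// shouldRun reports whether a model with the given Run value executes in ctx.
// builtin marks the default TensorFlow models.
func shouldRun(run string, ctx RunContext, builtin bool) bool {
	switch run {
	case "never":
		return false
	case "always":
		return true
	case "manual":
		return ctx == Manual
	case "on-index":
		// Indexing-time runs are only supported for built-in TensorFlow models.
		return ctx == Manual || (ctx == Index && builtin)
	case "newly-indexed":
		return ctx == Manual || ctx == Metadata
	case "on-demand":
		return ctx == Manual || ctx == Metadata || ctx == Schedule
	case "on-schedule":
		return ctx == Manual || ctx == Schedule
	default: // "auto" or unset
		if ctx == Manual {
			return true // assumption: manual invocation is always allowed
		}
		if builtin {
			return ctx == Index
		}
		return ctx == Metadata || ctx == Schedule
	}
}

func main() {
	modes := []string{"auto", "manual", "on-index", "newly-indexed",
		"on-demand", "on-schedule", "always", "never"}
	// Print the behavior of an external (non-builtin) model for each mode.
	for _, run := range modes {
		fmt.Printf("%-14s index=%-5v metadata=%-5v schedule=%-5v\n", run,
			shouldRun(run, Index, false),
			shouldRun(run, Metadata, false),
			shouldRun(run, Schedule, false))
	}
}
```
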
#### Model Options
The model `Options` adjust model parameters such as temperature, top-p, and schema constraints when using [Ollama](ollama/README.md) or [OpenAI](openai/README.md). Rows are ordered exactly as defined in `vision/model_options.go`.
| Option | Engines | Default | Description |
|--------------------|------------------|----------------------|-----------------------------------------------------------------------------------------|
| `Temperature` | Ollama, OpenAI | engine default | Controls randomness with a value between `0.01` and `2.0`; not used for OpenAI's GPT-5. |
| `TopK` | Ollama | engine default | Limits sampling to the top K tokens to reduce rare or noisy outputs. |
| `TopP` | Ollama, OpenAI | engine default | Nucleus sampling; keeps the smallest token set whose cumulative probability ≥ `p`. |
| `MinP` | Ollama | engine default | Drops tokens whose probability mass is below `p`, trimming the long tail. |
| `TypicalP` | Ollama | engine default | Keeps tokens with typicality under the threshold; combine with TopP/MinP for flow. |
| `TfsZ` | Ollama | engine default | Tail free sampling parameter; lower values reduce repetition. |
| `Seed` | Ollama | random per run | Fix for reproducible outputs; unset for more variety between runs. |
| `NumKeep` | Ollama | engine default | How many tokens to keep from the prompt before sampling starts. |
| `RepeatLastN` | Ollama | engine default | Number of recent tokens considered for repetition penalties. |
| `RepeatPenalty` | Ollama | engine default | Multiplier >1 discourages repeating the same tokens or phrases. |
| `PresencePenalty` | OpenAI | engine default | Increases the likelihood of introducing new tokens by penalizing existing ones. |
| `FrequencyPenalty` | OpenAI | engine default | Penalizes tokens in proportion to their frequency so far. |
| `PenalizeNewline` | Ollama | engine default | Whether to apply repetition penalties to newline tokens. |
| `Stop` | Ollama, OpenAI | engine default | Array of stop sequences (e.g., `["\\n\\n"]`). |
| `Mirostat` | Ollama | engine default | Enables Mirostat sampling (`0` disables, `1` selects Mirostat, `2` selects Mirostat 2.0). |
| `MirostatTau` | Ollama | engine default | Controls surprise target for Mirostat sampling. |
| `MirostatEta` | Ollama | engine default | Learning rate for Mirostat adaptation. |
| `NumPredict` | Ollama | engine default | Ollama-specific max output tokens; synonymous intent with `MaxOutputTokens`. |
| `MaxOutputTokens` | Ollama, OpenAI | engine default | Upper bound on generated tokens; adapters raise low values to defaults. |
| `ForceJson` | Ollama, OpenAI | engine default | Forces structured output when enabled. |
| `SchemaVersion` | Ollama, OpenAI | derived from schema | Override when coordinating schema migrations. |
| `CombineOutputs` | OpenAI | engine default | Controls whether multi-output models combine results automatically. |
| `Detail` | OpenAI | engine default | Controls OpenAI vision detail level (`low`, `high`, `auto`). |
| `NumCtx` | Ollama, OpenAI | engine default | Context window length (tokens). |
| `NumThread` | Ollama | runtime auto | Caps CPU threads for local engines. |
| `NumBatch` | Ollama | engine default | Batch size for prompt processing. |
| `NumGpu` | Ollama | engine default | Number of GPUs to distribute work across. |
| `MainGpu` | Ollama | engine default | Primary GPU index when multiple GPUs are present. |
| `LowVram` | Ollama | engine default | Enable VRAM-saving mode; may reduce performance. |
| `VocabOnly` | Ollama | engine default | Load vocabulary only for quick metadata inspection. |
| `UseMmap` | Ollama | engine default | Memory map model weights instead of fully loading them. |
| `UseMlock` | Ollama | engine default | Lock model weights in RAM to reduce paging. |
| `Numa` | Ollama | engine default | Enable NUMA-aware allocations when available. |
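
Options are merged with engine defaults, and explicitly configured values win. A minimal sketch of that merge, assuming pointer fields to tell "unset" apart from zero: the `Options` struct here is a hypothetical three-field subset, `mergeOptions` and `ptr` are illustrative helpers, and the cap of `2.0` is an assumption taken from the `Temperature` range in the table.

```go
package main

import "fmt"

// Options is a hypothetical subset of the model options.
// Nil means "not set", so engine defaults can fill the gap.
type Options struct {
	Temperature *float64
	TopP        *float64
	Seed        *int
}

// maxTemperature is an assumed cap, based on the documented 0.01 to 2.0 range.
const maxTemperature = 2.0

// ptr returns a pointer to v, which keeps literals readable.
func ptr[T any](v T) *T { return &v }

// mergeOptions fills unset fields in explicit from defaults and caps
// the resulting temperature at maxTemperature.
func mergeOptions(explicit, defaults Options) Options {
	out := explicit
	if out.Temperature == nil {
		out.Temperature = defaults.Temperature
	}
	if out.TopP == nil {
		out.TopP = defaults.TopP
	}
	if out.Seed == nil {
		out.Seed = defaults.Seed
	}
	if out.Temperature != nil && *out.Temperature > maxTemperature {
		capped := maxTemperature
		out.Temperature = &capped
	}
	return out
}

func main() {
	defaults := Options{Temperature: ptr(0.1), TopP: ptr(0.9)}
	explicit := Options{Temperature: ptr(3.5), Seed: ptr(42)}
	merged := mergeOptions(explicit, defaults)
	fmt.Println(*merged.Temperature, *merged.TopP, *merged.Seed) // prints: 2 0.9 42
}
```
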
#### Model Service
Configures the endpoint URL, method, format, and authentication for [Ollama](ollama/README.md), [OpenAI](openai/README.md), and other engines that perform remote HTTP requests:
| Field | Default | Notes |
|------------------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| `Uri` | required for remote | Endpoint base. Empty keeps model local (TensorFlow). Ollama alias fills `${OLLAMA_BASE_URL}/api/generate`, defaulting to `http://ollama:11434`. |
| `Method` | `POST` | Override verb if provider needs it. |
| `Key` | `""` | Bearer token; prefer env expansion (OpenAI: `OPENAI_API_KEY`, Ollama: `OLLAMA_API_KEY`). |
| `Username` / `Password` | `""` | Injected as basic auth when URI lacks userinfo. |
| `Model` | `""` | Endpoint-specific override; wins over model/name. |
| `Org` / `Project` | `""` | OpenAI headers (org/proj IDs) |
| `RequestFormat` / `ResponseFormat` | set by engine alias | Explicit values win over alias defaults. |
| `FileScheme` | set by engine alias (`data` or `base64`) | Controls image transport. |
| `Disabled` | `false` | Disable the endpoint without removing the model. |
> **Authentication:** All credentials and identifiers support `${ENV_VAR}` expansion. `Service.Key` sets `Authorization: Bearer <token>`; `Username`/`Password` injects HTTP basic authentication into the service URI when it is not already present. When `Service.Key` is empty, PhotoPrism defaults to `OPENAI_API_KEY` (OpenAI engine) or `OLLAMA_API_KEY` (Ollama engine), also honoring their `_FILE` counterparts.
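
The credential fallback in the note above could look roughly like this. It is a sketch under assumptions: `resolveKey` and `authHeader` are hypothetical helpers, the lookups are injected as functions so they can be replaced in tests, and checking the plain variable before its `_FILE` counterpart is an assumed order that the text does not specify.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveKey returns the bearer token for an engine. An explicit
// Service.Key (after ${ENV_VAR} expansion) wins; otherwise the
// engine-specific variable and then its _FILE counterpart are tried.
func resolveKey(serviceKey, engine string,
	getenv func(string) string,
	readFile func(string) ([]byte, error)) string {

	if key := strings.TrimSpace(os.Expand(serviceKey, getenv)); key != "" {
		return key
	}

	var name string
	switch strings.ToLower(engine) {
	case "openai":
		name = "OPENAI_API_KEY"
	case "ollama":
		name = "OLLAMA_API_KEY"
	default:
		return ""
	}

	if key := strings.TrimSpace(getenv(name)); key != "" {
		return key
	}
	if path := getenv(name + "_FILE"); path != "" {
		if data, err := readFile(path); err == nil {
			return strings.TrimSpace(string(data))
		}
	}
	return ""
}

// authHeader formats the Authorization header value, or "" without a key.
func authHeader(key string) string {
	if key == "" {
		return ""
	}
	return "Bearer " + key
}

func main() {
	key := resolveKey("${OPENAI_API_KEY}", "openai", os.Getenv, os.ReadFile)
	fmt.Println("authorization header set:", authHeader(key) != "")
}
```
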
### Field Behavior & Precedence
- Model identifier resolution order: `Service.Model` → `Model` → `Name`. `Model.GetModel()` returns `(id, name, version)` where Ollama receives `name:version` and other engines receive `name` plus a separate `Version`.
- Env expansion runs for all `Service` credentials and `Model` overrides; empty or disabled models return empty identifiers.
- Options merging: engine defaults fill missing fields; explicit values always win. Temperature is capped at `MaxTemperature`.
- Authentication: `Service.Key` sets `Authorization: Bearer <token>`; `Username`/`Password` inject HTTP basic auth into the service URI when not already present.
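
The identifier rules above can be sketched as a single function. This is an illustration only and not the real `Model.GetModel()`: `ModelConfig`, `firstNonEmpty`, and `resolveModel` are hypothetical names, and splitting a `name:version` identifier on the colon is an assumption about how the version is derived.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelConfig is a hypothetical, trimmed-down stand-in for a vision.yml entry.
type ModelConfig struct {
	Engine       string
	Name         string
	Model        string
	Version      string
	ServiceModel string // corresponds to Service.Model
	Disabled     bool
}

// firstNonEmpty returns the first value that is not blank.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v = strings.TrimSpace(v); v != "" {
			return v
		}
	}
	return ""
}

// resolveModel applies the precedence Service.Model, Model, Name and
// returns (id, name, version). Disabled or empty models yield empty values.
func resolveModel(m ModelConfig) (id, name, version string) {
	if m.Disabled {
		return "", "", ""
	}
	raw := firstNonEmpty(m.ServiceModel, m.Model, m.Name)
	if raw == "" {
		return "", "", ""
	}
	// Assumption: an inline "name:version" overrides the Version field.
	name, version, _ = strings.Cut(raw, ":")
	if version == "" {
		version = m.Version
	}
	if strings.EqualFold(m.Engine, "openai") {
		return name, name, "" // OpenAI payloads omit the version
	}
	if version == "" {
		version = "latest"
	}
	if strings.EqualFold(m.Engine, "ollama") {
		return name + ":" + version, name, version // Ollama receives name:version
	}
	return name, name, version
}

func main() {
	fmt.Println(resolveModel(ModelConfig{Engine: "ollama", Model: "gemma3"}))
	fmt.Println(resolveModel(ModelConfig{Engine: "openai", Model: "gpt-5-mini"}))
}
```
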
### Minimal Examples
#### TensorFlow (builtin defaults)
```yaml
Models:
  - Type: labels
    Default: true
    Run: auto
  - Type: nsfw
    Default: true
    Run: auto
  - Type: face
    Default: true
    Run: auto
```
#### Ollama Labels
```yaml
Models:
  - Type: labels
    Model: gemma3:latest
    Engine: ollama
    Run: newly-indexed
    Service:
      Uri: ${OLLAMA_BASE_URL}/api/generate
```
More Ollama guidance: [`internal/ai/vision/ollama/README.md`](ollama/README.md).
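
For reference, the `Uri` in the Ollama example resolves along these lines when `OLLAMA_BASE_URL` is unset or carries a trailing slash. This is a small sketch; `ollamaEndpoint` is a hypothetical helper name, while the default host and the `/api/generate` path come from the Overview.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ollamaEndpoint builds the generate endpoint from a base URL.
// An empty base falls back to http://ollama:11434, and trailing
// slashes are trimmed before the path is appended.
func ollamaEndpoint(base string) string {
	base = strings.TrimSpace(base)
	if base == "" {
		base = "http://ollama:11434"
	}
	return strings.TrimRight(base, "/") + "/api/generate"
}

func main() {
	fmt.Println(ollamaEndpoint(os.Getenv("OLLAMA_BASE_URL")))
	fmt.Println(ollamaEndpoint("https://ollama.com/")) // prints: https://ollama.com/api/generate
}
```
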
#### OpenAI Captions
```yaml
Models:
  - Type: caption
    Model: gpt-5-mini
    Engine: openai
    Run: newly-indexed
    Service:
      Uri: https://api.openai.com/v1/responses
      Org: ${OPENAI_ORG}
      Project: ${OPENAI_PROJECT}
      Key: ${OPENAI_API_KEY}
```
More OpenAI guidance: [`internal/ai/vision/openai/README.md`](openai/README.md).
#### Custom TensorFlow Labels (SavedModel)
```yaml
Models:
  - Type: labels
    Name: transformer
    Engine: tensorflow
    Path: transformer # resolved under assets/models
    Resolution: 224 # keep standard TF input size unless your model differs
    TensorFlow:
      Output:
        Logits: true # set true for most TF2 SavedModel classifiers
```
### Custom TensorFlow Models: What's Supported
- Scope: Classification tasks only (`labels`). TensorFlow models cannot generate captions today; use Ollama or OpenAI for captions.
- Location & paths: If `Path` is empty, the model is loaded from `assets/models/<name>` (lowercased, underscores). If `Path` is set, it is still searched under `assets/models`; absolute paths are not supported.
- Expected files: `saved_model.pb`, a `variables/` directory, and a `labels.txt` alongside the model; use TF2 SavedModel classifiers.
- Resolution: Stays at 224px unless your model requires a different input size; adjust `Resolution` and the `TensorFlow.Input` block if needed.
- Sources: Labels produced by TensorFlow models are recorded with source `image`; overriding the source isn't supported yet.
- Config file: `vision.yml` is the conventional name; in the latest version, `.yaml` is also supported by the loader.
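
Before adding a custom model to `vision.yml`, it can help to check its directory layout. The sketch below follows the rules listed above, with some assumptions: `modelDir` and `missingFiles` are hypothetical helpers, and turning spaces into underscores is a guess at what "lowercased, underscores" means for the derived directory name.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// requiredFiles lists what a TF2 SavedModel classifier directory should contain.
var requiredFiles = []string{"saved_model.pb", "variables", "labels.txt"}

// modelDir resolves the model directory under <assetsPath>/models.
// An empty path falls back to the lowercased name; absolute paths are rejected.
func modelDir(assetsPath, name, path string) (string, error) {
	if path == "" {
		// Assumption: spaces in the name become underscores.
		path = strings.ReplaceAll(strings.ToLower(strings.TrimSpace(name)), " ", "_")
	}
	if path == "" {
		return "", errors.New("model name or path required")
	}
	if filepath.IsAbs(path) {
		return "", errors.New("absolute paths are not supported")
	}
	return filepath.Join(assetsPath, "models", path), nil
}

// missingFiles returns the required entries that do not exist in dir.
func missingFiles(dir string) []string {
	var missing []string
	for _, name := range requiredFiles {
		if _, err := os.Stat(filepath.Join(dir, name)); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	dir, err := modelDir("assets", "Transformer", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("model directory:", dir)
	fmt.Println("missing entries:", missingFiles(dir))
}
```
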
### CLI Quick Reference
- List models: `photoprism vision ls` (shows resolved IDs, engines, options, run mode, disabled flag).
- Run a model: `photoprism vision run -m labels --count 5` (use `--force` to bypass `Run` rules).
- Validate config: `photoprism vision ls --json` to confirm env-expanded values without triggering calls.
### When to Choose Each Engine
- **TensorFlow**: fast, offline defaults for core features (labels, faces, NSFW). Zero external deps.
- **Ollama**: private, GPU/CPU-hosted multimodal LLMs; best for richer captions/labels without cloud traffic.
- **OpenAI**: highest quality reasoning and multimodal support; requires API key and network access.
### Model Unload on Idle
PhotoPrism currently keeps TensorFlow models resident for the lifetime of the process to avoid repeated load costs. A future “model unload on idle” mode would track last-use timestamps and close the TensorFlow session/graph after a configurable idle period, releasing the model's memory footprint back to the OS. The trade-off is higher latency and CPU overhead when a model is used again, plus extra I/O to reload weights. This may be attractive for low-frequency or memory-constrained deployments but would slow continuous indexing jobs, so it is not enabled today.
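
Since this mode does not exist yet, the following is purely illustrative: one way the last-use tracking could be structured. All names (`idleModel`, `use`, `unloadIfIdle`) are hypothetical, and loading and closing the TensorFlow session are represented by a boolean flag rather than real calls.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// idleModel (hypothetical) tracks when a model was last used so that it
// can be unloaded after a configurable idle period.
type idleModel struct {
	mu        sync.Mutex
	loaded    bool
	lastUsed  time.Time
	idleAfter time.Duration // zero or negative disables unloading
	loads     int           // counts (re)loads to make the trade-off visible
}

// use marks the model as used at now, loading it first if necessary.
func (m *idleModel) use(now time.Time) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.loaded {
		m.loaded = true // stand-in for loading the SavedModel from disk
		m.loads++
	}
	m.lastUsed = now
}

// unloadIfIdle unloads the model when it has been idle for at least
// idleAfter and reports whether it did so.
func (m *idleModel) unloadIfIdle(now time.Time) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.loaded || m.idleAfter <= 0 || now.Sub(m.lastUsed) < m.idleAfter {
		return false
	}
	m.loaded = false // stand-in for closing the TensorFlow session/graph
	return true
}

func main() {
	m := &idleModel{idleAfter: 10 * time.Minute}
	start := time.Now()
	m.use(start)
	fmt.Println(m.unloadIfIdle(start.Add(5 * time.Minute)))  // false: still within the idle period
	fmt.Println(m.unloadIfIdle(start.Add(15 * time.Minute))) // true: idle for more than 10 minutes
	m.use(start.Add(20 * time.Minute))                       // triggers a reload
	fmt.Println("loads:", m.loads)                           // loads: 2
}
```

The reload counter shows the cost described above: every use after an unload pays the load latency again.
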
### Related Docs
- Ollama specifics: [`internal/ai/vision/ollama/README.md`](ollama/README.md)
- OpenAI specifics: [`internal/ai/vision/openai/README.md`](openai/README.md)
- REST API reference: https://docs.photoprism.dev/
- Developer guide (Vision): https://docs.photoprism.app/developer-guide/api/