photoprism/internal/ai/vision/README.md

## PhotoPrism — Vision Package

**Last Updated:** December 23, 2025

### Overview

`internal/ai/vision` provides the shared model registry, request builders, and parsers that power PhotoPrism’s caption, label, face, NSFW, and future generate workflows. It reads `vision.yml`, normalizes models, and dispatches calls to one of three engines:

- **TensorFlow (built‑in)** — default Nasnet / NSFW / Facenet models, no remote service required. Long-running TensorFlow inference can accumulate C-allocated tensor memory until GC finalizers run, so PhotoPrism periodically triggers garbage collection to return that memory to the OS; tune with `PHOTOPRISM_TF_GC_EVERY` (default **200**, `0` disables). Lower values reduce peak RSS but increase GC overhead and can slow indexing, so keep the default unless memory pressure is severe.
- **Ollama** — local or proxied multimodal LLMs. See [`ollama/README.md`](ollama/README.md) for tuning and schema details. The engine defaults to `${OLLAMA_BASE_URL:-http://ollama:11434}/api/generate`, trimming any trailing slash on the base URL; set `OLLAMA_BASE_URL=https://ollama.com` to opt into cloud defaults.
- **OpenAI** — cloud Responses API. See [`openai/README.md`](openai/README.md) for prompts, schema variants, and header requirements.

### Configuration

#### Models

The `vision.yml` file is usually kept in the `storage/config` directory (override with `PHOTOPRISM_VISION_YAML`). It defines a list of models under `Models:`. Key fields are captured below. If a type is omitted entirely, PhotoPrism will auto-append the built-in defaults (labels, nsfw, face, caption) so you no longer need placeholder stanzas. The `Thresholds` block is optional; missing or out-of-range values fall back to defaults.

| Field                   | Default                                | Notes                                                                              |
|-------------------------|----------------------------------------|------------------------------------------------------------------------------------|
| `Type` (required)       | —                                      | `labels`, `caption`, `face`, `nsfw`, `generate`. Drives routing & scheduling.      |
| `Name`                  | derived from type/version              | Display name; lower-cased by helpers.                                              |
| `Model`                 | `""`                                   | Raw identifier override; precedence: `Service.Model` → `Model` → `Name`.           |
| `Version`               | `latest` (non-OpenAI)                  | OpenAI payloads omit version.                                                      |
| `Engine`                | inferred from service/alias            | Aliases set formats, file scheme, resolution. Explicit `Service` values still win. |
| `Run`                   | `auto`                                 | See Run modes table below.                                                         |
| `Default`               | `false`                                | Keep one per type for TensorFlow fallbacks.                                        |
| `Disabled`              | `false`                                | Registered but inactive.                                                           |
| `Resolution`            | 224 (TensorFlow) / 720 (Ollama/OpenAI) | Thumbnail edge in px; TensorFlow models default to 224 unless you override.        |
| `System` / `Prompt`     | engine defaults                        | Override prompts per model.                                                        |
| `Format`                | `""`                                   | Response hint (`json`, `text`, `markdown`).                                        |
| `Schema` / `SchemaFile` | engine defaults / empty                | Inline vs file JSON schema (labels).                                               |
| `TensorFlow`            | nil                                    | Local TF model info (paths, tags).                                                 |
| `Options`               | nil                                    | Sampling/settings merged with engine defaults.                                     |
| `Service`               | nil                                    | Remote endpoint config (see below).                                                |

#### Run Modes

| Value           | When it runs                                                     | Recommended use                                |
|-----------------|------------------------------------------------------------------|------------------------------------------------|
| `auto`          | TensorFlow defaults during index; external via metadata/schedule | Leave as-is for most setups.                   |
| `manual`        | Only when explicitly invoked (CLI/API)                           | Experiments and diagnostics.                   |
| `on-index`      | During indexing + manual                                         | Fast built-in models only.                     |
| `newly-indexed` | Metadata worker after indexing + manual                          | External/Ollama/OpenAI without slowing import. |
| `on-demand`     | Manual, metadata worker, and scheduled jobs                      | Broad coverage without index path.             |
| `on-schedule`   | Scheduled jobs + manual                                          | Nightly/cron-style runs.                       |
| `always`        | Indexing, metadata, scheduled, manual                            | High-priority models; watch resource use.      |
| `never`         | Never executes                                                   | Keep definition without running it.            |

> **Note:** For performance reasons, `on-index` is only supported for the built-in TensorFlow models.

#### Model Options

The model `Options` adjust model parameters such as temperature, top-p, and schema constraints when using [Ollama](ollama/README.md) or [OpenAI](openai/README.md). Rows are ordered exactly as defined in `vision/model_options.go`.

| Option             | Engines          | Default              | Description                                                                             |
|--------------------|------------------|----------------------|-----------------------------------------------------------------------------------------|
| `Temperature`      | Ollama, OpenAI   | engine default       | Controls randomness with a value between `0.01` and `2.0`; not used for OpenAI's GPT-5. |
| `TopK`             | Ollama           | engine default       | Limits sampling to the top K tokens to reduce rare or noisy outputs.                    |
| `TopP`             | Ollama, OpenAI   | engine default       | Nucleus sampling; keeps the smallest token set whose cumulative probability ≥ `p`.      |
| `MinP`             | Ollama           | engine default       | Drops tokens whose probability mass is below `p`, trimming the long tail.               |
| `TypicalP`         | Ollama           | engine default       | Keeps tokens with typicality under the threshold; combine with TopP/MinP for flow.      |
| `TfsZ`             | Ollama           | engine default       | Tail free sampling parameter; lower values reduce repetition.                           |
| `Seed`             | Ollama           | random per run       | Fix for reproducible outputs; unset for more variety between runs.                      |
| `NumKeep`          | Ollama           | engine default       | How many tokens to keep from the prompt before sampling starts.                         |
| `RepeatLastN`      | Ollama           | engine default       | Number of recent tokens considered for repetition penalties.                            |
| `RepeatPenalty`    | Ollama           | engine default       | Multiplier >1 discourages repeating the same tokens or phrases.                         |
| `PresencePenalty`  | OpenAI           | engine default       | Increases the likelihood of introducing new tokens by penalizing existing ones.         |
| `FrequencyPenalty` | OpenAI           | engine default       | Penalizes tokens in proportion to their frequency so far.                               |
| `PenalizeNewline`  | Ollama           | engine default       | Whether to apply repetition penalties to newline tokens.                                |
| `Stop`             | Ollama, OpenAI   | engine default       | Array of stop sequences (e.g., `["\\n\\n"]`).                                           |
| `Mirostat`         | Ollama           | engine default       | Enables Mirostat sampling (`0` off, `1/2` modes).                                       |
| `MirostatTau`      | Ollama           | engine default       | Controls surprise target for Mirostat sampling.                                         |
| `MirostatEta`      | Ollama           | engine default       | Learning rate for Mirostat adaptation.                                                  |
| `NumPredict`       | Ollama           | engine default       | Ollama-specific max output tokens; synonymous intent with `MaxOutputTokens`.            |
| `MaxOutputTokens`  | Ollama, OpenAI   | engine default       | Upper bound on generated tokens; adapters raise low values to defaults.                 |
| `ForceJson`        | Ollama, OpenAI   | engine default       | Forces structured output when enabled.                                                  |
| `SchemaVersion`    | Ollama, OpenAI   | derived from schema  | Override when coordinating schema migrations.                                           |
| `CombineOutputs`   | OpenAI           | engine default       | Controls whether multi-output models combine results automatically.                     |
| `Detail`           | OpenAI           | engine default       | Controls OpenAI vision detail level (`low`, `high`, `auto`).                            |
| `NumCtx`           | Ollama, OpenAI   | engine default       | Context window length (tokens).                                                         |
| `NumThread`        | Ollama           | runtime auto         | Caps CPU threads for local engines.                                                     |
| `NumBatch`         | Ollama           | engine default       | Batch size for prompt processing.                                                       |
| `NumGpu`           | Ollama           | engine default       | Number of GPUs to distribute work across.                                               |
| `MainGpu`          | Ollama           | engine default       | Primary GPU index when multiple GPUs are present.                                       |
| `LowVram`          | Ollama           | engine default       | Enable VRAM-saving mode; may reduce performance.                                        |
| `VocabOnly`        | Ollama           | engine default       | Load vocabulary only for quick metadata inspection.                                     |
| `UseMmap`          | Ollama           | engine default       | Memory map model weights instead of fully loading them.                                 |
| `UseMlock`         | Ollama           | engine default       | Lock model weights in RAM to reduce paging.                                             |
| `Numa`             | Ollama           | engine default       | Enable NUMA-aware allocations when available.                                           |

#### Model Service

Configures the endpoint URL, method, format, and authentication for [Ollama](ollama/README.md), [OpenAI](openai/README.md), and other engines that perform remote HTTP requests:

| Field                              | Default                                  | Notes                                                                                                                                           |
|------------------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| `Uri`                              | required for remote                      | Endpoint base. Empty keeps model local (TensorFlow). Ollama alias fills `${OLLAMA_BASE_URL}/api/generate`, defaulting to `http://ollama:11434`. |
| `Method`                           | `POST`                                   | Override verb if provider needs it.                                                                                                             |
| `Key`                              | `""`                                     | Bearer token; prefer env expansion (OpenAI: `OPENAI_API_KEY`, Ollama: `OLLAMA_API_KEY`).                                                        |
| `Username` / `Password`            | `""`                                     | Injected as basic auth when URI lacks userinfo.                                                                                                 |
| `Model`                            | `""`                                     | Endpoint-specific override; wins over model/name.                                                                                               |
| `Org` / `Project`                  | `""`                                     | OpenAI headers (org/proj IDs)                                                                                                                   |
| `RequestFormat` / `ResponseFormat` | set by engine alias                      | Explicit values win over alias defaults.                                                                                                        |
| `FileScheme`                       | set by engine alias (`data` or `base64`) | Controls image transport.                                                                                                                       |
| `Disabled`                         | `false`                                  | Disable the endpoint without removing the model.                                                                                                |

> **Authentication:** All credentials and identifiers support `${ENV_VAR}` expansion. `Service.Key` sets `Authorization: Bearer <token>`; `Username`/`Password` injects HTTP basic authentication into the service URI when it is not already present. When `Service.Key` is empty, PhotoPrism defaults to `OPENAI_API_KEY` (OpenAI engine) or `OLLAMA_API_KEY` (Ollama engine), also honoring their `_FILE` counterparts.

### Field Behavior & Precedence

- Model identifier resolution order: `Service.Model` → `Model` → `Name`. `Model.GetModel()` returns `(id, name, version)` where Ollama receives `name:version` and other engines receive `name` plus a separate `Version`.
- Env expansion runs for all `Service` credentials and `Model` overrides; empty or disabled models return empty identifiers.
- Options merging: engine defaults fill missing fields; explicit values always win. Temperature is capped at `MaxTemperature`.
- Authentication: `Service.Key` sets `Authorization: Bearer <token>`; `Username`/`Password` inject HTTP basic auth into the service URI when not already present.

### Minimal Examples

#### TensorFlow (built‑in defaults)

```yaml
Models:
  - Type: labels
    Default: true
    Run: auto

  - Type: nsfw
    Default: true
    Run: auto

  - Type: face
    Default: true
    Run: auto
```

#### Ollama Labels

```yaml
Models:
  - Type: labels
    Model: gemma3:latest
    Engine: ollama
    Run: newly-indexed
    Service:
      Uri: ${OLLAMA_BASE_URL}/api/generate
```

More Ollama guidance: [`internal/ai/vision/ollama/README.md`](ollama/README.md).

#### OpenAI Captions

```yaml
Models:
  - Type: caption
    Model: gpt-5-mini
    Engine: openai
    Run: newly-indexed
    Service:
      Uri: https://api.openai.com/v1/responses
      Org: ${OPENAI_ORG}
      Project: ${OPENAI_PROJECT}
      Key: ${OPENAI_API_KEY}
```

More OpenAI guidance: [`internal/ai/vision/openai/README.md`](openai/README.md).

#### Custom TensorFlow Labels (SavedModel)

```yaml
Models:
  - Type: labels
    Name: transformer
    Engine: tensorflow
    Path: transformer   # resolved under assets/models
    Resolution: 224     # keep standard TF input size unless your model differs
    TensorFlow:
      Output:
        Logits: true    # set true for most TF2 SavedModel classifiers
```

### Custom TensorFlow Models — What’s Supported

- Scope: Classification tasks only (`labels`). TensorFlow models cannot generate captions today; use Ollama or OpenAI for captions.
- Location & paths: If `Path` is empty, the model is loaded from `assets/models/<name>` (lowercased, underscores). If `Path` is set, it is still searched under `assets/models`; absolute paths are not supported.
- Expected files: `saved_model.pb`, a `variables/` directory, and a `labels.txt` alongside the model; use TF2 SavedModel classifiers.
- Resolution: Stays at 224px unless your model requires a different input size; adjust `Resolution` and the `TensorFlow.Input` block if needed.
- Sources: Labels produced by TensorFlow models are recorded with source `image`; overriding the source isn’t supported yet.
- Config file: `vision.yml` is the conventional name; in the latest version, `.yaml` is also supported by the loader.

### CLI Quick Reference

- List models: `photoprism vision ls` (shows resolved IDs, engines, options, run mode, disabled flag).
- Run a model: `photoprism vision run -m labels --count 5` (use `--force` to bypass `Run` rules).
- Validate config: `photoprism vision ls --json` to confirm env-expanded values without triggering calls.

### When to Choose Each Engine

- **TensorFlow**: fast, offline defaults for core features (labels, faces, NSFW). Zero external deps.
- **Ollama**: private, GPU/CPU-hosted multimodal LLMs; best for richer captions/labels without cloud traffic.
- **OpenAI**: highest quality reasoning and multimodal support; requires API key and network access.

### Model Unload on Idle

PhotoPrism currently keeps TensorFlow models resident for the lifetime of the process to avoid repeated load costs. A future “model unload on idle” mode would track last-use timestamps and close the TensorFlow session/graph after a configurable idle period, releasing the model’s memory footprint back to the OS. The trade-off is higher latency and CPU overhead when a model is used again, plus extra I/O to reload weights. This may be attractive for low-frequency or memory-constrained deployments but would slow continuous indexing jobs, so it is not enabled today.

### Related Docs

- Ollama specifics: [`internal/ai/vision/ollama/README.md`](ollama/README.md)
- OpenAI specifics: [`internal/ai/vision/openai/README.md`](openai/README.md)
- REST API reference: https://docs.photoprism.dev/
- Developer guide (Vision): https://docs.photoprism.app/developer-guide/api/