# Field Diagnostic Contract

Apply this contract to every parser or extraction change.

## Parser Shape

Every specific parser exposes:

```python
def extract_<name>_fields(text: str) -> dict[str, object]:
    fields, _ = extract_<name>_fields_with_diagnostics(text)
    return fields


def extract_<name>_fields_with_diagnostics(
    text: str,
) -> tuple[dict[str, object], dict[str, dict[str, object]]]:
    ...
```

Return all expected field keys even when values are `None`.
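As a minimal sketch of this shape, the example below fills in `<name>` with a hypothetical `invoice` parser. The field names, the regex, the `line:<n>` source label, and the `0.5` confidence are assumptions made for illustration, not part of the contract.

```python
import re

# Hypothetical field set; each real parser defines its own expected keys.
EXPECTED_FIELDS = ("invoice_number", "issue_date")

# Line-anchored pattern, e.g. "Invoice No: A-123" (illustrative only).
_INVOICE_NUMBER = re.compile(
    r"\s*invoice\s*(?:no|number)\b\s*[.:]?\s*(\S+)", re.IGNORECASE
)


def _missing() -> dict[str, object]:
    """Default diagnostic for a field with no evidence."""
    return {
        "value": None,
        "confidence": 0.0,
        "sources": [],
        "support_count": 0,
        "conflict_count": 0,
        "status": "missing",
    }


def extract_invoice_fields_with_diagnostics(
    text: str,
) -> tuple[dict[str, object], dict[str, dict[str, object]]]:
    # Start with every expected key present so callers never hit a KeyError.
    fields: dict[str, object] = {key: None for key in EXPECTED_FIELDS}
    diagnostics = {key: _missing() for key in EXPECTED_FIELDS}
    for index, line in enumerate(text.splitlines(), start=1):
        match = _INVOICE_NUMBER.match(line)
        if match:
            fields["invoice_number"] = match.group(1)
            diagnostics["invoice_number"] = {
                "value": match.group(1),
                "confidence": 0.5,  # placeholder score, not a calibrated value
                "sources": [f"line:{index}"],  # assumed source label format
                "support_count": 1,
                "conflict_count": 0,
                "status": "partial",
            }
            break
    return fields, diagnostics


def extract_invoice_fields(text: str) -> dict[str, object]:
    fields, _ = extract_invoice_fields_with_diagnostics(text)
    return fields
```

Even for empty input, `extract_invoice_fields("")` returns both keys with `None` values, which is the behavior the rule above asks for.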

## Diagnostic Shape

Every field diagnostic uses:

```python
{
    "value": value,
    "confidence": 0.0,
    "sources": [],
    "support_count": 0,
    "conflict_count": 0,
    "status": "missing",
}
```

Allowed `status` values:

- `confirmed`: the value is well supported by its sources.
- `partial`: a value is present but weakly supported.
- `conflicting`: sources disagree on the value.
- `missing`: no value was found for the field.
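One way to keep `status` consistent with the counts is to derive it in a single helper. The sketch below is an assumption, not the project's implementation: `build_field_diagnostic` is a hypothetical name, and the rule that two agreeing sources make a value `confirmed` is an illustrative threshold.

```python
def build_field_diagnostic(
    value: object,
    sources: list[str],
    confidence: float,
    conflict_count: int = 0,
    confirm_threshold: int = 2,  # assumed cutoff for "well supported"
) -> dict[str, object]:
    """Build a diagnostic dict and derive its status from the evidence."""
    support_count = len(sources)
    if value is None:
        # No value: reset the evidence so the dict matches the default shape.
        return {
            "value": None,
            "confidence": 0.0,
            "sources": [],
            "support_count": 0,
            "conflict_count": 0,
            "status": "missing",
        }
    if conflict_count > 0:
        status = "conflicting"
    elif support_count >= confirm_threshold:
        status = "confirmed"
    else:
        status = "partial"
    return {
        "value": value,
        "confidence": max(0.0, min(1.0, confidence)),  # clamp to [0, 1]
        "sources": list(sources),
        "support_count": support_count,
        "conflict_count": conflict_count,
        "status": status,
    }
```

Centralizing the decision means a parser cannot emit, for example, `confirmed` alongside a non-zero `conflict_count`.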

## Extraction Rules

- Normalize dates to ISO `YYYY-MM-DD`.
- Use helpers from `field_extraction.utils` for whitespace, dates, CF/P.IVA, and existing normalizations.
- Prefer line-aware and OCR-tolerant parsing over fragile global regex.
- Consolidate candidates when several sources can produce the same field.
- Record candidate evidence through diagnostics.
- Feed diagnostics into reliability and field warning rules.
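The normalization and consolidation rules above can be sketched as follows. `normalize_date` here is a stand-in written for this example and assumes day-first input; real code should use the helpers in `field_extraction.utils`, whose signatures are not shown in this document. `consolidate_candidates` and the source labels are likewise hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Optional

# Assumed input formats (day-first); the project helpers may accept others.
_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as ISO YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def consolidate_candidates(
    candidates: list[tuple[Optional[str], str]],
) -> tuple[Optional[str], list[str], int]:
    """Pick the most frequent value from (value, source) pairs.

    Returns (value, supporting_sources, conflict_count), where conflict_count
    is the number of usable candidates that disagree with the chosen value.
    """
    usable = [(value, source) for value, source in candidates if value is not None]
    if not usable:
        return None, [], 0
    counts = Counter(value for value, _ in usable)
    best_value, _ = counts.most_common(1)[0]
    sources = [source for value, source in usable if value == best_value]
    return best_value, sources, len(usable) - len(sources)


raw_candidates = [
    ("03/02/2024", "header"),
    ("2024-02-03", "footer"),
    ("04/02/2024", "ocr_line_7"),
]
normalized = [(normalize_date(value), source) for value, source in raw_candidates]
value, sources, conflict_count = consolidate_candidates(normalized)
```

Normalizing before counting lets `03/02/2024` and `2024-02-03` support the same value, while the third candidate is recorded as a conflict and can feed the field diagnostic.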

## Safety

- Do not log extracted personal or supplier data.
- Do not expose raw OCR text in user-facing output unless the existing API contract already does so.
- Do not omit a supported type from the router and let it fall back to `raw_preview`.
- Do not duplicate reliability warnings in business rules without first checking the existing deduplication.
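For the logging rule, one possible approach is to log only non-identifying metadata from each diagnostic. This is a sketch under assumptions: `summarize_diagnostics_for_log` is a hypothetical helper, and it drops `sources` as well as `value` on the cautious assumption that source entries might carry text snippets.

```python
def summarize_diagnostics_for_log(
    diagnostics: dict[str, dict[str, object]],
) -> dict[str, dict[str, object]]:
    """Keep status and counts per field; drop values and sources."""
    safe_keys = ("status", "confidence", "support_count", "conflict_count")
    return {
        field_name: {key: diagnostic[key] for key in safe_keys}
        for field_name, diagnostic in diagnostics.items()
    }


example = {
    "tax_code": {
        "value": "EXAMPLE-VALUE",  # placeholder, not real data
        "confidence": 0.8,
        "sources": ["line:3"],
        "support_count": 1,
        "conflict_count": 0,
        "status": "partial",
    }
}
log_summary = summarize_diagnostics_for_log(example)
```

The summary still shows which fields were missing or conflicting, which is usually enough for debugging, without writing extracted data to logs.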
