# New Document Type Checklist

Use this when adding or materially changing a supported `docvalidator` document type.

## Source Of Truth

Start from the target repo's own docs and code:

- `AGENTS.md`
- `_documentation/guida_sviluppo_nuovo_tipo_documento.md`
- existing code for `DURC` and `DOCUMENTO_IDENTITA`

Do not treat a new document type as classification-only. It must extend the full pipeline: classification, field extraction, reliability, rules, messaging, and API/CLI exposure.

## Runtime Pipeline

Expected flow:

1. `pdf_extraction.extract_text_with_metadata()` extracts text and OCR metadata.
2. `classification.predict_doc_type()` compares text with `REFERENCE_TEXTS`.
3. `field_extraction.extract_fields_with_diagnostics()` routes to a specific parser.
4. `reliability.assess_document_reliability()` scores quality and cross-cutting warnings.
5. `rules.apply_rules()` applies business rules and aggregates status.
6. `messaging.generate_message()` builds the Italian user-facing message.
7. The API and CLI expose type, fields, status, diagnostics, and metadata.
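
The flow above can be sketched as a chain of stage calls. This is only an illustration, not the project's orchestration code: the checklist does not show the real signatures of the functions listed above, so each stage is injected as a plain callable. The `Stages` container, `run_pipeline`, the argument lists, and the shape of the result dict are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stages:
    """Hypothetical container: one callable per pipeline step (signatures assumed)."""

    extract_text: Callable[[bytes], tuple[str, dict[str, Any]]]
    classify: Callable[[str], str]
    extract_fields: Callable[[str, str], tuple[dict[str, Any], dict[str, Any]]]
    assess_reliability: Callable[[str, dict[str, Any], dict[str, Any]], dict[str, Any]]
    apply_rules: Callable[[str, dict[str, Any], dict[str, Any]], dict[str, Any]]
    build_message: Callable[[dict[str, Any]], str]


def run_pipeline(pdf_bytes: bytes, stages: Stages) -> dict[str, Any]:
    """Run the stages in checklist order and collect what the API/CLI would expose."""
    text, metadata = stages.extract_text(pdf_bytes)               # step 1
    doc_type = stages.classify(text)                              # step 2
    fields, diagnostics = stages.extract_fields(doc_type, text)   # step 3
    reliability = stages.assess_reliability(doc_type, fields, metadata)  # step 4
    outcome = stages.apply_rules(doc_type, fields, reliability)   # step 5
    message = stages.build_message(outcome)                       # step 6
    return {                                                      # step 7
        "doc_type": doc_type,
        "fields": fields,
        "outcome": outcome,
        "message": message,
        "diagnostics": diagnostics,
        "metadata": metadata,
    }
```

The point of the sketch is that a new document type touches steps 2 through 7, not just step 2: a type that classifies correctly but has no parser or rules yields an incomplete result downstream.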

## Required Implementation Surface

- Add `DOC_TYPE_<NAME> = "CODE"` in `src/docvalidator/config.py`.
- Add 3-5 rich, document-specific reference strings to `REFERENCE_TEXTS`.
- Create `src/docvalidator/field_extraction/<name>.py`.
- Export both entry points from that module:

```python
def extract_<name>_fields(text: str) -> dict[str, object]:
    fields, _ = extract_<name>_fields_with_diagnostics(text)
    return fields


def extract_<name>_fields_with_diagnostics(
    text: str,
) -> tuple[dict[str, object], dict[str, dict[str, object]]]:
    ...
```

- Register the parser in `src/docvalidator/field_extraction/__init__.py`.
- Add rules in `src/docvalidator/rules.py`.
- Extend mandatory fields and cross-cutting signals in `src/docvalidator/reliability.py`.
- Expose the type in `src/docvalidator/api/catalog.py`.
- Update API schema/validate handling if enum or examples are static.
- Update CLI choices and info output.
- Update public docs and tests in the same change.
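
As a rough illustration of the parser module, here is a minimal sketch for an invented document type called `example`. The field names (`numero_pratica`, `data_scadenza`), the label regexes, and the diagnostics keys (`found`, `parsed`) are assumptions made up for this example. Only the two function signatures follow the template above.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical labels for an invented document type; real markers must come
# from actual documents of the new type.
_NUMBER_RE = re.compile(r"numero\s+pratica\s*:\s*([A-Z0-9/-]+)", re.IGNORECASE)
_EXPIRY_RE = re.compile(
    r"scadenza\s*:\s*(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", re.IGNORECASE
)


def _to_iso_date(day: str, month: str, year: str) -> Optional[str]:
    """Convert day/month/year strings to YYYY-MM-DD, or None if not a real date."""
    try:
        return date(int(year), int(month), int(day)).isoformat()
    except ValueError:
        return None


def extract_example_fields_with_diagnostics(
    text: str,
) -> tuple[dict[str, object], dict[str, dict[str, object]]]:
    # Every known field is always present, even when nothing was found.
    fields: dict[str, object] = {"numero_pratica": None, "data_scadenza": None}
    diagnostics: dict[str, dict[str, object]] = {}

    number = _NUMBER_RE.search(text)
    if number:
        fields["numero_pratica"] = number.group(1).upper()
    diagnostics["numero_pratica"] = {"found": number is not None}

    expiry = _EXPIRY_RE.search(text)
    iso = _to_iso_date(*expiry.groups()) if expiry else None
    fields["data_scadenza"] = iso
    diagnostics["data_scadenza"] = {
        "found": expiry is not None,
        "parsed": iso is not None,
    }
    return fields, diagnostics


def extract_example_fields(text: str) -> dict[str, object]:
    fields, _ = extract_example_fields_with_diagnostics(text)
    return fields
```

Returning `None` for a missing field instead of dropping the key, and normalizing dates before they leave the parser, keeps the rules and reliability stages free of per-document format handling.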

## Design Before Coding

Define:

- stable internal document code;
- classification markers from real headings, labels, issuer names, and domain terms;
- extracted field names and value formats;
- mandatory vs optional fields;
- KO conditions;
- WARNING conditions;
- expiry/date-order rules;
- context inputs needed from caller;
- clean and OCR-noisy fixtures;
- whether real PDFs can be used without sensitive data.

## Rule Semantics

Use stable issue objects:

```python
{
    "code": "UPPER_SNAKE_CASE",
    "severity": "WARNING",
    "field": "field_name",
    "message_machine": "Messaggio tecnico in italiano.",
}
```

Practical rule:

- `KO`: document is unacceptable, invalid, expired, or missing required fields.
- `WARNING`: readable but weak, conflicting, near expiry, or requiring manual check.
- `OK`: no blocking issue and no warning.

Reuse existing helpers such as expiry and date-order rules where possible.
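
The relationship between issue severities and the aggregated status can be sketched as follows. This is an assumption about how aggregation might work ("worst severity wins"), consistent with the definitions above, and not a copy of `rules.apply_rules()`. The helper names `make_issue` and `aggregate_status` are hypothetical, as is the assumption that issues carry either `WARNING` or `KO` as severity.

```python
import re
from typing import Optional

_CODE_RE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")
_RANK = {"OK": 0, "WARNING": 1, "KO": 2}


def make_issue(
    code: str, severity: str, field: Optional[str], message_machine: str
) -> dict[str, object]:
    """Build an issue in the stable shape, rejecting ad hoc codes and severities."""
    if not _CODE_RE.fullmatch(code):
        raise ValueError(f"issue code must be UPPER_SNAKE_CASE: {code!r}")
    if severity not in ("WARNING", "KO"):
        raise ValueError(f"unsupported issue severity: {severity!r}")
    return {
        "code": code,
        "severity": severity,
        "field": field,
        "message_machine": message_machine,
    }


def aggregate_status(issues: list[dict[str, object]]) -> str:
    """Return KO if any issue is KO, else WARNING if any is WARNING, else OK."""
    status = "OK"
    for issue in issues:
        severity = str(issue["severity"])
        if _RANK[severity] > _RANK[status]:
            status = severity
    return status
```

Building every issue through a single constructor is one way to keep shapes stable across document types. The existing codebase may already provide an equivalent helper, in which case it should be reused.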

## Documentation Updates

Update at least:

- `README.md`: supported document types table.
- `_documentation/architettura_sistema.md`: parser/extension points if affected.
- `_documentation/specifiche_regole_validazione.md`: extracted fields and document rules.
- `_documentation/guida_deployment_api.md`: only if API flow or examples change.
- `_documentation/INDICE.md`: if adding dedicated docs.

## Anti-Patterns

Do not:

- add only `REFERENCE_TEXTS` without parser and rules;
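- omit missing fields from returned field dicts (keep the key present even when no value was found);
- return dates in any format other than ISO 8601 (`YYYY-MM-DD`);
- emit issues whose shape differs from the stable issue object described under Rule Semantics;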
- log extracted personal or supplier data;
- duplicate warnings already emitted by reliability without deduplication;
- change global thresholds to make one document pass without calibration evidence;
- update API catalog without tests and README.
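
For the deduplication point in the list above, one possible approach is to merge the issues from reliability and from the document rules, keeping the first occurrence of each `(code, field)` pair. Both the helper name `deduplicate_issues` and the choice of `(code, field)` as the identity key are assumptions. The project may already define its own deduplication logic, which should take precedence.

```python
def deduplicate_issues(issues: list[dict[str, object]]) -> list[dict[str, object]]:
    """Drop repeated issues, keeping the first one seen for each (code, field) pair."""
    seen: set[tuple[object, object]] = set()
    unique: list[dict[str, object]] = []
    for issue in issues:
        key = (issue.get("code"), issue.get("field"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(issue)
    return unique
```

Order is preserved, so if reliability issues are listed before rule issues, the reliability version of a duplicated warning is the one that survives.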
