Validation Rules

This ruleset should mirror the current runner contract so that the PreFlight UI catches packaging problems before a job starts.

Severity classes

Use three outward-facing statuses:

Status | Meaning | User action
Ready | The package satisfies the current runner contract | Proceed to submission
Warning | The package is ingestible but likely incomplete for some downstream analyses | Review and decide whether to continue
Blocked | The package cannot be submitted safely because the current runner contract would fail or become ambiguous | Fix before submission
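
One way to derive a single package status from many rule results is to order the three statuses by severity and report the worst one. The sketch below assumes that ordering; the names `Status` and `overall_status` are hypothetical and not part of the runner contract.

```python
from enum import IntEnum


class Status(IntEnum):
    """Outward-facing statuses, ordered from least to most severe."""
    READY = 0
    WARNING = 1
    BLOCKED = 2


def overall_status(severities):
    """Return the most severe status seen, or READY when nothing was reported."""
    return max(severities, default=Status.READY)
```

With this ordering, a single Blocked finding outweighs any number of warnings, which matches the "Fix before submission" action above.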

Blocking rules

Treat these conditions as Blocked:

  • The <project_id> directory does not exist at the configured input root
  • The 01_sources/ directory is missing
  • sources.json is missing
  • sources.json cannot be parsed
  • The selection file referenced by sources.json cannot be found
  • portfolio_selected.csv contains neither a smiles nor a canonical_smiles column
  • A file is referenced in sources.json but does not exist under 01_sources/
  • The package is staged at the wrong directory level for config.projectId

These are not cosmetic issues. They correspond directly to known runtime failure conditions in the deployed runner.
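
The first four blocking rules depend only on the directory layout, so they can be sketched without knowing the sources.json schema. This is a minimal illustration, not the deployed runner's logic: the function name `check_layout` and the rule IDs are hypothetical, and the checks for referenced files are left out because the manifest structure is not specified here.

```python
import json
from pathlib import Path


def check_layout(input_root, project_id):
    """Return (rule_id, message) pairs for blocking layout problems.

    Checks stop at the first failure because each later check depends on
    the earlier path existing.
    """
    project_dir = Path(input_root) / project_id
    if not project_dir.is_dir():
        return [("project-dir-missing",
                 f"{project_id} directory does not exist at {input_root}")]

    sources_dir = project_dir / "01_sources"
    if not sources_dir.is_dir():
        return [("sources-dir-missing",
                 f"01_sources/ is missing from {project_id}")]

    manifest = sources_dir / "sources.json"
    if not manifest.is_file():
        return [("sources-json-missing",
                 f"sources.json is missing from {project_id}/01_sources")]

    try:
        json.loads(manifest.read_text(encoding="utf-8"))
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        return [("sources-json-unparseable",
                 f"sources.json cannot be parsed: {exc}")]

    return []  # no blocking layout problems found
```

An empty list means only that the layout checks passed; the remaining blocking rules still need to run against the parsed manifest.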

Warning rules

Treat these conditions as Warning rather than Blocked unless a selected workflow explicitly requires them:

  • assays.csv, compounds.csv, targets.csv, or structure files are absent
  • primary_candidate_id is missing for a multi-candidate submission
  • Recommended identifier columns such as candidate_id or compound_id are absent
  • Extra files are present but unreferenced by sources.json
  • Input provenance is incomplete: the run will not break, but the results become harder to interpret
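
The file-based warning rules, including the escalation to Blocked when a workflow requires an optional file, might be sketched as follows. The function name, rule IDs, and the `required_by_workflow` parameter are hypothetical. The caller is assumed to have already extracted the referenced file names from sources.json, and structure files are omitted because their names are not listed here.

```python
OPTIONAL_FILES = ("assays.csv", "compounds.csv", "targets.csv")


def check_optional_files(present, referenced, required_by_workflow=()):
    """Return (severity, rule_id, file_name) tuples.

    present: file names found under 01_sources/
    referenced: file names listed in sources.json (extracted by the caller)
    required_by_workflow: optional files the selected workflow requires
    """
    present, referenced = set(present), set(referenced)
    findings = []
    for name in OPTIONAL_FILES:
        if name not in present:
            # Escalate only when the selected workflow explicitly needs the file.
            severity = "Blocked" if name in required_by_workflow else "Warning"
            findings.append((severity, "optional-file-absent", name))
    # sources.json is the manifest itself, so it never counts as unreferenced.
    for name in sorted(present - referenced - {"sources.json"}):
        findings.append(("Warning", "unreferenced-file", name))
    return findings
```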

Readiness versus eligibility

Keep these two ideas separate:

  • Eligibility answers whether the package matches a supported input contract
  • Readiness answers whether the package is strong enough to produce useful downstream outputs

A package can be technically eligible but still deserve warnings because evidence depth, identifiers, or optional files are weak.

Rules that should be surfaced in the UI

Each rule should expose:

  • A short machine-stable rule ID
  • Human-readable problem text
  • Severity
  • The exact file or field that triggered the rule
  • A concrete fix recommendation
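
The five fields above map naturally onto a small record type. The following is a sketch only: the class name `RuleFinding`, the field names, and the rule ID format are assumptions, not an existing PreFlight API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuleFinding:
    rule_id: str    # short, machine-stable identifier (hypothetical naming scheme)
    message: str    # human-readable problem text
    severity: str   # "Ready", "Warning", or "Blocked"
    location: str   # exact file or field that triggered the rule
    fix: str        # concrete fix recommendation


finding = RuleFinding(
    rule_id="sources-json-missing",
    message="sources.json is missing from <project_id>/01_sources",
    severity="Blocked",
    location="<project_id>/01_sources/sources.json",
    fix="Add sources.json to the 01_sources/ directory and revalidate.",
)

payload = asdict(finding)  # plain dict, ready for JSON serialization
```

Keeping the record immutable means a finding cannot be altered between validation and display, and keeping `rule_id` stable lets the message text be reworded without breaking anything that keys on the ID.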

Prefer direct messages such as:

  • sources.json is missing from <project_id>/01_sources
  • portfolio_selected.csv must contain smiles or canonical_smiles
  • config.projectId does not match the staged input directory

Avoid vague messages that force the user to infer the real packaging error.
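
As an illustration of a direct message, a header check on the selection file can name both the file and the columns it expects. This sketch assumes exact, case-sensitive column names and a UTF-8 CSV with a header row; the function name `smiles_column_message` is hypothetical.

```python
import csv
from pathlib import Path

SMILES_COLUMNS = ("smiles", "canonical_smiles")


def smiles_column_message(csv_path):
    """Return a direct error message, or None when a SMILES column is present."""
    # utf-8-sig strips a leading byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])  # empty file yields an empty header
    columns = {name.strip() for name in header}
    if columns.isdisjoint(SMILES_COLUMNS):
        return f"{Path(csv_path).name} must contain smiles or canonical_smiles"
    return None
```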

For common resolution patterns, see Troubleshooting.