Prepare Inputs
This page defines the current input contract for the runner and the core analysis workflow.
Input root layout
Inputs are discovered from:
<input-root>/<project_id>/01_sources/
With the published chart defaults, that usually maps to:
/data/input/<project_id>/01_sources/
The required entry point is:
<input-root>/<project_id>/01_sources/sources.json
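A minimal Python sketch of that lookup. It assumes the runner simply joins the input root, the project ID, and the fixed 01_sources/ segment. The helper name sources_manifest_path is hypothetical and is not part of the runner's API.

```python
from pathlib import Path


def sources_manifest_path(input_root: str, project_id: str) -> Path:
    """Build <input-root>/<project_id>/01_sources/sources.json (hypothetical helper)."""
    return Path(input_root) / project_id / "01_sources" / "sources.json"


# With the published chart defaults, the input root is /data/input.
manifest = sources_manifest_path("/data/input", "my-project")
print(manifest)  # on POSIX: /data/input/my-project/01_sources/sources.json
```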
Minimum required files
At minimum, provide:
- sources.json
- portfolio_selected.csv
The portfolio_selected.csv file is resolved through sources.json, so both must be present and consistent.
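A minimal pre-submission check for the two required inputs, sketched in Python. It assumes that sources.json is a JSON object and that its portfolio_selected_csv key (one of the typical keys listed under "sources.json expectations") holds a path relative to 01_sources/. The function name check_required_files is hypothetical.

```python
import json
from pathlib import Path


def check_required_files(sources_dir: Path) -> list[str]:
    """Return problems with the required inputs (an empty list means both look OK).

    Assumes sources.json stores the portfolio file under the
    "portfolio_selected_csv" key, relative to 01_sources/.
    """
    manifest_path = sources_dir / "sources.json"
    if not manifest_path.is_file():
        return [f"missing {manifest_path}"]

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    portfolio_ref = manifest.get("portfolio_selected_csv")
    if not portfolio_ref:
        return ["sources.json has no portfolio_selected_csv entry"]
    # The manifest and the file on disk must agree.
    if not (sources_dir / portfolio_ref).is_file():
        return [f"sources.json points to missing file: {portfolio_ref}"]
    return []


print(check_required_files(Path("/data/input/my-project/01_sources")))
```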
Optional files
Add these when they are available for your project:
- assays.csv
- compounds.csv
- targets.csv
- structures/*.pdb
Missing optional files do not necessarily prevent a run, but they can reduce downstream module coverage and interpretation confidence.
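To see up front which optional inputs a package includes, a small inventory like the Python sketch below can help. It assumes the optional files use the default names listed above and sit directly under 01_sources/. A real runner may locate them through sources.json instead. The function name optional_inputs_report is hypothetical.

```python
from pathlib import Path

# Default optional file names from this page; adjust if your package renames them.
OPTIONAL_FILES = ("assays.csv", "compounds.csv", "targets.csv")


def optional_inputs_report(sources_dir: Path) -> dict[str, bool]:
    """Map each optional input to True if it is present under 01_sources/."""
    report = {name: (sources_dir / name).is_file() for name in OPTIONAL_FILES}
    structures = sources_dir / "structures"
    # structures/*.pdb counts as present if at least one .pdb file exists.
    report["structures/*.pdb"] = structures.is_dir() and any(structures.glob("*.pdb"))
    return report


report = optional_inputs_report(Path("/data/input/my-project/01_sources"))
print("missing optional inputs:", [name for name, present in report.items() if not present])
```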
sources.json expectations
sources.json should reference files located under the same 01_sources/ directory. Typical keys include:
- portfolio_selected_csv
- primary_candidate_id
- assays_csv
- compounds_csv
- targets_csv
- pdbs
The exact key names can vary slightly by pipeline generation path, but the operational rule is stable: the manifest must point to real files under 01_sources/ that the runner can resolve at runtime.
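That operational rule can be checked before submission. The Python sketch below assumes that file-reference values are strings or lists of strings relative to 01_sources/ and that the file-bearing keys are the typical ones above. primary_candidate_id is an identifier, not a file, so it is skipped. Both the FILE_KEYS set and the function name unresolved_manifest_entries are hypothetical.

```python
import json
from pathlib import Path

# Typical file-bearing keys; names can vary by pipeline generation path.
FILE_KEYS = ("portfolio_selected_csv", "assays_csv", "compounds_csv", "targets_csv", "pdbs")


def unresolved_manifest_entries(sources_dir: Path) -> list[str]:
    """List manifest references that are missing or point outside 01_sources/."""
    base = sources_dir.resolve()
    manifest = json.loads((base / "sources.json").read_text(encoding="utf-8"))
    bad = []
    for key in FILE_KEYS:
        value = manifest.get(key)
        if value is None:
            continue  # key not used by this package
        refs = value if isinstance(value, list) else [value]
        for ref in refs:
            target = (base / ref).resolve()
            # Each reference must stay inside 01_sources/ and name a real file.
            if base not in target.parents or not target.is_file():
                bad.append(f"{key}: {ref}")
    return bad
```

Resolving paths before comparing them also catches references such as ../assays.csv that would point outside the staged directory.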
portfolio_selected.csv expectations
The selected portfolio file must include one of the following columns:
- smiles
- canonical_smiles
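A quick way to confirm the structure column before staging is to read only the CSV header, as in the Python sketch below. It assumes an exact, case-sensitive match on the column name. The runner may be more lenient. The function name find_smiles_column is hypothetical.

```python
import csv
from pathlib import Path
from typing import Optional

ACCEPTED_SMILES_COLUMNS = ("smiles", "canonical_smiles")


def find_smiles_column(csv_path: Path) -> Optional[str]:
    """Return the first accepted SMILES column found in the header, or None."""
    # utf-8-sig also strips a byte-order mark that spreadsheet exports often add.
    with csv_path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = [name.strip() for name in header]
    for column in ACCEPTED_SMILES_COLUMNS:
        if column in present:
            return column
    return None
```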
Recommended additional columns:
- candidate_id
- compound_id
- name
- Any customer metadata columns you want carried forward into downstream context
If you are using multiple candidates, make sure the row identifiers in portfolio_selected.csv are unique and that primary_candidate_id in sources.json matches one of them.
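One way to check that the row identifiers and primary_candidate_id do not conflict is shown in the Python sketch below. It assumes that "not conflicting" means unique row IDs plus a primary ID that matches a row. It also assumes candidate_id is the row identifier column, so pass a different id_column if your package keys rows by compound_id or another column. The function name check_candidate_ids is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def check_candidate_ids(csv_path: Path, primary_candidate_id: str,
                        id_column: str = "candidate_id") -> list[str]:
    """Flag duplicate row IDs and a primary_candidate_id that matches no row."""
    with csv_path.open(newline="", encoding="utf-8-sig") as handle:
        # Missing or short rows yield None; treat them as empty IDs.
        ids = [(row.get(id_column) or "").strip() for row in csv.DictReader(handle)]
    problems = [f"duplicate {id_column}: {value}"
                for value, count in Counter(ids).items() if value and count > 1]
    if primary_candidate_id not in ids:
        problems.append(f"primary_candidate_id {primary_candidate_id!r} matches no row")
    return problems
```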
Input provenance expectations
Prepare inputs so a reviewer can answer these questions without guesswork:
- Which files were supplied by the customer
- Which file is the authoritative selection file
- Which structure files belong to which candidate or target
- Which identifiers should be preserved in exported reports
Keeping that mapping clean improves reproducibility and reduces manual support work later in the run.
Common failure mode
If the runner fails with an error similar to:
Project directory not found: <input-root>/<project_id>
the usual causes are:
- config.projectId does not match the staged folder name
- Files were copied into the wrong directory level
- The runner does not have read access to the staged path
Quick check:
ls -la /data/input/<project_id>/01_sources/
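The Python sketch below extends the quick check to cover all three usual causes. Run it as the same user as the runner, because os.access reports permissions for the current user only. The function name diagnose_project_dir and its messages are hypothetical and do not reproduce the runner's own output.

```python
import os
from pathlib import Path


def diagnose_project_dir(input_root: str, project_id: str) -> list[str]:
    """Return likely causes of 'Project directory not found' (empty list = looks fine)."""
    root = Path(input_root)
    project_dir = root / project_id
    sources_dir = project_dir / "01_sources"

    # Cause 1: config.projectId does not match the staged folder name.
    if not os.path.isdir(project_dir):
        try:
            staged = sorted(p.name for p in root.iterdir() if p.is_dir())
        except OSError:
            staged = []
        return [f"{project_dir} not found; folders under {root}: {staged}"]

    # Cause 3: the current user cannot list (R) or enter (X) the staged directories.
    for path in (project_dir, sources_dir):
        if os.path.isdir(path) and not os.access(path, os.R_OK | os.X_OK):
            return [f"no read/traverse permission on {path}"]

    # Cause 2: files copied into the wrong directory level.
    if os.path.isfile(sources_dir / "sources.json"):
        return []
    if os.path.isfile(project_dir / "sources.json"):
        return [f"sources.json is directly under {project_dir}; move it into 01_sources/"]
    return [f"no sources.json under {sources_dir}"]


print(diagnose_project_dir("/data/input", "my-project"))
```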
Output scoping note
Input lookup is project-scoped, but outputs are run-scoped:
<output-root>/<run_id>/
This is expected behavior and prevents collisions when the same project is run multiple times.
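The Python sketch below illustrates why run scoping avoids collisions. Each run gets its own directory keyed by a fresh run ID rather than by the project ID. The run ID format (UTC timestamp plus a random suffix) and the function name new_run_dir are hypothetical choices. The runner may generate run IDs differently.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_dir(output_root: str) -> Path:
    """Create <output-root>/<run_id>/ for one run (hypothetical run_id format)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_id = f"{stamp}-{uuid.uuid4().hex[:8]}"
    run_dir = Path(output_root) / run_id
    # exist_ok=False makes an unexpected collision fail loudly instead of
    # silently mixing outputs from two runs.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Two runs of the same project read from the same <input-root>/<project_id>/ but write to different run directories.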
Recommended validation path
If PreFlight UI is part of your workflow, validate the package before submission: