Analysis

This section describes the configurations for various analysis-related scripts.

Complex alignment

This config file controls how a predicted protein-ligand complex structure is optimally aligned to its corresponding ground-truth protein-ligand complex.

analysis/complex_alignment.yaml

method: neuralplexer # the method for which to align predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`)
vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `p2rank`)
dataset: posebusters_benchmark # the dataset to use - NOTE: must be one of (`posebusters_benchmark`, `astex_diverse`, `dockgen`, `casp15`)
ensemble_ranking_method: consensus # the method with which to rank-order and select the top ensemble prediction for each target - NOTE: must be one of (`consensus`, `ff`)
input_data_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set # the input protein-ligand complex directory to recursively parse
output_dir: ${resolve_method_output_dir:${method},${dataset},${vina_binding_site_method},${ensemble_ranking_method},${repeat_index},${pocket_only_baseline},${v1_baseline}} # the output directory to which to save the relaxed predictions
rank_to_align: 1 # the pose rank to align
aligned_filename_suffix: "_aligned" # the suffix to append to each aligned complex filename
force_process: false # whether to force processing of all complexes, even if they have already been processed
repeat_index: 1 # the repeat index which was used for inference
pocket_only_baseline: false # whether to prepare the pocket-only baseline
v1_baseline: false # whether to prepare the v1 baseline
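
Several keys above accept only a fixed set of values, as stated in their NOTE comments. The following is a minimal sketch, in plain Python, of how those constraints could be checked before launching a run. The helper `find_invalid_choices` and the `ALLOWED` table are hypothetical names introduced for illustration and are not part of the project; the allowed values are copied from the comments in this config file.

```python
from typing import Any, Dict, Set

# Allowed values copied from the NOTE comments in analysis/complex_alignment.yaml.
_METHODS = {
    "diffdock", "fabind", "dynamicbind", "neuralplexer", "flowdock",
    "rfaa", "chai-lab", "boltz", "alphafold3",
}
ALLOWED: Dict[str, Set[str]] = {
    "method": _METHODS,
    "vina_binding_site_method": _METHODS | {"p2rank"},
    "dataset": {"posebusters_benchmark", "astex_diverse", "dockgen", "casp15"},
    "ensemble_ranking_method": {"consensus", "ff"},
}


def find_invalid_choices(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map each constrained key to its disallowed value."""
    return {
        key: cfg[key]
        for key, allowed in ALLOWED.items()
        if key in cfg and cfg[key] not in allowed
    }


# The defaults shown in the config listing above.
defaults = {
    "method": "neuralplexer",
    "vina_binding_site_method": "p2rank",
    "dataset": "posebusters_benchmark",
    "ensemble_ranking_method": "consensus",
}

print(find_invalid_choices(defaults))                         # {}
print(find_invalid_choices({**defaults, "method": "vina"}))   # {'method': 'vina'}
```

Here `vina` is rejected because the alignment config's `method` comment does not list it, even though the inference analysis configs do.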

Inference analysis (PoseBusters, Astex, and DockGen)

This config file controls how a predicted protein-ligand complex from the PoseBusters Benchmark, Astex Diverse, or DockGen dataset is scored.

analysis/inference_analysis.yaml

full_report: true # whether to generate a full PoseBusters report (i.e. with all metrics) or a summary report (i.e. with only the most important metrics)
method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `vina`, `ensemble`)
vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `p2rank`)
dataset: posebusters_benchmark # the dataset to use - NOTE: must be one of (`posebusters_benchmark`, `astex_diverse`, `dockgen`, `casp15`)
ensemble_ranking_method: consensus # the method with which to rank-order and select the top ensemble prediction for each target - NOTE: must be one of (`consensus`, `ff`)
input_csv_path: ${resolve_method_input_csv_path:${method},${dataset},${pocket_only_baseline}} # the input CSV filepath with which to run inference
input_data_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set # the input protein-ligand complex directory to recursively parse
posebusters_ccd_ids_filepath: ${oc.env:PROJECT_ROOT}/data/posebusters_pdb_ccd_ids.txt # the path to the PoseBusters PDB CCD IDs file that lists the targets that do not contain any crystal contacts
dockgen_test_ids_filepath: ${oc.env:PROJECT_ROOT}/data/dockgen_set/split_test.txt # the path to the DockGen test set IDs file
output_dir: ${resolve_method_output_dir:${method},${dataset},${vina_binding_site_method},${ensemble_ranking_method},${repeat_index},${pocket_only_baseline},${v1_baseline}} # the output directory to which to save the relaxed predictions
repeat_index: 1 # the repeat index which was used for inference
pocket_only_baseline: false # whether to analyze the pocket-only baseline
v1_baseline: false # whether to analyze the v1 baseline
relax_protein: false # whether to relax the protein - NOTE: this currently causes intermittent, unpredictable protein-ligand separation
force_rescore: false # whether to force rescoring of the predictions with the PoseBusters software suite
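
The `input_data_dir` value above combines an environment variable (`${oc.env:PROJECT_ROOT}`) with another key from the same file (`${dataset}`). The sketch below reproduces that expansion in plain Python to show what path the interpolation is expected to yield. It is an illustration only: the config system resolves these values itself, the helper name `expand_input_data_dir` is hypothetical, and `/opt/PoseBench` is a placeholder project root.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def expand_input_data_dir(dataset: str, project_root: Optional[str] = None) -> PurePosixPath:
    """Hypothetical helper mirroring `${oc.env:PROJECT_ROOT}/data/${dataset}_set`."""
    # Fall back to the PROJECT_ROOT environment variable, as `oc.env` does in the config.
    root = project_root if project_root is not None else os.environ["PROJECT_ROOT"]
    return PurePosixPath(root) / "data" / f"{dataset}_set"


# Placeholder project root, chosen only for this example.
print(expand_input_data_dir("posebusters_benchmark", project_root="/opt/PoseBench"))
# /opt/PoseBench/data/posebusters_benchmark_set

# The DockGen test ID file in the config sits inside the same per-dataset directory.
print(expand_input_data_dir("dockgen", project_root="/opt/PoseBench") / "split_test.txt")
# /opt/PoseBench/data/dockgen_set/split_test.txt
```

If `PROJECT_ROOT` is unset and no explicit root is passed, this sketch raises a `KeyError`, so the environment variable needs to be defined before the config is used.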

Inference analysis (CASP)

This config file controls how a predicted protein-ligand complex from the CASP15 dataset is scored.

analysis/inference_analysis_casp.yaml

full_report: true # whether to generate a full PoseBusters report (i.e. with all metrics) or a summary report (i.e. with only the most important metrics)
python_exec_path: ${oc.env:HOME}/mambaforge/envs/casp15_ligand_scoring/bin/python3 # the Python executable to use
scoring_script_path: ${oc.env:PROJECT_ROOT}/posebench/analysis/casp15_ligand_scoring/score_predictions.py # the path to the script to use for scoring CASP predictions
method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `vina`, `ensemble`, `tulip`)
vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `p2rank`)
dataset: casp15 # the dataset to use - NOTE: must be one of (`casp15`)
ensemble_ranking_method: consensus # the method with which to rank-order and select the top ensemble prediction for each target - NOTE: must be one of (`consensus`, `ff`)
predictions_dir: ${oc.env:PROJECT_ROOT}/data/test_cases/${dataset}/top_${method}_ensemble_predictions_${repeat_index} # the directory containing the predictions to analyze
dataset_dir: ${oc.env:PROJECT_ROOT}/data/${dataset}_set # the input protein-ligand complex directory to recursively parse
targets: null # the optional list of target names for which to analyze predictions; if `null`, then all targets in the dataset will be analyzed
fault_tolerant: true # whether to continue processing targets if an error occurs during processing; note that targets H1171v1-2, H1172v1-4, and T1158v4 fail validation (sequence mismatch) and were run with the `fault_tolerant=true` argument during CASP15
score_relaxed_structures: true # whether to score relaxed structures in addition to the original (unrelaxed) structures
repeat_index: 1 # the run index to use for scoring predictions
no_ilcl: false # whether to score a model trained without an inter-ligand clash loss (ILCL) - NOTE: only applicable to the `neuralplexer` method
relax_protein: false # whether to relax the protein - NOTE: this currently causes intermittent, unpredictable protein-ligand separation
v1_baseline: false # whether to score the v1 baseline predictions
allow_missing_predictions: true # whether to allow missing predictions for a target
force_casp15_rescore: false # whether to force CASP15 rescoring of the predictions
force_pb_rescore: false # whether to force PoseBusters rescoring of the predictions
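
The `targets` and `fault_tolerant` options above describe two behaviors: `targets: null` selects every target in the dataset, and `fault_tolerant: true` lets processing continue after a target fails. The following is a minimal sketch of that control flow under stated assumptions. The function `score_targets`, the stub scorer `fake_score`, and the target names are all hypothetical and do not reflect the actual scoring script.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def score_targets(
    all_targets: Iterable[str],
    targets: Optional[Iterable[str]],
    score_fn: Callable[[str], float],
    fault_tolerant: bool = True,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Hypothetical loop: score the selected targets and collect per-target failures."""
    # `targets=None` mirrors `targets: null`, meaning all targets are analyzed.
    wanted = None if targets is None else set(targets)
    selected = [t for t in all_targets if wanted is None or t in wanted]

    results: Dict[str, float] = {}
    failures: Dict[str, str] = {}
    for target in selected:
        try:
            results[target] = score_fn(target)
        except Exception as exc:
            if not fault_tolerant:
                raise  # stop at the first failing target
            failures[target] = str(exc)  # record the error and move on
    return results, failures


def fake_score(target: str) -> float:
    """Stub scorer: one placeholder target fails, imitating a validation error."""
    if target == "target_b":
        raise ValueError("sequence mismatch")
    return 1.0


results, failures = score_targets(["target_a", "target_b", "target_c"], None, fake_score)
print(results)   # {'target_a': 1.0, 'target_c': 1.0}
print(failures)  # {'target_b': 'sequence mismatch'}
```

With `fault_tolerant=False`, the same call would raise the `ValueError` at `target_b` instead of recording it, so no later targets would be scored.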