Binding site crop preparation

class posebench.data.binding_site_crop_preparation.BindingSiteSelect(structure_residues: list[Any], binding_site_residue_indices: list[int])

Custom Select class to filter residues based on binding site residue indices.

accept_residue(residue: Any) → bool

Accept residues based on whether they are part of the binding site or not.

Parameters:

residue – Residue object from the Bio.PDB module.

Returns:

Boolean indicating whether the residue is part of the binding site.
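
As a rough illustration of the filtering idea, the sketch below shows one way such a selector could work. It is not the posebench implementation: the class name BindingSiteSelectSketch, the identity-based membership test, and the fallback base class (used only when Biopython is not installed) are all assumptions made for this example.

```python
try:
    from Bio.PDB import Select  # Biopython base class for structure filters
except ImportError:
    # Fallback so the sketch still runs without Biopython installed.
    class Select:
        def accept_residue(self, residue):
            return True


class BindingSiteSelectSketch(Select):
    """Hypothetical stand-in for BindingSiteSelect; not the posebench code."""

    def __init__(self, structure_residues, binding_site_residue_indices):
        # Assumption: indices are zero-based positions in structure_residues.
        self._kept = [structure_residues[i] for i in binding_site_residue_indices]
        # Compare by identity so residue objects need not be hashable.
        self._kept_ids = {id(residue) for residue in self._kept}

    def accept_residue(self, residue):
        # True only for residues that were listed as binding site residues.
        return id(residue) in self._kept_ids


# Plain objects stand in for Bio.PDB residues in this toy run.
residues = [object() for _ in range(5)]
selector = BindingSiteSelectSketch(residues, [1, 3])
accepted = [selector.accept_residue(residue) for residue in residues]
```

With Biopython available, a selector of this kind can be passed as the select argument of PDBIO.save so that only accepted residues are written out.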

posebench.data.binding_site_crop_preparation.crop_protein_binding_site(protein_filepath: str, binding_site_residue_indices: list[int], output_dir: str, pdb_id: str, filename_midfix: str = '', filename_suffix: str = '')

Crop the protein binding site and save it to a separate file.

Parameters:
  • protein_filepath – Path to the input protein structure file.

  • binding_site_residue_indices – List of zero-based residue indices that define the binding site.

  • output_dir – Path to the output directory.

  • pdb_id – PDB ID of the protein-ligand complex.

  • filename_midfix – Optional “midfix” to insert into the cropped protein structure filename.

  • filename_suffix – Optional suffix to append to the cropped protein structure filename.
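
To make the roles of filename_midfix and filename_suffix concrete, here is a small sketch of how an output path might be assembled from them. The helper name cropped_filename_sketch and the filename layout (PDB ID, midfix, a "_protein" stem, suffix, ".pdb" extension) are hypothetical; the layout actually used by crop_protein_binding_site is not specified in this reference.

```python
import os


def cropped_filename_sketch(output_dir, pdb_id, filename_midfix="", filename_suffix=""):
    """Build an output path from the optional midfix and suffix (hypothetical layout)."""
    # Assumed layout: <pdb_id><midfix>_protein<suffix>.pdb
    name = f"{pdb_id}{filename_midfix}_protein{filename_suffix}.pdb"
    return os.path.join(output_dir, name)


# Example values are made up for illustration.
default_path = cropped_filename_sketch("out", "1abc")
custom_path = cropped_filename_sketch("out", "1abc", "_holo", "_bs_cropped")
```

Both optional arguments default to empty strings, so omitting them leaves the base filename unchanged.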

posebench.data.binding_site_crop_preparation.get_binding_site_residue_indices(protein_filepath: str, ligand_filepath: str, protein_ligand_distance_threshold: float = 4.0, num_buffer_residues: int = 7) → list[int]

Get the zero-based residue indices of the protein binding site based on native protein-ligand interactions.

Parameters:
  • protein_filepath – Path to the protein structure PDB file.

  • ligand_filepath – Path to the ligand structure SDF file.

  • protein_ligand_distance_threshold – Heavy-atom distance threshold (in Angstroms) used to identify protein residues that interact with ligand heavy atoms.

  • num_buffer_residues – Number of residues to include as a buffer around each binding site residue.

Returns:

List of zero-based residue indices that define the binding site.
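
The selection described above can be sketched in plain Python as two steps: find residues with a heavy atom within the distance threshold of any ligand heavy atom, then widen the selection by a buffer. This is an illustration under stated assumptions, not the posebench code: the function name binding_site_indices_sketch is hypothetical, coordinates are passed in directly rather than parsed from PDB and SDF files, the threshold comparison is assumed to be inclusive, and the buffer is assumed to mean sequence-adjacent residues on each side of a contact residue.

```python
import math


def binding_site_indices_sketch(residue_coords, ligand_coords, threshold=4.0, num_buffer=7):
    """Return sorted zero-based residue indices near the ligand, plus a buffer."""
    # Step 1: residues with any heavy atom within `threshold` of a ligand heavy atom.
    contact = set()
    for idx, atoms in enumerate(residue_coords):
        if any(math.dist(a, l) <= threshold for a in atoms for l in ligand_coords):
            contact.add(idx)

    # Step 2 (assumed meaning of the buffer): add up to `num_buffer` neighbours
    # on each side in sequence order, clipped to the valid index range.
    expanded = set()
    last = len(residue_coords) - 1
    for idx in contact:
        expanded.update(range(max(0, idx - num_buffer), min(last, idx + num_buffer) + 1))
    return sorted(expanded)


# Toy input: ten single-atom "residues" spaced 3.8 Angstroms apart on the x-axis.
residue_coords = [[(3.8 * i, 0.0, 0.0)] for i in range(10)]
ligand_coords = [(19.0, 3.0, 0.0)]  # one ligand atom sitting above residue 5
indices = binding_site_indices_sketch(residue_coords, ligand_coords, threshold=4.0, num_buffer=1)
no_buffer = binding_site_indices_sketch(residue_coords, ligand_coords, threshold=4.0, num_buffer=0)
```

In the toy input only residue 5 lies within 4.0 Angstroms of the ligand atom, so a buffer of 1 yields indices 4, 5 and 6, while a buffer of 0 yields index 5 alone.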

posebench.data.binding_site_crop_preparation.main(cfg: DictConfig)

Parse a data directory containing subdirectories of protein-ligand complexes and prepare the corresponding inference CSV file for the DiffDock model.

Parameters:

cfg – Configuration dictionary from the hydra YAML file.

posebench.data.binding_site_crop_preparation.save_cropped_protein_binding_site(smiles_and_pdb_id_list: list[tuple[Any, str]], input_data_dir: str, input_protein_structure_dir: str, protein_ligand_distance_threshold: float = 4.0, num_buffer_residues: int = 7)

Save the cropped protein binding site to a separate file for each protein-ligand complex.

Parameters:
  • smiles_and_pdb_id_list – A list of tuples each containing a SMILES string and a PDB ID.

  • dataset – Dataset name. This parameter is documented but does not appear in the function signature.

  • input_data_dir – Path to directory of input protein-ligand complex subdirectories.

  • input_protein_structure_dir – Path to the directory containing the protein structure input files.

  • protein_ligand_distance_threshold – Heavy-atom distance threshold (in Angstroms) used to identify protein residues that interact with ligand heavy atoms.

  • num_buffer_residues – Number of residues to include as a buffer around each binding site residue.