Binding site crop preparation
- class posebench.data.binding_site_crop_preparation.BindingSiteSelect(structure_residues: list[Any], binding_site_residue_indices: list[int])
Custom Select class that keeps only the residues at the given zero-based binding site residue indices.
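The index-based filtering can be sketched as follows. This is a minimal illustration, not the class's actual implementation: it assumes the class follows the Biopython Bio.PDB.Select pattern, in which PDBIO.save() calls accept_residue() on every residue and writes only those for which it returns a truthy value. The name IndexBasedResidueSelect is hypothetical, and the class deliberately does not inherit from Select so that it runs without Biopython installed.

```python
from typing import Any, Sequence


class IndexBasedResidueSelect:
    """Hypothetical stand-in for a Bio.PDB.Select subclass (no Biopython needed)."""

    def __init__(self, structure_residues: Sequence[Any], binding_site_residue_indices: Sequence[int]):
        # Keep references to the selected residues so their id() values stay unique.
        self._selected = [structure_residues[i] for i in binding_site_residue_indices]
        # Compare by object identity so the sketch does not depend on how residue
        # objects implement __eq__ or __hash__.
        self._selected_ids = {id(residue) for residue in self._selected}

    def accept_residue(self, residue: Any) -> bool:
        # Writers such as Biopython's PDBIO keep a residue only if this returns a truthy value.
        return id(residue) in self._selected_ids
```

With Biopython available, the same two methods would live on a subclass of Bio.PDB.Select, and an instance would be passed as the select argument of PDBIO.save().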
- posebench.data.binding_site_crop_preparation.crop_protein_binding_site(protein_filepath: str, binding_site_residue_indices: list[int], output_dir: str, pdb_id: str, filename_midfix: str = '', filename_suffix: str = '')
Crop the protein binding site and save it to a separate file.
- Parameters:
protein_filepath – Path to the input protein structure file.
binding_site_residue_indices – List of zero-based residue indices that define the binding site.
output_dir – Path to the output directory.
pdb_id – PDB ID of the protein-ligand complex.
filename_midfix – Optional “midfix” to insert into the cropped protein structure filename.
filename_suffix – Optional suffix to append to the cropped protein structure filename.
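The cropping step itself can be illustrated on raw PDB text. This sketch assumes that a residue is identified by its chain ID, residue number, and insertion code (PDB columns 22 to 27), and that zero-based residue indices count residues in order of first appearance across all chains; the actual function may instead index residues through a parsed structure. The helper name crop_pdb_lines is hypothetical, the sketch drops every non-coordinate record, and it leaves out output-path construction from pdb_id, filename_midfix, and filename_suffix because the real naming scheme is not documented here.

```python
from typing import Iterable


def crop_pdb_lines(pdb_lines: Iterable[str], binding_site_residue_indices: Iterable[int]) -> list[str]:
    """Keep ATOM/HETATM records whose residue sits at a selected zero-based index."""
    keep = set(binding_site_residue_indices)
    # Maps (chain ID, residue number, insertion code) to its zero-based order of first appearance.
    residue_positions: dict[tuple[str, str, str], int] = {}
    cropped: list[str] = []
    for raw_line in pdb_lines:
        line = raw_line.rstrip("\n")
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # this simplified sketch drops headers, TER, END, and other records
        # Fixed-width PDB columns 22, 23-26, and 27 (zero-based slices below).
        key = (line[21], line[22:26], line[26])
        position = residue_positions.setdefault(key, len(residue_positions))
        if position in keep:
            cropped.append(line)
    return cropped
```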
- posebench.data.binding_site_crop_preparation.get_binding_site_residue_indices(protein_filepath: str, ligand_filepath: str, protein_ligand_distance_threshold: float = 10.0, num_buffer_residues: int = 7) → list[int]
Get the zero-based residue indices of the protein binding site from native protein-ligand interactions.
- Parameters:
protein_filepath – Path to the protein structure PDB file.
ligand_filepath – Path to the ligand structure SDF file.
protein_ligand_distance_threshold – Heavy-atom distance threshold (in Å) used to identify binding site residues that interact with ligand heavy atoms.
num_buffer_residues – Number of residues to include as a buffer around each binding site residue.
- Returns:
List of zero-based residue indices that define the binding site.
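A minimal sketch of the selection logic, assuming heavy-atom coordinates have already been extracted from the PDB and SDF files. Here a residue counts as a binding site residue if any of its heavy atoms lies within protein_ligand_distance_threshold of any ligand heavy atom (the sketch uses an inclusive cutoff, which is an assumption), and num_buffer_residues is read as a window of sequence neighbors added on each side of every such residue, clipped to the valid index range. The function name binding_site_indices is hypothetical, and its buffer ignores chain boundaries.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def binding_site_indices(
    residue_heavy_atoms: Sequence[Sequence[Coord]],
    ligand_heavy_atoms: Sequence[Coord],
    protein_ligand_distance_threshold: float = 10.0,
    num_buffer_residues: int = 7,
) -> list[int]:
    """Return sorted zero-based residue indices near the ligand, plus a sequence buffer."""
    num_residues = len(residue_heavy_atoms)
    # Residues with at least one heavy atom within the cutoff of any ligand heavy atom.
    contacts = [
        i
        for i, atoms in enumerate(residue_heavy_atoms)
        if any(
            math.dist(a, b) <= protein_ligand_distance_threshold
            for a in atoms
            for b in ligand_heavy_atoms
        )
    ]
    selected: set[int] = set()
    for i in contacts:
        # Add sequence neighbors on both sides, clipped to [0, num_residues - 1].
        lo = max(0, i - num_buffer_residues)
        hi = min(num_residues - 1, i + num_buffer_residues)
        selected.update(range(lo, hi + 1))
    return sorted(selected)
```

The brute-force distance check is quadratic in atom count; a spatial index such as a KD-tree would be the usual choice for large systems.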
- posebench.data.binding_site_crop_preparation.main(cfg: DictConfig)
Parse a data directory containing subdirectories of protein-ligand complexes and prepare the corresponding inference CSV file for the DiffDock model.
- Parameters:
cfg – Configuration dictionary from the Hydra YAML file.
- posebench.data.binding_site_crop_preparation.save_cropped_protein_binding_site(smiles_and_pdb_id_list: list[tuple[Any, str]], input_data_dir: str, input_protein_structure_dir: str, protein_ligand_distance_threshold: float = 10.0, num_buffer_residues: int = 7)
Save the cropped protein binding site to a separate file for each protein-ligand complex.
- Parameters:
smiles_and_pdb_id_list – A list of tuples each containing a SMILES string and a PDB ID.
input_data_dir – Path to directory of input protein-ligand complex subdirectories.
input_protein_structure_dir – Path to the directory containing the protein structure input files.
protein_ligand_distance_threshold – Heavy-atom distance threshold (in Å) used to identify binding site residues that interact with ligand heavy atoms.
num_buffer_residues – Number of residues to include as a buffer around each binding site residue.
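The per-complex workflow can be sketched as a loop that computes binding site indices and then crops each protein. Everything beyond the documented parameter names is an assumption: the driver name crop_all_complexes, the output_dir argument, and the file layout ({pdb_id}_protein.pdb under input_protein_structure_dir, {pdb_id}/{pdb_id}_ligand.sdf under input_data_dir) are hypothetical. The two steps are passed in as callables so the sketch runs without PoseBench installed.

```python
import os
from typing import Any, Callable, Sequence


def crop_all_complexes(
    smiles_and_pdb_id_list: Sequence[tuple[Any, str]],
    input_data_dir: str,
    input_protein_structure_dir: str,
    output_dir: str,  # hypothetical: where cropped files go
    get_indices: Callable[[str, str], list[int]],
    crop: Callable[[str, list[int], str, str], None],
) -> list[str]:
    """Hypothetical driver: locate files, compute binding site indices, crop."""
    processed: list[str] = []
    for _smiles, pdb_id in smiles_and_pdb_id_list:
        # Hypothetical file layout, for illustration only.
        protein_path = os.path.join(input_protein_structure_dir, f"{pdb_id}_protein.pdb")
        ligand_path = os.path.join(input_data_dir, pdb_id, f"{pdb_id}_ligand.sdf")
        # The SMILES string is not needed for cropping in this sketch.
        residue_indices = get_indices(protein_path, ligand_path)
        crop(protein_path, residue_indices, output_dir, pdb_id)
        processed.append(pdb_id)
    return processed
```

In practice, get_indices could wrap get_binding_site_residue_indices with the threshold and buffer bound via functools.partial, and crop could wrap crop_protein_binding_site.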