panorama.utility.translate package#
Submodules#
panorama.utility.translate.macsymodel_translator module#
Model Translation Module for PANORAMA
This module provides comprehensive functionality to translate models from MacSyFinder-based tools into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion.
Supported Sources:
- DefenseFinder: Antiviral defense systems identification
- CasFinder: CRISPR-Cas Antiviral defense systems identification
- CONJScan: Conjugation system detection
- TXSScan: Type secretion system detection
- TFFScan: Type IV-A pilus detection
- panorama.utility.translate.macsymodel_translator.create_macsyfinder_hmm_list(hmms_path: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False) DataFrame#
Create a comprehensive HMM list file for PANORAMA annotation from MacSyFinder HMM files.
This function processes all HMM files in a directory, extracts the metadata, and creates a standardized HMM list file for use in PANORAMA annotation steps.
- Parameters:
hmms_path (Path) – Path to the directory containing HMM files
output (Path) – Path to the output directory for generated files
binary_hmm (bool, optional) – Whether to write HMMs in binary format. Defaults to False.
hmm_coverage (float, optional) – Global HMM coverage threshold override. Defaults to None.
target_coverage (float, optional) – Global target coverage threshold override. Defaults to None.
force (bool, optional) – Whether to overwrite the existing output directory. Defaults to False.
disable_bar (bool, optional) – Whether to disable the progress bar. Defaults to False.
- Returns:
pd.DataFrame – DataFrame containing HMM information indexed by name, with columns:
- accession: unique accession ID
- path: path to processed HMM file
- length: HMM consensus length
- protein_name: protein name
- secondary_name: alternative name
- score_threshold: score threshold
- eval_threshold: E-value threshold
- ieval_threshold: independent E-value threshold
- hmm_cov_threshold: HMM coverage threshold
- target_cov_threshold: target coverage threshold
- description: HMM description
- Raises:
IOError – If HMM files cannot be read or processed
- panorama.utility.translate.macsymodel_translator.find_cluster_canonical(model_name: str, models: Path) List[str]#
Find canonical models for cluster-based systems.
- Parameters:
model_name – Name of the cluster model
models – Path to models directory
- Returns:
List of canonical model names
- panorama.utility.translate.macsymodel_translator.find_type_subtype_canonical(model_name: str, models: Path) List[str]#
Find canonical models for Type-Subtype naming patterns.
- Parameters:
model_name – Name of the model
models – Path to models directory
- Returns:
List of canonical model names
- panorama.utility.translate.macsymodel_translator.get_models_path(models: Path, source: str) Dict[str, Path]#
Create a mapping of model names to their file paths based on the source tool.
Different bioinformatics tools have different naming conventions and directory structures. This function standardizes the model name extraction process.
- Parameters:
models (Path) – Path to models database directory
source (str) – Name of the source tool (“defense-finder”, “CONJScan”, “TXSScan”, “TFFscan”)
- Returns:
Dict[str, Path] – Dictionary mapping standardized model names to their file paths
- Raises:
ValueError – If the source tool is not supported
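As a rough illustration of the name-to-path mapping described for get_models_path, the sketch below walks a directory and keys each XML file by its stem. It is not the package's implementation: the helper name `map_model_names`, the `SUPPORTED_SOURCES` set, and the choice of the file stem as the model name are assumptions made for this example, and the real function may apply source-specific naming rules.

```python
from pathlib import Path
from typing import Dict

# Assumption: accepted names copied from the `source` parameter description.
SUPPORTED_SOURCES = {"defense-finder", "CONJScan", "TXSScan", "TFFscan"}


def map_model_names(models: Path, source: str) -> Dict[str, Path]:
    """Map every XML model file found under `models` to a name derived from its path."""
    if source not in SUPPORTED_SOURCES:
        raise ValueError(f"Unsupported source: {source}")
    mapping: Dict[str, Path] = {}
    for xml_file in sorted(models.rglob("*.xml")):
        # Assumption: the model name is the file stem; a real tool may also
        # fold sub-directory names into the model name.
        mapping[xml_file.stem] = xml_file
    return mapping
```

Searching recursively keeps the sketch independent of how deep a given tool nests its definition files.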
- panorama.utility.translate.macsymodel_translator.parse_macsyfinder_hmm(hmm: HMM, hmm_file: Path, panorama_acc: Set[str]) Dict[str, str | int | float]#
Parse a MacSyFinder HMM and extract information for PANORAMA annotation.
This function processes HMM objects and extracts metadata needed for PANORAMA, including generating accession numbers and handling naming conventions.
- Parameters:
hmm (HMM) – HMM object from the HMM file
hmm_file (Path) – Path to the HMM file being processed
panorama_acc (Set[str]) – Set of existing PANORAMA accession IDs to avoid duplicates
- Returns:
Dict[str, Union[str, int, float]] – Dictionary with HMM metadata including:
- name: HMM name
- accession: unique accession ID
- length: consensus sequence length
- protein_name: protein name
- secondary_name: alternative name
- score_threshold: score threshold (NaN if not set)
- eval_threshold: E-value threshold
- ieval_threshold: independent E-value threshold
- hmm_cov_threshold: HMM coverage threshold (NaN if not set)
- target_cov_threshold: target coverage threshold (NaN if not set)
- description: HMM description
- Raises:
IOError – If HMM name cannot be determined
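The panorama_acc parameter exists so that generated accession IDs never collide. One simple way to get that behaviour is shown below; the helper name `next_accession`, the `PAN` prefix, and the zero-padded counter format are all hypothetical, since the accession scheme used by parse_macsyfinder_hmm is not described here.

```python
from typing import Set


def next_accession(existing: Set[str], prefix: str = "PAN", width: int = 6) -> str:
    """Return the first PREFIX + zero-padded counter that is not in `existing`."""
    counter = 1
    while True:
        candidate = f"{prefix}{counter:0{width}d}"
        if candidate not in existing:
            # Record the new ID so that later calls cannot hand it out again.
            existing.add(candidate)
            return candidate
        counter += 1
```

Because the set is updated in place, the same set can be passed along while iterating over many HMM files.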
- panorama.utility.translate.macsymodel_translator.process_attributes(elem: Element, dict_elem: Dict[str, str | Dict[str, int]], transitivity_mut: Callable[[int], int]) Dict[str, str | Dict[str, int] | List]#
Process XML attributes for an element (Family, Functional Unit or Model).
- Parameters:
elem (lxml.etree.Element) – XML element with attributes to process
dict_elem (Dict) – Dictionary to update
transitivity_mut (Callable) – Function to transform transitivity values
- Returns:
Dict – Updated gene dictionary
- panorama.utility.translate.macsymodel_translator.process_exchangeable(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, exchangeable_set: Set[str]) Set[str]#
Process exchangeable genes from XML relations.
- Parameters:
elem (lxml.etree.Element) – XML element containing relation
data (Dict) – Model data dictionary
hmm_df (pd.DataFrame) – HMM information DataFrame
exchangeable_set (Set) – Set of exchangeable gene names to update
- Returns:
Set – The updated set of exchangeable gene names
- panorama.utility.translate.macsymodel_translator.search_canonical_macsyfinder(model_name: str, models: Path) List[str]#
Search for canonical models related to a given MacSyFinder model.
This function identifies canonical (parent/related) models based on naming patterns specific to different bioinformatics tools and their model hierarchies.
- Parameters:
model_name (str) – Name of the model to find canonical models for
models (Path) – Path to the directory containing all model files
- Returns:
List[str] – List of canonical model names related to the input model. Empty list if no canonical models are found.
Note
Canonical models are identified based on naming patterns:
- Type/Subtype relationships (Defense systems)
- Cluster relationships (CAS, CBASS, etc.)
- Family relationships with underscore notation
- panorama.utility.translate.macsymodel_translator.translate_functional_unit(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, transitivity_mut: Callable[[int], int]) Dict[str, str | Dict[str, int] | List]#
Translate a functional unit from MacSyFinder (or similar) models into PANORAMA format.
Functional units represent collections of genes/families that work together as a unit.
- Parameters:
elem (lxml.etree.Element) – XML element corresponding to the functional unit
data (Dict[str, Any]) – Dictionary containing PANORAMA model information
hmm_df (pd.DataFrame) – HMM information DataFrame for gene translation
transitivity_mut (Callable[[int], int]) – Function to transform transitivity values
- Returns:
Dict[str, Any] – Dictionary containing PANORAMA functional unit model information with keys:
- name: functional unit name
- presence: presence requirement
- families: list of gene families in the unit
- parameters: dict with various thresholds and flags
- Raises:
KeyError – If the functional unit name is missing
- panorama.utility.translate.macsymodel_translator.translate_gene(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, transitivity_mut: Callable[[int], int]) Dict[str, str | Dict[str, int]]#
Translate a gene element from MacSyFinder (or similar) models into a PANORAMA family model.
This function processes XML gene elements and converts them to PANORAMA families, handling attributes such as presence, transitivity, and exchangeable genes.
- Parameters:
elem (lxml.etree.Element) – XML element corresponding to the gene to be translated
data (Dict[str, Any]) – Dictionary containing PANORAMA model information
hmm_df (pd.DataFrame) – HMM information DataFrame indexed by gene name for translation
transitivity_mut (Callable[[int], int]) – Function to mutate transitivity values
- Returns:
Dict[str, Any] – Dictionary containing PANORAMA family model information with keys:
- name: protein name
- presence: gene presence requirement (‘neutral’, ‘mandatory’, etc.)
- parameters: dict with transitivity, multi_system, multi_model flags
- exchangeable: list of exchangeable gene names (if any)
- Raises:
KeyError – If the gene name is missing or not found in HMM dataframe
ModelTranslationError – For unexpected translation errors
- panorama.utility.translate.macsymodel_translator.translate_macsyfinder(macsy_db: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, source: str = '', force: bool = False, disable_bar: bool = False) List[Dict[str, str | Dict[str, int] | List]]#
Translate MacSyFinder models into PANORAMA format with comprehensive error handling.
This is the main translation function that orchestrates the entire process of converting MacSyFinder-compatible models into PANORAMA format, including HMM processing, model translation, and file generation.
- Parameters:
macsy_db (Path) – Path to the MacSyFinder models database directory
output (Path) – Path to output directory for PANORAMA files
binary_hmm (bool, optional) – Whether to write HMMs in binary format. Defaults to False.
hmm_coverage (float, optional) – Global HMM coverage threshold. Defaults to source-specific values.
target_coverage (float, optional) – Global target coverage threshold. Defaults to None.
source (str, optional) – Name of the source tool. Defaults to “”.
force (bool, optional) – Whether to overwrite existing files. Defaults to False.
disable_bar (bool, optional) – Whether to disable progress bars. Defaults to False.
- Returns:
List[Dict[str, Any]] – List of dictionaries containing translated PANORAMA models
- Raises:
ValueError – If an unsupported source is provided
ModelTranslationError – If model translation fails
IOError – If file operations fail
- panorama.utility.translate.macsymodel_translator.translate_macsyfinder_model(root: Element, model_name: str, hmm_df: DataFrame, canonical: List[str], transitivity_mut: Callable[[int], int]) Dict[str, str | Dict[str, int] | List]#
Translate a complete MacSyFinder (or similar) model into PANORAMA format.
This function processes the root XML element of a MacSyFinder model and converts it to PANORAMA format, handling both gene-only models and models with functional units.
- Parameters:
root (et.Element) – Root XML element of the model
model_name (str) – Name of the model being translated
hmm_df (pd.DataFrame) – HMM information DataFrame for gene translation
canonical (List[str]) – List of canonical model names related to this model
transitivity_mut (Callable[[int], int]) – Function to transform transitivity values
- Returns:
Dict[str, Any] – Complete PANORAMA model information with keys:
- name: model name
- func_units: list of functional units
- parameters: model-level parameters
- canonical: list of canonical models (if any)
- Raises:
ModelTranslationError – For errors during model translation
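To make the nesting of the documented return values concrete, the sketch below assembles a family, a functional unit, and a model using only the key names listed for translate_gene, translate_functional_unit, and translate_macsyfinder_model. Every value is a placeholder, the contents of each `parameters` dict are reduced to a single illustrative entry, and `missing_keys` is a hypothetical helper that is not part of the package.

```python
from typing import Any, Dict, List

# Placeholder values throughout; only the key names come from the documentation.
family: Dict[str, Any] = {
    "name": "geneA",
    "presence": "mandatory",
    "parameters": {"transitivity": 0},
    "exchangeable": ["geneB"],  # optional key
}
functional_unit: Dict[str, Any] = {
    "name": "example_unit",
    "presence": "mandatory",
    "families": [family],
    "parameters": {"transitivity": 0},
}
model: Dict[str, Any] = {
    "name": "example_model",
    "func_units": [functional_unit],
    "parameters": {"transitivity": 0},
    "canonical": [],  # optional key
}


def missing_keys(candidate: Dict[str, Any]) -> List[str]:
    """Return the documented top-level model keys that `candidate` lacks."""
    required = ("name", "func_units", "parameters")  # "canonical" is optional
    return [key for key in required if key not in candidate]
```

A check like `missing_keys` can be handy when inspecting translated output by hand, but it says nothing about whether the values themselves are valid.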
panorama.utility.translate.padloc_translator module#
Padloc Model Translation Module for PANORAMA
This module provides functionality to translate models from PADLOC into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion.
- panorama.utility.translate.padloc_translator._add_families_to_functional_unit(families_list: List[str], secondary_names: List[str], family_type: str, metadata_df: DataFrame, seen_families: Set[str]) List[Dict[str, str | List[str]]]#
Add families to a functional unit with proper metadata integration.
This helper function processes a list of family names and creates properly formatted family dictionaries for PANORAMA models. It handles special cases like cas_adaptation and manages exchangeable proteins.
- Parameters:
families_list (List[str]) – List of family names to process
secondary_names (List[str]) – List of known secondary names for exchangeable lookup
family_type (str) – Type of family (‘mandatory’, ‘accessory’, ‘forbidden’, ‘neutral’)
metadata_df (pd.DataFrame) – DataFrame containing family metadata
seen_families (Set[str]) – Set to track already processed families (modified in place)
- Returns:
List of family dictionaries with structure
- name – str (family name)
- presence – str (family type)
- exchangeable – List[str] (optional, list of exchangeable proteins)
- panorama.utility.translate.padloc_translator.parse_meta_padloc(meta_path: Path) DataFrame#
Parse the PADLOC metadata file and return a structured DataFrame.
The function processes the PADLOC hmm_meta.txt file which contains HMM metadata including accession numbers, names, thresholds and descriptions. It handles protein name parsing and secondary name merging.
- Parameters:
meta_path (Path) – Path to the PADLOC metadata file (hmm_meta.txt)
- Returns:
pd.DataFrame – DataFrame with parsed metadata indexed by accession number, containing columns: name, protein_name, secondary_name, score_threshold, eval_threshold, ieval_threshold, hmm_cov_threshold, target_cov_threshold, description
- Raises:
IOError – If the metadata file cannot be read
ValueError – If the metadata format is unexpected
- panorama.utility.translate.padloc_translator.search_canonical_padloc(model_name: str, models_dir: Path) List[str]#
Search for canonical models related to a PADLOC model.
PADLOC uses a naming convention where models ending with ‘_other’ are variants of base models. This function identifies the canonical (base) models for such variants.
- Parameters:
model_name – Name of the current model being processed
models_dir – Directory containing all PADLOC model files
- Returns:
List[str] – Names of canonical models related to the input model
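The ‘_other’ convention can be illustrated with a short lookup over a directory of model files. This is only a sketch under stated assumptions: the helper name `canonical_for_other` is hypothetical, models are assumed to be stored as `.yaml` files directly in `models_dir`, and base models are assumed to share the prefix that precedes `_other`. The actual matching rule in search_canonical_padloc may differ.

```python
from pathlib import Path
from typing import List


def canonical_for_other(model_name: str, models_dir: Path) -> List[str]:
    """List models sharing the prefix of an '_other' variant; empty for other names."""
    suffix = "_other"
    if not model_name.endswith(suffix):
        return []
    base = model_name[: -len(suffix)]
    # Assumption: base models start with the same prefix and live in .yaml files.
    return sorted(
        path.stem
        for path in models_dir.glob("*.yaml")
        if path.stem.startswith(base) and path.stem != model_name
    )
```

Under these assumptions, a model named `cbass_other` would be linked to every other `cbass*` model in the directory.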
- panorama.utility.translate.padloc_translator.translate_model_padloc(data_yaml: Dict[str, List[str] | int | bool], model_name: str, metadata_df: DataFrame, canonical_models: List[str] = None) Dict[str, str | List[Dict] | Dict[str, int] | List[str]]#
Translate a PADLOC model from YAML format to PANORAMA JSON format.
This function converts PADLOC defense system models into the standardized PANORAMA format, handling gene categories, parameters and canonical relationships.
- Parameters:
data_yaml – PADLOC model data loaded from YAML file
model_name – Name identifier for the model
metadata_df – DataFrame containing HMM metadata for gene information
canonical_models – List of canonical model names (optional)
- Returns:
Dict – Translated model in PANORAMA format with structure:
- name – str (model name)
- func_units – List[Dict] (functional units with families)
- parameters – Dict (global model parameters)
- canonical – List[str] (optional, canonical model references)
- Raises:
KeyError – If required keys are missing from the PADLOC model
AssertionError – If input parameters are invalid
ModelTranslationError – For translation-specific errors
- panorama.utility.translate.padloc_translator.translate_padloc(padloc_db: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False) List[Dict]#
Translate all PADLOC models to PANORAMA format.
This function orchestrates the complete translation process for the PADLOC database:
1. Parses HMM metadata
2. Creates HMM list file for annotation
3. Translates all model files from YAML to JSON format
4. Handles canonical model relationships
- Parameters:
padloc_db – Path to the PADLOC database directory
output – Path to output directory for translated files
binary_hmm – Whether to output HMMs in binary format
hmm_coverage – Global HMM coverage threshold (optional)
target_coverage – Global target coverage threshold (optional)
force – Whether to overwrite existing output files
disable_bar – Whether to disable progress bars
- Returns:
List[Dict] – List of translated PANORAMA models
- Raises:
FileNotFoundError – If required PADLOC database files are missing
ModelTranslationError – If translation fails for any model
panorama.utility.translate.translate module#
Model Translation Module for PANORAMA
This module provides comprehensive functionality to translate models from different bioinformatics databases (PADLOC, DefenseFinder, MacSyFinder-based tools) into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion while maintaining compatibility across different annotation frameworks.
Supported Sources:
- PADLOC: Prokaryotic Antiviral Defense Location predictor
- DefenseFinder: Antiviral defense systems identification
- CasFinder: CRISPR-Cas Antiviral defense systems identification
- CONJScan: Conjugation system detection
- TXSScan: Type secretion system detection
- TFFScan: Type IV-A pilus detection
- exception panorama.utility.translate.translate.HMMProcessingError#
Bases: Exception
Custom exception for HMM processing errors.
- exception panorama.utility.translate.translate.ModelTranslationError#
Bases: Exception
Custom exception for model translation errors.
- panorama.utility.translate.translate.launch_translate(database_path: Path, source: str, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False)#
Launch the complete model translation process for any supported source.
This is the main entry point for translating models from different bioinformatics databases to PANORAMA format. It handles the entire workflow from parsing to writing output files.
- Parameters:
database_path – Path to the source database directory
source – Source identifier (padloc, defense-finder, CONJScan, TXSScan, TFFscan)
output – Path to output directory for translated files
binary_hmm – Whether to output HMMs in binary format (default: False)
hmm_coverage – Global HMM coverage threshold (optional, defaults vary by source)
target_coverage – Global target coverage threshold (optional)
force – Whether to overwrite existing output files (default: False)
disable_bar – Whether to disable progress bars (default: False)
- Raises:
ValueError – If the source is not recognized
FileNotFoundError – If the database path doesn’t exist
ModelTranslationError – If the translation process fails
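The entry point has to route each source identifier to the right translator and reject unknown ones. A minimal sketch of that routing follows; `dispatch_translation`, the `translators` mapping, and the order of the checks are assumptions made for illustration and do not describe how launch_translate is actually written.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Source identifiers as listed in the `source` parameter description.
MACSYFINDER_SOURCES = {"defense-finder", "CONJScan", "TXSScan", "TFFscan"}

Translator = Callable[[Path], List[dict]]


def dispatch_translation(database_path: Path, source: str,
                         translators: Dict[str, Translator]) -> List[dict]:
    """Pick a translator by source name and run it on the database directory."""
    if not database_path.exists():
        raise FileNotFoundError(f"Database path not found: {database_path}")
    if source == "padloc":
        return translators["padloc"](database_path)
    if source in MACSYFINDER_SOURCES:
        # All MacSyFinder-based tools go through the same translator here.
        return translators["macsyfinder"](database_path)
    raise ValueError(f"Unrecognized source: {source}")
```

Passing the translators in as a mapping keeps the sketch runnable without the package installed; in practice the two callables would correspond to translate_padloc and translate_macsyfinder.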
- panorama.utility.translate.translate.read_xml(model_path: Path) Element#
Read and parse an XML file with security considerations.
- Parameters:
model_path (Path) – Path to the XML file to be read
- Returns:
et.Element – The root element of the parsed XML document
- Raises:
IOError – If there is a problem opening the file
et.XMLSyntaxError – If there is a problem parsing the XML content
ModelTranslationError – For any unexpected errors during file processing
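The error contract listed above can be sketched with the standard library. Note the substitutions: this example uses `xml.etree.ElementTree` instead of lxml, so syntax problems surface as `ET.ParseError` rather than `et.XMLSyntaxError`, and the `ModelTranslationError` class is a local stand-in. The security measures the real function applies are not described here and are not reproduced.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


class ModelTranslationError(Exception):
    """Local stand-in for the package exception of the same name."""


def read_xml_sketch(model_path: Path) -> ET.Element:
    """Parse an XML file and return its root element."""
    try:
        tree = ET.parse(model_path)
    except (OSError, ET.ParseError):
        # File access and syntax problems propagate unchanged.
        raise
    except Exception as err:
        # Anything unexpected is wrapped in the translation-specific error.
        raise ModelTranslationError(f"Unexpected error reading {model_path}") from err
    return tree.getroot()
```

Callers can then separate a missing file, malformed XML, and other failures by exception type.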
- panorama.utility.translate.translate.read_yaml(model_path: Path) Dict[str, List[str] | int | bool]#
Read and parse a YAML file safely.
- Parameters:
model_path (Path) – Path to the YAML file to be read
- Returns:
Dict – The contents of the YAML file as a Python dictionary
- Raises:
IOError – If there is a problem opening the file
yaml.YAMLError – If there is a problem parsing the YAML content
ModelTranslationError – For any unexpected errors during file processing
- panorama.utility.translate.translate.write_model(output_path: Path, model_data: Dict[str, str | List[Dict] | Dict[str, int] | List[str]]) Path#
Write a translated model to a JSON file with proper formatting.
- Parameters:
output_path (Path) – Path to the output directory
model_data (Dict) – Dictionary containing the model data to write
- Returns:
Path – Path to the written JSON file
- Raises:
IOError – If there is a problem writing the file
KeyError – If the model_data doesn’t contain the required ‘name’ field
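A minimal sketch of the behaviour described for write_model is given below. The file name pattern `<name>.json`, the two-space indentation, and the helper name `write_model_sketch` are assumptions for this example; only the `name` requirement and the returned path come from the documentation above.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_model_sketch(output_path: Path, model_data: Dict[str, Any]) -> Path:
    """Write `model_data` as JSON inside `output_path` and return the file path."""
    name = model_data["name"]  # raises KeyError when the field is absent
    target = output_path / f"{name}.json"  # assumption: file named after the model
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(model_data, handle, indent=2)
    return target
```

Reading the name before opening the file means a model without a `name` field fails before anything is written to disk.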
Module contents#
- exception panorama.utility.translate.ModelTranslationError#
Bases: Exception
Custom exception for model translation errors.
- panorama.utility.translate.launch_translate(database_path: Path, source: str, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False)#
Launch the complete model translation process for any supported source.
This is the main entry point for translating models from different bioinformatics databases to PANORAMA format. It handles the entire workflow from parsing to writing output files.
- Parameters:
database_path – Path to the source database directory
source – Source identifier (padloc, defense-finder, CONJScan, TXSScan, TFFscan)
output – Path to output directory for translated files
binary_hmm – Whether to output HMMs in binary format (default: False)
hmm_coverage – Global HMM coverage threshold (optional, defaults vary by source)
target_coverage – Global target coverage threshold (optional)
force – Whether to overwrite existing output files (default: False)
disable_bar – Whether to disable progress bars (default: False)
- Raises:
ValueError – If the source is not recognized
FileNotFoundError – If the database path doesn’t exist
ModelTranslationError – If the translation process fails
- panorama.utility.translate.read_xml(model_path: Path) Element#
Read and parse an XML file with security considerations.
- Parameters:
model_path (Path) – Path to the XML file to be read
- Returns:
et.Element – The root element of the parsed XML document
- Raises:
IOError – If there is a problem opening the file
et.XMLSyntaxError – If there is a problem parsing the XML content
ModelTranslationError – For any unexpected errors during file processing
- panorama.utility.translate.read_yaml(model_path: Path) Dict[str, List[str] | int | bool]#
Read and parse a YAML file safely.
- Parameters:
model_path (Path) – Path to the YAML file to be read
- Returns:
Dict – The contents of the YAML file as a Python dictionary
- Raises:
IOError – If there is a problem opening the file
yaml.YAMLError – If there is a problem parsing the YAML content
ModelTranslationError – For any unexpected errors during file processing