panorama.utility.translate package

Submodules

panorama.utility.translate.macsymodel_translator module

Model Translation Module for PANORAMA

This module provides comprehensive functionality to translate models from MacSyFinder-based tools into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion.

Supported Sources:
  • DefenseFinder: Antiviral defense systems identification

  • CasFinder: CRISPR-Cas Antiviral defense systems identification

  • CONJScan: Conjugation system detection

  • TXSScan: Type secretion system detection

  • TFFScan: Type IV-A pilus detection

panorama.utility.translate.macsymodel_translator.create_macsyfinder_hmm_list(hmms_path: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False) → DataFrame

Create a comprehensive HMM list file for PANORAMA annotation from MacSyFinder HMM files.

This function processes all HMM files in a directory, extracts the metadata, and creates a standardized HMM list file for use in PANORAMA annotation steps.

Parameters:
  • hmms_path (Path) – Path to the directory containing HMM files

  • output (Path) – Path to the output directory for generated files

  • binary_hmm (bool, optional) – Whether to write HMMs in binary format. Defaults to False.

  • hmm_coverage (float, optional) – Global HMM coverage threshold override. Defaults to None.

  • target_coverage (float, optional) – Global target coverage threshold override. Defaults to None.

  • force (bool, optional) – Whether to overwrite the existing output directory. Defaults to False.

  • disable_bar (bool, optional) – Whether to disable the progress bar. Defaults to False.

Returns:

pd.DataFrame – DataFrame containing HMM information indexed by name, with columns:
  • accession: unique accession ID

  • path: path to processed HMM file

  • length: HMM consensus length

  • protein_name: protein name

  • secondary_name: alternative name

  • score_threshold: score threshold

  • eval_threshold: E-value threshold

  • ieval_threshold: independent E-value threshold

  • hmm_cov_threshold: HMM coverage threshold

  • target_cov_threshold: target coverage threshold

  • description: HMM description

Raises:

IOError – If HMM files cannot be read or processed

panorama.utility.translate.macsymodel_translator.find_cluster_canonical(model_name: str, models: Path) → List[str]

Find canonical models for cluster-based systems.

Parameters:
  • model_name – Name of the cluster model

  • models – Path to models directory

Returns:

List of canonical model names

panorama.utility.translate.macsymodel_translator.find_type_subtype_canonical(model_name: str, models: Path) → List[str]

Find canonical models for Type-Subtype naming patterns.

Parameters:
  • model_name – Name of the model

  • models – Path to models directory

Returns:

List of canonical model names

panorama.utility.translate.macsymodel_translator.get_models_path(models: Path, source: str) → Dict[str, Path]

Create a mapping of model names to their file paths based on the source tool.

Different bioinformatics tools have different naming conventions and directory structures. This function standardizes the model name extraction process.

Parameters:
  • models (Path) – Path to models database directory

  • source (str) – Name of the source tool (“defense-finder”, “CONJScan”, “TXSScan”, “TFFscan”)

Returns:

Dict[str, Path] – Dictionary mapping standardized model names to their file paths

Raises:

ValueError – If the source tool is not supported
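
The mapping described above can be illustrated with a minimal sketch. It is not the PANORAMA implementation: map_model_paths is a hypothetical stand-in, and it assumes that every source stores its models as XML files whose file stem serves as the model name, whereas the real function applies source-specific naming rules.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Assumption: these identifiers mirror the source names listed in the parameters.
SUPPORTED_SOURCES = {"defense-finder", "CONJScan", "TXSScan", "TFFscan"}


def map_model_paths(models: Path, source: str) -> Dict[str, Path]:
    """Hypothetical stand-in for get_models_path: map model names to XML files."""
    if source not in SUPPORTED_SOURCES:
        raise ValueError(f"Unsupported source: {source!r}")
    # Assumption: the file stem is used as the model name for every source.
    return {path.stem: path for path in sorted(models.rglob("*.xml"))}


# Build a throwaway directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "T4SS_typeF.xml").write_text("<model/>")
    (root / "T4SS_typeG.xml").write_text("<model/>")
    mapping = map_model_paths(root, "CONJScan")
    model_names = list(mapping)
```

Rejecting unknown sources before touching the file system keeps the ValueError independent of the directory contents.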

panorama.utility.translate.macsymodel_translator.parse_macsyfinder_hmm(hmm: HMM, hmm_file: Path, panorama_acc: Set[str]) → Dict[str, str | int | float]

Parse a MacSyFinder HMM and extract information for PANORAMA annotation.

This function processes HMM objects and extracts metadata needed for PANORAMA, including generating accession numbers and handling naming conventions.

Parameters:
  • hmm (HMM) – HMM object from the HMM file

  • hmm_file (Path) – Path to the HMM file being processed

  • panorama_acc (Set[str]) – Set of existing PANORAMA accession IDs to avoid duplicates

Returns:

Dict[str, Union[str, int, float]] – Dictionary with HMM metadata including:
  • name: HMM name

  • accession: unique accession ID

  • length: consensus sequence length

  • protein_name: protein name

  • secondary_name: alternative name

  • score_threshold: score threshold (NaN if not set)

  • eval_threshold: E-value threshold

  • ieval_threshold: independent E-value threshold

  • hmm_cov_threshold: HMM coverage threshold (NaN if not set)

  • target_cov_threshold: target coverage threshold (NaN if not set)

  • description: HMM description

Raises:

IOError – If HMM name cannot be determined
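
The role of panorama_acc (a shared set used to avoid duplicate accession IDs) could look roughly like the sketch below. The helper next_accession, the "PAN" prefix and the zero-padded counter are hypothetical choices made for this illustration; the actual accession scheme is not described here.

```python
from typing import Set


def next_accession(existing: Set[str], prefix: str = "PAN", width: int = 6) -> str:
    """Return the first unused '<prefix><zero-padded counter>' ID and record it."""
    counter = 1
    while f"{prefix}{counter:0{width}d}" in existing:
        counter += 1
    accession = f"{prefix}{counter:0{width}d}"
    existing.add(accession)  # mutate the shared set so later calls skip this ID
    return accession


seen = {"PAN000001", "PAN000003"}
first = next_accession(seen)   # fills the gap left at counter 2
second = next_accession(seen)  # skips 1, 2 and 3, which are now all taken
```

Passing the set by reference means that each HMM parsed in a loop sees the accessions assigned to the HMMs before it.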

panorama.utility.translate.macsymodel_translator.process_attributes(elem: Element, dict_elem: Dict[str, str | Dict[str, int]], transitivity_mut: Callable[[int], int]) → Dict[str, str | Dict[str, int] | List]

Process XML attributes for an element (Family, Functional Unit or Model).

Parameters:
  • elem (lxml.etree.Element) – XML element with attributes to process

  • dict_elem (Dict) – Dictionary to update

  • transitivity_mut (Callable) – Function to transform transitivity values

Returns:

Dict – Updated dictionary for the processed element (family, functional unit or model)
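
To make the transitivity_mut parameter more concrete, here is a small sketch of attribute processing. It uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The helper collect_parameters, the space + 1 transformation and the accepted flag spellings are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def collect_parameters(elem: ET.Element, transitivity_mut: Callable[[int], int]) -> Dict[str, int]:
    """Hypothetical helper: read a few MacSyFinder-style attributes from one element."""
    params: Dict[str, int] = {}
    if "inter_gene_max_space" in elem.attrib:
        # Convert the source spacing value into a transitivity value.
        params["transitivity"] = transitivity_mut(int(elem.attrib["inter_gene_max_space"]))
    for flag in ("multi_system", "multi_model"):
        if flag in elem.attrib:
            # Assumption: flags are written as "1"/"0" or "true"/"false".
            params[flag] = int(elem.attrib[flag].lower() in ("1", "true"))
    return params


gene = ET.fromstring('<gene name="cas9" inter_gene_max_space="5" multi_system="1"/>')
# Hypothetical transformation; the real one depends on the source tool.
params = collect_parameters(gene, transitivity_mut=lambda space: space + 1)
```

Injecting the transformation as a callable lets one attribute reader serve several sources that count gene spacing differently.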

panorama.utility.translate.macsymodel_translator.process_exchangeable(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, exchangeable_set: Set[str]) → Set[str]

Process exchangeable genes from XML relations.

Parameters:
  • elem (lxml.etree.Element) – XML element containing relation

  • data (Dict) – Model data dictionary

  • hmm_df (pd.DataFrame) – HMM information DataFrame

  • exchangeable_set (Set) – Set of exchangeable gene names to update

Returns:

Set – The updated set of exchangeable gene names

panorama.utility.translate.macsymodel_translator.search_canonical_macsyfinder(model_name: str, models: Path) → List[str]

Search for canonical models related to a given MacSyFinder model.

This function identifies canonical (parent/related) models based on naming patterns specific to different bioinformatics tools and their model hierarchies.

Parameters:
  • model_name (str) – Name of the model to find canonical models for

  • models (Path) – Path to the directory containing all model files

Returns:

List[str] – List of canonical model names related to the input model. Empty list if no canonical models are found.

Note

Canonical models are identified based on naming patterns:
  • Type/Subtype relationships (Defense systems)

  • Cluster relationships (CAS, CBASS, etc.)

  • Family relationships with underscore notation

panorama.utility.translate.macsymodel_translator.translate_functional_unit(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, transitivity_mut: Callable[[int], int]) → Dict[str, str | Dict[str, int] | List]

Translate a functional unit from a MacSyFinder (or MacSyFinder-like) model into PANORAMA format.

Functional units represent collections of genes/families that work together as a unit.

Parameters:
  • elem (lxml.etree.Element) – XML element corresponding to the functional unit

  • data (Dict[str, Any]) – Dictionary containing PANORAMA model information

  • hmm_df (pd.DataFrame) – HMM information DataFrame for gene translation

  • transitivity_mut (Callable[[int], int]) – Function to transform transitivity values

Returns:

Dict[str, Any] – Dictionary containing PANORAMA functional unit model information with keys:
  • name: functional unit name

  • presence: presence requirement

  • families: list of gene families in the unit

  • parameters: dict with various thresholds and flags

Raises:

KeyError – If the functional unit name is missing

panorama.utility.translate.macsymodel_translator.translate_gene(elem: Element, data: Dict[str, str | Dict[str, int] | List], hmm_df: DataFrame, transitivity_mut: Callable[[int], int]) → Dict[str, str | Dict[str, int]]

Translate a gene element from a MacSyFinder (or MacSyFinder-like) model into a PANORAMA family model.

This function processes an XML gene element and converts it to a PANORAMA family, handling attributes such as presence, transitivity, and exchangeable genes.

Parameters:
  • elem (lxml.etree.Element) – XML element corresponding to the gene to be translated

  • data (Dict[str, Any]) – Dictionary containing PANORAMA model information

  • hmm_df (pd.DataFrame) – HMM information DataFrame indexed by gene name for translation

  • transitivity_mut (Callable[[int], int]) – Function to mutate transitivity values

Returns:

Dict[str, Any] – Dictionary containing PANORAMA family model information with keys:
  • name: protein name

  • presence: gene presence requirement (‘neutral’, ‘mandatory’, etc.)

  • parameters: dict with transitivity, multi_system, multi_model flags

  • exchangeable: list of exchangeable gene names (if any)

Raises:
  • KeyError – If the gene name is missing or not found in HMM dataframe

  • ModelTranslationError – For unexpected translation errors
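
As a rough illustration of the returned structure, the sketch below converts one gene element into a family dictionary. It is a simplification under stated assumptions: gene_to_family is hypothetical, it uses the standard-library xml.etree.ElementTree rather than lxml, it skips the lookup in hmm_df, and the 'neutral' default for a missing presence attribute is a guess.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def gene_to_family(elem: ET.Element) -> Dict[str, Any]:
    """Hypothetical reduction of translate_gene: one <gene> element to one family dict."""
    if "name" not in elem.attrib:
        raise KeyError("gene element has no 'name' attribute")
    family: Dict[str, Any] = {
        "name": elem.attrib["name"],
        "presence": elem.get("presence", "neutral"),  # assumed default
        "parameters": {},
    }
    # Genes nested under <exchangeables> can stand in for the parent gene.
    exchangeable = [g.attrib["name"] for g in elem.findall("./exchangeables/gene")]
    if exchangeable:
        family["exchangeable"] = exchangeable
    return family


xml_text = """
<gene name="cas9" presence="mandatory">
  <exchangeables><gene name="cas9_variant"/></exchangeables>
</gene>
"""
family = gene_to_family(ET.fromstring(xml_text))
```

The exchangeable key is added only when alternatives exist, matching the "(if any)" wording in the return description.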

panorama.utility.translate.macsymodel_translator.translate_macsyfinder(macsy_db: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, source: str = '', force: bool = False, disable_bar: bool = False) → List[Dict[str, str | Dict[str, int] | List]]

Translate MacSyFinder models into PANORAMA format with comprehensive error handling.

This is the main translation function that orchestrates the entire process of converting MacSyFinder-compatible models into PANORAMA format, including HMM processing, model translation, and file generation.

Parameters:
  • macsy_db (Path) – Path to the MacSyFinder models database directory

  • output (Path) – Path to output directory for PANORAMA files

  • binary_hmm (bool, optional) – Whether to write HMMs in binary format. Defaults to False.

  • hmm_coverage (float, optional) – Global HMM coverage threshold. Defaults to None, in which case source-specific values apply.

  • target_coverage (float, optional) – Global target coverage threshold. Defaults to None.

  • source (str, optional) – Name of the source tool. Defaults to “”.

  • force (bool, optional) – Whether to overwrite existing files. Defaults to False.

  • disable_bar (bool, optional) – Whether to disable progress bars. Defaults to False.

Returns:

List[Dict[str, Any]] – List of dictionaries containing translated PANORAMA models

Raises:
  • ValueError – If an unsupported source is provided

  • ModelTranslationError – If model translation fails

  • IOError – If file operations fail

panorama.utility.translate.macsymodel_translator.translate_macsyfinder_model(root: Element, model_name: str, hmm_df: DataFrame, canonical: List[str], transitivity_mut: Callable[[int], int]) → Dict[str, str | Dict[str, int] | List]

Translate a complete MacSyFinder (or MacSyFinder-like) model into PANORAMA format.

This function processes the root XML element of a MacSyFinder model and converts it to PANORAMA format, handling both gene-only models and models with functional units.

Parameters:
  • root (et.Element) – Root XML element of the model

  • model_name (str) – Name of the model being translated

  • hmm_df (pd.DataFrame) – HMM information DataFrame for gene translation

  • canonical (List[str]) – List of canonical model names related to this model

  • transitivity_mut (Callable[[int], int]) – Function to transform transitivity values

Returns:

Dict[str, Any] – Complete PANORAMA model information with keys:
  • name: model name

  • func_units: list of functional units

  • parameters: model-level parameters

  • canonical: list of canonical models (if any)

Raises:

ModelTranslationError – For errors during model translation
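
The documented layout of the returned dictionary can be sketched as follows. The helper assemble_model is hypothetical, and wrapping a gene-only model in a single implicit functional unit is an assumption about how such models end up with a func_units list; the source does not spell this out.

```python
from typing import Any, Dict, List


def assemble_model(name: str, families: List[Dict[str, Any]], canonical: List[str]) -> Dict[str, Any]:
    """Hypothetical assembly of a gene-only model into the documented layout."""
    unit = {"name": name, "presence": "mandatory", "families": families, "parameters": {}}
    model: Dict[str, Any] = {
        "name": name,
        # Assumption: a gene-only model is wrapped in one implicit functional unit.
        "func_units": [unit],
        "parameters": {},
    }
    if canonical:  # the key is documented as present only when canonical models exist
        model["canonical"] = canonical
    return model


plain = assemble_model("ExampleSystem", [{"name": "geneA", "presence": "mandatory"}], canonical=[])
linked = assemble_model("ExampleSystem_other", [], canonical=["ExampleSystem"])
```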

panorama.utility.translate.padloc_translator module

PADLOC Model Translation Module for PANORAMA

This module provides functionality to translate models from PADLOC into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion.

panorama.utility.translate.padloc_translator._add_families_to_functional_unit(families_list: List[str], secondary_names: List[str], family_type: str, metadata_df: DataFrame, seen_families: Set[str]) → List[Dict[str, str | List[str]]]

Add families to a functional unit with proper metadata integration.

This helper function processes a list of family names and creates properly formatted family dictionaries for PANORAMA models. It handles special cases like cas_adaptation and manages exchangeable proteins.

Parameters:
  • families_list (List[str]) – List of family names to process

  • secondary_names (List[str]) – List of known secondary names for exchangeable lookup

  • family_type (str) – Type of family (‘mandatory’, ‘accessory’, ‘forbidden’, ‘neutral’)

  • metadata_df (pd.DataFrame) – DataFrame containing family metadata

  • seen_families (Set[str]) – Set to track already processed families (modified in place)

Returns:
List[Dict] – List of family dictionaries, each with the structure:
  • name – str (family name)

  • presence – str (family type)

  • exchangeable – List[str] (optional, list of exchangeable proteins)

panorama.utility.translate.padloc_translator.parse_meta_padloc(meta_path: Path) → DataFrame

Parse the PADLOC metadata file and return a structured DataFrame.

The function processes the PADLOC hmm_meta.txt file which contains HMM metadata including accession numbers, names, thresholds and descriptions. It handles protein name parsing and secondary name merging.

Parameters:

meta_path (Path) – Path to the PADLOC metadata file (hmm_meta.txt)

Returns:

pd.DataFrame – DataFrame with parsed metadata indexed by accession number, containing columns: name, protein_name, secondary_name, score_threshold, eval_threshold, ieval_threshold, hmm_cov_threshold, target_cov_threshold, description

Raises:
  • IOError – If the metadata file cannot be read

  • ValueError – If the metadata format is unexpected
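
The "index by accession, fail on an unexpected format" behaviour might be sketched as below, using only the standard library instead of pandas. The helper read_metadata and the column names in REQUIRED are placeholders chosen for this illustration; the real hmm_meta.txt header may differ, and the protein name parsing and secondary name merging steps are left out.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical column names; the real hmm_meta.txt header may differ.
REQUIRED = ("accession", "name", "protein_name")


def read_metadata(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Index tab-separated metadata rows by accession, validating the header first."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Unexpected metadata format, missing columns: {missing}")
    return {row["accession"]: row for row in reader}


sample = "accession\tname\tprotein_name\nACC0001\tHmmA_1\tHmmA\n"
metadata = read_metadata(io.StringIO(sample))
```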

panorama.utility.translate.padloc_translator.search_canonical_padloc(model_name: str, models_dir: Path) → List[str]

Search for canonical models related to a PADLOC model.

PADLOC uses a naming convention where models ending with ‘_other’ are variants of base models. This function identifies the canonical (base) models for such variants.

Parameters:
  • model_name – Name of the current model being processed

  • models_dir – Directory containing all PADLOC model files

Returns:

List[str] – Names of canonical models related to the input model
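
One possible reading of the ‘_other’ convention is sketched below. The rule used here (every other model sharing the prefix before ‘_other’ counts as canonical) is an assumption, as are the helper name find_other_canonical and the .yaml extension filter; the actual matching logic may be stricter.

```python
import tempfile
from pathlib import Path
from typing import List


def find_other_canonical(model_name: str, models_dir: Path) -> List[str]:
    """Hypothetical rule: '<base>_other' points to every other model starting with '<base>'."""
    suffix = "_other"
    if not model_name.endswith(suffix):
        return []
    base = model_name[: -len(suffix)]
    return sorted(
        path.stem
        for path in models_dir.glob("*.yaml")
        if path.stem.startswith(base) and path.stem != model_name
    )


# Throwaway directory with made-up model names so the sketch is runnable.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)
    for stem in ("sysA_type_I", "sysA_type_II", "sysA_other", "sysB"):
        (models_dir / f"{stem}.yaml").touch()
    canonical = find_other_canonical("sysA_other", models_dir)
    unrelated = find_other_canonical("sysB", models_dir)
```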

panorama.utility.translate.padloc_translator.translate_model_padloc(data_yaml: Dict[str, List[str] | int | bool], model_name: str, metadata_df: DataFrame, canonical_models: List[str] = None) → Dict[str, str | List[Dict] | Dict[str, int] | List[str]]

Translate a PADLOC model from YAML format to PANORAMA JSON format.

This function converts PADLOC defense system models into the standardized PANORAMA format, handling gene categories, parameters and canonical relationships.

Parameters:
  • data_yaml – PADLOC model data loaded from YAML file

  • model_name – Name identifier for the model

  • metadata_df – DataFrame containing HMM metadata for gene information

  • canonical_models – List of canonical model names (optional)

Returns:
Dict – Translated model in PANORAMA format with structure:
  • name – str (model name)

  • func_units – List[Dict] (functional units with families)

  • parameters – Dict (global model parameters)

  • canonical – List[str] (optional, canonical model references)

Raises:
  • KeyError – If required keys are missing from the PADLOC model

  • AssertionError – If input parameters are invalid

  • ModelTranslationError – For translation-specific errors
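
A stripped-down version of the YAML-to-PANORAMA conversion could look like this. Several things are assumptions: the PADLOC key names (core_genes, optional_genes, prohibited_genes, maximum_separation, minimum_core, minimum_total) reflect the PADLOC model layout as I understand it, the category-to-presence mapping is a guess based on the family types listed for _add_families_to_functional_unit, and the output parameter names min_mandatory and min_total are hypothetical. Metadata lookup and exchangeable handling are omitted.

```python
from typing import Any, Dict, List

# Assumed mapping from PADLOC gene categories to PANORAMA presence values.
CATEGORY_TO_PRESENCE = {
    "core_genes": "mandatory",
    "optional_genes": "accessory",
    "prohibited_genes": "forbidden",
}


def padloc_to_panorama(data_yaml: Dict[str, Any], model_name: str) -> Dict[str, Any]:
    """Hypothetical reduction of translate_model_padloc without metadata lookup."""
    families: List[Dict[str, str]] = []
    for category, presence in CATEGORY_TO_PRESENCE.items():
        for gene in data_yaml.get(category) or []:
            families.append({"name": gene, "presence": presence})
    # Direct indexing raises KeyError when a required key is missing.
    parameters = {
        "transitivity": data_yaml["maximum_separation"],
        "min_mandatory": data_yaml["minimum_core"],  # hypothetical output name
        "min_total": data_yaml["minimum_total"],     # hypothetical output name
    }
    unit = {"name": model_name, "presence": "mandatory", "families": families}
    return {"name": model_name, "func_units": [unit], "parameters": parameters}


data = {
    "maximum_separation": 3,
    "minimum_core": 2,
    "minimum_total": 3,
    "core_genes": ["geneA", "geneB"],
    "optional_genes": ["geneC"],
    "prohibited_genes": [],
}
translated = padloc_to_panorama(data, "ExampleSystem")
```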

panorama.utility.translate.padloc_translator.translate_padloc(padloc_db: Path, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False) → List[Dict]

Translate all PADLOC models to PANORAMA format.

This function orchestrates the complete translation process for the PADLOC database:
  1. Parses HMM metadata

  2. Creates HMM list file for annotation

  3. Translates all model files from YAML to JSON format

  4. Handles canonical model relationships

Parameters:
  • padloc_db – Path to the PADLOC database directory

  • output – Path to output directory for translated files

  • binary_hmm – Whether to output HMMs in binary format

  • hmm_coverage – Global HMM coverage threshold (optional)

  • target_coverage – Global target coverage threshold (optional)

  • force – Whether to overwrite existing output files

  • disable_bar – Whether to disable progress bars

Returns:

List[Dict] – List of translated PANORAMA models

Raises:
  • FileNotFoundError – If required PADLOC database files are missing

  • ModelTranslationError – If translation fails for any model

panorama.utility.translate.translate module

Model Translation Module for PANORAMA

This module provides comprehensive functionality to translate models from different bioinformatics databases (PADLOC, DefenseFinder, MacSyFinder-based tools) into PANORAMA-compatible formats. It handles HMM processing, metadata parsing and model structure conversion while maintaining compatibility across different annotation frameworks.

Supported Sources:
  • PADLOC: Prokaryotic Antiviral Defense Location predictor

  • DefenseFinder: Antiviral defense systems identification

  • CasFinder: CRISPR-Cas Antiviral defense systems identification

  • CONJScan: Conjugation system detection

  • TXSScan: Type secretion system detection

  • TFFScan: Type IV-A pilus detection

exception panorama.utility.translate.translate.HMMProcessingError

Bases: Exception

Custom exception for HMM processing errors.

exception panorama.utility.translate.translate.ModelTranslationError

Bases: Exception

Custom exception for model translation errors.

panorama.utility.translate.translate.launch_translate(database_path: Path, source: str, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False)

Launch the complete model translation process for any supported source.

This is the main entry point for translating models from different bioinformatics databases to PANORAMA format. It handles the entire workflow from parsing to writing output files.

Parameters:
  • database_path – Path to the source database directory

  • source – Source identifier (padloc, defense-finder, CONJScan, TXSScan, TFFscan)

  • output – Path to output directory for translated files

  • binary_hmm – Whether to output HMMs in binary format (default: False)

  • hmm_coverage – Global HMM coverage threshold (optional, defaults vary by source)

  • target_coverage – Global target coverage threshold (optional)

  • force – Whether to overwrite existing output files (default: False)

  • disable_bar – Whether to disable progress bars (default: False)

Raises:
  • ValueError – If the source is not recognized

  • FileNotFoundError – If the database path doesn’t exist

  • ModelTranslationError – If the translation process fails
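
The dispatch behaviour implied by the parameters and exceptions above can be sketched with stub translators. Everything here is illustrative: dispatch_translate and the two stub functions are hypothetical, the order of the two checks is an assumption, and the real entry point additionally writes HMM lists and JSON model files.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Translator = Callable[[Path, Path], List[dict]]


def _translate_padloc_stub(database: Path, output: Path) -> List[dict]:
    """Placeholder for the PADLOC translator."""
    return [{"name": "padloc-model"}]


def _translate_macsy_stub(database: Path, output: Path) -> List[dict]:
    """Placeholder for the MacSyFinder-based translator."""
    return [{"name": "macsyfinder-model"}]


# Source identifiers as listed in the parameter description.
TRANSLATORS: Dict[str, Translator] = {
    "padloc": _translate_padloc_stub,
    "defense-finder": _translate_macsy_stub,
    "CONJScan": _translate_macsy_stub,
    "TXSScan": _translate_macsy_stub,
    "TFFscan": _translate_macsy_stub,
}


def dispatch_translate(database_path: Path, source: str, output: Path) -> List[dict]:
    """Hypothetical dispatcher: validate the inputs, then call the matching translator."""
    if source not in TRANSLATORS:
        raise ValueError(f"Unrecognized source: {source!r}")
    if not database_path.exists():
        raise FileNotFoundError(f"Database path not found: {database_path}")
    output.mkdir(parents=True, exist_ok=True)
    return TRANSLATORS[source](database_path, output)


with tempfile.TemporaryDirectory() as tmp:
    database = Path(tmp) / "db"
    database.mkdir()
    models = dispatch_translate(database, "padloc", Path(tmp) / "out")
```

A lookup table keeps the list of supported sources in one place, so the ValueError check and the dispatch cannot drift apart.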

panorama.utility.translate.translate.read_xml(model_path: Path) → Element

Read and parse an XML file with security considerations.

Parameters:

model_path (Path) – Path to the XML file to be read

Returns:

et.Element – The root element of the parsed XML document

Raises:
  • IOError – If there is a problem opening the file

  • et.XMLSyntaxError – If there is a problem parsing the XML content

  • ModelTranslationError – For any unexpected errors during file processing

panorama.utility.translate.translate.read_yaml(model_path: Path) → Dict[str, List[str] | int | bool]

Read and parse a YAML file safely.

Parameters:

model_path (Path) – Path to the YAML file to be read

Returns:

Dict – The contents of the YAML file as a Python dictionary

Raises:
  • IOError – If there is a problem opening the file

  • yaml.YAMLError – If there is a problem parsing the YAML content

  • ModelTranslationError – For any unexpected errors during file processing

panorama.utility.translate.translate.write_model(output_path: Path, model_data: Dict[str, str | List[Dict] | Dict[str, int] | List[str]]) → Path

Write a translated model to a JSON file with proper formatting.

Parameters:
  • output_path (Path) – Path to the output directory

  • model_data (Dict) – Dictionary containing the model data to write

Returns:

Path – Path to the written JSON file

Raises:
  • IOError – If there is a problem writing the file

  • KeyError – If the model_data doesn’t contain the required ‘name’ field
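
A minimal sketch of this write step, assuming the file is named after the model's ‘name’ field with a .json extension (the naming rule and the helper write_model_json are assumptions, as is the two-space indentation):

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_model_json(output_path: Path, model_data: Dict[str, Any]) -> Path:
    """Hypothetical equivalent of write_model: dump one model to '<name>.json'."""
    name = model_data["name"]  # raises KeyError when the required field is absent
    target = output_path / f"{name}.json"
    with target.open("w", encoding="utf-8") as handle:
        json.dump(model_data, handle, indent=2)
    return target


with tempfile.TemporaryDirectory() as tmp:
    written = write_model_json(Path(tmp), {"name": "ExampleSystem", "func_units": []})
    file_name = written.name
    reloaded = json.loads(written.read_text(encoding="utf-8"))
```

Reading the ‘name’ field before opening any file means that a malformed model never leaves a partial file behind.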

Module contents

exception panorama.utility.translate.ModelTranslationError

Bases: Exception

Custom exception for model translation errors.

panorama.utility.translate.launch_translate(database_path: Path, source: str, output: Path, binary_hmm: bool = False, hmm_coverage: float = None, target_coverage: float = None, force: bool = False, disable_bar: bool = False)

Launch the complete model translation process for any supported source.

This is the main entry point for translating models from different bioinformatics databases to PANORAMA format. It handles the entire workflow from parsing to writing output files.

Parameters:
  • database_path – Path to the source database directory

  • source – Source identifier (padloc, defense-finder, CONJScan, TXSScan, TFFscan)

  • output – Path to output directory for translated files

  • binary_hmm – Whether to output HMMs in binary format (default: False)

  • hmm_coverage – Global HMM coverage threshold (optional, defaults vary by source)

  • target_coverage – Global target coverage threshold (optional)

  • force – Whether to overwrite existing output files (default: False)

  • disable_bar – Whether to disable progress bars (default: False)

Raises:
  • ValueError – If the source is not recognized

  • FileNotFoundError – If the database path doesn’t exist

  • ModelTranslationError – If the translation process fails

panorama.utility.translate.read_xml(model_path: Path) → Element

Read and parse an XML file with security considerations.

Parameters:

model_path (Path) – Path to the XML file to be read

Returns:

et.Element – The root element of the parsed XML document

Raises:
  • IOError – If there is a problem opening the file

  • et.XMLSyntaxError – If there is a problem parsing the XML content

  • ModelTranslationError – For any unexpected errors during file processing

panorama.utility.translate.read_yaml(model_path: Path) → Dict[str, List[str] | int | bool]

Read and parse a YAML file safely.

Parameters:

model_path (Path) – Path to the YAML file to be read

Returns:

Dict – The contents of the YAML file as a Python dictionary

Raises:
  • IOError – If there is a problem opening the file

  • yaml.YAMLError – If there is a problem parsing the YAML content

  • ModelTranslationError – For any unexpected errors during file processing