panorama.systems package#

Submodules#

panorama.systems.detection module#

This module provides functions to detect biological systems in pangenomes.

panorama.systems.detection.check_detection_args(args)#

Checks and processes the provided arguments to ensure they are valid.

Parameters:: args (Namespace) – The parsed command-line arguments.
Returns:: Dict[str, Union[bool, str, List[str]]] – A dictionary indicating necessary information to load the pangenome.
Raises:: argparse.ArgumentTypeError – If ‘jaccard’ is not a restricted float.
Return type:: Dict[str, Union[bool, str, List[str]]]
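The `argparse.ArgumentTypeError` mentioned above is the exception that argparse type converters conventionally raise. The following is a minimal sketch of such a converter, not PANORAMA's own code. It assumes that "restricted float" means a value in the closed interval [0, 1]; the helper name `restricted_float` and the `--jaccard` option are hypothetical.

```python
import argparse


def restricted_float(value: str) -> float:
    """Hypothetical converter: accept only floats in the closed interval [0, 1]."""
    try:
        number = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not a float") from None
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"{number} is not in the range [0, 1]")
    return number


# Illustrative parser only; the option name is an assumption, not PANORAMA's CLI.
parser = argparse.ArgumentParser()
parser.add_argument("--jaccard", type=restricted_float, default=0.8)
args = parser.parse_args(["--jaccard", "0.5"])
```

When the converter is used as `type=`, argparse turns the `ArgumentTypeError` into a usage error message for the user.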

panorama.systems.detection.check_for_forbidden_unit(su_found, model)#

Checks if there are forbidden system units.

Parameters:

su_found (Dict[str, Set[SystemUnit]]) – Dictionary with all system units found sorted by name.
model (Model) – Model corresponding to the system checked.

Returns:

bool – True if forbidden conditions are encountered, False otherwise.

Return type:

bool

panorama.systems.detection.check_for_needed_units(su_found, model)#

Checks if the presence/absence rules for necessary functional units are respected.

Parameters:

su_found (Dict[str, Set[SystemUnit]]) – Dictionary with all system units found sorted by name.
model (Model) – Model corresponding to the system checked.

Returns:

bool – True if the presence/absence rules for the necessary units are respected, False otherwise.

Return type:

bool

panorama.systems.detection.check_pangenome_detection(pangenome, metadata_sources, systems_source, force=False)#

Checks and loads pangenome information before adding families.

Parameters:

pangenome (Pangenome) – Pangenome object to be checked.
metadata_sources (List[str]) – Sources used to associate families with gene families.
systems_source (str) – Source used to detect systems.
force (bool) – If True, forces the erasure of pangenome systems from the source. Defaults to False.

Raises:

KeyError – If the provided annotation source is not in the pangenome.
ValueError – If systems are already detected based on the source and ‘force’ is not used.
AttributeError – If there is no metadata associated with families.

panorama.systems.detection.get_functional_unit_gene_families(func_unit, gene_families, gene_fam2mod_fam)#

Retrieves the gene families that might be in the functional unit.

Parameters:

func_unit (FuncUnit) – The functional unit to consider.
gene_families (Set[GeneFamily]) – Set of gene families that might be in the system.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.

Returns:

Tuple[Set[GeneFamily], Set[GeneFamily]] –

gene families that code for the functional unit
neutral gene families of the functional unit.

Return type:

Tuple[Set[GeneFamily], Set[GeneFamily]]

panorama.systems.detection.get_subcombinations(combi, combinations)#

Removes a combination and all its sub-combinations from a list of combinations.

Parameters:

combi (Set[GeneFamily]) – Combination to be removed.
combinations (List[FrozenSet[GeneFamily]]) – List of combinations to be filtered.

Returns:

List[FrozenSet[GeneFamily]] – List of removed combinations.

Return type:

List[FrozenSet[GeneFamily]]
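The behaviour described above can be illustrated with plain sets. This is a sketch under stated assumptions, not the library's implementation: strings stand in for `GeneFamily` objects, the name `remove_with_subcombinations` is hypothetical, and the list is assumed to be filtered in place.

```python
from typing import FrozenSet, List, Set


def remove_with_subcombinations(
    combi: Set[str], combinations: List[FrozenSet[str]]
) -> List[FrozenSet[str]]:
    """Remove `combi` and every subset of it from `combinations`.

    The list is modified in place (an assumption of this sketch) and the
    removed entries are returned.
    """
    removed = [c for c in combinations if c.issubset(combi)]
    # Slice assignment keeps the same list object, so the caller sees the result.
    combinations[:] = [c for c in combinations if not c.issubset(combi)]
    return removed


combos = [frozenset("AB"), frozenset("A"), frozenset("BC"), frozenset("ABC")]
removed = remove_with_subcombinations({"A", "B"}, combos)
```

Here `removed` holds `{A, B}` and `{A}`, while `combos` keeps `{B, C}` and `{A, B, C}`, because neither is a subset of `{A, B}`.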

panorama.systems.detection.get_system_unit_combinations(su_found, model)#

Generates combinations of system units that could code for a system.

The function generates combinations from mandatory, optional, and neutral categories, ensuring that model parameters are respected. Combinations with and without elements from neutral categories are generated.

Parameters:

su_found (Dict[str, Set[SystemUnit]]) – A dictionary where keys are functional unit names and values are sets of elements belonging to each functional unit model.
model (Model) – Model corresponding to the system checked.

Returns:

List[List[SystemUnit]] – A list of all possible valid combinations based on the model.

Return type:

List[List[SystemUnit]]

panorama.systems.detection.launch(args)#

Launches functions to detect systems in pangenomes.

Parameters:: args (argparse.Namespace) – Arguments given on the command line.

panorama.systems.detection.parser_detection(parser)#

Adds arguments to the parser for the ‘systems’ command.

Parameters:: parser (argparse.ArgumentParser) – Parser for the ‘systems’ command.

panorama.systems.detection.search_for_system(model, su_found, source, jaccard_threshold=0.8)#

Searches for a system corresponding to the model based on the units found.

Parameters:

model (Model) – Model corresponding to the system searched.
su_found (Dict[str, Set[SystemUnit]]) – The system units found for the model.
source (str) – Name of the annotation source.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.

Returns:

set of System – Systems detected.

Return type:

Set[System]

panorama.systems.detection.search_system(model, gf2fam, fam2source, source, jaccard_threshold=0.8, sensitivity=1)#

Searches for a model system in a pangenome.

Parameters:

model (Model) – Model to search in the pangenome.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
source (str) – Name of the annotation source.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
sensitivity (int) – Sensitivity level for detection: 1 applies a global Jaccard filtering on the context without looking at all the combinations; 2 applies a global Jaccard filtering on the specific context of each combination; 3 applies a local Jaccard filtering on the specific context of each combination. Defaults to 1.

Returns:

set of System – Set of systems detected in the pangenome for the given model.

Return type:

Set[System]
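The `jaccard_threshold` parameter refers to the Jaccard index, |A ∩ B| / |A ∪ B|. The sketch below shows how such a threshold can filter edges. Which sets PANORAMA actually compares is not stated here, so the sketch assumes each gene family is described by the set of genomes carrying it; `jaccard`, `filter_edges`, and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_edges(
    edges: List[Tuple[str, str]],
    genomes: Dict[str, Set[str]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Keep only the edges whose two families reach the Jaccard threshold."""
    return [(u, v) for u, v in edges if jaccard(genomes[u], genomes[v]) >= threshold]


# Assumed toy data: family name -> genomes in which the family is present.
genomes = {
    "famA": {"g1", "g2", "g3", "g4", "g5"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g1", "g6"},
}
kept = filter_edges([("famA", "famB"), ("famA", "famC")], genomes)
```

With the default threshold, the edge between `famA` and `famB` (index 4/5) is kept and the edge between `famA` and `famC` (index 1/6) is dropped.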

panorama.systems.detection.search_system_units(model, gf2fam, fam2source, source, jaccard_threshold=0.8, sensitivity=1)#

Searches for system units corresponding to a model.

Parameters:

model (Model) – Model corresponding to the system searched.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
source (str) – Name of the annotation source.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
sensitivity (int) – Sensitivity level for detection: 1 applies a global Jaccard filtering on the context without looking at all the combinations; 2 applies a global Jaccard filtering on the specific context of each combination; 3 applies a local Jaccard filtering on the specific context of each combination. Defaults to 1.

Returns:

Dict[str, Set[SystemUnit]] – System units found with their name as the key and units as value.

Raises:

ValueError – If sensitivity is not 1, 2, or 3.

Return type:

Dict[str, Set[SystemUnit]]
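The three sensitivity levels differ along two axes: whether each combination's specific context is examined, and whether the Jaccard filtering is global or local. One way to picture this, as a sketch only, is a lookup from the level to two flags. `SearchMode` and `mode_from_sensitivity` are hypothetical names, and the dispatch inside PANORAMA may be organised differently.

```python
from typing import NamedTuple


class SearchMode(NamedTuple):
    per_combination: bool  # filter on the specific context of each combination
    local: bool            # local rather than global Jaccard filtering


def mode_from_sensitivity(sensitivity: int) -> SearchMode:
    """Translate a sensitivity level (1, 2, or 3) into the two flags above."""
    modes = {
        1: SearchMode(per_combination=False, local=False),
        2: SearchMode(per_combination=True, local=False),
        3: SearchMode(per_combination=True, local=True),
    }
    try:
        return modes[sensitivity]
    except KeyError:
        raise ValueError(
            f"sensitivity must be 1, 2, or 3, got {sensitivity!r}"
        ) from None
```

Any other value raises `ValueError`, mirroring the documented behaviour.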

panorama.systems.detection.search_systems(models, pangenome, source, metadata_sources, jaccard_threshold=0.8, sensitivity=1, disable_bar=False)#

Searches for systems present in the pangenome for all models.

Parameters:

models (Models) – Models to search in pangenomes.
pangenome (Pangenome) – Pangenome object containing gene families.
source (str) – Name of the source for the system.
metadata_sources (List[str]) – List of the metadata sources for the families.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
disable_bar (bool) – If True, disables the progress bar. Defaults to False.
sensitivity (int) – Sensitivity level for detection: 1 applies a global Jaccard filtering on the context without looking at all the combinations; 2 applies a global Jaccard filtering on the specific context of each combination; 3 applies a local Jaccard filtering on the specific context of each combination. Defaults to 1.

panorama.systems.detection.search_systems_in_pangenomes(models, pangenomes, source, metadata_sources, jaccard_threshold=0.8, sensitivity=1, threads=1, lock=None, disable_bar=False)#

Searches for systems in pangenomes by multithreading on pangenomes.

Parameters:

models (Models) – Models to search in pangenomes.
pangenomes (Pangenomes) – Pangenomes object containing the pangenomes to search.
source (str) – Name of the source for the system.
metadata_sources (List[str]) – List of the metadata sources for the families.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
threads (int) – Number of available threads. Defaults to 1.
lock (Lock) – Global lock for multiprocessing execution. Defaults to None.
disable_bar (bool) – If True, disables the progress bar. Defaults to False.
sensitivity (int) – Sensitivity level for detection: 1 applies a global Jaccard filtering on the context without looking at all the combinations; 2 applies a global Jaccard filtering on the specific context of each combination; 3 applies a local Jaccard filtering on the specific context of each combination. Defaults to 1.

panorama.systems.detection.search_unit_in_cc(graph, system_gf, func_unit, source, gf2fam, fam2source, detected_su)#

Searches for functional units in connected components of a graph.

Parameters:

graph (Graph) – The graph to search within.
system_gf (Set[GeneFamily]) – Set of gene families in the system.
func_unit (FuncUnit) – Functional unit to search for.
source (str) – Source of the data.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
detected_su (Set[SystemUnit]) – Set of already detected system units.

Returns:

Set[FrozenSet[GeneFamily]] – Set of found combinations of gene families.

Return type:

Set[FrozenSet[GeneFamily]]

panorama.systems.detection.search_unit_in_combination(graph, families, gene_fam2mod_fam, mod_fam2meta_source, func_unit, source, matrix, combinations, combinations2orgs, jaccard_threshold=0.8, local=False)#

Searches for a functional unit corresponding to a model in a graph.

Parameters:

graph (Graph) – A graph with families in one connected component.
families (Set[GeneFamily]) – A set of families that code for the searched model.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
mod_fam2meta_source (Dict[str, str]) – Dictionary linking model families to metadata sources.
func_unit (FuncUnit) – One functional unit corresponding to the model.
source (str) – Name of the source.
matrix (DataFrame) – Dataframe containing association between gene families and unit families.
combinations (List[FrozenSet[GeneFamily]]) – Existing combination of gene families in organisms.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
combinations2orgs (Dict[FrozenSet[GeneFamily], Set[Organism]]) – The combination of gene families corresponding to the context that exists in at least one genome.
local (bool) – Whether to filter the context with a local Jaccard index. Defaults to False.

Returns:

Set[SystemUnit] – Set of all detected functional units in the graph.

Return type:

Set[SystemUnit]

panorama.systems.detection.search_unit_in_context(graph, families, gene_fam2mod_fam, mod_fam2meta_source, matrix, func_unit, source, combinations2orgs=None, jaccard_threshold=0.8, local=False)#

Searches for a system unit corresponding to a model in a pangenomic context.

Parameters:

graph (Graph) – A graph with families in a pangenomic context.
families (Set[GeneFamily]) – A set of families that code for the searched model.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.
mod_fam2meta_source (Dict[str, str]) – Dictionary linking model families to metadata sources.
func_unit (FuncUnit) – One functional unit corresponding to the model.
source (str) – Name of the source.
matrix (DataFrame) – Dataframe containing association between gene families and unit families.
jaccard_threshold (float) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
combinations2orgs (Dict[FrozenSet[GeneFamily], Set[Organism]]) – The combination of gene families corresponding to the context that exists in at least one genome. Defaults to None.
local (bool) – Whether to filter the context with a local Jaccard index. Defaults to False.

Returns:

Set[SystemUnit] – Set of detected system units in the pangenomic context.

Return type:

Set[SystemUnit]

panorama.systems.detection.subparser(sub_parser)#

Creates a subparser to launch PANORAMA from the command line.

Parameters:: sub_parser – Sub-parser for the ‘systems’ command.
Returns:: argparse.ArgumentParser – Parser with arguments for the ‘systems’ command.
Return type:: ArgumentParser

panorama.systems.detection.write_systems_to_pangenome(pangenome, source, disable_bar=False)#

Writes detected systems to the pangenome.

Parameters:

pangenome (Pangenome) – Pangenome object containing detected systems.
source (str) – Name of the annotation source.
disable_bar (bool) – If True, disables the progress bar. Defaults to False.

panorama.systems.detection.write_systems_to_pangenomes(pangenomes, source, threads=1, lock=None, disable_bar=False)#

Writes detected systems into pangenomes.

Parameters:

pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes with systems.
source (str) – Metadata source.
threads (int) – Number of available threads. Defaults to 1.
lock (Lock) – Lock for multiprocessing execution. Defaults to None.
disable_bar (bool) – If True, disables the progress bar. Defaults to False.

panorama.systems.models module#

This module provides tools to define and validate rules used to detect biological systems.

class panorama.systems.models._BasicFeatures(name='', transitivity=0, window=1)#

Bases: object

Handles basic features typically required for configurations.

This class is used as a foundation for managing basic configurations, ensuring parameter consistency, and providing utility methods for its derived classes. It includes mechanisms for naming, transitivity, and contextual window definitions. The class also provides methods to override and validate parameters dynamically by integrating it with external sources or inheriting configurations from parent instances.

name#

Name of the element.

Type:: str

transitivity#

Size of the transitive closure used to build the graph.

Type:: int

window#

Number of neighboring genes considered on each side of a gene of interest when searching for conserved genomic contexts.

Type:: int

__init__(name='', transitivity=0, window=1)#

Initializes the class with specific attributes for name, transitivity, and window. These attributes are used to define the characteristics and behavior of an object of this class.

Parameters:

name (str) – The name associated with the object. Defaults to an empty string.
transitivity (int) – Size of the transitive closure used to build the graph. Defaults to 0.
window (int) – Number of neighboring genes considered on each side of a gene of interest. Defaults to 1.

__repr__()#

Provides a string representation of the object. This method is intended to provide a clear and concise human-readable representation of the object in the form of its class name and name attribute.

Returns:: str – A string containing the class name and the object’s name attribute.

__str__()#

Converts the object representation to its string form.

This method is used to return a string representation of the class instance, primarily for debugging or logging purposes. It includes the name of the class and the value of the name attribute.

Returns:: str – A formatted string containing the class name and the name attribute.

read_parameters(parameters, param_keys)#

Reads and assigns parameters from a provided dictionary or falls back to parent attributes if the parameter is not found.

Parameters:

parameters (Dict[str, Union[str, int, bool]]) – A dictionary containing parameter key-value pairs.
param_keys (Set[str]) – A set of keys to be retrieved from the parameters dictionary.

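The fallback rule described for `read_parameters` can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual body: it is written as a free function, `read_parameters_sketch` is a hypothetical name, and `SimpleNamespace` objects stand in for the element and its parent.

```python
from types import SimpleNamespace
from typing import Dict, Set, Union


def read_parameters_sketch(
    target: SimpleNamespace,
    parent: SimpleNamespace,
    parameters: Dict[str, Union[str, int, bool]],
    param_keys: Set[str],
) -> None:
    """Set each key on `target` from `parameters`, else copy it from `parent`."""
    for key in param_keys:
        if key in parameters:
            setattr(target, key, parameters[key])
        else:
            # Parameter not given: inherit the value held by the parent.
            setattr(target, key, getattr(parent, key))


parent = SimpleNamespace(transitivity=2, window=5)
unit = SimpleNamespace()
read_parameters_sketch(unit, parent, {"window": 3}, {"transitivity", "window"})
```

After the call, `unit.window` comes from the dictionary and `unit.transitivity` is inherited from `parent`.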
class panorama.systems.models._FuFamFeatures(presence='', parent=None, duplicate=0, exchangeable=None, multi_system=False, multi_model=True)#

Bases: object

Represents features related to functional family rules and their properties.

This class is used to define and manage the attributes and rules associated with a functional family. It includes properties such as presence types, duplication count, exchangeable families, and whether a family can be present in multiple systems or models.

presence#

Type of the rule, such as mandatory, accessory, forbidden, or neutral.

Type:: str

duplicate#

Number of duplicates for the functional family.

Type:: int

exchangeable#

Set of exchangeable families that can replace one another in functionality.

Type:: set

multi_system#

Determines if the family can exist across multiple systems.

Type:: bool

multi_model#

Determines if the family can exist across multiple models.

Type:: bool

__init__(presence='', parent=None, duplicate=0, exchangeable=None, multi_system=False, multi_model=True)#

Initializes the instance of the class with given parameters.

Parameters:

presence (str) – Type of the rule, such as mandatory, accessory, forbidden, or neutral. Defaults to an empty string.
parent (Union[FuncUnit, Model]) – The parent object which can be of type FuncUnit or Model. Defaults to None.
duplicate (int) – Specifies the count of duplicates. Defaults to 0.
exchangeable (Set[str]) – A set of values indicating exchangeable items. Defaults to an empty set if not provided.
multi_system (bool) – Indicates if the functionality spans across multiple systems. Defaults to False.
multi_model (bool) – Indicates if the functionality spans across multiple models. Defaults to True.

class panorama.systems.models._ModFuFeatures(mandatory=None, accessory=None, forbidden=None, neutral=None, min_mandatory=1, min_total=1, same_strand=False)#

Bases: object

Represents the features of functional units or families in a model.

This class is responsible for managing sets of child elements categorized into mandatory, accessory, forbidden, and neutral groups. It ensures consistency by enforcing rules about the minimum number of mandatory elements, total elements, and whether the elements belong to the same strand. The class also provides methods to access, validate, and manipulate the functional units or families.

mandatory#

Set of mandatory child elements for the model.

Type:: Set[FuncUnit, Family]

min_mandatory#

Minimum number of mandatory child elements required.

Type:: int

accessory#

Set of accessory child elements for the model.

Type:: Set[FuncUnit, Family]

min_total#

Minimum total number of child elements required.

Type:: int

forbidden#

Set of forbidden child elements for the model.

Type:: Set[FuncUnit, Family]

neutral#

Set of neutral child elements for the model.

Type:: Set[FuncUnit, Family]

same_strand#

Indicates whether all child elements must reside on the same strand.

Type:: bool

__init__(mandatory=None, accessory=None, forbidden=None, neutral=None, min_mandatory=1, min_total=1, same_strand=False)#

Initializes an object with specified sets of functional units or families categorized as mandatory, accessory, forbidden, and neutral. It also allows configuration of constraints on minimum requirements and positional restrictions.

Parameters:

mandatory (Set[FuncUnit, Family], optional) – A set of functional units or families that must be included. Defaults to an empty set if not specified.
accessory (Set[FuncUnit, Family], optional) – A set of functional units or families that are optional to include. Defaults to an empty set if not specified.
forbidden (Set[FuncUnit, Family], optional) – A set of functional units or families that must not be included. Defaults to an empty set if not specified.
neutral (Set[FuncUnit, Family], optional) – A set of functional units or families that are neutral in relevance. Defaults to an empty set if not specified.
min_mandatory (int, optional) – The minimum number of functional units or families that must be included from the mandatory set. Defaults to 1.
min_total (int, optional) – The minimum total number of functional units or families that must be included overall. Defaults to 1.
same_strand (bool, optional) – A flag indicating whether all included functional units or families must be on the same strand. Defaults to False.

_check()#

Validates the configuration of mandatory and total required elements for a specific type. Ensures that the mandatory elements meet the minimum requirements, both in count and presence, and that the total elements conform to the defined minimum limits.

Raises:

Exception – If the number of mandatory elements is lower than the required minimum.
Exception – If the number of total elements is lower than the required minimum.
Exception – If the minimum mandatory count exceeds the minimum total count.
Exception – If no mandatory elements are present.

_child_names(presence=None)#

Retrieves the names of child objects based on their presence status.

Parameters:: presence (str) – A string representing the presence status of child objects to filter (e.g., ‘mandatory’, ‘accessory’). If None, retrieves names of all child objects.
Returns:: set – A set containing the names of child objects that match the given presence status, or all child names if presence is None.

_duplicate(filter_type=None)#

Filters and yields child elements based on a specific filter type.

This function allows filtering of child elements according to a specified attribute type, such as ‘mandatory’, ‘accessory’, ‘forbidden’, or ‘neutral’. If no filter type is provided, it selects all available children. It iterates through the filtered children and yields those with a duplicate value of 1 or higher.

Parameters:

filter_type (str) – The type of filter to apply to the children. Acceptable values are ‘mandatory’, ‘accessory’, ‘forbidden’, ‘neutral’, or None. Defaults to None.

Yields:

object – Each child element that matches the filter condition and has a duplicate value of 1 or higher.

_mk_child_getter()#

Creates a dictionary for retrieving child objects by their names.

This method iterates over the _children attribute and maps each child’s name to the child object itself, storing these key-value pairs in the _child_getter dictionary. This allows efficient access to child objects later based on their names.

add(child)#

Adds a child element to one of the sets in the instance based on its presence attribute. The child can either be of the type FuncUnit or Family. The presence attribute of the child determines which set (mandatory, accessory, forbidden, or neutral) the child will be added to.

Parameters:: child (Union[FuncUnit, Family]) – The element to be added. The element must have a presence attribute which determines the specific set it belongs to. If it is of the type FuncUnit, its check_func_unit method will be invoked before proceeding.

property child_type#

Determines the consistent child type for the instance.

Raises:: Exception – If the child type among children is inconsistent.
Returns:: Any – The consistent child type of the instance.

get(name)#

Retrieves a child object associated with the given name.

Parameters:: name (str) – The name of the child object to retrieve.
Returns:: Union[FuncUnit, Family] – The child object associated with the given name.
Raises:: KeyError – If no child object is found for the given name.
Return type:: Union[FuncUnit, Family]
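The name-based lookup described for `_mk_child_getter()` and `get()` amounts to a dictionary keyed by child name. A minimal sketch, assuming the mapping is built lazily on first access; the `Child` and `Container` classes are hypothetical stand-ins, not PANORAMA classes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for a FuncUnit or Family."""
    name: str
    presence: str


class Container:
    """Hypothetical stand-in for a class holding child elements."""

    def __init__(self, children: List[Child]) -> None:
        self._children = children
        self._child_getter: Dict[str, Child] = {}

    def _mk_child_getter(self) -> None:
        # Map each child's name to the child itself for direct lookup.
        self._child_getter = {child.name: child for child in self._children}

    def get(self, name: str) -> Child:
        if not self._child_getter:  # built lazily; an assumption of this sketch
            self._mk_child_getter()
        try:
            return self._child_getter[name]
        except KeyError:
            raise KeyError(f"No child named {name!r}") from None


container = Container([Child("famA", "mandatory"), Child("famB", "accessory")])
found = container.get("famB")
```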

panorama.systems.models.check_dict(data_dict, mandatory_keys, param_keys=None)#

Performs validation of the provided dictionary by checking for required keys, key types, value types, and specific constraints based on the content of the dictionary.

This function takes a data dictionary and validates its structure and contents against the provided required keys and optional parameter keys. Validation checks include ensuring required keys exist, verifying data types for key values, and matching key values against accepted criteria. Raises specific exceptions if any checks fail.

Parameters:

data_dict (Dict[str, Union[str, int, list, Dict[str, int]]]) – The dictionary that needs to be validated.
mandatory_keys (Set[str]) – A set of keys that must be present in the dictionary.
param_keys (Set[str]) – An optional set of keys required in the ‘parameters’ field of the dictionary, if present. Defaults to None, which is treated as an empty set.

Raises:

KeyError – If required top-level keys are missing, or unexpected keys are found in the dictionary.
TypeError – If a field contains a value of an unexpected type.
ValueError – If a field contains a value that does not meet the required constraints.
Exception – If any unexpected issues occur during the validation process.

Return type:: None

panorama.systems.models.check_key(data, required_keys)#

Validates that all required keys are present in the given dictionary.

Parameters:

data (Dict) – The dictionary to validate.
required_keys (Set[str]) – Set of keys that must be present in the dictionary.

Raises:

KeyError – If any required keys are missing from the dictionary.

Return type:

None
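A required-key check of this kind reduces to a set difference. The following sketch shows the idea under a hypothetical name, `check_key_sketch`; the error message format is an assumption.

```python
from typing import Dict, Set


def check_key_sketch(data: Dict, required_keys: Set[str]) -> None:
    """Raise KeyError listing every required key absent from `data`."""
    missing = required_keys.difference(data)  # iterating a dict yields its keys
    if missing:
        raise KeyError(f"Missing required keys: {sorted(missing)}")


check_key_sketch({"name": "model", "func_units": []}, {"name", "func_units"})
```

Reporting all missing keys at once, rather than stopping at the first, makes a malformed model file quicker to repair.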

panorama.systems.models.check_parameters(param_dict, mandatory_keys)#

Validates the parameters in the given dictionary against the required keys and their rules.

This function ensures that all the required keys are present in the dictionary and that their values adhere to the specified type and value constraints. If any rule is violated, an exception is raised. The function is designed to handle specific parameter keys with associated rules, raising meaningful errors for incorrect usage.

Parameters:

param_dict (Dict[str, int]) – The dictionary containing parameter keys and their values.
mandatory_keys (Set[str]) – The set of keys that are required to be present in param_dict.

Raises:

KeyError – If one or more required keys are missing from param_dict or an unexpected parameter key is found.
TypeError – If a parameter value in param_dict does not match the expected type.
ValueError – If a parameter value in param_dict does not meet the defined constraints.
Exception – If an unexpected error occurs during parameter validation.

Return type:: None

`Models`	Represents a collection of models and provides interfaces for interaction.
`Model`	A Model class representing rules which describe a biological system.
`FuncUnit`	Represents a Functional Unit class that models functional and operational parameters, structural constraints, and various functional units and families.
`Family`	Represents a family entity with configurable attributes and methods to manage its properties.

panorama.systems.system module#

This module defines classes for representing biological systems detected in pangenomes, including System, SystemUnit, and ClusterSystems. It provides methods for managing system units, gene families, modules, spots, and regions, as well as utilities for comparing, merging, and clustering systems across multiple pangenomes.

panorama.systems.system.check_instance_of_system(method)#

Decorator to ensure that a provided argument is an instance of System.

Parameters:: method (Callable) – The method to be wrapped in type-checking functionality.
Returns:: Callable – The wrapped method with type-checking functionality added.
Raises:: TypeError – If the other argument passed to the wrapped method is not an instance of System.

panorama.systems.system.check_instance_of_system_unit(method)#

Decorator to ensure that a provided argument is an instance of SystemUnit.

Parameters:: method (Callable) – The method to be wrapped in type-checking functionality.
Returns:: Callable – The wrapped method with type-checking functionality added.
Raises:: TypeError – If the other argument passed to the wrapped method is not an instance of SystemUnit.
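A type-checking decorator of the kind described above can be sketched as follows. This is an illustration, not the module's code: the `SystemUnit` class here is an empty placeholder, `require_system_unit` is a hypothetical name, and the wrapped method is assumed to take the checked object as its first argument after `self`.

```python
import functools
from typing import Callable


class SystemUnit:
    """Placeholder class for this sketch; not PANORAMA's SystemUnit."""

    def __init__(self, name: str) -> None:
        self.name = name


def require_system_unit(method: Callable) -> Callable:
    """Wrap `method` so that its `other` argument must be a SystemUnit."""

    @functools.wraps(method)
    def wrapper(self, other, *args, **kwargs):
        if not isinstance(other, SystemUnit):
            raise TypeError(
                f"Expected a SystemUnit, got {type(other).__name__}"
            )
        return method(self, other, *args, **kwargs)

    return wrapper


class Comparable(SystemUnit):
    @require_system_unit
    def same_name(self, other: SystemUnit) -> bool:
        return self.name == other.name


result = Comparable("unit1").same_name(SystemUnit("unit1"))
```

`functools.wraps` preserves the wrapped method's name and docstring, which keeps generated documentation accurate.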

`SystemUnit`	Represents a functional unit detected in a pangenome system, associating a FuncUnit model with gene families, annotation sources, and metadata.
`System`	Represents a biological system detected in a pangenome, composed of one or more functional units.
`ClusterSystems`	Represents a cluster of systems that are considered homologous or functionally equivalent across different pangenomes.

panorama.systems.systems_association module#

This module provides functionality for associating systems with other pangenome elements such as RGPs (Regions of Genomic Plasticity), spots, and modules.

The module creates correlation matrices and visualizations to analyze the relationships between systems and various pangenome components.

panorama.systems.systems_association._get_element_frequency(element, system_organisms, pangenome)#

Calculate the frequency of a Spot or Module element across organisms.

Parameters:

element (Union[Spot, Module]) – The Spot or Module element.
system_organisms (Set) – Set of organisms associated with systems.
pangenome (Pangenome) – The pangenome containing organism information.

Returns:

The frequency of the element among organisms.

Return type:

float

panorama.systems.systems_association._get_region_frequency(region, pangenome)#

Calculate the frequency of a Region element across all organisms.

Parameters:

region (Region) – The Region element for frequency calculation.
pangenome (Pangenome) – The pangenome containing organism information.

Returns:

The frequency of the Region in the pangenome.

Return type:

float

Note

TODO: This implementation needs to be fixed to properly calculate region frequency across organisms.

panorama.systems.systems_association.create_coverage_dataframe(element_to_systems, pangenome=None)#

Create a DataFrame describing coverage of systems by pangenome elements.

Parameters:

element_to_systems (Dict[Union[Region, Spot, Module], Set[System]]) – Dictionary mapping pangenome elements to system sets.
pangenome (Optional[Pangenome]) – Optional pangenome object for frequency calculations.

Returns:

DataFrame with coverage and frequency information.

Return type:

DataFrame

panorama.systems.systems_association.create_pangenome_system_associations(pangenome, associations, output_dir, output_formats=None, threads=1, disable_bar=False)#

Create and save associations between systems and pangenome elements.

This function generates association matrices, coverage analysis, and visualizations for the relationships between systems and various pangenome components (RGPs, spots, modules).

Parameters:

pangenome (Pangenome) – The pangenome containing systems and other elements.
associations (List[str]) – List of pangenome elements to associate with systems. Valid options: [‘RGPs’, ‘spots’, ‘modules’]
output_dir (Path) – Directory where output files will be saved.
output_formats (Optional[List[str]]) – List of output formats for visualizations. Valid options: [‘html’, ‘png’]. Defaults to [‘html’].
threads (int) – Number of threads for parallel processing. Defaults to 1.
disable_bar (bool) – Whether to disable the progress bar display. Defaults to False.

Raises:

ValueError – If invalid association types are provided.
FileNotFoundError – If the output directory doesn’t exist.

panorama.systems.systems_association.get_association_dataframes(pangenome, associations, threads=1, disable_progress_bar=False)#

Generate DataFrames for system-pangenome element associations.

Parameters:

pangenome (Pangenome) – Pangenome containing systems and elements.
associations (List[str]) – List of pangenome elements to associate with systems.
threads (int) – Number of threads for parallel processing.
disable_progress_bar (bool) – Whether to disable the progress bar.

Returns:

Tuple containing –

Association DataFrame (systems to elements)
RGP coverage DataFrame
Spot coverage DataFrame
Module coverage DataFrame

Raises:

ValueError – If no systems are found in the pangenome.

Return type:

Tuple[DataFrame, DataFrame, DataFrame, DataFrame]

panorama.systems.systems_association.preprocess_association_data(dataframe, association)#

Preprocess association data to create a correlation matrix.

Parameters:

dataframe (DataFrame) – Association DataFrame between systems and pangenome objects.
association (str) – Type of pangenome object for association.

Returns:

Preprocessed correlation matrix DataFrame.

Return type:

DataFrame

panorama.systems.systems_association.process_system(system, associations, rgp_to_systems, spot_to_systems, module_to_systems)#

Process a single system and update association mappings.

Parameters:

system (System) – The system to process.
associations (List[str]) – List of association types to include.
rgp_to_systems (defaultdict) – Mapping from RGPs to systems (updated in-place).
spot_to_systems (defaultdict) – Mapping from spots to systems (updated in-place).
module_to_systems (defaultdict) – Mapping from modules to systems (updated in-place).

Returns:

Tuple of system ID and system data list.

Return type:

Tuple[str, List[str]]

panorama.systems.systems_association.write_correlation_matrix_visualization(association_df, association, coverage_df, pangenome_name, output_dir, frequency_df=None, output_formats=None)#

Generate and save correlation matrix visualization.

Parameters:

association_df (DataFrame) – Association DataFrame between systems and pangenome objects.
association (str) – Type of pangenome object to visualize.
coverage_df (DataFrame) – Coverage DataFrame for the association.
pangenome_name (str) – Name of the pangenome.
output_dir (Path) – Directory to save output files.
frequency_df (Optional[DataFrame]) – Optional frequency DataFrame.
output_formats (Optional[List[str]]) – List of output formats (default: [‘html’]).

Raises:

ValueError – If an unsupported output format is specified.

AssociationVisualizationBuilder

Builder for correlation matrix visualizations between systems and genomic associations.

panorama.systems.systems_partitions module#

Systems partitions visualization module for pangenome analysis.

This module provides functionality to create heatmap visualizations for pangenome systems partitions and system counts across organisms.

panorama.systems.systems_partitions.preprocess_data(data, disable_bar=False)#

Preprocess data to draw a partition heatmap figure for the pangenome.

Parameters:

data (DataFrame) – Projection of pangenome systems.
disable_bar (bool) – If True, disable the progress bar. Defaults to False.

Return type:

DataFrame

panorama.systems.systems_partitions.preprocess_partition_data(data)#

Preprocess data to draw a partition heatmap figure for the pangenome.

Parameters:: data (DataFrame) – Data used to produce the heatmap.
Return type:: DataFrame

panorama.systems.systems_partitions.systems_partition(name, system_projection, output, output_formats=None)#

Create heatmap visualizations for pangenome systems partitions.

This function serves as the main entry point for generating both partition and count heatmap visualizations from pangenome system projection data.

Parameters:

name (str) – Name of the pangenome for visualization titles.
system_projection (DataFrame) – DataFrame containing system projection data with columns: [‘system number’, ‘system name’, ‘organism’, ‘partition’].
output (Path) – Path to the directory where output files will be saved.
output_formats (List[str]) – List of output formats for visualizations. Valid options: [‘html’, ‘png’]. Defaults to [‘html’].

Raises:

ValueError – If required columns are missing from system_projection DataFrame.

Return type:

None
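
As an illustration of the column requirement and of what a count heatmap could be built from, the sketch below validates the four documented columns and derives a system-name by organism count table. The function name count_systems_per_organism and the deduplication step are assumptions made for this example; the actual preprocessing in this module may differ.

```python
import pandas as pd

# Columns listed in the system_projection parameter description above.
REQUIRED_COLUMNS = ["system number", "system name", "organism", "partition"]


def count_systems_per_organism(system_projection: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count distinct systems per (system name, organism)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in system_projection.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Assumption: one system can span several rows in one organism,
    # so keep a single row per (system number, organism) before counting.
    systems = system_projection.drop_duplicates(["system number", "organism"])
    return pd.crosstab(systems["system name"], systems["organism"])
```

The resulting table has one row per system name and one column per organism, which is the shape a heatmap library typically expects.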

SystemsPartitionVisualizer

Visualizer for pangenome systems partition distributions.

panorama.systems.systems_projection module#

This module provides functions to project systems onto genomes.

panorama.systems.systems_projection._custom_agg(series, unique=False)#

Aggregate a column

Parameters:

series (Series) – series to aggregate
unique (bool) – whether to return unique values or not

Returns:

The aggregated series

panorama.systems.systems_projection.compute_gene_components(model_genes, window_size)#

Compute gene components within a specified window size in the contigs of an organism.

Parameters:

model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
window_size (int) – The size of the window to consider for grouping genes.

Returns:

List[List[Gene]] – A list of components, each containing genes that are within the specified window.

Return type:

List[List[Gene]]
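
The window-based grouping can be pictured with plain integer positions standing in for Gene objects on a single contig. This is only a sketch under stated assumptions: the name group_by_window is hypothetical, positions are treated as gene ranks, and the inclusive comparison (distance less than or equal to window_size) is a guess, since the documentation does not specify the boundary rule or how multiple contigs are handled.

```python
from typing import List


def group_by_window(positions: List[int], window_size: int) -> List[List[int]]:
    """Hypothetical helper: chain positions whose gap is at most window_size."""
    components: List[List[int]] = []
    for pos in sorted(positions):
        # Extend the current component if this position is close enough
        # to the last position already placed in it.
        if components and pos - components[-1][-1] <= window_size:
            components[-1].append(pos)
        else:
            components.append([pos])
    return components
```

For example, positions 1, 2, 3, 10 and 12 with a window of 2 form two components, because the gap between 3 and 10 exceeds the window.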

panorama.systems.systems_projection.compute_genes_graph(model_genes, unit)#

Compute the genes graph for a given genomic context in an organism.

Parameters:

model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
unit (SystemUnit) – The unit of interest.

Returns:

nx.Graph – A genomic context graph for the given organism.

Return type:

Graph

panorama.systems.systems_projection.custom_agg(series)#

Aggregate a column

Parameters:: series (Series) – series to aggregate
Returns:: The aggregated series

panorama.systems.systems_projection.custom_agg_unique(series)#

Aggregate a column

Parameters:: series (Series) – series to aggregate
Returns:: The aggregated series

panorama.systems.systems_projection.eliminate_empty(org_df)#

Removes systems with no model genes left.

Parameters:: org_df (pd.DataFrame) – A DataFrame with at least the columns “system number” and “category”. The “category” column is used to identify “model” genes, and the “system number” column groups rows into systems.
Returns:: pd.DataFrame – A DataFrame containing only the systems that have at least one “model” gene. The rows are concatenated and re-indexed.
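
The filtering rule described above can be sketched with standard pandas operations. The function name keep_systems_with_model_gene is hypothetical; the column names "system number" and "category" and the "model" label come from the parameter description, while the use of isin instead of a per-group concatenation is a simplification chosen for this example.

```python
import pandas as pd


def keep_systems_with_model_gene(org_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep systems that still contain a "model" row."""
    # System numbers that have at least one row flagged as a model gene.
    model_systems = org_df.loc[org_df["category"] == "model", "system number"].unique()
    kept = org_df[org_df["system number"].isin(model_systems)]
    # Re-index so the result has a clean 0..n-1 index.
    return kept.reset_index(drop=True)
```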

panorama.systems.systems_projection.eliminate_systems(org_df, org_df_filtered)#

Eliminates systems from a filtered DataFrame based on model family changes. This function ensures that systems with any eliminated model families in the filtered dataset are removed entirely.

Parameters:

org_df – pandas.DataFrame containing the original dataset with all systems and associated gene families.
org_df_filtered – pandas.DataFrame containing the already filtered dataset, which may have excluded some gene families or systems.

Returns:

pandas.DataFrame filtered to exclude entire systems where any model families were missing after the initial filtering step.

panorama.systems.systems_projection.extract_numeric_for_sorting(val)#

Extract a numeric value to use as a sort key, leaving the original value unchanged.

Parameters:: val – The value to extract a number from.
Returns:: float – The numeric value used for sorting.
Return type:: float
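
A numeric sort key of this kind is commonly built with a regular expression. The sketch below is one possible approach, not the function's actual code: the name numeric_sort_key, the pattern, and the choice to send values without digits to the end (infinity) are all assumptions.

```python
import re


def numeric_sort_key(val) -> float:
    """Hypothetical helper: return the first number found in val as a float."""
    match = re.search(r"\d+(?:\.\d+)?", str(val))
    # Assumption: values without any digits sort last.
    return float(match.group()) if match else float("inf")
```

Used as a key function, it orders labels numerically rather than lexicographically, so a label ending in 10 sorts after one ending in 2 while the labels themselves stay untouched.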

panorama.systems.systems_projection.get_org_df(org_df)#

Get the reformatted projection DataFrame for an organism.

Parameters:: org_df (DataFrame) – DataFrame for the corresponding organism.
Returns:: pd.DataFrame – DataFrame reformatted for an organism.
Return type:: Tuple[DataFrame, str]

TODO: This function is not used anymore, should we remove it?

panorama.systems.systems_projection.get_org_df_one_unit_per_fam(org_df, eliminate_filtered_systems=False, eliminate_empty_systems=False)#

Filters and processes a DataFrame to retain only one representative unit per gene family based on completeness, and optionally eliminates certain systems based on filtering criteria. Also calculates overlapping information.

Parameters:

org_df – The input DataFrame containing organism data with details such as “gene family”, “system name”, “functional unit name”, “completeness”, and other related columns.
eliminate_filtered_systems – Flag indicating whether to remove systems where any of their model families were filtered out due to lower completeness.
eliminate_empty_systems – Flag indicating whether to remove systems with no model families left after filtering.

Returns:

Tuple containing –

A processed and filtered DataFrame with one row per unit per gene family, with overlapping information added and optional system elimination applied.
The unique organism name derived from the input DataFrame.

Return type:

Tuple[DataFrame, str]
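
The core selection step, keeping the most complete unit for each gene family, can be sketched with a sort followed by drop_duplicates. This is a simplified illustration: the name keep_most_complete_unit is hypothetical, ties are broken by original row order, and the overlap computation and optional system elimination described above are left out.

```python
import pandas as pd


def keep_most_complete_unit(org_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep, per gene family, the row with top completeness."""
    # A stable sort keeps the original order among rows with equal completeness.
    ranked = org_df.sort_values("completeness", ascending=False, kind="stable")
    best = ranked.drop_duplicates(subset="gene family", keep="first")
    # Restore the original row order for readability.
    return best.sort_index()
```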

panorama.systems.systems_projection.get_partition(series)#

Parameters:: series (Series)

Returns:

panorama.systems.systems_projection.has_short_path(graph, node_list, n)#

Checks if there exists at least one path of length less than n connecting any two nodes in the given list of nodes in the graph.

Parameters:

graph (Graph) – the graph to search paths
node_list (List[GeneFamily]) – List of gene families to check for paths.
n (int) – The maximum length of the path to consider.

Returns:

bool – True if there exists at least one path of length less than n connecting any two nodes in the list, False otherwise.

Return type:

bool
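
The check can be expressed with NetworkX shortest-path queries over every pair of listed nodes. The sketch below assumes that "length" means the number of edges on the shortest path and that unreachable pairs are simply skipped; the name any_pair_within is hypothetical and the real function may use a different traversal.

```python
from itertools import combinations
from typing import Hashable, List

import networkx as nx


def any_pair_within(graph: nx.Graph, node_list: List[Hashable], n: int) -> bool:
    """Hypothetical helper: True if two listed nodes are joined by a path shorter than n."""
    for source, target in combinations(node_list, 2):
        try:
            if nx.shortest_path_length(graph, source, target) < n:
                return True
        except nx.NetworkXNoPath:
            # The two nodes lie in different connected components.
            continue
    return False
```

The function returns as soon as one qualifying pair is found, which keeps the cost low when the listed nodes are clustered.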

panorama.systems.systems_projection.project_pangenome_systems(pangenome, system_source, association=None, canonical=False, threads=1, lock=None, disable_bar=False)#

Project systems onto all organisms in a pangenome.

Parameters:

pangenome (Pangenome) – The pangenome to project.
system_source (str) – Source of the systems to project.
association (List[str]) – List of associations to include (e.g., ‘RGPs’, ‘spots’).
canonical (bool) – If True, write the canonical version of systems too. Defaults to False.
threads (int) – Number of threads available (default is 1).
lock (Lock) – Global lock for multiprocessing execution (default is None).
disable_bar (bool) – Disable progress bar (default is False).

Returns:

Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projections for each organism and for the pangenome.

Return type:

Tuple[DataFrame, DataFrame]

panorama.systems.systems_projection.project_unit_on_organisms(components, unit, model_genes, association=None)#

Projects a system unit onto a given organism’s pangenome.

Parameters:

components (List[List[Gene]]) – List of gene components to project.
unit (SystemUnit) – The unit to be projected.
model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
association (List[str], optional) – List of associations to include (e.g., ‘RGPs’, ‘spots’).

Returns:

A list of projected system information for the organism.

panorama.systems.systems_projection.system_projection(system, fam_index, gene_family2family, association=None)#

Project a system onto all organisms in a pangenome.

Parameters:

system (System) – The system to project.
fam_index (Dict[GeneFamily, int]) – Index mapping gene families to their positions.
gene_family2family (Dict[GeneFamily, Set[Family]]) – Dictionary linking a gene family to model families.
association (List[str]) – List of associations to include (e.g., ‘RGPs’, ‘spots’).

Returns:

Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projected system for the pangenome and organisms.

Return type:

Tuple[DataFrame, DataFrame]

panorama.systems.systems_projection.unit_projection(unit, gf2fam, fam_index, association=None)#

Project a system unit onto all organisms in a pangenome.

Parameters:

unit (SystemUnit) – The system unit to project.
gf2fam (Dict[GeneFamily, set[Family]]) – Dictionary linking a pangenome gene family to a model family.
fam_index (Dict[GeneFamily, int]) – Index mapping gene families to their positions.
association (List[str]) – List of associations to include (e.g., ‘RGPs’, ‘spots’).

Returns:

Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projected system for the pangenome and organisms.

Return type:

Tuple[DataFrame, DataFrame]

panorama.systems.systems_projection.write_projection_systems(output, pangenome_projection, organisms_projection, organisms=None, threads=1, force=False, disable_bar=False)#

Write the projected systems to output files.

Parameters:

output (Path) – Path to the output directory.
pangenome_projection (DataFrame) – DataFrame containing the pangenome projection.
organisms_projection (DataFrame) – DataFrame containing the organism projections.
organisms (List[str]) – List of organisms to project (default is all organisms).
threads (int) – Number of threads to use for parallel processing. Defaults to 1.
force (bool) – Force write to the output directory (default is False).
disable_bar (bool) – If True, disable the progress bar. Defaults to False.

Returns:

None

panorama.systems.utils module#

This module provides utility functions to detect and write biological systems in pangenomes.

panorama.systems.utils.check_for_families(gene_families, gene_fam2mod_fam, mod_fam2meta_source, func_unit)#

Evaluate gene families against a functional unit to detect forbidden, mandatory, and accessory family conditions.

Parameters:

gene_families (Set[GeneFamily]) – Gene families to evaluate.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Map from gene families to their model families.
mod_fam2meta_source (Dict[str, str]) – Map from model family name to metadata source.
func_unit (FuncUnit) – Functional unit definition to check against.

Returns:

Tuple[bool, Dict[GeneFamily, Tuple[str, int]]] –

A boolean indicating whether the conditions are satisfied (False immediately if a forbidden family is found).
A mapping from gene families to their selected metadata (source, meta_id).

Return type:

Tuple[bool, Dict[GeneFamily, Tuple[str, int]]]

panorama.systems.utils.check_needed_families(matrix, func_unit)#

Check if there are enough mandatory and total families to satisfy the functional unit rules.

Parameters:

matrix (DataFrame) – The association matrix between gene families and families
func_unit (FuncUnit) – The functional unit to search for.

Returns:

bool – True if the functional unit rules are satisfied, False otherwise.

Return type:

bool

Notes

This function assumes that a family with multiple annotations can play multiple roles to satisfy the model requirements.

panorama.systems.utils.conciliate_partition(partition)#

Conciliate a set of partitions.

Parameters:: partition (Set[str]) – All partitions.
Returns:: str – The reconciled partition.
Return type:: str

panorama.systems.utils.dict_families_context(model, annot2fam)#

Retrieves all gene families associated with the families in the model.

Parameters:

model (Model) – Model containing the families.
annot2fam (Dict[str, Dict[str, Set[GeneFamily]]]) – Dictionary of annotated families.

Returns:

tuple – A tuple containing:

Dictionary linking gene families to their families.
Dictionary linking families to their sources.

Return type:

Tuple[Dict[GeneFamily, Set[Family]], Dict[str, str]]

panorama.systems.utils.filter_global_context(graph, jaccard_threshold=0.8)#

Filters the edges of a gene family graph based on a Jaccard gene proportion threshold.

Copies all nodes to a new graph and retains only those edges where both connected GeneFamily nodes have a Jaccard gene proportion (shared genomes over unique organisms) greater than or equal to the specified threshold. Updates edge data with Jaccard values and family names.

Parameters:

graph (nx.Graph) – The input graph with GeneFamily nodes and edge data containing ‘genomes’.
jaccard_threshold (float, optional) – Minimum Jaccard gene proportion required for both families to retain an edge. Defaults to 0.8.

Returns:

nx.Graph[GeneFamily] – A new graph with filtered edges and updated edge attributes.
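
One reading of the rule above is that, for each edge, the number of genomes carrying the edge is divided by the number of organisms of each endpoint family, and the edge is kept only if both ratios reach the threshold. The sketch below implements that reading with NetworkX. The node attribute "organisms", the edge attribute "proportions", and the name filter_edges_by_proportion are assumptions for illustration; in PANORAMA the organisms would come from the GeneFamily objects themselves.

```python
import networkx as nx


def filter_edges_by_proportion(graph: nx.Graph, threshold: float = 0.8) -> nx.Graph:
    """Hypothetical helper: keep edges supported by enough genomes of both endpoints."""
    filtered = nx.Graph()
    # Copy every node (with attributes) so isolated nodes are preserved.
    filtered.add_nodes_from(graph.nodes(data=True))
    for u, v, data in graph.edges(data=True):
        shared = len(data["genomes"])
        prop_u = shared / len(graph.nodes[u]["organisms"])
        prop_v = shared / len(graph.nodes[v]["organisms"])
        if prop_u >= threshold and prop_v >= threshold:
            # Store both ratios on the retained edge for later inspection.
            filtered.add_edge(u, v, **data, proportions=(prop_u, prop_v))
    return filtered
```

Requiring both ratios to pass prevents a widespread family from being linked to a rare neighbour just because the rare family always co-occurs with it.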

panorama.systems.utils.filter_local_context(graph, organisms, jaccard_threshold=0.8)#

Filters a graph based on a local Jaccard index.

Parameters:

graph (nx.Graph) – A sub-pangenome graph.
organisms (Set[Organism]) – Organisms where edges between families of interest exist.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Default is 0.8.

panorama.systems.utils.filter_local_context_old(graph, organisms, jaccard_threshold=0.8)#

Filters a graph based on a local Jaccard index.

Parameters:

graph (nx.Graph) – A sub-pangenome graph.
organisms (Set[Organism]) – Organisms where edges between families of interest exist.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Default is 0.8.

panorama.systems.utils.get_gfs_matrix_combination(gene_families, gene_fam2mod_fam)#

Build a matrix of association between gene families and families.

Parameters:

gene_families (Set[GeneFamily]) – Set of gene families.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.

Returns:

pd.DataFrame – Matrix of association between gene families and families.

Return type:

DataFrame

panorama.systems.utils.get_metadata_to_families(pangenome, sources)#

Retrieves a mapping of metadata to sets of gene families for each metadata source.

Parameters:

pangenome (Pangenome) – Pangenome object containing gene families.
sources (Iterable[str]) – List of metadata source names.

Returns:

dict – A dictionary where each metadata source maps to another dictionary of metadata to sets of gene families.

Return type:

Dict[str, Dict[str, Set[GeneFamily]]]
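
The shape of the returned mapping (source, then metadata value, then set of gene families) can be illustrated without a Pangenome object. In the sketch below, plain (family, source, value) tuples stand in for the metadata attached to gene families; that input format and the name group_families_by_metadata are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_families_by_metadata(
    annotations: Iterable[Tuple[str, str, str]],
    sources: Iterable[str],
) -> Dict[str, Dict[str, Set[str]]]:
    """Hypothetical helper: build {source: {metadata value: {families}}}."""
    source_list = list(sources)
    grouped = {source: defaultdict(set) for source in source_list}
    for family, source, value in annotations:
        # Ignore annotations coming from sources that were not requested.
        if source in grouped:
            grouped[source][value].add(family)
    # Convert the inner defaultdicts to plain dicts before returning.
    return {source: dict(by_value) for source, by_value in grouped.items()}
```

Every requested source appears as a key in the result, even when no family carries metadata from it, so callers can index by source without a membership test.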

VisualizationBuilder

Abstract base class for building correlation matrix and partition visualizations.

panorama.systems.write_systems module#

This module provides functions to write information into the pangenome file.

panorama.systems.write_systems.check_pangenome_write_systems(pangenome, sources)#

Check and load pangenome information before adding annotation.

Parameters:

pangenome (Pangenome) – The Pangenome object.
sources (List[str]) – Sources used to detect systems.

Raises:

KeyError – If the provided systems source is not in the pangenome.
Exception – If systems have not been detected in pangenome.
AttributeError – If there is no metadata associated with families.

Return type:

None

panorama.systems.write_systems.check_write_systems_args(args)#

Checks the provided arguments to ensure that they are valid.

Parameters:

args (Namespace) – The parsed arguments.

Returns:

Dict[str, Any] – A dictionary containing the necessary information for further processing.

Raises:

argparse.ArgumentTypeError – If the number of sources differs from the number of models, or if annotations are given and their number differs from the number of systems sources.

Return type:

Dict[str, Any]
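
The two length checks described under Raises can be sketched as follows. The function name check_source_counts and its three list parameters are hypothetical; how the real function reads these values from the args Namespace is not documented here.

```python
import argparse
from typing import List, Optional


def check_source_counts(
    sources: List[str],
    models: List[str],
    annotations: Optional[List[str]] = None,
) -> None:
    """Hypothetical helper: enforce one model (and one annotation) per source."""
    if len(sources) != len(models):
        raise argparse.ArgumentTypeError(
            f"Got {len(sources)} sources but {len(models)} models."
        )
    # Annotations are optional, so only compare counts when they are given.
    if annotations is not None and len(annotations) != len(sources):
        raise argparse.ArgumentTypeError(
            f"Got {len(annotations)} annotations but {len(sources)} sources."
        )
```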

panorama.systems.write_systems.launch(args)#

Launch functions to write systems.

Parameters:: args – The parsed arguments.

panorama.systems.write_systems.parser_write(parser)#

Parser for specific arguments of the write_systems command.

Parameters:: parser (argparse.ArgumentParser) – Parser for the write_systems arguments.

panorama.systems.write_systems.subparser(sub_parser)#

Subparser to launch PANORAMA from the command line.

Parameters:: sub_parser – Subparser for the write_systems command.
Returns:: argparse.ArgumentParser – Parser arguments for the write_systems command.
Return type:: ArgumentParser

panorama.systems.write_systems.write_flat_systems_to_pangenome(pangenome, output, projection=False, association=None, partition=False, proksee=None, output_formats=None, organisms=None, canonical=False, threads=1, lock=None, force=False, disable_bar=False)#

Write detected systems from a pangenome to an output directory in a flat format.

Parameters:

pangenome (Pangenome) – The pangenome object containing the detected systems.
output (Path) – The directory where the systems will be written.
projection (bool) – If True, write projection systems. Defaults to False.
association (List[str]) – List of associations to be considered. Defaults to None.
partition (bool) – If True, write partition systems. Defaults to False.
proksee (str) – A placeholder for future Proksee integration. Defaults to None.
output_formats (List[str]) – A list of output formats for visualization. Defaults to None.
organisms (List[str]) – List of organisms to be considered for projection. Defaults to None.
canonical (bool) – If True, write the canonical version of systems too. Defaults to False.
threads (int) – Number of threads to use for parallel processing. Defaults to 1.
lock (Lock) – A multiprocessing lock to synchronize access. Defaults to None.
force (bool) – If True, overwrite existing files. Defaults to False.
disable_bar (bool) – If True, disable the progress bar. Defaults to False.

Raises:

NotImplementedError – If Proksee integration is requested but not implemented.

panorama.systems.write_systems.write_pangenomes_systems(pangenomes, output, projection=False, association=None, partition=False, proksee=None, output_formats=None, organisms=None, canonical=False, threads=1, lock=None, force=False, disable_bar=False)#

Write flat files about systems for all pangenomes.

Parameters:

pangenomes (Pangenomes) – Pangenomes object containing all pangenomes.
output (Path) – Path to write flat files about systems.
projection (bool) – Flag to enable/disable pangenome projection. Defaults to False.
association (List[str]) – Write systems association to the given pangenome object. Defaults to None.
partition (bool) – Flag to enable write system partition. Defaults to False.
proksee (str) – Write proksee with the systems and the given pangenome object. Defaults to None.
output_formats (List[str]) – List of output formats for visualization. Defaults to None.
organisms (List[str]) – List of organism names to write. Defaults to all organisms.
canonical (bool) – If True, write the canonical version of systems too. Defaults to False.
threads (int) – Number of available threads. Defaults to 1.
lock (Lock) – Global lock for multiprocessing execution. Defaults to None.
force (bool) – Flag to allow overwriting files. Defaults to False.
disable_bar (bool) – Flag to disable the progress bar. Defaults to False.