panorama.systems package#
Submodules#
panorama.systems.detection module#
This module provides functions to detect biological systems in pangenomes.
- panorama.systems.detection.check_detection_args(args: Namespace) Dict[str, bool | str | List[str]]#
Checks and processes the provided arguments to ensure they are valid.
- Parameters:
args (argparse.Namespace) – The parsed command-line arguments.
- Returns:
Dict[str, Union[bool, str, List[str]]] – A dictionary indicating necessary information to load the pangenome.
- Raises:
argparse.ArgumentTypeError – If ‘jaccard’ is not a restricted float.
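The ‘jaccard’ check follows a common argparse pattern: a converter function that rejects out-of-range values. Below is a minimal sketch, not PANORAMA's own code. The helper name `restricted_float`, the option name `--jaccard`, and the example parser are assumptions made for illustration; the [0, 1] range is simply the range of a Jaccard index.

```python
import argparse


def restricted_float(value: str) -> float:
    """Convert value to a float and require it to lie in [0, 1]."""
    try:
        number = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not a float")
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"{number} is outside the range [0, 1]")
    return number


# Hypothetical parser: the real 'systems' command defines its own options.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--jaccard", type=restricted_float, default=0.8)
args = parser.parse_args(["--jaccard", "0.5"])
```

When used as `type=`, argparse turns the `ArgumentTypeError` into a regular usage error message for the user.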
- panorama.systems.detection.check_for_forbidden_unit(su_found: Dict[str, Set[SystemUnit]], model: Model) bool#
Checks if there are forbidden system units.
- Parameters:
su_found (Dict[str, Set[SystemUnit]]) – Dictionary with all system units found sorted by name.
model (Model) – Model corresponding to the system checked.
- Returns:
bool – True if forbidden conditions are encountered, False otherwise.
- panorama.systems.detection.check_for_needed_units(su_found: Dict[str, Set[SystemUnit]], model: Model) bool#
Checks if the presence/absence rules for necessary functional units are respected.
- Parameters:
su_found (Dict[str, Set[SystemUnit]]) – Dictionary with all system units found sorted by name.
model (Model) – Model corresponding to the system checked.
- Returns:
bool – True if the presence/absence rules for necessary functional units are respected, False otherwise.
- panorama.systems.detection.check_pangenome_detection(pangenome: Pangenome, metadata_sources: List[str], systems_source: str, force: bool = False)#
Checks and loads pangenome information before adding families.
- Parameters:
pangenome (Pangenome) – Pangenome object to be checked.
metadata_sources (List[str]) – Sources used to associate families with gene families.
systems_source (str) – Source used to detect systems.
force (bool, optional) – If True, forces the erasure of pangenome systems from the source. Defaults to False.
- Raises:
KeyError – If the provided annotation source is not in the pangenome.
ValueError – If systems are already detected based on the source and ‘force’ is not used.
AttributeError – If there is no metadata associated with families.
- panorama.systems.detection.get_functional_unit_gene_families(func_unit: FuncUnit, gene_families: Set[GeneFamily], gene_fam2mod_fam: Dict[GeneFamily, Set[Family]]) Tuple[Set[GeneFamily], Set[GeneFamily]]#
Retrieves the gene families that might be in the functional unit.
- Parameters:
func_unit (FuncUnit) – The functional unit to consider.
gene_families (Set[GeneFamily]) – Set of gene families that might be in the system.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.
- Returns:
Tuple[Set[GeneFamily], Set[GeneFamily]] – A tuple of two sets: the gene families that code for the functional unit, and the neutral gene families of the functional unit.
- panorama.systems.detection.get_subcombinations(combi: Set[GeneFamily], combinations: List[FrozenSet[GeneFamily]]) List[FrozenSet[GeneFamily]]#
Removes a combination and all its sub-combinations from a list of combinations.
- Parameters:
combi (Set[GeneFamily]) – Combination to be removed.
combinations (List[FrozenSet[GeneFamily]]) – List of combinations to be filtered.
- Returns:
List[FrozenSet[GeneFamily]] – List of removed combinations.
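The idea of removing a combination together with its sub-combinations can be illustrated with plain set operations. This is a sketch under stated assumptions: strings stand in for GeneFamily objects, and the hypothetical helper `split_subcombinations` returns both the kept and the removed lists, which may differ from what the real function returns.

```python
from typing import FrozenSet, List, Set, Tuple


def split_subcombinations(
    combi: Set[str], combinations: List[FrozenSet[str]]
) -> Tuple[List[FrozenSet[str]], List[FrozenSet[str]]]:
    """Split combinations into those kept and those that are subsets of combi."""
    kept: List[FrozenSet[str]] = []
    removed: List[FrozenSet[str]] = []
    for candidate in combinations:
        # issubset is also true when candidate equals combi itself.
        if candidate.issubset(combi):
            removed.append(candidate)
        else:
            kept.append(candidate)
    return kept, removed


combinations = [
    frozenset({"famA", "famB", "famC"}),
    frozenset({"famA", "famB"}),
    frozenset({"famC"}),
    frozenset({"famA", "famD"}),
]
kept, removed = split_subcombinations({"famA", "famB", "famC"}, combinations)
```

Here only the combination containing `famD` survives, because it is not contained in the removed combination.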
- panorama.systems.detection.get_system_unit_combinations(su_found: Dict[str, Set[SystemUnit]], model: Model) List[List[SystemUnit]]#
Generates combinations of system units that could code for a system.
The function generates combinations from mandatory, optional, and neutral categories, ensuring that model parameters are respected. Combinations with and without elements from neutral categories are generated.
- Parameters:
su_found (Dict[str, Set[SystemUnit]]) – A dictionary where keys are functional unit names and values are sets of elements belonging to each functional unit model.
model (Model) – Model corresponding to the functional unit.
- Returns:
List[List[SystemUnit]] – A list of all possible valid combinations based on the model.
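The enumeration described for get_system_unit_combinations can be sketched with itertools. This is a simplified illustration, not the PANORAMA implementation: unit names are plain strings, each name stands for a single unit, and the rule that neutral units do not count toward the minimums is an assumption.

```python
from itertools import chain, combinations
from typing import Iterable, List, Tuple


def subsets(items: List[str], min_size: int = 0) -> Iterable[Tuple[str, ...]]:
    """Yield every subset of items holding at least min_size elements."""
    sizes = range(min_size, len(items) + 1)
    return chain.from_iterable(combinations(items, size) for size in sizes)


def unit_combinations(
    mandatory: List[str],
    accessory: List[str],
    neutral: List[str],
    min_mandatory: int = 1,
    min_total: int = 1,
) -> List[List[str]]:
    """Enumerate unit combinations that satisfy the two minimum counts."""
    results = []
    for mand in subsets(mandatory, min_mandatory):
        for acc in subsets(accessory):
            if len(mand) + len(acc) < min_total:
                continue  # assumption: neutral units do not count here
            # Emit the combination without and with neutral units.
            for neu in subsets(neutral):
                results.append([*mand, *acc, *neu])
    return results


combos = unit_combinations(
    ["fu1", "fu2"], ["fu3"], ["fu4"], min_mandatory=2, min_total=2
)
```

With two mandatory units both required, the sketch yields four combinations: with or without the accessory unit, each with or without the neutral unit.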
- panorama.systems.detection.launch(args)#
Launches functions to detect systems in pangenomes.
- Parameters:
args (argparse.Namespace) – Arguments given on the command line.
- panorama.systems.detection.parser_detection(parser)#
Adds arguments to the parser for the ‘systems’ command.
- Parameters:
parser (argparse.ArgumentParser) – Parser for the ‘systems’ command.
- panorama.systems.detection.search_for_system(model: Model, su_found: Dict[str, Set[SystemUnit]], source: str, jaccard_threshold: float = 0.8) Set[System]#
Searches for a system corresponding to the model based on the units found.
- Parameters:
model (Model) – Model corresponding to the system searched.
su_found (Dict[str, Set[SystemUnit]]) – The system units found for the model.
source (str) – Name of the annotation source.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
- Returns:
set of System – Systems detected.
- panorama.systems.detection.search_system(model: Model, gf2fam: Dict[GeneFamily, Set[Family]], fam2source: Dict[str, str], source: str, jaccard_threshold: float = 0.8, sensitivity: int = 1) Set[System]#
Searches for a model system in a pangenome.
- Parameters:
model (Model) – Model to search in the pangenome.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
source (str) – Name of the annotation source.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
sensitivity (int, optional) – Sensitivity level for detection. 1: global Jaccard filtering on the context, without looking at all the combinations. 2: global Jaccard filtering on the specific context of each combination. 3: local Jaccard filtering on the specific context of each combination. Defaults to 1.
- Returns:
set of System – Set of systems detected in the pangenome for the given model.
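The jaccard_threshold parameter filters edges between gene families. The sketch below shows one way such a filter can work. It assumes the index is computed on the sets of genomes that contain each family, which may not match exactly how PANORAMA computes it; `filter_edges` and `family2genomes` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_edges(
    edges: List[Tuple[str, str]],
    family2genomes: Dict[str, Set[str]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Keep the edges whose two families share enough genomes."""
    return [
        (u, v)
        for u, v in edges
        if jaccard(family2genomes[u], family2genomes[v]) >= threshold
    ]


family2genomes = {
    "famA": {"g1", "g2", "g3", "g4", "g5"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g5"},
}
edges = [("famA", "famB"), ("famB", "famC"), ("famA", "famC")]
kept_edges = filter_edges(edges, family2genomes, threshold=0.8)
```

Only the famA/famB edge reaches the threshold (4 shared genomes out of 5), so the other two edges are dropped.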
- panorama.systems.detection.search_system_units(model: Model, gf2fam: Dict[GeneFamily, Set[Family]], fam2source: Dict[str, str], source: str, jaccard_threshold: float = 0.8, sensitivity: int = 1) Dict[str, Set[SystemUnit]]#
Searches for system units corresponding to a model.
- Parameters:
model (Model) – Model corresponding to the system searched.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
source (str) – Name of the annotation source.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
sensitivity (int, optional) – Sensitivity level for detection. 1: global Jaccard filtering on the context, without looking at all the combinations. 2: global Jaccard filtering on the specific context of each combination. 3: local Jaccard filtering on the specific context of each combination. Defaults to 1.
- Returns:
Dict[str, Set[SystemUnit]] – System units found with their name as the key and units as value.
- Raises:
ValueError – If sensitivity is not 1, 2, or 3.
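The three sensitivity levels can be read as two switches: whether each combination gets its own context, and whether the Jaccard filtering is local. The following sketch encodes that reading and reproduces the ValueError for other values. The flag names and the helper `sensitivity_flags` are hypothetical and only serve the illustration.

```python
from typing import Dict

# Hypothetical flag names; the levels themselves come from the parameter text.
_LEVELS: Dict[int, Dict[str, bool]] = {
    1: {"per_combination": False, "local": False},
    2: {"per_combination": True, "local": False},
    3: {"per_combination": True, "local": True},
}


def sensitivity_flags(sensitivity: int = 1) -> Dict[str, bool]:
    """Translate a sensitivity level into two boolean switches."""
    if sensitivity not in _LEVELS:
        raise ValueError(f"sensitivity must be 1, 2, or 3, got {sensitivity!r}")
    return dict(_LEVELS[sensitivity])  # copy so callers cannot edit the table
```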
- panorama.systems.detection.search_systems(models: Models, pangenome: Pangenome, source: str, metadata_sources: List[str], jaccard_threshold: float = 0.8, sensitivity: int = 1, disable_bar: bool = False)#
Searches for systems present in the pangenome for all models.
- Parameters:
models (Models) – Models to search in pangenomes.
pangenome (Pangenome) – Pangenome object containing gene families.
source (str) – Name of the source for the system.
metadata_sources (List[str]) – List of the metadata sources for the families.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
disable_bar (bool, optional) – If True, disables the progress bar. Defaults to False.
sensitivity (int, optional) – Sensitivity level for detection. 1: global Jaccard filtering on the context, without looking at all the combinations. 2: global Jaccard filtering on the specific context of each combination. 3: local Jaccard filtering on the specific context of each combination. Defaults to 1.
- panorama.systems.detection.search_systems_in_pangenomes(models: Models, pangenomes: Pangenomes, source: str, metadata_sources: List[str], jaccard_threshold: float = 0.8, sensitivity: int = 1, threads: int = 1, lock: Lock = None, disable_bar: bool = False)#
Searches for systems in pangenomes by multithreading on pangenomes.
- Parameters:
models (Models) – Models to search in pangenomes.
pangenomes (Pangenomes) – Getter object with Pangenome.
source (str) – Name of the source for the system.
metadata_sources (List[str]) – List of the metadata sources for the families.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
threads (int, optional) – Number of available threads. Defaults to 1.
lock (Lock, optional) – Global lock for multiprocessing execution. Defaults to None.
disable_bar (bool, optional) – If True, disables the progress bar. Defaults to False.
sensitivity (int, optional) – Sensitivity level for detection. 1: global Jaccard filtering on the context, without looking at all the combinations. 2: global Jaccard filtering on the specific context of each combination. 3: local Jaccard filtering on the specific context of each combination. Defaults to 1.
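Running one detection task per pangenome with a shared lock can be sketched with the standard library. This is an illustration of the threads and lock parameters only: the worker below is a stand-in that does no real detection, and the executor choice is an assumption about how the work could be scheduled.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Dict, List


def detect_in_one(name: str, lock: Lock, results: Dict[str, int]) -> None:
    """Stand-in worker: 'detects' as many systems as the name has letters."""
    found = len(name)
    with lock:  # protect the shared dictionary while writing
        results[name] = found


def detect_in_all(names: List[str], threads: int = 1) -> Dict[str, int]:
    """Run one worker per pangenome name on a pool of threads."""
    lock = Lock()
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=threads) as executor:
        futures = [
            executor.submit(detect_in_one, name, lock, results) for name in names
        ]
        for future in futures:
            future.result()  # re-raise any exception from the worker
    return results


summary = detect_in_all(["pan1", "pangenome2"], threads=2)
```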
- panorama.systems.detection.search_unit_in_cc(graph: Graph, system_gf: Set[GeneFamily], func_unit: FuncUnit, source: str, gf2fam: Dict[GeneFamily, Set[Family]], fam2source: Dict[str, str], detected_su: Set[SystemUnit]) Set[FrozenSet[GeneFamily]]#
Searches for functional units in connected components of a graph.
- Parameters:
graph (nx.Graph) – The graph to search within.
system_gf (Set[GeneFamily]) – Set of gene families in the system.
func_unit (FuncUnit) – Functional unit to search for.
source (str) – Source of the data.
gf2fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
fam2source (Dict[str, str]) – Dictionary linking families to their sources.
detected_su (Set[SystemUnit]) – Set of already detected system units.
- Returns:
Set[FrozenSet[GeneFamily]] – Set of found combinations of gene families.
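Searching "in connected components" means splitting the graph into groups of mutually reachable nodes and examining each group separately. The sketch below does this with a breadth-first search over a plain adjacency dictionary, as a stand-in for the nx.Graph used by the real function. Node names and the final filtering step are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def connected_components(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the connected components of an undirected graph.

    adjacency must list every node as a key and be symmetric.
    """
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


adjacency = {
    "famA": {"famB"},
    "famB": {"famA"},
    "famC": {"famD"},
    "famD": {"famC"},
    "famE": set(),
}
system_families = {"famA", "famD"}
components = connected_components(adjacency)
# Keep only the components that contain at least one family of interest.
candidates = [comp for comp in components if comp & system_families]
```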
- panorama.systems.detection.search_unit_in_combination(graph: Graph, families: Set[GeneFamily], gene_fam2mod_fam: Dict[GeneFamily, Set[Family]], mod_fam2meta_source: Dict[str, str], func_unit: FuncUnit, source: str, matrix: DataFrame, combinations: List[FrozenSet[GeneFamily]], combinations2orgs: Dict[FrozenSet[GeneFamily], Set[Organism]], jaccard_threshold: float = 0.8, local: bool = False) Set[SystemUnit]#
Searches for a functional unit corresponding to a model in a graph.
- Parameters:
graph (nx.Graph) – A graph with families in one connected component.
families (Set[GeneFamily]) – A set of families that code for the searched model.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to their families.
mod_fam2meta_source (Dict[str, str]) – Dictionary linking model families to metadata sources.
func_unit (FuncUnit) – One functional unit corresponding to the model.
source (str) – Name of the source.
matrix (pd.DataFrame) – Dataframe containing association between gene families and unit families.
combinations (List[FrozenSet[GeneFamily]]) – Existing combination of gene families in organisms.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
combinations2orgs (Dict[FrozenSet[GeneFamily], Set[Organism]]) – The combination of gene families corresponding to the context that exists in at least one genome.
local (bool, optional) – Whether to filter the context with a local Jaccard index. Defaults to False.
- Returns:
Set[SystemUnit] – Set of all detected functional units in the graph.
- panorama.systems.detection.search_unit_in_context(graph: Graph, families: Set[GeneFamily], gene_fam2mod_fam: Dict[GeneFamily, Set[Family]], mod_fam2meta_source: Dict[str, str], matrix: DataFrame, func_unit: FuncUnit, source: str, combinations2orgs: Dict[FrozenSet[GeneFamily], Set[Organism]] = None, jaccard_threshold: float = 0.8, local: bool = False) Set[SystemUnit]#
Searches for a system unit corresponding to a model in a pangenomic context.
- Parameters:
graph (nx.Graph) – A graph with families in a pangenomic context.
families (Set[GeneFamily]) – A set of families that code for the searched model.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.
mod_fam2meta_source (Dict[str, str]) – Dictionary linking model families to metadata sources.
func_unit (FuncUnit) – One functional unit corresponding to the model.
source (str) – Name of the source.
matrix (pd.DataFrame) – Dataframe containing association between gene families and unit families.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Defaults to 0.8.
combinations2orgs (Dict[FrozenSet[GeneFamily], Set[Organism]], optional) – The combination of gene families corresponding to the context that exists in at least one genome. Defaults to None.
local (bool, optional) – Whether to filter the context with a local Jaccard index. Defaults to False.
- Returns:
Set[SystemUnit] – Set of system units detected in the pangenomic context.
- panorama.systems.detection.subparser(sub_parser) ArgumentParser#
Creates a subparser to launch PANORAMA from the command line.
- Parameters:
sub_parser – Sub-parser for the ‘systems’ command.
- Returns:
argparse.ArgumentParser – Parser with arguments for the ‘systems’ command.
- panorama.systems.detection.write_systems_to_pangenome(pangenome: Pangenome, source: str, disable_bar: bool = False)#
Writes detected systems to the pangenome.
- Parameters:
pangenome (Pangenome) – Pangenome object containing detected systems.
source (str) – Name of the annotation source.
disable_bar (bool, optional) – If True, disables the progress bar. Defaults to False.
- panorama.systems.detection.write_systems_to_pangenomes(pangenomes: Pangenomes, source: str, threads: int = 1, lock: Lock = None, disable_bar: bool = False)#
Writes detected systems into pangenomes.
- Parameters:
pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes with systems.
source (str) – Metadata source.
threads (int, optional) – Number of available threads. Defaults to 1.
lock (Lock, optional) – Lock for multiprocessing execution. Defaults to None.
disable_bar (bool, optional) – If True, disables the progress bar. Defaults to False.
panorama.systems.dictionary_validation_utils module#
panorama.systems.models module#
This module provides tools to define and validate rules used to detect biological systems.
- class panorama.systems.models.Family(name: str = '', transitivity: int = 0, window: int = 1, presence: str = '', func_unit: FuncUnit = None, duplicate: int = 0, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = False)#
Bases:
_BasicFeatures, _FuFamFeatures
Represents a family entity with configurable attributes and methods to manage its properties.
This class is designed to encapsulate the attributes and behavior of a “family” entity, providing methods for initialization, property management, and reading family data from structured inputs like JSON files. It supports various configurations related to transitivity, duplicate handling, and exchangeable properties while interacting with functional units.
- name#
The name associated with the family instance, representing its identifier or label.
- Type:
str
- transitivity#
Represents the transitivity value, indicating the structural property of the family.
- Type:
int
- window#
Defines the processing window or range, typically a limit or scope.
- Type:
int
- presence#
Indicates the presence condition or state of the family instance.
- Type:
str
- duplicate#
Parameter controlling the duplication count or flag for this family.
- Type:
int
- exchangeable#
Defines a set of exchangeable features or components related to the family.
- Type:
Set[str]
- multi_system#
Specifies if the family supports operation across multiple systems.
- Type:
bool
- multi_model#
Specifies if the family is configured to handle multiple models.
- Type:
bool
- __init__(name: str = '', transitivity: int = 0, window: int = 1, presence: str = '', func_unit: FuncUnit = None, duplicate: int = 0, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = False)#
Initializes an instance of the class with specified attributes.
- Parameters:
name (str) – The name associated with the instance.
transitivity (int) – Transitivity value, typically indicating the relationship type in a structure.
window (int) – Specifies the window parameter for processing, generally indicates a range or limit.
presence (str) – Describes the presence condition or state of the instance.
func_unit (FuncUnit) – Functional unit associated with the instance.
duplicate (int) – Duplicate control parameter, typically a count or flag for duplicates.
exchangeable (Set[str]) – Set of exchangeable elements or features related to instance behavior.
multi_system (bool) – Indicates if the instance supports multi-system functionality.
multi_model (bool) – Indicates if the instance supports multiple models or configurations.
- property func_unit: FuncUnit#
Gets the parent FuncUnit associated with this instance.
- Returns:
FuncUnit – The parent functional unit.
- property model: Model#
Returns the model associated with the functional unit.
- Returns:
Model – The model instance associated with the functional unit.
- read(data_fam: dict)#
Read family.
- Parameters:
data_fam (dict) – Data JSON file with families.
- class panorama.systems.models.FuncUnit(name: str = '', presence: str = '', mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, same_strand: bool = False, transitivity: int = 0, window: int = 1, duplicate: int = 0, model: Model = None, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = False)#
Bases:
_BasicFeatures, _FuFamFeatures, _ModFuFeatures
Represents a functional unit: its functional and operational parameters, its structural constraints, and the functional units and families it contains. This class provides utilities to manage functional units, their relationships, and interactions.
This class is designed to encapsulate features essential for analyzing and validating relationships between functional sub-elements, families, and systems. It defines the functional unit’s behaviors, attributes, and operations.
- name#
Name of the functional unit.
- Type:
str
- presence#
Defines the presence rule (mandatory, accessory, forbidden, or neutral).
- Type:
str
- min_mandatory#
Minimum number of mandatory sub-elements.
- Type:
int
- min_total#
Minimum number of total sub-elements.
- Type:
int
- same_strand#
Specifies if sub-elements must be on the same strand.
- Type:
bool
- transitivity#
Size of the transitive closure used to build the graph.
- Type:
int
- window#
Number of neighboring genes considered in genomic searches.
- Type:
int
- duplicate#
Number of duplicates allowed.
- Type:
int
- exchangeable#
Set of exchangeable families.
- Type:
Set[str]
- multi_system#
Flag indicating if the unit can span multiple systems.
- Type:
bool
- multi_model#
Flag indicating if the unit can span multiple models.
- Type:
bool
- __init__(name: str = '', presence: str = '', mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, same_strand: bool = False, transitivity: int = 0, window: int = 1, duplicate: int = 0, model: Model = None, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = False)#
Initializes a functional unit with its name, presence rule, sub-elements, and structural constraints.
- Parameters:
name (str) – Name of the functional unit.
presence (str) – Defines the presence rule (mandatory, accessory, forbidden, or neutral).
mandatory (Set[FuncUnit, Family]) – Set of mandatory sub-elements.
accessory (Set[FuncUnit, Family]) – Set of accessory sub-elements.
forbidden (Set[FuncUnit, Family]) – Set of forbidden sub-elements.
neutral (Set[FuncUnit, Family]) – Set of neutral sub-elements.
min_mandatory (int) – Minimum number of mandatory sub-elements.
min_total (int) – Minimum number of total sub-elements.
same_strand (bool) – Specifies if sub-elements must be on the same strand.
transitivity (int) – Size of the transitive closure used to build the graph.
window (int) – Number of neighboring genes considered in genomic searches.
duplicate (int) – Number of duplicates allowed.
model (Model) – Model in which the functional unit is defined.
exchangeable (Set[str]) – Set of exchangeable families.
multi_system (bool) – Flag indicating if the unit can span multiple systems.
multi_model (bool) – Flag indicating if the unit can span multiple models.
- add(family: Family)#
Adds a family to one of the sets in the instance based on its presence attribute.
- Parameters:
family (Family) – The family to be added.
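Dispatching an element to a set according to its presence attribute can be sketched as follows. The four presence values are those listed for the class; the `Child` and `Container` classes, and the ValueError raised for an unknown presence, are assumptions made for this illustration rather than the library's behaviour.

```python
from dataclasses import dataclass, field
from typing import Set

PRESENCES = ("mandatory", "accessory", "forbidden", "neutral")


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for a family or functional unit."""
    name: str
    presence: str


@dataclass
class Container:
    """Hypothetical stand-in holding one set per presence value."""
    mandatory: Set[Child] = field(default_factory=set)
    accessory: Set[Child] = field(default_factory=set)
    forbidden: Set[Child] = field(default_factory=set)
    neutral: Set[Child] = field(default_factory=set)

    def add(self, child: Child) -> None:
        """Store child in the set named after its presence attribute."""
        if child.presence not in PRESENCES:
            # Assumption: unknown presence values are rejected.
            raise ValueError(f"unknown presence {child.presence!r}")
        getattr(self, child.presence).add(child)


unit = Container()
unit.add(Child("famA", "mandatory"))
unit.add(Child("famB", "neutral"))
```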
- check_func_unit()#
Check functional unit consistency.
- Raises:
Exception – If the functional unit is not consistent.
- duplicate_fam(filter_type: str = None)#
Generates items based on the specified filter type, if provided. This method uses an internal mechanism to yield results that match the filtering criteria.
- Parameters:
filter_type (str, optional) – The type of filter to apply. If None, no filtering is applied and all items are yielded.
- Yields:
Any – Items matching the filter criteria.
- property families: Generator[Family, None, None]#
Yields Family objects contained in the current object’s children.
- Yields:
Generator[Family, None, None] – A generator producing Family objects.
- families_names(presence: str = None)#
Returns a filtered list of family names based on the provided presence criteria.
- Parameters:
presence (str, optional) – Filter criteria for the presence of family members. Defaults to None.
- Returns:
list – A list of names matching the given presence criteria.
- get(name: str) Family#
Retrieves an instance of the ‘Family’ class using the provided name.
- Parameters:
name (str) – The name of the ‘Family’ instance to retrieve.
- Returns:
Family – The retrieved ‘Family’ instance.
- property model: Model#
Retrieves the parent model associated with this instance.
- Returns:
Model – The parent model object.
- read(data_fu: dict)#
Read functional unit.
- Parameters:
data_fu (dict) – Data JSON file of all functional units.
- property size: int#
Gets the size of the collection of families.
The size property calculates and returns the total number of families present in the collection.
- Returns:
int – The total number of families in the collection.
- class panorama.systems.models.Model(name: str = '', mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, transitivity: int = 0, window: int = 1, same_strand: bool = False, canonical: list = None)#
Bases:
_BasicFeatures, _ModFuFeatures
A Model class representing rules which describe a biological system.
This class provides an abstraction to handle a set of biological rules and their associated functional units and families. It supports operations to query various characteristics of the biological model, including its size, functional units, and families. The Model can also perform checks for consistency and parse data from a given dictionary for initialization and configuration.
- name#
Name of the instance.
- Type:
str
- min_mandatory#
Minimum number of mandatory functional units required.
- Type:
int
- min_total#
Minimum total number of functional elements required.
- Type:
int
- transitivity#
Transitivity value for relationships in functional elements.
- Type:
int
- window#
Window size used for analysis or traversal.
- Type:
int
- same_strand#
Indicates if the functional units should be on the same strand.
- Type:
bool
- canonical#
List containing canonical representations.
- Type:
list
- __init__(name: str = '', mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, transitivity: int = 0, window: int = 1, same_strand: bool = False, canonical: list = None)#
Initializes attributes of the class and sets up the properties related to the functional units.
- Parameters:
name (str) – Name of the instance.
mandatory (Set[FuncUnit, Family]) – Set of mandatory functional units or families.
accessory (Set[FuncUnit, Family]) – Set of accessory functional units or families.
forbidden (Set[FuncUnit, Family]) – Set of forbidden functional units or families.
neutral (Set[FuncUnit, Family]) – Set of neutral functional units or families.
min_mandatory (int) – Minimum number of mandatory functional units required.
min_total (int) – Minimum total number of functional elements required.
transitivity (int) – Transitivity value for relationships in functional elements.
window (int) – Window size used for analysis or traversal.
same_strand (bool) – Indicates if the functional units should be on the same strand.
canonical (list) – List containing canonical representations.
- add(func_unit: FuncUnit)#
Adds a functional unit to one of the sets in the instance based on its presence attribute.
- Parameters:
func_unit (FuncUnit) – The functional unit to be added.
- check_model()#
Validates the consistency of a model by invoking an internal check method.
- Raises:
Exception – Indicates a failure in model consistency, including the name of the model and the specific error message.
- duplicate_fu(filter_type: str = None)#
Duplicates items based on the specified filter type.
- Parameters:
filter_type (str, optional) – The type of filter to apply for duplicates. If None, no specific filter is applied.
- Yields:
Any – The duplicated items obtained from the filtering process.
- property families: Generator[Family, None, None]#
Gets a generator that yields Family objects from all functional units.
- Yields:
Generator[Family, None, None] – A generator that yields Family objects.
- property func_units: Generator[FuncUnit, None, None]#
Retrieves generator of functional units.
- Returns:
Generator[FuncUnit, None, None] – A generator yielding functional units from the collection of child elements.
- func_units_names(presence: str)#
Returns the names of child units based on their presence.
- Parameters:
presence (str) – The presence status used for filtering child units.
- Returns:
Any – A list or collection of child unit names filtered by presence.
- get(name: str) FuncUnit#
Retrieves a functional unit by its name.
- Parameters:
name (str) – The name of the functional unit to be retrieved.
- Returns:
FuncUnit – The functional unit matching the specified name, or any applicable type derived from FuncUnit.
- read(data_model: dict)#
Reads a data model dictionary and initializes or updates the object’s attributes.
- Parameters:
data_model (dict) – Dictionary containing the data model information. Must include the mandatory keys ‘name’, ‘parameters’, and ‘func_units’. Optionally, it may include ‘window’ and ‘canonical’ attributes.
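A quick check of the mandatory keys named above could look like the sketch below. Only the three top-level key names come from the parameter description; the helper `missing_keys` and the contents placed under ‘parameters’ and ‘func_units’ in the example are hypothetical.

```python
from typing import List

REQUIRED_KEYS = ("name", "parameters", "func_units")


def missing_keys(data_model: dict) -> List[str]:
    """Return the mandatory top-level keys absent from data_model."""
    return [key for key in REQUIRED_KEYS if key not in data_model]


# Hypothetical example: the nested layout is an assumption, not the real schema.
example_model = {
    "name": "example_system",
    "parameters": {"min_mandatory": 1, "min_total": 1},
    "func_units": [],
}
incomplete_model = {"name": "example_system"}
```

Calling `missing_keys(incomplete_model)` reports the two absent keys, which is the kind of situation a KeyError would signal when reading a model.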
- static read_model(data_model: dict) Model#
Reads a data model dictionary and initializes a Model object with it.
- Parameters:
data_model (dict) – A dictionary containing the data model to be read.
- Returns:
Model – An instance of the Model class populated with the provided data.
- property size: Tuple[int, int]#
Returns the size of the current object in terms of functional units and families.
- Returns:
Tuple[int, int] – A tuple where the first element is the number of functional units, and the second element is the number of families.
- class panorama.systems.models.Models(models: Set[Model] = None)#
Bases:
object
Represents a collection of models and provides interfaces for interaction.
This class is designed to manage a collection of Model objects. It provides functionality for iterating over the models, accessing model details, generating mappings between functional units and their models, and handling families associated with models. Additionally, it allows reading model configurations from JSON files and adding new models to the collection.
- __init__(models: Set[Model] = None)#
Initializes an instance of the class.
- Parameters:
models (Set[Model], optional) – A set of models to be used. Defaults to None.
- __iter__() Generator[Model, None, None]#
Yields Model instances from the internal model getter.
This method provides an iterator over the Model objects retrieved from self._model_getter. Each model instance is yielded one at a time.
- Yields:
Model – The next model instance in the sequence.
- add_model(model: Model)#
Adds a new model to the collection if it does not already exist.
This method allows adding a model to the collection, provided that no model with the same name exists in the current collection. If a model with the same name is already present, a KeyError will be raised. Before adding the model, it ensures the validity of the model by invoking its check_model method.
- Parameters:
model (Model) – The model instance to be added to the collection.
- Raises:
KeyError – If a model with the same name is already present in the collection.
Any exception raised by model.check_model() if the model is invalid.
- property families: Generator[Family, None, None]#
Provides a generator that yields unique Family instances from models.
This property iterates over all models, collects all unique Family instances associated with the models, and yields them one by one.
- Yields:
Generator[Family, None, None] – A generator of Family instances.
- families_to_model() Dict[Family, Model]#
Generates a mapping of Family objects to their corresponding Model objects.
Constructs a dictionary mapping each Family object in the families attribute to its associated Model object, derived from the model attribute of each Family.
- Returns:
Dict[Family, Model] – A dictionary where keys are Family objects and values are their corresponding Model objects.
- property func_units: Generator[FuncUnit, None, None]#
Gets a generator of distinct functional units (FuncUnit) across all models.
- Yields:
Generator[FuncUnit, None, None] – A generator that iterates over unique functional units found within all models.
- func_units_to_model() Dict[FuncUnit, Model]#
Creates a mapping between functional units and their corresponding models.
This method iterates through the collection of functional units and creates a dictionary where each functional unit is associated with its corresponding model. It provides a convenient structure to access models by their related functional units.
- Returns:
Dict[FuncUnit, Model] – A dictionary mapping functional units to their respective models.
- get_model(name: str) Model#
Retrieves a model from the internal collection based on its name.
This method attempts to fetch a model from an internal mapping using the provided name. If the name does not exist in the mapping, a KeyError is raised. On success, it returns the corresponding model.
- Parameters:
name (str) – The name of the model to retrieve.
- Returns:
Model – The model corresponding to the provided name.
- Raises:
KeyError – If the provided name does not exist in the internal mapping.
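The add and get_model contracts documented above can be sketched with a small name-keyed collection. ToyModel, ToyModels and the _model_getter attribute are hypothetical names used for illustration only, and the order of the duplicate check and the validity check is an assumption.

```python
class ToyModel:
    """Hypothetical stand-in for a Model: a name and a validity flag."""

    def __init__(self, name: str, valid: bool = True):
        self.name = name
        self.valid = valid

    def check_model(self) -> None:
        # Stand-in validation; the real checks are not described here.
        if not self.valid:
            raise ValueError(f"Model {self.name} is invalid")


class ToyModels:
    """Hypothetical collection that mirrors the documented behaviour."""

    def __init__(self):
        self._model_getter = {}

    def add(self, model: ToyModel) -> None:
        if model.name in self._model_getter:
            raise KeyError(f"A model named {model.name} is already present")
        model.check_model()  # any exception raised here reaches the caller
        self._model_getter[model.name] = model

    def get_model(self, name: str) -> ToyModel:
        return self._model_getter[name]  # KeyError for an unknown name


models = ToyModels()
models.add(ToyModel("toy_system"))
```

In this sketch an invalid model is never stored, since check_model runs before the assignment.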
- read(model_path: Path)#
Reads a model configuration from a JSON file, processes it, and adds the parsed model.
This function attempts to read the provided file path as a JSON file, extract model configuration data from it, and handle any parsing errors that may occur. Successful processing results in the extracted model being added to the system.
- Parameters:
model_path (Path) – Path to the configuration JSON file to be read.
- Raises:
KeyError – If one or more required keys are missing in the JSON file.
TypeError – If one or more attributes in the JSON file are not correctly structured.
ValueError – If one or more attributes have unacceptable values in the JSON file.
Exception – For any unexpected issues encountered while reading the JSON file.
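The following is a minimal sketch of the read-then-validate flow, not the actual implementation. The helper name read_model_file and the JSON keys name and func_units are assumptions made for the example; the real model schema is not described in this section.

```python
import json
import tempfile
from pathlib import Path


def read_model_file(model_path: Path, required_keys: set) -> dict:
    """Load a JSON file and check its top-level keys (hypothetical helper)."""
    with open(model_path) as handle:
        data = json.load(handle)
    missing = required_keys - set(data)
    if missing:
        raise KeyError(f"Missing required keys: {sorted(missing)}")
    if not isinstance(data["name"], str):
        raise TypeError("'name' must be a string")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_model.json"
    # Assumed minimal content; a real model file carries more fields.
    path.write_text(json.dumps({"name": "toy_system", "func_units": []}))
    loaded = read_model_file(path, {"name", "func_units"})
```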
- class panorama.systems.models._BasicFeatures(name: str = '', transitivity: int = 0, window: int = 1)#
Bases: object
Handles basic features typically required for configurations.
This class is used as a foundation for managing basic configurations, ensuring parameter consistency, and providing utility methods for its derived classes. It includes mechanisms for naming, transitivity, and contextual window definitions. The class also provides methods to override and validate parameters dynamically by integrating it with external sources or inheriting configurations from parent instances.
- name#
Name of the element.
- Type:
str
- transitivity#
Size of the transitive closure used to build the graph.
- Type:
int
- window#
Number of neighboring genes considered on each side of a gene of interest when searching for conserved genomic contexts.
- Type:
int
- __init__(name: str = '', transitivity: int = 0, window: int = 1)#
Initializes the class with specific attributes for name, transitivity, and window. These attributes are used to define the characteristics and behavior of an object of this class.
- Parameters:
name (str) – The name associated with the object. Defaults to an empty string.
transitivity (int) – An integer value representing the transitivity associated with the object. Defaults to 0.
window (int) – An integer that defines the window size or configuration parameter. Defaults to 1.
- __repr__()#
Provides a string representation of the object. This method is intended to provide a clear and concise human-readable representation of the object in the form of its class name and name attribute.
- Returns:
str – A string containing the class name and the object’s name attribute.
- __str__()#
Converts the object representation to its string form.
This method is used to return a string representation of the class instance, primarily for debugging or logging purposes. It includes the name of the class and the value of the name attribute.
- Returns:
str – A formatted string containing the class name and the name attribute.
- read_parameters(parameters: Dict[str, str | int | bool], param_keys: Set[str])#
Reads and assigns parameters from a provided dictionary or falls back to parent attributes if the parameter is not found.
- Parameters:
parameters (Dict[str, Union[str, int, bool]]) – A dictionary containing parameter key-value pairs.
param_keys (Set[str]) – The set of keys to retrieve from the parameters dictionary.
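The fallback described for read_parameters can be sketched as follows. ToyFeatures and its parent attribute are hypothetical; the sketch only assumes that a key absent from the dictionary is copied from the parent when one exists.

```python
class ToyFeatures:
    """Hypothetical element with a parent and two inheritable parameters."""

    def __init__(self, name, parent=None, transitivity=0, window=1):
        self.name = name
        self.parent = parent
        self.transitivity = transitivity
        self.window = window

    def read_parameters(self, parameters: dict, param_keys: set) -> None:
        for key in param_keys:
            if key in parameters:
                # Value given explicitly for this element.
                setattr(self, key, parameters[key])
            elif self.parent is not None:
                # Key not given locally: inherit the parent's value.
                setattr(self, key, getattr(self.parent, key))


model = ToyFeatures("toy_model", transitivity=2, window=5)
unit = ToyFeatures("toy_unit", parent=model)
unit.read_parameters({"window": 3}, {"transitivity", "window"})
```

Here unit keeps its own window of 3 and takes transitivity from its parent.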
- class panorama.systems.models._FuFamFeatures(presence: str = '', parent: FuncUnit | Model = None, duplicate: int = 0, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = True)#
Bases: object
Represents features related to functional family rules and their properties.
This class is used to define and manage the attributes and rules associated with a functional family. It includes properties such as presence types, duplication count, exchangeable families, and whether a family can be present in multiple systems or models.
- presence#
Type of the rule, such as mandatory, accessory, forbidden, or neutral.
- Type:
str
- duplicate#
Number of duplicates for the functional family.
- Type:
int
- exchangeable#
Set of exchangeable families that can replace one another in functionality.
- Type:
set
- multi_system#
Determines if the family can exist across multiple systems.
- Type:
bool
- multi_model#
Determines if the family can exist across multiple models.
- Type:
bool
- __init__(presence: str = '', parent: FuncUnit | Model = None, duplicate: int = 0, exchangeable: Set[str] = None, multi_system: bool = False, multi_model: bool = True)#
Initializes the instance of the class with given parameters.
- Parameters:
presence (str) – Describes the presence attribute, default is an empty string.
parent (Union[FuncUnit, Model]) – The parent object which can be of type FuncUnit or Model. Defaults to None.
duplicate (int) – Specifies the count of duplicates. Defaults to 0.
exchangeable (Set[str]) – A set of values indicating exchangeable items. Defaults to an empty set if not provided.
multi_system (bool) – Indicates if the functionality spans across multiple systems. Defaults to False.
multi_model (bool) – Indicates if the functionality spans across multiple models. Defaults to True.
- class panorama.systems.models._ModFuFeatures(mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, same_strand: bool = False)#
Bases: object
Represents the features of functional units or families in a model.
This class is responsible for managing sets of child elements categorized into mandatory, accessory, forbidden, and neutral groups. It ensures consistency by enforcing rules about the minimum number of mandatory elements, total elements, and whether the elements belong to the same strand. The class also provides methods to access, validate, and manipulate the functional units or families.
- min_mandatory#
Minimum number of mandatory child elements required.
- Type:
int
- min_total#
Minimum total number of child elements required.
- Type:
int
- same_strand#
Indicates whether all child elements must reside on the same strand.
- Type:
bool
- __init__(mandatory: Set[FuncUnit, Family] = None, accessory: Set[FuncUnit, Family] = None, forbidden: Set[FuncUnit, Family] = None, neutral: Set[FuncUnit, Family] = None, min_mandatory: int = 1, min_total: int = 1, same_strand: bool = False)#
Initializes an object with specified sets of functional units or families categorized as mandatory, accessory, forbidden, and neutral. It also allows configuration of constraints on minimum requirements and positional restrictions.
- Parameters:
mandatory (Set[FuncUnit, Family], optional) – A set of functional units or families that must be included. Defaults to an empty set if not specified.
accessory (Set[FuncUnit, Family], optional) – A set of functional units or families that are optional to include. Defaults to an empty set if not specified.
forbidden (Set[FuncUnit, Family], optional) – A set of functional units or families that must not be included. Defaults to an empty set if not specified.
neutral (Set[FuncUnit, Family], optional) – A set of functional units or families that are neutral in relevance. Defaults to an empty set if not specified.
min_mandatory (int, optional) – The minimum number of functional units or families that must be included from the mandatory set. Defaults to 1.
min_total (int, optional) – The minimum total number of functional units or families that must be included overall. Defaults to 1.
same_strand (bool, optional) – A flag indicating whether all included functional units or families must be on the same strand. Defaults to False.
- _check()#
Validates the configuration of mandatory and total required elements for a specific type. Ensures that the mandatory elements meet the minimum requirements, both in count and presence, and that the total elements conform to the defined minimum limits.
- Raises:
Exception – If the number of mandatory elements is lower than the required minimum.
Exception – If the number of total elements is lower than the required minimum.
Exception – If the minimum mandatory count exceeds the minimum total count.
Exception – If no mandatory elements are present.
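The four failure conditions listed for _check can be restated as a small function. This is a sketch under assumptions: check_minimums is a hypothetical name, and counting the total as mandatory plus accessory elements is a guess, since the text does not say which categories contribute to the total.

```python
def check_minimums(mandatory: set, accessory: set,
                   min_mandatory: int, min_total: int) -> None:
    """Hypothetical restatement of the documented consistency checks."""
    if not mandatory:
        raise Exception("No mandatory elements are present")
    if len(mandatory) < min_mandatory:
        raise Exception("Fewer mandatory elements than min_mandatory")
    # Assumption: only mandatory and accessory elements count toward the total.
    if len(mandatory) + len(accessory) < min_total:
        raise Exception("Fewer elements in total than min_total")
    if min_mandatory > min_total:
        raise Exception("min_mandatory exceeds min_total")


# A consistent configuration passes silently.
check_minimums({"famA", "famB"}, {"famC"}, min_mandatory=1, min_total=2)
```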
- _child_names(presence: str = None)#
Retrieves the names of child objects based on their presence status.
- Parameters:
presence (str, optional) – The presence category used to filter child objects (e.g., ‘mandatory’, ‘accessory’, ‘forbidden’, or ‘neutral’). If None, retrieves names of all child objects.
- Returns:
set – A set containing the names of child objects that match the given presence status, or all child names if presence is None.
- _duplicate(filter_type: str = None)#
Filters and yields child elements based on a specific filter type.
This function allows filtering of child elements according to a specified attribute type, such as ‘mandatory’, ‘accessory’, ‘forbidden’, or ‘neutral’. If no filter type is provided, it selects all available children. It iterates through the filtered children and yields those with a duplicate value of 1 or higher.
- Parameters:
filter_type (str, optional) – The type of filter to apply to the children. Acceptable values are ‘mandatory’, ‘accessory’, ‘forbidden’, ‘neutral’, or None. Defaults to None.
- Yields:
object – Each child element that matches the filter condition and has a duplicate value of 1 or higher.
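A generator with the filtering behaviour described above might look like the sketch below. ToyChild and duplicated are hypothetical names; only the presence categories and the threshold of 1 come from the text.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class ToyChild:
    """Hypothetical child element with a presence category and a duplicate count."""
    name: str
    presence: str
    duplicate: int = 0


def duplicated(children: List[ToyChild],
               filter_type: Optional[str] = None) -> Iterator[ToyChild]:
    """Yield children of the requested category whose duplicate is at least 1."""
    for child in children:
        if filter_type is not None and child.presence != filter_type:
            continue
        if child.duplicate >= 1:
            yield child


children = [
    ToyChild("famA", "mandatory", duplicate=1),
    ToyChild("famB", "accessory", duplicate=0),
    ToyChild("famC", "accessory", duplicate=2),
]
accessory_duplicates = [c.name for c in duplicated(children, "accessory")]
```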
- _mk_child_getter()#
Creates a dictionary for retrieving child objects by their names.
This method iterates over the _children attribute and maps each child’s name to the child object itself, storing these key-value pairs in the _child_getter dictionary. This allows efficient access to child objects later based on their names.
- add(child: FuncUnit | Family)#
Adds a child element to one of the sets in the instance based on its presence attribute. The child can be either a FuncUnit or a Family. The presence attribute of the child determines which set (mandatory, accessory, forbidden, or neutral) the child will be added to.
- property child_type#
Determines the consistent child type for the instance.
- Raises:
Exception – If the child type among children is inconsistent.
- Returns:
Any – The consistent child type of the instance.
- get(name: str) FuncUnit | Family#
Retrieves a child object associated with the given name.
- Parameters:
name (str) – The name of the child object to retrieve.
- Returns:
Union[FuncUnit, Family] – The child object associated with the given name.
- Raises:
KeyError – If no child object is found for the given name.
- panorama.systems.models.check_dict(data_dict: Dict[str, str | int | list | Dict[str, int]], mandatory_keys: Set[str], param_keys: Set[str] = None) None#
Performs validation of the provided dictionary by checking for required keys, key types, value types, and specific constraints based on the content of the dictionary.
This function takes a data dictionary and validates its structure and contents against the provided required keys and optional parameter keys. Validation checks include ensuring required keys exist, verifying data types for key values, and matching key values against accepted criteria. Raises specific exceptions if any checks fail.
- Parameters:
data_dict (Dict[str, Union[str, int, list, Dict[str, int]]]) – The dictionary that needs to be validated.
mandatory_keys (Set[str]) – The set of keys that must be present in the dictionary.
param_keys (Set[str], optional) – An optional set of keys required in the ‘parameters’ field of the dictionary, if present. Defaults to an empty set.
- Raises:
KeyError – If required top-level keys are missing, or unexpected keys are found in the dictionary.
TypeError – If a field contains a value of an unexpected type.
ValueError – If a field contains a value that does not meet the required constraints.
Exception – If any unexpected issues occur during the validation process.
- panorama.systems.models.check_key(data: Dict, required_keys: Set[str]) None#
Validates that all required keys are present in the given dictionary.
- Parameters:
data (Dict) – The dictionary to validate.
required_keys (Set[str]) – Set of keys that must be present in the dictionary.
- Raises:
KeyError – If any required keys are missing from the dictionary.
- panorama.systems.models.check_parameters(param_dict: Dict[str, int], mandatory_keys: Set[str]) None#
Validates the parameters in the given dictionary against the required keys and their rules.
This function ensures that all the required keys are present in the dictionary and that their values adhere to the specified type and value constraints. If any rule is violated, an exception is raised. The function is designed to handle specific parameter keys with associated rules, raising meaningful errors for incorrect usage.
- Parameters:
param_dict (Dict[str, int]) – The dictionary containing parameter keys and their values.
mandatory_keys (Set[str]) – The set of keys that are required to be present in param_dict.
- Raises:
KeyError – If one or more required keys are missing from param_dict or an unexpected parameter key is found.
TypeError – If a parameter value in param_dict does not match the expected type.
ValueError – If a parameter value in param_dict does not meet the defined constraints.
Exception – If an unexpected error occurs during parameter validation.
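The three error classes documented for check_parameters can be illustrated with a small validator. Everything specific in this sketch is an assumption: the function name toy_check_parameters, the allowed key names (borrowed from the transitivity and window attributes of _BasicFeatures), and the rule that values must be non-negative integers.

```python
def toy_check_parameters(param_dict: dict, mandatory_keys: set) -> None:
    """Hypothetical validator that raises KeyError, TypeError or ValueError."""
    allowed = {"transitivity", "window"}  # assumed set of known parameter keys
    missing = mandatory_keys - set(param_dict)
    if missing:
        raise KeyError(f"Missing parameter keys: {sorted(missing)}")
    unexpected = set(param_dict) - allowed
    if unexpected:
        raise KeyError(f"Unexpected parameter keys: {sorted(unexpected)}")
    for key, value in param_dict.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key} must be an integer")
        if value < 0:  # assumed constraint
            raise ValueError(f"{key} must not be negative")


toy_check_parameters({"transitivity": 1, "window": 2}, {"transitivity", "window"})
```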
panorama.systems.parameter_validation_utils module#
panorama.systems.system module#
This module defines classes for representing biological systems detected in pangenomes, including System, SystemUnit, and ClusterSystems. It provides methods for managing system units, gene families, modules, spots, and regions, as well as utilities for comparing, merging, and clustering systems across multiple pangenomes.
- class panorama.systems.system.ClusterSystems(identifier: int, *systems: System)#
Bases: object
Represents a cluster of systems that are considered homologous or functionally equivalent across different pangenomes.
- ID#
Unique identifier for the cluster.
- Type:
int
- _systems_getter#
Dictionary mapping (pangenome name, system ID) to System instances.
- Type:
Dict[Tuple[str, str], System]
- __getitem__(key: Tuple[str, str]) System#
Retrieves a system by its (pangenome name, system ID) key.
- Parameters:
key (Tuple[str, str]) – The lookup key.
- Returns:
System – The system corresponding to the key.
- Raises:
KeyError – If the key does not exist.
- __init__(identifier: int, *systems: System)#
Initializes a ClusterSystems object and adds provided systems.
- Parameters:
identifier (int) – The identifier for the system cluster.
*systems (System) – Variable number of systems to add initially.
- __setitem__(key: Tuple[str, str], value: System)#
Inserts a system into the cluster.
- Parameters:
key (Tuple[str, str]) – Tuple containing (pangenome name, system ID).
value (System) – The system instance to insert.
- Raises:
KeyError – If the key already exists in the cluster.
- add(system: System) None#
Adds a system to the cluster.
- Parameters:
system (System) – The system to add.
- Raises:
AssertionError – If the input is not a System instance.
KeyError – If a system with the same key already exists in the cluster.
- get(system_id: str, pangenome_name: str) System#
Retrieves a system from the cluster using its system ID and pangenome name.
- Parameters:
system_id (str) – ID of the system.
pangenome_name (str) – Name of the pangenome.
- Returns:
System – The system object.
- Raises:
AssertionError – If system_id is not a string.
KeyError – If the system is not found in the cluster.
- pangenomes() List[str]#
Returns the list of all pangenome names represented in the cluster.
- Returns:
List[str] – List of pangenome names.
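The container behaviour of ClusterSystems can be sketched with stand-in classes. ToySystem, ToyCluster and the pangenome_name attribute are hypothetical; the sketch only reproduces what the entries above state, namely that systems are keyed by (pangenome name, system ID) and that get takes its arguments in the opposite order.

```python
class ToySystem:
    """Hypothetical system that knows its ID and its pangenome name."""

    def __init__(self, system_id: str, pangenome_name: str):
        self.ID = system_id
        self.pangenome_name = pangenome_name


class ToyCluster:
    """Hypothetical cluster keyed by (pangenome name, system ID)."""

    def __init__(self, identifier: int, *systems: ToySystem):
        self.ID = identifier
        self._systems_getter = {}
        for system in systems:
            self.add(system)

    def __setitem__(self, key, value):
        if key in self._systems_getter:
            raise KeyError(f"System {key} is already in the cluster")
        self._systems_getter[key] = value

    def __getitem__(self, key):
        return self._systems_getter[key]

    def add(self, system: ToySystem) -> None:
        self[(system.pangenome_name, system.ID)] = system

    def get(self, system_id: str, pangenome_name: str) -> ToySystem:
        # Note the argument order: ID first, pangenome name second.
        return self[(pangenome_name, system_id)]

    def pangenomes(self):
        return sorted({name for name, _ in self._systems_getter})


cluster = ToyCluster(1, ToySystem("1", "pangenome_A"), ToySystem("1", "pangenome_B"))
```

Two systems may share the same ID as long as they come from different pangenomes, which is why the pangenome name is part of the key.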
- class panorama.systems.system.System(model: Model, source: str, system_id: str | int = None, units: Set[SystemUnit] = None)#
Bases: MetaFeatures
Represents a biological system detected in a pangenome, composed of one or more functional units.
A System groups gene families that co-occur according to a defined model. It manages its internal system units (instances of SystemUnit), maintains references to organisms, gene families, regions, modules, and annotation sources, and supports set-based operations such as union, intersection, and inclusion with other systems.
- ID#
Unique identifier for the system. Automatically generated if not provided.
- Type:
str
- source#
Source of the annotation or prediction (e.g., experimental, inferred).
- Type:
str
- cluster_id#
Identifier used to group homologous systems across pangenomes.
- Type:
Optional[int]
- __delitem__(name: str)#
Removes a system unit by its name.
- Parameters:
name (str) – The name of the unit to remove.
- Raises:
KeyError – If no unit with the given name exists.
- __eq__(other: System) bool#
Compares this system to another for structural equality.
- Parameters:
other (System) – Another system to compare with.
- Returns:
bool – True if systems are structurally identical.
- Raises:
TypeError – If other is not a System.
- __getitem__(name: str) SystemUnit#
Retrieves a system unit by its name.
- Parameters:
name (str) – The name of the unit.
- Returns:
SystemUnit – The system unit corresponding to the name.
- Raises:
KeyError – If no unit with the given name exists.
- __hash__() int#
Computes a hash based on the set of system units.
The hash includes unit names and their content but excludes the system ID to ensure consistent hashing across different launches.
- Returns:
int – A hash representing the system units and their content.
- __init__(model: Model, source: str, system_id: str | int = None, units: Set[SystemUnit] = None)#
Initializes a System object with a model and optional functional units.
- Parameters:
model (Model) – The model defining the structure and required components of the system.
source (str) – Source of the system annotation (e.g., predicted, curated).
system_id (Union[str, int], optional) – Unique identifier for the system. If not provided, an incremental ID is assigned automatically.
units (Set[SystemUnit], optional) – Optional set of system units (functional components) to initialize the system with.
- Raises:
TypeError – If any unit in units is not an instance of SystemUnit.
- __len__() int#
Returns the number of units in the system.
- Returns:
int – Number of system units.
- __repr__() str#
Returns a human-readable representation of the system.
- Returns:
str – A summary string containing the system ID and model name.
- __setitem__(name: str, unit: SystemUnit)#
Adds a system unit to the system under a given name.
- Parameters:
name (str) – Name under which to register the unit.
unit (SystemUnit) – The unit to register.
- Raises:
TypeError – If unit is not an instance of SystemUnit.
KeyError – If another unit with the same name already exists and differs.
- _mk_fam2unit()#
Internal method to create a mapping from gene family to system unit.
- add_canonical(system: System)#
Adds a canonical system to this instance. If a similar canonical system already exists, merges or replaces it.
- Parameters:
system (System) – Canonical system to incorporate.
- add_unit(unit: SystemUnit)#
Adds a system unit to the system, replacing it if it is a superset.
- Parameters:
unit (SystemUnit) – The unit to add.
- Raises:
AssertionError – If the provided unit is not a SystemUnit.
- annotation_sources() Set[str]#
Collects all annotation sources used in the system.
- Returns:
Set[str] – Unique annotation source identifiers.
- canonical_models() List[str]#
Returns the canonical model names associated with this system.
- Returns:
List[str] – Canonical model names.
- property families: Generator[GeneFamily, None, None]#
Retrieves all gene families associated with the system.
- Returns:
Generator[GeneFamily, None, None] – All gene families from all units.
- get_metainfo(gene_family: GeneFamily) Tuple[str, int]#
Retrieves metadata associated with a gene family (e.g., annotation source).
- Parameters:
gene_family (GeneFamily) – The gene family to query.
- Returns:
Tuple[str, int] – Annotation source and metadata ID.
- get_module(identifier: int) Module#
Retrieves a module by its unique identifier.
- Parameters:
identifier (int) – Module ID.
- Returns:
Module – The corresponding module.
- Raises:
KeyError – If not found.
- get_region(name: str) Region#
Retrieves a region by its name.
- Parameters:
name (str) – Region name.
- Returns:
Region – Matching region.
- Raises:
KeyError – If the region is not associated.
- get_spot(identifier: int) Spot#
Retrieves a spot by its unique identifier.
- Parameters:
identifier (int) – Spot ID.
- Returns:
Spot – Corresponding spot.
- Raises:
KeyError – If not found.
- get_unit(name: str) SystemUnit#
Fetches a unit by name.
- Parameters:
name (str) – Name of the system unit.
- Returns:
SystemUnit – The corresponding unit.
- intersection(other: System) Set[SystemUnit]#
Computes the common units between this system and another.
- Parameters:
other (System) – Another system to intersect with.
- Returns:
Set[SystemUnit] – Units shared between both systems.
- Raises:
TypeError – If other is not a System.
- is_subset(other: System) bool#
Checks if this system is fully contained in another.
- Parameters:
other (System) – System to compare against.
- Returns:
bool – True if self is a subset of the other.
- Raises:
TypeError – If other is not a System.
- is_superset(other: System) bool#
Checks if this system contains all units of another.
- Parameters:
other (System) – System to compare against.
- Returns:
bool – True if self is a superset of the other.
- Raises:
TypeError – If other is not a System.
- merge(other: System)#
Merges another system into this one by unifying their units.
- Parameters:
other (System) – The system to merge into this one.
- Raises:
TypeError – If other is not a System.
- property model_families: Generator[GeneFamily, None, None]#
Retrieves all gene families defined by the model.
- Returns:
Generator[GeneFamily, None, None] – Model gene families.
- property model_organisms: Generator[Organism, None, None]#
Retrieves organisms matching model gene family requirements.
- Returns:
Generator[Organism, None, None] – Organisms satisfying the system’s model constraints.
- property modules: Generator[Module, None, None]#
Retrieves all modules associated with the system.
- Returns:
Generator[Module, None, None] – Modules involved in the system.
- property name: str#
Return the name of the system unit, as defined by its associated functional unit.
- Returns:
str – The name of the system unit.
- property number_of_families#
Computes the total number of gene families in the system.
- Returns:
int – Count of gene families across all units.
- property number_of_model_gene_families#
Computes the total number of model gene families in the system.
- Returns:
int – Count of model gene families across all units.
- property organisms: Generator[Organism, None, None]#
Retrieves all organisms where gene families from the system were found.
- Returns:
Generator[Organism, None, None] – Organisms represented in the system.
- property regions: Generator[Region, None, None]#
Retrieves all genomic regions associated with the system.
- Returns:
Generator[Region, None, None] – Regions where system units were found.
- property spots: Generator[Spot, None, None]#
Retrieves all genomic spots linked to the system.
- Returns:
Generator[Spot, None, None] – All spots.
- property units: Generator[SystemUnit, None, None]#
Return a generator yielding all SystemUnit instances contained in the system.
- Yields:
SystemUnit – Each unit in the system.
- class panorama.systems.system.SystemUnit(functional_unit: FuncUnit, source: str, gene_families: Set[GeneFamily] = None, families_to_metainfo: Dict[GeneFamily, Tuple[str, int]] = None)#
Bases: MetaFeatures
Represents a functional unit detected in a pangenome system, associating a FuncUnit model with gene families, annotation sources, and metadata. Provides methods for managing gene families, modules, spots, and regions, as well as set operations and metadata retrieval for gene families within the unit.
- ID#
Identifier of the system unit object.
- Type:
int
- source#
Source of the functional unit.
- Type:
str
- __eq__(other: SystemUnit) bool#
Determine if two SystemUnit instances are equal by comparing their sets of model gene families.
- Parameters:
other (SystemUnit) – The other SystemUnit to compare with.
- Returns:
bool – True if both units have the same model gene families, False otherwise.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- __getitem__(name: str) GeneFamily#
Retrieve a GeneFamily by its name from the system unit.
- Parameters:
name (str) – The name of the gene family to retrieve.
- Returns:
GeneFamily – The gene family associated with the given name.
- Raises:
KeyError – If no gene family with the specified name exists in the system unit.
- __hash__() int#
Compute a hash value for the SystemUnit based on its gene families and annotation sources.
The hash includes gene family names and their annotation sources but excludes the ID to ensure consistent hashing across different launches.
- Returns:
int – The hash value representing the gene families and their annotation sources.
- __init__(functional_unit: FuncUnit, source: str, gene_families: Set[GeneFamily] = None, families_to_metainfo: Dict[GeneFamily, Tuple[str, int]] = None)#
Initialize a SystemUnit instance representing a functional unit detected in a pangenome.
- Parameters:
functional_unit (FuncUnit) – The FuncUnit model associated with this system unit.
source (str) – Source of the functional unit.
gene_families (Set[GeneFamily], optional) – Set of gene families in the system unit.
families_to_metainfo (Dict[GeneFamily, Tuple[str, int]], optional) – Mapping of gene families to their annotation source and metadata ID.
- __len__()#
Returns the number of gene families in the functional unit.
- Returns:
int – Number of gene families.
- __repr__()#
Return a string representation of the SystemUnit, including its ID, name, and associated model name.
- Returns:
str – String representation of the SystemUnit.
- __setitem__(name: str, family: GeneFamily)#
Assigns a GeneFamily to the system by name.
- Parameters:
name (str) – The name of the gene family.
family (GeneFamily) – The GeneFamily instance to assign.
- Raises:
TypeError – If the provided family is not a GeneFamily instance.
KeyError – If a different GeneFamily with the same name already exists.
- _asso_modules()#
Associates modules to the unit based on the gene families present.
- _get_model_families() Set[GeneFamily]#
Return the set of gene families in this SystemUnit that are associated with a nonzero metadata ID.
- Returns:
Set[GeneFamily] – Set of gene families with a nonzero metadata identifier.
- _make_spot_getter()#
Creates and populates the spot getter with spots associated to the unit, either from regions or gene families.
- add_family(gene_family: GeneFamily, annotation_source: str = '', metadata_id: int = 0)#
Adds a GeneFamily to the system and associates it with annotation source and metadata ID.
- Parameters:
gene_family (GeneFamily) – The gene family to add.
annotation_source (str, optional) – The source of the annotation. Defaults to “”.
metadata_id (int, optional) – The metadata identifier. Defaults to 0.
- Raises:
AssertionError – If gene_family is not an instance of GeneFamily.
- add_module(module: Module)#
Associate a module with this unit.
- Parameters:
module (Module) – The module to add.
- Raises:
Exception – If a different module with the same identifier is already associated with this unit.
- add_region(region: Region)#
Adds a region to the unit.
- Parameters:
region (Region) – The region to add.
- Raises:
Exception – If a different region with the same identifier is already associated with the unit.
- add_spot(spot: Spot)#
Adds a spot to the unit.
- Parameters:
spot (Spot) – The spot to add.
- Raises:
Exception – If a different spot with the same identifier is already associated with the unit.
- annotation_sources() Set[str]#
Retrieve a set of unique annotation source names from the system.
- Returns:
Set[str] – A set containing all non-empty annotation source names.
- difference(other: SystemUnit) Set[GeneFamily]#
Return the set of gene families present in this SystemUnit but not in another.
- Parameters:
other (SystemUnit) – The SystemUnit to compare against.
- Returns:
Set[GeneFamily] – Gene families unique to this unit.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- property families: Generator[GeneFamily, None, None]#
Returns a generator yielding all gene families associated with this system unit.
- Yields:
GeneFamily – Each gene family in the unit.
- property functional_unit: FuncUnit#
Return the Functional unit model associated with this system unit.
- Returns:
FuncUnit – The functional unit model.
- Raises:
AttributeError – If the functional unit is not set.
- get_metainfo(gene_family: GeneFamily) Tuple[str, int]#
Return the annotation source and metadata ID associated with the given gene family.
- Parameters:
gene_family (GeneFamily) – The gene family for which to retrieve metadata.
- Returns:
Tuple[str, int] – A tuple containing the annotation source and metadata identifier.
- get_module(identifier: int) Module#
Retrieve the module associated with the given identifier.
- Parameters:
identifier (int) – The identifier of the module to retrieve.
- Returns:
Module – The module corresponding to the given identifier.
- Raises:
KeyError – If no module with the specified identifier is associated with this unit.
- get_region(name: str) Region#
Retrieves a region by its name.
- Parameters:
name (str) – Name of the region.
- Returns:
Region – The region with the given name.
- Raises:
KeyError – If the region is not associated with the unit.
- get_spot(identifier: int) Spot#
Retrieves a spot by its identifier.
- Parameters:
identifier (int) – Identifier of the spot.
- Returns:
Spot – The spot with the given identifier.
- Raises:
KeyError – If the spot is not associated with the unit.
- hash_content() str#
Generate string content for hashing based on gene families and annotation sources.
- Returns:
str – String representation of the content to be hashed.
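Python salts the built-in hash of str objects per process unless PYTHONHASHSEED is fixed, so a value that must stay the same across launches has to be derived differently. The sketch below shows one possible approach, a digest over a canonical content string; it is not necessarily how SystemUnit does it. The names toy_hash_content and stable_hash, the name:source format, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def toy_hash_content(families_to_source: dict) -> str:
    """Build a canonical string from family names and annotation sources."""
    # Sorting makes the result independent of insertion order.
    pairs = sorted(families_to_source.items())
    return ";".join(f"{name}:{source}" for name, source in pairs)


def stable_hash(content: str) -> int:
    """Derive an integer from the content that is identical in every process."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return int(digest, 16)


first = stable_hash(toy_hash_content({"famA": "source1", "famB": "source2"}))
second = stable_hash(toy_hash_content({"famB": "source2", "famA": "source1"}))
```

No identifier enters the content string, so two units with the same families and sources hash to the same value regardless of when they were created.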
- intersection(other: SystemUnit) Set[GeneFamily]#
Return the set of model gene families common to both this SystemUnit and another.
- Parameters:
other (SystemUnit) – The other SystemUnit to compare with.
- Returns:
Set[GeneFamily] – Set of gene families present in both units.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- is_subset(other: SystemUnit)#
Check if this SystemUnit is a subset of another, i.e., all of its model gene families are present in the other unit.
- Parameters:
other (SystemUnit) – The unit to compare against.
- Returns:
bool – True if this unit is a subset of the other, False otherwise.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- is_superset(other: SystemUnit)#
Check if this SystemUnit is a superset of another, i.e., contains all model gene families of the other unit.
- Parameters:
other (SystemUnit) – The unit to compare against.
- Returns:
bool – True if all model gene families in ‘other’ are present in this unit, False otherwise.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
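The subset and superset tests operate on model gene families, which this reference describes as the families associated with a nonzero metadata ID. A minimal sketch with plain dictionaries, where the family names and the helper model_families are hypothetical:

```python
def model_families(families_to_metainfo: dict) -> set:
    """Keep only the families whose metadata ID is nonzero."""
    return {
        family
        for family, (_source, metadata_id) in families_to_metainfo.items()
        if metadata_id != 0
    }


# famX has metadata ID 0, so it is not a model family.
unit_small = {"famA": ("source1", 1), "famX": ("", 0)}
unit_large = {"famA": ("source1", 1), "famB": ("source1", 2)}

small_is_subset = model_families(unit_small) <= model_families(unit_large)
large_is_superset = model_families(unit_large) >= model_families(unit_small)
```

In this sketch unit_small counts as a subset of unit_large even though famX is absent from unit_large, because famX is not a model family and is ignored by the comparison.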
- merge(other: SystemUnit)#
Merge another SystemUnit into this one by adding gene families present in the other unit but not in this unit.
- Parameters:
other (SystemUnit) – The SystemUnit to merge with.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- property model: Model#
Return the Model instance associated with the functional unit of this system unit.
- Returns:
Model – The model in which the functional unit is defined.
- property model_families: Generator[GeneFamily, None, None]#
Return a generator yielding all gene families in this SystemUnit that are associated with a nonzero metadata ID.
- Yields:
GeneFamily – Each gene family described in the model.
- property model_organisms: Generator[Organism, None, None]#
Return a generator yielding all unique Organism instances present in at least min_total model gene families within this SystemUnit.
- Yields:
Organism – Each organism meeting the minimum model family presence threshold.
Note
Considers only gene families associated with a nonzero metadata ID (model families). Attempts to use a matrix approach for efficient computation. TODO: Try to use organisms bitarray for optimization.
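The threshold rule for model_organisms can be illustrated by counting, for each organism, how many model families it carries. This is a sketch under assumptions: the mapping `family_to_organisms` and the function name are hypothetical, and it uses a simple counter rather than the matrix approach mentioned in the note.

```python
from collections import Counter


def organisms_with_min_families(family_to_organisms, min_total):
    """Return organisms present in at least min_total model families."""
    counts = Counter()
    for organisms in family_to_organisms.values():
        # set() guards against an organism being listed twice for one family.
        counts.update(set(organisms))
    return {org for org, n in counts.items() if n >= min_total}


families = {
    "famA": {"org1", "org2"},
    "famB": {"org1", "org3"},
    "famC": {"org1", "org2"},
}
result = organisms_with_min_families(families, min_total=2)
```

Here `org1` carries three families and `org2` carries two, so both pass a threshold of 2, while `org3` (one family) does not.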
- property modules: Generator[Module, None, None]#
Generator that yields all modules associated with the unit.
- Yields:
Module – Each module associated with the unit.
- property name: str#
Returns the name of the system unit, as defined by its associated functional unit.
- Returns:
str – Name of the system unit.
- property nb_model_families: int#
Return the number of unique gene families in this SystemUnit that are associated with a nonzero metadata ID.
- Returns:
int – Number of distinct model-associated gene families.
- property nb_organisms: int#
Return the number of unique Organism instances associated with the gene families in this SystemUnit.
- Returns:
int – Number of distinct organisms linked to the unit.
- property organisms: Generator[Organism, None, None]#
Return a generator yielding all unique Organism instances associated with the gene families in this SystemUnit.
- Yields:
Organism – Each unique organism linked to any gene family in the unit.
- property regions: Generator[Region, None, None]#
Retrieves the regions associated with the unit.
- Yields:
Region – Each region associated with the unit.
- property spots: Generator[Spot, None, None]#
Retrieves the spots associated with the unit.
- Yields:
Spot – Each spot associated with the unit.
- symmetric_difference(other: SystemUnit) Set[GeneFamily]#
Return the set of gene families that are present in exactly one of this SystemUnit or another.
- Parameters:
other (SystemUnit) – The SystemUnit to compare with.
- Returns:
Set[GeneFamily] – Gene families unique to either this unit or the other.
- Raises:
TypeError – If ‘other’ is not a SystemUnit instance.
- panorama.systems.system.check_instance_of_system(method)#
Decorator to ensure that a provided argument is an instance of SystemUnit.
- Parameters:
method (Callable) – The method to be wrapped in type-checking functionality.
- Returns:
Callable – The wrapped method with type-checking functionality added.
- Raises:
TypeError – If the other argument passed to the wrapped method is not an instance of SystemUnit.
- panorama.systems.system.check_instance_of_system_unit(method)#
Decorator to ensure that a provided argument is an instance of SystemUnit.
- Parameters:
method (Callable) – The method to be wrapped in type-checking functionality.
- Returns:
Callable – The wrapped method with type-checking functionality added.
- Raises:
TypeError – If the other argument passed to the wrapped method is not an instance of SystemUnit.
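A type-checking decorator of this kind can be sketched as follows. The names `check_instance_of_demo_unit` and `DemoUnit` are hypothetical and chosen for illustration; the actual panorama decorators may differ in their details.

```python
from functools import wraps


def check_instance_of_demo_unit(method):
    """Wrap a method so that its 'other' argument must be a DemoUnit."""

    @wraps(method)  # keep the wrapped method's name and docstring
    def wrapper(self, other, *args, **kwargs):
        # DemoUnit is looked up at call time, so it can be defined below.
        if not isinstance(other, DemoUnit):
            raise TypeError(f"expected DemoUnit, got {type(other).__name__}")
        return method(self, other, *args, **kwargs)

    return wrapper


class DemoUnit:
    def __init__(self, families):
        self.families = set(families)

    @check_instance_of_demo_unit
    def intersection(self, other):
        return self.families & other.families


shared = DemoUnit({"famA", "famB"}).intersection(DemoUnit({"famB"}))
```

Centralizing the check in a decorator keeps each comparison method free of repeated `isinstance` boilerplate.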
panorama.systems.systems_association module#
This module provides functionality for associating systems with other pangenome elements such as RGPs (Regions of Genomic Plasticity), spots, and modules.
The module creates correlation matrices and visualizations to analyze the relationships between systems and various pangenome components.
- class panorama.systems.systems_association.AssociationVisualizationBuilder(association: str, name: str, output_dir: Path, formats: List[str] | None = None)#
Bases:
VisualizationBuilder
Builder for correlation matrix visualizations between systems and genomic associations.
This class creates comprehensive visualizations showing correlations between pangenome systems and various genomic elements (RGPs, modules, etc.), including coverage and frequency plots.
- association#
Type of genomic association being visualized (e.g., ‘rgp’, ‘module’)
- __init__(association: str, name: str, output_dir: Path, formats: List[str] | None = None)#
Initialize the association visualization builder.
- Parameters:
association – Type of pangenome object being visualized (e.g., ‘rgp’, ‘module’)
name – Name of the pangenome for visualization titles
output_dir – Directory path where output files will be saved
formats – List of output formats to generate
- _configure_plot_style() None#
Configure plot styling specific to association visualizations.
Extends the base styling with association-specific axis labels.
- _create_metric_plot(data_df: DataFrame, x_range: FactorRange, metric_name: str, color_palette: List[str], title: str) Tuple[figure, figure]#
Create a generic metric visualization plot (coverage or frequency).
Creates a horizontal strip visualization with an associated color bar to show metric values across genomic elements.
- Parameters:
data_df – DataFrame containing the metric data
x_range – X-axis range for consistent ordering
metric_name – Name of the metric column in the DataFrame
color_palette – Color palette to use for the visualization
title – Title for the color bar
- Returns:
Tuple of (metric_plot, color_bar_plot)
- create_bar_plots(correlation_matrix: DataFrame) None#
Create bar plots showing system and element counts.
Creates both left (system counts) and top (element counts) bar plots to provide marginal summaries of the correlation matrix.
- Parameters:
correlation_matrix – Preprocessed correlation matrix
- create_color_bar(title: str) None#
Create a color bar for the correlation matrix.
- Parameters:
title – Title to display on the color bar
- static create_color_palette(max_value: int) List[str]#
Create an appropriate color palette based on the maximum correlation value.
The palette selection adapts to the data range to provide optimal visual discrimination between different correlation values.
- Parameters:
max_value – Maximum correlation value in the matrix
- Returns:
List of color hex codes for the palette, starting with white for zero values
- create_coverage_plot(coverage_df: DataFrame, x_range: FactorRange) None#
Create a coverage visualization plot.
Coverage represents how well each genomic element is covered by the systems, displayed as a horizontal strip below the main heatmap.
- Parameters:
coverage_df – DataFrame containing coverage data with coverage values
x_range – X-axis range for consistent ordering with main plot
- create_frequency_plot(frequency_df: DataFrame, x_range: FactorRange) None#
Create a frequency visualization plot.
Frequency represents how often each genomic element appears across genomes, displayed as a horizontal strip below the main heatmap.
- Parameters:
frequency_df – DataFrame containing frequency data with frequency values
x_range – X-axis range for consistent ordering with main plot
- create_main_figure(correlation_matrix: DataFrame, x_range: FactorRange, y_range: FactorRange) None#
Create the main correlation matrix heatmap figure.
- Parameters:
correlation_matrix – Preprocessed correlation matrix with systems as rows and associations as columns
x_range – X-axis range for consistent ordering across plots
y_range – Y-axis range for consistent ordering across plots
- plot() None#
Create and save the complete association visualization layout.
Arranges all components (main heatmap, bar plots, color bars, and metric plots) in a grid layout and saves the result in the specified formats.
- panorama.systems.systems_association._get_element_frequency(element: Spot | Module, system_organisms: Set, pangenome: Pangenome) float#
Calculate the frequency of a Spot or Module element across organisms.
- Parameters:
element – The Spot or Module element.
system_organisms – Set of organisms associated with systems.
pangenome – The pangenome containing organism information.
- Returns:
The frequency of the element among organisms.
- panorama.systems.systems_association._get_region_frequency(region: Region, pangenome: Pangenome) float#
Calculate the frequency of a Region element across all organisms.
- Parameters:
region – The Region element for frequency calculation.
pangenome – The pangenome containing organism information.
- Returns:
The frequency of the Region in the pangenome.
Note
TODO: This implementation needs to be fixed to properly calculate region frequency across organisms.
- panorama.systems.systems_association.create_coverage_dataframe(element_to_systems: Dict[Region | Spot | Module, Set[System]], pangenome: Pangenome | None = None) DataFrame#
Create a DataFrame describing coverage of systems by pangenome elements.
- Parameters:
element_to_systems – Dictionary mapping pangenome elements to system sets.
pangenome – Optional pangenome object for frequency calculations.
- Returns:
DataFrame with coverage and frequency information.
- panorama.systems.systems_association.create_pangenome_system_associations(pangenome: Pangenome, associations: List[str], output_dir: Path, output_formats: List[str] | None = None, threads: int = 1, disable_bar: bool = False)#
Create and save associations between systems and pangenome elements.
This function generates association matrices, coverage analysis, and visualizations for the relationships between systems and various pangenome components (RGPs, spots, modules).
- Parameters:
pangenome – The pangenome containing systems and other elements.
associations – List of pangenome elements to associate with systems. Valid options: [‘RGPs’, ‘spots’, ‘modules’]
output_dir – Directory where output files will be saved.
output_formats – List of output formats for visualizations. Valid options: [‘html’, ‘png’]. Defaults to [‘html’].
threads – Number of threads for parallel processing. Defaults to 1.
disable_bar – Whether to disable the progress bar display. Defaults to False.
- Raises:
ValueError – If invalid association types are provided.
FileNotFoundError – If the output directory doesn’t exist.
- panorama.systems.systems_association.get_association_dataframes(pangenome: Pangenome, associations: List[str], threads: int = 1, disable_progress_bar: bool = False) Tuple[DataFrame, DataFrame, DataFrame, DataFrame]#
Generate DataFrames for system-pangenome element associations.
- Parameters:
pangenome – Pangenome containing systems and elements.
associations – List of pangenome elements to associate with systems.
threads – Number of threads for parallel processing.
disable_progress_bar – Whether to disable the progress bar.
- Returns:
Tuple containing –
Association DataFrame (systems to elements)
RGP coverage DataFrame
Spot coverage DataFrame
Module coverage DataFrame
- Raises:
ValueError – If no systems are found in the pangenome.
- panorama.systems.systems_association.preprocess_association_data(dataframe: DataFrame, association: str) DataFrame#
Preprocess association data to create a correlation matrix.
- Parameters:
dataframe – Association DataFrame between systems and pangenome objects.
association – Type of pangenome object for association.
- Returns:
Preprocessed correlation matrix DataFrame.
- panorama.systems.systems_association.process_system(system: System, associations: List[str], rgp_to_systems: defaultdict, spot_to_systems: defaultdict, module_to_systems: defaultdict) Tuple[str, List[str]]#
Process a single system and update association mappings.
- Parameters:
system – The system to process.
associations – List of association types to include.
rgp_to_systems – Mapping from RGPs to systems (updated in-place).
spot_to_systems – Mapping from spots to systems (updated in-place).
module_to_systems – Mapping from modules to systems (updated in-place).
- Returns:
Tuple of system ID and system data list.
- panorama.systems.systems_association.write_correlation_matrix_visualization(association_df: DataFrame, association: str, coverage_df: DataFrame, pangenome_name: str, output_dir: Path, frequency_df: DataFrame | None = None, output_formats: List[str] | None = None)#
Generate and save correlation matrix visualization.
- Parameters:
association_df – Association DataFrame between systems and pangenome objects.
association – Type of pangenome object to visualize.
coverage_df – Coverage DataFrame for the association.
pangenome_name (str) – Name of the pangenome.
output_dir – Directory to save output files.
frequency_df – Optional frequency DataFrame.
output_formats – List of output formats (default: [‘html’]).
- Raises:
ValueError – If an unsupported output format is specified.
panorama.systems.systems_partitions module#
Systems partitions visualization module for pangenome analysis.
This module provides functionality to create heatmap visualizations for pangenome systems partitions and system counts across organisms.
- class panorama.systems.systems_partitions.SystemsPartitionVisualizer(name: str, output_dir: Path, formats: List[str] | None = None)#
Bases:
VisualizationBuilder
Visualizer for pangenome systems partition distributions.
This class creates heatmap visualizations showing how pangenome systems are partitioned (persistent, shell, cloud, etc.) across different organisms. The visualization helps understand the conservation patterns of genetic systems.
- partitions#
List of partition categories
- partition2color#
Mapping from partitions to colors
- mapper#
Categorical color mapper for partitions
- __init__(name: str, output_dir: Path, formats: List[str] | None = None)#
Initialize the SystemsPartitionVisualizer.
- Parameters:
name – Name of the pangenome for visualization titles
output_dir – Directory path where output files will be saved
formats – List of output formats to generate
- create_bar_plots(partition_matrix: DataFrame) None#
Create bar plots showing system and organism statistics.
Creates stacked bars for partition distributions (left) and organism counts (top) to provide marginal summaries of the partition matrix.
- Parameters:
partition_matrix – DataFrame with partition information
- create_color_bar(title: str) None#
Create a color bar for the partition matrix.
- Parameters:
title – Title to display on the color bar
- create_left_bar(source: ColumnDataSource) None#
Create a stacked horizontal bar plot showing partition distributions by system.
- Parameters:
source – ColumnDataSource containing stacked partition data
- create_main_figure(partition_matrix: DataFrame, x_range: FactorRange, y_range: FactorRange) None#
Create the main partition matrix heatmap figure.
- Parameters:
partition_matrix – DataFrame with systems, organisms, and partition information
x_range – X-axis range for the plot (organisms)
y_range – Y-axis range for the plot (systems)
- plot() None#
Create and save the complete partition visualization layout.
Arranges the main partition heatmap with supporting bar plots and color bar in a clean grid layout, then saves the result in specified formats.
- panorama.systems.systems_partitions.preprocess_data(data: DataFrame, disable_bar: bool = False) DataFrame#
Preprocess data to draw a partition heatmap figure for the pangenome.
- Parameters:
data (pd.DataFrame) – Projection of pangenome systems
disable_bar (bool, optional) – If True, disable the progress bar. Defaults to False.
- panorama.systems.systems_partitions.preprocess_partition_data(data: DataFrame) DataFrame#
Preprocess data to draw a partition heatmap figure for the pangenome.
- Parameters:
data (pd.DataFrame) – Data used to produce the heatmap.
- panorama.systems.systems_partitions.systems_partition(name: str, system_projection: DataFrame, output: Path, output_formats: List[str] = None) None#
Create heatmap visualizations for pangenome systems partitions.
This function serves as the main entry point for generating both partition and count heatmap visualizations from pangenome system projection data.
- Parameters:
name – Name of the pangenome for visualization titles.
system_projection – DataFrame containing system projection data with columns: [‘system number’, ‘system name’, ‘organism’, ‘partition’].
output – Path to the directory where output files will be saved.
output_formats – List of output formats for visualizations. Valid options: [‘html’, ‘png’]. Defaults to [‘html’].
- Raises:
ValueError – If required columns are missing from system_projection DataFrame.
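The documented column requirement can be checked before calling systems_partition. This sketch works on a plain list of column names (for a DataFrame, `list(df.columns)`); the helper name is hypothetical, and the exact error message raised by panorama is not known.

```python
REQUIRED_COLUMNS = ["system number", "system name", "organism", "partition"]


def validate_projection_columns(columns):
    """Raise ValueError if any documented column is absent."""
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# Extra columns are allowed in this sketch; only absences are reported.
validate_projection_columns(
    ["system number", "system name", "organism", "partition", "completeness"]
)
```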
panorama.systems.systems_projection module#
This module provides functions to project systems onto genomes.
- panorama.systems.systems_projection._custom_agg(series: Series, unique: bool = False)#
Aggregate a column
- Parameters:
series – series to aggregate
unique – whether to return unique values or not
- Returns:
The aggregated series
- panorama.systems.systems_projection.compute_gene_components(model_genes: Set[Gene], window_size: int) List[List[Gene]]#
Compute gene components within a specified window size in the contigs of an organism.
- Parameters:
model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
window_size (int) – The size of the window to consider for grouping genes.
- Returns:
List[List[Gene]] – A list of components, each containing genes that are within the specified window.
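One plausible reading of window-based grouping is sketched below. It assumes that two genes fall in the same component when they sit on the same contig and their positions differ by at most window_size; the real function may define proximity differently. Genes are represented as hypothetical `(contig, position)` tuples rather than Gene objects.

```python
from itertools import groupby


def group_by_window(genes, window_size):
    """Group (contig, position) tuples into runs of nearby genes."""
    components = []
    ordered = sorted(genes)  # sort by contig, then by position
    for _, contig_genes in groupby(ordered, key=lambda gene: gene[0]):
        current = []
        for gene in contig_genes:
            # Start a new component when the gap exceeds the window.
            if current and gene[1] - current[-1][1] > window_size:
                components.append(current)
                current = []
            current.append(gene)
        components.append(current)
    return components


genes = {("c1", 1), ("c1", 3), ("c1", 10), ("c2", 2)}
components = group_by_window(genes, window_size=2)
```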
- panorama.systems.systems_projection.compute_genes_graph(model_genes: Set[Gene], unit: SystemUnit) Graph#
Compute the genes graph for a given genomic context in an organism.
- Parameters:
model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
unit (SystemUnit) – The unit of interest.
- Returns:
nx.Graph – A genomic context graph for the given organism.
- panorama.systems.systems_projection.custom_agg(series: Series)#
Aggregate a column
- Parameters:
series – series to aggregate
- Returns:
The aggregated series
- panorama.systems.systems_projection.custom_agg_unique(series: Series)#
Aggregate a column
- Parameters:
series – series to aggregate
- Returns:
The aggregated series
- panorama.systems.systems_projection.eliminate_empty(org_df)#
Removes systems with no model genes left.
- Parameters:
org_df (pd.DataFrame) – A DataFrame with at least the columns “system number” and “category”. The “category” column is used to identify “model” genes, and the “system number” column groups rows into systems.
- Returns:
pd.DataFrame – A DataFrame containing only the systems that have at least one “model” gene. The rows are concatenated and re-indexed.
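The filtering rule can be sketched without pandas by treating each row as a dictionary with the two documented columns. The function name and the row representation are assumptions for illustration.

```python
from collections import defaultdict


def drop_systems_without_model_genes(rows):
    """Keep only rows of systems that contain at least one 'model' row."""
    by_system = defaultdict(list)
    for row in rows:
        by_system[row["system number"]].append(row)

    kept = []
    for system_rows in by_system.values():
        if any(row["category"] == "model" for row in system_rows):
            kept.extend(system_rows)
    return kept


rows = [
    {"system number": 1, "category": "model"},
    {"system number": 1, "category": "context"},
    {"system number": 2, "category": "context"},
]
kept_rows = drop_systems_without_model_genes(rows)
```

System 2 has only a context row, so all of its rows are dropped, while both rows of system 1 are kept.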
- panorama.systems.systems_projection.eliminate_systems(org_df, org_df_filtered)#
Eliminates systems from a filtered DataFrame based on model family changes. This function ensures that systems with any eliminated model families in the filtered dataset are removed entirely.
- Parameters:
org_df – pandas.DataFrame containing the original dataset with all systems and associated gene families.
org_df_filtered – pandas.DataFrame containing the already filtered dataset, which may have excluded some gene families or systems.
- Returns:
pandas.DataFrame filtered to exclude entire systems where any model families were missing after the initial filtering step.
- panorama.systems.systems_projection.extract_numeric_for_sorting(val) float#
Function to extract the numeric value for sorting while keeping the original value
- Parameters:
val – the value
- Returns:
float – the numeric value
- panorama.systems.systems_projection.get_org_df(org_df: DataFrame) Tuple[DataFrame, str]#
Get the reformatted projection dataframe for an organism
- Parameters:
org_df – Dataframe for the corresponding organism
- Returns:
pd.DataFrame – Dataframe reformatted for an organism
TODO: This function is not used anymore, should we remove it?
- panorama.systems.systems_projection.get_org_df_one_unit_per_fam(org_df, eliminate_filtered_systems=False, eliminate_empty_systems=False) Tuple[DataFrame, str]#
Filters and processes a DataFrame to retain only one representative unit per gene family based on completeness, and optionally eliminates certain systems based on filtering criteria. Also calculates overlapping information.
- Parameters:
org_df – The input DataFrame containing organism data with details such as “gene family”, “system name”, “functional unit name”, “completeness”, and other related columns.
eliminate_filtered_systems – Flag indicating whether to remove systems where any of their model families were filtered out due to lower completeness.
eliminate_empty_systems – Flag indicating whether to remove systems with no model families left after filtering.
- Returns:
Tuple containing –
A processed and filtered DataFrame with one row per unit per gene family, with overlapping information added and optional system elimination applied.
The unique organism name derived from the input DataFrame.
- panorama.systems.systems_projection.get_partition(series: Series)#
- Parameters:
series
Returns:
- panorama.systems.systems_projection.has_short_path(graph: Graph, node_list: List[GeneFamily], n: int) bool#
Checks if there exists at least one path of length less than n connecting any two nodes in the given list of nodes in the graph.
- Parameters:
graph (nx.Graph) – the graph to search paths
node_list (List[GeneFamily]) – List of gene families to check for paths.
n (int) – The maximum length of the path to consider.
- Returns:
bool – True if there exists at least one path of length less than n connecting any two nodes in the list, False otherwise.
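The check can be sketched as a depth-limited breadth-first search. This version uses a plain adjacency dictionary instead of a networkx graph and measures path length in edges; both choices, and the function name, are assumptions made for illustration.

```python
from collections import deque


def has_path_shorter_than(adjacency, node_list, n):
    """True if two distinct listed nodes are joined by a path of < n edges."""
    targets = set(node_list)
    for start in node_list:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist + 1 >= n:
                continue  # neighbors would be at distance >= n
            for neighbor in adjacency.get(node, ()):
                if neighbor in seen:
                    continue
                if neighbor in targets:
                    return True
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return False


# Chain a - b - c - d: the distance between a and d is 3 edges.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

With this chain, `has_path_shorter_than(chain, ["a", "d"], 4)` holds because 3 < 4, while a limit of 3 fails.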
- panorama.systems.systems_projection.project_pangenome_systems(pangenome: Pangenome, system_source: str, association: List[str] = None, canonical: bool = False, threads: int = 1, lock: Lock = None, disable_bar: bool = False) Tuple[DataFrame, DataFrame]#
Project systems onto all organisms in a pangenome.
- Parameters:
pangenome (Pangenome) – The pangenome to project.
system_source (str) – Source of the systems to project.
association (List[str], optional) – List of associations to include (e.g., ‘RGPs’, ‘spots’).
canonical (bool, optional) – If True, write the canonical version of systems too. Defaults to False.
threads (int, optional) – Number of threads available (default is 1).
lock (Lock, optional) – Global lock for multiprocessing execution (default is None).
disable_bar (bool, optional) – Disable progress bar (default is False).
- Returns:
Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projections for each organism and the pangenome
- panorama.systems.systems_projection.project_unit_on_organisms(components: List[List[Gene]], unit: SystemUnit, model_genes: Set[Gene], association: List[str] = None) List[NamedTuple[str]]#
Projects a system unit onto a given organism’s pangenome.
- Parameters:
components (List[List[Gene]]) – List of gene components to project.
unit (SystemUnit) – The unit to be projected.
model_genes (Set[Gene]) – Set of genes in one organism corresponding to model gene families.
association (List[str], optional) – List of associations to include (e.g., ‘RGPs’, ‘spots’).
- Returns:
A list of projected system information for the organism.
- panorama.systems.systems_projection.system_projection(system: System, fam_index: Dict[GeneFamily, int], gene_family2family: Dict[GeneFamily, Set[Family]], association: List[str] = None) Tuple[DataFrame, DataFrame]#
Project a system onto all organisms in a pangenome.
- Parameters:
system (System) – The system to project.
fam_index (Dict[GeneFamily, int]) – Index mapping gene families to their positions.
gene_family2family (Dict[GeneFamily, Set[Family]]) – Dictionary linking a gene family to model families.
association (List[str], optional) – List of associations to include (e.g., ‘RGPs’, ‘spots’).
- Returns:
Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projected system for the pangenome and organisms.
- panorama.systems.systems_projection.unit_projection(unit: SystemUnit, gf2fam: Dict[GeneFamily, set[Family]], fam_index: Dict[GeneFamily, int], association: List[str] = None) Tuple[DataFrame, DataFrame]#
Project a system unit onto all organisms in a pangenome.
- Parameters:
unit (SystemUnit) – The system unit to project.
gf2fam (Dict[GeneFamily, set[Family]]) – Dictionary linking a pangenome gene family to its model families.
fam_index (Dict[GeneFamily, int]) – Index mapping gene families to their positions.
association (List[str], optional) – List of associations to include (e.g., ‘RGPs’, ‘spots’).
- Returns:
Tuple[pd.DataFrame, pd.DataFrame] – Two DataFrames containing the projected system for the pangenome and organisms.
- panorama.systems.systems_projection.write_projection_systems(output: Path, pangenome_projection: DataFrame, organisms_projection: DataFrame, organisms: List[str] = None, threads: int = 1, force: bool = False, disable_bar: bool = False)#
Write the projected systems to output files.
- Parameters:
output (Path) – Path to the output directory.
pangenome_projection (pd.DataFrame) – DataFrame containing the pangenome projection.
organisms_projection (pd.DataFrame) – DataFrame containing the organism projections.
organisms (List[str], optional) – List of organisms to project (default is all organisms).
threads (int, optional) – Number of threads to use for parallel processing. Defaults to 1.
force (bool, optional) – Force write to the output directory (default is False).
disable_bar (bool, optional) – If True, disable the progress bar. Defaults to False.
- Returns:
None
panorama.systems.utils module#
This module provides utility functions to detect and write biological systems in pangenomes.
- class panorama.systems.utils.VisualizationBuilder(name: str, output_dir: Path, formats: List[str] | None = None)#
Bases:
ABC
Abstract base class for building correlation matrix and partition visualizations.
This class maintains a common configuration and provides a cohesive interface for creating all components of pangenome visualizations. It handles shared functionality like plot dimensions, styling, and file saving.
- BELOW_HEIGHT = 138#
Space for bottom plots
- Type:
int
- CENTER_WIDTH = 1335#
Space for main heatmap
- Type:
int
- DEFAULT_FORMAT = 'html'#
Default output format when none specified
- Type:
str
- LEFT_WIDTH = 267#
Space for left bar plots
- Type:
int
- MIDDLE_HEIGHT = 644#
Space for main heatmap
- Type:
int
- OUTPUT_FORMATS = ['html', 'png']#
Supported output formats for saving figures
- Type:
list
- PNG_EXPORT_HEIGHT = 1080#
Height of PNG exports
- Type:
int
- PNG_EXPORT_WIDTH = 1920#
Width of PNG exports
- Type:
int
- RIGHT_WIDTH = 178#
Space for color bars
- Type:
int
- TOP_HEIGHT = 138#
Space for top bar plots
- Type:
int
- TOTAL_HEIGHT = 920#
Total height of the complete visualization layout.
- Type:
int
- TOTAL_WIDTH = 1780#
Total width of the complete visualization layout.
- Type:
int
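The documented totals are consistent with summing the panel sizes along each axis. The sketch below only verifies that arithmetic with the values listed above; whether the class derives the totals this way is an assumption.

```python
# Panel sizes copied from the class attributes documented above.
LEFT_WIDTH, CENTER_WIDTH, RIGHT_WIDTH = 267, 1335, 178
TOP_HEIGHT, MIDDLE_HEIGHT, BELOW_HEIGHT = 138, 644, 138

# Left bars + main heatmap + color bars span the full width.
total_width = LEFT_WIDTH + CENTER_WIDTH + RIGHT_WIDTH

# Top bars + main heatmap + bottom plots span the full height.
total_height = TOP_HEIGHT + MIDDLE_HEIGHT + BELOW_HEIGHT

print(total_width, total_height)
```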
- __init__(name: str, output_dir: Path, formats: List[str] | None = None)#
Initialize the visualization builder.
- Parameters:
name – Name of the pangenome for visualization titles and filenames
output_dir – Directory path where output files will be saved
formats – List of output formats to generate. Defaults to [“html”]
- static _configure_bar_plot_style(plot: figure, x_label: str | None = None, y_label: str | None = None, flip_x: bool = False, hide_x_axis: bool = False, hide_y_axis: bool = False) None#
Configure styling for bar plots.
- Parameters:
plot – The figure to configure
x_label – Label for x-axis
y_label – Label for y-axis
flip_x – Whether to flip the x-axis
hide_x_axis – Whether to hide the x-axis
hide_y_axis – Whether to hide the y-axis
- static _configure_minimal_plot(plot: figure) None#
Configure a minimal plot style (no axes, grid, etc.).
- Parameters:
plot – The figure to configure with minimal styling
- _configure_plot_style() None#
Configure common plot styling for the main figure.
This method sets up consistent appearance across all visualization types, including fonts, colors, and axis properties.
- _create_main_figure(matrix: DataFrame, x_range: FactorRange | None = None, y_range: FactorRange | None = None, tooltips: List[Tuple[str, str]] | None = None) None#
Create the main heatmap figure with common configuration.
- Parameters:
matrix – Data matrix for determining ranges if not provided
x_range – X-axis range for the plot. If None, derived from matrix columns
y_range – Y-axis range for the plot. If None, derived from matrix index
tooltips – List of tooltip specifications as (label, field) tuples
- _save_figure(fig: figure, filename_base: str) None#
Save a Bokeh figure in the specified formats.
- Parameters:
fig – The Bokeh figure object to save
filename_base – Base filename without extension
- Raises:
Exception – If an unsupported output format is specified
- property color_bar: figure#
Get the color bar figure.
- create_left_bar_plot(source: ColumnDataSource, matrix: DataFrame, y_field: str = 'system_name', value_field: str = 'count', color: str = 'navy') None#
Create a horizontal bar plot on the left side of the visualization.
- Parameters:
source – ColumnDataSource containing the data for the bars
matrix – Data matrix for determining the y-range
y_field – Field name for the y-axis values
value_field – Field name for the bar values
color – Color for the bars
- abstractmethod create_main_figure(*args, **kwargs) None#
Create the main visualization figure.
This method must be implemented by subclasses to create their specific type of main visualization (correlation matrix, partition matrix, etc.).
- create_top_bar_plot(source: ColumnDataSource, x_field: str, value_field: str = 'count', color: str = 'green', x_order: List[str] | None = None) None#
Create a vertical bar plot on the top of the visualization.
- Parameters:
source – ColumnDataSource containing the data for the bars
x_field – Field name for the x-axis values
value_field – Field name for the bar values
color – Color for the bars
x_order – Custom ordering for x-axis. If None, uses source data order
- property glyph: Glyph#
Get the glyph from the renderer.
- property glyph_renderer: GlyphRenderer#
Get the glyph renderer for the main plot.
- property left_bar: figure#
Get the left bar plot figure.
- property main_plot: figure#
Get the main heatmap plot figure.
- abstractmethod plot() None#
Create and save the complete visualization layout.
This method must be implemented by subclasses to define their specific layout arrangement and save the final visualization.
- property top_bar: figure#
Get the top bar plot figure.
- panorama.systems.utils.check_for_families(gene_families: Set[GeneFamily], gene_fam2mod_fam: Dict[GeneFamily, Set[Family]], mod_fam2meta_source: Dict[str, str], func_unit: FuncUnit) Tuple[bool, Dict[GeneFamily, Tuple[str, int]]]#
Evaluate gene families against a functional unit to detect forbidden, mandatory, and accessory family conditions.
- Parameters:
gene_families (Set[GeneFamily]) – Gene families to evaluate.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Map from gene families to their model families.
mod_fam2meta_source (Dict[str, str]) – Map from model family name to metadata source.
func_unit (FuncUnit) – Functional unit definition to check against.
- Returns:
Tuple[bool, Dict[GeneFamily, Tuple[str, int]]] –
A boolean indicating whether the conditions are satisfied (False immediately if a forbidden family is found).
A mapping from gene families to their selected metadata (source, meta_id).
- panorama.systems.utils.check_needed_families(matrix: DataFrame, func_unit: FuncUnit) bool#
Check if there are enough mandatory and total families to satisfy the functional unit rules.
- Parameters:
matrix – The association matrix between gene families and families
func_unit – The functional unit to search for.
- Returns:
Boolean – True if satisfied, False otherwise
Notes
This function assumes that a family could play multiple roles to satisfy the model requirements if it has multiple annotations
- panorama.systems.utils.conciliate_partition(partition: Set[str]) str#
Conciliate a set of partitions.
- Parameters:
partition (Set[str]) – All partitions.
- Returns:
str – The reconciled partition.
- panorama.systems.utils.dict_families_context(model: Model, annot2fam: Dict[str, Dict[str, Set[GeneFamily]]]) Tuple[Dict[GeneFamily, Set[Family]], Dict[str, str]]#
Retrieves all gene families associated with the families in the model.
- Parameters:
model (Model) – Model containing the families.
annot2fam (dict) – Dictionary of annotated families.
- Returns:
tuple – A tuple containing:
dict: Dictionary linking gene families to their families.
dict: Dictionary linking families to their sources.
- panorama.systems.utils.filter_global_context(graph: nx.Graph, jaccard_threshold: float = 0.8) nx.Graph[GeneFamily]#
Filters the edges of a gene family graph based on a Jaccard gene proportion threshold.
Copies all nodes to a new graph and retains only those edges where both connected GeneFamily nodes have a Jaccard gene proportion (shared genomes over unique organisms) greater than or equal to the specified threshold. Updates edge data with Jaccard values and family names.
- Parameters:
graph (nx.Graph) – The input graph with GeneFamily nodes and edge data containing ‘genomes’.
jaccard_threshold (float, optional) – Minimum Jaccard gene proportion required for both families to retain an edge. Defaults to 0.8.
- Returns:
nx.Graph[GeneFamily] – A new graph with filtered edges and updated edge attributes.
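One possible reading of the edge filter described above is sketched below with plain dictionaries in place of a networkx graph. The function name `filter_edges`, the input layout, and the exact proportion formula (genomes carrying the edge divided by genomes carrying each family) are assumptions for illustration only.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def filter_edges(edge_genomes: Dict[Edge, Set[str]],
                 family_genomes: Dict[str, Set[str]],
                 jaccard_threshold: float = 0.8) -> Dict[Edge, Tuple[float, float]]:
    """Keep edges whose shared-genome proportion passes for BOTH families."""
    kept = {}
    for (fam_u, fam_v), genomes in edge_genomes.items():
        # Proportion of each family's genomes in which the edge is observed.
        prop_u = len(genomes) / len(family_genomes[fam_u])
        prop_v = len(genomes) / len(family_genomes[fam_v])
        if prop_u >= jaccard_threshold and prop_v >= jaccard_threshold:
            # Store both proportions, mirroring the updated edge attributes.
            kept[(fam_u, fam_v)] = (prop_u, prop_v)
    return kept


family_genomes = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g1", "g2", "g3", "g4", "g5"},
    "C": {f"g{i}" for i in range(1, 11)},
}
edge_genomes = {
    ("A", "B"): {"g1", "g2", "g3", "g4"},        # 4/5 for both families
    ("A", "C"): {"g1", "g2", "g3", "g4", "g5"},  # 5/5 for A, 5/10 for C
}
kept = filter_edges(edge_genomes, family_genomes)
```

The edge between `A` and `C` is dropped because only one of the two families reaches the threshold.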
- panorama.systems.utils.filter_local_context(graph: nx.Graph, organisms: Set[Organism], jaccard_threshold: float = 0.8) nx.Graph[GeneFamily]#
Filters a graph based on a local Jaccard index.
- Parameters:
graph (nx.Graph) – A sub-pangenome graph.
organisms (Set[Organism]) – Organisms where edges between families of interest exist.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Default is 0.8.
- panorama.systems.utils.filter_local_context_old(graph: nx.Graph, organisms: Set[Organism], jaccard_threshold: float = 0.8) nx.Graph[GeneFamily]#
Filters a graph based on a local Jaccard index.
- Parameters:
graph (nx.Graph) – A sub-pangenome graph.
organisms (Set[Organism]) – Organisms where edges between families of interest exist.
jaccard_threshold (float, optional) – Minimum Jaccard similarity used to filter edges between gene families. Default is 0.8.
- panorama.systems.utils.get_gfs_matrix_combination(gene_families: Set[GeneFamily], gene_fam2mod_fam: Dict[GeneFamily, Set[Family]]) DataFrame#
Build a matrix of association between gene families and families.
- Parameters:
gene_families (Set[GeneFamily]) – Set of gene families.
gene_fam2mod_fam (Dict[GeneFamily, Set[Family]]) – Dictionary linking gene families to model families.
- Returns:
pd.DataFrame – Matrix of association between gene families and families.
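To make the shape of such an association matrix concrete, here is a small sketch that uses nested dictionaries rather than a DataFrame. The row/column orientation (model families as rows, gene families as columns), the 0/1 encoding, and the helper name `association_matrix` are assumptions, not details confirmed by the documentation.

```python
from typing import Dict, Set


def association_matrix(gene_families: Set[str],
                       gene_fam2mod_fam: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Rows are model families, columns are gene families (assumed layout)."""
    columns = sorted(gene_families)
    # Collect every model family reachable from the selected gene families.
    rows = sorted({mf for gf in columns for mf in gene_fam2mod_fam.get(gf, set())})
    # Cell is 1 when the gene family is annotated with the model family.
    return {mf: {gf: int(mf in gene_fam2mod_fam.get(gf, set())) for gf in columns}
            for mf in rows}


assoc = association_matrix(
    {"gf_A", "gf_B"},
    {"gf_A": {"cas1", "cas2"}, "gf_B": {"cas9"}},
)
```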
- panorama.systems.utils.get_metadata_to_families(pangenome: Pangenome, sources: Iterable[str]) Dict[str, Dict[str, Set[GeneFamily]]]#
Retrieves a mapping of metadata to sets of gene families for each metadata source.
- Parameters:
pangenome (Pangenome) – Pangenome object containing gene families.
sources (Iterable[str]) – Metadata source names.
- Returns:
dict – A dictionary where each metadata source maps to another dictionary of metadata to sets of gene families.
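The nested return structure (source, then metadata value, then set of gene families) can be illustrated as follows. The input here is a flat list of hypothetical `(gene_family, source, metadata_value)` triples, which stands in for the pangenome object; the real function reads this information from `Pangenome`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def metadata_to_families(annotations: Iterable[Tuple[str, str, str]],
                         sources: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Group gene families by metadata value within each requested source."""
    wanted = set(sources)
    meta2fam = {source: defaultdict(set) for source in wanted}
    for gene_family, source, value in annotations:
        # Ignore annotations from sources that were not requested.
        if source in wanted:
            meta2fam[source][value].add(gene_family)
    # Convert the inner defaultdicts to plain dicts before returning.
    return {source: dict(mapping) for source, mapping in meta2fam.items()}


records = [
    ("gf_A", "defense", "cas1"),
    ("gf_B", "defense", "cas1"),
    ("gf_A", "other", "xyz"),
]
mapping = metadata_to_families(records, ["defense"])
```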
panorama.systems.write_systems module#
This module provides functions to write information into the pangenome file.
- panorama.systems.write_systems.check_pangenome_write_systems(pangenome: Pangenome, sources: List[str]) None#
Check and load pangenome information before adding annotation.
- Parameters:
pangenome (Pangenome) – The Pangenome object.
sources (List[str]) – Sources used to detect systems.
- Raises:
KeyError – If the provided systems source is not in the pangenome.
Exception – If systems have not been detected in pangenome.
AttributeError – If there is no metadata associated with families.
- panorama.systems.write_systems.check_write_systems_args(args: Namespace) Dict[str, Any]#
Checks the provided arguments to ensure that they are valid.
- Parameters:
args (argparse.Namespace) – The parsed arguments.
- Returns:
Dict[str, Any] – A dictionary containing the necessary information for further processing.
- Raises:
argparse.ArgumentTypeError – If the number of sources differs from the number of models, or if annotations are given and their number differs from the number of systems sources.
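The two count checks listed under Raises might look like the sketch below. The attribute names `sources`, `models`, and `annotation_sources`, the helper name `check_counts`, and the error messages are hypothetical; only `argparse.Namespace` and `argparse.ArgumentTypeError` come from the standard library.

```python
import argparse


def check_counts(args: argparse.Namespace) -> None:
    """Validate that sources, models and annotation sources line up."""
    if len(args.sources) != len(args.models):
        raise argparse.ArgumentTypeError(
            "Number of sources differs from number of models."
        )
    # Annotation sources are optional; check them only when provided.
    if (args.annotation_sources is not None
            and len(args.annotation_sources) != len(args.sources)):
        raise argparse.ArgumentTypeError(
            "Number of annotation sources differs from number of systems sources."
        )


good = argparse.Namespace(sources=["s1"], models=["m1.json"], annotation_sources=None)
check_counts(good)  # passes silently
```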
- panorama.systems.write_systems.launch(args)#
Launch functions to write systems.
- Parameters:
args – The parsed arguments.
- panorama.systems.write_systems.parser_write(parser)#
Parser for specific arguments of the write_systems command.
- Parameters:
parser (argparse.ArgumentParser) – Parser for the write_systems command arguments.
- panorama.systems.write_systems.subparser(sub_parser) ArgumentParser#
Subparser to launch PANORAMA from the command line.
- Parameters:
sub_parser – Subparser for the write_systems command.
- Returns:
argparse.ArgumentParser – Parser arguments for the write_systems command.
- panorama.systems.write_systems.write_flat_systems_to_pangenome(pangenome: Pangenome, output: Path, projection: bool = False, association: List[str] = None, partition: bool = False, proksee: str = None, output_formats: List[str] = None, organisms: List[str] = None, canonical: bool = False, threads: int = 1, lock: Lock = None, force: bool = False, disable_bar: bool = False)#
Write detected systems from a pangenome to an output directory in a flat format.
- Parameters:
pangenome (Pangenome) – The pangenome object containing the detected systems.
output (Path) – The directory where the systems will be written.
projection (bool, optional) – If True, write projection systems. Defaults to False.
association (List[str], optional) – List of associations to be considered. Defaults to None.
partition (bool, optional) – If True, write partition systems. Defaults to False.
proksee (str, optional) – A placeholder for future Proksee integration. Defaults to None.
output_formats (List[str], optional) – A list of output formats for visualization. Defaults to None.
organisms (List[str], optional) – List of organisms to be considered for projection. Defaults to None.
canonical (bool, optional) – If True, write the canonical version of systems too. Defaults to False.
threads (int, optional) – Number of threads to use for parallel processing. Defaults to 1.
lock (Lock, optional) – A multiprocessing lock to synchronize access. Defaults to None.
force (bool, optional) – If True, overwrite existing files. Defaults to False.
disable_bar (bool, optional) – If True, disable the progress bar. Defaults to False.
- Raises:
NotImplementedError – If Proksee integration is requested but not implemented.
- panorama.systems.write_systems.write_pangenomes_systems(pangenomes: Pangenomes, output: Path, projection: bool = False, association: List[str] = None, partition: bool = False, proksee: str = None, output_formats: List[str] = None, organisms: List[str] = None, canonical: bool = False, threads: int = 1, lock: Lock = None, force: bool = False, disable_bar: bool = False)#
Write flat files about systems for all pangenomes.
- Parameters:
pangenomes (Pangenomes) – Pangenomes object containing all pangenomes.
output (Path) – Path to write flat files about systems.
projection (bool, optional) – Flag to enable/disable pangenome projection. Defaults to False.
association (List[str], optional) – Write systems association to the given pangenome object. Defaults to None.
partition (bool, optional) – Flag to enable writing system partitions. Defaults to False.
proksee (str, optional) – Write proksee with the systems and the given pangenome object. Defaults to None.
output_formats (List[str], optional) – List of output formats for visualization. Defaults to None.
organisms (List[str], optional) – List of organism names to write. Defaults to all organisms.
canonical (bool, optional) – If True, write the canonical version of systems too. Defaults to False.
threads (int, optional) – Number of available threads. Defaults to 1.
lock (Lock, optional) – Global lock for multiprocessing execution. Defaults to None.
force (bool, optional) – Flag to allow overwriting files. Defaults to False.
disable_bar (bool, optional) – Flag to disable the progress bar. Defaults to False.