panorama.compare package

Submodules

panorama.compare.context module

panorama.compare.context.check_context_comparison(args)#

Checks the provided keyword arguments to ensure either ‘sequences’ or ‘family’ is present.

Parameters:

kwargs (dict) – Keyword arguments to check.

Raises:

Exception – If neither ‘sequences’ nor ‘family’ is present in kwargs.

panorama.compare.context.clean_context_objects(contexts)#

Remove some of the objects contained in each context so that it can be pickled.

panorama.compare.context.compare_gene_contexts_graph_mp(gene_contexts: List[GeneContext], gene_fam_2_cluster_fam: Dict[str, str], return_multigraph: bool, outdir: Path, cpus: int = 1, disable_bar: bool = False) Graph#

Compares gene contexts by looking at their context graphs.

Parameters:
  • gene_contexts – A list of GeneContext objects to be compared.

  • gene_fam_2_cluster_fam – dict mapping gene family name to cluster family name

  • cpus – The maximum number of worker processes for parallel execution (default: 1).

  • disable_bar – A boolean flag indicating whether to disable the progress bar.

Returns:

A graph of the compared gene contexts.

panorama.compare.context.compare_gene_contexts_on_cluster_families(gene_contexts: List[GeneContext], min_jaccard, max_workers: int, disable_bar: bool) List[GeneContext]#

Compares gene contexts by calculating the Jaccard similarity between their family clusters.

Parameters:
  • gene_contexts – A list of GeneContext objects to be compared.

  • min_jaccard – Minimum Jaccard similarity required to report a pair of contexts.

  • max_workers – The maximum number of worker processes for parallel execution.

  • disable_bar – A boolean flag indicating whether to disable the progress bar.

Returns:

A list of GeneContext objects.

panorama.compare.context.compare_pair_of_context_graphs(context_pair: Tuple[GeneContext, GeneContext], return_multigraph: bool, outdir)#

panorama.compare.context.compare_pair_of_contexts(context_pair: Tuple[GeneContext, GeneContext], min_jaccard: float) Tuple[GeneContext, GeneContext, float]#

Compares a pair of gene contexts and calculates the Jaccard similarity between their family clusters.

Parameters:
  • context_pair – A tuple containing two GeneContext objects to be compared.

  • min_jaccard – Minimum Jaccard similarity required to report a pair of contexts.

Returns:

A tuple containing the two GeneContext objects and the Jaccard similarity between their family clusters.
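
To make the measure concrete, the following minimal sketch computes the Jaccard similarity between two sets of cluster-family identifiers and keeps the pairs that reach a cutoff. The helper names (`jaccard`, `similar_pairs`), the plain-set representation, and the inclusive `>=` comparison are assumptions for illustration; the documented function works on GeneContext objects.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(
    contexts: Dict[str, Set[str]], min_jaccard: float
) -> List[Tuple[str, str, float]]:
    """Return every pair of contexts whose similarity reaches min_jaccard."""
    pairs = []
    # Hypothetical input: context name -> set of cluster-family identifiers.
    for (name_a, fams_a), (name_b, fams_b) in combinations(sorted(contexts.items()), 2):
        score = jaccard(fams_a, fams_b)
        if score >= min_jaccard:  # inclusive cutoff is an assumption
            pairs.append((name_a, name_b, score))
    return pairs
```

Two contexts that share two of four distinct cluster families score 0.5, so they are reported with `min_jaccard=0.5` and dropped with any higher cutoff.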

panorama.compare.context.compute_CCC(meta_nodes: List[str], g1_edges: List[Tuple[str, str]], g2_edges: List[Tuple[str, str]]) List[Set[str]]#

Compute the conserved connected components (CCC) between two graphs. The method implemented here is taken from the article by Boyer et al. (https://doi.org/10.1093/bioinformatics/bti711).

Parameters:
  • meta_nodes – List of meta-nodes representing the common family clusters.

  • g1_edges – List of edges in graph 1.

  • g2_edges – List of edges in graph 2.

Returns:

List of sets representing the conserved connected components.
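
The idea of a conserved connected component can be sketched with a naive refinement loop: a node set is kept only if it is connected in both graphs, and is otherwise split into its components and re-examined. This is a simple illustration under that definition, not the optimized procedure of Boyer et al., and the helper names (`components`, `naive_ccc`) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def components(nodes: Set[str], edges: Iterable[Edge]) -> List[Set[str]]:
    """Connected components of the subgraph induced by `nodes`."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen: Set[str] = set()
    found = []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adjacency[node] - comp)
        seen |= comp
        found.append(comp)
    return found


def naive_ccc(nodes: List[str], g1_edges: List[Edge], g2_edges: List[Edge]) -> List[Set[str]]:
    """Split node sets until each one is connected in both graphs."""
    pending = [set(nodes)] if nodes else []
    conserved = []
    while pending:
        group = pending.pop()
        for edges in (g1_edges, g2_edges):
            parts = components(group, edges)
            if len(parts) > 1:  # not connected in this graph: refine further
                pending.extend(parts)
                break
        else:
            conserved.append(group)
    return conserved
```

Every split produces strictly smaller groups, so the loop terminates; single nodes are trivially conserved and can be filtered out afterwards by size.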

panorama.compare.context.context_comparison(pangenomes: Pangenomes, gene_contexts: Set[GeneContext], synteny_score: float, output: Path, tmpdir: Path, cpus: int = 1, lock: Lock = None, force: bool = False, disable_bar: bool = False)#

Perform comparison of gene contexts and cluster families.

Parameters:
  • pangenome_to_path – A dictionary mapping pangenome names to their corresponding paths.

  • contexts_results – The path to the file containing the list of context results.

  • family_clusters – A boolean indicating whether to use precomputed family clusters or run the cluster family function.

  • lock – A Lock object for thread synchronization.

  • output – The output directory path.

  • tmpdir – The temporary directory path.

  • task – The number of tasks for parallel processing (default: 1).

  • threads_per_task – The number of threads per task (default: 1).

  • disable_bar – A boolean indicating whether to disable progress bars (default: False).

  • force – A boolean indicating whether to force overwriting existing output files (default: False).

  • ppanggolin_context_args – Additional keyword arguments to search context using ppanggolin.

panorama.compare.context.create_metanodes(gfA_to_cf: dict, gfB_to_cf: dict) Tuple[List[Tuple[str, dict]], dict, dict]#

Create metanodes for a multigraph based on gene family mappings.

Parameters:
  • gfA_to_cf – A dictionary mapping gene families in graph A to their cluster families.

  • gfB_to_cf – A dictionary mapping gene families in graph B to their cluster families.

Returns:

A tuple containing the metanodes, graph A node to metanodes mapping, and graph B node to metanodes mapping.

Metanodes are created for gene families that have a common cluster family between the two graphs.

When a cluster family is associated with more than one gene family in a graph, multiple metanodes are created, and each metanode is differentiated by adding “´” to its name.
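
The naming rule described above can be sketched as follows. This is an illustrative reading of the description, not the library's implementation: it assumes one metanode per (graph A family, graph B family) pair sharing a cluster family, and the function name and the attribute keys (`cluster`, `gfA`, `gfB`) are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple


def sketch_metanodes(
    gf_a_to_cf: Dict[str, str], gf_b_to_cf: Dict[str, str]
) -> Tuple[List[Tuple[str, dict]], Dict[str, List[str]], Dict[str, List[str]]]:
    """Build metanodes for cluster families present in both graphs."""
    cf_to_a, cf_to_b = defaultdict(list), defaultdict(list)
    for gf, cf in gf_a_to_cf.items():
        cf_to_a[cf].append(gf)
    for gf, cf in gf_b_to_cf.items():
        cf_to_b[cf].append(gf)

    metanodes = []
    a_to_meta, b_to_meta = defaultdict(list), defaultdict(list)
    for cf in sorted(set(cf_to_a) & set(cf_to_b)):
        pairs = product(sorted(cf_to_a[cf]), sorted(cf_to_b[cf]))
        for index, (gf_a, gf_b) in enumerate(pairs):
            # First metanode keeps the cluster name; later ones get "´" appended.
            name = cf + "´" * index
            metanodes.append((name, {"cluster": cf, "gfA": gf_a, "gfB": gf_b}))
            a_to_meta[gf_a].append(name)
            b_to_meta[gf_b].append(name)
    return metanodes, dict(a_to_meta), dict(b_to_meta)
```

With two graph A families and one graph B family in cluster `C1`, the sketch yields the metanodes `C1` and `C1´`; families whose cluster appears in only one graph get no metanode.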

panorama.compare.context.get_connected_components(nodes: List[str], edges: List[Tuple[str, str]]) Iterator[Set[str]]#

Get the connected components in a graph.

Parameters:
  • nodes – List of nodes in the graph.

  • edges – List of edges in the graph.

Returns:

Iterator of sets representing the connected components.

panorama.compare.context.get_conserved_genomics_contexts(gcA_graph: Graph, gcB_graph: Graph, min_cgc_size: int = 2, return_multigraph: bool = False) List[Tuple[Set[str], Set[str]]]#

Get the conserved genomics contexts between two gene context graphs.

Parameters:
  • gcA_graph – The gene context graph A.

  • gcB_graph – The gene context graph B.

  • gene_fam_2_cluster_fam – Dictionary mapping gene families to cluster families.

  • min_cgc_size – Minimum size of a conserved genomic context to report.

  • return_multigraph – Flag indicating whether to return the multigraph representation.

Returns:

Tuple containing a list of tuples representing the conserved genomics contexts and the multigraph representation (if return_multigraph is True). Each tuple in the list contains two sets: the gene nodes from graph A and the gene nodes from graph B.

panorama.compare.context.get_contexts_from_result(pangenome: Pangenome, context_result_path: Path) Set[GeneContext]#

Retrieve gene contexts from a table and create GeneContext objects.

Parameters:
  • context_result_path – The path to the context table file or graph.

  • pangenome_name – The name of the pangenome.

  • pangenome_path – The path to the pangenome file.

  • taxid – The taxonomic ID associated with the pangenome.

panorama.compare.context.get_gene_contexts_from_results_mp(pangenomes: Pangenomes, context_results: Path, cpus: int, disable_bar: bool) List[Set[GeneContext]]#

Retrieve gene contexts from multiple result files using multiprocessing.

Parameters:
  • pan_name_to_path – A dictionary mapping pangenome names to their path information.

  • pan_name_to_context_result – A dictionary mapping pangenome names to their corresponding context tables or graphs.

  • max_workers – The maximum number of workers to use for multiprocessing.

  • disable_bar – A boolean value indicating whether to disable the progress bar.

Returns:

A list of Pangenome objects containing the retrieved gene contexts.

panorama.compare.context.get_multigraph_edges(g: Graph, g_node_2_meta_nodes: dict) List[Tuple[str, str]]#

Translate edges of a graph into edges linking metanodes of a multigraph.

This function takes a graph and a mapping of graph nodes to their corresponding metanodes and translates the edges of the graph into metanodes of the multigraph. Only nodes that have metanodes are considered, and the edges are formed by combining the metanodes corresponding to the endpoints of each edge.

Parameters:
  • g – The graph whose edges are to be translated.

  • g_node_2_meta_nodes – A dictionary mapping graph nodes to their corresponding metanodes.

Returns:

A list of edges in the multigraph.
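
A minimal sketch of this translation, using a plain edge list instead of a graph object, could look like the code below. The function name is hypothetical, and pairing every metanode of one endpoint with every metanode of the other is an assumption drawn from the description above.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple


def translate_edges(
    edges: Iterable[Tuple[str, str]], node_to_metanodes: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Map graph edges onto metanode edges, skipping nodes without metanodes."""
    translated = []
    for u, v in edges:
        if u not in node_to_metanodes or v not in node_to_metanodes:
            continue  # at least one endpoint has no metanode
        # Combine each metanode of u with each metanode of v.
        translated.extend(product(node_to_metanodes[u], node_to_metanodes[v]))
    return translated
```

An edge whose endpoint maps to two metanodes therefore produces two multigraph edges, which is why the result can contain parallel connections.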

panorama.compare.context.get_shortest_path_edges_cc_strategy(g, weight='mean_transitivity')#

TODO: This is an approximation of the shortest-path algorithm.

A more robust implementation is required. https://www.baeldung.com/cs/shortest-path-visiting-all-nodes

panorama.compare.context.launch(args)#

Launch functions to compare gene contexts across pangenomes.

Parameters:

args – Argument given

panorama.compare.context.launch_compare_pair_of_context_graphs(pack: tuple) tuple#

Launch the context comparison from a multiprocessing worker.

Parameters:

pack – Packed arguments for the context comparison

Returns:

edge metrics

panorama.compare.context.launch_context_comparison(pack: tuple) tuple#

Launch the context comparison from a multiprocessing worker.

Parameters:

pack – Packed arguments for the context comparison

Returns:

edge metrics

panorama.compare.context.launch_ppanggolin_context(pangenomes: Pangenomes, ppanggolin_context_args: dict, output: Path, align_args: dict = None, disable_bar: bool = False) Set[GeneContext]#

panorama.compare.context.make_gene_context_from_context_graph(pangenome: Pangenome, contexts_graph: Graph) Set[GeneContext]#

Create gene contexts from a context graph.

Parameters:
  • pangenome – The Pangenome object.

  • contexts_graph – The context graph.

Returns:

A set of GeneContext objects.

panorama.compare.context.make_gene_context_from_context_table(pangenome: Pangenome, context_table: str) Set[GeneContext]#

Create gene contexts from a context table.

Parameters:
  • pangenome – The Pangenome object.

  • context_table – The path to the context table.

Returns:

A set of GeneContext objects.

panorama.compare.context.parse_context_results(contexts_result_file_list: Path) Dict[str, Path]#

Parse the context results file list.

Parameters:

contexts_result_file_list – The path to the file containing the list of pangenome names and context file paths

Returns:

A dictionary mapping pangenome names to context file paths.
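
As a rough illustration, a list file of this kind can be read into a name-to-path dictionary as shown below. The file layout (two tab-separated columns, `#` comment lines allowed) and the function name are assumptions; the actual format expected by the function is not described here.

```python
import csv
from pathlib import Path
from typing import Dict


def read_context_list(list_file: Path) -> Dict[str, Path]:
    """Read 'pangenome_name<TAB>context_file' rows into a dictionary."""
    name_to_path: Dict[str, Path] = {}
    with open(list_file, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip empty lines and comments (assumed convention)
            if len(row) < 2:
                raise ValueError(f"Expected 'name<TAB>path', got: {row}")
            name, path = row[0].strip(), Path(row[1].strip())
            if name in name_to_path:
                raise ValueError(f"Duplicate pangenome name: {name}")
            name_to_path[name] = path
    return name_to_path
```

Rejecting duplicate names early avoids silently overwriting one pangenome's context file with another's.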

panorama.compare.context.parser_comparison_context(parser)#

Parser for the arguments specific to the context comparison command

Parameters:

parser – parser for the context comparison arguments

panorama.compare.context.pass_graph_attribute_to_multigraph(meta_nodes_2_attributes, gcA_graph, node_mapper)#

panorama.compare.context.subparser(sub_parser) ArgumentParser#

Subparser to launch PANORAMA in Command line

Parameters:

sub_parser – sub_parser for align command

Returns:

parser arguments for align command

panorama.compare.context.write_context_summary(gene_contexts: Set[GeneContext], output_table: Path)#

Write a summary of gene contexts to a table.

Parameters:
  • gene_contexts – A set of GeneContext objects representing gene contexts to summarize.

  • output_table – The path to the output table file where the summary will be written.

panorama.compare.spots module

Pangenome spots comparison and conserved spots detection module.

This module provides comprehensive functionality for comparing spots across multiple pangenomes, detecting conserved genomic regions, and analyzing systems relationships through graph-based clustering approaches. It includes utilities for building comparative graphs, computing Gene Families Repertoire Relatedness (GFRR), and generating various output formats for visualization.

panorama.compare.spots.add_systems_info(pangenomes: Pangenomes, cs_graph: Graph) None#

Annotate conserved spots graph with systems information.

This function enriches the conserved spots graph by adding system-related attributes to nodes. It identifies which systems are associated with each conserved spot and adds boolean attributes for system presence as well as a count of total systems per spot.

Parameters:
  • pangenomes (Pangenomes) – Collection of pangenomes containing systems data.

  • cs_graph (nx.Graph) – Conserved spots’ graph to annotate (modified in-place).

panorama.compare.spots.check_compare_spots_args(args: Namespace) Dict[str, Any]#

Validate and configure arguments for spots comparison analysis.

This function analyzes command-line arguments to determine required data components for pangenome spots comparison. It validates the consistency of system-related arguments and configures the information dictionary for downstream processing.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments containing:

  • systems (bool): Whether to include systems analysis

  • models (Optional[List[Path]]): Paths to model files

  • sources (Optional[List[str]]): System source names

  • canonical (bool): Whether to include canonical systems

  • disable_prog_bar (bool): Progress bar control

Returns:

Dict[str, Any] – Configuration dictionary specifying required data:

  • need_annotations (bool): Whether annotations are required

  • need_families (bool): Whether gene families are required

  • need_families_info (bool): Whether family metadata is required

  • need_rgp (bool): Whether regions of genomic plasticity are required

  • need_spots (bool): Whether spots are required

  • Additional system-specific requirements if systems analysis is enabled

Raises:

argparse.ArgumentError – If there’s a mismatch in system-related arguments:

  • Systems requested but no models provided

  • Systems requested but no sources provided

  • Models provided but no sources (or vice versa)
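
The consistency rules listed above can be sketched with a small stand-alone validator. This is an illustration only: the function name, the `True` values, and the `need_systems` key are hypothetical, and the real function may set additional entries.

```python
import argparse
from typing import Any, Dict


def sketch_check_args(args: argparse.Namespace) -> Dict[str, Any]:
    """Validate system-related arguments and list the data to load."""
    models = getattr(args, "models", None)
    sources = getattr(args, "sources", None)

    if args.systems and not models:
        raise argparse.ArgumentError(None, "Systems requested but no models provided")
    if args.systems and not sources:
        raise argparse.ArgumentError(None, "Systems requested but no sources provided")
    if bool(models) != bool(sources):
        raise argparse.ArgumentError(None, "Models and sources must be given together")

    # Keys follow the documented return value; the values are assumed.
    need_info: Dict[str, Any] = {
        "need_annotations": True,
        "need_families": True,
        "need_families_info": True,
        "need_rgp": True,
        "need_spots": True,
    }
    if args.systems:
        need_info["need_systems"] = True  # hypothetical system-specific key
    return need_info
```

Passing `None` as the first argument of `argparse.ArgumentError` is the usual way to raise it outside of a specific parser action.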

panorama.compare.spots.check_pangenome_cs(pangenome: Pangenome, sources: List[str] | None = None) None#

Validate pangenome status for conserved spots analysis.

This function ensures that a pangenome has the required components computed for conserved spots analysis, including RGPs (Regions of Genomic Plasticity) and spots. Optionally validates systems detection if sources are provided.

Parameters:
  • pangenome (Pangenome) – Pangenome object to validate.

  • sources (Optional[List[str]]) – List of system sources to validate. If None, systems validation is skipped.

Raises:
  • ValueError – If RGPs or spots haven’t been computed for the pangenome. Includes guidance on required commands.

  • AttributeError – If systems detection hasn’t been performed when sources are specified.

  • KeyError – If specified sources aren’t found in the pangenome’s systems.

panorama.compare.spots.compare_spots(pangenomes: Pangenomes, dup_margin: float = 0.05, gfrr_metrics: str = 'min_gfrr', gfrr_cutoff: Tuple[float, float] = (0.8, 0.8), seed: int = 42, threads: int = 1, lock: Lock | None = None, disable_bar: bool = False) Graph#

Identify and cluster conserved spots across multiple pangenomes.

This is the main function for conserved spots detection. It creates a comprehensive graph of spots from all pangenomes, computes similarity edges based on Gene Families Repertoire Relatedness (GFRR) metrics, performs clustering to identify conserved spots, and integrates the results into the pangenomes object.

Parameters:
  • pangenomes (Pangenomes) – Collection of pangenomes to analyze.

  • dup_margin (float) – Minimum ratio for multigenic family detection. Default: 0.05.

  • gfrr_metrics (str) – GFRR metric for clustering (‘min_gfrr’ or ‘max_gfrr’). Default: ‘min_gfrr’.

  • gfrr_cutoff (Tuple[float, float]) – Thresholds for (min_gfrr, max_gfrr). Default: (0.8, 0.8).

  • seed (int) – Random seed for reproducibility. Default: 42.

  • threads (int) – Number of threads for parallel processing. Default: 1.

  • lock (Optional[Lock]) – Thread synchronization lock. Default: None.

  • disable_bar (bool) – Whether to disable progress bars. Default: False.

Returns:

nx.Graph – Final spots graph with clustering information and conserved spots annotations.

Side Effects:
  • Adds ConservedSpots objects to the pangenomes collection

  • Modifies the returned graph by adding cluster assignments and removing isolated nodes

Raises:
  • ValueError – If gfrr_metrics is not ‘min_gfrr’ or ‘max_gfrr’.

  • RuntimeError – If clustering fails or produces no valid clusters.

panorama.compare.spots.compute_gfrr_edges(graph: Graph, spots2borders: Dict[int, Set[GeneFamily]], spots2pangenome: Dict[int, str], min_gfrr_cutoff: float = 0.5, max_gfrr_cutoff: float = 0.8, disable_bar: bool = False) None#

Compute and add edges between spots based on Gene Families Repertoire Relatedness (GFRR).

This function calculates GFRR metrics between all pairs of spots from different pangenomes and adds edges to the graph when both minimum and maximum GFRR values exceed their respective cutoffs. This creates connections between potentially conserved genomic spots.

Parameters:
  • graph (nx.Graph) – Spots graph to add edges to (modified in-place).

  • spots2borders (Dict[int, Set[GeneFamily]]) – Mapping of spot hashes to their bordering gene families.

  • spots2pangenome (Dict[int, str]) – Mapping of spot hashes to pangenome names.

  • min_gfrr_cutoff (float) – Minimum threshold for min_gfrr metric. Default: 0.5.

  • max_gfrr_cutoff (float) – Minimum threshold for max_gfrr metric. Default: 0.8.

  • disable_bar (bool) – Whether to disable the progress bar. Default: False.
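
The edge rule can be illustrated with plain sets. The sketch assumes that min_gfrr and max_gfrr divide the number of shared families by the smaller and the larger set size respectively (the naming used for PPanGGOLiN's GRR metrics), that family identifiers are already comparable across pangenomes, and that cutoffs are inclusive. The function names and the string identifiers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def gfrr(a: Set[str], b: Set[str]) -> Tuple[float, float]:
    """Return (min_gfrr, max_gfrr) for two sets of bordering families."""
    if not a or not b:
        return 0.0, 0.0
    shared = len(a & b)
    return shared / min(len(a), len(b)), shared / max(len(a), len(b))


def gfrr_edges(
    spot_to_families: Dict[int, Set[str]],
    spot_to_pangenome: Dict[int, str],
    min_cutoff: float = 0.5,
    max_cutoff: float = 0.8,
) -> List[Tuple[int, int, Dict[str, float]]]:
    """List edges between spots of different pangenomes that pass both cutoffs."""
    edges = []
    for s1, s2 in combinations(sorted(spot_to_families), 2):
        if spot_to_pangenome[s1] == spot_to_pangenome[s2]:
            continue  # only compare spots from different pangenomes
        min_gfrr, max_gfrr = gfrr(spot_to_families[s1], spot_to_families[s2])
        if min_gfrr >= min_cutoff and max_gfrr >= max_cutoff:
            edges.append((s1, s2, {"min_gfrr": min_gfrr, "max_gfrr": max_gfrr}))
    return edges
```

The returned triples have the shape accepted by NetworkX's `Graph.add_edges_from`, so the two metrics can be stored as edge attributes for later clustering.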

panorama.compare.spots.create_pangenome_spots_graph(pangenome: Pangenome, dup_margin: float = 0.05) Tuple[Graph, Dict[int, Set[GeneFamily]], Dict[int, str], Dict[int, Spot]]#

Create a graph representation of spots from a single pangenome.

This function constructs a NetworkX graph where nodes represent spots from the given pangenome. Each spot is assigned a unique hash identifier and associated with its bordering gene families for subsequent comparison analysis.

Parameters:
  • pangenome (Pangenome) – Source pangenome containing spots to process.

  • dup_margin (float) – Minimum ratio of organisms that must contain multiple copies of a gene family for it to be considered duplicated. Used for multigenic family detection. Default: 0.05 (5%).

Returns:
  • graph (nx.Graph) – NetworkX graph with spot nodes (no edges)

  • spots2borders (Dict[int, Set[GeneFamily]]) – Mapping of spot hashes to their bordering gene families

  • spots2pangenome (Dict[int, str]) – Mapping of spot hashes to pangenome names

  • spothash2spot (Dict[int, Spot]) – Mapping of spot hashes to spot objects

panorama.compare.spots.create_pangenome_system_graph(pangenome: Pangenome, canonical: bool = False) Tuple[Graph, Dict[int, str], Dict[int, Any], defaultdict]#

Create a system graph for a single pangenome.

This function constructs a graph representation of systems within a pangenome, where each node represents a unique system-spot combination. It computes system coverage statistics and tracks conserved spots associations.

Parameters:
  • pangenome (Pangenome) – Source pangenome containing systems to process.

  • canonical (bool) – Whether to use canonical systems (True) or not (False). Default: False.

Returns:
  • graph (nx.Graph) – NetworkX graph with system nodes

  • sys2pangenome (Dict[int, str]) – Mapping of system hashes to pangenome names

  • syshash2sys (Dict[int, Any]) – Mapping of system hashes to system objects

  • systemhash2conserved_spots (defaultdict) – Mapping of system hashes to sets of conserved spot IDs

panorama.compare.spots.create_spots_graph(pangenomes: Pangenomes, dup_margin: float = 0.05, threads: int = 1, lock: Lock | None = None, disable_bar: bool = False) Tuple[Graph, Dict[int, Set[GeneFamily]], Dict[int, str], Dict[int, Spot]]#

Create a comprehensive spots graph from multiple pangenomes.

This function processes all pangenomes in parallel to construct a unified graph where nodes represent spots from all pangenomes. The graph serves as the foundation for conserved spots detection through subsequent edge computation and clustering.

Parameters:
  • pangenomes (Pangenomes) – Collection of pangenomes to process.

  • dup_margin (float) – Minimum ratio for multigenic family detection. Default: 0.05.

  • threads (int) – Number of threads for parallel processing. Default: 1.

  • lock (Optional[Lock]) – Thread synchronization lock. Default: None.

  • disable_bar (bool) – Whether to disable the progress bar. Default: False.

Returns:
  • spots_graph (nx.Graph) – Unified graph containing all spots as nodes

  • spots2borders (Dict[int, Set[GeneFamily]]) – Complete mapping of spot hashes to their bordering families

  • spots2pangenome (Dict[int, str]) – Complete mapping of spot hashes to pangenome names

  • spothash2spot (Dict[int, Spot]) – Complete mapping of spot hashes to spot objects

Raises:

RuntimeError – If parallel processing fails or produces inconsistent results.

panorama.compare.spots.create_systems_graph(pangenomes: Pangenomes, canonical: bool = False, threads: int = 1, lock: Lock | None = None, disable_bar: bool = False) Tuple[Graph, Dict[int, str], Dict[int, Any], defaultdict]#

Create a unified systems graph from multiple pangenomes.

This function processes all pangenomes in parallel to construct a comprehensive graph where nodes represent system-spot combinations across all pangenomes. The resulting graph serves as input for systems clustering and conserved spots analysis.

Parameters:
  • pangenomes (Pangenomes) – Collection of pangenomes to process.

  • canonical (bool) – Whether to use canonical systems (True) or not (False). Default: False.

  • threads (int) – Number of threads for parallel processing. Default: 1.

  • lock (Optional[Lock]) – Thread synchronization lock. Default: None.

  • disable_bar (bool) – Whether to disable the progress bar. Default: False.

Returns:

Tuple containing

  • systems_graph (nx.Graph): Unified graph with all system-spot nodes

  • systems2pangenome (Dict[int, str]): Mapping of system hashes to pangenome names

  • systemhash2system (Dict[int, Any]): Mapping of system hashes to system objects

  • systemhash2conserved_spots (defaultdict): Mapping of system hashes to sets of conserved spot IDs

Raises:

RuntimeError – If parallel processing fails for any pangenome.

Generate and analyze systems linkage graphs based on conserved spots.

This function creates comprehensive graphs linking systems through shared conserved spots and performs dual clustering analysis using both Louvain community detection and Minimum Spanning Tree (MST) approaches. The analysis identifies system clusters that span multiple pangenomes and share conserved genomic locations.

Parameters:
  • community

  • pangenomes (Pangenomes) – Collection of pangenomes with systems and conserved spots.

  • output (Path) – Output directory for generated graphs and results.

  • graph_formats (Optional[List[str]]) – Output formats [‘gexf’, ‘graphml’]. Default: None.

  • canonical – Whether to use the canonical systems (True) or not (False). Default: False.

  • threads (int) – Number of threads for parallel processing. Default: 1.

  • lock (Optional[Lock]) – Thread synchronization lock. Default: None.

  • disable_bar (bool) – Whether to disable progress bars. Default: False.

panorama.compare.spots.launch(args: Namespace) None#

Main entry point for conserved spots comparison analysis.

This function orchestrates the complete workflow for identifying and analyzing conserved spots across multiple pangenomes. It handles argument validation, resource setup, analysis execution, and results output.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments containing all configuration parameters for the analysis.

Raises:

Various exceptions from underlying functions for validation, processing, or I/O errors.

panorama.compare.spots.parser_comparison_spots(parser: ArgumentParser) None#

Configure argument parser for conserved spots comparison command.

This function adds all necessary command-line arguments for conserved spots comparison, including core comparison parameters, systems analysis options, and various output configurations.

Parameters:

parser (argparse.ArgumentParser) – Parser to configure with comparison arguments.

panorama.compare.spots.subparser(sub_parser: _SubParsersAction) ArgumentParser#

Create a subparser for conserved spots comparison command.

This function configures the command-line interface for the conserved spots comparison functionality, setting up the argument parser with appropriate description and configuration.

Parameters:

sub_parser (argparse._SubParsersAction) – Parent subparser to add this command to.

Returns:

argparse.ArgumentParser – Configured parser for the compare_spots command.

panorama.compare.spots.write_conserved_spots(pangenomes: Pangenomes, output: Path, graph_formats: List[str] | None = None, cs_graph: Graph | None = None, force: bool = False, disable_bar: bool = False) None#

Write conserved spots data to files and optionally export graphs.

This function generates comprehensive output files documenting conserved spots across pangenomes, including individual spot details and summary statistics. It also integrates systems information and exports graph representations when requested.

Parameters:
  • pangenomes (Pangenomes) – Collection of pangenomes with conserved spots.

  • output (Path) – Output directory for generated files.

  • graph_formats (Optional[List[str]]) – Graph export formats [‘gexf’, ‘graphml’]. Default: None.

  • cs_graph (Optional[nx.Graph]) – Conserved spots’ graph to export. Default: None.

  • force (bool) – Whether to overwrite existing files. Default: False.

  • disable_bar (bool) – Whether to disable progress bars. Default: False.

Side Effects:
  • Creates output directory structure

  • Generates individual TSV files for each conserved spot

  • Creates a summary TSV file with all conserved spots

  • Exports graph files in specified formats

  • Modifies cs_graph by adding systems information

Raises:
  • AssertionError – If graph_formats specified but cs_graph is None.

  • IOError – If file writing fails.

panorama.compare.systems module

Systems Comparison Module for PANORAMA

This module provides functionality to compare genomic systems across multiple pangenomes, compute similarity metrics, and generate visualizations.

exception panorama.compare.systems.SystemsComparisonError#

Bases: Exception

Custom exception for systems comparison errors.

panorama.compare.systems.add_system_metadata_to_graph(pangenomes: Pangenomes, graph: Graph) None#

Add system metadata as node attributes to the systems graph.

Parameters:
  • pangenomes – Collection of pangenomes containing systems.

  • graph – NetworkX graph to add metadata to.

panorama.compare.systems.check_compare_systems_args(args: Namespace) Dict#

Validate and prepare arguments for systems comparison.

Parameters:

args – Command line arguments containing sources, models, and other parameters.

Returns:

Dict containing required information flags and parameters for pangenome processing.

Raises:

argparse.ArgumentError – If the number of sources and models don’t match.

panorama.compare.systems.compare_systems(pangenomes: Pangenomes, gfrr_metrics: str = 'min_gfrr_models', gfrr_cutoff: Tuple[float, float] = (0.8, 0.8), gfrr_models_cutoff: Tuple[float, float] = (0.8, 0.8), seed: int = 42, threads: int = 1, lock: Lock | None = None, disable_bar: bool = False) Graph#

Compare systems across pangenomes and identify conserved system clusters.

Parameters:
  • pangenomes – Collection of pangenomes to compare.

  • gfrr_metrics – Metric to use for clustering (‘min_gfrr_models’, ‘max_gfrr_models’, etc.).

  • gfrr_cutoff – GFRR cutoff thresholds for all gene families.

  • gfrr_models_cutoff – GFRR cutoff thresholds for model gene families.

  • seed (int) – Random seed for reproducibility. Default: 42.

  • threads – Number of threads for parallel processing.

  • lock – Thread lock for synchronization.

  • disable_bar – Whether to disable progress bars.

Returns:

NetworkX graph containing conserved systems clusters.

Raises:

SystemsComparisonError – If comparison fails.

panorama.compare.systems.compute_gfrr_edges(graph: Graph, system_to_pangenome: Dict[int, str], system_hash_to_system: Dict[int, System], gfrr_cutoff: Tuple[float, float] = (0.8, 0.8), gfrr_models_cutoff: Tuple[float, float] = (0.8, 0.8), disable_bar: bool = False) None#

Compute GFRR (Gene Families Repertoire Relatedness) edges between systems from different pangenomes.

GFRR is a similarity metric that compares gene family repertoires between systems. Edges are only added between systems that meet both GFRR model and GFRR cutoff thresholds.

Parameters:
  • graph – Graph with system nodes (edges will be added).

  • system_to_pangenome – Mapping from system hash to pangenome name.

  • system_hash_to_system – Mapping from system hash to a system object.

  • gfrr_cutoff – Minimum (min_gfrr, max_gfrr) for all gene families.

  • gfrr_models_cutoff – Minimum (min_gfrr, max_gfrr) for model gene families.

  • disable_bar – Whether to disable the progress bar.

panorama.compare.systems.create_pangenome_system_graph(pangenome) Tuple[Graph, Dict[int, str], Dict[int, System]]#

Create a graph representation of systems for a single pangenome.

Parameters:

pangenome – Pangenome object containing systems.

Returns:

Tuple containing

  • NetworkX graph with system nodes

  • Dictionary mapping system hash to pangenome name

  • Dictionary mapping system hash to a system object

panorama.compare.systems.create_pangenome_systems_heatmaps(pangenomes: Pangenomes, output: Path) None#

Generate heatmaps showing system distribution across pangenomes.

Parameters:
  • pangenomes – Collection of pangenomes to analyze.

  • output – Directory to save the generated heatmaps.

panorama.compare.systems.create_systems_graph(pangenomes: Pangenomes, threads: int = 1, lock: Lock | None = None, disable_bar: bool = False) Tuple[Graph, Dict[int, str], Dict[int, System]]#

Create a comprehensive graph of all systems across pangenomes.

Parameters:
  • pangenomes – Collection of pangenomes to process.

  • threads – Number of threads for parallel processing.

  • lock – Thread lock for synchronization.

  • disable_bar – Whether to disable the progress bar.

Returns:

Tuple containing

  • NetworkX graph with all system nodes

  • Dictionary mapping system hash to pangenome name

  • Dictionary mapping system hash to a system object

panorama.compare.systems.generate_heatmap(data: DataFrame, output: Path, output_name: str, output_formats: List[str], figure_size: Tuple[float, float], title: str = 'Heatmap', font_size: int = 18) None#

Generate a heatmap visualization using Bokeh.

Parameters:
  • data – Input data for the heatmap (should be a DataFrame).

  • output – Directory to save the heatmap files.

  • output_name – Base name for output files.

  • output_formats – List of formats to save (‘html’, ‘png’).

  • figure_size – Size of the figure in pixels (width, height).

  • title – Title for the heatmap.

  • font_size – Base font size for text elements.

panorama.compare.systems.get_pangenomes_to_systems_data(pangenomes: Pangenomes) Dict[str, Dict[str, int]]#

Extract system occurrence data from pangenomes.

Parameters:

pangenomes – Collection of pangenomes.

Returns:

Dictionary mapping pangenome names to system occurrence counts.

panorama.compare.systems.launch(args: Namespace) None#

Main entry point for systems comparison analysis.

Parameters:

args – Command line arguments containing all parameters for the analysis.

Raises:

SystemsComparisonError – If analysis fails at any stage.

panorama.compare.systems.parser_comparison_systems(parser: ArgumentParser) None#

Configure argument parser for systems comparison command.

Parameters:

parser – Argument parser to configure.

panorama.compare.systems.subparser(sub_parser: _SubParsersAction) ArgumentParser#

Create a subparser for systems comparison command.

Parameters:

sub_parser – Subparser action from main argument parser.

Returns:

Configured argument parser for systems comparison.

panorama.compare.systems.write_conserved_systems(pangenomes: Pangenomes, output: Path, conserved_systems_graph: Graph, graph_formats: List[str]) None#

Write a conserved systems graph to various output formats.

Parameters:
  • pangenomes – Collection of pangenomes.

  • output – Output directory path.

  • conserved_systems_graph – Graph of conserved systems.

  • graph_formats – List of output formats (‘gexf’, ‘graphml’).

panorama.compare.utils module#

A collection of utility functions for pangenome comparison.

This module contains functionalities to compute gene family relatedness, cluster gene families based on graph metrics, and launch comparative workflows for pangenome analysis. It also includes utilities for building argument parsers related to these workflows.

panorama.compare.utils.cluster_on_gfrr(graph: Graph, gfrr_metric: str, seed: int = 42) List[Set[Any]]#

Cluster graph nodes using Louvain community detection based on GFRR metrics.

This function applies the Louvain algorithm for community detection to partition the input graph into clusters. Each node is assigned a cluster identifier as a node attribute, and the function returns the cluster partitions.

Parameters:
  • graph (nx.Graph) – Input graph with nodes and weighted edges to cluster.

  • gfrr_metric (str) – Name of the edge weight attribute to use for clustering (e.g., ‘min_frr’, ‘max_frr’).

  • seed (int) – Optional random seed for reproducibility. Default: 42.

Returns:

List[Set[Any]] – List where each element is a set containing nodes belonging to the same cluster.

Raises:

KeyError – If the specified gfrr_metric is not found in graph edge attributes.
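
The clustering step can be sketched with networkx's `louvain_communities`. This is a hedged illustration rather than the library's implementation: the function name `cluster_sketch`, the node attribute name `cluster`, and the explicit edge-attribute check are assumptions. The check is there because networkx treats a missing weight attribute as 1 instead of raising an error.

```python
from typing import Any, List, Set

import networkx as nx


def cluster_sketch(graph: nx.Graph, weight_attr: str, seed: int = 42) -> List[Set[Any]]:
    """Partition a weighted graph with Louvain and tag each node with a cluster id."""
    # Explicit validation: louvain_communities would silently default missing weights to 1.
    for u, v, data in graph.edges(data=True):
        if weight_attr not in data:
            raise KeyError(f"Edge ({u}, {v}) has no '{weight_attr}' attribute")

    partitions = nx.community.louvain_communities(graph, weight=weight_attr, seed=seed)

    # Store the cluster identifier on each node (attribute name is an assumption).
    for cluster_id, nodes in enumerate(partitions):
        for node in nodes:
            graph.nodes[node]["cluster"] = cluster_id
    return partitions


# Two tightly connected triangles joined by one weak edge.
graph = nx.Graph()
graph.add_weighted_edges_from(
    [
        ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
        ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
        ("c", "d", 0.1),
    ],
    weight="min_frr",
)
clusters = cluster_sketch(graph, "min_frr")
```

Fixing the seed makes the partition reproducible across runs, since Louvain visits nodes in a randomized order.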

panorama.compare.utils.common_launch(args: Any, check_func: Callable, need_info: Dict[str, Any], **kwargs: Any) Tuple[Pangenomes, Path, Manager, Lock]#

Launch a common setup for comparative pangenome workflows.

This function handles the common initialization steps for pangenome comparison workflows, including loading pangenomes, setting up multiprocessing resources, creating temporary directories, and optionally performing gene family clustering.

Parameters:
  • args (Any) –

    Parsed command-line arguments containing workflow parameters, such as:
    • pangenomes: Path to pangenome list file

    • cpus: Number of CPU threads to use

    • cluster: Optional path to existing clustering results

    • tmpdir: Temporary directory path

    • Various MMSeqs2 clustering parameters

  • check_func (Callable) – Validation function to check pangenome integrity during loading.

  • need_info (Dict[str, Any]) –

    Dictionary specifying required data components:
    • ‘need_families_sequences’: bool, whether sequences are needed

    • Other data requirements as key-value pairs

  • **kwargs (Any) – Additional keyword arguments passed to pangenome loading function.

Returns:

Tuple[Pangenomes, Path, Manager, Lock] – Setup resources containing:
  • pangenomes (Pangenomes): Loaded and processed pangenomes object

  • tmpdir (Path): Path to temporary directory for intermediate files

  • manager (Manager): Multiprocessing manager for shared resources

  • lock (Lock): Thread-safe lock for concurrent operations

Raises:
  • FileNotFoundError – If pangenome files or clustering file cannot be found.

  • ValueError – If clustering parameters are invalid.
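
The shared resources mentioned above (temporary directory, manager, lock) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual body of common_launch: `setup_resources` is a hypothetical helper, and pangenome loading and clustering are left out entirely.

```python
import shutil
import tempfile
from multiprocessing import Manager
from pathlib import Path
from typing import Any, Tuple


def setup_resources(tmp_root: Path) -> Tuple[Path, Any, Any]:
    """Create a temporary directory, a multiprocessing manager, and a shared lock."""
    # Dedicated sub-directory for intermediate files under the chosen root.
    tmpdir = Path(tempfile.mkdtemp(dir=tmp_root))
    # The manager runs a server process; its Lock proxy can be passed to workers.
    manager = Manager()
    lock = manager.Lock()
    return tmpdir, manager, lock


if __name__ == "__main__":
    tmpdir, manager, lock = setup_resources(Path(tempfile.gettempdir()))
    try:
        # Hold the lock while writing, as concurrent workers would.
        with lock:
            (tmpdir / "intermediate.txt").write_text("written under the lock\n")
    finally:
        manager.shutdown()
        shutil.rmtree(tmpdir)
```

Returning the manager alongside the lock lets the caller shut the manager process down explicitly once the workflow has finished.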

panorama.compare.utils.compute_gfrr(queries: Set[GeneFamily], targets: Set[GeneFamily]) Tuple[float, float, int]#

Compute Gene Family Repertoire Relatedness (GFRR) metrics between query and target gene families.

This function evaluates the overlap of ‘akin’ relationships between two sets of gene families and calculates both minimum and maximum Family Repertoire Relatedness (FRR) values. The FRR is a metric to assess similarity between gene family sets based on their shared relationships.

Parameters:
  • queries (Set[GeneFamily]) – Set of query gene families to analyze.

  • targets (Set[GeneFamily]) – Set of target gene families to compare against.

Returns:

Tuple[float, float, int] – A tuple containing:
  • min_frr (float): Minimum FRR value (shared akins / min set size)

  • max_frr (float): Maximum FRR value (shared akins / max set size)

  • num_reciprocal (int): Number of reciprocal akin relationships found

Raises:

ValueError – If either the queries or the targets set is empty.

Note

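  • min_frr divides the shared akins by the smaller set size, so it is always greater than or equal to max_frr (the more liberal measure, easier to achieve high values)

  • max_frr divides the shared akins by the larger set size (the more conservative measure, harder to achieve high values)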

  • Akin relationships represent evolutionary or functional similarities between gene families
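
The two ratios given under Returns can be worked through on a small example. The sketch below is a simplified illustration, not PANORAMA's implementation: the `Family` dataclass and its integer `akin` identifier are hypothetical stand-ins for GeneFamily objects, and `gfrr_sketch` is an assumed name.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Family:
    """Hypothetical stand-in for a gene family; `akin` identifies its akin group."""
    name: str
    akin: int


def gfrr_sketch(queries: Set[Family], targets: Set[Family]) -> Tuple[float, float, int]:
    """Return (min_frr, max_frr, number of shared akins) for two family sets."""
    if not queries or not targets:
        raise ValueError("queries and targets must both be non-empty")
    # Akin groups present on both sides.
    shared = {family.akin for family in queries} & {family.akin for family in targets}
    min_frr = len(shared) / min(len(queries), len(targets))
    max_frr = len(shared) / max(len(queries), len(targets))
    return min_frr, max_frr, len(shared)


queries = {Family("q1", 1), Family("q2", 2), Family("q3", 3)}
targets = {Family("t1", 2), Family("t2", 3), Family("t3", 4), Family("t4", 5)}
min_frr, max_frr, num_shared = gfrr_sketch(queries, targets)
# Two akin groups (2 and 3) are shared: min_frr = 2/3, max_frr = 2/4.
```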

panorama.compare.utils.parser_comparison(parser: Any) Tuple[Any, Any, Any]#

Configure an argument parser for pangenome comparison commands.

This function sets up command-line argument groups and options specific to pangenome comparison workflows, including required arguments, comparison options, and MMSeqs2 clustering parameters.

Parameters:

parser (Any) – ArgumentParser instance to configure with comparison-specific arguments.

Returns:

Tuple[Any, Any, Any] – Tuple containing three argument groups:
  • required (ArgumentGroup): Required arguments group

  • compare_opt (ArgumentGroup): Comparison optional arguments group

  • optional (ArgumentGroup): General optional arguments group

Side Effects:
  • Modifies the input parser by adding argument groups and options

  • Configures MMSeqs2 clustering arguments through parser_mmseqs2_cluster

Note

This function defines the command-line interface for comparison workflows, making it easier to maintain consistent argument handling across different comparison subcommands.
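
Returning the argument groups lets each comparison subcommand add its own options to the shared groups. A minimal sketch with argparse follows. The group titles, the option names (`--pangenomes`, `--min_similarity`, `--cpus`, `--demo_flag`), and the function name `configure_groups` are assumptions chosen for illustration, and the MMSeqs2 options are omitted.

```python
import argparse
from typing import Any, Tuple


def configure_groups(parser: argparse.ArgumentParser) -> Tuple[Any, Any, Any]:
    """Add three argument groups to the parser and return them for later extension."""
    required = parser.add_argument_group("Required arguments")
    required.add_argument("--pangenomes", required=True, help="Path to a pangenome list file")

    compare_opt = parser.add_argument_group("Comparison optional arguments")
    # Hypothetical comparison option, not a documented PANORAMA flag.
    compare_opt.add_argument("--min_similarity", type=float, default=0.5)

    optional = parser.add_argument_group("Optional arguments")
    optional.add_argument("--cpus", type=int, default=1, help="Number of CPU threads")
    return required, compare_opt, optional


parser = argparse.ArgumentParser(prog="demo_compare")
required, compare_opt, optional = configure_groups(parser)

# A subcommand can extend a shared group with its own (hypothetical) option.
compare_opt.add_argument("--demo_flag", action="store_true")

args = parser.parse_args(["--pangenomes", "pangenomes.tsv", "--demo_flag"])
```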

Module contents#