panorama package#

Subpackages#

Submodules#

panorama.geneFamily module#

This module provides classes to represent gene families and study them

class panorama.geneFamily.Akin(identifier: int, reference: GeneFamily, *gene_families: GeneFamily)#

Bases: object

Represents a group of gene families that are similar across multiple pangenomes.

ID#

The identifier of the Akin instance.

Type:

int

reference#

The reference gene family name.

Type:

str

_families#

A dictionary of gene families in the Akin group.

Type:

dict

__getitem__(name: str) GeneFamily#

Retrieves a GeneFamily from the Akin group by name.

Parameters:

name (str) – The name of the GeneFamily to retrieve.

Returns:

GeneFamily – The GeneFamily instance with the specified name.

Raises:

KeyError – If the GeneFamily with the given name does not exist in the Akin group.

__init__(identifier: int, reference: GeneFamily, *gene_families: GeneFamily) None#

Initializes an Akin instance.

Parameters:
  • identifier (int) – The identifier of the Akin instance.

  • reference (GeneFamily) – The reference GeneFamily instance.

  • *gene_families (GeneFamily) – Additional GeneFamily instances to add.

__setitem__(name: str, family: GeneFamily)#

Adds a GeneFamily to the Akin group.

Parameters:
  • name (str) – The name of the GeneFamily.

  • family (GeneFamily) – The GeneFamily instance to add.

Raises:

KeyError – If the GeneFamily with the given name already exists in the Akin group.

add(family: GeneFamily)#

Adds a GeneFamily to the Akin group.

Parameters:

family (GeneFamily) – The GeneFamily instance to add.

Raises:

AssertionError – If the provided family is not a GeneFamily instance.

get(name: str) GeneFamily#

Retrieves a GeneFamily from the Akin group by name.

Parameters:

name (str) – The name of the GeneFamily to retrieve.

Returns:

GeneFamily – The GeneFamily instance with the specified name.
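
The container behaviour documented for Akin can be sketched with a plain dictionary keyed by family name. This is a minimal stand-in, not PANORAMA's implementation: `FamilyStub` and `AkinSketch` are hypothetical names, and the sketch assumes the reference family is itself stored in the group. Only the documented contract is modelled (lookup by name, KeyError on a missing or duplicate name, AssertionError on a wrong type).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyStub:
    """Hypothetical stand-in for a gene family; only the name matters here."""
    name: str


class AkinSketch:
    """Hypothetical container mirroring the documented Akin contract."""

    def __init__(self, identifier: int, reference: FamilyStub, *families: FamilyStub):
        self.ID = identifier
        self.reference = reference.name  # documented as a str attribute
        self._families = {}
        self.add(reference)  # assumption: the reference is also a group member
        for family in families:
            self.add(family)

    def __setitem__(self, name: str, family: FamilyStub) -> None:
        if name in self._families:
            raise KeyError(f"Gene family {name} already exists in the group")
        self._families[name] = family

    def __getitem__(self, name: str) -> FamilyStub:
        return self._families[name]  # a missing name raises KeyError

    def add(self, family: FamilyStub) -> None:
        assert isinstance(family, FamilyStub), "family must be a gene family"
        self[family.name] = family


group = AkinSketch(1, FamilyStub("famA"), FamilyStub("famB"))
```

Routing `add` through `__setitem__` keeps the duplicate-name check in one place, so both entry points reject a name that is already present.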

class panorama.geneFamily.GeneFamily(family_id: int, name: str)#

Bases: GeneFamily

Represents a single gene family. It is a node in the pangenome graph and is aware of its genes and edges.

name#

The name of the gene family to be printed in output files.

Type:

str

_hmm#

The HMM associated with the gene family.

Type:

HMM, optional

profile#

The profile associated with the gene family.

Type:

optional

optimized_profile#

The optimized profile for the gene family.

Type:

optional

_units_getter#

A dictionary to retrieve system units.

Type:

dict

_systems_getter#

A dictionary to retrieve systems.

Type:

dict

_akin#

The akin families associated with other pangenomes.

Type:

Akin, optional

property HMM: HMM#

Gets the HMM associated with the GeneFamily.

Returns:

HMM – The HMM associated with the GeneFamily.

__eq__(other: GeneFamily) bool#

Checks if two GeneFamily instances are equal based on their genes.

Parameters:

other (GeneFamily) – Another GeneFamily instance to compare.

Returns:

bool – True if the GeneFamily instances are equal, False otherwise.

Raises:

TypeError – If the other object is not a GeneFamily instance.

__hash__() int#

Returns the hash of the GeneFamily instance.

Returns:

int – The hash value of the GeneFamily instance.

__init__(family_id: int, name: str)#

Initializes a GeneFamily instance.

Parameters:
  • family_id (int) – The internal identifier of the gene family.

  • name (str) – The name of the gene family.

__ne__(other: GeneFamily) bool#

Checks if two GeneFamily instances are not equal.

Parameters:

other (GeneFamily) – Another GeneFamily instance to compare.

Returns:

bool – True if the GeneFamily instances are not equal, False otherwise.

__repr__() str#

Returns a string representation of the GeneFamily instance.

Returns:

str – The string representation of the GeneFamily instance.

_getattr_from_ppanggolin(family: GeneFamily)#

Copies attributes from a PPanGGOLiN GeneFamily instance to a PANORAMA GeneFamily instance.

Parameters:

family (Fam) – A PPanGGOLiN GeneFamily instance.

property akin: Akin#

Gets the akin families associated with other pangenomes.

Returns:

Akin – The akin families.

Raises:

KeyError – If no akin families are assigned.

is_multigenic() bool#

Checks whether the GeneFamily is multigenic.

Returns:

bool – True if the GeneFamily is multigenic, False otherwise.

is_multigenic_in_org(organism: Organism) bool#

Checks whether the GeneFamily is multigenic in a specific organism.

Parameters:

organism (Organism) – The organism to check.

Returns:

bool – True if the GeneFamily is multigenic in the organism, False otherwise.

property module#

Return module belonging to the family

Returns:

panorama.region.Module – module belonging to the family

static recast(family: GeneFamily) GeneFamily#

Recasts a PPanGGOLiN GeneFamily into a PANORAMA GeneFamily.

Parameters:

family (Fam) – A PPanGGOLiN GeneFamily instance.

Returns:

GeneFamily – The recast PANORAMA GeneFamily instance.

panorama.main module#

The script serves as the entry point for the Panorama software, a bioinformatics tool for analyzing pangenomes. It provides a command-line interface for various functionalities such as annotation, system detection, alignment, comparison, formatting, and utility workflows.

The script constructs a comprehensive command-line utility with subcommands and their corresponding options, encompassing bioinformatic tools ranging from annotations to workflows. A help system is embedded to guide users through the available subcommands and their usage.

panorama.pangenomes module#

This module provides classes to represent a pangenome or a set of pangenomes

class panorama.pangenomes.Pangenome(name, taxid: int = None)#

Bases: Pangenome

This class represents a pangenome and is based on the PPanGGOLiN Pangenome class. It is the basic unit for all analyses and gives access to the different elements of your pangenome, such as organisms, contigs, genes or gene families. It also provides additional methods needed to analyze pangenomes.

Parameters:

name – Name of the pangenome

__init__(name, taxid: int = None)#

Constructor method.

_create_gene_family(name: str) GeneFamily#

Creates a gene family object with the given name

Parameters:

name – The name to give to the gene family. Must not exist already.

Returns:

GeneFamily – The created GeneFamily object

add_file(pangenome_file: Path, check_version: bool = True)#

Links an HDF5 file to the pangenome.

If needed elements will be loaded from this file, and anything that is computed will be saved to this file when ppanggolin.formats.writeBinaries.writePangenome() is called.

Parameters:
  • pangenome_file – Path to the HDF5 pangenome file to be either used or created

  • check_version – Check that the PPanGGOLiN version recorded in the pangenome file is compatible with the PPanGGOLiN version currently in use.

Raises:
  • AssertionError – If the pangenome_file is not an instance of the Path class

  • TypeError – If the pangenome_file is not an HDF5 format file

add_gene_family(family: GeneFamily | GeneFamily)#

Adds a gene family to the pangenome

Parameters:

family (Union[GeneFamily, Fam]) – GeneFamily object to add

add_spot(spot: Spot)#

Adds the given spot to the pangenome.

Parameters:

spot – Spot which should be added

Raises:
  • AssertionError – If spot is not a Spot object

  • KeyError – If another Spot with the same identifier already exists in the pangenome

add_system(system: System)#

Add a detected system to the pangenome.

Parameters:

system (System) – Detected system to be added.

property gene_families: Generator[GeneFamily, None, None]#

Returns all the gene families in the pangenome

Returns:

Generator[GeneFamily, None, None] – Generator of gene families

get_gene_family(name: str) GeneFamily | None#

Get the gene family by its name in the pangenome

Parameters:

name – Name of the gene family to get

Returns:

Union[GeneFamily, None] – The desired gene family

get_system(system_id: str) System#

Get a system by its ID in the pangenome

Parameters:

system_id – ID of the system to get

Returns:

System – The desired system

Raises:

KeyError – If the system doesn’t exist in the pangenome

get_system_by_source(source: str) Generator[System, None, None]#

Retrieve systems by their source.

Parameters:

source (str) – Source identifier.

Yields:

Generator[System, None, None] – Systems with the given source.

number_of_systems(source: str = None, with_canonical: bool = True) int#

Get the number of systems in the pangenome.

Parameters:
  • source (str, optional) – Source identifier. Defaults to None.

  • with_canonical (bool, optional) – Include canonical systems. Defaults to True.

Returns:

int – Number of systems.

property systems: Generator[System, None, None]#

Get all systems in the pangenome

Yields:

Generator[System, None, None] – Generator of systems

property systems_sources: Set[str]#

Get sources of all systems in the pangenome

Returns:

Set[str] – Set of system sources

systems_sources_to_metadata_source() Dict[str, Set[str]]#

Get metadata sources related to system sources

Returns:

Dict[str, Set[str]] – System source as key linked to their metadata sources as value

class panorama.pangenomes.Pangenomes#

Bases: object

A collection of pangenome objects.

__init__()#

Initialize an empty collection of pangenomes.

__iter__() Generator[Pangenome, None, None]#

Iterate over the pangenomes in the collection.

__len__()#

Get the number of pangenomes in the collection.

add(pangenome: Pangenome)#

Add a pangenome object to the collection.

Parameters:

pangenome (Pangenome) – The pangenome object to add.

add_cluster(cluster: Akin) None#

Add a cluster of similar gene families between pangenomes

Parameters:

cluster – A set of akin gene families

Raises:

KeyError – If there is already an Akin object with this ID

add_cluster_systems(cluster_systems: ClusterSystems)#

Add a cluster of systems between pangenomes

Parameters:

cluster_systems – Cluster systems object

Raises:

KeyError – If the cluster_systems identifier already exists in pangenomes.

add_conserved_spots(conserved_spots: ConservedSpots)#

Add a set of conserved spots between pangenomes

Parameters:

conserved_spots – Conserved spots object

Raises:

KeyError – If the conserved_spots identifier already exists in pangenomes.

property cluster_systems: Generator[ClusterSystems, None, None]#

Generator of cluster systems between pangenomes

Yields:

ClusterSystems – a set of systems clustered between pangenomes

property conserved_spots: Generator[ConservedSpots, None, None]#

Generator of conserved spots between pangenomes

Yields:

ConservedSpots – a set of spots conserved between pangenomes

get(name: str)#

Retrieve a pangenome object from the collection by its name.

Parameters:

name (str) – The name of the pangenome to retrieve.

Returns:

Pangenome – The pangenome object with the specified name.

get_cluster_systems(cs_id: str)#

Get a cluster of systems by its ID

Parameters:

cs_id – Cluster systems ID

Raises:

KeyError – If the cluster_systems identifier does not exist in pangenomes.

get_conserved_spots(cs_id: int)#

Get a set of conserved spots by its ID

Parameters:

cs_id – Conserved spots ID

Raises:

KeyError – if conserved_spots does not exist in pangenomes.

get_family(name: str, check_duplicate_names: bool = True) GeneFamily#

Get a family in the pangenomes

Parameters:
  • name – name of the gene family

  • check_duplicate_names – Flag to raise an error if family names are duplicated between pangenomes.

Returns:

The gene family with the given name

items() Generator[Tuple[str, Pangenome], None, None]#

Generator of pangenome name as key and pangenome object as value

mk_families_to_pangenome(check_duplicate_names: bool = True)#

Fill the families2pangenome dictionary, which records the pangenome each family belongs to

Parameters:

check_duplicate_names – Flag to raise an error if a family name is duplicated between pangenomes.

Raises:

KeyError – If there are duplicate family names
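
The mapping described above can be sketched as a single pass over every pangenome's family names. This is an illustration under assumptions, not the library's code: `map_families_to_pangenome` is a hypothetical helper that takes plain `(pangenome_name, family_names)` pairs instead of Pangenome objects, and the behaviour with the check disabled (the last pangenome seen wins) is a guess.

```python
from typing import Dict, Iterable, Tuple


def map_families_to_pangenome(
    pangenomes: Iterable[Tuple[str, Iterable[str]]],
    check_duplicate_names: bool = True,
) -> Dict[str, str]:
    """Map each family name to the name of the pangenome that holds it."""
    families2pangenome: Dict[str, str] = {}
    for pangenome_name, family_names in pangenomes:
        for family_name in family_names:
            if check_duplicate_names and family_name in families2pangenome:
                raise KeyError(
                    f"Family {family_name} is in both "
                    f"{families2pangenome[family_name]} and {pangenome_name}"
                )
            # Assumption: without the check, the last pangenome seen wins.
            families2pangenome[family_name] = pangenome_name
    return families2pangenome


mapping = map_families_to_pangenome([("pangA", ["f1", "f2"]), ("pangB", ["f3"])])
```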

property number_of_cluster_systems: int#

Get the number of cluster systems

Returns:

Number of cluster systems

property number_of_conserved_spots: int#

Get the number of conserved spots

Returns:

Number of conserved spots

read_clustering(clustering: Path | DataFrame, disable_bar: bool = False)#

Read clustering result from panorama

Parameters:
  • clustering – Clustering result

  • disable_bar – Flag to disable progress bar (default: False)

to_list() List[Pangenome]#

Convert the collection to a list of pangenomes.

Returns:

List[Pangenome] – A list of pangenome objects.

to_set() Set[Pangenome]#

Convert the collection to a set of pangenomes.

Returns:

Set[Pangenome] – A set of pangenome objects.

panorama.region module#

This module contains classes to represent regions, spots, conserved spots, and modules in a pangenome.

class panorama.region.ConservedSpots(identifier: int, *spots: Spot)#

Bases: object

Represents a set of conserved spots across multiple pangenomes.

__init__(identifier: int, *spots: Spot)#

Constructor method.

Parameters:
  • identifier (int) – The identifier of the conserved spots set.

  • *spots (Spot) – The spots to add to the conserved set.

add(spot: Spot) None#

Add a spot to the conserved set of spots.

Parameters:

spot (Spot) – The spot to add to the object.

Raises:
  • AssertionError – If the spot is not an instance of the Spot class.

  • KeyError – If the spot already exists in the conserved spots with the same ID.

get(spot_id: int, pangenome_name: str) Spot#

Get a spot from the conserved set of spots.

Parameters:
  • spot_id (int) – The identifier of the spot.

  • pangenome_name (str) – The name of the pangenome to which the spot belongs.

Returns:

Spot – The spot with the given id and pangenome.

Raises:
  • AssertionError – If the spot id is not an integer.

  • KeyError – If the spot is not in the conserved set of spots.

pangenomes() List[str]#

Get the list of pangenomes where the conserved spot belongs.

Returns:

List[str] – The list of pangenome names.

property spots: Generator[Spot, None, None]#

Generator of the spots in the conserved object.

Yields:

Spot – The next spot in the conserved object.
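
Because `get` takes both a spot identifier and a pangenome name, one way to picture the ConservedSpots container is a dictionary keyed by that pair. The sketch below is hypothetical: `SpotStub` and `ConservedSpotsSketch` are illustrative names, the composite key is an assumption about the internal storage, and the pangenome names are returned sorted only to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpotStub:
    """Hypothetical stand-in for a spot: an ID plus its pangenome name."""
    ID: int
    pangenome_name: str


class ConservedSpotsSketch:
    """Hypothetical container keyed by (spot ID, pangenome name)."""

    def __init__(self, identifier: int, *spots: SpotStub):
        self.ID = identifier
        self._spots: Dict[Tuple[int, str], SpotStub] = {}
        for spot in spots:
            self.add(spot)

    def add(self, spot: SpotStub) -> None:
        assert isinstance(spot, SpotStub), "spot must be a spot object"
        key = (spot.ID, spot.pangenome_name)
        if key in self._spots:
            raise KeyError(f"Spot {spot.ID} of {spot.pangenome_name} is already present")
        self._spots[key] = spot

    def get(self, spot_id: int, pangenome_name: str) -> SpotStub:
        assert isinstance(spot_id, int), "spot id must be an integer"
        return self._spots[(spot_id, pangenome_name)]  # KeyError if absent

    def pangenomes(self) -> List[str]:
        # Sorted only so that the example output is deterministic.
        return sorted({name for _, name in self._spots})


conserved = ConservedSpotsSketch(7, SpotStub(1, "pangA"), SpotStub(1, "pangB"))
```

The pair key lets two spots share the same numeric ID as long as they come from different pangenomes.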

class panorama.region.GeneContext(pangenome, gc_id: int, families: Set[GeneFamily], families_of_interest: Set[GeneFamily])#

Bases: GeneContext

A class used to represent a gene context

__init__(pangenome, gc_id: int, families: Set[GeneFamily], families_of_interest: Set[GeneFamily])#

Parameters:
  • gc_id – Identifier of the gene context

  • families – Gene families included in the GeneContext

summarize() dict#

Summarize gene context information in a dict

Returns:

dict with gene context info.

class panorama.region.Module(module_id: int, families: set = None)#

Bases: Module

Represents a module in a pangenome.

Parameters:
  • module_id (int) – The identifier of the module.

  • families (set, optional) – The set of families that define the module.

__init__(module_id: int, families: set = None)#

Constructor method.

Parameters:
  • module_id (int) – The identifier of the module.

  • families (set, optional) – The set of families that define the module. Defaults to None.

add_unit(unit)#

Add a system to the module.

Parameters:

unit (System) – The system to add to the module.

Raises:

Exception – If a system with the same ID but different name or gene families is already associated with the module.

property gene_families: Generator[GeneFamily, None, None]#

Get the set of gene families that define the module.

Returns:

GeneFamily – The set of gene families.

get_unit(identifier: int)#

Get a unit associated with the module.

Parameters:

identifier (int) – The identifier of the unit.

Returns:

System – The unit with the given identifier.

Raises:

KeyError – If the unit is not associated with the module.

property number_of_organisms#

Get the number of organisms that contain the module.

Returns:

int – The number of organisms.

property organisms#

Get the set of organisms that contain the module.

Returns:

set – The set of organisms.

property systems#

Generator of the systems associated with the module.

Yields:

System – The next system associated with the module.

property units#

Generator of the system units associated with the module.

Yields:

SystemUnit – The next system unit associated with the module.

class panorama.region.Region(name: str)#

Bases: Region

Represents a region in a pangenome.

Parameters:

name (str) – The name of the region.

__init__(name: str)#

Constructor method.

Parameters:

name (str) – The name of the region.

class panorama.region.Spot(spot_id: int)#

Bases: Spot

Represents a spot in a pangenome.

__init__(spot_id: int)#

Constructor method.

Parameters:

spot_id (int) – The identifier of the spot.

property conserved: bool#

Check if the spot is conserved between pangenomes.

Returns:

bool – True if the spot is conserved, False otherwise.

property number_of_organisms#

Get the number of organisms that contain the spot.

Returns:

int – The number of organisms.

property organisms: Generator[Organism, None, None]#

Generator of the organisms that contain the spot.

Yields:

Organism – The next organism that contains the spot.

panorama.utils module#

This module contains functions for managing files and directories, and checking the sanity of a TSV file.

class panorama.utils.RawTextArgumentDefaultsHelpFormatter(prog, indent_increment=2, max_help_position=24, width=None)#

Bases: ArgumentDefaultsHelpFormatter, RawDescriptionHelpFormatter
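
Combining these two standard `argparse` formatters through multiple inheritance is a common pattern: the description keeps its line breaks while option help still shows default values. The sketch below uses only the standard library; `RawTextDefaultsFormatter`, the `demo` program name and the `--threads` option are illustrative names, not part of PANORAMA.

```python
import argparse


class RawTextDefaultsFormatter(
    argparse.ArgumentDefaultsHelpFormatter, argparse.RawDescriptionHelpFormatter
):
    """Keeps the description layout and appends defaults to option help."""


parser = argparse.ArgumentParser(
    prog="demo",
    description="First line\n  indented second line",
    formatter_class=RawTextDefaultsFormatter,
)
parser.add_argument("--threads", type=int, default=4, help="Number of threads")

# The rendered help keeps the description's line break and lists the default.
help_text = parser.format_help()
```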

panorama.utils.add_common_arguments(subparser: ArgumentParser) None#

Add common arguments to the input subparser.

Parameters:

subparser – A subparser object from any subcommand.

panorama.utils.check_log(name: str) TextIO#

Check if the output log is writable

Parameters:

name – Path to the log output

Returns:

file object to write log

panorama.utils.check_tsv_sanity(tsv_path: Path) Dict[str, Dict[str, int | str | Path]]#

Check if the given TSV file is readable for the next PANORAMA step.

Parameters:

tsv_path (Path) – The path to the TSV file with the list of pangenomes.

Returns:

Dict[str, Dict[str, Union[int, str, Path]]] – A dictionary with pangenome name as key and a dictionary with path and taxid as values.

Raises:
  • SyntaxError – If the TSV file has less than 2 columns.

  • ValueError – If there is a line with no value in pangenome name or if the pangenome names contain spaces.

  • FileNotFoundError – If unable to locate one or more pangenomes in the TSV file.
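
The checks listed above can be sketched with the standard `csv` module. This is a minimal illustration, not the function's actual code: `check_tsv_sketch` is a hypothetical name, and the column order (name, path, optional taxid), the skipping of blank lines and the `"path"`/`"taxid"` keys are all assumptions.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Union


def check_tsv_sketch(tsv_path: Path) -> Dict[str, Dict[str, Union[str, Path, None]]]:
    """Validate a pangenome list; the column order is an assumption."""
    pangenomes: Dict[str, Dict[str, Union[str, Path, None]]] = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # assumption: blank lines are ignored
            if len(row) < 2:
                raise SyntaxError("Expected at least 2 columns: name and path")
            name, path = row[0], Path(row[1])
            if name == "" or " " in name:
                raise ValueError(f"Invalid pangenome name: {name!r}")
            if not path.exists():
                raise FileNotFoundError(f"Cannot locate pangenome file {path}")
            taxid = row[2] if len(row) > 2 else None  # assumption: optional column
            pangenomes[name] = {"path": path, "taxid": taxid}
    return pangenomes


# Usage on a throwaway file: one pangenome named "pangA" with taxid 562.
with tempfile.TemporaryDirectory() as tmp:
    pangenome_file = Path(tmp) / "pangA.h5"
    pangenome_file.touch()
    listing = Path(tmp) / "pangenomes.tsv"
    listing.write_text(f"pangA\t{pangenome_file}\t562\n")
    checked = check_tsv_sketch(listing)
```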

panorama.utils.init_lock(lock: Lock = None)#

Initialize the loading lock.

Parameters:

lock (Lock, optional) – The lock object to be assigned to loading_lock. Defaults to None.

Returns:

Lock – The lock object assigned to loading_lock.

panorama.utils.is_empty(filepath)#

Checks if a file is empty.

Parameters:

filepath (str) – The path to the file to check.

Returns:

bool – True if the file is empty, False otherwise.

panorama.utils.is_true_value(value: str | int | bool) bool#

Check if a value represents a true condition.

True conditions are: 1, “True” or True

Parameters:

value – Value to check

Returns:

True if value represents a true condition
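
A minimal sketch of the rule stated above, where only 1, "True" and True count as true. `is_true_value_sketch` is a hypothetical re-implementation; treating the string comparison as case-sensitive, and rejecting other strings such as "1", are assumptions.

```python
from typing import Union


def is_true_value_sketch(value: Union[str, int, bool]) -> bool:
    """Return True only for 1, "True" or True, as listed above."""
    if isinstance(value, str):
        return value == "True"  # assumption: the comparison is case-sensitive
    return value == 1  # True == 1 in Python, so this covers int and bool


results = [is_true_value_sketch(v) for v in (1, "True", True, 0, "False", False, "1")]
```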

panorama.utils.mkdir(output: Path, force: bool = False, erase: bool = False) Path#

Create a directory at the given path.

Parameters:
  • output (Path) – The path to the output directory

  • force (bool, optional) – If True, do not raise an exception when the directory already exists. Defaults to False

  • erase (bool, optional) – Whether to erase the directory if it already exists and force is True. Defaults to False

Returns:

Path – The path to the output directory.

Raises:
  • FileExistsError – If the directory already exists and force is False.

  • Exception – If an unexpected error occurs.
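
The interplay of `force` and `erase` can be sketched with `pathlib` and `shutil`. `mkdir_sketch` is a hypothetical stand-in for the documented function; creating missing parent directories is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def mkdir_sketch(output: Path, force: bool = False, erase: bool = False) -> Path:
    """Create `output`, following the force/erase rules described above."""
    if output.is_dir():
        if not force:
            raise FileExistsError(f"{output} already exists; pass force=True to reuse it")
        if erase:
            shutil.rmtree(output)  # drop previous content before recreating
    output.mkdir(parents=True, exist_ok=True)  # assumption: parents are created too
    return output


with tempfile.TemporaryDirectory() as tmp:
    target = mkdir_sketch(Path(tmp) / "results")
    (target / "old.txt").write_text("stale")
    try:
        mkdir_sketch(target)  # existing directory without force
        refused = False
    except FileExistsError:
        refused = True
    # force alone reuses the directory; force with erase empties it first.
    kept = (mkdir_sketch(target, force=True) / "old.txt").exists()
    erased = not (mkdir_sketch(target, force=True, erase=True) / "old.txt").exists()
```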

panorama.utils.pop_specific_action_grp(sub: ArgumentParser, title: str) _SubParsersAction#

panorama.utils.set_verbosity_level(args: Namespace) None#

Set the verbosity level

Parameters:

args – Arguments passed on the command line

Module contents#