panorama.annotate package

Submodules

panorama.annotate.annotate module

panorama.annotate.annotate.annot_pangenomes(pangenomes: Pangenomes, source: str = None, table: Path = None, hmm: Path = None, threads: int = 1, k_best_hit: int = None, lock: Lock = None, force: bool = False, disable_bar: bool = False, **hmm_kwgs: Any)

Annotate gene families with HMM or TSV files for multiple pangenomes using multiprocessing.

Parameters:
  • pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes to annotate.

  • source (str, optional) – Name of the annotation source. Defaults to None.

  • table (Path, optional) – Path to the metadata file for gene family annotation. Defaults to None.

  • hmm (Path, optional) – Path to the HMM list file. Defaults to None.

  • threads (int, optional) – Number of available threads. Defaults to 1.

  • k_best_hit (int, optional) – Number of best hits to keep. Defaults to None.

  • lock (Lock, optional) – Lock for multiprocessing. Defaults to None.

  • force (bool, optional) – Flag to allow force overwrite in pangenomes. Defaults to False.

  • disable_bar (bool, optional) – Flag to disable progress bar. Defaults to False.

  • **hmm_kwgs (Any) – Arbitrary keyword arguments for hmm alignment.

Raises:

AssertionError – If neither HMM nor TSV are provided.
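The input check documented above can be sketched as follows. `select_annotation_input` is a hypothetical helper, not part of PANORAMA; it only reproduces the documented contract (AssertionError when neither a TSV table nor an HMM list is given). Preferring the TSV table when both are supplied is an assumption made for this sketch.

```python
from pathlib import Path
from typing import Optional


def select_annotation_input(table: Optional[Path] = None, hmm: Optional[Path] = None) -> str:
    """Hypothetical helper: pick the annotation input to use.

    Raises AssertionError when neither input is given, as documented for
    annot_pangenomes. Preferring the TSV table over the HMM list when both
    are given is an assumption of this sketch.
    """
    if table is None and hmm is None:
        # Explicit raise (rather than `assert`) so the check survives `python -O`.
        raise AssertionError("Either a TSV table or an HMM list must be provided.")
    return "table" if table is not None else "hmm"
```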

panorama.annotate.annotate.annot_pangenomes_with_hmm(pangenomes: Pangenomes, hmm: Path = None, source: str = '', mode: str = 'fast', threads: int = 1, disable_bar: bool = False, **hmm_kwgs) → Dict[str, DataFrame]

Main function to annotate the gene families of multiple pangenomes with HMM.

Parameters:
  • pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes to annotate.

  • hmm (Path, optional) – Path to the HMM list file. Defaults to None.

  • source (str, optional) – Name of the annotation source. Defaults to “”.

  • mode (str, optional) – Which mode to use to annotate gene families with HMM. Defaults to “fast”.

  • threads (int, optional) – Number of available threads. Defaults to 1.

  • disable_bar (bool, optional) – Flag to disable progress bar. Defaults to False.

  • **hmm_kwgs – Arbitrary keyword arguments for HMM annotation.

Returns:

Dict[str, pd.DataFrame] – Dictionary with, for each pangenome, a dataframe containing the gene family metadata given by HMM.

panorama.annotate.annotate.check_annotate_args(args: Namespace, silence_warning: bool = False) → Tuple[Dict[str, Any], Dict[str, Any]]

Checks the provided arguments to ensure that they are valid.

Parameters:
  • args (argparse.Namespace) – The parsed arguments.

  • silence_warning (bool, optional) – Flag to silence warning messages. Defaults to False. This option is used by the pansystems workflow to avoid unwanted warnings.

Returns:

Tuple[Dict[str, Any], Dict[str, Any]] – Two dictionaries containing the necessary information and the HMM keyword arguments.

Raises:

argparse.ArgumentError – If any required arguments are missing or invalid.
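A minimal sketch of this kind of validation with `argparse` is shown below. The attribute names (`source`, `table`, `hmm`, `mode`) mirror keyword parameters documented on this page, but the function `split_annotate_args` and the exact contents of the two returned dictionaries are assumptions, not PANORAMA's implementation.

```python
import argparse
from typing import Any, Dict, Tuple


def split_annotate_args(args: argparse.Namespace) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical sketch: validate parsed arguments and split them into two dicts."""
    table = getattr(args, "table", None)
    hmm = getattr(args, "hmm", None)
    if table is None and hmm is None:
        # argparse.ArgumentError accepts None when no single action is at fault.
        raise argparse.ArgumentError(None, "Either a TSV table or an HMM list must be provided.")
    need_info = {"source": getattr(args, "source", None), "table": table, "hmm": hmm}
    # HMM-specific options are only forwarded when an HMM list is used.
    hmm_kwgs: Dict[str, Any] = {}
    if hmm is not None:
        hmm_kwgs["mode"] = getattr(args, "mode", "fast")
    return need_info, hmm_kwgs
```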

panorama.annotate.annotate.check_pangenome_annotation(pangenome: Pangenome, source: str, force: bool = False)

Check pangenome information before adding annotation.

Parameters:
  • pangenome (Pangenome) – Pangenome object that will be checked.

  • source (str) – Source of annotation to check if already in pangenome.

  • force (bool, optional) – Flag to allow overwriting/erasing annotation. Defaults to False.

Raises:

KeyError – If a source with the same name already exists and force is False.
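The check can be pictured with the sketch below, where a plain set of source names stands in for the annotation sources already stored in a pangenome. The function `check_source` and the set-based representation are hypothetical.

```python
from typing import Set


def check_source(existing_sources: Set[str], source: str, force: bool = False) -> None:
    """Hypothetical stand-in for check_pangenome_annotation.

    `existing_sources` represents the annotation sources already present in the
    pangenome. A name clash is only an error when `force` is False.
    """
    if source in existing_sources and not force:
        raise KeyError(
            f"An annotation source named '{source}' already exists in the pangenome. "
            "Use force to overwrite it."
        )
```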

panorama.annotate.annotate.get_k_best_hit(group, k_best_hit: int) → DataFrame

Get the k best hits for a given group in a dataframe.

Parameters:
  • group – Dataframe group.

  • k_best_hit (int) – Number of best hits to keep.

Returns:

pd.DataFrame – K best hits per group.
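For a single group, keeping the k best hits amounts to a partial sort. The sketch below uses `heapq.nsmallest` on a list of dictionaries instead of a pandas group; ranking by decreasing `score` with ties broken by increasing `e_value` is an assumption, as are the column names.

```python
import heapq
from typing import Any, Dict, List


def k_best_hits(group: List[Dict[str, Any]], k_best_hit: int) -> List[Dict[str, Any]]:
    """Hypothetical sketch: return the k best hits of one group, best first.

    Assumption: a higher score is better; ties are broken by a lower e-value.
    """
    # Negating the score turns "largest score" into "smallest key".
    return heapq.nsmallest(k_best_hit, group, key=lambda hit: (-hit["score"], hit["e_value"]))
```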

panorama.annotate.annotate.keep_best_hit(metadata: DataFrame, k_best_hit: int) → DataFrame

Keep the k best hits in the given metadata.

Parameters:
  • metadata (pd.DataFrame) – Metadata dataframe with multiple annotations for gene families.

  • k_best_hit (int) – Number of best hits to keep.

Returns:

pd.DataFrame – Filtered metadata dataframe with only the k best hits.
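Applied to a whole metadata table, the same idea is a group-by on the gene family followed by a per-group truncation. The sketch below does it with the standard library; the column names `families` and `score` and the score-based ranking are assumptions. With pandas, a comparable result is usually obtained with `sort_values` followed by `groupby(...).head(k)`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def keep_k_best_per_family(rows: List[Dict[str, Any]], k_best_hit: int) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep at most k hits per gene family, highest score first."""
    by_family: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_family[row["families"]].append(row)
    kept: List[Dict[str, Any]] = []
    # Families are emitted in order of first appearance.
    for family_rows in by_family.values():
        family_rows.sort(key=lambda r: r["score"], reverse=True)
        kept.extend(family_rows[:k_best_hit])
    return kept
```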

panorama.annotate.annotate.launch(args: Namespace) → None

Launch the functions to annotate pangenomes.

Parameters:

args (argparse.Namespace) – Arguments given on the command line.

panorama.annotate.annotate.parser_annot(parser)

Add arguments to the parser for the annot command.

Parameters:

parser – Parser for the annot arguments.

panorama.annotate.annotate.parser_annot_hmm(parser)

Add arguments to the parser for HMM annotation.

Parameters:

parser – Parser for the annot arguments.

panorama.annotate.annotate.read_families_metadata(pangenome: Pangenome, metadata: Path) → Tuple[DataFrame, str]

Read gene families metadata for one pangenome.

Parameters:
  • pangenome (Pangenome) – Pangenome object for which metadata will be associated.

  • metadata (Path) – Path to the file containing the metadata to add to the pangenome.

Returns:

Tuple[pd.DataFrame, str] – The metadata dataframe and the name of the pangenome.
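Reading such a metadata table can be sketched with the standard `csv` module. The tab-separated layout with a header line, the column names in the example, and the function `read_metadata_rows` are assumptions for illustration; the sketch takes any iterable of lines so it works with an open file handle as well as an in-memory list.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_metadata_rows(pangenome_name: str, lines: Iterable[str]) -> Tuple[List[Dict[str, str]], str]:
    """Hypothetical sketch: parse a tab-separated metadata table with a header line.

    Returns the parsed rows together with the pangenome name, mirroring the
    (metadata, name) pair returned by read_families_metadata.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return list(reader), pangenome_name
```

For a file on disk, pass an open handle, for example `with open(path, newline="") as handle: rows, name = read_metadata_rows("pangenome_1", handle)`.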

panorama.annotate.annotate.read_families_metadata_mp(pangenomes: Pangenomes, table: Path, threads: int = 1, lock: Lock = None, disable_bar: bool = False) → Dict[str, DataFrame]

Read gene families metadata for multiple pangenomes using multiprocessing.

Parameters:
  • pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes to annotate.

  • table (Path) – Path to the metadata file for gene families.

  • threads (int, optional) – Number of available threads. Defaults to 1.

  • lock (Lock, optional) – Lock for multiprocessing execution. Defaults to None.

  • disable_bar (bool, optional) – Flag to disable progress bar. Defaults to False.

Returns:

Dict[str, pd.DataFrame] – Dictionary with the metadata linked to pangenome by its name.
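Fanning the per-pangenome reads out to a worker pool and collecting the results by name can be sketched with `concurrent.futures`. Using a thread pool rather than processes, and passing the reader as a callable, are choices made for this sketch; `read_all_metadata` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def read_all_metadata(
    names: Iterable[str],
    reader: Callable[[str], Tuple[T, str]],
    threads: int = 1,
) -> Dict[str, T]:
    """Hypothetical sketch: run `reader` for each pangenome name in parallel.

    `reader` returns a (metadata, pangenome_name) pair, like read_families_metadata,
    and the results are keyed by pangenome name.
    """
    results: Dict[str, T] = {}
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # executor.map yields results in input order, whatever the completion order.
        for metadata, name in executor.map(reader, names):
            results[name] = metadata
    return results
```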

panorama.annotate.annotate.remove_redundant_annotation(metadata: DataFrame) → DataFrame

Remove redundant annotation based on score, e-value, and bias.

Parameters:

metadata (pd.DataFrame) – Metadata dataframe containing annotations.

Returns:

pd.DataFrame – Dataframe with redundant annotations removed.
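One way to read this rule: among rows that describe the same annotation of the same gene family, keep the one with the highest score, then the lowest e-value, then the lowest bias. The sketch below implements that reading; treating rows as redundant when they share the same `families` and `protein_name` values is an assumption, as are the column names.

```python
from typing import Any, Dict, List, Tuple


def remove_redundant(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: deduplicate annotations, keeping the best-ranked row."""

    def rank(row: Dict[str, Any]) -> Tuple[float, float, float]:
        # Smaller tuple is better: high score, then low e-value, then low bias.
        return (-row["score"], row["e_value"], row["bias"])

    best: Dict[Tuple[Any, Any], Dict[str, Any]] = {}
    for row in rows:
        key = (row["families"], row["protein_name"])
        if key not in best or rank(row) < rank(best[key]):
            best[key] = row
    return list(best.values())
```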

panorama.annotate.annotate.subparser(sub_parser) → ArgumentParser

Subparser to launch the PANORAMA command line.

Parameters:

sub_parser – Subparser for the annot command.

Returns:

argparse.ArgumentParser – Parser with the arguments for the annot command.
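The way `subparser` and `parser_annot` fit together follows the usual `argparse` sub-command pattern, sketched below. The program name and every option name here are hypothetical; they mirror the keyword parameters of `annot_pangenomes` and are not PANORAMA's actual command-line flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of a main parser with an `annot` sub-command."""
    main_parser = argparse.ArgumentParser(prog="panorama")
    sub_parsers = main_parser.add_subparsers(dest="subcommand")
    # Counterpart of subparser(): register the "annot" command.
    annot = sub_parsers.add_parser("annot", description="Annotate gene families of pangenomes.")
    # Counterpart of parser_annot(): add the command's options (names are hypothetical).
    annot.add_argument("--source", required=True, help="Name of the annotation source.")
    annot.add_argument("--table", default=None, help="TSV file with gene family metadata.")
    annot.add_argument("--hmm", default=None, help="HMM list file.")
    annot.add_argument("--threads", type=int, default=1, help="Number of available threads.")
    annot.add_argument("--force", action="store_true", help="Allow overwriting an existing source.")
    return main_parser
```

The resulting `argparse.Namespace` is the kind of object that `launch` and `check_annotate_args` take as input.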

panorama.annotate.annotate.write_annotations_to_pangenome(pangenome: Pangenome, metadata: DataFrame, source: str, k_best_hit: int = None, force: bool = False, disable_bar: bool = False)

Write gene families annotation for one pangenome.

Parameters:
  • pangenome (Pangenome) – Pangenome linked to metadata.

  • metadata (pd.DataFrame) – Metadata dataframe.

  • source (str) – Metadata source.

  • k_best_hit (int, optional) – Number of best hits to keep. Defaults to None.

  • force (bool, optional) – Flag to allow force writing in the pangenome. Defaults to False.

  • disable_bar (bool, optional) – Allow disabling the progress bar. Defaults to False.

panorama.annotate.annotate.write_annotations_to_pangenomes(pangenomes: Pangenomes, pangenomes2metadata: Dict[str, DataFrame], source: str, k_best_hit: int = None, threads: int = 1, lock: Lock = None, force: bool = False, disable_bar: bool = False)

Write gene families annotation for multiple pangenomes using multiprocessing.

Parameters:
  • pangenomes (Pangenomes) – Pangenomes object containing all the pangenomes to annotate.

  • pangenomes2metadata (Dict[str, pd.DataFrame]) – Dictionary with, for each pangenome, the associated metadata dataframe.

  • source (str) – Metadata source.

  • k_best_hit (int, optional) – Number of best hits to keep. Defaults to None.

  • threads (int, optional) – Number of available threads. Defaults to 1.

  • lock (Lock, optional) – Lock for multiprocessing execution. Defaults to None.

  • force (bool, optional) – Flag to allow force writing in pangenomes. Defaults to False.

  • disable_bar (bool, optional) – Allow disabling the progress bar. Defaults to False.

Module contents