Systems Comparison Across Pangenomes
The compare_systems command identifies biological systems that are conserved across multiple pangenomes by comparing the
gene family composition of their systems and computing similarity metrics.
This analysis builds on systems previously detected in each individual pangenome and uses Gene
Family Relatedness Relationship (GFRR) metrics to identify systems that are conserved across different bacterial
populations. It generates heatmaps of system distribution across pangenomes and graphs of conserved
system clusters.
Systems Comparison Workflow
The systems comparison process runs as follows:
1. Load and Validate Pangenomes
   - The pangenomes (.h5 files) listed in the input .tsv file are loaded.
   - Each pangenome is checked to ensure that systems have been detected for the specified sources.
2. Create Systems
   - All systems from all pangenomes are represented as nodes in a single NetworkX graph.
   - Each system is characterized by its gene families and its model families, which are used to assess similarity.
3. Compute GFRR-based Edges
   For each pair of systems from different pangenomes:
   - The model gene families are compared using GFRR metrics.
   - If the model-family GFRR values pass the --gfrr_models_cutoff thresholds, all gene families of the two systems are compared.
   - An edge is added between the two systems if the all-family GFRR values also pass both --gfrr_cutoff thresholds (min_gfrr and max_gfrr).
4. Cluster Conserved Systems
   - The Louvain community detection algorithm groups similar systems, using the GFRR metric selected with --gfrr_metrics. Each group represents a biological system conserved across pangenomes.
5. Generate Visualizations
   - With --heatmap, heatmaps of system distribution across pangenomes are written as HTML files for interactive exploration.
6. Write Results to Files
   - Conserved systems are saved as graph files (GEXF, GraphML) and summary tables for further analysis and visualization.
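The "Compute GFRR-based Edges" step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not PANORAMA's implementation: the `System` record, the `gfrr` and `build_edges` helpers, and the system names are hypothetical; families are assumed to be already expressed as identifiers shared across pangenomes (for example, cluster names from the gene family clustering step); and min_gfrr/max_gfrr are assumed to be the smaller/larger of the two directional overlap ratios. Defaults follow the documented --gfrr_models_cutoff (0.4 0.6) and --gfrr_cutoff (0.5 0.8).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class System:
    """Hypothetical minimal record for one detected system."""
    name: str
    pangenome: str
    families: frozenset        # all gene families of the system
    model_families: frozenset  # families matched by the system model


def gfrr(a: frozenset, b: frozenset) -> tuple:
    """Return (min_gfrr, max_gfrr) under the assumed definition."""
    if not a or not b:
        return 0.0, 0.0
    shared = len(a & b)
    # Smaller directional ratio (conservative), larger directional ratio (liberal)
    return shared / max(len(a), len(b)), shared / min(len(a), len(b))


def build_edges(systems, models_cutoff=(0.4, 0.6), cutoff=(0.5, 0.8)):
    """Yield (name1, name2, attributes) for pairs passing both filters."""
    for s1, s2 in combinations(systems, 2):
        if s1.pangenome == s2.pangenome:
            continue  # only systems from different pangenomes are compared
        m_min, m_max = gfrr(s1.model_families, s2.model_families)
        if m_min < models_cutoff[0] or m_max < models_cutoff[1]:
            continue  # fails the model-family filter: skip the full comparison
        a_min, a_max = gfrr(s1.families, s2.families)
        if a_min >= cutoff[0] and a_max >= cutoff[1]:
            yield s1.name, s2.name, {
                "min_gfrr_models": m_min, "max_gfrr_models": m_max,
                "min_gfrr": a_min, "max_gfrr": a_max,
                "shared_families": len(s1.families & s2.families),
            }


systems = [
    System("CBASS_1", "pgA", frozenset("abcde"), frozenset("ab")),
    System("CBASS_2", "pgB", frozenset("abcdf"), frozenset("ab")),
    System("Gabija_1", "pgA", frozenset("xyz"), frozenset("xy")),
]
for edge in build_edges(systems):
    print(edge)
```

Here only CBASS_1 and CBASS_2 are linked: Gabija_1 shares a pangenome with CBASS_1 and no model family with CBASS_2. The yielded `(u, v, attributes)` tuples have the shape accepted by NetworkX's `Graph.add_edges_from`, so they could populate a graph like the one described above.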
Systems Comparison Command Line Usage
Basic systems comparison with heatmap generation:
```shell
panorama compare_systems \
    --pangenomes pangenomes.tsv \
    --models defense_systems.tsv \
    --sources defense_finder \
    --output systems_comparison_results \
    --heatmap \
    --threads 8
```
Full analysis with conserved systems clustering:
```shell
panorama compare_systems \
    --pangenomes pangenomes.tsv \
    --models defense_systems.tsv cas_systems.tsv \
    --sources defense_finder CasFinder \
    --output systems_comparison_results \
    --heatmap \
    --gfrr_metrics min_gfrr_models \
    --gfrr_cutoff 0.8 0.8 \
    --gfrr_models_cutoff 0.2 0.2 \
    --graph_formats gexf graphml \
    --threads 8
```
Key Options
| Shortcut | Argument | Type | Optional | Description |
|---|---|---|---|---|
| -p | --pangenomes | str (file path) | False | TSV file listing the .h5 pangenomes with detected systems |
| -m | --models | List[str] (file paths) | False | Path(s) to the system model files (must match the --sources order) |
| -s | --sources | List[str] | False | Name(s) of the system sources (must match the --models order) |
| -o | --output | str (directory path) | False | Output directory for the comparison results |
| | --gfrr_cutoff | List[float] (2 values) | True | Thresholds for min_gfrr and max_gfrr computed on all gene families (default: 0.5 0.8) |
| | --seed | int | True | Random seed to guarantee reproducibility (default: 42) |
| | --heatmap | bool (flag) | True | Generate heatmaps showing system distribution across pangenomes |
| | --gfrr_metrics | str (choice) | True | GFRR metric used to cluster conserved systems (min_gfrr_models, max_gfrr_models, min_gfrr, max_gfrr) |
| | --gfrr_models_cutoff | List[float] (2 values) | True | Thresholds for min_gfrr and max_gfrr computed on model gene families (default: 0.4 0.6) |
| | --graph_formats | List[str] | True | Graph formats to export: gexf, graphml |
| | --canonical | bool (flag) | True | Include canonical versions of systems in the analysis |
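--models and --sources are paired by position: the i-th model file belongs to the i-th source, and each cutoff option takes exactly two values. A minimal sketch of those two rules, assuming nothing beyond the table above; `pair_models_with_sources` and `check_cutoff` are hypothetical helpers, not part of PANORAMA.

```python
def pair_models_with_sources(models: list, sources: list) -> dict:
    """Map each source name to its model file, position by position."""
    if len(models) != len(sources):
        raise ValueError(
            f"{len(models)} model file(s) given for {len(sources)} source(s); "
            "--models and --sources must have the same length and order"
        )
    return dict(zip(sources, models))


def check_cutoff(values: list, option: str) -> tuple:
    """Check that a cutoff option received exactly two values (min, max)."""
    if len(values) != 2:
        raise ValueError(f"{option} expects 2 values, got {len(values)}")
    return values[0], values[1]


pairs = pair_models_with_sources(
    ["defense_systems.tsv", "cas_systems.tsv"], ["defense_finder", "CasFinder"]
)
print(pairs)
# {'defense_finder': 'defense_systems.tsv', 'CasFinder': 'cas_systems.tsv'}
print(check_cutoff([0.8, 0.8], "--gfrr_cutoff"))
# (0.8, 0.8)
```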
Advanced Configuration Arguments
| Shortcut | Argument | Type | Optional | Description |
|---|---|---|---|---|
| | --cluster | str (file path) | True | Tab-separated file with pre-computed clustering results (cluster_name\tfamily_id format) |
| | --method | str (choice) | True | MMseqs2 clustering method: linclust or cluster (default: linclust) |
| | --tmpdir | str (directory path) | True | Directory for temporary files (default: /tmp) |
| | --keep_tmp | bool (flag) | True | Keep temporary files after completion |
| -c | --cpus | int | True | Number of CPU threads for parallel processing (default: 1) |
| | --verbose | int (choice) | True | Verbosity level: 0 (warnings/errors), 1 (info), 2 (debug) (default: 1) |
| | --log | str (file path) | True | Log output file (default: stdout) |
| -d | --disable_prog_bar | bool (flag) | True | Disable the progress bars |
| | --force | bool (flag) | True | Force writing to the output directory and pangenome files |
Note
PANORAMA can run the gene family clustering step itself, but you can also provide pre-computed clustering results with
the --cluster argument.
If you let PANORAMA perform the clustering, see the Clustering section for more
details about its options.
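A pre-computed --cluster file is described as two tab-separated columns, cluster_name and family_id. The sketch below reads such a file into a mapping from cluster to family IDs; it assumes no header line, and the function name and example IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_cluster_file(handle) -> dict:
    """Group family IDs by cluster name from a cluster_name<TAB>family_id file."""
    clusters = defaultdict(set)
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"line {line_number}: expected 2 tab-separated columns")
        cluster_name, family_id = row[0], row[1]
        clusters[cluster_name].add(family_id)
    return dict(clusters)


# In-memory example; for a real file use: open("clusters.tsv", newline="")
example = io.StringIO("clu1\tpgA_fam12\nclu1\tpgB_fam7\nclu2\tpgA_fam3\n")
print(read_cluster_file(example))
```

In this example, clu1 groups two families from different pangenomes, which is what allows their systems to share a family when GFRR values are computed.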
GFRR Metrics for Systems
| Metric | Target Families | Description |
|---|---|---|
| min_gfrr_models | Model families only | Conservative metric using core functional families |
| max_gfrr_models | Model families only | Liberal metric using core functional families |
| min_gfrr | All families | Conservative metric using the complete gene repertoire |
| max_gfrr | All families | Liberal metric using the complete gene repertoire |
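For two systems with gene-family sets A and B, the metrics can be illustrated with the definitions below. These formulas are an assumption chosen to match the conservative/liberal labels in the table (min_gfrr as the smaller of the two overlap ratios |A ∩ B|/|A| and |A ∩ B|/|B|, max_gfrr as the larger), not definitions taken from PANORAMA's code. The _models variants would apply the same formulas to model families only.

```latex
\text{min\_gfrr}(A, B) = \frac{|A \cap B|}{\max(|A|, |B|)}
\qquad
\text{max\_gfrr}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}

% Worked example: |A| = 4, |B| = 5, |A \cap B| = 3
\text{min\_gfrr} = \frac{3}{5} = 0.6
\qquad
\text{max\_gfrr} = \frac{3}{4} = 0.75
```

Under these definitions min_gfrr never exceeds max_gfrr, which fits the first value of each cutoff pair being the lower one (for example, the default --gfrr_cutoff 0.5 0.8). The example pair would pass --gfrr_cutoff 0.5 0.7 but not the default, since 0.75 is below 0.8.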
Cutoff Configuration
The two cutoff options filter system pairs hierarchically:
| Filtering Stage | Cutoffs | Purpose |
|---|---|---|
| Model families | --gfrr_models_cutoff | Primary filter using core functional genes; pairs that fail it are not compared further |
| All families | --gfrr_cutoff | Secondary filter using the complete gene repertoire |
Recommended settings
- Strict: --gfrr_models_cutoff 0.5 0.5, --gfrr_cutoff 0.8 0.8
- Moderate: --gfrr_models_cutoff 0.3 0.3, --gfrr_cutoff 0.6 0.7
- Permissive: --gfrr_models_cutoff 0.2 0.2, --gfrr_cutoff 0.4 0.5
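When compare_systems is driven from a script, the recommended settings above can be kept as named presets and expanded into arguments. A minimal sketch; the preset names and the `preset_args` helper are hypothetical, while the two flags are the documented ones.

```python
# Recommended presets: (model-family cutoffs, all-family cutoffs)
PRESETS = {
    "strict": ((0.5, 0.5), (0.8, 0.8)),
    "moderate": ((0.3, 0.3), (0.6, 0.7)),
    "permissive": ((0.2, 0.2), (0.4, 0.5)),
}


def preset_args(name: str) -> list:
    """Return the compare_systems cutoff arguments for a named preset."""
    models_cutoff, cutoff = PRESETS[name]
    return [
        "--gfrr_models_cutoff", *map(str, models_cutoff),
        "--gfrr_cutoff", *map(str, cutoff),
    ]


print(" ".join(preset_args("moderate")))
# --gfrr_models_cutoff 0.3 0.3 --gfrr_cutoff 0.6 0.7
```

The returned list can be appended to the argument list of a `subprocess.run(["panorama", "compare_systems", ...])` call, which avoids shell quoting issues.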
Output
PANORAMA generates several outputs for systems analysis: interactive heatmaps, network graphs, and summary tables.
File Organization
```text
output_directory/
├── heatmap_number_systems.html
├── heatmap_normalized_systems.html
├── conserved_systems.gexf (optional)
└── conserved_systems.graphml (optional)
```
File Descriptions
Heatmap Visualizations
Interactive HTML heatmaps showing system distribution patterns:
| File | Description |
|---|---|
| heatmap_number_systems.html | Raw counts of each system type per pangenome |
| heatmap_normalized_systems.html | Normalized percentages showing relative abundance |
[PLACEHOLDER: Heatmap showing system distribution across multiple pangenomes]
[PLACEHOLDER: Normalized heatmap showing relative system abundance patterns]
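The relationship between the two heatmaps can be illustrated by converting raw counts into percentages. This sketch assumes the normalization is done per pangenome (each count divided by that pangenome's total number of systems); the axis PANORAMA actually normalizes over is not stated here, and the system and pangenome names are hypothetical.

```python
def normalize_counts(counts: dict) -> dict:
    """Convert per-pangenome system counts into percentages of that pangenome's total."""
    normalized = {}
    for pangenome, per_system in counts.items():
        total = sum(per_system.values())
        normalized[pangenome] = {
            system: (100 * n / total if total else 0.0)
            for system, n in per_system.items()
        }
    return normalized


counts = {
    "pgA": {"CBASS": 3, "Gabija": 1},
    "pgB": {"CBASS": 2, "Gabija": 2},
}
print(normalize_counts(counts))
# {'pgA': {'CBASS': 75.0, 'Gabija': 25.0}, 'pgB': {'CBASS': 50.0, 'Gabija': 50.0}}
```

Normalizing per pangenome makes pangenomes with very different numbers of detected systems comparable, at the cost of hiding absolute counts, which the raw-count heatmap keeps.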
Conserved System Clustering
Network Graphs
When both --gfrr_metrics and --graph_formats are specified, PANORAMA writes network graphs of the
conserved system clusters to conserved_systems.gexf and/or conserved_systems.graphml, depending on the requested formats:
- Node attributes include system metadata, pangenome information, and cluster assignments.
- Edge attributes contain GFRR similarity scores and the number of shared gene families.
[PLACEHOLDER: Network graph of conserved systems clusters with different colors]
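The GraphML output can be inspected without extra dependencies. If NetworkX is installed, `networkx.read_graphml` loads the file directly; the standard-library sketch below instead groups node IDs by one node attribute. The attribute name `cluster` and the inline example graph are hypothetical: check the `<key>` elements of your own conserved_systems.graphml for the actual attribute names.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def nodes_by_attribute(source, attribute: str) -> dict:
    """Group GraphML node IDs by the value of one node attribute."""
    root = ET.parse(source).getroot()  # source: file path or file object
    # GraphML declares attributes in <key> elements and refers to them by id
    key_ids = {
        key.get("id")
        for key in root.findall("g:key", NS)
        if key.get("attr.name") == attribute and key.get("for") in ("node", "all")
    }
    if not key_ids:
        raise KeyError(f"no node attribute named {attribute!r}")
    groups = defaultdict(list)
    for node in root.findall("g:graph/g:node", NS):
        for data in node.findall("g:data", NS):
            if data.get("key") in key_ids:
                groups[data.text].append(node.get("id"))
    return dict(groups)


EXAMPLE = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<key id="d0" for="node" attr.name="cluster" attr.type="long"/>'
    '<graph edgedefault="undirected">'
    '<node id="CBASS_1"><data key="d0">0</data></node>'
    '<node id="CBASS_2"><data key="d0">0</data></node>'
    '<node id="Gabija_1"><data key="d0">1</data></node>'
    "</graph></graphml>"
)
print(nodes_by_attribute(io.StringIO(EXAMPLE), "cluster"))
# {'0': ['CBASS_1', 'CBASS_2'], '1': ['Gabija_1']}
```

For a real run, pass the file path instead, e.g. `nodes_by_attribute("systems_comparison_results/conserved_systems.graphml", "cluster")`, adjusting the attribute name as needed.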