System Association with Pangenome Elements#
The write_systems command can associate detected systems with other pangenome elements: RGPs (Regions of Genomic
Plasticity), spots, and modules. The association analysis produces correlation matrices and visualizations that
describe the relationships between systems and these pangenome components.
Association Workflow#
The association process creates comprehensive correlation matrices and coverage analyses:
1. Load Systems and Elements#
Detected systems from the .h5 pangenome file are loaded
Target pangenome elements (RGPs, spots, modules) are retrieved
System-to-element mappings are established for correlation analysis
2. Build Association Matrices#
For each association type, the workflow:
Extract Element Relationships: Map systems to their associated RGPs, spots, or modules
Create Correlation Matrix: Generate system × element matrices showing co-occurrence patterns
Calculate Coverage: Compute how well each element is covered by associated systems
Compute Frequencies: Determine how frequently each element occurs in the organisms that also carry the systems
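The matrix-building step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PANORAMA's implementation: the `system_to_elements` mapping, the system names, and the spot names are hypothetical, and each cell simply counts how many detected systems with a given name are associated with an element.

```python
from collections import defaultdict

# Hypothetical input: (system ID, system name) -> associated elements (here, spots).
# This layout is an assumption for illustration, not PANORAMA's data model.
system_to_elements = {
    ("1", "CBASS"): {"spot_3", "spot_7"},
    ("2", "CBASS"): {"spot_3"},
    ("3", "RM_type_I"): {"spot_7"},
}


def build_matrix(mapping):
    """Return (system names, element names, count matrix as nested lists)."""
    counts = defaultdict(lambda: defaultdict(int))
    for (_sys_id, sys_name), elements in mapping.items():
        for element in elements:
            counts[sys_name][element] += 1
    rows = sorted(counts)
    cols = sorted({e for per_system in counts.values() for e in per_system})
    # Cells with no association are filled with 0.
    matrix = [[counts[r].get(c, 0) for c in cols] for r in rows]
    return rows, cols, matrix


rows, cols, matrix = build_matrix(system_to_elements)
for name, row in zip(rows, matrix):
    print(name, dict(zip(cols, row)))
```

Rows correspond to system names and columns to elements, which matches the system × element layout of the heatmap.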
3. Generate Visualizations#
The association analysis produces interactive correlation matrix heatmaps with:
Main Heatmap: Color-coded correlation matrix showing system-element associations
Coverage Plot: Visual representation of how systems cover pangenome elements
Frequency Plot: Display of element frequencies across genomes
Bar Charts: Summary statistics for systems and elements
Color Bars: Legend and scaling information
Association Command Line Usage#
Basic Association Analysis#
panorama write_systems \
--pangenomes pangenomes.tsv \
--models models.tsv \
--sources defense_finder \
--association RGPs \
--threads 8 \
--output results/
Multiple Association Types#
# Analyze RGPs, spots, and modules in a single run
panorama write_systems \
--pangenomes pangenomes.tsv \
--models models.tsv \
--sources defense_finder \
--association all \
--threads 16 \
--output results/
Combined with Other Analyses#
# --projection: also project systems onto genomes
# --association spots modules: associate with spots and modules
# --partition: write partition heatmaps
# --canonical: include canonical versions
panorama write_systems \
--pangenomes pangenomes.tsv \
--models models.tsv \
--sources defense_finder immune_system \
--projection \
--association spots modules \
--partition \
--canonical \
--threads 16 \
--output results/
For more information on the other analysis options, see the documentation about projection.
Association command line arguments#
Association-Specific key#
| Argument | Type | Choices | Description |
|---|---|---|---|
| --association | list | RGPs, spots, modules, all | Pangenome elements to associate with systems |
Association Types:
RGPs: Regions of Genomic Plasticity - variable genomic regions
spots: Hotspots of genome evolution - frequently variable loci
modules: Functional modules - co-evolving gene clusters
all: Analyze all three association types
Required Arguments#
| Argument | Type | Description |
|---|---|---|
| --pangenomes | Path | TSV file listing pangenome .h5 files to process |
| --output | Path | Output directory for the association results |
| --models | Path | Path(s) to model list files |
| --sources | str | Name(s) of the systems sources |
Optional Arguments#
| Argument | Type | Default | Description |
|---|---|---|---|
| | list | ['html'] | Output format(s) for the visualizations |
| --threads | int | 1 | Number of parallel threads to use |
| | flag | False | Overwrite existing output files |
Output Format Options
The visualization outputs can be customized (currently the HTML format is default):
HTML: Interactive Bokeh plots (default)
PNG: Static high-resolution images (if requested)
Association Output Files#
Association analysis creates organized output in the specified --output directory:
output/
├── pangenome_1/
│   └── source_1/
│       ├── association.tsv             # Main association matrix
│       ├── correlation_RGPs.html       # RGP correlation visualization
│       ├── correlation_RGPs.png        # RGP correlation visualization (PNG)
│       ├── correlation_spots.html      # Spot correlation visualization
│       ├── correlation_spots.png       # Spot correlation visualization (PNG)
│       ├── correlation_modules.html    # Module correlation visualization
│       ├── correlation_modules.png     # Module correlation visualization (PNG)
│       ├── rgp_to_systems.tsv          # RGP-system mappings
│       ├── spot_to_systems.tsv         # Spot-system mappings
│       └── module_to_systems.tsv       # Module-system mappings
└── pangenome_2/
    └── source_1/
        └── ...
1. Main Association File (association.tsv)#
Primary file containing system-element associations:
| Column | Description |
|---|---|
| system number | Unique numeric ID for the system (index) |
| system name | Name of the system from the model |
| families | Comma-separated list of gene families in the system |
| RGPs | Comma-separated list of associated RGP names (if …) |
| spots | Comma-separated list of associated spot IDs (if …) |
| modules | Comma-separated list of associated module IDs (if …) |
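As a rough sketch of how this file could be consumed downstream, the snippet below parses an association table with the standard library. It assumes a tab-separated file with a header row and the column names listed above; the sample rows, system names, and family names are invented for illustration, and the helper functions are hypothetical rather than part of PANORAMA.

```python
import csv
import io

# Illustrative content only; real files are written by write_systems.
sample = (
    "system number\tsystem name\tfamilies\tspots\n"
    "1\tCBASS\tfam_12,fam_48\tspot_3\n"
    "2\tRM_type_I\tfam_5,fam_9,fam_77\tspot_3,spot_7\n"
)


def split_list(cell):
    """Split a comma-separated cell into a list, ignoring empty entries."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def read_associations(handle, list_columns=("families", "RGPs", "spots", "modules")):
    """Parse association rows, expanding the list columns that are present."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in list_columns:
            if column in row:  # element columns may be absent from a given file
                row[column] = split_list(row[column] or "")
        records.append(row)
    return records


records = read_associations(io.StringIO(sample))
for record in records:
    print(record["system name"], record["spots"])
```

To read a real file, replace `io.StringIO(sample)` with an open file handle, for example `open(path, newline="")`.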
2. Element-to-Systems Mapping Files#
RGP Mapping (rgp_to_systems.tsv)#
| Column | Description |
|---|---|
| name | RGP identifier (index) |
| systems_ID | Comma-separated list of associated system IDs |
| systems_name | Comma-separated list of associated system names |
| coverage | Proportion of RGP families covered by systems |
| frequency | Frequency of RGP occurrence across organisms |
Spot Mapping (spot_to_systems.tsv)#
| Column | Description |
|---|---|
| name | Spot identifier (index, format: …) |
| systems_ID | Comma-separated list of associated system IDs |
| systems_name | Comma-separated list of associated system names |
| coverage | Proportion of spot families covered by systems |
| frequency | Frequency of spot occurrence across organisms |
Module Mapping (module_to_systems.tsv)#
| Column | Description |
|---|---|
| name | Module identifier (index, format: …) |
| systems_ID | Comma-separated list of associated system IDs |
| systems_name | Comma-separated list of associated system names |
| coverage | Proportion of module families covered by systems |
| frequency | Frequency of module occurrence across organisms |
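The three mapping files share the same columns, so one small reader can serve all of them. The sketch below keeps only the elements whose coverage reaches a threshold. It assumes a tab-separated file with the header shown above; the sample rows, the `module_4`/`module_9` names, and the `well_covered` helper with its 0.8 threshold are hypothetical.

```python
import csv
import io

# Illustrative content only; element names and values are invented.
sample = (
    "name\tsystems_ID\tsystems_name\tcoverage\tfrequency\n"
    "module_4\t1,2\tCBASS,CBASS\t1.0\t0.25\n"
    "module_9\t3\tRM_type_I\t0.4\t0.9\n"
)


def well_covered(handle, min_coverage=0.8):
    """Yield (name, coverage, frequency) for elements at or above min_coverage."""
    for row in csv.DictReader(handle, delimiter="\t"):
        coverage = float(row["coverage"])
        if coverage >= min_coverage:
            yield row["name"], coverage, float(row["frequency"])


hits = list(well_covered(io.StringIO(sample)))
print(hits)
```

The same function applies unchanged to rgp_to_systems.tsv and spot_to_systems.tsv, provided they follow the column layout described here.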
3. Interactive Visualization Files#
Correlation Matrix Plots (correlation_<type>.html)#
Interactive Bokeh visualizations containing:
Main Components:
Central Heatmap: System × Element correlation matrix with hover tooltips
Left Bar Chart: System occurrence counts across elements
Top Bar Chart: Element occurrence counts across systems
Right Color Bar: Legend showing correlation intensity scale
Lower Panels:
Frequency Plot: Element frequency across genomes (blue color scale)
Coverage Plot: System coverage of elements (red color scale)
Color Bar Legends: Scaling information for frequency and coverage plots
Interactive Features:
Hover Tooltips: Show detailed information on mouseover
Zoom/Pan: Navigate large correlation matrices
Save Tools: Export plots or data
Responsive Layout: Adapts to different screen sizes
Data Interpretation#
Coverage Metrics#
Coverage represents the proportion of an element's gene families that are covered by the associated systems:
1.0 (dark red): Associated systems cover all families in the element
0.5 (light red): Systems cover half of the element's families
0.0 (white): No overlap between element and system families
Depending on the element type, coverage is computed differently:
RGP: uses the intersection between the genes of the RGP and the genes of an associated system present in the same genome as the RGP
Spot: uses the gene families located between the borders of the spot
Module: uses the intersection of the module's gene families and the gene families of an associated system
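The module rule can be illustrated with a short set-based sketch. It is an assumption-laden example rather than PANORAMA's code: the `module_coverage` function and the family names are hypothetical, and pooling the families of several associated systems before dividing is an assumption about how multiple systems are combined.

```python
def module_coverage(module_families, systems_families):
    """Fraction of a module's gene families found in any associated system."""
    module_families = set(module_families)
    if not module_families:
        return 0.0
    covered = set()
    for families in systems_families:
        # Keep only the families shared by the module and this system.
        covered |= module_families & set(families)
    return len(covered) / len(module_families)


module = {"fam_1", "fam_2", "fam_3", "fam_4"}
systems = [{"fam_1", "fam_2", "fam_99"}, {"fam_2"}]
print(module_coverage(module, systems))  # 0.5
```

Here two of the four module families occur in the associated systems, which gives a coverage of 0.5 (light red on the coverage plot).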
Frequency Metrics#
Frequency indicates how commonly elements and their associated systems appear across genomes:
High Frequency (dark blue): Element found in many genomes
Low Frequency (light blue): Element found in few genomes
Interpretation: Helps identify core vs. accessory genomic features
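A frequency of this kind can be sketched as a simple ratio. The example below assumes that frequency is the share of genomes in which an element occurs; the `element_frequency` function, the genome names, and the choice of denominator (all genomes passed in) are hypothetical and may differ from how PANORAMA restricts the genome set.

```python
def element_frequency(element_genomes, all_genomes):
    """Share of the given genomes in which the element occurs."""
    all_genomes = set(all_genomes)
    if not all_genomes:
        return 0.0
    present = set(element_genomes) & all_genomes
    return len(present) / len(all_genomes)


genomes = {"g1", "g2", "g3", "g4"}
print(element_frequency({"g1", "g3", "g4"}, genomes))  # 0.75
```

A value close to 1.0 suggests an element shared by most genomes, while a low value points to an accessory feature.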
Correlation Intensity#
Correlation values in the heatmap show system-element co-occurrence:
High Values (dark colors): Many system types associated with the element
Low Values (light colors): Weak or no association
Pattern Analysis: Reveals functional relationships and genomic organization
Technical Details#
Visualization Technology#
Bokeh Framework: Interactive web-based visualizations
Responsive Design: Adapts to correlation matrix size
Color Palettes: Carefully chosen for accessibility and clarity
Export Options: Support for HTML and PNG formats
Performance Optimization#
The association analysis includes several optimizations:
Parallel Processing: Multi-threaded computation of associations
Memory Efficiency: Streaming processing of large pangenomes
Vectorized Operations: Pandas-based matrix operations
Component-based Analysis: Direct mapping without graph construction