Reports

Below are descriptions of reports generated by the centreseq core module. Examples follow where applicable.

Summary Report

./reports/summary_report.tsv

./reports/summary_report_singletons_removed.tsv

Large .tsv file containing detailed information for each cluster detected by centreseq. Displays representative sequence label, number of members, and which sequence labels belong to the cluster. In addition, a filtered report is provided which does not contain any singleton clusters.

Core Gene Count Report

./reports/core_gene_count_report.txt

Contains general metrics on the # of core genes detected among 100% of samples, >=90% of samples, and >50% of samples.

# genes shared among 100% of samples:       2497
# genes shared in >=90% of samples: 3548
# genes shared in >=50% of samples: 3991

Results were generated with:
    min_seq_id      = 0.95
    coverage_length = 0.95

Roary Gene Count Report

./reports/roary_gene_count_report.txt

Gene count report represented in the style of Roary’s output.

Core genes  (99% <= strains <= 100%)        3334
Soft core genes     (95% <= strains < 99%)  178
Shell genes (15% <= strains < 95%)  1311
Cloud genes (0% <= strains < 15%)   21966
Total genes (0% <= strains <= 100%) 26789

Results were generated with:
    min_seq_id      = 0.95
    coverage_length = 0.95

Pairwise Report

./reports/pairwise_gene_match_report.tsv

Large tab-delimited file which stores pairwise information on samples regarding matching core cluster counts. Used to generate the network visualization.

Network Chart

./network_graph.html

./network_graph_coding.tsv

The network chart presents a visualization of genome relatedness generated from the pairwise report file from the core module. Links are drawn between all pairs of genomes that share a relationship determined by % of matching core clusters. This percentage threshold value can be adjusted via a slider.

Samples can be subset into arbitrary colour-coded groups through the network_graph_coding.tsv file.

Example network chart

Gene Count Curve

./reports/gene_count_curve.png

./reports/gene_count_curve.csv

Visualization generated showing the # of ‘core’ genes vs. the # of ‘pan’ genes with increasing numbers of sampled genomes.

Example gene count curve

Looping over a range from 1..n, the following process is executed,

  1. n samples are randomly selected
  2. Number of core genes shared between the subset calculated
  3. Total number of genes existing among the subset is calculated

By default, this entire process is repeated 5 times and the values averaged to reduce variance from the random sampling.