Reports¶

Below are descriptions of reports generated by the centreseq core module. Examples follow where applicable.

Summary Report¶

./reports/summary_report.tsv

./reports/summary_report_singletons_removed.tsv

Large .tsv file containing detailed information for each cluster detected by centreseq. Displays representative sequence label, number of members, and which sequence labels belong to the cluster. In addition, a filtered report is provided which does not contain any singleton clusters.

Core Gene Count Report¶

./reports/core_gene_count_report.txt

Contains general metrics on the # of core genes detected among 100% of samples, >=90% of samples, and >50% of samples.

# genes shared among 100% of samples:       2497
# genes shared in >=90% of samples: 3548
# genes shared in >=50% of samples: 3991

Results were generated with:
    min_seq_id      = 0.95
    coverage_length = 0.95

Roary Gene Count Report¶

./reports/roary_gene_count_report.txt

Gene count report represented in the style of Roary’s output.

Core genes  (99% <= strains <= 100%)        3334
Soft core genes     (95% <= strains < 99%)  178
Shell genes (15% <= strains < 95%)  1311
Cloud genes (0% <= strains < 15%)   21966
Total genes (0% <= strains <= 100%) 26789

Results were generated with:
    min_seq_id      = 0.95
    coverage_length = 0.95

Pairwise Report¶

./reports/pairwise_gene_match_report.tsv

Large tab-delimited file which stores pairwise information on samples regarding matching core cluster counts. Used to generate the network visualization.

Network Chart¶

./network_graph.html

./network_graph_coding.tsv

The network chart presents a visualization of genome relatedness generated from the pairwise report file from the core module. Links are drawn between all pairs of genomes that share a relationship determined by % of matching core clusters. This percentage threshold value can be adjusted via a slider.

Samples can be subset into arbitrary colour-coded groups through the network_graph_coding.tsv file.

Gene Count Curve¶

./reports/gene_count_curve.png

./reports/gene_count_curve.csv

Visualization generated showing the # of ‘core’ genes vs. the # of ‘pan’ genes with increasing numbers of sampled genomes.

Looping over a range from 1..n, the following process is executed,

n samples are randomly selected

Number of core genes shared between the subset calculated

Total number of genes existing among the subset is calculated

By default, this entire process is repeated 5 times and the values averaged to reduce variance from the random sampling.