Usage
Usage: centreseq [OPTIONS] COMMAND [ARGS]...
centreseq builds an annotated core genome using assemblies as input.
Options:
  --version  Print the version and exit.
  --help     Show this message and exit.
Commands:
  core     Given an input directory containing assemblies, establishes a core
           genome
  extract  Helper tool to extract sequences from a particular core cluster
  subset   Subset summary_report.tsv to only samples of interest
  tree     Produces output for phylogenetic tree software
Core module
The primary functionality of centreseq can be found within the
core module.
Usage: centreseq core [OPTIONS]
Given an input directory containing any number of assemblies (.fasta),
centreseq core will:
1) annotate the genomes with Prokka,
2) perform self-clustering on each genome with MMSeqs2 linclust,
3) concatenate the self-clustered genomes into a single pan-genome,
4) cluster the pan-genome with MMSeqs2 linclust, establishing a core genome,
5) generate helpful reports to interrogate your dataset.
Note that if the specified output directory already exists, centreseq will
search for an existing Prokka directory and skip the Prokka annotation step
if possible.
Options:
  -f, --fasta-dir PATH         Path to directory containing *.fasta files for
                               input to the core pipeline  [required]
  -o, --outdir PATH            Root directory to store all output files. If
                               this directory already exists, the pipeline
                               will attempt to skip the Prokka step by reading
                               in the existing Prokka output directory, but
                               will overwrite all other existing result files.
                               [required]
  --n-cpu-medoid INTEGER       Number of CPUs for the representative medoid
                               picking step (if enabled). You will need
                               substantial RAM per CPU.
  --n-cpu-pickbest INTEGER     Number of CPUs for pick_best_nucleotide. You
                               will need substantial RAM per CPU.
  -m, --min-seq-id FLOAT       Sets the mmseqs cluster parameter
                               "--min-seq-id". Defaults to 0.95.
  -c, --coverage-length FLOAT  Sets the mmseqs cluster coverage parameter "-c"
                               directly. Defaults to 0.95, which is the
                               recommended setting.
  --medoid-repseqs             This setting will identify the representative
                               medoid nucleotide sequence for each core
                               cluster. Enabling this will increase
                               computation time considerably. Note that this
                               parameter has no effect on the number of core
                               clusters detected.
  --pairwise                   Generate pairwise comparisons of all genomes.
                               This output file can be used to view an
                               interactive network chart of the core genome in
                               a web browser.
  -v, --verbose                Set this flag to enable more verbose logging.
  --version                    Use this flag to print the version and exit.
  --help                       Show this message and exit.
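As a rough illustration of how the options above combine, the following Python sketch assembles a `centreseq core` argument list from the documented flags only. The function names `build_core_command` and `run_core`, and the placeholder paths `assemblies` and `centreseq_out`, are assumptions made for this example and are not part of centreseq.

```python
import shutil
import subprocess


def build_core_command(fasta_dir, outdir, min_seq_id=0.95, coverage_length=0.95,
                       medoid_repseqs=False, pairwise=False, verbose=False):
    """Assemble an argument list for `centreseq core` (hypothetical helper)."""
    cmd = [
        "centreseq", "core",
        "--fasta-dir", str(fasta_dir),
        "--outdir", str(outdir),
        "--min-seq-id", str(min_seq_id),
        "--coverage-length", str(coverage_length),
    ]
    # Boolean flags are appended only when requested.
    if medoid_repseqs:
        cmd.append("--medoid-repseqs")
    if pairwise:
        cmd.append("--pairwise")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_core(cmd):
    """Run the command, but only if a centreseq executable is on PATH."""
    if shutil.which("centreseq") is None:
        raise RuntimeError("centreseq executable not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; substitute your own directories.
    command = build_core_command("assemblies", "centreseq_out", pairwise=True)
    print(" ".join(command))
```

Calling `run_core(command)` would then launch the pipeline. Re-running with the same `--outdir` should reuse the existing Prokka output, as described above.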
Tree module
Results generated via the core module can be further processed using
the tree module.
Usage: centreseq tree [OPTIONS]
Processes centreseq core output files to produce files that can be fed
into phylogenetic tree building software.
Options:
  -s, --summary-report PATH  Path to summary_report.tsv file produced by the
                             core pipeline  [required]
  -p, --prokka-dir PATH      Path to the Prokka output directory generated by
                             the core pipeline  [required]
  -o, --outdir PATH          Root directory to store all output files
                             [required]
  -pct, --percentile FLOAT   Filter summary report by n_members to the top nth
                             percentile. Defaults to 99.0.
  -n, --n-cpu INTEGER        Number of CPUs to dedicate to parallelizable
                             steps of the pipeline. Will take all available
                             CPUs - 1 if not specified.
  -v, --verbose              Set this flag to enable more verbose logging.
  --version                  Use this flag to print the version and exit.
  --help                     Show this message and exit.
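The `--percentile` option is easier to reason about with a small worked example. The sketch below shows one plausible reading of "filter by n_members to the top nth percentile": compute a cutoff with a linear-interpolation percentile and keep clusters at or above it. The interpolation method, the inclusive comparison, the function names, and the toy cluster data are all assumptions. The help text does not state how centreseq computes the cutoff.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def filter_top_percentile(n_members_by_cluster, pct=99.0):
    """Keep clusters whose n_members is at or above the pct-th percentile.

    This is an assumed interpretation, not centreseq's confirmed behaviour.
    """
    cutoff = percentile(list(n_members_by_cluster.values()), pct)
    return {name: n for name, n in n_members_by_cluster.items() if n >= cutoff}


if __name__ == "__main__":
    # Toy data: cluster name -> number of member genomes (hypothetical).
    toy = {"cluster_a": 10, "cluster_b": 10, "cluster_c": 7, "cluster_d": 2}
    print(filter_top_percentile(toy, pct=99.0))
```

Under this reading, a high percentile such as the default 99.0 keeps only clusters present in nearly every genome, and lowering the value admits clusters with fewer members.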
Subset module
The subset module allows for filtering of the summary report output
file generated by the core module.
Usage: centreseq subset [OPTIONS]
Given an input text file of Sample IDs and a summary report, will return a
filtered version of the summary report for genes that belong exclusively
to the samples in the input sample ID list.
Options:
  -i, --input-samples PATH   Path to a newline-separated text file containing
                             each Sample ID to target  [required]
  -s, --summary-report PATH  Path to summary report generated by the centreseq
                             core command, i.e. summary_report.tsv  [required]
  -o, --outpath PATH         Path to desired output file. If no value is
                             provided, will create a new report in the same
                             directory as the input summary report.
  --help                     Show this message and exit.
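The `--input-samples` file is simply one Sample ID per line. A minimal sketch of preparing such a file and assembling the matching `centreseq subset` call might look like the following. The helper names, the sample IDs, and the file paths are hypothetical placeholders, and only flags documented above are used.

```python
import tempfile
from pathlib import Path


def write_sample_list(sample_ids, path):
    """Write one sample ID per line, dropping blanks and duplicates."""
    unique_ids = []
    for sample_id in sample_ids:
        sample_id = sample_id.strip()
        if sample_id and sample_id not in unique_ids:
            unique_ids.append(sample_id)
    Path(path).write_text("\n".join(unique_ids) + "\n")
    return unique_ids


def build_subset_command(samples_path, summary_report, outpath=None):
    """Assemble an argument list for `centreseq subset` (hypothetical helper)."""
    cmd = [
        "centreseq", "subset",
        "--input-samples", str(samples_path),
        "--summary-report", str(summary_report),
    ]
    # --outpath is optional; omit it to use the tool's default location.
    if outpath is not None:
        cmd += ["--outpath", str(outpath)]
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        samples_file = Path(tmp) / "samples.txt"
        # Sample IDs below are made up for illustration.
        write_sample_list(["SampleA", "SampleB", "SampleA", ""], samples_file)
        print(samples_file.read_text(), end="")
        print(" ".join(build_subset_command(samples_file, "summary_report.tsv")))
```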
Extract module
The extract module provides functionality to extract .ffn and
.faa sequences from an existing cluster detected by the core
module.
Usage: centreseq extract [OPTIONS]
Given the path to the centreseq core directory and the ID of a
cluster representative, will create a multi-FASTA containing the sequences
for all members of that cluster. Generates both an .ffn and .faa file.
Options:
  -i, --indir PATH   Path to your centreseq output directory
                     [required]
  -o, --outdir PATH  Root directory to store all output files
                     [required]
  -c, --cluster_representative TEXT
                     Name of the target cluster representative
                     e.g. "Typhi.2299.BMH_00195"  [required]
  --version          Use this flag to print the version and exit.
  --help             Show this message and exit.
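Once the extract module has written its multi-FASTA files, a quick sanity check is to count the records and inspect sequence lengths. The sketch below is a minimal standard-library FASTA reader. The file name `cluster.ffn` and the record headers are invented for the demonstration, since the actual output file names are not documented here.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Start a new record; sequence lines are collected below.
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Toy multi-FASTA standing in for an extracted cluster (hypothetical).
        demo = Path(tmp) / "cluster.ffn"
        demo.write_text(">member_1\nATGAAA\nTTTTAA\n>member_2\nATGCCCTAA\n")
        for name, sequence in read_fasta(demo).items():
            print(name, len(sequence))
```

The same reader works for the .faa file, since protein and nucleotide FASTA share the same record layout.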