Introduction to the Patric Command Line Interface (CLI)

Most of the Patric command line tools take as input a file containing a single column set or a tab-separated table and they output a modified table. The most common modification is the addition of one of more columns. We create "pipelines" of these tools to implement fairly complex transformations leading to the final table containing the desired output. We begin with accessing information about genomes.

Accessing Genome Information

Consider the following example.

p3-all-genomes 
	
genome.genome_id
1390.176
1398.26
1345597.3
282669.3
.
.
.

p3-all-genomes takes no input and is what we call a generator; it returns the set of all genome ids in Patric. Notice that the first line is a header identifying the columns in the table. Most p3 commands expect this header. If you were interested in certain data about the genomes, you would use the command p3-get-genome-data, which takes as input a set of genome ids, like this;

p3-all-genomes |p3-get-genome-data

genome.genome_id	genome.genome_id	genome.genome_name	genome.taxon_id	genome.genome_status	genome.gc_content
1390.176	1390.176	Bacillus amyloliquefaciens strain B425	1390	WGS	45.7
1398.26	1398.26	Bacillus coagulans strain B4098	1398	WGS	47.39
1345597.3	1345597.3	Helicobacter pylori SA216A	1345597	WGS	39.02
282669.3	282669.3	Psychrobacter cibarius strain W1	282669	WGS	70.43
.
.
.

p3-all-genomes is used to generate a set of input ids that we pipe into p3-get-genome-data, which returns a 6 column table with data about the genome; the genome_id, genome_name, taxon_id, genome_status and gc_content. You can control which of these fields is returned with the –a argument.

p3-all-genomes |p3-get-genome-data -a genome_name

genome.genome_id	genome.genome_name
1390.176	Bacillus amyloliquefaciens strain B425
1398.26	Bacillus coagulans strain B4098
1345597.3	Helicobacter pylori SA216A
282669.3	Psychrobacter cibarius strain W1
1349753.3	Caldimonas taiwanensis NBRC 104434
1285191.3	Desulfotomaculum intricatum strain NBRC 109411
.
.
.

If you were interested in only Streptococcus genomes, you could use the match command like this;

p3-all-genomes  | p3-get-genome-data -a genome_name | p3-match -c2 Streptococcus

genome.genome_id	genome.genome_name
1313.7195	Streptococcus pneumoniae strain 2842STDY5753638
1313.7189	Streptococcus pneumoniae strain 2842STDY5643920
1313.7203	Streptococcus pneumoniae strain 2842STDY5643723
1313.7208	Streptococcus pneumoniae strain 2842STDY5643999
1313.7199	Streptococcus pneumoniae strain 2842STDY5644588
1313.7207	Streptococcus pneumoniae strain 2842STDY5643980
.
.
.

Here, we retrieved the id of all genomes in Patric, piped that output to get the name and produce a two column table, and piped that table to a command to match for the string Streptococcus in column 2, thus filtering the table to contain only Streptococcus genomes.
There are other ways to accomplish this, but the example serves to demonstrate what we mean by creating pipelines of commands and producing tables of information.

Accessing Features

If you want to look into the features of a genome, you would use the p3-get-genome-features command. p3-genome-features takes as input a set of genome ids. The following example uses the p3-echo command to generate input for p3-get-genome-features.

p3-echo -t genome.genome_id 282669.3 | p3-get-genome-features | head 

genome.genome_id	feature.patric_id	feature.feature_type	feature.location	feature.product
282669.3	fig|282669.3.repeat.1	repeat_region	1..127	repeat region
282669.3	fig|282669.3.repeat.2	repeat_region	586..712	repeat region
282669.3	fig|282669.3.peg.4	CDS	complement(1..909)	Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7)
282669.3	fig|282669.3.repeat.3	repeat_region	1..127	repeat region
282669.3	fig|282669.3.repeat.4	repeat_region	805..931	repeat region
282669.3	fig|282669.3.repeat.5	repeat_region	869..1006	repeat region
282669.3	fig|282669.3.repeat.6	repeat_region	1..127	repeat region
282669.3	fig|282669.3.repeat.7	repeat_region	1110..1236	repeat region
282669.3	fig|282669.3.repeat.8	repeat_region	1..127	repeat region

Notice that the command returns all information about features by default. If you were only interested in the feature ids, you would specify that with the -a option.

p3-echo -t genome.genome_id 282669.3 | p3-get-genome-features -a patric_id| head 
genome.genome_id	feature.patric_id

282669.3	fig|282669.3.repeat.1
282669.3	fig|282669.3.repeat.2
282669.3	fig|282669.3.peg.4
282669.3	fig|282669.3.repeat.3
282669.3	fig|282669.3.repeat.4
282669.3	fig|282669.3.repeat.5
282669.3	fig|282669.3.repeat.6
282669.3	fig|282669.3.repeat.7
282669.3	fig|282669.3.repeat.8

Since this returns all feature types, it might be desirable to limit the features returned to a specific type. Here, we return the ids of only the pegs in a Genome by using the --equal option.

 p3-echo -t genome.genome_id 282669.3 | p3-get-genome-features --equal feature_type,CDS -a patric_id| head 
 genome.genome_id	feature.patric_id


 282669.3	fig|282669.3.peg.4
 282669.3	fig|282669.3.peg.43
 282669.3	fig|282669.3.peg.72
 282669.3	fig|282669.3.peg.83
 282669.3	fig|282669.3.peg.90
 282669.3	fig|282669.3.peg.117
 282669.3	fig|282669.3.peg.179
 282669.3	fig|282669.3.peg.207
 282669.3	fig|282669.3.peg.214

In this tutorial we have introduced the basics of using the Patric Command Line Interface (CLI) and how to access data relating to genomes and features.
In the following tutorials, you will learn how to install the Patric CLI, what all the commands are and how to use them to explore the Patric website, to build collections of data and to apply bioinformatic tools against your data.