PercolatorAdapter prepares the input for Percolator, calls it, and integrates its output. Percolator (http://percolator.ms/) is a tool that applies semi-supervised learning to peptide identification from shotgun proteomics datasets.

Experimental: This tool is a work in progress; its usage and input requirements might change.

Potential predecessor tools	→ PercolatorAdapter →	Potential successor tools
PSMFeatureExtractor	→ PercolatorAdapter →	IDFilter

Percolator is sensitive to the search engine used, i.e. its input features vary depending on the search engine, and they must be prepared beforehand (e.g., with PSMFeatureExtractor). If you do not want to use search-engine-specific features, set the generic_feature_set flag. The generic feature set incorporates the score attribute of each PSM, so make sure the score you want is set as the main score (e.g., with IDScoreSwitcher). Be aware that you may well experience a performance loss compared to the search-engine-specific features.

You can also perform protein inference with Percolator by activating the protein_level_fdrs flag; in that case, you also need to set the enzyme parameter. For protein groups, only the q-value is read, since Percolator has a more elaborate FDR estimation. For proteins, the q-value is added as the main score and the PEP as a meta value. For PSMs, you can choose the main score (score_type). Peptide-level FDRs cannot be parsed and used yet.
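The choices described above (generic versus search-engine-specific features, optional protein inference with an enzyme and FASTA file, choice of PSM main score) can be sketched as a small Python helper that only assembles a PercolatorAdapter command line from the options documented on this page; it does not run anything. The helper name and its keyword arguments are hypothetical, and the sketch assumes that, as with other TOPP tools, the advanced generic_feature_set flag is also accepted on the command line.

```python
from typing import List, Optional, Sequence

VALID_SCORE_TYPES = ("q-value", "pep", "svm")


def build_percolator_adapter_args(
    inputs: Sequence[str],
    output: str,
    *,
    percolator_executable: str = "percolator",
    enzyme: str = "trypsin",
    score_type: str = "q-value",
    generic_features: bool = False,
    protein_fasta: Optional[str] = None,
    decoy_pattern: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build (but do not run) a PercolatorAdapter call."""
    if not inputs:
        raise ValueError("at least one -in file is required")
    if score_type not in VALID_SCORE_TYPES:
        raise ValueError(f"score_type must be one of {VALID_SCORE_TYPES}")

    args = ["PercolatorAdapter", "-in", *inputs, "-out", output,
            "-percolator_executable", percolator_executable,
            "-enzyme", enzyme, "-score_type", score_type]

    if generic_features:
        # Generic features rely on each PSM's main score, so that score must
        # already be set (e.g. with IDScoreSwitcher) before this step.
        args.append("-generic_feature_set")

    if protein_fasta is not None:
        # Protein inference: picked protein-level FDR plus the FASTA database;
        # the enzyme set above is also needed in this mode.
        args += ["-protein_level_fdrs", "-fasta", protein_fasta]
        if decoy_pattern is not None:
            args += ["-decoy_pattern", decoy_pattern]
    elif decoy_pattern is not None:
        raise ValueError("decoy_pattern is only valid with protein-level FDRs")

    return args


if __name__ == "__main__":
    print(" ".join(build_percolator_adapter_args(
        ["sample.idXML"], "sample_perc.idXML",
        protein_fasta="db.fasta", decoy_pattern="DECOY_")))
```

The resulting list could be passed to `subprocess.run(args, check=True)`; the file names and decoy pattern in the example are placeholders.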

Multithreading: The threads parameter is passed to Percolator. Note: for backwards-compatibility reasons, at least 3 threads (the Percolator default) are used by default, even if threads is set to a lower value such as 1. You can still force the use of fewer than 3 threads by setting the force flag.
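The thread rule above amounts to the following. This is a hypothetical Python sketch of the described behavior, not the adapter's actual code, and the function name is made up for illustration.

```python
PERCOLATOR_DEFAULT_THREADS = 3  # Percolator's default, as described above


def effective_percolator_threads(requested: int, force: bool = False) -> int:
    """Number of threads Percolator ends up using, per the note above.

    Without the force flag, requests below Percolator's default of 3 are
    raised to 3; with force, the requested number is kept as-is.
    """
    if requested < 1:
        raise ValueError("threads must be at least 1")
    if force:
        return requested
    return max(requested, PERCOLATOR_DEFAULT_THREADS)


for n, force in [(1, False), (1, True), (8, False)]:
    print(n, force, effective_percolator_threads(n, force))
# prints:
# 1 False 3
# 1 True 1
# 8 False 8
```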

The command line parameters of this tool are:

PercolatorAdapter -- Facilitate input to Percolator and reintegrate.
Full documentation: http://www.openms.de/doxygen/release/3.1.0/html/TOPP_PercolatorAdapter.html
Version: 3.1.0 Oct 18 2023, 10:27:18, Revision: 17a07f8
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  PercolatorAdapter <options>

Options (mandatory options marked with '*'):
  -in <files>                           Input file(s) (valid formats: 'mzid', 'idXML')
  -in_decoy <files>                     Input decoy file(s) in case of separate searches (valid formats:
                                        'mzid', 'idXML')
  -in_osw <file>                        Input file in OSW format (valid formats: 'OSW')
  -out <file>*                          Output file (valid formats: 'idXML', 'mzid', 'osw')
  -out_type <type>                      Output file type -- default: determined from file extension or
                                        content. (valid: 'mzid', 'idXML', 'osw')
  -enzyme <enzyme>                      Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,
                                        chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin,trypsinp (default:
                                        'trypsin') (valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek',
                                        'thermolysin', 'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n',
                                        'glu-c', 'trypsin', 'trypsinp')
  -percolator_executable <executable>*  The Percolator executable. Provide a full or relative path, or make 
                                        sure it can be found in your PATH environment.
  -peptide_level_fdrs                   Calculate peptide-level FDRs instead of PSM-level FDRs.
  -protein_level_fdrs                   Use the picked protein-level FDR to infer protein probabilities. Use 
                                        the -fasta option and -decoy_pattern to set the Fasta file and decoy 
                                        pattern.
  -osw_level <osw_level>                OSW: the data level selected for scoring. (default: 'ms2') (valid: 
                                        'ms1', 'ms2', 'transition')
  -score_type <type>                    Type of the peptide main score (default: 'q-value') (valid:
                                        'q-value', 'pep', 'svm')
                                        
Common TOPP options:
  -ini <file>                           Use the given TOPP INI file
  -threads <n>                          Sets the number of threads allowed to be used by the TOPP tool
                                        (default: '1')
  -write_ini <file>                     Writes the default configuration file
  --help                                Shows options
  --helphelp                            Shows all options (including advanced)
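When -out_type is not given, the output type is determined from the file extension or content. The Python sketch below illustrates only the extension-based part for the three output formats listed above; the function name is hypothetical, and case-insensitive matching is an assumption of this sketch.

```python
from pathlib import PurePath
from typing import Optional

# Output formats accepted by -out/-out_type, keyed by lower-cased extension.
OUT_TYPES = {".idxml": "idXML", ".mzid": "mzid", ".osw": "osw"}


def resolve_out_type(out_path: str, out_type: Optional[str] = None) -> str:
    """Return the explicit -out_type, or infer it from the -out extension."""
    if out_type is not None:
        if out_type not in OUT_TYPES.values():
            raise ValueError(f"invalid out_type: {out_type!r}")
        return out_type
    suffix = PurePath(out_path).suffix.lower()  # case-insensitive (assumed)
    if suffix not in OUT_TYPES:
        raise ValueError(f"cannot infer type of {out_path!r}; set -out_type")
    return OUT_TYPES[suffix]


print(resolve_out_type("run1_perc.idXML"))  # idXML
print(resolve_out_type("run1.osw"))         # osw
```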

INI file documentation of this tool:

+PercolatorAdapter: Facilitate input to Percolator and reintegrate.

version (default: 3.1.0): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'PercolatorAdapter'

in[]: Input file(s). Tags: input file. Formats: *.mzid, *.idXML

in_decoy[]: Input decoy file(s) in case of separate searches. Tags: input file. Formats: *.mzid, *.idXML

in_osw: Input file in OSW format. Tags: input file. Formats: *.OSW

out: Output file. Tags: output file. Formats: *.idXML, *.mzid, *.osw

out_pin: Write pin file (e.g., for debugging). Tags: output file. Formats: *.tsv

out_pout_target: Write pout file for targets (e.g., for debugging). Tags: output file. Formats: *.tab

out_pout_decoy: Write pout file for decoys (e.g., for debugging). Tags: output file. Formats: *.tab

out_pout_target_proteins: Write pout file for target proteins (e.g., for debugging). Tags: output file. Formats: *.tab

out_pout_decoy_proteins: Write pout file for decoy proteins (e.g., for debugging). Tags: output file. Formats: *.tab

out_type: Output file type -- default: determined from file extension or content. Valid: mzid, idXML, osw

enzyme (default: trypsin): Type of enzyme. Valid: no_enzyme, elastase, pepsin, proteinasek, thermolysin, chymotrypsin, lys-n, lys-c, arg-c, asp-n, glu-c, trypsin, trypsinp

percolator_executable (default: percolator): The Percolator executable. Provide a full or relative path, or make sure it can be found in your PATH environment. Tags: input file, is_executable

peptide_level_fdrs (default: false): Calculate peptide-level FDRs instead of PSM-level FDRs. Valid: true, false

protein_level_fdrs (default: false): Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy_pattern to set the FASTA file and decoy pattern. Valid: true, false

osw_level (default: ms2): OSW: the data level selected for scoring. Valid: ms1, ms2, transition

score_type (default: q-value): Type of the peptide main score. Valid: q-value, pep, svm

generic_feature_set (default: false): Use only generic (i.e. not search-engine-specific) features. Generating search-engine-specific features for common search engines with PSMFeatureExtractor will typically boost the identification rate significantly. Valid: true, false

subset_max_train (default: 0): Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal.

cpos (default: 0.0): Cpos, penalty for mistakes made on positive examples. Set by cross-validation if not specified.

cneg (default: 0.0): Cneg, penalty for mistakes made on negative examples. Set by cross-validation if not specified.

testFDR (default: 0.01): False discovery rate threshold for evaluating the best cross-validation result and the reported end result.

trainFDR (default: 0.01): False discovery rate threshold to define positive examples in training. Set to testFDR if 0.

maxiter (default: 10): Maximum number of iterations.

nested_xval_bins (default: 1): Number of nested cross-validation bins in the 3 splits.

quick_validation (default: false): Quicker execution by reduced internal cross-validation. Valid: true, false

weights: Write the final weights to the given file. Tags: output file. Formats: *.tsv

init_weights: Read initial weights from the given file. Tags: input file. Formats: *.tsv

static (default: false): Use a static model (requires the init_weights parameter to be set). Valid: true, false

default_direction: The most informative feature, given as the feature name; it can be negated to indicate that a lower value is better.

verbose (default: 2): Set verbosity of output: 0 = no processing info, 5 = all.

unitnorm (default: false): Use unit normalization [0-1] instead of standard deviation normalization. Valid: true, false

test_each_iteration (default: false): Measure performance on the test set in each iteration. Valid: true, false

override (default: false): Override the error check and do not fall back on the default score vector in case of a suspect score vector. Valid: true, false

seed (default: 1): Seed of the random number generator.

doc (default: 0): Include description of correct features.

klammer (default: false): Retention time features calculated as in Klammer et al. Only available if -doc is set. Valid: true, false

fasta: FASTA file used for protein grouping based on an in-silico digest (only valid if -protein_level_fdrs is active). Tags: input file. Formats: *.FASTA

decoy_pattern (default: random): Text pattern that identifies decoy proteins and/or PSMs. Set this if the label that identifies decoys in the database differs from the default (only valid if -protein_level_fdrs is active).

post_processing_tdc (default: false): Use target-decoy competition to assign q-values and PEPs. Valid: true, false

train_best_positive (default: false): Enforce that, for each spectrum, at most one PSM is included in the positive set during each training iteration. If the user only provides one PSM per spectrum, this filter has no effect. Valid: true, false

ipf_max_peakgroup_pep (default: 0.7): OSW/IPF: Assess transitions only for candidate peak groups up to this maximum posterior error probability.

ipf_max_transition_isotope_overlap (default: 0.5): OSW/IPF: Maximum isotope overlap to consider transitions in IPF.

ipf_min_transition_sn (default: 0.0): OSW/IPF: Minimum log signal-to-noise level to consider transitions in IPF. Set to -1 to disable this filter.

log: Name of log file (created only when specified).

debug (default: 0): Sets the debug level.

threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool.

no_progress (default: false): Disables progress logging to command line. Valid: true, false

force (default: false): Overrides tool-specific checks. Valid: true, false

test (default: false): Enables the test mode (needed for internal use only). Valid: true, false
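Several of the INI parameters above depend on each other. The following Python sketch encodes a few of those documented rules: trainFDR falling back to testFDR when 0, static requiring init_weights, klammer requiring doc, fasta requiring protein_level_fdrs, and subset_max_train = 0 meaning all PSMs are used for training. The class and method names are hypothetical; treating any non-zero doc value as "set" and capping the training subset at the number of available PSMs are assumptions of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PercolatorSettings:
    """Hypothetical container for a few PercolatorAdapter INI parameters."""
    testFDR: float = 0.01
    trainFDR: float = 0.01
    static: bool = False
    init_weights: str = ""
    doc: int = 0
    klammer: bool = False
    protein_level_fdrs: bool = False
    fasta: str = ""
    subset_max_train: int = 0

    def problems(self) -> List[str]:
        """List violated parameter dependencies from the INI documentation."""
        found = []
        if self.static and not self.init_weights:
            found.append("static requires init_weights")
        if self.klammer and self.doc == 0:  # assumption: doc != 0 means "set"
            found.append("klammer is only available if doc is set")
        if self.fasta and not self.protein_level_fdrs:
            found.append("fasta is only valid with protein_level_fdrs")
        return found

    def effective_train_fdr(self) -> float:
        # trainFDR = 0 means "use testFDR".
        return self.testFDR if self.trainFDR == 0 else self.trainFDR

    def training_set_size(self, n_psms: int) -> int:
        # subset_max_train = 0 means all PSMs are used for training;
        # capping at n_psms is an assumption of this sketch.
        if self.subset_max_train == 0:
            return n_psms
        return min(self.subset_max_train, n_psms)


s = PercolatorSettings(trainFDR=0, testFDR=0.05, static=True)
print(s.effective_train_fdr(), s.problems())
# prints: 0.05 ['static requires init_weights']
```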

Percolator is written by Lukas Käll (http://per-colator.com/; Copyright Lukas Käll, lukas.kall@scilifelab.se).