Computes a protein identification score based on an aggregation of scores of identified peptides.

pot. predecessor tools	→ ProteinInference →	pot. successor tools
CometAdapter (or other ID engines)		PeptideIndexer
FalseDiscoveryRate
IDFilter

This tool counts and aggregates the scores of the peptide sequences that match a protein accession. Only the top PSM of each peptide is used. By default, it also annotates each protein with the number of peptides used for the calculation (meta value "nr_found_peptides"), which can be used for further filtering. Peptides with a probability of 0 are counted but ignored by the aggregation method "product".
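
The aggregation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OpenMS implementation. PSMs are modelled as hypothetical `(peptide, score, proteins)` tuples, and higher scores are assumed to be better (e.g. posterior probabilities). The helper names `top_psm_per_peptide` and `aggregate_protein_scores` are hypothetical. The score assigned to a protein whose only evidence has probability 0 under the `product` method is also an assumption.

```python
import math
from collections import defaultdict


def top_psm_per_peptide(psms):
    """Keep only the best-scoring PSM per peptide sequence.

    psms: iterable of (peptide, score, proteins) tuples (hypothetical format).
    Assumes higher scores are better.
    """
    best = {}
    for peptide, score, proteins in psms:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, proteins)
    return best


def aggregate_protein_scores(psms, method="maximum"):
    """Aggregate peptide scores per protein accession.

    Returns {accession: {"score": ..., "nr_found_peptides": ...}}.
    """
    scores_per_protein = defaultdict(list)
    for score, proteins in top_psm_per_peptide(psms).values():
        for accession in set(proteins):
            scores_per_protein[accession].append(score)

    result = {}
    for accession, scores in scores_per_protein.items():
        if method == "maximum":
            aggregated = max(scores)
        elif method == "sum":
            aggregated = sum(scores)
        elif method == "product":
            # Zero-probability peptides are skipped in the product.
            # Assumption: a protein without any non-zero evidence scores 0.0.
            non_zero = [s for s in scores if s != 0.0]
            aggregated = math.prod(non_zero) if non_zero else 0.0
        else:
            raise ValueError(f"unknown aggregation method: {method!r}")
        result[accession] = {
            "score": aggregated,
            # Every peptide is counted, including zero-probability ones.
            "nr_found_peptides": len(scores),
        }
    return result


psms = [
    ("PEPTIDEA", 0.9, ["P1"]),
    ("PEPTIDEA", 0.5, ["P1"]),        # weaker PSM of the same peptide: ignored
    ("PEPTIDEB", 0.8, ["P1", "P2"]),  # shared peptide
    ("PEPTIDEC", 0.0, ["P2"]),        # zero-probability peptide
]
print(aggregate_protein_scores(psms, "product")["P2"])
# {'score': 0.8, 'nr_found_peptides': 2}
```

In this example, P2 keeps a peptide count of 2, but its product score uses only the non-zero peptide.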

Note: Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

Todo:

Possibly integrate the parsimony approach from the OpenMS::PSProteinInference class.

The command line parameters of this tool are:

ProteinInference -- Protein inference based on an aggregation of the scores of the identified peptides.
Full documentation: http://www.openms.de/doxygen/release/2.7.0/html/TOPP_ProteinInference.html
Version: 2.7.0 Sep 13 2021, 20:58:47, Revision: 9110e58
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  ProteinInference <options>

Options (mandatory options marked with '*'):
  -in <file>*                                               Input file(s) (valid formats: 'idXML')
  -out <file>*                                              Output file (valid formats: 'idXML')
  -merge_runs <choice>                                      If your idXML contains multiple runs, merge them 
                                                            beforehand? (default: 'no' valid: 'no', 'all')
  -annotate_indist_groups <choice>                          If you want to annotate indistinguishable protein
                                                            groups, either for reporting or for group based
                                                            quant. later. Only works with a single ID run in
                                                            the file. (default: 'true' valid: 'true', 'false')

Merging:
  -Merging:annotate_origin <choice>                         If true, adds a map_index MetaValue to the
                                                            PeptideIDs to annotate the IDRun they came from.
                                                            (default: 'true' valid: 'true', 'false')
  -Merging:allow_disagreeing_settings                       Force merging of disagreeing runs. Use at your 
                                                            own risk.

Algorithm:
  -Algorithm:min_peptides_per_protein <number>              Minimal number of peptides needed for a protein
                                                            identification. If set to zero, unmatched
                                                            proteins get a score of -Infinity. If bigger than
                                                            zero, proteins with less peptides are filtered and
                                                            evidences removed from the PSMs. PSMs that do not
                                                            reference any proteins anymore are removed but
                                                            the spectrum info is kept. (default: '1' min: '0')
  -Algorithm:score_aggregation_method <choice>              How to aggregate scores of peptides matching to
                                                            the same protein? (default: 'maximum' valid:
                                                            'maximum', 'product', 'sum')
  -Algorithm:treat_charge_variants_separately <text>        If this is set, different charge variants of the
                                                            same peptide sequence count as individual
                                                            evidences. (default: 'true')
  -Algorithm:treat_modification_variants_separately <text>  If this is set, different modification variants 
                                                            of the same peptide sequence count as individual
                                                            evidences. (default: 'true')
  -Algorithm:use_shared_peptides <text>                     If this is set, shared peptides are used as
                                                            evidences. (default: 'true')
  -Algorithm:skip_count_annotation <text>                   If this is true, peptide counts won't be
                                                            annotated at the proteins. (default: 'false')

                                                            
Common TOPP options:
  -ini <file>                                               Use the given TOPP INI file
  -threads <n>                                              Sets the number of threads allowed to be used by 
                                                            the TOPP tool (default: '1')
  -write_ini <file>                                         Writes the default configuration file
  --help                                                    Shows options
  --helphelp                                                Shows all options (including advanced)
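
The `-merge_runs`, `Merging:` and `-annotate_indist_groups` options above can be illustrated with a small Python sketch. It is not the OpenMS implementation, and several parts are assumptions:

- A run is modelled as a hypothetical dictionary holding a `settings` dict and a list of `peptide_ids`.
- "Disagreeing runs" is simplified to unequal `settings`.
- Indistinguishable proteins are taken to be proteins matched by exactly the same set of peptide sequences, which is the usual definition.
- The function names `merge_runs` and `indistinguishable_groups` are hypothetical.

Grouping is applied to a single run, mirroring the restriction stated for `-annotate_indist_groups`.

```python
from collections import defaultdict


def merge_runs(runs, annotate_origin=True, allow_disagreeing_settings=False):
    """Merge several ID runs into one (cf. -merge_runs all).

    runs: list of {"settings": dict, "peptide_ids": list of dict}
    (hypothetical format).
    """
    if not runs:
        return {"settings": {}, "peptide_ids": []}
    reference = runs[0]["settings"]
    for index, run in enumerate(runs[1:], start=1):
        if run["settings"] != reference and not allow_disagreeing_settings:
            raise ValueError(f"run {index} disagrees with run 0; refusing to merge")
    merged = []
    for map_index, run in enumerate(runs):
        for peptide_id in run["peptide_ids"]:
            peptide_id = dict(peptide_id)  # copy so the input runs stay unchanged
            if annotate_origin:
                peptide_id["map_index"] = map_index  # remember the run of origin
            merged.append(peptide_id)
    return {"settings": dict(reference), "peptide_ids": merged}


def indistinguishable_groups(protein_to_peptides):
    """Group proteins that are matched by exactly the same peptide sequences.

    protein_to_peptides: {accession: iterable of peptide sequences}, one run.
    Returns a sorted list of sorted accession groups.
    """
    by_evidence = defaultdict(list)
    for accession, peptides in protein_to_peptides.items():
        by_evidence[frozenset(peptides)].append(accession)
    return sorted(sorted(group) for group in by_evidence.values())


run_a = {"settings": {"enzyme": "Trypsin"}, "peptide_ids": [{"sequence": "PEPTIDEA"}]}
run_b = {"settings": {"enzyme": "Trypsin"}, "peptide_ids": [{"sequence": "PEPTIDEB"}]}
merged = merge_runs([run_a, run_b])
print([p["map_index"] for p in merged["peptide_ids"]])  # [0, 1]
print(indistinguishable_groups({"P1": ["A", "B"], "P2": ["B", "A"], "P3": ["C"]}))
# [['P1', 'P2'], ['P3']]
```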

INI file documentation of this tool:

Legend: (required) marks a required parameter; (advanced) marks an advanced parameter.

+ProteinInference: Protein inference based on an aggregation of the scores of the identified peptides.

version (default: 2.7.0): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'ProteinInference'

in (required; default: []; input file): Input file(s). Valid formats: *.idXML

out (required; output file): Output file. Valid formats: *.idXML

merge_runs (default: no): If your idXML contains multiple runs, merge them beforehand? Valid: no, all

annotate_indist_groups (default: true): If you want to annotate indistinguishable protein groups, either for reporting or for group based quant. later. Only works with a single ID run in the file. Valid: true, false

log (advanced; default: empty): Name of log file (created only when specified)

debug (advanced; default: 0): Sets the debug level

threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool

no_progress (advanced; default: false): Disables progress logging to command line. Valid: true, false

force (advanced; default: false): Overrides tool-specific checks. Valid: true, false

test (advanced; default: false): Enables the test mode (needed for internal use only). Valid: true, false

+++Merging

annotate_origin (default: true): If true, adds a map_index MetaValue to the PeptideIDs to annotate the IDRun they came from. Valid: true, false

allow_disagreeing_settings (default: false): Force merging of disagreeing runs. Use at your own risk. Valid: true, false

+++Algorithm

min_peptides_per_protein (default: 1): Minimal number of peptides needed for a protein identification. If set to zero, unmatched proteins get a score of -Infinity. If bigger than zero, proteins with less peptides are filtered and evidences removed from the PSMs. PSMs that do not reference any proteins anymore are removed but the spectrum info is kept. Valid range: 0 to ∞

score_aggregation_method (default: maximum): How to aggregate scores of peptides matching to the same protein? Valid: maximum, product, sum

treat_charge_variants_separately (default: true): If this is set, different charge variants of the same peptide sequence count as individual evidences.

treat_modification_variants_separately (default: true): If this is set, different modification variants of the same peptide sequence count as individual evidences.

use_shared_peptides (default: true): If this is set, shared peptides are used as evidences.

skip_count_annotation (default: false): If this is true, peptide counts won't be annotated at the proteins.
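
The Algorithm parameters above can be illustrated with a minimal Python sketch of how they might decide what counts as one peptide evidence and how `min_peptides_per_protein` filters proteins and PSMs. This is an illustration under assumptions, not the OpenMS code:

- Peptide sequences use a bracket notation for modifications (e.g. `PEPM(Oxidation)TIDE`).
- A shared peptide is one that matches more than one protein.
- Spectra are hypothetical dictionaries that hold their PSMs in a `hits` list.
- The names `strip_modifications`, `count_evidences` and `apply_min_peptides` are hypothetical.

```python
import math
from collections import defaultdict


def strip_modifications(sequence):
    """Remove bracketed modifications, e.g. 'PEPM(Oxidation)TIDE' -> 'PEPMTIDE'.

    Handles nested brackets such as 'K(Label:13C(6)15N(2))' and drops the
    '.' used for terminal modifications.
    """
    unmodified, depth = [], 0
    for char in sequence:
        if char in "([":
            depth += 1
        elif char in ")]":
            depth = max(depth - 1, 0)
        elif depth == 0 and char != ".":
            unmodified.append(char)
    return "".join(unmodified)


def count_evidences(psms, treat_charge_variants_separately=True,
                    treat_modification_variants_separately=True,
                    use_shared_peptides=True):
    """Count distinct peptide evidences per protein.

    psms: iterable of (sequence, charge, proteins) tuples (hypothetical format).
    """
    evidences = defaultdict(set)
    for sequence, charge, proteins in psms:
        if not use_shared_peptides and len(set(proteins)) > 1:
            continue  # shared peptide: not used as evidence
        if not treat_modification_variants_separately:
            sequence = strip_modifications(sequence)
        key = (sequence, charge if treat_charge_variants_separately else None)
        for accession in proteins:
            evidences[accession].add(key)
    return {accession: len(keys) for accession, keys in evidences.items()}


def apply_min_peptides(accessions, scores, peptide_counts, spectra,
                       min_peptides_per_protein=1):
    """Filter proteins and PSM evidences by min_peptides_per_protein.

    accessions: all protein accessions; scores, peptide_counts: dicts by accession;
    spectra: list of {"spectrum": ..., "hits": [{"sequence": ..., "proteins": [...]}]}.
    """
    if min_peptides_per_protein == 0:
        # No filtering; proteins without any matched peptide get -infinity.
        all_scores = {acc: scores[acc] if peptide_counts.get(acc, 0) > 0 else -math.inf
                      for acc in accessions}
        return all_scores, spectra
    kept = {acc: scores[acc] for acc in accessions
            if peptide_counts.get(acc, 0) >= min_peptides_per_protein}
    filtered = []
    for spectrum in spectra:
        hits = []
        for hit in spectrum["hits"]:
            proteins = [acc for acc in hit["proteins"] if acc in kept]
            if proteins:  # PSMs left without any protein are removed
                hits.append({**hit, "proteins": proteins})
        filtered.append({**spectrum, "hits": hits})  # spectrum info is kept
    return kept, filtered


psms = [("PEPM(Oxidation)TIDE", 2, ["P1"]), ("PEPMTIDE", 2, ["P1"]),
        ("PEPMTIDE", 3, ["P1"]), ("SHAREDPEP", 2, ["P1", "P2"])]
print(count_evidences(psms))                # {'P1': 4, 'P2': 1}
print(count_evidences(psms, False, False))  # {'P1': 2, 'P2': 1}
```

With both variant switches disabled, the three `PEPMTIDE` forms collapse into a single evidence for P1.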