Performs an internal mass recalibration on an MS experiment.

pot. predecessor tools	→ InternalCalibration →	pot. successor tools
PeakPickerWavelet		any tool operating on MS peak data (in mzML format)
FeatureFinderCentroided		any tool operating on MS peak data (in mzML format)

Given reference masses (either peptide identifications or a list of fixed masses), an MS experiment can be recalibrated using a linear or quadratic regression fitted to the observed vs. the theoretical masses.

Choose one of two optional input files: 1) peptide identifications (from featureXML or idXML) using 'id_in', or 2) lock masses using 'lock_in'.
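A minimal sketch of such a regression in Python follows. It assumes the model expresses the ppm error of each calibrant as a polynomial in its observed m/z (degree 1 for 'linear', degree 2 for 'quadratic', optional per-calibrant weights for the '_weighted' variants) and corrects a peak by inverting the ppm definition. The names `ppm_error`, `fit_model` and `recalibrate` are hypothetical and not part of OpenMS; the tool's actual parameterization and weighting scheme may differ.

```python
from typing import Callable, List, Optional, Sequence


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of an observed m/z relative to its theoretical m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: calibrants do not determine the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_model(obs_mz: Sequence[float], ppm: Sequence[float], degree: int = 1,
              weights: Optional[Sequence[float]] = None) -> Callable[[float], float]:
    """Weighted least-squares fit of ppm error as a polynomial in observed m/z.

    Returns a function mapping an observed m/z to its predicted ppm error.
    """
    n_coef = degree + 1
    if len(obs_mz) < n_coef:
        raise ValueError(f"need at least {n_coef} calibrants, got {len(obs_mz)}")
    w = list(weights) if weights is not None else [1.0] * len(obs_mz)
    # Center and scale m/z so the normal equations stay well conditioned.
    center = sum(obs_mz) / len(obs_mz)
    scale = max(abs(x - center) for x in obs_mz) or 1.0
    u = [(x - center) / scale for x in obs_mz]
    a = [[sum(wi * ui ** (j + k) for wi, ui in zip(w, u)) for k in range(n_coef)]
         for j in range(n_coef)]
    b = [sum(wi * yi * ui ** j for wi, yi, ui in zip(w, ppm, u)) for j in range(n_coef)]
    coef = _solve(a, b)

    def predict(mz: float) -> float:
        t = (mz - center) / scale
        return sum(c * t ** j for j, c in enumerate(coef))

    return predict


def recalibrate(mz: float, predict: Callable[[float], float]) -> float:
    """Invert ppm = (obs - theo) / theo * 1e6 using the predicted error at this m/z."""
    return mz / (1.0 + predict(mz) * 1e-6)
```

Centering and scaling m/z before solving the normal equations is a design choice of this sketch; it keeps the quadratic fit numerically stable for m/z values in the hundreds to thousands.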

The user can choose whether the calibration function is calculated for each spectrum separately or once for the whole map. If this is done scan-wise, a user-defined range of neighboring spectra is searched for lock masses/peptide IDs. These calibrants are used to build a model, which is then applied to the spectrum at hand. The RT range ('RT_chunking') should be small enough to resolve time-dependent changes in decalibration, but wide enough to contain enough calibrant masses for a stable model. A linear model requires at least two calibrants, a quadratic model at least three. Usually, the RT range should provide about three times more calibrants than required, i.e. 6 (=3x2) for linear and 9 (=3x3) for quadratic models. If the calibrant data is too sparse for a certain scan, the closest neighboring model is used automatically. If no model can be calculated anywhere, the tool fails.
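The scan-wise calibrant selection and the fallback to a neighboring model can be sketched as follows. This assumes calibrant retention times are sorted, that 'RT_chunking' is applied on both sides of each scan, and that the "closest neighboring model" is the model of the scan nearest in RT. `calibrants_in_window` and `assign_models` are hypothetical helpers; weighted model variants would use the same minimum counts.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence

# Minimum number of calibrants per model type, as stated in the text.
MIN_CALIBRANTS = {"linear": 2, "quadratic": 3}


def calibrants_in_window(cal_rts: Sequence[float], scan_rt: float,
                         rt_chunking: float) -> List[int]:
    """Indices of calibrants within +/- rt_chunking seconds of scan_rt.

    cal_rts must be sorted ascending; rt_chunking == -1 selects all (global model).
    """
    if rt_chunking == -1:
        return list(range(len(cal_rts)))
    lo = bisect_left(cal_rts, scan_rt - rt_chunking)
    hi = bisect_right(cal_rts, scan_rt + rt_chunking)
    return list(range(lo, hi))


def assign_models(scan_rts: Sequence[float], cal_rts: Sequence[float],
                  rt_chunking: float, model_type: str = "linear") -> List[int]:
    """For each scan, the index of the scan whose model it should use.

    A scan with enough calibrants in its window uses its own model; otherwise it
    borrows the model of the RT-closest scan that has one.
    """
    need = MIN_CALIBRANTS[model_type]
    has_model = [len(calibrants_in_window(cal_rts, rt, rt_chunking)) >= need
                 for rt in scan_rts]
    donors = [i for i, ok in enumerate(has_model) if ok]
    if not donors:
        raise RuntimeError("no calibration model could be built for any scan")
    return [i if has_model[i]
            else min(donors, key=lambda d: abs(scan_rts[d] - scan_rts[i]))
            for i in range(len(scan_rts))]
```

For example, with calibrants at RT 10, 20, 30, 500 and 510 s and a 60 s window, a scan at 250 s finds no calibrants and borrows the model of the scan at 15 s.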

Optional quality control output files allow you to judge the success of the calibration. Inspecting them is strongly advised. If PNG images are requested, 'R' (statistical programming language) needs to be installed and available on the system path!

Outlier detection is supported using the RANSAC algorithm. However, it is usually better to provide high-confidence calibrants than to rely on automatic outlier removal.

Post-calibration statistics (median ppm error and median absolute deviation) are computed automatically. The calibration is deemed successful if these statistics are within the bounds set by the 'goodness:median' and 'goodness:MAD' parameters.
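For illustration, a generic RANSAC line fit on (m/z, ppm error) pairs is sketched below. Its defaults mirror the 'RANSAC:threshold' (squared residual in ppm^2), 'RANSAC:pc_inliers' and 'RANSAC:iter' parameters of this tool, but `ransac_line` is a hypothetical helper and OpenMS's own RANSAC implementation may differ in sampling and in how the final model is refitted.

```python
import random
from typing import List, Sequence


def ransac_line(xs: Sequence[float], ys: Sequence[float], threshold_ppm2: float = 10.0,
                pc_inliers: float = 30.0, iterations: int = 70, seed: int = 0) -> List[int]:
    """Generic RANSAC for a straight line y = a + b * x.

    xs: calibrant m/z, ys: their ppm errors. A point is an inlier if its squared
    residual (ppm^2) is below threshold_ppm2. Returns the inlier indices of the best
    model that reaches pc_inliers percent of the data, or [] if none does.
    """
    n = len(xs)
    if n < 2:
        return []
    rng = random.Random(seed)
    min_inliers = n * pc_inliers / 100.0
    best: List[int] = []
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)  # minimal sample that defines a line
        if xs[i] == xs[j]:
            continue  # degenerate sample, slope undefined
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        inliers = [k for k in range(n) if (ys[k] - (a + b * xs[k])) ** 2 < threshold_ppm2]
        if len(inliers) >= min_inliers and len(inliers) > len(best):
            best = inliers
    return best
```

The returned inlier indices would then be used to fit the final calibration model; the fixed seed only makes the sketch reproducible.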
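The two goodness statistics can be computed as below. The sketch assumes that the absolute value of the median ppm error is compared with 'goodness:median' and that the MAD is unscaled (no 1.4826 normal-consistency factor); both are assumptions, and `calibration_stats` / `calibration_ok` are hypothetical names.

```python
from statistics import median
from typing import Sequence, Tuple


def calibration_stats(ppm_errors: Sequence[float]) -> Tuple[float, float]:
    """Median ppm error and (unscaled) median absolute deviation around it."""
    med = median(ppm_errors)
    mad = median([abs(e - med) for e in ppm_errors])
    return med, mad


def calibration_ok(ppm_errors: Sequence[float], max_median: float = 4.0,
                   max_mad: float = 2.0) -> bool:
    """Analogue of the goodness:median / goodness:MAD checks (absolute median assumed)."""
    med, mad = calibration_stats(ppm_errors)
    return abs(med) < max_median and mad < max_mad
```

The defaults 4.0 and 2.0 correspond to the documented defaults of 'goodness:median' and 'goodness:MAD'.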

Detailed description of each calibration method: 1) [id_in] The peptide identifications should be derived from the very same mzML file using a wide precursor window (e.g. 25 ppm), which captures the possible decalibration. Subsequently, the IDs should be filtered for high confidence (e.g. low FDR, ideally FDR=0.0) and given as input to this tool. Remaining outliers can be removed using RANSAC. The data might benefit from a precursor mass correction (e.g. using HighResPrecursorMassCorrector) before the MS/MS search is done. The list of calibrants is derived solely from the idXML/featureXML, and only the resulting model is applied to the mzML.

2) [lock_in] Calibration can be performed using specific lock masses that occur in most spectra. The structure of the cal:lock_in CSV file is as follows: each line represents one lock mass in the format <m/z>, <ms-level>, <charge>. Lines starting with # are treated as comments and ignored. The ms-level is usually '1', but you can also use '2' if there are commonly occurring fragment ions.

Example:

# lock mass at 574 m/z at MS1 with charge 2

574.345, 1, 2
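A minimal parser for this lock mass format is sketched below, assuming comma-separated fields and that blank lines are skipped like comments. `LockMass` and `parse_lock_masses` are hypothetical names; the tool's own CSV reader may be more or less lenient.

```python
from typing import List, NamedTuple


class LockMass(NamedTuple):
    mz: float
    ms_level: int
    charge: int


def parse_lock_masses(text: str) -> List[LockMass]:
    """Parse '<m/z>, <ms-level>, <charge>' lines; '#' lines and blank lines are skipped."""
    locks: List[LockMass] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = [f.strip() for f in stripped.split(",")]
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected '<m/z>, <ms-level>, <charge>', got {line!r}")
        locks.append(LockMass(float(fields[0]), int(fields[1]), int(fields[2])))
    return locks
```

Applied to the example above, this yields a single entry with m/z 574.345, MS level 1 and charge 2.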

Additional filters ('cal:lock_require_mono', 'cal:lock_require_iso') allow excluding spurious false-positive calibrant peaks. These filters require knowledge of the charge state, so the charge must be specified in the input CSV. Detailed information on which lock masses passed these filters is available when -debug is used (any level).

The calibration function uses all lock masses (i.e. from all ms-levels) within the defined RT range to calibrate a spectrum. Thus, care should be taken that spectra from the ms-levels specified here are recorded using the same mass analyzer (MA). This is not an issue for a Q-Exactive (which has only one MA), but depends on the acquisition scheme for instruments with two or three MAs (e.g. for an Orbitrap Velos, MS/MS spectra are commonly acquired in the ion trap and should not be used when calibrating MS1).
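The idea behind the two filters can be sketched with simple presence checks on a sorted, centroided peak list: the isotope spacing on the m/z axis is the 13C/12C mass difference divided by the charge, which is why the charge column is needed. This is a simplification under stated assumptions; the real filters may also consider intensities, and `has_peak` / `passes_lock_filters` are hypothetical helpers.

```python
from bisect import bisect_left
from typing import Sequence

C13_C12_DELTA = 1.0033548378  # mass difference between 13C and 12C, in Da


def has_peak(peaks_mz: Sequence[float], target: float, tol_ppm: float) -> bool:
    """True if any peak in the sorted list lies within tol_ppm of target."""
    window = target * tol_ppm * 1e-6
    i = bisect_left(peaks_mz, target - window)
    return i < len(peaks_mz) and peaks_mz[i] <= target + window


def passes_lock_filters(peaks_mz: Sequence[float], lock_mz: float, charge: int,
                        tol_ppm: float = 25.0, require_mono: bool = False,
                        require_iso: bool = False) -> bool:
    """Simplified analogue of cal:lock_require_mono / cal:lock_require_iso."""
    spacing = C13_C12_DELTA / charge  # isotope spacing on the m/z axis
    if require_mono and has_peak(peaks_mz, lock_mz - spacing, tol_ppm):
        return False  # a peak one isotope lower: lock_mz is likely iso1, iso2, ...
    if require_iso and not has_peak(peaks_mz, lock_mz + spacing, tol_ppm):
        return False  # no +1 isotope peak, i.e. no isotope pattern
    return True
```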
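Both calibration methods first have to locate each calibrant among the measured peaks within the 'ppm_match_tolerance' window (25 ppm by default). The sketch below assumes a peptide's calibrant m/z is derived from its neutral monoisotopic mass and charge, and that the closest peak inside the window is chosen; OpenMS may select peaks differently (e.g. by intensity), and `theoretical_mz` / `match_peak` are hypothetical names.

```python
from bisect import bisect_left
from typing import Optional, Sequence

PROTON_MASS = 1.007276466621  # Da (CODATA 2018)


def theoretical_mz(neutral_mass: float, charge: int) -> float:
    """m/z of a [M + zH]^z+ ion from its neutral monoisotopic mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def match_peak(peaks_mz: Sequence[float], target_mz: float,
               tol_ppm: float = 25.0) -> Optional[int]:
    """Index of the peak closest to target_mz if it lies within tol_ppm, else None.

    peaks_mz must be sorted ascending (centroided data).
    """
    window = target_mz * tol_ppm * 1e-6
    i = bisect_left(peaks_mz, target_mz)
    # Only the neighbours around the insertion point can be the closest peak.
    candidates = [k for k in (i - 1, i) if 0 <= k < len(peaks_mz)]
    best = min(candidates, key=lambda k: abs(peaks_mz[k] - target_mz), default=None)
    if best is None or abs(peaks_mz[best] - target_mz) > window:
        return None
    return best
```

A wide window such as 25 ppm tolerates the uncorrected decalibration; a peak about 1.4 ppm away is accepted at 25 ppm but rejected at 1 ppm.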

General remarks: The user can select which MS levels are subjected to calibration. Calibration must be done once for each mass analyzer. Usually, peptide IDs provide calibration points for MS1 precursors, i.e. they are suitable for MS1. They are applicable to MS2 only if the same mass analyzer was used (e.g. Q-Exactive). In other words, MS/MS spectra acquired using the ion trap analyzer of a Velos cannot be calibrated using peptide IDs. Precursor m/z values associated with higher-level MS spectra are corrected if their precursor spectra are subject to calibration, e.g. precursor information within MS2 spectra is calibrated if the target ms-level is set to 1. Lock masses ('cal:lock_in') can be specified freely for MS1 and/or MS2.
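How a calibration applied to MS1 also propagates to the precursor m/z of MS2 spectra can be sketched as follows. The `Spectrum` record, its `parent` index and the per-spectrum `models` mapping (each model maps an observed m/z directly to a corrected m/z) are hypothetical simplifications, not OpenMS data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Model = Callable[[float], float]  # maps an observed m/z to a corrected m/z


@dataclass
class Spectrum:
    ms_level: int
    mz: List[float]
    precursor_mz: Optional[float] = None
    parent: Optional[int] = None  # index of the spectrum the precursor was selected from


def apply_calibration(spectra: List[Spectrum], models: Dict[int, Model],
                      target_levels: Set[int]) -> None:
    """Recalibrate peaks of spectra at target_levels (in place).

    A precursor m/z is corrected with its parent spectrum's model whenever that
    parent's MS level is a target, even if the child's own level is not.
    """
    for i, spec in enumerate(spectra):
        if spec.ms_level in target_levels:
            spec.mz = [models[i](m) for m in spec.mz]
        if (spec.parent is not None and spec.precursor_mz is not None
                and spectra[spec.parent].ms_level in target_levels):
            spec.precursor_mz = models[spec.parent](spec.precursor_mz)
```

With target levels {1}, the MS2 fragment peaks stay untouched while their precursor m/z is corrected by the MS1 model.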

Note: The tool assumes the input data is already picked/centroided. Currently, mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

InternalCalibration -- Applies an internal mass recalibration.
Full documentation: http://www.openms.de/doxygen/release/3.1.0/html/TOPP_InternalCalibration.html
Version: 3.1.0 Oct 18 2023, 10:27:18, Revision: 17a07f8
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  InternalCalibration <options>

Options (mandatory options marked with '*'):
  -in <file>*                                Input peak file (valid formats: 'mzML')
  -out <file>*                               Output file  (valid formats: 'mzML')
  -rscript_executable <file>                 Path to the Rscript executable (default: 'Rscript').
                                             
  -ppm_match_tolerance <delta m/z in [ppm]>  Finding calibrants in raw data uses this tolerance (for lock 
                                             masses and ID's). (default: '25.0')

Choose one of two optional input files ('id_in' or 'lock_in') to define the calibration masses/function:
  -cal:id_in <file>                          Identifications or features whose peptide ID's serve as
                                             calibration masses. (valid formats: 'idXML', 'featureXML')
  -cal:lock_in <file>                        Input file containing reference m/z values (text file with each 
                                             line as: m/z ms-level charge) which occur in all scans. (valid 
                                             formats: 'csv')
  -cal:lock_out <file>                       Optional output file containing peaks from 'in' which were
                                             matched to reference m/z values. Useful to see which peaks were
                                             used for calibration. (valid formats: 'mzML')
  -cal:lock_fail_out <file>                  Optional output file containing lock masses which were NOT
                                             found or accepted(!) in data from 'in'. Useful to see which
                                             peaks were used for calibration. (valid formats: 'mzML')
  -cal:lock_require_mono                     Require all lock masses to be monoisotopic, i.e. not the iso1, 
                                             iso2 etc ('charge' column is used to determine the spacing). 
                                             Peaks which are not mono-isotopic are not used.
  -cal:lock_require_iso                      Require all lock masses to have at least the +1 isotope. Peaks 
                                             without isotope pattern are not used.
  -cal:model_type <model>                    Type of function to be fitted to the calibration points.
                                             (default: 'linear_weighted') (valid: 'linear',
                                             'linear_weighted', 'quadratic', 'quadratic_weighted')

                                             
  -ms_level i j ...                          Target MS levels to apply the transformation onto. Does not
                                             affect calibrant collection. (default: '[1 2 3]')
  -RT_chunking <RT window in [sec]>          RT window (one-sided, i.e. left->center, or center->right)
                                             around an MS scan in which calibrants are collected to build a
                                             model. Set to -1 to use ALL calibrants for all scans, i.e. a
                                             global model. (default: '300.0')

Robust outlier removal using RANSAC:
  -RANSAC:enabled                            Apply RANSAC to calibration points to remove outliers before 
                                             fitting a model.
  -RANSAC:threshold <threshold>              Threshold for accepting inliers (instrument precision (not
                                             accuracy!) as ppm^2 distance) (default: '10.0')
  -RANSAC:pc_inliers <# inliers>             Minimum percentage (of available data) of inliers (<threshold 
                                             away from model) to accept the model. (default: '30') (min: '1' 
                                             max: '99')
  -RANSAC:iter <# iterations>                Maximal # iterations. (default: '70')

Thresholds for accepting calibration success:
  -goodness:median <threshold>               The median ppm error of calibrated masses must be smaller than 
                                             this threshold. (default: '4.0')
  -goodness:MAD <threshold>                  The median absolute deviation of the ppm error of calibrated 
                                             masses must be smaller than this threshold. (default: '2.0')

Tables and plots to verify calibration performance:
  -quality_control:models <table>            Table of model parameters for each spectrum. (valid formats: 
                                             'csv')
  -quality_control:models_plot <image>       Plot image of model parameters for each spectrum. (valid
                                             formats: 'png')
  -quality_control:residuals <table>         Table of pre- and post calibration errors. (valid formats:
                                             'csv')
  -quality_control:residuals_plot <image>    Plot image of pre- and post calibration errors. (valid formats: 
                                             'png')

                                             
Common TOPP options:
  -ini <file>                                Use the given TOPP INI file
  -threads <n>                               Sets the number of threads allowed to be used by the TOPP tool 
                                             (default: '1')
  -write_ini <file>                          Writes the default configuration file
  --help                                     Shows options
  --helphelp                                 Shows all options (including advanced)

INI file documentation of this tool:


+InternalCalibration: Applies an internal mass recalibration.

version (3.1.0): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'InternalCalibration'

in (required): Input peak file. [input file; *.mzML]

out (required): Output file. [output file; *.mzML]

rscript_executable (Rscript): Path to the Rscript executable (default: 'Rscript'). [input file, is_executable]

ppm_match_tolerance (25.0): Finding calibrants in raw data uses this tolerance (for lock masses and ID's).

ms_level ([1, 2, 3]): Target MS levels to apply the transformation onto. Does not affect calibrant collection.

RT_chunking (300.0): RT window (one-sided, i.e. left->center, or center->right) around an MS scan in which calibrants are collected to build a model. Set to -1 to use ALL calibrants for all scans, i.e. a global model.

log: Name of log file (created only when specified)

debug (0): Sets the debug level

threads (1): Sets the number of threads allowed to be used by the TOPP tool

no_progress (false): Disables progress logging to command line [true, false]

force (false): Overrides tool-specific checks [true, false]

test (false): Enables the test mode (needed for internal use only) [true, false]

+++cal: Choose one of two optional input files ('id_in' or 'lock_in') to define the calibration masses/function

id_in: Identifications or features whose peptide ID's serve as calibration masses. [input file; *.idXML, *.featureXML]

lock_in: Input file containing reference m/z values (text file with each line as: m/z ms-level charge) which occur in all scans. [input file; *.csv]

lock_out: Optional output file containing peaks from 'in' which were matched to reference m/z values. Useful to see which peaks were used for calibration. [output file; *.mzML]

lock_fail_out: Optional output file containing lock masses which were NOT found or accepted(!) in data from 'in'. Useful to see which peaks were used for calibration. [output file; *.mzML]

lock_require_mono (false): Require all lock masses to be monoisotopic, i.e. not the iso1, iso2 etc ('charge' column is used to determine the spacing). Peaks which are not mono-isotopic are not used. [true, false]

lock_require_iso (false): Require all lock masses to have at least the +1 isotope. Peaks without isotope pattern are not used. [true, false]

model_type (linear_weighted): Type of function to be fitted to the calibration points. [linear, linear_weighted, quadratic, quadratic_weighted]

+++RANSAC: Robust outlier removal using RANSAC

enabled (false): Apply RANSAC to calibration points to remove outliers before fitting a model. [true, false]

threshold (10.0): Threshold for accepting inliers (instrument precision (not accuracy!) as ppm^2 distance)

pc_inliers (30): Minimum percentage (of available data) of inliers (<threshold away from model) to accept the model. [1:99]

iter (70): Maximal # iterations.

+++goodness: Thresholds for accepting calibration success

median (4.0): The median ppm error of calibrated masses must be smaller than this threshold.

MAD (2.0): The median absolute deviation of the ppm error of calibrated masses must be smaller than this threshold.

+++quality_control: Tables and plots to verify calibration performance

models: Table of model parameters for each spectrum. [output file; *.csv]

models_plot: Plot image of model parameters for each spectrum. [output file; *.png]

residuals: Table of pre- and post calibration errors. [output file; *.csv]

residuals_plot: Plot image of pre- and post calibration errors. [output file; *.png]