FLASHDECONV 2.0 BETA+, FINALLY WITH A GUI!

The GUI has arrived. The GUI command is in the [OpenMS path]/bin folder: go to [OpenMS path]/bin and run FLASHDeconvWizard. FLASHDeconv 2.0 beta+ performs MS1 and MS2 spectral deconvolution as well as feature deconvolution. It supports various output formats (e.g., *.tsv, *.mzML, *.msalign, and *.feature). The stable FLASHDeconv 2.0 version will be officially integrated into OpenMS 2.7.0, to be released in the near future. FLASHDeconv 2.0 beta+ also supports TopPIC identification better than the previous version by generating all msalign and feature files that TopPIC takes as inputs. We also added a spectral merging function to support QTOF dataset analysis and NativeMS dataset analysis.

Changes:

  • FLASHDeconvWizard (GUI) is added!
  • FLASHIda support (-in_log option)
  • We no longer recommend profile mode spectra. Peak-picked spectra (produced by MSConvert with vendor-provided peak picking) are recommended as inputs.
  • merging_method option is introduced to merge or average MS2 spectra.
  • use_ensemble_spectrum option has been removed (replaced by -merging_method).
  • target_mass option is added to perform targeted deconvolution (deconvolution quality control is relaxed for target masses). A target_sequence or proteoform option will be added soon.
  • min_precursor_snr option is introduced that (currently) only affects msalign and feature files for TopPIC.
  • out_topFD_feature option is introduced that outputs a feature file for TopPIC. With this feature file as input, TopPIC does not need the -x option.
  • A quality score (QScore) is added for each deconvolved mass in the spectral deconvolution results. QScore is the probability that a mass is identified, learned by logistic regression (related publication will be added here). Note that it is the probability that the mass is “identified,” not that it is “correct.”
  • Both MS1 and MS2 deconvolution have been extensively improved (tested by proteoform ID sensitivity, coupled with TopPIC).
  • Works well for both centroid and profile spectra. In particular for MS2, centroid spectra should be used.
  • Support negative charges (set by -Algorithm:min_charge and -Algorithm:max_charge parameters; see below).
  • Parameter set is redefined (see below).
  • Batch execution is not supported for FLASHDeconv binary. Separate batch files will be prepared soon.
  • Deconvolved spectra may be output in mzml format (-out_mzml [mzml file]).
  • Deconvolved MS1 spectra may be output in Promex format (-out_promex [ms1ft file]).
  • Deconvolved MS1/2 spectra may be output in TopFD format (-out_topFD [msalign file per MS level]).
  • Deconvolved MS1/2 features may be output in TopFD format (-out_topFD_feature [feature file per MS level]).
  • Harmonic artifact elimination in the mass dimension effectively reduces false positives while keeping true positives.

Under development

  • Proforma 2.0 support (-target_seq option)
  • Deep learning based deconvolution quality measure
  • QScore training interface
  • Parameter set for different protocols (e.g., Native-MS, HighRes TDP, …)
  • Merge into OpenMS 3.0

Installation

FLASHDeconv installation files (OpenMS-2.x.0-HEAD-, for Windows *.exe, for Mac *.dmg, and for Linux *.deb) and source code (-src.tar.gz) are available here. For the latest version, go to the bottom of the page and select the most recent installation file.

Parameters

FLASHDeconv basic parameters are listed by simply running FLASHDeconv. Only -in and -out are mandatory. FLASHDeconv advanced parameters are listed by running FLASHDeconv --helphelp. FLASHDeconv parameters fall into three categories: FLASHDeconv tool parameters, FLASHDeconv algorithm parameters, and FeatureTracing algorithm parameters. The basic parameters in each category are described first, followed by the advanced ones.

Basic tool parameters:

  • in: input file (only *.mzML files are currently accepted).
  • in_log: Log file generated by FLASHIda (IDA*.log). Only needed for coupling with FLASHIda acquisition (valid formats: ‘log’)
  • out: *.tsv file for feature level deconvolution results.
  • out_spec: *.tsv files for spectrum level deconvolution results. Files should be specified per MS level.
  • out_mzml: *.mzML file for MS1 and MS2 deconvolved spectra.
  • out_promex: *.ms1ft (promex output format) file. Only MS1 deconvolved masses are written.
  • out_topFD: *.msalign (TopFD output format) files. Files should be specified per MS level.
  • out_topFD_feature: *.feature (TopFD feature output format) files. Files should be specified per MS level.
  • min_precursor_snr: minimum precursor SNR (default 1.0)
  • mzml_mass_charge: specifies the charge of deconvolved masses (-1, 0, or +1) in mzML output.
  • preceding_MS1_count: specifies how many preceding MS1 spectra are searched for the precursor mass of a given MS2 spectrum. In top-down proteomics, some precursor peaks in MS2 are not part of the deconvolved masses in the MS1 immediately preceding the MS2. In such cases, increasing this parameter extends the search to earlier MS1 spectra and helps determine exact precursor masses.
  • write_detail: writes more detailed peak information (in the spectrum level deconvolution *.tsv files).
  • merging_method: spectral merging method to use. 0: no merging (default). 1: average Gaussian method, which performs moving Gaussian averaging of spectra per MS level; effective for increasing proteoform ID sensitivity (in particular for Q-TOF datasets). 2: block method, which merges all spectra into a single one per MS level (e.g., for NativeMS datasets).
  • target_mass: target monoisotopic masses for deconvolution, or a txt file containing target masses. Masses are separated by commas. For instance, 100.0,200.0 will target masses of 100.0 and 200.0 Da. A plain text file containing the same target mass information may be used instead. For each target mass, FLASHDeconv attempts to find the mass in the input spectrum file. If spectral peaks corresponding to the target mass are found, the target mass will be reported regardless of its quality (e.g., IsotopeCosine score).
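
As a rough illustration of the target_mass option, the sketch below writes the two example masses to a plain text file and builds the two equivalent invocations. The file name target_masses.txt, the input and output file names, and the trailing newline in the file are assumptions made for the example. The commands are only printed (dry run), so FLASHDeconv itself is not needed to try it.

```shell
# Hypothetical file names, chosen for illustration only.
in=infile.mzML
out=outfilefeature.tsv

# Plain text file with comma-separated target masses.
printf '100.0,200.0\n' > target_masses.txt

# Variant 1: target masses given directly on the command line.
cmd_inline="./FLASHDeconv -in $in -out $out -target_mass 100.0,200.0"

# Variant 2: the same masses read from the text file.
cmd_file="./FLASHDeconv -in $in -out $out -target_mass target_masses.txt"

# Dry run: print the commands instead of executing them.
echo "$cmd_inline"
echo "$cmd_file"
```

To execute one of the variants, paste the printed line into a shell opened in the [OpenMS path]/bin directory.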

Basic algorithm parameters (with prefix Algorithm: )

  • Algorithm:tol: tolerance for each MS level in ppm. For example, 10.0 15.0 specifies 10 ppm and 15 ppm for MS1 and MS2, respectively.
  • Algorithm:min_mass: minimum deconvolved mass.
  • Algorithm:max_mass: maximum deconvolved mass.
  • Algorithm:min_charge: minimum charge of MS1 peaks. This can be set negative for negative mode MS runs (as in RNA sequencing). For MS2, minimum charge is set to 1.
  • Algorithm:max_charge: maximum charge of MS1 peaks. This can be set negative for negative mode MS runs (as in RNA sequencing). For MS2, maximum charge is set to its precursor charge.
  • Algorithm:min_isotope_cosine: Cosine threshold between avg. and observed isotope pattern for MS1, 2, …
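
To get a feeling for what a ppm tolerance means in absolute terms, the sketch below converts a ppm value into an absolute width at a given value and then builds a command that sets one tolerance per MS level. The helper name ppm_to_abs, the file names, and the mass range values are assumptions for illustration only. The arithmetic is the generic ppm definition (value × ppm / 1,000,000) and says nothing about how FLASHDeconv applies the tolerance internally.

```shell
# Absolute width of a ppm tolerance at a given value (Da or Th).
# $1 = value, $2 = tolerance in ppm
ppm_to_abs() {
    awk -v v="$1" -v p="$2" 'BEGIN { printf "%.4f\n", v * p / 1000000 }'
}

ms1_width=$(ppm_to_abs 1000 10)   # 10 ppm at 1000
ms2_width=$(ppm_to_abs 1000 15)   # 15 ppm at 1000
echo "MS1: $ms1_width  MS2: $ms2_width"

# One tolerance per MS level (MS1 first, then MS2); the mass range is illustrative.
cmd="./FLASHDeconv -in infile.mzML -out outfilefeature.tsv \
-Algorithm:tol 10.0 15.0 -Algorithm:min_mass 500 -Algorithm:max_mass 50000"

# Dry run: print the command instead of executing it.
echo "$cmd"
```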

Basic FeatureTracing parameters (with prefix FeatureTracing: )

  • FeatureTracing:mass_error: mass tolerance for feature tracing. Default mass tolerance unit is ppm.
  • FeatureTracing:mass_error_unit: mass tolerance unit for feature tracing. ppm (default) or da.
  • FeatureTracing:min_sample_rate: minimum fraction of scans along the feature trace that must contain a peak. To raise feature detection sensitivity, lower this value close to 0.
  • FeatureTracing:min_trace_length: minimum expected length of a feature in seconds.
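
The min_sample_rate parameter can be pictured as the fraction of scans along a trace in which a peak was observed. The sketch below computes that fraction for a made-up trace and then builds a command that sets the basic FeatureTracing parameters. The helper sample_rate, the 0/1 encoding of the trace, the file names, and all parameter values are assumptions for illustration, not recommendations and not a description of the internal implementation.

```shell
# Fraction of scans in a trace that contain a peak (1 = peak seen, 0 = missing).
sample_rate() {
    echo "$1" | awk '{ n = 0; for (i = 1; i <= NF; i++) n += $i; printf "%.2f\n", n / NF }'
}

# Made-up trace spanning 10 scans, 7 of which contain a peak.
rate=$(sample_rate "1 1 0 1 0 1 1 1 0 1")
echo "sample rate: $rate"

# Illustrative values only.
cmd="./FLASHDeconv -in infile.mzML -out outfilefeature.tsv \
-FeatureTracing:mass_error 10.0 -FeatureTracing:mass_error_unit ppm \
-FeatureTracing:min_sample_rate 0.05 -FeatureTracing:min_trace_length 10"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

A trace like the one above would pass any min_sample_rate threshold up to 0.70 under this reading of the parameter.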

Advanced tool parameters:

  • max_MS_level: specifies the maximum MS level.
  • use_RNA_averagine: if set to 1, the RNA averagine model is used instead of the protein model.

Advanced algorithm parameters (with prefix Algorithm: )

  • Algorithm:min_mz: minimum m/z value in Th.
  • Algorithm:max_mz: maximum m/z value in Th.
  • Algorithm:min_rt: minimum retention time in seconds.
  • Algorithm:max_rt: maximum retention time in seconds.
  • Algorithm:min_peaks: minimum number of peaks of consecutive charge states per MS level (e.g., -Algorithm:min_peaks 4 2 to specify 4 and 2 for MS1 and MS2, respectively). This only affects highly charged peaks (>8). Peaks of low charge are detected based on the m/z distance between isotopes.
  • Algorithm:min_mass_count: minimum number of deconvolved masses per spectrum. Only used for real time deconvolution.
  • Algorithm:min_intensity: minimum peak intensity to consider. Default is 100 to remove extremely low intensity peaks (e.g., in Bruker spectra).
  • Algorithm:rt_window: retention time window for MS1 deconvolution.

Advanced FeatureTracing parameters (with prefix FeatureTracing: )

  • FeatureTracing:quant_method: Method of quantification for mass traces. For LC data ‘area’ is recommended, ‘median’ for direct injection data. ‘max_height’ simply uses the most intense peak in the trace.
  • FeatureTracing:max_trace_length: maximum expected length of a feature in seconds.
  • FeatureTracing:min_isotope_cosine: Cosine threshold between avg. and observed isotope pattern for mass features. If not set, it is controlled by the -Algorithm:min_isotope_cosine option.

Running FLASHDeconv with GUI

The GUI command is found under the [OpenMS path]/bin directory. From the bin directory, type

./FLASHDeconvWizard

The FLASHDeconvWizard window then opens.

From the “LC-MS files” menu you can select (possibly multiple) mzML files to analyze. The selected files are analyzed with the same parameter set.

Then if you go to the “Run FLASHDeconv” menu, you can control all the parameters and output options.

The default output folder is [home directory]/FLASHDeconvOut. You may change this with the Browse button on the right side. Below it are four output toggle buttons.

If “masses per spectrum” is selected (selected by default), spectrum level deconvolution results (per MS level) are generated in tsv format. On the command line, this is controlled with the -out_spec option, and users must specify a file name per MS level. In the GUI, simply activating the “masses per spectrum” button sets the output spectrum file name per MS level automatically.

If “mzML” is selected, the deconvolved spectra are generated in mzML format.

“Promex (.ms1ft)” triggers the Promex format output generation (only for MS1), and “TopFD (.msalign,*.feature)” triggers the TopFD format output generation (both msalign and feature formats). Again, these buttons override the -out_promex, -out_topFD, and -out_topFD_feature options of the command line and set the output file names automatically.

The box below the toggle buttons controls the parameters. By default it shows only the basic parameters. If the “Show advanced parameters” toggle button is activated, the advanced parameters appear as well.

Lastly, the “Log” menu shows the log from FLASHDeconv, which can be checked during or after a FLASHDeconv run. The command line corresponding to the current parameter selection in the GUI also appears here for reference and future use.

Running FLASHDeconv on command line

The FLASHDeconv executable can be found under the [OpenMS path]/bin directory.

The mandatory options are -in and -out. FLASHDeconv 2.0 only takes mzML files as input. Basic parameters can be adjusted by the user according to the instrumental setup. To convert raw files into input mzML files, we recommend using MSConvert with vendor-provided peak picking methods.
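
A conversion along these lines might look like the sketch below. It assumes ProteoWizard's msconvert is on the PATH with the vendor libraries available (typically on Windows), and that the peakPicking filter syntax shown matches the installed version. Check msconvert --help before relying on it. The file and directory names are hypothetical, and the command is only printed (dry run).

```shell
# Hypothetical input file and output directory.
raw=/User/me/data/infile.raw
out_dir=/User/me/data

# Vendor peak picking on all MS levels (1 and above), written as mzML.
cmd="msconvert $raw --mzML --filter \"peakPicking vendor msLevel=1-\" -o $out_dir"

# Dry run: print the command instead of executing it.
echo "$cmd"
```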

For example, to deconvolve /User/me/data/infile.mzml and write the result to /User/me/out/outfilefeature.tsv, run FLASHDeconv as follows from the directory where FLASHDeconv is installed.

./FLASHDeconv -in /User/me/data/infile.mzml -out /User/me/out/outfilefeature.tsv
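
Since the FLASHDeconv binary handles one input file per run, several files can be processed with a small shell loop. The following is a minimal sketch, not an official batch script. The function name flashdeconv_batch and the _feature.tsv naming scheme are assumptions, and it only matches files ending in .mzML (the glob is case sensitive). It prints one command per file (dry run).

```shell
# Print one FLASHDeconv command per mzML file in a directory (dry run).
# $1 = directory with *.mzML files, $2 = output directory
flashdeconv_batch() {
    in_dir=$1
    out_dir=$2
    for f in "$in_dir"/*.mzML; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        base=$(basename "$f" .mzML)        # file name without directory and extension
        echo "./FLASHDeconv -in $f -out $out_dir/${base}_feature.tsv"
    done
}

flashdeconv_batch /User/me/data /User/me/out
```

Piping the output to sh (flashdeconv_batch /User/me/data /User/me/out | sh) would execute the printed commands one after another, provided the paths contain no spaces.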

Output files

  • Deconvolved feature file (*.tsv) specified by -out
  • (optional) Deconvolved MSn spectra files (*.tsv) specified by -out_spec
  • (optional) Deconvolved mzML spectra file (*.mzML) specified by -out_mzml
  • (optional) Deconvolved MS1 in Promex output format (*.ms1ft) specified by -out_promex
  • (optional) Deconvolved MSn spectra files in TopFD output format (*.msalign) specified by -out_topFD
  • (optional) Deconvolved MSn feature files in TopFD output format (*.feature) specified by -out_topFD_feature
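
A single run requesting every output type could be put together as sketched below. It assumes that per-MS-level files are given as a space-separated list in MS-level order (MS1 first, then MS2), in the same way as the per-level values of -Algorithm:tol. All file names are hypothetical, and the command is only printed (dry run).

```shell
# Hypothetical input file and output directory.
in=/User/me/data/infile.mzML
out=/User/me/out

# One file per MS level for -out_spec, -out_topFD and -out_topFD_feature.
cmd="./FLASHDeconv -in $in -out $out/infile_feature.tsv \
-out_spec $out/infile_ms1.tsv $out/infile_ms2.tsv \
-out_mzml $out/infile_deconv.mzML \
-out_promex $out/infile.ms1ft \
-out_topFD $out/infile_ms1.msalign $out/infile_ms2.msalign \
-out_topFD_feature $out/infile_ms1.feature $out/infile_ms2.feature"

# Dry run: print the command instead of executing it.
echo "$cmd"
```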

Example datasets

Mass spectrometry datasets (*.raw and *.mzML) and the corresponding results have been uploaded to MassIVE (https://massive.ucsd.edu) and are available under accession number MSV000084001.