FLASHDeconv: Ultrafast, high-quality feature deconvolution for top-down proteomics

Feature deconvolution, the determination of intact proteoform masses, is crucial for native and denatured top-down proteomics but currently suffers from long runtimes and frequent artifacts. We present FLASHDeconv, an algorithm based on a simple transformation of mass spectra that turns deconvolution into a search for constant patterns, thus greatly accelerating the process. We show higher deconvolution quality and execution two to three orders of magnitude faster than existing approaches.
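To illustrate how a transformation can turn charge-state series into constant patterns, here is a minimal sketch. It assumes protonated, positive-mode ions and uses the logarithm of the proton-subtracted m/z as the transformation. The function names are hypothetical, and the sketch is an illustration of the idea rather than a description of FLASHDeconv's internals.

```python
import math

PROTON_MASS = 1.007276  # Da; assumes each charge is carried by one proton

def mz_of(mass, charge):
    """m/z of a neutral mass carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge

def log_positions(mass, charges):
    """Log of the proton-subtracted m/z for each charge state."""
    return [math.log(mz_of(mass, z) - PROTON_MASS) for z in charges]

def spacing(positions):
    """Differences between consecutive log positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

charges = range(5, 11)
small = spacing(log_positions(10_000.0, charges))
large = spacing(log_positions(50_000.0, charges))

# Each column holds log(z) - log(z + 1): the spacing depends on the
# charges only, not on the mass, so the pattern is the same for both.
for s, l in zip(small, large):
    print(f"{s:.6f}  {l:.6f}")
```

In this picture, changing the mass only shifts the whole pattern along the log axis, which is why a fixed template can be searched for.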

 

Note:

Changes

  • MS2 spectral deconvolution enabled
  • By default, each run outputs feature deconvolution results (*.tsv) and spectral deconvolution results (*_MSn_spec.tsv)
  • Deconvoluted spectra may be output in mzML format (-mzml_out 1)
  • Deconvoluted MS1 spectra may be output in Promex format (-promex_out 1)
  • Deconvoluted MS2 spectra may be output in TopFD format (-topfd_out 1)
  • A quality score (QScore) is added for each deconvoluted mass in the spectral deconvolution results
  • Option names and output column names have been changed
  • More extensive harmonic artifact elimination
  • Tested with Waters and Bruker mzML files

Installation:

FLASHDeconv installation files (OpenMS-2.4.0-HEAD-, for Windows *.exe, for Mac *.dmg, and for Linux *.deb) and source code (*-src.tar.gz) can be found here.

 

Parameters:

FLASHDeconv's basic parameters are listed by simply running FLASHDeconv. Only -in and -out are mandatory.

  • -in: input file or directory (only *.mzML files are currently accepted)
  • -out: output file prefix or output directory. If a prefix is given, [prefix].tsv and [prefix]_MSn_spec.tsv are generated. Otherwise, [inputfile].tsv and [inputfile]_MSn_spec.tsv are generated.
  • -tol: tolerance for each MS level in ppm (default: 10 ppm for MS1 and 5 ppm for MS2)
  • -min_charge: minimum charge of peaks (default: 1)
  • -max_charge: maximum charge of peaks (default: 100)
  • -min_mass: minimum mass of peaks (default: 50)
  • -max_mass: maximum mass of peaks (default: 100,000)
  • -write_detail: whether to write detailed peak information per deconvoluted mass in the *_MSn_spec.tsv files. If set to 1, all peak information (m/z, intensity, charge, and isotope index) per mass is reported (default: 0)
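Because -tol is given in ppm, the absolute m/z window grows with m/z. A minimal sketch of the arithmetic, assuming a symmetric window around the peak (the helper name is hypothetical): at m/z 1000, the MS1 default of 10 ppm corresponds to ±0.01.

```python
def ppm_window(mz, tol_ppm):
    """Return (low, high) m/z bounds for a symmetric tolerance in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta

# The documented MS1 default of 10 ppm, applied at m/z 1000.
low, high = ppm_window(1000.0, 10)
print(f"{low:.4f} .. {high:.4f}")
```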

 

FLASHDeconv's advanced parameters are listed by running FLASHDeconv with the --helphelp option.

  • -min_isotope_cosine: cosine threshold between avg. and observed isotope pattern for each MS level (default: 0.8 and 0.6 for MS1 and MS2, respectively)
  • -min_charge_cosine: cosine threshold between the per-charge intensities and a fitted Gaussian distribution (applies only to MS1; default: 0.5)
  • -min_peaks: minimum number of supporting peaks for each mass per MS level (default: 3 and 1 for MS1 and MS2, respectively). For MS1, supporting peaks are peaks of distinct charges from the same mass. For MSn, supporting peaks are peaks of distinct charges plus peaks of water addition or NH3 loss.
  • -max_mass_count: maximum mass count per spectrum for each MS level (default: -1 and -1 for MS1 and MS2, respectively, meaning unlimited)
  • -min_intensity: intensity threshold (default: 0)
  • -RT_window: RT window in seconds. Only for MS1. When 0, 15 MS1 spectra will be used (default: 0)
  • -max_MS_level: max MS level (inclusive)
  • -min_RT_span: minimum RT span for features in seconds (default: 1)
  • -promex_out: write the MS1 spectral deconvolution in Promex output format (*_FD.ms1ft will be generated)
  • -topfd_out: write the MS2 spectral deconvolution in TopFD output format (*_FD_ms2.msalign will be generated)
  • -mzml_out: write the spectral deconvolution in mzML output format (*.mzml will be generated)
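The two cosine thresholds above compare an observed intensity vector with a reference one. The sketch below shows a generic cosine similarity and a threshold check. The function names and the intensity values are made up for illustration, and FLASHDeconv's own scoring may differ in detail.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equally long intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def passes(observed, reference, threshold):
    """True when the observed pattern is similar enough to the reference."""
    return cosine(observed, reference) >= threshold

# Made-up isotope intensities, for illustration only.
reference = [0.3, 1.0, 0.9, 0.5, 0.2]
observed = [0.25, 1.0, 0.8, 0.6, 0.1]

print(f"cosine = {cosine(observed, reference):.3f}")
print("passes MS1 default (0.8):", passes(observed, reference, 0.8))
```

Cosine similarity ignores overall scale, so a pattern and a uniformly brighter copy of it score 1.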

 

Running FLASHDeconv:

No GUI is currently available; FLASHDeconv runs only from the command line. The FLASHDeconv executable is located in the [OpenMS path]/bin directory.

The -in and -out options are mandatory. The basic parameters can be adjusted to match the instrument setup. When converting raw files to the input mzML format, we recommend not applying any peak picking method.
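For scripted runs, the command line can be assembled programmatically. This is a minimal sketch: the helper name, the executable location, and the option values are assumptions chosen for illustration, and only option names documented above are used.

```python
import shlex

def build_command(exe, in_path, out_path, **options):
    """Return the argument list for one FLASHDeconv run."""
    cmd = [exe, "-in", in_path, "-out", out_path]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    return cmd

cmd = build_command(
    "./FLASHDeconv",               # hypothetical location of the executable
    "/User/me/data/infile.mzml",
    "/User/me/out/",
    max_charge=50,                 # example values, not recommendations
    max_mass=50000,
)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.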

 

You can specify a file or a directory for -in and -out options.

 

For example, to deconvolute /User/me/data/infile.mzml and write the results to the /User/me/out/ directory, run FLASHDeconv in one of the following ways from the directory where it is installed.

  1. -in [infile] -out [prefix]
    ./FLASHDeconv -in /User/me/data/infile.mzml -out /User/me/out/prefix

    In the /User/me/out/ directory, prefix.tsv (feature deconvolution result) and prefix_MSn_spec.tsv (for each MS level n) will be generated.

  2. -in [infile] -out [outdir]
    ./FLASHDeconv -in /User/me/data/infile.mzml -out /User/me/out/

    In the /User/me/out/ directory, infile.tsv (feature deconvolution result) and infile_MSn_spec.tsv (for each MS level n) will be generated (output filenames are determined by the input filename).

  3. -in [dir] -out [prefix]
    ./FLASHDeconv -in /User/me/data/ -out /User/me/out/prefix

    FLASHDeconv will find all mzML files in /User/me/data/ (recursively) and process them. In the /User/me/out/ directory, prefix.tsv and prefix_MSn_spec.tsv will be generated (all features from all input files are written to these files).

  4. -in [dir] -out [dir]
    ./FLASHDeconv -in /User/me/data/ -out /User/me/out/

    FLASHDeconv will find all mzML files in /User/me/data/ (recursively) and process them. In the /User/me/out/ directory, output files will be generated per input file.
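The naming rules in the four cases above can be summarized in a short sketch. It rests on two assumptions that should be checked against the installed version: -out is treated as a directory when it ends with "/", and the n in _MSn_spec.tsv is replaced by the MS level number. The function name is hypothetical.

```python
from pathlib import PurePosixPath

def expected_outputs(inputs, out, ms_levels=(1, 2)):
    """List the output files expected for the given mzML inputs and -out value."""
    if out.endswith("/"):
        # Output directory: one set of files per input, named after the input.
        stems = [str(PurePosixPath(out) / PurePosixPath(p).stem) for p in inputs]
    else:
        # Output prefix: a single set of files shared by all inputs.
        stems = [out]
    files = []
    for stem in stems:
        files.append(stem + ".tsv")
        files.extend(f"{stem}_MS{n}_spec.tsv" for n in ms_levels)
    return files

one_input = ["/User/me/data/infile.mzml"]
print(expected_outputs(one_input, "/User/me/out/prefix"))  # case 1
print(expected_outputs(one_input, "/User/me/out/"))        # case 2
```

For cases 3 and 4, pass the list of mzML files found under the input directory as `inputs`.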

 

Output files:

  • Deconvoluted feature file (*.tsv)
  • Deconvoluted MSn spectra file (*_MSn_spec.tsv)
  • (optional) Deconvoluted MSn mzml file (*.mzml)
  • (optional) Deconvoluted MS1 in promex output format (*_FD.ms1ft)
  • (optional) Deconvoluted MS2 in topfd output format (*_FD_ms2.msalign)
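The *.tsv results can be inspected with standard tools. The sketch below assumes the files are tab-separated with a header row. Because the column names are not listed here, it reads whatever header is present, and the sample content uses placeholder column names that are not FLASHDeconv's.

```python
import csv
import io

def read_tsv(handle):
    """Return (column names, rows as dicts) from a tab-separated file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows

# Stand-in content with placeholder column names; for a real result use
# open("/User/me/out/prefix.tsv", newline="") instead.
sample = io.StringIO("ColA\tColB\n1\t2\n3\t4\n")
columns, rows = read_tsv(sample)
print(columns, len(rows))
```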

Example datasets:

Mass spectrometry datasets (*.raw and *.mzML) and the corresponding results have been uploaded to MassIVE (https://massive.ucsd.edu) and are available under accession number MSV000084001.