Matches tandem mass spectra to nucleic acid sequences.

Given a FASTA file containing RNA sequences (and optionally decoys) and an mzML file from a nucleic acid mass spec experiment:

1. Generate a list of digestion fragments from the FASTA file (based on a specified RNase).
2. Search the mzML input for MS2 spectra with parent masses corresponding to any of these sequence fragments.
3. Match the MS2 spectra to theoretically generated spectra.
4. Score the resulting matches.

Output is in the form of an mzTab-like text file containing the search results. Optionally, an idXML file suitable for visualizing search results in TOPPView (parameter id_out) and a "target coordinates" file for label-free quantification using FeatureFinderMetaboIdent (parameter lfq_out) can be generated.
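
A typical invocation can be assembled from the options documented on this page. The following minimal Python sketch only builds and prints a command line; it does not run the tool. The helper `build_nase_command` and all file names (`sample.mzML`, `rna.fasta`, `results.mzTab`, `results.idXML`) are hypothetical placeholders, and the chosen enzyme and decoy pattern are illustrative values, not recommended settings.

```python
import shlex


def build_nase_command(spectra, database, out, enzyme="RNase_T1",
                       decoy_pattern="DECOY_", id_out=None, lfq_out=None):
    """Assemble a NucleicAcidSearchEngine call as an argument list.

    Only flags listed in the tool's help output are used; the values
    passed in are up to the caller.
    """
    cmd = [
        "NucleicAcidSearchEngine",
        "-in", spectra,                      # mandatory: mzML spectra
        "-database", database,               # FASTA sequences (targets and decoys)
        "-out", out,                         # mandatory: mzTab-like result file
        "-oligo:enzyme", enzyme,
        "-fdr:decoy_pattern", decoy_pattern,
    ]
    if id_out:
        cmd += ["-id_out", id_out]           # optional idXML for TOPPView
    if lfq_out:
        cmd += ["-lfq_out", lfq_out]         # optional targets for FeatureFinderMetaboIdent
    return cmd


cmd = build_nase_command("sample.mzML", "rna.fasta", "results.mzTab",
                         id_out="results.idXML")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where the tool is installed.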

Modified ribonucleotides can either be specified in the FASTA input file (as fixed modifications), or set as variable modifications in the tool options. Information on available modifications is taken from the Modomics database (http://modomics.genesilico.pl/). In addition to these "standard" modifications, OpenMS defines "generic" and "ambiguous" ones:
A generic modification represents a group of modifications that cannot be distinguished by tandem mass spectrometry. For example, "mA" stands for any methyladenosine (could be "m1A", "m2A", "m6A" or "m8A"), "mmA" for any dimethyladenosine (with two methyl groups on the base), and "mAm" for any 2'-O-dimethyladenosine (with one methyl group each on base and ribose). There is no technical difference between searching for "mA" or e.g. "m1A", but the generic code better represents that no statement can be made about the position of the methyl group on the base.
In contrast, an ambiguous modification represents two isobaric modifications (or modification groups) with a methyl group on either the base or the ribose, that could in principle be distinguished based on a-B ions. For example, "mA?" stands for methyladenosine ("mA", see above) or 2'-O-methyladenosine ("Am"). When using ambiguous modifications in a search, NucleicAcidSearchEngine can optionally try to assign the alternative that generates better a-B ion matches in a spectrum (see parameter modifications:resolve_ambiguities).
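
The relationship between generic and ambiguous codes can be illustrated with a small lookup sketch in Python. The tables contain only the examples named above. The function `resolve_ambiguity`, its `ab_matches` argument and the tie-breaking rule are hypothetical, so this should be read as an illustration of the idea and not as the tool's actual resolution logic.

```python
# Examples taken from the description above; both tables are deliberately incomplete.
GENERIC = {"mA": ["m1A", "m2A", "m6A", "m8A"]}   # indistinguishable by MS/MS
AMBIGUOUS = {"mA?": ("mA", "Am")}                # methyl on base vs. on ribose


def resolve_ambiguity(code, ab_matches):
    """Choose the alternative of an ambiguous code supported by more a-B ions.

    ab_matches maps each alternative to a (hypothetical) number of matched
    a-B ions. On a tie the ambiguous code is kept, which is an assumption
    made for this sketch.
    """
    first, second = AMBIGUOUS[code]
    n_first = ab_matches.get(first, 0)
    n_second = ab_matches.get(second, 0)
    if n_first == n_second:
        return code
    return first if n_first > n_second else second


print(resolve_ambiguity("mA?", {"mA": 3, "Am": 1}))
print(resolve_ambiguity("mA?", {"mA": 2, "Am": 2}))
```

Note that a resolved "mA" is still a generic code: it says the methyl group sits on the base, not at which position.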

The command line parameters of this tool are:

NucleicAcidSearchEngine -- Annotate nucleic acid identifications to MS/MS spectra.
Full documentation: http://www.openms.de/doxygen/release/3.0.0/html/UTILS_NucleicAcidSearchEngine.html
Version: 3.0.0 Jul 14 2023, 11:57:33, Revision: be787e9
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  NucleicAcidSearchEngine <options>

Options (mandatory options marked with '*'):
  -in <file>*                                  Input file: spectra (valid formats: 'mzML')
  -database <file>                             Input file: sequence database. Required unless 'digest' is 
                                               set. (valid formats: 'fasta')
  -digest <file>                               Input file: pre-digested sequence database. Can be used
                                               instead of 'database'. Sets all 'oligo:...' parameters.
                                               (valid formats: 'oms')
  -out <file>*                                 Output file: mzTab (valid formats: 'mzTab')
  -id_out <file>                               Output file: idXML (for visualization in TOPPView)
                                               (valid formats: 'idXML')
  -db_out <file>                               Output file: oms (SQLite database) (valid formats: 'oms')
  -digest_out <file>                           Output file: sequence database digest. Ignored if 'digest' 
                                               input is used. (valid formats: 'oms')
  -lfq_out <file>                              Output file: targets for label-free quantification using
                                               FeatureFinderMetaboIdent ('id' input) (valid formats: 'tsv')

Precursor (parent ion) options:
  -precursor:mass_tolerance <tolerance>        Precursor mass tolerance (+/- around uncharged precursor mass)
                                                (default: '10.0')
  -precursor:mass_tolerance_unit <unit>        Unit of precursor mass tolerance (default: 'ppm') (valid:
                                               'Da', 'ppm')
  -precursor:min_charge <num>                  Minimum precursor charge to be considered (default: '-1')
  -precursor:max_charge <num>                  Maximum precursor charge to be considered (default: '-20')
  -precursor:include_unknown_charge            Include MS2 spectra with unknown precursor charge - try to 
                                               match them in any possible charge between 'min_charge' and 
                                               'max_charge', at the risk of a higher error rate
  -precursor:use_avg_mass                      Use average instead of monoisotopic precursor masses
                                               (appropriate for low-resolution instruments)
  -precursor:use_adducts                       Consider possible salt adducts (see
                                               'precursor:potential_adducts') when matching precursor masses
  -precursor:potential_adducts <list>          Adducts considered to explain mass differences. Format:
                                               'Element:Charge(+/-)', i.e. the number of '+' or '-' indicates
                                               the charge, e.g. 'Ca:++' indicates +2. Only used if
                                               'precursor:use_adducts' is set. (default: '[Na:+]')
  -precursor:isotopes <list>                   Correct for mono-isotopic peak misassignments. E.g.: 1 =
                                               precursor may be misassigned to the first isotopic peak.
                                               Ignored if 'use_avg_mass' is set. (default: '[0 1 2 3 4]')

Fragment (Product Ion) Options:
  -fragment:mass_tolerance <tolerance>         Fragment mass tolerance (+/- around fragment m/z) (default: 
                                               '10.0')
  -fragment:mass_tolerance_unit <unit>         Unit of fragment mass tolerance (default: 'ppm') (valid:
                                               'Da', 'ppm')
  -fragment:ions <choice>                      Fragment ions to include in theoretical spectra (default:
                                               '[a-B a b c d w x y z]') (valid: 'a-B', 'a', 'b', 'c', 'd',
                                               'w', 'x', 'y', 'z')

Modification options:
  -modifications:variable <mods>               Variable modifications (valid: 'io6A', 's2U', 'k2C', 'm2Gm',
                                               'Ym', 'f5Cm', 'Qbase', 'ac4Cm', 'imG-14', 'cm5s2U', 'mnm5s2U',
                                               'm227G', 'yW-58', 'I', 'g6A', 'nm5U', 'm7G', 's2Um', 'Y',
                                               'hm5C', 'm5U', 'preQ0', 'o2yW', 'm5Um', 'preQ1', 'm66Am',
                                               'ac6A', 'ms2io6A', 'Am', 'Im', 'mnm5U', 'm22G', 't6A', 'm8A',
                                               'm7GpppN', 'm27GpppN', 'm227GpppN', 'mpppN', 'm28A', 'acp3D',
                                               'acp3Y', 'imG', 'D', 'N', 'C+', 'm27Gm', 'ho5C', 'inm5U',
                                               'inm5Um', 'inm5s2U', 'pppN', 'GpppN', 'CoApN', 'm44C',
                                               'acCoApN', 'mal
                                               ...
                                               , 'dT*')
  -modifications:variable_max_per_oligo <num>  Maximum number of residues carrying a variable modification 
                                               per candidate oligonucleotide (default: '2')
  -modifications:resolve_ambiguities           Attempt to resolve ambiguous modifications (e.g. 'mA?' for 
                                               'mA'/'Am') based on a-B ions.
                                               This incurs a performance cost because two modifications have 
                                               to be considered for each case.
                                               Requires a-B ions to be enabled in parameter 'fragment:ions'.

Oligonucleotide (digestion) options (ignored if 'digest' input is used):
  -oligo:min_size <num>                        Minimum size an oligonucleotide must have after digestion to 
                                               be considered in the search (default: '5')
  -oligo:max_size <num>                        Maximum size an oligonucleotide must have after digestion to 
                                               be considered in the search, leave at 0 for no limit (default:
                                                '0')
  -oligo:missed_cleavages <num>                Number of missed cleavages (default: '1')
  -oligo:enzyme <choice>                       The enzyme used for RNA digestion (default: 'no cleavage')
                                               (valid: 'RNase_A', 'RNase_H', 'mazF', 'no cleavage',
                                               'unspecific cleavage', 'RNase_MC1', 'RNase_U2', 'colicin_E5',
                                               'RNase_T1', 'cusativin')

False Discovery Rate options:
  -fdr:decoy_pattern <string>                  String used as part of the accession to annotate decoy
                                               sequences (e.g. 'DECOY_'). Leave empty to skip the FDR/q-value
                                               calculation.
  -fdr:cutoff <value>                          Cut-off for FDR filtering; search hits with higher q-values
                                               will be removed (default: '1.0') (min: '0.0' max: '1.0')
  -fdr:remove_decoys                           Do not score hits to decoy sequences and remove them when
                                               filtering

                                               
Common UTIL options:
  -ini <file>                                  Use the given TOPP INI file
  -threads <n>                                 Sets the number of threads allowed to be used by the TOPP tool
                                                (default: '1')
  -write_ini <file>                            Writes the default configuration file
  --help                                       Shows options
  --helphelp                                   Shows all options (including advanced)

INI file documentation of this tool:

+NucleicAcidSearchEngine: Annotate nucleic acid identifications to MS/MS spectra.

version: Version of the tool that generated this parameters file. (default: 3.0.0)

++1: Instance '1' section for 'NucleicAcidSearchEngine'

in: Input file: spectra (input file, valid formats: *.mzML)

database: Input file: sequence database. Required unless 'digest' is set. (input file, valid formats: *.fasta)

digest: Input file: pre-digested sequence database. Can be used instead of 'database'. Sets all 'oligo:...' parameters. (input file, valid formats: *.oms)

out: Output file: mzTab (output file, valid formats: *.mzTab)

id_out: Output file: idXML (for visualization in TOPPView) (output file, valid formats: *.idXML)

db_out: Output file: oms (SQLite database) (output file, valid formats: *.oms)

digest_out: Output file: sequence database digest. Ignored if 'digest' input is used. (output file, valid formats: *.oms)

lfq_out: Output file: targets for label-free quantification using FeatureFinderMetaboIdent ('id' input) (output file, valid formats: *.tsv)

theo_ms2_out: Output file: theoretical MS2 spectra for precursor mass matches (output file, valid formats: *.mzML)

exp_ms2_out: Output file: experimental MS2 spectra for precursor mass matches (output file, valid formats: *.mzML)

decharge_ms2: Decharge the MS2 spectra for scoring (default: false) (valid: true, false)

log: Name of log file (created only when specified)

debug: Sets the debug level (default: 0)

threads: Sets the number of threads allowed to be used by the TOPP tool (default: 1)

no_progress: Disables progress logging to command line (default: false) (valid: true, false)

force: Overrides tool-specific checks (default: false) (valid: true, false)

test: Enables the test mode (needed for internal use only) (default: false) (valid: true, false)

+++precursor: Precursor (parent ion) options

mass_tolerance: Precursor mass tolerance (+/- around uncharged precursor mass) (default: 10.0)

mass_tolerance_unit: Unit of precursor mass tolerance (default: ppm) (valid: Da, ppm)

min_charge: Minimum precursor charge to be considered (default: -1)

max_charge: Maximum precursor charge to be considered (default: -20)

include_unknown_charge: Include MS2 spectra with unknown precursor charge - try to match them in any possible charge between 'min_charge' and 'max_charge', at the risk of a higher error rate (default: false) (valid: true, false)

use_avg_mass: Use average instead of monoisotopic precursor masses (appropriate for low-resolution instruments) (default: false) (valid: true, false)

use_adducts: Consider possible salt adducts (see 'precursor:potential_adducts') when matching precursor masses (default: false) (valid: true, false)

potential_adducts: Adducts considered to explain mass differences. Format: 'Element:Charge(+/-)', i.e. the number of '+' or '-' indicates the charge, e.g. 'Ca:++' indicates +2. Only used if 'precursor:use_adducts' is set. (default: [Na:+])

isotopes: Correct for mono-isotopic peak misassignments. E.g.: 1 = precursor may be misassigned to the first isotopic peak. Ignored if 'use_avg_mass' is set. (default: [0, 1, 2, 3, 4])
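
To make the interplay of charge, isotope correction and mass tolerance concrete, here is a minimal Python sketch of a precursor mass check. It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, charge is assumed to arise from gain or loss of protons only (no adducts), the isotope spacing is taken as the 13C-12C mass difference, and ppm tolerances are computed relative to the theoretical mass.

```python
PROTON_MASS = 1.007276466    # Da
ISOTOPE_SPACING = 1.0033548  # Da, 13C - 12C mass difference (assumed spacing)


def neutral_mass(mz, charge):
    """Uncharged mass of an ion that gained (charge > 0) or lost (charge < 0) protons."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return abs(charge) * mz - charge * PROTON_MASS


def within_tolerance(observed, theoretical, tolerance=10.0, unit="ppm"):
    """Compare two masses using an absolute (Da) or relative (ppm) tolerance."""
    delta = abs(observed - theoretical)
    if unit == "ppm":
        return delta <= theoretical * tolerance * 1e-6
    if unit == "Da":
        return delta <= tolerance
    raise ValueError("unit must be 'Da' or 'ppm'")


def precursor_matches(mz, charge, theoretical, isotopes=(0, 1, 2, 3, 4),
                      tolerance=10.0, unit="ppm"):
    """True if the precursor fits the theoretical mass for any isotope offset.

    Offset k assumes the selected peak was the k-th isotopic peak, so
    k isotope spacings are subtracted before the comparison.
    """
    mass = neutral_mass(mz, charge)
    return any(
        within_tolerance(mass - k * ISOTOPE_SPACING, theoretical, tolerance, unit)
        for k in isotopes
    )


# A doubly deprotonated ion observed at m/z 1000:
print(neutral_mass(1000.0, -2))
```

With the default isotope list, a precursor picked on the first isotopic peak is still matched; restricting the list to `(0,)` disables that correction.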

+++fragment: Fragment (Product Ion) Options

mass_tolerance: Fragment mass tolerance (+/- around fragment m/z) (default: 10.0)

mass_tolerance_unit: Unit of fragment mass tolerance (default: ppm) (valid: Da, ppm)

ions: Fragment ions to include in theoretical spectra (default: [a-B, a, b, c, d, w, x, y, z]) (valid: a-B, a, b, c, d, w, x, y, z)
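
The effect of the fragment tolerance settings can be sketched as a simple peak-matching routine in Python. This is a minimal illustration with a hypothetical function name: it only counts theoretical fragment m/z values that have an observed peak inside the tolerance window, and it makes no claim about how the tool actually scores matches.

```python
from bisect import bisect_left


def count_matched_peaks(theoretical_mz, observed_mz, tolerance=10.0, unit="ppm"):
    """Count theoretical m/z values with an observed peak within tolerance."""
    if unit not in ("Da", "ppm"):
        raise ValueError("unit must be 'Da' or 'ppm'")
    observed = sorted(observed_mz)
    matched = 0
    for mz in theoretical_mz:
        # Half-width of the window: relative for ppm, absolute for Da.
        window = mz * tolerance * 1e-6 if unit == "ppm" else tolerance
        # First observed peak at or above the lower window edge.
        i = bisect_left(observed, mz - window)
        if i < len(observed) and observed[i] <= mz + window:
            matched += 1
    return matched


theoretical = [500.0, 700.0, 900.0]
observed = [500.004, 700.02, 1200.0]
print(count_matched_peaks(theoretical, observed))                           # 10 ppm
print(count_matched_peaks(theoretical, observed, tolerance=0.05, unit="Da"))
```

At 10 ppm the window around m/z 500 is only 0.005 wide on each side, which is why a fixed tolerance in Da behaves very differently across the m/z range.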

+++modifications: Modification options

variable: Variable modifications (default: []) (valid: io6A, s2U, k2C, m2Gm, Ym, f5Cm, Qbase, ac4Cm, imG-14, cm5s2U, mnm5s2U, m227G, yW-58, I, g6A, nm5U, m7G, s2Um, Y, hm5C, m5U, preQ0, o2yW, m5Um, preQ1, m66Am, ac6A, ms2io6A, Am, Im, mnm5U, m22G, t6A, m8A, m7GpppN, m27GpppN, m227GpppN, mpppN, m28A, acp3D, acp3Y, imG, D, N, C+, m27Gm, ho5C, inm5U, inm5Um, inm5s2U, pppN, GpppN, CoApN, m44C, acCoApN, malonyl-CoApN, succinyl-CoApN, ppN, NADpN, m6t6A, OHyWy, pG(pN), ncm5s2U, nchm5U, mchm5Um, pN, ges2U, cmnm5ges2U, mnm5ges2U, nm5ges2U, m5D, mmpN, mpN, 5'-OH-N, N2'3'cp, ct6A, hm6A, f6A, cnm5U, mcmo5Um, m5C, ms2ct6A, ht6A, msms2i6A, hm5Cm, pY, pm5C, ps2U, hn6A, pD, pm2G, pm66A, pGm, pm5U, pm22G, pm3U, m5Cm, Ar(p), pac4C, pCm, pppG, pm7G, pm1A, pUm, pAm, pm4Cm, pC2'3'cp, tm5U, pm6A, pyW, pm1G, pA2'3'cp, ps4U, pI, ppG, pm2A, m1G, pmcm5s2U, cmnm5Um, pse2U, pt6A, pf5C, pppA, pG, pG2'3'cp, pms2i6A, pcmo5U, pyW, Xm, pm44C, pi6A, phm5C, pU, pAr(p), pac4Cm, pGp, m1I, pU2'3'cp, mcm5U, Gm, pm4C, ncm5Um, pm5Um, pmnm5U, pm3C, m5s2U, yW, pC, pm6t6A, f5C, pAp, m1Am, m1Im, ApppN, AppppN, ApppppN, m6ApppN, m6AppppN, m6ApppppN, m7GppppN, Gr(p), pm1Am, pm1Gm, pm1Im, pm1acp3Y, pm1I, pm1Y, pk2C, ps2C, pm1acp3Y, pm3Y, nm5s2U, pm5Cm, pmchm5U, pinm5Um, pinm5s2U, pinm5U, pnm5U, pncm5U, pchm5U, pcmnm5U, pcm5U, m2G, pho5U, pmcm5Um, pmcm5U, pmo5U, pm5D, pmimG, phm5Cm, pIm, pYm, pmcmo5Um, m2A, pGr(p), pm28A, pmsms2i6A, pges2U, pms2ct6A, pms2io6A, pms2hn6A, pms2m6A, pms2t6A, ps2Um, pm3Um, pacp3D, pacp3U, pimG-14, pmchm5Um, pnm5ges2U, pnm5se2U, pnm5s2U, pnchm5U, pncm5Um, m4Cm, pncm5s2U, pcm5s2U, pcmnm5Um, pcmnm5ges2U, pcmnm5se2U, pcmnm5s2U, pcnm5U, pf5Cm, pho5C, pm5s2U, m27G, m22Gm, pmnm5ges2U, pmnm5se2U, pmnm5s2U, ptm5s2U, ptm5U, pyW-86, pyW-72, pyW-58, ppreQ1, ppreQ0, m66A, pm8A, pC+, pG+, pct6A, poQ, pgalQ, pgluQ, pht6A, pOHyW, pimG2, gluQ, pmanQ, pOHyWy, pm27Gm, pQ, pOHyWx, pmcmo5U, pimG, pm44Cm, pm2Gm, pm22Gm, ncm5U, pm6Am, pm66Am, pio6A, pac6A, pf6A, pg6A, phm6A, phn6A, cm5U, pm27G, pm227G, pm27Gm, pA, cmnm5s2U, cmo5U, m3Y, m3U, ms2m6A, Um, ms2i6A, m3C, cmnm5se2U, ms2t6A, i6A, m3Um, mcmo5U, mimG, oQ, preQ1base, nm5se2U, m1Gm, ho5U, Q, xG, mcm5s2U, m44Cm, s4U, xC, yW-86, xA, chm5U, mo5U, acp3U, xU, yW-72, mnm5se2U, ms2hn6A, m1acp3Y, mcm5Um, ac4C, m6Am, m1A, Cm, mchm5U, galQ, cmnm5U, m1Y, imG2, m4C, manQ, tm5s2U, s2C, OHyWx, se2U, preQ0base, m6A, OHyW, xX, G+, 3'-p, 5'-p, 5'-p*, 3'-c, mA, mC, mG, mU, sU, mmA, mAm, mCm, mGm, mUm, cmo5U/chm5U, mchm5U/mcmo5U, mchm5Um/mcmo5Um, m6t6A/hn6A, galQ/manQ, mA?, mC?, mG?, mU?, mI?, msU?, mmA?, mmC?, mmG?, mmmG?, ac4C/f5Cm?, acp3U/cmnm5Um?, dA, dC, dG, dU, dT, A*, C*, G*, U*, dA*, dC*, dG*, dU*, dT*)

variable_max_per_oligo: Maximum number of residues carrying a variable modification per candidate oligonucleotide (default: 2)

resolve_ambiguities: Attempt to resolve ambiguous modifications (e.g. 'mA?' for 'mA'/'Am') based on a-B ions.
This incurs a performance cost because two modifications have to be considered for each case.
Requires a-B ions to be enabled in parameter 'fragment:ions'. (default: false) (valid: true, false)

+++oligo: Oligonucleotide (digestion) options (ignored if 'digest' input is used)

min_size: Minimum size an oligonucleotide must have after digestion to be considered in the search (default: 5)

max_size: Maximum size an oligonucleotide must have after digestion to be considered in the search, leave at 0 for no limit (default: 0)

missed_cleavages: Number of missed cleavages (default: 1)

enzyme: The enzyme used for RNA digestion (default: no cleavage) (valid: RNase_A, RNase_H, mazF, no cleavage, unspecific cleavage, RNase_MC1, RNase_U2, colicin_E5, RNase_T1, cusativin)
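
As an illustration of how 'min_size', 'max_size' and 'missed_cleavages' interact, here is a minimal Python sketch of an in-silico digest. It assumes an enzyme that cleaves 3' of every G (the specificity of RNase T1), ignores terminal phosphate chemistry, and uses a hypothetical `digest` function and a made-up sequence. The cleavage rules actually applied by the tool come from its own enzyme definitions and may differ.

```python
def digest(sequence, cleave_after="G", missed_cleavages=1, min_size=5, max_size=0):
    """Return digestion products of `sequence`, cleaving after each listed nucleotide.

    max_size=0 means no upper length limit, mirroring the parameter description.
    """
    # Step 1: split into fully cleaved pieces.
    pieces, start = [], 0
    for i, nucleotide in enumerate(sequence):
        if nucleotide in cleave_after:
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # 3' remainder without a cleavage site

    # Step 2: join up to `missed_cleavages` + 1 adjacent pieces and filter by size.
    oligos = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            oligo = "".join(pieces[i:j + 1])
            if len(oligo) >= min_size and (max_size == 0 or len(oligo) <= max_size):
                oligos.append(oligo)
    return oligos


print(digest("AUGCCAGUUUACG"))
print(digest("AUGCCAGUUUACG", missed_cleavages=0))
```

Complete cleavage yields the pieces AUG, CCAG and UUUACG; with the default minimum size of 5, the two short pieces only enter the search as part of longer products with one missed cleavage.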

+++report: Reporting Options

top_hits: Maximum number of top-scoring hits per spectrum that are reported ('0' for all hits) (default: 1) (min: 0)

+++fdr: False Discovery Rate options

decoy_pattern: String used as part of the accession to annotate decoy sequences (e.g. 'DECOY_'). Leave empty to skip the FDR/q-value calculation.

cutoff: Cut-off for FDR filtering; search hits with higher q-values will be removed (default: 1.0) (min: 0.0, max: 1.0)

remove_decoys: Do not score hits to decoy sequences and remove them when filtering (default: false) (valid: true, false)
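
The FDR options can be illustrated with a minimal target-decoy sketch in Python. It assumes a common convention, FDR = decoys / targets among all hits at or above a score, with q-values taken as the minimum FDR at which a hit would still be accepted. The function names, scores and accessions are hypothetical, and the tool's own FDR calculation may differ in its details.

```python
def is_decoy(accession, decoy_pattern="DECOY_"):
    """A hit counts as a decoy if the pattern occurs in its accession."""
    return decoy_pattern in accession


def q_values(hits):
    """Compute q-values for (score, is_decoy) pairs; higher scores are better.

    Returns the q-values in the same order as the input.
    """
    order = sorted(range(len(hits)), key=lambda i: hits[i][0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for i in order:
        if hits[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR seen at this rank or any lower-scoring rank.
    q = [0.0] * len(hits)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


hits = [(10.0, "RNA1"), (9.0, "RNA2"), (8.0, "DECOY_RNA1"),
        (7.0, "RNA3"), (6.0, "DECOY_RNA2")]
q = q_values([(score, is_decoy(acc)) for score, acc in hits])

# Emulate a cutoff of 0.4 combined with decoy removal.
kept = [acc for (score, acc), qv in zip(hits, q)
        if qv <= 0.4 and not is_decoy(acc)]
print(kept)
```

With the default cutoff of 1.0 no hit is removed by q-value, so filtering only takes effect once a stricter cutoff is set and a decoy pattern is given.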