OpenPepXL: a fast and versatile XL-MS identification tool
OpenPepXL is a new protein-protein cross-link identification tool implemented in C++ as a TOPP tool. It can use labeled linkers to denoise spectra by comparing the spectra that contain the light and heavy linkers. The tool works with CID and HCD spectra. It takes advantage of high-resolution instruments by deisotoping fragment peaks and by considering their charge when matching them to theoretical peaks. Efficient data structures and algorithms keep runtime and memory consumption low.
Although no heuristics are used to reduce the search space, its memory usage remains reasonable, and for many applications it can be used effectively on a desktop computer with 16 GB of memory. It is applicable to all labeled and label-free cross-linkers, including long linkers such as DSS and zero-length linkers such as EDC.
OpenPepXL supports the mzIdentML 1.2 format, and TOPPView can visualize matched peaks based on this standard.
The tool settings can be configured with the INIFileEditor, a GUI for visually editing TOPP tool INI files. The input files should be in mzML format with centroided MS and MS/MS spectra (centroided during acquisition, during conversion, or in a workflow step using the TOPP tool PeakPickerHiRes). For label-free cross-linkers, the executable OpenPepXLLF must be used. To make use of labeled linkers, the executable OpenPepXL must be used, and a consensusXML file produced by the TOPP tool FeatureFinderMultiplex is needed for each mzML file.
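The order of the steps described above can be sketched as a small Bash script. This is a minimal sketch and is not part of OpenMS. It assumes one prepared INI file per tool (named picker.ini, featurefinder.ini and openpepxl.ini, as in the command examples in the manual below). It also assumes that the search database and all remaining settings are stored in those INI files, and that the input file name is hypothetical. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Minimal sketch of the OpenPepXL workflow for one mzML file.
# Assumptions: the INI files exist and hold all other settings
# (database, tolerances, cross-linker); file names are hypothetical.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: xl_workflow <raw.mzML> <labeled: yes|no>
xl_workflow() {
  local raw=$1 labeled=$2
  local base=${raw%.mzML}
  local centroided="${base}_centroided.mzML"

  # Centroid MS and MS/MS spectra (not needed if they are already centroided).
  run PeakPickerHiRes -ini picker.ini -in "$raw" -out "$centroided"

  if [ "$labeled" = yes ]; then
    # Labeled linker: find light/heavy feature pairs, then search paired spectra.
    run FeatureFinderMultiplex -ini featurefinder.ini -in "$centroided" -out "${base}.consensusXML"
    run OpenPepXL -ini openpepxl.ini -in "$centroided" -consensus "${base}.consensusXML" -out_mzid "${base}.mzid"
  else
    # Label-free linker: no consensusXML; all MS/MS spectra are searched.
    run OpenPepXLLF -ini openpepxl.ini -in "$centroided" -out_mzid "${base}.mzid"
  fi
}

xl_workflow sample.mzML yes
```

The dry-run default makes it possible to check the generated command lines before starting a long search.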
OpenPepXL output is compatible with several tools for further processing, visualization and publishing of XL-MS data:
OpenPepXL supports the output format of xQuest (Rinner et al., 2008). This means the output is compatible with any post-processing and visualization tools developed for the xQuest pipeline, such as xProphet (Leitner et al., 2014) for FDR estimation and xTract (Walzthoeni et al., 2015) for quantification, as well as the UCSF Chimera plug-in Xlink Analyzer (Kosinski et al., 2015) for visualizing and analyzing cross-links on structures.
Output of cross-link identification data in mzIdentML 1.2 format (Vizcaíno et al., 2017) will allow complete submissions of cross-linking MS data to the PRIDE database and ProteomeXchange (Ternent et al., 2014).
Visualizing spectra and matched peaks with TOPPView:
First, open a spectrum in TOPPView. Then go to “Tools” -> “Annotate with identification” and select the *.mzid file produced by OpenPepXL.
You can select an identified cross-link in the table on the right side of the TOPPView window to visualize it.
TOPPView allows you to zoom in and out freely. Peak annotations are read from the *.mzid file but can be edited, moved, added or removed, e.g. to prepare clean images for publication (only the sequence coverage visualization in the top right corner is static).
Manual for setting up OpenPepXL using a user interface and running it on the command line
Introduction to TOPP INI files
All TOPP tools of the OpenPepXL workflow share a common framework: they are set up with INI files and run on the command line. To generate a tool-specific INI file with default settings, call the tool executable with the parameter “-write_ini filename.ini”.
Example: PeakPickerHiRes -write_ini picker.ini
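To prepare default INI files for every tool of the workflow at once, a loop over the tool names is enough. In this sketch the INI file names follow the command examples in this manual, which is a naming convention assumed here. For label-free linkers, OpenPepXLLF would replace OpenPepXL. The commands are printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: write default INI files for all tools of the workflow in one go.
# The INI file names follow the examples in this manual (an assumption);
# for label-free linkers replace OpenPepXL with OpenPepXLLF.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

write_default_inis() {
  local pair tool ini
  for pair in PeakPickerHiRes:picker.ini \
              FeatureFinderMultiplex:featurefinder.ini \
              OpenPepXL:openpepxl.ini; do
    tool=${pair%%:*}   # text before the colon
    ini=${pair#*:}     # text after the colon
    run "$tool" -write_ini "$ini"
  done
}

write_default_inis
```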
To edit the settings, start the TOPP tool INIFileEditor and open the *.ini file in it. The INIFileEditor shows a description of each parameter at the bottom of the window and helps to fill out many parameters. For example, it provides a file browser to select input and output files and lists the allowed values of parameters that have a fixed set of choices.
To run a tool using the edited INI file, call the tool executable with the parameter “-ini filename.ini”.
Example: PeakPickerHiRes -ini path/to/picker.ini
One INI file can be reused for several runs with different parameters (e.g. another input and output file) by passing additional parameters on the command line. These take precedence over the values written in the INI file:
PeakPickerHiRes -ini path/to/picker.ini -in input_file_01.mzML -out output_file_01.mzML
PeakPickerHiRes -ini path/to/picker.ini -in input_file_02.mzML -out output_file_02.mzML
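For more than a handful of files, the same override mechanism can be wrapped in a loop. This sketch only assumes that the input files end in .mzML. The _centroided suffix of the output names is a hypothetical naming convention. The commands are printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: centroid many files with one INI file; -in and -out given on the
# command line override the values stored in picker.ini.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

centroid_all() {
  local f
  for f in "$@"; do
    run PeakPickerHiRes -ini path/to/picker.ini -in "$f" -out "${f%.mzML}_centroided.mzML"
  done
}

# With real data, e.g.: centroid_all raw/*.mzML
centroid_all input_file_01.mzML input_file_02.mzML
```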
The INI files can also be edited using a normal text editor when opening a GUI is not possible, e.g. when working on a remote server.
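On a server without a GUI, single values can also be changed with a stream editor instead of an interactive editor. The sketch below makes two assumptions. First, it requires GNU sed, which is standard on Linux. Second, it assumes each parameter is written as an ITEM element with the attributes name and value in that order, e.g. <ITEM name="threads" value="1" .../>. The helper name set_ini_value is hypothetical. Check the edited file afterwards, because a parameter name that occurs in more than one section would be changed everywhere.

```shell
#!/usr/bin/env bash
# Sketch: set the value of one parameter in a TOPP INI file with GNU sed.
# Assumptions: the parameter appears as <ITEM name="NAME" value="..." ...>,
# NAME is unique in the file, and VALUE contains no '|', '&' or '\'.
set_ini_value() {
  local ini=$1 name=$2 value=$3
  sed -i "s|\(<ITEM name=\"$name\" value=\"\)[^\"]*\"|\1$value\"|" "$ini"
}

# Usage, e.g. on an INI file written with -write_ini:
#   set_ini_value picker.ini threads 4
#   set_ini_value picker.ini in input_file_01.mzML
```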
The examples assume that OpenMS with OpenPepXL is installed on a Linux computer and that the binaries are in the PATH. On Windows, the tool names end with .exe; otherwise everything should work the same way.
Setting up PeakPickerHiRes (centroiding MS and MS/MS spectra):
Open the generated *.ini file with the INIFileEditor. Choose your input and output files and the MS levels that are not centroided in the input file. For most applications the other settings can be left at their defaults.
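To find out which MS levels still need centroiding, you can count the spectrum annotations in the mzML file. In mzML, each spectrum carries the controlled-vocabulary term MS:1000511 (ms level). It also carries either MS:1000127 (centroid spectrum) or MS:1000128 (profile spectrum). The awk sketch below is not a full XML parser. It assumes an uncompressed mzML file with one cvParam per line, as written by common converters. The function name spectrum_types is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count profile and centroid spectra per MS level in an mzML file.
# Assumes one <cvParam> per line; prints lines like "MS1 profile: 2".
spectrum_types() {
  awk '
    /<spectrum[ >]/          { lvl = ""; rep = "unknown" }      # new spectrum
    /accession="MS:1000511"/ {                                   # ms level
      if (match($0, /value="[0-9]+"/)) lvl = substr($0, RSTART + 7, RLENGTH - 8)
    }
    /accession="MS:1000127"/ { rep = "centroid" }
    /accession="MS:1000128"/ { rep = "profile" }
    /<\/spectrum>/           { n["MS" lvl " " rep]++ }           # spectrum done
    END { for (k in n) print k ": " n[k] }
  ' "$1" | sort
}

# Usage: spectrum_types run01.mzML
```

Any MS level reported as "profile" is a candidate for the MS levels setting of PeakPickerHiRes.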
Run the tool using the command line: PeakPickerHiRes -ini picker.ini
Setting up FeatureFinderMultiplex (finding feature pairs on the MS level):
This step is only needed for cross-linking experiments done with a mixture of light and heavy labeled cross-linkers.
Open the generated *.ini file with the INIFileEditor. Choose your input file and a path and filename for the output in .consensusXML format (parameter -out).
Open the advanced settings and change the mass of any of the labels in the “Labels” category to the mass difference between the light and heavy linkers you are using. In the labels field of the “Algorithm” category, choose an empty label (representing the light linker) and the label you edited. If you edited the mass of the label Arg6, the value of the “labels” parameter should look like this: [][Arg6]
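Some heavy linkers differ from their light form only by deuterium atoms. For such linkers, the mass difference is the number of deuterium atoms times the mass difference between 2H and 1H. The sketch below assumes this kind of linker and uses DSS-d0/DSS-d12 (12 deuterium atoms) as an example. Linkers labeled with other isotopes (e.g. 13C) would need the corresponding isotope masses instead. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: light/heavy mass difference of a deuterium-labeled linker.
# Assumption: heavy and light linker differ only in 1H -> 2H substitutions.
label_mass_shift() {
  awk -v n="$1" 'BEGIN {
    h1 = 1.00782503207   # monoisotopic mass of 1H (Da)
    h2 = 2.01410177812   # monoisotopic mass of 2H (Da)
    printf "%.6f\n", n * (h2 - h1)
  }'
}

label_mass_shift 12   # DSS-d0/d12: prints 12.075321
```

The printed value is what would be entered as the mass of the edited label.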
Set the charge parameter to 3:7, and set mz_tolerance and mz_unit to the precursor tolerance of your MS instrument.
Run the tool using the command line: FeatureFinderMultiplex -ini featurefinder.ini
Setting up OpenPepXL / OpenPepXLLF (searching for cross-linked peptides in paired MS/MS spectra or all spectra):
First choose the tool that you need. If you have data from an experiment done with a mixture of light and heavy labeled cross-linkers, you need OpenPepXL. For label-free linkers you need OpenPepXLLF. Generate an INI file for one of these tools.
Open the generated *.ini file with the INIFileEditor. Choose your mzML input file (-in), your consensusXML file (-consensus, OpenPepXL only) and output files in one or more of the three supported formats (-out_xquestxml, -out_idxml, -out_mzid). If you do not specify any output file, the tool will still run to the end, but will not write any results! The parameter -threads sets the number of CPU cores the tool uses. You can either process each file on multiple cores or run several instances of the tool in parallel with different parameters or input files.
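Running several searches at once can be sketched with shell background jobs. This sketch makes several assumptions. The file names are hypothetical, openpepxl.ini is assumed to contain all other settings, and each search gets a share of the cores through -threads. Every process needs its own memory, so total memory use grows with the number of parallel runs. The example uses OpenPepXLLF; a labeled search with OpenPepXL would also need -consensus. Commands are printed unless DRY_RUN=0 is set, and their order may vary between runs.

```shell
#!/usr/bin/env bash
# Sketch: run one label-free search per input file in parallel,
# each limited to a fixed number of threads.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: search_parallel <threads per run> <file.mzML>...
search_parallel() {
  local threads=$1 f
  shift
  for f in "$@"; do
    run OpenPepXLLF -ini openpepxl.ini -in "$f" -out_mzid "${f%.mzML}.mzid" -threads "$threads" &
  done
  wait   # return only after all background searches have finished
}

# e.g. 3 files x 4 threads = 12 cores in use
search_parallel 4 run1.mzML run2.mzML run3.mzML
```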
Adapt the precursor and fragment tolerances to your MS instrument and add the fixed and variable modifications that you expect in your samples (apart from the cross-linker).
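Instrument precision is usually quoted in ppm, and it can help to see what a relative tolerance means in absolute terms. A tolerance of p ppm at m/z x corresponds to an absolute deviation of x · p / 10^6. The small helper below does this arithmetic. Its name and the example values are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: convert a relative tolerance (ppm) into an absolute one at a given m/z.
ppm_to_abs() {
  awk -v ppm="$1" -v mz="$2" 'BEGIN { printf "%.4f\n", mz * ppm / 1e6 }'
}

ppm_to_abs 10 800    # prints 0.0080
ppm_to_abs 20 1500   # prints 0.0300
```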
In the cross-linker category you can define your cross-linker. The default settings are for DSS and should also be correct for BS3 (just change the name parameter to “BS3” to write the correct linker name into the output file). The residue1 and residue2 parameters accept lists of residues, so that you can define any heterobifunctional cross-linker.
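The cross-linker mass enters the precursor mass of every candidate pair. Consider two peptides joined by one linker. Their neutral mass is M = M1 + M2 + m_linker, and the expected m/z at charge z is (M + z · m_proton) / z. The sketch below uses this formula to check a precursor by hand. The peptide masses are made up, and 138.0680796 Da is the monoisotopic mass added by a DSS/BS3 cross-link (C8H10O2). This is an illustration of the arithmetic, not OpenPepXL's candidate generation, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: expected precursor m/z of two peptides joined by one cross-linker.
# Arguments: peptide-1 mass, peptide-2 mass, linker mass (all monoisotopic,
# neutral, in Da) and precursor charge z.
xl_precursor_mz() {
  awk -v m1="$1" -v m2="$2" -v linker="$3" -v z="$4" 'BEGIN {
    proton = 1.00727646688       # proton mass (Da)
    m = m1 + m2 + linker         # neutral mass of the cross-linked pair
    printf "%.4f\n", (m + z * proton) / z
  }'
}

# Hypothetical peptide masses, DSS linker, charge 3+
xl_precursor_mz 1000.5 1200.6 138.0680796 3   # prints 780.7300
```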
Run the tool using the command line:
Labeled linkers: OpenPepXL -ini openpepxl.ini
Label-free linkers: OpenPepXLLF -ini openpepxl.ini
OpenPepXL is now integrated into the OpenMS GitHub repository. It is not yet included in the precompiled binaries and installers, but it can be installed together with the OpenMS library by following the build instructions for your operating system in the OpenMS documentation.
If you have any questions or suggestions, please visit the support page or write an email to firstname.lastname@example.org.
Rinner O, Seebacher J, Walzthoeni T, Mueller L, Beck M, Schmidt A, Mueller M, Aebersold R (2008) Identification of cross-linked peptides from large sequence databases. Nature Methods.
Leitner A, Walzthoeni T, Aebersold R (2014) Lysine-specific chemical cross-linking of protein complexes and identification of cross-linking sites using LC-MS/MS and the xQuest/xProphet software pipeline. Nature Protocols.
Walzthoeni T, Joachimiak LA, Rosenberger G, Röst HL, Malmström L, Leitner A, Frydman J, Aebersold R (2015) xTract: software for characterizing conformational changes of protein complexes by quantitative cross-linking mass spectrometry. Nature Methods.
Kosinski J, et al. (2015) Xlink Analyzer: software for analysis and visualization of cross-linking data in the context of three-dimensional structures. Journal of Structural Biology.
Vizcaíno JA, Mayer G, Perkins SR, Barsnes H, Vaudel M, Perez-Riverol Y, … Rappsilber J (2017) The mzIdentML data standard version 1.2, supporting advances in proteome informatics. Molecular & Cellular Proteomics, mcp-M117.
Ternent T, Csordas A, Qi D, Gómez-Baena G, Beynon RJ, Jones AR, Hermjakob H, Vizcaíno JA (2014) How to submit MS proteomics data to ProteomeXchange via the PRIDE database. Proteomics, 14(20), 2233-2241.