  • Stars: 183
  • Rank: 209,207 (Top 5 %)
  • Language: C#
  • License: Apache License 2.0
  • Created over 6 years ago
  • Updated 5 months ago

Repository Details

Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono

ThermoRawFileParser

A wrapper around the .NET (C#) ThermoFisher ThermoRawFileReader library for running on Linux with Mono (it works on Windows too). It takes a Thermo RAW file as input and outputs a metadata file and the spectra in one of 3 possible formats:

  • MGF
  • mzML and indexed mzML
  • Apache Parquet: under development

As of version 1.2.0, 2 subcommands are available (with thanks to the EuBIC 2020 developers meeting; see usage for examples):

  • query: returns one or more spectra in JSON PROXI by scan number(s)
  • xic: returns chromatogram data based on JSON filter input

These features are still under development; remarks and suggestions are more than welcome.

RawFileReader reading tool. Copyright © 2016 by Thermo Fisher Scientific, Inc. All rights reserved

ThermoRawFileParser Publication:

  • Hulstaert N, Shofstahl J, Sachsenberg T, Walzer M, Barsnes H, Martens L, Perez-Riverol Y: ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion [PMID 31755270].
  • If you use ThermoRawFileParser as part of a publication, please include this reference.

(Linux) Requirements

Mono (install mono-complete if you encounter "assembly not found" errors).

Download

Releases are available on the release page (with release notes starting from v1.1.7).

The ThermoRawFileParserGUI is available separately.

Release Notes

Release notes (starting from version 1.1.7) are available on the wiki page.

Usage

mono ThermoRawFileParser.exe -i=/home/user/data_input/raw_file.raw -o=/home/user/data_input/output/ -f=0 -g -m=0

With only the minimal required argument, -i or -d, this becomes

mono ThermoRawFileParser.exe -i=/home/user/data_input/raw_file.raw

or

mono ThermoRawFileParser.exe -d=/home/user/data_input/

To run on Windows, omit mono. Optional parameters are only recognized in the -option=value format. The tool can output RAW file metadata with -m=0|1 (0 for JSON, 1 for TXT), the spectra file with -f=0|1|2|3 (0 for MGF, 1 for mzML, 2 for indexed mzML, 3 for Parquet), or both. Use the -p flag to disable the native Thermo peak picking.
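The -option=value convention described above can be sketched as a small helper that assembles the argument list for a single-file conversion. This is only an illustration: the function name `build_convert_command` and its parameters are hypothetical and not part of ThermoRawFileParser; only the flags -i, -o, -f, -g and -m and their documented values come from the text.

```python
from typing import List, Optional


def build_convert_command(
    raw_file: str,
    output_dir: Optional[str] = None,
    spectra_format: Optional[int] = None,
    gzip: bool = False,
    metadata_format: Optional[int] = None,
    use_mono: bool = True,
) -> List[str]:
    """Assemble an argument list (hypothetical helper, not part of the tool)."""
    # On Windows the executable is started directly, so "mono" is omitted.
    cmd = ["mono"] if use_mono else []
    cmd.append("ThermoRawFileParser.exe")
    # Options that take a value are written in the -option=value format.
    cmd.append(f"-i={raw_file}")
    if output_dir is not None:
        cmd.append(f"-o={output_dir}")
    if spectra_format is not None:
        # 0 = MGF, 1 = mzML, 2 = indexed mzML, 3 = Parquet
        if spectra_format not in (0, 1, 2, 3):
            raise ValueError("spectra_format must be 0, 1, 2 or 3")
        cmd.append(f"-f={spectra_format}")
    if gzip:
        cmd.append("-g")
    if metadata_format is not None:
        # 0 = JSON, 1 = TXT
        if metadata_format not in (0, 1):
            raise ValueError("metadata_format must be 0 or 1")
        cmd.append(f"-m={metadata_format}")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting issues with paths.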

ThermoRawFileParser.exe --help
Usage is ThermoRawFileParser.exe [subcommand] [options]
optional subcommands are xic|query (use [subcommand] -h for more info]):
  -h, --help                 Prints out the options.
      --version              Prints out the version of the executable.
  -i, --input=VALUE          The raw file input (Required). Specify this or an
                               input directory -d.
  -d, --input_directory=VALUE
                             The directory containing the raw files (Required).
                               Specify this or an input raw file -i.
  -o, --output=VALUE         The output directory. Specify this or an output
                               file -b. Specifying neither writes to the input
                               directory.
  -b, --output_file=VALUE    The output file. Specify this or an output
                               directory -o. Specifying neither writes to the
                               input directory.
  -s, --stdout               Write to standard output. Cannot be combined with
                               file or directory output. Implies silent logging,
                                i.e. logging level 0
  -f, --format=VALUE         The spectra output format: 0 for MGF, 1 for mzML,
                               2 for indexed mzML, 3 for Parquet; both numeric
                               and text (case insensitive) value recognized.
                               Defaults to indexed mzML if no format is
                               specified.
  -m, --metadata=VALUE       The metadata output format: 0 for JSON, 1 for TXT;
                               both numeric and text (case insensitive) value
                               recognized
  -c, --metadata_output_file=VALUE
                             The metadata output file. By default the metadata
                               file is written to the output directory.
  -g, --gzip                 GZip the output file.
  -p, --noPeakPicking[=VALUE]
                             Don't use the peak picking provided by the native
                               Thermo library. By default peak picking is
                               enabled. Optional argument allows disabling peak
                               peaking only for selected MS levels and should
                               be a comma-separated list of integers (1,2,3)
                               and/or intervals (1-3), open-end intervals (1-)
                               are allowed
  -z, --noZlibCompression    Don't use zlib compression for the m/z ratios and
                               intensities. By default zlib compression is
                               enabled.
  -a, --allDetectors         Extract additional detector data: UV/PDA etc
  -l, --logging=VALUE        Optional logging level: 0 for silent, 1 for
                               verbose, 2 for default, 3 for warning, 4 for
                               error; both numeric and text (case insensitive)
                               value recognized.
  -e, --ignoreInstrumentErrors
                             Ignore missing properties by the instrument.
  -x, --excludeExceptionData Exclude reference and exception data
  -L, --msLevel=VALUE        Select MS levels (MS1, MS2, etc) included in the
                               output, should be a comma-separated list of
                               integers (1,2,3) and/or intervals (1-3), open-
                               end intervals (1-) are allowed
  -P, --mgfPrecursor         Include precursor scan number in MGF file TITLE
  -N, --noiseData            Include noise data in mzML output
  -w, --warningsAreErrors    Return non-zero exit code for warnings; default
                               only for errors
  -u, --s3_url[=VALUE]       Optional property to write directly the data into
                               S3 Storage.
  -k, --s3_accesskeyid[=VALUE]
                             Optional key for the S3 bucket to write the file
                               output.
  -t, --s3_secretaccesskey[=VALUE]
                             Optional key for the S3 bucket to write the file
                               output.
  -n, --s3_bucketName[=VALUE]
                             S3 bucket name
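
The MS level syntax accepted by -p and -L (comma-separated integers, intervals, and open-end intervals) can be illustrated with a small parser. This is a sketch of the syntax only, not the tool's own parsing code; `parse_ms_levels` is a hypothetical name, and the `max_level` cap used to close an open-end interval such as "1-" is an assumption made here for illustration.

```python
from typing import List, Set


def parse_ms_levels(spec: str, max_level: int = 10) -> List[int]:
    """Expand a spec such as "1,3-4" or "2-" into a sorted list of MS levels.

    max_level is an assumed upper bound used only to close open-end intervals.
    """
    levels: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in MS level specification")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An empty right-hand side means an open-end interval, e.g. "1-".
            end = int(end_text) if end_text.strip() else max_level
            if start < 1 or start > end:
                raise ValueError(f"invalid interval: {part!r}")
            levels.update(range(start, end + 1))
        else:
            level = int(part)
            if level < 1:
                raise ValueError(f"invalid MS level: {part!r}")
            levels.add(level)
    return sorted(levels)
```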

The output file extension is determined by the output format and the (optional) gzip compression. For example, if the format is MGF without gzip compression, the output file will receive the .mgf extension; if the format is mzML with gzip compression, the output file will have the .mzML.gz extension. All user input will be standardized to fulfill the above-mentioned requirements.

A (Java) graphical user interface is also available; it enables the selection of an input RAW directory or one or more RAW files.
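
The extension rule above can be sketched as follows. Only the two cases stated in the text (MGF and mzML, with or without gzip) are covered; the extensions for indexed mzML and Parquet are not stated here, so the sketch rejects them rather than guessing. The helper name `output_file_name` is hypothetical, and naming the output after the input file's stem is an assumption.

```python
from pathlib import PurePosixPath

# Extensions documented in the text; other formats are deliberately left out.
DOCUMENTED_EXTENSIONS = {"mgf": ".mgf", "mzml": ".mzML"}


def output_file_name(raw_file: str, spectra_format: str, gzip: bool = False) -> str:
    """Return the expected output file name (assumes the input stem is reused)."""
    key = spectra_format.lower()  # format names are case insensitive
    if key not in DOCUMENTED_EXTENSIONS:
        raise ValueError(f"extension for format {spectra_format!r} is not covered")
    name = PurePosixPath(raw_file).stem + DOCUMENTED_EXTENSIONS[key]
    # Gzip compression appends .gz after the format extension.
    return name + ".gz" if gzip else name
```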

query subcommand

Enables the retrieval of spectra by scan number(s) in PROXI format.

mono ThermoRawFileParser.exe query -i=/home/user/data_input/raw_file.raw -b=/home/user/output.json -n="1-5, 20, 25-30"
ThermoRawFileParser.exe query --help
usage is:
  -h, --help                 Prints out the options.
  -i, --input=VALUE          The raw file input (Required).
  -n, --scans=VALUE          The scan numbers. e.g. "1-5, 20, 25-30"
  -b, --output_file=VALUE    The output file. Specifying none writes the output
                               file to the input file parent directory.
  -p, --noPeakPicking        Don't use the peak picking provided by the native
                               Thermo library. By default peak picking is
                               enabled.
  -s, --stdout               Pipes the output into standard output. Logging is
                               being turned off
  -w, --warningsAreErrors    Return non-zero exit code for warnings; default
                               only for errors
  -l, --logging=VALUE        Optional logging level: 0 for silent, 1 for
                               verbose, 2 for default, 3 for warning, 4 for
                               error; both numeric and text (case insensitive)
                               value recognized.

xic subcommand

Return one or more chromatograms based on query JSON input.

mono ThermoRawFileParser.exe xic -i=/home/user/data_input/raw_file.raw -j=/home/user/xic_input.json
ThermoRawFileParser.exe xic --help
  -h, --help                 Prints out the options.
  -i, --input=VALUE          The raw file input (Required). Specify this or an
                               input directory -d
  -d, --input_directory=VALUE
                             The directory containing the raw files (Required).
                               Specify this or an input file -i.
  -j, --json=VALUE           The json input file (Required).
  -p, --print_example        Show a json input file example.
  -o, --output=VALUE         The output directory. Specify this or an output
                               file. Specifying neither writes to the input
                               directory.
  -b, --output_file=VALUE    The output file. Specify this or an output
                               directory. Specifying neither writes to the
                               input directory.
  -6, --base64               Encodes the content of the xic vectors as base 64
                               encoded string.
  -s, --stdout               Pipes the output into standard output. Logging is
                               being turned off.
  -w, --warningsAreErrors    Return non-zero exit code for warnings; default
                               only for errors
  -l, --logging=VALUE        Optional logging level: 0 for silent, 1 for
                               verbose, 2 for default, 3 for warning, 4 for
                               error; both numeric and text (case insensitive)
                               value recognized.

Provide one of the following filters:

  • M/Z and tolerance (tolerance unit optional, defaults to ppm)
  • M/Z start and end
  • sequence and tolerance (tolerance unit optional, defaults to ppm)

Optionally, one can define starting and ending retention times and a Thermo filter string (defaults to ms).

An example input JSON file:

[
    {
        "mz": 488.5384,
        "tolerance": 10,
        "tolerance_unit": "ppm"
    },
    {
        "mz": 575.2413,
        "tolerance": 10,
        "rt_start": 630,
        "rt_end": 660,
        "scan_filter": "ms2"
    },
    {
        "mz_start": 749.7860,
        "mz_end": 750.4,
        "rt_start": 630,
        "rt_end": 660
    },
    {
        "sequence": "TRANNEL",
        "tolerance": 10
    }
]
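
The filter rules above can be checked before writing the JSON input file. The sketch below is one possible reading, not the tool's own validation: it assumes exactly one of the three filter kinds per entry, and the helper names are hypothetical. `ppm_window` shows the usual ppm arithmetic (mz × tolerance / 10^6) as a symmetric window; how the tool applies the tolerance internally is not stated here.

```python
import json
from typing import Any, Dict, List, Tuple


def validate_xic_filter(entry: Dict[str, Any]) -> None:
    """Check that an entry uses exactly one of the three filter kinds."""
    has_mz = "mz" in entry
    has_range = "mz_start" in entry or "mz_end" in entry
    has_sequence = "sequence" in entry
    if sum([has_mz, has_range, has_sequence]) != 1:
        raise ValueError("use exactly one of: mz, mz_start/mz_end, sequence")
    if has_range and not ("mz_start" in entry and "mz_end" in entry):
        raise ValueError("mz_start and mz_end must be given together")
    if (has_mz or has_sequence) and "tolerance" not in entry:
        raise ValueError("mz and sequence filters need a tolerance")


def ppm_window(mz: float, tolerance_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def xic_filters_to_json(filters: List[Dict[str, Any]]) -> str:
    """Validate all entries and serialize them as the JSON input text."""
    for entry in filters:
        validate_xic_filter(entry)
    return json.dumps(filters, indent=4)
```

The returned string can be written to a file and passed to the xic subcommand with -j.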

Galaxy integration

ThermoRawFileParser is available in the Galaxy ToolShed and is deployed at the European Galaxy Server.

Logging

By default the parser only logs to console. To enable logging to file, uncomment the file appender in the log4net.config file.

<log4net>
    <root>
        <level value="INFO" />
        <appender-ref ref="console" />
        <!--<appender-ref ref="file" />-->
    </root>
    <appender name="console" type="log4net.Appender.ConsoleAppender">
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%date %level %logger - %message%newline" />
        </layout>
    </appender>
    <!--<appender name="file" type="log4net.Appender.RollingFileAppender">
        <file value="ThermoRawFileParser.log" />
        <appendToFile value="true" />
        <rollingStyle value="Size" />
        <maxSizeRollBackups value="5" />
        <maximumFileSize value="10MB" />
        <staticLogFileName value="true" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%date [%thread] %level %logger - %message%newline" />
        </layout>
    </appender>-->
</log4net>

Docker

First check the latest version tag on biocontainers/thermorawfileparser/tags. Then pull and run the container with

docker run -i -t -v /home/user/raw:/data_input quay.io/biocontainers/thermorawfileparser:<tag> ThermoRawFileParser.sh --help
