Abstract:
The algorithms embedded in commercial software and existing open-source toolkits have limitations in dealing with the poor reproducibility and mass scale drift of raw matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) spectra. Hence, a comprehensive data preprocessing pipeline, named TOFpipe, was developed to provide optimized technical support for constructing high-quality MALDI-TOF MS fingerprint datasets. The pipeline covered the full preprocessing of Profile-mode raw spectra: smoothing and denoising, baseline subtraction, mass scale calibration, data compression from Profile-mode to Centroid-mode, and rapid detection of outliers. TOFpipe employed a wavelet transform-based derivative technique for peak detection and peak width estimation, and integrated a peak fitting strategy that combines an Exponentially Modified Gaussian (EMG) function with a linear baseline. These strategies enabled efficient denoising and baseline subtraction while preserving as much as possible the details of the original MS peak profiles and their relative intensity relationships. In this study, TOFpipe was applied to process 1,275 raw MALDI-TOF MS spectra from 12 different vegetable oils. Compared with conventional peak fitting strategies, TOFpipe effectively avoided “artifacts” and profile distortion, especially for broadened peaks and regions with low signal-to-noise ratios. On this basis, the centroid and area of MS peaks could be calculated robustly, enabling high-fidelity conversion from Profile-mode to Centroid-mode. Additionally, TOFpipe employed a characteristic peak-based segmentation strategy to precisely calibrate the mass scale offset (drift) and/or scaling. After calibration, the variance explained by the first principal component of the MALDI-TOF MS spectral subsets of the 12 vegetable oil species increased by 4.49% to 38.40%.
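As an illustration of the peak model named above, the following minimal Python sketch evaluates an EMG function on top of a linear baseline, then recovers the peak area and centroid after subtracting that baseline. It is not TOFpipe's implementation: the function names, the parameterization (`area`, `mu`, `sigma`, `tau`), and all numeric values are hypothetical, and the fitting step itself is omitted. The sketch relies on two standard properties of the EMG: its integral equals `area`, and its mean equals `mu + tau`.

```python
import math

def emg(x, area, mu, sigma, tau):
    """Exponentially Modified Gaussian: a Gaussian (mu, sigma) convolved with
    an exponential decay of time constant tau, scaled to total area `area`."""
    lam = 1.0 / tau
    expo = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return area * 0.5 * lam * math.exp(expo) * math.erfc(z)

def linear_baseline(x, mu, slope, intercept):
    """Straight-line baseline, referenced to the peak position for convenience."""
    return slope * (x - mu) + intercept

def centroid_and_area(xs, ys):
    """Trapezoidal peak area and intensity-weighted centroid of a sampled peak."""
    area = 0.0
    moment = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        area += 0.5 * (ys[i] + ys[i + 1]) * dx
        moment += 0.5 * (xs[i] * ys[i] + xs[i + 1] * ys[i + 1]) * dx
    return moment / area, area

# Hypothetical peak parameters, chosen only for illustration.
AREA, MU, SIGMA, TAU = 1000.0, 885.0, 0.1, 0.2
SLOPE, INTERCEPT = 3.0, 40.0

# Synthetic Profile-mode segment: EMG peak plus linear baseline.
xs = [MU - 2.0 + 0.005 * i for i in range(1201)]
raw = [emg(x, AREA, MU, SIGMA, TAU) + linear_baseline(x, MU, SLOPE, INTERCEPT)
       for x in xs]

# Subtract the (here known) baseline, then reduce the peak to centroid and area.
corrected = [y - linear_baseline(x, MU, SLOPE, INTERCEPT) for x, y in zip(xs, raw)]
peak_centroid, peak_area = centroid_and_area(xs, corrected)
```

In a real fit the baseline and EMG parameters would be estimated jointly from the measured profile. Here they are known in advance, so the recovered area and centroid can be checked against `AREA` and `MU + TAU`.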
Furthermore, TOFpipe employed cosine distance-based Multidimensional Scaling (MDS) analysis to reduce dimensionality and visualize sample distributions, and successfully identified the fingerprints of 2 “atypical” high-oleic sunflower oils among 23 sunflower oil samples. Finally, 1,200 spectra were selected to construct a fingerprint dataset. In the overall evaluation, the different vegetable oil species formed well-separated clusters in the projected space, indicating that TOFpipe can provide a reliable technical foundation for constructing high-quality, highly reliable MALDI-TOF MS fingerprint datasets.
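The cosine distance-based MDS step can be sketched as follows. This is a minimal example under stated assumptions, not TOFpipe's code: it uses classical (Torgerson) MDS, whereas the abstract does not say which MDS variant was used; it depends on NumPy; the function names are hypothetical; and the intensity vectors are synthetic, not data from the study.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of X."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - unit @ unit.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 2.0)  # remove tiny negative values from rounding

def classical_mds(D, n_components=2):
    """Classical MDS: double-center the squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # Gram matrix estimate
    evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    scale = np.sqrt(np.clip(evals[order], 0.0, None))  # drop negative eigenvalues
    return evecs[:, order] * scale

# Synthetic centroid-mode fingerprints: six similar spectra and one atypical one.
rng = np.random.default_rng(0)
base = np.array([1.0, 0.8, 0.3, 0.1, 0.05])
typical = base + 0.02 * rng.standard_normal((6, 5))
atypical = np.array([[0.2, 0.3, 1.0, 0.9, 0.1]])
X = np.vstack([typical, atypical])

coords = classical_mds(cosine_distance_matrix(X), n_components=2)

# Rank samples by distance from the median position in the projected space.
spread = np.linalg.norm(coords - np.median(coords, axis=0), axis=1)
most_distant = int(np.argmax(spread))
```

Measuring distance from the median position, rather than the mean, keeps the reference point inside the main cluster even when a few atypical samples are present. Any threshold for flagging outliers would still need to be chosen for the dataset at hand.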