Data Format of Peak List Files

From MicrobeMS Wiki

Peak list files combine multiple peak lists in a single file. These files are stored in a Matlab™-specific data format and contain the peak lists as well as the respective metadata. Peak list files can be loaded in Matlab by entering the following command:

>> load('ecoli-peaklist-oct16.pkf','-mat')

This command will open ecoli-peaklist-oct16.pkf, an example peak list file consisting of 16 individual peak lists from spectra of five different strains of E. coli. The file ecoli-peaklist-oct16.pkf can be downloaded here. If loading was successful, you will have access to a new Matlab variable C (structure array). Details of the structure of C are described next.
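Once the file is loaded, the structure array can be inspected with standard Matlab functions. The following is a minimal sketch, assuming the example file sits in the current folder. The fallback structure in the else branch is a hypothetical stand-in (not real data) so that the snippet also runs without the file.

```matlab
% File name taken from the example above; adjust the path as needed.
pkfFile = 'ecoli-peaklist-oct16.pkf';

if exist(pkfFile, 'file') == 2
    % '-mat' tells Matlab to read the .pkf file as a MAT-file.
    % Loading into S avoids overwriting variables in the workspace.
    S = load(pkfFile, '-mat');
    C = S.C;
else
    % Hypothetical stand-in with two entries (invented values).
    C = struct('nam', {'demo-1', 'demo-2'}, ...
               'gen', {'Escherichia', 'Escherichia'});
end

nLists = numel(C);        % number of peak lists in the file
fields = fieldnames(C);   % names of the fields of each entry

fprintf('%d peak lists, %d fields per entry\n', nLists, numel(fields));
```

Loading into an output variable (S) rather than directly into the workspace is a design choice of this sketch; the plain load command shown above works as well.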
 


Fields of the structure array C:

| Field | Description | Data type |
|---|---|---|
| nam | spectrum ID | string |
| gen | genus information | string |
| spe | species info | string |
| str | strain info | string |
| typ | type | string |
| uid | taxonomy identification number for species as used by the NCBI (see [1]) | integer |
| uie | taxonomy identification number for strains as used by the NCBI (see [2]) | integer |
| gti | cultivation conditions: growth time | string |
| tem | cultivation conditions: cultivation temperature | string |
| air | cultivation conditions: cultivation under aerobic or anaerobic conditions | string |
| med | cultivation conditions: cultivation medium | string |
| spo | spore formers (YES or NO) | string |
| con | sample concentration | string |
| trt | sample treatment | string |
| ext | extra information | string |
| las | laser parameters (power, diameter, frequency, etc.) | string |
| cal | calibration info | string |
| met | measurement method | string |
| cus | customer info | string |
| tim | date and time of measurement | string |
| pth | path to spectrum | string |
| pik | peak table, an array of dimension [4 x npeaks], where npeaks is the number of peaks | float32 |
| cls | class assignment (valid values are 0, 1, 2, 3 and 4) | float32 |
| lst | formatted text containing the peak table | char array |
| seq | sequence of preprocessing steps | string |
| smo | number of smoothing points (Savitzky-Golay smoothing) | float32 |
| bas | number of intervals used for baseline correction | float32 |
| nrm | normalization parameter (Yes: 1, No: 0) | float32 |
| clb | calibration parameters (see below for details) | float32 |
| red | data reduction factor (spectral binning) | string |
| cut | cut in the spectral domain | string |
| mod | original data modified by cut or red (Yes: 1, No: 0) | float32 |
| prm | parameters of peak detection | string |
| ccl | calibration information (see below) | structure array |
| dbs | database spectrum (Yes: 1, No: 0); see below for details | structure array |
| avr | average spectrum (Yes: 1, No: 0); see below for details | structure array |

Figure: Matlab screenshot of a peak list file (*.pkf) demonstrating the general structure of the structure array 'C'. In this example the metadata of peak list #1 are shown.
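The metadata fields listed above can be used to list or select peak lists. The sketch below is illustrative only: it builds a small hypothetical structure array (all values invented) with a few of the documented fields, prints the taxonomic information, and then selects entries by species and by class assignment.

```matlab
% Hypothetical stand-in for a loaded peak list file (invented values).
C = struct( ...
    'nam', {'s1', 's2', 's3'}, ...
    'gen', {'Escherichia', 'Escherichia', 'Bacillus'}, ...
    'spe', {'coli', 'coli', 'subtilis'}, ...
    'str', {'K12', 'DH5a', '168'}, ...
    'cls', {single(0), single(1), single(1)});

% Print one line of taxonomic information per peak list.
for k = 1:numel(C)
    fprintf('%-4s %s %s (strain %s)\n', ...
            C(k).nam, C(k).gen, C(k).spe, C(k).str);
end

% Select all entries of a given species with a logical mask.
isColi    = strcmp({C.gen}, 'Escherichia') & strcmp({C.spe}, 'coli');
coliNames = {C(isColi).nam};

% Select entries by class assignment (cls holds values 0 to 4).
cls         = double([C.cls]);
class1Names = {C(cls == 1).nam};
```

The expression {C.gen} collects one field across all entries into a cell array, which is what makes the mask-based selection possible.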


Format of peak tables (C.pik):


| Field | Description |
|---|---|
| C.pik(1,:) | m/z positions of the peaks in the peak table |
| C.pik(2,:) | absolute intensities of these peaks |
| C.pik(3,:) | weighting factors (the sum of these factors equals 100) |
| C.pik(4,:) | in case of single spectra, i.e. no database or average spectra: baseline-corrected absolute intensities of the peaks; in case of average or database spectra: the relative peak frequency |
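The row layout of the peak table can be illustrated with a short sketch. The peak table used here is hypothetical (invented values), and the m/z window limits are arbitrary. With a real file, C(1).pik would come from the loaded structure array.

```matlab
% Hypothetical [4 x npeaks] peak table (invented values).
% Rows: m/z, absolute intensity, weighting factor, row-4 quantity.
pik = single([3000 4500 6250 9060; ...
               120  480  300  100; ...
                12   48   30   10; ...
               110  450  280   90]);
C = struct('nam', 'demo', 'pik', pik);

mz      = double(C(1).pik(1, :));   % m/z positions
inten   = double(C(1).pik(2, :));   % absolute intensities
weights = double(C(1).pik(3, :));   % weighting factors

% The weighting factors are documented to sum to 100.
weightSum = sum(weights);

% Pick the three most intense peaks.
[~, order] = sort(inten, 'descend');
topMz = mz(order(1:3));

% Restrict the table to an m/z window (limits chosen arbitrarily).
inWindow = mz >= 4000 & mz <= 7000;
subTable = C(1).pik(:, inWindow);
```

Converting the float32 rows to double before doing arithmetic is a precaution taken in this sketch, not a requirement of the file format.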


Calibration Information (C.ccl):

| Field | Description | Type |
|---|---|---|
| cl1 | calibration constant 1 | float32 |
| cl2 | calibration constant 2 | float32 |
| cl3 | calibration constant 3 | float32 |
| del | delay time [ns] | float32 |
| npt | number of data points | float32 |
| res | time resolution [ns] | float32 |
| ncl | calibration info required to store the spectrum in a Bruker-specific data format | string |
| ncr | calibration info required to store the spectrum in a Bruker-specific data format | string |
| bid | hardware id of the spectrum | string |
| org | manufacturer info | string |
| tfu | manufacturer info | string |
| tfu | software info, required for compatibility issues | string |
| spm | type of instrumentation | string |
| stp | type of measurement (should be 'TOF') | string |
| acq | path to the original spectrum | string |

Figure: Matlab screenshot of the structure array C.ccl containing the calibration info, such as calibration constants, delay time, number of spectra data points, etc. for spectrum #1.
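The calibration fields can be read like any other nested structure. The sketch below uses a hypothetical calibration record (invented values). It also reconstructs a time-of-flight axis under the assumption that data point i was sampled at del + (i-1)*res nanoseconds; this interpretation is not stated in the table above and should be verified against real data. The conversion from flight time to m/z via cl1 to cl3 is instrument-specific and is not attempted here.

```matlab
% Hypothetical calibration record (invented values).
ccl = struct('cl1', single(1), 'cl2', single(2), 'cl3', single(0), ...
             'del', single(20000), 'npt', single(5), 'res', single(2), ...
             'stp', 'TOF');
C = struct('nam', 'demo', 'ccl', ccl);

cal = C(1).ccl;

% Check the measurement type documented in the table above.
isTof = strcmp(cal.stp, 'TOF');

% ASSUMPTION: data point i was sampled at del + (i-1)*res nanoseconds.
nPts   = double(cal.npt);
tof_ns = double(cal.del) + (0:nPts-1) * double(cal.res);
```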



Database Spectrum (C.dbs):

A database spectrum is usually created from many (>3) individual mass spectra. The structure array C.dbs contains information (metadata, peak tables) on the mass spectra used to produce the given database spectrum. Details of the structure of C.dbs are given in the table below.

| Field | Description | Type |
|---|---|---|
| mem | string defining whether the current spectrum is a database spectrum (1) or not (0) | string |
| ids | id of the individual mass spectrum used to create the database spectrum | string |
| tax | taxonomic info of the source spectrum | string |
| pik | peak table of the source spectrum | float32 |
| prm | parameters of peak detection | string |

Figure: Matlab screenshot of the structure array C.dbs. C(1,17).dbs(1,1) contains information on mass spectrum #1, which was used with others to obtain database spectrum #17, such as the id, taxonomic information, peak tables and the respective peak detection parameters.
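The source spectra of a database spectrum can be listed by looping over C.dbs. The following sketch uses a hypothetical database entry built from two source spectra (all ids, taxa and peak values are invented). With a real file, the entry would be addressed as in the screenshot, e.g. C(1,17).dbs.

```matlab
% Hypothetical source spectra of one database spectrum (invented values).
src = struct( ...
    'mem', {'1', '1'}, ...
    'ids', {'src-001', 'src-002'}, ...
    'tax', {'Escherichia coli', 'Escherichia coli'}, ...
    'pik', {single([3000 4500; 100 200; 40 60; 90 180]), ...
            single([3000 4500 6250; 80 150 70; 30 50 20; 75 140 60])}, ...
    'prm', {'', ''});
C = struct('nam', 'db-demo', 'dbs', src);

sources  = C(1).dbs;
nSources = numel(sources);

% Report id, taxonomy and number of peaks for every source spectrum.
nPeaks = zeros(1, nSources);
for k = 1:nSources
    nPeaks(k) = size(sources(k).pik, 2);   % columns = peaks
    fprintf('%s (%s): %d peaks\n', ...
            sources(k).ids, sources(k).tax, nPeaks(k));
end
```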


Average Spectrum (C.avr):

An average spectrum is usually created from many (>3) individual mass spectra. The structure array C.avr contains information (metadata, peak tables) on the mass spectra used to produce the given average spectrum. Details of the structure of C.avr are given in the table below.

| Field | Description | Type |
|---|---|---|
| mem | string defining whether the current spectrum is an average spectrum (1) or not (0) | string |
| ids | id of the individual mass spectrum used to create the average spectrum | string |
| tax | taxonomic info of the source spectrum | string |
| pik | peak table of the source spectrum | float32 |
| prm | parameters of peak detection | string |

Figure: Matlab screenshot of the structure array C.avr. spec(1,18).avr(1,1) contains information on mass spectrum #1, which was used with others to obtain average spectrum #18, such as the id, taxonomic information, peak tables and the respective peak detection parameters.
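Because C.avr keeps the peak table of every source spectrum, it is possible to check in how many source spectra a given peak occurs. The sketch below is one plausible way to do this and is not necessarily how MicrobeMS computes the relative peak frequency. The source spectra, the target m/z and the matching tolerance are all hypothetical.

```matlab
% Hypothetical source spectra of one average spectrum (invented values).
avr = struct( ...
    'ids', {'a', 'b', 'c'}, ...
    'pik', {single([3000 4500; 10 20; 40 60; 9 18]), ...
            single([3001 6250; 10 20; 50 50; 9 18]), ...
            single([4500 6251; 10 20; 30 70; 9 18])});
C = struct('nam', 'avg-demo', 'avr', avr);

targetMz = 3000;   % m/z of interest (arbitrary)
tol      = 2;      % ASSUMPTION: matching tolerance in m/z units

sources = C(1).avr;
hit = false(1, numel(sources));
for k = 1:numel(sources)
    mz = double(sources(k).pik(1, :));          % row 1: m/z positions
    hit(k) = any(abs(mz - targetMz) <= tol);    % peak present in source k?
end

% Fraction of source spectra that contain the peak.
fraction = sum(hit) / numel(sources);
```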