Geena 2: data file format

Geena 2
Build August 31, 2021. This system is under active development;
please forgive us for possible errors and
send us your comments, criticisms and congratulations,
if any. The authors

Geena 2 information page

Welcome to Geena 2, the new tool for multi-spectra filtering, averaging and alignment brought to you by researchers from Genoa and Naples (GEnova E NApoli).
The use of Geena 2 should be straightforward. Results are displayed in a simple format. Nevertheless, should you have any problem, see the help page.

You may find it useful to perform a first analysis by using the example input data file Test1.txt that is provided for testing purposes.
In the following, you will find links to output files that were generated by Geena 2 from this input data by using the following parameters:

Analysis range: 1,200 - 1,700 m/z - Normalization peak at: 1,420.80 m/z
Abundance thresholds: 5 at 1,200 m/z, 4 at 1,700 m/z
Maximum number of isotopic replicas: 5 - Maximum delta between isotopic peaks: 0.05 Da
Maximum delta for aligning replicates: 0.1 Da - Minimum number of signals in replicates: 1
Maximum delta for aligning average spectra: 0.2 Da - Minimum number of signals in average spectra: 2
Some of these values, namely the analysis range and the normalization peak, MUST be entered in the Geena 2 web interface in use, either the Quick Search Interface (QSI) or the Standard Search Interface (SSI). The remaining values are defaults, which are used implicitly by the QSI and shown explicitly by the SSI, where they can be modified.

NB! Although we periodically check that this file is aligned with the version of the software, some differences can arise because Geena 2 is under active development.

Data file formats

Here, the formats that are used by Geena 2 are briefly introduced.

Input data file

Intermediate information on isotopic peaks joining

Filtered spectra

Average spectrum from replicates

Alignments

Input data file
The input data file is a simple text file with data delimited by tab characters.
NB! This format can be easily achieved by saving data from MS Excel with the "Text (tab delimited)" format.

Sample section
The basic block of the input file is the "Sample section", which includes data referring to replicated spectra from the same origin sample. Each spectrum is reported as pairs of m/z and abundance values, which are ordered by increasing m/z value and listed in columns, as specified below.

First line
The Sample section begins with a line that includes reference names for all spectra for the Sample. Names are separated by two tab characters. The last name is not followed by any tab character.

Second line
The second line includes fixed labels (usually "m/z" and "abund") that are used as headers for the following data and define the contents of the respective columns. This line does not presently affect analysis.

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the spectrum, in m/z value ascending order. The first number reports the m/z value, the second one the abundance (aka intensity).
On a single line of the file, the number of such pairs equals the number of spectra for the same sample. So, the i-th line includes the (i-2)-th pair of m/z and abundance values of each spectrum. This means that m/z values in the same line may be (and usually are) different. In general, the pair of values in the k-th spectrum occupies columns 2*k-1 (m/z value) and 2*k (abundance value).
All values, both within and between pairs, are separated by a tab character. Since spectra may have a different number of peaks, some columns may include more values than the others. Missing values should be replaced by zeroes.

Example of a sample section
The following excerpt of the example input data file Test1.txt shows the initial and final lines of the first sample section. The section refers to three spectra from the same sample, named "20A", "20B" and "20C". The first spectrum has a greater number of peaks than the others. Missing peaks are represented by zeroes.
20A		20B		20C
m/z	abund	m/z	abund	m/z	abund
707.36	47	707.36	29	707.36	35
708.36	21	709.37	68	709.37	72
709.37	94	710.38	26	710.38	41
710.38	34	711.39	26	711.39	44
711.39	48	713.40	20	713.40	25
713.40	24	723.11	18	723.35	55
723.35	45	723.35	38	739.30	51
724.35	26	725.36	19	767.31	33
725.38	21	738.28	27	803.20	30
739.29	40	739.30	67	804.20	58
741.30	28	740.30	25	805.21	63
763.29	30	741.30	50	806.22	25
...	..	...	..	...	..
...	..	...	..	...	..
2532.90	2	0	0	0	0
2541.57	2	0	0	0	0
2547.73	2	0	0	0	0
2554.99	3	0	0	0	0
2556.06	2	0	0	0	0
2557.02	3	0	0	0	0
2669.06	3	0	0	0	0
2821.16	16	0	0	0	0
2821.79	3	0	0	0	0
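The layout described above can be read with a few lines of Python. The following is only a minimal sketch, not the parser used by Geena 2: the function name `parse_sample_section` is hypothetical, the rows are a handful taken from the excerpt, and a pair is treated as padding only when both its m/z and abundance are zero, which is an assumption.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, abundance)


def parse_sample_section(lines: List[str]) -> Dict[str, List[Peak]]:
    """Parse one sample section into {spectrum name: [(m/z, abundance), ...]}."""
    # First line: names are separated by two tabs, so every other field is empty.
    names = [field for field in lines[0].rstrip("\r\n").split("\t") if field]
    spectra: Dict[str, List[Peak]] = {name: [] for name in names}
    # Second line holds the "m/z" / "abund" labels and is skipped.
    for line in lines[2:]:
        fields = line.rstrip("\r\n").split("\t")
        for k, name in enumerate(names, start=1):
            mz = float(fields[2 * k - 2])         # column 2*k-1 (zero-based index 2*k-2)
            abundance = float(fields[2 * k - 1])  # column 2*k   (zero-based index 2*k-1)
            if mz == 0 and abundance == 0:
                continue  # assumed to be zero padding for a shorter spectrum
            spectra[name].append((mz, abundance))
    return spectra


# A few rows from the excerpt (not contiguous in the real file).
section = [
    "20A\t\t20B\t\t20C",
    "m/z\tabund\tm/z\tabund\tm/z\tabund",
    "707.36\t47\t707.36\t29\t707.36\t35",
    "708.36\t21\t709.37\t68\t709.37\t72",
    "2821.79\t3\t0\t0\t0\t0",
]
spectra = parse_sample_section(section)
print({name: len(peaks) for name, peaks in spectra.items()})
```

With these five lines, the first spectrum keeps three peaks while the other two keep two each, because the zero-padded pairs in the last row are dropped.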

Multiple sample sections
At the end of each sample section except the last one, a line with two backslashes separates it from the following sample section, as shown in the following excerpt of the same example file:
...	..	...	..	...	..
...	..	...	..	...	..
2557.02	3	0	0	0	0
2669.06	3	0	0	0	0
2821.16	16	0	0	0	0
2821.79	3	0	0	0	0
\\
21A		21B		21C
m/z	abund	m/z	abund	m/z	abund
707.36	17	702.44	14	707.36	14
709.38	33	707.36	13	709.38	38
711.39	16	709.38	23	710.38	21
723.36	21	710.38	13	711.39	13
729.34	33	711.39	12	713.37	11
730.35	12	713.41	11	723.36	25
...	..	...	..	...	..
...	..	...	..	...	..
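Splitting a file into its sample sections on the two-backslash separator line could look like the sketch below. The helper name `split_sample_sections` is hypothetical, and skipping blank lines is an assumption rather than documented behaviour of Geena 2.

```python
from typing import Iterable, Iterator, List


def split_sample_sections(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines of each sample section, using '\\\\' lines as separators."""
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "\\\\":  # a line holding exactly two backslashes
            if current:
                yield current
            current = []
        elif line.strip():          # assumption: blank lines carry no data
            current.append(line)
    if current:                     # the last section has no trailing separator
        yield current


# Shortened input built from rows of the excerpt above.
text = (
    "20A\t\t20B\t\t20C\n"
    "m/z\tabund\tm/z\tabund\tm/z\tabund\n"
    "2821.79\t3\t0\t0\t0\t0\n"
    "\\\\\n"
    "21A\t\t21B\t\t21C\n"
    "m/z\tabund\tm/z\tabund\tm/z\tabund\n"
    "707.36\t17\t702.44\t14\t707.36\t14\n"
)
sections = list(split_sample_sections(text.splitlines()))
print([section[0].split("\t")[0] for section in sections])
```

Each yielded list starts with the line of spectrum names, so it can be handed to a per-section parser unchanged.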

Example file Test1.txt
The example file Test1.txt may be downloaded from the Geena 2 web site. This file reports four sample sections, each of which includes three replicate spectra.

Intermediate information on isotopic peaks joining
These files include the results of filtering and joining of isotopic peaks for a spectrum. They are formatted for readability as HTML since they do not include information that can be re-analysed by Geena 2. They are however useful for checking how isotopic peaks were joined. Their analysis can suggest changes in values of input parameters.

These files can be downloaded at the end of the analysis.
Their name is defined as follows:
"<job>_<sample>_<spectrum>_groups.html"
where <job> is the job name given by the researcher, <sample> is the label that is associated by Geena 2 to the sample in the analysis, and <spectrum> is the name of the spectrum in the input file.
These files include first some essential information on the spectrum. After that, m/z and abundance values of "peak groups", i.e. of peaks resulting from filtering and joining of isotopic peaks, are listed.
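The naming scheme above can be taken apart programmatically. The sketch below is illustrative only: `parse_groups_filename` is a hypothetical helper, and it assumes that the sample label and the spectrum name contain no underscore, so that the job name (which may contain underscores, as "Geena2_861" appears to in the example files) is whatever remains on the left.

```python
from typing import NamedTuple


class GroupsFileName(NamedTuple):
    job: str
    sample: str
    spectrum: str


def parse_groups_filename(filename: str) -> GroupsFileName:
    """Split '<job>_<sample>_<spectrum>_groups.html' into its three parts."""
    suffix = "_groups.html"
    if not filename.endswith(suffix):
        raise ValueError(f"not a peak groups file name: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split from the right: the job name may itself contain underscores.
    job, sample, spectrum = stem.rsplit("_", 2)
    return GroupsFileName(job, sample, spectrum)


parsed = parse_groups_filename("Geena2_861_Sample1_20A_groups.html")
print(parsed)
```

If spectrum names in a given input file do contain underscores, the split is ambiguous and the names should be taken from the input file instead.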

Summary data
In this part, the name associated with the spectrum is listed together with its total number of peaks, the number of peaks included in the analysis range, the m/z value of the normalization peak (if used) and its overall abundance (resulting from the sum of the abundances of all its isotopic peaks), and, finally, the number of peak groups that were identified.
Example:
Spectrum name: 20A
There are 202 peaks in the spectrum
There are 76 peaks in the range
Normalization peak was found at 1420.763 m/z
Normalization abundance is 1472
There are 36 peak groups in the range

Peak groups data
This section includes a list of all peak groups identified.
For each group, its m/z and abundance values are reported, i.e. the m/z value of the base (monoisotopic) peak and the overall abundance associated with that peak (the sum of the abundances of all its isotopic peaks).
Moreover, the list of isotopic peaks associated with that peak group is reported, each with its m/z and abundance values.
Example:
Peak Group 8, Basic peak 1360.739 m/z, Overall abundance 5.571
--> m/z 1360.739, ab 2.378
--> m/z 1361.724, ab 1.970
--> m/z 1362.730, ab 1.223
Peak Group 12, Basic peak 1403.749 m/z, Overall abundance 7.473
--> m/z 1403.749, ab 2.717
--> m/z 1404.736, ab 2.038
--> m/z 1405.756, ab 1.562
--> m/z 1406.742, ab 1.155
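The relation between a peak group and its isotopic peaks can be checked with a short script. This is a sketch under assumptions: it works on the text as displayed in the example above (the actual files are HTML, so markup would have to be stripped first), and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

GROUP_RE = re.compile(
    r"Peak Group (\d+), Basic peak ([\d.]+) m/z, Overall abundance ([\d.]+)"
)
PEAK_RE = re.compile(r"--> m/z ([\d.]+), ab ([\d.]+)")


@dataclass
class PeakGroup:
    number: int
    base_mz: float
    overall_abundance: float
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def parse_peak_groups(text: str) -> List[PeakGroup]:
    """Collect peak groups and their isotopic peaks from the displayed text."""
    groups: List[PeakGroup] = []
    for line in text.splitlines():
        group_match = GROUP_RE.search(line)
        peak_match = PEAK_RE.search(line)
        if group_match:
            number, base_mz, overall = group_match.groups()
            groups.append(PeakGroup(int(number), float(base_mz), float(overall)))
        elif peak_match and groups:
            mz, abundance = peak_match.groups()
            groups[-1].peaks.append((float(mz), float(abundance)))
    return groups


example = """\
Peak Group 8, Basic peak 1360.739 m/z, Overall abundance 5.571
--> m/z 1360.739, ab 2.378
--> m/z 1361.724, ab 1.970
--> m/z 1362.730, ab 1.223
Peak Group 12, Basic peak 1403.749 m/z, Overall abundance 7.473
--> m/z 1403.749, ab 2.717
--> m/z 1404.736, ab 2.038
--> m/z 1405.756, ab 1.562
--> m/z 1406.742, ab 1.155
"""
groups = parse_peak_groups(example)
for group in groups:
    total = sum(abundance for _, abundance in group.peaks)
    # Compare with a small tolerance, since displayed values are rounded.
    print(group.number, abs(total - group.overall_abundance) < 0.002)
```

The comparison uses a tolerance because the listed abundances of group 12 add up to 7.472 while the reported overall abundance is 7.473, presumably an effect of rounding the displayed values.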

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.

Geena2_861_Sample1_20A_groups.html

Geena2_861_Sample1_20B_groups.html

Geena2_861_Sample1_20C_groups.html

Filtered spectra
These files include the results of pre-processing of spectra. They are used as input for the computation of the average spectrum for a given sample.
They are formatted as simple text, but with the defined syntax that is shown below, since they constitute an intermediate result and must be further analysed by Geena 2.
These files can be downloaded at the end of the analysis.
Their name is
"<job>_<sample>_<spectrum>_filtered.txt"
where <job> is the job name given by the researcher, <sample> is the label that is associated by Geena 2 to the sample in the analysis, and <spectrum> is the name of the spectrum in the input file.

Spectrum section
The filtered spectrum is shown as a list of pairs of m/z and abundance values, which are ordered by increasing m/z value and listed in a column, one pair per line, as specified below.

First line
The section begins with a line that includes the reference name for the replicate, preceded by a "#" character.

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the filtered spectrum. The first number reports the m/z value, the second one the abundance (aka intensity). Values are separated by a tab character.

Last line
At the end of the spectrum, a line with two backslashes is included.

Example

#20A
1360.739	5.571
1403.749	7.473
1420.763	230.163
1432.766	35.054
1442.746	12.636
1522.815	7.201
1524.793	27.582
1536.821	38.315
1640.852	5.299
\\
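A filtered spectrum file in the format above can be read as in the following sketch. The function name `parse_filtered_spectrum` is hypothetical, and the code assumes one spectrum per file, with the header on the first non-empty line.

```python
from typing import List, Tuple


def parse_filtered_spectrum(text: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Return (spectrum name, [(m/z, abundance), ...]) from a filtered spectrum."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("expected a '#<name>' header line")
    name = lines[0][1:].strip()
    peaks: List[Tuple[float, float]] = []
    for line in lines[1:]:
        if line == "\\\\":  # two backslashes close the spectrum
            break
        mz, abundance = line.split("\t")
        peaks.append((float(mz), float(abundance)))
    return name, peaks


# First rows of the example above, followed by the closing line.
content = "#20A\n1360.739\t5.571\n1403.749\t7.473\n1420.763\t230.163\n\\\\\n"
name, peaks = parse_filtered_spectrum(content)
print(name, peaks)
```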

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.

Geena2_861_Sample1_20A_filtered.txt

Geena2_861_Sample1_20B_filtered.txt

Geena2_861_Sample1_20C_filtered.txt

Average spectra
These files include the average spectrum obtained by aligning and averaging all spectra from the same sample. They are used as input for computing the average spectrum and the alignment across all samples under analysis.
They are formatted as simple text, in almost the same format as filtered spectra.
These files can be downloaded at the end of the analysis.
Their name is
"<job>_<sample>_average.txt"
where <job> is the job name given by the researcher, and <sample> is the label that is associated by Geena 2 to the sample in the analysis.

Spectrum section
The average spectrum is shown as a list of pairs of m/z and abundance values, which are ordered by increasing m/z value and listed in a column, one pair per line, as specified below.

First line
The section begins with a line that includes the label that is associated by Geena 2 to the sample in the analysis, preceded by a "#" and followed by " (avg)".

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the average spectrum. The first number reports the m/z value, the second one the abundance (aka intensity). Values are separated by a tab character.

Example

#Sample 1 (avg)
1360.738	5.466
1403.742	7.704
1420.763	228.586
1432.766	36.095
1442.745	10.274
1443.744	8.435
1476.694	5.527
1522.810	7.452
1524.791	26.454
1536.823	39.780
1548.814	6.207
1640.849	5.626
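The header line is what distinguishes an average spectrum from a filtered one. A small helper can recognise it, as sketched below; the function name `average_label` and the exact regular expression are assumptions based only on the header shown in the example.

```python
import re
from typing import Optional

# '#', then the sample label, then the literal suffix ' (avg)'.
AVG_HEADER_RE = re.compile(r"^#(?P<label>.+?) \(avg\)\s*$")


def average_label(header_line: str) -> Optional[str]:
    """Return the sample label of an average spectrum header, or None."""
    match = AVG_HEADER_RE.match(header_line)
    return match.group("label") if match else None


avg = average_label("#Sample 1 (avg)")
filtered = average_label("#20A")
print(avg, filtered)
```

A header such as "#20A", which lacks the " (avg)" suffix, yields None and can be treated as a filtered spectrum.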

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.

Geena2_861_Sample1_average.txt

Geena2_861_Sample2_average.txt

Geena2_861_Sample3_average.txt

Alignments
These files include the alignment data generated either from filtered spectra of the same sample or from average spectra of all samples in the analysis.
They are formatted as simple text, but with the defined syntax that is presented below.
These files can be downloaded at the end of the analysis.
Their name is "<jobname>_Alignment.txt" (for the overall alignment) and "<job>_<sample>_alignment.txt" (for single samples),
where <job> is the job name given by the researcher
and <sample> is the label associated to the sample by Geena 2.
An HTML version of these results is also shown at the end of the analysis in the results page.

First line
All files begin with a line that includes the job name for the analysis, preceded by a "#" character.

Alignment data
The alignment is reported in a table. The second row of this file includes some headers.
Each following row includes data that refer to the alignment of a single peak. These numbers are separated from each other by tab characters. Aligned peaks are ordered by increasing m/z value.
The first number in the row refers to the number of aligned signals for the peak.
The second number refers to the m/z value of the aligned peak.
From the third number on, the m/z values of the aligned peaks in the average spectra are reported. Since the alignment can be defined on the basis of a limited number of aligned average spectra (as said, the first number of the row shows how many signals were aligned for the given peak), some m/z values may be missing. In this case, the value is not shown, but both tab characters, the one that should precede it and the one that should follow it, are included. This makes it possible to identify the exact average spectrum for which the value is missing, i.e. the spectrum that was not aligned for the peak.
Similarly, the mean abundance / intensity values and the abundance values of aligned average spectra are reported in the following positions of the row. Again, missing values are not included, but two consecutive tab characters are found.

Example
NB! Header not shown for overall readability.
2 1360.738 1360.739 1360.736 5.466 5.571 5.361
2 1403.742 1403.749 1403.735 7.704 7.473 7.934
3 1420.763 1420.763 1420.763 1420.763 228.586 230.163 226.361 229.235
3 1432.766 1432.766 1432.767 1432.766 36.095 35.054 35.204 38.027
3 1442.745 1442.746 1442.746 1442.743 10.274 12.636 12.755 5.432
1 1443.744 1443.744 8.435 8.435
1 1476.694 1476.694 5.527 5.527
3 1522.810 1522.815 1522.804 1522.810 7.452 7.201 8.078 7.076
3 1524.791 1524.793 1524.791 1524.789 26.454 27.582 26.190 25.590
3 1536.823 1536.821 1536.823 1536.825 39.780 38.315 41.497 39.528
1 1548.814 1548.814 6.207 6.207
2 1640.849 1640.852 1640.846 5.626 5.299 5.952
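Reading a row with missing values comes down to keeping the empty fields produced by consecutive tab characters. The sketch below assumes that a row holds 2*N+3 tab-separated fields for N aligned spectra (count, m/z of the aligned peak, N m/z values, mean abundance, N abundance values), as the description above suggests. The names `AlignedPeak` and `parse_alignment_row` are hypothetical, and the position of the empty fields in the sample row is chosen only for illustration, since the example above does not show which spectrum is missing.

```python
from typing import List, NamedTuple, Optional


class AlignedPeak(NamedTuple):
    count: int
    mean_mz: float
    mz: List[Optional[float]]
    mean_abundance: float
    abundance: List[Optional[float]]


def _optional(value: str) -> Optional[float]:
    """Convert a field to float, mapping an empty field to None."""
    return float(value) if value.strip() else None


def parse_alignment_row(line: str, n_spectra: int) -> AlignedPeak:
    """Parse one alignment row, keeping empty fields as None."""
    fields = line.rstrip("\r\n").split("\t")
    expected = 2 * n_spectra + 3
    # Assumption: trailing empty fields may have been trimmed, so pad the row.
    fields += [""] * (expected - len(fields))
    mz = [_optional(f) for f in fields[2:2 + n_spectra]]
    abundance = [_optional(f) for f in fields[3 + n_spectra:3 + 2 * n_spectra]]
    return AlignedPeak(
        int(fields[0]), float(fields[1]), mz, float(fields[2 + n_spectra]), abundance
    )


# Hypothetical placement of the empty fields (second of three spectra).
row = "2\t1360.738\t1360.739\t\t1360.736\t5.466\t5.571\t\t5.361"
peak = parse_alignment_row(row, n_spectra=3)
print(peak)
```

The number of values that are not None in `peak.mz` should equal `peak.count`, which gives a simple consistency check for each row.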

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.

Geena2_861_alignment.txt

Geena2_861_Sample1_alignment.txt

Geena2_861_Sample2_alignment.txt

Geena2_861_Sample3_alignment.txt

For information, get in touch with:
Paolo Romano,
IRCCS Ospedale Policlinico San Martino,
Genoa, Italy

If you use Geena, please cite the following paper:
Romano P et al.
Geena 2, improved automated analysis of MALDI/TOF mass spectra.
BMC Bioinformatics 2016, 17(Suppl 4):61
PMID: 26961516; DOI: 10.1186/s12859-016-0911-2