| Title: | Microbial Ecology by Tandem Mass Spectrometry |
|---|---|
| Description: | Tools that researchers can use to analyze untargeted metabolomics data generated from microbial communities using tandem mass spectrometry. The overall approach parallels that used to analyze microbial communities using 16S rRNA gene sequencing data, and the package provides methods for each step of that workflow. First, users can import Mass Spectrometry 1 (MS1) data and filter it. Users can then match Mass Spectrometry 2 (MS2) data to the filtered (or unfiltered) MS1 data. With the matched data, users can cluster features, annotate them, predict de novo chemical formulas, and calculate alpha and beta diversity. Chemical formula prediction follows "Towards de novo identification of metabolites by analyzing tandem mass spectra" (Sebastian Böcker, Florian Rasche (2008) <doi:10.1093/bioinformatics/btn270>). The similarity/dissimilarity calculations used to cluster the data follow "Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification" (Li, Y., Kind, T., Folz, J. et al. (2021) <doi:10.1038/s41592-021-01331-z>) and "Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking" (Wang, M., Carver, J., Phelan, V. et al. (2016) <doi:10.1038/nbt.3597>). |
| Authors: | Allison Mason [aut] (ORCID: <https://orcid.org/0000-0003-1339-1592>), Gregory Johnson [aut] (ORCID: <https://orcid.org/0009-0008-3890-0297>), Patrick Schloss [aut, cre] (ORCID: <https://orcid.org/0000-0002-6935-4275>), Anton Pervukhin [ctb, cph], Florian Rasche [ctb, cph], Henner Sudek [ctb, cph], Marcel Martin [ctb, cph], Yuanyue Li [ctb, cph] |
| Maintainer: | Patrick Schloss <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-06-11 21:25:51 UTC |
| Source: | https://github.com/mums2/mums2 |
Alpha diversity measures the amount of diversity within a single sample. We can conduct this analysis using your created community object. We support the Shannon and Simpson diversity indices, as well as richness.
alpha_summary(community_object, size, threshold, diversity_index = c("shannon", "simpson"), subsample = TRUE, number_of_threads = detectCores(), iterations = 100, seed = 123)
community_object |
the object created from
the |
size |
the size you wish to rarefy your diversity matrix to. |
threshold |
the threshold you want your species to reach before it is included in the rarefaction sum. |
diversity_index |
the diversity index you wish to calculate; the options are shannon, simpson, or richness. You may also compute several indices at the same time using a vector (i.e., c("shannon", "simpson")). |
subsample |
if true, we will rarefy the data before we run the diversity calculations. Default is TRUE. |
number_of_threads |
the number of threads you wish to use for this calculation. Defaults to the number of threads on your computer. |
iterations |
the number of times you wish to run your calculation. |
seed |
the RNG (random number generator) seed you would like to use. |
a data.frame object that reports the diversity of each sample.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_thresh = 2, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
community_object <- create_community_matrix_object(cluster_results)
alpha_summary(community_object = community_object, size = 400, threshold = 100, diversity_index = c("shannon", "simpson", "richness"), subsample = TRUE, iterations = 1, number_of_threads = 1)
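As a rough illustration of what the indices and the rarefaction step compute, the following base R sketch works on a hypothetical abundance vector. It is not the package's implementation: the natural logarithm, the Gini-Simpson form (1 minus the sum of squared proportions), and rarefaction by sampling individuals without replacement are all assumptions made here, and the threshold argument is not modelled.

```r
# Hypothetical integer abundances for one sample (one entry per cluster).
abundance <- c(40, 30, 20, 10, 0)

# Shannon index: -sum(p * log(p)) over the non-zero proportions.
shannon_index <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Simpson index, assumed here to be the Gini-Simpson form 1 - sum(p^2).
simpson_index <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Richness: the number of entries with non-zero abundance.
richness <- function(x) sum(x > 0)

# Rarefy to a fixed size by drawing individuals without replacement.
rarefy_counts <- function(x, size) {
  individuals <- rep(seq_along(x), times = x)
  drawn <- sample(individuals, size = size)
  tabulate(drawn, nbins = length(x))
}

set.seed(123)
rarefied <- rarefy_counts(abundance, size = 50)
c(shannon = shannon_index(rarefied),
  simpson = simpson_index(rarefied),
  richness = richness(rarefied))
```

Repeating the rarefaction several times and averaging the indices corresponds loosely to what the iterations argument describes.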
Annotate query LC-MS/MS features in a mass_data object given a
reference list.
annotate_ms2() allows for annotation of mass spectrometry features.
Similarity between query and reference level 2 spectra is determined via
spectral scoring methods. Currently, the scoring methods "gnps" and
"spectral_entropy" are supported. The scoring method is specified by the
scoring_params argument, which is a list of parameters for the
chosen scoring method. Parameters for "gnps" and "spectral_entropy" can be
created with the functions modified_cosine_params() and
spec_entropy_params(), respectively. If you are using an HMDB
reference database, or a database that does not contain a precursor mz,
please ensure that you expand your ppm to account for the difference.
annotate_ms2(mass_data, reference, scoring_params, ppm, min_score, chemical_min_score, cluster_data = NULL, min_peaks = 0, number_of_threads = detectCores())
mass_data |
The object generated from |
reference |
Your reference database generated from the
|
scoring_params |
Parameters for scoring method to be applied.
This can be either |
ppm |
Parts per million. MS2 scans with a difference in ppm less than or equal to this value will be scored. |
min_score |
Similarity score threshold to determine a match for annotation. Comparisons with scores below this value will not be reported. |
chemical_min_score |
data2 |
cluster_data |
the cluster object that was generated by
|
min_peaks |
the minimum number of peaks that need to be present before you compare the ms2 spectra. |
number_of_threads |
the number of threads you wish to use for this calculation. Defaults to the number of threads on your computer. |
A data.frame with all comparisons with scores above the threshold.
Information for the query scan includes "query_ms1_id" (the variable_id
for features in expression_data of the mass_data object),
"query_ms2_id" (the ms2_spectrum_id in the query
object), "query_mz" (the precursor mz for the scan), and "query_rt"
(the retention time for the scan). query_mz and query_rt are derived
from the ms2 matches data. A column ("ref_idx") is included to report the
location of the matching reference molecule in "reference". Scores
are reported in the "score" column. query_formula and
chemical_similarity are also reported. Annotation information is returned
given the information provided in the reference used as input.
a data.frame object containing annotations
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
massbank <- read_msp(mums2_example("MSMS-Neg-Respect.msp"))
annotations <- annotate_ms2(mass_data = matched_data, reference = massbank, scoring_params = modified_cosine_params(0.5), ppm = 1.6e3, min_score = 0.5, chemical_min_score = 0, number_of_threads = 2)
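To make the ppm argument more concrete, here is a minimal base R sketch of how a parts-per-million difference between a query and a reference precursor mz is commonly computed. The helper ppm_difference and the example values are hypothetical, and dividing by the reference mz is an assumption; the package may normalise differently.

```r
# Hypothetical helper: ppm difference relative to the reference m/z.
ppm_difference <- function(query_mz, reference_mz) {
  abs(query_mz - reference_mz) / reference_mz * 1e6
}

query_mz <- 180.0640
reference_mz <- c(180.0634, 180.0700, 181.0707)

ppm_values <- ppm_difference(query_mz, reference_mz)

# Only references within the tolerance (10 ppm here) would be scored.
within_tolerance <- ppm_values <= 10
data.frame(reference_mz, ppm = round(ppm_values, 2), within_tolerance)
```

A very large tolerance, such as the 1.6e3 used in the example above, effectively lets almost every reference spectrum through to the scoring step.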
This function converts the retention time (rt) in your ms1 peak table to
seconds or minutes. This modification happens in place
(by reference), so the mpactr_object will be updated.
change_rt_to_seconds_or_minute(mpactr_object, rt_type = "seconds")
mpactr_object |
The object created from |
rt_type |
how you want to convert your retention time; the options are minutes or seconds. Defaults to seconds. |
a modified mpactr object.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
change_rt_to_seconds_or_minute(data, "minutes")
cluster_data() allows users to cluster features inside
the mass data object. This is done by creating a sparse matrix
using the dist_ms2() function and passing it to the
clustur package. This allows us to easily cluster features
that contain an ms2 spectrum.
cluster_data(distance_df, ms2_match_data, cutoff = 0.3, cluster_method = "opticlust")
distance_df |
a distance df that was generated
from the |
ms2_match_data |
your mass data set object generated
from |
cutoff |
the cutoff value you wish to cluster to. |
cluster_method |
the clustering algorithm you wish to use. The options are: furthest, nearest, weighted, average, and opticlust. |
a shared data.frame (or a mothur_cluster object) displaying all
the clustered and abundance data.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.6, precursor_thresh = 100, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
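For intuition about clustering to a distance cutoff, the sketch below uses base R's hclust() and cutree() on a small hypothetical distance matrix. This is only an analogy: it is not the opticlust algorithm and not the code path cluster_data() uses. The "complete" linkage is used as a rough counterpart of a furthest-neighbor method ("single" and "average" would be the rough counterparts of nearest and average).

```r
# Hypothetical pairwise distances between four features.
d <- matrix(c(0.00, 0.10, 0.80, 0.90,
              0.10, 0.00, 0.85, 0.95,
              0.80, 0.85, 0.00, 0.20,
              0.90, 0.95, 0.20, 0.00),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("f", 1:4), paste0("f", 1:4)))

# Complete linkage: clusters merge at their largest pairwise distance.
tree <- hclust(as.dist(d), method = "complete")

# Cut the tree at the cutoff so that merges above 0.3 are not applied.
clusters <- cutree(tree, h = 0.3)
clusters
```

Here f1 and f2 end up in one cluster and f3 and f4 in another, because only those pairs are closer than the cutoff.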
Add another database to your reference database.
combined_reference_database(reference, other_reference)
reference |
reference database object. |
other_reference |
your other reference database object |
a reference_database that includes references from both
reference databases.
reference <- read_msp(mums2_example("massbank_example_data.msp"))
reference2 <- read_msp(mums2_example("massbank_example_data.msp"))
combined_reference_database(reference, reference2)
A de novo algorithm for computing molecular formulas. Using fragmentation trees, we are able to generate a resultant molecular formula. To keep the computation efficient, we use a greedy heuristic to generate the resultant formula. Although this may not always result in the correct prediction, it allows us to efficiently calculate a multitude of chemical formulas.
compute_molecular_formulas(mass_data, parent_ppm = 3, number_of_threads = detectCores() - 1)
mass_data |
your mass_data object generated from |
parent_ppm |
the ppm tolerance used to generate the candidate molecular formulas. Default is 3. |
number_of_threads |
the number of threads you wish to use for this calculation. Defaults to one fewer than the number of threads on your computer. |
your mass_data object with an additional character
vector of all the predicted formulas.
Sebastian Böcker, Florian Rasche, Towards de novo identification of metabolites by analyzing tandem mass spectra, Bioinformatics, Volume 24, Issue 16, August 2008, Pages i49–i55, https://doi.org/10.1093/bioinformatics/btn270
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 0.1, 1)
compute_molecular_formulas(matched_data, number_of_threads = 2)
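The role of parent_ppm can be pictured with a small base R sketch that keeps only the candidate formulas whose monoisotopic mass lies within a ppm window of a measured neutral mass. This covers only the candidate-generation idea; it does not build fragmentation trees, and the candidate list, the measured mass, and the helper formula_mass are hypothetical.

```r
# Monoisotopic masses of a few elements (in Da).
monoisotopic <- c(C = 12.000000, H = 1.00782503, N = 14.00307401, O = 15.99491462)

# Hypothetical helper: mass of a formula given as named element counts.
formula_mass <- function(counts) {
  sum(monoisotopic[names(counts)] * counts)
}

# Hypothetical candidate formulas for a measured neutral mass.
candidates <- list(
  C6H12O6   = c(C = 6, H = 12, O = 6),
  C7H16O5   = c(C = 7, H = 16, O = 5),
  C5H12N2O5 = c(C = 5, H = 12, N = 2, O = 5)
)
measured_mass <- 180.0635
parent_ppm <- 3

masses <- vapply(candidates, formula_mass, numeric(1))
ppm_error <- abs(masses - measured_mass) / measured_mass * 1e6

# Only candidates within the ppm window are kept for further scoring.
kept <- names(masses)[ppm_error <= parent_ppm]
kept
```

A wider parent_ppm admits more candidates, which increases the work needed to rank them.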
To account for users measuring their data in triplicate or in other replicated designs, we have implemented a function that can transform your matched data object to use group averages instead of each sample individually.
convert_to_group_averages(matched_data, mpactr_object)
matched_data |
your mass data set
object generated from |
mpactr_object |
The object created from |
a mass_data object using group averages
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
matched_data_avg <- convert_to_group_averages(matched_data, data)
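Conceptually, converting to group averages replaces the replicate columns of each group with their mean. The base R sketch below shows that idea on a hypothetical abundance matrix with two groups measured in triplicate; the object layout inside a mass_data object is not assumed to look like this.

```r
# Hypothetical abundances: two features, two groups measured in triplicate.
abundance <- matrix(c(10, 12, 14, 100, 110, 90,
                       0,  3,  0,  50,  40, 60),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("feature_1", "feature_2"),
                                    c("a_1", "a_2", "a_3", "b_1", "b_2", "b_3")))

# Group label for each sample column, as it might come from the metadata.
groups <- c("A", "A", "A", "B", "B", "B")

# Average the columns that belong to each group.
group_means <- sapply(split(seq_along(groups), groups), function(idx) {
  rowMeans(abundance[, idx, drop = FALSE])
})
group_means
```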
Converts your community_object into a community matrix so that the data are easier to work with.
create_community_matrix(cluster_object)
cluster_object |
the result of the |
a data.frame object of your community_object.
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_thresh = 2, score_params = modified_cosine_params(0.5), min_peaks = 0)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
community_matrix <- create_community_matrix_object(cluster_results)
Using the data generated from clustering or from adding ms2 data to your object, we are able to create a community matrix object. The community matrix object stores the same data as a community matrix, but within a C++ object. We use this object to conduct analyses more efficiently.
create_community_matrix_object(data)
## S3 method for class 'mass_data'
create_community_matrix_object(data)
## S3 method for class 'mothur_cluster'
create_community_matrix_object(data)
data |
the result of the |
an external pointer to an Rcpp object.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_thresh = 2, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
community_with_cluster <- create_community_matrix_object(cluster_results)
community_object_mass_data <- create_community_matrix_object(matched_data)
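As a plain R picture of the data a community matrix holds, the sketch below sums hypothetical feature abundances within each cluster and arranges the result with samples in rows. The sample-by-cluster orientation and all object names are assumptions made for illustration; the community matrix object itself stores its data in C++ and may be organised differently.

```r
# Hypothetical feature-by-sample abundances.
feature_abundance <- matrix(c(5, 0,
                              3, 1,
                              2, 4,
                              0, 7),
                            nrow = 4, byrow = TRUE,
                            dimnames = list(paste0("feature_", 1:4),
                                            c("sample_1", "sample_2")))

# Hypothetical cluster assignment for each feature.
cluster_assignment <- c("cluster_1", "cluster_1", "cluster_2", "cluster_2")

# Sum the abundances of all features that share a cluster.
cluster_abundance <- rowsum(feature_abundance, group = cluster_assignment)

# Samples in rows, clusters in columns.
community_matrix <- t(cluster_abundance)
community_matrix
```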
dist_ms2 calculates and stores all non-zero distance values that fall within
the user-defined cutoff (the examples below use 0.3).
dist_ms2(data, cutoff, precursor_threshold, score_params, min_peaks = 6, number_of_threads = detectCores())
data |
the object generated from |
cutoff |
The maximum distance value ( |
precursor_threshold |
Precursor mz tolerance. MS2 scans with a difference in precursor mz less than or equal to this value will be scored. Disable this by setting this value to -1 or less. |
score_params |
Parameters for scoring method to be applied.
See |
min_peaks |
the minimum number of peaks that need to be present before you compare the ms2 spectra. |
number_of_threads |
the number of threads you wish to use for this calculation. Defaults to the number of threads on your computer. |
This function takes a mass_data object as input and calculates the distance
between ms2 peaks. Currently, MS1 features without MS2 peaks return
no distance value. Distance can be calculated with the method "gnps"
or "spectral_entropy". A sparse matrix is returned.
A sparse matrix of class "data.frame"
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist_gnps <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_threshold = 2, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
dist_entropy <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_threshold = 2, score_params = spec_entropy_params(), min_peaks = 0, number_of_threads = 2)
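To give a feel for the two kinds of score, the following base R sketch computes a plain cosine similarity and an unweighted spectral entropy similarity for two hypothetical spectra that have already been aligned to the same mz bins. It is a simplification made under stated assumptions: peak matching within a tolerance, the precursor-shift handling of the GNPS modified cosine, and the intensity weighting described by Li et al. are all left out, and treating distance as 1 minus similarity is assumed here.

```r
# Hypothetical intensities of two spectra on the same four m/z bins.
spec_a <- c(10, 50, 40, 0)
spec_b <- c(0, 45, 45, 10)

# Plain cosine similarity between two intensity vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Shannon entropy of a normalised spectrum (zero bins contribute nothing).
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Unweighted entropy similarity: 1 - (2 * S_AB - S_A - S_B) / log(4),
# where S_AB is the entropy of the 1:1 mixture of the two spectra.
entropy_similarity <- function(a, b) {
  a <- a / sum(a)
  b <- b / sum(b)
  merged <- (a + b) / 2
  1 - (2 * shannon_entropy(merged) - shannon_entropy(a) - shannon_entropy(b)) / log(4)
}

# Assumed conversion from similarity to distance.
distances <- c(cosine = 1 - cosine_similarity(spec_a, spec_b),
               entropy = 1 - entropy_similarity(spec_a, spec_b))
distances
```

Both similarities equal 1 for identical spectra and 0 for spectra with no shared peaks, so the corresponding distances run from 0 to 1.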
Creates a list of filter cv arguments
for the filter_peak_table() function
filter_cv_params(cv_threshold = NULL, copy_object = FALSE)
cv_threshold |
Coefficient of variation threshold. A lower cv_threshold will result in more stringent filtering and higher reproducibility. Recommended values between 0.2 - 0.5. |
copy_object |
A |
a list object of arguments needed to call the given mpactr
function when supplied to the filter_peak_table() wrapper function.
filter_cv_params(0.2)
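The coefficient of variation behind cv_threshold is the standard deviation divided by the mean. The base R sketch below applies it to hypothetical replicate measurements; how mpactr groups replicates and summarises the CV per feature is not reproduced here.

```r
# Hypothetical replicate abundances for two features.
replicates <- matrix(c(100, 105,  95,
                       100,  20, 180),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("stable_feature", "noisy_feature"),
                                     paste0("rep_", 1:3)))

# Coefficient of variation per feature: sd / mean.
cv <- apply(replicates, 1, function(x) sd(x) / mean(x))

# Features at or below the threshold would be kept.
cv_threshold <- 0.2
keep <- cv <= cv_threshold
data.frame(cv = cv, keep = keep)
```

Lowering the threshold removes more features, which matches the note above that a lower cv_threshold is more stringent.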
Creates a list of filter group arguments
for the filter_peak_table() function
filter_group_params(group_threshold = 0.01, group_to_remove, remove_ions = TRUE, copy_object = FALSE)
group_threshold |
Relative abundance threshold at which to remove ions. Default = 0.01. |
group_to_remove |
Biological group name to remove ions from. |
remove_ions |
A |
copy_object |
A |
a list object of arguments needed to call the given mpactr
function when supplied to the filter_peak_table() wrapper function.
filter_group_params(group_to_remove = "blank")
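One way to read group_threshold is as a limit on the share of a feature's signal that comes from the group being removed. The base R sketch below follows that reading on hypothetical group means; defining relative abundance as a group's mean divided by the sum of all group means is an assumption, not a statement of how mpactr computes it.

```r
# Hypothetical mean abundance of each feature in each biological group.
group_means <- matrix(c(1000,   5,
                          20, 400),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("feature_1", "feature_2"),
                                      c("sample", "blank")))

# Share of each feature's total signal contributed by each group.
relative_abundance <- group_means / rowSums(group_means)

# Flag features whose share in the blank group exceeds the threshold.
group_threshold <- 0.01
remove <- relative_abundance[, "blank"] > group_threshold
remove
```

In this example feature_2 is flagged because most of its signal comes from the blank group.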
Creates a list of filter insource ions arguments
for the filter_peak_table() function
filter_insource_ions_params(cluster_threshold = 0.95, copy_object = FALSE)
cluster_threshold |
Cluster threshold for ion deconvolution. Default = 0.95. |
copy_object |
A |
a list object of arguments needed to call the given mpactr function
when supplied to the filter_peak_table() wrapper function.
filter_insource_ions_params()
Creates a list of filter mispicked ions arguments
for the filter_peak_table() function
filter_mispicked_ions_params(ringwin = 0.5, isowin = 0.01, trwin = 0.005, max_iso_shift = 3, merge_peaks = TRUE, merge_method = "sum", copy_object = FALSE)
ringwin |
Ringing mass window or detector saturation mass window. Default = 0.5 atomic mass units (AMU). |
isowin |
Isotopic mass window. Default = 0.01 AMU. |
trwin |
A |
max_iso_shift |
A |
merge_peaks |
A |
merge_method |
If merge_peaks is TRUE, the method used to merge similar peaks. The only option listed is "sum". |
copy_object |
A |
a list object of arguments needed to call the given
mpactr function when supplied to the filter_peak_table() wrapper function.
filter_mispicked_ions_params()
This function is a wrapper for all of mpactr's filter functions.
When called with a list of parameters generated by one of the
following functions, it will call the corresponding filter:
filter_mispicked_ions_params(), filter_group_params(),
filter_cv_params(), and filter_insource_ions_params().
You can also find more information on these functions in the mpactr
documentation.
filter_peak_table(mpactr_object, params)
## S3 method for class 'filter_mispicked_ions'
filter_peak_table(mpactr_object, params)
## S3 method for class 'filter_group'
filter_peak_table(mpactr_object, params)
## S3 method for class 'filter_cv'
filter_peak_table(mpactr_object, params)
## S3 method for class 'filter_insource_ions'
filter_peak_table(mpactr_object, params)
mpactr_object |
the mpactr_object is an object generated from the
|
params |
the list of arguments generated from calling one of these
functions: |
an mpactr object that has been filtered based on
the supplied parameters.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
filtered_data <- data |> filter_peak_table(filter_mispicked_ions_params())
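The usage section lists one method per parameter class, which suggests that the filter is chosen from the class of the params list. The toy base R sketch below shows how such dispatch on the second argument can be written with UseMethod(). Every name in it (toy_filter, toy_cv_params, toy_group_params, and the toy classes) is hypothetical, and it is not taken from the package source.

```r
# Hypothetical constructors that tag a parameter list with a class.
toy_cv_params <- function(cv_threshold) {
  structure(list(cv_threshold = cv_threshold), class = "toy_filter_cv")
}
toy_group_params <- function(group_to_remove) {
  structure(list(group_to_remove = group_to_remove), class = "toy_filter_group")
}

# Generic that dispatches on the class of `params` rather than `object`.
toy_filter <- function(object, params) {
  UseMethod("toy_filter", params)
}

# One method per parameter class.
toy_filter.toy_filter_cv <- function(object, params) {
  paste("cv filter at", params$cv_threshold)
}
toy_filter.toy_filter_group <- function(object, params) {
  paste("group filter removing", params$group_to_remove)
}

toy_filter(NULL, toy_cv_params(0.2))
toy_filter(NULL, toy_group_params("blank"))
```

With this pattern a single wrapper can be chained with the pipe, and the parameter constructor alone decides which filter runs.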
This function uses the generated matched data, annotations, and cluster data to create a combined data.frame of all the generated data. It can also create the data.frame without annotations or clustering data. However, if annotations are supplied and a feature has more than one annotation, the data will be returned in long format.
generate_a_combined_table(matched_data, annotations = NULL, cluster_data = NULL)
matched_data |
mass_data object created from |
annotations |
annotations table created from |
cluster_data |
cluster data created from |
a data.frame object.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_thresh = 2, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
massbank <- read_msp(mums2_example("massbank_example_data.msp"))
annotations <- annotate_ms2(mass_data = matched_data, reference = massbank, scoring_params = modified_cosine_params(0.5), ppm = 1000, min_score = 0.1, chemical_min_score = 0, number_of_threads = 2)
generate_a_combined_table(matched_data, annotations, cluster_results)
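The long format mentioned in the description can be pictured with a base R merge() on hypothetical tables: a feature with two annotations appears on two rows, and a feature without any annotation is kept with missing values. The column names below are invented for the illustration and do not reflect the columns the function returns.

```r
# Hypothetical feature table.
features <- data.frame(feature_id = c("f1", "f2", "f3"),
                       mz = c(180.06, 255.23, 301.14))

# Hypothetical annotations: f1 has two candidate annotations, f2 has none.
annotations <- data.frame(feature_id = c("f1", "f1", "f3"),
                          compound = c("glucose", "fructose", "compound_x"),
                          score = c(0.9, 0.8, 0.7))

# Left join: every feature is kept, one row per annotation.
combined <- merge(features, annotations, by = "feature_id", all.x = TRUE)
combined
```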
Returns the community matrix or the data
that you used to create the object.
get_community_matrix(community_object)
community_object |
the object created from
the |
a matrix based on the community object.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
dist <- dist_ms2(data = matched_data, cutoff = 0.3, precursor_thresh = 2, score_params = modified_cosine_params(0.5), min_peaks = 0, number_of_threads = 2)
cluster_results <- cluster_data(distance_df = dist, ms2_match_data = matched_data, cutoff = 0.3, cluster_method = "opticlust")
community_with_cluster <- create_community_matrix_object(cluster_results)
community_object_mass_data <- create_community_matrix_object(matched_data)
get_community_matrix(community_with_cluster)
Returns all of the molecular formula predictions.
get_molecular_formula_preds(mass_data)
mass_data |
The object generated from |
a character vector containing all of your predicted molecular
formulas.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 0.1, 1)
matched_data <- compute_molecular_formulas(matched_data, number_of_threads = 2)
get_molecular_formula_preds(matched_data)
A getter that returns the ms1 data in your mass_data object.
get_ms1_data(mass_data)
mass_data |
The object generated from |
a data.frame object containing ms1 data.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
get_ms1_data(matched_data)
A getter that returns the generated ms2 data in your mass_data object.
get_ms2_matches(mass_data)
mass_data |
The object generated from |
a data.frame object containing ms2 data.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
get_ms2_matches(matched_data)
A getter that returns the peak data of all of the matched spectra. The peak data is the list of mz/intensity values found in the ms2 file.
get_ms2_peaks_data(mass_data)
mass_data |
The object generated from |
a list object containing peak data.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
get_ms2_peaks_data(matched_data)
Will return the data inside the reference object based on the index given.
get_reference_data(reference, index)
reference |
reference database object. |
index |
the index of the data. The index starts at 1. |
returns a list object with all of the reference data at
the specified index.
reference <- read_msp(mums2_example("massbank_example_data.msp"))
get_reference_data(reference, 1)
Returns a list of your samples found in the metadata file.
get_samples(mass_data)
mass_data |
The object generated from |
a character vector containing all of your samples.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
get_samples(matched_data)
This function is a wrapper for the mpactr import_data function. It imports your peak table and metadata and creates an mpactr object.
import_all_data(peak_table, metadata, format)
peak_table |
The file path to your feature table file. |
metadata |
The file path to your metadata file or |
format |
The expected exported type of your peak table, can be one of "Progenesis", "Metaboscape", "None". |
an mpactr object.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
Returns the length of the reference database.
## S3 method for class 'reference_database'
length(x)
x |
reference database object. |
returns the length of the reference database
reference <- read_msp(mums2_example("massbank_example_data.msp"))
length(reference)
modified_cosine_params() generates a parameter list to perform GNPS-like
cosine similarity score calculation between two MS2 spectra.
modified_cosine_params(frag_tolerance)
frag_tolerance |
The mz fragment tolerance threshold for aligning fragment peaks from two ms2 spectra. GNPS default = 0.5. |
modified_cosine_params() will initiate cosine scoring based on the Python
code by Wang et al. (2016), which is currently used for cosine scoring
in GNPS, to calculate similarity between two MS2 spectra. This scoring
method will compare peaks data, apply a square root normalization
to peak intensities, align peaks both with and without correction
for mass shifts, and calculate similarity.
A parameters list for similarity scoring method "gnps"
Mingxun Wang, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen et al. "Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking." Nature biotechnology 34, no. 8 (2016): 828. PMID: 27504778
modified_cosine_params(0.5)
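The scoring steps described above (square-root scaling of intensities, aligning peaks within a fragment tolerance, then summing the products of matched peaks) can be sketched in base R. This is a simplified illustration, not the GNPS or mums2 implementation: `simple_cosine()` is a hypothetical helper, it pairs peaks greedily, and it leaves out the alignment with mass-shift correction that the modified cosine also performs.

```r
# Illustrative only: a simplified cosine score between two MS2 spectra.
# `simple_cosine` is a hypothetical helper and is not part of mums2.
simple_cosine <- function(mz_a, int_a, mz_b, int_b, frag_tolerance = 0.5) {
  # Square-root scaling, then normalise each spectrum to unit length
  a <- sqrt(int_a)
  a <- a / sqrt(sum(a^2))
  b <- sqrt(int_b)
  b <- b / sqrt(sum(b^2))

  # All candidate peak pairs whose m/z values differ by <= frag_tolerance
  pairs <- expand.grid(i = seq_along(mz_a), j = seq_along(mz_b))
  pairs <- pairs[abs(mz_a[pairs$i] - mz_b[pairs$j]) <= frag_tolerance, ]
  pairs$score <- a[pairs$i] * b[pairs$j]
  pairs <- pairs[order(pairs$score, decreasing = TRUE), ]

  # Greedy assignment: each peak may be used at most once
  used_a <- logical(length(mz_a))
  used_b <- logical(length(mz_b))
  total <- 0
  for (k in seq_len(nrow(pairs))) {
    i <- pairs$i[k]
    j <- pairs$j[k]
    if (!used_a[i] && !used_b[j]) {
      total <- total + pairs$score[k]
      used_a[i] <- TRUE
      used_b[j] <- TRUE
    }
  }
  total
}

mz <- c(100.0, 150.1, 200.2)
intensity <- c(10, 50, 100)
score_same <- simple_cosine(mz, intensity, mz, intensity)        # identical spectra
score_shifted <- simple_cosine(mz, intensity, mz + 5, intensity) # no peaks within tolerance
```

Identical spectra score close to 1, and spectra with no peaks inside the tolerance score 0. A wider `frag_tolerance` lets more peaks pair up, which generally raises the score.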
Matches your MS1 features to the supplied MS2 spectra by comparing the differences in their mz and rt (retention time) values.
ms2_ms1_compare(ms2_files, mpactr_object, mz_tolerance, rt_tolerance)
ms2_files |
a list of all your mgf, mzml, or mzxml files. |
mpactr_object |
your mpactr object created from |
mz_tolerance |
your mass-charge ratio tolerance in ppm (parts per million). |
rt_tolerance |
your retention time tolerance. |
returns a mass_data object of all of the ms2 and ms1 matches.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
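To make the two tolerances concrete, the sketch below applies the conventional definition of a ppm mass error together with an absolute retention-time difference. The helper `within_tolerance()` is hypothetical and not part of mums2. The exact rule that ms2_ms1_compare() applies internally (for example, which mass serves as the ppm reference, or the retention-time units) is not stated here, so treat this as an assumption.

```r
# Illustrative helper (not part of mums2): is an MS2 precursor within the
# m/z (ppm) and retention-time tolerances of an MS1 feature?
within_tolerance <- function(ms1_mz, ms1_rt, ms2_mz, ms2_rt,
                             mz_tolerance, rt_tolerance) {
  # Mass error in parts per million, relative to the MS1 m/z (assumed reference)
  ppm_error <- abs(ms2_mz - ms1_mz) / ms1_mz * 1e6
  # Absolute retention-time difference, in the same units as rt_tolerance
  rt_error <- abs(ms2_rt - ms1_rt)
  ppm_error <= mz_tolerance & rt_error <= rt_tolerance
}

# 0.0004 Da at m/z 500 is 0.8 ppm, so this pair is inside a 1 ppm window
close_match <- within_tolerance(500.0000, 120, 500.0004, 123,
                                mz_tolerance = 1, rt_tolerance = 6)
# 0.0010 Da at m/z 500 is 2 ppm, so this pair is outside it
far_match <- within_tolerance(500.0000, 120, 500.0010, 123,
                              mz_tolerance = 1, rt_tolerance = 6)
```

Because ppm is a relative measure, the same `mz_tolerance` corresponds to a wider absolute window at higher m/z.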
mums2 contains a number of example files in the inst/extdata directory.
This function makes them accessible in documentation that shows how file
paths are used in function examples.
mums2_example(file = NULL)
file |
Name of a file. If |
A file path to example data stored in the inst/extdata directory
of the package.
returns a character object
mums2_example()
mums2_example("massbank_example_data.msp")
S3 function for printing the community object.
## S3 method for class 'community_object'
print(x, ...)
x |
the object created from the |
... |
other parameters that are included in the |
returns matrix representation of the community object and
prints it to the screen.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
community_object <- create_community_matrix_object(matched_data)
print(community_object)
Prints reference database objects.
## S3 method for class 'reference_database'
print(x, ...)
x |
reference database object. |
... |
any extra print arguments you want to include. |
prints customized message to the console
reference <- read_msp(mums2_example("massbank_example_data.msp"))
print(reference)
rarefy_ms() performs a single subsampling of the MS1 features in a sample.
Feature intensities are subsampled to the supplied size, and the function
accounts for intensity thresholds due to machine limits and background noise.
Specifically, features whose abundance falls below the threshold
after rarefying are removed. This allows for accurate representation
of samples at different dilutions regardless of the desired
subsampling size.
rarefy_ms(
  community_object,
  size,
  threshold,
  number_of_threads = detectCores(),
  seed = 123
)
community_object |
A |
size |
The desired total sample intensity to subsample to. |
threshold |
The individual feature threshold. Each subsampled feature must be >= this value to be retained. |
number_of_threads |
the number of threads you wish to use for this calculation. Defaults to the number of threads on your computer. |
seed |
the RNG (random number generator) seed you would like to use. |
An external_pointer that references a community
matrix of rarefied feature intensities.
Returns a matrix object that contains your rarefied data.
data <- import_all_data(peak_table = mums2::mums2_example("botryllus_pt_small.csv"), metadata = mums2::mums2_example("boryillus_metadata.csv"), format = "None")
matched_data <- ms2_ms1_compare(mums2_example("botryllus_v2.gnps.mgf"), data, 1, 6)
community_object <- create_community_matrix_object(matched_data)
rarefy_ms(community_object, 400, 10)
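The interaction between `size` and `threshold` can be illustrated with a small base R sketch for a single sample. It is not the mums2 implementation: `rarefy_sample()` is a hypothetical helper, and it assumes subsampling can be approximated by a multinomial draw (sampling with replacement, in proportion to intensity).

```r
# Illustrative only: rarefy one sample's feature intensities to a fixed total,
# then drop features that fall below a per-feature threshold.
# `rarefy_sample` is a hypothetical helper and is not part of mums2.
rarefy_sample <- function(intensities, size, threshold, seed = 123) {
  set.seed(seed)
  # Draw `size` intensity units; each feature is chosen in proportion to its
  # intensity (multinomial draw, an assumption made for this sketch)
  rarefied <- as.vector(rmultinom(1, size = size, prob = intensities))
  names(rarefied) <- names(intensities)
  # Features below the threshold are treated as undetected
  rarefied[rarefied < threshold] <- 0
  rarefied
}

sample_a <- c(f1 = 5000, f2 = 3000, f3 = 1500, f4 = 20)
rarefied <- rarefy_sample(sample_a, size = 400, threshold = 10)
rarefied
```

A low-intensity feature such as `f4` is expected to receive less than one unit out of 400 here, so it falls under the threshold and is removed. The same removal applies to any feature that would sit below the detection limit at the rarefied depth.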
This function allows you to create an HMDB reference database. You are required to supply an HMDB XML file and the path to a folder that contains all of the MS2 spectra from the HMDB download page https://www.hmdb.ca/downloads.
read_hmdb(hmdb_file, ms2_folder)
hmdb_file |
the xml hmdb file. |
ms2_folder |
the folder path of your ms2 spectra files. |
a reference_database object.
Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D622-D631. doi: 10.1093/nar/gkab1062. PMID: 34986597; PMCID: PMC8728138.
read_msp(mums2_example("massbank_example_data.msp"))
Creates a reference database by reading a downloaded msp file. These files can be downloaded from sites like https://systemsomicslab.github.io/compms/msdial/main.html#MSP or https://mona.fiehnlab.ucdavis.edu/downloads
read_msp(msp_file)
msp_file |
the file path of your msp file |
a reference_database object.
read_msp(mums2_example("massbank_example_data.msp"))
Calculate spectral entropy similarity between two MS2 spectra
spec_entropy_params(
  ms2_tolerance_in_da = 0.02,
  ms2_tolerance_in_ppm = -1,
  clean_spectra = TRUE,
  min_mz = 0,
  max_mz = 1000,
  noise_threshold = 0.01,
  max_peak_num = 100,
  weighted = TRUE
)
ms2_tolerance_in_da |
MS2 peak tolerance in Da, set to -1 to disable.
Defaults to |
ms2_tolerance_in_ppm |
MS2 peak tolerance in ppm, set to -1 to disable.
Defaults to |
clean_spectra |
Either |
min_mz |
|
max_mz |
|
noise_threshold |
Background intensity threshold, all peaks with
intensity < noise_threshold * max_intensity are removed. Set to -1 to
disable. Defaults to |
max_peak_num |
|
weighted |
|
spec_entropy_params() will initiate spectral entropy similarity scoring via
the msentropy package (Li et al. 2021). For more information about the
parameters, see their GitHub.
A parameters list for similarity scoring method "spectral_entropy"
Li, Y., Kind, T., Folz, J. et al. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification, Nat Methods 18, 1524–1531 (2021). https://doi.org/10.1038/s41592-021-01331-z
spec_entropy_params()
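For intuition about what the noise threshold and the entropy score do, here is a minimal base R sketch. It is not the msentropy implementation, and the helper names are hypothetical. It shows the unweighted form of entropy similarity on spectra whose peaks are assumed to be aligned already, whereas spec_entropy_params() defaults to `weighted = TRUE` and handles peak matching through the tolerance parameters.

```r
# Illustrative only; these helpers are hypothetical and not part of
# mums2 or msentropy.

# Drop peaks with intensity < noise_threshold * max intensity
remove_noise <- function(intensity, noise_threshold = 0.01) {
  intensity[intensity >= noise_threshold * max(intensity)]
}

# Shannon entropy of a spectrum after scaling intensities to sum to 1
spectral_entropy <- function(intensity) {
  p <- intensity[intensity > 0] / sum(intensity)
  -sum(p * log(p))
}

# Unweighted entropy similarity for two spectra whose peaks are already
# aligned position-by-position (0 marks a peak missing from that spectrum)
entropy_similarity <- function(int_a, int_b) {
  p_a <- int_a / sum(int_a)
  p_b <- int_b / sum(int_b)
  mixed <- (p_a + p_b) / 2
  1 - (2 * spectral_entropy(mixed) - spectral_entropy(p_a) -
         spectral_entropy(p_b)) / log(4)
}

cleaned <- remove_noise(c(1000, 400, 5))   # the 5 is below 1% of 1000
sim_same <- entropy_similarity(c(10, 50, 100), c(10, 50, 100))
sim_disjoint <- entropy_similarity(c(1, 1, 0, 0), c(0, 0, 1, 1))
```

Identical spectra give a similarity of 1, and spectra that share no peaks give 0. Everything in between reflects how much the entropy of the mixed spectrum exceeds the average entropy of the two inputs.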