Title: | PARAFAC Analysis of EEMs from DOM |
---|---|
Description: | 'This is a user-friendly way to run a parallel factor (PARAFAC) analysis (Harshman, 1971) <doi:10.1121/1.1977523> on excitation emission matrix (EEM) data from dissolved organic matter (DOM) samples (Murphy et al., 2013) <doi:10.1039/c3ay41160e>. The analysis includes profound methods for model validation. Some additional functions allow the calculation of absorbance slope parameters and create beautiful plots.' |
Authors: | Matthias Pucher [aut, cre], Daniel Graeber [aut, ctb], Stefan Preiner [ctb], Renata Pinto [ctb] |
Maintainer: | Matthias Pucher <[email protected]> |
License: | AGPL-3 |
Version: | 1.1.29 |
Built: | 2025-03-04 05:50:58 UTC |
Source: | https://github.com/matthiaspucher/stardom |
Import EEMs from generic csv files.
.eem_csv(file, col = "ex")
.eem_csv(file, col = "ex")
file |
path to file |
col |
either "ex" or "em", whatever wavelength is arranged in columns |
list with EEM data
Add data of a PARAFAC model derived from multiway from EEMs
.trans_parafac(parafac, em, ex, samples, comp, const, norm_factors)
.trans_parafac(parafac, em, ex, samples, comp, const, norm_factors)
parafac |
parafac model |
em |
emission wavelengths |
ex |
excitation wavelengths |
samples |
sample names |
comp |
number of components |
const |
constraints |
norm_factors |
factors to invert normalisation |
parafac model
Samples from an eemlist that were not used in the modelling process are added as entries in the A-modes. Values are calculated using fixed B and C modes in the PARAFAC algorithm. B and C modes can be provided via a previously calculated model or as matrices manually.
A_missing( eem_list, pfmodel = NULL, cores = parallel::detectCores(logical = FALSE), adj_samples = TRUE, components = NULL, const = NULL, control = NULL, ... )
A_missing( eem_list, pfmodel = NULL, cores = parallel::detectCores(logical = FALSE), adj_samples = TRUE, components = NULL, const = NULL, control = NULL, ... )
eem_list |
object of class eemlist with sample data |
pfmodel |
object of class parafac |
cores |
number of cores to use for parallel processing |
adj_samples |
logical, whether wavelengths in the samples are cut automatically to fit the present data in the model |
components |
optionally supply components to use manually, either as a variable of class parafac_components or as a list of variables of class parafac_components, if you do so, |
const |
optional constraints for model, just used, when components are supplied |
control |
optional constraint control parameters for model, just used, when components are supplied |
... |
additional arguments passed to eem_parafac |
This function can be used to calculate A modes (sample loadings) for samples that were previously excluded from the modelling process (e.g. outliers). The wavelengths in the samples and in the PARAFAC model must be identically. Surplus wavelengths in the samples can be adjusted automatically with the argument adj_samples = TRUE, surplus wavelengths in the PARAFAC model have to be mitigated by interpolating the missing wavelengths in the samples manually. If there is a mismatch in wavelengths, the function provides you with the necessary information. Another way to use it would be a recombination of components from different models and calculating the according sample loadings. Expecially the later application is experimental and results have to be seen critically! Nevertheless, I decided to supply this function to stimulate some experiments on that and would be interested in your findings and feedback.
object of class parafac
data(eem_list) data(pf_models) A_missing(eem_list, pf4[[1]], cores = 2)
data(eem_list) data(pf_models) A_missing(eem_list, pf4[[1]], cores = 2)
Baseline correction for absorbance data
abs_blcor(abs_data, wlrange = c(680, 700))
abs_blcor(abs_data, wlrange = c(680, 700))
abs_data |
data.frame containing samples in columns, the column containing wavelengths must be named "wavelength" |
wlrange |
range of wavelengths that should be used for correction, absorbance mean in that range is subtracted from each value (sample-wise) |
data.frame
data(absorbance) abs_data_cor <- abs_blcor(absorbance) abs_data_cor1 <- abs_blcor(absorbance[1:2])
data(absorbance) abs_data_cor <- abs_blcor(absorbance) abs_data_cor1 <- abs_blcor(absorbance[1:2])
drm
is used for the fitting process.Fit absorbance data to exponential curve. drm
is used for the fitting process.
abs_fit_slope( wl, abs, lim, l_ref = 350, control = drmc(errorm = FALSE, noMessage = TRUE), ... )
abs_fit_slope( wl, abs, lim, l_ref = 350, control = drmc(errorm = FALSE, noMessage = TRUE), ... )
wl |
vector containing wavelengths |
abs |
vector containing absorption in m^-1 |
lim |
vector containing lower and upper limits for wavelengths to use |
l_ref |
numerical. reference wavelength, default is 350, if set to NA l_ref is fitted |
control |
control parameters for drm, see |
... |
parameters that are passed on to drm |
numeric exponential slope coefficient
data(absorbance) abs_fit_slope(absorbance$wavelength,absorbance$sample1,lim=c(350,400),l_ref=350)
data(absorbance) abs_fit_slope(absorbance$wavelength,absorbance$sample1,lim=c(350,400),l_ref=350)
Calculating slopes and slope ratios of a data frame of absorbance data.
abs_parms( abs_data, cuvle = NULL, unit = c("absorbance", "absorption"), add_as = NULL, limits = list(c(275, 295), c(350, 400), c(300, 700)), l_ref = list(275, 350, 300), S = TRUE, lref = FALSE, p = FALSE, model = FALSE, Sint = FALSE, interval = 21, r2threshold = 0.8, cores = parallel::detectCores(logical = FALSE), verbose = FALSE )
abs_parms( abs_data, cuvle = NULL, unit = c("absorbance", "absorption"), add_as = NULL, limits = list(c(275, 295), c(350, 400), c(300, 700)), l_ref = list(275, 350, 300), S = TRUE, lref = FALSE, p = FALSE, model = FALSE, Sint = FALSE, interval = 21, r2threshold = 0.8, cores = parallel::detectCores(logical = FALSE), verbose = FALSE )
abs_data |
data frame containing absorbance data. |
cuvle |
cuvette (path) length in cm, ignored if unit is absorption |
unit |
unit of absorbance data: if "absorbance", absorbance data is multiplied by log(10) = 2.303 for slope calculations |
add_as |
additionally to a254 and a300, absorbance at certain wavelengths can be added to the table |
limits |
list with vectors containig upper and lower bounds of wavelengeth ranges to be fitted |
l_ref |
list with reference wavelengths, same length as limits |
S |
logical, include slope indices in the table |
lref |
logical, include reference wavelength in the table |
p |
logical, include ps of the coefficients in the table |
model |
logical, include complete model in data frame |
Sint |
logical, wether the spectral curve is calculated interval-wise ( |
interval |
passed on to |
r2threshold |
passed on to |
cores |
number of cores to be used for parallel processing |
verbose |
logical, additional information is provided |
The absorbance data is a data frame with the first column called "wavelength" containg the wavelength. Each other column contains the data from one sample. You can use absorbance_read to read in appropriate data.
The following spectral parameters are calculated:
slope between 275 and 295 nm calculated with nonlinear regression
slope between 350 and 400 nm calculated with nonlinear regression
slope between 275 and 295 nm calculated with nonlinear regression
SR slope ratio, calculated by
E2:E3 ratio
E4:E6 ratio
absorbance at 254 nm
absorbance at 300 nm
Depending on available wavelength range, values might be NA. Additionally other wavelength limits can be defined. The slope ratio might fail in this case. For further details please refer to Helm et al. (2008).
A data frame containing the adsorption slopes and slope ratios in column, one line for each sample.
Helms, J., Kieber, D., Mopper, K. 2008. Absorption spectral slopes and slope ratios as indicators of molecular weight, source, and photobleaching of chromophoric dissolved organic matter. Limnol. Oceanogr., 53(3), 955–969 https://aslopubs.onlinelibrary.wiley.com/doi/10.4319/lo.2008.53.3.0955
data(absorbance) a1 <- abs_parms(absorbance, cuvle = 5, verbose = TRUE, cores = 2) a2 <- abs_parms(absorbance, cuvle = 5,l_ref=list(NA,NA,NA), lref=TRUE, cores = 2) # fit lref as well
data(absorbance) a1 <- abs_parms(absorbance, cuvle = 5, verbose = TRUE, cores = 2) a2 <- abs_parms(absorbance, cuvle = 5,l_ref=list(NA,NA,NA), lref=TRUE, cores = 2) # fit lref as well
Reading absorbance data from txt and csv files.
absorbance_read( absorbance_path, order = TRUE, recursive = TRUE, dec = NULL, sep = NULL, verbose = FALSE, cores = parallel::detectCores(logical = FALSE), ... )
absorbance_read( absorbance_path, order = TRUE, recursive = TRUE, dec = NULL, sep = NULL, verbose = FALSE, cores = parallel::detectCores(logical = FALSE), ... )
absorbance_path |
directory containing absorbance data files or path to single file. See details for format of absorbance data. |
order |
logical, data is ordered according to wavelength |
recursive |
read files recursive, include subfolders |
dec |
optional, either you set a decimal separator or the table is tested for . and , |
sep |
optional, either you set a field separator or it is tried to be determined automatically |
verbose |
logical, provide more information |
cores |
number of CPU cores to be used simultanuously |
... |
additional arguments that are passed on to |
If absorbance_path is a directory, contained files that end on "csv" or "txt" are passed on to read.table
. If the path is a file, this file is read. Tables can either contain data from one sample or from several samples in columns. The first column is considered the wavelength column. A multi-sample file must have sample names as column names. All tables are combined to one with one wavelength column and one column for each sample containing the absorbance data.
Column and decimal separators are guessed from the supplied data. In some cases, this can lead to strange results. Plaese set 'sep' and 'dec' manually if you encounter any problems.
A data frame containing absorbance data. An attribute "location" contains the filenames where each sample was taken from.
absorbance_path <- system.file("extdata", "absorbance", package = "staRdom") absorbance <- absorbance_read(absorbance_path, verbose = TRUE, cores = 2)
absorbance_path <- system.file("extdata", "absorbance", package = "staRdom") absorbance <- absorbance_read(absorbance_path, verbose = TRUE, cores = 2)
Converting EEM data from class eem to data.frame.
## S3 method for class 'eem' as.data.frame(x, row.names = NULL, optional = FALSE, gather = TRUE, ...)
## S3 method for class 'eem' as.data.frame(x, row.names = NULL, optional = FALSE, gather = TRUE, ...)
x |
abc |
row.names |
abc |
optional |
ignored |
gather |
logical, says whether data.frame is returned with excitation wavelength as column names or as values of a column. If the data is gathered, the sample name is added as value in a calumn |
... |
ignored |
A data frame containing the EEM data.
data(eem_list) as.data.frame(eem_list[[1]]) as.data.frame(eem_list[[1]],gather=FALSE)
data(eem_list) as.data.frame(eem_list[[1]]) as.data.frame(eem_list[[1]],gather=FALSE)
According to dilution data absorbance is either multiplied by the according factor or the undiluted absorbance data is deleted. You can either specify the cor_data data table coming from eem_dilcorr
or supply an eemlist, and the dilution data to created on the fly.
eem_absdil( abs_data, eem_list = NULL, dilution = NULL, cor_data = NULL, auto = TRUE, verbose = FALSE )
eem_absdil( abs_data, eem_list = NULL, dilution = NULL, cor_data = NULL, auto = TRUE, verbose = FALSE )
abs_data |
absorbance data |
eem_list |
optional eemlist |
dilution |
optional dilution data as data frame |
cor_data |
optional output from |
auto |
optional, see |
verbose |
optional, see |
data frame
# no appropriate exmaple data available yet
# no appropriate exmaple data available yet
Applying functions on EEMs
eem_apply(data, func, return = c("eemlist", "value"), ...)
eem_apply(data, func, return = c("eemlist", "value"), ...)
data |
eemlist to be modified |
func |
a function to be applied on the data. |
return |
either "eemlist" or "value" |
... |
additional arguments passed on to func |
The EEMs are passed on as first argument to func. Additionally, the vector of excitation wavelengths is passed on as ex
and the emission wavelengths as em
. Therefore, the supplied function has to allow these arguments. The easiest way would be ...
(see example).
eemlist or list
## define a function, that would divide a matrix by its maximum # more general, if you want to return a valid eemlist (see below), # a matrix of the same size has to be returned # ... is used as a placeholder for any argument, important: em and # ex wavelengths are passed on, so the function needs to take them as arguments, # even if they are not used norm_max <- function(x, ...){ x/max(x) } # load example data data("eem_list") # normalise eems by the function defined above norm_eems <- eem_apply(eem_list,norm_max,"eemlist") # plot the results to see the difference ggeem(norm_eems) # define another function. what values were used to # multiply the eems with? norm_fac <- function(x, ...){ 1/max(x) } # return a list of factors used for normalisation norm_factors <- eem_apply(eem_list,norm_fac,"value") unlist(norm_factors) # return list of em vectors. # important: x needs to be in the first position, but # is not used later! extr_em <- function(x,em,...){ em } em_vectors <- eem_apply(eem_list,extr_em,"value") em_vectors
## define a function, that would divide a matrix by its maximum # more general, if you want to return a valid eemlist (see below), # a matrix of the same size has to be returned # ... is used as a placeholder for any argument, important: em and # ex wavelengths are passed on, so the function needs to take them as arguments, # even if they are not used norm_max <- function(x, ...){ x/max(x) } # load example data data("eem_list") # normalise eems by the function defined above norm_eems <- eem_apply(eem_list,norm_max,"eemlist") # plot the results to see the difference ggeem(norm_eems) # define another function. what values were used to # multiply the eems with? norm_fac <- function(x, ...){ 1/max(x) } # return a list of factors used for normalisation norm_factors <- eem_apply(eem_list,norm_fac,"value") unlist(norm_factors) # return list of em vectors. # important: x needs to be in the first position, but # is not used later! extr_em <- function(x,em,...){ em } em_vectors <- eem_apply(eem_list,extr_em,"value") em_vectors
The function tries to lead you to possible problems in your data.
eem_checkdata( eem_list, absorbance, metadata = NULL, metacolumns = NULL, correction = FALSE, error = TRUE )
eem_checkdata( eem_list, absorbance, metadata = NULL, metacolumns = NULL, correction = FALSE, error = TRUE )
eem_list |
eemlist continaing EEM data. |
absorbance |
data.frame containing absorbance data. |
metadata |
optional data.frame containing metadata. |
metacolumns |
character vector of columns that are checkt for complete data sets |
correction |
logical, whether EEMs should be checked for applied corrections |
error |
logical, whether a problem should cause an error or not. |
The returned list contains character vectors with sample names where possible problems were found: problem (logical, whether a severe problem was found), nas (sample names with NAs in EEM data), missing_correction (correction of EEM samples was not done or not done successfully),eem_no_abs (EEM samples with no absorbance data), abs_no_eem (samples with present absorbance but no EEM data), duplse (duplicate sample names in EEM data), duplsa (duplicate sample names in absorbance data), invalid_eem (invalid EEM sample name), invalid_abs (invalid absorbance sample name), range_mismatch (wavelength ranges of EEM and absorbance data are mismatching), metadupls (duplicate sample names in metadata), metamissing (EEM samples where metadata is missing), metaadd (samples in metadata without EEM data)
writes out possible porblems to command line, additionally list with sample names where possible problems were found, see details.
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) abs_folder <- system.file("extdata/absorbance", package = "staRdom") # load example data absorbance <- absorbance_read(abs_folder, cores = 2) metatable <- system.file("extdata/metatable_dreem.csv",package = "staRdom") meta <- read.table(metatable, header = TRUE, sep = ",", dec = ".", row.names = 1) checked <- eem_checkdata(eem_list, absorbance, metadata = meta, metacolumns = "dilution", error = FALSE) # This example returns a message, that absorbance data for the # blank samples are missing. As absorbance is supposed to be 0 over # the whole spectrum when you measure blanks, there is no need # to supply the data and do an inner-filter effect correction.
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) abs_folder <- system.file("extdata/absorbance", package = "staRdom") # load example data absorbance <- absorbance_read(abs_folder, cores = 2) metatable <- system.file("extdata/metatable_dreem.csv",package = "staRdom") meta <- read.table(metatable, header = TRUE, sep = ",", dec = ".", row.names = 1) checked <- eem_checkdata(eem_list, absorbance, metadata = meta, metacolumns = "dilution", error = FALSE) # This example returns a message, that absorbance data for the # blank samples are missing. As absorbance is supposed to be 0 over # the whole spectrum when you measure blanks, there is no need # to supply the data and do an inner-filter effect correction.
The size of EEMs in an eemlist is checked and the sample names of samples with more data than the sample with the smallest range are returned.
eem_checksize(eem_list)
eem_checksize(eem_list)
eem_list |
eemlist |
character vector
data(eem_list) eem_checksize(eem_list)
data(eem_list) eem_checksize(eem_list)
Return names of samples where certain corrections are missing.
eem_corrections(eem_list)
eem_corrections(eem_list)
eem_list |
eemlist to be checked |
prints out sample names
data(eem_list) eem_corrections(eem_list)
data(eem_list) eem_corrections(eem_list)
This function can be used to import generic csv files containing EEM data using eem_read
. Excitation wavelengths are assumed column-wise and emission wavelengths row-wise. If your data is arranged the other way round, please use eem_csv2
eem_csv(file)
eem_csv(file)
file |
path to file passed from eem_read |
list with EEM data
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv) eem_list
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv) eem_list
This function can be used to import generic csv files containing EEM data using eem_read
. Excitation wavelengths are assumed row-wise and emission wavelengths column-wise If your data is arranged the other way round, please use eem_csv
eem_csv2(file)
eem_csv2(file)
file |
path to file passed from eem_read |
list with EEM data
## no example data provided with the package ## below is an example how this could like like # eems <- "C:/some/path/to/eem.csv" # eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv2) # eem_list
## no example data provided with the package ## below is an example how this could like like # eems <- "C:/some/path/to/eem.csv" # eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv2) # eem_list
Due to dilution absorbance spectra need to be multiplied by the dilution factor and names of EEM samples can be adjusted to be similar to their undiluted absorbance sample. The table contains information about these two steps. Undiluted samples are suggested by finding absorbance samples match the beginning of EEM sample name (see details).
eem_dilcorr(eem_list, abs_data, dilution, auto = FALSE, verbose = TRUE)
eem_dilcorr(eem_list, abs_data, dilution, auto = FALSE, verbose = TRUE)
eem_list |
eemlist |
abs_data |
absorbance data as data frame |
dilution |
dilution data as data frame with rownames |
auto |
way how to deal with dilution is chosen automatically. See details. |
verbose |
print out more information |
If you choose an automatic analysis EEMs are renamed if there is only one matching undiluted absorbance sample. Matching samples is done by comparing the beginning of the sample name (e.g. "sample3_1to10" fits "sample3").
data frame
# no appropriate exmaple data available yet
# no appropriate exmaple data available yet
If samples were diluted before measuring, a dilution factor has to be added to the measured data. This function can do that by either multilpying each sample with the same value or using a data frame with different values for each sample.
eem_dilution(data, dilution = 1)
eem_dilution(data, dilution = 1)
data |
fluorescence data with class eemlist |
dilution |
dilution factor(s), either numeric value or data frame. Row names of data frame have to be similar to sample names in eemlist. |
fluorescence data with class eemlist
data(eem_list) eem_list2 <- eem_dilution(eem_list, dilution = 5) dilutionT <- data.frame(dilution = rep(5, length(eem_list))) row.names(dilutionT) <- eem_names(eem_list) dilutionT eem_list3 <- eem_dilution(eem_list, dilution = dilutionT)
data(eem_list) eem_list2 <- eem_dilution(eem_list, dilution = 5) dilutionT <- data.frame(dilution = rep(5, length(eem_list))) row.names(dilutionT) <- eem_names(eem_list) dilutionT eem_list3 <- eem_dilution(eem_list, dilution = dilutionT)
Check for duplicate sample names
eem_duplicates(data) ## Default S3 method: eem_duplicates(data) ## S3 method for class 'eemlist' eem_duplicates(data) ## S3 method for class 'data.frame' eem_duplicates(data)
eem_duplicates(data) ## Default S3 method: eem_duplicates(data) ## S3 method for class 'eemlist' eem_duplicates(data) ## S3 method for class 'data.frame' eem_duplicates(data)
data |
eemlist or data.frame containing absorbance data |
named character vector with duplicate sample names
### check
### check
In your default editor (e.g. RStudio), a Rmd file is opened. It consists of bloacks gathering the parameters and information needed and continues with a series of data corrections, peak picking and plots. Finally you get a report of your analysis, a table with the peaks and optional pngs of your fluorescence data. To continue working and keeping your settings, the file can be sa ved anywhere and reused anytime.
eem_easy()
eem_easy()
Function does not work well in Windows. You might try file.edit(system.file("EEM_simple_analysis.Rmd", package = "staRdom"))
A pdf report, a peak picking table and optional plots.
## Not run: # eem_easy() # this function fails very often, so you might use that: file.edit(system.file("EEM_simple_analysis.Rmd", package = "staRdom")) ## End(Not run)
## Not run: # eem_easy() # this function fails very often, so you might use that: file.edit(system.file("EEM_simple_analysis.Rmd", package = "staRdom")) ## End(Not run)
Correct names of EEM samples to match undiluted absorbance data.
eem_eemdil( eem_list, abs_data = NULL, dilution = NULL, cor_data = NULL, auto = TRUE, verbose = FALSE )
eem_eemdil( eem_list, abs_data = NULL, dilution = NULL, cor_data = NULL, auto = TRUE, verbose = FALSE )
eem_list |
eemlist |
abs_data |
optinal absorbance data as data frame |
dilution |
optinal dilution data as data frame |
cor_data |
optional output from |
auto |
optional, see |
verbose |
optional, see |
eemlist
# no appropriate exmaple data available yet
# no appropriate exmaple data available yet
Outliers in all modes should be avoided. With this functions excitation or emission wavelengths as well as samples can be removed completely from your sample set.
eem_exclude(eem_list, exclude = list, verbose = FALSE)
eem_exclude(eem_list, exclude = list, verbose = FALSE)
eem_list |
object of class eemlist |
exclude |
list of three vectors, see details |
verbose |
states whether additional information is given in the command line |
The argument exclude is a named list of three vectors. The names must be "ex", "em" and "sample". Each element contains a vector of wavelengths or sample names that are to be excluded from the data set.
object of class eemlist
data(eem_list) exclude <- list("ex" = c(280,285,290,295), "em" = c(), "sample" = c("667sf", "494sf") ) eem_list_ex <- eem_exclude(eem_list, exclude)
data(eem_list) exclude <- list("ex" = c(280,285,290,295), "em" = c(), "sample" = c("667sf", "494sf") ) eem_list_ex <- eem_exclude(eem_list, exclude)
Export all samples of an eem_list
eem_export(file, format = c("csv", "mat"), ...)
eem_export(file, format = c("csv", "mat"), ...)
file |
path to directory (csv format) or file (Matlab format) |
format |
either "csv" or "mat" to specify export format |
... |
one or more eem_list objects |
0 on successful export
# create temporary directory to write out file <- paste0(tempdir(),"/eem_export/") dir.create(file) # run eem_export to write one csv file for each sample eem_export(file, format = "csv", eem_list) # show content of output directory dir(file)
# create temporary directory to write out file <- paste0(tempdir(),"/eem_export/") dir.create(file) # run eem_export to write one csv file for each sample eem_export(file, format = "csv", eem_list) # show content of output directory dir(file)
Compared to the whole sample set, wavelengths missing in some samples are added and set NA or interpolated. This can be especially helpful, if you want to combine data measured with different wavelength intervals in a given range.
eem_extend2largest(eem_list, interpolation = FALSE, ...)
eem_extend2largest(eem_list, interpolation = FALSE, ...)
eem_list |
eemlist |
interpolation |
logical, whether added NAs should be interpolated |
... |
arguments passed to eem_interp |
eemlist
library(dplyr) data(eem_list) eem_list <- eem_exclude(eem_list[1:5] %>% `class<-`("eemlist"), exclude = list(em = c(318,322,326,550,438), ex = c(270,275))) %>% eem_bind(eem_list[6:15] %>% `class<-`("eemlist")) ggeem(eem_list) eem_extend2largest(eem_list) %>% ggeem()
library(dplyr) data(eem_list) eem_list <- eem_exclude(eem_list[1:5] %>% `class<-`("eemlist"), exclude = list(em = c(318,322,326,550,438), ex = c(270,275))) %>% eem_bind(eem_list[6:15] %>% `class<-`("eemlist")) ggeem(eem_list) eem_extend2largest(eem_list) %>% ggeem()
Determines the the biggest range of EEM spectrum where data is available from each sample.
eem_getextreme(data)
eem_getextreme(data)
data |
eemlist |
list of numeric vector containing the biggest available range
data(eem_list) eem_getextreme(eem_list) eem_list <- eem_range(eem_list,ex = c(250,Inf),em = c(280,500)) eem_getextreme(eem_list)
data(eem_list) eem_getextreme(eem_list) eem_list <- eem_range(eem_list,ex = c(250,Inf),em = c(280,500)) eem_getextreme(eem_list)
This function can be used to import txt files from Hitachi F-7000 containing EEM data using eem_read
.
eem_hitachi(file)
eem_hitachi(file)
file |
path to file passed from eem_read |
list with EEM data
## no example data provided with the package ## below is an example how this could like like # eems <- "C:/some/path/to/hitachi.TXT" # eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_hitachi) # eem_list
## no example data provided with the package ## below is an example how this could like like # eems <- "C:/some/path/to/hitachi.TXT" # eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_hitachi) # eem_list
Calls eem_inner_filter_effect
for each sample to use different cuvette lengths.
eem_ife_correction( data, abs_data, cuvl = NULL, unit = c("absorbance", "absorption") )
eem_ife_correction( data, abs_data, cuvl = NULL, unit = c("absorbance", "absorption") )
data |
fluorescence data of class eemlist |
abs_data |
absorbance data |
cuvl |
length of cuvette of absorption measurment in cm. Either a number or a data frame. Row names of data frame have to be similar to sample names in data. This is ignored, if unit is "absorption". |
unit |
unit of absorbance data. Either "absorbance" or "absorption". |
fluorescence data of class eemlist
folder <- system.file("extdata/cary/scans_day_1", package = "eemR") # load example data eem_list <- eem_read(folder, import_function = "cary") data(absorbance) eem_list <- eem_ife_correction(data = eem_list, abs_data = absorbance, cuvl = 5, unit = "absorbance")
folder <- system.file("extdata/cary/scans_day_1", package = "eemR") # load example data eem_list <- eem_read(folder, import_function = "cary") data(absorbance) eem_list <- eem_ife_correction(data = eem_list, abs_data = absorbance, cuvl = 5, unit = "absorbance")
Reads Rdata and RDa files with one eemlist each. The eemlists are combined into one and returned.
eem_import_dir(dir, verbose = FALSE)
eem_import_dir(dir, verbose = FALSE)
dir |
folder where RData files are saved |
verbose |
logical, set TRUE to show more information during import |
eemlist
## Not run: # due to package size issues no example data is provided for this function # eem_import_dir("C:/some_folder/with_EEMS/only_Rdata_files") ## End(Not run)
## Not run: # due to package size issues no example data is provided for this function # eem_import_dir("C:/some_folder/with_EEMS/only_Rdata_files") ## End(Not run)
Missing EEM data can be interpolated. Usually it is the result of removing scatter or other parts where noise is presumed. Different interpolation algorithms can be used (see details).
eem_interp( data, cores = parallel::detectCores(logical = FALSE), type = TRUE, verbose = FALSE, nonneg = TRUE, extend = FALSE, ... )
eem_interp( data, cores = parallel::detectCores(logical = FALSE), type = TRUE, verbose = FALSE, nonneg = TRUE, extend = FALSE, ... )
data |
object of class eemlist with spectra containing missing values |
cores |
specify number of cores for parallel computation |
type |
numeric 0 to 4 or TRUE which resembles type 1 |
verbose |
logical, whether more information on calculation should be provided |
nonneg |
logical, whether negative values should be replaced by 0 |
extend |
logical, whether data is extrapolated using type 1 |
... |
arguments passed on to other functions (pchip, na.approx, mba.points) |
The types of interpolation are (0) setting all NAs to 0, (1) spline interpolation with mba.points
, (2) excitation and emission wavelength-wise interpolation with pchip
and subsequent mean, (3) excitation wavelength-wise interpolation with pchip
and (4) linear interpolation in 2 dimensions with na.approx
and again subsequent mean calculation. Calculating the mean is a way of ensuring NAs are also interpolated where missing boundary values would make that impossible. Using type = 1, extrapolation can be suppressed by adding the argument extend = FALSE.
object of class eemlist with interpoleted spectra.
Elcoroaristizabal, S., Bro, R., García, J., Alonso, L. 2015. PARAFAC models of fluorescence data with scattering: A comparative study. Chemometrics and Intelligent Laboratory Systems, 142, 124-130 doi:10.1016/j.chemolab.2015.01.017
data(eem_list) eem_list <- eem_list[1:6] class(eem_list) <- "eemlist" remove_scatter <- c(FALSE, TRUE, TRUE, TRUE) remove_scatter_width = c(15, 10, 16, 12) eem_list <- eem_rem_scat(eem_list, remove_scatter, remove_scatter_width) eem_list <- eem_interp(eem_list, cores = 2) ggeem(eem_list) eem_list2 <- eem_setNA(eem_list, ex = 200:280, interpolate=FALSE) ggeem(eem_list2) eem_list3 <- eem_interp(eem_list2, type = 1, extend = TRUE, cores = 2) ggeem(eem_list3) eem_list3 <- eem_interp(eem_list2, type = 1, extend = FALSE, cores = 2) ggeem(eem_list3)
data(eem_list) eem_list <- eem_list[1:6] class(eem_list) <- "eemlist" remove_scatter <- c(FALSE, TRUE, TRUE, TRUE) remove_scatter_width = c(15, 10, 16, 12) eem_list <- eem_rem_scat(eem_list, remove_scatter, remove_scatter_width) eem_list <- eem_interp(eem_list, cores = 2) ggeem(eem_list) eem_list2 <- eem_setNA(eem_list, ex = 200:280, interpolate=FALSE) ggeem(eem_list2) eem_list3 <- eem_interp(eem_list2, type = 1, extend = TRUE, cores = 2) ggeem(eem_list3) eem_list3 <- eem_interp(eem_list2, type = 1, extend = FALSE, cores = 2) ggeem(eem_list3)
Check for NAs in EEM data
eem_is.na(eem_list)
eem_is.na(eem_list)
eem_list |
eemlist to check |
named character vector with sample names where EEM data contains NAs
### check
### check
15 fluorescence samples from drEEM used for examples.
eem_list
eem_list
eemlist
2 fluorescence samples from drEEM that were excluded as outliers from the PARAFAC model.
eem_list_outliers
eem_list_outliers
eemlist
Load original data from the drEEM tutorial and return it as eemlist
eem_load_dreem()
eem_load_dreem()
eemlist
# Reading MATLAB files from recent versions like the demo dataset from drEEM # can cause problems if the R installation lacks UTF8 support in iconv. # Therefore, we use try() in the example. If you encounter related problems, # please refer to the help for R.matlab::readMat() for details. eem_list <- try(eem_load_dreem(), silent = FALSE) eem_list
# Reading MATLAB files from recent versions like the demo dataset from drEEM # can cause problems if the R installation lacks UTF8 support in iconv. # Therefore, we use try() in the example. If you encounter related problems, # please refer to the help for R.matlab::readMat() for details. eem_list <- try(eem_load_dreem(), silent = FALSE) eem_list
Multiply all EEMs with a matrix
eem_matmult(eem_list, matrix = NULL, value = 0)
eem_matmult(eem_list, matrix = NULL, value = 0)
eem_list |
EEM data as eemlist |
matrix |
either a vactor containing "l" and/or "u" or a matrix, see details. |
value |
in case matrices "l" or "u" are used, this specifies the value to use in this areas. Usually this is 0 (default) or NA but any numeric value can be used. |
All EEMs must be of the same size. If matrix is of type matrix, it is used right away to multiply the EEMs. It has to be of the same size as the EEMs. If matrix is a vector containing "l", values below 1st order Rayleigh scattering are set to 0. If matrix contains "u", values above 2nd order Raman scattering are set to 0. If you want to remove wavelength ranges, take into consideration to use eem_cut
or eem_range
.
eemlist
data(eem_list) eem <- eem_list[1:9] class(eem) <- "eemlist" ggeem(eem) eem_list_cut <- eem_matmult(eem,matrix=c("l"), value= NA) ggeem(eem_list_cut)
data(eem_list) eem <- eem_list[1:9] class(eem) <- "eemlist" ggeem(eem) eem_list_cut <- eem_matmult(eem,matrix=c("l"), value= NA) ggeem(eem_list_cut)
You can use this table as an overview of your files and/or as a template for creating a metadata table.
eem_metatemplate(eem_list = NULL, absorbance = NULL)
eem_metatemplate(eem_list = NULL, absorbance = NULL)
eem_list |
eemlist |
absorbance |
data frame with absorbance data |
data frame
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) data(absorbance) eem_metatemplate(eem_list,absorbance)
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) data(absorbance) eem_metatemplate(eem_list,absorbance)
Sample names in eemlist can be altered.
eem_name_replace(eem_list, pattern, replacement)
eem_name_replace(eem_list, pattern, replacement)
eem_list |
data of class eemlist |
pattern |
character vector containing pattern to look for. |
replacement |
character vector of replacements. Has to have the same length as pattern |
str_replace_all
from package stringr is used for the replacement. Please read the corresponding help for further options.
An eemlist.
eem_names(eem_list) eem_list <- eem_name_replace(eem_list,"sample","Sample") eem_names(eem_list)
eem_names(eem_list) eem_list <- eem_name_replace(eem_list,"sample","Sample") eem_names(eem_list)
Plot fluorescence data from several samples split into several plots.
eem_overview_plot(data, spp = 8, ...)
eem_overview_plot(data, spp = 8, ...)
data |
fluorescence data of class eemlist |
spp |
number of samples per plot or a vector with the numbers of rows and columns in the plot. |
... |
arguments passed on to |
list of ggplots
data(eem_list) eem_overview_plot(eem_list, spp = 9) # define number of rows and columns in plot eem_overview_plot(eem_list, spp = c(3, 5))
data(eem_list) eem_overview_plot(eem_list, spp = 9) # define number of rows and columns in plot eem_overview_plot(eem_list, spp = c(3, 5))
One or more PARAFAC models can be calculated depending on the number of components. The idea is to compare the different models to get the most suitable. B-mode is emmission wavelengths, C-mode is excitation wavelengths and, A-mode is the loadings of the samples. The calculation is done with parafac
, please see details there.
eem_parafac( eem_list, comps, maxit = 2500, normalise = TRUE, const = c("nonneg", "nonneg", "nonneg"), nstart = 30, ctol = 10^-8, strictly_converging = FALSE, cores = parallel::detectCores(logical = FALSE), verbose = FALSE, output = "best", ... )
eem_parafac( eem_list, comps, maxit = 2500, normalise = TRUE, const = c("nonneg", "nonneg", "nonneg"), nstart = 30, ctol = 10^-8, strictly_converging = FALSE, cores = parallel::detectCores(logical = FALSE), verbose = FALSE, output = "best", ... )
eem_list |
object of class |
comps |
vector containing the desired numbers of components. For each of these numbers one model is calculated |
maxit |
maximum iterations for PARAFAC algorithm |
normalise |
state whether EEM data should be normalised in advance |
const |
constraints of PARAFAC analysis. Default is non-negative ("nonneg"), alternatively smooth and non-negative ("smonon") might be interesting for an EEM analysis. |
nstart |
number of random starts |
ctol |
Convergence tolerance (R^2 change) |
strictly_converging |
calculate nstart converging models and take the best. Please see details! |
cores |
number of parallel calculations (e.g. number of physical cores in CPU) |
verbose |
print infos |
output |
Output the |
... |
additional parameters that are passed on to |
PARAFAC models are created based on multiple random starts. In some cases, a model does not converge and the resulting model is then based on less than nstart converging models. In case you want to have nstart converging models, set strictly_converging TRUE. This calculates models stepwise until the desired number is reached but it takes more calculation time. Increasing the number of models from the beginning is much more time efficient.
object of class parafac
data(eem_list) dim_min <- 3 # minimum number of components dim_max <- 7 # maximum number of components nstart <- 25 # random starts for PARAFAC analysis, models built simulanuously, best selected # cores <- parallel::detectCores(logical=FALSE) # use all cores but do not use all threads cores <- 2 # package checks only run with 2 cores maxit = 2500 ctol <- 10^-7 # tolerance for parafac pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores) ## with a defined number of converging models #pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), # normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, # output = "all", strictly_converging = TRUE, cores = cores, verbose = TRUE) pfres_comps2 <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores, output = "all")
data(eem_list) dim_min <- 3 # minimum number of components dim_max <- 7 # maximum number of components nstart <- 25 # random starts for PARAFAC analysis, models built simulanuously, best selected # cores <- parallel::detectCores(logical=FALSE) # use all cores but do not use all threads cores <- 2 # package checks only run with 2 cores maxit = 2500 ctol <- 10^-7 # tolerance for parafac pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores) ## with a defined number of converging models #pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), # normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, # output = "all", strictly_converging = TRUE, cores = cores, verbose = TRUE) pfres_comps2 <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores, output = "all")
Calculate raman area of EEM samples
eem_raman_area(eem_list, blanks_only = TRUE, average = FALSE)
eem_raman_area(eem_list, blanks_only = TRUE, average = FALSE)
eem_list |
An object of class eemlist. |
blanks_only |
logical. States whether all samples or just blanks will be used. |
average |
logical. States whether samples will be averaged before calculating the raman area. |
Code based on eem_raman_normalisation
.
data frame containing sample names, locations and raman areas
folder <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) blank <- eem_extract(eem_list,sample ="blank", keep = TRUE) eem_raman_area(blank)
folder <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) blank <- eem_extract(eem_list,sample ="blank", keep = TRUE) eem_raman_area(blank)
Usually Raman normalisation is done with fluorescence data from a blank sample. Sometimes you already know a value for the Raman area. This function can do both.
eem_raman_normalisation2(data, blank = "blank")
eem_raman_normalisation2(data, blank = "blank")
data |
fluorescence data of class eemlist |
blank |
defines how Raman normalisation is done (see 'Details') |
Possible values for blank:
"blank": normalisation is done with a blank sample. Please refer to eem_raman_normalisation
.
numeric: normalisation is done with one value for all samples.
data frame: normalisation is done with different values for different samples. Values are taken from a data.frame with sample names as rownames and one column containing the raman area values.
fluorescence data of class eemlist
data(eem_list) # correction by blank eems_bl <- eem_raman_normalisation2(eem_list,blank="blank") # correction by value eems_num <- eem_raman_normalisation2(eem_list,blank=168)
data(eem_list) # correction by blank eems_bl <- eem_raman_normalisation2(eem_list,blank="blank") # correction by value eems_num <- eem_raman_normalisation2(eem_list,blank=168)
Cut EEM data matching a given wavelength range
eem_range(data, ex = c(0, Inf), em = c(0, Inf))
eem_range(data, ex = c(0, Inf), em = c(0, Inf))
data |
EEM data as eemlist |
ex |
optional desired range of excitation wavelength |
em |
optional desired range of emission wavelength |
An eemlist of reduced spectra size.
data(eem_list) eem_range(eem_list,ex = c(250,Inf),em = c(280,500))
data(eem_list) eem_range(eem_list,ex = c(250,Inf),em = c(280,500))
This function is deprecate, please use eem_read
(..., import_function = eem_csv
) or eem_read(..., import_function = eem_csv2
) instead.
EEM data is loaded from generic files. First column and first row contains wavelength values. The other values are to be plain numbers. fread
is used to read the table. It offers a lot of helpful functions (e.g. skipping any number n of header lines by adding 'skip = n')
eem_read_csv( path, col = "ex", recursive = TRUE, is_blank_corrected = FALSE, is_scatter_corrected = FALSE, is_ife_corrected = FALSE, is_raman_normalized = FALSE, manufacturer = "unknown", ... )
eem_read_csv( path, col = "ex", recursive = TRUE, is_blank_corrected = FALSE, is_scatter_corrected = FALSE, is_ife_corrected = FALSE, is_raman_normalized = FALSE, manufacturer = "unknown", ... )
path |
path to file(s), either a filename or a folder |
col |
either "ex" or "em", what wavelengths are in the columns |
recursive |
logical, whether directories are loaded recursively |
is_blank_corrected |
logical, whether blank correction was done |
is_scatter_corrected |
logical, wether scatters were corrected |
is_ife_corrected |
logical, wether inner-filter effect correction was done |
is_raman_normalized |
logical, wether raman normalisation applied |
manufacturer |
string specifying manufacturer of instrument |
... |
parameters from other functions, currently not used |
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read_csv(eems) eem_list
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read_csv(eems) eem_list
Remove wavelengths, that are missing in at least one sample form the whole set.
eem_red2smallest(data, return = "eems", verbose = FALSE)
eem_red2smallest(data, return = "eems", verbose = FALSE)
data |
data of EEM samples as eemlist |
return |
either "eems" or "list", function returens reduced eems or a list of the wavelengths to be reduced. |
verbose |
states whether additional information is given in the command line |
This step is neccessary to perform a PARAFAC analysis which can only be calculated with spectra of similar range.
eemlist with reduced spectral width
require(dplyr) data(eem_list) eem_list_red <- eem_red2smallest(eem_list) # create an eemlist where data is missing eem_list2 <- eem_exclude(eem_list, list("ex" = c(280,290,350), "em" = c(402,510), "sample" = c())) # modify names of samples with missing data eem_names(eem_list2) <- paste0("x",eem_names(eem_list2)) # combined the lists with and without missing data eem_list3 <- eem_bind(eem_list,eem_list2) #ggeem(eem_list3) # reduce the data in the whole sampleset to the smallest wavelengths that are present in all samples eem_list4 <- eem_red2smallest(eem_list3) #ggeem(eem_list4)
require(dplyr) data(eem_list) eem_list_red <- eem_red2smallest(eem_list) # create an eemlist where data is missing eem_list2 <- eem_exclude(eem_list, list("ex" = c(280,290,350), "em" = c(402,510), "sample" = c())) # modify names of samples with missing data eem_names(eem_list2) <- paste0("x",eem_names(eem_list2)) # combined the lists with and without missing data eem_list3 <- eem_bind(eem_list,eem_list2) #ggeem(eem_list3) # reduce the data in the whole sampleset to the smallest wavelengths that are present in all samples eem_list4 <- eem_red2smallest(eem_list3) #ggeem(eem_list4)
Wrapper function to remove several scatterings in one step using eem_remove_scattering
.
eem_rem_scat( data, remove_scatter, remove_scatter_width = 10, interpolation = FALSE, cores = parallel::detectCores(logical = FALSE), verbose = FALSE )
eem_rem_scat( data, remove_scatter, remove_scatter_width = 10, interpolation = FALSE, cores = parallel::detectCores(logical = FALSE), verbose = FALSE )
data |
object of class eemlist |
remove_scatter |
logical vector. The meanings of the vector are "raman1", "raman2", "rayleigh1" and "rayleigh2" scattering. Set |
remove_scatter_width |
numeric vector containing width of scattering to remove. If there is only one element in this vector, each this is the width of each removed scattering. If there are 4 values, differnt widths are used ordered by "raman1", "raman2", "rayleigh1" and "rayleigh2". |
interpolation |
logical, optionally states whether interpolation is done right away |
cores |
optional, CPU cores to use for interpolation |
verbose |
logical, provide additional information |
eemlist
data(eem_list) remove_scatter <- c(TRUE, TRUE, TRUE, TRUE) remove_scatter_width = c(15,10,16,12) eems <- eem_rem_scat(eem_list,remove_scatter,remove_scatter_width) ggeem(eems)
data(eem_list) remove_scatter <- c(TRUE, TRUE, TRUE, TRUE) remove_scatter_width = c(15,10,16,12) eems <- eem_rem_scat(eem_list,remove_scatter,remove_scatter_width) ggeem(eems)
Determine the range of fluorescence values in a set of samples
eem_scale_ext(data)
eem_scale_ext(data)
data |
eemlist containing the EEM data |
numeric vector
data(eem_list) eem_scale_ext(eem_list)
data(eem_list) eem_scale_ext(eem_list)
set parts of specific samples to NA and optionally interpolate these parts
eem_setNA( eem_list, sample = NULL, em = NULL, ex = NULL, interpolate = TRUE, ... )
eem_setNA( eem_list, sample = NULL, em = NULL, ex = NULL, interpolate = TRUE, ... )
eem_list |
EEMs as eemlist |
sample |
optional, names or indices of samples to process |
em |
optional, emission wavelengths to set NA |
ex |
optional, excitation wavelengths to set NA |
interpolate |
FALSE, 1 or 2, interpolate NAs or not, 2 different methods, see |
... |
arguments passed on to |
Samples and wavelengths are optional and if not set all of them are considered in setting data to NA. Wavelengths can be set as vectors containing more than the wavelengths present in the data. E.g. 230:250 removes all wavelengths between 230 and 250 if present. Data is best interpolated if it does not reach data boundaries. Please check the results otherwise as in some cases the interpolation might not produce meaningful data.
eemlist
data(eem_list) eem <- eem_list[1:9] class(eem) <- "eemlist" ggeem(eem) eem_list2 <- eem_setNA(eem,ex=200:280,em=500:600, interpolate=FALSE) ggeem(eem_list2)
data(eem_list) eem <- eem_list[1:9] class(eem) <- "eemlist" ggeem(eem) eem_list2 <- eem_setNA(eem,ex=200:280,em=500:600, interpolate=FALSE) ggeem(eem_list2)
Smooth fluorescence data by calculating rolling mean along excitation wavelengths.
eem_smooth(data, n = 4, cores = parallel::detectCores(logical = FALSE))
eem_smooth(data, n = 4, cores = parallel::detectCores(logical = FALSE))
data |
fluorescence data of class eemlist |
n |
width of rolling mean window in nm |
cores |
number of CPU cores to be used |
eemlist with smoothed data
data(eem_list) eem_list <- eem_smooth(eem_list, n = 4, cores = 2)
data(eem_list) eem_list <- eem_smooth(eem_list, n = 4, cores = 2)
Multiply EEMs with spectral correction vectors (Emission and Excitation)
eem_spectral_cor(eem_list, Excor, Emcor)
eem_spectral_cor(eem_list, Excor, Emcor)
eem_list |
eemlist |
Excor |
data frame, first column wavelengths, second column excitation correction |
Emcor |
data frame, first column wavelengths, second column emission correction |
eemlist
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv) excorfile <- system.file("extdata/CorrectionFiles/xc06se06n.csv",package="staRdom") Excor <- data.table::fread(excorfile) emcorfile <- system.file("extdata/CorrectionFiles/mcorrs_4nm.csv",package="staRdom") Emcor <- data.table::fread(emcorfile) # adjust range of EEMs to cover correction vectors eem_list <- eem_range(eem_list,ex = range(Excor[,1]), em = range(Emcor[,1])) eem_list_sc <- eem_spectral_cor(eem_list,Excor,Emcor)
eems <- system.file("extdata/EEMs",package="staRdom") eem_list <- eem_read(eems, recursive = TRUE, import_function = eem_csv) excorfile <- system.file("extdata/CorrectionFiles/xc06se06n.csv",package="staRdom") Excor <- data.table::fread(excorfile) emcorfile <- system.file("extdata/CorrectionFiles/mcorrs_4nm.csv",package="staRdom") Emcor <- data.table::fread(emcorfile) # adjust range of EEMs to cover correction vectors eem_list <- eem_range(eem_list,ex = range(Excor[,1]), em = range(Emcor[,1])) eem_list_sc <- eem_spectral_cor(eem_list,Excor,Emcor)
Export samples in an EEM list to a single csv files
eem_write_csv(eem_list, output, ...)
eem_write_csv(eem_list, output, ...)
eem_list |
EEM data as eemlist |
output |
path to folder where csv files are exported to |
... |
additional arguments |
returns the exported EEMs as a list of data.frames
data(eem_list) output <- tempdir() output a <- eem_write_csv(eem_list, output)
data(eem_list) output <- tempdir() output a <- eem_write_csv(eem_list, output)
Data matrices from EEM are combined to an array that is needed for a PARAFAC analysis.
eem2array(eem_list)
eem2array(eem_list)
eem_list |
object of class eemlist |
object of class array
data(eem_list) X <- eem2array(eem_list)
data(eem_list) X <- eem2array(eem_list)
Combining extracted components of PARAFAC models
eempf_bindxc(components)
eempf_bindxc(components)
components |
list of parafac_components |
parafac_components
data(pf_models) pfmodel <- pf4[[1]] comps <- eempf_excomp(pfmodel,c(1,3)) comps2 <- eempf_excomp(pfmodel,c(4,6)) comps3 <- eempf_bindxc(list(comps, comps2))
data(pf_models) pfmodel <- pf4[[1]] comps <- eempf_excomp(pfmodel,c(1,3)) comps2 <- eempf_excomp(pfmodel,c(4,6)) comps3 <- eempf_bindxc(list(comps, comps2))
Additionally a bar plot with the amounts of each component in each sample is produced.
eempf_comp_load_plot(pfmodel, ...)
eempf_comp_load_plot(pfmodel, ...)
pfmodel |
object of class parafac |
... |
attributes passe don to |
ggplot
data(pf_models) eempf_comp_load_plot(pf4[[1]])
data(pf_models) eempf_comp_load_plot(pf4[[1]])
The components of a PARAFAC analysis are extracted as a data frame
eempf_comp_mat(pfmodel, gather = TRUE)
eempf_comp_mat(pfmodel, gather = TRUE)
pfmodel |
object of class parafac |
gather |
logical value whether excitation wavelengths are a column, otherwise excitation wavelengths are column names |
a list of class data frames
data(pf_models) eempf_comp_mat(pf4[[1]])
data(pf_models) eempf_comp_mat(pf4[[1]])
Extract names from PARAFAC model components
eempf_comp_names(pfmodel)
eempf_comp_names(pfmodel)
pfmodel |
parafac model |
vector of names or list of vecters of names
data(pf_models) eempf_comp_names(pf4) eempf_comp_names(pf4) <- c("A","B","C","D","E","F","G") value <- list(c("A1","B1","C1","D","E","F","G"), c("A2","B2","C","D","E","F","G"), c("A3","B3","C","D","E","F","G"), c("A4","B4","C","D","E","F","G"), c("A5","B5","C","D","E","F","G5") ) eempf_comp_names(pf4) <- value eempf_comp_names(pf4) ggeem(pf4[[1]])
data(pf_models) eempf_comp_names(pf4) eempf_comp_names(pf4) <- c("A","B","C","D","E","F","G") value <- list(c("A1","B1","C1","D","E","F","G"), c("A2","B2","C","D","E","F","G"), c("A3","B3","C","D","E","F","G"), c("A4","B4","C","D","E","F","G"), c("A5","B5","C","D","E","F","G5") ) eempf_comp_names(pf4) <- value eempf_comp_names(pf4) ggeem(pf4[[1]])
Set names of PARAFAC components
eempf_comp_names(pfmodel) <- value
eempf_comp_names(pfmodel) <- value
pfmodel |
model of class parafac |
value |
character vector containing the new names for the components |
parafac model
data(pf_models) eempf_comp_names(pf4) <- c("A","B","C","D","E","F","G")
data(pf_models) eempf_comp_names(pf4) <- c("A","B","C","D","E","F","G")
Three plots are returned:
plot of numer of components vs. model fit
plot of different components as colour maps
plot of different components as peak lines
The plots are intended to help with a suitable number of components.
eempf_compare(pfres, ...)
eempf_compare(pfres, ...)
pfres |
list of several objects of class parafac |
... |
arguments passed on to |
3 objects of class ggplot
data(pf_models) eempf_compare(pf4)
data(pf_models) eempf_compare(pf4)
Interactive 3D plots are created using plotly.
eempf_comps3D(pfmodel, which = NULL)
eempf_comps3D(pfmodel, which = NULL)
pfmodel |
object of class parafac |
which |
optional, if numeric selects certain component |
plotly plot
## Not run: data(pf_models) eempf_comps3D(pf4[[1]]) ## End(Not run)
## Not run: data(pf_models) eempf_comps3D(pf4[[1]]) ## End(Not run)
The convergence behaviour of all initialisations in a PARAFAC model is shown by printing the numbers
eempf_convergence(pfmodel, print = TRUE)
eempf_convergence(pfmodel, print = TRUE)
pfmodel |
PARAFAC model created with staRdom using output = "all" |
print |
logical, whether you want console output or just a list with results |
list with numbers of converging models, cflags and SSEs
data("pf_models") pfmodel <- pf4[[1]] conv_beh <- eempf_convergence(pfmodel)
data("pf_models") pfmodel <- pf4[[1]] conv_beh <- eempf_convergence(pfmodel)
This is basically a wrapper for corcondia
that deals with the normalisation of the original data., Other than corcondia
, the default dicisor = "core".
eempf_corcondia(pfmodel, eem_list, divisor = "core")
eempf_corcondia(pfmodel, eem_list, divisor = "core")
pfmodel |
PARAFAC model |
eem_list |
eemlist |
divisor |
divisor, please refer to |
numeric
## Not run: # due to data limitation in package, example does not work with that data! # eempf_corcondia(pfmodel,eem_list) ## End(Not run)
## Not run: # due to data limitation in package, example does not work with that data! # eempf_corcondia(pfmodel,eem_list) ## End(Not run)
A pair plot showing correlations between samples is created.
eempf_corplot( pfmodel, normalisation = FALSE, lower = list(continuous = "smooth"), mapping = aes(alpha = 0.2), ... )
eempf_corplot( pfmodel, normalisation = FALSE, lower = list(continuous = "smooth"), mapping = aes(alpha = 0.2), ... )
pfmodel |
object of class parafac |
normalisation |
logical, whether normalisation is undone or not |
lower |
style of lower plots, see |
mapping |
aesthetic mapping, see |
... |
passed on to |
object of class ggplot
data(pf_models) eempf_corplot(pf4[[1]])
data(pf_models) eempf_corplot(pf4[[1]])
Calculating correlations between the component loadings in all samples (C-Modes).
eempf_cortable(pfmodel, normalisation = FALSE, method = "pearson", ...)
eempf_cortable(pfmodel, normalisation = FALSE, method = "pearson", ...)
pfmodel |
results from a PARAFAC analysis, class parafac |
normalisation |
logical, whether normalisation is undone or not |
method |
method of correlation, passed to |
... |
passed on to |
matrix
data(pf_models) eempf_cortable(pf4[[1]])
data(pf_models) eempf_cortable(pf4[[1]])
Calculating EEMqual which is an indicator of a PARAFAC model's quality
eempf_eemqual(pfmodel, eem_list, splithalf = NULL, ...)
eempf_eemqual(pfmodel, eem_list, splithalf = NULL, ...)
pfmodel |
PARAFAC model |
eem_list |
EEM data as eemlist |
splithalf |
optionally, you can supplie available splithalf results from model to decrease computation time |
... |
additional arguments passed to splithalf |
data frame containing fit, corcondia, product of best TCCs from splithalf analysis, eemqual and splithalf models
Rasmus Bro, Maider Vidal, EEMizer: Automated modeling of fluorescence EEM data, Chemometrics and Intelligent Laboratory Systems, Volume 106, Issue 1, 2011, Pages 86-92, ISSN 0169-7439
# data(eem_list) # data(pf_models) # pfmodel <- pf4[[1]] # eempf_eemqual(eem_list,pfmodel) # insuficient example data to run!
# data(eem_list) # data(pf_models) # pfmodel <- pf4[[1]] # eempf_eemqual(eem_list,pfmodel) # insuficient example data to run!
Extracting components of a PARAFAC model
eempf_excomp(pfmodel, comps)
eempf_excomp(pfmodel, comps)
pfmodel |
parafac model |
comps |
vector with numbers of components to extract |
list
data(pf_models) pfmodel <- pf4[[1]] comps <- eempf_excomp(pfmodel,c(1,3))
data(pf_models) pfmodel <- pf4[[1]] comps <- eempf_excomp(pfmodel,c(1,3))
Create one table containing the PARAFAC models factors and optionally exporting it to csv or txt
eempf_export(pfmodel, export = NULL, Fmax = TRUE, ...)
eempf_export(pfmodel, export = NULL, Fmax = TRUE, ...)
pfmodel |
PARAFAC model |
export |
file path to export table |
Fmax |
rescale modes so the A mode shows the maximum fluorescence |
... |
additional parameters passed to |
data frame
data(pf_models) factor_table <- eempf_export(pf4[[1]])
data(pf_models) factor_table <- eempf_export(pf4[[1]])
Fits vs. components of PARAFAC models are plotted
eempf_fits(pfres, ...)
eempf_fits(pfres, ...)
pfres |
list of objects of class parafac |
... |
arguments passed on to ggplot |
object of class ggplot
data(pf_models) eempf_fits(pf4)
data(pf_models) eempf_fits(pf4)
Calculate the leverage of each emission and excitation wavelength and each sample from a single PARAFAC model
eempf_leverage(pfmodel)
eempf_leverage(pfmodel)
pfmodel |
object of class parafac |
list of 3 named vectors (emission, excitation wavelengths and samples)
data(pf_models) eempf_leverage(pf4[[1]])
data(pf_models) eempf_leverage(pf4[[1]])
Combine leverages into one data frame and add optional labels.
eempf_leverage_data(cpl, qlabel = 0.1)
eempf_leverage_data(cpl, qlabel = 0.1)
cpl |
leverage, outpout from |
qlabel |
optional, quantile of which labels are shown (1 = all, 0 = no labels) |
data frame
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) lev_data <- eempf_leverage_data(leverage)
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) lev_data <- eempf_leverage_data(leverage)
Plot is interactive where you can select values with your mouse. A list of vectors is returned to remove this outliers in a further step from your samples. The labels to be shown can be selected by adding the quatile of samples with highest leverages to be labeled.
eempf_leverage_ident(cpl, qlabel = 0.1)
eempf_leverage_ident(cpl, qlabel = 0.1)
cpl |
leverage, outpout from |
qlabel |
optional, quantile of which labels are shown (1 = all, 0 = no labels) |
list of three vectors containing the names of selected samples
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) outliers <- eempf_leverage_ident(leverage)
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) outliers <- eempf_leverage_ident(leverage)
The labels to be shown can be selected by adding the quatile of samples with highest leverages to be labeled.
eempf_leverage_plot(cpl, qlabel = 0.1)
eempf_leverage_plot(cpl, qlabel = 0.1)
cpl |
leverage, outpout from |
qlabel |
optional, quantile of which labels are shown (1 = all, 0 = no labels) |
ggplot
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) eempf_leverage_plot(leverage)
data(pf_models) leverage <- eempf_leverage(pf4[[1]]) eempf_leverage_plot(leverage)
Plot amount of each component in each sample as bar plot
eempf_load_plot(pfmodel)
eempf_load_plot(pfmodel)
pfmodel |
parafac model |
ggplot
data(pf_models) eempf_load_plot(pf4[[1]])
data(pf_models) eempf_load_plot(pf4[[1]])
Calculate the leverage of each emission and excitation wavelength and each sample from a list of PARAFAC models
eempf_mleverage(pfres_comps, ecdf = FALSE, stats = FALSE)
eempf_mleverage(pfres_comps, ecdf = FALSE, stats = FALSE)
pfres_comps |
object of class parafac |
ecdf |
logical, transforme leverages to according empirical quantiles ( |
stats |
logical, whether means and standard deviations are calculated from leverages |
data frame containing leverages of wavelengths and samples for each model
data(pf_models) eempf_mleverage(pf3)
data(pf_models) eempf_mleverage(pf3)
This function uploads a PARAFAC model to openfluor.org from within R. You need to have an account at openfluor.org and supply the email used for the account to the function. Your password is then asked in a secure way and only used within one execution of this function.
eempf_OF_upload(email, file)
eempf_OF_upload(email, file)
email |
email address you use to login at openfluor.org as string |
file |
the file containing a PARAFAC model in openfluor format |
HTTP status code from the upload POST
## due to the need of a valid account, this function cannot be ## tested with generic data. ## Please use your own account to do so. ## Not run: data(pf_models) file <- file.path(tempdir(),"openfluor_example.txt") eempf_openfluor(pf4[[1]],file) eempf_OF_upload("[email protected]", file) ## End(Not run)
## due to the need of a valid account, this function cannot be ## tested with generic data. ## Please use your own account to do so. ## Not run: data(pf_models) file <- file.path(tempdir(),"openfluor_example.txt") eempf_openfluor(pf4[[1]],file) eempf_OF_upload("[email protected]", file) ## End(Not run)
openfluor.org offers the possibility to compare your results to others, that were uploaded to the database. This functions writes out a txt containing the header lines and your components. Please open the file in an editor and fill in further information that cannot be covered by this function.
eempf_openfluor( pfmodel, file, Fmax = TRUE, upload = FALSE, email = NULL, model_details = list() )
eempf_openfluor( pfmodel, file, Fmax = TRUE, upload = FALSE, email = NULL, model_details = list() )
pfmodel |
PARAFAC model |
file |
string, path to outputfile. The directory must exist, the file will be created or overwritten if already present. |
Fmax |
rescale modes so the A mode shows the maximum fluorescence. As openfluor does not accept values above 1, this is a way of scaling the B and C modes to a range between 0 and 1. |
upload |
logical, whether model is directly uploaded to openfluor.org |
email |
optional email address to log into openfluor.org |
model_details |
optional named list with strings to be added in the openfluor file in the fields corresponding to the list names |
txt file
data(pf_models) model_details <- list(name = "River", creator = "Helena Glory", constraints = "non-negative", validation = "split-half", unit= "RU") eempf_openfluor(pf4[[1]],file.path(tempdir(),"openfluor_example.txt"), upload = FALSE, model_details = model_details)
data(pf_models) model_details <- list(name = "River", creator = "Helena Glory", constraints = "non-negative", validation = "split-half", unit= "RU") eempf_openfluor(pf4[[1]],file.path(tempdir(),"openfluor_example.txt"), upload = FALSE, model_details = model_details)
The components can be plottet in two ways: either as a colour map or as two lines (emission, excitation wavelengths) intersecting at the component maximum. If the list of provided models is named, these names are shown in the plot. Otherwise, the models are automatically named by "model#".
eempf_plot_comps( pfres, type = 1, names = TRUE, contour = FALSE, colpal = "default", ... )
eempf_plot_comps( pfres, type = 1, names = TRUE, contour = FALSE, colpal = "default", ... )
pfres |
list of PARAFAC models |
type |
1 for a colour map and 2 for em and ex wavelength loadings |
names |
logical, whether names of components should be written into the plot |
contour |
in case of 3 dimensional component plots, contours are added |
colpal |
"default" to use the viridis colour palette, "rainbow" to use a subset of the rainbow palette, any custom vector of colors or a colour palette. A gradient will be produced from this vector. Larger vectors (e.g. 50 elements) can produce smoother gradients. |
... |
arguments passed on to other functions, e.g. |
object of class ggplot
data(pf_models) eempf_plot_comps(pf4, type = 1) # use a different colour scheme: # eempf_plot_comps(pf4, type = 1, colpal = heat.colors(50)) eempf_plot_comps(pf4, type = 2) eempf_plot_comps(list(pf4[[1]],pf4[[1]]), type=1)
data(pf_models) eempf_plot_comps(pf4, type = 1) # use a different colour scheme: # eempf_plot_comps(pf4, type = 1, colpal = heat.colors(50)) eempf_plot_comps(pf4, type = 2) eempf_plot_comps(list(pf4[[1]],pf4[[1]]), type=1)
Plot results from an SSC check
eempf_plot_ssccheck(ssccheck)
eempf_plot_ssccheck(ssccheck)
ssccheck |
outpout from |
ggplot element
data(pf_models) ssccheck <- eempf_ssccheck(pfmodels = pf3[1:3], cores = 2) eempf_plot_ssccheck(ssccheck)
data(pf_models) ssccheck <- eempf_ssccheck(pfmodels = pf3[1:3], cores = 2) eempf_plot_ssccheck(ssccheck)
Reorder PARAFAC components
eempf_reorder(pfmodel, order, decreasing = FALSE)
eempf_reorder(pfmodel, order, decreasing = FALSE)
pfmodel |
model of class parafac |
order |
vector containing desired new order or "em" or "ex" to reorder according to emission or excitation wavelengths of the peaks |
decreasing |
logical, whether components are reordered according to peak wvalengths in a decreasing direction |
parafac model
data(pf_models) ggeem(pf4[[1]]) pf4r <- eempf_reorder(pf4[[1]], "ex") ggeem(pf4r)
data(pf_models) ggeem(pf4[[1]]) pf4r <- eempf_reorder(pf4[[1]], "ex") ggeem(pf4r)
Create a html report of a PARAFAC analysis
eempf_report( pfmodel, export, eem_list = NULL, absorbance = NULL, meta = NULL, metacolumns = NULL, splithalf = FALSE, shmodel = NULL, performance = FALSE, residuals = FALSE, spp = 5, cores = parallel::detectCores(logical = FALSE), ... )
eempf_report( pfmodel, export, eem_list = NULL, absorbance = NULL, meta = NULL, metacolumns = NULL, splithalf = FALSE, shmodel = NULL, performance = FALSE, residuals = FALSE, spp = 5, cores = parallel::detectCores(logical = FALSE), ... )
pfmodel |
PARAFAC model |
export |
path to exported html file |
eem_list |
optional EEM data |
absorbance |
optional absorbance data |
meta |
optional meta data table |
metacolumns |
optional column names of metadata table |
splithalf |
optional logical, states whether split-half analysis should be included |
shmodel |
optional results from split-half analysis. If this data is not supplied but EEM data is available the split-half analysis is calculated on the creation of the report. Calculating the split-half analysis takes some time! |
performance |
calculating model performance: |
residuals |
logical, whether residuals are plotted in the report |
spp |
plots per page for loadgins and residuals plot |
cores |
cores to be used for the calculation |
... |
arguments to or from other functions |
TRUE if report was created
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) abs_folder <- system.file("extdata/absorbance", package = "staRdom") # load example data absorbance <- absorbance_read(abs_folder, cores = 2) metatable <- system.file("extdata/metatable_dreem.csv",package = "staRdom") meta <- read.table(metatable, header = TRUE, sep = ",", dec = ".", row.names = 1) checked <- eem_checkdata(eem_list, absorbance, metadata = meta, metacolumns = "dilution", error = FALSE) eem_names(eem_list) pfm <- A_missing(eem_list,pf4[[1]], cores = 2) eempf_report(pfm, export = file.path(tempdir(),"pf_report.html"), eem_list = eem_list, absorbance = absorbance, meta = metatable, metacolumns = "dilution", cores = 2)
folder <- system.file("extdata/EEMs", package = "staRdom") # load example data eem_list <- eem_read(folder, recursive = TRUE, import_function = eem_csv) abs_folder <- system.file("extdata/absorbance", package = "staRdom") # load example data absorbance <- absorbance_read(abs_folder, cores = 2) metatable <- system.file("extdata/metatable_dreem.csv",package = "staRdom") meta <- read.table(metatable, header = TRUE, sep = ",", dec = ".", row.names = 1) checked <- eem_checkdata(eem_list, absorbance, metadata = meta, metacolumns = "dilution", error = FALSE) eem_names(eem_list) pfm <- A_missing(eem_list,pf4[[1]], cores = 2) eempf_report(pfm, export = file.path(tempdir(),"pf_report.html"), eem_list = eem_list, absorbance = absorbance, meta = metatable, metacolumns = "dilution", cores = 2)
B and C modes (emission and excitation wavelengths) are rescaled to RMS of value newscale. This is compensated in A mode (sample loadings).
eempf_rescaleBC(pfmodel, newscale = "Fmax")
eempf_rescaleBC(pfmodel, newscale = "Fmax")
pfmodel |
object of class parafac |
newscale |
If (default) newscale = "Fmax", each component will be scaled so the maximum of each component is 1. It is also possible to set a desired root mean-square for each column of the rescaled mode. Can input a scalar or a vector with length equal to the number of factors for the given mode. |
object of class parafac
data(pf_models) new_pf <- eempf_rescaleBC(pf4[[1]])
data(pf_models) new_pf <- eempf_rescaleBC(pf4[[1]])
Calculate residuals of EEM data according to a certain model
eempf_residuals( pfmodel, eem_list, select = NULL, cores = parallel::detectCores(logical = FALSE)/2 )
eempf_residuals( pfmodel, eem_list, select = NULL, cores = parallel::detectCores(logical = FALSE)/2 )
pfmodel |
PARAFAC model of class parafac |
eem_list |
eemlist containing EEM data |
select |
character vector containing the names of the desired samples |
cores |
number of cores to use for parallel processing |
data frame with EEM residuals
data(eem_list) data(pf_models) residuals <- eempf_residuals(pf4[[1]], eem_list, cores = 2)
data(eem_list) data(pf_models) residuals <- eempf_residuals(pf4[[1]], eem_list, cores = 2)
The metrics calculated with this function are:
RSS: residual sum of squares
MAE: mean absolute error
SAE: sum of absolute errors
RSAE: sum of absolute error in relation to the sum of fluorescence and
LEV: the leverage as described in eempf_leverage
The example contains a way to plot these numbers.
eempf_residuals_metrics(residuals, leverage)
eempf_residuals_metrics(residuals, leverage)
residuals |
data.frame as derived from |
leverage |
list of data.frames as derived from |
a list of data.frames containing residuals metrics for each sample, emission and excitation wavelength
data(eem_list) data(pf_models) residuals <- eempf_residuals(pf4[[1]], eem_list, cores = 2) leverage <- eempf_leverage(pf4[[1]]) metrics <- eempf_residuals_metrics(residuals, leverage) metrics$sample ## plot different residual metrics require(dplyr) require(tidyr) require(ggplot2) lapply(names(metrics), function(name){ metrics[[name]] %>% mutate(mode = name, element = !!sym(name)) }) %>% bind_rows() %>% pivot_longer(cols = RSS:LEV, names_to = "metric", values_to = "value") %>% # uncomment the following line to select certain metrics # filter(metric %in% c("RSS","LEV")) %>% ggplot(aes(x = element, y = value, colour = metric))+ geom_point()+ facet_wrap(mode ~ ., ncol = 3, scales = "free")+ theme(axis.text.x = element_text(angle = 90))+ scale_y_continuous(trans="log")
data(eem_list) data(pf_models) residuals <- eempf_residuals(pf4[[1]], eem_list, cores = 2) leverage <- eempf_leverage(pf4[[1]]) metrics <- eempf_residuals_metrics(residuals, leverage) metrics$sample ## plot different residual metrics require(dplyr) require(tidyr) require(ggplot2) lapply(names(metrics), function(name){ metrics[[name]] %>% mutate(mode = name, element = !!sym(name)) }) %>% bind_rows() %>% pivot_longer(cols = RSS:LEV, names_to = "metric", values_to = "value") %>% # uncomment the following line to select certain metrics # filter(metric %in% c("RSS","LEV")) %>% ggplot(aes(x = element, y = value, colour = metric))+ geom_point()+ facet_wrap(mode ~ ., ncol = 3, scales = "free")+ theme(axis.text.x = element_text(angle = 90))+ scale_y_continuous(trans="log")
A raster of plots is created. Each column shows one sample. The top n rows show the n components from the model according their occurance in the certain samples. The second last row shows the residual, not covered by any component in the model and the last row shows the whole sample.
eempf_residuals_plot( pfmodel, eem_list, res_data = NULL, spp = 5, select = NULL, residuals_only = FALSE, cores = parallel::detectCores(logical = FALSE), contour = FALSE, colpal = "default" )
eempf_residuals_plot( pfmodel, eem_list, res_data = NULL, spp = 5, select = NULL, residuals_only = FALSE, cores = parallel::detectCores(logical = FALSE), contour = FALSE, colpal = "default" )
pfmodel |
object of class parafac containing the generated model |
eem_list |
object of class eemlist with all the samples that should be plotted |
res_data |
optional, data of sample residuals related to the model, output from |
spp |
optional, samples per plot |
select |
optional, character vector of samples you want to plot |
residuals_only |
plot only residuals |
cores |
number of cores to use for parallel processing |
contour |
logical, states whether contours should be plotted |
colpal |
"default" to use the viridis colour palette, "rainbow" to use a subset of the rainbow palette, any custom vector of colors or a colour palette. A gradient will be produced from this vector. Larger vectors (e.g. 50 elements) can produce smoother gradients. |
eem_list may contain samples not used for modelling. Calculation is done by A_missing
. This especially interesting if outliers are excluded prior modelling and should be evaluated again afterwards.
Usually, residuals contain negative values, while these is the exception in samples and PARAFAC components. Therefore, we decided to use a similar colour palette as in the other plot functions but adding a purple tone for negative values.
several ggplot objects
data(eem_list) data(pf_models) eem_list <- eem_extract(eem_list, 1:10) eem_list <- eem_rem_scat(eem_list, rep(TRUE, 4), c(15,10,16,12)) eempf_residuals_plot(pf4[[1]], eem_list, cores = 2) # use other colour schemes: # eempf_residuals_plot(pf4[[1]], eem_list, colpal = c("blue",heat.colors(50))) # plots <- eempf_residuals_plot(pf4[[1]], eem_list) # lapply(plots, function(pl){ # pl + # scale_fill_viridis_c() + # scale_colour_viridis_c() # })
data(eem_list) data(pf_models) eem_list <- eem_extract(eem_list, 1:10) eem_list <- eem_rem_scat(eem_list, rep(TRUE, 4), c(15,10,16,12)) eempf_residuals_plot(pf4[[1]], eem_list, cores = 2) # use other colour schemes: # eempf_residuals_plot(pf4[[1]], eem_list, colpal = c("blue",heat.colors(50))) # plots <- eempf_residuals_plot(pf4[[1]], eem_list) # lapply(plots, function(pl){ # pl + # scale_fill_viridis_c() + # scale_colour_viridis_c() # })
The data variable pf_models can be supplied as list of PARAFAC models, output from a splithalf analysis or list of matrices Please see details of calculation in: U.J. Wünsch, R. Bro, C.A. Stedmon, P. Wenig, K.R. Murphy, Emerging patterns in the global distribution of dissolved matter fluorescence, Anal. Methods, 11 (2019), pp. 888-893
eempf_ssc( pfmodels, tcc = FALSE, m = FALSE, cores = parallel::detectCores(logical = FALSE) )
eempf_ssc( pfmodels, tcc = FALSE, m = FALSE, cores = parallel::detectCores(logical = FALSE) )
pfmodels |
list of either PARAFAC models or component matrices |
tcc |
if set TRUE, TCC is returned instead |
m |
logical, if TRUE, emission and excitation SSCs or TCCs are combined by calculating the geometric mean |
cores |
number of CPU cores to be used |
(list of) tables containing SCCs between components
pf_models <- pf3[1:3] sscs <- eempf_ssc(pf_models, cores = 2) sscs tcc <- eempf_ssc(pf_models, tcc = TRUE, cores = 2) tcc ## mixed tcc (combine em and ex) mtcc <- eempf_ssc(pf_models, tcc = TRUE, m = TRUE, cores = 2) mtcc ## compare results from splithalf analysis sh_sscs <- eempf_ssc(sh, cores = 2) sh_sscs ## view diagonals only (components with similar numbers only) lapply(sh_sscs, lapply, diag)
pf_models <- pf3[1:3] sscs <- eempf_ssc(pf_models, cores = 2) sscs tcc <- eempf_ssc(pf_models, tcc = TRUE, cores = 2) tcc ## mixed tcc (combine em and ex) mtcc <- eempf_ssc(pf_models, tcc = TRUE, m = TRUE, cores = 2) mtcc ## compare results from splithalf analysis sh_sscs <- eempf_ssc(sh, cores = 2) sh_sscs ## view diagonals only (components with similar numbers only) lapply(sh_sscs, lapply, diag)
Check SSCs between different models or initialisations of one model
eempf_ssccheck( pfmodels, best = length(pfmodels), tcc = FALSE, cores = parallel::detectCores(logical = FALSE) )
eempf_ssccheck( pfmodels, best = length(pfmodels), tcc = FALSE, cores = parallel::detectCores(logical = FALSE) )
pfmodels |
list of parafac models |
best |
number of models with the highest R^2 to be used, default is all models |
tcc |
logical, if TRUE, TCC instead of SSC is calculated |
cores |
number of CPU cores to be used |
data.frame containing SSCs
data(pf_models) eempf_ssccheck(pf3[1:2], cores = 2) # SSCs of split-half models, models need to be unlisted data(sh) eempf_ssccheck(unlist(sh, recursive = FALSE), cores = 2)
data(pf_models) eempf_ssccheck(pf3[1:2], cores = 2) # SSCs of split-half models, models need to be unlisted data(sh) eempf_ssccheck(unlist(sh, recursive = FALSE), cores = 2)
Calculate the importance of each component.
eempf_varimp( pfmodel, eem_list, cores = parallel::detectCores(logical = FALSE), ... )
eempf_varimp( pfmodel, eem_list, cores = parallel::detectCores(logical = FALSE), ... )
pfmodel |
model of class parafac |
eem_list |
eemlist used to calculate that model |
cores |
cores to be used for the calculation |
... |
other aruments passed to eem_parafac |
The importance of each variable is calculated by means of creating a model without a specific component and calculating the difference between the original R-squared and the one with the left out component. The derived values state the loss in model fit if one component is not used in the modeling process. For the creation of the new models, the exact components of the original model are used.
numeric vector, values are in the same order of the components in the supplied model.
data(pfmodel) data(eem_list) eempf_varimp(pf4[[1]], eem_list, cores = 2)
data(pfmodel) data(eem_list) eempf_varimp(pf4[[1]], eem_list, cores = 2)
Please refer to eem_biological_index
, eem_coble_peaks
, eem_fluorescence_index
, eem_biological_index
and abs_parms
for details on the certain values
eempf4analysis( pfmodel, eem_list = NULL, absorbance = NULL, cuvl = NULL, n = 4, export = NULL, cores = parallel::detectCores(logical = FALSE), ... )
eempf4analysis( pfmodel, eem_list = NULL, absorbance = NULL, cuvl = NULL, n = 4, export = NULL, cores = parallel::detectCores(logical = FALSE), ... )
pfmodel |
PARAFAC model where loadings of the components are extracted |
eem_list |
optional eemlist used for peak and indices calculation |
absorbance |
optional absorbance table used for absorbance slope parameter calculation |
cuvl |
optional cuvette length of absorbance data in cm |
n |
optional size of moving window in nm for data smoothing in advance of peak picking |
export |
optional file path of csv or txt table where data is exported |
cores |
number of parallel calculations (e.g. number of physical cores in CPU) |
... |
additional parameters passed to |
data frame
data(eem_list) data(pf_models) results <- eempf4analysis(pfmodel = pf4[[1]], eem_list = eem_list, cuvl = 5, n = 4, cores = 2)
data(eem_list) data(pf_models) results <- eempf4analysis(pfmodel = pf4[[1]], eem_list = eem_list, cuvl = 5, n = 4, cores = 2)
Plots from EEM spectra of class ggplot
. In case you work with a larger number of EEMs and want to show then in several plots, you can use eem_overview_plot
.
ggeem(data, fill_max = FALSE, ...) ## Default S3 method: ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'eemlist' ggeem(data, fill_max = FALSE, eemlist_order = TRUE, ...) ## S3 method for class 'eem' ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'parafac' ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'data.frame' ggeem( data, fill_max = FALSE, colpal = "default", contour = FALSE, interpolate = FALSE, redneg = NULL, ... )
ggeem(data, fill_max = FALSE, ...) ## Default S3 method: ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'eemlist' ggeem(data, fill_max = FALSE, eemlist_order = TRUE, ...) ## S3 method for class 'eem' ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'parafac' ggeem(data, fill_max = FALSE, ...) ## S3 method for class 'data.frame' ggeem( data, fill_max = FALSE, colpal = "default", contour = FALSE, interpolate = FALSE, redneg = NULL, ... )
data |
eem, eemlist, parafac or data.frame. The details are given under 'Details'. |
fill_max |
sets the maximum fluorescence value for the colour scale. This is mainly used by other functions, and makes different plots visually comparable. |
... |
parameters passed on to |
eemlist_order |
logical, in case of an eemlist, the order of samples in the plot is the same as in the eemlist, alphabetically otherwise |
colpal |
"default" to use the viridis colour palette, "rainbow" to use a subset of the rainbow palette, any custom vector of colors or a colour palette. A gradient will be produced from this vector. Larger vectors (e.g. 50 elements) can produce smoother gradients. |
contour |
logical, whether contours should be plotted (default FALSE), see |
interpolate |
logical, whether fluorescence should be interpolated, see |
redneg |
deprecated! logical, whether negative values should be coloured discreet. |
The data can be of different sources:
eem: a single EEM spectrum is plotted
eemlist: all spectra of the samples are plotted, arranged in a grid
data.frame: a data.frame containing EEM data. Can be created by e.g. as.data.frame.eem
parafac: a PARAFAC model, the components are plotted then.
Using redneg you can give negative values a reddish colour. This can help identifying these parts in samples or components. Negative values are physically not possible and can only be the result of measuring errors, model deviations and problems with interpolated values.
Interpolation (interpolate = TRUE) leeds to smoother plots. The default is FALSE because it might cover small scale inconsistencies.
Contours (contour = TRUE)can be added to the EEM plots.
A colour palette can be specified using the argument colpal.
Plotting distinct samples can be done using eem_extract
. Please see example.
a ggplot object
## plotting two distinct samples data(eem_list) eem_names(eem_list) eem <- eem_extract(eem_list,c("^d667sf$", "^d661sf$"),keep=TRUE) ggeem(eem) # the former redneg argument is deprecated, please see a similar looking example below! #ggeem(eem, redneg = TRUE) ggeem(eem, colpal = c(rainbow(75)[58],rainbow(75)[53:1])) # use any custom colour palette ggeem(eem, colpal = heat.colors(50)) # needs package matlab to be installed: # ggeem(eem, colpal = matlab::jet.colors(50)) # or by adding ggplot2 colour and fill functions: # ggeem(eem)+ # scale_fill_viridis_c()+ # scale_color_viridis_c() ggeem(eem, interpolate = TRUE) ggeem(eem, contour = TRUE)
## plotting two distinct samples data(eem_list) eem_names(eem_list) eem <- eem_extract(eem_list,c("^d667sf$", "^d661sf$"),keep=TRUE) ggeem(eem) # the former redneg argument is deprecated, please see a similar looking example below! #ggeem(eem, redneg = TRUE) ggeem(eem, colpal = c(rainbow(75)[58],rainbow(75)[53:1])) # use any custom colour palette ggeem(eem, colpal = heat.colors(50)) # needs package matlab to be installed: # ggeem(eem, colpal = matlab::jet.colors(50)) # or by adding ggplot2 colour and fill functions: # ggeem(eem)+ # scale_fill_viridis_c()+ # scale_color_viridis_c() ggeem(eem, interpolate = TRUE) ggeem(eem, contour = TRUE)
Full join of a list of data frames.
list_join(df_list, by)
list_join(df_list, by)
df_list |
list of data frames to by joined |
by |
character vector containing information how to join data frames. Format to be according to by in |
The joint data frame.
a <- data.frame(what=letters[1:5],a=c(1:5)) b <- data.frame(what=letters[1:5],b=c(7:11)) c <- data.frame(what=letters[1:5],c=c(20:24)) df_list <- list(a,b,c) list_join(df_list,by="what")
a <- data.frame(what=letters[1:5],a=c(1:5)) b <- data.frame(what=letters[1:5],b=c(7:11)) c <- data.frame(what=letters[1:5],c=c(20:24)) df_list <- list(a,b,c) list_join(df_list,by="what")
Data for each wavelengths is returned. For each component the lines intersecting at the component maxima are returned.
maxlines(pfmodel)
maxlines(pfmodel)
pfmodel |
object of class parafac |
data frame
data(pf_models) ml <- maxlines(pf4[[1]])
data(pf_models) ml <- maxlines(pf4[[1]])
Normalise 3-dimensional array in first and second dimension
norm_array(eem_array)
norm_array(eem_array)
eem_array |
3-dimensional array |
array
data(eem_list) a <- eem2array(eem_list) an <- norm_array(a)
data(eem_list) a <- eem2array(eem_list) an <- norm_array(a)
Factors used for normalisation are saved separately in the PARAFAC models. With this function, the normalisation factors are combined with the A-modes of the model and removed as a separate vector. This means former normalisation is accounted for in the amount of each component in each sample. If no normalisation was done, the original model is returned without warning.
norm2A(pfmodel)
norm2A(pfmodel)
pfmodel |
object of class parafac |
object of class parafac
data(pf_models) pf4[[1]] <- norm2A(pf4[[1]])
data(pf_models) pf4[[1]] <- norm2A(pf4[[1]])
parafac
.Please refer to parafac
for input parameters and details. This wrapper function ensures 'nstart' converging models are calculated. On the contrary, parafac calculates 'nstart' models regardless if they are converging.
parafac_conv( X, nstart, verbose = FALSE, output = c("best", "all"), cl = NULL, ... )
parafac_conv( X, nstart, verbose = FALSE, output = c("best", "all"), cl = NULL, ... )
X |
array |
nstart |
number of converging models to calculate |
verbose |
logical, whether more information is supplied |
output |
Output the best solution (default) or output all nstart solutions. |
cl |
cluster to be used for parallel processing |
... |
arguments passed on to |
either a parafac model or a list of parafac models
data(eem_list) dim_min <- 3 # minimum number of components dim_max <- 4 # maximum number of components nstart <- 25 # random starts for PARAFAC analysis, models built simulanuously, best selected # cores <- parallel::detectCores(logical=FALSE) # use all cores but do not use all threads cores <- 2 # package checks only run with 2 cores maxit = 2500 ctol <- 10^-7 # tolerance for parafac pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, strictly_converging = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores) # keep all calculated models for diagnostics pfres_comps_all <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, strictly_converging = TRUE, output = "all", maxit = maxit, nstart = nstart, ctol = ctol, cores = cores)
data(eem_list) dim_min <- 3 # minimum number of components dim_max <- 4 # maximum number of components nstart <- 25 # random starts for PARAFAC analysis, models built simulanuously, best selected # cores <- parallel::detectCores(logical=FALSE) # use all cores but do not use all threads cores <- 2 # package checks only run with 2 cores maxit = 2500 ctol <- 10^-7 # tolerance for parafac pfres_comps <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, strictly_converging = TRUE, maxit = maxit, nstart = nstart, ctol = ctol, cores = cores) # keep all calculated models for diagnostics pfres_comps_all <- eem_parafac(eem_list, comps = seq(dim_min, dim_max), normalise = TRUE, strictly_converging = TRUE, output = "all", maxit = maxit, nstart = nstart, ctol = ctol, cores = cores)
PARAFAC model, see vignette, unconstrained
pf1
pf1
list of parafacs
PARAFAC model, see vignette, non-negative constraints
pf1n
pf1n
list of parafacs
PARAFAC model, see vignette, non-negative constraints, normalised
pf2
pf2
list of parafacs
PARAFAC model, see vignette, non-negative constraints, normalised, outliers removed
pf3
pf3
list of parafacs
PARAFAC model, see vignette, non-negative constraints, normalised, outliers removed, high accuarcy
pf4
pf4
list of parafacs
result from PARAFAC split-half analysis, periodic data split
sh
sh
list of parafacs
The samples are split into four subsamples: A,B,C,D. Subsamples are then combined and compared: AB vs. CD, AC vs. BD, AD vs. BC. The results show graphs from the components of each of the 6 models.
splithalf( eem_list, comps, splits = NA, rand = FALSE, normalise = TRUE, nstart = 20, cores = parallel::detectCores(logical = FALSE), maxit = 2500, ctol = 10^(-7), rescale = TRUE, strictly_converging = FALSE, verbose = FALSE, ... )
splithalf( eem_list, comps, splits = NA, rand = FALSE, normalise = TRUE, nstart = 20, cores = parallel::detectCores(logical = FALSE), maxit = 2500, ctol = 10^(-7), rescale = TRUE, strictly_converging = FALSE, verbose = FALSE, ... )
eem_list |
eemlist containing sample data |
comps |
number of desired components |
splits |
optional, list of 4 numerical vectors containing the sample numbers for A,B,C and D sample subsets |
rand |
logical, splits are randomised |
normalise |
state whether EEM data should be normalised in advance |
nstart |
number of random starts |
cores |
number of parallel calculations (e.g. number of physical cores in CPU) |
maxit |
maximum iterations for PARAFAC algorithm |
ctol |
Convergence tolerance (R^2 change) |
rescale |
rescale splithalf models to Fmax, see |
strictly_converging |
calculate nstart converging models and take the best. Please see |
verbose |
states whether you want additional information during calculation |
... |
additional parameters that are passed on to |
Split data sets can be split suboptimal and cause low TCCs. Therefore, subsamples are recombined in 3 different ways and a TCC close to 1 in only one split combination per component is already a positive result. Check the split sets to check for sample independency.
data frame containing components of the splithalf models
data(eem_list) splithalf <- splithalf(eem_list, comps = 6, verbose = TRUE, cores = 2) splithalf_plot(splithalf) # Similarity of splits using SSCs sscs <- splithalf_tcc(splithalf)
data(eem_list) splithalf <- splithalf(eem_list, comps = 6, verbose = TRUE, cores = 2) splithalf_plot(splithalf) # Similarity of splits using SSCs sscs <- splithalf_tcc(splithalf)
Graphs of all components of all models are plotted to be compared.
splithalf_plot(fits)
splithalf_plot(fits)
fits |
list of components data |
ggplot
data(sh) splithalf_plot(sh) str(sh)
data(sh) splithalf_plot(sh) str(sh)
Extracting a list of sample names in each subsample from a splithalf analysis
splithalf_splits(fits)
splithalf_splits(fits)
fits |
list of parafac models (from a splithalf analysis) |
data frame containing TCC values
data(sh) splithalf_splits(sh)
data(sh) splithalf_splits(sh)
Extracting TCC values from a splithalf analysis
splithalf_tcc(fits)
splithalf_tcc(fits)
fits |
list of parafac models (from a splithalf analysis) |
data frame containing TCC values
data(sh) splithalf_tcc(sh)
data(sh) splithalf_tcc(sh)
Please see details in: U.J. Wünsch, R. Bro, C.A. Stedmon, P. Wenig, K.R. Murphy, Emerging patterns in the global distribution of dissolved matter fluorescence, Anal. Methods, 11 (2019), pp. 888-893
ssc(mat1, mat2, tcc = FALSE)
ssc(mat1, mat2, tcc = FALSE)
mat1 |
matrix |
mat2 |
matrix |
tcc |
if set TRUE, TCC is returned instead |
table containing pairwise SCC of matrices columns
pf_models <- pf3 mat1 <- pf_models[[1]][[2]] mat2 <- pf_models[[2]][[2]] ## calculate SSC ssc(mat1,mat2) ## calculate TCC ssc(mat1,mat2, tcc = TRUE)
pf_models <- pf3 mat1 <- pf_models[[1]][[2]] mat2 <- pf_models[[2]][[2]] ## calculate SSC ssc(mat1,mat2) ## calculate TCC ssc(mat1,mat2, tcc = TRUE)
Calculate the combination of components giving the maximum of geometric mean of TCCs
ssc_max(mat)
ssc_max(mat)
mat |
matrix |
vector with TCCs having the highest possible geometric mean
mat <- matrix(c(7,2,13,6,0,7,1,5,5), nrow = 3) mat sscs <- ssc_max(mat) sscs # order of components: attr(sscs,"order")
mat <- matrix(c(7,2,13,6,0,7,1,5,5), nrow = 3) mat sscs <- ssc_max(mat) sscs # order of components: attr(sscs,"order")
Componets must be passed as modes, see maxlines
tcc(maxl_table, na.action = "na.omit")
tcc(maxl_table, na.action = "na.omit")
maxl_table |
data frame containing the peak lines of components |
na.action |
if "na.omit" NA are deleted from prior the test |
data.frame containing the TCCs
data(pf_models) ml <- maxlines(pf4[[1]]) tcc(ml)
data(pf_models) ml <- maxlines(pf4[[1]]) tcc(ml)
When running a splithalf analysis similar components are not necessarily on the same position. This function looks for best fits with Tucker's Congruence Coefficients and returns a list of models with reordered components.
tcc_find_pairs(fits)
tcc_find_pairs(fits)
fits |
list of parafac models |
list of parafac models
data(eem_list) # function currently only used from within splithalf splithalf(eem_list, 6, nstart = 2, cores = 2)
data(eem_list) # function currently only used from within splithalf splithalf(eem_list, 6, nstart = 2, cores = 2)