vgn1_data_processing.Rmd
```r
# Example: load a dataset included in the package
library(data.table)   # provides fread()
library(growthTrendR)

# ring measurements
dt.samples <- fread(system.file("extdata", "dt.samples.csv", package = "growthTrendR"))

# format the user's data to conform to the CFS-TRenD data structure
dt.samples_trt <- CFS_format(data = list(dt.samples, 39:68), usage = 1, out.csv = NULL)
class(dt.samples_trt)
#> [1] "cfs_format"

# save it to extdata for further use
# saveRDS(dt.samples_trt, "inst/extdata/dt.samples_trt.rds")
```

data
All information should be provided in a single file in wide format, with metadata first, followed by the ring-width measurements (in mm).
The column names of the ring-width measurements indicate the year of measurement in one of two formats: the year itself (e.g., 1980) or the year prefixed with a single character (e.g., X1980).
We strongly recommend that the ring-width columns be ordered by year and placed consecutively in the dataset, because their column indices are passed as input to CFS_format().
In data = list(dt.samples, 39:68), the first item is the data table and the second item gives the column indices of the ring-width measurements.
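Rather than counting columns by hand, the indices can be located from the column names. The sketch below assumes the two naming formats described above (a bare four-digit year, or a year with a single-letter prefix) and uses a small hypothetical table; tree_id and species are placeholder column names, not required CFS-TRenD names. It uses only base R and does not call growthTrendR.

```r
# Hypothetical wide-format table: metadata columns first, then ring widths (mm).
dt_wide <- data.frame(
  tree_id = c("T1", "T2"),
  species = c("PSEUMEN", "PSEUMEN"),
  X1980   = c(1.21, 0.95),
  X1981   = c(1.10, 1.02),
  X1982   = c(0.87, 1.30),
  stringsAsFactors = FALSE
)

# Ring-width columns: a four-digit year, optionally prefixed by one letter (e.g. X1980).
rw_cols <- grep("^[A-Za-z]?[0-9]{4}$", names(dt_wide))
years   <- as.integer(sub("^[A-Za-z]", "", names(dt_wide)[rw_cols]))

# Check the recommended layout: contiguous columns with increasing years.
stopifnot(all(diff(rw_cols) == 1), all(diff(years) > 0))

# Same shape as list(dt.samples, 39:68) in the tutorial call.
data_arg <- list(dt_wide, rw_cols)
rw_cols
#> [1] 3 4 5
```

Here rw_cols plays the role of 39:68 in the tutorial call. With real data, a list built this way would be passed to CFS_format() as the data argument.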
usage
If users intend to submit their data to the CFS-TRenD online repository, set usage = 1. The function then formats the data structure and performs detailed checks of column names, geographic coordinates, species, and other requirements of the CFS-TRenD collection standards.
Otherwise, use usage = 2 to perform a reduced checking procedure, which still builds the CFS-TRenD structure but skips some of the detailed validations.
out.csv
To export the processed tables in CSV format, specify the output folder here. The default is NULL (no export).
Note: Running the function CFS_format() is the first and mandatory step before using any other functions in the growthTrendR package. The data provided in this tutorial is already prepared to run the vignette; in practice, users may need to add or modify their own data based on the messages generated by the function.
The data report provides an overview of the tree-ring data's quality and characteristics at four levels: project, project-species, project-species-site, and project-species-site-radii, including a quality assessment at the site and radii levels with the default parameters. More details on the quality assessment are presented in the next section.
```r
generate_report(robj = dt.samples_trt,
                qa.label_data = "demo-samples",
                data_report.reports_sel = c(1, 2, 3, 4),
                qa.min_nseries = 5,
                scale.max_dist_km = 200,
                scale.N_nbs = 2)
```

robj
The input for the data report is the output of the CFS_format() function, which assigns the class “cfs_format” to the resulting object.
qa.label_data
A short description of the input dataset. This text will appear in the report as the data source for the generated figures.
data_report.reports_sel
This argument specifies which levels of data summary are included in the report. Supply any combination of 1, 2, 3, and 4, each corresponding to one of the four available report types. The example above requests all four.
output_file
This argument allows users to export the HTML-formatted report to a specified location by providing a folder and filename (e.g., “path/to/report.html”). If left as NULL (default), the report will not be saved to disk and will instead open directly in the browser for viewing.
This report provides an overview of the tree-ring data's quality and characteristics at four levels:
1. project: data completeness, an assessment of missing or incomplete data across the whole dataset;
2. project-species: data summary, with summary statistics and descriptions;
3. project-species-site: data summary tables and series plots;
4. project-species-site-radii: correlation analysis and quality assessment.
project name: Douglas-fir retrospective monitoring
selected reports: 1, 2, 3, 4
This table shows the completeness of each variable across the whole dataset as a percentage. A value of 0 means the variable contains no data. Please verify carefully that all required data have been included in the submission.
| var | pct |
|---|---|
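The completeness percentage can be illustrated with base R. This is a minimal sketch on a hypothetical table, assuming that only NA values count as missing; the report may apply additional rules (for example, for empty strings) that are not reproduced here.

```r
# Hypothetical metadata table with missing values (NA).
dt_meta <- data.frame(
  site_id  = c("S1", "S1", "S2", NA),
  latitude = c(49.1, 49.1, NA, NA),
  species  = rep("PSEUMEN", 4),
  stringsAsFactors = FALSE
)

# Percentage of non-missing entries per variable; 0 means the column holds no data.
completeness <- data.frame(
  var = names(dt_meta),
  pct = unname(round(100 * colMeans(!is.na(dt_meta)), 1)),
  stringsAsFactors = FALSE
)
completeness
```

In this toy table, site_id is 75% complete, latitude 50%, and species 100%.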
This section presents key summary statistics by species, including spatial and temporal ranges, ring-width summaries, and series lengths.
This dataset contains one species: PSEUMEN.
*Number of series that passed the CFS_qa() test on differentiated series.
**The values refer to mean ± sd (min, max)
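The "mean ± sd (min, max)" convention used in these tables can be reproduced with a small helper. fmt_summary() is a hypothetical name introduced for illustration, not a growthTrendR function, and the number of digits is an arbitrary choice.

```r
# Format a numeric vector as "mean ± sd (min, max)", ignoring NA values.
fmt_summary <- function(x, digits = 2) {
  x <- x[!is.na(x)]
  f <- paste0("%.", digits, "f")
  sprintf(paste0(f, " \u00b1 ", f, " (", f, ", ", f, ")"),
          mean(x), sd(x), min(x), max(x))
}

# Example: lengths (years) of three hypothetical series.
fmt_summary(c(30, 45, 60), digits = 0)
```

For these three series lengths, the call returns the string "45 ± 15 (30, 60)".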
This section presents site-level data summaries, including a figure showing ring width measurements over time and a table with key statistics.
Site-level summary table (PSEUMEN)
*Ratio between the median ring width (rw) of the site and the median ring width of its 2 nearest neighbors.
**The values refer to mean ± sd (min, max)
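The neighbor ratio in the first footnote can be sketched as follows. This is an illustration under several assumptions, not the growthTrendR implementation: distances are great-circle (haversine) distances, only sites within a maximum distance count as neighbors, the ring widths of the N nearest neighbors are pooled before taking the median, and a site with no neighbor in range gets NA. We also assume, without having checked the package source, that scale.N_nbs and scale.max_dist_km in the generate_report() call play the roles of n_nbs and max_dist_km here. haversine_km() and site_nb_ratio() are hypothetical helpers.

```r
# Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Ratio of each site's median ring width to the median of the ring widths pooled
# from its n_nbs nearest sites within max_dist_km (pooling is an assumption).
site_nb_ratio <- function(sites, rw, n_nbs = 2, max_dist_km = 200) {
  vapply(seq_len(nrow(sites)), function(i) {
    d <- haversine_km(sites$lat[i], sites$lon[i], sites$lat, sites$lon)
    d[i] <- Inf                                 # a site is not its own neighbor
    cand <- which(d <= max_dist_km)
    if (length(cand) == 0) return(NA_real_)     # no neighbor in range
    nbs <- cand[order(d[cand])][seq_len(min(n_nbs, length(cand)))]
    own <- median(rw$rw[rw$site_id == sites$site_id[i]], na.rm = TRUE)
    ref <- median(rw$rw[rw$site_id %in% sites$site_id[nbs]], na.rm = TRUE)
    own / ref
  }, numeric(1))
}

# Hypothetical sites (decimal degrees) and ring widths (mm).
sites <- data.frame(site_id = c("A", "B", "C", "D"),
                    lat = c(49.0, 49.1, 49.2, 60.0),
                    lon = c(-123, -123, -123, -100),
                    stringsAsFactors = FALSE)
rw <- data.frame(site_id = c("A", "A", "A", "B", "B", "C", "C", "D"),
                 rw = c(1, 1, 1, 2, 2, 2, 2, 5),
                 stringsAsFactors = FALSE)
site_nb_ratio(sites, rw)
```

With these toy data, site A (median 1 mm, neighbors B and C at 2 mm) gets a ratio of 0.5, while site D, which has no other site within 200 km, gets NA.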
This table summarizes each series, including raw ring-width measurements, autocorrelation, correlation with the chronologies for both raw and differentiated data, and the quality assessment code (qa_code) derived from the CFS_qa() function. The chronologies include all series with qa_code 'pass'.
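To show what these series-level statistics measure, they can be approximated in base R. In this sketch, "differentiated" is taken to mean first differences, and each series is correlated with a leave-one-out mean chronology built from the other series. Detrending and the qa_code 'pass' filter that the report applies are both skipped. All data are simulated, and ar1() and cor_with_chron() are hypothetical helpers.

```r
set.seed(42)
n_years <- 40
# Simulated ring widths (mm): a shared signal plus series-specific noise; rows = years.
signal <- rnorm(n_years, mean = 1.5, sd = 0.3)
rw_mat <- sapply(1:5, function(j) signal + rnorm(n_years, sd = 0.15))

# Lag-1 autocorrelation of a single series.
ar1 <- function(x) cor(x[-1], x[-length(x)], use = "complete.obs")

# Correlation of each series with the mean of all other series (leave-one-out chronology).
cor_with_chron <- function(m) {
  vapply(seq_len(ncol(m)), function(j) {
    chron <- rowMeans(m[, -j, drop = FALSE], na.rm = TRUE)
    cor(m[, j], chron, use = "complete.obs")
  }, numeric(1))
}

rw_diff <- diff(rw_mat)   # first differences, computed column by column

series_summary <- data.frame(
  series = seq_len(ncol(rw_mat)),
  ar1    = round(apply(rw_mat, 2, ar1), 2),
  r_raw  = round(cor_with_chron(rw_mat), 2),
  r_diff = round(cor_with_chron(rw_diff), 2)
)
series_summary
```

P-values like those shown in the report could be obtained with cor.test(), whose result contains both the estimate and p.value.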
Description of qa_code:

| qa_code | Description |
|---|---|
| pass | The maximum correlation occurs at lag 0. |
| borderline | The correlation at lag 0 is the second highest, and its difference from the maximum is within a predefined threshold, so the series is treated as a quasi-pass. |
| pm1 | The maximum correlation occurs at lag 1 or -1, suggesting a slight misalignment. |
| highpeak | The maximum correlation occurs at a non-zero lag and is more than twice the second-highest value, potentially signaling an issue. |
| fail | Any series that does not fit one of the categories above. |
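The decision rules in this table can be written out as a small classifier. This is a sketch, not the CFS_qa() implementation: the range of lags (±5), the borderline threshold (0.05), the order in which the rules are checked, and the lag sign convention are all assumptions chosen for illustration, and lag_cor() and classify_qa() are hypothetical names.

```r
# Correlation between series x and reference y at integer lag k: pairs x[t + k]
# with y[t] (assumed sign convention). x and y are assumed to cover the same years.
lag_cor <- function(x, y, k) {
  n <- length(x)
  if (k >= 0) {
    xi <- x[(1 + k):n]
    yi <- y[1:(n - k)]
  } else {
    xi <- x[1:(n + k)]
    yi <- y[(1 - k):n]
  }
  ok <- !is.na(xi) & !is.na(yi)
  cor(xi[ok], yi[ok])
}

# Assign a qa_code following the rules in the table, checked in table order.
classify_qa <- function(x, y, max_lag = 5, threshold = 0.05) {
  lags <- -max_lag:max_lag
  r <- vapply(lags, function(k) lag_cor(x, y, k), numeric(1))
  ord <- order(r, decreasing = TRUE)
  r_max <- r[ord[1]]
  r_2nd <- r[ord[2]]
  r_0   <- r[lags == 0]
  best  <- lags[ord[1]]
  if (best == 0) return("pass")
  if (lags[ord[2]] == 0 && r_max - r_0 <= threshold) return("borderline")
  if (abs(best) == 1) return("pm1")
  if (r_max > 2 * r_2nd) return("highpeak")
  "fail"
}

# Hypothetical example: a reference series and a copy dated one year too late.
set.seed(7)
ref <- rnorm(100)
late <- c(rnorm(1), ref[1:99])
classify_qa(ref, ref)
#> [1] "pass"
classify_qa(late, ref)
#> [1] "pm1"
```

Because the rules can overlap (for instance, a series may peak at lag 1 while lag 0 is a close second), the order of the checks matters. Here the table order is used as the priority.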
Series-level summary table (PSEUMEN)
*Developed from raw series.
**Developed from differentiated series.
&Correlation with the chronologies; values are shown as correlation (p-value).
%qa_code is identified using the current data as the reference dataset.