1. Data processing

# Example: load a dataset included in the package
library(data.table)   # provides fread()
library(growthTrendR)

# ring-width measurements in wide format
dt.samples <- fread(system.file("extdata", "dt.samples.csv", package = "growthTrendR"))

# format the user's data to conform to the CFS-TRenD data structure
dt.samples_trt <- CFS_format(data = list(dt.samples, 39:68), usage = 1, out.csv = NULL)
class(dt.samples_trt)
#> [1] "cfs_format"

# save it to extdata for further use
# saveRDS(dt.samples_trt, "inst/extdata/dt.samples_trt.rds")


Arguments of the CFS_format() function:

data

All information should be provided in a single file in wide format, with metadata first, followed by the ring-width measurements (in mm).

The column names for the ring-width measurements can indicate the year of measurement in one of two formats: directly as the year (e.g., 1980), or prefixed with a character (e.g., X1980).

It is highly recommended that the ring-width measurement columns are ordered by year and consecutive in the dataset, as the column indices will be used as input for the function CFS_format().

In data = list(dt.samples, 39:68), the first item is the data table and the second item gives the column indices of the ring-width measurements.
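
The column indices can also be derived from the column names rather than counted by hand. The following is a minimal sketch in base R, assuming a hypothetical wide table `toy` (not the package dataset) whose year columns follow one of the two naming formats described above:

```r
# Hypothetical wide table: two metadata columns, then ring widths (mm) by year
toy <- data.frame(
  site_id = c("S1", "S2"),
  species = c("PSEUMEN", "PSEUMEN"),
  X1980 = c(1.2, 0.9),
  X1981 = c(1.1, 1.0),
  X1982 = c(1.3, 0.8)
)

# Match names that are a 4-digit year, optionally prefixed by one letter
rw_cols <- grep("^[A-Za-z]?[0-9]{4}$", names(toy))

# Recover the years and check that the columns are adjacent and ordered by year
years <- as.integer(sub("^[A-Za-z]", "", names(toy)[rw_cols]))
stopifnot(all(diff(rw_cols) == 1), all(diff(years) == 1))

rw_cols  # indices to pass as the second item of `data`
```

In this sketch, `rw_cols` would play the role of `39:68` in the call above.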

usage

If users intend to submit their data to the CFS-TRenD online repository, set usage = 1 in the function. This will enable the function to format the data structure and perform detailed checks, including column names, geographic coordinates, species, and other requirements to conform to the CFS-TRenD collection standards.

Otherwise, use usage = 2 to perform a reduced checking procedure, which still builds the CFS-TRenD structure but skips some of the detailed validations.

out.csv

If the user wants to export the processed tables in CSV format, specify the output folder here. The default is NULL.

Note: Running the function CFS_format() is the first and mandatory step before using any other functions in the growthTrendR package. The data provided in this tutorial is already prepared to run the vignette; in practice, users may need to add or modify their own data based on the messages generated by the function.
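
Because downstream functions expect an object of class "cfs_format", a script can check for that class before going further. The sketch below uses a mock object and a hypothetical helper `check_cfs_format()`; neither is part of growthTrendR, and the package may perform its own checks internally.

```r
# Mock object standing in for the output of CFS_format(); a real one carries data
mock_trt <- structure(list(), class = "cfs_format")

# Hypothetical helper: stop early if the input was not produced by CFS_format()
check_cfs_format <- function(x) {
  if (!inherits(x, "cfs_format")) {
    stop("Input must be the output of CFS_format().", call. = FALSE)
  }
  invisible(TRUE)
}

check_cfs_format(mock_trt)  # passes silently

# A plain data frame is rejected
res <- try(check_cfs_format(data.frame()), silent = TRUE)
inherits(res, "try-error")
```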



2. Generate data report

The data report provides an overview of the quality and characteristics of the tree-ring data at four levels: project, project-species, project-species-site, and project-species-radii. It includes the quality assessment at the site and radii levels, run with the default parameters. More details on quality assessment are presented in the next section.


generate_report(robj = dt.samples_trt, qa.label_data = "demo-samples", data_report.reports_sel = c(1, 2, 3, 4), qa.min_nseries = 5, scale.max_dist_km = 200, scale.N_nbs = 2)

Arguments of the generate_report() function:

robj

The input for the data report is the output of the CFS_format() function, which assigns the class “cfs_format” to the resulting object.

qa.label_data

A short description of the input dataset. This text will appear in the report as the data source for the generated figures.

data_report.reports_sel

This argument specifies which levels of data summary to include in the report. Valid options are 1, 2, 3, and 4, each corresponding to one of the four available report types, and several can be selected at once. In this tutorial, all four are selected with c(1, 2, 3, 4).

output_file

This argument allows users to export the HTML-formatted report to a specified location by providing a folder and filename (e.g., “path/to/report.html”). If left as NULL (default), the report will not be saved to disk and will instead open directly in the browser for viewing.


Data summary report



This report provides an overview of the tree ring data’s quality and characteristics at four levels:

1. project: data completeness, an assessment of missing or incomplete data across the whole dataset;
2. project-species: data summary, with summary statistics and descriptions;
3. project-species-site: data summary tables and series graphing;
4. project-species-site-radii: correlation analysis and quality assessment.


project name: Douglas-fir retrospective monitoring

selected reports: 1, 2, 3, 4


data completeness


This table presents the completeness of each variable of the whole dataset as a percentage. A value of 0 indicates no effective data. Please carefully verify that all required data has been included in the submission.

var                           pct
tr1_submission_id             100
tr1_project_name              100
tr1_description               100
tr1_year_range                100
tr1_reference                 100
tr1_open_data                 100
tr1_contact1                  100
tr1_contact2                  100
tr2_site_id                   100
tr2_latitude                  100
tr2_longitude                 100
tr2_datasource                100
tr2_investigators             100
tr2_province_iso_code         100
tr3_tree_id                   100
tr3_species                   100
tr4_meas_no                   0
tr4_meas_date                 100
tr4_status                    100
tr4_dbh_cm                    100
tr4_ht_tot_m                  100
tr5_sample_id                 100
tr5_sample_type               100
tr5_sample_ht_m               0
tr5_sample_diameter_cm        0
tr6_radius_id                 100.0
tr6_cofecha_id                0.0
tr6_ring_meas_method          100.0
tr6_crossdating_visual        100.0
tr6_crossdating_validation    100.0
tr6_age_corrected             0.0
tr6_bark_thickness_mm         0.0
tr6_radius_inside_cm          100.0
tr6_dtc_measured_mm           100.0
tr6_dtc_estimated_mm          22.2
tr6_rw_ystart                 100.0
tr6_rw_yend                   100.0
tr6_comments                  0.0
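
A completeness table of this kind can be approximated in base R. The sketch below assumes a hypothetical metadata table `meta` and treats only NA as missing; how the package itself defines "no effective data" is not stated here, so this is an illustration rather than the report's actual computation.

```r
# Hypothetical metadata table with some missing values (not the package data)
meta <- data.frame(
  tr2_site_id = c("X003b", "X005c", "X011c", "X011c"),
  tr4_meas_no = c(NA, NA, NA, NA),
  tr4_dbh_cm  = c(31.2, 28.4, NA, 35.0)
)

# Percentage of non-missing values per column, rounded to one decimal
pct <- round(100 * colMeans(!is.na(meta)), 1)
completeness <- data.frame(var = names(pct), pct = unname(pct))

completeness
```

A column with pct equal to 0, such as tr4_meas_no in this toy table, is the case the report asks users to verify before submission.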

Data Summary (Species)


This section presents key summary statistics, including spatial and temporal ranges, summary of ring width measurements, series length, etc., categorized by species.


This dataset contains 1 species: PSEUMEN


*Number of series that passed the test of CFS_qa() on differentiated series

**The values refer to mean ± sd (min, max)


site-level data summary


This section presents site-level data summaries, including a figure showing ring width measurements over time and a table with key statistics.


PSEUMEN on ring width measurement


This table provides a site-level summary, including the series length, raw ring width measurements, the median ring width, the median ring width of its 2 closest neighbors, and the ratio between them. This offers insights into potential outliers caused by scaling issues.

PSEUMEN
site_id  lon        lat       Nb. trees  len.series**       rw(mm)**                rw.median_tgt  rw.median_nbs  ratio_median*
X003b    -123.9337  48.96656  3          30 ± 0 ( 30, 30 )  1.5 ± 0.2 ( 0.6, 3.7 )  1.38           2.52           0.55
X005c    -124.7485  49.42990  3          30 ± 0 ( 30, 30 )  4 ± 0.8 ( 1.4, 9 )      3.84           1.80           2.13
X011c    -125.5210  50.04691  3          30 ± 0 ( 30, 30 )  2 ± 0.3 ( 0.9, 3.5 )    1.97           2.15           0.92



*ratio between median of rw of the site and median of rw of its 2 nearest neighbors.

**The values refer to mean ± sd (min, max)
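
The ratio column can be reproduced from the two median columns of the table above. The sketch below copies those values into a small data frame and recomputes the ratio; the screening rule at the end is a hypothetical example of how the ratio might be used, not a threshold defined by the package.

```r
# Values copied from the site-level table above
site <- data.frame(
  site_id       = c("X003b", "X005c", "X011c"),
  rw.median_tgt = c(1.38, 3.84, 1.97),
  rw.median_nbs = c(2.52, 1.80, 2.15)
)

# Ratio of the site median to the median of its nearest neighbors
site$ratio_median <- round(site$rw.median_tgt / site$rw.median_nbs, 2)

# Hypothetical screening rule (an assumption, not a package default):
# a ratio near 10 or 0.1 would hint at a unit or scaling mismatch
site$scale_flag <- site$ratio_median > 10 | site$ratio_median < 0.1

site
```

The recomputed ratios match the reported values (0.55, 2.13, 0.92), and none of the three sites is flagged under this illustrative rule.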




Correlation and quality assessment code


This table provides a summary for each series, including raw ring width measurements, autocorrelation, correlation with chronologies for both raw and differentiated data, and the quality assessment code (qa_code) derived from the CFS_qa() function. The chronologies include all the series with qa_code ‘pass’.


Description of qa_code
qa_code     Description
pass        The maximum correlation occurs at lag 0.
borderline  The correlation at lag 0 ranks as the second highest, and its difference from the maximum remains within a predefined threshold, which categorizes the series as a quasi-pass.
pm1         The maximum correlation occurs at lag 1 or -1, suggesting slight misalignment.
highpeak    The maximum correlation occurs at a non-zero lag and is more than twice the second-highest value, potentially signaling an issue.
fail        All other series that do not fit into the categories above.
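
The rules in this table can be read as a small decision procedure. The sketch below is one possible reading, not the implementation of CFS_qa(): the function name `classify_qa()`, the threshold `borderline_tol = 0.05`, and the order in which the rules are tested are all assumptions, and the highpeak test assumes positive correlations.

```r
# Hypothetical classifier following the qa_code descriptions above.
# cors: correlations with the chronology at each lag; lags: the matching lags
classify_qa <- function(cors, lags, borderline_tol = 0.05) {
  ord <- order(cors, decreasing = TRUE)
  lag_max <- lags[ord[1]]    # lag of the highest correlation
  max1 <- cors[ord[1]]       # highest correlation
  max2 <- cors[ord[2]]       # second-highest correlation

  if (lag_max == 0) return("pass")
  # lag 0 is second best and close enough to the maximum (assumed threshold)
  if (lags[ord[2]] == 0 && (max1 - max2) <= borderline_tol) return("borderline")
  if (abs(lag_max) == 1) return("pm1")
  if (max1 > 2 * max2) return("highpeak")
  "fail"
}

lags <- -2:2
classify_qa(c(0.10, 0.20, 0.60, 0.30, 0.10), lags)  # maximum at lag 0
classify_qa(c(0.10, 0.20, 0.50, 0.52, 0.10), lags)  # lag 0 a close second
classify_qa(c(0.10, 0.20, 0.30, 0.60, 0.10), lags)  # maximum at lag 1
classify_qa(c(0.70, 0.20, 0.30, 0.10, 0.10), lags)  # isolated peak at lag -2
classify_qa(c(0.40, 0.20, 0.30, 0.10, 0.35), lags)  # none of the above
```

Testing borderline before pm1 follows the order of the table; if the package gives pm1 precedence, the second example would be classified differently.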


PSEUMEN
site_id  radius_id     year from  year to  len series  raw AR1*  raw corr_mean*&  trt AR1**  trt corr_mean**&  qa_code**%
X003b    X003_101_004  1991       2020     30          0.74      0.83 ( 0 )       -0.28      0.6 ( 0 )         pass
X003b    X003_101_005  1991       2020     30          0.61      0.7 ( 0 )        -0.41      0.56 ( 0 )        pass
X003b    X003_101_008  1991       2020     30          0.83      0.85 ( 0 )       -0.40      0.67 ( 0 )        pass
X005c    X005_101_003  1991       2020     30          0.88      0.87 ( 0 )       0.07       0.72 ( 0 )        pass
X005c    X005_101_004  1991       2020     30          0.78      0.91 ( 0 )       -0.02      0.72 ( 0 )        pass
X005c    X005_101_005  1991       2020     30          0.76      0.93 ( 0 )       -0.38      0.74 ( 0 )        pass
X011c    X011_101_005  1991       2020     30          0.64      0.65 ( 0 )       -0.16      0.55 ( 0 )        pass
X011c    X011_101_007  1991       2020     30          0.66      0.75 ( 0 )       -0.17      0.5 ( 0.01 )      pass
X011c    X011_101_008  1991       2020     30          0.61      0.47 ( 0.01 )    0.00       0.51 ( 0 )        pass



*developed from raw series

**developed from differentiated series

&correlation with chronologies, the value represents correlation (p-value)

%qa_code is identified using the current data as the reference dataset
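
To make the AR1 and chronology-correlation columns of the table above more concrete, the sketch below computes comparable quantities on simulated series in base R. It rests on several assumptions that are not confirmed by this tutorial: that "differentiated" means first-differenced, that the chronology is the mean of the other series, and that the p-value comes from a Pearson correlation test. The data are simulated, not the package dataset.

```r
set.seed(1)

# Three simulated 30-year ring-width series sharing a common signal
common <- cumsum(rnorm(30, sd = 0.1))
rw <- sapply(1:3, function(i) 2 + common + rnorm(30, sd = 0.05))
colnames(rw) <- paste0("series", 1:3)

# Lag-1 autocorrelation of a series
ar1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

target <- rw[, 1]
chron  <- rowMeans(rw[, -1])   # assumed: mean of the other series

# Raw series: AR1 and correlation with the chronology
raw_ar1 <- ar1(target)
raw_cor <- cor.test(target, chron)

# First-differenced series (assumed meaning of "differentiated")
d_target <- diff(target)
d_chron  <- diff(chron)
trt_ar1 <- ar1(d_target)
trt_cor <- cor.test(d_target, d_chron)

round(c(raw_AR1 = raw_ar1, raw_corr = unname(raw_cor$estimate),
        trt_AR1 = trt_ar1, trt_corr = unname(trt_cor$estimate)), 2)
```

Differencing removes most of the slow trend, which is why the AR1 values of the differentiated series in the table are much lower than those of the raw series.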