Evaluation of the palliative symptom burden score (PSBS) in a specialised palliative care unit of a university medical centre - a longitudinal study

Background
The implementation of standardised, valid and reliable measurements in palliative care is subject to practical and methodological challenges. One aspect of ongoing discussion is the value of systematic proxy-based assessment of symptom burden in palliative care. In 2011, an expert-developed proxy-based instrument for the assessment of symptom burden in palliative patients, the Palliative Symptom Burden Score (PSBS), was implemented at the Specialised Palliative Care Unit of the University Medical Centre in Dusseldorf, Germany. The present study investigated its feasibility, acceptance and psychometric properties.
Methods
The PSBS was rated by nursing staff three times a day over 5 years (N = 820 patients). Feasibility and nurses' acceptance of the PSBS were analysed. Structural validity was investigated by principal component analysis. Construct validity was examined via cross-validation with the Hospice and Palliative Care Evaluation checklist. Discriminative validity was analysed by comparing PSBS scores across patients' performance status groups with the Kruskal-Wallis test. Reliability was evaluated by internal consistency, test-retest and split-half reliability. Inter-rater reliability was investigated by the agreement of nurses' ratings of symptom burden within a day. Sensitivity to change was analysed by comparing PSBS scores before and after palliative complex treatment with the Wilcoxon signed-rank test.
Results
A high degree of acceptance and the feasibility of a high-frequency proxy-based symptom burden assessment approach were demonstrated. There were low rates of missing values and no indications of the adoption of prior ratings. The PSBS in its present form demonstrates good structural and construct validity (rs = .27–.79, p's < .001) and high sensitivity to changes in symptom burden (p's < .01, except sweating), but unsatisfactory reliability (α = .41–.67; test-retest: rs = .30–.88, p's < .001; split-half: rs = .69, p < .001; inter-rater: n.s.).
Conclusions
The study presents a framework for the post hoc validation of an existing documentation tool in palliative care. It supports the notion that the PSBS may not reflect a single overall construct and will therefore require further development and critical comparison with established symptom burden instruments in palliative care.


Background
Palliative care deals, by definition, with human beings in a very complex and difficult situation. This situation includes, for instance, multi-morbidity at a very late stage of treatment, with a multitude of different pharmacological treatments resulting in both physical and mental suffering. Consequently, patients in a palliative care setting exhibit a broad range of physical and psychological symptoms. The documentation of patients' symptom development at short intervals by means of standardised documentation systems can improve patient care, clinical decision-making, quality assurance and evaluation of treatment delivery.
Due to the identified need for standardised documentation instruments, recent years have seen growing interest in the measurement of symptom development in palliative care patients. Several national and international collaborations, such as the European PRISMA group [1], were founded to foster and harmonise research on this topic. A recently published consensus paper by the European Association for Palliative Care Task Force on outcome measurement highlights the critical importance of implementing standardised, psychometrically evaluated outcome measures in the daily clinical routine of specialised palliative care units (SPCUs) [2].
Nevertheless, there are several practical and methodological challenges to consider when examining methods of documenting symptoms [3]. The first challenge is determining an adequate outcome measure for the end-of-life setting and an appropriate frequency of measurement. Numerous existing outcome measurement systems were not developed for palliative care populations, and most of them have not yet been validated [4]. Given the high temporal variability in patients' symptom presentation, there is no consensus on the appropriate frequency of symptom measurement [4,5]. Even though self-report is considered the gold standard for obtaining information on patients' symptom burden [4], many palliative care patients are unable to complete questionnaires or answer questions because of fatigue, decreased alertness or delirium [6]. For other patients, a high-frequency self-assessment approach might constitute an undue burden. Additionally, being confronted with a terminal disease and the prospect of one's own death causes severe distress and a broad range of affective reactions, which in turn trigger various coping and self-defence mechanisms [7,8]. These mechanisms may also bias self-reports, for example through under-reporting of symptoms. Consequently, especially in the end-of-life setting, proxy-based symptom documentation appears to be a promising additional source of information to complement self-report instruments. Because high-frequency proxy-based symptom measurement always entails an increased workload for hospital staff, its successful implementation depends to a great degree on its practical feasibility in daily clinical routine and its acceptance by nurses and physicians [9].
Currently, only a few validated proxy-based assessment tools for symptom burden in palliative care patients are available in German: the Basic Documentation for Psycho-Oncology [10], the Hospice and Palliative Care Evaluation checklist (HOPE) [11] and the Edmonton Symptom Assessment System (ESAS) [12].
The Basic Documentation for Psycho-Oncology focuses on the psychosocial burden of cancer patients. HOPE measures palliative patients' symptom burden over the previous three days. ESAS was originally developed as a self-assessment tool for symptom burden in cancer patients but is also used as a proxy-based assessment tool in some cases.
Nevertheless, a large number of SPCUs and hospices still use non-validated, unpublished, self-administered and non-expert-developed documentation tools to assess patients' symptom burden. To acquire additional knowledge regarding outcome measures in palliative care, it would be advisable to evaluate these instruments in terms of their psychometric properties and to share experiences of implementing such approaches with the scientific community as well as with palliative care practitioners.
One outcome instrument used within the interdisciplinary palliative care centre at the University Hospital Dusseldorf (see Methods for details) is the Palliative Symptom Burden Score (PSBS) [13], which measures physical and psychosomatic symptom burden. The PSBS items are alertness, confusion, restlessness/anxiety, sweating, itching, weakness, nausea, vomiting, dyspnoea, coughing, pain and constipation. Symptom intensity is rated on verbal rating scales three times a day by nursing staff. The data collected with the PSBS were first reported in a study on high-dependency palliative care patients dying in a tertiary hospital inpatient unit [13]. To date, no studies have examined the psychometric properties of the PSBS. The tool was developed heuristically by clinical experts in palliative care: the most frequent symptoms in palliative care patients according to the experts' clinical impressions were included as items (see Table 1), and there has been no further psychometric validation.
In 2011, the SPCU of the University Medical Centre in Dusseldorf, Germany implemented a proxy-based measurement instrument embedded in the electronic patient record (EPR) for high-frequency assessments of symptom burden in palliative patients [14]. Physical and psychological symptom burden is now assessed by means of the PSBS [8]. To our knowledge, no studies have yet been conducted implementing a longitudinal proxy-rated, high-frequency assessment system into daily clinical care. Considering the importance of such an approach for the quality assurance of treatment delivery and clinical evaluation of therapy outcomes, the present paper intends to share empirical knowledge concerning the feasibility and acceptance of longitudinal high-frequency proxy-based assessments of symptom burden in palliative patients. Given the demand for reliable and valid instruments [2], this study reports data concerning the psychometric properties of the PSBS and presents a framework for the validation of expert-created tools in palliative care. Based on the experience gained during the implementation process, the paper also presents practical and useful recommendations for the development, implementation and evaluation of proxy-based assessments in SPCUs.

Study design
This study was an observational cohort study with a retrospective analysis of longitudinal data on symptom burden assessment in an inpatient palliative care setting. The study reporting follows the STROBE [15] guidelines for reporting observational cohort studies. This study was approved by the Ethics Committee of the Medical Faculty of Heinrich Heine University Dusseldorf, Germany (protocol number 5287, approved 09 November 2015).

Setting
The Interdisciplinary Centre for Palliative Medicine is an SPCU at a university hospital in an urban area of Germany. It offers inpatient palliative care on an 8-bed ward. In addition, there is a liaison service for inpatients at other hospitals. A detailed overview of the setting (the SPCU Dusseldorf) is presented in Table 1. The data for this study were collected only from inpatients of the SPCU (not from inpatients of other hospitals).

Implementation
High-frequency proxy-based symptom assessment with the PSBS was implemented in August 2011. To train nurses in the use of the new documentation system and to foster their acceptance of the new approach, a training course was offered. Subsequently, a pilot phase was conducted in which the nurses were asked to evaluate the documentation system and share their experiences with it in daily clinical routine. Based on the nurses' feedback, the user interface was adjusted to improve its ease of use.

Dependent variables
The palliative symptom burden score
The PSBS was developed as a high-frequency documentation tool for medical professionals to measure the symptom burden of palliative care patients. Symptom burden is rated three times daily, each rating covering the preceding 8 h. An overview of its items and their assessment is given in Table 2. The symptom burden indicators were originally developed by an expert panel of two palliative care physicians and one senior palliative care nurse in a heuristic process that included a narrative literature search and iterative discussion. The instrument was originally developed at a specialised palliative care centre in Berlin during the pioneer phase of palliative medicine in Germany in the 1990s and did not follow traditional tool development guidelines. The final item set was based on expert opinion and was not initially tested in a pilot phase. Patients and carers were not involved in the development phase. The items of the PSBS represent the most common symptoms of patients in SPCUs as defined by the original expert panel: alertness, confusion, restlessness/anxiety, sweating, itching, weakness, nausea, vomiting, dyspnoea, coughing, pain and constipation. Each symptom is measured with one item. Symptom intensity is rated on a five-point verbal rating scale ranging from 0 (no symptom burden) to 4 (strong symptom burden). Pain was rated on a 10-point verbal rating scale ranging from 0 to 10; for comparability with the other items and for the statistical analyses, it was converted to the five-point scale after data collection. Constipation was originally recorded dichotomously (yes/no); because it could not be converted to the five-point scale, this item was excluded from the PSBS. In total, the PSBS therefore consists of 11 items. Symptom assessment and operationalisation of the items are described in Table 2.
Overall symptom burden is the sum of the single item scores (minimum 0, maximum 44). Regarding the component structure, the authors proposed that the items alertness, confusion, restlessness/anxiety and weakness may form a psychosomatic symptom complex subscale. In addition, nausea and vomiting were allocated to a gastrointestinal subscale and dyspnoea and cough to a respiratory subscale. Pain, sweating and itching were not grouped with other items. The authors therefore expected six components of symptom burden.
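The scoring rule above can be expressed as a short function. This is a minimal sketch, assuming each patient's ratings are stored as a dictionary keyed by item name with values already on the 0–4 scale (pain converted, constipation excluded); the key names such as `restlessness_anxiety` and the function name `score_psbs` are hypothetical, not taken from the EPR implementation.

```python
# Hypothetical item keys; ratings are assumed to be on the 0-4 scale
# (pain already converted, constipation excluded).
PSBS_ITEMS = [
    "alertness", "confusion", "restlessness_anxiety", "weakness",
    "nausea", "vomiting", "dyspnoea", "cough",
    "pain", "sweating", "itching",
]

# A priori grouping proposed by the authors: six components.
PROPOSED_SUBSCALES = {
    "psychosomatic": ["alertness", "confusion", "restlessness_anxiety", "weakness"],
    "gastrointestinal": ["nausea", "vomiting"],
    "respiratory": ["dyspnoea", "cough"],
    "pain": ["pain"],
    "sweating": ["sweating"],
    "itching": ["itching"],
}


def score_psbs(ratings):
    """Return the six proposed subscale scores and the sum score (0-44)."""
    missing = [item for item in PSBS_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in PSBS_ITEMS:
        if not 0 <= ratings[item] <= 4:
            raise ValueError(f"{item} outside 0-4: {ratings[item]}")
    scores = {name: sum(ratings[i] for i in items)
              for name, items in PROPOSED_SUBSCALES.items()}
    scores["sum"] = sum(ratings[i] for i in PSBS_ITEMS)
    return scores


example = {**dict.fromkeys(PSBS_ITEMS, 0), "weakness": 3, "pain": 2}
print(score_psbs(example)["sum"])  # 5
```

Rejecting incomplete or out-of-range records rather than silently summing them keeps the 0–44 range of the sum score meaningful.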
HOPE
The symptom and problem checklist HOPE [11,16] consists of 16 items documenting symptom burden over the previous 3 days. Eight items (pain, nausea, vomiting, dyspnoea, constipation, weakness, loss of appetite, tiredness) measure physical symptoms, four items (feeling depressed, anxiety, tension, disorientation/confusion) measure psychological issues, two items (wound care, activities of daily living) measure nursing issues, and two items (organisation of care, overburdening of the family) measure social issues [16]. Additionally, one free-text entry is provided for further issues, e.g., symptoms that are not assessed by the instrument. Symptom intensity is rated on a 4-point verbal rating scale (0 = no, 1 = mild, 2 = moderate, 3 = severe). Sum scores are calculated for the four subscales as well as a global sum score ranging from 0 to 51 points.
HOPE's item structure is similar to that of the PSBS: anxiety, confusion, weakness, nausea, vomiting, dyspnoea and pain are measured by both instruments. HOPE's item tension can be compared with the PSBS item restlessness. In both instruments, symptom burden is rated on verbal rating scales (HOPE: 4-point Likert scale; PSBS: 5-point Likert scale). Because of these similarities, HOPE was chosen for cross-validation and the investigation of the construct validity of the PSBS. Nevertheless, the instruments are not completely interchangeable. While PSBS covers alertness as an important cognitive parameter within
The ECOG scale of performance status
The Eastern Cooperative Oncology Group (ECOG) scale of performance status [17] is a widely used prognostic tool to quantify functional status in cancer patients. In palliative care, it has also been used to report the functional status of non-cancer patients with life-limiting illness [18]. The ECOG describes patients' functional status in terms of ambulatory status and need for care. The scale categorises functional status into five classes (0–4). A score of 0 indicates normal activity; a score of 1 indicates that the patient is able to walk and that light activity is possible. A score of 2 means that the patient is bedridden less than 50% of the time, with self-care being possible; a score of 3 means that the patient is bedridden more than 50% of the time with limited self-care capability, while a score of 4 indicates that the patient is completely bedridden and in need of care [19].

Independent variables
Palliative complex treatment
Between day one and day seven of treatment at the SPCU, patients received a specialised palliative complex treatment that included a set of interventions performed by palliative care professionals focusing on patient stabilisation and the reduction of symptom burden.

Data collection
Data collection was performed between August 2011 and August 2015. Symptom burden assessment with the PSBS was conducted three times a day by trained palliative care nurses of the SPCU. The results were documented digitally via a standardised documentation interface. An assessment took 2–3 min. HOPE and ECOG were assessed on admission and at discharge. For deceased patients, PSBS, HOPE and ECOG assessments were performed post-mortem by nurses within a day after death. Of the patients, 476 (58.0%) died at the SPCU, 298 (36.3%) were discharged, 27 (3.3%) were transferred to another ward within the university hospital and 9 (1.1%) were transferred to another institution (e.g., a hospice). Patients' palliative stage was reported for day one of admission for those patients in whom the initial assessment of performance status and clinical survival prediction was deemed reliable. The majority of patients needed a longer period of assessment and were discussed during our weekly multidisciplinary team meetings to improve prognostic accuracy, as suggested by White et al. [20].

Sample
Sample characteristics regarding age, sex, diagnosis group, palliative stage and ECOG performance status are shown in Table 3.

Statistical analyses
Patient data were extracted from the clinic's electronic medical records and anonymised before being transferred into SPSS. All statistical analyses were performed using IBM SPSS 22 for Windows (IBM Corp., Armonk, NY). The data were checked for plausibility prior to inferential analyses. Descriptive statistics are reported. For each analysis, the timepoint is reported as tD_M, where D indicates the day of data collection and M the measurement within that day (1 = morning, 2 = noon, 3 = evening). For example, t1_3 is day 1 (admission), measurement 3 (evening), and t7_1 is day 7 (1 week after admission), measurement 1 (morning). Measurement 3 (evening) was used for the analyses wherever possible because of its low rate of missing values. For the same reason, data from day 7 instead of day 1 were used for comparisons within a day.
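Because the timepoint labels follow a fixed pattern, they can be parsed mechanically when preparing the exported data for analysis. This sketch assumes labels are plain strings such as `t7_1`; the function name `parse_timepoint` and the `DAYTIME` mapping are illustrative, not part of the original data export.

```python
import re

# Measurement number within a day -> daytime, as defined in the text
DAYTIME = {1: "morning", 2: "noon", 3: "evening"}
_LABEL = re.compile(r"t(\d+)_([123])")


def parse_timepoint(label):
    """Split a label such as 't7_1' into (day, daytime)."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a timepoint label: {label!r}")
    return int(match.group(1)), DAYTIME[int(match.group(2))]


print(parse_timepoint("t1_3"))  # (1, 'evening')
```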

Feasibility and acceptance of the PSBS
To evaluate the feasibility and acceptance of the PSBS, the high-frequency documentation data were examined for rates of missing values. In addition, the data were checked for potential bias caused by the adoption of prior ratings using Kendall's coefficient of concordance W [21]. High, significant Kendall's W values were taken as an indicator of systematic adoption of prior ratings.
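Kendall's W can be computed directly from within-patient ranks. The sketch below assumes the ratings are arranged as a patients × time points array and follows the convention of SPSS's Kendall's W procedure, in which each case (patient) acts as a "judge" ranking the variables (time points); the chi-square statistic is then χ² = n(k − 1)W with k − 1 degrees of freedom, which equals the Friedman statistic. Ties within a patient receive average ranks with the usual tie correction. The function name is illustrative, and exact agreement with every SPSS version is an assumption.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def kendalls_w(ratings):
    """Kendall's W for an (n_patients, k_timepoints) array.

    Each patient ranks the k time points (average ranks for ties);
    the denominator includes the standard tie correction.
    Returns (W, chi-square, df, p).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    ranks = np.apply_along_axis(rankdata, 1, x)      # rank within each patient
    rank_sums = ranks.sum(axis=0)                     # R_j for each time point
    s = ((rank_sums - n * (k + 1) / 2) ** 2).sum()    # squared deviations from the mean rank sum
    ties = 0.0
    for row in x:
        _, counts = np.unique(row, return_counts=True)
        ties += (counts ** 3 - counts).sum()          # sum of (t^3 - t) over tie groups
    denom = n ** 2 * (k ** 3 - k) - n * ties
    if denom == 0:                                    # every patient rated all time points equally
        return float("nan"), float("nan"), k - 1, float("nan")
    w = 12 * s / denom
    chi_sq = n * (k - 1) * w                          # chi-square with k - 1 df
    return w, chi_sq, k - 1, chi2.sf(chi_sq, k - 1)


# Three patients, three within-day ratings (hypothetical values)
print(kendalls_w([[1, 2, 3], [0, 1, 4], [2, 3, 4]])[0])
```

Under this convention, W near 1 means the time points are ranked consistently across patients, which is the pattern the authors would associate with carried-over ratings.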

Structural validity
Structural validity was analysed using data from timepoint t1_3 because of its low rate of missing values. To investigate the structural validity of the PSBS, a principal component analysis (PCA) extracting six principal components was conducted. Although PCA is not a factor analysis, it is the most frequently used approach for data reduction in psychology [22]. Analyses were performed in accordance with the procedure suggested by Klopp [22].
Suitability of the data for PCA
Prior to analysis, the suitability of the data for principal component analysis was checked using the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) and Bartlett's test of sphericity. KMO values > .50 [23] and significant Bartlett's test results were taken as indicators that the data were adequate for PCA.
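Both adequacy checks can be computed from the item correlation matrix. The sketch below assumes a complete-case patients × items array; it computes the overall KMO from the anti-image (partial) correlations and Bartlett's statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom. Function names are illustrative, and values may differ from SPSS output in rounding.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(x):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)
    _, logdet = np.linalg.slogdet(r)                  # log|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df, chi2.sf(stat, df)


def kmo(x):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(np.asarray(x, dtype=float), rowvar=False)
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)                   # partial (anti-image) correlations
    off = ~np.eye(r.shape[0], dtype=bool)             # off-diagonal mask
    r2 = (r[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)


# Simulated example: four items driven by one common factor
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
items = latent + 0.5 * rng.normal(size=(300, 4))
print(kmo(items), bartlett_sphericity(items)[2])
```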

Number of components
The main goal of principal component analysis is to determine a component structure that is stable with respect to the method of component extraction and rotation and replicable under other conditions. In addition to subjective methods such as scree plot analysis [24], there are some objective procedures. One criterion for estimating the number of components to extract is the replicability of the component structure. Therefore, the dataset was split into two random halves, and a principal component analysis with the proposed number of components was performed on each half. The two resulting component loading matrices were then compared by calculating Tucker's coefficient of congruence [25,26]:
$$C_{jk} = \frac{\sum_{i} a_{ij}\, b_{ik}}{\sqrt{\sum_{i} a_{ij}^{2} \sum_{i} b_{ik}^{2}}}$$
where $a_{ij}$ is the loading of variable $i$ on component $j$ of the first component loading matrix and $b_{ik}$ is the loading of variable $i$ on component $k$ of the second component loading matrix. The resulting coefficient $C$ can take values between −1 and +1 and can be interpreted similarly to Pearson's correlation coefficient [27]. Tucker's congruence coefficients > .80 are assumed to indicate good replicability of the component structure [26].
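The split-half replication check can be sketched as follows, assuming a complete-case patients × items array. Loadings are taken from the eigen-decomposition of the correlation matrix, and `tucker_congruence` returns the full matrix of $C_{jk}$ values defined above. Because component signs and order are arbitrary, in practice one would compare matched components (typically after rotation) and read absolute values; the random split, seed and function names are illustrative assumptions.

```python
import numpy as np


def pca_loadings(x, n_components):
    """Unrotated loadings from the correlation matrix: eigenvectors * sqrt(eigenvalues)."""
    r = np.corrcoef(np.asarray(x, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def tucker_congruence(a, b):
    """Matrix of C_jk between column j of a and column k of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = a.T @ b                                     # sum_i a_ij * b_ik
    denom = np.sqrt(np.outer((a ** 2).sum(axis=0), (b ** 2).sum(axis=0)))
    return num / denom


def split_half_congruence(x, n_components, seed=0):
    """Fit the PCA on two random halves of the patients and compare the loadings."""
    x = np.asarray(x, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    half = len(x) // 2
    a = pca_loadings(x[idx[:half]], n_components)
    b = pca_loadings(x[idx[half:]], n_components)
    return tucker_congruence(a, b)
```

Values on the diagonal (after matching components) above .80 would meet the replicability criterion described in the text.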
Interpretability and rotation of the principal components
To facilitate the interpretability of the component solution, we chose the orthogonal varimax rotation. Varimax aims to achieve a simple structure, in which each component has a few variables with very high loadings while the remaining variables have very low loadings on it. To this end, the variance of the squared component loadings is maximised [28].
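Varimax can be sketched with Kaiser's original pairwise-rotation scheme: each pair of columns $j, l$ is rotated by the angle $\varphi$ that maximises the varimax criterion for that pair, where $\tan 4\varphi = (D - 2AB/p)/(C - (A^2 - B^2)/p)$ with $u = x_j^2 - x_l^2$, $v = 2x_jx_l$, $A = \sum u$, $B = \sum v$, $C = \sum (u^2 - v^2)$, $D = 2\sum uv$ over the $p$ variables. Rows are Kaiser-normalised (scaled to unit communality) before rotation and rescaled afterwards, which is the SPSS default ("varimax with Kaiser normalization"). This is a minimal sketch, not the SPSS code, and it assumes no item has zero communality.

```python
import numpy as np


def varimax(loadings, max_sweeps=100, tol=1e-8, kaiser=True):
    """Varimax rotation via Kaiser's pairwise plane rotations.

    Returns (rotated_loadings, rotation_matrix) with rotated = loadings @ rotation.
    """
    a = np.asarray(loadings, dtype=float)
    p, k = a.shape
    # Kaiser normalisation: scale each row (item) to unit communality
    h = np.sqrt((a ** 2).sum(axis=1, keepdims=True)) if kaiser else np.ones((p, 1))
    x = a / h
    rot = np.eye(k)
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for l in range(j + 1, k):
                u = x[:, j] ** 2 - x[:, l] ** 2
                v = 2 * x[:, j] * x[:, l]
                num = 2 * (u * v).sum() - 2 * u.sum() * v.sum() / p
                den = (u ** 2 - v ** 2).sum() - (u.sum() ** 2 - v.sum() ** 2) / p
                phi = np.arctan2(num, den) / 4          # optimal angle for this pair
                largest = max(largest, abs(phi))
                c, s = np.cos(phi), np.sin(phi)
                plane = np.array([[c, -s], [s, c]])
                x[:, [j, l]] = x[:, [j, l]] @ plane     # rotate the two columns
                rot[:, [j, l]] = rot[:, [j, l]] @ plane  # accumulate the rotation
        if largest < tol:                               # no pair moved noticeably
            break
    return x * h, rot
```

An orthogonal rotation leaves each item's communality unchanged; it only redistributes the loadings across components.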
Significance of component loadings
After varimax rotation, the variables used to interpret each component must be determined. In accordance with the rule proposed by Gorsuch [29], only variables with component loadings > .30 were considered to correspond to a component. We further considered the general rule of Guadagnoli and Velicer [30]: if fewer than 10 variables have a component loading > .40, the sample size must be greater than 300.

Construct validity
Because of its comparable item structure and similar outcome measure, the construct validity of the PSBS was investigated via cross-validation with the HOPE checklist. The HOPE subscales nursing problems and social problems were excluded because the PSBS has no comparable subscales. Spearman's rank correlations were calculated for the sum scores and subscales of the PSBS and HOPE. Because HOPE includes no gastrointestinal or respiratory symptom complex, no analyses were performed for these PSBS subscales; instead, further analyses were performed at the single-item level. Significant positive correlations were taken as indicators of good construct validity.
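The single-item comparison amounts to a series of Spearman correlations over matched PSBS/HOPE item pairs. The pairing below follows the item overlap described for HOPE and the PSBS (for example, HOPE tension compared with PSBS restlessness), but the key names and the exact list of pairs are illustrative assumptions; missing values are dropped pairwise.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative PSBS-HOPE item pairs (hypothetical key names)
ITEM_PAIRS = [
    ("restlessness_anxiety", "anxiety"),
    ("restlessness_anxiety", "tension"),
    ("confusion", "disorientation_confusion"),
    ("weakness", "weakness"),
    ("nausea", "nausea"),
    ("vomiting", "vomiting"),
    ("dyspnoea", "dyspnoea"),
    ("pain", "pain"),
]


def item_correlations(psbs, hope, pairs=ITEM_PAIRS):
    """Spearman's rho and p for each item pair, dropping missing values pairwise."""
    results = {}
    for psbs_item, hope_item in pairs:
        x = np.asarray(psbs[psbs_item], dtype=float)
        y = np.asarray(hope[hope_item], dtype=float)
        keep = ~(np.isnan(x) | np.isnan(y))           # complete pairs only
        rho, p = spearmanr(x[keep], y[keep])
        results[(psbs_item, hope_item)] = (float(rho), float(p))
    return results
```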

Discriminative validity
The discriminative validity of the PSBS was investigated with two nonparametric analyses of variance (Kruskal-Wallis tests [31]), with ECOG performance status as the independent (grouping) variable and the PSBS sum score at t1_3 and t7_3, respectively, as the dependent variable.
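A sketch of the discriminative-validity test using SciPy's `kruskal`, assuming one PSBS sum score and one ECOG stage per patient at a given timepoint; the helper name `kruskal_by_group` is hypothetical. With five ECOG stages (0–4) the test has 4 degrees of freedom.

```python
from collections import defaultdict

from scipy.stats import kruskal


def kruskal_by_group(scores, groups):
    """Kruskal-Wallis H test of scores across groups; returns (H, df, p)."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    samples = [by_group[g] for g in sorted(by_group)]  # one sample per ECOG stage
    h, p = kruskal(*samples)
    return h, len(samples) - 1, p
```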

Reliability
The internal consistency of the PSBS subscales was tested with Cronbach's alpha. In accordance with [32], values > .70 were taken as indicators of acceptable internal consistency. Additionally, the split-half reliability of the whole test was calculated using the odd-even method, and Spearman-Brown coefficients [33] are reported. Test-retest reliability was evaluated by Spearman's rank correlations of PSBS sum scores and subscales within a day (t7_1, morning, and t7_3, evening) and within a week (t1_3 and, after 1 week of treatment, t7_3). To assess the inter-rater reliability of the PSBS, the repeated ratings made by different nurses within a day (t7_1, t7_2, t7_3) were examined using Kendall's coefficient of concordance W [21].
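Cronbach's alpha and the odd-even split-half coefficient can be sketched as below, assuming a complete-case patients × items array with items in questionnaire order. The split-half sketch correlates the odd- and even-item half scores with Pearson's r and applies the equal-length Spearman-Brown formula 2r/(1 + r); with 11 items the halves are unequal (6 vs 5 items), so software that applies an unequal-length correction may report slightly different values. Function names are illustrative.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_patients, k_items) array."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


def spearman_brown(r):
    """Step up a half-test correlation to full-test length (equal halves)."""
    return 2 * r / (1 + r)


def odd_even_split_half(items):
    """Correlate odd- and even-numbered item half scores, then apply Spearman-Brown."""
    x = np.asarray(items, dtype=float)
    odd = x[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even = x[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r = np.corrcoef(odd, even)[0, 1]
    return spearman_brown(r)
```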

Sensitivity to change
To investigate the sensitivity of the PSBS to changes in patients' symptom burden resulting from treatment interventions, PSBS scores before (t1_3) and after (t7_3) palliative complex treatment were compared using the Wilcoxon signed-rank test for repeated measures [34]. The level of significance was Bonferroni-adjusted to p < .01. Only patients who completed the palliative complex treatment were included in the analysis (n = 514); patients who died within the first week and did not complete treatment were excluded.
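The pre/post comparison can be sketched with SciPy's `wilcoxon`, assuming paired arrays restricted to patients who completed treatment. The Bonferroni threshold is α divided by the number of tests; because only the resulting level (p < .01) is reported, `n_tests` is left as a parameter rather than guessed. The helper name is hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def pre_post_wilcoxon(pre, post, n_tests, alpha=0.05):
    """Wilcoxon signed-rank test with a Bonferroni-adjusted threshold.

    Returns (statistic, p, significant). Zero differences are discarded
    (SciPy's default zero_method="wilcox").
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    stat, p = wilcoxon(pre, post)
    return float(stat), float(p), bool(p < alpha / n_tests)
```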

Feasibility and acceptance of the PSBS
Analyses showed a high degree of acceptance of the PSBS implementation by the specialised palliative care nurses. The rates of missing values in the PSBS documentation were low (0.32%).

Descriptive PSBS and HOPE
The overall mean PSBS score was 12.  Table 4.

Construct validity
There was a significant positive correlation of PSBS on admission with HOPE scores on admission (rs = .58; p < .001) and at discharge (rs = .54; p < .001). The psychological problems scale of HOPE correlated significantly with the psychosomatic symptom complex of the PSBS on admission (rs = .43; p < .001) and at discharge (rs = .28; p < .001). As principal component analysis did not reveal a physical symptom burden complex component for the PSBS, no correlations concerning the physical problems subscale of HOPE were calculated. Single item correlations of PSBS and HOPE checklist revealed positive significant correlations ranging between rs = .48 and rs = .79 on admission and between rs = .18 and rs = .61 (all p-values < .001) 1 week after admission. The single item correlations of the PSBS and the HOPE checklist are presented in Table 5.

Discriminative validity
Two nonparametric analyses of variance using the Kruskal-Wallis test revealed significant differences in PSBS sum scores between ECOG subgroups on admission (χ²(4) = 121.91; p < .001) and 1 week after admission (χ²(4) = 57.68; p < .001). The mean sum scores for each ECOG group and measurement point are presented in Table 6.

Reliability
The Cronbach's alpha coefficients for the PSBS sum score and the PSBS subscales did not meet the criterion of acceptable internal consistency (> .70). The coefficients are presented in Table 8. The split-half reliability was investigated using the odd-even method; after Spearman-Brown adjustment, the coefficient was .69. Spearman's rank correlation of PSBS sum scores on admission and 1 week after admission revealed a significant, moderate positive correlation (rs = .55; p < .001). Correlations and p-values for the PSBS subscales are shown in Table 7. Analyses of inter-rater reliability revealed poor and non-significant values for all items except confusion (Kendall's W = .01; χ²(2) = 9.97; p < .01). Pain narrowly missed significance (Kendall's W = .01; χ²(2) = 5.92; p = .05). The inter-rater reliability analyses therefore gave no indication of systematic adoption of prior ratings. Kendall's W, chi-square and p-values for each item are shown in Table 8.

Sensitivity to change
The Wilcoxon signed-rank test showed significant differences before and after palliative complex treatment for all PSBS subscales and sum scores except sweating (z = −0.34; p = .73). The mean PSBS subscale and sum scores before and after palliative complex treatment, with the corresponding z- and p-values, are presented in Table 9.

Discussion
The aim of the present study was to report the implementation, acceptability and feasibility of a high-frequency proxy-based symptom assessment instrument in palliative care, to describe data concerning the psychometric properties of the instrument and to present a framework for the evaluation of such an approach. Systematic proxy-based assessment of symptom burden in palliative care obtained by nurses can be similar in accuracy to patient-reported outcomes and has special value in low-functioning or confused patients [9,35].

Feasibility and acceptance
Since its implementation in 2011, the PSBS has been integrated into daily clinical routine at the SPCU of the University Medical Centre in Dusseldorf, Germany. Symptom burden was successfully documented three times a day, and further analysis showed a low rate of missing values and no indication of adoption of prior ratings. We would argue that these two findings can be interpreted as indicators of the acceptability of the instrument, but interviews with the nurses who conduct daily assessments with the PSBS are needed to confirm this preliminary interpretation. In summary, the successful implementation of the PSBS and the quantitative analysis of nurses' ratings provide some evidence for the feasibility of a high-frequency proxy-based symptom documentation approach in an SPCU. To gain a more detailed understanding of the PSBS' feasibility and acceptance in clinical practice, further research should include qualitative assessments, e.g., interviews with nursing staff.

Psychometric properties
Validity
Because the PSBS, as an expert-developed documentation instrument, had not yet been validated, another aim was to report data on its psychometric properties. PCA yielded a six-component solution comprising three multiple-item subscales (psychosomatic, gastrointestinal and respiratory symptom complexes) and three single-item scales (pain, itching, sweating). Given the large proportion of variance it explained, the psychosomatic symptom complex appears to be a highly relevant aspect of symptom burden in palliative patients. This result is in agreement with previous studies highlighting the importance of psychological symptoms in palliative care patients [36]. Several methods are available for data reduction; common factor analysis (CFA) and principal component analysis (PCA) are widely used multivariate techniques for this purpose [37]. According to Widaman [38], "the final word on comparisons between CFA and PCA has not yet been written" (p. 201). In the present study, we chose PCA for data reduction and evaluation of the PSBS' structural validity because nonzero PCA loadings are higher and more stable than nonzero common factor analysis loadings and are closer approximations of the true factor loadings than those produced by common factor analysis [37]. Further research should evaluate the latent factor structure of the PSBS using structural equation modelling.
PSBS and HOPE sum scores showed a moderate, significant positive correlation on admission and at discharge, indicating good construct validity of the PSBS. The moderate size of the correlations implies that both instruments measure similar constructs but are not redundant, potentially because of their slightly different items. The psychosomatic subscale of the PSBS and the psychological problems subscale of HOPE showed moderate, significant positive correlations on admission and at discharge, which may be because the two instruments cover different aspects of mental symptom burden. While the PSBS measures alertness and weakness as important mental symptoms of palliative care patients, HOPE covers depression, which is of no less importance. The single-item correlations of the PSBS and HOPE support the construct validity of the PSBS. Interestingly, the strength of the correlations decreased at the second measurement point (discharge), which may be due to the post-mortem HOPE ratings for deceased patients. Nonparametric analysis of variance (Kruskal-Wallis test) showed significant differences in mean PSBS sum scores between ECOG subgroups, supporting the discriminative validity of the PSBS with respect to different intensities of symptom burden.

Reliability
Analyses of the internal consistency of the PSBS subscales revealed values below the cut-off for all subscales. Whereas the psychosomatic and gastrointestinal symptom complexes almost reached acceptable reliability, the respiratory symptom complex showed poor internal consistency. These results do not support the use of the proposed subscales in the current instrument; it might be best to assess symptom burden at the single-item level. The PSBS sum score should not be used because it does not appear to be reliable.
The split-half reliability of the PSBS also narrowly missed the criterion of acceptable reliability. Analysis of test-retest reliability showed a moderate correlation between PSBS sum scores on admission and 1 week later. A correlation of .70 is regarded as an indicator of fair test-retest reliability, but this value depends strongly on the interval between measurement points. For a state-like symptom burden that fluctuates frequently even within a single day, an interval of 1 week may have been too long to demonstrate good test-retest reliability.
Test-retest results for the PSBS subscores indicated that stability differed between symptom domains: the psychosomatic and respiratory symptom complexes appeared to be more stable, whereas the gastrointestinal symptom complex and the items pain, itching and sweating appeared to be less stable.
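The split-half and test-retest coefficients discussed above are rank correlations. The sketch below shows one way to compute them: Spearman's rho as the Pearson correlation of average ranks, an odd-even item split, and the Spearman-Brown step-up to full test length. These are assumptions for illustration only: this section does not state how the PSBS items were split or whether the reported split-half value was Spearman-Brown corrected, and all function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def split_half(ratings: list[list[float]]) -> tuple[float, float]:
    """Hypothetical odd-even split-half reliability for a patients x items matrix.

    Returns (rho between the two half sums, Spearman-Brown corrected value).
    """
    first_half = [sum(row[0::2]) for row in ratings]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in ratings]  # items 2, 4, 6, ...
    rho = spearman(first_half, second_half)
    return rho, 2 * rho / (1 + rho)


if __name__ == "__main__":
    # Hypothetical sum scores of the same five patients at two time points
    # (e.g. day 1 evening and day 7 evening); not study data.
    day1 = [12, 7, 9, 15, 4]
    day7 = [10, 8, 6, 14, 5]
    print(f"test-retest rho = {spearman(day1, day7):.2f}")
```

The test-retest coefficient is simply `spearman` applied to the same patients' scores at two time points, which is why its size depends on how much the underlying state is allowed to change between them.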
Inter-rater reliability of nurses' ratings of symptom burden within a day was poor and non-significant for all items except confusion. Because inter-rater agreement can only be high if the rated attribute itself remains constant, this result may be regarded as further evidence that symptom intensity fluctuates within a day, which makes high-frequency documentation of symptom burden a reasonable approach. Unlike the other items, confusion appeared to be a stable symptom with high inter-rater agreement.
The low agreement also indicates that nurses did not systematically adopt prior ratings: had they done so, inter-rater agreement would have been high and significant.
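This section does not name the agreement statistic used for the within-day ratings. As one common choice, the sketch below computes unweighted Cohen's kappa for two ratings of the same patients, for example a morning and an evening rating of one item. The choice of kappa, the function name and the toy data are assumptions for illustration. The sketch also makes the copying argument concrete: if prior ratings were simply carried forward, both lists would be identical and kappa would be 1 (provided more than one category occurs).

```python
from collections import Counter


def cohens_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two sets of ratings of the same patients.

    Hypothetical helper for illustration; not necessarily the study's statistic.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of patients given the same category in both ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the two marginal distributions.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both use one identical category")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy 0-3 ratings of one item for six hypothetical patients; not study data.
    morning = [0, 1, 2, 3, 1, 2]
    evening = [1, 1, 0, 3, 2, 2]
    print(f"kappa = {cohens_kappa(morning, evening):.2f}")
    # Copying the morning ratings forward yields perfect agreement.
    print(f"copied: kappa = {cohens_kappa(morning, list(morning)):.2f}")
```

For ordinal items such as these, a weighted kappa or an intraclass correlation coefficient is often preferred, since they give partial credit for near-misses; the unweighted form is shown only because it is the simplest to follow.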

Sensitivity to change
Another matter of interest was the sensitivity of the PSBS to changes in symptom burden caused by interventions, a psychometric property that is often underreported in palliative care [39]. The PSBS sum score and all PSBS subscales except sweating differed significantly before and after palliative complex treatment. Assuming that the interventions did change symptom burden, the PSBS can therefore be regarded as sensitive to such changes; sweating was probably not influenced by any intervention. Scores decreased significantly for most subscales. However, there was a significant increase in the psychosomatic and itching subscales. Whilst significant, these findings may lack clinical relevance: psychosomatic symptom burden remained at a high level, itching remained at a very low level overall, and the changes amounted to fractions of a scale point [40]. From a clinical perspective, a tentative increase in pruritus is not surprising given the difficult and complex nature of its pathophysiology and treatment, including opioid-induced pruritus (OIP), and its increase in end-stage presentations of malignancy, cholestasis and uraemia [41,42]. Psychological assessment in palliative care is inherently complex given the high prevalence of confusion and low functional status among patients and the limited uptake of self-reported measures [43]. Further research is needed to establish the reasons for the significant increase in psychosomatic subscale scores.
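The abstract states that sensitivity to change was tested with the Wilcoxon signed-rank test on repeated PSBS measures. The sketch below implements its large-sample form: zero differences are dropped, tied absolute differences receive average ranks, the variance is corrected for ties, and a two-sided p value is taken from the normal approximation without continuity correction. These implementation choices, the function names and the toy data are assumptions; statistical packages differ in how they treat zeros, ties and small samples, and this section does not state which options were used.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(pre: list[float], post: list[float]) -> tuple[float, float, float]:
    """Hypothetical two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W+, z, p), where W+ is the rank sum of positive post - pre differences.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |differences|.
    ties = Counter(abs(d) for d in diffs).values()
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in ties) / 48
    z = (w_plus - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


if __name__ == "__main__":
    # Hypothetical symptom scores of eight patients on admission and at discharge.
    admission = [5, 6, 7, 8, 9, 10, 11, 12]
    discharge = [4, 5, 6, 7, 8, 9, 10, 11]
    print(wilcoxon_signed_rank(admission, discharge))
```

The p value reflects only whether the paired differences are systematically shifted, not how large they are; with several hundred patients, shifts of a fraction of a scale point can reach significance, which is why clinical relevance is weighed separately above. Reporting the median paired difference alongside p is one simple way to show effect size.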

Lessons learned
The current study demonstrates that high-frequency proxy-based assessment of symptom burden in palliative patients can feasibly be implemented and appears to be acceptable to nurses. Our analyses indicate that the PSBS is a feasible tool for documenting physical and psychological symptom burden, with high sensitivity to changes in symptom burden but unsatisfactory reliability. The study further presents a framework for the post hoc validation of an existing documentation tool, intended to encourage other clinicians and researchers to evaluate their existing documentation tools and thereby help meet the demand for valid and reliable outcome measures in palliative care. Based on the experience gained during the study, the authors offer the recommendations below for future endeavours.

Limitations
The present study deals with proxy-based measurements of symptom burden in palliative patients. Although this assessment approach has many advantages, the rating itself depends to a great degree on the raters' impressions and takes only limited account of patients' own perception of their symptom burden. Nurses' ratings may also have been biased because the nurses were not blinded to the palliative complex treatment; owing to the post hoc design and field setting of the study, it was not possible to use blinded raters. Our evaluation of psychometric properties was based on classical test theory, and given our findings, it is possible that this tool, like the Mini-Suffering State Examination [44] and the Palliative Outcome Scale (POS) [45], does not reflect an overall construct. Similar to the POS, the PSBS captures three factors and some independent items that do not load onto these factors, which makes internal consistency and factor structure less appropriate criteria for evaluating it. Consequently, the PSBS in its present form appears less suitable for this type of assessment.
In the current study, a post hoc psychometric analysis of an existing expert-developed documentation tool was performed. From a methodological perspective, post hoc validation has its limitations; where possible, a priori theory-based test construction and validation should be preferred. Further research is needed to refine the PSBS. For example, it would be worth investigating whether it can be adapted to time frames other than the preceding 8 h.

Recommendations
Given the complexity of designing and/or implementing high-frequency proxy-based symptom measurement instruments, the authors recommend involving nursing staff in the implementation process at an early stage. This should include specific training in the use of the documentation interface, as well as opportunities for nurses to provide feedback and for the measurement system to be adapted accordingly to improve its ease of use. In the authors' experience, this procedure increases acceptance of and compliance with the measurement approach.
From a methodological perspective, the use of an expert-developed tool posed several challenges for the psychometric evaluation of the documentation system. Clinical experts rarely consider theoretical aspects when developing documentation systems or measurement instruments, resulting in different levels of measurement across sub-items. In the present study, it was necessary to adjust the graduations of the item pain to make it comparable with the other PSBS items. It was further necessary to exclude the item constipation from analyses because it was not measured at an ordinal level. To avoid methodological challenges in the psychometric and clinical evaluation of patient data, the authors recommend ensuring that sub-items are measured on at least an ordinal verbal rating scale with comparable intervals between response levels, such as a Likert scale.
To preserve the possibility of evaluating the psychometric properties of a proxy-based measurement system, it is highly recommended to collect data with an additional, empirically validated instrument. When assessing candidate instruments for suitability, it is important to consider whether they have a similar outcome objective and a comparable item structure. From a test-theoretical perspective, it is also important to ensure that the second instrument is administered continuously and at comparable times, so that construct validity can be evaluated at several measurement points.
The current study yielded evidence that the intensity of symptom burden fluctuates frequently within a day. The authors therefore strongly recommend high-frequency measurement of symptom burden. Although this approach creates additional workload for nursing staff, the experience gained within this study shows that it is feasible and accepted by nurses.

Conclusions
High-frequency proxy-based symptom burden assessment is a feasible and acceptable approach for nurse-led assessments of symptom burden in palliative care. The PSBS in its present form demonstrates good structural and construct validity and high sensitivity to changes in symptom burden, but unsatisfactory reliability. This study supports the notion that the PSBS might not reflect an overall construct and will therefore require further development and critical comparison with established symptom burden instruments in palliative care. Future research should focus on improving longitudinal psychosomatic symptom burden assessments.