Validity, reliability and responsiveness to change of the Italian palliative care outcome scale: a multicenter study of advanced cancer patients

Background There is an increasing requirement to assess outcomes, but few measures have been tested for advanced medical illness. We aimed to test the validity, reliability and responsiveness of the Palliative care Outcome Scale (POS), and to analyse predictors of change after the transition to palliative care. Methods Phase I: multicentre, mixed method study comprising cognitive and qualitative interviews with patients and staff, cultural refinement and adaptation. Phase II: consecutive cancer patients on admission to 8 inpatient hospices and 7 home-based palliative care teams (PCTs) were asked to complete the POS, the EORTC QLQ-C15-PAL and the FACIT-Sp (T0), to assess internal consistency, convergent and divergent validity. After 6 days (T1) patients and staff completed the POS to assess responsiveness to change (T1-T0), and agreement between the self-assessed POS and the POS completed by the staff. Finally, we asked hospices for a further assessment 24–48 h after T1 to assess test re-test reliability. Results Phase I: 209 completed POS questionnaires and 29 cognitive interviews were assessed, revisions made and one item substituted. Phase II: 295 consecutive patients admitted to 15 PCTs were screened, 175 (59.3 %) were eligible, and 150 (85.7 %) consented. Severity of illness limited participation in almost 40 % of patients. We found good convergent validity, with strong and moderate correlations (r ranged from 0.5 to 0.8) between similar items from the POS, the QLQ-C15-PAL and the FACIT-Sp. As hypothesised, the physical function subscale of QLQ-C15-PAL was not correlated with any POS item (r ranged from −0.16 to 0.02). We found acceptable to good test re-test reliability in both versions for 6 items. We found significant clinical improvements during the first week of palliative care in 7/10 items assessed: pain, other symptoms, patient and family anxiety, information, feeling at peace and wasted time.
Conclusions Both the patient self-assessed and professional POS versions are valid, with acceptable internal consistency. POS detected significant clinical improvements during palliative care, at a time when patients are usually expected to deteriorate. These results suggest that there is room for substantial improvement in the management of patients with advanced disease, across all key domains: symptoms, psychological, information, social and spiritual. Electronic supplementary material The online version of this article (doi:10.1186/s12904-016-0095-6) contains supplementary material, which is available to authorized users.


Background
Capturing patient-centred outcomes remains pivotal for demonstrating the value and quality of medical care. Outcome information is vital for evaluation, quality improvement, and to sustain services [1]. It allows patients to assess the quality of their care [2], and when captured in real time can help to screen for problems and monitor response, so that professionals can appropriately support patients and family members [3]. In research, outcome measures are at the crux of assessing response to treatment [4]. Although the numbers of people with chronic, progressive and advanced illness are increasing [5], outcome measures in this context are lacking, yet they are essential to develop appropriate interventions and to support the development of models of care.
Outcome measurement in advanced illness brings particular problems. Physical function, a central component in most quality of life and outcome measures, is often not the patient's main priority, nor is it necessarily the target of medical care. In advanced or progressive illness, many standard quality of life measures have severe floor effects, i.e., their values are always low, even if symptoms improve. This has led to an assumption that in advanced disease and towards the end of life, a patient's quality of life cannot improve. However, an appropriate palliative care approach may lead to improvements in symptoms and in psychological, social and spiritual wellbeing. Appropriate tools are needed to detect changes in these complex aspects, which are important for quality of life in advanced illness. At the same time, patients are often too ill to complete long questionnaires and need short outcome tools. Capturing complex aspects with a short instrument is difficult. This has often led to the development of tools for proxy completion, but these have questionable validity [6].
It is crucial to build on existing tools, and yet ensure these are well validated. Harding et al., in a Europe-wide survey with 311 respondents in palliative care settings, identified 99 different tools being used in clinical care and audit, and 94 in research, that were each cited by fewer than 10 participants. This makes comparison and standardisation difficult [7]. One barrier to the more widespread use of common tools is the lack of appropriate and validated translations of existing tools for different cultures and languages [8]. Moreover, better knowledge of the distribution of outcomes in different settings and stages of cancer, including the analysis of factors associated with their changes, makes it possible to measure and evaluate different aspects of palliative care provision, for quality or research purposes.
Therefore, this multicentre study aimed to culturally adapt and test the feasibility, validity, reliability and responsiveness of a brief and widely used outcome measure, the Palliative care Outcome Scale (POS), using the Italian context as a model. Moreover, we examined predictors of change on POS scales after transition to palliative care settings.

Study design, settings and ethical approval
This is a mixed method multicentre study, following relevant European Organisation for Research and Treatment of Cancer (EORTC) and other guidelines [9,10]. Four palliative care teams (PCTs; two inpatient hospices and two home-based PCTs) participated in the preliminary testing, 16 in cognitive interviewing and 15 in phase II (8 inpatient hospices and 7 home-based PCTs). All were multiprofessional PCTs comprising doctors, nurses and, for some teams, psychologists and/or social workers. The inpatient hospice PCTs provided in-patient care and took over provision of all aspects of care for patients and their families. The home-based PCTs supported patients and their families in the community, offering an extra layer of support for those with the most complex or advanced illness. Liaison with the PCTs was conducted primarily remotely, using e-mail, telephone and Skype.
The Ethical Committee of the National Institute for Cancer Research of Genoa approved the study (Deliberation EC07.001 of 19 February 2007).

Instruments
1. The Palliative care Outcome Scale (POS) is a widely used, validated, brief outcome measure, used in in-patient, community and outpatient settings among patients with advanced illnesses [11][12][13]. It was initially developed for assessing outcomes in advanced cancer patients [11] based on systematic reviews, collaboration of a multidisciplinary advisory group and input from patients and caregivers. Subsequently, the POS was used widely in many settings and contexts, and among patients with different and multiple conditions. It has established validity (tested against established longer measures and qualitative reports), reliability, and internal consistency (Cronbach's alpha ~0.7). It demonstrated responsiveness to change and acceptability, taking around 5 min to complete [2]. It can detect differences in performance status and predict survival [16,17]. The Italian version and the Scoring Manual were provided by the EORTC, although a validation in Italian was not published. It is designed for patient self-completion.
3. The Functional Assessment of Chronic Illness Therapy, Spiritual Wellbeing Scale (FACIT-Sp) was originally developed in cancer patients to capture spiritual well-being. The instrument comprises 12 positive statements, with two subscales, one measuring a sense of meaning and peace (8 statements) and the other assessing the role of faith in illness (4 statements). Each statement is rated from 0 (not at all) to 4 (very much). A total score ranging from 0 to 48 for assessing spiritual well-being can also be produced. For both the subscales and the overall score, the higher the score, the better the spiritual well-being. It has good internal consistency reliability, correlates with other quality of life measures and with measures of religion and spirituality, and takes around 5-10 min to complete [18][19][20]. The Italian version validation was not published, but FACIT.org provided the Italian version of the questionnaire with the Scoring Manual. It is designed for patient self-completion.

Procedures and participants
Phase I
Preliminary field-testing and assessment of feasibility, content and face validity were carried out using version 1 of the POS for staff, previously translated, but not back-translated, by Franco Toscani [21]. We asked palliative care clinicians in four PCTs to assess all new patients using this version of the POS during their weekly meetings. For each item, staff assessed the comprehensibility, face validity relative to the assessed dimension, uniqueness and the relevance of the content and rating scale. They also reported any other dimensions not covered by the POS that were potentially useful for assessing quality of life in that specific patient.
The original English POS (version 2, both staff- and patient-completed questionnaires) was translated into Italian following the forward-backward procedures recommended by the EORTC QL Group [9]. This was combined with findings from the preliminary testing to provide a version of the POS for cognitive testing.
Cognitive testing of the POS took place in 16 PCTs. Clinicians were requested to identify two patients in their care and to ask them to complete the POS. After completion, the clinicians conducted a semi-structured interview with patients, focused on any potential problems in filling in each item of the POS. Following the EORTC guidelines [9], patients were asked to respond to five questions for each item of the POS: Did you have difficulty in replying to this question? Did you find this question unclear? Were there words in this question that you found difficult to understand? Did you find the way it was worded to be upsetting or annoying in any way? Would you have asked the question in a different way?
The comments were transcribed verbatim during the interview. The interviewer grouped the transcripts according to the five questions, and sent the material to the coordination centre. A researcher (MC) reviewed the written material and identified all relevant issues for each specific question. The POS was then modified accordingly for formal testing.

Phase II
We included consecutive consenting cancer patients admitted to the care of the participating PCTs, and excluded patients unable or unwilling to provide informed consent. Reasons for exclusion and refusals were recorded. Patients completed the POS at admission (T0) together with the EORTC QLQ-C15-PAL and the FACIT-Sp. At admission, the staff also collected demographic and clinical details, including functional status on the Eastern Cooperative Oncology Group (ECOG) scale. Six days after admission (T1), patients were asked to complete the POS again. At the same time point, the patient's main clinician, blind to the patient's scoring, was asked to complete the staff version of the POS.
To assess test-retest reliability of the POS, we asked inpatient hospices for an additional POS assessment 24-48 h after the T1 assessment (T2). In this step too, the clinician completed a staff POS assessment blind to the patient's scoring. We limited this assessment to the inpatient hospices because of the greater intensity of contact and care provided by their professionals. The short time to retest (24-48 h) was a compromise between the need to avoid recall bias and the need to retest patients in a stable clinical condition [22].

Statistical analysis
We evaluated the psychometric properties of the POS according to standard methods [9,23].
1. Feasibility and acceptability, data and scaling quality, were assessed by calculating the percentage of missing items (number of missing items/total number of item responses possible), the distribution of scores and floor or ceiling effects. We assumed that 5 % was an acceptable proportion of missing responses for each item of the questionnaire, taking into account the settings where the POS was administered.
determining the change in POS score from admission (T0) to the subsequent assessment (6 ± 2 days later, T1). We calculated mean scores and effect sizes.
6. Clinician (doctor or nurse) proxy assessments were tested for validity by comparing their scores with synchronous (±1 day) patient scores, taken as the gold standard. Agreement was assessed at T1. We calculated the percentage agreement, percentage agreement within one score, weighted Cohen's kappa and correlations. We also estimated the one-way ICC for the POS total scores [24].
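The agreement statistics listed for the proxy comparison can be sketched as follows. This is a minimal illustration, not the study's analysis code: it assumes item scores from 0 to 4 and linear disagreement weights for the weighted kappa (the weighting scheme is not stated in the text), and the function name `agreement_stats` is hypothetical.

```python
from collections import Counter


def agreement_stats(patient, staff, n_categories=5):
    """Return (exact agreement, agreement within one point, linear weighted kappa).

    Assumes ordinal scores coded 0 .. n_categories - 1 and linear weights;
    both are illustrative assumptions, not taken from the study.
    """
    if len(patient) != len(staff) or not patient:
        raise ValueError("need two equally long, non-empty score lists")
    n = len(patient)
    pairs = list(zip(patient, staff))
    exact = sum(p == s for p, s in pairs) / n
    within_one = sum(abs(p - s) <= 1 for p, s in pairs) / n

    # Observed and chance-expected disagreement, each weighted by |i - j|.
    pair_counts = Counter(pairs)
    patient_marginal = Counter(patient)
    staff_marginal = Counter(staff)
    observed = expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)
            observed += weight * pair_counts[(i, j)] / n
            expected += weight * patient_marginal[i] * staff_marginal[j] / (n * n)
    kappa = 1.0 - observed / expected if expected > 0 else float("nan")
    return exact, within_one, kappa
```

Kappa depends on the marginal distributions as well as on raw agreement, so an item with very skewed scores can show high percentage agreement and still a low kappa.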
A linear regression model was fitted to the data, using the change from admission as the continuous dependent variable. The aim of this analysis was to identify subgroups of patients with specific characteristics showing significant improvement or deterioration in their quality of life in the week after admission to palliative care. We tested the association between the dependent variable and the demographic and clinical characteristics of the patients (age, gender, education, marital status, primary tumour, ECOG, setting) in both univariate and multivariate analyses. In the multivariate analysis, we estimated the mean changes from admission from the regression model, after adjusting for all independent variables.
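As an illustration of the univariate step only, the sketch below fits an ordinary least-squares line with the change score (T1 minus T0) as the dependent variable and one continuous predictor, such as years of education. The function name `fit_line` and the example values are hypothetical; the multivariate model described above, which adjusts for all covariates, is not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of education and change in POS total score (T1 - T0).
years_of_education = [5, 8, 13, 18]
pos_change = [-6.0, -4.5, -2.0, 0.5]
slope, intercept = fit_line(years_of_education, pos_change)
```

Because lower POS scores indicate fewer problems, a negative change is an improvement, so a positive slope in this sketch corresponds to less improvement with more years of education.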

Results
Phase I, feasibility, content and face validity, cultural adaptation
For the preliminary assessment, the POS was completed during staff meetings for 82 patients, giving 209 completed questionnaires (most patients had 2-3 assessments). For 96 (46 %) evaluations, the staff did not identify any problems or issues. For 113 (54 %), concerns about comprehensibility and/or uniqueness of the content were reported [see Additional file 1]. Most (n = 57, 50 %) were related to item 5 (information), in particular to variation in information levels between patients and families. In all assessments, staff deemed the POS dimensions comprehensive, except in one case, in which an additional assessment of communication with other professionals was proposed. Clinicians reported that a POS manual providing further guidance would be helpful.
Fig. 1 Flow chart of the study
Results of translation, back-translation and adaptation of the Italian patient version of the POS are summarized in Additional file 2. This includes modifications made as a result of the preliminary assessment's findings. The main change was that we replaced the original item 8 ("Have you felt good about yourself as a person?") with the question "Are you at peace?", originally developed and validated to probe spiritual concerns at the end of life [25] and used in the African adaptation of the POS [26].
In the cognitive testing, 29 questionnaires (range 1-4 per PCT) were administered, including 15 patients. Overall, the items performed well [see Additional file 3], without any major difficulties for seven out of the ten items. Some patients found it difficult to answer items 6 (share feelings) and 10 (personal affairs). One patient reported "… difficulties due to the problematic relationship with my family" for item 6. Only one item (number 10) was found upsetting, and by a single patient, who "… found the question too intrusive". Patients experienced most difficulties in answering item 8 (are you at peace?), but we were not able to find a better alternative. Based on these results, no further changes were made.

Phase II formal evaluation
Two hundred and ninety-five consecutive cancer patients admitted to 15 PCTs were screened for eligibility, of whom 175 (59.3 %) were eligible. The main reasons for exclusion were coma and cognitive impairment (Fig. 1). Participation in the study was offered to 175 patients, and 150 (85.7 %) consented. The 150 eligible consenting patients were similar to the whole sample in terms of age, gender, educational level, diagnosis and marital status, but they had less severe functional status as determined by ECOG status (P < 0.001) (Table 1). Fewer patients from inpatient hospices than from home-based PCTs could be included in the study (P = 0.002). All 150 patients completed the POS at admission, with slightly lower numbers for the QLQ-C15-PAL and the FACIT-Sp (Fig. 1). After 6 days, at T1, 138 patients were alive and 120 (87 %) POS patient and 131 (95 %) POS staff assessments were completed. At T2, in the 8 inpatient hospices that reassessed the patients, 59 patients were alive and 33 (56 %) POS patient and staff assessments were completed (Fig. 1).
There was little missing data for POS assessments, all less than 5 %. The highest proportions were for the items 'information' and 'personal affairs' (3.3 % each) [see Additional file 4]. The entire range of possible scores was used for all POS items. 'Family anxiety' had the highest proportion of score 4 (worst score), with 49.7 % of patients recording that their families were overwhelmingly anxious. For four items ('wasted time', 'personal affairs', 'information', 'share feelings') were
Correlations between POS and QLQ-C15-PAL and FACIT-Sp met most of the prior hypotheses (Table 2). There was a strong correlation between the items assessing pain (r = 0.77), and moderate correlations between POS anxiety and depression and QLQ-C15-PAL emotional functioning (r = −0.51 and −0.68, respectively). We also found moderate correlations between POS 'at peace' and the FACIT-Sp meaning and peace subscale (r = −0.44) and the total score (r = −0.40), but not the 'faith' subscale (r = −0.16). POS 'other symptoms' was moderately correlated with fatigue, nausea and vomiting, appetite loss and constipation on the QLQ-C15-PAL, but not with breathlessness. As hypothesised, the physical function subscale of the QLQ-C15-PAL was not correlated with any POS item (r ranged from −0.16 to 0.02), nor with the total POS score (r = −0.12).
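The item-level data quality checks reported at the start of this section (percentage of missing responses and the share of answers at either end of the scale) can be illustrated with a short sketch. It assumes item scores from 0 to 4 with `None` marking a missing response; the function name `item_quality` and the example data are hypothetical and do not come from the study.

```python
def item_quality(responses, min_score=0, max_score=4):
    """Summarise one questionnaire item.

    responses: list of integer scores, with None for a missing answer.
    Returns percentages of missing answers and of answers at each extreme.
    """
    if not responses:
        raise ValueError("no responses supplied")
    answered = [r for r in responses if r is not None]
    missing_pct = 100.0 * (len(responses) - len(answered)) / len(responses)
    if not answered:
        return {"missing_pct": missing_pct, "at_min_pct": None, "at_max_pct": None}
    at_min_pct = 100.0 * sum(r == min_score for r in answered) / len(answered)
    at_max_pct = 100.0 * sum(r == max_score for r in answered) / len(answered)
    return {
        "missing_pct": missing_pct,
        "at_min_pct": at_min_pct,
        "at_max_pct": at_max_pct,
    }


# Hypothetical item with one missing answer and scores clustered at 0.
summary = item_quality([0, 0, 4, None, 2, 0, 1, 0, 0, 3])
acceptable_missing = summary["missing_pct"] <= 5.0  # 5 % threshold from the Methods
```

A large share of answers at one extreme leaves little room to register change in that direction, which is why the distribution is inspected alongside the missing-data rate.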
Test re-test reliability of the POS total score for both versions (self-assessed by the patients and assessed by the staff) was rather good, with an ICC of 0.72 (95 % CI = 0.50-0.85) and 0.82 (95 % CI = 0.67-0.91), respectively. Test re-test reliability of the POS items for both versions showed acceptable or good agreement over time for all items apart from items 2 (other symptoms) and 6 (share feelings) when assessed by the patients, and items 9 (wasted time) and 10 (personal affairs). These results are difficult to judge accurately because it was difficult to identify and ensure that patients were stable (Tables 3 and 4).
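For readers unfamiliar with the statistic, a one-way random-effects ICC for single measurements can be computed from the between-patient and within-patient mean squares. The sketch below assumes this one-way form, as named in the statistical analysis for total scores, and the same number of repeated assessments per patient; the function name `icc_oneway` is hypothetical and confidence intervals are not computed.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single measurements.

    ratings: list of per-patient lists, each holding k repeated total scores
    (for example the T1 and T2 assessments). Assumes equal k for all patients.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(ratings[0])
    if k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("each patient needs the same number (>= 2) of scores")
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    patient_means = [sum(r) / k for r in ratings]
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, patient_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values close to 1 indicate that most of the variation lies between patients rather than between repeated assessments of the same patient.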
The POS demonstrated good responsiveness to change after admission to palliative care. Six days after admission to the PCTs, patients reported significant improvements in 7/10 POS items, all except depression, share feelings and personal affairs (Table 5). Effect sizes for these seven items ranged from −0.21 (feeling at peace) to −0.38 (other symptoms), with a total score effect size of −0.43 (Table 5; Fig. 2).
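The text does not state which effect size formula was used. One common choice, shown here purely as an assumption, is the standardised effect size: the mean change (T1 minus T0) divided by the standard deviation of the baseline scores. The function name `effect_size` and the example scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(t0_scores, t1_scores):
    """Standardised effect size: mean(T1 - T0) / SD(T0).

    Assumed formula for illustration; uses paired scores from patients
    assessed at both time points.
    """
    if len(t0_scores) != len(t1_scores) or len(t0_scores) < 2:
        raise ValueError("need at least two paired scores")
    changes = [after - before for before, after in zip(t0_scores, t1_scores)]
    return mean(changes) / stdev(t0_scores)


# Hypothetical paired item scores: every patient improves by one point.
es = effect_size([2, 4, 6, 8], [1, 3, 5, 7])
```

Under this convention a negative value denotes improvement, since lower POS scores indicate fewer problems, which matches the sign of the effect sizes reported above.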
Regression analyses found two variables significantly associated with change from admission. An inverse linear relationship between years of education and improvement in POS scores after admission was observed in both univariate (P-value = 0.008) and multivariate analysis (P-value = 0.002). In addition, in the multivariate analysis, the improvement was significantly greater (P-value = 0.014) for patients admitted to home care than for those admitted to an inpatient hospice (Table 6).
Comparing the staff assessments of POS vs. patients' self-assessments, we found moderate agreement for the
The staff version of POS at T1 had similar internal consistency to that found for the patient version; the staff Cronbach's alpha (95 % CI) at T1 was 0.68 (0.61-0.75) and the patient Cronbach's alpha (95 % CI) at T1 was 0.72 (0.65-0.78).
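Cronbach's alpha, the internal consistency statistic quoted here, compares the sum of the item variances with the variance of the total score. A minimal sketch, assuming complete data (no missing items) and using the hypothetical function name `cronbach_alpha`; confidence intervals are not computed:

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have no variation")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For the POS, each row would hold the ten item scores from one completed questionnaire.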

Discussion
We found that the POS was a feasible way to assess outcomes and quality of life in advanced illness. Professional completion of the POS was possible for most patients in care, but in this sample, on admission, the severity of illness limited self-assessment in almost 40 % of patients (115/295), due to cognitive impairment, coma or early death. We found that the POS had excellent construct validity, limited but acceptable internal consistency and reliability, and good convergent and divergent validity when compared with other measures. Although the professional assessments had acceptable agreement and correlation with patient ratings, there were differences, suggesting that patient assessments should be used wherever possible. The POS was responsive to change, with significant clinical improvements during the first week of palliative care in seven out of 10 items assessed: pain, other symptoms, patient and family anxiety, information, feeling at peace and wasted time. This study detected substantial clinical improvement, even after one week of palliative care, in many dimensions important for quality of life, at a point in the disease trajectory when modern medicine often considers that nothing more can be done. There was a medium effect size of 0.43 for the total score and small to medium effect sizes, between 0.21 and 0.38, for the 7/10 individual items where significant improvements were found [27]. One question is whether these benefits could be achieved at an earlier time. The mean POS score at admission was 14.3, indicating that most patients had serious and multiple problems.
The regression analysis found that the improvements were six times greater in the home care group than in the inpatient hospice group. These findings are difficult to interpret, and several explanations are possible. Survival is usually shorter for hospice patients, and it is possible that an earlier admission to palliative care for home care patients was associated with a greater improvement in POS total scores. Although data from recent trials of early palliative care [28][29][30] make this hypothesis appealing, this study does not allow a clear conclusion.
An unexpected finding was the strong inverse relationship between educational level and improvement in POS score during palliative care. Recent commentaries [31] and specific research [32] have highlighted inequities in access to palliative care across the UK, as it is less likely to be available for people living in areas of social deprivation. The results of this study, if confirmed, suggest that palliative care has the potential to benefit precisely the socially disadvantaged groups the most.
The first four items of the POS (pain, other symptoms, patient and family anxiety) worked well, without concerns from patients. These domains are reported by clinicians to be the most important questions [13]. However, one patient suggested replacing item 2 with a checklist of symptoms. Symptom management is a cornerstone of palliative care. Correlations with the QLQ-C15-PAL showed that the single POS item 'other symptoms' captured some symptoms but not others, in particular not breathlessness. As symptoms are highly prevalent in advanced disease, assessing individual symptoms should be considered in future developments of the POS.
Table 5 Responsiveness to change in a sample of patients self-assessed with the POS after 6 days from admission in palliative care (T0). Columns: No.; Admission (T0); 6 ± 2 days after T0; Difference (T1-T0)
We decided to replace the original POS item 8, "Have you felt good about yourself", with Steinhauser's item "are you at peace", which showed promising psychometric properties in its validation study [25]. The original item had encountered some problems in the validation of the German version of the POS [12]. In the African POS, the "peace item" was included in the final questionnaire [26] and subsequently showed good validity in the African context [33]. The "peace item" may cover a different dimension from the removed item. This could be a limitation, as the original item 8 together with item 7 (depression) has been shown to be useful for screening for depression [34]. Not all the items showed satisfactory psychometric properties. A clear floor effect was observed for items 9 "wasted time" and 10 "personal affairs", and to a lesser extent for items 5 "information" and 6 "share feelings". Four items ("other symptoms", "share feelings", "wasted time" and "personal affairs") showed poor reliability when assessed by the patients, and to a lesser extent when assessed by the staff. The low reliability of "wasted time" and "personal affairs" could be explained by the skewness of their distributions, considering that the proportion of agreement is rather high and that the kappa statistic is rather sensitive to marginal distributions. Conversely, the poor reliability of "other symptoms" and "share feelings", associated with a wide distribution of the scores and a low proportion of agreement, suggests some problem in these items when they are self-assessed by the patients. Future revisions of the tool should take these points into consideration.
This study has limitations. First, the psychometric properties of the EORTC QLQ-C15-PAL and the FACIT-Sp have never been assessed in Italy. Both questionnaires are widely used, and we used the official translations, but we must take into account that the results of the convergent and divergent analyses could be slightly biased. Second, we estimated test-retest reliability without any confirmation of clinical stability before retesting the patients 24-48 h later. Hospice patients are prone to deteriorate rather quickly, and the inclusion of clinically unstable patients could have affected the results by providing biased underestimates of reliability. Moreover, as we did not collect any external clinical or patient-based data during prospective assessments, we could not estimate the minimal important difference for the POS. Other studies should explore this important property of the tool.
In addition, we only included patients with a diagnosis of cancer, although many were elderly and had co-morbidities. The advanced phase of illness in cancer is common in internal medicine, and there are many similarities in symptoms and problems between cancer and non-cancer conditions [30]. Five percent of patients declined to take part in the study. Although this is low, we do not know whether they refused the study as a whole, with all the questionnaires, or completion of the POS itself.
Finally, the study was conducted in one country, Italy, where the provision of services may differ from that in other countries; however, it involved six regions and 20 centres, which should guarantee a certain degree of heterogeneity.