Development of the Care Evaluation Scale Version 2.0: a modified version of a measure for bereaved family members to evaluate the structure and process of palliative care for cancer patients

Background The Care Evaluation Scale (CES1.0) was designed to allow bereaved family members to evaluate the structure and process of care, but has been associated with a high frequency of misresponses. The objective of this study was to develop a modified version of CES1.0 (CES2.0) that would eliminate misresponses while maintaining good reliability and validity. Methods We conducted a cross-sectional questionnaire survey by mail in October 2013. The participants were bereaved family members of patients who died from cancer in seven institutions in Japan. All family members were asked to complete CES2.0, the short form CES1.0, items on overall care satisfaction, the Family Satisfaction with Advanced Cancer Care (FAMCARE) Scale, the Patient Health Questionnaire-9 (PHQ-9) and the Brief Grief Questionnaire (BGQ). To examine test-retest reliability, all participants were asked to complete a second CES2.0. Results Of 596 questionnaires sent, 461 (77%) were returned and 393 (66%) were analyzed. In the short form CES1.0, 17.1% of the responses were identified as misresponses. No misresponses were found in CES2.0. We identified 10 CES2.0 subscales similar to those in CES1.0 using exploratory factor analysis. Cronbach’s alpha was 0.96, and the intraclass correlation coefficient was 0.83. Correlations were found between CES2.0 and overall satisfaction (r = 0.83) and FAMCARE (r = 0.58). In addition, total CES2.0 scores were negatively correlated with the PHQ-9 (r = −0.22) and BGQ (r = −0.10). Conclusion These results suggest that CES2.0 eliminated misresponses associated with CES1.0 while maintaining good reliability and validity and greatly improving test-retest reliability.


Background
The way in which the quality of medical care is measured is an essential part of accurate quality assurance. In palliative care, in addition to the difficulties associated with prognostication, patients are frequently too ill to complete questionnaire surveys or take part in interviews. Therefore, it is common to have bereaved family members evaluate the provision of care at the patient's end of life [1].
Donabedian [2] stated that quality of care consists of the following three components: 1) structure of care; 2) process of care; and 3) outcome of care. Although numerous measures have been developed to evaluate the outcomes of palliative care, such as quality of the dying process [3][4][5][6][7], few measures have been developed to evaluate the structure and process of care [3]. One of these measures is the Care Evaluation Scale (CES), which was developed in 2004 to evaluate the palliative care structure and process [8]. The CES has been used in a number of nationwide surveys and clinical audits by institutions in Japan [9][10][11].
To measure the structure and process of care as experienced by bereaved family members, the original CES (CES1.0) consists of 28 items with the following 10 subscales: physical care by physicians; physical care by nurses; psycho-existential care; help with decision-making for patients; help with decision-making for the family; environment; family burden; cost; availability; and coordination/consistency. In CES1.0, bereaved family members are asked whether improvements are needed in regard to the care provider. This is because when CES1.0 was developed, the provision of care was typically rated using agreement or satisfaction level, but this makes it difficult to interpret whether the respondents actually desire further improvement in this area [8].
However, during data entry, we noticed that the care assessment method used in CES1.0 appeared to result in misresponses by a substantial proportion of the respondents. In this article, we operationally define misresponse as follows: "misresponse occurs when a respondent selects an option that is inconsistent or incongruent with his or her corresponding beliefs". For example, respondents who described themselves as being "very satisfied" with the provision of care also responded that "improvement is highly necessary". We assume that such respondents misunderstood "highly necessary" as "the item, for example physical care by the physician, is very good". Although this may be somewhat confusing to English readers, the wording of the item in Japanese was thought to lead to reverse-score errors in the responses if not read carefully. In Japanese, although the question regarding satisfaction was quite straightforward, the question regarding the necessity of improvement was in the form of a double negative, and therefore somewhat confusing. In fact, we found that the question regarding satisfaction was consistent with the other questions, while the question regarding the necessity of improvement was consistent with those associated with suspected misresponses. Overall, we found a misresponse rate of 5-10% (unpublished data). Therefore, in previous studies we referred to the overall satisfaction question and manually corrected the inverse responses to reflect the inverse score [9][10][11]. However, this was not only time consuming, but also insufficient, because categorizing respondents who answer "somewhat satisfied" for the overall satisfaction question as satisfied or unsatisfied, which would be necessary to correct such CES1.0 responses, was not feasible. Instead, we hypothesized that we could eliminate misresponses by improving the language of the response options on the questionnaire.
Therefore, the objective of this study was to develop a modified version of CES1.0 (CES2.0) that would allow the quality, structure, and process of palliative care to be evaluated more accurately while maintaining good reliability and validity.

Participants and procedures
We conducted a cross-sectional questionnaire survey by mail in October 2013. The potential participants were bereaved family members of 100 consecutive patients who died in four inpatient palliative care units, two home hospices, and a general hospital ward before May 31, 2013. Patients who died before 2012 were not included because a similar bereavement study in a palliative care unit was conducted in 2013. The inclusion criteria were as follows: 1) the patient died from cancer; and 2) the patient and bereaved were at least 20 years of age. The exclusion criteria were as follows: 1) the responsible family member or guarantor could not be identified; 2) treatment-related death or death in the intensive care unit; 3) the bereaved had psychological distress at a level that kept them from participating in the study, as determined by the primary physician; and 4) the bereaved was incapable of completing the questionnaire due to cognitive dysfunction or the inability to read Japanese. We estimated that a sample size of 200 would be required based on exploratory factor analysis, convergent validity, and test-retest reliability [12].
We sent a questionnaire by mail to the responsible family member or guarantor as noted in the hospital records and asked that it be completed by the primary caregiver of the deceased. Reminders were sent to all non-responders 2 weeks later. Potential participants who refused to participate were asked to indicate their decision in a check box on the cover sheet and return the questionnaire. Reminders were not sent to such responders. To examine the test-retest reliability of CES2.0, a retest questionnaire was sent to all respondents 2 weeks after they returned the first questionnaire. This retest interval is standard in Japan and was also chosen in accordance with the development of the patient version of the CES [13]. In Japan, written informed consent is not required for anonymous questionnaire surveys. Potential participants were informed about the details of the survey, and completing and returning a questionnaire was regarded as voluntary consent to participate.
This study was approved by the institutional review boards of Tohoku University and all participating institutions and conducted in accordance with the ethical guidelines for epidemiological research issued by the Ministry of Education, Culture, Sports, Science, and Technology and the Ministry of Health, Labour, and Welfare of Japan.

Measurements
Care Evaluation Scale version 2.0 (CES2.0)
We developed CES2.0 based on modifications from the original CES1.0 for bereaved family members [8]. In CES1.0, participants were asked whether improvements were necessary in relation to the care provider. Because this appeared to be confusing for the participants, we changed the response options to a 6-point Likert scale (6: highly agree; 5: agree; 4: somewhat agree; 3: somewhat disagree; 2: disagree; 1: highly disagree). We also asked participants to select "7: N/A" if none of the other scores were applicable to the patient.
CES1.0 consisted of 28 items with 10 subscales. We modified the expression of each item to match the new response options without changing the underlying concept of each subscale. For example, the statement "The same doctors and nurses provided care" in CES1.0 was changed to "Important information was shared even when the attending physician or nurse was changed" in CES2.0. To allow easy interpretation, all scores were proportionally adjusted to range from 0 to 100, similar to CES1.0; higher scores indicated good structure or process of care. We designed this scale to be used not only in inpatient settings, but also in the home.
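The proportional adjustment to a 0 to 100 range can be illustrated with a short sketch. The text does not give the exact formula, so the linear mapping below (1 maps to 0 and 6 maps to 100) is an assumption, and the function name `rescale_to_100` is hypothetical.

```python
def rescale_to_100(mean_item_score, low=1, high=6):
    """Linearly map a mean item score on the low..high scale onto 0..100.

    Assumption: a plain linear transformation; the source only states that
    scores were "proportionally adjusted to range from 0 to 100".
    """
    if not low <= mean_item_score <= high:
        raise ValueError("score lies outside the response scale")
    return (mean_item_score - low) / (high - low) * 100


if __name__ == "__main__":
    # Mean item scores for three hypothetical respondents.
    for score in (1, 3.5, 6):
        print(score, rescale_to_100(score))
```

Under this mapping, a respondent who answers "highly agree" to every item receives 100, and one who answers "highly disagree" to every item receives 0.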

Short version of the original Care Evaluation Scale (CES1.0)
We used the short version of the original CES1.0 [8], which consists of 10 items and 10 subscales. In the original CES1.0, participants were asked to indicate if improvements were necessary using the following 6-point Likert scale ("1: improvement is not necessary"; "2: improvement is little necessary"; "3: improvement is somewhat necessary"; "4: improvement is necessary"; "5: improvement is quite necessary"; and "6: improvement is highly necessary").
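Because a higher CES1.0 response indicates a greater need for improvement, while higher final scores indicate better care, CES1.0 responses have to be reversed before scoring. The same reversal applies when a suspected inverse response is corrected. A minimal sketch, assuming the usual reversal for a 1 to 6 scale (the function name `reverse_score` is hypothetical and the exact procedure is not given in the text):

```python
def reverse_score(response, low=1, high=6):
    """Reverse a Likert response so that low becomes high and vice versa.

    Assumption: standard reversal (low + high - response), i.e. 7 - response
    on the 6-point CES1.0 scale.
    """
    if not low <= response <= high:
        raise ValueError("response lies outside the scale")
    return low + high - response


if __name__ == "__main__":
    # "6: improvement is highly necessary" becomes the least favourable score.
    print(reverse_score(6))
    # "1: improvement is not necessary" becomes the most favourable score.
    print(reverse_score(1))
```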

Overall care satisfaction
Participants were asked about their overall satisfaction with care using the following question: "Overall, in the past month, were you satisfied with the medical care the patient received in the last place of care?" Participants were asked to respond on a 6-point Likert scale ("1: highly dissatisfied" to "6: highly satisfied"). This measure was used in the development of CES1.0 [8] and in a nationwide bereavement study [11].

FAMCARE (Family Satisfaction with Advanced Cancer Care)
We used the FAMCARE Scale to measure family satisfaction with advanced cancer care. The original FAMCARE Scale is composed of 20 items that can be rated on a 7-point Likert scale ("7: highly satisfied" to "1: highly dissatisfied") [14]. The Japanese version of the FAMCARE Scale was translated by the forward and backward translation procedure [15].

Expectation of care
Expectation of care before receiving care at the last place of care was also investigated using a method similar to that for CES2.0. Participants were asked to rate their level of expectation for seven items (physical care by physician; physical care by nurse; psycho-existential care; help with decision-making; environment; family burden; and cost) on a 3-point Likert scale ("1: not very expected" to "3: highly expected") [8].

Patient Health Questionnaire-9 (PHQ-9)
We used the Patient Health Questionnaire-9 (PHQ-9) to measure depression among the participants. The PHQ-9 is a self-administered questionnaire which scores each of the nine Diagnostic and Statistical Manual of Mental Disorders version IV criteria for depression from "0" (not at all) to "3" (nearly every day) [16]. We used the Japanese version of the PHQ-9, which has confirmed validity [17].

Brief Grief Questionnaire
We used the Brief Grief Questionnaire (BGQ) to measure the grief of the participants. The original BGQ was developed for a study on the 9-11 terrorist attacks in New York City. It is composed of five items that can be rated on a 3-point Likert scale, and can be easily administered [18]. The validity of the Japanese version of the BGQ was also confirmed in the general Japanese population [19].

Participants' characteristics
Information on the patients' age, sex, primary cancer site, month after death, place of death, and duration of stay in the last place of care (hospital or home) was collected from medical charts, while data on family age, sex, relationship to the deceased, physical and mental health status during the caregiving period, frequency attending to the patient, presence of other caregivers, education, medical expenditures during the last month, annual household income during the caregiving period, and feelings regarding the household budget during the caregiving period were collected by questionnaire.
Although CES2.0, overall care satisfaction, and the PHQ-9 were administered to all participants, in consideration of the volume of the questionnaire, CES1.0, the FAMCARE Scale, the BGQ and expectation of care were only administered to half of the participants, and the retest questionnaire only consisted of CES2.0.

Analysis
Before analysis, we identified the likely misresponses on CES2.0 and the short version of the CES1.0 by referring to the question on overall satisfaction. We then counted and manually corrected the misresponses. We calculated descriptive statistics and conducted item analysis. We treated the response "7: N/A" on CES2.0 as a missing value. To calculate the subscale scores, we imputed mean responses from within each subscale for all missing values.
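The handling of "7: N/A" responses and missing values can be sketched as follows. This is an illustration only, not the SAS code used in the study. It assumes that the imputed value is the mean of the same respondent's answered items within the subscale, which the text does not state explicitly; the function name `impute_subscale` and the example responses are hypothetical.

```python
NA = 7  # the "7: N/A" response option on CES2.0


def impute_subscale(responses):
    """Fill missing items in one respondent's subscale.

    Items answered N/A (7) or left blank (None) are treated as missing and
    replaced by the mean of the items the respondent did answer in the same
    subscale. Returns None when no item in the subscale was answered.
    """
    answered = [r for r in responses if r is not None and r != NA]
    if not answered:
        return None
    fill = sum(answered) / len(answered)
    return [fill if (r is None or r == NA) else r for r in responses]


if __name__ == "__main__":
    # A three-item subscale where the second item was marked N/A.
    print(impute_subscale([5, NA, 3]))
    # A subscale with no usable answers cannot be scored.
    print(impute_subscale([NA, None]))
```

With this rule, the subscale mean after imputation equals the mean of the answered items, so imputation only affects how each subscale contributes to the total score.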
We tested the reliability of the factor structure using the split-half method; we randomly split the data in half and performed exploratory factor analysis on one half, and confirmatory factor analysis on the other half. For internal consistency and test-retest reliability, we calculated Cronbach's alpha coefficients and intraclass correlation coefficients (ICC). To examine concurrent and discriminant validity, we calculated Pearson's correlation coefficients between the total score and each subscale score for CES2.0, overall satisfaction, the FAMCARE Scale, expectation of care, the PHQ-9 and the BGQ. In addition, we calculated Pearson's correlation coefficients between CES2.0 and the short version of CES1.0. Finally, we developed a short version of CES2.0 by selecting one item from each subscale. The selection criteria were as follows: 1) high correlation with subscale total score (r > 0.90); and 2) high reliability as a single item (ICC > 0.60) and low missing rate (<10%). All analyses were performed using the SAS statistical package (version 9.4; SAS Institute, Cary, NC).
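For readers unfamiliar with the internal consistency statistic, Cronbach's alpha can be computed from a respondents-by-items table as in the sketch below. It uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and is not the SAS procedure used in the study; the function name `cronbach_alpha` and the example data are hypothetical, and complete cases are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case data.

    rows: one list of k item scores per respondent (no missing values).
    """
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Three hypothetical respondents answering a two-item subscale.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```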

Results
The participants' characteristics are shown in Table 1. The mean patient age ± standard deviation (SD) was 73 ± 12 years, and males made up 60% of the total. The place of death was as follows: inpatient palliative care units (53%); home (32%); and general hospital ward (15%). The mean duration of stay in the last place of care (hospital or home) was 52 ± 90 days, and time after death was 14 ± 8 months. The mean family age was 61 ± 12 years, and females made up 69% of the total. Spouses comprised 50% and children 35% of the family members. A total of 84 family members attended to the patient 4 days or more per week. No significant differences in the characteristics available from the patient lists were observed between responders and non-responders.

Item analysis and factor validity
In the short version of CES1.0, we found a misresponse rate of 17.1%; in contrast, no misresponses were found on CES2.0. The results of the item analysis with all data, and the factor analysis with one half of the split data, are shown in Table 2. We reproduced the factor structure of CES1.0 using confirmatory factor analysis on the other half of the split data. Missing values ranged from 1.3% to 10.4%. The ICCs of all but one item were over 0.60; we retained the remaining item because it was considered indispensable and its ICC was 0.59.
We identified 10 subscales similar to CES1.0 using exploratory factor analysis. The mean ± SD total score on CES2.0 was 81.0 ± 12.4. The CES1.0 short version items are indicated with asterisks in Table 2.

Concurrent and discriminant validity
Concurrent and discriminant validity, as assessed by Pearson's correlation coefficients, are shown in Table 4. The CES2.0 total score was correlated with overall satisfaction (r = 0.83) and FAMCARE (r = 0.58), but not with expectation of care (r = 0.02). In addition, the CES2.0 total score was negatively correlated with the PHQ-9 (r = −0.22) and the BGQ (r = −0.10). Although these results were similar to those found using the short version of CES1.0, CES2.0 was more strongly correlated with overall satisfaction and FAMCARE than the short version of CES1.0 was (r = 0.71 and r = 0.53, respectively).

Correlation between CES2.0 and CES1.0
The correlations between CES2.0 and CES1.0 are also shown in Table 4. The correlation coefficient between the CES2.0 and CES1.0 total scores was 0.78, and those between the corresponding subscales ranged from 0.56 to 0.71. In addition, as a more comparable case, the correlation coefficient between the total score of the short version of CES2.0 and CES1.0 was 0.79, and those between the corresponding items ranged from 0.53 to 0.67.

Discussion
CES2.0 was shown to have good reliability and validity. The most important result was that the misresponse rate was 0% in CES2.0, in contrast to 17% in CES1.0. In addition, no decrease in validity was seen in CES2.0 compared with CES1.0, and test-retest reliability was greatly improved.
Regarding the misresponse rate, in this study, 17% of the respondents incorrectly responded to items on the short version of CES1.0. This rate was higher than that found in our previous experience (5-10%). The respondents might have been less careful in answering items on CES1.0 in this study than in previous studies because they were asked to complete two similar questionnaires (CES2.0 and CES1.0), and CES1.0 was administered after the other measures.
The test-retest reliability of CES2.0 was high (total score: ICC = 0.83). In another study involving CES1.0, the test-retest reliability was reported as 0.57 [8]. We believe that this improved reliability can be explained as follows. First, CES2.0 was easier to understand than CES1.0 and no misresponses were identified. Second, the test-retest interval of the previous study was 6 months [8], which was much longer than that used in this study. In developing CES2.0, several expressions for questionnaire items were modified to be easier to answer without altering the concept of the subscale. Therefore, respondents could respond to items more accurately.
No other psychometric properties of CES2.0 deteriorated after the modifications from CES1.0. Factor validity, internal consistency, and concurrent/discriminant validity were all good. The correlations between CES2.0 and overall care satisfaction and the FAMCARE Scale were slightly higher than those with CES1.0. This could also be the result of the elimination of misresponses from CES2.0.
In developing CES2.0, we did not adopt satisfaction as a measurement concept because it has often been criticized as being insufficiently theorized and having no widely accepted definition [20]. One criticism of satisfaction measures, raised in studies in Australia and the US, is that they are affected by expectation of care [21][22][23]. In this study, CES2.0 was not correlated with expectation of care, and was only slightly correlated with depression among the bereaved. This slight correlation is consistent with that observed with CES1.0 in this study, the original CES1.0 study, and other previous studies [8,24]. It is difficult to make a causal inference between depression and satisfaction because lower satisfaction with care might itself lead to depression in bereaved family members [25]. In this study, because only a slight correlation was observed between CES2.0 and depression or grief, these were not thought to have had a large effect on the perception of quality of care among the bereaved.
This study had several limitations. First, as described in the Analysis section, misresponses were checked manually, which is an imperfect procedure. It is possible that some misresponses may have gone undetected in CES2.0. Second, 85% of the patients died in inpatient palliative care units or home hospices. However, in Japan, the actual proportion of patients who die in these places of care is less than 20%. Therefore, our study sample may not be representative of the general population. It may be better to report descriptive statistics by place of care because the quality of care in inpatient palliative care units and home hospices is more highly rated than that in general hospital wards in Japan [11]. Because of the possibility of a ceiling effect, an additional survey using a more representative sample might be necessary to confirm the factor structure and achieve high convergent validity.
Third, the ceiling effect of responses might have resulted in an underestimation of validation data. Fourth, we compared CES2.0 to the short version of CES1.0 instead of the long version because space in the questionnaire was limited: this survey had several objectives and contained questions unrelated to the development of CES2.0. A comparison between CES2.0 and the long form of CES1.0 might provide more accurate results. Fifth, in this study, we did not conduct cognitive interviews with the participants. Performing cognitive interviews could help provide important respondent feedback on the language, comprehensibility, ambiguity, and relevance of the items. Sixth, due to ethical considerations, bereaved family members considered to have severe psychological distress as determined by the primary physician were excluded. This might have resulted in a selection bias. Finally, the possibility of additional biases, including a recall bias, a cultural bias (this survey was only conducted in a Japanese population), and a bias in that this scale was only evaluated in relation to cancer patients, cannot be excluded.

Conclusions
The modified CES2.0 demonstrated good reliability and validity. The most important result was that the misresponse rate in CES2.0 was 0%, in contrast to the 17% in CES1.0 found in this study. In addition, no other psychometric properties of CES2.0 deteriorated after the modifications from CES1.0, and test-retest reliability was greatly improved.