Using natural language processing to explore heterogeneity in moral terminology in palliative care consultations

Background: High-quality serious illness communication requires a good understanding of patients' values and beliefs regarding their treatment at end of life (EOL). Natural Language Processing (NLP) offers a reliable and scalable method for measuring and analyzing value- and belief-related features of conversations in the natural clinical setting. We use a validated NLP corpus and a series of statistical analyses to capture and explain conversation features that characterize the complex domain of moral values and beliefs. The objective of this study was to examine the frequency, distribution and clustering of the morality lexicon expressed by patients during palliative care (PC) consultations using the Moral Foundations NLP Dictionary.
Methods: We used text data from 231 audio-recorded and transcribed inpatient PC consultations and data from baseline and follow-up patient questionnaires at two large academic medical centers in the United States. With these data, we identified different moral expressions in patients using text mining techniques. We used latent class analysis to explore whether there were qualitatively different underlying patterns in the PC patient population. We used Poisson regressions to analyze whether individual patient characteristics, EOL preferences, religion and spiritual beliefs were associated with use of moral terminology.
Results: We found two latent classes: a class in which patients did not use many expressions of morality in their PC consultations and one in which patients did. Age, race (white), education, spiritual needs, and whether a patient was affiliated with Christianity or another religion were all associated with membership of the first class. Gender, financial security and preference for longevity-focused over comfort-focused treatment near EOL did not affect class membership.
Conclusions: This study is among the first to use text data from a real-world situation to extract information regarding individual foundations of morality. It is the first to test empirically whether individual moral expressions are associated with individual characteristics, attitudes and emotions.


Background
End-of-life (EOL) care poses many challenges for patients and palliative care (PC) physicians. Decisions regarding the care of a dying patient must be considered within the context of the psychological, physical, financial and social experiences of the patient's life [1]. Across these domains, researchers have tried to identify which factors are most important to patients, how they relate to decisions and how decisions relate to PC treatments and outcomes [2][3][4][5]. Previous studies have analyzed the relationship between patients' preferences and PC treatments [6]; between PC treatments and outcomes [2]; and between preferences and outcomes [3,7,8]. Treatments that are inconsistent with patient preferences are associated with negative outcomes, such as higher healthcare utilization costs [6], lower quality of life, and physical and psychological distress [7]. Some PC research has also focused on the relation between preferences and underlying factors such as beliefs, norms and values, sometimes leading to EOL decisions [7][8][9]. Studies found that religion is one factor in a patient's desire to request life-sustaining treatments even when a PC physician thinks such treatments are ineffective [10,11]. Indeed, some religious laws prohibit interventions that shorten life or, conversely, interventions that extend life. Spiritual beliefs are also often documented in surveys, and previous research has explored the importance of these beliefs in seriously ill, hospitalized populations [8,12,13,14].
In order to get a better understanding of some of these underlying factors, PC communication plays an important role in improving quality. The primary purpose of a PC consultation is to give patients information about their prognosis, as well as discuss goals of care, and address patients' questions or concerns. Previous research has established that feeling "Heard & Understood" is a promising quality measure for PC communication in the inpatient palliative care setting [1][2][3], and that mirroring and reflection in the communication can help patients understand their prognosis better [4][5][6].
It is established that patients who have a better understanding of their prognosis are more likely to prefer and receive treatments that are aligned with their personal values, goals and beliefs [15]. Palliative care consultations can thus be an important factor influencing palliative care outcomes; however, it remains unclear what defines a "quality" conversation or an "effective" palliative care consultation. Based on previous studies, we know that informing patients about their prognosis is important [16][17][18], but the way physicians communicate these messages may also depend on how patients receive the information and on their willingness to listen. The latter, for example, has been shown to relate to dogmatism, among other factors [19]. The content of the message and the way uncertainty is addressed [9] are important, but conversational pauses [20] and connectional silence [21] have also been shown to be effective tools in PC communication. Quality of PC consultations is primarily focused on aligning prognosis communication with patients' personal values and beliefs to optimize understanding.
In this context, the choice of language plays an important role in PC consultations. It is well established in sociolinguistics that the use of particular words in conversation is a reflection of underlying values [22][23][24]. Little is known, in the context of end-of-life care, about which factors may explain the rhetoric used by patients. For physicians to improve prognosis communication, we need to be able to differentiate between PC consultations and better understand differences in the rhetoric used. This way, physicians may be able to better align conversation language with patients' underlying values, beliefs and preferences in prognosis communication.
The objective of this study is to identify specific moral rhetoric used by patients in palliative care consultations and analyze if emotions, self-reported EOL preferences, religion and spiritual needs are associated with differences in moral expressions. We focus on analyzing moral rhetoric in PC consultations and explore the factors related to differences in moral expressions used by patients. The main research question of this study is: "Are preferences, emotions, religious affiliation and spiritual needs associated with vice or virtue and different moral expressions in EOL conversations?" This is a unique contribution to the literature as this study combines attitudinal data and moral expressions and maps how these are being used in a real-life and morally salient context.

Foundations of morality
There has been some research into the role of morality in palliative care. Some scholars have described morality as one of the three sources of transcendence in PC patients [14] and others have looked at whether these values or beliefs have an effect on PC outcomes, such as coping strategies, emotional outcomes and spiritual quality of life [25]. This potential influence of moral values makes the study of EOL decisions a unique and important contribution to the empirical investigation of human motivations and decision making, capturing the full social-functional range of morality.
There are several schools of thought in moral psychology defining "morality". The moral foundations theory (MFT), developed by Haidt and Joseph (2004), has been one of the most influential theories within moral psychology in the last decade. The MFT intends to explain the origins of and variation in human moral reasoning based on innate, modular foundations. In one of the key publications, Graham, Haidt and colleagues explain that "monists" describe morality as "one" type: this is usually identified as justice or fairness, referred to as "virtue" [26]. Over time, they suggest, evolutionary thinking has encouraged pluralist thinking about morality. They describe in detail [11] how five moral foundations can be defined, each described by its characteristic emotions and relevant virtues: Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion and Sanctity/Degradation. Each of the five foundations has a "virtue" and a "vice" realm. For example, loyalty is the "virtue" in the third foundation, whereas betrayal is the "vice". The authors describe the two values, positive and negative, as being part of one and the same foundation, which in this example can be explained by group pride and rage at traitors.
Despite the ongoing work on this MFT in moral psychology, the validity of this scale (both internal and external) across different cultures is not yet fully established. Also, it remains a challenge for the theorists to fully capture the highly variable and subjective nature of individual moral values [27]. For this reason, Graham and Haidt developed a MFT dictionary which can be used to analyze any corpus of text. They recently "called" on researchers in big data analytics to use their dictionary [28] to incorporate big data analytics into the study of morality to gain a new way to gather information in natural settings about the structure of moral visions, large-scale moral behavioral patterns, and the relation between the two.
To our knowledge, only one study used the Moral Foundations Dictionary (MFD) to analyze real conversational data. The authors of that study used short-post social media to compare the accuracy of text analysis methods for detecting moral rhetoric and longer form political speeches to explore detecting shifts in that rhetoric over time [29]. They demonstrated how capturing moral rhetoric in text over time opens up new avenues for research such as assessing when and how arguments become moralized and how moral rhetoric impacts subsequent behavior. We used the MFD for this study, to provide a framework to "test" if we can use an existing data dictionary to analyze and explore moral values expressed in conversations by grouping words according to their morality framework.

Data
For this study, we used data from the Palliative Care Communication Research Initiative (PCCRI), a multi-site cohort study of naturally occurring inpatient palliative care consultations [30,31]. The PCCRI was designed to understand the relation between clinical communication and patient-centered outcomes. The 6-month cohort data includes directly observed and audio-recorded palliative care consultations; patient/proxy and clinician self-report questionnaires both before and the day after consultation; post-consultation in-depth interviews; and medical/administrative records. The audio data for the PC consultations and follow-up interviews were converted to a transcription of text data for analysis.
The study data were collected for 231 hospitalized patients with advanced cancer who consulted with PC in two large academic medical centers in the United States. For our study we used the patient/proxy questionnaire for patients' demographic information (age, gender, race, education, financial insecurity) and self-reported preference for comfort-directed care near EOL, and attitudinal variables such as distressing uncertainty, spiritual distress, emotional distress, religious affiliation (if any), and whether patients felt their spiritual needs were being met by their religious community or the medical system. We used verbatim transcriptions of the palliative care conversations to identify moral words using the MFD data dictionary described in the previous paragraph.
The psychologists who developed the MFD did this by classifying words in one of the five moral foundations, by vice or virtue. This results in 10 potential "dimensions" of moral words in the text: each of the 5 foundations with "vice" and "virtue" categories for each foundation.

Text mining
We used 231 audio-recorded and transcribed inpatient PC consultations and data from baseline and follow-up patient questionnaires at two large academic medical centers in the United States. With these data, we identified different moral expressions using text mining techniques and natural language processing. The words that each patient or proxy said were combined into a single corpus of text. We included only text used by patients, not physicians or other members of the conversation.
The corpus was then split into a list of individual words, which were set to lowercase and stemmed. Stop words, such as "and", "the", and "of", were removed from each corpus to reduce noise in the data.
First, we added up all the morality words used by the patient in a PC consultation and counted, after preprocessing, the total number of words used by the patient as a proxy for the length of the conversation. We then disaggregated the words from the data dictionary to create the 10 different categories of moral terminology in the PC consultations. We created a patient-by-category matrix: when a word from the Moral Foundations Dictionary (MFD) occurred in a patient's text, that patient was assigned a value of "1" for that word's associated MFD category. The text mining process was performed with Python 3.7.3.
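The preprocessing and dictionary-matching steps described above can be illustrated with a minimal sketch. It is not the authors' code: the dictionary `TOY_MFD`, the stop-word list and the suffix-stripping function `crude_stem` are hypothetical stand-ins (a real analysis would use the full MFD word lists and an established stemmer), and only four of the ten categories are shown.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; the real MFD has ten categories and many more words.
TOY_MFD = {
    "care_virtue": {"care", "protect", "safe"},
    "care_vice": {"harm", "suffer", "hurt"},
    "loyalty_virtue": {"family", "loyal"},
    "loyalty_vice": {"betray", "abandon"},
}

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"and", "the", "of", "a", "to", "i", "my", "is", "in"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def mfd_features(text, dictionary=TOY_MFD):
    """Return word totals, per-category counts and 0/1 category indicators."""
    tokens = preprocess(text)
    # Stem dictionary entries with the same function so tokens and entries match.
    stem_to_category = {
        crude_stem(word): category
        for category, words in dictionary.items()
        for word in words
    }
    counts = Counter(stem_to_category[t] for t in tokens if t in stem_to_category)
    return {
        "total_words": len(tokens),           # proxy for conversation length
        "moral_words": sum(counts.values()),  # all dictionary matches
        "counts": {c: counts[c] for c in dictionary},
        "indicators": {c: int(counts[c] > 0) for c in dictionary},
    }
```

Calling `mfd_features` once per patient and stacking the `indicators` dictionaries row by row would give a patient-by-category matrix of the kind described in the text, while `total_words` serves as the conversation-length proxy.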

Statistical analysis
After merging the data from the text with the data from the PC survey, we analyzed the data in a few steps, adopting an exploratory approach to test relations between underlying factors and moral expressions in the PC consultations.
First, we used latent class analysis (LCA) to classify the patterns of MFD expressions into mutually exclusive classes. LCA is based on the idea that a discrete latent variable accounts for observed associations between a set of indicators, such that, conditional on the latent class variable, these associations become insignificant [32]. A statistical indicator is simply an observed value of a variable, so that they can be used in statistical models to allow for meaningful comparisons and to show positive or negative change. "High quality" indicators are predicted by the latent variable, in this case "moral charge", to have a probability near zero or one. Such indicators are generally necessary for model estimation and the interpretation of the latent classes.
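The local-independence idea behind LCA can be written compactly for binary indicators. The notation below is an assumption introduced for illustration and does not appear in the original: y_ij is the 0/1 indicator j for patient i, pi_c is the proportion of class c, and rho_jc is the probability that a member of class c shows indicator j.

```latex
P(\mathbf{y}_i) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_{ij}} \,(1-\rho_{jc})^{1-y_{ij}},
\qquad
P(c \mid \mathbf{y}_i) = \frac{\pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_{ij}} \,(1-\rho_{jc})^{1-y_{ij}}}{P(\mathbf{y}_i)}
```

In this form, a "high quality" indicator is one whose rho_jc is close to zero in one class and close to one in another, so that it separates the classes sharply. When covariates are included, pi_c is typically allowed to depend on them, for example through a multinomial logit; whether the study used that exact specification is not stated in the text.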
The ten indicators in our analysis were created after the text mining phase: each one indicated whether a patient used a vice-or virtue-related word in one of the five dimensions of the MFT. In addition to the indicators (which are used for the actual classification) covariates were included in the model to explain class membership: age, gender, race, education, financial security and religion. We also included self-reported variables regarding patient's spiritual needs, whether they reported emotional, spiritual or uncertainty-related distress, and preferences for comfort-directed treatment at EOL and looked at patterns of several of the attitudinal variables. Our analyses focused on preferences for comfort-directed EOL treatment; emotional, spiritual or uncertainty-related distress; and whether patients felt their spiritual needs were being met by (1) their religious community or (2) the medical system. EOL preference was defined by the answer to a survey question: "During the last few months of my life, I would prefer a plan of treatment that focused on my comfort and quality of life, even if that meant not living quite as long", which is answered by a 5-point Likert scale.
The questions related to emotional feelings, also answered by 5-point Likert scales, included: -Over the past 2 days, how much have you been bothered by emotional problems such as feeling anxious, depressed, irritable, or downhearted and blue. -Over the past 2 days, how much have you been bothered by uncertainty about what to expect from the course of your illness? -Over the past 2 days, how much have you felt at peace?
Questions related to spiritual needs included: "How much are your spiritual needs being supported by a religious community (like clergy or members of a congregation)?", and: "How much are your spiritual needs being supported by the medical system (doctors, nurses and chaplains)?", where both were answered on the scale "completely", "quite a bit", "moderately", "slightly", "not at all".
Second, to explore which factors were associated with patients' use of vice- or virtue-related words, and their use of words belonging to the 5 different foundations of morality, we used Poisson regressions. Age was a continuous variable, race was represented by a binary variable for "white", education was categorical, and "financial security" was represented by a categorical variable based on the question: "When you think about the amount of income that you have available in a typical month, how often is it enough for things you really need like food, clothing, medicine, repairs to the home, and transportation?", answered by "all the time", "most of the time", "some of the time". We included a binary variable for "Christian" religion and one for "other religion", which included Judaism, Islam, Hinduism, Buddhism, and "other" from the survey data. We also controlled for the total number of words used by the patient in the consultations, as a proxy for the length of the conversation.
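One common way to write a Poisson regression that adjusts for conversation length is sketched below. The notation is an assumption made for illustration: Y_i is the number of moral words used by patient i, x_i the covariate vector, and N_i the total number of words. The text does not state whether word count entered as an offset (as here) or as an ordinary covariate.

```latex
\log \mathbb{E}[Y_i \mid \mathbf{x}_i] = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \log N_i,
\qquad
\frac{\partial\, \mathbb{E}[Y_i \mid \mathbf{x}_i]}{\partial x_{ik}} = \beta_k \, \mathbb{E}[Y_i \mid \mathbf{x}_i]
```

With the offset, the model describes the rate of moral words per word spoken. The second expression is the marginal effect of a continuous covariate x_ik, that is, the expected change in the number of words for a one-unit change in that covariate.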

Descriptive statistics
The summary statistics are reported in Table 1. Three quarters of patients were above the age of 55 and half of the patients were female; 79% were White while the remaining 21% were either Black or Latino. About one third of the PC patients had a college degree or higher, while half of the sample had finished high school or had some years of college. One third of patients felt financially insecure, described as not having enough income in a typical month to pay for clothing, food or transportation. 67% of patients maintained a connection with Christianity, while 24% did not have a religion. Fewer patients connected with Judaism (2.1%), Islam (0.8%), Hinduism (1.3%) and Buddhism (0.4%); in total, 9% of patients had some "other" religion than Christianity. About 69% strongly agreed or agreed that the EOL treatment plan should focus on comfort and quality of life even at the expense of longevity, 22% were unsure and about 10% either disagreed or strongly disagreed. A little more than a third felt that their spiritual needs were supported by their religious community, while one third believed that those needs were being met by the medical system. Almost half of patients had felt bothered by emotional problems such as feeling anxious, depressed, irritable, or downhearted and blue in the past 2 days. Also, about half of patients felt uncertain about the course of their illness and did not feel "at peace".
We continued our descriptive analysis looking at the use of MFD words. We found that about half of the patients did not use any of the MFD words at all. For those who did use MFD words, we looked at the number of words they used per category of the Moral Foundations Theory. Table 2 provides an overview of the MFD words used in each of the 10 dimensions of morality: five moral foundations times two subcategories (vice and virtue) per foundation. It also reports the total number of morality words used by patients, the total number of words used in the consultation and their relative frequency.

Latent class models
After identifying how many moral words were being used in the different dimensions of morality and their vice-virtue subcategories, we explored the results of the latent class analysis (LCA). The dependent variable was "moral charge" defined by the number of moral words used in the PC consultation; the indicators were the 10 morality categories. First, we determined the number of latent classes. Table 3 reports the fit of the models using a different number of classes. We considered theoretical interpretability and compared the statistical tests of model fit using models for one to five possible latent classes. Table 3 illustrates that the likelihood decreased slightly when moving from two classes to three classes while the Bayesian Information Criterion (BIC) of the 2-class model was lowest, suggesting the 2-class model provides the optimal balance between model fit and model complexity. Table 3 also illustrates the profiles of the two latent classes, including the class sizes, the indicators and the covariates mentioned in the previous section. The Wald test statistics indicate that 9 of the 10 indicators are highly significant and thus classify the two groups, except the indicator "Fairness-Vice" (Wald 0.2484, p = 0.62). Overall, the two classes can be interpreted as one in which individuals use many morality words (31.7% of the sample of patients) and one where moral terms occur infrequently (68.3%). Individuals in the first class use some words in the Harm-virtue, Harm-vice and Ingroup-virtue dimensions, but not many in the other dimensions of morality.
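The BIC-based comparison of models with different numbers of classes can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, and the parameter count covers only class proportions and item-response probabilities for binary indicators, ignoring any covariate parameters.

```python
import math


def lca_n_params(n_classes, n_indicators):
    """Free parameters of a basic LCA with binary indicators.

    (C - 1) class proportions plus C * J item-response probabilities.
    Assumption: covariate effects on class membership are not counted.
    """
    return (n_classes - 1) + n_classes * n_indicators


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(log_likelihoods, n_indicators, n_obs):
    """Pick the class count with the lowest BIC.

    log_likelihoods maps a number of classes to the fitted log-likelihood.
    """
    scores = {
        c: bic(ll, lca_n_params(c, n_indicators), n_obs)
        for c, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Because the penalty term grows with the number of parameters, a model with more classes is preferred only if its log-likelihood improves by enough to offset the added complexity, which is the trade-off described in the paragraph above.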
Except for gender, financial security and preferences for comfort-directed treatment near EOL, all exogenous variables (age, white race, education level, Christian, other religion, emotional, spiritual, and uncertainty-related distress and spiritual needs) are associated with class membership. Being female, feeling financially secure and preferring comfort-directed treatment near EOL are independent of class membership. There are slightly more males (52% vs 48%) in the class using fewer moral words, and more younger patients and more Whites (80% vs 75%). Overall, among patients in class 1, there were fewer Christians (68% vs 72%) and fewer patients with another religion (6% vs 14%).

Poisson models
Following the LCA, we looked at which variables were related to the number of morality words used. First, we explored the variables associated with vice and with virtue, to see if any of the individual characteristics, religious affiliation or attitudes were related to the use by patients of virtue-versus vice-words. The data followed a Poisson distribution: the use of virtue and vice words could be treated as rare events, since many patients did not use MFD words at all. As the Poisson distribution assumes that the mean and variance are the same, we tested the fit of a Poisson model versus Negative Binomial models. The likelihood ratio test is a test of the over dispersion parameter alpha: when alpha is zero, the more flexible negative binomial distribution is equivalent to a Poisson distribution. In our case, alpha was not significantly different from zero, suggesting the Poisson distribution was appropriate, both for virtue and for vice, so we used Poisson regressions to estimate the amount of moral rhetoric in PC consultations. We also used a Vuong test of the zero-inflated model versus the standard Poisson model and found that the excess zeros should not be modeled independently. We used robust standard errors for the Poisson models [33]. In all Poisson models, we controlled for the length of the conversation by normalizing based on the total number of words used by the patient in the consultation. Table 4 reports the results of the virtue and vice models. We found that being White and Christian was, on average, associated with using fewer words in the virtue categories (− 0.46 (p = 0.09); − 0.42 (p = 0.05)). Patients who had been increasingly bothered by emotional problems such as feeling anxious, depressed, irritable, or downhearted and blue, felt more uncertain about their prognosis or felt less "at peace", used more moral terms in the "vice" category of the MFD (p < 0.01).
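The comparison between the Poisson and negative binomial models described above can be sketched with a likelihood ratio test on the overdispersion parameter. This is a minimal illustration, not the authors' code: the function names are hypothetical, the log-likelihoods are assumed to come from models fitted elsewhere, and halving the p-value is the conventional correction for testing alpha = 0 on the boundary of its parameter space, which the text does not confirm was applied.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def overdispersion_lr_test(ll_poisson, ll_negbin):
    """Likelihood ratio test of alpha = 0 (Poisson) against negative binomial.

    Returns the test statistic and a boundary-corrected p-value.
    """
    stat = max(0.0, 2.0 * (ll_negbin - ll_poisson))
    if stat == 0.0:
        return stat, 1.0
    # Assumption: halve the chi-square(1) tail because alpha cannot be negative.
    return stat, 0.5 * chi2_sf_1df(stat)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values near 1 suit a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("dispersion ratio is undefined when all counts are zero")
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean
```

A large p-value from `overdispersion_lr_test` means the extra dispersion parameter does not improve fit meaningfully, which corresponds to the situation reported above where alpha was not significantly different from zero. `dispersion_ratio` is a quick descriptive check that ignores covariates.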
After establishing that emotional distress, white race and Christianity were associated with the use of virtue and vice words by patients, we were interested in which variables were related to using words in each of the five distinct morality foundations described by the MFT (merging the vice and virtue sub-categories per dimension). Table 5 reports the results of Poisson models estimating the use of words in these dimensions. We found that being white was also associated with the use of fewer words in the "Care/Harm" foundation (−0.25, p = 0.07); and being Christian was related to using fewer "Loyalty/Betrayal" words (−0.29, p < 0.01). Feeling more emotional, spiritual or uncertainty-related distress was associated with more words in "Care/Harm" and "Sanctity/Degradation". In addition, we found that patients who were more highly educated used, on average, slightly more words in the "Fairness/Cheating" foundation (0.04, p = 0.08) and "Authority/Subversion" (0.08, p = 0.02), and fewer words in "Loyalty/Betrayal" (−0.05, p = 0.05). Interestingly, the more patients felt that their spiritual needs were being supported by their religious community or the medical system, the more words they used in the "Fairness/Cheating" foundation (0.03, p = 0.07), but the fewer words in "Loyalty/Betrayal" (−0.03, p = 0.10).

Discussion
This study used data from transcribed palliative care consultations to identify moral expressions used by hospitalized patients with advanced cancer and to analyze whether individual characteristics, religion, self-reported EOL preferences, spiritual needs and emotional distress were associated with (differences in) the moral lexicon as determined by the Moral Foundations Dictionary (MFD). We found in our LCA that about two thirds of patients used few or no morality words, while about a third used a lot of moral rhetoric. Employing the MFD, which distinguishes five moral foundations and vice and virtue subcategories within each dimension, we found that being White and being Christian were both associated with the use of fewer words in the "virtue" category and that more emotional distress was associated with the use of more "vice" words. These factors were also related to the use of words in the Care/Harm, Loyalty/Betrayal, and Sanctity/Degradation dimensions of the MFT. We also found that education level was related to the use of words in Fairness/Cheating, Loyalty/Betrayal and Authority/Subversion. The extent to which patients stated that their spiritual needs were being supported by their religious community or the medical system was also associated with the moral rhetoric used in the Fairness and Loyalty foundations.
There are a number of limitations in this study. First, our analysis assumes that the MFD is a valid tool to identify morality, and in particular the five different foundations identified in the MFT. However, the data dictionary is relatively novel and has rarely been tested empirically, other than in the study mentioned above. Second, our study results may not be generalizable to other populations of patients. There are further limitations associated with the bag-of-words approach that we used in the text mining phase of the study. A disadvantage is that it limits the context of the conversation and loses the order of specific information. Bag-of-words requires supervised machine learning, which entails modeling linguistic knowledge through the use of dictionaries containing words that are tagged with their semantic orientation [33]. We used the existing MFD data dictionary, which was created by others, and we accept its classification of English words to identify morals as a given.
Table note: Results report marginal effects representing the change in the number of words used. Robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
The most important piece of our study was to adopt a plurality perspective to morality, and therefore we wanted to distinguish between different types of morality words used by different patients. In order to further explore the differentiation of moral terminology and evaluate which factors are related to specific moral terms, we would need more data as for some groups we did not have enough words in some MFT foundations. Based on our analysis, for example, we found that having any religion mattered but we could not differentiate enough between morality among several religions because of sample size issues.

Conclusions
This study is among the first to use text data from a real-world situation to extract information regarding individual foundations of morality. It is the first to test empirically if individual moral expressions are associated with individual characteristics, attitudes and emotions. The results of this study are relevant to those who seek to improve the quality of communication in order to achieve better and more values-concordant treatment at EOL.
Some of our findings may be relevant for a broader context. We found that those who feel that their spiritual needs are being met tend to use more moral language than those who do not. This study supports the further development of conversation science, which can be used by physicians to align moral and other sensitive aspects of PC consultations [34]. For example, it may be helpful to differentiate prognosis communication with respect to patients' a priori moral or spiritual values, which may influence their EOL preferences. More research is needed to establish the exact relationship between (any) religious affiliation and spirituality and the moral dimensions of conversations, in palliative care and in a broader societal context.