Our study was the first to translate the HURT Questionnaire and test it in clinical use in an Arab population. It showed that the Arabic-language version of HURT, applied to a population of Arabic-speaking headache patients in primary care, is a reliable instrument. The 4-6-week period between test and retest balanced potential recollection bias (retest being influenced by the patient’s possible recollection of his or her previous responses) against the likelihood of real change in the disease during the test-retest interval. Questions 1-4 showed moderate but significant correlations (ranging from 0.66 to 0.78). These are acceptable, and at the levels expected for this type of instrument, for questions that require recall of symptoms and medication use over the preceding 1-3 months [17, 18]. For questions 5-7, excellent correlations were noted (ranging from 0.90 to 0.93) [17, 18]. This reflects the more opinion-based nature of these questions and the fact that they relate to the present time rather than depending on recall. Internal consistency (Cronbach’s alpha = 0.74) was also acceptable.
We have also shown that HURT, in Arabic, is responsive as an outcome measure. Although the clinical change between baseline and follow-up visits was not quantified (no “gold-standard” measure exists), it was probably real for two reasons. First, most change was toward improvement, which is to be expected after 3 months of medical treatment. Second, patients in whom HURT questions 1-4 signalled improvement reported satisfaction (positive PSS scores), while those in whom HURT signalled worsening (or no improvement) reported dissatisfaction (negative PSS scores). The opposite direction of change in the responses to question 7 was unexpected, but it can perhaps be explained. This question addresses patients’ feelings about headache control in general, and may have been interpreted in different ways. Some patients may have understood it to be asking about a “cure” for their condition, rather than effective management or control. It may well be that some patients’ expectations were unduly high and consequently unmet, or, very possibly, that 3 months was not sufficient to engender a feeling of control.
Validation of an outcome measure against expressions of patients’ satisfaction is methodologically debatable. We chose this approach for two reasons. First, there is no other outcome measure validated for Saudi Arabian culture. This was decisive on its own, but, second, patients’ satisfaction is in itself an important aspect of outcome. The drawback is that patients’ satisfaction has many determinants. It would be out of place here to discuss the large literature on this (none of it related to a Saudi population). However, while change in the disease itself is of course among these determinants, so, importantly, is change in the way patients cope with and perceive their disease. The latter is highly subject to prior expectation, which may or may not be reasonable (it may be either too high or too low). Nevertheless, the clear correlation, in the expected direction, between patients’ satisfaction and change as quantified by HURT strongly suggests that HURT detected and measured real change.
Whether change was due solely to standard care, or whether improvement was enhanced by PCPs’ use of HURT, is not clear: we found only a strong trend, short of conventional statistical significance (P = 0.06), towards greater satisfaction in patients in the intervention (HURT) group compared with those in the control (standard care) group. Although the PSS was locally developed and not itself previously validated, we believe we have shown here that PSS scores were, in general, an indicator of good outcome. But, for the reasons given above, patients’ satisfaction may be neither sensitive nor specific enough to reflect any effect of an intervention of this sort. DSS scores showed no difference between groups. The DSS was also locally developed and unvalidated. Doctors’ satisfaction has different determinants: it is of course likely to be increased by improved outcomes, but it may also be decreased by use of an outcome measure that indicates outcomes could be better (as HURT is intended to do). Establishing the clinical utility of HURT as a management aid needs further study, but the lack of a gold-standard outcome measure (a gap that HURT was designed to fill) remains an impediment to such study.
The study had one other limitation. For practical reasons, we randomized physicians rather than patients. Although all physicians received similar training, outcome differences between groups could in part have reflected differences in practice. Any such influence was partially offset by switching the two control centres to intervention, applying HURT, during the last six months of the study. Although this introduced the possibility of a period effect, any such effect was unlikely to have been large or significant, and in any case it was diluted. We do not believe the minor differences between control and intervention groups in gender and level of education (Table 5) would have had a significant impact on the comparison.