Main findings
The study found excellent agreement for register variables that were clearly described in all medical records. The study has led to improvements in the initial patient registration form.
The Norwegian quality register for severe headaches (the Headache Register) was given national status in spring 2022, with the primary aim of implementing quality improvement measures to ensure equal access to high-quality diagnostics, treatment and follow-up for patients with trigeminal autonomic cephalalgias (TACs) (1). The register is based on patient consent and includes patients assessed in the specialist health service with cluster headache, hemicrania continua, paroxysmal hemicrania and short-lasting unilateral neuralgiform headache attacks with cranial autonomic symptoms (2). The overall one-year prevalence of these headache types in Norway is estimated at just under 0.02 % (3). After obtaining patient consent, an initial registration form is completed by a neurologist or other qualified healthcare personnel (1). In less than three years, over 670 such forms have been completed.
Documenting data quality is essential for ensuring that register data can be used in quality improvement efforts and research (4). Well-defined, unambiguous register variables increase the likelihood of accurate data (4, 5). The aim of this data quality study was to examine whether the register variables in the initial registration form (see appendix, in Norwegian) are interpreted consistently by different healthcare personnel (4).
Material and method
The initial registration form consists of 27 mandatory questions for all patients and at least 31 questions for patients with cluster headache. The Headache Register's clinical advisory board has defined selected variables as quality indicators because there is potential for improvement at many hospitals. The register secretariat invited all 18 neurology departments in Norwegian hospitals to participate in the data quality study.
Prior to the study, ten anonymised case reports were developed based on real patient histories. Emphasis was placed on data protection, which involved changing data on sex, age and date of examination. The case reports were structured as realistic medical records, with information presented in an unstructured format and with some data intentionally omitted.
To determine the proportion of correct or incorrect responses, an answer key was devised for each case report before data analysis, ensuring that excellent agreement was not simply a result of everyone making the same error. Data were collected in a customised database for each hospital. Before the data collection started, an information meeting was held to ensure that participating healthcare personnel received the same instructions. It was emphasised that the register secretariat should only be contacted in the event of technical issues, but participants were allowed to discuss uncertainties with colleagues at the same hospital.
The data were collected between 1 and 30 April 2025. After completing the ten initial registration forms, participants were asked to email the register secretariat with any additional comments. Inter-rater agreement was measured using several methods, in line with the guidelines for inter-rater agreement (inter-rater reliability) studies (5). Observed agreement, expressed as a percentage for each variable, was calculated by dividing the number of responses in which participants recorded the same answer by the total number of responses for that variable. A limitation of this method is that it does not account for chance agreement and does not capture inter-rater disagreement (5).
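As a minimal sketch of the calculation described above, the Python function below computes observed agreement for a single variable. It assumes that 'the same answer' means the most frequent answer for each case report; the text does not spell this out, and scoring against the answer key would be an equally plausible reading. The function name and the example responses are hypothetical.

```python
from collections import Counter


def observed_agreement(responses_per_case):
    """Percentage of responses that match the most common answer for their case.

    responses_per_case: one inner list of answers per case report.
    Assumption: "same answer" = modal answer within each case report.
    """
    matching = 0
    total = 0
    for answers in responses_per_case:
        # Number of raters who gave the most frequent answer for this case
        matching += Counter(answers).most_common(1)[0][1]
        total += len(answers)
    return 100.0 * matching / total


# Hypothetical data: 11 raters, 10 case reports, one Yes/No variable.
# In two case reports a single rater answers "No"; everyone else answers "Yes".
cases = [["Yes"] * 11 for _ in range(8)] + [["Yes"] * 10 + ["No"] for _ in range(2)]
print(round(observed_agreement(cases), 1))  # 108 of 110 responses match -> 98.2
```

Under this assumption, two deviating responses among 110 give an observed agreement of about 98 %.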
For categorical data, supplementary analyses were therefore performed using Fleiss' kappa, which adjusts for chance agreement (6). A major limitation of Fleiss' kappa arises for 'Yes/No' variables where almost all participants answered 'Yes' (7): because the agreement expected by chance is then close to 100 %, Fleiss' kappa may be zero or negative even if observed agreement is 90–100 %. Fleiss' kappa was therefore not reported for variables with highly skewed responses, where 90–100 % answered 'Yes' (7). A Fleiss' kappa of 1.00 indicates perfect agreement; 0.81–0.99 indicates almost perfect agreement; 0.61–0.80 indicates substantial agreement; and 0.41–0.60 indicates moderate agreement.
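The limitation described above can be illustrated with a short Python sketch of the standard Fleiss' kappa formula. This is not the study's analysis code; the function name, the choice to return `None` when kappa is undefined, and both example data sets are assumptions made for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who placed case i in category j.
    Every case must be rated by the same number of raters.
    Returns None when expected agreement is 1 (kappa is undefined).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    # Agreement within each case: share of rater pairs that agree
    p_cases = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_cases) / n_cases
    # Chance agreement, based on the overall share of each category
    n_categories = len(counts[0])
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)
    if p_expected == 1:
        return None
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical Yes/No variable, 11 raters, 10 case reports: [yes, no] per case.
skewed = [[11, 0]] * 9 + [[10, 1]]        # 109 of 110 responses are "Yes"
balanced = [[11, 0]] * 5 + [[0, 11]] * 5  # unanimous, but both answers occur

print(fleiss_kappa(skewed))    # slightly below zero despite near-total agreement
print(fleiss_kappa(balanced))  # 1.0
```

In the skewed example, a single deviating response out of 110 is enough to push kappa below zero, whereas the same level of unanimity with both answers represented yields a kappa of 1.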
Results
Eleven of the 18 hospital departments participated, resulting in a total of 110 completed initial registration forms. All participants were from different hospitals and had prior experience with recording data in the Headache Register. Six participants were neurologists, and the remaining five were other categories of qualified healthcare personnel. Observed agreement was ≥ 90 % for 18 of the total 28 variables (Table 1). Excellent agreement (≥ 94 %) was observed for three quality indicators defined by the clinical advisory board: diagnosis within one year of symptom onset, received non-pharmacological treatment, and minimal opioid use for cluster headache. Moderate agreement was found for use of a headache diary (kappa 0.58) and provision of written information (kappa 0.51). Agreement was lowest for the efficacy of preventive medication (kappa 0.19) and acute medication (kappa 0.40) (Table 1).
Table 1
Selection of variables from the initial registration form of the Norwegian quality register for severe headaches, by decreasing observed agreement among 11 participants, all of whom completed the form for ten anonymised case reports. Agreement adjusted for chance was calculated using Fleiss' kappa for variables where this provided meaningful additional information.
| Variable | Observed agreement (%) | Fleiss' kappa |
|---|---|---|
| Trigeminal autonomic cephalalgias diagnosis | 100 | 1.00 |
| Smoking status (four response options) | 100 | 1.00 |
| Tried acute medication (Yes/No) | 100 | - |
| Use of preventive medication (Yes/No) | 100 | - |
| Chronic cluster headache with minimal opioid use¹ (Yes/No) | 100 | - |
| Performed supplementary investigations (Yes/No) | 98 | - |
| Received surgical treatment (Yes/No) | 96 | 0.81 |
| Offered non-surgical treatment (Yes/No) | 96 | 0.75 |
| Use of benzodiazepines (Yes/No) | 96 | - |
| Received non-pharmacological treatment¹ (Yes/No) | 95 | 0.75 |
| Active migraine in the past year (Yes/No) | 95 | 0.80 |
| Brain MRI performed | 95 | - |
| Use of opioids (Yes/No) | 95 | 0.87 |
| Received greater occipital nerve block (Yes/No) | 95 | - |
| Diagnosis within one year of symptom onset¹ | 94 | 0.78 |
| Patient has multiple headache types (Yes/No) | 92 | 0.63 |
| Receiving disability benefits (Yes/No) | 91 | 0.73 |
| Medication-overuse headache (Yes/No) | 91 | 0.53 |
| Sick leave if not receiving disability benefits (Yes/No) | 89 | 0.40 |
| Days in the past month with trigeminal autonomic cephalalgias | 88 | - |
| Offered follow-up in the specialist health service (Yes/No) | 87 | 0.56 |
| Headache diary kept prior to consultation¹ | 87 | 0.58 |
| Received written information¹ (Yes/No) | 85 | 0.51 |
| Initial consultation (Yes/No) | 85 | 0.21 |
| Number of days in the past month with other type of headache | 83 | - |
| Use of oxygen during cluster headache attacks | 79 | 0.54 |
| Effect of acute medication (four response options) | 73 | 0.40 |
| Effect of preventive medication (four response options) | 64 | 0.19 |
¹ Variable defined as a quality indicator because the register's advisory board has identified potential for quality improvements.
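As a quick check of the figures in Table 1, the following Python sketch counts the variables with observed agreement of at least 90 % and attaches a verbal label to selected kappa values. The bands from 0.41 upwards are those given in the method section; the bands below 0.41 are not stated in the text and are assumed here from the commonly used Landis and Koch convention. The function name is hypothetical.

```python
# Observed agreement (%) for the 28 variables, copied from Table 1
observed = [100, 100, 100, 100, 100, 98, 96, 96, 96, 95, 95, 95, 95, 95,
            94, 92, 91, 91, 89, 88, 87, 87, 85, 85, 83, 79, 73, 64]

high = sum(1 for value in observed if value >= 90)
print(f"{high} of {len(observed)} variables have observed agreement >= 90 %")


def kappa_label(kappa):
    """Verbal label for a kappa value.

    Bands from 0.41 upwards follow the text; the lower bands are an
    assumption based on the Landis and Koch convention.
    """
    if kappa >= 1.0:
        return "perfect"
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"      # assumed band, not stated in the text
    if kappa >= 0.0:
        return "slight"    # assumed band, not stated in the text
    return "poor"          # assumed band, not stated in the text


# Selected rows from Table 1: (variable, Fleiss' kappa)
selected = [
    ("Headache diary kept prior to consultation", 0.58),
    ("Received written information", 0.51),
    ("Effect of acute medication", 0.40),
    ("Effect of preventive medication", 0.19),
]
for name, kappa in selected:
    print(f"{name}: kappa {kappa:.2f} ({kappa_label(kappa)})")
```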
Discussion
The study was designed in accordance with guidelines from Norway's national support units for medical quality registers and drew on previous experience from the support unit in the Central Norway region (4, 5). Similar studies have been conducted for the Norwegian Stroke Register and the Norwegian Myocardial Infarction Register, among others (8, 9).
This study demonstrated excellent agreement for register variables where the case reports contained complete information. This applied to, for example, smoking status, headache diagnosis and the provision of non-pharmacological treatment options. Agreement was lower where information was missing or open to interpretation. In two case reports, it was unclear whether written information had been provided and whether the patient had kept a headache diary over the preceding month, which explains the reduced agreement for these two quality indicators. The question regarding efficacy of preventive medication only applied to preventive medications in tablet form. Nevertheless, several participants responded to this question for three patients who received injection treatment with botulinum toxin. In addition, responses differed for one case report concerning a patient with paroxysmal hemicrania who had previously responded well to indometacin but had to discontinue treatment due to unacceptable adverse effects (2).
Diagnosis was challenging for two case reports in relation to distinguishing between episodic and chronic cluster headache, as well as between probable and definite diagnoses of short-lasting unilateral neuralgiform headache attacks with cranial autonomic symptoms. The primary diagnosis was used when assessing inter-rater agreement.
The study revealed few obvious data entry errors. It was clearly stated in the case reports that imaging had been performed in all ten anonymised patients. Nevertheless, supplementary investigations were recorded as not having been performed in 2 of the 110 initial registration forms. This may be due either to data entry errors or to the case reports not having been read carefully in these two instances.
Completion of all variables was mandatory, including where information was missing. This applied to, for example, the number of days with a headache in the preceding month, and several participants subsequently requested a response option of 'Unknown' for cases where this information was unavailable. In the data analyses, it was noted that patients' ages varied between participants because some chose to backdate the initial registration form to the time of the consultation described in the case report. The user guidance for the initial registration form should therefore be revised on this point so that dating is applied consistently by all users of the Headache Register.

In routine clinical practice, when registering patients in the quality register, missing information from medical records can be clarified by contacting the patient. This option was not available during the test registration, which contributed to lower agreement where relevant information was missing in the case reports. Conversely, we cannot rule out that agreement in this study was higher than it would be in routine clinical practice, because participants may have been particularly meticulous with their data entry and were allocated extra time. The response rate for the study was good, with 11 of the 18 invited departments (61 %) taking part. The study proved useful for revising the initial registration form.
The article has been peer-reviewed.
1. St. Olavs hospital HF. Årsrapport 2024 Norsk kvalitetsregister for alvorlige primære hodepiner. https://www.stolav.no/fag-og-forskning/medisinske-kvalitetsregistre/hodepineregisteret/publikasjoner/ Accessed 28.10.2025.
2. Headache Classification Committee of the International Headache Society (IHS). The International Classification of Headache Disorders, 3rd edition. Cephalalgia 2018; 38: 1–211. doi: 10.1177/0333102417738202
3. Hagen K. One-year prevalence of cluster headache, hemicrania continua, paroxysmal hemicrania and SUNCT in Norway: a population-based nationwide registry study. J Headache Pain 2024; 25: 30.
4. Nasjonalt servicemiljø for medisinske kvalitetsregistre. Dimensjoner av datakvalitet. https://www.kvalitetsregistre.no/datakvalitet/datakvalitetsdimensjoner/ Accessed 28.10.2025.
5. Servicemiljø for medisinske kvalitetsregistre – Region Midt-Norge. Datakvalitet: En oversikt over metoder og analyser til bruk i valideringsstudier. 2023. https://www.kvalitetsregistre.no/4ac06e/siteassets/dokumenter/datakvalitet/analysemetoder-til-bruk-i-valideringsstudier.clean.pdf Accessed 28.10.2025.
6. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971; 76: 378–82.
7. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990; 43: 543–9.
8. Varmdal T, Ellekjær H, Fjærtoft H et al. Inter-rater reliability of a national acute stroke register. BMC Res Notes 2015; 8: 584.
9. Govatsmark RE, Sneeggen S, Karlsaune H et al. Interrater reliability of a national acute myocardial infarction register. Clin Epidemiol 2016; 8: 305–12.