PLoS Currents: Huntington Disease, 2157-3999. Public Library of Science, San Francisco, USA. 10.1371/currents.hd.f19ef63fff962f5cd9c0e88f4844f43b

Test-Retest Reliability of Diffusion Tensor Imaging in Huntington’s Disease

James H. Cole (Huntington's Disease Research Group, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK; Computational, Cognitive & Clinical Neuroimaging Laboratory, Department of Medicine, Imperial College London, UK)
Ruth E. Farmer (Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK)
Elin M. Rees (Huntington's Disease Research Group, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK)
Hans J. Johnson (Department of Psychiatry, University of Iowa, Iowa City, Iowa, USA)
Chris Frost (Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK)
Rachael I. Scahill (Huntington's Disease Research Group, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK)
Nicola Z. Hobbs (Huntington's Disease Research Group, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK)

21 March 2014

Diffusion tensor imaging (DTI) has shown microstructural abnormalities in patients with Huntington’s Disease (HD) and work is underway to characterise how these abnormalities change with disease progression. Using methods that will be applied in longitudinal research, we sought to establish the reliability of DTI in early HD patients and controls. Test-retest reliability, quantified using the intraclass correlation coefficient (ICC), was assessed using region-of-interest (ROI)-based white matter atlas and voxelwise approaches on repeat scan data from 22 participants (10 early HD, 12 controls). T1 data was used to generate further ROIs for analysis in a reduced sample of 18 participants.
The results suggest that fractional anisotropy (FA) and other diffusivity metrics are generally highly reliable, with ICCs indicating considerably lower within-subject compared to between-subject variability in both HD patients and controls. Where ICC was low, particularly for the diffusivity measures in the caudate and putamen, this was partly influenced by outliers. The analysis suggests that the specific DTI methods used here are appropriate for cross-sectional research in HD, and give confidence that they can also be applied longitudinally, although this requires further investigation. An important caveat for DTI studies is that test-retest reliability may not be evenly distributed throughout the brain whereby highly anisotropic white matter regions tended to show lower relative within-subject variability than other white or grey matter regions.

This work has been supported by the European Union – PADDINGTON project, contract no. HEALTH-F2-2010-261358. RS is supported by the CHDI/High Q Foundation, a not for profit organization dedicated to finding treatments for Huntington’s disease. This work was undertaken at UCLH/UCL, which received a proportion of funding from the Department of Health’s NIHR Biomedical Research Centres funding scheme.

Introduction

Magnetic resonance imaging (MRI) is a valuable tool for investigating the progressive changes caused by neurodegenerative diseases, such as Huntington’s Disease (HD). Diffusion weighted imaging (DWI), a popular variant of MRI, and its most commonly used analytic model, diffusion tensor imaging (DTI), can offer unique insights into the microstructural properties of both white and grey matter. Accordingly, DTI has been widely used to demonstrate regional neuroanatomical abnormalities in symptomatic HD patients 1,2,3,4,5, as well as in premanifest HD-gene carriers 6,7,8, indicating that the technique is sensitive to some of the earliest neuropathological changes in HD.

One major strength of MRI is the ability to collect neuroanatomical data from multiple timepoints on a given sample in order to carry out longitudinal research studies, which is particularly pertinent in progressive diseases such as HD. Already, volumetric MRI has quantified on-going caudate nucleus atrophy across multiple disease stages 9,10, and offers utility as a potential biomarker of disease progression. Longitudinal DWI has been performed in HD 11,12,13,14, but these preliminary studies have been limited in scope and provided inconclusive results (see 15 for review).

Compared to T1- and T2-weighted imaging, DWI is prone to higher levels of image ghosting, susceptibility artefacts, eddy currents and geometric distortions, due to its reliance on single-shot echo-planar imaging 16. Crucially for movement disorders like HD, DWI is particularly sensitive to subtle bulk motion effects 17, meaning motion-induced signal loss is greater when compared with volumetric methods. These factors combine to increase the signal variability in DWI, so it is important to demonstrate adequate test-retest (i.e. scan-rescan) reliability if DTI is to have utility as a tool for accurately quantifying progressive microstructural changes. If within-subject variability (i.e. scan-rescan variability), influenced by the random occurrence of signal loss and distortion artefacts at different timepoints, is large relative to the between-subject variability at any one timepoint, then sensitivity to meaningful within-subject signal change will be compromised. Metrics such as the intraclass correlation coefficient (ICC) can be used to quantify variability in order to provide an indication of the reliability of a measurement technique.

Acceptable reliability of DTI metrics per se has been demonstrated previously in various settings 18,19,20,21,22,23,24, including in HD 43; however, what is apparent from these studies is that estimates of reliability vary considerably. Factors such as the specific image acquisition parameters, scanner characteristics (e.g. field strength, manufacturer), data processing methods or brain region under investigation 22,24,25 all lead to differing estimates of reliability. Hence the reliability of the specific techniques adopted in a longitudinal DTI study should be established, particularly if those techniques are ever to meet the exacting standards required by clinical trials, the long-term goal of MRI biomarker development. Furthermore, the presence of minor HD-related chorea in early HD patients may mean that such participants are more prone to inducing motion artefacts, reducing the signal-to-noise ratio in their acquired datasets and thus potentially leading to a problematic bias when comparing longitudinal changes with control groups.

The aim of the current study is to quantify the reliability of DTI measures in a sample of early HD patients and controls, specifically using those methods that have demonstrated cross-sectional sensitivity to HD 4 and will be used in on-going longitudinal studies. In addition, reliability will be compared between early HD patients and controls to investigate any potential disease-related group biases that may influence test-retest reliability.

Methods

Participants

Ten early HD patients and 12 healthy control participants (see Table 1) were scanned at 3T (Siemens) at the Institute of Neurology, University College London. Early HD subjects were required to be within stage I of the disease 26, defined by a Unified Huntington’s Disease Rating Scale (UHDRS) Total Functional Capacity (TFC) ≥ 11, indicating good functional capacity. Control participants were spouses, partners or gene-negative siblings of the early HD subjects. Inclusion criteria included participants being over 18 years of age, free from major psychiatric and concomitant neurological disorders, not currently participating in a clinical trial and no contraindications to MRI. The local ethics committee approved the study and written informed consent was obtained from each participant. Participants were drawn from the larger PADDINGTON study (Pharmacodynamic Approaches to Demonstration of Disease-modification in Huntington's disease by SEN0014196), designed to assess potential biomarkers of HD 4. This specific subset was solely drawn from the London site where we had access to the scanner and participants were included where additional free time on the scanner was available to allow the repeated diffusion scan.

Baseline characteristics                Controls                      Early HD patients
N                                       12                            10
Age (Years)                             45.88 (15.69) 23.44-67.87     50.97 (7.68) 42.19-66.46
Sex (male/female)                       4/8                           3/7
CAG repeat length                       -                             43.3 (2.4) 39-46
Disease Burden [age x (CAG - 35.5)]     -                             388.8 (114.56) 232.6-563.4
Total Functional Capacity               13.0 (0.0) 13-13              11.5 (1.27) 9-13
Total Motor Score                       0.125 (0.35) 0-1              22.8 (10.79) 7-45

Values displayed as mean (standard deviation) followed by range, for continuous variables. Discrete variables show counts of numbers. Disease burden calculated according to the formula by Penney et al., (42). Total functional capacity and total motor score are taken from the Unified Huntington’s Disease Rating Scale (UHDRS).
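As a brief illustration of the disease burden formula quoted in the table, age x (CAG - 35.5), the following sketch computes the score for a single participant. The function name and the example values are hypothetical and are not taken from the study data.

```python
def disease_burden(age_years, cag_repeats):
    """Disease burden score: age x (CAG repeat length - 35.5)."""
    return age_years * (cag_repeats - 35.5)


# Hypothetical participant: 50 years old with a CAG repeat length of 43.
example_score = disease_burden(50.0, 43)  # 50 x 7.5 = 375.0
```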

MRI acquisition

Two diffusion-weighted MRI scans were acquired at 3T for each participant using an EPI sequence with the following parameters: TR = 7600 ms, TE = 84 ms, 65 axial slices of 2 mm thickness with no inter-slice gaps, acquisition matrix = 96 x 128, in-plane resolution of 2 mm², resulting in isotropic voxels. Diffusion data were acquired in 42 different encoding directions with b = 1000 s/mm², along with 7 b = 0 images. The same protocol was repeated immediately in order to acquire back-to-back datasets for test-retest reliability analysis (i.e. the participant was not removed from the scanner between acquisitions). The scanning session also included a high-resolution T1-weighted MP-RAGE scan for region-of-interest (ROI) segmentation with the following parameters: TR = 2200 ms, TE = 2.2 ms, flip angle = 10°, FOV = 28 cm, matrix size = 256 x 256, in-plane resolution = 1 mm², slice thickness = 1.0 mm with no inter-slice gap. Visual quality controls assessed the following: compliance with relevant acquisition protocols, minimal artefacts (e.g. movement, intensity) and head positioning.

Image analysis

Pre-processing

For each participant the two DWI scans were randomly assigned into one of two independent pre-processing and statistical analysis streams. Procedures were carried out for each stream separately. This was done to ensure no effects of acquisition order influenced the results. Firstly, diffusion-weighted images were registered to the mean of the seven b0 images to correct for motion and eddy current distortions, and the gradient direction scheme was updated accordingly. Subsequently, a non-linear least-squares method was used to fit the tensor at each voxel. Scalar maps of diffusion metrics such as fractional anisotropy (FA), mean (MD), axial (AD) and radial diffusivity (RD) were then derived from these tensor images. In order to carry out a comprehensive assessment of test-retest reliability, three different approaches were used: a T1 ROI analysis (as per 4) where the analysis was conducted in native diffusion space, an atlas-based automated white matter ROI analysis and a voxelwise analysis.
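The four scalar metrics can be written in terms of the three tensor eigenvalues. The sketch below uses the standard textbook definitions (MD as the mean eigenvalue, AD as the largest eigenvalue, RD as the mean of the two smaller eigenvalues, and FA as the normalised eigenvalue dispersion). It is an illustration in NumPy, not the software used in this study, and the function name is an assumption.

```python
import numpy as np


def dti_scalars(evals):
    """Return FA, MD, AD and RD from tensor eigenvalues.

    evals: array whose last axis holds the three eigenvalues of each tensor.
    """
    # Sort eigenvalues in descending order so that l1 is the primary eigenvalue.
    evals = np.sort(np.asarray(evals, dtype=float), axis=-1)[..., ::-1]
    l1, l2, l3 = evals[..., 0], evals[..., 1], evals[..., 2]

    md = (l1 + l2 + l3) / 3.0   # mean diffusivity
    ad = l1                     # axial diffusivity (primary eigenvalue only)
    rd = (l2 + l3) / 2.0        # radial diffusivity

    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    with np.errstate(invalid="ignore", divide="ignore"):
        fa = np.sqrt(1.5 * num / den)
    # Define FA as 0 where all eigenvalues are zero (e.g. background voxels).
    fa = np.where(den > 0, fa, 0.0)
    return fa, md, ad, rd
```

A tensor with equal eigenvalues gives FA = 0 (a sphere), and a tensor with a single non-zero eigenvalue gives FA = 1, matching the interpretation of FA given in the figure captions.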

T1 ROI analysis

Four ROIs were defined on the T1 images. For the caudate, corpus callosum and cerebral white matter regions manual delineation was carried out using the MIDAS software package 27. For the putamen the automated BRAINS3 program was used 28. The resulting ROIs were transformed into native diffusion space by first registering the T1 image to the participant’s FA image, using an initial affine registration, followed by a non-linear registration, necessary to account for the non-linear distortions found in DWI. This was achieved using Nifty-Reg (https://sourceforge.net/projects/niftyreg) for both the affine 29 and non-linear 30 stages. The transformation from T1 to native FA space was then applied to the binary ROI labels using a nearest neighbour interpolation scheme. Registration accuracy for all data was assessed visually to ensure accurate placement of ROIs in diffusion space. The mean FA, MD, AD and RD values across the corpus callosum, cerebral white matter and bilateral caudate and putamen ROIs were then calculated using FSL (https://fsl.fmrib.ox.ac.uk). Four control participants did not have T1-weighted scans and were excluded from this element of the analysis, leaving 10 early HD patients and 8 controls with data for the T1 ROIs.

Automated atlas-based ROI analysis

Tensor images were converted into DTI-TK format (https://dti-tk.sourceforge.net) to run tensor-based registration, a method shown to improve registration accuracy compared with using FA images 31. Using the standard DTI-TK pipeline, a ‘bootstrap’ template was defined by an affine registration step to put all subjects into approximately the same space. Each native-space tensor image was non-linearly aligned to this template, using an iterative approach to refine the accuracy of the registration until the difference between successive iterations became minimal, based on the Euclidean distance of the tensors 32. Affine and non-linear transformation parameters were combined to allow the native space tensor images to be warped to common space in a single interpolation step. Once all the images were in a common space, the mean FA map was generated to act as a study-specific template. As mentioned above, this was done independently for both processing streams; hence two separate templates were produced.

With all the tensor images aligned to the group template, FA, MD, AD and RD maps were generated for each participant. The next stage was to take the white matter ROIs defined by the ICBM-DTI-81 white matter tract atlas 33, which is supplied with FSL and contains labels for 48 white matter regions. The 2 mm³ ICBM-DTI-81 atlas image was registered to the group FA template using Nifty-Reg to run an initial affine step followed by a non-linear refinement stage. The resultant transformation was then used to warp the white matter label files to group FA template space, which were finally thresholded at template FA > 0.2 to reduce partial volume effects. Mean FA, MD, AD and RD were then computed across each label region using FSL. This was repeated for the second processing stream and the resultant values were used in the subsequent reliability analyses.
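The thresholding and averaging step can be sketched as follows. The study used FSL for this computation; the NumPy version below is only an illustration of the logic, and the function and argument names are assumptions. It expects the metric map, the warped atlas labels and the template FA map as arrays of identical shape in template space.

```python
import numpy as np


def roi_means(metric_map, label_map, template_fa, fa_threshold=0.2):
    """Mean of a diffusion metric within each atlas label.

    Only voxels with template FA above fa_threshold contribute, which
    mirrors the FA > 0.2 restriction described in the text.
    Label 0 is treated as background and skipped.
    """
    keep = template_fa > fa_threshold
    means = {}
    for label in np.unique(label_map):
        if label == 0:
            continue
        mask = (label_map == label) & keep
        # A label with no surviving voxels is reported as NaN rather than dropped.
        means[int(label)] = float(metric_map[mask].mean()) if mask.any() else float("nan")
    return means
```

Returning NaN for an empty label keeps small regions visible in the output, which is useful given that small inferior ROIs are the ones most affected by thresholding.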

Statistical analysis

As a simple assessment of agreement, Bland-Altman plots 34 were used to examine variability of DTI metrics within each T1 ROI, with HD patient and control data combined.
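The quantities behind a Bland-Altman plot are the per-participant mean and difference of the two scans, together with the mean difference (bias) and limits of agreement. The sketch below assumes the conventional limits of bias ± 1.96 standard deviations of the differences; the study used the plots for visual assessment and does not state which limits were drawn. Function and variable names are hypothetical.

```python
import numpy as np


def bland_altman(scan1, scan2):
    """Return per-subject means, differences, bias and limits of agreement."""
    scan1 = np.asarray(scan1, dtype=float)
    scan2 = np.asarray(scan2, dtype=float)
    mean_vals = (scan1 + scan2) / 2.0   # x-axis of the plot
    diff = scan1 - scan2                # y-axis of the plot
    bias = diff.mean()                  # systematic difference between scans
    sd = diff.std(ddof=1)               # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean_vals, diff, bias, limits
```

Plotting `diff` against `mean_vals` shows whether the scan-rescan difference depends on the magnitude of the measurement (proportional bias), while a non-zero `bias` indicates a systematic shift between the two scans.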

A common measure of reliability, the intraclass correlation coefficient (ICC) was used to assess the reliability of the scanning procedure in greater detail, and was calculated for controls and HD patients separately, for each region. Confidence intervals (CIs) at 95% were obtained for ICC values using the delta method. The within- and between-subject variances were also calculated for each measure. ICCs were unadjusted for age and sex in order to avoid making the methods incomparable with previous studies of DTI reliability 22,24.
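To make the relationship between the ICC and the two variance components concrete, the sketch below estimates them from a one-way random-effects ANOVA. This particular ICC form is an assumption, since the text does not state which variant was used, and the delta-method confidence intervals are not reproduced here. The function name is hypothetical.

```python
import numpy as np


def icc_oneway(data):
    """One-way random-effects ICC with its variance components.

    data: array of shape (n_subjects, k_repeats), e.g. k = 2 for scan-rescan.
    Returns (icc, between_subject_variance, within_subject_variance).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()

    # Mean squares between and within subjects.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))

    between_var = (msb - msw) / k
    within_var = msw
    # Equivalent to between_var / (between_var + within_var).
    icc = (msb - msw) / (msb + (k - 1) * msw)
    return icc, between_var, within_var
```

Because the ICC is a ratio, a region with very small between-subject variance can yield a low ICC even when the within-subject variance is itself small, which is why both components are worth inspecting alongside the coefficient.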

Voxelwise analysis

A similar procedure to the atlas-based analysis was used, with the addition of a within-subject registration step, in order to increase the likelihood of voxelwise correspondence across the brain. Again using DTI-TK to register the tensor images, scans from both processing streams for each participant were co-registered using the initial affine and iterative non-linear steps as detailed above. Once co-registered to an unbiased ‘mid-space’, the subject means were calculated and then fed into the registration pipeline to define a group template, this time combining scans for both processing streams. The transformations from the separate registration steps were then combined to allow registration from native space to this combined group space in one interpolation step. The ICC could then be calculated for both FA and MD at each voxel using the fslmaths utility in FSL. This registration procedure and subsequent statistical analysis were first completed using the 10 early HD patients and then using data from the 12 control participants.
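The voxelwise calculation amounts to applying the same variance arithmetic independently at every voxel of the co-registered images. The study performed this with fslmaths; the sketch below swaps in NumPy and assumes a one-way random-effects ICC for two scans per participant, so it should be read as an illustration rather than the authors' exact computation. Names are hypothetical.

```python
import numpy as np


def voxelwise_icc(scan1, scan2):
    """Voxelwise scan-rescan ICC map.

    scan1, scan2: arrays of shape (n_subjects, X, Y, Z) holding one metric
    (e.g. FA) for the first and second scan, already in the common group space.
    """
    scan1 = np.asarray(scan1, dtype=float)
    scan2 = np.asarray(scan2, dtype=float)
    n = scan1.shape[0]

    subject_mean = (scan1 + scan2) / 2.0
    grand_mean = subject_mean.mean(axis=0)

    # Between- and within-subject mean squares for k = 2 repeats.
    msb = 2.0 * ((subject_mean - grand_mean) ** 2).sum(axis=0) / (n - 1)
    msw = ((scan1 - scan2) ** 2).sum(axis=0) / (2.0 * n)

    denom = msb + msw
    icc = np.zeros_like(denom)
    # Voxels with no variance at all (e.g. background) are left at 0.
    np.divide(msb - msw, denom, out=icc, where=denom > 0)
    return icc
```

Running the function separately on the patient and control stacks would give one map per group, in line with the separate group analyses described above.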

Results

Regional reliability analysis

Test-retest reliability of T1-based and atlas ROIs showed generally high ICCs, indicating good levels of reliability (Tables 2-5). This was the case for all diffusion metrics (i.e. FA, MD, RD and AD). However, some variability in reliability was present across different brain regions.

For FA in the controls, 38 (79%) atlas ROIs had an ICC of 0.8 or above, with the majority of these being > 0.9. Exceptions to this included the corticospinal tract and cerebellar ROIs. This pattern was reflected in the early HD patients, with 41 (85%) ROIs having ICCs > 0.8. Generally, between-subject variance was larger in the HD group than controls, as would be expected. It is also worth noting that the within-subject variances are relatively similar for controls and HD in many regions. This implies that, in general, differences in ICC observed between controls and HD are driven by the between-subject variation rather than by the scanning technique being less reliable in one group than the other.

For MD, 37 (77%) atlas ROIs had ICCs above 0.8 in the controls; the corresponding number was 44 (92%) in the early HD patients. As with FA, any between-group differences in ICC were accompanied by wide 95% CIs around each group estimate, demonstrating the imprecision in the estimates. Results from AD were similar, where 33 (69%) atlas ROIs had ICCs > 0.8, and RD with 37 (77%) ROIs having ICCs > 0.8. In controls mean ICCs were very similar across metrics (FA mean ICC = 0.851; MD = 0.854; AD = 0.811; RD = 0.857). However, a few ROIs did show discrepancies between metrics, such as the right anterior limb of internal capsule and right external capsule, which exhibited substantially lower ICCs in AD compared to the other metrics. This is driven by the much smaller between-subject variation for AD than for other metrics in the right anterior limb of internal capsule, and by an increased within-subject variability for the external capsule. Examination of scatter plots of the data identified two outliers that may have contributed to this.

For the T1 ROIs, FA reliability was also high, with the caudate, putamen and whole-brain white matter regions having ICCs > 0.8 in controls and early HD patients. The corpus callosum region had a lower ICC in the controls (ICC = 0.697) though the ICC was high in the early HD patients (ICC = 0.877). For MD, there was considerable regional variability in reliability, with very high ICCs for the putamen and white matter (ICC > 0.95), an intermediate value for the corpus callosum (ICC = 0.76) and a low value for the caudate (ICC = 0.48). However, the ICC in the early HD patients was reasonably high for caudate MD (ICC = 0.84). For AD, both the caudate and the putamen had low reliability in the controls (ICCs < 0.4), though for RD the putamen values were very high (ICC = 0.99). The corpus callosum ROI showed the converse pattern, with a very high ICC for AD (ICC = 0.91) and a low one for RD (ICC = 0.61). The white matter ROI showed very high reliability (ICCs > 0.9) across the board.

To put these divergent results from the T1 ROIs into context, the Bland-Altman (BA) plots were considered for FA (Figure 1) and MD (Figure 2). These generally suggested good agreement, with the differences tending to lie within a small range. The exceptions to this were MD in the caudate, and FA and MD in the corpus callosum. In these cases, there was a suggestion of deviation from exact reproducibility in the combined HD patient and control group. However, the BA plots did not suggest that the difference between scans was dependent on the magnitude of the measurement in question, and instead indicated the possibility of outliers within the data.

Bland-Altman plots of fractional anisotropy (FA) values to visually assess agreement, systematic bias and proportional bias in scanning technique for T1 ROIs (caudate, putamen, white matter and corpus callosum), for early HD patients (blue triangles) and controls (red circles). FA is a relative value derived from the diffusion tensor, where 0 indicates perfectly isotropic tensor dimensions (i.e. a sphere) and 1 indicates the maximum theoretical level of anisotropy.

Figure 1. Bland-Altman plot for FA

Bland-Altman plots of mean diffusivity (MD) values to visually assess agreement, systematic bias and proportional bias in scanning technique for T1 ROIs (caudate, putamen, white matter and corpus callosum), for early HD patients (blue triangles) and controls (red circles).

Figure 2. Bland-Altman plot for MD

Bland-Altman plots of axial diffusivity (AD) values to visually assess agreement, systematic bias and proportional bias in scanning technique for T1 ROIs (caudate, putamen, white matter and corpus callosum), for early HD patients (blue triangles) and controls (red circles).

Figure 3. Bland-Altman plot for AD

Bland-Altman plots of radial diffusivity (RD) values to visually assess agreement, systematic bias and proportional bias in scanning technique for T1 ROIs (caudate, putamen, white matter and corpus callosum), for early HD patients (blue triangles) and controls (red circles).

Figure 4. Bland-Altman plot for RD

Voxelwise reliability analysis

The ICC maps for FA (Figure 5), MD (Figure 6), AD (Figure 7) and RD (Figure 8) indicated that the reliability was generally high across the brain, with the exception of areas inferior to the lateral ventricles. There was also a degree of noise evident in these maps, likely due to residual registration error between voxels. The group maps appeared qualitatively similar between controls and early HD patients. FA and MD also seemed to generate similar patterns of voxelwise reliability, although the diffusivity metrics (i.e. MD, AD, RD) tended to show consistently higher ICC scores with less regional variability than FA. Lower ICCs were evident in the basal ganglia regions for all four measures.

Voxelwise distribution of reliability metrics for fractional anisotropy (FA). Panel A) shows sagittal, coronal and axial slices of the mean FA image created during the image processing, included for anatomical reference. B) Equivalent three slices for the intraclass correlation coefficient (ICC) of FA in early HD patients. Higher values reflect higher test-retest reliability. C) ICC of FA in healthy controls.

Figure 5. FA maps

Voxelwise distribution of reliability metrics for mean diffusivity (MD). Panel A) shows sagittal, coronal and axial slices of the mean FA image created during the image processing, included for anatomical reference. B) Equivalent three slices for the intraclass correlation coefficient (ICC) of MD in early HD patients. Higher values reflect higher test-retest reliability. C) ICC of MD in healthy controls.

Figure 6. MD maps

Voxelwise distribution of reliability metrics for axial diffusivity (AD). Panel A) shows sagittal, coronal and axial slices of the mean FA image created during the image processing, included for anatomical reference. B) Equivalent three slices for the intraclass correlation coefficient (ICC) of AD in early HD patients. Higher values reflect higher test-retest reliability. C) ICC of AD in healthy controls.

Figure 7. AD maps

Voxelwise distribution of reliability metrics for radial diffusivity (RD). Panel A) shows sagittal, coronal and axial slices of the mean FA image created during the image processing, included for anatomical reference. B) Equivalent three slices for the intraclass correlation coefficient (ICC) of RD in early HD patients. Higher values reflect higher test-retest reliability. C) ICC of RD in healthy controls.

Figure 8. RD maps

Discussion

Analysis of DTI scans acquired back-to-back gave generally high levels of reliability in both early stage HD patients and controls, when using either atlas-based or T1-segmentation-based ROIs. Using specific methods that will be applied to longitudinal datasets in studies of HD, the four major DTI metrics (FA, MD, AD and RD) all showed low levels of within-subject (i.e. scan-rescan) variability relative to between-subject variability. This concurs with previous research that has assessed the reliability of DTI in HD 43 and other samples using different analysis methods, which generally report acceptable levels of reliability (i.e. high ICC), either test-retest 24,35 or between-scanners 21,22,36,37. All metrics tended to perform as well in both experimental groups and importantly there was no qualitative evidence of group bias in reliability. Although no formal statistical comparisons were made between HD and controls, the ICC estimates and 95% CI suggested that for most regions the ICCs were consistent between groups. Given the magnitude of DTI effect sizes in cross-sectional comparisons of early HD patients and controls (e.g. 4), the low within-subject variances found support the continued use of DTI as a tool for detecting patterns of neurodegeneration in this HD population.

It is important to recognise the imprecision in the ICCs reflected in the relatively wide confidence intervals, particularly for the lower ICCs. Nonetheless, it does appear that levels of reliability were not entirely consistent across the brain. Those regions that showed lower reliability (ICC < 0.8) tended to be in inferior parts of the brain, including cerebellum and brainstem ROIs. Explanations for this could include the increased presence of tissue susceptibility errors in inferior regions, or sub-optimal image registration due to the registration procedure being largely driven by the increased contrast in signal intensity found in and around the lateral ventricles 32. Also, these cerebellar regions have smaller volumes when defined according to the ICBM-DTI-81 atlas, and thus may be more susceptible to partial volume effects than larger areas where such effects are more likely to be averaged out. The present findings indicate that extra caution should be exercised when examining patterns of longitudinal change in small inferior regions, as these tend to have greater inherent variability using atlas-based registration methods.

Interestingly, the grey matter ROIs, segmented on a T1 image, showed lower reliability for diffusion metrics, particularly MD in the caudate and AD in both caudate and putamen. Regional atrophy may contribute to inaccuracies in the registration of T1 images to FA maps and the caudate in particular is susceptible to partial volume effects due to its proximity to the lateral ventricles. The process of transforming T1 ROIs to diffusion space is performed regularly in DTI ROI analyses and although not routinely reported, visual quality assurance of registration accuracy is absolutely necessary as achieving precise anatomical correspondence between images acquired using different modalities is challenging.

Although the low ICC values in the striatal ROIs may be some cause for concern for longitudinal studies, this may be explained by the presence of particularly low between-subject variability, meaning that minor divergences within a small number of participants can lead to large fluctuations of the estimated ICC. The influence of such outliers, particularly in light of the reduced control sample size for the T1-segmented regions (N = 8), could be pronounced. To assess this, outliers identified from scatter plots were removed and the ICCs recalculated. In most cases, the ICC increased on removal, though not necessarily to the high values observed for other regions. As an example, AD in the left corticospinal tract had a low ICC in the HD subjects. Removing a single outlier increased the ICC from 0.471 to 0.763, which is still lower than the value of 0.889 observed for the same region in the controls, or 0.964, found in HD patients in the right hemisphere.

The within-subject variance should also be considered when comparing between regions or metrics. While in general within-subject variance was strongly related to ICC, as would be expected, there are some regions, such as the fornix, where ICC was high but within-subject variance appeared much greater than in other regions. The magnitude of the within-subject variance is approximately inversely proportional to the amount of signal, so, for example, one may be cautious about the reliability of the fornix for longitudinal studies, despite a relatively high ICC of 0.90 for MD. If wishing to restrict the number of brain regions tested, areas such as the genu and body of corpus callosum, for which we observed high ICC and very low within-subject variance, might be more appropriate for longitudinal research studies.

Voxelwise ICC maps of FA, MD, AD and RD indicated that the distribution of variability was qualitatively very similar between early HD patients and controls. High ICC for FA was apparent across the white matter of the brain, with reduced values in the lateral ventricles and subcortical grey matter nuclei. The strength of the signal in DTI depends on the degree of anisotropy 38 and FA is a direct representation of this measure, so it is plausible that the inherent anisotropy of brain tissue influences the between-scan variability. The MD ICC maps also did not materially differ between groups, though when compared with the FA maps, the pattern was one of generally higher ICC throughout the brain, which was matched by very similar patterns for AD and RD; not unexpected given the relatedness of the metrics. The voxelwise patterns of reliability concur with the study by Marenco and colleagues 24 in healthy controls at 1.5T who showed a similar distribution of ICC for FA and trace (i.e. total diffusivity). In accordance with this, the voxelwise and ROI-based findings may add weight to the idea that DTI in general, and FA in particular, is primarily suited for examining white matter microstructure 22 and is less quantifiable 39 and harder to interpret in grey-matter regions 40. Measures of anisotropy in tissue not thought to be characteristically anisotropic may not give particularly meaningful insight into biological or pathological processes.

Although the four DTI metrics were generally comparable across the board, there was a trend for lower reliability in AD, compared with FA, MD or RD. One possible explanation for this finding is that AD, unlike any of the other metrics, is derived from a single component of the diffusion tensor. For FA, MD and RD there is a degree of averaging across tensor elements (i.e. eigenvalues) that may increase the signal-to-noise ratio and reduce variability, whereas AD is derived solely from the primary eigenvalue. Previous reliability studies have not reported on AD, and this potential difference with other diffusion metrics should be investigated further, particularly if AD results are to be analysed longitudinally.

There are a few caveats to consider when interpreting the present findings. The sample size was relatively small once divided into early HD patients and controls, thus resulting in increased susceptibility to the influence of outliers and reduced precision. One issue in extrapolating these findings to future longitudinal studies is that the scan-rescan data were collected back-to-back, with the participant remaining in the scanner. This meant that the position of the head within the magnetic field was very similar between scans, which is less often the case with longer intervals where re-positioning is required. When running longitudinal studies, monitoring the consistency of head positioning at each acquisition could help reduce the variability that can be caused by subtle differences in orientation and slice positioning 19,41. Another limitation is that these results are scanner specific. The reliability of DTI may differ between scanner manufacturers 37, or show higher noise/within-subject variability in older machines that require servicing. This point is particularly relevant to multi-centre studies, which are likely to be increasingly necessary in relatively uncommon diseases such as HD, in order to have well-powered studies. While the reliability of DTI has been demonstrated across scanners in principle 36, collecting repeat scans both within-scanner and across study sites would be a prudent step to help establish the reliability of DTI data in any large scale study.

In conclusion, test-retest reliability analysis of atlas-based and T1-segmented ROI approaches to DTI analysis shows generally high ICCs on diffusion data acquired with a clinically acceptable scan time. This gives confidence that such data acquisition and analysis methods can be reliably used for cross-sectional comparisons, and lends support to their utility for measuring within-subject change over time, which is the goal of on-going longitudinal research into progressive neurodegeneration in HD. However, longitudinal reliability can only be explicitly demonstrated by taking the expected magnitude of the longitudinal effects into account, to determine whether these effects are greater than the scan-rescan variability. Finally, it is notable that there are some exceptions to the generally high reliability, particularly in striatal AD measures, and it could be concluded that test-retest reliability is not evenly distributed throughout the brain, potentially due to intrinsic tissue differences, non-linear geometric distortions or uneven registration accuracy. This has implications for selecting which brain regions are most appropriate for future longitudinal studies, above and beyond the biological evidence for involvement in neurodegeneration.
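As an illustration of how an expected longitudinal effect might be compared against scan-rescan variability, the following is a minimal sketch and not the analysis used in this study. It assumes the Bland-Altman repeatability coefficient (1.96 × √2 × within-subject SD) as the noise threshold for a change in a single participant. The function names and the FA values are hypothetical.

```python
import math


def within_subject_sd(scan1, scan2):
    """Within-subject SD from paired scan-rescan values: sqrt(sum(d^2) / (2n))."""
    if len(scan1) != len(scan2) or len(scan1) == 0:
        raise ValueError("scan1 and scan2 must be non-empty and of equal length")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(scan1, scan2))
    return math.sqrt(sum_sq_diff / (2 * len(scan1)))


def repeatability_coefficient(scan1, scan2):
    """Assumed threshold: 1.96 * sqrt(2) * within-subject SD (Bland-Altman)."""
    return 1.96 * math.sqrt(2) * within_subject_sd(scan1, scan2)


def change_exceeds_noise(expected_change, scan1, scan2):
    """True if an expected individual change is larger than the repeatability coefficient."""
    return abs(expected_change) > repeatability_coefficient(scan1, scan2)


if __name__ == "__main__":
    # Hypothetical FA values for one ROI in four participants (scan, rescan).
    fa_scan = [0.50, 0.52, 0.48, 0.55]
    fa_rescan = [0.51, 0.51, 0.49, 0.54]
    print(round(repeatability_coefficient(fa_scan, fa_rescan), 4))
    print(change_exceeds_noise(0.03, fa_scan, fa_rescan))
```

This threshold applies to change in an individual; group-level mean changes can be detected at smaller magnitudes as sample size increases, so the sketch is best read as a conservative first check.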

Competing Interests

The authors have declared that no competing interests exist.

Acknowledgments

The authors would particularly like to acknowledge Prof. Sarah Tabrizi of UCL Institute of Neurology, as Principal Investigator of the PADDINGTON project and for her support of our work. The authors would also like to thank the patients and controls who took part in this study, along with all the Work Package 2 site staff at Paris, Leiden, Ulm and London.

References

1. Bohanna I, Georgiou-Karistianis N, Sritharan A, Asadi H, Johnston L, Churchyard A, Egan G. Diffusion tensor imaging in Huntington's disease reveals distinct patterns of white matter degeneration associated with motor and cognitive deficits. Brain Imaging Behav. 2011 Sep;5(3):171-80. PubMed PMID:21437574.
2. Della Nave R, Ginestroni A, Tessa C, Giannelli M, Piacentini S, Filippi M, Mascalchi M. Regional distribution and clinical correlates of white matter structural damage in Huntington disease: a tract-based spatial statistics study. AJNR Am J Neuroradiol. 2010 Oct;31(9):1675-81. PubMed PMID:20488902.
3. Douaud G, Behrens TE, Poupon C, Cointepas Y, Jbabdi S, Gaura V, Golestani N, Krystkowiak P, Verny C, Damier P, Bachoud-Lévi AC, Hantraye P, Remy P. In vivo evidence for the selective subcortical degeneration in Huntington's disease. Neuroimage. 2009 Jul 15;46(4):958-66. PubMed PMID:19332141.
4. Hobbs NZ, Cole JH, Farmer RE, Rees EM, Crawford HE, Malone IB, Roos RA, Sprengelmeyer R, Durr A, Landwehrmeyer B, Scahill RI, Tabrizi SJ, Frost C. Evaluation of multi-modal, multi-site neuroimaging measures in Huntington's disease: Baseline results from the PADDINGTON study. Neuroimage Clin. 2012 Dec 9;2:204-11. PubMed PMID:24179770.
5. Delmaire C, Dumas EM, Sharman MA, van den Bogaard SJ, Valabregue R, Jauffret C, Justo D, Reilmann R, Stout JC, Craufurd D, Tabrizi SJ, Roos RA, Durr A, Lehéricy S. The structural correlates of functional deficits in early huntington's disease. Hum Brain Mapp. 2013 Sep;34(9):2141-53. PubMed PMID:22438242.
6. Magnotta VA, Kim J, Koscik T, Beglinger LJ, Espinso D, Langbehn D, Nopoulos P, Paulsen JS. Diffusion Tensor Imaging in Preclinical Huntington's Disease. Brain Imaging Behav. 2009 Mar 1;3(1):77-84. PubMed PMID:21415933.
7. Reading SA, Yassa MA, Bakker A, Dziorny AC, Gourley LM, Yallapragada V, Rosenblatt A, Margolis RL, Aylward EH, Brandt J, Mori S, van Zijl P, Bassett SS, Ross CA. Regional white matter change in pre-symptomatic Huntington's disease: a diffusion tensor imaging study. Psychiatry Res. 2005 Oct 30;140(1):55-62. PubMed PMID:16199141.
8. Rosas HD, Tuch DS, Hevelone ND, Zaleta AK, Vangel M, Hersch SM, Salat DH. Diffusion tensor imaging in presymptomatic and early Huntington's disease: Selective white matter pathology and its relationship to clinical measures. Mov Disord. 2006 Sep;21(9):1317-25. PubMed PMID:16755582.
9. Tabrizi SJ, Reilmann R, Roos RA, Durr A, Leavitt B, Owen G, Jones R, Johnson H, Craufurd D, Hicks SL, Kennard C, Landwehrmeyer B, Stout JC, Borowsky B, Scahill RI, Frost C, Langbehn DR. Potential endpoints for clinical trials in premanifest and early Huntington's disease in the TRACK-HD study: analysis of 24 month observational data. Lancet Neurol. 2012 Jan;11(1):42-53. PubMed PMID:22137354.
10. Tabrizi SJ, Scahill RI, Owen G, Durr A, Leavitt BR, Roos RA, Borowsky B, Landwehrmeyer B, Frost C, Johnson H, Craufurd D, Reilmann R, Stout JC, Langbehn DR. Predictors of phenotypic progression and disease onset in premanifest and early-stage Huntington's disease in the TRACK-HD study: analysis of 36-month observational data. Lancet Neurol. 2013 Jul;12(7):637-49. PubMed PMID:23664844.
11. Sritharan A, Egan GF, Johnston L, Horne M, Bradshaw JL, Bohanna I, Asadi H, Cunnington R, Churchyard AJ, Chua P, Farrow M, Georgiou-Karistianis N. A longitudinal diffusion tensor imaging study in symptomatic Huntington's disease. J Neurol Neurosurg Psychiatry. 2010 Mar;81(3):257-62. PubMed PMID:19237387.
12. Vandenberghe W, Demaerel P, Dom R, Maes F. Diffusion-weighted versus volumetric imaging of the striatum in early symptomatic Huntington disease. J Neurol. 2009 Jan;256(1):109-14. PubMed PMID:19267169.
13. Weaver KE, Richards TL, Liang O, Laurino MY, Samii A, Aylward EH. Longitudinal diffusion tensor imaging in Huntington's Disease. Exp Neurol. 2009 Apr;216(2):525-9. PubMed PMID:19320010.
14. Domínguez D JF, Egan GF, Gray MA, Poudel GR, Churchyard A, Chua P, Stout JC, Georgiou-Karistianis N. Multi-modal neuroimaging in premanifest and early Huntington's disease: 18 month longitudinal data from the IMAGE-HD study. PLoS One. 2013 Sep 16;8(9):e74131. PubMed PMID:24066104.
15. Rees EM, Scahill RI, Hobbs NZ. Longitudinal Neuroimaging Biomarkers in Huntington's Disease. Journal of Huntington's Disease. 2013;2(1):21-39. DOI: 10.3233/JHD-120030.
16. Skare ST, Bammer R. EPI-Based Pulse Sequences for Diffusion Tensor MRI. In: Jones DK, editor. Diffusion MRI: theory, methods, and applications. New York; Oxford: Oxford University Press; 2011.
17. Tijssen RH, Jansen JF, Backes WH. Assessing and minimizing the effects of noise and motion in clinical DTI at 3 T. Hum Brain Mapp. 2009 Aug;30(8):2641-55. PubMed PMID:19086023.
18. Bisdas S, Bohning DE, Besenski N, Nicholas JS, Rumboldt Z. Reproducibility, interrater agreement, and age-related changes of fractional anisotropy measures at 3T in healthy subjects: effect of the applied b-value. AJNR Am J Neuroradiol. 2008 Jun;29(6):1128-33. PubMed PMID:18372415.
19. Bonekamp D, Nagae LM, Degaonkar M, Matson M, Abdalla WM, Barker PB, Mori S, Horská A. Diffusion tensor imaging in children and adolescents: reproducibility, hemispheric, and age-related differences. Neuroimage. 2007 Jan 15;34(2):733-42. PubMed PMID:17092743.
20. Pfefferbaum A, Adalsteinsson E, Sullivan EV. Replicability of diffusion tensor imaging measurements of fractional anisotropy and trace in brain. J Magn Reson Imaging. 2003 Oct;18(4):427-33. PubMed PMID:14508779.
21. Teipel SJ, Reuter S, Stieltjes B, Acosta-Cabronero J, Ernemann U, Fellgiebel A, Filippi M, Frisoni G, Hentschel F, Jessen F, Klöppel S, Meindl T, Pouwels PJ, Hauenstein KH, Hampel H. Multicenter stability of diffusion tensor imaging measures: a European clinical and physical phantom study. Psychiatry Res. 2011 Dec 30;194(3):363-71. PubMed PMID:22078796.
22. Vollmar C, O'Muircheartaigh J, Barker GJ, Symms MR, Thompson P, Kumari V, Duncan JS, Richardson MP, Koepp MJ. Identical, but not the same: intra-site and inter-site reproducibility of fractional anisotropy measures on two 3.0T scanners. Neuroimage. 2010 Jul 15;51(4):1384-94. PubMed PMID:20338248.
23. Huang L, Wang X, Baliki MN, Wang L, Apkarian AV, Parrish TB. Reproducibility of structural, resting-state BOLD and DTI data between identical scanners. PLoS One. 2012;7(10):e47684. PubMed PMID:23133518.
24. Marenco S, Rawlings R, Rohde GK, Barnett AS, Honea RA, Pierpaoli C, Weinberger DR. Regional distribution of measurement error in diffusion tensor imaging. Psychiatry Res. 2006 Jun 30;147(1):69-78. PubMed PMID:16797169.
25. Brander A, Kataja A, Saastamoinen A, Ryymin P, Huhtala H, Ohman J, Soimakallio S, Dastidar P. Diffusion tensor imaging of the brain in a healthy adult population: Normative values and measurement reproducibility at 3 T and 1.5 T. Acta Radiol. 2010 Sep;51(7):800-7. PubMed PMID:20707664.
26. Shoulson I, Fahn S. Huntington disease: clinical care and evaluation. Neurology. 1979 Jan;29(1):1-3. PubMed PMID:154626.
27. Freeborough PA, Fox NC, Kitney RI. Interactive algorithms for the segmentation and quantitation of 3-D MRI brain scans. Comput Methods Programs Biomed. 1997 May;53(1):15-25. PubMed PMID:9113464.
28. Pierson R, Johnson H, Harris G, Keefe H, Paulsen JS, Andreasen NC, Magnotta VA. Fully automated analysis using BRAINS: AutoWorkup. Neuroimage. 2011 Jan 1;54(1):328-36. PubMed PMID:20600977.
29. Ourselin S, Roche A, Subsol G, Pennec X, Ayache N. Reconstructing a 3D structure from serial histological sections. Image and Vision Computing. 2001 Jan;19(1):25-31.
30. Modat M, Ridgway GR, Taylor ZA, Lehmann M, Barnes J, Hawkes DJ, Fox NC, Ourselin S. Fast free-form deformation using graphics processing units. Comput Methods Programs Biomed. 2010 Jun;98(3):278-84. PubMed PMID:19818524.
31. Wang Y, Gupta A, Liu Z, Zhang H, Escolar ML, Gilmore JH, Gouttard S, Fillard P, Maltbie E, Gerig G, Styner M. DTI registration in atlas based fiber analysis of infantile Krabbe disease. Neuroimage. 2011 Apr 15;55(4):1577-86. PubMed PMID:21256236.
32. Zhang H, Yushkevich PA, Alexander DC, Gee JC. Deformable registration of diffusion tensor MR images with explicit orientation optimization. Med Image Anal. 2006 Oct;10(5):764-85. PubMed PMID:16899392.
33. Mori S, Wakana S, Van Zijl PCM. MRI atlas of human white matter. 1st ed. Amsterdam, The Netherlands; San Diego, CA: Elsevier; 2005.
34. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb 8;1(8476):307-10. PubMed PMID:2868172.
35. Jansen JF, Kooi ME, Kessels AG, Nicolay K, Backes WH. Reproducibility of quantitative cerebral T2 relaxometry, diffusion tensor imaging, and 1H magnetic resonance spectroscopy at 3.0 Tesla. Invest Radiol. 2007 Jun;42(6):327-37. PubMed PMID:17507802.
36. Magnotta VA, Matsui JT, Liu D, Johnson HJ, Long JD, Bolster BD Jr, Mueller BA, Lim K, Mori S, Helmer KG, Turner JA, Reading S, Lowe MJ, Aylward E, Flashman LA, Bonett G, Paulsen JS. Multicenter reliability of diffusion tensor imaging. Brain Connect. 2012;2(6):345-55. PubMed PMID:23075313.
37. Fox RJ, Sakaie K, Lee JC, Debbins JP, Liu Y, Arnold DL, Melhem ER, Smith CH, Philips MD, Lowe M, Fisher E. A validation study of multicenter diffusion tensor imaging: reliability of fractional anisotropy and diffusivity values. AJNR Am J Neuroradiol. 2012 Apr;33(4):695-700. PubMed PMID:22173748.
38. Pierpaoli C, Jezzard P, Basser PJ, Barnett A, Di Chiro G. Diffusion tensor MR imaging of the human brain. Radiology. 1996 Dec;201(3):637-48. PubMed PMID:8939209.
39. Farrell JA, Landman BA, Jones CK, Smith SA, Prince JL, van Zijl PC, Mori S. Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurements at 1.5 T. J Magn Reson Imaging. 2007 Sep;26(3):756-67. PubMed PMID:17729339.
40. Rulseh AM, Keller J, Tintěra J, Kožíšek M, Vymazal J. Chasing shadows: what determines DTI metrics in gray matter regions? An in vitro and in vivo study. J Magn Reson Imaging. 2013 Nov;38(5):1103-10. PubMed PMID:23440865.
41. Virta A, Barnett A, Pierpaoli C. Visualizing and characterizing white matter fiber structure and architecture in the human pyramidal tract using diffusion tensor MRI. Magn Reson Imaging. 1999 Oct;17(8):1121-33. PubMed PMID:10499674.
42. Penney JB Jr, Vonsattel JP, MacDonald ME, Gusella JF, Myers RH. CAG repeat number governs the development rate of pathology in Huntington's disease. Ann Neurol. 1997 May;41(5):689-92. PubMed PMID:9153534.
43. Müller HP, Grön G, Sprengelmeyer R, Kassubek J, Ludolph AC, Hobbs N, Cole J, Roos RA, Duerr A, Tabrizi SJ, Landwehrmeyer GB, Süssmuth SD. Evaluating multicenter DTI data in Huntington's disease on site specific effects: An ex post facto approach. Neuroimage Clin. 2013;2:161-7. PubMed PMID:24179771.