However, the reliability of data obtained with most quality assessment scales has not been established.
The raters were not aware that the reliability of ratings would be evaluated. Interrater reliability was evaluated for individual ratings and consensus ratings. The final rating (that agreed on by the first 2 raters or assigned by the third rater) will be referred to as the “consensus rating.” The 120 RCTs were assessed by 25 raters who each rated from 1 to 56 RCTs (X̄=13.8).

With the consensus ratings, 5 of the 11 items (“eligibility criteria specified,” “random allocation,” “groups similar at baseline,” “less than 15% dropouts,” and “between-group statistical comparisons”) achieved reliability in a higher benchmark than was achieved for individual ratings. The main findings of our studies were that the reliability of ratings of individual PEDro scale items varied from “fair” to “substantial,” or from “moderate” to “substantial” when rated by panels of raters, and the reliability for the total PEDro score was “fair” to “good.” These findings suggest that the total PEDro score can be assessed with “fair” to “good” reliability.

[Table: Estimates of Reliability from Study 1 for Each of the 11 Items of the PEDro Scale]

For researchers considering which scale they should use in a systematic review, there is an additional problem.

The Physiotherapy Evidence Database (PEDro) scale has been widely used to investigate methodological quality in physiotherapy randomized controlled trials; however, its validity has not been tested for pharmaceutical trials.

Background and purpose: Assessment of the quality of randomized controlled trials (RCTs) is common practice in systematic reviews.

Correspondence: C.Maher@fhs.usyd.edu.au
Maher CG, Sherrington C, Herbert RD, et al. Reliability of the PEDro scale for rating quality of randomized controlled trials. 2003 Aug;83(8):713-21.

Background and Purpose. Systematic reviews of randomized controlled trials (RCTs) are considered by some authors (1–3) to constitute the best single source of information about the effectiveness of health care interventions.

Method. In the first study, 11 raters independently rated 25 RCTs randomly selected from the PEDro database. Subsequently, the 120 RCTs in study 2 were re-rated by a different set of raters. Each RCT had previously been independently rated by 2 raters, and where the ratings for any scale item in any RCT disagreed, a third (consensus) rater arbitrated. A third rater was required for at least one scale item in all except 24 RCTs. Reliability of ratings of PEDro scale items was calculated using multirater kappas, and reliability of the total (summed) score was calculated using intraclass correlation coefficients (ICC [1,1]).

Results. The ICCs for interrater reliability of the total PEDro scores for individual raters were .55 (95% confidence interval [CI]=.41, .72) for study 1 and .56 (95% CI=.47, .65) for study 2. The ICC for the total score was .56 (95% confidence interval=.47-.65) for ratings by individuals, and the ICC for consensus ratings was .68 (95% confidence interval=.57-.76). For the remaining 6 items, the reliability was within the same benchmark for individual and consensus ratings.

The reliability of ratings of PEDro scale items varied from "fair" to "substantial," and the reliability of the total PEDro score was "fair" to "good."
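The text names ICC [1,1] for the total score but does not give the formula. As an illustration only, the sketch below assumes the usual one-way random-effects, single-rating definition, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-trial and within-trial mean squares and k is the number of ratings per trial. The function name `icc_1_1` and the example scores are hypothetical and are not taken from the study.

```python
def icc_1_1(scores):
    """One-way random-effects, single-rating ICC, i.e. ICC(1,1).

    scores: list of rows, one row per trial; each row holds the total
    scores given to that trial by k raters (same k for every trial).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 trials, each with the same number (>= 2) of ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-trial mean square: spread of trial means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-trial mean square: disagreement among ratings of the same trial.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when every rating is identical")
    return (ms_between - ms_within) / denominator


# Hypothetical total scores for 5 trials, each rated twice (made-up data).
example_scores = [[6, 5], [8, 8], [4, 5], [7, 6], [3, 3]]
print(round(icc_1_1(example_scores), 2))
```

The one-way form does not require the same raters to score every trial, which is presumably why it suits a design in which each rater scored only some of the trials; that reading is an inference, not something the text states.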
The items “groups similar at baseline,” “point measures and variability data,” and “intention-to-treat analysis” demonstrated “moderate” reliability, whereas the other 8 scale items demonstrated “substantial” reliability.

RESULTS: The kappa value for each of the 11 items ranged from .36 to .80 for individual assessors and from .50 to .79 for consensus ratings generated by groups of 2 or 3 raters.
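The descriptive labels attached to the kappa values can be reproduced with a simple lookup. The sketch below assumes the commonly cited Landis and Koch cut points; the text does not say which benchmark was used, so the bands and the function name `kappa_benchmark` are assumptions. Under that assumption, the reported ranges map onto the same labels the text uses.

```python
def kappa_benchmark(kappa):
    """Map a kappa value to a descriptive label.

    Cut points are the commonly cited Landis and Koch bands; that the
    source used exactly these bands is an assumption.
    """
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa ranges reported in the text: individual and consensus ratings.
for label, (low, high) in {
    "individual": (0.36, 0.80),
    "consensus": (0.50, 0.79),
}.items():
    print(label, kappa_benchmark(low), "to", kappa_benchmark(high))
```

These bands apply to kappa only. The "fair" to "good" wording used for the ICC of the total score appears to follow a different benchmark and is not covered here.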