
What are WEE Waiting for? The Quick-Wee Method for Faster Clean Catch Urine Collection

Where can I find this paper?


Please note this paper is OPEN ACCESS. You are strongly advised to read the original paper before reading any further.

What is this paper about (what is the research question)?

Does suprapubic cutaneous stimulation with cold fluid-soaked gauze (the “Quick-Wee” method) reduce the amount of time spent waiting for clean catch urine?

Summary of the Paper

Design: single centre, randomised, prospective non-blinded trial

Objective: to evaluate the efficacy of the Quick-Wee method

Outcome of interest: voiding of urine within five minutes (binary outcome)

Intervention: genital cleaning for 10 seconds with sterile water at room temperature, followed by continued rubbing of the suprapubic area in a circular pattern with gauze (soaked in cold saline) held by forceps

Reference standard: genital cleaning for 10 seconds with sterile water at room temperature (standard practice)

Participants: patients presenting to an Australian paediatric emergency department between September 2015-April 2016

  • Inclusions: pre-continent infants aged 1-12 months in whom a clean catch urine sample was required

  • Exclusions: neonates (defined as <1 month of age); infants with anatomical or neurological abnormalities affecting voiding of urine or sensation; those patients with need for an immediate sample by invasive method

Results: 354 subjects were recruited and randomised (175 to the control group and 179 to the intervention group). Five patients were excluded from each group after randomisation, leaving 344 in the analysis: 170 in the control group and 174 in the intervention group.

54/174 (31%) of patients voided within five minutes in the Quick-Wee group

20/170 (12%) of patients voided within five minutes in the control group

The difference in proportions was 19% (95% confidence interval for difference 11-28%).

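The paper reports an NNT of 4.7 to successfully catch one additional sample within five minutes (95% confidence interval 3.4-7.7). Note that this figure matches the successful catch proportions (30% vs 9%) rather than voiding alone; the 19% difference in voiding corresponds to an NNT of about 5.2.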
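As a rough check on these figures, the risk difference and NNT can be recomputed from the raw counts. The sketch below is illustrative only: the function name is hypothetical, and the successful-catch counts (52/174 vs 15/170) are the ones quoted later in this review.

```python
def risk_difference_and_nnt(events_tx, n_tx, events_ctrl, n_ctrl):
    """Return (absolute risk difference, number needed to treat)."""
    diff = events_tx / n_tx - events_ctrl / n_ctrl
    if diff == 0:
        raise ValueError("NNT is undefined when the proportions are equal")
    return diff, 1 / diff


# Voiding within five minutes: Quick-Wee 54/174 vs control 20/170
void_diff, void_nnt = risk_difference_and_nnt(54, 174, 20, 170)
# Voiding with successful catch: Quick-Wee 52/174 vs control 15/170
catch_diff, catch_nnt = risk_difference_and_nnt(52, 174, 15, 170)

print(f"Voiding: difference {void_diff:.1%}, NNT {void_nnt:.1f}")
print(f"Successful catch: difference {catch_diff:.1%}, NNT {catch_nnt:.1f}")
```

The voiding counts give an NNT of roughly 5.2, while the successful-catch counts give roughly 4.7, which is the value reported.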

Authors’ Conclusions:

The Quick-Wee method requires minimal resources and is a simple way to trigger faster voiding for clean catch urine from infants in the acute care setting.

On the study design

Firstly it is important to note that this was a single-centre study in which trained clinicians were identifying and recruiting potential test subjects in addition to performing the intervention. This introduces a potential for innovation or novelty bias, whereby new treatments or procedures are viewed more favourably (or possibly less favourably) than traditional treatments or methods. This could be exacerbated by a lack of blinding, as in this study, although it would be practically impossible to blind subjects to the treatment they are receiving in this particular case. In an ideal world, the clinicians recruiting and randomising patients would be different from those performing the procedure, and the results would be interpreted by people blinded to the groups to which patients were randomised – but research rarely (if ever) occurs under ideal circumstances.

That said, a considerable effort has been made to limit bias at the point of allocation: consecutive patients were randomised in a 1:1 ratio using random permuted blocks of different sizes, and allocation was concealed in opaque envelopes selected sequentially.
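For readers unfamiliar with permuted block randomisation, a minimal sketch of the idea follows. It is not the authors' actual procedure: the block sizes (2, 4 and 6), the seed and the function name are all assumptions made for illustration, as the review does not state them.

```python
import random


def permuted_block_allocation(n_participants, block_sizes=(2, 4, 6), seed=None):
    """Build a 1:1 allocation list from randomly sized, shuffled blocks."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        # Varying the block size makes the next allocation harder to guess
        size = rng.choice(block_sizes)
        block = ["Quick-Wee"] * (size // 2) + ["control"] * (size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    # The final block may be cut short, so the arms can differ slightly in size
    return allocations[:n_participants]


sequence = permuted_block_allocation(354, seed=2015)
print(sequence[:8])
print(sequence.count("Quick-Wee"), sequence.count("control"))
```

Each complete block contains equal numbers of both arms, so the groups stay close in size throughout recruitment; in practice each allocation would then be sealed in a sequentially numbered opaque envelope.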

The Quick-Wee procedure itself was well standardised; teaching was delivered through face-to-face intervention and written instruction and standardised packs were used for the initial cleaning phase. A separate pack was prepared for the Quick-Wee intervention itself.

Several secondary outcomes were considered, including successful catch of the specimen, contamination of the sample, and parental and clinician satisfaction with the method.

A sample size calculation was performed, requiring 322 patients (161 in each group) to achieve 80% power to detect a difference in the primary outcome; based on pilot study data, the expected change in proportions was 15% with a baseline expected proportion of 21% in the control group and therefore 35% in the intervention arm (a small inconsistency in these percentages is likely due to rounding). The authors performed an intention-to-treat analysis and planned to recruit an additional 10% of subjects beyond the sample size calculation to account for anticipated attrition.
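The quoted sample size can be reproduced approximately with a standard formula for comparing two proportions. The sketch below assumes a two-sided alpha of 0.05 and a pooled-variance normal approximation; neither is stated in this review, so treat both as assumptions rather than the authors' method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_intervention, alpha=0.05, power=0.80):
    """Sample size per arm for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_intervention) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_intervention * (1 - p_intervention))
    ) ** 2
    return ceil(numerator / (p_control - p_intervention) ** 2)


per_group = n_per_group(0.21, 0.35)
print(per_group, 2 * per_group)  # 161 322
```

Under these assumptions, 21% versus 35% gives 161 per group (322 in total), which matches the figure quoted from the paper.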

What were the results and what does this mean?

The study achieved the required sample size due in part to the forethought of including 10% more patients to account for attrition.

The 344 subjects analysed were divided into control (170 patients) and intervention (174 patients) groups and in each case successful voiding was determined if it occurred within five minutes of the initial cleaning step. The data collection section mentions paper case record forms but it is not clear whether these were standardised for the research study or the usual clinical documentation. In addition, interobserver reliability is inferred through the use of a timer but in practice there is an opportunity for bias here if the observer is not independent of the clinician carrying out the procedure (forgetting to press “start” and adding a few extra seconds, for example).

The results are certainly impressive; 54/174 patients voided within five minutes with the Quick-Wee method (31% – 95% confidence interval 24%-39%) compared with 20/170 in the control group (12% – 95% confidence interval 7%-18%). The difference in proportions was 19% with a 95% confidence interval of 11%-28% and a P value of <0.001 using the χ2 test.
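These headline figures can be checked from the counts alone. The sketch below uses a simple Wald interval for the difference in proportions and a Pearson χ2 test without continuity correction; the review does not say which interval method or correction the authors used, so both choices are assumptions.

```python
from math import erfc, sqrt


def compare_proportions(a, n1, b, n2, z=1.96):
    """Wald 95% CI for p1 - p2 plus a Pearson chi-square test (1 df)."""
    p1, p2 = a / n1, b / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z * se, diff + z * se)

    # 2x2 table: rows are groups, columns are voided / did not void
    total = n1 + n2
    observed = [[a, n1 - a], [b, n2 - b]]
    row_totals = [n1, n2]
    col_totals = [a + b, total - a - b]
    chi2 = sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i in range(2) for j in range(2)
    )
    # Upper tail probability of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return diff, ci, chi2, p_value


diff, ci, chi2, p_value = compare_proportions(54, 174, 20, 170)
print(f"difference {diff:.0%}, 95% CI {ci[0]:.0%} to {ci[1]:.0%}")
print(f"chi-square {chi2:.1f}, P = {p_value:.5f}")
```

With 54/174 versus 20/170 this gives a difference of 19% with an interval of about 11% to 28% and a P value well below 0.001, in line with the reported results.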

The use of binary data here certainly makes for simpler analysis rather than looking at specific timings for each subject; five minutes is not an unreasonable amount of time to wait for a sample but it should be recalled that there is a member of staff tied up in undertaking the Quick-Wee method for potentially the entire five minute duration – this might prove challenging in busy Emergency Departments.

The authors also looked at voiding with successful catch and found similar proportions (Quick-Wee 52/174 [30%: 95% confidence interval 23%-37%]; control 15/170 [9%: 95% confidence interval 5%-14%]). Does the Quick-Wee method make missed voids less likely? Perhaps, due to increased attention being focused on the relevant anatomical area..!

The difference in rates of contamination was not statistically significant (27% in the Quick-Wee group [95% confidence interval 15%-43%], 46% in the control group [95% confidence interval 17%-77%]) – this could be an area for further work in a larger sample, given high contamination rates in both groups.

Finally, the satisfaction scores of both parents and clinicians were better in the Quick-Wee group. The data is given in a slightly counter-intuitive way (the Likert scale runs from 1=very satisfied to 5=very unsatisfied) which they have called “higher rate” of satisfaction – it is worth noting that this does not correspond to a higher number! In the Quick-Wee group, median parental and clinician satisfaction was 2, while in the control group the median for both was 3.

What can we take from this paper into clinical practice?

On the basis of this pragmatic and robust study, the method appears to be reliable. It is appealing as a first-line technique ahead of invasive methods such as suprapubic aspiration or catheterisation, and seems worthy of adoption into clinical practice provided you can spare the staff.

More questions to ask

  • Would this technique work in older children, given its theoretical basis in the neonatal cutaneous voiding reflex?
  • Would warmer water work as reliably?
  • Would time be further reduced with a pre-emptive feed (or oral hydration) as in the study by Herreros et al?
  • Could this method also reduce contamination rates?


Probing Questions: Lung Ultrasound in Diagnosis and Management of Bronchiolitis


Thanks to Casey Parker of Broomedocs for this guest contribution – his review is cross-posted here.

Where can I find this paper?


What is this paper about (what is the research question)?

This paper aimed to correlate sonographic lung findings with clinically diagnosed bronchiolitis in infants.  The authors also attempted to provide some prognostic information [the need for oxygen support] based on sonographic lung features.

Summary of the Paper

The subjects were infants admitted for clinically suspected bronchiolitis.  There was also a cohort of “normal controls” used as a comparison.  The children underwent a clinical scoring by the treating Paediatrician and lung ultrasound by both a radiologist and Paediatrician sonographer.  The scans were all completed by two of the authors.

Design: Single-centre, observational cohort study conducted in an Italian Paediatric unit.

Objective: to evaluate the accuracy of lung ultrasonography in the diagnosis and management of bronchiolitis in infants.

Outcome of interest:  correlation between clinical and sonographic lung findings in bronchiolitic infants.  Can LUS findings be used to predict the need for supplemental oxygen requirements?

Participants: One hundred and six infants, aged from 9 to 239 days, were enrolled.

  • Inclusions: clinically “suspected bronchiolitis” in infants.  Unclear as to whether these were consecutive cases – only 106 over a 3 year study period.
  • Exclusions: radiological pneumonia, other “concomitant pathology” or the unavailability of the study sonographer.

Results: There was a high level [~90%] of agreement between the clinician’s severity rating and the predetermined sonographic severity scores. There was also a high level of agreement between the two sonographers’ scoring of the LUS findings (K = 89.6%). The lung US scoring predicted the need for oxygen supplementation with good accuracy [sensitivity 96.6%, specificity 98.7%], although there were wide confidence intervals as a result of the small numbers in this trial.

Authors’ Conclusions:

In summary, this pilot study demonstrates that the use of LUS in bronchiolitis can be considered as an extension of the clinical evaluation and could be incorporated into clinical algorithms to aid decision-making. Our promising data needs to be confirmed in larger cohort studies also involving critical patients.

On the study design

This study design is typical of many pilot ultrasound papers: small numbers of patients in which sonography is compared to a gold standard that may not be entirely accurate in itself. Bronchiolitis is a clinical diagnosis, with no really objective diagnostic standard. The use of just 2 experienced Paediatric sonographers in a single centre does raise questions about the external validity of the results and there is a high likelihood of bias here. The clinicians were blinded to the sonographic findings, so that particular source of bias was removed. The use of a “normal cohort” and the “RSV swabs” in the study design was a little confusing and doesn’t really add to the results.

What were the results and what does this mean?

The results suggest that clinically diagnosed bronchiolitis looks like…. sonographic bronchiolitis as per the defined criteria used in this paper.  The protocol used did identify infants with more severe lung disease.  The need for supplemental oxygen was consistent with more severe LUS changes.  However, given the “standard” was clinical examination it is unclear exactly what LUS would add to the prognostication by paediatricians.  The high degree of agreement between the two study sonographers is difficult to extrapolate given they are both highly skilled, ultrasound enthusiasts – a larger mix of observers would be needed to draw any conclusions about our ability to utilise LUS in small kids.

What can we take from this paper into clinical practice?

Lung ultrasound for the diagnosis and severity scoring of bronchiolitis is reasonably accurate.  Does it add anything?  Probably not, unless you are currently using CXR to ‘diagnose’ bronchiolitis.  This paper does provide some useful descriptions of the spectrum of disease and their sonographic appearance.

I think this paper is interesting in that it describes the sonographic spectrum of a common disease of infants.  The study is not really large enough, nor does it have the external validity to make it a “practice changer”.   This pilot can help inform us about the appearance of bronchiolitis – and in the future this may become a more commonplace part of our clinical assessment of children – but for now I am not sure it adds to our quiver.

More questions to ask

  • Can ultrasound reliably differentiate bronchiolitis from important differential diagnoses in infants (e.g. pneumonia, heart failure, upper airway obstruction…)?
  • Are the sonographic findings in bronchiolitis consistent when obtained by sonographers of various experience?
  • Previous papers have compared LUS to conventional CXR for the diagnosis of bronchiolitis – and LUS was favourable.  It would be nice to see a paper looking at children with severe disease in which clinicians often turn to CXR to “reconfirm the working diagnosis” in order to ascertain its utility at that end of the spectrum.

Follow us on twitter: @PEMLit

Bouncing Back: Repeated ED Visits Among Children With Meningitis or Septicaemia


Where can I find this paper?


What is this paper about (what is the research question)?

How often have children, subsequently diagnosed with meningitis or septicaemia, attended an ED and been discharged in the preceding five days?

Summary of the Paper

Design: retrospective cohort study using pan-Toronto hospital database

Objective: to ascertain the proportion of children with an ultimate diagnosis of meningitis and septicaemia who had attended an Emergency Department in the five preceding days

Outcome of interest: proportion of reattendances; ED factors in the group with preceding attendance compared with those admitted at first attendance

Participants: children (aged 30 days to 5 years) with a diagnosis of meningitis or septicaemia with linked data regarding prior attendances in the period 06/04/2005-01/03/2010.

  • Inclusions: children with an ultimate diagnosis of meningitis or septicaemia and a minimum inpatient stay of 4 days (or death in hospital)
  • Exclusions: length of stay <4 days, patients discharged within the preceding 14 days of admission with meningitis/septicaemia

Results: 521 children were admitted with a final diagnosis of meningitis/septicaemia during the study period. 125 had attended an ED in the preceding 5 days with 114 attending with apparent infection. Those with repeated visits had similar lengths of stay, critical care use and 30-day mortality.

Authors’ Conclusions:

Our study reveals that despite the imperative to provide early diagnosis and treatment to children and infants with critical infections, current practices differ markedly from this goal, with 1 in 5 children having repeated ED presentations before admission with meningitis or septicaemia.

On the study design

This was a retrospective cohort study which depended on ICD-10 reporting of diagnoses and database correlation to link admissions with meningitis or septicaemia with prior ED attendances. As with all such studies, findings are dependent on the quality of data recorded, even more so when the analysis is performed on retrospective data.

Nonetheless the study asks a valid question about how good we are at identifying serious bacterial illness the first time around.

What were the results and what does this mean?


The low prevalence of serious bacterial infection is interesting. No data is given about the number of ED attendances for children who were not given a diagnosis of meningitis or septicaemia, but the small number of cases reinforces the “needle-in-a-haystack” feeling we have in the UK. These diseases are thankfully rare but identifying them early is a clinical priority.

That 125 children reattended (after not being admitted at first attendance) does not resonate with me in the same way as it does with the authors. I feel this rather reflects my experience that patients with severe illness do not always suddenly present acutely unwell but rather at a time point along a clinical trajectory, at which reliable clinical signs may or may not be present. Notably, children who reattended had lower acuity scores at first presentation, which supports this.

Unfortunately much of the analysis is focused on whether attending a department with dedicated paediatric consultants made a difference. I suspect that this is association rather than causation and would be difficult to prove. In any case we would need to see the background rates of paediatric attendances to each unit to determine whether these district general hospitals were genuinely outliers. There may also be a parental tendency to reattend at a “specialist” hospital or a clinician tendency to admit more patients at a specialist hospital due to a higher acuity presenting there – the paper does not answer this question.

What can we take from this paper into clinical practice?

What this study seems to tell us is that diagnosis is tricky and that time and observation are valuable. We should not only make the most of opportunities to observe and review patients but also safety-net properly. Any child with an apparently benign illness may re-present with a deterioration in condition and we must ensure that parents feel confident in returning to us if that occurs.

More questions to ask

  • How on earth can we identify serious bacterial illness in children? Answers on a postcard for a Nobel prize… 🙂


Clinician Suspicion in Blunt Torso Trauma – Place Your Bets


Where can I find this paper?


What is this paper about (what is the research question)?

Are clinicians better at predicting intra-abdominal injuries in children with blunt torso trauma than a derived clinical prediction rule?

Summary of the Paper

Design: Secondary analysis of some existing PECARN group data from a prospective cohort study of children with blunt torso trauma

Objective: to compare the test characteristics of clinician suspicion with a derived clinical prediction rule to identify children at very low risk of intra-abdominal injuries undergoing acute intervention

Outcome: test characteristics for clinician suspicion, measured against presence or absence of need for acute intervention for intra-abdominal injury.

Comparison: test characteristics of a derived clinical prediction rule from the same population.

Participants: 12044 patients recruited between May 2007-January 2010 and eligible to participate in the parent study (http://www.ncbi.nlm.nih.gov/pubmed/23375510) underwent secondary analysis.

  • Inclusions: children <18 years old with blunt torso trauma presenting to participating PECARN Emergency Departments
  • Exclusions: injury >24h prior to attendance; pre-existing neurological disorders affecting examination findings; pregnancy; transfer from another institution.


Results: 3016/9252 deemed low risk (<1%) for clinician suspicion had CT abdomen performed; 35 patients subsequently had acute intervention. Of the remaining patients with clinician suspicion ≥1%, 168/2667 had an acute intervention.

Negative clinician suspicion had the following test characteristics:

  • sensitivity 82.8% (95% CI 77.0-87.3%)
  • specificity 78.7% (95% CI 77.9-79.4%)
  • NPV 99.6% (95% CI 99.5-99.7%)
  • LR- 0.2 (95% CI 0.2-0.3)
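The point estimates for clinician suspicion can be rebuilt from the counts given in the results. In the sketch below the 2x2 cells are derived from those counts (168 of 2667 with suspicion ≥1% and 35 of 9252 with suspicion <1% had an acute intervention); the function name is hypothetical.

```python
def diagnostic_accuracy(tp, fn, fp, tn):
    """Return sensitivity, specificity, NPV and negative likelihood ratio."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    lr_negative = (1 - sensitivity) / specificity
    return sensitivity, specificity, npv, lr_negative


# "Positive test" = clinician suspicion >= 1%; outcome = acute intervention
tp = 168          # suspicion >= 1%, intervention
fp = 2667 - 168   # suspicion >= 1%, no intervention
fn = 35           # suspicion < 1%, intervention
tn = 9252 - 35    # suspicion < 1%, no intervention

sens, spec, npv, lr_neg = diagnostic_accuracy(tp, fn, fp, tn)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, "
      f"NPV {npv:.1%}, LR- {lr_neg:.1f}")
```

The very high NPV mostly reflects how rare acute intervention was in this cohort (203 of 11919), which is why the sensitivity and LR- are the more informative numbers when comparing the two approaches.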

Low risk on the prediction rule had the following test characteristics:

  • sensitivity 97.0% (95% CI 93.7-98.6%)
  • specificity 42.5% (95% CI 41.6-43.4%)
  • NPV 99.9% (95% CI 99.7-99.9%)
  • LR- 0.1 (95% CI 0.0-0.2)

Authors’ Conclusions:

A clinical prediction rule had a significantly higher sensitivity for identifying intra-abdominal injury undergoing acute intervention, but a lower specificity. The higher specificity of clinician suspicion did not translate into clinical practice as clinicians frequently obtained abdominal CT scans in patients they considered to be at very low risk.

On the study design


This was a secondary analysis of data collected as part of an original PECARN study on abdominal trauma in children. It’s always worth remembering that while secondary analysis can reveal some very useful information and trends, this was not the original purpose for which the study group was recruited or the study powered (although the authors tell us this study was preplanned, and the standardised data collection forms used to collect information about clinician decision making support this).

The study has an issue in that the “gold standard” abdominal CT was not applied to all patients, only those deemed to be at risk of injury. This means there is a large portion of patients who had no imaging and no intervention who may still have had intra-abdominal injury, although without a need for clinical intervention the significance of this is doubtful.

Good attempts were made to follow subjects up to ensure no clinically important outcomes were omitted.

What were the results and what does this mean?

There is an important distinction in this paper between the presence of an abdominal injury and one requiring intervention (specified as death, therapeutic intervention at laparotomy, angiographic embolisation, blood transfusion for anaemia or administration of intravenous fluids for at least two nights). This composite reference standard is pragmatic but we could argue about whether intra-abdominal injuries not requiring intervention are also clinically relevant or not, considering the comparative risks of radiation exposure with abdominal CT.

It is worth noting that not all of the 12044 subjects enrolled had CT abdomen performed. Clinician suspicion was recorded for 11919 of them (9252 at <1% and 2667 at ≥1%), and those who were not scanned were presumed to have no injury needing intervention, which we must doubt given the fact that neither clinician suspicion nor clinical prediction rule achieved 100% sensitivity.

The study found that in patients with intra-abdominal injury requiring intervention, the clinician correctly identified the risk as ≥1% in 82.8% (95% CI 77.0-87.3) of cases, and in patients who did not have intra-abdominal injury requiring intervention, the clinician correctly identified that the risk was <1% in 78.7% (95% CI 77.9-79.4%) of cases. Unfortunately this shows that clinician judgement alone is neither sensitive nor specific enough to support decision making in isolation. This is borne out in a high CT abdomen rate in the population, despite a high proportion of low risk patients.

The decision rule, which classed patients as very low risk only if all of the following applied (and as “not low” risk if any one was not met):

  • no evidence of abdominal wall trauma or seat belt sign
  • GCS >13
  • no abdominal tenderness
  • no evidence of thoracic wall trauma
  • no complaints of abdominal pain
  • no decreased breath sounds
  • no history of vomiting after the injury

had better sensitivity (so the absence of these signs performs better as a predictor of the lack of need for CT and intervention) but poorer specificity (i.e. the presence of any sign does not accurately predict a need for intervention).
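Read as a checklist, the rule classes a child as very low risk only when every item in the list holds, and as “not low” risk as soon as any one fails. The sketch below simply encodes that logic; the class and field names are hypothetical, and it is an illustration of a derived, non-validated rule rather than a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class TorsoTraumaFindings:
    abdominal_wall_trauma_or_seat_belt_sign: bool
    gcs: int
    abdominal_tenderness: bool
    thoracic_wall_trauma: bool
    abdominal_pain: bool
    decreased_breath_sounds: bool
    vomiting_after_injury: bool


def is_very_low_risk(f: TorsoTraumaFindings) -> bool:
    """True only when every low-risk criterion in the list is satisfied."""
    return (
        not f.abdominal_wall_trauma_or_seat_belt_sign
        and f.gcs > 13
        and not f.abdominal_tenderness
        and not f.thoracic_wall_trauma
        and not f.abdominal_pain
        and not f.decreased_breath_sounds
        and not f.vomiting_after_injury
    )


well_child = TorsoTraumaFindings(False, 15, False, False, False, False, False)
vomiting_child = TorsoTraumaFindings(False, 15, False, False, False, False, True)
print(is_very_low_risk(well_child), is_very_low_risk(vomiting_child))
```

Because a single positive finding is enough to leave the very low risk group, the rule trades specificity for sensitivity, which is the pattern seen in the results.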

Of note there were three patients whose injuries were not identified by clinician prediction or derived clinical prediction rule, so neither predictor achieved 100% sensitivity.

What can we take from this paper into clinical practice?

We as clinicians rely a lot on clinical judgement but that alone is a poor predictor of the need for intervention for intra-abdominal injury, especially when compared with this non-validated derived prediction rule. Following validation the prediction rule may have some diagnostic utility, especially when combined with observation.

More questions to ask

  • How will this decision rule perform when validated?
  • How would the rule perform if the specificity of clinician judgement was incorporated?

See Also:

St Emlyn’s – RCR Guidelines on imaging in paediatric trauma


Oxygen Saturation Targets in Bronchiolitis – Magic Numbers?


Where can I find this paper?

http://www.ncbi.nlm.nih.gov/pubmed/26382998 – this paper is currently open access

What is this paper about (what is the research question)?

Is a target oxygen saturation of 90% or higher equivalent to 94% or higher for resolution of illness in acute viral bronchiolitis?

Summary of the Paper

Design: multicentre, parallel group, randomised controlled equivalence trial with allocation concealment.

Objective: to determine whether accepting a reduced lower limit target oxygen saturation in infants with viral bronchiolitis affected time to resolution of illness

Primary outcome measure: time to resolution of cough (parental reporting)

Intervention: subjects were randomised following decision to admit, to either standard SpO2 monitoring or a modified oximeter which skewed the reading such that SpO2 90% read as 94%. All other care was standard.

Participants: 615 subjects randomised between 03/10/2011-30/03/2012 and 01/10/2012-29/03/2013. 308 randomised to standard group, 307 to modified oximeter group

  • Inclusions: infants aged 6 weeks to 12 months (corrected gestational age) with clinically diagnosed bronchiolitis admitted to hospital for supportive care following presentation to the Emergency Department or Acute Assessment Area
  • Exclusions: preterm (<37 weeks) who had received oxygen in past 4 weeks; cyanotic or haemodynamically significant heart disease; CF or interstitial lung disease; documented immunodeficiency; direct admission to HDU/ICU; previously randomised

Results: Median time to cough resolution was 15.0 days in both groups with a median difference of 1.0 days (95% CI -1 to 2). This fell between the prespecified equivalence limits of plus and minus two days.


Authors’ Conclusions:

In children with acute viral bronchiolitis, the time taken for symptoms to resolve was the same whether they were managed to a target oxygen saturation of 90% or 94%.

On the study design


This study used eight centres to recruit a sample with 80% power to detect non-equivalence of greater than two days in time to resolution of cough. Cough resolution was determined by parents at pre-determined follow-up phone calls (7, 14, 28 days and 6 months). Where the exact date was not recorded, some allowance was made by randomly selecting a date between the last time the cough was known to be present and the first date it was noted to be absent (if available). This method of reporting does still leave the outcome open to some parental bias and accuracy of reporting cannot be guaranteed.
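The random date selection described above can be sketched as follows. This is only one plausible reading: the function name, the example dates and the choice to draw uniformly from the days after the last "present" report up to and including the first "absent" report are all assumptions.

```python
import random
from datetime import date, timedelta


def impute_resolution_date(last_present, first_absent, rng=None):
    """Pick a random date after last_present, up to and including first_absent."""
    rng = rng or random.Random()
    gap_days = (first_absent - last_present).days
    if gap_days < 1:
        raise ValueError("first_absent must be later than last_present")
    return last_present + timedelta(days=rng.randint(1, gap_days))


# Hypothetical example: cough present at the day 7 call, absent by the day 14 call
last_present = date(2012, 1, 7)
first_absent = date(2012, 1, 14)
estimate = impute_resolution_date(last_present, first_absent, random.Random(1))
print(estimate)
```

With follow-up calls a week or more apart, an individual imputed date could be several days out, which is worth bearing in mind when the equivalence margin is only two days.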

Allocation to a group was concealed until definite enrolment, and the allocation was masked to study staff, hospital staff and parents. It’s not clear why the authors have chosen to use the word “masking” rather than “blinding”.

Several interesting secondary outcomes were also recorded, although it is always worth remembering that studies are designed and powered to detect differences in the primary outcome and may be underpowered to detect differences in secondary outcomes. The authors decided in advance to statistically analyse time until “fit for discharge” and actual discharge date for both groups, along with parental anxiety scores and whether the child was fit to attend daycare.

What were the results and what does this mean?


Following some loss to follow-up and protocol violations, 293 subjects were analysed in the standard group at 6 months and 291 in the modified oximeter group. This still reflects a study population greater than that determined by the power calculation. There was no difference in the median time to cough resolution which was 15.0 days in both groups.

The authors addressed both intention to treat analysis (analysing those subjects with protocol violations – being given the wrong oximeter probe – according to their original allocated group) and per-protocol analysis (analysing them only if they fulfilled the allocation from start to finish) and found this did not affect the results.

The modified group also had quicker return to adequate feeding and “back to normal” time. Patients in the modified group, predictably, received supplemental oxygen in fewer cases, for a shorter period, were considered fit for discharge sooner and were discharged sooner. There were fewer serious adverse events and adverse events in the modified group (35 SAEs in 32 infants in the standard group vs 25 SAEs in 24 infants in the modified group). The modified group had increased HDU admissions (13 episodes in the modified group vs 8 in the standard group) but fewer reattendances (26 in the standard group vs 12 in the modified group).

The authors postulate that having a higher target oxygen saturation influences decisions about fitness for discharge and that the increased use of oxygen in the standard group might have adversely affected feeding through drying of nasal passages, reflected in the time to adequate feeding. They also suggest that increased time in hospital in the standard group might expose these infants to nosocomial infection, causing the increased readmission rate – but of course this is all speculation 🙂

What can we take from this paper into clinical practice?

It seems that infants subjectively recover from bronchiolitis at the same rate even if we target SpO2 90% or above instead of 94% or above. However this was a population for whom a need for admission to hospital had already been identified and the extrapolation of this to the Emergency Department population is not wholly appropriate. We can be reasonably relaxed about SpO2 90-94% in these patients but until further work is done to reflect our undifferentiated population we should probably be careful about assuming we can safely discharge these infants.

More questions to ask

  • Would we see the same resolution and patterns of return to normal behaviour/complications in the undifferentiated ED population of infants with bronchiolitis?

See Also:

Don’t Forget the Bubbles – Tessa Davis reviews a JAMA paper on oxygen saturations in admission decision-making in patients with bronchiolitis – http://dontforgetthebubbles.com/effect-oximetry-hospital-admission-bronchiolitis/



5th April 2013: Prospective Pilot Derivation of a Decision Tool for Children at Low Risk for Testicular Torsion


Where can I find this paper?


What is this paper about (what is the research question)?

Is it possible to exclude a diagnosis of testicular torsion on the basis of history and examination alone?

Summary of the Paper

Design: prospective cohort study for derivation of a clinical decision rule

Objective: to derive a pilot clinical decision tool with 100% NPV for testicular torsion

Outcome: Proposed low-risk decision tree determined by recursive partitioning based on historical and examination variables recorded prior to ultrasonographic or specialist assessment

Reference Standard: presence of testicular torsion defined by: diminished blood flow on testicular doppler US (read by paediatric radiologist), or ischaemic/infarcted testicle at operative assessment (by paediatric surgeon or urologist), or presence of testicular atrophy at 1- to 3-month follow-up (contralateral difference in testicular size as measured by orchidometer)

Participants: Convenience sample of male patients aged 0-21 years with acute (<72h) testicular pain presenting to a tertiary children’s ED between July 2005-February 2008

Results: 228 patients (of 552 eligible patients) were enrolled. 55 of the eligible patients (10%) were diagnosed with testicular torsion, of whom 21 were among those recruited into the study (9.2% of the 228 enrolled).

Odds ratios:

  • Horizontal/inguinal testicular lie OR=18.17 (95%CI 6.2-53.2)
  • Unilaterally or bilaterally absent cremasteric reflex OR=11.01 (95%CI 3.14-38.64)
  • Nausea or vomiting OR=5.63 (95%CI 2.08-15.22)
  • Age 11-21 years OR=3.9 (95%CI 1.27-11.97)
  • Scrotal oedema OR=3.42 (95%CI 1.21-9.69)

Authors’ Conclusions:

Patients with normal testicular lie, without nausea or vomiting, and between the ages of 0-10 years are at low risk for having testicular torsion despite the presence of acute testicular pain. Thus, patients who do not meet all three of these criteria should be considered at risk for possible testicular torsion and should undergo subsequent emergent evaluation.

On the study design

The inclusion and exclusion criteria seem sensible; patients aged 0-21 years with testicular pain of <72h duration were included, and were excluded if they had prior ipsilateral inguinal or urological surgery, a definite hydrocoele or inguinal hernia, or a known diagnosis at initial evaluation. The authors have tried to maximise their awareness of the patient population by using database searches during the study period to identify “missed” participants.

Unfortunately the convenience sample meant that more than half of patients presenting during the study period who were diagnosed with testicular torsion were not included in the data collection. This means the study was underpowered for the question it intended to ask. Convenience sampling is often significantly cheaper and easier than a 24-hr recruiting presence in the ED but as this paper demonstrates it can have a profound effect on the numbers recruited, particularly in conditions which are relatively rare.
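The scale of the recruitment gap can be checked with a little arithmetic. The sketch below uses only the counts quoted in the Results above; the variable and function names are my own and purely illustrative.

```python
# Counts quoted in the Results section of this review.
eligible = 552          # eligible patients presenting during the study period
enrolled = 228          # patients actually recruited (convenience sample)
torsion_eligible = 55   # torsion diagnoses among all eligible patients
torsion_enrolled = 21   # torsion diagnoses among recruited patients


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


recruited_pct = pct(enrolled, eligible)
torsion_missed = torsion_eligible - torsion_enrolled
torsion_missed_pct = pct(torsion_missed, torsion_eligible)

print(f"Eligible patients recruited: {recruited_pct}%")
print(f"Torsion cases not captured: {torsion_missed} of {torsion_eligible} "
      f"({torsion_missed_pct}%)")
```

On these figures roughly three in five torsion cases presenting during the study period never reached the dataset, which is what "more than half" means in practice.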

Various measures have been utilised to minimise the effect of bias; standardised data collection forms are always helpful in this regard. The initial ED assessments were made prior to ultrasound or speciality assessment which acts as a blind assessment, although surgeons and radiologists determining the outcome were not blinded. The authors argue that clinical information is essential in patient care, but many studies use blinded radiological assessment after the event and this could certainly have been undertaken in this case even if the surgeons could not be blinded.

In the UK, it is likely that testicular tissue would be sent for histological diagnosis; arguably, this is a more definitive outcome and could certainly be blinded.

The decision to follow up at 1-3 months with orchidometer measurements when baseline measurements were not taken is an odd one; surely this invites all manner of confounders? Thankfully this did not actually involve any subjects but it seems a strange choice – perhaps an afterthought?

What were the results and what does this mean?

Odds ratios for the various examination and historical findings were given in table 2. These variables were formulated into a decision rule using recursive partitioning.

[Image: Table 2 from the paper – odds ratios for historical and examination findings]

The most strongly predictive finding was abnormal testicular lie, with an odds ratio of 18.17 but a very wide confidence interval (95%CI 6.2-53.2) reflecting the small study numbers.
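One way to get a feel for how wide these intervals are is to look at them on the log scale. The sketch below assumes the quoted intervals are conventional Wald intervals calculated on the log-odds scale; the review does not state the method, so treat that as my assumption. If it holds, the point estimate should sit at the geometric midpoint of the two limits, and the standard error of log(OR) can be back-calculated. The function names are illustrative only.

```python
import math

# Odds ratios and 95% confidence limits as quoted in this review:
# finding -> (odds ratio, lower limit, upper limit)
findings = {
    "Horizontal/inguinal testicular lie": (18.17, 6.2, 53.2),
    "Absent cremasteric reflex": (11.01, 3.14, 38.64),
    "Nausea or vomiting": (5.63, 2.08, 15.22),
    "Age 11-21 years": (3.9, 1.27, 11.97),
    "Scrotal oedema": (3.42, 1.21, 9.69),
}


def geometric_midpoint(lower, upper):
    """Midpoint of the interval on the log scale, transformed back."""
    return math.sqrt(lower * upper)


def implied_log_se(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald 95% interval (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for name, (odds_ratio, lower, upper) in findings.items():
    print(f"{name}: OR {odds_ratio}, "
          f"geometric midpoint {geometric_midpoint(lower, upper):.2f}, "
          f"upper/lower ratio {upper / lower:.1f}, "
          f"implied SE of log(OR) {implied_log_se(lower, upper):.2f}")
```

For abnormal lie the upper limit is more than eight times the lower limit, so the data are compatible with anything from a moderately strong predictor to an overwhelming one.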

The decision rule in itself had the following test characteristics:

  • NPV 100% (95%CI 98-100%)
  • Sensitivity 100% (95%CI 98-100%)
  • Specificity 44% (95%CI 38-50%)
  • PPV 15% (95%CI 11-21%)

Obviously an NPV of 100% and sensitivity of 100% is impressive and important in a rule-out tool such as this, but the specificity and positive predictive value are very low. This would ordinarily expose a large number of patients to further examination and assessment, but as these patients have not yet had doppler examination it may not be unworkable.
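To see how these four figures hang together, here is a minimal sketch of the underlying 2x2 table. The cell counts are not taken from the paper: I have back-calculated them from the percentages quoted in this review (21 torsion cases among 228 enrolled, 100% sensitivity, roughly 44% specificity), so they are an assumption for illustration only.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the standard test characteristics from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# ASSUMED counts, reconstructed from the percentages in this review,
# not read from the paper itself.
tp = 21    # torsion, flagged by the rule as not low risk
fn = 0     # torsion, classified as low risk (sensitivity 100%)
tn = 91    # no torsion, classified as low risk (about 44% of 207)
fp = 116   # no torsion, flagged as not low risk

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")

# Proportion of all enrolled patients the rule would send on for
# further assessment.
flagged = (tp + fp) / (tp + fp + fn + tn)
print(f"Flagged as not low risk: {flagged:.0%}")
```

With these assumed counts about 60% of enrolled patients would still go forward for further assessment, which is the practical cost of the low specificity.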

However, this rather raises the question – if I saw a 7-year-old patient with testicular pain and vomiting, would I really need this decision rule to tell me that he needed further assessment to exclude testicular torsion?

What can we take from this paper into clinical practice?

I don’t think that at this stage we can rely fully on the combination of normal testicular lie, no nausea/vomiting and age 0-10 years to exclude testicular torsion as a diagnosis in patients with acute testicular pain in the ED, but it will be interesting to see how the proposed decision tool performs in external validation.

However, taking a step back, we are able to see that what this paper is trying to do is formalise the process of diagnostic suspicion of testicular torsion. We have little information about the skill and experience levels of the ED physicians performing the initial assessment. Does this paper tell us anything we don’t already know as clinicians?

Well, maybe yes – it looks as though we can be a little reassured by the group of patients aged 0-10 years without abnormal lie or nausea/vomiting. The use of sensitivity analysis adds to this – the authors have included patients lost to follow-up and assumed that they had torsion, finding that the decision rule performed just as well.

However, we really need to see how the rule performs in a fresh setting when applied to all patients rather than a convenience sample.

More questions to ask

  • How would this rule perform in a different setting – an external ED or even in general practice?
  • Does this decision process reduce our referrals for expert assessment/doppler US or does the low specificity/PPV represent a potential increase in referral, time and cost?
