Category Archives: Critical Appraisal

Critical Appraisal of a Paper

5th July 2013: Comparison of cosmetic outcomes of absorbable versus nonabsorbable sutures in pediatric facial lacerations


Where can I find this paper?

What is this paper about (what is the research question)?

Do non-absorbable and absorbable sutures give comparable cosmetic results for repair of simple facial wounds in kids?

Summary of the Paper

Design: multicentre, randomised controlled, single blinded trial with allocation concealment

Objective: to compare long-term cosmetic outcomes of absorbable versus non-absorbable sutures based on physician scoring of facial lacerations in the paediatric population

Outcomes: primary – visual analogue scale assessment of wound acceptability made by physicians, blinded to suture material, at 3 months. Secondary – caregiver completion of same visual analogue scale plus completion of satisfaction questionnaire

Intervention: closure of wound by standard approach using 5.0 fast-absorbing surgical gut (FAC) without removal of sutures

Reference Standard: closure of wound by standard approach using 5.0 non-absorbable suture (NYL) with removal of sutures at 4-7 days

Participants: patients presenting to two urban paediatric EDs in Philadelphia April 2008-April 2010

Inclusion – English speaking patients aged 1-18 years with isolated, non-contaminated linear facial wounds between 1-5cm in length assessed by clinicians as requiring closure by suture

Exclusion – irregular or contaminated wounds/bites, wounds>8h old, patients with complex wounds, immunodeficiency, bleeding/clotting disorder, pregnancy, diabetes, renal dysfunction, or allergy to local anaesthetic

Results: 98 patients were recruited, of whom 49 had closure with FAC and 49 with NYL. 85 were followed up at 4-7 days (42 FAC, 43 NYL) and 76 at 3 months in person or by telephone (FAC 37, NYL 39). Telephone follow-up did not include VAS score.

61 patients had completed VAS scores at 3/12 (FAC 29, NYL 32)

Mean VAS scores by physicians:

FAC 57.6, NYL 67.6

Difference in means -10 (95% CI for difference in means -19.6 to -0.4) 
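As a quick sanity check on those figures, here is a minimal Python sketch that works backwards from the reported interval. It assumes the interval is a symmetric normal-approximation interval, which the summary above does not state, and `implied_se` is my own illustrative helper rather than anything from the paper.

```python
from statistics import NormalDist


def implied_se(ci_low, ci_high, level=0.95):
    """Standard error implied by a CI, ASSUMING a symmetric normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (ci_high - ci_low) / (2 * z)


# Figures as reported in the summary above
mean_fac, mean_nyl = 57.6, 67.6
ci_low, ci_high = -19.6, -0.4

diff = mean_fac - mean_nyl                  # FAC minus NYL
se = implied_se(ci_low, ci_high)            # back-calculated, not reported by the authors
excludes_zero = ci_high < 0 or ci_low > 0   # does the whole interval sit on one side of zero?

print(f"difference in means: {diff:.1f}")
print(f"implied standard error: {se:.1f}")
print(f"interval excludes zero: {excludes_zero}")
```

The whole interval sits below zero, so on these numbers the absorbable group scored lower by somewhere between a fraction of a point and nearly 20 points on the VAS.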

Authors’  conclusions

We are not yet able to conclude that absorbable sutures are equivalent to nonabsorbable sutures with respect to cosmetic outcomes of facial lacerations in children.

On the study design

There is little information on how patients were recruited but, other than the restriction to English-speaking patients, the inclusion and exclusion criteria seem sensible.

The allocation concealment and blinding are helpful in reducing bias, but I would question whether leaving absorbable sutures until completely absorbed is standard practice – it isn’t mine, and therefore this impacts the external validity of the study.

The plan for follow-up at 3/12 seems sensible and is rationalised by the authors but this seems early to fully assess the “long-term” impact of wound closure.

While the exact suture material does not necessarily replicate standard UK practice it is reasonable to assume little difference between non-absorbable and absorbable suture material around the globe.

What were the results and what does this mean?

The trial is a non-inferiority trial – the aim is to show that using absorbable suture material does not give a perceptibly inferior cosmetic result. The visual analogue scoring undertaken by blinded physicians (and averaged between three scorers) showed not only lower VAS satisfaction scores for the absorbable suture group but a 95% confidence interval which did not cross zero, suggesting the study was unable to demonstrate non-inferiority. The validity of the VAS has been assessed elsewhere but there is a considerable difference between physician and caregiver scores.

It is also important to remember that despite sample size calculations which predicted attrition of 40%, only 61/98 recruited patients actually completed the full study protocol and had photographs for assessment by VAS – so the study was insufficiently powered.

What can we take from this paper into clinical practice?

It appears that if we use absorbable sutures and don’t remove them, there are noticeable differences in wound healing at 3/12; there’s insufficient evidence in this paper to convince us that not removing sutures provides a comparable cosmetic result in the first three months.

More questions to ask

  • Are there benefits to using absorbable sutures and then removing them (in the same timeframe as we would normally remove non-absorbable sutures)?
  • Would we see non-inferiority at a later review – 18 months after closure perhaps?
  • Would we see non-inferiority in an appropriately powered study?

Follow us on twitter: @PEMLit

12th April 2013: Electrolyte Profile of Paediatric Patients with Hypertrophic Pyloric Stenosis


Where can I find this paper?

What is this paper about (what is the research question)?

In paediatric patients with hypertrophic pyloric stenosis (HPS), what is the prevalence of abnormal laboratory results?  Are these results related to the duration of illness (by duration of vomiting), and is there any time trend in these results?

Summary of the Paper

Design: Retrospective chart review

Objective: To investigate the incidence and prevalence of abnormal laboratory results in patients with a radiological and operative diagnosis of HPS


Outcomes:

Primary – prevalence of high, low and normal CO2, K and Cl in HPS cases

Secondary – trend in prevalence of metabolic alkalosis and acidosis in HPS cases over the study period

Tertiary – association between days of vomiting and abnormal CO2, K and Cl

Reference Standard: Normal range laboratory results for the facility

Participants: Patients younger than 6 months, with HPS confirmed on ultrasound or Upper GI series, who underwent pyloromyotomy at a tertiary regional paediatric centre from 2000-2009.


Results: 205 patients were included in the study. Their age varied from 1.4 to 13.9 weeks (SD 2.2), with a weight range of 2.1 to 4.9kg (SD 0.5). 88.3% were male. 74.3% were of non-Hispanic ethnicity. By race, 80.5% were white, 1.5% African-American, 1.5% Asian and 16.5% other.

The proportion of HPS cases with normal serum CO2 was 62%, low 20%, and high CO2 18%.  Potassium was normal in 57%, low in 8% and high in 35% of cases.  Chloride was normal in 69%, low in 25% and high in 6% of cases.

Logistic regression analysis of the proportion of normal, low and high CO2 over the study period showed an increase in the prevalence of metabolic alkalosis (p=0.009) and a decrease in metabolic acidosis (p=0.002).

Advancing age was associated with presence of metabolic alkalosis on presentation with HPS (data not provided).

There was no correlation between the number of days of vomiting and abnormalities in electrolytes in this study population.

Authors’  conclusions

We observed that normal laboratory values are the most common finding in HPS and that metabolic alkalosis was found more commonly in the latter part of the decade and in older infants.

On the study design

This was a retrospective chart review for a 10 year period from 2000-2009.  Data from 2000-2002 was combined to increase power because the case numbers in single years were “small and unstable.”  It’s not clear what they mean by “unstable” as the raw data is not provided.

The authors do not comment on the total number of presentations over the study period, so it’s unclear if any cases were excluded, and reasons for any such exclusions.

There is demographic data missing with respect to birth weight (138/205), days of vomiting (196/205), heart rate at presentation (203/205) and weight at presentation (204/205).  The latter categories are unlikely to have been affected by this, and it is unclear whether additional data on duration of vomiting would have changed the analysis.

Prospective studies have the advantage of more complete data sets and the potential for further variables to be included, but they can introduce observation/measurement bias.

What were the results and what does this mean?

Normal laboratory values are the most common finding in HPS and therefore  serum electrolytes are a poor marker for the presence or absence of HPS.

CO2 normal 62% low 20% high 18%

K normal 57% low 8% high 35%

Cl normal 69%  low 25%  high 6%

The incidence of metabolic alkalosis increased over the study period, and its prevalence is higher in older infants. 

They have no explanation for the increase in metabolic alkalosis over the decade of the study.

The authors postulate that the latter finding may demonstrate that advanced age at diagnosis serves as a marker for the duration and severity of stenosis.

What can we take from this paper into clinical practice?

This paper agrees with previous studies that the “typical metabolic picture” of hypochloraemic hypokalaemic metabolic alkalosis in paediatric HPS is no longer seen in the majority of presentations.

For us this means that we cannot rely on laboratory results as a marker for hypertrophic pyloric stenosis in infants.  We must continue to have a high index of suspicion for this condition in infants presenting with persistent vomiting and proceed to ultrasound for diagnosis.

Although laboratory results don’t help us decide which children need ultrasound, it is important to look for metabolic derangements and correct them as indicated.

What this study adds is that, contrary to previous beliefs, there is no relationship between the duration of illness (particularly of vomiting) and the severity of metabolic derangements in these children. This seems counter-intuitive, and perhaps the more important factor is not the duration of vomiting but whether the infants are able to keep down an adequate amount of fluids, i.e. the severity of dehydration.

Unfortunately there was insufficient data in the patient charts to enable analysis of trends between dehydration, vomiting and abnormal laboratory results.  Only 43/205 (21%) charts mentioned hydration status, however 42% of the patients whose charts noted dehydration (36/205) had metabolic alkalosis at presentation, compared to 44% with normal CO2.

More questions to ask

  • Was there a higher proportion of males in this group than in other populations?
  • Why was delayed presentation (60 days of vomiting in one case) not associated with more severe illness?
  • An insufficient number of charts contained information about hydration status – is this more relevant for laboratory abnormalities than days of vomiting?


5th April 2013: Prospective Pilot Derivation of a Decision Tool for Children at Low Risk for Testicular Torsion


Where can I find this paper?

What is this paper about (what is the research question)?

Is it possible to exclude a diagnosis of testicular torsion on the basis of history and examination alone?

Summary of the Paper

Design: prospective cohort study for derivation of a clinical decision rule

Objective: to derive a pilot clinical decision tool with 100% NPV for testicular torsion

Outcome: Proposed low-risk decision tree determined by recursive partitioning based on historical and examination variables recorded prior to ultrasonographic or specialist assessment

Reference Standard: presence of testicular torsion defined by: diminished blood flow on testicular doppler US (read by paediatric radiologist), or ischaemic/infarcted testicle at operative assessment (by paediatric surgeon or urologist), or presence of testicular atrophy at 1- to 3-month follow-up (contralateral difference in testicular size as measured by orchidometer)

Participants: Convenience sample of male patients aged 0-21 years with acute (<72h) testicular pain presenting to a tertiary children’s ED between July 2005-February 2008

Results: 228 patients (of 552 eligible patients) were enrolled. 55 (10% of eligible patients) were diagnosed with testicular torsion, of whom 21 were among those recruited into the study (9.2% of the 228 enrolled).

Odds ratios:

  • Horizontal/inguinal testicular lie OR=18.17 (95%CI 6.2-53.2)
  • Unilaterally or bilaterally absent cremasteric reflex OR=11.01 (95%CI 3.14-38.64)
  • Nausea or vomiting OR=5.63 (95%CI 2.08-15.22)
  • Age 11-21 years OR=3.9 (95%CI 1.27-11.97)
  • Scrotal oedema OR=3.42 (95%CI 1.21-9.69)

Authors’ Conclusions:

Patients with normal testicular lie, without nausea or vomiting, and between the ages of 0-10 years are at low risk for having testicular torsion despite the presence of acute testicular pain. Thus, patients who do not meet all three of these criteria should be considered at risk for possible testicular torsion and should undergo subsequent emergent evaluation.

On the study design

The inclusion and exclusion criteria seem sensible; patients were included in the age 0-21 group with testicular pain of <72h duration, and subsequently excluded if they had prior ipsilateral inguinal or urological surgery, definite hydrocoele or inguinal hernia or known diagnosis at initial evaluation. The authors have tried to maximise their awareness of the patient population by using database searches during the study period to identify “missed” participants.

Unfortunately the convenience sample meant that more than half of patients presenting during the study period who were diagnosed with testicular torsion were not included in the data collection. This means the study was underpowered for the question it intended to ask. Convenience sampling is often significantly cheaper and easier than a 24-hr recruiting presence in the ED but as this paper demonstrates it can have a profound effect on the numbers recruited, particularly in conditions which are relatively rare.
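To put a number on “more than half”, here is a small Python sketch using only the counts quoted in the results above. The variable names are mine, for illustration only.

```python
# Counts as reported in the results section of this appraisal
eligible = 552          # all eligible patients in the study period
enrolled = 228          # convenience sample actually recruited
torsion_all = 55        # torsion diagnoses among all eligible patients
torsion_enrolled = 21   # torsion diagnoses among enrolled patients

torsion_missed = torsion_all - torsion_enrolled

share_enrolled = enrolled / eligible                 # proportion of eligible patients recruited
share_torsion_missed = torsion_missed / torsion_all  # proportion of torsion cases not captured
prevalence_eligible = torsion_all / eligible         # torsion rate among all eligible patients
prevalence_enrolled = torsion_enrolled / enrolled    # torsion rate within the sample

print(f"eligible patients enrolled: {share_enrolled:.1%}")
print(f"torsion cases missed: {torsion_missed} of {torsion_all} ({share_torsion_missed:.1%})")
print(f"torsion prevalence: eligible {prevalence_eligible:.1%}, enrolled {prevalence_enrolled:.1%}")
```

The prevalence of torsion is similar in the enrolled and eligible groups, but the absolute number of torsion cases available for deriving the rule is small.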

Various measures have been utilised to minimise the effect of bias; standardised data collection forms are always helpful in this regard. The initial ED assessments were made prior to ultrasound or speciality assessment which acts as a blind assessment, although surgeons and radiologists determining the outcome were not blinded. The authors argue that clinical information is essential in patient care, but many studies use blinded radiological assessment after the event and this could certainly have been undertaken in this case even if the surgeons could not be blinded.

In the UK, it is likely that testicular tissue would be sent for histological diagnosis; arguably, this is a more definitive outcome and could certainly be blinded.

The decision to follow up at 1-3 months with orchidometer measurements when baseline measurements were not taken is an odd one; surely this invites all manner of confounders? Thankfully this did not actually involve any subjects but it seems a strange choice – perhaps an afterthought?

What were the results and what does this mean?

Odds ratios for the various examination and historical findings were given in table 2. These variables were formulated into a decision rule using recursive partitioning.


The most strongly predictive finding was abnormal testicular lie, with an odds ratio of 18.17 but a very wide confidence interval (95%CI 6.2-53.2) reflecting the small study numbers.
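A rough way to see how wide that interval is: confidence intervals for odds ratios are usually built on the log scale, so the point estimate should sit at the geometric midpoint of the two limits. The Python sketch below assumes the authors used the usual exp(ln(OR) ± 1.96 × SE) construction, which the appraisal does not confirm, and the helper names are my own.

```python
import math


def implied_log_se(ci_low, ci_high, z=1.96):
    """SE of ln(OR) implied by a 95% CI, ASSUMING it was built as exp(ln(OR) +/- z*SE)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def geometric_midpoint(ci_low, ci_high):
    """Point estimate implied by an interval that is symmetric on the log scale."""
    return math.sqrt(ci_low * ci_high)


# Abnormal testicular lie, as reported: OR 18.17 (95% CI 6.2-53.2)
se = implied_log_se(6.2, 53.2)
mid = geometric_midpoint(6.2, 53.2)
ratio = 53.2 / 6.2  # how many times larger the upper limit is than the lower

print(f"implied SE of ln(OR): {se:.2f}")
print(f"geometric midpoint of the CI: {mid:.2f}")
print(f"upper limit / lower limit: {ratio:.1f}")
```

The geometric midpoint lands very close to the reported odds ratio, which fits the log-scale assumption, and the upper limit is more than eight times the lower one.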

The decision rule in itself had the following test characteristics:

  • NPV 100% (95%CI 98-100%)
  • Sensitivity 100% (95%CI 98-100%)
  • Specificity 44% (95%CI 38-50%)
  • PPV 15% (95%CI 11-21%)

Obviously an NPV of 100% and sensitivity of 100% is impressive and important in a rule-out tool such as this, but the specificity and positive predictive value are very low. This would ordinarily expose a large number of patients to further examination and assessment, but as these patients have not yet had doppler examination it may not be unworkable.
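For anyone who wants to see how these four figures hang together, here is a minimal Python sketch of the standard 2x2 calculations. The cell counts are my own back-calculation, not the authors’ table: I have assumed 228 enrolled patients with 21 torsions, no missed torsions, and a true-negative count chosen to reproduce the reported specificity.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard test characteristics from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # torsion cases the rule flags
        "specificity": tn / (tn + fp),  # non-torsion cases the rule clears
        "ppv": tp / (tp + fp),          # flagged patients who have torsion
        "npv": tn / (tn + fn),          # cleared patients who do not
    }


# ASSUMED counts, reconstructed from the reported percentages (not from the paper):
# 21 torsions all flagged; 207 non-torsions, of whom 91 are classed low risk.
metrics = diagnostic_metrics(tp=21, fp=116, fn=0, tn=91)

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

On these assumed counts the rule would clear 91 of 228 patients, and the remaining 137 would still go on to further assessment.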

However, this rather raises the question – if I saw a 7-year-old patient with testicular pain and vomiting, would I really need this decision rule to tell me that he needed further assessment to exclude testicular torsion?

What can we take from this paper into clinical practice?

I don’t think that at this stage we can rely fully on the absence of abnormal lie, nausea/vomiting and age <10 years to exclude testicular torsion as a diagnosis in patients with acute testicular pain in the ED, but it will be interesting to see how the proposed decision tool performs in external validation.

However, taking a step back, we are able to see that what this paper is  trying to do is formalise the process of diagnostic suspicion of testicular torsion. We have little information about the skill and experience levels of the ED physicians performing the initial assessment. Does this paper tell us anything we don’t already know as clinicians?

Well, maybe yes – it looks as though we can be a little reassured by the group of patients aged <10 without abnormal lie or nausea/vomiting. The use of sensitivity analysis adds to this – the authors have included  patients lost to follow-up and assumed that they had torsion, finding that the decision rule performed just as well.

However, we really need to see how the rule performs in a fresh setting when applied to all patients rather than a convenience sample.

More questions to ask

  • How would this rule perform in a different setting – an external ED or even in general practice?
  • Does this decision process reduce our referrals for expert assessment/doppler US or does the low specificity/PPV represent a potential increase in referral, time and cost?


8th February 2013: Impact of Duration of CPR on Survival and Neurological Outcomes in Paediatric Cardiac Arrest


Where can I find this paper?

What is this paper about (what is the research question)?

Does a longer duration of CPR in paediatric cardiac arrest have prognostic bearing on the likelihood of a positive ultimate outcome (i.e. survival to discharge with intact neurological function)?

Summary of the Paper

Design: analysis of data from prospective multicentre registry (cohort study)

Objective: to use the GWTG-R data to evaluate the relationship between CPR duration and intact survival to hospital discharge after paediatric in-hospital cardiac arrest according to illness category

Primary Outcome: survival to hospital discharge. Secondary outcomes included survival with favourable neurological outcome.

Population: patients <18 years of age suffering pulseless cardiac arrest in-hospital at one of 328 US and Canadian institutions between 1st Jan 2000 and 31st Dec 2009.

  • Inclusion: At least one minute chest compressions provided
  • Exclusion: Events beginning outside hospital or in NNU, delivery room or nursery, illness categories newborn, obstetric or other illness

Results: 3419 paediatric in-hospital cardiac arrests fulfilled the criteria.

Median (IQR) CPR duration was 10 minutes (4-25) for survivors and 25 minutes (12-45) for non-survivors. Survival to discharge was 27.9%; 19.0% of all cardiac arrest patients had a favourable outcome, representing 68.2% of the survivors of the initial insult.

Survival rate and favourable neurological outcome fell linearly in the first 15 minutes; neurological outcome decreased by 1.2% for each additional minute of chest compressions.

Authors’ Conclusions:

CPR duration was inversely associated with survival to hospital discharge and neurological outcome, even after adjustment for confounding factors. Surgical cardiac patients had improved outcomes compared with patients in all other illness categories. Importantly, this study suggests that a proportion of children who would presumably die without CPR survive with a favorable neurological outcome even after prolonged CPR.

Why was the study necessary?

This article was brought to my attention via twitter with the headline “among survivors favourable neurological outcome occurred in 60% undergoing CPR > 35 minutes”. On checking this was indeed word perfect from the abstract. This took me aback as this was not quite my experience, even with the non-survivor group excluded. A paper definitely worth a closer look then!

The article starts with the pretext of the (relatively) low cardiac arrest rate (0.7-3%) in children admitted to hospital with a note that, although still low, survival rates have improved in the last decade. The authors acknowledge a paediatric mantra that CPR beyond 20 minutes or 2 rounds of adrenaline is generally futile although, interestingly, the evidence for this has never been robustly demonstrated. A large prospective cohort study then seems a very reasonable undertaking. The American Heart Association’s “Get with the Guidelines-Resuscitation” (AHA GWTG-R) initiative provides a national registry in which the relationship between in-hospital CPR duration and intact survival can be explored.

On the study design

The AHA-GWTG-R is a prospective registry with data from 328 US and Canadian hospitals between 2000 and 2009. In order to categorise patients pre arrest the following categories were used with those with DNAR orders excluded:

  • Medical Illness (non-cardiovascular)
  • Medical Illness (cardiovascular)
  • Surgical Illness (non-cardiovascular)
  • Surgical Illness (cardiovascular)
  • Trauma

Patients must have had >1 min of CPR provided, and it is worth noting that, as well as inpatients, those in outpatient clinics were included. Those in whom the event started out of the hospital were excluded. Anyone receiving >180 mins of CPR was recorded as having received 180 mins. The primary outcome measure was survival to hospital discharge; secondary measures included return of spontaneous circulation >20 mins, 24-hour survival and discharge with favourable neurological outcome, defined as a Paediatric Cerebral Performance Category (PCPC) of 1, 2 or 3 on discharge. The PCPC is shown in table one.


The authors are open about the fact that, as this is a multi-centre study, classifications between hospitals may differ. They are even frank enough to state that, as the users of the AHA-GWTG-R pay a fee to do so, they may be more interested in outcomes than other hospitals (the AHA-GWTG-R is used by 10% of hospitals).

It could be argued that the included ED patients with in-hospital arrests are very different from patients who arrest on the wards, but the differences between these two groups were not broken down. This would have been very useful information for the emergency care community.

What were the results and what does this mean?

The data collected included 5922 events, all of which were accounted for in an Utstein diagram (figure 1).


Key demographic details in the study were that the mean age was 4.9 ± 6 years, 8% of events were not witnessed but nearly all (90.5%) were monitored. The largest group of patients was General Medical (43.2%). The median CPR duration was 10 minutes for survivors and 25 minutes for non-survivors. Survival to discharge was 27.9%, but only 19.0% of all cardiac arrest patients had a favourable outcome as per the PCPC. Both survival rate and favourable neurological outcome fell linearly in the first 15 minutes, with neurological outcome decreasing by 1.2% for each additional minute of chest compressions. However, although only 19% had a favourable neurological outcome, this represented 68.2% of the survivors of the initial insult.


The headline figure of 60.1% comes from the 158 survivors who received CPR for more than 35 minutes, 95 of whom had a favourable outcome. Surgical cardiac patients had the highest adjusted probability of favourable neurological outcome, with medical, general surgical and general medical similar to the whole cohort. Traumatic arrest had the poorest outcome, at 4.3%.
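It helps to see how much the choice of denominator matters for these percentages. The Python sketch below uses only figures quoted in this appraisal; the variable names are mine, and the recomputed survivor proportion differs slightly from the reported one because I am working from rounded percentages.

```python
# Figures quoted in this appraisal
survival_to_discharge = 0.279       # proportion of all arrests surviving to discharge
favourable_of_all = 0.190           # proportion of all arrests with a favourable PCPC
survivors_over_35 = 158             # survivors who had >35 minutes of CPR
favourable_survivors_over_35 = 95   # of those, survivors with a favourable outcome

# Favourable outcome among survivors = favourable among all / survival rate
favourable_among_survivors = favourable_of_all / survival_to_discharge

# The headline figure uses survivors only as its denominator
headline = favourable_survivors_over_35 / survivors_over_35

print(f"favourable outcome, all arrests: {favourable_of_all:.1%}")
print(f"favourable outcome, survivors only: {favourable_among_survivors:.1%}")
print(f"favourable outcome, survivors with >35 min CPR: {headline:.1%}")
```

The headline describes survivors only; it says nothing about how many children who received more than 35 minutes of CPR did not survive, which is not given in the figures quoted here.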

The study confirms that generally the outcome after arrest is poor; however, it demonstrated that there is variation in the outcome dependent on the type of patient. Interestingly there was an indication that continuing CPR for more than 20 minutes may be justified given the proportion of those who survived with a positive neurological outcome, using the PCPC classification system. It is important to note that overall surviving numbers were low and there was no a priori attempt to predict in which patient group prolonged CPR may be beneficial.

The biggest challenge with this study is the use of the PCPC and the classification of 3 as a favourable outcome. Classification 3 is a moderate disability but is quite different from 4 (severe; dependent on others for daily support). The distinction is not always clear cut and an excess of grade 3 due to the strict classification may bias the results, especially as this was done on discharge when a more favourable grade may be applied.

Finally, although the breakdown into 0-15, 15-35 and >35 seems reasonable, there were patients receiving up to and above 180 minutes of resuscitation. It would be useful to divide resuscitation durations into standard, prolonged, very prolonged and unique for the purposes of evaluation, as the patient groups, and potentially the teams working on them, are likely to be very different.

What can we take from this paper into clinical practice?

This study is the largest of its kind and is an extremely useful platform on which to base further research. It will be important that the neurological outcome is clearly defined and followed up for a reasonable period of time. The distinction between medical groups needs to be taken into account and for the emergency care community it is validation of the extremely poor survival and outcome of traumatic cardiac arrest. Until further work is performed it will be difficult to extend the ’20 minute rule’ but it is vital this work is performed. Working in an area as ethically challenging as resuscitation requires a clear evidence base and work both in and out of hospital.

More questions to ask

  •  Do rates differ between ward patients and those suffering cardiorespiratory arrest within the Emergency Department?
  • How would results change if we defined “favourable neurological outcome” more clearly?


1st February 2013: Plain XR for Paediatric Patellar Subluxation


Where can I find this paper?

What is this paper about (what is the research question)?

Do plain post-reduction XRs provide clinically useful and ED management-altering information in paediatric patients with patellar subluxation?

Summary of the Paper

Design: retrospective chart review

Objective:  to estimate the incidence of fractures detected on post-reduction XRs for patients with lateral patellar subluxation and to identify whether (and how) the presence or absence of a fracture alters ED management

Outcome: primary – presence or absence of fracture on post-reduction XRs for patients with lateral patellar subluxation. Secondary – differences in ED management between patients with and without fractures on plain post-reduction films.

Population:  patients <21 years presenting to the ED of a tertiary children’s hospital between January 1st 2000 and 31st December 2010

  • Inclusion: ICD coding related to patellar dislocation with reduction in the ED
  • Exclusion: patients with medial or intra-articular dislocations, patients with spontaneous reduction

Results: 80 patients identified of whom 79 (98.8%) underwent reduction of their dislocation/subluxation in the ED.

11 (13.7%) of patients had a pre-reduction XR – none of these had a fracture identified by radiologist report.

74 (92.5%) of patients had a post-reduction XR – fractures were identified in 8 cases (10%: 95% CI 3-17)
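The quoted interval can be reproduced with a simple normal-approximation (Wald) calculation, as in the Python sketch below. This is my assumption about the method and the denominator: it matches the reported 3-17% if all 80 patients are used, but the paper’s actual approach is not described in this appraisal. `wald_ci` is an illustrative helper.

```python
import math


def wald_ci(events, n, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% CI. Crude for small counts."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


# 8 fractures, using all 80 patients as the denominator (ASSUMED)
p_all, low_all, high_all = wald_ci(8, 80)
print(f"8/80: {p_all:.1%} (95% CI {low_all:.1%} to {high_all:.1%})")

# For comparison: only the 74 patients who actually had a post-reduction XR
p_xr, low_xr, high_xr = wald_ci(8, 74)
print(f"8/74: {p_xr:.1%} (95% CI {low_xr:.1%} to {high_xr:.1%})")
```

Either way the interval is wide, as you would expect with only 8 fractures.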

Patients with both pre- and post-reduction XR had a longer length of ED stay (median 3.4h, range 1.5-5.2h) compared with those receiving a single XR set only (median 1.9h, range 0.6-6.0h).

All patients, regardless of presence or absence of fracture, had uneventful reduction and were discharged with knee immobilisation and outpatient follow-up.

Authors’ Conclusions:

Pediatric patients with lateral patellar dislocations may be candidates for discharge from the ED after reduction without plain radiography. The modality by which to best determine the presence of a complicating osteochondral fracture (i.e., plain radiography, computed tomography, MRI, or arthroscopy) may be left to the discretion of the orthopedic surgeon accepting the child in follow-up.

On the study design

A nice short paper this week – and a relevant question – do we need to XR knees post-reduction of patellar subluxation/dislocation?

A couple of methodological issues with this one though. Retrospective studies are always open to bias – in this case, relying on ICD-9 classification introduces a potential for selection bias as we are reliant upon the accuracy of coded data to identify our patient cohort. However, with such small numbers (only 80 patients in an 11-year period) a prospective study is unlikely to generate enough subjects to maintain momentum and as such retrospective data collection is far more pragmatic.

The wide timeframe, while providing a reasonable sample size (albeit difficult to gauge in terms of its epidemiological accuracy) does open the study up to the confounding effect of changes in practice. Arguably the introduction of alternative imaging modalities (ultrasound) and the relative availability of MRI scan might impact clinician decisions regarding whether to perform post-reduction XRs. The study can give us no account of this.

The other major issue is the lack of blinding in the data collection stage; this paragraph of the methods section is particularly interesting as it sounds as though the research assistant was specifically trained to identify qualities and points of interest among the identified case notes – this would almost certainly have introduced an element of observation bias, exacerbated by the use of a single unblinded data collector.

Still, formal radiology reporting was used to determine the presence or absence of fractures – a reasonable standard. Given that the article talks about the use of other imaging modalities, it’s hard not to wonder how the radiology report might have been influenced. The reporting radiologists were almost certainly not blinded to the clinical data; were CT/MR reports also available which might have “added” to the interpretation of plain films? When were reports made relative to the injury and availability of other imaging?

What were the results and what does this mean?

First, let’s look at pre-reduction films. These were taken in 11 patients. We know from the paper that in 79/80 patients the clinician had documented that there was clinically visible displacement of the patella laterally. So why the XRs? Habit perhaps? In fact, of those without pre-reduction XRs, 68/69 had specific documentation about the clinically visible displacement (so that’s where the 1/80 was) – simple maths tells us that all 11 patients having pre-reduction films had dislocation/subluxation apparent on examination alone.

And for the post-reduction films; these were performed in 74/80 patients (92.5%). 8 patients had fractures (10%), of whom none had pre-reduction films taken. None of the patients required intervention beyond ED reduction apart from a multiply injured patient whose patella was reduced in theatre while other injuries were being treated.

So, for all the methodological problems, it doesn’t look as though plain films – with or without fractures – change our ED management.

What can we take from this paper into clinical practice?

Well, from this small and moderately flawed study, it doesn’t look as though plain XRs add anything to ED management of patellar subluxation/dislocation. I certainly can’t think of a time in my clinical practice when a post-reduction film has led to admission (assuming reduction was successful, of course). So why do we do them?

What we don’t know is how these patients are subsequently managed at outpatient clinic. While CT/MR scans are more sensitive (see discussion section of paper for references) for identifying osteochondral fractures, does the presence of a fracture on plain film have important prognostic significance? Does it lead to earlier operative intervention or increased likelihood of operative management?

So not quite enough to throw plain films out altogether, but certainly worth exploring with a longer study period to include follow-up, together with some good quality orthopaedic opinion (other than, “get an x-ray because that’s what we do”).

More questions to ask

  • Can this data be extrapolated to patients with spontaneous reduction, and are these patients routinely x-rayed in any case?
  • Would a period of follow-up including outpatient review change the outcomes of this paper? Would we discover that the patients with fractures who were discharged should have had emergency treatment?
  • How does this sit with our orthopaedic colleagues?

Follow us on twitter: @PEMLit

25th January 2013: Derivation of a Clinical Prediction Rule for Non-Accidental Head Injury


Where can I find this paper?

What is this paper about (what is the research question)?

Can we identify children whose head injury is non-accidental using the presence or absence of certain clinical criteria?

Summary of the Paper

Design: multicentre prospective observational cross-sectional study

Objective: to identify and measure relationships between clinical variables and non-accidental head injury at time of PICU admission

Tests of Interest: a list of clinical and radiological findings

Reference Standard: a priori definition criteria for non-accidental head injury

Primary outcome: test characteristics with sensitivity, specificity and reliability, combined into a decision tree

Population: PICU patients across 14 participating sites, recruited between Feb 2010 and August 2011

  • Inclusion: children <3yrs of age admitted to PICU for treatment of symptomatic acute closed traumatic head injuries
  • Exclusion: pre-existing brain malformation identified on CT, absence of acute head trauma, head trauma from RTC.

Results: 209 patients recruited, of whom 45% (95) met one or more of the criteria for non-accidental head injury defined a priori. Of these patients, abuse was admitted by the perpetrator in 14 cases and moderately-strongly suspected (2+ extracranial injuries) in 53 cases.

20 variables were both reliable and discriminating, of which 13 were based on information available at or near to the time of PICU admission. Binary recursive partitioning identified five variables present at or near the time of PICU admission which, when used alone or in combination, identified 92 (97%) of those meeting the a priori defined criteria:

  1. acute respiratory compromise prior to admission
  2. seizures or acute encephalopathy
  3. bruising of the ear, neck or torso
  4. interhemispheric or bilateral subdural haemorrhage or fluid collection
  5. skull fracture other than isolated linear non-diastatic parietal fracture

For the five-part rule:

Sensitivity 0.97 (95% CI 0.90-0.99)

Specificity 0.27 (95% CI 0.20-0.37)

Positive predictive value 0.53 (95% CI 0.45-0.60)

Negative predictive value 0.91 (95% CI 0.75-0.98)

LR+ 1.33 (95% CI 1.18-1.50)

LR- 0.12 (95% CI 0.04-0.37)

Authors’ Conclusions:

Once validated, the rule could be used by paediatric intensivists to calculate an evidence-based, patient-specific estimate of abuse probability that can inform – not dictate – early decisions to launch or forego an evaluation for abuse.

On the study design

The methodology here is quite complicated. Essentially, the authors decided in advance on criteria which indicate a high probability of non-accidental trauma as the cause of head injury in their PICU population (let’s call these reference criteria). They then recorded the presence or absence of historical, examination or radiological findings and, while measuring the reliability of those assessments, correlated each finding with non-accidental trauma as defined by the reference criteria. The individual findings were divided into “early” or “late” to help determine those likely to be present at the time of PICU admission, and the most reliable and discriminating criteria identified.

These were then combined into a decision rule which was applied to the population again, and test characteristics calculated. Phew!

There are a few problems here, most of which the authors identify. The major issue is true of all non-accidental injuries; there is no certainty in the diagnosis, no gold standard, and, as anyone working in child protection knows, identifying children who have been intentionally or neglectfully injured is a game of probability in the absence of confession (and even then not always a certainty). Vigilance is key; overdiagnosis causes a massive workload for paediatricians and enormous stress for families and patients (who are often separated during investigations), while underdiagnosis fails the child and the family, often with tragic, deadly consequences.

The criteria used as a reference standard (table 2) are a reasonable surrogate for a gold standard.

[Image: Table 2 from the paper]

The other problem we have is that this is a very select population; by definition, these are the more severely injured children and it is likely that a significant proportion of non-accidentally head injured children will not require PICU. This immediately affects the extent to which we can generalise the findings to our significantly different PED population.

What were the results and what does this mean?

Figure 2 shows the decision tree the authors developed from the most reliable and discriminating variables.

[Image: Figure 2 from the paper]

The authors have then calculated the test characteristics for the decision tree as shown in table 6.

[Image: Table 6 from the paper]

As we can see by looking at the decision tree, the tool is far more sensitive than specific, so it is better at ruling NAI out than ruling it in. In fact, the sensitivity is perhaps not as good as we would like (look at the 95% confidence interval – it could be as low as 90%). We can see that 3 patients classified a priori as high risk were misclassified by this tool as low risk.

The negative likelihood ratio is small, suggesting that a negative result in this population (with a low pre-test probability) produces a very low post-test probability, but again the confidence interval is quite wide.
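To see what the negative likelihood ratio does to a pre-test probability, here is a rough Python sketch of the usual odds conversion. The 0.45 pre-test probability is simply the proportion of this cohort meeting the reference criteria (95/209); the 0.10 figure is a purely hypothetical lower-risk setting that I have added for comparison and does not come from the paper.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


LR_NEG = 0.12              # reported point estimate for the five-part rule
LR_NEG_CI = (0.04, 0.37)   # reported 95% confidence interval

# 0.45 = prevalence in this PICU cohort; 0.10 = HYPOTHETICAL lower-risk setting
for pre in (0.45, 0.10):
    point = post_test_probability(pre, LR_NEG)
    low, high = (post_test_probability(pre, lr) for lr in LR_NEG_CI)
    print(f"pre-test {pre:.0%}: post-test {point:.1%} (range {low:.1%} to {high:.1%})")
```

At the cohort's own 45% prevalence a negative rule leaves roughly a 9% probability of meeting the reference criteria, rising to around 23% at the upper end of the confidence interval, which is why the width of that interval matters.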

The authors concede that adding any further variables would make the rule too complicated to be practical, which seems reasonable, but it does leave us wondering whether this rule will be fit for purpose with a potentially low NPV and sensitivity.

What can we take from this paper into clinical practice?

While this decision tree might, once validated, help to rule out non-accidental head injury in the PICU population, the patients here are just too different from the PED population for this to be useful.

In addition, our job in PED is to resuscitate these children and while it is essential that NAI is always on the mind of the PED doctor, for these patients stabilisation and management of the acute injuries must take priority. Does this rule add anything to our PED assessment? I don’t think so – these are not the patients in whom I want to think carefully about NAI as my priorities are different; they have immediate clinical needs and NAI can be considered in more depth later. The patients in whom I want to rule NAI out are altogether less unwell. Any one of these five findings necessitates further PED assessment of the child from a perspective other than NAI.

It’s not a bad paper – it just doesn’t help us in PED.

More questions to ask

  • How does this rule perform in other PICUs (validation)?
  • How sure do we want to be of ruling out NAI (what sensitivity level should we accept in a tool like this)?
  • Does it have any predictive value for patients in the PED? Are there other cues which do have useful rule-in or rule-out potential in the less seriously injured head injury patients in PED?


4th January 2013: Comparison of Rectal, Axillary, Tympanic, and Temporal Artery Thermometry in the Pediatric ER


Happy New Year from PEMLit!

Where can I find this paper?

What is this paper about (what is the research question)?

Which method of temperature measurement – axillary, tympanic or temporal artery thermometry – is the best predictor of rectal temperature in febrile and afebrile children?

Summary of the Paper

Design: Prospective single-centre observational (?diagnostic) study

Objective: to determine the most accurate non-invasive method of thermometry

Tests of Interest: Axillary digital thermometer – 5 minutes. Tympanic infrared thermometry as per manufacturer’s instructions (right and left ears). Temporal artery thermometry as per manufacturer’s instructions.

Reference Standard: rectal thermometry – mercury thermometer for 3 minutes.

Primary outcome: test characteristics and correlation coefficients for each method.

Population: Children aged 2-12 years presenting to the Emergency Room of a single centre in Delhi, India.

  • Inclusion: Not clear from methods section
  • Exclusion: Abnormal ear or rectal anatomy, thermoregulatory disturbances, family history of malignant hyperthermia, diaphoresis, Hb <8g/dL, severe malnutrition/severe wasting (WHO classification), uncooperative, crying, unconscious.

Results: 100 patients were enrolled, 50 “febrile” (rectal temperature >38°C) and 50 “afebrile”.

Temporal artery thermometry had the highest correlation coefficient for both febrile (0.99) and afebrile (0.91) children.

In the detection of fever:

Axillary thermometry had sensitivity 80% and specificity 100% (no confidence intervals given).

Tympanic thermometry had sensitivity 98% and specificity 98% (no confidence intervals given).

Temporal artery thermometry had sensitivity 80% and specificity 98% (no confidence intervals given).

Authors’ Conclusions:

Temporal artery thermometry has the potential to replace rectal thermometry in the busy Emergency Room setting among children aged 2-12 years.

On the study design

Papers like this make me so sad! This could be such a great study – I’ve been asking the question “how should we measure temperature and what do we mean by fever?” for such a long time, and this simple study design has great potential to answer the question. Rectal thermometry is upheld as the “gold standard” in determining temperature but is not without risk.

Unfortunately, there are a few gaps. We don’t know how the patients were selected, so there could be all sorts of bias and confounders we don’t know about. And what about excluding children who were crying? OK, it might make them a bit warmer – but who’s been in a Paediatric ED which isn’t filled with crying children?!

That said, they have a good size sample (100 subjects) and seem to have powered the study appropriately. But another question that arises is about the reliability of the measurements, in particular their standard rectal (mercury) thermometry. Who was reading these temperatures? How do we know their assessment is reliable – where is the kappa score?

What were the results and what does this mean?


In terms of correlation with rectal temperature, temporal artery thermometry came closest, with temperatures in 50/50 febrile and 49/50 afebrile children reading within +/- 0.4 degrees of their rectal temperature.

Axillary temperature seemed to correlate better in both groups than tympanic temperature.


Tympanic thermometry had the highest sensitivity for detecting fever (98%), making it the best at ruling fever out, while axillary thermometry was the most specific (100%). No confidence intervals were given. Are these figures useful in clinical practice? No, probably not.
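Since the paper gives no confidence intervals, we can get a rough feel for the uncertainty ourselves. The Python sketch below computes Wilson score intervals, on the assumption that each reported percentage maps onto a count out of the 50 febrile or 50 afebrile children (so 80% sensitivity is taken as 40/50). Those counts are my own back-calculation and are not stated in the paper.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# ASSUMED counts, back-calculated from the reported percentages (n = 50 per group)
reported = {
    "axillary": {"sensitivity": (40, 50), "specificity": (50, 50)},
    "tympanic": {"sensitivity": (49, 50), "specificity": (49, 50)},
    "temporal artery": {"sensitivity": (40, 50), "specificity": (49, 50)},
}

for method, measures in reported.items():
    for measure, (k, n) in measures.items():
        lo, hi = wilson_interval(k, n)
        print(f"{method} {measure}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

On these assumptions, a sensitivity of 40/50 carries an interval of roughly 67% to 89%, which gives some sense of how much uncertainty sits behind the headline figures with only 50 children per group.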

What can we take from this paper into clinical practice?

Despite the suspicious methodology, it certainly seems as though infrared temporal artery thermometry is the closest proxy for rectal thermometry, and with 99% of measurements within +/-0.4 degrees it seems reasonable to suggest it as the preferred method of ED thermometry in those aged 2-12 years.

More questions to ask

  • As a gold standard, how reliable is rectal thermometry?
  • What is the best method for children under 2 years of age, for whom tympanic thermometry is not considered to be an option?
