The Limits of Visual Inspection

Interesting research:

Target prevalence powerfully influences visual search behavior. In most visual search experiments, targets appear on at least 50% of trials. However, when targets are rare (as in medical or airport screening), observers shift their response criteria, leading to elevated miss error rates. Observers also speed target-absent responses and may make more motor errors. One possible explanation is a speed/accuracy tradeoff, with fast, frequent target-absent responses producing more miss errors. Disproving this hypothesis, Experiment 1 shows that very high target prevalence (98%) shifts response criteria in the opposite direction, leading to elevated false alarms in a simulated baggage search. However, the very frequent target-present responses are not speeded. Rather, rare target-absent responses are greatly slowed. In Experiment 2, prevalence was varied sinusoidally over 1000 trials as observers’ accuracy and reaction times (RTs) were measured. Observers’ criterion and target-absent RTs tracked prevalence. Sensitivity (d′) and target-present RTs did not vary with prevalence. These results support a model in which prevalence influences two parameters: a decision criterion governing the series of perceptual decisions about each attended item, and a quitting threshold that governs the timing of target-absent responses. Models in which target prevalence only influences an overall decision criterion are not supported.

This has implications for searching for contraband at airports.
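To make the paper's two-parameter story concrete, here is a minimal signal-detection sketch in Python. It models only the first parameter, the per-item decision criterion, not the quitting threshold, and all numbers (d′ = 2.0, the criterion values, the trial counts) are illustrative assumptions rather than the paper's parameters:

```python
import random

random.seed(0)

def simulate(prevalence, criterion, d_prime=2.0, trials=100_000):
    """Return (miss rate, false alarm rate) for a fixed decision criterion."""
    hits = misses = false_alarms = correct_rejections = 0
    for _ in range(trials):
        target = random.random() < prevalence
        # Internal response: noise is N(0, 1); a target shifts the mean by d'.
        response = random.gauss(d_prime if target else 0.0, 1.0)
        if target and response > criterion:
            hits += 1
        elif target:
            misses += 1
        elif response > criterion:
            false_alarms += 1
        else:
            correct_rejections += 1
    miss_rate = misses / max(hits + misses, 1)
    false_alarm_rate = false_alarms / max(false_alarms + correct_rejections, 1)
    return miss_rate, false_alarm_rate

# A conservative criterion (sensible when targets are rare) yields many
# misses; a liberal one (sensible when targets are common) yields many
# false alarms: the asymmetric pattern the experiments report.
print(simulate(prevalence=0.02, criterion=1.5))   # high miss rate
print(simulate(prevalence=0.98, criterion=0.5))   # high false alarm rate
```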

Posted on February 8, 2010 at 1:54 PM

Comments

David February 8, 2010 5:16 PM

Wait a second, is this saying that by changing the search for a rare threat, say bombs, into a search for something more common, like 3+ oz hair gel containers, they actually catch more bombs (since bombs are part of the larger class of targets)?

Is the ban/search for nail files and scissors really the best way to search for more dangerous knives and weapons?

If so, it could explain the TSA’s odd policy behavior, even if the specific bans are based on misconceptions.

Daniel February 8, 2010 5:28 PM

Actually, this type of behavior is perfectly predicted by Bayes’ theorem. The more improbable an occurrence is, the better off you are ignoring evidence to the contrary. The more probable an occurrence is, the better off you are ignoring the absence of evidence in support of it. Accuracy and precision are only valuable commodities when the odds fall within a standard deviation of the mean.
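To put rough numbers on Daniel's point, here is a one-function sketch with made-up figures (not data from the study): when targets are rare enough, even a good screener's alarms are almost all false.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(target | alarm), by Bayes' theorem."""
    p_alarm = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / p_alarm

# A screener who catches 90% of real threats and false-alarms on only 1%
# of clean bags, when one bag in a million actually contains a threat:
print(posterior(prior=1e-6, hit_rate=0.9, false_alarm_rate=0.01))
# ~9e-5: fewer than one alarm in ten thousand is a real threat.
```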

Barton Chittenden February 8, 2010 7:59 PM

The solution to this seems fairly obvious to me: add images of guns/bombs/knives, etc. into the stream of images. There would be a trade-off between speed and accuracy as more fake images are added.
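A back-of-the-envelope sketch of Barton's trade-off, with assumed numbers: injecting fakes raises the prevalence the screener experiences, but every injected image costs inspection time.

```python
real_prevalence = 1e-6    # assumed: actual threats are vanishingly rare
injection_rate = 0.10     # assumed: fraction of scans with a fake added

# Prevalence as the screener experiences it (a real or fake threat present):
effective_prevalence = injection_rate + real_prevalence * (1 - injection_rate)
print(f"{effective_prevalence:.2%}")   # ~10.00%, vs. ~0.0001% without fakes

# The speed cost: at a 10% injection rate, one bag in ten now requires an
# extra click-and-resolve step that a clean bag would not have needed.
```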

Nathan Tuggy February 8, 2010 8:37 PM

This reminds me of http://www.pgdp.net/wiki/Confidence_in_Page_Algorithm#p_probability_of_finding_a_misprint — this likewise suggests deliberately injecting fake targets to allow human checkers to find more, and presumably removing the fake targets (in the case of PGDP) or perhaps auditing their presence (in the case of the DHS)?

Presence audits would have the advantage of being able to predict the approximate chance of an actual target getting through — though not, of course, the actual number of targets, without knowledge of the approximate number of targets to find.
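In code, Nathan's presence audit reduces to a one-line estimate; the counts below are hypothetical, and the catch rate on injected fakes stands in for the catch rate on real targets.

```python
# Hypothetical audit numbers, for illustration only.
fakes_injected = 500
fakes_caught = 460

catch_rate = fakes_caught / fakes_injected      # 0.92
leak_probability = 1 - catch_rate               # 0.08
print(f"Estimated chance a real target slips through: {leak_probability:.0%}")

# As Nathan notes, this yields a rate, not a count: without an estimate of
# how many real targets entered the stream, it cannot say how many got out.
```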

Tamooj February 8, 2010 11:34 PM

I believe the TSA airport screeners already do this, for just the reasons cited in the study. The NYT did a great story on this last year – almost all of the new (2008) x-ray scanners will inject images of suspicious objects (guns/bombs/etc.) into the real-time data feeds; the screeners must indicate when they see a suspicious object with a mouse click. If it’s a test image (~10%), the ‘fake’ image is highlighted (in case there is another (real) suspicious object in the image) and the screener’s hit/miss score is recorded and evaluated. Having a high failure rate will get a screener pulled from the line very quickly and sent back to a training course. The TSA is constantly adding new threat images to its database, and they are immediately added to the testing system. Pretty innovative for the TSA, if you ask me.
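A sketch of the scoring scheme Tamooj describes: the ~10% injection rate comes from his comment, while the retraining threshold and minimum sample size are assumptions, since the TSA's actual parameters are not public.

```python
# Per Tamooj, roughly 10% of scans carry an injected test image; only those
# scans count toward a screener's score. The thresholds below are assumed.
MAX_MISS_RATE = 0.20   # assumed miss rate that pulls a screener off the line
MIN_TESTS = 20         # assumed: don't judge a screener on too few tests

def evaluate_screener(flagged_fakes):
    """flagged_fakes: one boolean per injected test image, True if the
    screener clicked the fake threat before it was revealed as a test."""
    if len(flagged_fakes) < MIN_TESTS:
        return "insufficient data"
    miss_rate = flagged_fakes.count(False) / len(flagged_fakes)
    return "retraining" if miss_rate > MAX_MISS_RATE else "ok"

# Example: a screener who missed 6 of 25 test images (24% miss rate).
print(evaluate_screener([True] * 19 + [False] * 6))   # retraining
```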

Will February 9, 2010 1:16 AM

I hope that the fake image never gets superimposed on a real threat, so that the click gets credited to the test image and the baggage moves onward…

Clive Robinson February 9, 2010 2:55 AM

Oh dear, hit-and-miss scores against images from a DB…

Do people remember the story from earlier in the TSA’s life about the trainer telling a (slightly brighter than average) trainee that they had to learn one “hand gun” outline, as that was “the one they would be tested with”…

How long before a person memorises the fake images and is watchful for only those images?

As Bruce has noted in the past, “hinky” is a state of mind and it is difficult to teach (especially in a classroom). If people are trained up on one set of images, how are they going to recognise images they have not seen before, and only loosely related to those they have, as potentially dangerous?

The advantage of a human is to add up all sorts of bits of NEW data and come up with a likelihood of threat. The advantage of an automaton is to add up all sorts of bits of KNOWN data and come up with a correct choice of KNOWN threat.

We have an expression in the UK of,

“Points make Prizes”

In the near future I can see pay/promotion advancement in the TSA being calculated on throughput and hit-and-miss scores on fake images…

Thus we would be training automata…

So how do you train “hinky”? Well, first remove management inducements such as “bags/hour” and “hits/misses”. Very few people learn well when they feel like they have a gun to their head, as stress is usually counterproductive.

Then get a few inventive people to develop real test devices that are always different, and put them through during various advanced learning sessions.

Then have a little patience with the testers; some will develop “hinky”, others will not.

However, hit/miss training without punishment does work; you only have to watch kids with their console games to know that.

Thus a thought occurs: maybe the TSA needs an all-comers competition each year, as various police departments used to have with target shooting, etc.

Maybe if they could up the action they could develop their own FPS console game 😉

peri February 9, 2010 5:20 AM

I think Clive is right in cases where the events you are trying to classify are so rare that you need to generate fakes. In those cases, e.g. bomb detection, you are probably in trouble.

However, if we take NPR’s “hundreds of images a day” to mean an average of 200 per day, then Greenberg alone sees an average of one actual case of breast cancer per day. So as long as there are more than 100 other classifiers with similar rates, we should have enough data to ensure a stream of unique test images.

If we store every positive in a DB and then have our classifiers work with a 50% chance of a real classification and a 50% chance of a test classification, then the real-image rates are halved. So we need to double the number of breast cancer classifiers to keep the rate constant.
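Worked through with peri's assumed figures (not measured data), the arithmetic looks like this:

```python
# peri's assumed figures, not measured data.
images_per_day = 200        # "hundreds of images a day" per classifier
real_positives_per_day = 1  # actual cancers one classifier finds per day

# A 50/50 mix of real and test images: each classifier still reads 200
# images a day, but only 100 are real, so it needs 100 unique test
# positives per day drawn from the shared DB.
tests_needed = images_per_day // 2

# Other classifiers feed that DB at one real positive per day each:
suppliers_needed = tests_needed // real_positives_per_day
print(suppliers_needed)     # 100, matching peri's "more than 100" figure

# And since each classifier now reads only half as many real images,
# the pool of classifiers must double to keep real throughput constant.
```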

Pete Austin February 9, 2010 6:20 AM

Maybe this is why we daydream/fantasise about rare events. The brain is generating its own fakes to improve the accuracy of noticing the real thing.

David February 9, 2010 9:33 AM

@Will: Fortunately, there’s almost no chance of a suitcase containing a bomb or other contraband in the first place, so a random image will almost never obscure one. If it increases the likelihood that a screener will notice a bomb, it’s worth it.

It’s a real-life application of the old statistics joke:

Statistician: “I never used to fly, since I thought the chances of a bomb on an airplane were too high.”
Friend: “So why are you flying now?”
Statistician: “I calculated the chance of two bombs on a flight, and it’s low enough for me. So now I just bring my own bomb.”

paul February 10, 2010 10:44 AM

Injecting fakes into the data stream, it seems to me, only works if the person doesn’t cognitively chunk “see a fake, learn it’s a fake, go on”. For fake-positives to count as positives for purposes of influencing the recognition system’s behavior, they have to be treated as positives for a long-enough period to get encoded that way.
