Classification and Tagging Analysis

Overview

This page evaluates whether computer vision systems can generate useful archival tags for visual artifacts. Four photographs from the archive were analyzed: a trauma response training exercise, a humanitarian training session, a demining instruction session, and landmines stored in a vehicle glove compartment. Each image received five manual tags based on its historical meaning and context, and those tags were then compared with tags generated by Gemini 3.1 Pro.

Method and Quantitative Results

To measure agreement between human and AI interpretation, tags were normalized against a small controlled vocabulary and compared using the Jaccard similarity score: the size of the intersection of the two tag sets divided by the size of their union. A score of 1.0 represents perfect agreement, while 0 represents no overlap. Across the four artifacts, scores ranged from 0.14 to 0.50, indicating generally weak agreement between human interpretation and automated tagging.
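As a rough illustration of this method, the sketch below (in Python) normalizes two tag lists against a small controlled vocabulary and computes their Jaccard similarity. The vocabulary mapping and the example tag lists are invented for illustration only; they are not the project's actual vocabulary or tags.

```python
def normalize(tags, vocabulary):
    """Map raw tags onto a small controlled vocabulary, dropping unknown terms."""
    return {vocabulary[t.strip().lower()] for t in tags if t.strip().lower() in vocabulary}

def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|; 1.0 = identical sets, 0.0 = no overlap."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical controlled vocabulary (illustrative only).
VOCAB = {
    "mine detector": "detector",
    "metal detector": "detector",
    "training": "training",
    "instruction": "training",
    "outdoors": "outdoors",
    "field": "outdoors",
    "landmine": "landmine",
    "grenade": "grenade",
}

# Hypothetical human and AI tags for a single artifact (not the archive's actual tags).
human_tags = ["mine detector", "training", "outdoors", "landmine", "demining"]
ai_tags = ["metal detector", "instruction", "field", "grenade", "soldiers"]

h = normalize(human_tags, VOCAB)   # {"detector", "training", "outdoors", "landmine"}
a = normalize(ai_tags, VOCAB)      # {"detector", "training", "outdoors", "grenade"}
print(jaccard(h, a))               # 3 shared terms out of 5 total -> 0.6 in this invented example
```

Tags that fall outside the controlled vocabulary are dropped before comparison, so the score reflects agreement only on terms both sets can express in the shared vocabulary.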

Artifact-Level Observations

The demining training image produced the highest score (0.50); both the human and AI tag sets included terms such as detector, training, and outdoors. The other artifacts showed substantial differences. For example, the glovebox photograph was manually tagged with landmine and M14, reflecting its relevance to mine warfare, while the AI system labeled the objects as grenades and a vehicle interior. Similarly, the AI interpreted the humanitarian training photograph primarily as a generic crowd or meeting rather than a humanitarian training session.

Interpretation

These results highlight a common limitation of computer vision systems: they reliably recognize visible objects but struggle to identify the historical or contextual meaning of an artifact. For archival work, automated tagging may help generate preliminary metadata, but human review remains essential to preserve the interpretive accuracy of historical collections.

Conclusion and Future Direction

A related limitation is that many explosive devices and landmines are unlikely to appear frequently in modern training datasets. For safety and ethical reasons, AI image datasets often avoid detailed images of weapons or improvised explosive devices. As a result, models may lack the specialized training necessary to recognize these objects accurately, which likely contributed to misidentifications such as labeling a landmine as a grenade.

However, carefully curated training datasets developed in partnership with humanitarian demining organizations could significantly improve this capability. If reliable computer vision systems could be trained to recognize mines and improvised explosive devices, they could eventually be deployed on drones or robotic platforms to assist in locating hazards in conflict zones. Such systems would not replace trained deminers, but they could reduce the need for humans to place themselves directly in dangerous environments and potentially scale detection efforts in regions heavily contaminated with unexploded ordnance.
