This system's detection problem has two distinct sub-problems: knowing that an instrument was picked up, and knowing that a hand-wash event occurred. Neither is trivially solved by a single sensor, and the combination of RFID and computer vision is the architecture's key strength.
Instrument pickup detection is the first sub-problem. A passive UHF RFID tag attached to each instrument is read by an antenna at the bench when the instrument enters the read field, typically when it is lifted. This gives a low-latency, instrument-specific signal. The camera provides a second, independent confirmation: a model trained on the bench view detects a hand grasping the tagged zone. Fusing the two signals addresses two failure classes that neither sensor handles alone: an RFID read can be triggered by proximity without an actual pickup (a false positive), and CV alone cannot distinguish which instrument was grasped when two instruments are adjacent. Together, they produce a confident, instrument-specific pickup event.
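The fusion rule described above can be sketched as a simple time-window match. This is a minimal illustration, not the system's actual implementation: the type names, the `fuse_pickups` function, and the 1-second window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RfidRead:
    tag_id: str   # instrument tag identifier (hypothetical format)
    t: float      # read time in seconds


@dataclass(frozen=True)
class GraspDetection:
    t: float      # time the CV model reported a hand grasp, in seconds


@dataclass(frozen=True)
class PickupEvent:
    tag_id: str
    t: float


def fuse_pickups(reads: List[RfidRead],
                 grasps: List[GraspDetection],
                 window_s: float = 1.0) -> List[PickupEvent]:
    """Confirm a pickup only when a grasp has an RFID read within window_s.

    RFID supplies the instrument identity; the grasp confirms that a hand
    was actually involved. Each read is used for at most one grasp.
    """
    events = []
    unused = list(reads)
    for g in sorted(grasps, key=lambda g: g.t):
        candidates = [r for r in unused if abs(r.t - g.t) <= window_s]
        if not candidates:
            continue  # grasp without a read: identity unknown, no event
        best = min(candidates, key=lambda r: abs(r.t - g.t))
        unused.remove(best)
        events.append(PickupEvent(best.tag_id, g.t))
    return events


reads = [RfidRead("pipette-01", 10.0), RfidRead("forceps-02", 50.0)]
grasps = [GraspDetection(10.3)]
print(fuse_pickups(reads, grasps))
```

In this example the read for `forceps-02` has no matching grasp, so it is treated as a proximity read and produces no pickup event.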
Assumption (on-metal tags): Many lab instruments have metal bodies or housings. Standard RFID tags detune and lose read range when mounted directly on metal. On-metal RFID tags, which use a spacer layer to decouple the antenna from the substrate, are required for these instruments. They cost roughly 5–10x a standard passive tag ($0.50–$5 per tag vs. $0.05–$0.50) but are widely available and proven. This cost should be factored into the instrument tagging plan.
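To see how the tag mix affects the budget, the per-tag ranges above can be turned into a quick estimate. The sketch below works in integer cents to avoid floating-point rounding; the instrument counts (40 metal, 60 non-metal) are purely hypothetical, as is the function name.

```python
# Per-tag price ranges in US cents, taken from the estimate above.
STANDARD_TAG_CENTS = (5, 50)     # $0.05 to $0.50
ON_METAL_TAG_CENTS = (50, 500)   # $0.50 to $5.00


def tagging_cost_cents(n_metal: int, n_non_metal: int) -> tuple:
    """Return (low, high) total tag cost in cents for a given instrument mix."""
    low = n_metal * ON_METAL_TAG_CENTS[0] + n_non_metal * STANDARD_TAG_CENTS[0]
    high = n_metal * ON_METAL_TAG_CENTS[1] + n_non_metal * STANDARD_TAG_CENTS[1]
    return low, high


# Hypothetical inventory: 40 metal-bodied and 60 non-metal instruments.
low, high = tagging_cost_cents(n_metal=40, n_non_metal=60)
print(f"${low / 100:.2f} to ${high / 100:.2f}")  # prints "$23.00 to $230.00"
```

Even for a mostly non-metal inventory, the on-metal tags dominate the total, which is why they deserve their own line in the tagging plan.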
Hand-wash detection is the second sub-problem. A binary sensor on the soap dispenser (a reed switch or optical break-beam triggered on pump actuation) provides the primary signal. It is close to 100% reliable at registering pump actuations, should miss essentially no wash that uses the dispenser, and requires no inference. On its own it shows only that soap was dispensed, so a camera at the wash station provides secondary confirmation: gesture recognition can verify rubbing motion and estimate wash duration if the protocol requires it. For the base system, the soap sensor alone is sufficient.
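One practical detail with a pump sensor is that a single wash often involves several pump presses. A possible way to handle this, sketched below, is to merge actuations that fall close together into one wash event. The function name and the 5-second merge gap are assumptions for illustration, not values from the design.

```python
from typing import List, Tuple


def group_wash_events(actuation_times: List[float],
                      merge_gap_s: float = 5.0) -> List[Tuple[float, float]]:
    """Merge pump actuations separated by at most merge_gap_s seconds.

    Returns a list of (first_actuation, last_actuation) pairs, one per
    wash event, in chronological order.
    """
    events: List[Tuple[float, float]] = []
    for t in sorted(actuation_times):
        if events and t - events[-1][1] <= merge_gap_s:
            # Same wash: extend the current event to this actuation.
            events[-1] = (events[-1][0], t)
        else:
            # Gap is too long: start a new wash event.
            events.append((t, t))
    return events


print(group_wash_events([100.0, 101.2, 102.0, 300.0]))
```

Here the three presses around t = 100 s collapse into one event, and the press at t = 300 s is counted as a separate wash.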
Person attribution is what makes the system work in a multi-technician environment. With roughly five technicians working simultaneously (assumption), the system must know which technician triggered which instrument event and which wash event. There are two approaches: wristband or badge RFID (each technician carries a unique tag that is read alongside the instrument tag) or camera-based person tracking (a person detection model identifies and tracks individuals across frames). The RFID approach is more reliable and avoids ML inference in the critical attribution path; the camera approach requires no extra wearable but is harder in crowded or occluded scenes. The recommended starting point is wristband RFID for attribution, with camera tracking as a fallback.
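The recommended attribution order (wristband RFID first, camera tracking as a fallback) might look like the following sketch. The `attribute` function, the 2-second window, and the idea of passing the camera's best guess as a single identifier are all assumptions made to keep the example small.

```python
from typing import Optional, Sequence, Tuple


def attribute(event_t: float,
              wristband_reads: Sequence[Tuple[str, float]],
              camera_guess: Optional[str] = None,
              window_s: float = 2.0) -> Optional[Tuple[str, str]]:
    """Attribute an instrument or wash event to a technician.

    wristband_reads holds (technician_id, read_time) pairs from the same
    station. Returns (technician_id, source), or None if no one matches.
    """
    in_window = [
        (abs(t - event_t), tech)
        for tech, t in wristband_reads
        if abs(t - event_t) <= window_s
    ]
    if in_window:
        # Prefer the wristband read closest in time to the event.
        return min(in_window)[1], "rfid"
    if camera_guess is not None:
        # No wristband read nearby: fall back to the tracked identity.
        return camera_guess, "camera"
    return None


print(attribute(10.0, [("tech-A", 9.5), ("tech-B", 13.0)]))
print(attribute(10.0, [("tech-B", 13.0)], camera_guess="tech-C"))
```

Returning the source alongside the identifier lets downstream reporting distinguish RFID-attributed events from the less certain camera-attributed ones.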