Yotam Erel, Kat Adams Shannon, Kim Scott, Peng Cao, Xincheng Tan, Peter K Hart, Melissa Kline Struhl, Junyi Chu, Gal Raz, Sabrina Piccolo, Catherine Mei, Christine Potter, Sagi Jaffe-Dax, Casey Lew-Williams, Joshua Tenenbaum, Katherine Fairchild, Amit Bermano, Shari Liu
Technological advances in psychological research have enabled large-scale studies of human behavior and streamlined pipelines for the automatic processing of data. However, studies of infants and children have not fully reaped these benefits. The behaviors of interest, such as gaze duration and direction, still have to be extracted from video through a laborious process of manual annotation, even when the data are collected online. Recent advances in computer vision raise the possibility of automating the annotation of this video data. In this paper, we build on iCatcher (Erel et al., 2022), a system for automatic gaze annotation in young children. We make engineering improvements to it, then train and test the resulting system (hereafter, iCatcher+) on three datasets with substantial video and participant variability: 214 videos collected at lab and field sites in the United States, 143 videos collected at field sites in Senegal, and 265 videos collected via webcams in participants' homes (participants aged 4 months to 3.5 years). When trained on each of these datasets, iCatcher+ distinguished “LEFT” versus “RIGHT” and “ON” versus “OFF” looking behavior on held-out videos with near human-level accuracy, across all datasets. This high performance was achieved at the level of individual frames, experimental trials, and study videos. It held across participant demographics (e.g., age, race/ethnicity), participant behavior (e.g., movement, head position), and video characteristics (e.g., luminance), and it generalized to a fourth, entirely held-out online dataset. We close by discussing the next steps required to fully automate the lifecycle of online infant and child behavioral studies, a goal that represents a key step towards enabling rapid, high-powered developmental research.