The images are collected from real-world scenarios and show humans in challenging poses and views, with heavy occlusion, varied appearance, and low resolution. We provide 14K images with 85 kinds of labels and 31 kinds of relations. Each image has 10 instances and 17 relations on average. In total, we label 136K instances and 235K relations. We define two main kinds of relations: position relations and action relations. Several example images from the dataset are shown below.

Data Statistics

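Dataset  | Total | Train | Val  | Test | Labels | Relations | Instance Num | Relation Num
---------+-------+-------+------+------+--------+-----------+--------------+-------------
PIC 2018 | 14135 | 10000 | 1135 | 3000 | 85     | 31        | 136K         | 235K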
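
As a quick sanity check on the figures above, the following minimal sketch recomputes the split total and the per-image averages. The variable names are illustrative only, and the instance and relation counts are the rounded values (136K, 235K) from the table, so the averages are approximate.

```python
# Figures copied from the statistics table.
# 136K and 235K are rounded counts, so the averages below are approximate.
splits = {"train": 10000, "val": 1135, "test": 3000}
total_images = 14135
instances = 136_000
relations = 235_000

# The three splits should add up to the reported total.
assert sum(splits.values()) == total_images

# Average number of annotations per image.
avg_instances = instances / total_images
avg_relations = relations / total_images

print(f"{avg_instances:.1f} instances per image")
print(f"{avg_relations:.1f} relations per image")
```

Rounded to whole numbers, these come to roughly 10 instances and 17 relations per image, which agrees with the averages quoted in the description.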

Image Example

Instance Annotation Example

Relation Annotation Example