Researchers Improve Scene Perception with Innovative Framework

May 07, 2024

A research team led by Prof. LIU Yong from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences has proposed a novel framework, called Clip-based Knowledge Transfer and Relational Context Mining (CKT-RCM), to address the long-tail distribution problem in computer vision.

The results were published in the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Panoptic Scene Graph (PSG) generation is a prominent research direction within scene graph generation. It requires a model to output all relationships in an image, together with accurate segmentation to localize the objects. PSG aims to improve how well computer vision models understand scenes and to support downstream tasks such as scene description and visual inference.

In this study, the researchers explored how humans perceive object relationships and presented two key perspectives. First, people anticipate object relationships based on common sense or prior knowledge. Second, they infer relationships from contextual information between subjects and objects. Both perspectives underscore the importance of leveraging prior knowledge: one involves correcting data biases using external data previously observed by humans, while the other relies on the conditional prior distribution between objects.

"Therefore, we believe that sufficient prior knowledge and contextual information are crucial for PSG prediction," said Dr. WANG Fan, a member of the team.

Building on these observations, the team developed the CKT-RCM network framework. Based on the pre-trained vision-language model CLIP, CKT-RCM facilitates relationship inference during PSG generation. It integrates a cross-attention mechanism to extract relational context, ensuring a balance between value and quality in relational predictions.
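For readers unfamiliar with cross-attention, the general idea can be sketched in a few lines of Python. This is a minimal illustration of standard scaled dot-product cross-attention, not the CKT-RCM implementation. The names `cross_attention`, `pair_features` and `context_features`, the feature sizes, and the omission of learned projection matrices are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product cross-attention without learned projections.

    queries: (n_queries, d) array, e.g. one feature per subject-object pair
             (hypothetical input for this sketch).
    context: (n_context, d) array, e.g. features of the surrounding scene
             (hypothetical input for this sketch).
    Returns the attended features (n_queries, d) and the attention
    weights (n_queries, n_context).
    """
    d = queries.shape[-1]
    # Similarity between every query and every context element.
    scores = queries @ context.T / np.sqrt(d)
    # Each query distributes its attention over the context elements.
    weights = softmax(scores, axis=-1)
    # Context-aware representation: weighted sum of context features.
    return weights @ context, weights


# Random placeholder features; real systems would use learned embeddings.
rng = np.random.default_rng(0)
pair_features = rng.normal(size=(3, 8))
context_features = rng.normal(size=(5, 8))

attended, weights = cross_attention(pair_features, context_features)
```

Each row of `weights` sums to one, so every subject-object pair receives a weighted mixture of the context features. In a trained model, the queries, keys and values would normally pass through learned linear projections first.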

This study contributes to the understanding and perception of scenes by robots and autonomous vehicles.

The proposed Clip-based Knowledge Transfer and Relational Context Mining (CKT-RCM) framework. (Image by WANG Fan)

Contact

ZHAO Weiwei

Hefei Institutes of Physical Science

E-mail:

CKT-RCM: Clip-Based Knowledge Transfer and Relational Context Mining for Unbiased Panoptic Scene Graph Generation
