Knowledge Distillation for High Dimensional Search Index

  • Zepu Lu
  • , Jin Chen
  • , Defu Lian
  • , Zaixi Zhang
  • , Yong Ge
  • , Enhong Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Scopus citations

Abstract

Lightweight compressed indexes are prevalent in Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) owing to their superiority of retrieval efficiency in large-scale datasets. However, results given by compressed indexes are less accurate due to the curse of dimension and limitation of optimization objectives (e.g., lacking interactions between queries and documents). Thus, we are encouraged to design a new learning algorithm for the compressed search index in high dimensions to improve retrieval performance. In this paper, we propose a novel Knowledge Distillation for high dimensional search index framework (KDindex), with the aim of efficiently learning lightweight indexes by distilling knowledge from high-precision ANNS and MIPS models such as graph-based indexes. Specifically, the student is guided to keep the same ranking order of the top-k relevant results yielded by the teacher model, which acts as the additional supervision signals between queries and documents to learn the similarities between documents. Furthermore, to avoid the trivial solutions that all candidates are partitioned to the same post list, the reconstruction loss that minimizes the compressed error, and the posting list balance strategy that equally allocates the candidates, are integrated into the learning objective. Experiment results demonstrate that KDindex outperforms existing learnable quantization-based indexes and is 40× lighter than the state-of-the-art non-exhaustive methods while achieving comparable recall quality.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
EditorsA. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713899921
StatePublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023Dec 16 2023

Publication series

NameAdvances in Neural Information Processing Systems
Volume36
ISSN (Print)1049-5258

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period12/10/2312/16/23

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Knowledge Distillation for High Dimensional Search Index'. Together they form a unique fingerprint.

Cite this