Learned sparse retrieval

From Wikipedia, the free encyclopedia

Learned sparse retrieval or sparse neural search is an approach to text search that represents queries and documents as sparse vectors.[1] It borrows techniques from both lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE[2] and its successor SPLADE v2.[3] Others include DeepCT,[4] uniCOIL,[5] EPIC,[6] DeepImpact,[7] TILDE and TILDEv2,[8] SPARTA,[9] SPLADE-max, and DistilSPLADE-max.[3]
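As a rough illustration of what a sparse vector representation means here, the following Python sketch stores each query and document as a mapping from vocabulary terms to weights and scores a pair by the dot product over shared terms. The weights, the terms, and the helper name `sparse_dot` are hypothetical and chosen only for illustration. In an actual system the weights would come from a trained model, and the cited systems differ in how they compute them.

```python
from typing import Dict

# A sparse vector keeps only the terms with non-zero weight.
SparseVector = Dict[str, float]


def sparse_dot(query: SparseVector, doc: SparseVector) -> float:
    """Dot product restricted to the terms both sparse vectors contain."""
    # Iterate over the shorter vector and look terms up in the longer one.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(weight * doc[term] for term, weight in query.items() if term in doc)


# Hypothetical weights. The query text might have been "car repair";
# "vehicle" stands in for a term added by learned expansion.
query_vec = {"car": 1.5, "repair": 1.0, "vehicle": 0.5}
doc_vec = {"vehicle": 2.0, "repair": 1.0, "garage": 0.5, "mechanic": 1.0}

# Only "repair" and "vehicle" are shared between the two vectors.
score = sparse_dot(query_vec, doc_vec)
```

In this toy example the document never mentions "car", yet it still receives a non-zero score because both vectors give weight to "vehicle" and "repair". This is one way to picture the combination of lexical matching and learned term weighting described above.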

Some implementations of SPLADE achieve latency similar to that of Okapi BM25 lexical search while matching the result quality of state-of-the-art neural rankers on in-domain data.[10]
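Sparse vectors can be served from an inverted index, the same kind of data structure used for lexical search. The sketch below is a minimal, unoptimized illustration under that assumption and is not the method of any cited system. The function names `build_index` and `search`, the document identifiers, and all weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]
Postings = Dict[str, List[Tuple[str, float]]]


def build_index(docs: Dict[str, SparseVector]) -> Postings:
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return dict(index)


def search(index: Postings, query: SparseVector, k: int = 10) -> List[Tuple[str, float]]:
    """Term-at-a-time scoring: only posting lists of query terms are read."""
    scores = defaultdict(float)
    for term, query_weight in query.items():
        for doc_id, doc_weight in index.get(term, []):
            scores[doc_id] += query_weight * doc_weight
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical document vectors with made-up weights.
docs = {
    "d1": {"vehicle": 2.0, "repair": 1.0, "garage": 0.5},
    "d2": {"recipe": 2.0, "pasta": 1.5},
    "d3": {"car": 1.0, "insurance": 2.0},
}
index = build_index(docs)
results = search(index, {"car": 1.5, "repair": 1.0, "vehicle": 0.5})
```

Here `d2` shares no terms with the query, so its posting lists are never read and it does not appear in `results`. Query cost therefore depends on the number of query terms and the length of their posting lists, which is also true of a BM25 index.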

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license.[11] However, there are independent implementations of SPLADE++ (a variant of the SPLADE models) that are released under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.[12]

Notes

  1. ^ Nguyen, Thong; MacAvaney, Sean; Yates, Andrew (2023). "A Unified Framework for Learned Sparse Retrieval". In Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio; Maistro, Maria; Joho, Hideo; Davis, Brian; Gurrin, Cathal; Kruschwitz, Udo; Caputo, Annalina (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 13982. Cham: Springer Nature Switzerland. pp. 101–116. arXiv:2303.13416. doi:10.1007/978-3-031-28241-6_7. ISBN 978-3-031-28241-6. S2CID 257585074.
  2. ^ Formal, Thibault; Piwowarski, Benjamin; Clinchant, Stéphane (2021-07-11). "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 2288–2292. arXiv:2107.05720. doi:10.1145/3404835.3463098. ISBN 978-1-4503-8037-9. S2CID 235792467.
  3. ^ a b Formal, Thibault; Piwowarski, Benjamin; Lassance, Carlos; Clinchant, Stéphane (21 September 2021). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086v1 [cs.IR].
  4. ^ Dai, Zhuyun; Callan, Jamie (2020-04-20). "Context-Aware Document Term Weighting for Ad-Hoc Search". Proceedings of the Web Conference 2020. New York, NY, USA: ACM. pp. 1897–1907. doi:10.1145/3366423.3380258. ISBN 9781450370233. S2CID 218521094.
  5. ^ Lin, Jimmy; Ma, Xueguang (28 June 2021). "A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques". arXiv:2106.14807 [cs.IR].
  6. ^ MacAvaney, Sean; Nardini, Franco Maria; Perego, Raffaele; Tonellotto, Nicola; Goharian, Nazli; Frieder, Ophir (2020-07-25). "Expansion via Prediction of Importance with Contextualization". Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '20. New York, NY, USA: Association for Computing Machinery. pp. 1573–1576. arXiv:2004.14245. doi:10.1145/3397271.3401262. ISBN 978-1-4503-8016-4. S2CID 216641912.
  7. ^ Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (2021-07-11). "Learning Passage Impacts for Inverted Indexes". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 1723–1727. arXiv:2104.12016. doi:10.1145/3404835.3463030. ISBN 978-1-4503-8037-9. S2CID 233394068.
  8. ^ Zhuang, Shengyao; Zuccon, Guido (13 September 2021). "Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion". arXiv:2108.08513 [cs.IR].
  9. ^ Zhao, Tiancheng; Lu, Xiaopeng; Lee, Kyusong (28 September 2020). "SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval". arXiv:2009.13013 [cs.CL].
  10. ^ Lassance, Carlos; Clinchant, Stéphane (2022-07-07). "An Efficiency Study for SPLADE Models". Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22. New York, NY, USA: Association for Computing Machinery. pp. 2220–2226. arXiv:2207.03834. doi:10.1145/3477495.3531833. ISBN 978-1-4503-8732-3. S2CID 250340284.
  11. ^ "splade/LICENSE at main · naver/splade". GitHub. Retrieved 2023-08-25.
  12. ^ Thakur, Nandan; Wang, Kexin; Gurevych, Iryna; Lin, Jimmy (2023-07-18). "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval". Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '23. New York, NY, USA: Association for Computing Machinery. pp. 2964–2974. arXiv:2307.10488. doi:10.1145/3539618.3591902. ISBN 978-1-4503-9408-6. S2CID 259949923.