SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021

CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl

Maik Fröbe, Janek Bevendorff, Lukas Gienapp, Michael Völske, Benno Stein, Martin Potthast, Matthias Hagen