Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, AND 2008, Singapore, July 24, 2008

Daniel P. Lopresti, Shourya Roy, Klaus U. Schulz, L. Venkata Subramaniam (Editors)

How to cope with questions typed by dyslexic users
Laurianne Sitbon | Patrice Bellot

Optical character recognition errors and their effects on natural language processing
Daniel P. Lopresti

Successfully detecting and correcting false friends using channel profiles
Ulrich Reffle | Annette Gotscharek | Christoph Ringlstetter | Klaus U. Schulz

Named entity normalization in user generated content
Valentin Jijkoun | Mahboob Alam Khalid | Maarten Marx | Maarten de Rijke

Rule based synonyms for entity extraction from noisy text
Rema Ananthanarayanan | Vijil Chenthamarakshan | Prasad M. Deshpande | Raghuram Krishnapuram

Blogger, stick to your story: modeling topical noise in blogs with coherence measures
Jiyin He | Wouter Weerkamp | Martha A. Larson | Maarten de Rijke

Uncovering deep user context from blogs
Robert McArthur

On profiling blogs with representative entries
Jinfeng Zhuang | Steven C. H. Hoi | Aixin Sun

A comparative study of statistical features of language in blogs-vs-splogs
Soumya Datta | Sudeshna Sarkar

Unsupervised learning of multilingual short message service (SMS) dialect from noisy examples
Sreangsu Acharyya | Sumit Negi | L. Venkata Subramaniam | Shourya Roy

Data driven methods for improving mono- and cross-lingual IR performance in noisy environments
Antti Järvelin | Tuomas Talvensaari | Anni Järvelin

Opinion mining from noisy text data
Lipika Dey | S. K. Mirajul Haque

Latent Dirichlet allocation based multi-document summarization
Rachit Arora | Balaraman Ravindran

An unsupervised Hindi stemmer with heuristic improvements
Amaresh Kumar Pandey | Tanveer J. Siddiqui

Topic based language models for OCR correction
Anurag Bhardwaj | Faisal Farooq | Huaigu Cao | Venu Govindaraju

A novel Arabic lemmatization algorithm
Eiman Tamah Al-Shammari | Jessica Lin

Noise and information
John Tait

Some thoughts on failure analysis for noisy data
Donna Harman