DOM-based content extraction of HTML documents. Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter Grimm. Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, May 20-24, 2003.