Bytelan.com

Direct Answers: Extracting Text from Pages Citations

2015-01-13

This is the last post in a series about Google’s international patent application, Natural Language Search Results for Intent Queries.

This post was inspired by the list of citations at the end of a paper that the listed inventors used as a provisional patent application, which preceded that patent. The paper was Scalable Attribute-Value Extraction from Semi-Structured Text (pdf).

I sometimes like to look through the documents listed as citations or footnotes in a paper I find interesting. As I started looking at the documents cited in that paper, I found many of them to be very interesting.

And then an idea struck me.

Rather than writing about just one or two of these papers, I decided to share the whole list. Since the original paper was a PDF without any links to the papers it cites, the chances of most people exploring them were very limited.

And yet some of these papers should be read.

There’s one from 1975 on semantic networks, a precursor to the Semantic Web, created by the Department of the Navy. There’s another from the 80s, and three more from the early 90s. Some of them cover basic concepts, such as wrappers, that are of interest to people studying the Semantic Web and search engines.

I don’t know all of the classic papers of the Semantic Web, or whether many of these fit into that category. That’s why I’m sharing links to them: so that we can work on learning that together.

If you see something that strikes you as really interesting, please let me know in the comments.

Thanks, and I hope you find something really interesting in these.

[1] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the web. (pdf) In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2670–2676, Hyderabad, India, January 2007.

[2] M. Berland and E. Charniak. Finding parts in very large corpora (pdf). In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), pages 57–64, College Park, MD, June 1999.
[3] R. C. Bunescu and R. J. Mooney. Collective information extraction with relational Markov networks (pdf). In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pages 439–446, Barcelona, Spain, July 2004.
[4] S. A. Caraballo. Automatic construction of a hypernym labeled noun hierarchy from text (pdf). In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), pages 120–126, College Park, MD, June 1999.

[5] S. M. Cherry. Weaving a web of ideas. IEEE Spectrum, 39(9):65–69, September 2002.
[6] W. W. Cohen, M. Hurst, and L. S. Jensen. A flexible learning system for wrapping tables and lists in HTML documents (pdf). In Proceedings of the 11th International World Wide Web Conference (WWW-02), pages 232–241, Honolulu, HI, May 2002. (Presentation (PDF))
[7] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms (pdf). Journal of Machine Learning Research, 7:551–585, 2006.
[8] D. Freitag and N. Kushmerick. Boosted wrapper induction (pdf). In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00), pages 577–583, Austin, TX, July 2000.
[9] M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-92), Nantes, France, August 1992.

[10] M. A. Hearst and H. Schütze. Customizing a lexicon to better suit a computational task. In Proceedings of the ACL-SIGLEX Workshop on Acquisition of Lexical Knowledge from Text, Columbus, Ohio, June 1993.
[11] I. Muslea, S. Minton, and C. A. Knoblock. Hierarchical wrapper induction for semistructured information sources (pdf). Journal of Autonomous Agents and Multi-Agent Systems, 4:93–114, 2001.
[12] M. Paşca and B. Van Durme. Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs (pdf). In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL-HLT-08), pages 19–27, Columbus, OH, June 2008.
[13] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (book at Amazon). Morgan Kaufmann, San Mateo, CA, 1988.
[14] M. Poesio and A. Almuhareb. Identifying concept attributes using a classifier (pdf). In Proceedings of the ACL Workshop on Deep Lexical Semantics, Ann Arbor, Michigan, June 2005.
[15] J. Pustejovsky. The Generative Lexicon (book at MIT Press). MIT Press, Cambridge, MA, 1995.
[16] Y. Shinyama and S. Sekine. Preemptive information extraction using unrestricted relation discovery (pdf). In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL (HLT-NAACL-06), pages 304–311, New York City, NY, June 2006.

[17] W. A. Woods. What’s in a link: Foundations for semantic networks (pdf). In D. G. Bobrow and A. M. Collins, editors, Representation and Understanding: Studies in Cognitive Science, pages 35–82. Academic Press, New York, 1975.
[18] S. Zhao and J. Betz. Corroborate and learn facts from the web (paid ACM access only). In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 995–1003, San Jose, CA, August 2007.

Copyright © 2015 SEO by the Sea.

The post Direct Answers: Extracting Text from Pages Citations appeared first on SEO by the Sea.

