AN IMPROVED APPROACH OF WRAPPER GENERATION TECHNIQUES FOR WEB SOURCES

Authors

  • Sweta, Dept. of P.G. Studies and Research in Computer Science, GUK

Keywords:

APIs; HTML; web sources; web data extraction; wrapper

Abstract

The World Wide Web contains a growing number of online Web databases that can be searched through Web query
interfaces. Together, these Web databases make up the deep Web. The retrieved information is often embedded in Web pages in
the form of data records. Such pages are generated dynamically and are hard to index with traditional
crawler-based search engines such as Google and Yahoo. Web data extraction has received a lot of
attention in recent years, and most of the proposed solutions are based on analyzing the HTML source code or the tag
trees of the Web pages. Web data extraction is the process of extracting user-required information from websites. Web
documents contain data that is not in a structured format. Specific data can be extracted from these Web
sources so that other users or applications can use it. The term web data extraction refers to extracting the data
present in HTML web documents while removing unwanted content such as tags, advertisements,
videos and so on.
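As a rough illustration of tag-tree-based extraction, the sketch below is a hand-written wrapper built on Python's standard `html.parser` module. It is not the improved technique proposed in this paper. It assumes, hypothetically, that the result page marks each data record with the class `record` and each advertisement with the class `ad`. It keeps the text of every record and discards markup, `script`/`style`/`video` content, and ad blocks. Its depth counting also assumes well-formed markup with explicit end tags.

```python
from html.parser import HTMLParser

# Hypothetical noise list: content inside these tags is discarded.
NOISE_TAGS = {"script", "style", "iframe", "video", "noscript"}
# HTML void elements never get an end tag, so they must not change depth counters.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RecordWrapper(HTMLParser):
    """Hypothetical wrapper: collects the text fields of every element whose
    class list contains `record_class`, skipping noise tags and ad blocks.
    Assumes well-formed markup (every non-void start tag has an end tag)."""

    def __init__(self, record_class="record", ad_class="ad"):
        super().__init__()
        self.record_class = record_class
        self.ad_class = ad_class
        self.records = []        # one list of text fields per data record
        self._record_depth = 0   # > 0 while inside a record element
        self._skip_depth = 0     # > 0 while inside noise or ad content

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Enter (or go deeper into) a region whose content is thrown away.
        if self._skip_depth:
            self._skip_depth += 1
        elif tag in NOISE_TAGS or self.ad_class in classes:
            self._skip_depth = 1
        # Enter (or go deeper into) a data record, unless it sits inside noise.
        if self._record_depth:
            self._record_depth += 1
        elif self.record_class in classes and not self._skip_depth:
            self._record_depth = 1
            self.records.append([])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
        if self._record_depth:
            self._record_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside a record and outside noise.
        text = data.strip()
        if text and self._record_depth and not self._skip_depth:
            self.records[-1].append(text)


def extract_records(html, record_class="record", ad_class="ad"):
    """Run the wrapper over one page and return its records as lists of strings."""
    wrapper = RecordWrapper(record_class, ad_class)
    wrapper.feed(html)
    wrapper.close()  # flush any buffered text
    return wrapper.records


# Hypothetical result page with two data records, styling, a script and ads.
PAGE = """
<html><head><style>.ad{color:red}</style>
<script>trackVisitor();</script></head>
<body>
  <div class="ad">Buy now!</div>
  <div class="record"><h3>Data Mining</h3><span class="price">$40</span></div>
  <div class="record"><h3>Web Crawling</h3><br><span class="price">$35</span>
    <div class="ad">Sponsored</div></div>
</body></html>
"""

print(extract_records(PAGE))
# → [['Data Mining', '$40'], ['Web Crawling', '$35']]
```

A hand-written wrapper like this depends on page-specific conventions (here, the assumed `record` and `ad` classes). It must be rewritten whenever a site's layout changes, which is why automated wrapper generation is attractive.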

Published

2018-04-25

How to Cite

Sweta. (2018). AN IMPROVED APPROACH OF WRAPPER GENERATION TECHNIQUES FOR WEB SOURCES. International Journal of Advance Engineering and Research Development (IJAERD), 5(4), 1392–1395. Retrieved from https://www.ijaerd.org/index.php/IJAERD/article/view/5530