Introduction
Besides returning search results for the given keywords, Nutch offers some extra views for every result page, such as cached, explain, anchors and more. The explain page shows more technical details about a page and is aimed mainly at developers. Out of the box, Nutch shows only some basic details there. This post describes how to change the explain page so that it displays customized information about a page, as per our need.
Nutch ships with several plug-ins that provide different pieces of functionality. The plug-in we are interested in for customizing the explain page is "index-more".
Steps
Follow the steps below, in order, to make the desired changes. As a prerequisite, the Nutch crawler should be fully functional on the system. If it is not, please refer to the Installing Nutch page by Abhilash. Once Nutch is verified to be running successfully, we are ready to make the changes.
- Open the file
/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
- Add a method with the following signature.
private Document addMyFields(Document doc, ParseData data, String url, Inlinks inlinks)
- Locate the method called "filter". This is the method responsible for adding any additional fields to the explain page. The following arguments of the filter method will be used to accomplish our task.
- Document doc: The document object that is displayed, i.e. the explain page. We will use this object to add fields to the explain page.
- Parse parse: The object containing the data obtained by parsing a web page. We will use this object to get the values of the fields we want to add to the explain page.
- Inlinks inlinks: An object of the Inlinks class. It is supposed to hold information about the incoming links to the web page. (P.S.: In practice, the incoming links could not be retrieved from it when we tried.)
- Text url: The URL of the web page.
- Add a call to addMyFields inside the filter method, as follows.
addMyFields(doc, parse.getData(), url_s, inlinks);
- Inside the addMyFields method, use the following method call to add fields to the explain page.
doc.add(new Field(
- Rebuild Nutch with Ant.
../apache-ant-1.7.0/bin/ant
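The steps above can be sketched roughly as follows. This is only an illustration of the shape of addMyFields, not the actual plug-in code: the nested Document, Field and ParseData classes are simplified stand-ins written here so the example compiles without Nutch or Lucene on the classpath, the field names (myTitle, myOutlinkCount, myUrl, myInlinkCount) are made up, and the inlinks argument is reduced to a plain list. In the real plug-in these would be the Lucene and Nutch classes, and the Lucene Field constructor takes additional arguments that are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddMyFieldsSketch {

    // Stand-in for Lucene's Field: reduced to a name/value pair (assumption).
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Stand-in for Lucene's Document: an ordered collection of fields.
    static final class Document {
        private final List<Field> fields = new ArrayList<>();

        void add(Field field) {
            fields.add(field);
        }

        // Returns the value of the first field with the given name, or null.
        String get(String name) {
            for (Field f : fields) {
                if (f.name.equals(name)) {
                    return f.value;
                }
            }
            return null;
        }

        List<Field> getFields() {
            return fields;
        }
    }

    // Stand-in for Nutch's ParseData: only a title and outlink URLs.
    static final class ParseData {
        private final String title;
        private final List<String> outlinks;

        ParseData(String title, List<String> outlinks) {
            this.title = title;
            this.outlinks = outlinks;
        }

        String getTitle() {
            return title;
        }

        List<String> getOutlinks() {
            return outlinks;
        }
    }

    // Same role as the addMyFields method from the steps: copy values from
    // the parse data into extra fields on the document.
    static Document addMyFields(Document doc, ParseData data, String url,
                                List<String> inlinks) {
        doc.add(new Field("myTitle", data.getTitle()));
        doc.add(new Field("myOutlinkCount",
                String.valueOf(data.getOutlinks().size())));
        doc.add(new Field("myUrl", url));
        // Inlinks may be unavailable, so treat a missing list as zero links.
        int inlinkCount = (inlinks == null) ? 0 : inlinks.size();
        doc.add(new Field("myInlinkCount", String.valueOf(inlinkCount)));
        return doc;
    }

    public static void main(String[] args) {
        ParseData data = new ParseData("Example page",
                Arrays.asList("http://a.example/", "http://b.example/"));
        Document doc = addMyFields(new Document(), data,
                "http://example.com/", null);
        for (Field f : doc.getFields()) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

The null check on inlinks reflects the note above that incoming links may not be available. Whether a custom field actually shows up on the explain page also depends on how the real Field is constructed, which this sketch does not cover.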
Important Methods
Parse
ParseData
Inlinks
Inlink
Outlink
Metadata
Some things worth knowing...
../apache-ant-1.7.0/bin/ant javadoc