
Code to Rig Up a News Index

By john


This sample news indexer is the code we use to power the indexer widgets for Treehugger and CNN's Meme-o-meter.

The indexer takes a list of queries as input and generates a weekly index of the number of news mentions of each query term over the last two weeks. It also returns two stories for each query term that appeared in the news during the same two-week period.
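To make the shape of the output concrete, here is a minimal Python sketch of the two computations described above. It is not the PHP indexer itself. It assumes you already have, for each query term, a list of mention dates and a list of `(date, title)` stories. The function names `weekly_counts` and `top_stories` are hypothetical, and "two stories" is interpreted here as the two most recent ones in the window.

```python
from datetime import date, timedelta

def weekly_counts(mention_dates, today, weeks=2):
    """Count mentions per 7-day window ending at `today`, oldest week first.

    Hypothetical helper: the real indexer's bucketing rules are not documented.
    """
    counts = [0] * weeks
    start = today - timedelta(days=7 * weeks)
    for d in mention_dates:
        if start < d <= today:
            # (d - start).days runs from 1 to 7*weeks; shift to 0-based and bucket by week
            counts[((d - start).days - 1) // 7] += 1
    return counts

def top_stories(stories, today, n=2, days=14):
    """Return the titles of the n most recent (date, title) stories in the window."""
    start = today - timedelta(days=days)
    recent = [s for s in stories if start < s[0] <= today]
    recent.sort(key=lambda s: s[0], reverse=True)
    return [title for _, title in recent[:n]]
```

For example, with `today = date(2008, 6, 15)`, mentions on June 5, 10 and 15 give `[1, 2]`: one mention in the older week and two in the most recent week.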

You specify the query terms either as a semicolon-separated list or as the URL of an XML input file. The input file contains a list of tags, each with a "keyword" attribute. The string in the keyword attribute is used as the search term, and the value of the tag is a friendly name you can use for display. As you can see from our sample input file, search terms support the usual Boolean operators such as AND and OR.
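The two input formats can be read into the same list of `(search_term, display_name)` pairs. The Python sketch below is an illustration, not the indexer's own parser. Since the post does not name the XML elements, the code accepts any element that carries a `keyword` attribute; the `<terms>`/`<term>` names in the usage example are hypothetical. For the semicolon form there is no separate display name, so the term is reused.

```python
import xml.etree.ElementTree as ET

def parse_semicolon_list(text):
    """'solar; wind power' -> [('solar', 'solar'), ('wind power', 'wind power')]."""
    terms = [t.strip() for t in text.split(";")]
    return [(t, t) for t in terms if t]  # skip empty entries, e.g. a trailing ';'

def parse_input_xml(xml_text):
    """Return (search_term, display_name) for every element with a 'keyword' attribute."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():  # walks the root and all of its descendants
        keyword = el.get("keyword")
        if keyword is not None:
            # Fall back to the keyword itself when the tag has no display text
            display = (el.text or "").strip() or keyword
            pairs.append((keyword, display))
    return pairs
```

Usage: `parse_input_xml('<terms><term keyword="solar OR wind">Renewables</term></terms>')` returns `[('solar OR wind', 'Renewables')]`. The Boolean expression is passed through untouched, because evaluating AND/OR is left to the search backend.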

The index creator outputs XML that you can use to render your own indexer. You can test the indexer either with an input file or with a list of query terms.

The index creator is not fast, and its performance degrades as you add more input terms. For a high-performance application, we suggest fetching the response XML every 30 minutes or so and caching it on your side, then serving the cached copy to your live application.
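The suggested caching pattern can be as simple as a time-to-live wrapper around whatever fetches the index XML. The Python sketch below makes a few assumptions. The `fetch` callable is hypothetical, and in practice it would request the index creator's URL, for example with `urllib.request.urlopen`. The 30-minute TTL is taken from the suggestion above. The injectable `clock` exists only to make the sketch testable.

```python
import time

class CachedFetcher:
    """Serve a cached response and refetch only after `ttl_seconds` have passed."""

    def __init__(self, fetch, ttl_seconds=30 * 60, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable returning the index XML
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None      # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()  # slow call to the index creator
            self._fetched_at = now
        return self._value
```

Live requests call `get()` and hit the slow index creator at most once per TTL window. A production setup might refresh in a background job instead, so that no user request ever waits on a refetch.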

The indexer is written in PHP, and you can download the code here.
