
The Search News Object enables full-text searching of news content. The DayPI search engine is based on Lucene. As such, the DayPI supports a subset of Lucene's search syntax.
The query string in every Search News Object method call must be URL-encoded. For example, the query string iraq war must be sent as either iraq+war or iraq%20war:
http://freeapi.daylife.com
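The encoding step can be sketched in Python using only the standard library. This is a minimal illustration of the two encodings mentioned above, not part of the DayPI itself; the variable names are chosen for the example.

```python
from urllib.parse import quote, quote_plus

query = "iraq war"

# quote_plus encodes each space as "+".
plus_encoded = quote_plus(query)

# quote encodes each space as "%20".
percent_encoded = quote(query)

print(plus_encoded)     # iraq+war
print(percent_encoded)  # iraq%20war
```

Either form can then be appended to the request URL as the query parameter value.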
A query is broken up into terms and Boolean operators (discussed below). There are two types of terms: Single Terms and Phrases.
A Single Term is a single word such as "test" or "hello".
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
Multiple terms can be combined with Boolean operators to form a more complex query (see below). Note that all single terms and phrases are case-insensitive.
You can refine your search by limiting the application of a particular term to the headline or title of an article. This works like so:
title:"Global Warming"
You can combine this with a search against the body of the article, for example:
"jail time" AND title:Libby
This finds articles that contain the phrase "jail time" and have the word Libby in the headline.
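A small helper can make such queries easier to assemble in client code. The sketch below assumes Python on the client side; the function names quote_term and title_term are hypothetical and are not part of the DayPI.

```python
def quote_term(text: str) -> str:
    """Wrap multi-word text in double quotes so it is treated as a Phrase."""
    text = text.strip()
    return f'"{text}"' if " " in text else text


def title_term(text: str) -> str:
    """Restrict a term or phrase to the headline using the title: prefix."""
    return f"title:{quote_term(text)}"


query = f'{quote_term("jail time")} AND {title_term("Libby")}'
print(query)                         # "jail time" AND title:Libby
print(title_term("Global Warming"))  # title:"Global Warming"
```

The resulting string still has to be URL-encoded before it is sent.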
The DayPI (via Lucene) supports modifying query terms to provide a wide range of searching options.
The DayPI supports single and multiple character wildcard searches within single terms (not within phrase queries).
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search matches any term that has exactly one character, of any value, in the position of the "?". For example, to search for "text" or "test" you can use the search:
te?t
Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search:
test*
You can also use a wildcard in the middle of a term:
te*t
Note: You cannot use a * or ? symbol as the first character of a search.
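To make the matching rules concrete, the wildcard behaviour can be approximated locally with regular expressions. This is only an illustration of the semantics described above, not how the DayPI or Lucene implements wildcards; wildcard_to_regex and matches are hypothetical helper names.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a ?/* wildcard term into a case-insensitive regex."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("te?t", "text"))     # True
print(matches("test*", "tester"))  # True
print(matches("te?t", "tet"))      # False
```

Rejecting a leading wildcard on the client side avoids sending a query the service will not accept.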
Daylife searches determine the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.
Boosting lets you control the relevance of a document by weighting the terms it contains. For example, if you are searching for
cheney wiretapping
and you want the term "cheney" to be more relevant, boost it by placing the ^ symbol and the boost factor directly after the term. You would type:
cheney^4 wiretapping
This will make documents with the term cheney appear more relevant. In effect, this changes the sort order of the documents returned to you when you query using sort=relevance.
You can also boost Phrase Terms as in the example:
"cheney wiretapping"^4 "Alberto Gonzales"
By default, the boost factor is 1. The boost factor must be positive, but it can be less than 1 (e.g. 0.2).
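The boosting rules can be captured in a short client-side helper. The sketch below is an assumption about how one might build such strings in Python; the function name boost is hypothetical, and the validation simply mirrors the rule that the factor must be positive.

```python
def boost(term: str, factor: float = 1.0) -> str:
    """Append ^factor to a term or quoted phrase; a factor of 1 is the default."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    if factor == 1:
        return term  # no suffix needed for the default boost
    return f"{term}^{factor:g}"


print(boost("cheney", 4) + " wiretapping")  # cheney^4 wiretapping
print(boost('"cheney wiretapping"', 4))     # "cheney wiretapping"^4
print(boost("wiretapping", 0.2))            # wiretapping^0.2
```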
Boolean operators allow terms to be combined through logic operators. The DayPI supports AND, "+", OR, NOT and "-" as Boolean operators. Note: Boolean operators must be in ALL CAPS.
The AND operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the AND operator is used.
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbols && can be used in place of the token AND.
To search for documents that contain "libby perjury" and "Valery Plame" use the query:
"libby perjury" "Valery Plame"
or
"libby perjury" AND "Valery Plame"
The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
To search for documents that contain either "libby perjury" or just "plame" use the query:
"libby perjury" OR plame
The "+" or required operator requires that the term after the "+" symbol exist somewhere in the text of a retrieved document.
To search for documents that must contain "libby" and may contain "perjury" use the query:
+libby perjury
The NOT operator excludes documents that contain the term after the NOT operator. This is equivalent to a difference using sets. The symbol ! can be used in place of the token NOT.
To search for documents that contain "libby perjury" but not "Valery Plame" use the query:
"libby perjury" NOT "Valery Plame"
Note: The NOT operator should not be used with just one term. For example, the following search will return no results:
NOT "Valery Plame"
The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.
To search for documents that contain "libby perjury" but not "Valery Plame" use the query:
"libby perjury" -"Valery Plame"
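The set analogies used above (intersection, union, difference) can be illustrated with Python sets. The index below is invented toy data mapping single terms to document IDs; it only demonstrates the logic and says nothing about how the DayPI stores or retrieves documents.

```python
# Hypothetical toy index: term -> IDs of the documents that contain it.
index = {
    "libby": {1, 2, 3},
    "perjury": {2, 3, 4},
    "plame": {3, 5},
}

# libby AND perjury: documents containing both terms (intersection).
both = index["libby"] & index["perjury"]

# libby OR plame: documents containing either term (union).
either = index["libby"] | index["plame"]

# libby NOT plame: documents containing libby but not plame (difference).
without = index["libby"] - index["plame"]

print(sorted(both))     # [2, 3]
print(sorted(either))   # [1, 2, 3, 5]
print(sorted(without))  # [1, 2]
```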
You can use parentheses to group clauses to form sub queries. This allows specific control over how Boolean logic is applied to a query.
To search for either "libby" or "cheney" and "indictment" use the query:
(libby OR cheney) AND indictment
This retrieves articles where either "libby" or "cheney" is used along with "indictment".
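Grouped queries like this one can be composed programmatically so that the parentheses are never forgotten. The helpers any_of and all_of below are hypothetical names for a minimal Python sketch; they only build the query string.

```python
def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the result in parentheses as a sub query."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


query = all_of(any_of("libby", "cheney"), "indictment")
print(query)  # (libby OR cheney) AND indictment
```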
The DayPI supports proximity searches, which find documents containing specific words that are within a certain distance of each other. To do a proximity search, use the tilde, "~", symbol at the end of a Phrase. For example, to search for "libby" and "jail" within 10 words of each other in a document use the search:
"libby jail"~10
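The idea of "within N words of each other" can be sketched as a simple position check. This is a simplified reading of the description above and an assumption for illustration; the search engine's actual distance calculation may differ in detail. The function name within_distance and the sample sentence are invented.

```python
import re


def within_distance(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if the two words occur at most max_distance positions apart."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


sample = "Libby was sentenced to thirty months in jail"
print(within_distance(sample, "libby", "jail", 10))  # True
print(within_distance(sample, "libby", "jail", 3))   # False
```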
The DayPI supports Lucene's "fuzzy" searches based on "edit distance". For example, to search for a term similar in spelling to "roam" use the fuzzy search:
roam~
This search will find terms like foam and roams.
An additional (optional) parameter can specify the required similarity. This value must be between 0 and 1, where values closer to 1 indicate a greater similarity is required to match. For example:
qadaffi~0.8
finds both Qadaffi and Gadaffi. The default value when the parameter is not given is 0.5.
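To see why these terms match, it helps to compute the edit distance directly. The sketch below uses the Levenshtein distance and, as an assumption, converts it to a similarity by dividing by the length of the shorter term; the exact formula the DayPI applies is not stated here, so treat the numbers as illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 - distance / length of the shorter term."""
    a, b = a.lower(), b.lower()
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / shortest


print(edit_distance("roam", "foam"))                 # 1
print(similarity("roam", "roams"))                   # 0.75
print(round(similarity("qadaffi", "gadaffi"), 3))    # 0.857
```

Under this assumption, foam and roams both clear the default threshold of 0.5 for roam, and Gadaffi clears the stricter 0.8 threshold for qadaffi.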