excerpt not returning the article's contents or search term

By pagetribe

I get the following excerpt: 'excerpt': '...than to notify the recipient of the article you have chosen. Who do you perceive to be the wealthier political couple, Kevin Rudd and Therese Rein or the Turnbulls? Bite sized news all in one place brought to you by the new Holden Caprice Lexus premier...',

When I go to the page 'daylife_url': 'http://www.daylife.com/article/0cNC5fUgsw348', this excerpt is not in the article and neither is the search term. Instead, the excerpt is a collection of text found scattered around the page.

For example, part of the text is from a voting widget on the right, and the text there includes the search term I am looking for (Who do you perceive to be the wealthier political couple, Kevin Rudd and Therese Rein or the Turnbulls?).

The other text is from a text advertisement (Bite sized news all in one place brought to you by the new Holden Caprice Lexus premier...).

The first part of the excerpt ('...than to notify the recipient of the article you have chosen') does not appear anywhere on the visible part of the page and can only be seen in the HTML.

Is there a reason for this? Is the newspaper deliberately mis-sending their feed to you for advertising reasons?

Many thanks.

sliding excerpts

pagetribe,

The excerpts being returned in the Free API are what we call "sliding excerpts". We attempt to return you an excerpt from that part of the article where the topic name or the search query is mentioned.

Unfortunately, there is no flag to suppress the sliding excerpts and return the excerpt from the start of the article when using search_getRelatedArticles, topic_getRelatedStories or topic_getRelatedArticles. However, when you call article_getInfo, you do get the excerpt from the start of the article, because this API has no reference name or query with which to return a sliding excerpt. If you are trying to get the excerpt from the start of the article, I would suggest you make two calls: first get your articles/stories and gather all the article IDs from the response of your xxx_getRelatedArticles call, then use article_getInfo with a list of article_ids as input.
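
A minimal Python sketch of the two-call approach might look like the following. Only the method names search_getRelatedArticles and article_getInfo come from this thread. The call_api helper, the "query" and "article_id" parameter names, and the response shape (an "articles" list of dicts with "article_id" and "excerpt" keys) are assumptions made for illustration, so adjust them to whatever your client library and the actual responses look like.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: call_api(method_name, params) -> parsed response dict.
# It stands in for however you sign and send requests to the API.
ApiCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def leading_excerpts(call_api: ApiCall, query: str) -> List[Dict[str, Any]]:
    """Return articles whose excerpts should come from the start of the article."""
    # Call 1: the search. Excerpts in this response are sliding excerpts.
    found = call_api("search_getRelatedArticles", {"query": query})
    # Assumed response shape: {"articles": [{"article_id": ..., ...}, ...]}
    article_ids = [a["article_id"] for a in found.get("articles", [])]
    if not article_ids:
        return []
    # Call 2: article_getInfo with the whole list of IDs in one request.
    info = call_api("article_getInfo", {"article_id": article_ids})
    return info.get("articles", [])
```

Passing the transport in as an argument keeps the sketch independent of any particular HTTP or signing code, and makes it easy to try out with a stub before pointing it at the real API.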

 

Vineet

 

sliding excerpts occurring using source_getArticles

Hi again,

Following on from your advice above, I first run source_getArticles and get a list of article IDs from the response. I then use article_getInfo with the list of article_ids as input. However, I am still getting sliding excerpts.

See: http://www.daylife.com/article/0eDWcFp2FUbHo
The headline: 'Ex-file: Duchovny splits with wife'
and the excerpt: "Sophie thinks she's terrible in bed! See her hot pics in her undies and tell us what you think.\nMore\nHamish & Andy are set to head off to hug Australia again. Find out what happened last time...\nMore\nFind out if you made it to our weekly gallery of" don't match.

Is there a different method needed to avoid sliding excerpts when using source_getArticles as opposed to xxx_getRelatedArticles? Thanks.

solved (partly)

Thanks for the suggestion: I first made a call to search_getRelatedArticles then a call to article_getInfo and the excerpt returned was the start of the article, which is what I was after.

However, as the query term was picked up on the side of the page and not the article itself, I now get back an article that has no mention of the query term. How would I prevent this?

Does this mean that the 'sliding excerpts' are the result of the whole page being scanned for the search term? When I call search_getRelatedArticles, does the API return any match to my query term, whether or not it is in the main article, i.e. anywhere on the article's page (sliding excerpts)? Then, using article_getInfo, I narrow down to the article itself, which may or may not contain the search term I am looking for. I feel I have misunderstood something.

Thanks for your help on this.

sliding excerpt is a piece of text from the whole article

pagetribe,

The sliding excerpt returned is our attempt to find you the best piece of text from the whole article body and not just the summary published in the RSS feed. So if you use article_getInfo to get the excerpt from the start of the article body, then it's not guaranteed that your query will be contained in there.

I am not sure if I completely understand what you are trying to accomplish with the excerpt, but a workaround for you could be to use both the sliding excerpt returned in search_getRelatedArticles as well as the excerpt from article_getInfo.
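
As a rough Python sketch of that workaround, assuming you already hold both responses as lists of dicts: the function below pairs each sliding excerpt with the leading excerpt for the same article and records where the query appears. The "article_id" and "excerpt" key names are assumptions about the response shape, and merge_excerpts is a hypothetical client-side helper, not part of the API.

```python
from typing import Any, Dict, List


def merge_excerpts(search_articles: List[Dict[str, Any]],
                   info_articles: List[Dict[str, Any]],
                   query: str) -> List[Dict[str, Any]]:
    """Pair the sliding excerpt with the leading excerpt for each article."""
    # Index the article_getInfo results by ID (assumed key: "article_id").
    leading = {a["article_id"]: a.get("excerpt", "") for a in info_articles}
    q = query.lower()
    merged = []
    for article in search_articles:
        sliding = article.get("excerpt", "")
        lead = leading.get(article["article_id"], "")
        merged.append({
            "article_id": article["article_id"],
            "leading_excerpt": lead,
            "sliding_excerpt": sliding,
            # Plain case-insensitive substring checks, nothing smarter.
            "query_in_leading": q in lead.lower(),
            "query_in_sliding": q in sliding.lower(),
        })
    return merged
```

With both excerpts side by side you can, for instance, display the leading excerpt as the summary and fall back on the sliding excerpt only to show where the query matched.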

Let me know if I can help further.

Thanks!

-- Vineet

 

perhaps an outline

Thanks for your help here, Vineet.
Maybe if I outline what I'm trying to do, it will be clearer.
I want to search a list of sources (I assume source_getArticles is the best way to do this).
I then want to list the headline and the summary text from each article.

For example on my page I would like:
1.
headline: Toyota profits fall.
body: Today Toyota's profits fell by 5% after...
url: www.linktoarticleonnewssite.com

what I don't want is:
2.
headline: Toyota profits fall.
body: Vote on Paris Hilton's new sunglasses and win...
url: www.linktoarticleonnewssite.com

I assumed the 'body' in 1 & 2 above comes from the returned excerpts. Furthermore, number 2 is an instance of sliding excerpts (i.e. pulling in other content from the page that the headline appears on). Whereas number 1's body refers directly to the headline, number 2's does not. I would like to prevent number 2 from happening.

What is the best way forward for me to achieve number 1?

Thanks.