User Centered

Studying the design of everyday things

Google News: Can we aggregate the aggregate?


Dan writes and asks for thoughts on Google News...

Originally posted by Dan:


Why is it that Google News can't recognize when a hundred different websites are simply printing the same AP wire story? When I search for a news item, I often like to sort by date (instead of by relevance), so I can be sure that I'm getting the most recent news item on a subject. Almost always, this results in me getting two or three pages of results of the exact same article, just reported by different TV and newspaper organizations. You can tell because the titles and snippets that Google provides are exactly the same! Shouldn't there be a way to collapse these results? When I sort by date, I expect to see this (I made up an example of an election result):

John Smith wins! - Today
Results were certified earlier today...
See 150 similar articles

John Smith projected to Win - Yesterday
Final results are still coming in...
See 150 similar articles

Polls Show Smith and Blow Neck and Neck - One week ago
Local residents are split on who will be Smalltown's next mayor...
See 35 similar articles

(a couple pages later in the results)

Joe Blow to Run for Mayor - Six months ago
Joe Blow announced his candidacy today for this fall's mayoral election...

------------------------------------------------
Instead, it looks more like this:

John Smith wins! - 10 minutes ago
Results were certified earlier today...
Smallville Picayune

John Smith wins! - 18 minutes ago
Results were certified earlier today...
Smallville Ledger

John Smith wins! - 2 hours ago
Results were certified earlier today...
WSMV - 5 ABC NEWS

(three pages of results later)

John Smith wins! - Today
Results were certified earlier today...
Delaware County Register

Am I the only one that thinks this is bad design? Why would I care if the Picayune or the Ledger (and a hundred other news publications) posted their article 10 minutes sooner, when they both just republished the exact same AP wire story? When I'm looking at this view, I'm hoping to see a chronological account of what was reported from when it was breaking news to old news. I'm not looking for 100 copies of the “old news” wrap-up. Is this unreasonable or off-base? I'm sure that Google is smart enough to recognize identical articles, and could group them together (they certainly do in "sort by relevance" mode). If someone wanted to find a particular news source, they could always expand out that list of similar articles.


(Update: I didn't use "sort by date" when I took these screenshots and made these comments; see the comments below.)
I feel for him, but I'm not sure I fully understand what's going on; it seems hit or miss. I've found both of the behaviors Dan is talking about. First, a search on "Hewlett Packard" gives me, on the first page, the same article reproduced under different links:

But there are also the "similar" or "related" links:


...so I'm not sure I fully understand. But in the first case, I do find it hard to believe that Google can't somehow figure out that the articles have the same "content." I'm trying to get a handle on this from an "activity centered" point of view. Your goal in browsing is to quickly scan aggregated news articles from different sources. But republished AP articles get posted what feels like a million times, which creates extra noise you have to filter out and defeats the point of the aggregation.
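To make Dan's request concrete (collapse copies of the same story into one entry, then sort those entries by date), here is a minimal Python sketch. It assumes a hypothetical `Result` record with a title, snippet, source and publish time, and it treats two results as copies only when their title and snippet match exactly after lowercasing and collapsing whitespace, which is the same cue Dan uses to spot duplicates. The sample headlines come from Dan's made-up election example. This is not a description of how Google News actually works; real wire-story detection would likely need fuzzier matching over the full article text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Result:
    """One search hit (hypothetical record, not a Google News API type)."""
    title: str
    snippet: str
    source: str
    published: datetime


@dataclass
class Cluster:
    """All copies of one story, dated by its earliest appearance."""
    title: str
    snippet: str
    first_seen: datetime
    sources: list = field(default_factory=list)


def _normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial differences don't split a group.
    return " ".join(text.lower().split())


def _story_key(r: Result) -> tuple:
    # Assumption: identical title + snippet means "same wire story".
    return (_normalize(r.title), _normalize(r.snippet))


def collapse_by_date(results):
    """Group identical stories, then sort the groups newest-first."""
    clusters = {}
    for r in results:
        key = _story_key(r)
        c = clusters.get(key)
        if c is None:
            clusters[key] = Cluster(r.title, r.snippet, r.published, [r.source])
        else:
            c.sources.append(r.source)
            c.first_seen = min(c.first_seen, r.published)
    return sorted(clusters.values(), key=lambda c: c.first_seen, reverse=True)


if __name__ == "__main__":
    now = datetime(2006, 9, 14, 12, 0)
    hits = [
        Result("John Smith wins!", "Results were certified earlier today...",
               "Smallville Picayune", now - timedelta(minutes=10)),
        Result("John Smith wins!", "Results were certified earlier today...",
               "Smallville Ledger", now - timedelta(minutes=18)),
        Result("John Smith projected to Win", "Final results are still coming in...",
               "Delaware County Register", now - timedelta(days=1)),
    ]
    for c in collapse_by_date(hits):
        print(f"{c.title} - {c.first_seen:%Y-%m-%d %H:%M}")
        print(f"  {c.snippet}")
        if len(c.sources) > 1:
            print(f"  See {len(c.sources) - 1} similar articles")
```

Dating each group by its earliest copy keeps the timeline about when a story first broke rather than which outlet reposted it most recently; swapping `min` for `max` would date groups by their latest copy instead.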

Thoughts?

Thanks for the email Dan.


Comments

Anonymous Thursday, September 14, 2006 12:15:02 PM

Dan writes: Just to clarify, I was specifically referring to the "sort by date" setting. If you plug Hewlett Packard into the news search, you do get a very well organized, aggregated listing of articles. It's when you hit the "sort by date" option in the top right that it seems to lose all intelligence and just becomes a bland list of (sometimes repetitive) results. Maybe this is what Google envisions "sort by date" to mean, but what I expect is to get all the articles related to a certain search term, with duplicates aggregated, sorted by date. What they give me instead is basically just the raw material they use to create the "sort by relevance" results.

Eddie Lopez Thursday, September 14, 2006 1:03:24 PM

Right. Thanks for setting me straight. I'll update the post.
