Tag: search

Where's the case for data retention?

Google’s CEO, Eric Schmidt, said today that he thinks the greatest danger to people’s privacy comes not from leaks of people’s data, as happened earlier this week to AOL users, but from government snooping.

I have always worried the query stream is a fertile ground for governments to snoop on the people.

This is a very valid argument and it has to be said that it is definitely in Google’s best economic interest to ensure that no-one can access their massive databases of saved searches. The same cannot be said for Irish ISPs and telcos who are being tasked with keeping three years of log files on all their customers. There is almost no incentive for them to secure this data – it is nothing but a dead cost for them and one they wish would go away. This data will more than likely be leaked and sold time and time again by everyone from crooked Gardaí (the Irish police) to minimum wage call centre employees.

Having said that, no lock is uncrackable, and if someone wants to get at Google’s databases badly enough, they will find a way. The easiest way to thwart this is not to retain the data!

The myth of privacy

You do know that every search term you type into a search engine is saved by the search engine, don’t you? That time you searched for porn, or an ex boy/girlfriend, or information about an illness you thought you might have – all saved by the search engine.

This practice was brought sharply into focus when AOL purposefully posted 3 months of search data on the Internet. Usernames were replaced with numbers but it was still possible to identify some of the searchers. The New York Times runs a story today about a Ms Thelma Arnold, a 62-year-old living in Lilburn, Georgia. Ms Arnold was searcher number 4417749 in AOL’s records but was readily identifiable based on her searches for “numb fingers”, “60 single men”, “dog that urinates on everything”, “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

Marketers are going to have a ball with all this info!

Someone has helpfully taken a copy of the data and put a web interface on it to make it easier to query!

Michael Geist said it best when he said:

The article provides a powerful illustration not only of the severity of the AOL mistake (which remains online for all to see), but of why search companies simply should not be retaining this data for any significant period of time. The public privacy risks, whether self-inflicted, from hackers, or via law enforcement fishing expeditions, outweigh the private commercial benefits.

Ms Arnold is quoted in the New York Times article as saying:

My goodness, it’s my whole personal life, I had no idea somebody was looking over my shoulder… We all have a right to privacy, Nobody should have found this all out.

You haven’t searched for anything you wouldn’t want people to know about recently, have you?

Salim Ismail interview coming up

I will be interviewing Salim Ismail, chairman & co-founder of PubSub, in the next couple of days. PubSub is a blog search engine or, as Salim likes to say, a “matching engine”.

I was amazed to learn, from talking to Salim here at the les Blogs 2.0 conference, that he lived and worked in Cork for around a year and that he is another fan of Murphy’s stout!

If you have any questions you’d like me to ask Salim, please leave them in the comments.

Microsoft follow Google into Book Search

I had a much longer post prepared about this but I lost it when I had a server crash (due to my playing around with my .htaccess file!).

Anyway, according to the BBC, Microsoft are following Google’s lead into the Book Search arena.

MSNBC’s report states that Microsoft are teaming up with Yahoo! and the Open Content Alliance and they hope to:

sidestep hot-button copyright issues for now by initially focusing mainly on books, academic materials and other publications that are in the public domain… to let users search about 150,000 pieces of published material. A test version of the product is promised for next year.

Google’s Print project, on the other hand, promises to index millions of books and to remove from the index any book whose author requests its removal.

In terms of usefulness, a search index of millions of books will be orders of magnitude better than one with a mere 150,000 books – now if only Google can overcome the silly legal objections.

Video search via RSS

Yahoo! have posted a nice instruction set on their blog, detailing how to subscribe your copy of iTunes to video searches of interest so that you are constantly fed relevant updated video casts!

Basically, to do it you simply use the RSS URL generator to generate the RSS feed for your video search, add that to iTunes and watch the videos as they arrive in iTunes (or if you have one of the new Video iPods, you can watch them on that!).

I will be talking about other uses for RSS tomorrow evening at the IT@Cork RSS event – hope to see you there!

Developments in search

Two search announcements overnight:

The Yahoo! blog search engine is disappointing from a user interface point of view: the blog search results are hidden away in a sidebar on the right-hand side of the main results page. This is the part of many web pages which contains ads and consequently is subconsciously ignored by most users. Also, on a couple of searches I performed on the site, the results were poor compared to those of its competitors, and I note that Scoble had a similar experience.

Chris Pirillo’s Gada.be, on the other hand, is an interesting new take on search. It puts the search string in the domain name – so a search for nano becomes http://nano.gada.be/ and a search for iPod Nano becomes http://ipod-nano.gada.be/. Also, amazingly, if you add /opml onto the end of the domain name (i.e. http://ipod-nano.gada.be/opml) you are presented with an OPML feed for the search which can be imported into most RSS readers. Gada.be is also optimised for mobile devices, which will be more and more important as PDAs and mobile phones converge.

This is something Scoble has been asking for for some time now!

UPDATE:
Lisa Vaas of eWeek has an excellent review of Gada.be, if you want to know more about it, I suggest taking a look at that.
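
For the curious, the Gada.be address scheme described above can be sketched in a few lines of Python. This is only my guess at the rule, based on the two examples in this post: the search words are lowercased and joined with hyphens. The function name gada_url and the handling of any other characters are my own assumptions, not anything published by Gada.be.

```python
import re


def gada_url(query: str, opml: bool = False) -> str:
    """Build a Gada.be-style search address from a free-text query.

    Assumption: words are lowercased and joined with hyphens, as in the
    two examples from the post; any other characters are simply dropped.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        raise ValueError("query must contain at least one letter or digit")
    url = f"http://{'-'.join(words)}.gada.be/"
    # Appending "opml" gives the address of the OPML version of the search.
    return url + "opml" if opml else url


print(gada_url("nano"))
print(gada_url("iPod Nano"))
print(gada_url("iPod Nano", opml=True))
```

The three calls print the same three addresses quoted in the post.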

Yahoo! attacks iTunes podcast monopoly

Yahoo! have just rolled out Yahoo! Podcasts – a service which lets users find, listen and subscribe to podcasts. PodTech has an interview with Geoff Ralston, Yahoo!’s Chief Product Officer about the new offering where he says:

We want this to be as open as possible on both ends. We want to work with every device – however a user of Yahoo podcasts wants to consume their podcast, wherever they want to do it, whatever device, and on whatever jukebox. We’re going to work with them (jukeboxes) and we’re going to work with as many standards as possible using standard pcast format to integrate with a jukebox. You can listen to podcasts right on your computer, or you can listen to it right on the web itself. On the other end, we want to be as comprehensive as possible. If you have a podcast we’re going to find you, and if we haven’t found you then you can come to our website and give us your RSS feed and we’ll get it into our index within 24 hours.

Finally someone has taken on iTunes’ monopoly in this market – the iTunes offering is pretty poor, interface-wise, but as it had no significant competition, Apple didn’t have to improve it. Now, however, with the launch of Yahoo! Podcasts, it has competition from a serious player.

One of the most useful features of Yahoo! Podcasts is the search function – coming from Yahoo!, not surprisingly, it works like a dream. Another nice feature of the site is the ability to rate and review podcasts – this will add significantly to the value of the directory as the better podcasts rise to the top.

This makes me want to start podcasting once more! I’ll have to fix the soundcard on this PowerBook before I can do that 🙁

What exactly is RSS?

Have you heard about RSS and wondered what exactly it is? Well, in technospeak, RSS stands for Really Simple Syndication, and it is a family of XML file formats for web syndication. To put it more simply, the technology behind RSS allows internet users to subscribe to websites that have provided RSS feeds so that they are notified when there are updates to the site. RSS feeds are typically used by news websites (RTE, BBC, Reuters, CNN, etc.), weblogs (blogs) and more recently by search engines and other search services to provide a perpetual search.

To subscribe to an RSS feed from a website you need the site’s RSS feed address (e.g. http://www.tomrafteryit.net/feed/) and an RSS feed reader. You can install a feed reader on your computer so that you have access to it on your desktop, or if you prefer you can use an online feed reader. If you are not comfortable installing software on your computer then an online feed reader might suit you best. Wikipedia has a comprehensive list of commercial and free RSS feed readers. Google has recently launched an online feed reader called Google Reader, Yahoo! has one in its MyYahoo service and Microsoft has one on its Start.com site.
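
If you are wondering what a feed reader actually does with a feed, here is a minimal sketch in Python using only the standard library. The feed text is a made-up RSS 2.0 sample (the example.com addresses and the names SAMPLE_FEED and read_feed are mine, purely for illustration); a real reader would download the XML from the feed address first and cope with far messier input.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed: one channel containing two items.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return a list of (title, link) pairs, one per item in the feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]


for title, link in read_feed(SAMPLE_FEED):
    print(title, "->", link)
```

Each item in the feed is just a title and a link wrapped in XML, which is why so many different programs and websites can read the same feed.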

How do I know where a site’s feed is?
A site’s RSS feed is typically linked to with a small orange button with white writing on it which might say one of the following: RSS, XML, Webfeed, Feed, or Subscribe.

Why would I want to use RSS?
RSS works like a push technology: the information you want is delivered directly to you, because your feed reader checks the feeds on your behalf – unlike browsing, where you have to go looking for the required data. Search engine RSS feeds are particularly powerful because they allow you to search for a term of interest (your company’s name, your competitor’s name, your market segment) and subscribe to an RSS feed for that search. This RSS feed will then keep delivering new information on that search term as it appears on the internet. In the field of market intelligence, this is one of the most powerful tools ever seen.
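
To make the perpetual search idea a little more concrete, here is a rough sketch of how a reader might work out what is new each time it checks a search feed: it remembers which items it has already shown you and only reports the rest. The function new_items and the item dictionaries are hypothetical, invented for this illustration rather than taken from any particular feed reader.

```python
def new_items(items, seen):
    """Return the items not seen before, and remember them for next time.

    items: list of dicts with an "id" key (for example the item's link).
    seen:  set of ids from earlier checks; it is updated in place.
    """
    fresh = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            fresh.append(item)
    return fresh


seen = set()

# First check of a search feed: everything is new.
first_check = [{"id": "a", "title": "Old mention"}]
print(new_items(first_check, seen))

# Later check: the feed now holds one old and one new result.
second_check = [
    {"id": "a", "title": "Old mention"},
    {"id": "b", "title": "New mention"},
]
print(new_items(second_check, seen))
```

On the second check only the new mention is reported, which is the behaviour that makes a search feed feel like a constant stream of fresh results.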

If you’d like to know more about RSS or to see it in action, feel free to come along to the IT@Cork RSS Event on the 25th of October.