AOL Search Data in AffPortal.com

“On August 4, 2006, AOL Research, headed by Dr. Abdur Chowdhury, released a compressed text file on one of its websites containing twenty million search keywords for over 650,000 users over a 3-month period, intended for research purposes. AOL pulled the file from public access by the 7th, but not before it had been mirrored and distributed on the Internet.” – Wikipedia

Last November I came across an interesting article on the AOL data that was accidentally released to the public. Intrigued, I had to find it. Eventually I found ten flat files containing the 20 million searches done by AOL users.

The next challenge was importing it into a database that could handle the load. After quite a few attempts I was able to import it into MS Access, but Access was unable to query it due to the sheer size. So I exported a cleaner version of the data into .csv files and imported those into SQL Server Express.

After only three files were imported, I reached the 4 GB maximum database size for SQL Server Express.

So that takes me to today. I am importing all 20 million records into MySQL and building a front-end search interface to make this data available to AffPortal.com members. It should be complete in a day or two, so stand by.

In running some of my own queries on the data, I have already turned up some interesting finds.

  1. Why is user xxxx searching for “kill my wife” repeatedly?
  2. It’s AMAZING how many AOL users type URLs into the search textbox to find a website. This is proof positive that if you are not bidding on URLs, you are missing out. And if you are not using our URL scraper to gather those URLs, you are wasting time that could be better spent on your campaigns.
  3. I found that someone was searching AOL for my father’s name. He is an AOL customer, still stuck with dialup. Who knows, it could have been him?!?
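
For anyone who wants to check the URL finding on their own copy of the data, here is a rough sketch in Python. It is not the query I ran. It assumes each record is a tab-separated line with the query text in the second column, and the pattern is only a heuristic for "looks like a URL" covering a handful of common domain endings. The function names are made up for this example.

```python
import re

# Heuristic only: one token, optional http:// and www., ending in a common
# top-level domain, with an optional path. Not a real URL validator.
URL_LIKE = re.compile(
    r"^(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*"
    r"\.(?:com|net|org|edu|gov)(?:/\S*)?$",
    re.IGNORECASE,
)


def looks_like_url(query):
    """Return True if the search text resembles a typed-in URL."""
    return bool(URL_LIKE.match(query.strip()))


def count_url_queries(rows):
    """Count URL-like queries in an iterable of tab-separated lines.

    Assumes the query text is in the second column (an assumption about
    the file layout). Lines with fewer than two columns are skipped.
    Returns (url_like_count, total_count).
    """
    total = 0
    url_like = 0
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        total += 1
        if looks_like_url(fields[1]):
            url_like += 1
    return url_like, total
```

Pass it an open file handle (or any list of lines) and divide the first number by the second to get the share of searches that were really just URLs. A header row, if your files have one, will be counted as an ordinary non-URL query unless you skip it first.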

I’ll be posting more on what I find after searching this data. If you want to mine the AOL search data yourself, you will find it in the members area of AffPortal.com under AOL Database in about a day or so.
