Archive for database

Mar
04

AOL Bootleg Database is live

Posted by: Corey | Comments (0)

When you log in to AffPortal.com you will see a new feature in the portal navigation. The AOL Bootleg database has been added, and it’s a super source of information on how less savvy internet users navigate the web.

A simple query will return proof positive of the benefits of bidding on URLs in BOTH PPV and PPC campaigns.

I am still experimenting with using this database to its fullest potential. If you have any suggestions for features you would like to see added to the front end interface, shoot me a note or leave a comment and I’ll be happy to review your suggestion.

I did notice some corrupt data in the last field in the database. I will be reviewing this and possibly re-importing the data at a later date. For now there are about 24 million records to data mine for new affiliate marketing ideas in your niche.

Categories : Development
Feb
20

AOL Search Data in AffPortal.com

Posted by: Corey | Comments (0)

“On August 4, 2006, AOL Research, headed by Dr. Abdur Chowdhury, released a compressed text file on one of its websites containing twenty million search keywords for over 650,000 users over a 3-month period, intended for research purposes. AOL pulled the file from public access by the 7th, but not before it had been mirrored and distributed on the Internet.” – Wikipedia

Last November I came across an interesting article on the AOL data that was accidentally released to the public. Intrigued, I had to find it. Eventually I did find ten flat files that contained the 20 million searches done by AOL users.

The next challenge was importing it into a database that could handle the load. After quite a few attempts I was able to import it into MS Access, but Access was unable to query it due to the sheer size. So I exported a cleaner version of the data into .csv files and imported those into SQL Server Express.

After only three files were imported, I reached the 4 GB max size for SQL Server Express.

So that takes me to today. I am importing all 20 million records into MySQL and building a front end search interface to make this data available to AffPortal.com members. It should be complete in a day or two, so stand by.

In running some of my own queries on the data, I came across some interesting finds.

  1. Why is user xxxx searching for “kill my wife” repeatedly?
  2. It’s AMAZING how many AOL users type URLs into the search textbox to find a website. This is proof positive that if you are not bidding on URLs, you are missing out. And if you are not using our URL scraper to gather those URLs, you are wasting time that could be better spent on your campaigns.
  3. I found someone was searching AOL for my father’s name. He is an AOL customer, still stuck with dialup. Who knows, it could have been him?!?
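As a rough illustration of point 2, here is a minimal sketch of how URL-style searches could be picked out of a list of queries. The regular expression, the short list of top-level domains, and the function names `looks_like_url` and `url_share` are all assumptions made for this example; they are not taken from the AffPortal.com front end.

```python
import re

# Hypothetical heuristic: a query "looks like a URL" when it is a single token
# that starts with http(s):// or www., or ends in one of a few common
# top-level domains (optionally followed by a path).
URL_LIKE = re.compile(
    r"^(?:https?://|www\.)\S+$|^\S+\.(?:com|net|org|edu|gov)(?:/\S*)?$",
    re.IGNORECASE,
)


def looks_like_url(query: str) -> bool:
    """Return True when a search query appears to be a typed-in web address."""
    return bool(URL_LIKE.match(query.strip()))


def url_share(queries) -> float:
    """Return the fraction of queries that look like URLs (0.0 if empty)."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(looks_like_url(q) for q in queries) / len(queries)
```

A heuristic like this will miss misspelled or unusual domains, so treat the share it reports as a lower bound rather than an exact figure.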

I’ll be posting more on what I find after searching this data. If you want to mine the AOL search data yourself, you will find it in the members area of AffPortal.com under AOL Database in about a day or so.

Categories : Development
Feb
16

Big Databases pt 2

Posted by: Corey | Comments (0)

So I’m trying to ask this huge database questions with really simple queries, and after something like 8 minutes of it spinning, frozen, I didn’t know what to do. There was no way that would work. So I talked with my friend at work and he said “index it”. A dropped column or three and one new index later, it was returning results. I also reduced the database to 44 million records to start getting data, and it was working.
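To show why “index it” helps, here is a small sketch that asks a database how it plans to run the same lookup before and after an index is added. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL server described in these posts, and the table name `keywords`, the column `phrase`, and the index name `idx_keywords_phrase` are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the real keyword database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (id INTEGER PRIMARY KEY, phrase TEXT)")
conn.executemany(
    "INSERT INTO keywords (phrase) VALUES (?)",
    [(f"phrase {i}",) for i in range(10000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in rows)


query = "SELECT id FROM keywords WHERE phrase = 'phrase 42'"

# Without an index the engine has to scan every row in the table.
before = plan(query)

# With an index on the searched column it can jump straight to matches.
conn.execute("CREATE INDEX idx_keywords_phrase ON keywords (phrase)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

MySQL accepts a `CREATE INDEX ... ON table (column)` statement of the same general form, though the best column to index depends on the queries the search page actually runs.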

It worked right up to a “ringtones” query, which again took over a minute to return. When it did, the count was 15,000-something and the datagrid couldn’t handle that many results. OK, so I put in a trigger so that super huge result sets come in .txt format, with a link on the search page. This works much better.
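The fallback described above can be sketched roughly as follows: small result sets go to the on-page grid, and anything over a threshold is written to a text file that the page can link to. The threshold of 1,000 rows, the function name `deliver_results`, and the shape of the returned dictionary are assumptions for illustration only; the real search page is not shown in the post.

```python
from pathlib import Path

# Hypothetical cut-off; the post does not say what limit was actually used.
MAX_GRID_ROWS = 1000


def deliver_results(rows, export_path, max_grid_rows=MAX_GRID_ROWS):
    """Decide whether results are shown in the grid or exported to a file.

    rows        -- iterable of result strings (one keyword phrase per row)
    export_path -- where to write the .txt file if the set is too large
    """
    rows = list(rows)
    if len(rows) <= max_grid_rows:
        # Small enough to render directly on the page.
        return {"mode": "grid", "rows": rows}

    # Too large for the grid: write one result per line and hand back the
    # path so the page can show a download link instead.
    path = Path(export_path)
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return {"mode": "file", "path": str(path), "count": len(rows)}
```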

Categories : Development

I had never heard of the acronym VLDB until I tried to import a 77+ million record tab-delimited file into SQL Server Express. I didn’t realize what a headache I was in for.

For AffPortal.com I have purchased about 15 to 20 different databases to power the keyword research functionality and to build initial lists with. I was thinking I could provide this search function to members, which would bring a lot of value to a subscription that is only $27/month to begin with. It’s a serious amount of data to pore through.

Every attempt to import this data into SQL Server failed for one reason or another: truncation errors, primary table disk space errors, or freeze-ups. I even tried splitting the large flat file into about 40 smaller files to import one by one. Still no luck. After much forehead slapping, I was able to set the initial size of the database to about 4 GB and the import began. After about 35% of the import completed, the database was full. I had maxed out the 4 GB limit on SQL Server Express, and the full-blown version of SQL Server costs thousands. Good times.
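For anyone wanting to try the file-splitting step mentioned above, here is a minimal sketch that cuts a large text file into numbered chunks of a fixed number of lines. The function name `split_file`, the `chunk_0000.txt` naming scheme, and the UTF-8 encoding are assumptions for the example; the post does not say which tool was used for the original split.

```python
from pathlib import Path


def split_file(source, out_dir, lines_per_chunk):
    """Split a text file into numbered chunk files; return their paths."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    out = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(source, "r", encoding="utf-8", newline="") as src:
            for line_no, line in enumerate(src):
                # Start a new chunk file every lines_per_chunk lines.
                if line_no % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.txt"
                    out = open(path, "w", encoding="utf-8", newline="")
                    chunk_paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Because the file is read one line at a time, memory use stays flat no matter how large the source file is.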

My db vendor suggested I try Firebird, an open source database like MySQL. I downloaded it, installed it and, look at that, command line only, no user interface… no thanks. But wait, there are all these third-party UIs, awesome. So I started the import. BLAM. Primary table memory error again. Keep in mind this was after about 6 hours of messing with this, well into last Saturday morning.

I got an idea. Thanks to the guys at IronTech, I already had a MySQL db running on the server, and I knew it had a user-friendly GUI front end. So I cranked it up. Wait, there is no import tool. So off to Google I went and found several, but I’m already in the hole quite a bit financially, so I didn’t feel like spending another couple of hundred on an import tool. Instead I found a nice little bulk import query. Finally something free, and in loads the database.
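The post does not show the bulk import query itself. MySQL's `LOAD DATA INFILE` statement is one built-in way to load a delimited file in a single query, but whether that is what was used here is not stated. As a hedged sketch of the general idea, the following loads a tab-delimited file in batches, using Python's built-in `sqlite3` as a stand-in database. The table name `keywords`, the assumption that the phrase sits in the first column, and the batch size are all hypothetical.

```python
import csv
import sqlite3
from itertools import islice


def bulk_import(conn, tsv_path, batch_size=10000):
    """Load the first column of a tab-delimited file into a keywords table.

    Rows are inserted in batches so the whole file never has to sit in
    memory at once. Returns the number of rows inserted.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS keywords (phrase TEXT)")
    total = 0
    with open(tsv_path, "r", encoding="utf-8", newline="") as f:
        # QUOTE_NONE treats quote characters in the data as ordinary text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        while True:
            rows = list(islice(reader, batch_size))
            if not rows:
                break
            # Skip blank lines; keep only the first field of each row.
            batch = [(row[0],) for row in rows if row]
            with conn:  # one transaction per batch
                conn.executemany(
                    "INSERT INTO keywords (phrase) VALUES (?)", batch
                )
            total += len(batch)
    return total
```

Committing once per batch rather than once per row is the main reason a bulk load like this finishes in a reasonable time.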

After about an hour of cranking away, with me convinced it was going to time out at minute 59, the query actually completed. I FINALLY had a database full of all 77,300,000 keyword phrases….

I thought I was in the home stretch… nope. (continued)

Categories : Development