Seoquake data on over 350 cricket and sports sites

The unexpected success of the debut season of the Indian Premier League can be attributed to a number of factors, but most importantly it proved that Indian cricket fans can be as proud of their regional teams as they are of their national team. The shorter duration of a Twenty20 match also helped to rope in more and more fans. As if all this wasn’t enough, IPL 2008 climaxed with a cracker of a match in which just 1 run was needed to win off the last ball. All this fanfare resulted in additional traffic for both new and established cricket sites.

To get an idea of which cricket websites and blogs are worth visiting, Seoquake (http://www.seoquake.com/) was used to collect data on various sites covering sports, cricket and the IPL. The raw data was then made uniform in format: values that were not available were changed to 0, except for parameters where lower is better, where they were changed to the highest value available. A composite score was then calculated to gauge how the websites fared across a wide range of parameters.
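The clean-up rule for unavailable values can be sketched in Python as follows. This is only a minimal sketch, not the spreadsheet formula actually used: the function name `fill_missing` and the use of `None` to mark an unavailable value are assumptions made for illustration.

```python
def fill_missing(values, lower_is_better=False):
    """Replace unavailable values (None) with the worst possible value.

    Assumption: None marks a value that was not available.
    - Higher-is-better parameters: a missing value becomes 0.
    - Lower-is-better parameters (e.g. ranks): a missing value becomes
      the highest value available for that parameter.
    """
    known = [v for v in values if v is not None]
    if lower_is_better and known:
        worst = max(known)
    else:
        # Also covers the edge case where no value at all is known.
        worst = 0
    return [worst if v is None else v for v in values]


# Hypothetical sample data, not taken from the downloadable files.
print(fill_missing([3, None, 10]))
print(fill_missing([5, None, 120000], lower_is_better=True))
```

In both cases a site with no data is treated as the weakest site on that parameter, so missing data can never improve its score.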

The simplistic method used in this exercise worked as follows. For each parameter, the difference between a given value and the smallest value was divided by the difference between the largest value and the smallest value, and the result was multiplied by 100. For parameters where lower is better, such as Netcraft Date, Netcraft, Alexa and Technorati Ranks, the resulting value was then subtracted from 100.
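The scaling described here is commonly known as min-max normalization. A minimal Python sketch of it is given below. The function names, the sample column names and the choice to combine parameters with a plain average are assumptions for illustration; the post only states that all weightages were kept equal, not exactly how the composite was computed.

```python
def min_max_score(values, lower_is_better=False):
    """Scale values to 0..100 using (value - min) / (max - min) * 100.

    For lower-is-better parameters the result is subtracted from 100,
    so that 100 always means "best".
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a parameter with no spread contributes 0 for every site.
        return [0.0 for _ in values]
    scores = [(v - lo) / (hi - lo) * 100 for v in values]
    if lower_is_better:
        scores = [100 - s for s in scores]
    return scores


def composite_scores(columns, lower_is_better):
    """Average the per-parameter scores of each site with equal weights.

    columns: dict mapping parameter name -> list of values, one per site.
    lower_is_better: dict mapping parameter name -> bool (default False).
    """
    scored = [
        min_max_score(values, lower_is_better.get(name, False))
        for name, values in columns.items()
    ]
    # zip(*scored) regroups the scores so each tuple belongs to one site.
    return [sum(site) / len(site) for site in zip(*scored)]


# Hypothetical data for three sites; the column names are illustrative only.
columns = {
    "indexed_pages": [2, 4, 6],     # higher is better
    "alexa_rank": [100, 300, 500],  # lower is better
}
print(composite_scores(columns, {"alexa_rank": True}))
```

Because each parameter is scaled against its own minimum and maximum, a single extreme value compresses the scores of all other sites on that parameter.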

Because the data is fairly skewed, because some parameters such as the Compete rank are not sensitive to subdomains, and because all weightages were kept equal, it is recommended that the raw data be reprocessed by an expert in statistics.

The following files are available for download:
(description_md5-hash_size-in-bytes.csv)

Disclaimer: The Seoquake data above was collected using a number of automated batch processes so as not to get banned from the sites supplying this data. Consequently, the accuracy and currency of the data cannot be guaranteed. Similarly, because of the complex nature of the data and the simplistic nature of the processing, the composite score should be taken to be entirely subjective at best and entirely random at worst. If you wish to view the spreadsheet file and the formulae used, or if you wish your website / blog to be removed from this list, please write to admin at metrochannel dot in. After downloading the files, please ensure that the size of each file and its md5 hash match those embedded in the respective filename.
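The size and md5 check can be automated with a short Python script. This is a sketch under stated assumptions: it assumes the three parts of the filename are separated by underscores, that the hash is written as 32 hexadecimal characters, and that the size is a plain decimal number of bytes. The function names are hypothetical.

```python
import hashlib
import os
import re


def parse_expected(filename):
    """Extract (md5_hex, size_in_bytes) from a name such as
    description_<md5-hash>_<size-in-bytes>.csv (assumed layout)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    match = re.fullmatch(r"(.+)_([0-9a-fA-F]{32})_(\d+)", stem)
    if match is None:
        raise ValueError("filename does not match the expected pattern")
    return match.group(2).lower(), int(match.group(3))


def verify_download(path):
    """Return True if the file's size and md5 hash match its filename."""
    expected_md5, expected_size = parse_expected(path)
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

Calling `verify_download` with the path of a downloaded file returns `False` if the file was truncated or altered in transit, and raises `ValueError` if the file was renamed so that the hash and size can no longer be read from its name.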
