A web spider dream


(Spidering the web, literally!)

It's been one of my dreams for a long time to write my own spider. The idea first came to me while watching the movie The Matrix: in one of the opening scenes, while the lead character is sleeping, his computer automatically runs a search (about another character called "Morpheus") and gets the results for him.

Writing an automated search would take only a couple of minutes with automation software like AutoIt, but efficiently gathering data from the various search engines, then organizing, sorting and storing it, requires talent of another kind...

So the journey begins, for the making of the web spider.


Googlebot

By the way, guys, did you know that Google gathers data from different sites using something like the web spider I mentioned earlier? The spider they use is called Googlebot.


Robots.txt in action

Another interesting thing I noticed: almost all the sites I visited after learning about spiders had a robots.txt file, which states which parts of the site a web bot (spider) may visit. You too can see the file, along with the names of various spiders from around the world. Just type the root address of the site and append /robots.txt to it.

Say you wanted to get the robots.txt of http://en.wikipedia.org
Then type http://en.wikipedia.org/robots.txt
You then get the details of all the bots (spiders) allowed and disallowed by the Wikipedia folks. You can try this out for any site. Cool, huh?
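
Here is a small Python sketch of the same trick, in case you want your own program to read a robots.txt instead of your browser. It only uses the standard library. The helper names (robots_url, load_rules), the bot names and the sample rules are all made up by me for illustration; they are not Wikipedia's real rules.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(site_root):
    """Build the robots.txt address from the root address of a site."""
    return urljoin(site_root, "/robots.txt")


# Hypothetical rules, invented for this example (NOT a real site's file).
SAMPLE_RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a parser we can ask questions."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE_RULES)

print(robots_url("http://en.wikipedia.org"))
# "BadBot" is banned everywhere, every other bot only from /private/.
print(rules.can_fetch("BadBot", "http://example.com/page"))
print(rules.can_fetch("MySpider", "http://example.com/page"))
print(rules.can_fetch("MySpider", "http://example.com/private/x"))
```

To check a live site instead of the sample text, RobotFileParser also has set_url() and read(), which download the file for you.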


So, in basic terms, a web spider is a piece of software that goes to a site and gets the data for you, rather than you manually visiting each site with a web browser.
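
To make that a bit more concrete, here is a rough Python sketch of the core of a spider: get a page, then pull out the links so you know where to go next. This is just my assumption of a bare-minimum version, not how Googlebot or any real spider works. The names LinkCollector, extract_links and fetch are made up for this example, and the sample HTML is invented so the code runs without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links like /wiki/X into full addresses.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link addresses found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch(url):
    """Download a page as text (not called below, it needs a connection)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented sample page, so the example works offline.
sample = (
    '<p><a href="/wiki/Web_crawler">crawler</a> '
    '<a href="http://example.com/">example</a></p>'
)
print(extract_links(sample, "http://en.wikipedia.org/"))
```

A real spider would call fetch() on each new link, check robots.txt before visiting, and remember which pages it has already seen.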

In future posts, we shall begin the journey of actually coding a spider and using it.

(As a tribute to the movie "The Matrix", this post was intentionally colored green)


Give me the report

(Google Analytics in action: a screenshot of visitors for this blog)


It's been more than a month since I registered with Google Analytics. I checked it regularly for the first 3-4 days, saw no results, gave up on it and moved on to StatCounter.

What these sites do is present information about the users who visit your blog/site. First you register with one of these sites. Then you just insert the code they provide into your site/blog, and you get a detailed summary of your visitors, including the country they come from, the length of their visit and so on. This is really useful, especially if you have ads on your page that target a specific audience, like AdSense.

But just today I logged in to Google Analytics, and there it was: the complete summary of all visits in the past month, and I have 3 visits from America too (see the pic at the top). It was really cool to see analytics in action.


Google Gears

I stumbled across this utility called Google Gears. It is supposed to make our offline experience painless. There are other tools for that too, like Webaroo. But from what I understand, Google Gears offers you an API and code to add a 'work offline' feature to your web apps. This is really cool, because very few web apps have this feature. I am not sure if Google Gears can be used to download and view web pages as such. Anyway, I will look more into it soon.


Digital Camera

Today my friend Swaroop and I went to QRS (an electronic appliances shop in our town) to inquire about the price of a digital camera. If everything goes as planned, I will soon have a digital camera in my hands, as promised by my dad a couple of months back.

