We all know that when users search on Google with keywords, they are immediately shown a list of relevant links from all over the Internet. But how does this work?
When a user searches something on Google, a program checks an index to surface the most appropriate results, much as the index of a book helps us find something. The program that builds this index is called Googlebot. It ‘crawls’ billions of web pages and builds a searchable index for Google Search.
Just as with any website, online stores need to be indexed so that they appear when consumers search with relevant keywords. Customizing the robots.txt file allows you to control which pages of your online store will be indexed. When a search engine bot visits your store, it looks for the robots.txt file and then crawls the site according to the instructions in that file.
In Magento Community and Enterprise, however, the Configuration > Design panel gives you options to control the indexing of your pages. The Default Robots field offers a dropdown with the following options:
- INDEX, FOLLOW: Pages are indexed, and search engine bots are allowed to follow links from applicable pages.
- NOINDEX, FOLLOW: Pages are not indexed, but search engine bots are allowed to follow links from applicable pages.
- INDEX, NOFOLLOW: Pages are indexed, but search engine bots do not follow links.
- NOINDEX, NOFOLLOW: Pages are not indexed, and search engine bots do not follow links.
If your online store is still under development, you should select the “NOINDEX, NOFOLLOW” option. Once you go live, you can switch to “INDEX, FOLLOW”.
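Under the hood, this setting controls the robots meta tag that Magento emits in the head of every page. As an illustration (the exact markup may vary by Magento version), a store set to “NOINDEX, NOFOLLOW” would output something like:

```
<!-- Emitted in the <head> of each page; tells bots not to index
     the page and not to follow its links -->
<meta name="robots" content="NOINDEX,NOFOLLOW"/>
```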
But if you need more control over your store’s indexing, you will have to edit the actual robots.txt file.
There are two basic instructions: “User-agent” lets you define which bots a set of rules applies to, and “Disallow” lets you define which paths those bots may not access.
Google has different bots for different functions: Googlebot-Video crawls pages for video, Googlebot-Image crawls for images, and so on. You can have a look at all the bots here.
Below are a few examples of their usage:
Here the asterisk (*) means the rule applies to all bots, and the slash (/) blocks access to every page on the site.
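Sketched in robots.txt syntax, that rule would look like this:

```
# Applies to all bots; blocks the entire site
User-agent: *
Disallow: /
```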
This applies to all bots and grants complete access.
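An empty Disallow value blocks nothing, so the equivalent rule grants full access:

```
# Applies to all bots; blocks nothing
User-agent: *
Disallow:
```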
This applies to a single bot and grants it access to no page.
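For example, using Googlebot as an illustrative bot name:

```
# Applies only to Googlebot; blocks the entire site for it
User-agent: Googlebot
Disallow: /
```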
This applies to a single bot and grants it complete access.
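Again with Googlebot as an illustrative bot name:

```
# Applies only to Googlebot; blocks nothing for it
User-agent: Googlebot
Disallow:
```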
This applies to only a single bot and grants it access to only the media folder.
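One way to express this (assuming the intent is to allow only a /media/ folder, with Googlebot as an illustrative bot name) uses the Allow directive, which Google supports as an extension to the original robots.txt standard:

```
# Applies only to Googlebot; blocks everything except /media/
User-agent: Googlebot
Disallow: /
Allow: /media/
```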