It has been a long time since my last post. For the last eight months I have been working on IIS-hosted, ASP.NET 3.5 web pages, using master pages, and so on.
When I thought most of the work was done (master page design, CSS/HTML editing, linking between pages, and so on), I faced the other side of the problem: SEO, page size optimization, download times, conditional GETs, meta tags (title, keywords, description), and page compression (gzip, deflate). The biggest part of the iceberg was under the water; I had a lot to learn, and a lot of lines to code.
Now that all those things are in place and running, I would like to share what I have learnt with the community, in a series of posts covering:
- ASP.NET menu control optimization: reduces page size and increases download speed; desirable to have in place before using conditional GETs.
- __VIEWSTATE size minimization: in our case the view state alone doubled the size of the page; proper optimization can cut the page to half its size (or less).
- Conditional GET and ETag implementation for ASP.NET: generating ETag and Last-Modified headers, and deciding when and how to return 304 Not Modified with no content (saves bandwidth and makes your site more responsive).
- Solving the CryptographicException "Padding is invalid and cannot be removed" when requesting WebResource.axd: this problem is fairly common, but it will fill your EventLog with errors once you start using conditional GETs.
- Automatic generation of the title tag and the description and keywords meta tags: this makes editing pages much simpler and faster.
- URL canonicalization with 301 redirects for ASP.NET: solves http/https, www/non-www, upper/lower case, and duplicate content indexing problems, among others.
- Serving different versions of robots.txt: you can return different contents depending on whether robots.txt is requested via http or https.
- Enforcing robots.txt directives: detecting badly behaved bots in the wild that ignore the rules in robots.txt, and banning them for some months so they stop wasting our valuable bandwidth.
- Distinguishing a genuine Googlebot crawl from someone pretending to be Googlebot (or any other well-known bot), in order to ban the pretenders for a while.
- Setting up honeypot URLs excluded in robots.txt and banning anyone who visits them: very effective against screen scrapers, offline explorers, and so on.
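To give a flavour of the conditional GET topic above, here is a minimal sketch of the decision a handler has to make before returning 304 Not Modified. The class and method names are illustrative (not the actual code from the series), and a real handler would also set the response headers and suppress the body; this only shows the validator comparison, assuming a single strong ETag per page.

```csharp
using System;

// Hypothetical helper: decides whether a request carrying cache
// validators (If-None-Match / If-Modified-Since) can be answered
// with 304 Not Modified and an empty body.
public static class ConditionalGet
{
    public static bool CanReturn304(string ifNoneMatch, string currentETag,
                                    DateTime? ifModifiedSince, DateTime lastModified)
    {
        // Per HTTP/1.1, If-None-Match takes precedence over If-Modified-Since.
        if (!string.IsNullOrEmpty(ifNoneMatch))
            return ifNoneMatch == currentETag;

        if (ifModifiedSince.HasValue)
            // HTTP dates have one-second resolution, so drop sub-second
            // precision before comparing.
            return TruncateToSeconds(lastModified) <= ifModifiedSince.Value;

        // No validators sent: a full 200 response is required.
        return false;
    }

    private static DateTime TruncateToSeconds(DateTime d)
    {
        return new DateTime(d.Year, d.Month, d.Day,
                            d.Hour, d.Minute, d.Second, d.Kind);
    }
}
```

In an ASP.NET handler you would call something like this early in the request, and if it returns true, set the status code to 304 and end the response without writing any content.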
Since we use Google Webmaster Tools and Google Analytics for all our websites, we had the opportunity to check the consequences of every change. For instance, here is the graph showing the decrease in kilobytes downloaded per day once we enabled HTTP compression and put conditional GETs in place. Note how the number of crawled pages stays more or less the same over the period, while the kilobytes downloaded per day slide down past mid-January (the peaks match several master page updates).