
The Googlebot’s role in search

Historically, search engines have been the most widely used way to find products or services online. But what exactly is the goal of optimising for search engines and their Googlebots? Simply put, it is to appear on the front page of Google, so that people can find your organisation through a top search placement. How much you need to invest in growing your website traffic will depend on your competitors’ search marketing activity in your industry and on the volume of searches related to your website.

Every industry has dominant players on Google search. Our strategy for all of our clients is to improve the searchability of their pages, and specifically of their extensive range of products or services.

First, I will dive deeper into the topic of Googlebots, starting with a definition and the basics. Once we have covered those, we can look at the strategies you can implement to take advantage of Googlebots. Improved page content, greater relevancy and a greater quantity and quality of information can help your pages match a wider range of customer searches on Google. Let’s explore how.

What is a Googlebot, and what is web crawling?

Googlebot is Google’s web crawling bot, also known as a spider or web robot. It is an automated program that browses the Internet methodically to keep Google’s data up to date. Web crawling is the process by which Googlebot finds new and updated pages to add to Google’s index. Google runs this process on a large number of computers, covering billions of pages across the Internet.

So, how many Googlebots are there? We can distinguish nine different types of Google crawler, which crawl different parts of your website:

Googlebot for Google Web Search
Google Smartphone
Google Mobile
Googlebot Images
Googlebot Video
Googlebot News
Googlebot AdSense
Google Mobile AdSense
Google AdsBot

Thanks to all of these, Google is able to fully crawl your site and rank it. But how does it actually work?

Googlebots and Web Crawling purposes defined

Googlebots discover pages by collecting all the links on each page they find and then following those links to other web pages. In this way, they crawl new pages that are linked from pages already known to Google, or pages that a webmaster has submitted manually through an XML sitemap (which I will explain later). How often Googlebots crawl a site depends on its crawl budget, which is influenced by factors such as the site’s PageRank and how often its content is updated. Let’s learn a bit more about PageRank and the basics of Googlebots and web crawling.
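To make the link-following idea concrete, here is a minimal breadth-first crawler in Python. It is a sketch under stated assumptions, not how Googlebot is implemented: the hypothetical `crawl` function reads pages from an in-memory dictionary (`PAGES`, with made-up example.com URLs) instead of the network, and it ignores robots.txt, crawl budgets and rendering.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, collect its links, queue any new ones.

    `fetch(url)` returns the page's HTML, or None if the page doesn't exist.
    Returns the URLs in the order they were crawled.
    """
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        crawled.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links like "/about"
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return crawled


# Hypothetical in-memory "web", so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/products">Products</a> <a href="/about">About</a>',
    "https://example.com/products": '<a href="/products/widget">Widget</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/products/widget": "<p>No links here.</p>",
    "https://example.com/orphan": "<p>Nothing links to this page.</p>",
}

print(crawl("https://example.com/", PAGES.get))
```

Note that `https://example.com/orphan` is never reached, because no page links to it. That is exactly the kind of page an XML sitemap helps Googlebots find.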

Googlebots and PageRank

As I mentioned above, the amount of time Googlebots spend crawling sites is not always the same. Googlebots will spend more time crawling your site if it has a high PageRank score. So, what is PageRank? It is a 0 to 10 scale that Google invented to score every page, based mainly on the number and quality of links pointing to it, which together signal the page’s importance and authority on the web. It is impossible to get a perfect 10; even “google.com” has a score of 9, and most brand pages score around 3.
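The link-based core of PageRank can be sketched with the textbook formula from the original PageRank paper: each page starts with an equal score, then repeatedly passes a share of its score, scaled by a damping factor (commonly 0.85), to the pages it links to. This is a simplified illustration, not Google’s production system; the `pagerank` helper and the four-page site are hypothetical, and the scores it produces sum to 1 rather than using the 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Each round, a page shares its score equally among its outgoing links;
    `damping` is the probability a "random surfer" follows a link rather
    than jumping to a random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: "home" receives the most inbound links,
# while nothing links to "contact".
graph = {
    "home": ["blog", "services"],
    "blog": ["home"],
    "services": ["home", "blog"],
    "contact": ["home"],
}
scores = pagerank(graph)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```

The page with the most, and best-scored, inbound links ends up on top, while a page nobody links to keeps only the minimum share.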

The good news is that Googlebots look at your site with a “fresh eye” every time they crawl it, no matter how many times they have crawled it before. So, if you improve your content or gain new backlinks, for example, chances are that Googlebots will notice, and your PageRank score will improve.

So, what exactly happens when a Googlebot crawls your site? Let’s look at that in more depth.

Googlebots and your site

Googlebots rely on two main elements when crawling your site: the robots.txt file and the XML sitemap.

A Googlebot will not start crawling your site at random. As I explained before, Googlebots work in a very precise way. They start by looking at your site’s “guidelines”: what you want them to crawl and what you want them to ignore. These guidelines live in the robots.txt file, which is important because it serves as a guide for Googlebots. In it, you can list all of the pages you do not want Googlebots to crawl.

So, whereas the robots.txt file tells bots what they are allowed to crawl and what they should ignore, the XML sitemap helps Googlebots find all of the pages you do want indexed. Because of the way some websites are structured, it can be hard for Googlebots to find all of your pages. By giving Googlebots your XML sitemap, you make it easy for them to find every page and give them a clear map of how to access your site.
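An XML sitemap follows the format published at sitemaps.org: a `<urlset>` element containing one `<url>` entry per page, each with a `<loc>` address and an optional `<lastmod>` date. The sketch below builds one with Python’s standard `xml.etree.ElementTree`; the `build_sitemap` helper and the example URLs and dates are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs; dates as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # <lastmod> is optional in the sitemap format
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/services", "2024-01-10"),
    ("https://example.com/products?category=shoes&size=9", None),
])
print(sitemap)
```

ElementTree escapes characters such as `&` in URLs automatically, which the sitemap format requires.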

So, now that you have the basics down, it’s time to look at the strategies you can implement to improve your PageRank score, which will ultimately help you rank higher.

How can you create a winning SEO strategy with Googlebots in mind?

So, what should you keep in mind when optimising your website for Googlebots? Here are a few guidelines to follow to improve your searchability:

Learn how the Googlebots think.
Pay attention to your robots.txt to control the Googlebots.
Content plays a big role – get it right.
Don’t underestimate internal links.

We know what Google wants and how its Googlebots think!

To make the most of Googlebots, you need to understand what they crawl and what is more likely to be ignored. Got a lot of JavaScript on your site? Did you know that Googlebots can’t crawl it as well as HTML? Google hasn’t shared much information about how it crawls JavaScript, Ajax, Flash and DHTML. You should keep a balance of HTML and other elements, so that they don’t prevent crawlers from indexing your pages correctly.
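To illustrate why script-generated content is riskier, the sketch below reads a page the way a crawler that does not run JavaScript would: it keeps the text written in the HTML and skips the contents of `<script>` and `<style>` tags. It is only an illustration under that assumption; the `raw_html_text` helper and the sample page are hypothetical, and Googlebot’s actual processing is more capable than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping <script> and <style> contents.

    It does NOT execute JavaScript, so text a script would insert is missed.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the pricing text only exists once the script runs.
page = """
<h1>Our services</h1>
<p>We build websites.</p>
<div id="pricing"></div>
<script>
  document.getElementById("pricing").textContent = "Plans from $99";
</script>
"""
print(raw_html_text(page))
```

The pricing line never appears in the output, because it only exists after the script runs. Putting key content directly in the HTML avoids depending on a crawler to execute your scripts.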

Basically, you need to build your website for users AND search engines. I have seen a lot of pretty-looking websites rank nowhere near the top results because they were not optimised for crawlers such as Googlebots. Remember that Googlebots don’t see your website the same way your users do. So, when creating a website, ask yourself whether Googlebots can see and access your pages and resources, and whether you want them to.

How can you see what Googlebots have indexed? There is an easy answer. Open a new incognito window, go to Google and search for “site:<yourwebsite.com>”. For example, I would search for site:glideagency.com

This will show you the pages that Googlebots have found and indexed. When I look at the results for Glide Agency, for example, I can see that Googlebots have indexed over 5 000 pages.

If you see very few results, you may be blocking some of your pages from being crawled and indexed in your robots.txt file. To see what your robots.txt file looks like, just type <yourcompanyname>.com/robots.txt into your browser’s address bar, as I have done in the image below.
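Whatever page you start from, robots.txt sits at the root of the host. A small Python sketch that derives its address from any page URL (the `robots_txt_url` helper and the example URL are hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the host serving `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/some-post?ref=home"))
# https://www.example.com/robots.txt
```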

Pay attention to your robots.txt to control the Googlebots

Your robots.txt file is a good way to control Googlebots, because it lets you tell them precisely what you want them to crawl and what you don’t. If you look at the image above, you will see three different elements:

User-agent
Disallow
Allow

The “User-agent” line names the crawler your rules apply to. For example, you may want to restrict a specific robot’s access to a particular file or folder. Alternatively, you can leave it as “*”, as above, if you want the rules to apply to every robot. This star is called a “wildcard”, and it basically means “everything”, or in this case, all robots.

The “Disallow” line lists, as its name suggests, the sections you do not want bots, including Googlebots, to access. You can include the name of each folder you do not want Googlebots to access.

In some cases, you may also have special instructions for Googlebots. For example, you may not want Googlebots to access most of a folder, except for one file within it. This is what the “Allow” element is for.

So, if you look at the image above, you can see that I have asked Googlebots not to crawl the /wp-admin/ folder, except for the /wp-admin/admin-ajax.php file.
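To see how User-agent, Disallow and Allow combine, here is a simplified Python sketch of the matching rule Google documents: among the rules that match a path, the longest (most specific) one wins, and Allow wins a tie. The robots.txt text is reconstructed from the description above, not copied from the image. The `parse_groups` and `is_allowed` helpers are hypothetical; they ignore `*` and `$` wildcards inside paths and only match exact user-agent names before falling back to `*`. Python’s standard `urllib.robotparser` applies rules in file order instead, which would block admin-ajax.php here, hence the hand-written matcher.

```python
def parse_groups(robots_txt):
    """Map each user-agent name (lower-cased) to its list of (allowed, path) rules."""
    groups = {}
    current_agents = []
    last_was_rule = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if last_was_rule:  # a User-agent after rules starts a new group
                current_agents = []
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_rule = False
        elif field in ("allow", "disallow"):
            last_was_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append((field == "allow", value))
    return groups


def is_allowed(robots_txt, user_agent, path):
    groups = parse_groups(robots_txt)
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for rule_allows, rule_path in rules:
        if path.startswith(rule_path):
            longer = len(rule_path) > best_len
            tie_goes_to_allow = len(rule_path) == best_len and rule_allows
            if longer or tie_goes_to_allow:
                best_len, allowed = len(rule_path), rule_allows
    return allowed


ROBOTS = """
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

print(is_allowed(ROBOTS, "Googlebot", "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(ROBOTS, "Googlebot", "/wp-admin/options.php"))     # False
print(is_allowed(ROBOTS, "Googlebot", "/blog/"))                    # True
```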

What are the common robots.txt formats?

There are four common ways to write your robots.txt file.

If you want to allow full access, simply write:
User-agent: *
Disallow:

If you want to block all access, you can state:
User-agent: *
Disallow: /

If you wish to block one folder, like I have done in the image, write:
User-agent: *
Disallow: /folder/

Lastly, if you just want to block one file, you can say:
User-agent: *
Disallow: /file.html
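These four formats can be checked with Python’s standard `urllib.robotparser` module, which parses robots.txt lines and reports whether a given user agent may fetch a URL. The `check` helper and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

FORMATS = {
    "allow all": "User-agent: *\nDisallow:",
    "block all": "User-agent: *\nDisallow: /",
    "block one folder": "User-agent: *\nDisallow: /folder/",
    "block one file": "User-agent: *\nDisallow: /file.html",
}


def check(robots_txt, paths, agent="Googlebot"):
    """Return {path: may_fetch} for one robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


paths = ["/", "/folder/page.html", "/file.html"]
for name, rules in FORMATS.items():
    print(name, check(rules, paths))
```

Since none of these formats mixes Allow and Disallow rules, the order in which the module applies rules does not affect the answers here.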

Note that a robots.txt file is only useful if you want to block or restrict Googlebots’ access to some of your folders, or if you need to leave special instructions for Googlebots.

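You can also use a meta tag to tell search engines not to index an individual page, whether or not you have a robots.txt file. How? In the page’s <head> section, include: <meta name="robots" content="noindex">. Googlebots have to be able to crawl the page to see this tag, so don’t also block that page in your robots.txt file.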
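For illustration, here is how a crawler might read that tag, sketched with Python’s standard `html.parser`. The `is_noindex` helper is hypothetical; it collects the comma-separated directives from any `<meta name="robots">` tag and checks for `noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lower-cased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    return "noindex" in reader.directives


print(is_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
```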

So, now that you know how to control what Googlebots see, let’s take a look at how you can optimise the content that does get crawled.

How does Content come into play?

One of the main ways to get your low-ranked pages crawled more frequently is to offer fresh content. It is important to keep updating the content on your website and to offer new, useful information that educates your audience. The aim should be to become a thought leader within your industry, especially since content is becoming a currency that helps increase your organic rankings in Google listings. Through this process, Google is effectively rating your website content to determine how relevant it is to your audience and how well it resonates with them.

Consequently, building out the content on your website, and refreshing it through features such as blogs, ultimately helps increase your online presence. If you don’t have the time or the resources to keep up with this internally, this is where some companies choose to outsource it.

Don’t underestimate internal links

Even if you are using an XML sitemap for Google to follow, links also help direct Googlebots through your site. The more structured and integrated your internal links are, the easier it will be for Googlebots to crawl your site.

Not sure how to measure and analyse your internal linking? Don’t worry, it’s easy. In Google Webmaster Tools, select “Search Traffic” and then “Internal Links”, and look at the results. If the pages at the top are the content pages you are hoping to drive traffic to, then you’re doing great!
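If you want to run a similar check yourself, outside Google’s tools, the sketch below counts how many internal links point to each page. It assumes you already have each page’s HTML (the `site` dictionary and its example.com URLs are hypothetical), treats only same-host links as internal, ignores `#fragments`, and skips links from a page to itself; the `internal_link_counts` helper is also hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def internal_link_counts(pages):
    """Count how many internal links point at each URL.

    `pages` maps a page URL to its HTML. Links to other hosts are ignored,
    and "/contact#form" is counted as "/contact".
    """
    counts = Counter()
    for page_url, html in pages.items():
        host = urlsplit(page_url).netloc
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target, _fragment = urldefrag(urljoin(page_url, href))
            if urlsplit(target).netloc == host and target != page_url:
                counts[target] += 1
    return counts


site = {
    "https://example.com/": '<a href="/services">Services</a> '
                            '<a href="/blog/seo-tips">SEO tips</a> '
                            '<a href="https://twitter.com/example">Twitter</a>',
    "https://example.com/blog/seo-tips": '<a href="/services">Our services</a> <a href="/">Home</a>',
    "https://example.com/services": '<a href="/contact#form">Contact us</a>',
}

for url, count in internal_link_counts(site).most_common():
    print(count, url)
```

As with the report described above, the pages at the top of this list are the ones your internal links push Googlebots towards most.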

But while on-page, off-page and technical SEO are a good start, and the traditional way to go, you shouldn’t underestimate the influence social media can have on your brand. So, make sure you use these channels too, and have a strategy in place for each.

In conclusion

Google and its Googlebots exist to reward high-quality sites by showing them at the top of search results. And with so many people using the Internet today, many companies have already invested time and effort in making sure their SEO strategy follows Google’s guidelines.

You should now understand what Googlebots are, how they work and what you have to do to make sure you’re not left behind. From social media to SEO, marketing and quality content, everything is now in your hands. Not sure how to handle such a big responsibility? Get with Glide!