In this post, I'm going to provide an introduction to crawling and indexing, share how you can check whether your site is being crawled and indexed successfully, and explain what to do if it isn't.
Let’s get into it.
Indexing your website on search engines begins with crawling.
To show up in search results, your content first needs to be visible to search engines. It's arguably the most important piece of the SEO puzzle: if your site can't be found, there's no way you'll ever show up in the SERPs (Search Engine Results Pages).
Crawling is the discovery process in which search engines send out a team of robots (known as crawlers or spiders) to find new and updated content.
Google's crawling comes in two kinds: Discovery, where Googlebot finds pages it has never seen before, and Refresh, where it revisits pages that are already indexed to pick up changes.
Googlebot starts out by fetching a few web pages and then follows the links on those web pages to find new URLs. By hopping along this path of links, the crawler is able to find new content and add it to its index.
For more in-depth information, check out this guide by Google.
Search engines process and store information they find in an index, a huge database of all the content they’ve discovered and deem good enough to serve up to searchers.
Indexing essentially refers to adding a webpage's content to Google's index so it can be considered for rankings.
For more in-depth information, check out this guide by Google.
To check if your website is on Google you can do a “site:yoursite.com” search on Google. This will return results Google has in its index for the site specified:
The number of results Google displays (see “About XX results” above) isn’t exact, but it does give you a solid idea of which pages are indexed on your site and how they are currently showing up in search results.
If your site shows up in the results, great! You have nothing to worry about.
It is worth checking some specific pages, perhaps a key service page to make sure it’s in the index.
To do this, simply add the URL string after “site:”. For example, “site:yoursite.com/best-seller”
If your site doesn't show up, it could simply mean that your site is new and Google hasn't found it yet.
If you know your site isn't new, it probably means that your site has inadvertently blocked search engines from crawling it (which is surprisingly common!). Either way, you want to get this fixed ASAP.
Here are the top three causes stopping web pages from being crawled and indexed.
The whole website, or certain pages, can remain unseen by Google for a simple reason: its crawlers are not allowed to access them.
Without realising it, you may be blocking the page from indexing through the robots meta tag.
When Googlebot sees this directive, it will drop the page from its index (or never add it in the first place), so the page won't appear in search results.
You can detect this issue by checking whether your page's code contains a noindex directive:
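The directive in question is a robots meta tag with a noindex value; in the page's head it typically looks like this:

```html
<!-- Tells search engines not to add this page to their index -->
<meta name="robots" content="noindex">
```

If you find this tag on a page you want ranked, it needs to be removed.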
To check your page's code in Chrome, right-click and select "View Page Source".
Second, you may be blocking the pages from indexing through robots.txt.
Robots.txt is the first file on your website that crawlers look at. The most painful thing you can find there is:
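That worst-case rule set is a blanket block; in a standard robots.txt it reads:

```
User-agent: *
Disallow: /
```

The `*` applies the rule to every crawler, and `Disallow: /` blocks every path on the site.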
This means crawlers are blocked from every page on the site, so none of them can make it into Google's index.
It might happen that only certain pages or sections are blocked, for instance:
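For example, a robots.txt that blocks only a products section might contain rules like these (the path is an example):

```
User-agent: *
Disallow: /products/
```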
In this case, any page in the Products subfolder will be blocked from indexing and, therefore, none of your product descriptions will be visible in Google.
To check your robots.txt visit “yoursite.com/robots.txt”.
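If you'd rather check programmatically, Python's standard library ships a robots.txt parser. Here's a minimal sketch; the domain and paths are placeholders, and the rules are fed in directly so the example runs without a network call:

```python
from urllib.robotparser import RobotFileParser

# Parse a set of robots.txt rules and check whether specific URLs
# may be crawled. In practice you would fetch the live file with
# rp.set_url("https://yoursite.com/robots.txt") followed by rp.read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /products/",
])

# can_fetch(user_agent, url) returns True if crawling is allowed.
print(rp.can_fetch("*", "https://yoursite.com/products/widget"))  # False
print(rp.can_fetch("*", "https://yoursite.com/about"))            # True
```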
The third cause is nofollow directives. In this case, the site crawler will still index your page's content but will not follow the links. There are two types of nofollow directives:
A robots meta tag in the page's code, which means the crawler can't follow any link on that page.
A rel="nofollow" attribute on an individual link, which applies to that link only.
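Assuming standard markup, the two forms look like this (page-level in the head, link-level on an individual anchor; the URL is a placeholder):

```html
<!-- Page-level: crawlers won't follow any link on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: crawlers won't follow this specific link -->
<a href="https://example.com" rel="nofollow">Example</a>
```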
If you are seeing any of these directives, you need to instruct your developer to remove them, as they are stopping your website from appearing on Google.
Issues with meta tags and robots.txt aren’t the only things stopping your website from showing in Google.
To give your website the best chance of being crawled and indexed, make sure you do the following:
A sitemap is just like it sounds: a “map” of your site. Google and other search engines use sitemaps to find all of the pages on your site.
You can usually find yours by typing one of these URLs into your browser:
yoursite.com/sitemap.xml
yoursite.com/sitemap_index.xml
If it's not there, go to yoursite.com/robots.txt, where its location is usually listed on a "Sitemap:" line.
A sitemap helps ensure that all of your important pages are being crawled and indexed.
If you don’t have an XML sitemap, it’s essential to create one! If you are not sure how to do this, speak to your developer or get in touch with me.
Once you have found or created your sitemap, you need to submit it to Google via Search Console: open the Sitemaps report, paste in your sitemap URL and click Submit.
Both visitors and search engines need to be able to navigate your site easily and intuitively, which is why it’s important to create a logical hierarchy for your content.
The easiest way to do this is to sketch out a mind map. Each of the branches in your mind map will become internal links, which are links from one page on a website to another.
This site structure should be used as your menu/navigation. This way Google can easily crawl your website.
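As a hypothetical example, a simple service business might sketch out a hierarchy like this:

```
Home
├── Services
│   ├── SEO
│   └── Web Design
├── Blog
└── About
```

Each branch becomes a section of the navigation, and each sub-branch an internal link within that section.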
An internal link is any link from one page on your website to another page on the same website. Both your users and search engines use links to find content: users use them to navigate to the content they want, while search engines use them to discover and crawl your pages. A page with no links pointing to it is unlikely to be found at all.
There are several types of internal links. In addition to links on your homepage, menu, post, etc, you can also add links within your content. We call those contextual links. Contextual links point your users to interesting and related content.
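In the HTML, a contextual link is just an ordinary anchor inside your body copy; the URL and anchor text here are placeholders:

```html
<p>For more detail, see our <a href="/guide-to-sitemaps">guide to XML sitemaps</a>.</p>
```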
Internal links are crucial for UX and SEO for a few reasons:
They help visitors navigate your site and find related content.
They help search engines discover new pages and understand how your site is structured.
They pass authority between pages, signalling which pages matter most.
You should now have a good understanding of crawling and indexing; what it is and why it is important.
To ensure your website is in the best position to be crawled and indexed, make sure there are no directives blocking Google from crawling, add an XML sitemap and submit it to Google Search Console, and create a logical website structure with internal links to help Google find the content you want indexed.