Indexing & Crawling for SEO

What’s the difference between crawling and indexing and why is that important for my business?

Both crawling and indexing are vital to understanding how your website and business are displayed in search results.

Crawling refers to the process of search engine bots (crawlers or spiders) navigating through the web and following links to discover new pages. The crawling process includes understanding site content, structure and code to gather information with a view to updating the search engine index.

Indexing is the process of storing the information gathered during crawling in the search engine’s index, which determines how your pages are displayed in the SERP (search engine results page). If your website is easily understood and deemed good value by the crawler, your pages will be indexed and served accordingly against relevant user search queries.

Both these elements are crucial to your business as they directly impact how your website is displayed on search engines. If search engine crawlers can’t access your web pages, then they won’t index them, and your business won’t appear in search engines for your target search queries. This means people won’t be able to find your business on search engines.

Do crawling and indexing have a direct impact on my website revenue?

Given that crawling and indexing play a vital role in determining the visibility of your website in search engine results, there is a direct link between how your website is crawled and indexed and your revenue and sales.

If some of your core money pages or PLPs (product listing pages, for ecommerce owners) aren’t accessible to crawlers, they won’t appear in search results for queries relevant to them and subsequently won’t generate revenue from the organic channel.

We have a large website and Google data suggests that a large part of our website is not indexed, why?

There could be a number of reasons for this. It could be that these pages are very deep in your site architecture and aren’t linked to from relevant hub or category pages. It could also be the case that Google is crawling these pages but is choosing not to index them as they are deemed “low quality” from a content perspective and don’t offer the user much value from Google’s point of view.

Other technical reasons could include the use of JavaScript blocking crawlers, pages loading incredibly slowly, or deliberate deindexation using certain HTML or robots.txt directives.
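As a quick illustration of one of those HTML directives, here is a minimal Python sketch your developers could adapt to spot a “noindex” meta tag in a page’s HTML. The sample page content is made up for illustration; real-world checks would also need to cover the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    """Collects directives from <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(html):
    checker = NoindexChecker()
    checker.feed(html)
    return "noindex" in checker.directives

# Illustrative sample page that asks crawlers not to index it.
sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(sample))  # True
```

Running a check like this across a batch of URLs can quickly surface pages that are being deliberately (or accidentally) kept out of the index.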

In any case, take steps to understand the commercial value and revenue potential of the pages not being indexed. Are they money pages or lead-generation pages, or are they retired legacy blog content that offers little contemporary value? If the former, work with your marketing team to remove indexation blockers. Remember, it’s fine to leave certain sections of your site out of the index.

How do I know if Google is correctly crawling my website?

If your website is verified under the Google Search Console tool, your marketing team will get regular email notifications if there are any errors in crawling and indexing. Within the tool itself, your team can also review the “Crawl stats” report which will show a graph of crawl requests over time.

You can also ask your development team for reports on whether “Googlebot”, Google’s web crawler software, is showing up in your website’s server log files on an ongoing basis. There are a number of third-party SEO tools that can pull this data with ease and monitor it on a recurring basis.
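As a sketch of what that log check involves, the snippet below counts requests whose user agent mentions Googlebot. The log lines are made-up samples in a common Apache-style format; real logs vary, and because user agents can be spoofed, a serious audit would also verify the requesting IP addresses against Google’s published ranges.

```python
# Illustrative server access log lines (not from a real site).
sample_log = [
    '66.249.66.1 - - [10/Jan/2024] "GET /products/ HTTP/1.1" 200 '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Jan/2024] "GET /about/ HTTP/1.1" 200 '
    '"Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [10/Jan/2024] "GET /blog/ HTTP/1.1" 404 '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Keep only the lines whose user-agent string mentions Googlebot.
googlebot_hits = [line for line in sample_log if "Googlebot" in line]
print(f"Googlebot requests: {len(googlebot_hits)}")  # Googlebot requests: 2
```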

How do I measure indexing and crawling?

For product managers and marketing teams, tools such as Google Search Console are great ways to do this. The aforementioned “Crawl stats” report will give you an overview of crawl behaviour over different areas of your website. The tool’s “Page indexing” report will also give you an overview of which pages are indexed and which pages are not. There are also plenty of third-party tools that can visualise this and measure at scale on an ongoing basis.

Get your marketing team to build out a dashboard on a service such as Looker Studio that connects to crawl data from a tool such as Google Search Console. There are plenty of ways to stream this data continually and visualise crawling behaviour for your website over time. You can also tie in key commercial metrics, such as conversions and revenue, to your crawling and indexing data.
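To make the idea of tying indexation to commercial metrics concrete, here is a small Python sketch that joins two hypothetical per-URL exports: an indexation status list (such as you might export from Search Console) and revenue figures from an analytics platform. The URLs, field shapes, and numbers are all illustrative, not a real API schema.

```python
# Hypothetical exports: indexation status per URL and revenue per URL.
index_status = {"/microphones/": True, "/legacy-blog/": False, "/speakers/": True}
revenue = {"/microphones/": 12500.0, "/legacy-blog/": 0.0, "/speakers/": 8300.0}

# Pages not in the index can't earn organic revenue at all,
# while indexed pages earning nothing may need further optimisation.
not_indexed = [url for url, indexed in index_status.items() if not indexed]
indexed_no_revenue = [url for url, indexed in index_status.items()
                      if indexed and revenue.get(url, 0.0) == 0.0]

print("Not indexed:", not_indexed)
print("Indexed but no revenue:", indexed_no_revenue)
```

A dashboard built on the same join lets you prioritise fixes by revenue at stake rather than by raw page counts.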

If you’re curious about understanding whether an individual page is indexed, such as a recent company blog post or new product page, enter the URL in Google Search Console’s “URL inspection tool” to understand its indexation status. You can also use the site search operator in Google itself to see if the page is indexed. An example of this would be to enter the command site:mysite.com/product/specific-url/ into Google.

Are there any KPIs I should consider when it comes to indexing and crawling?

Every website will have a nuanced approach to how they judge success in their crawling and indexing, with ideas on what should or shouldn’t be indexed varying depending on the business. 

When it comes to tracking KPIs, you can look at areas such as the number of indexed pages and how they are generating organic traffic, conversions and subsequent revenue. If indexed pages aren’t generating either of these, then you will need to work with your marketing team to look at ways of optimising them further.

As a business owner should I care about indexing and crawling?

Yes. As covered above, crawling and indexing directly determine whether your pages can appear in search results at all. If important pages aren’t indexed, they can’t attract organic traffic, conversions or revenue, so it’s worth keeping an ongoing check on indexation health even if the day-to-day monitoring sits with your marketing team.

Do I need to worry about both Bing and Google?

Currently, Bing only occupies around 8% of the search engine market share on desktop, and only around 0.5% on mobile. Despite a lot of changes in SEO over the years, Bing hasn’t made a serious dent in the search engine market share. From a business owner’s point of view, Bing is unlikely to be a massive priority in terms of organic revenue opportunity.

That being said, with the oncoming changes around AI answers in search results, and Bing being very central to this, it’s worth keeping an eye on moving forward. If you’re curious, get your marketing team to look at your analytics stats for Bing.

If your website is getting a lot of visits via Bing, then consider setting up a Bing Webmaster Tools property and optimising for performance on Bing as well.

How can I make my website faster and will that improve conversions and sales?

There are many ways to do this and a lot will depend on your tech stack, including which CMS (content management system) or server you are on. For example, if you are on WordPress, there are a lot of ready-to-go plugins that you can install which can automate some aspects of site speed. Using a CDN (content delivery network) is another easy way to improve site speed.

A key thing is to get your development team to be mindful of resource size across your website. This includes use of images and code on your site, such as CSS and JavaScript. If there are savings to be made from superfluous or old code, or if you’re using lots of interactive elements or videos on your webpages, get them to investigate the impact this may be having on site speed.
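One concrete example of the kind of resource saving your team can investigate is text compression: CSS and JavaScript files often shrink dramatically when served compressed. The sketch below uses Python’s standard gzip module on a made-up chunk of repetitive CSS purely to illustrate the idea; in practice, most servers and CDNs can apply gzip or Brotli compression automatically.

```python
import gzip

# Made-up, repetitive CSS standing in for a real stylesheet.
css = ("body { margin: 0; padding: 0; font-family: sans-serif; } " * 50).encode("utf-8")
compressed = gzip.compress(css)

# Report the byte savings from compression.
savings = 1 - len(compressed) / len(css)
print(f"Original: {len(css)} bytes, compressed: {len(compressed)} bytes "
      f"({savings:.0%} smaller)")
```

Smaller transfer sizes mean faster loads, which benefits both crawlers and real visitors.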

Why do my pages not appear in Google?

We’ve alluded to specific reasons as to why certain sections of a site may not appear on Google, though there may be some larger aspects at play.

Your website may be very new, with little to no links pointing to it, and as such Google is taking some time to discover your site and its pages. You may also need to check whether you have submitted a sitemap to Google, and that it is up to date in terms of the current pages on your site.

You may also have specific “noindex” directives on your pages, which instruct search engines not to index them. Check with your team on the status of your robots.txt file too – this is a file which tells web crawlers which sections or pages of a site they are allowed to visit. Make sure important pages aren’t being blocked from crawlers here.
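If your team wants to sanity-check a robots.txt ruleset quickly, Python’s standard library ships a parser for exactly this. The rules and URLs below are made up for illustration (reusing the mysite.com placeholder from earlier):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt ruleset blocking two site sections for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /checkout/
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check whether Googlebot may fetch specific URLs under these rules.
print(parser.can_fetch("Googlebot", "https://mysite.com/products/"))   # True
print(parser.can_fetch("Googlebot", "https://mysite.com/checkout/"))   # False
```

Running checks like this against your live robots.txt before a release can catch an accidental block on a money page early.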

While very rare these days, you may also be the target of a manual Google penalty, applied when your website isn’t complying with Google’s guidelines. You may need to take certain steps to recover from this, as it will be having a big impact on revenue.

What tools can I use to understand my indexing and crawling?

Aside from using Google Search Console, crawling tools such as Screaming Frog, BigMetrics, Botify and Lumar can be great assets to your team’s arsenal in monitoring crawling and indexing on an ongoing basis.

Many of these tools will give you an overview of how your website is being crawled and indexed using Googlebot or their own “user agent” crawlers, and will allow you to easily troubleshoot any obvious issues.

Indexing and Crawling SEO Case Study

Indexation issues on large sites

Problem: A large audio ecommerce site was having trouble indexing a new section of its website recently launched from a staging site. Let’s call it the “Microphones” section. All pages under this section sit under a /microphones/ folder structure on the site. There are plenty of product URLs on this section that could be solid traffic and revenue generators.

Solution: It was discovered that the /microphones/ section of the site had recently been added from a staging website where the URLs were noindexed during the testing phase. These tags weren’t removed when the pages went live. Removing noindex tags on all URLs opened them up to indexation. Additionally, this new section of the site hadn’t been included in the sitemaps.

Once this was rectified, further internal links were built out to microphone product pages from the main site nav and from relevant, well-performing blog posts across the site to pass authority. Revenue generated from these product pages then improved as visibility grew.
