Before search engines can even consider ranking your content, they first need to know that it exists.
The leading search engines, such as Google, Bing, and Yahoo!, discover new content with a web crawler, sometimes called a spider. A spider systematically browses the web by following links, typically for the purpose of web indexing. As it crawls, it processes a page's body text along with its title tag, meta tags, and image alt attributes.
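To make that concrete, here is a minimal sketch of the on-page elements a crawler reads; the page content and file name are placeholders:

```html
<html>
  <head>
    <!-- The title tag and meta description are read by the crawler -->
    <title>Blue Widgets | Example Store</title>
    <meta name="description" content="Hand-made blue widgets, shipped worldwide.">
  </head>
  <body>
    <!-- Body text is processed for indexing... -->
    <h1>Blue Widgets</h1>
    <p>Our widgets are made from recycled steel.</p>
    <!-- ...and the alt attribute describes the image to the crawler -->
    <img src="blue-widget.jpg" alt="A hand-made blue widget">
  </body>
</html>
```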
You can see which of your pages Google has indexed by using the search operator “site:yourdomain.com” (e.g., site:rushmediaagency.com). You can see how often Google is crawling your pages in Google Search Console. How often a crawler visits a site depends on that site's crawl budget.
You can submit an XML sitemap to Google to ensure that all of your pages are found, especially pages that aren't reachable through other links on the web. A sitemap is a list of the URLs on your site that crawlers can use to discover and index your content. One of the easiest ways to make sure Google finds your highest-priority pages is to create a sitemap file that meets Google's standards and submit it through Google Search Console.
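For reference, a sitemap is just an XML file that follows the sitemaps.org protocol. Here is a minimal sketch with placeholder URLs and dates:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawlers to discover -->
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/services/</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```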
You can use Screaming Frog to create an XML sitemap. Screaming Frog is free if the website you are crawling has fewer than 500 pages.
If you use WordPress, you can generate an XML sitemap with the Yoast SEO plugin.
Once you've created your XML sitemap, you can submit it in Google Search Console.
Here is a great how-to article by Neil Patel on creating and submitting a sitemap.
Keep in mind that the more often your pages are crawled and indexed, the sooner changes to your content show up in search results.
That said, some things can block Google’s crawlers:
- Poor internal linking: Google relies on internal links to crawl all the pages on your site. Pages without internal links pointing to them often won't get crawled.
- Nofollowed internal links: Internal links with a nofollow attribute won't be followed by Google, so pages linked only that way may not get crawled (see the snippet after this list).
- Noindexed pages: You can exclude pages from Google's index using a noindex meta tag or HTTP header. If other pages on your site are only linked from noindexed pages, there's a chance that Google won't find them.
- Robots.txt: Robots.txt is a text file that tells Google where it can and can't go on your website. If pages are blocked in robots.txt, Google won't crawl them (see the example below).
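For reference, here is what a nofollowed link and a noindex meta tag look like in HTML; the URL and link text are placeholders:

```html
<!-- Google won't follow this internal link -->
<a href="/old-promo/" rel="nofollow">Old promo</a>

<!-- This tag keeps the page it appears on out of Google's index -->
<meta name="robots" content="noindex">
```

The HTTP header equivalent of the meta tag is `X-Robots-Tag: noindex`, which is handy for non-HTML files such as PDFs.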
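And here is a simple robots.txt sketch that blocks one directory while leaving the rest of the site crawlable; the blocked path is a placeholder, and the file itself lives at the root of your domain (yourdomain.com/robots.txt):

```
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```

Note the Sitemap line: robots.txt also supports a Sitemap directive, which is another way to point crawlers at the XML sitemap you created earlier.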