Question About How Google Crawls Websites (Hidden Pages)

Can Google find, crawl and index a page that is hidden (meaning there's no link to it from the homepage) and has no internal links pointing to it?

The page in question links to a basic page of the site that is visible (meaning you can go to the homepage and click a link to get to it).

It makes sense that when Google crawls a website, it starts at the homepage and follows the links down.

My question: can Google find pages that work in reverse order, linking up to main pages, the same way it finds pages linked down from basic to specific pages?
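To make the "reverse order" question concrete, here is a minimal sketch of link-following discovery over a made-up five-page site. The page paths and the `crawl` function are hypothetical, and real crawlers are far more complex. The point it illustrates is that links only lead one way: a page that links up to `/about` is not reachable from the homepage unless something else points to it.

```python
from collections import deque

# Hypothetical link graph for a small site: each page maps to the pages it links to.
# "/hidden" has no incoming links, but it does link up to "/about".
SITE_LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/widget"],
    "/products/widget": ["/products"],
    "/hidden": ["/about"],
}


def crawl(seeds):
    """Breadth-first crawl that only follows links forward from the seed URLs."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in SITE_LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Starting from the homepage only, the hidden page is never reached:
# its link to "/about" points away from it, not toward it.
from_home = crawl(["/"])
print("/hidden" in from_home)  # False

# If "/hidden" becomes known some other way (a backlink from another site,
# a sitemap, a URL shared publicly), it becomes an extra seed and gets crawled.
with_backlink = crawl(["/", "/hidden"])
print("/hidden" in with_backlink)  # True
```

In this toy model the outbound link from the hidden page changes nothing about its own discoverability; only an inbound path or an outside source of the URL does.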
 
A web spider may find pages on your site that have no internal links, either through backlinks from other sites or through predictive spidering. One common way to keep crawlers away from such a page is to disallow it in your robots.txt. However, this often lets people discover your “hidden” pages simply by reading your robots.txt. Another way to tell crawlers not to index a specific page, should they find it, is to put a noindex meta tag in the head element of that page. More information can be found in the Wikipedia article "noindex".
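As a rough illustration, this sketch uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler applies a robots.txt rule, and why the same rule advertises the path to anyone who reads the file. The `example.com` URLs and the `/hidden-page.html` path are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to keep a "hidden" page away from crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hidden-page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks the rules before fetching a URL.
print(parser.can_fetch("Googlebot", "https://example.com/hidden-page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/about.html"))        # True

# The downside: robots.txt is public, so anyone can read the "hidden" path.
disallowed = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]
print(disallowed)  # ['/hidden-page.html']
```

The rule only works for crawlers that choose to check it; nothing in robots.txt physically prevents a request.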
 
Yep, technically Google can find, crawl and index just about any page, regardless of incoming links.

Thankfully, Google, Yahoo and Bing will respect a noindex request, unlike some other search bots.

To put a noindex tag on a specific page, add this to the page's <head> section. The following is a noindex request to all bots:

<meta name="robots" content="noindex">
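For illustration, here is a minimal sketch of how a compliant crawler might check a fetched page for that tag, using Python's standard `html.parser`. The `RobotsMetaParser` class and `is_noindex` helper are hypothetical names, and the sketch only looks at the meta tag shown above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content can hold several comma-separated directives, e.g. "noindex, nofollow".
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = """<html><head>
<meta name="robots" content="noindex">
<title>Hidden page</title>
</head><body>...</body></html>"""

print(is_noindex(page))                                       # True
print(is_noindex("<html><head></head><body></body></html>"))  # False
```

A crawler can only honor this after it has fetched the page, so the tag says nothing about whether the page gets found in the first place.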

You can also disallow your specific page in a robots.txt file. I'd link you to a few good tutorials, but I'm not allowed to do that yet. SEOBook has a decent one.
 
Google can "find" a page in many other ways besides spidering your site.

Using noindex will help, but it won't stop search engines that ignore the request, or spammers who scrape your content and republish it elsewhere.
 
Google can and will index any URL it likes.

Case in point: Google routinely indexes WordPress folders such as wp-content, plugin folders, etc., even if you block them in robots.txt.
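One likely explanation is that discovering a URL and crawling it are separate steps: a Disallow rule stops a compliant crawler from fetching a page, but it doesn't erase a URL the crawler already learned about from a link. Here is a toy model of that distinction using Python's `urllib.robotparser`. The robots.txt rule, the URLs and the status labels are all made up, and this is not how Google's index actually works internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking the WordPress content folder.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /wp-content/"])

# Hypothetical URLs a crawler has discovered from links (on this site or elsewhere).
discovered = [
    "https://example.com/",
    "https://example.com/wp-content/uploads/report.pdf",
]

index = {}
for url in discovered:
    if rules.can_fetch("Googlebot", url):
        # Allowed: the page is fetched, so its content (and any noindex tag) can be read.
        index[url] = "crawled"
    else:
        # Blocked: the content is never fetched, but the URL itself is already known
        # and could still be listed without a description.
        index[url] = "known, not crawled"

for url, status in index.items():
    print(f"{status:20} {url}")
```

A side effect worth noting: because a blocked page is never fetched, a noindex tag on it is never seen either, so a page generally has to stay crawlable for its noindex request to be read.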
 