Robots.txt is designed to tell search engine crawlers which pages or content on your site they should not crawl; there is probably some content that you don't want others to find through a search.
For example, the file below tells every user agent (mostly search engines) not to crawl any content on the entire site.
    User-agent: *
    Disallow: /
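To see how a well-behaved crawler would interpret those two directives, here is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the host `www.example.com` are assumptions made up for illustration, and real crawlers may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser

# The same two directives as in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical paths on a placeholder host; every one is disallowed.
    for path in ("/", "/photo.php", "/about/"):
        url = "https://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that this only describes what a polite crawler does. Nothing in robots.txt stops a client that chooses to ignore it.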
However, robots.txt is publicly readable, so it can also give an attacker a starting point for probing your site: by listing the paths you want to keep out of search results, you have exposed them.
Take a look at Facebook's robots.txt:
    # Notice: Crawling Facebook is prohibited unless you have express written
    # permission. See: http:

    User-agent: baiduspider
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Bingbot
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Googlebot
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: ia_archiver
    Disallow: /
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: msnbot
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Naverbot
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: seznambot
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Slurp
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: teoma
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Yandex
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: Yeti
    Disallow: /ajax/
    Disallow: /album.php
    Disallow: /checkpoint/
    Disallow: /contact_importer/
    Disallow: /feeds/
    Disallow: /file_download.php
    Disallow: /hashtag/
    Disallow: /l.php
    Disallow: /p.php
    Disallow: /photo.php
    Disallow: /photos.php
    Disallow: /sharer/
    Disallow: /topic/

    User-agent: ia_archiver
    Allow: /about/privacy
    Allow: /full_data_use_policy
    Allow: /legal/terms
    Allow: /policy.php

    User-agent: *
    Disallow: /
So an attacker may try to access the listed paths directly, e.g. www.facebook.com/topic/ (I tried it, and it shows a "page not available" message).
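Because the file is plain text that anyone can download, collecting the listed paths takes only a few lines. The sketch below is a rough illustration of that point, not a complete robots.txt parser: `SAMPLE` is a shortened excerpt in the same shape as the file above, and the function name `disallowed_paths` is made up for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Shortened, illustrative excerpt; not the full file shown above.
SAMPLE = """\
User-agent: Googlebot
Disallow: /ajax/
Disallow: /topic/

User-agent: *
Disallow: /
"""


def disallowed_paths(robots_txt: str) -> Dict[str, List[str]]:
    """Map each user agent to the paths its group disallows."""
    result: Dict[str, List[str]] = defaultdict(list)
    agents: List[str] = []
    in_rules = False  # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new group.
                agents = []
                in_rules = False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                for agent in agents:
                    result[agent].append(value)
    return dict(result)


if __name__ == "__main__":
    for agent, paths in disallowed_paths(SAMPLE).items():
        print(agent, paths)
```

Running it against your own robots.txt is a quick way to see what the file reveals to any visitor.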
How to prevent this?
You can choose a modern web framework to develop your web application, such as Laravel, where the URL paths are routes that you define yourself rather than file names on disk, e.g.
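    <?php

    use Illuminate\Support\Facades\Route;

    // Only the exact path "foo/bar" is registered; "foo" alone is not a route.
    Route::get('foo/bar', function () {
        return 'Hello World';
    });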
When an attacker requests www.yoursite.com/foo, he or she won't get anything, because only foo/bar is registered as a route. So be careful when you design your web application.
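The idea behind explicit routes can be sketched without any framework. The snippet below is a simplified illustration in Python, not how Laravel is implemented: the `ROUTES` table and the `dispatch` function are hypothetical names, and it assumes exact-match routing with no parameters.

```python
from typing import Callable, Dict, Tuple

# Hypothetical route table: only paths listed here can be reached.
ROUTES: Dict[str, Callable[[], str]] = {
    "foo/bar": lambda: "Hello World",
}


def dispatch(path: str) -> Tuple[int, str]:
    """Return (status, body) for a request path using exact-match routing."""
    handler = ROUTES.get(path.strip("/"))
    if handler is None:
        # Nothing is registered for this path, so a partial guess gets a 404.
        return 404, "Not Found"
    return 200, handler()


if __name__ == "__main__":
    print(dispatch("/foo"))
    print(dispatch("/foo/bar"))
```

A guessed prefix such as `/foo` finds nothing because the lookup is against the registered routes, not against directories on the server.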
Update: 4 Apr 2018
You may check your robots.txt here.