rewrite rule medo

From: Peter (BOUGHTONP) 13 Feb 2017 22:37
To: CHYRON (DSMITHHFX) 6 of 9
The Apache docs would be more accurate if they said "...subject to rewriting by the following RewriteRule".

It is disappointing that the docs don't mention it at all, since it's not obvious. Anyhow, the easiest way to prove it is the source: apply_rewrite_rule in mod_rewrite.c

RewriteRule matching is preceded by this comment...

    /* Try to match the URI against the RewriteRule pattern
     * and exit immediately if it didn't apply.
     */

And after that we have this...

    /* Ok, we already know the pattern has matched, but we now
     * additionally have to check for all existing preconditions
     * (RewriteCond) which have to be also true. We do this at
     * this very late stage to avoid unnecessary checks which
     * would slow down the rewriting engine.
     */

Curious to see performance given as the reason, since simple string checks on headers could arguably be cheaper than the convoluted regexes that can occur. Having the option to choose when the conditions are applied would allow the best performance.
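
The ordering can also be seen from the configuration side. As a minimal sketch (the "pages" directory is a hypothetical name, not something from this thread): a RewriteCond may refer to $1 from the RewriteRule that follows it, which only makes sense if the rule's pattern has already been matched when the condition is evaluated.

```
RewriteEngine On

# $1 is captured by the RewriteRule pattern on the line below, so the
# pattern has to be matched before this condition can be evaluated.
# "pages" is a hypothetical directory used only for illustration.
RewriteCond %{DOCUMENT_ROOT}/pages/$1.html -f
RewriteRule ^([a-z]+)$ /pages/$1.html [L]
```

If the pattern does not match the request path, the -f file check is never run at all.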

From: Peter (BOUGHTONP) 13 Feb 2017 22:46
To: CHYRON (DSMITHHFX) 7 of 9
As for the search engine stuff, the caret (^) is anchoring your match to the start of the string, but the Googlebot useragent is "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", so remove the caret. Also, you shouldn't need the parentheses: the ! is a prefix that negates the whole pattern, so try just "RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]"
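
To spell out the difference, here is a small side-by-side sketch. The anchored line is only an assumed form of the original condition, since the original isn't quoted in this part of the thread.

```
# Anchored (assumed form of the original): Googlebot's User-Agent begins
# with "Mozilla/5.0", so ^(google|yahoo|bing) cannot match it. The negated
# condition is therefore true for Googlebot and the rule still fires.
#RewriteCond %{HTTP_USER_AGENT} !^(google|yahoo|bing) [NC]

# Unanchored: "google" is found inside "Googlebot/2.1" (case-insensitive
# because of NC), so the negated condition is false and the rule is skipped.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
```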

Is it not simpler to use robots.txt to block them?

From: CHYRON (DSMITHHFX) 14 Feb 2017 01:15
To: Peter (BOUGHTONP) 8 of 9
The object is to allow search engines to crawl unrewritten *.html urls (except index, and those in mobile/), and to rewrite human-submitted urls (from search results) with a *.html suffix to the hashed urls. I've got it all working with JavaScript redirects, but I think intercepting it before anything gets served would be preferable. Suffice to say it's become an academic exercise, as the client has decided they don't want the app to be searchable after all. Now I just want to see if I can get the htaccess method to work.
EDITED: 14 Feb 2017 01:17 by DSMITHHFX

From: CHYRON (DSMITHHFX) 17 Feb 2017 19:43
To: ALL 9 of 9
So here's what ended up testing out on two different Apache 2.2 servers.

OS X development server on a Power Mac G5 (Apache installed through MacPorts), localhost:8081 pointed at a virtualhost:
Code: 
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/
RewriteCond %{REQUEST_FILENAME} !^/index.html$
RewriteRule ^([a-z]+)-(.+)\.html$ /#$1/$2 [NE,R=301,L]

RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
RewriteRule ^([a-z]+)\.html$ /#$1 [NE,R=301,L]
Staging server on Ubuntu 14.04 ppc (powermac G4), hosted in an "seo2" subdirectory:
Code: 
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
RewriteCond %{REQUEST_URI} !^.*/index.html$
RewriteRule ([a-z]+)-(.+)\.html$ /seo2/#$1/$2 [NE,R,L]

RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
RewriteRule ([a-z]+)\.html$ /seo2/#$1 [NE,R,L]
I've not found any good online htaccess documentation or tutorials (relied a lot on stackoverflow), so these evolved through a lot of trial and (mostly) error.
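
Two details in the rules above may be worth a second look. Browsers do not send the fragment (the part after #) to the server, so for ordinary requests %{REQUEST_URI} contains no "#" and the negated "/#[a-z]+/" conditions are always true. Also, [.*] is a character class matching a single literal "." or "*", not "any characters". As a hedged sketch only (not tested on either server), the second staging rule could probably be reduced to:

```
RewriteEngine On
RewriteBase /

# Same user-agent and referer checks as in the staging rules above.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
# Unanchored form of the /mobile/ check.
RewriteCond %{REQUEST_URI} !/mobile/
# The "#" condition is dropped: it never excludes an ordinary request.
RewriteRule ([a-z]+)\.html$ /seo2/#$1 [NE,R,L]
```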

htaccess seemed pretty erratic and unreliable on the staging server with subdirectory, with frequent browser cache-clearing required or sometimes just waiting a few hours.
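
One possible contributor, offered as a guess rather than a diagnosis: a 301 response is cacheable by default, so a browser that has already seen an R=301 redirect can replay it without contacting the server, which looks like the rules being ignored. The staging rules shown use plain [R] (a 302), but if earlier iterations used R=301 this could explain some of it. A sketch of the development rule with a temporary redirect for use while testing:

```
# Temporary (302) redirect while the rules are still changing;
# switch back to R=301 once they are final.
RewriteRule ^([a-z]+)-(.+)\.html$ /#$1/$2 [NE,R=302,L]
```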