rewrite rule medo

From: Peter (BOUGHTONP) 11 Feb 2017 01:42
To: CHYRON (DSMITHHFX) 2 of 9
Stop using conditions you don't need.
RewriteRule ^.*/mobile/ - [L,PT]
RewriteRule ^.*/index.html$ - [L,PT]
RewriteRule /([a-z]+)-([^-]+)\.html$ /#$1/$2 [L,R=301,NE,QSA]
RewriteRule /([a-z]+)\.html$ /#$1 [L,R=301,NE,QSA]
From: CHYRON (DSMITHHFX) 11 Feb 2017 11:12
To: Peter (BOUGHTONP) 3 of 9
Nice! I'll test it on Monday. Thanks
From: Peter (BOUGHTONP) 11 Feb 2017 13:51
To: CHYRON (DSMITHHFX) 4 of 9
If I had been more awake when I wrote that, I would have pointed out that the RewriteRule pattern is checked before the associated RewriteCond conditions, so in addition to being simpler and having less code duplication, it reduces the number of unnecessary checks too.

Possibly a clearer way of explaining it is that RewriteCond is not like an IF statement, but rather it's additional filtering checked only if the RewriteRule pattern is a match (but before the replacement/rewriting occurs).
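
A minimal sketch of that ordering, assuming an .htaccess file in the document root. The old/ and new/ paths and the "testbot" string are made up for illustration and are not from the rules above:

```apache
RewriteEngine On

# Checked SECOND: only evaluated for requests whose path has already
# matched the RewriteRule pattern below.
RewriteCond %{HTTP_USER_AGENT} !testbot [NC]

# Checked FIRST: if ^old/(.+)\.html$ does not match the request path,
# the RewriteCond above is never evaluated and processing moves on to
# the next rule. The substitution is applied only if both pass.
RewriteRule ^old/(.+)\.html$ /new/$1 [R=302,L]
```

One side effect of this order is that a RewriteCond test string can use the $1, $2 backreferences captured by the RewriteRule pattern that follows it, since the pattern has already been matched by the time the condition runs.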

From: CHYRON (DSMITHHFX) 13 Feb 2017 16:59
To: Peter (BOUGHTONP) 5 of 9
Worked like a charm!

This part is confusing to me:
Quote: 
RewriteCond is not like an IF statement, but rather it's additional filtering checked only if the RewriteRule pattern is a match
The Apache docs put it this way:
Code: 
...RewriteCond directives can be used to restrict the types of requests that will be subject to the following RewriteRule. 


Now I *might* need RewriteCond's, to filter out search engines from the rewrite. So far this ain't doin nuthin':
Quote: 
RewriteCond %{HTTP_USER_AGENT} !^(google|yahoo|bing) [NC]
RewriteCond %{HTTP_REFERER} !^(google|yahoo|bing) [NC]

From: Peter (BOUGHTONP) 13 Feb 2017 22:37
To: CHYRON (DSMITHHFX) 6 of 9
The Apache docs would be more accurate if they said "...subject to rewriting by the following RewriteRule".

It is disappointing that the docs don't mention it at all, since it's not obvious, but anyhow the easiest way to prove it is the source: apply_rewrite_rule in mod_rewrite.c

RewriteRule matching is preceded by this comment...

    /* Try to match the URI against the RewriteRule pattern
     * and exit immediately if it didn't apply.
     */

And after that we have this...

    /* Ok, we already know the pattern has matched, but we now
     * additionally have to check for all existing preconditions
     * (RewriteCond) which have to be also true. We do this at
     * this very late stage to avoid unnecessary checks which
     * would slow down the rewriting engine.
     */     

Curious to see performance given as the reason, since a simple string check on a header could arguably be cheaper than some of the convoluted regexes that can occur in rule patterns. Having the option to choose when the condition is applied would allow the best performance.

From: Peter (BOUGHTONP) 13 Feb 2017 22:46
To: CHYRON (DSMITHHFX) 7 of 9
As for the search engine stuff, the caret (^) anchors your match to the start of the string, but the Googlebot user agent is "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", so remove the caret. You shouldn't need the parentheses either: the ! is a prefix that negates the whole pattern, so try just "RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]"
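
To spell out the difference, here is a sketch of the two variants side by side, with the anchored one commented out so that only one condition is active. The comments describe what each does for the Googlebot user agent quoted above; other crawlers are assumed to behave similarly but are not checked here:

```apache
# Anchored: "^(google|yahoo|bing)" only matches a user agent that STARTS
# with one of those words. Googlebot's starts with "Mozilla/5.0", so the
# pattern never matches, the negated condition is always true, and the
# following RewriteRule is applied to Googlebot like any other client.
# RewriteCond %{HTTP_USER_AGENT} !^(google|yahoo|bing) [NC]

# Unanchored: matches "google" anywhere in the string (case-insensitive
# because of [NC]), so the negated condition is false for Googlebot and
# the following RewriteRule is skipped for it.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
```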

Is it not simpler to use robots.txt to block them?

From: CHYRON (DSMITHHFX) 14 Feb 2017 01:15
To: Peter (BOUGHTONP) 8 of 9
The object is to allow search engines to crawl unrewritten *.html urls (except index, and those in mobile/), and to rewrite human-submitted urls (from search results) with *.html suffix to the hashed urls -- I've got it all working with javascript redirects, but I think intercepting it before anything gets served would be preferable. Suffice to say it's become an academic exercise as the client has decided they don't want the app to be searchable after all. Now I just want to see if I can get the htaccess method to work.
EDITED: 14 Feb 2017 01:17 by DSMITHHFX
From: CHYRON (DSMITHHFX) 17 Feb 2017 19:43
To: ALL 9 of 9
So here's what ended up testing out on two different Apache 2.2 servers.

OS X development server on powermac G5 (Apache installed through macports), localhost:8081 pointed at virtualhost:
Code: 
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/
RewriteCond %{REQUEST_FILENAME} !^/index.html$
RewriteRule ^([a-z]+)-(.+)\.html$ /#$1/$2 [NE,R=301,L]

RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
RewriteRule ^([a-z]+)\.html$ /#$1 [NE,R=301,L]
Staging server on Ubuntu 14.04 ppc (powermac G4), hosted in an "seo2" subdirectory:
Code: 
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
RewriteCond %{REQUEST_URI} !^.*/index.html$
RewriteRule ([a-z]+)-(.+)\.html$ /seo2/#$1/$2 [NE,R,L]

RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
RewriteRule ([a-z]+)\.html$ /seo2/#$1 [NE,R,L]
I haven't found any good online htaccess documentation or tutorials (I relied a lot on Stack Overflow), so these evolved through a lot of trial and (mostly) error.

htaccess seemed pretty erratic and unreliable on the staging server with subdirectory, with frequent browser cache-clearing required or sometimes just waiting a few hours.