Recently a problem was discovered on several web sites. I must say that the problem is not TYPO3-specific: it will happen on any web site that uses the <base> tag. We will solve it for TYPO3, of course.

The problem happens as follows. Some user agents seem to ignore the <base> tag and request invalid URLs. For example, while viewing the page at http://domain.tld/hello/world/, such an agent may see a reference to the image at typo3temp/pics/12345.gif. If the agent ignores the <base> tag, it will request http://domain.tld/hello/world/typo3temp/pics/12345.gif. The result of such requests is obvious: the page is not found and a 404 error is returned.
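To see the difference concretely, here is a minimal Python sketch that resolves the relative reference the way a user agent would, once honoring the <base> tag and once ignoring it. It uses the standard-library urllib.parse.urljoin; the base href value http://domain.tld/ is an assumption (what config.baseURL would typically produce for this site), and resolve() is a hypothetical helper.

```python
from urllib.parse import urljoin

PAGE_URL = "http://domain.tld/hello/world/"
BASE_HREF = "http://domain.tld/"   # assumed <base href>, e.g. from config.baseURL
REF = "typo3temp/pics/12345.gif"   # relative reference found in the page


def resolve(ref, page_url, base_href=None):
    """Resolve ref the way a user agent would (hypothetical helper).

    A compliant agent resolves against the <base href> (itself resolved
    against the page URL); an agent that ignores <base> falls back to
    the URL of the page it is viewing.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, ref)


print(resolve(REF, PAGE_URL, BASE_HREF))  # -> http://domain.tld/typo3temp/pics/12345.gif
print(resolve(REF, PAGE_URL))             # -> http://domain.tld/hello/world/typo3temp/pics/12345.gif
```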

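Now look closely at the URL. It is a TYPO3 URL, which means that TYPO3's 404 handler will be invoked. If the 404 page itself contains a relative link to typo3temp/pics/12345.gif, the same agent resolves it against the new URL, requests an even longer bogus URL, gets another 404 page, and so on: the process becomes recursive. This causes a huge amount of useless traffic for the web site.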
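The loop can be reproduced with the same standard-library urljoin. This sketch assumes that every 404 page contains the same relative reference typo3temp/pics/12345.gif and that the agent follows it each time; follow_ignoring_base() is a hypothetical helper, not part of TYPO3.

```python
from urllib.parse import urljoin


def follow_ignoring_base(page_url, ref, steps):
    """Return the URLs an agent that ignores <base> would request (hypothetical).

    Assumes each response (a 404 page) again contains the same relative
    reference, so every request is resolved against the previous bogus URL.
    """
    requested = []
    url = page_url
    for _ in range(steps):
        url = urljoin(url, ref)  # resolve against the current (bogus) URL
        requested.append(url)
    return requested


for url in follow_ignoring_base("http://domain.tld/hello/world/",
                                "typo3temp/pics/12345.gif", 3):
    print(url)
```

Each request lands on a new, longer URL, and each one is answered by TYPO3's 404 handler.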

The problem is pretty serious and already affects many web sites. How do we solve it?

If the web site does not link across domains (a new feature of TYPO3 4.2), the solution is pretty easy:

config.absRefPrefix = /
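With the prefix set to /, references start with a slash (for example /typo3temp/pics/12345.gif). This Python sketch (standard-library urljoin; the page URLs are made-up examples, and resolved_targets() is a hypothetical helper) shows why that helps: a root-relative reference resolves to the same URL from every page, even when the agent ignores <base>, while the original relative reference does not.

```python
from urllib.parse import urljoin

# Example pages an agent might be on (made up for illustration).
PAGES = [
    "http://domain.tld/",
    "http://domain.tld/hello/world/",
    "http://domain.tld/hello/world/typo3temp/pics/12345.gif",
]


def resolved_targets(ref, pages):
    """Set of URLs that ref resolves to when the <base> tag is ignored."""
    return {urljoin(page, ref) for page in pages}


print(resolved_targets("/typo3temp/pics/12345.gif", PAGES))  # one target
print(resolved_targets("typo3temp/pics/12345.gif", PAGES))   # three targets
```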

Setting config.absRefPrefix makes all links absolute (they start with a slash), so there is no need for config.baseURL at all. However, this will not work with links across domains, so this solution is not universal. In addition, the TYPO3 documentation does not recommend using config.absRefPrefix because it is not applied consistently across the system. So, is there a better solution?

Yes, there is. The following piece of code in .htaccess will solve the problem:

RewriteCond %{REQUEST_URI} ^.+(/(uploads|(typo3(conf|temp)))/.*)
RewriteRule ^.*$ %1 [L,R=301]

This code checks whether there is anything in the URL before one of the top-level TYPO3 directories (uploads, typo3conf or typo3temp). If there is, it redirects with a 301 code (permanent redirect) to the proper place, so no 404 happens at all. This solution works for both single- and multi-domain setups, and it does not invoke TYPO3/PHP at all. So it is better than any TYPO3-based solution.
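To check what the condition captures, here is a minimal Python sketch using the same regular expression. Python's re module stands in for Apache's PCRE engine (the two agree on this simple pattern); redirect_target() is a hypothetical helper that returns the path %1 would hold, or None when the rule does not fire.

```python
import re

# The pattern from the RewriteCond above; group 1 is what %1 refers to
# in the RewriteRule substitution.
COND = re.compile(r"^.+(/(uploads|(typo3(conf|temp)))/.*)")


def redirect_target(request_uri):
    """Return the redirect path for request_uri, or None (hypothetical helper)."""
    match = COND.match(request_uri)
    return match.group(1) if match else None


print(redirect_target("/hello/world/typo3temp/pics/12345.gif"))
print(redirect_target("/typo3temp/pics/12345.gif"))  # legitimate URL: None
print(redirect_target("/hello/world/typo3temp/pics/typo3temp/pics/12345.gif"))
```

Because .+ is greedy, the capture starts at the last /uploads/, /typo3conf/ or /typo3temp/ segment, so even the deeply nested URLs produced by the loop collapse to /typo3temp/pics/12345.gif in a single redirect. A legitimate URL is left alone because .+ needs at least one character before the slash that starts the directory name, and a URL beginning with /typo3temp/ has none.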

Important: for this to work, the code above must be placed before this line in .htaccess:

RewriteRule ^(typo3|typo3temp|typo3conf|t3lib|tslib|fileadmin|uploads|showpic\.php)/ - [L]
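Why the position matters can be shown with a simplified model of mod_rewrite's top-to-bottom processing, where the first matching rule with [L] ends the round. This Python sketch is assumption-laden: it ignores the -f/-d/-l file checks, assumes the requested file does not exist, and assumes, as in the typical TYPO3 4.x _.htaccess used with RealURL, that the rules after the line above send every remaining request to index.php. All function names are hypothetical.

```python
import re

# Simplified model: rules are tried top to bottom and the first one that
# returns a result wins, mimicking [L]. File-existence checks are ignored;
# every request is assumed to point at a file that does not exist.
REDIRECT_COND = re.compile(r"^.+(/(uploads|(typo3(conf|temp)))/.*)")
PASSTHROUGH = re.compile(
    r"^(typo3|typo3temp|typo3conf|t3lib|tslib|fileadmin|uploads|showpic\.php)/")


def redirect_rule(uri):
    match = REDIRECT_COND.match(uri)
    return ("301", match.group(1)) if match else None


def passthrough_rule(uri):
    # In .htaccess context the rule pattern sees the path without the leading slash.
    return ("as-is", uri) if PASSTHROUGH.match(uri.lstrip("/")) else None


def catch_all_rule(uri):
    # Assumed catch-all below the pass-through line: hand the request to index.php.
    return ("index.php", uri)


def process(uri, rules):
    for rule in rules:
        result = rule(uri)
        if result is not None:
            return result
    return ("as-is", uri)


GOOD_ORDER = [redirect_rule, passthrough_rule, catch_all_rule]
BAD_ORDER = [passthrough_rule, catch_all_rule, redirect_rule]

bogus = "/hello/world/typo3temp/pics/12345.gif"
print(process(bogus, GOOD_ORDER))  # ('301', '/typo3temp/pics/12345.gif')
print(process(bogus, BAD_ORDER))   # ('index.php', '/hello/world/typo3temp/pics/12345.gif')
```

With the redirect first, the bogus path gets a 301; placed below the catch-all, the redirect is never reached in this model and the request goes to TYPO3's 404 handler again.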

I used Komodo IDE and its excellent Rx Toolkit to prepare and test the regular expression for this article.

14 Comments

  1. on Monday, 26-05-08 14:01 Michiel Roos
    Great!

    :-)

  2. on Monday, 26-05-08 14:04 ben van 't ende
    Well, you saved us just in time! We were experiencing a load of 33. After applying the rewrite rule, the load dropped to 1. I am still wondering, however, what sparked this. It seems to be some kind of virus. If that is the case, it does not seem likely to me that the purpose of this virus is to bring a server down.

    gRTz and tHNx

    ben
  3. on Monday, 26-05-08 16:21 Olivier Dobberkau
    Hi Dimi.

    We had the same problems at the beginning of May. There is a bug in the 404 handling of TYPO3.

    http://bugs.typo3.org/view.php?id=8457
    http://bugs.typo3.org/view.php?id=8343

    We managed to get hold of it with Apache rewrite rules:

    # '%{REQUEST_FILENAME}' part.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-l
    RewriteCond %{REQUEST_URI} ^.*((/[^./]*)|(\.html))$
    # add further file formats like this: (/|\.(html|xml))


    greetings,

    olivier
  4. on Monday, 26-05-08 19:24 Dmitry Dulepov
    @Ben:

    Well, I think it is one of those cases where doing nothing pays off a lot. I have long had a theory that thinking things over in the background gives much better results than rushing into implementation. Check "The parable of two programmers" :)

    @Olivier:
    I do not see a bug in the 404 implementation here. It is clearly wrong behavior of some user agents. And I bet most of them have "FunWebProducts" in the user agent string. The second bug report says that the 404 code is not sent. This can be either a real bug or a misconfiguration; it needs to be checked.
    Btw, how does your RewriteCond work? It looks like it rewrites all virtual html files to index.php. But it does not rewrite js/css/gif/png and other files in directories, which means you still get many wrong 404 requests.
  5. on Wednesday, 28-05-08 09:42 Jonas
    Hi Dmitry

    Where is linking across domains described? I couldn't find it in the 4.2 release notes, and I couldn't see it in the ChangeLog either (at first glance...).

    Is there a hint somewhere about how this cross-domain linking works?

    Best Regards,
    Jonas
  6. on Thursday, 29-05-08 15:16 Nathan
    I had the same problem this year, with over 400,000 views of our 404 page, most of them recursive.

    One solution, especially if your 404 page does not have any special logic in it, is to create a static 404 page with absolute URLs for all images and links.
  7. on Thursday, 29-05-08 20:05 Christopher
    Hi Dmitry,

    Judging from the logs of the site where I've seen this problem, it looks like the RewriteCond line should include 'fileadmin' as well (at least if there are links into this from any pages):

    RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(conf|temp)))/.*)
  8. on Wednesday, 11-06-08 11:34 Benni
    Hey Dmitry,

    maybe we should update the _.htaccess file that ships with the TYPO3 packages for that?
    What do you think?
  9. on Wednesday, 25-06-08 09:58 Michiel Roos
    Hi Dmitry,

    I just saw a RealURL error log entry that requested prototype.js:
    http://typofree.org/articles/optimizing-typo3-backend-responsiveness/typo3/contrib/prototype/prototy[..]
    I think that the typo3 directory should be included too, by rephrasing:
    RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(conf|temp)))/.*)

    Into:
    RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(|conf|temp)))/.*)

    Note the additional pipe in front of conf.
  10. on Wednesday, 25-06-08 10:49 Michiel Roos
    Ok, maybe that was not a good idea.

    Because this: /home/category/typo3/

    will then redirect to /typo3/

    So this 404 problem is still not completely solved for bogus calls to files inside /typo3/contrib.
  11. on Tuesday, 01-07-08 13:13 Michiel Roos
    Ok, one more comment . . .

    You may have to add an init variable to your RealURL configuration to get the absRefPrefix into all your links:

    'reapplyAbsRefPrefix' => 1

    Realurl loses the prefix by default.
  12. on Wednesday, 09-07-08 11:27 Michiel Roos
    Owk, I promise this is the last one:

    AVG seems to be the culprit:

    http://it.slashdot.org/article.pl?sid=08/07/03/1411254

    Block it using:

    #Here we assume certain MSIE 6.0 agents are from linkscanner
    #redirect these requests back to avg in the hope they'll see their silliness
    RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1; SV1.$" [OR]
    RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1;1813.$"
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{HTTP:Accept-Encoding} ^$
    RewriteRule ^.* http://www.avg.com/?LinkScannerSucks [R=307,L]
  13. on Monday, 22-09-08 05:48 Tanya Powell
    Hm,
    the solution is *very* effective.

    I adapted the condition to:
    RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|typo3conf|typo3temp)/.*)

    which sorts out the /page/about/typo3 problem and my problem with typo3temp.

    I hope I understand correctly that the rewrite rule means that incoming requests of /any/path/(fileadmin|uploads|typo3conf|typo3temp)/ will be rewritten to just (fileadmin|uploads|typo3conf|typo3temp)/

    If that is true, here is what I think happens:

    1. The request URI contains /nice/readable/path/
    2. A reference in the HTML output is missing a leading slash (like typo3temp/stylesheet*.css)
    3. The browser (doing its job correctly) requests /nice/readable/path/typo3temp/stylesheet*.css
    4. The web server and TYPO3 (correctly) state that the file could not be found.

    The rewrite solution comes in between 3. and 4., correct?

    Which is good when you can't find a way to make TYPO3 add a leading slash to the typo3temp references, which I can't. It would be even more brilliant if RealURL were built in... I'm dreaming. Thanx for the Rewrite Rule!

    Tanya
  14. on Tuesday, 07-10-08 17:09 Krystian Szymukowicz
    Can you check whether the "rename" and "replace" file functions in the DAM extension still work after applying these changes to your .htaccess?

    In my installations they do not.