Recently a problem was discovered with several web sites. I must say that the problem is not TYPO3-specific; it will happen to any web site that uses the <base> tag. We will solve it for TYPO3, of course.
The problem happens as follows. Some user agents seem to ignore the <base> tag and request invalid URLs. For example, while viewing the page at http://domain.tld/hello/world/ such an agent may see a reference to the image at typo3temp/pics/12345.gif. If the user agent ignores the <base> tag, it will request http://domain.tld/hello/world/typo3temp/pics/12345.gif. The result of such requests is obvious: page not found and a 404 error.
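Now look closely at the URL. No such file exists, so the request is handled by TYPO3, which means that TYPO3's 404 handler will be invoked. Now, if the 404 page contains a link to typo3temp/pics/12345.gif, the process becomes recursive: the same agent will next request http://domain.tld/hello/world/typo3temp/pics/typo3temp/pics/12345.gif, and so on. This causes a huge amount of useless traffic for the web site.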
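The effect can be reproduced with Python's standard `urllib.parse.urljoin`, which resolves a relative reference the way a browser does. This is a minimal sketch: `BASE` stands in for an assumed `config.baseURL` of http://domain.tld/, passing `None` as the base models an agent that ignores <base>, and each step of the hypothetical `crawl_chain` helper assumes the 404 page again contains the same relative reference.

```python
from typing import List, Optional
from urllib.parse import urljoin

# Example values from the scenario above; BASE is an assumed config.baseURL.
PAGE = "http://domain.tld/hello/world/"
BASE = "http://domain.tld/"
REF = "typo3temp/pics/12345.gif"


def resolve(page_url: str, ref: str, base_href: Optional[str]) -> str:
    """Resolve ref like a browser; base_href=None models an agent ignoring <base>."""
    return urljoin(base_href or page_url, ref)


def crawl_chain(start_url: str, ref: str, steps: int) -> List[str]:
    """Follow the same relative reference from each (404) page, ignoring <base>."""
    urls = [start_url]
    for _ in range(steps):
        urls.append(resolve(urls[-1], ref, None))
    return urls


if __name__ == "__main__":
    print(resolve(PAGE, REF, BASE))  # agent honours <base>
    print(resolve(PAGE, REF, None))  # agent ignores <base>
    for url in crawl_chain(resolve(PAGE, REF, None), REF, 2):
        print(url)
```

Each URL in the chain is one level deeper than the previous one, and every one of them ends up in the 404 handler.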
The problem is serious and already affects many web sites. How can it be solved?
If the web site does not use linking across domains (a new feature of TYPO3 4.2), the solution is pretty easy:
config.absRefPrefix = /
This will make all links absolute (starting with /), so there will be no need for config.baseURL at all. However, this will not work with links across domains, so this solution is not universal. In addition, the TYPO3 documentation does not recommend using config.absRefPrefix because it is not applied consistently across the system. So, is there a better solution?
Yes, there is. The following piece of code in .htaccess will solve the problem:
RewriteCond %{REQUEST_URI} ^.+(/(uploads|(typo3(conf|temp)))/.*)
RewriteRule ^.*$ %1 [L,R=301]
This code checks whether anything precedes one of the top-level TYPO3 directories (uploads, typo3conf, typo3temp) in the URL. If so, it issues a 301 (permanent) redirect to the proper location. As a result, no 404 happens at all. This solution works for both single- and multi-domain setups, and it does not invoke TYPO3/PHP at all, so it is better than any TYPO3-based solution.
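To see what the condition captures, here is a small Python sketch that applies the same pattern to a few sample paths. It assumes the input is the path part of the request, as in %{REQUEST_URI}, and returns what %1 would hold, i.e. the redirect target; `redirect_target` and the sample paths are made up for illustration. Python's `re` accepts this pattern unchanged.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond above.
PATTERN = re.compile(r"^.+(/(uploads|(typo3(conf|temp)))/.*)")


def redirect_target(request_uri: str) -> Optional[str]:
    """Return what %1 would hold (the 301 target), or None if the rule does not fire."""
    m = PATTERN.match(request_uri)
    return m.group(1) if m else None


if __name__ == "__main__":
    for uri in (
        "/hello/world/typo3temp/pics/12345.gif",
        "/hello/world/typo3temp/pics/typo3temp/pics/12345.gif",
        "/typo3temp/pics/12345.gif",
        "/hello/world/",
    ):
        print(uri, "->", redirect_target(uri))
```

Because .+ is greedy, a nested bogus path is sent straight to the last TYPO3 directory in it, and a path that already starts with /typo3temp/ does not match (.+ needs at least one character in front), so the redirect cannot loop.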
Important: for this to work, the code shown above must be placed before this line in .htaccess:
RewriteRule ^(typo3|typo3temp|typo3conf|t3lib|tslib|fileadmin|uploads|showpic\.php)/ - [L]
I used Komodo IDE and its excellent Rx Toolkit to prepare and test the regular expression for this article.

:-)
gRTz and tHNx
ben
We had the same problems at the beginning of May. There is a bug in the 404 handling of TYPO3.
http://bugs.typo3.org/view.php?id=8457
http://bugs.typo3.org/view.php?id=8343
We managed to get hold of it with Apache rewrite rules:
# '%{REQUEST_FILENAME}' part.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteCond %{REQUEST_URI} ^.*((/[^./]*)|(\.html))$
# add further file formats like this: (/|\.(html|xml))
greetings,
olivier
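A quick way to check which URLs the REQUEST_URI condition above actually catches is to run it through Python's `re`. This sketch models only that one condition (the -f/-d/-l checks and the RewriteRule that follows them are not part of it), and it escapes the dot before html, since an unescaped dot matches any character. The helper name and sample paths are hypothetical.

```python
import re

# REQUEST_URI condition from the comment above, with the dot before "html" escaped.
CONDITION = re.compile(r"^.*((/[^./]*)|(\.html))$")


def condition_matches(request_uri: str) -> bool:
    """True if the REQUEST_URI condition would be satisfied for this path."""
    return CONDITION.match(request_uri) is not None


if __name__ == "__main__":
    for uri in (
        "/hello/world/",                          # ends in a slash
        "/hello/world",                           # last segment without a dot
        "/hello/world/page.html",                 # virtual .html page
        "/hello/world/typo3temp/pics/12345.gif",  # bogus image request
        "/hello/world/typo3/contrib/prototype/prototype.js",
    ):
        print(uri, condition_matches(uri))
```

Only the first three paths match; requests for .gif, .js and similar files fall outside the condition.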
Well, I think this is one of those things where doing nothing pays back a lot. For a long time I have had a theory that thinking in the background gives much better results than rushing into implementation. Check "The parable of two programmers" :)
@Olivier:
I do not see a bug in the 404 implementation here. It is clearly wrong behavior of some user agents. And I bet most of them have "FunWebProducts" in the user agent string. The second bug report says that the 404 status code is not sent. This can be either a real bug or a misconfiguration. It needs to be checked.
By the way, how does your RewriteCond work? It looks like it rewrites all virtual .html files to index.php, but it does not rewrite js/css/gif/png and other files in directories. This means you still get many wrong 404 requests.
Where is the linking across domains described? I couldn't find it in the 4.2 release notes, and I couldn't see it in the ChangeLog either (at first glance...).
Is there a hint somewhere about how this cross-domain linking works?
Best Regards,
Jonas
One solution, especially if your 404 page has no special logic in it, is to create a static 404 page that uses absolute URLs for all images and links.
Judging from the logs of the site where I've seen this problem, it looks like the RewriteCond line should include 'fileadmin' as well (at least if any pages link into it):
RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(conf|temp)))/.*)
Maybe we should update the _.htaccess file that ships with the TYPO3 packages accordingly?
What do you think?
I just saw a RealURL error log entry for a request to prototype.js: http://typofree.org/articles/optimizing-typo3-backend-responsiveness/typo3/contrib/prototype/prototy[..] I think the typo3 directory should be included too, by changing:
RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(conf|temp)))/.*)
Into:
RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|(typo3(|conf|temp)))/.*)
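Note the additional pipe in front of conf: the empty alternative lets typo3(|conf|temp) match a bare typo3 directory as well as typo3conf and typo3temp.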
With this change, /home/category/typo3/ will then be redirected to /typo3/.
So this 404 problem is still not completely solved for bogus calls to files inside /typo3/contrib.
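The effect of the empty alternative can be checked the same way in Python. This sketch uses the extended pattern proposed above; the logged URL is truncated, so the hypothetical path /some/page/ stands in for the page part, and `redirect_target` is an illustrative helper, not part of any TYPO3 API.

```python
import re
from typing import Optional

# Extended pattern: "typo3(|conf|temp)" matches typo3, typo3conf and typo3temp.
PATTERN = re.compile(r"^.+(/(fileadmin|uploads|(typo3(|conf|temp)))/.*)")


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the %1 redirect target, or None if the condition does not match."""
    m = PATTERN.match(request_uri)
    return m.group(1) if m else None


if __name__ == "__main__":
    # Bogus request for a backend file below a page path: now redirected.
    print(redirect_target("/some/page/typo3/contrib/prototype/prototype.js"))
    # A page path that merely contains a "typo3" segment is redirected as well.
    print(redirect_target("/home/category/typo3/"))
```

The second call shows the trade-off: any page path with a typo3 segment is sent to the backend directory.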
You may have to add an option to the 'init' section of your RealURL configuration to get the absRefPrefix into all your links:
'reapplyAbsRefPrefix' => 1
RealURL drops the prefix by default.
AVG seems to be the culprit:
http://it.slashdot.org/article.pl?sid=08/07/03/1411254
Block it using:
#Here we assume certain MSIE 6.0 agents are from linkscanner
#redirect these requests back to avg in the hope they'll see their silliness
RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1; SV1.$" [OR]
RewriteCond %{HTTP_USER_AGENT} ".*MSIE 6.0; Windows NT 5.1;1813.$"
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^.* http://www.avg.com/?LinkScannerSucks [R=307,L]
The solution is *very* effective.
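How these conditions combine can be sketched in Python. In mod_rewrite, [OR] joins a condition with the next one, and all other conditions are ANDed, so the rule fires for either user-agent pattern, but only when both the Referer and Accept-Encoding headers are empty. The function name and sample user-agent strings below are illustrative; a missing header is modelled as an empty string, which is what the ^$ tests see.

```python
import re

# User-agent patterns from the rules above. As in Apache, "." matches any
# character, so "SV1.$" means "SV1 followed by exactly one character, then end".
UA_PATTERNS = [
    re.compile(r".*MSIE 6.0; Windows NT 5.1; SV1.$"),
    re.compile(r".*MSIE 6.0; Windows NT 5.1;1813.$"),
]


def looks_like_linkscanner(user_agent: str, referer: str, accept_encoding: str) -> bool:
    """Hypothetical helper mirroring the RewriteCond chain ([OR] first, then AND)."""
    ua_hit = any(p.search(user_agent) for p in UA_PATTERNS)
    return ua_hit and referer == "" and accept_encoding == ""


if __name__ == "__main__":
    bare = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    print(looks_like_linkscanner(bare, "", ""))               # redirected
    print(looks_like_linkscanner(bare, "", "gzip, deflate"))  # not redirected
```

Because the trailing dot matches exactly one character (here the closing parenthesis), user-agent strings with further tokens after SV1 are not caught.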
I adapted the condition to:
RewriteCond %{REQUEST_URI} ^.+(/(fileadmin|uploads|typo3conf|typo3temp)/.*)
which sorts out the /page/about/typo3 problem and my problem with typo3temp.
I hope I understand correctly that the rewrite rule means that incoming requests for /any/path/(fileadmin|uploads|typo3conf|typo3temp)/ will be rewritten to just (fileadmin|uploads|typo3conf|typo3temp)/
If that is true, I have a thought. This is what happens:
1. The request URI contains /nice/readable/path/
2. A reference in the HTML output is missing a leading slash (like typo3temp/stylesheet*.css)
3. The browser (doing its job correctly) requests /nice/readable/path/typo3temp/stylesheet*.css
4. The web server and TYPO3 (correctly) report that the file could not be found.
The rewrite solution steps in between 3 and 4, correct?
That is good when you can't find a way to make TYPO3 add a leading slash to the typo3temp references, which I can't. It would be even more brilliant if RealURL were built in... I'm dreaming. Thanx for the rewrite rule!
Tanya
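Tanya's reading can be checked against the adapted condition with a short Python sketch. The capture keeps its leading slash, so /any/path/typo3temp/... maps to /typo3temp/..., and because the rule uses [R=301], Apache answers with an external redirect to that path rather than rewriting it internally. The helper name and sample paths are hypothetical.

```python
import re
from typing import Optional

# Adapted condition: explicit directory names, so a bare "typo3" segment is ignored.
PATTERN = re.compile(r"^.+(/(fileadmin|uploads|typo3conf|typo3temp)/.*)")


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the path the 301 redirect would point to, or None if no redirect."""
    m = PATTERN.match(request_uri)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(redirect_target("/nice/readable/path/typo3temp/stylesheet_1a2b.css"))
    print(redirect_target("/page/about/typo3/"))
```

The second path is left alone, which is why this variant avoids the /page/about/typo3 problem.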
In my installations it does not.