Ticket #501 (assigned defect)
Lucene search needs an optional stop word dictionary to cope with very large sites such as FM
| Reported by: | tboutell | Owned by: | tboutell |
|---|---|---|---|
| Priority: | major | Milestone: | 1.5 |
| Component: | apostrophePlugin | Version: | 1.4 |
| Keywords: | search, zend, scalability | Cc: | geoffd, dordille |
| Symfony version: | 1.4 | | |
Description
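On large sites, searches for common words run out of memory. A common word matches a large share of the indexed documents, and the memory used by the Zend Lucene core grows with the number of matching documents it has to process.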
The proper way to add stop words is described in the Zend Framework manual:
http://framework.zend.com/manual/en/zend.search.lucene.extending.html
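You would need to get the current analyzer and add a stop word token filter (Zend_Search_Lucene_Analysis_TokenFilter_StopWords) to it before both searching and updating the index, so that queries and indexed text are filtered the same way.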
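The analyzer-plus-filter idea can be illustrated with a small sketch, written in Python purely to show the logic. It is not the Zend API: `Analyzer`, `StopWordFilter` and `analyze` are hypothetical names that only mirror the pattern of a tokenizer followed by a chain of token filters. What it shows is that the same filtered analyzer has to run both when documents are indexed and when queries are analyzed; otherwise the query terms and the indexed terms would not line up.

```python
import re
from typing import Callable, Iterable, List

# A token filter takes a list of tokens and returns a (possibly shorter) list.
TokenFilter = Callable[[List[str]], List[str]]


class StopWordFilter:
    """Hypothetical filter that drops tokens found in a stop word set."""

    def __init__(self, stop_words: Iterable[str]) -> None:
        # Normalize to lowercase so matching is case-insensitive.
        self.stop_words = {w.lower() for w in stop_words}

    def __call__(self, tokens: List[str]) -> List[str]:
        return [t for t in tokens if t not in self.stop_words]


class Analyzer:
    """Hypothetical analyzer: case-insensitive tokenizer plus a filter chain."""

    def __init__(self) -> None:
        self.filters: List[TokenFilter] = []

    def add_filter(self, token_filter: TokenFilter) -> None:
        self.filters.append(token_filter)

    def analyze(self, text: str) -> List[str]:
        # Lowercase, then split on anything that is not an ASCII letter or digit.
        tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
        for token_filter in self.filters:
            tokens = token_filter(tokens)
        return tokens


# Configure one shared analyzer before either indexing or searching.
analyzer = Analyzer()
analyzer.add_filter(StopWordFilter(["the", "and", "a", "of"]))

indexed_terms = analyzer.analyze("The History of the Band")
query_terms = analyzer.analyze("history and band")
```

In the plugin itself the equivalent step would go through Zend's own analyzer and filter classes as described on the manual page above; the sketch only illustrates why the filter must be installed on both the indexing path and the search path.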
The stop word dictionary could live directly in app.yml or in a separate file under data/, possibly a folder of dictionaries named by culture code. Using a stop word dictionary should be optional.
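One possible lookup for the per-culture dictionaries is sketched below, again in Python for illustration only. The file naming (`<culture>.txt`), the one-word-per-line format with `#` comment lines, the fallback from a full culture such as `en_US` to its language part `en`, and the `load_stop_words` name are all assumptions, not existing plugin conventions. A missing file yields an empty set, which keeps the feature optional.

```python
from pathlib import Path
from typing import List, Set


def load_stop_words(folder: Path, culture: str) -> Set[str]:
    """Load stop words for a culture such as "en_US" or "fr" (hypothetical).

    Assumed layout: one file per culture named "<culture>.txt", one word per
    line; blank lines and lines starting with "#" are ignored. If no file
    exists for the full culture, fall back to the language part
    ("en_US" -> "en"). If nothing is found, return an empty set so that
    stop word filtering stays optional.
    """
    candidates: List[str] = [culture]
    if "_" in culture:
        candidates.append(culture.split("_", 1)[0])
    for name in candidates:
        path = folder / f"{name}.txt"
        if path.is_file():
            words: Set[str] = set()
            for line in path.read_text(encoding="utf-8").splitlines():
                word = line.strip().lower()
                if word and not word.startswith("#"):
                    words.add(word)
            return words
    # No dictionary for this culture: filtering is simply disabled.
    return set()
```

For example, `load_stop_words(Path("data/stop_words"), "en_US")` would read `en_US.txt`, fall back to `en.txt`, and return an empty set if neither exists (the `data/stop_words` folder name is hypothetical). Reading the list from app.yml instead would only replace the file lookup.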
If you don't have time right now, a quick hack would be to strip stop words out of queries in an override of a/search. However, the stop words would still clutter the index, and you would risk stripping out words that have meaning in Zend search syntax, such as 'AND' and 'OR'.
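A rough sketch of that stopgap, written in Python purely to illustrate the logic (the real override would be PHP in the a/search action). `strip_stop_words` and `QUERY_OPERATORS` are hypothetical names, and the operator set is an assumption to check against the Zend query parser. Operator words are kept in any letter case, which errs on the side of not stripping something the parser might treat as significant.

```python
from typing import Iterable, List

# Words never stripped, even if they appear in the stop word list, because
# they can act as boolean operators in the query syntax. Compared
# case-insensitively to be conservative. (Assumption: adjust to the parser.)
QUERY_OPERATORS = {"AND", "OR", "NOT"}


def strip_stop_words(query: str, stop_words: Iterable[str]) -> str:
    """Remove stop words from a whitespace-separated query (hypothetical).

    Limitations: it does not understand quoted phrases, field prefixes or
    parentheses, and it does not repair an operator left dangling when a
    neighbouring word is removed.
    """
    stops = {w.lower() for w in stop_words}
    kept: List[str] = []
    for token in query.split():
        if token.upper() in QUERY_OPERATORS:
            kept.append(token)  # keep operators regardless of the stop list
        elif token.lower() not in stops:
            kept.append(token)  # ordinary word that is not a stop word
    return " ".join(kept)
```

As the ticket notes, this only shrinks queries; every stop word still ends up in the index.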
