teerapan
New Member Posts:4
|
12/13/2010 3:53 AM |
|
I installed DMX 5.3.4 on DotNetNuke 5.6 (459). Please help: I am preparing a presentation for a corporate function committee but cannot get content search to work. At first I couldn't find any document at all. After I ran the re-indexing script I could search by file name, but I still encounter the following errors:
- Searching for the keyword "IT" finds neither "IT 056-53.doc" nor "IT 05653.doc", although the results do show "IT Letter 1.doc". (All three are the same file under different file names.)
- Searching for the keyword "Gartner" finds nothing inside a PDF file (I can copy and paste the keyword directly from the PDF, so it is not just a graphic).
- Searching for the keyword "สารสนเทศ" (Thai characters) does not find the Microsoft Word file "IT 056-53.doc" mentioned above. However, searching for "สำนักสารสนเทศ" returns all three files: "IT 056-53.doc", "IT 05653.doc" and "IT Letter 1.doc". (This proves that content search does work in some cases in my installation.)
In conclusion, DMX search seems to have problems in the following cases:
- Some file name patterns, e.g. "IT 056-53.doc", "IT 05653.doc"
- Content search inside PDF files does not work
- Content search in MS Word files using part of a word does not work
Could you please help me fix this issue before my presentation to the committee in the third week of this month? Thank you in advance. |
|
|
|
|
teerapan
New Member Posts:4
|
12/13/2010 4:26 AM |
|
For more information: I did more testing and can pretty much confirm that searching with a partial word doesn't work. This could be a big problem for Thai, because Thai has no separator between words. When people type a space in Thai writing, it marks the beginning of the next sentence (we don't use "." to separate sentences either). |
|
|
|
|
teerapan
New Member Posts:4
|
12/13/2010 6:00 AM |
|
As for the PDF search, that was my mistake. My server runs a 64-bit operating system. I downloaded and installed the PDF iFilter from www.adobe.com and it works now. However, it seems to have the same problem when part of a word is used as the keyword. |
|
|
|
|
teerapan
New Member Posts:4
|
12/13/2010 6:15 AM |
|
For your information, I switched to Microsoft Indexing Service and had none of the above problems. What are the drawbacks of using Microsoft Indexing Service? If there are not many, I will propose that the team use Indexing Service instead. Thanks |
|
|
|
|
Peter Donker
Veteran Member Posts:4536
|
12/15/2010 3:55 PM |
|
Hi Teerapan, Lucene is probably not well suited to Thai. It is sensitive to "tokenizing", meaning breaking text down into words using spaces. That is at the heart of it. You're probably better off using Indexing Service (IS). The reason DMX uses Lucene by default is that it has no configuration issues. IS needs permissions on the server to configure, and some customers don't have that level of access. Peter |
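To illustrate the tokenizing point, here is a minimal Python sketch of a whole-token index that splits text on spaces only. It is a simplified model and an assumption for illustration, not Lucene's or DMX's actual code; the function names (`whitespace_tokens`, `build_index`, `search`) and the sample documents are hypothetical. It shows why a Thai keyword that is only part of an unbroken run of text would not match, while an English word separated by spaces would.

```python
def whitespace_tokens(text):
    """Split text on whitespace only (simplified model of space-based tokenizing)."""
    return text.split()


def build_index(docs):
    """Map each whole token to the set of document names that contain it."""
    index = {}
    for name, text in docs.items():
        for token in whitespace_tokens(text):
            index.setdefault(token, set()).add(name)
    return index


def search(index, keyword):
    """Return documents whose tokens equal the keyword exactly (no partial match)."""
    return sorted(index.get(keyword, set()))


# Keywords taken from the thread; the document contents are made up.
PARTIAL = "สารสนเทศ"
WHOLE = "สำนัก" + PARTIAL  # written without a space, so it stays one token

docs = {
    "thai.doc": WHOLE,
    "english.doc": "information technology office",
}

index = build_index(docs)
whole_word_hits = search(index, WHOLE)        # matches: the token is identical
partial_hits = search(index, PARTIAL)         # no match: only part of a token
english_hits = search(index, "technology")    # matches: spaces isolate the word

print(whole_word_hits, partial_hits, english_hits)
```

In this model the only way to find the shorter keyword is for the tokenizer to know where Thai words begin and end, which splitting on spaces cannot do.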
|
|
|
|