Is Solr available for .NET?

If you mean running the Solr server on .NET instead of Java, then no, there is no port. I’ve been trying to run it with IKVM here, but it’s low-priority for me so I can’t put much time into it. It’d be great if someone could help out with this. If you mean using/connecting to … Read more

How to use a Lucene Analyzer to tokenize a String?

Based off of the answer above, this is slightly modified to work with Lucene 4.0. public final class LuceneUtil { private LuceneUtil() {} public static List<String> tokenizeString(Analyzer analyzer, String string) { List<String> result = new ArrayList<String>(); try { TokenStream stream = analyzer.tokenStream(null, new StringReader(string)); stream.reset(); while (stream.incrementToken()) { result.add(stream.getAttribute(CharTermAttribute.class).toString()); } } catch (IOException e) { … Read more

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

SQL Server FTS is going to be easier to manage for a small deployment. Since FTS is integrated with the DB, the RDBMS handles updating the index automatically. The con here is that you don’t have an obvious scaling solution short of replicating DBs. So if you don’t need to scale, SQL Server FTS is … Read more

Difference between BooleanClause.Occur.MUST and BooleanClause.Occur.SHOULD in Lucene

BooleanClause.Occur.SHOULD means that the clause is optional, whereas BooleanClause.Occur.MUST means that the clause is compulsory. However, if a boolean query only has optional clauses, at least one clause must match for a document to appear in the results. For better control over what documents match a BooleanQuery, there is also a minimumShouldMatch parameter which lets … Read more
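To make the rule above concrete, here is a minimal sketch of the matching logic as a toy model in plain Java. It does not use Lucene at all: the class `BooleanMatchSketch`, its `Clause` and `Occur` types, and the `matches` method are hypothetical names made up for illustration, a document is reduced to a set of terms, and only the MUST and SHOULD cases are modelled (no scoring, no other clause types).

```java
import java.util.List;
import java.util.Set;

/** Toy model of the MUST/SHOULD matching rule; this is NOT Lucene's implementation. */
final class BooleanMatchSketch {

    /** Hypothetical stand-in for the two occur types discussed above. */
    public enum Occur { MUST, SHOULD }

    /** A single-term clause: the term plus whether it is compulsory or optional. */
    public static final class Clause {
        final String term;
        final Occur occur;

        public Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    /**
     * Returns true if a document (modelled as a set of terms) satisfies the clauses.
     * minimumShouldMatch is the number of optional clauses that have to match.
     */
    public static boolean matches(Set<String> docTerms, List<Clause> clauses,
                                  int minimumShouldMatch) {
        int mustCount = 0;
        int shouldMatched = 0;
        for (Clause clause : clauses) {
            boolean hit = docTerms.contains(clause.term);
            if (clause.occur == Occur.MUST) {
                mustCount++;
                if (!hit) {
                    return false; // a compulsory clause failed
                }
            } else if (hit) {
                shouldMatched++;
            }
        }
        // With only optional clauses, at least one of them has to match.
        int required = minimumShouldMatch;
        if (mustCount == 0 && required == 0) {
            required = 1;
        }
        return shouldMatched >= required;
    }

    private BooleanMatchSketch() {}
}
```

In this model, a query made only of SHOULD clauses rejects a document that contains none of the terms, while adding a matching MUST clause makes the SHOULD clauses truly optional unless `minimumShouldMatch` is raised.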

ElasticSearch – Searching For Human Names

First, I recreated your current configuration in Play: https://www.found.no/play/gist/867785a709b4869c5543 If you go there, switch to the “Analysis” tab to see how the text is transformed. Note, for example, that Heaney ends up tokenized as [hn, heanei] with the search_analyzer and as [HN, heanei] with the index_analyzer. Note the case difference for the metaphone term. Thus, that one is … Read more

Entity Extraction/Recognition with free tools while feeding Lucene Index

The problem you are facing in the ‘wicket’ example is called entity disambiguation, not entity extraction/recognition (NER). NER can be useful, but only when the categories are specific enough. Most NER systems don’t have enough granularity to distinguish between a sport and a software project (both types would fall outside the typically recognized types: person, … Read more
