How to get a Token from a Lucene TokenStream?

Yeah, it’s a little convoluted (compared to the good ol’ way), but this should do it:

    TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    // Attribute instances are reused: after each incrementToken() they hold the current token's values
    OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
    TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);

    while (tokenStream.incrementToken()) {
        int startOffset = offsetAttribute.startOffset();
        int endOffset = offsetAttribute.endOffset();
        String term = termAttribute.term();
    }

Edit: The new way

According to Donotello, TermAttribute has been …

Using OR and NOT in a Solr query

I don’t know why that doesn’t work, but this query is logically equivalent, and it does work:

    -(myField:superneat AND -myOtherField:somethingElse)

Maybe it has something to do with defining the same field twice in the query… Try asking on the solr-user mailing list, then post the final answer back here!
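The equivalence can be checked with plain boolean logic. Writing A for "the document matches myField:superneat" and B for "the document matches myOtherField:somethingElse", De Morgan's law turns -(A AND -B) into NOT A OR B. The minimal sketch below (the class name is hypothetical; it models each clause as a boolean and does not call Solr) enumerates all four combinations to confirm that both forms select the same documents.

```java
public class SolrQueryLogic {
    // -(A AND -B): exclude documents that match A but not B
    static boolean rewritten(boolean a, boolean b) {
        return !(a && !b);
    }

    // The same condition after applying De Morgan's law: NOT A OR B
    static boolean expanded(boolean a, boolean b) {
        return !a || b;
    }

    // True if both forms agree on every combination of A and B
    static boolean equivalent() {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                if (rewritten(a, b) != expanded(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.printf("A=%-5b B=%-5b  -(A AND -B)=%-5b  (NOT A OR B)=%b%n",
                        a, b, rewritten(a, b), expanded(a, b));
            }
        }
        System.out.println("equivalent: " + equivalent());
    }
}
```

In words: the query keeps every document that either lacks myField:superneat or has myOtherField:somethingElse.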

How does Lucene work?

Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index entry for each word listing the documents that contain it. Since the index is an exact string match and unordered, it can be extremely fast. Hypothetically, an unordered SQL index on a varchar field could be just as fast, and in …
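To make "splits them into words and builds an index for each word" concrete, here is a minimal in-memory sketch. It is a toy, not Lucene's actual on-disk index format; the class name and the crude split on non-word characters are assumptions for illustration. Each word maps to the set of IDs of the documents containing it, so an exact-match lookup is a single hash-map probe instead of a scan over every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyInvertedIndex {
    // word -> sorted set of IDs of the documents containing that word
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Splits a document into lowercase words and records its ID under each word
    public int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" when the text starts with a non-word character
            }
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Exact-match lookup: one hash-map probe, no scan over the documents
    public Set<Integer> search(String word) {
        Set<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene is a search library");
        index.add("Solr is built on Lucene");
        index.add("Cassandra is a database");
        System.out.println(index.search("lucene"));   // [0, 1]
        System.out.println(index.search("database")); // [2]
    }
}
```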

Comparison of Lucene Analyzers

In general, any analyzer in Lucene is a tokenizer + stemmer + stop-word filter. The tokenizer splits your text into chunks, and since different analyzers may use different tokenizers, you can get different output token streams, i.e. sequences of chunks of text. For example, the KeywordAnalyzer you mentioned doesn’t split the text at all and takes all the …
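The point that different tokenizers produce different token streams can be shown with a small stand-alone sketch. This is not Lucene's API: the class and method names, the tiny stop-word list, and the deliberately crude suffix-stripping "stemmer" (a stand-in for a real stemmer such as Porter) are all hypothetical. A keyword-style analyzer emits the whole input as one token, while a tokenizer-plus-filters analyzer splits on whitespace, lowercases, drops stop words and stems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzers {
    // Hypothetical, tiny stop-word list for illustration only
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "and", "of");

    // Keyword-style analyzer: no splitting, the whole input is a single token
    static List<String> keywordAnalyzer(String text) {
        return List.of(text);
    }

    // Tokenizer step: split on runs of whitespace
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Naive suffix stripper standing in for a real stemmer (hypothetical)
    static String crudeStem(String token) {
        if (token.endsWith("ing") && token.length() > 5) {
            return token.substring(0, token.length() - 3);
        }
        if (token.endsWith("s") && token.length() > 3) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Tokenizer followed by filters: lowercase, drop stop words, stem
    static List<String> stemmingAnalyzer(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(crudeStem(lower));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Searching of Documents";
        System.out.println(keywordAnalyzer(text));  // [The Searching of Documents]
        System.out.println(stemmingAnalyzer(text)); // [search, document]
    }
}
```

The same input yields one token in the first case and two normalized tokens in the second, which is why the choice of analyzer decides what a later query can match.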

How does Lucene index documents?

In a nutshell, Lucene builds an inverted index using skip lists on disk, and then loads a mapping for the indexed terms into memory using a Finite State Transducer (FST). Note, however, that Lucene does not (necessarily) load all indexed terms into RAM, as described by Michael McCandless, the author of Lucene’s indexing system himself. Note …
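Skip data lets a query jump over runs of document IDs it does not need, which matters most for AND queries. The sketch below is a minimal, hypothetical single-level skip list over one term's sorted postings (real Lucene codecs use more elaborate multi-level skip data on disk, and the FST term dictionary is not modelled here); the class name, skip interval and data are made up for illustration. It assumes strictly increasing document IDs.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipPostings {
    private final int[] docIds;       // strictly increasing document IDs for one term
    private final int skipInterval;   // a skip pointer exists at every multiple of this index
    private int pos = 0;              // current position in docIds

    SkipPostings(int[] sortedDocIds, int skipInterval) {
        this.docIds = sortedDocIds;
        this.skipInterval = skipInterval;
    }

    // Current doc ID, or Integer.MAX_VALUE once the list is exhausted
    int doc() {
        return pos < docIds.length ? docIds[pos] : Integer.MAX_VALUE;
    }

    // Move to the first doc ID >= target; never moves backwards
    int advance(int target) {
        // Follow skip pointers while the entry they point at is still <= target
        int next = (pos / skipInterval + 1) * skipInterval;
        while (next < docIds.length && docIds[next] <= target) {
            pos = next;
            next += skipInterval;
        }
        // Then scan linearly to the first entry >= target
        while (pos < docIds.length && docIds[pos] < target) {
            pos++;
        }
        return doc();
    }

    // AND of two postings lists: repeatedly advance whichever side is behind
    static List<Integer> intersect(SkipPostings a, SkipPostings b) {
        List<Integer> result = new ArrayList<>();
        while (a.doc() != Integer.MAX_VALUE && b.doc() != Integer.MAX_VALUE) {
            if (a.doc() == b.doc()) {
                result.add(a.doc());
                a.advance(a.doc() + 1);
            } else if (a.doc() < b.doc()) {
                a.advance(b.doc());
            } else {
                b.advance(a.doc());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SkipPostings first = new SkipPostings(new int[] {1, 3, 5, 8, 13, 21, 34, 55, 89}, 3);
        SkipPostings second = new SkipPostings(new int[] {2, 3, 21, 50, 89, 100}, 3);
        System.out.println(intersect(first, second)); // [3, 21, 89]
    }
}
```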

Elasticsearch vs Cassandra vs Elasticsearch with Cassandra

One of our applications uses data that is stored in both Cassandra and Elasticsearch. We use Cassandra to access those records whenever we can, and have data duplicated into query tables designed around specific application-side requests. For more flexible searches than our query tables allow, Elasticsearch handles that nicely. We …

Difference between Solr and Lucene

@darkheir: “Lucene and Solr are 2 different Apache projects that are made to work together; I don’t understand what the aim of each project is.” Solr uses Lucene under the hood; Lucene has no knowledge of the Solr API. Lucene is a powerful search engine framework that lets us add search capability to our application. …

Choosing a stand-alone full-text search server: Sphinx or SOLR?

I’ve been using Solr successfully for almost 2 years now and have never used Sphinx, so I’m obviously biased. However, I’ll try to keep it objective by quoting the docs or other people. I’ll also take patches to my answer 🙂

Similarities:

Both Solr and Sphinx satisfy all of your requirements. They’re fast and designed …
