Filename search with ElasticSearch

You have various problems with what you pasted:

1) Incorrect mapping

When creating the index, you specify:

    "mappings": { "files": {

But your type is actually file, not files. If you checked the mapping, you would see that immediately:

    curl -XGET 'http://127.0.0.1:9200/files/_mapping?pretty=1'
    # {
    #    "files" : {
    #       "files" : {
    #          "properties" : …
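
A minimal sketch of the check described above, assuming the `_mapping` response has already been fetched and parsed into a dict shaped index → type → properties, as in the excerpt. The helper name `missing_types` and the sample response are hypothetical; the real "properties" contents are elided in the source and are left empty here.

```python
def missing_types(mapping, index, expected_types):
    """Return the expected type names that the index mapping does not define."""
    defined = set(mapping.get(index, {}))
    return [t for t in expected_types if t not in defined]


# Hypothetical, abbreviated response shaped like the excerpt above;
# the real "properties" contents are elided in the source.
mapping_response = {"files": {"files": {"properties": {}}}}

# Documents are indexed with type "file", but the mapping only defines "files".
print(missing_types(mapping_response, "files", ["file"]))
```

Running a check like this before indexing makes a type-name mismatch visible early, rather than having it show up later as queries that silently return nothing.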

Understanding the `ngram_range` argument in a CountVectorizer in sklearn

Setting the vocabulary explicitly means no vocabulary is learned from data. If you don't set it, you get:

    >>> from pprint import pprint
    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> v = CountVectorizer(ngram_range=(1, 2))
    >>> pprint(v.fit(["an apple a day keeps the doctor away"]).vocabulary_)
    {u'an': 0,
     u'an apple': 1,
     u'apple': 2,
     u'apple day': 3,
     u'away': 4,
     u'day': 5,
     u'day keeps': 6,
     u'doctor': 7,
     u'doctor away': 8,
     u'keeps': …
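
To see where those entries come from, here is a pure-Python sketch that mimics the learned vocabulary for a given `ngram_range`. It is not scikit-learn's implementation. It assumes the vectorizer's default behaviour of lowercasing the text, ignoring single-character tokens, and numbering terms in sorted order; the function name `ngram_vocabulary` is made up for illustration.

```python
import re


def ngram_vocabulary(docs, ngram_range=(1, 1)):
    """Build a term -> index mapping from word n-grams, sorted alphabetically."""
    min_n, max_n = ngram_range
    terms = set()
    for doc in docs:
        # Lowercase and keep tokens of two or more word characters,
        # so the single-letter "a" is dropped.
        tokens = re.findall(r"\b\w\w+\b", doc.lower())
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                terms.add(" ".join(tokens[i:i + n]))
    return {term: idx for idx, term in enumerate(sorted(terms))}


vocab = ngram_vocabulary(["an apple a day keeps the doctor away"], ngram_range=(1, 2))
for term, idx in sorted(vocab.items(), key=lambda kv: kv[1]):
    print(idx, term)
```

With `ngram_range=(1, 2)` both single words and adjacent word pairs become vocabulary entries, which is why "an" and "an apple" appear side by side. Because "a" is dropped before the pairs are formed, "apple day" counts as a bigram.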

Python: Reducing memory usage of dictionary

I cannot offer a complete strategy that would help improve the memory footprint, but I believe it may help to analyse what exactly is taking so much memory. If you look at the Python implementation of dictionary (which is a relatively straightforward implementation of a hash table), as well as the implementation of the built-in string …
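
One rough way to analyse what is taking the memory is to measure the hash table, the keys, and the values separately. The sketch below uses `sys.getsizeof` from the standard library. The helper name `shallow_dict_breakdown` and the sample data are hypothetical, and the numbers are shallow sizes only, so they vary between Python versions and do not follow nested containers.

```python
import sys


def shallow_dict_breakdown(d):
    """Split a dict's memory into the hash table itself, its keys, and its values.

    Uses sys.getsizeof, which reports shallow sizes only, so nested
    containers inside values are not followed.
    """
    return {
        "table": sys.getsizeof(d),
        "keys": sum(sys.getsizeof(k) for k in d),
        "values": sum(sys.getsizeof(v) for v in d.values()),
    }


# Hypothetical sample data: 10,000 short string keys mapped to integers.
sample = {"key%d" % i: i for i in range(10000)}
breakdown = shallow_dict_breakdown(sample)
total = sum(breakdown.values())
for part, size in breakdown.items():
    print("%-6s %8d bytes (%.0f%%)" % (part, size, 100.0 * size / total))
```

Objects that are shared between entries are counted once per reference here, so treat the result as an upper bound and not as an exact figure.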

Simple implementation of N-Gram, tf-idf and Cosine similarity in Python

Check out the NLTK package (http://www.nltk.org); it has everything you need.

For the cosine_similarity:

    import math
    import numpy

    def cosine_distance(u, v):
        """
        Returns the cosine of the angle between vectors v and u.
        This is equal to u.v / |u||v|.
        """
        return numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v)))

For ngrams:

    def ngrams(sequence, n, pad_left=False, pad_right=False, pad_symbol=None):
        """ …
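
For a sense of how the three pieces in the title fit together, here is a standard-library-only sketch. It is not NLTK's code. The function names, the toy corpus, and the tf-idf weighting (raw count times log(N / document frequency)) are assumptions made for illustration; other tf-idf variants exist.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence, without padding."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Turn tokenized documents into sparse {ngram: weight} dicts.

    Weight = raw term count * log(N / document frequency); this is one
    common tf-idf variant, chosen here as an assumption.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    return [{t: tf * math.log(total / df[t]) for t, tf in c.items()} for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors: u.v / (|u||v|)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # the angle is undefined for a zero vector; report no similarity
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vecs = tfidf_vectors(docs, n=1)
print(cosine_similarity(vecs[0], vecs[1]))  # the two cat sentences share terms
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms
```

Storing each vector as a dict keeps only the n-grams that actually occur in a document, which avoids building a dense array over the whole vocabulary.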
