Finding matching keys in two large dictionaries and doing it fast

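Use sets. Once each dictionary's keys have been copied into a set, the built-in intersection method only has to hash-look-up each key of the smaller set in the larger one. That takes roughly linear time instead of comparing every pair of names:

    myRDP = {'Actinobacter': 'GATCGA…TCA', 'subtilus sp.': 'ATCGATT…ACT'}
    myNames = {'Actinobacter': '8924342'}

    rdpSet = set(myRDP)      # set() of a dict holds its keys
    namesSet = set(myNames)

    for name in rdpSet.intersection(namesSet):
        print(name, myNames[name])
    # Prints: Actinobacter 8924342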
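In Python 3, dictionary key views already behave like sets. The following sketch does the same lookup without building the two intermediate sets. The dictionaries reuse the toy data above, and the `…` inside the sequences is just a placeholder for the truncated strings.

```python
# Toy data copied from the example above; the sequences are truncated placeholders.
myRDP = {'Actinobacter': 'GATCGA…TCA', 'subtilus sp.': 'ATCGATT…ACT'}
myNames = {'Actinobacter': '8924342'}

# dict.keys() returns a set-like view, so & computes the intersection directly.
common = myRDP.keys() & myNames.keys()

# Sorting gives a deterministic output order; sets themselves are unordered.
for name in sorted(common):
    print(name, myNames[name], myRDP[name])
```

The result `common` is an ordinary `set`, so it can be reused for further lookups.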

How to plot a gene graph for a DNA sequence such as ATGCCGCTGCGC?

I did not previously know of Mark McClure's blog about the chaos game representation of gene sequences. It reminded me of an article by Jose Manuel Gutiérrez (The Mathematica Journal, Vol. 9, Issue 2), which also gives a chaos game algorithm for an iterated function system (IFS) built on the four bases of DNA sequences. A detailed description may …
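The following is a minimal sketch of the chaos game idea for DNA. Each of the four bases is assigned a corner of the unit square. Starting from the centre, each successive base moves the current point halfway toward that base's corner. The corner assignment and the starting point below are assumptions chosen for illustration; they are not necessarily what McClure or Gutiérrez use.

```python
# Assumed corner convention for the four bases (one common choice, not the only one).
CORNERS = {'A': (0.0, 0.0), 'C': (0.0, 1.0), 'G': (1.0, 1.0), 'T': (1.0, 0.0)}


def chaos_game_points(seq, start=(0.5, 0.5)):
    """Return one (x, y) point per base of a DNA sequence.

    Each base moves the current point halfway toward that base's corner,
    which is the contraction map of the four-map IFS.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]          # raises KeyError for non-ACGT symbols
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    sequence = "ATGCCGCTGCGC"
    for base, (x, y) in zip(sequence, chaos_game_points(sequence)):
        print(base, round(x, 4), round(y, 4))
```

To draw the graph, the x and y coordinates can be passed to any scatter-plot routine, for example matplotlib's `plt.scatter`. With a sequence this short the pattern is sparse. The characteristic fractal structure only appears for sequences thousands of bases long.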

How to call a module written with argparse from an IPython notebook

One way to use argparse in IPython notebooks is to pass the arguments to parse_args() explicitly, as a list of strings. This replaces the call args = parser.parse_args() (line 303 of the git repo you referenced), which otherwise reads sys.argv. In a notebook, sys.argv holds the kernel's own arguments. The parser itself would look something like:

    parser = argparse.ArgumentParser(
        description='Searching longest common substring. '
                    'Uses Ukkonen\'s suffix tree algorithm and generalized suffix tree. '
                    'Written by Ilya Stepanov (c) 2013')
    parser.add_argument(
        'strings',
        metavar="STRING",
        …
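Here is a minimal sketch of the explicit-arguments approach. The parser is a hypothetical stand-in modeled on the truncated excerpt. In particular, `nargs='+'` and the help text are assumptions, since the original `add_argument` call is cut off. `shlex.split` is used to turn a single command-line string into the argv-style list that `parse_args` expects.

```python
import argparse
import shlex

# Hypothetical parser modeled on the excerpt; nargs='+' is an assumption.
parser = argparse.ArgumentParser(description='Searching longest common substring.')
parser.add_argument('strings', metavar='STRING', nargs='+',
                    help='strings to compare (assumed positional)')

# In a notebook, pass an argv-style list instead of relying on sys.argv.
args = parser.parse_args(['banana', 'bandana'])
print(args.strings)

# shlex.split turns one command-line string into that list, honoring quotes.
args = parser.parse_args(shlex.split('"long string one" "long string two"'))
print(args.strings)
```

Passing a bare string such as `parser.parse_args('banana bandana')` is not equivalent. argparse treats the argument as a sequence, so a plain string would be split into single characters rather than words.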

How much storage would be required to store a human genome?

If you trust such things, here is what Wikipedia claims (from http://en.wikipedia.org/wiki/Human_genome#Information_content): The 2.9 billion base pairs of the haploid human genome correspond to a maximum of about 725 megabytes of data, since every base pair can be coded by 2 bits. Since individual genomes vary by less than 1% from each other, they can …
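The quoted 725 MB figure follows from simple arithmetic: 2.9 × 10⁹ bases × 2 bits ÷ 8 bits per byte = 7.25 × 10⁸ bytes. The sketch below checks that number and shows one way a 2-bit encoding could pack four bases into each byte. The A=0, C=1, G=2, T=3 mapping is an arbitrary assumption, and `pack`/`unpack` are hypothetical helpers. Real sequence data also contains N and other ambiguity codes, which a pure 2-bit scheme cannot represent.

```python
# Back-of-the-envelope check of the 2-bits-per-base figure quoted above.
BASE_PAIRS = 2_900_000_000          # haploid genome size from the quote
bytes_needed = BASE_PAIRS * 2 // 8  # 2 bits per base, 8 bits per byte
print(bytes_needed / 1e6, "MB")     # decimal megabytes

# Illustrative 2-bit packing, four bases per byte (assumed mapping; ACGT only).
CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
BASES = 'ACGT'


def pack(seq):
    """Pack an ACGT string into bytes, lowest bits holding the first base."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Decode packed bytes; length is needed because padding decodes as 'A'."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (2 * j)) & 0b11])
    return ''.join(bases[:length])
```

Because the unused bits in the final byte decode as `A`, the sequence length has to be stored alongside the packed bytes.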
