Python – “case insensitive” in a string or “case ignore”
Series.str.contains has a case parameter that is True by default. Set it to False to do a case-insensitive match: df2 = df1['company_name'].str.contains("apple", na=False, case=False)
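A minimal sketch of the case parameter in action. The sample rows are made up for illustration; only the column name comes from the snippet above. Note that str.contains returns a boolean mask, so the frame is indexed with it to get the matching rows.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the snippet above.
df1 = pd.DataFrame(
    {"company_name": ["Apple Inc.", "Pineapple Co", "Microsoft", None]}
)

# case=False ignores letter case; na=False treats missing values as non-matches.
mask = df1["company_name"].str.contains("apple", na=False, case=False)

# The mask is a boolean Series; index the frame with it to keep matching rows.
matches = df1[mask]
print(matches)
```

Because this is a substring match, "Pineapple Co" is also selected.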
You can use IPython's display function to achieve that:
    from IPython.display import display
    display(d)
AFAIK you need to call withColumn twice (once for each new column). But if your udf is computationally expensive, you can avoid calling it twice by storing the "complex" result in a temporary column and then "unpacking" the result, e.g. using the apply method of column (which gives access to the array element). Note … Read more
You can use this:
    from IPython.display import display
    for i in df_list:
        display(i)
Learn more tricks about rich and flexible formatting at Jupyter Notebook Viewer
Solution:
    header = True
    for chunk in chunks:
        chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename), header=header, columns=['TIME', 'STUFF'], mode="a")
        header = False
Notes: The mode="a" tells pandas to append. We only write a column header on the first chunk. Current pandas versions take the column list through the columns parameter (older releases called it cols).
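For a version that runs end to end, here is a sketch of the same append-with-one-header pattern. The temporary folder, file names, sample data, and chunk size are all assumptions made for the demo; the chunks come from pd.read_csv with chunksize.

```python
import os
import tempfile

import pandas as pd

# Hypothetical input: a small CSV with three columns, written to a temp folder.
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "source.csv")
pd.DataFrame(
    {"TIME": range(10), "STUFF": list("abcdefghij"), "OTHER": range(10, 20)}
).to_csv(source_path, index=False)

target_path = os.path.join(folder, "new_file_source.csv")

# Read the source in chunks of 4 rows and append each chunk to the target.
header = True
for chunk in pd.read_csv(source_path, chunksize=4):
    chunk.to_csv(
        target_path,
        columns=["TIME", "STUFF"],  # write only these two columns
        header=header,              # column names only for the first chunk
        index=False,
        mode="a",                   # append instead of overwriting
    )
    header = False

result = pd.read_csv(target_path)
print(result.shape)
```

If the header were written for every chunk, the extra header lines would be read back as data rows, so checking the row count of the result is a quick sanity test.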
It is an operator precedence issue. You should add extra parentheses to make your multi-condition test work: d[(d['x']>2) & (d['y']>7)] This section of the tutorial you mentioned shows an example with several boolean conditions, and the parentheses are used.
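A small illustration of the parenthesized form, with made-up values for the x and y columns:

```python
import pandas as pd

# Hypothetical data; the column names follow the expression above.
d = pd.DataFrame({"x": [1, 3, 5, 4], "y": [9, 6, 8, 10]})

# & binds more tightly than >, so each comparison needs its own parentheses.
# Without them, Python parses the test as d["x"] > (2 & d["y"]) > 7.
selected = d[(d["x"] > 2) & (d["y"] > 7)]
print(selected)
```

Only the rows where both conditions hold are kept.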
Note for pandas versions < 0.20.0: I will be using df.melt(…) for my examples, but you will need to use pd.melt(df, …) instead. Documentation references: Most of the solutions here use melt, so to get to know the method, see the documentation explanation. Unpivot a DataFrame from wide to long format, optionally leaving … Read more
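As a rough idea of what melt does, here is a sketch on an invented wide table. The column names and values are assumptions for illustration, not taken from the answer.

```python
import pandas as pd

# Hypothetical wide-format data: one row per person, one column per subject.
df = pd.DataFrame({"name": ["Ann", "Bob"], "math": [90, 80], "art": [70, 60]})

# id_vars stays as an identifier column; every other column is unpivoted
# into (subject, score) pairs, giving one row per person and subject.
long_df = df.melt(id_vars="name", var_name="subject", value_name="score")
print(long_df)
```

The two subject columns become four rows, with the former column names stored in the subject column.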
Pretty one-liner 🙂 new_df = df.idxmax(axis=1)
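To see what the one-liner returns, here is a sketch on made-up numbers. Despite the name new_df, the result is a Series holding, for each row, the label of the column with the largest value.

```python
import pandas as pd

# Hypothetical numeric data for illustration.
df = pd.DataFrame({"a": [1, 9, 3], "b": [7, 2, 3], "c": [4, 5, 8]})

# axis=1 scans across the columns of each row and returns the column label
# of the row-wise maximum.
new_df = df.idxmax(axis=1)
print(new_df)
```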
Just assign directly a new index array: df.index = np.arange(1, len(df) + 1)
Example:
    In [151]: df = pd.DataFrame({'a':np.random.randn(5)})
              df
    Out[151]:
              a
    0  0.443638
    1  0.037882
    2 -0.210275
    3 -0.344092
    4  0.997045
    In [152]: df.index = np.arange(1,len(df)+1)
              df
    Out[152]:
              a
    1  0.443638
    2  0.037882
    3 -0.210275
    4 -0.344092
    5  0.997045
Or just: df.index = df.index … Read more