Parse a csv using awk and ignoring commas inside a field

gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $3}' | sort | uniq This is an awesome GNU Awk 4 extension, where you define a field pattern instead of a field-separator pattern. Does wonders for CSV. (docs) ETA (thanks mitchus): To remove the surrounding quotes, gsub("^\"|\"$","",$3); if there are more fields than just $3 to process that way, just … Read more
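
The same field-pattern idea can be sketched in Python for readers without gawk 4. This is only an illustration, not the answer's method: the names FIELD, split_fields and unquote are hypothetical, and the quoted alternative is listed first because Python's re tries alternatives in order rather than picking the longest match. Like the awk pattern, it does not handle doubled quotes inside a quoted field.

```python
import re

# Hypothetical helper names; a rough Python analogue of the FPAT approach.
# A field is either a double-quoted run (which may contain commas)
# or a run of non-comma characters (possibly empty).
FIELD = re.compile(r'(?:^|,)("[^"]*"|[^,]*)')


def split_fields(line):
    """Split one CSV line into fields, keeping commas inside quotes."""
    return FIELD.findall(line)


def unquote(field):
    """Strip one leading and one trailing double quote, like the gsub call."""
    return re.sub(r'^"|"$', "", field)


if __name__ == "__main__":
    fields = split_fields('1,"Smith, John",42')
    # Print the first and third fields, as the awk one-liner does.
    print(fields[0] + "," + fields[2])
    print(unquote(fields[1]))
```

For anything beyond simple quoting, Python's standard csv module is usually the safer choice.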

Spark – How to write a single csv file WITHOUT folder?

A possible solution could be to convert the Spark dataframe to a pandas dataframe and save it as csv: df.toPandas().to_csv("<path>/<filename>") EDIT: As caujka or snark suggest, this works for small dataframes that fit into the driver. It works for real cases where you want to save aggregated data or a sample of the dataframe. Don’t use this … Read more

Choosing between tsv and csv

TSV is very efficient for Javascript/Perl/Python to process, without losing any typing information, and also easy for humans to read. The format has been supported in 4store since its public release, and it’s reasonably widely used. The way I look at it is: CSV is for loading into spreadsheets, TSV is for processing by … Read more

Is there any way in Elasticsearch to get results as CSV file in curl API?

I’ve done just this using cURL and jq (“like sed, but for JSON”). For example, you can do the following to get CSV output for the top 20 values of a given facet: $ curl -X GET 'http://localhost:9200/myindex/item/_search?from=0&size=0' -d ' {"from": 0, "size": 0, "facets": { "sourceResource.subject.name": { "global": true, "terms": { "order": "count", "size": … Read more

Reading csv files with quoted fields containing embedded commas

I noticed that your problematic line has escaping that uses double quotes themselves: "32 XIY ""W"" JK, RE LK" which should be interpreted just as 32 XIY "W" JK, RE LK As described in RFC 4180, page 2: If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped … Read more
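
As a quick check of the doubled-quote rule, here is a minimal sketch using Python's standard csv module, which treats "" inside a quoted field as one literal quote by default. The helper name parse_line and the extra columns around the problematic field are assumptions made for illustration; they are not from the original question.

```python
import csv
import io


def parse_line(line):
    """Parse a single CSV line with the csv module's default dialect.

    The default dialect has doublequote=True, so a doubled quote inside
    a quoted field is read as one literal double quote.
    """
    return next(csv.reader(io.StringIO(line)))


if __name__ == "__main__":
    # The first and last columns are made up; the middle field is the
    # problematic value quoted in the answer above.
    row = parse_line('1,"32 XIY ""W"" JK, RE LK",end')
    print(row[1])
```

The embedded comma stays inside the second field because the whole value is enclosed in quotes.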

Delete every other line in notepad++

I’m not sure Notepad++ is the best tool for this, but using the Power of Regex, we should be able to do it. Open the replace menu, fill in ([^\n]*\n)[^\n]*\n in the “Find what” box and $1 in the “Replace with” box. Then select regular expression for the search mode, click replace all and every … Read more
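
For readers who would rather script this than use the Replace dialog, the same pattern can be tried in Python. This is a sketch under assumptions: the function name drop_every_other_line is hypothetical, and Python writes the backreference as \1 where Notepad++ uses $1.

```python
import re

# Same pattern as in the "Find what" box: capture one line, then match
# the following line so that it is dropped by the replacement.
PAIR = re.compile(r"([^\n]*\n)[^\n]*\n")


def drop_every_other_line(text):
    """Keep the 1st, 3rd, 5th... lines and delete the lines in between."""
    return PAIR.sub(r"\1", text)


if __name__ == "__main__":
    sample = "a\nb\nc\nd\n"
    print(drop_every_other_line(sample), end="")
```

A trailing unpaired line is left alone, since the pattern only matches two newline-terminated lines at a time.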
