A CSV data extractor, that works across Databases

CSV is a very versatile, and compact, human readable file format. Each database comes with custom CSV data extractors, or you can use an ETL tool to extract CSV data. sql2csv is lightweight tool that can be used universally across databases, without the unnecessary heft of the ETL tool, and without database specific syntax.

 

ThoughtSpot loader

 

ThoughtSpot loader (tsload) is the fastest way of getting large amounts of data into ThoughtSpot. In this post, we explore how to create CSV files that tsload can then load into ThoughtSpot.

 

Our old friend csvkit

In one of the more popular technical blog articles on the ThoughtSpot site, Jon Avrach has written about CSV file problems and his go-to tool of choice csvkit in 5 Magic Fixes for the Most Common CSV File Problems. Here, we make use of a utility that is bundled with csvkit called sql2csv.

 

sql2csv Utility

You can get the executable using the Python pip3 utility.

pip3 install csvkit    

Or on Windows

pip install csvkit

 

This command installs the csvkit binaries

 

Other utilities that are part of csvkit

 

Utility

Type

One-line description

csvclean

Processing

Cleans a CSV file

csvjoin

Processing

Merges 2 CSV files (similar to an SQL JOIN)

csvsort

Processing

Sorts CSV files

csvstack

Processing

Groups the CSV files

csvcut

Processing

Filters and Truncates CSV files

csvgrep

Processing

Filter tabular data based on grep match

csvformat

Output and Analysis

Convert CSV to custom format (replace line endings, escape characters, delimiters)

csvlook

Output and Analysis

Converts CSV to markdown

csvjson

Output and Analysis

Converts CSV to JSON

csvpy

Output and Analysis

Loads CSV into a agate.csv.Reader  object for use within IPython

csvsql

Output and Analysis

Generate CREATE TABLE and INSERTs from CSV

csvstat

Output and Analysis

Generate statistics for CSV file

in2csv

Input

Converts tabular data like dbf, fixed, geojson, json, ndjson, xls, xlsx into CSV

sql2csv

Input

Exports database table data/query output to CSV

 

Demo Setup (SQLite database)

Using DB Browser for SQLite, I created the following schema, and added data

 

 

 

 

Exporting the CSV

From the command line/terminal window,

<path_to_csvkit>/sql2csv --db "sqlite:///test.db" --query "select * from emp" > emp.csv
shell>less emp.csv
EMPID,ENAME,SALARY,DEPTID
100,Scott,150,10
101,David,100,20
102,Jones,120,30

Yes, there are other ways of doing this, using Stored Procedures, or command line pipes, but these are specific to databases, OSs, flavors and versions.

In contrast, sql2csv is a common utility that can be used across databases.

 

Scheduling

A script that includes the sql2csv utility can be created and scheduled using cron, for example, 4am each weekday.

00 04 * * 1-5 /home/etluser/csv-export.sh

For other databases

Since this command uses SQL Alchemy, we can make use of connection strings and the wide coverage of SQL Alchemy across databases:

SQL Alchemy connection string examples

  • PostgreSQL : postgresql://scott:tiger@localhost/mydatabase
  • MySQL: mysql://scott:tiger@localhost/foo
  • Oracle (cx_oracle): oracle://scott:tiger@127.0.0.1:1521/sidname
  • SQL Server (pyODBC): mssql+pyodbc://scott:tiger@mydsn

From : SQL Alchemy - Redshift project

  • AWS Redshift : redshift+psycopg2://username@host.amazonaws.com:5439/database

From : SQL Alchemy - Snowflake project

  • Snowflake: snowflake://{user}:{password}@{account}/{database}/{schema}

See Also

Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
Like1 Follow
  • 1 Likes
  • 2 wk agoLast active
  • 20Views
  • 1 Following