How to Rock CSV files with SQLite

Hey, have you ever had to work with large quantities of CSV data (read: Excel) in a structured way? Has that experience ever made you want to smash something in a fit of rage? Have you ever wondered why you’re wasting your precious time on this earth reading half-arsed Excel tutorials from 2001 to run a simple query across some data set? If you’re anything like me, the answer is yes. Here’s how to banish that problem from your world forever (at least the CSV-related bits).

In this post, I will describe how to import CSV files into SQlite 3 and do some SQL-y things to them (also known as “rocking” them). There are many other tutorials on how to do this sort of thing; this one is mine. It assumes you have SQLite already installed on your computer.

I’m using the following version:

$ sqlite3 --version
3.7.12 2012-04-03 19:43:07 86b8481be7e76cccc92d14ce762d21bfb69504af

The Data Set

Here’s the CSV data set we’ll be working with – the columns are id, advertiser, line_item, campaign, impressions, and clicks. It’s in a file called tiny-db.csv.

1,  A          , ACME Axe Co.                        , Midwest Lumberjack Competitors           ,        1792 ,     21 
2, A          , ACME Axe Co.                        , Northwestern Logging Equipment Intenders ,        4355 ,     34 
3, A          , Bar-None Ice Cream Cones            , New York Summer Heatwave                 ,      78231 ,   1408 
4, B          , Candles R-US                        , Home Decor Enthusiasts                   ,        2843 ,     65 
5, B          , Doggie Accessories                  , Southwestern Evening Weekends            ,        9486 ,    123 
6, C          , Evergreen Synthetic Xmas Trees      , Sun Belt Homeowners                      ,        2238 ,     46 
7, D          , Frog Hollow Athletic Socks          , Back-to-School                           ,        8198 ,    214 
8, D          , Frog Hollow Athletic Socks          , Practical Holiday  Shoppers              ,         103 ,     12 
9, E          , Best Soap & Candle                  , Clean Hands Stay Healthy                ,        3883 ,     41 
10, E          , Best Soap & Candle                  , Clean Hands Make Good Neighbors          ,        1292 ,    183 
11, E          , Best Soap & Candle                  , Bath Salt Relaxation                     ,         902 ,     81 
12, E          , Best Soap & Candle                  , Bath Salt Relaxation (plus Candles!)     ,        5352 ,    212 
13 , F          , Farmer Snowplows                    , Northeast Winter Storms                  ,      12448 ,    256 
14, F          , Farmer Snowplows                    , Upper Midwest Winter Storms              ,      23984 ,    782 
15, F          , Farmer Snowplows                    , Rocky Mountain Winter Storms             ,       8764 ,    128 
16, G          , Gretchen's Organic Chocolates       , East Coast NPR Podcast Listeners         ,      48996 ,    973 
17, H          , Hap's Go-Kart Track and Petting Zoo , City Folks; Country Weekends             ,        1108 ,     87 
18, H          , Hap's Go-Kart Track and Petting Zoo , Go-Kart Enthusiast Forums (Weekend)      ,        1872 ,    116 

Don’t worry too much about what those terms mean, they mostly have to do with online advertising stuff and aren’t that relevant, except to say that:

  • Impressions are those moments when a person views an ad on a website
  • Clicks are recorded when people click on advertisements
  • I totally made them up (this should be obvious from the names, but still…)

Importing the CSV File

Here’s how to get the data into sqlite: first you have to make a table in the database that matches the columns in your CSV file. Then, set the comma as your separator, and import the data. Note that you may run into issues with fields that have commas inside them such as “Hello, World!”; it’s best to clean them up somehow beforehand.

rloveland:code rloveland$ sqlite3 
SQLite version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

sqlite> create table tiny (id integer, advertiser text, line_item text, campaign text, impressions integer, clicks integer);

sqlite> .separator ","

sqlite> .import tiny-db.csv tiny

Querying the Data Set

Now we can run SQL statements on the data.

How many rows had more than 9000 impressions?

sqlite> select * from tiny where impressions > 9000;

3, A          , Bar-None Ice Cream Cones            , New York Summer Heatwave                 ,78231,1408
5, B          , Doggie Accessories                  , Southwestern Evening Weekends            ,9486,123
13, F          , Farmer Snowplows                    , Northeast Winter Storms                  ,12448,256
14, F          , Farmer Snowplows                    , Upper Midwest Winter Storms              ,23984,782
16, G          , Gretchen's Organic Chocolates       , East Coast NPR Podcast Listeners         ,48996,973

How many rows had 10000 impressions or more? Can we order them by campaign?

sqlite> select * from tiny where impressions >= 10000 group by campaign;

16, G          , Gretchen's Organic Chocolates       , East Coast NPR Podcast Listeners         ,48996,973
3, A          , Bar-None Ice Cream Cones            , New York Summer Heatwave                 ,78231,1408
13, F          , Farmer Snowplows                    , Northeast Winter Storms                  ,12448,256
14, F          , Farmer Snowplows                    , Upper Midwest Winter Storms              ,23984,782

Using Full Text Search

Finally, let’s set up full text searching. We have to use something called a `virtual’ table that mirrors our existing table.

sqlite> create virtual table tiny_fts3 using fts4 (id, advertiser, line_item, campaign, impressions, clicks);

sqlite> insert into tiny_fts3(id, advertiser, line_item, campaign, impressions, clicks) select id, advertiser, line_item, campaign, impressions, clicks from tiny;

Let’s run a few text searches using the MATCH keyword to make sure it works:

sqlite> select * from tiny_fts3 where line_item match('Acme');

1,  A          , ACME Axe Co.                        , Midwest Lumberjack Competitors           ,1792,21
2, A          , ACME Axe Co.                        , Northwestern Logging Equipment Intenders ,4355,34

sqlite> select * from tiny_fts3 where line_item match('Candle');

9, E          , Best Soap & Candle                  , Clean Hands Stay Healthy                ,3883,41
10, E          , Best Soap & Candle                  , Clean Hands Make Good Neighbors          ,1292,183
11, E          , Best Soap & Candle                  , Bath Salt Relaxation                     ,902,81
12, E          , Best Soap & Candle                  , Bath Salt Relaxation (plus Candles!)     ,5352,212

Conclusion

That’s pretty much the whole show. Now go forth and become the master of all CSVs you survey!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s