Exploring kdb+ data in IPython notebooks

Alexander Belopolsky

Enlightenment Research, LLC

What is kdb+?


  • Streaming, real-time and historical data in one platform
  • Runs on Mac OS X, Linux, Solaris, and Windows
  • Runs on commodity hardware
  • Columnar store for optimal search and join performance
  • Very fast bulk loading of flat files
  • Compression (WebSockets, IPC and on-disk)
  • Multi-Core / Multi-Proc / Multi-Threading / Multi-Server
  • Comes with Q – array, functional, query and time-series language

In [1]:
%load_ext pyq.magic

The Q programming language

(comes with kdb+)

A list of conforming dictionaries is a table

In [2]:
%%q
(`time`price!(09:31;53.12);`time`price!(09:33;53.13);
 `time`price!(09:34;53.13);`time`prIce!(09:39;53.11))
`time`price!(09:31;53.12)
`time`price!(09:33;53.13)
`time`price!(09:34;53.13)
`time`prIce!(09:39;53.11)
In [3]:
%%q
(`time`price!(09:31;53.12);`time`price!(09:33;53.13);
 `time`price!(09:34;53.13);`time`price!(09:39;53.11))
time  price
-----------
09:31 53.12
09:33 53.13
09:34 53.13
09:39 53.11
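
The two cells above differ only in the misspelled `prIce` key, which is what keeps the first list from collapsing into a table. As a rough pure-Python analogy (not PyQ or kdb+ code), the sketch below assumes "conforming" means that every dictionary has the same key list; the helper name `is_table` is hypothetical.

```python
from datetime import time


def is_table(rows):
    """Return True if all dicts share one key list (hypothetical helper)."""
    if not rows:
        return False
    first = list(rows[0])
    return all(list(r) == first for r in rows)


conforming = [
    {"time": time(9, 31), "price": 53.12},
    {"time": time(9, 33), "price": 53.13},
    {"time": time(9, 34), "price": 53.13},
    {"time": time(9, 39), "price": 53.11},
]
# Same data, but the last key is misspelled as in cell In [2].
broken = conforming[:3] + [{"time": time(9, 39), "prIce": 53.11}]

print(is_table(conforming))
print(is_table(broken))
```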

Tables are first-class objects in Q

In [4]:
%%q
p:([]time:09:31 09:33 09:34 09:39;price:53.12 53.13 53.13 53.11)
In [5]:
q.p[2]
Out[5]:
time | 09:34
price| 53.13
In [6]:
%%q
`time xkey p
time | price
-----| -----
09:31| 53.12
09:33| 53.13
09:34| 53.13
09:39| 53.11

The Q programming language

(comes with kdb+)

A table is a flip of a dictionary

In [7]:
%%q
d:`time`price!(09:31 09:33 09:34 09:39;53.12 53.13 53.13 53.11); d
time | 09:31 09:33 09:34 09:39
price| 53.12 53.13 53.13 53.11
In [8]:
%q flip d  / "flip" means transpose
Out[8]:
time  price
-----------
09:31 53.12
09:33 53.13
09:34 53.13
09:39 53.11
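
For readers who think in Python, the flip can be sketched with plain dicts and `zip`. This is only an analogy: the hypothetical `flip` helper below materializes row dictionaries, whereas the point of the slide is that Q treats the column dictionary itself as the table.

```python
def flip(columns):
    """Transpose a dict of equal-length columns into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Times are kept as strings here purely for illustration.
d = {
    "time": ["09:31", "09:33", "09:34", "09:39"],
    "price": [53.12, 53.13, 53.13, 53.11],
}
rows = flip(d)
print(rows[2])
```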

What is PyQ?

World's fastest kdb+ to Python converter

In [9]:
%%q
a:til 100
b:til 10000
c:til 1000000
In [10]:
%timeit x = q.a
1000000 loops, best of 3: 1.72 µs per loop
In [11]:
%timeit x = q.b
1000000 loops, best of 3: 1.72 µs per loop
In [12]:
%timeit x = q.c
1000000 loops, best of 3: 1.72 µs per loop

What is PyQ?

World's fastest kdb+ to numpy converter

In [13]:
from numpy import asarray
In [14]:
%timeit x = asarray(q.a) 
100000 loops, best of 3: 3.2 µs per loop
In [15]:
%timeit x = asarray(q.b) 
100000 loops, best of 3: 3.3 µs per loop
In [16]:
%timeit x = asarray(q.c) 
100000 loops, best of 3: 3.21 µs per loop
In [17]:
x = asarray(q.b)
x.reshape((10, 10, 10, 10))[2,3].diagonal()[:5]
Out[17]:
array([2300, 2311, 2322, 2333, 2344], dtype=int64)

Where's the rub?

(There is no conversion)

In [18]:
c = %q til 10000
x = asarray(c)
In [19]:
print('before:', x[1232:1237], c[1234])
x.reshape((10, 10, 10, 10))[1,2,3,4] = 42
print(' after:', x[1232:1237], c[1234])
before: [1232 1233 1234 1235 1236] 1234
 after: [1232 1233   42 1235 1236] 42
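
The same sharing behavior can be reproduced with the standard library alone. In the sketch below an `array.array` stands in for the kdb+ vector and a `memoryview` stands in for the numpy array (both are assumptions for illustration, not what PyQ uses internally). Creating the view copies nothing, which is consistent with the flat timings above, and a write through the view is visible in the original.

```python
from array import array

c = array("q", range(10000))   # stand-in for the kdb+ vector (64-bit ints)
x = memoryview(c)              # a view over the same buffer, no copy

before = (x[1234], c[1234])
x[1234] = 42                   # write through the view
after = (x[1234], c[1234])

print("before:", before)
print(" after:", after)
```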

PyQ – Python language for kdb+

(Put your Python where your Data is)

When you use PyQ, your Q code may look like Python

(Look Boss, no Q!)

In [20]:
from datetime import time
rnd = q('?'); dct = q('!'); n = 1000; nrnd = rnd(n)
In [21]:
sym = nrnd('1').upper
prc = nrnd(10.)
tme = time(9, 30) + nrnd(time(6, 30))
price = dct(['sym', 'time', 'price'], [sym, tme, prc]).flip.asc
In [22]:
price.show(geometry=(10, 40))
sym time         price   
-------------------------
A   09:34:59.757 6.263786
A   10:21:12.507 8.788524
A   10:30:49.711 9.01325 
A   10:31:15.960 5.898603
A   10:31:56.146 6.44946 
A   10:31:57.349 5.252813
A   10:33:49.597 2.861031
..

... or like Q

(Look Boss, no Python!)

In [23]:
%%q
nrnd:1000?
Price:asc flip`sym`time`price!
 (upper nrnd`1;09:30t+nrnd 06:30t;nrnd 10f)
In [24]:
%q 8#Price
Out[24]:
sym time         price    
--------------------------
A   09:32:04.768 0.6438468
A   09:34:44.866 9.111832 
A   09:34:53.651 0.1672449
A   09:55:43.080 7.434591 
A   10:09:41.257 1.219494 
A   10:16:02.087 6.915277 
A   10:19:39.071 5.303752 
A   10:21:53.287 0.445254 

... or like a mess

(Look Boss, I am done!)

In [25]:
price = q("asc flip`sym`time`price!", [sym, tme, prc])
price.first
Out[25]:
sym  | `A
time | 09:34:59.757
price| 6.263786

For best results, don't mix Q and Python

(except in an IPython notebook)

In [26]:
get_price = %q {[s;t]exec last price from Price where sym=s,time<t}
In [27]:
get_closing_price = get_price(t=time(16))
get_closing_price('A')
Out[27]:
1.719017
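
Calling `get_price(t=time(16))` with only one of its two arguments fixes `t` and returns a function that still expects `s`. A loose pure-Python analogy is `functools.partial`. The sample rows and the Python `get_price` below are made up for illustration and are not the PyQ objects from the cells above.

```python
from datetime import time
from functools import partial

# Made-up rows (sym, time, price), sorted by time within each symbol.
PRICE = [
    ("A", time(9, 35), 6.26),
    ("A", time(15, 59), 1.72),
    ("A", time(16, 5), 9.99),
    ("B", time(10, 0), 3.50),
]


def get_price(s, t):
    """Last price for symbol s strictly before time t, or None."""
    matches = [p for sym, tm, p in PRICE if sym == s and tm < t]
    return matches[-1] if matches else None


# Fix t now, supply s later.
get_closing_price = partial(get_price, t=time(16))
print(get_closing_price("A"))
```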

PyQ plays well with others

(Did I hear "memoryview"?)

  • numpy

  • matplotlib

  • pandas

  • bokeh

  • ipython

matplotlib

(just plot it)

In [28]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
In [29]:
%%q
n:1000;m:30;pi:acos -1
z:reciprocal[n]*2*pi*til n
x:(r:z*sin z)*sin m*z
y:r*cos m*z
In [30]:
from mpl_toolkits.mplot3d import axes3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(q.x, q.y, q.z);

numpy

In [31]:
import numpy
q.n = n = 1000
q.noise = numpy.random.normal(size=n**2)
In [32]:
M = numpy.reshape(q.noise, (n, -1))
M += M.T.copy()
In [33]:
plt.matshow(M[:10,:10]);

the data stay in kdb+

In [34]:
%%q
m:noise (n*i)+/:i:til 5; m
-4.931556  -1.109778   1.242104    1.108081   -0.1486963
-1.109778  -2.722396   -0.07218326 -0.9140662 0.2160938 
1.242104   -0.07218326 0.7140347   0.6313393  1.60373   
1.108081   -0.9140662  0.6313393   -2.684218  -1.563344 
-0.1486963 0.2160938   1.60373     -1.563344  1.005066  
In [35]:
q.m == q.m.flip
Out[35]:
True
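
As I read it, `noise (n*i)+/:i:til 5` indexes the flat vector at positions of the form `n*r + c`, which picks out the top-left 5-by-5 corner of the row-major `n`-by-`n` matrix. The sketch below repeats that index arithmetic in pure Python on a tiny made-up symmetric matrix (the sizes and values are assumptions, not the random data above) and checks that the corner equals its own transpose.

```python
n, k = 6, 3

# Flat row-major storage of a symmetric n-by-n matrix (made-up values).
flat = [(r + 1) * (c + 1) for r in range(n) for c in range(n)]

# Position n*r + c holds element (r, c); take the top-left k-by-k corner.
block = [[flat[n * r + c] for r in range(k)] for c in range(k)]
transposed = [list(col) for col in zip(*block)]

print(block)
print(block == transposed)
```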

bokeh

In [36]:
e = numpy.linalg.eigvalsh(M)
In [37]:
#%load_ext bokeh_magic
from bokeh.plotting import output_notebook
output_notebook(force=True)
#%bokeh -n
# from bokeh import load_notebook
# load_notebook(force=True)