{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matrix factorization is a rich area of machine learning research. Formulating a problem as a 2D matrix $X$ to be decomposed into multiple matrices, which combine to return an approximation of $X$, can lead to state-of-the-art results for many interesting problems. This core concept underlies compressive sensing, matrix completion, sparse coding, robust PCA, dictionary learning, and many other algorithms. One excellent catalog of matrix decomposition algorithms is the [Matrix Factorization Jungle, run by Igor Carron](https://sites.google.com/site/igorcarron2/matrixfactorizations). Recent algorithms have focused heavily on random projections, which often lead to increased stability and computationally efficient solutions.\n",
"\n",
"\n",
"Below is the GoDec algorithm's output for the \"Hall\" video found in [this zip file](https://sites.google.com/site/godecomposition/SSGoDec.zip?attredirects=0), a surveillance tape taken from a mall. Using the GoDec algorithm, the background is almost completely separated from the transient foreground of people walking, while periodic background elements are still captured as part of the background. I have written numpy implementations of both the GoDec and Robust PCA algorithms, based on their MATLAB equivalents. Many datasets can be found [here](http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html), and we will set up a simple download function for ease of access. Special thanks to [@kuantkid](https://github.com/kuantkid) for the PyRPCA repo, which was the inspiration to start and extend this work, and especially for the idea of creating a demo video from PNGs, which is *PRETTY. DANG. AWESOME*."
]
},
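{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the core idea concrete, here is a minimal sketch (my own illustration, separate from the algorithms below) of the simplest such decomposition: a truncated SVD, in which two small factors combine to approximate $X$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)\n",
"X = rng.randn(50, 5).dot(rng.randn(5, 40))  #a 50x40 matrix of rank 5\n",
"U, s, Vt = np.linalg.svd(X, full_matrices=False)\n",
"k = 5\n",
"X_approx = (U[:, :k] * s[:k]).dot(Vt[:k, :])  #recombine the truncated factors\n",
"print(np.allclose(X, X_approx))  #True: a rank-5 matrix is recovered exactly\n",
"```\n",
"\n",
"Choosing $k$ below the true rank instead yields the best rank-$k$ approximation of $X$, which is the sense in which the factorizations below separate structure from noise."
]
},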
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interstellar Overdrive\n",
"----------------------"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import YouTubeVideo\n",
"YouTubeVideo('JgfK46RA8XY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we want to download a video so that we can compare the algorithmic result against the original. The file is downloaded if it does not already exist in the working directory. Next, a directory of the same name is created and the file contents are unzipped into it (e.g. ShoppingMall.zip to ShoppingMall/*filename*)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Video file ShoppingMall.zip already downloaded, continuing\n"
]
}
],
"source": [
"try:\n",
" from urllib2 import urlopen\n",
"except ImportError:\n",
" from urllib.request import urlopen\n",
"from scipy.io import loadmat, savemat\n",
"import os\n",
"\n",
"ext = {\"water\":'WaterSurface.zip',\n",
" \"fountain\":'Fountain.zip',\n",
" \"campus\":'Campus.zip',\n",
" \"escalator\": 'Escalator.zip',\n",
" \"curtain\": 'Curtain.zip',\n",
" \"lobby\": 'Lobby.zip',\n",
" \"mall\": 'ShoppingMall.zip',\n",
" \"hall\": 'hall.zip',\n",
" \"bootstrap\": 'Bootstrap.zip'}\n",
"\n",
"example = \"mall\"\n",
"\n",
"\n",
"def progress_bar_downloader(url, fname, progress_update_every=5):\n",
" #from http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python/22776#22776\n",
" u = urlopen(url)\n",
" f = open(fname, 'wb')\n",
" meta = u.info()\n",
" file_size = int(meta.get(\"Content-Length\"))\n",
" print(\"Downloading: %s Bytes: %s\" % (fname, file_size))\n",
" file_size_dl = 0\n",
" block_sz = 8192\n",
" p = 0\n",
" while True:\n",
" buffer = u.read(block_sz)\n",
" if not buffer:\n",
" break\n",
" file_size_dl += len(buffer)\n",
" f.write(buffer)\n",
" if (file_size_dl * 100. / file_size) > p:\n",
" status = r\"%10d [%3.2f%%]\" % (file_size_dl, file_size_dl * 100. / file_size)\n",
" print(status)\n",
" p += progress_update_every\n",
" f.close()\n",
"\n",
"\n",
"def get_video_clip(d):\n",
" #Download files from http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html\n",
" if os.path.exists('./' + d):\n",
" print('Video file %s already downloaded, continuing' % d)\n",
" return\n",
" else:\n",
" print('Video file %s not found, downloading' % d)\n",
" progress_bar_downloader(r'http://perception.i2r.a-star.edu.sg/BK_Model_TestData/' + d, d)\n",
"\n",
" \n",
"def bname(x): return x.split('.')[0]\n",
" \n",
"get_video_clip(ext[example])\n",
"\n",
"if not os.path.exists('./' + bname(ext[example])):\n",
" os.makedirs(bname(ext[example]))\n",
" os.system('unzip ' + ext[example] + ' -d ' + bname(ext[example]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code below reads in all of the .bmp images downloaded and unzipped from the website, converting each to grayscale and scaling the result between 0 and 1. Eventually, I plan to do a \"full-color\" version of this testing, but for now grayscale will have to suffice."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(20480, 1286)\n",
"128\n",
"160\n"
]
}
],
"source": [
"from scipy import misc\n",
"import numpy as np\n",
"from glob import glob\n",
"\n",
"def rgb2gray(rgb):\n",
" r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]\n",
" gray = 0.2989 * r + 0.5870 * g + 0.1140 * b\n",
" return gray / 255.\n",
"\n",
"fdir = bname(ext[example])\n",
"names = sorted(glob(fdir + \"/*.bmp\"))\n",
"d1, d2, channels = misc.imread(names[0]).shape\n",
"#Downsample to a fixed size; note that imresize rescales its output to 0-255,\n",
"#so we divide afterward to keep X in the 0-1 range the algorithms expect\n",
"d1 = 128\n",
"d2 = 160\n",
"num = len(names)\n",
"X = np.zeros((d1, d2, num))\n",
"for n, i in enumerate(names):\n",
"    X[:, :, n] = misc.imresize(rgb2gray(misc.imread(i).astype(np.double)), (d1, d2)) / 255.\n",
" \n",
"X = X.reshape(d1 * d2, num)\n",
"clip = 100\n",
"print(X.shape)\n",
"print(d1)\n",
"print(d2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Robust PCA\n",
"----------\n",
"Robust Principal Component Analysis (PCA) is an extension of PCA. Rather than seeking a single low-rank approximation $L \\approx X$ (as PCA does, by keeping only the top singular vectors), Robust PCA solves the factorization problem $X = L + S$, where $L$ is a low-rank component and $S$ is a sparse component. By splitting the factorization into two matrix components, Robust PCA produces a much better low-rank estimate $L$ on many problems, since gross outliers are absorbed by $S$ instead of corrupting $L$.\n",
"\n",
"There are a [variety of algorithms](http://perception.csl.illinois.edu/matrix-rank/sample_code.html) to solve this optimization problem. The code below is an implementation of the Inexact Augmented Lagrange Multiplier (IALM) algorithm for Robust PCA, matching the equivalent [MATLAB code (download)](http://perception.csl.illinois.edu/matrix-rank/Files/inexact_alm_rpca.zip) as closely as I could make it. The functionality seems equivalent; for the details, see the [paper](http://perception.csl.illinois.edu/matrix-rank/Files/Lin09-MP.pdf). This algorithm was chosen because, according to the [timing results at the bottom of this page](http://perception.csl.illinois.edu/matrix-rank/sample_code.html), it was both the fastest and most accurate of the algorithms listed. Though it appears fairly slow in our testing, this is likely an implementation issue, since the code has not been specifically optimized for numpy. Due to this limitation, we clip the video to the first 100 frames to save time."
]
},
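{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sparse update at the heart of IALM is elementwise soft-thresholding (shrinkage); the `Eupdate` line in the implementation below computes an equivalent expression with threshold $\\lambda / \\mu$. A minimal standalone sketch:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def shrink(M, tau):\n",
"    #Move every entry of M toward zero by tau, zeroing entries smaller than tau\n",
"    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)\n",
"\n",
"print(shrink(np.array([-2.0, -0.5, 0.2, 3.0]), 1.0))  #entries become -1, 0, 0, 2\n",
"```"
]
},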
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from numpy.linalg import norm, svd\n",
"\n",
"def inexact_augmented_lagrange_multiplier(X, lmbda=.01, tol=1e-3,\n",
" maxiter=100, verbose=True):\n",
" \"\"\"\n",
" Inexact Augmented Lagrange Multiplier\n",
" \"\"\"\n",
" Y = X\n",
" norm_two = norm(Y.ravel(), 2)\n",
" norm_inf = norm(Y.ravel(), np.inf) / lmbda\n",
" dual_norm = np.max([norm_two, norm_inf])\n",
" Y = Y / dual_norm\n",
" A = np.zeros(Y.shape)\n",
" E = np.zeros(Y.shape)\n",
" dnorm = norm(X, 'fro')\n",
" mu = 1.25 / norm_two\n",
" rho = 1.5\n",
" sv = 10.\n",
" n = Y.shape[0]\n",
" itr = 0\n",
" while True:\n",
" Eraw = X - A + (1 / mu) * Y\n",
" Eupdate = np.maximum(Eraw - lmbda / mu, 0) + np.minimum(Eraw + lmbda / mu, 0)\n",
" U, S, V = svd(X - Eupdate + (1 / mu) * Y, full_matrices=False)\n",
"        svp = np.count_nonzero(S > 1 / mu)  #number of singular values above the threshold\n",
" if svp < sv:\n",
" sv = np.min([svp + 1, n])\n",
" else:\n",
" sv = np.min([svp + round(.05 * n), n])\n",
" Aupdate = np.dot(np.dot(U[:, :svp], np.diag(S[:svp] - 1 / mu)), V[:svp, :])\n",
" A = Aupdate\n",
" E = Eupdate\n",
" Z = X - A - E\n",
" Y = Y + mu * Z\n",
"        mu = np.min([mu * rho, 1e7 * (1.25 / norm_two)])  #cap mu at 1e7 times its initial value\n",
" itr += 1\n",
" if ((norm(Z, 'fro') / dnorm) < tol) or (itr >= maxiter):\n",
" break\n",
" if verbose:\n",
" print(\"Finished at iteration %d\" % (itr)) \n",
" return A, E"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished at iteration 18\n",
"RPCA complete\n"
]
}
],
"source": [
"sz = clip\n",
"A, E = inexact_augmented_lagrange_multiplier(X[:, :sz])\n",
"A = A.reshape(d1, d2, sz) * 255.\n",
"E = E.reshape(d1, d2, sz) * 255.\n",
"#Refer to them by position desired for video demo later \n",
"savemat(\"./IALM_background_subtraction.mat\", {\"1\": A, \"2\": E})\n",
"print(\"RPCA complete\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"GoDec\n",
"-----\n",
"\n",
"The code below contains an implementation of the GoDec algorithm, which attempts to solve the problem $X = L + S + G$, with $L$ low-rank, $S$ sparse, and $G$ a component of Gaussian noise. By allowing the decomposition to expand to three matrix components, the algorithm can more effectively separate the sparse component from the low-rank one."
]
},
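{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than computing a full SVD each iteration, GoDec estimates the low-rank component with random projections: multiply by a thin random matrix, orthonormalize, and project back. A minimal sketch of that idea (my own example; the loop below performs the analogous `np.dot(np.dot(L, Q), Q.T)` step):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.linalg import qr\n",
"\n",
"rng = np.random.RandomState(0)\n",
"L_true = rng.randn(100, 3).dot(rng.randn(3, 80))  #a 100x80 matrix of rank 3\n",
"Y = L_true.dot(rng.randn(80, 3))  #random projection spans range(L_true)\n",
"Q, _ = qr(Y, mode='economic')  #orthonormal basis for that range\n",
"L_est = Q.dot(Q.T).dot(L_true)  #project back onto the basis\n",
"print(np.allclose(L_true, L_est))  #True for an exactly rank-3 matrix\n",
"```"
]
},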
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from numpy.linalg import norm\n",
"from scipy.linalg import qr\n",
"\n",
"def wthresh(a, thresh):\n",
" #Soft wavelet threshold\n",
" res = np.abs(a) - thresh\n",
" return np.sign(a) * ((res > 0) * res)\n",
"\n",
"#Default threshold of .03 is assumed to be for input in the range 0-1...\n",
"#original matlab had 8 out of 255, which is about .03 scaled to 0-1 range\n",
"def go_dec(X, thresh=.03, rank=2, power=0, tol=1e-3,\n",
" max_iter=100, random_seed=0, verbose=True):\n",
" m, n = X.shape\n",
" if m < n:\n",
" X = X.T\n",
" m, n = X.shape\n",
" L = X\n",
" S = np.zeros(L.shape)\n",
" itr = 0\n",
" random_state = np.random.RandomState(random_seed) \n",
" while True:\n",
" Y2 = random_state.randn(n, rank)\n",
" for i in range(power + 1):\n",
" Y1 = np.dot(L, Y2)\n",
"            Y2 = np.dot(L.T, Y1)\n",
" Q, R = qr(Y2, mode='economic')\n",
" L_new = np.dot(np.dot(L, Q), Q.T)\n",
" T = L - L_new + S\n",
" L = L_new\n",
" S = wthresh(T, thresh)\n",
" T -= S\n",
" err = norm(T.ravel(), 2)\n",
" if (err < tol) or (itr >= max_iter):\n",
" break \n",
" L += T\n",
" itr += 1\n",
" #Is this even useful in soft GoDec? May be a display issue...\n",
" G = X - L - S\n",
" if m < n:\n",
" L = L.T\n",
" S = S.T\n",
" G = G.T\n",
" if verbose:\n",
" print(\"Finished at iteration %d\" % (itr))\n",
" return L, S, G"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished at iteration 100\n",
"GoDec complete\n"
]
}
],
"source": [
"sz = clip\n",
"L, S, G = go_dec(X[:, :sz])\n",
"L = L.reshape(d1, d2, sz) * 255.\n",
"S = S.reshape(d1, d2, sz) * 255.\n",
"G = G.reshape(d1, d2, sz) * 255.\n",
"savemat(\"./GoDec_background_subtraction.mat\", {\"1\": L, \"2\": S, \"3\": G, })\n",
"print(\"GoDec complete\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A Momentary Lapse of Reason\n",
"---------------------------\n",
"\n",
"Now it is time to do something a little unreasonable - we can actually take all of this data, reshape it into a series of images, and plot it as a video inside the IPython notebook! The first step is to generate the frames for the video as .png files, as shown below."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Completed frame 0 of 100 for method IALM\n",
"Completed frame 25 of 100 for method IALM\n",
"Completed frame 50 of 100 for method IALM\n",
"Completed frame 75 of 100 for method IALM\n",
"Completed frame 0 of 100 for method GoDec\n",
"Completed frame 25 of 100 for method GoDec\n",
"Completed frame 50 of 100 for method GoDec\n",
"Completed frame 75 of 100 for method GoDec\n"
]
}
],
"source": [
"import os\n",
"import sys\n",
"import matplotlib.pyplot as plt\n",
"from scipy.io import loadmat\n",
"import numpy as np\n",
"from matplotlib import cm\n",
"import matplotlib\n",
"\n",
"#demo inspired by / stolen from @kuantkid on Github - nice work!\n",
"def mlabdefaults():\n",
" matplotlib.rcParams['lines.linewidth'] = 1.5\n",
" matplotlib.rcParams['savefig.dpi'] = 300 \n",
" matplotlib.rcParams['font.size'] = 22\n",
" matplotlib.rcParams['font.family'] = \"Times New Roman\"\n",
" matplotlib.rcParams['legend.fontsize'] = \"small\"\n",
" matplotlib.rcParams['legend.fancybox'] = True\n",
" matplotlib.rcParams['lines.markersize'] = 10\n",
" matplotlib.rcParams['figure.figsize'] = 8, 5.6\n",
" matplotlib.rcParams['legend.labelspacing'] = 0.1\n",
" matplotlib.rcParams['legend.borderpad'] = 0.1\n",
" matplotlib.rcParams['legend.borderaxespad'] = 0.2\n",
" matplotlib.rcParams['font.monospace'] = \"Courier New\"\n",
" matplotlib.rcParams['savefig.dpi'] = 200\n",
" \n",
"def make_video(alg, cache_path='/tmp/matrix_dec_tmp'):\n",
" name = alg\n",
" if not os.path.exists(cache_path):\n",
" os.mkdir(cache_path)\n",
"    #Cache generated frames under cache_path so reruns can skip this work\n",
" if not os.path.exists('%s/%s_tmp'%(cache_path, name)):\n",
" os.mkdir(\"%s/%s_tmp\"%(cache_path, name))\n",
" mat = loadmat('./%s_background_subtraction.mat'%(name))\n",
" org = X.reshape(d1, d2, X.shape[1]) * 255.\n",
" fig = plt.figure()\n",
" ax = fig.add_subplot(111)\n",
" usable = [x for x in sorted(mat.keys()) if \"_\" not in x][0]\n",
" sz = min(org.shape[2], mat[usable].shape[2])\n",
" for i in range(sz):\n",
" ax.cla()\n",
" ax.axis(\"off\")\n",
" ax.imshow(np.hstack([mat[x][:, :, i] for x in sorted(mat.keys()) if \"_\" not in x] + \\\n",
" [org[:, :, i]]), cm.gray)\n",
" fname_ = '%s/%s_tmp/_tmp%03d.png'%(cache_path, name, i)\n",
" if (i % 25) == 0:\n",
" print('Completed frame', i, 'of', sz, 'for method', name)\n",
" fig.tight_layout()\n",
" fig.savefig(fname_, bbox_inches=\"tight\")\n",
"    #Write out an mp4 and webm video from the png files. -r 10 means 10 frames per second\n",
"    #libx264 is h.264 encoding; -s sets the output frame size\n",
"    #You may need to install an ffmpeg build with libx264 and libvpx support\n",
" plt.close()\n",
"\n",
"    num_arrays = na = len([x for x in mat.keys() if \"_\" not in x])\n",
"    #Each frame hstacks na component panels plus the original, each d2 wide and d1 tall\n",
"    cdims = ((na + 1) * d2, d1)\n",
"    cmd_h264 = \"ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libx264 \" % (cache_path, name) + \\\n",
"               \"-s %dx%d -preset ultrafast -pix_fmt yuv420p %s_animation.mp4\" % (cdims[0], cdims[1], name)\n",
"    cmd_vp8 = \"ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libvpx \" % (cache_path, name) + \\\n",
"              \"-s %dx%d -pix_fmt yuv420p %s_animation.webm\" % (cdims[0], cdims[1], name)\n",
" os.system(cmd_h264)\n",
" os.system(cmd_vp8)\n",
" \n",
"if __name__ == \"__main__\":\n",
" mlabdefaults()\n",
" all_methods = ['IALM', 'GoDec']\n",
" for name in all_methods:\n",
"        make_video(name)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Background is generated from this file: mall\n"
]
}
],
"source": [
"print(\"Background is generated from this file:\", example)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Echoes\n",
"------\n",
"\n",
"The code below will display HTML5 video for each of the videos generated in the previous step and embed it in the IPython notebook. There are \"echoes\" of people, which are much more pronounced in the Robust PCA video than in the GoDec version, likely because GoDec's independent Gaussian term gives it extra flexibility. Overall the effect is pretty cool, even if the Robust PCA result is not mathematically as clean as the GoDec one."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from IPython.display import HTML\n",
"from base64 import b64encode\n",
"\n",
"def html5_video(alg, frames):\n",
" #This *should* support all browsers...\n",
" framesz = 250\n",
" info = {\"mp4\": {\"ext\":\"mp4\", \"encoded\": '', \"size\":(frames * framesz, framesz)}}\n",
" html_output = []\n",
" for k in info.keys():\n",
" f = open(\"%s_animation.%s\" % (alg, info[k][\"ext\"]), \"rb\").read()\n",
" encoded = b64encode(f).decode('ascii')\n",
"        video_tag = '<video width=\"%d\" height=\"%d\" controls autoplay loop><source src=\"data:video/%s;base64,%s\" type=\"video/%s\" /></video>' % (info[k][\"size\"][0], info[k][\"size\"][1], k, encoded, k)\n",
" html_output.append(video_tag)\n",
" return HTML(data=''.join(html_output))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If these videos freeze for some reason, just hit refresh and they should start playing."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"html5_video(\"IALM\", 3)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"html5_video(\"GoDec\", 4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Final Cut\n",
"-------------\n",
"\n",
"Though the results are definitely not identical to the benchmark videos, they are similar enough that I am satisfied for now. Future work in this area will involve more decomposition algorithms, dictionary learning, and matrix completion. Eventually, I would like to get this into scikit-learn format and post it as a gist or contribute it to the codebase."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}