{"id":1611,"date":"2020-09-16T14:28:31","date_gmt":"2020-09-16T14:28:31","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/16\/autograd-the-best-machine-learning-library-youre-not-using\/"},"modified":"2020-09-16T14:28:31","modified_gmt":"2020-09-16T14:28:31","slug":"autograd-the-best-machine-learning-library-youre-not-using","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/16\/autograd-the-best-machine-learning-library-youre-not-using\/","title":{"rendered":"Autograd: The Best Machine Learning Library You\u2019re Not Using?"},"content":{"rendered":"<div id=\"post-\">\n<h3><strong>Autograd: The Missing Machine Learning Library<\/strong><\/h3>\n<p>\u00a0<\/p>\n<h3>Wait, people use libraries other than TensorFlow and PyTorch?<\/h3>\n<p>\u00a0<br \/>Ask a group of deep learning practitioners for their programming language of choice and you\u2019ll undoubtedly hear a lot about Python. Ask about their go-to machine learning library, on the other hand, and you\u2019re likely to get a picture of a two-library ecosystem split between TensorFlow and PyTorch. While there are plenty of people who are familiar with both, in general commercial applications in machine learning (ML) tend to be dominated by the use of TensorFlow, while research projects in artificial intelligence\/ML\u00a0<a href=\"https:\/\/thegradient.pub\/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry\/\" rel=\"noopener noreferrer\" target=\"_blank\">mostly use PyTorch<\/a>. 
Although there\u2019s significant convergence between the two libraries with the introduction of eager execution by default in\u00a0<a href=\"https:\/\/blog.exxactcorp.com\/tensorflow-2-0-dynamic-readable-and-highly-extended\/\" rel=\"noopener noreferrer\" target=\"_blank\">TensorFlow 2.0<\/a>\u00a0<a href=\"https:\/\/blog.tensorflow.org\/2019\/09\/tensorflow-20-is-now-available.html\" rel=\"noopener noreferrer\" target=\"_blank\">released last year<\/a>, and the availability of building static executable models using\u00a0<a href=\"https:\/\/pytorch.org\/docs\/master\/jit.html\" rel=\"noopener noreferrer\" target=\"_blank\">Torchscript<\/a>, most practitioners still stick to one or the other.<\/p>\n<p>While the general consensus seems to be that you should pick TensorFlow for its better deployment and edge support if you want to join a company, and PyTorch for flexibility and readability if you want to work in academic research, there\u2019s more to the world of AI\/ML libraries than just PyTorch and TensorFlow, just as there\u2019s more to AI\/ML than deep learning. In fact, the gradients and tensor computations powering deep learning promise to have a wide-ranging impact in fields ranging from physics to biology. 
While we would bet that the so-called shortage of ML\/AI researchers is exaggerated (and who wants to dedicate their most creative years to\u00a0<a href=\"https:\/\/www.fastcompany.com\/3008436\/why-data-god-jeffrey-hammerbacher-left-facebook-found-cloudera\" rel=\"noopener noreferrer\" target=\"_blank\">maximizing ad engagement<\/a>\u00a0and recommending more addictive newsfeeds?), we expect that the tools of differentiable programming will be increasingly valuable to a wide variety of professionals for the foreseeable future.<\/p>\n<p>\u00a0<\/p>\n<h3>Differentiable Computing is Bigger than Deep Learning<\/h3>\n<p>\u00a0<br \/>Deep learning, the use of many-layered artificial neural networks very loosely based on ideas about computation in mammalian brains, is well known for its impacts on fields like computer vision and natural language processing. We\u2019ve also seen that many of the lessons in hardware and software developed alongside deep learning in the past decade (gradient descent, function approximation, and accelerated tensor computations) have found interesting applications in the absence of neural networks.<\/p>\n<p>Automatic differentiation and\u00a0<a href=\"https:\/\/pennylane.ai\/qml\/demos\/tutorial_qubit_rotation.html\" rel=\"noopener noreferrer\" target=\"_blank\">gradient descent over the parameters of quantum circuits<\/a>\u00a0offers meaningful utility for quantum computing in the era of Noisy Intermediate-Scale Quantum (NISQ) computing devices (<i>i.e.<\/i>\u00a0quantum computing devices that are available now). The penultimate step in\u00a0<a href=\"https:\/\/blog.exxactcorp.com\/deepminds-protein-folding-upset\/\" rel=\"noopener noreferrer\" target=\"_blank\">DeepMind\u2019s impressive upset at the CASP13<\/a>\u00a0protein folding prediction conference and competition used gradient descent applied directly over predicted amino acid positions, rather than the deep neural networks the Google Alphabet subsidiary is well known for. 
These are just a few examples of the power of differentiable programming unbound by the paradigm of artificial neurons.<\/p>\n<p><img alt=\"Deep learning can be categorized as a subspace of the more general differentiable programming\" class=\"aligncenter\" src=\"https:\/\/blog.exxactcorp.com\/wp-content\/uploads\/2020\/07\/deep_diff-e1594151312234-1024x489.jpg\" width=\"100%\"><br \/><i>Deep learning can be categorized as a subspace of the more general differentiable programming. Deep neuroevolution refers to the optimization of neural networks by selection, without explicit differentiation or gradient descent.<\/i><br \/>\u00a0<\/p>\n<p>Differentiable programming is a broader programming paradigm that encompasses most of deep learning, excepting gradient-free optimization methods such as neuroevolution\/evolutionary algorithms. Yann LeCun, Chief AI Scientist at Facebook, touted the possibilities of differentiable programming in a\u00a0<a href=\"https:\/\/www.facebook.com\/yann.lecun\/posts\/10155003011462143?_fb_noscript=1\" rel=\"noopener noreferrer\" target=\"_blank\">Facebook post<\/a>\u00a0(content\u00a0<a href=\"https:\/\/gist.github.com\/halhenke\/872708ccea42ee8cafd950c6c2069814\" rel=\"noopener noreferrer\" target=\"_blank\">mirrored in a Github gist<\/a>). To hear LeCun tell it, differentiable programming is little more than a rebranding of modern deep learning, incorporating dynamic definitions of neural networks with loops and conditionals.<\/p>\n<p>I would argue that the consequences of widespread adoption of differentiable programming are closer to what Andrej Karpathy describes as\u00a0<a href=\"https:\/\/medium.com\/@karpathy\/software-2-0-a64152b37c35\" rel=\"noopener noreferrer\" target=\"_blank\">\u201cSoftware 2.0\u201d<\/a>, although he also limits his discussion largely to neural networks. It\u2019s reasonable to argue that software 2.0\/differentiable programming is, in its entirety, a broader paradigm than either LeCun or Karpathy described. 
Differentiable programming represents a generalization beyond the constraint of neural networks as function approximators to facilitate gradient-based optimization algorithms for a wide range of systems. If there is a Python library that is emblematic of the simplicity, flexibility, and utility of differentiable programming, it has to be Autograd.<\/p>\n<p>\u00a0<\/p>\n<h3>Combining Deep Learning with Differentiable Programming<\/h3>\n<p>\u00a0<br \/>Differentiating with respect to arbitrary physical simulations and mathematical primitives presents opportunities for solutions where deep neural networks are inefficient or ineffective. That\u2019s not to say you should throw away all your deep learning intuition and experience. Rather, the most impressive solutions will combine elements of deep learning with the broader capabilities of differentiable programming, such as the work of\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1611.01652\" rel=\"noopener noreferrer\" target=\"_blank\">Degrave et al. 2018<\/a>, whose authors combined a differentiable physics engine with a neural network controller to solve robotic control tasks.<\/p>\n<p>Essentially, they extended the differentiable parts of the environment beyond the neural network to include simulated robot kinematics. They could then backpropagate through the parameters of the robot environment into the neural network policy, speeding up the optimization process by about 6x to 8x in terms of sample efficiency. They chose to use\u00a0<a href=\"http:\/\/deeplearning.net\/software\/theano\/\" rel=\"noopener noreferrer\" target=\"_blank\">Theano<\/a>\u00a0as their automatic differentiation library, which prevented them from differentiating through conditional statements, limiting the types of contact constraints they could implement. 
A differentiable physics simulator built with Autograd or even recent versions of PyTorch or TensorFlow 2.0, which support differentiating through dynamic branching, would have even more possibilities for optimizing a neural network robot controller,\u00a0<i>e.g.<\/i>\u00a0offering more realistic collision detection.<\/p>\n<p>The\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Universal_approximation_theorem\" rel=\"noopener noreferrer\" target=\"_blank\">universal approximation power<\/a>\u00a0of deep neural networks makes them an incredible tool for problems in science, control, and data science, but sometimes this flexibility is more liability than utility, as anyone who has ever struggled with over-fitting can attest. As a famous quote from John von Neumann puts it: \u201cWith four parameters I can fit an elephant, and with five I can make him wiggle his trunk.\u201d (an actual demonstration of this concept can be found in \u201cDrawing an elephant with 4 complex parameters\u201d by Mayer\u00a0<i>et al.<\/i>\u00a0[<a href=\"https:\/\/publications.mpi-cbg.de\/getDocument.html?id=ff8080812daff75c012dc1b7bc10000c\" rel=\"noopener noreferrer\" target=\"_blank\">pdf<\/a>]).<\/p>\n<p>In modern machine learning practice, that means being careful not to mismatch your model to your dataset, a feat that for small datasets is all too easy to stumble into. In other words, a big conv-net is likely to be overkill for many bespoke datasets with only a few hundred to a few thousand samples. In many physics problems, for example, it will be better to describe your problem mathematically and run gradient descent over the free parameters. 
Autograd is a Python package well suited to this approach, especially for Pythonically inclined mathematicians, physicists, and others who are well-practiced at describing problems at a low level with the Python matrix and array computation package NumPy.<\/p>\n<p>\u00a0<\/p>\n<h3>Autograd: Anything you can NumPy, you can differentiate<\/h3>\n<p>\u00a0<br \/>Here\u2019s a simple example of what Autograd can do:<\/p>\n<div>\n<pre>import autograd.numpy as np\r\nfrom autograd import elementwise_grad as egrad\r\n\r\nimport matplotlib.pyplot as plt\r\n\r\nx = np.linspace(-31.4,31.4, 256)\r\n\r\nsinc = lambda x: np.sin(x) \/ x\r\n\r\nplt.figure(figsize=(12,7))\r\n\r\nplt.title('sinc function and derivatives', fontsize=24)\r\n\r\nmy_fn = sinc\r\n\r\nfor ii in range(9):\r\n\r\n    plt.plot(x, my_fn(x), lw=3, label='d{} sinc(x)\/dx{}'.format(ii,ii))\r\n\r\n    plt.legend(fontsize=18)\r\n\r\n    plt.axis([-32, 32, -0.50, 1.2])\r\n\r\n    plt.savefig('.\/sinc_grad{}.png'.format(ii))\r\n\r\n    my_fn = egrad(my_fn) <\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img alt=\"Differentiation with Autograd\" class=\"aligncenter\" src=\"https:\/\/blog.exxactcorp.com\/wp-content\/uploads\/2020\/07\/sinc_grad8-1024x597.png\" width=\"100%\"><br \/><i>Differentiation with Autograd. In this case Autograd was able to differentiate up to the 7th derivative before running into some numerical stability problems around x=0 (note the sharp olive green spike in the center of the figure).<\/i><br \/>\u00a0<\/p>\n<p>Autograd is a powerful automatic differentiation library that makes it possible to differentiate native Python and NumPy code. Derivatives can be computed to an arbitrary order (you can take derivatives of derivatives of derivatives, and so on), and assigned to multiple arrays of parameters so long as the final output is a scalar (e.g. a loss function). 
The resulting code is\u00a0<a href=\"https:\/\/stackoverflow.com\/questions\/25011078\/what-does-pythonic-mean\" rel=\"noopener noreferrer\" target=\"_blank\">Pythonic<\/a>, that is, readable and maintainable, and it doesn\u2019t require learning new syntax or style. That means we don\u2019t have to worry about memorizing complex APIs like the contents of\u00a0torch.nn\u00a0or\u00a0tf.keras.layers, and we can concentrate on the details of our problem,\u00a0<i>e.g.<\/i>\u00a0translating mathematics into code. Autograd+NumPy is a mature library that is maintained but no longer developed, so there\u2019s no real danger of future updates breaking your project.<\/p>\n<p>You\u00a0<i>can<\/i>\u00a0implement a neural network easily with Autograd, as the mathematical primitives of dense neural layers (matrix multiplication) and convolution (you can easily use Fourier transforms for this, or use\u00a0convolve2d\u00a0from scipy) have relatively fast implementations in NumPy. To try out a simple MLP demonstration on scikit-learn\u2019s diminutive digits dataset, download this\u00a0<a href=\"https:\/\/gist.github.com\/riveSunder\/1223824a4fb7e6831f20fde3b4871354\" rel=\"noopener noreferrer\" target=\"_blank\">Github gist<\/a> (you may also be interested in studying the\u00a0<a href=\"https:\/\/github.com\/HIPS\/autograd\/blob\/master\/examples\/neural_net.py\" rel=\"noopener noreferrer\" target=\"_blank\">official example<\/a>\u00a0in the autograd repository).<\/p>\n<p>If you copy the gist and run it in a local virtual environment you\u2019ll need to\u00a0pip install\u00a0both\u00a0autograd\u00a0and\u00a0scikit-learn, the latter for its digits dataset. 
Once all set up, running the code should yield progress reports like the following:<\/p>\n<div>\n<pre>epoch 10, training loss 2.89e+02, train acc: 5.64e-01, val loss 3.94e+02, val accuracy 4.75e-01\r\ntotal time: 4.26, epoch time 0.38\r\n\r\nepoch 20, training loss 8.79e+01, train acc: 8.09e-01, val loss 9.96e+01, val accuracy 7.99e-01\r\n\r\ntotal time: 7.73, epoch time 0.33\r\n\r\nepoch 30, training loss 4.54e+01, train acc: 9.20e-01, val loss 4.55e+01, val accuracy 9.39e-01\r\n\r\ntotal time: 11.49, epoch time 0.35\r\n\r\n\u2026\r\n\r\nepoch 280, training loss 1.77e+01, train acc: 9.99e-01, val loss 1.39e+01, val accuracy 9.83e-01\r\n\r\ntotal time: 110.70, epoch time 0.49\r\n\r\nepoch 290, training loss 1.76e+01, train acc: 9.99e-01, val loss 1.39e+01, val accuracy 9.83e-01\r\n\r\ntotal time: 115.41, epoch time 0.43<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>That\u2019s a reasonably good result of 98.3% validation accuracy after just under two minutes of training. With a little tweaking of hyperparameters, you could probably push that performance to 100% accuracy or very near. Autograd handles this small dataset easily and efficiently (while Autograd and NumPy operations don\u2019t run on the GPU, primitives like matrix multiply do take advantage of multiple cores). But if all you wanted to do was build a shallow MLP, you could do so more quickly in terms of both development and computational time with a more mainstream and modern machine learning library.<\/p>\n<p>There is some utility in building simple models at a low level like this, where control is prioritized or as a learning exercise, of course, but if a small dense neural network was the final goal we\u2019d recommend you stick to PyTorch or TensorFlow for brevity and compatibility with hardware accelerators like GPUs. Instead, let\u2019s dive into something a bit more interesting: simulating an optical neural network. 
The following tutorial does involve a bit of physics and a fair bit of code: if that\u2019s not your thing, feel free to skip ahead to the next section, where we\u2019ll touch on some of Autograd\u2019s limitations.<\/p>\n<p>\u00a0<\/p>\n<h3>Simulating an Optical Neural Network with Autograd<\/h3>\n<p>\u00a0<br \/>Optical neural networks (ONNs) are an old idea, with the scientific journal Applied Optics running special issues on the topic\u00a0<a href=\"https:\/\/www.osapublishing.org\/ao\/issue.cfm?volume=26&amp;issue=23\" rel=\"noopener noreferrer\" target=\"_blank\">in 1987<\/a>\u00a0and again\u00a0<a href=\"https:\/\/www.osapublishing.org\/ao\/issue.cfm?volume=32&amp;issue=8\" rel=\"noopener noreferrer\" target=\"_blank\">in 1993<\/a>. The concept has recently been revisited by academics (<i>e.g.<\/i><a href=\"https:\/\/www.osapublishing.org\/optica\/abstract.cfm?uri=optica-6-9-1132\" rel=\"noopener noreferrer\" target=\"_blank\">\u00a0Zuo\u00a0<i>et al<\/i>. 2019<\/a>) and by startups such as\u00a0<a href=\"https:\/\/www.optalysys.com\/\" rel=\"noopener noreferrer\" target=\"_blank\">Optalysys<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.wired.com\/story\/this-computer-uses-lightnot-electricityto-train-ai-algorithms\/\" rel=\"noopener noreferrer\" target=\"_blank\">Fathom Computing<\/a>, as well as\u00a0<a href=\"https:\/\/lightmatter.co\/\" rel=\"noopener noreferrer\" target=\"_blank\">Lightmatter<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.lightelligence.ai\/technology\" rel=\"noopener noreferrer\" target=\"_blank\">Lightelligence<\/a>, the last two of which were spun out of the same lab at MIT by co-authors on a\u00a0<a href=\"https:\/\/www.nature.com\/articles\/nphoton.2017.93\" rel=\"noopener noreferrer\" target=\"_blank\">high-profile paper published in Nature<\/a>.<\/p>\n<p>Light is an attractive physical phenomenon for implementing neural networks due to the similarity in the mathematics used to describe both neural networks and optical propagation. 
Thanks to the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Fourier_optics#Applications_of_Fourier_optics_principles\" rel=\"noopener noreferrer\" target=\"_blank\">Fourier Transform property of lenses<\/a>\u00a0and the\u00a0<a href=\"http:\/\/www.thefouriertransform.com\/transform\/properties.php\" rel=\"noopener noreferrer\" target=\"_blank\">convolution property<\/a>\u00a0of the Fourier transform, convolutional layers can be implemented with a perturbative element placed 2 focal lengths and 1 lens away from an input plane (this is known as a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Optical_correlator\" rel=\"noopener noreferrer\" target=\"_blank\">4f correlator<\/a>), while a matrix multiply can be implemented by placing the element 2 focal lengths and 1 lens from that. But this isn\u2019t an optics lecture, it\u2019s a coding tutorial, so let\u2019s see some code!<\/p>\n<p>To install the necessary dependencies, activate your desired virtual environment with your environment manager of choice and use\u00a0pip\u00a0to install Autograd and scikit-image if you haven\u2019t already.<\/p>\n<p><code>pip install autograd<\/code><\/p>\n<p><code>pip install scikit-image<\/code><\/p>\n<p>We\u2019ll be simulating an optical system that essentially operates as a single-output generator, processing a flat input wavefront by passing it through a series of evenly-spaced phase images. To keep the tutorial relatively simple and the line count down, we will attempt to match only a single target image, shown below (you can download the image to your working directory if you want to follow along). 
After completing this simple tutorial, you may be inclined to experiment with building an optical classifier, autoencoder, or some other image transformation.<\/p>\n<p><img alt=\"Autograd image transformation example\" class=\"aligncenter\" src=\"https:\/\/blog.exxactcorp.com\/wp-content\/uploads\/2020\/07\/Image-transformation-example.png\"><\/p>\n<p>Now for some Python, starting with importing the packages we\u2019ll need.<\/p>\n<div>\n<pre>import autograd.numpy as np\r\nfrom autograd import grad\r\n\r\nimport matplotlib.pyplot as plt \r\n\r\nimport time\r\n\r\nimport skimage\r\n\r\nimport skimage.io as sio <\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>We\u2019ll use the angular spectrum method to simulate optical propagation. This is a good method for near-field conditions where the aperture size of your lens or beam is similar to the propagation distance. The following function executes angular spectrum method propagation given a starting wavefront and its dimensions, wavelength of light, and propagation distance.<\/p>\n<div>\n<pre>def asm_prop(wavefront, length=32.e-3,\r\n             wavelength=550.e-9, distance=10.e-3):\r\n\r\n    if len(wavefront.shape) == 2:\r\n\r\n        dim_x, dim_y = wavefront.shape\r\n\r\n    elif len(wavefront.shape) == 3:\r\n\r\n        number_samples, dim_x, dim_y = wavefront.shape\r\n\r\n    else:\r\n\r\n        raise ValueError('only 2D wavefronts or arrays of 2D wavefronts supported')\r\n\r\n    assert dim_x == dim_y, 'wavefront should be square'\r\n\r\n    px = length \/ dim_x\r\n\r\n    l2 = (1\/wavelength)**2\r\n\r\n    fx = np.linspace(-1\/(2*px), 1\/(2*px) - 1\/(dim_x*px), dim_x)\r\n\r\n    fxx, fyy = np.meshgrid(fx,fx)\r\n\r\n    q = l2 - fxx**2 - fyy**2\r\n\r\n    # zero out evanescent (non-propagating) components\r\n    q[q < 0.0] = 0.0\r\n\r\n    # angular spectrum transfer function for the propagation distance\r\n    h = np.exp(1.j * 2 * np.pi * distance * np.sqrt(q))\r\n\r\n    # apply the transfer function in the (centered) frequency domain\r\n    fd_wavefront = np.fft.fftshift(np.fft.fft2(wavefront))\r\n\r\n    wavefront = np.fft.ifft2(np.fft.ifftshift(fd_wavefront * h))\r\n\r\n    return wavefront\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Instead of restricting our ONN to either convolution or matrix multiplication operations, we\u2019ll propagate our beam through a series of evenly spaced phase object images. 
Physically, this is similar to shining a coherent beam of light through a series of thin, wavy glass plates, only in this case we\u2019ll use Autograd to backpropagate through the system to design them so that they direct light from the input wavefront to match a given target pattern at the end. After passing through the phase elements, we\u2019ll collect the light on the equivalent of an image sensor. This gives us a nice nonlinearity in the conversion from a complex field to real-valued intensity that we could use to build a more complex optical neural network by stacking several of these together.<\/p>\n<p>Each layer is defined by passing through a series of phase images separated by short distances. This is described computationally as propagation over a short distance, followed by a thin phase plate (implemented as multiplication):<\/p>\n<div>\n<pre>def onn_layer(wavefront, phase_objects, d=100.e-3):\r\n    for ii in range(len(phase_objects)):\r\n\r\n        wavefront = asm_prop(wavefront * phase_objects[ii], distance=d)\r\n\r\n    return wavefront<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>The key to training a model in Autograd is in defining a function that returns a scalar loss. This loss function can then be wrapped in Autograd\u2019s\u00a0grad\u00a0function to compute gradients. You can specify which argument contains the parameters to compute gradients for with the\u00a0argnum\u00a0argument to\u00a0grad, and remember that the loss function must return a single scalar value, not an array.<\/p>\n<div>\n<pre>def get_loss(wavefront, y_tgt, phase_objects, d=100.e-3):\r\n    img = np.abs(onn_layer(wavefront, phase_objects, d=d))**2\r\n\r\n    mse_loss = np.mean( (img - y_tgt)**2 + np.abs(img - y_tgt) )\r\n\r\n    return mse_loss\r\n\r\nget_grad = grad(get_loss, argnum=2)<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>First, let\u2019s read in the target image and set up the input wavefront. 
Feel free to use a 128 by 128 image of your choosing (matching the simulation grid defined below), or download the grayscale smiley image from earlier in the article.<\/p>\n<div>\n<pre># target image\r\ntgt_img = sio.imread('.\/smiley.png')[:, :, 0]\r\n\r\ny_tgt = 1.0 * tgt_img \/ np.max(tgt_img)\r\n\r\n# set up the input wavefront (a flat plane wave with a 16mm aperture)\r\n\r\ndim = 128\r\n\r\nside_length = 32.e-3\r\n\r\naperture = 8.e-3\r\n\r\nwavelength = 550.e-9\r\n\r\nk0 = 2*np.pi \/ wavelength\r\n\r\npx = side_length \/ dim\r\n\r\nx = np.linspace(-side_length\/2, side_length\/2-px, dim)\r\n\r\nxx, yy = np.meshgrid(x,x)\r\n\r\nrr = np.sqrt(xx**2 + yy**2)\r\n\r\nwavefront = np.zeros((dim,dim)) * np.exp(1.j*k0*0.0)\r\n\r\n# uniform amplitude inside the aperture radius\r\nwavefront[rr < aperture] = 1.0\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Next, define the learning rate, propagation distance, and the model parameters.<\/p>\n<div>\n<pre>lr = 1e-3\r\ndist = 50.e-3\r\n\r\nphase_objects = [np.exp(1.j * np.zeros((128,128))) for aa in range(32)]\r\n\r\nlosses = []<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>If you\u2019re familiar with training neural networks with PyTorch or similar libraries, the training loop should look familiar. We call the gradient function we defined earlier (which is a function transformation of the function we wrote to calculate loss), and apply the resulting gradients to the parameters of our model. I found that the model achieved much better results when updating the parameters (phase_objects) using only the phase of the gradient, rather than the raw complex gradient itself. 
The real-valued phase component of the gradient is accessed by using NumPy\u2019s\u00a0np.angle, and it\u2019s converted back into complex values by\u00a0np.exp(1.j * value).<\/p>\n<div>\n<pre>for step in range(128):\r\n    my_grad = get_grad(wavefront, y_tgt, phase_objects, d=dist)\r\n\r\n    for params, grads in zip(phase_objects, my_grad):\r\n\r\n        params -= lr * np.exp(-1.j * np.angle(grads))\r\n\r\n    loss = get_loss(wavefront, y_tgt, phase_objects, d=dist)\r\n\r\n    losses.append(loss)\r\n\r\n    img = np.abs(onn_layer(wavefront, phase_objects, d=dist))**2\r\n\r\n    print('loss at step {} = {:.2e}, lr={:.3e}'.format(step, loss, lr))\r\n\r\n    fig = plt.figure(figsize=(12,7))\r\n\r\n    plt.imshow(img \/ 2.0, cmap='jet')\r\n\r\n    plt.savefig('.\/smiley_img{}.png'.format(step))\r\n\r\n    plt.close(fig)\r\n\r\nfig = plt.figure(figsize=(7,4))\r\n\r\nplt.plot(losses, lw=3)\r\n\r\nplt.savefig('.\/smiley_losses.png')<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>If everything worked out, you should see a monotonically decreasing mean squared error loss, and the code will save a series of figures depicting the optical network\u2019s output as it gets closer and closer to matching the target image.<\/p>\n<p><img alt=\"Optimization of the optical system attempting to match the target image\" class=\"aligncenter\" src=\"https:\/\/blog.exxactcorp.com\/wp-content\/uploads\/2020\/07\/Optimization-of-the-optical-system-attempting-to-match-the-target-image.jpg\" width=\"100%\"><br \/><i>Optimization of the optical system attempting to match the target image. Each of the numbered images with a blue background is the model output at different training steps. Unsurprisingly for training with a single sample, the loss decreases smoothly over the course of training.<\/i><br \/>\u00a0<\/p>\n<p>That\u2019s it! We\u2019ve simulated an optical system acting as a single-output generator. 
If you have any trouble getting the code to run, try copying the code from\u00a0<a href=\"https:\/\/gist.github.com\/riveSunder\/96267f5a52a1ebe8f567505516d4e068\" rel=\"noopener noreferrer\" target=\"_blank\">this Github gist<\/a>\u00a0all in one go to prevent introducing typos.<\/p>\n<p>\u00a0<\/p>\n<h3>Autograd Uses and Limitations<\/h3>\n<p>\u00a0<br \/>Autograd is a flexible automatic differentiation package that has influenced mainstream machine learning libraries in many ways. It\u2019s not always easy to trace how different ideas influence one another in a rapidly developing space like machine learning. However, the imperative, define-by-run approach features prominently in Chainer, PyTorch, and, to some extent, TensorFlow from version 2.0 onward with its eager execution. According to\u00a0<a href=\"https:\/\/libraries.io\/pypi\/autograd\/dependents\" rel=\"noopener noreferrer\" target=\"_blank\">libraries.io<\/a>, ten other Python packages depend on Autograd, including packages for\u00a0<a href=\"https:\/\/github.com\/lanius\/tinyik\" rel=\"noopener noreferrer\" target=\"_blank\">solving inverse kinematics<\/a>,\u00a0<a href=\"https:\/\/github.com\/rgiordan\/vittles\" rel=\"noopener noreferrer\" target=\"_blank\">sensitivity analysis<\/a>, and\u00a0<a href=\"https:\/\/github.com\/Gattocrucco\/lsqfitgp\" rel=\"noopener noreferrer\" target=\"_blank\">Gaussian processes<\/a>. My personal favorite is the quantum machine learning package\u00a0<a href=\"https:\/\/github.com\/xanaduai\/pennylane\" rel=\"noopener noreferrer\" target=\"_blank\">PennyLane<\/a>.<\/p>\n<p>Autograd may not be as powerful as PyTorch or TensorFlow, and it doesn\u2019t have implementations of all the latest deep learning tricks, but in some ways this can be an advantage during certain stages of development. There aren\u2019t a lot of specialized APIs to memorize and the learning curve is particularly gentle for anyone who is familiar with Python and\/or NumPy. 
It doesn\u2019t have any of the bells and whistles for deployment or scaling, but it is simple and efficient to use for projects where control and customization are important. It\u2019s particularly well-suited to mathematicians and physicists who need to translate abstract ideas from math to code to build arbitrary machine learning or optimization solutions at a low level of implementation.<\/p>\n<p>The biggest con to using Autograd in our opinion is a lack of support for hardware acceleration. Perhaps there\u2019s no better way to describe this drawback than the 4-year-long discussion on\u00a0<a href=\"https:\/\/github.com\/HIPS\/autograd\/issues\/46\" rel=\"noopener noreferrer\" target=\"_blank\">this Github issue<\/a>, which discusses various ways of introducing GPU support. If you worked your way through the optical neural network tutorial in this post, you\u2019ll have already noticed that running an experiment with even a modestly sized model can require a prohibitive amount of computation time.\u00a0Computation speed with Autograd is enough of a drawback that we don\u2019t actually recommend using it for projects much larger than the MLP or ONN generator demonstrations described above.<\/p>\n<p>Instead, consider JAX, an Apache 2.0 licensed library developed by Google Brain researchers, including the Autograd developers. JAX combines hardware acceleration and just-in-time compilation for substantial speedups over native NumPy code, and in addition, JAX offers a set of function transformations for automatically parallelizing code. JAX can be slightly more complicated than a direct NumPy replacement with Autograd, but its powerful features can more than make up for that. 
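<\/p>
<p>The transition from Autograd is gentle, since JAX keeps the same grad-of-a-NumPy-style-function interface and layers jit compilation on top. The toy loss function below is invented purely for illustration:<\/p>

```python
import jax.numpy as jnp
from jax import grad, jit

def loss(w, x):
    # same write-the-math-in-NumPy style as Autograd
    return jnp.sum(jnp.tanh(jnp.dot(x, w)) ** 2)

# grad works as in Autograd; jit compiles the gradient function with XLA
loss_grad = jit(grad(loss))

w = jnp.ones((3, 2))
x = jnp.ones((4, 3))

g = loss_grad(w, x)  # gradient with respect to w, same shape as w
```

<p>On a GPU or TPU the same code runs on the accelerator without modification, which is exactly the capability Autograd lacks.<\/p>
<p>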
We\u2019ll compare JAX to Autograd as well as the popular PyTorch and TensorFlow in a future article.<\/p>\n<p>\u00a0<br \/><a href=\"https:\/\/blog.exxactcorp.com\/autograd-the-best-machine-learning-library-youre-not-using\/\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. Reposted with permission.<\/p>\n<p><b>Related:<\/b><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/09\/autograd-best-machine-learning-library-not-using.html<\/p>\n","protected":false},"author":0,"featured_media":1612,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1611"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1611"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1611\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1612"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1611"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=1611"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1611"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}