class: center, middle

# Tools to Get Started with Machine Learning / AI

## Christian Hudon | JDA Labs

.footnote[Twitter: @christian_hudon and @JDALabsMTL]

---

# You think machine learning is cool

.middle[
]

---

# You've taken a first online class

![](img/Coursera_Deep.png)

---

class: center

# But what about after that?

--

## Maybe you have a first project idea...

--

## (Or maybe not.)

???

* That's cool. We'll also have some suggestions for good first projects in here later.

---

class: center, middle

# Let's Get Started!

???

# First, pick a deep learning library

---

class: center, middle

![](img/libraries_word_mural.png)

???

Well, OK. Let's put that off a bit, and start by visualizing our data instead.

* Amazon: MXNet
* Baidu: PaddlePaddle
* Facebook: PyTorch
* Google: TensorFlow
* Microsoft: CNTK / Microsoft Cognitive Toolkit
* Tencent: FeatherCNN

Collage of the names of: theano, lasagne, nolearn, Keras, Pylearn2, Blocks, Caffe, Mxnet, CNTK, Torch7, Pytorch, Sklearn-theano.

---

.center[![Python Visualization Landscape](img/python_viz_landscape.png)]

???

* Worse.
* Easier to pick once experienced, but if you're starting out...

---

class: center, middle

# Who is this presentation for?

???

* No ML experience necessary, but ML not taught here
* A little bit of programming experience
* Not afraid to get feet wet
* Gets you started. Initial selection of tools and starter project suggestions
* Beginner-friendly tools (not necessarily what we use here)
* Python (that's what I know best / deep learning research happening in Python)

Giving you a good starting point to continue learning on your own.

---

class: center, middle

# Trying things out:
# Jupyter Lab

???

* Born out of the reproducible science movement
* Runnable document
* Code, text, plots, equations in same document
* Lab: big brother, ready for use (stable plugin API: 1.0)

---

class: center, middle

# Numerical computations:
# Numpy

???

* The standard foundation in Python for numerical work
* Other libraries are either built on top of Numpy or work with it.
* Similar to Matlab...

---

class: center, middle

# Data Exploration:
# Pandas

???

* Data cleaning and exploration a significant part of the work!
* Pandas
  * Easy to use yet powerful
  * Very good support for dates, categoricals

---

class: center, middle

# Classical machine learning:
# Scikit-learn

## For NLP: SpaCy

???

* Easy to use, models know how to train themselves
* Large number of algorithms
* Excellent documentation
* Can chain a sequence of preprocessing steps + model into one composite model (pipelines)

---

class: center, middle

# Deep learning:
# Pytorch (and Skorch)

.footnote[.red[*] Keras (with Tensorflow) also a reasonable choice.]

???

* Created by Facebook; rapid growth in use
* Fairly often described as "fun"
* Easier to understand causes of errors
* Skorch: makes it easy to use a Pytorch model in place of a scikit-learn model
* Keras and TensorFlow: harder to debug problems, but new "immediate" mode helps

---

class: center, middle

# Visualization (simpler / pre-canned):
# Pandas & Seaborn

???

* Quick and easy!
* For the basics

---

class: center, middle

# Visualization (expressive):
# HoloViews (or Altair)

???

* Easy to try complex plots quickly
* One of the good ideas of R
* Difference between 1-line plots and 1-paragraph ones!
* Great for visually exploring your data
* HoloViews:
  * Interactive plots in browser
  * Scalable to huge datasets with `datashader`

Be picky about your visualization library! Invest the time!

---

class: center, middle

# How to continue learning?

## Do a project!

---

# Dataset Sources

* The Open Data movement (Bixi, etc.)
--
* Kaggle
--
* Your own life! (FitBit, your bank)
--
* The Internet!

---

# Project Ideas

* Forecast Bixi usage
--
* Cat vs. dog detector
--
* Automated categorization of credit card bill items

???

* Reinforcement learning is not a good first project.
* Landing near the top of the Kaggle leaderboard needs different skills.
* Creating trading models that make money has gotchas.
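---

# Sketch: categorizing bill items

A minimal sketch of the credit card categorization idea, built as a scikit-learn pipeline (preprocessing + model as one composite model). The descriptions and category labels are made up for illustration, and the choice of character n-grams with logistic regression is an assumption, not a recommendation from the talk.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical bill items and labels; a real project would load
# these from an exported statement instead.
descriptions = [
    "COFFEE SHOP DOWNTOWN 0423",
    "CORNER CAFE ESPRESSO",
    "GROCERY MARKET 118",
    "FRESH FOODS SUPERMARKET",
    "TRANSIT PASS RELOAD",
    "BIKE SHARE MONTHLY",
]
categories = [
    "coffee", "coffee",
    "groceries", "groceries",
    "transit", "transit",
]

# Chain the text preprocessing and the classifier into one model.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, categories)

# The pipeline applies the same preprocessing at prediction time.
new_items = ["COFFEE SHOP UPTOWN 0911", "GROCERY MARKET 207"]
predictions = model.predict(new_items)
for item, category in zip(new_items, predictions):
    print(f"{item!r} -> {category}")
```

With only six training rows the predictions are not meaningful; the point is that `model` behaves like any other scikit-learn estimator, so it can later be cross-validated or have its classifier swapped out without touching the preprocessing code.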
---

# After your first project

* Getting better at Python: https://www.youtube.com/playlist?list=PLu5kChXP9Hipqr1gIROxGlvb3YVadOjUB (14 videos currently)
* Learning the basics of classes and object-oriented programming: http://www.greenteapress.com/thinkpython/html/thinkpython016.html (and next two chapters)
* The matrix algebra you need for machine learning: http://parrt.cs.usfca.edu/doc/matrix-calculus/
* Scaling with [Dask](https://dask.pydata.org/en/latest/) and [Dask-ml](https://dask-ml.readthedocs.io/en/latest/) and [Dask-Kubernetes](https://dask-kubernetes.readthedocs.io/en/latest/)
* Learning the basics of SQL

---

.pull-left[
## Thank you!

Slides and notebooks: http://christianhudon.name/talks/#DSDT-2018-06
]

.pull-right[
## Tools

* Trying things out: **Jupyter Lab**
* Numerical computations: **Numpy**
* Data Exploration: **Pandas**
* Classical machine learning: **Scikit-learn** (and SpaCy)
* Deep learning: **Pytorch** (and Skorch)
* Visualization: **Pandas** and **Seaborn**
* More Visualization: **HoloViews** (or Altair)
]