
Tech News - Artificial Intelligence

Sugandha Lahoti
05 Apr 2018
4 min read

Paper in Two minutes: Attention Is All You Need

A paper on a new simple network architecture, the Transformer, based solely on attention mechanisms
The NIPS 2017 accepted paper, Attention Is All You Need, introduces the Transformer, a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. The paper is authored by researchers from Google, including Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
The Transformer – Attention is all you need
What problem is the paper attempting to solve?
Recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated RNNs are the popular approaches used for sequence modeling tasks such as machine translation and language modeling. However, these recurrent models handle sequences word by word, in a sequential fashion. This sequentiality is an obstacle to parallelizing the computation. Moreover, when sequences are too long, the model is prone to forgetting the content of distant positions in the sequence, or to mixing it with the content of following positions. Recent work has achieved significant improvements in computational efficiency and model performance through factorization tricks and conditional computation. But these are not enough to eliminate the fundamental constraint of sequential computation.
Attention mechanisms are one solution to the problem of model forgetting. They allow dependencies to be modeled regardless of their distance in the input or output sequences. Because of this feature, they have become an integral part of sequence modeling and transduction models. However, in most cases attention mechanisms are used in conjunction with a recurrent network.
Paper summary
The Transformer proposed in this paper is a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output.
The Transformer allows for significantly more parallelization. It can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. Neural sequence transduction models generally have an encoder-decoder structure. The encoder maps an input sequence of symbol representations to a sequence of continuous representations. The decoder then generates an output sequence of symbols, one element at a time. The Transformer follows this overall architecture, using stacked self-attention and point-wise, fully connected layers for both the encoder and the decoder.
The authors' motivation for using self-attention rests on three criteria:
- The first is the total computational complexity per layer.
- The second is the amount of computation that can be parallelized, as measured by the minimum number of sequential operations required.
- The third is the path length between long-range dependencies in the network.
The Transformer uses two kinds of attention functions. Scaled Dot-Product Attention computes the attention function on a set of queries simultaneously, packed together into a matrix. Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.
A self-attention layer connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of computational complexity, self-attention layers cost O(n²·d) per layer versus O(n·d²) for recurrent layers. Self-attention is therefore faster when the sequence length n is smaller than the representation dimensionality d, which is most often the case with the sentence representations used in machine translation.
Key Takeaways
This work introduces the Transformer, a novel sequence transduction model based entirely on attention. It replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both the WMT 2014 English-to-German and the WMT 2014 English-to-French translation tasks, the model achieves a new state of the art. On the former task, it outperforms even all previously reported ensembles.
Future Goals
So far, the Transformer has only been applied to transduction tasks. In the near future, the authors plan to use it for problems involving input and output modalities other than text. They also plan to apply attention mechanisms that can efficiently handle large inputs and outputs such as images, audio, and video.
The Transformer architecture has gained major traction since its release because of its major improvements in translation quality and on other NLP tasks. Recently, the NLP research group at Harvard released a post that presents an annotated version of the paper in the form of a line-by-line implementation. It is accompanied by 400 lines of library code, written in PyTorch as a notebook, accessible from GitHub or on Google Colab with free GPUs.
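Scaled dot-product attention, as defined in the paper, computes softmax(QK^T / √d_k)V:
- each query is compared with every key;
- the scores are scaled by 1/√d_k and normalized with a softmax;
- the resulting weights average the value vectors.

Below is a minimal, dependency-free Python sketch of a single attention head. It assumes small list-based matrices and no masking. The helper names are illustrative and are not taken from the paper's code or from any library. Multi-head attention would run several such heads on different learned linear projections of Q, K, and V and concatenate their outputs; that part is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, without masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                          # query-key dot products
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]                # each row sums to 1
    return matmul(weights, v)                                 # weighted sum of value vectors
```

Take a single query [1, 0], keys [[1, 0], [0, 1]], and values [[10, 0], [0, 10]]. The query matches the first key more strongly, so the output leans toward the first value vector (roughly [6.7, 3.3]). When all keys are identical, the weights become uniform and the output is simply the average of the values.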

Packt Editorial Staff
09 May 2019
8 min read

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

A typical deep learning workflow starts with ideation and research around a problem statement, where the architectural design and model decisions come into play. Following this, the theoretical model is tested through experiments with prototypes. This includes trying out different models or techniques, such as skip connections, and deciding what not to try out. PyTorch was started as a research framework by a Facebook intern. It has since grown into a framework used both for research and prototyping and for writing efficient models with serving modules. With slight variations, the PyTorch deep learning workflow is fairly equivalent to the workflow implemented by almost everyone in the industry, even for highly sophisticated implementations. In this article, we explain the core of the ideation and planning stage and the design and experimentation stage of the PyTorch deep learning workflow.
This article is an excerpt from the book PyTorch Deep Learning Hands-On by Sherin Thomas and Sudhanshu Passi. The book attempts to provide an entirely practical introduction to PyTorch. It has numerous examples and dynamic AI applications that demonstrate the simplicity and efficiency of the PyTorch approach to machine intelligence and deep learning.
Ideation and planning
Usually, in an organization, the product team comes up with a problem statement for the engineering team, to find out whether they can solve it or not. This is the start of the ideation phase. In academia, however, this could be the decision phase where candidates have to find a problem for their thesis. In the ideation phase, engineers brainstorm and find theoretical implementations that could potentially solve the problem. Besides converting the problem statement into a theoretical solution, the ideation phase is where we decide what the data types are. It is also where we choose the dataset for building the proof of concept (POC) of the minimum viable product (MVP).
This is also the stage where the team decides which framework to go with, by analyzing the behavior of the problem statement, available implementations, available pretrained models, and so on. This stage is very common in the industry. I have come across numerous examples where a well-planned ideation phase helped the team roll out a reliable product on time, while an unplanned ideation phase destroyed the whole product creation.
Design and experimentation
The crucial part of design and experimentation lies in the dataset and its preprocessing. For any data science project, the majority of the time is spent on data cleaning and preprocessing, and deep learning is no exception. Data preprocessing is one of the vital parts of building a deep learning pipeline. Real-world datasets are usually not cleaned or formatted for a neural network to process. Conversion to floats or integers, normalization, and so on are required before further processing. Building a data processing pipeline is also a non-trivial task that involves writing a lot of boilerplate code. To make this much easier, dataset builders and DataLoader pipeline packages are built into the core of PyTorch.
The dataset and DataLoader classes
Different types of deep learning problems require different types of datasets. Each of them might also require different preprocessing, depending on the neural network architecture we use. This is one of the core problems in building a deep learning pipeline. Although the community has made datasets for different tasks available for free, writing a preprocessing script is almost always painful. PyTorch solves this problem by providing abstract classes for writing custom datasets and data loaders. The example given here is a simple dataset class that loads the fizzbuzz dataset, but extending it to handle any type of dataset is fairly straightforward.
PyTorch's official documentation uses a similar approach to preprocess an image dataset before passing it to a complex convolutional neural network (CNN) architecture.
A dataset class in PyTorch is a high-level abstraction that handles almost everything required by the data loaders. The custom dataset class defined by the user needs to override the __len__ and __getitem__ functions of the parent class. The data loaders use __len__ to determine the length of the dataset and __getitem__ to get an item. The __getitem__ function takes an index as an argument and returns the item that resides at that index:

from dataclasses import dataclass
from torch.utils.data import Dataset, DataLoader


@dataclass(eq=False)
class FizBuzDataset(Dataset):
    input_size: int
    start: int = 0
    end: int = 1000

    def encoder(self, num):
        # Binary digits of num, left-padded with zeros to input_size bits
        ret = [int(i) for i in '{0:b}'.format(num)]
        return [0] * (self.input_size - len(ret)) + ret

    def __getitem__(self, idx):
        x = self.encoder(idx)
        # One-hot label: [fizzbuzz, buzz, fizz, none of these]
        if idx % 15 == 0:
            y = [1, 0, 0, 0]
        elif idx % 5 == 0:
            y = [0, 1, 0, 0]
        elif idx % 3 == 0:
            y = [0, 0, 1, 0]
        else:
            y = [0, 0, 0, 1]
        return x, y

    def __len__(self):
        return self.end - self.start

This custom dataset is implemented with dataclasses, which are new in Python 3.7. Dataclasses help to eliminate boilerplate code for Python magic methods, such as __init__, by using dynamic code generation. This requires the fields to be type-hinted, and that's what the first three lines inside the class are for. You can read more about dataclasses in the official Python documentation (https://docs.python.org/3/library/dataclasses.html). The __len__ function returns the difference between the end and start values passed to the class. In the fizzbuzz dataset, the data is generated by the program.
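A map-style dataset only has to provide __len__ and __getitem__, so its behavior can be checked without PyTorch installed. The following is a framework-free sketch under that assumption:
- PlainFizzBuzz reproduces the same encoding and labels. It is a hypothetical name, not part of PyTorch.
- The iterate_batches helper, also hypothetical, mimics in a simplified single-process way how a shuffling loader draws mini-batches of indices.

Unlike DataLoader, this helper neither collates samples into tensors nor uses worker processes.

```python
import random
from typing import Iterator, List, Tuple

Sample = Tuple[List[int], List[int]]  # (binary-encoded input, one-hot label)


class PlainFizzBuzz:
    """Hypothetical framework-free stand-in for FizBuzDataset (same encoding and labels)."""

    def __init__(self, input_size: int, start: int = 0, end: int = 1000):
        self.input_size = input_size
        self.start = start
        self.end = end

    def encoder(self, num: int) -> List[int]:
        bits = [int(b) for b in format(num, "b")]
        return [0] * (self.input_size - len(bits)) + bits  # left-pad to input_size bits

    def __len__(self) -> int:
        return self.end - self.start

    def __getitem__(self, idx: int) -> Sample:
        if idx % 15 == 0:
            y = [1, 0, 0, 0]  # fizzbuzz
        elif idx % 5 == 0:
            y = [0, 1, 0, 0]  # buzz
        elif idx % 3 == 0:
            y = [0, 0, 1, 0]  # fizz
        else:
            y = [0, 0, 0, 1]  # none of these
        return self.encoder(idx), y


def iterate_batches(dataset, batch_size: int, shuffle: bool = True,
                    seed: int = 0) -> Iterator[List[Sample]]:
    """Hypothetical helper: yield shuffled mini-batches of (x, y) pairs, single-process, no collation."""
    indices = list(range(len(dataset)))  # relies on __len__
    if shuffle:
        random.Random(seed).shuffle(indices)
    for i in range(0, len(indices), batch_size):
        yield [dataset[j] for j in indices[i:i + batch_size]]  # relies on __getitem__
```

For example, PlainFizzBuzz(input_size=10)[15] returns ([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], [1, 0, 0, 0]). Here 15 is 1111 in binary, left-padded to ten bits, and it is a multiple of both three and five.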
Data generation is implemented inside the __getitem__ function, where the class instance generates the data based on the index passed by DataLoader. PyTorch made the class abstraction as generic as possible, so the user can define what the data loader should return for each index. In this particular case, the class instance returns an input and an output for each index:
- The input x is the binary-encoded version of the index itself.
- The output y is a one-hot encoded vector with four states.

The four states represent whether the number is a multiple of both three and five (fizzbuzz), a multiple of five (buzz), a multiple of three (fizz), or a multiple of neither three nor five.
Note: For Python newbies, the way the dataset works can be understood by looking first at the loop over the integers from zero to the length of the dataset. The length is returned by the __len__ function when len(object) is called. The following snippet shows the simple loop:

if __name__ == "__main__":  # needed when DataLoader workers are started with "spawn"
    # input_size must fit the binary form of the largest index (999 needs 10 bits)
    dataset = FizBuzDataset(input_size=10)
    for i in range(len(dataset)):
        x, y = dataset[i]

    dataloader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=4)
    for batch in dataloader:
        print(batch)

The DataLoader class accepts an instance of a class that inherits from torch.utils.data.Dataset. DataLoader handles non-trivial operations such as mini-batching, multiprocess loading, and shuffling to fetch the data from the dataset. It uses a sampling strategy to draw the data as mini-batches. The num_workers argument decides how many worker processes load data in parallel. This helps to avoid a CPU bottleneck, so that the CPU can keep up with the GPU's parallel operations. Data loaders also let users choose whether to use pinned memory (the pin_memory argument). With pinned memory, the data tensors are copied into page-locked (pinned) host memory before being returned to the user.
Using pinned memory is the key to fast data transfers between devices, since the data loader itself loads the data into pinned memory, using multiple CPU cores anyway.
Most often, especially while prototyping, custom datasets might not be available to developers. In such cases, they have to rely on existing open datasets. The good thing about working with open datasets is that most of them are free from licensing burdens. Thousands of people have also already tried preprocessing them, so the community will help out. PyTorch came up with utility packages for all three types of datasets (vision, text, and audio, through torchvision, torchtext, and torchaudio). These packages provide pretrained models, preprocessed datasets, and utility functions for working with these datasets.
This article covered how to build a basic pipeline for deep learning development. The system defined here is a very common, general approach that many kinds of companies follow with slight changes. The benefit of starting with a generic workflow like this is that you can build a much more complex workflow on top of it as your team or project grows.
Build deep learning workflows and take deep learning models from prototyping to production with PyTorch Deep Learning Hands-On, written by Sherin Thomas and Sudhanshu Passi.

Sunith Shetty
19 Oct 2018
3 min read

Graph Nets – DeepMind's library for graph networks in TensorFlow and Sonnet

Graph Nets is DeepMind's new library for building graph networks in TensorFlow and Sonnet. It accompanies the paper Relational inductive biases, deep learning, and graph networks, published on arXiv earlier this year by researchers from DeepMind, Google Brain, MIT, and the University of Edinburgh. The paper introduces a machine learning framework called graph networks, which is expected to bring new innovations to the realm of artificial general intelligence.
What are graph networks?
Graph networks generalize and extend various types of neural networks to perform computations on graphs. They can implement relational inductive bias, a technique used for reasoning about inter-object relations. The graph networks framework is based on graph-to-graph modules. Each graph's features are represented by three characteristics:
  • Nodes
  • Edges: relations between the nodes
  • Global attributes: system-level properties
The graph network takes a graph as input, performs the required computations from the edges, to the nodes, and to the global attributes, and then returns a new graph as output.
The research paper argues that graph networks can support two critical human-like capabilities:
  • Relational reasoning: drawing logical conclusions about how different objects and things relate to one another
  • Combinatorial generalization: constructing new inferences, behaviors, and predictions from known building blocks
To understand and learn more about graph networks, you can refer to the official research paper.
Graph Nets
The Graph Nets library can be installed from pip. To install the library, run the following command:
$ pip install graph_nets
The installation is compatible with Linux/Mac OS X and with Python versions 2.7 and 3.4+.
The library includes Jupyter notebook demos that let you create, manipulate, and train graph networks. The demos cover a shortest-path-finding task, a sorting task, and a prediction task.
Each demo uses the same graph network architecture, which shows the flexibility of the approach. You can try out the demos in your browser using Colaboratory. In other words, you don't need to install anything locally when running the demos in the browser (or on a phone) via the cloud Colaboratory backend. You can also run the demos on your local machine by installing the necessary dependencies.
What's ahead?
The concept draws on ideas not only from artificial intelligence research but also from the computer and cognitive sciences. Graph networks are still an early-stage research theory that does not yet offer any convincing experimental results. But it will be very interesting to see how well graph networks live up to the hype as they mature.
To try out the open source library, you can visit the official GitHub page. To provide comments or suggestions, you can contact graph-nets@google.com.
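A graph network block updates a graph in three steps: edges, then nodes, then global attributes. The small, framework-free Python sketch below illustrates one such graph-to-graph pass. It makes these assumptions:
- every node, edge, and global attribute is a single scalar;
- edges are aggregated by summation;
- phi_e, phi_v, and phi_u are hypothetical placeholders that just add their inputs, whereas a real graph network would learn these update functions as neural networks.

This is not the Graph Nets library API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Graph:
    nodes: List[float]      # one scalar feature per node
    edges: List[float]      # one scalar feature per edge
    senders: List[int]      # edge k runs from node senders[k] ...
    receivers: List[int]    # ... to node receivers[k]
    globals_: float         # one system-level attribute


# Hypothetical stand-ins for learned update functions.
def phi_e(edge: float, sender: float, receiver: float, u: float) -> float:
    return edge + sender + receiver + u


def phi_v(incoming: float, node: float, u: float) -> float:
    return incoming + node + u


def phi_u(edge_total: float, node_total: float, u: float) -> float:
    return edge_total + node_total + u


def gn_block(g: Graph) -> Graph:
    """One graph-to-graph pass: update edges, then nodes, then the global attribute."""
    # 1. Edge update: each edge sees its own feature, both endpoint nodes, and the global.
    new_edges = [phi_e(e, g.nodes[s], g.nodes[r], g.globals_)
                 for e, s, r in zip(g.edges, g.senders, g.receivers)]
    # 2. Node update: sum each node's updated incoming edges, then update the node.
    incoming = [0.0] * len(g.nodes)
    for e, r in zip(new_edges, g.receivers):
        incoming[r] += e
    new_nodes = [phi_v(agg, v, g.globals_) for agg, v in zip(incoming, g.nodes)]
    # 3. Global update: aggregate all updated edges and nodes.
    new_globals = phi_u(sum(new_edges), sum(new_nodes), g.globals_)
    # Connectivity is unchanged; only the attributes are replaced.
    return Graph(new_nodes, new_edges, list(g.senders), list(g.receivers), new_globals)
```

For a three-node chain 0 → 1 → 2, the input and output are both graphs with the same connectivity, which is what lets such blocks be stacked.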

Sugandha Lahoti
21 Mar 2018
5 min read

What is Meta Learning?

Meta learning, a concept that originated in cognitive psychology, is now applied to machine learning techniques. If we go by the social psychology definition, meta learning is the state of being aware of and taking control of one's own learning. Applied to machine learning, a similar concept says that a meta learning algorithm uses prior experience to change certain aspects of an algorithm. The goal is that the modified algorithm is better than the original one. To explain in simple terms, meta learning is how an algorithm learns how to learn.
Meta Learning: Making a versatile AI agent
Current AI systems excel at mastering a single skill, such as playing Go, holding human-like conversations, or predicting a disaster. However, now that AI and machine learning are being integrated into everyday tasks, we need a single AI system that can solve a variety of problems. Currently, a Go player cannot navigate the roads or find new places, and an AI navigation controller cannot hold a perfect human-like conversation. What machine learning algorithms need to develop is versatility: the capability of doing many different things.
Versatility is achieved by intelligently combining meta learning with related techniques:
- reinforcement learning, which finds suitable actions to maximize a reward;
- transfer learning, which re-purposes a model trained for one task on a second, related task;
- active learning, in which the learning algorithm chooses the data it wants to learn from.

Together, these techniques give an AI agent the brains to do multiple tasks without learning every new task from scratch. This makes it capable of adapting intelligently to a wide variety of new, unseen situations.
Apart from creating versatile agents, recent research also focuses on using meta learning for several other goals: hyperparameter and neural network optimization, fast reinforcement learning, finding good network architectures, and specific cases such as few-shot image recognition. Using meta learning, AI agents learn how to learn new tasks by reusing prior experience, rather than examining each new task in isolation.
Various approaches to Meta Learning algorithms
A wide variety of approaches come under the umbrella of meta learning. Let's take a quick glance at these algorithms and techniques:
Algorithm Learning (selection)
Algorithm selection, or algorithm learning, selects learning algorithms on the basis of the characteristics of the instance. For example, suppose you have a set of ML algorithms (Random Forest, SVM, DNN), datasets as the instances, and the error rate as the cost metric. The goal of algorithm selection is then to predict which machine learning algorithm will have a small error on each dataset.
Hyper-parameter Optimization
Many machine learning algorithms have numerous hyper-parameters that can be optimized. The choice of these hyper-parameters determines how well the algorithm learns. A recent paper, "Evolving Deep Neural Networks", provides a meta learning algorithm for optimizing deep learning architectures through evolution.
Ensemble Methods
Ensemble methods combine several models or approaches to achieve better predictive performance. There are three basic types: bagging, boosting, and stacked generalization.
- In bagging, each model is trained independently, and their outputs are aggregated at the end without preference for any model.
- Boosting refers to a group of algorithms that use weighted averages to turn weak learners into stronger learners. Boosting is all about “teamwork”.
- Stacked generalization has a layered architecture. Each set of base classifiers is trained on a dataset.
Successive layers receive as input the predictions of the immediately preceding layer, and pass their output on to the next layer. A single classifier at the topmost level produces the final prediction.
Dynamic bias selection
In dynamic bias selection, the bias of the learning algorithm is adjusted dynamically to suit the new problem instance. The performance of a base learner can trigger the need to explore additional hypothesis spaces, normally through small variations of the current hypothesis space. The bias selection can be based either on a form of data variation or on a time-dependent feature.
Inductive Transfer
Inductive transfer describes learning that uses previous knowledge from related tasks, by transferring meta-knowledge across domains or tasks. The goal here is to incorporate the meta-knowledge into the new learning task, rather than matching meta-features against a meta-knowledge base.
Adding Enhancements to Meta Learning algorithms
Supervised meta-learning: The meta-learner is trained with supervised learning. In supervised learning, we have both input and output variables, and the algorithm learns the mapping function from the input to the output.
RL meta-learning: Standard deep RL techniques are used to train a recurrent neural network so that the recurrent network can then implement its own reinforcement learning procedure.
Model-agnostic meta-learning: MAML trains over a wide range of tasks to find a representation that can be quickly adapted to a new task via a few gradient steps. The meta-learner seeks an initialization that is not only useful for adapting to various problems, but that can also be adapted quickly.
The ultimate goal of any meta learning algorithm and its variations is to be fully self-referential. This means it can automatically inspect and improve every part of its own code.
Consider a regenerative meta learning algorithm, working along the lines of how a lizard regenerates its tail. It would not only blur the distinction between the variations described above, but also lead to better future performance and versatility of machine learning algorithms.
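MAML's core idea is to find an initialization that performs well after a few gradient steps. It can be seen on a toy problem. The sketch below makes these assumptions:
- Each task is a hypothetical one-dimensional problem with a quadratic loss L(w) = c·(w - a)².
- Adaptation is one inner gradient step with learning rate inner_lr.
- The meta-gradient differentiates the post-adaptation loss through that inner step. For quadratic losses, the chain rule gives the exact MAML gradient.

This is an illustration of the idea on made-up tasks, not the algorithm from the MAML paper at scale.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # hypothetical task (a, c) with loss L(w) = c * (w - a) ** 2


def task_grad(w: float, task: Task) -> float:
    a, c = task
    return 2.0 * c * (w - a)


def adapt(w: float, task: Task, inner_lr: float) -> float:
    """Inner loop: one gradient step on the task, starting from the shared initialization w."""
    return w - inner_lr * task_grad(w, task)


def maml_meta_grad(w: float, tasks: List[Task], inner_lr: float) -> float:
    """Gradient of sum_i L_i(adapt(w, task_i)) with respect to the initialization w."""
    total = 0.0
    for task in tasks:
        _, c = task
        w_adapted = adapt(w, task, inner_lr)
        # Chain rule through the inner step: d(w_adapted)/dw = 1 - 2 * inner_lr * c.
        total += task_grad(w_adapted, task) * (1.0 - 2.0 * inner_lr * c)
    return total


def meta_train(tasks: List[Task], inner_lr: float = 0.1, meta_lr: float = 0.1,
               steps: int = 500, w: float = 0.0) -> float:
    """Outer loop: gradient descent on the initialization itself."""
    for _ in range(steps):
        w -= meta_lr * maml_meta_grad(w, tasks, inner_lr)
    return w
```

Take two tasks, (a=1, c=1) and (a=3, c=0.25), with inner_lr=0.1. The meta-learned initialization converges to about 1.52. Simply minimizing the summed task losses without adaptation would instead give 1.4. The two differ because MAML scores a starting point by the loss after adaptation, not before it.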

Sugandha Lahoti
02 Feb 2018
7 min read

How Deep Neural Networks can improve Speech Recognition and generation

While watching your favorite movie or TV show, you may sometimes have found it difficult to decipher what the characters are saying. This happens especially if they are talking really fast, or if you're watching a show in a language you don't know. You quickly turn on subtitles and voila, the problem is solved. But do you know how these subtitles work? In many cases, instead of a person writing them, a computer automatically recognizes the characters' speech and dialogue and generates the script. However, this is just a trivial example of what computers and neural networks can do in the field of speech understanding and generation. Today, we're going to talk about how deep neural networks have improved the ability of our computing systems to understand and generate human speech.
How traditional speech recognition systems work
Traditionally, speech recognition models used classification algorithms to arrive at a distribution of possible phonemes for each frame. These classification algorithms were based on highly specialized features such as Mel-frequency cepstral coefficients (MFCCs). Hidden Markov Models (HMMs) were used in the decoding phase. This model was accompanied by a pre-trained language model and was used to find the most likely sequence of phones that can be mapped to output words.
With the emergence of deep learning, neural networks were used in many aspects of speech recognition. These include phoneme classification, isolated word recognition, audiovisual speech recognition, audiovisual speaker recognition, and speaker adaptation. Even with deep learning components, such Automatic Speech Recognition (ASR) systems still require separate models: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM is typically trained to recognize context-dependent states or phonemes, by bootstrapping from an existing model that is used for alignment. The PM maps the sequences of phonemes produced by the AM into word sequences.
Word sequences are scored using an LM trained on large amounts of text data, which estimates the probabilities of word sequences. However, training independent components added complexity and was suboptimal compared to training all components jointly. This called for the ASR community to develop end-to-end systems, which attempt to learn the separate components of an ASR system jointly as a single system.
A single-system speech recognition model
End-to-end trained neural networks can essentially recognize speech without using an external pronunciation lexicon or a separate language model. End-to-end trained systems can directly map the input acoustic speech signal to word sequences. In such sequence-to-sequence models, the AM, PM, and LM are trained jointly in a single system. Since these models directly predict words, the process of decoding utterances is also greatly simplified. End-to-end ASR systems do not require bootstrapping from decision trees or time alignments generated by a separate system. This makes training such models simpler than training conventional ASR systems.
There are several sequence-to-sequence models, including connectionist temporal classification (CTC), the recurrent neural network (RNN) transducer, and attention-based models. CTC models are used to train end-to-end systems that directly predict grapheme sequences. This model was proposed by Graves et al. as a way of training end-to-end models without requiring a frame-level alignment of the target labels for a training utterance. Graves later extended this basic CTC model to include a separate recurrent LM component, in a model referred to as the RNN transducer. The RNN transducer augments the encoder network from the CTC model architecture with a separate recurrent prediction network over the output symbols. Attention-based models are also a type of end-to-end sequence model.
These models consist of an encoder network, which maps the input acoustics into a higher-level representation. They also have an attention-based decoder that predicts the next output symbol based on the previous predictions.
A schematic representation of various sequence-to-sequence modeling approaches
Google's Listen-Attend-Spell (LAS) end-to-end architecture is one such attention-based model. Their end-to-end system achieves a word error rate (WER) of 5.6%, a 16% relative improvement over a strong conventional system with a 6.7% WER. Additionally, the end-to-end model used to output the initial word hypothesis, before any hypothesis rescoring, is 18 times smaller than the conventional model.
These sequence-to-sequence models are comparable with traditional approaches on dictation test sets. However, the traditional models outperform end-to-end systems on voice-search test sets. Future work is being done on building optimal models for voice-search tests as well. More work is also expected on building multi-dialect and multi-lingual systems. In such systems, data for all dialects and languages could be combined to train one network, without a separate AM, PM, and LM for each dialect or language.
Enough with understanding speech. Let's talk about generating it
Text-to-speech (TTS) conversion, i.e., generating natural-sounding speech from text, allows people to converse with machines. It has been one of the top research goals in recent times. Deep neural networks have greatly improved the overall development of TTS systems, as well as enhanced individual pieces of such systems. In 2012, Google first used Deep Neural Networks (DNNs) in place of Gaussian Mixture Models (GMMs), which until then had been the core technology behind its speech systems. DNNs were better at assessing which sound is being produced at every instant in time, which increased speech recognition accuracy.
Later, better neural network acoustic models were built using CTC and sequence discriminative training techniques based on RNNs. Although fast and accurate, the TTS systems of the time were largely based on concatenative TTS. In that approach, a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This led to the development of parametric TTS. There, all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech are controlled via the inputs to the model.
WaveNet further enhanced these parametric models by directly modeling the raw waveform of the audio signal, one sample at a time. Working on raw waveforms, WaveNet yielded more natural-sounding speech and was able to model any kind of audio, including music. Baidu then came out with Deep Voice, a TTS system constructed entirely from deep neural networks. Their system was able to do audio synthesis in real time, giving up to a 400x speedup over previous WaveNet inference implementations. Google then released Tacotron, an end-to-end generative TTS model that synthesizes speech directly from characters. Tacotron achieved a 3.82 mean opinion score (MOS), outperforming the traditional parametric system in terms of speech naturalness. Tacotron was also considerably faster than sample-level autoregressive methods, because it generates speech at the frame level.
Most recently, Google released Tacotron 2, which took inspiration from past work on Tacotron and WaveNet. It features a Tacotron-style recurrent sequence-to-sequence feature prediction network that generates mel spectrograms. This is followed by a modified version of WaveNet that generates time-domain waveform samples conditioned on the generated mel spectrogram frames. The model achieved a MOS of 4.53, compared to a MOS of 4.58 for professionally recorded speech.
Deep neural networks have been a strong force behind the development of end-to-end speech recognition and generation models. Although these end-to-end models compare favorably with classical approaches, more work remains to be done. As of now, end-to-end speech models cannot process speech in real time, which is a strong requirement for latency-sensitive applications such as voice search, so more progress is expected in this area. End-to-end models also do not give the expected results when evaluated on live production data. They also have difficulty learning the correct spellings of rarely used words such as proper nouns, something that is handled quite easily when a separate pronunciation model (PM) is used. More effort will be needed to address these challenges as well.
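The WER figures quoted for the LAS system can be made concrete with a short Python sketch. WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words; the transcripts below are made-up illustrations, not data from Google's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline: float, new: float) -> float:
    """Fraction by which `new` reduces the error rate of `baseline`."""
    return (baseline - new) / baseline


print(word_error_rate("play the next song", "play next songs"))  # 0.5
print(round(relative_improvement(6.7, 5.6), 3))                  # 0.164
```

The second line shows that going from a 6.7% to a 5.6% WER is a (6.7 - 5.6) / 6.7, or roughly 16%, relative reduction, which is how the 16% figure for LAS is derived.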

Pravin Dhandre
14 Nov 2017
5 min read

How Reinforcement Learning works

This article is an excerpt from a book by Rodolfo Bonnin titled Machine Learning for Developers. Reinforcement learning is a field that has resurfaced recently, and it has become more popular in the fields of control and in finding solutions to games and situational problems, where a number of steps have to be implemented to solve a problem. A formal definition of reinforcement learning is as follows: "Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment." (Kaelbling et al. 1996). In order to have a reference frame for the type of problem we want to solve, we will start by going back to a mathematical concept developed in the 1950s, called the Markov decision process. Markov decision process Before explaining reinforcement learning techniques, we will explain the type of problem we will attack with them. When talking about reinforcement learning, we want to find the optimal way to act in a Markov decision process. It consists of a mathematical model that aids decision making in situations where the outcomes are partly random and partly under the control of an agent. The main elements of this model are an agent, an environment, and a state, as shown in the following diagram: (Figure: Simplified scheme of a reinforcement learning process.) The agent can perform certain actions (such as moving the paddle left or right). These actions can sometimes result in a reward $r_t$, which can be positive or negative (such as an increase or decrease in the score). Actions change the environment and can lead to a new state $s_{t+1}$, where the agent can perform another action $a_{t+1}$. The set of states, actions, and rewards, together with the rules for transitioning from one state to another, make up a Markov decision process.
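The elements just listed can be written down directly in code. The following Python sketch is illustrative only: the MDP class, the three-state toy problem, and its probabilities and rewards are hypothetical, not the book's example. It stores, for every state and action, a list of (probability, next state, reward) outcomes, so a single step produces an outcome that is partly random and partly under the agent's control.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MDP:
    # transitions[state][action] -> list of (probability, next_state, reward)
    transitions: dict
    terminal: set = field(default_factory=set)

    def actions(self, state):
        """Actions available in `state` (none in a terminal state)."""
        return [] if state in self.terminal else list(self.transitions[state])

    def step(self, state, action, rng=random):
        """Sample (next_state, reward): the action is chosen, the outcome is random."""
        outcomes = self.transitions[state][action]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


# Hypothetical three-state problem: "right" usually makes progress, "left" goes back.
toy = MDP(
    transitions={
        "start": {"left": [(1.0, "start", 0.0)],
                  "right": [(0.8, "middle", 0.0), (0.2, "start", 0.0)]},
        "middle": {"left": [(1.0, "start", 0.0)],
                   "right": [(0.9, "goal", 1.0), (0.1, "middle", 0.0)]},
    },
    terminal={"goal"},
)

rng = random.Random(0)
print(toy.actions("start"))             # ['left', 'right']
print(toy.step("start", "right", rng))  # a sampled (next_state, reward) pair
```

Explicit probability lists are the simplest representation for small, finite problems; larger problems usually replace the table with a simulator that can only be sampled.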
Decision elements To understand the problem, let's situate ourselves in the problem-solving environment and look at the main elements:
- The set of states
- The action to take, which is to go from one place to another
- The reward function, which is the value represented by the edge
- The policy, which is the way to complete the task
- A discount factor, which determines the importance of future rewards
The main difference from traditional forms of supervised and unsupervised learning is the time taken to calculate the reward: in reinforcement learning it is not instantaneous, but comes after a set of steps. The next state depends on the current state and the decision maker's action, and not on all of the previous states (the process has no memory), so it complies with the Markov property. Since this is a Markov decision process, the probability of state $s_{t+1}$ depends only on the current state $s_t$ and action $a_t$:
$$P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)$$
(Figure: Unrolled reinforcement mechanism.) The goal of the whole process is to generate a policy P that maximizes rewards. The training samples are tuples, <s, a, r>. Optimizing the Markov process Reinforcement learning is an iterative interaction between an agent and the environment. The following occurs at each timestep:
- The process is in a state, and the decision maker may choose any action that is available in that state.
- The process responds at the next timestep by randomly moving into a new state and giving the decision maker a corresponding reward.
- The probability that the process moves into its new state is influenced by the chosen action, in the form of a state transition function.
Basic RL techniques: Q-learning One of the most well-known reinforcement learning techniques, and the one we will be implementing in our example, is Q-learning. Q-learning can be used to find an optimal action for any given state in a finite Markov decision process.
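The per-timestep interaction loop and the role of the discount factor described in the decision elements can be sketched in a few lines of Python. This is a hedged illustration rather than the book's code: corridor_step is a hypothetical deterministic environment (a real MDP would sample the next state), and the fixed "always move right" policy stands in for a learned one.

```python
def discounted_return(rewards, gamma):
    """Total discounted future reward: r_0 + gamma*r_1 + gamma**2*r_2 + ..."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def run_episode(env_step, policy, start_state, terminal, max_steps=100):
    """Let the agent interact with the environment and collect <s, a, r> tuples."""
    samples, state = [], start_state
    for _ in range(max_steps):
        if state in terminal:
            break
        action = policy(state)                        # decision maker picks an action
        next_state, reward = env_step(state, action)  # environment responds
        samples.append((state, action, reward))
        state = next_state
    return samples


def corridor_step(state, action):
    """Hypothetical deterministic environment: cells 0..3, reward 1 on reaching cell 3."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == 3 else 0.0)


episode = run_episode(corridor_step, lambda s: "right", start_state=0, terminal={3})
print(episode)  # [(0, 'right', 0.0), (1, 'right', 0.0), (2, 'right', 1.0)]
print(round(discounted_return([r for _, _, r in episode], gamma=0.9), 4))  # 0.81
```

With gamma = 0.9, the single reward that arrives two steps after the first decision is worth 0.9 × 0.9 = 0.81 from the starting state, which is exactly the delayed-reward effect that distinguishes reinforcement learning from supervised learning.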
Q-learning tries to maximize the value of the Q-function, which represents the maximum discounted future reward when we perform action a in state s. Once we know the Q-function, the optimal action a in state s is the one with the highest Q-value. We can then define a policy π(s) that gives us the optimal action in any state, expressed as follows:
$$\pi(s) = \arg\max_{a} Q(s, a)$$
We can define the Q-function for a transition point $(s_t, a_t, r_t, s_{t+1})$ in terms of the Q-function at the next point $(s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2})$, similar to what we did with the total discounted future reward. This equation is known as the Bellman equation for Q-learning:
$$Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$$
In practice, we can think of the Q-function as a lookup table (called a Q-table) where the states (denoted by s) are rows, the actions (denoted by a) are columns, and the elements (denoted by Q(s, a)) are the estimated discounted future rewards that you get if you are in the state given by the row and take the action given by the column. The best action to take in any state is the one with the highest value:

    initialize Q-table Q
    observe initial state s
    while (! game_finished):
        select and perform action a
        get reward r
        advance to state s'
        Q(s, a) = Q(s, a) + α(r + γ max_a' Q(s', a') - Q(s, a))
        s = s'

You will notice that the algorithm is basically doing stochastic gradient descent on the Bellman equation, backpropagating the reward through the state space (or episode) and averaging over many trials (or epochs). Here, α is the learning rate that determines how much of the difference between the previous Q-value and the discounted new maximum Q-value should be incorporated, and γ is the discount factor. We can represent this process with the following flowchart: We have successfully reviewed Q-learning, one of the most important and innovative reinforcement learning techniques to have appeared in recent years.
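The Q-table pseudocode can be made runnable in plain Python. This is a minimal sketch under stated assumptions: the five-cell corridor environment, the epsilon-greedy action selection (the pseudocode only says "select and perform action a"), and the hyperparameters alpha = 0.1, gamma = 0.9, and epsilon = 0.1 are illustrative choices, not values from the book. The update line mirrors the pseudocode, except that reaching a terminal state contributes no future reward.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = ("left", "right")


def step(state, action):
    """Toy environment: move one cell; reward 1 only when the goal is reached."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialized to 0
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Select an action: explore with probability epsilon, else act greedily
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s].values())
                a = rng.choice([act for act in ACTIONS if q[s][act] == best])
            s_next, r, done = step(s, a)
            # Bellman target; a terminal state has no future reward
            target = r if done else r + gamma * max(q[s_next].values())
            # Q(s, a) = Q(s, a) + alpha * (target - Q(s, a))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)}
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

Reading the greedy action off each row of the learned Q-table gives the policy π(s) = argmax_a Q(s, a). In this corridor every non-goal cell ends up preferring "right", and the learned values approach γ raised to the number of remaining steps.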
Every day, such reinforcement models are applied in innovative ways, whether to generate feasible new elements from a selection of previously known classes or even to win against professional players in strategy games. If you enjoyed this excerpt from the book Machine Learning for Developers, check out the book.
Sugandha Lahoti
12 Mar 2018
6 min read

How to improve interpretability of machine learning systems

Advances in machine learning have greatly improved products, processes, and research, as well as how people interact with computers. One thing machine learning systems often lack, however, is the ability to explain their predictions. Without a proper explanation of results, end users lose trust in the system, which ultimately acts as a barrier to the adoption of machine learning. Hence, along with the impressive results from machine learning, it is also important to understand why and where it works, and when it won't. In this article, we will talk about some ways to increase machine learning interpretability and make predictions from machine learning models understandable. 3 interesting methods for interpreting machine learning predictions According to Miller, interpretability is the degree to which a human can understand the cause of a decision. Interpretable predictions lead to better trust and provide insight into how the model may be improved. Current machine learning developments rely on complex models that lack interpretability. Simpler models (e.g., linear models), on the other hand, often give a correct interpretation of a prediction model's output, but they are often less accurate than complex models. This creates a tension between accuracy and interpretability. Complex models are less interpretable because their relationships are generally not concisely summarized. However, if we focus on a prediction made on a particular sample, we can describe the relationships more easily. Balancing the trade-off between model complexity and interpretability lies at the heart of research on interpretable deep learning and machine learning models. We will discuss a few methods that increase the interpretability of complex ML models by summarizing model behavior with respect to a single prediction.
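To see why a linear model is easy to interpret for a single prediction, consider a minimal NumPy sketch. The coefficients, the sample, and the choice of baseline (for example, a training-set mean) are hypothetical. Each feature's contribution is its weight times how far the feature sits from the baseline, and these contributions sum exactly to the difference between the prediction and the baseline prediction. Loosely, this difference-from-reference idea is what DeepLIFT extends to deep networks.

```python
import numpy as np


def linear_contributions(weights, bias, x, baseline):
    """Explain one prediction of f(x) = w . x + b relative to a baseline input.

    contribution_i = w_i * (x_i - baseline_i), and the contributions add up
    exactly to f(x) - f(baseline).
    """
    weights, x, baseline = (np.asarray(v, dtype=float) for v in (weights, x, baseline))
    contributions = weights * (x - baseline)
    prediction = weights @ x + bias
    baseline_prediction = weights @ baseline + bias
    return contributions, prediction, baseline_prediction


w, b = np.array([2.0, -1.0, 0.5]), 1.0  # hypothetical fitted coefficients
x = np.array([3.0, 2.0, 4.0])            # the single sample being explained
ref = np.array([1.0, 1.0, 2.0])          # assumed baseline, e.g. a training-set mean
contrib, pred, base = linear_contributions(w, b, x, ref)
print(contrib)      # per-feature contributions: 4, -1 and 1
print(pred - base)  # 4.0
```

For a complex nonlinear model no such exact decomposition is available, which is why the methods that follow either approximate the model locally or propagate contributions layer by layer.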
LIME, or Local Interpretable Model-Agnostic Explanations, is a method developed in the paper "Why Should I Trust You?" for interpreting individual model predictions by locally approximating the model around a given prediction. LIME uses two approaches to explain specific predictions: perturbation and linear approximation. With perturbation, LIME takes a prediction that requires explanation and systematically perturbs its inputs. These perturbed inputs, labeled with the complex model's predictions, become new training data for a simpler approximate model. LIME then performs local linear approximation by fitting a linear model to describe the relationships between the (perturbed) inputs and outputs. In this way, a simple linear model approximates the more complex, nonlinear function in the neighborhood of the prediction. DeepLIFT (Deep Learning Important FeaTures) is another method, which serves as a recursive prediction explanation method for deep learning. This method decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT assigns contribution scores based on the difference between the activation of each neuron and its 'reference activation'. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies that are missed by other approaches. Layer-wise relevance propagation is another method for interpreting the predictions of deep learning models. It determines which features in a particular input vector contribute most strongly to a neural network's output. It defines a set of constraints from which a number of different relevance propagation functions can be derived. We have now seen 3 different ways of summarizing model behavior with respect to a single prediction to increase model interpretability. Another important avenue for interpreting machine learning models is to understand (and rethink) generalization.
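The perturbation-plus-linear-approximation recipe behind LIME can be sketched with NumPy alone. This is a simplified illustration, not the lime package's API: the Gaussian perturbations, the exponential proximity kernel, the kernel width, and the black_box function are all assumptions made for the example.

```python
import numpy as np


def local_linear_explanation(predict, x, num_samples=1000, scale=0.5,
                             kernel_width=1.0, seed=0):
    """Simplified LIME-style surrogate (a sketch, not the `lime` package).

    1. Perturb x with Gaussian noise.
    2. Label each perturbed point with the black-box `predict` function.
    3. Weight each point by its proximity to x (exponential kernel).
    4. Fit a weighted linear model; its slopes describe predict() near x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    samples = x + rng.normal(0.0, scale, size=(num_samples, x.size))
    labels = predict(samples)
    distances = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Weighted least squares: scale rows by sqrt(weight), include an intercept column
    design = np.hstack([np.ones((num_samples, 1)), samples - x])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], labels * sqrt_w, rcond=None)
    return coef[0], coef[1:]  # local intercept, per-feature local slopes


def black_box(X):
    """Hypothetical nonlinear model: f(x) = sin(x0) + x1**2."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2


intercept, slopes = local_linear_explanation(black_box, [0.0, 1.0])
print(slopes)  # roughly [0.9, 2.0]; the exact gradient at x is [1, 2]
```

Because the surrogate is fitted over a neighborhood rather than at a single point, its slopes are smoothed versions of the local gradient; shrinking scale or kernel_width makes the explanation more local.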
What is generalization and how it affects machine learning interpretability Machine learning algorithms are trained on certain datasets, called training sets. During training, a model learns intrinsic patterns in the data and updates its internal parameters to better fit the data. Once training is over, the model is evaluated on test data, predicting results based on what it has learned. In an ideal scenario, the model would always accurately predict the results for the test data. In reality, the model may capture all the relevant information in the training data and still fail when presented with new data. This difference between "training error" and "test error" is called the generalization error. Generalization is the ultimate goal when turning a machine learning system into a scalable product: every ML task aims to create an algorithm that performs well on data it has not seen, not just on the data it was trained on. The ability to distinguish models that generalize well from those that do not will not only help make ML models more interpretable, but might also lead to more principled and reliable model architecture design. According to conventional statistical theory, a small generalization error is due either to properties of the model family or to the regularization techniques used during training. A recent ICLR 2017 paper, Understanding deep learning requires rethinking generalization, shows that current theoretical frameworks fail to explain the impressive results of deep learning approaches, and argues that understanding deep learning therefore requires rethinking generalization. The authors support their findings through extensive systematic experiments, showing, for example, that standard deep networks can perfectly fit training data whose labels have been replaced with random ones. Developing human understanding through visualizing ML models Interpretability also means creating models that support human understanding of machine learning.
Human interpretation is enhanced when visual and interactive diagrams and figures are used to explain the results of ML models. This is why a tight interplay between UX design and machine learning is essential for increasing machine learning interpretability. In the spirit of human-centered machine learning, researchers at Google, OpenAI, DeepMind, YC Research, and others have come up with Distill. This open science journal features articles that give a clear exposition of machine learning concepts using excellent interactive visualizations. Most of these articles are aimed at understanding the inner workings of various machine learning techniques. Some of these include:
- An article on Attention and Augmented Recurrent Neural Networks, which has a beautiful visualization of the attention distribution in RNNs.
- An article on feature visualization, which talks about how neural networks build up their understanding of images.
Google has also launched the PAIR (People + AI Research) initiative to study and design the most effective ways for people to interact with AI systems. It helps researchers understand ML systems through work on interpretability and by expanding the community of developers. R2D3 is another website, which provides an excellent visual introduction to machine learning. Facets is a tool for visualizing and understanding training datasets, providing a human-centered approach to ML engineering. Conclusion Human-centered machine learning is all about increasing the interpretability of ML systems and developing human understanding of them. It is also about ML and AI systems understanding how humans reason, communicate, and collaborate. As algorithms are used to make decisions in more aspects of everyday life, it's important for data scientists to train them thoughtfully to ensure the models make decisions for the right reasons.
As more progress is made in this area, ML systems will be less likely to make commonsense errors, violate user expectations, or place themselves in situations that can lead to conflict and harm, making such systems safer to use. As research continues, machines will soon be able to fully explain their decisions and results in the most human-understandable way possible.

Aarthi Kumaraswamy
08 Nov 2017
6 min read

Data Scientist: The sexiest role of the 21st century

"Information is the oil of the 21st century, and analytics is the combustion engine." - Peter Sondergaard, Gartner Research. By 2018, it is estimated that companies will spend $114 billion on big data-related projects, an increase of roughly 300% compared to 2013 (https://www.capgemini-consulting.com/resource-file-access/resource/pdf/big_data_pov_03-02-15.pdf). Much of this increase in expenditure is due to how much data is being created and how we are better able to store such data by leveraging distributed filesystems such as Hadoop. However, collecting the data is only half the battle; the other half involves data extraction, transformation, and loading into a computation system, which leverages the power of modern computers to apply various mathematical methods in order to learn more about data and patterns and extract useful information to make relevant decisions. The entire data workflow has been boosted in the last few years not only by increasing computation power and providing easily accessible and scalable cloud services (for example, Amazon AWS, Microsoft Azure, and Heroku) but also by a number of tools and libraries that help to easily manage, control, and scale infrastructure and build applications. Such growth in computation power also helps to process larger amounts of data and to apply algorithms that were impossible to apply earlier. Finally, various computation-expensive statistical or machine learning algorithms have started to help extract nuggets of information from data. Finding a uniform definition of data science is akin to tasting wine and comparing flavor profiles among friends: everyone has their own definition, and no one description is more accurate than the other. At its core, however, data science is the art of asking intelligent questions about data and receiving intelligent answers that matter to key stakeholders. Unfortunately, the opposite also holds true: ask lousy questions of the data and get lousy answers!
Therefore, careful formulation of the question is the key to extracting valuable insights from your data. For this reason, companies are now hiring data scientists to help formulate and ask these questions. At first, it's easy to paint a stereotypical picture of what a typical data scientist looks like: t-shirt, sweatpants, thick-rimmed glasses, and debugging a chunk of code in IntelliJ... you get the idea. Aesthetics aside, what are some of the traits of a data scientist? One of our favorite posters describing this role is shown in the following diagram: Math, statistics, and general knowledge of computer science are a given, but one pitfall that we see among practitioners has to do with understanding the business problem, which goes back to asking intelligent questions of the data. It cannot be emphasized enough: asking more intelligent questions of the data is a function of the data scientist's understanding of the business problem and the limitations of the data; without this fundamental understanding, even the most intelligent algorithm would be unable to come to solid conclusions based on a wobbly foundation. A day in the life of a data scientist This will probably come as a shock to some of you: being a data scientist is more than reading academic papers, researching new tools, and building models until the wee hours of the morning, fueled on espresso. In fact, this is only a small percentage of the time that a data scientist gets to truly play (the espresso part, however, is 100% true for everyone)! Most of the day is spent in meetings, gaining a better understanding of the business problem(s), crunching the data to learn its limitations (take heart, this book will expose you to a ton of different feature engineering or feature extraction tasks), and working out how best to present the findings to non-data-sciencey people.
This is where the true sausage-making process takes place, and the best data scientists are the ones who relish this process because they are gaining more understanding of the requirements and benchmarks for success. In fact, we could literally write a whole new book describing this process from top to tail! So, what (and who) is involved in asking questions about data? Sometimes, it is a process of saving data into a relational database and running SQL queries to find insights into the data: "For the millions of users who bought this particular product, what are the top 3 OTHER products also bought?" Other times, the question is more complex, such as, "Given the review of a movie, is this a positive or negative review?" This book is mainly focused on complex questions, like the latter. Answering these types of questions is where businesses really get the most impact from their big data projects, and is also where we see a proliferation of emerging technologies that aim to make this Q and A system easier, with more functionality. Some of the most popular open source frameworks that help answer data questions include R, Python, Julia, and Octave, all of which perform reasonably well with small (X < 100 GB) datasets. At this point, it's worth stopping and pointing out a clear distinction between big and small data. Our general rule of thumb in the office goes as follows: if you can open your dataset using Excel, you are working with small data. Working with big data What happens when the dataset in question is so vast that it cannot fit into the memory of a single computer and must be distributed across a number of nodes in a large computing cluster? Can't we just rewrite some R code, for example, and extend it to account for more than a single-node computation? If only things were that simple! There are many reasons why scaling algorithms to more machines is difficult.
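The simpler kind of question mentioned earlier, the "also bought" query, is the sort of thing a relational database answers well. As a hedged sketch, the following uses Python's built-in sqlite3 module with an in-memory database; the purchases table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def top_also_bought(purchases, product, k=3):
    """Top-k other products bought by users who also bought `product`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE purchases (user_id INTEGER, product TEXT)")
        conn.executemany("INSERT INTO purchases VALUES (?, ?)", purchases)
        return conn.execute(
            """
            SELECT other.product, COUNT(DISTINCT other.user_id) AS buyers
            FROM purchases AS target
            JOIN purchases AS other
              ON other.user_id = target.user_id
             AND other.product <> target.product
            WHERE target.product = ?
            GROUP BY other.product
            ORDER BY buyers DESC, other.product
            LIMIT ?
            """,
            (product, k),
        ).fetchall()
    finally:
        conn.close()


purchases = [(1, "book"), (1, "lamp"), (2, "book"), (2, "lamp"), (2, "pen"),
             (3, "book"), (3, "mug"), (4, "lamp"), (4, "pen")]
print(top_also_bought(purchases, "book"))  # [('lamp', 2), ('mug', 1), ('pen', 1)]
```

COUNT(DISTINCT ...) counts buyers rather than purchase events, so a user who bought the same item twice is counted once; the self-join is also exactly the part that becomes expensive once the table no longer fits on one machine.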
Imagine a simple example of a file containing a list of names:

    B
    D
    X
    A
    D
    A

We would like to compute the number of occurrences of individual names in the file. If the file fits into a single machine, you can easily compute the number of occurrences by using a combination of the Unix tools sort and uniq:

    bash> sort file | uniq -c

The output is as follows:

    2 A
    1 B
    2 D
    1 X

However, if the file is huge and distributed over multiple machines, it is necessary to adopt a slightly different computation strategy: for example, compute the number of occurrences of individual names for every part of the file that fits into memory, and then merge the results together. Hence, even simple tasks, such as counting the occurrences of names, can become more complicated in a distributed environment. The above is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla, and Michal Malohlava. If you would like to learn how to solve the above problem and other cool machine learning tasks a data scientist carries out, such as the following, check out the book:
- Use Spark streams to cluster tweets online
- Run the PageRank algorithm to compute user influence
- Perform complex manipulation of DataFrames using Spark
- Define Spark pipelines to compose individual data transformations
- Utilize generated models for offline/online prediction
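The count-then-merge strategy described for the distributed case can be sketched in plain Python. Here each inner list is a hypothetical stand-in for the part of the file held by one machine; count_chunk plays the role of the per-machine count and merge_counts combines the partial results. A real cluster would run these steps through a framework such as Spark rather than a single process.

```python
from collections import Counter
from functools import reduce


def count_chunk(lines):
    """Count names in one part of the file that fits in memory."""
    return Counter(line.strip() for line in lines if line.strip())


def merge_counts(partial_counts):
    """Merge the per-chunk results into global totals."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical split of the example file across two machines
chunks = [["B", "D", "X"], ["A", "D", "A"]]
totals = merge_counts(count_chunk(chunk) for chunk in chunks)
for name, count in sorted(totals.items()):
    print(count, name)  # 2 A / 1 B / 2 D / 1 X, one pair per line
```

The totals match the sort and uniq -c output for the whole file, because adding per-chunk counts gives the same result regardless of how the file is split; that associativity is what makes the merge step safe to distribute.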

Richard Gall
30 Apr 2018
2 min read

Thanks to DeepCode, AI can help you write cleaner code

DeepCode is a tool that uses artificial intelligence to help software engineers write cleaner code. It's a bit like Grammarly or the Hemingway Editor, but for code. It works in an ingenious way: using AI, it reads your GitHub repositories and highlights anything that might be broken or cause compatibility issues. It is currently only available for Java, JavaScript, and Python, but more languages are going to be added. DeepCode is more than a debugger Sure, DeepCode might sound a little like a glorified debugger, but it's important to understand that it's much more than that. It doesn't just correct errors; it can actually help you improve the code you write. That means the project's mission isn't just code that works, but code that works better. It's thanks to AI that DeepCode is able to support code performance too: the software learns 'rules' about how code works best. And because DeepCode is an AI system, it's only going to get better as it learns more. Speaking to TechCrunch, Boris Paskalev claimed that DeepCode has more than 250,000 rules, and that this number is "growing daily." Paskalev went on to explain: "We built a platform that understands the intent of the code... We autonomously understand millions of repositories and note the changes developers are making. Then we train our AI engine with those changes and can provide unique suggestions to every single line of code analyzed by our platform." DeepCode is a compelling prospect for developers. As applications become more complex and efficiency becomes increasingly important, a simple way to unlock greater performance could be invaluable. It's no surprise that it has already raised 1.1 million in investment from the VC company btov. It's only going to become more popular with investors as the popularity of the platform grows. This might mean the end of spaghetti code, which can only be a good thing. Find out more about DeepCode and its pricing on the DeepCode website.

Melisha Dsouza
03 Sep 2018
3 min read

Baidu releases EZDL - a platform that lets you build AI and machine learning models without any coding knowledge

Chinese internet giant Baidu released 'EZDL' on September 1. EZDL allows businesses to create and deploy AI and machine learning models without any prior coding skills. With a simple drag-and-drop interface, it takes only four steps to train a deep learning model that's built specifically for a business's needs. This is particularly good news for small and medium-sized businesses, for whom leveraging artificial intelligence might ordinarily prove challenging. Youping Yu, general manager of Baidu's AI ecosystem division, claims that EZDL will allow everyone to access AI "in the most convenient and equitable way". How does EZDL work? EZDL focuses on three important aspects of machine learning: image classification, sound classification, and object detection. One of the most notable features of EZDL is the small size of the training datasets required to create artificial intelligence models. For image classification and object recognition, it requires just 20 to 100 images per label. For sound classification, it needs only 50 audio files at most. Training can be completed in just 15 minutes in some cases, or a maximum of one hour for more complex models. After a model has been trained, the algorithm can be downloaded as an SDK or uploaded into a public or private cloud platform. The algorithms created support a range of operating systems, including Android and iOS. Baidu also claims an accuracy of more than 90 percent in two-thirds of the models it creates. How EZDL is already being used by businesses Baidu has demonstrated many use cases for EZDL. For example:
- A home decorating website called 'Idcool' uses EZDL to train systems that automatically identify the design and style of a room with 90 percent accuracy.
- An unnamed medical institution is using EZDL to develop a detection model for blood testing.
- A security monitoring firm used it to make a sound-detecting algorithm that can recognize "abnormal" audio patterns that might signal a break-in.
Baidu is clearly making its mark in the AI race. This latest release follows the launch of its Baidu Brain platform for enterprises two years ago. Baidu Brain is already used by more than 600,000 developers. Another AI service launched by the company is its conversational DuerOS digital assistant, which is installed on more than 100 million devices. As if all that weren't enough, Baidu has also been developing hardware for artificial intelligence systems in the form of its Kunlun chip, designed for edge computing and data center processing; it's slated for launch later this year. Baidu will demo EZDL at TechCrunch Disrupt SF, September 5th to 7th, at Moscone West, 800 Howard St., San Francisco. For more on EZDL, visit Baidu's website for the project.
Richard Gall
25 Jul 2018
3 min read

Google Cloud Next: Fei-Fei Li reveals new AI tools for developers

AI was always going to be a central theme of this year's Google Cloud Next, and the company hasn't disappointed. In a blog post, Fei-Fei Li, Chief Scientist of AI/ML at Google Cloud, has revealed a number of new products that will make AI more accessible for developers. Expanding Cloud AutoML (Image: Fei-Fei Li at AI for Good in 2017, via commons.wikimedia.org.) In her blog post, Li notes that there is a "significant gap" in the machine learning world. On the one hand, data scientists build solutions from the ground up, while on the other, pre-trained solutions can deliver immediate results with little work from engineers. With Cloud AutoML, Google has made a pitch to the middle ground: those that require more sophistication than pre-built models, but don't have the resources to build a system from scratch. Li provides detail on a number of new developments within the Cloud AutoML project that are being launched as part of Google Cloud Next. These include AutoML Vision, which "extends the Cloud Vision API to recognize entirely new categories of images." They also include two completely new language-related machine learning tools: AutoML Natural Language and AutoML Translation. AutoML Natural Language will allow users to perform natural language processing; this could, for example, help organizations manage content at scale. AutoML Translation, meanwhile, could be particularly useful for organizations looking to go global with content distribution and marketing. Improvements to Google machine learning APIs Li also revealed that Google is launching updates to a number of key APIs: the Google Cloud Vision API "now recognizes handwriting, supports additional file types (PDF and TIFF) and product search, and can identify where an object is located within an image," according to Li.
The Cloud Text-to-Speech and Cloud Speech-to-Text APIs also have updates that build in greater sophistication in areas such as translation. Bringing AI to customer service with Contact Center AI The final important announcement by Li centers on conversational UI using AI. Part of this was an update to Dialogflow Enterprise Edition, a Google-owned tool that makes building conversational UI easier. Text-to-speech capabilities have been added to the tool alongside its speech-to-text capability, which came with its launch in November 2017. But the big reveal is Contact Center AI. This builds on Dialogflow and is essentially a complete customer service AI solution. Contact Center AI bridges the gap between virtual assistant and human customer service representative, supporting the entire journey from customer query to resolution. It has the potential to be a game changer when it comes to customer support.

Expert Network
15 Mar 2021
9 min read

Convolutional Neural Networks (CNNs) - A Breakthrough In Image Recognition 

A CNN is a combination of two components: a feature extractor module followed by a trainable classifier. The first component includes a stack of convolution, activation, and pooling layers. A dense neural network (DNN) does the classification; each neuron in a layer is connected to those in the next layer. This article is an excerpt from the book Machine Learning Using TensorFlow Cookbook by Alexia Audevart, Konrad Banachewicz, and Luca Massaron, who are Kaggle Masters and Google Developer Experts. Implementing a simple CNN In this section, we will develop a CNN based on the LeNet-5 architecture, which was first introduced in 1998 by Yann LeCun et al. for handwritten and machine-printed character recognition. Figure 1: LeNet-5 architecture – Original image published in [LeCun et al., 1998] This architecture consists of two sets of convolution-ReLU-max pooling operations used for feature extraction, followed by a flattening layer and two fully connected layers to classify the images. Our goal will be to improve upon our accuracy in predicting MNIST digits. Getting ready To access the MNIST data, Keras provides a package (tf.keras.datasets) that has excellent dataset-loading functionality. (Note that TensorFlow also provides its own collection of ready-to-use datasets with the TF Datasets API.) After loading the data, we will set up our model variables, create the model, train the model in batches, and then visualize loss, accuracy, and some sample digits. How to do it... Perform the following steps:

1. First, we'll load the necessary libraries:

    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow as tf

2. Next, we will load the data and reshape the images into a four-dimensional matrix:

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

    # Reshape to (num_samples, height, width, channels)
    x_train = x_train.reshape(-1, 28, 28, 1)
    x_test = x_test.reshape(-1, 28, 28, 1)

    # Pad the images by 2 pixels on each side (28x28 -> 32x32)
    x_train = np.pad(x_train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    x_test = np.pad(x_test, ((0,0),(2,2),(2,2),(0,0)), 'constant')

Note that the MNIST dataset downloaded here includes training and test datasets. These datasets are composed of the grayscale images (integer arrays with shape (num_samples, 28, 28)) and the labels (integers in the range 0-9). We pad the images by 2 pixels since the input images in the LeNet-5 paper were 32x32.

3. Now, we will set the model parameters. Remember that the depth of the image (number of channels) is 1 because these images are grayscale. We'll also set up a seed to have reproducible results:

    image_width = x_train[0].shape[0]
    image_height = x_train[0].shape[1]
    num_channels = 1  # grayscale = 1 channel
    seed = 98
    np.random.seed(seed)
    tf.random.set_seed(seed)

4. We'll declare our training and evaluation parameters. We will have different batch sizes for training and evaluation. You may change these, depending on the physical memory that is available for training and evaluating:

    batch_size = 100
    evaluation_size = 500
    epochs = 300
    eval_every = 5

5. We'll normalize our images to change the values of all pixels to a common scale:

    x_train = x_train / 255
    x_test = x_test / 255

6. Now we'll declare our model. We will have the feature extractor module composed of two convolutional/ReLU/max pooling layers, followed by the classifier with fully connected layers. To get the classifier to work, we flatten the output of the feature extractor module so we can use it in the classifier. Note that we use a softmax activation function at the last layer of the classifier.
Softmax turns numeric output (logits) into probabilities that sum to one.

input_data = tf.keras.Input(dtype=tf.float32, shape=(image_width, image_height, num_channels), name="INPUT")

# First Conv-ReLU-MaxPool Layer
conv1 = tf.keras.layers.Conv2D(filters=6,
                               kernel_size=5,
                               padding='VALID',
                               activation="relu",
                               name="C1")(input_data)
max_pool1 = tf.keras.layers.MaxPool2D(pool_size=2,
                                      strides=2,
                                      padding='SAME',
                                      name="S1")(conv1)

# Second Conv-ReLU-MaxPool Layer
conv2 = tf.keras.layers.Conv2D(filters=16,
                               kernel_size=5,
                               padding='VALID',
                               strides=1,
                               activation="relu",
                               name="C3")(max_pool1)
max_pool2 = tf.keras.layers.MaxPool2D(pool_size=2,
                                      strides=2,
                                      padding='SAME',
                                      name="S4")(conv2)

# Flatten Layer
flatten = tf.keras.layers.Flatten(name="FLATTEN")(max_pool2)

# First Fully Connected Layer
fully_connected1 = tf.keras.layers.Dense(units=120,
                                         activation="relu",
                                         name="F5")(flatten)

# Second Fully Connected Layer
fully_connected2 = tf.keras.layers.Dense(units=84,
                                         activation="relu",
                                         name="F6")(fully_connected1)

# Final Fully Connected Layer
final_model_output = tf.keras.layers.Dense(units=10,
                                           activation="softmax",
                                           name="OUTPUT")(fully_connected2)

model = tf.keras.Model(inputs=input_data, outputs=final_model_output)

7. Next, we will compile the model using an Adam (Adaptive Moment Estimation) optimizer. Adam uses adaptive learning rates and momentum, which allow us to reach local minima faster and so to converge faster. As our targets are integers and not in a one-hot encoded format, we will use the sparse categorical cross-entropy loss function. We will also add an accuracy metric to determine how accurate the model is on each batch.

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

8. Next, we print a string summary of our network:

model.summary()

Figure 4: The LeNet-5 architecture

The LeNet-5 model has 7 layers and contains 61,706 trainable parameters. Now let's train the model.

9. We can now start training our model. At each iteration, we train on one randomly chosen batch. Every eval_every iterations, we evaluate the model on a random batch of test data and record the training loss and accuracy along with the test accuracy. Note that each "epoch" in this loop is a single training step on one batch of 100 images, not a full pass over the 60,000 training images. After 300 such steps, we quickly achieve 96%-97% accuracy on the test data:

train_loss = []
train_acc = []
test_acc = []
for i in range(epochs):
    # Sample a random training batch (with replacement) and take one training step
    rand_index = np.random.choice(len(x_train), size=batch_size)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    history_train = model.train_on_batch(rand_x, rand_y)
    if (i + 1) % eval_every == 0:
        # Evaluate on a random sample of the test set
        eval_index = np.random.choice(len(x_test), size=evaluation_size)
        eval_x = x_test[eval_index]
        eval_y = y_test[eval_index]
        history_eval = model.evaluate(eval_x, eval_y, verbose=0)
        # Record and print results
        train_loss.append(history_train[0])
        train_acc.append(history_train[1])
        test_acc.append(history_eval[1])
        acc_and_loss = [(i + 1), history_train[0], history_train[1], history_eval[1]]
        acc_and_loss = [np.round(x, 2) for x in acc_and_loss]
        print('Epoch # {}. Train Loss: {:.2f}. Train Acc (Test Acc): {:.2f} ({:.2f})'.format(*acc_and_loss))

10. This results in the following output:

Epoch # 5. Train Loss: 2.19. Train Acc (Test Acc): 0.23 (0.34)
Epoch # 10. Train Loss: 2.01. Train Acc (Test Acc): 0.59 (0.58)
Epoch # 15. Train Loss: 1.71. Train Acc (Test Acc): 0.74 (0.73)
Epoch # 20. Train Loss: 1.32. Train Acc (Test Acc): 0.73 (0.77)
...
Epoch # 290. Train Loss: 0.18. Train Acc (Test Acc): 0.95 (0.94)
Epoch # 295. Train Loss: 0.13. Train Acc (Test Acc): 0.96 (0.96)
Epoch # 300. Train Loss: 0.12. Train Acc (Test Acc): 0.95 (0.97)

11. The following code plots the loss and accuracy using Matplotlib:

# Matplotlib code to plot the loss and accuracy
eval_indices = range(eval_every, epochs + 1, eval_every)  # epochs at which results were recorded

# Plot loss over time
plt.plot(eval_indices, train_loss, 'k-')
plt.title('Loss per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()

# Plot train and test accuracy
plt.plot(eval_indices, train_acc, 'k-', label='Train Set Accuracy')
plt.plot(eval_indices, test_acc, 'r--', label='Test Set Accuracy')
plt.title('Train and Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

We then get the following plots:

Figure 5: The left plot is the train and test set accuracy across our 300 training epochs. The right plot is the softmax loss value over 300 epochs.
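As a sanity check on the 61,706 trainable parameters reported by model.summary() in step 8, the following sketch traces output shapes and parameter counts through the layers defined in step 6 using plain Python arithmetic. It assumes the padded 32x32x1 input, 5x5 kernels with 'VALID' padding, and 2x2 'SAME' max pooling with stride 2, as in the model code; the helper functions are hypothetical and not part of Keras.

```python
def conv_output_size(size, kernel, stride=1):
    """Output height/width of a 'VALID' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def same_pool_output_size(size, stride):
    """Output height/width of 'SAME'-padded pooling: ceil(size / stride)."""
    return -(-size // stride)


def lenet5_summary(input_size=32, channels=1, kernel=5):
    """Return (layer name, output shape, trainable params) rows for the model above."""
    rows = []
    size, depth = input_size, channels

    for conv_name, filters, pool_name in (("C1", 6, "S1"), ("C3", 16, "S4")):
        # Conv2D: one kernel x kernel x depth weight block plus one bias per filter.
        params = (kernel * kernel * depth + 1) * filters
        size, depth = conv_output_size(size, kernel), filters
        rows.append((conv_name, (size, size, depth), params))
        # MaxPool2D(pool_size=2, strides=2, padding='SAME') has no trainable weights.
        size = same_pool_output_size(size, 2)
        rows.append((pool_name, (size, size, depth), 0))

    # Flatten: with the default 32x32 input, (5, 5, 16) becomes 400 features.
    units_in = size * size * depth
    rows.append(("FLATTEN", (units_in,), 0))

    for name, units in (("F5", 120), ("F6", 84), ("OUTPUT", 10)):
        # Dense: a units_in x units weight matrix plus one bias per unit.
        rows.append((name, (units,), units_in * units + units))
        units_in = units
    return rows


if __name__ == "__main__":
    rows = lenet5_summary()
    for name, shape, params in rows:
        print(f"{name:<8}{str(shape):<14}{params:>7}")
    print("Total trainable parameters:", sum(p for _, _, p in rows))  # 61706
```

The per-layer counts (156, 2,416, 48,120, 10,164, and 850) add up to 61,706, and most of them sit in the first dense layer. Calling lenet5_summary(input_size=28) also shows what the 2-pixel padding changes: without it, the flattened vector shrinks to 4x4x16 = 256 features.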
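Steps 6 and 7 rely on a softmax output layer and the sparse categorical cross-entropy loss. The NumPy sketch below is an illustration, not Keras's own implementation, and the clipping epsilon is an assumption of the sketch. It shows that feeding integer labels to the sparse version gives the same value as ordinary categorical cross-entropy on one-hot targets, which is why the integer MNIST labels can be used directly.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting each row's max keeps np.exp numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability of the true class, given integer labels."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(np.clip(true_class_probs, eps, 1.0)))


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """The same loss computed from one-hot encoded targets."""
    return -np.mean(np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=1))


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])              # integer targets, like y_train
probs = softmax(logits)
one_hot = np.eye(3)[labels]            # equivalent one-hot targets

print(probs.sum(axis=1))               # each row sums to 1
print(sparse_categorical_crossentropy(labels, probs))
print(categorical_crossentropy(one_hot, probs))  # same value as the line above
print((probs.argmax(axis=1) == labels).mean())   # accuracy metric: fraction correct
```

A model that predicts a uniform 0.1 for each of the 10 digits has a loss of ln(10) ≈ 2.30, which is close to where the training loss starts in the output above (2.19 at epoch 5).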
To inspect some individual predictions, here is the code to plot six test images (indices 30 to 35) together with their actual and predicted labels:

# Plot some samples and their predictions
actuals = y_test[30:36]
preds = model.predict(x_test[30:36])
predictions = np.argmax(preds, axis=1)  # most probable class for each image
images = np.squeeze(x_test[30:36])

Nrows = 2
Ncols = 3
for i in range(6):
    plt.subplot(Nrows, Ncols, i + 1)
    plt.imshow(np.reshape(images[i], [32, 32]), cmap='Greys_r')
    plt.title('Actual: ' + str(actuals[i]) + ' Pred: ' + str(predictions[i]),
              fontsize=10)
    frame = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)
plt.show()

We get the following output for the code above:

Figure 6: A plot of six test images with the actual and predicted values in the title. The lower-left picture was predicted to be a 6, when in fact it is a 4.

Using a simple CNN, we achieved good accuracy and loss on this dataset.

How it works...

We increased our performance on the MNIST dataset and built a model that quickly achieves about 97% accuracy while training from scratch. Our feature extractor module is a combination of convolutions, ReLU, and max pooling. Our classifier is a stack of fully connected layers. We trained in batches of size 100 and looked at the accuracy and loss across the epochs. Finally, we plotted six digits and found that the model mispredicts one of them: it predicts a 6 when in fact it's a 4.

CNNs do very well at image recognition. Part of the reason is that the convolutional layers learn their own low-level features, which are activated when they come across an important part of the image. This type of model creates features on its own and uses them for prediction.

Summary:

This article highlights how to create a simple CNN, based on the LeNet-5 architecture.
The recipes in the book Machine Learning Using TensorFlow Cookbook enable you to perform complex data computations and gain valuable insights into data.

About the Authors

Alexia Audevart is a Google Developer Expert in machine learning and the founder of Datactik. She is a data scientist and helps her clients solve business problems by making their applications smarter.

Konrad Banachewicz holds a PhD in statistics from Vrije Universiteit Amsterdam. He is a lead data scientist at eBay and a Kaggle Grandmaster.

Luca Massaron is a Google Developer Expert in machine learning with more than a decade of experience in data science. He is also a Kaggle Master who reached number 7 for his performance in data science competitions.

Richard Gall
04 May 2018
3 min read

You can now make music with AI thanks to Magenta.js

Google Brain's Magenta project has released Magenta.js, a tool that could open up new opportunities in developing music and art with AI. The Magenta team have been exploring a range of ways to create with machine learning, and with Magenta.js they have developed a tool that opens up the very domain they've been exploring to new people. Let's take a look at how the tool works, what the aims are, and how you can get involved.

How does Magenta.js work?

Magenta.js is a JavaScript suite that runs on TensorFlow.js, which means it can run machine learning models in the browser. The team explains that JavaScript has been a crucial part of their project, as they have been eager to bridge the gap between the complex research they are doing and their end users. They want their research to result in tools that can actually be used. As they've said before: "...we often face conflicting desires: as researchers we want to push forward the boundaries of what is possible with machine learning, but as tool-makers, we want our models to be understandable and controllable by artists and musicians."

As they note, JavaScript has informed a number of projects that preceded Magenta.js, such as Latent Loops, Beat Blender, and Melody Mixer. These tools were all built using MusicVAE, a machine learning model that forms an important part of the Magenta.js suite.

The first package you'll want to pay attention to in Magenta.js is @magenta/music. This package features a number of Magenta's machine learning models for music, including MusicVAE and DrumsRNN. Thanks to Magenta.js, you'll be able to get started quickly: you can use a number of the project's pre-trained models, which you can find on GitHub.

What next for Magenta.js?

The Magenta team are keen for people to start using the tools they develop. They want a community of engineers, artists, and creatives to help them drive the project forward.
They're encouraging anyone who develops using Magenta.js to contribute to the GitHub repo. Clearly, this is a project where openness is going to be a huge bonus. We're excited to see not only what the Magenta team come up with next, but also the range of projects that are built using it. Perhaps we'll begin to see a whole new creative movement emerge? Read more on the project site.
Amey Varangaonkar
24 Jul 2018
2 min read

TensorFlow 1.10 RC0 released

Continuing the recent trend of rapid updates introducing significant fixes and new features, Google have released the first release candidate for TensorFlow 1.10. TensorFlow 1.10 RC0 brings some improvements in model training and evaluation, and also in how TensorFlow runs in a local environment. This is TensorFlow's fifth update release in just over a month, which includes two major version updates, the previous one being TensorFlow 1.9.

What's new in TensorFlow 1.10 RC0?

The tf.contrib.distributions module will be deprecated in this version. This module is primarily used to work with statistical distributions.

Upgrading to NCCL 2.2 will be mandatory in order to perform GPU computing with this version of TensorFlow, for added performance and efficiency.

Model training speed can now be optimized by improving the communication between the model and the TensorFlow resources. For this, the RunConfig class has been updated in this version.

The TensorFlow development team also announced support for Bazel, a popular build and test automation tool, and the deprecation of cmake support starting with TensorFlow 1.11.

This version also incorporates some bug fixes and performance improvements to the tf.data, tf.estimator, and other related modules. To get full details on the features of this release candidate, you can check out TensorFlow's official release page on GitHub.

No news on TensorFlow 2.0 yet

Many developers were expecting the next major release of TensorFlow, TensorFlow 2.0, to arrive in late July or August. However, the announcement of this release candidate and the mention of the next version update (1.11) mean they will have to wait a while longer before they learn more about the next breakthrough release.

Savia Lobo
05 Feb 2018
6 min read

AutoML: Developments and where it is heading

With the growing demand for ML applications, there is also demand for machine learning tasks such as data preprocessing and hyperparameter optimization to be easily handled by non-experts. These tasks are repetitive, yet because of their complexity they were considered the preserve of ML experts. To support this cause, and to deliver off-the-shelf machine learning methods without requiring expert knowledge, Google came out with a project named AutoML, an approach that automates the design of ML models. You can also refer to our article on Automated Machine Learning (AutoML) for a clear understanding of how AutoML functions.

Trying AutoML on smaller datasets

AutoML brought altogether new dimensions to machine learning workflows, where repetitive tasks performed by human experts could be taken over by machines. When Google started off with AutoML, they applied the approach to two smaller deep learning datasets, CIFAR-10 and Penn Treebank, to test it on image recognition and language modeling tasks respectively. The AutoML approach designed models that were on par with the ones designed by ML experts. Comparing the designs drafted by humans and by AutoML also showed that the machine-suggested architectures included new elements. These elements were later found to alleviate vanishing/exploding gradient issues, suggesting that the machines had come up with new architectures that could be useful for multiple tasks. The machine-designed architecture also has many channels through which the gradients can flow backwards, which could help explain why LSTM RNNs work better than standard RNNs.

Trying AutoML on larger datasets

After succeeding on small datasets, Google tested AutoML on large-scale datasets such as ImageNet and the COCO object detection dataset.
Testing AutoML on these was a challenge because they are orders of magnitude larger, and because simply applying AutoML directly to ImageNet would require many months of training. To apply AutoML to large-scale datasets, some alterations were made to the approach to make it more tractable. The changes include:

- Redesigning the search space so that AutoML could find the best layer, which can then be stacked many times in a flexible manner to create a final network.
- Carrying out the architecture search on the CIFAR-10 dataset and transferring the best learned architecture to the ImageNet image classification and COCO object detection datasets.

With these changes, AutoML found the two best layers, a normal cell and a reduction cell, which when combined resulted in a novel architecture called NASNet. These cells work well not only on CIFAR-10 but also on ImageNet and COCO object detection. According to Google, NASNet achieved a prediction accuracy of 82.7% on the ImageNet validation set, surpassing all previous Inception models built by Google. Further, the features learned from ImageNet classification were transferred to object detection tasks on the COCO dataset. Combined with the Faster R-CNN framework, the learned features resulted in state-of-the-art predictive performance on the COCO object detection task, for both the largest and the mobile-optimized models. Google suspects that these image features learned on ImageNet and COCO can be reused for various other computer vision applications. Hence, Google open-sourced NASNet for inference on image classification and for object detection in the Slim and Object Detection TensorFlow repositories.

Towards Cloud AutoML: Automated Machine Learning platform for everyone

Cloud AutoML has been Google's latest buzz for its customers, as it makes AI available to everyone.
Using Google's advanced techniques such as learning2learn and transfer learning, Cloud AutoML helps businesses with limited ML expertise start building their own high-quality custom models. Cloud AutoML can also benefit AI experts by improving their productivity and helping them explore new fields in AI, and those experts can in turn help less-skilled engineers build powerful systems. Companies such as Disney and Urban Outfitters are using AutoML to make search and shopping on their websites more relevant.

With AutoML moving to the cloud, Google released its first Cloud AutoML product, Cloud AutoML Vision, an image recognition tool that makes it fast and easy to build custom ML models. This tool has a drag-and-drop interface that allows one to easily upload images, train and manage models, and then deploy the trained models directly on Google Cloud. When used to classify popular public datasets like ImageNet and CIFAR, Cloud AutoML Vision has shown state-of-the-art results, including fewer misclassifications than the generic ML APIs.

Here are some highlights of Cloud AutoML Vision:

- It is built on Google's leading image recognition approaches, along with transfer learning and neural architecture search technologies. Hence, one can expect an accurate model even if the business has limited expertise in ML.
- One can build a simple model in minutes, or a full, production-ready model in a day, in order to pilot an AI-enabled application.
- AutoML Vision has a simple graphical UI with which one can easily specify data. It then turns the data into a high-quality model customized for one's specific needs.

Starting off with images, Google plans to roll out Cloud AutoML tools and services for text and audio too.
However, Google isn't the only one in the race; competitors including AWS and Microsoft are also bringing in tools, such as Amazon's SageMaker and Microsoft's service for customizing image recognition models, to help developers automate machine learning. Some other automated tools include:

- Auto-sklearn: an automated machine learning toolkit that helps users of scikit-learn (a package of common machine learning functions) choose the right estimator. Auto-sklearn includes a generic estimator that conducts analysis to determine the best algorithm and set of hyperparameters for a given scikit-learn job.
- Auto-WEKA: a similar tool for machine learners using the Java programming language and the Weka ML package. Auto-WEKA uses a fully automated approach to select a learning algorithm and set its hyperparameters simultaneously, unlike previous methods, which addressed these problems in isolation.
- H2O Driverless AI: this tool uses a web-based UI and is specifically designed for business users who want to gain insights from data but do not want to get into the intricacies of machine learning algorithms. It allows users to choose one or more target variables in the dataset that need a solution, and the system provides the answer. The results are in the form of interactive charts, explained with annotations in plain English.

Currently, Google's AutoML is leading the field. It will be exciting to see how Google scales automated ML environments to match traditional ML.

Not only Google but also other businesses are contributing to the movement towards an automated machine learning ecosystem. We saw some tools joining the automation league and can expect more to join them. These tools could also move to the cloud in future, making them more widely available to non-experts, as Google has done with Cloud AutoML. With machine learning becoming automated, we can expect more and more systems to widen the scope of AI.
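To make the idea behind Auto-sklearn and Auto-WEKA (choosing a learning algorithm and its hyperparameters together) more concrete, here is a minimal sketch. It is not how either tool works internally: both rely on more sophisticated model-based (Bayesian) optimization, while this sketch swaps in plain random search. The three-classifier search space, its ranges, the helper names, and the use of the Iris dataset are all illustrative assumptions; each sampled configuration is scored by 5-fold cross-validated accuracy with scikit-learn.

```python
import math
import random

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def sample_config(rng):
    """Draw one (algorithm, hyperparameters) pair from a small, hypothetical search space."""
    algorithm = rng.choice(["logreg", "knn", "tree"])
    if algorithm == "logreg":
        # Inverse regularization strength, sampled log-uniformly in [1e-2, 1e2].
        return algorithm, {"C": 10 ** rng.uniform(-2, 2)}
    if algorithm == "knn":
        return algorithm, {"n_neighbors": rng.randint(1, 30)}
    return algorithm, {"max_depth": rng.randint(2, 10)}


def build_model(algorithm, params):
    """Instantiate the scikit-learn estimator for a sampled configuration."""
    if algorithm == "logreg":
        return LogisticRegression(max_iter=1000, **params)
    if algorithm == "knn":
        return KNeighborsClassifier(**params)
    return DecisionTreeClassifier(random_state=0, **params)


def random_search(X, y, n_trials=20, seed=0):
    """Jointly search over algorithms and their hyperparameters by random sampling."""
    rng = random.Random(seed)
    best = (None, None, -math.inf)
    for _ in range(n_trials):
        algorithm, params = sample_config(rng)
        # Mean 5-fold cross-validated accuracy is the score to maximize.
        score = cross_val_score(build_model(algorithm, params), X, y, cv=5).mean()
        if score > best[2]:
            best = (algorithm, params, score)
    return best


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    algorithm, params, score = random_search(X, y)
    print(f"Best: {algorithm} {params} (CV accuracy {score:.3f})")
```

Because the algorithm choice is just another dimension of the search space, the same loop handles "which model" and "which settings" in one pass, which is the joint formulation the Auto-WEKA description above contrasts with earlier, separate approaches.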