
Tech News - Artificial Intelligence

61 Articles

Predictive cybersecurity company Balbix secures $20M investment

Richard Gall
27 Jun 2018
2 min read
High profile security attacks have put cybersecurity high on the agenda. For most companies it's at best a headache and at worst a full-blown crisis. But if you're in the business of solving these problems, it only makes you more valuable. That's what has happened to Balbix. Balbix is a security solution that allows users to "predict & proactively mitigate breaches before they happen." It does this by using predictive analytics and machine learning to identify possible threats. According to TechCrunch, the company has received the series B investment from a number of different sources, including Singtel's Innov8 fund (based in Singapore).

Balbix is bringing together machine learning and cybersecurity

However, the most interesting part of the story is what Balbix is trying to do. The fact that it's seeing early signs of eager investment indicates that it's moving down the right track when it comes to cybersecurity. The company spends some time outlining how the tool works on its website. Balbix's BreachControl product uses "sensors deployed across your entire enterprise network [that] automatically and continuously discover and monitor all devices, apps and users for hundreds of attack vectors." An 'attack vector' is really just a method of attack - phishing or social engineering, for example.

The product then uses what the company calls the 'Balbix Brain' to analyse risks within the network. The Balbix Brain is an artificial intelligence system designed to do a number of things. It assesses how likely different assets and areas of the network are to be compromised, and highlights the potential impact such a compromise might have. This adds a level of intelligence that allows organizations using the product to make informed decisions about how to act and what to prioritize. But Balbix BreachControl also combines chaos engineering and penetration testing by simulating small-scale attacks across a given network: "Every possible breach path is modeled across all attack vectors to calculate breach risk."

Balbix is aiming to exploit the need for improved security at an enterprise level. "At enterprise scale, keeping everything up to snuff is very hard," CEO Gaurav Banga told TechCrunch in an interview. "Most organizations have little visibility into attack surfaces, the right decisions aren't made and projects aren't secured."


Microsoft start AI School to teach Machine Learning and Artificial Intelligence

Amey Varangaonkar
25 Jun 2018
3 min read
The race for cloud supremacy is getting more interesting with every passing day. The three major competitors - Amazon, Google and Microsoft - seem to be coming up with fresh and innovative ideas to attract customers and get them to try and adopt their cloud offerings. The most recent move came from Google, when they announced free Big Data and Machine Learning training courses for the Google Cloud Platform. These courses let students build intelligent models on the Google cloud using cloud-powered resources. Microsoft have now followed suit with their own AI School - the promise of which is quite similar: allowing professionals to build smart solutions for their businesses using the Microsoft AI platform on Azure.

AI School: Offering custom learning paths to master Artificial Intelligence

Everyone has a different style and pace of learning. Keeping this in mind, Microsoft have segregated their learning material into different levels - beginner, intermediate and advanced. This helps intermediate and advanced learners pick up the relevant topics they want to skill up in without having to go through the basics first, while still giving them the option to do so if they're interested. The topic coverage in the AI School is quite interesting as well - from an introduction to deep learning and Artificial Intelligence to building custom conversational AI. In the process, students will use a myriad of tools such as Azure Cognitive Services and the Microsoft Bot Framework for pre-trained AI models, Azure Machine Learning for deep learning and machine learning capabilities, as well as Visual Studio and the Cognitive Toolkit. Students can also work in their favourite programming language - from Java, C# and Node.js to Python and JavaScript. The end goal of this program, as Microsoft puts it, is to empower developers to use the trending Artificial Intelligence capabilities within their existing applications to make them smarter and more intuitive - all while leveraging the power of the Microsoft cloud.

Google and Microsoft have stepped up, time for Amazon now?

Although Amazon does provide training and certifications for Machine Learning and AI, they are yet to launch their own courses to encourage learners to pick up these trending technologies from scratch and adopt AWS to build their own intelligent models. Considering they dominate the cloud market with almost two-thirds of the market share, this is quite surprising. Another interesting point to note here is that Microsoft and Google have both taken significant steps to contribute to open source and free learning. While Google-acquired Kaggle is a great platform to host machine learning competitions and thereby learn new, interesting things in the AI space, Microsoft's recent acquisition of GitHub takes them in a similar direction of promoting open source culture and sharing knowledge freely. Is Amazon waiting for a similar acquisition before it takes this step in promoting open source learning? We will have to wait and see.


Why Drive.ai is going to struggle to disrupt public transport

Richard Gall
09 May 2018
5 min read
Drive.ai has announced that it will begin trialling a self-driving taxi service in Frisco, Texas this summer. The trial is to last six months as the organization works closely with the Frisco authorities to finalize the details of the routes and to 'educate' the public about how the service can be used. But although the news has widely been presented as a step forward for the wider adoption of self-driving cars, the story in fact exposes the way in which self-driving car engineers are struggling to properly disrupt - and that's before the trial has even begun.

Drive.ai's announcement comes shortly after a number of high profile incidents involving self-driving cars. In March, a woman was killed by an Uber self-driving car in Arizona. In May, a Waymo van was involved in a collision in Arizona too. This puts a little more pressure on Drive.ai, and means the trial will be watched particularly closely. Any further issues will only make the wider public more resistant to autonomous vehicles. The more issues that appear, the more the very concept of self-driving vehicles begins to look like a Silicon Valley pipe dream - a way for tech entrepreneurs to take advantage of underfunded public infrastructure in the name of disruption and innovation.

And this is precisely the problem with the forthcoming Drive.ai trial. For the trial to work, Drive.ai is dependent on the support and collaboration of the Frisco authorities. Yes, there are some positives to this - there's an argument that the future of public life depends on a sort of hybrid of entrepreneurialism and state support. But we're not just talking about using machine learning or deep learning to better understand how to deploy resources more effectively, or how to target those most in need of support. In this instance, we're talking about a slightly clunky system - one everyone recognises as clunky. After all, that's why public 'education' is needed.

Disruption should be frictionless. Self-driving taxis aren't.

Whatever you think of Uber and Airbnb, both organisations have managed to disrupt their respective industries by building platforms that make certain transactions and interactions frictionless. When it comes to self-driving taxi services, however, things are much different. They're not frictionless at all. That's why Drive.ai is having to work with the Frisco authorities to sell the idea to the public. Disruptive tech works best when people immediately get the concept - the sort of thing that starts with wouldn't it be great if... No one thinks that about self-driving cars. The self-driving bit is immaterial to most users. Provided their Uber drivers are polite and get them to where they want to go, that's enough. Some people might even like having a driver they can interact with (god forbid!).

Sure, you might think I'm missing the point. Self-driving cars will be more efficient, right? The cost savings will be passed on to end users. Perhaps - but seen in perspective, lots of things have become more efficient or automated, and it doesn't mean we're suddenly all feeling the benefits of those savings. More importantly, this isn't really disruption. You're not radically changing the way you do something based on the needs of the people who use it. Instead you're simply trying to shift their expectations to make it easier to automate jobs. In many instances we're seeing power shift from public organizations to those where technical expertise is located. And that's what's happening here.

Artificial intelligence needs to be accessible to be impactful

Essentially, the technology is stuck inside the Silicon Valley organizations trying to profit from it. We know that deep learning and artificial intelligence are at their most exciting and interesting when they are accessible to a huge range of people. In the case of Drive.ai, the AI is just the kernel on which all these other moving parts depend - the investment, the infrastructure, and the acceptance of the technology. Artificial intelligence projects work best when they seem to achieve something seamlessly, not when they require a whole operation just to make them work. The initiatives being run by Drive.ai and its competitors are a tired use of AI. It's almost as if we're chasing the dream of taxi cabs that can drive themselves simply because we feel we should. And while there's clearly potential for big money to be made by the organizations working hard to make it happen, for many of the cities they're working with, it might not be the best option. Public transport does, after all, already exist.

Drive.ai needs users to adapt to the technology

Perhaps Drive.ai might just make this work. But it's going to be difficult. That's because the problems of self-driving cars are a little different to those most software companies face. Typically the challenge is responding to the needs of users and building the technology accordingly. In this instance, the technology is almost there; the problem facing Drive.ai and others is getting users to accept it.

What we learned from CES 2018: Self-driving cars and AI chips are the rage!
Apple self-driving cars are back! VoxelNet may drive the autonomous vehicles


Nvidia's Volta Tensor Core GPU hits performance milestones. But is it the best?

Richard Gall
08 May 2018
3 min read
Nvidia has revealed that its Volta Tensor Core GPU has hit some significant performance milestones. This is big news for the world of AI: it raises the bar in terms of the complexity and sophistication of the deep learning models that can be built. The Volta Tensor Core GPU has, according to the Nvidia team, "achieved record-setting ResNet-50 performance for a single chip and single server" thanks to the updates and changes they have made.

Here are the headline records and milestones the Volta Tensor Core GPU has hit, according to the team's testing. When training ResNet-50, one V100 Tensor Core GPU can process more than 1,075 images per second - apparently four times more than the Pascal GPU, the previous generation of Nvidia's GPU microarchitecture. Last year, one DGX-1 server supported by eight Tensor Core V100s could achieve 4,200 images a second (still a hell of a lot); now it can achieve 7,850. And one AWS P3 cloud instance supported by eight Tensor Core V100s can train ResNet-50 in less than three hours - three times faster than on a single TPU. But what do these advances in performance mean in practice? And has Nvidia really managed to outperform its competitors?

Volta Tensor Core GPUs might not be as fast as you think

Nvidia is clearly pretty excited about what it has achieved. Certainly the power of the Volta Tensor Core GPU is impressive and not to be sniffed at. But the website ExtremeTech poses a caveat. The piece argues that there are problems with using FLOPS (floating point operations per second) as a metric for performance. This is because the formula used to calculate FLOPS assumes a degree of consistency in how something is processed that may be misleading. One GPU, for example, might have higher potential FLOPS but not be running at capacity; it could, of course, be outperformed by an 'inferior' GPU. Other studies (this one from RiseML) have indicated that Google's TPU actually performs better than Nvidia's offering (when using a different test). Admittedly the difference wasn't huge, but it matters when you consider that the TPU is significantly cheaper than the Volta.

Ultimately, the difference between the two is as much about what you want from your GPU or TPU. Google might give you a little more raw power, but there's much less flexibility than you get with the Volta. It will be interesting to see how the competition changes over the next few years. On current form, Nvidia and Google are going to be leading the way for some time, whoever has bragging rights about performance.

Distributed TensorFlow: Working with multiple GPUs and servers
Nvidia Tesla V100 GPUs publicly available in beta on Google Compute Engine and Kubernetes Engine
OpenAI announces block sparse GPU kernels for accelerating neural networks
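To put the throughput figures quoted above into perspective, the quick back-of-the-envelope calculation below converts images per second into approximate time per ImageNet epoch, assuming the standard ILSVRC-2012 training set of roughly 1.28 million images - an assumption made for illustration, not a figure from Nvidia's announcement.

```python
# Rough time-per-epoch estimates from the reported throughputs.
# Assumes ~1.28M training images (standard ImageNet/ILSVRC-2012); this is an
# illustrative assumption, not part of Nvidia's benchmark description.
IMAGENET_TRAIN_IMAGES = 1_281_167

throughputs = {
    "1x V100 (Volta)": 1_075,          # images/second, from the article
    "DGX-1, 8x V100 (2017)": 4_200,
    "DGX-1, 8x V100 (now)": 7_850,
}

for setup, images_per_sec in throughputs.items():
    minutes_per_epoch = IMAGENET_TRAIN_IMAGES / images_per_sec / 60
    print(f"{setup}: ~{minutes_per_epoch:.0f} minutes per ImageNet epoch")
```

Total wall-clock training time then depends on how many epochs a given recipe needs, which is why raw throughput and the "under three hours" single-instance figure are not directly comparable.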


You can now make music with AI thanks to Magenta.js

Richard Gall
04 May 2018
3 min read
Google Brain's Magenta project has released Magenta.js, a tool that could open up new opportunities in developing music and art with AI. The Magenta team have been exploring a range of ways to create with machine learning, but with Magenta.js they have developed a tool that's going to open up the very domain they've been exploring to new people. Let's take a look at how the tool works, what the aims are, and how you can get involved.

How does Magenta.js work?

Magenta.js is a JavaScript suite that runs on TensorFlow.js, which means it can run machine learning models in the browser. The team explains that JavaScript has been a crucial part of their project, as they have been eager to make sure they bridge the gap between the complex research they are doing and their end users. They want their research to result in tools that can actually be used. As they've said before: "...we often face conflicting desires: as researchers we want to push forward the boundaries of what is possible with machine learning, but as tool-makers, we want our models to be understandable and controllable by artists and musicians."

As they note, JavaScript has informed a number of projects that preceded Magenta.js, such as Latent Loops, Beat Blender and Melody Mixer. These tools were all built using MusicVAE, a machine learning model that forms an important part of the Magenta.js suite. The first package you'll want to pay attention to in Magenta.js is @magenta/music. This package features a number of Magenta's machine learning models for music, including MusicVAE and DrumsRNN. Thanks to Magenta.js you'll be able to get started quickly, using a number of the project's pre-trained models, which you can find on GitHub here.

What next for Magenta.js?

The Magenta team are keen for people to start using the tools they develop. They want a community of engineers, artists and creatives to help them drive the project forward, and they're encouraging anyone who develops using Magenta.js to contribute to the GitHub repo. Clearly, this is a project where openness is going to be a huge bonus. We're excited to see not only what the Magenta team come up with next, but also the range of projects that are built using it. Perhaps we'll begin to see a whole new creative movement emerge? Read more on the project site here.


Paper in Two minutes: Zero-Shot learning for Visual Imitation

Savia Lobo
02 May 2018
4 min read
The ICLR paper 'Zero-Shot learning for Visual Imitation' is a collaborative effort by Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, and Trevor Darrell. In this article, we look at one of the main problems with imitation learning: the expense of expert demonstration. The authors propose a method for sidestepping this issue by using an agent's own random exploration to learn generalizable skills, which can then be applied to any new task without task-specific pretraining.

Reducing the expert demonstration expense with zero-shot visual imitation

What problem is the paper trying to solve?

For imitation to be practical, the expert should be able to simply demonstrate tasks without lots of effort, instrumentation, or engineering. Collecting many demonstrations is time-consuming, exact state-action knowledge is impractical, and reward design is involved and takes more than task expertise. The agent should be able to achieve goals based on demonstrations without having to devote time to learning each and every task. To address these issues, the authors recast learning from demonstration into doing from demonstration by (1) only giving demonstrations during inference and (2) restricting demonstrations to visual observations alone rather than full state-actions. Rather than being trained by imitation, the agent must learn how to imitate. This is the goal the authors are trying to achieve.

Paper summary

The paper explains how existing approaches to imitation learning distill both what to do (the goal) and how to do it (the skills) from expert demonstrations. This expertise is effective but expensive supervision: it is not always practical to collect many detailed demonstrations. The authors suggest that if an agent has access to its environment along with the expert, it can learn skills from its own experience and rely on expertise for the goals alone. They therefore propose a 'zero-shot' method which does not use any expert actions or demonstrations during learning. The zero-shot imitator has no prior knowledge of the environment and makes no use of the expert during training; it learns from experience to follow experts. The authors demonstrate this with experiments such as navigating an office with a TurtleBot and manipulating rope with a Baxter robot.

Key takeaways

The authors propose a method for learning a parametric skill function (PSF) that takes as input a description of the initial state, the goal state, and the parameters of the skill, and outputs a sequence of actions (of possibly varying length) which take the agent from initial state to goal state. The authors show real-world results for office navigation and rope manipulation, but make no domain assumptions limiting the method to these problems. Zero-shot imitators learn to follow demonstrations without any expert supervision during learning. The approach learns task priors of representation, goals, and skills from the environment in order to imitate the goals given by the expert during inference.

Reviewer comments summary

Overall score: 25/30. Average score: 8. As per one of the reviewers, the proposed approach is well founded and the experimental evaluations are promising; the paper is well written and easy to follow. The skill function uses an RNN as a function approximator and minimizes the sum of two losses: the state mismatch loss over the trajectory (using an explicitly learnt forward model) and the action mismatch loss (using a model-free action prediction module). This is hard to do in practice because the forward model and the action predictor have to be learnt jointly, so they are first learnt separately and then fine-tuned together (a minimal sketch of this two-loss setup appears at the end of this article).

One Shot Learning: Solution to your low data problem
Using Meta-Learning in Nonstationary and Competitive Environments with Pieter Abbeel
What is Meta Learning?
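As a footnote to the reviewer summary above, here is a highly simplified sketch of the two-loss setup: a learnt forward model provides a state-mismatch loss while a model-free module predicts the action, giving an action-mismatch loss. The network sizes, continuous action space and random data below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, hidden = 16, 4, 64

# Model-free action predictor: guesses which action links two consecutive states.
action_model = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, action_dim))
# Explicit forward model: predicts the next state from the current state and an action.
forward_model = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                              nn.Linear(hidden, state_dim))

opt = torch.optim.Adam(list(action_model.parameters()) + list(forward_model.parameters()), lr=1e-3)

def joint_loss(state, action, next_state):
    pred_action = action_model(torch.cat([state, next_state], dim=-1))
    pred_next = forward_model(torch.cat([state, pred_action], dim=-1))
    action_mismatch = F.mse_loss(pred_action, action)      # model-free action prediction loss
    state_mismatch = F.mse_loss(pred_next, next_state)     # forward-model state prediction loss
    return action_mismatch + state_mismatch

# One joint fine-tuning step on a batch of (state, action, next_state) transitions
# gathered by the agent's own exploration (random tensors here, purely for illustration).
s = torch.randn(32, state_dim)
a = torch.randn(32, action_dim)
s_next = torch.randn(32, state_dim)
loss = joint_loss(s, a, s_next)
opt.zero_grad(); loss.backward(); opt.step()
```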

Thanks to DeepCode, AI can help you write cleaner code

Richard Gall
30 Apr 2018
2 min read
DeepCode is a tool that uses artificial intelligence to help software engineers write cleaner code. It's a bit like Grammarly or the Hemingway Editor, but for code. It works in an ingenious way: using AI, it reads your GitHub repositories and highlights anything that might be broken or cause compatibility issues. It is currently only available for Java, JavaScript, and Python, but more languages are going to be added.

DeepCode is more than a debugger

Sure, DeepCode might sound a little like a glorified debugger. But it's important to understand that it's much more than that. It doesn't just correct errors - it can actually help you improve the code you write. That means the project's mission isn't just code that works, but code that works better. It's thanks to AI that DeepCode is able to support code performance too: the software learns 'rules' about how code works best. And because DeepCode is an AI system, it's only going to get better as it learns more.

Speaking to TechCrunch, Boris Paskalev claimed that DeepCode has more than 250,000 rules, a number that is "growing daily." Paskalev went on to explain: "We built a platform that understands the intent of the code... We autonomously understand millions of repositories and note the changes developers are making. Then we train our AI engine with those changes and can provide unique suggestions to every single line of code analyzed by our platform."

DeepCode is a compelling prospect for developers. As applications become more complex and efficiency becomes ever more important, a simple route to unlocking greater performance could be invaluable. It's no surprise that it has already raised $1.1 million in investment from the VC company btov, and it's only going to become more popular with investors as the platform grows. This might mean the end of spaghetti code, which can only be a good thing. Find out more about DeepCode and its pricing here.

Read more: Active Learning: An approach to training machine learning models efficiently


Paper in Two minutes: Attention Is All You Need

Sugandha Lahoti
05 Apr 2018
4 min read
A paper on a new, simple network architecture, the Transformer, based solely on attention mechanisms.

The NIPS 2017 accepted paper, Attention Is All You Need, introduces the Transformer, a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. The paper is authored by members of the Google research team: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.

The Transformer - Attention is all you need

What problem is the paper attempting to solve?

Recurrent neural networks (RNNs), long short-term memory networks (LSTMs) and gated RNNs are the popular approaches for sequence modelling tasks such as machine translation and language modeling. However, these recurrent models handle sequences word by word, in a sequential fashion, and this sequentiality is an obstacle to parallelizing the process. Moreover, when sequences are too long, the model is prone to forgetting the content of distant positions or mixing it with the content of subsequent positions. Recent works have achieved significant improvements in computational efficiency and model performance through factorization tricks and conditional computation, but they are not enough to eliminate the fundamental constraint of sequential computation. Attention mechanisms are one solution to the problem of model forgetting, because they allow dependencies to be modelled without regard to their distance in the input or output sequences. Due to this, they have become an integral part of sequence modeling and transduction models. However, in most cases attention mechanisms are used in conjunction with a recurrent network.

Paper summary

The Transformer proposed in this paper is a model architecture which relies entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and tremendously improves translation quality after being trained for as little as twelve hours on eight P100 GPUs. Neural sequence transduction models generally have an encoder-decoder structure: the encoder maps an input sequence of symbol representations to a sequence of continuous representations, and the decoder then generates an output sequence of symbols, one element at a time. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder.

The authors are motivated to use self-attention by three criteria. One is the total computational complexity per layer. Another is the amount of computation that can be parallelized, as measured by the minimum number of sequential operations required. The third is the path length between long-range dependencies in the network. The Transformer uses two different types of attention functions: Scaled Dot-Product Attention, which computes the attention function on a set of queries simultaneously, packed together into a matrix; and multi-head attention, which allows the model to jointly attend to information from different representation subspaces at different positions. A self-attention layer connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length is smaller than the representation dimensionality, which is often the case in machine translation.

Key Takeaways

This work introduces the Transformer, a novel sequence transduction model based entirely on attention mechanisms. It replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. The Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers for translation tasks. On both the WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, the model achieves a new state of the art; on the former task it outperforms all previously reported ensembles.

Future Goals

The Transformer has so far only been applied to transduction tasks. In the near future, the authors plan to use it for problems involving input and output modalities other than text, and to apply attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. The Transformer architecture has gained major traction since its release because of major improvements in translation quality and other NLP tasks. Recently, the NLP research group at Harvard released a post presenting an annotated version of the paper in the form of a line-by-line implementation. It is accompanied by 400 lines of library code, written in PyTorch in the form of a notebook, accessible from GitHub or on Google Colab with free GPUs.
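For readers who want to see the core operation concretely, here is a minimal NumPy sketch of scaled dot-product attention as defined in the paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The shapes and the toy usage at the bottom are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (batch, n_q, d_k), K: (batch, n_k, d_k), V: (batch, n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)                 # attention distribution over the keys
    return weights @ V                                 # (batch, n_q, d_v) weighted values

# Toy usage: one sequence of 5 positions with width 8; self-attention shares Q, K, V inputs.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(1, 5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)     # (1, 5, 8)
```

Multi-head attention then splits Q, K and V into several lower-dimensional heads, applies this same operation in parallel, and concatenates the results; that part is omitted here for brevity.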


Data Science News Daily Roundup – 2nd April 2018

Packt Editorial Staff
02 Apr 2018
2 min read
Apache releases Trafodion, SAP announces general availability of the SAP Predictive Analytics application edition, Pachyderm 1.7, and more in today's top stories and news around data science, machine learning, and deep learning.

Top Data Science news of the Day

The 5 biggest announcements from TensorFlow Dev Summit 2018

Other Data Science News at a Glance

Apache releases Trafodion, a webscale SQL-on-Hadoop solution. Apache Trafodion has graduated from incubator status to become a Top-Level Project. Trafodion enables transactional or operational workloads on Apache Hadoop. Read more on I Programmer.

SAP has announced general availability of the application edition of SAP Predictive Analytics software, to help enterprise clients harness machine learning. With this, one can create and manage predictive models that deliver powerful data-driven insights to every business user across the enterprise in real time. Read more on inside SAP.

IBM's GPU-accelerated semantic similarity search at scale shows a ~30,000x speed-up. The proposed model is a linear-complexity RWMD that avoids wasteful and repetitive computations and reduces the average time complexity to linear. Read more on the IBM Research Blog.

Announcing Pachyderm 1.7, an open source and enterprise data science platform that enables reproducible data processing at scale. Read more on Medium.

Mobodexter announces general availability of Paasmer 2.0, a dockerized version of their IoT Edge software that removes the hardware dependency to run Paasmer Edge software. Paasmer becomes one of the few IoT software platforms in the world to add Docker capability on the IoT Edge. Read more on Benzinga.

Announcing AIRI: Integrated AI-Ready Infrastructure for deploying deep learning at scale. AIRI is purpose-built to enable data architects, scientists and business leaders to extend the power of the NVIDIA DGX-1 and operationalise AI-at-scale for every enterprise. Read more on Scientific Computing World.


Paper in Two minutes: A novel method for resource efficient image classification

Sugandha Lahoti
23 Mar 2018
4 min read
This ICLR 2018 accepted paper, Multi-Scale Dense Networks for Resource Efficient Image Classification, introduces a new model that performs image classification with limited computational resources at test time. The paper is authored by Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Weinberger. The 6th annual ICLR conference is scheduled to happen between April 30 - May 03, 2018.

Using a multi-scale convolutional neural network for resource efficient image classification

What problem is the paper attempting to solve?

Recent years have witnessed a surge in demand for applications of visual object recognition, for instance in self-driving cars and content-based image search. This demand is driven by the astonishing progress of convolutional networks (CNNs), where state-of-the-art models may have even surpassed human-level performance. However, most of these are complex models with high computational demands at inference time. In real-world applications computation is never free; it directly translates into power consumption, which should be minimized for environmental and economic reasons. Ideally, a system should automatically use small networks when test images are easy or computational resources are limited, and big networks when test images are hard or computation is abundant.

To develop resource-efficient image recognition, the authors aim to build CNNs that slice the computation and process these slices one by one, stopping the evaluation once the CPU time is depleted or the classification is sufficiently certain. Unfortunately, CNNs learn the data representation and the classifier jointly, which leads to two problems. First, the features in the last layer are extracted directly to be used by the classifier, whereas earlier features are not. Second, the features in different layers of the network may have a different scale: typically, the first layers of deep nets operate on a fine scale (to extract low-level features), whereas later layers transition to coarse scales that allow global context to enter the classifier. The authors propose a novel network architecture that addresses both problems through careful design changes, allowing for resource-efficient image classification.

Paper summary

The model is based on a multi-scale convolutional neural network similar to the neural fabric, but with dense connections and with a classifier at each layer. This novel network architecture, called the Multi-Scale DenseNet (MSDNet), addresses both of the problems described above (classifiers altering the internal representation, and the lack of coarse-scale features in early layers) for resource-efficient image classification. The network uses a cascade of intermediate classifiers throughout the network. The first problem is addressed through the introduction of dense connectivity: by connecting all layers to all classifiers, features are no longer dominated by the most imminent early exit, and the trade-off between early or later classification can be handled elegantly as part of the loss function. The second problem is addressed by adopting a multi-scale network structure: at each layer, features of all scales (fine to coarse) are produced, which facilitates good classification early on while also extracting low-level features that only become useful after several more layers of processing.

Key Takeaways

MSDNet is a novel convolutional network architecture optimized to incorporate CPU budgets at test time. The design is based on two high-level principles: generating and maintaining coarse-level features throughout the network, and interconnecting the layers with dense connectivity. The final network design is a two-dimensional array of horizontal and vertical layers, which decouples depth and feature coarseness. Whereas in traditional convolutional networks features only become coarser with increasing depth, the MSDNet generates features of all resolutions from the first layer on and maintains them throughout. Through experiments, the authors show that their network outperforms all competitive baselines on an impressive range of budgets, from highly limited CPU constraints to almost unconstrained settings.

Reviewer feedback summary

Overall score: 25/30. Average score: 8.33. The reviewers found the approach to be natural and effective, with good results. They found the presentation clear and easy to follow, and the structure of the network clearly justified. The reviewers found the use of dense connectivity to avoid the loss of performance from using early-exit classifiers interesting. They appreciated the results and found them quite promising, with 5x speed-ups and the same or better accuracy than previous models. However, some reviewers pointed out that the results about the more efficient densenet* could have been shown in the main paper.
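The anytime-inference behaviour described above can be made concrete with a small sketch: intermediate classifiers are evaluated one by one, and evaluation stops at the first exit that is confident enough or when the compute budget runs out. This is a conceptual illustration under assumed interfaces (`blocks` and `classifiers` lists of PyTorch modules), not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def anytime_predict(blocks, classifiers, x, threshold=0.9, budget=None):
    """Early-exit inference for a single input x of shape (1, C, H, W).

    blocks[i]      : module mapping features -> features (one stage of the network)
    classifiers[i] : module mapping features -> class logits (intermediate exit i)
    threshold      : softmax confidence needed to stop early
    budget         : maximum number of stages we are allowed to evaluate
    """
    n_stages = len(blocks) if budget is None else min(budget, len(blocks))
    h = x
    with torch.no_grad():
        for i in range(n_stages):
            h = blocks[i](h)                               # pay for one more stage
            probs = F.softmax(classifiers[i](h), dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= threshold or i == n_stages - 1:
                return prediction.item(), i                # class index and exit used
```

Easy inputs exit at the first classifiers and cost only a fraction of the full network's computation; hard inputs fall through to deeper, more accurate exits.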

What is Meta Learning?

Sugandha Lahoti
21 Mar 2018
5 min read
Meta learning, originally a concept from cognitive psychology, is now applied to machine learning techniques. The social psychology definition of meta learning is the state of being aware of, and taking control of, one's own learning. Applied to machine learning, the idea is that a meta learning algorithm uses prior experience to change certain aspects of an algorithm such that the modified algorithm is better than the original. In simple terms, meta learning is how an algorithm learns how to learn.

Meta Learning: Making a versatile AI agent

Current AI systems excel at mastering a single skill: playing Go, holding human-like conversations, predicting a disaster, and so on. However, now that AI and machine learning are being integrated into everyday tasks, we need a single AI system that can solve a variety of problems. Currently, a Go-playing agent will not be able to navigate the roads or find new places, and an AI navigation controller won't be able to hold a perfect human-like conversation. What machine learning algorithms need to develop is versatility - the capability of doing many different things. Versatility is achieved by intelligent amalgamation of meta learning with related techniques such as reinforcement learning (finding suitable actions to maximize a reward), transfer learning (re-purposing a model trained for one task on a second, related task), and active learning (where the learning algorithm chooses the data it wants to learn from). These different learning techniques provide an AI agent with the brains to do multiple tasks without having to learn every new task from scratch, making it capable of adapting intelligently to a wide variety of new, unseen situations. Apart from creating versatile agents, recent research also focuses on using meta learning for hyperparameter and neural network optimization, fast reinforcement learning, finding good network architectures, and specific cases such as few-shot image recognition. Using meta learning, AI agents learn how to learn new tasks by reusing prior experience, rather than examining each new task in isolation.

Various approaches to Meta Learning algorithms

A wide variety of approaches come under the umbrella of meta learning. Let's have a quick glance at these algorithms and techniques.

Algorithm Learning (selection): Algorithm selection chooses learning algorithms on the basis of the characteristics of the instance. For example, you have a set of ML algorithms (Random Forest, SVM, DNN), data sets as the instances, and the error rate as the cost metric. The goal of algorithm selection is to predict which machine learning algorithm will have a small error on each data set.

Hyper-parameter Optimization: Many machine learning algorithms have numerous hyper-parameters that can be optimized, and the choice of these hyper-parameters determines how well the algorithm learns. A recent paper, "Evolving Deep Neural Networks", provides a meta learning algorithm for optimizing deep learning architectures through evolution.

Ensemble Methods: Ensemble methods combine several models or approaches to achieve better predictive performance. There are three basic types - bagging, boosting, and stacked generalization. In bagging, each model runs independently and the outputs are aggregated at the end without preference to any model. Boosting refers to a group of algorithms that use weighted averages to turn weak learners into stronger learners; boosting is all about "teamwork". Stacked generalization has a layered architecture: each set of base classifiers is trained on a dataset, successive layers receive as input the predictions of the immediately preceding layer, and a single classifier at the topmost level produces the final prediction.

Dynamic bias selection: In dynamic bias selection, the bias of the learning algorithm is adjusted dynamically to suit the new problem instance. The performance of a base learner can trigger the need to explore additional hypothesis spaces, normally through small variations of the current hypothesis space. The bias selection can either be a form of data variation or a time-dependent feature.

Inductive Transfer: Inductive transfer describes learning using previous knowledge from related tasks, by transferring meta-knowledge across domains or tasks. The goal here is to incorporate the meta-knowledge into the new learning task rather than matching meta-features with a meta-knowledge base.

Adding Enhancements to Meta Learning algorithms

Supervised meta-learning: the meta-learner is trained with supervised learning. In supervised learning we have both input and output variables, and the algorithm learns the mapping function from the input to the output.

RL meta-learning: uses standard deep RL techniques to train a recurrent neural network in such a way that the recurrent network can then implement its own reinforcement learning procedure.

Model-agnostic meta-learning: MAML trains over a wide range of tasks, for a representation that can be quickly adapted to a new task via a few gradient steps. The meta-learner seeks an initialization that is not only useful for adapting to various problems, but can also be adapted quickly (a minimal sketch of this idea appears at the end of this article).

The ultimate goal of any meta learning algorithm and its variations is to be fully self-referential - able to automatically inspect and improve every part of its own code. A regenerative meta learning algorithm, along the lines of how a lizard regenerates its limbs, would not only blur the distinction between the variations described above but also lead to better future performance and versatility of machine learning algorithms.
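To make the MAML idea above concrete, here is a minimal sketch on a toy problem: each task is a 1-D linear regression with its own slope and intercept, and a shared initialization is meta-trained so that a single inner gradient step adapts it to a new task. The toy task distribution, model and learning rates are illustrative assumptions, not taken from the MAML paper.

```python
import torch
import torch.nn.functional as F

def sample_task():
    """A 'task' is a random linear function y = a*x + b; returns a data sampler for it."""
    a, b = torch.randn(1), torch.randn(1)
    def data(n=20):
        x = torch.randn(n, 1)
        return x, a * x + b
    return data

def maml_outer_step(w, bias, n_tasks=8, inner_lr=0.05, outer_lr=0.01):
    meta_loss = 0.0
    for _ in range(n_tasks):
        data = sample_task()
        x_s, y_s = data()                                  # support set: drives the inner step
        x_q, y_q = data()                                  # query set: evaluates the adapted params
        inner_loss = F.mse_loss(x_s @ w + bias, y_s)
        g_w, g_b = torch.autograd.grad(inner_loss, (w, bias), create_graph=True)
        w_adapted, b_adapted = w - inner_lr * g_w, bias - inner_lr * g_b
        meta_loss = meta_loss + F.mse_loss(x_q @ w_adapted + b_adapted, y_q)
    g_w, g_b = torch.autograd.grad(meta_loss / n_tasks, (w, bias))
    with torch.no_grad():                                  # outer update of the shared initialization
        w -= outer_lr * g_w
        bias -= outer_lr * g_b
    return (meta_loss / n_tasks).item()

w = torch.zeros(1, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
for step in range(500):
    loss = maml_outer_step(w, bias)
print("meta-loss after training:", round(loss, 3))
```

After meta-training, a brand-new task can be fitted with just the single inner gradient step, which is exactly the "learning to learn" behaviour described above.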


How to improve interpretability of machine learning systems

Sugandha Lahoti
12 Mar 2018
6 min read
Advances in machine learning have greatly improved products, processes, and research, and how people might interact with computers. One of the factors lacking in machine learning processes is the ability to give an explanation for their predictions. The inability to give a proper explanation of results leads to end users losing trust in the system, which ultimately acts as a barrier to the adoption of machine learning. Hence, along with the impressive results from machine learning, it is also important to understand why and where it works, and when it won't. In this article, we will talk about some ways to increase machine learning interpretability and make predictions from machine learning models understandable.

3 interesting methods for interpreting Machine Learning predictions

According to Miller, interpretability is the degree to which a human can understand the cause of a decision. Interpretable predictions lead to better trust and provide insight into how the model may be improved. The kind of machine learning developments happening at present require a lot of complex models, which lack interpretability. Simpler models (e.g. linear models), on the other hand, often give a correct interpretation of a prediction model's output, but they are often less accurate than complex models - creating a tension between accuracy and interpretability. Complex models are less interpretable because their relationships are generally not concisely summarized. However, if we focus on a prediction made on a particular sample, we can describe the relationships more easily. Balancing the trade-off between model complexity and interpretability lies at the heart of the research into interpretable deep learning and machine learning models. We will discuss a few methods that increase the interpretability of complex ML models by summarizing model behavior with respect to a single prediction.

LIME, or Local Interpretable Model-Agnostic Explanations, is a method developed in the paper Why should I trust you? for interpreting individual model predictions by locally approximating the model around a given prediction. LIME uses two approaches to explain specific predictions: perturbation and linear approximation. With perturbation, LIME takes a prediction that requires explanation and systematically perturbs its inputs; these perturbed inputs become new, labeled training data for a simpler approximate model. It then performs local linear approximation by fitting a linear model to describe the relationships between the (perturbed) inputs and outputs. Thus a simple linear model approximates the more complex, nonlinear function (a minimal sketch of this idea appears at the end of this article).

DeepLIFT (Deep Learning Important FeaTures) is another method, serving as a recursive prediction explanation method for deep learning. It decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT assigns contribution scores based on the difference between the activation of each neuron and its 'reference activation'. It can also reveal dependencies missed by other approaches by optionally giving separate consideration to positive and negative contributions.

Layer-wise relevance propagation is another method for interpreting the predictions of deep learning models. It determines which features in a particular input vector contribute most strongly to a neural network's output, by defining a set of constraints used to derive a number of different relevance propagation functions.

Thus we have three different ways of summarizing model behavior with respect to a single prediction to increase model interpretability. Another important avenue for interpreting machine learning models is to understand (and rethink) generalization.

What is generalization and how it affects Machine learning interpretability

Machine learning algorithms are trained on certain datasets, called training sets. During training, a model learns intrinsic patterns in the data and updates its internal parameters to better understand it. Once training is over, the model is tried on test data to predict results based on what it has learned. In an ideal scenario, the model would always accurately predict the results for the test data. In reality, the model identifies all the relevant information in the training data but sometimes fails when presented with new data. This difference between "training error" and "test error" is called the generalization error. The ultimate aim of turning a machine learning system into a scalable product is generalization: every ML task wants a generalized algorithm that behaves consistently across all kinds of distributions. The ability to distinguish models that generalize well from those that do not will not only help make ML models more interpretable, but might also lead to more principled and reliable model architecture design. According to conventional statistical theory, small generalization error is due either to properties of the model family or to the regularization techniques used during training. A recent paper at ICLR 2017, Understanding deep learning requires rethinking generalization, shows that current theoretical frameworks fail to explain the impressive results of deep learning approaches and why understanding deep learning requires rethinking generalization. The authors support their findings through extensive systematic experiments.

Developing human understanding through visualizing ML models

Interpretability also means creating models that support human understanding of machine learning. Human interpretation is enhanced when visual and interactive diagrams and figures are used to explain the results of ML models. This is why a tight interplay between UX design and machine learning is essential for increasing machine learning interpretability. Walking along the lines of human-centered machine learning, researchers at Google, OpenAI, DeepMind, YC Research and others have come up with Distill. This open science journal features articles with clear expositions of machine learning concepts using excellent interactive visualization tools. Most of these articles are aimed at understanding the inner workings of various machine learning techniques. They include an article on attention and Augmented Recurrent Neural Networks, which has a beautiful visualization of attention distribution in RNNs, and another on feature visualization, which explains how neural networks build up their understanding of images. Google has also launched the PAIR initiative to study and design the most effective ways for people to interact with AI systems; it helps researchers understand ML systems through work on interpretability and by expanding the community of developers. R2D3 is another website which provides an excellent visual introduction to machine learning. Facets is another tool for visualizing and understanding training datasets, providing a human-centered approach to ML engineering.

Conclusion

Human-centered machine learning is all about increasing the interpretability of ML systems and developing human understanding of them. It is about ML and AI systems understanding how humans reason, communicate and collaborate. As algorithms are used to make decisions in more areas of everyday life, it's important for data scientists to train them thoughtfully to ensure the models make decisions for the right reasons. As more progress is made in this area, ML systems will avoid commonsense errors, violated user expectations, and situations that can lead to conflict and harm, making such systems safer to use. As research continues, machines will soon be able to explain their decisions and results in the most humane way possible.
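As a footnote to the LIME discussion above, here is a hand-rolled sketch of the perturb-and-fit idea (not the lime package's actual API): perturb one instance, query the black-box model on the perturbations, and fit a locally weighted linear surrogate whose coefficients act as per-feature explanations. The kernel, the perturbation scale and the `black_box_predict` interface are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explain(black_box_predict, x, n_samples=500, scale=0.3, kernel_width=0.75, seed=0):
    """Explain black_box_predict's output around the point x (a 1-D feature vector).

    black_box_predict(z) is assumed to return a scalar score or probability.
    Returns one coefficient per feature from a locally weighted linear surrogate.
    """
    rng = np.random.default_rng(seed)
    perturbed = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))   # perturbation step
    preds = np.array([black_box_predict(z) for z in perturbed])            # query the black box
    distances = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)                # nearby samples count more
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, preds, sample_weight=weights)                 # local linear approximation
    return surrogate.coef_

# Toy usage: explain a nonlinear function around one point.
f = lambda z: float(np.tanh(2 * z[0] - z[1]))
print(lime_style_explain(f, np.array([0.2, -0.1, 0.5])))
```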


Paper in two minutes: Certifiable Distributional Robustness with Principled Adversarial Training

Savia Lobo
01 Mar 2018
3 min read
Certifiable Distributional Robustness with Principled Adversarial Training, a paper accepted for ICLR 2018, is a collaborative effort by Aman Sinha, Hongseok Namkoong, and John Duchi. The authors highlight the vulnerability of neural networks to adversarial examples and take the perspective of distributionally robust optimization, which guarantees performance under adversarial input perturbations.

Certifiable distributional robustness through principled adversarial training

What problem is the paper trying to solve?

Recent works have shown that neural networks are vulnerable to adversarial examples: seemingly imperceptible perturbations to the data can lead to misbehavior of the model, such as misclassification of the output. Many researchers have proposed adversarial attack and defense mechanisms to counter these vulnerabilities. While these works provide an initial foundation for adversarial training, there are no guarantees on whether proposed white-box attacks can find the most adversarial perturbation, or whether there is a class of attacks such defenses can successfully prevent. On the other hand, verification of deep networks using SMT (satisfiability modulo theories) solvers provides formal guarantees on robustness but is NP-hard in general; this approach requires prohibitive computational expense even on small networks. The authors take the perspective of distributionally robust optimization and provide an adversarial training procedure with provable guarantees on its computational and statistical performance.

Paper summary

This paper proposes a principled methodology to induce distributional robustness in trained neural nets with the purpose of mitigating the impact of adversarial examples. The idea is to train the model to perform well not only with respect to the unknown population distribution, but on the worst-case distribution in a Wasserstein ball around the population distribution. In particular, the authors adopt the Wasserstein distance to define the ambiguity sets. This allows them to use strong duality results from the literature on distributionally robust optimization and express the empirical minimax problem as a regularized ERM (empirical risk minimization) with a different cost.

Key takeaways

The paper provides a method for efficiently guaranteeing distributional robustness with a simple form of adversarial data perturbation. The method offers strong statistical guarantees and fast optimization rates for a large class of problems. Empirical evaluations indicate that the proposed methods are in fact robust to perturbations in the data, and that they outperform less principled adversarial training techniques. The major benefit of the approach is its simplicity and wide applicability across many models and machine learning scenarios.

Reviewer comments summary

Overall score: 27/30. Average score: 9. The reviewers strongly accepted this paper and stated that it is of great quality and originality. One reviewer, however, felt the paper is an interesting attempt but that some of the key claims seem inaccurate and miss comparisons to proper baselines. Another reviewer said the paper applies recently developed ideas from the robust optimization literature - in particular, distributionally robust optimization with the Wasserstein metric - and shows that, under this framework, for smooth loss functions and when not too much robustness is requested, the resulting optimization problem is of the same difficulty as the original one (without the adversarial attack). The paper received some criticism, but was ultimately well liked by most of the reviewers.
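The "simple form of adversarial data perturbation" described above can be sketched as a two-level loop: an inner loop performs gradient ascent on a copy of the inputs to maximise the loss minus a Wasserstein-style proximity penalty, and the outer loop takes an ordinary training step on the perturbed batch. The model, penalty coefficient and step counts below are illustrative assumptions, not the paper's exact procedure or hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def perturb_batch(model, x, y, gamma=1.0, steps=15, step_size=0.1):
    """Inner maximisation: ascend on loss(f(z), y) - gamma * ||z - x||^2."""
    z = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        objective = F.cross_entropy(model(z), y) - gamma * ((z - x) ** 2).mean()
        grad, = torch.autograd.grad(objective, z)
        z = (z + step_size * grad).detach().requires_grad_(True)   # gradient ascent step
    return z.detach()

def robust_train_step(model, optimizer, x, y, gamma=1.0):
    """Outer minimisation: ordinary gradient step on the worst-case (perturbed) batch."""
    z = perturb_batch(model, x, y, gamma)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(z), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on random data with a tiny classifier.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
print(robust_train_step(model, optimizer, x, y))
```

A larger gamma keeps the perturbed points close to the data (less robustness requested), which is the regime in which the paper's guarantees apply.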

Paper in Two minutes: Using Mean Field Games for learning behavior policy of large populations

Sugandha Lahoti
20 Feb 2018
4 min read
This ICLR 2018 accepted paper, Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations, deals with inference in models of collective behavior, specifically at how to infer the parameters of a mean field game (MFG) representation of collective behavior. This paper is authored by Jiachen Yang, Xiaojing Ye, Rakshit Trivedi, Huan Xu, and Hongyuan Zha. The 6th annual ICLR conference is scheduled to happen between April 30 - May 03, 2018. Mean field game theory is the study of decision making in very large populations of small interacting agents. This theory understands the behavior of multiple agents each individually trying to optimize their position in space and time, but with their preferences being partly determined by the choices of all the other agents. Estimating the optimal behavior policy of large populations with Deep Mean Field Games What problem is the paper attempting to solve? The paper considers the problem of representing and learning the behavior of a large population of agents, to construct an effective predictive model of the behavior. For example, a population’s behavior directly affects the ranking of a set of trending topics on social media, represented by the global population distribution over topics. Each user’s observation of this global state influences their choice of the next topic in which to participate, thereby contributing to future population behavior. Classical predictive methods such as time series analysis are also used to build predictive models from data. However, these models do not consider the behavior as the result of optimization of a reward function and so may not provide insight into the motivations that produce a population’s behavior policy. Alternatively, methods that employ the underlying population network structure assume that nodes are only influenced by a local neighborhood and do not include a representation of a global state. Hence, they face difficulty in explaining events as the result of uncontrolled implicit optimization. MFG (mean field games) overcomes the limitations of alternative predictive methods by determining how a system naturally behaves according to its underlying optimal control policy. The paper proposes a novel approach for estimating the parameters of MFG. The main contribution of the paper is in relating the theories of MFG and Reinforcement Learning within the classic context of Markov Decision Processes (MDPs). The method suggested uses inverse RL to learn both the reward function and the forward dynamics of the MFG from data. Paper summary The paper covers the problem in three sections-- theory, algorithm, and experiment.  The theoretical contribution begins by transforming a continuous time MFG formulation to a discrete time formulation and then relates the MFG to an associated MDP problem. In the algorithm phase, an RL solution is suggested to the MFG problem. The authors relate solving an optimization problem on an MDP of a single agent with solving the inference problem of the (population-level) MFG. This leads to learning a reward function from demonstrations using a maximum likelihood approach, where the reward is represented using a deep neural network. The policy is learned through an actor-critic algorithm, based on gradient descent with respect to the policy parameters. The algorithm is then compared with previous approaches on toy problems with artificially created reward functions. 
The authors then demonstrate the algorithm on real-world social data, with the aim of recovering the reward function and predicting the future trajectory.

Key Takeaways

This paper describes a data-driven method for solving a mean field game model of population evolution, by proving a connection between Mean Field Games and Markov Decision Processes and building on methods in reinforcement learning. The method is scalable to arbitrarily large populations because the Mean Field Games framework represents population density rather than individual agents. In experiments on real data, Mean Field Games emerges as a powerful framework for learning a reward and policy that can predict trajectories of a real-world population more accurately than alternatives.

Reviewer feedback summary

Overall Score: 26/30
Average Score: 8.66

The reviewers were unanimous in finding the work highly novel and significant. According to the reviewers, there is still minimal work at the intersection of machine learning and collective behavior, and this paper could help stimulate the growth of that intersection. On the flip side, the paper was criticized in one review, which stated that the “scientific content of the work has critical conceptual flaws”. However, the authors' rebuttals persuaded the reviewers that these concerns were largely addressed.

article-image-4093-2
Savia Lobo
05 Feb 2018
6 min read
Save for later

AutoML: Developments and where it is heading

Savia Lobo
05 Feb 2018
6 min read
With the growing demand for ML applications, there is also a demand for machine learning tasks such as data preprocessing and hyperparameter optimization to be easily handled by non-experts. These tasks are repetitive, yet complex enough that they have traditionally been left to ML experts. To address this, and to deliver off-the-shelf quality machine learning without expert knowledge, Google launched a project named AutoML, an approach that automates the design of ML models. You can also refer to our article on Automated Machine Learning (AutoML) for a clear understanding of how AutoML functions.

Trying AutoML on smaller datasets

AutoML brought new dimensions to machine learning workflows, where repetitive tasks performed by human experts could be taken over by machines. When Google started with AutoML, it applied the approach to two smaller deep learning datasets, CIFAR-10 and Penn Treebank, to test it on image recognition and language modeling tasks respectively. The result: the AutoML approach could design models on par with those designed by ML experts. On comparing the designs drafted by humans and by AutoML, the machine-suggested architecture included new elements that were later found to alleviate vanishing and exploding gradients, suggesting that the machine produced an architecture that could be useful for multiple tasks. The machine-designed architecture also has many channels through which gradients can flow backwards, which could help explain why LSTM RNNs work better than standard RNNs.

Trying AutoML on larger datasets

After its success on small datasets, Google tested AutoML on large-scale datasets such as ImageNet and the COCO object detection dataset. Testing AutoML on these was a challenge because of their higher orders of magnitude: simply applying AutoML directly to ImageNet would require many months of training. To make the approach tractable at this scale, Google made some alterations to AutoML:
- Redesigning the search space so that AutoML could find the best layer, which can then be stacked many times in a flexible manner to create a final network.
- Carrying out the architecture search on the CIFAR-10 dataset and transferring the best learned architecture to ImageNet image classification and COCO object detection.

With these changes, AutoML found the two best layers, a normal cell and a reduction cell, which when combined resulted in a novel architecture called "NASNet". These two cells work well on CIFAR-10, and also on ImageNet and COCO object detection. According to Google, NASNet achieved a prediction accuracy of 82.7% on the ImageNet validation set, surpassing all previous Inception models built by Google. Further, the features learned from ImageNet classification were transferred to object detection on the COCO dataset; combined with Faster R-CNN, they delivered state-of-the-art predictive performance on the COCO object detection task in both the largest and the mobile-optimized models. Google suspects that these image features learned on ImageNet and COCO can be reused for various other computer vision applications; a minimal transfer-learning sketch in that spirit follows below.
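As a rough, hypothetical illustration rather than Google's own workflow, the sketch below reuses the NASNet architecture available in tf.keras.applications, with ImageNet weights, as a frozen feature extractor and attaches a new classification head for an assumed five-class dataset; the class count and the commented-out training data are placeholders.

```python
import tensorflow as tf

# Reuse the NASNet architecture (found by neural architecture search) as a frozen
# feature extractor and train only a small classification head on top of it.
base = tf.keras.applications.NASNetMobile(
    input_shape=(224, 224, 3),
    include_top=False,       # drop the original ImageNet classification head
    weights="imagenet",      # reuse features learned on ImageNet
    pooling="avg",
)
base.trainable = False       # transfer learning: keep the searched architecture fixed

num_classes = 5              # hypothetical number of classes in your own dataset
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=5)  # supply your own labeled images
```

Only the small dense head is trained here; fine-tuning more of the network, or swapping NASNetMobile for NASNetLarge, are straightforward variations on the same idea.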
Indeed, Google has open-sourced NASNet for inference on image classification and for object detection, in the Slim and Object Detection TensorFlow repositories.

Towards Cloud AutoML: Automated machine learning platform for everyone

Cloud AutoML has been Google's latest buzz for its customers, as it makes AI available to everyone. Using Google's advanced techniques such as learning2learn and transfer learning, Cloud AutoML helps businesses with limited ML expertise start building their own high-quality custom models. Cloud AutoML also benefits AI experts by improving their productivity and letting them explore new fields in AI, and those experts can in turn help less-skilled engineers build powerful systems. Companies such as Disney and Urban Outfitters are using AutoML to make search and shopping on their websites more relevant.

With AutoML moving to the cloud, Google released its first Cloud AutoML product, Cloud AutoML Vision, an image recognition tool for building custom ML models quickly and easily. The tool has a drag-and-drop interface that allows one to upload images, train and manage models, and then deploy the trained models directly on Google Cloud. When used to classify popular public datasets such as ImageNet and CIFAR, Cloud AutoML Vision has shown state-of-the-art results, with fewer misclassifications than the generic ML APIs.

Here are some highlights of Cloud AutoML Vision:
- It is built on Google's leading image recognition approaches, along with transfer learning and neural architecture search technologies, so one can expect an accurate model even with limited ML expertise.
- One can build a simple model in minutes, or a full, production-ready model in a day, in order to pilot an AI-enabled application.
- AutoML Vision has a simple graphical UI for specifying data, which it then turns into a high-quality model customized for one's specific needs.

Starting with images, Google plans to roll out Cloud AutoML tools and services for text and audio too. However, Google isn't the only one in the race; competitors including AWS and Microsoft are also bringing in tools, such as Amazon's SageMaker and Microsoft's service for customizing image recognition models, to help developers automate machine learning. Some other automated tools include:
- Auto-sklearn: An automated project that helps users of scikit-learn, a package of common machine learning functions, choose the right estimator. Auto-sklearn provides a generic estimator that analyzes a given scikit-learn job to determine the best algorithm and set of hyperparameters (a minimal usage sketch follows this list).
- Auto-WEKA: Inspired by Auto-sklearn, this targets machine learners using the Java programming language and the Weka ML package. Auto-WEKA uses a fully automated approach to jointly select a learning algorithm and set its hyperparameters, unlike previous methods which addressed these in isolation.
- H2O Driverless AI: This uses a web-based UI and is specifically designed for business users who want to gain insights from data without getting into the intricacies of machine learning algorithms. It allows users to choose one or more target variables in the dataset, and the system provides the answer, with results presented as interactive charts annotated in plain English.
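As a small, illustrative example (assuming the auto-sklearn package is installed), the sketch below lets auto-sklearn search over algorithms and hyperparameters on a toy scikit-learn dataset; the time budgets are arbitrary placeholders.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import autosklearn.classification

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Let auto-sklearn search over estimators and hyperparameters within a time budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,   # total search budget in seconds (arbitrary)
    per_run_time_limit=30,         # budget per candidate model (arbitrary)
)
automl.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, automl.predict(X_test)))
```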
Among these tools, Google's AutoML is currently leading. It will be exciting to see whether Google can scale an automated ML environment to match what is possible with traditional, hand-crafted ML. Google is not the only business contributing to the movement towards an automated machine learning ecosystem: several tools have already joined the automation league, and we can expect more to follow. These tools could also move to the cloud in the future, extending their availability to non-experts, much like Google's Cloud AutoML. With machine learning becoming automated, we can expect more and more systems to move a step closer to widening the scope of AI.