Data | 1209 articles | Tech News, Tutorials & Expert Insights

article-image-microsoft-ai-toolkit-connect-2017

17 Nov 2017

3 min read

Microsoft showcases its edgy AI toolkit at Connect(); 2017

17 Nov 2017

At the ongoing Microsoft Connect(); 2017, Microsoft has unveiled their latest innovations in AI development platforms. The Connect(); conference this year is all about developing new tools and cloud services that help developers seize the growing opportunity around artificial intelligence and machine learning. Microsoft has made two major announcements to capture the AI market. Visual Studio Tools for AI Microsoft has announced new tools for its Visual Studio IDE specific for building AI applications. Visual Studio Tools for AI is currently in the beta stage and is an extension to the Visual Studio 2017. It allows developers, data scientists, and machine learning engineers to embed deep learning models into applications. They also have built-in support for popular machine learning frameworks such as Microsoft Cognitive Toolkit (CNTK), Google TensorFlow, Caffe2, and MXNet. It also comes packed with features such as custom metrics, history tracking, enterprise-ready collaboration, and data science reproducibility and auditing. Visual Studio Tools for AI allows interactive debugging of deep learning applications with built-in features like syntax highlighting, IntelliSense and text auto formatting. Training of AI models on the cloud is also possible using the integration with Azure Machine Learning. This integration also allows deploying a model into production. Visualization and monitoring of AI models is available using TensorBoard, which is an integrated open tool and can be run both locally and in remote VMs. Azure IoT Edge Microsoft sees IoT as a mission-critical business asset. With this in mind, they have developed a product for IoT solutions. Termed as Azure IoT Edge, it enables developers to run cloud intelligence on the edge of IoT devices. Azure IoT Edge can operate on Windows and Linux as well as on multiple hardware architectures (x64 and ARM). Developers can work on languages such as C#, C and Python to deploy models on Azure IoT Edge. The Azure IoT edge is a bundle of multiple components. With AI Toolkit, developers can start building AI applications. With Azure Machine learning, AI applications can be created, deployed, and managed with the toolkit on any framework. Azure Machine Learning also includes a set of pre-built AI models for common tasks. In addition, using the Azure IoT Hub, developers can deploy Edge modules on multiple IoT Edge devices. Using a combination of Azure Machine Learning, Azure Stream Analytics, Azure Functions, and any third-party code, a complex data pipeline can be created to build and test container-based workloads. This pipeline can be managed using the Azure IoT Hub. The customer reviews on Azure IoT edge have been positive up till now. Here’s what Matt Boujonnier, Analytics Application Architect at Schneider Electric says: "Azure IoT Edge provided an easy way to package and deploy our Machine Learning applications. Traditionally, machine learning is something that has only run in the cloud, but for many IoT scenarios that isn’t good enough, because you want to run your application as close as possible to any events. Now we have the flexibility to run it in the cloud or at the edge—wherever we need it to be." With the launch of these two new tools, Microsoft is catching up quickly with the likes of Google and IBM to capture the AI market and providing developers with an intelligent edge.

0
0
32646

article-image-youtube-bans-dangerous-pranks-and-challenges

Prasad Ramesh

17 Jan 2019

2 min read

YouTube bans dangerous pranks and challenges

Prasad Ramesh

17 Jan 2019

2 min read

YouTube updates its policies to ban dangerous pranks and challenges that can be harmful to the victim of a prank or encourages people to partake in dangerous behavior. Pranks and challenges have been around on YouTube for a long time. Many of the pranks are entertaining and harmless, some challenges potentially unsafe like an extreme food eating challenge. Recently, the “Bird Box Challenge” has been popular inspired after the Netflix movie Bird Box. The challenge is to perform difficult tasks, like driving a car, blindfolded. This challenge has received media coverage not for the entertainment value but for the dangers involved. It has caused many accidents where people take this challenge. What is banned on YouTube? In the light of this challenge being harmful and dangerous to lives, YouTube bans certain content by updating its policies page. Primarily, it has banned three kinds of pranks: Challenges that can cause serious danger to life or cause death Pranks that lead the victims to believe that they’re in serious physical danger Any pranks that cause severe emotional distress in children They state in their policies page: “YouTube is home to many beloved viral challenges and pranks, but we need to make sure what’s funny doesn’t cross the line into also being harmful or dangerous.” What are the terms? Other than the points listed above there is no clear or exhaustive list of the kind of activities that are banned. The YouTube moderators may take a call to remove a video. In the next two months, YouTube will be removing any existing content that falls into this radar, however, content creators will not receive a strike. Going forward, any new content that may have objectionable content as per their policies will get the channel a ‘strike’. Three strikes in the span of three months will lead to the channel’s termination. Questionable content includes custom thumbnails or external links that display pornographic, graphic violent, malware, or spam content. So now you are less likely to see videos on driving blindfolded or eating tide pods. Google Chrome announces an update on its Autoplay policy and its existing YouTube video annotations Is the YouTube algorithm’s promoting of #AlternativeFacts like Flat Earth having a real-world impact? Worldwide Outage: YouTube, Facebook, and Google Cloud goes down affecting thousands of users

0
0
31195

article-image-what-we-learned-from-qlik-qonnections-2018

Amey Varangaonkar

09 May 2018

4 min read

What we learned from Qlik Qonnections 2018

Amey Varangaonkar

09 May 2018

4 min read

Qlik’s new CEO Mike Capone keynoted the recently held Qlik Qonnections 2018, with some interesting feature rollouts and announcements. He also shed light on the evolution of Qlik’s two premium products - Qlikview and Qlik Sense, and shared their roadmap for the coming year. Close to 4000 developers and Business Intelligence professionals were in attendance, and were very receptive to the positive announcements made in the keynote. Let us take a quick look at some of the important announcements: Qlik continues to be the market leader Capone began the keynote by sharing some of the interesting performance metrics over the past year, which have led to Qlik being listed as a ‘Leader’ in the Gartner Magic Quadrant 2017. One of the most impressive achievements among all is the impressive customer base that Qlik boasts of, including: 9 out of the 10 major banks 8 out of the 10 major insurance companies 11 out of the 15 major global investment and securities companies With an impressive retention rate of 94%, Qlik have also managed to add close to 4000 new customers over the last year and have also doubled the developer community to over 25,000 members. These numbers mean only one thing - Qlik will continue to dominate. Migration from Qlikview to Qlik Sense There has been a lot of talk (and confusion) of late about Qlik supposedly looking to transition its focus from Qlikview to Qlik Sense. In the keynote, Capone gave us all the much needed clarity on the licensing and migration options for those looking to move from Qlikview’s guided analytics features to Qlik Sense’s self-service analytics. These are some of the important announcements in this regard: Migration from Qlikview to Qlik Sense is optional: Acknowledging some of the loyal customers who don’t want to move away from QlikView, Capone said that the migration from Qlikview to Qlik Sense is optional. For those who do want to migrate, Qlik have assured that the transition will be made as smooth as possible, and that they would be making this a priority. Single license to use both Qlikview and Qlik Sense: Qlik have made it possible for customers to get the most out of their products without having to buy multiple licenses for multiple products. With just an additional maintenance fee, they will be able to enjoy the premium features of both the tools seamlessly. Qlik venturing into cognitive analytics One of the most notable announcements of this conference was incorporating aspects of Artificial Intelligence into the Business Intelligence capabilities of the Qlik products. Qlik are aiming to improving the core associative engine that works with the available data smartly. Not just that, they have also announced the Insight Advisor feature, to auto-generate the best possible visualizations and reports. Hybrid and multi-cloud support added Qlik’s vision going forward is quite simple and straightforward - to support deployment of their applications and services in a hybrid-cloud or multi-cloud environment. Going forward, users will be able to move their Qlik Sense applications that run using a microservices-based architecture on Linux, in either public or private clouds. They will also be able to self-manage these applications with the support features provided by Qlik. New tools for Qlik developers Qonnections 2018 saw 2 important announcements made to make the lives of Qlik developers easier. Along with Qlik Branch - a platform to collaborate on projects and share innovations and new developments, Qlik also announced a new platform for developers called Qlik Core. This new platform will allow Qlik developers to leverage the offerings of IoT, edge analytics and more to design and drive innovative business models and strategies. Qlik Core is currently in the beta stage, and is expected to be generally available very soon. Interesting times ahead for Qlik In recent times, Qlik has faced stiff competition from other popular Business Intelligence tools such as Tableau, Spotfire, Microsoft’s very own Power BI - apart from the freely available tools which are easily available to customers for fast, effective business intelligence. With all the tools delivering on a similar promise and not coming out with any groundbreaking blue ocean features, it will be interesting to see how Qlik’s new offerings will fare against these sharks. The recent restructuring of the Qlik management and the downsizing happening over the past few years can make one wonder if they are struggling to keep up. However, the announcements in Qonnections 2018 indicate the company is indeed moving in a positive direction with their products, and should restore the public faith and dispel any doubts Qlik’s customers may have. How Qlik Sense is driving self-service Business Intelligence Overview of a Qlik Sense® Application’s Life Cycle QlikView Tips and Tricks

0
0
20693

article-image-introducing-spleeter-tensorflow-python-library-extracts-voice-sound-from-music

Sugandha Lahoti

05 Nov 2019

2 min read

Introducing Spleeter, a Tensorflow based python library that extracts voice and sound from any music track

Sugandha Lahoti

05 Nov 2019

2 min read

On Monday, Deezer, a French online music streaming service, released Spleeter which is a music separation engine. It comes in the form of a Python Library based on Tensorflow. Stating the reason behind Spleeter, the researchers state, “We release Spleeter to help the Music Information Retrieval (MIR) community leverage the power of source separation in various MIR tasks, such as vocal lyrics analysis from audio, music transcription, any type of multilabel classification or vocal melody extraction.” Spleeter comes with pre-trained models for 2, 4 and 5 track separation. These include: Vocals (singing voice) / accompaniment separation (2 stems) Vocals / drums / bass / other separation (4 stems) Vocals / drums / bass / piano / other separation (5 stems) It can also train source separation models or fine-tune pre-trained ones with Tensorflow if you have a dataset of isolated sources. Deezer benchmarked Spleeter against Open-Unmix another open-source model recently released and reported slightly better performances with increased speed. It can perform separation of audio files to 4 stems 100x faster than real-time when running on a GPU. You can use Spleeter straight from the command line as well as directly in your own development pipeline as a Python library. It can be installed with Conda, with pip or be used with Docker. Spleeter creators mention a number of potential applications of source separation engine including remixes, upmixing, active listening, educational purposes, and pre-processing for other tasks such as transcription. Spleeter received mostly positive feedback on Twitter, as people experimented to separate vocals from music. https://twitter.com/lokijota/status/1191580903518228480 https://twitter.com/bertboerland/status/1191110395370586113 https://twitter.com/CholericCleric/status/1190822694469734401 Wavy.org also ran several songs through the two-stem filter and evaluated them in a blog post. They tried a variety of soundtracks across multiple genres. The performance of audio was much better than expected, however, vocals sometimes felt robotically autotuned. The amount of bleed was shockingly low relative to other solutions and surpassed any available free tool and rival commercial plugins and services. https://twitter.com/waxpancake/status/1191435104788238336 Spleeter will be presented and live-demoed at the 2019 ISMIR conference in Delft. For more details refer to the official announcement. DeepMind AI’s AlphaStar achieves Grandmaster level in StarCraft II with 99.8% efficiency. Google AI introduces Snap, a microkernel approach to ‘Host Networking’ Firefox 70 released with better security, CSS, and JavaScript improvements

0
0
19290

article-image-paper-in-two-minutes-attention-is-all-you-need

Sugandha Lahoti

05 Apr 2018

4 min read

Paper in Two minutes: Attention Is All You Need

Sugandha Lahoti

05 Apr 2018

4 min read

A paper on a new simple network architecture, the Transformer, based solely on attention mechanisms The NIPS 2017 accepted paper, Attention Is All You Need, introduces Transformer, a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. This paper is authored by professionals from the Google research team including Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. The Transformer – Attention is all you need What problem is the paper attempting to solve? Recurrent neural networks (RNN), long short-term memory networks(LSTM) and gated RNNs are the popularly approaches used for Sequence Modelling tasks such as machine translation and language modeling. However, RNN/CNN handle sequences word-by-word in a sequential fashion. This sequentiality is an obstacle toward parallelization of the process. Moreover, when such sequences are too long, the model is prone to forgetting the content of distant positions in sequence or mix it with following positions’ content. Recent works have achieved significant improvements in computational efficiency and model performance through factorization tricks and conditional computation. But they are not enough to eliminate the fundamental constraint of sequential computation. Attention mechanisms are one of the solutions to overcome the problem of model forgetting. This is because they allow dependency modelling without considering their distance in the input or output sequences. Due to this feature, they have become an integral part of sequence modeling and transduction models. However, in most cases attention mechanisms are used in conjunction with a recurrent network. Paper summary The Transformer proposed in this paper is a model architecture which relies entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and tremendously improves translation quality after being trained for as little as twelve hours on eight P100 GPUs. Neural sequence transduction models generally have an encoder-decoder structure. The encoder maps an input sequence of symbol representations to a sequence of continuous representations. The decoder then generates an output sequence of symbols, one element at a time. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The authors are motivated to use self-attention because of three criteria. One is that the total computational complexity per layer. Another is the amount of computation that can be parallelized, as measured by the minimum number of sequential operations required. The third is the path length between long-range dependencies in the network. The Transformer uses two different types of attention functions: Scaled Dot-Product Attention, computes the attention function on a set of queries simultaneously, packed together into a matrix. Multi-head attention, allows the model to jointly attend to information from different representation subspaces at different positions. A self-attention layer connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length is smaller than the representation dimensionality, which is often the case with machine translations. Key Takeaways This work introduces Transformer, a novel sequence transduction model based entirely on attention mechanism. It replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers for translation tasks. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, the model achieves a new state of the art. In the former task the model outperforms all previously reported ensembles. Future Goals Transformer has only been applied to transduction model tasks as of yet. In the near future, the authors plan to use it for other problems involving input and output modalities other than text. They plan to apply attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. The Transformer architecture from this paper has gained major traction since its release because of major improvements in translation quality and other NLP tasks. Recently, the NLP research group at Harvard have released a post which presents an annotated version of the paper in the form of a line-by-line implementation. It is accompanied with 400 lines of library code, written in PyTorch in the form of a notebook, accessible from github or on Google Colab with free GPUs.

0
0
18141

article-image-google-researchers-introduce-jax-a-tensorflow-like-framework-for-generating-high-performance-code-from-python-and-numpy-machine-learning-programs

Bhagyashree R

11 Dec 2018

2 min read

Google researchers introduce JAX: A TensorFlow-like framework for generating high-performance code from Python and NumPy machine learning programs

Bhagyashree R

11 Dec 2018

2 min read

Google researchers have build a tool called JAX, a domain-specific tracing JIT compiler, which generates high-performance accelerator code from pure Python and Numpy machine learning programs. It combines Autograd and XLA for high-performance machine learning research. At its core, it is an extensible system for transforming numerical functions. Autograd helps JAX automatically differentiate native Python and Numpy code. It can handle a large subset of Python features such as loops, branches, recursion, and closures. It comes with support for reverse-mode (backpropagation) and forward-mode differentiation, and these two can be composed arbitrarily in any order. XLA or Accelerated Linear Algebra is a linear algebra compiler used for optimizing TensorFlow computations. To run the NumPy programs on GPUs and TPUs, JAX uses XLA. The library calls are compiled and executed just-in-time. JAX also allows compiling your own Python functions just-in-time into XLA-optimized kernels using a one-function API, jit. How JAX works? The basic function of JAX is specializing and translating high-level Python and NumPy functions into a representation that can be transformed and then lifted back into a Python function. It traces Python functions by monitoring all the basic operations applied to its input to produce output and then records these operations and the data-flow between them in a directed acyclic graph (DAG). For tracing the functions, it wraps primitive operations and when they’re called they add themselves to a list of operations performed along with their inputs and outputs. In order to keep track of the data flow between these primitive operations, the values being tracked are wrapped in the Tracer class instances. The team is working towards expanding this project and provide support for cloud TPU, multi-GPU, and multi-TPU. In future, it will come with full NumPy coverage and some SciPy coverage, and more. As this is still a research project, we can expect bugs and is not recommended to be used in production. To read more in detail and contribute to this project, head over to GitHub. Google AdaNet, a TensorFlow-based AutoML framework Graph Nets – DeepMind’s library for graph networks in Tensorflow and Sonnet Dopamine: A Tensorflow-based framework for flexible and reproducible Reinforcement Learning research by Google

0
0
17872

article-image-ai-can-now-help-speak-your-mind-uc-researchers-introduce-a-neural-decoder-that-translates-brain-signals-to-natural-sounding-speech

Bhagyashree R

29 Apr 2019

4 min read

AI can now help speak your mind: UC researchers introduce a neural decoder that translates brain signals to natural-sounding speech

Bhagyashree R

29 Apr 2019

4 min read

In a research published in the Nature journal on Monday, a team of neuroscientists from the University of California, San Francisco, introduced a neural decoder that can synthesize natural-sounding speech based on brain activity. This research was led by Gopala Anumanchipalli, a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab. It is being developed in the laboratory of Edward Chang, a Neurological Surgery professor at University of California. Why is this neural decoder being introduced? There are many cases of people losing their voice because of stroke, traumatic brain injury, or neurodegenerative diseases such as Parkinson’s disease, multiple sclerosis, and amyotrophic lateral sclerosis. Currently,assistive devices that track very small eye or facial muscle movements to enable people with severe speech disabilities express their thoughts by writing them letter-by-letter, do exist. However, generating text or synthesized speech with such devices is often time consuming, laborious, and error-prone. Another limitation these devices have is that they only permit generating a maximum of 10 words per minute, compared to the 100 to 150 words per minute of natural speech. This research shows that it is possible to generate a synthesized version of a person’s voice that can be controlled by their brain activity. The researchers believe that in future, this device could be used to enable individuals with severe speech disability to have fluent communication. It could even reproduce some of the “musicality” of the human voice that expresses the speaker’s emotions and personality. “For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual’s brain activity,” said Chang. “This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss.” How does this system work? This research is based on another study by Josh Chartier and Gopala K. Anumanchipalli, which shows how the speech centers in our brain choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech. In this new study, Anumanchipalli and Chartier asked five patients being treated at the UCSF Epilepsy Center to read several sentences aloud. These patients had electrodes implanted into their brains to map the source of their seizures in preparation for neurosurgery. Simultaneously, the researchers recorded activity from a brain region known to be involved in language production. The researchers used the audio recordings of volunteer’s voice to understand the vocal tract movements needed to produce those sounds. With this detailed map of sound to anatomy in hand, the scientists created a realistic virtual vocal tract for each volunteer that could be controlled by their brain activity. The system comprised of two neural networks: A decoder for transforming brain activity patterns produced during speech into movements of the virtual vocal tract. A synthesizer for converting these vocal tract movements into a synthetic approximation of the volunteer’s voice. Here’s a video depicting the working of this system: https://www.youtube.com/watch?v=kbX9FLJ6WKw&feature=youtu.be The researchers observed that the synthetic speech produced by this system was much better as compared to the synthetic speech directly decoded from the volunteer’s brain activity. The generated sentences were also understandable to hundreds of human listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform. The system is still in its early stages. Explaining its limitations, Chartier said, “We still have a ways to go to perfectly mimic spoken language. We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.” Read the full report on UCSF’s official website. OpenAI introduces MuseNet: A deep neural network for generating musical compositions Interpretation of Functional APIs in Deep Neural Networks by Rowel Atienza Google open-sources GPipe, a pipeline parallelism Library to scale up Deep Neural Network training

0
0
17457

article-image-patreon-speaks-out-against-the-protests-over-its-banning-sargon-of-akkad-for-violating-its-rules-on-hate-speech

Natasha Mathur

19 Dec 2018

3 min read

Patreon speaks out against the protests over its banning Sargon of Akkad for violating its rules on hate speech

Natasha Mathur

19 Dec 2018

3 min read

0
0
17422

article-image-introducing-remove-bg-a-deep-learning-based-tool-that-automatically-removes-the-background-of-any-person-based-image-within-5-seconds

Amrata Joshi

18 Dec 2018

3 min read

Introducing remove.bg, a deep learning based tool that automatically removes the background of any person based image within 5 seconds

Amrata Joshi

18 Dec 2018

3 min read

Yesterday, Benjamin Groessing, a web consultant and developer at byteq, released remove.bg, a tool built on python, ruby and deep learning. This tool automatically removes the background of any image within 5 seconds. It uses various custom algorithms for the processing of the image. https://twitter.com/hammer_flo_/status/1074914463726350336 It is a free service and users don’t have to manually select the background/foreground layers to separate them. One can simply select an image and instantly download the resulting image with the background removed. Features of remove.bg Personal and professional use Remove.bg can be used by graphic designer, photographer or selfie lover for removing backgrounds. Saves time and money It saves time as it is automated and it is free of cost. 100% Automatic Apart from the image file, this release doesn’t require inputs such as selecting pixels, marking persons, etc. How does remove.bg work? https://twitter.com/begroe/status/1074645152487129088 Remove.bg uses AI technology for detecting foreground layers and separating them from the background. It uses additional algorithms for improving fine details and preventing color contamination. The AI detects persons as foreground and everything else as background. So, it only works if there is at least one person in the image. Users can upload images of any resolution but for performance reasons, the output image has been limited to 500 × 500 pixels. Privacy in remove.bg User images are uploaded through a secure SSL/TLS-encrypted connection. These images are processed and the result is temporarily stored till the time a user can download them. After which, approximately an hour later, these image files get deleted. Privacy message on the official website of remove.bg states, “We do not share your images or use them for any other purpose than removing the background and letting you download the result.” What can be expected from the next release? The next set of releases might support other kinds of images such as product images. The team at Remove.bg might also release an easy-to-use API. Users are very excited about this release and the technology used behind it. Many users are comparing it with the portrait mode on iPhone X. Though it is not that fast but users are still liking it. https://twitter.com/Baconbrix/status/1074805036264316928 https://twitter.com/hammer_flo_/status/1074914463726350336 But how strong is remove.bg with regards to privacy is a bigger question. Though the website gives a privacy note at the end but it will take more to win the user’s trust. The images uploaded to remove.bg’ cloud might be at risk. How strong is the security and what preventive measures have they taken? These are few of the questions that might bother many. To have a look at the ongoing discussion on remove.bg, check out Benjamin Groessing’s AMA twitter thread. Facebook open-sources PyText, a PyTorch based NLP modeling framework Deep Learning Indaba presents the state of Natural Language Processing in 2018 NYU and AWS introduce Deep Graph Library (DGL), a python package to build neural network graphs

0
0
16582

article-image-numpy-drops-python-2-support-now-you-need-python-3-5-or-later

Prasad Ramesh

17 Dec 2018

2 min read

NumPy drops Python 2 support. Now you need Python 3.5 or later.

Prasad Ramesh

17 Dec 2018

2 min read

In a GitHub pull request last week, the NumPy community decided to remove support for Python 2.7. Python 3.4 support will also be dropped with this pull request. So now, to use NumPy 1.17 and newer versions, you will need Python 3.5 or later. NumPy has been supporting both Python versions since 2010. This move doesn't come as a surprise with the Python core team itself dropping support for Python 2 in 2020. The NumPy team had mentioned that this move comes in “Python 2 is an increasing burden on our limited resources”. The discussion to drop Python 2 support in NumPy started almost a year ago. Running pip install numpy on Python 2 will still install the last working version. But here on now, it may not contain the latest features as released for Python 3.5 or higher. However, NumPy on Python 2 will still be supported until December 31, 2019. After January 1, 2020, it may not contain the newest bug fixes. The Twitter audience sees this as a welcome move: https://twitter.com/TarasNovak/status/1073262599750459392 https://twitter.com/esc___/status/1073193736178462720 A comment on Hacker News reads: “Let's hope this move helps with the transitioning to Python 3. I'm not a Python programmer myself, but I'm tired of things getting hairy on Linux dependencies written in Python. It almost seems like I always got to have a Python 2 and a Python 3 version of some packages so my system doesn't break.” Another one reads: “I've said it before, I'll say it again. I don't care for everything-is-unicode-by-default. You can take my Python 2 when you pry it from my cold dead hands.” Some researchers who use NumPy and SciPy stick Python 2, this move from the NumPy team will help in getting everyone to work on a single version. One single supported version will sure help with the fragmentation. Often, Python developers find themselves in a situation where they have one version installed and a specific module is available/works properly in another version. Some also argue about stability, that Python 2 has greater stability and x or y feature. But the general sentiment is more supportive of adopting Python 3. Introducing numpywren, a system for linear algebra built on a serverless architecture NumPy 1.15.0 release is out! Implementing matrix operations using SciPy and NumPy

0
0
15376

article-image-sherin-thomas-explains-how-to-build-a-pipeline-in-pytorch-for-deep-learning-workflows

Packt Editorial Staff

09 May 2019

8 min read

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

Packt Editorial Staff

09 May 2019

8 min read

A typical deep learning workflow starts with ideation and research around a problem statement, where the architectural design and model decisions come into play. Following this, the theoretical model is experimented using prototypes. This includes trying out different models or techniques, such as skip connection, or making decisions on what not to try out. PyTorch was started as a research framework by a Facebook intern, and now it has grown to be used as a research or prototype framework and to write an efficient model with serving modules. The PyTorch deep learning workflow is fairly equivalent to the workflow implemented by almost everyone in the industry, even for highly sophisticated implementations, with slight variations. In this article, we explain the core of ideation and planning, design and experimentation of the PyTorch deep learning workflow. This article is an excerpt from the book PyTorch Deep Learning Hands-On by Sherin Thomas and Sudhanshi Passi. This book attempts to provide an entirely practical introduction to PyTorch. This PyTorch publication has numerous examples and dynamic AI applications and demonstrates the simplicity and efficiency of the PyTorch approach to machine intelligence and deep learning. Ideation and planning Usually, in an organization, the product team comes up with a problem statement for the engineering team, to know whether they can solve it or not. This is the start of the ideation phase. However, in academia, this could be the decision phase where candidates have to find a problem for their thesis. In the ideation phase, engineers brainstorm and find the theoretical implementations that could potentially solve the problem. In addition to converting the problem statement to a theoretical solution, the ideation phase is where we decide what the data types are and what dataset we should use to build the proof of concept (POC) of the minimum viable product (MVP). Also, this is the stage where the team decides which framework to go with by analyzing the behavior of the problem statement, available implementations, available pretrained models, and so on. This stage is very common in the industry, and I have come across numerous examples where a well-planned ideation phase helped the team to roll out a reliable product on time, while a non-planned ideation phase destroyed the whole product creation. Design and experimentation The crucial part of design and experimentation lies in the dataset and the preprocessing of the dataset. For any data science project, the major timeshare is spent on data cleaning and preprocessing. Deep learning is no exception from this. Data preprocessing is one of the vital parts of building a deep learning pipeline. Usually, for a neural network to process, real-world datasets are not cleaned or formatted. Conversion to floats or integers, normalization and so on, is required before further processing. Building a data processing pipeline is also a non-trivial task, which consists of writing a lot of boilerplate code. For making it much easier, dataset builders and DataLoader pipeline packages are built into the core of PyTorch. The dataset and DataLoader classes Different types of deep learning problems require different types of datasets, and each of them might require different types of preprocessing depending on the neural network architecture we use. This is one of the core problems in deep learning pipeline building. Although the community has made the datasets for different tasks available for free, writing a preprocessing script is almost always painful. PyTorch solves this problem by giving abstract classes to write custom datasets and data loaders. The example given here is a simple dataset class to load the fizzbuzz dataset, but extending this to handle any type of dataset is fairly straightforward. PyTorch's official documentation uses a similar approach to preprocess an image dataset before passing that to a complex convolutional neural network (CNN) architecture. A dataset class in PyTorch is a high-level abstraction that handles almost everything required by the data loaders. The custom dataset class defined by the user needs to override the __len__ and __getitem__ functions of the parent class, where __len__ is being used by the data loaders to determine the length of the dataset and __getitem__ is being used by the data loaders to get the item. The __getitem__ function expects the user to pass the index as an argument and get the item that resides on that index: from dataclasses import dataclassfrom torch.utils.data import Dataset, DataLoader@dataclass(eq=False)class FizBuzDataset(Dataset): input_size: int start: int = 0 end: int = 1000 def encoder(self,num): ret = [int(i) for i in '{0:b}'.format(num)] return[0] * (self.input_size - len(ret)) + ret def __getitem__(self, idx): x = self.encoder(idx) if idx % 15 == 0: y = [1,0,0,0] elif idx % 5 ==0: y = [0,1,0,0] elif idx % 3 == 0: y = [0,0,1,0] else: y = [0,0,0,1] return x,y def __len__(self): return self.end - self.start The implementation of a custom dataset uses brand new dataclasses from Python 3.7. dataclasses help to eliminate boilerplate code for Python magic functions, such as __init__, using dynamic code generation. This needs the code to be type-hinted and that's what the first three lines inside the class are for. You can read more about dataclasses in the official documentation of Python (https://docs.python.org/3/library/dataclasses.html). The __len__ function returns the difference between the end and start values passed to the class. In the fizzbuzz dataset, the data is generated by the program. The implementation of data generation is inside the __getitem__ function, where the class instance generates the data based on the index passed by DataLoader. PyTorch made the class abstraction as generic as possible such that the user can define what the data loader should return for each id. In this particular case, the class instance returns input and output for each index, where, input, x is the binary-encoder version of the index itself and output is the one-hot encoded output with four states. The four states represent whether the next number is a multiple of three (fizz), or a multiple of five (buzz), or a multiple of both three and five (fizzbuzz), or not a multiple of either three or five. Note: For Python newbies, the way the dataset works can be understood by looking first for the loop that loops over the integers, starting from zero to the length of the dataset (the length is returned by the __len__ function when len(object) is called). The following snippet shows the simple loop: dataset = FizBuzDataset()for i in range(len(dataset)): x, y = dataset[i]dataloader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=4)for batch in dataloader: print(batch) The DataLoader class accepts a dataset class that is inherited from torch.utils.data.Dataset. DataLoader accepts dataset and does non-trivial operations such as mini-batching, multithreading, shuffling, and so on, to fetch the data from the dataset. It accepts a dataset instance from the user and uses the sampler strategy to sample data as mini-batches. The num_worker argument decides how many parallel threads should be operating to fetch the data. This helps to avoid a CPU bottleneck so that the CPU can catch up with the GPU's parallel operations. Data loaders allow users to specify whether to use pinned CUDA memory or not, which copies the data tensors to CUDA's pinned memory before returning it to the user. Using pinned memory is the key to fast data transfers between devices, since the data is loaded into the pinned memory by the data loader itself, which is done by multiple cores of the CPU anyway. Most often, especially while prototyping, custom datasets might not be available for developers and in such cases, they have to rely on existing open datasets. The good thing about working on open datasets is that most of them are free from licensing burdens, and thousands of people have already tried preprocessing them, so the community will help out. PyTorch came up with utility packages for all three types of datasets with pretrained models, preprocessed datasets, and utility functions to work with these datasets. This article is about how to build a basic pipeline for deep learning development. The system we defined here is a very common/general approach that is followed by different sorts of companies, with slight changes. The benefit of starting with a generic workflow like this is that you can build a really complex workflow as your team/project grows on top of it. Build deep learning workflows and take deep learning models from prototyping to production with PyTorch Deep Learning Hands-On written by Sherin Thomas and Sudhanshu Passi. F8 PyTorch announcements: PyTorch 1.1 releases with new AI tools, open sourcing BoTorch and Ax, and more Facebook AI open-sources PyTorch-BigGraph for faster embeddings in large graphs Top 10 deep learning frameworks

0
0
15258

article-image-salesforce-open-sources-transmogrifai-automated-machine-learning-library

Sugandha Lahoti

17 Aug 2018

2 min read

Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library

Sugandha Lahoti

17 Aug 2018

2 min read

Salesforce has open sourced TransmogrifAI, their end-to-end automated machine learning library for structured data. This library is currently used in production to help power Salesforce Einstein AI platform. TransmogrifAI enables data scientists at Salesforce to transform customer data into meaningful, actionable predictions. Now, they have open-sourced this project to enable other developers and data scientists to build machine learning solutions at scale, fast. TransmogrifAI is built on Scala and SparkML that automates data cleansing, feature engineering, and model selection to arrive at a performant model. It encapsulates five main components of the machine learning process: Source: Salesforce Engineering Feature Inference: TransmogrifAI allows users to specify a schema for their data to automatically extract the raw predictor and response signals as “Features”. In addition to allowing for user-specified types, TransmogrifAI also does inference of its own. The strongly-typed features allow developers to catch a majority of errors at compile-time rather than run-time. Transmogrification or automated feature engineering: TransmogrifAI comes with a myriad of techniques for all the supported feature types ranging from phone numbers, email addresses, geo-location to text data. It also optimizes the transformations to make it easier for machine learning algorithms to learn from the data. Automated Feature Validation: TransgmogrifAI has algorithms that perform automatic feature validation to remove features with little to no predictive power. These algorithms are useful when working with high dimensional and unknown data. They apply statistical tests based on feature types, and additionally, make use of feature lineage to detect and discard bias. Automated Model Selection: The TransmogrifAI Model Selector runs several different machine learning algorithms on the data and uses the average validation error to automatically choose the best one. It also automatically deals with the problem of imbalanced data by appropriately sampling the data and recalibrating predictions to match true priors. Hyperparameter Optimization: It automatically tunes hyperparameters and offers advanced tuning techniques. This large-scale automation has brought down the total time taken to train models from weeks and months to a few hours with just a few lines of code. You can check out the project to get started with TransmogrifAI. For detailed information, read the Salesforce Engineering Blog. Salesforce Spring 18 – New features to be excited about in this release! How to secure data in Salesforce Einstein Analytics How to create and prepare your first dataset in Salesforce Einstein

0
0
13912

article-image-introducing-voila-that-turns-your-jupyter-notebooks-to-standalone-web-applications

Bhagyashree R

13 Jun 2019

3 min read

Introducing Voila that turns your Jupyter notebooks to standalone web applications

Bhagyashree R

13 Jun 2019

3 min read

Last week, a Jupyter Community Workshop on dashboarding was held in Paris. At the workshop, several contributors came together to build the Voila package, the details of which QuantStack shared yesterday. Voila serves live Jupyter notebooks as standalone web applications providing a neat way to share your work results with colleagues. Why do we need Voila? Jupyter notebooks allow you to do something called “literature programming” in which human-friendly explanations are accompanied with code blocks. It allows scientists, researchers, and other practitioners of scientific computing to add theory behind their code including mathematical equations. However, Jupyter notebooks may prove to be a little bit problematic when you plan to communicate your results with other non-technical stakeholders. They might get put-off by the code blocks and also the need for running the notebook to see the results. It also does not have any mechanism to prevent arbitrary code execution by the end user. How Voila works? Voila addresses all the aforementioned queries by converting your Jupyter notebook to a standalone web application. After connecting to a notebook URL, Voila launches the kernel for that notebook and runs all the cells. Once the execution is complete, it does not shut down the kernel. The notebook gets converted to HTML and is served to the user. This rendered HTML includes JavaScript that is responsible for initiating a websocket connection with the Jupyter kernel. Here’s a diagram depicting how it works: Source: Jupyter Blog Following are the features Voila provides: Renders Jupyter interactive widgets: It supports Jupyter widget libraries including bqplot, ipyleafet, ipyvolume, ipympl, ipysheet, plotly, and ipywebrtc. Prevents arbitrary code execution: It does not allow arbitrary code execution by consumers of dashboards. A language-agnostic dashboarding system: Voila is built upon Jupyter standard protocols and file formats enabling it to work with any Jupyter kernel (C++, Python, Julia). Includes custom template system for better extensibility: It provides a flexible template system to produce rich application layouts. Many Twitter users applauded this new way of creating live and interactive dashboards from Jupyter notebooks: https://twitter.com/philsheard/status/1138745404772818944 https://twitter.com/andfanilo/status/1138835776828071936 https://twitter.com/ToluwaniJohnson/status/1138866411261124608 Some users also compared it with another dashboarding solution called Panel. The main difference between Panel and Voila is that Panel supports Bokeh widgets whereas Voila is framework and language agnostic. “Panel can use a Bokeh server but does not require it; it is equally happy communicating over Bokeh Server's or Jupyter's communication channels. Panel doesn't currently support using ipywidgets, nor does Voila currently support Bokeh plots or widgets, but the maintainers of both Panel and Voila have recently worked out mechanisms for using Panel or Bokeh objects in ipywidgets or using ipywidgets in Panels, which should be ready soon,” a Hacker News user commented. To read more in detail about Voila, check out the official announcement on the Jupyter Blog. JupyterHub 1.0 releases with named servers, support for TLS encryption and more Introducing Jupytext: Jupyter notebooks as Markdown documents, Julia, Python or R scripts JupyterLab v0.32.0 releases

0
0
12981

article-image-googles-new-facial-recognition-patent-uses-your-social-network-to-identify-you

Melisha Dsouza

10 Aug 2018

3 min read

Google’s new facial recognition patent uses your social network to identify you!

Melisha Dsouza

10 Aug 2018

3 min read

Google is making its mark in facial recognition technology. After two successful forays in facial identification patents in August 2017 and January 2018, Google is back with another charter. This time its huge and plans to use machine-learning technology for facial recognition of publicly available personal photos on the internet. It’s no secret that Google can crawl trillions of websites at once. Using this as an advantage, the new patent allows Google to source pictures and identify faces from personal communications, social networks, collaborative apps, blogs and much more! Why is facial recognition gaining importance? The internet is buzzing with people clicking and uploading their images. Whether it be profile pictures or group photographs, images on social networks is all the rage these days. Apart from this, facial recognition also comes in handy while performing secure banking and financial transactions. ATMs and banks use this technology to make sure the user is who he/she says they are. From criminal tracking to identifying individuals in huge masses of people- facial recognition has applications everywhere! Clearly, Google has been taking full advantage of this tech. First, in the “Reverse Image Search” system, that allowed users to upload an image of a public figure to Google, the results would be a “best Guess” about who appears in the photo. And now, with the new patent, users can identify photos of less famous individuals. Imagine uploading a picture of a fifth-grade friend and coming back with the result of his/her email ID or occupation or for that matter, where they lives! The Workings of the Google Brain The process is simple and straightforward. First, the user uploads a photo, screenshot or scanned image The system analyzes the image and comes up with both visually similar, and a potential match using advanced image recognition Google will find the best possible match based partially on the data it pulled from your social accounts and other collaborative apps plus the aforementioned data sources The process of recognizing an image adopted by Google Source: CBInsights While all of this does sound exciting, there is a dark side left to be explored. Imagine you are out going about your own business. Someone who you don't even know happens to click your picture. This could later be used to find out all your personal details like where you live, what you do for a living, what your email address. All because everything is available on your social media accounts and on the internet these days! Creepy much? This is where basic ethics and privacy concerns come into play. The only solace here is that the patent states, in certain scenarios, a person would have to opt-in to have his/identity appear in search results. Need to know more? Check out the perspective on thenextweb.com. Admiring the many faces of Facial Recognition with Deep Learning Google’s second innings in China: Exploring cloud partnerships with Tencent and others Google’s Smart Display – A push towards the new OS, Fuchsia

0
59
12835

article-image-intel-amd-laptop-chip-partnership

Abhishek Jha

09 Nov 2017

3 min read

Frenemies: Intel and AMD partner on laptop chip to keep Nvidia at bay

Abhishek Jha

09 Nov 2017

3 min read

For decades, Intel and AMD have remained bitter archrivals. Today, they find themselves teaming up to thwart a common enemy – Nvidia. As Intel revealed its partnership with Advanced Micro Devices (AMD) over a next-generation notebook chip, it was the first time the two chip giants collaborated since the ‘80s. The proposed chip for thin and lightweight laptops combines an Intel processor and an AMD graphics unit for complex video gaming. The new series of processors will be part of Intel's 8th-generation Core H-series mobile chips, expected to hit the market in the first quarter of 2018. What it means is that Intel’s high-performance x86 cores will get combined with AMD Radeon Graphics into the same processor package using Intel’s EMIB multi-die technology. That is not all. Intel is also bundling the design with built-in High Bandwidth Memory (HBM2) RAM. The new processor, Intel claims, reduces the usual silicon footprint by about 50%. And with a ‘semi-custom’ graphics processor from AMD, enthusiasts can look forward to discrete graphics-level performances for playing games, editing photos or videos, and other tasks that can leverage modern GPU technologies. What does AMD get? Having struggled to remain profitable in recent times, AMD has been losing share in the discrete notebook GPU market. The deal could bring additional revenues with increased market share. Most importantly, the laptops built with the new processors won’t be competing with AMD’s Ryzen chips (which are also designed for ultrathin laptops). AMD clarified on the difference: While the new Intel chips are designed for serious gamers, Ryzen chips (that are due out at the end of the year) can run games but are not specifically designed for that purpose. "Our collaboration with Intel expands the installed base for AMD Radeon GPUs and brings to market a differentiated solution for high-performance graphics,” Scott Herkelman, vice president and general manager of AMD's Radeon Technologies Group, said. "Together we are offering gamers and content creators the opportunity to have a thinner-and-lighter PC capable of delivering discrete performance-tier graphics experiences in AAA games and content creation applications.” While more information will be available in future, the first machines with the new technology are expected to release in the first quarter of 2018. Nvidia's stock fell on the news. While both AMD and Intel saw their shares surging. A rivalry that began when AMD reverse-engineered the Intel 8080 microchip in 1975 could still be far from over, but in graphics, the two have been rather cordial. Despite hating each other since formation, both decided to pick each other as lesser evil over Nvidia. This is why the Intel AMD laptop chip partnership has a definite future. Currently centered around laptop solutions, this could even stretch to desktops, who knows!

0
0
12743

Tech News - Data

Microsoft showcases its edgy AI toolkit at Connect(); 2017

YouTube bans dangerous pranks and challenges

What we learned from Qlik Qonnections 2018

Introducing Spleeter, a Tensorflow based python library that extracts voice and sound from any music track

Paper in Two minutes: Attention Is All You Need

Google researchers introduce JAX: A TensorFlow-like framework for generating high-performance code from Python and NumPy machine learning programs

AI can now help speak your mind: UC researchers introduce a neural decoder that translates brain signals to natural-sounding speech

Patreon speaks out against the protests over its banning Sargon of Akkad for violating its rules on hate speech

Introducing remove.bg, a deep learning based tool that automatically removes the background of any person based image within 5 seconds

NumPy drops Python 2 support. Now you need Python 3.5 or later.

Trending Topics

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library

Introducing Voila that turns your Jupyter notebooks to standalone web applications

Google’s new facial recognition patent uses your social network to identify you!

Frenemies: Intel and AMD partner on laptop chip to keep Nvidia at bay