Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon

Tech Guides - Data

281 Articles
article-image-ai-tools-data-scientists-might-not-know
Amey Varangaonkar
22 Aug 2018
8 min read
Save for later

5 artificial intelligence tools data scientists might not know

Amey Varangaonkar
22 Aug 2018
8 min read
With Artificial Intelligence going mainstream, it is not at all surprising to see the number of tools and platforms for AI development go up as well. Open source libraries such as Tensorflow, Keras and PyTorch are very popular today. Not just those - enterprise platforms such as Azure AI Platform, Google Cloud AI and Amazon Sagemaker are commonly used to build scalable production-grade AI applications. While you might be already familiar with these tools and frameworks, there are quite a few relatively unknown AI tools and services which can make your life as a data scientist much, much easier! In this article, we look at 5 such tools for AI development which you may or may not have heard of before. Wit.ai One of the most popular use-cases of Artificial Intelligence today is building bots that facilitate effective human-computer interaction. Wit.ai, a platform for building these conversational chatbots, finds applications across various platforms, including mobile apps, IoT as well as home automation. Used by over 150,000 developers across the world, this platform gives you the ability to build conversational UI that supports text categorization, classification, sentiment analysis and a whole host of other features. Why you should try this machine learning tool out There are a multitude of reasons why wit.ai is so popular among developers for creating conversational chatbots. Some of the major reasons are: Support for text as well as voice, which gives you more options and flexibility in the way you want to design your bots Support for multiple languages such as Python, Ruby and Node.js which facilitates better integration of your app with the website or the platform of your choice The documentation is very easy to follow Lots of built-in entities to ease the development of your chatbots Intel OpenVINO Toolkit Bringing together two of the most talked about technologies today, i.e. Artificial Intelligence and Edge Computing, we had to include Intel’s OpenVINO Toolkit in this list. Short for Open Visual Inference and Neural Network Optimization, this toolkit brings comprehensive computer vision and deep learning capabilities to the edge devices. It has proved to be an invaluable resource to industries looking to set up smart IoT systems for image recognition and processing using edge devices. The OpenVINO toolkit can be used with the commonly used popular frameworks such as OpenCV, Tensorflow as well as Caffe. It can be configured to leverage the power of the traditional CPUs as well as customized AI chips and FPGAs. Not just that, this toolkit also has support for the Vision Processing Unit, a processor developed specifically for machine vision. Why you should try this AI tool out Allows you to develop smart Computer Vision applications for IoT-specific use-cases Support for a large number of deep learning and image processing frameworks. Also, it can be used with the traditional CPUs as well as customized chips for AI/Computer Vision Its distributed capability allows you to develop scalable applications, which again is invaluable when deployed on edge devices You can know more about OpenVINO’s features and capabilities in our detailed coverage of the toolkit. Apache PredictionIO This one is for the machine learning engineers and data scientists looking to build large-scale machine learning solutions using the existing Big Data infrastructure. Apache PredictionIO is an open source, state-of-the-art Machine Learning server which can be easily integrated with the popular Big Data tools such as Apache Hadoop, Apache Spark and Elasticsearch to deploy smart applications. Source: PredictionIO System architecture As can be seen from the architecture diagram above, PredictionIO has modules that interact with the different components of the Big Data system and uses an App Server to communicate the results of the analysis to the outside devices. Why you should try this machine learning tool out Let’s you build production-ready models which can also be deployed as web services You can also leverage the machine learning capabilities of Apache Spark to build large-scale machine learning models Pre-built performance evaluation measures available to check the accuracy of your predictive models Most importantly, this tool helps you simplify your Big Data infrastructure without adding too many complexities IBM Snap ML A machine learning library that is 46 times faster than Tensorflow. If that’s not a reason to start using IBM’s Snap ML, what is? IBM have been taking some giant strides in the field of AI research in a bid to compete with the heavyweights in this space - mainly Google, Microsoft and Amazon. With Snap ML, they seem to have struck a goldmine. A library that can be used for high-speed machine learning models using the cutting edge CPU/GPU technology, Snap ML allows for agile development of models while scaling to process massive datasets. Why you should try this machine learning tool out It is insanely fast. Snap ML was used to train a logistic regression classifier on a terabyte-scale dataset in just under 100 seconds. It allows for GPU acceleration to avoid large data transfer overheads. With the enhanced GPU technology available today, Snap ML is one of the best tools you can have at your disposal to train models quickly and efficiently It allows for distributed model training and works on sparse data structures as well You should definitely check out our detailed coverage of Snap ML where we go into the depth of its features and understand why this is a very special tool. Crypto-ML It is common knowledge that cryptocurrency, especially Bitcoin, can be traded more efficiently and profitably by leveraging the power of machine learning. Large financial institutions and trading firms have been using the machine learning tools to great effect. However, it’s the individuals, on the other hand, who have relied on historical data and outdated techniques to forecast the trends. All that has now changed, thanks to Crypto-ML. Crypto-ML is a cryptocurrency trading platform designed specifically for individuals who want to get the most out of their investments in the most reliable, error-free ways. Using state-of-the-art deep learning techniques, Crypto-ML uses historical data to build models that predict future price movement. At the same time, it eliminates any human error or mistakes arising out of emotions. Why you should try this machine learning tool out No expertise in cryptocurrency trading is required if you want to use this tool Crypto-ML only makes use of historical data and builds data models to predict future prices without any human intervention Per the Crypto-ML website, the average gain on winning trades is close to 53%, whereas the average loss on losing trades is just close to 6%. If you are a data scientist or a machine learning developer with an interest in finance and cryptocurrency, this platform can also help you customize your own models for efficient trading. Here’s where you can read on how Crypto-ML works, in more detail. Other notable mentions Apart from the tools we mentioned above, there are a quite a few other tools that could not make it to the list, but deserve a special mention. Some of them are: ABBYY’s Real-time Recognition SDK for document recognition, language processing and data capturing is worth checking out. Vertex.ai’s PlaidML is an open source tool that allows you to build smart deep learning models across a variety of platforms. It leverages the power of Tile, a new machine learning language that facilitates tensor manipulation. Facebook recently open sourced MUSE, a Python library for efficient word embedding and other NLP tasks. This one’s worth keeping an eye on for sure! If you’re interested in browser-based machine learning, MachineLabs recently open sourced the entire code base of their machine learning platform. NVIDIA’s very own NVVL, their open source offering that provides GPU-accelerated video decoding for training deep learning models The vast ecosystem of tools and frameworks available for building smart, intelligent use-cases across various domains just points to the fact that AI is finding practical applications with every passing day. It is not an overstatement anymore to suggest that that AI is slowly becoming indispensable to businesses. This is not the end of it by any means either - expect to see more such tools spring to life in the near future, with some having game-changing, revolutionary consequences. So which tools are you planning to use for your machine learning / AI tasks? Is there any tool we missed out? Let us know! Read more Predictive Analytics with AWS: A quick look at Amazon ML Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics How to earn $1m per year? Hint: Learn machine learning  
Read more
  • 0
  • 1
  • 5292

article-image-5-examples-of-artificial-intelligence-in-web-apps
Sugandha Lahoti
20 Aug 2018
7 min read
Save for later

5 examples of Artificial Intelligence in Web apps

Sugandha Lahoti
20 Aug 2018
7 min read
Modern day web app development is increasingly focused on building a customer-facing front-end presence with the use of Artificial Intelligence. Web apps, use Artificial Intelligence not just for intelligent automation, but also for building recommendation engines, website implementation, and image recognition, among other application areas. In this post, we look at five key areas, illustrated by real-world examples, where web apps are employing Artificial intelligence to automate some part of their system. Recommendation Engines of Amazon and Netflix Curating content based on the user’s context is one of the most widely used AI features in web apps. Amazon, for instance, uses item-based collaborative filtering for product classification. Amazon’s recommendation system uses a combination of goods-based recommendation (users are recommended for those similar to what they liked in the past) and buddy-based recommendation (users are recommended things which their Facebook friends like.) Not just for their recommendation system, Amazon has been using AI for multiple tasks. Their AI Management Strategy is called The Flywheel, where one part of Amazon acts as a catalyst for AI and machine learning growth in other areas. Read more: Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics Another popular example is Netflix, who revamped their recommendation algorithm based on visual impressions. One of their research projects indicated that the artwork was not only the biggest influencer to a viewer's decision to watch content, but it also drew over 82% of their focus while browsing Netflix. This made them develop a new image recommendation algorithm which works in real time to project the image it thinks the user will respond to. They use implicit (user behavior) and Explicit data (user activity) and then feed this data to machine learning algorithms to figure out the relevant content for each user. For each title, users get the image with the highest rank based on their profile. Side by side, it continues collecting data from its 100 million other subscribers to improve its engine’s performance. Read more: What software stack does Netflix use? Google and Microsoft using Image recognition Image recognition can serve multiple uses for web apps including object and pattern recognition, locating duplicates (exact or partial), image search by fragments, and more. Two such unique applications of image recognition are Google’s Quickdraw and Microsoft’s Captionbot.ai. Quick Draw is Google’s AI-powered web app game, where users have to draw an everyday object that a neural network tries to recognize. Players are given 20 seconds to draw a random item, and Google’s neural network tries to match it with other 50 million hand-drawn sketches by other players to identify the correct one. Quickdraw aims to generate the world’s largest doodling data set, which is shared publicly to help further machine learning research. The data preserves user privacy by collecting only anonymous metadata, including timestamp, country code, whether or not the drawing was recognized, and which word the drawing corresponded to. This dataset was used in SketchRNN, a neural network that can draw words and interpolate between drawings. Another image recognition web app is Microsoft’s Captionbot.ai. The system can automatically generate a caption for an uploaded photograph. Users can rate how accurately it has detected what was on display. The algorithm learns from the rating, to make the captions more accurate. It uses three separate services to process the images. The Computer Vision API identifies the components of the photo, then mixes it with data from the Bing Image API, and runs any faces it spots through Emotion API. The Emotion API analyses facial expressions to detect anger, contempt, disgust, fear, and other traits. Based on the results from these APIs, the caption is generated. Google Docs powered by Natural Language Processing Modern Web apps can also be fueled with cognitive capabilities to make them stand apart from other apps. Instances of this include transforming human speech to text or conversing with people in natural language. One such example of a web app which includes natural language processing is Google Docs. Google Docs and Slides have an Explore feature to show text, images, and other features relevant to the document that a user is working on at any given point.  Docs can also use natural language to search through data and reports, and automatically generate formulas in Sheets. Google Docs recently incorporated an AI grammar checker, announced at Google Cloud Next. It uses a machine translation algorithm to recognize errors and suggest corrections as users type. Google Docs can also be integrated with Natural Language API to recognize the sentiment of selected text in a Google Doc and highlight it based on that sentiment. Web-based artificial intelligence Chatbots Web-based chatbots are just like app-based chatbots albeit they interact with users in the website browser. They use AI techniques such as natural language understanding and pattern recognition to store and distinguish between the context of the information provided and elicit a suitable response for future replies. An example of web-based chatbots are the Live Chat bots where the conversation with a visitor on a website is automated using a chatbot. Many live chat software companies are already experimenting with chatbots. Examples include the Operator bot used by Intercom, a company building customer messaging platform or Driftbot by Drift which gives your website a personal assistant. Read More: Top 4 chatbot development frameworks for developers Another example, are AI based chatbots which help in creating full websites. Right Click is a startup that introduced an A.I.-powered chatbot which uses Artificial Intelligence in a conversational interface to create websites. It asks general questions during the conversation like “What industry you belong to?” and “Why do you want to make a website?” and creates customized templates as per the given answers. Similarly, Wix’s Artificial Intelligence Design bot can tailor websites by learning about each person’s or business’ own needs. Web-based code helpers using AI Intelligent coding assistants are gaining popularity with their ability to understand the code and provide right suggestions at the right time. They can analyze code on the web and give fast and smart completions. Codota for Chrome is a smart web-based IDE which can build predictive models of code and suggest code completions and related content based on the current context present in the code. It combines program analysis, natural language processing, and machine learning to learn from the code. Users can look for Codota’s Icon on every code snippet on their browsers - in GitHub, StackOverflow and others. Another example is Deep Cognition’s Deep Learning Studio – Cloud. It is not exactly an IDE, but it features AI-powered drag & drop interface to help design deep learning models with ease. It features assisted modeling, for automated tensor size calculations and real-time validation. It also has AutoML feature to automatically build a neural network. [dropcap]E[/dropcap]ven though AI is a great choice to enhance your web apps, an important facet to keep in mind is ensuring fairness, accuracy, and transparency of your web apps. For instance, web apps powered by natural language should not discriminate people based on caste, color, or creed or hurt user sentiments. Similarly, those using neural networks for recognizing images should ensure the filtering of obscene images. Creating such types of artificial intelligence systems would require a hybrid of designers, programmers, ML engineers, and researchers. This collective group will have a good grasp of user experience, will be comfortable thinking in abstracts and algorithms, and equally well versed with the social impacts of artificial intelligence. Read More: 20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017 Uber introduces Fusion.js, a plugin-based web development framework for high-performance apps. Electron Fiddle: A ‘code playground’ for experimenting with cross-platform native apps. Warp: Rust’s new web framework for implementing WAI (Web Application Interface)
Read more
  • 0
  • 0
  • 29346

article-image-how-everyone-at-netflix-uses-jupyter-notebooks-from-data-scientists-machine-learning-engineers-to-data-analysts
Bhagyashree R
18 Aug 2018
4 min read
Save for later

How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts

Bhagyashree R
18 Aug 2018
4 min read
Netflix uses a variety of tools to do data analysis. One of the big ways that data scientists and engineers at Netflix interact with their data is through Jupyter notebooks. In addition to providing execution environments to users, Netflix invests in various parts of the Jupyter ecosystem and tooling. They are “reimagining what a notebook can be, who can use it, and what they can do with it.” Netflix aims to provide personalized content to their 130 million viewers. For this every day more than 1 trillion events are written into a streaming ingestion pipeline. To support this, they’ve built an industry-leading data platform which is flexible, powerful, and complex. There are so many diverse users of this platform, such as analytics engineers, data engineers, and data scientists, requiring different sets of tools and languages. To help the platform scale, they wanted to minimize the number of tools and the solution to this was the open-source tool: Jupyter notebooks. Why Jupyter notebook is so compelling for Netflix? These are the functionalities provided by notebook that benefits Netflix’s data scientists and engineers: Standard messaging API: The Jupyter protocol provides a standard messaging API with the kernels that act as computational engines. It separates where the content is written and where the content is executed. This makes it language agnostic. Editable file format: It provides an editable file format that stores the code and results together. Web-based UI: It is web-based which helps interactively writing and running code as well as visualizing outputs. How Netflix uses Jupyter Notebooks? The following are some of the use cases they use Jupyter notebooks for: Data access: Notebooks were first introduced for workflows and their adoption grew among the data scientists. Seeing this, Netflix decided to leverage its versatility and architecture for general data access. Notebooks provide an user-friendly interface for interactively running code, exploring the outputs, and visualizing data all from a single cloud-based development environment. Notebook Templates: They introduced parameterized notebooks, which allow the use of parameters in the code and take values as input at runtime. These templates help: Data scientists to run an experiment with different coefficients and summarize the results Data engineers to execute data quality audits Data analysts to share prepared queries and visualizations Software engineers to email the results of a troubleshooting script Scheduling notebooks: Next they are using notebooks for creating a unifying layer for scheduling workflows. Notebooks are used for interactive work and allows smooth move to scheduling that work to run recurrently. Many users create an entire workflow in a notebook and just copy/paste it into separate files for scheduling when they’re ready to deploy it. Notebook infrastructure: The three fundamental components of the infrastructure are: storage, compute, and interface. Source: Netflix Tech Blog Storage: The Netflix Data Platform is made of Amazon S3 and EFS for cloud storage, which notebooks treat as virtual filesystems. Each user has a home directory on EFS containing a personal workspace for notebooks. This workspace is for storing any notebook created or uploaded by a user. When a user launches a notebook interactively, all the reading and writing happens at the workspace. Compute: All the jobs on the data platform run on containers including queries, pipelines and notebooks. A container with reasonable default resources is provisioned when a user launches a notebook. Users can request more resources if they find that the provided resources are not enough. A unified execution environment with a prepared container image is provided, which has common libraries and an array of default kernels preinstalled. The orchestration and environments are managed with Titus, their container management platform. Interface: They are using nteract, a React-based frontend for Jupyter notebooks, which emphasizes simplicity and composability as core design principles.They’re also introducing native support for parameterization, which makes it easier to schedule notebooks and create reusable templates. Netflix is planning to make investments in both the frontend and backend to improve the overall notebook experience. This year they are also sponsoring JupyterCon. To read more about how Jupyter is offering value to Netflix read Netflix’s original post at Medium. 10 reasons why data scientists love Jupyter notebooks What’s new in Jupyter Notebook 5.3.0 Netflix open sources Zuul 2 cloud gateway
Read more
  • 0
  • 0
  • 10061
Banner background image

article-image-tackle-trolls-machine-learning-filtering-inappropriate-content
Amarabha Banerjee
15 Aug 2018
4 min read
Save for later

Tackle trolls with Machine Learning bots: Filtering out inappropriate content just got easy

Amarabha Banerjee
15 Aug 2018
4 min read
The most feared online entities in the present day are trolls. Trolls, a fearsome bunch of fake or pseudo online profiles, tend to attack online users, mostly celebrities, sports person or political profiles using a wide range of methods. One of these methods is to post obscene or NSFW (Not Safe For Work) content on your profile or website where User Generated Content (USG) is allowed. This can create unnecessary attention and cause legal troubles for you too. The traditional way out is to get a moderator (or a team of them). Let all the USGs pass through this moderation system. This is a sustainable solution for a small platform. But if you are running a large scale app, say a publishing app where you publish one hundred stories a day, and the success of these stories depend on the user interaction with them, then this model of manual moderation becomes unsustainable. More the number of USGs, more is the turn-around time, larger the moderation team size. This results in escalating costs, for a purpose that’s not contributing to your business growth in any manner. That’s where Machine Learning could help. Machine Learning algorithms that can scan images and content for possible abusive or adult content is a better solution that manual moderation. Tech giants like Microsoft, Google, Amazon have a ready solution for this. These companies have created APIs which are commercially available for developers. You can incorporate these APIs in your application to weed out the filth served by the trolls. The different APIs available for this purpose are Microsoft moderation, Google Vision, AWS Rekognition & Clarifai. Dataturks have made a comparative study on using these APIs on one particular dataset to measure their efficiency. They used a YACVID dataset with 180 images, manually labelled 90 of these images as nude and the rest as non-nude. The dataset was then fed to the 4 APIs mentioned above, their efficiency was tested based on the following parameters. True Positive (TP): Given a safe photo, the API correctly says so False Positive (FP): Given an explicit photo but the API incorrectly classifies it as safe. False negative (FN): Given a safe photo but the API is not able to detect so and True negative(TN): Given an explicit photo and the API correctly says so. TP and TN are two cases which meant the system behaved correctly. An FP meant that the app was vulnerable to attacks from trolls, FN meant the efficiency of the systems were low and hence not practically viable. 10% of the cases would be such that the API can’t decide whether its explicit or not. Those would be sent for manual moderation. This would bring down the maintenance cost of the moderation team. The results that they received are shown below: Source: Dataturks As it is evident from the above table, the best standalone API is Google vision with a 99% accuracy and 94% recall value. Recall value implies that if the same images are repeated, it can recognize them with 94% precision. The best results however were received with the combination of Microsoft and Google. The comparison of the response times are mentioned below: Source: dataturks The response time might have been affected with the fact that all the images accessed by the APIs were stored in Amazon S3. Hence AWS API might have had an unfair advantage on the response time. The timings were noted for 180 image calls per API. The cost is the lowest for AWS Rekognition - $1 for 1000 calls to the API. It’s $1.2 for Clarifai, $1.5 for both Microsoft and Google. The one notable drawback of the Amazon API was that the images had to be stored as S3 objects, or converted into that. All the other APIs accepted any web links as possible source of images. What this study says is that the power of filtering out negative and explicit content in your app is much easier now. You might still have to have a small team of moderators, but their jobs will be made a lot easier with the ML models implemented in these APIs. Machine Learning is paving the way for us to be safe from the increasing menace of Trolls, a threat to free speech and open sharing of ideas which were the founding stones of internet and the world wide web as a whole. Will this discourage Trolls from continuing their slandering or will it create a counter system to bypass the APIs and checks? We can only know in time. Facebook launches a 6-part Machine Learning video series Google’s new facial recognition patent uses your social network to identify you! Microsoft’s Brad Smith calls for facial recognition technology to be regulated
Read more
  • 0
  • 0
  • 6129

article-image-budget-demand-forecasting-markov-model-in-sas
Sunith Shetty
10 Aug 2018
8 min read
Save for later

Budget and Demand Forecasting using Markov model in SAS [Tutorial]

Sunith Shetty
10 Aug 2018
8 min read
Budget and demand forecasting are important aspects of any finance team. Budget forecasting is the outcome, and demand forecasting is one of its components. In this article, we understand the Markov model for forecasting and budgeting in finance.   This article is an excerpt from a book written by Harish Gulati titled SAS for Finance. Understanding problem of budget and demand forecasting While a few decades ago, retail banks primarily made profits by leveraging their treasury office, recent years have seen fee income become a major source of profitability. Accepting deposits from customers and lending to other customers is one of the core functions of the treasury. However, charging for current or savings accounts with add-on facilities such as breakdown cover, mobile, and other insurances, and so on, has become a lucrative avenue for banks. One retail bank has a plain vanilla classic bank account, mid-tier premier, and a top-of-the-range, benefits included a platinum account. The classic account is offered free and the premier and platinum have fees of $10 and $20 per month respectively. The marketing team has just relaunched the fee-based accounts with added benefits. The finance team wanted a projection of how much revenue could be generated via the premier and the platinum accounts. Solving with Markovian model approach Even though we have three types of account, the classic, premier, and the platinum, it doesn't mean that we are only going to have nine transition types possible as in Figure 4.1. There are customers who will upgrade, but also others who may downgrade. There could also be some customers who leave the bank and at the same time there will be a constant inflow of new customers. Let's evaluate the transition states flow for our business problem: In Figure 4.2, we haven't jotted down the transition probability between each state. We can try to do this by looking at the historical customer movements, to arrive at the transitional probability. Be aware that most business managers would prefer to use their instincts while assigning transitional probabilities. They may achieve some merit in this approach, as the managers may be able to incorporate the various factors that may have influenced the customer movements between states. A promotion offering 40% off the platinum account (effective rate $12/month, down from $20/month) may have ensured that more customers in the promotion period opted for the platinum account than the premier offering ($10/month). Let's examine the historical data of customer account preferences. The data is compiled for the years 2008 – 2018. This doesn't account for any new customers joining after January 1, 2008 and also ignores information on churned customers in the period of interest. Figure 4.3 consists of customers who have been with the bank since 2008: Active customer counts (Millions) Year Classic (Cl) Premium (Pr) Platinum (Pl) Total customers 2008 H1 30.68 5.73 1.51 37.92 2008 H2 30.65 5.74 1.53 37.92 2009 H1 30.83 5.43 1.66 37.92 2009 H2 30.9 5.3 1.72 37.92 2010 H1 31.1 4.7 2.12 37.92 2010 H2 31.05 4.73 2.14 37.92 2011 H1 31.01 4.81 2.1 37.92 2011 H2 30.7 5.01 2.21 37.92 2012 H1 30.3 5.3 2.32 37.92 2012 H2 29.3 6.4 2.22 37.92 2013 H1 29.3 6.5 2.12 37.92 2013 H2 28.8 7.3 1.82 37.92 2014 H1 28.8 8.1 1.02 37.92 2014 H2 28.7 8.3 0.92 37.92 2015 H1 28.6 8.34 0.98 37.92 2015 H2 28.4 8.37 1.15 37.92 2016 H1 27.6 9.01 1.31 37.92 2016 H2 26.5 9.5 1.92 37.92 2017 H1 26 9.8 2.12 37.92 2017 H2 25.3 10.3 2.32 37.92 Figure 4.3: Active customers since 2008 Since we are only considering active customers, and no new customers are joining or leaving the bank, we can calculate the number of customers moving from one state to another using the data in Figure 4.3: Customer movement count to next year (Millions) Year Cl-Cl Cl-Pr Cl-Pl Pr-Pr Pr-Cl Pr-Pl Pl-Pl Pl-Cl Pl-Pr Total customers 2008 H1 - - - - - - - - - - 2008 H2 30.28 0.2 0.2 5.5 0 0.23 1.1 0.37 0.04 37.92 2009 H1 30.3 0.1 0.25 5.1 0.53 0.11 1.3 0 0.23 37.92 2009 H2 30.5 0.32 0.01 4.8 0.2 0.43 1.28 0.2 0.18 37.92 2010 H1 30.7 0.2 0 4.3 0 1 1.12 0.4 0.2 37.92 2010 H2 30.7 0.2 0.2 4.11 0.35 0.24 1.7 0 0.42 37.92 2011 H1 30.9 0 0.15 4.6 0 0.13 1.82 0.11 0.21 37.92 2011 H2 30.2 0.8 0.01 3.8 0.1 0.91 1.29 0.4 0.41 37.92 2012 H1 30.29 0.4 0.01 4.9 0.01 0.1 2.21 0 0 37.92 2012 H2 29.3 0.9 0.1 5.3 0 0 2.12 0 0.2 37.92 2013 H1 29.2 0.1 0 6.1 0.1 0.2 1.92 0 0.3 37.92 2013 H2 28.6 0.3 0.4 6.5 0 0 1.42 0.2 0.5 37.92 2014 H1 28.7 0.1 0 7.2 0.1 0 1.02 0 0.8 37.92 2014 H2 28.7 0 0.1 8.1 0 0 0.82 0 0.2 37.92 2015 H1 28.6 0 0.1 8.3 0 0 0.88 0 0.04 37.92 2015 H2 28.3 0 0.3 8 0.1 0.24 0.61 0 0.37 37.92 2016 H1 27.6 0.8 0 8.21 0 0.16 1.15 0 0 37.92 2016 H2 26 1 0.6 8.21 0.5 0.3 1.02 0 0.29 37.92 2017 H1 25 0.5 1 8 0.5 1 0.12 0.5 1.3 37.92 2017 H2 25.3 0.1 0.6 9 0 0.8 0.92 0 1.2 37.92 Figure 4.4: Customer transition state counts In Figure 4.4, we can see the customer movements between various states. We don't have the movements for the first half of 2008 as this is the start of the series. In the second half of 2008, we see that 30.28 out of 30.68 million customers (30.68 is the figure from the first half of 2008) were still using a classic account. However, 0.4 million customers moved away to premium and platinum accounts. The total customers remain constant at 37.92 million as we have ignored new customers joining and any customers who have left the bank. From this table, we can calculate the transition probabilities for each state: Year Cl-Cl Cl-Pr Cl-Pl Pr-Pr Pr-Cl Pr-Pl Pl-Pl Pl-Cl Pl-Pr 2008 H2 98.7% 0.7% 0.7% 96.0% 0.0% 4.0% 72.8% 24.5% 2.6% 2009 H1 98.9% 0.3% 0.8% 88.9% 9.2% 1.9% 85.0% 0.0% 15.0% 2009 H2 98.9% 1.0% 0.0% 88.4% 3.7% 7.9% 77.1% 12.0% 10.8% 2010 H1 99.4% 0.6% 0.0% 81.1% 0.0% 18.9% 65.1% 23.3% 11.6% 2010 H2 98.7% 0.6% 0.6% 87.4% 7.4% 5.1% 80.2% 0.0% 19.8% 2011 H1 99.5% 0.0% 0.5% 97.3% 0.0% 2.7% 85.0% 5.1% 9.8% 2011 H2 97.4% 2.6% 0.0% 79.0% 2.1% 18.9% 61.4% 19.0% 19.5% 2012 H1 98.7% 1.3% 0.0% 97.8% 0.2% 2.0% 100.0% 0.0% 0.0% 2012 H2 96.7% 3.0% 0.3% 100.0% 0.0% 0.0% 91.4% 0.0% 8.6% 2013 H1 99.7% 0.3% 0.0% 95.3% 1.6% 3.1% 86.5% 0.0% 13.5% 2013 H2 97.6% 1.0% 1.4% 100.0% 0.0% 0.0% 67.0% 9.4% 23.6% 2014 H1 99.7% 0.3% 0.0% 98.6% 1.4% 0.0% 56.0% 0.0% 44.0% 2014 H2 99.7% 0.0% 0.3% 100.0% 0.0% 0.0% 80.4% 0.0% 19.6% 2015 H1 99.7% 0.0% 0.3% 100.0% 0.0% 0.0% 95.7% 0.0% 4.3% 2015 H2 99.0% 0.0% 1.0% 95.9% 1.2% 2.9% 62.2% 0.0% 37.8% 2016 H1 97.2% 2.8% 0.0% 98.1% 0.0% 1.9% 100.0% 0.0% 0.0% 2016 H2 94.2% 3.6% 2.2% 91.1% 5.5% 3.3% 77.9% 0.0% 22.1% 2017 H1 94.3% 1.9% 3.8% 84.2% 5.3% 10.5% 6.2% 26.0% 67.7% 2017 H2 97.3% 0.4% 2.3% 91.8% 0.0% 8.2% 43.4% 0.0% 56.6% Figure 4.5: Transition state probability In Figure 4.5, we have converted the transition counts into probabilities. If 30.28 million customers in 2008 H2 out of 30.68 million customers in 2008 H1 are retained as classic customers, we can say that the retention rate is 98.7%, or the probability of customers staying with the same account type in this instance is .987. Using these details, we can compute the average transition between states across the time series. These averages can be used as the transition probabilities that will be used in the transition matrix for the model: Cl Pr Pl Cl 98.2% 1.1% 0.8% Pr 2.0% 93.2% 4.8% Pl 6.3% 20.4% 73.3% Figure 4.6: Transition probabilities aggregated The probability of classic customers retaining the same account type between semiannual time periods is 98.2%. The lowest retain probability is for platinum customers as they are expected to transition to another customer account type 26.7% of the time. Let's use the transition matrix in Figure 4.6 to run our Markov model. Use this code for Data setup: DATA Current; input date CL PR PL; datalines; 2017.2 25.3 10.3 2.32 ; Run; Data Netflow; input date CL PR PL; datalines; 2018.1 0.21 0.1 0.05 2018.2 0.22 0.16 0.06 2019.1 .24 0.18 0.08 2019.2 0.28 0.21 0.1 2020.1 0.31 0.23 0.14 ; Run; Data TransitionMatrix; input CL PR PL; datalines; 0.98 0.01 0.01 0.02 0.93 0.05 0.06 0.21 0.73 ; Run; In the current data set, we have chosen the last available data point, 2017 H2. This is the base position of customer counts across classic, premium, and platinum accounts. While calculating the transition matrix, we haven't taken into account new joiners or leavers. However, to enable forecasting we have taken 2017 H2 as our base position. The transition matrix seen in Figure 4.6 has been input as a separate dataset. Markov model code PROC IML; use Current; read all into Current; use Netflow; read all into Netflow; use TransitionMatrix; read all into TransitionMatrix; Current = Current [1,2:4]; Netflow = Netflow [,2:4]; Model_2018_1 = Current * TransitionMatrix + Netflow [1,]; Model_2018_2 = Model_2018_1 * TransitionMatrix + Netflow [1,]; Model_2019_1 = Model_2018_2 * TransitionMatrix + Netflow [1,]; Model_2019_2 = Model_2019_1 * TransitionMatrix + Netflow [1,]; Model_2020_1 = Model_2019_2 * TransitionMatrix + Netflow [1,]; Budgetinputs = Model_2018_1//Model_2018_2//Model_2019_1//Model_2019_2//Model_2020_1; Create Budgetinputs from Budgetinputs; append from Budgetinputs; Quit; Data Output; Set Budgetinputs (rename=(Col1=Cl Col2=Pr Col3=Pl)); Run; Proc print data=output; Run; Figure 4.7: Model output The Markov model has been run and we are able to generate forecasts for all account types for the requested five periods. We can immediately see that there is an increase forecasted for all the account types. This is being driven by the net flow of customers. We have derived the forecasts by essentially using the following equation: Forecast = Current Period * Transition Matrix + Net Flow Once the 2018 H1 forecast is derived, we replace the Current Period with the 2018 H1 forecasted number while trying to forecast the 2018 H2 numbers. We are doing this as, based on the 2018 H1 customer counts, the transition probabilities will determine how many customers move across states. This will help generate the forecasted customer count for the required period. Understanding transition probability Now, since we have our forecasts let's take a step back and revisit our business goals. The finance team wants to estimate the revenues from the revamped premium and platinum customer accounts for the next few forecasting periods. As we have seen, one of the important drivers of the forecasting process is the transition probability. This transition probability is driven by historical customer movements, as shown in Figure 4.4. What if the marketing team doesn't agree with the transitional probabilities calculated in Figure 4.6? As we discussed, 26.7% of platinum customers aren't retained in this account type. Since we are not considering customer churn out of the bank, this means that a large proportion of platinum customers downgrade their accounts. One of the reasons the marketing teams revamped the accounts is due to this reason. The marketing team feels that it will be able to raise the retention rates for platinum customers and want the finance team to run an alternate forecasting scenario. This is, in fact, one of the pros of the Markov model approach as by tweaking the transition probabilities we can run various business scenarios. Let's compare the base and the alternate scenario forecasts generated in Figure 4.8: A change in the transition probabilities of how platinum customers moved to various states has brought about a significant change in the forecast for premium and platinum customer accounts. For classic customers, the change in the forecast between the base and the alternate scenario is negligible, as shown in the table in Figure 4.8. The finance team can decide which scenario is best suited for budget forecasting: Cl Pr Pl Cl 98.2% 1.1% 0.8% Pr 2.0% 93.2% 4.8% Pl 5.0% 15.0% 80.0% Figure 4.8: Model forecasts and updated transition probabilities To summarize, we learned the Markov model methodology and learned Markov models for forecasting and imputation. To know more about how to use the other two methodologies such as ARIMA and MCMC for generating forecasts for various business problems, you can check out the book SAS for Finance. Read more How to perform regression analysis using SAS Performing descriptive analysis with SAS Akon is planning to create a cryptocurrency city in Senegal
Read more
  • 0
  • 1
  • 5593

article-image-predictive-analytics-with-amazon-ml
Natasha Mathur
09 Aug 2018
9 min read
Save for later

Predictive Analytics with AWS: A quick look at Amazon ML

Natasha Mathur
09 Aug 2018
9 min read
As artificial intelligence and big data have become a ubiquitous part of our everyday lives, cloud-based machine learning services are part of a rising billion-dollar industry. Among the several services currently available in the market, Amazon Machine Learning stands out for its simplicity. In this article, we will look at Amazon Machine Learning, MLaaS, and other related concepts. This article is an excerpt taken from the book 'Effective Amazon Machine Learning' written by Alexis Perrier. Machine Learning as a Service Amazon Machine Learning is an online service by Amazon Web Services (AWS) that does supervised learning for predictive analytics. Launched in April 2015 at the AWS Summit, Amazon ML joins a growing list of cloud-based machine learning services, such as Microsoft Azure, Google prediction, IBM Watson, Prediction IO, BigML, and many others. These online machine learning services form an offer commonly referred to as Machine Learning as a Service or MLaaS following a similar denomination pattern of other cloud-based services such as SaaS, PaaS, and IaaS respectively for Software, Platform, or Infrastructure as a Service. Studies show that MLaaS is a potentially big business trend. ABI Research, a business intelligence consultancy, estimates machine learning-based data analytics tools and services revenues to hit nearly $20 billion in 2021 as MLaaS services take off as outlined in this business report  Eugenio Pasqua, a Research Analyst at ABI Research, said the following: "The emergence of the Machine-Learning-as-a-Service (MLaaS) model is good news for the market, as it cuts down the complexity and time required to implement machine learning and thus opens the doors to an increase in its adoption level, especially in the small-to-medium business sector." The increased accessibility is a direct result of using an API-based infrastructure to build machine-learning models instead of developing applications from scratch. Offering efficient predictive analytics models without the need to code, host, and maintain complex code bases lowers the bar and makes ML available to smaller businesses and institutions. Amazon ML takes this democratization approach further than the other actors in the field by significantly simplifying the predictive analytics process and its implementation. This simplification revolves around four design decisions that are embedded in the platform: A limited set of tasks: binary classification, multi-classification, and regression A single linear algorithm A limited choice of metrics to assess the quality of the prediction A simple set of tuning parameters for the underlying predictive algorithm That somewhat constrained environment is simple enough while addressing most predictive analytics problems relevant to business. It can be leveraged across an array of different industries and use cases. Let's see how! Leveraging full AWS integration The AWS data ecosystem of pipelines, storage, environments, and Artificial Intelligence (AI) is also a strong argument in favor of choosing Amazon ML as a business platform for its predictive analytics needs. Although Amazon ML is simple, the service evolves to greater complexity and more powerful features once it is integrated into a larger structure of AWS data related services. AWS is already a major factor in cloud computing. Here's what an excerpt from The Economist, August  2016 has to say about AWS (http://www.economist.com/news/business/21705849-how-open-source-software-and-cloud-computing-have-set-up-it-industry): AWS shows no sign of slowing its progress towards full dominance of cloud computing's wide skies. It has ten times as much computing capacity as the next 14 cloud providers combined, according to Gartner, a consulting firm. AWS's sales in the past quarter were about three times the size of its closest competitor, Microsoft's Azure. This gives an edge to Amazon ML, as many companies that are using cloud services are likely to be already using AWS. Adding simple and efficient machine learning tools to the product offering mix anticipates the rise of predictive analytics features as a standard component of web services. Seamless integration with other AWS services is a strong argument in favor of using Amazon ML despite its apparent simplicity. The following architecture is a case study taken from an AWS January 2016 white paper titled Big Data Analytics Options on AWS (http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf), showing a potential AWS architecture for sentiment analysis on social media. It shows how Amazon ML can be part of a more complex architecture of AWS services: Comparing performances in Amazon ML services Keeping systems and applications simple is always difficult, but often worth it for the business. Examples abound with overloaded UIs bringing down the user experience, while products with simple, elegant interfaces and minimal features enjoy widespread popularity. The Keep It Simple mantra is even more difficult to adhere to in a context such as predictive analytics where performance is key. This is the challenge Amazon took on with its Amazon ML service. A typical predictive analytics project is a sequence of complex operations: getting the data, cleaning the data, selecting, optimizing and validating a model and finally making predictions. In the scripting approach, data scientists develop codebases using machine learning libraries such as the Python scikit-learn library or R packages to handle all these steps from data gathering to predictions in production. As a developer breaks down the necessary steps into modules for maintainability and testability, Amazon ML breaks down a predictive analytics project into different entities: datasource, model, evaluation, and predictions. It's the simplicity of each of these steps that makes AWS so powerful to implement successful predictive analytics projects. Engineering data versus model variety Having a large choice of algorithms for your predictions is always a good thing, but at the end of the day, domain knowledge and the ability to extract meaningful features from clean data is often what wins the game. Kaggle is a well-known platform for predictive analytics competitions, where the best data scientists across the world compete to make predictions on complex datasets. In these predictive competitions, gaining a few decimals on your prediction score is what makes the difference between earning the prize or being just an extra line on the public leaderboard among thousands of other competitors. One thing Kagglers quickly learn is that choosing and tuning the model is only half the battle. Feature extraction or how to extract relevant predictors from the dataset is often the key to winning the competition. In real life, when working on business-related problems, the quality of the data processing phase and the ability to extract meaningful signal out of raw data is the most important and time-consuming part of building an effective predictive model. It is well known that "data preparation accounts for about 80% of the work of data scientists" (http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/). Model selection and algorithm optimization remains an important part of the work but is often not the deciding factor when the implementation is concerned. A solid and robust implementation that is easy to maintain and connects to your ecosystem seamlessly is often preferred to an overly complex model developed and coded in-house, especially when the scripted model only produces small gains when compared to a service-based implementation. Amazon's expertise and the gradient descent algorithm Amazon has been using machine learning for the retail side of its business and has built a serious expertise in predictive analytics. This expertise translates into the choice of algorithm powering the Amazon ML service. The Stochastic Gradient Descent (SGD) algorithm is the algorithm powering Amazon ML linear models and is ultimately responsible for the accuracy of the predictions generated by the service. The SGD algorithm is one of the most robust, resilient, and optimized algorithms. It has been used in many diverse environments, from signal processing to deep learning and for a wide variety of problems, since the 1960s with great success. The SGD has also given rise to many highly efficient variants adapted to a wide variety of data contexts. We will come back to this important algorithm in a later chapter; suffice it to say at this point that the SGD algorithm is the Swiss army knife of all possible predictive analytics algorithm. Several benchmarks and tests of the Amazon ML service can be found across the web (Amazon, Google, and Azure: https://blog.onliquid.com/machine-learning-services-2/ and Amazon versus scikit-learn: http://lenguyenthedat.com/minimal-data-science-2-avazu/). Overall results show that the Amazon ML performance is on a par with other MLaaS platforms, but also with scripted solutions based on popular machine learning libraries such as scikit-learn. For a given problem in a specific context and with an available dataset and a particular choice of a scoring metric, it is probably possible to code a predictive model using an adequate library and obtain better performances than the ones obtained with Amazon ML. But what Amazon ML offers is stability, an absence of coding, and a very solid benchmark record, as well as a seamless integration with the Amazon Web Services ecosystem that already powers a large portion of the Internet. Amazon ML service pricing strategy As with other MLaaS providers and AWS services, Amazon ML only charges for what you consume. The cost is broken down into the following: An hourly rate for the computing time used to build predictive models A prediction fee per thousand prediction samples And in the context of real-time (streaming) predictions, a fee based on the memory allocated upfront for the model The computational time increases as a function of the following: The complexity of the model The size of the input data The number of attributes The number and types of transformations applied At the time of writing, these charges are as follows: $0.42 per hour for data analysis and model building fees $0.10 per 1,000 predictions for batch predictions $0.0001 per prediction for real-time predictions $0.001 per hour for each 10 MB of memory provisioned for your model These prices do not include fees related to the data storage (S3, Redshift, or RDS), which are charged separately. During the creation of your model, Amazon ML gives you a cost estimation based on the data source that has been selected. The Amazon ML service is not part of the AWS free tier, a 12-month offer applicable to certain AWS services for free under certain conditions. To summarize, we presented a simple introduction to the Amazon ML service. Amazon ML is built on a solid ground, with a simple yet very efficient algorithm driving its predictions. If you found this post useful, be sure to check out the book  'Effective Amazon Machine Learning' to learn about predictive analytics and other concepts in AWS machine learning. Integrate applications with AWS services: Amazon DynamoDB & Amazon Kinesis [Tutorial] AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers AWS Elastic Load Balancing: support added for Redirects and Fixed Responses in Application Load Balancer
Read more
  • 0
  • 0
  • 8993
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-write-python-code-or-pythonic-code
Aaron Lazar
08 Aug 2018
5 min read
Save for later

Do you write Python Code or Pythonic Code?

Aaron Lazar
08 Aug 2018
5 min read
If you’re new to Programming, and Python in particular, you might have heard the term Pythonic being brought up at tech conferences, meetups and even at your own office. You might have also wondered why the term and whether they’re just talking about writing Python code. Here we’re going to understand what the term Pythonic means and why you should be interested in learning how to not just write Python code, rather write Pythonic code. What does Pythonic mean? When people talk about pythonic code, they mean that the code uses Python idioms well, that it’s natural or displays fluency in the language. In other words, it means the most widely adopted idioms that are adopted by the Python community. If someone said you are writing un-pythonic code, they might actually mean that you are attempting to write Java/C++ code in Python, disregarding the Python idioms and performing a rough transcription rather than an idiomatic translation from the other language. Okay, now that you have a theoretical idea of what Pythonic (and unpythonic) means, let’s have a look at some Pythonic code in practice. Writing Pythonic Code Before we get into some examples, you might be wondering if there’s a defined way/method of writing Pythonic code. Well, there is, and it’s called PEP 8. It’s the official style guide for Python. Example #1 x=[1, 2, 3, 4, 5, 6] result = [] for idx in range(len(x)); result.append(x[idx] * 2) result Output: [2, 4, 6, 8, 10, 12] Consider the above code, where you’re trying to multiply some elements, “x” by 2. So, what we did here was, we created an empty list to store the results. We would then append the solution of the computation into the result. The result now contains a function which is 2 multiplied by each of the elements. Now, if you were to write the same code in a Pythonic way, you might want to simply use list comprehensions. Here’s how: x=[1, 2, 3, 4, 5, 6] [(element * 2) for element in x] Output: [2, 4, 6, 8, 10] You might have noticed, we skipped the entire for loop! Example #2 Let’s make the previous example a bit more complex, and place a condition that the elements should be multiplied by 2 only if they are even. x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] result = [] for idx in range(len(x)); if x[idx] % 2 == 0; result.append(x[idx] * 2) else; result.append(x[idx]) result Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] We’ve actually created an if else statement to solve this problem, but there is a simpler way of doing things the Pythonic way. [(element * 2 if element % 2 == 0 else element) for element in x] Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] If you notice what we’ve done here, apart from skipping multiple lines of code, is that we used the if-else statement in the same sentence. Now, if you wanted to perform filtering, you could do this: x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [element * 2 for element in x if element % 2 == 0] Output: [4, 8, 12, 16, 20] What we’ve done here is put the if statement after the for declaration, and Voila! We’ve achieved filtering. If you’re using a nice IDE like Jupyter Notebooks or PyCharm, they will help you format your code as per the PEP 8 suggestions. Why should you write Pythonic code? Well firstly, you’re saving loads of time writing humongous piles of cowdung code, so you’re obviously becoming a smarter and more productive programmer. Python is a pretty slow language, and when you’re trying to do something in Python, which is acquired from another language like Java or C++, you’re going to worsen things. With idiomatic, Pythonic code, you’re improving the speed of your programs. Moreover, idiomatic code is far easier to comprehend and understand for other developers who are working on the same code. It helps a great deal when you’re trying to refactor someone else’s code. Fearing Pythonic idioms Well, I don’t mean the idioms themselves are scary. Rather, quite a few developers and organisations have begun discriminating on the basis of whether someone can or cannot write Pythonic code. This is wrong, because, at the end of the day, though the PEP 8 exists, the idea of the term Pythonic is different for different people. To some it might mean picking up a new style guide and improving the way you code. To others, it might mean being succinct and not repeating themselves. It’s time we stopped judging people on whether they can or can’t write Pythonic code and instead, we should appreciate when someone is able to present readable, easily maintainable and succinct code. If you find them writing a bit of clumsy code, you can choose to talk to them about improving their design considerations. And the world will be a better place! If you’re interested in learning how to write more succinct and concise Python code, check out these resources: Learning Python Design Patterns - Second Edition Python Design Patterns [Video] Python Tips, Tricks and Techniques [Video]    
Read more
  • 0
  • 2
  • 23999

article-image-amazon-patents-2018-machine-learning-ar-robotics
Natasha Mathur
06 Aug 2018
7 min read
Save for later

Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics

Natasha Mathur
06 Aug 2018
7 min read
"There are two kinds of companies, those that work to try to charge more and those that work to charge less. We will be the second."-- Jeff Bezos, CEO Amazon When Jeff Bezos launched Amazon.com in 1994, it was an online bookselling site. This was during a time when bookstores such as Barnes & Noble, Waldenbooks and Crown Books were the leading front runners in the bookstore industry in the American shopping malls. Today, Amazon’s name has become almost synonymous with online retail for most people and has now spread its wings to cloud computing, electronics, tech gadgets and the entertainment world. With market capitalization worth $897.47B as of August 3rd 2018, it’s hard to believe that there was a time when Amazon sold only books. Amazon is constantly pushing to innovate and as new inventions come to shape, there are “patents” made that helps the company have a competitive advantage over technologies and products in order to attract more customers. [box type="shadow" align="" class="" width=""]According to United States Patent and Trademark Office (USPTO), Patent is an exclusive right to invention and “the right to exclude others from making, using, offering for sale, or selling the invention in the United States or “importing” the invention into the United States”.[/box] As of March 20, 2018, Amazon owned 7,717 US patents filed under two business entities, Amazon Technologies, Inc. (7,679), and Amazon.com, Inc (38). Looking at the chart below, you can tell that Amazon Technologies, Inc., was one among the top 15 companies in terms of number of patents granted in 2017. Top 15 companies, by number of patents granted by USPTO, 2017 Amazon competes closely with the world’s leading tech giants in terms of patenting technologies. The below table only considers US patents. Here, Amazon holds only few US patents than IBM, Microsoft, Google, and Apple.  Number of US Patents Containing Emerging-Technology Keywords in Patent Description Some successfully patented Amazon innovations in 2018 There are thousands of inventions that Amazon is tied up with and for which they have filed for patents. These include employee surveillance AR goggles, a real-time accent translator, robotic arms tossing warehouse items,  one-click buying, drones,etc. Let’s have a look at these remarkable innovations by Amazon. AR goggles for improving human-driven fulfillment (or is it to track employees?) Date of Patent: August 2, 2018 Filed: March 20, 2017 Assignee: Amazon Technologies, Inc.   AR Goggles                                                          Features: Amazon has recently patented a pair of augmented reality goggles that could be used to keep track of its employees.The patent is titled “Augmented Reality User interface facilitating fulfillment.” As per the patent application, the application is a wearable computing device such as augmented reality glasses that are worn on user’s head. The user interface is rendered upon one or more lenses of the augmented reality glasses and it helps to show the workers where to place objects in Amazon's fulfillment centers. There’s also a feature in the AR glasses which provides workers with turn-by-turn directions to the destination within the fulfillment centre. This helps them easily locate the destination as all the related information gets rendered on the lenses.    AR Goggles  steps The patent has received criticism over concerns that this application might hamper the privacy of employees within the warehouses, tracking employees’ every single move. However, Amazon has defended the application by saying that it has got nothing to do with “employee surveillance”. As this is a patent, there’s no guarantee if it will actually hit the market. Robotic arms that toss warehouse items Date of Patent: July 17, 2018 Filed: September 29, 2015 Assignee: Amazon Technologies, Inc. Features: Amazon won a patent titled “Robotic tossing of items in inventory system” last month. As per the patent application, “Robotic arms or manipulators can be used to toss inventory items within an inventory system. Tossing strategies for the robotic arms may include information about how a grasped item is to be moved and released by a robotic arm to achieve a trajectory for moving the item to a receiving location”.  Robotic Arms Utilizing a robotic arm to toss an item to a receiving location can help improve throughput through the inventory system. This is possible as the robotic arms will help with reducing the amount of time that may otherwise be spent on placing a grasped item directly onto a surface for receiving the item. “The tossing strategy may be based at least in part upon a database containing information about the item, characteristics of the item, and/or similar items, such as information indicating tossing strategies that have been successful or unsuccessful for such items in the past,” the patent reads.  Robotic Arms Steps Amazon’s aim with this is to eliminate the challenges faced by modern inventory systems like supply chain distribution centers, airport luggage systems, etc, while responding to requests for inventory items. The patent received criticism over the concern that one of the examples in the application was a dwarf figurine and could possibly mock people of short stature. But, according to Amazon, “The intention was simply to illustrate a robotic arm moving products, and it should not be taken out of context.” Real-time accent translator Date of Patent: June 21, 2018 Filed: December 21, 2016 Assignee: Amazon Technologies, Inc. Features: Amazon won a patent for an audio system application, titled “Accent translation” back in June this year, which will help with translating the accent of the speaker to the listener’s accent. The aim with this app is to get rid of the possible communication barriers which may arise due to different accents as they can be difficult to understand at times. Accent translation system The accent translation system collects a number of audio samples from different sources such as phone call, television, movies, broadcasts, etc. Each audio sample will have its association with at least one of the accent sample sets present in its database.  For instance, german accent will be associated with the german accent sample set.   Accent translation system steps In a two-party dialog, acquired audio is analyzed and if it associates with one among a wide range of saved accents then the audio from both the sides is outputted based on the accent of the opposite party. The possibilities with this application are endless. One major use case is the customer care industry where people have to constantly talk to different people with different accents. Drone that uses Human gestures and voice commands Date of Patent: March 20, 2018 Filed: July 18, 2016 Assignee: Amazon Technologies, Inc. Features: Amazon patented for a drone, titled “Human interaction with unmanned aerial vehicles”, earlier this year, that would use human gestures and voice commands for package delivery. Amazon Drone makes use of propulsion technology which will help with managing the speed, trajectory, and direction of the drone.   Drones As per the patent application, “an unmanned aerial vehicle is provided which includes propulsion device, sensor device and a management system. The management system is configured to receive human gestures via the sensor device and in response, instruct the propulsion device to affect and adjustment to the behavior of the unnamed aerial vehicle. Human gestures include-- visible gestures, audible gestures, and other gestures capable of recognition by the unmanned vehicle”. Working structure of drones The concept for drones started when Amazon CEO, Jeff Bezos, promised, back in 2013, that the company aims to make 30-minute deliveries, of packages up to 2.25 kgs or 5 pounds. Amazon’s patents are a clear indication of its efforts and determination for inventing cutting-edge technologies for optimizing its operations so that it can pass on the benefits to its customers in the form of competitively priced product offerings. As Amazon has been putting its focus on machine learning, the drones and robotic arms will make the day-to-day tasks of the facility workers easier and more efficient. In fact, Amazon has stepped up its game big time and is incorporating Augmented reality, with its AR glasses to further scale efficiencies. The real-time accent translators help eliminate the communication barriers, making Amazon cover a wide range of areas and perhaps provide a seamless customer care experience in the coming days. Amazon Echo vs Google Home: Next-gen IoT war Amazon is selling facial recognition technology to police  
Read more
  • 0
  • 3
  • 9516

article-image-product-development-need-developers-and-product-managers-collaborate
Packt Editorial Staff
04 Aug 2018
16 min read
Save for later

Effective Product Development needs developers and product managers collaborating on success metrics

Packt Editorial Staff
04 Aug 2018
16 min read
Modern product development is witnessing a drastic shift. Disruptive ideas and ambiguous business conditions have changed the way products are developed. Product development is no longer guided by existing processes or predefined frameworks. Delivering on time is a baseline metric, as is software quality. Today, businesses are competing to innovate. They are willing to invest in groundbreaking products with cutting-edge technology. Cost is no longer the constraint—execution is. Can product managers then continue to rely upon processes and practices aimed at traditional ways of product building? How do we ensure that software product builders look at the bigger picture and do not tie themselves to engineering practices and technology viability alone? Understanding the business and customer context is essential for creating valuable products. In this article, we are going to identify what success means to us in terms of product development. This article is an excerpt from the book Lean Product Management written by Mangalam Nandakumar. For the kind of impact that we predict our feature idea to have on the Key Business Outcomes, how do we ensure that every aspect of our business is aligned to enable that success? We may also need to make technical trade-offs to ensure that all effort on building the product is geared toward creating a satisfying end-to-end product experience. When individual business functions take trade-off decisions in silo, we could end up creating a broken product experience or improvising the product experience where no improvement is required. For a business to be able to align on trade-offs that may need to be made on technology, it is important to communicate what is possible within business constraints and also what is not achievable. It is not necessary for the business to know or understand the specific best practices, coding practices, design patterns, and so on, that product engineering may apply. However, the business needs to know the value or the lack of value realization, of any investment that is made in terms of costs, effort, resources, and so on. The section addresses the following topics: The need to have a shared view of what success means for a feature idea Defining the right kind of success criteria Creating a shared understanding of technical success criteria "If you want to go quickly, go alone. If you want to go far, go together. We have to go far — quickly." Al Gore Planning for success doesn't come naturally to many of us. Come to think of it, our heroes are always the people who averted failure or pulled us out of a crisis. We perceive success as 'not failing,' but when we set clear goals, failures don't seem that important. We can learn a thing or two about planning for success by observing how babies learn to walk. The trigger for walking starts with babies getting attracted to, say, some object or person that catches their fancy. They decide to act on the trigger, focusing their full attention on the goal of reaching what caught their fancy. They stumble, fall, and hurt themselves, but they will keep going after the goal. Their goal is not about walking. Walking is a means to reaching the shiny object or the person calling to them. So, they don't really see walking without falling      as a measure of success. Of course, the really smart babies know to wail their way to getting the said shiny thing without lifting a toe. Somewhere along the way, software development seems to have forgotten about shiny objects, and instead focused on how to walk without falling. In a way, this has led to an obsession with following processes without applying them to the context and writing perfect code, while disdaining and undervaluing supporting business practices. Although technology is a great enabler, it is not the end in itself. When applied in the context of running a business or creating social impact, technology cannot afford to operate as an isolated function. This is not to say that technologists don't care about impact. Of course, we do. Technologists show a real passion for solving customer problems. They want their code to change lives, create impact, and add value. However, many technologists underestimate the importance of supporting business functions in delivering value. I have come across many developers who don't appreciate the value of marketing, sales, or support. In many cases, like the developer who spent a year perfecting his code without acquiring a single customer, they believe that beautiful code that solves the right problem is enough to make a business succeed. Nothing can be further from the truth Most of this type of thinking is the result of treating technology as an isolated function. There is a significant gap that exists between nontechnical folks and software engineers. On the one hand, nontechnical folks don't understand the possibilities, costs, and limitations of software technology. On the other hand, technologists don't value the need for supporting functions and communicate very little about the possibilities and limitations of technology. This expectation mismatch often leads to unrealistic goals and a widening gap between technology teams and the supporting functions. The result of this widening gap is often cracks opening in the end-to-end product experience for the customer, thereby resulting in a loss of business. Bridging this gap of expectation mismatch requires that technical teams and business functions communicate in the same language, but first they must communicate. Setting SMART goals for team In order to set the right expectations for outcomes, we need the collective wisdom of the entire team. We need to define and agree upon what success means for each feature and to each business function. This will enable teams to set up the entire product experience for success. Setting specific, measurable, achievable, realistic, and time-bound (SMART) metrics can resolve this. We cannot decouple our success criteria from the impact scores we arrived at earlier. So, let's refer to the following table for the ArtGalore digital art gallery: The estimated impact rating was an indication of how much impact  the business expected a feature idea to have on the Key Business Outcomes. If you recall, we rated this on a scale of 0 to 10. When the estimated impact of a Key Business Outcomes is less than five, then the success criteria for that feature is likely to be less ambitious. For example, the estimated impact of "existing buyers can enter a lucky draw to meet an artist of the month" toward generating revenue is zero. What this means is that we don't expect this feature idea to bring in any revenue for us or put in another way, revenue is not the measure of success for this feature idea. If any success criteria for generating revenue does come up for this feature idea, then there is a clear mismatch in terms of how we have prioritized the feature itself. For any feature idea with an estimated impact of five or above, we need to get very specific about how to define and measure success. For instance, the feature idea "existing buyers can enter a lucky draw to meet an artist of the month" has an estimated impact rating of six towards engagement. This means that we expect an increase in engagement as a measure of success for this feature idea. Then, we need to define what "increase in engagement" means. My idea of "increase in engagement" can be very different from your idea of "increase in engagement." This is where being S.M.A.R.T. about our definition of success can be useful. Success metrics are akin to user story acceptance criteria. Acceptance criteria define what conditions must be fulfilled by the software in order for us to sign off on the success of the user story. Acceptance criteria usually revolve around use cases and acceptable functional flows. Similarly, success criteria for feature ideas must define what indicators can tell us that the feature is delivering the expected impact on the KBO. Acceptance criteria also sometimes deal with NFRs (nonfunctional requirements). NFRs include performance, security, and reliability. In many instances, nonfunctional requirements are treated as independent user stories. I also have seen many teams struggle with expressing the need for nonfunctional requirements from a customer's perspective. In the early days of writing user stories, the tendency for myself and most of my colleagues was to write NFRs from a system/application point of view. We would say, "this report must load in 20 seconds," or "in the event of a network failure, partial data must not be saved."  These functional specifications didn't tell us how/why they were important for an end user. Writing user stories forces us to think about the user's perspective. For example, in my team we used to have interesting conversations about why a report needed to load within 20 seconds. This compelled us to think about how the user interacted with our software. It is not uncommon for visionary founders to throw out very ambitious goals for success. Having ambitious goals can have a positive impact in motivating teams to outperform. However, throwing lofty targets around, without having a plan for success, can be counter-productive. For instance, it's rather ambitious to say, "Our newsletter must be the first to publish artworks by all the popular artists in the country," or that "Our newsletter must become the benchmark for art curation." These are really inspiring words, but can mean nothing if we don't have a plan to get there. The general rule of thumb for this part of product experience planning is that when we aim for an ambitious goal, we also sign up to making it happen. Defining success must be a collaborative exercise carried out by all stakeholders. This is the playing field for deciding where we can stretch our goals, and for everyone to agree on what we're signing up to, in order to set the product experience up for success. Defining key success metrics For every feature idea we came up with, we can create feature cards that look like the following sample. This card indicates three aspects about what success means for this feature. We are asking these questions: what are we validating? When do we validate this? What Key Business Outcomes does it help us to validate? The criteria for success demonstrates what the business anticipates as being a tangible outcome from a feature. It also demonstrates which business functions will support, own, and drive the execution of the feature. That's it! We've nailed it, right? Wrong. Success metrics must be SMART, but how specific is the specific? The preceding success metric indicates that 80% of those who sign up for the monthly art catalog will enquire about at least one artwork. Now, 80% could mean 80 people, 800 people, or 8000 people, depending on whether we get 100 sign-ups, 1000, or 10,000, respectively! We have defined what external (customer/market) metrics to look for, but we have not defined whether we can realistically achieve this goal, given our resources and capabilities. The question we need to ask is: are we (as a business) equipped to handle 8000 enquiries? Do we have the expertise, resources, and people to manage this? If we don't plan in advance and assign ownership, our goals can lead to a gap in the product experience. When we clarify this explicitly, each business function could make assumptions. When we say 80% of folks will enquire about one artwork, the sales team is thinking that around 50 people will enquire. This is what the sales team  at ArtGalore is probably equipped to handle. However, marketing is aiming for 750 people and the developers are planning for 1000 people. So, even if we can attract 1000 enquiries, sales can handle only 50 enquiries a month! If this is what we're equipped for today, then building anything more could be wasteful. We need to think about how we can ramp up the sales team to handle more requests. The idea of drilling into success metrics is to gauge whether we're equipped to handle our success. So, maybe our success metric should be that we expect to get about 100 sign-ups in the first three months and between 40-70 folks enquiring about artworks after they sign up. Alternatively, we can find a smart way to enable sales to handle higher sales volumes. Before we write up success metrics, we should be asking a whole truckload of questions that determine the before-and-after of the feature. We need to ask the following questions: What will the monthly catalog showcase? How many curated art items will be showcased each month? What is the nature of the content that we should showcase? Just good high-quality images and text, or is there something more? Who will put together the catalog? How long must this person/team(s) spend to create this catalog? Where will we source the art for curation? Is there a specific date each month when the newsletter needs     to go out? Why do we think 80% of those who sign up will enquire? Is it because of the exclusive nature of art? Is it because of the quality of presentation? Is it because of the timing? What's so special about our catalog? Who handles the incoming enquiries? Is there a number to call    or is it via email? How long would we take to respond to enquiries? If we get 10,000 sign-ups and receive 8000 enquiries, are we equipped to handle these? Are these numbers too high? Can we still meet our response time if we hit those numbers? Would we still be happy if we got only 50% of folks who sign up enquiring? What if it's 30%? When would we throw away the idea of the catalog? This is where the meat of feature success starts taking shape. We  need a plan to uncover underlying assumptions and set ourselves up for success. It's very easy for folks to put out ambitious metrics without understanding the before-and-after of the work involved in meeting that metric. The intent of a strategy should be to set teams up for success, not for failure. Often, ambitious goals are set without considering whether they are realistic and achievable or not. This is so detrimental that teams eventually resort to manipulating the metrics or misrepresenting them, playing the blame game, or hiding information. Sometimes teams try to meet these metrics by deprioritizing other stuff. Eventually, team morale, productivity, and delivery take a hit. Ambitious goals, without the required capacity, capability, and resources to deliver, are useless. Technology to be in line with business outcomes Every business function needs to align toward the Key Business Outcomes and conform to the constraints under which the business operates. In our example here, the deadline is for the business to launch this feature idea before the Big Art show. So, meeting timelines is already a necessary measure of success. The other indicators of product technology measures could be quality, usability, response times, latency, reliability, data privacy, security, and so on. These are traditionally clubbed under NFRs (nonfunctional requirements). They are indicators of how the system has been designed or how the system operates, and are not really about user behavior. There is no aspect of a product that is nonfunctional or without a bearing on business outcomes. In that sense, nonfunctional requirements are a misnomer. NFRs are really technical success criteria. They are also a business stakeholder's decision, based on what outcomes the business wants to pursue. In many time and budget-bound software projects, technical success criteria trade-offs happen without understanding the business context or thinking about the end-to-end product experience. Let's take an example: our app's performance may be okay when handling 100 users, but it could take a hit when we get to 10,000 users. By then, the business has moved on to other priorities and the product isn't ready to make the leap. This depends on how each team can communicate the impact of doing or not doing something today in terms of a cost tomorrow. What that means is that engineering may be able to create software that can scale to 5000 users with minimal effort, but in order to scale to 500,000 users, there's a different level of magnitude required. There is a different approach needed when building solutions for meeting short-term benefits, compared to how we might build systems for long-term benefits. It is not possible to generalize and make a case that just because we build an application quickly, that it is likely to be full of defects or that it won't be secure. By contrast, just because we build a lot of robustness into an application, this does not mean that it will make the product sell better. There is a cost to building something, and there is also a cost to not building something and a cost to a rework. The cost will be justified based on the benefits we can reap, but it is important for product technology and business stakeholders to align on the loss or gain in terms of the end-to-end product experience because of the technical approach we are taking today. In order to arrive at these decisions, the business does not really need to understand design patterns, coding practices, or the nuanced technology details. They need to know the viability to meet business outcomes. This viability is based on technology possibilities, constraints, effort, skills needed, resources (hardware and software), time, and other prerequisites. What we can expect and what we cannot expect must both be agreed upon. In every scope-related discussion, I have seen that there are better insights and conversations when we highlight what the business/customer does not get from this product release. When we only highlight what value they will get, the discussions tend to go toward improvising on that value. When the business realizes what it doesn't get, the discussions lean toward improvising the end-to-end product experience. Should a business care that we wrote unit tests? Does the business care what design patterns we used or what language or software we used? We can have general guidelines for healthy and effective ways to follow best practices within our lines of work, but best practices don't define us, outcomes do. To summarize we learned before commencing on the development of any feature idea, there must be a consensus on what outcomes we are seeking to achieve. The success metrics should be our guideline for finding the smartest way to implement a feature. Developer’s guide to Software architecture patterns Hey hey, I wanna be a Rockstar (Developer) The developer-tester face-off needs to end. It’s putting our projects at risk
Read more
  • 0
  • 0
  • 4260

article-image-neo4j-most-popular-graph-database
Amey Varangaonkar
02 Aug 2018
7 min read
Save for later

Why Neo4j is the most popular graph database

Amey Varangaonkar
02 Aug 2018
7 min read
Neo4j is an open source, distributed data store used to model graph problems. It departs from the traditional nomenclature of database technologies, in which entities are stored in schema-less, entity-like structures called nodes, which are connected to other nodes via relationships or edges. In this article, we are going to discuss the different features and use-cases of Neo4j. This article is an excerpt taken from the book 'Seven NoSQL Databases in a Week' written by Aaron Ploetz et al. Neo4j's best features Aside from its support of the property graph model, Neo4j has several other features that make it a desirable data store. Here, we will examine some of those features and discuss how they can be utilized in a successful Neo4j cluster. Clustering Enterprise Neo4j offers horizontal scaling through two types of clustering. The first is the typical high-availability clustering, in which several slave servers process data overseen by an elected master. In the event that one of the instances should fail, a new master is chosen. The second type of clustering is known as causal clustering. This option provides additional features, such as disposable read replicas and built-in load balancing, that help abstract the distributed nature of the clustered database from the developer. It also supports causal consistency, which aims to support Atomicity Consistency Isolation and Durability (ACID) compliant consistency in use cases where eventual consistency becomes problematic. Essentially, causal consistency is delivered with a distributed transaction algorithm that ensures that a user will be able to immediately read their own write, regardless of which instance handles the request. Neo4j Browser Neo4j ships with Neo4j Browser, a web-based application that can be used for database management, operations, and the execution of Cypher queries. In addition to, monitoring the instance on which it runs, Neo4j Browser also comes with a few built-in learning tools designed to help new users acclimate themselves to Neo4j and graph databases. Neo4j Browser is a huge step up from the command-line tools that dominate the NoSQL landscape. Cache sharding In most clustered Neo4j configurations, a single instance contains a complete copy of the data. At the moment, true sharding is not available, but Neo4j does have a feature known as cache sharding. This feature involves directing queries to instances that only have certain parts of the cache preloaded, so that read requests for extremely large data sets can be adequately served. Help for beginners One of the things that Neo4j does better than most NoSQL data stores is the amount of documentation and tutorials that it has made available for new users. The Neo4j website provides a few links to get started with in-person or online training, as well as meetups and conferences to become acclimated to the community. The Neo4j documentation is very well-done and kept up to date, complete with well-written manuals on development, operations, and data modeling. The blogs and videos by the Neo4j, Inc. engineers are also quite helpful in getting beginners started on the right path. Additionally, when first connecting to your instance/cluster with Neo4j Browser, the first thing that is shown is a list of links directed at beginners. These links direct the user to information about the Neo4j product, graph modeling and use cases, and interactive examples. In fact, executing the play movies command brings up a tutorial that loads a database of movies. This database consists of various nodes and edges that are designed to illustrate the relationships between actors and their roles in various films. Neo4j's versatility demonstrated in its wide use cases Because of Neo4j's focus on node/edge traversal, it is a good fit for use cases requiring analysis and examination of relationships. The property graph model helps to define those relationships in meaningful ways, enabling the user to make informed decisions. Bearing that in mind, there are several use cases for Neo4j (and other graph databases) that seem to fit naturally. Social networks Social networks seem to be a natural fit for graph databases. Individuals have friends, attend events, check in to geographical locations, create posts, and send messages. All of these different aspects can be tracked and managed with a graph database such as Neo4j. Who can see a certain person's posts? Friends? Friends of friends? Who will be attending a certain event? How is a person connected to others attending the same event? In small numbers, these problems could be solved with a number of data stores. But what about an event with several thousand people attending, where each person has a network of 500 friends? Neo4j can help to solve a multitude of problems in this domain, and appropriately scale to meet increasing levels of operational complexity. Matchmaking Like social networks, Neo4j is also a good fit for solving problems presented by matchmaking or dating sites. In this way, a person's interests, goals, and other properties can be traversed and matched to profiles that share certain levels of equality. Additionally, the underlying model can also be applied to prevent certain matches or block specific contacts, which can be useful for this type of application. Network management Working with an enterprise-grade network can be quite complicated. Devices are typically broken up into different domains, sometimes have physical and logical layers, and tend to share a delicate relationship of dependencies with each other. In addition, networks might be very dynamic because of hardware failure/replacement, organization, and personnel changes. The property graph model can be applied to adequately work with the complexity of such networks. In a use case study with Enterprise Management Associates (EMA), this type of problem was reported as an excellent format for capturing and modeling the inter dependencies that can help to diagnose failures. For instance, if a particular device needs to be shut down for maintenance, you would need to be aware of other devices and domains that are dependent on it, in a multitude of directions. Neo4j allows you to capture that easily and naturally without having to define a whole mess of linear relationships between each device. The path of relationships can then be easily traversed at query time to provide the necessary results. Analytics Many scalable data store technologies are not particularly suitable for business analysis or online analytical processing (OLAP) uses. When working with large amounts of data, coalescing desired data can be tricky with relational database management systems (RDBMS). Some enterprises will even duplicate their RDBMS into a separate system for OLAP so as not to interfere with their online transaction processing (OLTP) workloads. Neo4j can scale to present meaningful data about relationships between different enterprise-marketing entities, which is crucial for businesses. Recommendation engines Many brick-and-mortar and online retailers collect data about their customers' shopping habits. However, many of them fail to properly utilize this data to their advantage. Graph databases, such as Neo4j, can help assemble the bigger picture of customer habits for searching and purchasing, and even take trends in geographic areas into consideration. For example, purchasing data may contain patterns indicating that certain customers tend to buy certain beverages on Friday evenings. Based on the relationships of other customers to products in that area, the engine could also suggest things such as cups, mugs, or glassware. Is the customer also a male in his thirties from a sports-obsessed area? Perhaps suggesting a mug supporting the local football team may spark an additional sale. An engine backed by Neo4j may be able to help a retailer uncover these small troves of insight. To summarize, we saw Neo4j is widely used across all enterprises and businesses, primarily due to its speed, efficiency and accuracy. Check out the book Seven NoSQL Databases in a Week to learn more about Neo4j and the other popularly used NoSQL databases such as Redis, HBase, MongoDB, and more. Read more Top 5 programming languages for crunching Big Data effectively Top 5 NoSQL Databases Is Apache Spark today’s Hadoop?
Read more
  • 0
  • 0
  • 10701
article-image-guide-to-safe-cryptocurrency-trading
Guest Contributor
02 Aug 2018
8 min read
Save for later

A Guide to safe cryptocurrency trading

Guest Contributor
02 Aug 2018
8 min read
So, you’ve decided to take a leap of faith and start trading in cryptocurrency. But, do you know how to do it safely? Cryptocurrency has risen in popularity as of late- especially since its market reached half a trillion dollars in 2017! This is good news to you if you ever wanted to trade in a system that veers away from tradition or if you simply distrust the traditional market with all their brokers and bankers. Cryptocurrency trading is, however, not without risks. Hackers work hard every day to steal and scam you out of your hard-earned crypto cash by stealing or coaxing your private keys directly from you. The problem is there’s nowhere to run in case you lose your money since cryptocurrency is largely unregulated. So, should you steer clear of cryptocurrency after all? Heck, no! Read this guide and you’ll be a few steps closer to safe cryptocurrency trading in no time. Know the basics As with any endeavor that involves money, you should at least learn the basic ins and outs of cryptocurrency trading. Remember to always exercise prudence when dealing in cryptocurrency. Also, look for books or reliable sites to guide you through the various risks you might face in cryptocurrency trading. Finally, keep up to date with the latest news and trends involving cryptocurrency-related cybersecurity threats. Use a VPN Most people believe that cryptocurrencies are great for privacy because they don’t need any personal information to buy or sell. In short, they’re anonymous. But, this couldn’t be further from the truth. Cryptocurrencies are pseudonymous- not anonymous. Each coin acts as your pseudonym which means that if your transactions are ever linked to your identity (via your IP address stored in the blockchain), you’ll suddenly find yourself out in the open. A VPN hides this trail by hiding your IP address and encrypting your personal data (like your location and ISP). To ensure that your sensitive transactions (especially those made over public Wi-Fi), use only the best VPN you can afford. The keyword here is “afford”. Never use free VPNs while trading cryptocurrency because free VPNs have been known to share/sell your personal information to their partners or third parties. Worse still, these free VPNs aren’t exactly the most secure. This was the case of popular crypto service MyEtherWallet, which suffered a serious security issue after popular free VPN Hola was compromised for 5 hours. This doesn’t really come as a surprise since Hola was never a secure VPN, to begin with. Check out this Hola VPN review to see for yourself. If you want better VPN options for cryptocurrency trading, try out ZenMate and F-Secure Freedome. Install an antivirus program You can add another layer of safety by installing a high-quality antivirus program. These programs protect you from malware that could take over your computer or device. An antivirus program also protects you from ransomware which hackers use to wrest control over your computer or device by encrypting some or all of your data contained therein and keeping it in stasis until you pay the ransom- which costs $133,000 on average. Now, unlike VPNs, you can get quality protection from free antivirus programs. The best ones, so far, are Avast Free Antivirus and Bitdefender Antivirus. Keep your private key to yourself Your private key is basically the password you use to access your cryptocurrency and it’s the only thing a hacker needs to access to your cryptocurrency. Never share your private key with anyone. Don’t even show a QR code containing your private key. With that said: It’s important to note that your private key is usually stored in your cryptocurrency wallet- which is either “hot” or “cold”. A “hot” wallet is one that is always online and is always ready to use while a “cold” wallet is usually offline and only goes online when you need to use it. Hot wallets are provided by cryptocurrency exchanges when you register an account. They are easy to use and make your cryptocurrency more accessible. However, being provided by an exchange means that you might lose all the funds in that wallet if that exchange ever gets hacked- which usually results in that company shutting down (like Bitfinex, Mt. Gox, and Youbit). How do you avoid this? Easy. Just keep the exact amount you need to spend in your hot wallet and keep the rest in your cold wallet a.k.a cold storage- which, as I’ve already mentioned, is entirely offline. This way, if your hot wallet provider ever gets hacked and goes out of business, you would have only experienced a relatively lesser loss. Now, there are three types of cold wallets to choose from. When choosing which one to use, it’s always important to keep in mind your purpose and the amount of cryptocurrency you plan to keep in that wallet. That said, the three types are: Hardware wallet: By far the most popular type, this wallet takes the form of a device that you plug into your computer’s USB drive. To date, there has yet been any record of cryptocurrency being stolen from a hardware wallet- which makes it useful for when you plan to acquire large amounts of cryptocurrency. This form of cold wallet is also convenient as you don’t need to type in your details each time you buy or sell cryptocurrency. Check out this list for the best cryptocurrency hardware wallets. Paper wallet: This simply involves you printing out your public and private keys on a piece of paper, thus, preventing hackers from accessing them. However, this does make it a bit tedious to type in your keys every time you need to use them online. You also run the risk of losing all your funds if it somehow winds up in someone else’s hands. So, remember to keep your paper wallet safe and secure. Brainwallet: This type of wallet involves you keeping your keys in your brain! This is usually done by memorizing a seed phrase. This means that, as long as you don’t record your seed phrase anywhere else, you are the only one who’ll ever know your keys, thus, making this the most secure wallet of all. However, If the owner of the seed phrase ever forgets it (or worse, dies), the cryptocurrency connected to that seed phrase is lost forever. Beware of phishing Phishing attacks are usually experienced through deceptive emails and websites. This is where a hacker employs fraudulent (usually psychological) tactics to get you to divulge private details. This type of cyber attack is responsible for over $115 million in stolen Etherium just last year. Now, you might be thinking “Why don’t they just avoid suspicious emails or messages?”, right? The thing is, they’re hard to resist. If you want to avoid falling for phishing attempts, check out this post for how to tell if someone is phishing for your cryptocurrency. Trade in secure exchanges Cryptocurrencies are usually bought and sold in a cryptocurrency exchange. However, not all exchanges can be trusted as some have already been proven fake. The problem here is that there’s no inherent protection and nowhere to run to for help if you lose your money. This is because cryptocurrency is, for the most part, unregulated- although the world is starting to catch up. That said, make sure to do your research before investing your money in any cryptocurrency exchange. You can also check out these 20 security tips for a more detailed list of safe trading practices. Conclusion Cryptocurrency trading can be hard, confusing, and downright risky. But, if you follow this guide, you’re at least a few steps closer to safe cryptocurrency trading. Arm yourself with at least the basic knowledge of how cryptocurrency trading works. Don’t fall for the illusion of anonymity that has fooled others and get yourself the best VPN you can afford and remember to install a reliable antivirus program to avoid malware or ransomware. Never reveal your private key. Hot wallets are fine if they only contain the exact amount you want to spend but it’s better to keep all your keys safe in a cold wallet that fits your purpose. Be wary of suspicious sites, emails, or messages that could turn out to be phishing scams and only trade in secure cryptocurrency exchanges. About Author: Dana Jackson, an U.S. expat living in Germany and the founder of PrivacyHub. She loves all things related to security and privacy. She holds a degree in Political Science, and loves to call herself a scientist. Dana also loves morning coffee and her dog Paw.   Cryptocurrency-based firm, Tron acquires BitTorrent Can Cryptocurrency establish a new economic world order? Top 15 Cryptocurrency Trading Bots    
Read more
  • 0
  • 0
  • 5163

article-image-5g-mobile-data-propel-artificial-intelligence
Neil Aitken
02 Aug 2018
7 min read
Save for later

How 5G Mobile Data will propel Artificial Intelligence (AI) progress

Neil Aitken
02 Aug 2018
7 min read
Like it’s predecessors, 3G and 4G, 5G refers to the latest ‘G’ – Generation – of mobile technology. 5G will give us very fast - effectively infinitely fast - mobile data download bandwidth. Downloading a TV show to your phone over 5G, in its entirety, in HD, will take less than a second, for example. A podcast will be downloaded within a fraction of a second of you requesting it. Scratch the surface of 5G, however, and there is a great deal more to see than just fast mobile data speeds.  5G is the backbone on which a lot of emerging technologies such as AI, blockchains, IoT among others will reach mainstream adoption. Today, we look at how 5G will accelerate AI growth and adoption. 5G will create the data AI needs to thrive One feature of 5G with ramifications beyond data speed is ‘Latency.’ 5G offers virtually ‘Zero Latency’ as a service. Latency is the time needed to transmit a packet of data from one device to another. It includes the period of time between when the request was made, to the time the response is completed. [caption id="attachment_21251" align="aligncenter" width="580"] 5G will be superfast – but will also benefit from near zero ‘latency’[/caption] Source: Economist At the moment, we keep files (music, pictures or films) in our phones’ memory permanently. We have plenty of processing power on our devices. In fact, the main upgrade between phone generations these days is a faster processor. In a 5G world, we will be able to use cheap parts in our devices – processors and memory in our new phones. Data downloads will be so fast, that we can get them immediately when we need them. We won’t need to store information on the phone unless we want to.  Even if the files are downloaded from the cloud, because the network has zero latency – he or she feels like the files are on the phone. In other words, you are guaranteed a seamless user experience in a 5G world. The upshot of all this is that the majority of any new data which is generated from mobile products will move to the cloud for storage. At their most fundamental level, AI algorithms are pattern matching tools. The bigger the data trove, the faster and better performing the results of AI analysis is. These new structured data sets, created by 5G, will be available from the place where it is easiest to extract and manipulate (‘Analyze’) it – the cloud. There will be 100 billion 5G devices connected to cellular networks by 2025, according to Huawei. 5G is going to generate data from those devices, and all the smartphones in the world and send it all back to the cloud. That data is the source of the incredible power AI gives businesses. 5G driving AI in autonomous vehicles 5G’s features and this Cloud / Connected Device future, will manifest itself in many ways. One very visible example is how 5G will supercharge the contribution, especially to reliability and safety, that AI can make to self driving cars. A great deal of the AI processing that is required to keep a self driving car operating safely, will be done by computers on board the vehicle. However, 5G’s facilities to communicate large amounts of data quickly will mean that any unusual inputs (for example, the car is entering or in a crash situation) can be sent to bigger computing equipment on the cloud which could perform more serious processing. Zero latency is important in these situations for commands which might come from a centralized accident computer, designed to increase safety– for example issuing the command ‘break.’ In fact, according to manufacturers, it’s likely that, ultimately, groups of cars will be coordinated by AI using 5G to control the vehicles in a model known as swarm computing. 5G will make AI much more useful with ‘context’ - Intel 5G will power AI by providing location information which can be considered in establishing the context of questions asked of the tool – according to Intel’s Data Center Group. For example, asking your Digital Assistant where the tablets are means something different depending on whether you’re in a pharmacy or an electronics store. The nature of 5G is that it’s a mobile service. Location information is both key to context and an inherent element of information sent over a 5G connection. By communicating where they are, 5G sensors will help AI based Digital Assistants solve our everyday problems. 5G phones will enable  AI calculations on ‘Edge’ network devices  - ARM 5G will push some processing to the ‘Edge’ of the network, for manipulation by a growing range of AI chips on to the processors of phones. In this regard, smartphones like any Internet Of Things connected processor ‘in the field’ are simply an ‘AI platform’. Handset manufacturers are including new software features in their phones that customers love to use – including AI based search interfaces which allow them to search for images containing ‘heads’ and see an accurate list. [caption id="attachment_21252" align="aligncenter" width="1918"] Arm are designing new types of chips targeted at AI calculations on ‘Edge’ network devices.[/caption] Source: Arm's Project Trillium ARM, one of the world’s largest CPU producers are creating specific, dedicated AI chip sets, often derived from the technology that was behind their Graphics Processing Units. These chips process AI based calculations up to 50 times faster than standard microprocessors already and their performance is set to improve 50x over the next 3 years, according to the company. AI is part of 5G networks - Huawei Huawei describes itself as an AI company (as well as a number of other things including handset manufacturer.) They are one of the biggest electronic manufacturers in China and are currently in the process of selling networking products to the world’s telecommunications companies, as they prepare to roll out their 5G networks. Based on the insight that 70% of network system downtime comes from human error, Huawei is now eliminating humans from the network management component of their work, to the degree that they can. Instead, they’re implementing automated AI based predictive maintenance systems to increase data throughput across the network and reduce downtime. The way we use cellular networks is changing. Different applications require different backend traffic to be routed across the network, depending on the customer need. Someone watching video, for example, has a far lower tolerance for a disruption to the data throughput (the ‘stuttering Netflix’ effect) than a connected IoT sensor which is trying to communicate the temperature of a thermometer. Huawei’s network maintenance AI software optimizes these different package needs, maintaining the near zero latency that the standard demands at a lower cost. AI based network maintenance complete a virtuous loop in which 5G devices on new cellular networks give AI the raw data they need, including valuable context information, and AI helps the data flow across the 5G network better. Bringing it all together 5G and artificial intelligence (AI) are revolutionary technologies that will evolve alongside each other. 5G isn’t just fast data, it’s one of the most important technologies ever devised. Just as the smartphone did, it will fundamentally change how we relate to information, partly, because it will link us to thousands of newly connected devices on the Internet of Things. Ultimately, it could be the secondary effects of 5G, the network’s almost zero latency, which could provide the largest benefit – by creating structured data sets from billions of connected devices, in an easily accessible place – the cloud which can be used to fuel the AI algorithms which run on them. Networking equipment, chip manufacturers and governments have all connected the importance of AI with the potential of 5G. Commercial sales of 5G start in The US, UK and Australia in 2019. 7 Popular Applications of Artificial Intelligence in Healthcare Top languages for Artificial Intelligence development Cognitive IoT: How Artificial Intelligence is remoulding Industrial and Consumer IoT      
Read more
  • 0
  • 0
  • 5418

article-image-top-automl-libraries-for-building-ml-pipelines
Sunith Shetty
01 Aug 2018
9 min read
Save for later

Top AutoML libraries for building your ML pipelines

Sunith Shetty
01 Aug 2018
9 min read
What is AutoML? When talking about AutoML we mostly refer to automated data preparation (namely feature preprocessing, generation, and selection) and model training (model selection and hyperparameter optimization). The number of possible options for each step of this process can vary vastly depending on the problem type. AutoML allows researchers and practitioners to automatically build ML pipelines out of the possible options for every step to find high-performing ML models for a given problem. AutoML libraries carefully set up experiments for various ML pipelines, which covers all the steps from data ingestion, data processing, modeling, and scoring. In this article we deal with understanding what AutoML is and cover popular AutoML libraries with practical examples. This article is an excerpt from a book written by Sibanjan Das, Umit Mert Cakmak titled Hands-On Automated Machine Learning. Overview of AutoML libraries There are many popular AutoML libraries, and in this section you will get an overview of commonly used ones in the data science community. Featuretools Featuretools is a good library for automatically engineering features from relational and transactional data. The library introduces the concept called Deep Feature Synthesis (DFS). If you have multiple datasets with relationships defined among them such as parent-child based on columns that you use as unique identifiers for examples, DFS will create new features based on certain calculations, such as summation, count, mean, mode, standard deviation, and so on. Let's go through a small example where you will have two tables, one showing the database information and the other showing the database transactions for each database: import pandas as pd # First dataset contains the basic information for databases. databases_df = pd.DataFrame({"database_id": [2234, 1765, 8796, 2237, 3398], "creation_date": ["2018-02-01", "2017-03-02", "2017-05-03", "2013-05-12", "2012-05-09"]}) databases_df.head() You get the following output: The following is the code for the database transaction: # Second dataset contains the information of transaction for each database id db_transactions_df = pd.DataFrame({"transaction_id": [26482746, 19384752, 48571125, 78546789, 19998765, 26482646, 12484752, 42471125, 75346789, 16498765, 65487547, 23453847, 56756771, 45645667, 23423498, 12335268, 76435357, 34534711, 45656746, 12312987], "database_id": [2234, 1765, 2234, 2237, 1765, 8796, 2237, 8796, 3398, 2237, 3398, 2237, 2234, 8796, 1765, 2234, 2237, 1765, 8796, 2237], "transaction_size": [10, 20, 30, 50, 100, 40, 60, 60, 10, 20, 60, 50, 40, 40, 30, 90, 130, 40, 50, 30], "transaction_date": ["2018-02-02", "2018-03-02", "2018-03-02", "2018-04-02", "2018-04-02", "2018-05-02", "2018-06-02", "2018-06-02", "2018-07-02", "2018-07-02", "2018-01-03", "2018-02-03", "2018-03-03", "2018-04-03", "2018-04-03", "2018-07-03", "2018-07-03", "2018-07-03", "2018-08-03", "2018-08-03"]}) db_transactions_df.head() You get the following output: The code for the entities is as follows: # Entities for each of datasets should be defined entities = { "databases" : (databases_df, "database_id"), "transactions" : (db_transactions_df, "transaction_id") } # Relationships between tables should also be defined as below relationships = [("databases", "database_id", "transactions", "database_id")] print(entities) You get the following output for the preceding code: The following code snippet will create feature matrix and feature definitions: # There are 2 entities called ‘databases’ and ‘transactions’ # All the pieces that are necessary to engineer features are in place, you can create your feature matrix as below import featuretools as ft feature_matrix_db_transactions, feature_defs = ft.dfs(entities=entities, relationships=relationships, target_entity="databases") The following output shows some of the features that are generated: You can see all feature definitions by looking at the following features_defs: feature_defs The output is as follows: This is how you can easily generate features based on relational and transactional datasets. Auto-sklearn Scikit-learn has a great API for developing ML models and pipelines. Scikit-learn's API is very consistent and mature; if you are used to working with it, auto-sklearn will be just as easy to use since it's really a drop-in replacement for scikit-learn estimators. Let's see a little example: # Necessary imports import autosklearn.classification import sklearn.model_selection import sklearn.datasets import sklearn.metrics from sklearn.model_selection import train_test_split # Digits dataset is one of the most popular datasets in machine learning community. # Every example in this datasets represents a 8x8 image of a digit. X, y = sklearn.datasets.load_digits(return_X_y=True) # Let's see the first image. Image is reshaped to 8x8, otherwise it's a vector of size 64. X[0].reshape(8,8) The output is as follows: You can plot a couple of images to see how they look: import matplotlib.pyplot as plt number_of_images = 10 images_and_labels = list(zip(X, y)) for i, (image, label) in enumerate(images_and_labels[:number_of_images]): plt.subplot(2, number_of_images, i + 1) plt.axis('off') plt.imshow(image.reshape(8,8), cmap=plt.cm.gray_r, interpolation='nearest') plt.title('%i' % label) plt.show() Running the preceding snippet will give you the following plot: Splitting the dataset to train and test data: # We split our dataset to train and test data X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1) # Similarly to creating an estimator in Scikit-learn, we create AutoSklearnClassifier automl = autosklearn.classification.AutoSklearnClassifier() # All you need to do is to invoke fit method to start experiment with different feature engineering methods and machine learning models automl.fit(X_train, y_train) # Generating predictions is same as Scikit-learn, you need to invoke predict method. y_hat = automl.predict(X_test) print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat)) # Accuracy score 0.98 That was easy, wasn't it? MLBox MLBox is another AutoML library that supports distributed data processing, cleaning, formatting, and state-of-the-art algorithms such as LightGBM and XGBoost. It also supports model stacking, which allows you to combine an information ensemble of models to generate a new model aiming to have better performance than the individual models. Here's an example of its usage: # Necessary Imports from mlbox.preprocessing import * from mlbox.optimisation import * from mlbox.prediction import * import wget file_link = 'https://apsportal.ibm.com/exchange-api/v1/entries/8044492073eb964f46597b4be06ff5ea/data?accessKey=9561295fa407698694b1e254d0099600' file_name = wget.download(file_link) print(file_name) # GoSales_Tx_NaiveBayes.csv The GoSales dataset contains information for customers and their product preferences: import pandas as pd df = pd.read_csv('GoSales_Tx_NaiveBayes.csv') df.head() You get the following output from the preceding code: Let's create a test set from the same dataset by dropping a target column: test_df = df.drop(['PRODUCT_LINE'], axis = 1) # First 300 records saved as test dataset test_df[:300].to_csv('test_data.csv') paths = ["GoSales_Tx_NaiveBayes.csv", "test_data.csv"] target_name = "PRODUCT_LINE" rd = Reader(sep = ',') df = rd.train_test_split(paths, target_name) The output will be similar to the following: Drift_thresholder will help you to drop IDs and drifting variables between train and test datasets: dft = Drift_thresholder() df = dft.fit_transform(df) You get the following output: Optimiser will optimize the hyperparameters: opt = Optimiser(scoring = 'accuracy', n_folds = 3) opt.evaluate(None, df) You get the following output by running the preceding code: The following code defines the parameters of the ML pipeline: space = { 'ne__numerical_strategy':{"search":"choice", "space":[0]}, 'ce__strategy':{"search":"choice", "space":["label_encoding","random_projection", "entity_embedding"]}, 'fs__threshold':{"search":"uniform", "space":[0.01,0.3]}, 'est__max_depth':{"search":"choice", "space":[3,4,5,6,7]} } best = opt.optimise(space, df,15) The following output shows you the selected methods that are being tested by being given the ML algorithms, which is LightGBM in this output: You can also see various measures such as accuracy, variance, and CPU time: Using Predictor, you can use the best model to make predictions: predictor = Predictor() predictor.fit_predict(best, df) You get the following output: TPOT Tree-Based Pipeline Optimization Tool (TPOT) uses genetic programming to find the best performing ML pipelines, built on top of scikit-learn. Once your dataset is cleaned and ready to be used, TPOT will help you with the following steps of your ML pipeline: Feature preprocessing Feature construction and selection Model selection Hyperparameter optimization Once TPOT is done with its experimentation, it will provide you with the best performing pipeline. TPOT is very user-friendly as it's similar to using scikit-learn's API: from tpot import TPOTClassifier from sklearn.datasets import load_digits from sklearn.model_selection import train_test_split # Digits dataset that you have used in Auto-sklearn example digits = load_digits() X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, train_size=0.75, test_size=0.25) # You will create your TPOT classifier with commonly used arguments tpot = TPOTClassifier(generations=10, population_size=30, verbosity=2) # When you invoke fit method, TPOT will create generations of populations, seeking best set of parameters. Arguments you have used to create TPOTClassifier such as generations and population_size will affect the search space and resulting pipeline. tpot.fit(X_train, y_train) print(tpot.score(X_test, y_test)) # 0.9834 tpot.export('my_pipeline.py') Once you have exported your pipeline in the Python my_pipeline.py file, you will see the selected pipeline components: import numpy as np import pandas as pd from sklearn.model_selection import train_test_split from sklearn.neighbors import KNeighborsClassifier # NOTE: Make sure that the class is labeled 'target' in the data file tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64) features = tpot_data.drop('target', axis=1).values training_features, testing_features, training_target, testing_target = train_test_split(features, tpot_data['target'].values, random_state=42) exported_pipeline = KNeighborsClassifier(n_neighbors=6, weights="distance") exported_pipeline.fit(training_features, training_target) results = exported_pipeline.predict(testing_features) To summarize, you learnt about Automated ML and practiced your skills using popular AutoML libraries. This is definitely not the whole list, and AutoML is an active area of research. You should check out other libraries such as Auto-WEKA, which also uses the latest innovations in Bayesian optimization, and Xcessive, which is a user-friendly tool for creating stacked ensembles. To know how AutoML can be further used to automate parts of Machine Learning, check out the book Hands-On Automated Machine Learning. Read more Anatomy of an automated machine learning algorithm (AutoML) AutoML: Developments and where is it heading to AmoebaNets: Google’s new evolutionary AutoML
Read more
  • 0
  • 0
  • 9920
article-image-earn-1m-per-year-hint-learn-machine-learning
Neil Aitken
01 Aug 2018
10 min read
Save for later

How to earn $1m per year? Hint: Learn machine learning

Neil Aitken
01 Aug 2018
10 min read
Internet job portal ‘Indeed.com’ links potential employers with people who are looking to take the next step in their careers. The proportion of job posts on their site, relating to ‘Data Science’, a specific job in the AI category, is growing fast (see chart below). More broadly, Artificial Intelligence & machine learning skills, of which ‘Data Scientist’ is just one example, are in demand. No wonder that it has been termed as the sexiest job role of the 21st century. Interest comes from an explosion of jobs in the field from big companies and Start-Ups, all of which are competing to come up with the best AI business and to earn the money that comes with software that automates tasks. The skills shortage associated with Artificial Intelligence represents an opportunity for any developer. There has never been a better time to consider whether reskilling or upskilling in AI could be a lucrative path for you. Below : Indeed.com. Proportion of job postings containing Data Scientist or Data Science. [caption id="attachment_21240" align="aligncenter" width="1525"] Artificial Intelligence skills are increasingly in demand and create a real opportunity for those prepared to reskill or upskill.[/caption] Source: Indeed  The AI skills gap the market is experiencing comes from the difficulty associated with finding an individual demonstrating a competent mixture of the very disparate faculties that AI roles require. Artificial Intelligence and it’s near equivalents such as Machine Learning and Neural Networks operate at the intersection of what have mostly been two very different disciplines – statistics and software development. In simple terms, they are half coding, half maths. Hamish Ogilvy, CEO of AI based Internal Search company Sajari is all too familiar with the problem. He’s on the front line, hiring AI developers. “The hardest part”, says Ogilvy, “is that AI is pretty complex and the average developer/engineer does not have the background in maths/stats/science to actually understand what is happening. On the flip side the trouble with the stats/maths/science people is that they typically can't code, so finding people in that sweet spot that have both is pretty tough.” He’s right. The New York Times suggests that the pool of qualified talent is only 10,000 people, worldwide. Those who do have jobs are typically happily ensconced, paid well, treated appropriately and given no reason whatsoever to want to leave. [caption id="attachment_21244" align="aligncenter" width="1920"] Judged by $ investments in the area alone, AI skills are worth developing for those wishing to stay current on technology skills.[/caption] In fact, an instinct to develop AI skills will serve any technology employee well. No One can have escaped the many estimates, from reputable consultancies, suggesting that Automation will replace up to 30% of jobs in the next 10 years. No job is safe. Every industry is touched by AI in some form. Any responsible individual with a view to the management of their own skills could learn ML and AI skills to stay relevant in current times. Even if you don't want to move out of your current job, learning ML will probably help you adapt better in your industry. What is a typical AI job and what will it pay? OpenAI, a world class Artificial Intelligence research laboratory, revealed the salaries of some of its key Data Science employees recently. Those working in the AI field with a specialization can earn $300 to $500k in their first year out of university. Experts in Artificial Intelligence now command salaries of up to $1m. [caption id="attachment_21242" align="aligncenter" width="432"] The New York Times observes AI salaries[/caption] [caption id="attachment_21241" align="aligncenter" width="1121"] The New York Times observes AI salaries[/caption] Source: The New York times Indraneil Roy, an Expert in AI and Talent Acquisition who works for Edge Networks puts it this way when outlining the difficulties of hiring the right skills and to explain why wages in the field are so high. “The challenge is the quality of resources. As demand is high for this skill, we are already seeing candidates with fake experience and work pedigree not up to standards.” The phenomenon is also causing a ‘brain drain’ in Universities. About a third of jobs in the AI field will go to someone with a Ph.D., and all of those are drawn from universities working on the discipline, often lured by the significant pay packages which are available. So, with huge demand and the universities drained, where will future AI employees come from? 3 ways to skill up to become an AI expert (And earn all that money?) There is still not a lot of agreed terminology or even job roles and responsibility in the sector. However, some things are clear. Those wishing to evolve in to the field of AI must understand the conceptual thinking involved, as a starting point, whether that view is found on the job or as part of an informal / formal educational course. Specifically, most jobs in the specialty require a working knowledge of neural networks, data / analytics, predictive analytics, with some basic programming and database skills. There are some great resources available online to train you up. Most, as you’d expect, are available on your smartphone so there really is no excuse for not having a look. 1. Free online course: Machine Learning & Statistics and probability Hamish Ogilvy summed the online education which is available in the area well. There are “so many free courses now on AI from Stanford,” he said, “that people are able to educate themselves and make up for the failings of antiquated university courses. AI is just maths really,” he says “complex models and stats. So that's what people need grounding in to be successful.” Microsoft offer free AI courses for technical professionals: Microsoft’s training materials are second to none. They’re also provided free and provide a shortcut to a credible understanding in an area simply because it comes from a technical behemoth. Importantly, they also have a list of AI services which you can play with, again for free. For example, a Natural Language engine offers a facility for you to submit text from Instant Messaging conversations and establish the sentiment being felt by the writer. Practical experience of the tools, processes and concepts involved will set you apart. See below. [caption id="attachment_21245" align="aligncenter" width="1999"] Check out Microsoft’s free AI training program for developers.[/caption] Google are taking a proactive stance on Machine Learning. They see it’s potential to improve efficiency in every industry and also offer free ML training courses on their site. 2. Take courses on AI/ML Packt’s machine learning courses, books and videos: Packt is working towards a mission to help the world put software to work in new ways, through the delivery of effective learning and information services to IT professionals. It has published over 6,000 books and videos so far, providing IT professionals with the actionable knowledge they need to get the job done - whether that's specific learning on an emerging technology or optimizing key skills in more established tools. You can choose from a variety of Packt’s books, videos and courses for AI/ML. Here’s a list of top ones: Artificial Intelligence by Example [Book] Artificial Intelligence for Big data [Book] Learn Artificial Intelligence with TensorFlow [Video] Introduction to Artificial Intelligence with Java [Video] Advanced Artificial Intelligence Projects with Python [Video] Python Machine learning - Second Edition [Book] Machine Learning with R - Second Edition [Book] Coursera’s machine learning courses Coursera is a company which make training courses, for a variety of subjects, available online. Taken from actual University course content and delivered with tests, videos and training notes, all accessed online, each course is roughly a University Module. Students pick up an ‘up to under-graduate’ level of understanding of the content involved. Coursera’s courses are often cited as merit worthy and are recognizable in the industry. Costs vary but are typically between $2k and $5k per course. 3. Learn by doing Familiarize yourself with relevant frameworks and tools including Tensor Flow, Python and Keras. TensorFlow from Google is the most used open source AI software library. You can use existing code in your experiments and experiment with neural networks in much the same way as you can in Microsoft’s. Python is a programming language written for a big data world. Its proponents will tell you that Python saves developers hundreds of lines of code, allowing you to tie together information and systems faster than ever before. Python is used extensively in ML and AI applications and should be at the top of your study list. Keras, a deep learning library is similarly ubiquitous. It’s a high level Neural Network API designed to allow prototyping of your software as fast as possible. Finally, a lesser known but still valuable resources is the Accord.net. It is one final example of the many  software elements with which you can engage with to train yourself up. Accord Framework.net will expose you to image libraries, natural learning and real time facial recognition. Earn extra points with employers AI has several lighthouse tasks which are proving the potential of the technology in these still early stages. We’ve included a couple of examples, Natural Language processing and image recognition, above. Practical expertise in these areas specifically, image or voice recognition or pattern matching are valued highly by employers. Alternatively, have you patented something? A registered patent in your name is highly prized. Especially something to do with Machine Learning. Both will help you showcase Extra skills / achievements that will help your application.’ The specifics of how to apply for patents differ by country but you can find out more about the overall principles of how to submit an idea here. Passion and engagement in the subject are also, clearly appealing characteristics for potential employers to see in applicants. Participating in competitions like Kaggle, and having a portfolio of projects you can showcase on facilities like GitHub are also well prized. Of all of these suggestions, for those employed, any on the job experience you can get will stand you in the best stead. Indraneil says "Individual candidates need to spend more time doing relevant projects while in employment. Start ups involved in building products and platforms on AI seem to have better talent." The fact that there are not many AI specialists around is a bad sign There is a demand for employees with AI skills and an investment in relevant training may pay you well. Unfortunately, the underlying problem this situation reveals could be far worse than the problems experienced so far. Together, once found, all these AI scientists are going to automate millions of jobs, in every industry, in every country around the world. If Industry, Governments and Universities cannot train enough people to fill the roles being created by an evolving skills market, we may rightly be concerned to worry about how they will deal with retraining all those displaced by AI, for whom there may be no obvious replacement role available. 18 striking AI Trends to watch in 2018 – Part 1 DeepMind, Elon Musk, and others pledge not to build lethal AI Attention designers, Artificial Intelligence can now create realistic virtual textures
Read more
  • 0
  • 0
  • 19720

article-image-are-distributed-networks-decentralized-systems-same
Amarabha Banerjee
31 Jul 2018
3 min read
Save for later

Are distributed networks and decentralized systems the same?

Amarabha Banerjee
31 Jul 2018
3 min read
The emergence of Blockchain has paved way for the implementation of non-centralized network architecture. The seeds of distributed network architecture was sown back in 1964, by Paul Baran in his paper “On Distributed Networks“. Since then, there have been many attempts at implementing this architecture in network systems. The most recent implementation has been aided by the discovery of Blockchain technology, in 2009, by the anonymous Satoshi Nakamoto. Terminologies: Network: A collection of interlinked nodes that exchange information. Node: The most basic part of the network; for example, a user or computer. Link: The connection between two nodes. Server: A node that has connections to a relatively large amount of other nodes The mother of all the network architectures is a centralized Network. Here, the primary decision making control rests with one Node. The message is carried on by Sub Nodes. Distributed networks are in a way, a conglomeration of small centralized networks. It consists of multiple Nodes which themselves are miniature versions of centralized networks. Decentralized networks consist of individual nodes and every one of these Nodes are capable of independent decision making. Hence there is no central control in Decentralized networks. Source: Meshworld A common misconception is that Distributed and Decentralized systems are the same; they just have different nomenclature and slightly different functionalities. In a way this is true, but not completely. A distributed system still has one central control algorithm, that takes the final call over the process and protocol to be followed. As an example, let’s consider distributed digital ledgers. Each of these ledgers are independent network nodes. These ledgers get updated with individual transactions and other details and this information is not passed on to the other Nodes. This particular feature makes this system secure and decentralized. The other Nodes are not aware of the information stored. This is how a decentralized network behaves. Now the same system behaves a bit differently, when the Nodes communicate with each other, (in-case of Ethereum, by the movement of “Ether” between nodes). Although the individual Node’s information stays secure, the information about the state of the Nodes is passed on, finally to a central peer. Then the peer takes on the decision of which state to finally change to and what is the optimum process to change the state. This is decided by the votes of the individual nodes. The Nodes then change to the new state, preserving information about the previous state. This makes the system dynamic, secure and distributed, because although the Nodes get to vote based on their individual states, the final decision is taken by the centralized peer. This is a distributed system. Hence we can clearly state that decentralized systems are a subset of distributed systems, more independent and minus any central controlling authority. This presents a familiar question to us - are the current blockchain based apps purely decentralized, or are they just distributed systems with a central control? Could this be the reason why we have not yet reached the ultimate Cryptocurrency based alternative economy? Put differently, is the invisible central control hindering the evolution of blockchain based systems to a purely decentralized system and economy? Only time and more dedicated research, along with better practical implementation of decentralized applications will answer these questions. A brief history of Blockchain Blockchain can solve tech’s trust issues – Imran Bashir Partnership alliances of Kontakt.io and IOTA Foundation for IoT and Blockchain
Read more
  • 0
  • 0
  • 7749