
Tech Guides - Data

281 Articles

Does AI deserve to be so Overhyped?

Aaron Lazar
28 May 2018
6 min read
The short answer is yes, and no. The long answer is, well, read on to find out.

Many have been asking the question, myself included: is artificial intelligence just another passing fad, like Google Glass or nanotechnology? The hype for AI has built up over the past few years, although if you look back at the 1960s, it seems to have started way back then. From the early 90s all the way to the early 2000s, a lot of media and television shows talked about AI quite a bit. Going 25 centuries even further back, Aristotle speaks not just of thinking machines but of autonomous ones in his Politics:

"for if every instrument, at command, or from a preconception of its master's will, could accomplish its work (as the story goes of the statues of Daedalus; or what the poet tells us of the tripods of Vulcan, 'that they moved of their own accord into the assembly of the gods'), the shuttle would then weave, and the lyre play of itself; nor would the architect want servants, or the master slaves."

- Aristotle, Politics: A Treatise on Government, Book 1, Chapter 4

This imagery of AI has sunk into our subconscious minds over the centuries, propelling creative work, academic research, and industrial revolutions toward that goal. The thought of giving machines a mind of their own existed long ago, but recent advancements in technology have made it much more concrete and realistic.

The Rise of the Machines

The year is 2018. The fourth Industrial Revolution is happening, and intelligent automation has taken over. This is the point where I say no, AI is not overhyped. General Electric, for example, is a billion-dollar manufacturing company that has already invested in AI. GE Digital has AI running through several automated systems, and it even has its own IIoT platform called Predix.

Similarly, in the field of healthcare, the implementation of AI is growing in leaps and bounds. The Google DeepMind project is able to process millions of medical records within minutes. Although this kind of research is in its early phase, Google is working closely with the Moorfields Eye Hospital NHS Foundation Trust to apply AI to improving eye treatment. AI startups focused on healthcare and allied areas such as genetic engineering are among the most heavily venture-funded in recent times.

Computer vision, or image recognition, is one field where AI has really proven its power. Analysing datasets like Iris has never been easier, paving the way for more advanced use cases like automated quality checks in manufacturing units. Healthcare again offers interesting examples: AI has helped sift through tonnes of data, helping doctors diagnose illnesses quicker, manufacture more effective and responsive drugs, and monitor patients. The list is endless, clearly showing that AI has made its mark in several industries.

Back (up) to the Future

Now, if you talk about the commercial implementations of AI, they are still quite far-fetched at the moment. Take the same computer vision application, for example. Its implementation would be a huge breakthrough for autonomous vehicles. But if researchers have managed only about 80% accuracy for object recognition on roads, the battle is not close to being won! And even if accuracy improves, do you think driverless vehicles are ready to drive in snow, rain, or storms?
I remember a few years ago, Business Process Outsourcing was one industry, at least in India, that was quite fearful of AI and autonomous systems taking over its jobs. Yet machines are capable of performing only 60-70% of BPO processes in insurance, and with changing customer requirements and simultaneously falling patience levels, those numbers are not good enough! It looks like the end of Moore's law is here, for AI I mean. Well, you can't really expect AI to have the same exponential growth that computers did decades ago. There are a lot of unmet expectations in several fields, which has a considerable number of people thinking that AI isn't going to solve their problems now, and they're right. It is probably going to take a few more years to mature, making it a thing of the future, not of the present. Is AI overhyped now? Yeah, maybe?

What I think

Someone once said that hype is a double-edged sword. If there's not enough, innovation may become obscure; if there's too much, expectations become unreasonable. It's true that AI has several beneficial use cases, but what about the fairness of such systems? Will machines continue to think the way they're supposed to, or will they start finding their own missions that don't involve benefits to the human race? At the same time, there are questions of security and data privacy. GDPR will come into effect in a few days, but what about the prevailing issues of internet security?

I had an interesting discussion with a colleague yesterday. We were talking about what the impact of AI could be for us as end customers in a developing and young country like India. Do we really need to fear losing our jobs? Will we be able to reap the benefits of AI directly, or will the impact be indirect? The answer is, probably yes, but not so soon. If we drew up a hierarchy-of-needs pyramid for AI, each field would have to climb through several stages before fully leveraging AI: collecting data, storing it effectively, exploring it, aggregating it, optimising it with the help of algorithms, and finally achieving AI. That's bound to take a LOT of time!

Honestly speaking, a country like India still lacks AI implementation in several fields. The major customers of AI, apart from some industrial giants, will obviously be governments, although that is sure to take at least a decade or so, keeping in mind the several aspects to be accomplished first. In the meantime, budding AI developers and engineers are scurrying to skill themselves up in the race to be the cream of the crowd! And what about the rest of the world? Well, I can't speak for everyone, but if you ask me, AI is a really promising technology, and we need to give it some time; allow the industries and organisations investing in it the time to let it evolve and ultimately benefit us customers, one way or another.

Related reading:
- You can now make music with AI thanks to Magenta.js
- Splunk leverages AI in its monitoring tools


Top 7 libraries for geospatial analysis

Aarthi Kumaraswamy
22 May 2018
12 min read
The term geospatial refers to information that is located on the earth's surface. This can include, for example, the position of a cellphone tower, the shape of a road, or the outline of a country. Geospatial data often associates some piece of information with a particular location.

Geospatial development is the process of writing computer programs that can access, manipulate, and display this type of information. Internally, geospatial data is represented as a series of coordinates, often in the form of latitude and longitude values. Additional attributes, such as temperature, soil type, height, or the name of a landmark, are also often present. There can be many thousands (or even millions) of data points for a single set of geospatial data.

In addition to the prosaic tasks of importing geospatial data from various external file formats and translating data from one projection to another, geospatial data can also be manipulated to solve various interesting problems. Obvious examples include calculating the distance between two points, calculating the length of a road, or finding all data points within a given radius of a selected point. We use libraries to solve all of these problems and more. Today we will look at the major libraries used to process and analyze geospatial data:

- GDAL/OGR
- GEOS
- Shapely
- Fiona
- Python Shapefile Library (pyshp)
- pyproj
- Rasterio
- GeoPandas

This is an excerpt from the book Mastering Geospatial Analysis with Python by Paul Crickard, Eric van Rees, and Silas Toms.

Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library

The Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library are two separate libraries that are generally downloaded together as GDAL. This means that installing the GDAL package also gives access to OGR functionality. The reason GDAL is covered first is that the other packages were written after GDAL, so chronologically, it comes first. As you will notice, some of the packages covered in this post extend GDAL's functionality or use it under the hood.

GDAL was created in the 1990s by Frank Warmerdam and saw its first release in June 2000. Later, development of GDAL was transferred to the Open Source Geospatial Foundation (OSGeo). Technically, GDAL is a little different from your average Python package, as the GDAL package itself was written in C and C++, meaning that in order to use it in Python, you need to compile GDAL and its associated Python bindings. However, using conda and Anaconda makes it relatively easy to get started quickly. Because it was written in C and C++, the online GDAL documentation is written for the C++ version of the libraries. For Python developers, this can be challenging, but many functions are documented and can be consulted with the built-in pydoc utility, or by using the help function within Python.

Because of its history, working with GDAL in Python also feels a lot like working in C++ rather than pure Python. For example, the naming conventions in OGR differ from Python's, with uppercase rather than lowercase function names. These differences explain the choice for some of the other Python libraries covered in this post, such as Rasterio and Shapely, which were written from a Python developer's perspective while offering the same GDAL functionality.

GDAL is a massive and widely used data library for raster data. It supports the reading and writing of many raster file formats, with the latest version supporting up to 200 different file formats. Because of this, it is indispensable for geospatial data management and analysis. Used together with other Python libraries, GDAL enables some powerful remote sensing functionality. It's also an industry standard and is present in commercial and open source GIS software.
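To give a flavor of that C-style API, here is a minimal sketch of opening a raster with the GDAL Python bindings; the file name is a placeholder, and the snippet assumes GDAL has been installed (for example, through conda):

```python
# A minimal sketch: open a raster and inspect it with the GDAL bindings.
# "example.tif" is a hypothetical file; any GDAL-supported raster works.
from osgeo import gdal

dataset = gdal.Open("example.tif")
print(dataset.RasterXSize, dataset.RasterYSize)  # raster dimensions in pixels
print(dataset.GetProjection())                   # coordinate reference system

band = dataset.GetRasterBand(1)                  # bands are 1-indexed, C-style
data = band.ReadAsArray()                        # raster contents as a NumPy array
print(data.min(), data.max())
```

Note the uppercase method names: exactly the C++-flavored convention described above.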
The OGR library is used to read and write vector-format geospatial data, supporting many different formats. OGR uses a consistent model to manage many different vector data formats. You can use OGR for vector reprojection, vector data format conversion, vector attribute data filtering, and more. The GDAL/OGR libraries are not only useful for Python programmers but are also used by many GIS vendors and open source projects. The latest GDAL version at the time of writing is 2.2.4, which was released in March 2018.

GEOS

The Geometry Engine Open Source (GEOS) is the C/C++ port of a subset of the Java Topology Suite (JTS) and selected functions. GEOS aims to contain the complete functionality of JTS in C++. It can be compiled on many platforms, including Python. As you will see later on, the Shapely library uses functions from the GEOS library; in fact, there are many applications using GEOS, including PostGIS and QGIS. GeoDjango also uses GEOS, as well as GDAL, among other geospatial libraries. GEOS can also be compiled with GDAL, giving OGR all of its capabilities.

The JTS is an open source geospatial computational geometry library written in Java. It provides various functionalities, including a geometry model, geometric functions, spatial structures and algorithms, and I/O capabilities. Using GEOS, you have access to the following capabilities: geospatial functions (such as within and contains), geospatial operations (union, intersection, and many more), spatial indexing, Open Geospatial Consortium (OGC) well-known text (WKT) and well-known binary (WKB) input/output, the C and C++ APIs, and thread safety.

Shapely

Shapely is a Python package for the manipulation and analysis of planar features, using functions from the GEOS library (the engine of PostGIS) and a port of the JTS. Shapely is not concerned with data formats or coordinate systems but can be readily integrated with packages that are. Shapely only deals with analyzing geometries and offers no capabilities for reading and writing geospatial files. It was developed by Sean Gillies, who was also the person behind Fiona and Rasterio.

Shapely supports eight fundamental geometry types that are implemented as classes in the shapely.geometry module: points, multipoints, linestrings, multilinestrings, linearrings, multipolygons, polygons, and geometrycollections. Apart from representing these geometries, Shapely can be used to manipulate and analyze geometries through a number of methods and attributes.

Shapely offers largely the same classes and functions as OGR for dealing with geometries. The difference between Shapely and OGR is that Shapely has a more Pythonic and very intuitive interface, is better optimized, and has well-developed documentation. With Shapely, you're writing pure Python, whereas with GEOS, you're writing C++ in Python. For data munging, a term used for data management and analysis, you're better off writing in pure Python rather than C++, which explains why these libraries were created. For more information on Shapely, consult the documentation, which also has detailed information on installing Shapely for different platforms and on building Shapely from source for compatibility with other modules that depend on GEOS (installing Shapely may require you to upgrade NumPy and GEOS if these are already installed).
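A minimal sketch of that Pythonic interface (assuming Shapely is installed): constructing geometries and querying spatial relationships, with no file I/O involved.

```python
# A minimal sketch of Shapely's geometry classes and spatial predicates.
from shapely.geometry import Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
centre = Point(1, 1)

print(square.contains(centre))                       # True: a geospatial predicate
print(square.area, square.length)                    # 4.0 and 8.0
print(square.intersection(centre.buffer(0.5)).area)  # area of the overlap
```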
Fiona

Fiona is a Python API built on OGR. It can be used for reading and writing data formats. The main reason for using it instead of OGR directly is that it is closer to Python than OGR, as well as more dependable and less error-prone. It makes use of two representations, WKT and WKB, for spatial information in vector data. As such, it combines well with other Python libraries such as Shapely: you would use Fiona for input and output, and Shapely for creating and manipulating geospatial data.

While Fiona is Python compatible and our recommendation, users should also be aware of some of the trade-offs. It is more dependable than OGR because it uses Python objects for copying vector data instead of C pointers, but this also means that it uses more memory, which affects performance.

Python Shapefile Library (pyshp)

The Python Shapefile Library (pyshp) is a pure Python library used to read and write shapefiles. The pyshp library's sole purpose is to work with shapefiles, and it only uses the Python standard library. You cannot use it for geometric operations. If you're only working with shapefiles, this one-file-only library is simpler than using GDAL.

pyproj

pyproj is a Python package that performs cartographic transformations and geodetic computations. It is a Cython wrapper providing Python interfaces to PROJ.4 functions, meaning you can access an existing library of C code from Python. PROJ.4 is a projection library that transforms data among many coordinate systems and is also available through GDAL and OGR. The reason PROJ.4 is still popular and widely used is twofold:

- Firstly, because it supports so many different coordinate systems
- Secondly, because of the routes it provides to that functionality: Rasterio and GeoPandas, two Python libraries covered next, both use pyproj, and thus PROJ.4, under the hood

Using PROJ.4 directly, rather than through a package such as GDAL, lets you re-project individual points, something that packages wrapping PROJ.4 do not offer. The pyproj package offers two classes: the Proj class, which performs cartographic computations, and the Geod class, which performs geodetic computations.
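As a minimal sketch of those two classes (assuming pyproj is installed; the coordinates are arbitrary, and the init= keyword is the older pyproj calling convention, with newer versions also accepting Proj("epsg:3857")):

```python
# A minimal sketch of pyproj's Proj and Geod classes.
from pyproj import Proj, Geod

# Proj: project a longitude/latitude pair into Web Mercator (EPSG:3857).
merc = Proj(init="epsg:3857")      # older-style constructor; see note above
x, y = merc(-105.0, 40.0)          # lon, lat -> easting, northing in metres
print(x, y)

# Geod: geodetic distance between two points on the WGS84 ellipsoid.
geod = Geod(ellps="WGS84")
_, _, distance_m = geod.inv(-105.0, 40.0, -104.0, 41.0)
print(distance_m)                  # distance in metres
```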
Rasterio

Rasterio is a GDAL- and NumPy-based Python library for raster data, written with the Python developer in mind rather than C, using Python language types, protocols, and idioms. Rasterio aims to make GIS data more accessible to Python programmers and helps GIS analysts learn important Python standards. Rasterio relies on concepts of Python rather than GIS.

Rasterio is an open source project from the satellite team of Mapbox, a provider of custom online maps for websites and applications. The name of this library should be pronounced raster-i-o rather than ras-te-rio. Rasterio came into being as a result of a project called the Mapbox Cloudless Atlas, which aimed to create a pretty-looking basemap from satellite imagery. One of the software requirements was to use open source software and a high-level language with handy multi-dimensional array syntax. Although GDAL offers proven algorithms and drivers, developing with GDAL's Python bindings feels a lot like C++.

Therefore, Rasterio was designed to be a Python package at the top, with extension modules (using Cython) in the middle, and a GDAL shared library at the bottom. Other requirements for the raster library were the ability to read and write NumPy ndarrays to and from data files, and the use of Python types, protocols, and idioms instead of C or C++, to free programmers from having to code in two languages. For georeferencing, Rasterio follows the lead of pyproj. There are a couple of capabilities added on top of reading and writing, one of them being a features module. Reprojection of geospatial data can be done with the rasterio.warp module. Rasterio's project homepage can be found on GitHub.

GeoPandas

GeoPandas is a Python library for working with vector data. It is based on the pandas library that is part of the SciPy stack. SciPy is a popular library for data inspection and analysis, but unfortunately, it cannot read spatial data. GeoPandas was created to fill this gap, taking pandas data objects as a starting point. The library also adds functionality from geographical Python packages.

GeoPandas offers two data objects: a GeoSeries object, based on a pandas Series object, and a GeoDataFrame, based on a pandas DataFrame object but adding a geometry column for each row. Both GeoSeries and GeoDataFrame objects can be used for spatial data processing, similar to spatial databases. Read and write functionality is provided for almost every vector data format. Also, because both objects are subclasses of pandas data objects, you can use the same properties to select or subset data, for example .loc or .iloc.

GeoPandas is a library that employs the capabilities of newer tools, such as Jupyter Notebooks, pretty well. Whereas GDAL enables you to interact with individual data records inside vector and raster datasets through Python code, GeoPandas takes a more visual approach by loading all records into a GeoDataFrame so that you can see them all together on your screen. The same goes for plotting data. These functionalities were lacking in Python 2, when developers were dependent on IDEs without extensive data visualization capabilities, which are now available with Jupyter Notebooks.

We've provided an overview of the most important open source packages for processing and analyzing geospatial data. The question then becomes when to use a certain package and why. GDAL, OGR, and GEOS are indispensable for geospatial processing and analysis, but they were not written in Python, and so they require Python bindings for Python developers. Fiona, Shapely, and pyproj were written to solve these problems, as was the newer Rasterio library. For a more Pythonic approach, these newer packages are preferable to the older C++ packages with Python bindings (although they're used under the hood).

Now that you have an idea of what options are available for a certain use case and why one package is preferable over another, here's something you should always remember: as is often the way in programming, there may be multiple solutions for one particular problem. For example, when dealing with shapefiles, you could use pyshp, GDAL, Shapely, or GeoPandas, depending on your preference and the problem at hand.

Related reading:
- Introduction to Data Analysis and Libraries
- 15 Useful Python Libraries to make your Data Science tasks Easier
- "Pandas is an effective tool to explore and analyze data": An interview with Theodore Petrou
- Using R to implement Kriging - A Spatial Interpolation technique for Geostatistics data


Introducing Dask: The library that makes scalable analytics in Python easier

Amey Varangaonkar
22 May 2018
6 min read
Python's rise as the preferred language of choice in data science is unprecedented, but not really unexpected. Apart from being a general-purpose language that can be used for a variety of tasks, from scripting to networking, Python offers a rich suite of libraries for general data science tasks such as scientific computing, data visualization, and more. However, one big challenge faced by data scientists is that these packages are not designed for scale. This is crucial in today's Big Data era, where tons of data needs to be processed and analyzed on the go. A platform that supports the existing Python ecosystem and allows it to scale across multiple machines and clusters without affecting performance was conspicuously missing. Enter Dask.

What is Dask?

Dask is a flexible parallel computing library written in Python for analytics, designed mainly to offer scalability and enhanced power to existing packages and libraries. It allows users to integrate their existing Python-based projects written with popular libraries such as NumPy, SciPy, pandas, and more. (An architecture diagram accompanied the original article; image courtesy: Slideshare.) The two key components of Dask that interact with the Python libraries are:

- Dynamic task schedulers, which take care of the intensive computational workloads
- "Big Data" Dask collections, consisting of dataframes, parallel arrays, and interfaces that allow the computations to run on distributed environments (see the sketch after this list)
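A minimal sketch of the collections interface (assuming Dask is installed; the file pattern and column names are hypothetical) shows how closely it mirrors pandas:

```python
# A minimal sketch of a Dask dataframe: pandas-like syntax on top of a
# lazily built task graph that the scheduler executes on compute().
import dask.dataframe as dd

df = dd.read_csv("sales-2018-*.csv")        # hypothetical files, read lazily
monthly = df.groupby("month").amount.sum()  # no work has happened yet
print(monthly.compute())                    # the scheduler runs the task graph
```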
Why use Dask?

Given that there are already quite a few distributed platforms for large-scale data processing, such as Apache Spark, Apache Storm, and Flink, why and when should one go for Dask? What are the advantages offered by this Python library? Let us take a look at the four major reasons to prefer Dask for distributed, scalable analytics in Python:

- Easy to get started: If you are an existing Python user, you must have already worked with popular Python packages such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Dask offers a similar, intuitive interface, and since it is a part of the bigger Python ecosystem, getting started with Dask is very easy. It uses existing Python APIs to switch between the popular packages and their Dask equivalents, so you don't have to spend a lot of time porting code. For absolute beginners, using Dask for scalable analytics is an easy and logical option to pursue once they have grasped the fundamentals of Python and the associated libraries.

- Scales up and down quite easily: You can run your project on Dask on a single machine, or on a cluster with thousands of cores, without essentially affecting the speed and performance of your code. Dask uses the multi-core CPUs within a single system optimally to process hundreds of terabytes of data without the need for additional hardware. Similarly, for moderate to large datasets spanning 100+ gigabytes, which often don't fit into a single storage device, the computing power of clusters can be coupled with Dask for effective analytics.

- Supports complex applications: Many companies tend to tackle complex computations by introducing custom code that runs on popular Big Data tools such as Hadoop MapReduce and Apache Spark. However, with the help of Dask's dynamic task scheduler, it is now possible to run and process complex applications without introducing additional code. Dask is solely responsible for the smooth handling of tasks such as network communication, load balancing, and diagnostics, among others.

- Clear, responsive, real-time feedback: One of the most important features of Dask is its user-friendliness. Dask provides a real-time dashboard that highlights the key metrics of the processing task undertaken by the user, such as the current progress of your project, memory consumption, and more. It also offers an in-built IPython kernel that allows the user to investigate the ongoing computation from a terminal.

How Dask compares with Apache Spark

Apache Spark is one of the most popular and widely used Big Data tools for distributed data processing and analytics. Dask and Apache Spark have many features in common, prompting us and many other developers to ask the question: which tool is better? While Spark has been around for quite some time and has many standard, stable features built over years of development, Dask is quite new and is still being improved as a tool. We summarize the important differences between Dask and Apache Spark in the table below:

| Criteria | Apache Spark | Dask |
| --- | --- | --- |
| Primary language | Scala | Python |
| Scale | Supports a single node to thousands of nodes in a cluster | Supports a single node to thousands of nodes in a cluster |
| Ecosystem | All-in-one, self-sufficient ecosystem | Integration with popular libraries within the Python ecosystem |
| Flexibility | Low | High |
| Stream processing | Built-in module, Spark Streaming | Real-time interface that is fairly low-level and requires more work than Apache Spark |
| Graph processing | Possible with the GraphX module | Not possible |
| Machine learning | Uses the Spark MLlib module | Integrates with scikit-learn and XGBoost |
| Popularity | Very high; a commonly used tool in the Big Data ecosystem | Fairly new, but has already found its place in the pandas, scikit-learn, and Jupyter stack |

You can read a detailed comparison of Apache Spark and Dask on the official Dask documentation page.

What we can expect from Dask

As we saw from the comparison above, it is fairly easy to port an existing Python project that uses high-profile Python libraries such as NumPy, scikit-learn, and more. Python developers and data scientists will appreciate the high flexibility and complex computational capabilities offered by Dask. The limited stream processing and graph processing features are big areas for improvement, but we can expect developments in this domain in the near future.

Even though Dask is still relatively new, it looks very promising due to its close affinity with the Python ecosystem. With Python's clout rising, many people would prefer a Python-based data processing tool that works at scale without having to switch to an external Big Data framework. Dask may well be the superhero that comes to developers' rescue in such cases. You can learn more about the latest developments in Dask on the official GitHub page.

Related reading:
- Is Apache Spark today's Hadoop?
- Apache Spark 2.3 now has native Kubernetes support!
- Should you move to Python 3? 7 Python experts' opinions


Facebook’s Wit.ai: Why we need yet another chatbot development framework?

Sunith Shetty
21 May 2018
4 min read
Chatbots are remarkably changing the way customer service is provided in a variety of industries. For every organization, customer satisfaction plays a very important role: customers expect businesses to be reachable at any time and to respond to their queries 24x7. With growing artificial intelligence advances in smart devices and IoT, chatbots are becoming a necessity for communicating with customers in real time.

There are many existing vendors, such as Google, Microsoft, Amazon, and IBM, with the required models and services to build conversational interfaces for applications and devices. But the chatbot industry is evolving, and even minor improvements in the UI, in the algorithms that work behind the scenes, or in the data used for training can mean a major win. With complete backing by the Facebook team, we can expect Wit.ai to create new, simplified ways to ease speech recognition and voice interfaces for developers. Wit.ai has excellent support for NLP, making it one of the popular bot frameworks in the market. The key to chatbot success is continuous learning that lets bots leverage relevant data to connect with clearly defined customers; this is what makes Wit.ai extra special.

What is Wit.ai?

Wit.ai is an open and extensible NLP engine for developers, acquired by Facebook, which allows you to build conversational applications and devices that you can talk or text to. It provides an easy interface and quick-learning APIs to understand human communication from every interaction, and it helps parse complex messages (either voice or text) into structured data. It also helps you predict forthcoming events based on what it learns from the gathered data.

Why Wit.ai?

- It is one of the most powerful APIs for understanding natural language.
- It is a free SaaS platform that provides services for developers to build a chatbot for their app or device.
- It has story support, allowing you to visualize the user experience.
- A new built-in NLP integration with the Page inbox allows page admins to create a Wit app with ease. Further, by using anonymized samples from past messages, the bot provides automated responses to the most common requests.
- You can create efficient and powerful text- or voice-based conversational bots that humans can chat with. In addition to business bots, these APIs can be used to build hands-free voice interfaces for mobile phones, wearable devices, home automation products, and more.
- It can be used in platforms that learn new commands semantically similar to those input by the developer.
- It provides a developer GUI, which includes a visual representation of conversation flows, business logic invocations, context variables, jumps, and branching logic.
- Programming language and integration support: Node.js client, Python client, Ruby client, and an HTTP API (see the sketch after this list).
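As an illustration of the HTTP API route, here is a minimal sketch using Python's requests library; the token is a placeholder, and the exact response fields depend on how your Wit app has been trained:

```python
# A minimal sketch of querying the Wit.ai HTTP API: send a sentence,
# get back the parsed meaning (intents/entities) as JSON.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder, from your Wit app settings

response = requests.get(
    "https://api.wit.ai/message",
    params={"q": "Set an alarm tomorrow at 7am"},
    headers={"Authorization": f"Bearer {WIT_TOKEN}"},
)
print(response.json())  # structured data extracted from the message
```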
Challenges in Wit.ai

- Wit.ai doesn't support third-party integration tools.
- Wit.ai has no required slot/parameter feature, so you will have to invoke business logic every time there is an interaction with the user in order to gather any missing information not spoken by the user.
- Training the engine can take some time, depending on the task performed. As the number of stories increases, the Wit engine becomes slower.

However, existing Wit.ai adoption looks very promising, with more than 160,000 members in the community contributing on GitHub. For complete coverage of tutorials, documentation, and client APIs, you can visit the GitHub page to see a list of repositories.

Related reading:
- My friend, the robot: Artificial Intelligence needs Emotional Intelligence
- Snips open sources Snips NLU, its Natural Language Understanding engine
- What can Google Duplex do for businesses?


Top 5 tools for reinforcement learning

Pravin Dhandre
21 May 2018
4 min read
After deep learning, reinforcement learning (RL) is the hottest branch of artificial intelligence, and it is finding speedy adoption in tech-driven companies. Simply put, reinforcement learning is all about algorithms that track previous actions or behaviour and provide optimized decisions using the trial-and-error principle. Read How Reinforcement Learning works to know more.

It might sound theoretical, but gigantic firms like Google and Uber have tested out this exceptional mechanism and have been highly successful in cutting-edge applied robotics fields such as self-driving vehicles. Other top giants, including Amazon, Facebook, and Microsoft, have centralized their innovations around deep reinforcement learning across automotive, supply chain, networking, finance, and robotics. With such humongous achievements, reinforcement learning libraries have caught the AI developer community's eye and have gained prime interest for training agents and reinforcing the behavior of trained agents. In fact, researchers believe in the tremendous potential of reinforcement learning to address unsolved real-world challenges like material discovery, space exploration, and drug discovery, and to build much smarter artificial intelligence solutions. In this article, we will have a look at the most promising open source tools and libraries to start building your reinforcement learning projects on.

OpenAI Gym

OpenAI Gym, the most popular environment for developing and comparing reinforcement learning models, is completely compatible with high-computation libraries like TensorFlow. The Python-based, rich AI simulation environment offers support for training agents on classic games like Atari, as well as for other branches of science, like robotics and physics, through the Gazebo and MuJoCo simulators. The Gym environment also offers APIs that facilitate feeding observations, along with rewards, back to agents. OpenAI has also recently released a new platform, Gym Retro, made up of 58 varied and specific scenarios from the Sonic the Hedgehog, Sonic the Hedgehog 2, and Sonic 3 games. Reinforcement learning enthusiasts and AI game developers can register for this competition. Read: How to build a cartpole game using OpenAI Gym
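A minimal sketch of that observation/reward loop (assuming the gym package is installed; the agent here just samples random actions rather than learning):

```python
# A minimal sketch of the Gym interaction loop on the CartPole task.
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
for step in range(200):
    action = env.action_space.sample()                  # random, not a learned policy
    observation, reward, done, info = env.step(action)  # environment feeds back a reward
    if done:                                            # pole fell or time limit hit
        observation = env.reset()
env.close()
```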
TensorFlow

TensorFlow is another well-known open source library, by Google, followed by more than 95,000 developers every day in areas of natural language processing, intelligent chatbots, robotics, and more. The TensorFlow community has developed an extended version called TensorLayer, providing popular RL modules that can be easily customized and assembled for tackling real-world machine learning challenges. The TensorFlow community allows for framework development in the most popular languages, such as Python, C, Java, JavaScript, and Go. Google and its TensorFlow team are in the process of coming up with a Swift-compatible version to enable machine learning on Apple platforms. Read How to implement Reinforcement Learning with TensorFlow.

Keras

Keras presents simplicity in implementing neural networks with just a few lines of code and faster execution. It provides senior developers and principal scientists with a high-level interface to TensorFlow, a heavy tensor computation framework, and centralizes on the model architecture. So, if you have any existing RL models written in TensorFlow, just pick the Keras framework and you can transfer the learning to the related machine learning problem.

DeepMind Lab

DeepMind Lab is a Google 3D platform with customization for agent-based AI research. It is utilized to understand how self-sufficient artificial agents learn complicated tasks in large, partially observed environments. With the victory of its AlphaGo program against professional Go players in early 2016, DeepMind captured the public's attention. With its three hubs spread across London, Canada, and France, the DeepMind team is focusing on core AI fundamentals, which includes building a single AI system backed by state-of-the-art methods and distributional reinforcement learning. To know more about how DeepMind Lab works, read How Google's DeepMind is creating images with artificial intelligence.

PyTorch

PyTorch, open sourced by Facebook, is another well-known deep learning library adopted by many reinforcement learning researchers. It was recently preferred almost unanimously by the top 10 finishers in a Kaggle competition. With dynamic neural networks and strong GPU acceleration, RL practitioners use it extensively to conduct experiments on implementing policy-based agents and to create new adventures. One notable research project is Playing GridWorld, where PyTorch demonstrated its capabilities with renowned RL algorithms like policy gradient and a simplified actor-critic method.

Summing It Up

There you have it: the top tools and libraries for reinforcement learning. The list doesn't end here, as there is a lot of work happening in developing platforms and libraries for scaling reinforcement learning. Frameworks like RL4J and RLlib are already in development and will very soon be fully available for developers to simulate their models in their preferred coding language.


30 common data science terms explained

Aarthi Kumaraswamy
16 May 2018
27 min read
Let's begin at the beginning. What do terms like statistical population, statistical comparison, and statistical inference mean? What good are munging, coding, boosting, regularization, and the rest? On a scale of 1 to 30 (1 being the lowest and 30 the highest), rate yourself as a data scientist. No matter what you have scored yourself, we hope to have improved that score at least a little by the end of this post.

Let's start with a basic question: What is data science?

[The following is an excerpt from the book Statistics for Data Science, written by James D. Miller and published by Packt Publishing.]

How data science is defined is a matter of opinion. I personally like the explanation that data science is a progression or, even better, an evolution of thought or steps. Although a progression or evolution implies a sequential journey, in practice this is an extremely fluid process; each of the phases may inspire the data scientist to reverse and repeat one or more of the phases until they are satisfied. In other words, all or some phases of the process may be repeated until the data scientist determines that the desired outcome is reached. Depending on your sources and individual beliefs, you may even say: statistics is data science, and data science is statistics.

Based upon personal experience, research, and various industry experts' advice, someone delving into the art of data science should take every opportunity to understand and gain experience as well as proficiency with the following list of common data science terms:

- Statistical population
- Probability
- False positives
- Statistical inference
- Regression
- Fitting
- Categorical data
- Classification
- Clustering
- Statistical comparison
- Coding
- Distributions
- Data mining
- Decision trees
- Machine learning
- Munging and wrangling
- Visualization
- D3
- Regularization
- Assessment
- Cross-validation
- Neural networks
- Boosting
- Lift
- Mode
- Outlier
- Predictive modeling
- Big data
- Confidence interval
- Writing

Statistical population

You can perhaps think of a statistical population as a recordset (or a set of records). This set or group of records will be of similar items or events that are of interest to the data scientist for some experiment. For a data developer, a population of data may be a recordset of all sales transactions for a month, where the interest might be reporting to the senior management of an organization which products are the fastest sellers at which time of the year. For a data scientist, a population may be a recordset of all emergency room admissions during a month, where the area of interest might be determining the statistical demographics of emergency room use.

Note: Typically, the terms statistical population and statistical model are, or can be, used interchangeably. Once again, data scientists continue to evolve in their alignment on the use of common terms.

Another key point concerning statistical populations is that the recordset may be a group of (actually) existing objects or a hypothetical group of objects. Using the preceding example, the actual objects would be the actual sales transactions recorded for the month, while the hypothetical objects would be the sales transactions expected, forecast, or presumed (based upon observations, experienced assumptions, or other logic) to occur during a month.
Finally, through the use of statistical inference, the data scientist can select a portion or subset of the recordset (or population) with the intention that it will represent the total population for a particular area of interest. This subset is known as a statistical sample. If a sample of a population is chosen accurately, the characteristics of the entire population (that the sample is drawn from) can be estimated from the corresponding characteristics of the sample.

Probability

"Probability is concerned with the laws governing random events." - www.britannica.com

When thinking of probability, you think of possible upcoming events and the likelihood of them actually occurring. This compares to a statistical thought process that involves analyzing the frequency of past events in an attempt to explain or make sense of the observations. In addition, the data scientist will associate various individual events, studying the relationships between them. How these different events relate to each other governs the methods and rules that need to be followed when studying their probabilities.

Note: A probability distribution is a table that is used to show the probabilities of various outcomes in a sample population or recordset.

False positives

The idea of false positives is a very important statistical (data science) concept. A false positive is a mistake or an errored result: a scenario where the results of a process or experiment indicate a fulfilled or true condition when, in fact, the condition is not true (not fulfilled). This situation is also referred to by some data scientists as a false alarm, and it is most easily understood by considering the idea of a recordset or statistical population (which we discussed earlier in this section) whose quality is determined not only by the accuracy of the processing but by the characteristics of the sampled population. In other words, either the data scientist has made errors during the statistical process, or the recordset is a population that does not have appropriate characteristics (or an appropriate sample) for what is being investigated.

Statistical inference

What developer, at some point in his or her career, hasn't had to create sample or test data? For example, I've often created a simple script to generate a random number (based upon the number of possible options or choices) and then used that number as the selected option (in my test recordset). This might work well for data development, but with statistics and data science, it is not sufficient.

To create sample data (or a sample population), the data scientist will use a process called statistical inference, which is the process of deducing properties of an underlying distribution through analysis of the data you have or are trying to generate. The process is sometimes called inferential statistical analysis and includes testing various hypotheses and deriving estimates. When the data scientist determines that a recordset (or population) should be larger than it actually is, it is assumed that the recordset is a sample from a larger population, and the data scientist will then utilize statistical inference to make up the difference.

Note: The data or recordset in use is referred to by the data scientist as the observed data. Inferential statistics can be contrasted with descriptive statistics, which is only concerned with the properties of the observed data and does not assume that the recordset came from a larger population.
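To make the sample-versus-population idea concrete, here is a minimal sketch (hypothetical numbers, using only NumPy) that estimates a population mean from a random sample:

```python
# A minimal sketch of statistical inference: estimate a population
# characteristic (the mean) from a randomly drawn sample.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=50, scale=10, size=100_000)   # hypothetical recordset

sample = rng.choice(population, size=500, replace=False)  # the statistical sample
print(population.mean())  # true population mean (close to 50)
print(sample.mean())      # sample estimate, close if sampling is accurate
```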
Regression

Regression is a process or method (selected by the data scientist as the best-fit technique for the experiment at hand) used for determining the relationships among variables. If you're a programmer, you have a certain understanding of what a variable is, but in statistics, we use the term differently. Variables are determined to be either dependent or independent.

An independent variable (also known as a predictor) is one that is manipulated by the data scientist in an effort to determine its relationship with a dependent variable. A dependent variable is a variable that the data scientist is measuring.

Note: It is not uncommon to have more than one independent variable in a data science progression or experiment.

More precisely, regression is the process that helps the data scientist comprehend how the typical value of the dependent variable (or criterion variable) changes when any one or more of the independent variables is varied while the other independent variables are held fixed.
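A minimal sketch of this idea with scikit-learn (hypothetical data: one independent variable predicting a dependent one):

```python
# A minimal sketch of fitting a simple linear regression:
# how does the dependent variable change as the predictor varies?
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # independent variable (predictor)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # dependent variable (measured)

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # estimated slope and intercept
print(model.predict([[6]]))               # predicted value for a new input
```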
Fitting

Fitting is the process of measuring how well a statistical model or process describes a data scientist's observations pertaining to a recordset or experiment. Such measures attempt to point out the discrepancy between observed values and probable values. The probable values of a model or process are known as a distribution or a probability distribution. A probability distribution fitting (or distribution fitting), therefore, is when the data scientist fits a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. The objective of a data scientist performing a distribution fitting is to predict the probability, or to forecast the frequency, of the occurrence of the phenomenon at a certain interval.

Note: One of the most common uses of fitting is to test whether two samples are drawn from identical distributions.

There are numerous probability distributions a data scientist can select from. Some will fit the observed frequency of the data better than others. The distribution giving a close fit is supposed to lead to good predictions; therefore, the data scientist needs to select a distribution that suits the data well.

Categorical data

Earlier, we explained how variables in your data can be either independent or dependent. Another type of variable definition is a categorical variable. This type of variable can take on one of a limited, and typically fixed, number of possible values, assigning each individual to a particular category.

Often, the collected data's meaning is unclear. Categorical data is a method that a data scientist can use to put meaning into the data. For example, if a numeric variable is collected (let's say the values found are 4, 10, and 12), the meaning of the variable becomes clear if the values are categorized. Suppose that, based upon an analysis of how the data was collected, we can group (or categorize) the data by indicating that it describes university students, with the following numbers of players:

- 4 tennis players
- 10 soccer players
- 12 football players

Now, because we grouped the data into categories, the meaning becomes clear. Some other examples of categorized data might be individual pet preferences (grouped by the type of pet) or vehicle ownership (grouped by the style of car owned), and so on. So, categorical data, as the name suggests, is data grouped into some sort of category or multiple categories. Some data scientists refer to categories as sub-populations of data.

Note: Categorical data can also be data that is collected as a yes or no answer. For example, hospital admittance data may indicate that patients either smoke or do not smoke.

Classification

Statistical classification of data is the process of identifying which category (discussed in the previous section) a data point, observation, or variable should be grouped into. The data science process that carries out a classification is known as a classifier. Read this post: Classification using Convolutional Neural Networks.

Note: Determining whether a book is fiction or non-fiction is a simple classification example. An analysis of data about restaurants might lead to classifying them among several genres.

Clustering

Clustering is the process of dividing data occurrences into groups, or homogeneous subsets, of the dataset: not a predetermined set of groups as in classification (described in the preceding section), but groups identified by the execution of the data science process based upon similarities that it found among the occurrences. Objects in the same group (a group is also referred to as a cluster) are found to be more analogous (in some sense or another) to each other than to objects found in other groups (or other clusters). The process of clustering is very common in exploratory data mining and is also a common technique for statistical data analysis.
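A minimal sketch of clustering with scikit-learn's k-means implementation (hypothetical two-dimensional points; no categories are given in advance):

```python
# A minimal sketch of clustering: k-means discovers the groups itself,
# unlike classification, where the categories are fixed beforehand.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],   # one blob of points
                   [8, 8], [8.3, 7.9], [7.8, 8.2]])  # another blob

kmeans = KMeans(n_clusters=2, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # discovered group centres
```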
Statistical comparison

Simply put, when you hear the term statistical comparison, one is usually referring to the act of a data scientist performing a process of analysis to view the similarities or variances of two or more groups or populations (or recordsets). As a data developer, one might be familiar with various utilities such as FC Compare, UltraCompare, or WinDiff, which aim to provide the developer with a line-by-line comparison of the contents of two or more (even binary) files. In statistics (data science), this process of comparing is a statistical technique for comparing populations or recordsets. In this method, a data scientist will conduct what is called an Analysis of Variance (ANOVA), compare categorical variables (within the recordsets), and so on.

Note: ANOVA is an assortment of statistical methods used to analyze the differences among group means and their associated procedures (such as variations among and between groups, populations, or recordsets). This method eventually evolved into the Six Sigma dataset comparisons.

Coding

Coding, or statistical coding, is again a process that a data scientist will use to prepare data for analysis. In this process, both quantitative data values (such as income or years of education) and qualitative data (such as race or gender) are categorized or coded in a consistent way. Coding is performed by a data scientist for various reasons, such as:

- It is more effective for running statistical models.
- Computers understand the variables.
- Accountability: the data scientist can run models blind, or without knowing what the variables stand for, to reduce programming/author bias.

You can imagine the process of coding as the means of transforming data into a form required for a system or application.

Distributions

The distribution of a statistical recordset (or of a population) is a visualization showing all the possible values (sometimes referred to as intervals) of the data and how often they occur. When a distribution of categorical data (which we defined earlier in this post) is created by a data scientist, it attempts to show the number or percentage of individuals in each group or category. Linking an earlier defined term with this one: a probability distribution, stated in simple terms, can be thought of as a visualization showing the probability of occurrence of the different possible outcomes in an experiment.

Data mining

With data mining, one is usually more absorbed in the data relationships (or the potential relationships between points of data, sometimes referred to as variables) and cognitive analysis. To further define this term, we can say that data mining is sometimes more simply referred to as knowledge discovery, or even just discovery: processing or analyzing data from new or different viewpoints and summarizing it into valuable insights that can be used to increase revenue, cut costs, or both.

Using software dedicated to data mining is just one of several analytical approaches. Although there are tools dedicated to this purpose (such as IBM Cognos BI and Planning Analytics, Tableau, SAS, and so on), data mining is all about the analysis process of finding correlations or patterns among dozens of fields in the data, and that can be effectively accomplished using tools such as MS Excel or any number of open source technologies.

Note: A common approach to data mining is the creation of custom scripts using tools such as R or Python. In this way, the data scientist has the ability to customize the logic and processing to their exact project needs.

Decision trees

A statistical decision tree uses a diagram that looks like a tree. This structure attempts to represent optional decision paths and a predicted outcome for each path selected. A data scientist will use a decision tree to support, track, and model decision making and its possible consequences, including chance event outcomes, resource costs, and utility. It is a common way to display the logic of a data science process.
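A minimal sketch of a decision tree in scikit-learn (hypothetical toy data), where the fitted tree encodes the decision paths just described:

```python
# A minimal sketch of a decision tree classifier: learn decision paths
# from labelled examples, then print the tree's branching logic.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 0], [45, 1], [35, 1], [22, 0], [52, 1]]  # e.g. [age, owns_home]
y = [0, 1, 1, 0, 1]                                # hypothetical outcome labels

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "owns_home"]))
print(tree.predict([[30, 1]]))                     # follow the paths for a new case
```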
Machine learning

Machine learning is one of the most intriguing and exciting areas of data science. It conjures all manner of images around artificial intelligence, including neural networks, Support Vector Machines (SVMs), and so on. Fundamentally, we can describe machine learning as a method of training a computer to make or improve predictions or behaviors based on data or, specifically, on relationships within that data. Machine learning is a process by which predictions are made based upon recognized patterns identified within data; additionally, it is the ability to continuously learn from the data's patterns, thereby continually making better predictions.

It is not uncommon for someone to mistake machine learning for data mining, but data mining focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can be used to learn and establish baseline behavioral profiles for various entities and then to find meaningful anomalies. Here is the exciting part: the process of machine learning (using data relationships to make predictions) is known as predictive analytics. Predictive analytics allows data scientists to produce reliable, repeatable decisions and results, and to uncover hidden insights by learning from historical relationships and trends in the data.

Munging and wrangling

The terms munging and wrangling are buzzwords, or jargon, meant to describe one's efforts to affect the format of data, a recordset, or a file in some way, in an effort to prepare the data for further processing and/or evaluation. With data development, you are most likely familiar with the idea of Extract, Transform, and Load (ETL); in somewhat the same way, a data developer may mung or wrangle data during the transformation steps within an ETL process.

Common munging and wrangling may include removing punctuation or HTML tags, data parsing, filtering, all sorts of transforming, mapping, and tying together systems and interfaces that were not specifically designed to interoperate. Munging can also describe the processing or filtering of raw data into another form, allowing for more convenient consumption of the data elsewhere. Munging and wrangling might be performed multiple times within a data science process and/or at different steps in the evolving process. Sometimes, data scientists use munging to include various data visualization, data aggregation, and statistical model training tasks, as well as much other potential work. To this point, munging and wrangling may follow a flow beginning with extracting the data in a raw form, performing the munging using various logic, and, lastly, placing the resulting content into a structure for use.

Although there are many valid options for munging and wrangling data, for preprocessing and manipulation, a tool that is popular with many data scientists today is a product named Trifacta, which claims to be the number one (data) wrangling solution in many industries.

Note: Trifacta can be downloaded for your personal evaluation from https://www.trifacta.com/. Check it out!
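As a minimal sketch of everyday munging with pandas (hypothetical messy records), covering markup removal, parsing, and filtering:

```python
# A minimal sketch of data munging: strip markup, parse types, filter.
import pandas as pd

raw = pd.DataFrame({
    "name":   ["<b>Alice</b>", "Bob ", None],
    "income": ["52,000", "48,500", "61,000"],
})

clean = raw.dropna(subset=["name"]).copy()  # drop incomplete rows
clean["name"] = clean["name"].str.replace(r"<[^>]+>", "", regex=True).str.strip()
clean["income"] = clean["income"].str.replace(",", "").astype(int)  # parse numbers
print(clean[clean["income"] > 50_000])      # filter, ready for analysis
```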
Visualization

The main point (although there are other goals and objectives) of leveraging a data visualization technique is to make something complex appear simple. You can think of visualization as any technique for creating a graphic (or similar) to communicate a message. Other motives for using data visualization include the following:

To explain the data or put the data in context (for example, to highlight demographic statistics)
To solve a specific problem (for example, identifying problem areas within a particular business model)
To explore the data to reach a better understanding or add clarity (for example, what period of time does this data span?)
To highlight or illustrate otherwise invisible data (such as isolating outliers residing in the data)
To predict (such as potential sales volumes, perhaps based upon seasonal sales statistics)
And others

Statistical visualization is used in almost every step of the data science process--not only the obvious steps such as exploring and visualizing, analyzing and learning--but it can also be leveraged during collecting, processing, and the end game of using the identified insights.

D3

D3, or D3.js, is essentially an open source JavaScript library designed with the intention of visualizing data using today's web standards. D3 helps put life into your data, utilizing Scalable Vector Graphics (SVG), Canvas, and standard HTML. D3 combines powerful visualization and interaction techniques with a data-driven approach to DOM manipulation, providing data scientists with the full capabilities of modern browsers and the freedom to design the right visual interface that best depicts the objective or assumption. In contrast to many other libraries, D3.js allows inordinate control over the visualization of data. D3 is embedded within an HTML webpage and uses pre-built JavaScript functions to select elements, create SVG objects, style them, or add transitions, dynamic effects, and so on.

Regularization

Regularization is one possible approach that a data scientist may use to improve the results generated from a statistical model or data science process, such as when addressing a case of overfitting in statistics and data science.

[box type="note" align="" class="" width=""]We defined fitting earlier (fitting describes how well a statistical model or process describes a data scientist's observations). Overfitting is a scenario where a statistical model or process seems to fit too well or appears to be too close to the actual data.[/box]

The path to overfitting often begins with an overly simple model--one where you may have only two variables and are drawing conclusions based on those two. For example, using our previously mentioned example of daffodil sales, one might generate a model with temperature as an independent variable and sales as a dependent one. You may see the model fail, since it is not as simple as concluding that warmer temperatures will always generate more sales. In this situation, there is a tendency to add more data to the process or model in hopes of achieving a better result. The idea sounds reasonable. For example, you have information such as average rainfall, pollen count, fertilizer sales, and so on; could these data points be added as explanatory variables?

[box type="note" align="" class="" width=""]An explanatory variable is a type of independent variable with a subtle difference. When a variable is independent, it is not affected at all by any other variables. When a variable isn't independent for certain, it's an explanatory variable.[/box]

Continuing to add more and more data to your model will have an effect, but it will probably cause overfitting, resulting in poor predictions, since the model will closely resemble the training data, much of which is just background noise. To overcome this situation, a data scientist can use regularization, introducing a tuning parameter (an additional factor such as a data point's mean value or a minimum or maximum limitation, which gives you the ability to change the complexity or smoothness of your model) into the data science process to solve an ill-posed problem or to prevent overfitting.
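Staying with the daffodil sales example, here is one possible sketch of regularization in Python, using ridge regression from scikit-learn; the library choice, the data, and the alpha value are our own assumptions, with alpha playing the role of the tuning parameter described above:

from sklearn.linear_model import LinearRegression, Ridge

# Invented data: [temperature, rainfall, pollen_count] -> daffodil sales
X = [[55, 0.2, 10], [60, 0.1, 30], [72, 0.0, 55], [48, 0.4, 5], [80, 0.0, 70]]
y = [120, 150, 210, 90, 240]

plain = LinearRegression().fit(X, y)      # may chase noise as variables grow
regularized = Ridge(alpha=1.0).fit(X, y)  # alpha is the tuning parameter

# Larger alpha shrinks coefficients toward zero, smoothing the model
print(plain.coef_, regularized.coef_)

Increasing alpha trades a little fit on the training data for a smoother, less noise-sensitive model.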
Assessment

When a data scientist evaluates a model or data science process for performance, this is referred to as assessment. Performance can be defined in several ways, including the model's growth of learning, the model's ability to improve with learning (to obtain a better score) given additional experience (for example, more rounds of training with additional samples of data), or the accuracy of its results.

One popular method of assessing a model's or process's performance is called bootstrap sampling. This method examines performance on certain subsets of data, repeatedly generating results that can be used to calculate an estimate of accuracy (performance). The bootstrap sampling method takes a random sample of data and splits it into three files--a training file, a testing file, and a validation file. The model or process logic is developed based on the data in the training file and then evaluated (or tested) using the testing file. This tune-and-then-test process is repeated until the data scientist is comfortable with the results of the tests. At that point, the model or process is tested once more, this time using the validation file, and the results should provide a true indication of how it will perform.

[box type="note" align="" class="" width=""]You can imagine using the bootstrap sampling method to develop program logic by analyzing test data to determine logic flows and then running (or testing) your logic against the test data file. Once you are satisfied that your logic handles all of the conditions and exceptions found in your testing data, you can run a final test on a new, never-before-seen data file for a final validation test.[/box]

Cross-validation

Cross-validation is a method for assessing the performance of a data science process. Mainly used with predictive modeling to estimate how accurately a model might perform in practice, cross-validation checks how a model will potentially generalize--in other words, how the model can apply what it infers from samples to an entire population (or recordset). With cross-validation, you identify a (known) dataset on which training is run, along with a dataset of unknown (or first-seen) data against which the model will be tested (this is known as your testing dataset). The objective is to ensure that problems such as overfitting (allowing non-inclusive information to influence results) are controlled, and also to provide insight into how the model will generalize to a real problem or a real data file.

The cross-validation process consists of separating the data into samples of similar subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Typically, a data scientist will use a model's stability to determine the actual number of rounds of cross-validation that should be performed.
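A quick sketch of the cross-validation rounds just described, again assuming scikit-learn and invented data (neither is prescribed by the chapter); cross_val_score handles the partitioning into folds and returns one score per round:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Invented data: two features per record, binary outcome
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [0, 0, 0, 1, 1, 1, 1, 0]

# 4 folds (rounds): train on three subsets, validate on the held-out one,
# then average the results to reduce variability
scores = cross_val_score(LogisticRegression(), X, y, cv=4)
print(scores, scores.mean())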
Neural networks

Neural networks are also called artificial neural networks (ANNs), and their objective is to solve problems in the same way that the human brain would. A Google search provides the following explanation of an ANN, as stated in Neural Network Primer: Part I, by Maureen Caudill, AI Expert, Feb. 1989:

[box type="note" align="" class="" width=""]A computing system made up of several simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.[/box]

To oversimplify the idea of neural networks, recall the concept of software encapsulation and consider a computer program with an input layer, a processing layer, and an output layer. With this thought in mind, understand that neural networks are also organized in a network of these layers, usually with more than a single processing layer. Patterns are presented to the network by way of the input layer, which then communicates with one (or more) of the processing layers (where the actual processing is done). The processing layers then link to an output layer, where the result is presented. Most neural networks also contain some form of learning rule that modifies the weights of the connections (in other words, the network learns which processing nodes perform better and gives them heavier weights) according to the input patterns it is presented with. In this way (in a sense), neural networks learn by example, as a child learns to recognize a cat by being exposed to examples of cats.

Boosting

In a manner of speaking, boosting is a process generally accepted in data science for improving the accuracy of a weakly learning data science process.

[box type="note" align="" class="" width=""]Data science processes defined as weak learners are those that produce results only slightly better than randomly guessing the outcome. Weak learners are basically thresholds or 1-level decision trees.[/box]

Specifically, boosting is aimed at reducing bias and variance in supervised learning. Before going further, let's take note of what we mean by bias and variance. Data scientists describe bias as a level of favoritism present in the data collection process, resulting in uneven, disingenuous results; it can occur in a variety of different ways. A sampling method is called biased if it systematically favors some outcomes over others. Variance may be defined (by a data scientist) simply as the distance from a variable's mean (or how far from the average a result is).

The boosting method can be described as a data scientist repeatedly running through a data science process (one that has been identified as a weak learning process), with each iteration running on different, random samples of data drawn from the original population recordset. All the results (or classifiers, or residue) produced by each run are then combined into a single merged result (that is, a gradient). This concept of using a random subset of the original recordset for each iteration originates from bootstrap sampling in bagging and has a similar variance-reducing effect on the combined model. In addition, some data scientists consider boosting a means to convert weak learners into strong ones; in fact, to some, the process of boosting simply means turning a weak learner into a strong learner.

Lift

In data science, the term lift compares the frequency of an observed pattern within a recordset or population with how frequently you might expect to see that same pattern occur in the data by chance or at random. If the lift is very low, a data scientist will typically expect that there is a very good probability that the identified pattern is occurring just by chance. The larger the lift, the more likely it is that the pattern is real.
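As a back-of-the-envelope illustration of lift (our own, with invented transaction data), the lift for two items A and B can be computed as P(A and B) divided by P(A) times P(B):

# Invented transaction data: does buying bread "lift" buying butter?
transactions = [
    {"bread", "butter"}, {"bread"}, {"butter", "milk"},
    {"bread", "butter", "milk"}, {"milk"}, {"bread", "butter"},
]

n = len(transactions)
p_bread = sum("bread" in t for t in transactions) / n
p_butter = sum("butter" in t for t in transactions) / n
p_both = sum({"bread", "butter"} <= t for t in transactions) / n

lift = p_both / (p_bread * p_butter)
print(lift)  # > 1 suggests the pattern occurs more often than chance alone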
Mode

In statistics and data science, when a data scientist uses the term mode, he or she refers to the value that occurs most often within a sample of data. The mode is not calculated by formula; it is determined manually or through processing of the data.

Outlier

Outliers can be defined as follows:

A data point that is way out of keeping with the others
That piece of data that doesn't fit
Either a very high value or a very low value
An unusual observation within the data
An observation point that is distant from all others

Predictive modeling

The development of statistical models and/or data science processes to predict future events is called predictive modeling.

Big Data

Again, we have several variations of the definition of big data. A large assemblage of data, datasets that are so large or complex that traditional data processing applications are inadequate, and data about every aspect of our lives have all been used to define or refer to big data. In 2001, then Gartner analyst Doug Laney introduced the 3Vs concept. The 3Vs, as per Laney, are volume, variety, and velocity, and they make up the dimensionality of big data: volume (the measurable amount of data), variety (the number of types of data), and velocity (the speed of processing or dealing with that data).

Confidence interval

The confidence interval is a range of values that a data scientist specifies around an estimate to indicate the margin of error, combined with a probability that a value will fall in that range. In other words, confidence intervals are good estimates of an unknown population parameter.

Writing

Although visualizations grab much more of the limelight when it comes to presenting the output or results of a data science process or predictive model, writing skills are not only an important part of how a data scientist communicates but are also considered an essential skill for all data scientists to be successful.

Did we miss any of your favorite terms? Now that you are at the end of this post, we ask you again: On a scale of 1 to 30 (1 being the lowest and 30, the highest), how do you rate yourself as a data scientist?

Why You Need to Know Statistics To Be a Good Data Scientist [interview]
How data scientists test hypotheses and probability
6 Key Areas to focus on while transitioning to a Data Scientist role
Soft skills every data scientist should teach their child
What can Google Duplex do for businesses?

Natasha Mathur
16 May 2018
9 min read
When talking about the capabilities of AI-driven digital assistants, the most talked-about issue is their inability to converse the way a real human does. The robotic tone of virtual assistants has long limited them from imitating real humans. And it's not just the flat monotone. It's about understanding the nuances of the language: pitch, intonation, sarcasm, and a lot more. Now, what if there emerged a technology capable of sounding and behaving almost human? Well, look no further - Google Duplex is here to dominate the world of digital assistants.

Google introduced the new Duplex at Google I/O 2018, their annual developer conference, last week. But what exactly is it? Google Duplex is a newly added feature to the famed Google Assistant. Adding to the capabilities of Google Assistant, it is able to make phone calls for users and imitate natural human conversation almost perfectly to get day-to-day tasks (such as booking table reservations, hair salon appointments, and so on) done in an easy manner. It includes pause-fillers and phrases such as "um", "uh-huh", and "erm" to make the conversation sound as natural as possible. Don't believe me? Check out the audio yourself!

[audio mp3="https://hub.packtpub.com/wp-content/uploads/2018/05/Google-Duplex-hair-salon.mp3"][/audio]  Google Duplex booking appointments at a hair salon

[audio mp3="https://hub.packtpub.com/wp-content/uploads/2018/05/Google-Duplex-table-reservation.mp3"][/audio]  Google Duplex making table reservations at a restaurant

The demo call recording of the assistant and the business employee, presented by Sundar Pichai, Google's CEO, during the opening keynote, befuddled the entire world about who's the assistant and who's the human, and the clip went noticeably viral. A lot of questions are buzzing around whether Google Duplex just passed the Turing Test. The Turing Test assesses a machine's ability to present intelligence closer or equivalent to that of a human being. Did the new human-sounding robot assistant pass the Turing Test yet? No, but it's certainly the voice AI that has come closest to passing it.

Now how does Google Duplex work?

It's quite simple. Google Duplex finds out the information you need that isn't out there on the internet by making a direct phone call. For instance, if a restaurant has shifted location and the new address is nowhere to be found online, Google Duplex will call the restaurant and check on the new address for you. The system comes with a self-monitoring capability that helps it recognize complex tasks it cannot accomplish on its own. Such cases are signaled to a human operator, who then takes care of the task.

To get a bit technical, Google Duplex makes use of Recurrent Neural Networks (RNNs), which are created using TensorFlow Extended (TFX), a machine learning platform. Duplex's RNNs are trained on phone conversation data using a data anonymization technique. Data anonymization helps protect the identity of a company or an individual by removing the data sets related to them. The output of Google's automatic speech recognition technology, the conversation history, and different parameters of the conversation are used by the network. The model also makes use of hyperparameter optimization from TFX, which further enhances it. But how does it sound natural?
Google uses concatenative text-to-speech (TTS) along with a synthesis TTS engine (using Tacotron and WaveNet) to control the intonation depending on the circumstances. Concatenative TTS is a technique that converts normal text into speech by concatenating, or linking together, recorded speech pieces. A synthesis TTS engine helps developers modify the speech rate, volume, and pitch of the synthesized output. Including speech disfluencies ("hmm"s, "erm"s, and "uh"s) makes Duplex sound more human. These speech disfluencies are added when very different sound units are combined in the concatenative TTS, or by adding synthetic waits. This allows the system to signal in a natural way that it is still processing (equivalent to what humans do when trying to sort out their thoughts). Also, the delay or latency should match people's expectations: Duplex is capable of figuring out when to give slow or fast responses, using low-confidence models or faster approximations. Google also found that including more latency helps make the conversation sound more natural.
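To make the disfluency idea concrete, here is a toy Python sketch of our own - emphatically not Google's implementation - that sprinkles pause-fillers into a scripted response before it would be handed off to a TTS engine:

import random

FILLERS = ["um,", "uh,", "erm,"]

def add_disfluencies(text: str, rate: float = 0.2, seed: int = 7) -> str:
    """Randomly insert pause-fillers between words at the given rate."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < rate:
            out.append(rng.choice(FILLERS))
        out.append(word)
    return " ".join(out)

print(add_disfluencies("Do you have a table for four at seven tonight?"))

A production system would of course place fillers based on acoustic and timing models rather than at random, but the sketch shows where disfluencies slot into the pipeline.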
Some potential applications of Google Duplex for businesses

Now that we've covered the what and how of this new technology, let's look at five potential applications of Google Duplex in the immediate future.

Customer Service

Basic forms of AI using natural language processing (NLP), such as chatbots and existing voice assistants like Siri and Alexa, are already in use within the customer care industry. Google Duplex paves the way for an even more interactive form of engaging customers and gaining information, given its spectacular human-sounding capability. According to Gartner, "By 2018, 30% of our interactions with technology will be through 'conversations' with smart machines". With Google Duplex being the latest smart machine introduced to the world, the basic operations of the customer service industry will become easier, more manageable and more efficient. From providing quick solutions to initial customer support problems to delivering internal services to employees, Google Duplex perfectly fills the bill. And it will only get better with further advances in NLP. So far, chatbots and digital assistants have been miserable at handling irate customers. I can imagine Google Duplex in John Legend's smooth voice calming down an angry customer, or even making successful sales pitches to potential leads with all its charm and suavity! Of course, Duplex must undergo the right customer management training, with a massive amount of quality data on what good and bad handling look like, before it is ready for such a challenge. Another area of customer service where Google Duplex can play a major role is IT support. Instead of connecting with a human operator, the user will first get connected to Google Duplex, making the entire experience friendly and personalized from the user's perspective while saving major costs for organizations.

HR Department

Google Duplex can also extend a helping hand in the HR department. The preliminary rounds of talent acquisition, where hiring executives make phone calls to their respective candidates, could be handled by Google Duplex, provided it gets the right training. Making note of basic qualifications and candidate details, and scheduling interviews, are all functions that Google Duplex should be able to perform effectively. The Google Assistant can collect the information, and further rounds can then be conducted by human HR personnel. This could greatly cut down on the time HR executives spend on the first few rounds of shortlisting, meaning they are free to focus their time on other strategically important areas of hiring.

Personal assistants and productivity

As presented at Google I/O 2018, Google Duplex is capable of booking appointments at hair salons, booking table reservations, and finding out holiday hours over the phone. It is therefore not a stretch to assume that it can also order takeaway food over a phone call, check with the delivery person regarding an order, cancel appointments, make business inquiries, and so on. Apart from that, it's a great aid for people with hearing loss, as well as people who do not speak the local language, by allowing them to carry out tasks over the phone.

Healthcare Industry

There is already plenty of talk surrounding the use of Alexa, Siri, and other voice assistants in healthcare, and Google Duplex is the newest addition to the family. With its natural way of conversing, Duplex can:

Let patients know their wait time for emergency rooms.
Check with the hospital regarding their health appointments.
Order the necessary equipment for hospital use.

Another allied area is elder care. Google Duplex could help reduce ailments related to loneliness by engaging with users at a more human level. It could also assist with preventive care and the management of lifestyle diseases such as diabetes by ensuring patients continue their medication intake and keep their appointments, providing emergency first aid help, calling 911, and so on.

Real Estate Industry

Duplex-enabled Google Assistants will help make realtors' tasks easier. Duplex can help call potential sellers and buyers, thereby making it easy for realtors to select the right customers. The conversation between Google Duplex (helping a realtor) and a customer wanting to buy a house could look something like this:

Google Duplex: Hi! I heard you are house hunting. Are you looking to buy or sell a property?
Customer: Hey, I'm looking to buy a home in the Washington area.
Google Duplex: That's great! What part of Washington are you looking at?
Customer: I'm looking for a house in Seattle. 3 bedrooms and 3 baths would be fine.
Google Duplex: Sure, umm, may I know your budget?
Customer: Somewhere between $749,000 to $850,000, is that fine?
Google Duplex: Ahh okay sure, I've made a note and I'll call you once I find the right matches.
Customer: Yeah, sure.
Google Duplex: Okay, thanks.
Customer: Thanks, bye!

Google Duplex then makes a note of the details on the realtor's phone, greatly reducing the effort realtors spend on cold calling potential sellers. At the same time, the broker will also receive an email with the consumer's details and contact information for a follow-up.

Every rose has its thorns. What's Duplex's thorny issue?

With all the good hype surrounding Google Duplex, there have also been controversies regarding its ethics. Some people have questions and mixed reactions about Google Duplex fooling people about its identity, as its voice differs significantly from that of a robot. A lot of talk surrounding this issue is trending on several Twitter threads. Google has hushed away these questions by saying that 'transparency in technology' is important and that it is 'designing this feature with disclosure built-in', which will help in identifying the system. Google has also said that it will take on board any feedback that people have regarding the new product.
Google successfully managed to awe people across the globe with the new and innovative Google Duplex. But there is still a long way to go, even though Google has already taken a step ahead in the effort to better human relationships with machines. If you enjoyed reading this article and want to know more, check out the official Google Duplex blog post.

Google's Android Things, developer preview 8: First look
Google News' AI revolution strikes balance between personalization and the bigger picture
Android P new features: artificial intelligence, digital wellbeing, and simplicity
6 reasons to choose MySQL 8 for designing database solutions

Amey Varangaonkar
08 May 2018
4 min read
Whether you are a standalone developer or an enterprise consultant, you would obviously choose a database that provides good benefits and results when compared to other related products. MySQL 8 provides numerous advantages as a first choice in this competitive market. It has various powerful features available that make it a comprehensive database. Today we will go through the benefits of using MySQL as the preferred database solution:

[box type="note" align="" class="" width=""]The following excerpt is taken from the book MySQL 8 Administrator's Guide, co-authored by Chintan Mehta, Ankit Bhavsar, Hetal Oza and Subhash Shah. This book presents step-by-step techniques on managing, monitoring and securing the MySQL database without any hassle.[/box]

Security

The first thing that comes to mind is securing data, because nowadays data has become precious and can impact business continuity if legal obligations are not met; in fact, it can be so bad that it can close down your business in no time. MySQL is a secure and reliable database management system used by many well-known enterprises such as Facebook, Twitter, and Wikipedia. It provides a good security layer that protects sensitive information from intruders. MySQL provides access control management so that granting and revoking required access for a user is easy. Roles can also be defined with a list of permissions that can be granted to or revoked from a user, as sketched in the example below. All user passwords are stored in an encrypted format using plugin-specific algorithms.
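Here is a short, hypothetical illustration of that role-based model, driven from Python via the mysql-connector-python driver; the host, credentials, role, user, and schema names are all invented for the sketch:

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()

# Define a role, attach permissions to it, then grant it to a user (MySQL 8+)
for stmt in (
    "CREATE ROLE IF NOT EXISTS 'app_readonly'",
    "GRANT SELECT ON appdb.* TO 'app_readonly'",
    "CREATE USER IF NOT EXISTS 'report_user'@'%' IDENTIFIED BY 'change-me'",
    "GRANT 'app_readonly' TO 'report_user'@'%'",
):
    cur.execute(stmt)

conn.commit()
cur.close()
conn.close()

Revoking the role from the user later takes a single REVOKE statement, which is exactly the convenience the role mechanism is for.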
Scalability

Day by day, the mountain of data is growing because of the extensive use of technology in numerous ways, and because of this, load averages are going through the roof. In some cases, it is impossible to predict that data will stay below some limit or that the number of users will not go out of bounds. A scalable database is the preferable solution, so that we can meet unexpected demands to scale at any point. MySQL is a rewarding database system for its scalability; it can scale horizontally and vertically, and in terms of data, spreading the database and the load of application queries across multiple MySQL servers is quite feasible. It is pretty easy to add horsepower to a MySQL cluster to handle the load.

An open source relational database management system

MySQL is an open source database management system, which makes debugging, upgrading, and enhancing its functionality fast and easy. You can view the source, make changes accordingly, and use it in your own way. You can also distribute an extended version of MySQL, but you will need a license for this.

High performance

MySQL gives high-speed transaction processing with optimal speed. It can cache results, which boosts read performance. Replication and clustering make the system scalable for more concurrency and manage heavy workloads. Database indexes also accelerate the performance of SELECT query statements on substantial amounts of data. To enhance performance further, MySQL 8 has included indexes in the performance schema to speed up data retrieval.

High availability

Today, in the world of competitive marketing, an organization's key requirement is to keep its systems up and running. Any failure or downtime directly impacts business and revenue; hence, high availability is a factor that cannot be overlooked. MySQL is quite reliable and offers constant availability using cluster and replication configurations. Cluster servers instantly handle failures and manage the failover part to keep your system available almost all the time. If one server goes down, requests are redirected to another node, which performs the requested operation.

Cross-platform capabilities

MySQL provides cross-platform flexibility and can run on various platforms such as Windows, Linux, Solaris, OS/2, and so on. It has great API support for all major languages, which makes it very easy to integrate with languages such as PHP, C++, Perl, Python, Java, and so on. It is also part of the Linux, Apache, MySQL, PHP (LAMP) stack that is used worldwide for web applications.

That's it then! We discussed a few important reasons why MySQL is among the most popular relational databases in the world and is widely adopted across many enterprises. If you want to learn more about MySQL's administrative features, make sure to check out the book MySQL 8 Administrator's Guide today!

12 most common MySQL errors you should be aware of
Top 10 MySQL 8 performance benchmarking aspects to know
2018 is the year of graph databases. Here's why.

Amey Varangaonkar
04 May 2018
5 min read
With the explosion of data, businesses are looking to innovate as they connect their operations to a whole host of different technologies. The need for consistency across all data elements is now stronger than ever. That's where graph databases come in handy. Because they allow for a high level of flexibility when it comes to representing your data, and also handle complex interactions between different elements well, graph databases are considered by many to be the next big trend in databases. In this article, we dive deep into the current graph database scene and list the 3 top reasons why graph databases will continue to soar in popularity in 2018.

What are graph databases, anyway?

Simply put, graph databases are databases that follow the graph model. What is a graph model, then? In mathematical terms, a graph is simply a collection of nodes, with different nodes connected by edges. In a graph database, each node holds some piece of information, while the edges denote the connections (relationships) between the nodes. How are graph databases different from relational databases, you might ask? Well, the key difference between the two is the fact that graph data models allow for more flexible and fine-grained relationships between data objects, as compared to relational models. There are some more differences between the graph data model and the relational data model, which you should read through for more information. Often, you will see that graph databases are without a schema. This allows for a very flexible data model, much like the document or key/value store database models. A unique feature of graph databases, however, is that they also support relationships between data objects like a relational database. This is useful because it allows for a more flexible and faster database, which can be invaluable to a project that demands a quicker response time.

Image courtesy DB-Engines

The rise in popularity of graph database models over the last 5 years has been stunning, but not exactly surprising. If we were to drill down into the 3 key factors that have propelled the popularity of graph databases to a whole new level, what would they be? Let's find out.

Major players entering the graph database market

About a decade ago, the graph database family included just Neo4j and a couple of other less popular graph databases. More recently, however, all the major players in the industry, such as Oracle (Oracle Spatial and Graph), Microsoft (Graph Engine), SAP (SAP HANA as a graph store) and IBM (Compose for JanusGraph), have come up with graph offerings of their own. The most recent entrant to the graph database market is Amazon, with Amazon Neptune announced just last year. According to Andy Jassy, CEO of Amazon Web Services, graph databases are becoming a part of the growing trend of multi-model databases. Per Jassy, these databases are finding increased adoption on the cloud as they support a myriad of useful data processing methods. The traditional over-reliance on relational databases is slowly breaking down, he says.

Rise of the Cypher Query Language

With graph databases slowly getting mainstream recognition and adoption, the major companies have identified the need for a standard query language for all graph databases. Similar to SQL, Cypher has emerged as a standard and is a widely adopted alternative for writing efficient and easy-to-understand graph queries. As of today, the Cypher Query Language is used in popular graph databases such as Neo4j, SAP HANA, RedisGraph and so on. The openCypher project, which develops and maintains Cypher, has also released Cypher for popular big data frameworks like Apache Spark. Cypher's popularity has risen tremendously over the last few years. The primary reason for this is the fact that, like SQL, Cypher's declarative nature allows users to state the actions they want performed on their graph data without explicitly specifying how to perform them.
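To give a flavor of that declarative style, here is a small sketch of our own, run from Python via the official Neo4j driver; the connection details and the social-graph schema (Person nodes, FRIENDS_WITH relationships) are invented for illustration:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Declarative: state the pattern you want, not how to traverse the graph
query = """
MATCH (p:Person {name: $name})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name AS friend
"""

with driver.session() as session:
    for record in session.run(query, name="Alice"):
        print(record["friend"])

driver.close()

Note that the Cypher query describes the shape of the answer; the database engine decides how to walk the graph.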
Finding critical real-world applications

Graph databases were in the news as early as 2016, when the Panama Papers leaks were revealed with the help of Neo4j and Linkurious, a data visualization software. In more recent times, graph databases have also found increased application in online recommendation engines, as well as in tasks such as fraud detection and managing social media. Facebook's search app also uses graph technology to map social relationships. Graph databases are also finding applications in virtual assistants to drive conversations - eBay's virtual shopping assistant is an example. Even NASA uses a knowledge graph architecture to find critical data.

What next for graph databases?

With the growing adoption of graph databases, we expect graph-based platforms to soon become foundational elements of many corporate tech stacks. The next focus area for these databases will be practical implementations such as graph analytics and building graph-based applications. The rising number of graph databases will also mean more competition, and that is a good thing - competition will bring more innovation and enable the incorporation of more cutting-edge features. With a healthy and steadily growing community of developers, data scientists and even business analysts, this evolution may be on the cards sooner than we might expect.

Amazon Neptune: A graph database service for your applications
When, why and how to use Graph analytics for your big data
Why is data science important?

Richard Gall
24 Apr 2018
3 min read
Is data science important? It's a term that's talked about a lot but often misunderstood. Because it's a buzzword, it's easy to dismiss; but data science is important. Behind the term lies a very specific set of activities - and skills - that businesses can leverage to their advantage. Data science allows businesses to use the data at their disposal, whether that's customer data, financial data or otherwise, in an intelligent manner. Its results should be a key driver of growth. However, although it's not wrong to see data science as a real game changer for business, that doesn't mean it's easy to do well. In fact, it's pretty easy to do data science badly. A number of reports suggest that a large proportion of analytics projects fail to deliver results. That means a huge number of organizations are doing data science wrong. Key to these failures is a misunderstanding of how to properly utilize data science. You see it so many times - buzzwords like data science are often like hammers. They make all your problems look like nails. And not properly understanding the business problems you're trying to solve is where things go wrong.

What is data science?

But what is data science exactly? Quite simply, it's about using data to solve problems. The scope of these problems is huge. Here are a few ways data science can be used:

Improving customer retention by finding out what the triggers of churn might be
Improving internal product development processes by looking at the points where faults are most likely to happen
Targeting customers with the right sales messages at the right time
Informing product development by looking at how people use your products
Analyzing customer sentiment on social media
Financial modeling

As you can see, data science is a field that can impact every department. From marketing to product management to finance, data science isn't just a buzzword - it's a shift in mindset about how we work.

Data science is about solving business problems

To anyone still asking whether data science is important, the answer is actually quite straightforward. It's important because it solves business problems. Once you - and management - recognise that fact, you're on the right track. Too often, businesses want machine learning and big data projects without thinking about what they're really trying to do. If you want your data scientists to be successful, present them with the problems - and let them create the solutions. They won't want to be told to simply build a machine learning project. It's crucial to know what the end goal is. W. Edwards Deming is often credited with saying "in God we trust... everyone else must bring data". But data science didn't really exist then - if it did, the message could be much simpler: trust your data scientists.
Why is Hadoop dying?

Aaron Lazar
23 Apr 2018
5 min read
Hadoop has been the definitive big data platform for some time. The name has practically been synonymous with the field. But while its ascent followed the trajectory of what was referred to as the 'big data revolution', Hadoop now seems to be in danger. The question is everywhere - is Hadoop dying out? And if it is, why is it? Is it because big data is no longer the buzzword it once was, or are there simply other ways of working with big data that have become more useful?

Hadoop was essential to the growth of big data

When Hadoop was open sourced in 2007, it opened the door to big data. It brought compute to data, as against bringing data to compute. Organisations had the opportunity to scale their data without having to worry too much about the cost. It obviously had initial hiccups with security, the complexity of querying and querying speeds, but most of that was taken care of in the long run. Querying speeds remained quite a pain, although that wasn't the real reason behind Hadoop's slow decline.

As cloud grew, Hadoop started falling

One of the main reasons behind Hadoop's decline in popularity was the growth of cloud. The cloud vendor market was pretty crowded, and each vendor provided its own big data processing services. These services basically did what Hadoop was doing. But they also did it in an even more efficient and hassle-free way. Customers didn't have to think about administration, security or maintenance in the way they had to with Hadoop.

One person's big data is another person's small data

Well, this is clearly a fact. Several organisations that used big data technologies without really gauging the amount of data they actually needed to process have suffered. Imagine sitting with 10TB Hadoop clusters when you don't have that much data. The two biggest organisations that built products on Hadoop, Hortonworks and Cloudera, saw a decline in revenue in 2015, owing to their massive bets on Hadoop. Customers weren't pleased with the nature of Hadoop's limitations.

Apache Hadoop v Apache Spark

Hadoop is way behind in terms of processing speed. In 2014, Spark took the world by storm. I'm going to let you guess which line in the graph above might be Hadoop, and which might be Spark. Spark was a general-purpose, easy-to-use platform that was built after studying the pitfalls of Hadoop. Spark was not bound to just HDFS (the Hadoop Distributed File System), which meant that it could leverage storage systems like Cassandra and MongoDB as well. Spark 2.3 was also able to run on Kubernetes - a big leap for containerized big data processing in the cloud. Spark also brings along GraphX, which allows developers to view data in the form of graphs. Some of the major areas where Spark wins are iterative algorithms in machine learning, interactive data mining and data processing, stream processing, sensor data processing, and so on.

Machine learning in Hadoop is not straightforward

Unlike MLlib in Spark, machine learning is not possible in Hadoop unless tied to a third-party library. Mahout used to be quite popular for doing ML on Hadoop, but its adoption has gone down in the past few years. Tools like RHadoop, a collection of three R packages, have grown for ML, but they are still nowhere comparable to the power of the modern-day MLaaS offerings from cloud providers. All the more reason to move away from Hadoop, right? Maybe.
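To illustrate the ease-of-use gap, here is a minimal PySpark word count - a handful of lines versus the boilerplate of a classic MapReduce job. The local-mode setup and file path are our own assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

# Classic word count: a few lines in Spark versus a full MapReduce job
counts = (
    spark.sparkContext.textFile("data/sample.txt")
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(counts.take(10))

spark.stop()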
Hadoop is not only Hadoop

The general misconception is that Hadoop is quickly going to be extinct. On the contrary, the Hadoop family consists of YARN, HDFS, MapReduce, Hive, HBase, Spark, Kudu, Impala, and 20 other products. While folks may be moving away from Hadoop as their choice for big data processing, they will still be using Hadoop in some form or the other. As for Cloudera and Hortonworks, though the market has seen a downward trend, they're in no way letting go of Hadoop anytime soon, although they have shifted part of their processing operations to Spark.

Is Hadoop dying? Perhaps not...

In the long run, it's not completely accurate to say that Hadoop is dying. December last year brought with it Hadoop 3.0, which is supposed to be a much improved version of the framework. Some of the most noteworthy features are its improved shell script, a more powerful YARN, improved fault tolerance with erasure coding, and more. Although that hasn't caused any major spike in adoption, there are still users who will adopt Hadoop based on their use case, or simply use another alternative like Spark along with another framework from the Hadoop family. So, Hadoop's not going away anytime soon.

Read More: Pandas is an effective tool to explore and analyze data - Interview insights
What is AIOps and why is it going to be important?

Aaron Lazar
19 Apr 2018
4 min read
Woah, woah, woah! Wait a minute! First there was that game Spec Ops that I usually sucked at, then there came ITOps and DevOps that took the world by storm, and now there's another something-Ops?? Well, believe it or not, there is, and they're calling it AIOps.

What does AIOps stand for?

AIOps basically means Artificial Intelligence for IT Operations. It means IT operations are enhanced by using analytics and machine learning to analyze the data that's collected from various IT operations tools and devices. This helps in spotting and reacting to issues in real time. Coined by Gartner, the term has grown in popularity over the past year. Gartner believes that AIOps will be a major transformation for ITOps professionals, mainly due to the fact that traditional IT operations cannot cope with the modern digital transformation.

Why is AIOps important?

With the massive and rapid shift towards cloud adoption, automation and continuous improvement, AIOps is here to take care of the new entrants into the digital ecosystem - machine agents, artificial intelligence, IoT devices, and so on. These new entrants are impossible for humans to service and maintain alone; with billions of devices connected together, the only way forward is to employ algorithms that tackle known problems. Some of the solutions it provides are maintaining high availability and monitoring performance, event correlation and analysis, automation, and IT service management.

How does AIOps work?

As depicted in Gartner's diagram, there are two primary components to AIOps:

Big Data
Machine Learning

Data is gathered from the enterprise. You then implement a comprehensive analytics and machine learning strategy alongside the combined IT data (monitoring data + job logs + tickets + incident logs). The processed data yields continuous insights, continuous improvements and fixes. AIOps bridges three different IT disciplines to accomplish its goals:

Service management
Performance management
Automation

To put it simply, it is a strategic focus. It argues for a new approach in a world where big data and machine learning have changed everything.

How to move from ITOps to AIOps

Machine Learning

Most of AIOps will involve supervised learning, and professionals will need a good understanding of the underlying algorithms. Now don't get me wrong - they don't need to be full-blown data scientists to build the system, but they do need sufficient knowledge to be able to train the system to pick up anomalies. Auditing these systems to ensure they're performing the tasks as per the initial vision is necessary, and this will go hand in hand with scripting them.

Understanding modern application technologies

With the rise of Agile software development and other modern methodologies, AIOps professionals are expected to know all about microservices, APIs, CI/CD, containers, and the like. With the giant leaps that cloud development is taking, they are also expected to gain visibility into cloud deployments, with an emphasis on cost and performance.

Security

Security is critical. For example, it's important for personnel to understand how to handle a denial-of-service attack or a ransomware attack, like the ones we've seen in the recent past. Training machines to detect or predict such events is pertinent to AIOps.
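As a toy example of training a system to pick up anomalies - our own sketch, assuming scikit-learn's IsolationForest and invented response-time metrics rather than any particular AIOps product:

from sklearn.ensemble import IsolationForest

# Invented metrics: [response_ms, error_rate] sampled from IT monitoring
normal = [[120, 0.01], [130, 0.02], [115, 0.01], [125, 0.015], [118, 0.012]]
incoming = [[122, 0.011], [950, 0.30]]  # the second sample looks like an incident

detector = IsolationForest(contamination=0.1, random_state=0).fit(normal)
print(detector.predict(incoming))  # 1 = normal, -1 = anomaly

A real deployment would train on far more data and feed the -1 verdicts into event correlation and ticketing, but the shape of the workflow is the same.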
The key tools in AIOps

There are a wide variety of AIOps platforms available in the market that bring AI and intelligence to IT operations. One of the most noteworthy ones is Splunk, which has recently incorporated AI for intelligence-driven operations. Another is the Moogsoft AIOps platform, which is quite similar to Splunk. BMC has also entered the fray, launching TrueSight 11, their AIOps platform that promises to address use cases that improve performance and capacity management, the service desk, and application development disciplines. Gartner has a handy list of top platforms. If you're planning the transition from ITOps, do check out the list. Companies like Frankfurt Cargo Services and RevTrak have already added AI to their Ops.

So, are you going to make the transition? According to Gartner, 40% of large enterprises will have made the transition to AIOps by 2022. If you're one of them, I recommend you do it for the right reasons, but don't do it overnight. The transition needs to be gradual and well planned. The first thing you need to do is get your enterprise data together. If you don't have sufficient data that's worthy of analysis, AIOps isn't going to help you much.

Read more: Bridging the gap between data science and DevOps with DataOps.
How machine learning as a service is transforming cloud

Vijin Boricha
18 Apr 2018
4 min read
Machine learning as a service (MLaaS) is an innovation that is growing out of two of the most important tech trends - cloud and machine learning. It's significant because it enhances both. It makes cloud an even more compelling proposition for businesses. That's because cloud typically has three major operations: computing, networking and storage. When you bring machine learning into the picture, the data that cloud stores and processes can be used in radically different ways, solving a range of business problems.

What is machine learning as a service?

Cloud platforms have always competed to be the first or the best to provide new services. This includes platform as a service (PaaS) solutions, infrastructure as a service (IaaS) solutions and software as a service (SaaS) solutions. In essence, cloud providers like AWS and Azure provide sets of software that do different things so their customers don't have to. Machine learning as a service is simply another instance of the services offered by cloud providers. It could include a wide range of features, from data visualization to predictive analytics and natural language processing. It makes running machine learning models easy, effectively automating some of the work that might typically be done manually by a data engineering team. Here are the biggest cloud providers who offer machine learning as a service:

Google Cloud Platform
Amazon Web Services
Microsoft Azure
IBM Cloud

Every platform provides a different suite of services and features. Which one you choose will ultimately depend on what's most important to you. Let's take a look now at the key differences between these cloud providers' machine learning as a service offerings.

Comparing the leading MLaaS products

Google Cloud AI

Google Cloud Platform has always provided its own services to help businesses grow. It provides modern machine learning services with pre-trained models and a service to generate your own tailored models. The majority of Google applications, like Photos (image search), the Google app (voice search), and Inbox (Smart Reply), have been built using the same services that Google provides to its users.

Pros:
Cheaper in comparison to other cloud providers
Provides IaaS and PaaS solutions

Cons:
The Google Prediction API is going to be discontinued (May 1st, 2018)
Lacks a visual interface
You'll need to know TensorFlow

Amazon Machine Learning

Amazon Machine Learning provides services for building ML models and generating predictions, which help users develop robust, scalable, and cost-effective smart applications. With the help of Amazon Machine Learning, you are able to use powerful machine learning technology without any prior experience in machine learning algorithms and techniques.

Pros:
Provides versatile automated solutions
It's accessible - users don't need to be machine learning experts

Cons:
The more you use, the more expensive it is

Azure Machine Learning Studio

Microsoft Azure provides you with Machine Learning Studio - a simple browser-based, drag-and-drop environment which functions without any kind of coding. You are provided with fully-managed cloud services that enable you to easily build, deploy and share predictive analytics solutions. Here you are also provided with a platform (Gallery) to share machine learning solutions and contribute to the community.

Pros:
Consists of the most versatile toolset for MLaaS
You can contribute to and reuse machine learning solutions from the community

Cons:
Comparatively expensive
A lot of manual work is required

Watson Machine Learning

Similar to the above platforms, IBM Watson Machine Learning is a service that helps users create, train, and deploy self-learning models to integrate predictive capabilities within their applications. The platform provides automated and collaborative workflows to grow intelligent business applications.

Pros:
Automated workflows
Data science skills are not necessary

Cons:
Comparatively limited APIs and services
Lacks streaming analytics

Selecting the machine learning as a service solution that's right for you

There are so many machine learning as a service solutions out there that it's easy to get confused. The crucial step to take before you make a decision to purchase anything is to plan your business requirements. Think carefully not only about what you want to achieve, but about what you already do too. You want your MLaaS solution to integrate easily into the way you currently work. You also don't want it to replicate any work you're currently doing that you're pretty happy with. It gets repeated so much, but it remains as true as it has ever been - make sure your software decisions are fully aligned with your business needs. It's easy to be seduced by the promise of innovative new tools, but without the right alignment they're not going to help you at all.
IBM Think 2018: 6 key takeaways for developers

Amey Varangaonkar
17 Apr 2018
5 min read
This year, IBM Think 2018 was hosted in Las Vegas from March 20 to 22. It was one of the most anticipated IBM events of 2018, with over 40,000 developers as well as technology and business leaders in attendance. Considered IBM's flagship conference, Think 2018 combined previous conferences such as IBM InterConnect and World of Watson.

IBM Think 2018: Key Takeaways

IBM Watson Studio announced - a platform where data professionals in different roles can come together and build end-to-end Artificial Intelligence workflows
Integration of IBM Watson with Apple's Core ML, for incorporating custom machine learning models into iOS apps
IBM Blockchain Platform announced, for Blockchain developers to build enterprise-grade decentralized applications
Deep Learning as a Service announced as a part of the Watson Studio, allowing you to train deep learning models more efficiently
Fabric for Deep Learning open-sourced, so that you can use the open source deep learning framework to train your models and then integrate them with the Watson Studio
Neural Network Modeler announced for Watson Studio, a GUI tool to design neural networks efficiently, without a lot of manual coding
IBM Watson Assistant announced, an AI-powered digital assistant, for automotive vehicles and hospitality

Here are some of the announcements and key takeaways which have excited us, as well as developers all around the world!

IBM Watson Studio announced

One of the biggest announcements of the event was IBM Watson Studio - a premier tool that brings together data scientists, developers and data engineers to collaborate on, build, and deploy end-to-end data workflows. Right from accessing your data source to deploying accurate, high-performance models, this platform does it all. It is just what enterprises need today to leverage Artificial Intelligence in order to accelerate research and get intuitive insights from their data. IBM Watson Studio's Lead Product Manager, Armand Ruiz, gives a sneak peek into what we can expect from Watson Studio.

Collaboration with Apple Core ML

IBM took its relationship with Apple to another level by announcing a collaboration to develop smarter iOS applications. IBM Watson's Visual Recognition Service can be used to train custom Core ML machine learning models, which can be used directly by iOS apps. The latest announcement at IBM Think 2018 comes as no surprise to us, considering IBM had released new developer tools for enterprise development using the Swift language.

IBM Watson Assistant announced

IBM Think 2018 also announced the evolution of Watson Conversation into Watson Assistant, introducing new features and capabilities to deliver a more engaging and personalized customer experience. With this, IBM plans to take the concept of AI assistants for businesses to a new level. Currently in the beta program, there are two domain-specific solutions available for use on top of Watson Assistant - namely Watson Assistant for Automotive and Watson Assistant for Hospitality.

IBM Blockchain Platform

Per Juniper Research, more than half of the world's big corporations are considering adopting or are already in the process of adopting Blockchain technology. This presents a serious opportunity for a developer-centric platform that can be used to build custom decentralized networks. IBM, unsurprisingly, has identified this opportunity and come up with a Blockchain development platform of its own - the IBM Blockchain Platform. Recently launched as a beta, this platform offers a pay-as-you-use option for Blockchain developers to develop their own enterprise-grade Blockchain solutions without any hassle.

Deep Learning as a Service

Training a deep learning model is quite tricky, as it requires you to design the right kind of neural network along with having the right hyperparameters. This is a significant pain point for data scientists and machine learning engineers. To tackle this problem, IBM announced the release of Deep Learning as a Service as part of the Watson Studio. It includes the Neural Network Modeler (explained in detail below) to simplify the process of designing and training neural networks. Alternatively, using this service, you can leverage popular deep learning libraries and frameworks such as PyTorch, TensorFlow, Caffe, and Keras to train your neural networks manually. In the process, IBM also open sourced the core functionalities of Deep Learning as a Service as a separate project - namely Fabric for Deep Learning. This allows models to be trained using different open source frameworks on Kubernetes containers, and also to make use of GPUs' processing power. These models can then eventually be integrated with the Watson Studio.

Accelerating deep learning with the Neural Network Modeler

In a bid to reduce the complexity and manual work that go into designing and training neural networks, IBM introduced a beta release of the Neural Network Modeler within the Watson Studio. This new feature allows you to design and model standardized neural network models without going into a lot of technical detail, thanks to its intuitive GUI. With this announcement, IBM aims to accelerate the overall process of deep learning, so that data scientists and machine learning developers can focus more on thinking than on the operational side of things.

At Think 2018, we also saw the IBM Research team present their annual '5 in 5' predictions. This session highlighted 5 key innovations that are currently in research and are expected to change our lives in the near future. With these announcements, it's quite clear that IBM is well in sync with the two hottest trends in the tech space today - namely Artificial Intelligence and Blockchain. IBM seems to be taking every possible step to ensure it is right up there as the preferred choice of tool for data scientists and machine learning developers. We only expect the aforementioned services to get better and see more mainstream adoption with time, as most of them are currently in the beta stage. Not just that, there's scope for more improvements and the addition of newer functionalities as these platforms develop. What did you think of these announcements by IBM? Do let us know!
Accelerating deep learning with the Neural Network Modeler

In a bid to reduce the complexity and manual work that go into designing and training neural networks, IBM introduced a beta release of the Neural Network Modeler within the Watson Studio. This new feature lets you design and model standardized neural networks without going into a lot of technical detail, thanks to its intuitive GUI. With this announcement, IBM aims to accelerate the overall process of deep learning, so that data scientists and machine learning developers can focus on thinking rather than the operational side of things.

At Think 2018, we also saw the IBM Research team present their annual '5 in 5' predictions. This session highlighted five key innovations currently in research that are expected to change our lives in the near future.

With these announcements, it's clear that IBM is well in sync with the two hottest trends in the tech space today - Artificial Intelligence and Blockchain. The company seems to be taking every possible step to be the preferred choice of tool for data scientists and machine learning developers. Since most of these services are still in beta, we expect them to improve and gain more mainstream adoption over time, with scope for new functionality as the platforms mature. What did you think of these announcements by IBM? Do let us know!
What your organisation needs to know about GDPR

Aaron Lazar
16 Apr 2018
5 min read
GDPR is an acronym that has been doing the rounds for a couple of years now. It has become even more visible in the last few weeks, thanks to the Facebook and Cambridge Analytica data hijacking scandal. And with the deadline looming - 25 May 2018 - every organisation on the planet needs to make sure they're on top of things. But what is GDPR exactly? And how is it going to affect you?

What is GDPR?

Before April 2016, a data protection directive enforced in 1995 was in place. It governed all organisations that collected, stored and processed data. That directive became outdated amid rapidly evolving technology, which meant a revised regulation was needed. In April 2016, the European Union drew up the General Data Protection Regulation, created specifically to protect the personal data and privacy of European citizens. It's important to note that the regulation doesn't just apply to EU organisations - it applies to anyone who deals with data on EU citizens.

A relatively new genre of crime involving stealing data has cropped up over the past decade. Data is so powerful that its misuse could be devastating, possibly even resulting in another world war. GDPR aims to set a new benchmark for the protection of consumer data rights by making organisations more accountable. Governed by GDPR, organisations will now be responsible for guarding every quantum of information that is connected to an individual, including IP addresses and web cookies!

Read more: Why GDPR is good for everyone.

Why should organisations bother with GDPR?

In December 2017, RSA, the security company named after one of the first public-key cryptosystems, surveyed 7,500 customers in France, Italy, Germany, the UK and the US, and the results were interesting. When asked what their main concern was, customers responded that lost passwords, banking information, passports and other important documents worried them most. More interestingly, over 60% of the respondents said that in the event of a breach, they would blame the organisation that lost their data rather than the hacker. If you work for or own a company that deals with the data of EU citizens, you'll probably have GDPR on your radar. If you don't comply, you'll face a hefty fine - more on that below.

What kind of data are we talking about?

GDPR aims to protect data related to identity, such as name, physical address and sexual orientation. It also covers ID numbers; IP addresses, cookies and RFID tags; genetic and health-related data; biometric data like fingerprints and retina scans; racial or ethnic data; and political opinions.

Who must comply with GDPR?

You'll be governed by GDPR if:

- You're a company located in the EU
- You're not located in the EU but you still process data of EU citizens
- You have more than 250 employees
- You have fewer than 250 employees but process data that could impact the rights and freedoms of EU citizens

When does GDPR come into force?

In case you missed it in the first paragraph, GDPR comes into effect on 25 May 2018. If you're not ready yet, now is the time to scramble to get things right and make sure you comply with GDPR regulations.

What if you don't make the date?

Unlike an invitation to a birthday party, missing the GDPR deadline is likely to cost you a fine of up to €20 million or 4% of your company's worldwide annual turnover, whichever is higher. A lower tier of fines - up to €10 million or 2% of worldwide annual turnover, whichever is higher - applies to failures such as not reporting a data breach, not incorporating privacy by design, and not ensuring that data protection is applied at the initial stage of a project. It also covers the failure to appoint a Data Protection Officer or Chief Data Officer with professional experience and knowledge of data protection law proportionate to the processing the organisation carries out.
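To see how the two tiers compare in practice, here is a toy calculation - the turnover figure is hypothetical and this is not legal advice - showing that for any sizeable company the percentage component dominates the fixed amount:

```python
# Toy illustration of the GDPR maximum fine tiers: each tier is the
# greater of a fixed amount and a percentage of worldwide annual turnover.
# The turnover figure below is hypothetical.
def max_fine(turnover_eur, fixed_eur, pct):
    """Return the larger of the fixed amount and pct% of turnover."""
    return max(fixed_eur, turnover_eur * pct / 100)

turnover = 2_000_000_000  # a hypothetical €2bn worldwide annual turnover

upper_tier = max_fine(turnover, 20_000_000, 4)  # €80m, since 4% > €20m
lower_tier = max_fine(turnover, 10_000_000, 2)  # €40m, since 2% > €10m
print(f"Upper tier: €{upper_tier:,.0f}; lower tier: €{lower_tier:,.0f}")
```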
If it makes you feel any better, you're not the only one: a report from Ovum states that more than 50% of companies feel they're likely to be fined for non-compliance.

How do you prepare for GDPR?

Here are a few honest steps you can take to work towards compliance:

- Prepare to shell out between $1 million and $10 million to meet GDPR requirements
- Hire a DPO or a CDO capable of handling all your data policies and migration
- Fully understand GDPR and its requirements
- Perform a risk assessment: understand what kind of data you store and what implications it might have
- Strategize to mitigate that risk
- Review or create your data protection plan
- Plan for a 72-hour incident response process, since breaches must be reported within 72 hours
- Implement internal plans and policies, and ensure employees follow them

For the third time, then: time is running out! It's imperative that you ensure your organisation complies with GDPR before 25 May 2018. We'll follow up with more thoughts to help you make the shift, as well as more insight into this game-changing regulation. If you own or are part of an organisation that has migrated to comply with GDPR, please share some tips in the comments section below to help others still in the midst of the transition.