Data | 281 articles | Tech News, Tutorials & Expert Insights

article-image-5-examples-of-artificial-intelligence-in-web-apps

20 Aug 2018

7 min read

5 examples of Artificial Intelligence in Web apps

20 Aug 2018

Modern day web app development is increasingly focused on building a customer-facing front-end presence with the use of Artificial Intelligence. Web apps, use Artificial Intelligence not just for intelligent automation, but also for building recommendation engines, website implementation, and image recognition, among other application areas. In this post, we look at five key areas, illustrated by real-world examples, where web apps are employing Artificial intelligence to automate some part of their system. Recommendation Engines of Amazon and Netflix Curating content based on the user’s context is one of the most widely used AI features in web apps. Amazon, for instance, uses item-based collaborative filtering for product classification. Amazon’s recommendation system uses a combination of goods-based recommendation (users are recommended for those similar to what they liked in the past) and buddy-based recommendation (users are recommended things which their Facebook friends like.) Not just for their recommendation system, Amazon has been using AI for multiple tasks. Their AI Management Strategy is called The Flywheel, where one part of Amazon acts as a catalyst for AI and machine learning growth in other areas. Read more: Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics Another popular example is Netflix, who revamped their recommendation algorithm based on visual impressions. One of their research projects indicated that the artwork was not only the biggest influencer to a viewer's decision to watch content, but it also drew over 82% of their focus while browsing Netflix. This made them develop a new image recommendation algorithm which works in real time to project the image it thinks the user will respond to. They use implicit (user behavior) and Explicit data (user activity) and then feed this data to machine learning algorithms to figure out the relevant content for each user. For each title, users get the image with the highest rank based on their profile. Side by side, it continues collecting data from its 100 million other subscribers to improve its engine’s performance. Read more: What software stack does Netflix use? Google and Microsoft using Image recognition Image recognition can serve multiple uses for web apps including object and pattern recognition, locating duplicates (exact or partial), image search by fragments, and more. Two such unique applications of image recognition are Google’s Quickdraw and Microsoft’s Captionbot.ai. Quick Draw is Google’s AI-powered web app game, where users have to draw an everyday object that a neural network tries to recognize. Players are given 20 seconds to draw a random item, and Google’s neural network tries to match it with other 50 million hand-drawn sketches by other players to identify the correct one. Quickdraw aims to generate the world’s largest doodling data set, which is shared publicly to help further machine learning research. The data preserves user privacy by collecting only anonymous metadata, including timestamp, country code, whether or not the drawing was recognized, and which word the drawing corresponded to. This dataset was used in SketchRNN, a neural network that can draw words and interpolate between drawings. Another image recognition web app is Microsoft’s Captionbot.ai. The system can automatically generate a caption for an uploaded photograph. Users can rate how accurately it has detected what was on display. The algorithm learns from the rating, to make the captions more accurate. It uses three separate services to process the images. The Computer Vision API identifies the components of the photo, then mixes it with data from the Bing Image API, and runs any faces it spots through Emotion API. The Emotion API analyses facial expressions to detect anger, contempt, disgust, fear, and other traits. Based on the results from these APIs, the caption is generated. Google Docs powered by Natural Language Processing Modern Web apps can also be fueled with cognitive capabilities to make them stand apart from other apps. Instances of this include transforming human speech to text or conversing with people in natural language. One such example of a web app which includes natural language processing is Google Docs. Google Docs and Slides have an Explore feature to show text, images, and other features relevant to the document that a user is working on at any given point. Docs can also use natural language to search through data and reports, and automatically generate formulas in Sheets. Google Docs recently incorporated an AI grammar checker, announced at Google Cloud Next. It uses a machine translation algorithm to recognize errors and suggest corrections as users type. Google Docs can also be integrated with Natural Language API to recognize the sentiment of selected text in a Google Doc and highlight it based on that sentiment. Web-based artificial intelligence Chatbots Web-based chatbots are just like app-based chatbots albeit they interact with users in the website browser. They use AI techniques such as natural language understanding and pattern recognition to store and distinguish between the context of the information provided and elicit a suitable response for future replies. An example of web-based chatbots are the Live Chat bots where the conversation with a visitor on a website is automated using a chatbot. Many live chat software companies are already experimenting with chatbots. Examples include the Operator bot used by Intercom, a company building customer messaging platform or Driftbot by Drift which gives your website a personal assistant. Read More: Top 4 chatbot development frameworks for developers Another example, are AI based chatbots which help in creating full websites. Right Click is a startup that introduced an A.I.-powered chatbot which uses Artificial Intelligence in a conversational interface to create websites. It asks general questions during the conversation like “What industry you belong to?” and “Why do you want to make a website?” and creates customized templates as per the given answers. Similarly, Wix’s Artificial Intelligence Design bot can tailor websites by learning about each person’s or business’ own needs. Web-based code helpers using AI Intelligent coding assistants are gaining popularity with their ability to understand the code and provide right suggestions at the right time. They can analyze code on the web and give fast and smart completions. Codota for Chrome is a smart web-based IDE which can build predictive models of code and suggest code completions and related content based on the current context present in the code. It combines program analysis, natural language processing, and machine learning to learn from the code. Users can look for Codota’s Icon on every code snippet on their browsers - in GitHub, StackOverflow and others. Another example is Deep Cognition’s Deep Learning Studio – Cloud. It is not exactly an IDE, but it features AI-powered drag & drop interface to help design deep learning models with ease. It features assisted modeling, for automated tensor size calculations and real-time validation. It also has AutoML feature to automatically build a neural network. [dropcap]E[/dropcap]ven though AI is a great choice to enhance your web apps, an important facet to keep in mind is ensuring fairness, accuracy, and transparency of your web apps. For instance, web apps powered by natural language should not discriminate people based on caste, color, or creed or hurt user sentiments. Similarly, those using neural networks for recognizing images should ensure the filtering of obscene images. Creating such types of artificial intelligence systems would require a hybrid of designers, programmers, ML engineers, and researchers. This collective group will have a good grasp of user experience, will be comfortable thinking in abstracts and algorithms, and equally well versed with the social impacts of artificial intelligence. Read More: 20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017 Uber introduces Fusion.js, a plugin-based web development framework for high-performance apps. Electron Fiddle: A ‘code playground’ for experimenting with cross-platform native apps. Warp: Rust’s new web framework for implementing WAI (Web Application Interface)

0
0
27376

article-image-top-10-computer-vision-tools

Aaron Lazar

05 Apr 2018

7 min read

Top 10 Tools for Computer Vision

Aaron Lazar

05 Apr 2018

7 min read

The adoption of Computer Vision has been steadily picking up pace over the past decade, but there’s been a spike in adoption of various computer vision tools in recent times, thanks to its implementation in fields like IoT, manufacturing, healthcare, security, etc. Computer vision tools have evolved over the years, so much so that computer vision is now also being offered as a service. Moreover, the advancements in hardware like GPUs, as well as machine learning tools and frameworks make computer vision much more powerful in the present day. Major cloud service providers like Google, Microsoft and AWS have all joined the race towards being the developers’ choice. But which tool should you choose? Today I’ll take you through a list of the top tools and will help you understand which one to pick up, based on your need. Computer Vision Tools/Libraries OpenCV: Any post on computer vision is incomplete without the mention of OpenCV. OpenCV is a great performing computer vision tool and it works well with C++ as well as Python. OpenCV is prebuilt with all the necessary techniques and algorithms to perform several image and video processing tasks. It’s quite easy to use and this makes it clearly the most popular computer vision library on the planet! It is multi-platform, allowing you to build applications for Linux, Windows and Android. At the same time, it does have some drawbacks. It gets a bit slow when working through massive data sets or very large images. Moreover, on its own, it doesn’t have GPU support and relies on CUDA for GPU processing. Matlab: Matlab is a great tool for creating image processing applications and is widely used in research. The reason being that Matlab allows quick prototyping. Another interesting aspect is that Matlab code is quite concise, as compared to C++, making it easier to read and debug. It tackles errors before execution by proposing some ways to make the code faster. On the downside, Matlab is a paid tool. Also, it can get quite slow during execution time, if that’s something that concerns you much. Matlab is not your go to tool in an actual production environment, as it was basically built for prototyping and research. AForge.NET/Accord.NET: You’ll be excited to know that image processing is possible even if you’re a C# and .NET developer, thanks to AForge/Accord. It’s a great tool that has a lot of filters and is great for image manipulation and different transforms. The Image Processing Lab allows for filtering capabilities like edge detection and more. AForge is extremely simple to use as all you need to do is adjust parameters from a user interface. Moreover, its processing speeds are quite good. However, AForge doesn’t possess the power and capabilities of other tools like OpenCV, like advanced motion picture analysis or even advanced processing on images. TensorFlow: TensorFlow has been gaining popularity over the past couple of years, owing to its power and ease of use. It lets you bring the power of Deep Learning to computer vision and has some great tools to perform image processing/classification - it’s API-like graph tensor. Moreover, you can make use of the Python API to perform face and expression detection. You can also perform classification using techniques like regression. Tensorflow also allows you to perform computer vision of tremendous magnitudes. One of the main drawbacks of Tensorflow is that it’s extremely resource hungry and can devour a GPU’s capabilities in no time, quite uncalled for. Moreover, if you wanted to learn how to perform image processing with TensorFlow, you’d have to understand what Machine and Deep Learning is, write your own algorithms and then go forward from there. CUDA: CUDA is a platform for parallel computing, invented by NVIDIA. It enables great boosts in computing performance by leveraging the power of GPUs. The CUDA Toolkit includes the NVIDIA Performance Primitives library which is a collection of signal, image, and video processing functions. If you have large images to process, that are GPU intensive, you can choose to use CUDA. CUDA is easy to program and is quite efficient and fast. On the downside, it is extremely high on power consumption and you will find yourself reformulating for memory distribution in parallel tasks. SimpleCV: SimpleCV is a framework for building computer vision applications. It gives you access to a multitude of computer vision tools on the likes of OpenCV, pygame, etc. If you don’t want to get into the depths of image processing and just want to get your work done, this is the tool to get your hands on. If you want to do some quick prototyping, SimpleCV will serve you best. Although, if your intention is to use it in heavy production environments, you cannot expect it to perform on the level of OpenCV. Moreover, the community forum is not very active and you might find yourself running into walls, especially with the installation. GPUImage: GPUImage is a framework or rather, an iOS library that allows you to apply GPU-accelerated effects and filters to images, live motion video, and movies. It is built on OpenGL ES 2.0. Running custom filters on a GPU calls for a lot of code to set up and maintain. GPUImage cuts down on all of that boilerplate and gets the job done for you. Computer Vision as a Service: Google Cloud and Mobile Vision APIs: Google Cloud Vision API enables developers to perform image processing by encapsulating powerful machine learning models in a simple REST API that can be called in an application. Also, its Optical Character Recognition (OCR) functionality enables you to detect text in your images. The Mobile Vision API lets you detect objects in photos and video, using real-time on-device vision technology. It also lets you scan and recognise barcodes and text. Amazon Rekognition: Amazon Rekognition is a deep learning-based image and video analysis service that makes adding image and video analysis to your applications, a piece of cake. The service can identify objects, text, people, scenes and activities, and it can also detect inappropriate content, apart from providing highly accurate facial analysis and facial recognition for sentiment analysis. Microsoft Azure Computer Vision API: Microsoft’s API is quite similar to its peers and allows you to analyse images, read text in them, and analyse video in near-real time. You can also flag adult content, generate thumbnails of images and recognise handwriting. Bonus: SciPy and NumPy: I thought I’d add these in as well, since I’ve seen quite a few developers use Python to build computer vision applications (without OpenCV, that is). SciPy and NumPy are quite powerful enough to perform image processing. scikit-image is a Python package that is dedicated towards image processing, which uses native NumPy and SciPy arrays as image objects. Moreover, you get to use the cool IPython interactive computing environment and you can also choose to include OpenCV if you want to do some more hardcore image processing. Well there you have it, these were the top tools for computer vision and image processing. Head on over and check out these resources, to get working with some of the top tools used in the industry.

0
2
25839

article-image-what-is-pytorch-and-how-does-it-work

Sunith Shetty

18 Sep 2018

7 min read

What is PyTorch and how does it work?

Sunith Shetty

18 Sep 2018

7 min read

PyTorch is a Python-based scientific computing package that uses the power of graphics processing units. It is also one of the preferred deep learning research platforms built to provide maximum flexibility and speed. It is known for providing two of the most high-level features; namely, tensor computations with strong GPU acceleration support and building deep neural networks on a tape-based autograd systems. There are many existing Python libraries which have the potential to change how deep learning and artificial intelligence are performed, and this is one such library. One of the key reasons behind PyTorch’s success is it is completely Pythonic and one can build neural network models effortlessly. It is still a young player when compared to its other competitors, however, it is gaining momentum fast. A brief history of PyTorch Since its release in January 2016, many researchers have continued to increasingly adopt PyTorch. It has quickly become a go-to library because of its ease in building extremely complex neural networks. It is giving a tough competition to TensorFlow especially when used for research work. However, there is still some time before it is adopted by the masses due to its still “new” and “under construction” tags. PyTorch creators envisioned this library to be highly imperative which can allow them to run all the numerical computations quickly. This is an ideal methodology which fits perfectly with the Python programming style. It has allowed deep learning scientists, machine learning developers, and neural network debuggers to run and test part of the code in real time. Thus they don’t have to wait for the entire code to be executed to check whether it works or not. You can always use your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch functionalities and services when required. Now you might ask, why PyTorch? What’ so special in using it to build deep learning models? The answer is quite simple, PyTorch is a dynamic library (very flexible and you can use as per your requirements and changes) which is currently adopted by many of the researchers, students, and artificial intelligence developers. In the recent Kaggle competition, PyTorch library was used by nearly all of the top 10 finishers. Some of the key highlights of PyTorch includes: Simple Interface: It offers easy to use API, thus it is very simple to operate and run like Python. Pythonic in nature: This library, being Pythonic, smoothly integrates with the Python data science stack. Thus it can leverage all the services and functionalities offered by the Python environment. Computational graphs: In addition to this, PyTorch provides an excellent platform which offers dynamic computational graphs, thus you can change them during runtime. This is highly useful when you have no idea how much memory will be required for creating a neural network model. PyTorch Community PyTorch community is growing in numbers on a daily basis. In the just short year and a half, it has shown some great amount of developments that have led to its citations in many research papers and groups. More and more people are bringing PyTorch within their artificial intelligence research labs to provide quality driven deep learning models. The interesting fact is, PyTorch is still in early-release beta, but the way everyone is adopting this deep learning framework at a brisk pace shows its real potential and power in the community. Even though it is in the beta release, there are 741 contributors on the official GitHub repository working on enhancing and providing improvements to the existing PyTorch functionalities. PyTorch doesn’t limit to specific applications because of its flexibility and modular design. It has seen heavy use by leading tech giants such as Facebook, Twitter, NVIDIA, Uber and more in multiple research domains such as NLP, machine translation, image recognition, neural networks, and other key areas. Why use PyTorch in research? Anyone who is working in the field of deep learning and artificial intelligence has likely worked with TensorFlow before, Google’s most popular open source library. However, the latest deep learning framework - PyTorch solves major problems in terms of research work. Arguably PyTorch is TensorFlow’s biggest competitor to date, and it is currently a much favored deep learning and artificial intelligence library in the research community. Dynamic Computational graphs It avoids static graphs that are used in frameworks such as TensorFlow, thus allowing the developers and researchers to change how the network behaves on the fly. The early adopters are preferring PyTorch because it is more intuitive to learn when compared to TensorFlow. Different back-end support PyTorch uses different backends for CPU, GPU and for various functional features rather than using a single back-end. It uses tensor backend TH for CPU and THC for GPU. While neural network backends such as THNN and THCUNN for CPU and GPU respectively. Using separate backends makes it very easy to deploy PyTorch on constrained systems. Imperative style PyTorch library is specially designed to be intuitive and easy to use. When you execute a line of code, it gets executed thus allowing you to perform real-time tracking of how your neural network models are built. Because of its excellent imperative architecture and fast and lean approach it has increased overall PyTorch adoption in the community. Highly extensible PyTorch is deeply integrated with the C++ code, and it shares some C++ backend with the deep learning framework, Torch. Thus allowing users to program in C/C++ by using an extension API based on cFFI for Python and compiled for CPU for GPU operation. This feature has extended the PyTorch usage for new and experimental use cases thus making them a preferable choice for research use. Python-Approach PyTorch is a native Python package by design. Its functionalities are built as Python classes, hence all its code can seamlessly integrate with Python packages and modules. Similar to NumPy, this Python-based library enables GPU-accelerated tensor computations plus provides rich options of APIs for neural network applications. PyTorch provides a complete end-to-end research framework which comes with the most common building blocks for carrying out everyday deep learning research. It allows chaining of high-level neural network modules because it supports Keras-like API in its torch.nn package. PyTorch 1.0: The path from research to production We have been discussing all the strengths PyTorch offers, and how these make it a go-to library for research work. However, one of the biggest downsides is, it has been its poor production support. But this is expected to change soon. PyTorch 1.0 is expected to be a major release which will overcome the challenges developers face in production. This new iteration of the framework will merge Python-based PyTorch with Caffe2 allowing machine learning developers and deep learning researchers to move from research to production in a hassle-free way without the need to deal with any migration challenges. The new version 1.0 will unify research and production capabilities in one framework thus providing the required flexibility and performance optimization for research and production. This new version promises to handle tasks one has to deal with while running the deep learning models efficiently on a massive scale. Along with the production support, PyTorch 1.0 will have more usability and optimization improvements. With PyTorch 1.0, your existing code will continue to work as-is, there won’t be any changes to the existing API. If you want to stay updated with all the progress to PyTorch library, you can visit the Pull Requests page. The beta release of this long-awaited version is expected later this year. Major vendors like Microsoft and Amazon are expected to provide complete support to the framework across their cloud products. Summing up, PyTorch is a compelling player in the field of deep learning and artificial intelligence libraries, exploiting its unique niche of being a research-first library. It overcomes all the challenges and provides the necessary performance to get the job done. If you’re a mathematician, researcher, student who is inclined to learn how deep learning is performed, PyTorch is an excellent choice as your first deep learning framework to learn. Read more Can a production-ready Pytorch 1.0 give TensorFlow a tough time? A new geometric deep learning extension library for Pytorch releases! Top 5 tools for reinforcement learning

0
0
25820

article-image-write-python-code-or-pythonic-code

Aaron Lazar

08 Aug 2018

5 min read

Do you write Python Code or Pythonic Code?

Aaron Lazar

08 Aug 2018

5 min read

If you’re new to Programming, and Python in particular, you might have heard the term Pythonic being brought up at tech conferences, meetups and even at your own office. You might have also wondered why the term and whether they’re just talking about writing Python code. Here we’re going to understand what the term Pythonic means and why you should be interested in learning how to not just write Python code, rather write Pythonic code. What does Pythonic mean? When people talk about pythonic code, they mean that the code uses Python idioms well, that it’s natural or displays fluency in the language. In other words, it means the most widely adopted idioms that are adopted by the Python community. If someone said you are writing un-pythonic code, they might actually mean that you are attempting to write Java/C++ code in Python, disregarding the Python idioms and performing a rough transcription rather than an idiomatic translation from the other language. Okay, now that you have a theoretical idea of what Pythonic (and unpythonic) means, let’s have a look at some Pythonic code in practice. Writing Pythonic Code Before we get into some examples, you might be wondering if there’s a defined way/method of writing Pythonic code. Well, there is, and it’s called PEP 8. It’s the official style guide for Python. Example #1 x=[1, 2, 3, 4, 5, 6] result = [] for idx in range(len(x)); result.append(x[idx] * 2) result Output: [2, 4, 6, 8, 10, 12] Consider the above code, where you’re trying to multiply some elements, “x” by 2. So, what we did here was, we created an empty list to store the results. We would then append the solution of the computation into the result. The result now contains a function which is 2 multiplied by each of the elements. Now, if you were to write the same code in a Pythonic way, you might want to simply use list comprehensions. Here’s how: x=[1, 2, 3, 4, 5, 6] [(element * 2) for element in x] Output: [2, 4, 6, 8, 10] You might have noticed, we skipped the entire for loop! Example #2 Let’s make the previous example a bit more complex, and place a condition that the elements should be multiplied by 2 only if they are even. x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] result = [] for idx in range(len(x)); if x[idx] % 2 == 0; result.append(x[idx] * 2) else; result.append(x[idx]) result Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] We’ve actually created an if else statement to solve this problem, but there is a simpler way of doing things the Pythonic way. [(element * 2 if element % 2 == 0 else element) for element in x] Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] If you notice what we’ve done here, apart from skipping multiple lines of code, is that we used the if-else statement in the same sentence. Now, if you wanted to perform filtering, you could do this: x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [element * 2 for element in x if element % 2 == 0] Output: [4, 8, 12, 16, 20] What we’ve done here is put the if statement after the for declaration, and Voila! We’ve achieved filtering. If you’re using a nice IDE like Jupyter Notebooks or PyCharm, they will help you format your code as per the PEP 8 suggestions. Why should you write Pythonic code? Well firstly, you’re saving loads of time writing humongous piles of cowdung code, so you’re obviously becoming a smarter and more productive programmer. Python is a pretty slow language, and when you’re trying to do something in Python, which is acquired from another language like Java or C++, you’re going to worsen things. With idiomatic, Pythonic code, you’re improving the speed of your programs. Moreover, idiomatic code is far easier to comprehend and understand for other developers who are working on the same code. It helps a great deal when you’re trying to refactor someone else’s code. Fearing Pythonic idioms Well, I don’t mean the idioms themselves are scary. Rather, quite a few developers and organisations have begun discriminating on the basis of whether someone can or cannot write Pythonic code. This is wrong, because, at the end of the day, though the PEP 8 exists, the idea of the term Pythonic is different for different people. To some it might mean picking up a new style guide and improving the way you code. To others, it might mean being succinct and not repeating themselves. It’s time we stopped judging people on whether they can or can’t write Pythonic code and instead, we should appreciate when someone is able to present readable, easily maintainable and succinct code. If you find them writing a bit of clumsy code, you can choose to talk to them about improving their design considerations. And the world will be a better place! If you’re interested in learning how to write more succinct and concise Python code, check out these resources: Learning Python Design Patterns - Second Edition Python Design Patterns [Video] Python Tips, Tricks and Techniques [Video]

0
2
21755

article-image-most-commonly-used-java-machine-learning-libraries

Fatema Patrawala

10 Sep 2018

15 min read

6 most commonly used Java Machine learning libraries

Fatema Patrawala

10 Sep 2018

15 min read

0
0
19441

article-image-libraries-for-geospatial-analysis

Aarthi Kumaraswamy

22 May 2018

12 min read

Top 7 libraries for geospatial analysis

Aarthi Kumaraswamy

22 May 2018

12 min read

The term geospatial refers to finding information that is located on the earth's surface. This can include, for example, the position of a cellphone tower, the shape of a road, or the outline of a country. Geospatial data often associates some piece of information with a particular location. Geospatial development is the process of writing computer programs that can access, manipulate, and display this type of information. Internally, geospatial data is represented as a series of coordinates, often in the form of latitude and longitude values. Additional attributes, such as temperature, soil type, height, or the name of a landmark, are also often present. There can be many thousands (or even millions) of data points for a single set of geospatial data. In addition to the prosaic tasks of importing geospatial data from various external file formats and translating data from one projection to another, geospatial data can also be manipulated to solve various interesting problems. Obvious examples include the task of calculating the distance between two points, calculating the length of a road, or finding all data points within a given radius of a selected point. We use libraries to solve all of these problems and more. Today we will look at the major libraries used to process and analyze geospatial data. GDAL/OGR GEOS Shapely Fiona Python Shapefile Library (pyshp) pyproj Rasterio GeoPandas This is an excerpt from the book, Mastering Geospatial Analysis with Python by Paul Crickard, Eric van Rees, and Silas Toms. Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library The Geospatial Data Abstraction Library (GDAL)/OGR Simple Features Library combines two separate libraries that are generally downloaded together as a GDAL. This means that installing the GDAL package also gives access to OGR functionality. The reason GDAL is covered first is that other packages were written after GDAL, so chronologically, it comes first. As you will notice, some of the packages covered in this post extend GDAL's functionality or use it under the hood. GDAL was created in the 1990s by Frank Warmerdam and saw its first release in June 2000. Later, the development of GDAL was transferred to the Open Source Geospatial Foundation (OSGeo). Technically, GDAL is a little different than your average Python package as the GDAL package itself was written in C and C++, meaning that in order to be able to use it in Python, you need to compile GDAL and its associated Python bindings. However, using conda and Anaconda makes it relatively easy to get started quickly. Because it was written in C and C++, the online GDAL documentation is written in the C++ version of the libraries. For Python developers, this can be challenging, but many functions are documented and can be consulted with the built-in pydoc utility, or by using the help function within Python. Because of its history, working with GDAL in Python also feels a lot like working in C++ rather than pure Python. For example, a naming convention in OGR is different than Python's since you use uppercase for functions instead of lowercase. These differences explain the choice for some of the other Python libraries such as Rasterio and Shapely, which are also covered in this chapter, that has been written from a Python developer's perspective but offer the same GDAL functionality. GDAL is a massive and widely used data library for raster data. It supports the reading and writing of many raster file formats, with the latest version counting up to 200 different file formats that are supported. Because of this, it is indispensable for geospatial data management and analysis. Used together with other Python libraries, GDAL enables some powerful remote sensing functionalities. It's also an industry standard and is present in commercial and open source GIS software. The OGR library is used to read and write vector-format geospatial data, supporting reading and writing data in many different formats. OGR uses a consistent model to be able to manage many different vector data formats. You can use OGR to do vector reprojection, vector data format conversion, vector attribute data filtering, and more. GDAL/OGR libraries are not only useful for Python programmers but are also used by many GIS vendors and open source projects. The latest GDAL version at the time of writing is 2.2.4, which was released in March 2018. GEOS The Geometry Engine Open Source (GEOS) is the C/C++ port of a subset of the Java Topology Suite (JTS) and selected functions. GEOS aims to contain the complete functionality of JTS in C++. It can be compiled on many platforms, including Python. As you will see later on, the Shapely library uses functions from the GEOS library. In fact, there are many applications using GEOS, including PostGIS and QGIS. GeoDjango, also uses GEOS, as well as GDAL, among other geospatial libraries. GEOS can also be compiled with GDAL, giving OGR all of its capabilities. The JTS is an open source geospatial computational geometry library written in Java. It provides various functionalities, including a geometry model, geometric functions, spatial structures and algorithms, and i/o capabilities. Using GEOS, you have access to the following capabilities—geospatial functions (such as within and contains), geospatial operations (union, intersection, and many more), spatial indexing, Open Geospatial Consortium (OGC) well-known text (WKT) and well-known binary (WKB) input/output, the C and C++ APIs, and thread safety. Shapely Shapely is a Python package for manipulation and analysis of planar features, using functions from the GEOS library (the engine of PostGIS) and a port of the JTS. Shapely is not concerned with data formats or coordinate systems but can be readily integrated with such packages. Shapely only deals with analyzing geometries and offers no capabilities for reading and writing geospatial files. It was developed by Sean Gillies, who was also the person behind Fiona and Rasterio. Shapely supports eight fundamental geometry types that are implemented as a class in the shapely.geometry module—points, multipoints, linestrings, multilinestrings, linearrings, multipolygons, polygons, and geometrycollections. Apart from representing these geometries, Shapely can be used to manipulate and analyze geometries through a number of methods and attributes. Shapely has mainly the same classes and functions as OGR while dealing with geometries. The difference between Shapely and OGR is that Shapely has a more Pythonic and very intuitive interface, is better optimized, and has a well-developed documentation. With Shapely, you're writing pure Python, whereas with GEOS, you're writing C++ in Python. For data munging, a term used for data management and analysis, you're better off writing in pure Python rather than C++, which explains why these libraries were created. For more information on Shapely, consult the documentation. This page also has detailed information on installing Shapely for different platforms and how to build Shapely from the source for compatibility with other modules that depend on GEOS. This refers to the fact that installing Shapely will require you to upgrade NumPy and GEOS if these are already installed. Fiona Fiona is the API of OGR. It can be used for reading and writing data formats. The main reason for using it instead of OGR is that it's closer to Python than OGR as well as more dependable and less error-prone. It makes use of two markup languages, WKT and WKB, for representing spatial information with regards to vector data. As such, it can be combined well with other Python libraries such as Shapely, you would use Fiona for input and output, and Shapely for creating and manipulating geospatial data. While Fiona is Python compatible and our recommendation, users should also be aware of some of the disadvantages. It is more dependable than OGR because it uses Python objects for copying vector data instead of C pointers, which also means that they use more memory, which affects the performance. Python shapefile library (pyshp) The Python shapefile library (pyshp) is a pure Python library and is used to read and write shapefiles. The pyshp library's sole purpose is to work with shapefiles—it only uses the Python standard library. You cannot use it for geometric operations. If you're only working with shapefiles, this one-file-only library is simpler than using GDAL. pyproj The pyproj is a Python package that performs cartographic transformations and geodetic computations. It is a Cython wrapper to provide Python interfaces to PROJ.4 functions, meaning you can access an existing library of C code in Python. PROJ.4 is a projection library that transforms data among many coordinate systems and is also available through GDAL and OGR. The reason that PROJ.4 is still popular and widely used is two-fold: Firstly, because it supports so many different coordinate systems Secondly, because of the routes it provides to do this—Rasterio and GeoPandas, two Python libraries covered next, both use pyproj and thus PROJ.4 functionality under the hood The difference between using PROJ.4 separately instead of using it with a package such as GDAL is that it enables you to re-project individual points, and packages using PROJ.4 do not offer this functionality. The pyproj package offers two classes—the Proj class and the Geod class. The Proj class performs cartographic computations, while the Geod class performs geodetic computations. Rasterio Rasterio is a GDAL and NumPy-based Python library for raster data, written with the Python developer in mind instead of C, using Python language types, protocols, and idioms. Rasterio aims to make GIS data more accessible to Python programmers and helps GIS analysts learn important Python standards. Rasterio relies on concepts of Python rather than GIS. Rasterio is an open source project from the satellite team of Mapbox, a provider of custom online maps for websites and applications. The name of this library should be pronounced as raster-i-o rather than ras-te-rio. Rasterio came into being as a result of a project called the Mapbox Cloudless Atlas, which aimed to create a pretty-looking basemap from satellite imagery. One of the software requirements was to use open source software and a high-level language with handy multi-dimensional array syntax. Although GDAL offers proven algorithms and drivers, developing with GDAL's Python bindings feels a lot like C++. Therefore, Rasterio was designed to be a Python package at the top, with extension modules (using Cython) in the middle, and a GDAL shared library on the bottom. Other requirements for the raster library were being able to read and write NumPy ndarrays to and from data files, use Python types, protocols, and idioms instead of C or C++ to free programmers from having to code in two languages. For georeferencing, Rasterio follows the lead of pyproj. There are a couple of capabilities added on top of reading and writing, one of them being a features module. Reprojection of geospatial data can be done with the rasterio.warp module. Rasterio's project homepage can be found on Github. GeoPandas GeoPandas is a Python library for working with vector data. It is based on the pandas library that is part of the SciPy stack. SciPy is a popular library for data inspection and analysis, but unfortunately, it cannot read spatial data. GeoPandas was created to fill this gap, taking pandas data objects as a starting point. The library also adds functionality from geographical Python packages. GeoPandas offers two data objects—a GeoSeries object that is based on a pandas Series object and a GeoDataFrame, based on a pandas DataFrame object, but adding a geometry column for each row. Both GeoSeries and GeoDataFrame objects can be used for spatial data processing, similar to spatial databases. Read and write functionality is provided for almost every vector data format. Also, because both Series and DataFrame objects are subclasses from pandas data objects, you can use the same properties to select or subset data, for example .loc or .iloc. GeoPandas is a library that employs the capabilities of newer tools, such as Jupyter Notebooks, pretty well, whereas GDAL enables you to interact with data records inside of vector and raster datasets through Python code. GeoPandas takes a more visual approach by loading all records into a GeoDataFrame so that you can see them all together on your screen. The same goes for plotting data. These functionalities were lacking in Python 2 as developers were dependent on IDEs without extensive data visualization capabilities which are now available with Jupyter Notebooks. We've provided an overview of the most important open source packages for processing and analyzing geospatial data. The question then becomes when to use a certain package and why. GDAL, OGR, and GEOS are indispensable for geospatial processing and analyzing, but were not written in Python, and so they require Python binaries for Python developers. Fiona, Shapely, and pyproj were written to solve these problems, as well as the newer Rasterio library. For a more Pythonic approach, these newer packages are preferable to the older C++ packages with Python binaries (although they're used under the hood). Now that you have an idea of what options are available for a certain use case and why one package is preferable over another, here’s something you should always remember. As is often the way in programming, there might be multiple solutions for one particular problem. For example, when dealing with shapefiles, you could use pyshp, GDAL, Shapely, or GeoPandas, depending on your preference and the problem at hand. Introduction to Data Analysis and Libraries 15 Useful Python Libraries to make your Data Science tasks Easier “Pandas is an effective tool to explore and analyze data”: An interview with Theodore Petrou Using R to implement Kriging – A Spatial Interpolation technique for Geostatistics data

0
0
18619

article-image-7-ai-tools-mobile-developers-need-to-know

Bhagyashree R

20 Sep 2018

11 min read

7 AI tools mobile developers need to know

Bhagyashree R

20 Sep 2018

11 min read

Advancements in artificial intelligence (AI) and machine learning has enabled the evolution of mobile applications that we see today. With AI, apps are now capable of recognizing speech, images, and gestures, and translate voices with extraordinary success rates. With a number of apps hitting the app stores, it is crucial that they stand apart from competitors by meeting the rising standards of consumers. To stay relevant it is important that mobile developers keep up with these advancements in artificial intelligence. As AI and machine learning become increasingly popular, there is a growing selection of tools and software available for developers to build their apps with. These cloud-based and device-based artificial intelligence tools provide developers a way to power their apps with unique features. In this article, we will look at some of these tools and how app developers are using them in their apps. Caffe2 - A flexible deep learning framework Source: Qualcomm Caffe2 is a lightweight, modular, scalable deep learning framework developed by Facebook. It is a successor of Caffe, a project started at the University of California, Berkeley. It is primarily built for production use cases and mobile development and offers developers greater flexibility for building high-performance products. Caffe2 aims to provide an easy way to experiment with deep learning and leverage community contributions of new models and algorithms. It is cross-platform and integrates with Visual Studio, Android Studio, and Xcode for mobile development. Its core C++ libraries provide speed and portability, while its Python and C++ APIs make it easy for you to prototype, train, and deploy your models. It utilizes GPUs when they are available. It is fine-tuned to take full advantage of the NVIDIA GPU deep learning platform. To deliver high performance, Caffe2 uses some of the deep learning SDK libraries by NVIDIA such as cuDNN, cuBLAS, and NCCL. Functionalities Enable automation Image processing Perform object detection Statistical and mathematical operations Supports distributed training enabling quick scaling up or down Applications Facebook is using Caffe2 to help their developers and researchers train large machine learning models and deliver AI on mobile devices. Using Caffe2, they significantly improved the efficiency and quality of machine translation systems. As a result, all machine translation models at Facebook have been transitioned from phrase-based systems to neural models for all languages. OpenCV - Give the power of vision to your apps Source: AndroidPub OpenCV short for Open Source Computer Vision Library is a collection of programming functions for real-time computer vision and machine learning. It has C++, Python, and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. It also supports the deep learning frameworks TensorFlow and PyTorch. Written natively in C/C++, the library can take advantage of multi-core processing. OpenCV aims to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products. The library consists of more than 2500 optimized algorithms including both classic and state-of-the-art computer vision algorithms. Functionalities These algorithms can be used for the following: To detect and recognize faces Identify objects Classify human actions in videos Track camera movements and moving objects Extract 3D models of objects Produce 3D point clouds from stereo cameras Stitch images together to produce a high-resolution image of an entire scene Find similar images from an image database Applications Plickers is an assessment tool, that lets you poll your class for free, without the need for student devices. It uses OpenCV as its graphics and video SDK. You just have to give each student a card called a paper clicker, and use your iPhone/iPad to scan them to do instant checks-for-understanding, exit tickets, and impromptu polls. Also check out FastCV BoofCV TensorFlow Lite and Mobile - An Open Source Machine Learning Framework for Everyone Source: YouTube TensorFlow is an open source software library for building machine learning models. Its flexible architecture allows easy model deployment across a variety of platforms ranging from desktops to mobile and edge devices. Currently, TensorFlow provides two solutions for deploying machine learning models on mobile devices: TensorFlow Mobile and TensorFlow Lite. TensorFlow Lite is an improved version of TensorFlow Mobile, offering better performance and smaller app size. Additionally, it has very few dependencies as compared to TensorFlow Mobile, so it can be built and hosted on simpler, more constrained device scenarios. TensorFlow Lite also supports hardware acceleration with the Android Neural Networks API. But the catch here is that TensorFlow Lite is currently in developer preview and only has coverage to a limited set of operators. So, to develop production-ready mobile TensorFlow apps, it is recommended to use TensorFlow Mobile. Also, TensorFlow Mobile supports customization to add new operators not supported by TensorFlow Mobile by default, which is a requirement for most of the models of different AI apps. Although TensorFlow Lite is in developer preview, its future releases “will greatly simplify the developer experience of targeting a model for small devices”. It is also likely to replace TensorFlow Mobile, or at least overcome its current limitations. Functionalities Speech recognition Image recognition Object localization Gesture recognition Optical character recognition Translation Text classification Voice synthesis Applications The Alibaba tech team is using TensorFlow Lite to implement and optimize speaker recognition on the client side. This addresses many of the common issues of the server-side model, such as poor network connectivity, extended latency, and poor user experience. Google uses TensorFlow for advanced machine learning models including Google Translate and RankBrain. Core ML - Integrate machine learning in your iOS apps Source: AppleToolBox Core ML is a machine learning framework which can be used to integrate machine learning model in your iOS apps. It supports Vision for image analysis, Natural Language for natural language processing, and GameplayKit for evaluating learned decision trees. Core ML is built on top of the following low-level APIs, providing a simple higher level abstraction to these: Accelerate optimizes large-scale mathematical computations and image calculations for high performance. Basic neural network subroutines (BNNS) provides a collection of functions using which you can implement and run neural networks trained with previously obtained data. Metal Performance Shaders is a collection of highly optimized compute and graphic shaders that are designed to integrate easily and efficiently into your Metal app. To train and deploy custom models you can also use the Create ML framework. It is a machine learning framework in Swift, which can be used to train models using native Apple technologies like Swift, Xcode, and Other Apple frameworks. Functionalities Face and face landmark detection Text detection Barcode recognition Image registration Language and script identification Design games with functional and reusable architecture Applications Lumina is a camera designed in Swift for easily integrating Core ML models - as well as image streaming, QR/Barcode detection, and many other features. ML Kit by Google - Seamlessly build machine learning into your apps Source: Google ML Kit is a cross-platform suite of machine learning tools for its Firebase mobile development platform. It comprises of Google's ML technologies, such as the Google Cloud Vision API, TensorFlow Lite, and the Android Neural Networks API together in a single SDK enabling you to apply ML techniques to your apps easily. You can leverage its ready-to-use APIs for common mobile use cases such as recognizing text, detecting faces, identifying landmarks, scanning barcodes, and labeling images. If these APIs don't cover your machine learning problem, you can use your own existing TensorFlow Lite models. You just have to upload your model on Firebase and ML Kit will take care of the hosting and serving. These APIs can run on-device or in the cloud. Its on-device APIs process your data quickly and work even when there’s no network connection. Its cloud-based APIs leverage the power of Google Cloud Platform's machine learning technology to give you an even higher level of accuracy. Functionalities Automate tedious data entry for credit cards, receipts, and business cards, or help organize photos. Extract text from documents, which you can use to increase accessibility or translate documents. Real-time face detection can be used in applications like video chat or games that respond to the player's expressions. Using image labeling you can add capabilities such as content moderation and automatic metadata generation. Applications A popular calorie counter app, Lose It! uses Google ML Kit Text Recognition API to quickly capture nutrition information to ensure it’s easy to record and extremely accurate. PicsArt uses ML Kit custom model APIs to provide TensorFlow–powered 1000+ effects to enable millions of users to create amazing images with their mobile phones. Dialogflow - Give users new ways to interact with your product Source: Medium Dialogflow is a Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. You can integrate it on Alexa, Cortana, Facebook Messenger, and other platforms your users are on. With Dialogflow you can build interfaces, such as chatbots and conversational IVR that enable natural and rich interactions between your users and your business. It provides this human-like interaction with the help of agents. Agents can understand the vast and varied nuances of human language and translate that to standard and structured meaning that your apps and services can understand. It comes in two types: Dialogflow Standard Edition and Dialogflow Enterprise Edition. Dialogflow Enterprise Edition users have access to Google Cloud Support and a service level agreement (SLA) for production deployments. Functionalities Provide customer support One-click integration on 14+ platforms Supports multilingual responses Improve NLU quality by training with negative examples Debug using more insights and diagnostics Applications Domino’s simplified the process of ordering pizza using Dialogflow’s conversational technology. Domino's leveraged large customer service knowledge and Dialogflow's NLU capabilities to build both simple customer interactions and increasingly complex ordering scenarios. Also check out Wit.ai Rasa NLU Microsoft Cognitive Services - Make your apps see, hear, speak, understand and interpret your user needs Source: Neel Bhatt Cognitive Services is a collection of APIs, SDKs, and services to enable developers easily add cognitive features to their applications such as emotion and video detection, facial, speech, and vision recognition, among others. You need not be an expert in data science to make your systems more intelligent and engaging. The pre-built services come with high-quality RESTful intelligent APIs for the following: Vision: Make your apps identify and analyze content within images and videos. Provides capabilities such as image classification, optical character recognition in images, face detection, person identification, and emotion identification. Speech: Integrate speech processing capabilities into your app or services such as text-to-speech, speech-to-text, speaker recognition, and speech translation. Language: Your application or service will understand the meaning of the unstructured text or the intent behind a speaker's utterances. It comes with capabilities such as text sentiment analysis, key phrase extraction, automated and customizable text translation. Knowledge: Create knowledge-rich resources that can be integrated into apps and services. It provides features such as QnA extraction from unstructured text, knowledge base creation from collections of Q&As, and semantic matching for knowledge bases. Search: Using Search API you can find exactly what you are looking for across billions of web pages. It provides features like ad-free, safe, location-aware web search, Bing visual search, custom search engine creation, and many more. Applications To safeguard against fraud, Uber uses the Face API, part of Microsoft Cognitive Services, to help ensure the driver using the app matches the account on file. Cardinal Blue developed an app called PicCollage, a popular mobile app that allows users to combine photos, videos, captions, stickers, and special effects to create unique collages. Also check out AWS machine learning services IBM Watson These were some of the tools that will help you integrate intelligence into your apps. These libraries make it easier to add capabilities like speech recognition, natural language processing, computer vision, and many others, giving users the wow moment of accomplishing something that wasn’t quite possible before. Along with choosing the right AI tool, you must also consider other factors that can affect your app performance. These factors include the accuracy of your machine learning model, which can be affected by bias and variance, using correct datasets for training, seamless user interaction, and resource-optimization, among others. While building any intelligent app it is also important to keep in mind that the AI in your app is solving a problem and it doesn’t exist because it is cool. Thinking from the user’s perspective will allow you to assess the importance of a particular problem. A great AI app will not just help users do something faster, but enable them to do something they couldn’t do before. With the growing popularity and the need to speed up the development of intelligent apps, many companies ranging from huge tech giants to startups are providing AI solutions. In the future we will definitely see more developer tools coming into the market, making AI in apps a norm. 6 most commonly used Java Machine learning libraries 5 ways artificial intelligence is upgrading software engineering Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

0
0
18553

article-image-earn-1m-per-year-hint-learn-machine-learning

Neil Aitken

01 Aug 2018

10 min read

How to earn $1m per year? Hint: Learn machine learning

Neil Aitken

01 Aug 2018

10 min read

0
0
18209

article-image-tools-for-reinforcement-learning

Pravin Dhandre

21 May 2018

4 min read

Top 5 tools for reinforcement learning

Pravin Dhandre

21 May 2018

4 min read

0
0
18094

article-image-5-ways-artificial-intelligence-is-upgrading-software-engineering

Melisha Dsouza

02 Sep 2018

8 min read

5 ways artificial intelligence is upgrading software engineering

Melisha Dsouza

02 Sep 2018

8 min read

0
0
17781

article-image-5-types-of-deep-transfer-learning

Bhagyashree R

25 Nov 2018

5 min read

5 types of deep transfer learning

Bhagyashree R

25 Nov 2018

5 min read

Transfer learning is a method of reusing a model or knowledge for another related task. Transfer learning is sometimes also considered as an extension of existing ML algorithms. Extensive research and work is being done in the context of transfer learning and on understanding how knowledge can be transferred among tasks. However, the Neural Information Processing Systems (NIPS) 1995 workshop Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems is believed to have provided the initial motivations for research in this field. The literature on transfer learning has gone through a lot of iterations, and the terms associated with it have been used loosely and often interchangeably. Hence, it is sometimes confusing to differentiate between transfer learning, domain adaptation, and multitask learning. Rest assured, these are all related and try to solve similar problems. In this article, we will look into the five types of deep transfer learning to get more clarity on how these differ from each other. [box type="shadow" align="" class="" width=""]This article is an excerpt from a book written by Dipanjan Sarkar, Raghav Bali, and Tamoghna Ghosh titled Hands-On Transfer Learning with Python. This book covers deep learning and transfer learning in detail. It also focuses on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.[/box] #1 Domain adaptation Domain adaptation is usually referred to in scenarios where the marginal probabilities between the source and target domains are different, such as P (Xs) ≠ P (Xt). There is an inherent shift or drift in the data distribution of the source and target domains that requires tweaks to transfer the learning. For instance, a corpus of movie reviews labeled as positive or negative would be different from a corpus of product-review sentiments. A classifier trained on movie-review sentiment would see a different distribution if utilized to classify product reviews. Thus, domain adaptation techniques are utilized in transfer learning in these scenarios. #2 Domain confusion Different layers in a deep learning network capture different sets of features. We can utilize this fact to learn domain-invariant features and improve their transferability across domains. Instead of allowing the model to learn any representation, we nudge the representations of both domains to be as similar as possible. This can be achieved by applying certain preprocessing steps directly to the representations themselves. Some of these have been discussed by Baochen Sun, Jiashi Feng, and Kate Saenko in their paper Return of Frustratingly Easy Domain Adaptation. This nudge toward the similarity of representation has also been presented by Ganin et. al. in their paper, Domain-Adversarial Training of Neural Networks. The basic idea behind this technique is to add another objective to the source model to encourage similarity by confusing the domain itself, hence domain confusion. #3 Multitask learning Multitask learning is a slightly different flavor of the transfer learning world. In the case of multitask learning, several tasks are learned simultaneously without distinction between the source and targets. In this case, the learner receives information about multiple tasks at once, as compared to transfer learning, where the learner initially has no idea about the target task. This is depicted in the following diagram: Multitask learning: Learner receives information from all tasks simultaneously #4 One-shot learning Deep learning systems are data hungry by nature, such that they need many training examples to learn the weights. This is one of the limiting aspects of deep neural networks, though such is not the case with human learning. For instance, once a child is shown what an apple looks like, they can easily identify a different variety of apple (with one or a few training examples); this is not the case with ML and deep learning algorithms. One-shot learning is a variant of transfer learning where we try to infer the required output based on just one or a few training examples. This is essentially helpful in real-world scenarios where it is not possible to have labeled data for every possible class (if it is a classification task) and in scenarios where new classes can be added often. The landmark paper by Fei-Fei and their co-authors, One Shot Learning of Object Categories, is supposedly what coined the term one-shot learning and the research in this subfield. This paper presented a variation on a Bayesian framework for representation learning for object categorization. This approach has since been improved upon, and applied using deep learning systems. #5 Zero-shot learning Zero-shot learning is another extreme variant of transfer learning, which relies on no labeled examples to learn a task. This might sound unbelievable, especially when learning using examples is what most supervised learning algorithms are about. Zero-data learning, or zero-short learning, methods make clever adjustments during the training stage itself to exploit additional information to understand unseen data. In their book on Deep Learning, Goodfellow and their co-authors present zero-shot learning as a scenario where three variables are learned, such as the traditional input variable, x, the traditional output variable, y, and the additional random variable that describes the task, T. The model is thus trained to learn the conditional probability distribution of P(y | x, T). Zero-shot learning comes in handy in scenarios such as machine translation, where we may not even have labels in the target language. In this article we learned about the five types of deep transfer learning types: Domain adaptation, domain confusion, multitask learning, one-shot learning, and zero-shot learning. If you found this post useful, do check out the book, Hands-On Transfer Learning with Python, which covers deep learning and transfer learning in detail. It also focuses on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples. CMU students propose a competitive reinforcement learning approach based on A3C using visual transfer between Atari games What is Meta Learning? Is the machine learning process similar to how humans learn?

0
0
17566

article-image-top-5-programming-languages-big-data

Amey Varangaonkar

04 Apr 2018

8 min read

Top 5 programming languages for crunching Big Data effectively

Amey Varangaonkar

04 Apr 2018

8 min read

One of the most important decisions that Big Data professionals have to make, especially the ones who are new to the scene or are just starting out, is choosing the best programming languages for big data manipulation and analysis. Understanding the Big Data problem and framing the architecture to solve it is not quite enough these days - the execution needs to be perfect as well, and choosing the right language goes a long way. The best languages for big data In this article, we look at the 5 of the most popularly used - not to mention highly effective - programming languages for developing Big Data solutions. Scala A beautiful crossover of the object-oriented and functional programming paradigms, Scala is fast and robust, and a popular choice of language for many Big Data professionals.The fact that two of the most popular Big Data processing frameworks in Apache Spark and Apache Kafka have been built on top of Scala tells you everything you need to know about the power of Scala. Scala runs on the JVM, which means the codes written in Scala can be easily used within a Java-based Big Data ecosystem. One significant factor that differentiates Scala from Java, though, is that Scala is a lot less verbose in comparison. You can write 100s of lines of confusing-looking Java code in less than 15 lines in Scala. One negative aspect of Scala, though, is its steep learning curve when compared to languages like Go and Python, and this may put off beginners looking to use it. Why use Scala for big data? Fast and robust Suitable for working with Big Data tools like Apache Spark for distributed Big Data processing JVM compliant, can be used in a Java-based ecosystem Python Python has been declared as one of the fastest growing programming languages in 2018 as per the recently held Stack Overflow Developer Survey. Its general-purpose nature means it can be used across a broad spectrum of use-cases, and Big Data programming is one major area of application. Many libraries for data analysis and manipulation which are increasingly being used in a Big Data framework to clean and manipulate large chunks of data, such as pandas, NumPy, SciPy - are all Python-based. Not just that, most popular machine learning and deep learning frameworks such as scikit-learn, Tensorflow and many more, are also written in Python and are finding increasing application within the Big Data ecosystem. One drawback of using Python, and a reason why it is not a first-class citizen when it comes to Big Data programming yet, is that it’s slow. Although very easy to use, Big Data professionals have found systems built with languages such as Java or Scala faster and more robust to use than the systems built with Python. However, Python makes up for this limitation with other qualities. As Python is primarily a scripting language, interactive coding and development of analytical solutions for Big Data becomes very easy. Python can integrate effortlessly with the existing Big Data frameworks such as Apache Hadoop and Apache Spark, allowing you to perform predictive analytics at scale without any problem. Why use Python for big data? General-purpose Rich libraries for data analysis and machine learning Easy to use Supports iterative development Rich integration with Big Data tools Interactive computing through Jupyter notebooks R It won’t come as a surprise to many that those who love statistics, love R. The ‘language of statistics’ as it is popularly called as, R is used to build data models which can be used for effective and accurate data analysis. Powered by a large repository of R packages (CRAN, also called as Comprehensive R Archive Network), with R you have just about every type of tool to accomplish any task in Big Data processing - right from analysis to data visualization. R can be integrated seamlessly with Apache Hadoop and Apache Spark, among other popular frameworks, for Big Data processing and analytics. One issue with using R as a programming language for Big Data is that it is not very general-purpose. It means the code written in R is not production-deployable and generally has to be translated to some other programming language such as Python or Java. That said, if your goal is to only build statistical models for Big Data analytics, R is an option you should definitely consider. Why use R for big data? Built for data science Support for Hadoop and Spark Strong statistical modeling and visualization capabilities Support for Jupyter notebooks Java Last, but not the least, there’s always the good old Java. Some of the traditional Big Data frameworks such as Apache Hadoop and all the tools within its ecosystem are all Java-based, and still in use today in many enterprises. Not to mention the fact that Java is the most stable and production-ready language among all the languages we have discussed so far! Using Java to develop your Big Data applications gives you the ability to use a large ecosystem of tools and libraries for interoperability, monitoring and much more, most of which have already been tried and tested. One major drawback of Java is its verbosity. The fact that you have to write hundreds of lines of codes in Java for a task which can written in barely 15-20 lines of code in Python or Scala, can turnoff many budding programmers. However, the introduction of lambda functions in Java 8 does make life quite easier. Java also does not support iterative development unlike newer languages like Python, and this is an area of focus for the future Java releases. Despite the flaws, Java remains a strong contender when it comes to the preferred language for Big Data programming because of its history and the continued reliance on the traditional Big Data tools and frameworks. Why use Java for big data? Traditional Big Data tools and frameworks are written in Java Stable and production-ready Large ecosystem of tried and tested tools and libraries Go Last but not the least, there’s Go - one of the fastest rising programming languages in recent times. Designed by a group of Google engineers who were frustrated with C++, we think Go is a good shout in this list - simply because of the fact that it powers so many tools used in the Big Data infrastructure, including Kubernetes, Docker and many more. Go is fast, easy to learn, and fairly easy to develop applications with, not to mention deploy them. More importantly, as businesses look at building data analysis systems that can operate at scale, Go-based systems are being used to integrate machine learning and parallel processing of data. It is also possible to interface other languages with Go-based systems with relative ease. Why use Go for big data? Fast, easy to use Many tools used in the Big Data infrastructure are Go-based Efficient distributed computing There are a few other languages you might want to consider - Julia, SAS and MATLAB being some major ones which are useful in their own right. However, when compared to the languages we talked about above, we thought they fell a bit short in some aspects - be it speed, efficiency, ease of use, documentation, or community support, among other things. Let’s take a quick look at the comparison table of all the languages we discussed above. Note that we have used the ✓ symbol for the best possible language/s to help you make an informed decision. This is just our view, and that’s not to say that the other languages are any worse! Scala Python R Java Go Speed ✓ ✓ ✓ Ease of use ✓ ✓ ✓ Quick Learning curve ✓ ✓ Data Analysis capability ✓ ✓ ✓ General-purpose ✓ ✓ ✓ ✓ Big Data support ✓ ✓ ✓ ✓ ✓ Interfacing with other languages ✓ ✓ ✓ Production-ready ✓ ✓ ✓ So...which language should you choose? To answer the question in short - it all depends on the use-case you want to develop. If your focus is hardcore data analysis which involves a lot of statistical computing, R would be your go-to language. On the other hand, if you want to develop streaming applications for your Big Data, Scala can be a preferable choice. If you wish to use Machine Learning to leverage your Big Data and build predictive models, Python will come to your rescue. Lastly, if you plan to build Big Data solutions using just the traditionally-available tools, Java is the language for you. You also have the option of combining the power of two languages to get a more efficient and powerful solution. For example, you can train your machine learning model in Python and deploy it on Spark in a distributed mode. Ultimately, it all depends on how efficiently your solution can function, and more importantly, how fast and accurate it is. Which language do you prefer for crunching your Big Data? Do let us know!

0
1
17276

article-image-building-a-scalable-postgresql-solution

Natasha Mathur

14 Apr 2019

12 min read

Building a scalable PostgreSQL solution

Natasha Mathur

14 Apr 2019

12 min read

The term Scalability means the ability of a software system to grow as the business using it grows. PostgreSQL provides some features that help you to build a scalable solution but, strictly speaking, PostgreSQL itself is not scalable. It can effectively utilize the following resources of a single machine: It uses multiple CPU cores to execute a single query faster with the parallel query feature When configured properly, it can use all available memory for caching The size of the database is not limited; PostgreSQL can utilize multiple hard disks when multiple tablespaces are created; with partitioning, the hard disks could be accessed simultaneously, which makes data processing faster However, when it comes to spreading a database solution to multiple machines, it can be quite problematic because a standard PostgreSQL server can only run on a single machine. In this article, we will look at different scaling scenarios and their implementation in PostgreSQL. The requirement for a system to be scalable means that a system that supports a business now, should also be able to support the same business with the same quality of service as it grows. This article is an excerpt taken from the book 'Learning PostgreSQL 11 - Third Edition' written by Andrey Volkov and Salahadin Juba. The book explores the concepts of relational databases and their core principles. You’ll get to grips with using data warehousing in analytical solutions and reports and scaling the database for high availability and performance. Let's say a database can store 1 GB of data and effectively process 100 queries per second. What if with the development of the business, the amount of data being processed grows 100 times? Will it be able to support 10,000 queries per second and process 100 GB of data? Maybe not now, and not in the same installation. However, a scalable solution should be ready to be expanded to be able to handle the load as soon as it is needed. In scenarios where it is required to achieve better performance, it is quite common to set up more servers that would handle additional load and copy the same data to them from a master server. In scenarios where high availability is required, this is also a typical solution to continuously copy the data to a standby server so that it could take over in case the master server crashes. Scalable PostgreSQL solution Replication can be used in many scaling scenarios. Its primary purpose is to create and maintain a backup database in case of system failure. This is especially true for physical replication. However, replication can also be used to improve the performance of a solution based on PostgreSQL. Sometimes, third-party tools can be used to implement complex scaling scenarios. Scaling for heavy querying Imagine there's a system that's supposed to handle a lot of read requests. For example, there could be an application that implements an HTTP API endpoint that supports the auto-completion functionality on a website. Each time a user enters a character in a web form, the system searches in the database for objects whose name starts with the string the user has entered. The number of queries can be very big because of the large number of users, and also because several requests are processed for every user session. To handle large numbers of requests, the database should be able to utilize multiple CPU cores. In case the number of simultaneous requests is really large, the number of cores required to process them can be greater than a single machine could have. The same applies to a system that is supposed to handle multiple heavy queries at the same time. You don't need a lot of queries, but when the queries themselves are big, using as many CPUs as possible would offer a performance benefit—especially when parallel query execution is used. In such scenarios, where one database cannot handle the load, it's possible to set up multiple databases, set up replication from one master database to all of them, making each them work as a hot standby, and then let the application query different databases for different requests. The application itself can be smart and query a different database each time, but that would require a special implementation of the data-access component of the application, which could look as follows: Another option is to use a tool called Pgpool-II, which can work as a load-balancer in front of several PostgreSQL databases. The tool exposes a SQL interface, and applications can connect there as if it were a real PostgreSQL server. Then Pgpool-II will redirect the queries to the databases that are executing the fewest queries at that moment; in other words, it will perform load-balancing: Yet another option is to scale the application together with the databases so that one instance of the application will connect to one instance of the database. In that case, the users of the application should connect to one of the many instances. This can be achieved with HTTP load-balancing: Data sharding When the problem is not the number of concurrent queries, but the size of the database and the speed of a single query, a different approach can be implemented. The data can be separated into several servers, which will be queried in parallel, and then the result of the queries will be consolidated outside of those databases. This is called data sharding. PostgreSQL provides a way to implement sharding based on table partitioning, where partitions are located on different servers and another one, the master server, uses them as foreign tables. When performing a query on a parent table defined on the master server, depending on the WHERE clause and the definitions of the partitions, PostgreSQL can recognize which partitions contain the data that is requested and would query only these partitions. Depending on the query, sometimes joins, grouping and aggregation could be performed on the remote servers. PostgreSQL can query different partitions in parallel, which will effectively utilize the resources of several machines. Having all this, it's possible to build a solution when applications would connect to a single database that would physically execute their queries on different database servers depending on the data that is being queried. It's also possible to build sharding algorithms into the applications that use PostgreSQL. In short, applications would be expected to know what data is located in which database, write it only there, and read it only from there. This would add a lot of complexity to the applications. Another option is to use one of the PostgreSQL-based sharding solutions available on the market or open source solutions. They have their own pros and cons, but the common problem is that they are based on previous releases of PostgreSQL and don't use the most recent features (sometimes providing their own features instead). One of the most popular sharding solutions is Postgres-XL, which implements a shared-nothing architecture using multiple servers running PostgreSQL. The system has several components: Multiple data nodes: Store the data A single global transaction monitor (GTM): Manages the cluster, provides global transaction consistency Multiple coordinator nodes: Supports user connections, builds query-execution plans, and interacts with the GTM and the data nodes Postgres-XL implements the same API as PostgreSQL, therefore the applications don't need to treat the server in any special way. It is ACID-compliant, meaning it supports transactions and integrity constraints. The COPY command is also supported. The main benefits of using Postgres-XL are as follows: It can scale to support more reading operations by adding more data nodes It can scale for to support more writing operations by adding more coordinator nodes The current release of Postgres-XL (at the time of writing) is based on PostgreSQL 10, which is relatively new The main downside of Postgres-XL is that it does not provide any high-availability features out of the box. When more servers are added to a cluster, the probability of the failure of any of them increases. That's why you should take care with backups or implement replication of the data nodes themselves. Postgres-XL is open source, but commercial support is available. Another solution worth mentioning is Greenplum. It's positioned as an implementation of a massive parallel-processing database, specifically designed for data warehouses. It has the following components: Master node: Manages user connections, builds query execution plans, manages transactions Data nodes: Store the data and perform queries Greenplum also implements the PostgreSQL API, and applications can connect to a Greenplum database without any changes. It supports transactions, but support for integrity constraints is limited. The COPY command is supported. The main benefits of Greenplum are as follows: It can scale to support more reading operations by adding more data nodes. It supports column-oriented table organization, which can be useful for data-warehousing solutions. Data compression is supported. High-availability features are supported out of the box. It's possible (and recommended) to add a secondary master that would take over in case a primary master crashes. It's also possible to add mirrors to the data nodes to prevent data loss. The drawbacks are as follows: It doesn't scale to support more writing operations. Everything goes through the single master node and adding more data nodes does not make writing faster. However, it's possible to import data from files directly on the data nodes. It uses PostgreSQL 8.4 in its core. Greenplum has a lot of improvements and new features added to the base PostgreSQL code, but it's still based on a very old release; however, the system is being actively developed. Greenplum doesn't support foreign keys, and support for unique constraints is limited. There are commercial and open source editions of Greenplum. Scaling for many numbers of connections Yet another use case related to scalability is when the number of database connections is great. However, when a single database is used in an environment with a lot of microservices and each has its own connection pool, even if they don't perform too many queries, it's possible that hundreds or even thousands of connections are opened in the database. Each connection consumes server resources and just the requirement to handle a great number of connections can already be a problem, without even performing any queries. If applications don't use connection pooling and open connections only when they need to query the database and close them afterwards, another problem could occur. Establishing a database connection takes time—not too much, but when the number of operations is great, the total overhead will be significant. There is a tool, named PgBouncer, that implements a connection-pool functionality. It can accept connections from many applications as if it were a PostgreSQL server and then open a limited number of connections towards the database. It would reuse the same database connections for multiple applications' connections. The process of establishing a connection from an application to PgBouncer is much faster than connecting to a real database because PgBouncer doesn't need to initialize a database backend process for the session. PgBouncer can create multiple connection pools that work in one of the three modes: Session mode: A connection to a PostgreSQL server is used for the lifetime of a client connection to PgBouncer. Such a setup could be used to speed up the connection process on the application side. This is the default mode. Transaction mode: A connection to PostgreSQL is used for a single transaction that a client performs. That could be used to reduce the number of connections at the PostgreSQL side when only a few translations are performed simultaneously. Statement mode: A database connection is used for a single statement. Then it is returned to the pool and a different connection is used for the next statement. This mode is similar to the transaction mode, though more aggressive. Note that multi-statement transactions are not possible when statement mode is used. Different pools can be set up to work in different modes. It's possible to let PgBouncer connect to multiple PostgreSQL servers, thus working as a reverse proxy. The way PgBouncer could be used is represented in the following diagram: PgBouncer establishes several connections to the database. When an application connects to PgBouncer and starts a transaction, PgBouncer assigns an existing database connection to that application, forwards all SQL commands to the database, and delivers the results back. When the transaction is finished, PgBouncer will dissociate the connections, but not close them. If another application starts a transaction, the same database connection could be used. Such a setup requires configuring PgBouncer to work in transaction mode. PostgreSQL provides several ways to implement replication that would maintain a copy of the data from a database on another server or servers. This can be used as a backup or a standby solution that would take over in case the main server crashes. Replication can also be used to improve the performance of a software system by making it possible to distribute the load on several database servers. In this article, we discussed the problem of building scalable solutions based on PostgreSQL utilizing the resources of several servers. We looked at scaling for querying, data sharding, as well as scaling for many numbers of connections. If you enjoyed reading this article and want to explore other topics, be sure to check out the book 'Learning PostgreSQL 11 - Third Edition'. Handling backup and recovery in PostgreSQL 10 [Tutorial] Understanding SQL Server recovery models to effectively backup and restore your database Saving backups on cloud services with ElasticSearch plugins

0
0
16935

article-image-a-non-programmers-guide-to-learning-machine-learning

Natasha Mathur

05 Sep 2018

11 min read

A non programmer’s guide to learning Machine learning

Natasha Mathur

05 Sep 2018

11 min read

Artificial intelligence might seem intimidating, but it isn’t actually as complex as you might think. Many of the tools that have been developed over the last decade or so have all helped to make artificial intelligence and machine learning more accessible to engineers with varying degrees of experience and knowledge. Today, we’ve got to a stage where it’s now accessible even to people who have barely written a line of code in their life! Pretty exciting, right? But if you’re completely new to the field, it can be challenging to know how to get started - fortunately, we’re about to help you overcome that first hurdle. If you are an AI denier, then be sure to first read ‘why learn Machine Learning as a non-techie’ before you move forward. A strong purpose and belief is the first step to learning anything new. Alright, now here’s how you can get started with artificial intelligence and machine learning techniques quickly. 0. Use a free MLaaS or a no code interactive machine learning tool to experience first hand what is possible with learning machine learning: Some popular examples of no code machine learning as a service option are Microsoft Azure, BigML, Orange, and Amazon ML. Read Q2 under the FAQ section below to know more on this topic. 1. Learn Linear Algebra: Linear Algebra is the elementary unit for ML. It helps you effectively comprehend the theory behind the Machine learning algorithms and how they work. It also improves your math skills such as statistics, programming skills, which are all other skills that helps in ML. Learning Resources: Linear Algebra for Beginners: Open Doors to Great Careers Linear algebra Basics 2. Learn just enough Python or any programming: Now, you can get started with any language of your interest, but we suggest Python as it’s great for people who are new to programming. It’s easy to learn due to its simple syntax. You’ll be able to quickly implement the ML algorithms. Also, It has a rich development ecosystem that offers a ton of libraries and frameworks in Machine Learning such as Scikit Learn, Lasagne, Numpy, Scipy, Theano, Tensorflow, etc. Learning Resources: Python Machine Learning Learn Python in 7 Days Python for Beginners 2017 [Video] Learn Python with codecademy Python editor for beginner programmers 3. Learn basic Probability Theory and statistics: A lot of fundamental Statistical and Probability Theories form the basis for ML. You’ve probably already learned Probability and statistics in school, it easy to dive into advanced statistics for ML. Machine learning in its currently widely used form is a way to predict odds and see patterns. Knowing statistics and probability is important as it will help you with better understanding of why any machine learning algorithm works. For example, your grounding in this area, will help to ask the right questions, choose the right set of algorithms and know what to expect as answers from your ML model on questions such as: What are the odds of this person also liking this movie given their current movie watching choices ( Collaborative filtering and content-based filtering) How similar is this user to that group of users who brought a bunch of stuff on my site (clustering, collaborative filtering, and classification) Could this person be at risk of cancer given a certain set of traits and health indicator observations (logistic regression) Should you buy that stock (decision tree) Also, check out our interview with James D. Miller to know more about why learning stats is important in this field. Learning resources: Statistics for Data Science [Video] 4. Learn machine learning algorithms: Do not get intimidated! You don’t have to be an expert to learn ML algorithms. Knowing basic ML algorithms that are majorly used in the real world applications like linear regression, naive Bayes, and decision trees, are enough to get you started. Learn what they do and how they are used in Machine Learning. 5. Learn numpy sci-kit learn,Keras or any other popular machine learning framework: It can be confusing initially to decide which framework to learn. Each one has its own advantages and disadvantages. Numpy is a linear algebra library which is useful for performing mathematical and logical operations. You can easily work with large multidimensional arrays using Numpy. Sci-kit learn helps with quick implementation of popular algorithms on datasets as just one line of code makes different algorithms available for you. Keras is minimalistic and straightforward with high-levels of extensibility, so it is easier to approach. Learning Resources: Hands-on Machine Learning with TensorFlow [Video] Hands-on Scikit-learn for Machine Learning [Video] If you have reached till here, it is time to put your learning into practice. Go ahead and create a simple linear regression model using some publicly available dataset in your area of interest. Kaggle, ourworldindata.org, UC Irvine Machine Learning repository, elitedatascience, all have a rich set of clean datasets in varied fields. Now, it is necessary to commit and put in daily efforts to practise these skills. Quora, Reddit, Medium, and stackoverflow will be your best friends when it comes to solving doubts regarding any of these skills. Data Helpers is another great resource that provides newcomers with help on queries regarding entering the ML field and related topics. Additionally, once you start getting hang of these skills, identify your strengths and interests, to realign your career goals. Research on the kind of work you want to put your newly gained Machine Learning skill to use. It needn’t be professional or serious, it just needs to be something that you deeply care about or are passionate about. This will pull you through your learning milestones, should you feel low at some point. Also, don’t forget to collaborate with other people and learn from them. You can work with web developers, software programmers, data analysts, data administrators, game developers etc. Finally, keep yourself updated with all the latest happenings in the ML world. Follow top experts and influencers on social media, top blogs on Machine Learning, and conferences. Once you are done checking off these steps off your list, you’ll be ready to start off with your ML project. Now, we’ll be looking at the most frequently asked questions by beginners in the field of Machine learning. Frequently asked questions by Beginners in ML As a beginner, it’s natural to have a lot of questions regarding ML. We’ll be addressing the top three frequently asked questions by beginners or non-programmers when it comes to Machine learning: Q.1 I am looking to make a career in Machine learning but I have no prior programming experience. Do I need to know programming for Machine learning? In a nutshell, Yes. If you want a career in Machine learning then having some form of programming knowledge really helps. As mentioned earlier in this article, learning a programming language can really help you with implementing ML algorithms. It also lets you know the internal mechanism behind Machine learning. So, having programming as a prior skill is great. Again, as mentioned before, you can get started with Python which is the easiest and the most common languages for ML. However, programming is just a part of Machine learning. For instance, “machine learning engineers” typically write more code than develop models, while “research scientists” work more on modelling and analyzing different models. Now, ML is based on the principles of statistical inference and for talking statistically to the computer, we need a language, there comes Coding. So, even though the nature of your job in ML might not require you to code as much, there’s still some amount of coding required. Read Also: Why is Python so good for AI and ML? 5 Python Experts Explain Top languages for Artificial Intelligence development Q.2 Are there any tools that can help me with Machine learning without touching a single line of code? Yes. With the rise of MLaaS (Machine learning as a service), there are certain tools that help you get started with machine learning right-away. These are especially useful for business applications of ML, such as predictive modelling and clustering. Read Also: How MLaaS is transforming cloud Some of the most popular ones are: BigML: This cloud based web-service lets you upload your data, prepare it and run algorithms on it. It’s great for people with not so extensive data science backgrounds. It offers a clean and easy to use interfaces for configuring algorithms (decision trees) and reviewing the results. Being focused “only” on Machine Learning, it comes with a wide set of features, all well integrated within a usable Web UI. Other than that, it also offers an API so that if you like it you can build an application around it. Microsoft Azure: The Microsoft Azure ML studio is a “GUI-based integrated development environment for constructing and operationalizing Machine Learning workflow on Azure”. So, via an integrated development environment called ML Studio, people without data science background or non-programmers can also build data models with the help of drag-and-drop gestures and simple data flow diagrams. This also saves a lot of time through ML Studio's library of sample experiments. Learning resources: Microsoft Azure Machine Learning Machine Learning In The Cloud With Azure ML[Video] Orange: This is an open source machine learning and data visualization studio for novice and experts alike. It provides a toolbox comprising of text mining (topic modelling) and image recognition. It also offers a design tool for visual programming which allows you to connect together data preparation, algorithms, and result evaluation, thereby, creating machine learning “programs”. Apart from that, it provides over 100 widgets for the environment and there’s also a Python API and library available which you can integrate into your application. Amazon ML: Amazon ML is a part of Amazon Web Services ( AWS ) that combines powerful machine learning algorithms with interactive visual tools to guide you towards easily creating, evaluating, and deploying machine learning models. So, whether you are a data scientist or a newbie, it offers ML services and tools tailored to meet your needs and level of expertise. Building ML models using Amazon ML consists of three operations: data analysis, model training, and evaluation. Learning Resources: Effective Amazon Machine Learning Q.3 Do I need to know advanced mathematics ( college graduate level ) to learn Machine learning? It depends. As mentioned earlier, understanding of the following mathematical topics: Probability, Statistics and Linear Algebra can really make your machine learning journey easier and also help simplify your code. These help you understand the “why” behind the working of the machine learning algorithms, which is quite fundamental to understanding ML. However, not knowing advanced mathematics is not an excuse to not learning Machine Learning. There a lot of libraries which makes the task of applying an ML algorithm to solve a task easier. One such example is the widely used Python’s scikit-learn library. With scikit-learn, you just need one line of code and you’ll have the most common algorithms there for you, ready to be used. But, if you want to go deeper into machine learning then knowing advanced mathematics is a prerequisite as it will help you understand the algorithms, the formulas, how the learning is done and many other Machine Learning concepts. Also, with so many courses and tutorials online, you can always learn advanced mathematics on the side while exploring Machine learning. So, we looked at the three most asked questions by beginners in the field of Machine Learning. In the past, machine learning has provided us with self-driving cars, effective web search, speech recognition, etc. Machine learning is extremely pervasive, in fact, many researchers believe that ML is the best way to make progress towards human-level AI. Learning ML is not an easy task but its not next to impossible either. In the end, it all depends on the amount of dedication and efforts that you’re willing to put in to get a grasp of it. We just touched the tip of the iceberg in this article, there’s a lot more to know in Machine Learning which you will get a hang of as you get your feet dirty in it. That being said, all the best for the road ahead! Facebook launches a 6-part ML video series 7 of the best ML conferences for the rest of 2018 Google introduces Machine Learning courses for AI beginners

0
0
16815

article-image-dl-frameworks-tensorflow-vs-cntk

Aaron Lazar

30 Oct 2017

6 min read

The Deep Learning Framework Showdown: TensorFlow vs CNTK

Aaron Lazar

30 Oct 2017

6 min read

The question several Deep Learning engineers may ask themselves is: Which is better, TensorFlow or CNTK? Well, we're going to answer that question for you, taking you through a closely fought match between the two most exciting frameworks. So, here we are, ladies and gentlemen, it's fight night and it's a full house. In the Red corner, weighing in at two hundred and seventy pounds of Python and topping out at over ten thousand frames per second; managed by the American tech giant, Google; we have the mighty, the beefy, TensorFlow! In the Blue corner, weighing in at two hundred and thirty pounds of C++ muscle, we have, one of the top toolkits that can comfortably scale beyond a single machine. Managed by none other than Microsoft, it's fast, it's furious, it's CNTK aka the Microsoft Cognitive Toolkit! And we're into Round One… TensorFlow and CNTK are looking quite menacingly at each other and are raging to take down their opponents. TensorFlow seems pleased that its compile times are considerably faster than its successor, Theano. Although, it looks like happiness came a tad bit soon. CNTK, light and bouncy on it's feet, comes straight out of nowhere with a whopping seventy thousand frames/second upper cut, knocking TensorFlow to the floor. TensorFlow looks like it's in no mood to give up anytime soon. It makes itself so simple to use and understand that even students can pick it up and start training their own models. This isn't the case with CNTK, as it begs to shed its complexity. On the other hand, CNTK seems to be thrashing TensorFlow in terms of 3D convolution, where CNTK can clearly recognize images from streaming content. TensorFlow also tries its best to run LSTM RNNs, but in vain. The crowd keeps cheering on… Wait a minute...are they calling out for TensorFlow? Yes they are! There's hardly any cheering for CNTK. This is embarrassing! Looks like its community support can't match up to TensorFlow's. And ladies and gentlemen, that does make a difference - we can see TensorFlow improving on several fronts and gradually getting back in the game! TensorFlow huffs and puffs as it tries to prove that it's not just about deep learning and that it has tools in the pocket that can support other algorithms such as reinforcement learning. It conveniently whips out the TensorBoard, and drops CNTK to the floor with its beautiful visualizations. TensorFlow now has the upper hand and is trying hard to pin CNTK to the floor and tries to use its R support to finish it off. But CNTK tactfully breaks loose and leaves TensorFlow on the floor - still not ready to be used in production. And there goes the bell for Round One! Both fighters look exhausted but you can see a faint twinkle in TensorFlow's eye, primarily because it survived Round One. Google seems to be working hard to prep it for Round Two and is making several improvements in terms of speed, flexibility and majorly making it ready for production. Meanwhile, Microsoft boosts CNTK's spirits with a shot of Python APIs in its blood. As it moves towards reaching version 2.0, there are a lot of improvements to CNTK, wherein, Microsoft has ensured that it's not left behind, like having a backend for Keras, which puts it on par with TensorFlow. Moreover, there seem to be quite a few experimental features that it looks ready to enter the ring with, like the Java API for example. It's the final round and boy, are these two into a serious stare-down! The referee waves them in and off they are. CNTK needs to get back at TensorFlow. Comfortably supporting multiple GPUs and CPUs out of the box, across both the Microsoft and Linux platforms, it has an advantage over TensorFlow. Is it going to use that trump card? Yes it is! A thousand GPUs and a hundred machines in, and CNTK is raining blows on TensorFlow. TensorFlow clearly drops the ball when it comes to multiple machines, and it rather complicates things. It's high time that TensorFlow turned the tables. Lo and behold! It shows off its mobile deep learning capabilities with TensorFlow Lite, clearly flipping CNTK flat on its back. This is revolutionary and a tremendous breakthrough for TensorFlow! CNTK, however, is clearly the people's choice when it comes to language compatibility. With support for C++, Python, C#/.NET and now Java, it's clearly winning in this area. Round Two is coming to an end, ladies and gentlemen and it's a neck to neck battle out there. We're not sure the judges are going to be able to choose a clear winner, from the looks of it. And…. there goes the bell! While the scores are being tallied, we go over to the teams and some spectators for some gossip on the what's what of deep learning. Did you know having multiple machine support is a huge advantage? It increases speed and efficiency by almost 10 times! That's something! We also got to know that TensorFlow is training hard and is picking up positives from its rival, CNTK. There are also rumors about a new kid called MXNet (read about it here), that has APIs in R, Python and even in Julia! This makes it one helluva framework in terms of flexibility and speed. In fact, AWS is already implementing it while Apple also is rumored to be using it. Clearly, something to watch out for. And finally, the judges have made their decision. Ladies and gentlemen, after two rounds of sheer entertainment, we have the results... TensorFlow CNTK Processing speed 0 1 Learning curve 1 0 Production readiness 0 1 Community support 1 0 CPU, GPU computation support 0 1 Mobile deep learning 1 0 Multiple language compatibility 0 1 It's a unanimous decision and just as we thought, CNTK is the heavyweight champion! CNTK clearly beat TensorFlow in terms of performance, because of its flexibility, speed and ability to use in production! As a Deep Learning engineer, should you be wanting to use one of these frameworks in your tasks, you should check out their features thoroughly, test them out with a test dataset and then implement them to your actual data. After all, it's the choices we make that define a win or a loss - simplicity over resource utilisation, or speed over platform, we must choose our tools wisely. For more information on the kind of tests that both the tools have been put through, read the Research Paper presented by Shaohuai Shi, Qiang Wang, Pengfei Xu and Xiaowen Chu from the Department of Computer Science, Hong Kong Baptist University and these benchmarks.

0
1
15212

Tech Guides - Data

5 examples of Artificial Intelligence in Web apps

Top 10 Tools for Computer Vision

What is PyTorch and how does it work?

Do you write Python Code or Pythonic Code?

6 most commonly used Java Machine learning libraries

Top 7 libraries for geospatial analysis

7 AI tools mobile developers need to know

How to earn $1m per year? Hint: Learn machine learning

Top 5 tools for reinforcement learning

5 ways artificial intelligence is upgrading software engineering

Trending Topics

5 types of deep transfer learning

Top 5 programming languages for crunching Big Data effectively

Building a scalable PostgreSQL solution

A non programmer’s guide to learning Machine learning

The Deep Learning Framework Showdown: TensorFlow vs CNTK