Tech Guides

article-image-uses-of-machine-learning-in-gaming

22 Oct 2018

5 min read

Uses of Machine Learning in Gaming

22 Oct 2018

All around us, our perception of learning and intellect is being challenged daily with the advent of new and emerging technologies. From self-driving cars, playing Go and Chess, to computers being able to beat humans at classic Atari games, the advent of a group of technologies we colloquially call Machine Learning have come to dominate a new era in technological growth – a new era of growth that has been compared with the same importance as the discovery of electricity and has already been categorized as the next human technological age. Games and simulations are no stranger to AI technologies and there are numerous assets available to the Unity developer in order to provide simulated machine intelligence. These technologies include content like Behavior Trees, Finite State Machine, navigation meshes, A*, and other heuristic ways game developers use to simulate intelligence. So, why Machine Learning and why now? The reason is due in large part to the OpenAI initiative, an initiative that encourages research across academia and the industry to share ideas and research on AI and ML. This has resulted in an explosion of growth in new ideas, methods, and areas for research. This means for games and simulations that we no longer have to fake or simulate intelligence. Now, we can build agents that learn from their environment and even learn to beat their human builders. This article is an excerpt taken from the book 'Learn Unity ML-Agents – Fundamentals of Unity Machine Learning' by Micheal Lanham. In this article, we look at the role that machine learning plays in game development. Machine Learning is an implementation of Artificial Intelligence. It is a way for a computer to assimilate data or state and provide a learned solution or response. We often think of AI now as a broader term to reflect a "smart" system. A full game AI system, for instance, may incorporate ML tools combined with more classic AIs like Behavior Trees in order to simulate a richer, more unpredictable AI. We will use AI to describe a system and ML to describe the implementation. How Machine Learning is useful in gaming Game engines have embraced the idea of incorporating ML into all aspects of its product and not just for use as a game AI. While most developers may try to use ML for gaming, it certainly helps game development in the following areas: Map/Level Generation: There are already plenty of examples where developers have used ML to auto-generate everything from dungeons to the realistic terrain. Getting this right can provide a game with endless replayability, but it can be some of the most challenging ML to develop. Texture/Shader Generation: Another area that is getting the attention of ML is texture and shader generation. These technologies are getting a boost brought on by the attention of advanced generative adversarial networks, or GAN. There are plenty of great and fun examples of this tech in action; just do a search for DEEP FAKES in your favorite search engine. Model Generation: There are a few projects coming to fruition in this area that could greatly simplify 3D object construction through enhanced scanning and/or auto-generation. Imagine being able to textually describe a simple model and having ML build it for you, in real-time, in a game or other AR/VR/MR app, for example. Audio Generation: Being able to generate audio sound effects or music on the fly is already being worked on for other areas, not just games. Yet, just imagine being able to have a custom designed soundtrack for your game developed by ML. Artificial Players: This encompasses many uses from the gamer themselves using ML to play the game on their behalf to the developer using artificial players as enhanced test agents or as a way to engage players during low activity. If your game is simple enough, this could also be a way of auto testing levels. NPCs or Game AI: Currently, there are better patterns out there to model basic behavioral intelligence in the form of Behavior Trees. While it's unlikely that BTs or other similar patterns will go away any time soon, imagine being able to model an NPC that may actually do an unpredictable, but rather cool behavior. This opens all sorts of possibilities that excite not only developers but players as well. So, we learned about different areas of the gaming world such as model generation, artificial players, NPCs, level generation, etc, where Machine learning can be extensively used. If you found this post useful, be sure to check out the book 'Learn Unity ML-Agents – Fundamentals of Unity Machine Learning' to learn more machine learning concepts in gaming. 5 Ways Artificial Intelligence is Transforming the Gaming Industry How should web developers learn machine learning? Deep Learning in games – Neural Networks set to design virtual worlds

0
0
9253

article-image-how-artificial-intelligence-can-improve-pentesting

Melisha Dsouza

21 Oct 2018

8 min read

How artificial intelligence can improve pentesting

Melisha Dsouza

21 Oct 2018

8 min read

686 cybersecurity breaches were reported in the first three months of 2018 alone, with unauthorized intrusion accounting for 38.9% of incidents. And with high-profile data breaches dominating headlines, it’s clear that while modern, complex software architecture might be more adaptable and data-intensive than ever, securing that software is proving a real challenge. Penetration testing (or pentesting) is a vital component within the cybersecurity toolkit. In theory, it should be at the forefront of any robust security strategy. But it isn’t as simple as just rolling something out with a few emails and new software - it demands people with great skills, as well a culture where stress testing and hacking your own system is viewed as a necessity, not an optional extra. This is where artificial intelligence comes in - the automation that you can achieve through artificial intelligence could well help make pentesting much easier to do consistently and at scale. In turn, this would help organizations tackle both issues of skills and culture, and get serious about their cybersecurity strategies. But before we dive deeper into artificial intelligence and pentesting, let’s take a look at where we are now, and the shortcomings of established pentesting methods. The shortcomings of established methods of pentesting Typically, pentesting is carried out in 5 stages: Source: Incapsula Every one of these stages, when carried out by humans, opens up the chance of error. Yes, software is important, but contextual awareness and decisions are required.. This process, then, provides plenty of opportunities for error. From misinterpreting data - like thinking a system is secure, when actually it isn’t - to taking care of evidence and thoroughly and clearly recording the results of pentests, even the most experienced pentester will get things wrong. But even if you don’t make any mistakes, this whole process is hard to do well at scale. It requires a significant amount of time and energy to test a piece of software, which, given the pace of change created by modern processes, makes it much harder to maintain the levels of rigor you ultimately want from pentesting. This is where artificial intelligence comes in. The pentesting areas that artificial intelligence can impact Let’s dive into the different stages of pentesting that AI can impact. #1 Reconnaissance Stage The most important stage in pentesting is the Reconnaissance or information gathering stage. As rightly said by many in cybersecurity, "The more information gathered, the higher the likelihood of success." Therefore, a significant amount of time should be spent obtaining as much information as possible about the target. Using AI to automate this stage would provide accurate results as well as save a lot of time invested. Using a combination of Natural Language Processing, Computer Vision, and Artificial Intelligence, experts can identify a wide variety of details that can be used to build a profile of the company, its employees, the security posture, and even the software/hardware components of the network and computers. #2 Scanning Stage Comprehensive coverage is needed In the scanning phase. Manually scanning through thousands if systems in an organization is not ideal. NNor is it ideal to interpret the results returned by scanning tools. AI can be used to tweak the code of the scanning tools to scan systems as well as interpret the results of the scan. It can help save pentesters time and help in the overall efficiency of the pentesting process. AI can focus on test management and the creation of test cases automatically that will check if a particular program can be tagged having security flaw. They can also be used to check how a target system responds to an intrusion. #3 Gaining and Maintaining access stage Gaining access phase involves taking control of one or more network devices in order to either extract data from the target, or to use that device to then launch attacks on other targets. Once a system is scanned for vulnerabilities, the pentesters need to ensure that the system does not have any loopholes that attackers can exploit to get into the network devices. They need to check that the network devices are safely protected with strong passwords and other necessary credentials. AI-based algorithms can try out different combinations of passwords to check if the system is susceptible for a break-in. The algorithms can be trained to observe user data, look for trends or patterns to make inferences about possible passwords used. Maintaining access focuses on establishing other entry points to the target. This phase is expected to trigger mechanisms, to ensure that the penetration tester’s security when accessing the network. AI-based algorithms should be run at equal intervals to time to guarantee that the primary path to the device is closed. The algorithms should be able to discover backdoors, new administrator accounts, encrypted channels, new network access channels, and so on. #4 Covering Tracks And Reporting The last stage tests whether an attacker can actually remove all traces of his attack on the system. Evidence is most often stored in user logs, existing access channels, and in error messages caused by the infiltration process. AI-powered tools can assist in the discovery of hidden backdoors and multiple access points that haven't been left open on the target network; All of these findings should be automatically stored in a report with a proper timeline associated with every attack done. A great example of a tool that efficiently performs all these stages of pentesting is CloudSEK’s X-Vigil. This tool leverages AI to extract data, derive analysis and discover vulnerabilities in time to protect an organization from data breach. Manual vs automated vs AI-enabled pentesting Now that you have gone through the shortcomings of manual pen testing and the advantages of AI-based pentesting, let’s do a quick side-by-side comparison to understand the difference between the two. Manual Testing Automated Testing AI enabled pentesting Manual testing is not accurate at all times due to human error This is more likely to return false positives AI enabled pentesting is accurate as compared to automated testing Manual testing is time-consuming and takes up human resources. Automated testing is executed by software tools, so it is significantly faster than a manual approach. AI enabled testing does not consume much time. The algorithms can be deployed for thousands of systems at a single instance. Investment is required for human resources. Investment is required for testing tools. AI will save the investment for human resources in pentesting. Rather, the same employees can be used to perform less repetitive and more efficient tasks Manual testing is only practical when the test cases are run once or twice, and frequent repetition is not required.. Automated testing is practical when tools find test vulnerabilities out of programmable bounds AI-based pentesting is practical in organizations with thousands of systems that need to be tested at once to save time and resources. AI-based pentesting tools Pentoma is an AI-powered penetration testing solution that allows software developers to conduct smart hacking attacks and efficiently pinpoint security vulnerabilities in web apps and servers. It identifies holes in web application security before hackers do, helping prevent any potential security damages. Pentoma analyzes web-based applications and servers to find unknown security risks.In Pentoma, with each hacking attempt, machine learning algorithms incorporate new vulnerability discoveries, thus continuously improving and expanding threat detection capability. Wallarm Security Testing is another AI based testing tool that discovers network assets, scans for common vulnerabilities, and monitors application responses for abnormal patterns. It discovers application-specific vulnerabilities via Automated Threat Verification. The content of a blocked malicious request is used to create a sanitized test with the same attack vector to see how the application or its copy in a sandbox would respond. With such AI based pentesting tools, pentesters can focus on the development process itself, confident that applications are secured against the latest hacking and reverse engineering attempts, thereby helping to streamline a product’s time to market. Perhaps it is the increase in the number of costly data breaches or the continually expanding attack and proliferation of sensitive data and the attempt to secure them with increasingly complex security technologies that businesses lack in-house expertise to properly manage. Whatever be the reason, more organizations are waking up to the fact that if vulnerabilities are not caught in time can be catastrophic for the business. These weaknesses, which can range from poorly coded web applications, to unpatched databases to exploitable passwords to an uneducated user population, can enable sophisticated adversaries to run amok across your business. It would be interesting to see the growth of AI in this field to overcome all the aforementioned shortcomings. 5 ways artificial intelligence is upgrading software engineering Intelligent Edge Analytics: 7 ways machine learning is driving edge computing adoption in 2018 8 ways Artificial Intelligence can improve DevOps

0
0
13234

article-image-julia-for-machine-learning-will-the-new-language-pick-up-pace

Prasad Ramesh

20 Oct 2018

4 min read

Julia for machine learning. Will the new language pick up pace?

Prasad Ramesh

20 Oct 2018

4 min read

Machine learning can be done using many languages, with Python and R being the most popular. But one language has been overlooked for some time—Julia. Why isn’t Julia machine learning a thing? Julia isn't an obvious choice for machine learning simply because it's a new language that has only recently hit version 1.0. While Python is well-established, with a large community and many libraries, Julia simply doesn't have the community to shout about it. And that's a shame. Right now Julia is used in various fields. From optimizing milk production in dairy farms to parallel supercomputing for astronomy, Julia has a wide range of applications. A common theme here is that these actions all require numerical, scientific, and sometimes parallel computation. Julia is well-suited to the sort of tasks where intensive computation is essential. Viral Shah, CEO of Julia Computing said to Forbes “Amazon, Apple, Disney, Facebook, Ford, Google, Grindr, IBM, Microsoft, NASA, Oracle and Uber are other Julia users, partners and organizations hiring Julia programmers.” Clearly, Julia is powering the analytical nous of some of the most high profile organizations on the planet. Perhaps it just needs more cheerleading to go truly mainstream. Why Julia is a great language for machine learning Julia was originally designed for high-performance numerical analysis. This means that everything that has gone into its design is built for the very things you need to do to build effective machine learning systems. Speed and functionality Julia combines the functionality from various popular languages like Python, R, Matlab, SAS and Stata with the speed of C++ and Java. A lot of the standard LaTeX symbols can be used in Julia, with the syntax usually being the same as LaTeX. This mathematical syntax makes it easy for implementing mathematical formulae in code and make Julia machine learning possible. It also has in-built support for parallelism which allows utilization of multiple cores at once making it fast at computations. Julia’s loops and functions features are pretty fast, fast enough that you would probably notice significant performance differences against other languages. The performance can be almost comparable to C with very little code actually used. With packages like ArrayFire, generic code can be run on GPUs. In Julia, the multiple dispatch feature is very useful for defining number and array-like datatypes. Matrices, data tables work with good compatibility and performance. Julia has automatic garbage collection, a collection of libraries for mathematical calculations, linear algebra, random number generation, and regular expression matching. Libraries and scalability Julia machine learning can be done with powerful tools like MLBase.jl, Flux.jl, Knet.jl, that can be used for machine learning and artificial intelligence systems. It also has a scikit-learn implementation called ScikitLearn.jl. Although ScikitLearn.jl is not an official port, it is a useful additional tool for building machine learning systems with Julia. As if all those weren’t enough, Julia also has TensorFlow.jl and MXNet.jl. So, if you already have experience with these tools, in other implementations, the transition is a little easier than learning everything from scratch. Julia is also incredibly scalable. It can be deployed on large clusters quickly, which is vital if you’re working with big data across a distributed system. Should you consider Julia machine learning? Because it’s fast and possesses a great range of features, Julia could potentially overtake both Python and R to be the choice of language for machine learning in the future. Okay, maybe we shouldn’t get ahead of ourselves. But with Julia reaching the 1.0 milestone, and the language rising on the TIOBE index, you certainly shouldn’t rule out Julia when it comes to machine learning. Julia is also available to use in the popular tool Jupyter Notebook, paving a path for wider adoption. A note of caution, however, is important. Rather than simply dropping everything for Julia, it will be worth monitoring the growth of the language. Over the next 12 to 24 months we’ll likely see new projects and libraries, and the Julia machine learning community expanding. If you start hearing more noise about the language, it becomes a much safer option to invest your time and energy in learning it. If you are just starting off with machine learning, then you should stick to other popular languages. An experienced engineer, however, who already has a good grip on other languages shouldn’t be scared of experimenting with Julia - it gives you another option, and might just help you to uncover new ways of working and solving problems. Julia 1.0 has just been released What makes functional programming a viable choice for artificial intelligence projects? Best Machine Learning Datasets for beginners

0
0
6867

article-image-4-key-benefits-of-using-firebase-for-mobile-app-development

Guest Contributor

19 Oct 2018

6 min read

4 key benefits of using Firebase for mobile app development

Guest Contributor

19 Oct 2018

6 min read

A powerful backend solution is essential for building sophisticated mobile apps. In recent years, Firebase has emerged to prominence as a power-packed Backend-as-a-Solution (BaaS), thanks to its wide-ranging features and performance boosting elements. After being acquired in 2014 by Google, several of its features further got a performance boost. These features have made Firebase quite a popular backend solution for app developers and other emerging IT sectors. Let us look at its 4 key benefits for cross-platform mobile app development. Unleashing the power of Google Analytics Google Analytics for Firebase is a completely free solution with unconstrained reporting on many aspects. The reporting feature allows you to evaluate client behavior, report on broken links, user interactions and all other aspects of user experience and user interface. The reporting helps developers make informed decisions while optimizing the UI and the app performance. The unmatched scale of reporting: Firebase analytics allows access to unlimited reports on as many as 500 different events. The developers can also create custom events for reporting as their need suits. Robust audience segmentation: The Firebase analytics also allows segmenting the app audience on different parameters and grounds. The integrated console allows segmenting the audience on the basis of device information, custom events, and user characteristics. Crash reporting to fix Bugs Firebase also helps to address performance issues of an app by fixing bugs right from its backend solution. It is also equipped with robust crash reporting feature. Its crash reporting helps to deliver intricate and detailed bug and crash reports to address all the coding errors in an app. The reporting feature is capable of grouping together the issues in different categories as per the characteristics of the problem. Here are some of the attributes of this reporting feature. Monitoring errors: It is capable of monitoring fatal errors for iOS apps and both fatal and non-fatal errors for Android apps. Generally, reports are initiated as per the impact caused by such errors on the user experience. Required data collection to fix errors: The reports also enlist all the details concerning the device in use, performance shortfalls and user scenarios concerning the erroneous events. According to the contributing factors and other similarities, the issues are grouped in different categories. Email alerts: It also allows sending email alerts as and when such issues or problems are detected. The configuration of error reporting: The error reporting can also be configured remotely to control who can access the reports and list of events that occurred before an event. It is free: Crash and bug reporting is free with Firebase. You don't need to pay a penny to access this feature. Synchronizing data with real-time database With Firebase you can sync the offline and online data through NoSQL database. This makes the application data available on both offline and online states of the app. This boosts collaboration on the application data in real time. Here are some of its benefits. Real-time: Unlike the so-called HTTP requests that work to update the data across interfaces, the Real-time Database of firebase syncs data with every change thus helping to reflect the change in real time across any device in use. Offline: As Firebase Real-time Database SDK helps save your data in local disk, you can always access the data offline. As and when connectivity is back, the changes are synced with the present state of the server. Access from multiple devices: The Firebase Real-time Database allows accessing application data from multiple devices and interfaces including mobile devices and web. Splitting and scaling your data: Thanks to Firebase Real-time Database, you can split your data across multiple databases within the same project and set rules for each database instances. Firebase is feature rich for futuristic app development In addition to the above, Firebase is fully empowered with a host of rich features required for building sophisticated and most feature-rich mobile apps. Let us have a look at some of the key features of Firebase that made it a reliable platform for cross-platform development. Hosting: The hosting feature of Firebase allows developers to update their contents in the Content Delivery Network (CDN) during production. Firebase offers full hosting support with a custom domain, Global CDN, and an automatically provided SSL Certificate. Authentication: Firebase backend service offers a powerful authentication feature. It comes equipped with simple SDKs and easy to use libraries to integrate authentication feature with any mobile app. Storage: Firebase storage feature is powered by Google Cloud Storage and allows users to easily download media files and visual contents. This feature is also helpful in making use of user-generated content. Cloud Messaging: With Cloud Messaging, a mobile app powered can easily send a message to users and indulge in real-time communication. Remote Configuration: This feature of Firebase allows developers to incorporate certain changes in the app remotely. Thanks to this, the changes are reflected in the existing version, and the user does not need to download the latest updated version. Test Lab: With Test lab, developers can easily test the app in all the devices listed in the Google data center. It can even do the testing without requiring any test code of the respective app. Notifications: This feature gives developers a console to manage and send user-focused custom notifications to the users. App Indexing: This feature allows developers to index the app in Google Search and achieve higher search ranks in app marketplaces like Play Store and App Store. Dynamic Links: Firebase also equips the app to create dynamic links or smart URLs to present the respective app across all digital platforms including social media, mobile app, web, email, and other channels. All the above-mentioned benefits and useful features that empower mobile app developers to create dynamic user experience helped Firebase achieve such unprecedented popularity among developers worldwide. No wonder, in a short time span it has become a very popular backend solution for so many successful cross-platform mobile apps. Some exemplary use cases of Firebases Here we have picked two use cases of Firebase, respectively for one relatively new and successful app and one leading app in its niche. Fabulous Fabulous is a unique app that trains users to dispose of bad habits and get used to good habits to ensure health and wellbeing. The app by customizing the onboarding process through Firebase managed to double the retention rate. The app could incorporate custom user experience for different groups of users as per their preference. Onefootball This leading mobile soccer app OneFootBall experienced more than 5% increase in user session time thanks to Firebase. The new backend solution powered by Firebase helped the game app engage the audience more efficiently than ever before. The custom contents created by this popular app can enjoy better traction with users thanks to higher engagement. Author Bio: Juned Ahmed works as an IT consultant at IndianAppDevelopers, a leading Mobile app development company which offers to hire app developers in India for mobile solutions. He has more than 10 years of experience in developing and implementing marketing strategies. How to integrate Firebase on Android/iOS applications natively. Build powerful progressive web apps with Firebase. How to integrate Firebase with NativeScript for cross-platform app development.

0
0
12603

article-image-why-uber-created-hudi-an-open-source-incremental-processing-framework-on-apache-hadoop

Bhagyashree R

19 Oct 2018

3 min read

Why did Uber created Hudi, an open source incremental processing framework on Apache Hadoop?

Bhagyashree R

19 Oct 2018

3 min read

In the process of rebuilding its Big Data platform, Uber created an open-source Spark library named Hadoop Upserts anD Incremental (Hudi). This library permits users to perform operations such as update, insert, and delete on existing Parquet data in Hadoop. It also allows data users to incrementally pull only the changed data, which significantly improves query efficiency. It is horizontally scalable, can be used from any Spark job, and the best part is that it only relies on HDFS to operate. Why is Hudi introduced? Uber studied its current data content, data access patterns, and user-specific requirements to identify problem areas. This research revealed the following four limitations: Scalability limitation in HDFS Many companies who use HDFS to scale their Big Data infrastructure face this issue. Storing large numbers of small files can affect the performance significantly as HDFS is bottlenecked by its NameNode capacity. This becomes a major issue when the data size grows above 50-100 petabytes. Need for faster data delivery in Hadoop Since Uber operates in real time, there was a need for providing services the latest data. It was important to make the data delivery much faster, as the 24-hour data latency was way too slow for many of their use cases. No direct support for updates and deletes for existing data Uber used snapshot-based ingestion of data, which means a fresh copy of source data was ingested every 24 hours. As Uber requires the latest data for its business, there was a need for a solution which supports update and delete operations for existing data. However, since their Big Data is stored in HDFS and Parquet, direct support for update operations on existing data is not available. Faster ETL and modeling ETL and modeling jobs were also snapshot-based, requiring their platform to rebuild derived tables in every run. ETL jobs also needed to become incremental to reduce data latency. How Hudi solves the aforementioned limitations? The following diagram shows Uber's Big Data platform after the incorporation of Hudi: Source: Uber Regardless of whether the data updates are new records added to recent date partitions or updates to older data, Hudi allows users to pass on their latest checkpoint timestamp and retrieve all the records that have been updated since. This data retrieval happens without running an expensive query that scans the entire source table. Using this library Uber has moved to an incremental ingestion model leaving behind the snapshot-based ingestion. As a result, the data latency was reduced from 24 hrs to less than one hour. To know about Hudi in detail, check out Uber’s official announcement. How can Artificial Intelligence support your Big Data architecture? Big data as a service (BDaaS) solutions: comparing IaaS, PaaS and SaaS Uber’s Marmaray, an Open Source Data Ingestion and Dispersal Framework for Apache Hadoop

0
0
11412

article-image-5-best-practices-to-perform-data-wrangling-with-python

Savia Lobo

18 Oct 2018

5 min read

5 best practices to perform data wrangling with Python

Savia Lobo

18 Oct 2018

5 min read

Data wrangling is the process of cleaning and structuring complex data sets for easy analysis and making speedy decisions in less time. Due to the internet explosion and the huge trove of IoT devices there is a massive availability of data, at present. However, this data is most often in its raw form and includes a lot of noise in the form of unnecessary data, broken data, and so on. Clean up of this data is essential in order to use it for analysis by organizations. Data wrangling plays a very important role here by cleaning this data and making it fit for analysis. Also, Python language has built-in features to apply any wrangling methods to various data sets to achieve the analytical goal. Here are 5 best practices that will help you out in your data wrangling journey with the help of Python. And at the end, all you’ll have is a clean and ready to use data for your business needs. 5 best practices for data wrangling with Python Learn the data structures in Python really well Designed to be a very high-level language, Python offers an array of amazing data structures with great built-in methods. Having a solid grasp of all the capabilities will be a potent weapon in your repertoire for handling data wrangling task. For example, dictionary in Python can act almost like a mini in-memory database with key-value pairs. It supports extremely fast retrieval and search by utilizing a hash table underneath. Explore other built-in libraries related to these data structures e.g. ordered dict, string library for advanced functions. Build your own version of essential data structures like stack, queues, heaps, and trees, using classes and basic structures and keep them handy for quick data retrieval and traversal. Learn and practice file and OS handling in Python How to open and manipulate files How to manipulate and navigate directory structure Have a solid understanding of core data types and capabilities of Numpy and Pandas How to create, access, sort, and search a Numpy array. Always think if you can replace a conventional list traversal (for loop) with a vectorized operation. This will increase speed of your data operation. Explore special file types like .npy (Numpy’s native storage) to access/read large data set with much higher speed than usual list. Know in details all the file types you can read using built-in Pandas methods. This will simplify to a great extent your data scraping. Almost all of these methods have great data cleaning and other checks built in. Try to use such optimized routines instead of writing your own to speed up the process. Build a good understanding of basic statistical tests and a panache for visualization Running some standard statistical tests can quickly give you an idea about the quality of the data you need to wrangle with. Plot data often even if it is multi-dimensional. Do not try to create fancy 3D plots. Learn to explore simple set of pairwise scatter plots. Use boxplots often to see the spread and range of the data and detect outliers. For time-series data, learn basic concepts of ARIMA modeling to check the sanity of the data Apart from Python, if you want to master one language, go for SQL As a data engineer, you will inevitably run across situations where you have to read from a large, conventional database storage. Even if you use Python interface to access such database, it is always a good idea to know basic concepts of database management and relational algebra. This knowledge will help you build on later and move into the world of Big Data and Massive Data Mining (technologies like Hadoop/Pig/Hive/Impala) easily. Your basic data wrangling knowledge will surely help you deal with such scenarios. Although Data wrangling may be the most time-consuming process, it is the most important part of the data management. Data collected by businesses on a daily basis can help them make decisions on the latest information available. It also allows businesses to find the hidden insights and use it in the decision-making processes and provide them with new analytic initiatives, improved reporting efficiency and much more. About the authors Dr. Tirthajyoti Sarkar works in San Francisco Bay area as a senior semiconductor technologist where he designs state-of-the-art power management products and applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He has 15+ years of R&D experience and is a senior member of IEEE. Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris based Cyber Security startup where he is applying the state-of-the-art Computer Vision and Data Engineering algorithms and tools to develop cutting edge product. Data cleaning is the worst part of data analysis, say data scientists Python, Tensorflow, Excel and more – Data professionals reveal their top tools Manipulating text data using Python Regular Expressions (regex)

0
0
7186

article-image-4-misconceptions-about-data-wrangling

Sugandha Lahoti

17 Oct 2018

4 min read

4 misconceptions about data wrangling

Sugandha Lahoti

17 Oct 2018

4 min read

Around 80% of the time in data analysis is spent on cleaning and preparing data for analysis. This is, however, an important task, and is a prerequisite to the rest of the data analysis workflow, including visualization, analysis, and reporting. Although, being an important task given its nature, there are certain myths associated with data wrangling which developers should be cautious of. In this post, we will discuss four such misconceptions. Myth #1: Data wrangling is all about writing SQL query There was a time when data processing needed data to be presented in a relational manner so that SQL queries could be written. Today, there are many other types of data sources in addition to the classic static SQL databases, which can be analyzed. Often, an engineer has to pull data from diverse sources such as web portals, Twitter feeds, sensor fusion streams, police or hospital records. Static SQL query can help only so much in those diverse domains. A programmatic approach, which is flexible enough to interface with myriad sources and is able to parse the raw data through clever algorithmic techniques and use of fundamental data structures (trees, graphs, hash tables, heaps), will be the winner. Myth #2: Knowledge of statistics is not required for data wrangling Quick statistical tests and visualizations are always invaluable to check the ‘quality’ of the data you sourced. These tests can help detect outliers and wrong data entry, without running complex scripts. For effective data wrangling, you don’t need to have knowledge of advanced statistics. However, you must understand basic descriptive statistics and know how to execute them using built-in Python libraries. Myth #3: You have to be a machine learning expert to do great data wrangling Deep knowledge of machine learning is certainly not a pre-requisite for data wrangling. It is true that the end goal of data wrangling is often to prepare the data so that it can be used in a machine learning task downstream. As a data wrangler, you do not have to know all the nitty-gritties of your project’s machine learning pipeline. However, it is always a good idea to talk to the machine learning expert who will use your data and understand the data structure interface and format he/she needs to run the model fast and accurately. Myth #4: Deep knowledge of programming is not required for data wrangling As explained above, the diversity and complexity of data sources require that you are comfortable with deep notions of fundamental data structures and how a programming language paradigm handles them. Increasing deep knowledge of the programming framework (Python for example) will surely help you to come up with innovative methods for dealing with data source interfacing and data cleaning issues. The speed and efficiency of your data processing pipeline can often be benefited from using advanced knowledge of basic algorithms e.g. search, sort, graph traversal, hash table building, etc. Although built-in methods in standard libraries are optimized, having this knowledge gives you an edge for any situation. You read a guest post from Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of Data Wrangling with Python. We hope that these misconceptions would help you realize that data wrangling is not as difficult as it seems. Have fun wrangling data! About the authors Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris based Cyber Security startup. He holds a Master Degree in Computer Science from West Bengal University Of Technology and certifications in Machine Learning from Stanford. Don’t forget to check out Data Wrangling with Python to learn the essential basics of data wrangling using Python. 30 common data science terms explained Python, Tensorflow, Excel and more – Data professionals reveal their top tools How to create a strong data science project portfolio that lands you a job

0
0
3745

article-image-how-is-artificial-intelligence-changing-the-mobile-developer-role

Bhagyashree R

15 Oct 2018

10 min read

How is Artificial Intelligence changing the mobile developer role?

Bhagyashree R

15 Oct 2018

10 min read

0
0
6014

article-image-4-myths-about-git-and-github-you-should-know-about

Savia Lobo

07 Oct 2018

3 min read

4 myths about Git and GitHub you should know about

Savia Lobo

07 Oct 2018

3 min read

With an aim to replace BitKeeper, Linus Torvalds created Git in 2005 to support the development of the Linux kernel. However, Git isn’t necessarily limited to code, any product or project that requires or exhibits characteristics such as having multiple contributors, requiring release management and versioning stands to have an improved workflow through Git. Just as every solution or tool has its own positives and negatives, Git is also surrounded by myths. Alex Magana and Joseph Mul, the authors of Introduction to Git and GitHub course discuss in this post some of the myths about the Git tool and GitHub. Git is GitHub Due to the usage of Git and GitHub as the complete set that forms the version control toolkit, adopters of the two tools misconceive Git and GitHub as interchangeable tools. Git is a tool that offers the ability to track changes on files that constitute a project. Git offers the utility that is used to monitor changes and persists the changes. On the other hand, GitHub is akin to a website hosting service. The difference here is that with GitHub, the hosted content is a repository. The repository can then be accessed from this central point and the codebase shared. Backups are equivalent to version control This emanates from a misunderstanding of what version control is and by extension what Git achieves when it’s incorporated into the development workflow. Contrary to archives created based on a team’s backup policy, Git tracks changes made to files and maintains snapshots of a repository at a given point in time. Git is only suitable for teams With the usage of hosting services such as GitHub, the element of sharing and collaboration, may be perceived as a preserve of teams. Git offers gains beyond source control. It lends itself to the delivery of a feature or product from the point of development to deployment. This means that Git is a tool for delivery. It can, therefore, be utilized to roll out functionality and manage changes to source code for teams and individuals alike. To effectively use Git, you need to learn every command to work When working as an individual or a team, the common commands required to allow you to contribute a repository encompass commands for initiating tracking of specific files, persisting changes made to tracked files, reverting changes made to files incorporating changes introduced by other developers working on the same project you are on. The four myths mentioned by the authors provides a clarification on both Git and GitHub and its uses. If you found this post useful, do check out the course titled Introduction to Git and GitHub by Alex and Joseph. GitHub addresses technical debt, now runs on Rails 5.2.1 GitLab 11.3 released with support for Maven repositories, protected environments and more GitLab raises $100 million, Alphabet backs it to surpass Microsoft’s GitHub

0
0
3615

article-image-what-role-does-linux-play-in-securing-android-devices

Sugandha Lahoti

07 Oct 2018

9 min read

What role does Linux play in securing Android devices?

Sugandha Lahoti

07 Oct 2018

9 min read

In this article, we will talk about the Android Model particularly the Linux Kernel layer, over which Android is built. We will also talk about Android's security features and offerings and how Linux plays a role to secure Android OS. This article is taken from the book Practical Mobile Forensics - Third Edition by Rohit Tamma et al. In this book, you will investigate, analyze, and report iOS, Android, and Windows devices. The Android architecture Android is open source and the code is released under the Apache license. Practically, this means anyone (especially device manufacturers) can access it, freely modify it, and use the software according to the requirements of any device. This is one of the primary reasons for its wide acceptance. Notable players that use Android include Samsung, HTC, Sony, and LG. As with any other platform, Android consists of a stack of layers running one above the other. To understand the Android ecosystem, it's essential to have a basic understanding of what these layers are and what they do. The following figure summarizes the various layers involved in the Android software stack: Android architecture Each of these layers performs several operations that support specific operating system functions. Each layer provides services to the layers lying on top of it. The Linux kernel layer Android OS is built on top of the Linux kernel, with some architectural changes made by Google. There are several reasons for choosing the Linux kernel. Most importantly, Linux is a portable platform that can be compiled easily on different hardware. The kernel acts as an abstraction layer between the software and hardware present on the device. Consider the case of a camera click. What happens when you take a photo using the camera button on your device? At some point, the hardware instruction (pressing a button) has to be converted to a software instruction (to take a picture and store it in the gallery). The kernel contains drivers to facilitate this process. When the user presses on the button, the instruction goes to the corresponding camera driver in the kernel, which sends the necessary commands to the camera hardware, similar to what occurs when a key is pressed on a keyboard. In simple words, the drivers in the kernel command control the underlying hardware. The Linux kernel is responsible for managing the core functionality of Android, such as process management, memory management, security, and networking. Linux is a proven platform when it comes to security and process management. Android has taken leverage of the existing Linux open source OS to build a solid foundation for its ecosystem. Each version of Android has a different version of the underlying Linux kernel. The Marshmallow Android version is known to use Linux kernel 3.18.10, whereas the Nougat version is known to use Linux kernel 4.4.1. Android security Android was designed with a specific focus on security. Android as a platform offers and enforces certain features that safeguard the user data present on the mobile through multi-layered security. There are certain safe defaults that will protect the user, and certain offerings that can be leveraged by the development community to build secure applications. The following are issues that are to be kept in mind while incorporating Android security controls: Protecting user-related data Safeguarding the system resources Making sure that one application cannot access the data of another application The next few sections will help us understand more about Android's security features and offerings. Secure kernel Linux has evolved as a trusted platform over the years, and Android has leveraged this fact using it as its kernel. The user-based permission model of Linux has, in fact, worked well for Android. As mentioned earlier, there is a lot of specific code built into the Linux kernel. With each Android version release, the kernel version has also changed. The following table shows Android versions and their corresponding kernel versions: Android version Linux kernel version 1 2.6.25 1.5 2.6.27 1.6 2.6.29 2.2 2.6.32 2.3 2.6.35 3.0 2.6.36 4.0 3.0.1 4.1 3.0.31 4.2 3.4.0 4.2 3.4.39 4.4 3.8 5.0 3.16.1 6.0 3.18.1 7.0 4.4.1 The permission model As shown in the following screenshot, any Android application must be granted permissions to access sensitive functionality, such as the internet, dialer, and so on, by the user. This provides an opportunity for the user to know in advance which functions on the device is being accessed by the application. Simply put, it requires the user's permission to perform any kind of malicious activity (stealing data, compromising the system, and so on). This model helps the user to prevent attacks, but if the user is unaware and gives away a lot of permissions, it leaves them in trouble (remember, when it comes to installing malware on any device, the weakest link is always the user). Until Android 6.0, users needed to grant the permissions during install time. Users had to either accept all the permissions or not install the application. But, starting from Android 6.0, users grant permissions to apps while the app is running. This new permission system also gives the user more control over the app's functionality by allowing the user to grant selective permissions. For example, a user can deny a particular app access to his location but provide access to the internet. The user can revoke the permissions at any time by going to the app's Settings screen. Application sandbox In Linux systems, each user is assigned a unique user ID (UID), and users are segregated so that one user cannot access the data of another user. However, all applications under a particular user are run with the same privileges. Similarly, in Android, each application runs as a unique user. In other words, a UID is assigned to each application and is run as a separate process. This concept ensures an application sandbox at the kernel level. The kernel manages the security restrictions between the applications by making use of existing Linux concepts, such as UID and GID. If an application attempts to do something malicious, say to read the data of another application, this is not permitted as the application does not have user privileges. Hence, the operating system protects an application from accessing the data of another application. Secure inter-process communication Android offers secure inter-process communication through which one's activity in an application can send messages to another activity in the same application or a different application. To achieve this, Android provides inter-process communication (IPC) mechanisms: intents, services, content providers, and so on. Application signing It is mandatory that all of the installed applications are digitally signed. Developers can place their applications in Google's Play Store only after signing the applications. The private key with which the application is signed is held by the developer. Using the same key, a developer can provide updates to their application, share data between the applications, and so on. Security-Enhanced Linux Security-Enhanced Linux (SELinux) is a security feature that was introduced in Android 4.3 and fully enforced in Android 5.0. Until this addition, Android security was based on Discretionary Access Control (DAC), which means applications can ask for permissions, and users can grant or deny those permissions. Thus, malware can create havoc on phones by gaining those permissions. But, SE Android uses Mandatory Access Control (MAC), which ensures that applications work in isolated environments. Hence, even if a user installs a malware app, the malware cannot access the OS and corrupt the device. SELinux is used to enforce MAC over all the processes, including the ones running with root privileges. SELinux operates on the principle of default denial: anything that is not explicitly allowed is denied. SELinux can operate in one of the two global modes: permissive mode, in which permission denials are logged but not enforced, and enforcing mode, in which denials are both logged and enforced. Full Disk Encryption With Android 6.0 Marshmallow, Google has mandated Full Disk Encryption (FDE) for most devices, provided that the hardware meets certain minimum standards. Encryption is the process of converting data into cipher text using a secret key. On Android devices, full disk encryption refers to the process of encrypting all user data using a secret key. This key is then encrypted by the lock screen PIN/pattern/password before being securely stored in a trusted location. Once a device is encrypted, all user-created data is automatically encrypted before writing it to disk, and all reads automatically decrypt data before returning it to the calling process. Full disk encryption in Android works only with an Embedded Multimedia Card (eMMC) and similar flash devices that present themselves to the kernel as block devices. Staring from Android 7.x, Google decided to shift the encryption feature from full-disk encryption to file-based encryption. In file-based encryption, different files are encrypted with different keys. By doing so, those files can be unlocked independently without requiring an entire partition to be decrypted at once. As a result of this, the system can now decrypt and use files needed to boot the system, and open notifications without having to wait until the user unlocks the phone. Trusted Execution Environment Trusted Execution Environment (TEE) is an isolated area (typically a separate microprocessor) intended to guarantee the security of data stored inside it, and also to execute code with integrity. The main processor on mobile devices is considered untrusted and cannot be used to store secret data (such as cryptographic keys). Hence, TEE is used specifically to perform such operations, and the software running on the main processor delegates any operations that require the use of secret data to the TEE processor. Thus we talked about the Linux Kernel layer, over which Android is built. We also talked about Android's security features and offerings and how Linux plays a role to secure Android OS. To learn more about methods for accessing the data stored on Android devices, read our book Practical Mobile Forensics - Third Edition. The kernel community attempting to make Linux more secure. Google open sources Filament – a physically based rendering engine for Android, Windows, Linux and macOS Google becomes a new platinum member of the Linux Foundation

0
0
9624

article-image-6-common-use-cases-of-reverse-proxy-scenarios

Guest Contributor

05 Oct 2018

6 min read

6 common use cases of Reverse Proxy scenarios

Guest Contributor

05 Oct 2018

6 min read

Proxy servers are used as intermediaries between a client and a website or online service. By routing traffic through a proxy server, users can disguise their geographic location and their IP address. Reverse proxies, in particular, can be configured to provide a greater level of control and abstraction, thereby ensuring the flow of traffic between clients and servers remains smooth. This makes them a popular tool for individuals who want to stay hidden online, but they are also widely used in enterprise settings, where they can improve security, allow tasks to be carried out anonymously, and control the way employees are able to use the internet. What is a Reverse Proxy? A reverse proxy server is a type of proxy server that usually exists behind the firewall of a private network. It directs any client requests to the appropriate server on the backend. Reverse proxies are also used as a means of caching common content and compressing inbound and outbound data, resulting in a faster and smoother flow of traffic between clients and servers. Furthermore, the reverse proxy can handle other tasks, such as SSL encryption, further reducing the load on web servers. There is a multitude of scenarios and use cases in which having a reverse proxy can make all the difference to the speed and security of your corporate network. By providing you with a point at which you can inspect traffic and route it to the appropriate server, or even transform the request, a reverse proxy can be used to achieve a variety of different goals. Load Balancing to route incoming HTTP requests This is probably the most familiar use of reverse proxies for many users. Load balancing involves the proxy server being configured to route incoming HTTP requests to a set of identical servers. By spreading incoming requests across these servers, the reverse proxies are able to balance out the load, therefore sharing it amongst them equally. The most common scenario in which load balancing is employed is when you have a website that requires multiple servers. This happens due to the volume of requests, which are too much for one server to handle efficiently. By balancing the load across multiple servers, you can also move away from an architecture that features a single point of failure. Usually, the servers will all be hosting the same content, but there are also situations in which the reverse proxy will also be retrieving specific information from one of a number of different servers. Provide security by monitoring and logging traffic By acting as the mediator between clients and your system’s backend, a reverse proxy server can hide the overall structure of your backend servers. This is because the reverse proxy will capture any requests that would otherwise go to those servers and handle them securely. A reverse proxy can also improve security by providing businesses with a point at which they can monitor and log traffic flowing through their network. A common use case in which a reverse proxy is used to bolster the security of a network would be the use of a reverse proxy as an SSL gateway. This allows you to communicate using HTTP behind the firewall without compromising your security. It also saves you the trouble of having to configure security for each server behind the firewall individually. A rotating residential proxy, also known as a backconnect proxy, is a type of proxy that frequently changes the IP addresses and connections that the user uses. This allows users to hide their identity and generate a large number of requests without setting alarms off. A reverse rotating residential proxy can be used to improve the security of a corporate network or website. This is because the servers in question will display the information for the proxy server while keeping their own information hidden from potential attackers. No need to install certificates on your backend servers with SSL Termination SSL termination process occurs when an SSL connection server ends, or when the traffic shifts between encrypted and unencrypted requests. By using a reverse proxy to handle any incoming HTTPS connections, you can have the proxy server decrypt the request, and then pass on the unencrypted request to the appropriate server. Taking this approach offers practical benefits. For example, it eliminates the need to install certificates on your backend servers. It also provides you with a single configuration point for managing SSL/TLS. Removing the need for your web servers to undertake this decryption means that you are also reducing the processing load on the server. Serve static content on behalf of backend servers Some reverse proxy servers can be configured to also act as web servers. Websites contain a mixture of dynamic content, which changes over time, and static content, which always remains the same. If you can configure your reverse proxy server to serve up static content on behalf of backend servers, you can greatly reduce the load, freeing up more power for dynamic content rendering. Alternatively, a reverse proxy can be configured to behave like a cache. This allows it to store and serve content that is frequently requested, thereby further reducing the load on backend servers. URL Rewriting before they go on to the backend servers Anything that a business can do to easily to improve their SEO score is worth considering. Without an investment in your SEO, your business or website will remain invisible to search engine users. With URL rewriting, you can compensate for any legacy systems you use, which produce URLs that are less than ideal for SEO. With a reverse proxy server, the URLs can be automatically reformatted before they are passed on to the backend servers. Combine Different Websites into a Single URL Space It is often desirable for a business to adopt a distributed architecture whereby different functions are handled by different components. With a reverse proxy, it is easy to route a single URL to a multitude of components. To anyone who uses your URL, it will simply appear as if they are moving to another page on the website. In fact, each page within that URL might actually be connecting to a completely different backend service. This is an approach that is widely used for web service APIs. To sum up, the primary function of a reverse proxy is load balancing, ensuring that no individual backend server becomes inundated with more traffic or requests than it can handle. However, there are a number of other scenarios in which a reverse proxy can potentially offer enormous benefits. About the author Harold Kilpatrick is a cybersecurity consultant and a freelance blogger. He's currently working on a cybersecurity campaign to raise awareness around the threats that businesses can face online. Read Next HAProxy introduces stick tables for server persistence, threat detection, and collecting metrics How to Configure Squid Proxy Server Acting as a proxy (HttpProxyModule)

0
0
26906

article-image-what-is-statistical-analysis-and-why-does-it-matter

Sugandha Lahoti

02 Oct 2018

6 min read

What is Statistical Analysis and why does it matter?

Sugandha Lahoti

02 Oct 2018

6 min read

0
0
4421

article-image-9-reasons-to-choose-agile-methodology-for-mobile-app-development

Guest Contributor

01 Oct 2018

6 min read

9 reasons to choose Agile Methodology for Mobile App Development

Guest Contributor

01 Oct 2018

6 min read

0
0
26106

article-image-is-youtubes-ai-algorithm-evil

Amarabha Banerjee

30 Sep 2018

6 min read

Is YouTube's AI Algorithm evil?

Amarabha Banerjee

30 Sep 2018

6 min read

YouTube is at the center of content creation, content distribution, and advertising activities for some time now. The impact of YouTube can be estimated from the 1.8 billion YouTube users worldwide. While the YouTube video hosting concept has been a great success story for content creators, the video viewing and recommendation model has been in the middle of a brewing controversy lately. The Controversy Logan Paul was already a top rated YouTube star when he stumbled across a hanging dead body in a Japanese forest which is famous as a suicide spot. After the initial shock and awe, Logan Paul seemed quite amused and commented “Dude, his hands are purple,” then he turned to his friends and giggled. “You ever stand next to a dead guy?”. This particular instance was a shocking moment for YouTubers all across the globe. Disapproving reactions had poured in and the video was taken down 24 hours later by YouTube. In those 24 hours, the video managed to garner 6 million views. Even after the furious backlash, users complained that they were still seeing recommendations of Logan Paul’s videos. That brought the emphasis back on the recommendation system that YouTube uses. YouTube Video Recommendation Back in 2005, when YouTube first started out, it had a uniform homepage for all users. This meant that every YouTube user would see the same homepage and the creators who would feature there, would get a huge boost in their viewership. Their selection was based on their subscriber count, views and user engagement metrics e.g. likes, comments, shares etc. This inspired other users to become creators and start contributing content to become a part of the YouTube family. In 2006, YouTube was bought by Google. Their policies and homepage started evolving gradually. As ads started showing on YouTube videos, the scenario changed quite quickly. Also, with the rapid rise in the number of users, Google had thought it to be a good idea to curate the homepage as per each user’s watch history, subscriptions, and likes. This was a good move in principle since it helped the users to see what they wanted to see. As a part of their next level innovation, a machine learning model was created to suggest or recommend videos to users. The goal of this deep neural network based recommendation engine was to increase watch time of every video so that users stay longer on the platform. What did it change and How When Youtube’s machine learning algorithm shows a few videos in your feed as “Recommended for you”, it predicts what you want to see from your watch history and watch history of similar users. If you interact with any of these videos and watch it for a certain amount of time, the recommendation engine considers it as a success and starts curating a list based on your interactions with its suggested videos. The more data it gathers about your choices and watch history, the more confident it becomes of its own video decisions. The major goal of Youtube’s recommendation engine is to attract your attention and get you hooked to the platform to get more watch time. More watch time means more revenue and more scope for targeted ads. What this changes, is the fundamental concept of choice and the exercising of user discretion. The moment the YouTube Algorithm considers watch time as the most important metric to recommend videos to you, less importance goes into the organic interactions on YouTube, which includes liking, commenting and subscribing to videos and channels. Users get to see video recommendations based on the YouTube Algorithm’s user understanding and its goal of maximizing watch time, with less importance given to user choices. Distorted Reality and YouTube This attention maximizing model is the fundamental working mechanism of mostly all social media networks. But YouTube has not been implicated in the accusation of distorting reality and spreading the fake news as much as Facebook has been in mainstream media. But times are changing and so are the viewpoints related to YouTube’s influence on the global population and its ability to manipulate important public opinion. Guillaume Chaslot, a 36-year-old French computer programmer with a Ph.D. in artificial intelligence, was one of those engineers who was in the core team to develop and perfect the YouTube algorithm. In his own words “YouTube is something that looks like reality, but it is distorted to make you spend more time online. The recommendation algorithm is not optimizing for what is truthful, or balanced, or healthy for democracy.” Chaslot explains that the algorithm never stays the same. It is constantly changing the weight it gives to different signals; the viewing patterns of a user, for example, or the length of time a video is watched before someone clicks away.” Chaslot was fired by Google in 2013 over performance issues. His claim was that he wanted to bring about a change in the approach of the YouTube algorithm to make it more aligned with democratic values instead of being devoted to just increasing the watch time. Where are we headed I am not qualified or righteous enough to answer the direct question - is YouTube good or bad. YouTube creates opportunities for millions of creators worldwide to showcase their talent and present it to a global audience without worrying about country or boundaries. This itself is a huge power for an internet application. But the crucial point to remember here is whether YouTube is using this power to just make the users glued to the screen. Do they really care if you are seeing divisive content or prejudiced flat earther conspiracies as recommended videos? The algorithm can be tweaked to include parameters which will remove unintended bias such as whether a video is propagating fake news or influencing voters minds in an unlawful way. But that is near impossible as machines lack morality or empathy or even common sense. To incorporate humane values such as honesty and morality into an AI system is like creating an AI that is more human than a machine. This is why machine augmented human intelligence will play a more and more crucial role in the near future. The possibilities are endless, be it good or bad. Whether we progress or digress, might not be in our hands anymore. But what might be in our hands is to come together to put effective checkpoints to identify and course correct scenarios where algorithms rule wild. Sex robots, artificial intelligence, and ethics: How desire shapes and is shaped by algorithms Like newspapers, Google algorithms are protected by the First amendment California replaces cash bail with algorithms

0
0
7801

Savia Lobo

28 Sep 2018

5 min read

What is Core ML?

Savia Lobo

28 Sep 2018

5 min read

Introduced by Apple, CoreML is a machine learning framework that powers the iOS app developers to integrate machine learning technology into their apps. It supports natural language processing (NLP), image analysis, and various other conventional models to provide a top-notch on-device performance with minimal memory footprint and power consumption. This article is an extract taken from the book Machine Learning with Core ML written by Joshua Newnham. In this article, you will get to know the basics of what CoreML is and its typical workflow. With the release of iOS 11 and Core ML, performing inference is just a matter of a few lines of code. Prior to iOS 11, inference was possible, but it required some work to take a pre-trained model and port it across using an existing framework such as Accelerate or metal performance shaders (MPSes). Accelerate and MPSes are still used under the hood by Core ML, but Core ML takes care of deciding which underlying framework your model should use (Accelerate using the CPU for memory-heavy tasks and MPSes using the GPU for compute-heavy tasks). It also takes care of abstracting a lot of the details away; this layer of abstraction is shown in the following diagram: There are additional layers too; iOS 11 has introduced and extended domain-specific layers that further abstract a lot of the common tasks you may use when working with image and text data, such as face detection, object tracking, language translation, and named entity recognition (NER). These domain-specific layers are encapsulated in the Vision and natural language processing (NLP) frameworks; we won't be going into any details of these frameworks here, but you will get a chance to use them in later chapters: It's worth noting that these layers are not mutually exclusive and it is common to find yourself using them together, especially the domain-specific frameworks that provide useful preprocessing methods we can use to prepare our data before sending to a Core ML model. So what exactly is Core ML? You can think of Core ML as a suite of tools used to facilitate the process of bringing ML models to iOS and wrapping them in a standard interface so that you can easily access and make use of them in your code. Let's now take a closer look at the typical workflow when working with Core ML. CoreML Workflow As described previously, the two main tasks of a ML workflow consist of training and inference. Training involves obtaining and preparing the data, defining the model, and then the real training. Once your model has achieved satisfactory results during training and is able to perform adequate predictions (including on data it hasn't seen before), your model can then be deployed and used for inference using data outside of the training set. Core ML provides a suite of tools to facilitate getting a trained model into iOS, one being the Python packaged released called Core ML Tools; it is used to take a model (consisting of the architecture and weights) from one of the many popular packages and exporting a .mlmodel file, which can then be imported into your Xcode project. Once imported, Xcode will generate an interface for the model, making it easily accessible via code you are familiar with. Finally, when you build your app, the model is further optimized and packaged up within your application. A summary of the process of generating the model is shown in the following diagram: The previous diagram illustrates the process of creating the .mlmodel;, either using an existing model from one of the supported frameworks, or by training it from scratch. Core ML Tools supports most of the frameworks, either internal or as third party plug-ins, including Keras, Turi, Caffe, scikit-learn, LibSVN, and XGBoost frameworks. Apple has also made this package open source and modular for easy adaption for other frameworks or by yourself. The process of importing the model is illustrated in this diagram: In addition; there are frameworks with tighter integration with Core ML that handle generating the Core ML model such as Turi Create, IBM Watson Services for Core ML, and Create ML. We will be introducing Create ML in chapter 10; for those interesting in learning more about Turi Create and IBM Watson Services for Core ML then please refer to the official webpages via the following links: Turi Create; https://github.com/apple/turicreate IBM Watson Services for Core ML; https://developer.apple.com/ibm/ Once the model is imported, as mentioned previously, Xcode generates an interface that wraps the model, model inputs, and outputs. Thus, in this post, we learned about the workflow of training and how to import a model. If you've enjoyed this post, head over to the book Machine Learning with Core ML to delve into the details of what this model is and what Core ML currently supports. Emotional AI: Detecting facial expressions and emotions using CoreML [Tutorial] Build intelligent interfaces with CoreML using a CNN [Tutorial] Watson-CoreML: IBM and Apple’s new machine learning collaboration project

0
0
6147

Uses of Machine Learning in Gaming

How artificial intelligence can improve pentesting

Julia for machine learning. Will the new language pick up pace?

4 key benefits of using Firebase for mobile app development

Why did Uber created Hudi, an open source incremental processing framework on Apache Hadoop?

5 best practices to perform data wrangling with Python

4 misconceptions about data wrangling

How is Artificial Intelligence changing the mobile developer role?

4 myths about Git and GitHub you should know about

What role does Linux play in securing Android devices?

Trending Topics

6 common use cases of Reverse Proxy scenarios

What is Statistical Analysis and why does it matter?

9 reasons to choose Agile Methodology for Mobile App Development

Is YouTube's AI Algorithm evil?

What is Core ML?