
Tech Guides - Data


Facebook plans to use Bloomsbury AI to fight fake news

Pravin Dhandre
30 Jul 2018
3 min read
“Our investments in AI mean we can now remove more bad content quickly because we don't have to wait until after it's reported. It frees our reviewers to work on cases where human expertise is needed to understand the context or nuance of a situation. In Q1, for example, almost 90% of graphic violence content that we removed or added a warning label to was identified using AI. This shift from reactive to proactive detection is a big change -- and it will make Facebook safer for everyone.”

- Mark Zuckerberg, on Facebook's earnings call this week

To understand the significance of this statement, we must first look at the past. Last year, social media giant Facebook faced multiple defamation lawsuits across the UK, Germany, and the US over fake news articles and the spread of misleading information. To make amends, Facebook rolled out fake news identification tools; however, these failed to fully contain the effects of bogus news. In fact, the company took a bad hit in advertising revenue, and its social reputation nosedived.

Early this month, Facebook confirmed the acquisition of Bloomsbury AI, a London-based artificial intelligence start-up with over 60 patents acquired to date. Bloomsbury AI focuses on natural language processing, developing machine reading methods that can understand written text across a broad range of domains. The Artificial Intelligence team at Facebook will be on-boarding the complete Bloomsbury AI team to build robust methods for stamping out fake news across the Facebook platform. The rich expertise the Bloomsbury AI team brings will strengthen Facebook's natural language processing research and help it gain a deeper understanding of natural language and its applications.
It appears that the amalgamation will help Facebook develop advanced machine reading, reasoning, and question answering methods, boosting Facebook's NLP engine to assess the legitimacy of content across a broad range of topics and make intelligent choices, thereby tackling the challenges of fake news and automated bots. No doubt, Facebook is going to leverage Bloomsbury's Cape service to answer a majority of questions over unstructured text. The combined team would also play a significant role in parsing content, particularly to tackle fake photos and videos. In addition, it has been said that the new team members will contribute actively to ongoing artificial intelligence projects such as AI hardware chips and AI technology mimicking humans.

Read next:
Facebook is investigating data analytics firm Crimson Hexagon over misuse of data
Google, Microsoft, Twitter, and Facebook team up for Data Transfer Project
Did Facebook just have another security scare?


Why should enterprises use Splunk?

Sunith Shetty
25 Jul 2018
4 min read
Splunk is a multinational software company that offers its core platform, Splunk Enterprise, along with many related offerings built on top of it. The platform serves a wide variety of organizational personas, such as analysts, operators, developers, testers, managers, and executives, giving them analytical insights from machine-created data. It collects and stores this data and provides powerful analytical capabilities, enabling organizations to act on the often powerful insights derived from it.

The Splunk Enterprise platform was built with IT operations in mind. When companies had IT infrastructure problems, troubleshooting and solving them was immensely difficult, complicated, and manual. Splunk was built to collect log files from IT systems and make them searchable and accessible. It is commonly used for information security and development operations, as well as more advanced use cases involving custom machines, the Internet of Things, and mobile devices. Most organizations start using Splunk in one of three areas: IT operations management, information security, or development operations (DevOps). In today's post, we will look at how Splunk can be applied at an organizational level. This article is an excerpt from a book written by J-P Contreras, Erickson Delgado and Betsy Page Sigman titled Splunk 7 Essentials, Third Edition.

IT operations

IT operations have moved from predominantly being a cost center to also being a revenue center. Today, many of the world's oldest companies also make money from IT services and/or systems. As a result, the delivery of these IT services must be monitored and, ideally, proactively remedied before failures occur. Ensuring that hardware such as servers, storage, and network devices is functioning properly via its log data is important. Organizations can also log and monitor mobile and browser-based software applications for any issues.
Ultimately, organizations will want to correlate these sets of data to get a complete picture of IT health. In this regard, Splunk takes the expertise accumulated over the years and offers a paid-for application known as IT Service Intelligence (ITSI) to give companies a framework for tackling large IT environments. Complicating matters for many traditional organizations is the use of cloud computing technologies, which now drive log capture from both internally and externally hosted systems.

Cybersecurity

With the relentless focus in today's world on cybersecurity, there is a good chance your organization will need a tool such as Splunk to address a wide variety of information security needs as well. It acts as a log data consolidation and reporting engine, capturing essential security-related log data from devices and software such as vulnerability scanners, phishing prevention, firewalls, and user management and behavior, just to name a few. Companies need to ensure they are protected from external as well as internal threats, and for this Splunk offers the paid-for applications Enterprise Security and User Behavior Analytics (UBA). Similar to ITSI, these applications deliver frameworks to help companies meet their specific requirements in these areas. In addition to cybersecurity to protect the business, companies often have to comply with, and audit against, specific security standards, which can be industry-related, such as PCI compliance of financial transactions; customer-related, such as National Institute of Standards and Technology (NIST) requirements for working with the US government; or data privacy-related, such as the Health Insurance Portability and Accountability Act (HIPAA) or the European Union's General Data Protection Regulation (GDPR).
Software development and support operations

Commonly referred to as DevOps: Splunk's ability to ingest and correlate data from many sources solves many challenges faced in software development, testing, and release cycles. Using Splunk helps teams deliver higher quality software more efficiently. Then, with these controls in place, it provides visibility into released software, its use, and user behavior changes, intended or not. This set of use cases is particularly applicable to organizations that develop their own software.

Internet of Things

Many organizations today are looking to build upon the converging trends in computing, mobility, wireless communications, and data to capture data from more and more devices. Examples include data captured from sensors placed on machinery such as wind turbines, trains, and heating and cooling systems. These sensors expose the data they capture in standard formats such as JavaScript Object Notation (JSON) through application programming interfaces (APIs).

To summarize, we saw how Splunk can be used at an organizational level for IT operations, cybersecurity, software development and support, and the Internet of Things. To know more about how Splunk can be used to make informed decisions in these areas, do check out the book Splunk 7 Essentials, Third Edition.

Read next:
Create a data model in Splunk to enable interactive reports and dashboards
Splunk leverages AI in its monitoring tools
Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace
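The IoT scenario above, sensors exposing JSON readings through APIs, can be illustrated with a short sketch: parse a reading and shape it into a flat event that a log platform such as Splunk could then index. The payload, field names, and event shape here are hypothetical, not any particular Splunk input format.

```python
import json
from datetime import datetime, timezone

# A hypothetical JSON payload, as a wind-turbine sensor might expose via an API.
raw = '{"turbine_id": "WT-042", "timestamp": 1532044800, "rpm": 14.2, "temp_c": 63.5}'

def to_event(raw_json):
    """Parse one sensor reading and flatten it into an event dict,
    the kind of structure a log platform could index."""
    reading = json.loads(raw_json)
    return {
        "time": datetime.fromtimestamp(reading["timestamp"], tz=timezone.utc).isoformat(),
        "source": reading["turbine_id"],
        "fields": {"rpm": reading["rpm"], "temp_c": reading["temp_c"]},
    }

event = to_event(raw)
print(event["source"])  # WT-042
```

In practice a collector would batch such events and forward them to the indexing tier; the point here is only the normalization step from raw device JSON to a uniform event.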


What is interactive machine learning?

Amey Varangaonkar
23 Jul 2018
4 min read
Machine learning is a useful and effective tool for building prediction models or extracting a useful structure from an avalanche of data. Many ML algorithms are in use today across a variety of real-world use cases. Given a sample dataset, a machine learning model can give predictions only with a certain accuracy, which largely depends on the quality of the training data fed to it. Is there a way to increase the prediction accuracy by somehow involving humans in the process? The answer is yes, and the solution is called 'Interactive Machine Learning'.

Why we need interactive machine learning

As discussed above, a model's predictions are only as good as the quality of the training data fed to it. If the quality of the training data is not good enough, the model might:

Take more time to learn before it gives accurate predictions
Produce predictions of very poor quality

This challenge can be overcome by involving humans in the machine learning process. By incorporating human feedback in the model training process, a model can be trained faster and more efficiently to give more accurate predictions. In the widely adopted machine learning approaches, including supervised and unsupervised learning, or even active learning for that matter, there is no way to include human feedback in the training process to improve the accuracy of predictions. In the case of supervised learning, for example, the data is already pre-labelled and is used without any actual input from a human during the training process. For this reason alone, the concept of interactive machine learning is seen by many machine learning and AI experts as a breakthrough.

How interactive machine learning works

Machine learning researchers Teng Lee, James Johnson and Steve Cheng have suggested a novel way to include human inputs to improve the performance and predictions of a machine learning model.
Their approach is called the 'Transparent Boosting Tree' (TBT) algorithm, a very interesting way to combine the advantages of machine learning and human inputs in the final decision-making process. TBT visualizes the model and the prediction details of each step in the machine learning process for the user, takes his/her feedback, and incorporates it into the learning process. The ML model is in charge of updating the weights assigned to the inputs, and of filtering the information shown to the user for feedback. Once the feedback is received, it is incorporated by the ML model as part of the learning process, thus improving it. [Figure: a basic flowchart of the interactive machine learning process.] More in-depth information on how interactive machine learning works can be found in their paper.

What can interactive machine learning do for businesses

With the rising popularity and application of AI across all industry verticals, humans may have a key role to play in the learning process of an algorithm, apart from just coding it. While observing the algorithm's outputs or evaluations in the form of visualizations or plain predictions, humans can suggest ways to improve those predictions by giving feedback in the form of inputs such as labels, corrections, or rankings. This helps the models in two ways:

It increases the prediction accuracy
The time taken for the algorithm to learn is shortened considerably

Both advantages can be invaluable to businesses as they look to incorporate AI and machine learning in their processes and look for faster and more accurate predictions. Interactive machine learning is still in its nascent stage and we can expect more developments in the domain to surface in the coming days. Once production-ready, it will undoubtedly be a game-changer.
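The feedback loop described above can be sketched in miniature. This is not the Transparent Boosting Tree itself (which works on boosted tree models); it is a deliberately tiny stand-in, a one-dimensional threshold classifier, showing the same train → show predictions → collect corrections → retrain pattern:

```python
# A deliberately simplified sketch of an interactive learning loop --
# not the TBT algorithm, just the feedback pattern it relies on.

def train_threshold(points, labels):
    """Pick the decision threshold that best separates label 0 from label 1."""
    best_t, best_acc = None, 0.0
    for t in sorted(points):
        preds = [1 if p >= t else 0 for p in points]
        acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

points = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 1, 1, 1, 1]        # the label for 3.0 is noisy
model_t = train_threshold(points, labels)   # boundary lands at 3.0

# The human in the loop: shown the model's predictions, an expert
# corrects the noisy label, and the model retrains on the feedback.
labels[2] = 0
model_t = train_threshold(points, labels)
print(model_t)  # 6.0 -- the boundary moves past the corrected point
```

Real interactive systems replace the threshold model with a tree ensemble and the hard-coded correction with a user interface, but the control flow is the same.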
Read next:
Active Learning: An approach to training machine learning models efficiently
Anatomy of an automated machine learning algorithm (AutoML)
How machine learning as a service is transforming cloud


Can Cryptocurrency establish a new economic world order?

Amarabha Banerjee
22 Jul 2018
5 min read
Cryptocurrency has already established one thing: there is a viable alternative to dollars and gold as a measure of wealth. Our present economic system is flawed. Cryptocurrencies, if utilized properly, can change the way the world deals with money and wealth. But can they completely overthrow the present system and create a new economic world order? To answer this, we must first understand the concept of cryptocurrencies and the premise for their creation.

Money - The weapon to control the world

Money is a measure of wealth, which translates into power. The power centers have largely remained the same throughout history, be it a monarchy, an autocracy, or a democracy. Power has shifted from kings to dictators to a few elected or selected individuals. To remain in power, they had to control the source and distribution of money. That is why, to this day, only the government can print money and distribute it among citizens. We can earn money in exchange for our time and skills, or borrow money in exchange for our future time. But there is only so much time that we can give away, and hence the present-day economy always runs on the philosophy of scarcity and demand. Money distribution follows a trickle-down approach in a pyramid structure.

[Image source: Credit Suisse]

Inception of Cryptocurrency - Delocalization of money

It is abundantly clear from the image above that while the printing of money is controlled by the powerful, the pyramidal distribution mechanism has also ensured that very little money flows to the bottom-most segments of the population. The money creators have ensured their own safety and prosperity throughout history by accumulating chunks of money for themselves. Subsequently, the global wealth gap has increased staggeringly.
This could well have triggered the rise of cryptocurrencies as a form of alternative economic system, one that, theoretically, does not just accumulate wealth at the top, but also rewards anyone interested in mining these currencies and spending their time and resources. The main concept that made this possible was the distributed computing mechanism, which has gained tremendous interest in recent times.

Distributed Computing, Blockchain & the possibilities

The foundation of our present economic system is a central power, be it a government, a ruler, or a dictator. The alternative to this central system is a distributed system, where every single node of communication holds decision-making power and is equally important to the system. If one node is cut off, the system does not fall apart; it keeps on functioning. That is what makes distributed computing terrifying for centralized economic systems: there is no single creator to attack and no central point to bring down with a violent hack.

[Image source: Medium.com]

When the white paper on cryptocurrencies was first published by the anonymous Satoshi Nakamoto, there was hope of constituting a parallel economy, where any individual with access to a mobile phone and the internet might be able to mine bitcoins and create wealth, not just for himself/herself, but for the system as well. Satoshi invented the concept of the blockchain, an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way. Blockchain was the technology on top of which the first cryptocurrency, Bitcoin, was created. The concept of Bitcoin mining seemed revolutionary at the time: the more people joined the system, the more enriched the system would become. The hope was that it would make the mainstream economic system take note and cause a major overhaul of the wealth distribution system.
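The ledger idea described above can be illustrated in a few lines of Python. This is a bare-bones sketch, not a real cryptocurrency: there is no mining, networking, or proof-of-work, only the chained-hash property that makes past transactions tamper-evident.

```python
import hashlib
import json

def make_block(prev_hash, transactions):
    """Build a block whose hash covers its contents and its predecessor's hash."""
    block = {"prev_hash": prev_hash, "transactions": transactions}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def chain_is_valid(chain):
    """A chain is valid when every block links to its predecessor's hash."""
    return all(b["prev_hash"] == prev["hash"] for prev, b in zip(chain, chain[1:]))

genesis = make_block("0" * 64, [])
b1 = make_block(genesis["hash"], [{"from": "alice", "to": "bob", "amount": 5}])
b2 = make_block(b1["hash"], [{"from": "bob", "to": "carol", "amount": 2}])
chain = [genesis, b1, b2]
print(chain_is_valid(chain))   # True

# Rewriting history is detectable: altering b1 changes its hash,
# so b2 no longer links to it even if b1 is diligently re-hashed.
b1["transactions"][0]["amount"] = 500
b1["hash"] = hashlib.sha256(json.dumps(
    {"prev_hash": b1["prev_hash"], "transactions": b1["transactions"]},
    sort_keys=True).encode()).hexdigest()
print(chain_is_valid(chain))   # False
```

In a real blockchain this verifiability is what lets thousands of mutually distrusting nodes agree on one history without a central authority.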
But sadly, none of that seems to have taken place yet.

The phase of Disillusionment

The reality is that Bitcoin mining capability was determined by system resources, and the creators had accumulated plenty of bitcoins for themselves, much like in the traditional wealth creation system. Satoshi's Bitcoin holdings were valued at $19.4 billion during the December 2017 peak, making him the 44th richest person in the world at the time. This meant that the wealth distribution system was at fault again: very few could get their hands on bitcoins as their prices in traditional currencies climbed. Governments then duly played their part, declaring trading in bitcoins illegal and cracking down on several cryptocurrency top guns. Recently, different countries have joined the bandwagon to ban cryptocurrency, and its value is now much lower. The major concern is that public skepticism might kill the hype earlier than anticipated.

[Image source: Bitcoin.com]

The Future and Hope for a better Alternative

What we must keep in mind is that Bitcoin is just one derivative of the concept of cryptocurrencies. The primary concept of distributed systems, and the resulting technology, blockchain, is still a viable and novel one. The problem with the current Bitcoin system is the distribution mechanism. Whether we will be able to tap into the distributed system concept and create a better version of the Bitcoin model, only time will tell. But for the sake of better wealth propagation and balance, we can only hope that this realignment of the economic system happens sooner rather than later.

Read next:
Blockchain can solve tech's trust issues - Imran Bashir
A brief history of Blockchain
Crypto-ML, a machine learning powered cryptocurrency platform


Polyglot persistence: what is it and why does it matter?

Richard Gall
21 Jul 2018
3 min read
Polyglot persistence is a way of storing data. It is an approach that acknowledges that there is often no one-size-fits-all solution to data storage. From the types of data you are trying to store to your application architecture, polyglot persistence is a hybrid solution to data management.

Think of polyglot programming. If polyglot programming is about using a variety of languages according to the context in which you're working, polyglot persistence applies that principle to database architecture. For example, storing transactional data in Hadoop files is possible, but makes little sense. On the other hand, processing petabytes of internet logs using a Relational Database Management System (RDBMS) would also be ill-advised. These tools were designed to tackle specific types of tasks; even though they can be co-opted to solve other problems, the cost of adapting them to do so would be enormous. It is the virtual equivalent of trying to fit a square peg into a round hole.

Polyglot persistence: an example

Consider a company that sells musical instruments and accessories online (and in a network of shops). At a high level, there are a number of problems the company needs to solve to be successful:

Attract customers to its stores (both virtual and physical).
Present them with relevant products (you would not try to sell a drum kit to a pianist, would you?!).
Once they decide to buy, process the payment and organize shipping.

To solve these problems, the company might choose from a number of technologies designed for each of them:

Store all the products in a document-based database such as MongoDB, Cassandra, DynamoDB, or DocumentDB. Document databases offer multiple advantages: flexible schema, sharding (breaking bigger databases into a set of smaller, more manageable ones), high availability, and replication, among others.
Model the recommendations using a graph-based database (such as Neo4j, Tinkerpop/Gremlin, or GraphFrames for Spark): such databases reflect the factual and abstract relationships between customers and their preferences. Mining such a graph is invaluable and can produce a more tailored offering for a customer.
For searching, use a search-tailored solution such as Apache Solr or Elasticsearch, which provides fast, indexed text-searching capabilities.
Once a product is sold, the transaction normally has a well-structured schema (product name, price, and so on). To store such data, and later process and report on it, relational databases are best suited.

With polyglot persistence, a company always chooses the right tool for the job instead of trying to coerce a single technology into solving all of its problems.

Read next:
How to optimize Hbase for the Cloud [Tutorial]
The trouble with Smart Contracts
Indexing, Replicating, and Sharding in MongoDB [Tutorial]
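The instrument-shop example can be sketched in miniature. The stores here are stand-ins chosen so the snippet runs anywhere (an in-memory dict of JSON blobs plays the document database; sqlite3 plays the RDBMS); the point is that each kind of data goes to the store whose model fits it.

```python
import json
import sqlite3

# Document store stand-in: products have differing attributes,
# so a rigid schema would hurt -- store them as flexible JSON documents.
products = {}

def save_product(doc):
    products[doc["sku"]] = json.dumps(doc)

save_product({"sku": "GTR-1", "name": "Stratocaster", "strings": 6})
save_product({"sku": "DRM-1", "name": "Drum kit", "pieces": 5})  # different fields

# Relational store stand-in: every sale has the same shape,
# so a fixed schema and SQL aggregation fit naturally.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (sku TEXT, price REAL, qty INTEGER)")
db.execute("INSERT INTO sales VALUES ('GTR-1', 699.0, 1)")
db.execute("INSERT INTO sales VALUES ('DRM-1', 450.0, 2)")

total = db.execute("SELECT SUM(price * qty) FROM sales").fetchone()[0]
print(total)                                   # 1599.0
print(json.loads(products["DRM-1"])["pieces"])  # 5
```

Swapping the dict for MongoDB and sqlite3 for a production RDBMS changes the drivers, not the design: each workload keeps the storage model it suits best.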


How Rolls Royce is applying AI and robotics for smart engine maintenance

Sugandha Lahoti
20 Jul 2018
5 min read
Rolls Royce has been working in the civil aviation domain for quite some time now to build what it calls 'intelligent engines'. The IntelligentEngine vision was first announced at the Singapore Airshow in February 2018, built around how robotics could be used to revolutionise the future of engine maintenance. Rolls Royce aims to build engines which are:

Connected, using cloud-based nodes and IoT devices, with other engines of the fleet as well as with customers and operators.
Contextually aware of their operations, constraints, and customers, using modern data analysis and big data mining techniques.
Comprehending of their own experiences and those of other engines in the fleet, using state-of-the-art machine learning and recommendation algorithms.

The company has been demonstrating steady progress and showing off its rapidly developing digital capabilities.

Using tiny SWARM robots for engine maintenance

Their latest inventions are tiny, roach-sized 'SWARM' robots capable of crawling inside airplane engines and fixing them. They look like they have crawled straight out of a Transformers movie. These robots, almost 10mm in size, can perform a visual inspection of hard-to-reach airplane engine parts. The devices will be mounted with tiny cameras providing a live video feed, allowing engineers to see what is going on inside an engine without having to take it apart. The swarm robots will be deposited in the engine by another invention, the 'snake' robots. Officially called FLARE, these snake robots are flexible enough to travel through an engine like an endoscope. Another group of robots, the INSPECT robots, is a network of periscopes permanently embedded within the engine, which can inspect engines using periscope cameras to spot and report any maintenance requirements. Current prototypes of these bots are much larger than the desired size and not quite ready for intricate repairs; they may be production-ready in about two years.
Reducing flight delays with data analysis

R2 Data Labs (Rolls Royce's data science department) offers technical insight capabilities to the company's Airline Support Teams (ASTs). ASTs generally assess incident reports submitted after disruption events or maintenance work. The Technical Insight platform helps ASTs easily capture, categorize, and collate report data in a single place. It builds a bank of high-quality data (almost 10 times the size of the database ASTs previously had access to), then analyzes it to identify trends and common issues for more insightful analytics. The platform has so far shown positive results and has been critical to achieving the company's IntelligentEngine vision. According to their blog, it reduced delays and cancellations in a particular operator's 757 fleet by 30%, worth £1.5m per year.

The social network for engines

In May 2018, the company launched an engine network app, designed to bring all of the engine data under a single roof, much as Facebook brings all your friends onto a single platform. In this app, all the crucial information about every engine in a fleet is available in one place. Much like on Facebook, each engine has a 'profile' showing data on how it has been operated, the aircraft it has been paired with, the parts it contains, and how much service life is left in each component. It also has a 'timeline' showing the complete story of the engine's operational history, and even a 'newsfeed' displaying the most important insights from across the fleet. Each engine also has a built-in recommendation algorithm which suggests future maintenance work for individual engines, based on what it learns from other similar engines in the fleet.
As Juan Carlos Cabrejas, Technical Product Manager at R2 Data Labs, writes, “This capability is essential to our IntelligentEngine vision, as it underpins our ability to build a frictionless data ecosystem across our fleets.”

Transforming Engine Health Management

Rolls-Royce is taking Engine Health Management (EHM) to a new level of connectivity. Its latest EHM system can measure thousands of parameters and monitor entirely new parts of the engine. Interestingly, the EHM has a 'talk back' feature: an operational center can ask the system to focus on one particular part or parameter of the engine, and the system listens and responds with hundreds of hours of information specifically tailored to that request. Axel Voege, Rolls-Royce's Head of Digital Operations in Germany, says, “By getting that greater level of detail, instantly, our engineering teams can work out a solution much more quickly.” This new system will go into service next year, making it their most intelligent engine yet. As IntelligentEngine makes rapid progress, the company sees itself designing, testing, and managing engines entirely through their digital twins in the near future. You can read more about the IntelligentEngine vision and other stories at the Rolls Royce site.

Read next:
Unity announces a new automotive division and two-day Unity AutoTech Summit
Apollo 11 source code: A small step for a woman, and a huge leap for 'software engineering'

Data science for non-techies: How I got started (Part 1)

Amey Varangaonkar
20 Jul 2018
7 min read
As a category manager, I manage the data science portfolio of product ideas for Packt Publishing, a leading tech publisher. In simple terms, I place informed bets on where to invest, what topics to publish on, and so on. While I have a decent idea of where the industry is heading and what data professionals are looking to learn and why, it is high time I walked in their shoes, for a couple of reasons. Basically, I want to understand the reason behind data science being the 'sexiest job of the 21st century', and whether the role is really worth all the fame and fortune. In the process, I also want to explore the underlying difficulties, challenges, and obstacles that every data scientist has had to endure at some point in his/her journey, or perhaps still does. The cherry on top is that I get to use the skills I develop to supercharge my success in my current role, which is primarily insight-driven.

This is the first of a series of posts on how I got started with data science. Today, I'm sharing my experience with devising a learning path and then gathering appropriate learning resources.

Devising a learning path

To understand the concepts of data science, I had to research a lot. There are tons and tons of resources out there, many of which are very good. Once you separate the good from the rest, it can be quite intimidating to pick the options that suit you best. Some of the primary questions that clouded my mind were:

What should be my programming language of choice? R or Python? Or something else?
What tools and frameworks do I need to learn?
What about the statistical and mathematical aspects of machine learning? How essential are they?

Two videos really helped me find the answers to these questions:

If you don't want to spend a lot of your time mastering the art of data science, there's a beautiful video on how to become a data scientist in six months. What are the questions asked in a data science interview?
What are the in-demand skills you need to master in order to get a data science job? This video on 5 Tips For Getting a Data Science Job is really helpful.

After a lot of research, including reading countless articles and blogs and discussions with experts, here is my learning plan:

Learn Python

Per the recently conducted Stack Overflow Developer Survey 2018, Python stood out as the most-wanted programming language, meaning the developers who do not yet use it want to learn it the most. As one of the most widely used general-purpose programming languages, Python finds large application in data science. Naturally, you get attracted to the best option available, and Python was the one for me. The major reasons why I chose to learn Python over other programming languages:

Very easy to learn: Python is one of the easiest programming languages to learn. Not only is the syntax clean and easy to understand, even the most complex of data science tasks can be done in a few lines of Python code.
Efficient libraries for data science: Python has a vast array of libraries suited to various data science tasks, from scraping data to visualizing and manipulating it. NumPy, SciPy, pandas, matplotlib, and Seaborn are some of the libraries worth mentioning here.
Terrific libraries for machine learning: Learning a framework or library which makes machine learning easier to perform is very important. Python has libraries such as scikit-learn and TensorFlow that make machine learning easier and a fun activity.

To make the most of these libraries, it is important to understand the fundamentals of Python. My colleague and good friend Aaron has put out a list of the top 7 Python programming books, which served as a brilliant starting point for understanding the different resources out there for learning Python. The one book that stood out for me was Learn Python Programming - Second Edition, a very good book for starting Python programming from scratch.
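As a small taste of what "a few lines of Python code" means for data work, here is a sketch that uses only the standard library, no NumPy or pandas needed yet. The city/temperature data is made up purely for illustration:

```python
import csv
import io
import statistics

# Made-up data for illustration: with pandas or NumPy installed,
# each of the steps below becomes a one-liner.
raw = """city,temp
Pune,31.5
Mumbai,29.0
Delhi,35.2
Mumbai,28.4
"""

rows = list(csv.DictReader(io.StringIO(raw)))            # parse CSV into dicts
temps = [float(r["temp"]) for r in rows]                 # extract one column

print(statistics.median(temps))                          # 30.25
print(max(rows, key=lambda r: float(r["temp"]))["city"])  # Delhi
```

Even before touching the dedicated libraries, this kind of load-extract-summarize loop is the backbone of most data manipulation tasks.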
There is also a neat skill map on Mapt, where you can progressively build up your knowledge of Python, right from the absolute basics to the most complex concepts. Another handy resource for learning the A-Z of Python is the Complete Python Masterclass. This is a slightly long course, but it will take you from the absolute fundamentals to the most advanced aspects of Python programming.

Task status: Ongoing

Learn the fundamentals of data manipulation

After learning the fundamentals of Python programming, the plan is to head straight to the Python-based libraries for data manipulation, analysis and visualization. Some of the major ones are what we already discussed above, and the plan is to learn them in the following order:

- NumPy: used primarily for numerical computing
- pandas: one of the most popular Python packages for data manipulation and analysis
- matplotlib: the go-to Python library for data visualization, rivaling the likes of R's ggplot2
- Seaborn: a data visualization library that runs on top of matplotlib, used for creating visually appealing charts, plots and histograms

Some very good resources to learn about all these libraries:

- Python Data Analysis
- Python for Data Science and Machine Learning: a very good course with detailed coverage of the machine learning concepts. Something to learn later.

The aim is to learn these libraries up to a fairly intermediate level, and to be able to manipulate, analyze and visualize any kind of data, including missing, unstructured and time-series data.

Understand the fundamentals of statistics, linear algebra and probability

In order to take a step further and enter the foray of machine learning, the general consensus is to first understand the maths and statistics behind the concepts of machine learning. Implementing them in Python is relatively easy once you get the math right, and that is what I plan to do.
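To make that concrete, here is what "getting the math right first" can look like: ordinary least squares for a single feature, implemented from scratch with the standard library (the hours-vs-score data is made up):

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied vs. exam score, lying exactly on a line.
slope, intercept = fit_line([1, 2, 3, 4], [12, 14, 16, 18])
print(slope, intercept)  # 2.0 10.0
```

Once this closed-form version makes sense, the corresponding scikit-learn call stops feeling like magic: you know exactly what the library is computing for you.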
I shortlisted some very good resources for this as well:

- Statistics for Machine Learning
- Stanford University's Machine Learning course on Coursera

Task status: Ongoing

Learn machine learning (sounds odd, I know)

After understanding the math behind machine learning, the next step is to learn how to perform predictive modeling using popular machine learning algorithms such as linear regression, logistic regression, clustering, and more. Using real-world datasets, the plan is to learn the art of building state-of-the-art machine learning models using Python's very own scikit-learn library, as well as the popular Tensorflow package. To learn how to do this, the courses I mentioned above should come in handy:

- Stanford University's Machine Learning course on Coursera
- Python for Data Science and Machine Learning
- Python Machine Learning, Second Edition

Task status: To be started

During the course of this journey, websites like Stack Overflow and Stack Exchange will be my best friends, along with popular resources such as YouTube.

As I start this journey, I plan to share my experiences and knowledge with you all. Do you think the learning path looks good? Is there anything else I should include? I would really love to hear your comments, suggestions and experiences. Stay tuned for the next post, where I seek answers to questions such as 'How much Python should I learn in order to be comfortable with data science?', 'How much time should I devote per day or week to learning the concepts in data science?' and much more.

Read more:
- Why is data science important?
- 9 Data Science Myths Debunked
- 30 common data science terms explained
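Of the algorithms in that learning plan, clustering is perhaps the easiest to demystify from scratch. Here is a minimal, illustrative sketch of k-means (Lloyd's algorithm) on toy one-dimensional data, using only the standard library rather than scikit-learn's API:

```python
def kmeans_1d(points, centroids, iters=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Made-up sample: two obvious groups, around 1 and around 9.
data = [0, 1, 2, 8, 9, 10]
print(kmeans_1d(data, [0.0, 10.0]))  # [1.0, 9.0]
```

Real datasets need multi-dimensional distances and smarter initialization, which is exactly what scikit-learn's KMeans handles for you once the core loop above is understood.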

Why Twitter (finally!) migrated to Tensorflow

Amey Varangaonkar
18 Jul 2018
3 min read
A new nest in the same old tree. Twitter have finally migrated to TensorFlow as their preferred machine learning framework. While not many are surprised by this move, given the popularity of TensorFlow, many have surely asked the question: what took them so long?

Why Twitter migrated to TensorFlow only now

Ever since its inception, Twitter have been using their trademark internal system called DeepBird. This system was able to utilize the power of machine learning and predictive analytics to understand user data, drive engagement and promote healthier conversations. DeepBird primarily used Lua Torch to power its operations. As support for the language grew sparse due to Torch's move to PyTorch, Twitter decided it was high time to migrate DeepBird to support Python as well, and started exploring their options. Given the rising popularity of TensorFlow, it was probably the easiest choice Twitter have had to make in some time. Per the recently conducted Stack Overflow Developer Survey 2018, TensorFlow is the framework most loved by developers, with almost 74% of the respondents showing their loyalty towards it. With TensorFlow 2.0 around the corner, the framework promises to build on its existing capabilities by adding richer machine learning features with cross-platform support, something Twitter will be eager to get the most out of.

How does TensorFlow help Twitter?

After incorporating TensorFlow into DeepBird, Twitter were quick to share some of the initial results. Some of the features that stand out:

- Higher engineer productivity: with the help of TensorBoard and some internal data visualization tools such as Model Repo, it has become a lot easier for Twitter engineers to observe the performance of their models and tweak them to obtain better results.
- Easier access to machine learning: TensorFlow simplified machine learning models which can be integrated with other technology stacks, thanks to the general-purpose nature of Python.
- Better performance: the overall performance of DeepBird v2 was found to be better than that of its Lua Torch-powered predecessor.
- Production-ready models: Twitter plan to develop models that can be integrated into the workflow with minimal issues and bugs, as compared to other frameworks such as Lua Torch.

With TensorFlow in place, Twitter users can expect their timelines to be full of relatable, insightful and high-quality interactions which they can easily be a part of. Tweets will be shown to readers based on their relevance, and TensorFlow will be able to predict how a particular user will react to them. A large number of heavyweights have already adopted TensorFlow as their machine learning framework of choice: eBay, Google, Uber, Dropbox, and Nvidia being some of the major ones. As the list keeps growing, one can only wonder which major organization will be next.

Read more:
- TensorFlow 1.9.0-rc0 release announced
- Python, Tensorflow, Excel and more – Data professionals reveal their top tools
- Distributed TensorFlow: Working with multiple GPUs and servers

What you missed at last week’s ICML 2018 conference

Sugandha Lahoti
18 Jul 2018
6 min read
The 35th International Conference on Machine Learning (ICML) 2018 took place from July 10 to July 15, 2018 in Stockholm, Sweden. ICML is one of the most anticipated conferences for every data scientist and ML practitioner, and features some of the best ML researchers, who come to talk about their research and discuss new ideas. It won't be wrong to say that deep learning and its subsets were the showstopper of this conference, with a large number of research papers and AI professionals putting them to work. These included sessions and paper presentations on Gaussian processes, networks and relational learning, time-series analysis, deep Bayesian non-parametric tracking, generative models, and more. Other deep learning subsets such as representation learning, ranking and preference learning, supervised learning, and transfer and multi-task learning were also heavily featured. The conference consisted of one day of tutorials (July 10), followed by three days of main conference sessions (July 11-13), followed by two days of workshops (July 14-15).

Best talks and seminars of ICML 2018

ICML 2018 featured two informative talks dealing with the applications of artificial intelligence in other domains. Day 1 was inaugurated by an invited talk from Prof. Dawn Song on "AI and Security: Lessons, Challenges and Future Directions". She talked about the impact of AI in computer security, differential privacy techniques, and the synergy between AI, computer security, and blockchain. She also gave an overview of challenges and new techniques to enable privacy-preserving machine learning. Day 3 featured an inaugural talk by Max Welling on "Intelligence per Kilowatt hour", focusing on the connection between physics and AI. According to Max, in the coming future, companies will find it too expensive to run energy-absorbing ML tools to power their AI engines, or the heat dissipation in edge devices will be too high to be safe.
So the next frontier of AI is going to be finding the most energy-efficient combination of hardware and algorithms. There were also two plenary talks: Language to Action: towards Interactive Task Learning with Physical Agents, by Joyce Chai, and Building Machines that Learn and Think Like People, by Josh Tenenbaum.

Best research papers of ICML 2018

Among the many interesting research papers submitted to ICML 2018, here are the winners.

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, by Anish Athalye, Nicholas Carlini, and David Wagner, received a Best Paper award. The paper identifies obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. The authors identify three different types of obfuscated gradients and develop attack techniques to overcome them.

Delayed Impact of Fair Machine Learning, by Lydia T. Liu, Sarah Dean, Esther Rolf, and Max Simchowitz, also received a Best Paper award. This paper examines the circumstances under which fairness criteria promote the long-term well-being of disadvantaged groups, measured in terms of a temporal variable of interest. The paper also introduces a one-step feedback model of decision-making that exposes how decisions change the underlying population over time.

Bonus: the Test of Time award

Day 4 saw Facebook researchers Ronan Collobert and Jason Weston receive the honorary Test of Time award for their 2008 ICML paper, A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. The paper proposed a single convolutional neural network that takes a sentence and outputs its language processing predictions.
So the network can identify and distinguish part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. At the time the paper was published, there was almost no neural network research in natural language processing. The paper's use of word embeddings and how they are trained, its use of auxiliary tasks and multitasking, and its use of convolutional neural nets in NLP really inspired the neural networks of today. For instance, Facebook's recent machine translation and summarization tool Fairseq uses CNNs for language. AllenNLP's ELMo learns improved word embeddings via a neural net language model and applies them to a large number of NLP tasks.

Featured tutorials at ICML 2018

ICML 2018 featured a total of nine tutorials, in sets of three each. All the tutorials took place on Day 1. These included:

- Imitation Learning, by Yisong Yue and Hoang M Le, who gave a broad overview of imitation learning techniques and their recent applications.
- Learning with Temporal Point Processes, by Manuel Gomez Rodriguez and Isabel Valera. They covered temporal point processes in machine learning from the basics to advanced concepts such as marks and dynamical systems with jumps.
- Machine Learning in Automated Mechanism Design for Pricing and Auctions, by Nina Balcan, Tuomas Sandholm, and Ellen Vitercik. This tutorial covered automated mechanism design for revenue maximization.
- Toward Theoretical Understanding of Deep Learning, by Sanjeev Arora, who explained, with examples, what kind of theory may ultimately arise for deep learning.
- Defining and Designing Fair Algorithms, by Sam Corbett-Davies and Sharad Goel. They illustrated the problems that lie at the foundation of algorithmic fairness, drawing on ideas from machine learning, economics, and legal theory.
- Understanding your Neighbors: Practical Perspectives From Modern Analysis, by Sanjoy Dasgupta and Samory Kpotufe. This tutorial aimed to cover new perspectives on k-NN and translate new theoretical insights for a broader audience.
- Variational Bayes and Beyond: Bayesian Inference for Big Data, by Tamara Broderick, who covered modern tools for fast, approximate Bayesian inference at scale.
- Machine Learning for Personalised Health, by Danielle Belgrave and Konstantina Palla. This tutorial evaluated the current drivers of machine learning in healthcare and presented machine learning strategies for personalised health.
- Optimization Perspectives on Learning to Control, by Benjamin Recht, who showed how to learn models of dynamical systems, how to use data to achieve objectives in a timely fashion, how to balance model specification, and more.

Workshops at ICML 2018

Days 5 and 6 of the conference were dedicated entirely to workshops, on topics ranging from AI in health to AI in computational psychology, humanizing AI, and AI for wildlife conservation. Other workshops included:

- Bridging the Gap between Human and Automated Reasoning
- Data Science meets Optimization
- Domain Adaptation for Visual Understanding
- Eighth International Workshop on Statistical Relational AI
- Enabling Reproducibility in Machine Learning MLTrain@RML
- Engineering Multi-Agent Systems
- Exploration in Reinforcement Learning
- Federated AI for Robotics Workshop (F-Rob-2018)

This is just a brief overview of the ICML conference, where we have handpicked a select few paper presentations and invited talks. You can see the full schedule, along with the list of selected research papers, on the ICML website.

Read more:
- 7 of the best machine learning conferences for the rest of 2018
- Microsoft start AI School to teach Machine Learning and Artificial Intelligence
- Google introduces Machine Learning courses for AI beginners

Meet the who's who of Reinforcement learning

Fatema Patrawala
12 Jul 2018
7 min read
Reinforcement learning is a branch of artificial intelligence in which an agent perceives information about the environment in the form of state spaces and action spaces and acts on the environment, thereby arriving at a new state and receiving a reward as feedback for that action. This received reward is assigned to the new state. Just as we had to minimize a cost function in order to train a neural network, here the reinforcement learning agent has to maximize the overall reward to find the optimal policy for a particular task. This article is an extract from the book Reinforcement Learning with TensorFlow.

How is reinforcement learning different from supervised and unsupervised learning?

In supervised learning, the training dataset has input features, X, and their corresponding output labels, Y. A model is trained on this dataset; test cases with input features, X', are then given as input, and the model predicts Y'. In unsupervised learning, only the input features, X, of the training set are given. There are no associated Y values. The goal is to create a model that learns to segregate the data into different clusters by understanding the underlying pattern, thereby classifying data points to find some utility. This model is then used on new input features, X', to predict their similarity to one of the clusters.

Reinforcement learning is different from both. It can guide an agent on how to act in the real world. The interface is broader than the training vectors of supervised or unsupervised learning: it is the entire environment, which can be real or simulated. Agents are trained differently, too; the objective is to reach a goal state, unlike in supervised learning where the objective is to maximize a likelihood or minimize a cost.
Reinforcement learning agents receive feedback automatically, that is, rewards from the environment, unlike in supervised learning where labeling requires time-consuming human effort. One of the bigger advantages of reinforcement learning is that phrasing any task's objective in the form of a goal helps in solving a wide variety of problems. For example, the goal of a video game agent is to win the game by achieving the highest score. This also helps in discovering new approaches to achieving the goal; for example, when AlphaGo became the world champion in Go, it found new, unique ways of winning. A reinforcement learning agent is like a human in this respect: humans evolved very slowly, whereas an agent reinforces its behavior in a similar way but can do so very fast. As far as sensing the environment is concerned, neither humans nor artificial intelligence agents can sense the entire world at once. The perceived environment creates a state in which agents perform actions and land in a new state, that is, a newly perceived environment different from the earlier one. This creates a state space that can be finite or infinite. The largest sector interested in this technology is defense: can reinforcement learning agents replace soldiers that not only walk, but fight and make important decisions?

Basic terminologies and conventions

The following are the basic terms associated with reinforcement learning:

- Agent: this we create by programming, such that it is able to sense the environment, perform actions, receive feedback, and try to maximize rewards.
- Environment: the world where the agent resides. It can be real or simulated.
- State: the perception or configuration of the environment that the agent senses. State spaces can be finite or infinite.
- Rewards: feedback the agent receives after any action it has taken. The goal of the agent is to maximize the overall reward, that is, the immediate and the future rewards. Rewards are defined in advance.
Therefore, they must be created properly to achieve the goal efficiently.

- Actions: anything the agent is capable of doing in the given environment. The action space can be finite or infinite.
- SAR triple: (state, action, reward) is referred to as the SAR triple, represented as (s, a, r).
- Episode: one complete run of the whole task.

Let's deduce the convention shown in the following diagram: every task is a sequence of SAR triples. We start from state S(t), perform action A(t), thereby receive a reward R(t+1), and land in a new state S(t+1). The current state-action pair gives the reward for the next step. Since S(t) and A(t) result in S(t+1), we have a new triple of (current state, action, new state), that is, [S(t), A(t), S(t+1)] or (s, a, s').

Pioneers and breakthroughs in reinforcement learning

Here are the pioneers, industry leaders, and research breakthroughs in the field of deep reinforcement learning.

David Silver

Dr. David Silver, with an h-index of 30, heads the reinforcement learning research team at Google DeepMind and is the lead researcher on AlphaGo. David co-founded Elixir Studios and then completed his PhD in reinforcement learning at the University of Alberta, where he co-introduced the algorithms used in the first master-level 9x9 Go programs. After this, he became a lecturer at University College London. He consulted for DeepMind before joining full-time in 2013. David led the AlphaGo project, which produced the first program to defeat a top professional player in the game of Go.

Pieter Abbeel

Pieter Abbeel is a professor at UC Berkeley and was a Research Scientist at OpenAI. Pieter completed his PhD in Computer Science under Andrew Ng. His current research focuses on robotics and machine learning, with a particular focus on deep reinforcement learning, deep imitation learning, deep unsupervised learning, meta-learning, learning-to-learn, and AI safety. Pieter also won the NIPS 2016 Best Paper Award.
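Stepping back from the people to the mechanics for a moment, the SAR convention described earlier can be exercised end to end with a toy example. The sketch below is a minimal, illustrative Q-learning loop on a hypothetical four-state corridor (made up for this article, not taken from the book): the agent senses a state, acts, receives a reward, and updates its value estimates from each (s, a, r, s') transition.

```python
import random

# Toy corridor environment: states 0..3; actions are -1 (left) and +1 (right).
# Entering state 3 ends the episode with reward 1; every other step rewards 0.
ACTIONS = (-1, 1)

def step(state, action):
    next_state = min(max(state + action, 0), 3)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3  # the pieces of a SAR transition

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current Q-values, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning update over the (S(t), A(t), R(t+1), S(t+1)) transition.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
print(q[(2, 1)], q[(2, -1)])
```

After training, the learned values prefer moving right toward the rewarding state, which is the "maximize the overall reward to find the optimal policy" idea from the opening paragraph in miniature.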
Google DeepMind

Google DeepMind is a British artificial intelligence company founded in September 2010 and acquired by Google in 2014. It is an industry leader in the domains of deep reinforcement learning and the neural Turing machine. It made news in 2016 when the AlphaGo program defeated Lee Sedol, the 9th dan Go player. Google DeepMind has channeled its focus into two big sectors, energy and healthcare. Here are some of its projects:

- In July 2016, Google DeepMind and Moorfields Eye Hospital announced a collaboration to use eye scans to research early signs of diseases leading to blindness.
- In August 2016, Google DeepMind announced a collaboration with University College London Hospital to research and develop an algorithm that automatically differentiates between healthy and cancerous tissues in the head and neck.
- Google DeepMind's AI reduced Google's data center cooling bill by 40%.

The AlphaGo program

As mentioned above, AlphaGo is a computer program that defeated first Lee Sedol and then Ke Jie, who was at the time the world No. 1 in Go. In 2017 an improved version, AlphaGo Zero, was launched; it defeated AlphaGo 100 games to 0.

Libratus

Libratus is an artificial intelligence computer program designed by a team led by Professor Tuomas Sandholm at Carnegie Mellon University to play poker. Libratus, like its predecessor Claudico, takes its name from Latin; Libratus means balanced. In January 2017, it made history by defeating four of the world's best professional poker players in a marathon 20-day poker competition. Though Libratus focuses on playing poker, its designers have highlighted its ability to learn any game with incomplete information, where opponents engage in deception. As a result, they have proposed that the system can be applied to problems in cybersecurity, business negotiation, and medical planning. You enjoyed an excerpt on reinforcement learning and got to know about breakthrough research in this field.
If you want to leverage the power of reinforcement learning techniques, grab our latest edition, Reinforcement Learning with TensorFlow.

Read more:
- Top 5 tools for reinforcement learning
- How to implement Reinforcement Learning with TensorFlow
- How to develop a stock price predictive model using Reinforcement Learning and TensorFlow

Real time analytics must be customer-centric

Richard Gall
09 Jul 2018
5 min read
Real time analytics is a watchword for the tech and marketing industries (or mar-tech, if you like terrible neologisms). But it simply isn't delivering the impact it should for many businesses today. Research by Harvard Business Review Analytics Services, done with support from SAS, Accenture and Intel, found that while businesses are spending more time and money on real time analytics, they're not seeing the impact they want. So, although 70% of the enterprises that took part in the research say they have increased spending on real time analytics, only 16% say they're actually very effective at using real time analytics across different channels. Clearly something isn't working. We're seeing a big gap between expectations and effectiveness. And, like everything in tech, it probably isn't the technology's fault. All of the data here suggests that we're thinking about real time analytics in a way that is far too abstract. We've set expectations about what 'real time analytics' can and should do, and set about building projects that should, in theory, support the business. But therein lies the problem: the data indicates that we're all thinking about real time analytics from a business perspective, not a customer one. Of course, all of the capabilities listed above are ultimately about supporting the customer in some way. But the thinking is backward. Customer touch points should be at the forefront of every business' collective mind when exploring real time analytics. Anything less is never going to be as effective as you want it to be.

Read next: Why your app needs real time mobile analytics

Joining the dots between real time analytics and customers

Of course, it might be that there's something unspoken here. It's not so much that we're thinking the wrong way round, but rather that the data is simply saying that bridging the gap between analytics and customers is hard.
And it is, of course. It's hard because it forces us to change the way we work and organize. We might even need to rethink who should be driving the data strategy.

Teams need to talk

Arguably, there's not enough alignment between different teams. Marketing, data, and development are all dependent on one another, but they're possibly not working together in the way they should. CTOs and CIOs aren't working closely enough with CMOs. You can't build an analytics strategy without properly understanding all the customer touch points: what they are now, and what they should look like in the future.

Real time analytics is a 'full stack problem'

It also needs to be thought of as a 'full stack problem'. By this I don't mean that it's down to your full stack developers to solve (although they might be involved). Instead, it's a problem that takes in every part of the software stack. It's no good having an incredible vision if you haven't built analytics capabilities into your front end. If your platform is old and held together with a bit of string, it's going to be difficult to fully realize your data-heavy dreams. This was something flagged up in the research, with 70% of respondents claiming legacy systems were making data integration a huge challenge. Similarly, there's no point talking about self-service analytics if you're not going to build an internal tool that's actually usable. Integration is fine, but then you need people to actually use the data, and to know why they're using it. Whether you build an awesome tool in-house or find a perfect analytics solution, you need to be confident people are actually going to make use of the real time analytics you've finally achieved.

Start from customer touch points

Putting the customer first is, however, the first rule of real time analytics. Start with the key touch points for customers.
That might require an audit of the current situation, as well as some creative thinking about what might work in the future. Starting with these touch points is also crucial for how analytics is shared and used internally. As I've already said, there's no point having highly available analytics if stakeholders don't actually know what to do with the data that's there. This means every stakeholder needs to work backwards from every point where they may influence the customer, whether that's a button, an email, or a piece of content, and consider how and what data is going to be most useful to them.

Real time analytics takes time

Although the research suggests that implementing real time analytics is challenging, we may just need to accept that some of these things take time. There is a mix of technological and cultural issues to untie and work through. It can be frustrating, especially as the next few years will probably throw up new innovations just as we think we've cracked this one. The important thing is that collaboration and communication are key, as well as making sure everyone understands who the customer is and what they want to achieve. Simple really, right?

How Amazon is reinventing Speech Recognition and Machine Translation with AI

Amey Varangaonkar
04 Jul 2018
4 min read
In the recently held AWS Summit in San Francisco, Amazon announced the general availability of two of its premium offerings: Amazon Transcribe and Amazon Translate. What's special about the two products is that customers can now see the power of artificial intelligence in action, and use it to solve their day-to-day problems. These offerings from AWS will make it easier for startups and companies looking to adopt and integrate AI into their existing processes and simplify their core tasks, especially those pertaining to speech and language processing.

Effective speech-to-text conversion with Amazon Transcribe

In the AWS Summit keynote, Amazon Solutions Architect Niranjan Hira expressed his excitement while talking about the features of Amazon Transcribe, the automatic speech recognition service from AWS. Its API can be integrated with other tools and services offered by Amazon, such as Amazon S3 and QuickSight.

Amazon Transcribe boasts features like:

- Simple API: it is very easy to use the Transcribe API to perform speech-to-text conversion, with minimal programming required.
- Timestamp generation: the text produced from speech includes a timestamp for every word, so tracking each word becomes easy and hassle-free.
- Variety of use cases: the Transcribe API can be used to generate accurate transcripts of any audio or video file, of varied quality. Subtitle generation becomes easier using this API, especially for low-quality audio recordings; customer service calls are a good example.
- Easy-to-read text: Transcribe uses cutting-edge deep learning technology to parse text from speech with few errors. With appropriate punctuation and grammar in place, the transcripts are very easy to read and understand.

Machine translation simplified with Amazon Translate

Amazon Translate is a machine translation service offered by Amazon.
It makes use of neural networks and advanced deep learning techniques to deliver accurate, high-quality translations. Key features of Amazon Translate include:

- Continuous training: the architecture of the service is built in such a way that its neural networks keep learning and improving.
- High accuracy: continuous learning by the translation engines from new and varied datasets results in higher-accuracy machine translations. Amazon claims the machine translation capability offered by this service is almost 30% more efficient than human translation.
- Easy to integrate with other AWS services: with a simple API call, Translate lets you add real-time translation capabilities to third-party applications.
- Highly scalable: regardless of volume, Translate does not compromise the speed or accuracy of machine translation.

Know more about Amazon Translate from Yoni Friedman's keynote at the AWS Summit.

With businesses slowly migrating to the cloud, it is clear that all the major cloud vendors, mainly Amazon, Google and Microsoft, are doing everything they can to establish their dominance. Google recently launched Cloud ML for GCP, which offers machine learning and predictive analytics services to businesses. Microsoft's Azure Cognitive Services offer effective machine translation as well, and are slowly gaining a lot of momentum. With these releases, the onus was on Amazon to respond, and they have done so in style. With the Transcribe and Translate APIs, Amazon's goal of making it easier for startups and small-scale businesses to adopt AWS and incorporate AI seems to be on track. These services will also help AWS differentiate their cloud offerings, given that computing and storage resources are provided by rivals as well.
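As a sketch of how such a "simple API call" looks from Python, the snippet below uses boto3's translate_text operation. The helper names and sample text are our own, and actually invoking translate() assumes AWS credentials are configured in the environment.

```python
def build_translate_request(text, source_lang, target_lang):
    """Assemble the keyword arguments for Amazon Translate's TranslateText call."""
    return {
        "Text": text,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }

def translate(text, source_lang="en", target_lang="es"):
    import boto3  # deferred import: requires AWS credentials to be configured
    client = boto3.client("translate")
    response = client.translate_text(**build_translate_request(text, source_lang, target_lang))
    return response["TranslatedText"]

# translate("Hello, world") would return the Spanish translation as a string.
```

The request-building helper is pure Python and can be unit-tested locally; only the translate() wrapper touches the network, which keeps the AWS dependency at the edge of the code.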
Read more:
Verizon chooses Amazon Web Services (AWS) as its preferred cloud provider
Tensor Processing Unit (TPU) 3.0: Google's answer to cloud-ready Artificial Intelligence
Amazon is selling facial recognition technology to police

Natasha Mathur
04 Jul 2018
7 min read

8 ways to improve your data visualizations

In Dr. W. Edwards Deming's words, "In God we trust, all others must bring data". Organizations worldwide revolve around data like planets revolve around the sun. Since data is so central to organizations, there are certain data visualization tools that help them understand data to make better business decisions.

More data is getting churned out and collected by organizations than ever before. So, how do you make sense of all this data? Humans are visual creatures, and the human brain processes visual information far better than textual information. In fact, presentations that use visual aids such as colors, shapes, and images are found to be far more persuasive, according to research done by the University of Minnesota back in 1986.

Data visualization is one such process that easily translates the collected information into engaging visuals. It's easy, cheap, and doesn't require any design expertise to create data visuals. However, some professionals feel that data visualization is just about slapping on charts and graphs, when that's not actually the case. Data visualization is about conveying the right information in a way that enhances the audience's experience. So, if you want your graphs and charts to be more succinct and understandable, here are eight ways to improve your data visualization process:

1. Get rid of unneeded information

Less is more in some cases, and the same goes for data visualization. Excessive colors, jargon, pie charts, and metrics take away focus from the important information. For instance, when using colors, don't make your charts and graphs a rainbow; instead, use a specific set of colors with a clear purpose and meaning. Do you see the difference color and chart choice make to the visualizations in the images below?

Source: Podio

Similarly, when it comes to expressing your data, note how people interact at your workplace. Keep the tone of your visuals as natural as possible to make it easy for the audience to interpret your data.
For metrics, only show the ones that truly bring value to your storytelling. Filter out the ones that are not so important to create less fuss. Tread cautiously while using pie charts, as they can sometimes be difficult to understand, and get rid of the elements on a chart that cause unnecessary confusion.

Source: Dashboard Zone

2. Use conditional formatting for tabular data

Data visualization doesn't need fancy tools or designs. Take your standard Excel table, for example. Do you want to point out patterns or outliers in your data? Conditional formatting is a great tool for people working with data. It involves setting simple rules on given data, and once that's done, it highlights only the data that matters the most to you. This helps you quickly track the key information.

Conditional formatting can be used for different things. It can help spot duplicate data in your table. You set bounds for the data using the built-in conditional formatting, and it then formats the cells based on those bounds, highlighting the data you want. For instance, if a sales quota of over 65% is good, between 55% and 65% is average, and below 55% is poor, then with conditional formatting you can quickly find out who is meeting the expected sales quota and who is not.

3. Add trendlines to unearth patterns for prediction

Another feature that can amp up your data visualization is the trendline. Trendlines show the relationship between two variables in your existing data, and they are also useful for predicting future values. They are simple to add and help discover trends in the given data set.

Source: Interworks

Trendlines can also show data trends or moving averages in your charts. Depending on the kind of data you're working with, there are a number of trendline types out there that you can use in your visualizations. Questions like whether a new strategy seems to be working in favor of the organization can be answered with the help of trendlines.
This insight, in turn, helps predict new outcomes for the future. Trendlines use statistical models to make predictions. Once you add trendlines to a view, it's up to you to decide how you want them to look and behave.

4. Implement filter by rule to get more specific

Filters help display just the information that you need. Using filter by rule, you can add a filter option to your dataset. Organizations produce huge amounts of data on a regular basis. Suppose you want to know which employees within your organization are consistent performers. Instead of creating a visualization that includes all the employees and their performances, you can filter it down so that it shows only the employees who are consistently doing well. Similarly, if you want to find out on which days sales went up or down, you can filter the view to show results for only the past week or month, depending on your preference.

5. For complex or dense data representation, add hierarchy

Hierarchies eliminate the need to create extra visualizations. You can view data from a high level and dig deeper into the specifics as questions come up. Adding a hierarchy to the data lets you combine multiple pieces of information in one visualization.

Source: dzone

For instance, suppose you create a hierarchy that shows the total sales achieved by each sales representative within an organization in the past month. You can then break this down further by selecting a particular sales rep, and go even deeper by selecting a specific product assigned to that rep. This cuts down on a lot of extra work.

6. Make visuals more appealing by formatting data

Data formatting takes only a few seconds, but it can make a huge difference to how the audience interprets your data.

Source: dzone

It makes the numbers appear more visually appealing and easier to read for the audience. It can be used for charts such as bar charts and column charts.
Formatting data to show a certain number of decimals, comma separators, number fonts, currency, or percentages can make your visualization process more engaging.

7. Include comparison for more insight

Comparisons give readers a better perspective on the data. Including comparisons in your charts can both improve your visualizations and add insight. For instance, if you want to inform your audience about the organization's growth in the current year as well as the past year, you can include the comparison within the visualization. You can also use a comparison chart to compare two data points, such as budget vs. actual spend.

8. Sort data to improve readability

Again, sorting data is a great way to make things easy for the audience when dealing with huge quantities of data. For instance, if you want to highlight the highest- and lowest-performing products, you can sort your data. Sorting can be done in the following ways:

Ascending - sorts the data from lowest to highest.
Descending - sorts the data from highest to lowest.
Data source order - sorts the data in the order it appears in the data source.
Alphabetic - sorts the data alphabetically.
Manual - lets you sort the data manually in whatever order you prefer.

Effective data visualization helps people interpret information in data that could not be seen before, change their minds, and prompt action. These were some of the tricks and features to take your data visualization game to the next level. There are different data visualization tools available in the market to choose from. Tableau and Microsoft Power BI are among the top ones offering great features for data visualization. Now that we've got you covered with some of the best practices for data visualization, it's your turn to put these tips into practice and create some strong visual data stories. Do you have any DataViz tips to share with our readers? Please add them in the comments below.
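A few of the tips above (the conditional-formatting thresholds, trendlines, and sorting) can be sketched in plain Python. The data, names, and thresholds here are made-up illustrations of the ideas, not output from any particular visualization tool:

```python
def quota_band(quota_pct):
    """Band a sales quota using the thresholds from the conditional-formatting tip."""
    if quota_pct > 65:
        return "good"
    if quota_pct >= 55:
        return "average"
    return "poor"

def linear_trend(xs, ys):
    """Least-squares fit y = slope * x + intercept, like a chart trendline."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Conditional formatting: classify each rep's quota attainment
reps = {"Ana": 72.0, "Ben": 60.5, "Cara": 48.0}
bands = {name: quota_band(q) for name, q in reps.items()}

# Trendline: fit six months of sales and forecast month 7
months, sales = [1, 2, 3, 4, 5, 6], [10, 12, 13, 15, 16, 18]
slope, intercept = linear_trend(months, sales)
forecast = slope * 7 + intercept  # roughly 19.4 for this data

# Sorting: highest-performing products first (descending)
products = [("Widget", 120), ("Gadget", 340), ("Doohickey", 75)]
by_sales_desc = sorted(products, key=lambda p: p[1], reverse=True)
```

In a real workflow a BI tool does this for you, but the logic underneath each feature is no more mysterious than this.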
Getting started with Data Visualization in Tableau
What is Seaborn and why should you use it for data visualization?
"Tableau is the most powerful and secure end-to-end analytics platform": An interview with Joshua Milligan
Amey Varangaonkar
03 Jul 2018
9 min read

9 Data Science Myths Debunked

The benefits of data science are evident for all to see. Not only does it equip you with the tools and techniques to make better business decisions, the predictive power of analytics also allows you to determine future outcomes - something that can prove to be crucial to businesses. Despite all these advantages, data science is a touchy topic for many businesses. It's worth looking at some glaring stats that show why businesses are reluctant to adopt data science:

Poor data across businesses and organizations - both private and government - costs the U.S. economy close to $3 trillion per year.
Only 29% of enterprises are able to properly leverage the power of Big Data and derive useful business value from it.

These stats show a general lack of awareness or knowledge when it comes to data science. Preconceived notions, or simply a lack of knowledge of data science and its applications, seem to be a huge hurdle for these companies. In this article, we attempt to take down some of these notions and give a much clearer picture of what data science really is. Here are nine of the most common myths or misconceptions in data science, and why they are absolutely wrong:

Data Science is just a fad, it won't last long

This is probably the most common misconception. Many tend to forget that although 'data science' is a recently coined term, this field of study is a culmination of decades of research and innovation in statistical methodologies and tools. It has been in use since the 1960s or even before - just at a much smaller scale. Back in the day, there were no 'data scientists', just statisticians and economists who used now-obscure terms such as 'data fishing' or 'data dredging'. Even the terms 'data analysis' and 'data mining' only went mainstream in the 1990s, but they were in use well before that period. Data science's rise to fame has coincided with the exponential rise in the amount of data being generated every minute.
The need to understand this information and make positive use of it led to an increase in the demand for data science. Now, with Big Data and the Internet of Things going wild, the rate of data generation and the subsequent need for its analysis will only increase. So if you think data science is a fad that will go away soon, think again.

Data Science and Business Intelligence are the same

Those who are unfamiliar with what data science and Business Intelligence actually entail often get confused and think they're one and the same. They're not. Business Intelligence is an umbrella term for the tools and techniques that answer the operational and contextual aspects of your business or organization. Data science, on the other hand, has more to do with collecting information in order to build patterns and insights. Learning about your customers or your audience is Business Intelligence. Understanding why something happened, or whether it will happen again, is data science. If you want to gauge how changing a certain process will affect your business, data science - not Business Intelligence - is what will help you.

Data Science is only meant for large organizations with large resources

Many businesses and entrepreneurs are wrongly of the opinion that data science works only - or works best - for large organizations. It is a wrongly perceived notion that you need sophisticated infrastructure to process and get the most value out of your data. In reality, all you need is a group of smart people who know how to extract the best value from the available data. When it comes to taking a data-driven approach, there's no need to invest a fortune in setting up an analytics infrastructure for an organization of any scale. There are many open source tools out there that can be easily leveraged to process large-scale data with efficiency and accuracy. All you need is a good understanding of those tools.
It is difficult to integrate data science systems with the organizational workflow

With the advancement of technology, one critical challenge that has become much easier to overcome is getting different software systems to work together. With the rise of general-purpose programming languages, it is now possible to build a variety of software systems using a single language. Take Python, for example. You can use it to analyze your data, perform machine learning, or develop neural networks to work on more complex data models. All the while, your web API, also written in Python, can communicate with these data science systems. Provisions are also being made to integrate code written in different programming languages while ensuring smooth interoperability and no loss of latency. So if you're wondering how to incorporate your analytics workflow into your organizational workflow, don't worry too much.

Data Scientists will be replaced by Artificial Intelligence soon

Although there has been increased adoption of automation in data science, the notion that the work of a data scientist will soon be taken over by an AI algorithm is rather interesting. Currently, there is an acute shortage of data scientists, as this McKinsey Global Report suggests. Could this change in the future? Will automation completely replace human effort when it comes to data science? Surely machines are a lot better than humans at finding patterns; AI beat the best Go player, remember? This seems to be the common perception, but it is not true. However sophisticated the algorithms become at automating data science tasks, we will always need a capable data scientist to oversee them and fine-tune their performance. Not just that, businesses will always need professionals with strong analytical and problem-solving skills and relevant domain knowledge. They will always need someone to communicate the insights coming out of the analysis to non-technical stakeholders.
Machines don't ask questions of data. Machines don't convince people. Machines don't understand the 'why'. Machines don't have intuition. At least, not yet. Data scientists are here to stay, and their demand is not expected to go down anytime soon.

You need a Ph.D. in statistics to be a data scientist

No, you don't. Data science involves crunching numbers to get interesting insights, and it often involves the use of statistics to better understand the results. When it comes to performing advanced tasks such as machine learning and deep learning, sure, an advanced knowledge of statistics helps. But that does not imply that people who do not have a degree in maths or statistics cannot become expert data scientists. Today, organizations are facing a severe shortage of data professionals capable of leveraging data to get useful business insights. This has led to the rise of citizen data scientists - professionals who are not experts in data science but can use data science tools and techniques to create efficient data models. These data scientists are not experts in statistics and maths; they just know the tools inside out, ask the right questions, and have the necessary knowledge to turn data into insights.

Having expertise in the data science tools is enough

Many people wrongly think that learning a statistical tool such as SAS, or mastering Python and its associated data science libraries, is enough to earn the data scientist tag. While learning a tool or skill is always helpful (and also essential), by no means is it the only requisite for doing effective data science. One needs to go beyond the tools and also master skills such as non-intuitive thinking, problem-solving, and knowing the correct practical applications of a tool to tackle any given business problem.
Not just that, it requires excellent communication skills to present your insights and findings from even the most complex analysis to other stakeholders, in a way they can easily understand and interpret. So if you think that a SAS certification is enough to get you a high-paying data science job and keep it, think again.

You need access to a lot of data to get useful insights

Many small to medium-sized businesses don't adopt a data science framework because they think it takes lots and lots of data to be able to use analytics tools and techniques. Data in bulk always helps, true, but you don't need hundreds of thousands of records to identify a pattern or to extract relevant insights. Per IBM, Big Data is characterized by the 4 Vs: Volume, Velocity, Veracity, and Variety. If you are able to model your existing data along these dimensions, it automatically becomes useful and valuable. Volume is important to an extent, but it's the other three parameters that add the required quality.

More data = more accuracy

Many businesses collect large hoards of information and use the modern tools and frameworks at their disposal to analyze this data. Unfortunately, this does not always guarantee accurate results, nor does it guarantee useful actionable insights or more value. Once the data is collected, a preliminary analysis of what needs to be done with the data is required. Then we use the tools and frameworks at our disposal to extract the relevant insights and build an appropriate data model. These models need to be fine-tuned to the processes in which they will be used. Only then do we get the desired degree of accuracy from the model. Data in itself is quite useless. It's how we work on it - more precisely, how effectively we work on it - that makes all the difference. So there you have it!
Data science is one of the most popular skills to have on your resume today, but it is important to first clear up any confusion and misconceptions that you may have about it. Lack of information, or misinformation, can do more harm than good when it comes to leveraging the power of data science within a business - especially considering it could prove to be the differentiating factor between its success and failure. Do you agree with our list? Do you think there are any other commonly observed myths around data science that we may have missed? Let us know.

Read more:
30 common data science terms explained
Why is data science important?
15 Useful Python Libraries to make your Data Science tasks Easier

Guest Contributor
03 Jul 2018
6 min read

The trouble with Smart Contracts

The government of Tennessee now officially recognizes smart contracts. That's great news in terms of the publicity blockchain will receive. By virtue of such events, blockchain technology and all that's related to it draw closer to becoming a standard way of how things work. However, practice shows that the deeper you delve into the nuances of blockchain, the more you understand that we are at the very beginning of quite a long and, so far, uncertain path. Before we investigate smart contracts on the back of the Tennessee law, let's look at the concept in lay terms.

Traditional Contract vs Smart Contract

A traditional contract is simply a notarized piece of paper that details actions that are to be performed under certain conditions. It doesn't control the fulfillment of those actions; it only attests to them. A smart contract, just like a paper contract, specifies conditions. Along with that, since a smart contract is basically program code, it can also carry out actions (which is impossible with the paper one). Most typically, smart contracts are executed in a decentralized environment, where:

Anyone can become a validator and verify the authenticity of correct smart contract execution and the state of the database.
Distributed and independent validators supremely minimize third-party reliance and give confidence concerning the unchangeability of what is to be done. That's why, before putting a smart contract into action, you should carefully check it for bugs, because you won't be able to make changes once it's launched.
All assets should be digitized, and all the data that may serve as a trigger for smart contract execution must be located within one database (system).

What are oracles?

There's a popular myth that smart contracts in Ethereum can take external data from the web and use it in their environment (for example, a smart contract that transfers money to someone who won a bet on a football match's result).
You cannot do that, because a smart contract only relies on data that is on the Ethereum blockchain. Still, there is a workaround. The database (Ethereum's, in our case) can contain so-called oracles - 'trusted' parties that collect data from the exterior world and deliver it to smart contracts. For more precision, it is necessary to choose a wide range of independent oracles to provide the smart contract with information. This way, you minimize the risk of their collusion.

A Smart Contract itself is only a piece of code

For a better understanding, take a look at what Pavel Kravchenko, Founder of Distributed Lab, has written about smart contracts in his Medium post:

"A smart contract itself is a piece of code. The result of this code should be the agreement of all participants of the system regarding account balances (mutual settlements). From here it indirectly follows that a smart contract cannot manage money that hasn't been digitized. Without a payment system that provides such an opportunity (for example, Bitcoin, Ethereum or central bank currency), smart contracts are absolutely helpless!"

Smart Contracts under the Tennessee law

Storing data on the blockchain is now a legit thing to do in Tennessee. Here are some of the primary conditions stipulated by the law:

Records or contracts secured through the blockchain are acknowledged as electronic records.
Ownership rights of certain information stored on the blockchain must be protected.
A smart contract is considered an event-driven computer program that is executed on an electronic, distributed, decentralized, shared, and replicated ledger and used to automate transactions.
Electronic signatures and contracts secured through blockchain technologies now have equal legal standing with traditional types of contracts and signatures.

It is worth noting that the definition of a smart contract is pretty clear and comprehensive here.
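To make the earlier point about independent oracles concrete, here is a minimal Python sketch of how multiple oracle reports might be combined off-chain before feeding a smart contract. The function name and data are hypothetical, and real oracle schemes are considerably more involved:

```python
import statistics

def aggregate_oracle_reports(reports):
    """Combine values reported by independent oracles.

    Taking the median means a minority of faulty or colluding
    oracles cannot skew the value delivered to the smart contract.
    """
    if not reports:
        raise ValueError("no oracle reports received")
    return statistics.median(reports)

# Five independent oracles report the final score margin of a match;
# one of them is faulty (or colluding) and reports a wild value.
reports = [2, 2, 2, 2, 99]
agreed_value = aggregate_oracle_reports(reports)  # the outlier is ignored
```

This is why the article stresses choosing a wide range of independent oracles: with only one or two sources, there is nothing to take a median over, and a single bad actor controls the outcome.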
Unfortunately, the law doesn't let the matter rest there, and some questions were not covered:

How can smart contracts and traditional ones have equal legal standing if the functionality of a smart contract is much broader? Namely, it performs actions, while a traditional contract only attests to them.
How will asset digitization be carried out?
Are there any requirements for the smart contract source code, or a normative audit to be performed, in order to minimize the risk of bugs?

The problem is not with smart contracts, but with creating the ecosystem around them. Unfortunately, it is impossible to build uniform smart-contract-based relationships in our society simply because the regulator has officially recognized the technology. For example, you won't be able to sell your apartment via smart contract functionality unless there is a regulatory base that considers:

The specific blockchain platform whose smart contract functionality is good enough to sustain broad use.
The way assets are digitized. And it's not only digital money transactions that you will be using smart contracts for; you can use smart contracts to store any valuable information, for example, the proprietary rights to your apartment.
Who can be the authorized party/oracle that collects the exterior data and delivers it to the smart contract. (Speaking of apartments, this is basically the notary, who would verify such parameters as ownership of the apartment, its state, even your existence, etc.)

So, it's true: a smart contract itself is a piece of code and objectively is not a problem at all. What is a problem, however, is preparing a sound basis for the successful implementation of smart contracts in our everyday life.
We need to create and launch a mechanism that would connect two entirely different gear wheels: smart contracts in their digital, decentralized, and trustless environment, and the real world, where we mostly deal with a top-down approach and have regulators, lawyers, courts, etc.

Read more:
FAE (Fast Adaptation Engine): iOlite's tool to write Smart Contracts using machine translation
Blockchain can solve tech's trust issues - Imran Bashir
A brief history of Blockchain

About the expert, Dr. Pavel Kravchenko

Dr. Pavel Kravchenko is the Founder of Distributed Lab, a blogger, cryptographer, and Ph.D. in Information Security. Pavel has been working in the blockchain industry since early 2014 (Stellar). His expertise is mostly focused on cryptography, security and technological risks, and tokenization.

About Distributed Lab

Distributed Lab is a blockchain expertise center, with a core mission to develop cutting-edge enterprise tokenization solutions, laying the groundwork for the coming "Financial Internet". Distributed Lab organizes dozens of events every year for the crypto community, ranging from intensive small-format meetups and hackathons to large-scale international conferences that draw 1000+ attendees.