Big Data | 10 articles | Tech News, Tutorials & Expert Insights

article-image-15-things-every-bi-professional-should-know-about-tableau

17 Dec 2019

8 min read

15 things every BI professional should know about Tableau

17 Dec 2019

“The art and practice of visualizing data is becoming ever more important in bridging the human-computer gap to mediate analytical insight in a meaningful way.” ―Edd Dumbill Tableau is a powerful data visualization and discovery tool. It is an important part of a data analyst or data scientist’s - skill set, with many organizations specifying it as a key skill in job adverts. In this article, we’ll take a look at few things in Tableau you need to know to successfully make a mark in your business intelligence career. While architecture of traditional BI tools has hardware limitations, Tableau does not have such dependencies and it can function independently and requires minimum hardware support. Traditional tools are based on a complex set of technologies when Tableau is based on Associative Search technology making it intuitive, fast and dynamic. Tableau supports in-memory, multi-thread and multi-core computing and more advanced capabilities while traditional BI tools do not offer such functionalities. Various Tableau products Tableau Desktop is a self service business analytics and data visualization suite that anyone can use. With tableau desktop, you can extract massive data offline from your data warehouse for live up to date data analysis. Tableau Online / Tableau Server is an online hosting platform designed for enterprise users. It lets users working on Tableau publish and share dashboards across organization and teams. Tableau Reader is a free desktop application that enables you to open and view visualizations that are built in Tableau Desktop. Tableau Public is a free Tableau software which you can use to make visualizations but you will need to save your workbook or worksheets in the Tableau Server for anyone else to view them. Different data types in Tableau All fields in a data source have a data type. The data type reflects the kind of information stored in that field, for example integers (410), dates (1/23/2015) and strings (“Wisconsin”). The data type of a field is identified in the Data pane by one of the icons shown below. Data type icons in Tableau Icon Data type Text (string) values Date values Date & Time values Numerical values Boolean values (relational only) for example True/False Geographic values (used with maps) Cluster Group Source: Tableau website Measures and Dimensions in Tableau Measures contain numeric, quantitative values that you can measure. Measures can be aggregated. When you drag a measure into the view, Tableau applies an aggregation to that measure (by default). Dimensions, on the other hand, contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view. Ways to connect data in Tableau We can either connect live to your data set or extract data into Tableau. Live: Connecting live to a data set leverages its computational processing and storage. New queries will go to the database and will be reflected as new or updated within the data. Extract: The Extract API allows you to programmatically extract and combine any data sources for use in Tableau. There can be multiple data source connections to different sources in the same workbook. Each connection will show up under the Data tab on the left sidebar. The benefit of Tableau extract over live connection is that extract can be used anywhere without any connection and you can build your own visualization without connecting to database. You can read a complete section on how to extract data in Tableau from this book, Learning Tableau 2019 - Third Edition, written by Joshua Milligan. This book takes you through the foundations of the Tableau 2019 paradigm to the advanced topics. Joins and Blends in Tableau Joining tables and blending data sources are two different ways to link related data together in Tableau. Joins are performed to link tables of data together on a row-by-row basis. Blends are performed to link together multiple data sources at an aggregate level. Different filters in Tableau and different use cases in which these filters are more relevant than others In Tableau, filters are used to restrict the data from database. Often, you will want to filter data in Tableau in order to perform an analysis on a subset of data, narrow your focus, or drill into detail. Tableau offers multiple ways to filter data. If you want to limit the scope of your analysis to a subset of data, you can filter the data at the source using one of the following techniques: Data Source Filters are applied before all other filters and are useful when you want to limit your analysis to a subset of data. These filters are applied before any other filters. Extract Filters limit the data that is stored in an extract (.tde or .hyper). Data source filters are often converted into extract filters if they are present when you extract the data. Custom SQL Filters can be accomplished using a live connection with custom SQL, which has a Tableau parameter in the WHERE clause. Dual axis in Tableau Dual Axis is an excellent phenomenon supported by Tableau that helps users view two scales of two measures in the same graph. Many websites like Indeed.com and other make use of dual axis to show the comparison between two measures and their growth rate in a septic set of years. Dual axis let you compare multiple measures at once, having two independent axis layered on top of one another. Key components of a Tableau Dashboard Horizontal – Horizontal layout containers allow the designer to group worksheets and dashboard components left to right across your page and edit the height of all elements at once. Vertical – Vertical containers allow the user to group worksheets and dashboard components top to bottom down your page and edit the width of all elements at once. Text – All textual fields. Image Extract – A Tableau workbook is in XML format. In order to extract images, Tableau applies some codes to extract an image which can be stored in XML. Web [URL ACTION] – A URL action is a hyperlink that points to a Web page, file, or other web-based resource outside of Tableau. You can use URL actions to link to more information about your data that may be hosted outside of your data source. To make the link relevant to your data, you can substitute field values of a selection into the URL as parameters. If you want to learn how to design dashboards in Tableau, this book Learning Tableau 2019, will give you a step by step process for designing dashboards. Why automate reports in Tableau Once you have automated reporting, you’ll have time to spend on innovative projects. What can be done manually could be performed by automation, delivering the same results in a fraction of the time. Reducing such a time-consuming and repetitive task will make you more productive, and more efficient. What is story in Tableau? Why would create a story and what are they used for? A story is a sheet that contains a sequence of worksheets or dashboards that work together to convey information. You can create stories to show how facts are connected, provide context, demonstrate how decisions relate to outcomes, or simply make a compelling case. Each individual sheet in a story is called a story point. The primary objective of creating stories in Tableau is to communicate data to a certain audience with an intended result. How can you create stories in Tableau? There is a feature in Tableau named as Stories that allows you to tell a story using interactive snapshots of dashboards and views. The snapshots become points in a story. This allows you to construct guided narrative or even an entire presentation. Read this chapter, ‘Telling a Data Story with Dashboards’ from this book, Learning Tableau 2019, to create insightful dashboards in Tableau. How to embed views into Webpages? You can embed interactive Tableau views and dashboards into web pages, blogs, wiki pages, web applications, and intranet portals. Embedded views update as the underlying data changes, or as their workbooks are updated on Tableau Server. Embedded views follow the same licensing and permission restrictions used on Tableau Server. That is, to see a Tableau view that’s embedded in a web page, the person accessing the view must also have an account on Tableau Server. Alternatively, if your organization uses a core-based license on Tableau Server, a Guest account is available. This allows people in your organization to view and interact with Tableau views embedded in web pages without having to sign in to the server. Contact your server or site administrator to find out if the Guest user is enabled for the site you publish to. What is Tableau Prep? Can we clean messy data with Tableau? Tableau Prep extends the Tableau platform with robust options for cleaning and structuring data for analysis in Tableau. In the same way that Tableau Desktop provides a hands-on, visual experience for visualizing and analyzing data, Tableau Prep provides a hands-on, visual experience for cleaning and shaping data. If you wish to know more about Tableau Prep or how to clean messy data to create powerful data visualizations and unlock intelligent business insights, read this book Learning Tableau 2019, written by Joshua N. Milligan. ‘Tableau Day’ highlights: Augmented Analytics, Tableau Prep Builder and Conductor, and more! Alteryx vs. Tableau: Choosing the right data analytics tool for your business How to do data storytelling well with Tableau [Video]

0
0
10071

article-image-brett-lantz-on-implementing-a-decision-tree-using-c5-0-algorithm-in-r

Packt Editorial Staff

29 Mar 2019

9 min read

Brett Lantz on implementing a decision tree using C5.0 algorithm in R

Packt Editorial Staff

29 Mar 2019

9 min read

Decision tree learners are powerful classifiers that utilize a tree structure to model the relationships among the features and the potential outcomes. This structure earned its name due to the fact that it mirrors the way a literal tree begins at a wide trunk and splits into narrower and narrower branches as it is followed upward. In much the same way, a decision tree classifier uses a structure of branching decisions that channel examples into a final predicted class value. In this article, we demonstrate the implementation of decision tree using C5.0 algorithm in R. This article is taken from the book, Machine Learning with R, Fourth Edition written by Brett Lantz. This 10th Anniversary Edition of the classic R data science book is updated to R 4.0.0 with newer and better libraries. This book features several new chapters that reflect the progress of machine learning in the last few years and help you build your data science skills and tackle more challenging problems There are numerous implementations of decision trees, but the most well-known is the C5.0 algorithm. This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior algorithm, C4.5 (C4.5 itself is an improvement over his Iterative Dichotomiser 3 (ID3) algorithm). Although Quinlan markets C5.0 to commercial clients (see http://www.rulequest.com/ for details), the source code for a single-threaded version of the algorithm was made public, and has therefore been incorporated into programs such as R. The C5.0 decision tree algorithm The C5.0 algorithm has become the industry standard for producing decision trees because it does well for most types of problems directly out of the box. Compared to other advanced machine learning models, the decision trees built by C5.0 generally perform nearly as well but are much easier to understand and deploy. Additionally, as shown in the following table, the algorithm's weaknesses are relatively minor and can be largely avoided. Strengths An all-purpose classifier that does well on many types of problems. Highly automatic learning process, which can handle numeric or nominal features, as well as missing data. Excludes unimportant features. Can be used on both small and large datasets. Results in a model that can be interpreted without a mathematical background (for relatively small trees). More efficient than other complex models. Weaknesses Decision tree models are often biased toward splits on features having a large number of levels. It is easy to overfit or underfit the model. Can have trouble modeling some relationships due to reliance on axis-parallel splits. Small changes in training data can result in large changes to decision logic. Large trees can be difficult to interpret and the decisions they make may seem counterintuitive. To keep things simple, our earlier decision tree example ignored the mathematics involved with how a machine would employ a divide and conquer strategy. Let's explore this in more detail to examine how this heuristic works in practice. Choosing the best split The first challenge that a decision tree will face is to identify which feature to split upon. In the previous example, we looked for a way to split the data such that the resulting partitions contained examples primarily of a single class. The degree to which a subset of examples contains only a single class is known as purity, and any subset composed of only a single class is called pure. There are various measurements of purity that can be used to identify the best decision tree splitting candidate. C5.0 uses entropy, a concept borrowed from information theory that quantifies the randomness, or disorder, within a set of class values. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. The decision tree hopes to find splits that reduce entropy, ultimately increasing homogeneity within the groups. Typically, entropy is measured in bits. If there are only two possible classes, entropy values can range from 0 to 1. For n classes, entropy ranges from 0 to log2(n). In each case, the minimum value indicates that the sample is completely homogenous, while the maximum value indicates that the data are as diverse as possible, and no group has even a small plurality. In mathematical notion, entropy is specified as: In this formula, for a given segment of data (S), the term c refers to the number of class levels, and pi refers to the proportion of values falling into class level i. For example, suppose we have a partition of data with two classes: red (60 percent) and white (40 percent). We can calculate the entropy as: > -0.60 * log2(0.60) - 0.40 * log2(0.40) [1] 0.9709506 We can visualize the entropy for all possible two-class arrangements. If we know the proportion of examples in one class is x, then the proportion in the other class is (1 – x). Using the curve() function, we can then plot the entropy for all possible values of x: > curve(-x * log2(x) - (1 - x) * log2(1 - x), col = "red", xlab = "x", ylab = "Entropy", lwd = 4) This results in the following figure: The total entropy as the proportion of one class varies in a two-class outcome As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy. As one class increasingly dominates the other, the entropy reduces to zero. To use entropy to determine the optimal feature to split upon, the algorithm calculates the change in homogeneity that would result from a split on each possible feature, a measure known as information gain. The information gain for a feature F is calculated as the difference between the entropy in the segment before the split (S1) and the partitions resulting from the split (S2): One complication is that after a split, the data is divided into more than one partition. Therefore, the function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this by weighting each partition's entropy according to the proportion of all records falling into that partition. This can be stated in a formula as: In simple terms, the total entropy resulting from a split is the sum of entropy of each of the n partitions weighted by the proportion of examples falling in the partition (wi). The higher the information gain, the better a feature is at creating homogeneous groups after a split on that feature. If the information gain is zero, there is no reduction in entropy for splitting on this feature. On the other hand, the maximum information gain is equal to the entropy prior to the split. This would imply the entropy after the split is zero, which means that the split results in completely homogeneous groups. The previous formulas assume nominal features, but decision trees use information gain for splitting on numeric features as well. To do so, a common practice is to test various splits that divide the values into groups greater than or less than a threshold. This reduces the numeric feature into a two-level categorical feature that allows information gain to be calculated as usual. The numeric cut point yielding the largest information gain is chosen for the split. Note: Though it is used by C5.0, information gain is not the only splitting criterion that can be used to build decision trees. Other commonly used criteria are Gini index, chi-squared statistic, and gain ratio. For a review of these (and many more) criteria, refer to An Empirical Comparison of Selection Measures for Decision-Tree Induction, Mingers, J, Machine Learning, 1989, Vol. 3, pp. 319-342. Pruning the decision tree As mentioned earlier, a decision tree can continue to grow indefinitely, choosing splitting features and dividing into smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will be overfitted to the training data. The process of pruning a decision tree involves reducing its size such that it generalizes better to unseen data. One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions or when the decision nodes contain only a small number of examples. This is called early stopping or prepruning the decision tree. As the tree avoids doing needless work, this is an appealing strategy. However, one downside to this approach is that there is no way to know whether the tree will miss subtle but important patterns that it would have learned had it grown to a larger size. An alternative, called post-pruning, involves growing a tree that is intentionally too large and pruning leaf nodes to reduce the size of the tree to a more appropriate level. This is often a more effective approach than prepruning because it is quite difficult to determine the optimal depth of a decision tree without growing it first. Pruning the tree later on allows the algorithm to be certain that all of the important data structures were discovered. Note: The implementation details of pruning operations are very technical and beyond the scope of this book. For a comparison of some of the available methods, see A Comparative Analysis of Methods for Pruning Decision Trees, Esposito, F, Malerba, D, Semeraro, G, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 476-491. One of the benefits of the C5.0 algorithm is that it is opinionated about pruning—it takes care of many of the decisions automatically using fairly reasonable defaults. Its overall strategy is to post-prune the tree. It first grows a large tree that overfits the training data. Later, the nodes and branches that have little effect on the classification errors are removed. In some cases, entire branches are moved further up the tree or replaced by simpler decisions. These processes of grafting branches are known as subtree raising and subtree replacement, respectively. Getting the right balance of overfitting and underfitting is a bit of an art, but if model accuracy is vital, it may be worth investing some time with various pruning options to see if it improves the test dataset performance. To summarize , decision trees are widely used due to their high accuracy and ability to formulate a statistical model in plain language. Here, we looked at a highly popular and easily configurable decision tree algorithm C5.0. The major strength of the C5.0 algorithm over other decision tree implementations is that it is very easy to adjust the training options. Harness the power of R to build flexible, effective, and transparent machine learning models with Brett Lantz’s latest book Machine Learning with R, Fourth Edition. Dr.Brandon explains Decision Trees to Jon Building a classification system with Decision Trees in Apache Spark 2.0 Implementing Decision Trees

0
0
6965

article-image-4-things-in-tech-that-might-die-in-2019

Richard Gall

19 Dec 2018

10 min read

4 things in tech that might die in 2019

Richard Gall

19 Dec 2018

10 min read

If you’re in and around the tech industry, you’ve probably noticed that hype is an everyday reality. People spend a lot of time talking about what trends and technologies are up and coming and what people need to be aware of - they just love it. Perhaps second only to the fashion industry, the tech world moves through ideas quickly, with innovation piling up upon the next innovation. For the most part, our focus is optimistic: what is important? What’s actually going to shape the future? But with so much change there are plenty of things that disappear completely or simply shift out of view. Some of these things may have barely made an impression, others may have been important but are beginning to be replaced with other, more powerful, transformative and relevant tools. So, in the spirit of pessimism, here is a list of some of the trends and tools that might disappear from view in 2019. Some of these have already begun to sink, while others might leave you pondering whether I’ve completely lost my marbles. Of course, I am willing to be proven wrong. While I will not be eating my hat or any other item of clothing, I will nevertheless accept defeat with good grace in 12 months time. Blockchain Let’s begin with a surprise. You probably expected Blockchain to be hyped for 2019, but no, 2019 might, in fact, be the year that Blockchain dies. Let’s consider where we are right now: Blockchain, in itself, is a good idea. But so far all we’ve really had our various cryptocurrencies looking ever so slightly like pyramid schemes. Any further applications of Blockchain have, by and large, eluded the tech world. In fact, it’s become a useful sticker for organizations looking to raise funds - there are examples of apps out there that support Blockchain backed technologies in the early stages of funding which are later dropped as the company gains support. And it’s important to note that the word Blockchain doesn’t actually refer to one thing - there are many competing definitions as this article on The Verge explains so well. At risk of sounding flippant, Blockchain is ultimately a decentralized database. The reason it’s so popular is precisely because there is a demand for a database that is both scalable and available to a variety of parties - a database that isn’t surrounded by the implicit bureaucracy and politics that even the most prosaic ones do. From this perspective, it feels likely that 2019 will be a search for better ways of managing data - whether that includes Blockchain in its various forms remains to be seen. What you should learn instead of Blockchain A trend that some have seen as being related to Blockchain is edge computing. Essentially, this is all about decentralized data processing at the ‘edge’ of a network, as opposed to within a centralized data center (say, for example, cloud). Understanding the value of edge computing could allow us to better realise what Blockchain promises. Learn edge computing with Azure IoT Development Cookbook. It’s also worth digging deeper into databases - understanding how we can make these more scalable, reliable, and available, are essentially the tasks that anyone pursuing Blockchain is trying to achieve. So, instead of worrying about a buzzword, go back to what really matters. Get to grips with new databases. Learn with Seven NoSQL Databases in a Week Why I could be wrong about Blockchain There’s a lot of support for Blockchain across the industry, so it might well be churlish to dismiss it at this stage. Blockchain certainly does offer a completely new way of doing things, and there are potentially thousands of use cases. If you want to learn Blockchain, check out these titles: Mastering Blockchain, Second Edition Foundations of Blockchain Blockchain for Enterprise Hadoop and big data If Blockchain is still receiving considerable hype, then big data has been slipping away quietly for the last couple of years. Of course, it hasn’t quite disappeared - data is now a normal part of reality. It’s just that trends like artificial intelligence and cloud have emerged to take its place and place even greater emphasis on what we’re actually doing with that data, and how we’re doing it. Read more: Why is Hadoop dying? With this change in emphasis, we’ve also seen the slow death of Hadoop. In a world that increasingly cloud native, it simply doesn’t make sense to run data on a cluster of computers - instead, leveraging public cloud makes much more sense. You might, for example, use Amazon S3 to store your data and then Spark, Flink, or Kafka for processing. Of course, the advantages of cloud are well documented. But in terms of big data, cloud allows for much greater elasticity in terms of scale, greater speed, and makes it easier to perform machine learning thanks to in built features that a number of the leading cloud vendors provide. What you should learn instead of Hadoop The future of big data largely rests in tools like Spark, Flink and Kafka. But it’s important to note it’s not really just about a couple of tools. As big data evolves, focus will need to be on broader architectural questions about what data you have, where it needs to be stored and how it should be used. Arguably, this is why ‘big data’ as a concept will lose valence with the wider community - it will still exist, but will be part of parcel of everyday reality, it won’t be separate from everything else we do. Learn the tools that will drive big data in the future: Apache Spark 2: Data Processing and Real-Time Analytics [Learning Path] Apache Spark: Tips, Tricks, & Techniques [Video] Big Data Processing with Apache Spark Learning Apache Flink Apache Kafka 1.0 Cookbook Why I could be wrong about Hadoop Hadoop 3 is on the horizon and could be the saving grace for Hadoop. Updates suggest that this new iteration is going to be much more amenable to cloud architectures. Learn Hadoop 3: Apache Hadoop 3 Quick Start Guide Mastering Hadoop 3 R 12 to 18 months ago debate was raging over whether R or Python was the best language for data. As we approach the end of 2018, that debate seems to have all but disappeared, with Python finally emerging as the go-to language for anyone working with data. There are a number of reasons for this: Python has the best libraries and frameworks for developing machine learning models. TensorFlow, for example, which runs on top of Keras, makes developing pretty sophisticated machine and deep learning systems relatively easy. R, however, simply can’t match Python in this way. With this ease comes increased adoption. If people want to ‘get into’ machine learning and artificial intelligence, Python is the obvious choice. This doesn’t mean R is dead - instead, it will continue to be a language that remains relevant for very specific use cases in research and data analysis. If you’re a researcher in a university, for example, you’ll probably be using R. But it at least now has to concede that it will never have the reach or levels of growth that Python has. What you should learn instead of R This is obvious - if you’re worried about R’s flexibility and adaptability for the future, you need to learn Python. But it’s certainly not the only option when it comes to machine learning - the likes of Scala and Go could prove useful assets on your CV, for machine learning and beyond. Learn a new way to tackle contemporary data science challenges: Python Machine Learning - Second Edition Hands-on Supervised Machine Learning with Python [Video] Machine Learning With Go Scala for Machine Learning - Second Edition Why I could be wrong about R R is still an incredibly useful language when it comes to data analysis. Particularly if you’re working with statistics in a variety of fields, it’s likely that it will remain an important part of your skill set for some time. Check out these R titles: Getting Started with Machine Learning in R [Video] R Data Analysis Cookbook - Second Edition Neural Networks with R IoT IoT is a term that has been hanging around for quite a while now. But it still hasn’t quite delivered on the hype that it originally received. Like Blockchain, 2019 is perhaps IoT’s make or break year. Even if it doesn’t develop into the sort of thing it promised, it could at least begin to break down into more practical concepts - like, for example edge computing. In this sense, we’d stop talking about IoT as if it were a single homogenous trend about to hit the modern world, but instead a set of discrete technologies that can produce new types of products, and complement existing (literal) infrastructure. The other challenge that IoT faces in 2019 is that the very concept of a connected world depends upon decision making - and policy - beyond the world of technology and business. If, for example, we’re going to have smart cities, there needs to be some kind of structure in place on which some degree of digital transformation can take place. Similarly, if every single device is to be connected in some way, questions will need to be asked about how these products are regulated and how this data is managed. Essentially, IoT is still a bit of a wild west. Given the year of growing scepticism about technology, major shifts are going to be unlikely over the next 12 months. What to learn One way of approaching IoT is instead to take a step back and think about the purpose of IoT, and what facets of it are most pertinent to what you want to achieve. Are you interested in collecting and analyzing data? Or developing products that have in built operational intelligence. Once you think about it from this perspective, IoT begins to sound less like a conceptual behemoth, and something more practical and actionable. Why I could be wrong about IoT Immediate shifts in IoT might be slow, but it could begin to pick up speed in organizations that understand it could have a very specific value. In this sense, IoT is a little like Blockchain - it’s only really going to work if we can move past the hype, and get into the practical uses of different technologies. Check out some of our latest IoT titles: Internet of Things Programming Projects Industrial Internet Application Development Introduction to Internet of Things [Video] Alexa Skills Projects Does anything really die in tech? You might be surprised at some of the entries on this list - others, not so much. But either way, it’s worth pointing out that ultimately nothing ever really properly disappears in tech. From a legacy perspective change and evolution often happens slowly, and in terms of innovation buzzwords and hype don’t simply vanish, they mature and influence developments in ways we might not have initially expected. What will really be important in 2019 is to be alive to these shifts, and give yourself the best chance of taking advantage of change when it really matters.

0
0
7410

article-image-time-series-modeling-what-is-it-why-it-matters-how-its-used

Sunith Shetty

10 Aug 2018

11 min read

Time series modeling: What is it, Why it matters and How it's used

Sunith Shetty

10 Aug 2018

11 min read

A series can be defined as a number of events, objects, or people of a similar or related kind coming one after another; if we add the dimension of time, we get a time series. A time series can be defined as a series of data points in time order. In this article, we will understand what time series is and why it is one of the essential characteristics for forecasting. This article is an excerpt from a book written by Harish Gulati titled SAS for Finance. The importance of time series What importance, if any, does time series have and how will it be relevant in the future? These are just a couple of fundamental questions that any user should find answers to before delving further into the subject. Let's try to answer this by posing a question. Have you heard the terms big data, artificial intelligence (AI), and machine learning (ML)? These three terms make learning time series analysis relevant. Big data is primarily about a large amount of data that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interaction. AI is a kind of technology that is being developed by data scientists, computational experts, and others to enable processes to become more intelligent, while ML is an enabler that is helping to implement AI. All three of these terms are interlinked with the data they use, and a lot of this data is time series in its nature. This could be either financial transaction data, the behavior pattern of individuals during various parts of the day, or related to life events that we might experience. An effective mechanism that enables us to capture the data, store it, analyze it, and then build algorithms to predict transactions, behavior (and life events, in this instance) will depend on how big data is utilized and how AI and MI are leveraged. A common perception in the industry is that time series data is used for forecasting only. In practice, time series data is used for: Pattern recognition Forecasting Benchmarking Evaluating the influence of a single factor on the time series Quality control For example, a retailer may identify a pattern in clothing sales every time it gets a celebrity endorsement, or an analyst may decide to use car sales volume data from 2012 to 2017 to set a selling benchmark in units. An analyst might also build a model to quantify the effect of Lehman's crash at the height of the 2008 financial crisis in pushing up the price of gold. Variance in the success of treatments across time periods can also be used to highlight a problem, the tracking of which may enable a hospital to take remedial measures. These are just some of the examples that showcase how time series analysis isn't limited to just forecasting. In this chapter, we will review how the financial industry and others use forecasting, discuss what a good and a bad forecast is, and hope to understand the characteristics of time series data and its associated problems. Forecasting across industries Since one of the primary uses of time series data is forecasting, it's wise that we learn about some of its fundamental properties. To understand what the industry means by forecasting and the steps involved, let's visit a common misconception about the financial industry: only lending activities require forecasting. We need forecasting in order to grant personal loans, mortgages, overdrafts, or simply assess someone's eligibility for a credit card, as the industry uses forecasting to assess a borrower's affordability and their willingness to repay the debt. Even deposit products such as savings accounts, fixed-term savings, and bonds are priced based on some forecasts. How we forecast and the rationale for that methodology is different in borrowing or lending cases, however. All of these areas are related to time series, as we inevitably end up using time series data as part of the overall analysis that drives financial decisions. Let's understand the forecasts involved here a bit better. When we are assessing an individual's lending needs and limits, we are forecasting for a single person yet comparing the individual to a pool of good and bad customers who have been offered similar products. We are also assessing the individual's financial circumstances and behavior through industry-available scoring models or by assessing their past behavior, with the financial provider assessing the lending criteria. In the case of deposit products, as long as the customer is eligible to transact (can open an account and has passed know your customer (KYC), anti-money laundering (AML), and other checks), financial institutions don't perform forecasting at an individual level. However, the behavior of a particular customer is primarily driven by the interest rate offered by the financial institution. The interest rate, in turn, is driven by the forecasts the financial institution has done to assess its overall treasury position. The treasury is the department that manages the central bank's money and has the responsibility of ensuring that all departments are funded, which is generated through lending and attracting deposits at a lower rate than a bank lends. The treasury forecasts its requirements for lending and deposits, while various teams within the treasury adhere to those limits. Therefore, a pricing manager for a deposit product will price the product in such a way that the product will attract enough deposits to meet the forecasted targets shared by the treasury; the pricing manager also has to ensure that those targets aren't overshot by a significant margin, as the treasury only expects to manage a forecasted target. In both lending and deposit decisions, financial institutions do tend to use forecasting. A lot of these forecasts are interlinked, as we saw in the example of the treasury's expectations and the subsequent pricing decision for a deposit product. To decide on its future lending and borrowing positions, the treasury must have used time series data to determine what the potential business appetite for lending and borrowing in the market is and would have assessed that with the current cash flow situation within the relevant teams and institutions. Characteristics of time series data Any time series analysis has to take into account the following factors: Seasonality Trend Outliers and rare events Disruptions and step changes Seasonality Seasonality is a phenomenon that occurs each calendar year. The same behavior can be observed each year. A good forecasting model will be able to incorporate the effect of seasonality in its forecasts. Christmas is a great example of seasonality, where retailers have come to expect higher sales over the festive period. Seasonality can extend into months but is usually only observed over days or weeks. When looking at time series where the periodicity is hours, you may find a seasonality effect for certain hours of the day. Some of the reasons for seasonality include holidays, climate, and changes in social habits. For example, travel companies usually run far fewer services on Christmas Day, citing a lack of demand. During most holidays people love to travel, but this lack of demand on Christmas Day could be attributed to social habits, where people tend to stay at home or have already traveled. Social habit becomes a driving factor in the seasonality of journeys undertaken on Christmas Day. It's easier for the forecaster when a particular seasonal event occurs on a fixed calendar date each year; the issue comes when some popular holidays depend on lunar movements, such as Easter, Diwali, and Eid. These holidays may occur in different weeks or months over the years, which will shift the seasonality effect. Also, if some holidays fall closer to other holiday periods, it may lead to individuals taking extended holidays and travel sales may increase more than expected in such years. The coffee shop near the office may also experience lower sales for a longer period. Changes in the weather can also impact seasonality; for example, a longer, warmer summer may be welcome in the UK, but this would impact retail sales in the autumn as most shoppers wouldn't need to buy a new wardrobe. In hotter countries, sales of air-conditioners would increase substantially compared to the summer months' usual seasonality. Forecasters could offset this unpredictability in seasonality by building in a weather forecast variable. We will explore similar challenges in the chapters ahead. Seasonality shouldn't be confused with a cyclic effect. A cyclic effect is observed over a longer period of generally two years or more. The property sector is often associated with having a cyclic effect, where it has long periods of growth or slowdown before the cycle continues. Trend A trend is merely a long-term direction of observed behavior that is found by plotting data against a time component. A trend may indicate an increase or decrease in behavior. Trends may not even be linear, but a broad movement can be identified by analyzing plotted data. Outliers and rare events Outliers and rare events are terminologies that are often used interchangeably by businesses. These concepts can have a big impact on data, and some sort of outlier treatment is usually applied to data before it is used for modeling. It is almost impossible to predict an outlier or rare event but they do affect a trend. An example of an outlier could be a customer walking into a branch to deposit an amount that is 100 times the daily average of that branch. In this case, the forecaster wouldn't expect that trend to continue. Disruptions Disruptions and step changes are becoming more common in time series data. One reason for this is the abundance of available data and the growing ability to store and analyze it. Disruptions could include instances when a business hasn't been able to trade as normal. Flooding at the local pub may lead to reduced sales for a few days, for example. While analyzing daily sales across a pub chain, an analyst may have to make note of a disruptive event and its impact on the chain's revenue. Step changes are also more common now due to technological shifts, mergers and acquisitions, and business process re-engineering. When two companies announce a merger, they often try to sync their data. They might have been selling x and y quantities individually, but after the merger will expect to sell x + y + c (where c is the positive or negative effect of the merger). Over time, when someone plots sales data in this case, they will probably spot a step change in sales that happened around the time of the merger, as shown in the following screenshot: In the trend graph, we can see that online travel bookings are increasing. In the step change and disruptions chart, we can see that Q1 of 2012 saw a substantive increase in bookings, where Q1 of 2014 saw a substantive dip. The increase was due to the merger of two companies that took place in Q1 of 2012. The decrease in Q1 of 2014 was attributed to prolonged snow storms in Europe and the ash cloud disruption from volcanic activity over Iceland. While online bookings kept increasing after the step change, the disruption caused by the snow storm and ash cloud only had an effect on sales in Q1 of 2014. In this case, the modeler will have to treat the merger and the disruption differently while using them in the forecast, as disruption could be disregarded as an outlier and treated accordingly. Also note that the seasonality chart shows that Q4 of each year sees almost a 20% increase in travel bookings, and this pattern continues each calendar year. In this article, we defined time series and learned why it is important for forecasting. We also looked at the characteristics of time series data. To know more how to leverage the analytical power of SAS to perform financial analysis efficiently, you can check out the book SAS for Finance. Read more Getting to know SQL Server options for disaster recovery Implementing a simple Time Series Data Analysis in R Training RNNs for Time Series Forecasting

0
0
7518

article-image-data-professionals-planning-to-learn-this-year-python-deep-learning

Amey Varangaonkar

14 Jun 2018

4 min read

What are data professionals planning to learn this year? Python, deep learning, yes. But also...

Amey Varangaonkar

14 Jun 2018

4 min read

One thing that every data professional absolutely dreads is the day their skills are no longer relevant in the market. In an ever-changing tech landscape, one must be constantly on the lookout for the most relevant, industrially-accepted tools and frameworks. This is applicable everywhere - from application and web developers to cybersecurity professionals. Not even the data professionals are excluded from this, as new ways and means to extract actionable insights from raw data are being found out almost every day. Gone are the days when data pros stuck to a single language and a framework to work with their data. Frameworks are more flexible now, with multiple dependencies across various tools and languages. Not just that, new domains are being identified where these frameworks can be applied, and how they can be applied varies massively as well. A whole new arena of possibilities has opened up, and with that new set of skills and toolkits to work on these domains have also been unlocked. What’s the next big thing for data professionals? We recently polled thousands of data professionals as part of our Skill-Up program, and got some very interesting insights into what they think the future of data science looks like. We asked them what they were planning to learn in the next 12 months. The following word cloud is the result of their responses, weighted by frequency of the tools they chose: What data professionals are planning on learning in the next 12 months Unsurprisingly, Python comes out on top as the language many data pros want to learn in the coming months. With its general-purpose nature and innumerable applications across various use-cases, Python’s sky-rocketing popularity is the reason everybody wants to learn it. Machine learning and AI are finding significant applications in the web development domain today. They are revolutionizing the customers’ digital experience through conversational UIs or chatbots. Not just that, smart machine learning algorithms are being used to personalize websites and their UX. With all these reasons, who wouldn’t want to learn JavaScript, as an important tool to have in their data science toolkit? Add to that the trending web dev framework Angular, and you have all the tools to build smart, responsive front-end web applications. We also saw data professionals taking active interest in the mobile and cloud domains as well. They aim to learn Kotlin and combine its power with the data science tools for developing smarter and more intelligent Android apps. When it comes to the cloud, Microsoft’s Azure platform has introduced many built-in machine learning capabilities, as well as a workbench for data scientists to develop effective, enterprise-grade models. Data professionals also prefer Docker containers to run their applications seamlessly, and hence its learning need seems to be quite high. [box type="shadow" align="" class="" width=""]Has machine learning with JavaScript caught your interest? Don’t worry, we got you covered - check out Hands-on Machine Learning with JavaScript for a practical, hands-on coverage of the essential machine learning concepts using the leading web development language. [/box] With Crypto’s popularity off the roof (sadly, we can’t say the same about Bitcoin’s price), data pros see Blockchain as a valuable skill. Building secure, decentralized apps is on the agenda for many, perhaps. Cloud, Big Data, Artificial Intelligence are some of the other domains that the data pros find interesting, and feel worth skilling up in. Work-related skills that data pros want to learn We also asked the data professionals what skills the data pros wanted to learn in the near future that could help them with their daily jobs more effectively. The following word cloud of their responses paints a pretty clear picture: Valuable skills data professionals want to learn for their everyday work As Machine learning and AI go mainstream, so do their applications in mainstream domains - often resulting in complex problems. Well, there’s deep learning and specifically neural networks to tackle these problems, and these are exactly the skills data pros want to master in order to excel at their work. [box type="shadow" align="" class="" width=""]Data pros want to learn Machine Learning in Python. Do you? Here’s a useful resource for you to get started - check out Python Machine Learning, Second Edition today![/box] So, there it is! What are the tools, languages or frameworks that you are planning to learn in the coming months? Do you agree with the results of the poll? Do let us know. What are web developers favorite front-end tools? Packt’s Skill Up report reveals all Data cleaning is the worst part of data analysis, say data scientists 15 Useful Python Libraries to make your Data Science tasks Easier

0
0
4362

article-image-setting-up-logistic-regression-model-using-tensorflow

Packt Editorial Staff

25 Apr 2018

8 min read

Setting up Logistic Regression model using TensorFlow

Packt Editorial Staff

25 Apr 2018

8 min read

0
0
3925

article-image-decrypting-bitcoin-2017-journey

Ashwin Nair

28 Dec 2017

7 min read

There and back again: Decrypting Bitcoin`s 2017 journey from $1000 to $20000

Ashwin Nair

28 Dec 2017

7 min read

Lately, Bitcoin has emerged as the most popular topic of discussion amongst colleagues, friends and family. The conversations more or less have a similar theme - filled with inane theories around what Bitcoin is and and the growing fear of missing out on Bitcoin mania. Well to be fair, who would want to miss out on an opportunity to grow their money by more than 1000%. That`s the return posted by Bitcoin since the start of the year. Bitcoin at the time of writing this article is at $15,000 with a marketcap over $250 billion. To put the hype in context, Bitcoin is now valued higher than 90% of companies in S&P 500 list. Supposedly, invented by an anonymous group or an individual under the alias Satoshi Nakamoto in 2009, Bitcoin has been seen as a digital currency that the internet might not need but one that it deserves. Satoshi`s vision was to create a robust electronic payment system that functions smoothly without a need for a trusted third party. This was achievable with the help of Blockchain, a digital ledger which records transactions in an indelible fashion and is distributed across multiple nodes. This ensured that no transaction is altered or deleted, thus completely eliminating the need for a trusted third party. For all the excitement around Bitcoin and the increase in interest level towards owning one, the thought of dissecting the roller coaster journey from sub $1000 level to $15,000 this year seemed pretty exciting. Started year with a bang / Global uncertainties leading to Bitcoin rally - Oct 2016 to Jan 2017 Global uncertainties played a big role in driving Bitcoin`s price and boy 2016 was full of it! From Brexit to Trump winning the US elections and major economic commotion in terms of Devaluation of China`s Yuan and India`s Demonetization drive all leading to investors seeking shelter in Bitcoin. The first blip - Jan 2017 to March 2017 China as a country has had a major impact in determining Bitcoin`s fate. Early 2017, China contributed to over 80% of Bitcoin transactions indicating the amount of power Chinese traders and investors had in controlling Bitcoin`s price. However, People's Bank of China`s closer inspection of the exchanges revealed several irregularities in the transactions and business practices. Which eventually led to officials halting withdrawals using Bitcoin transactions and taking stringent steps towards cryptocurrency exchanges. Source - Bloomberg On the path tobecoming mainstream and gaining support from Ethereum - Mar 2017 to June 2017 During this phase, we saw the rise of another cryptocurrency, Ethereum, a close digital cousin of Bitcoin. Similar to Bitcoin, Ethereum is also built on top of Blockchain technology allowing it to build a decentralized public network. However, Ethereum’s capabilities extend beyond being a cryptocurrency and help developers to build and deploy any kind of decentralized applications. Ethereum`s valuation in this period rose from $20 to $375 which was quite beneficial for Bitcoin. As every reference of Ethereum had Bitcoin mentioned as well, whether it was to explain what Ethereum is or how it can take the number 1 cryptocurrency spot in the future. This, coupled with the rise in Blockchain`s popularity, increased Bitcoin`s visibility within USA. The media started observing politicians, celebrities and other prominent personalities speaking on Bitcoin as well. Bitcoin also received a major boost from Chinese exchanges, wherein withdrawals of the cryptocurrency resumed after nearly a four-month freeze. All these factors led to Bitcoin crossing an all time high of $2500, up by more than 150% since the start of the year. The Curious case of Fork - June 2017 to September 2017 The month of July saw the Cryptocurrencies market cap witnessing a sharp decline, with questions being raised on the price volatility and whether Bitcoin`s rally for the year was over? We can of-course now confidently debunk that question. Though, there hasn’t been any proven rationale behind the decline, one of the reasons seems to be profit booking following months of steep rally in valuations witnessed by Bitcoin. Another major factor, which might have driven the price collapse may be an unresolved dispute among leading members of the bitcoin community over how to overcome the growing problem of Bitcoin being slow and expensive. With growing usage of Bitcoin, its performance in terms of transaction time has slowed down. Bitcoin`s network due to its limitation in terms of block size could only execute around 7 transactions per second compared to the VISA Network which could do over 1600 transactions. This also led to transaction fees being increased substantially to $5.00 per transaction and settlement time often taking hours and even days. This eventually put Bitcoin`s flaw in the spotlight when compared with services offered by competitors such as Paypal in terms of cost and transaction time. Source - Coinmarketcap The poor performance of Bitcoin led to investors opting for other cryptocurrencies. The above graph, shows how Bitcoin`s dominance fell substantially compared to other cryptocurrencies such as Ethereum and Litecoin during this time. With the core-community still unable to come to a consensus on how to improve the performance and update the software, the prospect of a “fork” was raised. Fork highlights change in underlying software protocol of Bitcoin to make previous rules valid or invalid. There are two types of blockchain forks - soft fork and hard fork. Around August, the community announced to go ahead with a hard fork in the form of Bitcoin Cash. This news was surprisingly taken in a positive manner leading to Bitcoin rebounding strongly and reaching new heights of around $4000 in price. Once bitten, Twice Shy (China) September 2017 to October 2017 Source - Zerohedge The month of September saw another setback for Bitcoin due to measures taken from People`s Bank of China. This time, PBoC banned initial coin offerings (ICO), thus prohibiting the practice of building and selling cryptocurrency to any investors or to finance any startup projects within China. Based on a report by National Committee of Experts on Internet Financial Security Technology, Chinese Investors were involved in 2.6 Billion Yuan worth of ICOs in January-June, 2017 reflecting China`s exposure towards Bitcoin. My Precious (October 2017 to December 2017) Source - Manmanroe During the last quarter Bitcoin`s surge has shocked even hardcore Bitcoin fanatics. Everything seems to be going right for Bitcoin at the moment. While at the start of the year China was the major contributor towards the hike in Bitcoin`s valuation, now the momentum seemed to have shifted to a much sensible and responsible market in terms of Japan who have embraced Bitcoin in quite a positive manner. As you can see from the below graph, Japan now holds more than 50% of transaction compared to USA which is much lesser in size. Besides Japan, we are also seeing positive developments in country such as Russia and India, who are looking to legalize cryptocurrency usage. Moreover, the level of interests towards Bitcoin from institutional investors is at its peak. All these factors have resulted in Bitcoin to cross the 5 digit mark for the first time in Nov, 2017 and touching an all time high figure of close to $20,000 in December, 2017. Post the record high, Bitcoin has been witnessing a crash and rebound phenomenon in the last two weeks of December. From a record high of $20,000 to $11,000 and now at $15,000, Bitcoin is still a volatile digital currency if one is looking for a quick price appreciation. Despite the valuation dilemma and the price volatility, one thing is sure: the world is warming up to the idea of cryptocurrencies and even owning one. There are already several predictions being made on how Bitcoin`s astronomical growth is going to continue in 2018. However, Bitcoin needs to overcome several challenges before it can replace the traditional currency and and be widely accepting in banking practices. Besides, the rise of other cryptocurriencies such as Ethereum, LiteCoin or bitcoin cash who are looking to dethrone Bitcoin from the #1 spot, there are broader issues at hand which the Bitcoin community should prioritize such as how to curb the effect of Bitcoin`s mining activities on the environment and on having smoother reforms as well as building regulatory roadmap from countries before people actually start using instead of just looking it as a tool for making a quick buck.

0
33
10977

article-image-highest-paying-data-science-jobs-2017

Amey Varangaonkar

27 Nov 2017

10 min read

Highest Paying Data Science Jobs in 2017

Amey Varangaonkar

27 Nov 2017

10 min read

It is no secret that this is the age of data. More data has been created in the last 2 years than ever before. Within the dumps of data created every second, businesses are looking for useful, action worthy insights which they can use to enhance their processes and thereby increase their revenue and profitability. As a result, the demand for data professionals, who sift through terabytes of data for accurate analysis and extract valuable business insights from it, is now higher than ever before. Think of Data Science as a large tree from which all things related to data branch out - from plain old data management and analysis to Big Data, and more. Even the recently booming trends in Artificial Intelligence such as machine learning and deep learning are applied in many ways within data science. Data science continues to be a lucrative and growing job market in the recent years, as evidenced by the graph below: Source: Indeed.com In this article, we look at some of the high paying, high trending job roles in the data science domain that you should definitely look out for if you’re considering data science as a serious career opportunity. Let’s get started with the obvious and the most popular role. Data Scientist Dubbed as the sexiest job of the 21st century, data scientists utilize their knowledge of statistics and programming to turn raw data into actionable insights. From identifying the right dataset to cleaning and readying the data for analysis, to gleaning insights from said analysis, data scientists communicate the results of their findings to the decision makers. They also act as advisors to executives and managers by explaining how the data affects a particular product or process within the business so that appropriate actions can be taken by them. Per Salary.com, the median annual salary for the role of a data scientist today is $122,258, with a range between $106,529 to $137,037. The salary is also accompanied by a whole host of benefits and perks which vary from one organization to the other, making this job one of the best and the most in-demand, in the job market today. This is a clear testament to the fact that an increasing number of businesses are now taking the value of data seriously, and want the best talent to help them extract that value. There are over 20,000 jobs listed for the role of data scientist, and the demand is only growing. Source: Indeed.com To become a data scientist, you require a bachelor’s or a master’s degree in mathematics or statistics and work experience of more than 5 years in a related field. You will need to possess a unique combination of technical and analytical skills to understand the problem statement and propose the best solution, good programming skills to develop effective data models, and visualization skills to communicate your findings with the decision makers. Interesting in becoming a data scientist? Here are some resources to help you get started: Principles of Data Science Data Science Algorithms in a Week Getting Started with R for Data Science [Video] For a more comprehensive learning experience, check out our skill plan for becoming a data scientist on Mapt, our premium skills development platform. Data Analyst Probably a term you are quite familiar with, Data Analysts are responsible for crunching large amounts of data and analyze it to come to appropriate logical conclusions. Whether it’s related to pure research or working with domain-specific data, a data analyst’s job is to help the decision-makers’ job easier by giving them useful insights. Effective data management, analyzing data, and reporting results are some of the common tasks associated with this role. How is this role different than a data scientist, you might ask. While data scientists specialize in maths, statistics and predictive analytics for better decision making, data analysts specialize in the tools and components of data architecture for better analysis. Per Salary.com, the median annual salary for an entry-level data analyst is $55,804, and the range usually falls between $50,063 to $63,364 excluding bonuses and benefits. For more experienced data analysts, this figure rises to around a mean annual salary of $88,532. With over 83,000 jobs listed on Indeed.com, this is one of the most popular job roles in the data science community today. This profile requires a pretty low starting point, and is justified by the low starting salary packages. As you gain more experience, you can move up the ladder and look at becoming a data scientist or a data engineer. Source: Indeed.com You may also come across terms such as business data analyst or simply business analyst which are sometimes interchangeably used with the role of a data analyst. While their primary responsibilities are centered around data crunching, business analysts model company infrastructure, while data analysts model business data structures. You can find more information related to the differences in this interesting article. If becoming a data analyst is something that interests you, here are some very good starting points: Data Analysis with R Python Data Analysis, Second Edition Learning Python Data Analysis [Video] Data Architect Data architects are responsible for creating a solid data management blueprint for an organization. They are primarily responsible for designing the data architecture and defining how data is stored, consumed and managed by different applications and departments within the organization. Because of these critical responsibilities, a data architect’s job is a very well-paid one. Per Salary.com, the median annual salary for an entry-level data architect is $74,809, with a range between $57,964 to $91,685. For senior-level data architects, the median annual salary rises up to $136,856, with a range usually between $121,969 to $159,212. These high figures are justified by the critical nature of the role of a data architect - planning and designing the right data infrastructure after understanding the business considerations to get the most value out of the data. At present, there are over 23,000 jobs for the role listed on Indeed.com, with a stable trend in job seeker interest, as shown: Source: Indeed.com To become a data architect, you need a bachelor’s degree in computer science, mathematics, statistics or a related field, and loads of real-world skills to qualify for even the entry-level positions. Technical skills such as statistical modeling, knowledge of languages such as Python and R, database architectures, Hadoop-based skills, knowledge of NoSQL databases, and some machine learning and data mining are required to become a data architect. You also need strong collaborative skills, problem-solving, creativity and the ability to think on your feet, to solve the trickiest of problems on the go. Suffice to say it’s not an easy job, but it is definitely a lucrative one! Get ahead of the curve, and start your journey to becoming a data architect now: Big Data Analytics Hadoop Blueprints PostgreSQL for Data Architects Data Engineer Data engineers or Big Data engineers are a crucial part of the organizational workforce and work in tandem with data architects and data scientists to ensure appropriate data management systems are deployed and the right kind of data is being used for analysis. They deal with messy, unstructured Big Data and strive to provide clean, usable data to the other teams within the organization. They build high-performance analytics pipelines and develop set of processes for efficient data mining. In many companies, the role of a data engineer is closely associated with that of a data architect. While an architect is responsible for the planning and designing stages of the data infrastructure project, a data engineer looks after the construction, testing and maintenance of the infrastructure. As such data engineers tend to have a more in-depth understanding of different data tools and languages than data architects. There are over 90,000 jobs listed on Indeed.com, suggesting there is a very high demand in the organizations for this kind of a role. An entry level data engineer has a median annual salary of $90,083 per Payscale.com, with a range of $60,857 to $131,851. For Senior Data Engineers, the average salary shoots up to $123,749 as per Glassdoor estimates. Source: Indeed.com With the unimaginable rise in the sheer volume of data, the onus is on the data engineers to build the right systems that empower the data analysts and data scientists to sift through the messy data and derive actionable insights from it. If becoming a data engineer is something that interests you, here are some of our products you might want to look at: Real-Time Big Data Analytics Apache Spark 2.x Cookbook Big Data Visualization You can also check out our detailed skill plan on becoming a Big Data Engineer on Mapt. Chief Data Officer There is a countless number of organizations that build their businesses on data, but don’t manage it that well. This is where a senior executive popularly known as the Chief Data Officer (CDO) comes into play - bearing the responsibility for implementing the organization’s data and information governance and assisting with data-driven business strategies. They are primarily responsible for ensuring that their organization gets the most value out of their data and put appropriate plans in place for effective data quality and its life-cycle management. The role of a CDO is one of the most lucrative and highest paying jobs in the data science frontier. An average median annual pay for a CDO per Payscale.com is around $192,812. Indeed.com lists just over 8000 job postings too - this is not a very large number, but understandable considering the recent emergence of the role and because it’s a high profile, C-suite job. Source: Indeed.com According to a Gartner research, almost 50% companies in a variety of regulated industries will have a CDO in place, by 2017. Considering the demand for the role and the fact that it is only going to rise in the future, the role of a CDO is one worth vying for. To become a CDO, you will obviously need a solid understanding of statistical, mathematical and analytical concepts. Not just that, extensive and high-value experience in managing technical teams and information management solutions is also a prerequisite. Along with a thorough understanding of the various Big Data tools and technologies, you will need strong communication skills and deep understanding of the business. If you’re planning to know more about how you can become a Chief Data Officer, you can browse through our piece on the role of CDO. Why demand for data science professionals will rise It’s hard to imagine an organization which doesn’t have to deal with data, but it’s harder to imagine the state of an organization with petabytes of data and not knowing what to do with it. With the vast amounts of data, organizations deal with these days, the need for experts who know how to handle the data and derive relevant and timely insights from it is higher than ever. In fact, IBM predicts there’s going to be a severe shortage of data science professionals, and thereby, a tremendous growth in terms of job offers and advertised openings, by 2020. Not everyone is equipped with the technical skills and know-how associated with tasks such as data mining, machine learning and more. This is slowly creating a massive void in terms of talent that organizations are looking to fill quickly, by offering lucrative salaries and added benefits. Without the professional expertise to turn data into actionable insights, Big Data becomes all but useless.

0
0
4352

article-image-halloween-costume-data-science-nerds

Packt Editorial Staff

31 Oct 2017

14 min read

(13*3)+ Halloween costume ideas for Data science nerds

Packt Editorial Staff

31 Oct 2017

14 min read

Are you a data scientist, a machine learning engineer, an AI researcher or simply a data enthusiast? Channel the inner data science nerd within you with these geeky ideas for your Halloween costumes! The Data Science Spectrum Don't know what to go as to this evening's party because you've been busy cleaning that terrifying data? Don’t worry, here are some easy-to-put-together Halloween costume ideas just for you. [dropcap]1[/dropcap] Big Data Go as Baymax, the healthcare robot, (who can also turn into battle mode when required). Grab all white clothes that you have. Stuff your tummy with some pillows and wear a white mask with cutouts for eyes. You are all ready to save the world. In fact, convince a friend or your brother to go as Hiro! [dropcap]2[/dropcap] A.I. agent Enter as Agent Smith, the AI antagonist, this Halloween. Lure everyone with your bold black suit paired with a white shirt and a black tie. A pair of polarized sunglasses would replicate you as the AI agent. Capture the crowd by being the most intelligent and cold-hearted personality of all. [dropcap]3[/dropcap] Data Miner Put on your dungaree with a tee. Fix a flashlight atop your cap. Grab a pickaxe from the gardening toolkit, if you have one. Stripe some mud onto your face. Enter the party wheeling with loads of data boxes that you have freshly mined. You’ll definitely grab some traffic for data. Unstructured data anyone? [dropcap]4[/dropcap] Data Lake Go as a Data lake this Halloween. Simply grab any blue item from your closet. Draw some fishes, crabs, and weeds. (Use a child’s marker for that). After all, it represents the data you have. And you’re all set. [dropcap]5[/dropcap] Dark Data Unleash the darkness within your soul! Just kidding. You don’t actually have to turn to the evil side. Just coming up with your favorite black-costume character would do. Looking for inspiration? Maybe, a witch, The dark knight, or The Darth Vader. [dropcap]6[/dropcap] Cloud A fluffy, white cloud is what you need to be this Halloween. Raid your nearby drug store for loads of cotton balls. Better still, tear up that old pillow you have been meaning to throw away for a while. Use the fiber inside to glue onto an unused tee. You will be the cutest cloud ever seen. Don’t forget to carry an umbrella in case you turn grey! [dropcap]7[/dropcap] Predictive Analytics Make your own paper wizard hat with silver stars and moons pasted on it. If you can arrange for an advocate gown, it would be great. Else you could use a long black bed sheet as a cape. And most importantly, a crystal ball to show off some prediction stunts at the Halloween. [dropcap]8[/dropcap] Gradient boosting Enter Halloween as the energy booster. Wear what you want. Grab loads of empty energy drink tetra packs and stick it all over you. Place one on your head too. Wear a nameplate that says “ G-booster Energy drink”. Fuel up some weak models this Halloween. [dropcap]9[/dropcap] Cryptocurrency Wear head to toe black. In fact, paint your face black as well, like the Grim reaper. Then grab a cardboard piece. Cut out a circle, paint it orange, and then draw a gold B symbol, just like you see in a bitcoin. This Halloween costume will definitely grab you the much-needed attention just as this popular cryptocurrency. [dropcap]10[/dropcap] IoT Are you a fan of IoT and the massive popularity it has gained? Then you should definitely dress up as your web-slinging, friendly neighborhood Spiderman. Just grab a spiderman costume from any costume store and attach some handmade web slings. Remember to connect with people by displaying your IoT knowledge. [dropcap]11[/dropcap] Self-driving car Choose a mono-color outfit of your choice (P.S. The color you would choose for your car). Cut out four wheels and paste two on your lower calves and two on your arms. Cut out headlights too. Put on a wiper goggle. And yes you do not need a steering wheel or the brakes, clutch and the accelerator. Enter the Halloween at your own pace, go self-driving this Halloween. Bonus point: You can call yourself Bumblebee or Optimus Prime. Machine Learning and Deep learning Frameworks If machine learning or deep learning is your forte, here are some fresh Halloween costume ideas based on some of the popular frameworks in that space. [dropcap]12[/dropcap] Torch Flame up the party with a costume inspired by the fantastic four superhero, Johnny Storm a.k.a The Human Torch. Wear a yellow tee and orange slacks. Draw some orange flames on your tee. And finally, wear a flame-inspired headband. Someone is a hot machine learning library! [dropcap]13[/dropcap] TensorFlow No efforts for this one. Just arrange for a pumpkin costume, paste a paper cut-out of the TensorFlow logo and wear it as a crown. Go as the most powerful and widely popular deep learning library. You will be the star of the Halloween as you are a Google Kid. [dropcap]14[/dropcap] Caffe Go as your favorite Starbucks coffee this Halloween. Wear any of your brown dress/ tee. Draw or stick a Starbucks logo. And then add frothing to the top by bunching up a cream-colored sheet. Mamma Mia! [dropcap]15[/dropcap] Pandas Go as a Panda this Halloween! Better still go as a group of Pandas. The best option is to buy a panda costume. But if you don’t want that, wear a white tee, black slacks, black goggles and some cardboard cutouts for ears. This will make you not only the cutest animal in the party but also a top data manipulation library. Good luck finding your python in the party by the way. [dropcap]16[/dropcap] Jupyter Notebook Go as a top trending open-source web application by dressing up as the largest planet in our solar system. People would surely be intimidated by your mass and also by your computing power. [dropcap]17[/dropcap] H2O Go to Halloween as a world famous open source deep learning platform. No, no, you don’t have to go as the platform itself. Instead go as the chemical alter-ego, water. Wear all blue and then grab some leftover asymmetric, blue cloth pieces to stick at your sides. Thirsty anyone? Data Viz & Analytics Tools If you’re all about analytics and visualization, grab the attention of every data geek in your party by dressing up as your favorite data insight tools. [dropcap]18[/dropcap] Excel Grab an old white tee and paint some green horizontal stripes. You’re all ready to go as the most widely used spreadsheet. The simplest of costumes, yet the most useful - a timeless classic that never goes out of fashion. [dropcap]19[/dropcap] MatLab If you have seriously run out of all costume ideas, going out as MatLab is your only solution. Just grab a blue tablecloth. Stick or sew it with some orange curtain and throw it over your head. You’re all ready to go as the multi-paradigm numerical computing environment. [dropcap]20[/dropcap] Weka Wear a brown overall, a brown wig, and paint your face brown. Make an orange beak out of a chart paper, and wear a pair orange stockings/ socks with your trousers tucked in. You are all set to enter as a data mining bird with ML algorithms and Java under your wings. [dropcap]21[/dropcap] Shiny Go all Shimmery!! Get some glitter powder and put it all over you. (You’ll have a tough time removing it though). Else choose a glittery outfit, with glittery shoes, and touch-up with some glitter on your face. Let the party see the bling of R that you bring. You will be the attractive storyteller out there. [dropcap]22[/dropcap] Bokeh A colorful polka-dotted outfit and some dim lights to do the magic. You are all ready to grab the show with such a dazzle. Make sure you enter the party gates with Python. An eye-catching beauty with the beast pair. [dropcap]23[/dropcap] Tableau Enter the Halloween as one of your favorite characters from history. But there is a term and condition for this: You cannot talk or move. Enjoy your Halloween by being still. Weird, but you’ll definitely grab everyone’s eye. [dropcap]24[/dropcap] Microsoft Power BI Power up your Halloween party by entering as a data insights superhero. Wear a yellow turtleneck, a stylish black leather jacket, black pants, some mid-thigh high boots and a slick attitude. You’re ready to save your party! Data Science oriented Programming languages These hand-picked Halloween costume ideas are for you if you consider yourself a top coder. By a top coder we mean you’re all about learning new programming languages in your spare and, well, your not so spare time. [dropcap]25[/dropcap] Python Easy peasy as the language looks, the reptile is not that easy to handle. A pair of python-printed shirt and trousers would do the job. You could be getting more people giving you candies some out of fear, other out of the ease. Definitely, go as a top trending and a go-to language which everyone loves! And yes, don’t forget the fangs. [dropcap]26[/dropcap] R Grab an eye patch and your favorite leather pants. Wear a loose white shirt with some rugged waistcoat and a sword. Here you are all decked up as a pirate for your next loot. You’ll surely thank me for giving you a brilliant Halloween idea. But yes! Don’t forget to make that Arrrr (R) noise! [dropcap]27[/dropcap] Java Go as a freshly roasted coffee bean! People in your Halloween party would be allured by your aroma. They would definitely compliment your unique idea and also the fact that you’re the most popular programming language. [dropcap]28[/dropcap] SAS March in your Halloween party up as a Special Airforce Service (SAS) agent. You would be disciplined, accurate, precise and smart. Just like the advanced software suite that goes by the same name. You would need a full black military costume, with a gas mask, some fake ammunition from a nearby toy store, and some attitude of course! [dropcap]29[/dropcap] SQL If you pride yourself on being very organized or are a stickler for the rules, you should go as SQL this Halloween. Prep-up yourself with an overall blue outfit. Spike up your hair and spray some temporary green hair color. Cut out bold letters S, Q, and L from a plain white paper and stick them on your chest. You are now ready to enter the Halloween party as the most popular database of all times. Sink in all the data that you collect this Halloween. [dropcap]30[/dropcap] Scala If Scala is your favorite programming language, add a spring to your Halloween by going as, well, a spring! Wear the brightest red that you have. Using a marker, draw some swirls around your body (You can ask your mom to help). Just remember to elucidate a 3D picture. And you’re all set. [dropcap]31[/dropcap] Julia If you want to make a red carpet entrance to your Halloween party, go as the Academy award-winning actress, Julia Roberts. You can even take up inspiration from her character in the 90s hit film Pretty Woman. For extra oomph, wear a pink, red, and purple necklace to highlight the Julia programming language [dropcap]32[/dropcap] Ruby Act pricey this Halloween. Be the elegant, dynamic yet simple programming language. Go blood red, wear on your brightest red lipstick, red pumps, dazzle up with all the red accessories that you have. You’ll definitely gather some secret admirers around the hall. [dropcap]33[/dropcap] Go Go as the mascot of Go, the top trending programming language. All you need is a blue mouse costume. Fear not if you don’t have one. Just wear a powder blue jumpsuit, grab a baby pink nose, and clip on a fake single, large front tooth. Ready for the party! [dropcap]34[/dropcap] Octave Go as a numerically competent programming language. And if that doesn’t sound very trendy, go as piano keys depicting an octave. You simply need to wear all white and divide your space into 8 sections. Then draw 5 horizontal black stripes. You won’t be able to do that vertically, well, because they are a big number. Here you go, you’re all set to fill the party with your melody. Fancy an AI system inspired Halloween costume? This is for you if you love the way AI works and the enigma that it has thrown around the world. This is for you if you are spellbound with AI magic. You should go dressed as one of these at your Halloween party this season. Just pick up the AI you want to look like and follow as advised. [dropcap]35[/dropcap] IBM Watson Wear a dark blue hat, a matching long overcoat, a vest and a pale blue shirt with a dark tie tucked into the vest. Complement it with a mustache and a brooding look. You are now ready to be IBM Watson at your Halloween party. [dropcap]36[/dropcap] Apple Siri If you want to be all cool and sophisticated like the Apple’s Siri, wear an alluring black turtleneck dress. Don’t forget to carry your latest iPhone and air pods. Be sure you don’t have a sore throat, in case someone needs your assistance. [dropcap]37[/dropcap] Microsoft Cortana If Microsoft Cortana is your choice of voice assistant, dress up as Cortana, the fictional synthetic intelligence character in the Halo video game series. Wear a blue bodysuit. Get a bob if you’re daring. (A wig would also do). Paint some dark blue robot like designs over your body and well, your face. And you’re all set. [dropcap]38[/dropcap] Salesforce Einstein Dress up as the world’s most famous physicist and also an AI-powered CRM. How? Just grab a white shirt, a blue pullover and a blue tie (Salesforce colors). Finish your look with a brown tweed coat, brown pants and shoes, a rugged white wig and mustache, and a deep thought on your face. [dropcap]39[/dropcap] Facebook Jarvis Get inspired by the Iron man’s Jarvis, the coolest A.I. in the Marvel universe. Just grab a plexiglass, draw some holograms and technological symbols over it with a neon marker. (Try to keep the color palette in shades of blues and reds). And fix this plexiglass in a curved fashion in front of your face by a headband. Do practice saying “Hello Mr. Stark.” [dropcap]40[/dropcap] Amazon Echo This is also an easy one. Grab a long, black chart paper. Roll it around in a tube form around your body. Draw the Amazon symbol at the bottom with some glittery, silver sketch pen, color your hair blue, and there you go. If you have a girlfriend, convince her to go as Amazon Alexa. [dropcap]41[/dropcap] SAP Leonardo Put on a hat, wear a long cloak, some fake overgrown mustache, and beard. Accessorize with a color palette and a paintbrush. You will be the Leonardo da Vinci of the Halloween party. Wait a minute, don’t forget to cut out SAP initials and stick them on your cap. After all, you are entering as SAP’s very own digital revolution system. [dropcap]42[/dropcap] Intel Neon Deck the Halloween hall with a Harley Quinn costume. For some extra dramatization, roll up some neon blue lights around your head. Create an Intel logo out of some blue neon lights and wear it as your neckpiece. [dropcap]43[/dropcap] Microsoft Brainwave This one will require a DIY task. Arrange for a red and green t-shirt, cut them into a vertical half. Stitch it in such a way that the green is on the left and the red on the right. Similarly, do that with your blue and yellow pants; with yellow on the left and blue on the right. You will look like the most powerful Microsoft’s logo. Wear a skullcap with wires protruding out and a Hololens like eyewear to go with. And so, you are all ready to enter the Halloween party as Microsoft’s deep learning acceleration platform for real-time AI. [dropcap]44[/dropcap] Sophia, the humanoid Enter with all the confidence and a top-to-toe professional attire. Be ready to answer any question thrown at you with grace and without a stroke of skepticism. And to top it off, sport a clean shaved head. And there, you are all ready to blow off everyone’s mind with a mix of beauty with super intelligent brains. Happy Halloween folks!

0
0
7657

article-image-halloween-costume-ideas-inspired-apache-big-data-projects

Packt Editorial Staff

30 Oct 2017

3 min read

Halloween costume ideas inspired from Apache Big Data Projects

Packt Editorial Staff

30 Oct 2017

3 min read

If you are a busy person who is finding it difficult to decide a Halloween costume for your office party tomorrow or for your kid's trick-or-treating madness, here are some geeky Halloween costume ideas that will make the inner data nerd in you proud! Apache Hadoop Be the cute little yellow baby elephant everyone wants to cuddle. Just grab all the yellow clothes you have. If you don’t, borrow them. Don’t forget to stuff in mini cushions in you. Pop in loads of candy in your mouth. And there, you’re all set to be as the dominant but the cutest framework! Cuteness overloaded. Apache Hive Be the buzz of your Halloween party by going as a top Apache data warehouse. What to wear you ask? Hum around wearing a yellow and white striped dress or a shirt. Compliment your outfit with a pair of black wings, headband with antennae and a small pot of honey. Apache Storm An X-Men fan are you? Go as Storm, the popular fictional superhero. Wear a black bodysuit (leather if possible). Drape a long cape. Put on a grey wig. And channel your inner power. Perhaps people would be able to see the powerful weather-controlling mutant in you and also recognize your ability to process streaming data in real time. Apache Kafka Go all out gothic with an Apache Kafka costume. Dress in a serious black dress and gothic makeup. Don’t forget your black butterfly wings and a choker necklace with linked circles. Keep asking existential questions to random people at the party to throw them off balance. Apache Giraph Put on a yellow tee and brown trousers, cut out some brown imperfect circles and paste them on your tee. Put on a brown cap, and paint your ears brown. Draw some graph representations using a marker all over your hands and palms. You are now Apache Giraph. Apache Singa Be the blend of a flexible Apache Singa with the ferocity of a lion this Halloween! All you need is a yellow tee paired with light brown trousers. Wear a lion’s wig. Grab a mascara and draw some strokes on your cheeks. Paint the tip of your nose using a brown watercolour or some melted chocolate. Apache Spark If you have obsessed over Pokémon Go and equally love the lightning blaze data processing speed of Apache Spark, you should definitely go as the leader of Pokémon Go's Team Instinct. Spark wears an orange hoodie, a black and yellow leather jacket, black jeans and orange gloves. Do remember to carry your Pokemon balls in case you are challenged for a battle. Apache Pig A dark blue dungaree paired with a baby pink tee, a pair of white gloves, purple shoes and yes, a baby pink chart paper cut out of the pig’s face. Wear all of this on and you will look like an Apache Pig. Complement the look with a wide grin when you make an entrance. [caption id="attachment_1414" align="aligncenter" width="708"] Two baby boys dressed in animal costumes in autumn park, focus on baby in elephant costume[/caption] Happy Haloween folks! Watch this space for more data science themed Haloween costume ideas tomorrow.

0
0
2646

How-To Tutorials - Big Data

15 things every BI professional should know about Tableau

Brett Lantz on implementing a decision tree using C5.0 algorithm in R

4 things in tech that might die in 2019

Time series modeling: What is it, Why it matters and How it's used

What are data professionals planning to learn this year? Python, deep learning, yes. But also...

Setting up Logistic Regression model using TensorFlow

Trending Topics

There and back again: Decrypting Bitcoin`s 2017 journey from $1000 to $20000

Highest Paying Data Science Jobs in 2017

(13*3)+ Halloween costume ideas for Data science nerds

Halloween costume ideas inspired from Apache Big Data Projects