Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon

Tech Guides - Data Analysis

34 Articles
article-image-data-science-new-alchemy
Erol Staveley
18 Jan 2016
7 min read
Save for later

Data Science Is the New Alchemy

Erol Staveley
18 Jan 2016
7 min read
Every day I come into work and sit opposite Greg. Greg (in my humble opinion) is a complete badass. He directly turns information that we’ve had hanging around for years and years into actual currency. Single handedly, he generates more direct revenue than any one individual in the business. When we were shuffling seating positions not too long ago (we now have room for that standing desk I’ve always wanted ❤), we were afraid to turn off his machine in fear of losing thousands upon thousands of dollars. I remember somebody saying “guys, we can’t unplug Skynet”. Nobody fully knows how it works. Nobody except Greg. We joked that by turning off his equipment, we’d ruin Greg's on-the-side Bitcoin mining gig that he was probably running off the back of the company network. We then all looked at one another in a brief moment of silence. We were all thinking the same thing — it wouldn’t surprise any of us if Greg was actually doing this. We wouldn’t know any better. To many, what Greg does is like modern day alchemy. In reality, Greg is a data scientist — an increasingly crucial role that helps businesses deliver more meaningful, relevant interactions with their customers. I like to think of them more as new-age alchemists, who wield keyboards instead of perfectly choreographed vials and alembics. This week - find out how to become a data alchemist with R. Save 50% on some of our top titles... or pick up any 5 for $50! Find them all here!  Content might have been king a few years back. Now, it’s data. Everybody wants more — and the people who can actually make sense of it all. By surveying 20,000 developers, we found out just how valuable these roles are to businesses of all shapes and sizes. Let’s take a look. Every Kingdom Needs an Alchemist Even within quite a technical business, Greg’s work lends a fresh perspective on what it is other developers want from our content. Putting the value of direct revenue generation to one side, the insight we’ve derived from purchasing patterns and user behaviour is incredibly valuable. We’re constantly challenging our own assumptions, and spending more time looking at what our customers are actually doing. We’re not alone in taking this increasingly data-driven approach. In general, the highest data science salaries are paid by large enterprises. This isn’t too surprising considering that’s where the real troves of precious data reside. At such scale, the aggregation and management of data alone can warrant the recruitment of specialised teams. On average though, SMEs are not too far behind when it comes to how much they’re willing to pay for top talent. Average salary by company size. Apache Spark was a particularly important focus going forward for folks in the Enterprise segment. What’s clear is that data science isn’t just for big businesses any more. It’s for everybody. We can see that in the growth of data-related roles for SMEs. We’re paying more attention to data because it represents the actions of our customers, but also because we’ve just got more of it lying around all over the place. Irrespective of company size, the range of industries we captured (and classified) was colossal. Seems like everybody needs an alchemist these days. They Double as Snake Charmers When supply is low and demand is high in a particular job market, we almost always see people move to fill the gap. It’s a key driver of learning. After all, if you’re trying to move to a new role, you’re likely to be developing new skills. It’s no surprise that Python is the go-to choice for data science. It’s an approachable language with some great introductory resources out there on the market like Python for Secret Agents. It also has a fantastic ecosystem of data science libraries and documentation that can help you get up and running quite quickly. Percentage of respondents who said they used a given technology. When looking at roles in more detail, you see strong patterns between technologies used. For example, those using Python were most likely to also be using R. When you dive deeper into the data you start to notice a lot of crossover between certain segments. It was at this point where we were able to also start seeing the relationships between certain technologies in specific segments. For example, the Financial sector was more likely to use R, and also paid (on average) higher salaries to those who had a more diverse technical background. Alchemists Have Many Forms Back at a higher level, what was really interesting is the natural technology groupings that started to emerge between four very distinct ‘types’ of data alchemist. “What are they?”, I hear you ask. The Visualizers Those who bring data to life. They turn what otherwise would be a spreadsheet or a copy-and-paste pie chart into delightful infographics and informative dashboards. Welcome to the realm of D3.js and Tableau. The Wranglers The SME all-stars. They aggregate, clean and process data with Python whilst leveraging the functionality of libraries like pandas to their full potential. A jack of all trades, master of all. The Builders Those who use Hadoop and other OS tools to deploy and maintain large-scale data projects. They keep the world running by building robust, scalable data platforms. The Architects Those who harness the might of the enterprise toolchain. They co-ordinate large scale Oracle and Microsoft deployments, the sheer scale of which would break the minds of mere mortals. Download the Full Report With 20,000 developers taking part overall, our most recent data science survey contains plenty of juicy information about real-world skills, salaries and trends. Packtpub.com In a Land of Data, the Alchemist is King We used to have our reports delivered in Excel. Now we have them as notebooks on Jupyter. If it really is a golden age for developers, data scientists must be having a hard time keeping their inbox clear of all the recruitment spam. What’s really interesting going forward is that the volume of information we have to deal with is only going to increase. Once IoT really kicks off and wearables become more commonly accepted (the sooner the better if you’re Apple), businesses of all sizes will find dealing with data overload to be a key growing pain — regardless of industry. Plenty of web services and platforms are already popping up, promising to deliver ‘actionable insight’ to everybody who can spare the monthly fees. This is fine for standardised reporting and metrics like bounce rate and conversion, but not so helpful if you’re working with a product that’s unique to you. Greg’s work doesn’t just tell us how we can improve our SEO. It shows us how we can make our products better without having to worry about internal confirmation bias. It helps us better serve our customers. That’s why present-day alchemists like Greg, are heroes.
Read more
  • 0
  • 0
  • 2282

article-image-rise-data-science
Akram Hussain
30 Jun 2014
5 min read
Save for later

The Rise of Data Science

Akram Hussain
30 Jun 2014
5 min read
The rise of big data and business intelligence has been one of the hottest topics to hit the tech world. Everybody who’s anybody has heard of the term business intelligence, yet very few can actually articulate what this means. Nonetheless it’s something all organizations are demanding. But you must be wondering why and how do you develop business intelligence? Enter data scientists! The concept of data science was developed to work with large sets of structured and unstructured data. So what does this mean? Let me explain. Data science was introduced to explore and give meaning to random sets of data floating around (we are talking about huge quantities here, that is, terabytes and petabytes), which are then used to analyze and help identify areas of poor performance, areas of improvement, and areas to capitalize on. The concept was introduced for large data-driven organisations that required consultants and specialists to deal with complex sets of data. However, data science has been adopted very quickly by organizations of all shapes and sizes, so naturally an element of flexibility would be required to fit data scientists in the modern work flow. There seems to be a shortage for data scientists and an increase in the amount of data out there. The modern data scientist is one who would be able to apply analytical skills necessary to any organization with or without large sets of data available. They are required to carry out data mining tasks to discover relevant meaningful data. Yet, smaller organizations wouldn’t have enough capital to invest in paying for a person who is experienced enough to derive such results. Nonetheless, because of the need for information, they might instead turn to a general data analyst and help them move towards data science and provide them with tools/processes/frameworks that allow for the rapid prototyping of models instead. The natural flow of work would suggest data analysis comes after data mining, and in my opinion analysis is at the heart of the data science. Learning languages like R and Python are fundamental to a good data scientist’s tool kit. However would a data scientist with a background in mathematics and statistics and little to no knowledge of R and Python still be as efficient? Now, the way I see it, data science is composed of four key topic areas crucial to achieving business intelligence, which are data mining, data analysis, data visualization, and machine learning. Data analysis can be carried out in many forms; it’s essentially looking at data and understanding it to make a factual conclusion from it (in simple terms). A data scientist may choose to use Microsoft Excel and VBA to analyze their data, but it wouldn’t be as accurate, clean, or as in depth as using Python or R, but it sure would be useful as a quick win with smaller sets of data. The approach here is that starting with something like Excel doesn’t mean it’s not counted as data science, it’s just a different form of it, and more importantly it actually gives a good foundation to progress on to using things like MySQL, R, Julia, and Python, as with time, business needs would grow and so would expectations of the level of analysis. In my opinion, a good data scientist is not one who knows more than one or two languages or tools, but one who is well-versed in the majority of them and knows which language and skill set are best suited to the task in hand. Data visualization is hugely important, as numbers themselves tell a story, but when it comes to representing the data to customers or investors, they're going to want to view all the different aspects of that data as quickly and easily as possible. Graphically representing complex data is one of the most desirable methods, but the way the data is represented varies dependent on the tool used, for example R’s GGplot2 or Python’s Matplotlib. Whether you’re working for a small organization or a huge data-driven company, data visualization is crucial. The world of artificial intelligence introduced the concept of machine learning, which has exploded on the scene and to an extent is now fundamental to large organizations. The opportunity for organizations to move forward by understanding a consumer’s behaviour and equally matching their expectations has never been so valuable. Data scientists are required to learn complex algorithms and core concepts such as classifications, recommenders, neural networks, and supervised and unsupervised learning techniques. This is just touching the edges of this exciting field, which goes into much more depth especially with emerging concepts such as deep learning.   To conclude, we covered the basic fundamentals of data science and what it means to be data scientists. For all you R and Python developers (not forgetting any mathematical wizards out there), data science has been described as the ‘Sexiest job of 21st century’  as well as being handsomely rewarding too. The rise in jobs for data scientists has without question exploded and will continue to do so; according to global management firm McKinsey & Company, there will be a shortage of 140,000 to 190,000 data scientists due to the continued rise of ‘big data’.
Read more
  • 0
  • 0
  • 1544

article-image-aspiring-data-analyst-meet-your-new-best-friend-excel
Akram Hussain
30 Jun 2014
4 min read
Save for later

Aspiring Data Analyst, Meet Your New Best Friend: Excel

Akram Hussain
30 Jun 2014
4 min read
In general, people want to associate themselves with cool job titles and one that indirectly says both that you’re clever and you get paid well, so what’s better than telling someone you’re a data analyst? Personally, as a graduate in Economics I always thought my natural career progression would be to go into a role of an analyst working for a banking organization, a private hedge fund, or an investment firm. I’m guessing at some point all people with a background in maths or some form of statistics have envisaged becoming a hotshot investment banker, right? However, the story was very different for me; I somehow was fortunate enough to fall into the tech world and develop a real interest in programming. What I found really interesting was that programming languages and data sets go hand in hand surprisingly well, which uncovered a relatively new field to me known as data science. Here’s how the story goes – I combined my academic skills with programming, which opened up a world of opportunity, allowing me to appreciate and explore data analysis on a whole new level. Nowadays, I’m using languages like Python and R to mix background knowledge of statistical data with my new-found passion. Yet that’s not how it started. It started with Excel. Now if you want to eventually move into the field of data science, you have to become competent in data analysis. I personally recommend Excel as a starting point. There are many reasons for this, one being that you don’t have to be technical wizard to get started and more importantly, Excel’s functionalities for data analysis are more powerful than you would expect and a lot quicker and efficient in resolving queries and allowing you to visualize them too. Excel has an inbuilt Data tab to get you started: The screenshot shows the basic analytical features to get you started within Excel. It’s separate to any functions and sum calculations that could be used. However, one useful and really handy plugin called Data Analysis is missing from that list. If you click on: File | Options | Add-ins and then choose Analysis tool and Analysis tool pack - VBA from the list and select Go, you will be prompt with the following image: Once you select the add-ins (as shown above) you will now find an awesome new tag in your data tab called Data Analysis: This allows you to run different methods of analysis on your data, anything from histograms, regressions, correlations, to t-tests. Personally I found this to save me tons of time. Excel also offers features such as Pivot-tables and functions like V-look ups, both extremely useful for data analysis, especially when you require multiple tables of information for large sets of data. A V-look up function is very useful when trying to identify products in a database that have the same set of IDs but are difficult to find. A more useful feature for analysis I found was using pivot tables. One of the best things about a pivot table is that it saves so much time and effort when you have a large set of data that you need to categorize and analyze quickly from a database. Additionally, there’s a visual option named a pivot chart, which allows you to visualize all your data in the pivot table. There are many useful tutorials and training available online on pivot tables for free. Overall, Excel provides a solid foundation for most analysts starting out. A general search on the job market for “Excel data” returns a search result of over 120,000 jobs all specific to an analyst role. To conclude, I wouldn’t underestimate Excel for learning the basics and getting valuable experience with large sets of data. From there, you can progress to learning a language like Python or R (and then head towards the exciting and supercool field of data science). With R’s steep learning curve, Python is often recommended as the best place to start, especially for people with little or no background in programming. But don’t dismiss Excel as a powerful first step, as it can easily become your best friend when entering the world of data analysis.
Read more
  • 0
  • 0
  • 1631
Banner background image

article-image-war-data-science-python-versus-r
Akram Hussain
30 Jun 2014
7 min read
Save for later

The War on Data Science: Python versus R

Akram Hussain
30 Jun 2014
7 min read
Data science The relatively new field of data science has taken the world of big data by storm. Data science gives valuable meaning to large sets of complex and unstructured data. The focus is around concepts like data analysis and visualization. However, in the field of artificial intelligence, a valuable concept known as Machine Learning has now been adopted by organizations and is becoming a core area for many data scientists to explore and implement. In order to fully appreciate and carry out these tasks, data scientists are required to use powerful languages. R and Python currently dominate this field, but which is better and why? The power of R R offers a broad, flexible approach to data science. As a programming language, R focuses on allowing users to write algorithms and computational statistics for data analysis. R can be very rewarding to those who are comfortable using it. One of the greatest benefits R brings is its ability to integrate with other languages like C++, Java, C, and tools such as SPSS, Stata, Matlab, and so on. The rise to prominence as the most powerful language for data science was supported by R’s strong community and over 5600 packages available. However, R is very different to other languages; it’s not as easily applicable to general programming (not to say it can’t be done). R’s strength and its ability to communicate with every data analysis platform also limit its ability outside this category. Game dev, Web dev, and so on are all achievable, but there’s just no benefit of using R in these domains. As a language, R is difficult to adopt with a steep learning curve, even for those who have experience in using statistical tools like SPSS and SAS. The violent Python Python is a high level, multi-paradigm programming language. Python has emerged as one of the more promising languages of recent times thanks to its easy syntax and operability with a wide variety of different eco-systems. More interestingly, Python has caught the attention of data scientists over the years, and thanks to its object-oriented features and very powerful libraries, Python has become the go-to language for data science, many arguing it’s taken over R. However, like R, Python has its flaws too. One of the drawbacks in using Python is its speed. Python is a slow language and one of the fundamentals of data science is speed! As mentioned, Python is very good as a programming language, but it’s a bit like a jack of all trades and master of none. Unlike R, it doesn’t purely focus on data analysis but has impressive libraries to carry out such tasks. The great battle begins While comparing the two languages, we will go over four fundamental areas of data science and discuss which is better. The topics we will explore are data mining, data analysis, data visualization, and machine learning. Data mining: As mentioned, one of the key components to data science is data mining. R seems to win this battle; in the 2013 Data Miners Survey, 70% of data miners (from the 1200 who participated in the survey) use R for data mining. However, it could be argued that you wouldn’t really use Python to “mine” data but rather use the language and its libraries for data analysis and development of data models. Data analysis: R and Python boast impressive packages and libraries. Python, NumPy, Pandas, and SciPy’s libraries are very powerful for data analysis and scientific computing. R, on the other hand, is different in that it doesn’t offer just a few packages; the whole language is formed around analysis and computational statistics. An argument could be made for Python being faster than R for analysis, and it is cleaner to code sets of data. However, I noticed that Python excels at the programming side of analysis, whereas for statistical and mathematical programming R is a lot stronger thanks to its array-orientated syntax. The winner of this is debatable; for mathematical analysis, R wins. But for general analysis and programming clean statistical codes more related to machine learning, I would say Python wins. Data visualization: the “cool” part of data science. The phrase “A picture paints a thousand words” has never been truer than in this field. R boasts its GGplot2 package which allows you to write impressively concise code that produces stunning visualizations. However. Python has Matplotlib, a 2D plotting library that is equally as impressive, where you can create anything from bar charts and pie charts, to error charts and scatter plots. The overall concession of the two is that R’s GGplot2 offers a more professional feel and look to data models. Another one for R. Machine learning: it knows the things you like before you do. Machine learning is one of the hottest things to hit the world of data science. Companies such as Netflix, Amazon, and Facebook have all adopted this concept. Machine learning is about using complex algorithms and data patterns to predict user likes and dislikes. It is possible to generate recommendations based on a user’s behaviour. Python has a very impressive library, Scikit-learn, to support machine learning. It covers everything from clustering and classification to building your very own recommendation systems. However, R has a whole eco system of packages specifically created to carry out machine learning tasks. Which is better for machine learning? I would say Python’s strong libraries and OOP syntax might have the edge here. One to rule them all From the surface of both languages, they seem equally matched on the majority of data science tasks. Where they really differentiate is dependent on an individual’s needs and what they want to achieve. There is nothing stopping data scientists using both languages. One of the benefits of using R is that it is compatible with other languages and tools as R’s rich packagescan be used within a Python program using RPy (R from Python). An example of such a situation would include using the Ipython environment to carry out data analysis tasks with NumPy and SciPy, yet to visually represent the data we could decide to use the R GGplot2 package: the best of both worlds. An interesting theory that has been floating around for some time is to integrate R into Python as a data science library; the benefits of such an approach would mean data scientists have one awesome place that would provide R’s strong data analysis and statistical packages with all of Python’s OOP benefits, but whether this will happen remains to be seen. The dark horse We have explored both Python and R and discussed their individual strengths and flaws in data science. As mentioned earlier, they are the two most popular and dominant languages available in this field. However a new emerging language called Julia might challenge both in the future. Julia is a high performance language. The language is essentially trying to solve the problem of speed for large scale scientific computation. Julia is expressive and dynamic, it’s fast as C, it can be used for general programming (its focus is on scientific computing) and the language is easy and clean to use. Sounds too good to be true, right?
Read more
  • 0
  • 0
  • 3512
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime