Data | 37 articles | Tech News, Tutorials & Expert Insights

25 Oct 2018

7 min read

“Data is the new oil but it has to be refined through a complex processing network” - Tirthajyoti Sarkar and Shubhadeep Roychowdhury [Interview]

Data is the new oil, and it is just as crude as unrefined oil. To do anything meaningful with it (modeling, visualization, machine learning for predictive analysis), you first need to wrestle and wrangle with the data. We recently interviewed Dr. Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of the course Data Wrangling with Python. They talked about their new course and discussed why you should do data wrangling and why you should use Python to do it.

Key Takeaways

- Python boasts a large, broad library with a rich set of modules and functions, which you can use to manipulate complex data structures.
- NumPy, the Python library for fast numeric array computations, and Pandas, a package with fast, flexible, and expressive data structures, are helpful in working with “relational” or “labeled” data.
- Web scraping or data extraction becomes easy and intuitive with Python libraries such as BeautifulSoup4 and html5lib.
- Regex, the tiny, highly specialized programming language inside Python, can create patterns that help match, locate, and manage text for large data analysis and searching operations.
- Present interesting, interactive visuals of your data with Matplotlib, the most popular graphing and data visualization module for Python.
- Easily and quickly separate information from a huge amount of random data using Pandas, the preferred Python tool for data wrangling and modeling.

Full Interview

Congratulations on your new course ‘Data Wrangling with Python’. What is this course all about?

Data science is the ‘sexiest job of the 21st century’ (at least until Skynet takes over the world). But for all the emphasis on ‘Data’, it is the ‘Science’ that makes you, the practitioner, valuable. To practice high-quality science with data, you first need to make sure it is properly sourced, cleaned, formatted, and pre-processed. This course teaches you the most essential basics of this invaluable component of the data science pipeline: data wrangling.
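To make the regex takeaway above a little more concrete, here is a minimal sketch using only Python's standard `re` module. The sample records, the `ORDER_ID` and `EMAIL` patterns, and the `extract` helper are all hypothetical and chosen for illustration; they are not taken from the course.

```python
import re

# Hypothetical free-text records; the layout is invented for illustration.
records = [
    "Order #1042 shipped to alice@example.com on 2018-10-25",
    "order #77 pending for BOB@Example.org",
    "No order information available",
]

# Assumed patterns: an order id after '#', and a simple (not RFC-complete) email.
ORDER_ID = re.compile(r"#(\d+)")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract(record):
    """Pull an order id and a lower-cased email out of one line of text."""
    order = ORDER_ID.search(record)
    email = EMAIL.search(record)
    return {
        "order_id": int(order.group(1)) if order else None,
        "email": email.group(0).lower() if email else None,
    }


rows = [extract(r) for r in records]
for row in rows:
    print(row)
```

Records that match neither pattern come back with `None` values rather than raising, which keeps the decision about how to treat missing fields for a later cleaning step.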
What is data wrangling and why should you learn it well?

“Data is the new oil” and it is ruling the modern way of life through incredibly smart tools and transformative technologies. But oil from the rig is far from being usable. It has to be refined through a complex processing network. Similarly, data needs to be curated, massaged, and refined to become fit for use in intelligent algorithms and consumer products. This is called “wrangling” and (according to CrowdFlower) good data scientists spend 60-80% of their time on it, on every project. It generally involves the following:

- Scraping the raw data from multiple sources (including the web and database tables)
- Imputing, formatting, and transforming the data, basically making it ready for use in the modeling process (e.g. advanced machine learning)
- Handling missing data gracefully
- Detecting outliers
- Performing quick visualizations (plotting) and basic statistical analysis to judge the quality of your formatted data

This course aims to teach you the core ideas behind this process and to equip you with knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, the most widely used language for data science. We work through real-life examples, and by the end of this course you will be confident handling a myriad of sources to extract, clean, transform, and format your data for further analysis or exciting machine learning model building.

Walk us through your thinking behind how you went about designing this course. What’s the flow like? How do you teach data wrangling in this course?

The lessons start with a refresher on Python, focusing mainly on advanced data structures, and then quickly jump into the NumPy and Pandas libraries as fundamental tools for data wrangling.
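As a rough illustration of the wrangling steps listed above (formatting, handling missing data, detecting outliers, quick statistics), here is a minimal pandas sketch. The sample data, the column names `city` and `temp_c`, the median fill, and the 1.5 × IQR outlier rule are assumptions made for this example; they are not taken from the course.

```python
import pandas as pd

# Hypothetical raw data; column names and values are invented for illustration.
raw = pd.DataFrame({
    "city": [" Paris", "paris ", "Pune", None, "Pune"],
    "temp_c": [21.5, None, 33.0, 29.0, 250.0],
})

# Missing data: drop rows that have no city label at all.
clean = raw.dropna(subset=["city"]).copy()

# Formatting: normalise whitespace and capitalisation of the text labels.
clean["city"] = clean["city"].str.strip().str.title()

# Missing data: fill missing readings with the column median (one simple choice).
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median())

# Outliers: flag values lying more than 1.5 * IQR outside the quartiles.
q1, q3 = clean["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["is_outlier"] = ~clean["temp_c"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Quick statistical summary to judge the quality of the cleaned column.
print(clean)
print(clean["temp_c"].describe())
```

Flagging outliers in a separate column, rather than deleting them, leaves the decision about what to do with suspicious readings to whoever consumes the data downstream.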
The course emphasizes why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of specialized pre-built routines in Python. Thereafter, it covers how, using the same Python backend, one can extract and transform data from a diverse array of sources: the internet, large database vaults, or Excel financial tables. Further lessons teach how to handle missing or wrong data, and how to reformat it based on the requirements of a downstream analytics tool. The course emphasizes learning by real example and showcases the power of an inquisitive and imaginative mind primed for success.

What other tools are out there? Why do data wrangling with Python?

First, let us be clear that there is no substitute for the data wrangling process itself. There is no shortcut either. Data wrangling must be performed before the modeling task, but there is always the debate over whether to do it with an enterprise tool or directly with a programming language and associated frameworks. There are many commercial, enterprise-level tools for data formatting and pre-processing that do not involve coding on the part of the user. Common examples of such tools are:

- General-purpose data analysis platforms such as Microsoft Excel (with add-ins)
- Statistical discovery packages such as JMP (from SAS)
- Modeling platforms such as RapidMiner
- Analytics platforms from niche players focusing on data wrangling, such as Trifacta, Paxata, and Alteryx

At the end of the day, it really depends on the organizational approach whether to use any of these off-the-shelf tools or to have more flexibility, control, and power by using a programming language like Python to perform data wrangling. As the volume, velocity, and variety (the three V’s of Big Data) of data undergo rapid changes, it is always a good idea to develop and nurture a significant amount of in-house expertise in data wrangling.
This is done using fundamental programming frameworks so that an organization is not beholden to the whims and fancies of any particular enterprise platform for a task as basic as data wrangling. Some of the obvious advantages of using an open-source, free programming paradigm like Python for data wrangling are:

- A general-purpose, open-source paradigm that puts no restriction on the methods you can develop for the specific problem at hand
- A great ecosystem of fast, optimized, open-source libraries focused on data analytics
- Growing support for connecting Python to every conceivable type of data source
- An easy interface to basic statistical testing and quick visualization libraries to check data quality
- A seamless interface from the data wrangling output to advanced machine learning models. Python is the most popular language for machine learning and artificial intelligence these days.

What are some best practices to perform data wrangling with Python?

Here are five best practices that will help you out in your data wrangling journey with Python. In the end, you’ll have clean, ready-to-use data for your business needs.

- Learn the data structures in Python really well
- Learn and practice file and OS handling in Python
- Have a solid understanding of the core data types and capabilities of NumPy and Pandas
- Build a good understanding of basic statistical tests and a panache for visualization
- Apart from Python, if you want to master one language, go for SQL

What are some misconceptions about data wrangling?

Though data wrangling is an important task, there are certain myths associated with it that developers should be cautious of, such as:

- Data wrangling is all about writing SQL queries
- Knowledge of statistics is not required for data wrangling
- You have to be a machine learning expert to do great data wrangling
- Deep knowledge of programming is not required for data wrangling
We hope that dispelling these misconceptions helps you realize that data wrangling is not as difficult as it seems. Have fun wrangling data!

About the authors

Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain, where he applies cutting-edge data science and machine learning techniques for design automation and predictive analytics.

Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris-based cybersecurity startup. He holds a Master’s degree in Computer Science from West Bengal University of Technology and certifications in machine learning from Stanford.

Packt Editorial Staff

09 Oct 2018

5 min read

“Git, like all other version control tools, exists to solve for one problem: change” - Joseph Muli and Alex Magana [Interview]

An unreliable versioning tool makes product development a herculean task. Creating and enforcing checks and controls for the introduction, scrutiny, approval, merging, and reversal of changes in your source code are effective ways to ensure a secure development environment. Git and GitHub offer constructs that enable teams to conduct version control and collaborative development effectively. When properly utilized, Git and GitHub promote agility and collaboration across a team and, in doing so, enable teams to focus and deliver on their mandates and goals. We recently interviewed Joseph Muli and Alex Magana, the authors of the Introduction to Git and GitHub course. They discussed the various benefits of Git and GitHub while sharing some best practices and myths.

Author Bio

Joseph Muli loves programming, writing, teaching, gaming, and travelling. Currently, he works as a software engineer at Andela and Fathom, and specializes in DevOps and Site Reliability. Previously, he worked as a software engineer and technical mentor at Moringa School. You can follow him on LinkedIn and Twitter.

Alex Magana loves programming, music, adventure, writing, reading, and architecture, and is a gastronome at heart. Currently, he works as a software engineer with BBC News and Andela. Previously, he worked as a software engineer with SuperFluid Labs and Insync Solutions. You can follow him on LinkedIn or GitHub.

Key Takeaways

- Securing your source code with version control is effective only when you do it the right way. Understanding the best practices used in version control can make it easier for you to get the most out of Git and GitHub.
- GitHub has an elaborate UI. Learning how to navigate the GitHub UI and installing Octotree will help your development process immensely.
- GitHub is a powerful tool equipped with useful features. Exploring the Feature Branch Workflow and other features, such as forking, submodules, and rebasing, will enable you to make optimum use of them.
- The more elaborate the tools, the more time they can consume if you don’t know your way around them. Master the commands for debugging and maintaining a repository to speed up your software development process.
- Keep your code updated with the latest changes using CircleCI or Travis CI, continuous integration tools that integrate with GitHub.
- The struggle isn’t over until the code is successfully released to production. With GitHub’s release management features, you can learn to complete hiccup-free software releases.

Full Interview

Why is Git important? What problem is it solving?

Git, like all other version control tools, exists to solve for one problem: change. This has been a recurring issue, especially when coordinating work on teams, both local and distributed; supporting distributed teams is a particular advantage of Git through hubs such as GitHub, Bitbucket, and GitLab. The tool was created by Linus Torvalds in 2005 to aid development of and contribution to the Linux kernel. However, this doesn’t limit Git to code: any product or project that has multiple contributors, or that requires release management and versioning, stands to have an improved workflow through Git. This also puts into perspective that there is no standard; it’s advisable to use what best suits your product(s).

What other similar solutions or tools are out there? Why is Git better?

As mentioned earlier, other tools do exist to aid in version control. There are a lot of factors to consider when choosing a version control system for your organization, depending on product needs and workflows. Some organizations have in-house versioning tools because these suit their development.
Some organizations, for reasons such as privacy and security or support, may look for an integration with third-party and in-house tools. Git primarily exists to provide a faster, distributed version control system that is not tied to a central repository, hub, or project. It is highly scalable and portable. Other version control tools include Apache Subversion, Mercurial, and Concurrent Versions System (CVS).

How can Git help developers? Can you list some specific examples (real or imagined) of how it can solve a problem?

A simple way to define Git’s indispensability is that it enables fast, persistent, and accessible storage. This implies that changes to code throughout a product’s life cycle can be viewed and updated on demand, each with simple and compact commands. Developers can track changes from multiple contributors, use blame to trace introduced bugs, and revert where necessary. Git enables multiple workflows that align with practices such as Agile, e.g. feature branch workflows, and others including forking workflows for distributed contribution, e.g. to open source projects.

What are some best tips for using Git and GitHub?

These are some of the best practices you should keep in mind while learning or using Git and GitHub.

- Document everything
- Utilize the README.md and wikis
- Keep naming conventions simple and concise
- Adopt naming prefixes
- Match each PR and branch to a ticket or task
- Organize and track tasks using issues
- Use atomic commits

Editor’s note: To explore these tips further, read the authors’ post ‘7 tips for using Git and GitHub the right way’.

What are the myths surrounding Git and GitHub?

Just as every solution or tool has its own positives and negatives, Git is also surrounded by myths one should be aware of. Some of these are:

- Git is GitHub
- Backups are equivalent to version control
- Git is only suitable for teams
- To effectively use Git, you need to learn every command

Editor’s note: To explore these myths further, read the authors’ post ‘4 myths about Git and GitHub you should know about’.
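Two of the tips above, adopting naming prefixes and tying each branch to a ticket, lend themselves to an automated check. The sketch below is one possible way to do that in Python. The convention it enforces (`<prefix>/<TICKET-ID>-<short-description>`), the allowed prefixes, and the `parse_branch` helper are hypothetical assumptions for illustration; the interview does not prescribe a specific format.

```python
import re

# Assumed convention (hypothetical): <prefix>/<TICKET-ID>-<short-description>
BRANCH_PATTERN = re.compile(
    r"^(?P<prefix>feature|bugfix|hotfix|chore)/"
    r"(?P<ticket>[A-Z]+-\d+)-"
    r"(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_branch(name):
    """Return the parts of a branch name, or None if it breaks the convention."""
    match = BRANCH_PATTERN.match(name)
    return match.groupdict() if match else None


for branch in ["feature/SHOP-42-add-cart", "fix_stuff", "hotfix/OPS-7-rotate-keys"]:
    print(branch, "->", parse_branch(branch))
```

A check like this could be run from a client-side hook or a CI job, so that branches that cannot be traced back to a ticket are caught before a pull request is opened.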

Savia Lobo

04 Oct 2018

11 min read

Discussing SAP: Past, present and future with Rehan Zaidi, senior SAP ABAP consultant [Interview]

SAP, the market-leading enterprise software company, recently became the first European technology company to create an AI ethics advisory panel, and announced seven guiding principles for AI development. These guidelines revolve around recognizing AI’s significant impact on people and society. Also, last week, at the Microsoft Ignite conference, SAP, in collaboration with Microsoft and Adobe, announced the Open Data Initiative. This initiative aims to help companies better govern their data and support privacy and security initiatives. For SAP, this initiative will bring further advancements to its SAP C/4HANA and S/4HANA platforms. All of these actions emphasize SAP’s focus on transforming itself into a responsible data use company.

We recently interviewed Rehan Zaidi, a senior SAP ABAP consultant. Rehan became one of the youngest authors on SAP worldwide when he was published in the prestigious SAP Professional Journal in 2001. He has written a number of books, and over 20 articles and professional papers for the SAP Professional Journal USA and HR Expert USA, part of the prestigious sapexperts.com library. Following are some of his views on the SAP community and products, and on how the SAP suite can benefit people including budding professionals, developers, and business professionals.

Key takeaways

- SAP HANA was introduced to run jobs 200 times faster while maintaining efficiency.
- The introduction of SAP Leonardo brought in the next wave of AI, machine learning, and blockchain services via the SAP Cloud Platform and other standalone projects.
- Experienced ABAP developers should look to get certified in one of the newest technologies, such as HANA and Fiori.
- SAP ERP Central Component (SAP ECC) is the on-premises version of SAP, and it is usually implemented in medium and large-sized companies. For smaller companies, SAP offers its Business One ERP platform.
- SAP Fiori is a line of SAP apps meant to address criticisms of SAP's user experience and UI complexity.

Q.1. SAP is one of the most widely used ERP tools. How has it evolved over the past few years from the traditional on-premise model to keep up with the cloud-based trends?

Yes. Let me cover the main points. SAP was founded in 1972, and its first product, SAP R/1, was launched in 1973. In 1979, SAP launched the R/2 design. It had most of the typical processes such as accounting, manufacturing processes, supply chain logistics, and human resources. Then came R/3, which brought the more efficient three-tier architecture (application server, database, and presentation (GUI) layers), with more new modules and functionalities added. It was a smart system, fully configurable by functional consultants. This was further enhanced with NetWeaver, which brought integration with the internet and SOA capability. SAP introduced the ECC 5 and subsequently the ECC 6 release. Mobility was added later, letting mobile applications running on devices access the business processes in SAP and execute them. Both displaying and updating SAP data became possible. The HANA system was then introduced. It is very fast and efficient, and allows you to run jobs 200 times faster than before. Cloud systems then became available that let customers connect their on-premise systems to SAP Cloud Platform and get access to services such as Mobile Service for app protection and Mobile Service for SAP Fiori, among others. Finally, SAP Leonardo was introduced as a way of bringing in next-gen AI, machine learning, and blockchain services via standalone projects and the SAP Cloud Platform.

Q.2. Being a Senior ABAP Programming Analyst, what does your typical day look like?

Ahh. Well, a typical day! No two days are the same for us. Each morning we find ourselves confronting a problem whose solution is to be devised. A different problem every day, followed by a unique solution.
We spend hours and hours finding issues in custom-developed programs. We learn about making custom programs run faster. We get requirements from a wide variety of users. They may be in Human Resources, Materials Management, Sales and Distribution, Finance, and so on. A requirement may pertain to an entirely new report or to a dialog program with a set of screens. We even build Fiori applications (using a JavaScript-based library) that may be accessed from a PC or a mobile device. I even get requests to teach junior or trainee SAP developers a wide variety of technologies.

Q.3. Can you tell us about the learning curve for SAP? There are different job profiles related to SAP which range from executives to consultants and managers. How does each of them learn or update themselves on SAP?

Yes, this is a very important question. A simple answer is that “there is no end to learning and at any stage, learning is never enough,” no matter which field within SAP you belong to. Things are constantly changing. The more you read and the more you work, the more you feel that there is a lot to be done. You need to constantly update yourself and learn about new technologies. There is plenty of material available on the internet. I usually refer to the official SAP website for newer courses. They even tell you which background (managers, developers) each course is relevant to. I also go to open.sap.com for new courses.

Whether they are consultants (functional and technical) or managers, all of them need to keep themselves up to date. They must take new courses and learn about innovation in their technology. For example, people in HR must now study and try to learn about SuccessFactors. Even integration of SAP HANA with other software might be an interesting topic today. There are Fiori- and HANA-related courses for Basis consultants, and corresponding tracks for developers.
Some know-how of newer technologies is also important for managers and executives, since your decisions may need to be adapted based on the underlying technologies running in your systems. You should know the pros and cons of all technologies in order to make the correct move for your business.

Q.4. Many believe an SAP certification improves their chances of getting jobs at competitive salaries. How important are certifications? Which SAP certifications should a budding developer look to obtain?

When I did my certification in October 2000, I used to think that certifications were not important. But now I have realized that, yes, they make a difference. Certifications are definitely a plus point. They enhance your CV and give you an edge over those who are not certified. I have found job adverts that specifically mention that certification will be required or will be advantageous. However, they are only useful when you have at least 4 years of experience. For a fresh graduate, a certification might not be very useful. A useful SAP consultant/developer combines a solid foundation of knowledge with a touch of experience. I advise all my juniors to go for certifications in order to strengthen their concepts. These include:

- C_C4C30_1711 - SAP Certified Development Associate - SAP Hybris Cloud for Customer
- C_CP_11 - SAP Certified Development Associate - SAP Cloud Platform
- C_FIORDEV_20 - SAP Certified Development Associate - SAP Fiori Application Developer
- C_HANADEV_13 - SAP Certified Development Associate - SAP HANA
- C_SMPNHB_30 - SAP Certified Development Associate - SAP Mobile Platform Application Development (SMP 3.0)
- C_TAW12_750 - SAP Certified Development Associate - ABAP with SAP NetWeaver 7.50
- E_HANAAW_12 - SAP Certified Development Specialist - ABAP for SAP HANA

For experienced ABAP developers, I suggest getting certified in the newest technologies, such as HANA and Fiori.
They may help you get a project quicker and/or at a better rate than others.

Q.5. The present buzz is around AI, machine learning, IoT, Big Data, and many other emerging technologies. SAP Leonardo works on making it easy to create frameworks for harnessing the latest tech. What are your thoughts on SAP Leonardo?

Leonardo is SAP’s response to an AI platform. It should be an important part of SAP’s offerings, mostly built on the SAP Cloud Platform. SAP has relaunched Leonardo as a digital innovation system. As I understand it, Leonardo allows customers to take advantage of artificial intelligence (AI), machine learning, advanced analytics, and blockchain on their company’s data. SAP gives customers an efficient way of using these technologies to solve business issues. It allows you to build a system which, in conjunction with machine learning, searches for results that can be combined with SAP transactions. The benefit of SAP Leonardo is that all the company’s data is available right in the SAP system. Using Leonardo, you have access to all human resources data and any other module data residing in the ERP system.

Any company from any industry can make use of Leonardo; it works equally well for retailers, food and beverage companies, and medical industries, as well as for organizations in manufacturing and automotive. An approach that works for one company in a given industry can be applied to other companies in that industry. Suppose a company operates sensors. They can link the sensor data with the data in their SAP systems, and even link that with other data, and they can then use the Leonardo capabilities to solve problems or optimize performance. When a problem for one company in an industry is solved, a similar solution may be applied to the entire industry. Yes, in my opinion, Leonardo has a bright future and should be successful.
For more information about Leonardo success stories, I encourage readers to check out SAP Leonardo Internet of Things Portfolio & Success Stories.

Q.6. You are currently writing a book on ABAP Objects and Design Patterns, expected to be published by the end of 2018. What was your motivation behind writing it? Can you tell us more about ABAP Objects? What should readers expect from this book?

ABAP and ABAP Objects have undergone tremendous changes over time, both in features (and capability) and in syntax. ABAP Objects is the most unsung topic of today. It has been around for quite long, but most developers are not aware of it or are not comfortable enough to use it in their day-to-day work. ABAP is a vast community with developers working in a variety of functional areas. The concepts covered in the book will be generic, allowing learners to apply them to their particular area.

This book will cover ABAP Objects (the object-oriented extension of the SAP language ABAP) in the latest release of SAP NetWeaver 7.5 and explain the newest advancements. It will start with the programming of objects in general and the basics of the ABAP language that the developer needs to know to get started. The book will cover the most important topics needed on everyday support jobs and for succeeding in projects. The book will be goal-directed, not a collection of theoretical topics. It won’t just touch on the surface of ABAP Objects, but will go in depth from building the basic foundation (e.g., classes and objects created locally and globally) to the intermediate areas (e.g., ALV programming, method chaining, polymorphism, simple and nested interfaces), and then finally into the advanced topics (e.g., shared memory, persistent objects). The best practices for making better programs via ABAP Objects will be shown at the end. No long stories, no boring theory, only pure technical concepts followed by simple examples using code pertaining to football players.
Everything will be presented in a clear, interesting manner, and readers will learn tips and tricks they can apply immediately. Learners, students, new SAP programmers, and SAP developers with some experience can use this as an alternative to expensive training books. The book will also save readers time searching the internet for help writing new programs.

Knowing ABAP Objects is key for ABAP developers these days to move forward. Simple ALV reporting requirements, defining and catching exceptional situations that may occur in a program, and even the enhancement technology of BAdIs, which lets you enhance standard SAP applications, all require a sound understanding of ABAP Objects. In addition, Web Dynpro application development, the Business Object Processing Framework, and even OData service creation to expose data that can be used by Fiori apps all demand solid knowledge of ABAP Objects.

Sunith Shetty

13 Sep 2018

9 min read

“Deep meta reinforcement learning will be the future of AI where we will be so close to achieving artificial general intelligence (AGI)”, Sudharsan Ravichandiran

Neil Aitken

18 Aug 2018

11 min read

What Should We Watch Tonight? Ask a Robot, says Matt Jones from OVO Mobile [Interview]

Netflix, the global poster child for streamed TV and for the use of Big Data to inform the programs it develops, has shown steady customer growth for several years now. Recently, the company revealed that it would be shutting down the user reviews which have been so prominent in its media catalogue interface for so long. In the background, media and telco are merging. AT&T, the telco which recently undertook the biggest deal in history, acquired Time Warner and wants HBO to become like Netflix. Telia, a Swedish telecommunications company, bought Bonnier Broadcasting in late July 2018.

The video content landscape has changed a great deal in the last decade. Everyone in the entertainment game wants to move beyond broadcast TV and to use data to develop content their users will love and which will give their customer base more variety. This means they can look to data to charge higher subscription rates per user, experiment with tiered subscriptions, decide to localize global content, globalize local content, and more. These changes raise two key questions. First, are we heading for a world in which AI- and ML-based algorithms drive what we watch on TV? And second, are human recommendations being quietly replaced by machine recommendations over which the user has no control?

[Figure: As you know, Netflix is acquiring customers fast. Source: Statista]

To get an insider’s view on the answers to those questions, I sat down with Matt Jones of OVO Mobile, one of Australia’s fastest growing telecommunications companies. OVO offers its customers a unique point of difference: streaming video sports content, included in a phone plan. OVO has bought the rights to a number of niche sports in Australia which weren’t previously available and now offers free OTA (Over the Air) digital content for fans of ‘unusual’ sports like drag racing or gymnastics.
OTA content is anything delivered to a user’s phone over a wireless network. In OVO’s case, the data used to transport the video content they provide to their users is free. That means customers don’t have to worry about paying more for mobile data in order to watch it, a key concern for users. OVO Mobile and Netflix are in very similar businesses, and Matt has a unique point of view about how Artificial Intelligence and Machine Learning will impact the world of telco and media.

Key takeaways

- What’s changed our media consumption habits: the ubiquitous mobile internet, the always-on and connected younger generation, better mobile hardware, improved network performance and capabilities, and the need for control over content choices.
- Digitization allows new features, some of which people have proven to love: binge watching, screening out advert breaks, and time shifting.
- The key to understanding the value of ML and AI is not in understanding the statistical or technical models that are used to enable it; it’s in the way AI is used to improve the experience your digital customers are having with you.
- The use of AI in digital and app experiences has changed to personalize what users see in ways old media could not offer.
- Content producers use the information they have on us (the programs we watch, when we watch them, and for how long) to personalise content at volume and manage notifications.
- The contribution of AI / ML towards the delivery of online media is endless in terms of personalisation, context awareness, notification management, etc.

Social acceptance of media delivered to users on mobile phones is what’s driving change

A number of overlapping factors are driving changes in how we engage with content. Social acceptance of the internet, and of mobile access to it, as a core part of life is one key enabler. From a technology perspective, things have changed too. Smartphones now have bigger, higher-resolution screens than ever before, and they’re with us all the time.
Jones believes this change is part of a cultural evolution in how we relate to technology. He says, “There has also been a generational shift which has taken place. Younger people are used to the small screen being the primary device. They’re all about control, seeking out their interests and consuming these, as opposed to previous generations which were used to mass content distribution from traditional channels like TV.” Other factors include network performance and capability, which have improved dramatically in recent years. Data speeds have grown exponentially, from 3G networks (launched less than 15 years ago), which could support stuttered low resolution video, to 4G and 4.5G enabled networks. These can now support live streaming of High Definition TV. Mobile data allowances in plans and offers from some phone companies to provide some content ‘data free’ (as OVO does with theirs) have also driven uptake. Finally, people want convenience, and digital offers that in a way they have never experienced before. Digitization allows new features, some of which people have proven to love: binge watching, screening out advert breaks and time shifting. What part can AI / machine learning play in the delivery of media online? Artificial Intelligence (AI) is already part of 85% of our online interactions. Gartner suggests it will be part of every product in the future. The key to understanding the value of ML and AI is not in understanding the statistical or technical models that are used to enable it; it’s the way AI is used to improve the experience your digital customers are having with you. When you find a new band in Spotify, when YouTube recommends a funny video you’ll like, when Amazon shows you other products that you might like to consider alongside the one you just put into your basket, that’s AI working to improve your experience. “Over The Top content is exploding. 
Content owners are going direct to consumer and providing fantastic experiences for their users. What’s changing is the use of AI in digital / app experiences to personalize what users see in ways old media never could,” says Matt. Matt’s video content recommendation app, for example, ‘learns’ not just what you like to watch but also the times you are most likely to watch it. It then prompts users with a short video to entice them to watch. And the analytics available show just how effective it is. Customers who use Matt’s app can be up to 5 times more likely to watch his content than those who don’t use it. “The list of ways that AI / ML contributes to the delivery of media online is endless. Personalisation, context awareness, notification management …. Endless.” By offering users recommendations on content they’ll love, producers can now engage more customers for longer. Content producers use the information they have on us, about the programs we watch, when we watch them and for how long we watch, to: Personalise at volume: Apps used to deliver content can personalise what’s shown first to users, based on a number of variables known about them, including the sort of context awareness that can be relatively easy to find on mobile devices. Ultimately, every AI customer experience improvement (including the examples that follow) is designed to automate the process of providing something special to each individual that they uniquely want. Automation means that can be done at scale, with every customer treated uniquely. Notification management: AI that tracks the success of notifications and acknowledges, critically, when they are not helpful to the user, can be employed to alert users only about things they want to know. These AI solutions provide updates to users based on their preferences and avoid the provision of irrelevant information. 
Content discovery & re-engagement: AI and ML can be used to recommend what users could watch, exposing customers to content they would not otherwise find but are likely to value. Better / more relevant advertising: Advertising which targets a legitimately held, real customer need is actually useful to viewers. Better analytics for AI can assist in targeting micro segments with ads which contain information customers will value. Lattice is a Business Insights tool provider. Their ‘Lattice Engine’ product combines information held in multiple cloud based locations and uses AI to automatically assign customers to a segment which suits them. Those data are then provided to a customer’s eCommerce site and other channel interactions, and used to offer content which will help them convert better. Developing better segments: Raw data on real customers can be gathered from digital content systems to inform Above The Line marketing in the real, non digital world. Big data analytics can now be used with accurate segmentation for local area marketing and to tie together digital and retail customer experiences. McKinsey suggests that 36% of companies are actively pursuing strategies driven from their Big Data reserves. They advise their clients that Big Data can be used to better understand and grow Customer Lifetime Values. In the future - Deep linking for calls-to-action: Some digital content is provided in a form such that viewers can find out more information about an item on screen. Providing a way to deep link from a video screen into a shopping cart prepopulated with something just seen on screen is an exciting possibility for the future. Cutting steps out of the buying process, so that eCommerce users can move from within a content app straight to buying a product they’ve seen on screen, is likely to become a big business. 
Deep linking raises the value of the content shown to the degree it raises the sales of the products included. Bringing it all together Jones believes those that invest big in AI and machine learning, and of them, those who find a way to draw out insights and act upon them, will be the ultimate victors. “The big winners are going to be the people who connect a fan with content they love and use AI and ML to deliver the best possible experience. It’s about using all the information you have about your users and acting on it,” said Jones. That commercial incentive is already driving behavior. AI and ML already provide personalized content recommendations. Progressive content companies, including Matt’s, are already working on building AI into every facet of every digital experience you have. As to whether AI is entirely replacing social media influence, I don’t think that’s the case. The research says people are still 4 times more likely to watch a video if it is recommended to them by a friend. Reviews have always been important to presales on the internet and that applies to TV shows, too. People want to know what real users felt when they used a product. If they can’t get reviews from Netflix, they will simply open a new tab and google for reviews there while they are thinking of how to find something to watch on Netflix. About Matt Jones Matt is an industry disruptor who launched the first of its kind Media and Telco brand, OVO Mobile, in 2015. Matt is the driving force behind the convergence of new media & telco, bringing together Telecommunications with Media Rights and digital broadcast for mass distribution. OVO is a new type of Telco, delivering content that fans are passionate about, streamed live on their mobile or tablet UNLIMITED & data free. 
OVO has secured exclusive 3 year+ digital broadcast and distribution rights for a range of content owners including Supercars, World Superbikes, 400 Thunder Drag Series, Audi Australia Racing & Gymnastics Australia – with a combined Australian audience estimated at over 7 Million. OVO is a multi-award winner, including winning the Money Magazine Best of the Best Award 2017 for high usage, as well as featuring on A Current Affair, Sunrise, The Today Show, Channel 7 News, Channel 9 News and multiple radio shows for their world-first kids’ mobile phone plan with built-in cyber security protection. As OVO CEO, Matt was nominated for Start-Up Executive of the Year at the CEO Magazine Awards 2017 and was awarded runner-up. The Award recognises the achievements of leaders and professionals, and the contributions they have made to their companies across industry-specific categories. Matt holds a Bachelor of Arts (BA) from the University of Tasmania and regularly speaks at Telco, Sports Marketing and Media forums and events. Matt has held executive leadership roles at leading Telecommunications brands including Telstra (Head of Strategy – Operations), Optus, Vodafone, AAPT, Telecom New Zealand as well as global Management Consulting firms including BearingPoint. Matt lives on the northern beaches of Sydney with his wife Mel and daughters Charlotte and Lucy.


Richard Gall

05 Jun 2018

4 min read

Blockchain can solve tech's trust issues - Imran Bashir


The hype around blockchain has now reached fever pitch. Now that the Bitcoin bubble has all but burst, it would seem that the tech world - and beyond - is starting to think more creatively about how blockchain can be applied. We've started to see blockchain being applied in a huge range of areas; that's likely to grow over the next year or so. We certainly weren't surprised to see blockchain rated highly by many developers working in a variety of fields in this year's Skill Up survey. Around 70% of all respondents believe that blockchain is going to prove to be revolutionary. To help us make sense of the global enthusiasm and hype for blockchain, we spoke to blockchain expert Imran Bashir. Imran is the author of Mastering Blockchain, so we thought he could offer some useful insights into where blockchain is going next. He didn't disappoint. Respondents to the Skill Up survey said that blockchain would be revolutionary. Do you agree? Why? I agree. The fundamental issue that blockchain solves is that of trust. It enables two or more mutually distrusting parties to transact with each other without the need to establish trust or rely on a trusted third party. This phenomenon alone is enough to start a revolution. Generally, we perform transactions in a centralised and trusted environment, which is the norm and works reasonably well, but think about a system where you do not need trust or a central trusted third party to do business. This paradigm fundamentally changes the way we conduct business and results in significant improvements such as cost saving, security and transparency. Why should developers learn blockchain? Do you think blockchain technology is something the average developer should be learning? Why? Any developer should learn blockchain technology because in the next year or so there will be a high demand for skilled blockchain developers/engineers. 
Even now there are many unfilled jobs; it is said that there are 14 jobs open for every blockchain developer. The future will be built on blockchain; every developer/technologist should strive to learn it. What most excites you about blockchain technology? It is the concept of decentralisation and its application in almost every industry ranging from finance and government to medical and law. We will see applications of this technology everywhere. It will change our lives, just the way the Internet did in the 1990s. Also, smart contracts constitute a significant part of blockchain technology, and they allow you to implement contracts that are automatically executable and enforceable. This ability of blockchain allows you to drastically reduce the amount of time it takes for contract enforcement and eliminates the need for third parties and manual processes that can take a long time to come into action. Enforcement in the real world takes a long time; in the blockchain world, it is reduced to a few minutes, if not seconds, depending on the application and requirements. What tools do you need to learn to take advantage of blockchain? What tools do you think are essential to master in order to take advantage of blockchain? Currently, I think there are some options available. Blockchain platforms such as Ethereum and Hyperledger Fabric are the most commonly used for development. As such, developers should focus on at least one of these platforms. It is best to start with the necessary tools and features available in a blockchain, and once you have mastered the concepts, you can move to using frameworks and APIs, which will ease the development and deployment of decentralised applications. What do you think will be the most important thing for developers to learn in the next 12 months? Learn blockchain technology and at least one related platform. 
Also explore how to implement business solutions using blockchain, which brings benefits such as security, cost-saving and transparency. Thanks for taking the time to talk to us Imran! You can find Imran's book on the Packt store.


Sunith Shetty

22 May 2018

9 min read

“Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan


Tableau is one of the leading BI tools used by data science and business intelligence professionals today. You can not only use it to create powerful data visualizations but also use it to extract actionable insights for quality decision making thanks to the plethora of tools and features it offers. We recently interviewed Joshua Milligan, a Tableau Zen Master and the author of the book, Learning Tableau. Joshua takes us on an insightful journey into Tableau explaining why it is the Google of data visualization. He tells us all about its current and future focus areas such as Geospatial analysis and automating workflows, the exciting new features and tools such as Hyper, Tableau Prep among other topics. He also gives us a preview of things to come in his upcoming book. Author’s Bio Joshua Milligan, author of the bestselling book, Learning Tableau, has been with Teknion Data Solutions since 2004 and currently serves as a principal consultant. With a strong background in software development and custom .NET solutions, he brings a blend of analytical and creative thinking to BI solutions. Joshua has been named Tableau Zen Master, the highest recognition of excellence from Tableau Software not once but thrice. In 2017, Joshua competed as one of three finalists in the prestigious Tableau Iron Viz competition. As a Tableau trainer, mentor, and leader in the online Tableau community, he is passionate about helping others gain insights from their data. His work has been featured multiple times on Tableau Public’s Viz of the Day and Tableau’s website. He also shares frequent Tableau (and Maestro) tips, tricks, and advice on his blog VizPainter.com. Key Takeaways Tableau is perfectly tailored for business intelligence professionals given its extensive list of offerings from data exploration to powerful data storytelling. 
The drag-and-drop interface allows you to understand data visually, thus enabling anyone to perform and share self service data analytics with colleagues in seconds. Hyper is a new in-memory data engine designed for powerful analytical query processing on complex datasets. Tableau Prep, a new data preparation tool released with Tableau 2018.1, allows users to easily combine, shape, analyze and clean the data for compelling analytics. Tableau 2018.1 is expected to bring new geospatial tools, enterprise enhancements to Tableau Server, and new extensions and plugins to create interactive dashboards. Tableau users can expect to see artificial intelligence and machine learning becoming major features in both Tableau and Tableau Prep - thus deriving insights based on users' behavior across the enterprise. Full Interview There are many enterprise software products for business intelligence; how does Tableau compare against the others? What are the main reasons for Tableau's popularity? Tableau's paradigm is what sets it apart from others. It's not just about creating a chart or dashboard. It's about truly having a conversation with the data: asking questions and seeing instant results as you drag and drop to get new answers that raise deeper questions and then iterating. Tableau allows for a flow of thought through the entire cycle of analytics from data exploration through analysis to data storytelling. Once you understand this paradigm, you will flow with Tableau and do amazing things! There's a buzz in the developer community that Tableau is the Google of data visualization. Can you list the top 3-5 features in Tableau 10.5 that are most appreciated by the community? How do you use Tableau in your day-to-day work? Tableau 10.5 introduced Hyper - a next-generation data engine that really lays a foundation for enterprise scaling as well as a host of exciting new features, and Tableau 2018.1 builds on this foundation. 
One of the most exciting new features is a completely new data preparation tool - Tableau Prep. Tableau Prep complements Tableau Desktop and allows users to very easily clean, shape, and integrate their data from multiple sources. It’s intuitive and gives you a hands-on, instant feedback paradigm for data preparation in a similar way to what Tableau Desktop enables with data visualization. Tableau 2018.1 also includes new geospatial features that make all kinds of analytics possible. I’m particularly excited about support for the geospatial data types and functions in SQL Server which have allowed me to dynamically draw distances and curves on maps. Additionally, web authoring in Tableau Server is now at parity with Tableau Desktop. I use Tableau every day to help my clients see and understand their data and to make key decisions that drive new business, avoid risk, and find hidden opportunities. Tableau Prep makes it easier to access the data I need and shape it according to the analysis I’ll be doing. Tableau offers a wide range of products to suit their users' needs. How does one choose the right product from their data analytics or visualization need? For example, what are the key differences between Tableau Desktop, Server and Public? Are there any plans for a unified product for the Tableau newbie in the near future? As a consultant at Teknion Data Solutions (a Tableau Gold Partner), I work with clients all the time to help them make the best decisions around which Tableau offering best meets their needs. Tableau Desktop is the go-to authoring tool for designing visualizations and dashboards. Tableau Server, which can be hosted on premises or in the cloud, gives enterprises and organizations the ability to share and scale Tableau. It is now at near parity with Tableau Desktop in terms of authoring. Tableau Online is the cloud-based, Tableau managed solution. Tableau Public allows for sharing public visualizations and dashboards with a world-wide audience. 
How good is Tableau for Self-Service Analytics / automating workflows? What are the key challenges and limitations? Tableau is amazing for this. Combined with the new data prep tool - Tableau Prep - Tableau really does offer users, across the spectrum (from business users to data scientists), the ability to quickly and easily perform self-service analytics. As with any tool, there are definitely cases which require some expertise to reach a solution. Pulling data from an API or web-based source or even sometimes structuring the data in just the right way for the desired analysis are examples that might require some know-how. But even there, Tableau has the tools that make it possible (for example, the web data connector) and partners (like Teknion Data Solutions) to help put it all together. In the third edition of Learning Tableau, I expand the scope of the book to show the full cycle of analytics from data prep and exploration to analysis and data storytelling. Expect updates on new features and concepts (such as the changes Hyper brings), a new chapter focused on Tableau Prep and strategies for shaping data to perform analytics, and new examples throughout that span multiple industries and common analytics questions. What is the development roadmap for Tableau 2018.1? Are we expecting major feature releases this year to overcome some of the common pain areas in business intelligence? I'm particularly excited about Tableau 2018.1. Tableau hasn't revealed everything yet, but things such as new geospatial tools and features, enterprise enhancements to Tableau Server, the new extensions API, new dashboard tools, and even a new visualization type or two look to be amazing! Tableau is working a lot in the geospatial domain coming up with new plugins/connectors and features. Can we expect Tableau to further strengthen their support for spatial data? What are the other areas/domains that Tableau is currently focused on? 
I couldn't say what the top 3-5 areas are - but you are absolutely correct that Tableau is really putting some emphasis on geospatial analytics. I think the speed and power of the Hyper data engine make a lot of things like this possible. Although I don't have any specific knowledge beyond what Tableau has publicly shared, I wouldn't be surprised to see some new predictive and statistical models and expansion of data preparation abilities. What's driving Tableau to Cloud? Can we expect more organizations adopting Tableau on Cloud? There has been a major shift to the cloud by organizations. The ability to manage, scale, ensure up-time, and save costs is driving this move and that in turn makes Tableau's cloud-based offerings very attractive. What does Tableau's future hold, according to you? For example, do you see machine learning and AI-powered analytics platform transformation? Or can we expect Tableau entering the IoT and IIoT domain? Tableau demonstrated a concept around NLQ at the Tableau Conference and has already started building in a few machine learning features. For example, Tableau now recommends joins based on what it learns from the behavior of users across the enterprise. Tableau Prep has been designed from the ground up with machine learning in mind. I fully expect to see AI and machine learning become major features in both Tableau and Tableau Prep – but true to Tableau’s paradigm, they will complement the work of the analyst and allow for deeper insight without obscuring the role that humans play in reaching that insight. I'm excited to see what is announced next! Give us a sneak peek into the book you are currently writing "Learning Tableau 2018.1, Third Edition", expected to be released in the 3rd Quarter this year. What should our readers get most excited about as they wait for this book? Although the foundational concepts behind learning Tableau remain the same, I'm excited about the new features that have been released or will be as I write. 
Among these are a couple of game-changers such as the new geospatial features and the new data prep tool: Tableau Prep. In addition to updating the existing material, I'll definitely have a new chapter or two covering those topics!


Amey Varangaonkar

24 Apr 2018

9 min read

“Pandas is an effective tool to explore and analyze data”: An interview with Theodore Petrou


It comes as no surprise to many developers that Python has grown to become the preferred language for data science. One of the reasons for its staggering adoption in the data science community is the rich suite of libraries for effective data analysis and visualization - allowing you to extract useful, actionable insights from your data. Pandas is one such Python-based library that provides a solid platform to carry out high-performance data analysis. Ted Petrou is a data scientist and the founder of Dunder Data, a professional educational company focusing on exploratory data analysis. Before founding Dunder Data, Ted was a data scientist at Schlumberger, a large oil services company, where he spent the vast majority of his time exploring data. Ted received his Master’s degree in statistics from Rice University and has used his analytical skills to play poker professionally. He taught math before becoming a data scientist. He is a strong supporter of learning through practice and can often be found answering questions about pandas on Stack Overflow. In this exciting interview, Ted takes us through an insightful journey into pandas - Python’s premier library for exploratory data analysis, and tells us why it is the go-to library for many data scientists to discover new insights from their data. Key Takeaways Data scientists are in the business of making predictions. To make the right predictions you must know how to analyse your data. To perform data analysis efficiently, you must have a good understanding of the concepts as well as be proficient in using tools like pandas. Pandas Cookbook contains step by step solutions to master the pandas syntax while going through the data exploration journey (missteps and all) to solve the most common and not-so-common problems in data analysis. Unlike R, which has several different packages for different data science tasks, pandas offers all data analysis capabilities as a single large Python library. 
Pandas has good time-series capabilities, making it well-suited for building financial applications. That said, its best use is in data exploration - to find interesting discoveries within the data. Ted says beginners in data science should focus on learning one data science concept at a time and master it thoroughly, rather than getting an overview of multiple concepts at once. Let us start with a very fundamental question - Why is data crucial to businesses these days? What problems does it solve? All businesses, from a child’s lemonade stand to the largest corporations, must account for all their operations in order to be successful. This accounting of supplies, transactions, people, etc., is what we call ‘data’ and gives us historical records of what has transpired in a business. Without this data, we would be reduced to oral history or what humans used for accounting before the advent of writing systems. By collecting and analyzing data, we gain a deeper understanding of how the business is progressing. In the most basic instances, such as with a child’s lemonade stand, we know how many glasses of lemonade have been sold, how much was spent on supplies, and importantly whether the business is profitable. This example is incredibly trivial, but it should be noted that such simple data collection is not something that comes naturally to humans. For instance, many people have a desire to lose weight at some point in their life, but fail to accurately record their daily weight or calorie intake in any regular manner, despite the large number of free services available to help with this. There are so many Python-based libraries out there which can be used for a variety of data science tasks. Where does pandas fit into this picture? pandas is the most popular library to perform the most fundamental tasks of a data analysis. Not many libraries can claim to provide the power and flexibility of pandas for working with tabular data. 
How does pandas help data scientists in overcoming different challenges in data analysis? What advantages does it offer over domain-specific languages such as R? One of the best reasons to use pandas is that it is so popular. There is a tremendous amount of resources available for it, and an excellent database of questions and answers on Stack Overflow. Because the community is so large, you can almost always get an immediate answer to your problem. Comparing pandas to R is difficult as R is an entire language that provides tools for a wide variety of tasks. Pandas is a single large Python library. Nearly all the tasks possible in pandas can be replicated with the right library in R. We would love to hear your journey as a data scientist. Did having a master's degree in statistics help you in choosing this profession? Also tell us something about how you leveraged analytics in professional Poker! My journey to becoming a “data scientist” began long before the term even existed. As a math undergrad, I found out about the actuarial profession, which appealed to me because of its meritocratic pathway to success. Because I wasn’t certain that I wanted to become an actuary, I entered a Ph.D. program in statistics in 2004, the same year that an online poker boom began. After a couple of unmotivating and half-hearted attempts at learning probability theory, I left the program with a master’s degree to play poker professionally. Playing poker has been by far the most influential and beneficial resource for understanding real-world risk. Data scientists are in the business of making predictions and there’s no better way to understand the outcomes of predictions you make than by exposing yourself to risk. Your recently published 'pandas Cookbook' has received a very positive response from the readers. What problems in data analysis do you think this book solves? I worked extremely hard to make pandas Cookbook the best available book on the fundamentals of data analysis. 
The material was formulated by teaching dozens of classes and hundreds of students with my company Dunder Data and my meetup group Houston Data Science. Before getting to what makes a good data analysis, it’s important to understand the difference between the tools available to you and the theoretical concepts. Pandas is a tool and is not much different than a big toolbox in your garage. It is possible to master the syntax of pandas without actually knowing how to complete a thorough data analysis. This is like knowing how to use all the individual tools in your toolbox without knowing how to build anything useful, such as a house. Similarly, understanding theoretical concepts such as ‘split-apply-combine’ or ‘tidy data’ without knowing how to implement them with a specific tool will not get you very far. Thus, in order to make a good data analysis, you need to understand both the tools and the concepts. This is what pandas Cookbook attempts to provide. The syntax of pandas is learned together with common theoretical concepts using real-world datasets. Your readers loved the way you have structured the book and the kind of datasets, examples and functions you have chosen to showcase pandas in all its glory. Was it experience, intuition, or observations that led to this fantastic writing insight? The official pandas documentation is very thorough (well over 1,000 pages) but does not present the features as you would see them in a real data analysis. Most of the operations are shown in isolation on contrived or randomly generated data. In a typical data analysis, it is common for many pandas operations to be called one after another. The recipes in pandas Cookbook expose this pattern to the reader, which will help them when they are completing an actual data analysis. This is not meant to disparage the documentation as I have read it multiple times myself and recommend reading it along with pandas Cookbook. 
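To make the ‘tidy data’ and ‘split-apply-combine’ ideas mentioned in the answer above a little more concrete, here is a minimal sketch in pandas. It is only an illustration and is not taken from pandas Cookbook: the table, the column names (store, jan, feb, month, sales) and the numbers are all made up for the example.

```python
import pandas as pd

# Hypothetical "wide" table (invented data): one row per store,
# one column per month of sales.
wide = pd.DataFrame(
    {
        "store": ["A", "B"],
        "jan": [10, 20],
        "feb": [30, 40],
    }
)

# Tidy data: reshape so that each row is one (store, month) observation.
tidy = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Split-apply-combine: split the rows by store, apply a sum to each
# group, and combine the per-group results into a single Series.
total_by_store = tidy.groupby("store")["sales"].sum()

print(tidy)
print(total_by_store)
```

The two steps are chained in the way the answer describes for a typical analysis, with one pandas operation feeding the next rather than being shown in isolation.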
Quantitative finance is one domain where pandas finds major application. How does pandas help in developing better financial applications? In what other domains does pandas find important applications and how?
Pandas has good time-series capabilities, which makes it well-suited for financial applications. Its ability to group by specific time periods is a very useful feature. In my opinion, pandas' most important application is exploratory data analysis. It is possible for an analyst to quickly use pandas to find interesting discoveries within the data and visualize the results with either matplotlib or Seaborn. This tight integration, coupled with the Jupyter Notebook interface, makes for an excellent ecosystem for generating and reporting results to others.
Please tell us more about 'pandas Cookbook'. What in your opinion are the 3 major takeaways from it? Are there any prerequisites needed to get the most out of the book?
The only prerequisite for pandas Cookbook is a fundamental understanding of the Python programming language. The recipes progress in difficulty from chapter to chapter and, for those with no pandas experience, I would recommend reading it cover to cover. One of the major takeaways from the book is to be able to write modern and idiomatic pandas code. Pandas is a huge library and there are always multiple ways of completing each task. This is more of a negative than a positive, as beginners notoriously write poor and inefficient code. Another takeaway is the ability to probe and investigate data until you find something interesting. Many of the recipes are written as if the reader is experiencing the discovery process alongside the author. There are occasional (and purposeful) missteps in some recipes to show that the right course of action is not always known. Lastly, I wanted to teach common theoretical concepts of doing a data analysis while simultaneously teaching pandas syntax.
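The grouping by time period mentioned in the answer on financial applications can be sketched as follows. This is only one of several ways to do it in pandas, and the price series is made-up data, not real market prices.

```python
import pandas as pd

# Hypothetical closing prices, invented for this illustration only.
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 106.0],
    index=pd.to_datetime([
        "2018-01-02", "2018-01-15", "2018-01-31",
        "2018-02-01", "2018-02-14", "2018-02-28",
    ]),
)

# Convert each timestamp to its calendar month, then average within a month.
monthly_mean = prices.groupby(prices.index.to_period("M")).mean()
print(monthly_mean)
```

Changing the period code passed to `to_period` (for example to `"Q"` for quarters) regroups the same data at a different granularity.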
Finally, what advice would you have for beginners in data science? What things should they keep in mind while designing and developing their data science workflow? Are there any specific resources which they could refer to, apart from this book of course? For those just beginning their data science journey, I would suggest keeping their ‘universe small’. This means concentrating on as few things as possible. It is easy to get caught up with a feeling that you need to keep learning as much as possible. Mastering a few subjects is much better than having a cursory knowledge of many. If you found this interview to be intriguing, make sure you check out Ted’s pandas Cookbook which presents more than 90 unique recipes for effective scientific computation and data analysis.



Richard Gall

13 Mar 2018

7 min read

Why is Python so good for AI and Machine Learning? 5 Python Experts Explain


Python is one of the best programming languages for machine learning, quickly coming to rival R's dominance in academia and research. But why is Python so popular in the machine learning world? Why is Python good for AI? Mike Driscoll spoke to five Python experts and machine learning community figures about why the language is so popular as part of the book Python Interviews. Programming is a social activity - Python's community has acknowledged this best Glyph Lefkowitz (@glyph), founder of Twisted, a Python network programming framework, awarded The PSF’s Community Service Award in 2017 AI is a bit of a catch-all term that tends to mean whatever the most advanced areas in current computer science research are. There was a time when the basic graph-traversal stuff that we take for granted was considered AI. At that time, Lisp was the big AI language, just because it was higher-level than average and easier for researchers to do quick prototypes with. I think Python has largely replaced it in the general sense because, in addition to being similarly high-level, it has an excellent third-party library ecosystem, and a great integration story for operating system facilities. Lispers will object, so I should make it clear that I'm not making a precise statement about Python's position in a hierarchy of expressiveness, just saying that both Python and Lisp are in the same class of language, with things like garbage collection, memory safety, modules, namespaces and high-level data structures. In the more specific sense of machine learning, which is what more people mean when they say AI these days, I think there are more specific answers. The existence of NumPy and its accompanying ecosystem allows for a very research-friendly mix of high-level stuff, with very high-performance number-crunching. Machine learning is nothing if not very intense number-crunching. 
"...Statisticians, astronomers, biologists, and business analysts have become Python programmers and have improved the tooling." The Python community's focus on providing friendly introductions and ecosystem support to non-programmers has really increased its adoption in the sister disciplines of data science and scientific computing. Countless working statisticians, astronomers, biologists, and business analysts have become Python programmers and have improved the tooling. Programming is fundamentally a social activity and Python's community has acknowledged this more than any other language except JavaScript. Machine learning is a particularly integration-heavy discipline, in the sense that any AI/machine learning system is going to need to ingest large amounts of data from real-world sources as training data, or system input, so Python's broad library ecosystem means that it is often well-positioned to access and transform that data. Python allows users to focus on real problems Marc-Andre Lemburg (@malemburg), co-founder of The PSF and CEO of eGenix Python is very easy to understand for scientists who are often not trained in computer science. It removes many of the complexities that you have to deal with, when trying to drive the external libraries that you need to perform research. After Numeric (now NumPy) started the development, the addition of IPython Notebooks (now Jupyter Notebooks), matplotlib, and many other tools to make things even more intuitive, Python has allowed scientists to mainly think about solutions to problems and not so much about the technology needed to drive these solutions. "Python is an ideal integration language which binds technologies together with ease." As in other areas, Python is an ideal integration language, which binds technologies together with ease. Python allows users to focus on the real problems, rather than spending time on implementation details. 
Apart from making things easier for the user, Python also shines as an ideal glue platform for the people who develop the low-level integrations with external libraries. This is mainly due to Python being very accessible via a nice and very complete C API. Python is really easy to use for math and stats-oriented people Sebastian Raschka (@rasbt), researcher and author of Python Machine Learning I think there are two main reasons, which are very related. The first reason is that Python is super easy to read and learn. I would argue that most people working in machine learning and AI want to focus on trying out their ideas in the most convenient way possible. The focus is on research and applications, and programming is just a tool to get you there. The more comfortable a programming language is to learn, the lower the entry barrier is for more math and stats-oriented people. Python is also super readable, which helps with keeping up-to-date with the status quo in machine learning and AI, for example, when reading through code implementations of algorithms and ideas. Trying new ideas in AI and machine learning often requires implementing relatively sophisticated algorithms and the more transparent the language, the easier it is to debug. The second main reason is that while Python is a very accessible language itself, we have a lot of great libraries on top of it that make our work easier. Nobody would like to spend their time on reimplementing basic algorithms from scratch (except in the context of studying machine learning and AI). The large number of Python libraries which exist, help us to focus on more exciting things than reinventing the wheel. Python is also an excellent wrapper language for working with more efficient C/C++ implementations of algorithms and CUDA/cuDNN, which is why existing machine learning and deep learning libraries run efficiently in Python. This is also super important for working in the fields of machine learning and AI. 
To summarize, I would say that Python is a great language that lets researchers and practitioners focus on machine learning and AI and provides less of a distraction than other languages. Python has so many features that are attractive for scientific computing Luciano Ramalho (@ramalhoorg) technical principal at ThoughtWorks and fellow of The PSF The most important and immediate reason is that the NumPy and SciPy libraries enable projects such as scikit-learn, which is currently almost a de facto standard tool for machine learning. The reason why NumPy, SciPy, scikit-learn, and so many other libraries were created in the first place is because Python has some features that make it very attractive for scientific computing. Python has a simple and consistent syntax which makes programming more accessible to people who are not software engineers. "Python benefits from a rich ecosystem of libraries for scientific computing." Another reason is operator overloading, which enables code that is readable and concise. Then there's Python's buffer protocol (PEP 3118), which is a standard for external libraries to interoperate efficiently with Python when processing array-like data structures. Finally, Python benefits from a rich ecosystem of libraries for scientific computing, which attracts more scientists and creates a virtuous cycle. Python is good for AI because it is strict and consistent Mike Bayer (@zzzeek), Senior Software Engineer at Red Hat and creator of SQLAlchemy What we're doing in that field is developing our math and algorithms. We're putting the algorithms that we definitely want to keep and optimize into libraries such as scikit-learn. Then we're continuing to iterate and share notes on how we organize and think about the data. A high-level scripting language is ideal for AI and machine learning, because we can quickly move things around and try again. 
The code that we create spends most of its lines on representing the actual math and data structures, not on boilerplate. A scripting language like Python is even better, because it is strict and consistent. Everyone can understand each other's Python code much better than they could in some other language that has confusing and inconsistent programming paradigms. The availability of tools like IPython notebook has made it possible to iterate and share our math and algorithms on a whole new level. Python emphasizes the core of the work that we're trying to do and completely minimizes everything else about how we give the computer instructions, which is how it should be. Automate whatever you don't need to be thinking about.
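Two of the language features Luciano Ramalho names above, operator overloading and the buffer protocol (PEP 3118), can be seen in miniature using only the standard library. The `Vector` class below is a hypothetical toy written for this illustration; it is not how NumPy is implemented.

```python
from array import array


class Vector:
    """Toy numeric vector; a hypothetical illustration, not NumPy."""

    def __init__(self, values):
        # array('d') stores C doubles in one contiguous block of memory.
        self.data = array("d", values)

    def __add__(self, other):
        # Operator overloading: lets callers write v + w.
        return Vector(a + b for a, b in zip(self.data, other.data))

    def __mul__(self, scalar):
        # Lets callers write v * 2.0.
        return Vector(a * scalar for a in self.data)

    def tolist(self):
        return list(self.data)


v = Vector([1.0, 2.0, 3.0])
w = Vector([10.0, 20.0, 30.0])
combined = (v + w) * 2.0
print(combined.tolist())

# Buffer protocol: memoryview exposes the array's memory without copying,
# so a write through the view is visible in the original array.
view = memoryview(v.data)
view[0] = 99.0
print(v.tolist())
```

The arithmetic line reads like the math it represents, and the `memoryview` shares memory rather than copying it, which is the kind of zero-copy exchange that lets external libraries work efficiently on Python-owned data.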



Amey Varangaonkar

23 Jan 2018

12 min read

Why MongoDB is the most popular NoSQL database today


If NoSQL is the king, MongoDB is surely its crown jewel. With over 15 million downloads and counting, MongoDB is the most popular NoSQL database today, empowering users to query, manipulate and find interesting insights from their data. Alex Giamas is a Senior Software Engineer at the Department for International Trade, UK. Having worked as a consultant for various startups, he is an experienced professional in systems engineering, as well as NoSQL and Big Data technologies. Alex holds an M. Sc., from Carnegie Mellon University in Information Networking and has attended professional courses in Stanford University. He is a MongoDB-certified developer and a Cloudera-certified developer for Apache Hadoop & Data Science essentials. Alex has worked with a wide array of NoSQL and Big Data technologies, and built scalable and highly available distributed software systems in C++, Java, Ruby and Python. In this insightful interview with MongoDB expert Alex Giamas, we talk about all things related to MongoDB - from why NoSQL databases gained popularity to how MongoDB is making developers’ and data scientists’ work easier and faster. Alex also talks about his book Mastering MongoDB 3.x, and how it can equip you with the tools to become a MongoDB expert! Key Takeaways NoSQL databases have grown in popularity over the last decade because they allow users to query their data without having to learn and master SQL. The rise in popularity of the Javascript-based MEAN stack meant many programmers now prefer MongoDB as their choice of database. MongoDB has grown from being just a JSON data store to become the most popular NoSQL database solution with efficient data manipulation and administration capabilities. The sharding and aggregation framework, coupled with document validations, fine-grained locking, a mature ecosystem of tools and a vibrant community of users are some of the key reasons why MongoDB is the go-to database for many. 
Database schema design, data modeling, backup and security are some of the common challenges faced by database administrators today. Mastering MongoDB 3.x focuses on these common pain points of the database administrators and shows them how to build robust, scalable database solutions with ease. NoSQL databases seem to have taken the world by storm, and many people now choose various NoSQL database solutions over relational databases. What do you think is the reason for this rise in popularity? That's an excellent question. There are several factors contributing to the rise in popularity for NoSQL databases. Relational databases have served us for 30 years. At some point we realised that the one size fits all model is no longer applicable. While “software is eating the world” as Marc Andreessen has famously written, the diversity and breadth of use cases we use software for has brought an unprecedented specialisation in the level of solutions to our problems. Graph databases, column-based databases and of course document-oriented databases like MongoDB are in essence specialised solutions to particular database problems. If our problem fits the document-oriented use case, it makes more sense to use the right tool for the problem (e.g. MongoDB) than a generic one-size-fits-all RDBMS. Another contributing factor to the rise of NoSQL databases and especially MongoDB is the rise of the MEAN stack, which means Javascript developers can now work from frontend to backend and database. Last but not the least, more than a generation of developers have struggled with SQL and its several variations. The promise that one does not need to learn and master SQL to extract data from the database but can rather do it using Javascript or other more developer friendly tools is just too exciting to pass on. MongoDB struck gold in this aspect, as Javascript is one of the most commonly used programming languages. 
Using Javascript for querying also opened up database querying to front-end developers, which I believe has driven adoption as well.
MongoDB is one of the most popular NoSQL databases out there today, and finds application in web development as well as Big Data processing. How does MongoDB aid in effective analytics?
In the past few years we have seen explosive growth in generated data. 80% of the world’s data has been generated in the past 3 years, and this will continue to happen even more in the near future with the rise of IoT. This data needs to be stored and, most importantly, analysed to derive insights and actions. The answer to this problem has been to separate the transactional loads from the analytical loads into OLTP and OLAP databases respectively. The Hadoop ecosystem has several frameworks that can store and analyse data. The problem with Hadoop data warehouses/data lakes, however, is threefold: you need experts to analyse data, they are expensive, and it’s difficult to quickly get answers to your questions. MongoDB bridges this gap by offering efficient analytics capabilities. MongoDB can help developers and technical people get quick insights from data that can help define the direction of research for the data scientists working on the data lake. With tools like the new charts or the BI connector, data warehousing and MongoDB are converging. MongoDB does not aim to substitute Hadoop-based systems but rather to complement them and decrease the time to market for data-driven solutions.
You have been using MongoDB since 2009, way back when it was in its 1.x version. How has the database evolved over the years?
When I started using MongoDB, it was not much more than a JSON data store. It’s amazing how far MongoDB has come in these 9 years in every aspect. Every piece of software has to evolve and adapt to the always changing environment.
MongoDB started off as the JSON data store that is easy to set up and use while being blazingly fast, with some caveats. The turning point for MongoDB early in its evolution was the introduction of sharding. Challenging as it may be to choose the right shard key, being able to scale horizontally using commodity hardware is the feature that has been appreciated the most by developers and architects throughout all these years. The introduction of the aggregation framework was another turning point for MongoDB, since it allowed developers to build data pipelines using MongoDB data, reducing time to market. Geospatial features were there from an early point in time, and one of MongoDB’s earliest and most visible customers, FourSquare, was an avid user of them. Overall, with time MongoDB has matured and is now a robust database for a wide set of use cases. Document validations, fine-grained locking, a mature ecosystem of tools around it and a vibrant community mean that no matter the language, state of development, startup or corporate environment, MongoDB can be evaluated as the database choice. There have of course been features and directions that didn’t end up as well as we were originally hoping for. A striking example is the MongoDB MapReduce framework, which never lived up to the expectations of developers using MapReduce via Hadoop and has gradually been superseded by the more advanced and more developer-friendly Aggregation framework.
What do you think are the most striking features of MongoDB? How does it help you in your day to day activities as a Senior Software Engineer?
In my day to day development tasks I almost always use the Aggregation framework. It helps me quickly prototype a pipeline that transforms my data into a format on which I can then collaborate with the data scientists to derive useful insights, in a fraction of the time needed by traditional tools.
Day to day or sprint to the next sprint - what you want from any technology is to be reliable and not get in your way but rather help you achieve the business goals. With MongoDB we can easily store data in JSON format, process it, analyse it and pass it on to different frontend or backend systems without much hassle. What are the different challenges that MongoDB developers and architects usually face while working with MongoDB? How does your book 'Mastering MongoDB 3.x' help, in this regard? The major challenge developers and architects face when choosing to work with MongoDB is the database design. Irrespective of whether we come from an RDBMS or a NoSQL background, designing the database such that it can solve our current and future problems is a difficult task. Having been there and struggled with it in the past, I have put emphasis on how someone coming from a relational background can model different relationships in MongoDB. I have also included easy to understand and follow checklists around different aspects of MongoDB. Backup and security is another challenge that users often face. Backups are many times ignored until it’s too late. In my book I identify all available options and the tradeoffs they come with, including cloud-based options. Security on the other hand is becoming an ever increasing concern for computing systems with data leaks and security breaches happening more often. I have put an emphasis on security both in the relevant chapters and also across most chapters by highlighting common security pitfalls and promoting secure practices wherever possible. MongoDB has commanded a significant market share in the NoSQL databases domain for quite some time now, highlighting its usefulness and viability in the community. That said, what are the 3 areas where MongoDB can get better, in order to stay ahead of its competition? MongoDB has conquered the NoSQL space in terms of popularity. 
The real question is how/if NoSQL can increase its market share in the overall database market. The most important area of improvement is interoperability. What developers get with a popular RDBMS is not only the database engine itself, but also easy ways to integrate it with different systems, from programming frameworks to Big Data and analytics systems. MongoDB could invest more heavily in building the libraries that can make a developer’s life easier. Real-time analytics is another area with huge potential in the near future. With IoT rapidly increasing the data volume, data analysts need to be able to quickly derive insights from data. MongoDB can introduce features to address this problem. Finally, MongoDB could improve by becoming more tunable in terms of the performance/consistency tradeoff. It’s probably a bit too much to ask a NoSQL database to support transactions, as this is not what it was designed for from the very beginning, but it would greatly increase the breadth of use cases if we could sparingly link up different documents and treat them as one, even with severe performance degradation.
Artificial Intelligence and Machine Learning are finding useful applications in every possible domain today. Although it's a database, do you foresee MongoDB going the Oracle way and incorporating features to make it AI-compatible?
Throughout the past few years, algorithms, processing power and the sheer amount of data that we have available have brought a renewed trust in AI. It is true that we use ML algorithms in almost every problem domain, which is why every vendor is trying to make the developer’s life easier by making their products more AI-friendly. It’s only natural for MongoDB to do the same. I believe that not only MongoDB but every database vendor will have to gradually focus more on how to serve AI effectively, and this will become a key part of their strategy going ahead.
Please tell us something more about your book 'Mastering MongoDB 3.x'.
What are the 3 key takeaways for the readers? Are there any prerequisites to get the most out of the book?
First of all, I would like to say that as a “Mastering” level book we assume that readers have some basic understanding of both MongoDB and programming in general. That being said, I encourage readers to start reading the book and try to pick up the missing parts along the way. It’s better to challenge yourself than the other way around. As for the most important takeaways, in no specific order of importance:
Know your problem. It’s important to understand and analyse as much as possible the problem that you are trying to solve. This will dictate everything, from data structures and indexing, to database design decisions, to technology choices. On the other hand, if the problem is not well defined, then this may be the chance for MongoDB to shine as a database choice, as we can store data with minimal hassle.
Be ready to scale ahead of time. Whether that is replication or sharding, make sure that you have investigated and identified the correct design and implementation steps so that you can scale when needed. Trying to add an extra shard when load has already peaked in the existing shards is neither fun nor easy to do.
Use aggregation. Being able to transform data in MongoDB before extracting it for processing in an external database is a really important feature and should be used whenever possible, instead of querying large datasets and transforming their data in our application server.
Finally, what advice would you give to beginners who would like to be an expert in using MongoDB? What would the learning path to mastering MongoDB look like? What are the key things to focus on in order to master data analytics using MongoDB?
To become an expert in MongoDB, one should start by understanding its history and roots. They should understand and master schema design and data modelling.
After mastering data modelling, the next step would be to master querying - both CRUD and more advanced concepts. Understanding the aggregation framework and how or when to index would be the next step. With this foundation, one can then move on to cross-cutting concerns like monitoring, backup and security, understanding the different storage engines that MongoDB supports and how to use MongoDB with Big Data. All this knowledge should then provide a strong foundation to move on to the scaling aspects like replication and sharding, with the goal of providing effective fault tolerance and high availability systems. Mastering MongoDB 3.x explains these topics in this order with the intention of getting from beginner to expert in a structured and easy to follow and understand way.
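As a rough illustration of the “Use aggregation” takeaway above, the sketch below writes a small pipeline in MongoDB's stage syntax (`$match`, `$group`, `$sort`). The `orders` documents and field names are invented for this example. So that it runs without a database, a hypothetical `emulate` helper reproduces the same three stages in plain Python. With a live server and the PyMongo driver, the `pipeline` list would instead be passed to a collection's `aggregate()` method.

```python
from collections import defaultdict

# Hypothetical "orders" documents, invented for this illustration only.
orders = [
    {"customer": "ann", "status": "shipped", "amount": 30},
    {"customer": "bob", "status": "shipped", "amount": 20},
    {"customer": "ann", "status": "pending", "amount": 50},
    {"customer": "ann", "status": "shipped", "amount": 15},
]

# The pipeline as it would be sent to the server: filter, group, sort.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def emulate(docs):
    """Pure-Python stand-in that hard-codes the three stages above.

    Assumption: this only mirrors what this particular pipeline computes;
    it is not a general aggregation engine.
    """
    totals = defaultdict(int)
    for doc in docs:
        if doc["status"] == "shipped":                # $match
            totals[doc["customer"]] += doc["amount"]  # $group with $sum
    rows = [{"_id": name, "total": total} for name, total in totals.items()]
    return sorted(rows, key=lambda row: row["total"], reverse=True)  # $sort


result = emulate(orders)
print(result)
```

Running the stages inside the database means only the two summary rows travel to the application, rather than every order document.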



Amey Varangaonkar

09 Jan 2018

9 min read

Why You Need to Know Statistics To Be a Good Data Scientist


Data Science has popularly been dubbed the sexiest job of the 21st century. So much so that everyone wants to become a data scientist. But what do you need to get started with data science? Do you need to have a degree in statistics? Why is having sound knowledge of statistics so important to be a good data scientist? We seek answers to these questions and look at data science through a statistical lens, in an interesting conversation with James D. Miller.
James is an IBM certified expert and a creative innovator. He has over 35 years of experience in applications and system design & development across multiple platforms and technologies. Jim has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. He is the author of several popular books such as Big Data Visualization, Learning IBM Watson Analytics, Mastering Splunk, and many more. In addition, Jim has written a number of whitepapers and continues to write on a number of relevant topics based upon his personal experiences and industry best practices.
In this interview, we look at some of the key challenges faced by many while transitioning from a data developer role to a data scientist. Jim talks about his new book, Statistics for Data Science, and discusses how statistics plays a key role when it comes to finding unique, actionable insights from data in order to make crucial business decisions.
Key Takeaways - Statistics for Data Science
Data science attempts to uncover the hidden context of data by going beyond answering generic questions such as ‘what is happening’, to tackling questions such as ‘what should be done next’.
Statistics for data science cultivates 'structured thinking'.
For most data developers who are transitioning to the role of data scientist, the biggest challenge often comes in calibrating their thought process, from being data design-driven to being more insight-driven.
Having a sound knowledge of statistics differentiates good data scientists from mediocre ones - it helps them accurately identify patterns in data that can potentially cause changes in outcomes.
Statistics for Data Science attempts to bridge the learning gap between database development and data science by implementing the statistical concepts and methodologies in R to build intuitive and accurate data models. These methodologies and their implementations are easily transferable to other popular programming languages such as Python.
While many data science tasks are being automated these days using different tools and platforms, the statistical concepts and methodologies will continue to form their backbone. Investing in statistics for data science is worth every penny!
Full Interview
Everyone wants to learn data science today as it is one of the most in-demand skills out there. In order to be a good data scientist, having a strong foundation in statistics has become a necessity. Why do you think this is the case? What importance does statistics have in data science?
With statistics, it has always been about "explaining" (data). With data science, the objective is going beyond questions such as "what happened?" and "what is happening?" to try to determine "what should be done next?". Understanding the fundamentals of statistics allows one to apply "structured thinking" to interpret knowledge and insights sourced from statistics.
You are a seasoned professional in the field of Data Science with over 30 years of experience. We would like to know how your journey in Data Science began, and what changes you have observed in this domain over the 3 decades.
I have been fortunate to have had a career that has traversed many platforms and technological trends (in fact over 37 years of diversified projects). Starting as a business applications and database developer, I have almost always worked for the office of finance. Typically, these experiences started with the collection, and then management, of data to be able to report results or assess performance. Over time, the industry has evolved and this work has become a “commodity”, with many mature tool options available and plenty of seasoned professionals available to perform the work. Businesses have now become keen to “do something more” with their data assets and are looking to move into the world of data science. The world before us offers enormous opportunities, not only for those with a statistical background but also for those with a business background who understand and can apply the statistical data sciences to identify new opportunities or competitive advantages.
What are the key challenges involved in the transition from being a data developer to becoming a data scientist? How does the knowledge of statistics affect this transition? Does one need a degree in statistics before jumping into Data Science?
Someone who has been working actively with data already has a “head start” in that they have experience with managing and manipulating data and data sources. They would also most likely have programming experience and possess the ability to apply logic to data. The challenge will be to “retool” their thinking from data developer to data scientist: for example, going from data querying to data mining. Happily, there is much that the data developer “already knows” about data science, and my book Statistics for Data Science attempts to “point out” the skills and experiences that the data developer will recognize as the same or at least as having significant similarities.
You will find that the field of data science is still evolving and the definition of “data scientist” depends upon the industry, project or organization you are referring to. This means that there are many roles that may involve data science with each having perhaps quite different prerequisites (such as a statistical degree). You have authored a lot of books such as Big Data Visualization, Learning IBM Watson Analytics, etc. with the latest being Statistics for Data Science. Please tell us something about your latest book. The latest book, “Statistics for Data Science”, looks to point out the synergies between a data developer and data scientist and hopes to evolve the data developers thinking “beyond database structures”, but also introduces key concepts and terminologies such as probability, statistical inference, model fitting, classification, regression and more, that can be used to journey into statistics and data science. How is statistics used when it comes to cleaning and pre-processing the data? How does it help the analysis? What other tasks can these statistical techniques be used for? Simple examples of the use of statistics when cleaning and/or pre-processing of data (by a data developer) include data-typing, Min/Max limitation, addressing missing values and so on. A really good opportunity for the use of statistics in data or database development is while modeling data to design appropriate storage structures. Using statistics in data development applies a methodical, structured approach to the process. The use of statistics can be a competitive advantage to any data development project. In the book, for practical purposes, you have shown the implementation of the different statistical techniques using the popular R programming language. Why do you think R is favored by the statisticians so much? What advantages does it offer? R is a powerful, feature-rich, extendable free language with many, many easy to use packages free for download. 
In addition, R has “a history” within the data science industry. R is also quite easy to learn and be productive with quickly. It also includes many graphics and other abilities “built-in”. Do you foresee a change in the way statistics for data science is used in the near future? In other words, will the dependency on statistical techniques for performing different data science tasks reduce? Statistics will continue to be important to data science. I do see more “automation” of more and more data science tasks through the availability of “off the shelf” packages that can be downloaded, installed and used. Also, the more popular tools will continue to incorporate statistical functions over time. This will allow for the main-streaming of statistics and data science into even more areas of life. The key will be for the user to have an understanding of the key statistical concepts and uses. What advice would you like to give to - 1. Those transitioning from the developer to the data scientist role, and 2. Absolute beginners, who want to take up statistics and data science as a career option? Buy my book! But seriously, keep reading and researching. Expose yourself to as many statistics and data science use cases and projects as possible. Most importantly, as you read about the topic, look for similarities between what you do today and what you are reading about. How does it relate? Always look for opportunities to use something that is new to you to do something you do routinely today. Your book 'Statistics for Data Science' highlights different statistical techniques for data analysis and finding unique insights from data. What are the three key takeaways for the readers, from this book? Again, I see (and point out in the book) key synergies between data or database development and data science. 
I would urge the reader – or anyone looking to move from data developer to data scientist - to learn through these and perhaps additional examples he or she may be able to find and leverage on their own. Using this technique, one can perhaps navigate laterally, rather than losing the time it would take to “start over” at the beginning (or bottom?) of the data science learning curve. Additionally, I would suggest to the reader that time taken to get acquainted with the R programs and the logic used for statistical computations (this book should be a good start) is time well spent.
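
The cleaning steps named earlier in this interview (data-typing, Min/Max limitation and addressing missing values) can be sketched in a few lines. The book itself uses R; the sketch below uses Python's standard library purely for illustration, and the sample values, the field name, the 0 to 120 range and the choice of median imputation are all assumptions rather than anything taken from the book.

```python
from statistics import median

# Hypothetical raw field values, as they might arrive from a text export.
raw_ages = ["34", "41", "", "29", "230", "n/a", "38"]

AGE_MIN, AGE_MAX = 0, 120  # assumed plausible range for this field


def to_int_or_none(value):
    """Data-typing: coerce a string to int, or None if it cannot be parsed."""
    try:
        return int(value)
    except ValueError:
        return None


def clean_ages(values, low=AGE_MIN, high=AGE_MAX):
    """Apply typing, min/max limits, then median imputation, in that order."""
    typed = [to_int_or_none(v) for v in values]
    # Min/Max limitation: treat out-of-range values as missing.
    limited = [v if v is not None and low <= v <= high else None for v in typed]
    observed = [v for v in limited if v is not None]
    fill = median(observed)  # one simple imputation choice among many
    return [fill if v is None else v for v in limited]


cleaned = clean_ages(raw_ages)
print(cleaned)
```

Treating out-of-range values as missing before imputing keeps an implausible entry such as 230 from distorting the fill value; whether to impute, drop or flag such rows is a judgment that depends on the project.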



Amey Varangaonkar

22 Dec 2017

9 min read

Why choose IBM SPSS Statistics over R for your data analysis project


Data analysis plays a vital role in organizations today. It enables effective decision-making by addressing fundamental business questions based on the understanding of the available data. While there are tons of open source and enterprise tools for conducting data analysis, IBM SPSS Statistics has emerged as a popular tool among statistical analysts and researchers. It offers them the perfect platform to quickly perform data exploration and analysis, and share their findings with ease. [author title=""] Dr. Kenneth Stehlik-Barry Kenneth joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. He has used SPSS extensively to analyze and discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Masters of Science in Predictive Analytics program there. Anthony J. Babinec Anthony joined SPSS as a Statistician in 1978 after assisting Norman Nie, the founder of SPSS, at the University of Chicago. Anthony has led a business development effort to find products implementing technologies such as CHAID decision trees and neural networks. Anthony received his BA and MA in Sociology with a specialization in Advanced Statistics from the University of Chicago and is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including the President. [/author] In this interview, we take a look at the world of statistical data analysis and see how IBM SPSS Statistics makes it easier to derive business sense from data. Kenneth and Anthony also walk us through their recently published book - Data Analysis with IBM SPSS Statistics - and tell us how it benefits aspiring data analysts and statistical researchers. 
Key Takeaways - IBM SPSS Statistics IBM SPSS Statistics is a key offering of IBM Analytics - providing an integrated interface for statistical analysis on-premise and on the cloud SPSS Statistics is a self-sufficient tool - it does not require you to have any knowledge of SQL or any other scripting language SPSS Statistics helps you avoid the 3 most common pitfalls in data analysis, i.e. handling missing data, choosing the best statistical method for analysis and understanding the results of the analysis R and Python are not direct competitors to SPSS Statistics - instead, you can create customized solutions by integrating SPSS Statistics with these tools for effective analyses and visualization Data Analysis with IBM SPSS Statistics highlights various popular statistical techniques to the readers, and how to use them in order to gather useful hidden insights from their data Full Interview IBM SPSS Statistics is a popular tool for efficient statistical analysis. What do you think are the 3 notable features of SPSS Statistics that make it stand apart from the other tools available out there? SPSS Statistics has a very short learning curve which makes it ideal for analysts to use efficiently. It also has a very comprehensive set of statistical capabilities so virtually everything a researcher would ever need is encompassed in a single application. Finally, SPSS Statistics provides a wealth of features for preparing and managing data so it is not necessary to master SQL or another database language to address data-related tasks. With over 20 years of experience in this field, you have a solid understanding of the subject and, equally, of SPSS Statistics. How do you use the tool in your work? How does it simplify your day to day tasks related to data analysis? I have used SPSS Statistics in my work with SPSS and IBM clients over the years. In addition, I use SPSS for my own research analysis. 
It allows me to make good use of my time whether I'm serving clients or doing my own analysis because of the breadth of capabilities available within this one program. The fact that SPSS produces presentation-ready output further simplifies things for me since I can collect key results as I work and put them into a draft report and share them as required. What are the prerequisites to use SPSS Statistics effectively? For someone who intends to use SPSS Statistics for their data analysis tasks, how steep is the curve when it comes to mastering the tool? It certainly helps to have an understanding of basic statistics when you begin to use SPSS Statistics but it can be a valuable tool even with a limited background in statistics. The learning curve is a very "gentle slope" when it comes to acquiring sufficient familiarity with SPSS Statistics to use it very effectively. Mastering the software does involve more time and effort but one can accomplish this over time as one builds on the initial knowledge that comes fairly easily. The good news is that one can obtain a lot of value from the software well before one truly masters it by discovering the many features. What are some of the common problems in data analysis? How does this book help the readers overcome them? Some of the most common pitfalls encountered when analyzing data involve handling missing/incomplete data, deciding which statistical method(s) to employ and understanding the results. In the book, we go into the details of detecting and addressing data issues including missing data. We also describe what each statistical technique provides and when it is most appropriate to use each of them. There are numerous examples of SPSS Statistics output and how the results can be used to assess whether a meaningful pattern exists. In the context of all the above, how does your book Data Analysis with IBM SPSS Statistics help readers in their statistical analysis journey? 
What, according to you, are the 3 key takeaways for the readers from this book? The approach we took with our book was to share with readers the most straightforward ways to use SPSS Statistics to quickly obtain the results needed to effectively conduct data analysis. We did this by showing the best way to proceed when it comes to analyzing data and then showing how this process can be done best in the software. The key takeaways from our book are the way to approach the discovery process when analyzing data, how to find hidden patterns present in the data and what to look for in the results provided by the statistical techniques covered in the book. IBM SPSS Statistics 25 was released recently. What are the major improvements or features introduced in this version? How do these features help the analysts and researchers? There are a lot of interesting new features introduced in SPSS Statistics 25. For starters, you can copy charts as Microsoft Graphic Objects, which allows you to manipulate charts in Microsoft Office. There are changes to the chart editor that make it easier to customize colors, borders, and grid line settings in charts. Most importantly, it allows the implementation of Bayesian statistical methods. Bayesian statistical methods enable the researcher to incorporate prior knowledge and assumptions about model parameters. This facility looks like a good teaching tool for Statistical Educators. Data visualization goes a long way in helping decision-makers get an accurate sense of their data. How does SPSS Statistics help them in this regard? Kenneth: Data visualization is very helpful when it comes to communicating findings to a broader audience and we spend time in the book describing when and how to create useful graphics to use for this purpose. Graphical examination of the data can also provide clues regarding data issues and hidden patterns that warrant deeper exploration. These topics are also covered in the book. 
Anthony: SPSS Statistics’ data visualization capabilities are excellent. The menu system makes it easy to generate common chart types. You can develop customized looks and save them as a template to be applied to future charts. Underlying SPSS Graphics is an influential approach called the Grammar of Graphics. The SPSS graphics capabilities are embodied in a versatile syntax called Graphics Programming Language. Do you foresee SPSS Statistics facing stiff competition from open source alternatives in the near future? What is the current sentiment in the SPSS community regarding these topics? Kenneth: Open source alternatives such as Python and R are often seen as potential competition for SPSS Statistics, but I would argue otherwise. These tools, while powerful, have a much steeper learning curve and will prove difficult for subject matter experts who periodically need to analyze data. SPSS is ideally suited for these periodic analysts whose main expertise lies in their field, which could be healthcare, law enforcement, education, human resources, marketing, etc. Anthony: The open source programs have a lot of capability but they are also fairly low-level languages, so you must learn to code. The learning curve is steep, and there are many maintainability issues. R has 2 major releases a year. You can have a situation where the data and commands remain the same, but the result changes when you update R. There are many dependencies among R packages. R has many contributors and is an avenue for getting your hands on new methods. However, there is a wide variance in the quality of the contributors and contributed packages. The occasional user of SPSS has an easier time jumping back in than does the occasional user of open source software. Most importantly, it is easier to employ SPSS in production settings. SPSS Statistics supports custom analytical solutions through integration with R and Python. Is this an intent from IBM to join hands with the open source community? 
This is a good follow-up question to the one asked before. Actually, the integration with R and Python allows SPSS Statistics to be extended to accommodate a situation in which an analyst wishes to try an algorithm or graphical technique not directly available in the software but which is supported in one of these languages. It also allows those familiar with R or Python to use SPSS Statistics as their platform and take advantage of all the built-in features it comes with, out of the box while still having the option to employ these other languages where they provide additional value. Lastly, this book is designed for analysts and researchers who want to get meaningful insights from their data as quickly as possible. How does this book help them in this regard? SPSS Statistics does make it possible to very quickly pull in data and get insightful results. This book is designed to streamline the steps involved in getting this done while also pointing out some of the less obvious "hidden gems" that we have discovered during the decades of using SPSS in virtually every possible situation.



Amey Varangaonkar

12 Dec 2017

11 min read

How Qlik Sense is driving self-service Business Intelligence


Delivering Business Intelligence solutions to over 40000 customers worldwide, there is no doubt that Qlik has established a strong foothold in the analytics market for many years now. With the self-service capabilities of Qlik Sense, you can take better and more informed decisions than ever before. From simple data exploration to complex dashboarding and cloud-ready, multi-platform analytics, Qlik Sense gives you the power to find crucial, hidden insights from the depths of your data. We got some fascinating insights from our interview with two leading Qlik community members, Ganapati Hegde and Kaushik Solanki, on what Qlik Sense offers to its users and what the future looks like for the BI landscape. [box type="shadow" align="" class="" width=""] Ganapati Hegde Ganapati is an engineer by background and carries an overall IT experience of over 16 years. He is currently working with Predoole Analytics, an award-winning Qlik partner in India, in the presales role. He has worked on BI projects in several industry verticals and works closely with customers, helping them with their BI strategies. His experience in other aspects of IT, like application design and development, cloud computing, networking, and IT Security - helps him design perfect BI solutions. He also conducts workshops on various technologies to increase user awareness and drive their adoption. Kaushik Solanki Kaushik has been a Qlik MVP (Most Valuable Player) for the years 2016 and 2017 and has been working with the Qlik technology for more than 7 years now. An Information technology engineer by profession, he also holds a master’s degree in finance. Having started his career as a Qlik developer, Kaushik currently works with Predoole Analytics as the Qlik Project Delivery Manager and is also a certified QlikView administrator. 
An active member of Qlik community, his great understanding of project delivery - right from business requirement to final implementation, has helped many businesses take valuable business decisions.[/box] In this exciting interview, Ganapati and Kaushik take us through a compelling journey in self-service analytics, by talking about the rich features and functionalities offered by Qlik Sense. They also talk about their recently published book ‘Implementing Qlik Sense’ and what the readers can learn from it. Key Takeaways With many self-service and guided analytics features, Qlik Sense is perfectly tailored to business users Qlik Sense allows you to build customized BI solutions with an easy interface, good mobility, collaboration, focus on high performance and very good enterprise governance Built-in capabilities for creating its own warehouse, a strong ETL layer and a visualization layer for creating intuitive Business Intelligence solutions are some of the strengths of Qlik Sense With support for open APIs, the BI solutions built using Qlik Sense can be customized and integrated with other applications without any hassle. Qlik Sense is not a rival to Open Source technologies such as R and Python. Qlik Sense can be integrated with R or Python to perform effective predictive analytics ‘Implementing Qlik Sense’ allows you to upgrade your skill-set from a Qlik developer to a Qlik Consultant. The end goal of the book is to empower the readers to implement successful Business Intelligence solutions using Qlik Sense. Complete Interview There has been a significant rise in the adoption of Self-service Business Intelligence across many industries. What role do you think visualization plays in self-service BI? In a vast ocean of self-service tools, where do you think Qlik stands out from the others? As Qlik says visualization alone is not the answer. A strong backend engine is needed which is capable of strong data integration and associations. 
This then enables businesses to perform self-service and get answers to all their questions. Self-service plays an important role in the choice of visualization tools, as business users today no longer want to go to IT every time they need changes. Self-service enables business users to quickly build their own visualizations with simple drag and drop. Qlik stands out from the rest in its capability to bring in multiple data sources, enabling users to easily answer questions. Its unique associative engine allows users to find hidden insights. The open API allows easy customization and integrations, which is a must for enterprises. Qlik's data security and governance are among the best. What are the key differences between QlikView and Qlik Sense? What are the factors crucial to building powerful Business Intelligence solutions with Qlik Sense? QlikView and Qlik Sense are similar yet different. Both share the same engine. On one hand, QlikView is a developer’s delight with the options it offers, and on the other hand, Qlik Sense with its self-service is more suited for business users. Qlik Sense has better mobility and open API as compared to QlikView, making Qlik Sense more customizable and extensible. The beauty of Qlik Sense lies in its ability to help businesses get answers to their questions. It helps correlate the data between different data sources and makes it very meaningful to users. Powerful data visualizations do not necessarily mean beautiful visualizations and Qlik Sense lays special emphasis on this. Finally, what the users need is performance, an easy interface, good mobility, collaboration and good enterprise governance - something which Qlik Sense provides. Ganapati, you have over 15 years of experience in IT, and have extensively worked in the BI domain for many years. Please tell us something about your journey. What does your daily schedule look like? 
I have been fortunate in my career to be able to work on multiple technologies ranging from programming, databases, information security, integrations and cloud solutions. All this knowledge is helping me propose the best solutions for my Qlik customers. It’s a pleasure helping customers in their analytical journey and working for a services company helps in meeting customers from multiple domains. The daily schedule involves doing Proof of Concepts/Demos for customers, designing optimum solutions on Qlik, and conducting requirement gathering workshops. It’s a pleasure facing new challenges every day and this helps me increase my knowledge base. Qlik's open API opens up amazing new possibilities and lets me come up with out of the box solutions. Kaushik, you have been awarded the Qlik MVP for 2016 and 2017, and have experience of using Qlik's tools for over 7 years. Please tell us something about your journey in this field. How do you use the tool in your day to day work? I started my career by working with the Qlik technology. My hunger for learning Qlik made me addicted to the Qlik community. I learned a great many things from the community by asking questions and solving real-world problems of community members. This helped me to be named a Qlik MVP for 2 consecutive years. The MVP award motivated me to help Qlik customers and users and that is one of the reasons why I thought about writing a book on Qlik Sense. I have implemented Qlik not only for clients but also for my personal use cases. There are many ways in which Qlik helps me in my day-to-day work and makes my life much easier. It’s safe to say that I absolutely love Qlik. Your book 'Implementing Qlik Sense' is primarily divided into 4 sections - with each section catering to a specific need when it comes to building a solid BI solution. Could you please talk more about how you have structured the book, and why? BI projects are challenging, and it really hurts when a project doesn’t succeed. 
The purpose of the book is to enable Qlik Sense developers to implement successful Qlik projects. There is often a lot of focus on development, and as a result Qlik developers miss several other crucial factors which contribute to project success. To support the journey from a Qlik developer to a Qlik consultant, the book is divided into 4 sections. The first section focuses on the initial preparation and is intended to help consultants get their groundwork done. The second section focuses on the execution of the project and is intended to help consultants play a key role in the remaining phases: requirement gathering, architecture, design, development and UAT. The third section is intended to make consultants familiar with some industry domains. This section is intended to help consultants engage better with business users and suggest value-additions to the project. The last section applies the knowledge gained in the first three sections to a project, using a case study of the kind we come across routinely. Who is the primary target audience for this book? Are there any prerequisites they need to know before they start reading this book? The primary target audience is Qlik developers who are looking to progress in their career and wear the hat of a Qlik consultant. The book is also for existing consultants who would like to sharpen their skills and use Qlik Sense more efficiently. The book will help them become trusted advisors to their clients. Those who are already familiar with some Qlik development will be able to get the most out of this book. Qlik Sense is primarily an enterprise tool. With the rise of open source languages such as R and Python, why do you think people would still prefer enterprise tools for their data visualization? Qlik Sense is not in competition with R and Python; rather, there are lots of synergies. The customer gets the best value when Qlik co-exists with R/Python and can leverage the capabilities of both Qlik and R/Python. 
Qlik Sense does not have the predictive capability which is easily fulfilled by R/Python. For the customer, the tight integration ensures he/she doesn’t have to leave the Qlik screen. There can be other use cases for using them jointly such as analyzing unstructured data and using machine learning. The reports and visualizations built using Qlik Sense can be viewed and ported across multiple platforms. Can you please share your views on this? How does it help the users? Qlik has opened all gates to integrate its reporting and visualization with most of the technologies through APIs. This has empowered customers to integrate Qlik with their existing portals and provide easy access to end users. Qlik provides APIs for almost all its products, which makes Qlik the first choice for many CIOs because with those APIs they get a variety of options to integrate and automate their work. What are the other key functionalities of Qlik Sense that help the users build better BI solutions? Qlik Sense is not just a pure play data visualization tool. It has capabilities for creating its own warehouse, having an ETL layer and then of course there’s the visualization layer. For the customers, it’s all about getting all the relevant components required for their BI project in a single solution. Qlik is investing heavily in R&D and with its recent acquisitions and a strong portfolio, it is a complete solution enabling users to get all their use cases fulfilled. The open API has enabled opening newer avenues with custom visualizations, amazing concepts such as chatbots, augmented intelligence and much more. The core strength of strong data association, enterprise scalability, governance combined with all other aspects make Qlik one of the best in overall customer satisfaction. Do you foresee Qlik Sense competing strongly with major players such as Tableau and Power BI in the near future? Also, how do you think Qlik plans to tackle the rising popularity of the Open Source alternatives? 
Qlik has been classified as a Leader in Gartner’s Magic Quadrant for several years now. We often come across Tableau and Microsoft Power BI as competition. We suggest our customers do a thorough evaluation and more often than not they choose Qlik for its features and the simplicity it offers. With recent acquisitions, Qlik Sense has now become an end-to-end solution for BI, covering use cases ranging from report distribution to data-as-a-service and geoanalytics. Open source alternatives have their own market and it makes more sense to leverage their capability rather than compete with them. An example, of course, is the strong integration of many BI tools with R or Python which makes life so much easier when it comes to finding useful insights from data. Lastly, what are the 3 key takeaways from your book 'Implementing Qlik Sense'? How will this book help the readers? The book is all about meeting your client’s expectations. The key takeaways are: understanding the role and importance of a Qlik consultant and why it’s crucial to be a trusted advisor to your clients; successfully navigating all the aspects that enable successful implementation of your Qlik BI project; and focusing on mitigating risks, driving adoption and avoiding common mistakes while using Qlik Sense. The book is ideal for Qlik developers who aspire to become Qlik consultants. The book uses simple language and gives examples to make the learning journey as simple as possible. It helps consultants give equal importance to certain phases of project development that are often neglected. Ultimately, the book will enable Qlik consultants to deliver quality Qlik projects. If this interview has nudged you to explore Qlik Sense, make sure you check out our book Implementing Qlik Sense right away!



Aaron Lazar

21 Nov 2017

8 min read

Why the Industrial Internet of Things (IIoT) needs Architects


The Industrial Internet, the IIoT, the 4th Industrial Revolution or Industry 4.0, whatever you may call it, has gained a lot of traction in recent times. Many leading companies are driving this revolution, connecting smart edge devices to cloud-based analysis platforms and solving their business challenges in new and smarter ways. To ensure the smooth integration of such machines and devices, effective architectural strategies based on accepted principles, best practices, and lessons learned must be applied. In this interview, Shyam throws light on his new book, Architecting the Industrial Internet, and shares expert insights into the world of IIoT, Big Data, Artificial Intelligence and more. Shyam Nath Shyam is the director of technology integrations for Industrial IoT at GE Digital. His area of focus is building go-to-market solutions. His technical expertise lies in big data and analytics architecture and solutions with a focus on IoT. He joined GE in Sep 2013, prior to which he worked at IBM, Deloitte, Oracle, and Halliburton. He is the Founder/President of the BIWA Group, a global community of professionals in Big Data, analytics, and IoT. He has often been listed as one of the top social media influencers for Industrial IoT. You can follow him on Twitter @ShyamVaran. He talks about the IIoT, the various impacts that technologies like AI and Deep Learning will have on IIoT, and where IIoT is headed. He talks about the challenges that Architects face while architecting IIoT solutions and how his book will help them overcome such issues. Key Takeaways The fourth Industrial Revolution will break silos and bring IT and Ops teams together to function more smoothly. Choosing the right technology to work with involves taking risks and experimenting with custom solutions. 
The Predix platform and Predix.io allow developers and architects to quickly learn from others and build working prototypes that can be used to get quick feedback from the business users. Interoperability issues and a lack of understanding of all the security ramifications of the hyper-connected world could be a few challenges that adoption of IIoT must overcome. Supporting technologies like AI, Deep Learning, AR and VR will have major impacts on the Industrial Internet. In-depth Interview On the promise of a future with the Industrial Internet The 4th Industrial Revolution is evolving at a terrific pace. Can you highlight some of the most notable aspects of Industry 4.0? The Industrial Internet is the 4th Industrial Revolution. It will have a profound impact on both industrial productivity and the future of work. Due to more reliable power, cleaner water, and Intelligent Cities, the standard of living will improve for the world's citizens at large. The Industrial Internet will forge new collaborations between IT and OT in organizations, and each side will develop a better appreciation of the problems and technologies of the other. They will work together to create smoother overall operations by breaking the silos. On Shyam’s IIoT toolbox that he uses on a day to day basis You have a solid track record of architecting IIoT applications in the Big Data space over the years. What tools do you use on a day-to-day basis? In order to build Industrial Internet applications, GE's Predix is my preferred IIoT platform. It is built for Digital Industrial solutions, with security and compliance baked into it. Customer IIoT solutions can be quickly built on Predix and extended with the services in the marketplace from the ecosystem. For Asset Health Monitoring and for reducing the downtime, Asset Performance Management (APM) can be used to get a jump start and its extensibility framework can be used to extend it. 
On how to begin one’s journey into building the Industry 4.0 For an IIoT architect, what would your recommended learning plan be? What aspects of architecting Industry 4.0 applications are tricky to master and how does your book, Architecting the Industrial Internet, prepare its readers to be industry ready? An IIoT Architect can start with the book Architecting the Industrial Internet to get a good grasp of the area broadly. This book provides a diverse set of perspectives and architectural principles, from authors who work in GE Digital, Oracle and Microsoft. Because end-to-end IIoT applications involve an understanding of sensors, machines, control systems, connectivity and cloud or server systems, along with an understanding of the associated enterprise data, the architect needs to focus on a limited solution or proof of concept first. The book provides coverage for the end-to-end requirements of the IIoT solutions for the architects, developers and business managers. The extensive set of use cases and case studies provides examples from many different industry domains to allow the readers to easily relate to them. The book is written in a style that would not overwhelm the reader, yet explains the workings of the architecture and the solutions. The book will be best suited for Enterprise Architects and Data Architects who are trying to understand how IIoT solutions differ from traditional IT solutions. The layer-by-layer description of the IIoT Architecture will provide a systematic approach to help Architects develop a deep understanding. IoT Developers who have some understanding of this area can learn the IIoT platform-based approach to building solutions quickly. On how to choose the best technology solution to optimize ROI There are so many IIoT technologies that manufacturers are confused as to how to choose the best technology to obtain the best ROI. What would your advice to manufacturers be, in this regard? 
Manufacturers and operations leaders look for quick solutions to known issues, delivered in a proven way. Hence, they often do not have the appetite to experiment with a custom solution; rather, they like to know where the solution provider has solved similar problems and what the outcome was. The collection of use cases and case studies will help business leaders get an idea of the potential ROI while evaluating the solution.

Getting to know Predix, GE’s IIoT platform, better

Let's talk a bit about Predix, GE's IIoT platform. What advantages does Predix offer developers and architects? Do you foresee any major improvements coming to Predix in the near future?

GE's Predix platform has a growing developer community that is approaching 40,000 strong. Likewise, the ecosystem of Partners is approaching 1000. Coupled with the free access to create developer accounts on Predix.io, this means developers and architects can quickly learn from others and build working prototypes that can be used to get quick feedback from business users. The catalog of microservices at Predix.io will continue to expand. Likewise, applications written on top of Predix, such as APM and OPM (Operations Performance Management), will continue to become feature-rich, providing coverage for many common Digital Industrial challenges.

On the impact of other emerging technologies like AI on IIoT

What, according to you, will be the impact of AI and Deep Learning on IIoT?

AI and Deep Learning help to build robust Digital Twins of industrial assets. These Digital Twins will make the job of predictive maintenance and optimization much easier for the operators of these assets. Further, IIoT will benefit from many new advances in technologies like AI and AR/VR, which will make the job of Field Services Technicians easier. IIoT is already widely used in energy generation and distribution, and in Intelligent Cities for law enforcement and to ease traffic congestion.
The field of healthcare is evolving due to the increasing use of wearables. Finally, precision agriculture is enabled by IoT as well.

On likely barriers to IIoT adoption

What are the roadblocks you expect in the adoption of IIoT?

Today, the challenges to rapid adoption of IoT are interoperability issues and a lack of understanding of all the security ramifications of the hyper-connected world. Finally, how to explain the business case of the IoT to the decision makers and different stakeholders is still evolving.

On why Architecting the Industrial Internet is a must read for Architects

Would you like to give architects 3 reasons why they should pick up your book?

It is written by IIoT practitioners from large companies who are building solutions for both internal and external consumption. The book captures the architectural best practices and advocates a platform-based approach to solutions. The theory is put into practice in the form of use cases and case studies, to provide a comprehensive guide for architects.

If you enjoyed this interview, do check out Shyam’s latest book, Architecting the Industrial Internet.

Amey Varangaonkar

14 Nov 2017

11 min read

Expert Insights: How sports analytics is empowering better decision-making

Analytics is slowly changing the face of the sports industry as we know it. Data-driven insights are being used to improve team and individual performance, and to get that all-important edge over the competition. But what exactly is sports analytics? And how is it being used? What better way to get answers to these questions than asking an expert himself!

Gaurav Sundararaman is a Senior Stats Analyst at ESPN, currently based in Bangalore, India. With over 10 years of experience in the field of analytics, Gaurav worked as a Research Analyst and a consultant in the initial phase of his career. He then ventured into sports analytics in 2012, and played a major role in the Analytics division of SportsMechanics India Pvt. Ltd., where he was the Analytics Consultant for the T20 World Cup winning West Indies team in 2016.

In this interview, Gaurav takes us through the current landscape of sports analytics, and talks about how analytics is empowering better decision-making in sports.

Key Takeaways

Sports analytics pertains to finding actionable, useful insights from sports data which teams can use to gain competitive advantage over the opposition.

Instincts backed by data make on and off-field decisions more powerful and accurate.

The rise of IoT and wearable technology has boosted sports analytics. With more data available for analysis, insights can be unique and very helpful.

Analytics is being used in sports for everything from improving player performance to optimizing ticket prices and understanding fan sentiments.

Knowledge of tools for data collection, analysis and visualization such as R, Python and Tableau is essential for a sports analyst.

A thorough understanding of the sport, an up-to-date skillset and strong communication with players and management are equally important factors in performing efficient analytics.

Adoption of analytics within sports has been slow, but steady.
More and more teams are now realizing the benefits of sports analytics and are adopting an analytics-based strategy.

Complete Interview

Analytics is finding widespread applications in almost every industry today - how has the sports industry changed over the years? What role is analytics playing in this transformation?

The sports industry has been relatively late in adopting analytics. That said, the use of analytics in sports has also varied geographically. In the West, analytics plays a big role in helping teams, as well as individual athletes, make decisions. Better infrastructure and quick adoption of the latest trends in technology are important factors here. Also, investment in sports starts from a very young age in the West, which makes a huge difference. In contrast, many countries in Asia are still lagging behind when it comes to adopting analytics, and still rely on traditional techniques to solve problems. A combination of analytics with traditional knowledge from experience would go a long way in helping teams, players and businesses succeed. Previously, the sports industry was a very closed community. Now, with the advent of analytics, the industry has managed to expand its horizons. We see more non-sportsmen playing a major part in decision making. They understand the dynamics of the sports business and how to use data-driven insights to influence it.

Many major teams across different sports such as Football (Soccer), Cricket, American Football, Basketball and more have realized the value of data and analytics. How are they using it? What advantages does analytics offer to them?

One thing I firmly believe is that analytics can’t replace skills or guarantee wins. What it can do is ensure there is logic behind certain plans and decisions. Instincts backed by data make the decisions more powerful. I always tell coaches and players: go with your gut and instincts as Plan A.
If it does not work out, your fallback could be Plan B, based on trends and patterns derived from data. It turns out to be a win-win for both. Analytics offers a neutral perspective which players or coaches may sometimes not realize. Each sport has a unique way of applying analytics to make decisions and, obviously, as analysts we need to understand the context and map the relevant data. As far as using analytics is concerned, the goals are pretty straightforward - be the best, beat the opponents and aim for sustained success. Analytics helps you achieve each of these objectives.

The rise of IoT and wearable technology over the last few years has been incredible. How has it affected sports, and sports analytics in particular?

It is great to see that many companies are investing in such technologies. It is important to identify where wearables and IoT can be used in sport and where they can have maximum impact. These devices allow in-game monitoring of players, their performance, and their current physical state. Also, I believe that, even more than on the field, these technologies would be very useful in engaging fans. Data derived from these devices could be used in broadcasting as well as in providing a good experience for fans in the stadiums. This will encourage more and more people to watch games in stadiums and not in the comfort of their homes. We have already seen a beginning, with a few stadiums around the world leveraging technology (IoT). The Mercedes-Benz Stadium (home of the Atlanta Falcons) is a high-tech stadium powered by IBM. Sacramento is building a state-of-the-art facility for the Sacramento Kings. This is just the start, and it will only get better with time.

How does one become a sports analyst? Are there any particular courses/certifications that one needs to complete in order to become one? Can you share with us your journey in sports analytics?

To be honest, there are no professional courses yet in India to become an Analyst.
There are a couple of colleges which have just started offering Sports Analytics as a course in their Post-Graduation Program. However, there are a few companies (Sports Mechanics and Kadamba Technologies in Chennai) that offer jobs that can enable you to become a Sports Analyst if you are really good. If you are a freelancer, my advice would be to brand yourself well, showcase your knowledge through social media platforms and get a breakthrough via contacts. After my MBA, Sports Mechanics, a Chennai-based company and a leader in this space, was looking for someone to start their data practice. I was just lucky to be in the right place at the right time. I worked there for 4 years and was able to learn a lot about the industry and what works and what does not. As it was a small company, I was lucky to don multiple hats and work on different projects across the value chain. I then moved on and joined the lovely team at ESPNCricinfo, where I work for their stats team.

What are the tools and frameworks that you use for your day-to-day tasks? How do they make your work easier?

There are no specific tools or frameworks. It depends on the enterprise you are working for. Usually, they are proprietary tools of the company. Most of these tools are used to collect, mine or visualize data. Interpreting the information and presenting it in a manner that users understand is important, and that is where certain applications or frameworks are used. However, to be ready for the future, it would be good to be skilled in tools that support data collection, analysis and visualization, such as R, Python and Tableau, to name a few.

Do sports analysts have to interact with players and the coaching staff directly? How do you communicate your insights and findings with the relevant stakeholders?

Yes, they have to interact with players and management directly. If not, the impact will be minimal. Communicating insights is very important in this industry.
Too much analysis could lead to paralysis. We need to identify what exactly each player or coach is looking for, based on their game, and try to provide them the information in a crisp manner that helps them make decisions on and off the field. For each stakeholder, the amount of information provided is different. For the coach and management, the insights can be detailed, while for the players we need to keep it short and to the point.

The insights you generate must go beyond enhancing the performance of a team on the field. Could you give us some examples?

Insights can vary. For the management, they could deal with how to maximise revenue or save some money in an auction. For coaches, they could reveal their team’s as well as the opposition’s strengths and weaknesses from a different perspective. For captains, data could help in identifying some key strategies on the field. For example, in Cricket, it could help the captain determine which bowler to bring on against which opposition batsman, or where to place the fielders. Off the field, one area where analytics could play a big role would be in grassroots development and tracking of an athlete from an early age to ensure they are prepared for the biggest stage. Monitoring performance, improving physical attributes by following a specific regimen, assessing injury records and designing specific training programs are some ways in which this could be done.

What are some of the other challenges that you face in your day-to-day work?

Growth in this industry can be slow sometimes. You need to be very patient, work hard and ensure you follow the sport very closely. There are not many analytical obstacles as such, but understanding the requirements and what exactly the data needs are can be quite a challenge.
Despite all the buzz, there are quite a few sports teams and organizations who are still reluctant to adopt an analytics-based strategy. Why do you think that is the case? What needs to change?

The reason for the slow adoption could be the lack of successful case studies and of awareness. In most sports, where so many decisions are taken on the field, the players' ability and skill can sometimes seem far superior to anything else. As more instances of successful execution of data-based trends come up, we are likely to see more teams adopting a data-based strategy. Like I mentioned earlier, analytics needs to be used to help the coach and captain take the most logical and informed decisions. Decision-makers need to be aware of the way it is used and how much impact it can have. This awareness is vital to increasing the adoption of analytics in sports.

Where do you see sports analytics in the next 5-10 years?

Today in sports many decisions are taken on gut feeling, and I believe there should be a balance. That is where analytics can help. In sports like Cricket, only around 30% of the data is used and there is more emphasis given to video. Meanwhile, if we look at Soccer or Basketball, the usage of data and video analytics is close to 60-70% of its potential. Through awareness and trying out new plans based on data, we can increase the usage of analytics in cricket to 60-70% in the next few years. Despite the current shortcomings, it is fair to say that there is a progressive and positive change at the grassroots level across the world. Data-based coaching and access to technology are slowly being made available to teams as well as budding sportsmen/women. Another positive is that investment in the sports industry is growing steadily. I am confident that in a couple of years, we will see more job opportunities in sports. Maybe in five years, the entire ecosystem would be more structured and professional.
We would witness analytics playing a much bigger role in helping stakeholders make informed decisions, as data-based insights become even more crucial.

Lastly, what advice do you have for aspiring sports analysts?

My only advice would be: be passionate, build a strong network of people around you, and constantly be on the lookout for opportunities. Also, it is important to keep updating your skill-set in terms of the tools and techniques needed to perform efficient and faster analytics. Newer and better tools keep coming up very quickly, and they make your work easier and faster. Be on the lookout for such tools! One also needs to identify their own niche based on their strengths and try to build on that. The industry is on the cusp of growth and, as budding analysts, we need to be prepared to take off when the industry matures. Build your brand and talk to more people in the industry - figure out what you want to do to keep yourself in the best position to grow with the industry.
