
Author Posts - Data

37 Articles

Why learn IBM SPSS Modeler in 2017

Amey Varangaonkar
03 Nov 2017
9 min read
IBM's SPSS Modeler provides a powerful, versatile workbench that allows you to build efficient and accurate predictive models in no time. What else separates IBM SPSS Modeler from other enterprise analytics tools out there today? To find out, we talked to two of the most popular members of the SPSS community.

Keith McCormick is a career-long practitioner of predictive analytics and data science, and has been engaged in statistical modeling, data mining, and mentoring others in this area for more than 20 years. He is also a consultant, an established author, and a speaker. Although his consulting work is not restricted to any one tool, his writing and speaking have made him particularly well known in the IBM SPSS Statistics and IBM SPSS Modeler communities.

Jesus Salcedo is an independent statistical consultant and has been using SPSS products for over 20 years. With a Ph.D. in Psychometrics from Fordham University, he is a former SPSS Curriculum Team Lead and Senior Education Specialist, and has developed numerous SPSS learning courses and trained thousands of users.

In this interview with Packt, Keith and Jesus give us more insights on Modeler as a tool, the different functionalities it offers, and how to get the most out of it for all your data mining and analytics needs.

Key Interview Takeaways

- IBM SPSS Modeler is easy to get started with but can be a tricky tool to master.
- Knowing your business, your dataset, and the algorithms you are going to apply are key factors to consider before building your analytics solution with SPSS Modeler.
- SPSS Modeler's scripting language is Python, and the tool also supports running R code.
- IBM SPSS Modeler Essentials helps you effectively learn data mining and analytics, with a focus on working with data rather than on coding.

Full Interview

Predictive analytics has garnered a lot of attention of late, and adopting an analytics-based strategy has become the norm for many businesses. Why do you think this is the case?

Jesus: I think this is happening because everyone wants to make better-informed decisions. Additionally, predictive analytics brings the added benefit of discovering new relationships that you were previously not aware of.

Keith: That's true, but it's even more exciting when the models are deployed and are potentially driving automated decisions.

With over 40 years of combined experience in this field, you are master consultants and trainers, with an unrivaled expertise when it comes to using the IBM SPSS products. Please share with us the story of your journey in this field. Our readers would also love to know what your day-to-day schedule looks like.

Jesus: When I was in college, I had no idea what I wanted to be. I took courses in many areas, but I avoided statistics because I thought it would be a waste of time; after all, what else is there to learn other than calculating a mean and plugging it into fancy formulas? (As a kid I loved baseball, so I was very familiar with how to calculate various baseball statistics.) Anyway, I took my first statistics course (where I learned SPSS) since it was a requirement, and I loved it. Soon after, I became a teaching assistant for more advanced statistics courses, and I eventually earned my Ph.D. in Psychometrics, all the while doing statistical consulting on the side. After graduate school, my first job was as an education consultant for SPSS (where I met Keith). I worked at SPSS (and later IBM) for seven years, at first focusing on training customers on statistics and data mining, and later on developing course materials for our trainings. In 2013 Keith invited me to join him as an IBM partner, so we both trained customers and developed a lot of new and exciting material in both book and video formats. Currently, I work as an independent statistical and data-mining consultant, and my daily projects range from analyzing data for customers, to training customers so they can analyze their own data, to creating books and videos on statistics and data mining.

Keith: Our careers have lots of similarities, and my current day-to-day is similar too. Lately, about a third of my year is lecturing and curriculum development for organizations like TDWI (Transforming Data with Intelligence), The Modeling Agency, and UC Irvine Extension. The majority of my work is in predictive analytics consulting. I especially enjoy projects where I'm brought in early and can help with strategy and planning. Then I coach and mentor the team until they are self-sufficient. Sometimes building the team is even more exciting than the first project, because I know that they will be able to do many more projects in the future.

There is a plethora of predictive analytics tools used today, for both desktops and enterprises. IBM SPSS Modeler is one such tool. What advantages does SPSS Modeler have over the others, in your opinion?

Keith: One of our good friends, who co-authored the IBM SPSS Modeler Cookbook, made an interesting comment about this at a conference. He is unique in that he has done one-day seminars using several different software tools. As you know, it is difficult to present data mining in just one day. He said that only with Modeler is he able to spend some time on each of the CRISP-DM phases of a case study in a day. I think he feels this way because it's among the easiest options to use. We agree. While powerful, and while it takes a whole career to master everything, it is easy to get started with.

Are there any prerequisites for using SPSS Modeler? How steep is the learning curve in order to start using the tool effectively?

Keith: Well, the first thing I want to mention is that there are no prerequisites for our Packt video IBM SPSS Modeler Essentials. In that, we assume that you are starting from scratch. For the tool in general, there aren't any specific prerequisites as such; however, knowing your data and what insights you are looking for always helps.

Jesus: Once you are back at the office, in order to be successful on a data mining project or to utilize the tool efficiently, you'll need to know your business, your data, and the modeling algorithm you are using.

Keith: The other question that we get all the time is how much statistics and machine learning you have to know. Our advice is to start with one or maybe two algorithms and learn them well. Try to stick to algorithms that you know. In our Packt course, we mostly focus on just Decision Trees, which are among the easiest to learn.

What do you think are the 3 key takeaways from your course, IBM SPSS Modeler Essentials?

The 3 key takeaways from this course, we feel, are:

- Start slow. Don't pressure yourself to learn everything all at once. There are dozens of "nodes" in Modeler. We introduce the most important ones, so start there.
- Be brilliant in the basics. Get comfortable with the software environment. We recommend the best ways to organize your work.
- Don't rush to modeling. Remember the Cross Industry Standard Process for Data Mining (CRISP-DM), which we cover in the video. Use it to make sure that you proceed systematically and don't skip critical steps.

IBM recently announced that SPSS Modeler would be available freely for educational usage. How can one make the most of this opportunity?

Jesus: A large portion of the work that we have done over the past few years has been to train people on how to analyze data. Professors are in a unique position to expose more students to data mining, since we teach only those whose work already requires this type of training, whereas professors can reach a much larger group of people. IBM offers several programs that support professors, students, and faculty; for more information visit: https://www-01.ibm.com/software/analytics/spss/academic/

Keith: When seeking out a university class, whether classroom or online, ask whether they use Modeler or allow you to complete your homework assignments in Modeler. We recognize that R-based classes are very popular now, but you potentially won't learn as much about data mining. Sometimes too much of the class is spent on coding, so you learn R but learn less about analytics. You want to spend most of the class time actively working with data and producing results.

With the rise of open source languages such as R and Python and their applications in predictive analytics, how do you foresee enterprise tools like SPSS Modeler competing with them?

Keith: Perhaps surprisingly, we don't think Modeler competes with R or Python. A lot of folks don't know that Python is Modeler's scripting language. Now, that is an advanced feature, and we don't cover it in the Essentials video, but learning Python actually increases your knowledge of Modeler. And Modeler supports running R code right in a Modeler stream by using the R nodes. So Modeler power users (or future power users) should keep learning R on their to-do list. If you prefer not to use code, you can produce powerful results without learning either, by just using Modeler straight out of the box. So, it really is all up to you.

If this interview has sparked your interest in learning more about IBM SPSS Modeler, make sure you check out our video course IBM SPSS Modeler Essentials right away!
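To make the scripting point concrete, here is a minimal sketch of what Python (Jython) scripting inside Modeler looks like. This is not from the interview: the node type names, property names, and file path are assumptions recalled from the Modeler scripting documentation, so verify them against your Modeler version before relying on them.

```python
# Runs inside IBM SPSS Modeler's script editor (Jython), not as a
# standalone Python program. Node and property names below are
# assumptions; check them in the Modeler scripting and automation guide.
import modeler.api

stream = modeler.script.stream()  # the currently open stream

# Create a delimited-file source node pointing at a hypothetical CSV.
source = stream.createAt("variablefile", "Customers", 96, 96)
source.setPropertyValue("full_filename", "C:/data/customers.csv")

# Add a Type node so Modeler can assign field roles and measurement levels.
typenode = stream.createAt("type", "Types", 192, 96)

# Add a C5.0 decision tree node: one simple algorithm, learned well,
# as Keith and Jesus recommend.
tree = stream.createAt("c50", "Churn tree", 288, 96)

# Wire the nodes together: source -> type -> tree.
stream.link(source, typenode)
stream.link(typenode, tree)
```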


Unlocking the secrets of Microsoft Power BI

Amey Varangaonkar
10 Oct 2017
12 min read
Self-service Business Intelligence is the buzzword everyone's talking about today. It gives modern business users the ability to find unique insights from their data without any hassle. Amidst a myriad of BI tools and platforms out there in the market, Microsoft's Power BI has emerged as a powerful, all-encompassing BI solution, empowering users to tailor and manage Business Intelligence to suit their unique needs and scenarios.

Brett Powell is a Microsoft Power BI partner, and the founder and owner of Frontline Analytics LLC, a BI and analytics research and consulting firm. Brett has contributed to the design and development of Microsoft BI stack and Power BI solutions of diverse scale and complexity across the retail, manufacturing, financial, and services industries. He regularly blogs about the latest happenings in Microsoft BI and Power BI features at Insight Quest. He is also an organizer of the Boston BI User Group.

In this two-part interview Brett talks about his new book, Microsoft Power BI Cookbook, and shares his insights and expertise in the area of BI and data analytics, with a particular focus on Power BI. In part one of the interview, Brett shared his views on topics ranging from what it takes to be successful in the field of BI and data analytics to why he thinks Microsoft is going to lead the way in shaping the future of the BI landscape. Today in part two, he shares his expertise with us on the unique features that differentiate Power BI from other tools and platforms in the BI space.

Key Takeaways

- Ease of deployment across multiple platforms, efficient data-driven insights, ease of use, and support for a data-driven corporate culture are what define an ideal Business Intelligence solution for enterprises.
- Power BI leads in self-service BI because it's the first Software as a Service (SaaS) platform to offer 'End User BI', in which anyone, not just a business analyst, can leverage powerful tools to obtain greater value from data.
- Microsoft Power BI has been identified as a leader in Gartner's Magic Quadrant for BI and Analytics platforms, and provides the visually rich and easy-to-access interface that modern business users require.
- You can isolate report authoring from dataset development in Power BI, or quickly scale a Power BI dataset up or down as per your needs.
- Power BI is much more than just a tool for reports and dashboards. With a thorough understanding of the query and analytical engines of Power BI, users can build more powerful and sustainable BI solutions.

Part Two: Interview Excerpts - Power BI from a Worm's Eye View

How long have you been a Microsoft Power BI user? How have you been using Power BI on a day-to-day basis? What other tools do you generally end up using alongside Power BI for your work?

I've been using Power BI from the beginning, when it was merely an add-in for Excel 2010. Back then, there was no cloud service and Microsoft BI was significantly tethered to SharePoint, but the fundamentals of the Tabular data modelling engine and the DAX programming language were available in Excel to build personal and team solutions. On a day-to-day basis I regularly work with Power BI datasets, that is, the analytical data models inside of Power BI Desktop files. I also work with Power BI report authoring and visualization features, and with various data sources for Power BI such as SQL Server.

From Learning to Mastering Power BI

For someone just starting out using Power BI, what would your recommended learning plan be? For existing users, what does the road to mastering Microsoft Power BI look like?

When you're just starting out, I'd recommend learning the essentials of the Power BI architecture and how the components (Power BI service, Power BI Desktop, On-Premises Data Gateway, Power BI Mobile, etc.) work together. A sound knowledge of the differences between datasets, reports, and dashboards is essential, and an understanding of app workspaces and apps is strongly recommended, as this is the future of Power BI content management and distribution. In terms of a learning path, you should consider what your role will be on Power BI projects: will you be administering Power BI, creating reports and dashboards, or building and managing datasets? Each of these roles has its own skills, technologies, and processes to learn. For example, if you're going to be designing datasets, a solid understanding of the DAX language and filter context is essential, and knowledge of M queries and data access is very important as well.

The road to mastering Power BI, in my view, involves a deep understanding of both the M and DAX languages, in addition to knowledge of Power BI's content management, delivery, and administration processes and features. You need to be able to contribute to the full lifecycle of Power BI projects and help guide the adoption of Power BI across an organization. The most difficult or 'tricky' aspect of Power BI is thinking of M and DAX functions and patterns in the context of DirectQuery and Import mode datasets. For example, certain code or design patterns which are perfectly appropriate for Import models are not suitable for DirectQuery models. A deep understanding of the trade-offs and use cases for DirectQuery versus the default Import (in-memory) mode, and the ability to design datasets accordingly, is a top characteristic of a Power BI master.

5+ interesting things (you probably didn't know) about Power BI

What are some things that users may not have known about Power BI or what it could do? Can readers look forward to learning to do some of them from your upcoming book, Microsoft Power BI Cookbook?

The great majority of learning tutorials and documentation on Power BI involves the graphical interfaces that help you get started with Power BI. Likewise, when most people think of Power BI they almost exclusively think of data visualizations in reports and dashboards; they don't think of the data layer. While these features are great and professional Power BI developers can take advantage of them, the more powerful and sustainable Power BI solutions require some level of customization, and can only be delivered via knowledge of the query and analytical engines of Power BI. Readers of the Power BI Cookbook can look forward to a broad mix of content, from relatively simple-to-implement tips on usability, such as providing an intuitive Fields list for users, to more complex yet powerful examples of data transformations, embedded analytics, and dynamic filter behaviours, such as with row-level security models. Each chapter contains granular details on core Power BI features but also highlights synergies available by integrating features within a solution, such as taking advantage of an M query expression, a SQL statement, or a DAX metric in the context of a report or dashboard.

What are the 3 most striking features that make you love to work with Power BI? What are 3 aspects you would like improved?

The most striking feature for me is the ability to isolate report authoring from dataset development. With Power BI you can easily implement a change to a dataset, such as a new metric, and many report authors can then leverage that change in their visualizations and dashboards, as their reports are connected to the published version of the dataset in the Power BI service. A second striking feature is the 'Query Folding' of the M query engine: I can write or enhance an M query such that a SQL statement is generated to take advantage of the data source system's query processing resources. A third striking feature is the ability to quickly scale a Power BI dataset up or down via the dedicated hardware available with Power BI Premium. With Power BI Premium, free users (users without a Pro license) are now able to access Power BI reports and dashboards.

The three aspects I'd like to see improved are the following:

- Currently we don't have IntelliSense and other common development features when writing M queries.
- Currently we don't have display folders for Power BI datasets, so we have to work around this with larger, more complex datasets to maintain a simple user interface.
- Currently we don't have Perspectives, a feature of SSAS, which would allow us to define a view of a Power BI dataset such that users don't see the parts of a data model that are not relevant to their needs.

Is the latest Microsoft Power BI update a significant improvement over the previous version? Any specific new features you'd like to highlight?

Absolutely. The September update included a Drillthrough feature that, if configured correctly, enables users to quickly access the crucial details associated with values on their reports, such as an individual vendor or a product. Additionally, there was a significant update to Report Themes, which provides organizations with more control to define standard, consistent report formatting. Drillthrough is so important that an example of this feature was added to the Power BI Cookbook. Additionally, Power BI usage reporting, including the identity of the individual user accessing Power BI content, was recently released, and this too was included in the Power BI Cookbook. Finally, I believe the new Ribbon Chart will be used extensively as a superior alternative to stacked column charts.

Can you tell us a little about the new Timeline Storyteller custom visual in Power BI?

The Timeline Storyteller custom visual was developed by the Storytelling with Data group within Microsoft Research. Though it's available for inclusion in Power BI reports via the Office Store like other custom visuals, it's more like a storytelling design environment than a single visual, given its extensive configuration options for timeline representations, scales, layouts, filtering, and annotations. Like the inherent advantages of geospatial visuals, the linking of Visio diagrams with related Power BI datasets can intuitively call out bottlenecks and otherwise difficult-to-detect relationships within processes.

7 reasons to choose Power BI for building enterprise BI solutions

Where does Power BI fall within Microsoft's mission to empower every person and every organization on the planet to achieve more, in terms of 1. bringing people together, 2. living smarter, 3. friction-free creativity, and 4. fluid mobility?

Power BI Desktop is available for free and is enhanced each month with features that empower the user to do more and that remove technical obstacles. Similarly, with no knowledge whatsoever of the underlying technology or solution, a business user can access a Power BI app on their phone or PC and easily view and interact with data relevant to their role. Importantly for business analysts and information workers, Power BI acknowledges the scarcity of BI and analytics resources (i.e., data scientists, BI developers) and thus provides both graphical interfaces and full programming capabilities right in Power BI Desktop. This makes it feasible, and often painless, to quickly create a working, valuable solution with relatively little experience with the product.

We can expect Power BI to support 10GB (and then larger) datasets soon, as well as improve its 'data storytelling' capabilities with a feature called Bookmarks. In effect, Bookmarks will allow Power BI reports to become like PowerPoint presentations with animation. Organizations will also have greater control over how they utilize the v-cores they purchase as part of Power BI Premium. This will make scaling Power BI deployments easier and more flexible. I'm personally most interested in the incremental refresh feature identified on the Power BI Premium roadmap. Currently an entire Power BI dataset (in Import mode) is refreshed each time, and this is a primary barrier to deploying larger Power BI datasets. Additionally (though not exclusively by any means), the ability to 'write' from Power BI to source applications is also a highly anticipated feature on the Power BI roadmap.

How does your book, Microsoft Power BI Cookbook, prepare its readers to be industry ready? What are the key takeaways for readers from this book?

Power BI is built with proven, industry-leading BI technologies and architectures, such as in-memory, columnar-compressed data stores and functional query and analytical programming languages. Readers of the Power BI Cookbook will likely be able to quickly deliver fresh solutions or propose ideas for enhancements to existing Power BI projects. Additionally, particularly for BI developers, the skills and techniques demonstrated in the Power BI Cookbook will generally be applicable across the Microsoft BI stack, such as in SQL Server Analysis Services Tabular projects and the Power BI Report Server.

A primary takeaway from this book is that Power BI is much more than a report authoring or visualization tool. The data transformation and modelling capabilities of Power BI, particularly combined with Power BI Premium capacity and licensing considerations, are robust and scalable. Readers will quickly learn that though certain Power BI features are available in Excel, and though Excel can be an important part of Power BI solutions from a BI consumption standpoint, there are massive advantages of Power BI relative to Excel. Therefore, almost all PowerPivot and Power Query for Excel content can and should be migrated to Power BI Desktop. An additional takeaway is the breadth of project types and scenarios that Power BI can support. You can design a corporate BI solution with a Power BI dataset to support hundreds of users across multiple teams, but you can also build a tightly focused solution such as monitoring system resources or documenting the contents of a dataset.

If you enjoyed this interview, check out Brett's latest book, Microsoft Power BI Cookbook. Also, read part one of the interview here to see how and where Power BI fits into the BI landscape and what it takes to stay successful in this industry.


Ride the third wave of BI with Microsoft Power BI

Amey Varangaonkar
09 Oct 2017
8 min read
Self-service Business Intelligence is the buzzword everyone's talking about today. It gives modern business users the ability to find unique insights from their data without any hassle. Amidst a myriad of BI tools and platforms out there in the market, Microsoft's Power BI has emerged as a powerful, all-encompassing BI solution, empowering users to tailor and manage Business Intelligence to suit their unique needs and scenarios.

Brett Powell is a Microsoft Power BI partner, and the founder and owner of Frontline Analytics LLC, a BI and analytics research and consulting firm. Brett has contributed to the design and development of Microsoft BI stack and Power BI solutions of diverse scale and complexity across the retail, manufacturing, financial, and services industries. He regularly blogs about the latest happenings in Microsoft BI and Power BI features at Insight Quest. He is also an organizer of the Boston BI User Group.

In this two-part interview Brett talks about his new book, Microsoft Power BI Cookbook, and shares his insights and expertise in the area of BI and data analytics, with a particular focus on Power BI. In part one, Brett shares his views on topics ranging from what it takes to be successful in the field of BI and data analytics to why he thinks Microsoft is going to lead the way in shaping the future of the BI landscape. In part two of the interview, he shares his expertise with us on the unique features that differentiate Power BI from other tools and platforms in the BI space.

Key Takeaways

- Ease of deployment across multiple platforms, efficient data-driven insights, ease of use, and support for a data-driven corporate culture are factors to consider while choosing a Business Intelligence solution for enterprises.
- Power BI leads in self-service BI because it's the first Software-as-a-Service (SaaS) platform to offer 'End User BI', where anyone, not just a business analyst, can leverage powerful tools to obtain greater value from data.
- Microsoft Power BI has been identified as a leader in Gartner's Magic Quadrant for BI and Analytics platforms, and provides the visually rich and easy-to-access interface that modern business users require.
- You can isolate report authoring from dataset development in Power BI, or quickly scale a Power BI dataset up or down as per your needs.
- Power BI is much more than just a tool for reports and dashboards. With a thorough understanding of the query and analytical engines of Power BI, users can build more powerful and sustainable BI solutions.

Part One: Interview Excerpts - Power BI from a Bird's Eye View

On choosing the right BI solution for your enterprise needs

What are some key criteria one must evaluate while choosing a BI solution for enterprises? How does Power BI fare against these criteria as compared with other leading solutions from IBM, Oracle, and QlikView?

Enterprises require a platform which can be implemented on their terms and adapted to their evolving needs. For example, the platform must support on-premises, cloud, and hybrid deployments with seamless integration, allowing organizations both to leverage on-premises assets and to fully manage their cloud solution. Additionally, the platform must fully support both corporate business intelligence processes, such as staged deployments across development and production environments, and self-service tools which empower business teams to contribute to BI projects and a data-driven corporate culture. Furthermore, enterprises must consider the commitment of the vendor to BI and analytics, the full cost of scaling and managing the solution, as well as the vendor's vision for delivering emerging capabilities such as artificial intelligence and natural language.

Microsoft Power BI has been identified as a leader in Gartner's Magic Quadrant for BI and Analytics platforms based on both its current ability to execute and its vision. Particularly now, with the Power BI Premium, Power BI Report Server, and Power BI Embedded offerings, Power BI truly offers organizations the ability to tailor and manage BI to their unique needs and scenarios. Power BI's mobile application, available on all common platforms (iOS, Android), in addition to continued user experience improvements in the Power BI service, provides the visually rich and common interface for the 'anytime access' that modern business users require. Additionally, since Power BI Desktop, Power BI's self-service authoring tool, shares the same engine as SQL Server Analysis Services, Power BI has a distinct advantage in enabling organizations to derive value from both self-service and corporate BI.

The BI landscape is very competitive, and other vendors such as Tableau and QlikView have obtained significant market share. However, as organizations fully consider the features distinguishing the products, in addition to the licensing structures and the integration with Microsoft Azure, Office 365, and common existing BI assets such as Excel and SQL Server Reporting Services and Analysis Services, they will (and are) increasingly concluding that Power BI provides a compelling value.

On the future of BI and why Brett is betting on Microsoft to lead the way

Self-service BI as a trend has become mainstream. How does Microsoft Power BI lead this trend? Where do you foresee the BI market heading next, i.e., are there other trends we should watch out for?

Power BI leads in self-service BI because it's the first software as a service (SaaS) platform to offer 'End User BI', in which anyone, not just a business analyst, can leverage powerful tools to obtain greater value from data. This 'third wave' of BI, as Microsoft suggests, follows and supplements the first and second waves of BI: corporate and self-service BI, respectively. For example, Power BI's Q&A experience with natural language queries and integration with Cortana goes far beyond the traditional self-service process of an analyst finding field names and dragging and dropping items on a canvas to build a report. Additionally, an end user has the power of machine learning algorithms at their fingertips with features such as Quick Insights, now built into Power BI Desktop. Furthermore, it's critical to understand that Microsoft has a much larger vision for self-service BI than other vendors. Self-service BI is not exclusively the visualization layer over a corporate IT-controlled data model; it's also the ability for self-service solutions to be extended and migrated to corporate solutions as part of a complete BI strategy. Given their common underlying technologies, Microsoft is able to remove friction between corporate and self-service BI, allowing organizations to manage modern, iterative BI project lifecycles.

On staying ahead of the curve in the data analytics and BI industry

For someone just starting out in the data analytics and BI fields, what would your advice be? How can one keep up with the changes in this industry?

I would focus on building a foundation in the areas which don't change frequently, such as math, statistics, and dimensional modeling. You don't need to become a data scientist or a data warehouse architect to deliver great value to organizations, but you do need to know the basic tools of storing and analysing data to answer business questions. To succeed in this industry over time you need to consistently invest in your skills in the areas and technologies relevant to your chosen path. You need to hold yourself accountable for becoming a better data professional, and this can be accomplished through certification exams, authoring technical blogs, giving presentations, or simply taking notes from technical books and testing out tools and code on your machine.

For hard skills, I'd recommend standard SQL, relational database fundamentals, data warehouse architecture and dimensional model design, and at least a core knowledge of common data transformation processes and/or tools such as SQL Server Integration Services (SSIS) and SQL stored procedures. You'll need to master an analytical language as well, and for Microsoft BI projects that language is increasingly DAX.

For soft skills, you need to move beyond simply looking for a list of requirements for your projects. You need to become flexible and active: someone who offers ideas and looks to show value and consistently improve projects rather than just 'deliver requirements'. You need to be able to have both a deeply technical conversation and a very practical conversation with business stakeholders. You need to be able to build relationships with both business and IT. You don't ever want to dominate or try to impress anyone, but if you're truly passionate about your work then this will be visible in how you speak about your projects and in the positive energy you bring to work every day and to your ongoing personal development.

If you enjoyed this interview, check out Brett's latest book, Microsoft Power BI Cookbook. In part two of the interview, Brett shares 5 Power BI features to watch out for, 7 reasons to choose Power BI to build enterprise solutions, and more. Visit us tomorrow to read part two of the interview.


Is Apache Spark today's Hadoop?

Amey Varangaonkar
02 Oct 2017
7 min read
With businesses generating data at an enormous rate today, many Big Data processing alternatives such as Apache Hadoop, Spark, Flink, and more have emerged in the last few years. Apache Spark among them has gained a lot of popularity of late, as it offers ease of use and sophisticated analytics, and helps you process data with speed and efficiency.

Romeo Kienzler is Chief Data Scientist in the IBM Watson IoT worldwide team, and has been helping clients all over the world find insights from their IoT data using Apache Spark. An Associate Professor for Artificial Intelligence at the Swiss University of Applied Sciences, Berne, he is also a member of the IBM Technical Expert Council and the IBM Academy of Technology, IBM's leading brains trust.

In this interview, Romeo talks about his new book on Apache Spark and Spark's evolution from just a data processing framework to a solid, all-encompassing platform for real-time processing, streaming analytics, and distributed machine learning.

Key Takeaways

- Apache Spark has evolved to become a full-fledged platform for both real-time batch processing and stream processing.
- Its in-memory computing capabilities allow for efficient streaming analytics, graph processing, and machine learning.
- It gives you the ability to work with your data at scale, without worrying whether it is structured or unstructured.
- Popular frameworks like H2O and DeepLearning4J are using Apache Spark as their preferred platform for distributed AI, machine learning, and deep learning.

Full-length Interview

As a data scientist and an assistant professor, you must have used many tools both for your work and for research. What are some key criteria one must evaluate while choosing a big data analytics solution? What are your go-to tools, and where does Spark rank among them?

- Scalability: make sure you can use a cluster to accelerate execution of your processes.
- TCO: how much do I have to pay for licensing and deployment? Consider the usage of open source (but keep maintenance in mind). Also, consider the cloud.

I've shifted completely away from non-scalable environments like R and Python pandas. I've also shifted away from Scala for prototyping; I'm using Scala only for mission-critical applications which have to be maintained for the long term. Otherwise, I'm using Python. I'm trying to stay completely on Apache Spark for everything I'm doing, which is feasible since Spark supports SQL, machine learning, and deep learning. The advantage is that everything I'm doing is scalable by definition, and once I need to, I can scale without changing code.

What does the road to mastering Apache Spark look like? What are some things that users may not have known about Apache Spark? Can readers look forward to learning about some of them in your new book, Mastering Apache Spark, Second Edition?

Scaling on very large clusters is still tricky with Apache Spark, because at a certain point scale-out is not linear anymore, so a lot of tweaking of the various knobs is necessary. Also, the Spark API is somewhat more tedious than that of R or Python pandas, so it takes some persistence to really stick with it and not go back to "the good old R-Studio". Next, I think the strategic shift from RDDs to DataFrames and Datasets was a disruptive but necessary step. In the book, I try to justify this step and first explain how the new API and the two related projects, Tungsten and Catalyst, work. Then I show how things like machine learning, streaming, and graph processing are done in the traditional, RDD-based way as well as in the new DataFrames and Datasets based way. (A minimal sketch of that contrast follows this interview.)

What are the top 3 data analysis challenges that never seem to go away even as time and technology keep changing? How does Spark help alleviate them?

- Data quality. Data is often noisy and in bad formats. The majority of my time is spent improving it through various methodologies. Apache Spark helps me to scale, and SparkSQL and SparkML pipelines introduce a standardized framework for doing so.
- Unstructured data preparation. A lot of data is unstructured, in the form of text. Apache Spark allows me to pre-process vast amounts of text and create tiny mathematical representations out of it for downstream analysis.
- Instability of technology. Every six months there is a new hype which seems to make everything you've learned redundant. For example, there exist various scripting languages for big data. SparkSQL ensures that I can use my already-acquired SQL skills now and in the future.

How is the latest Apache Spark 2.2.0 a significant improvement over the previous version?

The most significant change, in my opinion, was labeling Structured Streaming as GA (generally available) and no longer experimental. Otherwise, there have been "only" minor improvements, mainly on performance: 72 to be precise, all documented in JIRA, since it is an Apache project. The most significant improvement between version 1.6 and 2.0 was whole-stage code generation in Tungsten, which is also covered in this book.

Streaming analytics has become mainstream. What role did Apache Spark play in leading this trend?

Actually, Apache Spark takes it to the next level by introducing the concept of continuous applications. With Apache Spark, the streaming and batch APIs have been unified, so you no longer have to care what type of data you are running your queries on. You can even mix and match: for example, joining a structured stream, a relational database, a NoSQL database, and a file in HDFS within a single SQL statement. Everything is possible.

Mastering Apache Spark was first published back in 2015. Big data has greatly evolved since then. What does the second edition of Mastering Apache Spark offer readers today in this context?

Back in 2015, Apache Spark was just another framework within the Hadoop ecosystem. Now, Apache Spark has grown to be one of the largest open source projects on this planet! Apache Spark is the new big data operating system, like Hadoop was back in 2015. AI and deep learning are the most important trends, and, as explained in this book, frameworks like H2O, DeepLearning4J, and Apache SystemML are using Apache Spark as their big data operating system to scale. I think I've done a very good job in taking real-life examples from my work and finding a good open data source, or writing a good simulator, to give hands-on experience in solving real-world problems. So in the book, you should find a recipe for all the current data science problems you find in the industry.

2015 was also the year when Apache Spark and IBM Watson chose to join hands. As the Chief Data Scientist for IBM Watson IoT, give us a glimpse of what this partnership is set to achieve.

This partnership underpins IBM's strong commitment to open source. Not only is IBM contributing to Apache Spark, IBM also creates new open source projects on top of it. The most prominent example is Apache SystemML, which is also covered in this book. The next three years are dedicated to deep learning and AI, and IBM's open source contributions will help the Apache Spark community to succeed. The most prominent example is PowerAI, where IBM outperformed all state-of-the-art deep learning technologies for image recognition.

For someone just starting out in the field of big data and analytics, what would your advice be?

I suggest taking a machine learning course from one of the leading online training vendors. Then take a Spark course (or read my book). Finally, try to do everything yourself: participate in Kaggle competitions and try to replicate papers.
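Here is the minimal PySpark sketch referenced above, contrasting the older RDD API with the DataFrame API and SparkSQL that Romeo describes. The file name and column names are illustrative assumptions, not anything from the book or interview.

```python
# Counting clicks per user three ways: RDD, DataFrame, and SparkSQL.
# Assumes a CSV "events.csv" with three columns: user, timestamp, action.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

# The old RDD way: manual parsing, no schema, no Catalyst/Tungsten help.
clicks_rdd = (spark.sparkContext.textFile("events.csv")
              .map(lambda line: line.split(","))
              .filter(lambda f: f[2] == "click")
              .map(lambda f: (f[0], 1))
              .reduceByKey(lambda a, b: a + b))
print(clicks_rdd.take(5))

# The DataFrame way: schema-aware, optimized by Catalyst and Tungsten.
df = spark.read.csv("events.csv").toDF("user", "ts", "action")
df.filter(df.action == "click").groupBy("user").count().show()

# SparkSQL: the same query, reusing already-acquired SQL skills.
df.createOrReplaceTempView("events")
spark.sql("SELECT user, COUNT(*) AS clicks FROM events "
          "WHERE action = 'click' GROUP BY user").show()
```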


Why you should use Keras for deep learning

Amey Varangaonkar
13 Sep 2017
5 min read
A lot of people rave about TensorFlow and Theano, but there is one complaint you hear fairly regularly: that they can be a little challenging to use if you're directly building deep learning models. That's where Keras comes to the rescue. It's a high-level deep learning library written in Python that can be used as a wrapper on top of TensorFlow or Theano, to simplify the model training process and to make the models more efficient.

Sujit Pal is Technology Research Director at Elsevier Labs. He has been working with Keras for some time, and is an expert in semantic search, natural language processing, and machine learning. He's also the co-author of Deep Learning with Keras, which is why we spoke to him about why you should start using Keras (he's very convincing).

5 reasons you should start using Keras

- Keras is easy to get started with if you've worked with Python before and have some basic knowledge of neural networks.
- It works on top of Theano and TensorFlow seamlessly to create efficient deep learning models.
- It offers just the right amount of abstraction, allowing you to focus on the problem at hand rather than worry about the complexity of using the framework.
- It is a handy tool to use if you're looking to build models related to computer vision or natural language processing.
- Keras is a very expressive framework that allows for rapid prototyping of models.

Why I started using Keras

Packt: Why did you start using Keras?

Sujit Pal: My first deep learning toolkit was actually Caffe, then TensorFlow, both for work-related projects. I learned Keras for a personal project, and I was impressed by the Goldilocks (i.e., just right) quality of the abstraction. Thinking at the layer level was far more convenient than having to think in terms of the matrix multiplications that TensorFlow makes you do, and at the same time I liked the control I got from using a programming language (Python), as opposed to using JSON in Caffe. I've used Keras for multiple projects now.

Packt: How has this experience been different from other frameworks and tools? What problems does it solve exclusively?

Sujit: I think Keras has the right combination of simplicity and power. In addition, it allows you to run against either TensorFlow or Theano backends, and I understand that it is being extended to support two other backends: CNTK and MXNet. The documentation on the Keras site is extremely good, and the API itself (both the Sequential and Functional ones) is very intuitive. I personally took to it like a fish to water, and I have heard from quite a few other people that their experiences were very similar.

What you need to know to start using Keras

Packt: What are the prerequisites to learning Keras? And what aspects are tricky to learn?

Sujit: I think you need to know some basic Python and have some idea about neural networks. I started with neural networks from the Google/edX course taught by Vincent Vanhoucke. It's pretty basic (and taught using TensorFlow), but you can start building networks with Keras even with that kind of basic background. Also, if you have used numpy or scikit-learn, some of the API is easier to pick up because of the similarities. I think the one aspect I have had a few problems with is building custom layers. While there is some documentation that is just enough to get you started, I think Keras would be usable in many more situations if the documentation for custom layers was better, maybe more in line with the rest of Keras: things like how to signal that a layer supports masking or multiple tensors, debugging layers, and so on.

Packt: Why do you use Keras in your day-to-day programming and data science tasks?

Sujit: I have spent most of the last year working with image classification and similarity, and I've used Keras to build most of my more recent models. This year I am hoping to do some work with NLP as it relates to images, such as generating image captions. On the personal projects side, I have used Keras for building question answering and disease prediction models, both with data from Kaggle competitions.

How Keras could be improved

Packt: As a developer, what do you think are the areas of development for Keras as a library? Where do you struggle the most?

Sujit: As I mentioned before, the Keras API is quite comprehensive, and most of the time Keras is all you need to build networks, but occasionally you do hit its limits. So I think the biggest area of Keras that could be improved would be extensibility, using its backend interface. Another thing I am excited about is the contrib.keras package in TensorFlow; I think it might open up even more opportunity for customization, or at least the potential to mix and match TensorFlow with Keras.
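As a taste of the Sequential API Sujit mentions, here is a minimal, self-contained sketch of the kind of layer-level thinking Keras enables. The synthetic data and layer sizes are illustrative assumptions, not an example from the book.

```python
# A minimal Keras Sequential model (Keras 2.x with a TensorFlow or
# Theano backend). The data is synthetic, purely for illustration.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Fake binary-classification data: 1000 samples, 20 features.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)

# Thinking at the layer level: stack layers instead of wiring matrices.
model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(20,)))
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=1)
```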


Has Machine Learning become more accessible?

Packt Editorial Staff
04 Sep 2017
9 min read
Sebastian Raschka is a machine learning expert. He is currently a researcher at Michigan State University, where he is working on computational biology. But he is also the author of Python Machine Learning, the most popular book ever published by Packt. It's a book that has helped to define the field, breaking it out of the purely theoretical and showing readers how machine learning algorithms can be applied to everyday problems. Python Machine Learning was published in 2015, but Sebastian is back with a brand new edition, updated and improved for 2017, working alongside his colleague Vahid Mirjalili. We were lucky enough to catch Sebastian in between his research and working on the new edition to ask him a few questions about what's new in the second edition of Python Machine Learning, and to get his assessment of the key challenges and opportunities in data science today.

What's the most interesting takeaway from your book?

Sebastian Raschka: In my opinion, the key takeaway from my book is that machine learning can be useful in almost every problem domain. I cover a lot of different subfields of machine learning in my book: classification, regression analysis, clustering, feature extraction, dimensionality reduction, and so forth. By providing hands-on examples for each one of those topics, my hope is that people can find inspiration for applying these fundamental techniques to drive their research or industrial applications. Also, the use of well-developed and maintained open source software makes machine learning very accessible to a broad audience of experienced programmers as well as people who are new to programming. And by introducing the basic mathematics behind machine learning, readers can appreciate that machine learning is more than just black-box algorithms, gaining an intuition of the capabilities but also the limitations of machine learning, and of how to apply those algorithms wisely.

What's new in the second edition?

SR: As time and the software world moved on after the first edition was released in September 2015, we decided to replace the introduction to deep learning via Theano. No worries, we didn't remove it! But it got a substantial overhaul and is now based on TensorFlow, which has become a major player in my research toolbox since its open source release by Google in November 2015. Along with the new introduction to deep learning using TensorFlow, the biggest additions to this new edition are three brand-new chapters focusing on deep learning applications: a more detailed overview of the TensorFlow mechanics, an introduction to convolutional neural networks for image classification, and an introduction to recurrent neural networks for natural language processing. Of course, and in a similar vein to the rest of the book, these new chapters do not only provide readers with practical instructions and examples but also introduce the fundamental mathematics behind those concepts, which is an essential building block for understanding how deep learning works.

What do you think is the most exciting trend in data science and machine learning?

SR: One interesting trend in data science and machine learning is the development of libraries that make machine learning even more accessible. Popular examples include TPOT and AutoML/auto-sklearn; in other words, libraries that further automate the building of machine learning pipelines. While such tools do not aim to replace experts in the field, they may be able to make machine learning accessible to an even broader audience of non-programmers. However, being able to interpret the outcomes of predictive modeling tasks and to evaluate the results appropriately will always require a certain amount of knowledge. Thus, I see those tools not as replacements but rather as assistants for data scientists, automating tedious tasks such as hyperparameter tuning. (A minimal scikit-learn sketch of that kind of automation appears after this interview.)

Another interesting trend is the continued development of novel deep learning architectures and the large progress in deep learning research overall. We've seen many interesting ideas, from generative adversarial neural networks (GANs) to densely connected neural networks (DenseNets) and ladder networks. Much progress has been made in this field thanks to those new ideas and the continued improvements of deep learning libraries (and our computing infrastructure) that accelerate the implementation of research ideas and the development of these technologies in industrial applications.

How has the industry changed since you first started working?

SR: Over the years, I have noticed that more and more companies embrace open source, for example by sharing parts of their tool chain on GitHub, which is great. Also, data science and open source related conferences keep growing, which means more and more people are not only getting interested in data science but also considering working together, for example as open source contributors in their free time, which is nice. Another thing I noticed is that as deep learning becomes more and more popular, there seems to be an urge to apply deep learning to problems even if it doesn't necessarily make sense, that is, the urge to use deep learning just for the sake of using deep learning. Overall, the positive thing is that people get excited about new and creative approaches to problem-solving, which can drive the field forward. Also, I noticed that more and more people from other domains are becoming familiar with the techniques used in statistical modeling (thanks to "data science") and machine learning, which is nice, since good communication in collaborations and teams is important, and a common knowledge of the basics makes this communication a bit easier.

What advice would you give to someone who wants to become a data scientist?

SR: I recommend starting with a practical, introductory book or course to get a brief overview of the field and the different techniques that exist. A selection of concrete examples would be beneficial for understanding the big picture and what data science and machine learning are capable of. Next, I would start a passion project while trying to apply the newly learned techniques from statistics and machine learning to address and answer interesting questions related to this project. While working on an exciting project, I think the practitioner will naturally become motivated to read through the more advanced material and improve their skills.

What are the biggest misunderstandings and misconceptions people have about machine learning today?

SR: Well, there's this whole debate about AI turning evil. As far as I can tell, the fear-mongering is mostly driven by journalists who don't work in the field and are apparently looking for catchy headlines. Anyway, let me not dwell on this topic, as readers can find plenty of information (from both viewpoints) in the news and all over the internet. To answer with Andrew Ng's famous quote: "I don't work on preventing AI from turning evil for the same reason that I don't work on combating overpopulation on the planet Mars."

What's so great about Python? Why do you think it's used in data science and beyond?

SR: It is hard to tell which came first: Python becoming a popular language so that many people developed all the great open source libraries for scientific computing, data science, and machine learning, or Python becoming so popular due to the availability of these open source libraries. One thing is obvious, though: Python is a very versatile language that is easy to learn and easy to use. While most algorithms for scientific computing are not implemented in pure Python, Python is an excellent language for interacting with very efficient implementations in Fortran, C/C++, and other languages under the hood. This combination of calling code in computationally efficient low-level languages while providing users with a very natural and intuitive programming interface is probably one of the big reasons behind Python's rise to popularity as a lingua franca in the data science and machine learning community.

What tools, frameworks and libraries do you think people should be paying attention to?

SR: There are many interesting libraries being developed for Python. As a data scientist or machine learning practitioner, I'd especially want to highlight the well-maintained tools from the core Python scientific stack:

- NumPy and SciPy as efficient libraries for working with data arrays and scientific computing
- pandas to read in and manipulate data in a convenient data frame format
- matplotlib for data visualization (and seaborn for additional plotting capabilities and more specialized plots)
- scikit-learn for general machine learning

There are many, many more libraries that I find useful in my projects. For example, Dask is an excellent library for working with data frames that are too large to fit into memory and for parallelizing computations across multiple processors. Or take TensorFlow, Keras, and PyTorch, which are all excellent libraries for implementing deep learning models.

What does the future look like for Python?

SR: In my opinion, Python's future looks very bright! For example, Python has just been ranked as the top programming language by IEEE Spectrum as of July 2017. While I mainly speak of Python from the data science and machine learning perspective, I have heard from many people in other domains that they appreciate Python as a versatile language with a rich ecosystem of libraries. Of course, Python may not be the best tool for every problem, but it is very well regarded as a "productive" language for programmers who want to "get things done." Also, while the availability of plenty of libraries is one of the strengths of Python, I must also highlight that most packages that have been developed are still being exceptionally well maintained, and new features and improvements to the core data science and machine learning libraries are being added on a daily basis. For instance, the NumPy project, which has been around since 2006, just received a $645,000 grant to further support its continued development as a core library for scientific computing in Python. At this point, I also want to thank all the developers of Python and its open source libraries who have made Python what it is today. It's an immensely useful tool to me, and as a Python user, I hope you will consider getting involved in open source: every contribution is useful and appreciated, whether small documentation fixes, bug fixes in the code, new features, or entirely new libraries. Again, with big thanks to the awesome community around it, I think Python's future looks very bright.
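Here is the minimal sketch referenced above: the core stack Sebastian names (NumPy, scikit-learn) used to automate hyperparameter tuning, the kind of tedious task that tools like TPOT and auto-sklearn take even further. The dataset and parameter grid are illustrative assumptions.

```python
# Automated hyperparameter tuning with a scikit-learn pipeline.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline keeps preprocessing and model fitting as one estimator,
# so cross-validation never leaks test data into the scaler.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])

# Grid search automates the tedious sweep over regularization strengths.
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```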

An Interview with Hussein Nasser

Hussein Nasser
01 Jul 2014
4 min read
What initially drew you to write your book for Packt Publishing?

In 2009, I started writing technical articles on my personal blog. I would write about my field, Geographic Information Systems (GIS), or other technical topics. Whenever a new technology emerged, a new product, or sometimes even mere tips or tricks, I would write an article about it. My blog became a well-known site in GIS, and that is when Packt approached me with a proposed title. I always wanted to write a book, but I never expected that the opportunity would knock on my door. I thank Packt for giving me that opportunity.

When you began writing, what were your main aims?

My main aim was to write a book that readers in my domain could grab and benefit from. While working on a chapter, I would always imagine a reader picking up the book and reading that particular chapter, and ask myself: what could I do better? And then I tried to make the chapter as simple as possible and leave nothing unexplained.

What did you enjoy most and what was most rewarding about the experience of writing?

Think about all the knowledge, information, ideas, and tips that you possess. You knew you had it in you somewhere, but you didn't know the joy and delight you would feel when this knowledge slipped through your fingertips into a physical medium. With each reading I would reread and polish the chapters; it seems there is always room for improvement in writing.

Why, in your opinion, is ArcGIS exciting to discover, read, and write about?

ArcGIS is not a new technology; it has been around for more than 14 years. It has become mature and polished during these years. It has expanded and started touching other bleeding-edge technologies like mobile, web, and the cloud. Every day this technology is increasingly worth discovering, and every day it benefits areas like health, utilities, transportation, and so on.

Why do you think interest in GIS is on the rise?

If you read The Tipping Point by Malcolm Gladwell, you will understand that the smartphone was actually a tipping point for GIS technology. GIS was once used only by enterprises and big companies who wanted to add the location dimension to their tabular data, to help them better visualize and analyze their information. With smartphones and GPS, geographic location became more relevant. Pictures taken with smartphones are tagged with location information. Applications were developed to harness the power of GIS for routing, finding the best restaurants in an area, calculating shortest routes, and finding information based on geo-fencing technology that sends you text messages when you pass by a shop (a minimal sketch of that idea appears after this interview). The popularity of GIS is rising, and so is the interest in adopting this technology.

What do you see on the horizon for GIS?

High-end processing servers are moving to the cloud while we carry smaller and smaller gadgets. Networking is getting stronger every day, with LTE and 4G networks already set up in many countries. Storage is no longer an issue. The web architecture is dominant so far, and it is the most open and compatible platform that has ever existed. As long as we keep using devices, we will need geographic information systems. Data can be consumed and fetched swiftly from anywhere in the world, from the smallest device. I believe this will evolve to the extent that everything valuable we own can be tagged with a location, so that when we misplace or lose something, we can always use GIS to locate it.

Any tips for new authors?

My role-model author is Seth Godin; the first book I ever read was his. When I told him about my new book and asked him for any advice he might give me as a new author, he told me, and I quote: "Congratulations, Hussein. This is thrilling to hear; my only advice is to keep writing!" I took his advice, and now I'm working on my second book with Packt. Another personal tip I can give to new authors is that writing needs focus, and I find music the best soul-feeding source. While working on my first book, I discovered the site www.stereomood.com, which plays music that will help you write. Another tip is to use a clutter-free word processor application that blanks the entire screen so you are left with only your words. I use WriteMonkey for Windows and FocusWriter for Mac.
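The geo-fencing sketch promised above: a tiny, self-contained Python example of the core check behind "text me when I pass the shop". The coordinates, radius, and notification stub are made-up assumptions for illustration; production geofencing on a phone would use the platform's location APIs rather than raw math like this.

```python
# Geo-fence check using the standard haversine great-circle distance.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Distance in meters between two latitude/longitude points."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

SHOP = (41.8902, 12.4922)   # hypothetical shop location
FENCE_RADIUS_M = 200        # hypothetical fence radius

def on_gps_fix(lat, lon):
    """Called for each new GPS fix; fires when inside the fence."""
    if haversine_m(lat, lon, *SHOP) <= FENCE_RADIUS_M:
        print("Near the shop: sending offer text")  # stand-in for SMS

on_gps_fix(41.8910, 12.4930)  # about 110 m away, so this triggers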