
Tech Guides - Data Analysis

34 Articles

What does a data science team look like?

Fatema Patrawala
21 Nov 2019
11 min read
Until a couple of years ago, people barely knew the term 'data science', which has since evolved into an extremely popular career field. The Harvard Business Review dubbed the data scientist the sexiest job of the 21st century, and professionals jumped on the 'data is the new oil' bandwagon. As per the Figure Eight Report 2018, which takes the pulse of the data science community in the US, a lot has changed rapidly in the data science field over the years. For the 2018 report, they surveyed approximately 240 data scientists and found that machine learning projects have multiplied and more and more data is required to power them. Data science and machine learning jobs are LinkedIn's fastest growing jobs, and the internet is creating 2.5 quintillion bytes of data to process and analyze each day. With all these changes, it is evident that data science teams across organizations have had to evolve.

The data science team is responsible for delivering complex projects where systems analysis, software engineering, data engineering, and data science are used to deliver the final solution. To achieve all of this, the team includes not only a data scientist or a data analyst but also other roles like business analyst, data engineer or architect, and chief data officer. In this post, we will differentiate and discuss the various job roles within a data science team, the skill sets required, and the compensation for each one of them. For an in-depth understanding of data science teams, read the book Managing Data Science by Kirill Dubovikov, which has interesting case studies on building successful data science teams. He also explores how the team can efficiently manage data science projects through the use of DevOps and ModelOps.

Now let's get into understanding individual data science roles and functions, but before that we take a look at the structure of the team. There are three basic team structures to match different stages of AI/ML adoption:

IT centric team structure

At times, hiring a data science team is not an option for a company, and it has to leverage in-house talent. In such situations, the company takes advantage of its fully functional in-house IT department. The IT team manages functions like data preparation, training models, creating user interfaces, and model deployment within the corporate IT infrastructure. This approach is fairly limited, but it is made practical by MLaaS solutions. Environments like Microsoft Azure or Amazon Web Services (AWS) are equipped with approachable user interfaces to clean datasets, train models, evaluate them, and deploy. Microsoft Azure, for instance, supports its users with detailed documentation for a low entry threshold. The documentation helps in fast training and early deployment of models even without an expert data scientist on board.

Integrated team structure

Within the integrated structure, companies have a data science team which focuses on dataset preparation and model training, while IT specialists take charge of the interfaces and infrastructure for model deployment. Combining machine learning expertise with IT resources is the most viable option for constant and scalable machine learning operations. Unlike the IT centric approach, the integrated method requires having an experienced data scientist within the team. This approach ensures better operational flexibility in terms of available techniques. Additionally, the team leverages a deeper understanding of machine learning tools and libraries, like TensorFlow or Theano, which are designed specifically for researchers and data science experts.

Specialized data science team

Companies can also have an independent data science department to build all-encompassing machine learning applications and frameworks. This approach entails the highest cost. All operations, from data cleaning and model training to building front-end interfaces, are handled by a dedicated data science team. It doesn't necessarily mean that all team members should have a data science background, but they should have a technology background with certain service management skills. A specialized structure model aids in addressing complex data science tasks that include research, use of multiple ML models tailored to various aspects of decision-making, or multiple ML-backed services. Most of today's successful Silicon Valley tech companies operate with specialized data science teams, custom-built and wired for specific tasks to achieve different business goals. For example, the team structure at Airbnb is one of the most interesting use cases. Martin Daniel, a data scientist at Airbnb, explains in this talk how the team emphasizes an experimentation-centric culture and applies machine learning rigorously to address unique product challenges.

Job roles and responsibilities within a data science team

As discussed earlier, there are many roles within a data science team. As per Michael Hochster, Director of Data Science at Stitch Fix, there are two types of data scientists: Type A and Type B. Type A stands for analysis. Individuals involved in Type A are statisticians who make sense of data without necessarily having strong programming knowledge. Type A data scientists perform data cleaning, forecasting, modeling, visualization, etc. Type B stands for building. These individuals use data in production. They're good software engineers with strong programming knowledge and a statistics background. They build recommendation systems, personalization use cases, etc. It is rare that one expert fits neatly into a single category, but understanding these data science functions can help make sense of the roles described further.

Chief data officer / Chief analytics officer

The chief data officer (CDO) role has been taking organizations by storm. A recent NewVantage Partners' Big Data Executive Survey 2018 found that 62.5% of Fortune 1000 business and technology decision-makers said their organization had appointed a chief data officer. The role of the chief data officer involves overseeing a range of data-related functions that may include data management, ensuring data quality, and creating data strategy. He or she may also be responsible for data analytics and business intelligence, the process of drawing valuable insights from data. Even though chief data officer and chief analytics officer (CAO) are two distinct roles, they are often handled by the same person. Expert professionals and leaders in analytics also own the data strategy and how a company should treat its data. This makes sense, as analytics provide insights and add value to the data. Hence, with a CDO+CAO combination, companies can take advantage of a good data strategy and proper data management without compromising on quality. According to compensation analysis from PayScale, the median chief data officer salary is $177,405 per year, including bonuses and profit share, ranging from $118,427 to $313,791 annually.

Skill sets required: Data science and analytics, programming skills, domain expertise, and leadership and visionary abilities.

Data analyst

The data analyst role implies proper data collection and interpretation activities. The person in this job role will ensure that collected data is relevant and exhaustive while also interpreting the results of the data analysis. Some companies also require data analysts to have visualization skills to convert raw numbers into tangible insights through graphics. As per Indeed, the average salary for a data analyst is $68,195 per year in the United States.

Skill sets required: Programming languages like R, Python, JavaScript, C/C++, and SQL. Along with these, critical thinking, data visualization, and presentation skills are good to have.

Data scientist

Data scientists are data experts who have the technical skills to solve complex problems and the curiosity to explore what problems need to be solved. A data scientist is an individual who develops machine learning models to make predictions and is well versed in algorithm development and computer science. This person will also know the complete lifecycle of model development. A data scientist requires large amounts of data to develop hypotheses, make inferences, and analyze customer and market trends. Basic responsibilities include gathering and analyzing data, and using various types of analytics and reporting tools to detect patterns, trends, and relationships in data sets. According to Glassdoor, the current U.S. average salary for a data scientist is $118,709.

Skill sets required: A data scientist will require knowledge of big data platforms and tools like Seahorse powered by Apache Spark, JupyterLab, TensorFlow, and MapReduce; programming languages that include SQL, Python, Scala, and Perl; and statistical computing languages, such as R. They should also have cloud computing capabilities and knowledge of various cloud platforms like AWS, Microsoft Azure, etc. You can also read this post on how to ace a data science interview to know more.

Machine learning engineer

At times a data scientist is confused with a machine learning engineer, but a machine learning engineer is a distinct role that involves different responsibilities. A machine learning engineer is someone who is responsible for combining software engineering and machine modeling skills. This person determines which model to use and what data should be used for each model. Probability and statistics are also their forte. Everything that goes into training, monitoring, and maintaining a model is the ML engineer's job. The average machine learning engineer salary is $146,085 in the US, and the role is ranked No. 1 on Indeed's Best Jobs in 2019 list.

Skill sets required: Machine learning engineers are required to have expertise in computer science and programming languages like R, Python, Scala, Java, etc. They also need to know probability and statistics techniques, data modelling, and evaluation techniques.

Data architects and data engineers

Data architects and data engineers work in tandem to conceptualize, visualize, and build an enterprise data management framework. The data architect visualizes the complete framework to create a blueprint, which the data engineer can use to build a digital framework. The data engineering role has recently evolved from the traditional software-engineering field. Recent enterprise data management experiments indicate that data-focused software engineers are needed to work along with the data architects to build a strong data architecture. The average salary for a data architect in the US ranges from $122,000 to $129,000 annually, as per a recent LinkedIn survey.

Skill sets required: A data architect or engineer should have a keen interest and experience in programming languages and frameworks like HTML5, RESTful services, Spark, Python, Hive, Kafka, and CSS. They should have the required knowledge and experience to handle database technologies such as PostgreSQL, MapReduce, and MongoDB, and visualization platforms such as Tableau and Spotfire.

Business analyst

A business analyst (BA) basically handles the chief analytics officer's role, but at the operational level. This implies converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst can bridge the gap. They are responsible for using data analytics to assess processes, determine requirements, and deliver data-driven recommendations and reports to executives and stakeholders. BAs engage with business leaders and users to understand how data-driven changes will be implemented to processes, products, services, software, and hardware. They further articulate these ideas and balance them against what is technologically feasible and financially reasonable. The average salary for a business analyst is $75,078 per year in the United States, as per Indeed.

Skill sets required: Excellent domain and industry expertise. Along with this, good communication and data visualization skills and knowledge of business intelligence tools are good to have.

Data visualization engineer

This specific role is not present in every data science team, as some of the responsibilities are covered by either a data analyst or a data architect. Hence, this role is only necessary for a specialized data science team. The role of a data visualization engineer involves having a solid understanding of UI development to create custom data visualization elements for your stakeholders. Regardless of the technology, successful data visualization engineers have to understand principles of design, both graphical and, more generally, user-centered design. As per PayScale, the average salary for a data visualization engineer is $98,264.

Skill sets required: A data visualization engineer needs to have rigorous knowledge of data visualization methods and be able to produce various charts and graphs to represent data. Additionally, they must understand the fundamentals of design principles and the visual display of information.

To sum it up, the data science team has evolved to create a number of job roles and opportunities, but companies still face challenges in building the team from scratch and find it hard to figure out where to start. If you are facing a similar dilemma, check out the book Managing Data Science, written by Kirill Dubovikov. It covers concepts and methodologies to manage and deliver top-notch data science solutions, while also providing guidance on hiring, growing, and sustaining a successful data science team.

How to learn data science: from data mining to machine learning
How to ace a data science interview
Data science vs. machine learning: understanding the difference and what it means today
30 common data science terms explained
9 Data Science Myths Debunked


Alteryx vs. Tableau: Choosing the right data analytics tool for your business

Guest Contributor
04 Mar 2019
6 min read
Data visualization is commonly used in the modern world, where most business decisions are made by analyzing data. One of the most significant benefits of data visualization is that it enables us to visually access huge amounts of data in easily understandable visuals. There are many areas where data visualization is being used. Some of the data visualization tools include Tableau, Alteryx, Infogram, ChartBlocks, Datawrapper, Plotly, Visual.ly, etc. Tableau and Alteryx are industry standard tools that have dominated the data analytics market for a few years now and are still going strong without any serious competition. In this article, we will look at the core differences between Alteryx and Tableau. This will help us decide which tool to use for which purposes.

Tableau is one of the top-rated tools which helps analysts carry out business intelligence and data visualization activities. Using Tableau, users can generate compelling dashboards and stunning data visualizations. Tableau's interactive user interface helps users quickly generate reports where they can drill down the information to a granular level.

Alteryx is a powerful tool widely used in data analytics which also provides meaningful insights to executive-level personnel. With its user-friendly interface, the user can extract, transform, and load data within the Alteryx tool.

Why use Alteryx with Tableau?

The use of Alteryx with Tableau is a powerful combination when it comes to getting value-added data decisions. With Alteryx, businesses can manipulate their data and provide input to the Tableau platform, which in return will be able to showcase strong data visualizations. This helps businesses take appropriate actions which are backed up with data analysis. Alteryx and Tableau are widely used within organizations where decisions can be made based on the insights obtained from data analysis. Talking about data handling, Alteryx is a powerful ETL platform where data can be analyzed in different formats. When it comes to data representation, Tableau is a perfect match. Further, using Tableau, reports can be shared across team members.

Nowadays, most businesses want to see real-time data and understand business trends. The combination of Alteryx and Tableau allows data analysts to analyze the data and generate meaningful insights for users, on the fly. Here, data analysis can be executed within the Alteryx tool, where the raw data is handled, and then the data representation or visualization is done in Tableau, so both of these tools go hand in hand.

Tableau vs Alteryx

The list below summarizes the differences between the two tools.

1. Alteryx is known as a smart data analytics platform; Tableau is known for its data visualization capabilities.
2. Alteryx can connect with different data sources and synthesize the raw data, and a standard ETL process is possible; Tableau can connect with different data sources and provide data visualization within minutes from the gathered data.
3. Alteryx helps in terms of data analysis; Tableau helps in terms of building appealing graphs.
4. Alteryx's GUI is okay and widely accepted; Tableau's GUI is one of its best features, where graphs can be easily built using drag-and-drop options.
5. Alteryx requires technical knowledge, because it involves data source integration as well as data blending activity; Tableau does not require technical knowledge, because the data arrives polished and the user only has to build graphs and visualizations.
6. Once the data blending activity is completed in Alteryx, users can share a file which can be consumed by Tableau; once the graphs are prepared in Tableau, reports can be easily shared among team members without any hassle.
7. Alteryx offers a lot of flexibility for data blending activity; Tableau offers flexibility for data visualization.
8. Using Alteryx, users can do spatial and predictive analysis; in Tableau this is possible by representing the data in an appropriate format.
9. Alteryx is one of the best tools when it comes to data preparation; preparing data in Tableau is not feasible compared to Alteryx.
10. Data representation cannot be done accurately in Alteryx; Tableau is a wonderful tool for data representation.
11. Alteryx has one-time annual fees; Tableau has an option to pay monthly as well.
12. Alteryx has a drag-and-drop interface where the user can develop a workflow easily; Tableau has a drag-and-drop interface where the user can build a visualization in no time.

Alteryx and Tableau Integration

As discussed earlier, these two tools have their own advantages and disadvantages, but when integrated together, they can do wonders with the data. This integration between Tableau and Alteryx makes the task of visualizing the Alteryx-generated answers quite simple. The data is first loaded into the Alteryx tool and is then extracted in the form of .tde files (i.e. Tableau Data Extract files). These .tde files are consumed by the Tableau tool to do the data visualization part. On a regular basis, a new data extract file (.tde) is generated from the Alteryx tool and replaces the old .tde files.

Thus, by integrating Alteryx and Tableau, we can:

Cleanse, combine, and collect all the data sources that are relevant and enrich them with the help of third-party data, everything in one workflow.
Give analytical context to your data by providing predictive, location-based, and deep spatial analytics.
Publish your analytic workflows' results to Tableau for intuitive, rich visualizations that help you make decisions more quickly.

Tableau and Alteryx do not require any advanced skill set, as both tools have simple drag-and-drop interfaces. You can create a workflow in Alteryx that processes data in a sequential manner. In a similar way, Tableau enables you to build charts by dragging various fields to specified areas. Companies which have a lot of data to analyze, and can spend large amounts of money on analytics, can use these two tools. There are no significant challenges during Tableau and Alteryx integration.

Conclusion

When Tableau and Alteryx are used together, businesses benefit because senior management can make decisions based on the data insights provided by these tools. These two tools complement each other and provide high-quality service to businesses.

Author Bio

Savaram Ravindra is a Senior Content Contributor at Mindmajix.com. His passion lies in writing articles on different niches, which include some of the most innovative and emerging software technologies, digital marketing, businesses, and so on. By being a guest blogger, he helps his company acquire quality traffic to its website and build its domain name and search engine authority. Before devoting his work full time to the writing profession, he was a programmer analyst at Cognizant Technology Solutions. Follow him on LinkedIn and Twitter.

How to share insights using Alteryx Server
How to do data storytelling well with Tableau [Video]
A tale of two tools: Tableau and Power BI
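The integration workflow above is described in terms of the Alteryx and Tableau GUIs, but the same prepare-then-visualize hand-off can be sketched in a few lines of Python. The following is a minimal, hypothetical illustration only: it uses pandas, invented file names and columns, and writes a CSV instead of a .tde/.hyper extract, since Tableau can also connect to CSV files.

```python
import pandas as pd

# Hypothetical raw export, the kind of input an Alteryx-style prep workflow would receive.
raw = pd.read_csv("raw_sales.csv")

# Basic blending/cleaning: drop incomplete rows, normalize a column, aggregate.
clean = (
    raw.dropna(subset=["region", "revenue"])
       .assign(region=lambda df: df["region"].str.strip().str.title())
       .groupby(["region", "month"], as_index=False)["revenue"].sum()
)

# Export for the visualization layer. Tableau can connect to this CSV directly;
# in a real Alteryx workflow you would publish a .tde/.hyper extract instead.
clean.to_csv("sales_for_tableau.csv", index=False)
```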


‘Computing technology at a tipping point’, says WEF Davos Panel

Melisha Dsouza
30 Jan 2019
9 min read
The ongoing World Economic Forum meeting 2019 has seen a vast array of discussions on political, technological and other industrial agendas. The meeting brings together the world's foremost CEOs, government officials, policy-makers, experts and academics, international organizations, youth, technology innovators and representatives of civil society with an aim to drive positive change in the world on multiple facets. This article will focus on the talk 'Computing Technology at a Tipping Point', which was moderated by Nicholas Carlson from Business Insider, with a panel consisting of Antonio Neri, president and Chief Executive Officer of Hewlett Packard Enterprise; Jeremy O'Brien, CEO of PsiQuantum; and Amy Webb, Adjunct Assistant Professor at NYU Stern School of Business. Their discussion explored questions of today's age, ranging from why this is an important time for technology, to the role of governments in encouraging a technological revolution, the role of the community and business in optimizing tech, and the challenges faced as we set out to utilize next-generation computing technologies like quantum computing and AI.

Quantum Computing - The necessity of the future

The discussion kicked off with the importance of quantum computing at present as well as in the future. O'Brien defined quantum computing as "nothing short of a necessary tool that humans need to build their future". According to him, QC is a "genuinely exponentially powerful technology", due to the varied applications that quantum computing can impact if put to use in the correct way, from human health and energy to molecular chemistry, among others. Webb calls 2019 the year of divergence, where we will move from the classic Von Neumann architecture to a more diversified quantum age. Neri believes we are now at the end of Moore's law, which states that overall processing power for computers will double every two years. He says that two years from now we will generate twice the amount of data as generated today, and there will be a major divergence between the data generated and the computation power. This is why we need to focus on solving the architectural problems of processing algorithms and computing data rather than focusing on the amount of data.

Why is this an exciting time for tech?

O'Brien: Quantum computing and molecular simulation for techno-optimism

O'Brien expresses his excitement about the quantum computing and molecular simulation fields, where developers are just testing the waters with both these concepts. He has been in the QC field for the past 20 years and says that he has faith in quantum computing, and even though it's the next big thing to watch out for, he assures developers that it will not replace conventional computing. Quantum computers can in fact be used to improve the performance of classical computing systems to handle the huge amounts of data and information that we are faced with today. In addition to QC, another concept he believes 'will transform lives' is molecular simulation. Molecular simulation will design new pharmaceuticals and new chemicals, and help build really sophisticated computers to solve exponentially large problems.

Webb: The beginning of the end of smartphones

"We are in the midst of a great transformation. This is an explosion happening in slow motion." Based on data-driven models, she says this is the beginning of the end of smartphones. Ten years from now, as our phones retrieve everything from biometric information to information derived from what we wear and use, computing environments will look different. Citing the example of Magic Leap, which creates spatial glasses, she mentions how the computable devices we wear will turn our environment into a computable space and let us visualize data in a whole different way. She advises businesses to rethink how they function, even as the current cloud vs. edge and computer architectures change. Companies should start thinking in terms of 10 years rather than the short term, since decisions made today will have long-term consequences. While this is the positive side, Webb is pessimistic that there is no global alignment on the use of data. On the basis of GDPR and other data laws, systems have to be trained.

Neri: Continuous re-skilling to stay relevant

Humans should continuously re-skill themselves with changing times and technologies to avoid exclusion from new jobs as and when they arrive. He further states that, in the field of artificial intelligence, there should not be a concentration of power in a few entities like Baidu, Alibaba, Tencent, Google, Microsoft, Facebook, Apple and others. While these companies are at the forefront of deciding the future of AI, innovation should happen at all levels. We need guidelines and policy, not to regulate but to guide the revolution. Business, community and government should start thinking about ethical and moral codes.

Government's role in technological optimism

The speakers emphasized the importance of governments' involvement in these 'exciting times' and how they can work towards making citizens feel safe against the possible abuse of technology.

Webb: Regulation of AI doesn't make sense

We need to have conversations on optimizing artificial intelligence using available data. She expresses her opinion that the regulation of AI doesn't make sense. This is because we shift from a group of people understanding and implementing optimization to lawmakers who do not have the technical know-how. Nowadays, people focus on regulating tech instead of optimizing it because most don't understand the nitty-gritty of a system, nor do they understand a system's limitations. Governments play a huge role in this optimization or regulation decision making. She emphasizes the need to get hold of the right people to come to an agreement "where companies are a hero to their shareholders and the government to their citizens". Governments should start talking about and exploring quantum computing such that its benefits are distributed equitably in the shortest amount of time.

Neri: Human-centered future of computing

He adds that for a human-centered future of computing, it is we who need to decide what is good or bad for us. He agrees with Webb's point that since technology evolves in ways we cannot anticipate, we need to come to reasonable conclusions before a crisis arrives. Further, he adds that governments should inculcate moral ethics while adopting and implementing technology and innovation.

Role of politicians in technology

During the discussion, a member of the European Parliament stated that people have a common notion that politicians do not understand technology and cannot keep up with changing times. Stating that many companies do not think about governance, human rights, democracy and the possible abuse of their products, the questioner said that we need a minimum threshold to protect human rights and safeguard humans against abuse. Her question was centered around ways to invite politicians to understand tech better before it's too late. Expressing her gratitude that the European Parliament was asking such a thoughtful question, Webb suggested that creating some kind of framework that the key people on all sides of the spectrum can agree to, and a mechanism that incentivises everyone to play fairly, will help parliaments and other law-making bodies feel included in understanding technology. Neri also suggested a guiding principle of thinking ethically before using any technology, without stopping innovation.

Technological progress in China and its implications for the U.S.

Another question that caught our attention was about the progress of technology in China and its implications for the US. Webb says that the development of tools, technologies, frameworks and data-gathering mechanisms to mine, refine and monetize data follows different approaches in the US and China. In China, the activities related to AI and the activities of Baidu, Alibaba and Tencent are under the leadership of the Chinese Communist Party. She says that it is hard to overlook what is happening in China with the BRI (Belt and Road Initiative), 5G, digital transformation, expansion in fibre and expansion in e-commerce, and a new world order is being formed because of the same. She is worried that the US and its allies will be locked out economically from the BRI countries, and AI will be one of the factors propelling this.

Role of the military in technology

The last question pointed out that some of the worst abuses of technology can be committed by governments, and the military has the potential to misuse technology. We need to have conversations on the ethical use of technology and how to design technology to fit ethical morals. Neri says that corporations do have a point of view on the military using technology for various reasons, and governments consult them on the impacts of technology on the world as well. This is a hard topic and the debate is ongoing, even though it is not visible to the public. Webb says that the US has always had ties between industry and government, and we live in a world of social media where conversations spiral out of control because of the same. She advises companies to meet quarterly to have conversations along this line and to understand how their work with the military or government aligns with the core values of their company.

Sustainability and technology

Neri states that 6% of global power is used to power data centers, and it is important to determine how to address this problem. The solutions proposed are:

Innovate in different ways.
Be mindful of the entire supply chain, from the time you procure minerals to build the system to the time you recycle it. We need to think of a circular economy. Consider whether systems can be re-used by other companies, and check which parts can be recycled and reused.
Use synthetic DNA to back up data - this could potentially use less energy.

To sustain human life on this planet, we need to optimize how we use resources, physical and virtual. QC tools will help invent the future; materials can be built using QC. You can listen to the entire talk at the World Economic Forum's official page.

What the US-China tech and AI arms race means for the world – Frederick Kempe at Davos 2019
Microsoft's Bing 'back to normal' in China
Facebook's outgoing Head of communications and policy takes the blame for hiring PR firm 'Definers' and reveals more


Python Data Visualization myths you should know about

Savia Lobo
02 Nov 2018
4 min read
In recent years, we have experienced exponential growth of data. As the amount of data grows, the need for developers with knowledge of data analytics and especially data visualization spikes. Data visualizations help in getting a clear and concise view of the data, making it more tangible for (non-technical) audiences. MATLAB and R are two languages that have traditionally been used for data science and data visualization. However, Python is the most requested and used language in the industry. Its ease of use and the speed at which you can manipulate and visualize data, combined with the number of available libraries, make Python the best choice. So data visualization seems easy, doesn't it? However, there are a lot of myths surrounding it. Let us have a look at some of them.

Myth 1: Data visualizations are just for data scientists

Today's data visualization libraries are very convenient, so any person can create meaningful visualizations in just a few minutes.

Myth 2: Data visualization technologies are difficult to learn

Of course, building and designing sophisticated data visualizations will take some work and learning, but with very little knowledge of the libraries and what they are capable of, you can create simple visualizations that will help you get valuable insights into your data. Python is a comparatively easy language. The "pythonic" approach is also used when building visualization libraries for Python, which makes them easy to understand and use.

Myth 3: Data visualization isn't needed for data insights

Imagine having a table of data with 20 columns and several thousand rows. What do you think will give you better insights into this data? Just looking at the table and trying to make sense of all the columns and values in them, or creating some simple plots that visualize the content of this table? Of course, you could force yourself to get insights without visualizations, but the key is to work smarter, not harder.

Myth 4: Data visualization takes a lot of time

If you have a basic understanding of your data, you can create some basic visualizations in no time. There are a lot of libraries, which will be covered in this course, that allow you to simply import some data and build visualizations in a few lines of code. The more difficult part is creating visualizations which are descriptive and display the concepts you wanted to show, but don't worry, this will be discussed in the course in detail as well.

Amidst all the myths, data visualization in combination with Python is an essential skill when working with data. When properly utilized, it is a powerful combination that not only enables you to get better insights into your data but also gives you the tools to communicate results better. Head over to our course titled 'Data Visualization with Python' to use Python with NumPy, Pandas, Matplotlib, and Seaborn to create impactful data visualizations with real-world public data.

About Tim and Mario

Tim Großmann is a CS student with interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He's highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.

Mario Döbler is a graduate student with a focus on deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.

4 tips for learning Data Visualization with Python
Setting up Apache Druid in Hadoop for Data visualizations [Tutorial]
8 ways to improve your data visualizations
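As a small, concrete illustration of Myths 3 and 4, here is a minimal sketch showing how a couple of lines of pandas and Matplotlib turn a table of numbers into a plot you can reason about. The DataFrame contents are invented for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A tiny invented table; in practice this could be thousands of rows and 20+ columns.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "signups": [120, 135, 160, 158, 190, 240],
    "churn":   [30, 28, 35, 33, 31, 29],
})

# Two lines of plotting already reveal the trend that is hard to see in the raw table.
df.plot(x="month", y=["signups", "churn"], marker="o")
plt.title("Signups vs. churn per month")
plt.show()
```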


4 tips for learning Data Visualization with Python

Sugandha Lahoti
01 Nov 2018
4 min read
Data today is the world's most important resource. However, without properly visualizing your data to discover meaningful insights, it's useless. Creating visualizations helps in getting a clear and concise view of the data, making it more tangible for (non-technical) audiences. Python is the programming language of choice for developers these days. However, sometimes developers face issues performing data visualization with Python. In this post, Tim Großmann and Mario Döbler, the authors of the Data Visualization with Python course, discuss some of the best practices you should keep in mind while visualizing data with Python.

#1 Start looking at and experimenting with examples

One of the most important ways to deeply understand and learn to use Python for data visualization is to download example projects and play around with them. You should read their documentation and comments, and change values to observe what influence they have. In many cases, they can even serve as a starting point to insert your own data. Think about how you could modify the given examples to visualize your own data.

#2 Start from scratch and build on it

Sometimes starting with an empty canvas is the best approach. Start with only the necessary components, like your data and the import of your library of choice. This builds a nice flow and process that will enable you to debug problems with precision. Once you have gone through the whole process of building a simple visualization, you will have a good understanding of where an error might occur and how to fix it. Starting from scratch sometimes shows you that simpler solutions will save you a lot of time while still communicating the essence of your idea.

#3 Make full use of documentation

There are libraries with plenty of documentation to answer every single question you have. Make sure to make the best use of it: research their APIs, look at the given examples, and search for open issues on their GitHub pages when encountering a problem. The libraries covered in the course "Data Visualization with Python" in particular not only have extensive documentation, but also an active community that is constantly creating new questions on Stack Overflow, which will help you find solutions to your problems in no time.

#4 Use every opportunity you have with data to visualize it

Every time you encounter new data, take a few minutes and think about what information might be interesting, and visualize it. Think back to the last time you had to give a presentation about your findings and all you had was a table with numerical values in it. For you it was understandable, but your colleagues sat there and scratched their heads. Try to create some simple visualizations that would have impressed the entire team with your results. Only practice makes perfect.

We hope that these tips will not only enable you to get better insights into your data but also give you the tools to communicate results better. Don't forget to check out our course Data Visualization with Python to understand, explore, and effectively present data using the powerful data visualization techniques of Python.

About the authors

Tim Großmann is a CS student with interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He's highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.

Mario Döbler is a graduate student with a focus on deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.

8 ways to improve your data visualizations
Seaborn v0.9.0 brings better data visualization with new relational plots, theme updates, and more
Getting started with Data Visualization in Tableau
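In the spirit of tip #2, the sketch below shows what starting from scratch can look like: nothing but an import, some data, and a single plot call, with labels added incrementally. It assumes Matplotlib is installed, and the numbers are placeholders.

```python
import matplotlib.pyplot as plt

# Step 1: only the data and the library import.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 58, 65, 70, 74, 81]

# Step 2: the simplest possible visualization.
plt.scatter(hours_studied, exam_score)

# Step 3: build on it incrementally, labels first, styling later.
plt.xlabel("Hours studied")
plt.ylabel("Exam score")
plt.title("A visualization built up from scratch")
plt.show()
```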


Top five questions to ask when evaluating a Data Monitoring solution

Guest Contributor
27 Oct 2018
6 min read
Massive changes are happening around the way IT services are consumed and delivered. Cloud-based infrastructure is being tied together and instrumented by DevOps processes, while microservices-driven apps are replacing monolithic architectures. This evolution is driving the need for greater monitoring and better analysis of data than we have ever seen before. This need is compounded by the fact that an application today may be instrumented with the help of sensors and devices providing users with critical input in making decisions.

Why is there a need for monitoring and analysis?

The placement of sensors on practically every available surface in the material world – from machines to humans – is a reality today. Almost anything that is capable of giving off a measurable metric or recorded event can be instrumented, in the virtual world as well as the physical world, and has the need for monitoring. Metrics involve the consistent measurement of characteristics, such as CPU usage, while events are something that is triggered, such as temperature reaching above a threshold. The right instrumentation, observation and analytics are required to create business insight from the myriad of data points coming from these instruments. In the virtual world, monitoring and controlling the software components that drive business processes is critical. Data monitoring in software is an important aspect of visualizing what systems are doing – what activities are happening, and precisely when – and how well the applications and services are performing.

There is, of course, a business justification for all this monitoring of constant streams of metrics and events data. Companies want to become more data-driven; they want to apply data insights to be better situationally aware of business opportunities and threats. A data-driven organization is able to predict outcomes more effectively than by relying on historical information or on gut instinct. When vast amounts of data points are monitored and analyzed, the organization can find interesting "business moments" in the data. These insights help identify emerging opportunities and competitive advantages.

How to develop a data monitoring strategy

Establishing an overall IT monitoring strategy that works for everyone across the board is nearly impossible. But it is possible to develop a monitoring strategy which is uniquely tailored to specific IT and business needs. At a high level, organizations can start developing their data monitoring strategy by asking these five fundamental questions:

#1 Have we considered all stakeholder needs?

One of the more common mistakes DevOps teams make is focusing the monitoring strategy on the needs of just a few stakeholders and not addressing the requirements of stakeholders outside of IT operations, such as line of business (LOB) owners, application developers and owners, and other subgroups within operations, such as network operations (NOC) or communications teams. For example, an app developer may need usage statistics around application performance, while the network operator might be interested in network bandwidth usage by that app's users.

#2 Will the data capture strategy meet future needs?

Organizations, of course, must key in on the data capture needs of today at the enterprise level, but at the same time must consider the future. Developing a long-term plan helps in future-proofing the overall strategy, since data formats and data exchange protocols always evolve. The strategy should also consider future needs around ingestion and query volumes. Planning for how much data will be generated, stored and archived will help establish a better long-term plan.

#3 Will the data analytics satisfy my organization's evolving needs?

Data analysis needs always change over time. Stakeholders will ask for different types of analysis, and planning ahead for those needs and opting for a flexible data analysis strategy will help ensure that the solution is able to support future needs.

#4 Is the presentation layer modular and embeddable?

A flexible user interface that addresses the needs of all stakeholders is important for meeting the organization's overarching goals. Solutions which deliver configurable dashboards that enable users to specify queries for custom dashboards meet this need for flexibility. Organizations should consider a plug-and-play model which allows users to choose different presentation layers as needed.

#5 Does the architecture enable smart actions?

The ability to detect anomalies and trigger specific actions is a critical part of a monitoring strategy. A flexible and extensible model should be used to meet the notification preferences of diverse user groups. Organizations should consider self-learning models which can be trained to detect undefined anomalies from the collected data. Monitoring solutions which address the broader monitoring needs of the entire enterprise are preferred.

What are purpose-built monitoring platforms?

Devising an overall IT monitoring strategy that meets these needs and fundamental technology requirements is a tall order. But new purpose-built monitoring platforms have been created to deal with today's new requirements for monitoring and analyzing these specific metrics and events workloads – often called time-series data – and provide situational awareness to the business. These platforms support ingesting millions of data points per second, can scale both horizontally and vertically, are designed from the ground up to support real-time monitoring and decision making, and have strong machine learning and anomaly detection functions to aid in discovering interesting business moments. In addition, they are resource-aware, applying compression and down-sampling functions to aid in optimal resource utilization, and are built to support faster time to market with minimal dependencies. With the right strategy in mind, and tools in place, organizations can address the evolving monitoring needs of the entire organization.

About the Author

Mark Herring is the CMO of InfluxData. He is a passionate marketeer with a proven track record of generating leads, building pipeline, and building vibrant developer and open source communities, and a data-driven marketeer with a proven ability to see the forest for the trees, improve performance, and deliver on strategic imperatives. Prior to InfluxData, Herring was vice president of corporate marketing and developer marketing at Hortonworks, where he grew the developer community by over 40x. Herring brings over 20 years of relevant marketing experience from his roles at Software AG, Sun, Oracle, and Forte Software.

TensorFlow announces TensorFlow Data Validation (TFDV) to automate and scale data analysis, validation, and monitoring
How AI is going to transform the Data Center
Introducing TimescaleDB 1.0, the first OS time-series database with full SQL support
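To ground the distinction drawn above between metrics (consistently measured characteristics such as CPU usage) and events (something triggered, such as a reading crossing a threshold), here is a small, self-contained Python sketch. The metric name, readings, and threshold are invented; a real monitoring platform would do this at far larger scale and in real time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MetricPoint:
    timestamp: datetime
    name: str
    value: float

def detect_threshold_events(points, threshold):
    """Turn a stream of metric points into events whenever the threshold is exceeded."""
    events = []
    for point in points:
        if point.value > threshold:
            events.append({
                "timestamp": point.timestamp,
                "event": f"{point.name} above {threshold}",
                "value": point.value,
            })
    return events

# Invented CPU-usage samples taken once a minute.
start = datetime(2018, 10, 27, 12, 0)
cpu_points = [
    MetricPoint(start + timedelta(minutes=i), "cpu_usage_percent", v)
    for i, v in enumerate([41.0, 47.5, 52.3, 88.9, 91.2, 60.4])
]

for event in detect_threshold_events(cpu_points, threshold=85.0):
    print(event)
```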

5 best practices to perform data wrangling with Python

Savia Lobo
18 Oct 2018
5 min read
Data wrangling is the process of cleaning and structuring complex data sets for easy analysis and making speedy decisions in less time. Due to the internet explosion and the huge trove of IoT devices, there is a massive amount of data available at present. However, this data is most often in its raw form and includes a lot of noise in the form of unnecessary data, broken data, and so on. Cleaning up this data is essential in order to use it for analysis by organizations. Data wrangling plays a very important role here by cleaning this data and making it fit for analysis. Also, the Python language has built-in features to apply wrangling methods to various data sets to achieve the analytical goal. Here are 5 best practices that will help you out in your data wrangling journey with the help of Python. And at the end, all you'll have is clean and ready-to-use data for your business needs.

5 best practices for data wrangling with Python

1. Learn the data structures in Python really well

Designed to be a very high-level language, Python offers an array of amazing data structures with great built-in methods. Having a solid grasp of all the capabilities will be a potent weapon in your repertoire for handling data wrangling tasks. For example, a dictionary in Python can act almost like a mini in-memory database with key-value pairs. It supports extremely fast retrieval and search by utilizing a hash table underneath. Explore other built-in libraries related to these data structures, e.g. OrderedDict or the string library for advanced functions. Build your own versions of essential data structures like stacks, queues, heaps, and trees using classes and basic structures, and keep them handy for quick data retrieval and traversal.

2. Learn and practice file and OS handling in Python

Learn how to open and manipulate files, and how to manipulate and navigate directory structures.

3. Have a solid understanding of the core data types and capabilities of NumPy and Pandas

Learn how to create, access, sort, and search a NumPy array. Always think about whether you can replace a conventional list traversal (for loop) with a vectorized operation; this will increase the speed of your data operations. Explore special file types like .npy (NumPy's native storage) to read large data sets with much higher speed than a usual list. Know in detail all the file types you can read using built-in Pandas methods; this will greatly simplify your data scraping. Almost all of these methods have great data cleaning and other checks built in. Try to use such optimized routines instead of writing your own to speed up the process.

4. Build a good understanding of basic statistical tests and a flair for visualization

Running some standard statistical tests can quickly give you an idea about the quality of the data you need to wrangle. Plot data often, even if it is multi-dimensional. Do not try to create fancy 3D plots; learn to explore a simple set of pairwise scatter plots. Use boxplots often to see the spread and range of the data and detect outliers. For time-series data, learn the basic concepts of ARIMA modeling to check the sanity of the data.

5. Apart from Python, if you want to master one language, go for SQL

As a data engineer, you will inevitably run across situations where you have to read from a large, conventional database storage. Even if you use a Python interface to access such a database, it is always a good idea to know the basic concepts of database management and relational algebra. This knowledge will help you later as you move into the world of Big Data and massive data mining (technologies like Hadoop/Pig/Hive/Impala). Your basic data wrangling knowledge will surely help you deal with such scenarios.

Although data wrangling may be the most time-consuming process, it is the most important part of data management. Data collected by businesses on a daily basis can help them make decisions on the latest information available. It also allows businesses to find hidden insights and use them in decision-making processes, providing them with new analytic initiatives, improved reporting efficiency and much more.

About the authors

Dr. Tirthajyoti Sarkar works in the San Francisco Bay Area as a senior semiconductor technologist, where he designs state-of-the-art power management products and applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He has 15+ years of R&D experience and is a senior member of the IEEE.

Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris-based cyber security startup, where he applies state-of-the-art computer vision and data engineering algorithms and tools to develop a cutting-edge product.

Data cleaning is the worst part of data analysis, say data scientists
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
Manipulating text data using Python Regular Expressions (regex)
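As a concrete illustration of the NumPy advice in practice 3, the sketch below contrasts a conventional list traversal with its vectorized equivalent and shows .npy storage for fast reloads. The array size and file name are arbitrary choices for the example.

```python
import numpy as np

values = np.random.rand(100_000)

# Conventional list traversal: a Python-level for loop.
squared_loop = []
for v in values:
    squared_loop.append(v * v)

# Vectorized equivalent: one NumPy operation, typically far faster.
squared_vec = values ** 2

# .npy, NumPy's native storage, reloads large arrays much faster than parsing text.
np.save("values.npy", squared_vec)
reloaded = np.load("values.npy")
```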


4 misconceptions about data wrangling

Sugandha Lahoti
17 Oct 2018
4 min read
Around 80% of the time in data analysis is spent on cleaning and preparing data for analysis. This is, however, an important task, and is a prerequisite to the rest of the data analysis workflow, including visualization, analysis, and reporting. Although, being an important task given its nature, there are certain myths associated with data wrangling which developers should be cautious of. In this post, we will discuss four such misconceptions. Myth #1: Data wrangling is all about writing SQL query There was a time when data processing needed data to be presented in a relational manner so that SQL queries could be written. Today, there are many other types of data sources in addition to the classic static SQL databases, which can be analyzed. Often, an engineer has to pull data from diverse sources such as web portals, Twitter feeds, sensor fusion streams, police or hospital records. Static SQL query can help only so much in those diverse domains. A programmatic approach, which is flexible enough to interface with myriad sources and is able to parse the raw data through clever algorithmic techniques and use of fundamental data structures (trees, graphs, hash tables, heaps), will be the winner. Myth #2: Knowledge of statistics is not required for data wrangling Quick statistical tests and visualizations are always invaluable to check the ‘quality’ of the data you sourced. These tests can help detect outliers and wrong data entry, without running complex scripts. For effective data wrangling, you don’t need to have knowledge of advanced statistics. However, you must understand basic descriptive statistics and know how to execute them using built-in Python libraries. Myth #3: You have to be a machine learning expert to do great data wrangling Deep knowledge of machine learning is certainly not a pre-requisite for data wrangling. It is true that the end goal of data wrangling is often to prepare the data so that it can be used in a machine learning task downstream. As a data wrangler, you do not have to know all the nitty-gritties of your project’s machine learning pipeline. However, it is always a good idea to talk to the machine learning expert who will use your data and understand the data structure interface and format he/she needs to run the model fast and accurately. Myth #4: Deep knowledge of programming is not required for data wrangling As explained above, the diversity and complexity of data sources require that you are comfortable with deep notions of fundamental data structures and how a programming language paradigm handles them. Increasing deep knowledge of the programming framework (Python for example) will surely help you to come up with innovative methods for dealing with data source interfacing and data cleaning issues. The speed and efficiency of your data processing pipeline can often be benefited from using advanced knowledge of basic algorithms e.g. search, sort, graph traversal, hash table building, etc. Although built-in methods in standard libraries are optimized, having this knowledge gives you an edge for any situation. You read a guest post from Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of Data Wrangling with Python. We hope that these misconceptions would help you realize that data wrangling is not as difficult as it seems. Have fun wrangling data! About the authors Dr. Tirthajyoti Sarkar works as a Sr. 
Myth #3: You have to be a machine learning expert to do great data wrangling

Deep knowledge of machine learning is certainly not a prerequisite for data wrangling. It is true that the end goal of data wrangling is often to prepare the data for a machine learning task downstream. As a data wrangler, you do not have to know all the nitty-gritty of your project's machine learning pipeline. However, it is always a good idea to talk to the machine learning expert who will use your data and understand the data structure, interface, and format he/she needs to run the model fast and accurately.

Myth #4: Deep knowledge of programming is not required for data wrangling

As explained above, the diversity and complexity of data sources require you to be comfortable with fundamental data structures and with how a programming language paradigm handles them. Deepening your knowledge of the programming framework (Python, for example) will help you come up with innovative methods for dealing with data source interfacing and data cleaning issues. The speed and efficiency of your data processing pipeline can often benefit from advanced knowledge of basic algorithms, e.g. search, sort, graph traversal, and hash table building. Although built-in methods in standard libraries are optimized, having this knowledge gives you an edge in any situation.

You just read a guest post from Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of Data Wrangling with Python. We hope that busting these misconceptions helps you realize that data wrangling is not as difficult as it seems. Have fun wrangling data!

About the authors

Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris-based cyber security startup. He holds a Master's degree in Computer Science from West Bengal University Of Technology and certifications in Machine Learning from Stanford.

Don't forget to check out Data Wrangling with Python to learn the essential basics of data wrangling using Python.

30 common data science terms explained
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
How to create a strong data science project portfolio that lands you a job

article-image-4-myths-about-git-and-github-you-should-know-about
Savia Lobo
07 Oct 2018
3 min read
Save for later

4 myths about Git and GitHub you should know about

Savia Lobo
07 Oct 2018
3 min read
With an aim to replace BitKeeper, Linus Torvalds created Git in 2005 to support the development of the Linux kernel. However, Git isn't limited to code: any product or project that has multiple contributors or requires release management and versioning stands to gain an improved workflow through Git. Just as every solution or tool has its own positives and negatives, Git is also surrounded by myths. Alex Magana and Joseph Mul, the authors of the Introduction to Git and GitHub course, discuss in this post some of the myths about the Git tool and GitHub.

Git is GitHub

Because Git and GitHub are commonly used together as a version control toolkit, adopters of the two tools often misconceive them as interchangeable. Git is a tool that tracks changes to the files that constitute a project: it monitors changes and persists them. GitHub, on the other hand, is akin to a website hosting service, except that the hosted content is a repository. The repository can then be accessed from this central point and the codebase shared.

Backups are equivalent to version control

This myth emanates from a misunderstanding of what version control is and, by extension, what Git achieves when it is incorporated into the development workflow. Contrary to archives created under a team's backup policy, Git tracks changes made to files and maintains snapshots of a repository at a given point in time.

Git is only suitable for teams

With hosting services such as GitHub, the element of sharing and collaboration may be perceived as the preserve of teams. But Git offers gains beyond source control: it lends itself to the delivery of a feature or product from the point of development to deployment. Git is a delivery tool, and it can be used to roll out functionality and manage changes to source code for teams and individuals alike.

To use Git effectively, you need to learn every command

Whether you work as an individual or in a team, the commands required to contribute to a repository boil down to a handful: initiating tracking of specific files, persisting changes made to tracked files, reverting changes, and incorporating changes introduced by other developers working on the same project.

The four myths discussed by the authors provide clarification on both Git and GitHub and their uses. If you found this post useful, do check out the course titled Introduction to Git and GitHub by Alex and Joseph.

GitHub addresses technical debt, now runs on Rails 5.2.1
GitLab 11.3 released with support for Maven repositories, protected environments and more
GitLab raises $100 million, Alphabet backs it to surpass Microsoft's GitHub

article-image-what-is-statistical-analysis-and-why-does-it-matter
Sugandha Lahoti
02 Oct 2018
6 min read
Save for later

What is Statistical Analysis and why does it matter?

Sugandha Lahoti
02 Oct 2018
6 min read
As a data developer, the concept or process of data analysis may be clear in your mind. However, although there are similarities between the art of data analysis and that of statistical analysis, there are important differences to understand as well. This article is taken from the book Statistics for Data Science by James D. Miller. The book takes you on a journey through statistics, from knowing very little to becoming comfortable using various statistical methods for data science tasks. In this article, we've broken things into the following topics:

What is statistical analysis and what are its best practices?
How to establish the nature of data?

What is statistical analysis?

Those who study statistics sometimes describe statistical analysis as the part of a statistical project that involves the collection and scrutiny of a data source in an effort to identify trends within the data. With data analysis, the goal is to validate that the data is appropriate for a need; with statistical analysis, the goal is to make sense of, and draw some inferences from, the data. There is a wide range of possible statistical analysis techniques and approaches to consider.

How to perform a successful statistical analysis

It is worthwhile to mention some key points for ensuring a successful (or at least productive) statistical analysis effort.

Decide on your goal or objective as soon as you can. You need to know what the win is, that is, what problem or idea is driving the analysis effort. Whatever is driving the analysis, the result obtained must be measurable in some way, and this metric or performance indicator must be identified early.
Identify key levers. Once you have established your goals and a way to measure performance towards obtaining them, you also need to find out what affects the performance towards obtaining each goal.
Conduct a thorough data collection. Typically, the more data the better; but in the absence of quantity, always go with quality.
Clean your data. Make sure your data has been cleaned in a consistent way so that data issues do not skew your conclusions.
Model, model, and model your data. Modeling drives modeling: the more you model your data, the more questions you will ask and answer, and the better your results will be.
Take time to grow your statistical analysis skills. It is always a good idea to continue to evolve your experience and style of statistical analysis. The way to improve is to do it. Another approach is to remodel the data you may have on hand from other projects to hone your skills.
Optimize and repeat. As always, take the time to standardize, follow proven practices, use templates, and test and document your scripts and models so that you can reuse your best efforts over and over again. You will find this time well spent, and even your better efforts will improve with use.
Finally, share your work with others! The more eyes, the better the product.
Some interesting advice on ensuring success with statistical projects includes the following quote:

"It's a good idea to build a team that allows those with an advanced degree in statistics to focus on data modeling and predictions, while others in the team - qualified infrastructure engineers, software developers and ETL experts - build the necessary data collection infrastructure, data pipeline and data products that enable streaming the data through the models and displaying the results to the business in the form of reports and dashboards."
- G Shapira, 2017

Establishing the nature of data

When asked about the objectives of statistical analysis, one often refers to the process of describing or establishing the nature of a data source. Establishing the nature of something implies gaining an understanding of it. This understanding can be both simple and complex. For example, can we determine the types of each of the variables or components found within our data source; are they quantitative, comparative, or qualitative? A more advanced statistical analysis aims to identify patterns in data; for example, whether there is a relationship between the variables, or whether certain groups are more likely to show certain attributes than others.

Exploring the relationships presented in data may appear similar to identifying a foreign key in a relational database, but in statistics, relationships between components or variables are based upon correlation and causation. Further, establishing the nature of a data source is also, really, a process of modeling that data source. During modeling, the process always involves asking questions such as the following (in an effort to establish the nature of the data):

What? Common examples of this (what) are revenue, expenses, shipments, hospital visits, website clicks, and so on. In our example, we are measuring quantities, that is, the amount of product that is being moved (sales).
Why? The why will typically depend upon your project's specific objectives, which can vary immensely. For example, we may want to track the growth of a business, the activity on a website, or the evolution of a selected product or market interest. In our transactional data example, we may want to identify over- and under-performing sales types, and determine whether new or repeat customers provide more or fewer sales.
How? The how will most likely be over a period of time (perhaps a year, month, or week) and then by some other related measure, such as a product, state, region, or reseller. Within our transactional data example, we've focused on the observation of quantities by sale type.

Another way to describe establishing the nature of your data is adding context to it, or profiling it. In any case, the objective is to allow the data consumer to better understand the data through visualization. Another motive for adding context or establishing the nature of your data can be to gain a new perspective on it.
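As a rough illustration of establishing the nature of a data source, here is a minimal pandas sketch. The toy transactional frame and its column names are invented for this example and are not taken from the book.

```python
import pandas as pd

# Hypothetical transactional dataset; column names and values are illustrative only
sales = pd.DataFrame({
    "sale_type": ["new", "repeat", "new", "repeat", "new"],
    "quantity": [3, 5, 2, 7, 4],
    "revenue": [30.0, 55.0, 21.0, 70.0, 43.0],
})

# What kind of variables do we have? (qualitative vs quantitative)
print(sales.dtypes)

# Describe the quantitative components
print(sales.describe())

# Is there a relationship between quantity and revenue?
print(sales[["quantity", "revenue"]].corr())

# Are certain groups more likely to show certain attributes than others?
print(sales.groupby("sale_type")["quantity"].mean())
```

Even a few lines like these answer the what/why/how questions above: what is being measured, whether variables move together, and whether one group behaves differently from another.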
In this article, we explored the purpose and process of statistical analysis and listed the steps involved in a successful statistical analysis. Next, to learn about statistical regression and why it is important to data science, read the book Statistics for Data Science.

Estimating population statistics with Point Estimation
Why You Need to Know Statistics To Be a Good Data Scientist
Why choose IBM SPSS Statistics over R for your data analysis project
article-image-messaging-app-telegram-updated-privacy-policy-open-challenge
Amarabha Banerjee
08 Sep 2018
7 min read
Save for later

Messaging app Telegram's updated Privacy Policy is an open challenge

Amarabha Banerjee
08 Sep 2018
7 min read
Social media companies are facing a lot of heat at present because of their privacy issues. One of them is Facebook; the Cambridge Analytica scandal even prompted a senate hearing for Mark Zuckerberg. On the other end of this spectrum is another messaging app, Telegram, registered in London, United Kingdom, and founded by the Russian entrepreneur Pavel Durov. Telegram has been in the news for the opposite reason: it is often touted as one of the most secure and secretive messaging apps. Its end-to-end encryption ensures that security agencies across the world have a tough time getting access to any suspicious piece of information. For this reason, Russia banned the Telegram app in April 2018.

Telegram recently updated its privacy policy. These updates further ensure that Telegram will retain the title of the most secure messaging application on the planet. Any messaging app needs access to our data, but how it chooses to use that data makes you either vulnerable or secure. In its latest update, Telegram states that it processes personal data on the grounds that such processing caters to the following two goals:

Providing effective and innovative Services to users
Detecting, preventing or otherwise addressing fraud or security issues in respect of the provision of Services

The caveat for the second point is that security interests shall not override the fundamental rights and freedoms that require protection of personal data. This clause is an excellent example of how applications can be a torchbearer for human rights and basic privacy amidst glaring loopholes. Telegram has listed the kinds of user data accessed by the app. They are as follows:

Basic Account Data

Telegram stores basic account user data that includes the mobile number, profile name, profile picture and about information, which are needed to create a Telegram account. The most interesting part is that Telegram allows you to keep only your username public (if you choose to). The people who have you in their contact list will see you as you want them to - for example, you might be a John Doe in public, but your mom will still see you as 'Dear Son' in her contacts. Telegram doesn't require your real name, gender, or age, or even for your screen name to be your real name.

E-mail Address

When you enable 2-step verification for your account or store documents using the Telegram Passport feature, you can opt to set up a password recovery email. This address will only be used to send you a password recovery code if you forget your password. Telegram is particular about not sending any unsolicited marketing emails to you.

Personal Messages

Cloud Chats: Telegram stores messages, photos, videos and documents from your cloud chats on its servers so that you can access your data from any of your devices at any time without having to rely on third-party backups. All data is stored heavily encrypted, and the encryption keys are stored in several other data centers in different jurisdictions. This way, local engineers or physical intruders cannot get access to user data.

Secret Chats: Telegram has a feature called secret chats that uses end-to-end encryption. This means that all data is encrypted with a key that only the sender and the recipient know. There is no way for Telegram, or anybody else without direct access to your device, to learn what content is being sent in those messages. Telegram does not store secret chats on its servers.
Telegram also does not keep any logs for messages in secret chats, so after a short period of time there is no way of determining who you messaged or when. Secret chats are not available in the cloud - you can only access those messages from the device they were sent to or from.

Media in Secret Chats: When you send photos, videos or files via secret chats, each item is encrypted with a separate key, not known to the server, before being uploaded. This key and the file's location are then encrypted again, this time with the secret chat's key, and sent to your recipient, who can then download and decipher the file. This means that the file is technically on one of Telegram's servers, but it looks like a piece of random, indecipherable garbage to everyone except you and the recipient. The whole process is random, and these random data packets are periodically purged from the storage disks too.

Public Chats: In addition to private messages, Telegram also supports public channels and public groups. All public chats are cloud chats. Like everything else on Telegram, the data you post in public communities is encrypted, both in storage and in transit - but everything you post in public will be accessible to everyone.

Phone Number and Contacts

Telegram uses phone numbers as unique identifiers so that it is easy for you to switch from SMS and other messaging apps and retain your social graph.

Cookies

Telegram promises that the only cookies it uses are those required to operate and provide its Services on the web, and it clearly states that it doesn't use cookies for profiling or advertising. Its cookies are small text files that allow it to provide and customize its Services and deliver an enhanced user experience. Most importantly, user permission is a must before these cookies are allowed into your browser; whether or not to use them is a choice made by the user.

So, how does Telegram remain in business?

The Telegram business model doesn't match that of a revenue-generating service. The founder, Pavel Durov, is also the founder of the popular Russian social networking site VK. Telegram doesn't charge for any messaging services and doesn't show ads yet, although some in-app purchase features might be included in a new version. As of now, the main sources of revenue for Telegram are donations and the earnings of Pavel Durov himself (from the social networking site VK).

What can social networks learn from Telegram?

Telegram's policies elevate privacy standards that many are asking of other social messaging apps. The clamour for stopping the exploitation of user data and the use of location details for targeted marketing and advertising campaigns is growing. Telegram shows that privacy can be achieved, if intended, in today's overexposed social media world. But there are also costs to this level of user privacy and secrecy that are sometimes not discussed enough. The ISIS members behind the 2015 Paris attacks used Telegram to spread propaganda. ISIS also used the app to recruit the perpetrators of the Christmas market attack in Berlin and claimed credit for the massacre. More recently, a Turkish prosecutor found that the shooter behind the New Year's Eve attack at the Reina nightclub in Istanbul used Telegram to receive directions from an ISIS leader in Raqqa.
While these incidents can never negate the need for a secure and less intrusive social media platform like Telegram, there should be workarounds and escape routes designed to stop extremist and terrorist activities. Telegram has assured that all ISIS messaging channels are deleted from its network, which is a great start. Content moderation, proactive sentiment and pattern recognition, and content/account isolation are the next challenges for Telegram. One thing is for sure: Telegram's continual pursuit of user secrecy and data privacy throws an open challenge to others to follow suit. Whether others will oblige or not, only time will tell. To read Telegram's updated privacy policy in detail, check out the official Telegram Privacy Settings.

How to stay safe while using Social Media
Time for Facebook, Twitter and other social media to take responsibility or face regulation
What RESTful APIs can do for Cloud, IoT, social media and other emerging technologies

article-image-how-everyone-at-netflix-uses-jupyter-notebooks-from-data-scientists-machine-learning-engineers-to-data-analysts
Bhagyashree R
18 Aug 2018
4 min read
Save for later

How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts

Bhagyashree R
18 Aug 2018
4 min read
Netflix uses a variety of tools for data analysis. One of the main ways data scientists and engineers at Netflix interact with their data is through Jupyter notebooks. In addition to providing execution environments to users, Netflix invests in various parts of the Jupyter ecosystem and tooling. They are "reimagining what a notebook can be, who can use it, and what they can do with it."

Netflix aims to provide personalized content to its 130 million viewers. For this, more than 1 trillion events are written into a streaming ingestion pipeline every day. To support this, they have built an industry-leading data platform which is flexible, powerful, and complex. The platform has many diverse users, such as analytics engineers, data engineers, and data scientists, requiring different sets of tools and languages. To help the platform scale, they wanted to minimize the number of tools, and the solution was an open-source tool: Jupyter notebooks.

Why are Jupyter notebooks so compelling for Netflix?

These are the notebook capabilities that benefit Netflix's data scientists and engineers:

Standard messaging API: The Jupyter protocol provides a standard messaging API to the kernels that act as computational engines. It separates where the content is written from where the content is executed, which makes it language agnostic.
Editable file format: It provides an editable file format that stores the code and results together.
Web-based UI: It is web-based, which makes it easy to interactively write and run code as well as visualize outputs.

How does Netflix use Jupyter notebooks?

The following are some of their use cases:

Data access: Notebooks were first introduced for workflows, and their adoption grew among data scientists. Seeing this, Netflix decided to leverage their versatility and architecture for general data access. Notebooks provide a user-friendly interface for interactively running code, exploring outputs, and visualizing data, all from a single cloud-based development environment.
Notebook templates: Netflix introduced parameterized notebooks, which allow the use of parameters in the code and take values as input at runtime. These templates help data scientists run an experiment with different coefficients and summarize the results, data engineers execute data quality audits, data analysts share prepared queries and visualizations, and software engineers email the results of a troubleshooting script.
Scheduling notebooks: Netflix also uses notebooks as a unifying layer for scheduling workflows. Notebooks are used for interactive work and allow a smooth move to scheduling that work to run recurrently. Many users create an entire workflow in a notebook and just copy/paste it into separate files for scheduling when they're ready to deploy it.
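The post does not name a specific tool for parameterized, schedulable notebooks, but papermill is one open-source library commonly used for this pattern. The sketch below is purely illustrative, with hypothetical notebook paths and parameters; it is not a description of Netflix's internal setup.

```python
# A minimal sketch of running a parameterized notebook with papermill.
# The template name, output name, and parameters are hypothetical.
import papermill as pm

pm.execute_notebook(
    "quality_audit_template.ipynb",    # template notebook with a parameters cell
    "quality_audit_2018-08-18.ipynb",  # output notebook with results baked in
    parameters={"table": "playback_events", "date": "2018-08-18"},
)
```

The same call can be wrapped in a scheduler job, which is what makes the "write interactively, then schedule recurrently" workflow described above practical.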
Notebook infrastructure: The three fundamental components of the infrastructure are storage, compute, and interface. (Source: Netflix Tech Blog)

Storage: The Netflix Data Platform is made up of Amazon S3 and EFS for cloud storage, which notebooks treat as virtual filesystems. Each user has a home directory on EFS containing a personal workspace for notebooks. This workspace stores any notebook created or uploaded by the user, and when a user launches a notebook interactively, all reading and writing happens in the workspace.

Compute: All jobs on the data platform run in containers, including queries, pipelines and notebooks. A container with reasonable default resources is provisioned when a user launches a notebook, and users can request more resources if the defaults are not enough. A unified execution environment with a prepared container image is provided, which has common libraries and an array of default kernels preinstalled. The orchestration and environments are managed with Titus, Netflix's container management platform.

Interface: Netflix uses nteract, a React-based frontend for Jupyter notebooks, which emphasizes simplicity and composability as core design principles. They're also introducing native support for parameterization, which makes it easier to schedule notebooks and create reusable templates.

Netflix is planning to invest in both the frontend and backend to improve the overall notebook experience. This year they are also sponsoring JupyterCon. To read more about how Jupyter offers value to Netflix, read Netflix's original post on Medium.

10 reasons why data scientists love Jupyter notebooks
What's new in Jupyter Notebook 5.3.0
Netflix open sources Zuul 2 cloud gateway

article-image-can-cryptocurrency-establish-a-new-economic-world-order
Amarabha Banerjee
22 Jul 2018
5 min read
Save for later

Can Cryptocurrency establish a new economic world order?

Amarabha Banerjee
22 Jul 2018
5 min read
Cryptocurrency has already established one thing: there is a viable alternative to dollars and gold as a measure of wealth. Our present economic system is flawed, and cryptocurrencies, if utilized properly, can change the way the world deals with money and wealth. But can they completely overthrow the present system and create a new economic world order? To answer that, we have to understand the concept of cryptocurrencies and the premise for their creation.

Money - The weapon to control the world

Money is a measure of wealth, which translates into power. The power centers have largely remained the same throughout history, be it a monarchy, an autocracy or a democracy. Power has shifted from one king to one dictator, to a few elected or selected individuals. To remain in power, they had to control the source and distribution of money. That's why, to date, only the government can print money and distribute it among citizens. We can earn money in exchange for our time and skills, or loan money in exchange for our future time. But there's only so much time that we can give away, and hence the present-day economy always runs on the philosophy of scarcity and demand. Money distribution follows a trickle-down approach in a pyramid structure. (Source: Credit Suisse)

Inception of Cryptocurrency - Delocalization of money

It is abundantly clear from the image above that while the printing of money is under the control of the powerful and the wealth creators, the pyramidal distribution mechanism has also ensured that very little money flows to the bottom-most segments of the population. The money creators have ensured their safety and prosperity throughout history by accumulating chunks of money for themselves, and the global wealth gap has increased staggeringly as a result. This could well have triggered the rise of cryptocurrencies as an alternative economic system, one that, theoretically, doesn't just accumulate wealth at the top, but also rewards anyone interested in mining these currencies and spending their time and resources. The main concept that made this possible was the distributed computing mechanism, which has gained tremendous interest in recent times.

Distributed Computing, Blockchain & the possibilities

The foundation of our present economic system is a central power, be it a government, a ruler or a dictator. The alternative to this central system is a distributed system, where every single node of communication contains the power of decision making and is equally important for the system. So if one node is cut off, the system will not fall apart; it will keep on functioning. That's what makes distributed computing terrifying for centralized economic systems: they can't just attack the creator of the system or use a violent hack to bring down the entire system. (Source: Medium.com)

When the white paper on cryptocurrencies was first published by the anonymous Satoshi Nakamoto, there was hope of constituting a parallel economy, where any individual with access to a mobile phone and the internet might be able to mine bitcoins and create wealth, not just for himself/herself, but for the system as well. Satoshi also introduced the concept of blockchain, an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way. Blockchain was the technology on top of which the first cryptocurrency, Bitcoin, was created. The concept of Bitcoin mining seemed revolutionary at the time.
The more people that joined the system, the more enriched the system would become. The hope was that it would make the mainstream economic system take note and cause a major overhaul of the wealth distribution system. But sadly, none of that seems to have taken place yet.

The phase of Disillusionment

The reality is that Bitcoin mining capability is controlled by system resources, and the creators accumulated enough bitcoins for themselves, much like the traditional wealth creation system. Satoshi's Bitcoin holdings were valued at $19.4 billion during the December 2017 peak, making him the 44th richest person in the world at that time. This basically meant that the wealth distribution system was at fault again: very few could get their hands on Bitcoins as their prices in traditional currencies climbed. Governments then duly played their part by declaring trading in Bitcoins illegal and cracking down on several cryptocurrency top guns, and recently different countries have joined the bandwagon to ban cryptocurrency. Hence the value is much lower now. The major concern is that the skepticism in the public mind might kill the hype earlier than anticipated. (Source: Bitcoin.com)

The Future and Hope for a better Alternative

What we must keep in mind is that Bitcoin is just one derivative of the concept of cryptocurrencies. The primary concept of distributed systems, and the resulting technology - blockchain - is still a very viable and novel one. The problem in the current Bitcoin system is the distribution mechanism. Whether we will be able to tap into the distributed system concept and create a better version of the Bitcoin model, only time will tell. But for the sake of better wealth propagation and wealth balance, we can only hope that this realignment of the economic system happens sooner rather than later.

Blockchain can solve tech's trust issues – Imran Bashir
A brief history of Blockchain
Crypto-ML, a machine learning powered cryptocurrency platform
article-image-data-science-for-non-techies-how-i-got-started
Amey Varangaonkar
20 Jul 2018
7 min read
Save for later

Data science for non-techies: How I got started (Part 1)

Amey Varangaonkar
20 Jul 2018
7 min read
As a category manager, I manage the data science portfolio of product ideas for Packt Publishing, a leading tech publisher. In simple terms, I place informed bets on where to invest, what topics to publish on, and so on. While I have a decent idea of where the industry is heading and what data professionals are looking to learn and why, it is high time I walked in their shoes, for a couple of reasons. Basically, I want to understand the reason behind data science being the 'sexiest job of the 21st century', and whether the role is really worth all the fame and fortune. In the process, I also want to explore the underlying difficulties, challenges and obstacles that every data scientist has had to endure at some point in his/her journey, or perhaps still does. The cherry on top is that I get to use the skills I develop to supercharge my success in my current role, which is primarily insight-driven. This is the first of a series of posts on how I got started with data science. Today, I'm sharing my experience with devising a learning path and then gathering appropriate learning resources.

Devising a learning path

To understand the concepts of data science, I had to research a lot. There are tons of resources out there, many of which are very good. Once you separate the good from the rest, it can be quite intimidating to pick the options that suit you best. Some of the primary questions that clouded my mind were:

What should be my programming language of choice? R or Python? Or something else?
What tools and frameworks do I need to learn?
What about the statistics and mathematical aspects of machine learning? How essential are they?

Two videos really helped me find the answers to these questions:

If you don't want to spend a lot of time mastering the art of data science, there's a useful video on how to become a data scientist in six months.
What questions are asked in a data science interview? What are the in-demand skills you need to master in order to get a data science job? This video on 5 Tips For Getting a Data Science Job is really helpful.

After a lot of research, which included reading countless articles and blogs and holding discussions with experts, here is my learning plan.

Learn Python

Per the recently conducted Stack Overflow Developer Survey 2018, Python stood out as the most-wanted programming language, meaning the developers who do not use it yet want to learn it the most. As one of the most widely used general-purpose programming languages, Python finds wide application in data science. Naturally, you get attracted to the best option available, and Python was the one for me. The major reasons why I chose to learn Python over other programming languages:

Very easy to learn: Python is one of the easiest programming languages to learn. Not only is the syntax clean and easy to understand, but even the most complex of data science tasks can be done in a few lines of Python code.
Efficient libraries for data science: Python has a vast array of libraries suited to various data science tasks, from scraping data to visualizing and manipulating it. NumPy, SciPy, pandas, matplotlib and Seaborn are some of the libraries worth mentioning here.
Terrific libraries for machine learning: Learning a framework or library which makes machine learning easier to perform is very important. Python has libraries such as scikit-learn and Tensorflow that make machine learning easier and a fun activity; a tiny sketch of this follows below.
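To give a flavour of that "few lines of code" claim, here is a minimal sketch with scikit-learn. The toy Iris dataset and the logistic regression model are my own illustrative choices, not part of the original learning plan.

```python
# A small illustration of the "few lines of code" claim: fitting a simple
# classifier with scikit-learn on a built-in toy dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Of course, real projects involve far messier data than this, which is exactly why the fundamentals below come first.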
To make the most of these libraries, it is important to understand the fundamentals of Python. My colleague and good friend Aaron has put out a list of the top 7 Python programming books, which served as a brilliant starting point for understanding the different resources out there to learn Python. The one book that stood out for me was Learn Python Programming - Second Edition; it is a very good book for starting Python programming from scratch. There is also a neat skill map on Mapt, where you can progressively build up your knowledge of Python, right from the absolute basics to the most complex concepts. Another handy resource to learn the A-Z of Python is the Complete Python Masterclass. This is a slightly long course, but it will take you from the absolute fundamentals to the most advanced aspects of Python programming.

Task Status: Ongoing

Learn the fundamentals of data manipulation

After learning the fundamentals of Python programming, the plan is to head straight to the Python-based libraries for data manipulation, analysis and visualization. Some of the major ones are what we already discussed above, and the plan is to learn them in the following order:

NumPy - Used primarily for numerical computing
pandas - One of the most popular Python packages for data manipulation and analysis
matplotlib - The go-to Python library for data visualization, rivaling the likes of R's ggplot2
Seaborn - A data visualization library that runs on top of matplotlib, used for creating visually appealing charts, plots and histograms

Some very good resources to learn about all these libraries:

Python Data Analysis
Python for Data Science and Machine Learning - This is a very good course with detailed coverage of the machine learning concepts. Something to learn later.

The aim is to learn these libraries up to a fairly intermediate level, and to be able to manipulate, analyze and visualize any kind of data, including missing, unstructured and time-series data; the short sketch below gives a flavour of what that looks like.
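Here is a hedged sketch of the kind of manipulation this step targets, handling missing values and resampling a small time series. The made-up daily sales series is purely illustrative.

```python
# Fill missing values in a tiny, made-up daily series, then aggregate it weekly
# and plot the result. Data is illustrative only.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range("2018-07-01", periods=10, freq="D")
sales = pd.Series([5, 7, np.nan, 6, 8, np.nan, 9, 11, 10, 12], index=dates)

sales = sales.interpolate()          # fill the missing values
weekly = sales.resample("W").sum()   # aggregate daily values into weeks

weekly.plot(kind="bar", title="Weekly sales (toy data)")
plt.tight_layout()
plt.show()
```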
Understand the fundamentals of statistics, linear algebra and probability

In order to take a step further and enter the foray of machine learning, the general consensus is to first understand the maths and statistics behind the concepts of machine learning. Implementing them in Python is relatively easy once you get the math right, and that is what I plan to do. I shortlisted some very good resources for this as well:

Statistics for Machine Learning
Stanford University - Machine Learning Course at Coursera

Task Status: Ongoing

Learn Machine Learning (sounds odd, I know)

After understanding the math behind machine learning, the next step is to learn how to perform predictive modeling using popular machine learning algorithms such as linear regression, logistic regression, clustering, and more. Using real-world datasets, the plan is to learn the art of building state-of-the-art machine learning models using Python's very own scikit-learn library, as well as the popular Tensorflow package. To learn how to do this, the courses I mentioned above should come in handy:

Stanford University - Machine Learning Course at Coursera
Python for Data Science and Machine Learning
Python Machine Learning, Second Edition

Task Status: To be started

During the course of this journey, websites like Stack Overflow and Stack Exchange will be my best friends, along with popular resources such as YouTube.

As I start this journey, I plan to share my experiences and knowledge with you all. Do you think the learning path looks good? Is there anything else I should include in it? I would really love to hear your comments, suggestions and experiences. Stay tuned for the next post, where I seek answers to questions such as 'How much Python should I learn in order to be comfortable with data science?', 'How much time should I devote per day or week to learn the concepts of data science?' and much more.

Why is data science important?
9 Data Science Myths Debunked
30 common data science terms explained

article-image-top-8-ways-to-improve-your-data-visualizations
Natasha Mathur
04 Jul 2018
7 min read
Save for later

8 ways to improve your data visualizations

Natasha Mathur
04 Jul 2018
7 min read
In Dr. W. Edwards Deming's words, "In God we trust, all others must bring data." Organizations worldwide revolve around data like planets revolve around the sun. Since data is so central to organizations, certain data visualization tools help them understand that data and make better business decisions. Far more data is being churned out and collected by organizations than ever before. So, how do you make sense of it all? Humans are visual creatures, and our brains process visual information far better than textual information. In fact, presentations that use visual aids such as colors, shapes and images are found to be far more persuasive, according to research done by the University of Minnesota back in 1986.

Data visualization is a process that translates collected information into engaging visuals. It is easy, cheap and doesn't require any design expertise to create data visuals. However, some professionals feel that data visualization is just about slapping on charts and graphs, when that's not actually the case. Data visualization is about conveying the right information in a way that enhances the audience's experience. So, if you want your graphs and charts to be more succinct and understandable, here are eight ways to improve your data visualization process:

1. Get rid of unneeded information

Less is more in some cases, and the same goes for data visualization. Excessive colors, jargon, pie charts and metrics take focus away from the important information. For instance, when using colors, don't make your charts and graphs a rainbow; instead, use a specific set of colors with a clear purpose and meaning. Do you see the difference the color and chart choices make to the visualizations in the images below? (Source: Podio)

Similarly, when it comes to expressing your data, note how people interact at your workplace and keep the tone of your visuals as natural as possible, so it is easy for the audience to interpret your data. For metrics, only show the ones that truly bring value to your storytelling, and filter out the less important ones to create less fuss. Tread cautiously when using pie charts, as they can be difficult to understand, and get rid of chart elements that cause unnecessary confusion. (Source: Dashboard Zone)

2. Use conditional formatting for tabular data

Data visualization doesn't need fancy tools or designs. Take your standard Excel table, for example. Do you want to point out patterns or outliers in your data? Conditional formatting is a great tool for people working with data. It involves setting simple rules on given data; once that's done, it highlights only the data that matters most to you, which helps you quickly track the main information. Conditional formatting can be used for different things: it can help spot duplicate data in your table, and you can set bounds for the data so that cells are formatted based on those bounds, highlighting the data you want. For instance, if a sales quota of over 65% is good, between 55% and 65% is average, and below 55% is poor, then with conditional formatting you can quickly find out who is meeting the expected sales quota and who is not.

3. Add trendlines to unearth patterns for prediction

Another feature that can amp up your data visualization is trendlines. A trendline describes the relationship between two variables in your existing data and is also useful for predicting future values. Trendlines are simple to add and help you discover trends in a given data set; they can also show data trends or moving averages in your charts. (Source: Interworks)

Depending on the kind of data you're working with, there are a number of trendlines you can use in your visualizations. Questions like whether a new strategy seems to be working in favor of the organization can be answered with the help of trendlines, and this insight, in turn, helps predict future outcomes. Statistical models are used in trendlines to make predictions. Once you add trendlines to a view, it's up to you to decide how you want them to look and behave. A minimal sketch of adding a trendline in code follows below.
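Here is a minimal sketch of what adding a linear trendline can look like in code, using NumPy's polyfit and matplotlib; the monthly sales figures are invented, and a BI tool such as Tableau would add a similar line through its UI rather than code.

```python
# Fit and draw a straight trendline over a scatter of toy monthly sales figures.
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
sales = np.array([20, 22, 21, 25, 27, 26, 30, 32, 31, 35, 36, 40])

slope, intercept = np.polyfit(months, sales, deg=1)  # least-squares straight line

plt.scatter(months, sales, label="monthly sales")
plt.plot(months, slope * months + intercept, label="trendline")
plt.xlabel("Month")
plt.ylabel("Sales (units)")
plt.legend()
plt.show()
```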
4. Implement filter by rule to get more specific

Filters help display just the information you need. Using filter by rule, you can add a filter option to your dataset. Organizations produce huge amounts of data on a regular basis. Suppose you want to know which employees within your organization are consistent performers: instead of creating a visualization that includes all the employees and their performances, you can filter it down so that it shows only the employees who are consistently doing well. Similarly, if you want to find out on which days sales went up or down, you can filter the view to show results only for the past week or month, depending on your preference.

5. For complex or dense data representation, add hierarchy

Hierarchies eliminate the need to create extra visualizations. You can view data from a high level and dig deeper into the specifics as questions come up. Adding a hierarchy to the data lets you club multiple pieces of information into one visualization. (Source: dzone)

For instance, you can create a hierarchy that shows the total sales achieved by different sales representatives within an organization in the past month. You can then break this down further by selecting a particular sales rep, and go even further by selecting a specific product assigned to that rep. This cuts down on a lot of extra work.

6. Make visuals more appealing by formatting data

Data formatting takes only a few seconds, but it can make a huge difference to how the audience interprets your data. (Source: dzone) It makes the numbers appear more visually appealing and easier to read, and it can be used for charts such as bar charts and column charts. Formatting data to show a certain number of decimals, comma separators, a particular number font, currency or percentages can make your visualization process more engaging.

7. Include comparison for more insight

Comparisons give readers a better perspective on data, and including them can both improve your visualizations and add insight. For instance, if you want to inform your audience about the organization's growth in the current year as well as the past year, you can include a comparison within the visualization. You can also use a comparison chart to compare two data points, such as budget vs actual spend; a minimal sketch follows below.
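As a quick illustration of a budget-vs-actual comparison, here is a minimal matplotlib sketch; the quarterly figures are made up for this example.

```python
# A grouped bar chart comparing budgeted and actual spend per quarter (toy data).
import numpy as np
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
budget = [100, 110, 120, 130]
actual = [95, 115, 118, 140]

x = np.arange(len(quarters))
width = 0.35

plt.bar(x - width / 2, budget, width, label="Budget")
plt.bar(x + width / 2, actual, width, label="Actual")
plt.xticks(x, quarters)
plt.ylabel("Spend (in $1000s)")
plt.title("Budget vs actual spend (toy data)")
plt.legend()
plt.show()
```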
8. Sort data to improve readability

Sorting data is another great way to make things easy for the audience when dealing with huge quantities of data. For instance, if you want to include information about the highest- and lowest-performing products, you can sort your data. Sorting can be done in the following ways:

Ascending - Sorts the data from lowest to highest.
Descending - Sorts the data from highest to lowest.
Data source order - Sorts the data in the order it appears in the data source.
Alphabetic - Sorts the data alphabetically.
Manual - Lets you sort the data by hand in the order you prefer.

Effective data visualization helps people interpret information in data that could not be seen before, change their minds, and prompt action. These were some of the tricks and features to take your data visualization game to the next level. There are different data visualization tools available in the market to choose from; Tableau and Microsoft Power BI are among the top ones that offer great features for data visualization. So, now that we've got you covered with some of the best practices for data visualization, it's your turn to put these tips into practice and create some strong visual data stories. Do you have any DataViz tips to share with our readers? Please add them in the comments below.

Getting started with Data Visualization in Tableau
What is Seaborn and why should you use it for data visualization?
"Tableau is the most powerful and secure end-to-end analytics platform": An interview with Joshua Milligan