
Tech Guides - Data

281 Articles
Bhagyashree R
25 Nov 2018
5 min read

5 types of deep transfer learning

Transfer learning is a method of reusing a model or knowledge for another related task. It is sometimes also considered an extension of existing ML algorithms. Extensive research is being done on transfer learning and on understanding how knowledge can be transferred among tasks. The Neural Information Processing Systems (NIPS) 1995 workshop Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems is believed to have provided the initial motivation for research in this field.

The literature on transfer learning has gone through a lot of iterations, and the terms associated with it have been used loosely and often interchangeably. Hence, it is sometimes confusing to differentiate between transfer learning, domain adaptation, and multitask learning. Rest assured, these are all related and try to solve similar problems. In this article, we look at five types of deep transfer learning to get more clarity on how they differ from each other.

This article is an excerpt from a book written by Dipanjan Sarkar, Raghav Bali, and Tamoghna Ghosh titled Hands-On Transfer Learning with Python. This book covers deep learning and transfer learning in detail. It also focuses on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.

#1 Domain adaptation

Domain adaptation usually refers to scenarios where the marginal probabilities of the source and target domains differ, that is, P(Xs) ≠ P(Xt). There is an inherent shift or drift in the data distribution between the source and target domains that requires tweaks to transfer the learning. For instance, a corpus of movie reviews labeled as positive or negative would be different from a corpus of product-review sentiments.
A classifier trained on movie-review sentiment would see a different distribution if used to classify product reviews. Thus, domain adaptation techniques are used in transfer learning in these scenarios.

#2 Domain confusion

Different layers in a deep learning network capture different sets of features. We can use this fact to learn domain-invariant features and improve their transferability across domains. Instead of allowing the model to learn any representation, we nudge the representations of both domains to be as similar as possible. This can be achieved by applying certain preprocessing steps directly to the representations themselves. Some of these have been discussed by Baochen Sun, Jiashi Feng, and Kate Saenko in their paper Return of Frustratingly Easy Domain Adaptation. This nudge toward similar representations has also been presented by Ganin et al. in their paper Domain-Adversarial Training of Neural Networks. The basic idea behind this technique is to add another objective to the source model that encourages similarity by confusing the domain itself, hence domain confusion.

#3 Multitask learning

Multitask learning is a slightly different flavor of transfer learning. In multitask learning, several tasks are learned simultaneously without distinction between the source and targets. The learner receives information about multiple tasks at once, as compared to transfer learning, where the learner initially has no idea about the target task. This is depicted in the following diagram:

Figure: Multitask learning: Learner receives information from all tasks simultaneously

#4 One-shot learning

Deep learning systems are data hungry by nature: they need many training examples to learn the weights. This is one of the limiting aspects of deep neural networks, though such is not the case with human learning.
For instance, once a child is shown what an apple looks like, they can easily identify a different variety of apple (with one or a few training examples); this is not the case with ML and deep learning algorithms. One-shot learning is a variant of transfer learning where we try to infer the required output based on just one or a few training examples. This is especially helpful in real-world scenarios where it is not possible to have labeled data for every possible class (if it is a classification task) and in scenarios where new classes can be added often.

The landmark paper by Fei-Fei Li and co-authors, One Shot Learning of Object Categories, is believed to have coined the term one-shot learning and to have started the research in this subfield. This paper presented a variation on a Bayesian framework for representation learning for object categorization. This approach has since been improved upon and applied using deep learning systems.

#5 Zero-shot learning

Zero-shot learning is another extreme variant of transfer learning, which relies on no labeled examples to learn a task. This might sound unbelievable, especially when learning from examples is what most supervised learning algorithms are about. Zero-data learning, or zero-shot learning, methods make clever adjustments during the training stage itself to exploit additional information to understand unseen data. In their book on Deep Learning, Goodfellow and co-authors present zero-shot learning as a scenario where three variables are learned: the traditional input variable, x, the traditional output variable, y, and an additional random variable that describes the task, T. The model is thus trained to learn the conditional probability distribution P(y | x, T). Zero-shot learning comes in handy in scenarios such as machine translation, where we may not even have labels in the target language.
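As a rough illustration of the zero-shot idea above, the sketch below uses one common formulation (attribute-based classification), which is an assumption on our part and not necessarily the approach taken in the book. Each class is described by an attribute vector that plays the role of the task description T, and an input is assigned to the class whose description best matches the attributes predicted for it. The class names, the attribute values, and the function names `cosine` and `zero_shot_predict` are all hypothetical; in a real system the predicted attributes would come from a model trained only on the seen classes.

```python
import math

# Hypothetical class descriptions (the extra "task" information T).
# Attributes: [has_stripes, has_hooves, lives_in_water]
CLASS_ATTRIBUTES = {
    "horse": [0.0, 1.0, 0.0],  # seen during training
    "tiger": [1.0, 0.0, 0.0],  # seen during training
    "zebra": [1.0, 1.0, 0.0],  # unseen: no labeled examples at all
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zero_shot_predict(predicted_attributes, class_attributes):
    """Pick the class whose attribute description is closest to the prediction."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )


# Assumed output of an attribute predictor for a new image:
# strongly striped, strongly hooved, not aquatic.
print(zero_shot_predict([0.9, 0.8, 0.1], CLASS_ATTRIBUTES))
```

The design choice here is that nothing about the unseen class is learned from labeled inputs; only its description is supplied, which is what lets the classifier name a class it has never seen an example of.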
In this article, we learned about five types of deep transfer learning: domain adaptation, domain confusion, multitask learning, one-shot learning, and zero-shot learning. If you found this post useful, do check out the book, Hands-On Transfer Learning with Python, which covers deep learning and transfer learning in detail. It also focuses on real-world examples and research problems using TensorFlow, Keras, and the Python ecosystem with hands-on examples.

Natasha Mathur
24 Nov 2018
8 min read

Sally Hubbard on why tech monopolies are bad for everyone: Amazon, Google, and Facebook in focus

When people talk about tech giants such as Amazon, Facebook, and Google, they usually talk about the great and powerful innovations that these companies have brought to the table and that have transformed the contemporary world. Of late, however, criticism has been gaining traction that these same tech titans, having become what you may call tech monopolies, are holding back innovation by smaller companies.

In a podcast episode of Innovation For All, Sheana Ahlqvist talked to Sally Hubbard, an antitrust expert and investigative journalist at The Capitol Forum, about tech giants building monopolies. Here are some key highlights from the podcast.

Let’s recall the definition of monopoly: “A market structure characterized by a single seller, selling a unique product in the market. In a monopoly market, the seller faces no competition, as he is the sole seller of goods with no close substitute. Monopoly market makes the single seller the market controller as well as the price maker. He enjoys the power of setting the price for his goods”.

In a nutshell, a monopolist can decrease the prices of its service and drive everyone else out of business. A popular example is John D Rockefeller, Standard Oil’s chief executive, who ruined competitors by cutting the price of oil until they went bankrupt, immediately after which the higher prices returned. Although there is no price-fixing in the case of Google or Facebook, since they offer completely free services, they are still monopolies. Let’s have a look.

How are Amazon, Google, and Facebook tech monopolies?

Amazon, Facebook, and Google have each carved out their own market, with gargantuan and durable market power vested in the hands of each one of them. According to the US Department of Justice, a market share of greater than 50% has been necessary for courts to find the existence of monopolistic power.
A dominant market share is a useful starting point in determining monopoly power. Going by this rule, Google has dominated the search engine market, maintaining an 86.02% market share as of July 2018, as per Statista. This is way over 50%, making Google a monopoly. The majority of Google’s revenue is generated through advertising. Similarly, Facebook dominates the social media market with a worldwide market share of 66.67%, making it a monopoly too. Amazon, on the other hand, has a 41% market share in the e-commerce retail market, which is expected to increase significantly to 50% of the entire e-commerce retail market’s GMV by 2021. This brings it pretty close to being a monopoly in the e-commerce market soon.

Another factor that is considered under the Sherman Act, a part of antitrust law, when identifying a firm that possesses monopoly power is the existence of anti-competitive effects, i.e., companies trying to maintain or acquire a dominant position by excluding competitors or preventing new entry. One recent example that comes to mind is when Google was fined $5 billion in July this year for breaching the EU’s antitrust laws. Google was fined for three types of illegal restrictions on the use of Android that cemented the dominance of its search engine. As per the EU, Google denied its rivals a chance to innovate and compete on merits, which is illegal under the EU’s antitrust laws.

Monopolies and Social Injustice

Hubbard points out that these tech titans don’t have any major competitors or substitutes, and even if you don’t pay most of these tech giants with money, you pay them with your data. This is more than enough for world domination, which is always an underlying aspiration for tech companies as they strive to be “the one” in the eyes of their customers by carefully leveraging their data. This data also gives these companies an unfair advantage over smaller and newer businesses.
As Clive Humby, a British mathematician, rightly said, “data is the new oil” in the digital economy. Hubbard explains that the downsides of this monopoly might not be visible to the consumer but affect entrepreneurs and small businesses, who are greatly harmed by the practices of these companies.

Take Amazon, for instance. No one wishes to be dependent on their competitor. However, since Amazon has integrated the service of selling products on its platform, sellers are not only competing against Amazon but are also dependent on Amazon, as it is Amazon that decides the rules for the sellers. Add to this the fact that Amazon holds an enormous amount of consumer data, which gives it an unfair advantage, as it can promote its own products over others. There is currently an ongoing EU investigation into Amazon’s use of consumer and seller data collected on its platform to better its own products sold on its platform.

Similarly, Google’s monopoly is evident in the fact that it gets to decide the winners and losers of the internet through Google Search, prioritizing its own products over others. An example of this is Google getting fined 2.7 billion dollars by the EU last year, after the EU ruled that the company had abused its power by promoting its own shopping comparison service at the top of search results.

Facebook, on the other hand, doesn’t have direct competition, leaving users with less choice in terms of social networking sites, making it a monopoly. Add to that the fact that other major social media platforms such as Instagram and WhatsApp are also owned by Facebook. Hubbard explains that because Facebook doesn't have competition, it can prioritize its profits over other factors such as user data, as it is really not concerned about losing users. This is evident in the number of scandals that Facebook has gotten itself into regarding user data.
Facebook is facing a whole lot of data and privacy-related controversies, the Cambridge Analytica scandal being the best known. Last month, Facebook suffered the largest security breach in its history, which left 50M user accounts compromised. The Department of Housing and Urban Development (HUD) filed a complaint against Facebook in August, alleging that the platform is selling ads that discriminate against users based on race, religion, and sexuality. The ACLU also sued Facebook in September for enabling sex and age discrimination through targeted ads. Last week, the New York Times published a bombshell report on how Facebook has been following a strategy of ‘delaying, denying and deflecting’ the blame, under the leadership of Sheryl Sandberg, for all the controversies surrounding it.

Scandals aside, even if a user finds the content hosted by Facebook displeasing, they don’t really have the choice to “stop using Facebook”, as their friends and family continue to use the platform to stay in touch. Also, Facebook charges advertisers based on how many people see a message rather than on ad clicks. This is why Facebook’s algorithm is programmed to prioritize more engaging branded content and ads over others.

Monopoly and Gender Inequality

As the market power of these tech giants increases, so does their wealth. Hubbard points out that wealth from the many among the working and middle classes gets transferred to the few belonging to the 1% and 0.1% at the top of the income and wealth distribution. The concentration of market power hurts workers and results in depressed wages, affecting women and minority workers the most.

“When general wages go down or stagnate, female workers are even worse off. Women make 78 cents to a man’s dollar, with black women making 64 cents and Latina women making 54 cents for every dollar a white man makes. As wages by the bottom 99% of earners continue to shrink, women get paid a mere percentage of fewer dollars.
And the top 1% of earners are predominantly men”, mentions Sally Hubbard.

There have also been declines in employee mobility, as there are fewer firms competing due to giant firms acquiring smaller firms. This leads to reduced bargaining power in the hands of employees. Moreover, these firms also impose non-compete clauses and no-poach agreements, putting a damper on workers’ ability to switch jobs.

As eloquently put by Hubbard, “these tech platforms are the ones controlling the rules of the arena in which the game is played and are also the ones playing the game”. Taking this analogy into consideration, it’s anyone’s guess who’ll win the game.

Savia Lobo
02 Nov 2018
4 min read

Python Data Visualization myths you should know about

In recent years, we have experienced an exponential growth of data. As the amount of data grows, the need for developers with knowledge of data analytics, and especially data visualization, spikes. Data visualizations help in getting a clear and concise view of the data, making it more tangible for (non-technical) audiences.

MATLAB and R are two languages that have traditionally been used for data science and data visualization. However, Python is the most requested and used language in the industry. Its ease of use and the speed at which you can manipulate and visualize data, combined with the number of available libraries, make Python the best choice.

So data visualization seems easy, doesn’t it? However, there are a lot of myths surrounding it. Let us have a look at some of them.

Myth 1: Data visualizations are just for data scientists

Today's data visualization libraries are very convenient, so anyone can create meaningful visualizations in just a few minutes.

Myth 2: Data visualization technologies are difficult to learn

Of course, building and designing sophisticated data visualizations will take some work and learning, but with very little knowledge of the libraries and what they are capable of, you can create simple visualizations that will help you get valuable insights into your data. Python is a comparatively easy language. The “pythonic” approach is also used when building visualization libraries for Python, which makes them easy to understand and use.

Myth 3: Data visualization isn’t needed for data insights

Imagine having a table of data with 20 columns and several thousand rows. What do you think will give you better insights into this data: just looking at the table and trying to make sense of all the columns and the values in them, or creating some simple plots that visualize the content of this table? Of course, you could force yourself to get insights without visualizations, but the key is to work smarter, not harder.
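To make the point of Myth 3 a little more concrete, here is a minimal sketch that turns one column of numbers into a crude text histogram using only the Python standard library. It is an illustration under stated assumptions, not the course's approach: the function name `text_histogram` and the `ages` column are made up for this example, and in practice you would reach for a plotting library such as Matplotlib or Seaborn instead.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text line per bin, with one '#' per value in that bin."""
    # Map every value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * counts[start]}")
    return lines


# Made-up sample column; imagine it as one of the 20 columns in the table.
ages = [23, 25, 31, 35, 36, 38, 41, 44, 52, 67]

for line in text_histogram(ages, 10):
    print(line)
```

Even this rough picture shows at a glance where the values cluster, which is hard to see by scanning the raw numbers.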
Myth 4: Data visualization takes a lot of time

If you have a basic understanding of your data, you can create some basic visualizations in no time. There are a lot of libraries, which will be covered in this course, that allow you to simply import some data and build visualizations in a few lines of code. The more difficult part is creating visualizations that are descriptive and display the concepts you wanted to show, but don’t worry, this will be discussed in the course in detail as well.

Amidst all the myths, data visualization in combination with Python is an essential skill when working with data. When properly utilized, it is a powerful combination that not only enables you to get better insights into your data but also gives you the tools to communicate results better.

Head over to our course titled ‘Data Visualization with Python’ to use Python with NumPy, Pandas, Matplotlib, and Seaborn to create impactful data visualizations with real-world, public data.

About Tim and Mario

Tim Großmann is a CS student with an interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He’s highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.

Mario Döbler is a graduate student with a focus on deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.

Sugandha Lahoti
01 Nov 2018
4 min read

4 tips for learning Data Visualization with Python

Data today is the world’s most important resource. However, without properly visualizing your data to discover meaningful insights, it’s useless. Creating visualizations helps in getting a clearer and more concise view of the data, making it more tangible for (non-technical) audiences. Python is the programming language of choice for developers these days. However, developers sometimes face issues performing data visualization with Python.

In this post, Tim Großmann and Mario Döbler, the authors of the Data Visualization with Python course, discuss some of the best practices you should keep in mind while visualizing data with Python.

#1 Start looking and experimenting with examples

One of the most important ways to deeply understand and learn to use Python for data visualization is to download example projects and play around with them. You should read their documentation and comments and change values, observing what influence each change has. In many cases, they can even serve as a starting point into which you insert your own data. Think about how you could modify the given examples to visualize your own data.

#2 Start from scratch and build on it

Sometimes starting with an empty canvas is the best approach. Start with only the necessary components, like your data and the import of your library of choice. This builds a nice flow and process that will enable you to debug problems with precision. Once you have gone through the whole process of building a simple visualization, you will have a good understanding of where an error might occur and how to fix it. Starting from scratch sometimes shows you that simpler solutions will save you a lot of time while still communicating the essence of your idea.

#3 Make full use of documentation

There are libraries with plenty of documentation to answer every single question you have. Make sure to make the best use of it: research their API, look at the given examples, and search for open issues on their GitHub pages when encountering a problem.
In particular, the libraries covered in the course “Data Visualization with Python” not only have extensive documentation but also active communities that are constantly posting new questions on Stack Overflow, which will help you find solutions to your problems in no time.

#4 Use every opportunity you have with data to visualize it

Every time you encounter new data, take a few minutes to think about what information might be interesting and visualize it. Think back to the last time you had to give a presentation about your findings and all you had was a table with numerical values in it. For you it was understandable, but your colleagues sat there and scratched their heads. Try to create some simple visualizations that would have impressed the entire team with your results. Only practice makes perfect.

We hope that these tips will not only enable you to get better insights into your data but also give you the tools to communicate results better. Don’t forget to check out our course Data Visualization with Python to understand, explore, and effectively present data using the powerful data visualization techniques of Python.

About the authors

Tim Großmann is a CS student with an interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He’s highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.

Mario Döbler is a graduate student with a focus on deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.

Bhagyashree R
31 Oct 2018
2 min read

Deep reinforcement learning - trick or treat?

Deep Reinforcement Learning (Deep RL) is the new buzzword in the machine learning world. Deep RL is an approach that combines reinforcement learning and deep learning in order to achieve human-level performance. It brings together the self-learning approach, which learns successful strategies that lead to the greatest long-term rewards, and deep learning, which allows agents to construct and learn their own knowledge directly from raw inputs.

With the fusion of these two approaches, we saw the introduction of many algorithms, starting with DeepMind’s Deep Q Network (DQN), a deep variant of the Q-learning algorithm. This algorithm reached human-level performance in playing Atari games. By combining Q-learning with reasonably sized neural networks and some optimization tricks, you can achieve human or superhuman performance in several Atari games.

Deep RL also led to one of the most notable advancements in the game of Go. AlphaGo, the AI agent by DeepMind, was able to beat world champion Lee Sedol (4-1) and European champion Fan Hui (5-0). DeepMind then released more advanced versions of the agent, called AlphaGo Zero and AlphaZero. Many recent works from researchers at UC Berkeley have shown how both reinforcement learning and deep reinforcement learning have enabled the control of complex robots, both for locomotion and navigation.

Despite these successes, it is quite difficult to find cases where deep RL has added any practical real-world value. The current status is that it is still a research topic. One of its limitations is that it assumes the existence of a reward function, which is either given or hand-tuned offline. To get the desired results, your reward function must capture exactly what you want. RL has an annoying tendency to overfit to your reward, resulting in things you haven’t expected. This is the reason why Atari is a benchmark: not only is it easy to get a lot of samples, but the goal is also fairly straightforward, i.e., to maximize the score.
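For readers who have not seen Q-learning before, the following is a minimal sketch of the tabular algorithm that DQN builds on. It is deliberately not DQN: there is no neural network, and the five-state corridor environment, the reward of 1.0 at the goal, and all names (`step`, `train_q_table`, the hyperparameter values) are assumptions made up for illustration. It also shows the role of the reward function discussed above, since the agent learns only what the hand-written reward encodes.

```python
import random

N_STATES = 5      # states 0..4 in a corridor; state 4 is the goal (terminal)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, max_steps=1000, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so exploring at random is allowed.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_q_table()
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
print(greedy_policy)
```

DQN replaces the table `q` with a neural network that estimates the same values from raw inputs such as game frames, which is what makes the approach workable when the number of states is too large to enumerate.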
With so many researchers working towards introducing improved Deep RL algorithms, it surely is a treat.

Natasha Mathur
31 Oct 2018
5 min read

Teaching AI ethics - Trick or Treat?

The Public Voice Coalition announced the Universal Guidelines for Artificial Intelligence (UGAI) at ICDPPC 2018 last week. “The rise of AI decision-making also implicates fundamental rights of fairness, accountability, and transparency. Modern data analysis produces significant outcomes that have real-life consequences for people in employment, housing, credit, commerce, and criminal sentencing. Many of these techniques are entirely opaque, leaving individuals unaware whether the decisions were accurate, fair, or even about them. We propose these Universal Guidelines to inform and improve the design and use of AI”, reads EPIC’s guideline page.

Artificial Intelligence ethics aims to improve the design and use of AI, to minimize the risk for society, and to ensure the protection of human rights. AI ethics focuses on values such as transparency, fairness, reliability, validity, accountability, accuracy, and public safety.

Why teach AI ethics?

Without AI ethics, the wonders of AI can turn into the dangers of AI, posing strong threats to society and even human lives. One such example is from earlier this year, when an autonomous Uber car, a 2017 Volvo SUV traveling at roughly 40 miles an hour, killed a woman in the street in Arizona. This incident brings out the challenges and nuances of building an AI system with the right set of values embedded in it. As different factors are considered for an algorithm to reach the required set of outcomes, it is more than possible that these criteria are not always shared transparently with users and authorities.

Other non-life-threatening but still dangerous examples include the time when Google Allo responded with a turban emoji on being asked to suggest three emoji responses to a gun emoji, and when Microsoft’s Twitter bot Tay tweeted racist and sexist comments.
AI scientists should be taught at the early stages that these values are meant to be at the forefront when deciding on factors such as the design, logic, techniques, and outcome of an AI project.

Universities and organizations promoting learning about AI ethics

What’s encouraging is that organizations and universities are taking steps (slowly but surely) to promote the importance of teaching ethics to students and employees working with AI or machine learning systems. For instance, the World Economic Forum Global Future Councils on Artificial Intelligence and Robotics has come out with a “Teaching AI ethics” project that includes creating a repository of actionable and useful materials for faculties wishing to add social inquiry and discourse into their AI coursework. This is a great opportunity, as the project connects professors from around the world and offers them a platform to share, learn, and customize their curriculum to include a focus on AI ethics.

Cornell, Harvard, MIT, Stanford, and the University of Texas are some of the universities that recently introduced courses on ethics in designing autonomous and intelligent systems. These courses put an emphasis on AI’s ethical, legal, and policy implications, along with teaching students about dealing with challenges such as biased data sets in AI.

Mozilla has taken the initiative to make people more aware of the social implications of AI in our society through its Creative Media Awards. “We’re seeking projects that explore artificial intelligence and machine learning. In a world where biased algorithms, skewed data sets, and broken recommendation engines can radicalize YouTube users, promote racism, and spread fake news, it’s more important than ever to support artwork and advocacy work that educates and engages internet users”, reads the Mozilla awards page.
Moreover, Mozilla also announced a $3.5 million award for the ‘Responsible Computer Science Challenge’ to encourage teaching ethical coding to CS graduates. Other examples include Google’s AI ethics principles, announced back in June, to abide by when developing AI projects, and SAP’s AI ethics guidelines and advisory panel, created last month. SAP says that it has designed these guidelines as it “considers the ethical use of data a core value. We want to create software that enables intelligent enterprise and actually improves people’s lives. Such principles will serve as the basis to make AI a technology that augments human talent”.

Other organizations, like DrivenData, have come out with tools like Deon, a handy tool that helps data scientists add an ethics checklist to their data science projects, making sure that all projects are designed with ethics at the center.

Some, however, feel that having to explain how an AI system reached a particular outcome (in the name of transparency) can put a damper on its capabilities. For instance, according to David Weinberger, a senior researcher at the Harvard Berkman Klein Center for Internet & Society, “demanding explicability sounds fine, but achieving it may require making artificial intelligence artificially stupid”.

Teaching AI ethics - trick or treat?

AI has transformed the world as we know it. It has taken over different spheres of our lives and made things much simpler for us. However, to make sure that AI continues to deliver its transformative and evolutionary benefits effectively, we need ethics. From governments to tech organizations to young data scientists, everyone must use this tech responsibly. Having AI ethics in place is an integral part of the AI development process and will shape a healthy future for robotics and artificial intelligence. That is why teaching AI ethics is a sure-shot treat. It is a TREAT that will boost the productivity of humans in AI and help build a better tomorrow.
Sugandha Lahoti
31 Oct 2018
2 min read

Digital wellbeing - Trick or Treat?

Digital Wellbeing is coming into full view as Facebook, Instagram, Google's Android and Apple's iOS 12 all introduce digital wellbeing dashboards and features to their platforms and operating systems. Basically, Digital Wellbeing enables users to understand their digital habits, control the demands technology places on their attention, and focus on what actually matters.

Google introduced a set of features named ‘Digital Wellbeing’ with its Android 9 Pie OS. The new features include a Dashboard, to monitor how long you’ve been using your phone and specific apps; App timer, to help users see which apps they are using and set a daily time limit on each; Do Not Disturb, to silence notifications from texts or emails; and Wind Down, which turns your screen to grayscale as your bedtime approaches, making the apps less tempting.

Apple went a step further than Google when it comes to parental controls. While Google's usage dashboard and limits seem primarily designed for users to limit their own behavior, Apple's will let parents remotely manage their kids' usage from their own devices. Facebook is also not far behind with a new tool dubbed “Your Time on Facebook,” which shows users the time they spent in the Facebook app on each of the last seven days, as well as their average time spent per day.

However, there is no proven research on these features. Much of what we know is based not on peer-reviewed research but on anecdotal data. Sometimes educational apps and videos meant for young children also contain ads on topics that are irrelevant to the learning objective. These ads may harm young children. There is growing pressure from public interest groups for the FTC and other government bodies to launch an investigation into these apps and hold developers accountable for their practices.
Overall, Digital Wellbeing features sound like a real step forward by these tech giants in making phones less addictive. If done right, they would help users focus on what actually matters and may well prove to be a TREAT. But for now, we are reserving our judgement.

Tech Titans, Acquisitions and Regulation – Trick or Treat?
Edge computing – Trick or Treat?
WebAssembly – Trick or Treat?

Sugandha Lahoti
29 Oct 2018
5 min read

Tech Titans, Acquisitions and Regulation - Trick or Treat?

In probably the biggest open source acquisition ever, IBM announced on Sunday that it is acquiring Red Hat for $34 billion. This is consistent with the trend of Silicon Valley giants’ increasing appetite for growth. The past few months also saw the emergence of trillion-dollar tech titans that have mesmerised even Wall Street. Apple and Amazon saw their stocks soar in the race to a $1 trillion market cap, with Google and Microsoft continuing to relentlessly chase that goal. Even though Facebook and Twitter stocks took heavy blows thanks to the controversies surrounding their platforms, they continue to be valued a lot higher than solid stocks in other industries.

Silicon Valley giants also acquired new companies and startups with the aim of capturing the market and coveted users. Microsoft acquired GitHub and an AI startup, Lobe; Alphabet, Google’s parent company, helped GitLab raise $100 million in funding; Apple bought Shazam for an estimated $400 million; and Cloudera and Hortonworks merged to advance hybrid cloud development, edge computing and artificial intelligence. These investments and acquisitions are a clear indication that companies are collaborating to further technical advancements. Microsoft’s acquisition of GitHub is also a signal that the attitude of mature Silicon Valley giants towards open source has changed significantly in recent years.

However, some fear that this embracing of open source is more about business than about values. Billion dollar acquisitions don’t exactly scream ‘free and open software’. Some also say that such acquisitions give access to the acquired company’s user base, which is what big companies are most interested in. This issue was brought up again when EU regulators started an investigation over the concern that Apple’s acquisition of Shazam would potentially give Apple an unfair advantage over rivals such as Spotify.
This year has also been the year of questionable data harvesting practices and frequent and massive data breaches across firms, each affecting millions of users, even as tech titans raced to the $1 trillion club. 2018 opened with Facebook’s Cambridge Analytica scandal, in which Facebook user data was used to influence votes in the UK and US. Moreover, 50 million Facebook user accounts were compromised, a multimillion-dollar ad fraud scheme secretly tracked Android phones, and 500K Google+ accounts were exposed by an undisclosed bug. In July, Timehop, a social media application, also suffered a data breach with 21 million users’ data compromised. Just a few days ago, Cathay Pacific, a major Hong Kong based airline, suffered a data breach affecting 9.4 million passengers. In September, Uber paid $148m over a data breach cover-up. Two weeks back, the Pentagon also revealed a cybersecurity breach in which hackers stole personal data of tens of thousands of military and civilian US Defense Department personnel.

All of these events have left many users and even developers jaded. This has led to a growing ‘techlash’ that is throwing its weight behind the need for tech regulation. Tech regulation in its simplest sense means the tech industry cannot be trusted to regulate itself and there must be an independent entity that oversees how tech companies behave. This regulatory body would have the power to formulate and implement policies and penalize those that don’t comply.

Supporters of tech regulation argue that regulation can restore accountability and rebuild trust in tech. It will also make the conversation around the uses and abuses of technology more public while protecting citizens and software engineers. Tech regulation supporters also believe that regulation can bridge the gap between entrepreneurs, engineers and lawmakers.

Read more: 5 reasons government should regulate technology

However, tech regulation is not without pitfalls.
Tech regulation may come at the cost of tech innovation. For example, user privacy and tech innovation are interlinked. Machine learning systems need more data to get better at their jobs. If more users choose not to share their data, the recommendations they get are likely to be generic at best or even irrelevant. Also, advertising revenue for tech companies might be hit by the limited opportunities to profile users. This could have an adverse impact on companies’ ability to continue to innovate and provide free products for their users. There is a need to strike a delicate balance to make privacy work practically. This is the conclusion the US Senate has come to as it continues to meet with industry leaders and privacy experts to understand how to protect consumer data privacy without crippling tech innovation.

Moreover, companies may game tech regulation policies by providing users with little choice. For example, they could simply deprive users of their services should they choose not to share their data with the company. This should also be kept in mind while formulating both tech regulatory bodies and policy frameworks.

Although data and security breaches are nasty tricks, they have been instrumental in opening the conversation around tech regulations and privacy policies, which, if done right, may eventually make it a TREAT to users. As for tech acquisitions, they are never what they seem to be. Not only do they vary from company to company, but they also have complex factors at play: people, culture, market and timing, among others. It would be unfair or naive to claim tech acquisitions as purely tricks or treats. The truth lies somewhere in shades of gray. One thing is clear though, funding does make the world go round!
Sir Tim Berners-Lee on digital ethics and socio-technical systems at ICDPPC 2018
Gartner lists ‘Digital Ethics and Privacy’ as one of the top 10 strategic technology trends for 2019
Is Mozilla the most progressive tech organization on the planet right now?

Guest Contributor
27 Oct 2018
6 min read

Top five questions to ask when evaluating a Data Monitoring solution

Massive changes are happening around the way IT services are consumed and delivered. Cloud-based infrastructure is being tied together and instrumented by DevOps processes, while microservices-driven apps are replacing monolithic architectures. This evolution is driving the need for greater monitoring and better analysis of data than we have ever seen before. This need is compounded by the fact that an application today may be instrumented with the help of sensors and devices providing users with critical input in making decisions.

Why is there a need for monitoring and analysis?

The placement of sensors on practically every available surface in the material world – from machines to humans – is a reality today. Almost anything that is capable of giving off a measurable metric or recorded event can be instrumented, in the virtual world as well as the physical world, and needs monitoring. Metrics are consistent measurements of a characteristic, such as CPU usage, while events are triggered occurrences, such as a temperature rising above a threshold. The right instrumentation, observation and analytics are required to create business insight from the myriad of data points coming from these instruments.

In the virtual world, monitoring and controlling software components that drive business processes is critical. Data monitoring in software is an important aspect of visualizing what systems are doing – what activities are happening, and precisely when – and how well the applications and services are performing.

There is, of course, a business justification for all this monitoring of constant streams of metrics and events data. Companies want to become more data-driven; they want to apply data insights to be more aware of business opportunities and threats. A data-driven organization is able to predict outcomes more effectively than one relying on historical information or on gut instinct.
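The distinction between metrics and events drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` class, the `threshold_events` helper, the sample temperatures and the 70.0 limit are all hypothetical and not taken from any monitoring product.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Event:
    """A triggered occurrence, e.g. a reading crossing a threshold."""
    timestamp: int
    message: str


def threshold_events(samples: Iterable[Tuple[int, float]], limit: float) -> List[Event]:
    """Turn a regularly sampled metric into events.

    One event is emitted each time the metric rises above `limit`,
    not one per sample while it stays above.
    """
    events = []
    above = False
    for timestamp, value in samples:
        if value > limit and not above:
            events.append(Event(timestamp, f"value {value} exceeded {limit}"))
        above = value > limit
    return events


# Metric: a temperature sampled once per tick (hypothetical values).
temperature = [(0, 61.0), (1, 64.5), (2, 71.2), (3, 73.0), (4, 66.1), (5, 72.4)]

# Events: derived only when the threshold is crossed.
events = threshold_events(temperature, limit=70.0)
for event in events:
    print(event.timestamp, event.message)
```

The metric is the full series of six samples, while only two events come out of it (at ticks 2 and 5), which is why the two kinds of data tend to be stored and queried differently.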
When vast amounts of data points are monitored and analyzed, the organization can find interesting “business moments” in the data. These insights help identify emerging opportunities and competitive advantages.

How to develop a Data monitoring strategy

Establishing an overall IT monitoring strategy that works for everyone across the board is nearly impossible. But it is possible to develop a monitoring strategy which is uniquely tailored to specific IT and business needs. At a high level, organizations can start developing their Data monitoring strategy by asking these five fundamental questions:

#1 Have we considered all stakeholder needs?

One of the more common mistakes DevOps teams make is focusing the monitoring strategy on the needs of just a few stakeholders and not addressing the requirements of stakeholders outside of IT operations, such as line of business (LOB) owners, application developers and owners, and other subgroups within operations, such as network operations (NOC) or communications teams. For example, an app developer may need usage statistics around application performance while the network operator might be interested in network bandwidth usage by that app’s users.

#2 Will the data capture strategy meet future needs?

Organizations, of course, must focus on the data capture needs of today at the enterprise level, but at the same time must consider the future. Developing a long-term plan helps in future-proofing the overall strategy, since data formats and data exchange protocols always evolve. The strategy should also consider future needs around ingestion and query volumes. Planning for how much data will be generated, stored and archived will help establish a better long-term plan.

#3 Will the data analytics satisfy my organization’s evolving needs?

Data analysis needs always change over time.
Stakeholders will ask for different types of analysis, so planning ahead for those needs and opting for a flexible data analysis strategy will help ensure that the solution is able to support future needs.

#4 Is the presentation layer modular and embeddable?

A flexible user interface that addresses the needs of all stakeholders is important for meeting the organization’s overarching goals. Solutions which deliver configurable dashboards that enable users to specify queries for custom dashboards meet this need for flexibility. Organizations should consider a plug-and-play model which allows users to choose different presentation layers as needed.

#5 Does the architecture enable smart actions?

The ability to detect anomalies and trigger specific actions is a critical part of a monitoring strategy. A flexible and extensible model should be used to meet the notification preferences of diverse user groups. Organizations should consider self-learning models which can be trained to detect undefined anomalies from the collected data. Monitoring solutions which address the broader monitoring needs of the entire enterprise are preferred.

What are purpose-built monitoring platforms?

Devising an overall IT monitoring strategy that meets these needs and fundamental technology requirements is a tall order. But new purpose-built monitoring platforms have been created to deal with today’s requirements for monitoring and analyzing these specific metrics and events workloads – often called time-series data – and provide situational awareness to the business. These platforms support ingesting millions of data points per second, can scale both horizontally and vertically, are designed from the ground up to support real-time monitoring and decision making, and have strong machine learning and anomaly detection functions to aid in discovering interesting business moments.
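To make the idea of detecting anomalies in collected data concrete, here is a minimal Python sketch. It uses a simple rolling z-score rule, which is a swapped-in stand-in rather than the self-learning models or platform features mentioned above; the function name `detect_anomalies`, the window of 5, the threshold of 3 standard deviations and the sample readings are all assumptions made for illustration.

```python
import statistics
from collections import deque


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that sit far from the mean of the preceding window.

    A point is flagged when it is more than `threshold` standard
    deviations away from the mean of the previous `window` points.
    """
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            spread = statistics.pstdev(recent)
            # Skip perfectly flat windows to avoid dividing the world by zero spread.
            if spread > 0 and abs(value - mean) > threshold * spread:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Hypothetical CPU readings with one obvious spike.
readings = [50, 51, 49, 50, 52, 51, 50, 95, 51, 49]
print(detect_anomalies(readings))
```

In a real deployment the returned indices would feed whatever notification or automated action the team prefers; the fixed threshold is the part a trained model would replace.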
In addition, these platforms are resource-aware, applying compression and down-sampling functions to aid in optimal resource utilization, and are built to support faster time to market with minimal dependencies. With the right strategy in mind, and tools in place, organizations can address the evolving monitoring needs of the entire organization.

About the Author

Mark Herring is the CMO of InfluxData. He is a passionate marketeer with a proven track record of generating leads, building pipeline, and building vibrant developer and open source communities. He is a data-driven marketeer with a proven ability to tell the forest from the trees, improve performance, and deliver on strategic imperatives. Prior to InfluxData, Herring was vice president of corporate marketing and developer marketing at Hortonworks, where he grew the developer community by over 40x. Herring brings over 20 years of relevant marketing experience from his roles at Software AG, Sun, Oracle, and Forte Software.

TensorFlow announces TensorFlow Data Validation (TFDV) to automate and scale data analysis, validation, and monitoring.
How AI is going to transform the Data Center.
Introducing TimescaleDB 1.0, the first OS time-series database with full SQL support.

Richard Gall
25 Oct 2018
5 min read

Is Initiative Q a pyramid scheme or just a really bad idea?

If things seem too good to be true, they probably are. That's a pretty good motto to live by, and one that's particularly pertinent in the days of fake news and crypto-bubbles. However, it seems like advice many people haven't heeded with Initiative Q, a new 'payment system' developed by the brains behind PayPal technology. That's not to say that Initiative Q certainly is too good to be true. But when an organisation appears to be offering hundreds of thousands of dollars to users who simply offer an email and then encourage others to offer theirs, caution is essential. If it looks like a pyramid scheme, then do you really want to risk the chance that it might just be a pyramid scheme?

What is Initiative Q?

Initiative Q is, according to its founders, "tomorrow's payment network." On its website it says that current methods of payment, such as credit cards, are outdated. They open up the potential for fraud and other bad business practices, as well as not being particularly efficient. Initiative Q claims that it is going to develop an alternative to these systems "which aggregate the best ideas, innovations, and technologies developed in recent years." It isn't specific about which ideas and technological innovations it's referring to, but if you read through the payment model it wants to develop, there are elements that sound a lot like blockchain. For example, it talks about using more accurate methods of authentication to minimize fraud, and improving customer protection by "creating a network where buyers don’t need to constantly worry about whether they are being scammed" (the extent to which this turns out to be deliciously ironic remains to be seen). To put it simply, it's a proposed new payment system that borrows lots of good ideas that still haven't been shaped into a coherent whole. Compelling, yes, but alarm bells are probably sounding.

Who's behind Initiative Q?

There are very few details on who is actually involved in Initiative Q.
The only names attached to the project are Saar Wilf, an entrepreneur who founded Fraud Sciences, a payment technology company that was bought by PayPal in 2008, and Lawrence White, Professor of Monetary Theory and Policy at George Mason University. The team should grow, however. Once the number of members has grown to a significant level, the Initiative Q team say "we will continue recruiting the world’s top professionals in payment systems, macroeconomics, and Internet technologies."

How is Initiative Q supposed to work?

Initiative Q explains that for the world to adopt a new payment network is a huge challenge - a fair comment, because after all, for it to work at all, you need actors within that network who believe in it and trust it. This is why the initial model - which looks and feels a hell of a lot like a pyramid or Ponzi scheme - is, according to Initiative Q, so important. To make this work, you need a critical mass of users. Initiative Q actually defends itself from accusations that it is a pyramid scheme by pointing out that there's no money involved at this stage. All that happens is that when you sign up you receive a specific number of 'Qs' (the name of the currency Initiative Q is proposing). These Qs obviously aren't worth anything at the moment. The idea is that when the project actually does reach critical mass, they will take on actual value.

Isn't Initiative Q just another cryptocurrency?

Initiative Q is keen to stress that it isn't a cryptocurrency. That said, on its website the project urges you to "think of it as getting free bitcoin seven years ago." But the website does go into a little more detail elsewhere, explaining that "cryptocurrencies have failed as currencies" because they "focus on ensuring scarcity" while neglecting to consider how people might actually use them in the real world. The implication, then, is that Initiative Q is putting adoption first.
Presumably, it's one of the reasons that it has decided to go with such an odd acquisition strategy. Ultimately though, it's too early to say whether Initiative Q is or isn't a cryptocurrency in the strictest (i.e. fully decentralized) sense. There simply isn't enough detail about how it will work. Of course, there are reasons why Initiative Q doesn't want to be seen as a cryptocurrency. From a marketing perspective, it needs to look distinctly different from the crypto-pretenders of the last decade.

Initiative Q: pyramid scheme or harmless vaporware?

Because no money is exchanged at any point, it's difficult to call Initiative Q a Ponzi or pyramid scheme. In fact it's actually quite hard to know what to call it. As David Gerard wrote in a widely shared post from June, published when Initiative Q had a first viral wave, "the Initiative Q payment network concept is hard to critique - because not only does it not exist, they don’t have anything as yet, except the notion of “build a payment network and it’ll be awesome.”" But while it's hard to critique, it's also pretty hard to say that it's actually fraudulent. In truth, at the moment it's relatively harmless. However, as David Gerard points out in the same post, if the data of those who signed up is hacked - or even sold (although the organization says it won't do that) - that's a pretty neat database of people who'll offer their details up in return for some empty promises of future riches.
Prasad Ramesh
20 Oct 2018
4 min read

Julia for machine learning. Will the new language pick up pace?

Machine learning can be done using many languages, with Python and R being the most popular. But one language has been overlooked for some time: Julia.

Why isn’t Julia machine learning a thing?

Julia isn't an obvious choice for machine learning simply because it's a new language that has only recently hit version 1.0. While Python is well-established, with a large community and many libraries, Julia simply doesn't have the community to shout about it. And that's a shame. Right now Julia is used in various fields. From optimizing milk production in dairy farms to parallel supercomputing for astronomy, Julia has a wide range of applications. A common theme here is that these applications all require numerical, scientific, and sometimes parallel computation. Julia is well-suited to the sort of tasks where intensive computation is essential.

Viral Shah, CEO of Julia Computing, said to Forbes, “Amazon, Apple, Disney, Facebook, Ford, Google, Grindr, IBM, Microsoft, NASA, Oracle and Uber are other Julia users, partners and organizations hiring Julia programmers.” Clearly, Julia is powering the analytical nous of some of the most high profile organizations on the planet. Perhaps it just needs more cheerleading to go truly mainstream.

Why Julia is a great language for machine learning

Julia was originally designed for high-performance numerical analysis. This means that everything that has gone into its design is built for the very things you need to do to build effective machine learning systems.

Speed and functionality

Julia combines the functionality of various popular languages like Python, R, Matlab, SAS and Stata with the speed of C++ and Java. A lot of the standard LaTeX symbols can be used in Julia, with the syntax usually being the same as LaTeX. This mathematical syntax makes it easy to implement mathematical formulae in code, which makes Julia a natural fit for machine learning.
It also has in-built support for parallelism, which lets it use multiple cores at once and speeds up computation. Julia’s loops and functions are fast, fast enough that you would probably notice significant performance differences against other languages. Performance can be almost comparable to C, with very little code. With packages like ArrayFire, generic code can be run on GPUs. In Julia, the multiple dispatch feature is very useful for defining number and array-like datatypes. Matrices and data tables work with good compatibility and performance. Julia has automatic garbage collection and a collection of libraries for mathematical calculations, linear algebra, random number generation, and regular expression matching.

Libraries and scalability

Julia machine learning can be done with powerful tools like MLBase.jl, Flux.jl and Knet.jl, which can be used for machine learning and artificial intelligence systems. It also has a scikit-learn implementation called ScikitLearn.jl. Although ScikitLearn.jl is not an official port, it is a useful additional tool for building machine learning systems with Julia. As if all those weren’t enough, Julia also has TensorFlow.jl and MXNet.jl. So, if you already have experience with these tools in other implementations, the transition is a little easier than learning everything from scratch. Julia is also incredibly scalable. It can be deployed on large clusters quickly, which is vital if you’re working with big data across a distributed system.

Should you consider Julia machine learning?

Because it’s fast and possesses a great range of features, Julia could potentially overtake both Python and R to be the language of choice for machine learning in the future. Okay, maybe we shouldn’t get ahead of ourselves. But with Julia reaching the 1.0 milestone, and the language rising on the TIOBE index, you certainly shouldn’t rule out Julia when it comes to machine learning.
Julia is also available to use in the popular tool Jupyter Notebook, paving a path for wider adoption. A note of caution, however, is important. Rather than simply dropping everything for Julia, it will be worth monitoring the growth of the language. Over the next 12 to 24 months we’ll likely see new projects and libraries, and the Julia machine learning community expanding. If you start hearing more noise about the language, it becomes a much safer option to invest your time and energy in learning it. If you are just starting off with machine learning, then you should stick to other popular languages. An experienced engineer, however, who already has a good grip on other languages shouldn’t be scared of experimenting with Julia - it gives you another option, and might just help you to uncover new ways of working and solving problems.

Julia 1.0 has just been released
What makes functional programming a viable choice for artificial intelligence projects?
Best Machine Learning Datasets for beginners

Savia Lobo
18 Oct 2018
5 min read

5 best practices to perform data wrangling with Python

Data wrangling is the process of cleaning and structuring complex data sets so that they are easy to analyze and decisions can be made quickly. Due to the internet explosion and the huge trove of IoT devices, a massive amount of data is available at present. However, this data is most often in its raw form and includes a lot of noise in the form of unnecessary data, broken data, and so on. Organizations must clean this data before they can use it for analysis. Data wrangling plays a very important role here by cleaning the data and making it fit for analysis. Python also has built-in features for applying wrangling methods to various data sets to achieve the analytical goal. Here are 5 best practices that will help you in your data wrangling journey with Python. At the end, you’ll have clean, ready-to-use data for your business needs.

5 best practices for data wrangling with Python

Learn the data structures in Python really well

Designed to be a very high-level language, Python offers an array of amazing data structures with great built-in methods. Having a solid grasp of all their capabilities will be a potent weapon in your repertoire for handling data wrangling tasks. For example, a dictionary in Python can act almost like a mini in-memory database with key-value pairs. It supports extremely fast retrieval and search by utilizing a hash table underneath. Explore other built-in libraries related to these data structures, e.g. ordered dict and the string library, for advanced functions. Build your own versions of essential data structures like stacks, queues, heaps, and trees, using classes and basic structures, and keep them handy for quick data retrieval and traversal.
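As a small illustration of the dictionary acting like a mini in-memory database, the sketch below indexes a handful of records by key. The records and field names are made up for the example; only standard-library features are used.

```python
from collections import defaultdict

# Hypothetical records, as they might arrive from a CSV file or an API.
records = [
    {"id": 101, "name": "Asha", "city": "Pune"},
    {"id": 102, "name": "Ben", "city": "Leeds"},
    {"id": 103, "name": "Chen", "city": "Pune"},
]

# Primary-key index: id -> record. Lookups go through the hash table,
# so they do not scan the whole list.
by_id = {record["id"]: record for record in records}

# Secondary index: city -> list of names, similar in spirit to a GROUP BY.
names_by_city = defaultdict(list)
for record in records:
    names_by_city[record["city"]].append(record["name"])

print(by_id[102]["name"])
print(names_by_city["Pune"])
print(by_id.get(999))  # a missing key returns None instead of raising KeyError
```

Building the index costs one pass over the data; after that, each lookup by `id` or `city` avoids re-scanning the list, which matters once the record count grows.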
Learn and practice file and OS handling in Python

• How to open and manipulate files
• How to manipulate and navigate directory structure

Have a solid understanding of core data types and capabilities of Numpy and Pandas

• How to create, access, sort, and search a Numpy array.
• Always consider whether you can replace a conventional list traversal (for loop) with a vectorized operation. This will increase the speed of your data operations.
• Explore special file types like .npy (Numpy’s native storage) to read large data sets much faster than from a regular list.
• Know in detail all the file types you can read using built-in Pandas methods. This will greatly simplify your data scraping. Almost all of these methods have great data cleaning and other checks built in. Try to use such optimized routines instead of writing your own to speed up the process.

Build a good understanding of basic statistical tests and a flair for visualization

• Running some standard statistical tests can quickly give you an idea about the quality of the data you need to wrangle.
• Plot data often, even if it is multi-dimensional. Do not try to create fancy 3D plots. Learn to explore a simple set of pairwise scatter plots.
• Use boxplots often to see the spread and range of the data and detect outliers.
• For time-series data, learn basic concepts of ARIMA modeling to check the sanity of the data.

Apart from Python, if you want to master one language, go for SQL

As a data engineer, you will inevitably run across situations where you have to read from a large, conventional database storage. Even if you use a Python interface to access such a database, it is always a good idea to know basic concepts of database management and relational algebra. This knowledge will help you build on later and move into the world of Big Data and Massive Data Mining (technologies like Hadoop/Pig/Hive/Impala) easily. Your basic data wrangling knowledge will surely help you deal with such scenarios.
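The boxplot advice above can also be applied numerically. The sketch below uses the common 1.5 × IQR rule, the same rule most boxplot implementations use to draw their whiskers, to flag outliers with only the standard library. The `iqr_outliers` helper and the sample response times are hypothetical, chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if value < low or value > high]


# Hypothetical response times in milliseconds with one suspicious entry.
response_times = [12, 14, 13, 15, 14, 13, 16, 12, 15, 98]
print(iqr_outliers(response_times))
```

A flagged value is not automatically wrong; it is a prompt to check whether it is a data entry error or a genuine extreme before deciding to drop or keep it.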
Although data wrangling may be the most time-consuming process, it is the most important part of data management. Data collected by businesses on a daily basis can help them make decisions based on the latest information available. It also allows businesses to find hidden insights and use them in their decision-making, and it supports new analytic initiatives, improved reporting efficiency and much more.

About the authors

Dr. Tirthajyoti Sarkar works in the San Francisco Bay area as a senior semiconductor technologist, where he designs state-of-the-art power management products and applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He has 15+ years of R&D experience and is a senior member of IEEE.

Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris based Cyber Security startup, where he applies state-of-the-art Computer Vision and Data Engineering algorithms and tools to develop cutting-edge products.

Data cleaning is the worst part of data analysis, say data scientists
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
Manipulating text data using Python Regular Expressions (regex)

Sugandha Lahoti
17 Oct 2018
4 min read

4 misconceptions about data wrangling

Around 80% of the time in data analysis is spent on cleaning and preparing data for analysis. This is, however, an important task, and a prerequisite to the rest of the data analysis workflow, including visualization, analysis, and reporting. Important as it is, data wrangling is surrounded by certain myths that developers should be cautious of. In this post, we will discuss four such misconceptions.

Myth #1: Data wrangling is all about writing SQL queries

There was a time when data processing needed data to be presented in a relational manner so that SQL queries could be written. Today, there are many other types of data sources that can be analyzed in addition to the classic static SQL databases. Often, an engineer has to pull data from diverse sources such as web portals, Twitter feeds, sensor fusion streams, and police or hospital records. Static SQL queries can help only so much in those diverse domains. A programmatic approach, which is flexible enough to interface with myriad sources and is able to parse the raw data through clever algorithmic techniques and use of fundamental data structures (trees, graphs, hash tables, heaps), will be the winner.

Myth #2: Knowledge of statistics is not required for data wrangling

Quick statistical tests and visualizations are always invaluable for checking the ‘quality’ of the data you sourced. These tests can help detect outliers and wrong data entries without running complex scripts. For effective data wrangling, you don’t need knowledge of advanced statistics. However, you must understand basic descriptive statistics and know how to compute them using built-in Python libraries.

Myth #3: You have to be a machine learning expert to do great data wrangling

Deep knowledge of machine learning is certainly not a prerequisite for data wrangling. It is true that the end goal of data wrangling is often to prepare the data so that it can be used in a machine learning task downstream. As a data wrangler, you do not have to know all the nitty-gritty of your project’s machine learning pipeline. However, it is always a good idea to talk to the machine learning expert who will use your data and to understand the data structure, interface, and format they need to run the model fast and accurately.

Myth #4: Deep knowledge of programming is not required for data wrangling

As explained above, the diversity and complexity of data sources require that you are comfortable with deep notions of fundamental data structures and how a programming language paradigm handles them. Increasingly deep knowledge of the programming framework (Python, for example) will surely help you come up with innovative methods for dealing with data source interfacing and data cleaning issues. The speed and efficiency of your data processing pipeline can often benefit from advanced knowledge of basic algorithms, e.g. search, sort, graph traversal, and hash table building. Although built-in methods in standard libraries are optimized, having this knowledge gives you an edge in any situation.

You read a guest post from Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of Data Wrangling with Python. We hope that clearing up these misconceptions helps you realize that data wrangling is not as difficult as it seems. Have fun wrangling data!

About the authors

Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics.

Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris-based cyber security startup. He holds a Master’s degree in Computer Science from West Bengal University of Technology and certifications in Machine Learning from Stanford.

Don’t forget to check out Data Wrangling with Python to learn the essential basics of data wrangling using Python.
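As a small illustration of Myth #2, here is a minimal sketch of a descriptive-statistics check that uses only Python’s standard `statistics` module. The function name `flag_outliers`, the 1.5 × IQR threshold (the usual boxplot whisker rule, chosen here as an assumption), and the sample readings are all made up for illustration.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up temperature readings; 220 stands in for a mistyped 22.0.
readings = [21, 22, 23, 22, 24, 23, 220]

print("mean:", round(statistics.mean(readings), 2))
print("median:", statistics.median(readings))
print("flagged:", flag_outliers(readings))
```

In this sample the single bad entry pulls the mean far above the median, and the quartile rule flags only that entry. Real data may need a different threshold or a domain-specific check.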
30 common data science terms explained
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
How to create a strong data science project portfolio that lands you a job
Bhagyashree R
15 Oct 2018
10 min read

How is Artificial Intelligence changing the mobile developer role?

Last year, at Google I/O, Sundar Pichai, the CEO of Google, said: “We are moving from a mobile-first world to an AI-first world.”

Is it only applicable to Google? Not really. In the recent past, we have seen several advancements in Artificial Intelligence and, in parallel, a plethora of intelligent apps coming into the market. These advancements are enabling developers to take their apps to the next level by integrating recommendation services, image recognition, speech recognition, voice translation, and many more capabilities. Artificial Intelligence is becoming a potent tool for mobile developers to experiment and innovate with.

The Artificial Intelligence components that are integral to mobile experiences, such as voice-based assistants and location-based services, increasingly require mobile developers to have a basic understanding of Artificial Intelligence to be effective. Of course, you don’t have to be an Artificial Intelligence expert to include intelligent components in your app. But you should definitely understand something about what you’re building into your app and why. After all, AI in mobile is not limited to calling an API, is it? There’s more to it, and in this article we will explore how Artificial Intelligence will shape the mobile developer role in the immediate future.

Read also: AI on mobile: How AI is taking over the mobile devices marketspace

What is changing in the mobile developer role?

Focus shifting to data

With Artificial Intelligence becoming more and more accessible, intelligent apps are becoming the new norm for businesses. Artificial Intelligence strengthens the relationship between brands and customers, inspiring developers to build smart apps that increase user retention. This also means that developers have to direct their focus to data. They have to understand how the data will be collected, how it will be fed to machines, and how often data input will be needed. When nearly 1 in 4 people abandon an app after its first use, as a mobile app developer you need to rethink how you drive in-app personalization and engagement.

Explore the “humanized” way of user-app interaction

With so many chatbots such as Siri and Google Assistant coming into the market, we can see that “humanizing” the interaction between the user and the app is becoming mainstream. “Humanizing” is the process by which the app becomes relatable to the user, and the more effectively it is done, the more the end user will interact with the app. Users now want easy navigation and search, and Artificial Intelligence fits this scenario perfectly. Advances in technologies like text-to-speech, speech-to-text, Natural Language Processing, and cloud services in general have contributed to the mass adoption of these types of interfaces.

Companies are increasingly expecting mobile developers to be comfortable working with AI functionalities

Artificial Intelligence is the future. Companies now expect their mobile developers to know how to handle the huge amount of data generated every day and how to use it. Here is an example of what Google wants its engineers to do: “We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day.” This open-ended requirement list shows that now is the right time to learn and embrace Artificial Intelligence.

What skills do you need to build intelligent apps?

Ideally, data scientists are the ones who conceptualize mathematical models, and machine learning engineers are the ones who translate them into code and train the models. But when you are working in a resource-tight environment, for example in a start-up, you will be responsible for doing the end-to-end job. It is not as scary as it sounds, because you have several resources to get started with!

Taking your first steps with machine learning as a service

Learning anything starts with motivating yourself. Diving directly into the maths and coding part of machine learning might exhaust and bore you. That's why it's a good idea to know what the end goal of your entire learning process is going to be and what types of solutions are possible using machine learning. There are many products available that you can try in order to get started quickly, such as Google Cloud AutoML (Beta), Firebase ML Kit (Beta), and Fritz Mobile SDK, among others.

Read also: Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Getting your hands dirty

After this “warm-up”, the next step involves creating and training your own model. This is where you’ll be introduced to TensorFlow Lite, which is going to be your best friend throughout your journey as a machine learning mobile developer. There are many other machine learning tools coming into the market that you can make use of. These tools make building AI into mobile apps easier. For instance, you can use Dialogflow, a Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. You can then integrate it with Alexa, Cortana, Facebook Messenger, and other platforms your users are on.

Read also: 7 Artificial Intelligence tools mobile developers need to know

For practice, you can use an excellent codelab by Google, TensorFlow For Poets. It guides you through creating and training a custom image classification model. Through this codelab you will learn the basics of data collection, model optimization, and other key components involved in creating your own model. The codelab is divided into two parts. The first part covers creating and training the model, and the second part focuses on TensorFlow Lite, the mobile version of TensorFlow, which allows you to run the same model on a mobile device.

Mathematics is the foundation of machine learning

Love it or hate it, machine learning and Artificial Intelligence are built on mathematical principles like calculus, linear algebra, probability, statistics, and optimization. You need to learn some essential foundational concepts and the notation used to express them. There are many reasons why learning mathematics for machine learning is important. It will help you select the right algorithm, which involves weighing accuracy, training time, model complexity, number of parameters, and number of features. Maths is also needed when choosing parameter settings and validation strategies, and when identifying underfitting and overfitting by understanding the bias-variance tradeoff.

Read also: Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]

Read also: What is Statistical Analysis and why does it matter?

What are the key aspects of Artificial Intelligence for mobile to keep in mind?

Understanding the problem

Your number one priority should be the user problem you are trying to solve. Instead of randomly integrating a machine learning model into an application, developers should understand how the model applies to the particular application or use case. This is important because you might end up building a great machine learning model with an excellent accuracy rate, but if it does not solve any problem, it will end up being redundant. You must also understand that while there are many business problems that require machine learning approaches, not all of them do. Most business problems can be solved through simple analytics or a baseline approach.

Data is your best friend

Machine learning is dependent on data; the data that you use, and how you use it, will define the success of your machine learning model. You can also make use of the thousands of open source datasets available online. Google recently launched a dataset search tool, Google Dataset Search, which will make it easier for you to find the right dataset for your problem. Typically, there’s no shortage of data; however, an abundance of data does not mean that the data is clean, reliable, or usable as intended. Data cleanliness is a huge issue. For example, a typical company will have multiple customer records for a single individual, all of which differ slightly. If the data isn’t clean, it isn’t reliable. The bottom line is that it’s bad practice to just grab the data and use it without considering its origin.

Read also: Best Machine Learning Datasets for beginners

Decide which model to choose

A machine learning algorithm is trained, and the artifact that it creates after the training process is called the machine learning model. An ML model is used to find patterns in data without the developer having to explicitly program those patterns. We cannot look through such a huge amount of data and understand the patterns ourselves. Think of the model as your helper, who will look through all those terabytes of data and extract knowledge and insights from them. You have two choices here: either you can create your own model or you can use a pre-built model. While there are several pre-built models available, your business-specific use cases may require specialized models to yield the desired results. These off-the-shelf models may also need some fine-tuning or modification to deliver the value the app is intended to provide.

Read also: 10 machine learning algorithms every engineer needs to know

Thinking about resource utilization is important

Artificial Intelligence-powered apps, like apps in general, should be developed with resource utilization in mind. Though companies are working towards improving mobile hardware, it currently cannot match what we can accomplish with GPU clusters in the cloud. Therefore, developers need to consider how the models they intend to use would affect resources, including battery power and memory usage. In terms of computational resources, inferencing, or making predictions, is less costly than training. Inferencing on the device means that the models need to be loaded into RAM, which also requires significant computational time on the GPU or CPU. In scenarios that involve continuous inferencing, such as audio and image data, which can chew up bandwidth quickly, on-device inferencing is a good choice.

Learning never stops

Maintenance is important, and to do it you need to establish a feedback loop and have a process and culture of continuous evaluation and improvement. A change in consumer behavior or a market trend can have a negative impact on the model. Eventually, something will break or no longer work as intended, which is another reason why developers need to understand the basics of what they’re adding to an app. You need to have some knowledge of how the Artificial Intelligence component that you just put together works and how it could be made to run faster.

Wrapping up

Before falling for the Artificial Intelligence and machine learning hype, it’s important to understand and analyze the problem you are trying to solve. You should examine whether applying machine learning can improve the quality of the service, and decide whether this improvement justifies the effort of deploying a machine learning model. If you just want a simple API endpoint and don’t want to dedicate much time to deploying a model, cloud-based web services are the best option for you. Tools like ML Kit for Firebase look promising and seem like a good choice for startups or developers just starting out. TensorFlow Lite and Core ML are good options if you have mobile developers on your team or if you’re willing to get your hands a little dirty.

Artificial Intelligence is influencing the app development process by providing a data-driven approach to solving user problems. It wouldn't be surprising if, in the near future, Artificial Intelligence became a leading factor in app developers’ expertise and creativity.

10 useful Google Cloud Artificial Intelligence services for your next machine learning project [Tutorial]
How Artificial Intelligence is going to transform the Data Center
How Serverless computing is making Artificial Intelligence development easier

Savia Lobo
07 Oct 2018
3 min read

4 myths about Git and GitHub you should know about

With the aim of replacing BitKeeper, Linus Torvalds created Git in 2005 to support the development of the Linux kernel. However, Git isn’t limited to code: any product or project that has multiple contributors and requires release management and versioning stands to gain an improved workflow through Git. Just as every solution or tool has its own positives and negatives, Git is also surrounded by myths. In this post, Alex Magana and Joseph Mul, the authors of the Introduction to Git and GitHub course, discuss some of the myths about Git and GitHub.

Git is GitHub

Because Git and GitHub are so often used together as a complete version control toolkit, adopters of the two tools mistake them for interchangeable tools. Git is a tool that tracks changes to the files that constitute a project; it provides the utility used to monitor those changes and persist them. GitHub, on the other hand, is akin to a website hosting service. The difference is that with GitHub, the hosted content is a repository. The repository can then be accessed from this central point and the codebase shared.

Backups are equivalent to version control

This emanates from a misunderstanding of what version control is and, by extension, what Git achieves when it’s incorporated into the development workflow. Unlike archives created according to a team’s backup policy, Git tracks the changes made to files and maintains snapshots of a repository at given points in time.

Git is only suitable for teams

With the usage of hosting services such as GitHub, the element of sharing and collaboration may be perceived as a preserve of teams. Git offers gains beyond source control. It lends itself to the delivery of a feature or product from the point of development to deployment. This means that Git is a tool for delivery. It can, therefore, be utilized to roll out functionality and manage changes to source code for teams and individuals alike.

To effectively use Git, you need to learn every command to work

Whether you work alone or in a team, the common commands you need in order to contribute to a repository cover only a few tasks: initiating tracking of specific files, persisting changes made to tracked files, reverting changes made to files, and incorporating changes introduced by other developers working on the same project.

Clearing up these four myths clarifies what Git and GitHub are and how they are used. If you found this post useful, do check out the course titled Introduction to Git and GitHub by Alex and Joseph.

GitHub addresses technical debt, now runs on Rails 5.2.1
GitLab 11.3 released with support for Maven repositories, protected environments and more
GitLab raises $100 million, Alphabet backs it to surpass Microsoft’s GitHub
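The last myth in the Git post above names a small set of everyday tasks: tracking files, persisting changes, and reverting changes. As a rough sketch, they map onto a handful of standard Git commands, driven here from Python through `subprocess` so the example stays in one language. It assumes the `git` executable is installed; the helper names, the file `notes.txt`, and the placeholder identity are made up for illustration and do not come from the authors’ course.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git_available():
    """Report whether a git executable can be found on this machine."""
    return shutil.which("git") is not None


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def solo_workflow(repo):
    """Track a file, persist it, then discard an unwanted edit."""
    repo.mkdir(parents=True, exist_ok=True)
    git(repo, "init")                                # create the repository
    notes = repo / "notes.txt"                       # hypothetical file
    notes.write_text("first draft\n")
    git(repo, "add", "notes.txt")                    # start tracking the file
    git(repo, "-c", "user.name=Example",             # placeholder identity
        "-c", "user.email=example@example.com",
        "commit", "-m", "Add notes")                 # persist the change
    notes.write_text("unwanted edit\n")
    git(repo, "checkout", "--", "notes.txt")         # revert the working copy
    # Incorporating other developers' changes would be `git pull`, which
    # needs a configured remote and is therefore left out of this sketch.
    return notes.read_text()


def demo():
    """Run the workflow in a throwaway directory and return the file content."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        return solo_workflow(Path(tmp_dir) / "demo-repo")
```

Calling `demo()` on a machine with Git installed runs the whole sequence in a temporary directory. The same four or five commands can of course be typed directly in a terminal.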