Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon

How-To Tutorials

7019 Articles
article-image-chaos-engineering-company-gremlin-launches-scenarios-making-it-easier-to-tackle-downtime-issues
Richard Gall
26 Sep 2019
2 min read
Save for later

Chaos engineering company Gremlin launches Scenarios, making it easier to tackle downtime issues

Richard Gall
26 Sep 2019
2 min read
At the second ChaosConf in San Francisco, Gremlin CEO Kolton Andrus revealed the company's latest step in its war against downtime: 'Scenarios.' Scenarios makes it easy for engineering teams to simulate a common issues that lead to downtime. It's a natural and necessary progression for Gremlin that is seeing even the most forward thinking teams struggling to figure out how to implement chaos engineering in a way that's meaningful to their specific use case. "Since we released Gremlin Free back in February thousands of customers have signed up to get started with chaos engineering," said Andrus. "But many organisations are still struggling to decide which experiments to run in order to avoid downtime and outages." Scenarios, then, is a useful way into chaos engineering for teams that are reticent about taking their first steps. As Andrus notes, it makes it possible to inject failure "with a couple of clicks." What failure scenarios does Scenarios let engineering teams simulate? Scenarios lets Gremlin users simulate common issues that can cause outages. These include: Traffic spikes (think Black Friday site failures) Network failures Region evacuation This provides a great starting point for anyone that wants to stress test their software. Indeed, it's inevitable that these issues will arise at some point so taking advance steps to understand what the consequences could be will minimise their impact - and their likelihood. Why chaos engineering? Over the last couple of years plenty of people have been attempting to answer why chaos engineering? But in truth the reasons are clear: software - indeed, the internet as we know it - is becoming increasingly complex, a mesh of interdependent services and platforms. At the same time, the software being developed today is more critical than ever. For eCommerce sites downtime means money, but for those in IoT and embedded systems world (like self-driving cars, for example), it's sometimes a matter of life and death. This makes Gremlin's Scenarios an incredibly exciting an important prospect - it should end the speculation and debate about whether we should be doing chaos engineering, and instead help the world to simply start doing it. At ChaosConf Andrus said that Gremlin's mission is to build a more reliable internet. We should all hope they can deliver.
Read more
  • 0
  • 0
  • 3779

article-image-reactos-0-4-12-releases-with-kernel-improvements-intel-e1000-nic-driver-support-and-more
Bhagyashree R
25 Sep 2019
2 min read
Save for later

ReactOS 0.4.12 releases with kernel improvements, Intel e1000 NIC driver support, and more

Bhagyashree R
25 Sep 2019
2 min read
Earlier this week, the ReactOS team announced the release of ReactOS 0.4.12. This release comes with a bunch of kernel improvements, Intel e1000 NIC driver support, font improvements, and more. Key updates in ReactOS 0.4.12 Kernel updates The filesystem infrastructure of ReactOS 0.4.12 has received quite a few improvements to enable Microsoft filesystem drivers. The team has also worked on the common cache module that has deep ties to the memory manager. The team has also improved device power management, fixed support for PXE booting, and overhauled the write-protection functionality. Window snapping ReactOS 0.4.12 comes with support for window snapping. So, users will now be able to align windows to sides or maximize and minimize them by dragging in specific directions. This release also comes with the keyboard shortcuts that accompany this feature. Intel e1000 NIC driver ReactOS 0.4.12 has a new driver to support the Network Interface Card (NIC) out of the box. Now, end-users do not need to manually find and install a driver. This new driver will also be compatible with e1000 NICs. Improvements related to font In ReactOS 0.4.12, font rendering is made more robust and correct. This release fixes a series of problems that badly affected text rendering for buttons in a range of applications,  from iTunes to various .NET applications. User-mode DLLs In this release, the team has made a range of improvements to user-mode components. The common controls (comctl) library is used by most of the Windows applications to draw various generic user interface elements. The team has fixed an issue related to it “reading extremely dryly.” Other updates include drivers for MIDI instruments and animated rotation bar in the startup/shutdown dialog. Check out the official announcement to know more about ReactOS 0.4.12 in detail. ReactOS 0.4.11 is now out with kernel improvements, manifests support, and more! Btrfs now boots ReactOS, a free and open-source alternative for Windows NT ReactOS version 0.4.9 released with Self-hosting and FastFAT crash fixes Understanding network port numbers, TCP, UDP, and ICMP on an operating system Google’s secret Operating System ‘Fuchsia’ will run Android Applications: 9to5Google Report  
Read more
  • 0
  • 0
  • 2882

article-image-red-hat-announces-centos-stream-a-developer-forward-distribution-jointly-with-the-centos-project
Savia Lobo
25 Sep 2019
3 min read
Save for later

Red Hat announces CentOS Stream, a “developer-forward distribution” jointly with the CentOS Project

Savia Lobo
25 Sep 2019
3 min read
On September 24, just after the much-awaited CentOS 8 was released, the Red Hat community in agreement with the CentOS Project announced a new model into the CentOS Linux community called, CentOS Stream. CentOS Stream is an upstream development platform for ecosystem developers. It is a single, continuous stream of content with updates several times daily, encompassing the latest and greatest from the RHEL codebase. Also, it is like having a view into what the next version of RHEL will look like, available to a much broader community than just a beta or "preview" release. Chris Wright, Red Hat's CTO, says CentOS Stream is it's "a developer-forward distribution that aims to help community members, Red Hat partners, and others take full advantage of open source innovation within a more stable and predictable Linux ecosystem. It is a parallel distribution to existing CentOS." In the previous CentOS releases developers would not know beforehand about the releases in RHEL. As the CentOS Stream project sits between the Fedora Project and RHEL in the RHEL Development process, it will provide a "rolling preview" of future RHEL kernels and features. This enables developers to stay one or two steps ahead of what’s coming in RHEL. “CentOS Stream is parallel to existing CentOS builds; this means that nothing changes for current users of CentOS Linux and services, even those that begin to explore the newly-released CentOS 8. We encourage interested users that want to be more tightly involved in driving the future of enterprise Linux, however, to transition to CentOS Stream as the new "pace-setting" distribution,” The Red Hat blog states. CentOS Stream is part of Red Hat’s broader focus to engage with communities and developers in a way that better aligns with the modern IT world. A user on Hacker News commented, “I like it, at least in theory. I develop some industrial software that runs on RHEL so being able to run somewhat similar distribution on my machine would be convenient. I tried running CentOS but it was too frustrating and limiting to deal with all the outdated packages on a dev machine. I suppose it will also be good for devs who just like the RHEL environment but don't need a super stable, outdated packages.” Another user commented, “I wonder what future Fedora will have if this new CentOS Stream will be stable enough for developer daily driver. 6 month release cycle of Fedora always felt awkwardly in-between, not having the stability of lts nor the continuity of rolling. I guess lot depends on details on how the packages flow to CentOS Stream, do they come from released Fedora versions or rawhide etc.” To know more about CentOS Stream in detail, read Red Hat’s official blog post. After RHEL 8 release, users awaiting the release of CentOS 8 After Red Hat, Homebrew removes MongoDB from core formulas due to its Server Side license adoption Red Hat announces the general availability of Red Hat OpenShift Service Mesh Introducing ESPRESSO, an open-source, PyTorch based, end-to-end neural automatic speech recognition (ASR) toolkit for distributed training across GPUs .NET Core 3.0 is now available with C# 8, F# 4.7, ASP.NET Core 3.0 and general availability of EF Core 3.0 and EF 6.3
Read more
  • 0
  • 0
  • 2755

article-image-can-a-modified-mit-hippocratic-license-to-restrict-misuse-of-open-source-software-prompt-a-wave-of-ethical-innovation-in-tech
Savia Lobo
24 Sep 2019
5 min read
Save for later

Can a modified MIT ‘Hippocratic License’ to restrict misuse of open source software prompt a wave of ethical innovation in tech?

Savia Lobo
24 Sep 2019
5 min read
Open source licenses allow software to be freely distributed, modified, and used. These licenses give developers an additional advantage of allowing others to use their software as per their own rules and conditions. Recently, software developer and open-source advocate Coraline Ada Ehmke has caused a stir in the software engineering community with ‘The Hippocratic License.’ Ehmke was also the original author of Contributor Covenant, a “code of conduct" for open source projects that encourages participants to use inclusive language and to refrain from personal attacks and harassment. In a tweet posted in September last year, following the code of conduct, she mentioned, “40,000 open source projects, including Linux, Rails, Golang, and everything OSS produced by Google, Microsoft, and Apple have adopted my code of conduct.” [box type="shadow" align="" class="" width=""]The term ‘Hippocratic’ is derived from the Hippocratic Oath, the most widely known of Greek medical texts. The Hippocratic Oath in literal terms requires a new physician to swear upon a number of healing gods that he will uphold a number of professional ethical standards.[/box] Ehmke explained the license in more detail in a post published on Sunday. In it, she highlights how the idea that writing software with the goals of clarity, conciseness, readability, performance, and elegance are limiting, and potentially dangerous.“All of these technologies are inherently political,” she writes. “There is no neutral political position in technology. You can’t build systems that can be weaponized against marginalized people and take no responsibility for them.”The concept of the Hippocratic license is relatively simple. In a tweet, Ehmke said that it “specifically prohibits the use of open-source software to harm others.” Open source software and the associated harm Out of the many privileges that open source software allows such as free redistribution of the software as well as the source code, the OSI also defines there is no discrimination against who uses it or where it will be put to use. A few days ago, a software engineer, Seth Vargo pulled his open-source software, Chef-Sugar, offline after finding out that Chef (a popular open source DevOps company using the software) had recently signed a contract selling $95,000-worth of licenses to the US Immigrations and Customs Enforcement (ICE), which has faced widespread condemnation for separating children from their parents at the U.S. border and other abuses. Vargo took down the Chef Sugar library from both GitHub and RubyGems, the main Ruby package repository, as a sign of protest. In May, this year, Mijente, an advocacy organization released documents stating that Palantir was responsible for the 2017 ICE operation that targeted and arrested family members of children crossing the border alone. Also, in May 2018, Amazon employees, in a letter to Jeff Bezos, protested against the sale of its facial recognition tech to Palantir where they “refuse to contribute to tools that violate human rights”, citing the mistreatment of refugees and immigrants by ICE. Also, in July, the WYNC revealed that Palantir’s mobile app FALCON was being used by ICE to carry out raids on immigrant communities as well as enable workplace raids in New York City in 2017. Founder of OSI responds to Ehmke’s Hippocratic License Bruce Perens, one of the founders of the Open Source movement in software, responded to Ehmke in a post titled “Sorry, Ms. Ehmke, The “Hippocratic License” Can’t Work” . “The software may not be used by individuals, corporations, governments, or other groups for systems or activities that actively and knowingly endanger harm, or otherwise threaten the physical, mental, economic, or general well-being of underprivileged individuals or groups,” he highlights in his post. “The terms are simply far more than could be enforced in a copyright license,” he further adds.  “Nobody could enforce Ms. Ehmke’s license without harming someone, or at least threatening to do so. And it would be easy to make a case for that person being underprivileged,”  he continued. He concluded saying that, though the terms mentioned in Ehmke’s license were unagreeable, he will “happily support Ms. Ehmke in pursuit of legal reforms meant to achieve the protection of underprivileged people.” Many have welcomed Ehmke's idea of an open source license with an ethical clause. However, the license is not OSI approved yet and chances are slim after Perens’ response. There are many users who do not agree with the license. Reaching a consensus will be hard. https://twitter.com/seannalexander/status/1175853429325008896 https://twitter.com/AdamFrisby/status/1175867432411336704 https://twitter.com/rishmishra/status/1175862512509685760 Even though developers host their source code on open source repositories, a license may bring certain level of restrictions on who is allowed to use the code. However, as Perens mentions, many of the terms in Ehmke’s license hard to implement. Irrespective of the outcome of this license’s approval process, Coraline Ehmke has widely opened up the topic of the need for long overdue FOSS licensing reforms in the open source community. It would be interesting to see if such a license would boost ethical reformation by giving more authority to the developers in imbibing their values and preventing the misuse of their software. Read the Hippocratic license to know more in detail. Other interesting news Tech ImageNet Roulette: New viral app trained using ImageNet exposes racial biases in artificial intelligent system Machine learning ethics: what you need to know and what you can do Facebook suspends tens of thousands of apps amid an ongoing investigation into how apps use personal data
Read more
  • 0
  • 0
  • 5433
Banner background image

article-image-introducing-weld-a-runtime-written-in-rust-and-llvm-for-cross-library-optimizations
Bhagyashree R
24 Sep 2019
5 min read
Save for later

Introducing Weld, a runtime written in Rust and LLVM for cross-library optimizations

Bhagyashree R
24 Sep 2019
5 min read
Weld is an open-source Rust project for improving the performance of data-intensive applications. It is an interface and runtime that can be integrated into existing frameworks including Spark, TensorFlow, Pandas, and NumPy without changing their user-facing APIs. The motivation behind Weld Data analytics applications today often require developers to combine various functions from different libraries and frameworks to accomplish a particular task. For instance, a typical Python ecosystem application selects some data using Spark SQL, transforms it using NumPy and Pandas, and trains a model with TensorFlow. This improves developers’ productivity as they are taking advantage of functions from high-quality libraries. However, these functions are usually optimized in isolation, which is not enough to achieve the best application performance. Weld aims to solve this problem by providing an interface and runtime that can optimize across data-intensive libraries and frameworks while preserving their user-facing APIs. In an interview with Federico Carrone, a Tech Lead at LambdaClass, Weld’s main contributor, Shoumik Palkar shared, “The motivation behind Weld is to provide bare-metal performance for applications that rely on existing high-level APIs such as NumPy and Pandas. The main problem it solves is enabling cross-function and cross-library optimizations that other libraries today don’t provide.” How Weld works Weld serves as a common runtime that allows libraries from different domains like SQL and machine learning to represent their computations in a common functional intermediate representation (IR). This IR is then optimized by a compiler optimizer and JIT’d to efficient machine code for diverse parallel hardware. It performs a wide range of optimizations on the IR including loop fusion, loop tiling, and vectorization. “Weld’s IR is natively parallel, so programs expressed in it can always be trivially parallelized,” said Palkar. When Weld was first introduced it was mainly used for cross-library optimizations. However, over time people have started to use it for other applications as well. It can be used to build JITs or new physical execution engines for databases or analytics frameworks, individual libraries, target new kinds of parallel hardware using the IR, and more. To evaluate Weld’s performance the team integrated it with popular data analytics frameworks including Spark, NumPy, and TensorFlow. This prototype showed up to 30x improvements over the native framework implementations. While cross library optimizations between Pandas and NumPy also improved performance by up to two orders of magnitude. Source: Weld Why Rust and LLVM were chosen for its implementation The first iteration of Weld was implemented in Scala because of its algebraic data types, powerful pattern matching, and large ecosystem. However, it did have some shortcomings. Palkar shared in the interview, “We moved away from Scala because it was too difficult to embed a JVM-based language into other runtimes and languages.” It had a managed runtime, clunky build system, and its JIT compilations were quite slow for larger programs. Because of these shortcomings the team wanted to redesign the JIT compiler, core API, and runtime from the ground up. They were in the search for a language that was fast, safe, didn’t have a managed runtime, provided a rich standard library, functional paradigms, good package manager, and great community background. So, they zeroed-in on Rust that happens to meet all these requirements. Rust provides a very minimal, no setup required runtime. It can be easily embedded into other languages such as Java and Python. To make development easier, it has high-quality packages, known as crates, and functional paradigms such as pattern matching. Lastly, it is backed by a great Rust Community. Read also: “Rust is the future of systems programming, C is the new Assembly”: Intel principal engineer, Josh Triplett Explaining the reason why they chose LLVM, Palkar said in the interview, “We chose LLVM because its an open-source compiler framework that has wide use and support; we generate LLVM directly instead of C/C++ so we don’t need to rely on the existence of a C compiler, and because it improves compilation times (we don’t need to parse C/C++ code).” In a discussion on Hacker News many users listed other Weld-like projects that developers may find useful. A user commented, “Also worth checking out OmniSci (formerly MapD), which features an LLVM query compiler to gain large speedups executing SQL on both CPU and GPU.” Users also talked about Numba, an open-source JIT compiler that translates Python functions to optimized machine code at runtime with the help of the LLVM compiler library.  “Very bizarre there is no discussion of numba here, which has been around and used widely for many years, achieves faster speedups than this, and also emits an LLVM IR that is likely a much better starting point for developing a “universal” scientific computing IR than doing yet another thing that further complicates it with fairly needless involvement of Rust,” a user added. To know more about Weld, check out the full interview on Medium. Also, watch this RustConf 2019 talk by Shoumik Palkar: https://www.youtube.com/watch?v=AZsgdCEQjFo&t Other news in Programming Darklang available in private beta GNU community announces ‘Parallel GCC’ for parallelism in real-world compilers TextMate 2.0, the text editor for macOS releases  
Read more
  • 0
  • 0
  • 4893

article-image-machine-learning-ethics-what-you-need-to-know-and-what-you-can-do
Richard Gall
23 Sep 2019
10 min read
Save for later

Machine learning ethics: what you need to know and what you can do

Richard Gall
23 Sep 2019
10 min read
Ethics is, without a doubt, one of the most important topics to emerge in machine learning and artificial intelligence over the last year. While the reasons for this are complex, it nevertheless underlines that the area has reached technological maturity. After all, if artificial intelligence systems weren’t having a real, demonstrable impact on wider society, why would anyone be worried about its ethical implications? It’s easy to dismiss the debate around machine learning and artificial intelligence as abstract and irrelevant to engineers’ and developers’ immediate practical concerns. However this is wrong. Ethics needs to be seen as an important practical consideration for anyone using and building machine learning systems. If we fail to do so the consequences could be serious. The last 12 months has been packed with stories of artificial intelligence not only showing theoretical bias, but also causing discriminatory outcomes in the real world. Amazon scrapped its AI tool for hiring last October because it showed significant bias against female job applicants. Even more recently, last month it emerged that algorithms built to detect hate speech online have in-built biases against black people. Although these might seem like edge cases, it’s vital that everyone in the industry takes responsibility. This isn’t something we can leave up to regulation or other organizations the people who can really affect change are the developers and engineers on the ground. It’s true that machine learning and artificial intelligence systems will be operating in ways where ethics isn’t really an issue - and that’s fine. But by focusing on machine learning ethics, and thinking carefully about the impact of your work you will ultimately end up building better systems that are more robust and have better outcomes. So with that in mind, let’s look at the practical ways to start thinking about ethics in machine learning and artificial intelligence. Machine learning ethics and bias The first step towards thinking seriously about ethics in machine learning is to think about bias. Once you are aware of how bias can creep into machine learning systems, and how that can have ethical implications, it becomes much easier to identify issues and make changes - or, even better, stop them before they arise. Bias isn’t strictly an ethical issue. It could be a performance issue that’s affecting the effectiveness of your system. But in the conversation around AI and machine learning ethics, it’s the most practical way of starting to think seriously about the issue. Types of machine learning and algorithmic bias Although there are a range of different types of bias, the best place to begin is with two top level concepts. You may have read lists of numerous different biases, but for the purpose of talking about ethics there are two important things to think about. Pre-existing and data set biases Pre-existing biases are embedded in the data on which we choose to train algorithms. While it’s true that just about every data set will be ‘biased’ in some way (data is a representation, after all - there will always be something ‘missing), the point here is that we need to be aware of the extent of the bias and the potential algorithmic consequences. You might have heard terms like ‘sampling bias’, ‘exclusion bias’ and ‘prejudice bias’ - these aren’t radically different. They all result from pre-existing biases about how a data set looks or what it represents. Technical and contextual biases Technical machine learning bias is about how an algorithm is programmed. It refers to the problems that arise when an algorithm is built to operate in a specific way. Essentially, it occurs when the programmed elements of an algorithm fail to properly account for the context in which it is being used. A good example is the plagiarism checker Turnitin - this used an algorithm that was trained to identify strings of texts, which meant it would target non-native English speakers over English speaking ones, who were able to make changes to avoid detection. Although there are, as I’ve said, many different biases in the field of machine learning, by thinking about the data on which your algorithm is trained and the context in which the system is working, you will be in a much better place to think about the ethical implications of your work. Equally, you will also be building better systems that don’t cause unforeseen issues. Read next: How to learn data science: from data mining to machine learning The importance of context in machine learning The most important thing for anyone working in machine learning and artificial intelligence is context. Put another way, you need to have a clear sense of why you are trying to do something and what the possible implications could be. If this is unclear, think about it this way: when you use an algorithm, you’re essentially automating away decision making. That’s a good thing when you want to make lots of decisions at a huge scale. But the one thing you lose when turning decision making into a mathematical formula is context. The decisions an algorithm makes lack context because it is programmed to react in a very specific way. This means contextual awareness is your problem. That’s part of the bargain of using an algorithm. Context in data collection Let’s look at what thinking about context means when it comes to your data set. Step 1: what are you trying to achieve? Essentially, the first thing you’ll want to consider is what you’re trying to achieve. Do you want to train an algorithm to recognise faces? Do you want it to understand language in some way? Step 2: why are you doing this? What’s the point of doing what you’re doing? Sometimes this might be a straightforward answer, but be cautious if the answer is too easy to answer. Making something work more efficiently or faster isn’t really a satisfactory reason. What’s the point of making something more efficient? This is often where you’ll start to see ethical issues emerge more clearly. Sometimes they’re not easily resolved. You might not even be in a position to resolve them yourself (if you’re employed by a company, after all, you’re quite literally contracted to perform a specific task). But even if you do feel like there’s little room to maneuver, it’s important to ensure that these discussions actually take place and that you consider the impact of an algorithm. That will make it easier for you to put safeguarding steps in place. Step 3: Understanding the data set Think about how your data set fits alongside the what and the why. Is there anything missing? How was the data collected? Could it be biased or skewed in some way? Indeed, it might not even matter. But if it does, it’s essential that you pay close attention to the data you’re using. It’s worth recording any potential limitations or issues, so if a problem arises at a later stage in your machine learning project, the causes are documented and visible to others. The context of algorithm implementation The other aspect of thinking about context is to think carefully about how your machine learning or artificial intelligence system is being implemented. Is it working how you thought it would? Is it showing any signs of bias? Many articles about the limitations of artificial intelligence and machine learning ethics cite the case of Microsoft’s Tay. Tay was a chatbot that ‘learned’ from its interactions with users on Twitter. Built with considerable naivety, Twitter users turned Tay racist in a matter of days. Users ‘spoke’ to Tay using racist language, and because Tay learned through interactions with Twitter users, the chatbot quickly became a reflection of the language and attitudes of those around it. This is a good example of how the algorithm’s designers didn’t consider how the real-world implementation of the algorithm would have a negative consequence. Despite, you’d think, the best of intentions, the developers didn’t have the foresight to consider the reality of the world into which they were releasing their algorithmic progeny. Read next: Data science vs. machine learning: understanding the difference and what it means today Algorithmic impact assessments It’s true that ethics isn’t always going to be an urgent issue for engineers. But in certain domains, it’s going to be crucial, particularly in public services and other aspects of government, like justice. Maybe there should be a debate about whether artificial intelligence and machine learning should be used in those contexts at all. But if we can’t have that debate, at the very least we can have tools that help us to think about the ethical implications of the machine learning systems we build. This is where Algorithmic Impact Assessments come in. The idea was developed by the AI Now institute and outlined in a paper published last year, and was recently implemented by the Canadian government. There’s no one way to do an algorithmic impact assessment - the Canadian government uses a questionnaire “designed to help you assess and mitigate the risks associated with deploying an automated decision system.” This essentially provides a framework for those using and building algorithms to understand the scope of their project and to identify any potential issues or problems that could arise. Tools for assessing bias and supporting ethical engineering However, although algorithmic impact assessments can provide you with a solid conceptual grounding for thinking about the ethical implications of artificial intelligence and machine learning systems, there are also a number of tools that can help you better understand the ways in which algorithms could be perpetuating biases or prejudices. One of these is FairML, “an end-to- end toolbox for auditing predictive models by quantifying the relative significance of the model's inputs” - helping engineers to identify the extent to which algorithmic inputs could cause harm or bias - while another is LIME (Local Interpretable Model Agnostic Explanations). LIME is not dissimilar to FairML. it aims to understand why an algorithm makes the decisions it does by ‘perturbing’ inputs and seeing how this affects its outputs. There’s also Deon, which is a lot like a more lightweight, developer-friendly version of an algorithmic assessment impact. It’s a command line tool that allows you to add an ethics checklist to your projects. All these tools underline some of the most important elements in the fight for machine learning ethics. FairML and LIME are both attempting to make interpretability easier, while Deon is making it possible for engineers to bring a holistic and critical approach directly into their day to day work. It aims to promote transparency and improve communication between engineers and others. The future of artificial intelligence and machine learning depends on developers taking responsibility Machine learning and artificial intelligence are hitting maturity. They’re technologies that are now, after decades incubated in computer science departments and military intelligence organizations, transforming and having an impact in a truly impressive range of domains. With this maturity comes more responsibility. Ethical questions arise as machine learning affects change everywhere, spilling out into everything from marketing to justice systems. If we can’t get machine learning ethics right, then we’ll never properly leverage the benefits of artificial intelligence and machine learning. People won’t trust it and legislation will start to severely curb what it can do. It’s only by taking responsibility for its effects and consequences that we can be sure it will not only have a transformative impact on the world, but also one that’s safe and for the benefit of everyone.
Read more
  • 0
  • 0
  • 14245
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-how-quarkus-brings-java-into-the-modern-world-of-enterprise-tech
Guest Contributor
22 Sep 2019
6 min read
Save for later

How Quarkus brings Java into the modern world of enterprise tech

Guest Contributor
22 Sep 2019
6 min read
What is old is new again, even - and maybe especially - in the world of technology. To name a few milestones that are being celebrated this year: Java is roughly 25 years old, it is the 10th anniversary of Minecraft, and Nintendo is back in vogue. None of these three examples are showing signs of slowing down anytime soon. I would argue that they are continuing to be leaders in innovation because of the simple fact that there are still people behind them that are creatively breathing new life into what otherwise could have been “been there, done that” technologies. With Java, in particular, it is so widely used, that from an enterprise efficiency perspective, it simply does not make sense NOT to have Java be a key language in the development of emerging tech apps. In fact, more and more apps are being developed with a Java-first approach. But, how can this be done, especially when apps are being built using new architectures like serverless and microservices? One technology that shows promise is Quarkus, a newly introduced Kubernetes-native platform that addresses many of the barriers hindering Java’s ability to shine in the modern world of emerging tech. Why does Java still matter Even though its continued relevance has been questioned for years, I believe Java still matters and is not likely to go away anytime soon. This is because of two reasons. First, there is a  whole list of programming languages that are based on Java and the Java Virtual Machine (JVM), such as Kotlin, Groovy, Clojure and JRuby. Also, Java continues to be one of the most popular programming languages for Android apps, as well as for the development of edge devices and the internet of things. In fact, according to SlashData’s State of the Developer Nation Q4 2018 report, there are 7.6 million active Java developers worldwide. Other factors that I think are contributing to Java’s continued popularity include network portability, the fact that it is object-oriented, converts data to bytecode so that it can be read and run on any platform which has a JVM installed, and, maybe most importantly, has a syntax similar to C++, making it a relatively easy language for developers to learn. Additionally, SlashData’s research suggested that newer and niche languages do not seem to be adding many new developers, if any, per year, begging the question of whether or not it is easy for newer languages to scale beyond their niche and become the next big thing. It also makes it clear that while there is value for newer programming languages that do not serve as wide a purpose, they may not be able to or need to overtake languages like Java. In fact, the success of Java relies on the entire ecosystem surrounding it, including the editors, third party libraries, CI/CD pipelines, and systems. Each aspect of the ecosystem is something that is so easy to take for granted in established languages but are things that need to be created from scratch in new languages if they want to catch up to or overtake Java. How Quarkus brings Java into modern enterprise tech Quarkus is more than just a cool name. It is a Kubernetes Native Java framework that is tailored for GraalVM and HotSpot, and crafted by best-of-breed Java libraries and standards. The overall goal of Quarkus is to make Java one of the leading platforms in Kubernetes and serverless environments, while also enabling developers to work within what they know and in a reactive and imperative programming model. Put simply, Quarkus works to bring Java into the modern microservices and serverless modes of developing. This is important because Java continues to be a top programming language for back-end enterprise developers. Many organizations have tied both time and money into Java, which has been a dominant force in the development landscape for a number of years. As enterprises increasingly shift toward cloud computing, it is important for Java to carry over into these new programming methods. Why a “Java First” approach Java has been a top programming language for enterprises for over a decade. We should not lose sight of that fact, and that there are many developers with excellent Java skills, as well as existing applications that run on Java. Furthermore, because Java has been around so long it has not only matured as a language but also as an ecosystem. There are editors, logging systems, debuggers, build systems, unit testing environments, QA testing environments, and more--all tuned for Java, if not also implemented in Java. Therefore, when starting a new Java application it can be easier to find third-party components or entire systems that can help the developer gain productivity advancements over other languages that have not yet grown to have the breadth and depth of the Java ecosystem. Using a full-stack framework such as Quarkus, and taking advantage of libraries that use Java, such as Eclipse MicroProfile and Eclipse Vert.x, makes this easier, and also encourages the use of different combinations of tools and dependencies. With Quarkus in particular, it also includes an extension framework that third party authors can use to build native executables and expand the functionality of Java in the enterprise. Quarkus not only brings Java into the modern world of containers, but it also does so quickly with short start-up times. Java is not looking like it will go away anytime soon. Between the number of developers who still use Java as their first language and the number of apps that run almost entirely from it, Java’s take in the game is as solid as ever. Through new tools like Quarkus, it can continue to evolve in the modern app dev world. Author Bio Mark Little works at RedHat where he leads the JBoss Technical Direction and research & development. Prior to this, he was SOA Technical Development Manager and Director of Standards. He also has experience with two successful startup companies. Other interesting news in Tech Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data and Society report. Open AI researchers advance multi-agent competition by training AI agents in a hide and seek environment. France and Germany reaffirm blocking Facebook’s Libra cryptocurrency
Read more
  • 0
  • 0
  • 4865

article-image-bad-metadata-can-get-you-in-legal-hot-water
Guest Contributor
21 Sep 2019
6 min read
Save for later

Bad Metadata can get you in legal hot water

Guest Contributor
21 Sep 2019
6 min read
Metadata isn't just something that concerns business intelligence and IT teams; but lawyers are extremely interested in it as well. Metadata, it turns out, can win or lose lawsuits, send politicians to jail, and even decide medical malpractice cases. It's not uncommon for attorneys who conduct discovery of electronic records in organizations to find that the claims of plaintiffs or defendants are contradicted by metadata, like time and date, type of data, etc. If a discovery process is initiated against them, an organization had better be sure that its metadata is in order. All it would take for an organization to lose a case would be for an attorney to discover a discrepancy in different databases – a different time stamp on some communication, a different job title for a principal in the case. Such discrepancies could lead to accusations of data tampering, fraud, or worse – and would most definitely put the organization in a very tough position versus a judge or jury. Metadata errors are difficult to spot The problem with that, of course, is that catching metadata errors is extremely difficult. In large organizations, data is stored in repositories that are spread throughout the organization, maybe even the world – in different departments. Each department is responsible for maintaining its own database, and the metadata in it; and on different cloud storage repositories, which may have their own system of classifying data. An enterprising attorney could have a field day with the different categories and tags data is stored under, making claims that the organization is trying to “hide something.” The organization's only defense: We're poor administrators. That may not be enough to impress the court. Types of Metadata Metadata is “data about data,” and comes in three flavors: System Metadata, which is data that is automatically generated from the computer and includes specific labeled criteria, like the date and time of creation and date a document was modified, etc. Substantive Metadata reflects changes to a document, like tracked changes. Embedded metadata is data entered into a document or file but not normally visible, such as formulas in cells in an Excel spreadsheet. All of these have increasingly become targets for attorneys in recent years. Metadata has been used in thousands of cases – medical, financial, patent and trademark law, product liability, civil rights, and many more. Metadata is both discoverable and admissible as evidence. According to one New York court, “General information about the creation of a document, including who authored a document and when it was created, is pedigree information often important for purposes of determining admissibility at trial.” According to legal experts, “from a legal standpoint metadata is evidence… that describes the characteristics, origins, usage, and validity of other electronic evidence.” The biggest metadata-linked payout until now - $10.8 million – occurred in 2017, when a jury awarded a plaintiff $8 million (eventually this was increased to nearly $11 million) after claiming he was fired from a biotechnology company after telling authorities about potential bribery in China. The key piece of evidence was the metadata timestamp on a performance review that was written after the plaintiff was fired; with that evidence, the court increased the defendant's payout for violating laws against firing whistleblowers. In that case, records claiming that the employee was fired for cause were belied by the metadata in the performance review. That, of course, was a case in which there was clear wrongdoing by an organization. But the same metadata errors could have cropped up in any number of scenarios, even if no laws were broken. The precedent in this case, and others like it, might be enough to convince the court to penalize an organization based on claims of a plaintiff. How can organizations defend themselves from this legal bind of metadata The answer would seem obvious; get control of your metadata and make sure it corresponds to the data it represents. With that kind of control over data, organizations would discover for themselves if something was amiss that could cost them in a settlement later. But execution of that obvious answer is a different story. With reams of data to pore through, it would take an organization's business intelligence team months, or even years, to manually sift through the databases. And because to err is human, there would be no guarantee they hadn't missed something. Clearly Business Intelligence and Data Analysis teams need some help in doing this. One solution would be to hire more staff, expanding teams at least temporarily to make sense of the data and metadata that could prove problematic. There are services that will lend their staff to an organization to do just that, and for companies that prefer the “human touch,” adding that temporary staff may be the best solution. Another idea is to automate the process, with advanced tools that will do a full examination of data, both across systems and within repositories themselves. Such automated tools would examine the data in the various repositories and find where the metadata for the same information is different – pointing BI teams in the right direction and cutting down on the time needed to determine what needs to be fixed. Using automated metadata management tools, companies can ensure that they remain secure. If a company is being sued and discovery has commenced, it will be too late for the organization to fix anything. Honest mistakes or disorganized file keeping can no longer be corrected, and the fate of the organization will be in the hands of a jury or a judge. Automated metadata management tools can help Business Intelligence and Data Analysis teams figure out which metadata entries are not consistent across the repositories, ensuring that things are fixed before discovery takes place. There are a variety of tools on the market, with various strengths and weaknesses. Companies will need to decide whether a data dictionary, a business glossary, or a more all-encompassing product best answers their needs. They’ll also need to make sure the enterprise software they currently use is supported by the metadata management solution they are after. As the market develops, AI will be a huge distinguishing factor between metadata solutions, as machine learning will reduce the cost and manpower investment of solution onboarding significantly. With the success of recent metadata-based lawsuits, you can be sure more attorneys will be using metadata in their discovery processes. Organizations that want to defend themselves need to get their data in order, and ensure that they won't end up losing lots of money because of their own errors. Author Bio Amnon Drori is the Co-Founder and CEO of Octopai and has over 20 years of leadership experience in technology companies. Before co-founding Octopai he led sales efforts at companies like Panaya (Acquired by Infosys), Zend Technologies (Acquired by Rogue Wave Software), ModusNovo and Alvarion. Other interesting news in Tech Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data and Society report. Open AI researchers advance multi-agent competition by training AI agents in a hide and seek environment. France and Germany reaffirm blocking Facebook’s Libra cryptocurrency
Read more
  • 0
  • 0
  • 4943

article-image-how-to-handle-categorical-data-for-machine-learning-algorithms
Packt Editorial Staff
20 Sep 2019
9 min read
Save for later

How to handle categorical data for machine learning algorithms

Packt Editorial Staff
20 Sep 2019
9 min read
The quality of data and the amount of useful information are key factors that determine how well a machine learning algorithm can learn. Therefore, it is absolutely critical that we make sure to encode categorical variables correctly, before we feed data into a machine learning algorithm. In this article, with simple yet effective examples we will explain how to deal with categorical data in computing machine learning algorithms and how we to map ordinal and nominal feature values to integer representations. The article is an excerpt from the book Python Machine Learning - Third Edition by Sebastian Raschka and Vahid Mirjalili. This book is a comprehensive guide to machine learning and deep learning with Python. It acts as both a clear step-by-step tutorial, and a reference you’ll keep coming back to as you build your machine learning systems.  It is not uncommon that real-world datasets contain one or more categorical feature columns. When we are talking about categorical data, we have to further distinguish between nominal and ordinal features. Ordinal features can be understood as categorical values that can be sorted or ordered. For example, t-shirt size would be an ordinal feature, because we can define an order XL > L > M. In contrast, nominal features don't imply any order and, to continue with the previous example, we could think of t-shirt color as a nominal feature since it typically doesn't make sense to say that, for example, red is larger than blue. Categorical data encoding with pandas Before we explore different techniques to handle such categorical data, let's create a new DataFrame to illustrate the problem: >>> import pandas as pd >>> df = pd.DataFrame([ ...            ['green', 'M', 10.1, 'class1'], ...            ['red', 'L', 13.5, 'class2'], ...            ['blue', 'XL', 15.3, 'class1']]) >>> df.columns = ['color', 'size', 'price', 'classlabel'] >>> df color  size price  classlabel 0   green     M 10.1     class1 1     red   L 13.5      class2 2    blue   XL 15.3      class1 As we can see in the preceding output, the newly created DataFrame contains a nominal feature (color), an ordinal feature (size), and a numerical feature (price) column. The class labels (assuming that we created a dataset for a supervised learning task) are stored in the last column. Mapping ordinal features To make sure that the learning algorithm interprets the ordinal features correctly, we need to convert the categorical string values into integers. Unfortunately, there is no convenient function that can automatically derive the correct order of the labels of our size feature, so we have to define the mapping manually. In the following simple example, let's assume that we know the numerical difference between features, for example, XL = L + 1 = M + 2: >>> size_mapping = { ...                 'XL': 3, ...                 'L': 2, ...                 'M': 1} >>> df['size'] = df['size'].map(size_mapping) >>> df color  size price  classlabel 0   green     1 10.1     class1 1     red   2 13.5      class2 2    blue     3 15.3     class1 If we want to transform the integer values back to the original string representation at a later stage, we can simply define a reverse-mapping dictionary inv_size_mapping = {v: k for k, v in size_mapping.items()} that can then be used via the pandas map method on the transformed feature column, similar to the size_mapping dictionary that we used previously. We can use it as follows: >>> inv_size_mapping = {v: k for k, v in size_mapping.items()} >>> df['size'].map(inv_size_mapping) 0   M 1   L 2   XL Name: size, dtype: object Encoding class labels Many machine learning libraries require that class labels are encoded as integer values. Although most estimators for classification in scikit-learn convert class labels to integers internally, it is considered good practice to provide class labels as integer arrays to avoid technical glitches. To encode the class labels, we can use an approach similar to the mapping of ordinal features discussed previously. We need to remember that class labels are not ordinal, and it doesn't matter which integer number we assign to a particular string label. Thus, we can simply enumerate the class labels, starting at 0: >>> import numpy as np >>> class_mapping = {label:idx for idx,label in ...                  enumerate(np.unique(df['classlabel']))} >>> class_mapping {'class1': 0, 'class2': 1} Next, we can use the mapping dictionary to transform the class labels into integers: >>> df['classlabel'] = df['classlabel'].map(class_mapping) >>> df     color  size price  classlabel 0   green     1 10.1         0 1     red   2 13.5           1 2    blue     3 15.3           0 We can reverse the key-value pairs in the mapping dictionary as follows to map the converted class labels back to the original string representation: >>> inv_class_mapping = {v: k for k, v in class_mapping.items()} >>> df['classlabel'] = df['classlabel'].map(inv_class_mapping) >>> df     color  size price  classlabel 0   green     1 10.1     class1 1     red   2 13.5      class2 2    blue     3 15.3     class1 Alternatively, there is a convenient LabelEncoder class directly implemented in scikit-learn to achieve this: >>> from sklearn.preprocessing import LabelEncoder >>> class_le = LabelEncoder() >>> y = class_le.fit_transform(df['classlabel'].values) >>> y array([0, 1, 0]) Note that the fit_transform method is just a shortcut for calling fit and transform separately, and we can use the inverse_transform method to transform the integer class labels back into their original string representation: >>> class_le.inverse_transform(y) array(['class1', 'class2', 'class1'], dtype=object) Performing a technique ‘one-hot encoding’ on nominal features In the Mapping ordinal features section, we used a simple dictionary-mapping approach to convert the ordinal size feature into integers. Since scikit-learn's estimators for classification treat class labels as categorical data that does not imply any order (nominal), we used the convenient LabelEncoder to encode the string labels into integers. It may appear that we could use a similar approach to transform the nominal color column of our dataset, as follows: >>> X = df[['color', 'size', 'price']].values >>> color_le = LabelEncoder() >>> X[:, 0] = color_le.fit_transform(X[:, 0]) >>> X array([[1, 1, 10.1],        [2, 2, 13.5],        [0, 3, 15.3]], dtype=object) After executing the preceding code, the first column of the NumPy array X now holds the new color values, which are encoded as follows: blue = 0 green = 1 red = 2 If we stop at this point and feed the array to our classifier, we will make one of the most common mistakes in dealing with categorical data. Can you spot the problem? Although the color values don't come in any particular order, a learning algorithm will now assume that green is larger than blue, and red is larger than green. Although this assumption is incorrect, the algorithm could still produce useful results. However, those results would not be optimal. A common workaround for this problem is to use a technique called one-hot encoding. The idea behind this approach is to create a new dummy feature for each unique value in the nominal feature column. Here, we would convert the color feature into three new features: blue, green, and red. Binary values can then be used to indicate the particular color of an example; for example, a blue example can be encoded as blue=1, green=0, red=0. To perform this transformation, we can use the OneHotEncoder that is implemented in scikit-learn's preprocessing module: >>> from sklearn.preprocessing import OneHotEncoder >>> X = df[['color', 'size', 'price']].values >>> color_ohe = OneHotEncoder() >>> color_ohe.fit_transform(X[:, 0].reshape(-1, 1)).toarray()  array([[0., 1., 0.],            [0., 0., 1.],            [1., 0., 0.]]) Note that we applied the OneHotEncoder to a single column (X[:, 0].reshape(-1, 1))) only, to avoid modifying the other two columns in the array as well. If we want to selectively transform columns in a multi-feature array, we can use the ColumnTransformer that accepts a list of (name, transformer, column(s)) tuples as follows: >>> from sklearn.compose import ColumnTransformer >>> X = df[['color', 'size', 'price']].values >>> c_transf = ColumnTransformer([ ...     ('onehot', OneHotEncoder(), [0]), ...     ('nothing', 'passthrough', [1, 2]) ... ]) >>> c_transf.fit_transform(X) .astype(float)     array([[0.0, 1.0, 0.0, 1, 10.1],            [0.0, 0.0, 1.0, 2, 13.5],            [1.0, 0.0, 0.0, 3, 15.3]]) In the preceding code example, we specified that we only want to modify the first column and leave the other two columns untouched via the 'passthrough' argument. An even more convenient way to create those dummy features via one-hot encoding is to use the get_dummies method implemented in pandas. Applied to a DataFrame, the get_dummies method will only convert string columns and leave all other columns unchanged: >>> pd.get_dummies(df[['price', 'color', 'size']])     price  size color_blue  color_green color_red 0    10.1     1   0 1          0 1    13.5     2   0 0          1 2    15.3     3   1 0          0 When we are using one-hot encoding datasets, we have to keep in mind that it introduces multicollinearity, which can be an issue for certain methods (for instance, methods that require matrix inversion). If features are highly correlated, matrices are computationally difficult to invert, which can lead to numerically unstable estimates. To reduce the correlation among variables, we can simply remove one feature column from the one-hot encoded array. Note that we do not lose any important information by removing a feature column, though; for example, if we remove the column color_blue, the feature information is still preserved since if we observe color_green=0 and color_red=0, it implies that the observation must be blue. If we use the get_dummies function, we can drop the first column by passing a True argument to the drop_first parameter, as shown in the following code example: >>> pd.get_dummies(df[['price', 'color', 'size']], ...                drop_first=True)     price  size color_green  color_red 0    10.1     1     1 0 1    13.5     2     0 1 2    15.3     3     0 0 In order to drop a redundant column via the OneHotEncoder , we need to set drop='first' and set categories='auto' as follows: >>> color_ohe = OneHotEncoder(categories='auto', drop='first') >>> c_transf = ColumnTransformer([  ...            ('onehot', color_ohe, [0]), ...            ('nothing', 'passthrough', [1, 2]) ... ]) >>> c_transf.fit_transform(X).astype(float) array([[  1. , 0. ,  1. , 10.1],        [  0. ,  1. , 2. ,  13.5],        [  0. ,  0. , 3. ,  15.3]]) In this article, we have gone through some of the methods to deal with categorical data in datasets. We distinguished between nominal and ordinal features, and with examples we explained how they can be handled. To harness the power of the latest Python open source libraries in machine learning check out this book Python Machine Learning - Third Edition, written by Sebastian Raschka and Vahid Mirjalili. Other interesting read in data! The best business intelligence tools 2019: when to use them and how much they cost Introducing Microsoft’s AirSim, an open-source simulator for autonomous vehicles built on Unreal Engine Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data & Society report
Read more
  • 0
  • 0
  • 17485

article-image-github-acquires-semmle-to-secure-open-source-supply-chain-attains-cve-numbering-authority-status
Savia Lobo
19 Sep 2019
5 min read
Save for later

GitHub acquires Semmle to secure open-source supply chain; attains CVE Numbering Authority status

Savia Lobo
19 Sep 2019
5 min read
Yesterday, GitHub announced that it has acquired Semmle, a code analysis platform provider and also that it is now a Common Vulnerabilities and Exposures (CVE) Numbering Authority. https://twitter.com/github/status/1174371016497405953 The Semmle acquisition is a part of the plan to securing the open-source supply chain, Nat Friedman explains in his blog post. Semmle provides a code analysis engine, named QL, which allows developers to write queries that identify code patterns in large codebases and search for vulnerabilities and their variants. Security researchers use Semmle to quickly find vulnerabilities in code with simple declarative queries. “Semmle is trusted by security teams at Uber, NASA, Microsoft, Google, and has helped find thousands of vulnerabilities in some of the largest codebases in the world, as well as over 100 CVEs in open source projects to date,” Friedman writes. Also Read: GitHub now supports two-factor authentication with security keys using the WebAuthn API Semmle originally spun out of research at Oxford in 2006 announced a $21 million Series B investment led by Accel Partners, last year. “In total, the company raised $31 million before this acquisition,” Techcrunch reports. Shanku Niyogi, Senior Vice President of Product at GitHub, in his blog post writes, “An important measure of the success of Semmle’s approach is the number of vulnerabilities that have been identified and disclosed through their technology. Today, over 100 CVEs in open source projects have been found using Semmle, including high-profile projects like Apache Struts, Apple’s XNU, the Linux Kernel, Memcached, U-Boot, and VLC. No other code analysis tool has a similar success rate.” GitHub also announced that it has been approved as a CVE Numbering Authority for open source projects. Now, GitHub will be able to issue CVEs for security advisories opened on GitHub, allowing for even broader awareness across the industry. With Semmle integration, every CVE-ID can be associated with a Semmle QL query, which can then be shared and tracked by the broader developer community. The CVE approval will make it easier for project maintainers to report security flaws directly from their repositories. Also, GitHub can assign CVE identifiers directly and post them to the CVE List and the National Vulnerability Database (NVD). Earlier this year, GitHub acquired Dependabot, to provide automatic security fixes natively within GitHub. With automatic security fixes, developers no longer need to manually patch their dependencies. When a vulnerability is found in a dependency, GitHub will automatically issue a pull request on downstream repositories with the information needed to accept the patch. In August, GitHub was in the limelight for being a part of the Capital One data breach that affected 106 million users in the US and Canada. The law firm Tycko & Zavareei LLP filed a lawsuit in California’s federal district court on behalf of their plaintiffs Seth Zielicke and Aimee Aballo. Also Read: GitHub acquires Spectrum, a community-centric conversational platform Both plaintiffs claimed Capital One and GitHub were unable to protect user’s personal data. The complaint highlighted that Paige A. Thompson, the alleged hacker stole the data in March, posted about the theft on GitHub in April. According to the lawsuit, “As a result of GitHub’s failure to monitor, remove, or otherwise recognize and act upon obviously-hacked data that was displayed, disclosed, and used on or by GitHub and its website, the Personal Information sat on GitHub.com for nearly three months.” The Semmle acquisition may be GitHub’s move to improve security for users in the future. It would be interesting to know how GitHub will mold security for users with additional CVE approval. A user on Reddit writes, “I took part in a tutorial session Semmle held at a university CS society event, where we were shown how to use their system to write semantic analysis passes to look for things like use-after-free and null pointer dereferences. It was only an hour and a bit long, but I found the query language powerful & intuitive and the platform pretty effective. At the time, you could set up your codebase to run Semmle passes on pre-commit hooks or CI deployments etc. and get back some pretty smart reporting if you had introduced a bug.” The user further writes, “The session focused on Java, but a few other languages were supported as first-class, iirc. It felt kinda like writing an SQL query, but over AST rather than tuples in a table, and using modal logic to choose the selections. It took a little while to first get over the 'wut' phase (like 'how do I even express this'), but I imagine that a skilled team, once familiar with the system, could get a lot of value out of Semmle's QL/semantic analysis, especially for large/enterprise-scale codebases.” https://twitter.com/kurtseifried/status/1174395660960796672 https://twitter.com/timneutkens/status/1174598659310313472 To know more about this announcement in detail, read GitHub’s official blog post. Other news in Data Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support is now out Introducing Microsoft’s AirSim, an open-source simulator for autonomous vehicles built on Unreal Engine GQL (Graph Query Language) joins SQL as a Global Standards Project and is now the international standard declarative query language for graphs
Read more
  • 0
  • 0
  • 2815
article-image-the-best-business-intelligence-tools-2019-when-to-use-them-and-how-much-they-cost
Richard Gall
19 Sep 2019
11 min read
Save for later

The best business intelligence tools 2019: when to use them and how much they cost

Richard Gall
19 Sep 2019
11 min read
Business intelligence is big business. Salesforce’s purchase of Tableau earlier this year (for a cool $16 billion) proves the value of a powerful data analytics platform, and demonstrates how the business intelligence space is reshaping expectations and demands in the established CRM and ERP marketplace. To a certain extent, the amount Salesforce paid for Tableau highlights that when it comes to business intelligence, tooling is paramount. Without a tool that fits the needs and skill levels of those that need BI and use analytics, discussions around architecture and strategy are practically moot. So, what are the best business intelligence tools? And more importantly, how do they differ from one another? Which one is right for you? Read next: 4 important business intelligence considerations for the rest of 2019 The best business intelligence tools 2019 Tableau Let’s start with the obvious one: Tableau. It’s one of the most popular business intelligence tools on the planet, and with good reason; it makes data visualization and compelling data storytelling surprisingly easy. With a drag and drop interface, Tableau requires no coding knowledge from users. It also allows users to ask ‘what if’ scenarios to model variable changes, which means you can get some fairly sophisticated insights with just a few simple interactions. But while Tableau is undoubtedly designed to be simple, it doesn’t sacrifice complexity. Unlike other business intelligence tools, Tableau allows users to include an unlimited number of datapoints in their analytics projects. When should you use Tableau and how much does it cost? The core Tableau product is aimed at data scientists and data analysts who want to be able to build end-to-end analytics pipelines. You can trial the product for free for 14 days, but this will then cost you $70/month. This is perhaps one of the clearest use cases - if you’re interested and passionate about data, Tableau practically feels like a toy. However, for those that want to employ Tableau across their organization, the product offers a neat pricing tier. Tableau Creator is built for individual power users - like those described above, Tableau Explorer for self-service analytics, and Tableau Viewer for those that need access to Tableau for limited access to dashboard and analytics. Tableau eBooks and videos Mastering Tableau 2019.1 - Second Edition Tableau 2019.x Cookbook Getting Started with Tableau 2019.2 - Second Edition Tableau in 7 Steps [Video] PowerBI PowerBI is Microsoft’s business intelligence platform. Compared to Tableau it is designed more for reporting and dashboards rather than data exploration and storytelling. If you use a wide range of Microsoft products, PowerBI is particularly powerful. It can become a centralized space for business reporting and insights. Like Tableau, it’s also relatively easy to use. With support from Microsoft Cortana - the company’s digital assistant - it’s possible to perform natural language queries. When should you use PowerBI and how much does it cost? PowerBI is an impressive business intelligence product. But to get the most value, you need to be committed to Microsoft. This isn’t to say you shouldn’t be - the company has been on form in recent years and appears to really understand what modern businesses and users need. On a similar note, a good reason to use PowerBI is for unified and aligned business insights. If Tableau is more suited to personal exploration, or project-based storytelling, PowerBI is an effective option for organizations that want more clarity and shared visibility on key performance metrics. This is reflected in the price. For personal users the desktop version of PowerBI is free, while a pro license is $9.99 a month. A premium plan which includes cloud resources (storage and compute) starts at $4,995. This is the option for larger organizations that are fully committed to the Microsoft suite and has a clear vision of how it wants to coordinate analytics and reporting. PowerBI eBooks and videos Learn Power BI Microsoft Power BI Quick Start Guide Learning Microsoft Power BI [Video] Qlik Sense and QlikView Okay, so here we’re going to include two business intelligence products together: Qlik Sense and QlikView. Obviously, they’re both part of the same family - they’re built by business intelligence company Qlik. More importantly, they’re both quite different products. What’s the difference between Qlik Sense and QlikView? As we’ve said, Qlik Sense and QlikView are two different products. QlikView is the older and more established tool. It’s what’s usually described as a ‘guided analytics’ platform, which means dashboards and analytics applications can be built for end users. The tool gives freedom to engineers and data scientists to build what they want but doesn’t allow end users to ‘explore’ data in any more detail than what is provided. QlikView is quite a sophisticated platform and is widely regarded as being more complex to use than Tableau or PowerBI. While PowerBI or Tableau can be used by anyone with an intermediate level of data literacy and a willingness to learn, QlikView will always be the preserve of data scientists and analysts. This doesn’t make it a poor choice. If you know how to use it properly, QlikView can provide you with more in-depth analysis than any other business intelligence platforms, helping users to see patterns and relationships across different data sets. If you’re working with big data, for example, and you have a team of data scientists and data engineers, QlikView is a good option. Qlik Sense, meanwhile, could be seen as Qlik’s attempt to compete with the likes of Tableau and PowerBI. It’s a self-service BI tool, which allows end users to create their own data visualisations and explore data through a process of ‘data discovery’. When should you use QlikView and how much does it cost? QlikView should be used when you need to build a cohesive reporting and business intelligence solution. It’s perfect for when you need a space to manage KPIs and metrics across different teams. Although a free edition is available for personal use, Qlik doesn’t publish prices for enterprise users. You’ll need to get in touch with the company’s sales team to purchase. QlikView eBooks and videos QlikView: Advanced Data Visualization QlikView Dashboard Development [Video] When should you use Qlik Sense and how much does it cost? Qlik Sense should be used when you have an organization full of people curious and prepared to get their hands on their data. If you already have an established model of reporting performance, Qlik Sense is a useful extra that can give employees more autonomy over how data can be used. When it comes to pricing, Qlik Sense is one of the more complicated business intelligence options. Like QlikView, there’s a free option for personal use, and again like QlikView, there’s no public price available - so you’ll have to connect with Qlik directly. To add an additional layer of complexity, there’s also a product called ‘Cloud Basic’ - this is free and can be shared between up to 5 users. It’s essentially a SaaS version of the Qlik Sense product. If you need to add more than 5 users, it costs $15 per user/month. Qlik Sense eBooks and videos Mastering Qlik Sense [Video] Qlik Sense Cookbook - Second Edition Data Storytelling with Qlik Sense [Video] Hands-On Business Intelligence with Qlik Sense Read next: Top 5 free Business Intelligence tools Splunk Splunk isn’t just a business intelligence tool. To a certain extent, it’s related to application monitoring and logging tools such as Datadog, New Relic, and AppDynamics. It’s built for big data and real-time analytics, which means that it’s well-suited to offering insights on business processes and product performance. The copy on the Splunk website talks about “real-time visibility across the enterprise” and describes Splunk as a “data-to-everything” platform. The product, then, is pitching itself as something that can embed itself inside existing systems, and bring insight and intelligence to places and spaces where it’s particularly valuable. This is in contrast to PowerBI and Tableau, which are designed for exploration and accessibility. This isn’t to say that Splunk doesn’t enable sophisticated data exploration, but rather that it is geared towards monitoring systems and processes, understanding change. It’s a tool built for companies that need full transparency - or, in other words, dynamic operational intelligence. When should you use Splunk and how much does it cost? Splunk is a tool that should be used if you’re dealing with dynamic and real-time data. If you want to be able to model and explore wide-ranging existing sets of data Tableau or PowerBI are probably a better bet. But if you need to be able to make decisions in an active and ongoing scenario, Splunk is a tool that can provide substantial support. The reason that Splunk is included as a part of this list of business intelligence tools is because real-time visibility and insight is vital for businesses. Typically understanding application performance or process efficiency might have been embedded within particular departments, such as a centralized IT function. Now, with businesses dependent upon operational excellence, and security and reliability in the digital arena becoming business critical, Splunk is a tool that deserves its status inside (and across) organizations. Splunk’s pricing is complicated. Prices are generally dependent on how much data you want to index - or, in other words, how much you’re giving Splunk to deal with. But to add to that, Splunk also have a perpetual license ( a one time payment) and an annual term license, which needs to be renewed. So, you can index 1GB/day for $4,500 on a perpetual license, or $1,800 on an annual license. If you want to learn more about Splunk’s pricing option, this post is very useful. Splunk eBooks and videos Splunk 7 Essentials [E-Learning] Splunk 7.x Quick Start Guide Splunk Operational Intelligence Cookbook - Third Edition IBM Cognos IBM Cognos is IBM’s flagship business intelligence tool. It’s probably best viewed as existing somewhere between PowerBI and Tableau. It’s designed for reporting dashboards that allow monitoring and analytics, but it is nevertheless also intended for self-service. To that end, you might say it’s more limited in capabilities than PowerBI, but it’s nevertheless more accessible for non-technical end users to explore data. It’s also relatively easy to integrate with other systems and data sources. So, if your data is stored in Microsoft or Oracle cloud services and databases, it’s relatively straightforward to get started with IBM Cognos. However, it’s worth noting that despite the accesibility of IBM’s product, it still needs centralized control and implementation. It doesn’t offer the level of ease that you get with Tableau, for example. When should you use IBM Cognos and how much does it cost? Cognos is perhaps the go-to option if PowerBI and Tableau don’t quite work for you. Perhaps you like the idea of Tableau but need more centralization. Or maybe you need a strong and cohesive reporting system but don’t feel prepared to buy into Microsoft. This isn’t to make IBM Cognos sound like the outsider - in fact, from an efficiency perspective it’s possibly the best way to ensure to ensure some degree of portability between data sources and to manage the age-old problem of data silos. If you’re not quite sure what business intelligence tool is right for you, it’s well worth taking advantage of Cognos’s free trial - you get unlimited access for a month. If you like what you get, you then have a choice between a premium version - which costs $70 per user/month, and the enterprise plan, the price of which isn’t publicly available. IBM Cognos eBooks and videos IBM Cognos Framework Manager [Video] IBM Cognos Report Studio [Video] IBM Cognos Connection and Workspace Advanced [Video] Conclusion: To choose the best business intelligence solution for your organization, you need to understand your needs and goals Business intelligence is a crowded market. The products listed here are the tip of the iceberg when it comes to analytics, monitoring, and data visualization. This is good and bad - it means there are plenty of options and opportunities, but it also means that sorting through the options to find the right one might take up some of your time. That’s okay though - if possible, try to take advantage of free trial periods. And if you’re in a rush to get work done, use them on active projects. You could even allocate different platforms and tools to different team members and get them to report on what worked well and what didn’t. That way you can have documented insights on how the products might actually be used within the organization. This will help you to better reach a conclusion about the best tool for the job. Business intelligence done well can be extremely valuable - so don’t waste money and don’t waste time on tools that aren’t going to deliver what you need.
Read more
  • 0
  • 0
  • 8651

article-image-gql-graph-query-language-joins-sql-as-a-global-standards-project-and-is-now-the-international-standard-declarative-query-language-for-graphs
Amrata Joshi
19 Sep 2019
6 min read
Save for later

GQL (Graph Query Language) joins SQL as a Global Standards Project and will be the international standard declarative query language for graphs

Amrata Joshi
19 Sep 2019
6 min read
On Tuesday, the team at Neo4j, the graph database management system announced that the international committees behind the development of the SQL standard have voted to initiate GQL (Graph Query Language) as the new database query language. GQL is now going to be the international standard declarative query language for property graphs and it is also a Global Standards Project. GQL is developed and maintained by the same international group that maintains the SQL standard. How did the proposal for GQL pass? Last year in May, the initiative for GQL was first time processed in the GQL Manifesto. This year in June, the national standards bodies across the world from the ISO/IEC’s Joint Technical Committee 1 (responsible for IT standards) started voting on the GQL project proposal.  The ballot closed earlier this week and the proposal was passed wherein ten countries including Germany, Korea, United States, UK, and China voted in favor. And seven countries agreed to put forward their experts to work on this project. Japan was the only country to vote against in the ballot because according to Japan, existing languages already do the job, and SQL/Property Graph Query extensions along with the rest of the SQL standard can do the same job. According to the Neo4j team, the GQL project will initiate development of next-generation technology standards for accessing data. Its charter mandates building on core foundations that are established by SQL and ongoing collaboration in order to ensure SQL and GQL interoperability and compatibility. GQL would reflect rapid growth in the graph database market by increasing adoption of the Cypher language.  Stefan Plantikow, GQL project lead and editor of the planned GQL specification, said, “I believe now is the perfect time for the industry to come together and define the next generation graph query language standard.”  Plantikow further added, “It’s great to receive formal recognition of the need for a standard language. Building upon a decade of experience with property graph querying, GQL will support native graph data types and structures, its own graph schema, a pattern-based approach to data querying, insertion and manipulation, and the ability to create new graphs, and graph views, as well as generate tabular and nested data. Our intent is to respect, evolve, and integrate key concepts from several existing languages including graph extensions to SQL.” Keith Hare, who has served as the chair of the international SQL standards committee for database languages since 2005, charted the progress toward GQL, said, “We have reached a balance of initiating GQL, the database query language of the future whilst preserving the value and ubiquity of SQL.” Hare further added, “Our committee has been heartened to see strong international community participation to usher in the GQL project.  Such support is the mark of an emerging de jure and de facto standard .” The need for a graph-specific query language Researchers and vendors needed a graph-specific query language because of the following limitations: SQL/PGQ language is restricted to read-only queries SQL/PGQ cannot project new graphs The SQL/PGQ language can only access those graphs that are based on taking a graph view over SQL tables. Researchers and vendors needed a language like Cypher that would cover insertion and maintenance of data and not just data querying. But SQL wasn’t the apt model for a graph-centric language that takes graphs as query inputs and outputs a graph as a result. But GQL, on the other hand, builds in openCypher, a project that brings Cypher to Apache Spark and gives users a composable graph query language. SQL and GQL can work together According to most of the companies and national standards bodies that are supporting the GQL initiative, GQL and SQL are not competitors. Instead, these languages can complement each other via interoperation and shared foundations. Alastair Green, Query Languages Standards & Research Lead at Neo4j writes, “A SQL/PGQ query is in fact a SQL sub-query wrapped around a chunk of proto-GQL.” SQL is a language that is built around tables whereas GQL is built around graphs. Users can use GQL to find and project a graph from a graph.  Green further writes, “I think that the SQL standards community has made the right decision here: allow SQL, a language built around tables, to quote GQL when the SQL user wants to find and project a table from a graph, but use GQL when the user wants to find and project a graph from a graph. Which means that we can produce and catalog graphs which are not just views over tables, but discrete complex data objects.” It is still not clear when will the first implementation version of GQL will be out. The official page reads,  “The work of the GQL project starts in earnest at the next meeting of the SQL/GQL standards committee, ISO/IEC JTC 1 SC 32/WG3, in Arusha, Tanzania, later this month. It is impossible at this stage to say when the first implementable version of GQL will become available, but it is highly likely that some reasonably complete draft will have been created by the second half of 2020.” Developer community welcomes the new addition Users are excited to see how GQL will incorporate Cypher, a user commented on HackerNews, “It's been years since I've worked with the product and while I don't miss Neo4j, I do miss the query language. It's a little unclear to me how GQL will incorporate Cypher but I hope the initiative is successful if for no other reason than a selfish one: I'd love Cypher to be around if I ever wind up using a GraphDB again.” Few others mistook GQL to be Facebook’s GraphQL and are sceptical about the name. A comment on HackerNews reads, “Also, the name is of course justified, but it will be a mess to search for due to (Facebook) GraphQL.” A user commented, “I read the entire article and came away mistakenly thinking this was the same thing as GraphQL.” Another user commented, “That's quiet an unfortunate name clash with the existing GraphQL language in a similar domain.” Other interesting news in Data Media manipulation by Deepfakes and cheap fakes refquire both AI and social fixes, finds a Data & Society report Percona announces Percona Distribution for PostgreSQL to support open source databases  Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support is now out
Read more
  • 0
  • 0
  • 6633

article-image-mitres-2019-cwe-top-25-most-dangerous-software-errors-list-released
Savia Lobo
19 Sep 2019
10 min read
Save for later

MITRE’s 2019 CWE Top 25 most dangerous software errors list released

Savia Lobo
19 Sep 2019
10 min read
Two days ago, the Cybersecurity and Infrastructure Security Agency (CISA) announced MITRE’s 2019 Common Weakness Enumeration (CWE) Top 25 Most Dangerous Software Errors list. This list includes a compilation of the most frequent and critical errors that can lead to serious vulnerabilities in software. For aggregating the data for this list, the CWE Team used a data-driven approach that leverages published Common Vulnerabilities and Exposures (CVE®) data and related CWE mappings found within the National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD), as well as the Common Vulnerability Scoring System (CVSS) scores associated with the CVEs. The team then applied a scoring formula (elaborated in later sections) to determine the level of prevalence and danger each weakness presents.  This Top 25 list of errors include the NVD data from the years 2017 and 2018, which consisted of approximately twenty-five thousand CVEs. The previous SANS/CWE Top 25 list was released in 2011 and the major difference between the lists released in the year 2011 and the current 2019 is in the approach used. In 2011, the data was constructed using surveys and personal interviews with developers, top security analysts, researchers, and vendors. These responses were normalized based on the prevalence, ranked by the CWSS methodology. However, in the 2019 CWE Top 25, the list was formed based on the real-world vulnerabilities found in NVD’s data. CWE Top 25 dangerous software errors, developers should watch out for  Improper Restriction of Operations within the Bounds of a Memory Buffer In this error(CWE-119), the software performs operations on a memory buffer. However, it can read from or write to a memory location that is outside of the intended boundary of the buffer. The likelihood of exploit of this error is high as an attacker may be able to execute arbitrary code, alter the intended control flow, read sensitive information, or cause the system to crash.  This error can be exploited in any programming language without memory management support to attempt an operation outside of the bounds of a memory buffer, but the consequences will vary widely depending on the language, platform, and chip architecture. Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') This error, CWE-79, can cause the software to incorrectly neutralize the user-controllable input before it is placed in output that is used as a web page that is served to other users. Once the malicious script is injected, the attacker could transfer private information, such as cookies that may include session information, from the victim's machine to the attacker. This error can also allow attackers to send malicious requests to a website on behalf of the victim, which could be especially dangerous to the site if the victim has administrator privileges to manage that site. Such XSS flaws are very common in web applications since they require a great deal of developer discipline to avoid them. Improper Input Validation With this error, CWE-20, the product does not validate or incorrectly validates input thus affecting the control flow or data flow of a program. This can allow an attacker to craft the input in a form that is not expected by the rest of the application. This will lead to parts of the system receiving unintended input, which may result in an altered control flow, arbitrary control of a resource, or arbitrary code execution. Input validation is problematic in any system that receives data from an external source. “CWE-116 [Improper Encoding or Escaping of Output] and CWE-20 have a close association because, depending on the nature of the structured message, proper input validation can indirectly prevent special characters from changing the meaning of a structured message,” the researchers mention in the CWE definition post.  Information Exposure This error, CWE-200, is the intentional or unintentional disclosure of information to an actor that is not explicitly authorized to have access to that information. According to the CEW- Individual Dictionary Definition, “Many information exposures are resultant (e.g. PHP script error revealing the full path of the program), but they can also be primary (e.g. timing discrepancies in cryptography). There are many different types of problems that involve information exposures. Their severity can range widely depending on the type of information that is revealed.” This error can be executed for specific named Languages, Operating Systems, Architectures, Paradigms(Mobiles), Technologies, or a class of such platforms. Out-of-bounds Read In this error, CWE-125, the software reads data past the end, or before the beginning, of the intended buffer. This can allow attackers to read sensitive information from other memory locations or cause a crash. The software may modify an index or perform pointer arithmetic that references a memory location that is outside of the boundaries of the buffer.  This error may occur for specific named Languages (C, C++), Operating Systems, Architectures, Paradigms, Technologies, or a class of such platforms.  Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection') In this error (weakness ID: CWE-89) the software constructs all or part of an SQL command using externally-influenced input from an upstream component. However, it incorrectly neutralizes special elements that could modify the intended SQL command when it is sent to a downstream component. This error can be used to alter query logic to bypass security checks, or to insert additional statements that modify the back-end database, possibly including the execution of system commands. It can occur in specific named Languages, Operating Systems, Architectures, Paradigms, Technologies, or a class of such platforms. Cross-Site Request Forgery (CSRF) This error CWE-352, the web application does not, or can not, sufficiently verify whether a well-formed, valid, consistent request was intentionally provided by the user who submitted the request. This might allow an attacker to trick a client into making an unintentional request to the webserver which will be treated as an authentic request. The likelihood of the occurrence of this error is medium.  This can be done via a URL, image load, XMLHttpRequest, etc. and can result in the exposure of data or unintended code execution.  Integer Overflow or Wraparound In this error, CWE-190, the software performs a calculation that can produce an integer overflow or wraparound, when the logic assumes that the resulting value will always be larger than the original value. This can introduce other weaknesses when the calculation is used for resource management or execution control. Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') In this error, CWE-22, the software uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory. However, the software does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.  In most programming languages, injection of a null byte (the 0 or NUL) may allow an attacker to truncate a generated filename to widen the scope of attack. For example, when the software adds ".txt" to any pathname, this may limit the attacker to text files, but a null injection may effectively remove this restriction. Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection') In this error, CWE-78, the software constructs all or part of an OS command using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the intended OS command when it is sent to a downstream component. This error can allow attackers to execute unexpected, dangerous commands directly on the operating system. This weakness can lead to a vulnerability in environments in which the attacker does not have direct access to the operating system, such as in web applications.  Alternately, if the weakness occurs in a privileged program, it could allow the attacker to specify commands that normally would not be accessible, or to call alternate commands with privileges that the attacker does not have. The researchers write, “More investigation is needed into the distinction between the OS command injection variants, including the role with argument injection (CWE-88). Equivalent distinctions may exist in other injection-related problems such as SQL injection.” Here’s the list of the remaining errors from MITRE’s 2019 CWE Top 25 list: CWE ID Name of the Error Average CVSS score CWE-416 Use After Free 17.94 CWE-287 Improper Authentication 10.78 CWE-476 NULL Pointer Dereference 9.74 CWE-732  Incorrect Permission Assignment for Critical Resource 6.33 CWE-434 Unrestricted Upload of File with Dangerous Type 5.50 CWE-611 Improper Restriction of XML External Entity Reference 5.48 CWE-94 Improper Control of Generation of Code ('Code Injection') 5.36 CWE-798 Use of Hard-coded Credentials 5.1 CWE-400 Uncontrolled Resource Consumption 5.04 CWE-772  Missing Release of Resource after Effective Lifetime 5.04 CWE-426  Untrusted Search Path 4.40 CWE-502 Deserialization of Untrusted Data 4.30 CWE-269 Improper Privilege Management 4.23 CWE-295 Improper Certificate Validation 4.06 To know about the other errors in detail, read CWE’s official report. Scoring formula to calculate the rank of weaknesses The CWE team had developed a scoring formula to calculate a rank order of weaknesses. The scoring formula combines the frequency that a CWE is the root cause of vulnerability with the projected severity of its exploitation. In both cases, the frequency and severity are normalized relative to the minimum and maximum values seen.  A few properties of the scoring method include: Weaknesses that are rarely exploited will not receive a high score, regardless of the typical severity associated with any exploitation. This makes sense, since if developers are not making a particular mistake, then the weakness should not be highlighted in the CWE Top 25. Weaknesses with a low impact will not receive a high score. This again makes sense, since the inability to cause significant harm by exploiting a weakness means that weakness should be ranked below those that can.  Weaknesses that are both common and can cause harm should receive a high score. However, there are a few limitations to the methodology of the data-driven approach chosen by the CWE Team. Limitations of the data-driven methodology This approach only uses data publicly reported and captured in NVD, while numerous vulnerabilities exist that do not have CVE IDs. Vulnerabilities that are not included in NVD are therefore excluded from this approach. For vulnerabilities that receive a CVE, often there is not enough information to make an accurate (or precise) identification of the appropriate CWE that is exploited.  There is an inherent bias in the CVE/NVD dataset due to the set of vendors that report vulnerabilities and the languages that are used by those vendors. If one of the largest contributors to CVE/NVD primarily uses C as its programming language, the weaknesses that often exist in C programs are more likely to appear.  Another bias in the CVE/NVD dataset is that most vulnerability researchers and/or detection tools are very proficient at finding certain weaknesses but do not find other types of weaknesses. Those types of weaknesses that researchers and tools struggle to find will end up being under-represented within 2019 CWE Top 25. Gaps or perceived mischaracterizations of the CWE hierarchy itself lead to incorrect mappings.  In Metric bias, it indirectly prioritizes implementation flaws over design flaws, due to their prevalence within individual software packages. For example, a web application may have many different cross-site scripting (XSS) vulnerabilities due to a large attack surface, yet only one instance of the use of an insecure cryptographic algorithm. https://twitter.com/mattfahrner/status/1173984732926943237 To know more about this CWE Top 25 list in detail, head over to MITRE’s CWE Top 25 official page.  Other news in Security LastPass patched a security vulnerability from the extensions generated on pop-up windows An unsecured Elasticsearch database exposes personal information of 20 million Ecuadoreans including 6.77M children under 18 A new Stuxnet-level vulnerability named Simjacker used to secretly spy over mobile phones in multiple countries for over 2 years: Adaptive Mobile Security reports
Read more
  • 0
  • 0
  • 5702
article-image-as-kickstarter-reels-in-the-aftermath-of-its-alleged-union-busting-move-is-the-tech-industry-at-a-tipping-point
Vincy Davis
18 Sep 2019
12 min read
Save for later

As Kickstarter reels in the aftermath of its alleged union-busting move, is the tech industry at a tipping point?

Vincy Davis
18 Sep 2019
12 min read
Update: With the immense dissent accorded on Kickstarter since its union-busting move of firing two of its union organizing workers, many would have hoped the public-benefit corporation company to switch its stand.  However, in an e-mail statement to CurrentAffairs on September 28th, Kickstarter’s CEO, Aziz Hasan, has made the company’s standpoint firm and clear. Hasan confirmed that Kickstarter is standing by its decision to fire the organizers and would be fighting the lawsuit filed by them with the National Labor Relations Board. He went on to confirm the company’s perspective of not voluntarily recognizing a union, even if the majority of its workers signed in support of a union. He also dubbed the union framework as “inherently adversarial”. Hassan also pledged not to remain neutral on unionization and said that Kickstarter would continue to actively oppose unionization efforts.  https://twitter.com/NathanJRobinson/status/1177668530864607232 After Kickstarter’s stand was made public on Twitter, many influential and past successful Kickstarter project initiators have openly pledged to boycott Kickstarter for future projects. https://twitter.com/vornietom/status/1178203986890899457 https://twitter.com/joshfoxfilm/status/1177721539015385094 https://twitter.com/neilhimself/status/1178046366137892865 https://twitter.com/GunnerGale/status/1178239961314840576 Update: Following the alleged union-busting move at Kickstarter, many workers around the world came out in support of the two fired employees.   On September 24th, HCL employed 80 Google contract workers in Pittsburgh voted in favor of unionizing with the United Steel Workers (USW). They will now be working under the name Pittsburgh Association of Tech Professionals (PATP). Per Vice, the main aim of the union is to create an umbrella organization to facilitate organizing in the tech sector, and eventually to help tech locals coordinate and cooperate amongst themselves. According to Motherboard, HCL America’s deputy general manager of operations had also sent out emails, before the voting, to all the contractors in an attempt to prevent them from unionizing. https://twitter.com/veenadubal/status/1176631300498657280 The vote to unionize is a historic movement among tech workers and will motivate others to come forward and form unions. Meredith Whittaker, ex-Google Walkout organizer, who had left Google over ethical concerns has come out in support of Google contractors unionizing. https://twitter.com/mer__edith/status/1176581810035273728 The HCL workers have also expressed their solidarity with Kickstarter workers. https://twitter.com/SethGoldstein13/status/1174815837762531328 HCL America has not yet officially commented on its workers unionizing. Head over to Vice, for the full coverage on Google workers unionizing. [dropcap]Past week[/dropcap] has been quite rough for many at Kickstarter, the American public-benefit corporation. Kickstarter fired two of its employees, Taylor Moore, Head of Comedy and Podcasts at Kickstarter and Clarissa Redwine, former Senior Design & Tech Outreach Lead at Kickstarter, citing performance issues. Both workers have openly protested against Kickstarter on Twitter. https://twitter.com/taylordotbiz/status/1172257828473573377 https://twitter.com/ClarissaRedwine/status/1172167251623124997 The workers have claimed that there has been no performance issues from their end and their dismissal is the result of their participation in the unionization effort within the company. In her own words, Redwine says, “Kickstarter's management continues to state that I was fired for performance issues. I find this strange because I not only met but exceeded all performance metrics in Q2. I was great at my job. And I loved it.” Moore also informed on Twitter that another prominent member of the proposed union (Kickstarter United) has also been asked to leave the firm by Kickstarter. These actions have of course garnered a lot of negative reactions for the company, with people tagging Kickstarter as a ‘union buster’. https://twitter.com/GJenkins310114/status/1172765408287428608 https://twitter.com/nickblackford/status/1172432658372055041 Moore is now urging fellow Kickstarter staff members to unionize and sign petitions for exercising their right to a fair and equitable workplace. https://twitter.com/taylordotbiz/status/1172260259823534080 Following all the furore, Kickstarter has said in a statement to engadget that if an employee fails to achieve the meeting expectations, they are bound to meet this treatment. The statement further reads, “Kickstarter has not fired or otherwise retaliated against anyone for union organizing. Obviously we knew how these terminations could be perceived. But it would be unfair to not hold these people to the same standards as the rest of our staff simply because they are union organizers.” Read Also: Perry Chen, Kickstarter CEO steps down from his role in the middle of employees’ Unionization efforts Per Slate, two days ago, the labor union working with employees at Kickstarter have filed a  charge with the National Labor Relations Board. The labor union have accused Kickstarter of wrongfully terminating Redwine and Moore. They have further alleged that “Kickstarter interfered with employees’ right to organize, since such firings could have a chilling effect on union efforts. Moore and Redwine are asking for back pay and to be reinstated to their positions. Next, the NLRB will ask the union to file an affidavit describing the charges, and the employer will have to respond.” https://twitter.com/NathanJRobinson/status/1173287284663345152 The labor union workers have called on Kickstarter project creators to express their solidarity with Kickstarter workers’ union and to condemn the company’s firings with a ‘Support of Unionizing Kickstarter Workers’ form. With 80 Kickstarter creators already signed up, the statement in support of the Kickstarter Union has already received lots of appreciation from the public on Twitter. One Kickstarter creator tweeted, “I owe my fine art career to Kickstarter. It is a site I love, and a site that changed my life. They need to recognize the union.  I signed this petition.” Kickstarter has not released a statement regarding the lawsuit yet. Unfair treatment of protesting workers have always been the norm Kickstarter has now joined the long list of firms like Google and Npm Inc who have opposed workers organizing to demand better treatment. For workers, unions act as negotiators who can talk/protest on their behalf against unlawful working conditions like non-cash compensation, harassment, unbiased hiring, and other unfair labor practices. On the other hand, the employers’ will to maximize the company’s profit at the cost of the employer’s salary or time are received with resistance from trade unions. With such conflicting goals, it comes as no surprise that organizations will attempt to remove the organizers behind mass protests, in order to avoid such movements taking concrete shapes such as trade unions. Another method used by firms is to issue demotion or an unplanned transfer, as experienced by Claire Stapleton and Meredith Whittaker, ex-Google employees. These developers played a major role in organizing an pan-Google Walkout in November last year, but had to quit Google eight months later, after facing retaliation for protesting against the company. Read Also: Meredith Whittaker, Google Walkout organizer, and AI ethics researcher is leaving the company, adding to its brain-drain woes over ethical concerns Read Also: Google Walkout organizer, Claire Stapleton resigns after facing retaliation from management The organizers of the Google Walkout had laid five demands for change within the workplace. An end to Forced Arbitration in cases of harassment and discrimination A commitment to end pay and opportunity inequities A publicly disclosed sexual harassment transparency report A clear, uniform, and globally inclusive process for reporting sexual misconduct safely and anonymously Elevate the Chief Diversity Officer to answer directly to the CEO In April, NPM Inc, the company behind the widely used NPM JavaScript package repository dismissed 5 of its employees, allegedly for engaging in union organizing activities. The union organizers were protesting against the company’s profit prioritizing strategy at the expense of employees. According to a special report from The Register, the JavaScript package registry and NPM Inc were planning to fight union-busting complaints by firing staffers, rather than settling their claims. Following all the unrest in the company, the Npm Inc. co-founder and Chief data officer, Laurie Voss, quit the company, in July. Voss’ s resignation came third in line after Rebecca Turner, former core contributor who resigned in March and Kat Marchan, former CLI and community architect also resigned from NPM the same month. Voss stated in his blog that he supports unions and added, “As far as the labor dispute goes, I will say that I have always supported unions, I think they’re great, and at no point in my time at NPM did anybody come to me proposing a union,” he said. “If they had, I would have been in favor of it. The whole thing was a total surprise to me.” This can be mostly true as employees tend not to talk to the management in the fear of retaliation. Read Also: Npm Inc, after a third try, settles former employee claims, who were fired for being pro-union, The Register reports Why is tech workers’ unionizing, a tipping point? There has always been a debate around the need for unions in tech or rather if at all the industry needs one. Supporters of the latter are of the opinion that the tech industry does not need a union as this field is already doing well, in terms of money and safety for employees. Jeff Atwood, one of the co-founders of the stackoverflow.com responded to the Kickstarter story by commenting, “I will never understand the desire to "unionize" in tech. Why should rich people unionize? So they can get .. uh.. even more money?” Lack of Pay Equity A popular issue that employees, including those in tech, face is the pay equity issue, particularly for women. The current Equal Pay Act specifies that employers can’t differentiate salary based on gender (unless based on factors like seniority, merit, and work level). However, it was found that most  women were making far less money than male colleagues with the same experience and job titles. According to Vox, “women in the United States who work full time make, on an average, 82 cents for every dollar their male counterparts make.” After a series of attempts to make sure women and men are paid equally, Congress passed the Paycheck Fairness bill, in March this year. The Paycheck Fairness Act is intended to close all the loopholes in the current Equal Pay Act of 1963. It is now awaiting Senate approval post which the bill will become a law. Need for safe working conditions Job insecurity However, fair pay is not the only reason unions exist. Two months ago, Amazon workers protested against their employers on its Prime day, calling for a safe work environment. According to a report by Verge, an attorney for Amazon confirmed that hundreds of employees at the Baltimore facility were terminated within a year for failing to meet productivity rates. The workers complained that Amazon gamifies and makes the productivity goals dynamic for its workers, which becomes unrealistic to achieve. They also demanded the company to take action against ease quotas, and make more temp employees permanent. Toxic work culture In April this year, Riot games employees had demonstrated a walkout in protest of the company’s sexist culture and lack of diversity. Riot games was put in the spotlight after many complaints by Riot workers on grounds of sexual harassment and discrimination faced at their workspace. In contrast the complaints and public outcry against Riot games did not fall on deaf ears, as Riot games settled the class-action lawsuit filed by their workers. The CEO of Riot Games said, “We are grateful for every Rioter who has come forward with their concerns and believe this resolution is fair for everyone involved.” Unrealistic target demands and burnout In January, the 2019 Game Developers Conference survey revealed that nearly 50% of game developers believed that the game industry workers should unionize to fight against unreal target demands and long working hours. Following which in February, the American Federation of Labor and Congress of Industrial Organizations (AFL-CIO) published an open letter to game developers, persuading them to unionize and voice support for fair treatment at work. The letter pointed out that workers are asked to submit to unrealistic targets with horrible work conditions, job instability, and inadequate pay. Forced arbitration keeps the status quo The Google Walkout not only pushed Google to take a stand against sexual assault but also inspired other companies to take the right steps in case of sensitive issues. Post the Google Walkout, Sundar Pichai, Chief Executive Officer of Google Inc. addressed these demands and listed some major changes that will be incorporated in Google. Out of all the demands, forced arbitration was one of the most criticized and protested policies by the Google workers. Later in January, Google workers even launched an industry-wide awareness campaign to fight against it. The workers stated that the implementation of forced arbitration policy has grown significantly in the past seven years, with 65% of the companies consisting of 1,000 or more employees, now having mandatory arbitration procedures. Relenting to the demands, Google finally ended its forced arbitration policy for all it’s employees in February. According to the forced arbitration policy, the employee is required to waive their right to sue, to participate in a class action lawsuit, or to appeal. Following suit, Facebook also changed its policy of forced arbitration which required the employees to settle sexual harassment claims in private. This allows Facebook employees to take any of their sexual harassment complaints to a court of law. It is safe to say that unionization in tech is of utmost importance, as only when a large group of protestors revolt together, the company is forced to change course, which otherwise would have gone unchallenged. Only structural changes such as worker unions that challenge the current power dynamics  can make a company’s “if they don’t like it they can leave” mentality change in the tech industry. With the tech industry playing fast and loose with laws and regulations, workers may be the only guardrail in keeping them accountable. Unions, for all their limitations and shortcomings, help protect workers against company retaliation. Unions are likely not the end goal, but a means to achieve a fairer and inclusive workplace, a more responsible business entity and a saner marketplace. Latest news in Tech How Artificial Intelligence and machine learning can help address the ongoing climate change A new Stuxnet-level vulnerability named Simjacker used to secretly spy over mobile phones in multiple countries for over 2 years: Adaptive Mobile Security reports The CAP Theorem in practice: The consistency vs. availability trade-off in distributed databases
Read more
  • 0
  • 0
  • 3825

article-image-why-companies-that-dont-invest-in-technology-training-cant-compete
Richard Gall
17 Sep 2019
8 min read
Save for later

Why companies that don’t invest in technology training can’t compete

Richard Gall
17 Sep 2019
8 min read
Here’s the bad news: the technology skills gap is seriously harming many businesses. Without people who can help drive innovation, companies are going to struggle to compete on the global stage. Fortunately, there’s also some good news: we can all do something about it. Although it might seem that the tech skills gap is an issue that’s too big for employers to properly tackle on their own, by investing time and energy in technology training, employers can ensure that they have the skills and knowledge within their team to continue to innovate and build solutions to the conveyor belt of problems. Businesses ignore technology training at their peril. Sure, a great recruitment team is important, as are industry and academic connections. But without a serious focus on upskilling employees - software developers or otherwise - many companies won’t be able to compete. It’s not just about specific skills; it’s also about building a culture that is forward thinking and curious. A working environment that is committed to exploring new ways to solve problems. If companies aren’t serious about building this type of culture through technology training, they’re likely to run in a number of critical problems. These could have a long term impact on growth and business success. If you don’t invest in training and resources it's harder to retain employees Something that I find amusing is seeing business leaders complain about the dearth of talent in their industry or region, while also failing to engage or develop the talent they already have at their disposal. It sounds obvious, but it’s something that is often missed in conversations about the skills gap. If it’s so hard to find the talent you need, make sure you train to retain. The consequences could be significant. If you have talented employees that are willing and eager to learn but you’re not prepared to support them, with either resources or time, you can bet that they’ll be looking for other jobs. For employees, a company that doesn’t invest in their skills and knowledge is demoralising in multiple ways - on the one hand it signals that the company doesn’t trust or value them, while on the other it leaves them doing work that doesn’t push or engage them. If someone can’t do something, your first thought shouldn’t be to turn to a recruitment manager - you should instead be asking if someone in your team can do it. If they can’t, then ask yourself if they could with some training. This doesn’t mean that your HR problems can always be solved internally, but it does mean that if you take technology training seriously you can solve it much faster, and maybe even cheaper. Indeed, there’s another important point here. There’s sometimes an assumption that the right person is out there who can do the job you need. Someone with a perfect background, all the skills, all the knowledge and expertise you need. This person almost certainly doesn’t exist. And, while you were dreaming of this perfect employee, your other developers all got jobs elsewhere. That leaves you in a tricky position. All the initiatives you wanted to achieve in the second half of the year now have to be pushed back. Good technology training and team-based resources can attract talent Although it’s important to invest in tech training and resources to develop and retain employees, a considered approach to technology training can also be useful in attracting people to your company. Think of companies like Google and Facebook. One of the reasons they’re so attractive to many talented tech professionals (as well as the impressive salaries…) is that they offer so much scope to solve interesting problems and work on complex projects. Now, you probably can’t offer either the inflated salaries or the possibility of working on industry defining technology, but by making technology training a critical part of your ‘employer brand’, you’ll find it much easier to make talented tech professionals pay attention to you. It’s also a way of showing prospective employees that you’re serious about their personal development, and that you understand just how important learning is in tech. There are a number of ways that this could play out - from highlighting the resources you make available and give software engineering teams access to, to qualifications, and even just the opportunity to learn for a set period every week. Ultimately, by having a clear and concerted tech learning initiative, you can both motivate and develop your existing employees while also attracting new ones. This means that you’ll have a healthy developer culture, and be viewed as a developer and tech-focused organization. Read next: 5 barriers to learning and technology training for small software development teams Organizations that invest in tech training and resources encourage employee flexibility and curiosity To follow on from the point above, to successfully leverage technology, you need to build a culture where change is embraced. You need to value curiosity as a means of uncovering innovative solutions to problems. The way to encourage this is through technology training. Failing to provide even a basic level of support is not only unhelpful in a practical sense, it also sends a negative message to your employees: change doesn’t really matter, we always do things this way, so why does anyone need to spend time learning new things? By showing your engineering employees that yes, their curiosity is important, you can ensure that you have people who are taking full ownership of problems. Instead of people saying that’s how things have always been done, employees will be asking if there's another way. Ultimately that should lead to improved productivity and better outcomes. In turn, this means that your tech employees will become more flexible and adaptable without even realising it. Without technology training and high-quality resources, businesses can’t evolve quickly This brings us neatly to the next point. If your employees aren’t flexible and haven’t been encouraged to explore and learn new technologies and topics the business will suffer. The speed at which it can adapt to changes in the market and deliver new products will harm the bottom line. All too often we think about business agility in an abstract way. But if you aren’t able to support your employees in embracing change and learning new skills, you simply can’t have business agility. Of course, you might imagine a counterargument popping up here - couldn’t individual development harm business goals? True, most offices aren’t study spaces. But while business objectives and deadlines should always be the priority, if learning is constantly pushed to the bottom of the pile, you will find that the business, and your team, are going to hit a brick wall. The status quo is rarely sustainable. There will always be a point at which something won’t work, or something becomes out of date. If everyone is looking at immediate problems and tasks, they’ll never see the bigger picture. Technology training and learning resources can help to remove silos and improve collaboration Silos are an inevitable reality in many modern businesses. This is particularly true in tech teams. Often they arise because of good intentions. Splitting people up and getting them to focus on different things isn’t exactly the worst idea in the world, right? Today, however, silos are significant barriers to innovation and change. Change happens when people break out of their silos and share knowledge. It happens when people develop new ways of working that are free from traditional constraints. This isn’t something that’s easy to accomplish. It requires a lot of work on the culture and processes of a team. But one element that’s often overlooked when it comes to tackling silos is training and resources. If silos exist because people feel comfortable with specialization and focus, by offering wide-reaching resources and training materials you will be going a long way to helping your employees to get outside of their comfort zone. Indeed, in some respects we’re not talking about sustained training courses. Instead, it’s just as valuable - and more cost-effective - to provide people with resources that allow them to share their frame of reference. That might feel insignificant in the face of technological change beyond the business, and strategic visions inside it. However, it’s impossible to overestimate the importance of a shared language or understanding. It’s a critical part of getting people to work together, and getting people to think with more clarity about how they work, and the tools they use. Make sure your team has the resources they need to solve problems quickly and keep their skills up to date. Learn more about Packt for Teams here.
Read more
  • 0
  • 0
  • 4372