
How-To Tutorials


Python 3.8 new features: the walrus operator, positional-only parameters, and much more

Bhagyashree R
18 Jul 2019
5 min read
Earlier this month, the team behind Python announced the release of Python 3.8b2, the second of four planned beta releases. Ahead of the third beta release, scheduled for 29th July, we look at some of the key features coming to Python 3.8.

The "incredibly controversial" walrus operator

The walrus operator was proposed in PEP 572 (Assignment Expressions) by Chris Angelico, Tim Peters, and Guido van Rossum last year. Since then it has been heavily discussed in the Python community, with many questioning whether it is a needed improvement, while others were excited because the operator does make the code a little more readable. At the end of the PEP discussion, Guido van Rossum stepped down as BDFL (benevolent dictator for life), which led to the creation of a new governance model. In an interview with InfoWorld, Guido shared, “The straw that broke the camel’s back was a very contentious Python enhancement proposal, where after I had accepted it, people went to social media like Twitter and said things that really hurt me personally. And some of the people who said hurtful things were actually core Python developers, so I felt that I didn’t quite have the trust of the Python core developer team anymore.”

According to PEP 572, the assignment expression is a syntactical operator that allows you to assign values to a variable as part of an expression. Its aim is to simplify things like multiple-pattern matches and the so-called loop and a half. At PyCon 2019, Dustin Ingram, a PyPI maintainer, gave a few examples of where you can use this syntax: balancing lines of code and complexity, avoiding inefficient comprehensions, and avoiding unnecessary variables in scope. You can watch the full talk on YouTube: https://www.youtube.com/watch?v=6uAvHOKofws

The feature was implemented by Emily Morehouse, Python core developer and Founder, Director of Engineering at Cuttlesoft, and was merged earlier this year: https://twitter.com/emilyemorehouse/status/1088593522142339072

Explaining other improvements this feature brings, Jake Edge, a contributor on LWN.net, wrote, “These and other uses (e.g. in list and dict comprehensions) help make the intent of the programmer clearer. It is a feature that many other languages have, but Python has, of course, gone without it for nearly 30 years at this point. In the end, it is actually a fairly small change for all of the uproar it caused.”

Positional-only parameters

Proposed in PEP 570, this introduces a new syntax (/) to specify positional-only parameters in Python function definitions. This is similar to how * indicates that the arguments to its right are keyword-only. This syntax is already used by many CPython built-in and standard library functions, for instance, the pow() function: pow(x, y, z=None, /).

This syntax gives library authors more control over expressing the intended usage of an API and allows the API to “evolve in a safe, backward-compatible way.” It gives library authors the flexibility to change the name of positional-only parameters without breaking callers. Additionally, it ensures consistency of the Python language with the existing documentation and the behavior of various built-in and standard library functions.

As with PEP 572, this proposal also got mixed reactions from Python developers. In support, one developer said, “Position-only parameters already exist in cpython builtins like range and min. Making their support at the language level would make their existence less confusing and documented.” Others think that this will allow authors to “dictate” how their methods can be used. “Not the biggest fan of this one because it allows library authors to overly dictate how their functions can be used, as in, mark an argument as positional merely because they want to. But cool all the same,” a Redditor commented.
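To make these two syntax changes concrete, here is a minimal sketch of both in action. This is our own illustration rather than code from the article or the CPython documentation, and it assumes Python 3.8 or later; all names in it are invented for the example.

```python
import re

# Assignment expression (walrus operator): bind and test in one expression,
# avoiding the classic "loop and a half" pattern.
data = ["alpha", "beta", ""]
while (item := data.pop(0)) != "":
    print(f"processing {item}")

# It also avoids calling a matching function twice inside a comprehension.
lines = ["x=1", "nope", "y=2"]
names = [m.group(1) for line in lines if (m := re.match(r"(\w+)=", line))]
print(names)  # ['x', 'y']

# Positional-only parameters: everything before the bare "/" may only be
# passed positionally, so those parameter names stay free to change later.
def clamp(value, low, high, /, *, strict=False):
    if strict and not (low <= value <= high):
        raise ValueError("out of range")
    return max(low, min(value, high))

print(clamp(15, 0, 10))             # fine: passed positionally, prints 10
# clamp(value=15, low=0, high=10)   # TypeError: these arguments are positional-only
```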
Debug support for f-strings

Formatted strings (f-strings) were introduced in Python 3.6 with PEP 498. They let you evaluate an expression as part of a string, along with inserting the results of function calls and so on. In Python 3.8, some additional syntax changes have been made by adding an = specifier and a !d conversion for ease of debugging. You can use this feature like this: print(f'{foo=} {bar=}'). This gives developers a better way of doing “print-style debugging”, especially those who have a background in languages that already have such a feature, such as Perl, Ruby, and JavaScript. One developer expressed his delight on Hacker News, “F strings are pretty awesome. I’m coming from JavaScript and partially java background. JavaScript’s String concatenation can become too complex and I have difficulty with large strings.”

Python Initialization Configuration

Though Python is highly configurable, its configuration seems scattered all around the code. PEP 587 introduces a new C API to configure the Python initialization, giving developers finer control over the configuration and better error reporting. Among the improvements this API will bring are the ability to read and modify the configuration before it is applied and to override how Python computes the module search paths (sys.path).

Along with these, there are many other exciting features coming to Python 3.8, which is currently scheduled for October, including a fast calling protocol for CPython (Vectorcall), support for out-of-band buffers in pickle protocol 5, and more. You can find the full list on Python’s official website.

Read next:
Python serious about diversity, dumps offensive ‘master’, ‘slave’ terms in its documentation
Introducing PyOxidizer, an open source utility for producing standalone Python applications, written in Rust
Python 3.8 beta 1 is now ready for you to test


Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

Vincy Davis
18 Jul 2019
3 min read
Here's another reason why Rust is the present and the future of programming. A few days ago, Microsoft announced that it is going to start exploring Rust as an alternative to its C and C++ code. The announcement was made by Gavin Thomas, Principal Security Engineering Manager at the Microsoft Security Response Centre (MSRC).

Thomas states that around 70% of the vulnerabilities Microsoft assigns a CVE to each year are caused by developers accidentally introducing memory corruption bugs into their C and C++ code. He adds, "As Microsoft increases its code base and uses more Open Source Software in its code, this problem isn’t getting better, it's getting worse. And Microsoft isn’t the only one exposed to memory corruption bugs—those are just the ones that come to MSRC."

Image Source: Microsoft blog

He highlights the fact that even with so many security mechanisms in place to make code secure (static analysis tools, fuzzing at scale, taint analysis, many encyclopaedias of coding guidelines, threat modelling guidance, and so on), developers still have to invest a lot of time in learning more tools for training and vulnerability fixes. Thomas states that though C++ has many qualities, such as speed, maturity, and a small memory and disk footprint, it does not have the memory safety guarantees of languages like .NET C#. He believes that Rust is a language that can provide both.

Thomas strongly advocates that the software security industry should focus on providing a secure environment for developers to work in, rather than ignoring the importance of security and sticking with outdated methods and approaches. He concludes by hinting that Microsoft is going to adopt the Rust programming language: "Perhaps it's time to scrap unsafe legacy languages and move on to a modern safer system programming language?"

Microsoft exploring Rust is not surprising, as Rust has been popular with many developers for its simpler syntax, fewer bugs, and memory and thread safety. It was also voted the most loved programming language in the 2019 Stack Overflow survey, the biggest developer survey on the internet. It allows developers to focus on their applications, rather than worrying about security and maintenance. Recently, many applications have been written in Rust, such as Vector, the Brave ad-blocker, PyOxidizer, and more.

Developers couldn't agree more with this post, as many have expressed their love for Rust.
https://twitter.com/alilleybrinker/status/1151495738158977024
https://twitter.com/karanganesan/status/1151485485644054528
https://twitter.com/shah_sheikh/status/1151457054004875264

A Redditor says, "While this first post is very positive about memory-safe system programming languages in general and Rust in particular, I would not call this an endorsement. Still, great news!"

Visit the Microsoft blog for more details.

Read next:
Introducing Ballista, a distributed compute platform based on Kubernetes and Rust
EU Commission opens an antitrust case against Amazon on grounds of violating EU competition rules
Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]


What is HCL (Hashicorp Configuration Language), how does it relate to Terraform, and why is it growing in popularity?

Savia Lobo
18 Jul 2019
6 min read
HCL (HashiCorp Configuration Language) is rapidly growing in popularity. Last year's Octoverse report by GitHub showed it to be the second fastest growing language on the platform, more than doubling in contributors since 2017 (Kotlin was top, with GitHub contributors growing 2.6 times). However, despite its growth, it hasn’t had the level of attention that other programming languages have had. One of the reasons for this is that HCL is a configuration language. It's also part of a broader ecosystem of tools built by cloud automation company HashiCorp that largely centers around Terraform.

What is Terraform?

Terraform is an infrastructure-as-code tool that makes it easier to define and manage your cloud infrastructure. HCL is simply the syntax that allows you to better leverage its capabilities. It gives you a significant degree of control over your infrastructure in a way that’s more ‘human-readable’ than other configuration languages such as YAML and JSON.

HCL and Terraform are both important parts of the DevOps world. They are not only built for a world that has transitioned to infrastructure-as-code, but also one in which this transition demands more from engineers. By making HCL a more readable, higher-level configuration language, the language can better facilitate collaboration and transparency between cross-functional engineering teams.

With all of this in mind, HCL’s growing popularity can be taken to indicate broader shifts in the software development world. HashiCorp clearly understands them very well and is eager to help drive them forward. But before we go any further, let's dive a bit deeper into why HCL was created, how it works, and how it sits within the Terraform ecosystem.

Why did HashiCorp create HCL?

The development of HCL was born out of HashiCorp’s experience of trying multiple different options for configuration languages. “What we learned,” the team explains on GitHub, “is that some people wanted human-friendly configuration languages and some people wanted machine-friendly languages.” The HashiCorp team needed a compromise: something that could offer a degree of flexibility and accessibility.

As the team outlines their thinking, it’s clear to see what the drivers behind HCL actually are. JSON, they say, “is fairly verbose and... doesn't support comments”, while YAML is viewed as too complex for beginners to properly parse and use effectively. Traditional programming languages also pose problems: they’re too sophisticated and demand too much background knowledge from users to make them a truly useful configuration language. Put together, this underlines the fact that with HCL, HashiCorp wanted to build something that is accessible to engineers of different abilities and skill sets, while also being clear enough to enable appropriate levels of transparency between teams. It is “designed to be written and modified by humans.”

Listen: Uber engineer Yuri Shkuro talks distributed tracing and observability on the Packt Podcast

How does the HashiCorp Configuration Language work?

HCL is not a replacement for the likes of YAML or JSON. The team’s aim “is not to alienate other configuration languages. It is,” they say, “instead to provide HCL as a specialized language for our tools, and JSON as the interoperability layer.” Effectively, it builds on some of the things you can get with JSON, but reimagines them in the context of infrastructure and application configuration.
According to the documentation, we should see HCL as a “structured configuration language rather than a data structure serialization language.” HCL is “always decoded using an application-defined schema,” which gives you a level of flexibility: the application is always at the center of the language, and you don't have to work around it. If you want to learn more about the HCL syntax and how it works at a much deeper level, the documentation is a good place to start, as is this page on GitHub.

Read next: Why do IT teams need to transition from DevOps to DevSecOps?

The advantages of HCL and Terraform

You can’t really talk about the advantages of HCL without also considering the advantages of Terraform. Indeed, while HCL might well be a well-designed configuration language that’s accessible and caters to a wide range of users and use cases, it’s only in the context of Terraform that its growth really makes sense.

Why is Terraform so popular?

To understand the popularity of Terraform, you need to place it in the context of current trends and today’s software marketplace for infrastructure configuration. Terraform is widely seen as a competitor to configuration management tools like Chef, Ansible, and Puppet. However, Terraform isn’t exactly a configuration management tool; it’s more accurate to call it a provisioning tool (configuration management tools configure software on servers that already exist, while provisioning tools set up new ones).

This is important because, thanks to Docker and Kubernetes, the need for configuration has radically changed; you might even say that it’s no longer there. If a Docker container is effectively self-sufficient, with all the configuration files it needs to run, then the need for ‘traditional’ configuration management begins to drop. Of course, this isn’t to say that one tool is intrinsically better than any other. There are use cases for all of these types of tools. But the fact remains that Terraform suits use cases that are starting to grow.

Part of this is due to the rise of cloud agnosticism. As multi-cloud and hybrid cloud architectures become prevalent, DevOps teams need tools that let them navigate and manage resources across different platforms. Although all the major public cloud vendors have native tools for managing resources, these can sometimes be restrictive. The templates they offer can also be difficult to reuse. Take Azure ARM templates, for example: they can only be used to create Azure resources. In contrast, Terraform allows you to provision and manage resources across different cloud platforms.

Conclusion: Terraform and HCL can make DevOps more accessible

It’s not hard to see why ThoughtWorks sees Terraform as such an important emerging technology. (In the last edition of the ThoughtWorks Radar, it claimed that now is the time to adopt it.) But it’s also important to understand that HCL is an important element in the success of Terraform. It makes infrastructure-as-code not only something that’s accessible to developers who might have previously only dipped their toes in operations, but also something that can be more collaborative, transparent, and observable for team members. The DevOps picture will undoubtedly evolve over the next few years, but it would appear that HashiCorp is going to have a big part to play in it.


Implementing Horizontal Pod Autoscaling in Kubernetes [Tutorial]

Savia Lobo
18 Jul 2019
18 min read
When we use Kubernetes deployments to deploy our pod workloads, it is simple to scale the number of replicas used by our applications up and down using the kubectl scale command. However, if we want our applications to automatically respond to changes in their workloads and scale to meet demand, then Kubernetes provides us with Horizontal Pod Autoscaling.

This article is an excerpt taken from the book Kubernetes on AWS written by Ed Robinson. In this book, you will start by learning about Kubernetes' powerful abstractions - Pods and Services - that make managing container deployments easy.

Horizontal Pod Autoscaling allows us to define rules that will scale the number of replicas up or down in our deployments based on CPU utilization and, optionally, other custom metrics. Before we are able to use Horizontal Pod Autoscaling in our cluster, we need to deploy the Kubernetes metrics server; this server provides endpoints that are used to discover CPU utilization and other metrics generated by our applications. In this article, you will learn how to use horizontal pod autoscaling to automatically scale your applications and to automatically provision and terminate EC2 instances.

Deploying the metrics server

Before we can make use of Horizontal Pod Autoscaling, we need to deploy the Kubernetes metrics server to our cluster. This is because the Horizontal Pod Autoscaling controller makes use of the metrics provided by the metrics.k8s.io API, which is provided by the metrics server. While some installations of Kubernetes may install this add-on by default, in our EKS cluster we will need to deploy it ourselves.

There are a number of ways to deploy add-on components to your cluster. If you are using Helm to manage applications on your cluster, you could use the stable/metrics-server chart. For simplicity, we are just going to deploy the metrics server manifests using kubectl. I like to integrate deploying add-ons such as the metrics server and kube2iam with the process that provisions the cluster, as I see them as integral parts of the cluster infrastructure. But if you are going to use a tool like Helm to manage deploying applications to your cluster, then you might prefer to manage everything running on your cluster with the same tool. The decision you take really depends on the processes you and your team adopt for managing your cluster and the applications that run on it.

The metrics server is developed in its own GitHub repository, and you will find the manifests required to deploy it in the deploy directory of that repository. Start by cloning the configuration from GitHub. The metrics server began supporting the authentication methods provided by EKS in version 0.0.3, so make sure the manifests you use are at least that version.

You will find a number of manifests in the deploy/1.8+ directory. The auth-reader.yaml and auth-delegator.yaml files configure the integration of the metrics server with the Kubernetes authorization infrastructure. The resource-reader.yaml file configures a role that gives the metrics server the permissions to read resources from the API server, in order to discover the nodes that pods are running on. The metrics-server-deployment.yaml and metrics-server-service.yaml files define the deployment used to run the service itself and a service to be able to access it.
Finally, the metrics-apiservice.yaml file defines an APIService resource that registers the metrics.k8s.io API group with the Kubernetes API server aggregation layer; this means that requests to the API server for the metrics.k8s.io group will be proxied to the metrics server service. Deploying these manifests with kubectl is simple; just submit all of them to the cluster with kubectl apply:

$ kubectl apply -f deploy/1.8+

You should see a message about each of the resources being created on the cluster. If you are using a tool like Terraform to provision your cluster, you might use it to submit the manifests for the metrics server when you create your cluster.

Verifying the metrics server and troubleshooting

Before we continue, we should take a moment to check that our cluster and the metrics server are correctly configured to work together. After the metrics server is running on your cluster and has had a chance to collect metrics from the cluster (give it a minute or so), you should be able to use the kubectl top command to see the resource usage of the pods and nodes in your cluster. Start by running kubectl top nodes. If you see output like this, then the metrics server is configured correctly and is collecting metrics from your nodes:

$ kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-3-29-209   20m          1%     717Mi           19%
ip-10-3-61-119   24m          1%     1011Mi          28%

If you see an error message, there are a number of troubleshooting steps you can follow. Start by describing the metrics server deployment and checking that one replica is available:

kubectl -n kube-system describe deployment metrics-server

If it is not, debug the created pod by running kubectl -n kube-system describe pod and look at the events to see why the server is not available. Make sure that you are running at least version 0.0.3 of the metrics server. If the metrics server is running correctly and you still see errors when running kubectl top, the issue is that the APIService registered with the aggregation layer is not configured correctly. Check the events output at the bottom of the information returned when you run kubectl describe apiservice v1beta1.metrics.k8s.io. One common issue is that the EKS control plane cannot connect to the metrics server service on port 443.

Autoscaling pods based on CPU usage

Once the metrics server has been installed into our cluster, we will be able to use the metrics API to retrieve information about the CPU and memory usage of the pods and nodes in our cluster. Using the kubectl top command is a simple example of this. The Horizontal Pod Autoscaler can also use this same metrics API to gather information about the current resource usage of the pods that make up a deployment.

Let's look at an example. We are going to deploy a sample application that uses a lot of CPU under load, then configure a Horizontal Pod Autoscaler to scale up extra replicas of this pod to provide extra capacity when CPU utilization exceeds a target level. The application we will be deploying is a simple Ruby web application that can calculate the nth number in the Fibonacci sequence; it uses a simple recursive algorithm and is not very efficient (perfect for us to experiment with autoscaling). The deployment for this application is very simple.
It is important to set a resource limit for CPU because the target CPU utilization is based on a percentage of this limit:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib
  labels:
    app: fib
spec:
  selector:
    matchLabels:
      app: fib
  template:
    metadata:
      labels:
        app: fib
    spec:
      containers:
      - name: fib
        image: errm/fib
        ports:
        - containerPort: 9292
        resources:
          limits:
            cpu: 250m
            memory: 32Mi

We are not specifying a number of replicas in the deployment spec; when we first submit this deployment to the cluster, the number of replicas will therefore default to 1. This is good practice when creating a deployment whose replicas we intend to be adjusted by a Horizontal Pod Autoscaler, because it means that if we use kubectl apply to update the deployment later, we won't override the replica value the Horizontal Pod Autoscaler has set (inadvertently scaling the deployment down or up). Let's deploy this application to the cluster:

kubectl apply -f deployment.yaml

You can run kubectl get pods -l app=fib to check that the application started up correctly. We will create a service so we are able to access the pods in our deployment; requests will be proxied to each of the replicas, spreading the load:

service.yaml

kind: Service
apiVersion: v1
metadata:
  name: fib
spec:
  selector:
    app: fib
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9292

Submit the service manifest to the cluster with kubectl:

kubectl apply -f service.yaml

We are going to configure a Horizontal Pod Autoscaler to control the number of replicas in our deployment. The spec defines how we want the autoscaler to behave; we have defined here that we want the autoscaler to maintain between 1 and 10 replicas of our application and achieve a target average CPU utilization of 60% across those replicas. When CPU utilization falls below 60%, the autoscaler will adjust the replica count of the targeted deployment down; when it goes above 60%, replicas will be added:

hpa.yaml

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: fib
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60

Create the autoscaler with kubectl:

kubectl apply -f hpa.yaml

The kubectl autoscale command is a shortcut to create a HorizontalPodAutoscaler. Running kubectl autoscale deployment fib --min=1 --max=10 --cpu-percent=60 would create an equivalent autoscaler. Once you have created the Horizontal Pod Autoscaler, you can see a lot of interesting information about its current state with kubectl describe:

$ kubectl describe hpa fib
Name:               fib
Namespace:          default
CreationTimestamp:  Sat, 15 Sep 2018 14:32:46 +0100
Reference:          Deployment/fib
Metrics:            ( current / target )
  resource cpu:     0% (1m) / 60%
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired

Now that we have set up our Horizontal Pod Autoscaler, we should generate some load on the pods in our deployment to illustrate how it works. In this case, we are going to use the ab (Apache Benchmark) tool to repeatedly ask our application to compute the thirtieth Fibonacci number:

load.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: fib-load
  labels:
    app: fib
    component: load
spec:
  template:
    spec:
      containers:
      - name: fib-load
        image: errm/ab
        args: ["-n1000", "-c4", "fib/30"]
      restartPolicy: OnFailure

This job uses ab to make 1,000 requests to the endpoint (with a concurrency of 4). Submit the job to the cluster, then observe the state of the Horizontal Pod Autoscaler:

kubectl apply -f load.yaml
watch kubectl describe hpa fib

Once the load job has started to make requests, the autoscaler will scale up the deployment in order to handle the load:

Name:               fib
Namespace:          default
CreationTimestamp:  Sat, 15 Sep 2018 14:32:46 +0100
Reference:          Deployment/fib
Metrics:            ( current / target )
  resource cpu:     100% (251m) / 60%
Min replicas:       1
Max replicas:       10
Deployment pods:    2 current / 2 desired
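The replica counts in the output above follow the scaling rule documented for the Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum. The following is a rough Python sketch of that rule, our own illustration rather than code from the book, which ignores details such as the controller's tolerance window and pod readiness handling; the function name and defaults are invented for the example.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Approximate the HPA calculation for a utilization-based metric."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With 1 replica at 100% CPU against a 60% target (as in the output above),
# the autoscaler asks for ceil(1 * 100 / 60) = 2 replicas.
print(desired_replicas(1, 100, 60))   # 2
print(desired_replicas(2, 30, 60))    # 1 -> scales back down once load drops
```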
Autoscaling pods based on other metrics

The metrics server provides APIs that the Horizontal Pod Autoscaler can use to gain information about the CPU and memory utilization of pods in the cluster. It is possible to target a utilization percentage, as we did for the CPU metric, or to target an absolute value, as we do here for the memory metric:

hpa.yaml

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: fib
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 20M

The Horizontal Pod Autoscaler also allows us to scale on other metrics provided by more comprehensive metrics systems. Kubernetes allows metrics APIs to be aggregated for custom and external metrics.

Custom metrics are metrics other than CPU and memory that are associated with a pod. You might, for example, use an adapter that allows you to use metrics that a system like Prometheus has collected from your pods. This can be very beneficial if you have more detailed metrics available about the utilization of your application, for example, a forking web server that exposes a count of busy worker processes, or a queue processing application that exposes metrics about the number of items currently enqueued.

External metrics adapters provide information about resources that are not associated with any object within Kubernetes, for example, an external queuing system such as the AWS SQS service. On the whole, it is simpler if your applications can themselves expose metrics about the resources they depend on, rather than relying on an external metrics adapter, as it can be hard to limit access to particular external metrics, whereas custom metrics are tied to a particular pod, so Kubernetes can limit access to only those users and processes that need to use them.

Autoscaling the cluster

The capabilities of the Kubernetes Horizontal Pod Autoscaler allow us to add and remove pod replicas from our applications as their resource usage changes over time. However, this makes no difference to the capacity of our cluster. If our pod autoscaler is adding pods to handle an increase in load, then eventually we might run out of space in our cluster, and additional pods would fail to be scheduled. If there is a decrease in the load on our application and the pod autoscaler removes pods, then we are paying AWS for EC2 instances that will sit idle.

When we created our cluster in Chapter 7, A Production-Ready Cluster, we deployed the cluster nodes using an autoscaling group, so we should be able to use this to grow and shrink the cluster as the needs of the applications deployed to it change over time. Autoscaling groups have built-in support for scaling the size of the cluster based on the average CPU utilization of the instances.
This, however, is not really suitable when dealing with a Kubernetes cluster, because the workloads running on each node of our cluster might be quite different, so the average CPU utilization is not really a very good proxy for the free capacity of the cluster. Thankfully, in order to schedule pods to nodes effectively, Kubernetes keeps track of the capacity of each node and the resources requested by each pod. By utilizing this information, we can automate scaling the cluster to match the size of the workload.

The Kubernetes autoscaler project provides a cluster autoscaler component for some of the main cloud providers, including AWS, and it can be deployed to our cluster quite simply. As well as being able to add instances to our cluster, the cluster autoscaler is also able to drain the pods from, and then terminate, instances when the capacity of the cluster can be reduced.

Deploying the cluster autoscaler

Deploying the cluster autoscaler to our cluster is quite simple, as it just requires a single pod to be running. All we need for this is a simple Kubernetes deployment. In order for the cluster autoscaler to update the desired capacity of our autoscaling group, we need to give it permissions via an IAM role. If you are using kube2iam, we will be able to specify this role for the cluster autoscaler pod via an appropriate annotation:

cluster_autoscaler.tf

data "aws_iam_policy_document" "eks_node_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["${aws_iam_role.node.arn}"]
    }
  }
}

resource "aws_iam_role" "cluster_autoscaler" {
  name               = "EKSClusterAutoscaler"
  assume_role_policy = "${data.aws_iam_policy_document.eks_node_assume_role_policy.json}"
}

data "aws_iam_policy_document" "autoscaler" {
  statement {
    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeTags",
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup"
    ]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "cluster_autoscaler" {
  name   = "cluster-autoscaler"
  role   = "${aws_iam_role.cluster_autoscaler.id}"
  policy = "${data.aws_iam_policy_document.autoscaler.json}"
}

In order to deploy the cluster autoscaler to our cluster, we will submit a deployment manifest using kubectl. We will use Terraform's templating system to produce the manifest. We start by creating a service account that is used by the autoscaler to connect to the Kubernetes API:

cluster_autoscaler.tpl

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system

The cluster autoscaler needs to read information about the current resource usage of the cluster, and needs to be able to evict pods from nodes that need to be removed from the cluster and terminated. The cluster-autoscaler ClusterRole provides the required permissions for these actions.
The following is the code continuation for cluster_autoscaler.tpl:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
  resources: ["events","endpoints"]
  verbs: ["create", "patch"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["pods/status"]
  verbs: ["update"]
- apiGroups: [""]
  resources: ["endpoints"]
  resourceNames: ["cluster-autoscaler"]
  verbs: ["get","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["watch","list","get","update"]
- apiGroups: [""]
  resources: ["pods","services","replicationcontrollers","persistentvolumeclaims","persistentvolumes"]
  verbs: ["watch","list","get"]
- apiGroups: ["extensions"]
  resources: ["replicasets","daemonsets"]
  verbs: ["watch","list","get"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["watch","list"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["watch","list","get"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["watch","list","get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
- kind: ServiceAccount
  name: cluster-autoscaler
  namespace: kube-system

Note that the cluster autoscaler stores state information in a config map, so it needs permissions to be able to read and write from it; the following Role and RoleBinding allow that. Here is the code continuation for cluster_autoscaler.tpl:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["cluster-autoscaler-status"]
  verbs: ["delete","get","update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
- kind: ServiceAccount
  name: cluster-autoscaler
  namespace: kube-system

Finally, let's consider the manifest for the cluster autoscaler deployment itself. The cluster autoscaler pod contains a single container running the cluster autoscaler control loop. You will notice that we are passing some configuration to the cluster autoscaler as command-line arguments. Most importantly, the --node-group-auto-discovery flag allows the autoscaler to operate on autoscaling groups with the kubernetes.io/cluster/<cluster_name> tag. This is convenient because we don't have to explicitly configure the autoscaler with the name of our cluster's autoscaling group.

If your Kubernetes cluster has nodes in more than one availability zone and you are running pods that rely on being scheduled to a particular zone (for example, pods that are making use of EBS volumes), it is recommended to create an autoscaling group for each availability zone that you plan to use. If you use one autoscaling group that spans several zones, then the cluster autoscaler will be unable to specify the availability zone of the instances that it launches.
Here is the code continuation for cluster_autoscaler.tpl:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: ${iam_role}
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/cluster-autoscaler:v1.3.3
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=kubernetes.io/cluster/${cluster_name}
        env:
        - name: AWS_REGION
          value: ${aws_region}
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
      - name: ssl-certs
        hostPath:
          path: "/etc/ssl/certs/ca-certificates.crt"

Finally, we render the templated manifest by passing in variables for the AWS region, cluster name, and IAM role, and submit the file to Kubernetes using kubectl. This part belongs in the Terraform configuration (for example, in cluster_autoscaler.tf) rather than in the template itself:

data "aws_region" "current" {}

data "template_file" "cluster_autoscaler" {
  template = "${file("${path.module}/cluster_autoscaler.tpl")}"

  vars {
    aws_region   = "${data.aws_region.current.name}"
    cluster_name = "${aws_eks_cluster.control_plane.name}"
    iam_role     = "${aws_iam_role.cluster_autoscaler.name}"
  }
}

resource "null_resource" "cluster_autoscaler" {
  triggers = {
    manifest_sha1 = "${sha1("${data.template_file.cluster_autoscaler.rendered}")}"
  }

  provisioner "local-exec" {
    command = "kubectl --kubeconfig=${local_file.kubeconfig.filename} apply -f -<<EOF\n${data.template_file.cluster_autoscaler.rendered}\nEOF"
  }
}

Thus, by understanding how Kubernetes assigns Quality of Service classes to your pods based on the resource requests and limits that you assign them, you can precisely control how your pods are managed. By ensuring that your critical applications, such as web servers and databases, run with the Guaranteed class, you can ensure that they will perform consistently and suffer minimal disruption when pods need to be rescheduled.

If you have enjoyed reading this post, head over to our book, Kubernetes on AWS, for tips on deploying and managing applications, keeping your cluster and applications secure, and ensuring that your whole system is reliable and resilient to failure.

Read next:
Low Carbon Kubernetes Scheduler: A demand side management solution that consumes electricity in low grid carbon intensity areas
A vulnerability discovered in Kubernetes kubectl cp command can allow malicious directory traversal attack on a targeted system
Kubernetes 1.15 releases with extensibility around core Kubernetes APIs, cluster lifecycle stability, and more!


Elon Musk's Neuralink unveils a “sewing machine-like” robot to control computers via the brain

Sugandha Lahoti
17 Jul 2019
8 min read
After two years of being super-secretive about its work, Neuralink, Elon Musk's neurotechnology company, has finally presented its progress in brain-computer interface technology. The livestream, which was uploaded to YouTube, showcases a "sewing machine-like" robot that can implant ultrathin threads deep into the brain, giving people the ability to control computers and smartphones using their thoughts. For its brain-computer interface tech, the company has received $158 million in funding and has 90 employees.

Note: All images are taken from the Neuralink livestream video unless stated otherwise.

Elon Musk opened the presentation by talking about the primary aim of Neuralink, which is to use brain-computer interface tech to understand and treat brain disorders, preserve and enhance the brain, and ultimately, and this may sound weird, "achieve a symbiosis with artificial intelligence". He added, "This is not a mandatory thing. It is a thing you can choose to have if you want. This is something that I think will be really important on a civilization-level scale."

Neuralink wants to build, record from, and selectively stimulate as many neurons as possible across diverse brain areas. It has three goals:

Increase by orders of magnitude the number of neurons you can read from and write to in safe, long-lasting ways.
At each stage, produce devices that serve critical unmet medical needs of patients.
Make inserting a computer connection into your brain as safe and painless as LASIK eye surgery.

The robot they have built was designed to be completely wireless, with a practical bandwidth that is usable at home and lasts for a long time. Their system has an N1 sensor, an 8mm wide, 4mm tall cylinder with 1,024 electrodes. It consists of a thin film, which carries the threads. The threads are placed into the brain using thin needles, by a robotic system working in a manner akin to a sewing machine while avoiding blood vessels. The robot peels the threads off one by one from the N1 sensor and places them in the brain: a needle grabs each thread by a small loop and inserts it, under the supervision of a human neurosurgeon who lays out where the threads are placed. The needle the robot uses is 24 microns wide. The procedure makes a 2mm incision near the human ear, which is dilated to 8mm.

The threads
A robot implants threads using a needle

For the first patients, the Neuralink team is looking at four sensors, which will be connected via very small wires under the scalp to an inductive coil behind the ear. This is encased in a wearable device that they call the 'Link', which contains a Bluetooth radio and a battery. The sensors will be controlled through an iPhone app.

Neuralink/MetaLab iPhone app (Source: NYT)

The goal is to drill four 8mm holes into paralyzed patients' skulls and insert implants that will give them the ability to control computers and smartphones using their thoughts. For the first product, they are focusing on giving patients the ability to control their mobile device, and then redirect the output from their phone to a keyboard or a mouse. The company will seek U.S. Food and Drug Administration approval and is aspiring to a first-in-human clinical study by 2020, in which the device would be used to treat upper cervical spinal cord injury. They're expecting those patients to get four 1,024-channel sensors, one each in the primary motor cortex, supplementary motor area, and premotor cortex, with closed-loop feedback into the primary somatosensory cortex.
As reported by Bloomberg, which got a pre-event media briefing, Neuralink said it has performed at least 19 surgeries on animals with its robots and successfully placed the wires, which it calls "threads", about 87% of the time. The team used a lab rat and implanted a USB-C port in its head. A wire attached to the port transmitted its thoughts to a nearby computer, where software recorded and analyzed its brain activity, measuring the strength of brain spikes. The amount of data being gathered from a lab rat was about 10 times greater than what today's most powerful sensors can collect.

The flexibility of the Neuralink threads would be an advance, said Terry Sejnowski, the Francis Crick Professor at the Salk Institute for Biological Studies in La Jolla, Calif., to the New York Times. However, he noted that the Neuralink researchers still need to prove that the insulation of their threads can survive for long periods in a brain's environment, which has a salt solution that deteriorates many plastics.

Musk's bizarre attempts to revolutionize the world are far from reality

Elon Musk is known for his dramatic promises and showmanship as much as he is for his eccentric projects. But how far they are grounded in reality is another thing. In May, he successfully launched his mammoth space mission, Starlink, sending 60 communications satellites into orbit, which will eventually be part of a single constellation providing high-speed internet to the globe. However, the satellites were launched only after the mission was postponed twice to "update satellite software". Not just that, three of the 60 satellites have lost contact with ground control teams, a SpaceX spokesperson said on June 28. Experts are already worried about how the Starlink constellation will contribute to the space debris problem. Currently, there are 2,000 operational satellites in orbit around Earth, according to the latest figures from the European Space Agency, and the completed Starlink constellation will drastically add to that number. Observers had also noticed that some Starlink satellites had not initiated orbit raising after being released.

Musk's much-anticipated Hyperloop (first publicly mentioned in 2012) was supposed to shuttle passengers at near-supersonic speeds via pods traveling in a long, underground tunnel, but it was soon reduced to a car in a very small tunnel. When the underground tunnel was unveiled to the media in California last December, reporters climbed into electric cars made by Musk's Tesla and were treated to a 40 mph ride along a bumpy path. Here, too, there have been public concerns regarding its impact on public infrastructure and the environment. The biggest questions surrounding Hyperloop's environmental impact are its effect on carbon dioxide emissions, the effect of its infrastructure on ecosystems, and the environmental footprint of the materials used to build it. Other concerns include noise pollution and how to repurpose Hyperloop tubes and tunnels at the end of their lifespan.

Researchers from Tencent Keen Security Lab have criticized Tesla's self-driving car software, publishing a report detailing their successful attacks on Tesla firmware. These include remote control over the steering and an adversarial example attack on the Autopilot that confuses the car into driving into the oncoming traffic lane. Musk has also promised a fully self-driving car for Tesla by 2020, which caused a lot of activity in the stock markets, but most are skeptical about this claim as well.
Whether Elon Musk's vision of AI symbiosis will come into existence in the foreseeable future is questionable. Neuralink's long-term goals are characteristically unrealistic, considering how little is known about the human brain; cognitive functions and their representation as brain signals are still an area where much further research is required. While Musk's projects are known for their technical excellence, history shows a lack of thought about the broader consequences and costs of such innovations, such as their ethical concerns and environmental and societal impacts.

Neuralink's implant is also prone to invading one's privacy, as it will be storing sensitive medical information about a patient. There is also the likelihood of it violating one's constitutional rights, such as freedom of speech and expression, among others. What does it mean to live in a world where one's thoughts are constantly monitored and not truly one's own? Then, because this is an implant, what if the electrodes malfunction and send wrong signals to the brain? Who will be accountable in such scenarios? Although the FDA will be probing into such questions, these are questions any responsible company should ask of itself proactively while developing life-altering products or services. They are equally important aspects that are worthy of stage time in a product launch.

Regardless, Musk's bold claims and dramatic presentations are sure to gain the attention of investors and enthusiasts for now.

Read next:
Elon Musk reveals big plans with Neuralink
SpaceX shares new information on Starlink after the successful launch of 60 satellites
What Elon Musk can teach us about Futurism & Technology Forecasting


Implementing Data Modeling techniques in Qlik Sense [Tutorial]

Bhagyashree R
17 Jul 2019
14 min read
Data modeling is a conceptual process representing the associations between data in a manner that caters to specific business requirements. In this process, the various data tables are linked as per the business rules to meet business needs.

This article is taken from the book Hands-On Business Intelligence with Qlik Sense by Kaushik Solanki, Pablo Labbe, Clever Anjos, and Jerry DiMaso. By the end of this book, you will be well-equipped to run successful business intelligence applications using Qlik Sense's functionality, data modeling techniques, and visualization best practices. To follow along with the examples implemented in this article, you can download the code from the book's GitHub repository.

In this article, we will look at the basic concept of data modeling, its various types, and learn which technique is best suited for Qlik Sense dashboards. We will also learn about the methods for linking data with each other using joins and concatenation.

Technical requirements

For this article, we will use the app created earlier in the book, with a loaded data model, as a starting point. You can find it in the book's GitHub repository, along with the initial and final versions of the application. After downloading the initial version of the application, perform the following steps:

If you are using Qlik Sense Desktop, place the app in the Qlik\Sense\Apps folder under your Documents personal folder.
If you are using Qlik Sense Cloud, upload the app to your personal workspace.

Advantages of data modeling

Data modeling helps businesses in many ways. Let's look at some of its advantages:

High-speed retrieval: Data modeling helps to get the required information much faster than expected, because the data is interlinked between the different tables using relationships.
Ease of access to data: Data modeling eases the process of giving end users the right access to the data. With a simple data query language, you can get the required data easily.
Handling multiple relationships: Different datasets relate to each other in different ways; for example, there could be one-to-one, one-to-many, or many-to-many relationships. Data modeling helps in handling these kinds of relationships easily.
Stability: Data modeling provides stability to the system.

Data modeling techniques

There are various techniques with which data models can be built, and each technique has its own advantages and disadvantages. The following are two widely used data modeling techniques.

Entity-relationship modeling

The entity-relationship modeling (ER modeling) technique uses entities and relationships to create a logical data model. This technique is best suited for Online Transaction Processing (OLTP) systems. An entity in this model refers to anything or any object in the real world that has distinguishable characteristics, while a relationship describes how two or more entities are related to each other. There are three basic types of relationship:

One-to-one: Each value from one entity has a single relation with a value from the other entity. For example, one customer is handled by one sales representative.
One-to-many: Each value from one entity has multiple relations with values from the other entity.
For example, one sales representative handles multiple customers.
Many-to-many: Values from both entities have multiple relations with each other. For example, one book can have many authors and each author can have multiple books.

Dimensional modeling

The dimensional modeling technique uses facts and dimensions to build the data model. This modeling technique was developed by Ralph Kimball. Unlike ER modeling, which uses normalization to build the model, this technique uses denormalization of data to build the model.

Facts, in this context, are tables that store the most granular transactional details. They mainly store performance measurement metrics, which are the outcome of the business process. Fact tables are huge in size because they store the transactional records. For example, if sales data is captured at a retail store, the fact table for that data stores one record per sales transaction. A fact table has the following characteristics:

It contains measures, which are mostly numeric in nature.
It stores foreign keys, which refer to the dimension tables.
It stores large numbers of records.
Mostly, it does not contain descriptive data.

The dimension table stores descriptive data, describing the who, what, which, when, how, where, and why associated with the transaction. It has the maximum number of columns, but generally far fewer records than a fact table. Dimension tables are also referred to as companions of the fact table. They store textual, and sometimes numeric, values. For example, a PIN code is numeric in nature, but it is not a measure and is thus stored in a dimension table. In the sales example that we discussed, the customer, product, time, and salesperson tables are the dimension tables. The following are the characteristics of a dimension table:

It stores descriptive data, which describes the attributes of the transaction.
It contains many columns and fewer records compared to the fact table.
It can also contain numeric data, as long as it is descriptive in nature.

There are two types of dimensional modeling techniques that are widely used:

Star schema: This schema model has one fact table that is linked with multiple dimension tables. The name star is given because, once the model is ready, it looks like a star. The advantages of the star schema model include better query performance and being simple to understand.
Snowflake schema: This schema model is similar to the star schema, but in this model the dimension tables are normalized further. The advantages of the snowflake schema model include better referential integrity and requiring less space, as the data is normalized.

When it comes to data modeling in Qlik Sense, the best option is to use the star schema model for better performance. Qlik Sense works very well when the data is loaded in a denormalized form, thus the star schema is suitable for Qlik Sense development.
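To make the fact/dimension split a little more concrete, here is a small illustrative sketch. It is not from the book and is not Qlik script; it uses pandas, and all table and column names are invented for the example. It builds a toy fact table and dimension table and runs the kind of query a star schema is designed for.

```python
import pandas as pd

# Dimension table: few rows, descriptive attributes.
product_dim = pd.DataFrame({
    "ProductID": [1, 2],
    "ProductName": ["Apple", "Carrot"],
    "Category": ["Fruits", "Vegetables"],
})

# Fact table: many rows, foreign keys plus numeric measures.
sales_fact = pd.DataFrame({
    "ProductID": [1, 1, 2],
    "CustomerID": [10, 11, 10],
    "SalesQty": [3, 1, 5],
    "SalesAmount": [6.0, 2.0, 4.5],
})

# A typical star-schema query: join the fact table to its dimension,
# then aggregate a measure by a descriptive attribute.
report = (sales_fact.merge(product_dim, on="ProductID", how="left")
                    .groupby("Category", as_index=False)["SalesAmount"].sum())
print(report)
```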
Now that we know what data modeling is and which technique is most appropriate for Qlik Sense, let's look at some other fundamentals of handling data.

Joining

While building a data model, we often encounter situations where we want to add fields from one table into another in order to do some calculation. In such situations, we join those tables based on the common fields between them. Let's understand joins with a simple example. Assume you want to calculate the selling price of a product. The information you have is SalesQty in the Sales table and UnitPrice of the product in the Product table. The calculation for the sales price is UnitPrice * SalesQty, so we join the two tables to bring those fields together.

Types of joins

There are various kinds of joins available, but let's take a look at the types of joins supported by Qlik Sense. Consider the following tables to understand each type better.

Order table: This table stores the order-related data:

OrderNumber  Product     CustomerID  OrderValue
100          Fruits      1           100
101          Fruits      2           80
102          Fruits      3           120
103          Vegetables  6           200

Customer table: This table stores the customer details, which include the CustomerID and Name:

CustomerID  Name
1           Alex
2           Linda
3           Sam
4           Michael
5           Sara

Join/outer join

When you want to get the data from both tables, you use the Join keyword. When you use only Join between two tables, it is always a full outer join; the Outer keyword is optional. Now, let's see how we script this join condition in Qlik Sense:

1. Create a new Qlik Sense application and give it a name of your choice.
2. Jump to the script editor, create a new tab, and rename it Outer Join.
3. Write the outer join script.
4. Once you have written the script, click on Load Data to run the script and load the data.
5. Once the data is loaded, create a new sheet and add the Table object to see the joined table data.

As the output of the outer join, we get five fields. You can also observe that the last two rows have null values for the fields that come from the Order table, because customers 4 and 5 are not present in that table.

Left join

When you want to extract all the records from the left table and only the matching records from the right table, you use the Left Join keyword to join the two tables. Let's see the steps for a left join:

1. In the previously created application, delete the Outer Join tab.
2. Create a new tab and rename it Left Join.
3. Write the left join script.
4. Once the script is written, click on Load Data to run the script and load the data.
5. Once the script has finished, create a new sheet and add the Table object to see the joined table data.

Right join

When you want to extract all the records from the right table and only the matching records from the left table, you use the Right Join keyword to join the two tables. Let's see the steps for a right join:

1. In the previously created application, comment out the existing script.
2. Create a new tab and rename it Right Join.
3. Write the right join script.
4. Once the script is written, click on Load Data to run the script and load the data.
Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Inner join When you want to extract matching records from both the tables, you use the Inner Join keyword to join those two tables. The following diagram shows the Venn diagram for inner join: Let's see the script for inner join: In the previous application created, comment the existing script. Create a new tab and rename it as Inner Join, as shown in the following screenshot. Write the script shown in following screenshot: Once the script is written, click on Load Data to run the script and load the data. Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Concatenation Sometimes you come across a situation while building the data model where you may have to append one table below another. In such situations, you can use the concatenate function. Concatenating, as the name suggests, helps to add the records of one table below another. Concatenate is different from joins. Unlike joins, concatenate does not merge the matching records of both the tables in a single row. Automatic concatenation When the number of columns and their naming is same in two tables, Qlik Sense, by default, concatenates those tables without any explicit command. This is called the automatic concatenation. For example, you may get the customer information from two different sources, but with the same columns names. In such a case, automatic concatenation will be done by Qlik, as is shown in the following screenshot: You can see in the preceding screenshot that both the Source1 and Source2 tables have two columns with same names (note that names in Qlik Sense are case-sensitive). Thus, they are auto concatenated. One more thing to note here is that, in such a situation, Qlik Sense ignores the name given to the second table and stores all the data under the name given to the first table. The output table after concatenation is shown in the following screenshot: Forced concatenation There will be some cases in which you would like to concatenate two tables irrespective of the number of columns and name. In such a case, you should use the keyword Concatenate between two Load statements to concatenate those two tables. This is called the forced concatenation. For example, if you have sales and budget data at similar granularity, then you should use the Concatenate keyword to forcefully concatenate both tables, as shown in the following screenshot: The output table after loading this script will have data for common columns, one below the other. For the columns that are not same, there will be null values in those columns for the table in which they didn't exist. This is shown in the following output: You can see in the preceding screenshot that the SalesAmount is null for the budget data, and Budget is null for the sales data. The NoConcatenate In some situations when even though the columns and their name from the two tables are the same, you may want to treat them differently and don’t want to concatenate them. So Qlik Sense provides the NoConcatenate keyword, which helps to prevent automatic concatenation. Let's see how to write the script for NoConcatenate: You should handle the tables properly; otherwise, the output of NoConcatenate may create a synthetic table. Filtering In this section, we will learn how to filter the data while loading in Qlik Sense. 
As you know, there are two ways in which we can load the data in Qlik Sense: either by using the Data manager or the script editor. Let's see how to filter data with each of these options. Filtering data using the Data manager When you load data using the Data manager, you get an option named Filters at the top-right corner of the window, as shown in the following screenshot: This filter option enables us to set the filtering condition, which loads only the data that satisfies the condition given. The filter option allows the following conditions: = >  >= <  <= Using the preceding conditions, you can filter the text or numeric values of a field. For example, you can set a condition such as Date >= '01/01/2012' or ProductID = 80. The following screenshot shows such conditions applied in the Data load editor: Filtering data in the script editor If you are familiar with the Load statement or the SQL Select statement, it will be easy for you to filter the data while loading it. In the script editor, the best way to restrict the data is to include the Where clause at the end of the Load or Select statement; for example, Where Date >= '01/01/2012'. When you use the Where clause with the Load statement, you can use the following conditions: = > >= <  <= When you write the Where clause with the SQL Select statement, you can use the following conditions: = >  >= <  <= In Between Like Is Null Is Not Null The following screenshot shows an example of both the statements: This article walked you through various data modeling techniques. We also saw different types of joins and how we can implement them in Qlik Sense.  Then, we learned about concatenation and the scenarios in which we should use the concatenation option. We also looked at automatic concatenation, forced concatenation, and NoConcatenation. Further, we learned about the ways in which data can be filtered while loading in Qlik Sense. If you found this post useful, do check out the book, Hands-On Business Intelligence with Qlik Sense. This book teaches you how to create dynamic dashboards to bring interactive data visualization to your enterprise using Qlik Sense. 5 ways to create a connection to the Qlik Engine [Tip] What we learned from Qlik Qonnections 2018 Why AWS is the preferred cloud platform for developers working with big data
How to manage complex applications using Kubernetes-based Helm tool [Tutorial]

Savia Lobo
16 Jul 2019
16 min read
Helm is a popular tool in the Kubernetes ecosystem that gives us a way of building packages (known as charts) of related Kubernetes objects that can be deployed in a cohesive way to a cluster. It also allows us to parameterize these packages, so they can be reused in different contexts and deployed to the varying environments that the services they provide might be needed in. This article is an excerpt taken from the book Kubernetes on AWS written by Ed Robinson. In this book, you will discover how to utilize the power of Kubernetes to manage and update your applications. In this article, you will learn how to manage complex applications using Kubernetes-based Helm tool. You will start by learning how to install Helm and later on how to configure and package Helm charts. Like Kubernetes, development of Helm is overseen by the Cloud Native Computing Foundation. As well as Helm (the package manager), the community maintains a repository of standard charts for a wide range of open source software you can install and run on your cluster. From the Jenkins CI server to MySQL or Prometheus, it's simple to install and run complex deployments involving many underlying Kubernetes resources with Helm. Installing Helm If you have already set up your own Kubernetes cluster and have correctly configured kubectl on your machine, then it is simple to install Helm. On macOS On macOS, the simplest way to install the Helm client is with Homebrew: $ brew install kubernetes-helm On Linux and Windows Every release of Helm includes prebuilt binaries for Linux, Windows, and macOS. Visit https://github.com/kubernetes/helm/releases to download the version you need for your platform. To install the client, simply unpack and copy the binary onto your path. For example, on a Linux machine you might do the following: $ tar -zxvf helm-v2.7.2-linux-amd64.tar.gz $ mv linux-amd64/helm /usr/local/bin/helm Installing Tiller Once you have the Helm CLI tool installed on your machine, you can go about installing Helm's server-side component, Tiller. Helm uses the same configuration as kubectl, so start by checking which context you will be installing Tiller onto: $ kubectl config current-context minikube Here, we will be installing Tiller into the cluster referenced by the Minikube context. In this case, this is exactly what we want. If your kubectl is not currently pointing to another cluster, you can quickly switch to the context you want to use like this: $ kubectl config use-context minikube If you are still not sure that you are using the correct context, take a quick look at the full config and check that the cluster server field is correct: $ kubectl config view --minify=true The minify flag removes any config not referenced by the current context. Once you are happy that the cluster that kubectl is connecting to is the correct one, we can set up Helm's local environment and install Tiller on to your cluster: $ helm init $HELM_HOME has been configured at /Users/edwardrobinson/.helm. Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster. Happy Helming! We can use kubectl to check that Tiller is indeed running on our cluster: $ kubectl -n kube-system get deploy -l app=helm NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE tiller-deploy 1 1 1 1 3m Once we have verified that Tiller is correctly running on the cluster, let's use the version command. 
This will validate that we are able to connect correctly to the API of the Tiller server and return the version number of both the CLI and the Tiller server: $ helm version Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} Installing a chart Let's start by installing an application by using one of the charts provided by the community. You can discover applications that the community has produced Helm charts for at https://hub.kubeapps.com/. As well as making it simple to deploy a wide range of applications to your Kubernetes cluster, it's a great resource for learning some of the best practices the community uses when packaging applications for Helm. Helm charts can be stored in a repository, so it is simple to install them by name. By default, Helm is already configured to use one remote repository called Stable. This makes it simple for us to try out some commonly used applications as soon as Helm is installed. Before you install a chart, you will need to know three things: The name of the chart you want to install The name you will give to this release (If you omit this, Helm will create a random name for this release) The namespace on the cluster you want to install the chart into (If you omit this, Helm will use the default namespace) Helm calls each distinct installation of a particular chart a release. Each release has a unique name that is used if you later want to update, upgrade, or even remove a release from your cluster. Being able to install multiple instances of a chart onto a single cluster makes Helm a little bit different from how we think about traditional package managers that are tied to a single machine, and typically only allow one installation of a particular package at once. But once you have got used to the terminology, it is very simple to understand: A chart is the package that contains all the information about how to install a particular application or tool to the cluster. You can think of it as a template that can be reused to create many different instances or releases of the packaged application or tool. A release is a named installation of a chart to a particular cluster. By referring to a release by name, Helm can make upgrades to a particular release, updating the version of the installed tool, or making configuration changes. A repository is an HTTP server storing charts along with an index file. When configured with the location of a repository, the Helm client can install a chart from that repository by downloading it and then making a new release. Before you can install a chart onto your cluster, you need to make sure that Helm knows about the repository that you want to use. You can list the repositories that are currently in use by running the helm repo list command: $ helm repo list NAME URL stable https://kubernetes-charts.storage.googleapis.com local http://127.0.0.1:8879/charts By default, Helm is configured with a repository named stable pointing at the community chart repository and local repository that points at a local address for testing your own local repository. (You need to be running helm serve for this.) Adding a Helm repository to this list is simple with the helm repo add command. 
You can add my Helm repository that contains some example applications related to this book by running the following command: $ helm repo add errm https://charts.errm.co.uk "errm" has been added to your repositories In order to pull the latest chart information from the configured repositories, you can run the following command: $ helm repo update Hang tight while we grab the latest from your chart repositories... ...Skip local chart repository ...Successfully got an update from the "errm" chart repository ...Successfully got an update from the "stable" chart repository Update Complete. Happy Helming! Let's start with one of the simplest applications available in my Helm repository, kubeslate. This provides some very basic information about your cluster, such as the version of Kubernetes you are running and the number of pods, deployments, and services in your cluster. We are going to start with this application, since it is very simple and doesn't require any special configuration to run on Minikube, or indeed any other cluster. Installing a chart from a repository on your cluster couldn't be simpler: $ helm install --name=my-slate errm/kubeslate You should see a lot of output from the helm command. Firstly, you will see some metadata about the release, such as its name, status, and namespace: NAME: my-slate LAST DEPLOYED: Mon Mar 26 21:55:39 2018 NAMESPACE: default STATUS: DEPLOYED Next, you should see some information about the resources that Helm has instructed Kubernetes to create on the cluster. As you can see, a single service and a single deployment have been created: RESOURCES: ==> v1/Service NAME TYPE CLUSTER-IP PORT(S) AGE my-slate-kubeslate ClusterIP 10.100.209.48 80/TCP 0s ==> v1/Deployment NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE my-slate-kubeslate 2 0 0 0 0s ==> v1/Pod(related) NAME READY STATUS AGE my-slate-kubeslate-77bd7479cf-gckf8 0/1 ContainerCreating 0s my-slate-kubeslate-77bd7479cf-vvlnz 0/1 ContainerCreating 0s Finally, there is a section with some notes that have been provided by the chart's author to give us some information about how to start using the application: Notes: To access kubeslate. First start the kubectl proxy: kubectl proxy Now open the following URL in your browser: http://localhost:8001/api/v1/namespaces/default/services/my-slate-kubeslate:http/proxy Please try reloading the page if you see ServiceUnavailable / no endpoints available for service, as pod creation might take a few moments. Try following these instructions yourself and open Kubeslate in your browser: Kubeslate deployed with Helm Configuring a chart When you use Helm to make a release of a chart, there are certain attributes that you might need to change or configuration you might need to provide. Luckily, Helm provides a standard way for users of a chart to override some or all of the configuration values. In this section, we are going to look at how, as the user of a chart, you might go about supplying configuration to Helm. Later in the chapter, we are going to look at how you can create your own charts and use the configuration passed in to allow your chart to be customized. When we invoke helm install, there are two ways we can provide configuration values: passing them as command-line arguments, or by providing a configuration file. These configuration values are merged with the default values provided by a chart. 
This allows a chart author to provide a default configuration so that users can get up and running quickly, while still letting them tweak important settings or enable advanced features.

Providing a single value to Helm on the command line is achieved by using the set flag. The kubeslate chart allows us to specify additional labels for the pod(s) that it launches using the podLabels variable. Let's make a new release of the kubeslate chart, and then use the podLabels variable to add an additional hello label with the value world:

$ helm install --name labeled-slate --set podLabels.hello=world errm/kubeslate

Once you have run this command, you should be able to prove that the extra variable you passed to Helm did indeed result in the pods launched by Helm having the correct label. Using the kubectl get pods command with a label selector for the label we applied using Helm should return the pods that have just been launched with Helm:

$ kubectl get pods -l hello=world
NAME                                      READY  STATUS
labeled-slate-kubeslate-5b75b58cb-7jpfk   1/1    Running
labeled-slate-kubeslate-5b75b58cb-hcpgj   1/1    Running

As well as being able to pass a configuration to Helm when we create a new release, it is also possible to update the configuration of a pre-existing release using the upgrade command. When we use Helm to update a configuration, the process is much the same as when we updated deployment resources in the last chapter, and a lot of those considerations still apply if we want to avoid downtime in our services. For example, by launching multiple replicas of a service, we can avoid downtime as a new version of a deployment configuration is rolled out.

Let's also upgrade our original kubeslate release to include the same hello: world pod label that we applied to the second release. As you can see, the structure of the upgrade command is quite similar to the install command. But rather than specifying the name of the release with the --name flag, we pass it as the first argument. This is because when we install a chart to the cluster, the name of the release is optional. If we omit it, Helm will create a random name for the release. However, when performing an upgrade, we need to target a pre-existing release to upgrade, and thus this argument is mandatory:

$ helm upgrade my-slate --set podLabels.hello=world errm/kubeslate

If you now run helm ls, you should see that the release named my-slate has been upgraded to Revision 2. You can test that the deployment managed by this release has been upgraded to include this pod label by repeating our kubectl get command:

$ kubectl get pods -l hello=world
NAME                                      READY  STATUS
labeled-slate-kubeslate-5b75b58cb-7jpfk   1/1    Running
labeled-slate-kubeslate-5b75b58cb-hcpgj   1/1    Running
my-slate-kubeslate-5c8c4bc77-4g4l4        1/1    Running
my-slate-kubeslate-5c8c4bc77-7pdtf        1/1    Running

We can now see that four pods, two from each of our releases, match the label selector we passed to kubectl get.

Passing variables on the command line with the set flag is convenient when we just want to provide values for a few variables. But when we want to pass more complex configurations, it can be simpler to provide the values as a file. Let's prepare a configuration file, values.yml, to apply several labels to our kubeslate pods:

podLabels:
  hello: world
  access: internal
  users: admin

We can then use the helm command to apply this configuration file to our release:

$ helm upgrade labeled-slate -f values.yml errm/kubeslate

To learn how to create your own charts, head over to our book.
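Before we move on to packaging, it helps to see why a value such as podLabels ends up on the pods at all. The following is a hypothetical sketch of how a chart like kubeslate might be written; it is not the actual chart source, and the image name is a placeholder:

```yaml
# values.yaml: defaults that are merged with anything passed via --set or -f
podLabels: {}

# templates/deployment.yaml (fragment): each entry of .Values.podLabels is
# rendered onto the pod template's labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-kubeslate
spec:
  replicas: 2
  selector:
    matchLabels:
      app: {{ .Release.Name }}-kubeslate
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-kubeslate
        {{- range $key, $value := .Values.podLabels }}
        {{ $key }}: {{ $value | quote }}
        {{- end }}
    spec:
      containers:
        - name: kubeslate
          image: example/kubeslate:latest  # placeholder image
```

With a template along these lines, helm install --set podLabels.hello=world (or the values.yml file above) renders hello: "world" into the pod labels, which is why the kubectl get pods -l hello=world selector matched the pods of our releases.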
Packaging Helm charts

While we are developing our chart, it is simple to use the Helm CLI to deploy it straight from the local filesystem. However, Helm also allows you to create your own repository in order to share your charts. A Helm repository is a collection of packaged Helm charts, plus an index, stored in a particular directory structure on a standard HTTP web server.

Once you are happy with your chart, you will want to package it so it is ready to distribute in a Helm repository. This is simple to do with the helm package command. When you start to distribute your charts with a repository, versioning becomes important. The version number of a chart in a Helm repository needs to follow the SemVer 2 guidelines.

In order to build a packaged chart, start by checking that you have set an appropriate version number in Chart.yaml. If this is the first time you have packaged your chart, the default will be OK:

$ helm package version-app
Successfully packaged chart and saved it to: ~/helm-charts/version-app-0.1.0.tgz

You can test a packaged chart without uploading it to a repository by using the helm serve command. This command will serve all of the packaged charts found in the current directory and generate an index on the fly:

$ helm serve
Regenerating index. This may take a moment.
Now serving you on 127.0.0.1:8879

You can now try installing your chart by using the local repository:

$ helm install local/version-app

You can use this to test your chart before publishing it to a shared repository.

Building an index

A Helm repository is just a collection of packaged charts stored in a directory. In order to discover and search the charts and versions available in a particular repository, the Helm client downloads a special index.yaml that includes metadata about each packaged chart and the location it can be downloaded from.

In order to generate this index file, we need to copy all the packaged charts that we want in our index to the same directory:

cp ~/helm-charts/version-app-0.1.0.tgz ~/helm-repo/

Then, in order to generate the index.yaml file, we use the helm repo index command. You will need to pass the root URL where the packaged charts will be served from. This could be the address of a web server or, on AWS, an S3 bucket:

helm repo index ~/helm-repo --url https://helm-repo.example.org

The chart index is quite a simple format, listing the name of each chart available and then providing a list of the versions available for each named chart. The index also includes a checksum in order to validate the download of charts from the repository:

apiVersion: v1
entries:
  version-app:
  - apiVersion: v1
    created: 2018-01-10T19:28:27.802896842Z
    description: A Helm chart for Kubernetes
    digest: 79aee8b48cab65f0d3693b98ae8234fe889b22815db87861e590276a657912c1
    name: version-app
    urls:
    - https://helm-repo.example.org/version-app-0.1.0.tgz
    version: 0.1.0
generated: 2018-01-10T19:28:27.802428278Z

This is the generated index.yaml file for our new chart repository. Once we have created the index.yaml file, it is simply a question of copying your packaged charts and the index file to the host you have chosen to use. If you are using S3, this might look like this:

aws s3 sync ~/helm-repo s3://my-helm-repo-bucket

In order for Helm to be able to use your repository, your web server (or S3) needs to be correctly configured. The web server needs to serve the index.yaml file with the correct content type header (text/yaml or text/x-yaml), and the charts need to be available at the URLs listed in the index.
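Putting the packaging, indexing, and upload steps together, a small publish script for an S3-hosted repository might look like the following sketch. The chart path, bucket name, and repository URL are placeholders; adjust them for your own setup:

```bash
#!/usr/bin/env bash
# Package a chart, rebuild the repository index, and upload both to S3.
set -euo pipefail

CHART_DIR=./version-app
REPO_DIR="$HOME/helm-repo"
REPO_URL=https://helm-repo.example.org
BUCKET=s3://my-helm-repo-bucket

# 1. Package the chart; the version comes from Chart.yaml
helm package "$CHART_DIR" --destination "$REPO_DIR"

# 2. Regenerate index.yaml; --merge preserves entries for charts published earlier
if [ -f "$REPO_DIR/index.yaml" ]; then
  helm repo index "$REPO_DIR" --url "$REPO_URL" --merge "$REPO_DIR/index.yaml"
else
  helm repo index "$REPO_DIR" --url "$REPO_URL"
fi

# 3. Upload the packaged charts and the index
aws s3 sync "$REPO_DIR" "$BUCKET"

# 4. Re-upload index.yaml with an explicit content type, which Helm expects
aws s3 cp "$REPO_DIR/index.yaml" "$BUCKET/index.yaml" --content-type "text/yaml"
```

The --merge flag keeps the index entries for versions you have already published, so the index grows with each release instead of being rebuilt only from the packages currently on disk.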
Using your repository Once you have set up the repository, you can configure Helm to use it: helm repo add my-repo https://helm-repo.example.org my-repo has been added to your repositories When you add a repository, Helm validates that it can indeed connect to the URL given and download the index file. You can check this by searching for your chart by using helm search: $ helm search version-app NAME VERSION DESCRIPTION my-repo/version-app 0.1.1 A Helm chart for Kubernetes Thus, in this article you learned how to install Helm, configuring and packaging Helm charts.  It can be used for a wide range of scenarios where you want to deploy resources to a Kubernetes cluster, from providing a simple way for others to install an application you have written on their own clusters, to forming the cornerstone of an internal Platform as a Service within a larger organization. To know more about how to configure your own charts using Helm and to know the organizational patterns for Helm, head over to our book, Kubernetes on AWS. Elastic launches Helm Charts (alpha) for faster deployment of Elasticsearch and Kibana to Kubernetes Introducing ‘Quarkus’, a Kubernetes native Java framework for GraalVM & OpenJDK HotSpot Pivotal and Heroku team up to create Cloud Native Buildpacks for Kubernetes
Linux kernel announces a patch to allow 0.0.0.0/8 as a valid address range

Savia Lobo
15 Jul 2019
6 min read
Last month, the team behind Linux kernel announced a patch that allows 0.0.0.0/8 as a valid address range. This patch allows for these 16m new IPv4 addresses to appear within a box or on the wire. The aim is to use this 0/8 as a global unicast as this address was never used except the 0.0.0.0. In a post written by Dave Taht, Director of the Make-Wifi-Fast, and committed by David Stephen Miller, an American software developer working on the Linux kernel mentions that the use of 0.0.0.0/8 has been prohibited since the early internet due to two issues. First, an interoperability problem with BSD 4.2 in 1984, which was fixed in BSD 4.3 in 1986. “BSD 4.2 has long since been retired”, the post mentions. The second issue is that addresses of the form 0.x.y.z were initially defined only as a source address in an ICMP datagram, indicating "node number x.y.z on this IPv4 network", by nodes that know their address on their local network, but do not yet know their network prefix, in RFC0792 (page 19). The use of 0.x.y.z was later repealed in RFC1122 because the original ICMP-based mechanism for learning the network prefix was unworkable on many networks such as Ethernet. This is because these networks have longer addresses that would not fit into the 24 "node number" bits. Modern networks use reverse ARP (RFC0903) or BOOTP (RFC0951) or DHCP (RFC2131) to find their full 32-bit address and CIDR netmask (and other parameters such as default gateways). 0.x.y.z has had 16,777,215 addresses in 0.0.0.0/8 space left unused and reserved for future use, since 1989. The whole discussion of using allowing these IP address and making them available started early this year at the NetDevConf 2019, The Technical Conference on Linux Networking. The conference took place in Prague, Czech Republic, from March 20th to 22nd, 2019. One of the sessions, “Potential IPv4 Unicast Expansions”, conducted by  Dave Taht, along with John Gilmore, and Paul Wouters explains how IPv4 success story was in carrying unicast packets worldwide. The speakers say, service sites still need IPv4 addresses for everything, since the majority of Internet client nodes don't yet have IPv6 addresses. IPv4 addresses now cost 15 to 20 dollars apiece (times the size of your network!) and the price is rising. In their keynote, they described, the IPv4 address space includes hundreds of millions of addresses reserved for obscure (the ranges 0/8, and 127/16), or obsolete (225/8-231/8) reasons, or for "future use" (240/4 - otherwise known as class E). They highlighted the fact: “instead of leaving these IP addresses unused, we have started an effort to make them usable, generally. This work stalled out 10 years ago, because IPv6 was going to be universally deployed by now, and reliance on IPv4 was expected to be much lower than it in fact still is”. “We have been reporting bugs and sending patches to various vendors. For Linux, we have patches accepted in the kernel and patches pending for the distributions, routing daemons, and userland tools. Slowly but surely, we are decontaminating these IP addresses so they can be used in the near future. Many routers already handle many of these addresses, or can easily be configured to do so, and so we are working to expand unicast treatment of these addresses in routers and other OSes”, they further mentioned. 
They said they wanted to carry out an “authorized experiment to route some of these addresses globally, monitor their reachability from different parts of the Internet, and talk to ISPs who are not yet treating them as unicast to update their networks”. Here’s the patch code for 0.0.0.0/8 for Linux: Users have a mixed reaction to this announcement and assumed that these addresses would be unassigned forever. A few are of the opinion that for most business, IPv6 is an unnecessary headache. A user explained the difference between the address ranges in a reply to Jeremy Stretch’s (a network engineer) post, “0.0.0.0/8 - Addresses in this block refer to source hosts on "this" network. Address 0.0.0.0/32 may be used as a source address for this host on this network; other addresses within 0.0.0.0/8 may be used to refer to specified hosts on this network [RFC1700, page 4].” A user on Reddit writes, this announcement will probably get “the same reaction when 1.1.1.1 and 1.0.0.1 became available, and AT&T blocked it 'by accident' or most equipment vendors or major ISP will use 0.0.0.0/8 as a loopback interface or test interface because they never thought it would be assigned to anyone.” Another user on Elegant treader writes, “I could actually see us successfully inventing, and implementing, a multiverse concept for ipv4 to make these 32 bit addresses last another 40 years, as opposed to throwing these non-upgradable, hardcoded v4 devices out”. Another writes, if they would have “taken IPv4 and added more bits - we might all be using IPv6 now”. The user further mentions, “Instead they used the opportunity to cram every feature but the kitchen sink in there, so none of the hardware vendors were interested in implementing it and the backbones were slow to adopt it. So we got mass adoption of NAT instead of mass adoption of IPv6”. A user explains, “A single /8 isn’t going to meaningfully impact the exhaustion issues IPv4 faces. I believe it was APNIC a couple of years ago who said they were already facing allocation requests equivalent to an /8 a month”. “It’s part of the reason hand-wringing over some of the “wasteful” /8s that were handed out to organizations in the early days is largely pointless. Even if you could get those orgs to consolidate and give back large useable ranges in those blocks, there’s simply not enough there to meaningfully change the long term mismatch between demand and supply”, the user further adds. To know about these developments in detail, watch Dave Taht’s keynote video on YouTube: https://www.youtube.com/watch?v=92aNK3ftz6M&feature=youtu.be An attack on SKS Keyserver Network, a write-only program, poisons two high-profile OpenPGP certificates Former npm CTO introduces Entropic, a federated package registry with a new CLI and much more! Amazon adds UDP load balancing support for Network Load Balancer
Amazon’s partnership with NHS to make Alexa offer medical advice raises privacy concerns and public backlash

Bhagyashree R
12 Jul 2019
6 min read
Virtual assistants like Alexa and smart speakers are being increasingly used in today’s time because of the convenience they come packaged with. It is good to have someone play a song or restock your groceries just on your one command, or probably more than one command. You get the point! But, how comfortable will you be if these assistants can provide you some medical advice? Amazon has teamed up with UK’s National Health Service (NHS) to make Alexa your new medical consultant. The voice-enabled digital assistant will now answer your health-related queries by looking through the NHS website vetted by professional doctors. https://twitter.com/NHSX/status/1148890337504583680 The NHSX initiative to drive digital innovation in healthcare Voice search definitely gives us the most “humanized” way of finding information from the web. One of the striking advantages of voice-enabled digital assistants is that the elderly, the blind and those who are unable to access the internet in other ways can also benefit from them. UK’s health secretary, Matt Hancock, believes that “embracing” such technologies will not only reduce the pressure General Practitioners (GPs) and pharmacists face but will also encourage people to take better control of their health care. He adds, "We want to empower every patient to take better control of their healthcare." Partnering with Amazon is just one of many steps by NHS to adopt technology for healthcare. The NHS launched a full-fledged unit named NHSX (where X stands for User Experience) last week. Its mission is to provide staff and citizens “the technology they need” with an annual investment of more than $1 billion a year. This partnership was announced last year and NHS plans to partner with other companies such as Microsoft in the future to achieve its goal of “modernizing health services.” Can we consider Alexa’s advice safe Voice assistants are very fun and convenient to use, but only when they are actually working. Many a time it happens that the assistant fails to understand something and we have to yell the command again and again, which makes the experience outright frustrating. Furthermore, the track record of consulting the web to diagnose our symptoms has not been the most accurate one. Many Twitter users trolled this decision saying that Alexa is not yet capable of doing simple tasks like playing a song accurately and the NHS budget could have been instead used on additional NHS staff, lowering drug prices, and many other facilities. The public was also left sore because the government has given Amazon a new means to make a profit, instead of forcing them to pay taxes. Others also talked about the times when Google (mis)-diagnosed their symptoms. https://twitter.com/NHSMillion/status/1148883285952610304 https://twitter.com/doctor_oxford/status/1148857265946079232 https://twitter.com/TechnicallyRon/status/1148862592254906370 https://twitter.com/withorpe/status/1148886063290540032 AI ethicists and experts raise data privacy issues Amazon has been involved in several controversies around privacy concerns regarding Alexa. Earlier this month, it admitted that a few voice recordings made by Alexa are never deleted from the company's server, even when the user manually deletes them. Another news in April this year revealed that when you speak to an Echo smart speaker, not only does Alexa but potentially Amazon employees also listen to your requests. 
Last month, two lawsuits were filed in Seattle stating that Amazon is recording voiceprints of children using its Alexa devices without their consent. Last year, an Amazon Echo user in Portland, Oregon was shocked when she learned that her Echo device recorded a conversation with her husband and sent the audio file to one of his employees in Seattle. Amazon confirmed that this was an error because of which the device’s microphone misheard a series of words. Another creepy, yet funny incident was when Alexa users started hearing an unprompted laugh from their smart speaker devices. Alexa laughed randomly when the device was not even being used. https://twitter.com/CaptHandlebar/status/966838302224666624 Big tech including Amazon, Google, and Facebook constantly try to reassure their users that their data is safe and they have appropriate privacy measures in place. But, these promises are hard to believe when there is so many news of data breaches involving these companies. Last year, a German computer magazine c’t reported that a user received 1,700 Alexa voice recordings from Amazon when he asked for copies of the personal data Amazon has about him. Many experts also raised their concerns about using Alexa for giving medical advice. A Berlin-based tech expert Manthana Stender calls this move a “corporate capture of public institutions”. https://twitter.com/StenderWorld/status/1148893625914404864 Dr. David Wrigley, a British medical doctor who works as a general practitioner also asked how the voice recordings of people asking for health advice will be handled. https://twitter.com/DavidGWrigley/status/1148884541144219648 Director of Big Brother Watch, Silkie Carlo told BBC,  "Any public money spent on this awful plan rather than frontline services would be a breathtaking waste. Healthcare is made inaccessible when trust and privacy is stripped away, and that's what this terrible plan would do. It's a data protection disaster waiting to happen." Prof Helen Stokes-Lampard, of the Royal College of GPs, believes that the move has "potential", especially for minor ailments. She added that it is important individuals do independent research to ensure the advice given is safe or it could "prevent people from seeking proper medical help and create even more pressure". She further said that not everyone is comfortable using such technology or could afford it. Amazon promises that the data will be kept confidential and will not be used to build a profile on customers. A spokesman shared with The Times, "All data was encrypted and kept confidential. Customers are in control of their voice history and can review or delete recordings." Amazon is being sued for recording children’s voices through Alexa without consent Amazon Alexa is HIPAA-compliant: bigger leap in the health care sector Amazon is supporting research into conversational AI with Alexa fellowships
Best practices for RESTful web services: Naming conventions and API Versioning [Tutorial]

Sugandha Lahoti
12 Jul 2019
12 min read
This article covers two important best practices for REST and RESTful APIs: Naming conventions and API Versioning. This article is taken from the book Hands-On RESTful Web Services with TypeScript 3 by Biharck Muniz Araújo. This book will guide you in designing and developing RESTful web services with the power of TypeScript 3 and Node.js. What are naming conventions One of the keys to achieving a good RESTful design is naming the HTTP verbs appropriately. It is really important to create understandable resources that allow people to easily discover and use your services. A good resource name implies that the resource is intuitive and clear to use. On the other hand, the usage of HTTP methods that are incompatible with REST patterns creates noise and makes the developer's life harder. In this section, there will be some suggestions for creating clear and good resource URIs. It is good practice to expose resources as nouns instead of verbs. Essentially, a resource represents a thing, and that is the reason you should use nouns. Verbs refer to actions, which are used to factor HTTP actions. Three words that describe good resource naming conventions are as follows: Understandability: The resource's representation format should be understandable and utilizable by both the server and the client Completeness: A resource should be completely represented by the format Linkability: A resource can be linked to another resource Some example resources are as follows: Users of a system Blogs posts An article Disciplines in which a student is enrolled Students in which a professor teaches A blog post draft Each resource that's exposed by any service in a best-case scenario should be exposed by a unique URI that identifies it. It is quite common to see the same resource being exposed by more than one URI, which is definitely not good. It is also good practice to do this when the URI makes sense and describes the resource itself clearly. URIs need to be predictable, which means that they have to be consistent in terms of data structure. In general, this is not a REST required rule, but it enhances the service and/or the API. A good way to write good RESTful APIs is by writing them while having your consumers in mind. There is no reason to write an API and name it while thinking about the APIs developers rather than its consumers, who will be the people who are actually consuming your resources and API (as the name suggests). Even though the resource now has a good name, which means that it is easier to understand, it is still difficult to understand its boundaries. Imagine that services are not well named; bad naming creates a lot of chaos, such as business rule duplications, bad API usage, and so on. In addition to this, we will explain naming conventions based on a hypothetical scenario. Let's imagine that there is a company that manages orders, offers, products, items, customers, and so on. Considering everything that we've said about resources, if we decided to expose a customer resource and we want to insert a new customer, the URI might be as follows: POST https://<HOST>/customers The hypothetical request body might be as follows: { "fist-name" : "john", "last-name" : "doe", "e-mail" : "john.doe@email.com" } Imagine that the previous request will result in a customer ID of 445839 when it needs to recover the customer. 
The GET method could be called as follows: GET https://<HOST>/customers/445839 The response will look something like this: sample body response for customer #445839: { "customer-id": 445839, "fist-name" : "john", "last-name" : "doe", "e-mail" : "john.doe@email.com" } The same URI can be used for the PUT and DELETE operations, respectively: PUT https://<HOST>/customers/445839 The PUT body request might be as follows: { "last-name" : "lennon" } For the DELETE operation, the HTTP request to the URI will be as follows: DELETE https://<HOST>/customers/445839 Moving on, based on the naming conventions, the product URI might be as follows: POST https://<HOST>/products sample body request: { "name" : "notebook", "description" : "and fruit brand" } GET https://<HOST>/products/9384 PUT https://<HOST>/products/9384 sample body request: { "name" : "desktop" } DELETE https://<HOST>/products/9384 Now, the next step is to expose the URI for order creation. Before we continue, we should go over the various ways to expose the URI. The first option is to do the following: POST https://<HOST>/orders However, this could be outside the context of the desired customer. The order exists without a customer, which is quite odd. The second option is to expose the order inside a customer, like so: POST https://<HOST>/customers/445839/orders Based on that model, all orders belong to user 445839. If we want to retrieve those orders, we can make a GET request, like so: GET https://<HOST>/customers/445839/orders As we mentioned previously, it is also possible to write hierarchical concepts when there is a relationship between resources or entities. Following the same idea of orders, how should we represent the URI to describe items within an order and an order that belongs to user 445839? First, if we would like to get a specific order, such as order 7384, we can do that like so: GET https://<HOST>/customers/445839/orders/7384 Following the same approach, to get the items, we could use the following code: GET https://<HOST>/customers/445839/orders/7384/items The same concept applies to the create process, where the URI is still the same, but the HTTP method is POST instead of GET. In this scenario, the body also has to be sent: POST https://<HOST>/customers/445839/orders/7384 { "id" : 7834, "quantity" : 10 } Now, you should have a good idea of what the GET operation offers in regard to orders. The same approach can also be applied so that you can go deeper and get a specific item from a specific order and from a specific user: GET https://<HOST>/customers/445839/orders/7384/items/1 Of course, this hierarchy applies to the PUT, PATCH, and POST methods, and in some cases, the DELETE method as well. It will depend on your business rules; for example, can the item be deleted? Can I update an order? What is API versioning As APIs are being developed, gathering more business rules for their context on a day-to-day basis, generating tech debits and maturing, there often comes a point where teams need to release breaking functionality. It is also a challenge to keep their existing consumers working perfectly. One way to keep them working is by versioning APIs. Breaking changes can get messy. When something changes abruptly, it often generates issues for consumers, as this usually isn't planned and directly affects the ability to deliver new business experiences. There is a variant that says that APIs should be versionless. 
This means that building APIs that won't change their contract forces every change to be viewed through the lens of backward compatibility. This drives us to create better API interfaces, not only to solve any current issues, but to allow us to build APIs based on foundational capabilities or business capabilities themselves. Here are a few tips that should help you out: Put yourself in the consumer's shoes: When it comes to product perspective, it is suggested that you think from the consumer's point of view when building APIs. Most breaking changes happen because developers build APIs without considering the consumers, which means that they are building something for themselves and not for the real users' needs. Contract-first design: The API interface has to be treated as a formal contract, which is harder to change and more important than the coding behind it. The key to API design success is understanding the consumer's needs and the business associated with it to create a reliable contract. This is essentially a good, productive conversation between the consumers and the producers. Requires tolerant readers: It is quite common to add new fields to a contract with time. Based on what we have learned so far, this could generate a breaking change. This sometimes occurs because, unfortunately, many consumers utilize a deserializer strategy, which is strict by default. This means that, in general, the plugin that's used to deserialize throws exceptions on fields that have never been seen before. It is not recommended to version APIs, but only because you need to add a new optional field to the contract. However, in the same way, we don't want to break changes on the client side. Some good advice is documenting any changes, stating that new fields might be added so that the consumers aren't surprised by any new changes. Add an object wrapper: This sounds obvious, but when teams release APIs without object wrappers, the APIs turn on hard APIs, which means that they are near impossible to evolve without having to make breaking changes. For instance, let's say your team has delivered an API based on JSON that returns a raw JSON array. So far, so good. However, as they continue, they find out that they have to deal with paging, or have to internationalize the service or any other context change. There is no way of making changes without breaking something because the return is based on raw JSON. Always plan to version: Don't think you have built the best turbo API in the world ever. APIs are built with a final date, even though you don't know it yet. It's always a good plan to build APIs while taking versioning into consideration. Including the version in the URL Including the version in the URL is an easy strategy for having the version number added at the end of the URI. Let's see how this is done: https://api.domain.com/v1/ https://api.domain.com/v2/ https://api.domain.com/v3/ Basically, this model tells the consumers which API version they are using. Every breaking change increases the version number. One issue that may occur when the URI for a resource changes is that the resource may no longer be found with the old URI unless redirects are used. Versioning in the subdomain In regard to versioning in the URL, subdomain versioning puts the version within the URI but associated with the domain, like so: https://v1.api.domain.com/ https://v2.api.domain.com/ https://v3.api.domain.com/ This is quite similar to versioning at the end of the URI. 
One of the advantages of using a subdomain strategy is that your API can be hosted on different servers. Versioning on media types Another approach to versioning is using MIME types to include the API version. In short, API producers register these MIME types on their backend and then the consumers need to include accept and content-type headers. The following code lets you use an additional header: GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json Version: 1 GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json Version: 2 GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json Version: 3 The following code lets you use an additional field in the accept/content-type header: GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json; version=1 GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json; version=2 GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/json; version=3 The following code lets you use a Media type: GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/vnd.<host>.orders.v1+json GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/vnd.<host>.orders.v2+json GET https://<HOST>/orders/1325 HTTP/1.1 Accept: application/vnd.<host>.orders.v3+json Recommendation When using a RESTful service, it is highly recommended that you use header-based versioning. However, the recommendation is to keep the version in the URL. This strategy allows the consumers to open the API in a browser, send it in an email, bookmark it, share it more easily, and so on. This format also enables human log readability. There are also a few more recommendations regarding API versioning: Use only the major version: API consumers should only care about breaking changes. Use a version number: Keep things clear; numbering the API incrementally allows the consumer to track evolvability. Versioning APIs using timestamps or any other format only creates confusion in the consumer's mind. This also exposes more information about versioning than is necessary. Require that the version has to be passed: Even though this is more convenient from the API producer's perspective, starting with a version is a good strategy because the consumers will know that the API version might change and they will be prepared for that. Document your API time-to-live policy: Good documentation is a good path to follow. Keeping everything well-described will mean that consumers avoid finding out that there is no Version 1 available anymore because it has been deprecated. Policies allow consumers to be prepared for issues such as depreciation. In this article, we learned about best practices related to RESTful web services such naming conventions, and API versioning formats. Next, to look at how to design RESTful web services with OpenAPI and Swagger, focusing on the core principles while creating web services, read our book Hands-On RESTful Web Services with TypeScript 3. 7 reasons to choose GraphQL APIs over REST for building your APIs Which Python framework is best for building RESTful APIs? Django or Flask? Understanding advanced patterns in RESTful API [Tutorial]
Defining REST and its various architectural styles

Sugandha Lahoti
11 Jul 2019
9 min read
RESTful web services are services built according to REST principles. The idea is to have them designed to essentially work well on the web. But, what is REST? Let's start from the beginning by defining REST. This article is taken from the book Hands-On RESTful Web Services with TypeScript 3 by Biharck Muniz Araújo. This book is a  step-by-step guide that will help you design, develop, scale, and deploy RESTful APIs with TypeScript 3 and Node.js. In this article we will learn what is REST and talk about various REST architectural styles. What is REST? The REST (Representational State Transfer) style is a set of software engineering practices that contains constraints that should be used in order to create web services in distributed hypermedia systems. REST is not a tool and neither is it a language; in fact, REST is agnostic of protocols, components, and languages. It is important to say that REST is an architectural style and not a toolkit. REST provides a set of design rules in order to create stateless services that are shown as resources and, in some cases, sources of specific information such as data and functionality. The identification of each resource is performed by its unique Uniform Resource Identifier (URI). REST describes simple interfaces that transmit data over a standardized interface such as HTTP and HTTPS without any additional messaging layer, such as Simple Object Access Protocol (SOAP). The consumer will access REST resources via a URI using HTTP methods (this will be explained in more detail later). After the request, it is expected that a representation of the requested resource is returned. The representation of any resource is, in general, a document that reflects the current or intended state of the requested resource. REST architectural styles The REST architectural style describes six constraints. These constraints were originally described by Roy Fielding in his Ph.D. thesis. They include the following: Uniform interface Stateless Cacheable Client-server architecture A layered system Code on demand (optional) We will discuss them all minutely in the following subsections. Uniform interface Uniform interface is a constraint that describes a contract between clients and servers. One of the reasons to create an interface between them is to allow each part to evolve regardless of each other. Once there is a contract aligned with the client and server parts, they can start their works independently because, at the end of the day, the way that they will communicate is firmly based on the interface: The uniform interface is divided into four main groups, called principles: Resource-based The manipulation of resources using representations Self-descriptive messages Hypermedia as the Engine of Application State (HATEOAS) Let's talk more about them. Resource-based One of the key things when a resource is being modeled is the URI definition. The URI is what defines a resource as unique. This representation is what will be returned for clients. If you decided to perform GET to the offer URI, the resource that returns should be a resource representing an order containing the ID order, creation date, and so on. The representation should be in JSON or XML. Here is a JSON example: { id : 1234, creation-date : "1937-01-01T12:00:27.87+00:20", any-other-json-fields... 
} Here is an XML example: <order> <id>1234</id> <creation-date>1937-01-01T12:00:27.87+00:20</creation-date> any-other-xml-fields </order> The manipulation of resources using representations Following the happy path, when the client makes a request to the server, the server responds with a resource that represents the current state of its resource. This resource can be manipulated by the client. The client can request what kind it desires for the representation such as JSON, XML, or plain text. When the client needs to specify the representation, the HTTP Accept header is used. Here you can see an example in plain text: GET https://<HOST>/orders/12345 Accept: text/plain The next one is in JSON format: GET https://<HOST>/orders/12345 Accept: application/json Self-descriptive messages In general, the information provided by the RESTful service contains all the information about the resource that the client should be aware of. There is also a possibility of including more information than the resource itself. This information can be included as a link. In HTTP, it is used as the content-type header and the agreement needs to be bilateral—that is, the requestor needs to state the media type that it's waiting for and the receiver must agree about what the media type refers to. Some examples of media types are listed in the following table: Extension Document Type MIME type .aac AAC audio file audio/aac .arc Archive document application/octet-stream .avi Audio Video Interleave (AVI) video/x-msvideo .css Cascading Style Sheets (CSS) text/css .csv Comma-separated values (CSV) text/csv .doc Microsoft Word application/msword .epub Electronic publication (EPUB) application/epub+zip .gif Graphics Interchange Format (GIF) image/gif .html HyperText Markup Language (HTML) text/html .ico Icon format image/x-icon .ics iCalendar format text/calendar .jar Java Archive (JAR) application/java-archive .jpeg JPEG images image/jpeg .js JavaScript (ECMAScript) application/javascript .json JSON format application/json .mpeg MPEG video video/mpeg .mpkg Apple Installer Package application/vnd.apple.installer+xml .odt OpenDocument text document application/vnd.oasis.opendocument.text .oga OGG audio audio/ogg .ogv OGG video video/ogg .ogx OGG application/ogg .otf OpenType font font/otf .png Portable Network Graphics image/png .pdf Adobe Portable Document Format (PDF) application/pdf .ppt Microsoft PowerPoint application/vnd.ms-powerpoint .rar RAR archive application/x-rar-compressed .rtf Rich Text Format (RTF) application/rtf .sh Bourne shell script application/x-sh .svg Scalable Vector Graphics (SVG) image/svg+xml .tar Tape Archive (TAR) application/x-tar .ts TypeScript file application/typescript .ttf TrueType Font font/ttf .vsd Microsoft Visio application/vnd.visio .wav Waveform Audio Format audio/x-wav .zip ZIP archive application/zip .7z 7-zip archive application/x-7z-compressed There is also a possibility of creating custom media types. A complete list can be found here. HATEOAS HATEOAS is a way that the client can interact with the response by navigating within it through the hierarchy in order to get complementary information. 
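As a quick illustration of how a service can honour these media types, here is a minimal Express handler written in TypeScript. The route, fields, and port are illustrative only and are not taken from the book's example project:

```typescript
import express from 'express';

const app = express();

app.get('/orders/:id', (req, res) => {
  const order = {
    id: Number(req.params.id),
    'creation-date': '1937-01-01T12:00:27.87+00:20',
  };

  // res.format inspects the request's Accept header and picks a representation
  res.format({
    'application/json': () => res.json(order),
    'text/plain': () => res.send(`order ${order.id} created at ${order['creation-date']}`),
    default: () => res.status(406).send('Not Acceptable'),
  });
});

app.listen(3000);
```

A client sending Accept: application/json receives the JSON representation, a client sending Accept: text/plain receives plain text, and a request the server cannot satisfy gets a 406 Not Acceptable response. In every case, the Content-Type header of the reply states which representation was chosen, keeping the message self-descriptive.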
HATEOAS

HATEOAS is a way that the client can interact with the response by navigating within it through the hierarchy in order to get complementary information. For example, here the client makes a GET call to the order URI:

GET https://<HOST>/orders/1234

The response comes with a navigation link to the items within the 1234 order, as in the following code block:

{
  id : 1234,
  any-other-json-fields...,
  "links": [
    {
      "href": "1234/items",
      "rel": "items",
      "type" : "GET"
    }
  ]
}

What happens here is that the links field allows the client to navigate to 1234/items in order to see all the items that belong to the 1234 order.

Stateless

Essentially, stateless means that the state necessary to handle the request is contained within the request itself and is not persisted by the server in a way that could be recovered later. Basically, the URI is the unique identifier of the destination and the body contains the state, or changeable state, of the resource. In other words, after the server handles the request, the state could change, and it will be sent back to the requestor with the appropriate HTTP status code.

In comparison to the default session scope found in a lot of existing systems, the REST client must be the one that is responsible for providing all of the necessary information to the server, considering that the server should be idempotent. Statelessness allows high scalability since the server will not maintain sessions. Another interesting point to note is that the load balancer does not care about sessions at all in stateless systems. In other words, the client always needs to pass the whole request in order to get the resource, because the server is not allowed to hold any previous request state.

Cacheable

The aim of caching is to never have to generate the same response more than once. The key benefits of using this strategy are an increase in speed and a reduction in server processing. Essentially, the request flows through a cache or a series of caches, such as local caching, proxy caching, or reverse proxy caching, in front of the service hosting the resource. If any of them matches any criteria during the request (for example, the timestamp or client ID), the data is returned from the cache layer, and if the caches cannot satisfy the request, the request goes to the server.

Client-server architecture

The REST style separates clients from a server. In short, whenever it is necessary to replace either the server or client side, things should flow naturally, since there is no coupling between them. The client side should not care about data storage and the server side should not care about the interface at all.

A layered system

Each layer must work independently and interact only with the layers directly connected to it. This strategy allows a request to be passed through intermediate layers without bypassing any of them. For instance, when scaling a service is desired, you might use a proxy working as a load balancer, so that the incoming requests are delivered to the appropriate server instance. That being the case, the client side does not need to understand how the server is going to work; it just makes requests to the same URI. The cache is another example that behaves as another layer, and the client does not need to understand how it works either.

Code on demand

In summary, this optional pattern allows the client to download and execute code from the server on the client side. The constraint says that this strategy improves scalability, since the code can execute independently of the server on the client side.
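Putting a few of these constraints together, the following sketch shows a stateless, cacheable order endpoint with a HATEOAS link, again using Express with TypeScript; the link relation, the cache lifetime, and the route are assumptions for illustration only.

import express, { Request, Response } from 'express';

const app = express();

app.get('/orders/:id', (req: Request, res: Response) => {
  const id = req.params.id;

  // Stateless: everything needed to build the response travels in the request (the URI);
  // no server-side session is consulted.
  // Cacheable: clients and intermediaries may reuse this representation for 60 seconds.
  res.set('Cache-Control', 'public, max-age=60');

  // HATEOAS: the client discovers the items sub-resource from the links array.
  res.json({
    id,
    links: [{ href: `${id}/items`, rel: 'items', type: 'GET' }],
  });
});

app.listen(3000);

Because no call depends on state held by a previous one, a load balancer can route each request for /orders/1234 to any instance of this service.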
In this post, we discussed the various REST architectural styles based on six constraints. To know more about best practices for RESTful design, such as API endpoint organization, different ways to expose an API service, and how to handle large datasets, check out the book Hands-On RESTful Web Services with TypeScript 3.

7 reasons to choose GraphQL APIs over REST for building your APIs
Which Python framework is best for building RESTful APIs? Django or Flask?
Understanding advanced patterns in RESTful API [Tutorial]

British Airways set to face a record-breaking fine of £183m by the ICO over customer data breach

Sugandha Lahoti
08 Jul 2019
6 min read
UK’s watchdog ICO is all set to fine British Airways more than £183m over a customer data breach. In September last year, British Airways notified the ICO about a data breach that compromised the personal identification information of over 500,000 customers and is believed to have begun in June 2018.

ICO said in a statement, “Following an extensive investigation, the ICO has issued a notice of its intention to fine British Airways £183.39M for infringements of the General Data Protection Regulation (GDPR).”

Information Commissioner Elizabeth Denham said, "People's personal data is just that - personal. When an organisation fails to protect it from loss, damage or theft, it is more than an inconvenience. That's why the law is clear - when you are entrusted with personal data, you must look after it. Those that don't will face scrutiny from my office to check they have taken appropriate steps to protect fundamental privacy rights."

How did the data breach occur?

According to the details provided by the British Airways website, payments through its main website and mobile app were affected from 22:58 BST August 21, 2018, until 21:45 BST September 5, 2018.

Per the ICO’s investigation, user traffic from the British Airways site was being directed to a fraudulent site, from where customer details were harvested by the attackers. The personal information compromised included login, payment card, and travel booking details, as well as name and address information.

The fraudulent site performed what is known as a supply chain attack, embedding code from third-party suppliers to run payment authorisation, present ads, or allow users to log into external services.

According to a cyber-security expert, Prof Alan Woodward at the University of Surrey, the British Airways hack may possibly have been the work of a company insider who tampered with the website and app's code for malicious purposes. He also pointed out that live data was harvested on the site rather than stored data.

https://twitter.com/EerkeBoiten/status/1148130739642413056

RiskIQ, a cyber security company based in San Francisco, linked the British Airways attack with the modus operandi of the threat group Magecart. Magecart injects scripts designed to steal sensitive data that consumers enter into online payment forms on e-commerce websites, either directly or through compromised third-party suppliers. Per RiskIQ, Magecart set up custom, targeted infrastructure to blend in with the British Airways website specifically and to avoid detection for as long as possible.

What happens next for British Airways?

The ICO noted that British Airways cooperated with its investigation and has made security improvements since the breach was discovered. The airline now has 28 days to appeal.

Responding to the news, British Airways’ chairman and chief executive Alex Cruz said that the company was “surprised and disappointed” by the ICO’s decision, and added that the company has found no evidence of fraudulent activity on accounts linked to the breach. He said, "British Airways responded quickly to a criminal act to steal customers' data. We have found no evidence of fraud/fraudulent activity on accounts linked to the theft. We apologise to our customers for any inconvenience this event caused."

The ICO was appointed as the lead supervisory authority to tackle this case on behalf of other EU Member State data protection authorities.
Under the GDPR ‘one stop shop’ provisions, the data protection authorities in the EU whose residents have been affected will also have the chance to comment on the ICO’s findings. The penalty is divided up between the other European data authorities, while the money that comes to the ICO goes directly to the Treasury.

What is somewhat surprising is that the ICO disclosed the fine publicly even before the supervisory authorities had commented on the ICO’s findings and a final decision had been taken based on their feedback, as pointed out by Simon Hania.

https://twitter.com/simonhania/status/1148145570961399808

Record-breaking fine appreciated by experts

The penalty imposed on British Airways is the first one to be made public since GDPR’s new policies about data privacy were introduced. GDPR makes it mandatory to report data security breaches to the information commissioner. It also increased the maximum penalty to 4% of the turnover of the penalized company.

The fine would be the largest the ICO has ever issued; previously, the ICO fined Facebook £500,000 for the Cambridge Analytica scandal, which was the maximum under the 1998 Data Protection Act. The British Airways penalty amounts to 1.5% of its worldwide turnover in 2017, making it roughly 367 times the size of Facebook’s. In fact, it could have been even worse if the maximum penalty had been levied; the full 4% of turnover would have meant a fine approaching £500m. Such a massive fine would clearly send a sudden shudder down the spine of any big corporation responsible for handling cybersecurity - if they compromise customers' data, a severe punishment is in order.

https://twitter.com/j_opdenakker/status/1148145361799798785

Carl Gottlieb, Privacy Lead & Data Protection Officer at Duolingo, has summarized the key points of this attack in a much-appreciated Twitter thread:

GDPR fines are for inappropriate security as opposed to getting breached. Breaches are a good pointer but are not themselves actionable. So organisations need to implement security that is appropriate for their size, means, risk and need.

Security is an organisation's responsibility, whether you host IT yourself, outsource it or rely on someone else not getting hacked.

The GDPR has teeth against anyone that messes up security, but clearly action will be greatest where the human impact is most significant.

Threats of GDPR fines are what created change in privacy and security practices over the last 2 years (not orgs suddenly growing a conscience). And with very few fines so far, improvements have slowed, so this will help.

Monetary fines are a great example to change behaviour in others, but a TERRIBLE punishment to drive change in an affected organisation. Other enforcement measures, e.g. ceasing processing personal data (e.g. banning new signups), would be much more impactful.

https://twitter.com/CarlGottlieb/status/1148119665257963521

Facebook fined $2.3 million by Germany for providing incomplete information about hate speech content
European Union fined Google 1.49 billion euros for antitrust violations in online advertising
French data regulator, CNIL imposes a fine of 50M euros against Google for failing to comply with GDPR.

The road to Cassandra 4.0 – What does the future have in store?

Guest Contributor
06 Jul 2019
5 min read
In May 2019, DataStax hosted the Accelerate conference for Apache Cassandra™ inviting community members, DataStax customers, and other users to come together, discuss the latest developments around Cassandra, and find out more about the development of Cassandra. Nate McCall, Apache Cassandra Project Chair, presented the road to version 4.0 and what the community is focusing on for the future. So, what does the future really hold for Cassandra? The project has been going for ten years already, so what has to be added?  First off, listening to Nate’s keynote, the approach to development has evolved. As part of the development approach around Cassandra, it’s important to understand who is committing updates to Cassandra. The number of organisations contributing to Cassandra has increased, while the companies involved in the Project Management Committee includes some of the biggest companies in the world.  The likes of Instagram, Facebook and Netflix have team members contributing and leading the development of Cassandra because it is essential to their businesses. For DataStax, we continue to support the growth and development of Cassandra as an open source project through our own code contributions, our development and training, and our drivers that are available for the community and for our customers alike.  Having said all this, there are still areas where Cassandra can improve as we get ready for 4.0. From a development standpoint, the big things to look forward to as mentioned in Nate’s keynote are:  An improved Repair model For a distributed database, being able to carry on through any failure event is critical. After a failure, those nodes will have to be brought back online, and then catch up with the transactions that they missed. Making nodes consistent is a big task, covered by the Repair function. In Cassandra 4.0, the aim is to make Repair smarter. For example, Cassandra can preview the impact of a repair on a host to check that the operation will go through successfully, and specific pull requests for data can also be supported. Alongside this, a new transient replication feature should reduce the cost and bandwidth overhead associated with repair. By replicating temporary copies of data to supplement full copies, the overall cluster should be able to achieve higher levels of availability but at the same time reduce the overall volume of storage required significantly. For companies running very large clusters, the cost savings achievable here could be massive. A Messaging rewrite Efficient messaging between nodes is essential when your database is distributed. Cassandra 4.0 will have a new messaging system in place based on Netty, an asynchronous event-driven network application framework. In practice, using Netty will improve performance of messaging between nodes within clusters and between clusters. On top of this change, zero copy support will provide the ability to improve how quickly data can be streamed between nodes. Zero copy support achieves this by modifying the streaming path to add additional information into the streaming header, and then using ZeroCopy APIs to transfer bytes to and from the network and disk. This allows nodes to transfer large files faster. Cassandra and Kubernetes support Adding new messaging support and being able to transfer SSTables means that Cassandra can add more support for Kubernetes, and for Kubernetes to do interesting things around Cassandra too. 
One area that has been discussed is around dynamic cluster management, where the number of nodes and the volume of storage can be increased or decreased on demand. Sidecars Sidecars are additional functional tools designed to work alongside a main process. These sidecars fill a gap that is not part of the main application or service, and that should remain separate but linked. For Cassandra, running sidecars allows developers to add more functionality to their operations, such as creating events on an application. Java 11 support Java 11 support has been added to the Cassandra trunk version and will be present in 4.0. This will allow Cassandra users to use Java 11, rather than version 8 which is no longer supported.  Diagnostic events and logging This will make it easier for teams to use events for a range of things, from security requirements through to logging activities and triggering tools.  As part of the conference, there were two big trends that I took from the event. The first is – as Nate commented in his keynote – that there was a definite need for more community events that can bring together people who care about Cassandra and get them working together.   The second is that Apache Cassandra is essential to many companies today. Some of the world’s largest internet companies and most valuable brands out there rely on Cassandra in order to achieve what they do. They are contributors and committers to Cassandra, and they have to be sure that Cassandra is ready to meet their requirements. For everyone using Cassandra, this means that versions have to be ready for use in production rather than having issues to be fixed. Things will get released when they are ready, rather than to meet a particular deadline. And the community will take the lead in ensuring that they are happy with any release.  Cassandra 4.0 is nearing release. It’ll be out when it is ready. Whether you are looking at getting involved with the project through contributions, developing drivers or through writing documentation, there is a warm welcome for everyone in the run up to what should be a great release.  I’m already looking forward to ApacheCon later this year! Author Bio Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.

Understanding the Disambiguation of functional expressions in Lambda Leftovers [Tutorial]

Vincy Davis
05 Jul 2019
5 min read
Type inference was introduced with Java 5 and has been increasing in coverage ever since. With Java 8, the resolution of overloaded methods was restructured to allow for working with type inference.

Before the introduction of lambdas and method references, a call to a method was resolved by checking the types of the arguments that were passed to it (the return type wasn't considered). With Java 8, implicit lambdas and implicit method references couldn't be checked for the types of the values that they accepted, which restricted the compiler's ability to rule out ambiguous calls to overloaded methods. However, explicit lambdas and method references could still be checked by the compiler against their argument types. The lambdas that explicitly specify the types of their parameters are termed explicit lambdas. Limiting the compiler's ability and relaxing the rules in this way was purposeful. It lowered the cost of type-checking for lambdas and avoided brittleness. Lambda Leftovers proposes using an underscore for unused parameters in lambdas, methods, and catch handlers.

This article is an excerpt taken from the book Java 11 and 12 - New Features, written by Mala Gupta. In this book, you will learn the latest developments in Java, right from variable type inference and simplified multi-threading through to performance improvements, and much more.

In this article, you will look at the existing issues with resolving overloaded methods when passing lambdas and when passing method references, as well as the proposed solution for disambiguating such calls.

Issues with resolving overloaded methods – passing lambdas

Let's cover the existing issues with resolving overloaded methods when lambdas are passed as method parameters. Let's define two interfaces, Swimmer and Diver, as follows:

interface Swimmer {
    boolean test(String lap);
}

interface Diver {
    String dive(int height);
}

In the following code, the overloaded evaluate method accepts the interfaces Swimmer and Diver as method parameters:

class SwimmingMeet {
    static void evaluate(Swimmer swimmer) {    // code compiles
        System.out.println("evaluate swimmer");
    }
    static void evaluate(Diver diver) {        // code compiles
        System.out.println("evaluate diver");
    }
}

Let's call the overloaded evaluate() method in the following code:

class FunctionalDisambiguation {
    public static void main(String args[]) {
        SwimmingMeet.evaluate(a -> false);     // This code WON'T compile
    }
}

Revisit the lambda from the preceding code:

a -> false                                     // this is an implicit lambda

Since the preceding lambda expression doesn't specify the type of its input parameter, it could be either String (the test() method in the Swimmer interface) or int (the dive() method in the Diver interface). Since the call to the evaluate() method is ambiguous, it doesn't compile. Let's add the type of the method parameter to the preceding code, making it an explicit lambda:

SwimmingMeet.evaluate((String a) -> false);    // This compiles!!

The preceding call is not ambiguous now; the lambda expression accepts an input parameter of the String type and returns a boolean value, which maps to the evaluate() method that accepts Swimmer as a parameter (the functional method test() in the Swimmer interface accepts a parameter of the String type). Let's see what happens if the Swimmer interface is modified, changing the data type of the lap parameter from String to int.
To avoid confusion, all of the code will be repeated, with the modification marked in the comments:

interface Swimmer {                            // test METHOD IS MODIFIED
    boolean test(int lap);                     // String lap changed to int lap
}

interface Diver {
    String dive(int height);
}

class SwimmingMeet {
    static void evaluate(Swimmer swimmer) {    // code compiles
        System.out.println("evaluate swimmer");
    }
    static void evaluate(Diver diver) {        // code compiles
        System.out.println("evaluate diver");
    }
}

Consider the following code, thinking about which of the lines of code will compile:

1. SwimmingMeet.evaluate(a -> false);
2. SwimmingMeet.evaluate((int a) -> false);

In the preceding example, the code on both of the lines won't compile for the same reason: the compiler is unable to resolve the call to the overloaded evaluate() method. Since both of the functional methods (that is, test() in the Swimmer interface and dive() in the Diver interface) accept one method parameter of the int type, it isn't feasible for the compiler to determine the method call. As a developer, you might argue that since the return types of test() and dive() are different, the compiler should be able to infer the correct calls. Just to reiterate, the return types of a method don't participate in method overloading. Overloaded methods must differ in the count or type of their parameters.

Issues with resolving overloaded methods – passing method references

Overloaded methods can be defined with different parameter types; here, the two overloads of someMethod() take functional parameters that accept an Integer and a String, respectively. However, the following code doesn't compile:

someMethod(Championship::reward);              // ambiguous call

In the preceding line of code, since the compiler is not allowed to examine the method reference, the code fails to compile. This is unfortunate, since the method parameters of the overloaded methods are Integer and String: no value can be compatible with both.

The proposed solution

The accidental compiler issues involved with overloaded methods that use either lambda expressions or method references can be resolved by allowing the compiler to also consider their return types. The compiler would then be able to choose the right overloaded method and eliminate the unmatched option.

Summary

For Java developers working with lambdas and method references, this article demonstrates what Java has in the pipeline to help ease these problems. Lambda Leftovers plans to allow developers to define lambda parameters that can shadow variables with the same name in their enclosing block. The disambiguation of functional expressions is an important and powerful feature. It will allow compilers to consider the return types of lambdas in order to determine the right overloaded methods. To know more about the exciting capabilities that are being added to the Java language in pattern matching and switch expressions, head over to the book, Java 11 and 12 - New Features.

Using lambda expressions in Java 11 [Tutorial]
How to deploy Serverless Applications in Go using AWS Lambda [Tutorial]
Java 11 is here with TLS 1.3, Unicode 11, and more updates

Are you looking at transitioning from being a developer to manager? Here are some leadership roles to consider

Packt Editorial Staff
04 Jul 2019
6 min read
What does the phrase "a manager" really mean anyway? This phrase means different things to different people and is often overused for the position which nearly matches an analyst-level profile! This term, although common, is worth defining what it really means, especially in the context of software development. This article is an excerpt from the book The Successful Software Manager written by an internationally experienced IT manager, Herman Fung. This book is a comprehensive and practical guide to managing software developers, software customers, and explores the process of deciding what software needs to be built, not how to build it. In this article, we’ll look into aspects you must be aware of before making the move to become a manager in the software industry. A simple distinction I once used to illustrate the difference between an analyst and a manager is that while an analyst identifies, collects, and analyzes information, a manager uses this analysis and makes decisions, or more accurately, is responsible and accountable for the decisions they make. The structure of software companies is now enormously diverse and varies a lot from one to another, which has an obvious impact on how the manager’s role and their responsibilities are defined, which will be unique to each company. Even within the same company, it's subject to change from time to time, as the company itself changes. Broadly speaking, a manager within software development can be classified into three categories, as we will now discuss: Team Leader/Manager This role is often a lead developer who also doubles up as the team spokesperson and single point of contact. They'll typically be the most senior and knowledgeable member of a small group of developers, who work on the same project, product, and technology. There is often a direct link between each developer in the team and their code, which means the team manager has a direct responsibility to ensure the product as a whole works. Usually, the team manager is also asked to fulfill the people management duties, such as performance reviews and appraisals, and day-to-day HR responsibilities. Development/Delivery Manager This person could be either a techie or a non-techie. They will have a good understanding of the requirements, design, code, and end product. They will manage running workshops and huddles to facilitate better overall team working and delivery. This role may include setting up visual aids, such as team/project charts or boards. In a matrix management model, where developers and other experts are temporarily asked to work in project teams, the development manager will not be responsible for HR and people management duties. Project Manager This person is most probably a non-techie, but there are exceptions, and this could be a distinct advantage on certain projects. Most importantly, a project manager will be process-focused and output-driven and will focus on distributing tasks to individuals. They are not expected to jump in to solve technical problems, but they are responsible for ensuring that the proper resources are available, while managing expectations. Specifically, they take part in managing the project budget, timeline, and risks. They should also be aware of the political landscape and management agenda within the organization to be able to navigate through them. The project manager ensures the project follows the required methodology or process framework mandated by the Project Management Office (PMO). 
They will not have people-management responsibilities for project team members. Agile practitioner As with all roles in today's world of tech, these categories will vary and overlap. They can even be held by the same person, which is becoming an increasingly common trait. They are also constantly evolving, which exemplifies the need to learn and grow continually, regardless of your role or position. If you are a true Agile practitioner, you may have issues in choosing these generalized categories, (Team Leader, Development Manager and Project Manager)  and you'd be right to do so! These categories are most applicable to an organization that practises the traditional Waterfall model. Without diving into the everlasting Waterfall vs Agile debate, let's just say that these are the categories that transcend any methodologies. Even if they're not referred to by these names, they are the roles that need to be performed, to varying degrees, at various times. For completeness, it is worth noting one role specific to Agile, that is being a scrum master. Scrum master A scrum master is a role often compared – rightly or wrongly – with that of the project manager. The key difference is that their focus is on facilitation and coaching, instead of organizing and control. This difference is as much of a mindset as it is a strict practice, and is often referred to as being attributes of Servant Leadership. I believe a good scrum master will show traits of a good project manager at various times, and vice versa. This is especially true in ensuring that there is clear communication at all times and the team stays focused on delivering together. Yet, as we look back at all these roles, it's worth remembering that with the advent of new disciplines such as big data, blockchain, artificial intelligence, and machine learning, there are new categories and opportunities to move from a developer role into a management position, for example, as an algorithm manager or data manager. Transitioning, growing, progressing, or simply changing from a developer to a manager is a wonderfully rewarding journey that is unique to everyone. After clarifying what being a “modern manager" really means, and the broad categories applicable in software development (Team / Development / Project / Agile), the overarching and often key consideration for developers is whether it means they will be managing people and writing less code. In this article, we looked into different leadership roles that are available for developers for their career progression plan. Develop crucial skills to enhance your performance and advance your career with The Successful Software Manager written by Herman Fung. “Developers don’t belong on a pedestal, they’re doing a job like everyone else” – April Wensel on toxic tech culture and Compassionate Coding [Interview] Curl’s lead developer announces Google’s “plan to reimplement curl in Libcrurl” ‘I code in my dreams too’, say developers in Jetbrains State of Developer Ecosystem 2019 Survey