
How-To Tutorials - Microservices

16 Articles

Yuri Shkuro on Observability challenges in microservices and cloud-native applications

Packt Editorial Staff
05 Apr 2019
11 min read
In the last decade, we saw a significant shift in how modern, internet-scale applications are being built. Cloud computing (infrastructure as a service) and containerization technologies (popularized by Docker) enabled a new breed of distributed system designs commonly referred to as microservices (and their next incarnation, FaaS). Successful companies like Twitter and Netflix have been able to leverage them to build highly scalable, efficient, and reliable systems, and to deliver more features faster to their customers. In this article we explain the concept of observability in microservices, its challenges and traditional monitoring tools in microservices. This article is an extract taken from the book Mastering Distributed Tracing, written by Yuri Shkuro. This book will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. While there is no official definition of microservices, a certain consensus has evolved over time in the industry. Martin Fowler, the author of many books on software design, argues that microservices architectures exhibit the following common characteristics: Componentization via (micro)services Smart endpoints and dumb pipes Organized around business capabilities Decentralized governance Decentralized data management Infrastructure automation Design for failure Evolutionary design Because of the large number of microservices involved in building modern applications, rapid provisioning, rapid deployment via decentralized continuous delivery, strict DevOps practices, and holistic service monitoring are necessary to effectively develop, maintain, and operate such applications. The infrastructure requirements imposed by the microservices architectures spawned a whole new area of development of infrastructure platforms and tools for managing these complex cloud-native applications. In 2015, the Cloud Native Computing Foundation (CNCF) was created as a vendor-neutral home for many emerging open source projects in this area, such as Kubernetes, Prometheus, Linkerd, and so on, with a mission to "make cloud-native computing ubiquitous." Read more on Honeycomb CEO Charity Majors discusses observability and dealing with “the coming armageddon of complexity” [Interview] What is observability? The term "observability" in control theory states that the system is observable if the internal states of the system and, accordingly, its behavior, can be determined by only looking at its inputs and outputs. At the 2018 Observability Practitioners Summit, Bryan Cantrill, the CTO of Joyent and one of the creators of the tool dtrace, argued that this definition is not practical to apply to software systems because they are so complex that we can never know their complete internal state, and therefore the control theory's binary measure of observability is always zero (I highly recommend watching his talk on YouTube: https://youtu.be/U4E0QxzswQc). Instead, a more useful definition of observability for a software system is its "capability to allow a human to ask and answer questions". The more questions we can ask and answer about the system, the more observable it is. Figure 1: The Twitter debate There are also many debates and Twitter zingers about the difference between monitoring and observability. Traditionally, the term monitoring was used to describe metrics collection and alerting. 
Sometimes it is used more generally to include other tools, such as "using distributed tracing to monitor distributed transactions." The definition by Oxford dictionaries of the verb "monitor" is "to observe and check the progress or quality of (something) over a period of time; keep under systematic review." However, it is better scoped to describing the process of observing certain a priori defined performance indicators of our software system, such as those measuring an impact on the end-user experience, like latency or error counts, and using their values to alert us when these signals indicate an abnormal behavior of the system. Metrics, logs, and traces can all be used as a means to extract those signals from the application. We can then reserve the term "observability" for situations when we have a human operator proactively asking questions that were not predefined. As Bryan Cantrill put it in his talk, this process is debugging, and we need to "use our brains when debugging." Monitoring does not require a human operator; it can and should be fully automated. "If you want to talk about (metrics, logs, and traces) as pillars of observability–great. The human is the foundation of observability!"  -- BryanCantrill In the end, the so-called "three pillars of observability" (metrics, logs, and traces) are just tools, or more precisely, different ways of extracting sensor data from the applications. Even with metrics, the modern time series solutions like Prometheus, InfluxDB, or Uber's M3 are capable of capturing the time series with many labels, such as which host emitted a particular value of a counter. Not all labels may be useful for monitoring, since a single misbehaving service instance in a cluster of thousands does not warrant an alert that wakes up an engineer. But when we are investigating an outage and trying to narrow down the scope of the problem, the labels can be very useful as observability signals. The observability challenge of microservices By adopting microservices architectures, organizations are expecting to reap many benefits, from better scalability of components to higher developer productivity. There are many books, articles, and blog posts written on this topic, so I will not go into that. Despite the benefits and eager adoption by companies large and small, microservices come with their own challenges and complexity. Companies like Twitter and Netflix were successful in adopting microservices because they found efficient ways of managing that complexity. Vijay Gill, Senior VP of Engineering at Databricks, goes as far as saying that the only good reason to adopt microservices is to be able to scale your engineering organization and to "ship the org chart". So, what are the challenges of this design? There are quite a few: In order to run these microservices in production, we need an advanced orchestration platform that can schedule resources, deploy containers, autoscale, and so on. Operating an architecture of this scale manually is simply not feasible, which is why projects like Kubernetes became so popular. In order to communicate, microservices need to know how to find each other on the network, how to route around problematic areas, how to perform load balancing, how to apply rate limiting, and so on. These functions are delegated to advanced RPC frameworks or external components like network proxies and service meshes. Splitting a monolith into many microservices may actually decrease reliability. 
Suppose we have 20 components in the application and all of them are required to produce a response to a single request. When we run them in a monolith, our failure modes are restricted to bugs and potentially a crash of the whole server running the monolith. But if we run the same components as microservices, on different hosts and separated by a network, we introduce many more potential failure points, from network hiccups to resource constraints due to noisy neighbors.

The latency may also increase. Assume each microservice has 1 ms average latency, but the 99th percentile is 1 s. A transaction touching just one of these services has a 1% chance of taking ≥ 1 s. A transaction touching 100 of these services has a 1 − (1 − 0.01)^100 ≈ 63% chance of taking ≥ 1 s.
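As a quick sketch of that arithmetic in code (this snippet is an illustration added here, not example code from the book), the request is slow whenever at least one of the services it touches hits its 99th-percentile latency:

```java
public class TailLatency {
    public static void main(String[] args) {
        double pSlow = 0.01;   // P(a single service takes >= 1s), i.e. its 99th percentile
        int services = 100;    // number of services touched by one request
        // The request is slow unless every service it touches responds quickly.
        double pRequestSlow = 1 - Math.pow(1 - pSlow, services);
        System.out.printf("P(request takes >= 1s) = %.2f%n", pRequestSlow); // ~0.63
    }
}
```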
Finally, the observability of the system is dramatically reduced if we try to use traditional monitoring tools. When we see that some requests to our system are failing or slow, we want our observability tools to tell us the story about what happens to that request.

Traditional monitoring tools

Traditional monitoring tools were designed for monolith systems, observing the health and behavior of a single application instance. They may be able to tell us a story about that single instance, but they know almost nothing about the distributed transaction that passed through it. These tools "lack the context" of the request.

Metrics

It goes like this: "Once upon a time…something bad happened. The end." How do you like this story? This is what the chart in Figure 2 tells us. It's not completely useless; we do see a spike and we could define an alert to fire when this happens. But can we explain or troubleshoot the problem?

Figure 2: A graph of two time series representing (hypothetically) the volume of traffic to a service

Metrics, or stats, are numerical measures recorded by the application, such as counters, gauges, or timers. Metrics are very cheap to collect, since numeric values can be easily aggregated to reduce the overhead of transmitting that data to the monitoring system. They are also fairly accurate, which is why they are very useful for the actual monitoring (as the dictionary defines it) and alerting. Yet the same capacity for aggregation is what makes metrics ill-suited for explaining the pathological behavior of the application. By aggregating data, we are throwing away all the context we had about the individual transactions.

Logs

Logging is an even more basic observability tool than metrics. Every programmer learns their first programming language by writing a program that prints (that is, logs) "Hello, World!" Similar to metrics, logs struggle with microservices because each log stream only tells us about a single instance of a service. However, evolving programming paradigms create other problems for logs as a debugging tool. Ben Sigelman, who built Google's distributed tracing system Dapper, explained it in his KubeCon 2016 keynote talk as four types of concurrency (Figure 3):

Figure 3: Evolution of concurrency

Years ago, applications like early versions of Apache HTTP Server handled concurrency by forking child processes and having each process handle a single request at a time. Logs collected from that single process could do a good job of describing what happened inside the application. Then came multi-threaded applications and basic concurrency. A single request would typically be executed by a single thread sequentially, so as long as we included the thread name in the logs and filtered by that name, we could still get a reasonably accurate picture of the request execution.

Then came asynchronous concurrency, with asynchronous and actor-based programming, executor pools, futures, promises, and event-loop-based frameworks. The execution of a single request may start on one thread, then continue on another, and finish on a third. In the case of event-loop systems like Node.js, all requests are processed on a single thread, but when the execution tries to make an I/O call, it is put in a wait state, and when the I/O is done, the execution resumes after waiting its turn in the queue. Both of these asynchronous concurrency models result in each thread switching between multiple different requests that are all in flight. Observing the behavior of such a system from the logs is very difficult, unless we annotate all logs with some kind of unique id representing the request rather than the thread, a technique that actually gets us close to how distributed tracing works.

Finally, microservices introduced what we can call "distributed concurrency." Not only can the execution of a single request jump between threads, but it can also jump between processes, when one microservice makes a network call to another. Trying to troubleshoot request execution from such logs is like debugging without a stack trace: we get small pieces, but no big picture. In order to reconstruct the flight of the request from the many log streams, we need powerful log aggregation technology and a distributed context propagation capability to tag all those logs in different processes with a unique request id that we can use to stitch those requests together. We might as well be using a real distributed tracing infrastructure at this point! Yet even after tagging the logs with a unique request id, we still cannot assemble them into an accurate sequence, because the timestamps from different servers are generally not comparable due to clock skews.

In this article we looked at the concept of observability and some challenges one has to face in microservices. We further discussed traditional monitoring tools for microservices. Applying distributed tracing to microservices-based architectures will be easy with Mastering Distributed Tracing, written by Yuri Shkuro.

6 Ways to blow up your Microservices!
Have Microservices killed the monolithic architecture? Maybe not!
How to build Dockers with microservices


Why moving from a monolithic architecture to microservices is so hard, Gitlab’s Jason Plum breaks it down [KubeCon+CNC Talk]

Amrata Joshi
19 Dec 2018
12 min read
Last week, at KubeCon + CloudNativeCon North America 2018, Jason Plum, Senior Software Engineer, Distribution at GitLab, spoke about GitLab, Omnibus, and the concept of a monolith and its downsides. He has spent the last year working on the cloud native Helm charts and breaking out a complicated pile of code. This article highlights a few insights from Jason Plum's talk, Monolith to Microservice: Pitchforks Not Included, at KubeCon + CloudNativeCon.

Key takeaways

"You could not have seen the future that you live in today, learn from what you've got in the past, learn what's available now and work your way to it." - Jason Plum

GitLab's beginnings as a monolithic project provided the means for focused acceleration and innovation. The need to scale better and faster than the traditional models allowed caused the team to reflect on its choices, as it needed to grow beyond the current architecture to keep up. New ways of doing things require new ways of looking at them. Be open minded, and remember that your correct choices in the past could not see the future you live in.

"So the real question people don't realize is what is GitLab?" - Jason Plum

GitLab is the first single application to have the entire DevOps lifecycle in a single interface.

Omnibus - The journey from a package to a monolith

"We had a group of people working on a single product to binding that and then we took that, we bundled that. And we shipped it and we shipped it and we shipped it and we shipped it and all the twenties every month for the entire lifespan of this company we have done that, that's not been easy. Being a monolith made that something that was simple to do at scale." - Jason Plum

In the beginning it was simple, as Ruby on Rails was a single codebase and users had to deploy it from source. Just one gigantic codebase was used, but that's not the case these days. Ruby on Rails is still used for the primary application, but now a shim proxy called Workhorse takes the heavy lifting away from Ruby. It ensures that users and their APIs are responsive. The team at GitLab started packaging this because doing everything from source was difficult. They created the Omnibus package, which eventually became the gigantic monolith.

Monoliths make sense because:

Adding features is simple
It's easy, as everything is one bundle
Clear focus for a Minimum Viable Product (MVP)

Advantages of Omnibus:

Full-stack bundle provides all components necessary to use every feature of GitLab
Simple to install
Components can be individually enabled/disabled
Easy to distribute
Highly controlled, version-locked components
Guaranteed configuration stability

The downsides of monoliths

"The problem is this thing is massive" - Jason Plum

The Omnibus package can work on any platform, any cloud, and under any distribution. But the question is: how many of us would want to manage fleets of VMs? This package has grown so much that it is 1.5 gigabytes unpacked. It has all the features and is still usable. If a user downloads a 500-megabyte installation package, it unpacks to almost a gigabyte and a half. This package contains everything that is required to run the SaaS, but the problem is that this package is massive.

"The trick is Git itself is the reason that moving to cloud native was hard." - Jason Plum

While using Git, users run a couple of commands, push them, and deploy the app. But at the core of those commands is how everything is handled and how everything is put together. Git works with snapshots of the entire file.
The number of files include, every file the user has and every version the user had. It also involves all the indexes and references and some optimizations. But the problem is the more the files, the harder it gets. “Has anybody ever checked out the Linux tree? You check out that tree, get your coffee, come back check out, any branch I don't care what it is and then dip that against current master. How many files just got read on the file system?” - Jason Plum When you come back you realize that all the files that are marked as different and between the two of them when you do diff, that information is not stored, it's not greeting and it is not even cutting it out. It is running differently on all of those files. Imagine how bad that gets when you have 10 million lines of code in a repository that's 15 years old ?  That’s expensive in terms of performance.  - Jason Plum Traditional methods - A big problem “Now let's actually go and make a branch make some changes and commit them right. Now you push them up to your fork and now you go into add if you on an M R. Now it's my job to do the thing that was already hard on your laptop, right? Okay cool, that's one of you, how about 10,000 people a second right do you see where this is going? Suddenly it's harder but why is this the problem?” - Jason Plum The answer is traditional methods, as they are quite slow. If we have hundreds of things in the fleet, accessing tens of machines that are massive and it still won’t work because the traditional methods are a problem. Is NFS a solution to this problem? NFS (Network File System) works well when there are just 10 or 100 people. But if a user is asked to manage an NFS server for 5,000 people, one might rather choose pitchfork. NFS is capable but it can’t work at such a scale. The Git team now has a mount that has to be on every single node, as the API code and web code and other processes which needs to be functional enough to read the files. The team has previously used Garrett, Lib Git to read the files on the file system. Every time, one reads the file, the whole file used to get pulled. This gave rise to another problem, disk i/o problems. Since, everybody tries to read the disparate set of files, the traffic increases. “Okay so we have definitely found a scaling limit now we can only push the traditional methods of up and out so far before we realize that that's just not going to work because we don't have big enough pipes, end of line. So now we've got all of this and we've just got more of them and more of them and more of them. And all of a sudden we need to add 15 nodes to the fleet and another 15 nodes to the fleet and another 15 nodes to the fleet to keep up with sudden user demand. With every single time we have to double something the choke points do not grow - they get tighter and tighter” - Jason Plum The team decided to take a second look at the problem and started working on a project called Gitaly. They took the API calls that the users would make to live Git. So the Git mechanics was sent over a GRPC and then Gitaly was put on the actual file servers. Further the users were asked to call for a diff on whatever they want and then Gitaly was asked for the response. There is no need of NFS now. “I can send a 1k packet get a 4k response instead of NFS and reading 10,000 files. 
We centralized everything across and this gives us the ability to actually meet throughput because that pipe that's not getting any bigger suddenly has 1/10 of the traffic going through it.” - Jason Plum This leaves more space for users to easily get to the file servers and further removes the need of NFS mounts for everything. Incase one node is lost then half of the fleet is not lost in an instant. How is Gitaly useful? With Gitaly the throughput requirement significantly reduced. The service nodes no more need disk access. It provides optimization for specific problems. How to solve Git’s performance related issue? For better optimization and performance it is important to treat it like a service or like a database. The file system is still in use and all of the accesses to the files are on the node where we have the best performance and best caching and there is no issue with regards to the network. “To take the monolith and rip a chunk out make it something else and literally prop the thing up, but how long are we going to be able to do this?” - Jason Plum If a user plans to upload something then he/she has to use a file system and which means that NFS hasn't gone away. Do we really need to have NFS because somebody uploaded a cat picture? Come on guys we can do better than that right?- Jason Plum The next solution was to take everything as a traditional file that does not get and move into object store as an option. This matters because there is no need to have a file system locally. The files can be handed over to a service that works well. And it could run on Prem in a cloud and can be handled by any number of men and service providers. Pets cattle is a popular term by CERN which means anything that can be replaced easily is cattle and anything that you have to care and feed for on a regular basis is a pet. The pet could be the stateful information, for example, database. The problem can be better explained with configuring the Omnibus at scale. If there are  hundreds of the VM’s and they are getting installed, further which the entire package is getting installed. So now there are 20 gigabytes per VM. The package needs to be downloaded for all the VM’s which means almost 500 megabytes. All the individual components can be configured out of the Omnibus. But even the load gets spreaded, it will still remain this big. And each of the nodes will at least take two minutes to come up from. So to speed up this process, the massive stack needs to be broken down into chunks and containers so they can be treated as individualized services. Also, there is no need of NFS as the components are no longer bound to the NFS disk. And this process would now take just five seconds instead of two minutes. A problem called legacy debt, a shared file system expectation which was a bugger. If there are separate containers and there is no shared disk then it could again give rise to a problem. “I can't do a shared disk because if we do shared disk through rewrite many. What's the major provider that will do that for us on every platform, anybody remember another three-letter problem.” - Jason Plum Then there came an interesting problem called workhorse, a smart proxy that talks to the UNIX sockets and not TCP. Though this problem got fixed. 
Time constraints - another problem “We can't break existing users and we can't have hiccups we have to think about everything ahead of time plan well and execute.” - Jason Plum Time constraints is a serious problem for a project’s developers, the development resources milestones, roadmaps deliverables. The new features would keep on coming into the project. The project would keep on functioning in the background but the existing users can’t be kept waiting. Is it possible to define individual component requirements? “Do you know how much CPU you need when idle versus when there's 10 people versus literally some guy clicking around and if files because he's one to look at what the kernel would like in 2 6 2 ?”- Jason Plum Monitoring helps to understand the component requirements. Metrics and performance data are few of the key elements for getting the exact component requirements. Other parameters like network, throughput, load balance, services etc also play an important role. But the problem is how to deal with throughput? How to balance the services? How to ensure that those services are always up? Then the other question comes up regarding the providers and load balancers as everyone doesn’t want to use the same load balancers or the same services. The system must support all the load balancers from all the major cloud providers and which is difficult. Issues with scaling “Maybe 50 percent for the thing that needs a lot of memory is a bad idea. I thought 50 percent was okay because when I ran a QA test against it, it didn't ever use more than 50 percent of one CPU. Apparently when I ran three more it now used 115 percent and I had 16 pounds and it fell over again.” - Jason Plum It's important to know what things needs to be scaled horizontally and which ones needs to be scaled vertically. To go automated or manual is also a crucial question. Also, it is equally important to understand which things should be configurable and how to tweak them as the use cases may vary from project to project. So, one should know how to go about a test and how to document a test. Issues with resilience “What happens to the application when a node, a whole node disappears off the cluster? Do you know how that behaves?” - Jason Plum It is important to understand which things shouldn't be on the same nodes. But the problem is how to recover it. These things are not known and by the time one understands the problem and the solution, it is too late. We need new ways of examining these issues and for planning the solution. Jason’s insightful talk on Monolith to Microservice gives a perfect end to the KubeCon + CloudNativeCon and is a must watch for everyone. Kelsey Hightower on Serverless and Security on Kubernetes at KubeCon + CloudNative RedHat contributes etcd, a distributed key-value store project, to the Cloud Native Computing Foundation at KubeCon + CloudNativeCon Oracle introduces Oracle Cloud Native Framework at KubeCon+CloudNativeCon 2018


Have Microservices killed the monolithic architecture? Maybe not!

Aaron Lazar
04 Jun 2018
6 min read
Microservices have been growing in popularity since the past few years, 2014 to be precise. Honestly speaking they weren’t that popular until around 2016 - take a look at the steep rise in the curve. The outbreak has happened over the past few years and there are quite a few factors contributing to their growth, like the cloud, distributed architectures, etc. Source: Google Trends Microservices allow for a clearer and refined architecture, with services built to work in isolation, without affecting the resilience and robustness of the application in any way. But does that mean that the Monolith is dead and only Microservices reign? Let’s find out, shall we? Those of you who participated in this year’s survey, I thank you for taking the time out to share such valuable information. For those of you who don’t know what the survey is all about, it a thing that we do every year, where thousands of developers, architects, managers, admins, share their insights with us, and we share our findings with the community. This year’s survey was as informative as the last, if not more! We had developers tell us so much about what they’re doing, where they see technology heading and what tools and techniques they use to stay relevant at what they do. So we took the opportunity and asked our respondents a question about the topic under discussion. Source: WWE.com Revelations If I asked a developer in 2018, what they thought would be the response, they’d instantly say that a majority would be for microservices. Source: Packtpub Skill Up Survey 2018 If you were the one who guessed the answer was going to be Yes, give yourself a firm pat on the back! It’s great to see that 1,603 people are throwing their hands up in the air and building microservices. On the other hand, it’s possible that it’s purely their manager’s decision (See how this forms a barrier to achieving business goals). Anyway, I was particularly concerned about the remaining 314 people who said ‘No’ (those who skipped answering, now is your chance to say something in the comments section below!). Why no Microservices? I thought I’d analyse the possibilities as to why one wouldn’t want to use the microservices pattern in their application architecture. It’s not like developers are migrating from monoliths to microservices, just because everyone else is doing it. Like any other architectural decision, there are several factors that need to be taken into consideration before making the switch. So here’s what I thought were some reasons why developers are sticking to monoliths. #1 One troll vs many elves: Complex times Well imagine you could be attacked by one troll or a hundred house elves. Which situation would you choose to be in if neither isn’t an option? I don’t know about you, but I’d choose the troll any day! Keeping the troll’s size aside, I’d be better off knowing I had one large enemy in front of me, rather than being surrounded by a hundred miniature ones. The same goes for microservices. More services means more complexity, more issues that could crop up. For developers, more services means that they would need to run or connect to all of them on their machine. Although there are tools that help solve this problem, you have to admit that it’s a task to run all services together as a whole application. On the other hand, Ops professionals are tasked to monitor and keep all these services up and running. 
#2 We lack the expertise Let alone having Developer Rockstars or Admin Ninjas (Oops, I shouldn’t be using those words now, find out why), if your organisation lacks experienced professionals, you’ve got a serious problem. What if there’s an organisation that has been having issues developing/managing a monolith itself. There’s no guarantee that they will be able to manage a microservices based application more effectively. It’s a matter of the organisation having enough hands on skills needed to perform these tasks. These skills are tough to acquire and it’s not simple for organisations to find the right talent. #3 Tower of Babel: Communication gaps In a monolith, communication happens within the application itself and the network channels exist internally. However, this isn’t the case for a microservices architecture as inter-service communication is necessary to keep everything running in tandem. This results in the generation of multiple points of failure, complicating things. To minimise failure, each service has a certain number of retries when trying to establish communication with another. When scaled up, these retries add a load on the database, what with communication formats having to follow strict rules to avoid complexity back again. It’s a vicious circle! #4 Rebuilding a monolith When you build an application based on the microservices architecture, you may benefit a great deal from robustness and reliability. However, microservices together form a large, complicated system, which can be managed by orchestration platforms like Kubernetes. Although, if individual teams are managing clusters of these services, it’s quite likely that orchestration, deployment and management of such a system will be a pain. #5 Burning in dependency hell Microservices are notorious for inviting developers to build services in various languages and then to glue them together. While this is an advantage to a certain extent, it complicates dependency management in the entire application. Moreover, dependencies get even more complicated when versions of tools don’t receive instantaneous support as they are updated. You and your team can go crazy keeping track of versions and dependencies that need to be managed to maintain smooth functioning of your application. So while the microservice architecture is hot, it is not always the best option and teams can actually end up making things worse if they choose to make the change unprepared. Yes, the cloud does benefit much more when applications are deployed as services, rather than as a monolith, but the renowned/infamous “lift and shift” method still exists and works when needed. Ultimately, if you think past the hype, the monolith is not really dead yet and is in fact still being deployed and run in several organisations. Finally, I want to stress that it’s critical that developers and architects take a well informed decision, keeping in mind all the above factors, before they choose an architecture. Like they say, “With great power comes great responsibility”, that’s exactly what great architecture is all about, rather than just jumping on the bandwagon. Building Scalable Microservices Why microservices and DevOps are a match made in heaven What is a multi layered software architecture?


What is domain driven design?

Packt Editorial Staff
03 Apr 2018
18 min read
Domain driven design exists because all software exists for a purpose. It does something. For example, you can't provide a software solution for a financial system such as online stock trading if you don't understand the stock exchanges and their functioning. Having domain knowledge is essential to solving problems with software. Domain driven design is simply designing software with the specific domain - whether that's finance, medicine, law, eCommerce - in mind. This has been taken from Mastering Microservices with Java 9 - Second Edition.

Central to Domain Driven Design is the concept of a model. A model is an abstraction, or a blueprint, of the domain.

Domain driven design is a collaborative activity

Designing this model is not rocket science, but it does take a lot of effort, refining, and input from domain experts. It is the collective job of software designers, domain experts, and developers. They organize information, divide it into smaller parts, group them logically, and create modules. Each module can be taken up individually, and can be divided using a similar approach. This process can be followed until we reach the unit level, or until we cannot divide it any further. A complex project may have more such iterations; similarly, a simple project could have just a single iteration.

Once a model is defined and well documented, it can move on to the next stage - code design. So, here we have a software design: a domain model, a code design, and a code implementation of the domain model. The domain model provides a high-level view of the architecture of a solution (software/application), and the code implementation gives the domain model life, as a working model.

Domain Driven Design makes design and development work together. It provides the ability to develop software continuously, while keeping the design up to date based on feedback received from the development. It addresses one of the limitations of Agile and Waterfall methodologies by keeping the software, including its design and code, maintainable, as well as keeping the application minimally viable. It gives developers the right platform to understand the domain, and provides the opportunity to share early feedback on the domain model implementation. It removes the bottleneck that appears in later stages when stakeholders wait for deliverables.

The fundamental components of Domain Driven Design

To understand domain driven design, you can break it down into three fundamental concepts:

Ubiquitous language and Unified Modeling Language (UML)
Multilayered architecture
Artifacts (components)

Ubiquitous language

Ubiquitous language is a common language used to communicate within a project. Because designing a model is a collaborative effort of software designers, domain experts, and developers, it requires a common language to communicate with. It removes misunderstandings and misinterpretations. Communication gaps so often lead to bad software - ubiquitous language minimizes these gaps. It does, however, need to be used everywhere on a project. Unified Modeling Language (UML) is widely used and very popular when creating models. It also has a few limitations; for example, when you have thousands of classes drawn on paper, it's difficult to represent class relationships and simultaneously understand their abstraction while taking meaning from it. Also, UML diagrams do not represent the concepts of a model and what objects are supposed to do.
Therefore, UML should always be used with other documents, code, or any other reference for effective communication. Multilayered architecture Multilayered architecture is a common solution for Domain Driven Design. It contains four layers: Presentation layer or (UI) Application layer - responsible for application logic. It maintains and coordinates the overall flow of the product/service. It does not contain business logic or UI. It may hold the state of application objects, like tasks in progress. Domain layer - contains the domain information and business logic. It holds the state of the business object. Infrastructure layer -  provides support to all the other layers and is responsible for communication between them. To understand the interaction of the different layers, take the example of table booking at a restaurant. The end user places a request for a table booking using UI. The UI passes the request to the application layer. The application layer fetches the domain objects, such as the restaurant, the table, a date, and so on, from the domain layer. The domain layer fetches these existing persisted objects from the infrastructure, and invokes relevant methods to make the booking and persist them back to the infrastructure layer. Once domain objects are persisted, the application layer shows the booking confirmation to the end user. Artifacts used in Domain Driven Design There are seven different artifacts used in Domain Driven Design to express, create, and retrieve domain models: Entities Value objects Services Aggregates Repository Factory Module Entities are certain types of objects that are identifiable and remain the same throughout the states of the products/services. These objects are not identified by their attributes, but by their identity and thread of continuity. These type of objects are known as entities. It sounds pretty simple, but it carries complexity. You need to understand how we can define the entities. Let's take an example of a table booking system, where we have a restaurant class with attributes such as restaurant name, address, phone number, establishment data, and so on. We can take two instances of the restaurant class that are not identifiable using the restaurant name, as there could be other restaurants with the same name. Similarly, if we go by any other single attribute, we will not find any attributes that can singularly identify a unique restaurant. If two restaurants have all the same attribute values, they are therefore the same and are interchangeable with each other. Still, they are not the same entities, as both have different references (memory addresses). Conversely, let's take a class of U.S. citizens. Every U.S. citizen has his or her own social security number. This number is not only unique, but remains unchanged throughout the life of the citizen and assures continuity. This citizen object would exist in the memory, would be serialized, and would be removed from the memory and stored in the database. It even exists after the person is deceased. It will be kept in the system for as long as the system exists. A citizen's social security number remains the same irrespective of its representation. Therefore, creating entities in a product means creating an identity. So, now give an identity to any restaurant in the previous example, then either use a combination of attributes such as restaurant name, establishment date, and street, or add an identifier such as restaurant_id to identify it. 
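To make the restaurant example above concrete, here is a minimal sketch (not code from the book; the class and field names are illustrative) of an entity whose equality is based on an explicit identifier rather than on its attributes:

```java
import java.util.Objects;

// Entity: identity comes from restaurantId, not from name/address attributes.
public class Restaurant {
    private final String restaurantId;   // explicit identifier, e.g. a UUID or database key
    private String name;
    private String address;

    public Restaurant(String restaurantId, String name, String address) {
        this.restaurantId = restaurantId;
        this.name = name;
        this.address = address;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Restaurant)) return false;
        return restaurantId.equals(((Restaurant) o).restaurantId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(restaurantId);
    }
}
```

With this design, two Restaurant instances that happen to share the same name and address are still different entities unless they carry the same restaurantId.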
The basic rule is that two identifiers cannot be the same. Therefore, when we introduce an identifier for an entity, we need to be sure of it. There are different ways to create a unique identity for objects, described as follows: Using the primary key in a table. Using an automated generated ID by a domain module. A domain program generates the identifier and assigns it to objects that are being persisted among different layers. A few real-life objects carry user-defined identifiers themselves. For example, each country has its own country codes for dialing ISD calls. Composite key. This is a combination of attributes that can also be used for creating an identifier, as explained for the preceding restaurant object. Value objects Value objects (VOs) simplify the design. In contrast to entities, value objects have only attributes and no conceptual identity. A best practice is to keep value objects as immutable objects. If possible, you should even keep entity objects immutable too. You might want to keep all objects as entities, but you're likely to run into problems if you do this; there has to be one instance for each object. Let's say you are creating customers as entity objects. Each customer object would represent the restaurant guest; this cannot be used for booking orders for other guests. This may create millions of customer entity objects in the memory if millions of customers are using the system. Not only are there millions of uniquely identifiable objects that exist in the system, but each object is being tracked. Tracking as well as creating an identity is complex. A highly credible system is required to create and track these objects, which is not only very complex, but also resource heavy. It may result in system performance degradation. Therefore, it is important to use value objects instead of using entities. The reasons are explained in the next few paragraphs. Applications don't always need to have to be trackable and have an identifiable customer object. There are cases when you just need to have some or all attributes of the domain element. These are the cases when value objects can be used by the application. It makes things simple and improves the performance. Value objects can easily be created and destroyed, owing to the absence of identity. This simplifies the design—it makes value objects available for garbage collection if no other object has referenced them. Value objects should be designed and coded as immutable. Once they are created, they should never be modified during their life-cycle. If you need a different value of the VO, or any of its objects, then simply create a new value object, but don't modify the original value object. Here, immutability carries all the significance from object-oriented programming (OOP). A value object can be shared and used without impacting on its integrity if, and only if, it is immutable. Services While creating the domain model, you may come across situations where behavior may not be related to any object. These behaviors can be accommodated in service objects. Service objects are part of the domain layer and do not have any internal state. The sole purpose of service objects is to provide behavior to the domain that does not belong to a single entity or value object. Ubiquitous language helps you to identify different objects, identities, or value objects with different attributes and behaviors during the process of domain driven design and domain modelling. 
During the course of creating the domain model, you may find different behaviors or methods that do not belong to any specific object. Such behaviors are important, and so cannot be neglected. Neither can you add them to entities or value objects. It would spoil the object to add behavior that does not belong to it. Keep in mind, that behavior may impact on various objects. The use of object-oriented programming makes it possible to attach to some objects; this is known as a service. Services are common in technical frameworks. These are also used in domain layers in domain driven design. A service object does not have any internal state; its only purpose is to provide a behavior to the domain. Service objects provide behaviors that cannot be related to specific entities or value objects. Service objects may provide one or more related behaviors to one or more entities or value objects. It is a practice to define the services explicitly in the domain model. While creating the services, you need to tick all of the following points: Service objects' behavior performs on entities and value objects, but it does not belong to entities or value objects Service objects' behavior state is not maintained, and hence, they are stateless Services are part of the domain model Services may also exist in other layers. It is very important to keep domain-layer services isolated. It removes the complexities and keeps the design decoupled. Let's take an example where a restaurant owner wants to see the report of his monthly table bookings. In this case, he will log in as an admin and click the Display Report button after providing the required input fields, such as duration. Application layers pass the request to the domain layer that owns the report and templates objects, with some parameters such as report ID, and so on. Reports get created using the template, and data is fetched from either the database or other sources. Then the application layer passes through all the parameters, including the report ID to the business layer. Here, a template needs to be fetched from the database or another source to generate the report based on the ID. This operation does not belong to either the report object or the template object. Therefore, a service object is used that performs this operation to retrieve the required template from the database. Aggregates Aggregate domain pattern is related to the object's life cycle. It defines ownership and boundaries which is crucial in Domain Driven Design When you reserve a table at your favorite restaurant online using an application, you don't need to worry about the internal system and process that takes place to book your reservation, including searching for available restaurants, then for available tables on the given date, time, and so on and so forth. Therefore, you can say that a reservation application is an aggregate of several other objects, and works as a root for all the other objects for a table reservation system. This root should be an entity that binds collections of objects together. It is also called the aggregate root. This root object does not pass any reference of inside objects to external worlds, and protects the changes performed within internal objects. We need to understand why aggregators are required. A domain model can contain large numbers of domain objects. The bigger the application functionalities and size and the more complex its design, the greater number of objects present. A relationship exists between these objects. 
Some may have a many-to-many relationship, a few may have a one-to-many relationship, and others may have a one-to-one relationship. These relationships are enforced by the model implementation in the code, or in the database that ensures that these relationships among the objects are kept intact. Relationships are not just unidirectional; they can also be bidirectional. They can also increase in complexity. The designer's job is to simplify these relationships in the model. Some relationships may exist in a real domain, but may not be required in the domain model. Designers need to ensure that such relationships do not exist in the domain model. Similarly, multiplicity can be reduced by these constraints. One constraint may do the job where many objects satisfy the relationship. It is also possible that a bidirectional relationship could be converted into a unidirectional relationship. No matter how much simplification you input, you may still end up with relationships in the model. These relationships need to be maintained in the code. When one object is removed, the code should remove all the references to this object from other places. For example, a record removal from one table needs to be addressed wherever it has references in the form of foreign keys and such, to keep the data consistent and maintain its integrity. Also, invariants (rules) need to be forced and maintained whenever data changes. Relationships, constraints, and invariants bring a complexity that requires an efficient handling in code. We find the solution by using the aggregate represented by the single entity known as the root, which is associated with the group of objects that maintains consistency with regards to data changes. This root is the only object that is accessible from outside, so this root element works as a boundary gate that separates the internal objects from the external world. Roots can refer to one or more inside objects, and these inside objects can have references to other inside objects that may or may not have relationships with the root. However, outside objects can also refer to the root, and not to any inside objects. An aggregate ensures data integrity and enforces the invariant. Outside objects cannot make any change to inside objects; they can only change the root. However, they can use the root to make a change inside the object by calling exposed operations. The root should pass the value of inside objects to outside objects if required. If an aggregate object is stored in the database, then the query should only return the aggregate object. Traversal associations should be used to return the object when it is internally linked to the aggregate root. These internal objects may also have references to other aggregates. An aggregate root entity holds its global identity, and holds local identities inside their entities. A simple example of an aggregate in the table booking system is the customer. Customers can be exposed to external objects, and their root object contains their internal object address and contact information. When requested, the value object of internal objects, such as address, can be passed to external objects: Repository In a domain model, at a given point in time, many domain objects may exist. Each object may have its own life-cycle, from the creation of objects to their removal or persistence. Whenever any domain operation needs a domain object, it should retrieve the reference of the requested object efficiently. 
It would be very difficult if you didn't maintain all of the available domain objects in a central object. A central object carries the references of all the objects, and is responsible for returning the requested object reference. This central object is known as the repository. The repository is a point that interacts with infrastructures such as the database or file system. A repository object is the part of the domain model that interacts with storage such as the database, external sources, and so on, to retrieve the persisted objects. When a request is received by the repository for an object's reference, it returns the existing object's reference. If the requested object does not exist in the repository, then it retrieves the object from storage. For example, if you need a customer, you would query the repository object to provide the customer with ID 31. The repository would provide the requested customer object if it is already available in the repository, and if not, it would query the persisted stores such as the database, fetch it, and provide its reference. The main advantage of using the repository is having a consistent way to retrieve objects where the requestor does not need to interact directly with the storage such as the database. A repository may query objects from various storage types, such as one or more databases, filesystems, or factory repositories, and so on. In such cases, a repository may have strategies that also point to different sources for different object types As you can see in the repository object flow diagram on the right, the repository interacts with the infrastructure layer, and this interface is part of the domain layer. The requestor may belong to a domain layer, or an application layer. The repository helps the system to manage the life cycle of domain objects. Factory A factory is required when a simple constructor is not enough to create the object. It helps to create complex objects, or an aggregate that involves the creation of other related objects. A factory is also a part of the life cycle of domain objects, as it is responsible for creating them. Factories and repositories are in some way related to each other, as both refer to domain objects. The factory refers to newly created objects, whereas the repository returns the already existing objects either from the memory, or from external storage. Let's see how control flows, by using a user creation process application. Let's say that a user signs up with a username user1. This user creation first interacts with the factory, which creates the name user1 and then caches it in the domain using the repository, which also stores it in the storage for persistence. When the same user logs in again, the call moves to the repository for a reference. This uses the storage to load the reference and pass it to the requestor. The requestor may then use this user1 object to book the table in a specified restaurant, and at a specified time. These values are passed as parameters, and a table booking record is created in storage using the repository:       The factory may use one of the object-oriented programming patterns, such as the factory or abstract factory pattern, for object creation. Modules Modules are the best way to separate related business objects. These are best suited to large projects where the size of domain objects is bigger. For the end user, it makes sense to divide the domain model into modules and set the relationship between these modules. 
Once you understand the modules and their relationships, you start to see the bigger picture of the domain model, which makes it easier to drill down further and understand the model. Modules also help you to write code that is highly cohesive and loosely coupled. Ubiquitous language can be used to name these modules. For the table booking system, we could have different modules, such as user management, restaurants and tables, analytics and reports, reviews, and so on.

This introduction to domain driven design should give you a strong foundation for using it when you build software. Its principles are useful - in particular, making sure you collaborate and use the same language as different stakeholders is one of domain driven design's most valuable contributions to the way we approach software development.


API Gateway and its Need

Packt
21 Feb 2018
9 min read
In this article by Umesh R Sharma, author of the book Practical Microservices, we will cover the API Gateway and its need, with simple and short examples. (For more resources related to this topic, see here.)

Dynamic websites show a lot on a single page, and there is a lot of information that needs to be shown on the page. The common success order summary page shows the cart details and the customer address. For this, the frontend has to fire different queries to the customer detail service and the order detail service. This is a very simple example of having multiple services on a single page. As a single microservice has to deal with only one concern, showing this much information results in many API calls for the same page. So, a website or mobile page can be very chatty in terms of displaying data on the same page.

Another problem is that, sometimes, a microservice talks over a protocol other than HTTP, such as Thrift, and so on. Outer consumers can't directly deal with a microservice in that protocol. Also, as a mobile screen is smaller than a web page, the data required by the mobile and desktop API calls is different. A developer would want to give less data to the mobile API, or have different versions of the API calls for mobile and desktop. So, you could face a problem such as this: each client calling different web services, the client having to keep track of each web service, and developers having to maintain backward compatibility because API URLs are embedded in clients such as the mobile app.

Why do we need the API Gateway?

All these preceding problems can be addressed with the API Gateway in place. The API Gateway acts as a proxy between the API consumer and the API servers. To address the first problem in that scenario, there will only be one call, such as /successOrderSummary, to the API Gateway. The API Gateway, on behalf of the consumer, calls the order and user detail services, then combines the results and serves them to the client. So basically, it acts as a facade for API calls, which may internally call many APIs. The API Gateway serves many purposes, some of which are as follows.

Authentication

API Gateways can take on the overhead of authenticating an API call from outside. After that, all the internal calls can skip the security check. If the request comes from inside the VPC, removing the check can decrease the network latency a bit and let the developer focus more on business logic than on security.

Different protocol

Sometimes, microservices can internally use different protocols to talk to each other; it can be Thrift, TCP, UDP, RMI, SOAP, and so on. For clients, there can be only one REST-based HTTP call. Clients hit the API Gateway with the HTTP protocol, and the API Gateway can make the internal calls in the required protocols and combine the results from all the web services in the end. It can respond to the client in the required protocol; in most cases, that protocol will be HTTP.

Load-balancing

The API Gateway can work as a load balancer to handle requests in the most efficient manner. It can keep track of the request load it has sent to different nodes of a particular service. A gateway should be intelligent enough to load balance between different nodes of a particular service. With NGINX Plus coming into the picture, NGINX can be a good candidate for the API Gateway. It has many of the features to address the problems that are usually handled by an API Gateway.
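As an illustration of the authentication point above, here is a minimal sketch of how such a check could live in the gateway itself, written as a Zuul pre-filter (Zuul is also the gateway used in the example later in this article; the X-Auth-Token header name and the isValid check are hypothetical placeholders, not part of the book's example):

```java
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import javax.servlet.http.HttpServletRequest;

// Rejects requests that lack a valid auth token before they reach any microservice.
public class AuthPreFilter extends ZuulFilter {

    @Override
    public String filterType() { return "pre"; }     // runs before routing

    @Override
    public int filterOrder() { return 1; }

    @Override
    public boolean shouldFilter() { return true; }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        HttpServletRequest request = ctx.getRequest();
        String token = request.getHeader("X-Auth-Token"); // hypothetical header name
        if (token == null || !isValid(token)) {
            ctx.setSendZuulResponse(false);  // do not forward to the backing service
            ctx.setResponseStatusCode(401);
        }
        return null;
    }

    private boolean isValid(String token) {
        // Placeholder: verify the token against your identity provider.
        return !token.isEmpty();
    }
}
```

With Spring Cloud Netflix Zuul, registering this filter as a Spring bean is typically enough for it to be applied to every routed request, so the downstream services never see unauthenticated traffic.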
Request dispatching (including service discovery)
One of the main features of the gateway is to reduce the chatter between the client and the microservices. If the client needs data from several services, the gateway can initiate the calls to those microservices in parallel. From the client's side there is only one hit; the gateway calls all the required services, waits for their results, combines them, and sends the response back to the client. Reactive microservice designs can help you achieve this. Working with service discovery adds many extra capabilities. The discovery data can tell the gateway which node of a service is the master and which is the slave; the same applies to databases, so that write requests go to the master and read requests go to a slave. That is the basic rule, but users can apply many more rules based on the metadata available to the API Gateway. The gateway can also record the typical response time of each node of a service instance and route higher-priority API calls to the fastest-responding node. Again, the rules you can define depend on the API Gateway you are using and how it is implemented.

Response transformation
Being the first and single point of entry for all API calls, the API Gateway knows which type of client is calling: a mobile client, a web client, or another external consumer. It can make the internal calls on the client's behalf and shape the data differently for each type of client, as per its needs and configuration.

Circuit breaker
To handle partial failure, the API Gateway uses a technique called the circuit breaker pattern. A failure in one service can cause cascading failures through all the service calls in the stack. The API Gateway can watch a failure threshold for each microservice; if a service crosses that threshold, the gateway marks the circuit for that API as open and stops making calls to it for a configured time. Hystrix (by Netflix) serves this purpose efficiently; by default, it only considers opening the circuit once a minimum volume of requests (20) has been seen in its rolling statistics window and the failure rate crosses a threshold. Developers can also provide a fallback for an open circuit, which can be a dummy response. Once the API starts giving the expected results again, the gateway marks the circuit as closed. (A minimal sketch of this open/closed cycle appears after the pros and cons list below.)

Pros and cons of the API Gateway
Using an API Gateway has its own pros and cons. The previous sections already described the advantages; summarized as points, the pros are:

Microservices can focus on business logic
Clients can get all the data in a single hit
Authentication, logging, and monitoring can be handled by the API Gateway
It gives the flexibility to use completely independent protocols between clients and microservices
It can give tailor-made results, as per the client's needs
It can handle partial failure

In addition to the preceding pros, there are also trade-offs to using this pattern. The cons are:

It can cause performance degradation, because a lot happens at the API Gateway
A discovery service has to be implemented along with it
It can become a single point of failure
Managing routing is an overhead of the pattern
It adds an additional network hop to every call
Overall, it increases the complexity of the system
Too much logic implemented in the gateway leads to another dependency problem

So both aspects should be considered before using the API Gateway; including it in the system increases the cost as well.
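The following is a minimal sketch of the open/closed cycle described in the circuit breaker section above. It is illustrative Python only: the failure threshold, cool-down period, and fallback are made-up values, and this is not how Hystrix or any particular gateway implements it internally.

# Illustrative only: the circuit breaker open/closed cycle reduced to a few lines.
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures    # failures tolerated before opening
        self.reset_after = reset_after      # seconds to wait before retrying
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                return fallback()           # circuit open: don't even try the service
            self.opened_at = None           # cool-down elapsed: try the service again
        try:
            result = func()
            self.failures = 0               # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()    # too many failures: open the circuit
            return fallback()

# A gateway would keep one breaker per downstream service, for example:
# breaker.call(lambda: fetch_json(url), fallback=lambda: {"status": "unavailable"})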
Before putting effort, cost, and management into this pattern, it is recommended that you analyze how much you can gain from it.

Example of API Gateway
In this example, we will show only a sample product page that fetches data from the product detail service. The example could be extended in many directions, but our focus here is simply to show how the API Gateway pattern works, so we will keep it small and simple. It uses Zuul from Netflix as the API Gateway. Spring provides an integration for Zuul, so we will build this example with Spring Boot. For the sample API Gateway implementation, we will use http://start.spring.io/ to generate the initial template of our code. Spring Initializr is a Spring project that helps beginners generate basic Spring Boot code: set a minimal configuration and hit the Generate Project button. To set more specific details for the project, click the Switch to the full version button to see all the configuration settings.

Let's create a controller in the same package as the main application class and put the following code in the file:

@RestController
public class ProductDetailController {

    @Resource
    ProductDetailService pdService;

    @RequestMapping(value = "/product/{id}")
    public ProductDetail getAllProduct(@PathVariable("id") String id) {
        return pdService.getProductDetailById(id);
    }
}

The preceding code assumes that the pdService bean interacts with a Spring Data repository for product details and returns the result for the requested product ID. Another assumption is that this service runs on port 10000. Just to make sure everything is running, a hit on a URL such as http://localhost:10000/product/1 should return some JSON as the response.

For the API Gateway, we will create another Spring Boot application with Zuul support. Zuul can be activated by simply adding the @EnableZuulProxy annotation. The following is the code to start a simple Zuul proxy:

@SpringBootApplication
@EnableZuulProxy
public class ApiGatewayExampleInSpring {
    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayExampleInSpring.class, args);
    }
}

Everything else is managed through configuration. In the application.properties file of the API Gateway, the content will be as follows:

zuul.routes.product.path=/product/**
zuul.routes.product.url=http://localhost:10000
ribbon.eureka.enabled=false
server.port=8080

With this configuration, we are defining a rule: for any request to a URL such as /product/xxx, pass the request on to http://localhost:10000. To the outside world, the URL is http://localhost:8080/product/1, which is internally forwarded to port 10000. If we defined a spring.application.name property as product in the product detail microservice, then we wouldn't need to define the URL path property here (zuul.routes.product.path=/product/**), as Zuul, by default, would map it to the /product URL. The gateway taken here as an example is not very intelligent, but Zuul is a very capable API Gateway: depending on the routes, filters, and caching defined in Zuul's properties, one can build a very powerful gateway.

Summary
In this article, you learned about the API Gateway, the need for it, and its pros and cons, with a code example.
Resources for Article:   Further resources on this subject: What are Microservices? [article] Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article]


Understanding Microservices

Packt
22 Jun 2017
19 min read
This article by Tarek Ziadé, author of the book Python Microservices Development explains the benefits and implementation of microservices with Python. While the microservices architecture looks more complicated than its monolithic counterpart, its advantages are multiple. It offers the following benefits. (For more resources related to this topic, see here.) Separation of concerns First of all, each microservice can be developed independently by a separate team. For instance, building a reservation service can be a full project on its own. The team in charge can make it in whatever programming language and database, as long as it has a well-documented HTTP API. That also means the evolution of the app is more under control than with monoliths. For example, if the payment system changes its underlying interactions with the bank, the impact is localized inside that service and the rest of the application stays stable and under control. This loose coupling improves a lot the overall project velocity as we're applying at the service level a similar philosophy than the single responsibility principle. The single responsibility principle was defined by Robert Martin to explain that a class should have only one reason to change - in other words, each class should be providing a single, well-defined feature. Applied to microservices, it means that we want to make sure that each microservice focuses on a single role. Smaller projects The second benefit is breaking the complexity of the project. When you are adding a feature to an application like the PDF reporting, even if you are doing it cleanly, you are making the base code bigger, more complicated and sometimes slower. Building that feature in a separate application avoids this problem, and makes it easier to write it with whatever tools you want. You can refactor it often and shorten your release cycles, and stay on the top of things. The growth of the application remains under your control. Dealing with a smaller project also reduces risks when improving the application: if a team wants to try out the latest programming language or framework, they can iterate quickly on a prototype that implements the same microservice API, try it out, and decide whether or not to stick with it. One real-life example in mind is the Firefox Sync storage microservice. There are currently some experiments to switch from the current Python+MySQL implementation to a Go based one that stores users data in standalone SQLite databases. That prototype is highly experimental, but since we have isolated the storage feature in a microservice with a well-defined HTTP API, it's easy enough to give it a try with a small subset of the user base. Scaling and deployment Last, having your application split into components makes it easier to scale depending on your constraints. Let's say you are starting to get a lot of customers that are booking hotels daily, and the PDF generation is starting to heat up the CPUs. You can deploy that specific microservice in some servers that have bigger CPUs. Another typical example is RAM-consuming microservices like the ones that are interacting with memory databases like Redis or Memcache. You could tweak your deployments consequently by deploying them on servers with less CPU and a lot more RAM. To summarize microservices benefits: A team can develop each microservice independently, and use whatever technological stack makes sense. They can define a custom release cycle. The tip of the iceberg is its language agnostic HTTP API. 
Developers break the application complexity into logical components. Each microservice focuses on doing one thing well. Since microservices are standalone applications, there's finer control over deployments, which makes scaling easier. Microservices architectures are good at solving a lot of the problems that may arise once your application starts to grow. However, we need to be aware of some of the new issues they also bring in practice.

Implementing microservices with Python
Python is an amazingly versatile language. As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server, to large object-oriented applications that run services for millions of users. According to a study conducted by Philip Guo in 2014, published on the Association for Computing Machinery (ACM) website, Python has surpassed Java in top U.S. universities and is the most popular language for learning computer science. This trend is also true in the software industry. Python now sits in the top 5 languages of the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even bigger in web development, since languages like C are rarely used as main languages to build web applications. However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it is still a language of choice for building microservices, and many major companies are happily using it. This section will give you some background on the different ways you can write microservices using Python, some insights on asynchronous versus synchronous programming, and conclude with some details on Python performance. It's composed of the following parts:

The WSGI standard
Greenlet & Gevent
Twisted & Tornado
asyncio
Language performance

The WSGI standard
What strikes most web developers starting with Python is how easy it is to get a web application up and running. The Python web community has created a standard, inspired by the Common Gateway Interface (CGI), called the Web Server Gateway Interface (WSGI), which greatly simplifies how you can write a Python application whose goal is to serve HTTP requests. When your code uses that standard, your project can be executed by standard web servers like Apache or NGINX, using WSGI extensions like uwsgi or mod_wsgi. Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library. You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than ten lines:

import json
import time

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]

Since its introduction, the WSGI protocol became an essential standard, and the Python web community widely adopted it. Developers wrote middlewares, which are functions you can hook before or after the WSGI application function itself, to do something within the environment. Some web frameworks were created specifically around that standard, like Bottle (http://bottlepy.org) - and soon enough, every framework out there could be used through WSGI in one way or another. The biggest problem with WSGI, though, is its synchronous nature.
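As a quick aside, a module like the one above can be tried locally with the reference WSGI server that ships in the standard library. This is just a sketch, and it assumes the application function above was saved in a file named timeservice.py:

# Sketch: serving the WSGI application above with the standard library's
# reference server. The module name timeservice is an assumption.
from wsgiref.simple_server import make_server
from timeservice import application

with make_server("", 8000, application) as server:
    print("Serving on http://localhost:8000 ...")
    server.serve_forever()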
The application function you see above is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every time you are calling the function, it will block until the response is ready. And writing microservices means your code will be waiting for responses from various network resources all the time. In other words, your application will idle and just block the client until everything is ready. That's an entirely okay behavior for HTTP APIs. We're not talking about building bidirectional applications like web socket based ones. But what happens when you have several incoming requests that are calling your application at the same time? WSGI servers will let you run a pool of threads to serve several requests concurrently. But you can't run thousands of them, and as soon as the pool is exhausted, the next request will be blocking even if your microservice is doing nothing but idling and waiting for backend services responses. That's one of the reasons why non-WSGI frameworks like Twisted, Tornado and in Javascript land Node.js became very successful - it's fully async. When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means you can accept new requests and start to treat them. That model dramatically reduces the idling time in your process. It can serve thousands of concurrent requests. Of course, that does not mean the application will return each single response faster. It just means one process can accept more concurrent requests and juggle between them as the data is getting ready to be sent back. There's no simple way with the WSGI standard to introduce something similar, and the community has debated for years to come up with a consensus - and failed. The odds are that the community will eventually drop the WSGI standard for something else. In the meantime, building microservices with synchronous frameworks is still possible and completely fine if your deployments take into account the one request == one thread limitation of the WSGI standard. There's, however, one trick to boost synchronous web applications: greenlets. Greenlet & Gevent The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism. Asynchronous applications are using an event loop that pauses and resumes execution contexts when an event is triggered - only one context is active, and they take turns. Explicit instruction in the code will tell the event loop that this is where it can pause the execution. When that occurs, the process will look for some other pending work to resume. Eventually, the process will come back to your function and continue it where it stopped - moving from an execution context to another is called switching. The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on the Stackless project, a particular CPython implementation, and provides greenlets. Greenlets are pseudo-threads that are very cheap to instantiate, unlike real threads, and that can be used to call python functions. Within those functions, you can switch and give back the control to another function. The switching is done with an event loop and allows you to write an asynchronous application using a Thread-like interface paradigm. 
Here's an example from the Greenlet documentation def test1(x, y): z = gr2.switch(x+y) print z def test2(u): print u gr1.switch(42) gr1 = greenlet(test1) gr2 = greenlet(test2) gr1.switch("hello", " world") The two greenlets are explicitly switching from one to the other. For building microservices based on the WSGI standard, if the underlying code was using greenlets we could accept several concurrent requests and just switch from one to another when we know a call is going to block the request - like performing a SQL query. Although, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent can become very useful. The Gevent project (http://www.gevent.org/) is built on the top of Greenlet and offers among other things an implicit and automatic way of switching between greenlets. It provides a cooperative version of the socket module that will use greenlets to automatically pause and resume the execution when some data is made available in the socket. There's even a monkey patch feature that will automatically replace the standard lib socket with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets - with just one extra line. from gevent import monkey; monkey.patch_all() def application(environ, start_response): headers = [('Content-type', 'application/json')] start_response('200 OK', headers) # ...do something with sockets here... return result This implicit magic comes with a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching Gevent is doing. Some packages from the community will continue to block or even have unexpected results because of this. In particular, if they use C extensions and bypass some of the features of the standard library Gevent patched. But for most cases, it works well. Projects that are playing well with Gevent are dubbed "green," and when a library is not functioning well, and the community asks its authors to "make it green," it usually happens. That's what was used to scale the Firefox Sync service at Mozilla for instance. Twisted and Tornado If you are building microservices where increasing the number of concurrent requests you can hold is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/). Twisted has been around for ages. To implement the same microservices you need to write a slightly more verbose code: import time from twisted.web import server, resource from twisted.internet import reactor, endpoints class Simple(resource.Resource): isLeaf = True def render_GET(self, request): request.responseHeaders.addRawHeader(b"content-type", b"application/json") return bytes(json.dumps({'time': time.time()}), 'utf8') site = server.Site(Simple()) endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080) endpoint.listen(site) reactor.run() While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices: You need to implement each endpoint in your microservice with a class derived from a Resource class, and that implements each supported method. For a few simple APIs, it adds a lot of boilerplate code. Twisted code can be hard to understand & debug due to its asynchronous nature. 
It's easy to fall into callback hell when you're chaining too many functions that are getting triggered successively one after the other - and the code can get messy Properly testing your Twisted application is hard, and you have to use Twisted-specific unit testing model. Tornado is based on a similar model but is doing a better job in some areas. It has a lighter routing system and does everything possible to make the code closer to plain Python. Tornado is also using a callback model, so debugging can be hard. But both frameworks are working hard at bridging the gap to rely on the new async features introduced in Python 3. asyncio When Guido van Rossum started to work on adding async features in Python 3, part of the community pushed for a Gevent-like solution because it made a lot of sense to write applications in a synchronous, sequential fashion - rather than having to add explicit callbacks like in Tornado or Twisted. But Guido picked the explicit technique and experimented in a project called Tulip that Twisted inspired. Eventually, asyncio was born out of that side project and added into Python. In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming. By doing this, Python did a great job at avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications. And beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications, see https://docs.python.org/3/library/asyncio.html. Python is now as expressive as languages like Lua to create coroutine-based applications, and there are now a few emerging frameworks that have embraced those features and will only work with Python 3.5+ to benefit from this. KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice, fully asynchronous, with it would simply be these few elegant lines. from aiohttp import web import time async def handle(request): return web.json_response({'time': time.time()}) if __name__ == '__main__': app = web.Application() app.router.add_get('/', handle) web.run_app(app) In this small example, we're very close to how we would implement a synchronous app. The only hint we're async is the async keyword marking the handle function as being a coroutine. And that's what's going to be used at every level of an async Python app going forward. Here's another example using aiopg - a Postgresql lib for asyncio. From the project documentation: import asyncio import aiopg dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1' async def go(): pool = await aiopg.create_pool(dsn) async with pool.acquire() as conn: async with conn.cursor() as cur: await cur.execute("SELECT 1") ret = [] async for row in cur: ret.append(row) assert ret == [(1,)] loop = asyncio.get_event_loop() loop.run_until_complete(go()) With a few async and await prefixes, the function that's performing a SQL query and send back the result looks a lot like a synchronous function. 
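The same coroutine style applies to outbound HTTP calls, which is where a microservice spends most of its time. The following is an illustrative sketch only: an aiohttp handler that awaits another, hypothetical service before responding. The inventory URL and the JSON fields are invented for the example, and a production service would typically reuse a single client session rather than creating one per request.

# Sketch: an aiohttp handler awaiting another microservice before responding.
# The downstream URL and JSON fields are hypothetical.
import aiohttp
from aiohttp import web

INVENTORY_URL = "http://inventory.internal/stock/{sku}"   # assumed internal endpoint

async def product_status(request):
    sku = request.match_info["sku"]
    async with aiohttp.ClientSession() as session:
        async with session.get(INVENTORY_URL.format(sku=sku)) as resp:
            stock = await resp.json()
    # While this coroutine awaits the network, the event loop serves other requests.
    return web.json_response({"sku": sku, "in_stock": stock.get("quantity", 0) > 0})

app = web.Application()
app.router.add_get("/products/{sku}", product_status)

if __name__ == "__main__":
    web.run_app(app)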
But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with particular asynchronous implementations for each feature you need. If you need a library that is not asynchronous, using it from your asynchronous code means you will have to go through some extra and challenging work to avoid blocking the event loop. If your microservices are dealing with a limited number of resources, it could be manageable. But it's probably a safer bet at this point (2017) to stick with a synchronous framework that's been around for a while rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages, and wait until the asyncio ecosystem gets more sophisticated. And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice, or Flask.

Language performance
In the previous sections, we've been through the two different ways to write microservices - asynchronous versus synchronous - and whatever technique you are using, the speed of Python directly impacts the performance of your microservice. Of course, everyone knows Python is slower than Java or Go - but execution speed is not always the top priority. A microservice is often a thin layer of code that spends most of its life waiting for network responses from other services. Its core speed is usually less important than how long your SQL queries take to return from your Postgres server, because the latter will represent most of the time spent building the response. But wanting an application that's as fast as possible is legitimate. One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can hurt performance, because a multi-threaded application cannot run Python code on several cores at once. The GIL has good reasons to exist. It protects non-thread-safe parts of the CPython interpreter, and it exists in other languages like Ruby. All attempts to remove it so far have failed to produce a faster CPython implementation. Larry Hastings is working on a GIL-free CPython project called Gilectomy - https://github.com/larryhastings/gilectomy - whose minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of today (2017), this implementation is still slower than CPython. But it's interesting to follow this work and see if it reaches speed parity one day. That would make a GIL-free CPython very appealing. For microservices, besides preventing the usage of multiple cores in the same process, the GIL will slightly degrade performance under high load, because of the system call overhead introduced by the mutex. However, all the scrutiny around the GIL has had one beneficial impact: some work has been done in the past years to reduce its contention in the interpreter, and in some areas, Python performance has improved a lot. But bear in mind that even if the core team removes the GIL, Python is an interpreted language and the produced code will never be very efficient at execution time. Python provides the dis module if you are interested in seeing how the interpreter decomposes a function. In the example below, the interpreter will decompose a simple function that yields incremented values from a sequence in no less than 29 steps!

>>> def myfunc(data):
...     for value in data:
...         yield value + 1
...
>>> import dis
>>> dis.dis(myfunc)
  2           0 SETUP_LOOP              23 (to 26)
              3 LOAD_FAST                0 (data)
              6 GET_ITER
        >>    7 FOR_ITER                15 (to 25)
             10 STORE_FAST               1 (value)

  3          13 LOAD_FAST                1 (value)
             16 LOAD_CONST               1 (1)
             19 BINARY_ADD
             20 YIELD_VALUE
             21 POP_TOP
             22 JUMP_ABSOLUTE            7
        >>   25 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             29 RETURN_VALUE

A similar function written in a statically compiled language would dramatically reduce the number of operations required to produce the same result. There are ways to speed up Python execution, though. One is to write part of your code as compiled code by building C extensions, or by using a static extension of the language like Cython (http://cython.org/) - but that makes your code more complicated. Another solution, which is the most promising one, is simply to run your application with the PyPy interpreter (http://pypy.org/). PyPy implements a Just-In-Time (JIT) compiler. This compiler replaces, at run time, pieces of Python with machine code that can be used directly by the CPU. The whole trick for the JIT is to detect, in real time and ahead of the execution, when and how to do it. Even if PyPy is always a few Python versions behind CPython, it has reached a point where you can use it in production, and its performance can be quite amazing. In one of our projects at Mozilla that needs fast execution, the PyPy version was almost as fast as the Go version, and we decided to use Python there instead. The PyPy Speed Center website is a great place to look at how PyPy compares to CPython - http://speed.pypy.org/ However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem, in particular if other developers maintain some of the extensions you are using. But if you are building your microservice with a standard set of libraries, the chances are that it will work out of the box with the PyPy interpreter, so that's worth a try. In any case, for most projects, the benefits of Python and its ecosystem largely surpass the performance issues described in this section, because the overhead in a microservice is rarely a problem.

Summary
In this article, we saw that Python is considered to be one of the best languages to write web applications, and therefore microservices - for the same reasons it's a language of choice in other areas, and also because it provides tons of mature frameworks and packages to do the work.

Resources for Article: Further resources on this subject: Inbuilt Data Types in Python [article] Getting Started with Python Packages [article] Layout Management for Python GUI [article]

What are Microservices?

Packt
20 Jun 2017
12 min read
In this article written by Gaurav Kumar Aroraa, Lalit Kale, Kanwar Manish, authors of the book Building Microservices with .NET Core, we will start with a brief introduction. Then, we will define its predecessors: monolithic architecture and service-oriented architecture (SOA). After this, we will see how microservices fare against both SOA and the monolithic architecture. We will then compare the advantages and disadvantages of each one of these architectural styles. This will enable us to identify the right scenario for these styles. We will understand the problems that arise from having a layered monolithic architecture. We will discuss the solutions available to these problems in the monolithic world. At the end, we will be able to break down a monolithic application into a microservice architecture. We will cover the following topics in this article: Origin of microservices Discussing microservices (For more resources related to this topic, see here.) Origin of microservices The term microservices was used for the first time in mid-2011 at a workshop of software architects. In March 2012, James Lewis presented some of his ideas about microservices. By the end of 2013, various groups from the IT industry started having discussions on microservices, and by 2014, it had become popular enough to be considered a serious contender for large enterprises. There is no official introduction available for microservices. The understanding of the term is purely based on the use cases and discussions held in the past. We will discuss this in detail, but before that, let's check out the definition of microservices as per Wikipedia (https://en.wikipedia.org/wiki/Microservices), which sums it up as: Microservices is a specialization of and implementation approach for SOA used to build flexible, independently deployable software systems. In 2014, James Lewis and Martin Fowler came together and provided a few real-world examples and presented microservices (refer to http://martinfowler.com/microservices/) in their own words and further detailed it as follows: The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies. It is very important that you see all the attributes James and Martin defined here. They defined it as an architectural style that developers could utilize to develop a single application with the business logic spread across a bunch of small services, each having their own persistent storage functionality. Also, note its attributes: it can be independently deployable, can run in its own process, is a lightweight communication mechanism, and can be written in different programming languages. We want to emphasize this specific definition since it is the crux of the whole concept. And as we move along, it will come together by the time we finish this book. Discussing microservices Until now, we have gone through a few definitions of microservices; now, let's discuss microservices in detail. In short, a microservice architecture removes most of the drawbacks of SOA architectures.  
Slicing your application into a number of services is neither SOA nor microservices. However, combining service design and best practices from the SOA world along with a few emerging practices, such as isolated deployment, semantic versioning, providing lightweight services, and service discovery in polyglot programming, is microservices. We implement microservices to satisfy business features and implement them with reduced time to market and greater flexibility. Before we move on to understand the architecture, let's discuss the two important architectures that have led to its existence: The monolithic architecture style SOA Most of us would be aware of the scenario where during the life cycle of an enterprise application development, a suitable architectural style is decided. Then, at various stages, the initial pattern is further improved and adapted with changes that cater to various challenges, such as deployment complexity, large code base, and scalability issues. This is exactly how the monolithic architecture style evolved into SOA, further leading up to microservices. Monolithic architecture The monolithic architectural style is a traditional architecture type and has been widely used in the industry. The term "monolithic" is not new and is borrowed from the Unix world. In Unix, most of the commands exist as a standalone program whose functionality is not dependent on any other program. As seen in the succeeding image, we can have different components in the application such as: User interface: This handles all of the user interaction while responding with HTML or JSON or any other preferred data interchange format (in the case of web services). Business logic: All the business rules applied to the input being received in the form of user input, events, and database exist here. Database access: This houses the complete functionality for accessing the database for the purpose of querying and persisting objects. A widely accepted rule is that it is utilized through business modules and never directly through user-facing components. Software built using this architecture is self-contained. We can imagine a single .NET assembly that contains various components, as described in the following image: As the software is self-contained here, its components are interconnected and interdependent. Even a simple code change in one of the modules may break a major functionality in other modules. This would result in a scenario where we'd need to test the whole application. With the business depending critically on its enterprise application frameworks, this amount of time could prove to be very critical. Having all the components tightly coupled poses another challenge: whenever we execute or compile such software, all the components should be available or the build will fail; refer to the preceding image that represents a monolithic architecture and is a self-contained or a single .NET assembly project. However, monolithic architectures might also have multiple assemblies. This means that even though a business layer (assembly, data access layer assembly, and so on) is separated, at run time, all of them will come together and run as one process.  A user interface depends on other components' direct sale and inventory in a manner similar to all other components that depend upon each other. In this scenario, we will not be able to execute this project in the absence of any one of these components. 
The process of upgrading any one of these components will be more complex as we may have to consider other components that require code changes too. This results in more development time than required for the actual change. Deploying such an application will become another challenge. During deployment, we will have to make sure that each and every component is deployed properly; otherwise, we may end up facing a lot of issues in our production environments. If we develop an application using the monolithic architecture style, as discussed previously, we might face the following challenges: Large code base: This is a scenario where the code lines outnumber the comments by a great margin. As components are interconnected, we will have to bear with a repetitive code base. Too many business modules: This is in regard to modules within the same system. Code base complexity: This results in a higher chance of code breaking due to the fix required in other modules or services. Complex code deployment: You may come across minor changes that would require whole system deployment. One module failure affecting the whole system: This is in regard to modules that depend on each other. Scalability: This is required for the entire system and not just the modules in it. Intermodule dependency: This is due to tight coupling. Spiraling development time: This is due to code complexity and interdependency. Inability to easily adapt to a new technology: In this case, the entire system would need to be upgraded. As discussed earlier, if we want to reduce development time, ease of deployment, and improve maintainability of software for enterprise applications, we should avoid the traditional or monolithic architecture. Service-oriented architecture In the previous section, we discussed the monolithic architecture and its limitations. We also discussed why it does not fit into our enterprise application requirements. To overcome these issues, we should go with some modular approach where we can separate the components such that they should come out of the self-contained or single .NET assembly. The main difference between SOA & monolithic is not one or multiple assembly. But as the service in SOA runs as separate process, SOA scales better compared to monolithic. Let's discuss the modular architecture, that is, SOA. This is a famous architectural style using which the enterprise applications are designed with a collection of services as its base. These services may be RESTful or ASMX Web services. To understand SOA in more detail, let's discuss "service" first. What is service? Service, in this case, is an essential concept of SOA. It can be a piece of code, program, or software that provides some functionality to other system components. This piece of code can interact directly with the database or indirectly through another service. Furthermore, it can be consumed by clients directly, where the client may either be a website, desktop app, mobile app, or any other device app. Refer to the following diagram: Service refers to a type of functionality exposed for consumption by other systems (generally referred to as clients/client applications). As mentioned earlier, it can be represented by a piece of code, program, or software. Such services are exposed over the HTTP transport protocol as a general practice. However, the HTTP protocol is not a limiting factor, and a protocol can be picked as deemed fit for the scenario. 
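To ground the idea of a service as a piece of functionality exposed over HTTP for other systems to consume, here is a deliberately tiny illustration written with the Python standard library. The book's own examples use .NET, and the endpoint and sample catalog data below are invented for this sketch; note that every response is computed purely from the request parameters, a property that matters again when statelessness is discussed below.

# Illustrative sketch: a tiny "product detail" service exposed over HTTP.
# Any client (web, desktop, mobile) or another service can consume it.
# The endpoint and catalog data are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

CATALOG = {"1": {"name": "Laptop", "price": 799}, "2": {"name": "Phone", "price": 399}}

class ProductHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        product_id = params.get("id", [""])[0]
        product = CATALOG.get(product_id)
        body = json.dumps(product if product else {"error": "not found"}).encode("utf8")
        self.send_response(200 if product else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients would call, for example, GET http://localhost:9000/products?id=1
    HTTPServer(("", 9000), ProductHandler).serve_forever()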
In the following image, Service – direct selling is directly interacting with Database, and three different clients, namely Web, Desktop, and Mobile, are consuming the service. On the other hand, we have clients consuming Service – partner selling, which is interacting with Service – channel partners for database access. A product selling service is a set of services that interacts with client applications and provides database access directly or through another service, in this case, Service – Channel partner.  In the case of Service – direct selling, shown in the preceding example, it is providing some functionality to a Web Store, a desktop application, and a mobile application. This service is further interacting with the database for various tasks, namely fetching data, persisting data, and so on. Normally, services interact with other systems via some communication channel, generally the HTTP protocol. These services may or may not be deployed on the same or single servers. In the preceding image, we have projected an SOA example scenario. There are many fine points to note here, so let's get started. Firstly, our services can be spread across different physical machines. Here, Service-direct selling is hosted on two separate machines. It is a possible scenario that instead of the entire business functionality, only a part of it will reside on Server 1 and the remaining on Server 2. Similarly, Service – partner selling appears to be having the same arrangement on Server 3 and Server 4. However, it doesn't stop Service – channel partners being hosted as a complete set on both the servers: Server 5 and Server 6. A system that uses a service or multiple services in a fashion mentioned in the preceding figure is called an SOA. We will discuss SOA in detail in the following sections. Let's recall the monolithic architecture. In this case, we did not use it because it restricts code reusability; it is a self-contained assembly, and all the components are interconnected and interdependent. For deployment, in this case, we will have to deploy our complete project after we select the SOA (refer to preceding image and subsequent discussion). Now, because of the use of this architectural style, we have the benefit of code reusability and easy deployment. Let's examine this in the wake of the preceding figure: Reusability: Multiple clients can consume the service. The service can also be simultaneously consumed by other services. For example, OrderService is consumed by web and mobile clients. Now, OrderService can also be used by the Reporting Dashboard UI. Stateless: Services do not persist any state between requests from the client, that is, the service doesn't know, nor care, that the subsequent request has come from the client that has/hasn't made the previous request. Contract-based: Interfaces make it technology-agnostic on both sides of implementation and consumption. It also serves to make it immune to the code updates in the underlying functionality. Scalability: A system can be scaled up; SOA can be individually clustered with appropriate load balancing. Upgradation: It is very easy to roll out new functionalities or introduce new versions of the existing functionality. The system doesn't stop you from keeping multiple versions of the same business functionality. Summary In this article, we discussed what the microservice architectural style is in detail, its history, and how it differs from its predecessors: monolithic and SOA. 
We further defined the various challenges that monolithic faces when dealing with large systems. Scalability and reusability are some definite advantages that SOA provides over monolithic. We also discussed the limitations of the monolithic architecture, including scaling problems, by implementing a real-life monolithic application. The microservice architecture style resolves all these issues by reducing code interdependency and isolating the dataset size that any one of the microservices works upon. We utilized dependency injection and database refactoring for this. We further explored automation, CI, and deployment. These easily allow the development team to let the business sponsor choose what industry trends to respond to first. This results in cost benefits, better business response, timely technology adoption, effective scaling, and removal of human dependency. Resources for Article: Further resources on this subject: Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article] Microservices – Brave New World [article]


Hands on with Service Fabric

Packt
06 Apr 2017
12 min read
In this article by Rahul Rai and Namit Tanasseri, authors of the book Microservices with Azure, explains that Service Fabric as a platform supports multiple programming models. Each of which is best suited for specific scenarios. Each programming model offers different levels of integration with the underlying management framework. Better integration leads to more automation and lesser overheads. Picking the right programming model for your application or services is the key to efficiently utilize the capabilities of Service Fabric as a hosting platform. Let's take a deeper look into these programming models. (For more resources related to this topic, see here.) To start with, let's look at the least integrated hosting option: Guest Executables. Native windows applications or application code using Node.js or Java can be hosted on Service Fabric as a guest executable. These executables can be packaged and pushed to a Service Fabric cluster like any other services. As the cluster manager has minimal knowledge about the executable, features like custom health monitoring, load reporting, state store and endpoint registration cannot be leveraged by the hosted application. However, from a deployment standpoint, a guest executable is treated like any other service. This means that for a guest executable, Service Fabric cluster manager takes care of high availability, application lifecycle management, rolling updates, automatic failover, high density deployment and load balancing. As an orchestration service, Service Fabric is responsible for deploying and activating an application or application services within a cluster. It is also capable of deploying services within a container image. This programming model is addressed as Guest Containers. The concept of containers is best explained as an implementation of operating system level virtualization. They are encapsulated deployable components running on isolated process boundaries sharing the same kernel. Deployed applications and their runtime dependencies are bundles within the container with an isolated view of all operating system constructs. This makes containers highly portable and secure. Guest container programming model is usually chosen when this level of isolation is required for the application. As containers don't have to boot an operating system, they have fast boot up time and are comparatively small in size. A prime benefit of using Service Fabric as a platform is the fact that it supports heterogeneous operating environments. Service Fabric supports two types of containers to be deployed as guest containers: Docker containers on Linux and Windows server containers. Container images for Docker containers are stored in Docker Hub and Docker APIs are used to create and manage the containers deployed on Linux kernel. Service Fabric supports two different types of containers in Windows Server 2016 with different levels of isolation. They are: Windows Server containers and Windows Hyper-V containers Windows Server containers are similar to Docker containers in terms of the isolation they provide. Windows Hyper-V containers offer higher degree of isolation and security by not sharing the operating system kernel across instances. These are ideally used when a higher level of security isolation is required such as systems requiring hostile multitenant hosts. The following figure illustrates the different isolation levels achieved by using these containers. 
Container isolation levels Service Fabric application model treats containers as an application host which can in turn host service replicas. There are three ways of utilizing containers within a Service Fabric application mode. Existing applications like Node.js, JavaScript application of other executables can be hosted within a container and deployed on Service Fabric as a Guest Container. A Guest Container is treated similar to a Guest Executable by Service Fabric runtime. The second scenario supports deploying stateless services inside a container hosted on Service Fabric. Stateless services using Reliable Services and Reliable actors can be deployed within a container. The third option is to deploy stateful services in containers hosted on Service Fabric. This model also supports Reliable Services and Reliable Actors. Service Fabric offers several features to manage containerized Microservices. These include container deployment and activation, resource governance, repository authentication, port mapping, container discovery and communication and ability to set environment variables. While containers offer a good level of isolation it is still heavy in terms of deployment footprint. Service Fabric offers a simpler, powerful programming model to develop your services which they call Reliable Services. Reliable services let you develop stateful and stateless services which can be directly deployed on Service Fabric clusters. For stateful services, the state can be stored close to the compute by using Reliable Collections. High availability of the state store and replication of the state is taken care by the Service Fabric cluster management services. This contributes substantially to the performance of the system by improving the latency of data access. Reliable services come with a built-in pluggable communication model which supports HTTP with Web API, WebSockets and custom TCP protocols out of the box. A Reliable service is addressed as stateless if it does not maintain any state within it or if the scope of the state stored is limited to a service call and is entirely disposable. This means that a stateless service does not require to persist, synchronize or replicate state. A good example for this service is a weather service like MSN weather service. A weather service can be queried to retrieve weather conditions associated with a specific geographical location. The response is totally based on the parameters supplied to the service. This service does not store any state. Although stateless services are simpler to implement, most of the services in real life are not stateless. They either store state in an external state store or an internal one. Web front end hosting APIs or web applications are good use cases to be hosted as stateless services. A stateful service persists states. The outcome of a service call made to a stateful service is usually influenced by the state persisted by the service. A service exposed by a bank to return the balance on an account is a good example for a stateful service. The state may be stored in an external data store such as Azure SQL Database, Azure Blobs or Azure Table store. Most services prefer to store the state externally considering the challenges around reliability, availability, scalability and consistency of the data store. With Service Fabric, state can be stored close to the compute by using reliable collections. To makes things more lightweight, Service Fabric also offers a programming model based on Virtual actor pattern. 
This programming model is called Reliable Actors. The Reliable Actors programming model is built on top of Reliable Services. This guarantees the scalability and reliability of the services. An Actor can be defined as an isolated, independent unit of compute and state with single-threaded execution. Actors can be created, managed and disposed independent of each other. Large number of actors can coexist and execute at a time. Service Fabric Reliable Actors are a good fit for systems which are highly distributed and dynamic by nature. Every actor is defined as an instance of an actor type; the same way an object is an instance of a class. Each actor is uniquely identified by an actor ID. The lifetime of Service Fabric Actors is not tied to their in-memory state. As a result, Actors are automatically created the first time a request for them is made. Reliable Actor's garbage collector takes care of disposing unused Actors in memory. Now that we understand the programming models, let's take a look at how the services deployed on Service Fabric are discovered and how the communication between services takes place. Service Fabric discovery and communication An application built on top of Microservices is usually composed of multiple services, each of which runs multiple replicas. Each service is specialized in a specific task. To achieve an end to end business use case, multiple services will need to be stitched together. This requires services to communicate to each other. A simple example would be web front end service communicating with the middle tier services which in turn connects to the back end services to handle a single user request. Some of these middle tier services can also be invoked by external applications. Services deployed on Service Fabric are distributed across multiple nodes in a cluster of virtual machines. The services can move across dynamically. This distribution of services can wither be triggered by a manual action of be result of Service Fabric cluster manager re-balancing services to achieve optimal resource utilization. This makes communication a challenge as services are not tied to a particular machine. Let's understand how Service Fabric solved this challenge for its consumers. Service protocols Service Fabric, as a hosting platform for Microservices does not interfere in the implementation of the service. On top of this, it also lets services decide on the communication channels they want to open. These channels are addressed as service endpoints. During service initiation, Service Fabric provides the opportunity for the services to set up the endpoints for incoming request on any protocol or communication stack. The endpoints are defined according to common industry standards, that is IP:Port. It is possible that multiple service instances share a single host process. In which case, they either have to use different ports or a port sharing mechanism. This will ensure that every service instance is uniquely addressable. Service endpoints Service discovery Service Fabric can rebalance services deployed on a cluster as a part of orchestration activities. This can be caused by resource balancing activities, failovers, upgrades, scale outs or scale ins. This will result in change in service endpoint addresses as the services move across different virtual machines. Service distribution The Service Fabric Naming Service is responsible for abstracting this complexity from the consuming service or application. 
Naming service takes care of service discovery and resolution. All service instances in Services Fabric are identified by a unique URL like fabric:/MyMicroServiceApp/AppService1. This name stays constant across the lifetime of the service although the endpoint addresses which physically host the service may change. Internally, Service Fabric manages a map between the service names and the physical location where the service is hosted. This is similar to the DNS service which is used to resolve Website URLs to IP addresses. The following figure illustrates the name resolution process for a service hosted on Service Fabric: Name resolution Connections from applications external to Service Fabric Service communications to or between services hosted in Service Fabric can be categorized as internal or external. Internal communication among services hosted on Service Fabric is easily achieved using the Naming Service. External communication, originated from an application or a user outside the boundaries of Service Fabric will need some extra work. To understand how this works, let's dive deeper in to the logical network layout of a typical Service Fabric cluster. Service Fabric cluster is always placed behind an Azure Load Balancer. The Load Balancer acts like a gateway to all traffic which needs to pass to the Service Fabric cluster. The Load Balancer is aware of every post open on every node of a cluster. When a request hits the Load Balancer, it identifies the port the request is looking for and randomly routes the request to one of the nodes which has the requested port open. The Load Balancer is not aware of the services running on the nodes or the ports associated with the services. The following figure illustrates request routing in action. Request routing Configuring ports and protocols The protocol and the ports to be opened by a Service Fabric cluster can be easily configured through the portal. Let's take an example to understand the configuration in detail. If we need a web application to be hosted on a Service Fabric cluster which should have port 80 opened on HTTP to accept incoming traffic, the following steps should be performed. Configuring service manifest Once a service listening to port 80 is authored, we need to configure port 80 in the service manifest to open a listener in the service. This can be done by editing the Service Manifest.xml. <Resources> <Endpoints> <Endpoint Name="WebEndpoint" Protocol="http" Port="80" /> </Endpoints> </Resources> Configuring custom end point On the Service Fabric cluster, configure port 80 as a custom endpoint. This can be easily done through the Azure Management portal. Configuring custom port Configure Azure Load Balancer Once the cluster is configured and created, the Azure Load Balancer can be instructed to forward the traffic to port 80. If the Service Fabric cluster is created through the portal, this step is automatically taken care for every port which is configured on the cluster configuration. Configuring Azure Load Balancer Configure health check Azure Load Balancer probes the ports on the nodes for their availability to ensure reliability of the service. The probes can be configured on the Azure portal. This is an optional step as a default probe configuration is applied for each endpoint when a cluster is created. Configuring probe Built-in Communication API Service Fabric offers many built-in communication options to support inter service communications. Service Remoting is one of them. 
This option allows strongly typed remote procedure calls between Reliable Services and Reliable Actors. It is very easy to set up and operate, as Service Remoting handles resolution of service addresses, connection, retry, and error handling. Service Fabric also supports HTTP for language-agnostic communication. The Service Fabric SDK exposes the ICommunicationClient and ServicePartitionClient classes for service resolution, HTTP connections, and retry loops. WCF is also supported by Service Fabric as a communication channel to enable legacy workloads to be hosted on it. The SDK exposes WcfCommunicationListener for the server side and the WcfCommunicationClient and ServicePartitionClient classes for the client to ease programming hurdles.

article-image-microservices-and-service-oriented-architecture
Packt
09 Mar 2017
6 min read
Save for later

Microservices and Service Oriented Architecture

Packt
09 Mar 2017
6 min read
Microservices are an architecture style and an approach to software development that satisfies modern business demands. They are not a new invention as such; they are instead an evolution of previous architecture styles. Many organizations today use them - they can improve organizational agility, speed of delivery, and ability to scale. Microservices give you a way to develop more physically separated, modular applications. This tutorial has been taken from Spring 5.0 Microservices - Second Edition. Microservices are similar to conventional service-oriented architectures. In this article, we will see how microservices are related to SOA. The emergence of microservices Many organizations, such as Netflix, Amazon, and eBay, successfully used what is known as the 'divide and conquer' technique to functionally partition their monolithic applications into smaller atomic units, each of which performs a single function - a 'service'. These organizations solved a number of prevailing issues they were experiencing with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications. Later, evangelists termed this pattern the microservices architecture. Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn back in 2005. Hexagonal Architecture, or the Hexagonal pattern, is also known as the Ports and Adapters pattern. Cockburn defined microservices as: "...an architectural style or an approach for building IT systems as a set of business capabilities that are autonomous, self contained, and loosely coupled." The following diagram depicts a traditional N-tier application architecture with a presentation layer, a business layer, and a database layer: Modules A, B, and C represent three different business capabilities. The layers in the diagram represent a separation of architectural concerns, and each layer holds the parts of all three business capabilities pertaining to that layer: the presentation layer has the web components of all three modules, the business layer has their business components, and the database hosts their tables. In most cases, the layers can be physically distributed, whereas the modules within a layer are hardwired together. Let's now examine a microservice-based architecture: As we can see in the preceding diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice, and each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned toward business capabilities. Because of this, changes to one microservice do not impact the others. There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro. As microservices are more closely aligned with business capabilities and have independently manageable lifecycles, they are an ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two facets of microservices. How do microservices compare to Service Oriented Architectures? One of the common questions that arises when dealing with the microservices architecture is how it differs from SOA. SOA and microservices follow similar concepts. 
Earlier in this article, we saw that microservices evolved from SOA, and that many service characteristics are common to both approaches. However, are they the same or different? As microservices evolved from SOA, many characteristics of microservices are similar to those of SOA. Let's first examine the definition of SOA. The Open Group definition of SOA is as follows: "SOA is an architectural style that supports service-orientation. Service-orientation is a way of thinking in terms of services and service-based development and the outcomes of services. [A service] is self-contained, may be composed of other services, and is a 'black box' to consumers of the service." You have learned similar aspects in microservices as well. So, in what way are microservices different? The answer is: it depends. The answer to the previous question could be yes or no, depending upon the organization and its adoption of SOA. SOA is a broader term, and different organizations approached SOA differently to solve different organizational problems. The difference between microservices and SOA lies in the way an organization has approached SOA. In order to get clarity, a few cases will be examined here. Service-oriented integration Service-oriented integration refers to a service-based integration approach used by many organizations: Many organizations would have used SOA primarily to solve their integration complexities, also known as integration spaghetti. Generally, this is termed Service Oriented Integration (SOI). In such cases, applications communicate with each other through a common integration layer using standard protocols and message formats, such as SOAP/XML-based web services over HTTP or Java Message Service (JMS). These types of organizations focus on Enterprise Integration Patterns (EIP) to model their integration requirements. This approach strongly relies on a heavyweight Enterprise Service Bus (ESB), such as TIBCO Business Works, WebSphere ESB, Oracle ESB, and the like. Most of the ESB vendors also packaged a set of related products, such as rules engines, business process management engines, and so on, as a SOA suite. Such organizations' integrations are deeply rooted in these products. They either write heavy orchestration logic in the ESB layer or put the business logic itself in the service bus. In both cases, all enterprise services are deployed and accessed through the ESB, and these services are managed through an enterprise governance model. For such organizations, microservices are altogether different from SOA. Legacy modernization SOA is also used to build service layers on top of legacy applications, as shown in the following diagram: Another category of organizations would have used SOA in transformation or legacy modernization projects. In such cases, the services are built and deployed in the ESB, connecting to backend systems using ESB adapters. For these organizations, microservices are different from SOA. Service-oriented application Some organizations would have adopted SOA at an application level: In this approach, as shown in the preceding diagram, lightweight integration frameworks, such as Apache Camel or Spring Integration, are embedded within applications to handle service-related cross-cutting capabilities, such as protocol mediation, parallel execution, orchestration, and service integration. 
As some of the lightweight integration frameworks had native Java object support, such applications would even have used native Plain Old Java Object (POJO) services for integration and data exchange between services. As a result, all services have to be packaged as one monolithic web archive. Such organizations could see microservices as the next logical step of their SOA. Monolithic migration using SOA The following diagram represents Logical System Boundaries: The last possibility is transforming a monolithic application into smaller units after hitting the breaking point with the monolithic system. Such organizations would have broken the application into smaller, physically deployable subsystems, similar to the Y-axis scaling approach explained earlier, and deployed them as web archives on web servers or as JARs on some home-grown containers. These subsystems, exposed as services, would have used web services or other lightweight protocols to exchange data between services, and they would have used SOA and service design principles to achieve this. Such organizations may tend to think that microservices are the same old wine in a new bottle. Further resources on this subject: Building Scalable Microservices [article] Breaking into Microservices Architecture [article] A capability model for microservices [article]

article-image-building-scalable-microservices
Packt
18 Jan 2017
33 min read
Save for later

Building Scalable Microservices

Packt
18 Jan 2017
33 min read
In this article by Vikram Murugesan, the author of the book Microservices Deployment Cookbook, we will see a brief introduction to concept of the microservices. (For more resources related to this topic, see here.) Writing microservices with Spring Boot Now that our project is ready, let's look at how to write our microservice. There are several Java-based frameworks that let you create microservices. One of the most popular frameworks from the Spring ecosystem is the Spring Boot framework. In this article, we will look at how to create a simple microservice application using Spring Boot. Getting ready Any application requires an entry point to start the application. For Java-based applications, you can write a class that has the main method and run that class as a Java application. Similarly, Spring Boot requires a simple Java class with the main method to run it as a Spring Boot application (microservice). Before you start writing your Spring Boot microservice, you will also require some Maven dependencies in your pom.xml file. How to do it… Create a Java class called com.packt.microservices.geolocation.GeoLocationApplication.java and give it an empty main method: package com.packt.microservices.geolocation; public class GeoLocationApplication { public static void main(String[] args) { // left empty intentionally } } Now that we have our basic template project, let's make our project a child project of Spring Boot's spring-boot-starter-parent pom module. This module has a lot of prerequisite configurations in its pom.xml file, thereby reducing the amount of boilerplate code in our pom.xml file. At the time of writing this, 1.3.6.RELEASE was the most recent version: <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>1.3.6.RELEASE</version> </parent> After this step, you might want to run a Maven update on your project as you have added a new parent module. If you see any warnings about the version of the maven-compiler plugin, you can either ignore it or just remove the <version>3.5.1</version> element. If you remove the version element, please perform a Maven update afterward. Spring Boot has the ability to enable or disable Spring modules such as Spring MVC, Spring Data, and Spring Caching. In our use case, we will be creating some REST APIs to consume the geolocation information of the users. So we will need Spring MVC. Add the following dependencies to your pom.xml file: <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> </dependencies> We also need to expose the APIs using web servers such as Tomcat, Jetty, or Undertow. Spring Boot has an in-memory Tomcat server that starts up as soon as you start your Spring Boot application. So we already have an in-memory Tomcat server that we could utilize. Now let's modify the GeoLocationApplication.java class to make it a Spring Boot application: package com.packt.microservices.geolocation; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; @SpringBootApplication public class GeoLocationApplication { public static void main(String[] args) { SpringApplication.run(GeoLocationApplication.class, args); } } As you can see, we have added an annotation, @SpringBootApplication, to our class. 
The @SpringBootApplication annotation reduces the number of lines of code written by adding the following three annotations implicitly: @Configuration @ComponentScan @EnableAutoConfiguration If you are familiar with Spring, you will already know what the first two annotations do. @EnableAutoConfiguration is the only annotation that is part of Spring Boot. The AutoConfiguration package has an intelligent mechanism that guesses the configuration of your application and automatically configures the beans that you will likely need in your code. You can also see that we have added one more line to the main method, which actually tells Spring Boot the class that will be used to start this application. In our case, it is GeoLocationApplication.class. If you would like to add more initialization logic to your application, such as setting up the database or setting up your cache, feel free to add it here. Now that our Spring Boot application is all set to run, let's see how to run our microservice. Right-click on GeoLocationApplication.java from Package Explorer, select Run As, and then select Spring Boot App. You can also choose Java Application instead of Spring Boot App. Both the options ultimately do the same thing. You should see something like this on your STS console: If you look closely at the console logs, you will notice that Tomcat is being started on port number 8080. In order to make sure our Tomcat server is listening, let's run a simple curl command. cURL is a command-line utility available on most Unix and Mac systems. For Windows, use tools such as Cygwin or even Postman. Postman is a Google Chrome extension that gives you the ability to send and receive HTTP requests. For simplicity, we will use cURL. Execute the following command on your terminal: curl http://localhost:8080 This should give us an output like this: {"timestamp":1467420963000,"status":404,"error":"Not Found","message":"No message available","path":"/"} This error message is being produced by Spring. This verifies that our Spring Boot microservice is ready to start building on with more features. There are more configurations that are needed for Spring Boot, which we will perform later in this article along with Spring MVC. Writing microservices with WildFly Swarm WildFly Swarm is a J2EE application packaging framework from RedHat that utilizes the in-memory Undertow server to deploy microservices. In this article, we will create the same GeoLocation API using WildFly Swarm and JAX-RS. To avoid confusion and dependency conflicts in our project, we will create the WildFly Swarm microservice as its own Maven project. This article is just here to help you get started on WildFly Swarm. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created the Spring Boot Maven project, create a Maven WAR module with the groupId com.packt.microservices and name/artifactId geolocation-wildfly. Feel free to use either your IDE or the command line. Be aware that some IDEs complain about a missing web.xml file. We will see how to fix that in the next section. How to do it… Before we set up the WildFly Swarm project, we have to fix the missing web.xml error. The error message says that Maven expects to see a web.xml file in your project as it is a WAR module, but this file is missing in your project. In order to fix this, we have to add and configure maven-war-plugin. 
Add the following code snippet to your pom.xml file's project section: <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-war-plugin</artifactId> <version>2.6</version> <configuration> <failOnMissingWebXml>false</failOnMissingWebXml> </configuration> </plugin> </plugins> </build> After adding the snippet, save your pom.xml file and perform a Maven update. Also, if you see that your project is using a Java version other than 1.8. Again, perform a Maven update for the changes to take effect. Now, let's add the dependencies required for this project. As we know that we will be exposing our APIs, we have to add the JAX-RS library. JAX-RS is the standard JSR-compliant API for creating RESTful web services. JBoss has its own version of JAX-RS. So let's  add that dependency to the pom.xml file: <dependencies> <dependency> <groupId>org.jboss.spec.javax.ws.rs</groupId> <artifactId>jboss-jaxrs-api_2.0_spec</artifactId> <version>1.0.0.Final</version> <scope>provided</scope> </dependency> </dependencies> The one thing that you have to note here is the provided scope. The provide scope in general means that this JAR need not be bundled with the final artifact when it is built. Usually, the dependencies with provided scope will be available to your application either via your web server or application server. In this case, when Wildfly Swarm bundles your app and runs it on the in-memory Undertow server, your server will already have this dependency. The next step toward creating the GeoLocation API using Wildfly Swarm is creating the domain object. Use the com.packt.microservices.geolocation.GeoLocation.java file. Now that we have the domain object, there are two classes that you need to create in order to write your first JAX-RS web service. The first of those is the Application class. The Application class in JAX-RS is used to define the various components that you will be using in your application. It can also hold some metadata about your application, such as your basePath (or ApplicationPath) to all resources listed in this Application class. In this case, we are going to use /geolocation as our basePath. Let's see how that looks: package com.packt.microservices.geolocation; import javax.ws.rs.ApplicationPath; import javax.ws.rs.core.Application; @ApplicationPath("/geolocation") public class GeoLocationApplication extends Application { public GeoLocationApplication() {} } There are two things to note in this class; one is the Application class and the other is the @ApplicationPath annotation—both of which we've already talked about. Now let's move on to the resource class, which is responsible for exposing the APIs. If you are familiar with Spring MVC, you can compare Resource classes to Controllers. They are responsible for defining the API for any specific resource. The annotations are slightly different from that of Spring MVC. Let's create a new resource class called com.packt.microservices.geolocation.GeoLocationResource.java that exposes a simple GET API: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.List; import javax.ws.rs.GET; import javax.ws.rs.Path; import javax.ws.rs.Produces; @Path("/") public class GeoLocationResource { @GET @Produces("application/json") public List<GeoLocation> findAll() { return new ArrayList<>(); } } All the three annotations, @GET, @Path, and @Produces, are pretty self explanatory. 
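The GeoLocation domain object referenced above is not listed in this extract. Purely as an illustrative sketch (the field names and types are inferred from the JSON payloads used in the cURL examples later in this article, not taken from the book's listing), it might look something like this:

```java
package com.packt.microservices.geolocation;

// Hypothetical sketch of the domain object. The fields mirror the JSON
// payloads used later: {"timestamp", "userId", "latitude", "longitude"}.
public class GeoLocation {

    private String userId;
    private double latitude;
    private double longitude;
    private long timestamp;

    public GeoLocation() {
        // default constructor, needed for JSON (de)serialization
    }

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }

    public double getLatitude() { return latitude; }
    public void setLatitude(double latitude) { this.latitude = latitude; }

    public double getLongitude() { return longitude; }
    public void setLongitude(double longitude) { this.longitude = longitude; }

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
}
```

A plain POJO with a no-argument constructor and getters/setters is enough here, because the JSON provider bundled with the server handles the mapping to and from the request and response bodies.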
Before we start writing the APIs and the service class, let's test the application from the command line to make sure it works as expected. With the current implementation, any GET request sent to the /geolocation URL should return an empty JSON array. So far, we have created the RESTful APIs using JAX-RS. It's just another JAX-RS project: In order to make it a microservice using Wildfly Swarm, all you have to do is add the wildfly-swarm-plugin to the Maven pom.xml file. This plugin will be tied to the package phase of the build so that whenever the package goal is triggered, the plugin will create an uber JAR with all required dependencies. An uber JAR is just a fat JAR that has all dependencies bundled inside itself. It also deploys our application in an in-memory Undertow server. Add the following snippet to the plugins section of the pom.xml file: <plugin> <groupId>org.wildfly.swarm</groupId> <artifactId>wildfly-swarm-plugin</artifactId> <version>1.0.0.Final</version> <executions> <execution> <id>package</id> <goals> <goal>package</goal> </goals> </execution> </executions> </plugin> Now execute the mvn clean package command from the project's root directory, and wait for the Maven build to be successful. If you look at the logs, you can see that wildfly-swarm-plugin will create the uber JAR, which has all its dependencies. You should see something like this in your console logs: After the build is successful, you will find two artifacts in the target directory of your project. The geolocation-wildfly-0.0.1-SNAPSHOT.war file is the final WAR created by the maven-war-plugin. The geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar file is the uber JAR created by the wildfly-swarm-plugin. Execute the following command in the same terminal to start your microservice: java –jar target/geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar After executing this command, you will see that Undertow has started on port number 8080, exposing the geolocation resource we created. You will see something like this: Execute the following cURL command in a separate terminal window to make sure our API is exposed. The response of the command should be [], indicating there are no geolocations: curl http://localhost:8080/geolocation Now let's build the service class and finish the APIs that we started. For simplicity purposes, we are going to store the geolocations in a collection in the service class itself. In a real-time scenario, you will be writing repository classes or DAOs that talk to the database that holds your geolocations. Get the com.packt.microservices.geolocation.GeoLocationService.java interface. We'll use the same interface here. Create a new class called com.packt.microservices.geolocation.GeoLocationServiceImpl.java that extends the GeoLocationService interface: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.Collections; import java.util.List; public class GeoLocationServiceImpl implements GeoLocationService { private static List<GeoLocation> geolocations = new ArrayList<>(); @Override public GeoLocation create(GeoLocation geolocation) { geolocations.add(geolocation); return geolocation; } @Override public List<GeoLocation> findAll() { return Collections.unmodifiableList(geolocations); } } Now that our service classes are implemented, let's finish building the APIs. We already have a very basic stubbed-out GET API. Let's just introduce the service class to the resource class and call the findAll method. 
Similarly, let's use the service's create method for POST API calls. Add the following snippet to GeoLocationResource.java: private GeoLocationService service = new GeoLocationServiceImpl(); @GET @Produces("application/json") public List<GeoLocation> findAll() { return service.findAll(); } @POST @Produces("application/json") @Consumes("application/json") public GeoLocation create(GeoLocation geolocation) { return service.create(geolocation); } We are now ready to test our application. Go ahead and build your application. After the build is successful, run your microservice: let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you something like the following output (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This command should give you an output similar to the following (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Whatever we have seen so far will give you a head start in building microservices with WildFly Swarm. Of course, there are tons of features that WildFly Swarm offers. Feel free to try them out based on your application needs. I strongly recommend going through the WildFly Swarm documentation for any advanced usages. Writing microservices with Dropwizard Dropwizard is a collection of libraries that help you build powerful applications quickly and easily. The libraries vary from Jackson, Jersey, Jetty, and so on. You can take a look at the full list of libraries on their website. This ecosystem of libraries that help you build powerful applications could be utilized to create microservices as well. As we saw earlier, it utilizes Jetty to expose its services. In this article, we will create the same GeoLocation API using Dropwizard and Jersey. To avoid confusion and dependency conflicts in our project, we will create the Dropwizard microservice as its own Maven project. This article is just here to help you get started with Dropwizard. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created other Maven projects,  create a Maven JAR module with the groupId com.packt.microservices and name/artifactId geolocation-dropwizard. Feel free to use either your IDE or the command line. 
After the project is created, if you see that your project is using a Java version other than 1.8. Perform a Maven update for the change to take effect. How to do it… The first thing that you will need is the dropwizard-core Maven dependency. Add the following snippet to your project's pom.xml file: <dependencies> <dependency> <groupId>io.dropwizard</groupId> <artifactId>dropwizard-core</artifactId> <version>0.9.3</version> </dependency> </dependencies> Guess what? This is the only dependency you will need to spin up a simple Jersey-based Dropwizard microservice. Before we start configuring Dropwizard, we have to create the domain object, service class, and resource class: com.packt.microservices.geolocation.GeoLocation.java com.packt.microservices.geolocation.GeoLocationService.java com.packt.microservices.geolocation.GeoLocationImpl.java com.packt.microservices.geolocation.GeoLocationResource.java Let's see what each of these classes does. The GeoLocation.java class is our domain object that holds the geolocation information. The GeoLocationService.java class defines our interface, which is then implemented by the GeoLocationServiceImpl.java class. If you take a look at the GeoLocationServiceImpl.java class, we are using a simple collection to store the GeoLocation domain objects. In a real-time scenario, you will be persisting these objects in a database. But to keep it simple, we will not go that far. To be consistent with the previous, let's change the path of GeoLocationResource to /geolocation. To do so, replace @Path("/") with @Path("/geolocation") on line number 11 of the GeoLocationResource.java class. We have now created the service classes, domain object, and resource class. Let's configure Dropwizard. In order to make your project a microservice, you have to do two things: Create a Dropwizard configuration class. This is used to store any meta-information or resource information that your application will need during runtime, such as DB connection, Jetty server, logging, and metrics configurations. These configurations are ideally stored in a YAML file, which will them be mapped to your Configuration class using Jackson. In this application, we are not going to use the YAML configuration as it is out of scope for this article. If you would like to know more about configuring Dropwizard, refer to their Getting Started documentation page at http://www.dropwizard.io/0.7.1/docs/getting-started.html. Let's  create an empty Configuration class called GeoLocationConfiguration.java: package com.packt.microservices.geolocation; import io.dropwizard.Configuration; public class GeoLocationConfiguration extends Configuration { } The YAML configuration file has a lot to offer. Take a look at a sample YAML file from Dropwizard's Getting Started documentation page to learn more. The name of the YAML file is usually derived from the name of your microservice. The microservice name is usually identified by the return value of the overridden method public String getName() in your Application class. 
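The article does not override getName() in the application class it creates next; if you wanted to give the service an explicit name, a hedged sketch of such an override (a fragment to be added inside that class, with "geolocation" as an assumed name for illustration only) might look like this:

```java
// Illustrative fragment only, not part of the listing that follows.
// Dropwizard uses this name for the service and to derive the expected
// YAML configuration file name.
@Override
public String getName() {
    return "geolocation"; // assumed name, chosen for this example
}
```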
Now let's create the GeoLocationApplication.java application class: package com.packt.microservices.geolocation; import io.dropwizard.Application; import io.dropwizard.setup.Environment; public class GeoLocationApplication extends Application<GeoLocationConfiguration> { public static void main(String[] args) throws Exception { new GeoLocationApplication().run(args); } @Override public void run(GeoLocationConfiguration config, Environment env) throws Exception { env.jersey().register(new GeoLocationResource()); } } There are a lot of things going on here. Let's look at them one by one. Firstly, this class extends Application with the GeoLocationConfiguration generic. This clearly makes an instance of your GeoLocationConfiguraiton.java class available so that you have access to all the properties you have defined in your YAML file at the same time mapped in the Configuration class. The next one is the run method. The run method takes two arguments: your configuration and environment. The Environment instance is a wrapper to other library-specific objects such as MetricsRegistry, HealthCheckRegistry, and JerseyEnvironment. For example, we could register our Jersey resources using the JerseyEnvironment instance. The env.jersey().register(new GeoLocationResource())line does exactly that. The main method is pretty straight-forward. All it does is call the run method. Before we can start the microservice, we have to configure this project to create a runnable uber JAR. Uber JARs are just fat JARs that bundle their dependencies in themselves. For this purpose, we will be using the maven-shade-plugin. Add the following snippet to the build section of the pom.xml file. If this is your first plugin, you might want to wrap it in a <plugins> element under <build>: <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version> <configuration> <createDependencyReducedPom>true</createDependencyReducedPom> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> </configuration> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" /> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <mainClass>com.packt.microservices.geolocation.GeoLocationApplication</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> The previous snippet does the following: It creates a runnable uber JAR that has a reduced pom.xml file that does not include the dependencies that are added to the uber JAR. To learn more about this property, take a look at the documentation of maven-shade-plugin. It utilizes com.packt.microservices.geolocation.GeoLocationApplication as the class whose main method will be invoked when this JAR is executed. This is done by updating the MANIFEST file. It excludes all signatures from signed JARs. This is required to avoid security errors. Now that our project is properly configured, let's try to build and run it from the command line. To build the project, execute mvn clean package from the project's root directory in your terminal. This will create your final JAR in the target directory. 
Execute the following command to start your microservice: java -jar target/geolocation-dropwizard-0.0.1-SNAPSHOT.jar server The server argument instructs Dropwizard to start the Jetty server. After you issue the command, you should be able to see that Dropwizard has started the in-memory Jetty server on port 8080. If you see any warnings about health checks, ignore them. Your console logs should look something like this: We are now ready to test our application. Let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation It should give you an output similar to the following (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Excellent! You have created your first microservice with Dropwizard. Dropwizard offers more than what we have seen so far. Some of it is out of scope for this article. I believe the metrics API that Dropwizard uses could be used in any type of application. Writing your Dockerfile So far in this article, we have seen how to package our application and how to install Docker. Now that we have our JAR artifact and Docker set up, let's see how to Dockerize our microservice application using Docker. Getting ready In order to Dockerize our application, we will have to tell Docker how our image is going to look. This is exactly the purpose of a Dockerfile. A Dockerfile has its own syntax (or Dockerfile instructions) and will be used by Docker to create images. Throughout this article, we will try to understand some of the most commonly used Dockerfile instructions as we write our Dockerfile for the geolocation tracker microservice. How to do it… First, open your STS IDE and create a new file called Dockerfile in the geolocation project. The first line of the Dockerfile is always the FROM instruction followed by the base image that you would like to create your image from. There are thousands of images on Docker Hub to choose from. In our case, we would need something that already has Java installed on it. There are some images that are official, meaning they are well documented and maintained. Docker Official Repositories are very well documented, and they follow best practices and standards. Docker has its own team to maintain these repositories. 
This is essential in order to keep the repository clear, thus helping the user make the right choice of repository. To read more about Docker Official Repositories, take a look at https://docs.docker.com/docker-hub/official_repos/ We will be using the Java official repository. To find the official repository, go to hub.docker.com and search for java. You have to choose the one that says official. At the time of writing this, the Java image documentation says it will soon be deprecated in favor of the openjdk image. So the first line of our Dockerfile will look like this: FROM openjdk:8 As you can see, we have used version (or tag) 8 for our image. If you are wondering what type of operating system this image uses, take a look at the Dockerfile of this image, which you can get from the Docker Hub page. Docker images are usually tagged with the version of the software they are written for. That way, it is easy for users to pick from. The next step is creating a directory for our project where we will store our JAR artifact. Add this as your next line: RUN mkdir -p /opt/packt/geolocation This is a simple Unix command that creates the /opt/packt/geolocation directory. The –p flag instructs it to create the intermediate directories if they don't exist. Now let's create an instruction that will add the JAR file that was created in your local machine into the container at /opt/packt/geolocation. ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ As you can see, we are picking up the uber JAR from target directory and dropping it into the /opt/packt/geolocation directory of the container. Take a look at the / at the end of the target path. That says that the JAR has to be copied into the directory. Before we can start the application, there is one thing we have to do, that is, expose the ports that we would like to be mapped to the Docker host ports. In our case, the in-memory Tomcat instance is running on port 8080. In order to be able to map port 8080 of our container to any port to our Docker host, we have to expose it first. For that, we will use the EXPOSE instruction. Add the following line to your Dockerfile: EXPOSE 8080 Now that we are ready to start the app, let's go ahead and tell Docker how to start a container for this image. For that, we will use the CMD instruction: CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] There are two things we have to note here. Once is the way we are starting the application and the other is how the command is broken down into comma-separated Strings. First, let's talk about how we start the application. You might be wondering why we haven't used the mvn spring-boot:run command to start the application. Keep in mind that this command will be executed inside the container, and our container does not have Maven installed, only OpenJDK 8. If you would like to use the maven command, take that as an exercise, and try to install Maven on your container and use the mvn command to start the application. Now that we know we have Java installed, we are issuing a very simple java –jar command to run the JAR. In fact, the Spring Boot Maven plugin internally issues the same command. The next thing is how the command has been broken down into comma-separated Strings. This is a standard that the CMD instruction follows. To keep it simple, keep in mind that for whatever command you would like to run upon running the container, just break it down into comma-separated Strings (in whitespaces). 
Your final Dockerfile should look something like this: FROM openjdk:8 RUN mkdir -p /opt/packt/geolocation ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ EXPOSE 8080 CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] This Dockerfile is one of the simplest implementations. Dockerfiles can sometimes get bigger due to the fact that you need a lot of customizations to your image. In such cases, it is a good idea to break it down into multiple images that can be reused and maintained separately. There are some best practices to follow whenever you create your own Dockerfile and image. Though we haven't covered that here as it is out of the scope of this article, you still should take a look at and follow them. To learn more about the various Dockerfile instructions, go to https://docs.docker.com/engine/reference/builder/. Building your Docker image We created the Dockerfile, which will be used in this article to create an image for our microservice. If you are wondering why we would need an image, it is the only way we can ship our software to any system. Once you have your image created and uploaded to a common repository, it will be easier to pull your image from any location. Getting ready Before you jump right into it, it might be a good idea to get yourself familiar with some of the most commonly used Docker commands. In this article, we will use the build command. Take a look at this URL to understand the other commands: https://docs.docker.com/engine/reference/commandline/#/image-commands. After familiarizing yourself with the commands, open up a new terminal, and change your directory to the root of the geolocation project. Make sure your docker-machine instance is running. If it is not running, use the docker-machine start command to run your docker-machine instance: docker-machine start default If you have to configure your shell for the default Docker machine, go ahead and execute the following command: eval $(docker-machine env default) How to do it… From the terminal, issue the following docker build command: docker build –t packt/geolocation. We'll try to understand the command later. For now, let's see what happens after you issue the preceding command. You should see Docker downloading the openjdk image from Docker Hub. Once the image has been downloaded, you will see that Docker tries to validate each and every instruction provided in the Dockerfile. When the last instruction has been processed, you will see a message saying Successfully built. This says that your image has been successfully built. Now let's try to understand the command. There are three things to note here: The first thing is the docker build command itself. The docker build command is used to build a Docker image from a Dockerfile. It needs at least one input, which is usually the location of the Dockerfile. Dockerfiles can be renamed to something other than Dockerfile and can be referred to using the –f option of the docker build command. An instance of this being used is when teams have different Dockerfiles for different build environments, for example, using DockerfileDev for the dev environment, DockerfileStaging for the staging environment, and DockerfileProd for the production environment. It is still encouraged as best practice to use other Docker options in order to keep the same Dockerfile for all environments. The second thing is the –t option. The –t option takes the name of the repo and a tag. 
In our case, we have not mentioned the tag, so by default, it will pick up latest as the tag. If you look at the repo name, it is different from the official openjdk image name. It has two parts: packt and geolocation. It is always a good practice to put the Docker Hub account name followed by the actual image name as the name of your repo. For now, we will use packt as our account name, we will see how to create our own Docker Hub account and use that account name here. The third thing is the dot at the end. The dot operator says that the Dockerfile is located in the current directory, or the present working directory to be more precise. Let's go ahead and verify whether our image was created. In order to do that, issue the following command on your terminal: docker images The docker images command is used to list down all images available in your Docker host. After issuing the command, you should see something like this: As you can see, the newly built image is listed as packt/geolocation in your Docker host. The tag for this image is latest as we did not specify any. The image ID uniquely identifies your image. Note the size of the image. It is a few megabytes bigger than the openjdk:8 image. That is most probably because of the size of our executable uber JAR inside the container. Now that we know how to build an image using an existing Dockerfile, we are at the end of this article. This is just a very quick intro to the docker build command. There are more options that you can provide to the command, such as CPUs and memory. To learn more about the docker build command, take a look at this page: https://docs.docker.com/engine/reference/commandline/build/ Running your microservice as a Docker container We successfully created our Docker image in the Docker host. Keep in mind that if you are using Windows or Mac, your Docker host is the VirtualBox VM and not your local computer. In this article, we will look at how to spin off a container for the newly created image. Getting ready To spin off a new container for our packt/geolocation image, we will use the docker run command. This command is used to run any command inside your container, given the image. Open your terminal and go to the root of the geolocation project. If you have to start your Docker machine instance, do so using the docker-machine start command, and set the environment using the docker-machine env command. How to do it… Go ahead and issue the following command on your terminal: docker run packt/geolocation Right after you run the command, you should see something like this: Yay! We can see that our microservice is running as a Docker container. But wait—there is more to it. Let's see how we can access our microservice's in-memory Tomcat instance. Try to run a curl command to see if our app is up and running: Open a new terminal instance and execute the following cURL command in that shell: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation Did you get an error message like this? curl: (7) Failed to connect to localhost port 8080: Connection refused Let's try to understand what happened here. Why would we get a connection refused error when our microservice logs clearly say that it is running on port 8080? 
Yes, you guessed it right: the microservice is not running on your local computer; it is actually running inside the container, which in turn is running inside your Docker host. Here, your Docker host is the VirtualBox VM called default. So we have to replace localhost with the IP of the container. But getting the IP of the container is not straightforward. That is the reason we are going to map port 8080 of the container to the same port on the VM. This mapping will make sure that any request made to port 8080 on the VM will be forwarded to port 8080 of the container. Now go to the shell that is currently running your container, and stop your container. Usually, Ctrl + C will do the job. After your container is stopped, issue the following command: docker run –p 8080:8080 packt/geolocation The –p option does the port mapping from Docker host to container. The port number to the left of the colon indicates the port number of the Docker host, and the port number to the right of the colon indicates that of the container. In our case, both of them are same. After you execute the previous command, you should see the same logs that you saw before. We are not done yet. We still have to find the IP that we have to use to hit our RESTful endpoint. The IP that we have to use is the IP of our Docker Machine VM. To find the IP of the docker-machine instance, execute the following command in a new terminal instance: docker-machine ip default. This should give you the IP of the VM. Let's say the IP that you received was 192.168.99.100. Now, replace localhost in your cURL command with this IP, and execute the cURL command again: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://192.168.99.100:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } This confirms that you are able to access your microservice from the outside. Take a moment to understand how the port mapping is done. The following figure shows how your machine, VM, and container are orchestrated: This confirms that you are able to access your microservice from the outside. Summary We looked at an example of a geolocation tracker application to see how it can be broken down into smaller and manageable services. Next, we saw how to create the GeoLocationTracker service using the Spring Boot framework. Resources for Article: Further resources on this subject: Domain-Driven Design [article] Breaking into Microservices Architecture [article] A capability model for microservices [article]
article-image-testing-and-quality-control
Packt
04 Jan 2017
19 min read
Save for later

Testing and Quality Control

Packt
04 Jan 2017
19 min read
In this article by Pablo Solar Vilariño and Carlos Pérez Sánchez, the authors of the book PHP Microservices, we will see the following topics: (For more resources related to this topic, see here.) Test-driven development Behavior-driven development Acceptance test-driven development Tools Test-driven development Test-Driven Development (TDD) is part of the Agile philosophy, and it aims to solve a common developer's problem: as an application evolves and grows, the code gets sick, so developers fix problems just to make it run, but every single line we add can introduce a new bug or even break other functions. Test-driven development is a learning technique that helps the developer to learn about the domain problem of the application they are going to build, doing it in an iterative, incremental, and constructivist way: Iterative because the technique always repeats the same process to get the value Incremental because for each iteration, we have more unit tests to be used Constructivist because it is possible to test all we are developing during the process straight away, so we can get immediate feedback Also, when we finish developing each unit test or iteration, we can forget about it because it will be kept from then on throughout the entire development process, helping us to remember the domain problem through the unit test; this is a good approach for forgetful developers. It is very important to understand that TDD includes four things: analysis, design, development, and testing. In other words, doing TDD means understanding the domain problem and correctly analyzing it, designing the application well, developing well, and testing it. It needs to be clear that TDD is not just about implementing unit tests; it is the whole process of software development. TDD perfectly matches projects based on microservices, because using microservices in a large project means dividing it into little microservices or functionalities, which is like a grouping of little projects connected by a communication channel. The project size is independent of using TDD, because in this technique you divide each functionality into little examples, and to do this it does not matter whether the project is big or small, and even less so when the project is divided into microservices. Microservices are also better suited than a monolithic project here, because the functionalities covered by the unit tests are organized into microservices, which helps the developers to know where they can begin using TDD. How to do TDD? Doing TDD is not difficult; we just need to follow some steps and repeat them, improving our code and checking that we did not break anything. TDD involves the following steps, and a minimal illustration of one such cycle follows the list: Write the unit test: It needs to be the simplest and clearest test possible, and once it is done, it has to fail; this is mandatory. If it does not fail, there is something that we are not doing properly. Run the tests: If it has errors (it fails), this is the moment to develop the minimum code to pass the test, just what is necessary; do not code additional things. Once you develop the minimum code to pass the test, run the test again (step two); if it passes, go to the next step, if not, fix it and run the test again. Improve the test: If you think it is possible to improve the code you wrote, do it and run the tests again (step two). If you think it is perfect, then write a new unit test (step one). 
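The book's examples are in PHP, but the red-green-refactor cycle itself is language-agnostic. Purely as an illustrative sketch (the Cart class and the use of JUnit 5 are assumptions made for this example, not taken from the book), one iteration might look like this:

```java
// Red: write the simplest failing example first. This test describes the
// behavior we want before any production code exists, so on the first run
// it fails (or does not even compile).
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class CartTest {
    @Test
    void totalOfAnEmptyCartIsZero() {
        Cart cart = new Cart();
        assertEquals(0, cart.total());
    }
}

// Green: write only the minimum code that makes the example pass -
// no extra features, no speculative design.
class Cart {
    int total() {
        return 0;
    }
}
```

Once the test is green, the refactor step cleans up naming and duplication while the test keeps guarding the behavior; the next failing example (for instance, a cart with one item) then drives the next increment.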
To do TDD, it is necessary to write the tests before implementing the function; if the tests are written after the implementation has started, it is not TDD, it is just testing. If we implement the application without testing and only write unit tests once it is finished, or if we start creating unit tests partway through the process, we are doing classic testing and we are not getting the TDD benefits. When we develop the functions without prior testing, the abstract idea of the domain problem in our mind can be wrong, or it may be clear at the start but change during the development process, or the concepts can get mixed up. By writing the tests afterwards, we are only checking whether the ideas in our mind were correct once we have finished the implementation, so we will probably have to change some methods or even whole functionalities after spending time coding. Obviously, testing is always better than not testing, but doing TDD is still better than just classic testing. Why should I use TDD? TDD is the answer to questions such as: Where shall I begin? How can I do it? How can I write code that can be modified without breaking anything? How can I know what I have to implement? The goal is not to write many unit tests without purpose, but to design the application properly following the requirements. In TDD, we do not think about implementing functions; we think about good examples of functions related to the domain problem in order to remove the ambiguity created by the domain problem. In other words, by doing TDD, we reproduce a specific function or use case in as many examples as are needed to describe the function or task without ambiguity or misinterpretation. TDD can be the best way to document your application. Using other methodologies of software development, we start by thinking about how the architecture is going to be, what pattern is going to be used, how the communication between microservices is going to be, and so on, but what happens if, once we have all this planned, we realize that it is not necessary? How much time will pass until we realize that? How much effort and money will we spend? TDD defines the architecture of our application by creating little examples in many iterations until we realize what the architecture is; the examples slowly show us the steps to follow in order to define the best structures, patterns, or tools to use, avoiding expenditure of resources during the first stages of our application. This does not mean that we are working without an architecture; obviously, we have to know whether our application is going to be a website or a mobile app and use a proper framework. What is the interoperability in the application going to be? In our case, it will be an application based on microservices, and that gives us enough support to start creating the first unit tests. The architectures that we remove are the architectures on top of the architecture; in other words, the usual up-front guidelines for developing an application. TDD produces an architecture without ambiguity from unit testing. TDD is not a cure-all: in other words, it does not give the same results to a senior developer as to a junior developer, but it is useful for the entire team. 
Let's look at some advantages of using TDD:

Code reuse: Every functionality is created with only the code necessary to pass the tests in the second stage (green), which lets you see whether other functions use the same code structure or parts of a specific function, helping you to reuse code you wrote previously.
Teamwork is easier: It allows you to be confident in your team colleagues. Some architects or senior developers do not trust less experienced developers and feel they need to check their code before committing changes, creating a bottleneck; TDD helps to build trust in developers with less experience.
Increased communication between team colleagues: Communication becomes more fluent, as the team shares its knowledge about the project through the unit tests.
Avoids overdesigning the application in the early stages: As we said before, doing TDD gives you an overview of the application little by little, avoiding the creation of useless structures or patterns in your project that you might throw away in later stages.
Unit tests are the best documentation: The best way to get a good view of a specific functionality is to read its unit tests; they explain how it works better than prose.
Allows discovering more use cases in the design stage: With every test you create, you understand better how the functionality should work and all the possible stages that a functionality can have.
Increases the feeling of a job well done: With every commit of your code, you will have the feeling that it was done properly, because the rest of the unit tests pass without errors, so you will not be worried about having broken other functionalities.
Increases software quality: During the refactoring step, we spend our effort making the code more efficient and maintainable, checking that the whole project still works properly after the changes.

TDD algorithm

The technical concepts and steps of the TDD algorithm are simple and clear, and the proper way to apply it improves with practice. There are only three steps, called red, green, and refactor.

Red – Writing the unit tests

It is possible to write a test even when the code has not been written yet; you just need to think about whether it is possible to write a specification before implementing it. So, in this first step you should consider that the unit test you start writing is not so much a unit test as an example or specification of the functionality. In TDD, this first example or specification is not immovable; in other words, the unit test can be modified in the future.

Before starting to write the first unit test, it is necessary to think about what the Software Under Test (SUT) is going to be: how the SUT code is going to look and how we will check that it works the way we want it to. The way TDD works drives us to first design what is most comfortable and clear, as long as it fits the requirements.

Green – Make the code work

Once the example is written, we have to code the minimum required to make it pass the test; in other words, turn the unit test green. It does not matter if the code is ugly and not optimized; that will be our task in the next step and in later iterations. In this step, the important thing is to write only the code necessary for the requirements, without anything extra. This does not mean writing without thinking about the functionality, but thinking about it in order to be efficient.
It looks easy, but the first time you will realize that you write extra code. If you concentrate on this step, new questions will appear about the SUT's behavior with different inputs, but you should stay disciplined and avoid writing extra code for other functionalities related to the current one. Instead of coding them, take notes so you can turn them into functionalities in the next iterations.

Refactor – Eliminate redundancy

Refactoring is not the same as rewriting code. You should be able to change the design without changing the behavior. In this step, you should remove duplication from your code and check whether the code follows the principles of good practice, thinking about the efficiency, clarity, and future maintainability of the code. This part depends on the experience of each developer.

The key to good refactoring is taking small steps. To refactor a functionality, the best way is to change a small part and then execute all the available tests; if they pass, continue with another small part, until you are happy with the result.

Behavior-driven development

Behavior-Driven Development (BDD) is a process that broadens the TDD technique and mixes it with other design ideas and business analysis provided to the developers, in order to improve software development. In BDD, we test scenarios and the behavior of the classes needed to meet those scenarios, which can be composed of many classes. It is very useful to use a DSL so that the customer, project owner, business analyst, and developers share a common language. The goal is to have a ubiquitous language.

What is BDD?

As we said before, BDD is an Agile technique based on TDD and ATDD that promotes collaboration across the entire project team. The goal of BDD is for the entire team to understand what the customer wants, and for the customer to know what the rest of the team understood from their specifications. Most of the time, when a project starts, the developers do not have the same point of view as the customer, and during the development process the customer realizes that maybe they did not explain the requirements well, or the developers did not understand them properly, which adds more time for changing the code to meet the customer's needs.

So, BDD means writing test cases in human language, using rules, or in a ubiquitous language, so that the customer and developers can both understand them. It also defines a DSL for the tests.

How does it work?

It is necessary to define the features as user stories (we will explain what this is in the ATDD section of this article) and their acceptance criteria. Once the user story is defined, we have to focus on the possible scenarios, which describe the project behavior for a concrete user or situation, using the DSL. The steps are: Given [context], When [event occurs], Then [outcome]. To sum up, the scenario defined for a user story gives the acceptance criteria to check whether the feature is done (an illustrative scenario is shown below, after the introduction to ATDD).

Acceptance Test-Driven Development

Perhaps the most important methodology in a project is Acceptance Test-Driven Development (ATDD), also known as Story Test-Driven Development (STDD); it is TDD, but at a different level. The acceptance (or customer) tests are the written criteria for the project meeting the business requirements that the customer demands. They are examples (like the examples in TDD) written by the project owner. They are the starting point of development for each iteration and the bridge between Scrum and agile development.
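As an illustration only (this scenario is not taken from the book), an acceptance scenario for a hypothetical customer-registration user story, written in the Given/When/Then form described in the BDD section, might look like this:

Feature: Customer registration
  Scenario: A new customer registers with a valid e-mail address
    Given a visitor who is not yet registered
    When the visitor submits the registration form with a valid e-mail address
    Then a customer profile is created
    And a confirmation e-mail is sent to that address

Tools such as Behat can execute scenarios written in this style against PHP code, but the main value of the format is that the customer can read and validate it before any code exists.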
In ATDD, we start the implementation of our project in a different way from traditional methodologies. The business requirements written in human language are replaced by executable examples agreed upon by the team members and the customer. This is not about replacing the whole documentation, but only a part of the requirements.

The advantages of using ATDD are the following:

Real examples and a common language for the entire team to understand the domain
It allows identifying the domain rules properly
It is possible to know whether a user story is finished in each iteration
The workflow works from the first steps
Development does not start until the tests are defined and accepted by the team

ATDD algorithm

The algorithm of ATDD is like that of TDD, but it reaches more people than just the developers; in other words, when doing ATDD, the tests for each story are written in a meeting that includes the product owners, developers, and QA technicians, because the entire team must understand what needs to be done and why, so they can check whether that is what the code does. The ATDD cycle is depicted in the following diagram:

Discuss

The starting point of the ATDD algorithm is the discussion. In this first step, the business has a meeting with the customer to clarify how the application should work, and the analyst should create the user stories from that conversation. They should also be able to explain the conditions of satisfaction of every user story so that they can be translated into examples. By the end of the meeting, the examples should be clear and concise, so that we get a list of examples for the user stories that covers all the customer's needs, reviewed and understood by them. The entire team will also have a project overview, so they understand the business value of each user story, and if a user story is too big, it can be divided into smaller user stories, with the first one used for the first iteration of this process.

Distill

High-level acceptance tests are written by the customer and the development team. In this step, the writing of the test cases derived from the examples in the discussion step begins, and the entire team can take part in the discussion, helping to clarify the information or specify the real needs behind it. The tests should cover all the examples discovered in the discussion step, and extra tests can be added bit by bit during this process as we understand the functionality better. At the end of this step, we will have the necessary tests written in human language, so the entire team (including the customer) can understand what they are going to do in the next step. These tests can be used as documentation.

Develop

In this step, the development of the acceptance test cases is started by the development team and the project owner. The methodology to follow in this step is the same as in TDD: the developers create a test, watch it fail (red), and then write the minimum amount of code to make it pass (green). Once the acceptance tests are green, the work should be verified and tested so that it is ready to be delivered. During this process, the developers may find new scenarios that need to be added to the tests; if a scenario requires a large amount of work, it can be pushed into its own user story. At the end of this step, we will have software that passes the acceptance tests, and perhaps more comprehensive tests as well.
Demo

The created functionality is shown by running the acceptance test cases and manually exploring the features of the new functionality. After the demonstration, the team discusses whether the user story was done properly and meets the product owner's needs, and decides whether it can continue with the next story.

Tools

Now that we know more about TDD and BDD, it is time to explain a few tools you can use in your development workflow. There are a lot of tools available, but we will only explain the most commonly used ones.

Composer

Composer is a PHP tool used to manage software dependencies. You only need to declare the libraries your project needs, and Composer will manage them, installing and updating them when necessary. The tool has only a few requirements: if you have PHP 5.3.2+, you are ready to go. In the case of a missing requirement, Composer will warn you.

You could install this dependency manager on your development machine, but since we are using Docker, we are going to install it directly in our PHP-FPM containers. Installing Composer in Docker is very easy; you only need to add the following instruction to the Dockerfile:

RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/bin/ --filename=composer

PHPUnit

Another tool we need for our project is PHPUnit, a unit test framework. As before, we will add this tool to our PHP-FPM containers to keep our development machine clean. If you are wondering why we are not installing anything on our development machine except Docker, the answer is simple: keeping everything in the containers helps you avoid conflicts with other projects and gives you the flexibility to change versions without worrying too much.

Add the following RUN command to your PHP-FPM Dockerfile, and you will have the latest PHPUnit version installed and ready to use:

RUN curl -sSL https://phar.phpunit.de/phpunit.phar -o /usr/bin/phpunit && chmod +x /usr/bin/phpunit

Now that we have all our requirements, it is time to install our PHP framework and start doing some TDD. Later, we will continue updating our Docker environment with new tools.

We chose Lumen for our example. Please feel free to adapt all the examples to your favorite framework. Our source code will live inside our containers, but at this point of development, we do not want immutable containers; we want every change we make to our code to be available instantaneously in our containers, so we will use a container as a storage volume.

To create a container with our source code and use it as a storage volume, we only need to edit our docker-compose.yml and create one source container per microservice, as follows:

source_battle:
    image: nginx:stable
    volumes:
        - ../source/battle:/var/www/html
    command: "true"

The preceding piece of code creates a container named source_battle, and it stores our battle source (located at ../source/battle relative to the docker-compose.yml path). Once we have our source container available, we can edit each of our services and assign the volume. For instance, we can add the following lines to our microservice_battle_fpm and microservice_battle_nginx container descriptions:

volumes_from:
    - source_battle

Our battle source will be available in our source container at the path /var/www/html, and the remaining step to install Lumen is a simple Composer execution.
First, you need to make sure that your infrastructure is up, with a simple command, as follows:

$ docker-compose up

The preceding command spins up our containers and outputs the log to the standard IO. Now that we are sure that everything is up and running, we need to enter our PHP-FPM containers and install Lumen. If you need to know the names assigned to each of your containers, you can run $ docker ps and copy the container name. As an example, we are going to enter the battle PHP-FPM container with the following command:

$ docker exec -it docker_microservice_battle_fpm_1 /bin/bash

The preceding command opens an interactive shell in your container, so you can do anything you want; let's install Lumen with a single command:

# cd /var/www/html && composer create-project --prefer-dist laravel/lumen .

Repeat the preceding commands for each of your microservices. Now you have everything ready to start writing unit tests and coding your application.

Summary

In this article, you learned about test-driven development, behavior-driven development, acceptance test-driven development, and PHPUnit.

Resources for Article:

Further resources on this subject:
Running Simpletest and PHPUnit [Article]
Understanding PHP basics [Article]
The Multi-Table Query Generator using phpMyAdmin and MySQL [Article]

Packt
28 Dec 2016
13 min read

Examining the encoding/json Package with Go

In this article by Nic Jackson, author of the book Building Microservices with Go, we will examine the encoding/json package to see just how easy Go makes it for us to use JSON objects for our requests and responses.

(For more resources related to this topic, see here.)

Reading and writing JSON

Thanks to the encoding/json package, which is built into the standard library, encoding and decoding JSON to and from Go types is both fast and easy. It implements the simple Marshal and Unmarshal functions; however, if we need them, the package also provides Encoder and Decoder types, which allow us greater control when reading and writing streams of JSON data. In this section, we are going to examine both of these approaches, but first let's take a look at how simple it is to convert a standard Go struct into its corresponding JSON string.

Marshalling Go structs to JSON

To encode JSON data, the encoding/json package provides the Marshal function, which has the following signature:

func Marshal(v interface{}) ([]byte, error)

This function takes one parameter of the empty interface type, so that's pretty much any object you can think of, since interface{} represents any type in Go. It returns a tuple of ([]byte, error). You will see this return style quite frequently in Go. Some languages implement a try...catch approach, which encourages an error to be thrown when an operation cannot be performed. Go suggests the (return type, error) pattern, where the error is nil when an operation succeeds.

In Go, unhandled errors are a bad thing, and while the language does implement the panic and recover functions, which resemble exception handling in other languages, the situations in which you should use them are quite different (The Go Programming Language, Donovan and Kernighan). In Go, panic causes normal execution to stop, and all deferred function calls in the Go routine are executed; the program will then crash with a log message. It is generally used for unexpected errors that indicate a bug in the code, and good, robust Go code will attempt to handle these runtime exceptions and return a detailed error object back to the calling function.

This pattern is exactly what is implemented with the Marshal function. If Marshal cannot create a JSON-encoded byte array from the given object, which could be due to a runtime panic, then this is captured and an error object detailing the problem is returned to the caller.

Let's try this out, expanding on our existing example. Instead of simply printing a string from our handler, let's create a simple struct for the response and return that:

type helloWorldResponse struct {
    Message string
}

In our handler, we will create an instance of this object, set the message, and then use the Marshal function to encode it to a string before returning. Let's see what that will look like:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {
    response := helloWorldResponse{Message: "Hello World"}
    data, err := json.Marshal(response)
    if err != nil {
        panic("Ooops")
    }

    fmt.Fprint(w, string(data))
}

Now when we rerun our program and refresh our browser, we'll see the following output rendered in valid JSON:

{"Message":"Hello World"}

This is awesome, but the default behavior of Marshal is to take the literal name of the field and use that as the field in the JSON output. What if I prefer to use camel case and would rather see message? Could we just rename the field in our struct to message?
Unfortunately, we can't, because in Go, lowercase properties are not exported. Marshal will ignore these and will not include them in the output.

All is not lost: the encoding/json package implements struct field tags, which allow us to change the output for the property to anything we choose. The example code is as follows:

type helloWorldResponse struct {
    Message string `json:"message"`
}

Using the struct field's tags, we have greater control over how the output will look. In the preceding example, when we marshal this struct, the output from our server would be the following:

{"message":"Hello World"}

This is exactly what we want, but we can use field tags to control the output even further. We can convert object types and even ignore a field altogether if we need to:

type helloWorldResponse struct {
    // change the output field to be "message"
    Message string `json:"message"`
    // do not output this field
    Author string `json:"-"`
    // do not output the field if the value is empty
    Date string `json:",omitempty"`
    // convert output to a string and rename "id"
    Id int `json:"id,string"`
}

Channels, complex types, and functions cannot be encoded in JSON. Attempting to encode these types will result in an UnsupportedTypeError being returned by the Marshal function. It also can't represent cyclic data structures, so if your struct contains a circular reference, then Marshal will result in an infinite recursion, which is never a good thing for a web request.

If we want to export our JSON pretty-formatted with indentation, we can use the MarshalIndent function, which allows you to pass additional string parameters to specify the prefix and what you would like the indent to be (two spaces, not a tab, right?):

func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)

The astute reader might have noticed that we are marshalling our struct into a byte array and then writing that to the response stream. This does not seem to be particularly efficient, and in fact, it is not. Go provides encoders and decoders, which can write directly to a stream. Since we already have a stream with the ResponseWriter interface, let's do just that. Before we do so, I think we need to look at the ResponseWriter interface a little to see what is going on there.

ResponseWriter is an interface that defines three methods:

// Header returns the map of headers that will be sent by the
// WriteHeader method.
Header() Header

// Write writes the data to the connection. If WriteHeader has not
// already been called, then Write will call
// WriteHeader(http.StatusOK).
Write([]byte) (int, error)

// WriteHeader sends an HTTP response header with the given status code.
WriteHeader(int)

If we have a ResponseWriter, how can we use it with fmt.Fprint(w io.Writer, a ...interface{})? This method requires a Writer interface as a parameter, and we have a ResponseWriter. If we look at the signature for Writer, we can see that it is the following:

Write(p []byte) (n int, err error)

Because the ResponseWriter interface implements this method, it also satisfies the Writer interface; therefore, any object that implements ResponseWriter can be passed to any function that expects Writer. Amazing! Go rocks, but we don't have an answer to our question yet: is there any better way to send our data to the output stream without marshalling it to a temporary byte array before we return it? The encoding/json package has a function called NewEncoder.
This returns an Encoder object, which can be used to write JSON straight to an open writer, and guess what: we have one of those:

func NewEncoder(w io.Writer) *Encoder

So instead of storing the output of Marshal in a byte array, we can write it straight to the HTTP response, as shown in the following code:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {
    response := helloWorldResponse{Message: "Hello World"}
    encoder := json.NewEncoder(w)
    encoder.Encode(&response)
}

We will look at benchmarking in a later chapter, but to see why this is important, here is a simple benchmark that checks the two methods against each other; have a look at the output:

go test -v -run="none" -bench=. -benchtime="5s" -benchmem
testing: warning: no tests to run
PASS
BenchmarkHelloHandlerVariable    10000000    1211 ns/op    248 B/op    5 allocs/op
BenchmarkHelloHandlerEncoder     10000000     662 ns/op      8 B/op    1 allocs/op
ok    github.com/nicholasjackson/building-microservices-in-go/chapter1/bench    20.650s

Using the Encoder rather than marshalling to a byte array is nearly 50% faster. We are dealing with nanoseconds here, so that time may seem irrelevant, but it isn't; this was two lines of code. If you have that level of inefficiency throughout the rest of your code, your application will run slower, you will need more hardware to satisfy the load, and that will cost you money. There is nothing clever in the differences between the two methods; all we have done is understand how the standard packages work and choose the correct option for our requirements. That is not performance tuning; that is understanding the framework.

Unmarshalling JSON to Go structs

Now that we have learned how to send JSON back to the client, what if we need to read input before returning the output? We could use URL parameters, and we will see what that is all about in the next chapter, but usually you will need more complex data structures, which means the service has to accept JSON as part of an HTTP POST request. Applying techniques similar to those we learned in the previous section (to write JSON), reading JSON is just as easy. To decode JSON into a struct, the encoding/json package provides us with the Unmarshal function:

func Unmarshal(data []byte, v interface{}) error

The Unmarshal function works in the opposite way to Marshal: it allocates maps, slices, and pointers as required. Incoming object keys are matched using either the struct field name or its tag and will work with a case-insensitive match; however, an exact match is preferred. Like Marshal, Unmarshal will only set exported struct fields: those that start with an upper-case letter.

We start by adding a new struct to represent the request. Unmarshal can also decode the JSON into a plain interface{}, which would hold one of the following types:

map[string]interface{} // for JSON objects
[]interface{} // for JSON arrays

Which type it is depends on whether our JSON is an object or an array. In my opinion, it is much clearer to the readers of our code if we explicitly state what we are expecting as a request. We can also save ourselves work by not having to manually cast the data when we come to use it.
Remember two things:

You do not write code for the compiler; you write code for humans to understand
You will spend more time reading code than you do writing it

We are going to do ourselves a favor by taking these two points into account and creating a simple struct to represent our request, which will look like this:

type helloWorldRequest struct {
    Name string `json:"name"`
}

Again, we are going to use struct field tags, because while we could let Unmarshal do case-insensitive matching so that {"name": "World"} would unmarshal into the struct just like {"Name": "World"}, when we specify a tag, we are being explicit about the request form, and that is a good thing. In terms of speed and performance, it is also about 10% faster, and remember: performance matters.

To access the JSON sent with the request, we need to take a look at the http.Request object passed to our handler. The following listing does not show all the fields in the request, just the ones we are going to be dealing with immediately. For the full documentation, I recommend checking out the docs at https://godoc.org/net/http#Request.

type Request struct {
    ...
    // Method specifies the HTTP method (GET, POST, PUT, etc.).
    Method string
    // Header contains the request header fields received by the server.
    // The Header type is a map[string][]string.
    Header Header
    // Body is the request's body.
    Body io.ReadCloser
    ...
}

The JSON that has been sent with the request is accessible in the Body field. The Body field implements the io.ReadCloser interface as a stream and does not return []byte or string data. If we need the data contained in the body, we can simply read it into a byte array, as in the following example:

body, err := ioutil.ReadAll(r.Body)
if err != nil {
    http.Error(w, "Bad request", http.StatusBadRequest)
    return
}

Here is something we need to remember: we are not calling Body.Close(); if we were making a call with a client, we would need to do this, as it is not closed automatically; however, when used in a ServeHTTP handler, the server automatically closes the request stream.

To see how this all works inside our handler, we can look at the following handler:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {

    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    var request helloWorldRequest
    err = json.Unmarshal(body, &request)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    response := helloWorldResponse{Message: "Hello " + request.Name}

    encoder := json.NewEncoder(w)
    encoder.Encode(response)
}

Let's run this example and see how it works; to test it, we can simply use curl to send a request to the running server. If you feel more comfortable using a GUI tool, then Postman, which is available for the Google Chrome browser, will work just fine. Otherwise, feel free to use your preferred tool.

$ curl localhost:8080/helloworld -d '{"name":"Nic"}'

You should see the following response:

{"message":"Hello Nic"}

What do you think will happen if you do not include a body with your request?

$ curl localhost:8080/helloworld

If you guessed correctly that you would get an "HTTP status 400 Bad Request" error, then you win a prize.
The http.Error function replies to the request with the given message and status code:

func Error(w ResponseWriter, error string, code int)

Once we have sent this, we need to return, stopping further execution of the function, as Error does not close the ResponseWriter or return flow to the calling function automatically.

You might think you are done, but have a go and see whether you can improve the performance of the handler. Think about the things we discussed when marshalling JSON.

Got it? Well, if not, here is the answer: again, all we are doing is using Decoder, which is the counterpart of the Encoder we used when writing JSON, as shown in the following code example. This nets an instant 33% performance increase, and with less code, too.

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {

    var request helloWorldRequest
    decoder := json.NewDecoder(r.Body)

    err := decoder.Decode(&request)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    response := helloWorldResponse{Message: "Hello " + request.Name}

    encoder := json.NewEncoder(w)
    encoder.Encode(response)
}

Now that you can see just how easy it is to encode and decode JSON with Go, I would recommend taking five minutes to dig through the documentation for the encoding/json package, as there is a whole lot more you can do with it: https://golang.org/pkg/encoding/json/

Summary

In this article, we looked at encoding and decoding data using the encoding/json package.

Resources for Article:

Further resources on this subject:
Microservices – Brave New World [article]
A capability model for microservices [article]
Breaking into Microservices Architecture [article]

Packt
17 Jun 2016
19 min read

A capability model for microservices

In this article by Rajesh RV, the author of Spring Microservices, you will learn about the concepts of microservices. Rather than sticking to definitions, it is better to understand microservices by examining some common characteristics seen across many successful microservices implementations. Spring Boot is an ideal framework for implementing microservices, and in this article we will examine how to implement microservices using Spring Boot with an example use case. Beyond the services themselves, we have to be aware of the challenges around microservices implementation, so this article will also discuss some of the common challenges. A successful microservices implementation has to have a set of common capabilities; in this article, we will establish a microservices capability model that can be used as a technology-neutral framework to implement large-scale microservices.

What are microservices?

Microservices is an architecture style used by many organizations today as a game changer to achieve a high degree of agility, speed of delivery, and scale. Microservices give us a way to develop more physically separated, modular applications.

Microservices were not invented. Many organizations, such as Netflix, Amazon, and eBay, successfully used the divide-and-conquer technique to functionally partition their monolithic applications into smaller atomic units, each performing a single function. These organizations solved a number of prevailing issues they had experienced with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications, and evangelists later termed this pattern the microservices architecture. Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn. Hexagonal Architecture is also known as the Ports and Adapters pattern.

Microservices is an architectural style, or an approach to building IT systems, as a set of business capabilities that are autonomous, self-contained, and loosely coupled.

The preceding diagram depicts a traditional N-tier application architecture with a presentation layer, business layer, and database layer. Modules A, B, and C represent three different business capabilities. The layers in the diagram represent a separation of architectural concerns. Each layer holds all three business capabilities pertaining to that layer: the presentation layer has the web components of all three modules, the business layer has the business components of all three modules, and the database layer hosts the tables of all three modules. In most cases, layers can be physically spread out, whereas modules within a layer are hardwired.

Let's now examine a microservices-based architecture, as follows:

As we can note in the diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice. Each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned toward business capabilities. By doing so, changes to one microservice do not impact others.

There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro.
As microservices are more aligned to business capabilities and have independently manageable lifecycles, they are the ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two other facets of microservices.

Microservices are self-contained, independently deployable, and autonomous services that take full responsibility for a business capability and its execution. They bundle all dependencies, including library dependencies and execution environments, such as web servers and containers or virtual machines that abstract physical resources. These self-contained services assume single responsibility and are well enclosed within a bounded context.

Microservices – the honeycomb analogy

The honeycomb is an ideal analogy for the evolutionary microservices architecture. In the real world, bees build a honeycomb by aligning hexagonal wax cells. They start small, using different materials to build the cells, and construction is based on what is available at the time of building. Repetitive cells form a pattern and result in a strong fabric structure. Each cell in the honeycomb is independent but also integrated with other cells. By adding new cells, the honeycomb grows organically into a big, solid structure. The content inside each cell is abstracted and is not visible outside. Damage to one cell does not damage other cells, and bees can reconstruct these cells without impacting the overall honeycomb.

Characteristics of microservices

The microservices definition discussed at the beginning of this article is arbitrary. Evangelists and practitioners have strong but sometimes differing opinions on microservices. There is no single, concrete, universally accepted definition for microservices. However, all successful microservices implementations exhibit a number of common characteristics, some of which are explained as follows:

Since microservices are more or less a flavor of SOA, many of the service characteristics of SOA apply to microservices as well. In the microservices world, services are first-class citizens. Microservices expose service endpoints as APIs and abstract all their realization details. The APIs can be synchronous or asynchronous. HTTP/REST is the popular choice for APIs.
As microservices are autonomous and abstract everything behind service APIs, it is possible to have different architectures for different microservices. The internal implementation logic, architecture, and technologies, including the programming language, database, quality-of-service mechanisms, and so on, are completely hidden behind the service API.
Well-designed microservices are aligned to a single business capability, so they perform only one function. As a result, one of the common characteristics we see in most implementations is microservices with smaller footprints.
Most microservices implementations are automated to the maximum extent possible, from development to production.
Most large-scale microservices implementations have a supporting ecosystem in place. The ecosystem's capabilities include DevOps processes, centralized log management, service registries, API gateways, extensive monitoring, service routing and flow-control mechanisms, and so on.
Successful microservices implementations encapsulate logic and data within the service. This results in two unconventional situations: distributed data and logic, and decentralized governance.
A microservice example

The customer profile microservice example explained here demonstrates the implementation of a microservice and the interaction between different microservices. In this example, two microservices, Customer Profile and Customer Notification, will be developed.

As shown in the diagram, the Customer Profile microservice exposes methods to create, read, update, and delete a customer, and a registration service to register a customer. The registration process applies certain business logic, saves the customer profile, and sends a message to the Customer Notification microservice. The Customer Notification microservice accepts the message sent by the registration service and sends an e-mail message to the customer using an SMTP server. Asynchronous messaging is used to integrate the Customer Profile and Customer Notification services.

The customer microservices class domain model diagram is as shown here:

Implementing this Customer Profile microservice is not a big deal. The Spring framework, together with Spring Boot, provides all the necessary capabilities to implement this microservice without much hassle.

The key class is CustomerController, which exposes the REST endpoint for our microservice. It is also possible to use HATEOAS to expose the repository's REST services directly using the @RepositoryRestResource annotation. The following code sample shows the Spring Boot main class, called Application, and the REST endpoint definition for the registration of a new customer:

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

@RestController
class CustomerController {
    //other code here
    @RequestMapping(path = "/register", method = RequestMethod.POST)
    Customer register(@RequestBody Customer customer) {
        return customerComponent.register(customer);
    }
}

CustomerController invokes a component class, CustomerComponent. The component class/bean handles all the business logic. CustomerRepository is a Spring Data JPA repository defined to handle the persistence of the Customer entity. The whole application is then deployed as a Spring Boot application by building a standalone jar rather than the conventional war file. Spring Boot encapsulates the server runtime in the fat jar it produces; by default, it is an instance of the Tomcat server.

CustomerComponent, in addition to calling the CustomerRepository class, sends a message to the RabbitMQ queue, where the CustomerNotification component is listening. This can be easily achieved in Spring using the RabbitMessagingTemplate class, as shown in the following Sender implementation:

@Component
class CustomerComponent {
    //other code here

    Customer register(Customer customer) {
        customerRepository.save(customer);
        sender.send(customer.getEmail());
        return customer;
    }
}

@Component
@Lazy
class Sender {
    RabbitMessagingTemplate template;

    @Autowired
    Sender(RabbitMessagingTemplate template) {
        this.template = template;
    }

    @Bean
    Queue queue() {
        return new Queue("CustomerQ", false);
    }

    public void send(String message) {
        template.convertAndSend("CustomerQ", message);
    }
}

The receiver on the other side consumes the message using RabbitListener and sends out an e-mail using the JavaMailSender component.
Execute the following code:

@Component
class Receiver {
    @Autowired
    private JavaMailSender javaMailService;

    @Bean
    Queue queue() {
        return new Queue("CustomerQ", false);
    }

    @RabbitListener(queues = "CustomerQ")
    public void processMessage(String email) {
        System.out.println(email);
        SimpleMailMessage mailMessage = new SimpleMailMessage();
        mailMessage.setTo(email);
        mailMessage.setSubject("Registration");
        mailMessage.setText("Successfully Registered");
        javaMailService.send(mailMessage);
    }
}

In this case, CustomerNotification is our second Spring Boot microservice, and instead of a REST endpoint, it only exposes a message listener endpoint.

Microservices challenges

In the previous section, you learned about the right design decisions to be made and the trade-offs to be applied. In this section, we will review some of the challenges with microservices. Take a look at the following list:

Data islands: Microservices abstract their own local transactional store, which is used for their own transactional purposes. The type of store and the data structure will be optimized for the services offered by the microservice. This can lead to data islands and, hence, challenges around aggregating data from different transactional stores to derive meaningful information.
Logging and monitoring: Log files are a good source of information for analysis and debugging. As each microservice is deployed independently, they emit separate logs, maybe to a local disk. This results in fragmented logs. When we scale services across multiple machines, each service instance produces a separate log file, which makes it extremely difficult to debug and understand the behavior of the services through log mining.
Dependency management: Dependency management is one of the key issues in large microservices deployments. How do we ensure the chattiness between services is manageable? How do we identify and reduce the impact of a change? How do we know whether all the dependent services are up and running? How will the service behave if one of the dependent services is not available?
Organization's culture: One of the biggest challenges in microservices implementation is the organization's culture. An organization following waterfall development or heavyweight release management processes with infrequent release cycles will find microservices development a challenge. Insufficient automation is also a challenge for microservices deployments.
Governance challenges: Microservices impose decentralized governance, which is quite in contrast to traditional SOA governance. Organizations may find it hard to cope with this change, and this could negatively impact microservices development. How do we know who is consuming a service? How do we ensure service reuse? How do we define which services are available in the organization? How do we ensure that enterprise policies are enforced?
Operation overheads: Microservices deployments generally increase the number of deployable units and virtual machines (or containers). This adds significant management overhead and cost of operations. With a single application, a dedicated set of containers or virtual machines in an on-premises data center may not make much sense unless the business benefit is high. With many microservices, the number of Configurable Items (CIs) is too high, and the number of servers on which these CIs are deployed might also be unpredictable.
This makes it extremely difficult to manage data in a traditional Configuration Management Database (CMDB).
Testing microservices: Microservices also pose a challenge for the testability of services. In order to achieve full service functionality, one service may rely on another service, which in turn may rely on yet another service, either synchronously or asynchronously. The issue is how we test an end-to-end service to evaluate its behavior. The dependent services may or may not be available at the time of testing.
Infrastructure provisioning: As briefly touched upon under operation overheads, manual deployment can severely challenge microservices rollouts. If a deployment has manual elements, the deployer or operational administrators have to know the running topology, manually reroute traffic, and then deploy the applications one by one until all the services are upgraded. With many server instances running, this could lead to significant operational overhead. Moreover, the chance of error is high with this manual approach.

Beyond just services – the microservices capability model

Microservices are not as simple as the Customer Profile implementation we discussed earlier. This is especially true when deploying hundreds or thousands of services. In many cases, an improper microservices implementation can lead to a number of challenges, as mentioned before. Any successful Internet-scale microservices deployment requires a number of additional surrounding capabilities. The following diagram depicts the microservices capability model:

The capability model is broadly classified into four areas, as follows:

Core capabilities, which are part of the microservices themselves
Supporting capabilities, which are software solutions supporting core microservice implementations
Infrastructure capabilities, which are infrastructure-level expectations for a successful microservices implementation
Governance capabilities, which are more about process, people, and reference information

Core capabilities

The core capabilities are explained here:

Service listeners (HTTP/messaging): If microservices are enabled for HTTP-based service endpoints, then the HTTP listener is embedded within the microservice, thereby eliminating the need for any external application server. The HTTP listener is started at application startup. If the microservice is based on asynchronous communication, then a message listener is started instead of an HTTP listener. Optionally, other protocols can also be considered. There may not be any listeners if the microservice is a scheduled service. Spring Boot and Spring Cloud Streams provide this capability.
Storage capability: Microservices have storage mechanisms to store state or transactional data pertaining to the business capability. This is optional, depending on the capabilities that are implemented. The storage could be a physical store (an RDBMS such as MySQL, or a NoSQL store such as Hadoop, Cassandra, Neo4j, Elasticsearch, and so on), or it could be an in-memory store (a cache such as Ehcache, or a data grid such as Hazelcast, Infinispan, and so on).
Business capability definition: This is the core of the microservice, where the business logic is implemented. It can be implemented in any applicable language, such as Java, Scala, Clojure, Erlang, and so on. All the business logic required to fulfil the function is embedded within the microservice itself.
Event sourcing: Microservices send out state changes to the external world without really worrying about the targeted consumers of these events. The events could be consumed by other microservices, by supporting services such as audit (by replication), by external applications, and so on. This allows other microservices and applications to respond to state changes.
Service endpoints and communication protocols: These define the APIs for external consumers to consume. They can be synchronous endpoints or asynchronous endpoints. Synchronous endpoints could be based on REST/JSON or other protocols such as Avro, Thrift, Protocol Buffers, and so on. Asynchronous endpoints are implemented through Spring Cloud Streams backed by RabbitMQ, another messaging server, or other messaging-style implementations such as ZeroMQ.
The API gateway: The API gateway provides a level of indirection by either proxying service endpoints or composing multiple service endpoints. The API gateway is also useful for policy enforcement and may provide real-time load-balancing capabilities. There are many API gateways available in the market; Spring Cloud Zuul, Mashery, Apigee, and 3scale are some examples of API gateway providers.
User interfaces: Generally, user interfaces are also part of microservices, allowing users to interact with the business capabilities realized by the microservices. These can be implemented in any technology and are channel and device agnostic.

Infrastructure capabilities

Certain infrastructure capabilities are required for a successful deployment and to manage large-scale microservices. When deploying microservices at scale, not having proper infrastructure capabilities can be challenging and can lead to failures.

Cloud: Microservices implementation is difficult in a traditional data center environment with long lead times to provision infrastructure, and a large amount of infrastructure dedicated per microservice may not be very cost effective. Managing the infrastructure internally in a data center increases the cost of ownership and of operations. A cloud-like infrastructure is better for microservices deployment.
Containers or virtual machines: Managing large numbers of physical machines is not cost effective and is also hard. With physical machines, it is also hard to handle automatic fault tolerance. Virtualization is adopted by many organizations because of its ability to provide optimal use of physical resources and resource isolation; it also reduces the overhead of managing large physical infrastructure components. Containers are the next generation of virtual machines. VMware, Citrix, and so on provide virtual machine technologies; Docker, Drawbridge, Rocket, and LXD are some containerization technologies.
Cluster control and provisioning: Once we have a large number of containers or virtual machines, it is hard to manage and maintain them. Cluster control tools provide a uniform operating environment on top of the containers and share the available capacity across multiple services. Apache Mesos and Kubernetes are examples of cluster control systems.
Application lifecycle management: Application lifecycle management tools help to launch applications when a new container is started, or kill the application when the container shuts down. Application lifecycle management allows application deployments and releases to be scripted. It automatically detects failure scenarios and responds to them, thereby ensuring the availability of the application.
This works in conjunction with the cluster control software. Marathon partially addresses this capability.

Supporting capabilities

Supporting capabilities are not directly linked to microservices, but they are essential for large-scale microservices development.

Software-defined load balancer: The load balancer should be smart enough to understand changes in the deployment topology and respond accordingly. This moves away from the traditional approach of configuring static IP addresses, domain aliases, or cluster addresses in the load balancer. When new servers are added to the environment, the load balancer should automatically detect this and include them in the logical cluster, avoiding any manual interaction. Similarly, if a service instance is unavailable, it should be taken out of the load balancer. A combination of Ribbon, Eureka, and Zuul provides this capability in Spring Cloud Netflix.
Central log management: As explored earlier in this article, a capability is required to centralize all the logs emitted by service instances, with correlation IDs. This helps in debugging, identifying performance bottlenecks, and predictive analysis. The results could feed back into the lifecycle manager to take corrective actions.
Service registry: A service registry provides a runtime environment for services to automatically publish their availability at runtime. A registry is a good source of information for understanding the service topology at any point. Eureka from Spring Cloud, ZooKeeper, and etcd are some of the service registry tools available.
Security service: The distributed microservices ecosystem requires a central server to manage service security. This includes service authentication and token services. OAuth2-based services are widely used for microservices security. Spring Security and Spring Security OAuth are good candidates for building this capability.
Service configuration: All service configuration should be externalized, as discussed in the Twelve-Factor application principles. A central service for all configuration could be a good choice. The Spring Cloud Config server and Archaius are out-of-the-box configuration servers.
Testing tools (anti-fragile, RUM, and so on): Netflix uses Simian Army for anti-fragile testing. Mature services need consistent challenges to show how reliable the services are and how good the fallback mechanisms are. Simian Army components create various error scenarios to explore the behavior of the system under failure conditions.
Monitoring and dashboards: Microservices also require a strong monitoring mechanism, not just at the infrastructure level but also at the service level. Spring Cloud Netflix Turbine, the Hystrix dashboard, and others provide service-level information. End-to-end monitoring tools such as AppDynamics, New Relic, and Dynatrace, and other tools such as statsd, Sensu, and Spigo, can add value in microservices monitoring.
Dependency and CI management: We also need tools to discover runtime topologies, to find service dependencies, and to manage configurable items (CIs). A graph-based CMDB is the most obvious choice for managing these scenarios.
Data lakes: As discussed earlier in this article, we need a mechanism to combine data stored in different microservices and perform near real-time analytics. Data lakes are a good choice for achieving this. Data ingestion tools such as Spring Cloud Data Flow, Flume, and Kafka are used to consume data; HDFS, Cassandra, and others are used to store it.
Reliable messaging: If the communication is asynchronous, we may need a reliable messaging infrastructure service, such as RabbitMQ or any other reliable messaging service. Cloud messaging, or messaging as a service, is a popular choice for Internet-scale message-based service endpoints.

Process and governance capabilities

The last pieces of the puzzle are the process and governance capabilities required for microservices, which are:

DevOps: The key to a successful implementation is adopting DevOps. DevOps complements microservices development by supporting agile development, high-velocity delivery, automation, and better change management.
DevOps tools: DevOps tools for agile development, continuous integration, continuous delivery, and continuous deployment are essential for a successful delivery of microservices. A lot of emphasis is required on automated, functional, and real user testing, as well as synthetic, integration, release, and performance testing.
Microservices repository: A microservices repository is where the versioned binaries of microservices are placed. This could be a simple Nexus repository or a container repository such as the Docker registry.
Microservices documentation: It is important to have all microservices properly documented. Swagger or API Blueprint are helpful in achieving good microservices documentation.
Reference architecture and libraries: The reference architecture provides a blueprint at the organization level to ensure that services are developed according to certain standards and guidelines in a consistent manner. Many of these can then be translated into a number of reusable libraries that enforce service development philosophies.

Summary

In this article, you learned about the concepts and characteristics of microservices. We used the Customer Profile example to understand the concept of microservices better. We also examined some of the common challenges in large-scale microservices implementation. Finally, we established a microservices capability model that can be used to deliver successful Internet-scale microservices.
article-image-microservices-brave-new-world
Packt
17 Mar 2016
9 min read
Save for later

Microservices – Brave New World

In this article by David Gonzalez, author of the book Developing Microservices with Node.js, we will cover the need for microservices, explain the monolithic approach, and study how to build and deploy microservices.

Need for microservices

The world of software development has evolved quickly over the past 40 years. One of the key points of this evolution has been the size of these systems. From the days of MS-DOS, we have taken a hundred-fold leap into our present systems. This growth in size creates a need for better ways of organizing code and software components. Usually, when a company grows due to business needs, which is known as organic growth, the software gets organized on a monolithic architecture, as it is the easiest and quickest way of building software. After a few years (or even months), adding new features becomes harder due to the coupled nature of the created software.

Monolithic software

There are a few companies that have already started building their software using microservices, which is the ideal scenario. The problem is that not all companies can plan their software upfront. Instead of planning, these companies build the software based on the organic growth experienced: a few software components that group business flows by affinity. It is not rare to see companies having two big software components: the user-facing website and the internal administration tools. This is usually known as a monolithic software architecture. Some of these companies face big problems when trying to scale their engineering teams. It is hard to coordinate teams that build, deploy, and maintain a single software component. Clashes on releases and reintroduction of bugs are common problems that drain a big chunk of energy from the teams. One of the solutions to this problem (it also has other benefits) is to split the monolithic software into microservices so that the teams are able to specialize in a few smaller, autonomous, and isolated software components that can be versioned, updated, and deployed without interfering with the rest of the systems of the company. Splitting the monolithic architecture into microservices enables the engineering team to create isolated and autonomous units of work that are highly specialized in a given task (such as sending e-mails, processing card payments, and so on).

Microservices in the real world

Microservices are small software components that specialize in one task and work together to achieve a higher-level task. Forget about software for a second and think about how a company works. When someone applies for a job in a company, they apply for a given position: software engineer, systems administrator, or office manager. The reason for this can be summarized in one word: specialization. If you are used to working as a software engineer, you will get better with experience and add more value to the company. The fact that you don't know how to deal with a customer won't affect your performance, as it is not your area of expertise and would hardly add any value to your day-to-day work. A microservice is an autonomous unit of work that can execute one task without interfering with other parts of the system, similar to what a job position is to a company. This has a number of benefits that can be used in favor of the engineering team in order to help scale the systems of a company.
Nowadays, hundreds of systems are built using microservices-oriented architectures, including the following:

Netflix: They are one of the most popular streaming services and have built an entire ecosystem of applications that collaborate in order to provide a reliable and scalable streaming system used across the globe.

Spotify: They are one of the leading music streaming services in the world and have built this application using microservices. Every single widget of the application (which is a website exposed as a desktop app using the Chromium Embedded Framework (CEF)) is a different microservice that can be updated individually.

First, there was the monolith

A huge percentage (my estimate is around 90%) of modern enterprise software is built following a monolithic approach: huge software components that run in a single container and have a well-defined development life cycle, which goes completely against the agile principles of deliver early and deliver often (https://en.wikipedia.org/wiki/Release_early,_release_often):

Deliver early: The sooner you fail, the easier it is to recover. If you work for two years on a software component before it is released, there is a huge risk of deviation from the original requirements, which are usually wrong and change every few days.

Deliver often: Everything in the software is delivered to all the stakeholders so that they can give their input and see the changes reflected in the software. Errors can be fixed in a few days and improvements are identified easily.

Companies build big software components instead of smaller ones that work together because it feels like the natural thing to do, as follows: the developer has a new requirement; he builds a new method on an existing class in the service layer; the method is exposed on the API via HTTP, SOAP, or any other protocol. Now, repeat this by the number of developers in your company and you will obtain something called organic growth. Organic growth is the type of uncontrolled and unplanned growth of software systems under business pressure without adequate long-term planning, and it is bad.

How to tackle organic growth?

The first thing needed to tackle organic growth is to make sure that business and IT are aligned in the company. Usually, in big companies, IT is not seen as a core part of the business. Organizations outsource their IT systems, keeping the cost in mind, but not the quality, so the partners building these software components are focused on one thing: delivering on time and according to the specification, even if it is incorrect. This produces a less-than-ideal ecosystem for responding to business needs with a working solution for an existing problem. IT is led by people who barely understand how the systems are built and usually overlook the complexity of software development. Fortunately, this is a changing tendency, as IT systems have become the drivers of 99% of the businesses around the world, but we need to be smarter about how we build them. The first measure to tackle organic growth is to align IT and business stakeholders so that they work together; educating the non-technical stakeholders is the key to success.

If we go back to the example from the previous section (few releases with quite big changes), can we do it better? Of course we can: divide the work into manageable software artifacts that each model a single, well-defined business activity, and give each one an identity of its own. It does not need to be a microservice at this stage, but keeping the logic inside a separate, well-defined, easily testable, and decoupled module will give us a huge advantage when facing future changes in the application.
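As an illustration of such a module, here is a minimal sketch of one business activity kept behind a narrow, easily testable interface that could later be extracted into its own microservice. It is written in Python purely for brevity (the article's own examples use Node.js), and all names here are hypothetical:

class ConsoleMailer:
    # Stand-in mail transport so the sketch runs without any external service.
    def send(self, to, subject, body):
        print("to=%s subject=%s body=%s" % (to, subject, body))

class Notifications:
    # One well-defined business activity behind a narrow interface.
    def __init__(self, mailer):
        self._mailer = mailer  # injected, so the transport can be swapped later

    def send_welcome_email(self, address):
        # The business logic stays inside the module, hidden from callers.
        self._mailer.send(to=address, subject="Welcome", body="Thanks for signing up!")

if __name__ == "__main__":
    Notifications(ConsoleMailer()).send_welcome_email("user@example.com")

Callers depend only on the small interface, so moving the logic behind an HTTP endpoint later does not ripple through the rest of the codebase.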
Building microservices – The fallback strategy

When we design a system, we usually think about the replaceability of the existing components. For example, when using a persistence technology in Java, we tend to lean towards the standards (Java Persistence API (JPA)) so that we can replace the underlying implementation without too much effort. Microservices take the same approach, but they isolate the problem instead of working towards easy replaceability. Also, e-mailing is something that, although it seems simple, always ends up giving problems. Consider that we want to replace Mandrill with a plain SMTP server, such as Gmail. We don't need to do anything special; we just change the implementation and roll out the new version of our microservice, as follows:

var nodemailer = require('nodemailer');
var seneca = require("seneca")();

var transporter = nodemailer.createTransport({
  service: 'Gmail',
  auth: {
    user: 'info@micromerce.com',
    pass: 'verysecurepassword'
  }
});

/**
 * Sends an email including the content.
 */
seneca.add({area: "email", action: "send"}, function(args, done) {
  var mailOptions = {
    from: 'Micromerce Info ✔ <info@micromerce.com>',
    to: args.to,
    subject: args.subject,
    html: args.body
  };
  transporter.sendMail(mailOptions, function(error, info) {
    if (error) {
      return done({code: error}, null); // report the failure and stop; don't call done twice
    }
    done(null, {status: "sent"});
  });
});

To the outside world, our simplest version of the e-mail sender is now, to all appearances, using SMTP through Gmail to deliver our e-mails. We could even roll out one server with this version and send some traffic to it in order to validate our implementation without affecting all the customers (in other words, contain the failure).

Deploying microservices

Deployment is usually the ugly friend of the software development life cycle party. There is a missing contact point between development and system administration, which DevOps is going to solve in the following few years (or has already done it and no one told me). The cost of fixing software bugs grows steeply across the phases of development: the later a bug is found, the more it costs to fix. From continuous integration up to continuous delivery, the process should be automated as much as possible, where "as much as possible" means 100%. Remember, humans are imperfect; if we rely on humans carrying out a manual, repetitive process to produce bug-free software, we are walking the wrong path. Remember that a machine will always be error free (as long as the algorithm that is executed is error free), so why not let a machine control our infrastructure?

Summary

In this article, we saw why microservices are needed in complex software systems, examined the monolithic approach, and looked at how to build and deploy microservices.

article-image-how-to-build-12-factor-design-microservices-on-docker-part-2
Cody A.
29 Jun 2015
14 min read
Save for later

How to Build 12 Factor Microservices on Docker - Part 2

Welcome back to our how-to on Building and Running 12 Factor Microservices on Docker. In Part 1, we introduced a very simple Python Flask application which displayed a list of users from a relational database. Then we walked through the first four of these factors, reworking the example application to follow these guidelines. In Part 2, we'll be introducing a multi-container Docker setup as the execution environment for our application. We'll continue from where we left off with the next factor, number five.

Build, Release, Run. A 12-factor app strictly separates the process for transforming a codebase into a deploy into distinct build, release, and run stages. The build stage creates an executable bundle from a code repo, including vendoring dependencies and compiling binaries and asset packages. The release stage combines the executable bundle created in the build with the deploy's current config. Releases are immutable and form an append-only ledger; consequently, each release must have a unique release ID. The run stage runs the app in the execution environment by launching the app's processes against the release.

This is where your operations meet your development and where a PaaS can really shine. For now, we're assuming that we'll be using a Docker-based containerized deploy strategy. We'll start by writing a simple Dockerfile.

The Dockerfile starts with an ubuntu base image, and then I add myself as the maintainer of this app.

FROM ubuntu:14.04.2
MAINTAINER codyaray

Before installing anything, let's make sure that apt has the latest versions of all the packages.

RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list
RUN apt-get update

Install some basic tools and the requirements for running a Python webapp.

RUN apt-get install -y tar curl wget dialog net-tools build-essential
RUN apt-get install -y python python-dev python-distribute python-pip
RUN apt-get install -y libmysqlclient-dev

Copy over the application to the container.

ADD /. /src

Install the dependencies.

RUN pip install -r /src/requirements.txt

Finally, set the current working directory, expose the port, and set the default command.

EXPOSE 5000
WORKDIR /src
CMD python app.py
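Part 1 of this series built the actual Flask application, so it isn't reproduced here. Purely for orientation, a minimal stand-in for the app.py that the CMD runs might look like the following; the route and hard-coded data are placeholders, not the article's real user-listing code:

import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users")
def users():
    # The real application reads these from the relational database;
    # they are hard-coded here only to keep the sketch self-contained.
    return jsonify(users=["alice", "bob"])

if __name__ == "__main__":
    # Bind to all interfaces so Docker's port mapping can reach the process.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))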
Now, the build phase consists of building a Docker image. You can build and store it locally with

docker build -t codyaray/12factor:0.1.0 .

If you look at your local repository, you should see the new image present.

$ docker images
REPOSITORY          TAG     IMAGE ID      CREATED     VIRTUAL SIZE
codyaray/12factor   0.1.0   bfb61d2bbb17  1 hour ago  454.8 MB

The release phase really depends on the details of the execution environment. You'll notice that none of the configuration is stored in the image produced from the build stage; however, we need a way to build a versioned release with the full configuration as well. Ideally, the execution environment would be responsible for creating releases from the source code and configuration specific to that environment. However, if we're working from first principles with Docker rather than a full-featured PaaS, one possibility is to build a new Docker image using the one we just built as a base. Each environment would have its own set of configuration parameters and thus its own Dockerfile. It could be something as simple as

FROM codyaray/12factor:0.1.0
MAINTAINER codyaray
ENV DATABASE_URL mysql://sa:mypwd@mydbinstance.abcdefghijkl.us-west-2.rds.amazonaws.com/mydb

This is simple enough to be programmatically generated, given the environment-specific configuration and the new container version to be deployed (a rough sketch of such a generator follows at the end of this section). For demonstration purposes, though, we'll call the above file Dockerfile-release so it doesn't conflict with the main application's Dockerfile. Then we can build it with

docker build -f Dockerfile-release -t codyaray/12factor-release:0.1.0.0 .

The resulting built image could be stored in the environment's registry as codyaray/12factor-release:0.1.0.0. The images in this registry would serve as the immutable ledger of releases. Notice that the version has been extended to include a fourth level which, in this instance, could represent configuration version "0" applied to source version "0.1.0". The key here is that these configuration parameters aren't collated into named groups (sometimes called "environments"). For example, these aren't static files named like Dockerfile.staging or Dockerfile.dev in a centralized repo. Rather, the set of parameters is distributed so that each environment maintains its own environment mapping in some fashion. The deployment system would be set up such that a new release to the environment automatically applies the environment variables it has stored to create a new Docker image. As always, the final deploy stage depends on whether you're using a cluster manager, scheduler, and so on. If you're using standalone Docker, then it would boil down to

docker run -P -t codyaray/12factor-release:0.1.0.0
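The article doesn't show the release-image generator itself, so here is a rough sketch of what "programmatically generated" could look like. The helper name, the configuration source, and the staging values are assumptions made for illustration, not the author's tooling:

import subprocess

def build_release(base_image, release_tag, env_config):
    # Render an environment-specific Dockerfile-release and build the immutable release image from it.
    lines = ["FROM " + base_image, "MAINTAINER codyaray"]
    lines += ["ENV {} {}".format(key, value) for key, value in env_config.items()]
    with open("Dockerfile-release", "w") as f:
        f.write("\n".join(lines) + "\n")
    subprocess.check_call(["docker", "build", "-f", "Dockerfile-release", "-t", release_tag, "."])

# Configuration version "0" applied to source version "0.1.0".
build_release(
    base_image="codyaray/12factor:0.1.0",
    release_tag="codyaray/12factor-release:0.1.0.0",
    env_config={"DATABASE_URL": "mysql://sa:mypwd@staging-db.example.com/mydb"},  # hypothetical staging value
)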
Processes. A 12-factor app is executed as one or more stateless processes which share nothing and are horizontally partitionable. All data which needs to be stored must use a stateful backing service, usually a database. This means no sticky sessions and no in-memory or local disk-based caches. These processes should never daemonize or write their own PID files; rather, they should rely on the execution environment's process manager (such as Upstart).

This factor must be considered up-front, in line with the discussions on antifragility, horizontal scaling, and overall application design. As the example app delegates all stateful persistence to a database, we've already succeeded on this point. However, it is good to note that a number of issues have been found with the standard ubuntu base image for Docker, one of which is its process management (or lack thereof). If you would like to use a process manager to automatically restart crashed daemons, or to notify a service registry or operations team, check out baseimage-docker. This image adds runit for process supervision and management, amongst other improvements to base ubuntu for use in Docker, such as obsoleting the need for PID files. To use this new image, we have to update the Dockerfile to set the new base image and use its init system instead of running our application as the root process in the container.

FROM phusion/baseimage:0.9.16
MAINTAINER codyaray

RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list
RUN apt-get update

RUN apt-get install -y tar git curl nano wget dialog net-tools build-essential
RUN apt-get install -y python python-dev python-distribute python-pip
RUN apt-get install -y libmysqlclient-dev

ADD /. /src

RUN pip install -r /src/requirements.txt

EXPOSE 5000
WORKDIR /src

RUN mkdir /etc/service/12factor
ADD 12factor.sh /etc/service/12factor/run

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

Notice the file 12factor.sh that we're now adding to /etc/service. This is how we instruct runit to run our application as a service. Let's add the new 12factor.sh file.

#!/bin/sh
python /src/app.py

Now the new containers we deploy will attempt to be a little more fault-tolerant by using an OS-level process manager.

Port Binding. A 12-factor app must be self-contained and bind to a port specified as an environment variable. It can't rely on the injection of a web container such as tomcat or unicorn; instead it must embed a server such as jetty or thin. The execution environment is responsible for routing requests from a public-facing hostname to the port-bound web process. This is trivial with most embedded web servers. If you're currently using an external web server, this may require more effort to support an embedded server within your application. For the example Python app (which uses the built-in Flask web server), it boils down to

port = int(os.environ.get("PORT", 5000))
app.run(host='0.0.0.0', port=port)

Now the execution environment is free to instruct the application to listen on whatever port is available. This obviates the need for the application to tell the environment what ports must be exposed, as we've been required to do with Docker.

Concurrency. Because a 12-factor app exclusively uses stateless processes, it can scale out by adding processes. A 12-factor app can have multiple process types, such as web processes, background worker processes, or clock processes (for cron-like scheduled jobs). As each process type is scaled independently, each logical process would become its own Docker container as well. We've already seen how to build a web process; other processes are very similar. In most cases, scaling out simply means launching more instances of the container. (It's usually not desirable to scale out the clock processes, though, as they often generate events that you want to be scheduled singletons within your infrastructure.)

Disposability. A 12-factor app's processes can be started or stopped (with a SIGTERM) at any time. Thus, minimizing startup time and gracefully shutting down is very important. For example, when a web service receives a SIGTERM, it should stop listening on the HTTP port, allow in-flight requests to finish, and then exit. Similarly, processes should be robust against sudden death; for example, worker processes should use a robust queuing backend. You want to ensure the web server you select can shut down gracefully. This is one of the trickier parts of selecting a web server, at least for many of the common Python HTTP servers that I've tried. In theory, shutting down based on receiving a SIGTERM should be as simple as follows.

import signal
signal.signal(signal.SIGTERM, lambda *args: server.stop(timeout=60))

But oftentimes, you'll find that this will immediately kill the in-flight requests as well as closing the listening socket. You'll want to test this thoroughly if dependable graceful shutdown is critical to your application.
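Graceful shutdown is easier to reason about for worker processes. As a complement to the web-server discussion above, here is a small stand-alone sketch (not part of the example app) that stops taking new work on SIGTERM but lets the job in progress finish:

import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Flip a flag instead of exiting immediately, so the current job can complete.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    # A real worker would pull the next job from a robust queue (for example, RabbitMQ),
    # so anything not yet acknowledged is redelivered if the process dies abruptly.
    time.sleep(1)  # simulate processing one unit of work

print("drained in-flight work, exiting cleanly")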
Dev/Prod Parity. A 12-factor app is designed to keep the gap between development and production small. Continuous deployment shrinks the amount of time that code lives in development but not production. A self-serve platform allows developers to deploy their own code in production, just like they do in their local development environments. Using the same backing services (databases, caches, queues, and so on) in development as in production reduces the number of subtle bugs that arise from inconsistencies between technologies or integrations. As we're deploying this solution using fully Dockerized containers and third-party backing services, we've effectively achieved dev/prod parity. For local development, I use boot2docker on my Mac, which provides a Docker-compatible VM to host my containers. Using boot2docker, you can start the VM and set up all the environment variables automatically with

boot2docker up
$(boot2docker shellinit)

Once you've initialized this VM and set the DOCKER_HOST variable to its IP address with shellinit, the docker commands given above work exactly the same for development as they do for production.

Logs. Consider logs as a stream of time-ordered events collected from all running processes and backing services. A 12-factor app doesn't concern itself with how its output is handled. Instead, it just writes its output to its `stdout` stream. The execution environment is responsible for collecting, collating, and routing this output to its final destination(s). Most logging frameworks either support logging to stderr/stdout by default or make it easy to switch from file-based logging to one of these streams. In a 12-factor app, the execution environment is expected to capture these streams and handle them however the platform dictates. Because our app doesn't have specific logging yet, and the only logs are from Flask and already go to stderr, we don't have any application changes to make.

However, we can show how an execution environment could handle the logs. We'll set up a Docker container which collects the logs from all the other Docker containers on the same host. Ideally, this would then forward the logs to a centralized service such as Elasticsearch. Here we'll demo using Fluentd to capture and collect the logs inside the log collection container; a simple configuration change would allow us to switch from writing these logs to disk, as we demo here, and instead send them from Fluentd to a local Elasticsearch cluster. We'll create a Dockerfile for our new logcollector container type. For more detail, you can find a Docker Fluentd tutorial here. We can call this file Dockerfile-logcollector.

FROM kiyoto/fluentd:0.10.56-2.1.1
MAINTAINER kiyoto@treasure-data.com
RUN mkdir /etc/fluent
ADD fluent.conf /etc/fluent/
CMD "/usr/local/bin/fluentd -c /etc/fluent/fluent.conf"

We use an existing Fluentd base image with a specific Fluentd configuration. Notably, this tails all the log files in /var/lib/docker/containers/<container-id>/<container-id>-json.log, adds the container ID to the log message, and then writes to JSON-formatted files inside /var/log/docker.

<source>
  type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker.pos
  time_format %Y-%m-%dT%H:%M:%S
  tag docker.*
  format json
</source>
<match docker.var.lib.docker.containers.*.*.log>
  type record_reformer
  container_id ${tag_parts[5]}
  tag docker.all
</match>
<match docker.all>
  type file
  path /var/log/docker/*.log
  format json
  include_time_key true
</match>

As usual, we create a Docker image. Don't forget to specify the logcollector Dockerfile.

docker build -f Dockerfile-logcollector -t codyaray/docker-fluentd .

We'll need to mount two directories from the Docker host into this container when we launch it. Specifically, we'll mount the directory containing the logs from all the other containers, as well as the directory to which we'll be writing the consolidated JSON logs.

docker run -d -v /var/lib/docker/containers:/var/lib/docker/containers -v /var/log/docker:/var/log/docker codyaray/docker-fluentd

Now if you check the /var/log/docker directory, you'll see the collated JSON log files. Note that this is on the Docker host rather than in any container; if you're using boot2docker, you can ssh into the Docker host with boot2docker ssh and then check /var/log/docker.
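On the application side, nothing special is needed beyond writing to stdout. If we later add explicit application logging, a minimal configuration (shown only as an illustration, not a change the article makes) could look like this:

import logging
import sys

# Treat logs as an event stream on stdout; the execution environment
# (here, Docker plus the Fluentd collector above) routes them onward.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logging.getLogger("12factor").info("user list requested")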
Admin Processes. Any admin or management tasks for a 12-factor app should be run as one-off processes within a deploy's execution environment. This process runs against a release, using the same codebase and configs as any process in that release, and uses the same dependency isolation techniques as the long-running processes. This is really a feature of your app's execution environment. If you're running a Docker-like containerized solution, this may be pretty trivial.

docker run -i -t --entrypoint /bin/bash codyaray/12factor-release:0.1.0.0

The -i flag instructs docker to provide an interactive session, that is, to keep the input and output ttys attached. Then we instruct docker to run the /bin/bash command instead of another 12factor app instance. This creates a new container based on the same Docker image, which means we have access to all the code and configs for this release. This will drop us into a bash terminal to do whatever we want. But let's say we want to add a new "friends" table to our database, so we wrote a migration script add_friends_table.py. We could run it as follows:

docker run -i -t --entrypoint python codyaray/12factor-release:0.1.0.0 /src/add_friends_table.py

As you can see, following the few simple rules specified in the 12 Factor manifesto really allows your execution environment to manage and scale your application. While this may not be the most feature-rich integration within a PaaS, it is certainly very portable, with a clean separation of responsibilities between your app and its environment. Many of the tools and integrations demonstrated here amounted to a do-it-yourself container approach to the environment, which would be subsumed by an external, vertically integrated PaaS such as Deis. If you're not familiar with Deis, it's one of several competitors in the open source platform-as-a-service space which allows you to run your own PaaS on a public or private cloud. Like many, Deis is inspired by Heroku. So instead of Dockerfiles, Deis uses a buildpack to transform a code repository into an executable image and a Procfile to specify an app's processes. Finally, by default you can use a specialized git receiver to complete a deploy. Instead of having to manage separate build, release, and deploy stages yourself like we described above, deploying an app to Deis could be as simple as

git push deis-prod

While it can't get much easier than this, you're certainly trading control for simplicity. It's up to you to determine which works best for your business.

Find more Docker tutorials alongside our latest releases on our dedicated Docker page.

About the Author

Cody A. Ray is an inquisitive, tech-savvy, entrepreneurially-spirited dude. Currently, he is a software engineer at Signal, an amazing startup in downtown Chicago, where he gets to work with a dream team that's changing the service model underlying the Internet.